8. Create Lookup Feature
Create Lookup features
We've learned how to define a target and materialize data using observation tables. Now, let's dive into basic feature engineering.
The most straightforward features we can craft with FeatureByte are known as lookup features. These are either direct columns taken from the source table or simple computations that don't require any aggregations.
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:07:20 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088
16:07:20 | INFO | Using profile: tutorial
16:07:20 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:07:20 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:07:20 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:07:20 | INFO | No catalog activated.
16:07:20 | INFO | Catalog activated: Grocery Dataset Tutorial
In [2]:
# Get a view from the GROCERYCUSTOMER SCD (slowly changing dimension) table.
grocerycustomer_view = catalog.get_view("GROCERYCUSTOMER")
Create Lookup feature
In [3]:
# Create lookup feature from DateOfBirth column for customer entity.
customer_dateofbirth = grocerycustomer_view["DateOfBirth"].as_feature("CUSTOMER_DateOfBirth")
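GROCERYCUSTOMER is an SCD table, so a customer can have several records over time. As we understand it, a lookup feature on such a table is served from the record that was active at the request's point-in-time. The sketch below illustrates that idea in plain Python. It is not FeatureByte's implementation, and `lookup_as_of`, the `history` list and its values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history for one customer, sorted by effective timestamp.
# Each tuple is (effective_timestamp, attribute_value).
history = [
    (datetime(2020, 1, 1), "old value"),
    (datetime(2022, 6, 1), "new value"),
]


def lookup_as_of(history, point_in_time):
    """Return the value of the last record effective at or before point_in_time.

    Returns None when no record is effective yet.
    """
    effective = [ts for ts, _ in history]
    # bisect_right counts how many records became effective at or before point_in_time.
    idx = bisect_right(effective, point_in_time)
    return history[idx - 1][1] if idx > 0 else None


value_2021 = lookup_as_of(history, datetime(2021, 5, 1))
value_2023 = lookup_as_of(history, datetime(2023, 1, 1))
value_2019 = lookup_as_of(history, datetime(2019, 1, 1))
```

The point of the sketch is that the same customer can yield different lookup values depending on the point-in-time of the request, which is why the preview later in this tutorial needs an observation table with a POINT_IN_TIME column.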
Derive Age at the point-in-time of the request observation
In [4]:
# Derive Age from the point-in-time and the date of birth.
customer_age = (
    (fb.RequestColumn.point_in_time() - customer_dateofbirth).dt.day / 365.25
).floor()
# Name feature
customer_age.name = "CUSTOMER_Age"
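To make the arithmetic behind `CUSTOMER_Age` concrete, here is a minimal plain-Python sketch of the same formula: elapsed days divided by 365.25, then floored. It assumes that `.dt.day` on the time difference yields the elapsed time in days. The helper name `age_in_years` and the date of birth used below are hypothetical.

```python
import math
from datetime import datetime


def age_in_years(point_in_time, date_of_birth):
    """Approximate age in whole years: floor(elapsed days / 365.25)."""
    elapsed_days = (point_in_time - date_of_birth).total_seconds() / 86400
    return math.floor(elapsed_days / 365.25)


# Hypothetical date of birth, paired with a point-in-time from the preview table.
dob = datetime(1944, 3, 5)
age_mid_year = age_in_years(datetime(2022, 8, 17, 19, 13, 52), dob)

# Exactly on the 78th birthday the 365.25-day approximation still reports 77,
# because only 19 leap days (not 19.5) have elapsed.
age_on_birthday = age_in_years(datetime(2022, 3, 5), dob)
```

Dividing by 365.25 is an approximation of the calendar year, so the result can be off by one for requests that fall on or very near a birthday. For a modelling feature this is usually acceptable, but it is worth knowing when comparing against an exact calendar age.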
In [5]:
# Transform age into a 5 year age band.
customer_age_band = (
    ((customer_age + 1) / 5).ceil() - 1
) * 5
customer_age_band = (
    customer_age_band.astype(str)
    + "-" + (customer_age_band + 4).astype(str)
)
# Name feature
customer_age_band.name = "CUSTOMER_Age_band"
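The band formula above can be checked with a small plain-Python sketch. It mirrors the expression `(ceil((age + 1) / 5) - 1) * 5` for the lower bound and adds 4 for the upper bound. The helper name `age_band` and its `width` parameter are hypothetical and only serve the illustration.

```python
import math


def age_band(age, width=5):
    """Return the band label (e.g. "75-79") for a non-negative integer age."""
    lower = (math.ceil((age + 1) / width) - 1) * width
    upper = lower + width - 1
    return f"{lower}-{upper}"


# Ages taken from the CUSTOMER_Age preview in this tutorial.
bands = {age: age_band(age) for age in (78, 44, 80)}

# For non-negative integer ages the lower bound equals floor(age / 5) * 5.
same_as_floor_division = all(
    age_band(age).split("-")[0] == str((age // 5) * 5) for age in range(120)
)
```

For non-negative integer ages, the `ceil`-based expression gives the same lower bound as rounding the age down to the nearest multiple of 5, so an age of 44 falls in 40-44 and an age of 45 starts the 45-49 band.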
Preview feature
We will use the observation table we created in the previous tutorial.
In [6]:
# Check the primary entity of the feature
customer_age.primary_entity
Out[6]:
[<featurebyte.api.entity.Entity at 0x13a78b0c0>
{
    'name': 'customer',
    'created_at': '2024-06-12T08:05:47.417000',
    'updated_at': '2024-06-12T08:05:50.497000',
    'description': None,
    'serving_names': [
        'GROCERYCUSTOMERGUID'
    ],
    'catalog_name': 'Grocery Dataset Tutorial'
}]
In [7]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
In [8]:
# Preview CUSTOMER_Age
customer_age.preview(preview_table)
Out[8]:
  | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Age |
---|---|---|---|
0 | 2022-08-17 19:13:52 | 40a07ca4-a991-4d21-b5cf-74ee61220f96 | 78 |
1 | 2022-12-10 21:08:26 | 77d02174-f1e1-41c1-9fb9-01c6246b0009 | 76 |
2 | 2023-04-11 17:23:57 | 6084f39f-9d2c-4111-b1cc-502e1559c0c0 | 32 |
3 | 2022-09-18 18:52:36 | ac7edfb5-63ed-49fb-9b89-76b0288ed2f8 | 38 |
4 | 2023-03-31 18:50:00 | 213ef7d3-c27b-43e0-bc0a-57d6c7c254b0 | 45 |
5 | 2023-02-07 11:04:26 | fd1caae1-77e6-4667-8c83-df13f05bf2f5 | 22 |
6 | 2023-05-05 08:00:42 | 57ca0770-eb8b-4769-8e67-eb1b7cc0a934 | 36 |
7 | 2023-03-17 11:15:09 | 1b627a25-7eb4-4f61-b243-c93db487bff0 | 62 |
8 | 2022-12-26 15:01:07 | 264f79fd-c24a-47cc-8a68-fe3753a4d74b | 44 |
9 | 2023-05-28 19:27:14 | 15973b2f-2256-4caa-b65b-cbbfdff0905b | 80 |
In [9]:
# Preview CUSTOMER_Age_band
customer_age_band.preview(preview_table)
Out[9]:
  | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Age_band |
---|---|---|---|
0 | 2022-08-17 19:13:52 | 40a07ca4-a991-4d21-b5cf-74ee61220f96 | 75-79 |
1 | 2022-12-10 21:08:26 | 77d02174-f1e1-41c1-9fb9-01c6246b0009 | 75-79 |
2 | 2023-04-11 17:23:57 | 6084f39f-9d2c-4111-b1cc-502e1559c0c0 | 30-34 |
3 | 2022-09-18 18:52:36 | ac7edfb5-63ed-49fb-9b89-76b0288ed2f8 | 35-39 |
4 | 2023-03-31 18:50:00 | 213ef7d3-c27b-43e0-bc0a-57d6c7c254b0 | 45-49 |
5 | 2023-02-07 11:04:26 | fd1caae1-77e6-4667-8c83-df13f05bf2f5 | 20-24 |
6 | 2023-05-05 08:00:42 | 57ca0770-eb8b-4769-8e67-eb1b7cc0a934 | 35-39 |
7 | 2023-03-17 11:15:09 | 1b627a25-7eb4-4f61-b243-c93db487bff0 | 60-64 |
8 | 2022-12-26 15:01:07 | 264f79fd-c24a-47cc-8a68-fe3753a4d74b | 40-44 |
9 | 2023-05-28 19:27:14 | 15973b2f-2256-4caa-b65b-cbbfdff0905b | 80-84 |
In [10]:
# Save features to catalog
customer_age.save()
customer_age_band.save()
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
In [11]:
# Add description
customer_age.update_description("Age of the customer.")
customer_age_band.update_description("Age Band of the customer.")
Check feature definition files (the same kind of definition files we discussed in the target tutorial)
In [12]:
customer_age.definition
Out[12]:
# Generated by SDK version: 1.1.0.dev7
from bson import ObjectId
from featurebyte import SCDTable
from featurebyte.api.request_column import RequestColumn
# scd_table name: "GROCERYCUSTOMER"
scd_table = SCDTable.get_by_id(ObjectId("666956c28080c62d0dc616df"))
scd_view = scd_table.get_view(
    view_mode="manual",
    drop_column_names=["record_available_at", "CurrentRecord"],
    column_cleaning_operations=[],
)
grouped = scd_view.as_features(
    column_names=["DateOfBirth"],
    feature_names=["CUSTOMER_DateOfBirth"],
    offset=None,
)
feat = grouped["CUSTOMER_DateOfBirth"]
request_col = RequestColumn.point_in_time()
feat_1 = ((request_col - feat).dt.day / 365.25).floor()
feat_1.name = "CUSTOMER_Age"
output = feat_1
output.save(_id=ObjectId("666957381ecbdd152339dece"))
In [13]:
customer_age_band.definition
Out[13]:
# Generated by SDK version: 1.1.0.dev7
from bson import ObjectId
from featurebyte import SCDTable
from featurebyte.api.request_column import RequestColumn
# scd_table name: "GROCERYCUSTOMER"
scd_table = SCDTable.get_by_id(ObjectId("666956c28080c62d0dc616df"))
scd_view = scd_table.get_view(
    view_mode="manual",
    drop_column_names=["record_available_at", "CurrentRecord"],
    column_cleaning_operations=[],
)
grouped = scd_view.as_features(
    column_names=["DateOfBirth"],
    feature_names=["CUSTOMER_DateOfBirth"],
    offset=None,
)
feat = grouped["CUSTOMER_DateOfBirth"]
request_col = RequestColumn.point_in_time()
feat_1 = ((request_col - feat).dt.day / 365.25).floor()
feat_1.name = "CUSTOMER_Age"
feat_2 = ((((feat_1 + 1) / 5).ceil() - 1) * 5) + 4
feat_3 = ((((feat_1 + 1) / 5).ceil() - 1) * 5).astype(str) + "-"
feat_4 = feat_3 + feat_2.astype(str)
feat_4.name = "CUSTOMER_Age_band"
output = feat_4
output.save(_id=ObjectId("666957381ecbdd152339ded8"))