8. Create Lookup Feature
Create Lookup features
We've learned how to define a target and materialize data using observation tables. Now, let's dive into basic feature engineering.
The most straightforward features we can craft with FeatureByte are known as lookup features. These are either direct columns taken from the source table or simple computations that don't require any aggregations.
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
11:12:12 | INFO | SDK version: 3.2.0.dev66
11:12:12 | INFO | No catalog activated.
11:12:12 | INFO | Using profile: staging
11:12:12 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
11:12:12 | INFO | Active profile: staging (https://staging.featurebyte.com/api/v1)
11:12:12 | INFO | SDK version: 3.2.0.dev66
11:12:12 | INFO | No catalog activated.
11:12:12 | INFO | Catalog activated: Grocery Dataset SDK Tutorial
16:07:20 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:07:20 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:07:20 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:07:20 | INFO | No catalog activated.
16:07:20 | INFO | Catalog activated: Grocery Dataset Tutorial
In [2]:
# Get view from GROCERYCUSTOMER scd table.
grocerycustomer_view = catalog.get_view("GROCERYCUSTOMER")
Create Lookup feature
In [3]:
# Create lookup feature from DateOfBirth column for customer entity.
customer_dateofbirth = grocerycustomer_view["DateOfBirth"].as_feature("CUSTOMER_DateOfBirth")
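Because GROCERYCUSTOMER is an SCD table, a lookup feature can be thought of as returning the attribute value from the record that was in effect at each request's point in time. The sketch below illustrates that idea in plain Python only. It is not FeatureByte's implementation, and the record layout, the `history` data, and the `lookup_as_of` helper are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical change history for one customer, sorted by effective timestamp:
# (effective_from, attribute value). These values are illustrative only.
history = [
    (datetime(2021, 1, 1), "A"),
    (datetime(2022, 6, 1), "B"),
    (datetime(2023, 2, 1), "C"),
]


def lookup_as_of(records, point_in_time):
    """Return the value of the latest record effective at or before point_in_time."""
    starts = [start for start, _ in records]
    idx = bisect_right(starts, point_in_time) - 1
    return records[idx][1] if idx >= 0 else None


print(lookup_as_of(history, datetime(2022, 7, 14, 18, 12, 49)))
print(lookup_as_of(history, datetime(2020, 1, 1)))
```

A request dated before the first record has nothing to look up, which the sketch represents as `None`.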
Derive Age at the point-in-time of the request observation
In [4]:
# Derive Age from the point-in-time and the date of birth.
customer_age = (
    (fb.RequestColumn.point_in_time() - customer_dateofbirth).dt.day / 365.25
).floor()
# Name feature
customer_age.name = "CUSTOMER_Age"
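To make the arithmetic concrete, here is a minimal plain-Python sketch of the same calculation: elapsed days divided by 365.25, then floored. It assumes the elapsed time is measured in (possibly fractional) days. The `age_at` helper and the dates of birth are hypothetical and are not taken from the dataset.

```python
import math
from datetime import datetime

DAYS_PER_YEAR = 365.25  # same average year length as the feature expression


def age_at(point_in_time: datetime, date_of_birth: datetime) -> int:
    """Whole years elapsed between date_of_birth and point_in_time."""
    elapsed_days = (point_in_time - date_of_birth).total_seconds() / 86400
    return math.floor(elapsed_days / DAYS_PER_YEAR)


# Hypothetical date of birth paired with a point-in-time of the kind used in requests.
print(age_at(datetime(2022, 7, 14, 18, 12, 49), datetime(1977, 3, 1)))  # 45

# 365 elapsed days is slightly less than 365.25, so this first birthday is not yet counted.
print(age_at(datetime(2002, 1, 1), datetime(2001, 1, 1)))  # 0
```

The second call shows the trade-off of the 365.25 approximation: right around a birthday the result can lag the calendar age by a day, which is usually acceptable for a modelling feature.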
In [5]:
# Transform age into a 5-year age band.
customer_age_band = (
    ((customer_age + 1) / 5).ceil() - 1
) * 5
customer_age_band = (
    customer_age_band.astype(str)
    + "-" + (customer_age_band + 4).astype(str)
)
# Name feature
customer_age_band.name = "CUSTOMER_Age_band"
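The banding expression can be checked in plain Python. The sketch below reproduces the same ceil-based arithmetic for whole-number ages and shows that it gives the same lower bound as integer floor division by 5. The `age_band` helper is hypothetical and only mirrors the feature logic.

```python
import math

BAND_WIDTH = 5  # width of each age band, in years


def age_band(age: int) -> str:
    """Label a non-negative whole-number age with its 5-year band, e.g. 47 -> '45-49'."""
    lower = (math.ceil((age + 1) / BAND_WIDTH) - 1) * BAND_WIDTH
    return f"{lower}-{lower + BAND_WIDTH - 1}"


# For whole-number ages the ceil-based lower bound equals floor division by the band width.
same_as_floor_division = all(
    age_band(a).split("-")[0] == str((a // BAND_WIDTH) * BAND_WIDTH) for a in range(121)
)

print([age_band(a) for a in (45, 54, 31, 85)])  # ['45-49', '50-54', '30-34', '85-89']
print(same_as_floor_division)  # True
```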
Preview feature
We will use the observation table we created in the previous tutorial.
In [6]:
# Check the primary entity of the feature
customer_age.primary_entity
Out[6]:
[<featurebyte.api.entity.Entity at 0x14ec45cb0> { 'name': 'customer', 'created_at': '2025-10-15T03:06:58.482000', 'updated_at': '2025-10-15T03:07:01.153000', 'description': None, 'serving_names': [ 'GROCERYCUSTOMERGUID' ], 'catalog_name': 'Grocery Dataset SDK Tutorial' }]
In [7]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
In [8]:
# Preview CUSTOMER_Age
customer_age.preview(preview_table)
Out[8]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Age |
|---|---|---|---|
| 0 | 2022-07-14 18:12:49 | a3245668-aeba-4259-87e1-1b99a1e7391c | 45 |
| 1 | 2022-09-12 16:29:49 | f7de9fec-9e01-4478-8b2e-5a17427a53c1 | 31 |
| 2 | 2023-03-18 09:43:40 | 7ec0da36-b85f-47cf-873e-461b5c2b1cbf | 30 |
| 3 | 2022-12-01 13:28:57 | 600a7549-9f3e-42f9-8408-e3996d6c4750 | 50 |
| 4 | 2023-05-14 15:41:46 | e904b0bc-b342-4491-a7d9-216489b6ac01 | 54 |
| 5 | 2022-10-30 09:08:11 | 5ee681b9-583c-4968-88bb-993704a6c54e | 85 |
| 6 | 2022-10-03 12:02:23 | f25a7864-4e0d-43bb-8191-0ffdf56b5a21 | 60 |
| 7 | 2023-01-08 15:40:02 | 879de04d-36ef-49a3-b1f7-a96495100dbe | 41 |
| 8 | 2023-04-07 13:41:35 | f017e72e-1645-4fbe-988e-a9c09920c506 | 21 |
| 9 | 2022-12-12 10:27:30 | 9526bbd7-3b85-4bd5-99d7-2eecda85ada2 | 52 |
In [9]:
# Preview CUSTOMER_Age_band
customer_age_band.preview(preview_table)
Out[9]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Age_band |
|---|---|---|---|
| 0 | 2022-07-14 18:12:49 | a3245668-aeba-4259-87e1-1b99a1e7391c | 45-49 |
| 1 | 2022-09-12 16:29:49 | f7de9fec-9e01-4478-8b2e-5a17427a53c1 | 30-34 |
| 2 | 2023-03-18 09:43:40 | 7ec0da36-b85f-47cf-873e-461b5c2b1cbf | 30-34 |
| 3 | 2022-12-01 13:28:57 | 600a7549-9f3e-42f9-8408-e3996d6c4750 | 50-54 |
| 4 | 2023-05-14 15:41:46 | e904b0bc-b342-4491-a7d9-216489b6ac01 | 50-54 |
| 5 | 2022-10-30 09:08:11 | 5ee681b9-583c-4968-88bb-993704a6c54e | 85-89 |
| 6 | 2022-10-03 12:02:23 | f25a7864-4e0d-43bb-8191-0ffdf56b5a21 | 60-64 |
| 7 | 2023-01-08 15:40:02 | 879de04d-36ef-49a3-b1f7-a96495100dbe | 40-44 |
| 8 | 2023-04-07 13:41:35 | f017e72e-1645-4fbe-988e-a9c09920c506 | 20-24 |
| 9 | 2022-12-12 10:27:30 | 9526bbd7-3b85-4bd5-99d7-2eecda85ada2 | 50-54 |
In [10]:
# Save features to catalog
customer_age.save()
customer_age_band.save()
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
In [11]:
# Add description
customer_age.update_description("Age of the customer.")
customer_age_band.update_description("Age Band of the customer.")
Check feature definition files (the same kind of definition files discussed in the target tutorial)
In [12]:
customer_age.definition
Out[12]:
# Generated by SDK version: 3.2.0.dev66
from bson import ObjectId
from featurebyte import SCDTable
from featurebyte.api.request_column import RequestColumn
request_col = RequestColumn.point_in_time()
# scd_table name: "GROCERYCUSTOMER"
scd_table = SCDTable.get_by_id(ObjectId("68ef0d6293cce39d676ca317"))
scd_view = scd_table.get_view(
    view_mode="manual",
    drop_column_names=["record_available_at", "CurrentRecord"],
    column_cleaning_operations=[],
)
grouped = scd_view.as_features(
    column_names=["DateOfBirth"],
    feature_names=["CUSTOMER_DateOfBirth"],
    offset=None,
)
feat = grouped["CUSTOMER_DateOfBirth"]
feat_1 = ((request_col - feat).dt.day / 365.25).floor()
feat_1.name = "CUSTOMER_Age"
output = feat_1
output.save(_id=ObjectId("68ef110cd0ec86dc5121f298"))
In [13]:
customer_age_band.definition
Out[13]:
# Generated by SDK version: 3.2.0.dev66
from bson import ObjectId
from featurebyte import SCDTable
from featurebyte.api.request_column import RequestColumn
request_col = RequestColumn.point_in_time()
# scd_table name: "GROCERYCUSTOMER"
scd_table = SCDTable.get_by_id(ObjectId("68ef0d6293cce39d676ca317"))
scd_view = scd_table.get_view(
    view_mode="manual",
    drop_column_names=["record_available_at", "CurrentRecord"],
    column_cleaning_operations=[],
)
grouped = scd_view.as_features(
    column_names=["DateOfBirth"],
    feature_names=["CUSTOMER_DateOfBirth"],
    offset=None,
)
feat = grouped["CUSTOMER_DateOfBirth"]
feat_1 = ((request_col - feat).dt.day / 365.25).floor()
feat_1.name = "CUSTOMER_Age"
feat_2 = ((((feat_1 + 1) / 5).ceil() - 1) * 5).astype(str) + "-"
feat_3 = ((((feat_1 + 1) / 5).ceil() - 1) * 5) + 4
feat_4 = feat_2 + feat_3.astype(str)
feat_4.name = "CUSTOMER_Age_band"
output = feat_4
output.save(_id=ObjectId("68ef110cd0ec86dc5121f2a2"))