13. Create Feature List
Create a feature list
A feature list is a named collection of features that are used together to train and serve a machine learning model. In this step, we compile a feature list from some of the features created so far.
To find more features:
- Visit the 'Learn by example' section for a variety of features tailored to different entities and signals.
- Check out the 'Bring Your Own Transformer' tutorials to learn how to integrate Large Language Models (LLMs) into the FeatureByte ecosystem.
Enterprise users can also explore 'Ideate Features with FeatureByte Copilot', which takes an agentic approach to ideating features tailored to your use case.
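Conceptually, a feature can be thought of as a function of an entity key and a point in time, and a feature list as an ordered collection of named features whose output is one row per observation and one column per feature (the same shape as a feature list preview). The sketch below models that idea in plain Python. `Feature`, `FeatureListSketch`, `materialize`, the feature `OBS_Is_Weekend`, and the toy ages are all hypothetical and are not part of the FeatureByte SDK; the unique-name check is a choice made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a feature: a name plus a function of (entity key, point in time)."""
    name: str
    compute: Callable[[str, datetime], Any]


@dataclass
class FeatureListSketch:
    """Hypothetical model of a feature list: an ordered collection of uniquely named features."""
    name: str
    features: list

    def __post_init__(self) -> None:
        # Each feature becomes one output column, so this sketch requires unique names.
        names = [f.name for f in self.features]
        if len(names) != len(set(names)):
            raise ValueError("feature names in a list must be unique")

    def materialize(self, observations):
        """Return one row per (entity key, point in time) with one column per feature."""
        rows = []
        for key, point_in_time in observations:
            row = {"POINT_IN_TIME": point_in_time, "key": key}
            for feature in self.features:
                row[feature.name] = feature.compute(key, point_in_time)
            rows.append(row)
        return rows


# Toy usage with made-up data.
ages = {"c1": 37, "c2": 52}
age = Feature("CUSTOMER_Age", lambda key, t: ages.get(key))
is_weekend = Feature("OBS_Is_Weekend", lambda key, t: t.weekday() >= 5)  # hypothetical feature

toy_list = FeatureListSketch("toy list", [age, is_weekend])
table = toy_list.materialize([("c1", datetime(2022, 7, 23)), ("c2", datetime(2023, 1, 6))])
```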
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
10:56:48 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/
10:56:48 | INFO | Using profile: tutorial
10:56:48 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
10:56:48 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
10:56:48 | INFO | SDK version: 2.0.1.dev67
10:56:48 | INFO | No catalog activated.
10:56:48 | INFO | Catalog activated: Grocery Dataset Tutorial
16:11:44 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:11:44 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:11:44 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:11:44 | INFO | No catalog activated.
16:11:44 | INFO | Catalog activated: Grocery Dataset Tutorial
List all features we created so far
In [2]:
catalog.list_features()
Out[2]:
| | id | name | dtype | readiness | online_enabled | tables | primary_tables | entities | primary_entities | created_at |
---|---|---|---|---|---|---|---|---|---|---|
0 | 6762395b79a148e88cc4a645 | CUSTOMER_Mean_vector_of_item_product_ProductGr... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [GROCERYINVOICE, INVOICEITEMS] | [customer] | [customer] | 2024-12-18T02:55:46.148000 |
1 | 676238eeb0d4bd308cd86255 | CUSTOMER_vs_OVERALL_item_TotalCost_across_prod... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer] | [customer] | 2024-12-18T02:52:47.291000 |
2 | 676238a93fe253046e0e68bb | CUSTOMER_Latest_invoice_Amount_Z_Score_to_invo... | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:51:29.984000 |
3 | 676237a960842f89380978dd | CUSTOMER_Std_of_invoice_Amount_28d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:22.367000 |
4 | 676237a960842f89380978dc | CUSTOMER_Std_of_invoice_Amount_14d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:21.903000 |
5 | 676237a960842f89380978db | CUSTOMER_Avg_of_invoice_Amount_28d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:21.471000 |
6 | 676237a960842f89380978da | CUSTOMER_Avg_of_invoice_Amount_14d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:21.022000 |
7 | 676237a960842f89380978d9 | CUSTOMER_Count_of_invoice_28d | INT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:20.613000 |
8 | 676237a960842f89380978d8 | CUSTOMER_Count_of_invoice_14d | INT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:20.247000 |
9 | 676237a960842f89380978d7 | CUSTOMER_Latest_invoice_Amount | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-12-18T02:47:19.766000 |
10 | 676237a960842f89380978d3 | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_28d | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-12-18T02:47:19.349000 |
11 | 676237a960842f89380978d2 | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-12-18T02:47:18.850000 |
12 | 676237a960842f89380978d6 | CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Time... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-12-18T02:47:18.197000 |
13 | 67623766043662b29958c254 | CUSTOMER_Age_band | VARCHAR | DRAFT | False | [GROCERYCUSTOMER] | [GROCERYCUSTOMER] | [customer] | [customer] | 2024-12-18T02:46:11.067000 |
14 | 67623766043662b29958c24a | CUSTOMER_Age | INT | DRAFT | False | [GROCERYCUSTOMER] | [GROCERYCUSTOMER] | [customer] | [customer] | 2024-12-18T02:46:04.282000 |
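The feature names in this catalog appear to follow a convention in which a trailing suffix such as `_14d`, `_28d` or `_26w` encodes the aggregation window. Assuming that convention, the sketch below groups some of the names listed above by window using only the standard library. `window_of` and the regular expression are hypothetical helpers for illustration; the same logic could be applied to the `name` column of the `catalog.list_features()` output.

```python
import re
from collections import defaultdict

# Untruncated feature names copied from the catalog listing above.
feature_names = [
    "CUSTOMER_Std_of_invoice_Amount_28d",
    "CUSTOMER_Std_of_invoice_Amount_14d",
    "CUSTOMER_Avg_of_invoice_Amount_28d",
    "CUSTOMER_Avg_of_invoice_Amount_14d",
    "CUSTOMER_Count_of_invoice_28d",
    "CUSTOMER_Count_of_invoice_14d",
    "CUSTOMER_Latest_invoice_Amount",
    "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_28d",
    "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d",
    "CUSTOMER_Age_band",
    "CUSTOMER_Age",
]

# Assumed convention: a trailing "_<number><unit>" suffix, where unit is d (days) or w (weeks).
WINDOW_SUFFIX = re.compile(r"_(\d+)([dw])$")


def window_of(name: str):
    """Return the window suffix (e.g. '14d'), or None if the name carries no window."""
    match = WINDOW_SUFFIX.search(name)
    return f"{match.group(1)}{match.group(2)}" if match else None


# Group names by window; features without a window end up under the key None.
by_window = defaultdict(list)
for name in feature_names:
    by_window[window_of(name)].append(name)
```

For example, `by_window["14d"]` then holds the four 14-day features that are candidates for a short-window feature list.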
Get features from catalog
In [3]:
# Retrieve each saved feature from the catalog by its name
customer_age_band = catalog.get_feature("CUSTOMER_Age_band")
customer_latest_invoice_amount = catalog.get_feature("CUSTOMER_Latest_invoice_Amount")
customer_count_of_invoice_14d = catalog.get_feature("CUSTOMER_Count_of_invoice_14d")
customer_avg_of_invoice_amount_14d = catalog.get_feature("CUSTOMER_Avg_of_invoice_Amount_14d")
customer_std_of_invoice_amount_14d = catalog.get_feature("CUSTOMER_Std_of_invoice_Amount_14d")
customer_latest_invoice_amount_Z_score_to_invoice_amount_28d = catalog.get_feature(
    "CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d"
)
customer_vs_overall_item_totalcost_across_product_productgroups_26w = catalog.get_feature(
    "CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w"
)
customer_x_productgroup_sum_of_item_totalcost_14d = catalog.get_feature(
    "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d"
)
customer_x_productgroup_time_since_latest_timestamp = catalog.get_feature(
    "CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Timestamp"
)
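Fetching each feature into its own variable works, but as a list grows it can be easier to keep the names in one place and fetch them in a loop. The sketch below shows that pattern. `StubCatalog` and `fetch_features` are hypothetical: the stub only stands in for the activated `catalog` object so the snippet runs offline, and with a real catalog you would call `catalog.get_feature(name)` as above.

```python
# Names of the features to include, in the order they should appear in the list.
FEATURE_NAMES = [
    "CUSTOMER_Age_band",
    "CUSTOMER_Latest_invoice_Amount",
    "CUSTOMER_Count_of_invoice_14d",
    "CUSTOMER_Avg_of_invoice_Amount_14d",
    "CUSTOMER_Std_of_invoice_Amount_14d",
    "CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d",
    "CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w",
    "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d",
    "CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Timestamp",
]


class StubCatalog:
    """Hypothetical offline stand-in for a FeatureByte catalog (illustration only)."""

    def __init__(self, available):
        self._available = set(available)

    def get_feature(self, name):
        if name not in self._available:
            raise KeyError(f"feature not found: {name}")
        return f"<Feature {name}>"  # placeholder for a real Feature object


def fetch_features(catalog, names):
    """Fetch features by name, preserving the order of `names`."""
    return [catalog.get_feature(name) for name in names]


features = fetch_features(StubCatalog(FEATURE_NAMES), FEATURE_NAMES)
```

With the real catalog, the result could be passed straight to the constructor used in the next cell, for example `fb.FeatureList(fetch_features(catalog, FEATURE_NAMES), name="Customer x ProductGroup Simple FeatureList")`.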
Create feature list
In [4]:
simple_feature_list = fb.FeatureList(
    [
        customer_age_band,
        customer_latest_invoice_amount,
        customer_count_of_invoice_14d,
        customer_avg_of_invoice_amount_14d,
        customer_std_of_invoice_amount_14d,
        customer_latest_invoice_amount_Z_score_to_invoice_amount_28d,
        customer_vs_overall_item_totalcost_across_product_productgroups_26w,
        customer_x_productgroup_sum_of_item_totalcost_14d,
        customer_x_productgroup_time_since_latest_timestamp,
    ],
    name="Customer x ProductGroup Simple FeatureList",
)
Preview feature list
In [5]:
# Check the primary entity of the feature list
simple_feature_list.primary_entity
Out[5]:
[<featurebyte.api.entity.Entity at 0x316c6dc60>
{
  'name': 'customer',
  'created_at': '2024-12-18T02:40:54.536000',
  'updated_at': '2024-12-18T02:40:58.932000',
  'description': None,
  'serving_names': [
    'GROCERYCUSTOMERGUID'
  ],
  'catalog_name': 'Grocery Dataset Tutorial'
}, <featurebyte.api.entity.Entity at 0x1687c8d60>
{
  'name': 'productgroup',
  'created_at': '2024-12-18T02:40:55.766000',
  'updated_at': '2024-12-18T02:41:00.663000',
  'description': None,
  'serving_names': [
    'PRODUCTGROUP'
  ],
  'catalog_name': 'Grocery Dataset Tutorial'
}]
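The list mixes features whose primary entity is `customer` with features whose primary entity is the pair `customer` and `productgroup`, and the reported primary entity of the list contains both. A simplified way to reproduce this result is to take the union of the features' primary entities, as given in the `primary_entities` column of the catalog listing. This is an illustration under that simplifying assumption, not FeatureByte's documented rule, which may also take relationships between entities into account; `union_of_primary_entities` is a hypothetical helper.

```python
# Primary entities of each feature in the list, copied from the `primary_entities`
# column of the catalog listing.
feature_primary_entities = {
    "CUSTOMER_Age_band": ["customer"],
    "CUSTOMER_Latest_invoice_Amount": ["customer"],
    "CUSTOMER_Count_of_invoice_14d": ["customer"],
    "CUSTOMER_Avg_of_invoice_Amount_14d": ["customer"],
    "CUSTOMER_Std_of_invoice_Amount_14d": ["customer"],
    "CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d": ["customer"],
    "CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w": ["customer"],
    "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d": ["customer", "productgroup"],
    "CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Timestamp": ["customer", "productgroup"],
}


def union_of_primary_entities(mapping):
    """Simplified rule (an assumption): the sorted union of all features' primary entities."""
    entities = set()
    for primary in mapping.values():
        entities.update(primary)
    return sorted(entities)


print(union_of_primary_entities(feature_primary_entities))  # ['customer', 'productgroup']
```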
In [6]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
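Each row of an observation table pairs a `POINT_IN_TIME` with an entity key (here `GROCERYINVOICEITEMGUID`), and the preview computes every feature as of that moment. The sketch below illustrates the idea behind a windowed feature such as `CUSTOMER_Count_of_invoice_14d`: count only the invoices that fall in the 14 days before the point in time. The invoices are made up, `count_in_window` is a hypothetical helper, and the boundary handling (inclusive start, exclusive end) is an assumption of this sketch, not documented FeatureByte behavior. In the preview, some customers show `NaN` rather than 0 for the 14-day count (rows 4 and 6), so the sketch returns `None` for an empty window.

```python
from datetime import datetime, timedelta

# Made-up invoices: (customer id, invoice timestamp, amount).
invoices = [
    ("c1", datetime(2022, 7, 1, 10, 0), 12.50),
    ("c1", datetime(2022, 7, 15, 9, 30), 4.64),
    ("c1", datetime(2022, 7, 22, 18, 0), 20.00),
    ("c1", datetime(2022, 7, 24, 8, 0), 7.10),  # after the point in time: must be ignored
    ("c2", datetime(2022, 7, 20, 12, 0), 33.00),
]


def count_in_window(rows, customer, point_in_time, window):
    """Count a customer's invoices with point_in_time - window <= timestamp < point_in_time.

    Only data from before the point in time is used, so later invoices cannot leak in.
    Returns None (a missing value) when the window contains no invoices.
    """
    start = point_in_time - window
    count = sum(
        1
        for cust, ts, _amount in rows
        if cust == customer and start <= ts < point_in_time
    )
    return count if count > 0 else None


point_in_time = datetime(2022, 7, 23, 17, 33, 25)
print(count_in_window(invoices, "c1", point_in_time, timedelta(days=14)))
```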
In [7]:
# Preview simple_feature_list
simple_feature_list.preview(preview_table)
Out[7]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Age_band | CUSTOMER_Latest_invoice_Amount | CUSTOMER_Count_of_invoice_14d | CUSTOMER_Avg_of_invoice_Amount_14d | CUSTOMER_Std_of_invoice_Amount_14d | CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d | CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d | CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Timestamp |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-07-23 17:33:25 | 849454c5-6640-419d-871d-1f0895a1c3b4 | 35-39 | 4.64 | 1.0 | 4.640000 | 0.000000 | NaN | 0.668405 | NaN | 813.648056 |
1 | 2023-01-06 08:49:29 | dad86e21-3af4-4b5b-8058-60f946b6dac5 | 50-54 | 16.51 | 7.0 | 10.618571 | 3.572868 | 0.016777 | 0.640181 | NaN | 596.949444 |
2 | 2022-09-07 09:16:15 | 53eac49e-7ecd-4cb6-a1c0-38e3900efd7a | 80-84 | 48.67 | 5.0 | 41.372000 | 7.211342 | 0.901901 | 0.747673 | 18.05 | 69.813889 |
3 | 2023-05-21 13:59:44 | 989643fe-1377-4f5f-8f38-f349a611da0c | 65-69 | 4.79 | 4.0 | 22.697500 | 15.060992 | -1.404144 | 0.722324 | 4.27 | 171.889444 |
4 | 2023-06-06 16:13:44 | bdc6b6bb-a23a-48c8-bb3a-622d9161c0e8 | 45-49 | 21.27 | NaN | NaN | NaN | 1.008156 | 0.803588 | NaN | 1007.838611 |
5 | 2022-08-22 06:10:42 | 9267d0dd-9685-4667-8f06-1761abe73c4d | 25-29 | 7.77 | 2.0 | 34.075000 | 26.305000 | -0.820547 | 0.809058 | 5.99 | 181.691667 |
6 | 2023-03-07 17:53:55 | cb478a4e-9266-4523-8ee0-e205881cc5f5 | 20-24 | 24.21 | NaN | NaN | NaN | NaN | 0.677664 | NaN | 2761.916944 |
7 | 2022-12-30 16:42:05 | 4dd1487a-0379-4eef-b200-97ac1bb1164f | 65-69 | 27.73 | 8.0 | 16.381250 | 9.945394 | 1.340493 | 0.800607 | 0.89 | 292.786944 |
8 | 2022-09-23 18:09:33 | 459a1b6e-1239-46d1-9e40-539c7e895483 | 45-49 | 41.98 | 2.0 | 65.835000 | 23.855000 | -0.181665 | 0.781515 | 2.69 | 217.736944 |
9 | 2023-03-21 13:49:48 | 84225bbb-adb8-451e-98c0-897c83c2fad9 | 65-69 | 21.90 | 2.0 | 19.690000 | 2.210000 | -0.063562 | 0.803208 | 7.67 | 259.306111 |
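The preview values can be sanity-checked. For a customer with exactly two invoices in the 14-day window, the two amounts are the mean plus and minus half their gap; that half gap equals the standard deviation if it is the population form (dividing by n) and the standard deviation divided by the square root of 2 if it is the sample form (dividing by n - 1). The latest invoice is always one of those two amounts. The sketch below applies both hypotheses to rows 5, 8 and 9 of the preview, which are the rows with a 14-day count of 2. Only the population form reproduces `CUSTOMER_Latest_invoice_Amount`, which is consistent with a population standard deviation; this is an inference from the displayed values, not documented behavior, and `pair_from_stats` and `latest_in_pair` are hypothetical helpers.

```python
import math

# (row, CUSTOMER_Avg_of_invoice_Amount_14d, CUSTOMER_Std_of_invoice_Amount_14d,
#  CUSTOMER_Latest_invoice_Amount) for the preview rows whose 14-day count is 2.
rows = [
    (5, 34.075000, 26.305000, 7.77),
    (8, 65.835000, 23.855000, 41.98),
    (9, 19.690000, 2.210000, 21.90),
]


def pair_from_stats(avg, std, ddof):
    """Reconstruct two values a, b from their mean and standard deviation.

    For n = 2, the population std (ddof=0) is |a - b| / 2 and the sample std (ddof=1)
    is |a - b| / sqrt(2), so half the gap is std or std / sqrt(2) respectively.
    """
    half_gap = std if ddof == 0 else std / math.sqrt(2)
    return (avg + half_gap, avg - half_gap)


def latest_in_pair(avg, std, latest, ddof):
    """True if the latest invoice amount is one of the two reconstructed values."""
    return any(math.isclose(v, latest, abs_tol=1e-6) for v in pair_from_stats(avg, std, ddof))


for row, avg, std, latest in rows:
    print(row, latest_in_pair(avg, std, latest, ddof=0), latest_in_pair(avg, std, latest, ddof=1))
```

The same reasoning hints at why row 0 shows `NaN` for the 28-day Z-score: with a single invoice the standard deviation is 0, so the score is undefined, although the 28-day statistics themselves are not part of this preview.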
Save feature list
In [8]:
# Save feature list
simple_feature_list.save()
# Add description
simple_feature_list.update_description("Simple feature list for the customer x productgroup engagement")
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Loading Feature(s) |████████████████████████████████████████| 9/9 [100%] in 0.5s
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Loading Feature(s) |████████████████████████████████████████| 9/9 [100%] in 0.4s