13. Create Feature List
Create a feature list
A feature list is a collection of features used to train a machine learning model. Let's build one from some of the features we've created so far.
For additional features:
- Visit the 'Learn by example' section for a variety of features tailored to different entities and signals.
- Check out the 'Bring Your Own Transformer' tutorials to learn about integrating Large Language Models (LLMs) within the FeatureByte ecosystem.
For those in an enterprise setting, explore 'FeatureByte Copilot'. This tool automatically suggests and evaluates features based on the semantics of your data and the specifics of your use case.
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:11:44 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088
16:11:44 | INFO | Using profile: tutorial
16:11:44 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:11:44 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:11:44 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:11:44 | INFO | No catalog activated.
16:11:44 | INFO | Catalog activated: Grocery Dataset Tutorial
List all features we created so far
In [2]:
catalog.list_features()
Out[2]:
| id | name | dtype | readiness | online_enabled | tables | primary_tables | entities | primary_entities | created_at |
---|---|---|---|---|---|---|---|---|---|---|
0 | 666957cd3fab5208644858b2 | CUSTOMER_Mean_vector_of_item_product_ProductGr... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [GROCERYINVOICE, INVOICEITEMS] | [customer] | [customer] | 2024-06-12T08:11:35.077000 |
1 | 6669578ded0c9d417ba58fff | CUSTOMER_vs_OVERALL_item_TotalCost_across_prod... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer] | [customer] | 2024-06-12T08:08:56.428000 |
2 | 6669577ca1b61f71af4710cd | CUSTOMER_Latest_invoice_Amount_Z_Score_to_invo... | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:34.434000 |
3 | 6669575033eb5cd5aebc1ff0 | CUSTOMER_Std_of_invoice_Amount_28d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:09.891000 |
4 | 6669575033eb5cd5aebc1fef | CUSTOMER_Std_of_invoice_Amount_14d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:09.341000 |
5 | 6669575033eb5cd5aebc1fee | CUSTOMER_Avg_of_invoice_Amount_28d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:08.818000 |
6 | 6669575033eb5cd5aebc1fed | CUSTOMER_Avg_of_invoice_Amount_14d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:08.270000 |
7 | 6669575033eb5cd5aebc1fec | CUSTOMER_Count_of_invoice_28d | INT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:07.750000 |
8 | 6669575033eb5cd5aebc1feb | CUSTOMER_Count_of_invoice_14d | INT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:07.361000 |
9 | 6669575033eb5cd5aebc1fea | CUSTOMER_Latest_invoice_Amount | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:06.967000 |
10 | 6669575033eb5cd5aebc1fe6 | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_28d | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-06-12T08:08:06.426000 |
11 | 6669575033eb5cd5aebc1fe5 | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-06-12T08:08:05.758000 |
12 | 6669575033eb5cd5aebc1fe9 | CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Time... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-06-12T08:08:05.091000 |
13 | 666957381ecbdd152339ded8 | CUSTOMER_Age_band | VARCHAR | DRAFT | False | [GROCERYCUSTOMER] | [GROCERYCUSTOMER] | [customer] | [customer] | 2024-06-12T08:07:33.012000 |
14 | 666957381ecbdd152339dece | CUSTOMER_Age | INT | DRAFT | False | [GROCERYCUSTOMER] | [GROCERYCUSTOMER] | [customer] | [customer] | 2024-06-12T08:07:26.610000 |
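When a catalog holds many features, it can help to narrow the listing before picking names. The sketch below is a stand-alone illustration: it uses plain Python dictionaries that copy a few rows from the table above, and `select_feature_names` is a hypothetical helper, not part of the FeatureByte SDK. With the real SDK, the same filter would presumably be applied to whatever object `catalog.list_features()` returns.

```python
# A few rows copied from the listing above, reduced to the columns used here.
rows = [
    {"name": "CUSTOMER_Std_of_invoice_Amount_28d", "dtype": "FLOAT",
     "primary_entities": ["customer"]},
    {"name": "CUSTOMER_Count_of_invoice_14d", "dtype": "INT",
     "primary_entities": ["customer"]},
    {"name": "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d", "dtype": "FLOAT",
     "primary_entities": ["customer", "productgroup"]},
    {"name": "CUSTOMER_Age_band", "dtype": "VARCHAR",
     "primary_entities": ["customer"]},
]


def select_feature_names(rows, primary_entities, dtype=None):
    """Hypothetical helper: names of features with exactly the given primary
    entities and, optionally, the given dtype."""
    wanted = sorted(primary_entities)
    return [
        row["name"]
        for row in rows
        if sorted(row["primary_entities"]) == wanted
        and (dtype is None or row["dtype"] == dtype)
    ]


customer_only = select_feature_names(rows, ["customer"])
customer_x_group = select_feature_names(rows, ["customer", "productgroup"])
numeric_customer = select_feature_names(rows, ["customer"], dtype="FLOAT")
```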
Get features from catalog
In [3]:
customer_age_band = catalog.get_feature("CUSTOMER_Age_band")
customer_latest_invoice_amount = catalog.get_feature("CUSTOMER_Latest_invoice_Amount")
customer_count_of_invoice_14d = catalog.get_feature("CUSTOMER_Count_of_invoice_14d")
customer_avg_of_invoice_amount_14d = catalog.get_feature("CUSTOMER_Avg_of_invoice_Amount_14d")
customer_std_of_invoice_amount_14d = catalog.get_feature("CUSTOMER_Std_of_invoice_Amount_14d")
customer_latest_invoice_amount_Z_score_to_invoice_amount_28d = catalog.get_feature(
    "CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d"
)
customer_vs_overall_item_totalcost_across_product_productgroups_26w = catalog.get_feature(
    "CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w"
)
customer_x_productgroup_sum_of_item_totalcost_14d = \
    catalog.get_feature("CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d")
customer_x_productgroup_time_since_latest_timestamp = \
    catalog.get_feature("CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Timestamp")
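The repeated `catalog.get_feature(...)` calls can also be written as a loop over feature names. The following is a minimal sketch: `fetch_features` is a hypothetical helper, and `_FakeCatalog` is a stand-in that only exists so the example runs without a FeatureByte server. The only SDK call it relies on is `catalog.get_feature(name)`, as used in the cell above.

```python
def fetch_features(catalog, names):
    """Hypothetical helper: map each feature name to catalog.get_feature(name)."""
    return {name: catalog.get_feature(name) for name in names}


class _FakeCatalog:
    """Stand-in for an activated catalog (an assumption for this sketch only)."""

    def get_feature(self, name):
        # The real catalog returns a Feature object; a string is enough here.
        return f"<feature {name}>"


feature_names = [
    "CUSTOMER_Age_band",
    "CUSTOMER_Latest_invoice_Amount",
    "CUSTOMER_Count_of_invoice_14d",
]
features = fetch_features(_FakeCatalog(), feature_names)
```

With the real catalog in place of `_FakeCatalog()`, `list(features.values())` would presumably be usable wherever a list of features is expected. Keeping the names in one list also makes it easy to see which features are selected.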
Create feature list
In [4]:
simple_feature_list = fb.FeatureList(
    [
        customer_age_band,
        customer_latest_invoice_amount,
        customer_count_of_invoice_14d,
        customer_avg_of_invoice_amount_14d,
        customer_std_of_invoice_amount_14d,
        customer_latest_invoice_amount_Z_score_to_invoice_amount_28d,
        customer_vs_overall_item_totalcost_across_product_productgroups_26w,
        customer_x_productgroup_sum_of_item_totalcost_14d,
        customer_x_productgroup_time_since_latest_timestamp,
    ],
    name="Customer x ProductGroup Simple FeatureList",
)
Preview feature list
In [5]:
# Check the primary entity of the feature list
simple_feature_list.primary_entity
Out[5]:
[<featurebyte.api.entity.Entity at 0x176a74c00>
 {
   'name': 'customer',
   'created_at': '2024-06-12T08:05:47.417000',
   'updated_at': '2024-06-12T08:05:50.497000',
   'description': None,
   'serving_names': [
     'GROCERYCUSTOMERGUID'
   ],
   'catalog_name': 'Grocery Dataset Tutorial'
 },
 <featurebyte.api.entity.Entity at 0x1769d41c0>
 {
   'name': 'productgroup',
   'created_at': '2024-06-12T08:05:48.244000',
   'updated_at': '2024-06-12T08:05:51.678000',
   'description': None,
   'serving_names': [
     'PRODUCTGROUP'
   ],
   'catalog_name': 'Grocery Dataset Tutorial'
 }]
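The output lists two entities because the feature list mixes customer-level features with customer x productgroup features. As a rough mental model only, the combined primary entity can be pictured as the union of the features' primary entities. This is an assumption made for illustration, not FeatureByte's actual algorithm, which may also take relationships between entities into account; `combined_primary_entity` is a hypothetical function.

```python
def combined_primary_entity(feature_primary_entities):
    """Union of entity names across features, kept in first-seen order.

    Simplified illustration: relationships between entities are ignored.
    """
    seen = []
    for entities in feature_primary_entities:
        for entity in entities:
            if entity not in seen:
                seen.append(entity)
    return seen


# Primary entities as shown in the feature listing earlier in this tutorial.
feature_primary_entities = [
    ["customer"],                  # e.g. CUSTOMER_Age_band
    ["customer"],                  # e.g. CUSTOMER_Count_of_invoice_14d
    ["customer", "productgroup"],  # e.g. CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d
]
result = combined_primary_entity(feature_primary_entities)
```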
In [6]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
In [7]:
# Preview simple_feature_list
simple_feature_list.preview(preview_table)
Out[7]:
| POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Age_band | CUSTOMER_Latest_invoice_Amount | CUSTOMER_Count_of_invoice_14d | CUSTOMER_Avg_of_invoice_Amount_14d | CUSTOMER_Std_of_invoice_Amount_14d | CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d | CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d | CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Timestamp |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-05-28 19:27:14 | 15973b2f-2256-4caa-b65b-cbbfdff0905b | 80-84 | 4.17 | 1.0 | 4.170 | 0.000000 | NaN | 0.603565 | NaN | 1106.468611 |
1 | 2023-02-07 11:04:26 | fd1caae1-77e6-4667-8c83-df13f05bf2f5 | 20-24 | 36.35 | 1.0 | 36.350 | 0.000000 | -0.218570 | 0.736515 | NaN | 360.113056 |
2 | 2022-09-18 18:52:36 | ac7edfb5-63ed-49fb-9b89-76b0288ed2f8 | 35-39 | 95.05 | 6.0 | 34.825 | 27.195796 | 2.357059 | 0.705695 | 16.63 | 99.261667 |
3 | 2023-03-31 18:50:00 | 213ef7d3-c27b-43e0-bc0a-57d6c7c254b0 | 45-49 | 53.11 | 1.0 | 53.110 | 0.000000 | NaN | 0.840879 | 8.00 | 269.623333 |
4 | 2022-12-26 15:01:07 | 264f79fd-c24a-47cc-8a68-fe3753a4d74b | 40-44 | 39.25 | 1.0 | 39.250 | 0.000000 | -0.153150 | 0.836482 | 6.28 | 73.213056 |
5 | 2023-04-11 17:23:57 | 6084f39f-9d2c-4111-b1cc-502e1559c0c0 | 30-34 | 19.70 | 2.0 | 18.490 | 1.210000 | 1.000000 | 0.733969 | 6.00 | 197.481944 |
6 | 2022-12-10 21:08:26 | 77d02174-f1e1-41c1-9fb9-01c6246b0009 | 75-79 | 31.73 | 1.0 | 31.730 | 0.000000 | NaN | 0.794200 | NaN | NaN |
7 | 2022-08-17 19:13:52 | 40a07ca4-a991-4d21-b5cf-74ee61220f96 | 75-79 | 34.27 | NaN | NaN | NaN | 1.658670 | 0.576912 | NaN | 343.933611 |
8 | 2023-03-17 11:15:09 | 1b627a25-7eb4-4f61-b243-c93db487bff0 | 60-64 | 9.82 | 5.0 | 23.380 | 21.979151 | -0.564107 | 0.655646 | 1.67 | 140.312500 |
9 | 2023-05-05 08:00:42 | 57ca0770-eb8b-4769-8e67-eb1b7cc0a934 | 35-39 | 8.47 | 2.0 | 14.795 | 6.325000 | -0.508099 | 0.697578 | NaN | 1097.448611 |
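Several cells in the preview are NaN, presumably because no matching records exist for that row at its point in time. Before modelling, it is worth checking how often each feature is missing. The sketch below counts NaN values per column using values copied from three columns of the preview above; it uses plain Python lists rather than the object returned by `preview`, so it makes no assumption about that return type.

```python
import math

nan = float("nan")

# Values copied from the preview output above, rows 0 to 9.
preview_columns = {
    "CUSTOMER_Count_of_invoice_14d": [
        1.0, 1.0, 6.0, 1.0, 1.0, 2.0, 1.0, nan, 5.0, 2.0,
    ],
    "CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d": [
        nan, -0.218570, 2.357059, nan, -0.153150,
        1.000000, nan, 1.658670, -0.564107, -0.508099,
    ],
    "CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d": [
        nan, nan, 16.63, 8.00, 6.28, 6.00, nan, nan, 1.67, nan,
    ],
}

# Count missing values per feature column.
missing_counts = {
    name: sum(math.isnan(value) for value in values)
    for name, values in preview_columns.items()
}

for name, count in missing_counts.items():
    print(f"{name}: {count} of {len(preview_columns[name])} missing")
```

A feature that is missing for many rows may need imputation downstream or a longer aggregation window.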
Save feature list
In [8]:
# Save feature list
simple_feature_list.save()
# Add description
simple_feature_list.update_description("Simple feature list for the customer x productgroup engagement")
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Loading Feature(s) |████████████████████████████████████████| 9/9 [100%] in 0.4s