13. Compute Historical Feature Values
Historical feature values are needed to train and test machine learning models.
In this step, we take the feature list we just created and compute its feature values for the training and holdout observation tables.
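Historical feature values are computed as of each observation's point in time. For every row of an observation table, a feature is evaluated using only data available before that row's timestamp, so the same entity can receive different values at different dates. The stdlib sketch below illustrates the idea with a hypothetical event log and a simple sum aggregate. The names `events`, `observations` and `sum_before` are illustrative only, and this is not FeatureByte's implementation. In particular, the "strictly before" cut-off is an assumption of the sketch.

```python
from datetime import datetime

# Hypothetical event log: (entity_id, event_time, amount). Not tutorial data.
events = [
    ("client_1", datetime(2022, 1, 10), 100.0),
    ("client_1", datetime(2022, 3, 5), 250.0),
    ("client_1", datetime(2022, 6, 20), 75.0),
    ("client_2", datetime(2022, 2, 14), 40.0),
]

# Hypothetical observation rows: (entity_id, point_in_time).
observations = [
    ("client_1", datetime(2022, 4, 1)),
    ("client_1", datetime(2022, 7, 1)),
    ("client_2", datetime(2022, 1, 1)),
]


def sum_before(entity_id, point_in_time, log):
    """Sum one entity's amounts using only events strictly before point_in_time."""
    return sum(
        amount
        for eid, event_time, amount in log
        if eid == entity_id and event_time < point_in_time
    )


# One feature value per observation row, evaluated as of that row's point in time.
historical_values = [
    (entity_id, pit, sum_before(entity_id, pit, events))
    for entity_id, pit in observations
]
```

Here `client_1` gets 350.0 as of 1 April and 425.0 as of 1 July. `client_2` gets 0 because none of its events precede its point in time.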
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Credit Default Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:44:44 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/
16:44:44 | INFO | Using profile: tutorial
16:44:44 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:44:44 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:44:44 | INFO | SDK version: 2.1.0.dev113
16:44:44 | INFO | No catalog activated.
16:44:44 | INFO | Catalog activated: Credit Default Dataset SDK Tutorial
List feature lists in the catalog
In [2]:
catalog.list_feature_lists()
Out[2]:
|   | id | name | num_feature | status | deployed | readiness_frac | online_frac | tables | entities | primary_entity | created_at |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67c2c8de53e241b5a68dd616 | 51 features for Credit Default | 51 | DRAFT | False | 0.0 | 0.0 | [NEW_APPLICATION, PRIOR_APPLICATIONS, CONSUMER... | [New Application, Client] | [New Application] | 2025-03-01T08:44:22.006000 |
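The `readiness_frac` and `online_frac` columns read as fractions over the features in the list. Assume `readiness_frac` is the share of features marked `PRODUCTION_READY` and `online_frac` is the share that are online-enabled. Under that assumption, a freshly created list of 51 draft features gives 0.0 for both, as shown above. The sketch uses hypothetical per-feature states to show the arithmetic. The `fraction` helper is illustrative and not part of the SDK.

```python
# Hypothetical per-feature states for a new 51-feature list (not read from the catalog).
feature_readiness = ["DRAFT"] * 51
feature_online_enabled = [False] * 51


def fraction(values, target):
    """Return the share of entries equal to target (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for value in values if value == target) / len(values)


readiness_frac = fraction(feature_readiness, "PRODUCTION_READY")
online_frac = fraction(feature_online_enabled, True)

# If 17 of the 51 features were promoted, the fraction would become 17/51 = 1/3.
promoted = ["PRODUCTION_READY"] * 17 + ["DRAFT"] * 34
promoted_frac = fraction(promoted, "PRODUCTION_READY")
```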
Get the feature list from the catalog
In [3]:
feature_list_name = "51 features for Credit Default"
simple_feature_list = catalog.get_feature_list(feature_list_name)
Loading Feature(s) |████████████████████████████████████████| 51/51 [100%] in 0.
Get the observation tables
In [4]:
# List observation tables
catalog.list_observation_tables()
Out[4]:
|   | id | name | type | shape | feature_store_name | created_at |
|---|---|---|---|---|---|---|
| 0 | 67c2c801fbb34f46dc0daa56 | PREVIEW_TABLE | source_table | [50, 3] | playground | 2025-03-01T08:40:43.630000 |
| 1 | 67c2c7f0fbb34f46dc0daa55 | CREDIT_DEFAULT_EDA_2019_2023 | source_table | [49963, 3] | playground | 2025-03-01T08:40:27.363000 |
| 2 | 67c2c7e3fbb34f46dc0daa54 | CREDIT_DEFAULT_HOLDOUT_2024_1H | source_table | [25666, 3] | playground | 2025-03-01T08:40:12.763000 |
| 3 | 67c2c7cffbb34f46dc0daa53 | CREDIT_DEFAULT_TRAIN_2019_2023 | source_table | [256273, 3] | playground | 2025-03-01T08:39:58.304000 |
In [5]:
# Get observation table: 'CREDIT_DEFAULT_TRAIN_2019_2023'
training_observations = catalog.get_observation_table("CREDIT_DEFAULT_TRAIN_2019_2023")
In [6]:
# Get observation table: 'CREDIT_DEFAULT_HOLDOUT_2024_1H'
holdout_observations = catalog.get_observation_table("CREDIT_DEFAULT_HOLDOUT_2024_1H")
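The observation table names suggest a time-based split. Training observations come from 2019 to 2023, and holdout observations come from the first half of 2024. Splitting by time rather than at random keeps the holdout evaluation closer to scoring future applications. The sketch below is a minimal sanity check that assumes those periods. It uses a few hypothetical `POINT_IN_TIME` values instead of the real tables, and the helper names are illustrative.

```python
from datetime import datetime

# Hypothetical POINT_IN_TIME samples; the real tables hold 256,273 and 25,666 rows.
train_points = [datetime(2019, 1, 15), datetime(2021, 6, 30), datetime(2023, 12, 31)]
holdout_points = [datetime(2024, 1, 2), datetime(2024, 3, 18), datetime(2024, 6, 29)]


def is_time_ordered_split(train, holdout):
    """True when every training point in time precedes every holdout point in time."""
    return max(train) < min(holdout)


def in_period(points, start, end):
    """True when all points fall in the half-open interval [start, end)."""
    return all(start <= p < end for p in points)


# Periods inferred from the table names TRAIN_2019_2023 and HOLDOUT_2024_1H.
train_ok = in_period(train_points, datetime(2019, 1, 1), datetime(2024, 1, 1))
holdout_ok = in_period(holdout_points, datetime(2024, 1, 1), datetime(2024, 7, 1))
ordered_ok = is_time_ordered_split(train_points, holdout_points)
```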
Compute historical features
In [7]:
# Create training data
training_data_table = simple_feature_list.compute_historical_feature_table(
training_observations,
historical_feature_table_name=f"{feature_list_name} - TRAIN_2019_2023",
)
Done! |████████████████████████████████████████| 100% in 2:44.2 (0.01%/s)
Done! |████████████████████████████████████████| 100% in 36.4s (0.03%/s)
In [8]:
# Create holdout data
holdout_data_table = simple_feature_list.compute_historical_feature_table(
holdout_observations,
historical_feature_table_name=f"{feature_list_name} - HOLDOUT_2024_1H",
)
Done! |████████████████████████████████████████| 100% in 1:55.4 (0.01%/s)
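The two cells above differ only in the observation table and the name suffix. If more splits are needed, the same call can be wrapped in a small loop. The helper `compute_historical_tables` below is hypothetical. It only calls `compute_historical_feature_table` with the arguments used in this tutorial and returns the resulting tables keyed by suffix.

```python
def compute_historical_tables(feature_list, feature_list_name, observation_tables):
    """Compute one historical feature table per observation table.

    observation_tables maps a name suffix (e.g. "TRAIN_2019_2023") to an
    observation table object. Each output table is named
    "<feature_list_name> - <suffix>", matching the cells above.
    """
    tables = {}
    for suffix, observations in observation_tables.items():
        tables[suffix] = feature_list.compute_historical_feature_table(
            observations,
            historical_feature_table_name=f"{feature_list_name} - {suffix}",
        )
    return tables
```

In this notebook it could be called as `compute_historical_tables(simple_feature_list, feature_list_name, {"TRAIN_2019_2023": training_observations, "HOLDOUT_2024_1H": holdout_observations})`. Each call still runs a full computation, which took roughly two to three minutes per table in the run above.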
In [9]:
# List historical feature tables in the catalog
catalog.list_historical_feature_tables()
Out[9]:
|   | id | name | feature_store_name | observation_table_name | shape | created_at |
|---|---|---|---|---|---|---|
| 0 | 67c2c9a34e08d83e21381725 | 51 features for Credit Default - HOLDOUT_2024_1H | playground | CREDIT_DEFAULT_HOLDOUT_2024_1H | [25666, 54] | 2025-03-01T08:49:21.154000 |
| 1 | 67c2c8fe3df413286793fb5a | 51 features for Credit Default - TRAIN_2019_2023 | playground | CREDIT_DEFAULT_TRAIN_2019_2023 | [256273, 54] | 2025-03-01T08:47:25.134000 |
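The listing shows how each historical feature table relates to its observation table. The row count is unchanged (256,273 and 25,666), and the 54 columns match the 3 observation table columns plus the 51 features. The check below reproduces that arithmetic, assuming the observation columns are carried through unchanged. The helper `expected_historical_shape` is illustrative.

```python
def expected_historical_shape(observation_shape, num_features):
    """Rows are preserved; one column is added per feature."""
    rows, observation_columns = observation_shape
    return [rows, observation_columns + num_features]


# Shapes copied from the catalog listings in this tutorial.
train_shape = expected_historical_shape([256273, 3], 51)
holdout_shape = expected_historical_shape([25666, 3], 51)
```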
Concepts in this tutorial
SDK reference for:
- Historical feature table
- FeatureList.compute_historical_feature_table()
- FeatureList.compute_historical_features() to compute a DataFrame directly