13. Compute historical feature values
Historical feature values, computed as of past points in time, are needed to train and test machine learning models.
Let's take the feature list we just created and compute its feature values for a given observation table.
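To make "as of past points in time" concrete, here is a minimal plain-Python sketch of a windowed feature evaluated at a point in time. It is not FeatureByte's implementation: the invoice records, the helper `invoice_stats_14d`, and the exact window boundaries are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical invoice events: (customer id, event timestamp, amount).
invoices = [
    ("c1", datetime(2023, 3, 1, 9, 0), 10.0),
    ("c1", datetime(2023, 3, 4, 18, 30), 25.5),
    ("c1", datetime(2023, 3, 20, 12, 0), 40.0),
    ("c2", datetime(2023, 3, 3, 8, 15), 7.25),
]


def invoice_stats_14d(customer, point_in_time, events, window=timedelta(days=14)):
    """Return (count, sum) of a customer's invoices in the 14 days before point_in_time.

    Only events strictly before point_in_time are used, so the value reflects
    what was known at that moment and nothing from the future.
    """
    amounts = [
        amount
        for cust, ts, amount in events
        if cust == customer and point_in_time - window <= ts < point_in_time
    ]
    return len(amounts), sum(amounts)


# Each observation pairs an entity key with the point in time to evaluate at.
observations = [
    ("c1", datetime(2023, 3, 5, 12, 0)),
    ("c2", datetime(2023, 3, 1, 0, 0)),
]
features = [invoice_stats_14d(cust, pit, invoices) for cust, pit in observations]
```

The invoice dated 2023-03-20 is ignored for customer `c1` because it happens after the observation's point in time, and `c2` gets an empty window because its only invoice is also in the future.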
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
22:02:48 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
22:02:48 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
22:02:49 | WARNING | Remote SDK version (0.5.0.dev6) is different from local (0.5.0.dev1). Update local SDK to avoid unexpected behavior.
22:02:49 | INFO | No catalog activated.
22:02:49 | INFO | 6 feature lists, 31 features deployed
22:02:49 | INFO | Using profile: tutorial
22:02:50 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
22:02:50 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
22:02:50 | WARNING | Remote SDK version (0.5.0.dev6) is different from local (0.5.0.dev1). Update local SDK to avoid unexpected behavior.
22:02:50 | INFO | No catalog activated.
22:02:50 | INFO | 6 feature lists, 31 features deployed
22:02:51 | INFO | Catalog activated: Grocery Dataset Tutorial
List feature lists in Catalog
In [2]:
catalog.list_feature_lists()
Out[2]:
| | id | name | num_feature | status | deployed | readiness_frac | online_frac | tables | entities | primary_entities | created_at |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 64ff1dec72f1e0466e55f6a3 | Customer Simple FeatureList | 7 | DRAFT | False | 0.0 | 0.0 | [GROCERYCUSTOMER, GROCERYINVOICE, INVOICEITEMS... | [customer] | [customer] | 2023-09-11T14:02:35.008000 |
Get Feature List from Catalog
In [3]:
simple_feature_list = catalog.get_feature_list("Customer Simple FeatureList")
Loading Feature(s) |████████████████████████████████████████| 7/7 [100%] in 1.0s
Get an observation table
In [4]:
# List observation tables
catalog.list_observation_tables()
Out[4]:
| | id | name | type | shape | feature_store_name | created_at |
---|---|---|---|---|---|---|
0 | 64ff1d2b66704a9790300a44 | Preview Table with 10 items | view | [10, 2] | playground | 2023-09-11T13:59:11.124000 |
1 | 64ff1d18c0038ba1e425262a | 1K Customers Spending next 2 weeks at time of ... | observation_table | [1000, 3] | playground | 2023-09-11T13:58:53.315000 |
2 | 64ff1d0f66704a9790300a42 | 1K Customers at time of purchase 22S2-23S1 | view | [1000, 2] | playground | 2023-09-11T13:58:43.075000 |
3 | 64ff1d0666704a9790300a41 | Preview Table with 10 Customers | view | [10, 2] | playground | 2023-09-11T13:58:33.781000 |
In [5]:
# Get observation table: '1K Customers Spending next 2 weeks at time of purchase 22S2-23S1'
training_observations = catalog.get_observation_table(
    "1K Customers Spending next 2 weeks at time of purchase 22S2-23S1"
)
In [6]:
training_observations.sample()
Out[6]:
| | POINT_IN_TIME | GROCERYCUSTOMERGUID | CUSTOMER_Sum_of_invoice_Amount_next_14d |
---|---|---|---|
0 | 2023-03-05 12:23:45 | 0401635c-e6ab-4525-bb5d-00aba7f6d0c4 | 0.00 |
1 | 2023-06-15 08:24:19 | cea213d4-36e4-48c3-ae8d-c7a25911e11c | 100.88 |
2 | 2022-12-11 13:10:30 | f027222e-1a98-45e0-9acd-7137e927ac28 | 9.02 |
3 | 2022-09-17 12:30:23 | 166ee5dc-fd4c-465c-ada6-dcc75ebe7a91 | 18.11 |
4 | 2023-03-26 13:31:05 | cea213d4-36e4-48c3-ae8d-c7a25911e11c | 168.95 |
5 | 2022-07-30 08:26:15 | b6cf0881-0e22-4c13-bbfc-c495de4e3a34 | 183.05 |
6 | 2022-12-30 12:00:29 | df0b0c04-f51b-48a5-b330-772cae5b9283 | 26.68 |
7 | 2023-06-04 13:40:14 | 11d0850b-f235-4ff6-bb56-c9f0acdf9fd3 | 187.55 |
8 | 2022-09-26 09:46:38 | df3dc0a5-5f13-4818-acdb-027083662eba | 0.00 |
9 | 2022-10-07 16:05:41 | 42c510f3-1e79-453c-9b81-61b00262d64b | 54.39 |
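Each observation row pairs a point in time and a customer key with a target column. Judging by its name, the target sums invoice amounts over the 14 days after the point in time. The sketch below illustrates that forward-looking window in plain Python; the invoice values, the helper `spend_next_14d`, and the boundary handling are assumptions for illustration and do not reproduce how the tutorial's target was actually built.

```python
from datetime import datetime, timedelta

# Hypothetical invoices for a single customer: (timestamp, amount).
customer_invoices = [
    (datetime(2023, 6, 14, 10, 0), 30.0),   # before the point in time
    (datetime(2023, 6, 16, 9, 0), 20.0),    # inside the next 14 days
    (datetime(2023, 6, 25, 17, 0), 15.5),   # inside the next 14 days
    (datetime(2023, 7, 2, 11, 0), 12.0),    # beyond the 14-day horizon
]


def spend_next_14d(point_in_time, events, horizon=timedelta(days=14)):
    """Sum amounts of events falling in (point_in_time, point_in_time + horizon]."""
    return sum(
        amount
        for ts, amount in events
        if point_in_time < ts <= point_in_time + horizon
    )


target = spend_next_14d(datetime(2023, 6, 15, 8, 24, 19), customer_invoices)
```

Features look backward from the point in time while the target looks forward, which is why both can share the same observation row without leaking future information into the features.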
Compute historical features
In [7]:
# Create 'Simple Training data 1K Spending n2w 22S2-23S1' historical feature table
training_data_table = simple_feature_list.compute_historical_feature_table(
    training_observations,
    historical_feature_table_name="Simple Training data 1K Spending n2w 22S2-23S1",
)
Done! |████████████████████████████████████████| 100% in 47.6s (0.02%/s)
In [8]:
display(training_data_table.to_pandas())
Downloading table |████████████████████████████████████████| 1000/1000 [100%] in
| | POINT_IN_TIME | GROCERYCUSTOMERGUID | CUSTOMER_Sum_of_invoice_Amount_next_14d | CUSTOMER_Age_band | CUSTOMER_Latest_invoice_Amount | CUSTOMER_Count_of_invoice_14d | CUSTOMER_Avg_of_invoice_Amount_14d | CUSTOMER_Std_of_invoice_Amount_14d | CUSTOMER_Latest_invoice_Amount_Z_Score_to_invoice_Amount_28d | CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-11-26 14:13:21 | 5f18f733-ef27-423b-8fb7-6172948c9255 | 71.31 | 75-79 | 4.13 | 10.0 | 4.789000 | 3.484381 | -0.230751 | 0.712665 |
1 | 2022-08-05 16:41:12 | cc4220ec-16ab-4bb9-991d-deef994bf27a | 151.48 | 35-39 | 87.69 | 7.0 | 33.887143 | 25.595703 | 2.355269 | 0.719676 |
2 | 2022-10-18 14:24:58 | c0ca0bda-e7f5-4748-9b14-0e7ba9a07a47 | 98.51 | 65-69 | 9.58 | 7.0 | 14.647143 | 9.676206 | -0.214018 | 0.779449 |
3 | 2022-10-19 12:35:16 | 53b76d93-0577-4dca-bc7b-dc493120c3be | 8.67 | 25-29 | 62.83 | 1.0 | 62.830000 | 0.000000 | 1.000000 | 0.349948 |
4 | 2022-09-01 10:17:56 | 85738676-0861-4922-81c0-6f8849ba419a | 117.47 | 50-54 | 47.12 | 1.0 | 47.120000 | 0.000000 | -1.000000 | 0.497410 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
995 | 2022-12-31 16:46:08 | 26417985-a02f-4db7-ae77-e7072d4a0fe7 | 163.96 | 35-39 | 93.23 | 0.0 | NaN | NaN | 0.847508 | 0.583593 |
996 | 2022-12-16 16:48:47 | 09f35825-38ef-4a01-8385-c41822f59de9 | 0.00 | 50-54 | 41.78 | 0.0 | NaN | NaN | NaN | 0.413224 |
997 | 2022-11-28 15:44:51 | 59f788b5-0731-4aa8-9cb4-cf76e803fae1 | 17.58 | 50-54 | 6.99 | 0.0 | NaN | NaN | -0.741130 | 0.599209 |
998 | 2023-06-18 17:24:28 | c1079585-2704-4031-9f76-1ceb760f5dc7 | 0.00 | 45-49 | 4.33 | 0.0 | NaN | NaN | NaN | 0.539391 |
999 | 2022-11-29 08:40:17 | 1c50ba28-9434-4934-ace0-83e7817eb60d | 0.00 | 80-84 | 102.63 | 0.0 | NaN | NaN | 1.000000 | 0.707877 |
1000 rows × 10 columns
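Because every row carries its own POINT_IN_TIME, one common way to hold out test data is a time-based split. The following is a minimal plain-Python sketch using three columns from the first five rows shown above; the cutoff date and the helper `split_by_time` are assumptions for illustration, not part of the tutorial or the SDK.

```python
from datetime import datetime

TARGET = "CUSTOMER_Sum_of_invoice_Amount_next_14d"

# First five rows of the output above, restricted to three columns.
rows = [
    {"POINT_IN_TIME": "2022-11-26 14:13:21", "CUSTOMER_Count_of_invoice_14d": 10.0, TARGET: 71.31},
    {"POINT_IN_TIME": "2022-08-05 16:41:12", "CUSTOMER_Count_of_invoice_14d": 7.0, TARGET: 151.48},
    {"POINT_IN_TIME": "2022-10-18 14:24:58", "CUSTOMER_Count_of_invoice_14d": 7.0, TARGET: 98.51},
    {"POINT_IN_TIME": "2022-10-19 12:35:16", "CUSTOMER_Count_of_invoice_14d": 1.0, TARGET: 8.67},
    {"POINT_IN_TIME": "2022-09-01 10:17:56", "CUSTOMER_Count_of_invoice_14d": 1.0, TARGET: 117.47},
]


def split_by_time(records, cutoff):
    """Put rows observed before the cutoff in train and the rest in test."""
    train, test = [], []
    for record in records:
        ts = datetime.strptime(record["POINT_IN_TIME"], "%Y-%m-%d %H:%M:%S")
        (train if ts < cutoff else test).append(record)
    return train, test


# Hypothetical cutoff chosen only for this example.
train_rows, test_rows = split_by_time(rows, datetime(2022, 10, 1))
y_train = [r[TARGET] for r in train_rows]
y_test = [r[TARGET] for r in test_rows]
```

Splitting on time rather than at random keeps the evaluation closer to how a model would be used: trained on the past and scored on later observations.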
In [9]:
# List historical feature tables from the catalog
catalog.list_historical_feature_tables()
Out[9]:
| | id | name | feature_store_name | observation_table_name | shape | created_at |
---|---|---|---|---|---|---|
0 | 64ff1e18c0038ba1e425268a | Simple Training data 1K Spending n2w 22S2-23S1 | playground | 1K Customers Spending next 2 weeks at time of ... | [1000, 10] | 2023-09-11T14:03:46.041000 |
Concepts in this tutorial
SDK reference
- Historical feature table
- FeatureList.compute_historical_feature_table()
- FeatureList.compute_historical_features(), which returns the feature values directly as a data frame
In [10]:
catalog.list_tables()
Out[10]:
| | id | name | type | status | entities | created_at |
---|---|---|---|---|---|---|
0 | 64ff1c940d5bfbfb21bce78c | GROCERYPRODUCT | dimension_table | PUBLIC_DRAFT | [product, productgroup] | 2023-09-11T13:56:37.144000 |
1 | 64ff1c920d5bfbfb21bce78b | INVOICEITEMS | item_table | PUBLIC_DRAFT | [item, invoice, product] | 2023-09-11T13:56:35.927000 |
2 | 64ff1c910d5bfbfb21bce78a | GROCERYINVOICE | event_table | PUBLIC_DRAFT | [invoice, customer] | 2023-09-11T13:56:34.182000 |
3 | 64ff1c8f0d5bfbfb21bce789 | GROCERYCUSTOMER | scd_table | PUBLIC_DRAFT | [customer, frenchstate] | 2023-09-11T13:56:32.993000 |