7. Create Observation Tables
Create Observation Table¶
FeatureByte has two important concepts related to feature materialization:
- Observation Set - a collection of historical points in time (timestamps) paired with entity key values. Feature values are computed for each pair as of its timestamp.
- Observation Table - the representation of an observation set in the feature store.
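To make the first concept concrete, the sketch below builds a tiny observation set in plain Python, where each row pairs a point in time with an entity key. The column names `POINT_IN_TIME` and `GROCERYCUSTOMERGUID` match the ones used later in this tutorial. The helper `make_observation_set` and the sample keys are illustrative assumptions and are not part of the FeatureByte SDK.

```python
from datetime import datetime


def make_observation_set(rows):
    """Build an observation set (hypothetical helper, not a FeatureByte API).

    `rows` is an iterable of (timestamp, customer key) pairs.
    """
    observation_set = []
    for point_in_time, customer_key in rows:
        if not isinstance(point_in_time, datetime):
            raise TypeError("POINT_IN_TIME must be a datetime")
        observation_set.append(
            {"POINT_IN_TIME": point_in_time, "GROCERYCUSTOMERGUID": customer_key}
        )
    return observation_set


# Made-up keys; the same customer may appear at several points in time.
observation_set = make_observation_set(
    [
        (datetime(2022, 9, 24, 18, 3, 4), "customer-a"),
        (datetime(2022, 10, 5, 7, 15, 20), "customer-a"),
        (datetime(2022, 11, 15, 17, 57, 45), "customer-b"),
    ]
)
for row in observation_set:
    print(row["POINT_IN_TIME"].isoformat(), row["GROCERYCUSTOMERGUID"])
```

Note that an observation set is not keyed by entity alone: the same customer can be observed at several different moments, and each row is evaluated independently.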
In this tutorial we will:
- create two observation tables for a use case that requires a prediction about a grocery customer whenever an invoice event occurs: one to preview features for the customer entity, and one to get historical features for training.
- compute values for the target we previously created.
- create one observation table at the item level that can be used to preview features for the item entity and its parent entities.
In [1]:
import featurebyte as fb
import pandas as pd
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
groceryinvoice_view = catalog.get_view("GROCERYINVOICE")
16:41:03 | INFO | Using configuration file at: /Users/viktor/.featurebyte/config.yaml
16:41:03 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:41:03 | INFO | SDK version: 0.6.0.dev121
16:41:03 | INFO | No catalog activated.
16:41:03 | INFO | 10 feature lists, 59 features deployed
16:41:03 | INFO | Using profile: tutorial
16:41:04 | INFO | Using configuration file at: /Users/viktor/.featurebyte/config.yaml
16:41:04 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:41:04 | INFO | SDK version: 0.6.0.dev121
16:41:04 | INFO | No catalog activated.
16:41:04 | INFO | 10 feature lists, 59 features deployed
16:41:05 | INFO | Catalog activated: Grocery Dataset Tutorial
Let's create 2 observation tables:
- one for preview (10 rows)
- one for training (1,000 rows)
In [2]:
# keep invoices from 01-Jul-2022 (inclusive) to 01-Jul-2023 (exclusive)
cond = (groceryinvoice_view["Timestamp"] >= pd.to_datetime("2022-07-01")) & (
    groceryinvoice_view["Timestamp"] < pd.to_datetime("2023-07-01")
)
groceryinvoice_1y_view = groceryinvoice_view[cond].copy()
for size in [10, 1000]:
    if size == 10:
        table_name = "Preview Table with 10 Customers"
    else:
        table_name = "1K Customers at time of purchase 22S2-23S1"
    # sample `size` rows and rename the columns for the observation table
    observation_table = groceryinvoice_1y_view.create_observation_table(
        name=table_name,
        sample_rows=size,
        columns=["Timestamp", "GroceryCustomerGuid"],
        columns_rename_mapping={
            "Timestamp": "POINT_IN_TIME",
            "GroceryCustomerGuid": "GROCERYCUSTOMERGUID",
        },
    )
    observation_table.update_description(
        f"{size} customers at time of purchase between 01-Jul-2022 and 30-Jun-2023"
    )
Done! |████████████████████████████████████████| 100% in 9.7s (0.10%/s)
Done! |████████████████████████████████████████| 100% in 6.5s (0.16%/s)
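The cell above does three things: it filters invoices to a half-open date window, samples a fixed number of rows, and renames the columns. The plain-Python sketch below mirrors those three steps on made-up rows so the logic can be checked without a feature store. It is only an illustration: the function `sketch_observation_table`, the sample data, and the use of seeded uniform random sampling are assumptions, and nothing here describes how FeatureByte performs sampling internally.

```python
import random
from datetime import datetime

# Hypothetical invoice rows standing in for the GROCERYINVOICE view.
invoices = [
    {"Timestamp": datetime(2022, 6, 30, 23, 59), "GroceryCustomerGuid": "c1"},
    {"Timestamp": datetime(2022, 7, 1, 0, 0), "GroceryCustomerGuid": "c2"},
    {"Timestamp": datetime(2023, 1, 15, 12, 0), "GroceryCustomerGuid": "c1"},
    {"Timestamp": datetime(2023, 6, 30, 23, 59), "GroceryCustomerGuid": "c3"},
    {"Timestamp": datetime(2023, 7, 1, 0, 0), "GroceryCustomerGuid": "c2"},
]


def sketch_observation_table(rows, start, end, sample_rows, rename, seed=0):
    """Filter to [start, end), sample up to `sample_rows`, rename columns."""
    # Half-open window: start is included, end is excluded.
    in_window = [r for r in rows if start <= r["Timestamp"] < end]
    # Seeded sampling (an assumption) so the sketch is reproducible.
    rng = random.Random(seed)
    sampled = rng.sample(in_window, min(sample_rows, len(in_window)))
    # Keep only the mapped columns, under their new names.
    return [{rename[k]: v for k, v in r.items() if k in rename} for r in sampled]


table = sketch_observation_table(
    invoices,
    start=datetime(2022, 7, 1),
    end=datetime(2023, 7, 1),
    sample_rows=2,
    rename={
        "Timestamp": "POINT_IN_TIME",
        "GroceryCustomerGuid": "GROCERYCUSTOMERGUID",
    },
)
for row in table:
    print(row)
```

The half-open window is why the description in the tutorial cell says "between 01-Jul-2022 and 30-Jun-2023": an invoice stamped exactly at midnight on 01-Jul-2023 is excluded.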
Add a target to observation tables¶
In [3]:
target = catalog.get_target("CUSTOMER_Sum_of_invoice_Amount_next_14d")
In [4]:
# get new observation table with the target populated
observation_table = catalog.get_observation_table("1K Customers at time of purchase 22S2-23S1")
target.compute_target_table(observation_table, "1K Customers Spending next 2 weeks at time of purchase 22S2-23S1")
Done! |████████████████████████████████████████| 100% in 9.7s (0.10%/s)
Out[4]:
Observation Table
name | 1K Customers Spending next 2 weeks at time of purchase 22S2-23S1 |
created_at | 2023-11-27 15:41:29 |
updated_at | None |
description | None |
type | observation_table |
feature_store_name | playground |
table_details | |
In [5]:
training_table = catalog.get_observation_table(
    "1K Customers Spending next 2 weeks at time of purchase 22S2-23S1"
)
training_table.to_pandas().head()
Downloading table |████████████████████████████████████████| 1000/1000 [100%] in
Out[5]:
| POINT_IN_TIME | GROCERYCUSTOMERGUID | CUSTOMER_Sum_of_invoice_Amount_next_14d |
---|---|---|---|
0 | 2022-09-24 18:03:04 | 5c96089d-95f7-4a12-ab13-e082836253f1 | 53.60 |
1 | 2022-10-05 07:15:20 | 5c96089d-95f7-4a12-ab13-e082836253f1 | 41.37 |
2 | 2022-11-15 17:57:45 | 6b8a9be8-25b0-42e3-8374-6abcac14afac | 14.74 |
3 | 2022-08-25 09:28:58 | ef39897d-3562-4b1d-aa0e-3398d4e62084 | 179.68 |
4 | 2022-11-10 17:02:24 | ef39897d-3562-4b1d-aa0e-3398d4e62084 | 132.98 |
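As its name suggests, the target `CUSTOMER_Sum_of_invoice_Amount_next_14d` sums a customer's invoice amounts over the 14 days that follow each point in time. The sketch below reproduces that idea in plain Python on made-up invoices. The function name `sum_amount_next_14d`, the sample data, and the exact window boundaries (strictly after the point in time, up to and including 14 days later) are assumptions made for illustration and may differ from how FeatureByte defines the window.

```python
from datetime import datetime, timedelta

# Hypothetical invoices: (customer key, invoice timestamp, amount).
invoices = [
    ("c1", datetime(2022, 9, 1, 10, 0), 20.0),
    ("c1", datetime(2022, 9, 5, 9, 30), 12.5),
    ("c1", datetime(2022, 9, 20, 8, 0), 40.0),
    ("c2", datetime(2022, 9, 3, 14, 0), 7.0),
]


def sum_amount_next_14d(customer_key, point_in_time, rows):
    """Sum amounts for one customer in (point_in_time, point_in_time + 14 days].

    The open/closed window boundaries are an assumption made for this sketch.
    """
    window_end = point_in_time + timedelta(days=14)
    return sum(
        amount
        for key, timestamp, amount in rows
        if key == customer_key and point_in_time < timestamp <= window_end
    )


# Observed at the moment of the first invoice: only the 05-Sep invoice counts.
target_value = sum_amount_next_14d("c1", datetime(2022, 9, 1, 10, 0), invoices)
print(target_value)  # 12.5
```

Because the window looks forward from the point in time, the invoice that triggers the observation is not part of the sum in this sketch, and invoices beyond the 14-day horizon are ignored.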
Create Preview observation table at the item level¶
This observation table can be used to materialize targets, features, and feature lists whose primary entity is the item entity or one of its parents. This includes:
- item
- product
- productgroup
- invoice
- customer
- frenchstate
- customer, product
- ...
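One way to picture why an item-level table can serve parent entities is that each item key determines its parent keys. The sketch below walks a hypothetical item -> invoice -> customer chain to turn item-level observations into customer-level ones. The lookup dictionaries, the keys, and the function `resolve_customer` are all assumptions for illustration; FeatureByte resolves entity relationships itself, and this is not its implementation.

```python
# Hypothetical parent lookups: item -> invoice -> customer.
ITEM_TO_INVOICE = {
    "item-1": "invoice-9",
    "item-2": "invoice-9",
    "item-3": "invoice-4",
}
INVOICE_TO_CUSTOMER = {"invoice-9": "customer-a", "invoice-4": "customer-b"}


def resolve_customer(item_key):
    """Walk the item -> invoice -> customer chain for one item key."""
    invoice_key = ITEM_TO_INVOICE[item_key]
    return INVOICE_TO_CUSTOMER[invoice_key]


# Item-level observations with made-up keys.
item_observations = [
    {"POINT_IN_TIME": "2022-07-12 05:41:08", "GROCERYINVOICEITEMGUID": "item-1"},
    {"POINT_IN_TIME": "2022-08-08 16:13:32", "GROCERYINVOICEITEMGUID": "item-3"},
]

# Same points in time, expressed at the customer level.
customer_observations = [
    {
        "POINT_IN_TIME": row["POINT_IN_TIME"],
        "GROCERYCUSTOMERGUID": resolve_customer(row["GROCERYINVOICEITEMGUID"]),
    }
    for row in item_observations
]
for row in customer_observations:
    print(row)
```

The reverse direction does not work in general: a customer key does not identify a single item, which is why the most granular entity makes the most reusable preview table.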
In [6]:
invoiceitems_view = catalog.get_view("INVOICEITEMS")
In [7]:
# filter the view to extract items during a specific period of time
cond = (invoiceitems_view["Timestamp"] >= pd.to_datetime("2022-07-01")) & (
    invoiceitems_view["Timestamp"] < pd.to_datetime("2023-07-01")
)
invoiceitems_1y_view = invoiceitems_view[cond].copy()
observation_table = invoiceitems_1y_view.create_observation_table(
    name="Preview Table with 10 items",
    sample_rows=10,
    columns=["Timestamp", "GroceryInvoiceItemGuid"],
    columns_rename_mapping={
        "Timestamp": "POINT_IN_TIME",
        "GroceryInvoiceItemGuid": "GROCERYINVOICEITEMGUID",
    },
)
observation_table.update_description(
    "Preview Table with 10 items between 01-Jul-2022 and 30-Jun-2023"
)
Done! |████████████████████████████████████████| 100% in 6.5s (0.16%/s)
In [8]:
# preview target with the preview table we just created
target.preview(observation_table.to_pandas())
Downloading table |████████████████████████████████████████| 10/10 [100%] in 0.1
Out[8]:
| POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Sum_of_invoice_Amount_next_14d |
---|---|---|---|
0 | 2022-07-12 05:41:08 | f5c7565b-093b-4827-9737-adec41de8f71 | 115.42 |
1 | 2022-07-29 19:43:37 | c360c283-7d76-4cab-bce6-a84bbcae358a | 100.97 |
2 | 2022-08-08 16:13:32 | aa0c70cb-065a-48e5-8408-85d51ffa54f9 | 21.70 |
3 | 2022-08-23 22:03:08 | 2154665c-9084-44a1-9586-8e4e8db0719a | 22.80 |
4 | 2022-08-31 17:39:10 | 17a7bc72-7c3a-4abc-b4be-c62509a208bc | 28.35 |
5 | 2022-09-09 08:58:33 | 52d80b44-40e0-45e9-9738-7c19eeb4714b | 77.61 |
6 | 2022-11-06 11:48:02 | 4bca160d-754d-4598-bf79-1349baccba04 | 35.05 |
7 | 2023-01-31 15:16:15 | 5abdf2a6-415d-4f9c-ae78-8b929cab6567 | 48.50 |
8 | 2023-03-03 13:37:43 | 31b77be6-dd7c-4475-a158-c86c22939da2 | 61.13 |
9 | 2023-03-05 15:38:41 | e0fe6acb-de52-49f2-b27b-f01a0e287892 | 51.86 |
List observation tables in catalog¶
In [9]:
catalog.list_observation_tables()
Out[9]:
| id | name | type | shape | feature_store_name | created_at |
---|---|---|---|---|---|---|
0 | 6564b8b466074d16d2f61078 | Preview Table with 10 items | view | [10, 2] | playground | 2023-11-27T15:41:42.922000 |
1 | 6564b8a56f39a417e1370dbd | 1K Customers Spending next 2 weeks at time of ... | observation_table | [1000, 3] | playground | 2023-11-27T15:41:29.401000 |
2 | 6564b89d66074d16d2f61075 | 1K Customers at time of purchase 22S2-23S1 | view | [1000, 2] | playground | 2023-11-27T15:41:20.022000 |
3 | 6564b89266074d16d2f61073 | Preview Table with 10 Customers | view | [10, 2] | playground | 2023-11-27T15:41:10.029000 |