7. Create Observation Tables
An Observation Set combines specific moments in history (timestamps) with entity key values. It tells the feature store for which entities, and at which points in time, feature values should be computed. Think of it as the backbone of a training dataset.
An Observation Table is its representation in the feature store.
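To make the idea concrete, here is a minimal plain-Python sketch of how an observation set drives point-in-time feature computation. It does not use the FeatureByte SDK. The column names (`POINT_IN_TIME`, `ENTITY_ID`, `EVENT_TIME`, `AMOUNT`) and the feature "total amount before the point in time" are hypothetical. The sketch counts only events strictly before each timestamp, so no information from the future leaks into a row.

```python
from datetime import datetime

# Hypothetical observation set: each row pairs a point in time with an entity key.
observations = [
    {"POINT_IN_TIME": datetime(2023, 3, 1), "ENTITY_ID": "A1"},
    {"POINT_IN_TIME": datetime(2023, 9, 1), "ENTITY_ID": "A1"},
    {"POINT_IN_TIME": datetime(2023, 9, 1), "ENTITY_ID": "A2"},
]

# Hypothetical event history from which a feature is derived.
events = [
    {"ENTITY_ID": "A1", "EVENT_TIME": datetime(2023, 1, 15), "AMOUNT": 100.0},
    {"ENTITY_ID": "A1", "EVENT_TIME": datetime(2023, 6, 10), "AMOUNT": 250.0},
    {"ENTITY_ID": "A2", "EVENT_TIME": datetime(2023, 10, 1), "AMOUNT": 80.0},
]


def total_amount_before(obs, history):
    """Sum AMOUNT over the same entity's events that happened strictly
    before the observation's point in time."""
    return sum(
        (
            e["AMOUNT"]
            for e in history
            if e["ENTITY_ID"] == obs["ENTITY_ID"]
            and e["EVENT_TIME"] < obs["POINT_IN_TIME"]
        ),
        0.0,
    )


# One feature value per observation row.
feature_values = [total_amount_before(o, events) for o in observations]
print(feature_values)  # [100.0, 350.0, 0.0]
```

The same entity (`A1`) gets different values at different points in time, which is why the timestamp is part of each observation.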
You can create an Observation Table in three ways:
- upload it from a Parquet or CSV file,
- create it from a View, or
- create it from a Source Table.
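If you take the upload route, you first need a file with one row per observation. Below is a minimal sketch of preparing such a CSV with the standard library. It assumes the file needs a point-in-time column, an entity key column, and optionally a target column. The names `POINT_IN_TIME`, `ENTITY_ID`, and `TARGET` are placeholders, so check the FeatureByte documentation for the exact names it expects. The upload call itself is not shown here.

```python
import csv
import os
import tempfile

# Placeholder rows: a point in time, an entity key, and a target value.
rows = [
    {"POINT_IN_TIME": "2023-03-01 00:00:00", "ENTITY_ID": "A1", "TARGET": "0"},
    {"POINT_IN_TIME": "2023-09-01 00:00:00", "ENTITY_ID": "A2", "TARGET": "1"},
]
fieldnames = ["POINT_IN_TIME", "ENTITY_ID", "TARGET"]

# Write the observations to a temporary CSV file.
path = os.path.join(tempfile.mkdtemp(), "observations.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header and row count before uploading it.
with open(path, newline="") as f:
    loaded = list(csv.DictReader(f))
print(len(loaded), list(loaded[0].keys()))
```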
This guide explains how to create Observation Tables from Source Tables and link them to our Credit Default context and use case.
We will create four Observation Tables:
- CREDIT_DEFAULT_TRAIN_2019_2023: Credit Default observations for training, covering 2019 through 2023.
- CREDIT_DEFAULT_HOLDOUT_2024_1H: Credit Default observations for holdout testing, covering the first half of 2024 (2024-01-01 to 2024-07-01).
- CREDIT_DEFAULT_EDA_2019_2023: a sample of Credit Default observations for exploratory data analysis (EDA), covering 2019 through 2023.
- PREVIEW_TABLE: 50 Credit Default observations for feature preview.
For an example of how to upload an Observation Table or create one from a View, see the Grocery SDK Tutorial. That tutorial also covers how to add Target values when your target has been registered with a logical approach.
import featurebyte as fb
import pandas as pd
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Credit Default Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:39:41 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/
16:39:41 | INFO | Using profile: tutorial
16:39:41 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:39:41 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:39:41 | INFO | SDK version: 2.1.0.dev113
16:39:41 | INFO | No catalog activated.
16:39:41 | INFO | Catalog activated: Credit Default Dataset SDK Tutorial
Locate Source Tables
ds = catalog.get_data_source()
DATABASE_NAME = "DEMO_DATASETS"
SCHEMA_NAME = "CREDIT_DEFAULT_TUTORIAL"
credit_default_full_observations = ds.get_source_table(
database_name=DATABASE_NAME,
schema_name=SCHEMA_NAME,
table_name="CREDIT_DEFAULT_FULL_OBSERVATIONS",
)
credit_default_sample_observations = ds.get_source_table(
database_name=DATABASE_NAME,
schema_name=SCHEMA_NAME,
table_name="CREDIT_DEFAULT_SAMPLE_OBSERVATIONS",
)
Locate the Target, Context, and Use Case to link the Observation Tables to
context_name = "New Loan Application (Early Stage)"
target_name = "Loan_Default"
use_case_name = "Loan Default by client"
usecase = catalog.get_use_case(use_case_name)
Create the CREDIT_DEFAULT_TRAIN_2019_2023 table
observation_train_table_name = "CREDIT_DEFAULT_TRAIN_2019_2023"
observation_train_table = credit_default_full_observations.create_observation_table(
name=observation_train_table_name,
sample_rows=None,
sample_from_timestamp="2019-01-01",
sample_to_timestamp="2024-01-01",
context_name=context_name,
primary_entities=["New Application"],
target_column=target_name,
)
observation_train_table.update_purpose(fb.Purpose.TRAINING)
# link it to the use case
usecase.add_observation_table(observation_train_table_name)
Done! |████████████████████████████████████████| 100% in 18.3s (0.06%/s) Done! |████████████████████████████████████████| 100% in 15.2s (0.07%/s)
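The call above combines a time window (`sample_from_timestamp`, `sample_to_timestamp`) with `sample_rows`. The toy function below is a sketch of how these arguments appear to interact, judging from their names and the table shapes listed at the end of this guide. It is not FeatureByte's implementation. It assumes a half-open window [from, to), that `sample_rows=None` keeps every row in the window, and that an integer draws a random subset of that size (as for PREVIEW_TABLE later). `select_observations` is a hypothetical helper.

```python
import random
from datetime import datetime


def select_observations(rows, sample_from, sample_to, sample_rows=None, seed=0):
    """Toy model: keep rows whose POINT_IN_TIME falls in [sample_from, sample_to),
    then optionally draw sample_rows of them at random."""
    in_window = [r for r in rows if sample_from <= r["POINT_IN_TIME"] < sample_to]
    if sample_rows is None or sample_rows >= len(in_window):
        return in_window  # no sampling: every row in the window is kept
    return random.Random(seed).sample(in_window, sample_rows)


# One hypothetical observation per month from 2018-01 to 2024-12 (84 rows).
rows = [{"POINT_IN_TIME": datetime(2018 + i // 12, i % 12 + 1, 1)} for i in range(84)]

full = select_observations(rows, datetime(2019, 1, 1), datetime(2024, 1, 1))
preview = select_observations(
    rows, datetime(2019, 1, 1), datetime(2024, 1, 1), sample_rows=50
)
print(len(full), len(preview))  # 60 50
```

Fixing the random seed makes the sampled subset reproducible across runs, which is convenient when a preview table should not change between sessions.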
Create the CREDIT_DEFAULT_HOLDOUT_2024_1H table
observation_holdout_table_name = "CREDIT_DEFAULT_HOLDOUT_2024_1H"
observation_holdout_table = credit_default_full_observations.create_observation_table(
name=observation_holdout_table_name,
sample_rows=None,
sample_from_timestamp="2024-01-01",
sample_to_timestamp="2024-07-01",
context_name=context_name,
primary_entities=["New Application"],
target_column=target_name,
)
observation_holdout_table.update_purpose(fb.Purpose.VALIDATION_TEST)
# link it to the use case
usecase.add_observation_table(observation_holdout_table_name)
Done! |████████████████████████████████████████| 100% in 12.2s (0.08%/s)
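The training window ends at 2024-01-01 and the holdout window starts at the same date. This sketch checks that the two windows do not overlap. It assumes each window is a half-open interval [start, end), which is our assumption rather than documented SDK behavior. If the end timestamp were inclusive, observations stamped exactly 2024-01-01 could fall into both tables.

```python
from datetime import date

# Windows passed to create_observation_table in this tutorial, treated here
# as half-open intervals [start, end): an assumption, not confirmed API behavior.
windows = {
    "CREDIT_DEFAULT_TRAIN_2019_2023": (date(2019, 1, 1), date(2024, 1, 1)),
    "CREDIT_DEFAULT_HOLDOUT_2024_1H": (date(2024, 1, 1), date(2024, 7, 1)),
}


def overlaps(a, b):
    """Half-open intervals [a0, a1) and [b0, b1) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


train = windows["CREDIT_DEFAULT_TRAIN_2019_2023"]
holdout = windows["CREDIT_DEFAULT_HOLDOUT_2024_1H"]
print(overlaps(train, holdout))  # False
print(train[1] <= holdout[0])  # True: the holdout period starts where training ends
```

Keeping the holdout strictly after the training period means the model is evaluated only on observations from a period it was not trained on.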
Create the CREDIT_DEFAULT_EDA_2019_2023 table
observation_eda_table_name = "CREDIT_DEFAULT_EDA_2019_2023"
observation_eda_table = credit_default_sample_observations.create_observation_table(
name=observation_eda_table_name,
sample_rows=None,
sample_from_timestamp="2019-01-01",
sample_to_timestamp="2024-01-01",
context_name=context_name,
primary_entities=["New Application"],
target_column=target_name,
)
observation_eda_table.update_purpose(fb.Purpose.EDA)
# link it to the use case
usecase.add_observation_table(observation_eda_table_name)
# make it the default eda table
usecase.update_default_eda_table(observation_eda_table_name)
Done! |████████████████████████████████████████| 100% in 15.2s (0.07%/s)
Create the PREVIEW_TABLE table
observation_preview_table_name = "PREVIEW_TABLE"
observation_preview_table = credit_default_sample_observations.create_observation_table(
name=observation_preview_table_name,
sample_rows=50,
sample_from_timestamp="2019-01-01",
sample_to_timestamp="2024-01-01",
context_name=context_name,
primary_entities=["New Application"],
target_column=target_name,
)
observation_preview_table.update_purpose(fb.Purpose.PREVIEW)
# link it to the use case
usecase.add_observation_table(observation_preview_table_name)
# make it the default preview table
usecase.update_default_preview_table(observation_preview_table_name)
Done! |████████████████████████████████████████| 100% in 15.2s (0.07%/s)
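Each table is now tagged with a purpose and linked to the use case, and two of them are set as the use case's defaults for EDA and preview. The hypothetical `ToyUseCase` class below is a plain-Python model of that bookkeeping, intended only to show the relationships. It is not FeatureByte's data model, and its method names deliberately differ from the SDK's. In this toy, a table must be linked before it can become a default, which mirrors the order used in the cells above (link first, then set as default).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyUseCase:
    """Hypothetical stand-in for a use case: linked tables and their purposes,
    plus the default tables for EDA and preview."""

    name: str
    tables: Dict[str, str] = field(default_factory=dict)  # table name -> purpose
    default_eda_table: Optional[str] = None
    default_preview_table: Optional[str] = None

    def link_table(self, table_name: str, purpose: str) -> None:
        self.tables[table_name] = purpose

    def _require_linked(self, table_name: str) -> None:
        # In this toy, only tables already linked to the use case can be defaults.
        if table_name not in self.tables:
            raise ValueError(f"{table_name} is not linked to {self.name}")

    def set_default_eda(self, table_name: str) -> None:
        self._require_linked(table_name)
        self.default_eda_table = table_name

    def set_default_preview(self, table_name: str) -> None:
        self._require_linked(table_name)
        self.default_preview_table = table_name


uc = ToyUseCase("Loan Default by client")
uc.link_table("CREDIT_DEFAULT_TRAIN_2019_2023", "training")
uc.link_table("CREDIT_DEFAULT_HOLDOUT_2024_1H", "validation_test")
uc.link_table("CREDIT_DEFAULT_EDA_2019_2023", "eda")
uc.link_table("PREVIEW_TABLE", "preview")
uc.set_default_eda("CREDIT_DEFAULT_EDA_2019_2023")
uc.set_default_preview("PREVIEW_TABLE")
print(uc.default_eda_table, uc.default_preview_table)
```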
List Observation Tables in the catalog
catalog.list_observation_tables()
 | id | name | type | shape | feature_store_name | created_at |
---|---|---|---|---|---|---|
0 | 67c2c801fbb34f46dc0daa56 | PREVIEW_TABLE | source_table | [50, 3] | playground | 2025-03-01T08:40:43.630000 |
1 | 67c2c7f0fbb34f46dc0daa55 | CREDIT_DEFAULT_EDA_2019_2023 | source_table | [49963, 3] | playground | 2025-03-01T08:40:27.363000 |
2 | 67c2c7e3fbb34f46dc0daa54 | CREDIT_DEFAULT_HOLDOUT_2024_1H | source_table | [25666, 3] | playground | 2025-03-01T08:40:12.763000 |
3 | 67c2c7cffbb34f46dc0daa53 | CREDIT_DEFAULT_TRAIN_2019_2023 | source_table | [256273, 3] | playground | 2025-03-01T08:39:58.304000 |