ObservationTable
An ObservationTable object is a representation of an observation set in the feature store. It combines historical points-in-time and entity values to make historical feature requests, usually for training and testing machine learning applications.
Creating ObservationTable Objects
To create an ObservationTable object, you have several options: upload a CSV or Parquet file directly, or create the table from either a SourceTable object or a View object.
Note
The column with entity values must use an accepted serving name.
The column containing points-in-time must be labelled "POINT_IN_TIME" and should contain UTC timestamps.
For forecast use cases, the observation table must also include a "FORECAST_POINT" column representing the future date/time being predicted for. The data type of this column must match the ForecastPointSchema defined in the associated Context. If the schema references a timezone column, the observation table must include a column with the same name as specified in the schema.
If the Context defines user-provided columns via UserProvidedColumn, the observation table must include columns matching those names and data types.
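Before creating the table, it can help to check your data against the requirements in this note. The following is a minimal sketch using only the Python standard library; it is not part of the FeatureByte SDK. The helper name `find_observation_problems` and the serving name `CUSTOMER_ID` are hypothetical, and the sketch assumes that timestamps without timezone information are already in UTC.

```python
from datetime import datetime, timedelta, timezone


def find_observation_problems(rows, serving_names):
    """Return a list of problems found; an empty list means the checks passed."""
    if not rows:
        return ["observation set is empty"]
    problems = []
    columns = set(rows[0])
    # The timestamp column and every entity serving name must be present.
    for required in ["POINT_IN_TIME", *serving_names]:
        if required not in columns:
            problems.append(f"missing column: {required}")
    for index, row in enumerate(rows):
        point_in_time = row.get("POINT_IN_TIME")
        # Assumption: naive datetimes are already UTC; aware ones need a zero offset.
        if isinstance(point_in_time, datetime):
            offset = point_in_time.utcoffset()
            if offset is not None and offset != timedelta(0):
                problems.append(f"row {index}: POINT_IN_TIME is not UTC")
    return problems


rows = [
    {"POINT_IN_TIME": datetime(2024, 1, 15, 1, 30, tzinfo=timezone.utc),
     "CUSTOMER_ID": "C001"},
    {"POINT_IN_TIME": datetime(2024, 2, 1, 10, 0, tzinfo=timezone(timedelta(hours=8))),
     "CUSTOMER_ID": "C002"},
]
print(find_observation_problems(rows, ["CUSTOMER_ID"]))
```

The second row carries a +08:00 offset, so the sketch reports it as not being in UTC.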
Uploading a file:
To upload a file:
- Use the upload() method with the file path, table name, purpose, and primary entities specified. The file can be either a CSV or a Parquet file.
- Ensure the file's column names include "POINT_IN_TIME" and the accepted serving names for primary entities.
observation_table = fb.ObservationTable.upload(
    file_path="path/to/csv/file.csv",
    name=<observation_table_name>,
    purpose=fb.Purpose.PREVIEW,
    primary_entities=[<primary_entity_name>],
)
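If you do not yet have a file in the expected shape, the sketch below shows one way to produce it with the Python standard library. The event data, the `CUSTOMER_ID` serving name, the UTC+08:00 source timezone, and the timestamp text format are all assumptions made for illustration; adapt them to your own entities and warehouse.

```python
import csv
import os
import tempfile
from datetime import datetime, timedelta, timezone

# Hypothetical events recorded in a UTC+08:00 local timezone.
local_tz = timezone(timedelta(hours=8))
events = [
    ("C001", datetime(2024, 1, 15, 9, 30, tzinfo=local_tz)),
    ("C002", datetime(2024, 2, 1, 18, 0, tzinfo=local_tz)),
]

csv_path = os.path.join(tempfile.mkdtemp(), "observations.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    # "CUSTOMER_ID" stands in for an accepted serving name.
    writer.writerow(["POINT_IN_TIME", "CUSTOMER_ID"])
    for customer_id, local_time in events:
        # Convert to UTC before writing, as the points-in-time should be UTC.
        utc_time = local_time.astimezone(timezone.utc)
        writer.writerow([utc_time.strftime("%Y-%m-%d %H:%M:%S"), customer_id])
```

The resulting `csv_path` could then be passed as `file_path` in the upload() call above.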
Creating from a SourceTable object:
To create an ObservationTable object from a SourceTable object:
- Select the source table from the feature store.
- Use the create_observation_table() method, specifying columns, renaming if necessary, sample size, and table name.
observation_table = source_table.create_observation_table(
    name=<observation_table_name>,
    sample_rows=<desired_sample_size>,
    columns=[<timestamp_column_name>, <entity_column_name>],
    columns_rename_mapping={
        <timestamp_column_name>: "POINT_IN_TIME",
        <entity_column_name>: <entity_serving_name>,
    },
    primary_entities=[<primary_entity_name>],
)
Creating from a View object:
To create an ObservationTable object from a View object:
- Use the create_observation_table() method from the View object with similar parameters as above.
observation_table = view.create_observation_table(
    name=<observation_table_name>,
    sample_rows=<desired_sample_size>,
    columns=[<timestamp_column_name>, <entity_column_name>],
    columns_rename_mapping={
        <timestamp_column_name>: "POINT_IN_TIME",
        <entity_column_name>: <entity_serving_name>,
    },
    primary_entities=[<primary_entity_name>],
)
Additional Operations:
- Download the table by using the download() method.
- Convert the table to a Pandas DataFrame by using the to_pandas() method.
- Delete the table, if it is no longer needed, with the delete() method.
Forecast Point in Observation Tables
When an observation table is created for a forecast Context (one with a ForecastPointSchema), FeatureByte automatically computes and stores additional metadata:
- Most/Least Recent Forecast Point: The range of forecast point values in the table
- Forecast Horizon: The maximum time span between POINT_IN_TIME and FORECAST_POINT, expressed in the schema's granularity (e.g., "7 DAY")
- Forecast Timezone Column: Whether a separate timezone column is present
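To make the metadata items above concrete, here is a rough sketch of how such values could be derived from a handful of rows. It is not FeatureByte's implementation: the function name `summarize_forecast_points` and the output keys are hypothetical, the granularity is assumed to be DAY, and partial days are assumed to round up.

```python
import math
from datetime import datetime, timedelta


def summarize_forecast_points(rows):
    """Derive forecast point range and horizon (in days) from observation rows."""
    forecast_points = [row["FORECAST_POINT"] for row in rows]
    # Longest gap between the point-in-time and the forecast point across all rows.
    longest = max(row["FORECAST_POINT"] - row["POINT_IN_TIME"] for row in rows)
    # Assumption: DAY granularity, rounding partial days up.
    horizon_days = math.ceil(longest / timedelta(days=1))
    return {
        "least_recent_forecast_point": min(forecast_points),
        "most_recent_forecast_point": max(forecast_points),
        "forecast_horizon": f"{horizon_days} DAY",
    }


rows = [
    {"POINT_IN_TIME": datetime(2024, 1, 1), "FORECAST_POINT": datetime(2024, 1, 8)},
    {"POINT_IN_TIME": datetime(2024, 1, 2), "FORECAST_POINT": datetime(2024, 1, 5)},
]
summary = summarize_forecast_points(rows)
print(summary["forecast_horizon"])
```

For these two rows the longest gap is seven days, which matches the "7 DAY" example format used above.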
FeatureByte also validates that:
- The FORECAST_POINT column data type matches the schema's expected dtype
- If the schema uses VARCHAR with a format_string, all values parse correctly
- If a timezone column is referenced, all timezone values are valid IANA timezone names or UTC offsets
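You can run similar checks locally before creating the table. The sketch below is only an approximation of the checks listed above and makes several assumptions: UTC offsets are written as "+HH:MM" or "-HH:MM", Python's `strptime` directives stand in for the schema's format_string (whose syntax may differ), and Python 3.9 or later is available for `zoneinfo`. The helper names are hypothetical.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo

_OFFSET_PATTERN = re.compile(r"^[+-](\d{2}):(\d{2})$")


def is_valid_timezone(value):
    """Accept a '+HH:MM' / '-HH:MM' offset or an IANA timezone name."""
    match = _OFFSET_PATTERN.match(value)
    if match:
        hours, minutes = int(match.group(1)), int(match.group(2))
        # Assumption: offsets beyond 14 hours are rejected.
        return hours <= 14 and minutes <= 59
    try:
        ZoneInfo(value)  # raises if the name is not in the timezone database
    except (KeyError, ValueError, OSError):
        return False
    return True


def all_values_parse(values, format_string):
    """Return True only if every string parses with the given strptime format."""
    for value in values:
        try:
            datetime.strptime(value, format_string)
        except ValueError:
            return False
    return True


print(is_valid_timezone("+08:00"), is_valid_timezone("Not/AZone"))
print(all_values_parse(["2024-01-31", "2024-02-30"], "%Y-%m-%d"))
```

Recognition of IANA names in this sketch depends on the timezone database installed on the machine running it.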
Linking an Observation Table to a Context
After you create an Observation Table, you can link it to a Context with the add_observation_table() method so that it can be reused.
You can also set an observation table as the default EDA table or the default preview table for the Context using the update_default_eda_table() and update_default_preview_table() methods.
Finally, you can list observation tables associated with the Context using the list_observation_tables() method.
Adding Target values to an Observation Table
Follow these steps to add target values to an observation table:
- First, get the relevant Target object.
- Then use its compute_target_table() method to return a new ObservationTable object that includes target values. This method also stores the new table.
Note
For forecast observation tables (those with a FORECAST_POINT column), the target is computed relative to the forecast point rather than the point-in-time.
Computing the target table automatically associates the new Observation Table with the Use Case linked to the source Observation Table's Context and the Target.
If needed, you can also link the table to a Use Case manually with the add_observation_table() method.
use_case = catalog.get_use_case("Credit Card Fraud Detection")
use_case.add_observation_table(<observation_table_name>)
Updating the Purpose of an Observation Table
To update the purpose of an ObservationTable object, use the update_purpose() method.
To get the purpose of an ObservationTable object, use the purpose property.
Splitting an Observation Table
Use the split() method to divide an observation table into non-overlapping subsets for training and evaluation. The splits are determined using a seeded random assignment, ensuring reproducibility.
Split into train (70%) and test (30%) sets:
train_table, test_table = observation_table.split(
    split_ratios=[0.7, 0.3],
    names=["train_data", "test_data"],
)
Split into train (60%), validation (20%), and test (20%) sets:
train, val, test = observation_table.split(
    split_ratios=[0.6, 0.2, 0.2],
    names=["train_data", "validation_data", "test_data"],
    seed=42,
)
The first split is automatically assigned the TRAINING purpose, while subsequent splits are assigned VALIDATION_TEST. Custom names are optional; if omitted, they are auto-generated from the source table name.
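To illustrate why a seeded random assignment is reproducible and yields non-overlapping subsets, here is a small standalone sketch. It is not how split() is implemented internally; the function `seeded_split`, its shuffle-then-slice approach, and its default seed are assumptions chosen for illustration.

```python
import random


def seeded_split(row_ids, split_ratios, seed=1234):
    """Shuffle row ids with a fixed seed, then slice them by cumulative ratio."""
    if abs(sum(split_ratios) - 1.0) > 1e-9:
        raise ValueError("split_ratios must sum to 1")
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    total = len(shuffled)
    splits, start, cumulative = [], 0, 0.0
    for position, ratio in enumerate(split_ratios):
        cumulative += ratio
        # The last split takes all remaining rows so that nothing is dropped.
        is_last = position == len(split_ratios) - 1
        end = total if is_last else round(cumulative * total)
        splits.append(shuffled[start:end])
        start = end
    return splits


train_ids, test_ids = seeded_split(range(100), [0.7, 0.3], seed=42)
print(len(train_ids), len(test_ids))
```

Because each row id lands in exactly one slice of the shuffled list, the subsets cannot overlap, and rerunning with the same seed returns the same assignment.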
Listing and Retrieving ObservationTable Objects
To list the ObservationTable objects in the catalog, use the list_observation_tables() method.
To retrieve a specific ObservationTable by its name from the catalog, use the get_observation_table() method.
To retrieve a specific ObservationTable by its Object ID from the catalog, use the get_observation_table_by_id() method.