ObservationTable

An ObservationTable object is a representation of an observation set in the feature store. It combines historical points-in-time and entity values to make historical feature requests, usually for training and testing machine learning applications.

Creating ObservationTable Objects

To create an ObservationTable object, you have several options: you can upload a CSV or Parquet file directly, or you can create it from a SourceTable object or a View object.

Note

The column with entity values must use an accepted serving name.

The column containing points-in-time must be labelled "POINT_IN_TIME" and should contain UTC timestamps.

For forecast use cases, the observation table must also include a "FORECAST_POINT" column representing the future date/time being predicted for. The data type of this column must match the ForecastPointSchema defined in the associated Context. If the schema references a timezone column, the observation table must include a column with the same name as specified in the schema.

If the Context defines user-provided columns via UserProvidedColumn, the observation table must include columns matching those names and data types.
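The column requirements in this note can be checked before you create the table. The following is a minimal sketch in plain Python and is not part of the FeatureByte SDK: the helper `missing_observation_columns` and its parameters are hypothetical names chosen for illustration, and it checks column names only, not data types or values.

```python
def missing_observation_columns(
    columns,
    serving_names,
    forecast=False,
    timezone_column=None,
    user_provided_columns=(),
):
    """Return the required column names that are absent from `columns`.

    Hypothetical helper: it compares names only and does not inspect dtypes.
    """
    required = ["POINT_IN_TIME", *serving_names]
    if forecast:
        # Forecast use cases also need the forecast point column.
        required.append("FORECAST_POINT")
        if timezone_column is not None:
            required.append(timezone_column)
    required.extend(user_provided_columns)
    present = set(columns)
    return [name for name in required if name not in present]


# "CUSTOMER_ID" stands in for an accepted serving name.
missing = missing_observation_columns(
    columns=["POINT_IN_TIME", "CUSTOMER_ID"],
    serving_names=["CUSTOMER_ID"],
    forecast=True,
)
print(missing)
```

An empty list means every required column name is present.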

Uploading a file:

To upload a file:

Use the upload() method with the file path, table name, purpose, and primary entities specified. This file can be either a CSV or a Parquet file.
Ensure the file's column names include "POINT_IN_TIME" and the accepted serving names for primary entities.

observation_table = fb.ObservationTable.upload(
    file_path="path/to/csv/file.csv",
    name=<observation_table_name>,
    purpose=fb.Purpose.PREVIEW,
    primary_entities=[<primary_entity_name>],
)
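If you do not yet have a file to upload, the sketch below shows one way to write a small CSV with a "POINT_IN_TIME" column holding UTC timestamps, using only the Python standard library. The serving name `CUSTOMER_ID`, the sample rows, and the timestamp text format are assumptions made for illustration; check which timestamp formats your feature store accepts.

```python
import csv
import os
import tempfile
from datetime import datetime, timedelta, timezone

# Made-up rows: the first timestamp carries a +08:00 offset and is converted to UTC.
rows = [
    (datetime(2023, 1, 15, 18, 0, tzinfo=timezone(timedelta(hours=8))), "C001"),
    (datetime(2023, 2, 1, 8, 30, tzinfo=timezone.utc), "C002"),
]

csv_path = os.path.join(tempfile.mkdtemp(), "observations.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    # Header: the point-in-time column plus the (assumed) entity serving name.
    writer.writerow(["POINT_IN_TIME", "CUSTOMER_ID"])
    for point_in_time, customer_id in rows:
        utc_value = point_in_time.astimezone(timezone.utc)
        writer.writerow([utc_value.strftime("%Y-%m-%d %H:%M:%S"), customer_id])

# Read the file back to confirm what was written.
with open(csv_path, newline="") as f:
    written = list(csv.reader(f))
print(written)
```

The resulting `csv_path` can then be passed as `file_path` to the upload() call shown above.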

Creating from a SourceTable object:

To create an ObservationTable object from a SourceTable object:

Select the source table from the feature store.

ds = fb.FeatureStore.get(<feature_store_name>).get_data_source()
source_table = ds.get_source_table(
    database_name=<database_name>,
    schema_name=<schema_name>,
    table_name=<table_name>
)

Use the create_observation_table() method, specifying columns, renaming if necessary, sample size, and table name.

observation_table = source_table.create_observation_table(
    name=<observation_table_name>,
    sample_rows=<desired_sample_size>,
    columns=[<timestamp_column_name>, <entity_column_name>],
    columns_rename_mapping={
        <timestamp_column_name>: "POINT_IN_TIME",
        <entity_column_name>: <entity_serving_name>,
    },
    primary_entities=[<primary_entity_name>],
)

Creating from a View object:

To create an ObservationTable object from a View object:

Use the create_observation_table() method from the View object with similar parameters as above.

observation_table = view.create_observation_table(
    name=<observation_table_name>,
    sample_rows=<desired_sample_size>,
    columns=[<timestamp_column_name>, <entity_column_name>],
    columns_rename_mapping={
        <timestamp_column_name>: "POINT_IN_TIME",
        <entity_column_name>: <entity_serving_name>,
    },
    primary_entities=[<primary_entity_name>],
)

Additional Operations:

Download the table by using the download() method:
```
observation_table.download()
```
Convert the table to a Pandas DataFrame by using the to_pandas() method:
```
observation_table.to_pandas()
```
Delete the table, if it is no longer needed, by using the delete() method:
```
observation_table.delete()
```

Forecast Point in Observation Tables

When an observation table is created for a forecast Context (one with a ForecastPointSchema), FeatureByte automatically computes and stores additional metadata:

Most/Least Recent Forecast Point: The range of forecast point values in the table
Forecast Horizon: The maximum time span between POINT_IN_TIME and FORECAST_POINT, expressed in the schema's granularity (e.g., "7 DAY")
Forecast Timezone Column: Whether a separate timezone column is present
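To make these metadata fields concrete, here is a rough sketch of how the forecast point range and the forecast horizon could be derived from a handful of rows. It is an illustration in plain Python with made-up values, assuming a day-level granularity; it does not reproduce how FeatureByte computes or rounds these values internally.

```python
from datetime import datetime

# Each pair is (POINT_IN_TIME, FORECAST_POINT); the values are made up.
rows = [
    (datetime(2024, 1, 1), datetime(2024, 1, 4)),
    (datetime(2024, 1, 2), datetime(2024, 1, 9)),
    (datetime(2024, 1, 3), datetime(2024, 1, 8)),
]

# Range of forecast point values in the table.
forecast_points = [forecast_point for _, forecast_point in rows]
least_recent = min(forecast_points)
most_recent = max(forecast_points)

# Largest gap between the point-in-time and the forecast point, in whole days.
horizon_days = max((fp - pit).days for pit, fp in rows)
horizon = f"{horizon_days} DAY"
print(least_recent, most_recent, horizon)
```

Here the gaps are 3, 7, and 5 days, so the horizon is reported as "7 DAY".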

FeatureByte also validates that:

The FORECAST_POINT column data type matches the schema's expected dtype
If the schema uses VARCHAR with a format_string, all values parse correctly
If a timezone column is referenced, all timezone values are valid IANA timezone names or UTC offsets
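You may want to run similar checks on your own data before creating the table. The sketch below is a local approximation in plain Python, not FeatureByte's validation code. The helper names are hypothetical, the accepted offset shape ("+HH:MM" or "-HH:MM") is an assumption, and `parses_with_format` uses Python strptime directives, which may differ from the format_string syntax used by the schema.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Assumed offset shape: "+HH:MM" or "-HH:MM", from -14:59 to +14:59.
_OFFSET_RE = re.compile(r"^[+-](?:0\d|1[0-4]):[0-5]\d$")


def is_valid_timezone(value):
    """Return True for a UTC offset such as "+08:00" or a known IANA name."""
    if not value:
        return False
    if _OFFSET_RE.match(value):
        return True
    try:
        # Lookup depends on the timezone database available on this machine.
        ZoneInfo(value)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        return False
    return True


def parses_with_format(value, format_string):
    """Return True if `value` parses with a strptime-style format."""
    try:
        datetime.strptime(value, format_string)
    except ValueError:
        return False
    return True


print(is_valid_timezone("+08:00"), is_valid_timezone("Not/AZone"))
print(parses_with_format("2024-01-31", "%Y-%m-%d"))
```

Results for IANA names depend on the timezone database installed locally, so treat a failure here as a prompt to inspect the value rather than as a definitive answer.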

Linking an Observation Table to a Context

After you create an Observation Table, you can link it to a Context with the add_observation_table() method so that it can be reused.

context = catalog.get_context("context")
context.add_observation_table(<observation_table_name>)

You can also set an observation table as the default EDA or preview table for the Context using the update_default_eda_table() and update_default_preview_table() methods.

context.update_default_eda_table(<observation_table_name>)

context.update_default_preview_table(<observation_table_name>)

Finally, you can list observation tables associated with the Context using the list_observation_tables() method.

context.list_observation_tables()

Adding Target values to an Observation Table

Follow these steps to add target values to an observation table:

First get the relevant Target object:

my_target = catalog.get_target(<target_name>)

Then use its compute_target_table() method to return a new ObservationTable object that includes target values. This method also stores the new table:

observation_table_with_target = my_target.compute_target_table(
    observation_table,
    observation_table_name='Customer Purchase next 2w'
)

Note

For forecast observation tables (those with a FORECAST_POINT column), the target is computed relative to the forecast point rather than the point-in-time.

Computing the target table automatically associates the new Observation Table with the Use Case linked to the source Observation Table's Context and the Target.

If needed, the table can be manually linked to a Use Case. To do this, use the add_observation_table() method.

use_case = catalog.get_use_case("Credit Card Fraud Detection")
use_case.add_observation_table(<observation_table_name>)

Updating the Purpose of an Observation Table

To update the purpose of an ObservationTable object, use the update_purpose() method:

observation_table_with_target.update_purpose("training")

To get the purpose of an ObservationTable object, use the purpose property:

observation_table_with_target.purpose

Splitting an Observation Table

Use the split() method to divide an observation table into non-overlapping subsets for training and evaluation. The splits are determined using a seeded random assignment, ensuring reproducibility.

Split into train (70%) and test (30%) sets:

train_table, test_table = observation_table.split(
    split_ratios=[0.7, 0.3],
    names=["train_data", "test_data"],
)

Split into train (60%), validation (20%), and test (20%) sets:

train, val, test = observation_table.split(
    split_ratios=[0.6, 0.2, 0.2],
    names=["train_data", "validation_data", "test_data"],
    seed=42,
)

The first split is automatically assigned the TRAINING purpose, while subsequent splits are assigned VALIDATION_TEST. Custom names are optional; if omitted, they are auto-generated from the source table name.
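To see why a seeded random assignment gives reproducible, non-overlapping subsets, consider the following sketch. It is a generic illustration in plain Python, not the algorithm split() uses internally; `assign_splits` and its default seed are hypothetical.

```python
import random


def assign_splits(row_ids, split_ratios, seed=1234):
    """Assign every row to exactly one split using a seeded shuffle."""
    if abs(sum(split_ratios) - 1.0) > 1e-9:
        raise ValueError("split_ratios must sum to 1")
    shuffled = list(row_ids)
    # A dedicated Random instance makes the order depend only on the seed.
    random.Random(seed).shuffle(shuffled)

    splits, start, cumulative = [], 0, 0.0
    for ratio in split_ratios:
        cumulative += ratio
        end = round(cumulative * len(shuffled))
        # Consecutive slices of the shuffled list cannot overlap.
        splits.append(shuffled[start:end])
        start = end
    return splits


train, val, test = assign_splits(range(10), [0.6, 0.2, 0.2], seed=42)
print(len(train), len(val), len(test))
```

Calling the function again with the same seed returns the same three subsets, while a different seed generally produces a different assignment.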

Listing and Retrieving ObservationTable Objects

To list the ObservationTable objects in the catalog, use the list_observation_tables() method:

catalog.list_observation_tables()

To retrieve a specific ObservationTable by its name from the catalog, use the get_observation_table() method:

observation_table = catalog.get_observation_table(<observation_table_name>)

To retrieve a specific ObservationTable by its Object ID from the catalog, use the get_observation_table_by_id() method:

observation_table = catalog.get_observation_table_by_id(<observation_table_id>)