ObservationTable

An ObservationTable object is a representation of an observation set in the feature store. It combines historical points-in-time and entity values to make historical feature requests, usually for training and testing machine learning applications.

Creating ObservationTable Objects

To create an ObservationTable object, you have several options: you can upload a CSV or Parquet file directly, or you can create it from a SourceTable object or a View object.

Note

The column with entity values must use an accepted serving name.

The column containing points-in-time must be labelled "POINT_IN_TIME" and should contain UTC timestamps.

For forecast use cases, the observation table must also include a "FORECAST_POINT" column representing the future date/time being predicted for. The data type of this column must match the ForecastPointSchema defined in the associated Context. If the schema references a timezone column, the observation table must include a column with the same name as specified in the schema.

If the Context defines user-provided columns via UserProvidedColumn, the observation table must include columns matching those names and data types.
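The column requirements above can be checked locally before a table is created. The following is a minimal sketch, not part of the FeatureByte SDK: `check_observation_columns` is a hypothetical helper, and `CUSTOMER_ID` is an assumed serving name used only for illustration.

```python
def check_observation_columns(columns, serving_names, forecast=False):
    """Return a list of problems found in an observation set's column names.

    Hypothetical helper for illustration; FeatureByte performs its own validation.
    """
    problems = []
    # The point-in-time column must use this exact label.
    if "POINT_IN_TIME" not in columns:
        problems.append('missing "POINT_IN_TIME" column')
    # Each entity column must be named after an accepted serving name.
    for name in serving_names:
        if name not in columns:
            problems.append(f'missing entity column "{name}"')
    # Forecast use cases additionally require a forecast point column.
    if forecast and "FORECAST_POINT" not in columns:
        problems.append('missing "FORECAST_POINT" column')
    return problems


# "CUSTOMER_ID" is an assumed serving name.
ok = check_observation_columns(["POINT_IN_TIME", "CUSTOMER_ID"], ["CUSTOMER_ID"])
bad = check_observation_columns(["POINT-IN-TIME", "CUSTOMER_ID"], ["CUSTOMER_ID"])
```

Here `ok` is an empty list, while `bad` reports the missing "POINT_IN_TIME" column because the hyphenated label does not match.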

Uploading a file:

To upload a file:

  1. Use the upload() method with the file path, table name, purpose, and primary entities specified. This file can be either a CSV or a Parquet file.
  2. Ensure the file's column names include "POINT_IN_TIME" and the accepted serving names for primary entities.
observation_table = fb.ObservationTable.upload(
    file_path="path/to/csv/file.csv",
    name=<observation_table_name>,
    purpose=fb.Purpose.PREVIEW,
    primary_entities=[<primary_entity_name>],
)
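
As an illustration of what such a file might look like, the sketch below writes a small CSV using only the Python standard library. The serving name `CUSTOMER_ID`, the sample values, and the timestamp text format are assumptions; check which timestamp formats your feature store accepts.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

# Sample observations; "CUSTOMER_ID" is an assumed serving name.
rows = [
    (datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc), "C001"),
    (datetime(2024, 1, 2, 14, 0, tzinfo=timezone.utc), "C002"),
]

csv_path = os.path.join(tempfile.mkdtemp(), "observation_set.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    # The header must use "POINT_IN_TIME" and the entity's serving name.
    writer.writerow(["POINT_IN_TIME", "CUSTOMER_ID"])
    for point_in_time, customer_id in rows:
        # Timestamps are written in UTC; the text format is an assumption.
        writer.writerow([point_in_time.strftime("%Y-%m-%d %H:%M:%S"), customer_id])
```

The resulting `csv_path` could then be passed as `file_path` to the `upload()` call shown above.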

Creating from a SourceTable object:

To create an ObservationTable object from a SourceTable object:

  1. Select the source table from the feature store.
    ds = fb.FeatureStore.get(<feature_store_name>).get_data_source()
    source_table = ds.get_source_table(
        database_name=<database_name>,
        schema_name=<schema_name>,
        table_name=<table_name>
    )
    
  2. Use the create_observation_table() method, specifying columns, renaming if necessary, sample size, and table name.
    observation_table = source_table.create_observation_table(
        name=<observation_table_name>,
        sample_rows=<desired_sample_size>,
        columns=[<timestamp_column_name>, <entity_column_name>],
        columns_rename_mapping={
            <timestamp_column_name>: "POINT_IN_TIME",
            <entity_column_name>: <entity_serving_name>,
        },
        primary_entities=[<primary_entity_name>],
    )
    

Creating from a View object:

To create an ObservationTable object from a View object:

  1. Use the create_observation_table() method from the View object with similar parameters as above.
    observation_table = view.create_observation_table(
        name=<observation_table_name>,
        sample_rows=<desired_sample_size>,
        columns=[<timestamp_column_name>, <entity_column_name>],
        columns_rename_mapping={
            <timestamp_column_name>: "POINT_IN_TIME",
            <entity_column_name>: <entity_serving_name>,
        },
        primary_entities=[<primary_entity_name>],
    )
    

Additional Operations:

  • Download the table by using the download() method:

    observation_table.download()
    

  • Convert the table to a Pandas DataFrame by using the to_pandas() method:

    observation_table.to_pandas()
    

  • Delete the table, if it is no longer needed, by using the delete() method:

    observation_table.delete()
    

Forecast Point in Observation Tables

When an observation table is created for a forecast Context (one with a ForecastPointSchema), FeatureByte automatically computes and stores additional metadata:

  • Most/Least Recent Forecast Point: The range of forecast point values in the table
  • Forecast Horizon: The maximum time span between POINT_IN_TIME and FORECAST_POINT, expressed in the schema's granularity (e.g., "7 DAY")
  • Forecast Timezone Column: Whether a separate timezone column is present
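
To make the metadata above concrete, here is a rough sketch of how the forecast point range and the forecast horizon could be derived from a handful of rows. It is not FeatureByte's implementation: `summarize_forecast_points` is a hypothetical helper, day granularity is assumed, and partial days are truncated, which may differ from the actual behavior.

```python
from datetime import datetime


def summarize_forecast_points(rows):
    """Summarize (point_in_time, forecast_point) pairs.

    Hypothetical helper; assumes a schema granularity of one day.
    """
    forecast_points = [forecast_point for _, forecast_point in rows]
    # Largest gap between a point-in-time and its forecast point.
    max_span = max(forecast_point - point_in_time for point_in_time, forecast_point in rows)
    return {
        "least_recent_forecast_point": min(forecast_points),
        "most_recent_forecast_point": max(forecast_points),
        # timedelta.days truncates partial days (an assumption of this sketch).
        "forecast_horizon": f"{max_span.days} DAY",
    }


rows = [
    (datetime(2024, 1, 1), datetime(2024, 1, 4)),  # 3-day span
    (datetime(2024, 1, 2), datetime(2024, 1, 9)),  # 7-day span
    (datetime(2024, 1, 5), datetime(2024, 1, 6)),  # 1-day span
]
summary = summarize_forecast_points(rows)
```

For these rows the longest span is seven days, so `summary["forecast_horizon"]` is "7 DAY".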

FeatureByte also validates that:

  • The FORECAST_POINT column data type matches the schema's expected dtype
  • If the schema uses VARCHAR with a format_string, all values parse correctly
  • If a timezone column is referenced, all timezone values are valid IANA timezone names or UTC offsets
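
The format check for string-typed forecast points can be approximated locally. The sketch below uses Python's `strptime` directives as a stand-in for the schema's format_string; the actual format syntax is defined by FeatureByte and the data warehouse and may differ. `unparseable_values` is a hypothetical helper.

```python
from datetime import datetime


def unparseable_values(values, format_string):
    """Return the values that do not parse with the given strptime format.

    Hypothetical helper; strptime directives are an assumption, not
    FeatureByte's format_string syntax.
    """
    bad = []
    for value in values:
        try:
            datetime.strptime(value, format_string)
        except ValueError:
            # Wrong layout or an impossible date (such as February 30).
            bad.append(value)
    return bad


values = ["2024-01-31", "2024-02-30", "31/01/2024"]
bad_values = unparseable_values(values, "%Y-%m-%d")
```

Only the first value parses: the second names a date that does not exist and the third uses a different layout.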

Linking an Observation Table to a Context

After creating an Observation Table, you can link it to a Context with the add_observation_table() method so that it can be reused.

context = catalog.get_context("context")
context.add_observation_table(<observation_table_name>)

You can also set an observation table as the Context's default EDA table or default preview table using the update_default_eda_table() and update_default_preview_table() methods.

context.update_default_eda_table(<observation_table_name>)
context.update_default_preview_table(<observation_table_name>)

Finally, you can list observation tables associated with the Context using the list_observation_tables() method.

context.list_observation_tables()

Adding Target values to an Observation Table

Follow these steps to add target values to an observation table:

  1. First get the relevant Target object:

    my_target = catalog.get_target(<target_name>)
    

  2. Then use its compute_target_table() method to return a new ObservationTable object that includes target values. This method also stores the new table:

    observation_table_with_target = my_target.compute_target_table(
        observation_table,
        observation_table_name='Customer Purchase next 2w'
    )
    

Note

For forecast observation tables (those with a FORECAST_POINT column), the target is computed relative to the forecast point rather than the point-in-time.

Computing the target table automatically associates the new Observation Table with the Use Case linked to the source Observation Table's Context and the Target.

If needed, the table can be manually linked to a Use Case. To do this, use the add_observation_table() method.

use_case = catalog.get_use_case("Credit Card Fraud Detection")
use_case.add_observation_table(<observation_table_name>)

Updating the Purpose of an Observation Table

To update the purpose of an ObservationTable object, use the update_purpose() method:

observation_table_with_target.update_purpose("training")

To get the purpose of an ObservationTable object, use the purpose property:

observation_table_with_target.purpose

Splitting an Observation Table

Use the split() method to divide an observation table into non-overlapping subsets for training and evaluation. The splits are determined using a seeded random assignment, ensuring reproducibility.

Split into train (70%) and test (30%) sets:

train_table, test_table = observation_table.split(
    split_ratios=[0.7, 0.3],
    names=["train_data", "test_data"],
)

Split into train (60%), validation (20%), and test (20%) sets:

train, val, test = observation_table.split(
    split_ratios=[0.6, 0.2, 0.2],
    names=["train_data", "validation_data", "test_data"],
    seed=42,
)

The first split is automatically assigned the TRAINING purpose, while subsequent splits are assigned VALIDATION_TEST. Custom names are optional; if omitted, they are auto-generated from the source table name.
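
The idea of a seeded, non-overlapping split can be illustrated with the standard library alone. This is a conceptual sketch, not the algorithm FeatureByte uses internally: `split_rows` is a hypothetical helper that shuffles row identifiers with a seeded generator and cuts the result at the cumulative ratios.

```python
import random


def split_rows(row_ids, split_ratios, seed):
    """Partition row_ids into disjoint subsets sized by split_ratios.

    Hypothetical helper for illustration only.
    """
    if abs(sum(split_ratios) - 1.0) > 1e-9:
        raise ValueError("split_ratios must sum to 1")
    shuffled = list(row_ids)
    # A dedicated, seeded generator makes the assignment reproducible.
    random.Random(seed).shuffle(shuffled)
    splits, start, cumulative = [], 0, 0.0
    for index, ratio in enumerate(split_ratios):
        cumulative += ratio
        # The last split takes all remaining rows to avoid rounding gaps.
        is_last = index == len(split_ratios) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


train_ids, val_ids, test_ids = split_rows(range(10), [0.6, 0.2, 0.2], seed=42)
```

Every row lands in exactly one subset, and calling `split_rows` again with the same seed yields the same assignment.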

Listing and Retrieving ObservationTable Objects

To list the ObservationTable objects in the catalog, use the list_observation_tables() method:

catalog.list_observation_tables()

To retrieve a specific ObservationTable by its name from the catalog, use the get_observation_table() method:

observation_table = catalog.get_observation_table(<observation_table_name>)

To retrieve a specific ObservationTable by its Object ID from the catalog, use the get_observation_table_by_id() method:

observation_table = catalog.get_observation_table_by_id(<observation_table_id>)