A Table object provides a centralized location for metadata about a source table. This metadata determines the type of operations that can be applied to the table's views and includes essential information for feature engineering.
A source table can only be associated with one active Table object in a catalog at a time. This means that the active Table object in the catalog is the source of truth for the metadata of the source table. If a Table object becomes deprecated, a new Table object can be registered with the same source table.
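The one-active-table rule can be pictured with a small in-memory model. This is only an illustrative sketch: `ToyCatalog`, its methods, and the table names are hypothetical and are not part of the SDK.

```python
class ToyCatalog:
    """Hypothetical in-memory stand-in for a catalog; not an SDK class."""

    def __init__(self):
        # Each entry: {"name": ..., "source": ..., "active": bool}
        self.tables = []

    def register(self, name, source):
        # Reject a second active Table object for the same source table.
        for table in self.tables:
            if table["source"] == source and table["active"]:
                raise ValueError(
                    f"{source} already has an active table: {table['name']}"
                )
        entry = {"name": name, "source": source, "active": True}
        self.tables.append(entry)
        return entry

    def deprecate(self, name):
        # Deprecation frees the source table for a new registration.
        for table in self.tables:
            if table["name"] == name:
                table["active"] = False


catalog = ToyCatalog()
catalog.register("INVOICES_V1", "sales.invoices")

try:
    catalog.register("INVOICES_V2", "sales.invoices")
    second_registration_allowed = True
except ValueError:
    second_registration_allowed = False

catalog.deprecate("INVOICES_V1")
replacement = catalog.register("INVOICES_V2", "sales.invoices")
```

In this model the second registration is refused while the first table is active, and succeeds once the first table has been deprecated.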
Before registering tables, ensure that the catalog you want to work with is active.
Select the source table you are interested in.
To create Table objects from a SourceTable object, you must use specific methods depending on the type of data contained in the source table:
- create_event_table(): creates an EventTable object from a source table, where each row indicates a unique business event occurring at a particular time.
- create_item_table(): creates an ItemTable object from a source table containing detailed information about a specific business event.
- create_dimension_table(): creates a DimensionTable object from a source table containing static descriptive data.
- create_scd_table(): creates an SCDTable object from a source table containing data that changes slowly and unpredictably over time, known as a Slowly Changing Dimension (SCD) table.
Registering a table according to its type determines the types of feature engineering operations that are possible on the table's views and enforces guardrails accordingly.
Example of registering an event table using the
Implementing Default Job Settings for Consistency
A default feature job setting is established at the table level to help streamline the configuration of feature job settings for features and ensure consistency across features developed by different team members. For an EventTable, the default feature job setting can be initialized using an automated analysis of the table data's availability and freshness. This analysis depends on the presence of record creation timestamps in the source table that are typically included during data warehouse updates.
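To give a feel for what an availability and freshness analysis might look at, here is a toy calculation. It assumes each record carries an event timestamp and a record creation timestamp, and it picks a blind spot that covers a chosen share of observed arrival delays. The function names and the nearest-rank percentile rule are assumptions made for illustration; this is not the algorithm the SDK uses.

```python
from datetime import datetime, timedelta


def arrival_delays(rows):
    """Seconds between each event and the moment its record reached the warehouse."""
    return sorted(int((created - event).total_seconds()) for event, created in rows)


def suggest_blind_spot(rows, coverage_pct=99):
    """Smallest observed delay that covers at least coverage_pct percent of records."""
    delays = arrival_delays(rows)
    # Nearest-rank percentile using integer ceiling division.
    rank = max(1, -(-coverage_pct * len(delays) // 100))
    return delays[rank - 1]


# Hypothetical (event timestamp, record creation timestamp) pairs.
base = datetime(2023, 4, 1, 12, 0, 0)
rows = [(base, base + timedelta(seconds=s)) for s in (30, 60, 90, 120, 600)]

blind_spot_80 = suggest_blind_spot(rows, coverage_pct=80)    # 120 seconds
blind_spot_100 = suggest_blind_spot(rows, coverage_pct=100)  # 600 seconds
```

A larger blind spot tolerates more late records at the cost of using less recent data, which is the trade-off the default setting has to balance.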
The initialization of the default feature job setting is done using the
ItemTable objects inherit the default feature job setting from their related EventTable objects. For Views that originate from SCDTable objects, features that require aggregation operations have a default feature job setting that executes daily, aligning with the view's creation time.
To help you manage the default feature job settings, you can perform the following actions:
- Execute a new analysis using the create_new_feature_job_setting_analysis() method or view previous analyses using the list_feature_job_setting_analysis() method from an EventTable object,
- Obtain an analysis using the
- Create a custom setting using the
- Perform backtests on custom settings with the backtest() method from an analysis,
- Manually update the default feature job setting of an EventTable object using the
# Create a new analysis with a specific time period
analysis = invoice_table.create_new_feature_job_setting_analysis(
# List previous analyses
# Retrieve a specific analysis
analysis = fb.FeatureJobSettingAnalysis.get_by_id(<analysis_id>)
# Backtest a manual setting
manual_setting = fb.FeatureJobSetting(
backtest_result = analysis.backtest(feature_job_setting=manual_setting)
# Update the default feature job setting
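The idea behind backtesting a manual setting can be sketched without the SDK. The example below assumes a setting is described by a job period, an offset, and a blind spot, all in seconds, and counts the share of records that had not yet arrived when the first job covering their event ran. The names `first_job_covering` and `backtest_setting` and the parameter names are hypothetical; the SDK's own backtest may measure something different.

```python
def first_job_covering(event_ts, period, offset, blind_spot):
    """Time of the first job whose window (ending at job time - blind_spot) includes the event."""
    # Ceiling division finds the first job at or after event_ts + blind_spot.
    k = -(-(event_ts + blind_spot - offset) // period)
    return offset + k * period


def backtest_setting(records, period, offset, blind_spot):
    """Share of records that arrived after the first job that should have used them."""
    missed = sum(
        1
        for event_ts, created_ts in records
        if created_ts > first_job_covering(event_ts, period, offset, blind_spot)
    )
    return missed / len(records)


# Hypothetical (event time, record creation time) pairs, in seconds.
records = [(3500, 3560), (3400, 3700)]

short_blind_spot = backtest_setting(records, period=3600, offset=0, blind_spot=120)
long_blind_spot = backtest_setting(records, period=3600, offset=0, blind_spot=600)
```

With a 120-second blind spot, the second record is expected by the job at 3600 but only arrives at 3700, so half of the records are missed. With a 600-second blind spot, both events fall to the job at 7200 and nothing is missed.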
Enhancing Feature Engineering with Metadata
Optionally, you can include additional metadata at the column level after creating a table to support feature engineering further.
For more details, refer to the TableColumn documentation page.
Managing Table Status
When a table is created, it is automatically added to the active catalog with its status set to 'PUBLIC_DRAFT'. Once the table is prepared for feature engineering, you can modify its status to 'PUBLISHED'.
If a table needs to be deprecated, update its status to 'DEPRECATED'.
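The status lifecycle can be modeled as a small state machine. The three status values come from the text above. The transition table is an assumption made for illustration (one-way moves, with a draft allowed to be deprecated directly), and `update_status` is a hypothetical helper, not an SDK function.

```python
from enum import Enum


class TableStatus(Enum):
    PUBLIC_DRAFT = "PUBLIC_DRAFT"
    PUBLISHED = "PUBLISHED"
    DEPRECATED = "DEPRECATED"


# Assumed transitions: statuses only move forward, never back.
ALLOWED_TRANSITIONS = {
    TableStatus.PUBLIC_DRAFT: {TableStatus.PUBLISHED, TableStatus.DEPRECATED},
    TableStatus.PUBLISHED: {TableStatus.DEPRECATED},
    TableStatus.DEPRECATED: set(),
}


def update_status(current, new):
    """Return the new status, or raise if the move is not allowed in this model."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


status = TableStatus.PUBLIC_DRAFT  # status assigned at creation
status = update_status(status, TableStatus.PUBLISHED)
status = update_status(status, TableStatus.DEPRECATED)
```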
After deprecating a table,
Accessing a Table from the Catalog
You can also retrieve a Table object by its Object ID using the
Exploring a Table
To explore a table, you can:
- obtain detailed information using the
- acquire descriptive statistics using the
- obtain a selection of rows using the
- obtain a larger random selection of rows based on a specified time range, size, and seed using the
# Obtain detailed information on a table
# Acquire descriptive statistics for a table
# Obtain a selection of table rows
df = invoice_table.preview(limit=20)
# Obtain a random selection of table rows based on a specified time range, size, and seed
df = invoice_table.sample(
By default, the statistics and materialization are computed before applying cleaning operations defined at the table level. To include these cleaning operations, set the after_cleaning parameter to True.
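The roles of the time range, size, and seed in random sampling can be illustrated on in-memory rows. This sketch uses only the Python standard library. The function `sample_rows`, its parameter names, and the `ts` field are assumptions, and a data warehouse would sample very differently in practice.

```python
import random
from datetime import datetime


def sample_rows(rows, size, seed, from_timestamp=None, to_timestamp=None):
    """Pick up to `size` rows at random from those inside the time range."""
    in_range = [
        row
        for row in rows
        if (from_timestamp is None or row["ts"] >= from_timestamp)
        and (to_timestamp is None or row["ts"] < to_timestamp)
    ]
    # A dedicated generator seeded per call makes the draw repeatable.
    rng = random.Random(seed)
    return rng.sample(in_range, min(size, len(in_range)))


# Hypothetical rows, one per day in January 2023.
rows = [{"id": i, "ts": datetime(2023, 1, 1 + i)} for i in range(30)]

first = sample_rows(
    rows, size=5, seed=123,
    from_timestamp=datetime(2023, 1, 10), to_timestamp=datetime(2023, 1, 20),
)
second = sample_rows(
    rows, size=5, seed=123,
    from_timestamp=datetime(2023, 1, 10), to_timestamp=datetime(2023, 1, 20),
)
```

Calling the function twice with the same seed returns the same rows, which is what makes a seeded sample useful for reproducible exploration.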
Creating Views to Prepare Data Before Defining Features
Besides EventView, ItemView, DimensionView, and SCDView, another type of view can be created from an SCDTable: the Change View. Change Views provide a way to analyze changes to a specific attribute within the natural key of the SCD table. To get a Change View, use the
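Conceptually, tracking changes to one attribute per natural key amounts to comparing consecutive versions of each key's record. The sketch below does this on plain dictionaries. The function `attribute_changes`, the column names, and the output fields are hypothetical and are not necessarily the columns a Change View exposes.

```python
from itertools import groupby
from operator import itemgetter


def attribute_changes(scd_rows, key, effective_ts, attribute):
    """List each change of `attribute` per natural key, with past and new values."""
    rows = sorted(scd_rows, key=itemgetter(key, effective_ts))
    changes = []
    for key_value, versions in groupby(rows, key=itemgetter(key)):
        previous = None
        for row in versions:
            # Record a change only when the tracked attribute actually differs.
            if previous is not None and row[attribute] != previous[attribute]:
                changes.append({
                    key: key_value,
                    "changed_at": row[effective_ts],
                    "past_value": previous[attribute],
                    "new_value": row[attribute],
                })
            previous = row
    return changes


# Hypothetical SCD rows; ISO date strings sort chronologically.
scd_rows = [
    {"customer_id": "C1", "valid_from": "2022-01-01", "city": "Paris", "plan": "basic"},
    {"customer_id": "C1", "valid_from": "2022-06-01", "city": "Lyon", "plan": "basic"},
    {"customer_id": "C1", "valid_from": "2023-01-01", "city": "Lyon", "plan": "premium"},
    {"customer_id": "C2", "valid_from": "2022-03-01", "city": "Nice", "plan": "basic"},
]

city_changes = attribute_changes(scd_rows, "customer_id", "valid_from", "city")
```

Here only the move of customer C1 from Paris to Lyon is reported. The later row that changes `plan` but not `city` is ignored because the tracked attribute stayed the same.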