
featurebyte.SourceTable.create_dimension_table

create_dimension_table(
name: str,
dimension_id_column: str,
record_creation_timestamp_column: Union[str, NoneType]=None
) -> DimensionTable

Description

Creates a DimensionTable object from a source table that holds static descriptive information, and adds it to the catalog.

To create a Dimension Table, you must specify the column that serves as the primary key of the source table (dimension_id_column).
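Before registering a table, it can help to confirm that the candidate dimension_id_column really behaves as a primary key: no missing values and no repeated values. The following is a minimal plain-Python sketch, assuming rows have already been fetched as dictionaries. The helper check_dimension_id_column and the sample rows (including the ProductGroup column) are hypothetical and not part of the featurebyte API.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def check_dimension_id_column(
    rows: Iterable[Mapping[str, Any]], key_column: str
) -> Dict[str, Any]:
    """Hypothetical helper: report primary-key violations for key_column.

    Returns the number of rows with a missing key and the sorted list of
    key values that occur more than once.
    """
    missing = 0
    counts: Counter = Counter()
    for row in rows:
        value = row.get(key_column)
        if value is None:
            missing += 1  # a primary key must never be missing
        else:
            counts[value] += 1
    # Sort by string form so mixed key types do not raise a TypeError
    duplicates = sorted((k for k, n in counts.items() if n > 1), key=str)
    return {"missing": missing, "duplicates": duplicates}


# Hypothetical sample rows: "a1" appears twice, so it is not a valid primary key
products = [
    {"GroceryProductGuid": "a1", "ProductGroup": "Dairy"},
    {"GroceryProductGuid": "b2", "ProductGroup": "Bakery"},
    {"GroceryProductGuid": "a1", "ProductGroup": "Dairy"},
]
report = check_dimension_id_column(products, "GroceryProductGuid")
```

If report shows any missing or duplicated keys, the column is not a suitable dimension_id_column as it stands.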

After creation, you can optionally add column-level metadata to the table to further support feature engineering. This can include tagging columns that identify or reference entities, describing the semantics of the table columns, specifying default cleaning operations, or adding column descriptions.

Note that using a Dimension table requires special attention. A Dimension table keeps no history, so features computed for past points in time see only the table's current values. If the data in the table changes over time, even slowly, it is not advisable to use a Dimension table, because these changes can cause significant data leakage during model training and degrade inference performance. In such cases, use a Type 2 Slowly Changing Dimension table, which maintains a history of changes.
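To illustrate the leakage risk, consider a product whose group changed after the date of a training example. The sketch below uses hypothetical in-memory data and helper functions (lookup_dimension, lookup_scd); it only models the concept and is not how featurebyte computes features internally. A current-state lookup ignores the observation date, while a Type 2 SCD lookup returns the version that was valid at that date.

```python
from datetime import date
from typing import Optional

# Hypothetical current-state dimension table: one row per key, no history
dimension_rows = {"p1": {"ProductGroup": "Snacks"}}

# Hypothetical Type 2 SCD history: each version is valid from effective_from
# (inclusive) to effective_to (exclusive); None means "still current"
scd_rows = [
    {"id": "p1", "ProductGroup": "Dairy",
     "effective_from": date(2022, 1, 1), "effective_to": date(2023, 6, 1)},
    {"id": "p1", "ProductGroup": "Snacks",
     "effective_from": date(2023, 6, 1), "effective_to": None},
]


def lookup_dimension(key: str, as_of: date) -> str:
    # A dimension table has no history, so as_of cannot be honoured
    return dimension_rows[key]["ProductGroup"]


def lookup_scd(key: str, as_of: date) -> Optional[str]:
    # Return the version whose validity interval contains as_of
    for row in scd_rows:
        started = row["effective_from"] <= as_of
        not_ended = row["effective_to"] is None or as_of < row["effective_to"]
        if row["id"] == key and started and not_ended:
            return row["ProductGroup"]
    return None


training_point = date(2023, 1, 15)
leaky_value = lookup_dimension("p1", training_point)  # value from after the training point
correct_value = lookup_scd("p1", training_point)      # value valid at the training point
```

Here the dimension lookup returns "Snacks", a value that only became true in June 2023, for a January 2023 training example; the SCD lookup returns "Dairy", the value actually known at that time.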

Parameters

  • name: str
    The desired name for the new table.

  • dimension_id_column: str
    The column that serves as the primary key, uniquely identifying each record in the table.

  • record_creation_timestamp_column: Union[str, NoneType]
    Optional column containing the timestamp at which each record was created.

Returns

  • DimensionTable
    DimensionTable created from the source table.

Examples

Create a dimension table from a source table.

>>> # Assumes `ds` is a DataSource obtained beforehand, e.g. via catalog.get_data_source()
>>> # Register GroceryProduct as a dimension table
>>> source_table = ds.get_table(
...   database_name="spark_catalog",
...   schema_name="GROCERY",
...   table_name="GROCERYPRODUCT"
... )
>>> product_table = source_table.create_dimension_table(
...   name="GROCERYPRODUCT",
...   dimension_id_column="GroceryProductGuid"
... )