An SCDTable object represents a source table in the data warehouse whose data changes slowly and unpredictably over time. Such a table is commonly referred to as a Slowly Changing Dimension (SCD) table.
There are two main types of SCDs: Type 1, which overwrites old data with new data, and Type 2, which maintains a history of changes by creating a new record for each change. FeatureByte supports only Type 2 SCDs, because Type 1 SCDs discard the values that were valid in the past, which may cause data leakage during model training and poor performance during inference.
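The difference between the two types can be illustrated with a minimal pure-Python sketch. It does not use the FeatureByte API; the customer, the column names, and the `city_as_of` helper are hypothetical and exist only for illustration. It shows why an overwritten (Type 1) value can leak future information into a training row, while a Type 2 history allows a lookup as of a given point in time.

```python
from datetime import datetime

# Hypothetical change log for one customer: (change time, new city).
changes = [
    (datetime(2023, 1, 1), "Paris"),
    (datetime(2023, 6, 1), "Lyon"),
]

# Type 1: each change overwrites the previous value, so only the latest survives.
type1 = {}
for _, city in changes:
    type1["cust_1"] = city

# Type 2: each change appends a new row carrying its effective timestamp.
type2 = [
    {"customer_id": "cust_1", "city": city, "effective_from": ts}
    for ts, city in changes
]


def city_as_of(rows, customer_id, point_in_time):
    """Return the city from the latest row effective at or before point_in_time."""
    candidates = [
        r
        for r in rows
        if r["customer_id"] == customer_id and r["effective_from"] <= point_in_time
    ]
    if not candidates:
        return None  # the customer had no record yet at that time
    return max(candidates, key=lambda r: r["effective_from"])["city"]


# A training observation dated before the second change.
training_time = datetime(2023, 3, 15)

type1_value = type1["cust_1"]  # only the overwritten, most recent value is available
type2_value = city_as_of(type2, "cust_1", training_time)  # value valid at training_time
```

Here `type1_value` reflects a change that happened after the observation date, whereas `type2_value` is the value that was actually valid at that date.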
An SCD table of Type 2 utilizes a natural key to distinguish each active row and facilitate tracking of changes over time. The SCD table employs effective and end (or expiration) timestamp columns to determine the active status of a row. In certain instances, an active flag column may replace the expiration timestamp column to indicate if a row is currently active.
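As a rough sketch of how these columns determine a row's status, the following pure-Python example checks activity either at the present moment or at a given point in time. The column names (`effective_ts`, `end_ts`, `is_active`) and the conventions used (an empty end timestamp means the row is still open, and the end timestamp is exclusive) are assumptions made for this illustration, not something FeatureByte prescribes.

```python
from datetime import datetime
from typing import Optional


def is_active(row: dict, at: Optional[datetime] = None) -> bool:
    """Decide whether a Type 2 row is active.

    With at=None, report the current status: use the active flag if the row
    has one, otherwise treat a missing end timestamp as "still active".
    With a point in time, the row is active when
    effective_ts <= at < end_ts (or end_ts is missing).
    """
    end = row.get("end_ts")
    if at is None:
        if "is_active" in row:
            return bool(row["is_active"])
        return end is None
    return row["effective_ts"] <= at and (end is None or at < end)


# Two versions of the same natural key (customer_id).
rows = [
    {
        "customer_id": "cust_1",
        "city": "Paris",
        "effective_ts": datetime(2023, 1, 1),
        "end_ts": datetime(2023, 6, 1),
        "is_active": False,
    },
    {
        "customer_id": "cust_1",
        "city": "Lyon",
        "effective_ts": datetime(2023, 6, 1),
        "end_ts": None,
        "is_active": True,
    },
]

current_cities = [r["city"] for r in rows if is_active(r)]
march_cities = [r["city"] for r in rows if is_active(r, datetime(2023, 3, 15))]
```

Under these conventions, exactly one row per natural key is active at any point in time after the first effective timestamp.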
SCDTable objects are created from a SourceTable object via the create_scd_table method, by identifying the columns that represent the natural key, the effective timestamp, and optionally the surrogate key, the end timestamp, and the active flag.
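A call to create_scd_table might look like the sketch below. The wrapper function `register_customer_scd`, the table name, and all column names are hypothetical. The keyword argument names mirror the columns listed above but are an assumption here and should be checked against the API reference of the installed FeatureByte version. The `source_table` argument is assumed to be a SourceTable obtained elsewhere from a data source.

```python
def register_customer_scd(source_table):
    """Register a Type 2 SCD table from a SourceTable-like object.

    All names below are illustrative; the keyword names are assumed
    and may differ between FeatureByte versions.
    """
    return source_table.create_scd_table(
        name="CUSTOMER_PROFILE",                  # hypothetical table name
        natural_key_column="customer_id",         # identifies the entity across versions
        effective_timestamp_column="valid_from",  # when a row version becomes active
        surrogate_key_column="row_id",            # optional: unique per row version
        end_timestamp_column="valid_to",          # optional: when a row version expires
        current_flag_column="is_current",         # optional: marks the active row
    )
```

Only the natural key and the effective timestamp are required according to the description above; the remaining arguments can be left out when the source table lacks the corresponding columns.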
After creation, the table can optionally be enriched with column-level metadata to further aid feature engineering. This can include tagging columns that identify or reference entities, providing information about the semantics of the table columns, specifying default cleaning operations, or adding descriptions of its columns.
- create_scd_table: create SCD table from source table