2. Register Tables
Our catalog is created and we can start registering tables in it.
Step 1: Select Data
We'll use the following four source tables from the Credit Default dataset:
Table | Description |
---|---|
NEW_APPLICATION | Records new loan applications. |
PRIOR_APPLICATIONS | Contains data on prior loan applications and the final decision. |
CONSUMER_LOAN_STATUS | Tracks the status of consumer loans. |
CONSUMER_INSTALLMENTS | Logs monthly installments for consumer loans at the time of payment. |
The two remaining source tables are left for you to explore:
Table | Description |
---|---|
CASH_LOAN_STATUS | Tracks the status of cash loans. |
CASH_INSTALLMENTS | Logs monthly installments for cash loans at the time of payment. |
Step 2: Locate Your Data
From the menu, go to the Explore section and access the Source Tables.
You will find the four tables under the DEMO_DATASETS database and the CREDIT_DEFAULT schema.
Step 3: Understand Table Types
For accurate feature derivation, FeatureByte needs to recognize the roles of different tables.
Each table should be assigned a specific type based on its structure and purpose:
- NEW_APPLICATION --> Dimension table.
  Why Dimension Table?
  While we could have registered this as an Event Table, it contains only a subset of the applications. Setting it as a Dimension Table disables aggregations, which prevents potentially meaningless computations.
- PRIOR_APPLICATIONS --> Event table.
  Why Event Table?
  The table records final decision events for prior applications, which makes it suitable for an Event Table designation.
- CONSUMER_LOAN_STATUS --> Slowly Changing Dimension (SCD) table.
  Why Slowly Changing Dimension Table?
  The table tracks loan statuses and other fields that change over time, making it a Slowly Changing Dimension (SCD) Table.
- CONSUMER_INSTALLMENTS --> Time Series table.
  Why Time Series Table?
  While we could have registered this as an Event Table, installment payments occur monthly. Defining it as a Time Series Table ensures calendar month aggregation, which aligns with the event frequency.
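To make the calendar month point concrete, here is a minimal sketch in plain Python. It is not FeatureByte's implementation: the record layout, the loan identifier, and the amounts are hypothetical, and the helper only illustrates what grouping installments by calendar month means.

```python
from collections import defaultdict
from datetime import date

# Hypothetical installment records: (loan_id, payment_date, amount).
installments = [
    ("L1", date(2024, 1, 31), 100.0),
    ("L1", date(2024, 2, 29), 100.0),
    ("L1", date(2024, 3, 1), 50.0),
    ("L1", date(2024, 3, 31), 50.0),
]


def sum_by_calendar_month(records):
    """Total amount paid per (loan_id, year, month), using calendar month boundaries."""
    totals = defaultdict(float)
    for loan_id, paid_on, amount in records:
        totals[(loan_id, paid_on.year, paid_on.month)] += amount
    return dict(totals)


monthly = sum_by_calendar_month(installments)
```

Both March payments land in the same bucket regardless of how many days the month has, which is the behavior a fixed-length window cannot guarantee.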
Note
If you are interested in a use case that uses an Item table, check out the Grocery UI Tutorials.
Step 4: Register the NEW_APPLICATION table as a Dimension Table
- Select the NEW_APPLICATION table.
- Click the button to register the table.
- Set the table type as Dimension Table.
- Specify its Dimension ID Column.
- Specify the Record Creation Timestamp Column if applicable.
Step 5: Register the PRIOR_APPLICATIONS table as an Event Table
- Select the PRIOR_APPLICATIONS table.
- Click the button to register the table.
- Set the table type as Event Table.
- Identify the Event Timestamp Column.
  The Event Timestamp Column must be a UTC Timestamp or a Snowflake TIMESTAMP_TZ. Support for string-based datetime formats and local time records will be added soon.
  Databricks cluster time zone settings
  If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.
- Specify the Event ID Column if applicable.
- Select the Event Time Zone Offset if applicable.
  Local Date Parts
  The Time Zone Offset is used to extract date parts (e.g., hour of the day, weekday) in local time. Support for daylight saving time (DST) will be added soon.
- Specify the Record Creation Timestamp Column if applicable.
- Establish a Default Feature Job Setting, either automatically (if a Record Creation Timestamp Column is provided) or manually.
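The role of the Event Time Zone Offset in deriving local date parts can be sketched in plain Python. This is an illustration under assumptions, not FeatureByte's code: the `local_date_parts` helper, the sample timestamp, and the `+08:00` offset are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def local_date_parts(utc_timestamp, offset):
    """Return (hour, weekday) in local time for a UTC timestamp.

    `offset` is a fixed offset string such as "+08:00" or "-05:30".
    The weekday follows Python's convention: Monday is 0, Sunday is 6.
    """
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    delta = sign * timedelta(hours=int(hours), minutes=int(minutes))
    local = utc_timestamp.astimezone(timezone(delta))
    return local.hour, local.weekday()


# A Friday evening in UTC becomes a Saturday morning at UTC+08:00.
event_utc = datetime(2024, 3, 1, 22, 30, tzinfo=timezone.utc)
parts = local_date_parts(event_utc, "+08:00")
```

Because the offset is fixed, the sketch does not account for daylight saving time, in line with the limitation noted above.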
Step 6: Register the CONSUMER_LOAN_STATUS table as an SCD Table
- Select the CONSUMER_LOAN_STATUS table.
- Click the button to register the table.
- Set the table type as Slowly Changing Dimension Table.
- Identify its Natural Key Column, Surrogate Key Column, and Current Flag Column if applicable.
- Specify the Effective Timestamp Column and its Schema. Ensure the following:
  - If the column is recorded as a string, specify its string-based datetime format.
  - Indicate whether the Effective Timestamp is recorded in UTC or local time.
    - If recorded in local time, you must specify its time zone component.
  Databricks cluster time zone settings
  If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.
- Specify the End Timestamp Column and its Schema if applicable. Ensure the following:
  - If the column is recorded as a string, specify its string-based datetime format.
  - Indicate whether the End Timestamp is recorded in UTC or local time.
    - If recorded in local time, you must specify its time zone component.
- Specify the Record Creation Timestamp Column if applicable.
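The reason a string-recorded timestamp needs both a datetime format and a time zone component can be shown with a short, hedged sketch. It uses Python's own `strptime` directives purely for illustration; the format syntax FeatureByte expects may differ, and the `to_utc` helper, the sample value, and the chosen time zone are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(raw, fmt, tz_name=None):
    """Parse a timestamp string and return it as an aware UTC datetime.

    If `tz_name` is None, the string is assumed to already be in UTC.
    Otherwise it is interpreted as local time in the named IANA time zone.
    """
    parsed = datetime.strptime(raw, fmt)
    zone = ZoneInfo(tz_name) if tz_name else timezone.utc
    return parsed.replace(tzinfo=zone).astimezone(timezone.utc)


# The same string maps to different instants depending on the declared zone.
fmt = "%Y-%m-%d %H:%M:%S"
as_utc = to_utc("2024-01-15 09:00:00", fmt)
as_new_york = to_utc("2024-01-15 09:00:00", fmt, "America/New_York")
```

Without the time zone component, the two interpretations above are indistinguishable, which is why the schema asks for it whenever the column holds local time.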
Step 7: Register the CONSUMER_INSTALLMENTS table as a Time Series Table
- Select the CONSUMER_INSTALLMENTS table.
- Click the button to register the table.
- Set the table type as Time Series Table.
- Specify the Reference Datetime Column and its Schema. Ensure the following:
  - If the column is recorded as a string, specify its string-based datetime format.
  - Indicate whether the Reference Datetime is recorded in UTC or local time.
  - If recorded in local time, specify its time zone component.
  - If recorded in UTC, specify the time zone component to convert it to local time.
  Databricks cluster time zone settings
  If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.
- If a time zone column is used to assign individual time zones per record, specify the reference time zone. This should be the westernmost time zone among those specified in the column.
  Westernmost Time Zone Example
  Suppose you have a dataset with a user_time_zone column, where users are located in different time zones such as America/New_York, America/Chicago, and America/Los_Angeles. The reference time zone should be America/Los_Angeles, as it is the westernmost among them.
- Specify the Series ID if applicable and the time interval.
- Specify the Record Creation Timestamp Column if applicable.
- Establish a Default Feature Job Setting compatible with the series time interval and the series data availability.
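If the time zone column holds many distinct values, the westernmost one can be found programmatically before filling in the form. The sketch below is an assumption-laden illustration rather than a FeatureByte utility: the `westernmost` helper is hypothetical, and it approximates "westernmost" as the zone with the smallest UTC offset at a chosen instant.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def westernmost(tz_names, at=None):
    """Return the IANA zone with the smallest UTC offset at instant `at`.

    Assumption: the smallest UTC offset stands in for "westernmost".
    `at` defaults to the current time; pass a fixed instant for repeatable results.
    """
    at = at or datetime.now(timezone.utc)
    return min(tz_names, key=lambda name: at.astimezone(ZoneInfo(name)).utcoffset())


zones = ["America/New_York", "America/Chicago", "America/Los_Angeles"]
reference = westernmost(zones, at=datetime(2024, 1, 15, tzinfo=timezone.utc))
```

For the three zones in the example, the sketch selects America/Los_Angeles. Offsets shift with daylight saving time, so it is worth checking the result at more than one instant when zones sit close together.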
Step 8: Review Registered Tables
Verify the registration by checking the Table Catalog under the Explore section.