
2. Register Tables

Now that the catalog is created, we can start registering tables in it.


Step 1: Select Data

We'll use the following four source tables from the Credit Default Dataset:

Table                  Description
NEW_APPLICATION        Records new loan applications.
PRIOR_APPLICATIONS     Contains data on prior loan applications and the final decision.
CONSUMER_LOAN_STATUS   Tracks the status of consumer loans.
CONSUMER_INSTALLMENTS  Logs monthly installments for consumer loans at the time of payment.

Two source tables are left for you to explore.

Table              Description
CASH_LOAN_STATUS   Tracks the status of cash loans.
CASH_INSTALLMENTS  Logs monthly installments for cash loans at the time of payment.

Step 2: Locate Your Data

From the menu, go to the Explore section and access the Source Tables.

You will find the four tables under the DEMO_DATASETS database and the CREDIT_DEFAULT schema.



Step 3: Understand Table Types

For accurate feature derivation, FeatureByte needs to recognize the roles of different tables.

Each table should be assigned a specific type based on its structure and purpose:

  • NEW_APPLICATION --> Dimension table.

    Why Dimension Table?

    While we could have registered this as an Event table, this table contains only a subset of the applications. By setting it as a Dimension Table, we disable aggregations, preventing potentially non-meaningful computations.

  • PRIOR_APPLICATIONS --> Event table.

    Why Event Table?

    The table records final decision events for prior applications, making it suitable for an Event Table designation.

  • CONSUMER_LOAN_STATUS --> Slowly Changing Dimension (SCD) table.

    Why Slowly Changing Dimension Table?

    The table tracks loan statuses and dynamic fields that change over time, making it a Slowly Changing Dimension (SCD) Table.

  • CONSUMER_INSTALLMENTS --> Time Series table.

    Why Time Series Table?

    While we could have registered this as an Event Table, installment payments occur monthly. By defining it as a Time Series Table, we ensure calendar month aggregation, aligning with the event frequency.
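
The calendar month aggregation mentioned for CONSUMER_INSTALLMENTS can be illustrated with a minimal Python sketch that uses only the standard library. The records, the loan ID, and the tuple layout are invented for illustration and are not taken from the dataset. The sketch shows the general idea of bucketing by calendar month, not how FeatureByte computes features internally.

```python
from collections import defaultdict
from datetime import date

# Hypothetical installment records: (loan_id, payment_date, amount).
# These values are invented for illustration only.
installments = [
    ("L001", date(2024, 1, 15), 200.0),
    ("L001", date(2024, 1, 31), 50.0),
    ("L001", date(2024, 2, 1), 250.0),
    ("L001", date(2024, 3, 14), 250.0),
]


def sum_by_calendar_month(records):
    """Sum amounts per (loan_id, year, month) bucket."""
    totals = defaultdict(float)
    for loan_id, paid_on, amount in records:
        # The bucket is the calendar month, not a rolling 30-day window.
        totals[(loan_id, paid_on.year, paid_on.month)] += amount
    return dict(totals)


monthly_totals = sum_by_calendar_month(installments)
for key in sorted(monthly_totals):
    print(key, monthly_totals[key])
```

The payments on January 31 and February 1 are one day apart but land in different buckets, which is the behavior a calendar-aligned aggregation is meant to give for monthly installments.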

Note

If you are interested in a use case that uses an Item table, check out the Grocery UI Tutorials.


Step 4: Register the NEW_APPLICATION table as a Dimension Table

  1. Select the NEW_APPLICATION table.

  2. Click on Register Table.

  3. Set the table type as Dimension Table.

  4. Specify its Dimension ID Column.

  5. Specify the Record Creation Timestamp Column if applicable.




Step 5: Register the PRIOR_APPLICATIONS table as an Event Table

  1. Select the PRIOR_APPLICATIONS table.

  2. Click on Register Table.

  3. Set the table type as Event Table.

  4. Identify the Event Timestamp Column.

    The Event Timestamp Column must be a UTC Timestamp or a Snowflake TIMESTAMP_TZ. Support for string-based datetime format and local time records will be added soon.

    Databricks cluster time zone settings

    If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.

  5. Specify the Event ID Column if applicable.

  6. Select the Event Time Zone Offset if applicable.

    Local Date Parts

    The Time Zone Offset is used to extract date parts (e.g., hour of the day, weekday) in local time. Support for daylight saving time (DST) will be added soon.

  7. Specify the Record Creation Timestamp Column if applicable.

  8. Establish a Default Feature Job Setting, either automatically (if a Record Creation Timestamp Column is provided) or manually.
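
To make the role of the Event Time Zone Offset in item 6 concrete, here is a small Python sketch using only the standard library. It shifts a UTC timestamp by a fixed offset and reads the local hour and weekday. The `+HH:MM` offset string format, the function names, and the sample timestamp are assumptions made for this illustration; this is not FeatureByte's own implementation.

```python
from datetime import datetime, timedelta, timezone

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def parse_offset(offset: str) -> timezone:
    """Turn an offset string such as '+08:00' or '-05:30' into a tzinfo."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


def local_date_parts(utc_timestamp: datetime, offset: str) -> dict:
    """Shift a UTC timestamp by a fixed offset and read local date parts."""
    local = utc_timestamp.astimezone(parse_offset(offset))
    return {"hour": local.hour, "weekday": WEEKDAY_NAMES[local.weekday()]}


# Sample event stored in UTC (invented for illustration).
event_time = datetime(2024, 3, 1, 22, 30, tzinfo=timezone.utc)
utc_parts = local_date_parts(event_time, "+00:00")
local_parts = local_date_parts(event_time, "+08:00")
print(utc_parts)
print(local_parts)
```

With a `+08:00` offset, the same event moves from late Friday evening in UTC to early Saturday morning in local time, so both the hour and the weekday differ. Because the offset is fixed, this sketch does not account for daylight saving time.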




Step 6: Register the CONSUMER_LOAN_STATUS table as an SCD Table

  1. Select the CONSUMER_LOAN_STATUS table.

  2. Click on Register Table.

  3. Set the table type as Slowly Changing Dimension Table.

  4. Identify its Natural Key Column, Surrogate Key Column and Current Flag Column if applicable.

  5. Specify the Effective Timestamp Column and its Schema. Ensure the following:

    Databricks cluster time zone settings

    If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.

  6. Specify the End Timestamp Column and its Schema if applicable.

  7. Specify the Record Creation Timestamp Column if applicable.
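
The purpose of the Natural Key, Effective Timestamp, and End Timestamp columns can be sketched with a generic as-of lookup on a Slowly Changing Dimension. The rows, the status values, and the inclusive-start, exclusive-end convention below are assumptions chosen for illustration. This shows the general SCD idea in plain Python, not how FeatureByte performs the lookup.

```python
from datetime import datetime
from typing import Optional

# Hypothetical SCD rows, one per version of a loan's status:
# (natural_key, status, effective_timestamp, end_timestamp).
# An end_timestamp of None marks the version that is still current.
rows = [
    ("LOAN_1", "ACTIVE", datetime(2024, 1, 1), datetime(2024, 4, 1)),
    ("LOAN_1", "OVERDUE", datetime(2024, 4, 1), datetime(2024, 6, 1)),
    ("LOAN_1", "CLOSED", datetime(2024, 6, 1), None),
]


def status_as_of(records, natural_key: str, point_in_time: datetime) -> Optional[str]:
    """Return the status whose validity interval contains point_in_time."""
    for key, status, effective, end in records:
        if key != natural_key:
            continue
        # Assumed convention: valid from the effective timestamp (inclusive)
        # up to the end timestamp (exclusive), or indefinitely if end is None.
        if effective <= point_in_time and (end is None or point_in_time < end):
            return status
    return None


print(status_as_of(rows, "LOAN_1", datetime(2024, 2, 15)))
print(status_as_of(rows, "LOAN_1", datetime(2024, 7, 1)))
```

The natural key identifies the loan across versions, while the timestamps decide which version was in effect at a given moment. A lookup before the first effective timestamp returns no status.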




Step 7: Register the CONSUMER_INSTALLMENTS table as a Time Series Table

  1. Select the CONSUMER_INSTALLMENTS table.

  2. Click on Register Table.

  3. Set the table type as Time Series Table.

  4. Specify the Reference Datetime Column and its Schema. Ensure the following:

    Databricks cluster time zone settings

    If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.

    • If a time zone column is used to assign individual time zones per record, specify the reference time zone. This should be the westernmost time zone among those specified in the column.

      Westernmost Time Zone Example

      Suppose you have a dataset with a user_time_zone column, where users are located in different time zones such as America/New_York, America/Chicago, and America/Los_Angeles. The reference time zone should be America/Los_Angeles, as it is the westernmost among them.

  5. Specify the Series ID, if applicable, and the time interval.

  6. Specify the Record Creation Timestamp Column if applicable.

  7. Establish a Default Feature Job Setting compatible with the series time interval and the series data availability.
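
The westernmost time zone example in item 4 can be checked with a short Python sketch. It interprets "westernmost" as the zone with the smallest (most negative) standard-time UTC offset, which is an assumption of this illustration. The offsets are hard-coded for the three zones named in the example, and the function and variable names are hypothetical.

```python
# Standard-time UTC offsets in hours for the zones in the example.
# Hard-coded for illustration; a real check might derive them from a
# time zone database instead.
STANDARD_UTC_OFFSET_HOURS = {
    "America/New_York": -5,
    "America/Chicago": -6,
    "America/Los_Angeles": -8,
}


def westernmost(time_zones):
    """Pick the zone with the smallest (most negative) UTC offset."""
    return min(time_zones, key=lambda tz: STANDARD_UTC_OFFSET_HOURS[tz])


# Distinct values found in a hypothetical user_time_zone column.
zones_in_column = ["America/New_York", "America/Chicago", "America/Los_Angeles"]
reference_time_zone = westernmost(zones_in_column)
print(reference_time_zone)
```

For the three zones in the example, this selects America/Los_Angeles, matching the reference time zone given in the text.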




Step 8: Review Registered Tables

Verify the registration by checking the Table Catalog under the Explore section.
