2. Register Tables
Our catalog is created and we can start registering tables in it.
Step 1: Select Data¶
We'll work with the following seven source tables of our Loan Applications Dataset:
Table | Description |
---|---|
NEW_APPLICATION | Contains information about new loan applications submitted by clients. |
CLIENT_PROFILE | Describes the profile for each client. |
BUREAU | Lists credits taken by clients from other financial institutions, as reported to the credit bureau. |
PREVIOUS_APPLICATION | Details prior loan applications by the client. |
INSTALLMENTS_PAYMENTS | Logs monthly installments for loans at the time of payment. |
LOAN_STATUS | Tracks whether loan was terminated. |
CREDIT_CARD_MONTHLY_BALANCE | Monthly balance summaries for credit cards |
Step 2: Locate Your data¶
From the menu, go to the Explore section and access the Source Tables.
Go to the DEMO_DATASETS database, and select the LOAN_APPLICATIONS schema.
Step 3: Explore Tables¶
To help you get familiar with the data in the LOAN_APPLICATIONS schema, select tables you are interested in.
Click to generate a summary for each table. Click
to read full summary.
Click one table to view its columns and preview its values.
Step 4: Understand Table Types¶
For accurate feature derivation, FeatureByte needs to recognize the roles of different tables.
Each table should be assigned a specific type based on its structure and purpose:
-
NEW_APPLICATION --> Dimension table.
Why Dimesion Table?
While we could have registered this as an Event table, this table contains only a subset of the applications. By setting it as a Dimension Table, we disable aggregations, preventing potentially non-meaningful computations.
-
CLIENT_PROFILE --> Slowly Changing Dimension (SCD) table.
Why Slowly Changing Dimension Table?
The table tracks the client profile that changes over time, making it a Slowly Changing Dimension (SCD) Table.
-
BUREAU --> Event table.
Why Event Table?
The table captures event-based updates from the credit bureau regarding external credit activities.
-
PREVIOUS_APPLICATION --> Event table.
Why Event Table?
The table records events tied to decisions made on previous loan applications.
-
LOAN_STATUS --> Slowly Changing Dimension (SCD) table.
Why Slowly Changing Dimension Table?
The table tracks loan statuses and dynamic fields that change over time, making it a Slowly Changing Dimension (SCD) Table.
-
INSTALLMENTS_PAYMENTS --> Event table.
Why Event Table?
The table logs discrete payment events for loan installments.
-
CREDIT_CARD_MONTHLY_BALANCE --> Time Series table.
Why Time Series Table?
The table provides monthly snapshots of credit cards usage, aligning well with time series.
Note
If you are interested in a use case that exploits item table, check out the Grocery UI Tutorials.
Step 5: Register the NEW_APPLICATION table¶
-
Select the NEW_APPLICATION table.
-
Click on
to start registration.
-
Set the table type as Dimension Table.
-
Specify its Dimension ID Column. Here:
SK_ID_CURR
. -
Specify the Record Creation Timestamp Column if applicable. Here:
available_at
. -
Complete registration by clicking
.
Step 6: Register the CLIENT_PROFILE table as a SCD Table¶
-
Select the CLIENT_PROFILE table.
-
Click on
-
Click on
to get assistance.
-
Accept suggestions to:
- Set the table type as Slowly Changing Dimension Table.
- Identify its Natural Key Column, Surrogate Key Column and Current Flag Column if applicable.
-
Specify the Effective Timestamp Column and its Schema. Ensure the following:
- If the column is recorded as a string, specify its string-based datetime format.
-
Indicate whether the Effective Timestamp is recorded in UTC or local time.
- If recorded in local time, you must specify its time zone component.
-
Specify End Timestamp Column and its Schema if applicable. Ensure the following:
- If the column is recorded as a string, specify its string-based datetime format.
-
Indicate whether the End Timestamp is recorded in UTC or local time.
- If recorded in local time, you must specify its time zone component.
-
Specify the Record Creation Timestamp Column if applicable.
Databricks cluster time zone settings
If you are using Databricks, keep in mind that FeatureByte retrieves timestamps exactly as they are stored, without adjusting for your Databricks cluster's time zone settings.
-
Complete registration by clicking
.
Step 7: Register the BUREAU table as an Event Table¶
-
Select the BUREAU table.
-
Click on
.
-
Click on
to get assistance.
-
Accept suggestions to:
-
Set the table type as Event Table.
-
Identify the Event Timestamp Column and confirm its schema.
Local Date Parts
The Time Zone is used to extract date parts (e.g., hour of the day, weekday) in local time.
-
Specify the Event ID Column if applicable.
-
Specify the Record Creation Timestamp Column if applicable.
-
-
Establish a Default Feature Job Setting, either automatically (if a Record Creation Timestamp Column is provided) or manually. We will set the job automatically by clicking
.
-
Complete registration by clicking
.
Step 8: Register the PREVIOUS_APPLICATION table as an Event Table¶
-
Select the PREVIOUS_APPLICATION table.
-
Click on
.
-
Click on
to get assistance.
-
Accept suggestions:
-
Establish a Default Feature Job Setting, either automatically (if a Record Creation Timestamp Column is provided) or manually. We will set the job automatically by clicking
.
-
Complete registration by clicking
.
Step 9: Register the LOAN_STATUS table as a SCD Table¶
-
Select the LOAN_STATUS table.
-
Click on
-
Click on
to get assistance.
-
Accept suggestions:
-
Complete registration by clicking
.
Step 10: Register the INSTALLMENTS_PAYMENTS table as an Event Table¶
-
Select the INSTALLMENTS_PAYMENTS table.
-
Click on
.
-
Click on
to get assistance.
-
Accept suggestions:
-
Establish a Default Feature Job Setting. We will set the job manually as a CRON job with a blind spot of 1 hour as follows:
-
Complete registration by clicking
.
Why a CRON job?
CRON feature job enables calendar windows which are suitable for installments that are monthly events.
Step 11: Register the CREDIT_CARD_MONTHLY_BALANCE table as a Time Series Table¶
-
Select the CREDIT_CARD_MONTHLY_BALANCE table.
-
Click on
-
Click on
to get assistance.
-
Accept suggestions except for the reference time zone to:
- Set the table type as Time Series Table.
-
Specify the Reference Datetime Column and its Schema. Ensure the following:
- If the column is recorded as a string, specify its string-based datetime format.
-
Indicate whether the Reference Datetime is recorded in UTC or local time.
- If recorded in local time, specify its time zone component.
- If recorded in UTC, specify the time zone component to convert it to local time.
-
If a time zone column is used to assign individual time zones per record, specify the reference time zone. This should be the westernmost time zone among those specified in the column.
Westernmost Time Zone Example
Suppose you have a dataset with a
user_time_zone
column, where users are located in different time zones such asAmerica/New_York
,America/Chicago
, andAmerica/Los_Angeles
. The reference time zone should beAmerica/Los_Angeles
, as it is the westernmost among them.
-
Specify the Series ID if applicable and the time interval.
-
Specify the Record Creation Timestamp Column if applicable.
-
Set the reference time zone to America/Los Angeles:
-
Establish a Default Feature Job Setting compatible with the series time interval and the series data availability.
-
Complete registration by clicking
.
Step 12: Review Registered Tables¶
Verify the registration by checking the Table Catalog under the 'Explore' section.