7. Create Observation Tables
What is an Observation Table?
An Observation Table is a structured collection of historical data points that acts as the foundation for training datasets. By adding features, you can create Feature Tables that can be used to train and validate Machine Learning models.
Each data point pairs a particular entity with a specific historical moment and may also include a target value. Observation Tables are often reused across experiments within the same use case, even when the selected features and models differ.
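To make that structure concrete, here is a minimal Python sketch of the rows such a table holds in this tutorial: a point in time, the New Application entity key and an optional target. The `Observation` class and the application IDs are hypothetical illustrations. Only the column names POINT_IN_TIME, NEW_APPLICATION_ID and Loan_Default come from this guide, and the 0/1 encoding of Loan_Default is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One row of an Observation Table (hypothetical model, not a FeatureByte class)."""

    point_in_time: datetime             # POINT_IN_TIME: the historical moment
    new_application_id: str             # NEW_APPLICATION_ID: the primary entity key
    loan_default: Optional[int] = None  # Loan_Default: optional target (assumed 0/1)


observations: List[Observation] = [
    Observation(datetime(2019, 3, 14, 9, 30), "APP_0001", 0),
    Observation(datetime(2021, 7, 2, 16, 5), "APP_0002", 1),
    Observation(datetime(2023, 11, 20, 11, 45), "APP_0003"),  # target not known yet
]

# Only rows that carry a target value can be used to train a supervised model.
labelled = [obs for obs in observations if obs.loan_default is not None]
print(len(labelled))  # 2
```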
How to create an Observation Table?
You can either upload an Observation Table from a Parquet or CSV file, or create it from a Source Table.
This guide explains how to configure Observation Tables from a Source Table and link them to our Credit Default context and use case.
We will create four Observation Tables:
- CREDIT_DEFAULT_TRAIN_2019_2023: Credit Default Observations for training ranging from 2019 to 2023.
- CREDIT_DEFAULT_HOLDOUT_2024_1H: Credit Default Observations for testing ranging from 2024/01/01 to 2024/07/01.
- CREDIT_DEFAULT_EDA_2019_2023: Credit Default Observations for EDA ranging from 2019 to 2023.
- CREDIT_DEFAULT_PREVIEW: 50 Credit Default Observations for Feature Preview.
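The training and holdout periods appear to be chosen so that every holdout observation comes after the training period, which gives an out-of-time test, while the EDA table covers the same period as training. A small Python sketch of that check follows. It assumes the end date of each Sampling Date Range is exclusive (the guide does not say so explicitly); `RANGES` and `overlaps` are illustrative names.

```python
from datetime import date
from typing import Tuple

DateRange = Tuple[date, date]

# Sampling date ranges from this guide, read as half-open [start, end):
# the end date is assumed to be exclusive.
RANGES = {
    "CREDIT_DEFAULT_TRAIN_2019_2023": (date(2019, 1, 1), date(2024, 1, 1)),
    "CREDIT_DEFAULT_HOLDOUT_2024_1H": (date(2024, 1, 1), date(2024, 7, 1)),
    "CREDIT_DEFAULT_EDA_2019_2023": (date(2019, 1, 1), date(2024, 1, 1)),
}


def overlaps(a: DateRange, b: DateRange) -> bool:
    """True if the half-open ranges a and b share at least one day."""
    return a[0] < b[1] and b[0] < a[1]


train = RANGES["CREDIT_DEFAULT_TRAIN_2019_2023"]
holdout = RANGES["CREDIT_DEFAULT_HOLDOUT_2024_1H"]
eda = RANGES["CREDIT_DEFAULT_EDA_2019_2023"]

print(overlaps(train, holdout))  # False: the holdout period starts where training ends
print(overlaps(train, eda))      # True: EDA looks at the training period
```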
Check out
For an example of how to upload an Observation Table, check out the Grocery UI Tutorial. The tutorial also covers how to add target values when your target has been registered with a logical approach.
Step 1: Navigate to Observation Table Catalog
From the menu, navigate to the 'Formulate' section and select the Observation Table catalog.
Step 2: Create Observation Tables from a Source Table
- Click
.
- Select the 'Derive from Source Table' tab and click
- In the Source Table catalog, we will use CREDIT_DEFAULT_FULL_OBSERVATIONS and CREDIT_DEFAULT_SAMPLE_OBSERVATIONS from the DEMO_DATASETS database and the CREDIT_DEFAULT schema, and click
.
Step 2a: Create CREDIT_DEFAULT_TRAIN_2019_2023 table:
- In the Source Table catalog, select CREDIT_DEFAULT_FULL_OBSERVATIONS and click
.
- Set the training observation table as follows:
  - Name: "CREDIT_DEFAULT_TRAIN_2019_2023"
  - Description: "Credit Default Observations for training ranging from 2019 to 2023."
  - Purpose: Training
  - Sample Rows: 0
  - Primary Entity: New Application
  - Sampling Date Range: January 1, 2019 - January 1, 2024
- Columns to Include:
  - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
  - Original Column Name: NEW_APPLICATION_ID --> New Column Name: NEW_APPLICATION_ID
  - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
Disable sampling
To disable sampling, set the Sample Rows to 0.
- Ensure Loan_Default is identified as the target.
- Click
to save the table.
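Read together, the Step 2a settings filter the source rows by POINT_IN_TIME, keep only the listed columns and, with Sample Rows set to 0, skip sampling. The Python sketch below mimics that behaviour on in-memory rows so the effect of each setting is visible. It is not FeatureByte's implementation: `TRAIN_CONFIG`, `derive` and the sample rows are hypothetical, and treating the end date as exclusive is an assumption.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]

# Step 2a settings as a plain dict (hypothetical layout, not a FeatureByte format).
TRAIN_CONFIG: Dict[str, Any] = {
    "name": "CREDIT_DEFAULT_TRAIN_2019_2023",
    "purpose": "Training",
    "sample_rows": 0,             # 0 disables sampling: every row in range is kept
    "start": datetime(2019, 1, 1),
    "end": datetime(2024, 1, 1),  # assumed exclusive
    "columns": {                  # original column name -> new column name
        "POINT_IN_TIME": "POINT_IN_TIME",
        "NEW_APPLICATION_ID": "NEW_APPLICATION_ID",
        "Loan_Default": "Loan_Default",
    },
    "target": "Loan_Default",
}


def derive(rows: List[Row], cfg: Dict[str, Any]) -> List[Row]:
    """Keep rows with start <= POINT_IN_TIME < end, then keep/rename the listed columns."""
    if cfg["sample_rows"] != 0:
        raise NotImplementedError("this sketch only covers Sample Rows = 0")
    kept = [r for r in rows if cfg["start"] <= r["POINT_IN_TIME"] < cfg["end"]]
    return [{new: r[old] for old, new in cfg["columns"].items()} for r in kept]


source = [  # hypothetical rows; EXTRA stands for any column that is not selected
    {"POINT_IN_TIME": datetime(2018, 12, 31), "NEW_APPLICATION_ID": "A1", "Loan_Default": 0, "EXTRA": 1},
    {"POINT_IN_TIME": datetime(2020, 5, 1), "NEW_APPLICATION_ID": "A2", "Loan_Default": 1, "EXTRA": 2},
    {"POINT_IN_TIME": datetime(2023, 12, 31, 23, 59), "NEW_APPLICATION_ID": "A3", "Loan_Default": 0, "EXTRA": 3},
    {"POINT_IN_TIME": datetime(2024, 1, 1), "NEW_APPLICATION_ID": "A4", "Loan_Default": 1, "EXTRA": 4},
]

train = derive(source, TRAIN_CONFIG)
print([r["NEW_APPLICATION_ID"] for r in train])  # ['A2', 'A3']
```

The same pattern carries over to the holdout and EDA tables by swapping in their names and date ranges.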
Step 2b: Create CREDIT_DEFAULT_HOLDOUT_2024_1H table:
- In the Source Table catalog, select CREDIT_DEFAULT_FULL_OBSERVATIONS and click
.
- Set the holdout observation table as follows:
  - Name: "CREDIT_DEFAULT_HOLDOUT_2024_1H"
  - Description: "Credit Default Observations for testing - 2024 1st semester."
  - Purpose: Validation-Test
  - Sample Rows: 0
  - Primary Entity: New Application
  - Sampling Date Range: January 1, 2024 - July 1, 2024
- Columns to Include:
  - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
  - Original Column Name: NEW_APPLICATION_ID --> New Column Name: NEW_APPLICATION_ID
  - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
- Click
to save the table.
Step 2c: Create CREDIT_DEFAULT_EDA_2019_2023 table:
- In the Source Table catalog, select CREDIT_DEFAULT_SAMPLE_OBSERVATIONS and click
.
- Set the EDA observation table as follows:
  - Name: "CREDIT_DEFAULT_EDA_2019_2023"
  - Description: "Credit Default Observations for EDA ranging from 2019 to 2023."
  - Purpose: EDA
  - Sample Rows: 0
  - Primary Entity: New Application
  - Sampling Date Range: January 1, 2019 - January 1, 2024
- Columns to Include:
  - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
  - Original Column Name: NEW_APPLICATION_ID --> New Column Name: NEW_APPLICATION_ID
  - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
- Click
to save the table.
Step 2d: Create CREDIT_DEFAULT_PREVIEW table:
- In the Source Table catalog, select CREDIT_DEFAULT_SAMPLE_OBSERVATIONS and click
.
- Set the preview observation table as follows:
  - Name: "CREDIT_DEFAULT_PREVIEW"
  - Description: "50 Credit Default Observations for preview."
  - Purpose: Preview
  - Sample Rows: 50
  - Primary Entity: New Application
  - Sampling Date Range: January 1, 2019 - January 1, 2024
- Columns to Include:
  - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
  - Original Column Name: NEW_APPLICATION_ID --> New Column Name: NEW_APPLICATION_ID
  - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
- Click
to save the table.
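A Sample Rows value of 50 limits the preview table to a small random subset of the source rows. The guide does not say how those rows are drawn; the Python sketch below assumes uniform sampling without replacement and uses a fixed seed so that the same 50 rows come back on every run. `sample_rows` and the generated rows are hypothetical.

```python
import random
from datetime import datetime, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


def sample_rows(rows: List[Row], n: int, seed: int = 42) -> List[Row]:
    """Draw n distinct rows at random; n = 0 (or n >= len(rows)) keeps all rows."""
    if n == 0 or n >= len(rows):
        return list(rows)
    # A seeded generator makes the draw reproducible across runs.
    return random.Random(seed).sample(rows, n)


# Hypothetical stand-in for the CREDIT_DEFAULT_SAMPLE_OBSERVATIONS rows in range.
start = datetime(2019, 1, 1)
rows = [
    {"POINT_IN_TIME": start + timedelta(days=i), "NEW_APPLICATION_ID": f"APP_{i:04d}"}
    for i in range(1000)
]

preview = sample_rows(rows, 50)
print(len(preview))  # 50
```

Fixing the seed trades randomness for repeatability, which is usually what you want when the same preview table is reused across experiments.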
Step 3: Link Observation Tables to a Context
- Navigate to the Context Catalog and select the "New Loan Application (Early Stage)" context.
- In the 'About' tab, click
under the Observation tables section and select one of the four Observation Tables. Confirm the selection by clicking
.
- Repeat for each of the four Observation Tables.
Step 4: Link Observation Tables to a Use Case
- Navigate to the Use Case Catalog and select the "Loan Default by client" use case.
- In the 'About' tab, click
under the Observation tables section and select one of the four Observation Tables. Confirm the selection by clicking
.
- Repeat for each of the four Observation Tables.
- Set "CREDIT_DEFAULT_EDA_2019_2023" as the EDA Table.
- Set "CREDIT_DEFAULT_PREVIEW" as the Preview Table.
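The steps above link all four tables to the use case and then assign two of them to specific roles. The Python sketch below models that bookkeeping, using CREDIT_DEFAULT_PREVIEW, the name given to the preview table in Step 2d. The `UseCase` class and its methods are hypothetical, and the rule that a table must be linked before it can become the EDA or Preview Table is an assumption made for the sketch, not documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class UseCase:
    """Hypothetical model of a use case's observation-table settings (not the FeatureByte API)."""

    name: str
    observation_tables: Set[str] = field(default_factory=set)
    eda_table: Optional[str] = None
    preview_table: Optional[str] = None

    def link(self, table: str) -> None:
        self.observation_tables.add(table)

    def _require_linked(self, table: str) -> None:
        # Assumption: a default table must already be linked to the use case.
        if table not in self.observation_tables:
            raise ValueError(f"{table!r} is not linked to {self.name!r}")

    def set_eda_table(self, table: str) -> None:
        self._require_linked(table)
        self.eda_table = table

    def set_preview_table(self, table: str) -> None:
        self._require_linked(table)
        self.preview_table = table


use_case = UseCase("Loan Default by client")
for table in (
    "CREDIT_DEFAULT_TRAIN_2019_2023",
    "CREDIT_DEFAULT_HOLDOUT_2024_1H",
    "CREDIT_DEFAULT_EDA_2019_2023",
    "CREDIT_DEFAULT_PREVIEW",
):
    use_case.link(table)

use_case.set_eda_table("CREDIT_DEFAULT_EDA_2019_2023")
use_case.set_preview_table("CREDIT_DEFAULT_PREVIEW")
print(use_case.eda_table, use_case.preview_table)
```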
Step 5: Check Observation Tables
Check successful registration by reviewing the Use Case Catalog.
Check successful registration by reviewing the Observation Table Catalog.