7. Create Observation Tables
What is an Observation Table?
An Observation Table is a structured collection of historical data points that acts as the foundation for training datasets. By adding features, you can create Feature Tables that can be used to train and validate Machine Learning models.
Each data point represents a specific historical moment for a particular entity and may also include target values. Observation Tables are often utilized across experiments within the same use case, even if selected features and models vary.
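As a rough illustration of the structure described above, an observation can be modeled as a point in time, an entity key, and an optional target. The sketch below is plain Python, not the FeatureByte SDK. The field names mirror the columns used later in this guide, and the row values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One row of an observation table (illustrative model only)."""
    point_in_time: datetime             # historical moment the row refers to
    sk_id_curr: int                     # key identifying the entity (a loan application)
    loan_default: Optional[int] = None  # target value, when one is attached


# Made-up rows for illustration; not taken from the demo dataset.
observations = [
    Observation(datetime(2024, 3, 15, 10, 30), 100001, 0),
    Observation(datetime(2024, 9, 2, 8, 0), 100002, 1),
    Observation(datetime(2024, 10, 20, 14, 45), 100003),  # no target attached
]

# Rows that already carry a target value.
labeled = [obs for obs in observations if obs.loan_default is not None]
print(len(labeled))
```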
How to create an Observation Table?
You can either upload an Observation Table from a Parquet or CSV file, or create it from a Source Table.
This guide explains how to configure Observation Tables from a Source Table and link them to our Credit Default context and use case.
We will create four Observation Tables:
- Applications up to Sept 2024 with Loan Defaults: Credit Default Observations for TRAINING up to Sept 2024.
- Applications Oct 2024 with Loan Defaults: Credit Default Observations for HOLDOUT (Oct 2024).
- 50K applications: Credit Default Observations for EDA.
- Applications Preview: 50 Credit Default Observations for Feature PREVIEW.
Check out
For an example of how to upload an Observation Table, see the Grocery UI Tutorial. That tutorial also covers how to add Target values when your target has been registered with a logical approach.
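Before uploading a file, it can help to confirm that it contains the columns you expect the Observation Table to carry. The check below is a minimal sketch in plain Python. It assumes the column names used in this guide (POINT_IN_TIME and SK_ID_CURR); `missing_columns` is a hypothetical helper, not part of FeatureByte, and the sample content is made up.

```python
import csv
import io

REQUIRED_COLUMNS = {"POINT_IN_TIME", "SK_ID_CURR"}  # assumed from this guide


def missing_columns(csv_text: str, required=REQUIRED_COLUMNS) -> set:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return set(required) - {name.strip() for name in header}


# Made-up sample content for illustration.
sample = "POINT_IN_TIME,SK_ID_CURR,Loan_Default\n2024-03-15 10:30:00,100001,0\n"
print(missing_columns(sample))                  # nothing is missing
print(missing_columns("SK_ID_CURR\n100001\n"))  # POINT_IN_TIME is missing
```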
Step 1: Navigate to Observation Table Catalog
From the menu, navigate to the 'Formulate' section and select the Observation Table catalog.
Step 2: Create Observation Tables from a Source Table
- Click [button].
- Select the 'Derive from Source Table' tab and click [button].
- In the Source Table catalog, find OBSERVATIONS_WITH_TARGET and OBSERVATION_EDA_TABLE under the DEMO_DATASETS database and the CREDIT_DEFAULT schema, then click [button].
Step 2a: Create 'Applications up to Sept 2024 with Loan Defaults' table:
- In the Source Table catalog, select OBSERVATIONS_WITH_TARGET and click [button].
- Set the table as follows:
    - Name: "Applications up to Sept 2024 with Loan Defaults"
    - Description: "Credit Default Observations for training up to Sept 2024."
    - Purpose: Training
    - Sample Rows: 0
    - Primary Entity: New Application
    - Sampling Date Range: January 1, 2018 - October 1, 2024
- Columns to Include:
    - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
    - Original Column Name: SK_ID_CURR --> New Column Name: SK_ID_CURR
    - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
Disable sampling
To disable sampling, set the Sample Rows to 0.
- Ensure Loan_Default is identified as the target.
- Click [button] to save the table.
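To make the effect of the Sampling Date Range concrete, the sketch below filters rows by POINT_IN_TIME in plain Python. It assumes the end of the range is exclusive, which would be consistent with a range ending on October 1, 2024 for observations "up to Sept 2024" but is not confirmed by this guide. `in_range` is a hypothetical helper and the rows are made up.

```python
from datetime import datetime

TRAIN_START = datetime(2018, 1, 1)
TRAIN_END = datetime(2024, 10, 1)  # assumed to be exclusive


def in_range(point_in_time: datetime, start: datetime, end: datetime) -> bool:
    """True when start <= point_in_time < end (half-open interval)."""
    return start <= point_in_time < end


# Made-up rows: (POINT_IN_TIME, SK_ID_CURR, Loan_Default)
rows = [
    (datetime(2017, 12, 31, 23, 59), 100001, 0),  # before the range
    (datetime(2024, 9, 30, 23, 59), 100002, 1),   # last day of Sept 2024
    (datetime(2024, 10, 1, 0, 0), 100003, 0),     # first instant of Oct 2024
]

training_rows = [row for row in rows if in_range(row[0], TRAIN_START, TRAIN_END)]
print([row[1] for row in training_rows])
```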
Step 2b: Create 'Applications Oct 2024 with Loan Defaults' table:
- In the Source Table catalog, select OBSERVATIONS_WITH_TARGET and click [button].
- Set the holdout observation table as follows:
    - Name: "Applications Oct 2024 with Loan Defaults"
    - Description: "Credit Default Observations for holdout (Oct 2024)"
    - Purpose: Validation-Test
    - Sample Rows: 0
    - Primary Entity: New Application
    - Sampling Date Range: October 1, 2024 - November 1, 2024
- Columns to Include:
    - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
    - Original Column Name: SK_ID_CURR --> New Column Name: SK_ID_CURR
    - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
- Click [button] to save the table.
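The training and holdout ranges share the boundary date October 1, 2024. Under the assumption that range ends are exclusive (not confirmed by this guide), the two tables would not share any observations. A small plain-Python check of that reasoning, using a hypothetical `ranges_overlap` helper:

```python
from datetime import datetime


def ranges_overlap(a_start, a_end, b_start, b_end) -> bool:
    """True when half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


training = (datetime(2018, 1, 1), datetime(2024, 10, 1))
holdout = (datetime(2024, 10, 1), datetime(2024, 11, 1))

# With exclusive ends, the two ranges do not intersect.
print(ranges_overlap(*training, *holdout))
# The holdout range starts exactly where the training range ends.
print(training[1] == holdout[0])
```

If the range end were inclusive instead, observations stamped exactly at midnight on October 1, 2024 could land in both tables, so it is worth verifying the behavior on your own data.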
Step 2c: Create '50K applications' (up to Oct 2024) table:
- In the Source Table catalog, select OBSERVATION_EDA_TABLE and click [button].
- Set the EDA observation table as follows:
    - Name: "50K applications"
    - Description: "Credit Default Observations for EDA."
    - Purpose: EDA
    - Sample Rows: 0
    - Primary Entity: New Application
    - Sampling Date Range: None
- Columns to Include:
    - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
    - Original Column Name: SK_ID_CURR --> New Column Name: SK_ID_CURR
    - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
- Click [button] to save the table.
Step 2d: Create 'Applications Preview' table:
- In the Source Table catalog, select OBSERVATION_EDA_TABLE and click [button].
- Set the preview observation table as follows:
    - Name: "Applications Preview"
    - Description: "50 Credit Default Observations for preview."
    - Purpose: Preview
    - Sample Rows: 50
    - Primary Entity: New Application
    - Sampling Date Range: None
- Columns to Include:
    - Original Column Name: POINT_IN_TIME --> New Column Name: POINT_IN_TIME
    - Original Column Name: SK_ID_CURR --> New Column Name: SK_ID_CURR
    - Original Column Name: Loan_Default --> New Column Name: Loan_Default (as Target)
- Click [button] to save the table.
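The Sample Rows setting differs between the tables above: 0 keeps every row, while 50 limits the Preview table to 50 rows. A minimal plain-Python sketch of that behavior follows. This guide does not describe how FeatureByte selects the sampled rows, so the seeded random sampling and the `sample_rows` helper are assumptions made for illustration only.

```python
import random


def sample_rows(rows: list, sample_size: int, seed: int = 0) -> list:
    """Return all rows when sample_size is 0, else a random sample of at most sample_size rows."""
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    if sample_size == 0 or sample_size >= len(rows):
        return list(rows)  # sampling disabled, or nothing to cut
    return random.Random(seed).sample(rows, sample_size)


rows = list(range(50_000))  # stand-in row ids, not real data

print(len(sample_rows(rows, 0)))   # sampling disabled: every row is kept
print(len(sample_rows(rows, 50)))  # preview-sized sample
```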
Step 3: Link Observation Tables to a Context
- Navigate to the Context Catalog and select the "New Loan Application" context.
- In the 'About' tab, click [button] under the Observation tables section and select one of the four Observation Tables. Confirm the selection by clicking [button].
- Repeat this step for each of the four Observation Tables.
Step 4: Link Observation Tables to a Use Case
- Navigate to the Use Case Catalog and select the "Loan Default by client" use case.
- In the 'About' tab, click [button] under the Observation tables section and select one of the four Observation Tables. Confirm the selection by clicking [button].
- Repeat this step for each of the four Observation Tables.
- Set "50K applications" as the EDA Table.
- Set "Applications Preview" as the Preview Table.
Step 5: Check Observation Tables
Confirm that all four Observation Tables were registered by reviewing both the Use Case Catalog and the Observation Table Catalog.
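One way to make this check systematic is to compare the names shown in the catalog against the four names created in this guide. The sketch below is plain Python. `listed_names` stands in for whatever names you read off the Observation Table Catalog; it is a hypothetical input, not the result of a FeatureByte call.

```python
# Table names and purposes as configured in Steps 2a to 2d.
EXPECTED_TABLES = {
    "Applications up to Sept 2024 with Loan Defaults": "Training",
    "Applications Oct 2024 with Loan Defaults": "Validation-Test",
    "50K applications": "EDA",
    "Applications Preview": "Preview",
}


def missing_tables(listed_names) -> list:
    """Return the expected table names that do not appear in listed_names, sorted."""
    return sorted(set(EXPECTED_TABLES) - set(listed_names))


# Hypothetical catalog listing in which one table has not been created yet.
listed_names = [
    "Applications up to Sept 2024 with Loan Defaults",
    "Applications Oct 2024 with Loan Defaults",
    "50K applications",
]
print(missing_tables(listed_names))
```

An empty list means every table from this guide appears in the listing.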