10. Compute Feature Table
In this tutorial, you will learn how to generate a Feature Table from a feature list created in previous tutorials. First, we will upload an observation table. Once the feature table is generated, we will train a LightGBM model on the new training data.
Step 0: Upload a Training Observation Table
- Navigate to the 'Formulate' section, select the Observation Table catalog, and click the button to add a new table.
- Select the 'Upload file' tab and configure the table:
  - Name: "In_Store_Customer_2023_2024_20K"
  - Purpose: "Training", as we will use the table to train a model
  - Primary Entity: "customer"
  - Target: Leave blank
  - CSV/Parquet: "In-Store_Customer_2023_2024_20K_sample.parquet", which you can download here.
- Select the uploaded table from the catalog, go to the 'About' tab, and choose "In-Store Customer" from the Context dropdown menu.
- Click the button to add the target, then select the target and name the new table, e.g., "Pre_Purchase_Customer_Activity_next_week_2023_2024_20K".
- Select "Pre_Purchase_Customer_Activity_next_week_2023_2024_20K" in the catalog.
- Choose "Customer Activity Next Week before a purchase" from the Use Case dropdown menu.
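An observation table is simply a set of (entity, point-in-time) rows. The sketch below shows a minimal pandas sanity check of such a table before upload; the `POINT_IN_TIME` and `CUSTOMER_ID` column names and the sample values are assumptions for illustration, not taken from the tutorial's file.

```python
import pandas as pd

# Hypothetical observation table: one row per (customer, point in time).
df = pd.DataFrame(
    {
        "POINT_IN_TIME": pd.to_datetime(["2023-07-02", "2023-07-09", "2024-06-30"]),
        "CUSTOMER_ID": ["c-001", "c-042", "c-777"],
    }
)

# Sanity checks before using the table for training: timestamps fall in the
# intended 2023-2024 range and every row has an entity key.
assert df["POINT_IN_TIME"].between("2023-01-01", "2024-12-31").all()
assert df["CUSTOMER_ID"].notna().all()
print(len(df))  # number of observation points
```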
Step 1: Compute Feature Table
Navigate to the Feature List Catalog in the 'Experiment' section of the menu.
For each feature list, follow these steps:
- Click the button to compute the feature table.
- Select the Observation Table "Pre_Purchase_Customer_Activity_next_week_2023_2024_20K".
- Confirm the computation when prompted.
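Conceptually, computing a feature table means evaluating each feature as of every observation point, using only data available before that point in time. A minimal pandas sketch of this idea, with a hypothetical event table and a made-up 28-day spend feature (all names and values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical purchase events per customer.
events = pd.DataFrame(
    {
        "CUSTOMER_ID": ["c-001", "c-001", "c-042"],
        "EVENT_TIME": pd.to_datetime(["2023-06-20", "2023-06-28", "2023-06-15"]),
        "AMOUNT": [12.5, 30.0, 7.0],
    }
)

# Hypothetical observation table: one row per (customer, point in time).
observations = pd.DataFrame(
    {
        "CUSTOMER_ID": ["c-001", "c-042"],
        "POINT_IN_TIME": pd.to_datetime(["2023-07-02", "2023-07-02"]),
    }
)

# For each observation point, aggregate events in the 28 days strictly
# *before* the point in time -- the feature must not peek into the future.
def spend_28d(row):
    window = events[
        (events["CUSTOMER_ID"] == row["CUSTOMER_ID"])
        & (events["EVENT_TIME"] < row["POINT_IN_TIME"])
        & (events["EVENT_TIME"] >= row["POINT_IN_TIME"] - pd.Timedelta(days=28))
    ]
    return window["AMOUNT"].sum()

observations["SPEND_28d"] = observations.apply(spend_28d, axis=1)
print(observations)
```

The platform performs this point-in-time join at scale for every feature in the list; the sketch only illustrates the semantics.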
Step 2: Review Feature Table
- Navigate to the Feature Table Catalog in the 'Experiment' section of the menu.
- Click on the "1 + SHAP selection with embedding - Pre_Purchase_Customer_Activity_next_week_2023_2024_20K" table.
- Open the 'Preview' tab to examine the table and check the values of individual features.
- Review a dictionary-type feature.
- Review an embedding-type feature.
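To make the preview easier to interpret, here is a sketch of what values of these two feature types typically look like; the keys, vector length, and numbers are illustrative assumptions, not values from the tutorial's table.

```python
import numpy as np

# A dictionary-type feature holds per-key values for one entity, e.g.
# spend per product group over a window (keys and amounts are made up).
basket_by_group = {"Fruits": 18.5, "Dairy": 6.0, "Snacks": 3.2}

# An embedding-type feature holds a fixed-length numeric vector, e.g. a
# text embedding of recently purchased product descriptions.
product_embedding = np.array([0.12, -0.40, 0.33, 0.05])

# Typical checks when previewing such features:
assert sum(basket_by_group.values()) > 0          # non-empty dictionary
assert product_embedding.shape == (4,)            # expected vector length
print(max(basket_by_group, key=basket_by_group.get))  # dominant product group
```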
Step 3: Train LightGBM
To evaluate the performance of your feature lists, download the files lgbm_grocery_ui_tutorials.ipynb and modeling_script.py here. Use the notebook to test the accuracy of each feature list and identify the best-performing one.
In this notebook, we train the models on the second half of 2023 and the first half of 2024, and validate the predictions on July 2024.
The feature list with the highest AUC score (~0.88) is "SHAP selection with embedding". The "Top 1 per theme" list is also a strong candidate, as it combines simplicity (a limited number of features) with strong predictivity.