8b. Refine Ideation
In the previous step of the tutorials, we ran a simple Feature Ideation, where the system independently generated a set of features for a single aggregation window.
Now, we focus on refining the ideation by exploring more features for the columns used by this feature set.
In this tutorial, you will learn how to:
- Prune uninformative columns based on the prior feature selection
- Enable filter detection
- Review and adjust the system's suggestions for transforms and filters.
Note
If you want to learn how to incorporate UDFs, such as embeddings, to enrich feature engineering, check out the Grocery Dataset UI Tutorials.
Step 1: Create New Feature Ideation¶
-
Navigate to Feature Ideation under the 'Ideate' section of the menu.
-
Click
to start a new ideation process.
-
Edit the Feature Ideation name and description by clicking
.
-
Click
and configure the ideation by clicking
.
-
In Semantic Detection Setup, select the feature selection from the prior run to prune uninformative columns.
-
In Filter Setup, uncheck the 'Skip Filters' option
-
Click
to start ideation.
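As a rough illustration of what pruning from a prior feature selection means, the sketch below keeps only the columns referenced by at least one previously selected feature. This is a conceptual sketch, not FeatureByte's implementation: the function `prune_columns`, the feature names, and `UNUSED_COLUMN` are hypothetical, and only the other column names come from this tutorial.

```python
def prune_columns(all_columns, selected_features):
    """Keep only the columns used by at least one selected feature.

    `selected_features` maps a feature name to the set of source
    columns it was built from (hypothetical structure).
    """
    used = set()
    for columns in selected_features.values():
        used.update(columns)
    # Preserve the original column order.
    return [column for column in all_columns if column in used]


# Hypothetical inputs for illustration only.
all_columns = ["AMT_GOODS_PRICE", "CONTRACT_STATUS", "YIELD_GROUP", "UNUSED_COLUMN"]
selected_features = {
    "hypothetical_feature_1": {"AMT_GOODS_PRICE"},
    "hypothetical_feature_2": {"CONTRACT_STATUS", "YIELD_GROUP"},
}

kept = prune_columns(all_columns, selected_features)
print(kept)
```

Columns that no selected feature relies on are dropped, so the refined ideation spends its effort on the columns that already showed signal.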
Step 2: Review Feature Ideation Steps¶
Review Table Selection¶
Click to view detailed table selection analysis.
Review the Column Semantics Detection¶
Column semantics are organized by table. Select the tab corresponding to the table you are interested in.
Which Semantic Type Should You Focus On?
- Event Tables & Time Series Tables: Focus on event_type and event_status.
- Slowly Changing Dimension Tables: Focus on termination_timestamp / termination_date.
- All Tables: Review non_additive_numeric, non_informative, not_to_use, ambiguous_numeric, and ambiguous_categorical semantic types.
Review the Transforms Detection.¶
Transforms are organized by table. Select the tab corresponding to the table you are interested in.
Click for additional insight.
Review suggested Filters Detection.¶
Filters are organized by table. Select the tab corresponding to the table you are interested in.
Click for additional insight.
Review Feature Ideation Setup¶
- Review Aggregation windows
- Review Event Frequency type
- Review Key Aggregation Column
Review Ideated Features¶
Sort by Predictive Score if you are interested in the features with the strongest correlation with the target.
Review EDA¶
Review Feature Selection¶
Click on the selection to access its details:
- summary of the signal range for the selected features.
- information about how the selection was generated.
Step 2: Add Features to the Feature Catalog¶
-
Go to the Features tab of the Feature Selection step.
-
Clear the search (if you used it) and any prior selection (if any) by clicking
-
Clear Filters by clicking
and close the filter panel by clicking
.
-
Ensure the feature selection is properly selected.
-
Select features in the feature list by clicking
.
-
Save the selected features into the Feature Catalog by clicking
.
-
Call the feature list "SHAP selection after Column Pruning".
Once the process is complete, the added features should be marked as 'DRAFT'.
Step 3: Clone and reset Feature Ideation¶
Clone and reset the Feature Ideation to adjust the system's suggestions for column semantics, transforms and filters.
- Go to the Column Semantics step and click
.
- Edit name and description of the new ideation.
Step 4: Adjust Column Semantics¶
Ensure AMT_GOODS_PRICE Semantic Type in NEW_APPLICATION table is marked as 'non_negative_amount'.
-
Navigate to the NEW_APPLICATION tab.
-
Adjust the semantic tag of AMT_GOODS_PRICE to non_negative_amount (found under numeric/additive_numeric/non_negative_amount in FeatureByte's ontology).
-
Click
to proceed to Transform.
Step 5: Adjust Transforms¶
-
Navigate to INSTALLMENTS_PAYMENTS tab.
-
Open the Transforms window by clicking
.
-
Click
to create time delta between actual_installment_date and scheduled_installment_date.
- Select the Time delta operation
- Choose the Day unit.
- Set the actual_installment_date column as the Operand 1 and the scheduled_installment_date column as the Operand 2. Both are 'Column' Operand type.
- Generate a name and relevance by clicking
.
- Review the relevance explanation and edit suggested name if needed.
-
Click
and check the new transform is saved.
Click to proceed to Filters.
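To make the time delta transform from this step concrete, here is a minimal sketch in plain Python. It assumes the result is Operand 1 minus Operand 2 expressed in whole days, so a positive value means the payment was made after its scheduled date. That sign convention, the function name `time_delta_days`, and the sample dates are assumptions for illustration; only the two column names come from the tutorial.

```python
from datetime import date


def time_delta_days(operand_1, operand_2):
    """Return operand_1 minus operand_2 in whole days (assumed convention)."""
    return (operand_1 - operand_2).days


# Hypothetical installment rows for illustration only.
rows = [
    {"actual_installment_date": date(2024, 3, 5),
     "scheduled_installment_date": date(2024, 3, 1)},   # paid late
    {"actual_installment_date": date(2024, 4, 1),
     "scheduled_installment_date": date(2024, 4, 1)},   # paid on time
    {"actual_installment_date": date(2024, 4, 28),
     "scheduled_installment_date": date(2024, 5, 1)},   # paid early
]

deltas = [
    time_delta_days(row["actual_installment_date"],
                    row["scheduled_installment_date"])
    for row in rows
]
print(deltas)  # [4, 0, -3]
```

A derived column like this can then feed aggregations such as the average or maximum payment delay per client.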
Step 6: Adjust Filters¶
We will delete three filters and create two new ones in the PREVIOUS_APPLICATION table.
We will delete:
- Filter 2: Approved Contract_status Consumer loans Contract_type Priorapplication
- Filter 3: Cash loans Contract_type Priorapplication
- Filter 4: Consumer loans Contract_type Priorapplication
and create two additional filters:
- CONTRACT_STATUS == Refused
- YIELD_GROUP == high
Delete filters¶
-
Navigate to the PREVIOUS_APPLICATION tab
-
Open the Filters window by clicking
.
-
Click
to delete the 3 filters listed above.
Create filter with CONTRACT_STATUS == Refused¶
-
Click
to add a new filter.
-
Select Filter Column and choose CONTRACT_STATUS as the filter column.
-
Complete the filter condition by specifying the filter values. This will open a new window listing all eligible values.
-
Identify the most relevant values by clicking
and finalize your value selection by selecting 'Refused'.
-
Generate filter name and relevance.
-
Check the relevance of the new filter.
-
Change the filter type to secondary filter. The filter type determines the complexity of the features that will be ideated for this filter.
Create filter with YIELD_GROUP == high¶
-
Click
to add a new filter. Select Filter Column and choose YIELD_GROUP as the filter column.
-
Identify the most relevant values by clicking
and finalize your value selection by selecting 'high'.
-
Generate filter name and relevance. Check the relevance of the new filter. Change the filter type to secondary filter.
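The two filters created above restrict which rows of PREVIOUS_APPLICATION feed into a feature. The sketch below shows the idea with a simple filtered count in plain Python. It is only an illustration of a filter condition followed by an aggregation, not how FeatureByte computes features: the helper `count_where`, the sample rows, and the values "low" and "middle" are hypothetical.

```python
def count_where(rows, column, value):
    """Count the rows whose `column` equals `value`."""
    return sum(1 for row in rows if row.get(column) == value)


# Hypothetical prior applications for a single client.
previous_applications = [
    {"CONTRACT_STATUS": "Approved", "YIELD_GROUP": "low"},
    {"CONTRACT_STATUS": "Refused", "YIELD_GROUP": "high"},
    {"CONTRACT_STATUS": "Refused", "YIELD_GROUP": "middle"},
    {"CONTRACT_STATUS": "Approved", "YIELD_GROUP": "high"},
]

# Filter CONTRACT_STATUS == Refused, then count.
refused_count = count_where(previous_applications, "CONTRACT_STATUS", "Refused")

# Filter YIELD_GROUP == high, then count.
high_yield_count = count_where(previous_applications, "YIELD_GROUP", "high")

print(refused_count, high_yield_count)  # 2 2
```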
Review the final Filter list and proceed to Feature Selection¶
-
Click
-
Click
and then
to proceed up to Feature Selection.
-
Once complete, review suggested Feature Selection.
Step 7: Run new SHAP Feature Selection¶
To further reduce the number of features, run a new feature selection with an additional round of feature importance based on SHAP values.
-
Go to the Features tab of the Feature Selection step.
-
Select All Features.
-
Start Feature Selection by clicking on the Magic Wand
.
-
Select the SHAP-Based mode, select filtered features and increase the number of importance rounds to 2.
-
Once the selection is complete, review the selected features.
Step 8: Add New SHAP Feature Selection to the Feature Catalog¶
- Clear the search (if you used it) and any prior selection (if any) by clicking
- Select features in the feature list by clicking
.
-
Save the selected features into the Feature Catalog by clicking
.
-
Call the feature list "SHAP selection (1+2) after adjusted ideation".
Step 9: Add Original SHAP Feature Selection to the Feature Catalog¶
- Clear the search (if you used it) and any prior selection (if any) by clicking
- Select original SHAP Feature Selection and click
.
-
Save the selected features into the Feature Catalog by clicking
.
-
Call the feature list "Suggested SHAP selection (1+1) after adjusted ideation".
Step 10: Run Rule-Based Feature Selection¶
To ensure that the feature catalog covers all themes available, we will run a Rule-Based Selection.
-
Select All Features and initiate new selection by clicking on the Magic Wand
.
-
Select the Rule-Based mode, and select top 1 feature for each theme.
-
Once the selection is complete, review the selected features.
-
In the Features tab, click
. Click
. Call the feature list "Top feature per theme".
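The rule applied in this step, keeping the top feature for each theme, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function `top_n_per_theme`, the feature names, themes, and scores are all hypothetical, and ranking by a single numeric score is an assumption rather than a description of the product's internal rule.

```python
def top_n_per_theme(features, n=1):
    """Return the n highest-scoring features within each theme."""
    by_theme = {}
    for feature in features:
        by_theme.setdefault(feature["theme"], []).append(feature)

    selected = []
    for group in by_theme.values():
        ranked = sorted(group, key=lambda f: f["score"], reverse=True)
        selected.extend(ranked[:n])
    return selected


# Hypothetical ideated features for illustration only.
features = [
    {"name": "feature_a", "theme": "payment_delay", "score": 0.12},
    {"name": "feature_b", "theme": "payment_delay", "score": 0.31},
    {"name": "feature_c", "theme": "refused_applications", "score": 0.22},
    {"name": "feature_d", "theme": "refused_applications", "score": 0.08},
]

top_features = [f["name"] for f in top_n_per_theme(features, n=1)]
print(top_features)  # ['feature_b', 'feature_c']
```

Selecting one feature per theme trades some predictive strength for breadth, which is why this list complements the SHAP-based selections rather than replacing them.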