8b. Refine Ideation
In the previous tutorial, we explored the Automated Mode of Feature Ideation, where the system independently generated a comprehensive set of features.
Now, we focus on Semi-Automated Mode, which introduces an interactive layer to the ideation workflow. This mode empowers you to review, refine, and enhance the system's recommendations step by step, ensuring that the features align with your specific requirements and domain knowledge.
In this tutorial, you will learn how to:
- Review and adjust the system's suggestions, from table selection to filters.
- Understand how Semi-Automated Mode balances automation efficiency with manual refinement flexibility.
Note
If you want to learn how to incorporate UDFs, such as embeddings, to enrich feature engineering, check out the Grocery Dataset UI Tutorials.
Step 1: Create New Feature Ideation
- Navigate to Feature Ideation under the 'Ideate' section of the menu.
- Click the button to start a new ideation process.
- Edit the Feature Ideation name and description by clicking the corresponding button.
Step 2: Start Semi-Automated Mode
- Begin the workflow by clicking the button.
- When this step completes, the table selection results are displayed for review.
Step 3: Review Table Selection
- Click the button to view detailed table selection results.
- To open the detailed report:
  - Click the button next to the Ideation name "Semi-Automated Mode".
  - Then click the corresponding option.
  - Click the button to open the report.
- Return to the table selection screen and click the button to proceed.
Step 4: Review Column Semantics Detection
- Review the Column Semantics Detection results.
- Click the button to view the report.
- Adjust semantic tags as needed. For example, in the NEW_APPLICATION table tab, assign the BIRTHDATE column the semantic type date_time/timestamp_field/birth_timestamp from FeatureByte's ontology.
Which Semantic Type Should You Focus On?
- Event Tables & Time Series Tables: Focus on event_type and event_status.
- Slowly Changing Dimension Tables: Focus on termination_timestamp / termination_date.
- All Tables: Review non_additive_numeric, non_informative, not_to_use, ambiguous_numeric, and ambiguous_categorical semantic types.
Click the button to continue.
Step 5: Review Transforms Detection
- Review the Transforms Detection results.
- Click the button to view the report.
Transforms for CONSUMER_LOAN_STATUS
We will retain only two transforms: AMT_CREDIT To AMT_APPLICATION and Planned Loan Duration. To learn how transforms are created, we will delete all suggested transforms and recreate these two manually.
- Navigate to the CONSUMER_LOAN_STATUS tab and open the Transforms window by clicking the button.
- Click the delete button to delete each suggested transform.
- Create the Ratio of AMT_CREDIT by AMT_APPLICATION:
  - Click the button to add a new transform.
  - Select the 'Ratio' operation.
  - Choose the AMT_CREDIT column as Operand 1 and the AMT_APPLICATION column as Operand 2. Both use the 'Column' operand type.
  - Generate a name and relevance by clicking the button.
  - Review the relevance explanation and edit the suggested name if needed.
  - Click the button to add the transform.
- Create the Time Delta from FIRST_DUE_TIMESTAMP to LAST_DUE_1ST_VERSION_TIMESTAMP:
  - Click the button to add a new transform.
  - Select the 'Time delta' operation.
  - Choose the FIRST_DUE_TIMESTAMP column as the Start Time and the LAST_DUE_1ST_VERSION_TIMESTAMP column as the End Time. Both use the 'Column' operand type.
  - Choose Day as the Time Delta Unit.
  - Generate a name and relevance by clicking the button.
  - Review the relevance explanation and edit the suggested name if needed.
  - Click the button to add the transform.
- Click the button to save changes.
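To make the two transforms concrete, here is a minimal Python sketch of what they compute for a single row. It assumes the 'Ratio' operation divides Operand 1 by Operand 2, and that the 'Time delta' operation subtracts the Start Time from the End Time and expresses the result in days. The row values are made up, and returning None for a zero or missing denominator is an assumption of this sketch, not documented FeatureByte behavior; in practice FeatureByte computes these transforms for you from the UI settings.

```python
from datetime import datetime

def ratio(operand_1, operand_2):
    """Ratio transform: operand_1 / operand_2 (None if undefined, an assumption)."""
    if operand_1 is None or operand_2 in (None, 0):
        return None
    return operand_1 / operand_2

def time_delta_days(start_time, end_time):
    """Time delta transform with a Day unit: (end_time - start_time) in days."""
    if start_time is None or end_time is None:
        return None
    return (end_time - start_time).total_seconds() / 86400

# Hypothetical CONSUMER_LOAN_STATUS row; values are illustrative only.
row = {
    "AMT_CREDIT": 45000.0,
    "AMT_APPLICATION": 50000.0,
    "FIRST_DUE_TIMESTAMP": datetime(2023, 1, 15),
    "LAST_DUE_1ST_VERSION_TIMESTAMP": datetime(2024, 1, 10),
}

credit_to_application = ratio(row["AMT_CREDIT"], row["AMT_APPLICATION"])
planned_loan_duration = time_delta_days(
    row["FIRST_DUE_TIMESTAMP"], row["LAST_DUE_1ST_VERSION_TIMESTAMP"]
)
print(credit_to_application, planned_loan_duration)  # 0.9 360.0
```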
Transforms for CONSUMER_INSTALLMENTS
We will keep only one transform: Payment Delay.
- Navigate to the CONSUMER_INSTALLMENTS tab and open the Transforms window by clicking the button.
- Click the delete button to delete all suggested transforms except Payment Delay.
- Click the button to save changes.
Transforms for PRIOR_APPLICATIONS
We keep the PRIOR_APPLICATIONS transforms unchanged.
Transforms for NEW_APPLICATION
We keep only four transforms for NEW_APPLICATION:
- AMT_ANNUITY To AMT_CREDIT
- AMT_GOODS_VALUE To AMT_CREDIT
- AMT_ANNUITY To AMT_GOODS_VALUE
- Credit-Goods Gap
Click the button to proceed.
Step 6: Review Filters Detection
Review the suggested Filters Detection results.
Filters for CONSUMER_INSTALLMENTS
We keep the filter for CONSUMER_INSTALLMENTS unchanged.
Filters for PRIOR_APPLICATIONS
We will delete:
- Filter 3: Refused Status Revolving loans Contract_type Prior application
- Filter 5: Approved Status Revolving loans Contract_type Prior application
- Filter 7: Refused Status Consumer loans Contract_type Prior application
We will then create one additional filter: Consumer loans Contract_type Prior application.
- Navigate to the PRIOR_APPLICATIONS tab and open the Filters window by clicking the button.
- Click the delete button to delete the three filters listed above.
- Click the button to add a new filter.
- Select Filter Column and choose CONTRACT_TYPE as the filter column.
- Complete the filter condition by specifying the filter values. This opens a new window listing all eligible values.
- Identify the most relevant values by clicking the button, then finalize your value selection by selecting 'Consumer Loans'.
- Generate the filter name and relevance.
- Check the relevance of the new filter.
- Change the filter type to secondary filter.
- Review the final filter list.
Click the button to proceed.
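Conceptually, the new filter restricts PRIOR_APPLICATIONS to rows whose CONTRACT_TYPE is 'Consumer loans' before features are aggregated over them. The plain-Python sketch below illustrates only that idea, using made-up rows; CUSTOMER_ID is a hypothetical key, and the count per customer is just one example of an aggregate computed on the filtered subset. It does not model what distinguishes a secondary filter from other filter types. The comparison is exact, so the value must match how it appears in the data, including capitalization.

```python
from collections import Counter

def apply_filter(rows, column, allowed_values):
    """Keep only rows whose `column` value is one of `allowed_values`."""
    return [row for row in rows if row.get(column) in allowed_values]

# Hypothetical PRIOR_APPLICATIONS rows; CUSTOMER_ID is an illustrative key.
prior_applications = [
    {"CUSTOMER_ID": 1, "CONTRACT_TYPE": "Consumer loans"},
    {"CUSTOMER_ID": 1, "CONTRACT_TYPE": "Revolving loans"},
    {"CUSTOMER_ID": 1, "CONTRACT_TYPE": "Consumer loans"},
    {"CUSTOMER_ID": 2, "CONTRACT_TYPE": "Cash loans"},
    {"CUSTOMER_ID": 2, "CONTRACT_TYPE": "Consumer loans"},
]

consumer_loans = apply_filter(prior_applications, "CONTRACT_TYPE", {"Consumer loans"})

# Example aggregate on the filtered subset: number of consumer-loan applications per customer.
count_per_customer = Counter(row["CUSTOMER_ID"] for row in consumer_loans)
print(dict(count_per_customer))  # {1: 2, 2: 1}
```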
Step 7: Review Feature Ideation Setup
- Review the suggested setup.
- Modify the aggregation windows (e.g., change them to 52 weeks and 104 weeks) by clicking the button to remove the 26 weeks window.
- Review the new setup.
Click the button to proceed.
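An aggregation window determines how far back from the point-in-time each aggregate looks. As a rough sketch, assuming a window covers events whose timestamps fall in [point_in_time - window, point_in_time), the Python below computes a sum over the 52-week and 104-week windows for one customer. The boundary convention, the events, and the windowed_sum helper are assumptions for illustration, not FeatureByte's implementation.

```python
from datetime import datetime, timedelta

def windowed_sum(events, point_in_time, window):
    """Sum amounts of events with timestamps in [point_in_time - window, point_in_time)."""
    start = point_in_time - window
    return sum(amount for ts, amount in events if start <= ts < point_in_time)

# Hypothetical (timestamp, amount) events for a single customer.
events = [
    (datetime(2023, 6, 1), 100.0),
    (datetime(2022, 6, 1), 200.0),
    (datetime(2021, 6, 1), 400.0),
]
point_in_time = datetime(2024, 1, 1)

windows = {"52w": timedelta(weeks=52), "104w": timedelta(weeks=104)}
result = {name: windowed_sum(events, point_in_time, w) for name, w in windows.items()}
print(result)  # {'52w': 100.0, '104w': 300.0}
```

Longer windows capture more history (here the 104-week sum includes the 2022 event), which is why keeping both lengths can give the model short- and long-term views of the same behavior.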
Step 8: Review the Feature Ideation Report
- After the process completes, review the ideated features table.
- Click the button next to the Ideation name "Semi-Automated Mode" to access the full ideation reports.
- Click the button to open the full report in a new tab.
Step 9: Run EDA
- Select all ideated features by clicking the button.
- Scroll to the bottom of the ideated features table and click the button to begin the Exploratory Data Analysis (EDA) process.
Step 10: Run Feature Selection
- Start Feature Selection by clicking the Magic Wand icon.
- Select the SHAP-Based mode, reduce the number of top feature candidates to 1000, and choose the option to exclude Low Added Value Features.
- Once the selection is complete, review the selected features.
Step 11: Add Features to the Feature Catalog
- Clear the search (if you used it) and any prior selection by clicking the button.
- Select features in the feature list by clicking the button.
- Save the selected features into the Feature Catalog by clicking the button.
- Name the feature list "SHAP selection from Semi-Automated Mode".
Step 12: Run Rule-based Selection
- Switch to All features by setting the dropdown list to All features.
- Start Feature Selection by clicking the Magic Wand icon.
- Select the Rule-Based mode. In this example, we want the top 5 features for each theme, provided the feature is also among the top 200 features overall. We also choose the option to exclude Low Added Value Features.
- Once the selection is complete, review the selected features.
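The rule-based settings can be read as a two-stage filter. The Python sketch below is one plausible interpretation, not FeatureByte's actual algorithm: it assumes each feature carries a hypothetical score and theme and a hypothetical low_added_value flag, drops flagged features, keeps the top_overall features by score, and then keeps at most top_per_theme features per theme. Whether FeatureByte excludes low-added-value features before or after the overall ranking is also an assumption here. The usage example uses small limits (3 overall, 1 per theme) instead of 200 and 5 so the result is easy to check.

```python
from collections import defaultdict

def rule_based_selection(features, top_overall=200, top_per_theme=5,
                         exclude_low_added_value=True):
    """Keep the top `top_per_theme` features per theme among the top `top_overall` overall."""
    # Optionally drop features flagged as low added value (hypothetical flag).
    candidates = [
        f for f in features
        if not (exclude_low_added_value and f["low_added_value"])
    ]
    # Rank the remaining features by their (hypothetical) score, best first.
    ranked = sorted(candidates, key=lambda f: f["score"], reverse=True)
    eligible = ranked[:top_overall]
    # Walk the ranked list and keep at most `top_per_theme` features per theme.
    selected = defaultdict(list)
    for f in eligible:
        if len(selected[f["theme"]]) < top_per_theme:
            selected[f["theme"]].append(f["name"])
    return dict(selected)

# Hypothetical features with made-up themes and scores.
features = [
    {"name": "a", "theme": "credit", "score": 0.9, "low_added_value": False},
    {"name": "b", "theme": "credit", "score": 0.8, "low_added_value": False},
    {"name": "c", "theme": "payments", "score": 0.7, "low_added_value": False},
    {"name": "d", "theme": "payments", "score": 0.6, "low_added_value": True},
    {"name": "e", "theme": "demographics", "score": 0.5, "low_added_value": False},
]

selection = rule_based_selection(features, top_overall=3, top_per_theme=1)
print(selection)  # {'credit': ['a'], 'payments': ['c']}
```

Feature "e" is dropped because it falls outside the top 3 overall, even though it is the best feature in its theme; this is the effect of combining a per-theme quota with an overall cutoff.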
Step 13: Add Rule-based Selection to the Feature Catalog
- Clear the search (if you used it) and any prior selection by clicking the button.
- Select features in the feature list by clicking the button.
- Save the selected features into the Feature Catalog by clicking the button.
- Name the feature list "Rule-based selection from Semi-Automated Mode".
Step 14: Review Prior Selections
- Navigate to the Feature Selection tab to review your prior selections.
- Click on a selection to access its details. Each selection provides information across three tabs:
- About Tab: Displays a description and a summary of the signal range for the selected features.
- Settings Tab: Shows detailed information about how the selection was generated, including parameters and logic used.
- Features Tab: Shows selected features together with their semantic relevance.
Step 15: Download the List of Ideated Features Metadata
Follow these steps to download a CSV file containing metadata for all ideated features (we will use it later for modeling):
- Click the button and select 'All' under Recommendation Group to include features in the catalog that are compatible with the use case but were not suggested.
- Select the corresponding option.
- Download the CSV file by clicking the button.
- Choose the "filtered features" option and give your file a name (e.g., "Ideated Features").