8b. Refine Ideation

In the previous tutorial, we explored the Automated Mode of Feature Ideation, where the system independently generated a comprehensive set of features.

Now, we focus on Semi-Automated Mode, which introduces an interactive layer to the ideation workflow. This mode empowers you to review, refine, and enhance the system's recommendations step by step, ensuring that the features align with your specific requirements and domain knowledge.

In this tutorial, you will learn how to:

  • Review and adjust the system's suggestions, from table selection to filters.
  • Understand how Semi-Automated Mode balances automation efficiency with manual refinement flexibility.

Note

If you want to learn how to incorporate UDFs, such as embedding, to enrich feature engineering, check out the Grocery Dataset UI Tutorials.


Step 1: Create New Feature Ideation

  1. Navigate to Feature Ideation under the 'Ideate' section of the menu.

  2. Click the New Ideation button to start a new ideation process.

  3. Edit the Feature Ideation name and description by clicking the Edit button.


Step 2: Start Semi-Automated Mode

  1. Begin the workflow by clicking the Next Step button.

  2. Once the step completes, the table selection results are displayed for review.


Step 3: Review Table Selection

  1. Click the corresponding icon to view the detailed table selection results.

  2. To open the detailed report:

    • Click the Report button next to the Ideation name "Semi-Automated Mode".
    • Then click the Report tab button.

  3. Return to the table selection screen and click the Next Step button to proceed.


Step 4: Review Column Semantics Detection

  1. Review the Column Semantics Detection results.

  2. Click the corresponding icon to view the report.

  3. Adjust semantic tags as needed (e.g., in the NEW_APPLICATION table tab, assign the BIRTHDATE column a semantic type under date_time/timestamp_field/birth_timestamp in FeatureByte's ontology).

Which Semantic Type Should You Focus On?

  • Event Tables & Time Series Tables: Focus on event_type and event_status.
  • Slowly Changing Dimension Tables: Focus on termination_timestamp / termination_date.
  • All Tables: Review non_additive_numeric, non_informative, not_to_use, ambiguous_numeric, and ambiguous_categorical semantic types.

Click the Next Step button to continue.


Step 5: Review Transforms Detection

  1. Review the Transforms Detection results.

  2. Click the corresponding icon to view the report.

Transforms for CONSUMER_LOAN_STATUS

We will retain only two transforms: AMT_CREDIT To AMT_APPLICATION and Planned Loan Duration. To learn how transforms are created, we will delete the suggested transforms and recreate these two manually.

  1. Navigate to the CONSUMER_LOAN_STATUS tab and open the Transforms window by clicking the corresponding icon.
  2. Click the delete icon to delete each suggested transform.

  3. Create the ratio of AMT_CREDIT to AMT_APPLICATION:

    • Click the button that adds a new transform.
    • Select the 'Ratio' operation.
    • Choose the AMT_CREDIT column as Operand 1 and the AMT_APPLICATION column as Operand 2. Both use the 'Column' operand type.
    • Generate a name and relevance by clicking the corresponding button.
    • Review the relevance explanation and edit the suggested name if needed.
  4. Create the time delta from FIRST_DUE_TIMESTAMP to LAST_DUE_1ST_VERSION_TIMESTAMP:

    • Click the button that adds a new transform.
    • Select the 'Time delta' operation.
    • Choose the FIRST_DUE_TIMESTAMP column as the Start Time and the LAST_DUE_1ST_VERSION_TIMESTAMP column as the End Time. Both use the 'Column' operand type.
    • Choose Day as the Time Delta Unit.
    • Generate a name and relevance by clicking the corresponding button.
    • Review the relevance explanation and edit the suggested name if needed.
  5. Save your changes by clicking the save button.
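
To make the two recreated transforms concrete, here is a minimal Python sketch of what a ratio and a time delta in days compute for a single row. It is an illustration only, not FeatureByte's implementation: the helper names `ratio` and `time_delta_days`, the sample values, and the choice to return `None` for missing inputs or a zero denominator are all assumptions.

```python
from datetime import datetime
from typing import Optional


def ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined.

    Treating missing values and a zero denominator as None is an assumption
    made for this sketch, not documented platform behavior.
    """
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def time_delta_days(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Return the elapsed time from start to end, expressed in days."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds() / 86400.0


# Hypothetical row from CONSUMER_LOAN_STATUS; the values are made up.
row = {
    "AMT_CREDIT": 90000.0,
    "AMT_APPLICATION": 100000.0,
    "FIRST_DUE_TIMESTAMP": datetime(2023, 1, 15),
    "LAST_DUE_1ST_VERSION_TIMESTAMP": datetime(2023, 7, 14),
}

# Ratio transform: Operand 1 divided by Operand 2.
credit_to_application = ratio(row["AMT_CREDIT"], row["AMT_APPLICATION"])

# Time delta transform: End Time minus Start Time, with Day as the unit.
planned_loan_duration = time_delta_days(
    row["FIRST_DUE_TIMESTAMP"], row["LAST_DUE_1ST_VERSION_TIMESTAMP"]
)

print(credit_to_application)  # 0.9
print(planned_loan_duration)  # 180.0
```

A ratio below 1 here means the credit granted was smaller than the amount applied for, which is the kind of signal the relevance explanation in the UI describes.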

Transforms for CONSUMER_INSTALLMENTS

We will keep only 1 transform: Payment Delay.

  1. Navigate to the CONSUMER_INSTALLMENTS tab and open the Transforms window by clicking the corresponding icon.

  2. Click the delete icon to delete all suggested transforms except Payment Delay.

  3. Save your changes by clicking the save button.

Transforms for PRIOR_APPLICATIONS

We keep the PRIOR_APPLICATIONS transforms unchanged.


Transforms for NEW_APPLICATION

We keep only 4 transforms for NEW_APPLICATION:

  • AMT_ANNUITY To AMT_CREDIT
  • AMT_GOODS_VALUE To AMT_CREDIT
  • AMT_ANNUITY To AMT_GOODS_VALUE
  • Credit-Goods Gap

Click the Next Step button to proceed.


Step 6: Review Filters Detection

Review the suggested Filters Detection results.

Filters for CONSUMER_INSTALLMENTS

We keep the filter for CONSUMER_INSTALLMENTS unchanged.

Filters for PRIOR_APPLICATIONS

We will delete:

  • Filter 3: Refused Status Revolving loans Contract_type Prior application
  • Filter 5: Approved Status Revolving loans Contract_type Prior application
  • Filter 7: Refused Status Consumer loans Contract_type Prior application

and create one additional filter: Consumer loans Contract_type Prior application

  1. Navigate to the PRIOR_APPLICATIONS tab and open the Filters window by clicking the corresponding icon.

  2. Click the delete icon to delete the 3 filters listed above.

  3. Click the button that adds a new filter.

  4. Select Filter Column and choose CONTRACT_TYPE as the filter column.

  5. Complete the filter condition by specifying the filter values. This opens a new window listing all eligible values.

  6. Identify the most relevant values by clicking the corresponding button, then finalize your selection by choosing 'Consumer Loans'.

  7. Generate the filter name and relevance.

  8. Check the relevance of the new filter.

  9. Change the filter type to secondary filter.

  10. Review the final filter list.

Click the Next Step button to proceed.
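
Conceptually, the new filter restricts PRIOR_APPLICATIONS rows to a chosen set of CONTRACT_TYPE values before features are computed. The sketch below shows that idea in plain Python. The helper `filter_rows`, the APPLICATION_ID column, and the sample rows are hypothetical, and the exact value spelling should be taken from the list of eligible values shown in the UI.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def filter_rows(rows: Iterable[Row], column: str, allowed_values: Iterable[str]) -> List[Row]:
    """Keep only the rows whose value in `column` is one of `allowed_values`."""
    allowed = set(allowed_values)
    return [row for row in rows if row.get(column) in allowed]


# Made-up sample rows; APPLICATION_ID is a hypothetical identifier column.
prior_applications = [
    {"APPLICATION_ID": "a1", "CONTRACT_TYPE": "Consumer loans"},
    {"APPLICATION_ID": "a2", "CONTRACT_TYPE": "Revolving loans"},
    {"APPLICATION_ID": "a3", "CONTRACT_TYPE": "Cash loans"},
    {"APPLICATION_ID": "a4", "CONTRACT_TYPE": "Consumer loans"},
]

# Equivalent of choosing CONTRACT_TYPE as the filter column and selecting
# a single value for the filter condition.
consumer_only = filter_rows(prior_applications, "CONTRACT_TYPE", ["Consumer loans"])

print([row["APPLICATION_ID"] for row in consumer_only])  # ['a1', 'a4']
```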

Step 7: Review Feature Ideation Setup

  1. Review the suggested setup.

  2. Modify the aggregation windows as needed (e.g., keep only the 52 weeks and 104 weeks windows by clicking the delete icon to remove the 26 weeks window).

  3. Review the new setup.

Click the Next Step button to proceed.
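
An aggregation window controls how far back from the point in time events are included in an aggregate. The following sketch, using only the Python standard library, sums made-up event amounts over the 52 weeks and 104 weeks windows kept above. The function `windowed_sums`, the sample events, and the half-open window boundaries are assumptions for illustration and may differ from how the platform defines window edges.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Windows kept in the setup above, after removing the 26 weeks window.
WINDOWS_IN_WEEKS = [52, 104]


def windowed_sums(
    events: List[Tuple[datetime, float]],
    point_in_time: datetime,
    windows_in_weeks: List[int],
) -> Dict[str, float]:
    """Sum event amounts over each look-back window ending at point_in_time.

    A window covers [point_in_time - window, point_in_time); this boundary
    convention is an assumption of the sketch.
    """
    result: Dict[str, float] = {}
    for weeks in windows_in_weeks:
        window_start = point_in_time - timedelta(weeks=weeks)
        result[f"{weeks}w"] = sum(
            amount for timestamp, amount in events
            if window_start <= timestamp < point_in_time
        )
    return result


# Made-up events as (timestamp, amount) pairs.
events = [
    (datetime(2023, 11, 1), 100.0),
    (datetime(2023, 3, 1), 50.0),
    (datetime(2022, 6, 1), 25.0),
    (datetime(2021, 1, 1), 10.0),
]

sums = windowed_sums(events, datetime(2024, 1, 1), WINDOWS_IN_WEEKS)
print(sums)  # {'52w': 150.0, '104w': 175.0}
```

Each window yields its own feature value, so removing the 26 weeks window reduces the number of ideated features per aggregation.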


Step 8: Review the Feature Ideation Report

  1. After the process completes, review the ideated features table.

  2. Click the Report button next to the Ideation name "Semi-Automated Mode" to access the full ideation reports.

  3. Click the Report tab button to view the full report in a new tab.


Step 9: Run EDA

  1. Select all ideated features by clicking the Select All button.

  2. Scroll to the bottom of the ideated features table and click the EDA button to begin the Exploratory Data Analysis (EDA) process.


Step 10: Run Feature Selection

  1. Start Feature Selection by clicking the Magic Wand.

  2. Select the SHAP-Based mode, reduce the number of Top features candidates to 1000, and choose the option to exclude Low Added Value Features.

  3. Once the selection is complete, review the selected features.


Step 11: Add Features to the Feature Catalog

  1. Clear the search (if you used it) and any prior selection by clicking the Clear button.
  2. Select the features in the feature list by clicking the Select All button.
  3. Save the selected features into the Feature Catalog by clicking Save Feature List.

  4. Name the feature list "SHAP selection from Semi-Automated Mode".

Step 12: Run Rule-based Selection

  1. Switch to all features by setting the dropdown list to All features.

  2. Start Feature Selection by clicking the Magic Wand.

  3. Select the Rule-Based mode. In this example, we keep the top 5 features of each theme, provided they are among the top 200 features overall. We also choose the option to exclude Low Added Value Features.

  4. Once the selection is complete, review the selected features.
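
The rule used in this step (top 5 features per theme, restricted to the top 200 overall) can be sketched as a short ranking routine. This is only one plausible reading of the rule, not the platform's actual algorithm: the `Feature` fields, the single numeric `score` used for ranking, and the sample data are hypothetical, and the Low Added Value exclusion is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    name: str
    theme: str
    score: float  # hypothetical ranking score; higher is better


def rule_based_selection(
    features: List[Feature], top_per_theme: int = 5, top_overall: int = 200
) -> List[Feature]:
    """Keep up to `top_per_theme` features per theme among the `top_overall` best."""
    # Rank every feature, then keep only the overall shortlist.
    ranked = sorted(features, key=lambda f: f.score, reverse=True)
    shortlist = ranked[:top_overall]

    # Walk the shortlist in rank order and cap the count kept for each theme.
    kept_per_theme: Dict[str, int] = defaultdict(int)
    selected: List[Feature] = []
    for feature in shortlist:
        if kept_per_theme[feature.theme] < top_per_theme:
            selected.append(feature)
            kept_per_theme[feature.theme] += 1
    return selected


# Made-up features; smaller limits are used so the effect is visible.
features = [
    Feature("f1", "payments", 0.9),
    Feature("f2", "payments", 0.8),
    Feature("f3", "payments", 0.7),
    Feature("f4", "credit", 0.6),
    Feature("f5", "credit", 0.5),
    Feature("f6", "demographics", 0.4),
]

selected = rule_based_selection(features, top_per_theme=2, top_overall=4)
print([f.name for f in selected])  # ['f1', 'f2', 'f4']
```

In the example, f3 is dropped by the per-theme cap, while f5 and f6 are dropped because they fall outside the overall shortlist. This is why a rule-based selection tends to be more diverse across themes than a pure top-N ranking.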


Step 13: Add Rule-based selection to the Feature Catalog

  1. Clear the search (if you used it) and any prior selection by clicking the Clear button.
  2. Select the features in the feature list by clicking the Select All button.
  3. Save the selected features into the Feature Catalog by clicking Save Feature List.
  4. Name the feature list "Rule-based selection from Semi-Automated Mode".


Step 14: Review Prior Selections

  1. Navigate to the Feature Selection tab to review your prior selections.


  2. Click on a selection to access its details. Each selection provides information across three tabs:

    • About Tab: Displays a description and a summary of the signal range for the selected features.
    • Settings Tab: Shows detailed information about how the selection was generated, including parameters and logic used.
    • Features Tab: Shows selected features together with their semantic relevance.

Step 15: Download the List of Ideated Features Metadata

Follow these steps to download a CSV file containing metadata for all ideated features (we will use it later for modeling):

  1. Click the Filter button and select 'All' under Recommendation Group to include features in the catalog that are compatible with the use case but were not suggested.
  2. Select all features by clicking the Select All button.
  3. Download the CSV file by clicking the csv button.
  4. Choose the "filtered features" option and give your file a name (e.g., "Ideated Features").
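
Once downloaded, the metadata file can be inspected with a few lines of Python before it is used for modeling. The sketch below relies only on the standard `csv` module. The helper `load_feature_metadata` is hypothetical, and because the actual column layout of the export is not described here, the example reads an in-memory stand-in with made-up column names.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_feature_metadata(handle: TextIO) -> List[Dict[str, str]]:
    """Read a feature metadata CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


# In-memory stand-in for the downloaded file. The column names and values
# are hypothetical; the real export may use different columns.
sample = io.StringIO("name,dtype\nfeature_a,FLOAT\nfeature_b,VARCHAR\n")

rows = load_feature_metadata(sample)
print(len(rows))             # 2
print(list(rows[0].keys()))  # ['name', 'dtype']
```

To read the real file, pass an open file handle instead of the stand-in, for example one obtained with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever your browser saved the download.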