Skip to content

13. Unleash Your Creativity

In this final section of our tutorial series, we'll demonstrate three powerful ways FeatureByte enhances your ability to perform advanced feature engineering.

  1. Edit features suggested by FeatureByte Copilot.
  2. Use FeatureByte's filtering capability to unearth features that concentrate on specific subsets of your data.
  3. Make FeatureByte Copilot use your own User Defined Function to leverage the power of transformer models and generate more advanced features from text or other data.

1. Edit Features suggested by FeatureByte Copilot

FeatureByte's Copilot provides suggested features that can be further customized within our SDK. We will guide you through the process of downloading a feature notebook, editing its code, and adding the newly customized feature to your catalog.

For example, you will learn how to modify the suggested feature CUSTOMER_Max_of_INVOICE_Sum_of_items_Discounts_To_Sum_of_items_TotalCosts.

This feature calculates the ratio of discounts to total costs for each invoice, then finds the maximum ratio for each customer over the past four weeks.

Screenshots guide you through each step, from downloading to modifying and running the notebook, and analyzing the feature in the Feature Catalog.

Step 1: Select a Feature and Download its Notebook

Let's select the feature suggested by Copilot in Feature Ideation.

Name

Download the notebook by selecting the feature and clicking notebook icon.

Name

Step 2: Explore the Notebook

Explore the Feature SDK code in the notebook.

Name Name Name

Step 3: Edit the Notebook to Create a New Feature

Edit the feature calculation from maximum to standard deviation.

Name

Step 4: Add New Feature to the Catalog and Analyze it

Run the notebook to add the new feature to the Feature Catalog. Analyze the new feature's semantic relevanceby clicking relevance icon.

Name

You can also run EDA and check the feature's predictive power for the use case.

Name

2. Discover New Features with Filtering

Uncover new features by applying filters to specific data subsets. This section demonstrates starting a new feature ideation session with a focus on the ProductGroup column. It involves:

  • Adding filters to narrow down the data.
  • Selecting relevant product groups such as staple foods and dairy.
  • Considering group suggestions for broader categories, like alcoholic items, and refining your selections.
  • Using the Copilot to suggest and evaluate the semantic relevance of these filters.

Illustrative screenshots will show you how to navigate the filter configuration and selection process.

Step 1: Start a New Ideation

Start a new ideation session where we will configure a new filter by clicking new ideation icon.

Name

In the Feature Ideation Configuration, scroll down to the Filters section.

Name

Name

Step 2: Choose Column for the Filter

Click 'Add Filters' for the INVOICEITEMS table.

Name

Select 'ProductGroup' as the column to use for our new filter.

Name

Step 3: Get Suggestions for Column Individual Values

Click on top 25 icon to identify the top 25 most semantically relevant ProductGroup values for our use case to predict a grocery customer's activity next week.

Name

Most relevant ProductGroup values belong to staple foods, dairy, and other fresh products.

Name

Step 4: Get Suggestions to Group Column values

To group these values, click on group icon to identify the most relevant groups for our use case.

Name

A group of alcoholic items appears first. Let's select another group, the staple food group suggested.

Name

If you had selected items before, confirm whether you want to overwrite this prior selection.

Name

Step 5: Finalize Your New Filter Settings

Confirm your final selection.

Name

Optionally, provide an alias for the condition.

Name

Copilot will suggest a name for this filter and evaluate its semantic relevance.

Name

You can change the filter type to secondary filter if you want only simple features to be produced for this filter.

Name

You can specify whether features using the non filtered table should be also produced.

Name

Step 6: Run Feature Ideation with Your New Filer

You are now set to run Feature Ideation by clicking Name

3. Enhance Text Data with User-Defined Functions

Leverage transformer models to generate advanced features from text data using User Defined Functions (UDFs).

We’ll walk you through:

  • Registering a SQL UDF in the catalog, using an SBERT Transformer model added to the tutorial's Snowflake data warehouse.
  • Starting a feature ideation session using this UDF on the ProductGroup column, which contains descriptive text.
  • Generating new features using the transformer model to analyze textual data semantically.

Step 1: Register the UDF in the Catalog.

Run this Python code in a notebook.

import featurebyte as fb
fb.use_profile("tutorial")

catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)

fb.UserDefinedFunction.create(
    name='embedding',
    sql_function_name='F_SBERT_EMBEDDING',
    function_parameters=[fb.FunctionParameter(name="x", dtype=fb.enum.DBVarType.VARCHAR)],
    output_dtype=fb.enum.DBVarType.ARRAY,
    is_global=False,
)

Step 2: Start a New Ideation

Start a new ideation where we will use the SBERT Transformer model by clicking new ideation icon.

Name

Step 3: Apply the Transform to the item's ProductGroup

Name

Apply the transformer to the ProductGroup column, which contains descriptive text.

Name Name

Step 4: Run Feature Ideation

You are now set to run Feature Ideation by clicking Name

Step 5: Analyze New Features

Copilot generated 16 new features using the embedding. Name

One interesting feature is CUSTOMER_vs_OVERALL_Mean_vector_of_items_product_ProductGroup_embeddings_4w. This feature computes the similarity between an individual customer and other customers by calculating the Cosine Similarity between the Mean vector of items product_ProductGroup_embeddings for both the customer and all customers over a 4w period.

Name

This new feature, while not the most predictive, carries interesting signals that are likely to complement more traditional features and might help build stronger models.

Name