Use embeddings¶
In this tutorial, we'll use product group embeddings to compare a customer's latest invoice with their past purchases from the last 26 weeks.
To learn how to create a SQL Embedding User-Defined Function (UDF), check out the 'Bring Your Own Transformer' tutorials.
For our hosted tutorials, we have pre-configured a SQL UDF using the SBERT Transformer model on our Snowflake data warehouse. We'll register this UDF in the Catalog and apply it to analyze the ProductGroup descriptions.
In [1]:
import featurebyte as fb

# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
    
14:27:17 | INFO | SDK version: 3.0.1.dev45
14:27:17 | INFO | No catalog activated.
14:27:17 | INFO | Using profile: tutorial
14:27:17 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
14:27:17 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
14:27:17 | INFO | SDK version: 3.0.1.dev45
14:27:17 | INFO | No catalog activated.
14:27:17 | INFO | Catalog activated: Grocery Dataset Tutorial
Register the F_SBERT_EMBEDDING UDF to the Catalog¶
In [2]:
fb.UserDefinedFunction.create(
    name='embedding',
    sql_function_name='F_SBERT_EMBEDDING',
    function_parameters=[fb.FunctionParameter(name="x", dtype=fb.enum.DBVarType.VARCHAR)],
    output_dtype=fb.enum.DBVarType.EMBEDDING,
    is_global=False,
)
    
Out[2]:
User Defined Function
| name | embedding |
| created_at | 2025-06-02 06:27:17 |
| updated_at | None |
| description | None |
| sql_function_name | F_SBERT_EMBEDDING |
| function_parameters | |
| signature | embedding(x: str) -> embedding |
| output_dtype | EMBEDDING |
| feature_store_name | playground |
| used_by_features | [] |
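The registered signature, `embedding(x: str) -> embedding`, is the whole contract: a string goes in and a fixed-length numeric vector comes out. The toy function below only illustrates that contract. It hashes character trigrams into a small vector and is not the SBERT model behind `F_SBERT_EMBEDDING`; `toy_embedding`, its `dim` parameter and the sample string are hypothetical, and unlike SBERT its output carries no semantic meaning.

```python
import hashlib
import math


def toy_embedding(text, dim=8):
    """Map a string to a unit-length list of `dim` floats (toy stand-in, not SBERT)."""
    vector = [0.0] * dim
    padded = f"  {text.lower()} "
    # Count character trigrams into `dim` buckets chosen by a stable hash.
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        digest = hashlib.sha256(trigram.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    # Normalise to unit length so vectors of different texts are comparable.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


vec = toy_embedding("Soft drinks")  # hypothetical product group description
print(len(vec))  # 8
```

The same input always yields the same vector, which is the property the downstream averages and similarity comparisons rely on.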
Apply the embedding UDF instance to ProductGroup¶
In [3]:
# Get embedding UDF instance.
embedding_udf = catalog.get_user_defined_function("embedding")
    
In [4]:
# Get view from GROCERYPRODUCT dimension table.
groceryproduct_view = catalog.get_view("GROCERYPRODUCT")
# Apply embedding to ProductGroup column in GROCERYPRODUCT view.
groceryproduct_view["ProductGroup_embedding"] = embedding_udf(groceryproduct_view["ProductGroup"])
    
Get other views¶
In [5]:
# Get view from GROCERYINVOICE event table.
groceryinvoice_view = catalog.get_view("GROCERYINVOICE")
# Get view from INVOICEITEMS item table.
invoiceitems_view = catalog.get_view("INVOICEITEMS")
    
Join views¶
In [6]:
# Join GROCERYPRODUCT view to INVOICEITEMS view.
invoiceitems_view = invoiceitems_view.join(groceryproduct_view, rprefix="product_")
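As a rough mental model of this join, each invoice item picks up the columns of its matching product row, with the `rprefix` value prepended to the added column names. The sketch below mimics that with plain dictionaries. The rows, the `GroceryProductGuid` key name and the two-dimensional vectors are invented for illustration, and the code does not call the FeatureByte SDK.

```python
# Hypothetical rows; the real views live in the data warehouse.
products = {
    "p1": {"ProductGroup": "Fruits", "ProductGroup_embedding": [1.0, 0.0]},
    "p2": {"ProductGroup": "Cheese", "ProductGroup_embedding": [0.0, 1.0]},
}
invoice_items = [
    {"GroceryInvoiceGuid": "inv1", "GroceryProductGuid": "p1"},
    {"GroceryInvoiceGuid": "inv1", "GroceryProductGuid": "p2"},
]


def join_with_prefix(items, dimension, key, rprefix):
    """Left-join dimension columns onto each item, prefixing the added names."""
    joined = []
    for item in items:
        row = dict(item)
        for column, value in dimension.get(item[key], {}).items():
            row[rprefix + column] = value
        joined.append(row)
    return joined


joined_items = join_with_prefix(invoice_items, products, "GroceryProductGuid", "product_")
print(sorted(joined_items[0]))
```

This is why the later cells refer to the column as `product_ProductGroup_embedding` rather than `ProductGroup_embedding`.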
    
Get the mean vector of an invoice's Product Group descriptions¶
In [7]:
# Group invoiceitems_view by invoice entity (GroceryInvoiceGuid).
invoiceitems_view_by_invoice = invoiceitems_view.groupby("GroceryInvoiceGuid")
    
In [8]:
# Mean vector of product_ProductGroup_embedding for the invoice.
invoice_mean_vector_of_item_product_productgroup_embedding = invoiceitems_view_by_invoice.aggregate(
    "product_ProductGroup_embedding", method=fb.AggFunc.AVG,
    feature_name="INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"
)
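Averaging embeddings is an element-wise mean: the i-th component of the invoice vector is the mean of the i-th components of its items' vectors. The plain-Python sketch below shows what the `AVG` aggregation is assumed to compute for a vector column. The three-dimensional vectors are made up (real SBERT embeddings typically have hundreds of dimensions), and `mean_vector` is a hypothetical helper, not an SDK function.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical item embeddings for a single invoice.
invoice_item_embeddings = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
invoice_mean = mean_vector(invoice_item_embeddings)
print(invoice_mean)  # [2.0, 1.0, 1.0]
```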
    
Get the mean vector of the Customer's latest invoice¶
In [9]:
# Add INVOICE_Mean_vector_of_item_product_ProductGroup_embedding feature to the GROCERYINVOICE view as a column.
groceryinvoice_view = groceryinvoice_view.add_feature(
    "INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",
    invoice_mean_vector_of_item_product_productgroup_embedding
)
    
In [10]:
# Group GROCERYINVOICE view by customer entity (GroceryCustomerGuid).
groceryinvoice_view_by_customer = groceryinvoice_view.groupby(['GroceryCustomerGuid'])
    
In [11]:
# Get Latest Mean vector of item product_ProductGroup_embedding for the customer
customer_latest_invoice_mean_vector_of_item_product_productgroup_embedding =\
groceryinvoice_view_by_customer.aggregate_over(
    "INVOICE_Mean_vector_of_item_product_ProductGroup_embedding", method="latest",
    feature_names=["CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"],
    windows=[None]
)["CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"]
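With `method="latest"` and an unbounded window (`windows=[None]`), the feature is understood to take, for each customer, the value from the most recent invoice before the point in time. Here is a simplified sketch of that selection; the invoices and the `latest_per_customer` helper are invented, and feature job settings are ignored.

```python
from datetime import datetime

# Hypothetical (customer, invoice timestamp, invoice mean vector) rows.
invoices = [
    ("c1", datetime(2023, 1, 5), [0.1, 0.9]),
    ("c1", datetime(2023, 3, 2), [0.4, 0.6]),
    ("c2", datetime(2023, 2, 1), [0.8, 0.2]),
    ("c1", datetime(2023, 6, 1), [0.7, 0.3]),  # after the point in time used below
]


def latest_per_customer(rows, point_in_time):
    """Keep, per customer, the vector of the most recent row before point_in_time."""
    latest = {}
    for customer, ts, vector in rows:
        if ts >= point_in_time:
            continue  # never look into the future of the observation
        if customer not in latest or ts > latest[customer][0]:
            latest[customer] = (ts, vector)
    return {customer: vector for customer, (_, vector) in latest.items()}


latest_vectors = latest_per_customer(invoices, datetime(2023, 5, 1))
print(latest_vectors)  # {'c1': [0.4, 0.6], 'c2': [0.8, 0.2]}
```

Excluding rows at or after the point in time is what keeps the feature free of leakage when it is computed for historical observations.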
    
Get the mean vector for the Customer's Product Group descriptions over the past 26 weeks¶
In [12]:
# Group INVOICEITEMS view by customer entity (GroceryCustomerGuid).
invoiceitems_view_by_customer = invoiceitems_view.groupby(['GroceryCustomerGuid'])
    
In [13]:
# Get Mean vector of product_ProductGroup_embedding for the customer over time.
feature_group = invoiceitems_view_by_customer.aggregate_over(
    "product_ProductGroup_embedding", method="avg",
    feature_names=[
        "CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w"
    ],
    windows=["26w"],
)
# Get CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w object from feature group.
customer_mean_vector_of_item_product_productgroup_embedding_26w =\
feature_group["CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w"]
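A `"26w"` window restricts the average to items purchased in the 26 weeks leading up to the point in time. The sketch below illustrates the idea with `datetime.timedelta`. The rows and the `windowed_mean` helper are invented, and the real computation also depends on the table's feature job settings, which are ignored here.

```python
from datetime import datetime, timedelta

# Hypothetical (purchase timestamp, product group embedding) rows for one customer.
purchases = [
    (datetime(2022, 9, 1), [9.0, 9.0]),   # older than 26 weeks: excluded
    (datetime(2023, 2, 1), [1.0, 0.0]),
    (datetime(2023, 4, 15), [0.0, 1.0]),
]


def windowed_mean(rows, point_in_time, window):
    """Element-wise mean of vectors with timestamps in [point_in_time - window, point_in_time)."""
    start = point_in_time - window
    in_window = [vector for ts, vector in rows if start <= ts < point_in_time]
    if not in_window:
        return None  # no purchases in the window
    dim = len(in_window[0])
    return [sum(v[i] for v in in_window) / len(in_window) for i in range(dim)]


mean_26w = windowed_mean(purchases, datetime(2023, 5, 1), timedelta(weeks=26))
print(mean_26w)  # [0.5, 0.5]
```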
    
Derive similarity between the latest invoice and the past 26 weeks of purchases¶
In [14]:
# Derive Similarity feature from cosine similarity between
# CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding
# and CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice =\
customer_latest_invoice_mean_vector_of_item_product_productgroup_embedding.vec.cosine_similarity(
    customer_mean_vector_of_item_product_productgroup_embedding_26w
)
# Give a name to new feature
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.name = \
"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice"
    
Preview feature¶
In [15]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
    
In [16]:
# Preview customer_mean_vector_of_item_product_productgroup_embedding_26w
customer_mean_vector_of_item_product_productgroup_embedding_26w.preview(preview_table)
    
Out[16]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w |
|---|---|---|---|
| 0 | 2023-06-09 16:38:32 | 6acb20fd-605d-4982-aa39-77054f08103c | [-0.055679653671241, 0.036607537203578004, -0.... | 
| 1 | 2023-05-25 05:21:12 | cf670b7a-c6bf-4598-b0c0-400378b9cab6 | [-0.036810443362098, 0.054307278078735005, -0.... | 
| 2 | 2022-11-01 14:32:22 | 8687e2a4-7f97-4442-873c-5c52d74404f8 | [-0.05579970813751801, 0.038974709615827004, -... | 
| 3 | 2023-01-06 14:38:32 | e63c0f14-3530-49e9-b73e-f92594e82663 | [-0.05125611890158, 0.0256558639652, -0.012342... | 
| 4 | 2023-04-26 19:36:57 | f867935a-d33a-43d1-b3bc-02c539769836 | [-0.049507673506261, 0.036102647723594006, -0.... | 
| 5 | 2023-06-02 14:24:28 | 0154e4b4-25a4-4276-af72-2826bbc64c31 | [-0.07301349714239201, 0.041353363137788, -0.0... | 
| 6 | 2022-07-23 15:32:29 | d87d65b8-4f78-41cc-8bd3-0064f83fe4fb | [-0.043172992735109, 0.033448811410606, -0.017... | 
| 7 | 2023-03-20 15:08:44 | 6f5299d0-fa38-4707-8108-1b66805d84e5 | [-0.042076902204549, 0.024071049535537, -0.006... | 
| 8 | 2023-05-04 15:15:25 | 099cb405-5b2d-4dba-9071-a157ff0dbadc | [-0.052660999834658, 0.030567513644866002, -0.... | 
| 9 | 2023-02-28 10:28:24 | 59b63729-d448-4496-8f36-de26a91e2310 | [-0.048510884983021, 0.034143495114781, -0.026... | 
In [17]:
# Preview customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.preview(preview_table)
    
Out[17]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice |
|---|---|---|---|
| 0 | 2022-07-23 15:32:29 | d87d65b8-4f78-41cc-8bd3-0064f83fe4fb | 0.580733 | 
| 1 | 2023-02-28 10:28:24 | 59b63729-d448-4496-8f36-de26a91e2310 | 0.913123 | 
| 2 | 2023-05-25 05:21:12 | cf670b7a-c6bf-4598-b0c0-400378b9cab6 | 0.444495 | 
| 3 | 2023-04-26 19:36:57 | f867935a-d33a-43d1-b3bc-02c539769836 | 0.636809 | 
| 4 | 2023-01-06 14:38:32 | e63c0f14-3530-49e9-b73e-f92594e82663 | 0.869239 | 
| 5 | 2023-06-09 16:38:32 | 6acb20fd-605d-4982-aa39-77054f08103c | 0.518619 | 
| 6 | 2023-03-20 15:08:44 | 6f5299d0-fa38-4707-8108-1b66805d84e5 | 0.875395 | 
| 7 | 2023-05-04 15:15:25 | 099cb405-5b2d-4dba-9071-a157ff0dbadc | 0.926910 | 
| 8 | 2022-11-01 14:32:22 | 8687e2a4-7f97-4442-873c-5c52d74404f8 | 0.533653 | 
| 9 | 2023-06-02 14:24:28 | 0154e4b4-25a4-4276-af72-2826bbc64c31 | 0.794856 | 
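The values in this table are cosine similarities: the dot product of the two mean vectors divided by the product of their norms. A value of 1.0 means the latest invoice points in the same direction as the 26-week average, and values near 0 mean the two have little in common. A minimal plain-Python sketch with made-up vectors follows; returning `None` for a zero vector is a choice made here, not necessarily what the SDK does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, or None if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None  # similarity is undefined for a zero vector
    return dot / (norm_a * norm_b)


# Hypothetical two-dimensional stand-ins for the two customer features.
latest_invoice_vector = [1.0, 0.0]
mean_vector_26w = [1.0, 1.0]
similarity = cosine_similarity(latest_invoice_vector, mean_vector_26w)
print(round(similarity, 4))  # 0.7071
```

Because the measure depends only on direction, a small invoice and a large one with the same mix of product groups score the same.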
Save feature¶
In [18]:
# Save feature
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.save()
    
Done! |████████████████████████████████████████| 100% in 6.2s (0.16%/s)
Add description¶
In [19]:
# Add description
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.update_description(
    "Compare the customer's 26w Mean vector of item "
    "product_ProductGroup_embedding with the customer's most recent "
    "invoice. This comparison is done using the Cosine Similarity metric to"
    " measure how similar these mean vector embeddings are."
)