Use embeddings¶
In this tutorial, we'll use product group embeddings to compare a customer's latest invoice with their past purchases from the last 26 weeks.
To learn how to create a SQL Embedding User-Defined Function (UDF), check out the 'Bring Your Own Transformer' tutorials.
For our hosted tutorials, we have pre-configured a SQL UDF using the SBERT Transformer model on our Snowflake data warehouse. We'll register this UDF in the Catalog and apply it to analyze the ProductGroup descriptions.
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
17:48:20 | INFO | SDK version: 3.3.1
17:48:20 | INFO | No catalog activated.
17:48:20 | INFO | Using profile: tutorial
17:48:20 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
17:48:20 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
17:48:20 | INFO | SDK version: 3.3.1
17:48:20 | INFO | No catalog activated.
17:48:20 | INFO | Catalog activated: Grocery Dataset SDK Tutorial
Register the F_SBERT_EMBEDDING UDF to the Catalog¶
In [2]:
# Register the F_SBERT_EMBEDDING SQL function as a UDF named 'embedding'.
fb.UserDefinedFunction.create(
    name='embedding',
    sql_function_name='F_SBERT_EMBEDDING',
    function_parameters=[fb.FunctionParameter(name="x", dtype=fb.enum.DBVarType.VARCHAR)],
    output_dtype=fb.enum.DBVarType.EMBEDDING,
    is_global=False,
)
Out[2]:
User Defined Function
| Field | Value |
|---|---|
| name | embedding |
| created_at | 2025-12-04 09:48:21 |
| updated_at | None |
| description | None |
| sql_function_name | F_SBERT_EMBEDDING |
| function_parameters | |
| signature | embedding(x: str) -> embedding |
| output_dtype | EMBEDDING |
| feature_store_name | playground |
| used_by_features | [] |
Apply the embedding UDF instance to ProductGroup¶
In [3]:
# Get embedding UDF instance.
embedding_udf = catalog.get_user_defined_function("embedding")
In [4]:
# Get view from GROCERYPRODUCT dimension table.
groceryproduct_view = catalog.get_view("GROCERYPRODUCT")
# Apply embedding to ProductGroup column in GROCERYPRODUCT view.
groceryproduct_view["ProductGroup_embedding"] = embedding_udf(groceryproduct_view["ProductGroup"])
Get other views¶
In [5]:
# Get view from GROCERYINVOICE event table.
groceryinvoice_view = catalog.get_view("GROCERYINVOICE")
# Get view from INVOICEITEMS item table.
invoiceitems_view = catalog.get_view("INVOICEITEMS")
Join views¶
In [6]:
# Join GROCERYPRODUCT view to INVOICEITEMS view.
invoiceitems_view = invoiceitems_view.join(groceryproduct_view, rprefix="product_")
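The `rprefix="product_"` argument explains why later cells refer to a column named `product_ProductGroup_embedding`. The following is a minimal pure-Python sketch of that naming effect, not FeatureByte's implementation. The rows, the helper `join_with_prefix`, and the use of `GroceryProductGuid` as the join key are assumptions made for illustration.

```python
# Hypothetical miniature rows; values are made up for illustration.
invoice_items = [
    {"GroceryInvoiceGuid": "inv-1", "GroceryProductGuid": "p-1"},
    {"GroceryInvoiceGuid": "inv-1", "GroceryProductGuid": "p-2"},
]
products = {
    "p-1": {"ProductGroup": "Fruits", "ProductGroup_embedding": [1.0, 0.0]},
    "p-2": {"ProductGroup": "Soft drinks", "ProductGroup_embedding": [0.0, 1.0]},
}


def join_with_prefix(items, lookup, key, rprefix):
    """Left-join each item to its lookup row, prefixing the lookup columns."""
    joined = []
    for item in items:
        row = dict(item)
        for column, value in lookup.get(item[key], {}).items():
            row[rprefix + column] = value
        joined.append(row)
    return joined


joined_items = join_with_prefix(
    invoice_items, products, key="GroceryProductGuid", rprefix="product_"
)
print(sorted(joined_items[0]))
```

Each joined row keeps its own columns and gains the product columns under the `product_` prefix.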
Get the mean vector of an invoice's Product Group descriptions¶
In [7]:
# Group invoiceitems_view by invoice entity (GroceryInvoiceGuid).
invoiceitems_view_by_invoice = invoiceitems_view.groupby("GroceryInvoiceGuid")
In [8]:
# Mean vector of product_ProductGroup_embedding for the invoice.
invoice_mean_vector_of_item_product_productgroup_embedding = invoiceitems_view_by_invoice.aggregate(
    "product_ProductGroup_embedding",
    method=fb.AggFunc.AVG,
    feature_name="INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",
)
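Conceptually, averaging embeddings per invoice is an element-wise mean over the vectors of that invoice's items. The sketch below illustrates the idea in plain Python with made-up two-dimensional vectors. It is an illustration only, not how FeatureByte computes the aggregate in the warehouse, and the helper name `mean_vector` is hypothetical.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


# (invoice id, item embedding) pairs; values are made up for illustration.
rows = [
    ("inv-1", [1.0, 0.0]),
    ("inv-1", [0.0, 1.0]),
    ("inv-2", [2.0, 4.0]),
]

# Group item embeddings by invoice, then average each group.
embeddings_by_invoice = defaultdict(list)
for invoice_id, embedding in rows:
    embeddings_by_invoice[invoice_id].append(embedding)

invoice_mean = {
    invoice_id: mean_vector(vectors)
    for invoice_id, vectors in embeddings_by_invoice.items()
}
print(invoice_mean)
```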
Get the mean vector of the Customer's latest invoice¶
In [9]:
# Add INVOICE_Mean_vector_of_item_product_ProductGroup_embedding feature to the GROCERYINVOICE view as a column.
groceryinvoice_view = groceryinvoice_view.add_feature(
    "INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",
    invoice_mean_vector_of_item_product_productgroup_embedding,
)
In [10]:
# Group GROCERYINVOICE view by customer entity (GroceryCustomerGuid).
groceryinvoice_view_by_customer = groceryinvoice_view.groupby(['GroceryCustomerGuid'])
In [11]:
# Get Latest Mean vector of item product_ProductGroup_embedding for the customer
customer_latest_invoice_mean_vector_of_item_product_productgroup_embedding = (
    groceryinvoice_view_by_customer.aggregate_over(
        "INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",
        method="latest",
        feature_names=["CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"],
        windows=[None],
    )["CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"]
)
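A "latest" aggregation with an unbounded window (`windows=[None]`) can be pictured as picking the most recent invoice value known before the point in time. The sketch below shows that idea in plain Python with made-up data. The helper `latest_value` is hypothetical, and the sketch ignores details such as FeatureByte's feature job settings.

```python
from datetime import datetime


def latest_value(events, point_in_time):
    """Return the value of the most recent event before point_in_time, or None."""
    eligible = [(ts, value) for ts, value in events if ts < point_in_time]
    if not eligible:
        return None
    # Compare on the timestamp only, so the vectors themselves are never compared.
    return max(eligible, key=lambda pair: pair[0])[1]


# (invoice timestamp, invoice mean vector) for one customer; values are made up.
invoices = [
    (datetime(2023, 1, 5), [0.2, 0.8]),
    (datetime(2023, 3, 1), [0.9, 0.1]),
    (datetime(2023, 6, 1), [0.4, 0.6]),
]

latest = latest_value(invoices, datetime(2023, 4, 1))
print(latest)
```

The invoice from June is ignored because it falls after the point in time, which is what keeps the feature free of future information.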
Get the mean vector of the Customer's Product Group descriptions over the past 26 weeks¶
In [12]:
# Group INVOICEITEMS view by customer entity (GroceryCustomerGuid).
invoiceitems_view_by_customer = invoiceitems_view.groupby(['GroceryCustomerGuid'])
In [13]:
# Get Mean vector of product_ProductGroup_embedding for the customer over time.
feature_group = invoiceitems_view_by_customer.aggregate_over(
    "product_ProductGroup_embedding",
    method="avg",
    feature_names=[
        "CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w"
    ],
    windows=["26w"],
)
# Get CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w object from feature group.
customer_mean_vector_of_item_product_productgroup_embedding_26w = (
    feature_group["CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w"]
)
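The `"26w"` window restricts the average to items purchased in the 26 weeks leading up to the point in time. A rough plain-Python picture of that windowed mean follows, using made-up data. The helper `windowed_mean_vector` and the exact boundary handling are assumptions of this sketch, not a description of FeatureByte's internals.

```python
from datetime import datetime, timedelta


def windowed_mean_vector(events, point_in_time, window):
    """Element-wise mean of vectors whose timestamp lies in [point_in_time - window, point_in_time)."""
    start = point_in_time - window
    selected = [vector for ts, vector in events if start <= ts < point_in_time]
    if not selected:
        return None
    return [sum(component) / len(selected) for component in zip(*selected)]


# (purchase timestamp, product group embedding) for one customer; values are made up.
purchases = [
    (datetime(2022, 1, 1), [10.0, 10.0]),  # older than 26 weeks, so excluded
    (datetime(2022, 11, 1), [1.0, 0.0]),
    (datetime(2022, 12, 1), [0.0, 1.0]),
]

mean_26w = windowed_mean_vector(purchases, datetime(2023, 1, 1), timedelta(weeks=26))
print(mean_26w)
```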
Derive the similarity between the latest invoice and 26 weeks of purchases¶
In [14]:
# Derive Similarity feature from cosine similarity between
# CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding
# and CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice = (
    customer_latest_invoice_mean_vector_of_item_product_productgroup_embedding.vec.cosine_similarity(
        customer_mean_vector_of_item_product_productgroup_embedding_26w
    )
)
# Give a name to new feature
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.name = (
    "CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice"
)
Preview feature¶
In [15]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
In [16]:
# Preview customer_mean_vector_of_item_product_productgroup_embedding_26w
customer_mean_vector_of_item_product_productgroup_embedding_26w.preview(preview_table)
Out[16]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w |
|---|---|---|---|
| 0 | 2023-02-14 11:48:38 | 889a2767-dbee-4ac0-a030-f2b921f0867c | [-0.052484318513864, 0.05171371964983101, -0.0... |
| 1 | 2023-06-26 15:05:25 | a3ac8a5e-8d9b-40b4-a835-38ccce024110 | [-0.049744474463834, 0.038873286350486, -0.009... |
| 2 | 2022-10-03 20:03:17 | 20941a03-d666-4718-851c-5016640f2c87 | [-0.042910263044031005, 0.034346470350607006, ... |
| 3 | 2023-03-24 09:58:13 | 9c7bb568-3237-4115-8e4b-ad1e2855299a | [-0.052150995857493006, 0.037196888150099, -0.... |
| 4 | 2023-03-27 14:11:18 | f0f67ecc-38a0-4ddf-99a4-a3d531b00aa9 | [-0.062560791503272, 0.029942942868309, -0.009... |
| 5 | 2022-09-30 17:39:57 | a4a2b3f5-69fb-403b-8868-bc360b500add | [-0.047326317238654006, 0.030740150702946, -0.... |
| 6 | 2022-07-26 09:12:59 | 75c6652a-fb95-4d36-aad1-e9a38ad2e4b0 | [-0.048205366552721, 0.025369158073321, -0.016... |
| 7 | 2022-12-05 17:43:17 | c2dc525f-9f21-4b98-87e8-8ded5f0bdd37 | [-0.05604952121268, 0.041603376320935005, -0.0... |
| 8 | 2023-06-14 08:04:37 | 7b7b5ae1-4a84-449b-88f0-37bd54bc5c2c | [-0.052355102333821006, 0.018552341389964002, ... |
| 9 | 2022-09-11 11:55:15 | 0a9b41ff-baec-4e30-b4da-b7ade2a14176 | [-0.053599380285148004, 0.044373493109329004, ... |
In [17]:
# Preview customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.preview(preview_table)
Out[17]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice |
|---|---|---|---|
| 0 | 2023-03-24 09:58:13 | 9c7bb568-3237-4115-8e4b-ad1e2855299a | 0.781048 |
| 1 | 2023-06-26 15:05:25 | a3ac8a5e-8d9b-40b4-a835-38ccce024110 | 0.925759 |
| 2 | 2022-07-26 09:12:59 | 75c6652a-fb95-4d36-aad1-e9a38ad2e4b0 | 0.828945 |
| 3 | 2022-09-30 17:39:57 | a4a2b3f5-69fb-403b-8868-bc360b500add | 0.718581 |
| 4 | 2023-06-14 08:04:37 | 7b7b5ae1-4a84-449b-88f0-37bd54bc5c2c | 0.895320 |
| 5 | 2022-12-05 17:43:17 | c2dc525f-9f21-4b98-87e8-8ded5f0bdd37 | 0.713184 |
| 6 | 2022-10-03 20:03:17 | 20941a03-d666-4718-851c-5016640f2c87 | 0.493135 |
| 7 | 2023-03-27 14:11:18 | f0f67ecc-38a0-4ddf-99a4-a3d531b00aa9 | 0.532114 |
| 8 | 2023-02-14 11:48:38 | 889a2767-dbee-4ac0-a030-f2b921f0867c | 0.596505 |
| 9 | 2022-09-11 11:55:15 | 0a9b41ff-baec-4e30-b4da-b7ade2a14176 | 0.898229 |
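The scores in the preview above are cosine similarities: the dot product of the two mean vectors divided by the product of their lengths. Values close to 1 suggest the latest invoice resembles the customer's purchases over the past 26 weeks, while lower values suggest a departure from them. The following is a minimal plain-Python sketch of the metric, with made-up vectors. The zero-vector convention is a choice made for this sketch, not a statement about FeatureByte's behavior.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


latest_invoice_vector = [1.0, 1.0]   # made-up latest invoice mean vector
history_vector = [1.0, 0.0]          # made-up 26-week mean vector
print(round(cosine_similarity(latest_invoice_vector, history_vector), 4))
```

Because cosine similarity depends only on direction, scaling either vector leaves the score unchanged, which makes it a reasonable way to compare mean vectors built from different numbers of items.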
Save feature¶
In [18]:
# Save feature
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.save()
Done! |████████████████████████████████████████| 100% in 6.2s (0.16%/s)
Add description¶
In [19]:
# Add description
customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.update_description(
    "Compare the customer's 26w Mean vector of item "
    "product_ProductGroup_embedding with the customer's most recent "
    "invoice. This comparison is done using the Cosine Similarity metric to"
    " measure how similar these mean vector embeddings are."
)