INVOICE Avg ProductGroup Embedding
Aggregate embedding features
Embedding features can also be aggregated with methods such as average or max. These aggregations are usually called average pooling and max pooling.
This is useful, for example, when computing a single averaged representation of the set of products on a grocery invoice.
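To make the two pooling methods concrete, here is a minimal NumPy sketch that is independent of the FeatureByte SDK. The three-dimensional vectors are made-up toy values, not real SBERT embeddings.

```python
import numpy as np

# Toy embeddings for three products on one invoice (hypothetical values).
embeddings = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 1.0],
])

# Average pooling: element-wise mean across the products.
avg_pooled = embeddings.mean(axis=0)

# Max pooling: element-wise maximum across the products.
max_pooled = embeddings.max(axis=0)

print(avg_pooled)  # [2. 2. 1.]
print(max_pooled)  # [3. 4. 2.]
```

Both methods collapse a variable number of product vectors into one vector of the same dimension, which is what makes the result usable as a fixed-size feature.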
In [1]:
import featurebyte as fb
fb.use_profile("tutorial")
18:59:48 | INFO | Using configuration file at: /Users/viktor/.featurebyte/config.yaml
18:59:48 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
18:59:48 | WARNING | Remote SDK version (0.5.1.dev70) is different from local (0.5.1.dev63). Update local SDK to avoid unexpected behavior.
18:59:48 | INFO | No catalog activated.
18:59:48 | INFO | 10 feature lists, 59 features deployed
18:59:48 | INFO | Using profile: tutorial
18:59:49 | INFO | Using configuration file at: /Users/viktor/.featurebyte/config.yaml
18:59:49 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
18:59:49 | WARNING | Remote SDK version (0.5.1.dev70) is different from local (0.5.1.dev63). Update local SDK to avoid unexpected behavior.
18:59:49 | INFO | No catalog activated.
18:59:49 | INFO | 10 feature lists, 59 features deployed
In [2]:
catalog = fb.Catalog.activate("Grocery Dataset Tutorial")
18:59:50 | INFO | Catalog activated: Grocery Dataset Tutorial
Get the previously defined UDF
The function was previously created via:
embedding_udf = fb.UserDefinedFunction.create(
    name='embedding',
    sql_function_name='F_SBERT_EMBEDDING',
    function_parameters=[fb.FunctionParameter(name="x", dtype=fb.enum.DBVarType.VARCHAR)],
    output_dtype=fb.enum.DBVarType.ARRAY,
    is_global=False,
)
It can be retrieved by name as follows:
In [3]:
embedding_udf = fb.UserDefinedFunction.get("embedding")
Get views
In [4]:
# Get view from GROCERYPRODUCT dimension table.
groceryproduct_view = catalog.get_view("GROCERYPRODUCT")
# Get view from INVOICEITEMS item table.
invoiceitems_view = catalog.get_view("INVOICEITEMS")
Run the embedding UDF on the ProductGroup column
In [5]:
groceryproduct_view["ProductGroupEmbedding"] = embedding_udf(groceryproduct_view["ProductGroup"])
Join views
In [6]:
# Join GROCERYPRODUCT view to INVOICEITEMS view.
invoiceitems_view = invoiceitems_view.join(groceryproduct_view, rsuffix="")
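The generated definition at the end of this notebook records this join as a left join on GroceryProductGuid. As a rough analogy only, the pandas sketch below shows what such a join does to the rows. The two small data frames are hypothetical stand-ins for the views, and pandas is not what the SDK uses internally.

```python
import pandas as pd

# Hypothetical toy stand-ins for the INVOICEITEMS and GROCERYPRODUCT views.
items = pd.DataFrame({
    "GroceryInvoiceGuid": ["inv-1", "inv-1", "inv-2"],
    "GroceryProductGuid": ["p-1", "p-2", "p-1"],
})
products = pd.DataFrame({
    "GroceryProductGuid": ["p-1", "p-2"],
    "ProductGroup": ["Fruits", "Bread"],
})

# Left join: every invoice item is kept and gains its product's columns.
joined = items.merge(products, on="GroceryProductGuid", how="left")
print(joined)
```

After the join, each invoice item row carries the attributes of its product, so columns added to the product view (such as the embedding) become available for aggregation per invoice.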
Calculate the average-pooled representation of the products in the basket
In [7]:
ivoice_product_group_embedding = invoiceitems_view.groupby("GroceryInvoiceGuid").aggregate(
    "ProductGroupEmbedding",
    method=fb.AggFunc.AVG,
    feature_name="INVOICE_Average_ProductGroup_Embedding",
)
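Conceptually, the aggregation above groups the item rows by invoice and averages their embedding vectors element by element. The sketch below illustrates that idea with pandas and NumPy on hypothetical two-dimensional toy vectors. It is an illustration of the concept, not the SDK's implementation, which runs in the data warehouse.

```python
import numpy as np
import pandas as pd

# Hypothetical joined rows: one row per invoice item with its toy embedding.
joined = pd.DataFrame({
    "GroceryInvoiceGuid": ["inv-1", "inv-1", "inv-2"],
    "ProductGroupEmbedding": [
        np.array([1.0, 3.0]),
        np.array([3.0, 5.0]),
        np.array([0.5, 0.5]),
    ],
})

# For each invoice, stack its embeddings and take the element-wise mean.
avg_embedding = {
    invoice: np.stack(group["ProductGroupEmbedding"].to_list()).mean(axis=0)
    for invoice, group in joined.groupby("GroceryInvoiceGuid")
}

print(avg_embedding["inv-1"])  # [2. 4.]
print(avg_embedding["inv-2"])  # [0.5 0.5]
```

An invoice with a single item simply keeps that item's embedding, while invoices with several items get the centroid of their product group vectors.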
Get the previously created observation table
It can be created via:
observation_table = invoiceitems_view.create_observation_table(
    name="Preview tables with Invoice Items",
    sample_rows=10,
    columns=["Timestamp", "GroceryInvoiceItemGuid"],
    columns_rename_mapping={
        "Timestamp": "POINT_IN_TIME",
        "GroceryInvoiceItemGuid": "GROCERYINVOICEITEMGUID",
    },
)
In [8]:
observation_table = catalog.get_observation_table("Preview tables with Invoice Items")
ivoice_product_group_embedding.preview(observation_table.to_pandas())
Downloading table |████████████████████████████████████████| 10/10 [100%] in 0.1
Out[8]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | INVOICE_Average_ProductGroup_Embedding |
|---|---|---|---|
| 0 | 2022-02-27 12:19:06 | d307efab-fc40-4b16-be88-2d13b70d8903 | [-0.05745803168172, 0.029082955069968, -0.0123... |
| 1 | 2022-03-19 13:17:52 | e42054bf-fc35-4248-a279-16dc7ac8efa5 | [-0.033861739560962, 0.035966096445918, -0.005... |
| 2 | 2022-04-13 19:50:26 | b3afcbe1-dd98-41f3-bb79-9133bd316dae | [-0.025993689894676, 0.035225141793489005, -0.... |
| 3 | 2022-09-27 12:46:45 | 48156112-a586-42ee-ab1d-fc53629f438a | [-0.046910252094015004, 0.028634541244669003, ... |
| 4 | 2022-11-04 12:15:04 | 23563a05-76be-44e7-9fe6-6bb4e4d11d2b | [-0.050647039982406, 0.016964060931721002, 0.0... |
| 5 | 2023-03-09 12:15:30 | 57c7d176-f2e3-48d4-94a8-d7f3fbc726a3 | [-0.05835969094187, -0.021431716857478002, -0.... |
| 6 | 2023-04-20 12:17:35 | 20779d9e-69c3-4135-a42b-6e7a10819136 | [-0.034671933725149005, 0.04667551081889, -0.0... |
| 7 | 2023-06-12 14:14:32 | 8717d1fa-6708-4a49-b022-b21f89d5060b | [-0.063282540440559, 0.007787061482668001, -0.... |
| 8 | 2023-07-03 13:14:46 | 4b00f6c0-0913-4608-b0f2-f344ad57481b | [-0.042198220500723006, -0.001129888463765, -0... |
| 9 | 2023-08-09 11:17:46 | c851f86c-f52e-4b55-8311-b62f55be6945 | [-0.051229094560056004, 0.07612699148771601, -... |
Save the feature and view its definition file
In [9]:
ivoice_product_group_embedding.save()
Done! |████████████████████████████████████████| 100% in 6.5s (0.16%/s)
In [10]:
# Add description
ivoice_product_group_embedding.update_description(
    "Average product group embedding in grocery customer invoice"
)
# See feature definition file
ivoice_product_group_embedding.definition
Out[10]:
# Generated by SDK version: 0.5.1.dev70
from bson import ObjectId
from featurebyte import DimensionTable
from featurebyte import ItemTable
from featurebyte import UserDefinedFunction

# dimension_table name: "GROCERYPRODUCT"
dimension_table = DimensionTable.get_by_id(ObjectId("6553b0012c001f76d263e059"))
dimension_view = dimension_table.get_view(
    view_mode="manual", drop_column_names=[], column_cleaning_operations=[]
)
col = dimension_view["ProductGroup"]

# udf_name: embedding, sql_function_name: F_SBERT_EMBEDDING
udf_embedding = UserDefinedFunction.get_by_id(
    ObjectId("6553b511850516ee23c8b734")
)
col_1 = udf_embedding(col)
view = dimension_view.copy()
view["ProductGroupEmbedding"] = col_1

# item_table name: "INVOICEITEMS", event_table name: "GROCERYINVOICE"
item_table = ItemTable.get_by_id(ObjectId("6553b0002c001f76d263e058"))
item_view = item_table.get_view(
    event_suffix=None,
    view_mode="manual",
    drop_column_names=["record_available_at"],
    column_cleaning_operations=[],
    event_drop_column_names=["record_available_at"],
    event_column_cleaning_operations=[],
    event_join_column_names=[
        "Timestamp",
        "GroceryInvoiceGuid",
        "GroceryCustomerGuid",
        "tz_offset",
    ],
)
joined_view = item_view.join(
    view, on="GroceryProductGuid", how="left", rsuffix="", rprefix=""
)
feat = joined_view.groupby(
    by_keys=["GroceryInvoiceGuid"], category=None
).aggregate(
    value_column="ProductGroupEmbedding",
    method="avg",
    feature_name="INVOICE_Average_ProductGroup_Embedding",
    skip_fill_na=True,
)
output = feat
output.save(_id=ObjectId("6553b59ee582cfd133727770"))