{ "cells": [ { "cell_type": "markdown", "id": "8064a52e", "metadata": {}, "source": [ "### Use embeddings\n", "\n", "In this tutorial, we'll use product group embeddings to compare a customer's latest invoice with their past purchases from the last 26 weeks.\n", "\n", "To learn how to create a SQL Embedding User-Defined Function (UDF), check out the ['Bring Your Own Transformer' tutorials](https://docs.featurebyte.com/latest/get_started/bring_your_own_transformer/overview/).\n", "\n", "For our hosted tutorials, we have pre-configured a SQL UDF using the SBERT Transformer model on our Snowflake data warehouse. We'll register this UDF in the Catalog and apply it to analyze the ProductGroup descriptions." ] }, { "cell_type": "code", "execution_count": 1, "id": "f517d2ac-9dca-47a7-80d9-121269e43bf6", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:03.033820Z", "iopub.status.busy": "2024-06-12T08:09:03.033733Z", "iopub.status.idle": "2024-06-12T08:09:06.652491Z", "shell.execute_reply": "2024-06-12T08:09:06.652142Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[33;20mWARNING \u001b[0m\u001b[0m | \u001b[1m\u001b[33;20mService endpoint is inaccessible: http://featurebyte-server:8088\u001b[0m\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[38;20mINFO \u001b[0m\u001b[0m | \u001b[1m\u001b[38;20mUsing profile: tutorial\u001b[0m\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[38;20mINFO \u001b[0m\u001b[0m | \u001b[1m\u001b[38;20mUsing configuration file at: /Users/gxav/.featurebyte/config.yaml\u001b[0m\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[38;20mINFO \u001b[0m\u001b[0m | \u001b[1m\u001b[38;20mActive profile: tutorial 
(https://tutorials.featurebyte.com/api/v1)\u001b[0m\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[33;20mWARNING \u001b[0m\u001b[0m | \u001b[1m\u001b[33;20mRemote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.\u001b[0m\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[38;20mINFO \u001b[0m\u001b[0m | \u001b[1m\u001b[38;20mNo catalog activated.\u001b[0m\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32;20m16:09:06\u001b[0m | \u001b[1m\u001b[38;20mINFO \u001b[0m\u001b[0m | \u001b[1m\u001b[38;20mCatalog activated: Grocery Dataset Tutorial\u001b[0m\u001b[0m\n" ] } ], "source": [ "import featurebyte as fb\n", "\n", "# Set your profile to the tutorial environment\n", "fb.use_profile(\"tutorial\")\n", "\n", "catalog_name = \"Grocery Dataset Tutorial\"\n", "catalog = fb.Catalog.activate(catalog_name) " ] }, { "cell_type": "markdown", "id": "0a49a221-6061-4df7-8fc1-65fc07450ef5", "metadata": {}, "source": [ "#### Register the F_SBERT_EMBEDDING UDF to the Catalog" ] }, { "cell_type": "code", "execution_count": 2, "id": "b573c7ba-8321-4a4c-9c32-80591b9982b3", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:06.654607Z", "iopub.status.busy": "2024-06-12T08:09:06.654443Z", "iopub.status.idle": "2024-06-12T08:09:48.769811Z", "shell.execute_reply": "2024-06-12T08:09:48.769535Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t
User Defined Function
\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t
nameembedding
created_at2024-06-12 08:09:06
updated_atNone
descriptionNone
sql_function_nameF_SBERT_EMBEDDING
function_parameters\n", "\t\t\t\t
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namedtypedefault_valuetest_value
0xVARCHARNoneNone
\n", "\t\t\t
signatureembedding(x: str) -> array
output_dtypeARRAY
feature_store_nameplayground
used_by_features[]
\n", "\n" ], "text/plain": [ "\n", "{\n", " 'name': 'embedding',\n", " 'created_at': '2024-06-12T08:09:06.796000',\n", " 'updated_at': None,\n", " 'description': None,\n", " 'sql_function_name': 'F_SBERT_EMBEDDING',\n", " 'function_parameters': [\n", " {\n", " 'name': 'x',\n", " 'dtype': 'VARCHAR',\n", " 'default_value': None,\n", " 'test_value': None\n", " }\n", " ],\n", " 'signature': 'embedding(x: str) -> array',\n", " 'output_dtype': 'ARRAY',\n", " 'feature_store_name': 'playground',\n", " 'used_by_features': []\n", "}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fb.UserDefinedFunction.create(\n", " name='embedding', \n", " sql_function_name='F_SBERT_EMBEDDING',\n", " function_parameters=[fb.FunctionParameter(name=\"x\", dtype=fb.enum.DBVarType.VARCHAR)],\n", " output_dtype=fb.enum.DBVarType.ARRAY,\n", " is_global=False,\n", ")" ] }, { "cell_type": "markdown", "id": "34700ca5", "metadata": {}, "source": [ "#### Apply the embedding UDF instance to ProductGroup" ] }, { "cell_type": "code", "execution_count": 3, "id": "dab4b6ee", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:48.771611Z", "iopub.status.busy": "2024-06-12T08:09:48.771485Z", "iopub.status.idle": "2024-06-12T08:09:48.840881Z", "shell.execute_reply": "2024-06-12T08:09:48.840583Z" } }, "outputs": [], "source": [ "# Get embedding UDF instance.\n", "embedding_udf = catalog.get_user_defined_function(\n", " \"embedding\"\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "id": "f4fb79a3", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:48.842705Z", "iopub.status.busy": "2024-06-12T08:09:48.842619Z", "iopub.status.idle": "2024-06-12T08:09:49.098842Z", "shell.execute_reply": "2024-06-12T08:09:49.098514Z" } }, "outputs": [], "source": [ "# Get view from GROCERYPRODUCT dimension table.\n", "groceryproduct_view = catalog.get_view(\"GROCERYPRODUCT\")\n", "# Apply embedding to ProductGroup column in GROCERYPRODUCT 
view.\n", "groceryproduct_view[\"ProductGroup_embedding\"] = embedding_udf(groceryproduct_view[\"ProductGroup\"])" ] }, { "cell_type": "markdown", "id": "16bf1c08-4ab7-408e-ae84-d13bf5bcc52a", "metadata": {}, "source": [ "#### Get other views" ] }, { "cell_type": "code", "execution_count": 5, "id": "438c3a84-789e-47d5-8ab6-5311334cba9f", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.100844Z", "iopub.status.busy": "2024-06-12T08:09:49.100735Z", "iopub.status.idle": "2024-06-12T08:09:49.631374Z", "shell.execute_reply": "2024-06-12T08:09:49.631069Z" } }, "outputs": [], "source": [ "# Get view from GROCERYINVOICE event table.\n", "groceryinvoice_view = catalog.get_view(\"GROCERYINVOICE\")\n", "# Get view from INVOICEITEMS item table.\n", "invoiceitems_view = catalog.get_view(\"INVOICEITEMS\")" ] }, { "cell_type": "markdown", "id": "8648a1a8", "metadata": {}, "source": [ "#### Join views" ] }, { "cell_type": "code", "execution_count": 6, "id": "674516f5", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.633356Z", "iopub.status.busy": "2024-06-12T08:09:49.633241Z", "iopub.status.idle": "2024-06-12T08:09:49.640371Z", "shell.execute_reply": "2024-06-12T08:09:49.640105Z" } }, "outputs": [], "source": [ "# Join GROCERYPRODUCT view to INVOICEITEMS view.\n", "invoiceitems_view = invoiceitems_view.join(\n", " groceryproduct_view, rprefix=\"product_\"\n", ")" ] }, { "cell_type": "markdown", "id": "2d44265e", "metadata": {}, "source": [ "#### Get the mean vector of an invoice's Product Group descriptions" ] }, { "cell_type": "code", "execution_count": 7, "id": "534e7824", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.641890Z", "iopub.status.busy": "2024-06-12T08:09:49.641800Z", "iopub.status.idle": "2024-06-12T08:09:49.714184Z", "shell.execute_reply": "2024-06-12T08:09:49.713892Z" } }, "outputs": [], "source": [ "# Group invoiceitems_view by invoice entity (GroceryInvoiceGuid).\n", 
"invoiceitems_view_by_invoice =\\\n", "invoiceitems_view.groupby(\"GroceryInvoiceGuid\")" ] }, { "cell_type": "code", "execution_count": 8, "id": "cc1e5d60", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.715926Z", "iopub.status.busy": "2024-06-12T08:09:49.715834Z", "iopub.status.idle": "2024-06-12T08:09:49.725810Z", "shell.execute_reply": "2024-06-12T08:09:49.725557Z" } }, "outputs": [], "source": [ "# Mean vector of product_ProductGroup_embedding for the invoice.\n", "invoice_mean_vector_of_item_product_productgroup_embedding =\\\n", "invoiceitems_view_by_invoice.aggregate(\n", " \"product_ProductGroup_embedding\", method=fb.AggFunc.AVG,\n", " feature_name=\"INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\"\n", ")" ] }, { "cell_type": "markdown", "id": "7fdd0664", "metadata": {}, "source": [ "#### Get the mean vector of the Customer's latest invoice" ] }, { "cell_type": "code", "execution_count": 9, "id": "2a639685-84e2-460b-b5a9-1de92999302b", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.727490Z", "iopub.status.busy": "2024-06-12T08:09:49.727404Z", "iopub.status.idle": "2024-06-12T08:09:49.801958Z", "shell.execute_reply": "2024-06-12T08:09:49.801675Z" } }, "outputs": [], "source": [ "# Add INVOICE_Mean_vector_of_item_product_ProductGroup_embedding feature to the GROCERYINVOICE view\n", "# as a column.\n", "groceryinvoice_view =\\\n", "groceryinvoice_view.add_feature(\n", " \"INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\",\n", " invoice_mean_vector_of_item_product_productgroup_embedding\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "id": "e6d1b9d7", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.803778Z", "iopub.status.busy": "2024-06-12T08:09:49.803686Z", "iopub.status.idle": "2024-06-12T08:09:49.862677Z", "shell.execute_reply": "2024-06-12T08:09:49.862390Z" } }, "outputs": [], "source": [ "# Group GROCERYINVOICE view by customer entity 
(GroceryCustomerGuid).\n", "groceryinvoice_view_by_customer =\\\n", "groceryinvoice_view.groupby(['GroceryCustomerGuid'])" ] }, { "cell_type": "code", "execution_count": 11, "id": "2fb0c207", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.864455Z", "iopub.status.busy": "2024-06-12T08:09:49.864365Z", "iopub.status.idle": "2024-06-12T08:09:49.875319Z", "shell.execute_reply": "2024-06-12T08:09:49.875049Z" } }, "outputs": [], "source": [ "# Get Latest Mean vector of item product_ProductGroup_embedding for the customer\n", "customer_latest_invoice_mean_vector_of_item_product_productgroup_embedding =\\\n", "groceryinvoice_view_by_customer.aggregate_over(\n", " \"INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\", method=\"latest\",\n", " feature_names=[\"CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\"],\n", " windows=[None]\n", ")[\"CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\"]" ] }, { "cell_type": "markdown", "id": "e42c60d0", "metadata": {}, "source": [ "#### Get the mean vector of the Customer's Product Group descriptions over the past 26 weeks" ] }, { "cell_type": "code", "execution_count": 12, "id": "211d7d36", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.876941Z", "iopub.status.busy": "2024-06-12T08:09:49.876850Z", "iopub.status.idle": "2024-06-12T08:09:49.878849Z", "shell.execute_reply": "2024-06-12T08:09:49.878559Z" } }, "outputs": [], "source": [ "# Group INVOICEITEMS view by customer entity (GroceryCustomerGuid).\n", "invoiceitems_view_by_customer =\\\n", "invoiceitems_view.groupby(['GroceryCustomerGuid'])" ] }, { "cell_type": "code", "execution_count": 13, "id": "fd6f7194", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.880353Z", "iopub.status.busy": "2024-06-12T08:09:49.880263Z", "iopub.status.idle": "2024-06-12T08:09:49.890525Z", "shell.execute_reply": "2024-06-12T08:09:49.890250Z" } }, "outputs": [], "source": [ 
"# Get Mean vector of product_ProductGroup_embedding for the customer over time.\n", "feature_group =\\\n", "invoiceitems_view_by_customer.aggregate_over(\n", " \"product_ProductGroup_embedding\", method=\"avg\",\n", " feature_names=[\n", " \"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w\"\n", " ],\n", " windows=[\"26w\"],\n", ")\n", "# Get CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w object from feature group.\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w =\\\n", "feature_group[\"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w\"]" ] }, { "cell_type": "markdown", "id": "ac744656", "metadata": {}, "source": [ "#### Derive Similarity between latest invoice and 26 weeks purchases" ] }, { "cell_type": "code", "execution_count": 14, "id": "1ce1c900", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.892108Z", "iopub.status.busy": "2024-06-12T08:09:49.892028Z", "iopub.status.idle": "2024-06-12T08:09:49.894126Z", "shell.execute_reply": "2024-06-12T08:09:49.893865Z" } }, "outputs": [], "source": [ "# Derive Similarity feature from cosine similarity between\n", "# CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\n", "# and CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice =\\\n", "customer_latest_invoice_mean_vector_of_item_product_productgroup_embedding.vec.cosine_similarity(\n", " customer_mean_vector_of_item_product_productgroup_embedding_26w\n", ")\n", "# Give a name to new feature\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.name = \\\n", "\"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice\"" ] }, { "cell_type": "markdown", "id": "27dd5363-1864-483e-bde5-7da2b0414cd9", "metadata": {}, "source": [ "#### Preview feature" ] }, { "cell_type": "code", "execution_count": 15, 
"id": "ecb3752c-e9a8-4f41-895b-8e6f4a060726", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.895753Z", "iopub.status.busy": "2024-06-12T08:09:49.895668Z", "iopub.status.idle": "2024-06-12T08:09:49.971607Z", "shell.execute_reply": "2024-06-12T08:09:49.971279Z" } }, "outputs": [], "source": [ "# Get observation table: 'Preview Table with 10 items'\n", "preview_table = catalog.get_observation_table(\"Preview Table with 10 items\")" ] }, { "cell_type": "code", "execution_count": 16, "id": "9c2f93bb-bf1d-46be-bfa9-ca91a43595b1", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:09:49.973410Z", "iopub.status.busy": "2024-06-12T08:09:49.973323Z", "iopub.status.idle": "2024-06-12T08:10:32.732971Z", "shell.execute_reply": "2024-06-12T08:10:32.732669Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
POINT_IN_TIMEGROCERYINVOICEITEMGUIDCUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w
02023-04-11 17:23:576084f39f-9d2c-4111-b1cc-502e1559c0c0[-0.049583778360973, 0.040724096369544, -0.019...
12023-02-07 11:04:26fd1caae1-77e6-4667-8c83-df13f05bf2f5[-0.047664110747538006, 0.040520216686427, -0....
22023-03-17 11:15:091b627a25-7eb4-4f61-b243-c93db487bff0[-0.05412369494863901, 0.028804376621762003, -...
32022-09-18 18:52:36ac7edfb5-63ed-49fb-9b89-76b0288ed2f8[-0.051115894625539005, 0.033894273326497006, ...
42023-05-28 19:27:1415973b2f-2256-4caa-b65b-cbbfdff0905b[-0.046703376632649, 0.035018777767219005, -0....
52022-12-26 15:01:07264f79fd-c24a-47cc-8a68-fe3753a4d74b[-0.057210504562501006, 0.026236914188101004, ...
62023-03-31 18:50:00213ef7d3-c27b-43e0-bc0a-57d6c7c254b0[-0.049429968671181004, 0.033228590529173, -0....
72022-12-10 21:08:2677d02174-f1e1-41c1-9fb9-01c6246b0009[-0.051941385508354004, 0.032673704067397, -0....
82022-08-17 19:13:5240a07ca4-a991-4d21-b5cf-74ee61220f96[-0.0460019625995, 0.045009310261646, -0.02764...
92023-05-05 08:00:4257ca0770-eb8b-4769-8e67-eb1b7cc0a934[-0.05695966097050201, 0.03538753235396, -0.01...
\n", "
" ], "text/plain": [ " POINT_IN_TIME GROCERYINVOICEITEMGUID \\\n", "0 2023-04-11 17:23:57 6084f39f-9d2c-4111-b1cc-502e1559c0c0 \n", "1 2023-02-07 11:04:26 fd1caae1-77e6-4667-8c83-df13f05bf2f5 \n", "2 2023-03-17 11:15:09 1b627a25-7eb4-4f61-b243-c93db487bff0 \n", "3 2022-09-18 18:52:36 ac7edfb5-63ed-49fb-9b89-76b0288ed2f8 \n", "4 2023-05-28 19:27:14 15973b2f-2256-4caa-b65b-cbbfdff0905b \n", "5 2022-12-26 15:01:07 264f79fd-c24a-47cc-8a68-fe3753a4d74b \n", "6 2023-03-31 18:50:00 213ef7d3-c27b-43e0-bc0a-57d6c7c254b0 \n", "7 2022-12-10 21:08:26 77d02174-f1e1-41c1-9fb9-01c6246b0009 \n", "8 2022-08-17 19:13:52 40a07ca4-a991-4d21-b5cf-74ee61220f96 \n", "9 2023-05-05 08:00:42 57ca0770-eb8b-4769-8e67-eb1b7cc0a934 \n", "\n", " CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w \n", "0 [-0.049583778360973, 0.040724096369544, -0.019... \n", "1 [-0.047664110747538006, 0.040520216686427, -0.... \n", "2 [-0.05412369494863901, 0.028804376621762003, -... \n", "3 [-0.051115894625539005, 0.033894273326497006, ... \n", "4 [-0.046703376632649, 0.035018777767219005, -0.... \n", "5 [-0.057210504562501006, 0.026236914188101004, ... \n", "6 [-0.049429968671181004, 0.033228590529173, -0.... \n", "7 [-0.051941385508354004, 0.032673704067397, -0.... \n", "8 [-0.0460019625995, 0.045009310261646, -0.02764... \n", "9 [-0.05695966097050201, 0.03538753235396, -0.01... 
" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Preview customer_mean_vector_of_item_product_productgroup_embedding_26w\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w.preview(\n", " preview_table\n", ")" ] }, { "cell_type": "code", "execution_count": 17, "id": "cb6f83a1-35b8-4c27-a0b2-e92a1c52c252", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:10:32.734949Z", "iopub.status.busy": "2024-06-12T08:10:32.734811Z", "iopub.status.idle": "2024-06-12T08:11:32.726395Z", "shell.execute_reply": "2024-06-12T08:11:32.725980Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
POINT_IN_TIMEGROCERYINVOICEITEMGUIDCUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice
02023-02-07 11:04:26fd1caae1-77e6-4667-8c83-df13f05bf2f50.901249
12023-05-28 19:27:1415973b2f-2256-4caa-b65b-cbbfdff0905b0.577883
22022-09-18 18:52:36ac7edfb5-63ed-49fb-9b89-76b0288ed2f80.955820
32023-03-31 18:50:00213ef7d3-c27b-43e0-bc0a-57d6c7c254b00.943237
42022-12-26 15:01:07264f79fd-c24a-47cc-8a68-fe3753a4d74b0.927726
52023-04-11 17:23:576084f39f-9d2c-4111-b1cc-502e1559c0c00.802008
62022-08-17 19:13:5240a07ca4-a991-4d21-b5cf-74ee61220f960.892769
72022-12-10 21:08:2677d02174-f1e1-41c1-9fb9-01c6246b00090.912434
82023-03-17 11:15:091b627a25-7eb4-4f61-b243-c93db487bff00.758240
92023-05-05 08:00:4257ca0770-eb8b-4769-8e67-eb1b7cc0a9340.843593
\n", "
" ], "text/plain": [ " POINT_IN_TIME GROCERYINVOICEITEMGUID \\\n", "0 2023-02-07 11:04:26 fd1caae1-77e6-4667-8c83-df13f05bf2f5 \n", "1 2023-05-28 19:27:14 15973b2f-2256-4caa-b65b-cbbfdff0905b \n", "2 2022-09-18 18:52:36 ac7edfb5-63ed-49fb-9b89-76b0288ed2f8 \n", "3 2023-03-31 18:50:00 213ef7d3-c27b-43e0-bc0a-57d6c7c254b0 \n", "4 2022-12-26 15:01:07 264f79fd-c24a-47cc-8a68-fe3753a4d74b \n", "5 2023-04-11 17:23:57 6084f39f-9d2c-4111-b1cc-502e1559c0c0 \n", "6 2022-08-17 19:13:52 40a07ca4-a991-4d21-b5cf-74ee61220f96 \n", "7 2022-12-10 21:08:26 77d02174-f1e1-41c1-9fb9-01c6246b0009 \n", "8 2023-03-17 11:15:09 1b627a25-7eb4-4f61-b243-c93db487bff0 \n", "9 2023-05-05 08:00:42 57ca0770-eb8b-4769-8e67-eb1b7cc0a934 \n", "\n", " CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice \n", "0 0.901249 \n", "1 0.577883 \n", "2 0.955820 \n", "3 0.943237 \n", "4 0.927726 \n", "5 0.802008 \n", "6 0.892769 \n", "7 0.912434 \n", "8 0.758240 \n", "9 0.843593 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Preview customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.preview(\n", " preview_table\n", ")" ] }, { "cell_type": "markdown", "id": "a9049185", "metadata": {}, "source": [ "#### Save feature" ] }, { "cell_type": "code", "execution_count": 18, "id": "7188bf6d", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:11:32.730452Z", "iopub.status.busy": "2024-06-12T08:11:32.730316Z", "iopub.status.idle": "2024-06-12T08:11:39.592370Z", "shell.execute_reply": "2024-06-12T08:11:39.592084Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▁▃▅ 0% in 0s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... 
| | ▂▄▆ 0% in 0s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▃▅▇ 0% in 0s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▄▆█ 0% in 0s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▅▇▇ 0% in 0s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▆█▆ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▇▇▅ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | █▆▄ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▇▅▃ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▆▄▂ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▅▃▁ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▄▂▂ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▃▁▃ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▂▂▄ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▁▃▅ 0% in 1s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▂▄▆ 0% in 2s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▃▅▇ 0% in 2s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... | | ▄▆█ 0% in 2s (~0s, 0.0%/s)" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... 
|████████████████████████████████████████| ▅▇▇ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▆█▆ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▇▇▅ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| █▆▄ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▇▅▃ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▆▄▂ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▅▃▁ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▄▂▂ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▃▁▃ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▂▂▄ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▁▃▅ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▂▄▆ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▃▅▇ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... 
|████████████████████████████████████████| ▄▆█ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▅▇▇ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▆█▆ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▇▇▅ 100% in 2s (~0s, 0.5%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| █▆▄ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▇▅▃ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▆▄▂ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▅▃▁ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▄▂▂ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▃▁▃ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▂▂▄ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▁▃▅ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▂▄▆ 100% in 2s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... 
|████████████████████████████████████████| ▃▅▇ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▄▆█ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▅▇▇ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▆█▆ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▇▇▅ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| █▆▄ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▇▅▃ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▆▄▂ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▅▃▁ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▄▂▂ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▃▁▃ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▂▂▄ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... |████████████████████████████████████████| ▁▃▅ 100% in 3s (~0s, 0.4%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Working... 
|████████████████████████████████████████| ▄▆█ 100% in 6s (~0s, 0.2%/" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "Done! 
|████████████████████████████████████████| 100% in 6.1s (0.17%/s) " ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Save feature\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.save()" ] }, { "cell_type": "markdown", "id": "a624a7d9", "metadata": {}, "source": [ "#### Add a description and view the definition file" ] }, { "cell_type": "code", "execution_count": 19, "id": "d5068efa", "metadata": { "execution": { "iopub.execute_input": "2024-06-12T08:11:39.594267Z", "iopub.status.busy": "2024-06-12T08:11:39.594177Z", "iopub.status.idle": "2024-06-12T08:11:39.771353Z", "shell.execute_reply": "2024-06-12T08:11:39.771109Z" } }, "outputs": [ { "data": { "text/html": [ "
# Generated by SDK version: 1.1.0.dev7\n",
       "from bson import ObjectId\n",
       "from featurebyte import DimensionTable\n",
       "from featurebyte import EventTable\n",
       "from featurebyte import FeatureJobSetting\n",
       "from featurebyte import ItemTable\n",
       "from featurebyte import UserDefinedFunction\n",
       "\n",
       "\n",
       "# dimension_table name: "GROCERYPRODUCT"\n",
       "dimension_table = DimensionTable.get_by_id(ObjectId("666956c78080c62d0dc616e2"))\n",
       "dimension_view = dimension_table.get_view(\n",
       "    view_mode="manual", drop_column_names=[], column_cleaning_operations=[]\n",
       ")\n",
       "col = dimension_view["ProductGroup"]\n",
       "\n",
       "# udf_name: embedding, sql_function_name: F_SBERT_EMBEDDING\n",
       "udf_embedding = UserDefinedFunction.get_by_id(\n",
       "    ObjectId("666957a23fab5208644858ad")\n",
       ")\n",
       "col_1 = udf_embedding(col)\n",
       "view = dimension_view.copy()\n",
       "view["ProductGroup_embedding"] = col_1\n",
       "\n",
       "# item_table name: "INVOICEITEMS", event_table name: "GROCERYINVOICE"\n",
       "item_table = ItemTable.get_by_id(ObjectId("666956c58080c62d0dc616e1"))\n",
       "item_view = item_table.get_view(\n",
       "    event_suffix=None,\n",
       "    view_mode="manual",\n",
       "    drop_column_names=["record_available_at"],\n",
       "    column_cleaning_operations=[],\n",
       "    event_drop_column_names=["record_available_at"],\n",
       "    event_column_cleaning_operations=[],\n",
       "    event_join_column_names=[\n",
       "        "Timestamp",\n",
       "        "GroceryInvoiceGuid",\n",
       "        "GroceryCustomerGuid",\n",
       "        "tz_offset",\n",
       "    ],\n",
       ")\n",
       "joined_view = item_view.join(\n",
       "    view, on="GroceryProductGuid", how="left", rsuffix="", rprefix="product_"\n",
       ")\n",
       "grouped = joined_view.groupby(\n",
       "    by_keys=["GroceryCustomerGuid"], category=None\n",
       ").aggregate_over(\n",
       "    value_column="product_ProductGroup_embedding",\n",
       "    method="avg",\n",
       "    windows=["26w"],\n",
       "    feature_names=[\n",
       "        "CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w"\n",
       "    ],\n",
       "    feature_job_setting=FeatureJobSetting(\n",
       "        blind_spot="120s", period="3600s", offset="120s"\n",
       "    ),\n",
       "    skip_fill_na=True,\n",
       "    offset=None,\n",
       ")\n",
       "feat = grouped[\n",
       "    "CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w"\n",
       "]\n",
       "feat_1 = joined_view.groupby(\n",
       "    by_keys=["GroceryInvoiceGuid"], category=None\n",
       ").aggregate(\n",
       "    value_column="product_ProductGroup_embedding",\n",
       "    method="avg",\n",
       "    feature_name="INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",\n",
       "    skip_fill_na=True,\n",
       ")\n",
       "\n",
       "# event_table name: "GROCERYINVOICE"\n",
       "event_table = EventTable.get_by_id(ObjectId("666956c38080c62d0dc616e0"))\n",
       "event_view = event_table.get_view(\n",
       "    view_mode="manual",\n",
       "    drop_column_names=["record_available_at"],\n",
       "    column_cleaning_operations=[],\n",
       ")\n",
       "joined_view_1 = event_view.add_feature(\n",
       "    new_column_name="INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",\n",
       "    feature=feat_1,\n",
       "    entity_column="GroceryInvoiceGuid",\n",
       ")\n",
       "grouped_1 = joined_view_1.groupby(\n",
       "    by_keys=["GroceryCustomerGuid"], category=None\n",
       ").aggregate_over(\n",
       "    value_column="INVOICE_Mean_vector_of_item_product_ProductGroup_embedding",\n",
       "    method="latest",\n",
       "    windows=[None],\n",
       "    feature_names=[\n",
       "        "CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"\n",
       "    ],\n",
       "    feature_job_setting=FeatureJobSetting(\n",
       "        blind_spot="120s", period="3600s", offset="120s"\n",
       "    ),\n",
       "    skip_fill_na=True,\n",
       "    offset=None,\n",
       ")\n",
       "feat_2 = grouped_1[\n",
       "    "CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding"\n",
       "]\n",
       "feat_3 = feat_2.vec.cosine_similarity(other=feat)\n",
       "feat_3.name = "CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice"\n",
       "output = feat_3\n",
       "output.save(_id=ObjectId("666957cd3fab5208644858b2"))\n",
       "
\n", "
" ], "text/plain": [ "'# Generated by SDK version: 1.1.0.dev7\\nfrom bson import ObjectId\\nfrom featurebyte import DimensionTable\\nfrom featurebyte import EventTable\\nfrom featurebyte import FeatureJobSetting\\nfrom featurebyte import ItemTable\\nfrom featurebyte import UserDefinedFunction\\n\\n\\n# dimension_table name: \"GROCERYPRODUCT\"\\ndimension_table = DimensionTable.get_by_id(ObjectId(\"666956c78080c62d0dc616e2\"))\\ndimension_view = dimension_table.get_view(\\n view_mode=\"manual\", drop_column_names=[], column_cleaning_operations=[]\\n)\\ncol = dimension_view[\"ProductGroup\"]\\n\\n# udf_name: embedding, sql_function_name: F_SBERT_EMBEDDING\\nudf_embedding = UserDefinedFunction.get_by_id(\\n ObjectId(\"666957a23fab5208644858ad\")\\n)\\ncol_1 = udf_embedding(col)\\nview = dimension_view.copy()\\nview[\"ProductGroup_embedding\"] = col_1\\n\\n# item_table name: \"INVOICEITEMS\", event_table name: \"GROCERYINVOICE\"\\nitem_table = ItemTable.get_by_id(ObjectId(\"666956c58080c62d0dc616e1\"))\\nitem_view = item_table.get_view(\\n event_suffix=None,\\n view_mode=\"manual\",\\n drop_column_names=[\"record_available_at\"],\\n column_cleaning_operations=[],\\n event_drop_column_names=[\"record_available_at\"],\\n event_column_cleaning_operations=[],\\n event_join_column_names=[\\n \"Timestamp\",\\n \"GroceryInvoiceGuid\",\\n \"GroceryCustomerGuid\",\\n \"tz_offset\",\\n ],\\n)\\njoined_view = item_view.join(\\n view, on=\"GroceryProductGuid\", how=\"left\", rsuffix=\"\", rprefix=\"product_\"\\n)\\ngrouped = joined_view.groupby(\\n by_keys=[\"GroceryCustomerGuid\"], category=None\\n).aggregate_over(\\n value_column=\"product_ProductGroup_embedding\",\\n method=\"avg\",\\n windows=[\"26w\"],\\n feature_names=[\\n \"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w\"\\n ],\\n feature_job_setting=FeatureJobSetting(\\n blind_spot=\"120s\", period=\"3600s\", offset=\"120s\"\\n ),\\n skip_fill_na=True,\\n offset=None,\\n)\\nfeat = grouped[\\n 
\"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w\"\\n]\\nfeat_1 = joined_view.groupby(\\n by_keys=[\"GroceryInvoiceGuid\"], category=None\\n).aggregate(\\n value_column=\"product_ProductGroup_embedding\",\\n method=\"avg\",\\n feature_name=\"INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\",\\n skip_fill_na=True,\\n)\\n\\n# event_table name: \"GROCERYINVOICE\"\\nevent_table = EventTable.get_by_id(ObjectId(\"666956c38080c62d0dc616e0\"))\\nevent_view = event_table.get_view(\\n view_mode=\"manual\",\\n drop_column_names=[\"record_available_at\"],\\n column_cleaning_operations=[],\\n)\\njoined_view_1 = event_view.add_feature(\\n new_column_name=\"INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\",\\n feature=feat_1,\\n entity_column=\"GroceryInvoiceGuid\",\\n)\\ngrouped_1 = joined_view_1.groupby(\\n by_keys=[\"GroceryCustomerGuid\"], category=None\\n).aggregate_over(\\n value_column=\"INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\",\\n method=\"latest\",\\n windows=[None],\\n feature_names=[\\n \"CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\"\\n ],\\n feature_job_setting=FeatureJobSetting(\\n blind_spot=\"120s\", period=\"3600s\", offset=\"120s\"\\n ),\\n skip_fill_na=True,\\n offset=None,\\n)\\nfeat_2 = grouped_1[\\n \"CUSTOMER_Latest_INVOICE_Mean_vector_of_item_product_ProductGroup_embedding\"\\n]\\nfeat_3 = feat_2.vec.cosine_similarity(other=feat)\\nfeat_3.name = \"CUSTOMER_Mean_vector_of_item_product_ProductGroup_embedding_26w_vs_latest_invoice\"\\noutput = feat_3\\noutput.save(_id=ObjectId(\"666957cd3fab5208644858b2\"))\\n'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add description\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.update_description(\n", "\t\"Compare the customer's 26w Mean vector of item \"\n", "\t\"product_ProductGroup_embedding with the customer's most recent \"\n", 
"\t\"invoice. This comparison is done using the Cosine Similarity metric to\"\n", "\t\" measure how similar these mean vector embeddings are.\"\n", ")\n", "# See feature definition file\n", "customer_mean_vector_of_item_product_productgroup_embedding_26w_vs_latest_invoice.definition" ] }, { "cell_type": "markdown", "id": "720a8a33-c432-41a4-a99b-9cdd986350f0", "metadata": {}, "source": [ "### Concepts in this tutorial\n", "- [View Joins](https://docs.featurebyte.com/latest/about/glossary/#view-join)\n", "- [UDF Transforms](https://docs.featurebyte.com/latest/about/glossary/#udf-transforms)\n", "- [Feature Transforms](https://docs.featurebyte.com/latest/about/glossary/#feature-transforms)\n", "\n", "#### SDK reference for\n", "- [Get UDF instance](https://docs.featurebyte.com/latest/reference/featurebyte.api.catalog.Catalog.get_user_defined_function/)\n", "- [Add an aggregation by invoice to the event view](https://docs.featurebyte.com/latest/reference/featurebyte.api.event_view.EventView.add_feature/)\n", "- [Feature.vec.cosine_similarity()](https://docs.featurebyte.com/latest/reference/featurebyte.Feature.vec.cosine_similarity/)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }