11. Derive Similarity Features from Bucketing
Create similarity features
In this tutorial we look at another way of deriving features from other features: creating similarity features.
We will create a feature that compares a customer's purchase pattern across product groups with the purchase pattern of all customers. We first bucket item costs by product group, once at the customer level and once at the overall level. Each bucketing yields a dictionary keyed by product group, and we then compare the two dictionaries using cosine similarity.
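For reference, the comparison between the two dictionaries can be written out as the standard cosine similarity. The notation is introduced here and is not part of the tutorial: $c_g$ is the customer's summed TotalCost in product group $g$ and $o_g$ is the corresponding sum over all customers. The formula assumes that a product group missing from one dictionary counts as zero.

```latex
\[
\operatorname{sim}(c, o)
  = \frac{\sum_{g} c_g \, o_g}
         {\sqrt{\sum_{g} c_g^{2}} \; \sqrt{\sum_{g} o_g^{2}}}
\]
```

With non-negative sums such as these, the value lies between 0 (no product group in common) and 1 (identical spending proportions).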
In [1]:
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
10:52:29 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/
10:52:29 | INFO | Using profile: tutorial
10:52:29 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
10:52:29 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
10:52:29 | INFO | SDK version: 2.0.1.dev67
10:52:29 | INFO | No catalog activated.
10:52:29 | INFO | Catalog activated: Grocery Dataset Tutorial
In [2]:
# Get view from GROCERYPRODUCT dimension table.
groceryproduct_view = catalog.get_view("GROCERYPRODUCT")
# Get view from INVOICEITEMS item table.
invoiceitems_view = catalog.get_view("INVOICEITEMS")
Join views
In [3]:
# Join GROCERYPRODUCT view to INVOICEITEMS view.
invoiceitems_view = invoiceitems_view.join(groceryproduct_view, rsuffix="")
Create distribution features across product groups
In [4]:
# Group INVOICEITEMS view by customer entity (GroceryCustomerGuid) across different ProductGroups.
invoiceitems_view_by_customer_across_productgroup = invoiceitems_view.groupby(
    ["GroceryCustomerGuid"], category="ProductGroup"
)
In [5]:
# Create Buckets representing the cumulative TotalCost of item, categorized by ProductGroup,
# for the customer over the past 26 weeks.
# The result is presented as a dictionary where the ProductGroup serves as the key
# and its corresponding sum of item TotalCost for the customer forms the value.
feature_group = invoiceitems_view_by_customer_across_productgroup.aggregate_over(
    "TotalCost",
    method=fb.AggFunc.SUM,
    feature_names=["CUSTOMER_item_TotalCost_across_product_ProductGroups_26w"],
    windows=["26w"],
)
# Get CUSTOMER_item_TotalCost_across_product_ProductGroups_26w object from feature group.
customer_item_totalcost_across_product_productgroups_26w = feature_group[
    "CUSTOMER_item_TotalCost_across_product_ProductGroups_26w"
]
In [6]:
# Group INVOICEITEMS view across different ProductGroups.
invoiceitems_view_by_overall_across_productgroup = invoiceitems_view.groupby(
    [], category="ProductGroup"
)
In [7]:
# Create Buckets representing the cumulative TotalCost of item, categorized by ProductGroup,
# for ALL customers over the past 26 weeks.
# The result is presented as a dictionary where the ProductGroup serves as the key
# and its corresponding sum of item TotalCost forms the value.
feature_group = invoiceitems_view_by_overall_across_productgroup.aggregate_over(
    "TotalCost",
    method=fb.AggFunc.SUM,
    feature_names=["OVERALL_item_TotalCost_across_product_ProductGroups_26w"],
    windows=["26w"],
)
# Get OVERALL_item_TotalCost_across_product_ProductGroups_26w object from feature group.
overall_item_totalcost_across_product_productgroups_26w = feature_group[
    "OVERALL_item_TotalCost_across_product_ProductGroups_26w"
]
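To illustrate what the two bucketed features contain, here is a minimal plain-Python sketch that does not use the FeatureByte SDK. The `ITEMS` rows, the customer names, and the `bucket_total_cost` helper are hypothetical, and the window boundary handling (start inclusive, point in time exclusive) is an assumption made for this example rather than a statement about how FeatureByte computes windows.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical invoice items: (customer, product group, total cost, timestamp).
ITEMS = [
    ("cust_a", "Biscuits", 2.50, datetime(2023, 1, 10)),
    ("cust_a", "Biscuits", 1.50, datetime(2023, 2, 1)),
    ("cust_a", "Café", 5.00, datetime(2023, 2, 15)),
    ("cust_b", "Café", 3.00, datetime(2023, 2, 20)),
    ("cust_b", "Boucherie", 8.00, datetime(2022, 1, 5)),  # older than 26 weeks
]


def bucket_total_cost(items, point_in_time, window=timedelta(weeks=26), customer=None):
    """Sum total cost per product group within the window before point_in_time.

    When customer is None, rows from all customers are pooled (the overall case).
    """
    start = point_in_time - window
    buckets = defaultdict(float)
    for cust, group, cost, timestamp in items:
        # Assumed window semantics: start inclusive, point in time exclusive.
        in_window = start <= timestamp < point_in_time
        if in_window and (customer is None or cust == customer):
            buckets[group] += cost
    return dict(buckets)


point_in_time = datetime(2023, 3, 1)
customer_buckets = bucket_total_cost(ITEMS, point_in_time, customer="cust_a")
overall_buckets = bucket_total_cost(ITEMS, point_in_time)
print(customer_buckets)
print(overall_buckets)
```

The customer-level call mirrors grouping by `GroceryCustomerGuid`, while the call without a customer mirrors grouping by an empty key list, so every row in the window contributes to a single dictionary.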
Derive Similarity feature across entities
In [8]:
# Derive Similarity feature from cosine similarity between
# CUSTOMER_item_TotalCost_across_product_ProductGroups_26w
# and OVERALL_item_TotalCost_across_product_ProductGroups_26w
customer_vs_overall_item_totalcost_across_product_productgroups_26w = (
    customer_item_totalcost_across_product_productgroups_26w.cd.cosine_similarity(
        overall_item_totalcost_across_product_productgroups_26w
    )
)
# Give a name to new feature
customer_vs_overall_item_totalcost_across_product_productgroups_26w.name = (
    "CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w"
)
Preview feature
In [9]:
# Get observation table: 'Preview Table with 10 items'
preview_table = catalog.get_observation_table("Preview Table with 10 items")
In [10]:
# Preview CUSTOMER_item_TotalCost_across_product_ProductGroups_26w
customer_item_totalcost_across_product_productgroups_26w.preview(
    preview_table
)
Out[10]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_item_TotalCost_across_product_ProductGroups_26w |
---|---|---|---|
0 | 2023-03-07 17:53:55 | cb478a4e-9266-4523-8ee0-e205881cc5f5 | {'Adoucissants et Soin du linge': 4.99, 'Aide ... |
1 | 2023-03-21 13:49:48 | 84225bbb-adb8-451e-98c0-897c83c2fad9 | {'Biscuits': 14.44, 'Biscuits apéritifs': 2.23... |
2 | 2023-06-06 16:13:44 | bdc6b6bb-a23a-48c8-bb3a-622d9161c0e8 | {'Aide à la Pâtisserie': 1.58, 'Animalerie, So... |
3 | 2022-08-22 06:10:42 | 9267d0dd-9685-4667-8f06-1761abe73c4d | {'Adoucissants et Soin du linge': 3.98, 'Aide ... |
4 | 2023-05-21 13:59:44 | 989643fe-1377-4f5f-8f38-f349a611da0c | {'Animalerie, Soins et Hygiène': 6.99, 'Autres... |
5 | 2022-09-23 18:09:33 | 459a1b6e-1239-46d1-9e40-539c7e895483 | {'Animalerie, Soins et Hygiène': 18.13, 'Apéri... |
6 | 2023-01-06 08:49:29 | dad86e21-3af4-4b5b-8058-60f946b6dac5 | {'Aide à la Pâtisserie': 2.49, 'Autres Produit... |
7 | 2022-07-23 17:33:25 | 849454c5-6640-419d-871d-1f0895a1c3b4 | {'Boucherie': 8.06, 'Café': 4.99, 'Chips et To... |
8 | 2022-09-07 09:16:15 | 53eac49e-7ecd-4cb6-a1c0-38e3900efd7a | {'Aide à la Pâtisserie': 38.46, 'Autres Produi... |
9 | 2022-12-30 16:42:05 | 4dd1487a-0379-4eef-b200-97ac1bb1164f | {'Adoucissants et Soin du linge': 9.47, 'Aide ... |
In [11]:
# Preview OVERALL_item_TotalCost_across_product_ProductGroups_26w
overall_item_totalcost_across_product_productgroups_26w.preview(
    preview_table
)
Out[11]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | OVERALL_item_TotalCost_across_product_ProductGroups_26w |
---|---|---|---|
0 | 2022-09-23 18:09:33 | 459a1b6e-1239-46d1-9e40-539c7e895483 | {'Adoucissants et Soin du linge': 1094.28, 'Ai... |
1 | 2022-07-23 17:33:25 | 849454c5-6640-419d-871d-1f0895a1c3b4 | {'Adoucissants et Soin du linge': 1151.91, 'Ai... |
2 | 2022-08-22 06:10:42 | 9267d0dd-9685-4667-8f06-1761abe73c4d | {'Adoucissants et Soin du linge': 1119.6, 'Aid... |
3 | 2023-01-06 08:49:29 | dad86e21-3af4-4b5b-8058-60f946b6dac5 | {'Adoucissants et Soin du linge': 1202.56, 'Ai... |
4 | 2022-09-07 09:16:15 | 53eac49e-7ecd-4cb6-a1c0-38e3900efd7a | {'Adoucissants et Soin du linge': 1113.76, 'Ai... |
5 | 2022-12-30 16:42:05 | 4dd1487a-0379-4eef-b200-97ac1bb1164f | {'Adoucissants et Soin du linge': 1198.86, 'Ai... |
6 | 2023-03-21 13:49:48 | 84225bbb-adb8-451e-98c0-897c83c2fad9 | {'Adoucissants et Soin du linge': 1232.59, 'Ai... |
7 | 2023-05-21 13:59:44 | 989643fe-1377-4f5f-8f38-f349a611da0c | {'Adoucissants et Soin du linge': 1141.46, 'Ai... |
8 | 2023-06-06 16:13:44 | bdc6b6bb-a23a-48c8-bb3a-622d9161c0e8 | {'Adoucissants et Soin du linge': 1231.32, 'Ai... |
9 | 2023-03-07 17:53:55 | cb478a4e-9266-4523-8ee0-e205881cc5f5 | {'Adoucissants et Soin du linge': 1226.21, 'Ai... |
In [12]:
# Preview CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w
customer_vs_overall_item_totalcost_across_product_productgroups_26w.preview(
    preview_table
)
Out[12]:
| | POINT_IN_TIME | GROCERYINVOICEITEMGUID | CUSTOMER_vs_OVERALL_item_TotalCost_across_product_ProductGroups_26w |
---|---|---|---|
0 | 2023-03-21 13:49:48 | 84225bbb-adb8-451e-98c0-897c83c2fad9 | 0.803208 |
1 | 2023-03-07 17:53:55 | cb478a4e-9266-4523-8ee0-e205881cc5f5 | 0.677664 |
2 | 2023-05-21 13:59:44 | 989643fe-1377-4f5f-8f38-f349a611da0c | 0.722324 |
3 | 2022-08-22 06:10:42 | 9267d0dd-9685-4667-8f06-1761abe73c4d | 0.809058 |
4 | 2023-06-06 16:13:44 | bdc6b6bb-a23a-48c8-bb3a-622d9161c0e8 | 0.803588 |
5 | 2022-09-23 18:09:33 | 459a1b6e-1239-46d1-9e40-539c7e895483 | 0.781515 |
6 | 2022-07-23 17:33:25 | 849454c5-6640-419d-871d-1f0895a1c3b4 | 0.668405 |
7 | 2023-01-06 08:49:29 | dad86e21-3af4-4b5b-8058-60f946b6dac5 | 0.640181 |
8 | 2022-09-07 09:16:15 | 53eac49e-7ecd-4cb6-a1c0-38e3900efd7a | 0.747673 |
9 | 2022-12-30 16:42:05 | 4dd1487a-0379-4eef-b200-97ac1bb1164f | 0.800607 |
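To help interpret scores like the ones previewed above, the following is a minimal plain-Python sketch of cosine similarity between two dictionaries. It is not the FeatureByte implementation: the `cosine_similarity` helper and the sample dictionaries are hypothetical, missing keys are assumed to count as zero, and returning 0.0 for an empty or all-zero dictionary is a choice made for this sketch only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {category: value} dictionaries.

    Assumption: a category missing from one dictionary counts as 0.
    Returns 0.0 when either dictionary has zero norm (sketch-only choice).
    """
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical dictionaries: the customer spends in the same proportions as
# the overall population, but at one hundredth of the scale.
customer = {"Biscuits": 3.0, "Café": 4.0}
overall = {"Biscuits": 300.0, "Café": 400.0, "Boucherie": 0.0}
# A customer who buys only from a group the first customer never touches.
different = {"Boucherie": 10.0}

same_mix = cosine_similarity(customer, overall)
no_overlap = cosine_similarity(customer, different)
print(same_mix, no_overlap)
```

Cosine similarity depends only on the proportions across product groups, not on the magnitudes. This is why a customer whose sums are in the tens can be compared directly with overall sums in the thousands: a score close to 1 indicates a spending mix close to that of the whole customer base, and lower scores indicate a more unusual mix.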
Save feature
In [13]:
# Save feature
customer_vs_overall_item_totalcost_across_product_productgroups_26w.save()
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Add description
In [14]:
# Add description
customer_vs_overall_item_totalcost_across_product_productgroups_26w.update_description(
    "Similarity between the customer and all customers measured by the "
    "Cosine Similarity between the Distribution representing the cumulative"
    " TotalCost of item, categorized by their respective product's "
    "ProductGroup, over 26w for both entities."
)