3. Register entities
Registering entities represented in the grocery dataset
In FeatureByte, an "entity" models real-world objects and ideas. These entities often correspond to columns in database tables.
Taking our grocery scenario as an example, we can view "Customer", "Invoice", and "Item" as entities.
To help FeatureByte identify these entities in the data and the columns that represent them, we'll be creating and tagging these entities in this tutorial.
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
15:28:14 | INFO | SDK version: 1.0.2.dev46
15:28:14 | INFO | No catalog activated.
15:28:14 | INFO | Using profile: tutorial
15:28:14 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
15:28:14 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
15:28:14 | INFO | SDK version: 1.0.2.dev46
15:28:14 | INFO | No catalog activated.
15:28:14 | INFO | Catalog activated: Grocery Dataset Tutorial
As previously discussed, we'll establish the following entities:
- customer - an individual shopping at the stores
- invoice - a record of the customer's purchase
- item - individual items on the invoice
- product - products available in the store for purchase
- productgroup - the category or group a product falls under
- frenchstate - a region in France (since our dataset revolves around French grocery stores and their customers)
The number of entities doesn't have to match the number of tables. An entity is a business concept, and a single table can represent several entities. In this tutorial, six entities are spread across four tables, and frenchstate and productgroup have no table of their own.
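To make this concrete, the plain-Python sketch below inverts the table-to-entity tags that this tutorial applies in a later cell. It makes no FeatureByte calls, and TABLE_ENTITIES and tables_by_entity are illustrative names. It shows that frenchstate and productgroup exist only as columns inside other tables.

```python
# Entities tagged in each table, transcribed from this tutorial's tagging cell.
TABLE_ENTITIES = {
    "GROCERYCUSTOMER": ["customer", "frenchstate"],
    "GROCERYINVOICE": ["invoice", "customer"],
    "INVOICEITEMS": ["item", "invoice", "product"],
    "GROCERYPRODUCT": ["product", "productgroup"],
}


def tables_by_entity(table_entities):
    """Invert a table -> entities mapping into entity -> tables."""
    result = {}
    for table, entities in table_entities.items():
        for entity in entities:
            result.setdefault(entity, []).append(table)
    return result


index = tables_by_entity(TABLE_ENTITIES)
print(len(TABLE_ENTITIES), "tables,", len(index), "entities")
for entity, tables in sorted(index.items()):
    print(f"{entity}: {tables}")
```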
When creating an entity, you'll need to define its serving name. This is the column name that identifies the entity in request data, such as feature preview or serving requests, so it should be unique across entities.
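The plain-Python sketch below illustrates how a serving name is used: it builds a request row whose entity column is named after the entity's serving name. It makes no FeatureByte calls. The helper make_request_row, the POINT_IN_TIME column name, and the placeholder GUID are assumptions for illustration and are not taken from this tutorial. The check_unique function shows why two entities should never share a serving name.

```python
from datetime import datetime

# Serving names as declared in this tutorial's create_entity calls.
SERVING_NAMES = {
    "customer": "GROCERYCUSTOMERGUID",
    "invoice": "GROCERYINVOICEGUID",
    "item": "GROCERYINVOICEITEMGUID",
    "product": "GROCERYPRODUCTGUID",
    "productgroup": "PRODUCTGROUP",
    "frenchstate": "FRENCHSTATE",
}


def check_unique(serving_names):
    """Raise if two entities share a serving name (the request column would be ambiguous)."""
    seen = {}
    for entity, name in serving_names.items():
        if name in seen:
            raise ValueError(f"{name!r} used by both {seen[name]!r} and {entity!r}")
        seen[name] = entity


def make_request_row(entity, value, point_in_time):
    """Hypothetical helper: one request row keyed by the entity's serving name.

    The POINT_IN_TIME key is an assumed convention for request data.
    """
    return {"POINT_IN_TIME": point_in_time, SERVING_NAMES[entity]: value}


check_unique(SERVING_NAMES)
# "some-customer-guid" is a placeholder, not a value from the dataset.
row = make_request_row("customer", "some-customer-guid", datetime(2023, 6, 1))
print(row)
```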
catalog.create_entity(name="customer", serving_names=["GROCERYCUSTOMERGUID"])
catalog.create_entity(name="invoice", serving_names=["GROCERYINVOICEGUID"])
catalog.create_entity(name="item", serving_names=["GROCERYINVOICEITEMGUID"])
catalog.create_entity(name="product", serving_names=["GROCERYPRODUCTGUID"])
catalog.create_entity(name="productgroup", serving_names=["PRODUCTGROUP"])
catalog.create_entity(name="frenchstate", serving_names=["FRENCHSTATE"])
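If you prefer to keep the entity definitions in one place, you can drive the six calls above from a list. This is a sketch that relies only on the catalog.create_entity(name=..., serving_names=[...]) call already used above. ENTITY_SPECS and create_entities are hypothetical names. Run it instead of those calls, not after them: at that point the entities already exist, and re-creating them will likely fail.

```python
# (entity name, serving name) pairs, copied from the create_entity calls above.
ENTITY_SPECS = [
    ("customer", "GROCERYCUSTOMERGUID"),
    ("invoice", "GROCERYINVOICEGUID"),
    ("item", "GROCERYINVOICEITEMGUID"),
    ("product", "GROCERYPRODUCTGUID"),
    ("productgroup", "PRODUCTGROUP"),
    ("frenchstate", "FRENCHSTATE"),
]


def create_entities(catalog, specs):
    """Create one entity per (name, serving_name) spec on an activated catalog.

    Returns whatever catalog.create_entity returns for each spec, in order.
    """
    created = []
    for name, serving_name in specs:
        created.append(catalog.create_entity(name=name, serving_names=[serving_name]))
    return created


# Usage in the notebook, with the catalog activated earlier:
# create_entities(catalog, ENTITY_SPECS)
```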
name | frenchstate |
created_at | 2024-04-26 07:28:17 |
updated_at | None |
description | None |
serving_names | ['FRENCHSTATE'] |
catalog_name | Grocery Dataset Tutorial |
Now that we've established the entities, it's time to guide FeatureByte in mapping these entities to the actual data in our tables.
customer_table = catalog.get_table("GROCERYCUSTOMER")
invoice_table = catalog.get_table("GROCERYINVOICE")
items_table = catalog.get_table("INVOICEITEMS")
product_table = catalog.get_table("GROCERYPRODUCT")
# tag the entities for the grocery customer table
customer_table.GroceryCustomerGuid.as_entity("customer")
customer_table.State.as_entity("frenchstate")
# tag the entities for the grocery invoice table
invoice_table.GroceryInvoiceGuid.as_entity("invoice")
invoice_table.GroceryCustomerGuid.as_entity("customer")
# tag the entities for the grocery items table
items_table.GroceryInvoiceItemGuid.as_entity("item")
items_table.GroceryInvoiceGuid.as_entity("invoice")
items_table.GroceryProductGuid.as_entity("product")
# tag the entities for the grocery product table
product_table.GroceryProductGuid.as_entity("product")
product_table.ProductGroup.as_entity("productgroup")
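You can express the same tagging as data, which makes it easy to review which column maps to which entity. This sketch uses only the two calls from the cell above: catalog.get_table(name) and <table>.<column>.as_entity(entity). getattr() reproduces the attribute access (for example, customer_table.State) with a column name held in a string. ENTITY_TAGS and tag_entities are hypothetical names. Use it in place of the cell above, not in addition to it.

```python
# table name -> {column name: entity name}, copied from the tagging cell above.
ENTITY_TAGS = {
    "GROCERYCUSTOMER": {"GroceryCustomerGuid": "customer", "State": "frenchstate"},
    "GROCERYINVOICE": {"GroceryInvoiceGuid": "invoice", "GroceryCustomerGuid": "customer"},
    "INVOICEITEMS": {
        "GroceryInvoiceItemGuid": "item",
        "GroceryInvoiceGuid": "invoice",
        "GroceryProductGuid": "product",
    },
    "GROCERYPRODUCT": {"GroceryProductGuid": "product", "ProductGroup": "productgroup"},
}


def tag_entities(catalog, entity_tags):
    """Tag each listed column of each table with its entity."""
    for table_name, columns in entity_tags.items():
        table = catalog.get_table(table_name)
        for column, entity in columns.items():
            # Same as writing table.<column>.as_entity(entity) by hand.
            getattr(table, column).as_entity(entity)


# Usage in the notebook, with the catalog activated earlier:
# tag_entities(catalog, ENTITY_TAGS)
```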
Now, if we list the tables as we did in the previous tutorial, we'll notice that entities have been assigned to each table.
display(catalog.list_tables())
| | id | name | type | status | entities | created_at |
|---|---|---|---|---|---|---|
| 0 | 662b577aaa13c89fa14554e3 | GROCERYPRODUCT | dimension_table | PUBLIC_DRAFT | [product, productgroup] | 2024-04-26T07:27:55.127000 |
| 1 | 662b5778aa13c89fa14554e2 | INVOICEITEMS | item_table | PUBLIC_DRAFT | [item, invoice, product] | 2024-04-26T07:27:52.998000 |
| 2 | 662b5775aa13c89fa14554e1 | GROCERYINVOICE | event_table | PUBLIC_DRAFT | [invoice, customer] | 2024-04-26T07:27:50.409000 |
| 3 | 662b5773aa13c89fa14554e0 | GROCERYCUSTOMER | scd_table | PUBLIC_DRAFT | [customer, frenchstate] | 2024-04-26T07:27:48.329000 |
We can also list entities separately:
display(catalog.list_entities())
| | id | name | serving_names | created_at |
|---|---|---|---|---|
| 0 | 662b5791228ac8cf5c5c926f | frenchstate | [FRENCHSTATE] | 2024-04-26T07:28:17.817000 |
| 1 | 662b5791228ac8cf5c5c926e | productgroup | [PRODUCTGROUP] | 2024-04-26T07:28:17.313000 |
| 2 | 662b5790228ac8cf5c5c926d | product | [GROCERYPRODUCTGUID] | 2024-04-26T07:28:16.770000 |
| 3 | 662b5790228ac8cf5c5c926c | item | [GROCERYINVOICEITEMGUID] | 2024-04-26T07:28:16.250000 |
| 4 | 662b578f228ac8cf5c5c926b | invoice | [GROCERYINVOICEGUID] | 2024-04-26T07:28:15.706000 |
| 5 | 662b578e228ac8cf5c5c926a | customer | [GROCERYCUSTOMERGUID] | 2024-04-26T07:28:15.128000 |
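FeatureByte also derives the relationships between entities from these tags. In each table, the entity tagged on the table's key column (for example, item in INVOICEITEMS) is recorded as the child, and each entity tagged on another column of that table (invoice, product) is recorded as its parent. Let's list them: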
display(catalog.list_relationships())
| | id | relationship_type | entity | related_entity | relation_table | relation_table_type | enabled | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 662b579bdaab72d046ad9666 | child_parent | product | productgroup | GROCERYPRODUCT | dimension_table | True | 2024-04-26T07:28:27.177000 | None |
| 1 | 662b5799daab72d046ad9660 | child_parent | item | product | INVOICEITEMS | item_table | True | 2024-04-26T07:28:25.707000 | None |
| 2 | 662b5799daab72d046ad965b | child_parent | item | invoice | INVOICEITEMS | item_table | True | 2024-04-26T07:28:25.185000 | None |
| 3 | 662b579872b2fff854399a73 | child_parent | invoice | customer | GROCERYINVOICE | event_table | True | 2024-04-26T07:28:24.061000 | None |
| 4 | 662b579672b2fff854399a6e | child_parent | customer | frenchstate | GROCERYCUSTOMER | scd_table | True | 2024-04-26T07:28:22.588000 | None |
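The listing follows a visible pattern: in each table, the entity on the table's key column is the child, and every other tagged entity is a parent. The plain-Python sketch below reproduces the five rows from that pattern. It then follows the child-to-parent links, showing that an item connects through invoice and customer up to frenchstate, and through product up to productgroup. This illustrates the listing; it is not FeatureByte's internal implementation. TABLE_TAGS, infer_child_parent, and ancestors are hypothetical names, and the key-column entity for each table is read off the listing above.

```python
# Per table: (entity on the table's key column, other tagged entities).
TABLE_TAGS = {
    "GROCERYCUSTOMER": ("customer", ["frenchstate"]),
    "GROCERYINVOICE": ("invoice", ["customer"]),
    "INVOICEITEMS": ("item", ["invoice", "product"]),
    "GROCERYPRODUCT": ("product", ["productgroup"]),
}


def infer_child_parent(table_tags):
    """Return (child, parent, table) triples: key entity -> each other tagged entity."""
    return [
        (child, parent, table)
        for table, (child, parents) in table_tags.items()
        for parent in parents
    ]


def ancestors(entity, relationships):
    """All entities reachable from `entity` by following child -> parent links."""
    parents = {}
    for child, parent, _table in relationships:
        parents.setdefault(child, []).append(parent)
    found, stack = [], list(parents.get(entity, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(parents.get(current, []))
    return found


rels = infer_child_parent(TABLE_TAGS)
for child, parent, table in rels:
    print(f"{child} -> {parent} ({table})")
print(sorted(ancestors("item", rels)))
```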