4. Update Descriptions to Tables
(Optional) Updating descriptions to tables and columns¶
Table and column descriptions are automatically fetched from your Data Warehouse when they are available. If these descriptions are missing or incomplete, you have the option to edit and update them.
While not mandatory, updating concise descriptions to tables and columns can be immensely beneficial if you are using FeatureByte Enterprise. These annotations assist FeatureByte's feature ideation engine in generating insightful features.
Much like a data scientist, FeatureByte does its best to grasp the significance and purpose of various tables and columns, discerning their types and more. Based on this understanding, it suggests pertinent aggregations and feature combinations. While FeatureByte can operate effectively without these descriptions, having them certainly enhances the quality of its recommendations.
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
10:41:46 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/ 10:41:46 | INFO | Using profile: tutorial 10:41:47 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml 10:41:47 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1) 10:41:47 | INFO | SDK version: 2.0.1.dev67 10:41:47 | INFO | No catalog activated. 10:41:47 | INFO | Catalog activated: Grocery Dataset Tutorial 16:05:56 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml 16:05:56 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1) 16:05:56 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior. 16:05:56 | INFO | No catalog activated. 16:05:56 | INFO | Catalog activated: Grocery Dataset Tutorial
Get tables from the catalog first:
catalog.list_tables()
id | name | type | status | entities | created_at | |
---|---|---|---|---|---|---|
0 | 676235f9375a850179e42f8f | GROCERYPRODUCT | dimension_table | PUBLIC_DRAFT | [product, productgroup] | 2024-12-18T02:39:53.572000 |
1 | 676235f7375a850179e42f8e | INVOICEITEMS | item_table | PUBLIC_DRAFT | [item, invoice, product] | 2024-12-18T02:39:51.944000 |
2 | 676235f5375a850179e42f8d | GROCERYINVOICE | event_table | PUBLIC_DRAFT | [invoice, customer] | 2024-12-18T02:39:50.049000 |
3 | 676235f3375a850179e42f8c | GROCERYCUSTOMER | scd_table | PUBLIC_DRAFT | [customer, frenchstate] | 2024-12-18T02:39:48.264000 |
customer_table = catalog.get_table("GROCERYCUSTOMER")
invoice_table = catalog.get_table("GROCERYINVOICE")
items_table = catalog.get_table("INVOICEITEMS")
product_table = catalog.get_table("GROCERYPRODUCT")
Let's discover the current descriptions of tables:¶
customer_table.description
'Customer details, including their name, address, and date of birth.'
invoice_table.description
'Grocery invoice details, containing the timestamp and the total amount of the invoice.'
items_table.description
'The grocery item details within each invoice, including the quantity, total cost, discount applied, and product ID.'
product_table.description
'The product group description for each grocery product.'
Let's update descriptions of one table:¶
customer_table.update_description('Customer details')
customer_table.description
'Customer details'
customer_table.update_description('Customer details, including their name, address, and date of birth.')
Let's discover the current descriptions of columns for each table:¶
You can either display all columns together
import pandas as pd
pd.DataFrame(customer_table.info(verbose=True)["columns_info"])
name | dtype | entity | semantic | critical_data_info | description | |
---|---|---|---|---|---|---|
0 | RowID | VARCHAR | None | scd_surrogate_key_id | None | Unique identifier of each row in the table, in... |
1 | GroceryCustomerGuid | VARCHAR | customer | scd_natural_key_id | None | Unique identifier for each customer, in GUID f... |
2 | ValidFrom | TIMESTAMP | None | scd_effective_timestamp | None | GMT timestamp of when this version of a custom... |
3 | Gender | VARCHAR | None | None | None | The customer's gender. Can only have values ma... |
4 | Title | VARCHAR | None | None | None | The customer's title. Can only have values Dr.... |
5 | GivenName | VARCHAR | None | None | None | The customer's given name. |
6 | MiddleInitial | VARCHAR | None | None | None | The first letter of the customer's middle name. |
7 | Surname | VARCHAR | None | None | None | The customer's family or surname. |
8 | StreetAddress | VARCHAR | None | None | None | The customer address's building number and str... |
9 | City | VARCHAR | None | None | None | The city name of the customer's address. |
10 | State | VARCHAR | frenchstate | None | None | The state name of the customer's address. |
11 | PostalCode | VARCHAR | None | None | None | The postal code of the customer's address. Con... |
12 | BrowserUserAgent | VARCHAR | None | None | None | The user agent details of the customer's inter... |
13 | DateOfBirth | TIMESTAMP | None | None | None | The customer's date of birth. |
14 | Latitude | FLOAT | None | None | None | The latitude of the customer's address. |
15 | Longitude | FLOAT | None | None | None | The longitude of the customer's address. |
16 | record_available_at | TIMESTAMP | None | record_creation_timestamp | None | A timestamp for when this row was added to the... |
17 | CurrentRecord | BOOL | None | scd_current_flag | None | Flags whether this version of the customer's d... |
Or display each column one by one
for column in customer_table.columns:
print(f"{column}: {customer_table[column].description}")
RowID: Unique identifier of each row in the table, in GUID format. Uniquely identifies each customer and version combination. GroceryCustomerGuid: Unique identifier for each customer, in GUID format. ValidFrom: GMT timestamp of when this version of a customer's details becomes valid or live. Gender: The customer's gender. Can only have values male or female. Title: The customer's title. Can only have values Dr. Mr. Mrs. or Ms. GivenName: The customer's given name. MiddleInitial: The first letter of the customer's middle name. Surname: The customer's family or surname. StreetAddress: The customer address's building number and street name. City: The city name of the customer's address. State: The state name of the customer's address. PostalCode: The postal code of the customer's address. Contains only digits but is a categorical variable. BrowserUserAgent: The user agent details of the customer's internet browser, including the operating system name and version, and the browser name and version. DateOfBirth: The customer's date of birth. Latitude: The latitude of the customer's address. Longitude: The longitude of the customer's address. record_available_at: A timestamp for when this row was added to the database. CurrentRecord: Flags whether this version of the customer's details is the latest known version.
If the description is incorrect or incomplete, you can edit it
# By using the table method: update_column_description
customer_table.update_column_description(
"RowID",
"Unique identifier of each row in the table, in GUID format. Uniquely identifies each customer and version combination."
)
# Or by using the column method: update_description
customer_table.RowID.update_description(
"Unique identifier of each row in the table, in GUID format. Uniquely identifies each customer and version combination."
)
That's it for this tutorial. Again, this is an optional step, but it can drastically improve FeatureByte's feature ideation.