4. Update Descriptions to Tables
(Optional) Updating descriptions of tables and columns
Table and column descriptions are automatically fetched from your Data Warehouse when they are available. If these descriptions are missing or incomplete, you have the option to edit and update them.
While not mandatory, adding concise descriptions to tables and columns is helpful if you are using FeatureByte Enterprise. These annotations assist FeatureByte's Feature Ideation engine in generating insightful features.
Much like a data scientist, FeatureByte tries to understand the meaning and purpose of each table and column, including its type. Based on this understanding, it suggests relevant aggregations and feature combinations. FeatureByte can operate without these descriptions, but having them improves the quality of its recommendations.
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Credit Default Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:38:24 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/
16:38:24 | INFO | Using profile: tutorial
16:38:24 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:38:24 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:38:24 | INFO | SDK version: 2.1.0.dev113
16:38:24 | INFO | No catalog activated.
16:38:24 | INFO | Catalog activated: Credit Default Dataset SDK Tutorial
16:05:56 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:05:56 | INFO | No catalog activated.
16:05:56 | INFO | Catalog activated: Grocery Dataset Tutorial
Get tables from the catalog first
catalog.list_tables()
| | id | name | type | status | entities | created_at |
|---|---|---|---|---|---|---|
| 0 | 67c2c752924afe7a79ec6f27 | CONSUMER_INSTALLMENTS | time_series_table | PUBLIC_DRAFT | [Consumer Loan] | 2025-03-01T08:37:38.763000 |
| 1 | 67c2c750924afe7a79ec6f26 | CONSUMER_LOAN_STATUS | scd_table | PUBLIC_DRAFT | [Consumer Loan, Client] | 2025-03-01T08:37:36.903000 |
| 2 | 67c2c74e924afe7a79ec6f25 | PRIOR_APPLICATIONS | event_table | PUBLIC_DRAFT | [Prior Application, Client] | 2025-03-01T08:37:34.829000 |
| 3 | 67c2c74c924afe7a79ec6f24 | NEW_APPLICATION | dimension_table | PUBLIC_DRAFT | [New Application, Client] | 2025-03-01T08:37:32.806000 |
new_application = catalog.get_table("NEW_APPLICATION")
prior_applications = catalog.get_table("PRIOR_APPLICATIONS")
consumer_loans = catalog.get_table("CONSUMER_LOAN_STATUS")
consumer_installments = catalog.get_table("CONSUMER_INSTALLMENTS")
Discover the current descriptions of tables
new_application.description
'Records new loan applications.'
prior_applications.description
'Contains data on prior loan applications and the final decision.'
consumer_loans.description
'Tracks consumer loans status.'
consumer_installments.description
'Logs monthly installments for consumer loans at the time of payment.'
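The four lookups above can also be done in a single pass. The following is a minimal sketch, assuming that `catalog.list_tables()` returns a frame with a `name` column (as in the listing earlier) and that each table exposes `.description`. The helper names `collect_table_descriptions` and `print_table_descriptions` are hypothetical and not part of the FeatureByte SDK.

```python
def collect_table_descriptions(catalog):
    """Return {table_name: description} for every table in the catalog.

    Hypothetical helper (not part of the FeatureByte SDK). It relies only on
    calls shown in this tutorial: catalog.list_tables(), catalog.get_table()
    and the table's .description attribute.
    """
    descriptions = {}
    for name in catalog.list_tables()["name"]:
        descriptions[name] = catalog.get_table(name).description
    return descriptions


def print_table_descriptions(catalog):
    """Print one line per table, flagging tables that have no description."""
    for name, text in collect_table_descriptions(catalog).items():
        print(f"{name}: {text if text else '<missing>'}")
```

With the catalog activated earlier, calling `print_table_descriptions(catalog)` would list every table in one go, which makes missing descriptions easier to spot than checking each table by hand.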
Update the description of one table
new_application.update_description('new loan applications.')
new_application.description
'new loan applications.'
# Restore the original description
new_application.update_description('Records new loan applications.')
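When experimenting with descriptions, it can help to keep the previous text so that it can be put back afterwards. The sketch below is one possible way to do that, assuming only the `.description` attribute and the `update_description()` method shown above. The helper name `set_description` is hypothetical and not part of the FeatureByte SDK.

```python
def set_description(obj, text):
    """Update obj's description and return the previous one.

    Hypothetical helper. `obj` can be a table or a column, since both expose
    .description and .update_description() in this tutorial.
    """
    cleaned = text.strip()
    if not cleaned:
        # Refuse to overwrite an existing description with an empty string.
        raise ValueError("description must not be empty")
    previous = obj.description
    if cleaned != previous:
        # Only call the SDK when the text actually changes.
        obj.update_description(cleaned)
    return previous
```

For example, `previous = set_description(new_application, "new loan applications.")` keeps the earlier text in `previous`, and a later `set_description(new_application, previous)` would restore it, provided there was a description to begin with.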
Discover the current descriptions of columns for NEW_APPLICATION
You can either display all columns together:
import pandas as pd
pd.DataFrame(new_application.info(verbose=True)["columns_info"])
| | name | dtype | entity | semantic | critical_data_info | description |
|---|---|---|---|---|---|---|
| 0 | BIRTHDATE | VARCHAR | None | None | None | Client birthdate |
| 1 | CODE_GENDER | VARCHAR | None | None | None | Gender of the client |
| 2 | INCOME_TYPE | VARCHAR | None | None | None | Clients income type (businessman, working, mat... |
| 3 | EDUCATION_TYPE | VARCHAR | None | None | None | Level of highest education the client achieved |
| 4 | OCCUPATION_TYPE | VARCHAR | None | None | None | What kind of occupation does the client have |
| 5 | ORGANIZATION_TYPE | VARCHAR | None | None | None | Type of organization where client works |
| 6 | APPLICATION_ID | INT | New Application | dimension_id | None | ID of application |
| 7 | CLIENT_ID | INT | Client | None | None | ID of the client |
| 8 | AMT_CREDIT | FLOAT | None | None | None | Credit amount of the loan |
| 9 | AMT_ANNUITY | FLOAT | None | None | None | Loan annuity |
| 10 | AMT_GOODS_VALUE | FLOAT | None | None | None | For consumer loans it is the value of the good... |
| 11 | REGION_POPULATION_RELATIVE | FLOAT | None | None | None | Normalized population of region where client l... |
| 12 | APPLICATION_TIME | TIMESTAMP | None | None | None | Application timestamp |
| 13 | DAYS_EMPLOYED | INT | None | None | None | How many days before the application the perso... |
| 14 | DAYS_REGISTRATION | FLOAT | None | None | None | How many days before the application did clien... |
| 15 | DAYS_LAST_PHONE_CHANGE | FLOAT | None | None | None | How many days before the application did clien... |
| 16 | FLOORSMAX_MEDI | FLOAT | None | None | None | Normalized information about building where th... |
| 17 | LANDAREA_MEDI | FLOAT | None | None | None | Normalized information about building where th... |
| 18 | FLAG_DOCUMENT_3 | INT | None | None | None | Did client provide document |
| 19 | AMT_REQ_CREDIT_BUREAU_QRT | FLOAT | None | None | None | Number of enquiries to Credit Bureau about the... |
| 20 | available_at | TIMESTAMP | None | record_creation_timestamp | None | Timestamp the record was added to the database |
Or display each column one by one:
for column in new_application.columns:
    print(f"{column}: {new_application[column].description}")
BIRTHDATE: Client birthdate
CODE_GENDER: Gender of the client
INCOME_TYPE: Clients income type (businessman, working, maternity leave,…)
EDUCATION_TYPE: Level of highest education the client achieved
OCCUPATION_TYPE: What kind of occupation does the client have
ORGANIZATION_TYPE: Type of organization where client works
APPLICATION_ID: ID of application
CLIENT_ID: ID of the client
AMT_CREDIT: Credit amount of the loan
AMT_ANNUITY: Loan annuity
AMT_GOODS_VALUE: For consumer loans it is the value of the goods for which the loan is given
REGION_POPULATION_RELATIVE: Normalized population of region where client lives (higher number means the client lives in more populated region)
APPLICATION_TIME: Application timestamp
DAYS_EMPLOYED: How many days before the application the person started current employment
DAYS_REGISTRATION: How many days before the application did client change his registration
DAYS_LAST_PHONE_CHANGE: How many days before the application did client change his phone
FLOORSMAX_MEDI: Normalized information about building where the client lives
LANDAREA_MEDI: Normalized information about building where the client lives
FLAG_DOCUMENT_3: Did client provide document
AMT_REQ_CREDIT_BUREAU_QRT: Number of enquiries to Credit Bureau about the client 3 month before application (excluding one month before application)
available_at: Timestamp the record was added to the database
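On wider tables, scanning this output by eye for gaps gets tedious. The sketch below shows one way to list only the columns that still need attention, assuming the same `table.columns` and `table[column].description` access used in the loop above. The helper name `columns_missing_description` and its `min_length` threshold are hypothetical and not part of the FeatureByte SDK.

```python
def columns_missing_description(table, min_length=1):
    """Return names of columns whose description is missing or too short.

    Hypothetical helper built on table.columns and table[column].description,
    as used in this tutorial. `min_length` is an illustrative threshold on the
    number of non-whitespace-trimmed characters.
    """
    missing = []
    for column in table.columns:
        text = table[column].description
        if text is None or len(text.strip()) < min_length:
            missing.append(column)
    return missing
```

Calling `columns_missing_description(new_application)` would return an empty list if every column already has a description, and raising `min_length` is a crude way to also flag very terse ones.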
Update column descriptions
If the description is incorrect or incomplete, you can edit it:
# By using the table method: update_column_description
new_application.update_column_description(
    "DAYS_EMPLOYED",
    "How many days before the application the person started current employment",
)
# Or by using the column method: update_description
new_application.DAYS_EMPLOYED.update_description(
    "How many days before the application the person started current employment"
)
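When several columns need new text, the descriptions can be kept in a dictionary and applied in one loop. This is a minimal sketch under that assumption. It uses only `table.columns`, `table[column].description` and `table.update_column_description()` as they appear in this tutorial, and the helper name `apply_column_descriptions` is hypothetical and not part of the FeatureByte SDK.

```python
def apply_column_descriptions(table, descriptions):
    """Apply {column_name: description} to a table; return the columns changed.

    Hypothetical helper. Unknown column names raise KeyError before anything
    is written, so a typo does not leave the table half-updated.
    """
    known = set(table.columns)
    unknown = sorted(set(descriptions) - known)
    if unknown:
        raise KeyError(f"columns not in table: {unknown}")

    changed = []
    for column, text in descriptions.items():
        # Skip columns whose description already matches the requested text.
        if table[column].description != text:
            table.update_column_description(column, text)
            changed.append(column)
    return changed
```

For instance, `apply_column_descriptions(new_application, {"DAYS_EMPLOYED": "How many days before the application the person started current employment"})` would return an empty list here, because that text is already in place.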
That's it for this tutorial. Again, this step is optional, but good descriptions can improve the quality of FeatureByte's feature ideation.