5. Update Descriptions and Tag Semantics

What is Semantic Tagging?

Semantic tagging is the process of associating each column with a semantic type.

A semantic type defines a column's meaning, expected values, and suitable feature engineering operations. By linking columns to semantic types, you ensure that data is properly transformed, aggregated, and utilized for analysis and machine learning.

FeatureByte automates semantic tagging through Generative AI, which analyzes column metadata and suggests an appropriate semantic type. However, you can manually assign or refine these types at the table level. If needed, semantic tags can also be overwritten during Feature Ideation to fine-tune feature engineering strategies.

By structuring this process within a data ontology, FeatureByte enables a systematic approach to selecting relevant feature engineering techniques while minimizing manual effort.
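
To make the idea concrete, here is a minimal sketch of how a semantic type can carry a meaning and a set of suitable aggregations. It is an illustration only and does not use the FeatureByte SDK or reproduce its ontology: the `SemanticType` class, the `additive_numeric` label, the column names, and the aggregation lists are all assumptions. Only `non_additive_numeric` is a type named in this tutorial.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SemanticType:
    """Hypothetical record tying a semantic type to suitable operations."""
    name: str
    meaning: str
    aggregations: Tuple[str, ...]


# Hypothetical two-entry ontology; a real ontology is far richer.
ONTOLOGY: Dict[str, SemanticType] = {
    "additive_numeric": SemanticType(
        name="additive_numeric",
        meaning="numeric values that can be summed meaningfully",
        aggregations=("sum", "mean", "min", "max"),
    ),
    "non_additive_numeric": SemanticType(
        name="non_additive_numeric",
        meaning="numeric values where direct addition is not meaningful",
        aggregations=("mean", "min", "max"),
    ),
}

# Hypothetical column-to-type assignments, for illustration only.
COLUMN_TAGS: Dict[str, str] = {
    "TotalCost": "additive_numeric",
    "UnitPrice": "non_additive_numeric",
}


def suggested_aggregations(column: str) -> Tuple[str, ...]:
    """Look up the aggregations allowed by the column's semantic type."""
    return ONTOLOGY[COLUMN_TAGS[column]].aggregations
```

In this sketch, `suggested_aggregations("UnitPrice")` leaves out `sum`, which is the kind of decision a semantic tag is meant to drive.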

Why is it important?

Accurate descriptions and semantic tagging of data fields and tables are essential for enhancing Feature Ideation’s recommendations, enabling more relevant data aggregations, transforms, filters, and feature combinations. While Feature Ideation can operate without descriptions, including them leads to better feature selection and model performance.


Step 1: Update Table Descriptions

Note

This step is optional if your Data Warehouse already includes table descriptions.

  1. From the menu, navigate to the 'Explore' section and open the Table Catalog. If you're on the Table Diagram page, click Table Catalog to return to it.

  2. Verify the following table descriptions:

    | Table | Description |
    | --- | --- |
    | GROCERYCUSTOMER | Customer details, including their name, address, and date of birth |
    | GROCERYINVOICE | Grocery invoice details, containing the timestamp and the total amount of the invoice |
    | INVOICEITEMS | Details of grocery product items within each invoice, including quantity, total cost, discount applied, and product ID |
    | GROCERYPRODUCT | Product group description for each grocery product |

  3. Edit the table's description if needed:

    • Select the table from the Table Catalog and navigate to the 'About' tab.
    • Update the description by clicking the edit icon next to the description field.

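
If you prefer to check the descriptions programmatically before editing them in the UI, the verification step can be sketched as follows. This is a plain-Python illustration, not a FeatureByte API call: the `tables_needing_update` helper is hypothetical, and the `current` mapping is assumed to come from wherever you read your table metadata.

```python
from typing import Dict, List, Optional

# Descriptions listed in this tutorial.
EXPECTED_DESCRIPTIONS: Dict[str, str] = {
    "GROCERYCUSTOMER": "Customer details, including their name, address, and date of birth",
    "GROCERYINVOICE": "Grocery invoice details, containing the timestamp and the total amount of the invoice",
    "INVOICEITEMS": "Details of grocery product items within each invoice, including quantity, total cost, discount applied, and product ID",
    "GROCERYPRODUCT": "Product group description for each grocery product",
}


def tables_needing_update(current: Dict[str, Optional[str]]) -> List[str]:
    """Return tables whose current description is missing or blank.

    `current` maps table names to whatever description is stored today
    (hypothetical input; how you obtain it is up to you).
    """
    return [
        table
        for table in EXPECTED_DESCRIPTIONS
        if not (current.get(table) or "").strip()
    ]
```

Tables returned by the helper are the ones to edit by hand in the 'About' tab.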


Step 2: Update Column Descriptions

Note

This step is optional if your Data Warehouse already includes column descriptions.

  1. Select a table from the Table Catalog and navigate to the 'Columns' tab.
  2. Update the description by clicking the edit icon next to the description field of the column.


Step 3: Tag Semantics

Note

Semantic Tagging is not required at the table level, as Feature Ideation will automatically infer and fill in missing semantic tags.

However, it is a best practice to verify that column descriptions are accurate and manually assign semantic types at the table level when needed.

In this tutorial, we will leave the columns semantically untagged. If you want to tag semantics at the table level, follow these steps:

  1. Select one table from the Table Catalog.
  2. Go to the 'Columns' tab.
  3. Click Run Semantic Type Detection to run semantic tagging.
  4. Review the suggestions provided. Accept, adjust, or leave the column semantically untagged.

Which Semantic Type Should You Focus On?

When working with different table types, pay close attention to specific semantic types, as they influence filtering strategies, data aggregation, and feature engineering choices.

In Event Tables and Time Series Tables, check the event_type (categorization of events based on their primary purpose or nature) and event_status (state, condition, or outcome of an event) semantic types. These columns guide event-based filtering strategies.

In a Slowly Changing Dimension Table, check the termination_timestamp and termination_date semantic types, which indicate when an entity is actively terminated, sometimes prematurely. These columns determine how active entities are aggregated and when terminated entities should be analyzed.

For all tables, check out:

  • the non_additive_numeric semantic type (numeric values where direct addition is not meaningful). Identifying these columns prevents incorrect sum operations.
  • the automated non_informative semantic type (column with constant value). This may indicate problems in your data.
  • the not_to_use semantic type (sensitive, personal, operational, or unreliable data that should not be used). This determines whether feature engineering should be applied to those columns.
  • the ambiguous_numeric (column that combines different units or scales) and ambiguous_categorical (column that does not provide unique information by itself) semantic types. These columns may require manual transformation before they are used for feature engineering.

By carefully reviewing these semantic types, you can enhance feature selection and ensure high-quality transformations for machine learning.
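
The review checklist above can be sketched as a small lookup that flags tagged columns worth a second look. This is a hedged illustration rather than FeatureByte functionality: the semantic type names come from this page, while the `review_notes` helper, the table-type keys (`"event"`, `"time_series"`, `"scd"`), and the example column names are assumptions.

```python
from typing import Dict, List, Tuple

# Semantic types to review in every table, with the reason given above.
ALL_TABLES: Dict[str, str] = {
    "non_additive_numeric": "avoid sum operations",
    "non_informative": "constant value; may indicate a data problem",
    "not_to_use": "exclude from feature engineering",
    "ambiguous_numeric": "mixed units or scales; transform manually first",
    "ambiguous_categorical": "not informative by itself; transform manually first",
}

EVENT_CHECKS = {
    "event_type": "guides event-based filtering",
    "event_status": "guides event-based filtering",
}

# Extra checks per table type (keys are hypothetical labels).
BY_TABLE_TYPE: Dict[str, Dict[str, str]] = {
    "event": EVENT_CHECKS,
    "time_series": EVENT_CHECKS,
    "scd": {
        "termination_timestamp": "affects how active entities are aggregated",
        "termination_date": "affects how active entities are aggregated",
    },
}


def review_notes(table_type: str, column_tags: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (column, semantic_type, note) for each column that needs review."""
    checks = {**ALL_TABLES, **BY_TABLE_TYPE.get(table_type, {})}
    return [
        (column, tag, checks[tag])
        for column, tag in column_tags.items()
        if tag in checks
    ]
```

For example, calling `review_notes("scd", {"ClosedAt": "termination_timestamp", "Ssn": "not_to_use"})` with these hypothetical columns flags both of them, each with its reason.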