A TableColumn object represents a column within a Table object. You can add metadata to TableColumn objects to help with feature engineering, such as tagging columns with entity references or defining default data cleaning operations.
To remove an entity tag, set the entity name to None.
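The sketch below illustrates the rule above. It uses a hypothetical `ColumnSketch` class with a hypothetical `tag_entity` method, not the real `TableColumn` API. Assigning `None` as the entity name clears a previously set tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for a table column; not the SDK's TableColumn class."""
    name: str
    entity: Optional[str] = None

    def tag_entity(self, entity_name: Optional[str]) -> None:
        # Setting the entity name to None removes the existing tag.
        self.entity = entity_name


col = ColumnSketch("customer_id")
col.tag_entity("customer")  # tag the column with an entity
print(col.entity)           # customer
col.tag_entity(None)        # remove the tag
print(col.entity)           # None
```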
Defining Default Cleaning Operations
For a given column, you can define an ordered sequence of cleaning operations, which are applied in the order listed. Make sure that values imputed by an earlier operation are not flagged for cleaning by a later one.
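The following minimal sketch shows why the ordering caveat matters. The helpers `impute_missing`, `impute_below`, `apply_in_order` and `imputed_value_survives` are hypothetical pure-Python stand-ins, not SDK functions. When missing values are first filled with -1, a later "below 0" rule flags that imputed value again and erases it. When the fill value lies inside the allowed range, it survives.

```python
from typing import Callable, List, Optional

Value = Optional[float]
Step = Callable[[Value], Value]


def impute_missing(fill: float) -> Step:
    # Replace missing values (None) with `fill`.
    return lambda v: fill if v is None else v


def impute_below(end_point: float, fill: Value) -> Step:
    # Replace present values strictly below `end_point` with `fill`.
    return lambda v: fill if v is not None and v < end_point else v


def apply_in_order(values: List[Value], steps: List[Step]) -> List[Value]:
    # Run every step on every value, in the order given.
    out = []
    for v in values:
        for step in steps:
            v = step(v)
        out.append(v)
    return out


def imputed_value_survives(fill: Value, later_steps: List[Step]) -> bool:
    # True if no later step rewrites the value imputed by an earlier step.
    v = fill
    for step in later_steps:
        v = step(v)
    return v == fill


raw = [None, -5.0, 3.0]
# Conflict: -1 imputed for missing values is then caught by the "< 0" rule.
bad = apply_in_order(raw, [impute_missing(-1.0), impute_below(0.0, None)])
# Consistent: the missing-value fill (0) is not flagged by the later rule.
good = apply_in_order(raw, [impute_missing(0.0), impute_below(0.0, None)])
print(bad)   # [None, None, 3.0]
print(good)  # [0.0, None, 3.0]
```

A check such as `imputed_value_survives` can be run on each fill value against the operations that follow it before the sequence is saved.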
Use the following constructors to define each cleaning operation:
MissingValueImputation: Imputes missing values.
DisguisedValueImputation: Imputes disguised values from a list.
UnexpectedValueImputation: Imputes unexpected values not found in a given list.
ValueBeyondEndpointImputation: Imputes numeric or date values outside specified boundaries.
StringValueImputation: Imputes string values.
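To make the five operation types concrete, here is a minimal pure-Python sketch that models each one as a function returning a per-value cleaner. All function names and parameter names here (`disguised_values`, `expected_values`, `type_`, `end_point`) are hypothetical illustrations, not the constructors' real signatures. The sketch also makes three assumptions of its own:

- String-value imputation replaces values of type `str` in a column expected to hold non-string data.
- Unexpected-value imputation leaves missing values untouched.
- The endpoint types are named `less_than`, `less_than_or_equal`, `greater_than` and `greater_than_or_equal`.

```python
import operator
from typing import Any, Callable, Iterable, List

Cleaner = Callable[[Any], Any]


def missing_value_imputation(imputed_value: Any) -> Cleaner:
    # Replace missing values (None).
    return lambda v: imputed_value if v is None else v


def disguised_value_imputation(disguised_values: Iterable[Any], imputed_value: Any) -> Cleaner:
    # Replace values that appear in the list of disguised values.
    disguised = list(disguised_values)
    return lambda v: imputed_value if v in disguised else v


def unexpected_value_imputation(expected_values: Iterable[Any], imputed_value: Any) -> Cleaner:
    # Replace values not found in the expected list (assumption: None is left as-is).
    expected = list(expected_values)
    return lambda v: v if v is None or v in expected else imputed_value


_BEYOND = {
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
}


def value_beyond_endpoint_imputation(type_: str, end_point: Any, imputed_value: Any) -> Cleaner:
    # Replace numeric or date values beyond `end_point`; skip None and strings.
    beyond = _BEYOND[type_]
    return lambda v: (
        imputed_value
        if v is not None and not isinstance(v, str) and beyond(v, end_point)
        else v
    )


def string_value_imputation(imputed_value: Any) -> Cleaner:
    # Replace values of type str (assumed meaning of "string values").
    return lambda v: imputed_value if isinstance(v, str) else v


def clean(values: List[Any], operations: List[Cleaner]) -> List[Any]:
    # Apply the operations to each value, in order.
    result = []
    for v in values:
        for op in operations:
            v = op(v)
        result.append(v)
    return result


ages = [25, None, -1, "n/a", 999, 130, 40]
ops = [
    string_value_imputation(None),                               # "n/a" -> None
    disguised_value_imputation([-1, 999], None),                 # sentinels -> None
    value_beyond_endpoint_imputation("greater_than", 120, 120),  # cap at 120
    missing_value_imputation(0),                                 # None -> 0
]
print(clean(ages, ops))  # [25, 0, 0, 0, 0, 120, 40]

status = ["active", "ACTIVE", None, "closed"]
print(clean(status, [unexpected_value_imputation(["active", "closed"], "other")]))
# ['active', 'other', None, 'closed']
```

Missing-value imputation is placed last in this sketch, so every value set to `None` by an earlier step is filled exactly once.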
If the imputed_value parameter is None, the values to impute are replaced with missing values, and the corresponding rows are ignored during aggregation operations.
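A minimal sketch of the effect described above follows, using hypothetical helpers `impute_disguised` and `mean_ignoring_missing`. A disguised value (-999) is imputed with `None`, so it becomes missing. An average that skips missing values then ignores that row, in the same way that SQL aggregates such as `AVG` skip `NULL`s. Averaging the raw values instead lets the sentinel distort the result.

```python
from statistics import mean
from typing import Any, Iterable, List, Optional


def impute_disguised(values: List[Any], disguised: Iterable[Any], imputed_value: Any = None) -> List[Any]:
    # With imputed_value=None, disguised values become missing values.
    disguised = list(disguised)
    return [imputed_value if v in disguised else v for v in values]


def mean_ignoring_missing(values: List[Optional[float]]) -> Optional[float]:
    # Aggregate only the rows that are present; missing rows are ignored.
    present = [v for v in values if v is not None]
    return mean(present) if present else None


amounts = [10.0, -999.0, 20.0]
cleaned = impute_disguised(amounts, [-999.0])
print(cleaned)                         # [10.0, None, 20.0]
print(mean_ignoring_missing(cleaned))  # 15.0
print(mean(amounts))                   # -323.0 (sentinel skews the raw average)
```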
To set the default cleaning operations for a column, use the
To list columns in a Table object, use the
To display column specifications, including tagged entity IDs and default cleaning operations, use the
To obtain TableColumn descriptive statistics, use the
By default, statistics are computed, and data is materialized, before any cleaning operations are applied. To apply the cleaning operations first, set the after_cleaning parameter to True.
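The sketch below shows how the after_cleaning flag changes the output. The `describe` function is a hypothetical pure-Python model, not the SDK's method; only the after_cleaning parameter name comes from the text above. A single hypothetical cleaning rule treats -1 as a disguised missing value. With the default setting, -1 is counted as data and becomes the minimum. With after_cleaning=True, it is counted as missing.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Optional

Cleaner = Callable[[Any], Any]


def describe(values: List[Optional[float]], operations: List[Cleaner],
             after_cleaning: bool = False) -> Dict[str, Any]:
    # Hypothetical model: clean first only when after_cleaning is True.
    if after_cleaning:
        cleaned = []
        for v in values:
            for op in operations:
                v = op(v)
            cleaned.append(v)
        values = cleaned
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


# Hypothetical cleaning rule: treat -1 as a disguised missing value.
ops = [lambda v: None if v == -1 else v]
data = [4.0, -1.0, 8.0, None]
print(describe(data, ops))                       # -1.0 counted as a real value
print(describe(data, ops, after_cleaning=True))
# {'count': 2, 'missing': 2, 'min': 4.0, 'max': 8.0, 'mean': 6.0}
```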
Table and column descriptions are automatically fetched from your data warehouse when they are available. If they are missing or incomplete, you can edit and update them.
To see a description of a column in a Table object, use the
To update the description of a column in a Table object, use the