
featurebyte.TableColumn.update_critical_data_info

update_critical_data_info(
cleaning_operations: List[Annotated[Union[MissingValueImputation, DisguisedValueImputation, UnexpectedValueImputation, ValueBeyondEndpointImputation, StringValueImputation]]]
)

Description

Associates metadata with the column, such as default cleaning operations that are applied automatically when views are created from the table. These operations help keep the data consistent and accurate.

For a specific column, define a sequence of cleaning operations to be executed in order. Ensure that values imputed in earlier steps are not marked for cleaning in subsequent operations.
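Why the order of operations matters can be illustrated with a small plain-Python model. This is only a sketch of the behaviour described above, not FeatureByte's implementation: the helpers `impute_missing`, `impute_less_than`, and `apply_in_order` are hypothetical stand-ins, and `None` plays the role of a missing value.

```python
from typing import Callable, List, Optional

Value = Optional[float]
Step = Callable[[Value], Value]


def impute_missing(imputed_value: Value) -> Step:
    """Hypothetical stand-in for a missing-value imputation step."""
    def step(value: Value) -> Value:
        return imputed_value if value is None else value
    return step


def impute_less_than(end_point: float, imputed_value: Value) -> Step:
    """Hypothetical stand-in for a 'less_than' endpoint imputation step."""
    def step(value: Value) -> Value:
        # Missing values are left untouched by this step.
        if value is not None and value < end_point:
            return imputed_value
        return value
    return step


def apply_in_order(values: List[Value], steps: List[Step]) -> List[Value]:
    """Apply each step to every value, one step after another."""
    cleaned = list(values)
    for step in steps:
        cleaned = [step(v) for v in cleaned]
    return cleaned


raw = [5.0, None, -3.0]

# Missing values are imputed with -1 first; the second step then sees
# that -1 as a negative value and overwrites it with 0.
sentinel_lost = apply_in_order(
    raw, [impute_missing(-1.0), impute_less_than(0.0, 0.0)]
)

# Negative values are cleaned first, so the -1 imputed afterwards survives.
sentinel_kept = apply_in_order(
    raw, [impute_less_than(0.0, 0.0), impute_missing(-1.0)]
)

print(sentinel_lost)  # [5.0, 0.0, 0.0]
print(sentinel_kept)  # [5.0, -1.0, 0.0]
```

In the first ordering the value imputed by the earlier step is itself marked for cleaning by the later one, which is the situation the guidance above warns against.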

To set default cleaning operations for a column, use the following class objects in a list:

  • MissingValueImputation: Imputes missing values.
  • DisguisedValueImputation: Imputes disguised values from a list.
  • UnexpectedValueImputation: Imputes unexpected values not found in a given list.
  • ValueBeyondEndpointImputation: Imputes numeric or date values outside specified boundaries.
  • StringValueImputation: Imputes string values.

If the imputed_value parameter is None, the values to impute are replaced with missing values and the corresponding rows are ignored during aggregation operations.
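The effect of imputed_value=None on an aggregate can be sketched in plain Python. This is an illustrative model under stated assumptions rather than FeatureByte code: `impute_less_than` and `mean_ignoring_missing` are hypothetical helpers, and `None` stands for a missing value.

```python
from typing import List, Optional

Value = Optional[float]


def impute_less_than(
    values: List[Value], end_point: float, imputed_value: Value
) -> List[Value]:
    """Replace values below end_point with imputed_value (hypothetical helper)."""
    return [
        imputed_value if v is not None and v < end_point else v
        for v in values
    ]


def mean_ignoring_missing(values: List[Value]) -> Optional[float]:
    """Average the non-missing values, skipping rows that hold None."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


amounts = [10.0, -4.0, 20.0]

# Imputing with 0 keeps the row in the aggregate.
with_zero = impute_less_than(amounts, end_point=0.0, imputed_value=0.0)

# Imputing with None turns the row into a missing value,
# so the aggregate skips it.
with_missing = impute_less_than(amounts, end_point=0.0, imputed_value=None)

print(with_zero, mean_ignoring_missing(with_zero))        # [10.0, 0.0, 20.0] 10.0
print(with_missing, mean_ignoring_missing(with_missing))  # [10.0, None, 20.0] 15.0
```

Choosing between a concrete imputed value and None therefore changes which rows contribute to an aggregate, not only the values themselves.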

Parameters

  • cleaning_operations: List[Annotated[Union[MissingValueImputation, DisguisedValueImputation, UnexpectedValueImputation, ValueBeyondEndpointImputation, StringValueImputation]]]
    List of cleaning operations to be applied on the column.

Examples

Add missing value imputation and negative value imputation operations to a table column.

>>> event_table = catalog.get_table("GROCERYINVOICE")
>>> event_table["Amount"].update_critical_data_info(
...    cleaning_operations=[
...        fb.MissingValueImputation(imputed_value=0),
...        fb.ValueBeyondEndpointImputation(
...            type="less_than", end_point=0, imputed_value=0
...        ),
...    ]
... )

Show the column cleaning operations of the event table.

>>> event_table.column_cleaning_operations
[ColumnCleaningOperation(column_name='Amount', cleaning_operations=[MissingValueImputation(imputed_value=0.0),
ValueBeyondEndpointImputation(imputed_value=0.0, type=less_than, end_point=0.0)])]

Remove the cleaning operations and show the column cleaning operations of the event table.

>>> event_table["Amount"].update_critical_data_info(cleaning_operations=[])
>>> event_table.column_cleaning_operations
[]

See Also