featurebyte.TableColumn.update_critical_data_info
update_critical_data_info(
    cleaning_operations: List[Annotated[Union[MissingValueImputation, DisguisedValueImputation, UnexpectedValueImputation, ValueBeyondEndpointImputation, StringValueImputation]]]
)
Description
Associates metadata with the column, such as default cleaning operations that are applied automatically when views are created from the table. These operations help keep the data consistent and accurate.
For a specific column, define a sequence of cleaning operations to be executed in order. Ensure that values imputed in earlier steps are not marked for cleaning in subsequent operations.
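The effect of ordering can be sketched in plain Python, without featurebyte. The helpers `impute_missing`, `impute_less_than`, and `apply_in_order` below are hypothetical stand-ins invented for illustration, not part of the featurebyte API. The sketch only assumes that each operation runs on the output of the previous one.

```python
from typing import Callable, List, Optional

Value = Optional[float]
Step = Callable[[Value], Value]


def impute_missing(imputed_value: Value) -> Step:
    """Hypothetical stand-in: replace missing (None) values."""
    return lambda v: imputed_value if v is None else v


def impute_less_than(end_point: float, imputed_value: Value) -> Step:
    """Hypothetical stand-in: replace values below end_point."""
    return lambda v: imputed_value if v is not None and v < end_point else v


def apply_in_order(values: List[Value], steps: List[Step]) -> List[Value]:
    """Run every step on each value, feeding each step the previous result."""
    cleaned = []
    for v in values:
        for step in steps:
            v = step(v)
        cleaned.append(v)
    return cleaned


amounts = [12.5, None, -3.0]

# The -1.0 sentinel written by the first step is below the end point of the
# second step, so the second step cleans it again and the sentinel is lost.
sentinel_lost = apply_in_order(
    amounts, [impute_missing(-1.0), impute_less_than(0.0, 0.0)]
)

# Reordering keeps the sentinel: the boundary check runs before the
# missing values are filled in.
sentinel_kept = apply_in_order(
    amounts, [impute_less_than(0.0, 0.0), impute_missing(-1.0)]
)

print(sentinel_lost)  # [12.5, 0.0, 0.0]
print(sentinel_kept)  # [12.5, -1.0, 0.0]
```

In the first ordering the value imputed for the missing entry is re-cleaned by the later step, which is the situation the guidance above warns against.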
To set default cleaning operations for a column, use the following class objects in a list:
- MissingValueImputation: Imputes missing values.
- DisguisedValueImputation: Imputes disguised values from a list.
- UnexpectedValueImputation: Imputes unexpected values not found in a given list.
- ValueBeyondEndpointImputation: Imputes numeric or date values outside specified boundaries.
- StringValueImputation: Imputes string values.
If the imputed_value parameter is None, the values to impute are replaced with missing values, and the corresponding rows are ignored during aggregation operations.
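As a rough illustration of why the choice of imputed_value matters, the plain-Python sketch below compares imputing with 0 against imputing with None before taking an average. The helpers `impute_less_than` and `mean_ignoring_missing` are hypothetical and only mimic the described behavior. They are not featurebyte functions.

```python
from typing import List, Optional

Value = Optional[float]


def impute_less_than(
    values: List[Value], end_point: float, imputed_value: Value
) -> List[Value]:
    """Hypothetical stand-in: replace values below end_point."""
    return [
        imputed_value if v is not None and v < end_point else v
        for v in values
    ]


def mean_ignoring_missing(values: List[Value]) -> Optional[float]:
    """Average the non-missing values, skipping rows that hold None."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


amounts = [10.0, -4.0, 20.0]

# Imputing with 0 keeps the row in the aggregation.
mean_with_zero = mean_ignoring_missing(impute_less_than(amounts, 0.0, 0.0))

# Imputing with None turns the value into a missing value, so the row
# drops out of the aggregation.
mean_with_none = mean_ignoring_missing(impute_less_than(amounts, 0.0, None))

print(mean_with_zero)  # 10.0
print(mean_with_none)  # 15.0
```

The two averages differ because the row holding the negative amount counts toward the denominator only when it is imputed with a concrete value.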
Parameters
- cleaning_operations: List[Annotated[Union[MissingValueImputation, DisguisedValueImputation, UnexpectedValueImputation, ValueBeyondEndpointImputation, StringValueImputation]]]
  List of cleaning operations to apply to the column, in order.
Examples
Add missing-value imputation and negative-value imputation operations to a table column.
>>> import featurebyte as fb
>>> # Assumes `catalog` is an already-activated catalog.
>>> event_table = catalog.get_table("GROCERYINVOICE")
>>> event_table["Amount"].update_critical_data_info(
...     cleaning_operations=[
...         fb.MissingValueImputation(imputed_value=0),
...         fb.ValueBeyondEndpointImputation(
...             type="less_than", end_point=0, imputed_value=0
...         ),
...     ]
... )