
featurebyte.ColumnCleaningOperation

class ColumnCleaningOperation(
*,
column_name: str,
cleaning_operations: Sequence[Annotated[Union[MissingValueImputation, DisguisedValueImputation, UnexpectedValueImputation, ValueBeyondEndpointImputation, StringValueImputation]]]
)

Description

The ColumnCleaningOperation object links a table column to a specific cleaning configuration. It is used when creating a view in manual mode where each column requires a different configuration. The column_cleaning_operations parameter takes a list of ColumnCleaningOperation objects, and each object pairs the column involved with the cleaning operations to apply to it.
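To make the per-column pairing concrete, here is a plain-Python sketch that models it without featurebyte. `FillMissing`, `ColumnCleaning`, and `clean_table` are hypothetical names used only for illustration. They are not part of the featurebyte API and do not reflect how featurebyte actually runs cleaning. The sketch shows that each column gets its own ordered list of operations, and that columns without a configuration are left unchanged.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass(frozen=True)
class FillMissing:
    """Hypothetical stand-in for a missing-value imputation: replaces None."""
    imputed_value: Any

    def apply(self, value: Any) -> Any:
        return self.imputed_value if value is None else value


@dataclass(frozen=True)
class ColumnCleaning:
    """Hypothetical model of the link: one column name, an ordered list of operations."""
    column_name: str
    cleaning_operations: Sequence[FillMissing]


def clean_table(
    table: Dict[str, List[Any]], configs: Sequence[ColumnCleaning]
) -> Dict[str, List[Any]]:
    """Return a cleaned copy of `table`; columns without a config are left unchanged."""
    cleaned = {name: list(values) for name, values in table.items()}
    for config in configs:
        values = cleaned[config.column_name]
        for operation in config.cleaning_operations:
            values = [operation.apply(v) for v in values]
        cleaned[config.column_name] = values
    return cleaned


# Toy data: each column gets a different imputed value; "Store" has no config.
table = {"Amount": [12.5, None, 3.0], "Quantity": [2, None, -1], "Store": ["A", None, "B"]}
configs = [
    ColumnCleaning("Amount", [FillMissing(imputed_value=0.0)]),
    ColumnCleaning("Quantity", [FillMissing(imputed_value=1)]),
]
result = clean_table(table, configs)
print(result)
# {'Amount': [12.5, 0.0, 3.0], 'Quantity': [2, 1, -1], 'Store': ['A', None, 'B']}
```

In featurebyte itself, the equivalent pairing is expressed by passing ColumnCleaningOperation objects to the column_cleaning_operations parameter, as in the example at the end of this page.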

Parameters

  • column_name: str
    Name of the column that requires cleaning. The operations given in cleaning_operations are applied to this column.

  • cleaning_operations: Sequence[Annotated[Union[MissingValueImputation, DisguisedValueImputation, UnexpectedValueImputation, ValueBeyondEndpointImputation, StringValueImputation]]]
    Sequence (e.g., a list) of cleaning operations to apply to the specified column. Each operation is an instance of one of the five classes listed above, each of which performs a specific cleaning task. The operations are applied to the column in the order in which they appear in the sequence, so make sure that values imputed by an earlier operation are not flagged for cleaning by a later one.
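The ordering rule can be illustrated with a small plain-Python sketch. `FillMissing`, `ReplaceDisguised`, and `apply_in_order` are hypothetical stand-ins loosely modeled on missing-value and disguised-value imputation. They are not featurebyte's classes, and the values are toy data. The sketch shows what happens when missing values are imputed to a value that a later operation treats as disguised: those imputed values get cleaned a second time.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class FillMissing:
    """Hypothetical: replaces None with imputed_value."""
    imputed_value: Any

    def apply(self, value: Any) -> Any:
        return self.imputed_value if value is None else value


@dataclass(frozen=True)
class ReplaceDisguised:
    """Hypothetical: replaces any value listed in disguised_values with imputed_value."""
    disguised_values: Sequence[Any]
    imputed_value: Any

    def apply(self, value: Any) -> Any:
        return self.imputed_value if value in self.disguised_values else value


def apply_in_order(values: List[Any], operations: Sequence[Any]) -> List[Any]:
    """Apply each operation to every value in list order; later steps see earlier output."""
    for operation in operations:
        values = [operation.apply(v) for v in values]
    return values


values = [5, None, -1]  # toy column where -1 is a disguised "unknown" marker
fill_missing = FillMissing(imputed_value=-1)
unmask = ReplaceDisguised(disguised_values=[-1], imputed_value=0)

# Missing values become -1 first, so the next step re-cleans them to 0.
missing_first = apply_in_order(values, [fill_missing, unmask])    # [5, 0, 0]
# Disguised values are handled first, so the imputed -1 is never re-flagged.
disguised_first = apply_in_order(values, [unmask, fill_missing])  # [5, -1, 0]
```

With the first ordering, the originally missing value and the originally disguised value end up indistinguishable. This is the situation the parameter description warns against.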

Examples

First, check the table cleaning operations of this feature:

>>> import featurebyte as fb
>>> feature = catalog.get_feature("InvoiceAmountAvg_60days")
>>> feature.info()["table_cleaning_operation"]
{'this': [], 'default': []}

Create a new version of the feature with different table cleaning operations, here imputing missing values in the Amount column of the GROCERYINVOICE table with 0.0:

>>> new_feature = feature.create_new_version(
...   table_cleaning_operations=[
...     fb.TableCleaningOperation(
...       table_name="GROCERYINVOICE",
...       column_cleaning_operations=[
...         fb.ColumnCleaningOperation(
...           column_name="Amount",
...           cleaning_operations=[fb.MissingValueImputation(imputed_value=0.0)],
...         )
...       ],
...     )
...   ]
... )