
5. Set Default Cleaning Operations

A crucial step in every data science project is ensuring the data is clean and ready for feature engineering.

To safeguard data quality during feature engineering, you can centralize cleaning operations at the table level. These operations address issues such as missing values, disguised missing values (placeholder values, such as -99, that stand in for missing data but are not explicitly encoded as missing), and outliers.
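The three kinds of issues above can be told apart mechanically. The following plain-Python sketch is illustrative only and is not part of the product described here. The helper name `profile_column`, the sentinel set and the bounds are hypothetical, and the sample values are made up. It counts explicit missing values, disguised missing values and out-of-range values in one numeric column:

```python
from typing import Iterable, Optional

# Hypothetical helper: counts the three data-quality issues described above
# in a single numeric column. Not a real library API.
def profile_column(
    values: Iterable[Optional[float]],
    disguised: set,
    lower: float,
    upper: float,
) -> dict:
    counts = {"missing": 0, "disguised_missing": 0, "outlier": 0, "valid": 0}
    for v in values:
        if v is None:
            counts["missing"] += 1            # explicitly encoded as missing
        elif v in disguised:
            counts["disguised_missing"] += 1  # sentinel standing in for a missing value
        elif v < lower or v > upper:
            counts["outlier"] += 1            # outside the plausible range
        else:
            counts["valid"] += 1
    return counts

# Made-up sample values for illustration.
amounts = [12.5, None, -99, 250.0, -98, -5.0, 3100.0, 80.0]
print(profile_column(amounts, disguised={-99, -98}, lower=0, upper=2000))
# {'missing': 1, 'disguised_missing': 2, 'outlier': 2, 'valid': 3}
```

The sentinel check runs before the range check on purpose. Otherwise -99 and -98 would be miscounted as outliers below 0.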

Step 1: Locate Columns to Be Cleaned

  1. From the menu, go to the 'Explore' section and access the Table Catalog.

  2. From the Table Catalog, select the table where you want to set cleaning operations.

  3. Go to the 'Columns' tab.


Step 2: Set Cleaning Operations

Here is an example of how to set cleaning operations for the Amount column.

  1. In the 'Columns' tab, click the critical data info (CDI) edit button for the Amount column.

  2. Apply the following cleaning steps to the Amount column:

    • Ignore disguised missing values equal to -99 and -98.
    • Cap any amount less than 0 euros.
    • Cap any amount greater than 2000 euros.
  3. Click the 'Apply 3 cleaning steps' button.


The newly applied cleaning steps for the Amount column should now be visible in the 'Columns' tab.
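The effect of the three cleaning steps on individual Amount values can be sketched in plain Python. This is an illustration of the intended semantics, not the platform's implementation or API. It assumes two things: "ignore" means treating the value as missing (represented here as `None`), and "cap" means replacing an out-of-range value with the boundary itself. The helper `clean_amount` and the constant names are hypothetical.

```python
from typing import Optional

# Hypothetical constants mirroring the example cleaning steps.
DISGUISED_MISSING = {-99, -98}  # sentinel values treated as missing
LOWER_CAP = 0                   # euros
UPPER_CAP = 2000                # euros

def clean_amount(value: Optional[float]) -> Optional[float]:
    """Apply the three example cleaning steps to one Amount value.

    Assumption: 'ignore' maps the value to None, and 'cap' clamps it to the bound.
    """
    if value is None or value in DISGUISED_MISSING:
        return None        # step 1: disguised missing values are ignored
    if value < LOWER_CAP:
        return LOWER_CAP   # step 2: cap amounts below 0 euros
    if value > UPPER_CAP:
        return UPPER_CAP   # step 3: cap amounts above 2000 euros
    return value

amounts = [12.5, -99, 250.0, -98, -5.0, 3100.0]
print([clean_amount(a) for a in amounts])
# [12.5, None, 250.0, None, 0, 2000]
```

In this sketch the disguised-value check comes first. If the lower cap ran first, -99 and -98 would silently become 0 instead of being treated as missing.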
