
5. Set Default Cleaning Operations

A critical step in any data science project is ensuring the data is clean and prepared for feature engineering.

To maintain data quality during feature engineering, you can centralize Cleaning Operations at the table level. This lets you address common issues in one place: missing values, disguised missing values (placeholder values such as -99 that stand in for missing data without being explicitly labeled as missing), and outliers.


Step 1: Locate Columns to Be Cleaned

  1. From the menu, go to the 'Explore' section and access the Table Catalog.

  2. From the Table Catalog, select the table where you want to set cleaning operations.

  3. Go to the 'Columns' tab.

Column Description

Column CDI


Step 2: Set Cleaning Operations

Here is an example of how to set cleaning operations for the Amount column in the GROCERYINVOICE table.

  1. In the 'Columns' tab, click on the critical data info edit button for Amount.
  2. Apply the following cleaning steps to the Amount column:

    • Ignore disguised missing values equal to -99 and -98.
    • Cap any amount less than 0 euros.
    • Cap any amount greater than 2000 Euros.
  3. Click the 'Save 3 cleaning steps' button.

Column CDI

The newly applied cleaning steps for the Amount column should be visible in the 'Columns' tab.

Column CDI
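
The steps above are configured through the UI. As a rough illustration of their combined effect on the data, here is a minimal plain-Python sketch. It assumes that "capping" means clamping a value to the threshold (0 or 2000) and that ignored disguised values are treated as missing. The function `clean_amount` and the constants are hypothetical names used for illustration only and are not part of the product's API.

```python
from typing import Optional

# Illustrative values mirroring the three cleaning steps (assumed semantics).
DISGUISED_MISSING = {-99, -98}  # placeholders that really mean "missing"
LOWER_BOUND = 0.0               # amounts below this are capped to it
UPPER_BOUND = 2000.0            # amounts above this are capped to it


def clean_amount(value: Optional[float]) -> Optional[float]:
    """Apply the three cleaning steps to a single Amount value."""
    # Step 1: treat disguised missing values as genuinely missing.
    if value is None or value in DISGUISED_MISSING:
        return None
    # Steps 2 and 3: clamp the value into the range [0, 2000].
    return min(max(value, LOWER_BOUND), UPPER_BOUND)


raw_amounts = [12.5, -99, -3.0, 2500.0, None, -98, 2000.0]
cleaned = [clean_amount(v) for v in raw_amounts]
print(cleaned)
```

The order of the steps matters in this sketch: the disguised-value check runs before the caps, otherwise -99 and -98 would be clamped to 0 and silently enter downstream features as real amounts.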