5. Set Default Cleaning Operations
A crucial step in every data science project is ensuring the data is clean and ready for feature engineering.
You can centralize cleaning operations at the table level to handle issues such as missing values, disguised missing values (values that stand for missing data but are not explicitly encoded as missing), and outliers.
Step 1: Locate Columns to Be Cleaned
- From the menu, go to the 'Explore' section and access the Table Catalog.
- From the Table Catalog, select the table where you want to set cleaning operations.
- Go to the 'Columns' tab.
Step 2: Set Cleaning Operations
Here is an example of how to set cleaning operations for the Amount column.
- In the 'Columns' tab, click the critical data info edit button for Amount.
- Apply the following cleaning steps to the Amount column:
  - Ignore disguised missing values equal to -99 and -98.
  - Cap any amount less than 0 Euros at 0.
  - Cap any amount greater than 2000 Euros at 2000.
- Click the "Apply 3 cleaning steps" button.
The newly applied cleaning steps for the Amount column should be visible in the 'Columns' tab.
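To make the effect of the three cleaning steps concrete, here is a minimal plain-Python sketch of the same logic applied to a list of amounts. It is an illustration only, not the platform's implementation: the function names and sample values are hypothetical, and it assumes that "ignoring" a disguised missing value means treating it as missing (`None`).

```python
from typing import List, Optional

# Values taken from the steps above; the constant names are hypothetical.
DISGUISED_MISSING = {-99, -98}  # sentinel codes that really mean "missing"
LOWER_CAP = 0.0                 # amounts below 0 Euros are raised to 0
UPPER_CAP = 2000.0              # amounts above 2000 Euros are lowered to 2000


def clean_amount(value: Optional[float]) -> Optional[float]:
    """Apply the three cleaning steps to a single amount."""
    # Step 1: treat disguised missing values as missing (assumption).
    # This check must run before capping, otherwise -99 would become 0.
    if value is None or value in DISGUISED_MISSING:
        return None
    # Steps 2 and 3: clamp the amount into the range [0, 2000].
    return min(max(value, LOWER_CAP), UPPER_CAP)


def clean_amounts(values: List[Optional[float]]) -> List[Optional[float]]:
    """Apply clean_amount to every value in a column-like list."""
    return [clean_amount(v) for v in values]


# Made-up sample data for illustration.
raw = [12.5, -99, -3.0, 2500.0, None, -98, 2000.0]
cleaned = clean_amounts(raw)
print(cleaned)
```

The order of the steps matters in this sketch: the disguised-value check runs first so that sentinel codes are not mistaken for negative amounts and capped to 0.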