5. Set Default Cleaning Operations
A critical step in any data science project is ensuring the data is clean and prepared for feature engineering.
To maintain data quality during feature engineering, you can centralize Cleaning Operations at the table level. This approach lets you address common issues such as missing values, disguised missing values (i.e., placeholder values such as -99 that stand in for missing data without being explicitly labeled as missing), and outliers.
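To make "centralizing cleaning at the table level" concrete, here is a minimal Python sketch. It is not the platform's own API: the registry, `set_cleaning_operations`, `read_column`, and the CUSTOMER/Age example are all hypothetical. It shows the idea only: cleaning rules are declared once per column, and every consumer that reads that column receives the same cleaned values instead of re-implementing the rules inside each feature.

```python
from typing import Callable, Dict, List, Optional, Tuple

Value = Optional[float]
CleaningOp = Callable[[Value], Value]

# Hypothetical registry: (table, column) -> ordered list of cleaning operations.
CLEANING_REGISTRY: Dict[Tuple[str, str], List[CleaningOp]] = {}


def set_cleaning_operations(table: str, column: str, ops: List[CleaningOp]) -> None:
    """Declare the default cleaning operations for one column, once."""
    CLEANING_REGISTRY[(table, column)] = list(ops)


def read_column(table: str, column: str, raw: List[Value]) -> List[Value]:
    """Return the column values with the registered operations applied in order."""
    ops = CLEANING_REGISTRY.get((table, column), [])
    cleaned: List[Value] = []
    for value in raw:
        for op in ops:
            value = op(value)
        cleaned.append(value)
    return cleaned


# Hypothetical example: negative ages in a CUSTOMER table are treated as missing.
set_cleaning_operations(
    "CUSTOMER", "Age", [lambda v: None if v is not None and v < 0 else v]
)
print(read_column("CUSTOMER", "Age", [34.0, -1.0, 58.0]))  # [34.0, None, 58.0]
```

Because every feature goes through `read_column`, changing a rule in one place changes it for all features built on that column.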
Step 1: Locate Columns to Be Cleaned
- From the menu, go to the 'Explore' section and open the Table Catalog.
- From the Table Catalog, select the table where you want to set cleaning operations.
- Go to the 'Columns' tab.
Step 2: Set Cleaning Operations
Here is an example of how to set cleaning operations for the Amount column in the GROCERYINVOICE table.
- In the 'Columns' tab, click the critical data info edit button for Amount.
- Apply the following cleaning steps to the Amount column:
  - Ignore disguised missing values equal to -99 and -98.
  - Cap any amount less than 0 euros.
  - Cap any amount greater than 2,000 euros.
- Click the button
The newly applied cleaning steps for the Amount column should be visible in the 'Columns' tab.
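The three cleaning steps for Amount can be sketched in Python to show their effect on individual values. This is a minimal sketch under stated assumptions, not the platform's implementation: it assumes that "ignore" means the value is treated as missing (here `None`) so that aggregations skip it, and that "cap" means the value is clipped to the bound. The function `clean_amount` and its constants are hypothetical names.

```python
from typing import Optional

# Values from the example above; the names are hypothetical.
DISGUISED_MISSING = {-99, -98}
LOWER_CAP = 0.0     # euros
UPPER_CAP = 2000.0  # euros


def clean_amount(value: Optional[float]) -> Optional[float]:
    """Apply the Amount cleaning steps to one value (assumed semantics)."""
    # Step 1 (assumed meaning of "ignore"): disguised missing values become None.
    if value is None or value in DISGUISED_MISSING:
        return None
    # Steps 2 and 3 (assumed meaning of "cap"): clip into [LOWER_CAP, UPPER_CAP].
    return min(max(value, LOWER_CAP), UPPER_CAP)


amounts = [12.5, -99, 2500.0, -3.0, -98, None, 150.0]
print([clean_amount(a) for a in amounts])
# [12.5, None, 2000.0, 0.0, None, None, 150.0]
```

In this sketch the disguised-value check runs before capping. Because -99 and -98 are both below 0, capping them first would turn them into 0.0, making them look like genuine zero-euro invoices rather than missing data.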