5. Set Default Cleaning Operations
A crucial step in every data science project is ensuring the data is clean and ready for feature engineering.
To ensure data quality during feature engineering, you can centralize Cleaning Operations at the table level to address issues such as missing values, disguised missing values (values that stand for missing data but are not explicitly encoded as missing), and outliers.
Step 1: Locate Columns to Be Cleaned
- From the menu, go to the 'Explore' section and access the Table Catalog.
- From the Table Catalog, select the table where you want to set cleaning operations.
- Go to the 'Columns' tab.
Step 2: Set Cleaning Operations
Here is an example of how to set cleaning operations for the Amount column in the GROCERYINVOICE table.
- In the 'Columns' tab, click on the critical data info edit button for Amount.
- Apply the following cleaning steps to the Amount column:
  - ignore disguised missing values equal to -99 and -98.
  - cap any amount less than 0 Euro.
  - cap any amount greater than 2000 Euros.
- Click the button
The newly applied cleaning steps for the Amount column should be visible in the 'Columns' tab.
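
To make the effect of these three steps concrete, here is a minimal plain-Python sketch of the same rules applied to a handful of made-up Amount values. It is an illustration only, not the product's implementation: the function names and sample data are hypothetical, and it assumes that "cap" means replacing an out-of-range value with the threshold itself (0 or 2000) and that disguised missing values are handled before the caps are applied.

```python
from typing import Iterable, List, Optional

# Thresholds taken from the example above.
DISGUISED_MISSING = {-99, -98}  # codes that stand for "missing"
LOWER_CAP = 0.0                 # amounts below 0 Euro are capped
UPPER_CAP = 2000.0              # amounts above 2000 Euros are capped


def clean_amount(value: Optional[float]) -> Optional[float]:
    """Apply the three example cleaning steps to one Amount value (hypothetical helper)."""
    # Missing and disguised missing values become None, so they are
    # ignored instead of being capped to 0 by the lower bound.
    if value is None or value in DISGUISED_MISSING:
        return None
    # Assumption: capping replaces the value with the nearest threshold.
    return min(max(float(value), LOWER_CAP), UPPER_CAP)


def clean_amounts(values: Iterable[Optional[float]]) -> List[Optional[float]]:
    """Clean a sequence of Amount values."""
    return [clean_amount(v) for v in values]


# Made-up sample values, for illustration only.
raw = [12.5, -99, -3.0, 2500.0, -98, None, 2000.0]
cleaned = clean_amounts(raw)
print(cleaned)  # [12.5, None, 0.0, 2000.0, None, None, 2000.0]
```

The order matters in this sketch: because -99 and -98 are also below 0, checking for disguised missing values first keeps them from being silently turned into valid zero amounts.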