FeatureByte: A Comprehensive Workflow for Feature Engineering
FeatureByte streamlines the end-to-end feature engineering process, from feature creation to deployment and lifecycle management.
Ready to get started? Explore each step and its corresponding tutorials to unlock the full potential of FeatureByte.
Step 1: Define Your Data Model
- Create a Catalog: Organize your data into a clear and accessible structure.
- Register Tables: Classify tables as event, item, dimension, or slowly changing dimension.
- Define Entities: Identify key entities within your data and tag relevant columns.
- Enhance Data Quality: Set default cleaning operations to ensure data accuracy.
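The data model described above can be pictured with a small, self-contained sketch. It does not use the FeatureByte SDK: the `Catalog`, `Table`, `Column`, and `TableType` classes, the `grocery` catalog, and the `impute_missing_with` setting are all hypothetical names chosen for illustration, assuming that a default cleaning operation is a rule applied to raw column values before features are computed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TableType(Enum):
    """The four table types named in the list above."""
    EVENT = "event"
    ITEM = "item"
    DIMENSION = "dimension"
    SCD = "slowly changing dimension"


@dataclass
class Column:
    name: str
    entity: Optional[str] = None        # entity tagged on this column, if any
    impute_missing_with: object = None  # hypothetical default cleaning operation

    def clean(self, value):
        """Apply the default cleaning operation to one raw value."""
        if value is None and self.impute_missing_with is not None:
            return self.impute_missing_with
        return value


@dataclass
class Table:
    name: str
    table_type: TableType
    columns: dict = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    tables: dict = field(default_factory=dict)

    def register(self, table: Table) -> None:
        self.tables[table.name] = table


# Create a catalog, register an event table, tag an entity, set a cleaning default.
catalog = Catalog("grocery")
transactions = Table(
    "transactions",
    TableType.EVENT,
    {
        "customer_id": Column("customer_id", entity="customer"),
        "amount": Column("amount", impute_missing_with=0.0),
    },
)
catalog.register(transactions)

cleaned = [transactions.columns["amount"].clean(v) for v in [12.5, None, 3.0]]
print(cleaned)  # [12.5, 0.0, 3.0]
```

Keeping the cleaning rule on the column means every feature derived from that column inherits it by default, which is the intent of setting the operation at the table level.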
Step 2: Formulate Your Use Case
- Identify the Primary Entity: Determine the central focus of your use case.
- Define the Target: Specify the outcome you aim to predict or classify.
- Establish the Context: Define the scope and circumstances in which features are expected to be served.
- Create Observation Sets: Specify the entity values and points in time for which features and targets are computed for modeling.
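As a rough illustration of how a primary entity, a target, and an observation set fit together, the sketch below uses plain Python rather than the FeatureByte SDK. The event data, the `customer_id` entity, and the `target_spend_next_7d` function are hypothetical. It assumes an observation set is a collection of entity values paired with points in time, and that the target looks forward from each point in time.

```python
from datetime import datetime, timedelta

# Hypothetical event data: (customer_id, event_timestamp, amount)
events = [
    ("c1", datetime(2024, 1, 3), 20.0),
    ("c1", datetime(2024, 1, 12), 5.0),
    ("c2", datetime(2024, 1, 9), 40.0),
]

# Observation set: one row per (primary entity value, point in time)
observation_set = [
    {"customer_id": "c1", "point_in_time": datetime(2024, 1, 10)},
    {"customer_id": "c2", "point_in_time": datetime(2024, 1, 10)},
]


def target_spend_next_7d(customer_id, point_in_time):
    """Target: total spend in the 7 days after the point in time."""
    horizon_end = point_in_time + timedelta(days=7)
    return sum(
        (
            amount
            for cid, ts, amount in events
            if cid == customer_id and point_in_time < ts <= horizon_end
        ),
        0.0,
    )


# Attach the target to every observation row.
for row in observation_set:
    row["target"] = target_spend_next_7d(row["customer_id"], row["point_in_time"])

targets = [row["target"] for row in observation_set]
print(targets)  # [5.0, 0.0]
```

The target only counts events after the point in time, while features (Step 3) may only use events before it. Keeping both tied to the same point in time is what prevents leakage.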
Tutorials
- UI Tutorials: Formulate Use Case, Create Observation Tables.
- SDK Tutorials: Formulate Use Case, Create Observation Tables.
Step 3: Create Features
Automated Feature Creation:
- Use Feature Ideation to generate relevant feature suggestions.
- Evaluate features using exploratory data analysis (EDA) and select the most promising ones with various feature selection strategies.
Tutorials
- UI Tutorials: Ideate Features, Refine Ideation
Manual Feature Creation:
- Leverage the Python SDK to create custom features.
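To give a sense of what a custom feature computes, here is a minimal sketch of a trailing-window aggregate evaluated at a point in time. It is plain Python, not the FeatureByte SDK, and the `spend_sum` function and sample events are hypothetical. The assumption being illustrated is that a feature value at a given point in time may only use events that occurred before it.

```python
from datetime import datetime, timedelta

# Hypothetical event data: (customer_id, event_timestamp, amount)
events = [
    ("c1", datetime(2024, 1, 2), 10.0),
    ("c1", datetime(2024, 1, 8), 20.0),
    ("c1", datetime(2024, 1, 11), 99.0),  # after the point in time: must be ignored
    ("c2", datetime(2024, 1, 9), 40.0),
]


def spend_sum(customer_id, point_in_time, window):
    """Sum of amounts with timestamps in [point_in_time - window, point_in_time)."""
    start = point_in_time - window
    return sum(
        (
            amount
            for cid, ts, amount in events
            if cid == customer_id and start <= ts < point_in_time
        ),
        0.0,
    )


pit = datetime(2024, 1, 10)
spend_7d = spend_sum("c1", pit, timedelta(days=7))
spend_28d = spend_sum("c1", pit, timedelta(days=28))
print(spend_7d, spend_28d)  # 20.0 30.0
```

The same aggregation with different windows yields a family of related features, which is a common pattern when building features by hand.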
Step 4: Experiment and Iterate
- Build Feature Lists: Combine selected features into cohesive sets using Feature Ideation, the Feature List Builder, or the SDK.
- Generate Historical Feature Data: Prepare historical data for model training and evaluation.
- Train and Test Models: Iterate on model development and hyperparameter tuning.
- Share Promising Features: Collaborate with team members and share valuable insights.
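The relationship between a feature list and historical feature data can be sketched as follows. This is an illustrative stand-in written in plain Python, not the SDK workflow: `feature_list`, `build_training_rows`, and the sample data are hypothetical. It assumes a feature list is a named set of feature definitions and that historical feature data is produced by evaluating each one for every row of an observation set.

```python
from datetime import datetime, timedelta

# Hypothetical event data: (customer_id, event_timestamp, amount)
events = [
    ("c1", datetime(2024, 1, 5), 10.0),
    ("c1", datetime(2024, 1, 8), 30.0),
    ("c2", datetime(2024, 1, 9), 40.0),
]


def window_amounts(customer_id, point_in_time, days):
    """Amounts for one customer in the trailing window before the point in time."""
    start = point_in_time - timedelta(days=days)
    return [a for cid, ts, a in events if cid == customer_id and start <= ts < point_in_time]


def txn_count_7d(customer_id, point_in_time):
    return len(window_amounts(customer_id, point_in_time, 7))


def avg_amount_7d(customer_id, point_in_time):
    amounts = window_amounts(customer_id, point_in_time, 7)
    return sum(amounts) / len(amounts) if amounts else None


# A feature list: a named collection of feature definitions.
feature_list = {"txn_count_7d": txn_count_7d, "avg_amount_7d": avg_amount_7d}

# Observation set: (entity value, point in time) pairs.
observation_set = [(cid, datetime(2024, 1, 10)) for cid in ("c1", "c2", "c3")]


def build_training_rows(observation_set, feature_list):
    """Evaluate every feature in the list for every observation row."""
    rows = []
    for customer_id, point_in_time in observation_set:
        row = {"customer_id": customer_id, "point_in_time": point_in_time}
        for name, feature in feature_list.items():
            row[name] = feature(customer_id, point_in_time)
        rows.append(row)
    return rows


training_rows = build_training_rows(observation_set, feature_list)
for row in training_rows:
    print(row["customer_id"], row["txn_count_7d"], row["avg_amount_7d"])
```

The resulting rows, joined with the target from the use case, form the table that model training and evaluation would consume.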
Tutorials
- UI Tutorials: Create New Feature List, Compute Feature Table.
- SDK Tutorials: Create Feature Lists, Compute Historical Feature Values.
Step 5: Deploy and Serve Features
- Mark Features as Production-Ready: Certify features for deployment.
- Deploy Features and Enable Serving: Make features available for real-time or batch inference.
In catalogs with Approval Flow enabled, deploying features involves:
- Verifying feature compliance with default cleaning operations and feature job settings.
- Checking the status of source tables.
- Backtesting feature job settings to guard against training-serving inconsistencies.
- Sharing the feature definition file for review and approval.
This comprehensive process ensures governance and reduces the risk of errors in production.
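One way to picture the backtesting check is the sketch below. It is a simplified model in plain Python, not FeatureByte's implementation: the records, the `window_sum` and `backtest` functions, and the chosen blind spot are hypothetical. It assumes that records land in the warehouse some time after they occur, and that a blind spot excludes the most recent data from the aggregation window so that the value computed at serving time matches the value recomputed later for training.

```python
from datetime import datetime, timedelta

# Hypothetical records: (event_timestamp, available_at, amount)
records = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5), 10.0),
    (datetime(2024, 1, 1, 9, 50), datetime(2024, 1, 1, 10, 20), 7.0),  # lands 30 min late
    (datetime(2024, 1, 1, 10, 30), datetime(2024, 1, 1, 10, 35), 3.0),
]


def window_sum(job_time, window, blind_spot, visible_as_of=None):
    """Sum amounts in the window ending at job_time - blind_spot.

    If visible_as_of is given, only records already landed by then are counted
    (serving view); otherwise all records are counted (training view).
    """
    end = job_time - blind_spot
    start = end - window
    return sum(
        (
            amount
            for ts, available_at, amount in records
            if start <= ts < end
            and (visible_as_of is None or available_at <= visible_as_of)
        ),
        0.0,
    )


def backtest(job_times, window, blind_spot):
    """Return the job times where serving and training values disagree."""
    return [
        t
        for t in job_times
        if window_sum(t, window, blind_spot, visible_as_of=t)
        != window_sum(t, window, blind_spot)
    ]


job_times = [datetime(2024, 1, 1, hour) for hour in (10, 11, 12)]
window = timedelta(hours=2)

no_blind_spot = backtest(job_times, window, timedelta(0))
with_blind_spot = backtest(job_times, window, timedelta(minutes=45))
print(len(no_blind_spot), len(with_blind_spot))  # 1 0
```

Without a blind spot, the 10:00 job misses the late-arriving record and serves a value that differs from the one used in training. A blind spot longer than the landing delay removes the mismatch in this sample data.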
Step 6: Manage the Feature Life Cycle
- Update Feature Job Settings and Cleaning Operations: Adjust configurations when data quality or availability changes.
- Create New Feature Versions: Produce updated versions reflecting the latest settings and mark them as default.
- Monitor Feature Job Status: Regularly review feature job status to track performance and ensure smooth operations.
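The versioning idea in the list above can be sketched with a small registry. The `FeatureVersion` and `FeatureNamespace` classes and their fields are hypothetical and are not the FeatureByte SDK. The sketch assumes that a new version is a copy of an existing one with updated settings, and that earlier versions stay unchanged so models trained on them remain reproducible.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeatureVersion:
    """An immutable snapshot of a feature's settings (hypothetical fields)."""
    name: str
    version: str
    blind_spot_minutes: int
    impute_missing_with: float


class FeatureNamespace:
    """Tracks all versions of one feature and which one is the default."""

    def __init__(self, first: FeatureVersion):
        self.versions = {first.version: first}
        self.default_version = first.version

    def create_new_version(self, version: str, **changes) -> FeatureVersion:
        # Copy the current default and apply the updated settings.
        base = self.versions[self.default_version]
        new = replace(base, version=version, **changes)
        self.versions[version] = new
        return new

    def set_default(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown version: {version}")
        self.default_version = version

    @property
    def default(self) -> FeatureVersion:
        return self.versions[self.default_version]


spend = FeatureNamespace(FeatureVersion("spend_7d", "V1", 10, 0.0))

# Data now lands later than before, so the blind spot is widened in a new version.
spend.create_new_version("V2", blind_spot_minutes=45)
spend.set_default("V2")

print(spend.default.version, spend.default.blind_spot_minutes)  # V2 45
print(spend.versions["V1"].blind_spot_minutes)                  # 10
```

Because each version is frozen, changing the default never alters what an existing deployment or training run was built on.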
In catalogs with Approval Flow enabled, changes in table metadata trigger a review process. This ensures that:
- New versions of features and feature lists address any data-related issues.
- New deployments use the latest, approved configurations.
- All changes are clearly documented for compliance and reproducibility.