Data Flows
FeatureByte is designed to avoid large-scale outbound data transfers.
This section details the data flows triggered by each key action in FeatureByte.
Create Feature Store
Action | MongoDB (FB Service) | Data Source (DWH) | Feature Store (DWH) | SDK |
---|---|---|---|---|
Create Feature Store | Store connection details to DWH and Feature Store definition | - | Create metadata | - |
Once the feature store is set up, FeatureByte will use your data warehouse as:
- a data source.
- a compute engine, leveraging its scalability, stability, and efficiency.
- storage for partial aggregates (tiles) and precomputed feature values to support feature serving.
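As an illustration, registering a feature store from the Python SDK could look like the sketch below. The connection details and `SnowflakeDetails` fields are placeholders, and field names may differ across SDK versions.

```python
import featurebyte as fb

# Illustrative sketch: register a feature store backed by Snowflake.
# Connection details are stored in MongoDB (FB Service), and a metadata
# schema is created in the DWH. All values below are placeholders.
feature_store = fb.FeatureStore.create(
    name="my_feature_store",
    source_type=fb.SourceType.SNOWFLAKE,
    details=fb.SnowflakeDetails(
        account="xy12345.us-east-1",  # placeholder account identifier
        warehouse="COMPUTE_WH",
        database="FEATUREBYTE_DB",
        sf_schema="FEATUREBYTE",      # schema that will hold FeatureByte metadata
    ),
)
```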
Create and Activate Catalog
Action | MongoDB (FB Service) | Data Source (DWH) | Feature Store (DWH) | SDK |
---|---|---|---|---|
Create Catalog | Store metadata | - | - | Update current catalog state |
Activate Catalog | Read metadata | - | - | Update current catalog state |
FeatureByte's catalogs provide centralized storage for the metadata of FeatureByte components related to a specific domain, such as tables, entities, features, and feature lists.
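A minimal sketch of both actions from the SDK, assuming the method names behave as described above; note that neither call touches the data in the warehouse.

```python
import featurebyte as fb

# Sketch: creating a catalog writes metadata to MongoDB (FB Service);
# activating one reads that metadata and updates the SDK's current
# catalog state locally.
catalog = fb.Catalog.create(
    name="grocery",
    feature_store_name="my_feature_store",  # feature store registered earlier
)

# Later sessions switch context with a metadata read only:
catalog = fb.Catalog.activate("grocery")
```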
Register Tables in Catalog
Action | MongoDB (FB Service) | Data Source (DWH) | Feature Store (DWH) | SDK |
---|---|---|---|---|
Create Catalog table | Store metadata | Read source table metadata | - | - |
Catalog tables offer a logical representation of source tables, consolidating essential metadata for feature engineering without storing the actual data.
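For example, registering an event table might look like the sketch below; the read against the DWH covers table metadata only (column names and types), and all identifiers are placeholders.

```python
import featurebyte as fb

# Sketch: register a source table in the active catalog. Only table
# metadata is read from the DWH; no rows are copied.
feature_store = fb.FeatureStore.get("my_feature_store")
data_source = feature_store.get_data_source()

source_table = data_source.get_source_table(
    database_name="FEATUREBYTE_DB",  # placeholder identifiers
    schema_name="GROCERY",
    table_name="INVOICES",
)

invoice_table = source_table.create_event_table(
    name="GROCERYINVOICE",
    event_id_column="GroceryInvoiceGuid",
    event_timestamp_column="Timestamp",
    record_creation_timestamp_column="record_available_at",
)
```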
Create Features and Serve Historical Features
Action | MongoDB (FB Service) | Data Source (DWH) | Feature Store (DWH) | SDK |
---|---|---|---|---|
Save Feature | Store definition file, logical plan, and additional metadata | - | - | Retrieve definition file |
Historical Serving | Read the logical plan and generate a physical plan. Store metadata on observation and historical feature tables | Read source tables | Generate and store tiles if not found. Read tiles to compute features. Store observation and historical feature tables on demand | Retrieve training data on demand |
When using the Python SDK to declare a feature, a feature definition file is generated. This file acts as the authoritative source for the feature and is used to create the final logical execution graph.
When historical features are requested, the graph is converted into a physical plan specific to the platform you are using. Optionally, you can store observation sets and historical feature tables in the feature store to facilitate access and lineage tracking; this also avoids loading large tables into the SDK, which can improve performance.
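The sketch below ties the two actions together: saving a feature generates its definition file in the FB Service, and requesting historical features triggers the physical plan in the DWH. Table, column, and entity names are assumed for illustration.

```python
import pandas as pd
import featurebyte as fb

# Sketch: declare a windowed aggregate feature. Assumes the GROCERYINVOICE
# table was registered earlier and its customer column is tagged as an entity.
catalog = fb.Catalog.activate("grocery")
invoice_view = catalog.get_table("GROCERYINVOICE").get_view()

feature_group = invoice_view.groupby("GroceryCustomerGuid").aggregate_over(
    value_column="Amount",
    method="sum",
    windows=["7d"],
    feature_names=["CustomerSpend_7d"],
)
feature = feature_group["CustomerSpend_7d"]
feature.save()  # stores the definition file and logical plan in the FB Service

# Historical serving: the logical plan becomes a physical plan executed in
# the DWH; only the resulting training data is returned to the SDK on demand.
feature_list = fb.FeatureList([feature], name="CustomerSpendList")
feature_list.save()

observation_set = pd.DataFrame({
    "POINT_IN_TIME": pd.to_datetime(["2023-01-15 10:00:00"]),
    "GROCERYCUSTOMERGUID": ["placeholder-customer-id"],  # entity serving name
})
training_data = feature_list.compute_historical_features(observation_set)
```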
Deploy and Serve Features Online
Action | MongoDB (FB Service) | Data Source (DWH) | Feature Store (DWH) | REST API / SDK |
---|---|---|---|---|
Deploy Feature List | Store metadata. Read the logical plan and generate a physical plan | Read source tables | Scheduled job to generate and store tiles and online features | - |
Online and Batch Serving | Store metadata on batch request and feature tables | - | Read online store. Store batch request and feature tables | Retrieve serving data on demand |
The FeatureByte Service manages feature job orchestration from the moment a feature is deployed until its deployment is disabled. This ensures that the feature store consistently contains the most up-to-date values for each feature.
A feature job is a batch process that creates both offline and online partial aggregations (tiles) and feature values for a specific feature before storing them in the feature store. The scheduling of a feature job is based on the feature job setting configuration associated with the respective feature.
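For instance, a feature job setting could be declared as in the sketch below; note that the parameter names have changed across SDK versions (older releases use `frequency` and `time_modulo_frequency`).

```python
import featurebyte as fb

# Sketch: run the feature job hourly, 10 minutes past each hour, excluding
# the last 15 minutes of data to allow for late-arriving records.
feature_job_setting = fb.FeatureJobSetting(
    period="60m",      # how often the batch job runs
    offset="10m",      # delay after each period boundary
    blind_spot="15m",  # exclusion window for late-arriving data
)
```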
Feature values are retrieved from the online store when features are served online or in batch for prediction purposes.
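Putting the last two steps together, a deployment sketch could look like this, assuming the SDK methods behave as described above; `get_online_serving_code` produces a client template for the online-serving REST endpoint.

```python
import featurebyte as fb

# Sketch: deploy a saved feature list. Deployment triggers scheduled
# feature jobs that keep tiles and online feature values current in
# the feature store.
feature_list = fb.FeatureList.get("CustomerSpendList")
deployment = feature_list.deploy(make_production_ready=True)
deployment.enable()

# Online requests go through the REST API and read precomputed values
# from the online store; source tables are not queried at request time.
print(deployment.get_online_serving_code(language="python"))
```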