FeatureByte Python SDK
The FeatureByte Python SDK offers a comprehensive set of objects for feature engineering. These objects simplify the management and manipulation of feature stores, data sources, tables, entities, views, features, feature lists, and the other objects needed for feature serving.
The following diagram illustrates the relationships between each core object within the SDK. The objects in the diagram can be classified into three types:
- Objects linked to a feature store: These objects include the FeatureStore object itself, as well as the DataSource and SourceTable objects that can be obtained from the FeatureStore object.
- Objects linked to a catalog: These objects include the Catalog object itself, along with all other objects that can be accessed from the Catalog object.
- Local objects: The View object is a temporary object that is not added to the catalog.
Click on any element in the diagram to access the SDK reference for that specific object.
Below is a brief overview of each object:
- FeatureStore: Handles connections and location details for a feature store within a data warehouse.
- Catalog: A centralized metadata repository for organizing tables, entities, features, feature lists, and other objects to facilitate feature serving for a specific domain.
- DataSource: The collection of source tables that the feature store can access. The object is immutable.
- SourceTable: A table from a data warehouse that the feature store can access. The object is immutable.
- Table: Provides a centralized location for metadata about a source table. This metadata determines the type of operations that can be applied to the table's views and includes essential information for feature engineering.
- Entity: Contains metadata on an entity type represented or referenced by tables in your data warehouse.
- Relationship: Links two Entity objects based on their direct relationship.
- View: A local virtual table that can be modified and joined to other views to prepare data before feature creation. It works like a SQL view.
- Feature: Contains the logical plan for computing a feature.
- FeatureList: A collection of Feature objects tailored for a specific use case, typically for generating feature values in machine learning applications.
- ObservationTable: Represents an observation set in the feature store that combines historical points-in-time and entity values to make historical feature requests, usually for training and testing machine learning applications.
- HistoricalFeatureTable: Represents a table in the feature store with historical feature values from a historical feature request and metadata on the FeatureList and the ObservationTable objects used to create it.
- Deployment: Used to manage the online and batch serving of a deployed FeatureList in a production environment.
- BatchRequestTable: Represents a table in the feature store specifying entity values for batch serving.
- BatchFeatureTable: Represents a table in the feature store with feature values from batch serving and metadata on the Deployment and the BatchRequestTable objects used to create it.
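To make the ObservationTable and HistoricalFeatureTable descriptions above more concrete, here is a minimal sketch in plain Python of what a historical feature request computes. It does not use the FeatureByte SDK. The event data, the column names (`point_in_time`, `customer_id`), and the `amount_sum_7d` helper are all hypothetical and chosen only for illustration. The idea it shows is that each observation row pairs a point in time with an entity value, and the feature value is computed only from data available before that point in time.

```python
from datetime import datetime, timedelta

# Hypothetical event data: (customer_id, event_timestamp, amount).
events = [
    ("c1", datetime(2023, 1, 1), 10.0),
    ("c1", datetime(2023, 1, 5), 20.0),
    ("c1", datetime(2023, 1, 9), 40.0),
    ("c2", datetime(2023, 1, 4), 5.0),
]

# Toy observation set: one row per (point in time, entity value) pair.
observation_set = [
    {"point_in_time": datetime(2023, 1, 6), "customer_id": "c1"},
    {"point_in_time": datetime(2023, 1, 10), "customer_id": "c1"},
    {"point_in_time": datetime(2023, 1, 10), "customer_id": "c2"},
]


def amount_sum_7d(customer_id, point_in_time):
    """Sum the customer's amounts in the 7 days strictly before point_in_time."""
    window_start = point_in_time - timedelta(days=7)
    return sum(
        amount
        for cid, ts, amount in events
        if cid == customer_id and window_start <= ts < point_in_time
    )


# Toy "historical feature table": the observation rows plus the feature value.
historical_features = [
    {**row, "amount_sum_7d": amount_sum_7d(row["customer_id"], row["point_in_time"])}
    for row in observation_set
]

for row in historical_features:
    print(row["customer_id"], row["point_in_time"].date(), row["amount_sum_7d"])
```

The same customer can appear at several points in time with different feature values, because each row only sees the events that precede its own point in time.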
Upcoming objects:
- Target: Contains metadata about a machine learning application's target variable and, optionally, the logical plan to compute it, outlining the prediction objective. It can be used during historical feature requests to populate target values.
- Context: Contains metadata about a UseCase's main entity, any subset of that entity that the UseCase may focus on, the expected timing of batch or online serving, the available information at inference, and any constraints that need to be considered, such as legal or operational restrictions.
- UseCase: Combines Target and Context objects to define the machine learning application's objective and operating conditions. It can be used to document and organize historical feature requests or feature list deployments.
Note
Three objects are not supported yet: Target, Context, and UseCase.
The distinction between SourceTable, Table, and View is important for several reasons:
- It allows SourceTable to be an immutable object that represents the original table in the data warehouse. The object can be used to run exploratory data analysis and determine the type of data the table contains before registering it in the catalog.
- By creating a separate Table object in a catalog, you can inform other users that this table is relevant for the domain covered by the catalog. In addition, a Table object can be deleted when it is not used by any features, or deprecated if it cannot be deleted. This allows for better organization and management of data tables.
- Registering a Table object per type allows type-specific methods or properties to be associated with the Table object, so users only see the operations that apply to the kind of table they are working with.
- Distinguishing between Table and View ensures that there is only one authoritative source of metadata for a given table. Team members can create views without modifying the original Table, which offers flexibility and reduces the risk of data inconsistency. Although views are local objects, the code that generates them can be shared in your git repository.
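The separation described above can be sketched with a small toy model in plain Python. This is not the FeatureByte API: the class names `ToySourceTable`, `ToyTable`, and `ToyView`, and their methods, are hypothetical and exist only to illustrate the three roles. The sketch assumes an immutable source object, a single catalog-level metadata object, and disposable local views derived from it.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class ToySourceTable:
    """Immutable stand-in for the original warehouse table (hypothetical)."""
    name: str
    columns: tuple


@dataclass
class ToyTable:
    """Single authoritative holder of metadata about a source table (hypothetical)."""
    source: ToySourceTable
    column_descriptions: dict = field(default_factory=dict)

    def get_view(self):
        # Each call returns a fresh local object with its own column list.
        return ToyView(columns=list(self.source.columns))


@dataclass
class ToyView:
    """Local, temporary object that can be modified freely (hypothetical)."""
    columns: list

    def add_column(self, name):
        self.columns.append(name)


source = ToySourceTable("transactions", ("customer_id", "amount"))

# Attempting to modify the frozen source object raises an error.
try:
    source.name = "renamed"
    source_is_immutable = False
except FrozenInstanceError:
    source_is_immutable = True

table = ToyTable(source, {"amount": "Transaction amount"})

# Two team members derive independent views from the same table.
view_a = table.get_view()
view_b = table.get_view()
view_a.add_column("amount_usd")

print(source_is_immutable)   # the source cannot be changed
print(view_a.columns)        # modified locally
print(view_b.columns)        # unaffected by view_a
print(table.source.columns)  # the registered table is unchanged
```

In this toy model, changes made to one view never reach the table or any other view, which mirrors the point that views are local while the table remains the single source of metadata.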