
FeatureByte Python SDK

The FeatureByte Python SDK offers a comprehensive set of objects for feature engineering, simplifying the management and manipulation of feature stores, data sources, tables, entities, views, features, feature lists, and the other objects needed for feature serving.

The following diagram illustrates the relationships between each core object within the SDK.

[Diagram: relationships between the core SDK objects, grouped into Data Modelling, Feature Engineering, Historical Serving, and Batch / Online Serving. Legend: Catalog Objects, FeatureStore Objects, Local Objects.]


Below is a brief overview of each object:

  1. FeatureStore: Handles connections and location details for a feature store within a data warehouse.

  2. DataSource: The collection of source tables that the feature store can access. The object is immutable.

  3. SourceTable: A table from a data warehouse that the feature store can access. The object is immutable.

  4. Catalog: A centralized metadata repository for organizing tables, entities, features, feature lists, and other objects to facilitate feature serving for a specific domain.

  5. Table: Provides a centralized location for metadata about a source table. This metadata determines the type of operations that can be applied to the table's views and includes essential information for feature engineering.

  6. Entity: Contains metadata on an entity type represented or referenced by tables in your data warehouse.

  7. Relationship: Links two Entity objects based on their direct relationship.

  8. View: A local virtual table that can be modified and joined to other views to prepare data before feature creation. It works like a SQL view.

  9. UserDefinedFunction: An object that lets you run custom SQL user-defined functions (UDFs) on a column in a view. It is particularly useful for incorporating transformer models into FeatureByte.

  10. Feature: Contains the logical plan for computing a feature.

  11. FeatureList: A collection of Feature objects tailored for a specific use case, typically for generating feature values in machine learning applications.

  12. Target: Contains metadata about a machine learning application's target variable and, optionally, the logical plan to compute it, outlining the prediction objective. It can be used to augment observation tables with target values.

  13. Context: Contains descriptive metadata about a UseCase's primary entity, such as any subset of that entity that the UseCase may focus on, the expected timing of batch or online serving, the available information at inference, and any constraints that need to be considered, such as legal or operational restrictions.

  14. UseCase: Combines Target and Context objects to define the machine learning application's objective and operating conditions. It can be used to document and organize historical feature requests or feature list deployments.

  15. ObservationTable: Represents an observation set in the feature store that combines historical points-in-time and entity values to make historical feature requests, usually for training and testing machine learning applications. ObservationTable objects are typically associated with Context or UseCase objects.

  16. HistoricalFeatureTable: Represents a table in the feature store with historical feature values from a historical feature request and metadata on the FeatureList and the ObservationTable objects used to create it.

  17. Deployment: Used to manage the online and batch serving of a deployed FeatureList in a production environment.

  18. BatchRequestTable: Represents a table in the feature store specifying entity values for batch serving.

  19. BatchFeatureTable: Represents a table in the feature store with feature values from batch serving and metadata on the Deployment and the BatchRequestTable objects used to create it.
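To make the historical-serving chain above more concrete (a Feature holds a plan, a FeatureList groups Features, an ObservationTable supplies points-in-time and entity values, and a HistoricalFeatureTable stores the results plus metadata on what produced it), here is a minimal plain-Python sketch. It is a toy model under stated assumptions, not the FeatureByte API: every class, function, column name (`customer_id`, `POINT_IN_TIME`), and data value is hypothetical, and the 7-day sum with a half-open window is just one illustrative feature definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical toy stand-ins for the SDK objects; this is NOT the FeatureByte API.


@dataclass(frozen=True)
class Feature:
    name: str
    # Stand-in for the logical plan: a value for an entity as of a point-in-time.
    plan: Callable[[str, datetime], float]


@dataclass(frozen=True)
class FeatureList:
    name: str
    features: Tuple[Feature, ...]


@dataclass(frozen=True)
class ObservationTable:
    name: str
    # Each row pairs a historical point-in-time with an entity value.
    rows: Tuple[Tuple[datetime, str], ...]


@dataclass
class HistoricalFeatureTable:
    # Keeps metadata on the FeatureList and ObservationTable used to create it.
    feature_list_name: str
    observation_table_name: str
    rows: List[Dict[str, object]]


def compute_historical_features(
    feature_list: FeatureList, observations: ObservationTable
) -> HistoricalFeatureTable:
    rows = []
    for point_in_time, entity_value in observations.rows:
        row: Dict[str, object] = {"POINT_IN_TIME": point_in_time, "customer_id": entity_value}
        for feature in feature_list.features:
            row[feature.name] = feature.plan(entity_value, point_in_time)
        rows.append(row)
    return HistoricalFeatureTable(feature_list.name, observations.name, rows)


# Hypothetical source events: (customer_id, event_timestamp, amount).
TRANSACTIONS = [
    ("C1", datetime(2024, 1, 1), 10.0),
    ("C1", datetime(2024, 1, 5), 20.0),
    ("C1", datetime(2024, 1, 20), 5.0),
    ("C2", datetime(2024, 1, 3), 7.5),
]


def sum_amount_7d(customer_id: str, point_in_time: datetime) -> float:
    # Uses only events in the 7 days before the point-in-time, so values
    # computed for training never see events from after that moment.
    window_start = point_in_time - timedelta(days=7)
    return sum(
        (amount for cust, ts, amount in TRANSACTIONS
         if cust == customer_id and window_start <= ts < point_in_time),
        0.0,
    )


feature_list = FeatureList("customer_features", (Feature("sum_amount_7d", sum_amount_7d),))
observations = ObservationTable(
    "january_observations",
    (
        (datetime(2024, 1, 6), "C1"),
        (datetime(2024, 1, 21), "C1"),
        (datetime(2024, 1, 6), "C2"),
    ),
)
historical = compute_historical_features(feature_list, observations)
for row in historical.rows:
    print(row["customer_id"], row["POINT_IN_TIME"], row["sum_amount_7d"])
```

The same customer `C1` gets different feature values at different points-in-time, which is why an observation set pairs entity values with timestamps rather than listing entities alone.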

Note

The distinction between SourceTable, Table, and View is important for several reasons:

  1. It allows SourceTable to be an immutable object that represents the original table in the data warehouse. The object can be used to run exploratory data analysis to determine what kind of data the table contains before registering it in the catalog.

  2. By creating a separate Table object in a catalog, you can inform other users that this table is relevant for the domain covered by the catalog. In addition, a Table object can be deleted when no features use it, or deprecated when it cannot be deleted. This allows for better organization and management of data tables.

  3. Registering each Table object with a specific type (such as an event, item, dimension, or slowly changing dimension table) allows type-specific methods and properties to be associated with it. This gives users a smoother experience when working with data tables.

  4. Distinguishing between Table and View ensures that there is only one authoritative source of metadata for a given table. Various team members can create views without modifying the original Table, which offers flexibility and reduces the risk of data inconsistency. Although Views are local objects, the code to generate them can be shared in your git repository.
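The separation described in the note can be illustrated with a small plain-Python model: an immutable SourceTable, one authoritative Table record holding catalog metadata, and independent local Views that team members can modify freely. This is a hypothetical sketch for intuition only; the class names mirror the concepts, but the fields and methods (`make_view`, `drop_column`, `add_filter`, `deprecate`), the `"event"` type string, and the `GROCERY_INVOICE` table are assumptions, not the FeatureByte API.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Dict, List, Tuple

# Hypothetical toy model of the SourceTable / Table / View split;
# class and method names are illustrative, NOT the FeatureByte API.


@dataclass(frozen=True)
class SourceTable:
    # Immutable: mirrors the original table in the data warehouse.
    name: str
    columns: Tuple[str, ...]


@dataclass
class Table:
    # The single authoritative metadata record for a source table in a catalog.
    source: SourceTable
    table_type: str  # e.g. "event"; would decide which type-specific operations apply
    column_entities: Dict[str, str] = field(default_factory=dict)
    status: str = "active"

    def make_view(self) -> "View":
        # Each call returns an independent local copy of the column list.
        return View(table_name=self.source.name, columns=list(self.source.columns))

    def deprecate(self) -> None:
        # Used when the table can no longer be deleted because features depend on it.
        self.status = "deprecated"


@dataclass
class View:
    # Local, modifiable working copy; edits never touch the Table's metadata.
    table_name: str
    columns: List[str]
    filters: List[str] = field(default_factory=list)

    def drop_column(self, column: str) -> "View":
        self.columns.remove(column)
        return self

    def add_filter(self, condition: str) -> "View":
        self.filters.append(condition)
        return self


source = SourceTable("GROCERY_INVOICE", ("invoice_id", "customer_id", "amount"))
table = Table(source, table_type="event", column_entities={"customer_id": "customer"})

# Two team members derive different views from the same Table.
view_a = table.make_view().drop_column("amount")
view_b = table.make_view().add_filter("amount > 100")

print(view_a.columns)  # ['invoice_id', 'customer_id']
print(view_b.columns)  # ['invoice_id', 'customer_id', 'amount']

try:
    source.name = "RENAMED"  # type: ignore[misc]
except FrozenInstanceError:
    print("SourceTable is immutable")
```

Because each View copies the column list, neither member's changes leak into the other's View or into the shared Table, which is the property the note relies on when it says views can be created without modifying the original Table.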

Learn by Example

Discover FeatureByte's SDK with our step-by-step SDK tutorials. We'll guide you through creating a catalog, registering its data model, formulating your use case, crafting features, computing training data, and deploying and managing those features.