A FeatureList object is a collection of Feature objects that is tailored to meet the needs of a particular use case. It is commonly used in generating feature values for Machine Learning training and inference.
Creating a Feature List¶
# Get an existing feature list from the catalog
feature_list = catalog.get_feature_list(<feature_list_name>)

# Get existing features from the catalog
feature1 = catalog.get_feature(<feature1_name>)
feature2 = catalog.get_feature_by_id(<feature2_id>)

# Form a new feature list
my_feature_list = FeatureList(
    [feature_list, feature1, feature2],
    name='Improved List for Customer Personalization'
)
Listing Feature Objects in a FeatureList Object¶
You can obtain either the feature names of a feature list or a DataFrame that describes its Feature objects in more detail: their names, versions, types, corresponding tables, related entities, creation dates, states of readiness, and online availability. To obtain this information, use either the
feature_names property or the
Adding a Feature List Object to the Catalog¶
A FeatureList object cannot be modified after it is saved. You can create new FeatureList objects with the same namespace to support versioning. Refer to the versioning section for more details.
Setting Feature List Status¶
Feature lists can be assigned one of five status levels to differentiate between experimental feature lists and those suitable for deployment or already deployed.
The status is managed at the namespace level of a Feature List object, meaning all FeatureList objects with the same namespace share the same status. The five possible status levels are:
- "DEPLOYED": Assigned to feature list namespaces with at least one deployed FeatureList object.
- "TEMPLATE": For feature lists that serve as reference templates or safe starting points.
- "PUBLIC_DRAFT": For feature lists shared for feedback purposes.
- "DRAFT": For feature lists in the prototype stage.
- "DEPRECATED": For outdated or unnecessary feature lists.
To obtain the current status of a feature list, use the
status property. To change the status, use the
Some status levels are assigned automatically:
- When a feature list with a new namespace is created, it is assigned the "DRAFT" status.
- When at least one FeatureList object within the namespace is deployed, the "DEPLOYED" status is assigned.
- When deployment is disabled for all FeatureList objects in the namespace, the "PUBLIC_DRAFT" status is assigned.
The following restrictions also apply:
- Before setting a feature list status to "TEMPLATE", ensure all features in the default version are "PRODUCTION_READY".
- Only "DRAFT" FeatureList objects can be deleted.
- You cannot revert a feature list to the "DRAFT" status.
- Once a feature list is in "DEPLOYED" status, you cannot change its status until all associated deployments are disabled.
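The status rules in this section can be read as a small state machine. The following is a minimal sketch of that logic in plain Python. It is not the SDK's implementation: the names `check_status_change` and `can_delete`, and their parameters, are hypothetical and exist only to make the rules concrete.

```python
from typing import Optional

STATUSES = {"DEPLOYED", "TEMPLATE", "PUBLIC_DRAFT", "DRAFT", "DEPRECATED"}


def check_status_change(
    current: str,
    target: str,
    has_enabled_deployment: bool = False,
    all_default_features_production_ready: bool = False,
) -> Optional[str]:
    """Return None if a manual status change is allowed, else the reason it is not.

    Hypothetical helper that mirrors the rules listed in this section.
    """
    if current not in STATUSES or target not in STATUSES:
        return "unknown status"
    if target == current:
        return None  # nothing to change
    if target == "DRAFT":
        return "a feature list cannot be reverted to DRAFT"
    if current == "DEPLOYED" and has_enabled_deployment:
        return "disable all associated deployments first"
    if target == "TEMPLATE" and not all_default_features_production_ready:
        return "all features in the default version must be PRODUCTION_READY"
    return None


def can_delete(status: str) -> bool:
    """Only DRAFT feature lists can be deleted."""
    return status == "DRAFT"


# A blocked change returns the reason as a string; an allowed one returns None.
reason = check_status_change("PUBLIC_DRAFT", "DRAFT")
```

Returning a reason string instead of raising keeps the sketch easy to use in a pre-flight check before an actual status update is attempted.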
Managing Feature List Versions¶
A new version of a feature list is created by generating a new FeatureList object with the same namespace as the original one. This new FeatureList object has its own Object ID and version name.
Creating new feature list versions allows you to use the latest default version of each feature, unless specific feature versions are specified. To create a new feature list version, use the
You can list FeatureList objects (versions) with the same namespace from any FeatureList object using the
Feature List Readiness¶
The Feature List readiness metric provides a statistic on the readiness of Feature objects within the FeatureList object. This metric represents the percentage of its Feature objects that are production ready.
You can access the metric through the production_ready_fraction property of the FeatureList object.
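To make the metric concrete, here is a minimal sketch of how such a value could be derived from the readiness of each feature. It is plain Python, not SDK code: the function, the feature names, and the "DRAFT" readiness label used for features that are not production ready are illustrative assumptions. The result is expressed as a value between 0 and 1, which is also an assumption based on the property's name.

```python
from typing import Dict


def production_ready_fraction(readiness_by_feature: Dict[str, str]) -> float:
    """Share of features whose readiness is PRODUCTION_READY (0.0 for an empty list)."""
    if not readiness_by_feature:
        return 0.0
    ready = sum(
        1 for readiness in readiness_by_feature.values()
        if readiness == "PRODUCTION_READY"
    )
    return ready / len(readiness_by_feature)


# Hypothetical feature names and readiness labels, for illustration only.
readiness = {
    "feature_a": "PRODUCTION_READY",
    "feature_b": "PRODUCTION_READY",
    "feature_c": "PRODUCTION_READY",
    "feature_d": "DRAFT",
}
fraction = production_ready_fraction(readiness)  # 3 of 4 features are ready
```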
Setting a Default Feature List Version¶
The default version simplifies feature list reuse by providing the most appropriate version when none is explicitly specified. There are two default version modes:
- By default, the feature list's default version mode is automatic, selecting the FeatureList object with the highest readiness metric. If multiple versions have the same readiness metric, the most recent one becomes the default.
- When a feature list's default version mode is set to manual, you can designate a specific FeatureList object as the default version for FeatureList objects with the same namespace using the
You can change the feature list's default version mode using the
To reset the default version mode of the feature list, use the
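The selection rule described in this section (highest readiness metric first, most recent version as the tie-breaker, with a manual override) can be sketched in a few lines. This is an illustration in plain Python under stated assumptions, not the SDK's logic: the `Version` record, the `pick_default` function, and the version names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for one FeatureList object within a namespace."""
    name: str
    readiness: float       # readiness metric of this version
    created_at: datetime   # creation time, used as the tie-breaker


def pick_default(versions: List[Version], manual_default: Optional[str] = None) -> Version:
    """Pick the default version: the manual choice if given, else the automatic rule."""
    if not versions:
        raise ValueError("the namespace has no versions")
    if manual_default is not None:
        for version in versions:
            if version.name == manual_default:
                return version
        raise ValueError(f"unknown version: {manual_default}")
    # Automatic mode: highest readiness wins; ties go to the most recent version.
    return max(versions, key=lambda v: (v.readiness, v.created_at))


versions = [
    Version("V1", 0.5, datetime(2023, 1, 1)),
    Version("V2", 1.0, datetime(2023, 2, 1)),
    Version("V3", 1.0, datetime(2023, 3, 1)),
]
automatic_choice = pick_default(versions)                    # V3: ties with V2, but is newer
manual_choice = pick_default(versions, manual_default="V1")  # V1: explicit override
```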
Accessing a Feature List from the Catalog¶
The list_feature_lists() method returns the default version of each feature list.
To obtain the default version of a feature list, use the get_feature_list() method with the feature list's namespace. To obtain a specific version, provide the version name as well.
Additionally, you can obtain a FeatureList object by passing its Object ID to the get_feature_list_by_id() method.
Getting Historical Feature Values¶
First, verify the primary entity of the FeatureList object, which indicates the entities that can be used to serve the feature list. A feature list can be served by its primary entity or any descendant serving entities. To obtain the primary entity, use the
Historical serving of a feature list is usually intended for exploration, model training, and testing purposes. The data requested is presented as an observation set that combines historical points-in-time and key values of the feature list's primary entity. Values of related serving entities can also be used.
An accepted serving name must be used for the column containing the entity values. If you cannot rename the column, use the serving_names_mapping parameter of the historical request methods to specify the mapping.
The historical points-in-time must be timestamps in UTC and must be contained in a column named 'POINT_IN_TIME'.
The observation set can be either:
- a pandas DataFrame, or
- an ObservationTable object representing an observation set in the feature store.
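Before sending an observation set, it can help to check it against the requirements above: a point-in-time column holding UTC timestamps, and a column of entity key values under an accepted serving name. The sketch below does this for plain Python rows rather than a DataFrame so that it stays self-contained. The function name, the "CUSTOMER_ID" serving name, and the decision to flag timezone-naive timestamps are assumptions made for illustration. The point-in-time column name is passed in as a parameter so that you can use the exact name your SDK version requires.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def find_observation_problems(
    rows: List[Dict[str, object]],
    point_in_time_column: str,
    serving_name: str,
) -> List[str]:
    """Return a list of human-readable problems found in the observation rows."""
    problems = []
    for i, row in enumerate(rows):
        ts = row.get(point_in_time_column)
        if not isinstance(ts, datetime):
            problems.append(f"row {i}: {point_in_time_column} is missing or not a timestamp")
        elif ts.utcoffset() != timedelta(0):
            # Covers both non-UTC offsets and naive timestamps (utcoffset() is None).
            problems.append(f"row {i}: {point_in_time_column} is not a UTC timestamp")
        if row.get(serving_name) is None:
            problems.append(f"row {i}: no value for entity column {serving_name}")
    return problems


# Hypothetical observation rows: the second timestamp has no timezone attached.
rows = [
    {"POINT_IN_TIME": datetime(2023, 1, 15, 10, 0, tzinfo=timezone.utc), "CUSTOMER_ID": "C1"},
    {"POINT_IN_TIME": datetime(2023, 2, 1, 8, 30), "CUSTOMER_ID": "C2"},
]
problems = find_observation_problems(rows, "POINT_IN_TIME", "CUSTOMER_ID")
```

Here `problems` contains a single entry, for the second row, because its timestamp carries no UTC offset.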
Two methods support requesting historical features:
- compute_historical_features(): returns a loaded DataFrame. Use this method when the output is expected to be small enough to handle locally.
- compute_historical_feature_table(): returns a HistoricalFeatureTable object representing the output table stored in the feature store. This method is suitable for handling large tables and storing them in the feature store for reuse or auditing.
Here's an example using the compute_historical_features() method that returns a DataFrame:
Here's an example using the compute_historical_feature_table() method that returns a HistoricalFeatureTable:
You can download the HistoricalFeatureTable object using the
download() method, or delete it using the
The HistoricalFeatureTable object contains metadata on the FeatureList and ObservationTable objects used, offering a full lineage of training or test data. To access their Object IDs, use the
feature_list_id and the
Previewing a Feature List¶
Previewing a FeatureList object is similar to getting historical feature values. However, unlike the compute_historical_features() method, the preview() method does not store partial aggregations (tiles) to speed up future computation. Instead, it computes the feature values on the fly and should be used only for small observation sets (limited to 50 rows) when debugging or exploring the values of newly created features.
The preview() method returns a pandas DataFrame.
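Because previews are limited to small observation sets, a larger set has to be cut down before it is previewed. The sketch below shows one way to draw a reproducible sample of at most 50 rows. It is plain Python with a hypothetical helper name, and it works on any sequence of rows rather than on a DataFrame.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

PREVIEW_ROW_LIMIT = 50  # row limit stated in this section


def sample_for_preview(
    rows: Sequence[T], limit: int = PREVIEW_ROW_LIMIT, seed: int = 0
) -> List[T]:
    """Return all rows if they fit within the limit, else a seeded random sample."""
    if len(rows) <= limit:
        return list(rows)
    # A fixed seed makes the same rows come back on every run, which helps debugging.
    return random.Random(seed).sample(list(rows), limit)


small_set = sample_for_preview(list(range(200)))  # 50 rows drawn from 200
```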
Deploying a Feature List¶
Follow these steps to deploy a FeatureList object:
- Ensure all its Feature objects are labeled as "PRODUCTION_READY".
- Optionally, choose a name for your deployment.
- Use the deploy() method to create a Deployment object.
- Enable the Deployment object using the
Deploying a FeatureList object returns a Deployment object that can be used for batch and online serving. Refer to the documentation of the Deployment object for a description of each.
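The first deployment step, confirming that every feature is "PRODUCTION_READY", is easy to automate as a pre-flight check. The following is a minimal sketch in plain Python. The helper name, the feature names, and the "DRAFT" label for a feature that is not ready are hypothetical, and no SDK calls are made.

```python
from typing import Dict, List


def features_blocking_deployment(readiness_by_feature: Dict[str, str]) -> List[str]:
    """Return the names of features that are not PRODUCTION_READY, sorted by name."""
    return sorted(
        name
        for name, readiness in readiness_by_feature.items()
        if readiness != "PRODUCTION_READY"
    )


# Hypothetical feature list: one feature still needs to be promoted.
blocking = features_blocking_deployment(
    {"feature_a": "PRODUCTION_READY", "feature_b": "DRAFT"}
)
```

An empty result means the feature list passes this step; otherwise the returned names identify the features to review first.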
Monitoring Feature Job Status¶
The report includes recent runs for these jobs, their success status, and the job durations.
Failed and late jobs can occur for various reasons, including insufficient compute capacity. Examine your data warehouse logs for more information on the errors. If errors result from inadequate compute capacity, consider increasing your instance size.