FeatureList
A FeatureList object is a collection of Feature objects that is tailored to meet the needs of a particular use case. It is commonly used in generating feature values for Machine Learning training and inference.
Creating a Feature List
A FeatureList object is created using its constructor, which takes a list of Feature, FeatureList, or FeatureGroup objects as input.
from featurebyte import FeatureList

# Assumes `catalog` is the active Catalog object
# Get an existing feature list from the catalog
feature_list = catalog.get_feature_list(<feature_list_name>)
# Get existing features from the catalog
feature1 = catalog.get_feature(<feature1_name>)
feature2 = catalog.get_feature_by_id(<feature2_id>)
# Form a new feature list from the existing feature list and the two features
my_feature_list = FeatureList(
    [feature_list, feature1, feature2],
    name='Improved List for Customer Personalization',
)
Listing Feature Objects in a FeatureList Object
You can obtain either the names of the features in a feature list, or a DataFrame with more information on its Feature objects, such as their names, versions, types, corresponding tables, related entities, creation dates, readiness states, and online availability. To do so, use the feature_names property or the list_features() method, respectively.
# Get the feature names only
display(my_feature_list.feature_names)
# Get detailed information on the Feature objects in the list
df = my_feature_list.list_features()
Adding a Feature List Object to the Catalog
Use the save() method to save a FeatureList object, for example my_feature_list.save(). Saving makes the object persistent and adds it to the catalog.
Note
Once saved, a FeatureList object cannot be modified. To support versioning, you can create new FeatureList objects with the same namespace. Refer to the versioning section for more details.
Setting Feature List Status
Feature lists can be assigned one of five status levels to differentiate between experimental feature lists and those suitable for deployment or already deployed.
The status is managed at the namespace level of a Feature List object, meaning all FeatureList objects with the same namespace share the same status. The five possible status levels are:
- "DEPLOYED": Assigned to feature list namespaces with at least one deployed FeatureList object.
- "TEMPLATE": For feature lists as reference templates or safe starting points.
- "PUBLIC_DRAFT": For feature lists shared for feedback purposes.
- "DRAFT": For feature lists in the prototype stage.
- "DEPRECATED": For outdated or unnecessary feature lists.
To obtain the current status of a feature list, use the status property. To change the status, use the update_status() method.
Note
In the following scenarios, some status levels are assigned to feature lists automatically:
- When a feature list with a new namespace is created, it is assigned the "DRAFT" status.
- When at least one FeatureList object within the namespace is deployed, the "DEPLOYED" status is assigned.
- When deployment is disabled for all FeatureList objects in the namespace, the "PUBLIC_DRAFT" status is assigned.
Additional guidelines:
- Before setting a feature list status to "TEMPLATE", ensure all features in the default version are "PRODUCTION_READY".
- Only "DRAFT" FeatureList objects can be deleted.
- You cannot revert a feature list status to a "DRAFT" status.
- Once a feature list is in "DEPLOYED" status, you cannot change it to another status until all the associated deployments are disabled.
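The status rules above can be summarized as a small pure-Python model. This is a sketch for reasoning about allowed transitions only. The function names and boolean inputs are hypothetical, the code does not call the featurebyte SDK, and the real update_status() method performs its own validation.

```python
# Hypothetical model of the feature list status rules; not the featurebyte API.
STATUSES = {"DEPLOYED", "TEMPLATE", "PUBLIC_DRAFT", "DRAFT", "DEPRECATED"}


def check_status_update(current, target, *, default_version_ready, has_deployment):
    """Return None if the transition is allowed, otherwise the reason it is refused.

    default_version_ready: all features in the default version are PRODUCTION_READY.
    has_deployment: at least one FeatureList object in the namespace is deployed.
    """
    if target not in STATUSES:
        return f"unknown status {target!r}"
    if current == "DEPLOYED" and has_deployment and target != "DEPLOYED":
        return "disable all associated deployments first"
    if target == "DEPLOYED" and not has_deployment:
        return "DEPLOYED requires at least one deployed FeatureList object"
    if target == "DRAFT" and current != "DRAFT":
        return "a status cannot be reverted to DRAFT"
    if target == "TEMPLATE" and not default_version_ready:
        return "all features in the default version must be PRODUCTION_READY"
    return None


def status_after_deployment_change(any_deployed):
    """Automatic assignment: DEPLOYED while any object is deployed, else PUBLIC_DRAFT."""
    return "DEPLOYED" if any_deployed else "PUBLIC_DRAFT"


def can_delete(status):
    """Only DRAFT feature lists can be deleted."""
    return status == "DRAFT"
```

For example, moving a "PUBLIC_DRAFT" feature list back to "DRAFT" is refused by this model, matching the guideline that a status cannot be reverted to "DRAFT".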
Managing Feature List Versions
A new version of a feature list is created by generating a new FeatureList object with the same namespace as the original one. This new FeatureList object has its own Object ID and version name.
Creating a new feature list version lets you use the latest default version of each feature, unless you specify particular feature versions. To create a new feature list version, use the create_new_version() method.
The Object ID and version name of the new FeatureList object can be accessed using the id and version properties. Its name remains the same as that of the original FeatureList object.
print("new_feature_list_version.name", new_feature_list_version.name)
print("new_feature_list_version.id", new_feature_list_version.id)
print("new_feature_list_version.version", new_feature_list_version.version)
From any FeatureList object, you can list the FeatureList objects (versions) that share its namespace using the list_versions() method.
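To make the namespace idea concrete, here is a toy model in plain Python. SavedFeatureList, make_new_version() and versions_in_namespace() are hypothetical stand-ins, not featurebyte classes or signatures. The "V1"/"V2" labels are placeholders, not the SDK's version-name format. The model only shows that versions share a name while each has its own Object ID and version, and that a saved object is immutable.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen mirrors "a saved FeatureList object cannot be modified"
class SavedFeatureList:
    """Hypothetical stand-in for a saved FeatureList; not the featurebyte class."""
    name: str        # shared by every version in the namespace
    version: str     # placeholder labels such as "V1", "V2"
    features: tuple  # feature names, standing in for Feature objects
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique per version


def make_new_version(saved, new_version, features):
    """New object in the same namespace (same name) with its own id and version."""
    return SavedFeatureList(name=saved.name, version=new_version, features=tuple(features))


def versions_in_namespace(saved_objects, feature_list):
    """All saved objects sharing the namespace (name) of feature_list."""
    return [item for item in saved_objects if item.name == feature_list.name]
```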
Feature List Readiness
The feature list readiness metric summarizes the readiness of the Feature objects within a FeatureList object: it represents the share of its Feature objects that are production ready.
You can access the metric through the production_ready_fraction property of the FeatureList object.
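As an illustration of what the metric measures, the sketch below computes the same kind of statistic from a list of readiness values. It is not the SDK implementation. It assumes the value is a fraction between 0 and 1, where 1.0 corresponds to the 100% readiness expected before deployment. The function name is hypothetical.

```python
def production_ready_fraction(readiness_states):
    """Share of features whose readiness is PRODUCTION_READY, as a value from 0.0 to 1.0."""
    if not readiness_states:
        raise ValueError("a feature list needs at least one feature")
    ready = sum(1 for state in readiness_states if state == "PRODUCTION_READY")
    return ready / len(readiness_states)
```

With three production-ready features out of four, this returns 0.75, so such a list would not yet meet the 100% readiness level.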
Accessing a Feature List from the Catalog
You can view the existing feature lists in the catalog, including their detailed information, using the list_feature_lists() method.
Note
The list_feature_lists() method returns the default version of each feature list.
To obtain the default version of a feature list, use the get_feature_list() method with the feature list's namespace. To obtain a specific version, provide the version name as well.
default_version = catalog.get_feature_list(<feature_list_name>)
new_version_added_to_catalog = catalog.get_feature_list(
<feature_list_name>, version=<version_name>
)
You can also obtain a FeatureList object from its Object ID using the get_feature_list_by_id() method.
Getting Historical Feature Values
First, check the primary entity of the FeatureList object: it indicates which entities can be used to serve the feature list. A feature list can be served by its primary entity or by any descendant serving entity. To obtain the primary entity, use the primary_entity property.
Historical serving of a feature list is usually intended for exploration, model training, and testing. The request is expressed as an observation set that combines historical points-in-time with key values of the feature list's primary entity. Values of related serving entities can also be used.
Note
An accepted serving name must be used for the column containing the entity values. If you cannot rename the column, use the serving_names_mapping parameter of the historical request methods to specify the mapping.
The historical points-in-time must be UTC timestamps contained in a column named 'POINT_IN_TIME'.
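If the entity column of a DataFrame does not carry a serving name, you can rename it yourself instead of using serving_names_mapping. The hypothetical helper below renames the entity and time columns and normalizes the timestamps to UTC. It uses the POINT_IN_TIME column name from the preview example later on this page. Like that example, it assumes timezone-naive timestamps are interpreted as UTC.

```python
import pandas as pd


def prepare_observation_set(df, entity_column, serving_name, time_column):
    """Return a new DataFrame with the entity column renamed to its serving name
    and the time column renamed to POINT_IN_TIME, holding timezone-naive UTC values.
    Other columns are kept unchanged."""
    renamed = df.rename(columns={entity_column: serving_name, time_column: "POINT_IN_TIME"})
    return renamed.assign(
        # utc=True converts tz-aware values to UTC and treats naive values as UTC;
        # tz_localize(None) then drops the timezone while keeping the UTC clock time.
        POINT_IN_TIME=lambda d: pd.to_datetime(d["POINT_IN_TIME"], utc=True).dt.tz_localize(None)
    )
```

For instance, an event recorded at 13:12:40 in a UTC+01:00 timezone becomes a POINT_IN_TIME of 12:12:40.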
The observation set can be either:
- a pandas DataFrame, or
- an ObservationTable object representing an observation set in the feature store.
You can obtain an ObservationTable object from the catalog using the get_observation_table() method.
Requesting historical features is supported by two methods:
- compute_historical_features(): returns a loaded DataFrame. Use this method when the output is expected to be of a manageable size that can be handled locally.
- compute_historical_feature_table(): returns a HistoricalFeatureTable object representing the output table stored in the feature store. This method is suitable for handling large tables and storing them in the feature store for reuse or auditing.
Here's an example using the compute_historical_features() method, which returns a DataFrame:
# observation_table: an ObservationTable object or a pandas DataFrame observation set
my_feature_list = catalog.get_feature_list(<feature_list_name>)
training_data = my_feature_list.compute_historical_features(observation_table)
And here's an example using the compute_historical_feature_table() method, which returns a HistoricalFeatureTable object:
training_table_name = (
    '2y Features for Customer Purchase next 2w '
    'up to end 22 with Improved Feature List'
)
training_table = my_feature_list.compute_historical_feature_table(
    observation_table,
    historical_feature_table_name=training_table_name,
)
You can download the HistoricalFeatureTable object using the download() method, or delete it using the delete() method.
The HistoricalFeatureTable object contains metadata on the FeatureList and ObservationTable objects used, offering a full lineage of training or test data. To access their Object IDs, use the feature_list_id and observation_table_id properties.
Previewing a Feature List
Previewing a FeatureList object is similar to getting historical feature values. However, unlike the compute_historical_features() method, the preview() method does not store partial aggregations (tiles) to speed up future computations. Instead, it computes the feature values on the fly, so it should be used only on small observation sets (limited to 50 rows) for debugging or for exploring the values of newly created features.
The preview() method returns a pandas DataFrame.
import pandas as pd
observation_set = pd.DataFrame({
    'GROCERYCUSTOMERGUID': ["30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d"],
    'POINT_IN_TIME': [pd.Timestamp("2022-12-17 12:12:40")],
})
display(my_feature_list.preview(observation_set))
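Because previews are limited to small observation sets, you can downsample a larger DataFrame first. The sketch below uses pandas. limit_for_preview() is a hypothetical helper, and the 50-row limit is taken from the text above. It draws a reproducible random sample rather than taking the first rows, so the preview is less likely to reflect only one slice of the data.

```python
import pandas as pd

MAX_PREVIEW_ROWS = 50  # preview limit stated in the text above


def limit_for_preview(observation_set, max_rows=MAX_PREVIEW_ROWS, seed=0):
    """Return the observation set unchanged if it is small enough,
    otherwise a reproducible random sample of max_rows rows."""
    if len(observation_set) <= max_rows:
        return observation_set
    return observation_set.sample(n=max_rows, random_state=seed)
```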
Deploying a Feature List
A FeatureList object is deployed to support batch and online serving for inference. This triggers the orchestration of the feature materialization into the online feature store.
Follow these steps to deploy a FeatureList object:
- Ensure all its Feature objects are labeled as "PRODUCTION_READY".
- Optionally, choose a name for your deployment.
- Use the deploy() method to create a Deployment object.
- Enable the Deployment object using the enable() method.
# Check Feature objects are PRODUCTION_READY.
# A readiness metric of 100% should be returned.
display(my_feature_list.production_ready_fraction)
# Create deployment
my_deployment = my_feature_list.deploy(
name="Deployment of the improved feature list",
)
# Enable deployment
my_deployment.enable()
Note
Basic Deployment: At a minimum, you need to provide only the name of the deployment. However, it's highly recommended to also specify the related use case.
Specifying Use Case: Use the use_case_name parameter to link your deployment to the relevant use case. This link lets you track all deployments associated with a particular use case with the list_deployments() method.
Feature Readiness: If your features are not yet production ready, you can update their status when deploying by using the make_production_ready parameter. For Enterprise users, this action initiates a request for the related features, facilitating their transition to production use.
Deploying a FeatureList object returns a Deployment object that can be used for:
- Batch serving
- Retrieving either Python or shell script templates for serving online features.
Note
Refer to the documentation of the Deployment object for a description of batch and online serving.
Monitoring Feature Job Status
Once a feature list has been deployed and is being served, the get_feature_jobs_status() method returns a report on the recent activity of scheduled feature jobs in your feature store.
The report includes recent runs for these jobs, their success status, and the job durations.
Failed and late jobs can occur for various reasons, including insufficient compute capacity. Examine your data warehouse logs for more information on the errors. If errors result from inadequate compute capacity, consider increasing your instance size.
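When reviewing the report, it can help to aggregate runs per job to see whether failures are isolated or recurring, and how long the slowest runs take. The sketch below is hypothetical. The run records and their keys (job, success, duration_s) are invented for illustration and are not the fields of the report returned by get_feature_jobs_status().

```python
from collections import defaultdict


def summarize_job_runs(runs):
    """Group hypothetical run records by job and count runs, failures,
    and the longest duration in seconds."""
    summary = defaultdict(lambda: {"runs": 0, "failures": 0, "max_duration_s": 0.0})
    for run in runs:
        stats = summary[run["job"]]
        stats["runs"] += 1
        if not run["success"]:
            stats["failures"] += 1
        stats["max_duration_s"] = max(stats["max_duration_s"], run["duration_s"])
    return dict(summary)
```

A job that shows both repeated failures and growing durations might point to insufficient compute capacity. Confirm this in your data warehouse logs before resizing.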