FeatureList

A FeatureList object is a collection of Feature objects that is tailored to meet the needs of a particular use case. It is commonly used in generating feature values for Machine Learning training and inference.

Creating a Feature List

A FeatureList object is created using its constructor, which takes a list of Feature, FeatureList, or FeatureGroup objects as input.

# Get an existing feature list from the catalog
feature_list = catalog.get_feature_list(<feature_list_name>)
# Get existing features from the catalog
feature1 = catalog.get_feature(<feature1_name>)
feature2 = catalog.get_feature_by_id(<feature2_id>)
# Form a new feature list
my_feature_list = FeatureList(
    [feature_list, feature1, feature2],
    name='Improved List for Customer Personalization'
)

Listing Feature Objects in a FeatureList Object

You can list the contents of a feature list in two ways. The feature_names property returns the feature names only. The list_features() method returns a DataFrame with attributes of the Feature objects, such as their names, versions, types, corresponding tables, related entities, creation dates, readiness states and online availability.

# Get the feature names only
display(my_feature_list.feature_names)
# Get detailed information on the Feature objects in the list
df = my_feature_list.list_features()

Adding a Feature List Object to the Catalog

Use the save() method to make a FeatureList object persistent and add it to the catalog.

my_feature_list.save()

Note

After a FeatureList object is saved, it cannot be modified. To support versioning, you can create new FeatureList objects with the same namespace. Refer to the versioning section for more details.

Setting Feature List Status

Feature lists can be assigned one of five status levels to differentiate between experimental feature lists and those suitable for deployment or already deployed.

The status is managed at the namespace level of a Feature List object, meaning all FeatureList objects with the same namespace share the same status. The five possible status levels are:

  • "DEPLOYED": Assigned to feature list namespaces with at least one deployed FeatureList object.
  • "TEMPLATE": For feature lists as reference templates or safe starting points.
  • "PUBLIC_DRAFT": For feature lists shared for feedback purposes.
  • "DRAFT": For feature lists in the prototype stage.
  • "DEPRECATED": For outdated or unnecessary feature lists.

To obtain the current status of a feature list, use the status property. To change the status, use the update_status() method:

my_feature_list.update_status("PUBLIC_DRAFT")

Note

For the following scenarios, some status levels are automatically assigned to feature lists:

  • when a feature list with a new namespace is created, the "DRAFT" status is assigned to the feature list.
  • when at least one FeatureList object within the namespace is deployed, the "DEPLOYED" status is assigned.
  • when deployment is disabled for all FeatureList objects in the namespace, the "PUBLIC_DRAFT" status is assigned.

Additional guidelines:

  • Before setting a feature list status to "TEMPLATE", ensure all features in the default version are "PRODUCTION_READY".
  • Only "DRAFT" FeatureList objects can be deleted.
  • You cannot revert a feature list status to a "DRAFT" status.
  • Once a feature list is in "DEPLOYED" status, you cannot change its status until all associated deployments are disabled.
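
The guidelines above can be read as a small set of checks on a manual status change. The sketch below is an illustration only: can_update_status and its parameters are hypothetical names, not part of the SDK, and the function encodes nothing beyond the rules stated in this section.

```python
STATUSES = {"DEPLOYED", "TEMPLATE", "PUBLIC_DRAFT", "DRAFT", "DEPRECATED"}


def can_update_status(
    current,
    target,
    has_enabled_deployment=False,
    all_features_production_ready=False,
):
    """Hypothetical helper: check a manual status change against the guidelines."""
    if current not in STATUSES or target not in STATUSES:
        raise ValueError("Unknown status level")
    # A feature list status cannot be reverted to DRAFT.
    if target == "DRAFT":
        return False
    # A DEPLOYED status is locked until all associated deployments are disabled.
    if current == "DEPLOYED" and has_enabled_deployment:
        return False
    # TEMPLATE requires all features in the default version to be PRODUCTION_READY.
    if target == "TEMPLATE" and not all_features_production_ready:
        return False
    return True


print(can_update_status("DRAFT", "PUBLIC_DRAFT"))
print(can_update_status("PUBLIC_DRAFT", "DRAFT"))
```

In practice the feature store enforces these rules itself when update_status() is called; the helper only makes the decision logic explicit.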

Managing Feature List Versions

A new version of a feature list is created by generating a new FeatureList object with the same namespace as the original one. This new FeatureList object has its own Object ID and version name.

Creating new feature list versions allows you to use the latest default version of each feature, unless specific feature versions are specified. To create a new feature list version, use the create_new_version() method:

new_feature_list_version = my_feature_list.create_new_version()

The Object ID and version name of the new FeatureList object can be accessed using the id and version properties. The name remains the same as that of the original FeatureList object.

print("new_feature_list_version.name", new_feature_list_version.name)
print("new_feature_list_version.id", new_feature_list_version.id)
print("new_feature_list_version.version", new_feature_list_version.version)

You can list FeatureList objects (versions) with the same namespace from any FeatureList object using the list_versions() method.

new_feature_list_version.list_versions()

Feature List Readiness

The Feature List readiness metric provides a statistic on the readiness of Feature objects within the FeatureList object. This metric represents the percentage of its Feature objects that are production ready.

You can access the metric through the production_ready_fraction property of the FeatureList object as shown below:

display(new_feature_list_version.production_ready_fraction)
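
As a rough illustration of what the metric measures, the sketch below computes the share of production-ready features from a plain list of readiness labels. The function name readiness_fraction is hypothetical, and the "DRAFT" label used for a feature that is not ready is an assumption made for the example; only "PRODUCTION_READY" is taken from this guide.

```python
def readiness_fraction(readiness_levels):
    """Hypothetical helper: share of features labeled PRODUCTION_READY."""
    if not readiness_levels:
        raise ValueError("A feature list must contain at least one feature")
    ready = sum(1 for level in readiness_levels if level == "PRODUCTION_READY")
    return ready / len(readiness_levels)


# Three of four features are production ready.
levels = ["PRODUCTION_READY", "PRODUCTION_READY", "DRAFT", "PRODUCTION_READY"]
print(readiness_fraction(levels))  # 0.75
```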

Setting a Default Feature List Version

The default version simplifies feature list reuse by providing the most appropriate version when none is explicitly specified. There are two default version modes:

  • By default, the feature list's default version mode is automatic, which selects the FeatureList object with the highest readiness metric. If multiple versions have the same readiness metric, the most recent one becomes the default.
  • When a feature list's default version mode is set to manual, you can designate a specific FeatureList object as the default version for FeatureList objects with the same namespace using the as_default_version() method.

You can change the feature list's default version mode using the update_default_version_mode() method.

new_feature_list_version.update_default_version_mode("MANUAL")
my_feature_list.as_default_version()

To reset the default version mode of the feature list, use the update_default_version_mode() method:

my_feature_list.update_default_version_mode("AUTO")
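
The selection rule of the automatic mode (highest readiness metric first, most recent version as the tie-breaker) can be sketched as follows. VersionInfo and pick_default_version are hypothetical names used for illustration only; they are not part of the SDK, and the version names and dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VersionInfo:
    """Hypothetical record describing one version in a feature list namespace."""
    version: str
    readiness: float      # fraction of production-ready features
    created_at: datetime  # creation time of the version


def pick_default_version(versions):
    """Pick the version with the highest readiness; break ties by recency."""
    return max(versions, key=lambda v: (v.readiness, v.created_at))


versions = [
    VersionInfo("V1", 1.0, datetime(2023, 1, 10)),
    VersionInfo("V2", 0.5, datetime(2023, 2, 1)),
    VersionInfo("V3", 1.0, datetime(2023, 3, 5)),
]
print(pick_default_version(versions).version)  # V3
```

Sorting on a (readiness, created_at) tuple keeps both criteria in a single comparison, so the tie-breaker only matters when readiness values are equal.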

Accessing a Feature List from the Catalog

You can refer to the catalog to view a list of existing feature lists, including their detailed information, using the list_feature_lists() method:

catalog.list_feature_lists()

Note

The list_feature_lists() method returns the default version of each feature list.

To obtain the default version of a feature list, use the get_feature_list() method with the feature list's namespace. To obtain a specific version, provide the version name as well.

default_version = catalog.get_feature_list(<feature_list_name>)
new_version_added_to_catalog = catalog.get_feature_list(
    <feature_list_name>, version=<version_name>
)

Additionally, you can obtain a FeatureList object by utilizing the get_feature_list_by_id() method and its object ID.

feature_list_object = catalog.get_feature_list_by_id(<feature_list_object_ID>)

Getting Historical Feature Values

First, check the primary entity of the FeatureList object, which determines the entities that can be used to serve the feature list. A feature list can be served by its primary entity or by any descendant serving entity. To obtain the primary entity, use the primary_entity property.

display(my_feature_list.primary_entity)

Historical serving of a feature list is usually intended for exploration, model training, and testing purposes. The data requested is presented as an observation set that combines historical points-in-time and key values of the feature list's primary entity. Values of related serving entities can also be used.

Note

An accepted serving name must be used for the column containing the entity values. If you cannot rename the column, use the serving_names_mapping parameter of the historical request methods to specify the mapping.

The historical points-in-time must be timestamps in UTC and must be contained in a column named 'POINT_IN_TIME'.
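
The column requirements above can be illustrated with a small pandas sketch that turns raw data into an observation set. The raw column names (customer_id, event_time) and values are hypothetical; the serving name 'GROCERYCUSTOMERGUID' is the one used elsewhere in this guide. Storing the UTC timestamps without a timezone marker is an assumption of this sketch, not a documented requirement.

```python
import pandas as pd

# Hypothetical raw data: column names and values are made up for illustration.
raw = pd.DataFrame({
    "customer_id": ["30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d"],
    "event_time": ["2022-12-17 13:12:40+01:00"],  # local time with a UTC offset
})

# Parse as UTC, then drop the timezone marker so the values are plain UTC timestamps.
point_in_time = pd.to_datetime(raw["event_time"], utc=True).dt.tz_localize(None)

observation_set = pd.DataFrame({
    "GROCERYCUSTOMERGUID": raw["customer_id"],  # accepted serving name
    "POINT_IN_TIME": point_in_time,             # required column name
})
print(observation_set)
```

If the entity column has to keep its original name, the serving_names_mapping parameter mentioned above is the alternative to renaming.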

The observation set can be a pandas DataFrame or an ObservationTable object.

You can obtain an ObservationTable object from the catalog using the get_observation_table() method:

observation_table = catalog.get_observation_table(<observation_table_name>)

Requesting historical features is supported by two methods:

  • compute_historical_features(): returns a loaded DataFrame. Use this method when the output is expected to be of a manageable size that can be handled locally.
  • compute_historical_feature_table(): returns a HistoricalFeatureTable object representing the output table stored in the feature store. This method is suitable for handling large tables and storing them in the feature store for reuse or auditing.

Here's an example using the compute_historical_features() method that returns a DataFrame:

my_feature_list = catalog.get_feature_list(<feature_list_name>)
training_data = my_feature_list.compute_historical_features(observation_table)

Here's an example using the compute_historical_feature_table() method that returns a HistoricalFeatureTable:

training_table_name = (
    '2y Features for Customer Purchase next 2w '
    'up to end 22 with Improved Feature List'
)
training_table = my_feature_list.compute_historical_feature_table(
    observation_table,
    historical_feature_table_name=training_table_name
)

You can download the HistoricalFeatureTable object using the download() method, or delete it using the delete() method:

training_table.download()
training_table.delete()

The HistoricalFeatureTable object contains metadata on the FeatureList and ObservationTable objects used, offering a full lineage of training or test data. To access their Object IDs, use the feature_list_id and the observation_table_id properties.

training_table.feature_list_id
training_table.observation_table_id

Previewing a Feature List

Previewing a FeatureList object is similar to getting historical feature values. However, unlike the compute_historical_features() method, the preview() method does not store partial aggregations (tiles) to speed up future computation. Instead, it computes the feature values on the fly, so use it only for small observation sets (limited to 50 rows) when debugging or exploring the values of newly created features. The preview() method returns a pandas DataFrame.

import pandas as pd
observation_set = pd.DataFrame({
    'GROCERYCUSTOMERGUID': ["30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d"],
    'POINT_IN_TIME': [pd.Timestamp("2022-12-17 12:12:40")]
})
display(my_feature_list.preview(observation_set))

Deploying a Feature List

A FeatureList object is deployed to support batch and online serving for inference. This triggers the orchestration of the feature materialization into the online feature store.

Follow these steps to deploy a FeatureList object:

  1. Ensure all its Feature objects are labeled as "PRODUCTION_READY".
  2. Optionally, choose a name for your deployment.
  3. Use the deploy() method to create a Deployment object.
  4. Enable the Deployment object using the enable() method.

# Check Feature objects are PRODUCTION_READY.
# A readiness metric of 100% should be returned.
display(my_feature_list.production_ready_fraction)
# Create deployment
my_deployment = my_feature_list.deploy(
    name="Deployment of the improved feature list",
)
# Enable deployment
my_deployment.enable()

Deploying a FeatureList object returns a Deployment object that can be used for batch and online serving.

Note

Refer to the documentation of the Deployment object for a description of batch and online serving.

Monitoring Feature Job Status

Once a feature list has been deployed and is being served, the get_feature_jobs_status() method returns a report on the recent activity of scheduled feature jobs in your feature store.

my_feature_list.get_feature_jobs_status()

The report includes recent runs for these jobs, their success status, and the job durations.

Failed and late jobs can occur for various reasons, including insufficient compute capacity. Examine your data warehouse logs for more information on the errors. If errors result from inadequate compute capacity, consider increasing your instance size.