Target

A Target object contains the logical plan (also referred to as a blueprint) to compute a target.

The target values are computed by using a set of observations for training purposes.

Targets can sometimes be extracted directly from existing attributes in the source tables. However, in many cases, targets are created through a sequence of operations like row transformations, joins, filters, and aggregates.

In FeatureByte, the computational blueprint for Target objects can be defined from View objects via lookup targets or aggregate targets, both described below.

Target objects can also be formed as transformations of one or more Target objects.

Lookup targets

Lookup targets are simple targets extracted directly from entity attributes in a view without the need for aggregation. For example, targets extracted from a column in a specific view reflect characteristics of the entity linked with that view's primary key.

Consider the Grocery dataset used in our tutorials. Here, you can designate the "Amount" column from the "GROCERYINVOICE" table view as a target for the "groceryinvoice" entity using the as_target() method:

invoice_view = catalog.get_view("GROCERYINVOICE")
invoice_view["Amount"].as_target("Invoice_Amount")

For a Slowly Changing Dimension (SCD) view, where attributes change over time, the target is linked to the entity identified by the table's natural key. By default, the target value is the attribute that is active at each point-in-time of the observation set.

In the following example, the "UsesWindows" target indicates whether a customer is using Windows. It is a Lookup target for the "grocerycustomer" entity that is identified by the natural key "GroceryCustomerGuid" of the SCD table "GROCERYCUSTOMER".

customer_view = catalog.get_view("GROCERYCUSTOMER")
# Extract operating system from BrowserUserAgent column
customer_view["OperatingSystemIsWindows"] = \
    customer_view.BrowserUserAgent.str.contains("Windows")
# Create a target from the OperatingSystemIsWindows column
uses_windows = customer_view.OperatingSystemIsWindows.as_target("UsesWindows")

For an SCD view or an Event view, you can specify an offset if you want the attribute value from a specific point in the future.

In the following example, we use an offset of 28 days to create a target that indicates the attribute value four weeks after the observation point.

uses_windows_28d_later = customer_view.OperatingSystemIsWindows.as_target(
    "UsesWindows_28d_later", offset='28d'
)
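The lookup rule above can be sketched in plain Python, independently of the FeatureByte SDK. This is only an illustration of the idea, under the assumption that each SCD record stays active from its effective date until the next record starts; the `scd_history` data and the `lookup_active_value` helper are hypothetical and not part of the SDK.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical SCD history for one customer: (effective_from, attribute value),
# sorted by effective_from. Each record stays active until the next one starts.
scd_history = [
    (datetime(2022, 1, 1), False),
    (datetime(2022, 12, 20), True),
]


def lookup_active_value(history, point_in_time, offset=timedelta(0)):
    """Return the value active at point_in_time + offset, or None if none is active yet."""
    lookup_time = point_in_time + offset
    effective_dates = [effective_from for effective_from, _ in history]
    # Index of the last record whose effective date is <= lookup_time.
    position = bisect_right(effective_dates, lookup_time) - 1
    return history[position][1] if position >= 0 else None


observation_time = datetime(2022, 12, 17, 12, 12, 40)
# Without an offset, the record active at the observation time is used.
value_now = lookup_active_value(scd_history, observation_time)
# With a 28-day offset, the record active four weeks later is used instead.
value_28d_later = lookup_active_value(scd_history, observation_time, timedelta(days=28))
```

Here the two lookups differ because the attribute changes between the observation point and the offset point, which is the situation an offset target is meant to capture.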

Aggregate targets

Aggregate targets involve applying forward aggregation functions to a collection of data points grouped by an entity (or a tuple of entities). Supported forward aggregation functions include latest, count, sum, average, minimum, maximum, and standard deviation.

Defining an aggregate target takes two steps.

First, determine the level of analysis by grouping view rows on the columns that represent one or more entities in the view, using the groupby() method.

items_view = catalog.get_view("INVOICEITEMS")
# Group items by the column GroceryCustomerGuid that references the customer entity
items_by_customer = items_view.groupby("GroceryCustomerGuid")

Second, choose the type of forward aggregation. One type is currently supported:
Aggregate Over a Window: Targets generated by aggregating data within a specific time frame, commonly used for analyzing event data, item data, and change view data.

Aggregate Over a Window example

An Aggregate Over a Window is obtained using the forward_aggregate() method on a GroupBy object:

import featurebyte as fb

# Group items by the column GroceryCustomerGuid that references the customer entity
items_by_customer = items_view.groupby("GroceryCustomerGuid")
# Declare a target that sums the discounts each customer receives over the next 7 days
customer_discounts_next_7d = items_by_customer.forward_aggregate(
    "Discount",
    method=fb.AggFunc.SUM,
    target_name="CustomerDiscounts_next_7d",
    window='7d'
)
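To make the semantics of a forward window concrete, here is a minimal plain-Python sketch of a forward sum for one entity and one observation point. It does not use the FeatureByte SDK. The `items` rows and the `forward_sum` helper are hypothetical, and the window boundaries (exclusive at the observation point, inclusive at the end of the window) are an assumption made for illustration rather than a statement of how the SDK treats them.

```python
from datetime import datetime, timedelta

# Hypothetical item rows: (customer key, event timestamp, discount).
items = [
    ("c1", datetime(2022, 12, 16), 1.0),  # before the observation point
    ("c1", datetime(2022, 12, 18), 2.5),  # inside the 7-day window
    ("c1", datetime(2022, 12, 23), 0.5),  # inside the 7-day window
    ("c1", datetime(2022, 12, 30), 4.0),  # after the window
    ("c2", datetime(2022, 12, 19), 9.0),  # different customer
]


def forward_sum(rows, key, point_in_time, window):
    """Sum values for `key` where point_in_time < timestamp <= point_in_time + window."""
    window_end = point_in_time + window
    return sum(
        value
        for row_key, timestamp, value in rows
        if row_key == key and point_in_time < timestamp <= window_end
    )


observation_time = datetime(2022, 12, 17, 12, 0, 0)
discounts_next_7d = forward_sum(items, "c1", observation_time, timedelta(days=7))
```

Only rows that fall after the observation point and within the window contribute, so data from before the observation point never affects the target value.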

Generic Transforms

Generic transformations applicable to ViewColumn objects can also be applied to Target objects of any data type. The list of generic transforms can be found in the provided glossary.

Numeric Transforms

Numeric Targets can be manipulated using built-in arithmetic operators (+, -, *, /). For example:

customer_total_discount_pct_next_4w = (
    customer_total_discount_next_4w / customer_total_spent_next_4w
)

In addition to these arithmetic operations, other numeric transformations that are applicable to ViewColumn objects can also be applied to Target objects.

Conditional Transforms

You can apply if-then-else logic with conditional statements. The condition can involve other Target objects related to the same entity.

cond = customer_state == "Ile-de-France"
customer_spent_over_next_7d[cond] = 100 + customer_spent_over_next_7d[cond]
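The masked-assignment pattern above reads much like boolean indexing in pandas. As a rough analogy only, the sketch below applies the same pattern to two pandas Series; it does not use Target objects, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical per-customer values standing in for two targets of the same entity.
customer_state = pd.Series(["Ile-de-France", "Bretagne", "Ile-de-France"])
customer_spent = pd.Series([20.0, 35.0, 0.0])

# Boolean mask: True where the condition holds.
cond = customer_state == "Ile-de-France"

# Update only the masked rows; the other rows keep their original value.
adjusted = customer_spent.copy()
adjusted[cond] = 100 + adjusted[cond]
```

Rows where the condition is false are left untouched, which is the "else" branch of the if-then-else logic.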

Getting Target Values

First, check the primary entity of the Target object, which indicates the entities that can be used to compute the target. A target can be served by its primary entity or any descendant serving entities. To obtain the primary entity, use the primary_entity property.

display(my_target.primary_entity)

The data requested is presented as an observation set that combines historical points-in-time and key values of the target's primary entity. Values of related serving entities can also be used.

Note

An accepted serving name must be used for the column containing the entity values. If you cannot rename the column, use the serving_names_mapping parameter of the compute methods to specify the mapping.

The historical points-in-time must be timestamps in UTC and must be contained in a column named 'POINT_IN_TIME'.
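As an illustration of these requirements, the following pandas sketch turns a hypothetical raw table into an observation set. The raw column names `customer_id` and `observed_at` are made up, and storing the points-in-time as timezone-naive UTC timestamps is an assumption based on the preview example on this page.

```python
import pandas as pd

# Hypothetical raw observations: a non-standard key column name and
# timestamps that carry a local UTC offset.
raw = pd.DataFrame({
    "customer_id": ["30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d"],
    "observed_at": ["2022-12-17T13:12:40+01:00"],
})

observation_set = pd.DataFrame({
    # Rename the key column to an accepted serving name.
    "GROCERYCUSTOMERGUID": raw["customer_id"],
    # Convert to UTC, then drop the timezone to keep naive UTC timestamps.
    "POINT_IN_TIME": pd.to_datetime(raw["observed_at"], utc=True).dt.tz_localize(None),
})
```

Renaming the column up front is one option; when the column cannot be renamed, the serving_names_mapping parameter mentioned in the note covers the same need.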

The observation set can be:

a pandas DataFrame.
or an ObservationTable object representing an observation set in the feature store.

You can obtain an ObservationTable object from the catalog using the get_observation_table() method:

observation_table = catalog.get_observation_table(<observation_table_name>)

Requesting targets is supported by two methods:

compute_targets(): returns a loaded DataFrame. Use this method when the output is expected to be of a manageable size that can be handled locally.
compute_target_table(): returns an ObservationTable object representing the output table stored in the feature store. This method is suitable for handling large tables and storing them in the feature store for reuse or auditing.

Here's an example using the compute_targets() method that returns a DataFrame:

my_target = catalog.get_target(<target_name>)
training_data = my_target.compute_targets(observation_table)

Here's an example using the compute_target_table() method that returns an ObservationTable:

training_table = my_target.compute_target_table(
    observation_table,
    target_table_name='Customer Purchase next 2w'
)

You can download the ObservationTable object using the download() method, or delete it using the delete() method:

training_table.download()
training_table.delete()

The ObservationTable object contains metadata on the Target and ObservationTable objects used, offering a full lineage of training or test data. To access their Object IDs, use the observation_table_id and target_id properties found within the request_input property of the ObservationTable.

training_table.request_input.observation_table_id
training_table.request_input.target_id

Note that this ObservationTable can then be passed into compute_historical_feature_table() to get a full table of training data that can be used to train your models.

Previewing a Target

First, verify the primary entity of a Target, which indicates the entities that can be used to serve the target. A target can be served by its primary entity or any descendant serving entities.

You can obtain the primary entity of a target by using the primary_entity property as shown below:

# This should show the name of the primary entity together with its serving names.
# The only accepted serving_name in this example is 'GROCERYCUSTOMERGUID'.
display(next_customer_sales_14d.primary_entity)

Note

You can preview a Target object using a small observation set of up to 50 rows. This computes the target values on the fly and should be used only for small observation sets for debugging or exploring unsaved targets.

The small observation set must combine historical points-in-time and key values of the primary entity from the target. Associated serving entities can also be utilized.

An accepted serving name should be used for the column containing the entity values.

The historical points-in-time must be timestamps in UTC and must be contained in a column named 'POINT_IN_TIME'.

The preview() method returns a pandas DataFrame.

import pandas as pd
observation_set = pd.DataFrame({
    'GROCERYCUSTOMERGUID': ["30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d"],
    'POINT_IN_TIME': [pd.Timestamp("2022-12-17 12:12:40")]
})
display(next_customer_sales_14d.preview(observation_set))
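When the observation set you have is larger than the preview limit, one option is to sample it down before calling preview(). The sketch below is pandas only and uses made-up data; the `PREVIEW_MAX_ROWS` constant is a hypothetical name for the 50-row limit stated above.

```python
import pandas as pd

PREVIEW_MAX_ROWS = 50  # row limit for preview, as stated above

# Hypothetical larger observation set with 200 rows.
full_observation_set = pd.DataFrame({
    "GROCERYCUSTOMERGUID": [f"customer-{i}" for i in range(200)],
    "POINT_IN_TIME": pd.date_range("2022-01-01", periods=200, freq="D"),
})

# Sample without replacement; a fixed seed keeps the preview reproducible.
preview_set = full_observation_set.sample(
    n=min(PREVIEW_MAX_ROWS, len(full_observation_set)),
    random_state=0,
).reset_index(drop=True)
```

The resulting `preview_set` keeps the same two columns and can be passed to preview() in place of the hand-built observation set.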

Adding a Target Object to the Catalog

Saving a Target object makes it persistent and adds it to the catalog.

next_customer_sales_14d.save()

Note

After saving it, a Target object cannot be modified.

Accessing a Target from the Catalog

You can refer to the catalog to view a list of existing targets, including their detailed information, using the list_targets() method:

catalog.list_targets()

Note

The list_targets() method returns the default version of each target.

You can also retrieve a Target object by its Object ID with the get_target_by_id() method.

target_object = catalog.get_target_by_id(<target_object_ID>)

Accessing the Target Definition file of a Target object

The target definition file is a Target object's single source of truth. The file is generated automatically after a target is declared in the SDK.

This file uses the same SDK syntax as the target declaration and provides an explicit outline of the intended operations of the target declaration, including those inherited but not explicitly declared by you.

The target definition file is the basis for generating the final logical execution graph, which is then transpiled into platform-specific SQL (e.g. SnowSQL, SparkSQL) for target materialization.

The file can be easily displayed in the SDK using the definition property.

target_object = catalog.get_target_by_id(<target_object_ID>)
display(target_object.definition)