Target
A Target object outlines what a Machine Learning model aims to predict.
In FeatureByte, you can establish a target using two principal methods:
- Logical Approach: This technique calculates targets within FeatureByte, mirroring the process of creating features.
- Descriptive Approach: You directly outline your prediction goal.
Target objects are used together with a Context to construct a Use Case. Only targets defined through the logical approach can be computed by FeatureByte.
Logical Plan for a Target¶
Target objects, built upon View objects, come in three varieties:
- Lookup Targets: Directly retrieve values from view attributes for a future point in time.
- Forward Window-based Aggregate Targets: Use forward-looking aggregations over grouped data.
- Aggregate Targets As At a Future Point-in-Time: Apply aggregations at a designated future moment.
Additionally, targets can emerge as transformations of existing Target objects, offering various ways to define what you want to predict.
Lookup Targets¶
Lookup targets are direct extractions from view attributes, representing simple, non-aggregated future entity characteristics.
Example: To target the "Amount" column in the "GROCERYINVOICE" table from our Grocery dataset tutorials, use the as_target() method:
invoice_view = catalog.get_view("GROCERYINVOICE")
invoice_view["Amount"].as_target("INVOICE_Amount")
In the context of Slowly Changing Dimension (SCD) views where attributes evolve over time, you would specify an offset to precisely identify the future attribute of interest, relative to the observation points of your modelling data.
Example: Predicting a customer's location in 4 weeks could look like this:
customer_view = catalog.get_view("GROCERYCUSTOMER")
customer_state_in_4w = customer_view.State.as_target("CUSTOMER_State_in_4w", offset="4w")
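To make the effect of the offset concrete, here is a minimal plain-Python sketch of what a lookup target with an offset resolves to. It does not call FeatureByte; the `history` data and the helper names `state_as_of` and `lookup_target` are hypothetical, and the sketch assumes each SCD record stays in effect until the next record for the same customer starts.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical SCD history for one customer: (effective_from, state), sorted by time.
history = [
    (datetime(2022, 1, 1), "Bretagne"),
    (datetime(2022, 6, 15), "Ile-de-France"),
]

def state_as_of(history, when):
    """Return the state in effect at `when`, or None before the first record."""
    effective_times = [ts for ts, _ in history]
    idx = bisect_right(effective_times, when)
    return history[idx - 1][1] if idx else None

def lookup_target(history, point_in_time, offset):
    """Illustrative only: read the attribute at point_in_time + offset."""
    return state_as_of(history, point_in_time + offset)

point_in_time = datetime(2022, 6, 1)
print(state_as_of(history, point_in_time))                        # Bretagne
print(lookup_target(history, point_in_time, timedelta(weeks=4)))  # Ile-de-France
```

With an observation point of 1 June 2022, the current state is "Bretagne", while the target value four weeks later is "Ile-de-France" because the customer's record changed on 15 June.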
Forward Window-based Aggregate Targets¶
This method aggregates future data for groups using functions such as count, sum, average, minimum, maximum, and standard deviation. You group data by the relevant entities using the groupby() method, then apply the aggregation with the forward_aggregate() method.
Example: For predicting future discounts for customers over the next 7 days:
import featurebyte as fb
items_view = catalog.get_view("INVOICEITEMS")
# Group items by the column GroceryCustomerGuid that references the customer entity
items_by_customer = items_view.groupby("GroceryCustomerGuid")
# Declare target that measures the future discount that the customer would receive in the next 7 days
customer_discounts_next_7d = items_by_customer.forward_aggregate(
    "Discount",
    method=fb.AggFunc.SUM,
    target_name="CUSTOMER_Sum_of_Discounts_next_7d",
    window="7d",
)
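As a rough illustration of what such a target measures for a single observation row, the following plain-Python sketch sums discounts in the 7 days after a point-in-time. It does not use FeatureByte; the `items` data and the `forward_sum` helper are hypothetical, and the window boundaries (exclusive start, inclusive end) are an assumption made for this sketch rather than a statement about FeatureByte's exact behavior.

```python
from datetime import datetime, timedelta

# Hypothetical invoice items: (customer_id, event_timestamp, discount)
items = [
    ("c1", datetime(2022, 12, 10), 1.0),  # before the point-in-time
    ("c1", datetime(2022, 12, 18), 2.5),  # inside the window
    ("c1", datetime(2022, 12, 23), 0.5),  # inside the window
    ("c1", datetime(2022, 12, 30), 4.0),  # after the window
    ("c2", datetime(2022, 12, 19), 9.0),  # different customer
]

def forward_sum(items, customer_id, point_in_time, window):
    """Sum values for one customer in (point_in_time, point_in_time + window].

    The boundary convention is an assumption of this sketch.
    """
    end = point_in_time + window
    return sum(
        value
        for cid, ts, value in items
        if cid == customer_id and point_in_time < ts <= end
    )

total = forward_sum(items, "c1", datetime(2022, 12, 17), timedelta(days=7))
print(total)  # 3.0
```

Only events that occur after the observation point contribute, which is what distinguishes a forward-looking target from a backward-looking feature.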
Aggregate Targets As At a Future Point-in-Time¶
These targets group data and apply an aggregation function at a specific future time. The approach resembles forward window-based targets, but it aggregates the records that are active at a single future point in time, using the forward_aggregate_asat() method.
Example: Estimating the number of customers in a state in 4 weeks:
import featurebyte as fb
customer_view = catalog.get_view("GROCERYCUSTOMER")
# Group customers by the column State that references the frenchstate entity
customers_by_state = customer_view.groupby("State")
# Declare target that measures the number of customers in 4 weeks by State
frenchstate_count_of_customers_in_4w = customers_by_state.forward_aggregate_asat(
    None,
    method=fb.AggFunc.COUNT,
    target_name="FRENCHSTATE_Count_of_Customers_in_4w",
    offset="4w",
)
Important
The forward_aggregate_asat() method is only available for SCD views.
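The "as at" semantics can be sketched in plain Python: for each customer, take the record in effect at the future time, then count customers per state. This is not FeatureByte code; the `scd_rows` data and the `count_by_state_asat` helper are hypothetical, and the sketch assumes a record remains active until a later record for the same customer takes effect.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical SCD rows: (customer_id, effective_from, state), one row per change.
scd_rows = [
    ("c1", datetime(2022, 1, 1), "Bretagne"),
    ("c1", datetime(2022, 6, 15), "Ile-de-France"),
    ("c2", datetime(2022, 3, 1), "Ile-de-France"),
    ("c3", datetime(2022, 7, 20), "Bretagne"),
]

def count_by_state_asat(rows, as_at):
    """Count customers per state using the record active at `as_at`."""
    latest = {}
    # Process rows in chronological order so later records overwrite earlier ones.
    for customer_id, effective_from, state in sorted(rows, key=lambda r: r[1]):
        if effective_from <= as_at:
            latest[customer_id] = state
    return Counter(latest.values())

point_in_time = datetime(2022, 6, 1)
counts = count_by_state_asat(scd_rows, point_in_time + timedelta(weeks=4))
print(counts["Ile-de-France"], counts["Bretagne"])  # 2 0
```

Customer c1 has moved by the future date and c3 does not exist yet, so only c1 and c2 are counted, both under "Ile-de-France".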
Generic Transforms¶
Generic transformations applicable to ViewColumn objects can also be applied to Target objects of any data type. The list of generic transforms can be found in the provided glossary.
Numeric Transforms¶
Numeric Targets can be manipulated using built-in arithmetic operators (+, -, *, /). For example:
customer_total_discount_pct_next_4w = (
    customer_total_discount_next_4w / customer_total_spent_next_4w
)
In addition to these arithmetic operations, other numeric transformations that are applicable to ViewColumn objects can also be applied to Target objects.
Conditional Transforms¶
You can apply if-then-else logic using conditional statements. The conditions can involve other Target objects related to the same entity.
cond = customer_state == "Ile-de-France"
customer_spent_over_next_7d[cond] = 100 + customer_spent_over_next_7d[cond]
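The masked assignment above updates only the rows where the condition holds and leaves the others unchanged. A plain-Python sketch of that behavior, using hypothetical per-customer values and a hypothetical helper `add_bonus_where` (no FeatureByte objects involved):

```python
# Hypothetical per-customer values keyed by customer id.
spent_next_7d = {"c1": 40.0, "c2": 15.0, "c3": 60.0}
state = {"c1": "Ile-de-France", "c2": "Bretagne", "c3": "Ile-de-France"}

def add_bonus_where(values, condition, bonus):
    """Add `bonus` where the condition is True; keep other values as they are."""
    return {
        key: (bonus + value if condition[key] else value)
        for key, value in values.items()
    }

cond = {key: s == "Ile-de-France" for key, s in state.items()}
adjusted = add_bonus_where(spent_next_7d, cond, 100)
print(adjusted)  # {'c1': 140.0, 'c2': 15.0, 'c3': 160.0}
```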
Getting Target Values¶
First, check the primary entity of the Target object, which indicates the entities that can be used to compute the target. A target can be served by its primary entity or any descendant serving entities. To obtain the primary entity, use the primary_entity property.
Target values are requested through an observation set that combines historical points-in-time and key values of the target's primary entity. Values of related serving entities can also be used.
Note
An accepted serving name must be used for the column containing the entity values. If you cannot rename the column, use the serving_names_mapping parameter of the compute methods to specify the mapping.
The historical points-in-time must be timestamps in UTC and must be contained in a column named 'POINT_IN_TIME'.
The observation set can be:
- a pandas DataFrame.
- or an ObservationTable object representing an observation set in the feature store.
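For the pandas DataFrame option, the rows have to carry UTC timestamps under the expected column names. The sketch below prepares such rows with the standard library only. The helpers `to_utc_naive` and `build_observation_rows` are hypothetical, the column name 'POINT_IN_TIME' follows the preview example in this document, and dropping the timezone after conversion is an assumption that mirrors the naive timestamp used there.

```python
from datetime import datetime, timedelta, timezone

def to_utc_naive(ts):
    """Convert a timezone-aware datetime to a naive datetime expressed in UTC.

    Naive inputs are returned unchanged and assumed to be in UTC already.
    """
    if ts.tzinfo is None:
        return ts
    return ts.astimezone(timezone.utc).replace(tzinfo=None)

def build_observation_rows(pairs, serving_name):
    """Build one row per (entity key, timestamp) pair."""
    return [
        {serving_name: key, "POINT_IN_TIME": to_utc_naive(ts)}
        for key, ts in pairs
    ]

utc_plus_1 = timezone(timedelta(hours=1))
rows = build_observation_rows(
    [("30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d",
      datetime(2022, 12, 17, 13, 12, 40, tzinfo=utc_plus_1))],
    serving_name="GROCERYCUSTOMERGUID",
)
print(rows[0]["POINT_IN_TIME"])  # 2022-12-17 12:12:40
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows)` to obtain an observation set with one column for the serving name and one for the point-in-time.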
You can obtain an ObservationTable object from the catalog using the get_observation_table() method.
Requesting targets is supported by two methods:
- compute_targets(): returns a loaded DataFrame. Use this method when the output is expected to be of a manageable size that can be handled locally.
- compute_target_table(): returns an ObservationTable object representing the output table stored in the feature store. This method is suitable for handling large tables and storing them in the feature store for reuse or auditing.
Here's an example using the compute_targets() method that returns a DataFrame:
my_target = catalog.get_target(<target_name>)
observation_df_with_target = my_target.compute_targets(observation_table)
Here's an example using the compute_target_table() method that returns an ObservationTable:
observation_table_with_target = my_target.compute_target_table(
    observation_table,
    observation_table_name="Customer Purchase next 2w",
)
You can download the ObservationTable object using the download() method, or delete it using the delete() method.
The ObservationTable object contains metadata on the Target and ObservationTable objects used, offering a full lineage of training or test data. To access their Object IDs, use the observation_table_id and target_id properties found within the request_input property of the ObservationTable.
observation_table_with_target.request_input.observation_table_id
observation_table_with_target.request_input.target_id
Note that this ObservationTable can then be passed into compute_historical_feature_table() to get a full table of training data that can be used to train your models.
Previewing a Target¶
First, verify the primary entity of a Target, which indicates the entities that can be used to serve the target. A target can be served by its primary entity or any descendant serving entities.
You can obtain the primary entity of a target by using the primary_entity property as shown below:
# This should show the name of the primary entity together with its serving names.
# The only accepted serving_name in this example is 'GROCERYCUSTOMERGUID'.
display(next_customer_sales_14d.primary_entity)
Note
You can preview a Target object using a small observation set of up to 50 rows. This computes the target values on the fly and should be used only for small observation sets for debugging or exploring unsaved targets.
The small observation set must combine historical points-in-time and key values of the primary entity from the target. Associated serving entities can also be utilized.
An accepted serving name should be used for the column containing the entity values.
The historical points-in-time must be timestamps in UTC and must be contained in a column named 'POINT_IN_TIME'.
The preview() method returns a pandas DataFrame.
import pandas as pd
observation_set = pd.DataFrame({
    "GROCERYCUSTOMERGUID": ["30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d"],
    "POINT_IN_TIME": [pd.Timestamp("2022-12-17 12:12:40")],
})
display(next_customer_sales_14d.preview(observation_set))
Adding a Target Object to the Catalog¶
Saving a Target object with the save() method makes it persistent and adds it to the catalog.
Note
After saving it, a Target object cannot be modified.
Accessing a Target from the Catalog¶
You can list the existing targets in the catalog, including their detailed information, using the list_targets() method.
Note
The list_targets() method returns the default version of each target.
You can also retrieve a Target object by its Object ID with the get_target_by_id() method.
Accessing the Target Definition file of a Target object¶
The target definition file is a Target object's single source of truth. The file is generated automatically after a target is declared in the SDK.
This file uses the same SDK syntax as the target declaration and provides an explicit outline of the intended operations of the target declaration, including those inherited but not explicitly declared by you.
The target definition file is the basis for generating the final logical execution graph, which is then transpiled into platform-specific SQL (e.g. SnowSQL, SparkSQL) for target materialization.
The file can be displayed in the SDK using the definition property.
Descriptive Set up of a Target¶
A Target can be created descriptively using the create method of the TargetNameSpace class. For the list of variable types supported by FeatureByte, consult the docstring of the DBVarType enum class.