6. Create Target
We want to predict the spending of active customers over the next 2 weeks.
Let's create a target that measures the sum of invoice Amount for each customer over the next 14-day period.
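To make the target's meaning concrete, here is a minimal plain-Python sketch of a forward 14-day sum for a single customer at a given point-in-time. It is not FeatureByte's implementation: the invoice records, the `forward_sum` helper, and the window boundary convention (invoices strictly after the point-in-time, up to and including 14 days later) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical invoice records: (customer_id, invoice_timestamp, amount).
invoices = [
    ("cust_a", datetime(2023, 1, 2), 10.0),
    ("cust_a", datetime(2023, 1, 10), 25.5),
    ("cust_a", datetime(2023, 1, 20), 40.0),   # falls after the 14-day window
    ("cust_b", datetime(2022, 12, 30), 99.0),  # falls before the point-in-time
]


def forward_sum(invoices, customer_id, point_in_time, window=timedelta(days=14)):
    """Sum one customer's invoice amounts in (point_in_time, point_in_time + window].

    The boundary convention is an assumption for this sketch. Returns None when
    the customer has no invoices in the window (the raw, unfilled result).
    """
    end = point_in_time + window
    amounts = [
        amount
        for cust, ts, amount in invoices
        if cust == customer_id and point_in_time < ts <= end
    ]
    return sum(amounts) if amounts else None


t0 = datetime(2023, 1, 1)
print(forward_sum(invoices, "cust_a", t0))  # 35.5
print(forward_sum(invoices, "cust_b", t0))  # None
```

Only future invoices contribute, which is what distinguishes a target (looking forward from the point-in-time) from a feature (looking backward).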
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
21:58:14 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
21:58:14 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
21:58:15 | WARNING | Remote SDK version (0.5.0.dev6) is different from local (0.5.0.dev1). Update local SDK to avoid unexpected behavior.
21:58:15 | INFO | No catalog activated.
21:58:15 | INFO | 6 feature lists, 31 features deployed
21:58:15 | INFO | Using profile: tutorial
21:58:15 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
21:58:15 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
21:58:16 | WARNING | Remote SDK version (0.5.0.dev6) is different from local (0.5.0.dev1). Update local SDK to avoid unexpected behavior.
21:58:16 | INFO | No catalog activated.
21:58:16 | INFO | 6 feature lists, 31 features deployed
21:58:17 | INFO | Catalog activated: Grocery Dataset Tutorial
The Amount column comes from the GROCERYINVOICE table, so we first create a view from it:
groceryinvoice_view = catalog.get_view("GROCERYINVOICE")
The target is the sum of the Amount column over the next 14 days for each customer (GroceryCustomerGuid):
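# Group invoices by customer and sum Amount over a 14-day forward window.
# fill_value=0 gives customers with no invoices in the window a target of 0.
target = groceryinvoice_view.groupby(["GroceryCustomerGuid"]).forward_aggregate(
    "Amount",
    method="sum",
    target_name="CUSTOMER_Sum_of_invoice_Amount_next_14d",
    window="14d",
    fill_value=0,
)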
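The `fill_value=0` argument matters for this use case: without it, a customer with no invoices in the 14-day window would get a missing target rather than a spend of 0. The sketch below uses a hypothetical `apply_fill_value` helper and made-up results to mimic the null replacement that the target's generated definition expresses as `skip_fill_na=True` followed by `feat_1[feat.isnull()] = 0`. It illustrates the idea only and is not FeatureByte code.

```python
from typing import Dict, Optional


def apply_fill_value(
    raw: Dict[str, Optional[float]], fill_value: float = 0.0
) -> Dict[str, float]:
    """Replace missing (None) per-customer target values with fill_value."""
    return {cust: (fill_value if value is None else value) for cust, value in raw.items()}


# Hypothetical raw forward-sum results: cust_b made no purchases in the window.
raw_targets = {"cust_a": 35.5, "cust_b": None}
print(apply_fill_value(raw_targets))  # {'cust_a': 35.5, 'cust_b': 0.0}
```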
For a target (or a feature) to be recorded in the catalog, it must be saved:
target.save()
We also update the target's description:
target.update_description(
"Sum of invoice Amount for the customer over the next 14d period."
)
The target is now created. We can inspect its definition file, which explicitly outlines every operation used to declare the target. For example, the definition includes implicit operations such as the cleaning operations inherited from the table.
target.definition
# Generated by SDK version: 0.5.0.dev6
from bson import ObjectId
from featurebyte import ColumnCleaningOperation
from featurebyte import DisguisedValueImputation
from featurebyte import EventTable
from featurebyte import ValueBeyondEndpointImputation
# event_table name: "GROCERYINVOICE"
event_table = EventTable.get_by_id(ObjectId("64ff1c910d5bfbfb21bce78a"))
event_view = event_table.get_view(
view_mode="manual",
drop_column_names=["record_available_at"],
column_cleaning_operations=[
ColumnCleaningOperation(
column_name="Amount",
cleaning_operations=[
DisguisedValueImputation(
imputed_value=None, disguised_values=[-99, -98]
),
ValueBeyondEndpointImputation(
type="less_than", end_point=0, imputed_value=0
),
ValueBeyondEndpointImputation(
type="greater_than", end_point=2000, imputed_value=2000
),
],
)
],
)
target = event_view.groupby(
by_keys=["GroceryCustomerGuid"], category=None
).forward_aggregate(
value_column="Amount",
method="sum",
window="14d",
target_name="CUSTOMER_Sum_of_invoice_Amount_next_14d",
skip_fill_na=True,
)
feat = target["CUSTOMER_Sum_of_invoice_Amount_next_14d"]
feat_1 = feat.copy()
feat_1[feat.isnull()] = 0
feat_1.name = "CUSTOMER_Sum_of_invoice_Amount_next_14d"
output = feat_1
output.save(_id=ObjectId("64ff1cfb385e2ada5de0a15a"))
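The cleaning operations on Amount in the definition above can be read as a per-value rule. The following plain-Python sketch mirrors them, assuming they are applied in the listed order, so a disguised value such as -99 becomes missing before the clipping rules run instead of being clipped to 0. The `clean_amount` function is hypothetical; FeatureByte itself applies these operations inside the view, as `column_cleaning_operations` in `get_view` shows.

```python
from typing import Optional


def clean_amount(value: Optional[float]) -> Optional[float]:
    """Hypothetical mirror of the Amount cleaning rules in the definition.

    1. Disguised values -99 and -98 are treated as missing (None).
    2. Values below 0 are replaced with 0.
    3. Values above 2000 are replaced with 2000.
    """
    if value is None or value in (-99, -98):
        return None
    if value < 0:
        return 0.0
    if value > 2000:
        return 2000.0
    return value


raw_amounts = [12.5, -99, -5.0, 3500.0, None]
print([clean_amount(v) for v in raw_amounts])  # [12.5, None, 0.0, 2000.0, None]
```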