10. Create Features from SCD
Create Features from SCD Table¶
Three types of features can be declared from a Slowly Changing Dimension Table in FeatureByte:
- Lookup features: Current attributes or past attributes using an offset.
- Aggregate "As At" features: Aggregates as a particular point in time.
- Change features: Features derived for a change view tracking the changes in a column of the table.
We will declare here two features from CONSUMER_LOAN_STATUS:
- CLIENT_Consumer_Loan_GOODS_CATEGORY_with_Lowest_sum_of_active_Consumer_Loans_AMT_CREDITs
- CLIENT_Max_of_Consumer_Loan_terminations_Consumer_Loan_AMT_ANNUITYs_104w
Activate catalog¶
In [1]:
Copied!
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Credit Default Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
import featurebyte as fb
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Credit Default Dataset SDK Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:42:48 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088/ 16:42:48 | INFO | Using profile: tutorial 16:42:48 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml 16:42:48 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1) 16:42:48 | INFO | SDK version: 2.1.0.dev113 16:42:48 | INFO | No catalog activated. 16:42:48 | INFO | Catalog activated: Credit Default Dataset SDK Tutorial
Get view from table¶
In [2]:
Copied!
# Get view from CONSUMER_LOAN_STATUS scd table.
consumer_loan_status_view = catalog.get_view("CONSUMER_LOAN_STATUS")
# Get view from CONSUMER_LOAN_STATUS scd table.
consumer_loan_status_view = catalog.get_view("CONSUMER_LOAN_STATUS")
Filter View to keep only active entities¶
Read on view subsetting
In [3]:
Copied!
# Filter consumer_loan_status_view with null TERMINATION_TIMESTAMP values.
cond = consumer_loan_status_view["TERMINATION_TIMESTAMP"].isnull()
consumer_loan_status_active_consumer_loans_view = consumer_loan_status_view[cond]
# Filter consumer_loan_status_view with null TERMINATION_TIMESTAMP values.
cond = consumer_loan_status_view["TERMINATION_TIMESTAMP"].isnull()
consumer_loan_status_active_consumer_loans_view = consumer_loan_status_view[cond]
Do "as at" aggregation in consumer_loan_status_active_consumer_loans_view¶
See SDK reference for features
See SDK reference to groupby a view
In [4]:
Copied!
# Group consumer_loan_status_active_consumer_loans_view view by Client entity (CLIENT_ID) across
# different GOODS_CATEGORYs.
consumer_loan_status_active_consumer_loans_view_by_client_across_goods_categorys = (
consumer_loan_status_active_consumer_loans_view.groupby(
["CLIENT_ID"], category="GOODS_CATEGORY"
)
)
# Group consumer_loan_status_active_consumer_loans_view view by Client entity (CLIENT_ID) across
# different GOODS_CATEGORYs.
consumer_loan_status_active_consumer_loans_view_by_client_across_goods_categorys = (
consumer_loan_status_active_consumer_loans_view.groupby(
["CLIENT_ID"], category="GOODS_CATEGORY"
)
)
In [5]:
Copied!
# Distribution of the total AMT_CREDITs of active Consumer Loans, segmented by GOODS_CATEGORY for
# the Client.
client_active_consumer_loans_amt_credits__by_goods_category = (
consumer_loan_status_active_consumer_loans_view_by_client_across_goods_categorys.aggregate_asat(
"AMT_CREDIT",
method="sum",
feature_name="CLIENT_active_Consumer_Loans_AMT_CREDITs__by_GOODS_CATEGORY",
)
)
# Distribution of the total AMT_CREDITs of active Consumer Loans, segmented by GOODS_CATEGORY for
# the Client.
client_active_consumer_loans_amt_credits__by_goods_category = (
consumer_loan_status_active_consumer_loans_view_by_client_across_goods_categorys.aggregate_asat(
"AMT_CREDIT",
method="sum",
feature_name="CLIENT_active_Consumer_Loans_AMT_CREDITs__by_GOODS_CATEGORY",
)
)
Derive feature from aggregation across categories¶
In [6]:
Copied!
# Get Consumer Loan GOODS_CATEGORY with Lowest sum of active Consumer Loans AMT_CREDITs from
# CLIENT_active_Consumer_Loans_AMT_CREDITs__by_GOODS_CATEGORY
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits = (
client_active_consumer_loans_amt_credits__by_goods_category.cd.key_with_lowest_value()
)
# Give a name to new feature
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits.name = "CLIENT_Consumer_Loan_GOODS_CATEGORY_with_Lowest_sum_of_active_Consumer_Loans_AMT_CREDITs"
# Get Consumer Loan GOODS_CATEGORY with Lowest sum of active Consumer Loans AMT_CREDITs from
# CLIENT_active_Consumer_Loans_AMT_CREDITs__by_GOODS_CATEGORY
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits = (
client_active_consumer_loans_amt_credits__by_goods_category.cd.key_with_lowest_value()
)
# Give a name to new feature
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits.name = "CLIENT_Consumer_Loan_GOODS_CATEGORY_with_Lowest_sum_of_active_Consumer_Loans_AMT_CREDITs"
Create Termination View¶
In [7]:
Copied!
# Create change view from CONSUMER_LOAN_STATUS table to track changes in TERMINATION_TIMESTAMP.
consumer_loan_status_termination_view = catalog.get_table(
"CONSUMER_LOAN_STATUS"
).get_change_view(track_changes_column="TERMINATION_TIMESTAMP")
cond = consumer_loan_status_termination_view["new_TERMINATION_TIMESTAMP"].isnull()
consumer_loan_status_termination_view = consumer_loan_status_termination_view[~cond]
# Create change view from CONSUMER_LOAN_STATUS table to track changes in TERMINATION_TIMESTAMP.
consumer_loan_status_termination_view = catalog.get_table(
"CONSUMER_LOAN_STATUS"
).get_change_view(track_changes_column="TERMINATION_TIMESTAMP")
cond = consumer_loan_status_termination_view["new_TERMINATION_TIMESTAMP"].isnull()
consumer_loan_status_termination_view = consumer_loan_status_termination_view[~cond]
Join views¶
In [8]:
Copied!
# Join CONSUMER_LOAN_STATUS view to consumer_loan_status_termination_view view.
consumer_loan_status_termination_view = consumer_loan_status_termination_view.join(
consumer_loan_status_view, rprefix="Consumer Loan_"
)
# Join CONSUMER_LOAN_STATUS view to consumer_loan_status_termination_view view.
consumer_loan_status_termination_view = consumer_loan_status_termination_view.join(
consumer_loan_status_view, rprefix="Consumer Loan_"
)
Do window aggregation from consumer_loan_status_termination_view¶
See SDK reference for features
See SDK reference to groupby a view
See SDK reference to do aggregation over time
In [9]:
Copied!
# Group consumer_loan_status_termination_view view by Client entity (Consumer Loan_CLIENT_ID).
consumer_loan_status_termination_view_by_client = (
consumer_loan_status_termination_view.groupby(["Consumer Loan_CLIENT_ID"])
)
# Group consumer_loan_status_termination_view view by Client entity (Consumer Loan_CLIENT_ID).
consumer_loan_status_termination_view_by_client = (
consumer_loan_status_termination_view.groupby(["Consumer Loan_CLIENT_ID"])
)
In [10]:
Copied!
# Get Max of Consumer Loan_AMT_ANNUITY for the Client over time.
client_max_of_consumer_loan_terminations_consumer_loan_amt_annuitys_104w = (
consumer_loan_status_termination_view_by_client.aggregate_over(
"Consumer Loan_AMT_ANNUITY",
method="max",
feature_names=[
"CLIENT_Max_of_Consumer_Loan_terminations_Consumer_Loan_AMT_ANNUITYs_104w"
],
windows=["104w"],
)["CLIENT_Max_of_Consumer_Loan_terminations_Consumer_Loan_AMT_ANNUITYs_104w"]
)
# Get Max of Consumer Loan_AMT_ANNUITY for the Client over time.
client_max_of_consumer_loan_terminations_consumer_loan_amt_annuitys_104w = (
consumer_loan_status_termination_view_by_client.aggregate_over(
"Consumer Loan_AMT_ANNUITY",
method="max",
feature_names=[
"CLIENT_Max_of_Consumer_Loan_terminations_Consumer_Loan_AMT_ANNUITYs_104w"
],
windows=["104w"],
)["CLIENT_Max_of_Consumer_Loan_terminations_Consumer_Loan_AMT_ANNUITYs_104w"]
)
Save features¶
In [11]:
Copied!
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits.save()
client_max_of_consumer_loan_terminations_consumer_loan_amt_annuitys_104w.save()
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits.save()
client_max_of_consumer_loan_terminations_consumer_loan_amt_annuitys_104w.save()
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s) Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
Add description¶
In [12]:
Copied!
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits.update_description(
"Consumer Loan GOODS_CATEGORY with Lowest sum of active Consumer Loans "
"AMT_CREDITs for the Client."
)
client_max_of_consumer_loan_terminations_consumer_loan_amt_annuitys_104w.update_description(
"Max of Consumer Loan terminations Consumer Loan_AMT_ANNUITYs for the "
"Client over a 104w period."
)
client_consumer_loan_goods_category_with_lowest_sum_of_active_consumer_loans_amt_credits.update_description(
"Consumer Loan GOODS_CATEGORY with Lowest sum of active Consumer Loans "
"AMT_CREDITs for the Client."
)
client_max_of_consumer_loan_terminations_consumer_loan_amt_annuitys_104w.update_description(
"Max of Consumer Loan terminations Consumer Loan_AMT_ANNUITYs for the "
"Client over a 104w period."
)
In [ ]:
Copied!