
featurebyte.view.GroupBy.aggregate_asat

aggregate_asat(
value_column: Union[str, NoneType]=None,
method: Union[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"], NoneType]=None,
feature_name: Union[str, NoneType]=None,
offset: Union[str, NoneType]=None,
backward: bool=True,
fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]=None,
skip_fill_na: bool=False
) -> Feature

Description

The aggregate_asat method of a GroupBy instance returns an Aggregate "as at" Feature object. The object aggregates data from the column specified by the value_column parameter using the aggregation method provided by the method parameter. By default, the aggregation is done on rows active at the point-in-time indicated in the feature request. The primary entity of the Feature is determined by the grouping key of the GroupBy instance.

These aggregation operations are exclusively available for Slowly Changing Dimension (SCD) views, and the grouping key used in the GroupBy instance should not be the natural key of the SCD view.

For instance, an aggregate "as at" feature from a Credit Cards table could be the count of credit cards held by a customer at the point-in-time indicated in the feature request.

If an offset is defined, the aggregation uses the rows of the SCD view that were active at the point-in-time indicated in the feature request, shifted by the specified offset (backward by default).
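The "as at" semantics, with and without an offset, can be illustrated with a minimal pure-Python sketch. This is not how featurebyte computes the feature; the CardRow record, its valid_from / valid_to fields, and the count_asat helper are hypothetical names used only to show which rows count as active.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class CardRow(NamedTuple):
    # Hypothetical SCD record: one row per version of a card.
    card_id: str
    customer_id: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the row is still active


def count_asat(rows, customer_id, point_in_time, offset=timedelta(0)):
    """Count one customer's rows that are active at point_in_time - offset."""
    as_at = point_in_time - offset
    return sum(
        1
        for r in rows
        if r.customer_id == customer_id
        and r.valid_from <= as_at
        and (r.valid_to is None or as_at < r.valid_to)
    )


rows = [
    CardRow("card-1", "cust-A", datetime(2023, 1, 1), None),
    CardRow("card-2", "cust-A", datetime(2023, 3, 1), datetime(2023, 6, 1)),
    CardRow("card-3", "cust-B", datetime(2023, 2, 1), None),
]

request_time = datetime(2023, 7, 1)
# At the request time only card-1 is still active for cust-A.
print(count_asat(rows, "cust-A", request_time))  # 1
# Twelve weeks earlier (2023-04-08) card-2 was also active.
print(count_asat(rows, "cust-A", request_time, timedelta(weeks=12)))  # 2
```

The grouping key here is the customer, not the card, which mirrors the requirement that the grouping key differ from the natural key of the SCD view.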

If the GroupBy instance involves computation across a categorical column, the returned Feature object is a Cross Aggregate "as at" Feature. In this scenario, the feature value after materialization is a dictionary with keys representing the categories of the categorical column and their corresponding values indicating the aggregated values for each category.
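To make the shape of a Cross Aggregate "as at" value concrete, the following sketch builds the per-category dictionary by hand. It is an illustration only, under the assumption of a hypothetical card_type category column and credit_limit value column; cross_sum_asat is not a featurebyte function.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical SCD rows:
# (customer_id, card_type, credit_limit, valid_from, valid_to)
rows = [
    ("cust-A", "gold", 5000, datetime(2023, 1, 1), None),
    ("cust-A", "gold", 3000, datetime(2023, 2, 1), None),
    ("cust-A", "basic", 1000, datetime(2023, 3, 1), datetime(2023, 6, 1)),
    ("cust-A", "basic", 1500, datetime(2023, 4, 1), None),
]


def cross_sum_asat(rows, customer_id, as_at):
    """Sum the value column per category over rows active at as_at."""
    totals = defaultdict(int)
    for cust, category, value, valid_from, valid_to in rows:
        active = valid_from <= as_at and (valid_to is None or as_at < valid_to)
        if cust == customer_id and active:
            totals[category] += value
    return dict(totals)


result = cross_sum_asat(rows, "cust-A", datetime(2023, 7, 1))
print(result)  # {'gold': 8000, 'basic': 1500}
```

Each key is a category and each value is the aggregate over the active rows in that category, which is the dictionary shape described above.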

You may use the fill_value parameter to supply a default feature value for cases where the column to be aggregated is empty.

Additional transformations can be applied to the Feature object. The Feature object is added to the catalog only when it is explicitly saved.

Parameters

  • value_column: Union[str, NoneType]
    Column to be aggregated

  • method: Union[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"], NoneType]
    Aggregation method

  • feature_name: Union[str, NoneType]
    Output feature name

  • offset: Union[str, NoneType]
    Optional offset to apply to the point in time column in the feature request. The aggregation result will be as at the point in time adjusted by this offset. Format of offset is "{size}{unit}", where size is a positive integer and unit is one of the following:

    "ns": nanosecond
    "us": microsecond
    "ms": millisecond
    "s": second
    "m": minute
    "h": hour
    "d": day
    "w": week

  • backward: bool
    default: True
    Whether the offset should be applied backward or forward

  • fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]
    Value to fill if the value in the column is empty

  • skip_fill_na: bool
    default: False
    Whether to skip filling NaN values
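The "{size}{unit}" offset format and the backward flag described above can be sketched as follows. This is a hypothetical helper for illustration, not featurebyte's own parser; it works in integer nanoseconds because the "ns" unit is finer than Python's timedelta resolution, and it assumes that backward=True subtracts the offset from the point in time while backward=False adds it.

```python
import re

# Nanoseconds per unit, for the units listed in the offset parameter.
_NS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
    "w": 7 * 86_400 * 1_000_000_000,
}

# A positive integer (no leading zero) followed by one of the units.
_OFFSET_RE = re.compile(r"([1-9]\d*)(ns|us|ms|s|m|h|d|w)")


def offset_to_ns(offset: str) -> int:
    """Convert an offset such as "12w" into a number of nanoseconds."""
    match = _OFFSET_RE.fullmatch(offset)
    if match is None:
        raise ValueError(f"invalid offset: {offset!r}")
    size, unit = match.groups()
    return int(size) * _NS_PER_UNIT[unit]


def apply_offset(point_in_time_ns: int, offset: str, backward: bool = True) -> int:
    """Shift a timestamp (in nanoseconds) backward or forward by the offset."""
    delta = offset_to_ns(offset)
    return point_in_time_ns - delta if backward else point_in_time_ns + delta


print(offset_to_ns("12w"))    # 7257600000000000
print(offset_to_ns("500ms"))  # 500000000
```

Strings such as "0d" or "12 w" are rejected by this sketch, since the size must be a positive integer written directly before the unit.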

Returns

  • Feature

Examples

Count number of active cards per customer at a point-in-time.

>>> import featurebyte as fb
>>> # credit_card_accounts is assumed to be an existing SCD view
>>> # Filter active cards
>>> cond = credit_card_accounts["status"] == "active"
>>> # Group by customer
>>> active_credit_card_by_cust = credit_card_accounts[cond].groupby(
...   "CustomerID"
... )
>>> feature = active_credit_card_by_cust.aggregate_asat(
...   method=fb.AggFunc.COUNT,
...   feature_name="Number of Active Credit Cards",
... )

Count number of active cards per customer 12 weeks prior to a point-in-time.

>>> feature_12w_before = active_credit_card_by_cust.aggregate_asat(
...   method=fb.AggFunc.COUNT,
...   feature_name="Number of Active Credit Cards 12 w before",
...   offset="12w"
... )