
featurebyte.view.GroupBy.aggregate_over

aggregate_over(
value_column: Union[str, NoneType]=None,
method: Union[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"], NoneType]=None,
windows: Union[List[Union[str, NoneType]], NoneType]=None,
feature_names: Union[List[str], NoneType]=None,
timestamp_column: Union[str, NoneType]=None,
feature_job_setting: Union[FeatureJobSetting, NoneType]=None,
fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]=None,
skip_fill_na: bool=False
) -> FeatureGroup

Description

The aggregate_over method of a GroupBy instance returns a FeatureGroup containing Aggregate Over a Window Feature objects. These Feature objects aggregate data from the column specified by the value_column parameter, using the aggregation method provided by the method parameter. The aggregation is performed within specific time frames prior to the point-in-time indicated in the feature request. The time frames are defined by the windows parameter. Each Feature object within the FeatureGroup corresponds to a window in the list provided by the windows parameter. The primary entity of the Feature is determined by the grouping key of the GroupBy instance.
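
The windowing idea can be illustrated with a small pure-Python sketch. It is a conceptual model only, not FeatureByte's implementation: the rows are hypothetical, window_sum is an illustrative helper, and the assumption that a window covers timestamps from point_in_time minus the window size (inclusive) up to point_in_time (exclusive) is ours. Feature job settings such as the blind spot are ignored.

```python
from datetime import datetime, timedelta

# Hypothetical invoice items for a single customer: (timestamp, discount).
rows = [
    (datetime(2023, 1, 1), 1.0),
    (datetime(2023, 1, 20), 2.0),
    (datetime(2023, 1, 27), 0.5),
]


def window_sum(rows, point_in_time, window):
    """Sum values whose timestamp lies in [point_in_time - window, point_in_time).

    The half-open boundary is an assumption made for this illustration.
    """
    start = point_in_time - window
    return sum(value for ts, value in rows if start <= ts < point_in_time)


# One feature value per window, mirroring "one Feature per entry in windows".
point_in_time = datetime(2023, 1, 28)
features = {
    "CustomerDiscounts_7d": window_sum(rows, point_in_time, timedelta(days=7)),
    "CustomerDiscounts_28d": window_sum(rows, point_in_time, timedelta(days=28)),
}
print(features)
```

Each window yields its own value from the same rows, which is why the number of feature names is expected to match the number of windows.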

These features are often used for analyzing event and item data.

If the GroupBy instance involves computation across a categorical column, the resulting Feature object is a Cross Aggregate Over a Window Feature. In this scenario, the feature value after materialization is a dictionary with keys representing the categories of the categorical column and their corresponding values indicating the aggregated values for each category.
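
To make the shape of a cross aggregate value concrete, here is a minimal pure-Python sketch. The data and the cross_window_sum helper are hypothetical, the window boundary (start inclusive, point-in-time exclusive) is an assumption, and the code does not reflect how FeatureByte computes the feature internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical joined items for one customer: (timestamp, product_group, total_cost).
items = [
    (datetime(2023, 1, 5), "Fruits", 4.0),
    (datetime(2023, 1, 10), "Dairy", 2.5),
    (datetime(2023, 1, 25), "Fruits", 1.5),
    (datetime(2022, 11, 30), "Dairy", 9.0),  # outside the 28-day window below
]


def cross_window_sum(items, point_in_time, window):
    """Return {category: sum of values} for rows inside the window."""
    start = point_in_time - window
    totals = defaultdict(float)
    for ts, category, value in items:
        if start <= ts < point_in_time:  # assumed half-open window
            totals[category] += value
    return dict(totals)


customer_inventory_28d = cross_window_sum(
    items, datetime(2023, 1, 28), timedelta(days=28)
)
print(customer_inventory_28d)
```

The result is a single dictionary per entity, keyed by category, rather than one scalar value.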

You can use the fill_value parameter to fill the feature value with a default value if the column being aggregated is empty.

Additional transformations can be performed on the Feature objects, and the Feature objects within the FeatureGroup are added to the catalog only when explicitly saved.

Parameters

  • value_column: Union[str, NoneType]
    Column to be aggregated

  • method: Union[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"], NoneType]
    Aggregation method

  • windows: Union[List[Union[str, NoneType]], NoneType]
    List of aggregation window sizes. Use None to indicate an unbounded window size (only applicable to the "latest" method). The format of a window size is "{size}{unit}", where size is a positive integer and unit is one of the following:

    "ns": nanosecond
    "us": microsecond
    "ms": millisecond
    "s": second
    "m": minute
    "h": hour
    "d": day
    "w": week

    Note: Window sizes must be multiples of the feature job frequency.

  • feature_names: Union[List[str], NoneType]
    Output feature names

  • timestamp_column: Union[str, NoneType]
    Timestamp column used to specify the window (if not specified, event table timestamp is used)

  • feature_job_setting: Union[FeatureJobSetting, NoneType]
    Dictionary containing the blind_spot, frequency, and time_modulo_frequency keys, which are the feature job setting parameters

  • fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]
    Value to fill if the value in the column is empty

  • skip_fill_na: bool
    default: False
    Whether to skip filling NaN values
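
The "{size}{unit}" window format and the multiple-of-frequency rule described for the windows parameter can be checked ahead of time with a small helper. This is only a sketch: window_to_nanoseconds and is_multiple_of_frequency are hypothetical names, not part of FeatureByte, and expressing the feature job frequency in the same "{size}{unit}" format is an assumption made for illustration.

```python
import re

# Nanoseconds per unit, following the unit list in the windows parameter.
UNIT_NANOSECONDS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
    "w": 7 * 86_400 * 1_000_000_000,
}

WINDOW_PATTERN = re.compile(r"(\d+)(ns|us|ms|s|m|h|d|w)")


def window_to_nanoseconds(window):
    """Convert a "{size}{unit}" string such as "7d" to integer nanoseconds."""
    match = WINDOW_PATTERN.fullmatch(window)
    if match is None or int(match.group(1)) <= 0:
        raise ValueError(f"invalid window size: {window!r}")
    return int(match.group(1)) * UNIT_NANOSECONDS[match.group(2)]


def is_multiple_of_frequency(window, frequency):
    """Check that the window is a whole multiple of the job frequency."""
    return window_to_nanoseconds(window) % window_to_nanoseconds(frequency) == 0


print(is_multiple_of_frequency("28d", "1h"))
print(is_multiple_of_frequency("90m", "1h"))
```

Working in integer nanoseconds avoids floating-point rounding when comparing small units such as "ns" with large ones such as "w".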

Returns

  • FeatureGroup

Examples

Sum of discounts by grocerycustomer entity over the past 7 and 28 days.

>>> import featurebyte as fb
>>> # Assumes `catalog` is an already activated Catalog object
>>> items_view = catalog.get_view("INVOICEITEMS")
>>> # Group items by the column GroceryCustomerGuid that references the customer entity
>>> items_by_customer = items_view.groupby("GroceryCustomerGuid")
>>> # Declare features that measure the discount received by customer
>>> customer_discounts = items_by_customer.aggregate_over(
...   "Discount",
...   method=fb.AggFunc.SUM,
...   feature_names=["CustomerDiscounts_7d", "CustomerDiscounts_28d"],
...   fill_value=0,
...   windows=['7d', '28d']
... )

Sum spent by grocerycustomer entity across product group over the past 28 days.

>>> # Join product view to items view
>>> product_view = catalog.get_view("GROCERYPRODUCT")
>>> items_view = items_view.join(product_view)
>>> # Group items by the column GroceryCustomerGuid that references the customer entity
>>> # And use ProductGroup as the column to perform operations across
>>> items_by_customer_across_product_group = items_view.groupby(
...   by_keys="GroceryCustomerGuid", category="ProductGroup"
... )
>>> # Cross Aggregate feature of the customer purchases across product group over the past 4 weeks
>>> customer_inventory_28d = items_by_customer_across_product_group.aggregate_over(
...   "TotalCost",
...   method=fb.AggFunc.SUM,
...   feature_names=["CustomerInventory_28d"],
...   windows=['28d']
... )

See Also