featurebyte.view.GroupBy.aggregate

aggregate(
    value_column: Optional[str]=None,
    method: Optional[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"]]=None,
    feature_name: Optional[str]=None,
    fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]=None,
    skip_fill_na: Optional[bool]=None
) -> Feature

Description

The aggregate method of a GroupBy class instance returns a Simple Aggregate Feature object. This object aggregates data from the column specified by the value_column parameter using the aggregation method provided by the method parameter, without taking into account the order or sequence of the data. The primary entity of the Feature is determined by the grouping key of the GroupBy instance.
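
As a conceptual illustration only, the grouping and reduction described above can be sketched in plain Python. This is not featurebyte's implementation or API: the `simple_aggregate` helper, the `REDUCERS` table, the sample rows, and the `Quantity` column are all hypothetical, and only a few of the supported methods are covered.

```python
from collections import defaultdict

# Hypothetical item rows; "Quantity" is an assumed column name.
rows = [
    {"GroceryInvoiceGuid": "inv-1", "Quantity": 2.0},
    {"GroceryInvoiceGuid": "inv-1", "Quantity": 1.0},
    {"GroceryInvoiceGuid": "inv-2", "Quantity": 5.0},
]

# A subset of aggregation methods, mapped to plain Python reducers.
REDUCERS = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
}


def simple_aggregate(rows, key, value_column, method):
    """Group rows by `key` and reduce `value_column`; row order is irrelevant."""
    groups = defaultdict(list)
    for row in rows:
        # "count" needs no value column: each row contributes one entry.
        groups[row[key]].append(1 if value_column is None else row[value_column])
    return {group_key: REDUCERS[method](values) for group_key, values in groups.items()}


item_count = simple_aggregate(rows, "GroceryInvoiceGuid", None, "count")
total_quantity = simple_aggregate(rows, "GroceryInvoiceGuid", "Quantity", "sum")
```

Each key of the returned dictionary plays the role of one entity value (here, one invoice), and the associated value is what the feature would hold for that entity.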

If the GroupBy class instance involves computation across a categorical column, the resulting Feature object is a Simple Cross Aggregate Feature. In this scenario, the feature value after materialization is a dictionary with keys representing the categories of the categorical column and their corresponding values indicating the aggregated values for each category.
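
To make the shape of a cross aggregate value concrete, here is a minimal plain-Python sketch. It is not featurebyte code: the `cross_aggregate_sum` helper, the sample rows, and the `ProductGroup` and `Quantity` columns are hypothetical, and only a sum is shown.

```python
from collections import defaultdict

# Hypothetical item rows; "ProductGroup" stands in for the categorical column.
rows = [
    {"GroceryInvoiceGuid": "inv-1", "ProductGroup": "Fruit", "Quantity": 2.0},
    {"GroceryInvoiceGuid": "inv-1", "ProductGroup": "Dairy", "Quantity": 1.0},
    {"GroceryInvoiceGuid": "inv-1", "ProductGroup": "Fruit", "Quantity": 3.0},
    {"GroceryInvoiceGuid": "inv-2", "ProductGroup": "Dairy", "Quantity": 4.0},
]


def cross_aggregate_sum(rows, key, category, value_column):
    """Sum `value_column` per (key, category) pair, nested as key -> {category: sum}."""
    result = defaultdict(lambda: defaultdict(float))
    for row in rows:
        result[row[key]][row[category]] += row[value_column]
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {group_key: dict(per_category) for group_key, per_category in result.items()}


feature_values = cross_aggregate_sum(rows, "GroceryInvoiceGuid", "ProductGroup", "Quantity")
# feature_values["inv-1"] is {"Fruit": 5.0, "Dairy": 1.0}
```

Categories that never occur for a given key (such as "Fruit" for "inv-2") are simply absent from that key's dictionary in this sketch.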

Use fill_value to supply a default feature value when the column being aggregated is empty.
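
The effect of a default value can be pictured with the following plain-Python sketch. It rests on an assumption that is not confirmed by this page: that "empty" means the aggregation produced no value for an entity. The `apply_fill_value` helper and the sample data are hypothetical and are not part of featurebyte.

```python
def apply_fill_value(feature_values, entity_keys, fill_value=None):
    """Return one value per requested key, substituting fill_value when none exists."""
    filled = {}
    for key in entity_keys:
        value = feature_values.get(key)  # None if the key is absent or has no value
        filled[key] = fill_value if value is None else value
    return filled


# Assumed aggregation output: "inv-2" has no value and "inv-3" has no rows at all.
aggregated = {"inv-1": 3.0, "inv-2": None}
filled = apply_fill_value(aggregated, ["inv-1", "inv-2", "inv-3"], fill_value=0.0)
```

With `fill_value=None`, the default, the missing entries stay as `None` in this sketch.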

Additional transformations can be performed on the Feature object. The Feature object is added to the catalog only when it is explicitly saved.

To avoid time leakage, simple aggregation is supported only for Item views, and only when the grouping key is the event key of the Item view. An example of such a feature is the count of items in an order.

Parameters

  • value_column: Optional[str]
    Column to be aggregated

  • method: Optional[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"]]
    Aggregation method

  • feature_name: Optional[str]
    Output feature name

  • fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]
    Value to fill if the value in the column is empty

  • skip_fill_na: Optional[bool]
    Whether to skip filling NaN values. The fill operation is skipped by default because it is expensive during feature serving.

Returns

  • Feature

Examples

>>> import featurebyte as fb
>>> # Assumes `catalog` is an already activated featurebyte catalog
>>> items_view = catalog.get_view("INVOICEITEMS")
>>> # Group items by the column GroceryInvoiceGuid that references the invoice entity
>>> items_by_invoice = items_view.groupby("GroceryInvoiceGuid")
>>> # Get the number of items in each invoice; count needs no value column
>>> invoice_item_count = items_by_invoice.aggregate(
...   value_column=None,
...   method=fb.AggFunc.COUNT,
...   feature_name="InvoiceItemCount",
... )