featurebyte.view.groupby¶
Description¶
The groupby method of a view returns a GroupBy object that groups data by one or more columns representing entities (specified in the by_keys parameter). Within each entity or group of entities, the GroupBy object applies aggregation function(s) to the data.
The grouping keys determine the primary entity for the declared features in the aggregation function.
Moreover, the groupby method's category parameter allows you to define a categorical column, which can be used to generate Cross Aggregate Features. These features involve aggregating data across categories of the categorical column, enabling the extraction of patterns in an entity across these categories. For instance, you can calculate the amount spent by a customer on each product category during a specific time period using this approach.
Parameters¶
- by_keys: Union[str, List[str]]
Specifies the column or list of columns by which the data should be grouped. These columns must correspond to entities registered in the catalog. If this parameter is set to an empty list, the data will not be grouped.
- category: Union[str, NoneType]
Optional category parameter to enable aggregation across categories. To use this parameter, provide the name of a column in the View that represents a categorical column.
Returns¶
- GroupBy
A GroupBy object that contains information about the groups.
Examples¶
Groupby for Aggregate features.
>>> import featurebyte as fb
>>> items_view = catalog.get_view("INVOICEITEMS")
>>> # Group items by the column GroceryCustomerGuid that references the customer entity
>>> items_by_customer = items_view.groupby("GroceryCustomerGuid")
>>> # Declare features that measure the discount received by customer
>>> customer_discounts = items_by_customer.aggregate_over(
... "Discount",
... method=fb.AggFunc.SUM,
... feature_names=["CustomerDiscounts_7d", "CustomerDiscounts_28d"],
... fill_value=0,
... windows=["7d", "28d"],
... )
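To make the windowed aggregation above more concrete, here is a minimal plain-Python sketch of what a per-customer sum over trailing 7-day and 28-day windows means. It is only an illustration and not how FeatureByte computes features: the `windowed_sum` helper, the `Timestamp` column, and the toy rows are all hypothetical, and scheduling details of feature computation are ignored.

```python
from datetime import datetime, timedelta

# Hypothetical toy rows; "Timestamp" is an assumed column name.
rows = [
    {"GroceryCustomerGuid": "c1", "Timestamp": datetime(2023, 1, 30), "Discount": 1.0},
    {"GroceryCustomerGuid": "c1", "Timestamp": datetime(2023, 1, 10), "Discount": 2.0},
    {"GroceryCustomerGuid": "c2", "Timestamp": datetime(2023, 1, 5), "Discount": 0.5},
]


def windowed_sum(rows, key, value, point_in_time, window, fill_value=0):
    """Sum `value` per `key` over rows in [point_in_time - window, point_in_time)."""
    start = point_in_time - window
    sums = {}
    for row in rows:
        if start <= row["Timestamp"] < point_in_time:
            sums[row[key]] = sums.get(row[key], 0.0) + row[value]
    # Keys with no rows inside the window receive fill_value.
    all_keys = {row[key] for row in rows}
    return {k: sums.get(k, fill_value) for k in all_keys}


point_in_time = datetime(2023, 2, 1)
discounts_7d = windowed_sum(
    rows, "GroceryCustomerGuid", "Discount", point_in_time, timedelta(days=7)
)
discounts_28d = windowed_sum(
    rows, "GroceryCustomerGuid", "Discount", point_in_time, timedelta(days=28)
)
```

In this sketch, customer `c2` has no rows in the 7-day window and therefore receives the fill value, which mirrors the intent of `fill_value=0` in the example above.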
Groupby for Cross Aggregate features.
>>> # Join product view to items view
>>> product_view = catalog.get_view("GROCERYPRODUCT")
>>> items_view = items_view.join(product_view)
>>> # Group items by the column GroceryCustomerGuid that references the customer entity
>>> # And use ProductGroup as the column to perform operations across
>>> items_by_customer_across_product_group = items_view.groupby(
... by_keys="GroceryCustomerGuid", category="ProductGroup"
... )
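The shape of a Cross Aggregate feature can be pictured with a small plain-Python sketch: for each customer, the result maps every product group to an aggregated value. This is a conceptual illustration only, not FeatureByte's implementation; the `cross_aggregate_sum` helper, the `TotalCost` column, and the toy rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows; "TotalCost" is an assumed column name.
rows = [
    {"GroceryCustomerGuid": "c1", "ProductGroup": "Fruit", "TotalCost": 2.0},
    {"GroceryCustomerGuid": "c1", "ProductGroup": "Dairy", "TotalCost": 3.5},
    {"GroceryCustomerGuid": "c1", "ProductGroup": "Fruit", "TotalCost": 1.0},
    {"GroceryCustomerGuid": "c2", "ProductGroup": "Dairy", "TotalCost": 4.0},
    {"GroceryCustomerGuid": "c2", "ProductGroup": "Dairy", "TotalCost": 0.5},
]


def cross_aggregate_sum(rows, key, category, value):
    """Return {key: {category: sum of value}} for the given rows."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row[key]][row[category]] += row[value]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {k: dict(per_category) for k, per_category in totals.items()}


spend_by_group = cross_aggregate_sum(
    rows, key="GroceryCustomerGuid", category="ProductGroup", value="TotalCost"
)
```

Here the grouping key plays the role of `by_keys` and the category column plays the role of `category`: each customer ends up with one value per product group rather than a single number.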
See Also¶
- GroupBy: GroupBy object
- GroupBy.aggregate: Create feature from grouped aggregates
- GroupBy.aggregate_over: Create features from grouped aggregates over different time windows