View
A View object is a local virtual table that can be modified and joined to other views to prepare data before feature definition. A view does not contain any data of its own, but instead retrieves data from the underlying tables each time it is queried. It doesn't modify the data in those tables either. The view object works similar to a SQL view.
There are five distinct types of View objects:
View Type | Derived from | Purpose |
---|---|---|
EventView | EventTable | Used to define Lookup features for event entities, create Aggregate Over a Window features for other entities, or augment related ItemView. |
ItemView | ItemTable | Used to define Lookup features for item entities, create Simple Aggregate features for event entities, or create Aggregate Over a Window features for other entities. |
DimensionView | DimensionTable | Used to define Lookup features and augment other views through joins. |
SCDView | SCDTable | Used to define Lookup features for the entity represented by the table's natural key or to create Aggregate As At features for other entities. Can also be used to augment views of event or item views through joins. |
ChangeView | SCDTable | Used to create Aggregate Over a Window features on changes happening in a specific attribute of the natural key of the SCD table. |
Deriving Views¶
View objects are obtained from Table objects using the get_view()
or get_change_view()
methods.
# Create an EventView from an EventTable
invoice_table = catalog.get_table("GROCERYINVOICE")
invoice_view = invoice_table.get_view()
# Alternatively, get the view directly from the catalog
invoice_view = catalog.get_view("GROCERYINVOICE")
# Create a ChangeView from an SCDTable
customer_table = catalog.get_table("GROCERYCUSTOMER")
address_changed_view = customer_table.get_change_view(
track_changes_column="StreetAddress"
)
Two view construction modes are available:
- Auto (default): Automatically cleans data according to default operations specified for each column within the table and excludes special columns not meant for feature engineering.
- Manual: Allows custom cleaning operations without applying default cleaning operations. Note that the manual mode is not available in the
get_view()
method of the Catalog object.
# Obtain view in manual mode
item_table = catalog.get_table("INVOICEITEMS")
items_view = items_table.get_view(
event_suffix=None,
view_mode="manual",
drop_column_names=[],
column_cleaning_operations=[
ColumnCleaningOperation(
column_name="Discount",
cleaning_operations=[
MissingValueImputation(imputed_value=0),
ValueBeyondEndpointImputation(
type="less_than", end_point=0, imputed_value=0
),
],
)
],
event_drop_column_names=["record_available_at"],
event_column_cleaning_operations=[],
event_join_column_names=[
"Timestamp",
"GroceryInvoiceGuid",
"GroceryCustomerGuid",
],
)
Exploring Views¶
You can conduct exploratory analysis on view columns using the describe()
, preview()
and sample()
methods. Descriptive statistics, preview, and sample are performed after applying cleaning operations set at the table level or during view creation if created in manual mode.
# Obtain descriptive statistics for a view
invoice_view.describe()
# Preview a selection of rows from the view
df = invoice_view.preview(limit=20)
# Sample random rows from the view based on a specified time range, size, and seed
df = invoice_view.sample(
from_timestamp=pd.Timestamp('2023-04-01'),
to_timestamp=pd.Timestamp('2023-05-01'), size=100, seed=23
)
Transforming Views¶
You can modify View objects by adding new columns by performing various operations on the view columns. For a comprehensive guide on view column operations, refer to the ViewColumn object documentation.
customer_view = catalog.get_view("GROCERYCUSTOMER")
# Extract operating system from BrowserUserAgent column
customer_view["OperatingSystemIsWindows"] = \
customer_view.BrowserUserAgent.str.contains("Windows")
To add new columns to a view, you can use the join()
method to combine it with another view.
Note
To join two views, use the join() method of the left view and specify the right view object in the other_view parameter. The method will match rows from both views based on a shared key, which is either the primary key of the right view or the natural key if the right view is a Slowly Changing Dimension (SCD) view.
If the shared key identifies an entity that is referenced in the left view or the column name of the shared key is the same in both views, the join() method will automatically identify the column in the left view to use for the join.
By default, a left join is performed, and the resulting view will have the same number of rows as the left view. However, you can set the how
parameter to 'inner' to perform an inner join. In this case, the resulting view will only contain rows where there is a match between the columns in both tables.
When the right view is an SCD view, the event timestamp of the left view determines which record of the right view to join.
Consider this example:
- The GroceryCustomerGuid column in invoice_view is tagged with the customer entity
- The GroceryCustomerGuid natural key in customer_view is tagged with the customer entity
# customer_view can be joined to invoice_view based on their related entity ‘customer’.
invoice_view = invoice_view.join(
customer_view[["Gender", "State"]], rsuffix="_Customer"
)
You could explicitly specify the column in the calling view for the join:
invoice_view = invoice_view.join(
customer_view[["Gender", "State"]], rsuffix="_Customer",
on=”GroceryCustomerGuid”
)
For ItemView objects, the event timestamp and columns representing entities in the related Event table are automatically added. Additional attributes can be joined using the join_event_table_attributes()
method.
# Add invoice amount to items view
items_view.join_event_table_attributes(
['Amount'], event_suffix='_invoice_total'
)
Simple aggregate features obtained from an ItemView object can be added as a column to the corresponding EventView using the add_feature()
method. Once the feature is integrated, the new column can be aggregated over a time window to create Aggregate Over a Window features.
You can also create new views from subsets of existing views.
Accessing Source Table Raw Data in Views¶
To access raw data from the source table when working with views, you can use the raw
property of the view. This allows you to interact directly with the original data, which is useful for tasks such as adding new columns based on issues in the original data.
Defining Features from Views¶
There are three ways to define features from a view:
- Directly use view attributes as features for the entity identified by the view's primary key or the natural key of an SCDView object using the
as_feature()
oras_features()
methods. This returns a Lookup feature. -
Determine the level of analysis for an Aggregate feature by grouping view rows based on columns representing one or more entities in the view using the
groupby()
method. Then, apply aggregations to define Aggregate features.# Group items by the column GroceryCustomerGuid that references the customer entity discounted_items_by_customer = items_with_discount.groupby("GroceryCustomerGuid") # Declare features that measure the discount received by the customer customer_discount = discounted_items_by_customer.aggregate_over( "Discount", method=fb.AggFunc.SUM, feature_names=["CustomerDiscounts_7d", "CustomerDiscounts_28d"], fill_value=0, windows=['7d', '28d'] )
-
Determine the level of analysis for a Cross Aggregate feature involving aggregation operations across categories by grouping view rows based on columns representing one or more entities in the view, and using a categorical column for performing operations across categories using the
groupby()
method. Then, apply aggregations to define Cross Aggregate features.# Group items by the column GroceryCustomerGuid that references the customer entity # and use GroceryProductGuid as the column to perform operations across discounted_items_by_customer_across_product_id = items_with_discount.groupby( by_keys = "GroceryCustomerGuid", category="GroceryProductGuid" ) # Declare features that measure the total discount per customer across products customer_discount_across_product = \ discounted_items_by_customer_across_product_id.aggregate_over( "Discount", method=fb.AggFunc.SUM, feature_names=[ "CustomerDiscounts_per_product_7d", "CustomerDiscounts_per_product_28d" ], fill_value=0, windows=['7d', '28d'] )