View
A View object is a local virtual table that can be modified and joined with other views to prepare data before feature definition. It does not store any data itself; instead, it retrieves data from the underlying tables each time it is queried. Like a SQL view, it serves as an abstraction layer that enables flexible data manipulation without altering the source tables.
Types of View Objects¶
Each View type corresponds to a specific table type and defines how data can be queried, joined, and transformed to create features. View objects form the foundation for building Lookup, Aggregate, and Cross Aggregate features.
There are seven distinct types of View objects:
View Type | Derived From | Purpose |
---|---|---|
EventView | EventTable | Used to define Lookup features for event entities, create Aggregate Over a Window features for other entities, or to augment related ItemView. |
ItemView | ItemTable | Used to define Lookup features for item entities, create Non-Temporal Aggregate features for event entities, or create Aggregate Over a Window features for other entities. |
SnapshotsView | SnapshotsTable | Used to create Aggregate Over a Window features, define Lookup features for the entity represented by the table’s series ID, or create Aggregate As At features for other entities. Can also be used to augment views through joins. |
TimeSeriesView | TimeSeriesTable | Used to create Aggregate Over a Window features. |
DimensionView | DimensionTable | Used to define Lookup features or to augment other views through joins. |
SCDView | SCDTable | Used to define Lookup features for the entity represented by the table’s natural key, or to create Aggregate As At features for other entities. Can also be used to augment views through joins. |
ChangeView | SCDTable | Used to create Aggregate Over a Window features on changes occurring in a specific attribute of the natural key of an SCD table. |
Each View object can be further joined, filtered, or transformed to prepare datasets for feature declaration.
Deriving Views¶
View objects are obtained from Table objects using the get_view()
or get_change_view()
methods.
# Create an EventView from an EventTable
invoice_table = catalog.get_table("GROCERYINVOICE")
invoice_view = invoice_table.get_view()
# Alternatively, get the view directly from the catalog
invoice_view = catalog.get_view("GROCERYINVOICE")
# Create a ChangeView from an SCDTable
customer_table = catalog.get_table("GROCERYCUSTOMER")
address_changed_view = customer_table.get_change_view(
track_changes_column="StreetAddress"
)
Two view construction modes are available:
- Auto (default): Automatically cleans data according to default operations specified for each column within the table and excludes special columns not meant for feature engineering.
- Manual: Allows custom cleaning operations without applying default cleaning operations. Note that the manual mode is not available in the
get_view()
method of the Catalog object.
# Obtain view in manual mode
item_table = catalog.get_table("INVOICEITEMS")
items_view = items_table.get_view(
event_suffix=None,
view_mode="manual",
drop_column_names=[],
column_cleaning_operations=[
ColumnCleaningOperation(
column_name="Discount",
cleaning_operations=[
MissingValueImputation(imputed_value=0),
ValueBeyondEndpointImputation(
type="less_than", end_point=0, imputed_value=0
),
],
)
],
event_drop_column_names=["record_available_at"],
event_column_cleaning_operations=[],
event_join_column_names=[
"Timestamp",
"GroceryInvoiceGuid",
"GroceryCustomerGuid",
],
)
Exploring Views¶
You can conduct exploratory analysis on view columns using the describe()
, preview()
and sample()
methods. Descriptive statistics, preview, and sample are performed after applying cleaning operations set at the table level or during view creation if created in manual mode.
# Obtain descriptive statistics for a view
invoice_view.describe()
# Preview a selection of rows from the view
df = invoice_view.preview(limit=20)
# Sample random rows from the view based on a specified time range, size, and seed
df = invoice_view.sample(
from_timestamp=pd.Timestamp('2023-04-01'),
to_timestamp=pd.Timestamp('2023-05-01'), size=100, seed=23
)
Transforming Views¶
Transformations allow you to enrich, filter, and extend your data before defining features. You can modify View objects by adding new columns through transformations or joins. For a complete list of supported column operations, refer to the ViewColumn object documentation.
customer_view = catalog.get_view("GROCERYCUSTOMER")
# Extract operating system from BrowserUserAgent column
customer_view["OperatingSystemIsWindows"] = customer_view.BrowserUserAgent.str.contains("Windows")
Joining Views¶
You can combine two views using the join()
method.
This operation enriches one view (the left view) with attributes from another (the right view).
Join Mechanics
- Use the
join()
method of the left view, specifying the right view object through theother_view
parameter. - The method matches rows based on a shared key:
- the dimension ID if the right view is a DimensionView,
- the series ID if the right view is a SnapshotsView,
- the natural key if the right view is an SCDView, or
- the event ID if the right view is an EventView.
- If the shared key identifies an entity referenced in the left view — or the column name matches in both views — FeatureByte automatically detects the join column.
- By default, a left join is performed, preserving all rows from the left view. You can set
how="inner"
to perform an inner join instead. - When joining with an SCD or Snapshots view, the datetime column of the left view determines which record from the right view is joined.
- Joining two SCD views together is not supported.
Join Direction
A view can only serve as the right table in joins if it represents a Dimension, SCD, Snapshots, or Event view. Time Series cannot be used as right tables.
Example
- The
GroceryCustomerGuid
column ininvoice_view
is tagged with thecustomer
entity. - The
GroceryCustomerGuid
natural key incustomer_view
is also tagged with the same entity. Therefore, the join can be performed automatically.
# Join based on shared entity 'customer'
invoice_view = invoice_view.join(
customer_view[["Gender", "State"]], rsuffix="_Customer"
)
You can also explicitly specify the join column:
invoice_view = invoice_view.join(
customer_view[["Gender", "State"]],
rsuffix="_Customer",
on="GroceryCustomerGuid"
)
Joining Event Attributes to Item Views¶
For ItemView objects,
the event timestamp and columns representing entities from the related Event Table are automatically available.
You can enrich an ItemView with additional attributes from its related Event Table using the join_event_table_attributes()
method.
# Add invoice amount to items view
items_view.join_event_table_attributes(
["Amount"], event_suffix="_invoice_total"
)
Adding Features to Views¶
You can also attach an existing feature as a column to a view using the add_feature()
method.
For example, a simple non-temporal aggregate feature defined on an ItemView can be added to the corresponding EventView.
Once added, it can be further aggregated over a temporal window to define an Aggregate Over a Window feature.
Subsetting Views¶
You can also create new views by subsetting rows from an existing view.
Accessing Source Table Raw Data in Views¶
To access raw data from the source table when working with views, you can use the raw
property of the view. This allows you to interact directly with the original data, which is useful for tasks such as adding new columns based on issues in the original data.
Defining Features from Views¶
There are three ways to define features from a view:
- Directly use view attributes as features for the entity identified by the view's primary key or the natural key of an SCDView object using the
as_feature()
oras_features()
methods. This returns a Lookup feature. -
Determine the level of analysis for an Aggregate feature by grouping view rows based on columns representing one or more entities in the view using the
groupby()
method. Then, apply aggregations to define Aggregate features.# Group items by the column GroceryCustomerGuid that references the customer entity discounted_items_by_customer = items_with_discount.groupby("GroceryCustomerGuid") # Declare features that measure the discount received by the customer customer_discount = discounted_items_by_customer.aggregate_over( "Discount", method=fb.AggFunc.SUM, feature_names=["CustomerDiscounts_7d", "CustomerDiscounts_28d"], fill_value=0, windows=['7d', '28d'] )
-
Determine the level of analysis for a Cross Aggregate feature involving aggregation operations across categories by grouping view rows based on columns representing one or more entities in the view, and using a categorical column for performing operations across categories using the
groupby()
method. Then, apply aggregations to define Cross Aggregate features.# Group items by the column GroceryCustomerGuid that references the customer entity # and use GroceryProductGuid as the column to perform operations across discounted_items_by_customer_across_product_id = items_with_discount.groupby( by_keys = "GroceryCustomerGuid", category="GroceryProductGuid" ) # Declare features that measure the total discount per customer across products customer_discount_across_product = \ discounted_items_by_customer_across_product_id.aggregate_over( "Discount", method=fb.AggFunc.SUM, feature_names=[ "CustomerDiscounts_per_product_7d", "CustomerDiscounts_per_product_28d" ], fill_value=0, windows=['7d', '28d'] )