Skip to content

View

A View object is a local virtual table that can be modified and joined with other views to prepare data before feature definition. It does not store any data itself; instead, it retrieves data from the underlying tables each time it is queried. Like a SQL view, it serves as an abstraction layer that enables flexible data manipulation without altering the source tables.

Types of View Objects

Each View type corresponds to a specific table type and defines how data can be queried, joined, and transformed to create features. View objects form the foundation for building Lookup, Aggregate, and Cross Aggregate features.

There are seven distinct types of View objects:

View Type Derived From Purpose
EventView EventTable Used to define Lookup features for event entities, create Aggregate Over a Window features for other entities, or to augment related ItemView.
ItemView ItemTable Used to define Lookup features for item entities, create Non-Temporal Aggregate features for event entities, or create Aggregate Over a Window features for other entities.
SnapshotsView SnapshotsTable Used to create Aggregate Over a Window features, define Lookup features for the entity represented by the table’s series ID, or create Aggregate As At features for other entities. Can also be used to augment views through joins.
TimeSeriesView TimeSeriesTable Used to create Aggregate Over a Window features.
DimensionView DimensionTable Used to define Lookup features or to augment other views through joins.
SCDView SCDTable Used to define Lookup features for the entity represented by the table’s natural key, or to create Aggregate As At features for other entities. Can also be used to augment views through joins.
ChangeView SCDTable Used to create Aggregate Over a Window features on changes occurring in a specific attribute of the natural key of an SCD table.

Each View object can be further joined, filtered, or transformed to prepare datasets for feature declaration.

Deriving Views

View objects are obtained from Table objects using the get_view() or get_change_view() methods.

# Create an EventView from an EventTable
invoice_table = catalog.get_table("GROCERYINVOICE")
invoice_view = invoice_table.get_view()

# Alternatively, get the view directly from the catalog
invoice_view = catalog.get_view("GROCERYINVOICE")

# Create a ChangeView from an SCDTable
customer_table = catalog.get_table("GROCERYCUSTOMER")
address_changed_view = customer_table.get_change_view(
    track_changes_column="StreetAddress"
)

Two view construction modes are available:

  • Auto (default): Automatically cleans data according to default operations specified for each column within the table and excludes special columns not meant for feature engineering.
  • Manual: Allows custom cleaning operations without applying default cleaning operations. Note that the manual mode is not available in the get_view() method of the Catalog object.
# Obtain view in manual mode
item_table = catalog.get_table("INVOICEITEMS")
items_view = items_table.get_view(
    event_suffix=None,
    view_mode="manual",
    drop_column_names=[],
    column_cleaning_operations=[
        ColumnCleaningOperation(
            column_name="Discount",
            cleaning_operations=[
                MissingValueImputation(imputed_value=0),
                ValueBeyondEndpointImputation(
                    type="less_than", end_point=0, imputed_value=0
                ),
            ],
        )
    ],
    event_drop_column_names=["record_available_at"],
    event_column_cleaning_operations=[],
    event_join_column_names=[
        "Timestamp",
        "GroceryInvoiceGuid",
        "GroceryCustomerGuid",
    ],
)

Exploring Views

You can conduct exploratory analysis on view columns using the describe(), preview() and sample() methods. Descriptive statistics, preview, and sample are performed after applying cleaning operations set at the table level or during view creation if created in manual mode.

# Obtain descriptive statistics for a view
invoice_view.describe()
# Preview a selection of rows from the view
df = invoice_view.preview(limit=20)
# Sample random rows from the view based on a specified time range, size, and seed
df = invoice_view.sample(
    from_timestamp=pd.Timestamp('2023-04-01'),
    to_timestamp=pd.Timestamp('2023-05-01'), size=100, seed=23
)

Transforming Views

Transformations allow you to enrich, filter, and extend your data before defining features. You can modify View objects by adding new columns through transformations or joins. For a complete list of supported column operations, refer to the ViewColumn object documentation.

customer_view = catalog.get_view("GROCERYCUSTOMER")

# Extract operating system from BrowserUserAgent column
customer_view["OperatingSystemIsWindows"] = customer_view.BrowserUserAgent.str.contains("Windows")

Joining Views

You can combine two views using the join() method. This operation enriches one view (the left view) with attributes from another (the right view).

Join Mechanics

  • Use the join() method of the left view, specifying the right view object through the other_view parameter.
  • The method matches rows based on a shared key:
  • If the shared key identifies an entity referenced in the left view — or the column name matches in both views — FeatureByte automatically detects the join column.
  • By default, a left join is performed, preserving all rows from the left view. You can set how="inner" to perform an inner join instead.
  • When joining with an SCD or Snapshots view, the datetime column of the left view determines which record from the right view is joined.
  • Joining two SCD views together is not supported.

Join Direction

A view can only serve as the right table in joins if it represents a Dimension, SCD, Snapshots, or Event view. Time Series cannot be used as right tables.

Example

  • The GroceryCustomerGuid column in invoice_view is tagged with the customer entity.
  • The GroceryCustomerGuid natural key in customer_view is also tagged with the same entity. Therefore, the join can be performed automatically.
# Join based on shared entity 'customer'
invoice_view = invoice_view.join(
    customer_view[["Gender", "State"]], rsuffix="_Customer"
)

You can also explicitly specify the join column:

invoice_view = invoice_view.join(
    customer_view[["Gender", "State"]],
    rsuffix="_Customer",
    on="GroceryCustomerGuid"
)

Joining Event Attributes to Item Views

For ItemView objects, the event timestamp and columns representing entities from the related Event Table are automatically available. You can enrich an ItemView with additional attributes from its related Event Table using the join_event_table_attributes() method.

# Add invoice amount to items view
items_view.join_event_table_attributes(
    ["Amount"], event_suffix="_invoice_total"
)

Adding Features to Views

You can also attach an existing feature as a column to a view using the add_feature() method. For example, a simple non-temporal aggregate feature defined on an ItemView can be added to the corresponding EventView. Once added, it can be further aggregated over a temporal window to define an Aggregate Over a Window feature.

feature = catalog.get_feature("InvoiceCount")
invoice_view.add_feature("invoice_count", feature)

Subsetting Views

You can also create new views by subsetting rows from an existing view.

# Subset rows
cond = item_view.Discount != 0
items_with_discount = item_view[cond]

Accessing Source Table Raw Data in Views

To access raw data from the source table when working with views, you can use the raw property of the view. This allows you to interact directly with the original data, which is useful for tasks such as adding new columns based on issues in the original data.

items_view['Discount_missing'] = items_view.raw['Discount'].isnull()

Defining Features from Views

There are three ways to define features from a view:

  1. Directly use view attributes as features for the entity identified by the view's primary key or the natural key of an SCDView object using the as_feature() or as_features() methods. This returns a Lookup feature.
    # Create a feature from the OperatingSystemIsWindows column
    uses_windows = customer_view.OperatingSystemIsWindows.as_feature("UsesWindows")
    
  2. Determine the level of analysis for an Aggregate feature by grouping view rows based on columns representing one or more entities in the view using the groupby() method. Then, apply aggregations to define Aggregate features.

    # Group items by the column GroceryCustomerGuid that references the customer entity
    discounted_items_by_customer = items_with_discount.groupby("GroceryCustomerGuid")
    # Declare features that measure the discount received by the customer
    customer_discount = discounted_items_by_customer.aggregate_over(
        "Discount",
        method=fb.AggFunc.SUM,
        feature_names=["CustomerDiscounts_7d", "CustomerDiscounts_28d"],
        fill_value=0,
        windows=['7d', '28d']
    )
    

  3. Determine the level of analysis for a Cross Aggregate feature involving aggregation operations across categories by grouping view rows based on columns representing one or more entities in the view, and using a categorical column for performing operations across categories using the groupby() method. Then, apply aggregations to define Cross Aggregate features.

    # Group items by the column GroceryCustomerGuid that references the customer entity
    # and use GroceryProductGuid as the column to perform operations across
    discounted_items_by_customer_across_product_id = items_with_discount.groupby(
        by_keys = "GroceryCustomerGuid", category="GroceryProductGuid"
    )
    # Declare features that measure the total discount per customer across products
    customer_discount_across_product = \
        discounted_items_by_customer_across_product_id.aggregate_over(
        "Discount",
        method=fb.AggFunc.SUM,
        feature_names=[
            "CustomerDiscounts_per_product_7d",
            "CustomerDiscounts_per_product_28d"
        ],
        fill_value=0,
        windows=['7d', '28d']
    )