SourceTable

A SourceTable object is a table from a data warehouse that the feature store can access.

Getting a Source Table

  1. Obtain the DataSource object associated with a FeatureStore object using the get_data_source() method:

    import featurebyte as fb
    ds = fb.FeatureStore.get("playground").get_data_source()
    

  2. Obtain a SourceTable object using the get_source_table() method:

    source_table = ds.get_source_table(
        database_name="spark_catalog",
        schema_name="GROCERY",
        table_name="GROCERYCUSTOMER"
    )
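
The two steps above can be combined into one small helper. This is a minimal sketch rather than part of the featurebyte API: the function name `load_source_table` and its parameters are hypothetical, and the `fb` module is passed in as an argument so that the helper relies only on the calls shown above (`FeatureStore.get()`, `get_data_source()`, and `get_source_table()`).

```python
def load_source_table(fb, feature_store_name, database_name, schema_name, table_name):
    """Hypothetical helper: fetch a SourceTable using the two steps shown above.

    `fb` is expected to be the imported featurebyte module
    (for example, `import featurebyte as fb`).
    """
    # Step 1: get the DataSource tied to the named feature store.
    data_source = fb.FeatureStore.get(feature_store_name).get_data_source()
    # Step 2: look up the table by database, schema, and table name.
    return data_source.get_source_table(
        database_name=database_name,
        schema_name=schema_name,
        table_name=table_name,
    )
```

Assuming featurebyte is installed and a feature store named "playground" exists, a call might look like `load_source_table(fb, "playground", "spark_catalog", "GROCERY", "GROCERYCUSTOMER")`.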
    

Exploring a Source Table

  1. Obtain descriptive statistics for the table using the describe() method:

    source_table.describe()
    

  2. Preview a selection of rows from the table using the preview() method:

    df = source_table.preview(limit=20)
    

  3. Draw a larger random sample of rows, constrained by a time range, sample size, and seed, using the sample() method:

    import pandas as pd  # needed for pd.Timestamp

    df = source_table.sample(
        from_timestamp=pd.Timestamp('2023-04-01'),
        to_timestamp=pd.Timestamp('2023-05-01'),
        size=100, seed=23
    )
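
The three exploration calls above can be gathered into a single helper when the same checks are run on several tables. This is only a sketch: `explore_source_table` and its parameter names are hypothetical, the defaults simply mirror the values used above, and the timestamps are passed through unchanged to sample().

```python
def explore_source_table(
    source_table,
    from_timestamp,
    to_timestamp,
    preview_rows=20,
    sample_size=100,
    seed=23,
):
    """Hypothetical helper: run describe(), preview(), and sample() in one call."""
    return {
        # Descriptive statistics for the table.
        "stats": source_table.describe(),
        # The first few rows, limited to `preview_rows`.
        "preview": source_table.preview(limit=preview_rows),
        # A random sample restricted to the given time range.
        "sample": source_table.sample(
            from_timestamp=from_timestamp,
            to_timestamp=to_timestamp,
            size=sample_size,
            seed=seed,
        ),
    }
```

Each entry in the returned dictionary holds whatever the corresponding method returns, so the results can be inspected side by side.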
    

Registering the Table to the Catalog

  1. Activate the desired catalog using the activate() class method:

    catalog = fb.Catalog.activate(<catalog_name>)
    

  2. Determine the table's type and register the table using the method specific to its type:

Example of registering an event table using the create_event_table() method:

    invoice_table = source_table.create_event_table(
        name="GROCERYINVOICE",
        event_id_column="GroceryInvoiceGuid",
        event_timestamp_column="Timestamp",
        event_timestamp_timezone_offset_column="tz_offset",
        record_creation_timestamp_column="record_available_at"
    )

In this example, the table is added to the catalog as an EventTable object named GROCERYINVOICE.
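
Choosing the registration method by table type can be sketched as a small dispatch table. Only create_event_table() appears in this page; the other method names (`create_item_table`, `create_dimension_table`, `create_scd_table`) are assumptions that should be checked against the installed featurebyte version. The type labels and the `register_table` helper are hypothetical.

```python
# Assumed mapping from a table-type label to the SourceTable method that
# registers it. The labels are hypothetical, and every method name other than
# create_event_table is an assumption to verify against your featurebyte version.
REGISTRATION_METHODS = {
    "event": "create_event_table",
    "item": "create_item_table",
    "dimension": "create_dimension_table",
    "scd": "create_scd_table",
}


def register_table(source_table, table_type, **kwargs):
    """Hypothetical helper: call the registration method matching `table_type`.

    Keyword arguments are forwarded unchanged to the selected method.
    """
    try:
        method_name = REGISTRATION_METHODS[table_type]
    except KeyError:
        raise ValueError(f"Unsupported table type: {table_type!r}") from None
    return getattr(source_table, method_name)(**kwargs)
```

With this helper, the event table example above would be written as `register_table(source_table, "event", name="GROCERYINVOICE", ...)`, passing the same column arguments as keywords.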