Overview

FeatureByte's SDK Tutorials

Welcome aboard! You're about to embark on a learning journey with FeatureByte's SDK. Whether you're a beginner eager to get started or an expert looking to delve deeper, these tutorials have something for everyone. Step by step, we'll guide you through creating a catalog, registering its data model, formulating your use case, crafting features, computing training data, and deploying and managing those features.

For a holistic view of FeatureByte's open-source platform, along with insights into the overall workflow and the intricacies of the SDK, please head over to our documentation main page and explore the workflow and SDK overview sections.

Dataset Overview

The main tutorial uses the 'French grocery dataset', which consists of four tables with data from a chain of grocery stores:

  • GroceryCustomer: Customer details, including their name, address, and date of birth.

  • GroceryInvoice: Grocery invoice details, containing the timestamp and the total amount of the invoice.

  • InvoiceItems: The grocery item details within each invoice, including the quantity, total cost, discount applied, and product ID.

  • GroceryProduct: The product group description for each grocery product.

The dataset can support a number of prediction use cases. In these tutorials, we will predict the amount that active customers will spend in the next 2 weeks.
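To make the prediction target concrete, here is a minimal sketch in plain Python of "amount spent in the next 2 weeks" for one customer at one point in time. It does not use the FeatureByte SDK. The `Invoice` class, its field names, the sample values, and the choice to exclude the point in time itself while including the end of the window are all assumptions made for illustration, not the dataset's actual schema or the platform's definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Invoice:
    """Hypothetical, simplified stand-in for a row of the GroceryInvoice table."""
    customer_id: str
    timestamp: datetime
    amount: float


def spend_in_next_window(invoices, customer_id, point_in_time,
                         window=timedelta(days=14)):
    """Sum the customer's invoice amounts in (point_in_time, point_in_time + window].

    The boundary convention is an assumption chosen for this sketch.
    """
    end = point_in_time + window
    return sum(
        inv.amount
        for inv in invoices
        if inv.customer_id == customer_id and point_in_time < inv.timestamp <= end
    )


# Made-up sample data, for illustration only.
invoices = [
    Invoice("c1", datetime(2023, 1, 2, 10), 20.0),
    Invoice("c1", datetime(2023, 1, 10, 9), 15.5),
    Invoice("c1", datetime(2023, 1, 20, 9), 99.0),  # outside the 14-day window
    Invoice("c2", datetime(2023, 1, 5, 12), 7.25),
]

target_c1 = spend_in_next_window(invoices, "c1", datetime(2023, 1, 1))
target_c2 = spend_in_next_window(invoices, "c2", datetime(2023, 1, 1))
```

The target looks forward from the point in time, whereas features may only look backward from it. Keeping the two windows on opposite sides of that timestamp is what prevents leakage into the training data.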

French grocery dataset

Note

To simulate a production environment, the data resides in the tutorial's data warehouse, where it's dynamically updated with new records every hour.

Getting Started

  • For Practitioners: If you aim to run the notebooks and immerse yourself in the end-to-end workflow, please follow the instructions for the tutorials installation and execute each notebook in sequence.

  • For Readers: If you're here just to read and understand, feel free to jump to any section of your interest.

If you want to explore further once you have completed steps 1 through 7 of the end-to-end workflow, dive into the specific feature notebooks to see the full power of FeatureByte's feature engineering capabilities.

End-to-End Workflow

  1. Create catalog

    Define the Data Model of the catalog

  2. Register tables
  3. Register entities
  4. Update descriptions to tables (optional)
  5. Set Default Cleaning Operations

    Formulate your use case

  6. Formulate Use Case
  7. Create Observation Tables

    Create features

  8. Create lookup feature
  9. Create window aggregate features
  10. Derive features from other features
  11. Derive similarity features from bucketing
  12. Use embeddings

    Compute training data for your use case

  13. Create feature list
  14. Compute historical feature values

    Deploy and manage your features

  15. Deploy and serve a feature list
  16. Manage feature life cycle
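As a rough illustration of what steps 9 and 10 produce, the sketch below computes two window aggregate features and derives a third feature from them, using only the Python standard library. It is not the FeatureByte SDK API: the function names `window_sum` and `safe_ratio`, the event tuple layout, the window lengths, and the sample values are all hypothetical, and the SDK notebooks show how the platform itself declares such features.

```python
from datetime import datetime, timedelta


def window_sum(events, entity_id, point_in_time, window):
    """Sum amounts of (entity_id, timestamp, amount) events that fall in
    [point_in_time - window, point_in_time).

    Events at or after the point in time are ignored, so the feature only
    uses information that was available at that moment.
    """
    start = point_in_time - window
    return sum(
        amount
        for eid, ts, amount in events
        if eid == entity_id and start <= ts < point_in_time
    )


def safe_ratio(numerator, denominator):
    """Derived feature helper: returns None when the denominator is zero."""
    return numerator / denominator if denominator else None


# Made-up invoice events for one customer, for illustration only.
events = [
    ("c1", datetime(2023, 1, 1), 40.0),
    ("c1", datetime(2023, 1, 20), 10.0),
    ("c1", datetime(2023, 1, 27), 30.0),
    ("c1", datetime(2023, 2, 3), 500.0),  # after the point in time, ignored
]

point_in_time = datetime(2023, 1, 28)

# Window aggregate features (step 9 in spirit).
spend_7d = window_sum(events, "c1", point_in_time, timedelta(days=7))
spend_28d = window_sum(events, "c1", point_in_time, timedelta(days=28))

# A feature derived from other features (step 10 in spirit).
recent_spend_share = safe_ratio(spend_7d, spend_28d)
```

Computing every feature relative to an explicit point in time is the design choice that lets the same definitions serve both historical training data (step 14) and online serving (step 15).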

Expand Your Horizons

Want more? Learn by example! Here are some additional feature examples tailored for various entities:

If you are interested in integrating your own transformer models for text processing or other transformations within the FeatureByte SDK, you can do so by registering a User Defined Function (UDF).

For step-by-step guidance on creating a SQL Embedding UDF, visit the Bring Your Own Transformer tutorial.

Download Tutorials

Download all the end-to-end workflow notebooks here.