The modern Feature Engineering & Management platform

FeatureByte is a free, source-available feature platform designed to:

Create state-of-the-art features, not data pipelines: Create features for machine learning with just a few lines of code. Leave the plumbing and pipelining to FeatureByte. We take care of orchestrating the data operations, whether time-window aggregations or backfilling, so you can deliver more value from data.
Improve accuracy through data: Use the intuitive feature declaration framework to transform creative ideas into training data in minutes. Replace ad-hoc pipelines with features that offer far greater scale, complexity and freshness.
Streamline machine learning data pipelines: Get more value from AI, faster. Deploy and serve features in minutes instead of weeks or months. Declare features in Python and automatically generate optimized data pipelines, all using tools you love, like Jupyter Notebooks.

Take charge of the entire ML feature lifecycle

Feature engineering and management don’t have to be complicated. With FeatureByte, you can create, experiment with, serve and manage your features in one tool.

Create

Create and share state-of-the-art ML features effortlessly
Search and reuse features to create feature lists tailored to your use case

Create and Save Feature

# Get view from catalog
invoice_view = catalog.get_view("GROCERYINVOICE")
# Declare features of total spent by customer
# in the past 7 and 28 days
customer_purchases = invoice_view.groupby(
    "GroceryCustomerGuid"
).aggregate_over(
    "Amount",
    method="sum",
    feature_names=[
        "CustomerTotalSpent_7d",
        "CustomerTotalSpent_28d"
    ],
    fill_value=0,
    windows=["7d", "28d"],
)
customer_purchases.save()

Experiment with a Feature List

# Get feature list from the catalog
feature_list = catalog.get_feature_list(
    "200 Features on Active Customers"
)
# Get an observation set from the catalog
observation_set = catalog.get_observation_table(
    "5M rows of active Customers in 2021-2022"
)
# Compute training data and
# store it in the feature store for reuse and audit
training = feature_list.compute_historical_feature_table(
    observation_set,
    name="Training set to predict purchases next 2w",
)

Experiment

Immediately access historical features through automated backfilling, and let FeatureByte handle the complexity of time-aware SQL
Experiment on live data at scale, innovating faster
Iterate rapidly with different feature lists to create more accurate models
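
To make "backfilling" and "time-aware" concrete, here is a minimal sketch in plain Python of what a point-in-time window feature such as `CustomerTotalSpent_7d` means for a historical observation. It is an illustration only, not FeatureByte's implementation: the sample invoices, the helper names `window_sum` and `backfill`, and the half-open window boundaries are all assumptions made for this example, and the sketch ignores feature job scheduling entirely.

```python
from datetime import datetime, timedelta

# Hypothetical invoice events: (customer id, event timestamp, amount).
INVOICES = [
    ("c1", datetime(2022, 1, 1, 9, 0), 20.0),
    ("c1", datetime(2022, 1, 20, 9, 0), 5.0),
    ("c1", datetime(2022, 1, 27, 9, 0), 10.0),
    ("c2", datetime(2022, 1, 26, 9, 0), 7.5),
]


def window_sum(events, customer, point_in_time, window, fill_value=0.0):
    """Sum amounts for `customer` in [point_in_time - window, point_in_time).

    Events at or after `point_in_time` are excluded, so the value never
    uses information from the future (assumed boundary convention).
    """
    start = point_in_time - window
    amounts = [
        amount
        for cust, ts, amount in events
        if cust == customer and start <= ts < point_in_time
    ]
    return sum(amounts) if amounts else fill_value


def backfill(events, observations, windows):
    """Compute one feature row per (customer, point_in_time) observation."""
    rows = []
    for customer, point_in_time in observations:
        row = {"customer": customer, "point_in_time": point_in_time}
        for name, window in windows.items():
            row[name] = window_sum(events, customer, point_in_time, window)
        rows.append(row)
    return rows


WINDOWS = {
    "CustomerTotalSpent_7d": timedelta(days=7),
    "CustomerTotalSpent_28d": timedelta(days=28),
}
# Hypothetical observation set: each row asks for features "as of" a past date.
OBSERVATIONS = [
    ("c1", datetime(2022, 1, 28)),
    ("c1", datetime(2022, 1, 10)),
    ("c3", datetime(2022, 1, 28)),
]
TRAINING_ROWS = backfill(INVOICES, OBSERVATIONS, WINDOWS)
```

For the observation of `c1` on 2022-01-10, the purchases made on 2022-01-20 and 2022-01-27 are ignored because they had not happened yet. Guarding against that kind of leakage by hand for every feature and every observation row is the bookkeeping that automated backfilling is meant to take over.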

Serve

Deploy AI data pipelines and serve features in minutes
Access features with low latency
Reduce costs and security risk by performing computations in your existing data platform
Ensure data consistency between model training and inference

Deploy and Serve Feature List

# Get feature list from the catalog
feature_list = catalog.get_feature_list(
    "200 Features on Active Customers"
)
# Create deployment
deployment = feature_list.deploy(
    name="Features for customer purchases next 2w",
)
# Activate deployment
deployment.enable()
# Get shell script template for online serving
deployment.get_online_serving_code(language="sh")

Define Data Cleaning Policy on Table

import featurebyte as fb

# Get table from catalog
items_table = catalog.get_table("INVOICEITEMS")

# Discount must not be missing or negative:
# replace both cases with 0
items_table.Discount.update_critical_data_info(
    cleaning_operations=[
        fb.MissingValueImputation(
            imputed_value=0
        ),
        fb.ValueBeyondEndpointImputation(
            type="less_than",
            end_point=0,
            imputed_value=0
        ),
    ]
)
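
The intended effect of the cleaning policy above can be sketched in plain Python. This is only an illustration of the rule being declared (missing values become 0, values below the end point of 0 become 0). The helper `clean_discount` and the sample values are hypothetical, and the sketch does not show how FeatureByte applies the policy in the data platform.

```python
import math


def clean_discount(value, imputed_missing=0, end_point=0, imputed_beyond=0):
    """Apply the two cleaning rules in order to a single Discount value."""
    # Rule 1: impute missing values (None or NaN).
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return imputed_missing
    # Rule 2: impute values that fall below the end point.
    if value < end_point:
        return imputed_beyond
    return value


# Hypothetical raw Discount values, including missing and negative entries.
RAW = [None, -2.5, 0, 1.25, float("nan")]
CLEANED = [clean_discount(v) for v in RAW]
```

Declaring the rule once on the table column, rather than repeating this kind of logic in every feature, is what lets cleaning operations be centralized.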

Manage

Organize feature engineering assets with domain-specific catalogs
Centralize cleaning operations and feature job configurations
Distinguish prototype features from production-ready ones
Create new versions of your features to handle changes in data
Keep full lineage of your training data and features in production
Monitor the health of feature pipelines centrally