Connect to Databricks

This guide will help you set up FeatureByte with a Databricks Data Warehouse.

Before You Begin

Gather the credentials for connecting to your Databricks account, specifically:

  • Name of the Databricks server you're connecting to
  • Sign-in credentials for the Databricks server
  • Credentials for the storage service used to stage files for the Databricks cluster
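
One way to keep these credentials out of notebooks is to read them from environment variables. The sketch below is a minimal example and not part of FeatureByte: the variable names and the load_connection_settings helper are hypothetical, and the set of fields assumes access-token sign-in and S3 staging, as in the setup example later in this guide.

```python
import os

# Hypothetical environment variable names; use whatever your deployment defines.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_ACCESS_TOKEN",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
)


def load_connection_settings(env=None):
    """Return the required settings as a dict, or raise listing every missing one."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to the corresponding fields when you create the feature store.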

Also ensure that the user you're connecting with has the required privileges. Specifically, the user's role should have the following privileges:

  • USAGE on cluster

Refer to Databricks Cluster ACL for more details.

Why are these privileges needed? FeatureByte needs them to write some metadata into your Databricks data warehouse. The application uses this metadata internally for tracking and for optimizations that improve your experience.

Setup Guide

FeatureByte Installation

Make sure that you have FeatureByte installed. See installation for more details.

Step 1: Test that your connection works

Check that your connection works by listing or creating a feature store. Run the following commands in either a notebook or a Python interactive shell.

  • If you know that a feature store already exists, list the existing feature stores on Databricks.
    import featurebyte as fb
    
    fb.FeatureStore.list()
    
    # Get a feature store if it exists
    feature_store = fb.FeatureStore.get(name="<feature_store_name>")
    
  • Alternatively, try to create a feature store.
    import featurebyte as fb
    
    # Name of the feature store that we want to create/connect to
    feature_store = fb.FeatureStore.get_or_create(
        name="<feature_store_name>",
        source_type=fb.SourceType.DATABRICKS,
        details=fb.DatabricksDetails(
            host="<host_name>",
            http_path="<http_path>",
            featurebyte_catalog="hive_metastore",
            featurebyte_schema="<schema_name>",
            storage_type=fb.StorageType.S3,
            storage_url="<storage_url>/<schema_name>",
            storage_spark_url="dbfs:/FileStore/<schema_name>",
        ),
        database_credential=fb.AccessTokenCredential(
            access_token="<access_token>",
        ),
        storage_credential=fb.S3StorageCredential(
            s3_access_key_id="<s3_access_key_id>",
            s3_secret_access_key="<s3_secret_access_key>",
        )
    )
    

Refer to Databricks JDBC Connection Parameters for more details.

Step 2: Connect to your Databricks feature store

# List all databases in the feature store
feature_store.get_data_source().list_databases()

Congratulations! If these commands run without any errors, you have successfully connected to your Databricks data warehouse.
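
To get an explicit pass/fail result, for example in a setup script, the check in Step 2 can be wrapped in a small helper. This is a minimal sketch: can_list_databases is a hypothetical name, and it assumes only the get_data_source() and list_databases() calls shown in Step 2.

```python
def can_list_databases(feature_store):
    """Return (True, databases) on success, or (False, error description) on failure."""
    try:
        databases = feature_store.get_data_source().list_databases()
    except Exception as exc:  # broad on purpose: any failure means the check did not pass
        return False, f"{type(exc).__name__}: {exc}"
    return True, databases
```

Calling can_list_databases(feature_store) with the object obtained in Step 1 returns the database list when the connection works, and otherwise returns the error text instead of raising.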

Next Steps

Now that you've connected to your data, feel free to try out some tutorials!