Connect to Databricks¶
This guide will help you set up FeatureByte with a Databricks data warehouse.
Before You Begin¶
Gather the credentials for connecting to your Databricks account, specifically:
- Name of the Databricks server you're connecting to
- Sign-in credentials for the Databricks server
- Credentials for the storage service that is used to stage files for the Databricks cluster
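One way to keep these credentials out of notebooks and scripts is to read them from environment variables and fail early if any are missing. The following is a minimal sketch using only the Python standard library. The variable names and the `load_settings` helper are hypothetical and not part of FeatureByte; rename them to match your environment.

```python
import os

# Hypothetical environment variable names (assumptions, not FeatureByte settings).
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_ACCESS_TOKEN",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
)


def load_settings(env=None):
    """Return the connection settings as a dict, or raise if any are missing."""
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` before connecting surfaces a missing credential as a clear error instead of a failed connection attempt.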
Also ensure that the user you connect with has the required privileges. Specifically, the role should have the following privilege:
- USAGE on cluster
Refer to Databricks Cluster ACL for more details.
Why are these privileges needed?
FeatureByte needs these privileges to write metadata into your Databricks data warehouse. The application uses this metadata internally for tracking and for optimizations that improve your experience.
Setup Guide¶
FeatureByte Installation
Make sure that you have FeatureByte installed. See installation for more details.
Step 1: Test that your connection works¶
You can now test your connection by creating a new feature store. Run the following commands in either a notebook or a Python interactive shell.
- If you know that a feature store already exists, you can list the existing feature stores on Databricks.
- Alternatively, try to create a feature store.
import featurebyte as fb

feature_store = fb.FeatureStore.get_or_create(
    # Name of the feature store that we want to create/connect to
    name="<feature_store_name>",
    source_type=fb.SourceType.DATABRICKS,
    details=fb.DatabricksDetails(
        host="<host_name>",
        http_path="<http_path>",
        featurebyte_catalog="hive_metastore",
        featurebyte_schema="<schema_name>",
        storage_type=fb.StorageType.S3,
        storage_url="<storage_url>/<schema_name>",
        storage_spark_url="dbfs:/FileStore/<schema_name>",
    ),
    database_credential=fb.AccessTokenCredential(
        access_token="<access_token>",
    ),
    storage_credential=fb.S3StorageCredential(
        s3_access_key_id="<s3_access_key_id>",
        s3_secret_access_key="<s3_secret_access_key>",
    ),
)
Refer to Databricks JDBC Connection Parameters for more details.
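For the first option above, listing the existing feature stores might look like the sketch below. It assumes that FeatureByte is installed and already configured for your deployment, and that `fb.FeatureStore.list()` is available in your installed version. The wrapper function `list_feature_stores` is a hypothetical helper, not part of FeatureByte.

```python
def list_feature_stores():
    """Return the feature stores registered with FeatureByte."""
    # Imported lazily so this helper can be defined even where
    # FeatureByte is not installed; the call itself requires it.
    import featurebyte as fb

    # Assumption: FeatureStore.list() returns a table of registered feature stores.
    return fb.FeatureStore.list()
```

Running `print(list_feature_stores())` in a notebook or shell should show the feature store you expect if it already exists.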
Step 2: Connect to your Databricks feature store¶
Congratulations! If you can run these commands without any errors, you have successfully connected to your Databricks data warehouse.
Next Steps¶
Now that you've connected to your data, feel free to try out some tutorials!