Deploying using Databricks MLflow
Introduction
MLflow is an open-source platform for managing the machine learning lifecycle. It provides both experiment tracking and deployment capabilities; the latter lets you serve models over a low-latency HTTP interface, i.e. expose any machine learning model as an HTTP endpoint. MLflow can be deployed on its own, but it also ships with Databricks. So if your data warehouse is built on Databricks, or your infrastructure already includes MLflow as a component, you can register (deploy) a model to MLflow and call it from a UDF.
In this section we go through the steps to deploy a transformer model to the Databricks-provided MLflow deployment, but all steps are equally valid for any custom MLflow deployment.
Steps
To execute the following steps you will need a Databricks account with permissions to create ML clusters. We will use an SBERT transformer model, but any other model will work as well.
Steps we need to perform:
- Create compute cluster
- Create notebook
- Register model in model registry
- Deploy model to serving endpoint
- Verify model is available over HTTP
Create compute cluster
Note
This step is Databricks-specific. If you are using vanilla MLflow, you can skip ahead to step 3, "Register model in model registry".
Go to Compute and create an ML cluster. Make sure to pick an ML runtime in "Databricks runtime version", otherwise MLflow will not be available. Scroll down and click the "Create Compute" button.
Create notebook
This notebook is mainly needed to register a model in the MLflow registry and deploy it to production. If you are not using Databricks, a plain Jupyter notebook or any other Python script with access to a custom MLflow server works as well. In Databricks you can create a notebook by clicking New -> Notebook.
Register model in model registry
Since we are using the Databricks notebook created in the previous step, there is no need to include MLflow connection code. If you are using your own Jupyter notebook or Python script, you may need to establish a connection to the MLflow tracking server first. Typically this is done with the mlflow.set_tracking_uri function, as detailed in the MLflow documentation.
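For reference, a minimal sketch of such a connection; the URI is a placeholder for your own tracking server, not a value from this guide:
import mlflow

# Placeholder URI: replace with the address of your MLflow tracking server.
mlflow.set_tracking_uri("http://<your-mlflow-server>:5000")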
First, install the required dependencies that are not pre-installed.
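In a Databricks notebook this is typically done with a %pip magic; a sketch assuming sentence-transformers is the only missing dependency:
%pip install sentence-transformers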
Next, we will write an MLflow model wrapper. This wrapper adapts the transformer model to the interface MLflow expects, and it is the class we will register in the model registry and subsequently deploy:
import pandas as pd
import mlflow
import sentence_transformers
from mlflow.models import infer_signature
from sentence_transformers import SentenceTransformer


class TransformerWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self):
        self.model_name = "sentence-transformers/all-MiniLM-L12-v2"

    def load_context(self, context):
        # Download and initialize the SBERT model when MLflow loads the model.
        self.model = SentenceTransformer(self.model_name)

    def predict(self, context, model_input: pd.DataFrame):
        # Encode the "text" column of the input DataFrame into embeddings.
        return self.model.encode(model_input["text"])
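Before registering, you can optionally sanity-check the wrapper locally (a quick sketch; the sample text is arbitrary):
# Optional local check: load the model and encode a sample input.
wrapper = TransformerWrapper()
wrapper.load_context({})
print(wrapper.predict(None, pd.DataFrame({"text": ["a test sentence"]})))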
Now we create the model, infer its input and output signature, specify the necessary pip dependencies, and register the model in the model registry:
with mlflow.start_run() as run:
    model_name = "transformer-model"
    registered_model_name = f"development.default.{model_name}"

    model = TransformerWrapper()
    model.load_context({})

    # Add sentence-transformers to the default environment so the serving
    # container installs the same version used here.
    pip_reqs = mlflow.pyfunc.get_default_conda_env()
    pip_reqs["dependencies"][-1]["pip"] += [
        "sentence-transformers=={}".format(sentence_transformers.__version__),
    ]

    sample_inputs = pd.DataFrame({"text": ["Hello", "world"]})

    log_result = mlflow.pyfunc.log_model(
        model_name,
        python_model=model,
        conda_env=pip_reqs,
        input_example=sample_inputs,
        signature=infer_signature(sample_inputs, model.predict(None, sample_inputs)),
        registered_model_name=registered_model_name,
    )
This will produce an output similar to:
Registered model 'transformer-model' already exists. Creating a new version of this model...
Created version '3' of model 'development.default.transformer-model'.
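Optionally, you can load the logged model back and run a test prediction before serving it (a sketch using the model URI returned by log_model):
# Optional: reload the logged model and verify it produces embeddings.
loaded_model = mlflow.pyfunc.load_model(log_result.model_uri)
print(loaded_model.predict(pd.DataFrame({"text": ["Hello"]})))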
Navigate to the 'Models' section in the left-hand menu and select 'transformer-model' to view its details.
Deploy model to serving endpoint
Next, we will create a serving endpoint for the model:
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

endpoint = client.create_endpoint(
    config={
        "name": model_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": registered_model_name,
                    "entity_version": log_result.registered_model_version,
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ],
        },
    }
)
Navigate to the 'Serving' section in the left-hand menu and select 'transformer-model' to view the endpoint's details.
It takes a while for the container to be built and for the endpoint to become ready. If the endpoint is not ready yet, you can check its logs to see what is happening.
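If you prefer to wait programmatically, one option is to poll the endpoint status; a sketch, assuming the status exposes 'state' and 'ready' fields (the exact shape may vary across MLflow/Databricks versions):
import time

# Poll until the serving endpoint reports it is ready. The 'state'/'ready'
# field names are an assumption and may differ between MLflow versions.
while True:
    status = client.get_endpoint(endpoint=model_name)
    if status.get("state", {}).get("ready") == "READY":
        break
    time.sleep(30)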
Now, let’s query the endpoint using the requests library. Fill in the placeholders with your Databricks workspace URI and access token. You can generate access tokens by navigating to the 'Settings', then 'Developer', and finally the 'Access Tokens' section.
import requests
response = requests.post(
'<your_databricks_workspace_uri>/serving-endpoints/transformer-model/invocations',
headers={
'Authorization': 'Bearer <your_databricks_access_token>',
'Content-Type': 'application/json',
},
json={"inputs": {"text": ["hello", "world"]}},
)
After executing this code we can check the status and response body:
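A minimal way to inspect the result (standard requests attributes; the exact embedding values will differ):
print(response.status_code)
print(response.json())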
We see that the response status is 200 (success) and the body contains our embeddings.
The full notebook is available here.