Dlt notebook
In [0]:
%pip install mlflow
%pip install sentence-transformers
%pip install cloudpickle==2.0.0
Requirement already satisfied: mlflow in /local_disk0/.ephemeral_nfs/envs/pythonEnv-55cb8023-69e4-4349-acf0-7c919b05a898/lib/python3.10/site-packages (2.8.0)
Requirement already satisfied: sentence-transformers in /databricks/python3/lib/python3.10/site-packages (2.2.2)
Requirement already satisfied: torch>=1.6.0 in /databricks/python3/lib/python3.10/site-packages (from sentence-transformers) (2.0.1+cpu)
Requirement already satisfied: cloudpickle==2.0.0 in /databricks/python3/lib/python3.10/site-packages (2.0.0)
Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
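The `%pip` output notes that the Python process may need a restart, via `dbutils.library.restartPython()`, before updated packages take effect. After restarting, one way to confirm which versions the interpreter actually resolves is the standard library's `importlib.metadata`. The helper below is a small sketch of that check; `installed_versions` is a hypothetical name, not part of any Databricks API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the environment this interpreter is using.
            result[name] = None
    return result


# Packages installed in the first cell of this notebook.
print(installed_versions(["mlflow", "sentence-transformers", "cloudpickle"]))
```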
In [0]:
import dlt
import mlflow
from pyspark.sql.functions import col  # explicit import; the wildcard form shadows builtins such as sum and max
from pyspark.sql.types import ArrayType, FloatType
In [0]:
# Load the Production-stage version of the registered model "transformer-model"
# as a Spark UDF that returns one array<float> (the embedding) per input row.
model_uri = "models:/transformer-model/production"
model_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type=ArrayType(FloatType()))
2023/11/15 09:03:19 WARNING mlflow.pyfunc: Detected one or more mismatches between the model's dependencies and the current Python environment:
 - torch (current: 2.0.1+cpu, required: torch==1.13.1)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.
2023/11/15 09:03:19 WARNING mlflow.pyfunc: Calling `spark_udf()` with `env_manager="local"` does not recreate the same environment that was used during training, which may lead to errors or inaccurate predictions. We recommend specifying `env_manager="conda"`, which automatically recreates the environment that was used to train the model and performs inference in the recreated environment.
2023/11/15 09:03:19 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
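The first warning comes from comparing the model's logged requirements with the packages installed on the cluster. The remedy it suggests is to call `mlflow.pyfunc.get_model_dependencies(model_uri)`, which returns the path of the model's requirements file, and then install from that file (for example with `%pip install -r`). The sketch below is a simplified, hypothetical version of such a comparison, not MLflow's implementation: it checks only exact `name==version` pins and compares version strings literally. `Mismatch` and `find_pin_mismatches` are illustrative names.

```python
from typing import NamedTuple, Optional


class Mismatch(NamedTuple):
    package: str
    required: str
    current: Optional[str]  # None when the package is not installed at all


def find_pin_mismatches(requirements: list[str], installed: dict[str, str]) -> list[Mismatch]:
    """Report exact `name==version` pins whose installed version differs.

    Simplified: other specifiers (>=, ~=, ...) and comment lines are skipped,
    and versions are compared as plain strings rather than with PEP 440 rules.
    """
    current_by_name = {name.lower(): ver for name, ver in installed.items()}
    mismatches = []
    for raw in requirements:
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, required = line.partition("==")
        name, required = name.strip(), required.strip()
        current = current_by_name.get(name.lower())
        if current != required:
            mismatches.append(Mismatch(name, required, current))
    return mismatches


# The torch pin and installed version come from the warning above; the
# cloudpickle requirement is a hypothetical extra entry.
requirements = ["torch==1.13.1", "cloudpickle==2.0.0"]
installed = {"torch": "2.0.1+cpu", "cloudpickle": "2.0.0"}
print(find_pin_mismatches(requirements, installed))
```

Literal string comparison is stricter than PEP 440, under which a pin such as `torch==2.0.1` would also accept an installed `2.0.1+cpu` because local version labels are ignored. `packaging.specifiers.SpecifierSet` implements those rules if exact semantics matter.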
In [0]:
@dlt.table(comment="Test DLT table", name="groceryproduct_embedding")
def product_embeddings():
    # Batch-read the source table and append an embedding of ProductGroup
    # computed by the MLflow model UDF defined above.
    return (
        spark.read.table("hive_metastore.grocery.groceryproduct")
        .withColumn("ProductGroupEmbedding", model_udf(col("ProductGroup")))
    )
groceryproduct_embedding is defined as a Delta Live Tables dataset with schema:
Name | Type |
---|---|
GroceryProductGuid | string |
ProductGroup | string |
ProductGroupEmbedding | array<float> |
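Row by row, the `withColumn` call keeps every existing column and appends `ProductGroupEmbedding`, the model's output for that row's `ProductGroup`, which is why the schema ends with an `array<float>` column. The pure-Python sketch below mimics only that shape. `toy_embed` is a hypothetical stand-in for the registered model that produces a meaningless 3-element vector, and the sample rows are invented; the real UDF runs the model on Spark executors in batches and returns vectors whose length depends on the model.

```python
from typing import Callable

Row = dict[str, object]


def with_embedding_column(
    rows: list[Row],
    source_col: str,
    target_col: str,
    embed: Callable[[str], list[float]],
) -> list[Row]:
    """Return new rows with target_col = embed(row[source_col]); inputs are not mutated."""
    return [{**row, target_col: embed(row[source_col])} for row in rows]


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for the model: (length, vowel count, space count).
    # NOT a real sentence embedding.
    return [
        float(len(text)),
        float(sum(c in "aeiou" for c in text.lower())),
        float(text.count(" ")),
    ]


# Invented sample rows using the source table's column names.
rows = [
    {"GroceryProductGuid": "guid-1", "ProductGroup": "Dairy"},
    {"GroceryProductGuid": "guid-2", "ProductGroup": "Fresh Fruit"},
]
out = with_embedding_column(rows, "ProductGroup", "ProductGroupEmbedding", toy_embed)
print(out)
```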
To populate your table, you must either:
- Run an existing pipeline using the Delta Live Tables menu
- Create a new pipeline
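A new pipeline is described by a JSON settings document, which the pipeline UI can also show and edit as JSON. Below is a minimal sketch of such a document for this notebook, built as a Python dict. The field names (`name`, `libraries` with a `notebook` path, `target`, `development`, `continuous`) follow the Delta Live Tables pipeline settings format, but the set shown is deliberately incomplete, and the pipeline name, notebook path, and target schema are hypothetical placeholders to be checked against your workspace before use.

```python
import json


def dlt_pipeline_settings(name: str, notebook_path: str, target: str, development: bool = True) -> dict:
    """Build a minimal DLT pipeline settings document (a sketch, not a complete spec)."""
    return {
        "name": name,
        # Notebook(s) containing the @dlt.table definitions.
        "libraries": [{"notebook": {"path": notebook_path}}],
        # Schema in which the pipeline publishes its tables (hive_metastore pipelines).
        "target": target,
        # Development mode: reuse the cluster between updates and skip automatic retries.
        "development": development,
        # Triggered (run-to-completion) updates rather than continuous execution.
        "continuous": False,
    }


settings = dlt_pipeline_settings(
    name="grocery-embeddings",  # hypothetical pipeline name
    notebook_path="/Users/someone@example.com/Dlt notebook",  # hypothetical path
    target="grocery",  # hypothetical target schema
)
print(json.dumps(settings, indent=2))
```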