
featurebyte.SparkDetails

class SparkDetails(
*,
host: StrictStr="127.0.0.1",
port: int=10000,
http_path: StrictStr="cliservice",
use_http_transport: bool=False,
use_ssl: bool=False,
storage_type: StorageType,
storage_url: str,
storage_path: StrictStr,
catalog_name: StrictStr,
schema_name: StrictStr
)

Description

Model for details used to connect to a Spark data source.

Parameters

  • host: StrictStr
    default: "127.0.0.1"
    The server where your Spark cluster is hosted.

  • port: int
    default: 10000
    The port your Spark cluster listens on.

  • http_path: StrictStr
    default: "cliservice"
    HTTP path of the Spark Thrift Server endpoint. Only used when use_http_transport is True.

  • use_http_transport: bool
    default: False
    Whether to use HTTP as the transport layer. When False, the Thrift binary transport is used.

  • use_ssl: bool
    default: False
    Whether to use SSL. Only applicable when use_http_transport is True.

  • storage_type: StorageType
    Storage type of the location where the feature store data will be persisted.

  • storage_url: str
    URL of the location to which custom UDFs will be uploaded.

  • storage_path: StrictStr
    Path from which data will be read. This technically points to the same location as storage_url, but the warehouse accepts different formats for the read and write paths, so two separate fields are required.

  • catalog_name: StrictStr
    The name of the catalog to use for creation of output tables.

  • schema_name: StrictStr
    The name of the schema to use for creation of output tables.
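The transport-related parameters above (host, port, http_path, use_http_transport, use_ssl) jointly determine how the client reaches the Spark Thrift Server. The sketch below is illustrative only, not featurebyte's actual connection code; `build_endpoint` is a hypothetical helper showing how these parameters typically combine into an endpoint address:

```python
def build_endpoint(
    host: str = "127.0.0.1",
    port: int = 10000,
    http_path: str = "cliservice",
    use_http_transport: bool = False,
    use_ssl: bool = False,
) -> str:
    """Illustrative sketch: combine SparkDetails transport parameters into an endpoint."""
    if use_http_transport:
        # Thrift-over-HTTP: http_path selects the endpoint; use_ssl picks the scheme.
        scheme = "https" if use_ssl else "http"
        return f"{scheme}://{host}:{port}/{http_path}"
    # Thrift binary transport: http_path and use_ssl do not apply (per the docs above).
    return f"thrift://{host}:{port}"
```

With the defaults this yields a plain Thrift binary address; switching on use_http_transport (and optionally use_ssl) routes through the HTTP path instead.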

Examples

>>> details = fb.SparkDetails(
...   host="<host>",
...   port=10003,
...   catalog_name="spark_catalog",
...   schema_name="<schema_name>",
...   storage_type=fb.StorageType.S3,
...   storage_url="<storage_url>",
...   storage_path="s3://<bucket>/<schema_name>"
... )