featurebyte.SparkDetails

class SparkDetails(
    *,
    host: StrictStr = "127.0.0.1",
    port: int = 10000,
    http_path: StrictStr = "cliservice",
    use_http_transport: bool = False,
    use_ssl: bool = False,
    storage_type: StorageType,
    storage_url: str,
    storage_spark_url: StrictStr,
    featurebyte_catalog: StrictStr,
    featurebyte_schema: StrictStr
)

Description

Model for details used to connect to a Spark data source.

Parameters

host: StrictStr
default: "127.0.0.1"
Hostname or IP address of the server where your Spark cluster is hosted.
port: int
default: 10000
The port your Spark cluster listens on.
http_path: StrictStr
default: "cliservice"
Spark compute resource URL.
use_http_transport: bool
default: False
Whether to use HTTP as the transport layer. If False, Thrift is used.
use_ssl: bool
default: False
Whether to use SSL. Only applicable if use_http_transport is set to True.
storage_type: StorageType
Type of storage where the feature store is persisted.
storage_url: str
URL of the location where custom UDFs are uploaded.
storage_spark_url: StrictStr
URL of the location where data is read from. This points to the same location as storage_url. However, the warehouse accepts different URL formats for the read and write paths, so two fields are required.
featurebyte_catalog: StrictStr
Name of the database that holds metadata about the actual data. This is commonly set to hive_metastore.
featurebyte_schema: StrictStr
The name of the schema containing the tables and columns.
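
The constraints in the parameter list can be checked before the object is constructed. The following is a minimal sketch and not part of featurebyte: check_spark_options is a hypothetical helper that takes the keyword arguments as a plain dict, reports any missing fields that have no default in the signature, and flags use_ssl=True when use_http_transport is not also True. How the library itself treats that combination is not stated here, so the sketch only reports it.

```python
# Hypothetical pre-flight check; not part of the featurebyte API.
# Fields that have no default value in the SparkDetails signature.
REQUIRED_FIELDS = (
    "storage_type",
    "storage_url",
    "storage_spark_url",
    "featurebyte_catalog",
    "featurebyte_schema",
)


def check_spark_options(options: dict) -> list:
    """Return a list of human-readable problems found in `options`."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in options:
            problems.append(f"missing required field: {name}")
    # use_ssl is documented as applicable only with HTTP transport.
    if options.get("use_ssl", False) and not options.get("use_http_transport", False):
        problems.append("use_ssl=True only applies when use_http_transport=True")
    return problems


options = {
    "host": "<host>",
    "use_ssl": True,  # use_http_transport is left at its default (False)
    "storage_url": "<storage_url>",
    "storage_spark_url": "<storage_spark_url>",
    "featurebyte_catalog": "spark_catalog",
    "featurebyte_schema": "<schema_name>",
}
for problem in check_spark_options(options):
    print(problem)
```

Here the dict omits storage_type and sets use_ssl without HTTP transport, so both problems are reported.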

Examples

>>> details = fb.SparkDetails(
...   host="<host>",
...   port=10003,
...   featurebyte_catalog="spark_catalog",
...   featurebyte_schema="<schema_name>",
...   storage_type=fb.StorageType.GCS,
...   storage_url="<storage_url>",
...   storage_spark_url="gs://dataproc-cluster-staging/{<schema_name>}"
... )
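
To connect over HTTP with SSL instead of the default Thrift transport, both flags need to be set. The following is a minimal sketch of the keyword arguments, kept as a plain dict so it runs without featurebyte installed. The placeholder values, and the choice to leave port and http_path at their documented defaults, are assumptions for illustration. storage_type is left out of the dict and passed in the commented call, since it needs the fb.StorageType enum.

```python
# Sketch only: placeholder values must be replaced with real ones.
http_options = {
    "host": "<host>",
    "use_http_transport": True,  # switch from Thrift to HTTP
    "use_ssl": True,             # only applicable with HTTP transport
    "storage_url": "<storage_url>",
    "storage_spark_url": "<storage_spark_url>",
    "featurebyte_catalog": "spark_catalog",
    "featurebyte_schema": "<schema_name>",
}

# With featurebyte installed, the dict can be unpacked into the constructor:
# import featurebyte as fb
# details = fb.SparkDetails(storage_type=fb.StorageType.S3, **http_options)
```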