
featurebyte.FeatureJobSetting

class FeatureJobSetting(
*,
blind_spot: str,
period: str,
offset: str,
execution_buffer: str="0s"
)

Description

The FeatureJobSetting class declares a Feature Job Setting. The setting comprises three main parameters, plus an optional execution_buffer:

  • The period parameter specifies how often the batch process should run.
  • The offset parameter defines the delay between the end of each period and the start of the feature job. For example, a feature job with period 60m and offset 130s starts 2 minutes and 10 seconds after the beginning of each hour: 00:02:10, 01:02:10, 02:02:10, …, 15:02:10, …, 23:02:10.
  • The blind_spot parameter sets the time gap between feature computation and the latest event timestamp to be processed.
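To make the interplay of period, offset, and blind_spot concrete, here is a minimal standard-library sketch. It assumes jobs are aligned to multiples of period counted from midnight, and it reads blind_spot as the gap between a job's start and the latest event timestamp that job processes. The helpers job_start_times and latest_event_cutoff are hypothetical illustrations, not part of featurebyte.

```python
from datetime import datetime, timedelta
from typing import List


def job_start_times(day_start: datetime, period: timedelta, offset: timedelta) -> List[datetime]:
    """Hypothetical helper: feature job start times within one day.

    Assumes jobs are aligned to multiples of `period` counted from `day_start`.
    """
    day_end = day_start + timedelta(days=1)
    times = []
    t = day_start + offset
    while t < day_end:
        times.append(t)
        t += period
    return times


def latest_event_cutoff(job_start: datetime, blind_spot: timedelta) -> datetime:
    """Hypothetical helper: latest event timestamp a job processes,
    assuming blind_spot is measured back from the job start."""
    return job_start - blind_spot


# period 60m, offset 130s, as in the offset description
starts = job_start_times(datetime(2024, 1, 1), timedelta(minutes=60), timedelta(seconds=130))
print([t.strftime("%H:%M:%S") for t in starts[:3]])  # ['00:02:10', '01:02:10', '02:02:10']
print(starts[-1].strftime("%H:%M:%S"))  # 23:02:10

# With a 165s blind spot, the 01:02:10 job stops at events stamped 00:59:25
print(latest_event_cutoff(starts[1], timedelta(seconds=165)).strftime("%H:%M:%S"))
```

Under these assumptions, a larger blind_spot moves the cutoff further back in time, trading freshness for completeness of late-arriving data.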

These parameters are duration strings in the same format that pandas accepts in pd.Timedelta().
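As a quick check of how such strings are interpreted, this sketch parses the values used on this page with pd.Timedelta. It assumes pandas is installed; FeatureJobSetting itself is not called here.

```python
import pandas as pd

# Duration strings of the kind passed to FeatureJobSetting
durations = {text: pd.Timedelta(text) for text in ["60m", "130s", "165s", "0s"]}
for text, td in durations.items():
    print(f"{text!r} -> {td} ({td.total_seconds():.0f} s)")

# In Timedelta strings, "m" means minutes (not months), so "60m" is one hour
assert durations["60m"] == pd.Timedelta(hours=1)
```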

Parameters

  • blind_spot: str
    Establishes the time difference between when the feature is calculated and the most recent event timestamp to be processed.

  • period: str
    Indicates the interval at which the batch process should be executed.

  • offset: str
    Specifies the offset from the end of the period interval to the start of the feature job. For instance, with settings period: 60m and offset: 130s, the feature job will begin 2 minutes and 10 seconds after the start of each hour, such as 00:02:10, 01:02:10, 02:02:10, ..., 15:02:10, ..., 23:02:10.

  • execution_buffer: str
    default: "0s"
    Specifies the time buffer for the feature job execution. The buffer is used to account for potential delays in the batch process execution.

Examples

Consider a case study where a data warehouse refreshes every hour. The data refresh starts 10 seconds after the hour and usually finishes within 2 minutes. Occasionally, the refresh misses the most recent data, up to the last 30 seconds before the end of the hour. An appropriate feature job setting could therefore be:

  • period: 60m
  • offset: 10s + 2m + 5s (a safety buffer) = 135s
  • blind_spot: 30s + 10s + 2m + 5s = 165s
>>> import featurebyte as fb
>>> feature_job_setting = fb.FeatureJobSetting(
...     blind_spot="165s",
...     period="60m",
...     offset="135s",
... )
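The arithmetic behind offset and blind_spot in this case study can be checked with the standard library. The sketch below rebuilds both values from the stated refresh timings and, assuming blind_spot is measured back from each job's start, shows that the cutoff lands 30 seconds before the hour, which matches the maximum amount of data the refresh can miss. The to_duration_string helper is hypothetical.

```python
from datetime import datetime, timedelta


def to_duration_string(td: timedelta) -> str:
    """Hypothetical helper: format a timedelta as whole seconds, e.g. "135s"."""
    return f"{int(td.total_seconds())}s"


refresh_start = timedelta(seconds=10)  # refresh begins 10 s after the hour
refresh_time = timedelta(minutes=2)    # refresh usually finishes within 2 min
safety = timedelta(seconds=5)          # extra safety buffer
missed_tail = timedelta(seconds=30)    # refresh may miss up to the last 30 s

offset = refresh_start + refresh_time + safety
blind_spot = missed_tail + offset

print("offset:", to_duration_string(offset))          # offset: 135s
print("blind_spot:", to_duration_string(blind_spot))  # blind_spot: 165s

# Assumed interpretation: the job starting at 01:02:15 processes events up to
# (job start - blind_spot), i.e. 30 s before the hour.
job_start = datetime(2024, 1, 1, 1, 0, 0) + offset
cutoff = job_start - blind_spot
print(job_start.strftime("%H:%M:%S"), cutoff.strftime("%H:%M:%S"))  # 01:02:15 00:59:30
```

Because blind_spot equals offset plus the 30 s tail, each job deliberately skips the final 30 seconds of the hour that just ended.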