featurebyte.FeatureJobSetting¶
class FeatureJobSetting(
*,
blind_spot: str,
frequency: str,
time_modulo_frequency: str
)
Description¶
The FeatureJobSetting class declares a Feature Job Setting. The setting comprises three parameters:
- The frequency parameter specifies how often the batch process should run.
- The time_modulo_frequency parameter defines the offset from the end of each frequency period to the moment the feature job starts. For example, a feature job with frequency 60m and time_modulo_frequency 130s will start 2 minutes and 10 seconds after the beginning of each hour: 00:02:10, 01:02:10, 02:02:10, …, 15:02:10, …, 23:02:10.
- The blind_spot parameter sets the time gap between feature computation and the latest event timestamp to be processed.
Note that all three parameters take duration strings of the kind pandas accepts in pd.Timedelta(), such as 60m or 130s.
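The schedule described for frequency 60m and time_modulo_frequency 130s can be reproduced with a short sketch. It uses only the Python standard library and is not featurebyte code: the helper name job_start_times and the 2024-01-01 start date are hypothetical, and the sketch assumes the first frequency period begins exactly at day_start.

```python
from datetime import datetime, timedelta


def job_start_times(day_start, frequency, time_modulo_frequency, count):
    """Return the first `count` job start times after `day_start`.

    Hypothetical helper: assumes `day_start` sits on a frequency boundary
    (for example midnight), so each job starts `time_modulo_frequency`
    after the beginning of a period.
    """
    first = day_start + time_modulo_frequency
    return [first + i * frequency for i in range(count)]


starts = job_start_times(
    day_start=datetime(2024, 1, 1),                 # arbitrary example date
    frequency=timedelta(minutes=60),                # "60m"
    time_modulo_frequency=timedelta(seconds=130),   # "130s"
    count=24,
)

# First three jobs of the day: 00:02:10, 01:02:10, 02:02:10
print([t.strftime("%H:%M:%S") for t in starts[:3]])
# Last job of the day: 23:02:10
print(starts[-1].strftime("%H:%M:%S"))
```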
Parameters¶
- blind_spot: str
Establishes the time difference between when the feature is calculated and the most recent event timestamp to be processed.
- frequency: str
Indicates the interval at which the batch process should be executed.
- time_modulo_frequency: str
Specifies the offset from the end of the frequency interval to the start of the feature job. For instance, with settings frequency: 60m and time_modulo_frequency: 130s, the feature job will begin 2 minutes and 10 seconds after the start of each hour, such as 00:02:10, 01:02:10, 02:02:10, ..., 15:02:10, ..., 23:02:10.
Examples¶
Consider a case study where a data warehouse refreshes each hour. The data refresh starts 10 seconds after the hour and is usually finished within 2 minutes. Sometimes the data refresh misses the latest data, up to a maximum of the last 30 seconds at the end of the hour. Therefore, appropriate feature job settings could be:
- frequency: 60m
- time_modulo_frequency: 10s + 2m + 5s (a safety buffer) = 135s
- blind_spot: 30s + 10s + 2m + 5s = 165s
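The arithmetic in this case study can be checked with a minimal sketch using only the Python standard library. The variable names, the as_seconds helper, and the 2024-01-01 date are hypothetical and chosen for illustration. The final step assumes the latest event processed is the job start time minus blind_spot, as the parameter description states.

```python
from datetime import datetime, timedelta

# Figures taken from the case study
refresh_start = timedelta(seconds=10)    # refresh begins 10s after the hour
refresh_duration = timedelta(minutes=2)  # refresh usually finishes within 2m
safety_buffer = timedelta(seconds=5)     # extra margin
max_missing = timedelta(seconds=30)      # newest data the refresh may miss

time_modulo_frequency = refresh_start + refresh_duration + safety_buffer
blind_spot = max_missing + time_modulo_frequency


def as_seconds(delta):
    """Format a timedelta as a whole-second duration string, e.g. '135s'."""
    return f"{int(delta.total_seconds())}s"


settings = {
    "blind_spot": as_seconds(blind_spot),                        # "165s"
    "frequency": "60m",
    "time_modulo_frequency": as_seconds(time_modulo_frequency),  # "135s"
}
print(settings)

# The job for the 01:00 boundary starts at 01:02:15 and, under the
# assumption above, processes events up to 00:59:30, which is 30 seconds
# before the hour.
job_time = datetime(2024, 1, 1, 1, 0, 0) + time_modulo_frequency
cutoff = job_time - blind_spot
print(job_time.strftime("%H:%M:%S"), cutoff.strftime("%H:%M:%S"))
```

Because the dictionary keys match the keyword-only parameters in the signature at the top of this page, the values could presumably be passed as FeatureJobSetting(**settings).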