featurebyte.ObservationTable.split¶
split(
split_ratios: List[float],
names: Optional[List[str]]=None,
seed: int=1234
) -> List[ObservationTable]
Description¶
Split the observation table into multiple tables based on percentages. Each split creates a new observation table containing a non-overlapping subset of rows. The splits are determined using a seeded random assignment, ensuring reproducibility.
The first split is automatically assigned Purpose.TRAINING, while all subsequent splits are assigned Purpose.VALIDATION_TEST.
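The seeded random assignment described above can be illustrated with a small standalone sketch. This is not featurebyte's implementation; `assign_splits` is a hypothetical helper that only shows how a fixed seed yields the same non-overlapping assignment on every run, with each row landing in exactly one split.

```python
import random
from typing import List, Sequence


def assign_splits(num_rows: int, split_ratios: Sequence[float], seed: int = 1234) -> List[int]:
    """Return one split index per row (hypothetical helper, not featurebyte code)."""
    rng = random.Random(seed)  # seeded generator: same seed, same assignment

    # Cumulative boundaries, e.g. [0.7, 0.3] -> [0.7, 1.0]
    boundaries = []
    total = 0.0
    for ratio in split_ratios:
        total += ratio
        boundaries.append(total)

    assignments = []
    for _ in range(num_rows):
        draw = rng.random()  # uniform in [0, 1)
        # Pick the first split whose cumulative boundary exceeds the draw.
        split_index = next(
            (i for i, bound in enumerate(boundaries) if draw < bound),
            len(boundaries) - 1,  # guard against round-off in the last boundary
        )
        assignments.append(split_index)
    return assignments


# Two runs with the same seed produce identical assignments.
first = assign_splits(1000, [0.7, 0.3], seed=1234)
second = assign_splits(1000, [0.7, 0.3], seed=1234)
```

Because every row receives exactly one index, the resulting subsets cannot overlap. The share of rows in each split is approximately, not exactly, the requested ratio in this sketch.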
Parameters¶
- split_ratios: List[float]
List of percentages (0-1) for each split. Must sum to 1.0 and contain 2 or 3 values.
Example: [0.7, 0.3] for a 70/30 train/test split.
Example: [0.6, 0.2, 0.2] for a 60/20/20 train/validation/test split.
- names: Optional[List[str]]
Names for the resulting tables. If None, auto-generated as "{name}_split_0", "{name}_split_1", etc. Must have the same length as split_ratios if provided.
- seed: int
default: 1234
Random seed for reproducible splits. Default is 1234.
Returns¶
- List[ObservationTable]
List of split observation tables in the same order as split_ratios. The first table has Purpose.TRAINING, the rest have Purpose.VALIDATION_TEST.
Raises¶
- ValueError
If split_ratios is invalid (doesn't sum to 1, wrong length, values out of range). If names length doesn't match split_ratios length.
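The conditions listed above can be sketched as a standalone check. `validate_split_args` is a hypothetical helper, not part of featurebyte; the exclusive bounds on each ratio and the floating-point tolerance on the sum are assumptions, since the text only states that values lie in 0-1 and sum to 1.0.

```python
import math
from typing import List, Optional


def validate_split_args(split_ratios: List[float], names: Optional[List[str]] = None) -> None:
    """Raise ValueError for the documented invalid inputs (hypothetical helper)."""
    # Wrong length: only 2 or 3 splits are allowed.
    if len(split_ratios) not in (2, 3):
        raise ValueError("split_ratios must contain 2 or 3 values")

    # Values out of range (assumption: bounds are exclusive).
    if any(not 0 < ratio < 1 for ratio in split_ratios):
        raise ValueError("each split ratio must be between 0 and 1")

    # Ratios must sum to 1.0 (assumption: small floating-point tolerance).
    if not math.isclose(sum(split_ratios), 1.0, abs_tol=1e-9):
        raise ValueError("split_ratios must sum to 1.0")

    # names, when given, must line up one-to-one with split_ratios.
    if names is not None and len(names) != len(split_ratios):
        raise ValueError("names must have the same length as split_ratios")
```

Running a check like this before calling split can surface a bad argument early, although split itself raises ValueError for the same cases.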
Examples¶
Split into train (70%) and test (30%) sets:
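A minimal sketch of this call, assuming `observation_table` is an existing ObservationTable already loaded from the catalog. The wrapper function `split_train_test` and the table names are illustrative, not part of featurebyte; only the split arguments come from the signature above.

```python
from typing import Any, Tuple


def split_train_test(observation_table: Any) -> Tuple[Any, Any]:
    """Split an existing ObservationTable 70/30 (hypothetical wrapper)."""
    train_table, test_table = observation_table.split(
        split_ratios=[0.7, 0.3],
        names=["my_table_train", "my_table_test"],  # illustrative names
        seed=1234,  # the documented default, written out for clarity
    )
    # The first table has Purpose.TRAINING, the second Purpose.VALIDATION_TEST.
    return train_table, test_table
```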
Split into train (60%), validation (20%), and test (20%) sets:
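A minimal sketch of the three-way case, again assuming `observation_table` is an existing ObservationTable. The wrapper `split_train_valid_test` is illustrative. The names argument is omitted here, so the resulting tables receive the auto-generated "{name}_split_0", "{name}_split_1", and "{name}_split_2" names.

```python
from typing import Any, Tuple


def split_train_valid_test(observation_table: Any, seed: int = 1234) -> Tuple[Any, Any, Any]:
    """Split an existing ObservationTable 60/20/20 (hypothetical wrapper)."""
    train_table, validation_table, test_table = observation_table.split(
        split_ratios=[0.6, 0.2, 0.2],
        seed=seed,  # reuse the same seed to reproduce the same split
    )
    # Results come back in the same order as split_ratios.
    return train_table, validation_table, test_table
```

Only the first table is assigned Purpose.TRAINING; both the validation and test tables are assigned Purpose.VALIDATION_TEST.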