featurebyte.SourceTable.sample
sample(
size: int=10,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
after_cleaning: bool=False
) -> DataFrame
Description
Returns a DataFrame containing a random selection of rows from the table, controlled by an optional time range, a maximum size, and a sampling seed. By default, the rows are materialized before any column-level cleaning operations are applied; set after_cleaning to True to apply them.
Parameters
- size: int
default: 10
Maximum number of rows to sample.
- seed: int
default: 1234
Seed to use for random sampling.
- from_timestamp: Union[datetime, str, NoneType]
Start of date range to sample from.
- to_timestamp: Union[datetime, str, NoneType]
End of date range to sample from.
- after_cleaning: bool
default: False
Whether to apply cleaning operations.
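The way size, seed, and the time range interact can be illustrated with a small standard-library analogy. This is only a sketch of the described behaviour, not featurebyte's implementation: the helper `sample_rows`, the `timestamp_key` argument, and the choice to treat the upper bound as exclusive are all assumptions made for illustration.

```python
import random
from datetime import datetime


def sample_rows(rows, timestamp_key, size=10, seed=1234,
                from_timestamp=None, to_timestamp=None):
    """Hypothetical local analogy of time-bounded, seeded sampling."""
    # Keep only rows inside the requested time range (either bound is optional).
    # Assumption: the upper bound is exclusive; the real behaviour may differ.
    in_range = [
        row for row in rows
        if (from_timestamp is None or row[timestamp_key] >= from_timestamp)
        and (to_timestamp is None or row[timestamp_key] < to_timestamp)
    ]
    # size is a maximum, so never request more rows than are available.
    n = min(size, len(in_range))
    # A dedicated generator seeded with `seed` makes the selection repeatable.
    return random.Random(seed).sample(in_range, n)


# Twenty made-up rows, one per day from 2022-01-01 to 2022-01-20.
rows = [{"id": i, "ts": datetime(2022, 1, 1 + i)} for i in range(20)]
picked = sample_rows(
    rows, "ts", size=3,
    from_timestamp=datetime(2022, 1, 5),
    to_timestamp=datetime(2022, 1, 15),
)
```

Calling the helper twice with the same seed and range returns the same rows, and asking for more rows than the range contains returns only the rows available, which matches the description of size as a maximum.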
Returns
- DataFrame
Sampled rows from the table.
Examples
Sample 3 rows from the table.
>>> catalog.get_table("GROCERYPRODUCT").sample(3)
                     GroceryProductGuid  ProductGroup
0  e890c5cb-689b-4caf-8e49-6b97bb9420c0        Épices
1  5720e4df-2996-4443-a1bc-3d896bf98140          Chat
2  96fc4d80-8cb0-4f1b-af01-e71ad7e7104a         Pains
Sample 3 rows from the table with timestamps.
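A time-bounded call might look like the sketch below. It uses only the parameters listed in the signature above; the wrapper name `sample_with_time_range` and the 2022 date range are hypothetical, and it assumes the table has timestamps that the range can be applied to.

```python
from datetime import datetime


def sample_with_time_range(table, size=3, seed=1234):
    """Hypothetical wrapper: `table` is any object exposing sample()."""
    return table.sample(
        size=size,
        seed=seed,
        # Illustrative date range; the signature also accepts strings here.
        from_timestamp=datetime(2022, 1, 1),
        to_timestamp=datetime(2022, 12, 31),
    )
```

The wrapper can be given a table object obtained as in the first example. Passing the same seed with the same range should make the selection repeatable, since the seed is described as controlling the random sampling.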
See Also
- Table.preview: Retrieve a preview of a table.
- Table.describe: Retrieve a summary of a table.