
featurebyte.View.sample

sample(
size: int=10,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
**kwargs: Any
) -> DataFrame

Description

Returns a DataFrame containing a random sample of the view's rows, controlled by a time range, a sample size, and a random seed. The sample is materialized after any cleaning operations defined at the table level or during the view's creation have been applied.
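The sampling contract described above (restrict rows to a time window, then draw a bounded, seed-controlled random subset) can be illustrated with a small pure-Python sketch. This is not featurebyte's implementation, which runs against the data warehouse: the `sample_rows` helper, the `event_time` key, the inclusive window bounds, and the use of `random.Random` are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

MAX_SAMPLE_SIZE = 10_000  # upper bound stated in the `size` parameter docs


def sample_rows(
    rows: List[Dict[str, Any]],
    size: int = 10,
    seed: int = 1234,
    from_timestamp: Optional[datetime] = None,
    to_timestamp: Optional[datetime] = None,
    time_key: str = "event_time",  # hypothetical timestamp column name
) -> List[Dict[str, Any]]:
    """Hypothetical illustration of View.sample semantics on in-memory rows."""
    # Keep only rows inside the time window (bounds assumed inclusive).
    candidates = [
        row
        for row in rows
        if (from_timestamp is None or row[time_key] >= from_timestamp)
        and (to_timestamp is None or row[time_key] <= to_timestamp)
    ]
    # Never return more than the documented maximum, nor more rows than exist.
    k = min(size, MAX_SAMPLE_SIZE, len(candidates))
    # A dedicated RNG seeded with `seed` makes the draw reproducible.
    rng = random.Random(seed)
    return rng.sample(candidates, k)


# Usage: 60 daily rows starting 2019-01-01, sampled from January 2019 only.
rows = [{"id": i, "event_time": datetime(2019, 1, 1) + timedelta(days=i)} for i in range(60)]
january = sample_rows(
    rows, size=3, from_timestamp=datetime(2019, 1, 1), to_timestamp=datetime(2019, 1, 31)
)
```

Because `size` is a maximum rather than an exact count, a window containing fewer rows than requested simply yields all of them.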

Parameters

  • size: int
    default: 10
    Maximum number of rows to sample, with an upper bound of 10,000 rows.

  • seed: int
    default: 1234
    Seed to use for random sampling.

  • from_timestamp: Union[datetime, str, NoneType]
    default: None
    Start of the date range to sample from.

  • to_timestamp: Union[datetime, str, NoneType]
    default: None
    End of the date range to sample from.

  • **kwargs: Any
    Additional keyword parameters.
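Both timestamp parameters accept a `datetime`, a `str`, or `None`. The accepted string format is not specified here; a minimal sketch, assuming ISO 8601 strings, shows one way such inputs could be normalized before filtering. The helpers `to_datetime_or_none` and `normalize_range` are hypothetical, and whether featurebyte itself rejects an inverted range is also an assumption.

```python
from datetime import datetime
from typing import Optional, Tuple, Union

TimestampLike = Union[datetime, str, None]


def to_datetime_or_none(value: TimestampLike) -> Optional[datetime]:
    """Hypothetical helper: normalize a datetime, ISO 8601 string, or None."""
    if value is None or isinstance(value, datetime):
        return value
    # Assumption: strings are ISO 8601, e.g. "2019-01-31" or "2019-01-31T12:00:00".
    return datetime.fromisoformat(value)


def normalize_range(
    from_timestamp: TimestampLike, to_timestamp: TimestampLike
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Hypothetical helper: normalize both bounds and reject an inverted window."""
    start = to_datetime_or_none(from_timestamp)
    end = to_datetime_or_none(to_timestamp)
    # An inverted window could never contain rows, so treat it as a caller error.
    if start is not None and end is not None and start > end:
        raise ValueError("from_timestamp must not be later than to_timestamp")
    return start, end
```

Leaving either bound as `None` keeps that side of the window open in this sketch.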

Returns

  • DataFrame
    Sampled rows of the view.

Examples

Sample rows of a view.

>>> catalog.get_view("GROCERYPRODUCT").sample(size=3)
                     GroceryProductGuid ProductGroup
0  e890c5cb-689b-4caf-8e49-6b97bb9420c0       Épices
1  5720e4df-2996-4443-a1bc-3d896bf98140         Chat
2  96fc4d80-8cb0-4f1b-af01-e71ad7e7104a        Pains

Sample rows of a view with timestamp.

>>> from datetime import datetime
>>> catalog.get_view("GROCERYINVOICE").sample(
...     size=3,
...     from_timestamp=datetime(2019, 1, 1),
...     to_timestamp=datetime(2019, 1, 31),
... )

See Also