
featurebyte.ViewColumn.sample

sample(
size: int=10,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
**kwargs: Any
) -> DataFrame

Description

Returns a DataFrame containing a random selection of rows from the view column. The selection is controlled by an optional time range, a maximum sample size, and a seed that makes the sampling repeatable. The rows are materialized after any cleaning operations defined at the table level or during the view's creation have been applied.

Parameters

  • size: int
    default: 10
    Maximum number of rows to sample, with an upper bound of 10,000 rows.

  • seed: int
    default: 1234
    Seed to use for random sampling.

  • from_timestamp: Union[datetime, str, NoneType]
    default: None
    Start of date range to sample from.

  • to_timestamp: Union[datetime, str, NoneType]
    default: None
    End of date range to sample from.

  • **kwargs: Any
    Additional keyword parameters.

Returns

  • DataFrame
    Sampled rows of the data.

Examples

Sample 3 rows of a column.

>>> catalog.get_view("GROCERYPRODUCT")["ProductGroup"].sample(3)
  ProductGroup
0       Épices
1         Chat
2        Pains

Sample 3 rows of a column within a timestamp range.

>>> catalog.get_view("GROCERYINVOICE")["Amount"].sample(
...     size=3, seed=123, from_timestamp="2020-01-01", to_timestamp="2023-01-31"
... )
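
The way size, seed, and the timestamp range interact can be illustrated without a catalog connection. The sketch below is a local analogue written with the Python standard library only; it is not featurebyte code. The function `sample_locally`, the `Row` layout, and the toy rows are hypothetical, and the boundary handling (start inclusive, end exclusive) is an assumption, since the description above does not state how the endpoints are treated.

```python
import random
from datetime import datetime
from typing import List, Optional, Tuple

MAX_SAMPLE_SIZE = 10_000  # upper bound stated in the `size` parameter description

Row = Tuple[datetime, str]  # (event timestamp, column value); hypothetical layout


def sample_locally(
    rows: List[Row],
    size: int = 10,
    seed: int = 1234,
    from_timestamp: Optional[datetime] = None,
    to_timestamp: Optional[datetime] = None,
) -> List[str]:
    """Hypothetical local analogue of the sampling semantics; not featurebyte code."""
    if size > MAX_SAMPLE_SIZE:
        raise ValueError(f"size must not exceed {MAX_SAMPLE_SIZE}")
    # Keep only values whose timestamp falls inside the requested range.
    # Assumption: the start is inclusive and the end is exclusive.
    candidates = [
        value
        for timestamp, value in rows
        if (from_timestamp is None or timestamp >= from_timestamp)
        and (to_timestamp is None or timestamp < to_timestamp)
    ]
    # `size` is a maximum: fewer values come back when the range holds fewer.
    count = min(size, len(candidates))
    # A dedicated Random instance makes the draw repeatable for a given seed.
    return random.Random(seed).sample(candidates, count)


# Toy rows for illustration only.
rows = [
    (datetime(2019, 6, 1), "row-1"),
    (datetime(2020, 3, 15), "row-2"),
    (datetime(2020, 9, 30), "row-3"),
    (datetime(2021, 2, 1), "row-4"),
    (datetime(2022, 7, 4), "row-5"),
]

first = sample_locally(
    rows, size=2, seed=123,
    from_timestamp=datetime(2020, 1, 1), to_timestamp=datetime(2023, 1, 31),
)
second = sample_locally(
    rows, size=2, seed=123,
    from_timestamp=datetime(2020, 1, 1), to_timestamp=datetime(2023, 1, 31),
)
print(first == second)  # True: the same seed and range give the same sample
```

Reusing the same seed is what makes a sample reproducible across calls; changing the seed, or widening the time range, may change which rows are returned.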

See Also