

size: int=10,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
**kwargs: Any
) -> DataFrame


Returns a random selection of rows from the view column, controlled by a time range, a sample size, and a seed. The column is materialized after any cleaning operations defined at the table level or during the view's creation have been applied.


Parameters

  • size: int
    default: 10
    Maximum number of rows to sample, with an upper bound of 10,000 rows.

  • seed: int
    default: 1234
    Seed to use for random sampling.

  • from_timestamp: Union[datetime, str, NoneType]
    default: None
    Start of the date range to sample from.

  • to_timestamp: Union[datetime, str, NoneType]
    default: None
    End of the date range to sample from.

  • **kwargs: Any
    Additional keyword parameters.


Returns

  • DataFrame
    Sampled rows of the data.
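
The parameter semantics described above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: `sample_values`, `_to_datetime`, and the `rows` structure are hypothetical names, and the sketch assumes ISO 8601 timestamp strings and a start-inclusive, end-exclusive range, neither of which is confirmed by this page.

```python
import random
from datetime import datetime
from typing import List, Optional, Tuple, Union

MAX_SAMPLE_SIZE = 10_000  # upper bound stated in the `size` description

Timestamp = Union[datetime, str]


def _to_datetime(value: Optional[Timestamp]) -> Optional[datetime]:
    # Assumption: string timestamps are ISO 8601, e.g. "2020-01-01".
    if value is None or isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def sample_values(
    rows: List[Tuple[datetime, str]],
    size: int = 10,
    seed: int = 1234,
    from_timestamp: Optional[Timestamp] = None,
    to_timestamp: Optional[Timestamp] = None,
) -> List[str]:
    """Hypothetical stand-in for a seeded, time-bounded column sample."""
    if not 0 < size <= MAX_SAMPLE_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SAMPLE_SIZE}")
    start = _to_datetime(from_timestamp)
    end = _to_datetime(to_timestamp)
    # Assumption: the range is start-inclusive and end-exclusive.
    candidates = [
        value
        for ts, value in rows
        if (start is None or ts >= start) and (end is None or ts < end)
    ]
    # `size` is a maximum: fewer rows come back if the range holds fewer.
    rng = random.Random(seed)
    return rng.sample(candidates, min(size, len(candidates)))


# Made-up data: 48 (timestamp, value) pairs spread over 2020 to 2023.
rows = [(datetime(2020 + i % 4, 1 + i % 12, 1), f"item-{i}") for i in range(48)]

first = sample_values(
    rows, size=3, seed=123, from_timestamp="2020-01-01", to_timestamp="2023-01-31"
)
second = sample_values(
    rows, size=3, seed=123, from_timestamp="2020-01-01", to_timestamp="2023-01-31"
)
print(first == second)  # the same seed and range give the same sample
```

In this sketch, fixing `seed` makes the sample reproducible, and `size` acts as a ceiling rather than a guarantee.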


Examples

Sample 3 rows of a column.

>>> catalog.get_view("GROCERYPRODUCT")["ProductGroup"].sample(3)
0       Épices
1         Chat
2        Pains

Sample 3 rows of a column within a timestamp range, using a custom seed.

>>> catalog.get_view("GROCERYINVOICE")["Amount"].sample(
...   size=3,
...   seed=123,
...   from_timestamp="2020-01-01",
...   to_timestamp="2023-01-31"
... )

See Also