

size: int=10,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
**kwargs: Any
) -> DataFrame


Returns a DataFrame containing a random selection of rows from the view. The selection is controlled by an optional time range, a maximum sample size, and a seed for the random sampling. Rows are materialized after any cleaning operations defined at the table level or during the view's creation have been applied.


  • size: int
    default: 10
    Maximum number of rows to sample, with an upper bound of 10,000 rows.

  • seed: int
    default: 1234
    Seed to use for random sampling.

  • from_timestamp: Union[datetime, str, NoneType]
    default: None
    Start of date range to sample from.

  • to_timestamp: Union[datetime, str, NoneType]
    default: None
    End of date range to sample from.

  • **kwargs: Any
    Additional keyword parameters.


  • DataFrame
    Sampled rows of the data.
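
To make the interaction between the time range, size, and seed concrete, here is a minimal plain-Python sketch of the semantics described above. It is not the library's implementation. The helper `sample_rows`, the `Timestamp` and `InvoiceId` keys, the half-open treatment of the time range, and raising an error when `size` exceeds 10,000 are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta

MAX_SAMPLE_SIZE = 10_000  # upper bound stated in the `size` description


def sample_rows(rows, size=10, seed=1234, from_timestamp=None,
                to_timestamp=None, ts_key="Timestamp"):
    """Hypothetical stand-in that mimics the documented sampling behaviour."""
    # Assumption: an oversized request is rejected (the real behaviour may differ).
    if size > MAX_SAMPLE_SIZE:
        raise ValueError(f"size must not exceed {MAX_SAMPLE_SIZE}")

    # Assumption: the range is half-open, [from_timestamp, to_timestamp).
    in_range = [
        row for row in rows
        if (from_timestamp is None or row[ts_key] >= from_timestamp)
        and (to_timestamp is None or row[ts_key] < to_timestamp)
    ]

    # A dedicated generator seeded with `seed` makes the draw repeatable.
    rng = random.Random(seed)
    # `size` is a maximum: fewer rows come back if the range holds fewer.
    return rng.sample(in_range, min(size, len(in_range)))


# Hypothetical data: one row per day starting 2018-12-15.
rows = [
    {"InvoiceId": i, "Timestamp": datetime(2018, 12, 15) + timedelta(days=i)}
    for i in range(60)
]
picked = sample_rows(
    rows,
    size=3,
    from_timestamp=datetime(2019, 1, 1),
    to_timestamp=datetime(2019, 1, 31),
)
```

Calling `sample_rows` again with the same arguments and seed returns the same rows, which is the purpose of the `seed` parameter; changing the seed draws a different selection from the same range.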


Sample rows of a view.

>>> catalog.get_view("GROCERYPRODUCT").sample(size=3)
                     GroceryProductGuid ProductGroup
0  e890c5cb-689b-4caf-8e49-6b97bb9420c0       Épices
1  5720e4df-2996-4443-a1bc-3d896bf98140         Chat
2  96fc4d80-8cb0-4f1b-af01-e71ad7e7104a        Pains

Sample rows of a view within a timestamp range.

>>> from datetime import datetime
>>> catalog.get_view("GROCERYINVOICE").sample(
...   size=3,
...   from_timestamp=datetime(2019, 1, 1),
...   to_timestamp=datetime(2019, 1, 31),
... )

See Also