featurebyte.ViewColumn.describe

describe(
size: int=0,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
**kwargs: Any
) -> DataFrame

Description

Returns descriptive statistics of the view column. The statistics are computed after any cleaning operations that were defined either at the table level or during the view's creation have been applied.

Parameters

  • size: int
    default: 0
    Maximum number of rows to sample.

  • seed: int
    default: 1234
    Seed to use for random sampling.

  • from_timestamp: Union[datetime, str, NoneType]
    default: None
    Start of date range to sample from.

  • to_timestamp: Union[datetime, str, NoneType]
    default: None
    End of date range to sample from.

  • **kwargs: Any
    Additional keyword parameters.
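
How size, seed, from_timestamp and to_timestamp interact can be illustrated with a minimal pure-Python sketch. This is not how featurebyte implements sampling. The helper name sample_rows is hypothetical, and the sketch assumes that size=0 means "use every row", that the start of the range is inclusive, and that the end is exclusive; none of these is confirmed by this page.

```python
import random
from datetime import datetime


def sample_rows(rows, size=0, seed=1234, from_timestamp=None, to_timestamp=None):
    """Hypothetical helper: filter (timestamp, value) rows by date range, then sample."""

    def to_dt(ts):
        # Accept either a datetime or an ISO-formatted string such as "2020-01-01".
        return datetime.fromisoformat(ts) if isinstance(ts, str) else ts

    start, end = to_dt(from_timestamp), to_dt(to_timestamp)

    # Assumption: start is inclusive, end is exclusive.
    in_range = [
        (ts, value)
        for ts, value in rows
        if (start is None or ts >= start) and (end is None or ts < end)
    ]

    # Assumption: size=0 disables sampling and keeps every row in range.
    if size == 0 or size >= len(in_range):
        return in_range

    # A fixed seed makes the sample reproducible across calls.
    return random.Random(seed).sample(in_range, size)
```

Calling the helper twice with the same seed returns the same rows, which is the practical reason for exposing a seed parameter.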

Returns

  • DataFrame
    Summary statistics of the view column.

Examples

Get summary of a column.

>>> catalog.get_view("GROCERYPRODUCT")["ProductGroup"].describe()
                ProductGroup
dtype                VARCHAR
unique                    87
%missing                 0.0
%empty                     0
entropy              4.13031
top       Chips et Tortillas
freq                  1319.0
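
To make the rows of this output easier to read, the following sketch computes comparable statistics for a plain Python list. It is an illustration only, not featurebyte's implementation. The helper name describe_categorical is hypothetical, %missing is assumed to be a percentage of all rows, and entropy is assumed to be Shannon entropy with the natural logarithm. The sketch requires at least one non-missing value.

```python
import math
from collections import Counter


def describe_categorical(values):
    """Hypothetical helper: summary statistics for a list of categorical values."""
    total = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)

    # Assumption: Shannon entropy over value frequencies, natural log base.
    entropy = -sum(
        (c / len(present)) * math.log(c / len(present)) for c in counts.values()
    )

    # Most frequent value and how often it occurs.
    top, freq = counts.most_common(1)[0]

    return {
        "unique": len(counts),
        # Assumption: expressed as a percentage of all rows.
        "%missing": 100.0 * (total - len(present)) / total,
        "entropy": entropy,
        "top": top,
        "freq": freq,
    }
```

Under these assumptions, top is the most frequent value and freq is the number of rows holding that value.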

Get summary of a column, restricted to a timestamp range.

>>> catalog.get_view("GROCERYINVOICE")["Amount"].describe(
...     from_timestamp="2020-01-01", to_timestamp="2023-01-31"
... )