featurebyte.TableColumn.describe¶

describe(
    size: int=0,
    seed: int=1234,
    from_timestamp: Union[datetime, str, NoneType]=None,
    to_timestamp: Union[datetime, str, NoneType]=None,
    after_cleaning: bool=False
) -> DataFrame

Description¶

Returns descriptive statistics for the table column. By default, the statistics are computed before the cleaning operations defined at the table level are applied.

Parameters¶

size: int
default: 0
Maximum number of rows to sample. If 0, all rows will be used.
seed: int
default: 1234
Seed to use for random sampling.
from_timestamp: Union[datetime, str, NoneType]
default: None
Start of date range to sample from.
to_timestamp: Union[datetime, str, NoneType]
default: None
End of date range to sample from.
after_cleaning: bool
default: False
Whether to compute the descriptive statistics after the cleaning operations have been applied.
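
The way `size`, `seed`, and the timestamp range interact can be pictured with a small local analogue. The sketch below is not FeatureByte's implementation: it uses only the Python standard library, and the helper `sample_rows`, the `(timestamp, value)` row layout, and the boundary handling (start inclusive, end exclusive) are assumptions made purely for illustration.

```python
import random
from datetime import datetime


def sample_rows(rows, size=0, seed=1234, from_timestamp=None, to_timestamp=None):
    """Hypothetical helper that mimics the documented parameters on (timestamp, value) rows."""
    # Keep only rows inside the requested date range
    # (assumed here: start inclusive, end exclusive).
    selected = [
        (ts, value)
        for ts, value in rows
        if (from_timestamp is None or ts >= from_timestamp)
        and (to_timestamp is None or ts < to_timestamp)
    ]
    # size=0 means "use all rows"; otherwise size is an upper bound on the sample.
    if size == 0 or size >= len(selected):
        return selected
    # A fixed seed makes the random sample reproducible.
    return random.Random(seed).sample(selected, size)


# Thirty made-up rows, one per day from 2020-01-01 to 2020-01-30.
rows = [(datetime(2020, 1, day), float(day)) for day in range(1, 31)]

all_rows = sample_rows(rows)
first_sample = sample_rows(rows, size=5, seed=1234)
second_sample = sample_rows(rows, size=5, seed=1234)
first_half = sample_rows(
    rows,
    from_timestamp=datetime(2020, 1, 1),
    to_timestamp=datetime(2020, 1, 16),
)
```

In this sketch, `all_rows` contains every row, the two seeded samples are identical, and `first_half` is restricted to the requested date range. Reusing the same `seed` is what makes repeated calls comparable.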

Returns¶

DataFrame
Summary of the table column.

Examples¶

Describe a table column without applying cleaning operations.

>>> from datetime import datetime
>>> event_table = catalog.get_table("GROCERYINVOICE")
>>> description = event_table["Amount"].describe(
...     from_timestamp=datetime(2020, 1, 1),
...     to_timestamp=datetime(2020, 1, 31),
... )

Describe a table column after cleaning operations have been applied.

>>> import featurebyte as fb
>>> event_table = catalog.get_table("GROCERYINVOICE")
>>> event_table["Amount"].update_critical_data_info(
...     cleaning_operations=[
...         fb.MissingValueImputation(imputed_value=0),
...     ]
... )

>>> description = event_table["Amount"].describe(
...     from_timestamp=datetime(2020, 1, 1),
...     to_timestamp=datetime(2020, 1, 31),
...     after_cleaning=True,
... )
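
To see why `after_cleaning` can change the reported statistics, consider what imputing missing values with 0 does to a column's mean. This is a minimal sketch in plain Python, not a FeatureByte call: the `values` list is made up, and it assumes that missing entries are skipped when the mean is computed before cleaning, which may differ from the behaviour of the underlying data warehouse.

```python
from statistics import mean

# Made-up column values; None stands for a missing entry.
values = [10.0, None, 20.0, None]

# Before cleaning: missing entries are skipped (an assumption of this sketch).
mean_before = mean([v for v in values if v is not None])

# After cleaning: each missing entry is replaced with the imputed value 0.
cleaned = [0.0 if v is None else v for v in values]
mean_after = mean(cleaned)

print(mean_before, mean_after)
```

Here the mean drops from 15.0 to 7.5 once the two missing entries are counted as zeros, which is the kind of shift to expect when comparing the output of `describe()` with and without `after_cleaning=True`.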