featurebyte.SourceTable.describe
describe(
size: int=0,
seed: int=1234,
from_timestamp: Union[datetime, str, NoneType]=None,
to_timestamp: Union[datetime, str, NoneType]=None,
after_cleaning: bool=False
) -> DataFrameDescription
Returns descriptive statistics of the table columns.
Parameters
- size: int
default: 0
Maximum number of rows to sample. If 0, all rows are used.
- seed: int
default: 1234
Seed to use for random sampling.
- from_timestamp: Union[datetime, str, NoneType]
default: None
Start of the date range to sample from.
- to_timestamp: Union[datetime, str, NoneType]
default: None
End of the date range to sample from.
- after_cleaning: bool
default: False
Whether to compute the statistics after applying the cleaning operations defined for the table columns.
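The timestamp bounds restrict the rows to a date range, and `size` and `seed` then control a reproducible random subset of that range. The following is a minimal local sketch of that behaviour in plain Python, not featurebyte's implementation: the helpers `sample_rows` and `_to_datetime`, the ISO 8601 parsing of string timestamps, and the inclusive-start/exclusive-end convention are all assumptions made for illustration.

```python
import random
from datetime import datetime
from typing import List, Optional, Union

# Hypothetical alias mirroring the accepted timestamp types.
TimestampLike = Union[datetime, str, None]


def _to_datetime(value: TimestampLike) -> Optional[datetime]:
    # None means "no bound"; strings are assumed to be ISO 8601.
    if value is None or isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def sample_rows(
    rows: List[dict],
    timestamp_column: str,
    size: int = 0,
    seed: int = 1234,
    from_timestamp: TimestampLike = None,
    to_timestamp: TimestampLike = None,
) -> List[dict]:
    start = _to_datetime(from_timestamp)
    end = _to_datetime(to_timestamp)
    # Keep rows inside the range (assumed: start inclusive, end exclusive).
    in_range = [
        row
        for row in rows
        if (start is None or row[timestamp_column] >= start)
        and (end is None or row[timestamp_column] < end)
    ]
    # size == 0 keeps every row in range; otherwise size is only an upper bound.
    if size == 0 or size >= len(in_range):
        return in_range
    # A seeded generator makes the sample reproducible.
    return random.Random(seed).sample(in_range, size)


rows = [{"Timestamp": datetime(2022, 1, day), "Amount": float(day)} for day in range(1, 31)]
january_10_to_19 = sample_rows(
    rows, "Timestamp", from_timestamp="2022-01-10", to_timestamp=datetime(2022, 1, 20)
)
first = sample_rows(rows, "Timestamp", size=5, seed=42)
second = sample_rows(rows, "Timestamp", size=5, seed=42)
```

Calling the sketch twice with the same `seed` returns the same rows, which is the property the `seed` parameter is meant to provide.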
Returns
- DataFrame
Summary of the table.
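The summary has one column per table column and one row per statistic, such as `unique`, `%missing`, `top` and `freq`. As a rough illustration of what those rows mean, this sketch computes them for a single column held as a Python list, with `None` standing in for a missing value. The function `summarize_column` is hypothetical, and treating `%missing` as a percentage from 0 to 100 is an assumption; featurebyte's own definitions may differ.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def summarize_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Hypothetical per-column summary; None marks a missing value."""
    non_null = [v for v in values if v is not None]
    missing = len(values) - len(non_null)
    summary: Dict[str, Any] = {
        # Number of distinct non-missing values.
        "unique": len(set(non_null)),
        # Assumption: %missing is a percentage between 0 and 100.
        "%missing": 100.0 * missing / len(values) if values else 0.0,
    }
    if non_null:
        # Most frequent value and the number of times it occurs.
        top, freq = Counter(non_null).most_common(1)[0]
        summary["top"] = top
        summary["freq"] = freq
    return summary


customers = ["c1", "c2", "c1", None, "c1"]
summary = summarize_column(customers)
```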
Examples
Get a summary of a table, restricted to a date range.
>>> from datetime import datetime
>>> catalog.get_table("GROCERYINVOICE").describe(
...     from_timestamp=datetime(2022, 1, 1),
...     to_timestamp=datetime(2022, 12, 31),
... )
                            GroceryInvoiceGuid                   GroceryCustomerGuid                      Timestamp            record_available_at     Amount
dtype                                  VARCHAR                               VARCHAR                      TIMESTAMP                      TIMESTAMP      FLOAT
unique                                   25422                                   471                          25399                           5908       6734
%missing                                   0.0                                   0.0                            0.0                            0.0        0.0
%empty                                       0                                     0                            NaN                            NaN        NaN
entropy                               6.214608                              5.784261                            NaN                            NaN        NaN
top       018f0163-249b-4cbc-ab4d-e933ce3786c1  c5820998-e779-4d62-ab8b-79ef0dfd841b                            NaN                            NaN        NaN
freq                                       1.0                                 692.0                            NaN                            NaN        NaN
mean                                       NaN                                   NaN                            NaN                            NaN  19.966062
std                                        NaN                                   NaN                            NaN                            NaN  25.027878
min                                        NaN                                   NaN  2022-01-01T00:24:14.000000000  2022-01-01T01:01:00.000000000        0.0
25%                                        NaN                                   NaN                            NaN                            NaN     4.5325
50%                                        NaN                                   NaN                            NaN                            NaN     10.725
75%                                        NaN                                   NaN                            NaN                            NaN      24.99
max                                        NaN                                   NaN  2022-12-30T22:37:57.000000000  2022-12-30T23:01:00.000000000     360.84
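For numeric columns such as `Amount`, the summary adds `mean`, `std`, `min`, the quartiles and `max`. Below is a minimal sketch of those rows using Python's `statistics` module, assuming a sample standard deviation and linearly interpolated quartiles (the conventions of `pandas.DataFrame.describe`). featurebyte may use different estimators, and `summarize_numeric` is a hypothetical helper.

```python
import statistics
from typing import Dict, List


def summarize_numeric(values: List[float]) -> Dict[str, float]:
    """Hypothetical numeric summary; needs Python 3.8+ and at least two values."""
    # "inclusive" interpolates linearly between the observed min and max.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        # Sample standard deviation (divides by n - 1).
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


amounts = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = summarize_numeric(amounts)
```

`method="inclusive"` matches NumPy's default linear interpolation; the `statistics` default, `"exclusive"`, gives different quartiles on small inputs.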
See Also
- Table.preview: Retrieve a preview of a table.
- Table.sample: Retrieve a sample of a table.