Changelog¶

v1.0.2 (2024-03-15)¶

🐛 Bug Fixes¶

service Databricks integration fix

v1.0.1 (2024-03-12)¶

💡 Enhancements¶

api Support description specification during table creation.
api Create API to manage online stores
session Specify role and group in Snowflake and Databricks details to enforce permissions for accessing source and output tables
service Simplify user defined function route creation schema
online_serving Implement FEAST offline stores for Spark Thrift and DataBricks for online serving support
service Compute data description in batches of columns
service Support offset parameter for aggregate_asat
profile Create a profile from databricks secrets to simplify access from a Databricks workspace.
service Improve efficiency of feature table cache checks for saved feature lists
session Add client_session_keep_alive to snowflake connector to keep the session alive
service Support cancellation for historical features table creation task

🐛 Bug Fixes¶

service Updates output variable type of count aggregation to be integer instead of float
service Fix FeatureList online_enabled_feature_ids attribute not updated correctly in some cases
session Fix snowflake session using wrong role if the user's default role does not match role in feature store details
session Fix count dictionary entropy UDF behavior for edge cases
deployment Fix failure to get sample entity serving names for a deployment when the entity has null values
service Fix ambiguous column name error when using SCD lookup features with different offsets

v1.0.0 (2023-12-21)¶

💡 Enhancements¶

session Implement missing UDFs for DataBricks clusters that support Unity Catalog.
storage Support azure blob storage for file storage.

🐛 Bug Fixes¶

service Fixes a bug where feature saving would fail if the feature or column name contains quotes.
deployment Fix an issue where periodic tasks were not disabled when reverting a failed deployment

v0.6.2 (2023-12-01)¶

🛑 Breaking Changes¶

api Support using observation tables in feature, target and featurelist preview
Parameter observation_set in Feature.preview, Target.preview and FeatureList.preview now accepts ObservationTable object or pandas dataframe
Breaking change: Parameter observation_table in FeatureList.compute_historical_feature_table is renamed to observation_set
feature_list Change feature list catalog output dataframe column name from primary_entities to primary_entity

💡 Enhancements¶

databricks-unity Add session for databricks unity cluster, and migrate one UDF to python for databricks unity cluster.
target Allow users to create observation table with just a target id, but no graph.
service Support latest aggregation for vector columns
service Update repeated columns validation logic to handle excluded columns.
endpoints Enable observation table to associate with multiple use cases from endpoints
target Derive window for lookup targets as well
service Add critical data info validation logic
api Implement remove observation table from context
service Support rename of context, use case, observation table and historical feature table
target_table Persist primary entity IDs for the target observation table
observation_table Update observation table creation check to make sure primary entity is set
service Implement service to materialize features to be published to external feature store
service Add feature definition hash to new feature model to allow duplicated features to be detected
observation_table Track uploaded file name when creating an observation table from an uploaded file.
observation_table Add way to update purpose for observation table.
tests Use published featurebyte library in notebook tests.
service Reduce complexity of describe query to avoid memory issue during query compilation
session Use DBFS for Databricks session storage to simplify setup
target_namespace Add support for target namespace deletion
observation_table add minimum interval between entities to observation table
api Implement delete observation table from use case
api Implement removal of default preview and eda table for context
api Enable observation table to associate with multiple use cases from api
api Implement removal of default preview and eda table for use case

🐛 Bug Fixes¶

observation_table fix validation around primary entity IDs when creating observation tables
worker Use cpu worker for feature job setting analysis to avoid blocking io worker async loop
session Make data warehouse session creation asynchronous with a timeout to avoid blocking the asyncio main thread. This prevents the API service from being unresponsive when certain compute clusters take a long time to start up.
service Fix observation table sampling so that it is always uniform over the input
worker Fix feature job setting analysis failing for Databricks feature store
session Fix spark session failing with spark version >= 3.4.1
service Fix observation table file upload error
target Support value_column=None for count in forward_aggregate/target operations.
service Fix division by zero error when calling describe on empty views
worker Fix bug where feature job setting analysis backtest fails when the analysis is missing an optional histogram
service Fixes a view join issue that causes the generated feature not savable due to graph inconsistency.
use_case Allow use cases to be created with descriptive only targets
service Fixes an error when rendering FeatureJobStatusResult in notebooks when matplotlib package is not available.
feature Fix feature saving bug when the feature contains timestamp filtering

v0.6.1 (2023-11-22)¶

🐛 Bug Fixes¶

api fixed async task return code

v0.6.0 (2023-10-10)¶

🛑 Breaking Changes¶

observation_table Validate that entities are present when creating an observation table.

💡 Enhancements¶

target Use window from target namespace instead of the target version.
service UseCase creation to accept TargetNameSpace id as a parameter
historical_feature_table Make FeatureClusters optional when creating historical feature table from UI.
service Move online serving code template generation to the online serving service
model Handle old Context records with entity_ids attribute in the database
service Add key_with_highest_value() and key_with_lowest_value() for cross aggregates
api Add consistent table feature job settings validation during feature creation.
api Change Context Entity attribute's name to Primary Entity
api Use primary entity parameter in Target and Context creation
service Add last_updated_at in FeatureModel to indicate when feature value is last updated
api Revise feature list create new version to avoid throwing error when the feature list is the same as the previous version
service Support rprefix parameter in View's join method
observation_table Add an optional purpose to observation table when creating a new observation table.
docs Documentation for Context and UseCase
observation_table Track earliest point in time, and unique entity col counts as part of metadata.
service Support extracting value counts and customised statistics in PreviewService
api Remove direct observation table reference from UseCase
warehouse improve data warehouse asset validation
api Use EntityBriefInfoList for entity info for both UseCase and Context
api Add trigo functions to series.
api Include observation table operation into Context API Object
observation_table Add route to allow users to upload CSV files to create observation tables.
target Tag entity_ids when creating an observation table from a target.
api-client improve api-client retry
service Entity Validation for Context, Target and UseCase
service Add Context Info method into both Context API Object and Route
api Add functionality to calculate haversine distance.
service Fix PreviewService describe() method when stats_names are provided
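
For readers unfamiliar with the haversine distance mentioned in the enhancements above, the following is a plain-Python sketch of the formula itself. It does not use the FeatureByte API: the function name, the argument order and the mean Earth radius of 6371 km are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the library


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Two points on the equator that are 90 degrees of longitude apart are a quarter of a great circle apart, which is a convenient sanity check for any implementation.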

🐛 Bug Fixes¶

service Validate non-existent Target and Context when creating Use Case
session Fix execute query failing when variant columns contain null values
service Validate null target_id when adding obs table to use case
service Fix maximum recursion depth exceeded error in complex queries
service Fix race condition when accessing cached values in ApiObject's get_by_id()
hive fix hive connection error when spark_catalog is not the default
api Target#list should include items in target namespace.
target Fix target definition SDK code generation by skipping project.
service Fix join validation logic to account for rprefix

v0.5.1 (2023-09-08)¶

💡 Enhancements¶

service Optimize feature readiness service update runtime.

🐛 Bug Fixes¶

packaging Restore cryptography package dependency [DEV-2233]

v0.5.0 (2023-09-06)¶

🛑 Breaking Changes¶

Configurations Configurations::use_profile() function is now a method rather than a classmethod
```
- Configurations.use_profile("profile")
+ Configurations().use_profile("profile")
```
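
As a rough illustration of why the old call form stops working, here is a minimal sketch built on a hypothetical stand-in class. The `Configurations` class below is not the real FeatureByte class, and the `active_profile` attribute and the return value are invented for the example. Once `use_profile` is a regular method, calling it on the class passes the profile name in the `self` position, so the actual argument is missing.

```python
class Configurations:
    """Hypothetical stand-in, not the real featurebyte class."""

    def __init__(self):
        self.active_profile = None  # invented attribute for illustration

    def use_profile(self, profile_name):
        # Regular instance method: needs an instance bound to `self`.
        self.active_profile = profile_name
        return self


# New call form: instantiate first, then call the method.
config = Configurations().use_profile("profile")

# Old call form: "profile" lands in `self`, so `profile_name` is missing.
try:
    Configurations.use_profile("profile")
    old_form_error = None
except TypeError as exc:
    old_form_error = exc
```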

💡 Enhancements¶

service Cache view created from query in Spark for better performance
vector-aggregation Add java UDAFs for sum and max for use in spark.
vector-operations Add cosine_similarity to compare two vector columns.
vector-aggregation Add integration test to test end to end for VECTOR_AGGREGATE_MAX.
vector-aggregations Enable vector aggregations for tiling aggregate - max and sum - functions
middleware Organize exceptions to reduce verbosity in middleware
api Add support for updating description of table columns in the python API
vector-aggregation Update groupby logic for non tile based aggregates
api Implement API object for Use Case component
api Use Context name instead of Context id for the API signature
api Implement API object for Context
vector_aggregation Add UDTF for max, sum and avg for snowflake.
api Integrate Context API object for UseCase
vector-aggregation Snowflake return values for vector aggregations should be a list now, instead of a string.
vector-aggregation Add java UDAFs for average for use in spark.
vector_aggregation Only return one row in table vector aggregate function per partition
service Support conditionally updating a feature using a mask derived from other feature(s)
vector-aggregation Add guardrails to prevent array aggregations if agg func is not max or avg.
service Tag semantics for all special columns during table creation
api Implement UseCase Info
service Change join type to inner when joining event and item tables
vector-aggregation Register vector aggregate max, and update parent dtype inference logic.
service Implement scheduled task to clean up stale versions and drop online store tables when possible
use-case Implement guardrail for use case's observation table not to be deleted
vector-aggregations Enable vector aggregations for tiling aggregate avg function
api Rename description update functions for versioned assets
vector-aggregation Support integer values in vectors; add support integration test for simple aggregates
vector-aggregation Update groupby_helper to take in parent_dtype.
httpClient added an ssl_verify value in Configurations to allow disabling of SSL certificate verification
online-serving Split online store compute and insert query to minimize table locking
tests Use the notebook as the test id in the notebook tests.
vector-aggregation Add simple average spark udaf.
vector-aggregation Add average snowflake udtf.
api Associate Deployment with UseCase
service Skip creating a data warehouse session when online disabling a feature
use-case implement use case model and its associated routes
service Apply event timestamp filter on EventTable directly in scheduled tile jobs when possible
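
Several entries above concern vector columns. The sketch below shows, in plain Python, what cosine similarity between two vectors computes and what a max aggregation over vectors could look like. It is not the library's implementation: the function names, the convention of returning 0.0 for a zero vector, and the assumption that vector aggregation is element-wise are all illustrative choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


def elementwise_max(vectors):
    """Aggregate equal-length vectors by taking the max per position (assumed semantics)."""
    return [max(values) for values in zip(*vectors)]
```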

🐛 Bug Fixes¶

worker Block running multiple concurrent deployment create/update tasks for the same deployment
service Fix bug where feature job starts running while the feature is still being enabled
dependencies upgrading scipy dependency
service Fixes an invalid identifier error in sql when feature involves a mix of filtered and non-filtered versions of the same view.
worker Fixes a bug where scheduler does not work with certain mongodb uris.
online-serving Fix incompatible column types when inserting to online store tables
service Fix feature saving error due to tile generation bug
service Ensure row ordering of online serving output DataFrame matches input request data
dependencies Limiting python range to >=3.8,<3.12 due to scipy constraint
service Use execute_query_long_running when inserting to online store tables to fix timeout errors
model Mongodb index on periodic task name conflicts with scheduler engine
service Fix conversion of date type to double in spark

v0.4.4 (2023-08-29)¶

🐛 Bug Fixes¶

api Fix logic for determining timezone offset column in datetime accessor
service Fix SDK code generation for conditional assignment when the assign value is a series
service Fix invalid identifier error for complex features with both item and window aggregates

💡 Enhancements¶

profile Allow creating of profile directly with fb.register_profile(name, url, token)

v0.4.3 (2023-08-21)¶

🐛 Bug Fixes¶

service Fix feature materialization error due to ambiguous internal column names
service Fix error when generating info for features in some edge cases
api Fix item table default job settings not synchronized when job settings are updated in the event table, fix historical feature table listing failure

v0.4.2 (2023-08-07)¶

🛑 Breaking Changes¶

target Update compute_target to return an observation table instead of a target table. This makes it easier to use with compute historical features.
target Update target info to return a TableBriefInfoList instead of a custom struct. This keeps it consistent with feature, and also fixes a bug in info where we wrongly assumed there was only one input table.

💡 Enhancements¶

target Add as_target to SDK, and add node to graph when it is called
target Add fill_value and skip_fill_na to forward_aggregate, and update name
target Create lookup target graph node
service Speed up operation structure extraction by caching the result of _extract() in BaseGraphExtractor

🐛 Bug Fixes¶

api Fix api objects listing failure in some notebooks environments
utils Fix is_notebook check to support Google Colab [https://github.com/featurebyte/featurebyte/issues/1598]

v0.4.1 (2023-07-25)¶

🛑 Breaking Changes¶

online-serving Update online store table schema to use long table format
dependencies Limiting python version from >=3.8,<4.0 to >=3.8,<3.13 due to scipy version constraint

💡 Enhancements¶

generic-function add user-defined-function support
target add basic API object for Target. Initialize the basic API object for Target.
feature-group update the feature group save operation to use /feature/batch route
service Update describe query to be compatible with Spark 3.2
service Ensure FeatureModel's derived attributes are derived from pruned graph
target add basic info for Target. Adds some basic information about Targets. Additional information that contains more details about the actual data will be added in a follow-up.
list_versions update Feature's & FeatureList's list_versions method by adding is_default to the dataframe output
service Move TILE_JOB_MONITOR table from data warehouse to persistent
service Avoid using SHOW COLUMNS to support Spark 3.2
table skip calling data warehouse for table metadata during table construction
target add ForwardAggregate node to graph for ForwardAggregate. Implement ForwardAggregator - only adds node to graph. Node is still a no-op.
service Add option to disable audit logging for internal documents
query-graph optimize query graph pruning computation by combining multiple pruning tasks into one
target add input data and metadata for targets. Add more information about target metadata.
target Add primary_entity property to Target API object.
service Refactor FeatureManager and TileManager as services
tests Move tutorial notebooks into the FeatureByte repo
service Replace ONLINE_STORE_MAPPING data warehouse table by OnlineStoreComputeQueryService
feature block feature readiness & feature list status transition from DRAFT to DEPRECATED
task_manager refactor task manager to take celery object as a parameter, and refactor task executor to import tasks explicitly
feature fix bug with feature_list_ids not being updated after a feature list is deleted
service Replace TILE_FEATURE_MAPPING table in the data warehouse with mongo persistent
target perform SQL generation for forward aggregate node
feature fix primary entity identification bug for time aggregation over item aggregation features
feature limit manual default feature version selection to only the versions with highest readiness level
feature-list revise feature list saving to reduce api calls
service Refactor tile task to use dependency injection
service Fix error when disabling features created before OnlineStoreComputeQueryService is introduced
deployment Skip redundant updates of ONLINE_STORE_MAPPING table
static-source-table support materialization of static source table from source table or view
catalog Create target_table API object. Remove default catalog, require explicit activation of catalog before catalog operations.
feature-list update feature list to preserve feature order
target Add gates to prevent target from setting item to non-target series.
target Add TargetNamespace#create. This will allow us to register spec-like versions of a Target, that don't have a recipe attached.
deployment Reduce unnecessary backfill computation when deploying features
service Refactor TileScheduler as a service
target stub out target namespace schema and models
service Add traceback to tile job log for troubleshooting
target add end-to-end integration test for target, and include preview endpoint in target
feature update feature & feature list save operation to use POST /feature/batch route
service Disable tile monitoring by default
service Fix listing of databases and schemas in Spark 3.2
target Refactor compute_target and compute_historical_feature
feature optimize time to deserialize feature model
entity-relationship remove POST /relationship_info, POST /entity/parent and DELETE /entity/parent/ endpoints
service Support description update and retrieval for all saved objects
config Add default_profile in config to allow for a default profile to be set, and require a profile to be set if default_profile is not set
target Create target_table API object. Create the TargetTable API object, and stub out the compute_target endpoint.
target Add datetime and string accessors into the Target API object.
service Fix unnecessary usage of SQL functions incompatible with Spark 3.2 (ILIKE and DATEADD)
preview Improve efficiency of feature and feature list preview by reducing unnecessary tile computation
service Fix DATEADD undefined function error in Spark 3.2 and re-enable tests
service Implement TileRegistryService to track tile information in mongo persistent
spark-session add kerberos authentication and webhdfs support for Spark session
service Fix compatibility of string contains operation with Spark 3.2
target add CRUD API endpoints for Target. First portion of the work to include the Target API object.
target Fully implement compute_target to materialize a dataframe
service Refactor info service by splitting out logic to their respective services. Most of the info service logic was not being reused. It also feels cleaner for each service to be responsible for its own info logic. This way, dependencies are clearer. We also refactor service initialization such that we consistently use the dependency injection pattern.
online-serving Use INSERT operation to update online store tables to address concurrency issues
target create target namespace when we create a target
service Fix more datetime transform compatibility issues in Spark 3.2
storage Add support for using s3 as storage for featurebyte service
target Create target_table services, routes, models and schema. This will help us support materializing target tables in the warehouse.

⚠️ Deprecations¶

target remove blind_spot from target models as it is not used

🐛 Bug Fixes¶

worker fixed cpu threading model
service Fix feature definition for isin() operation
online-serving Fix the job_schedule_ts_str parameter when updating online store tables in scheduled tile tasks
gh-actions Add missing build dependencies for kerberos support.
feature_readiness fix feature readiness bug caused by readiness being treated as a string when finding default feature ID
transforms Update get_relative_frequency to return 0 when there is no matching label
service Fix OnlineStoreComputeQuery prematurely deleted when still in use by other features
data-warehouse Fix metadata schema update for Spark and Databricks and bump working version
service Fix TABLESAMPLE syntax error in Spark for very small sample percentage
feature fix view join operation bug which causes improper query graph pruning
service Fix a bug in add_feature() where entity_id was incorrectly attached to the derived column
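
The `get_relative_frequency` fix above can be pictured with a small sketch over a plain dictionary of counts. This is not the library's code: the function name, its signature and the handling of an empty dictionary are assumptions. It only illustrates the behaviour described in the entry, namely returning 0 when the label is absent.

```python
def relative_frequency(counts, label):
    """Share of `label` in a dict of counts; 0.0 when the label is absent."""
    total = sum(counts.values())
    if total == 0 or label not in counts:
        # No matching label (or nothing counted at all): report 0.
        return 0.0
    return counts[label] / total
```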

v0.4.0 yanked (2023-07-25)¶

v0.3.1 (2023-06-08)¶

🐛 Bug Fixes¶

websocket make websocket client more resilient to lost connections
websocket fix client failure when starting secure websocket connection

v0.3.0 (2023-06-05)¶

💡 Enhancements¶

guardrails add guardrail to make sure *Table creation does not contain shared column name in different parameters
feature-list add default_feature_fraction to feature list object
datasource check if database/schema exists when listing schemas/tables in a datasource
error-handling improve error handling and messaging for Docker exceptions
feature-list Refactor compute_historical_features() to use the materialized table workflow
workflows Update daily cron, dependencies and lint workflows to use code defined github workflows.
feature refactor feature object to remove unused entity_identifiers, protected_columns & inherited_columns properties
scheduler implement soft time limit for io tasks using gevent celery worker pool
list_versions() add is_default column to feature's & feature list's list_versions object method output DataFrame
feature refactor feature class to drop FrozenFeatureModel inheritance
storage support GCS storage for Spark and DataBricks sessions
variables expose catalog_id property in the Entity and Relationship API objects
historical-features Compute historical features in batches of columns
view-object add column_cleaning_operations to view object
logging support overriding default log level using environment variable LOG_LEVEL
list_versions() remove feature_list_namespace_id and num_feature from feature_list.list_versions()
feature-api-route remove entity_ids from feature creation route payload
historical-features Improve tile cache performance by reducing unnecessary recalculation of tiles for historical requests
worker support scheduler, worker:io, worker:cpu in startup command to start different services
feature-list add default_feature_list_id to feature_list.info() output
feature remove feature_namespace_id (feature_list_namespace_id) from feature (feature list) creation payload
docs automatically create debug folder if it doesn't exist when running docs
feature-list add primary_entities to feature list's list() method output DataFrame
feature add POST /feature/batch endpoint to support batch feature creation
table-column add cleaning_operations to table column object & view column object
workflows Update workflows to use code defined github workflows.
feature-session Support Azure blob storage for Spark and DataBricks sessions
feature update feature's & feature list's version format from dictionary to string
feature-list refactor feature list class to drop FrozenFeatureListModel inheritance
display implement HTML representation for API objects .info() result
feature remove dtype from feature creation route payload
aggregate-asat Support cross aggregation option for aggregate_asat.
databricks support streamed records fetching for DataBricks session
feature-definition update feature definition by explicitly specifying on parameter in join operation
source-table-listing Exclude tables with names that have a "__" prefix in source table listing
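
One of the entries above mentions overriding the default log level through the `LOG_LEVEL` environment variable. A generic way to honour such a variable with Python's standard `logging` module is sketched below. The helper name, the `INFO` default and the fallback for unknown values are assumptions, and the library's actual handling may differ.

```python
import logging
import os


def resolve_log_level(default="INFO"):
    """Return a numeric log level from LOG_LEVEL, falling back to `default`."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)
    # getLevelName returns an int for known level names and a string otherwise.
    if isinstance(level, int):
        return level
    return logging.getLevelName(default)
```

A caller could then pass the result to `logging.basicConfig(level=resolve_log_level())` at startup.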

⚠️ Deprecations¶

middleware removed TelemetryMiddleware
feature-definition remove unused statement from feature.definition
FeatureJobSettingAnalysis remove analysis_parameters from FeatureJobSettingAnalysis.info() result

🐛 Bug Fixes¶

relationship fixed bug that was causing an error when retrieving a Relationship with no updated_by set
dependencies updated requests package due to a vulnerability
mongodb mongodb logs to be shipped to stderr to reduce disk usage
deployment fix multiple deployments sharing the same feature list bug
dependencies updated pymdown-extensions due to vulnerability CVE-2023-32309
dependencies fixed vulnerability in starlette
api-client API client should not handle 30x redirects as these can result in unexpected behavior
mongodb update get_persistent() by removing global persistent object (which is not thread safe)
feature-definition fixed bug in feature.definition so that it is consistent with the underlying query graph

v0.2.2 (2023-05-10)¶

💡 Enhancements¶

Update healthcare demo dataset to include timezone columns

🐛 Bug Fixes¶

Drop a materialized table only if it exists when cleaning up on error
Added dependencies workflow to repo to check for dependency changes in PRs
Fixed taskfile java tasks to properly cache the downloaded jar files.

v0.2.1 (2023-05-10)¶

🐛 Bug Fixes¶

Removed additional dependencies specified in featurebyte client

v0.2.0 (2023-05-08)¶

🛑 Breaking Changes¶

featurebyte is now available for early access