Changelog

v1.0.2 (2024-03-15)

🐛 Bug Fixes

  • service Databricks integration fix

v1.0.1 (2024-03-12)

💡 Enhancements

  • api Support description specification during table creation.
  • api Create api to manage online stores
  • session Specify role and group in Snowflake and Databricks details to enforce permissions for accessing source and output tables
  • service Simplify user defined function route creation schema
  • online_serving Implement FEAST offline stores for Spark Thrift and Databricks for online serving support
  • service Compute data description in batches of columns
  • service Support offset parameter for aggregate_asat
  • profile Create a profile from databricks secrets to simplify access from a Databricks workspace.
  • service Improve efficiency of feature table cache checks for saved feature lists
  • session Add client_session_keep_alive to snowflake connector to keep the session alive
  • service Support cancellation for historical features table creation task

🐛 Bug Fixes

  • service Updates output variable type of count aggregation to be integer instead of float
  • service Fix FeatureList online_enabled_feature_ids attribute not updated correctly in some cases
  • session Fix snowflake session using wrong role if the user's default role does not match role in feature store details
  • session Fix count dictionary entropy UDF behavior for edge cases
  • deployment Fix failure when getting sample entity serving names for a deployment whose entity has null values
  • service Fix ambiguous column name error when using SCD lookup features with different offsets
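
For readers unfamiliar with the count dictionary entropy mentioned in the bug fixes above, here is a minimal Python sketch of Shannon entropy computed over a dictionary of counts. The function name `count_dict_entropy`, the choice of natural logarithm, and the handling of empty or zero-count inputs are assumptions made for illustration; the changelog does not say which edge cases the actual UDF fix covers.

```python
import math


def count_dict_entropy(counts):
    """Shannon entropy (natural log) of the distribution implied by a dict of counts."""
    # Assumption: non-positive counts are ignored rather than raising an error.
    positive = [c for c in counts.values() if c > 0]
    total = sum(positive)
    if total == 0:
        # Assumption: an empty or all-zero dictionary has entropy 0.
        return 0.0
    # Each term is p * log(1 / p) with p = c / total.
    return sum((c / total) * math.log(total / c) for c in positive)
```

A dictionary with a single key yields 0, and two equally frequent keys yield `math.log(2)`.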

v1.0.0 (2023-12-21)

💡 Enhancements

  • session Implement missing UDFs for Databricks clusters that support Unity Catalog.
  • storage Support azure blob storage for file storage.

🐛 Bug Fixes

  • service Fixes a bug where feature saving would fail if the feature or column name contains quotes.
  • deployment Fix an issue where periodic tasks were not disabled when reverting a failed deployment

v0.6.2 (2023-12-01)

🛑 Breaking Changes

  • api Support using observation tables in feature, target and featurelist preview
  • Parameter observation_set in Feature.preview, Target.preview and FeatureList.preview now accepts ObservationTable object or pandas dataframe
  • Breaking change: Parameter observation_table in FeatureList.compute_historical_feature_table is renamed to observation_set
  • feature_list Change feature list catalog output dataframe column name from primary_entities to primary_entity

💡 Enhancements

  • databricks-unity Add session for databricks unity cluster, and migrate one UDF to python for databricks unity cluster.
  • target Allow users to create observation table with just a target id, but no graph.
  • service Support latest aggregation for vector columns
  • service Update repeated columns validation logic to handle excluded columns.
  • endpoints Enable observation table to associate with multiple use cases from endpoints
  • target Derive window for lookup targets as well
  • service Add critical data info validation logic
  • api Implement remove observation table from context
  • service Support rename of context, use case, observation table and historical feature table
  • target_table Persist primary entity IDs for the target observation table
  • observation_table Update observation table creation check to make sure primary entity is set
  • service Implement service to materialize features to be published to external feature store
  • service Add feature definition hash to new feature model to allow duplicated features to be detected
  • observation_table Track uploaded file name when creating an observation table from an uploaded file.
  • observation_table Add way to update purpose for observation table.
  • tests Use published featurebyte library in notebook tests.
  • service Reduce complexity of describe query to avoid memory issue during query compilation
  • session Use DBFS for Databricks session storage to simplify setup
  • target_namespace Add support for target namespace deletion
  • observation_table add minimum interval between entities to observation table
  • api Implement delete observation table from use case
  • api Implement removal of default preview and eda table for context
  • api Enable observation table to associate with multiple use cases from api
  • api Implement removal of default preview and eda table for use case

🐛 Bug Fixes

  • observation_table fix validation around primary entity IDs when creating observation tables
  • worker Use cpu worker for feature job setting analysis to avoid blocking io worker async loop
  • session Make data warehouse session creation asynchronous with a timeout to avoid blocking the asyncio main thread. This prevents the API service from becoming unresponsive when certain compute clusters take a long time to start up.
  • service Fix observation table sampling so that it is always uniform over the input
  • worker Fix feature job setting analysis failing for Databricks feature store
  • session Fix spark session failing with spark version >= 3.4.1
  • service Fix observation table file upload error
  • target Support value_column=None for count in forward_aggregate/target operations.
  • service Fix division by zero error when calling describe on empty views
  • worker Fix bug where feature job setting analysis backtest fails when the analysis is missing an optional histogram
  • service Fixes a view join issue that made the generated feature unsavable due to graph inconsistency.
  • use_case Allow use cases to be created with descriptive only targets
  • service Fixes an error when rendering FeatureJobStatusResult in notebooks when matplotlib package is not available.
  • feature Fix feature saving bug when the feature contains timestamp filtering

v0.6.1 (2023-11-22)

🐛 Bug Fixes

  • api fixed async task return code

v0.6.0 (2023-10-10)

🛑 Breaking Changes

  • observation_table Validate that entities are present when creating an observation table.

💡 Enhancements

  • target Use window from target namespace instead of the target version.
  • service UseCase creation to accept TargetNameSpace id as a parameter
  • historical_feature_table Make FeatureClusters optional when creating historical feature table from UI.
  • service Move online serving code template generation to the online serving service
  • model Handle old Context records with entity_ids attribute in the database
  • service Add key_with_highest_value() and key_with_lowest_value() for cross aggregates
  • api Add consistent table feature job settings validation during feature creation.
  • api Change Context Entity attribute's name to Primary Entity
  • api Use primary entity parameter in Target and Context creation
  • service Add last_updated_at in FeatureModel to indicate when feature value is last updated
  • api Revise feature list create new version to avoid throwing error when the feature list is the same as the previous version
  • service Support rprefix parameter in View's join method
  • observation_table Add an optional purpose to observation table when creating a new observation table.
  • docs Documentation for Context and UseCase
  • observation_table Track earliest point in time, and unique entity col counts as part of metadata.
  • service Support extracting value counts and customised statistics in PreviewService
  • api Remove direct observation table reference from UseCase
  • warehouse improve data warehouse asset validation
  • api Use EntityBriefInfoList for entity info for both UseCase and Context
  • api Add trigo functions to series.
  • api Include observation table operation into Context API Object
  • observation_table Add route to allow users to upload CSV files to create observation tables.
  • target Tag entity_ids when creating an observation table from a target.
  • api-client improve api-client retry
  • service Entity Validation for Context, Target and UseCase
  • service Add Context Info method into both Context API Object and Route
  • api Add functionality to calculate haversine distance.
  • service Fix PreviewService describe() method when stats_names are provided
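
The haversine distance listed among the enhancements above is the great-circle distance between two latitude/longitude points. The following is a minimal, library-independent sketch of the standard formula; the function name `haversine_km`, the degree-based inputs, and the Earth radius constant are assumptions for illustration and do not reflect the signature of the FeatureByte API.

```python
import math

# Assumption: mean Earth radius in kilometres, used only for this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Two identical points give a distance of 0, and two antipodal points on the equator give half the circumference implied by the chosen radius.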

🐛 Bug Fixes

  • service Validate non-existent Target and Context when creating Use Case
  • session Fix execute query failing when variant columns contain null values
  • service Validate null target_id when adding observation table to use case
  • service Fix maximum recursion depth exceeded error in complex queries
  • service Fix race condition when accessing cached values in ApiObject's get_by_id()
  • hive fix hive connection error when spark_catalog is not the default
  • api Target#list should include items in target namespace.
  • target Fix target definition SDK code generation by skipping project.
  • service Fix join validation logic to account for rprefix

v0.5.1 (2023-09-08)

💡 Enhancements

  • service Optimize feature readiness service update runtime.

🐛 Bug Fixes

  • packaging Restore cryptography package dependency [DEV-2233]

v0.5.0 (2023-09-06)

🛑 Breaking Changes

  • Configurations Configurations::use_profile() is now an instance method rather than a classmethod
    - Configurations.use_profile("profile")
    + Configurations().use_profile("profile")
    

💡 Enhancements

  • service Cache view created from query in Spark for better performance
  • vector-aggregation Add java UDAFs for sum and max for use in spark.
  • vector-operations Add cosine_similarity to compare two vector columns.
  • vector-aggregation Add integration test to test end to end for VECTOR_AGGREGATE_MAX.
  • vector-aggregations Enable vector aggregations for tiling aggregate - max and sum - functions
  • middleware Organize exceptions to reduce verbosity in middleware
  • api Add support for updating description of table columns in the python API
  • vector-aggregation Update groupby logic for non tile based aggregates
  • api Implement API object for Use Case component
  • api Use Context name instead of Context id for the API signature
  • api Implement API object for Context
  • vector_aggregation Add UDTF for max, sum and avg for snowflake.
  • api Integrate Context API object for UseCase
  • vector-aggregation Snowflake return values for vector aggregations are now a list instead of a string.
  • vector-aggregation Add java UDAFs for average for use in spark.
  • vector_aggregation Only return one row in table vector aggregate function per partition
  • service Support conditionally updating a feature using a mask derived from other feature(s)
  • vector-aggregation Add guardrails to prevent array aggregations if agg func is not max or avg.
  • service Tag semantics for all special columns during table creation
  • api Implement UseCase Info
  • service Change join type to inner when joining event and item tables
  • vector-aggregation Register vector aggregate max, and update parent dtype inference logic.
  • service Implement scheduled task to clean up stale versions and drop online store tables when possible
  • use-case Implement guardrail for use case's observation table not to be deleted
  • vector-aggregations Enable vector aggregations for tiling aggregate avg function
  • api Rename description update functions for versioned assets
  • vector-aggregation Support integer values in vectors; add integration test for simple aggregates
  • vector-aggregation Update groupby_helper to take in parent_dtype.
  • httpClient added a ssl_verify value in Configurations to allow disabling of ssl certificate verification
  • online-serving Split online store compute and insert query to minimize table locking
  • tests Use the notebook as the test id in the notebook tests.
  • vector-aggregation Add simple average spark udaf.
  • vector-aggregation Add average snowflake udtf.
  • api Associate Deployment with UseCase
  • service Skip creating a data warehouse session when online disabling a feature
  • use-case implement use case model and its associated routes
  • service Apply event timestamp filter on EventTable directly in scheduled tile jobs when possible
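
The cosine_similarity operation added above compares two vector columns row by row. As a rough illustration of the per-row computation, here is a plain Python sketch over two lists of numbers. The zero-vector result and the length check are assumptions made for this example; the changelog does not describe how the library treats those cases.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        # Assumption: mismatched lengths are rejected in this sketch.
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: similarity with a zero vector is reported as 0.
        return 0.0
    return dot / (norm_u * norm_v)
```

Orthogonal vectors score 0, parallel vectors score 1, and opposite vectors score -1.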

🐛 Bug Fixes

  • worker Block running multiple concurrent deployment create/update tasks for the same deployment
  • service Fix bug where feature job starts running while the feature is still being enabled
  • dependencies upgrading scipy dependency
  • service Fixes an invalid identifier error in sql when feature involves a mix of filtered and non-filtered versions of the same view.
  • worker Fixes a bug where scheduler does not work with certain mongodb uris.
  • online-serving Fix incompatible column types when inserting to online store tables
  • service Fix feature saving error due to tile generation bug
  • service Ensure row ordering of online serving output DataFrame matches input request data
  • dependencies Limiting python range to >=3.8,<3.12 due to scipy constraint
  • service Use execute_query_long_running when inserting to online store tables to fix timeout errors
  • model Mongodb index on periodic task name conflicts with scheduler engine
  • service Fix conversion of date type to double in spark

v0.4.4 (2023-08-29)

🐛 Bug Fixes

  • api Fix logic for determining timezone offset column in datetime accessor
  • service Fix SDK code generation for conditional assignment when the assign value is a series
  • service Fix invalid identifier error for complex features with both item and window aggregates

💡 Enhancements

  • profile Allow creating of profile directly with fb.register_profile(name, url, token)

v0.4.3 (2023-08-21)

🐛 Bug Fixes

  • service Fix feature materialization error due to ambiguous internal column names
  • service Fix error when generating info for features in some edge cases
  • api Fix item table default job settings not synchronized when job settings are updated in the event table, fix historical feature table listing failure

v0.4.2 (2023-08-07)

🛑 Breaking Changes

  • target Update compute_target to return an observation table instead of a target table, making it easier to use with compute historical features
  • target Update target info to return a TableBriefInfoList instead of a custom struct. This keeps it consistent with feature, and also fixes a bug in info where we wrongly assumed there was only one input table.

💡 Enhancements

  • target Add as_target to SDK, and add node to graph when it is called
  • target Add fill_value and skip_fill_na to forward_aggregate, and update name
  • target Create lookup target graph node
  • service Speed up operation structure extraction by caching the result of _extract() in BaseGraphExtractor

🐛 Bug Fixes

  • api Fix api objects listing failure in some notebooks environments
  • utils Fix is_notebook check to support Google Colab [https://github.com/featurebyte/featurebyte/issues/1598]

v0.4.1 (2023-07-25)

🛑 Breaking Changes

  • online-serving Update online store table schema to use long table format
  • dependencies Limiting python version from >=3.8,<4.0 to >=3.8,<3.13 due to scipy version constraint

💡 Enhancements

  • generic-function add user-defined-function support
  • target add basic API object for Target.
  • feature-group update the feature group save operation to use /feature/batch route
  • service Update describe query to be compatible with Spark 3.2
  • service Ensure FeatureModel's derived attributes are derived from pruned graph
  • target add basic info for Target. Additional information with more details about the actual data will be added in a follow-up.
  • list_versions update Feature's & FeatureList's list_versions method by adding is_default to the dataframe output
  • service Move TILE_JOB_MONITOR table from data warehouse to persistent
  • service Avoid using SHOW COLUMNS to support Spark 3.2
  • table skip calling data warehouse for table metadata during table construction
  • target add ForwardAggregate node to graph. This implements ForwardAggregator only to the extent of adding the node to the graph; the node is still a no-op.
  • service Add option to disable audit logging for internal documents
  • query-graph optimize query graph pruning computation by combining multiple pruning tasks into one
  • target add input data and metadata for targets.
  • target Add primary_entity property to Target API object.
  • service Refactor FeatureManager and TileManager as services
  • tests Move tutorial notebooks into the FeatureByte repo
  • service Replace ONLINE_STORE_MAPPING data warehouse table by OnlineStoreComputeQueryService
  • feature block feature readiness & feature list status transition from DRAFT to DEPRECATED
  • task_manager refactor task manager to take celery object as a parameter, and refactor task executor to import tasks explicitly
  • feature fix bug with feature_list_ids not being updated after a feature list is deleted
  • service Replace TILE_FEATURE_MAPPING table in the data warehouse with mongo persistent
  • target perform SQL generation for forward aggregate node
  • feature fix primary entity identification bug for time aggregation over item aggregation features
  • feature limit manual default feature version selection to only the versions with highest readiness level
  • feature-list revise feature list saving to reduce api calls
  • service Refactor tile task to use dependency injection
  • service Fix error when disabling features created before OnlineStoreComputeQueryService is introduced
  • deployment Skip redundant updates of ONLINE_STORE_MAPPING table
  • static-source-table support materialization of static source table from source table or view
  • catalog Create target_table API object. Remove default catalog and require explicit activation of a catalog before catalog operations.
  • feature-list update feature list to preserve feature order
  • target Add gates to prevent target from setting item to non-target series.
  • target Add TargetNamespace#create. This will allow us to register spec-like versions of a Target that don't have a recipe attached.
  • deployment Reduce unnecessary backfill computation when deploying features
  • service Refactor TileScheduler as a service
  • target stub out target namespace schema and models
  • service Add traceback to tile job log for troubleshooting
  • target add end-to-end integration test for target, and include preview endpoint in target
  • feature update feature & feature list save operation to use POST /feature/batch route
  • service Disable tile monitoring by default
  • service Fix listing of databases and schemas in Spark 3.2
  • target Refactor compute_target and compute_historical_feature
  • feature optimize time to deserialize feature model
  • entity-relationship remove POST /relationship_info, POST /entity/parent and DELETE /entity/parent/ endpoints
  • service Support description update and retrieval for all saved objects
  • config Add default_profile in config to allow for a default profile to be set, and require a profile to be set if default_profile is not set
  • target Create the TargetTable API object and stub out the compute_target endpoint.
  • target Add datetime and string accessors into the Target API object.
  • service Fix unnecessary usage of SQL functions incompatible with Spark 3.2 (ILIKE and DATEADD)
  • preview Improve efficiency of feature and feature list preview by reducing unnecessary tile computation
  • service Fix DATEADD undefined function error in Spark 3.2 and re-enable tests
  • service Implement TileRegistryService to track tile information in mongo persistent
  • spark-session add kerberos authentication and webhdfs support for Spark session
  • service Fix compatibility of string contains operation with Spark 3.2
  • target add CRUD API endpoints for Target. This is the first portion of the work to include the Target API object.
  • target Fully implement compute_target to materialize a dataframe
  • service Refactor info service by splitting out logic to their respective services. Most of the info service logic was not being reused. It also feels cleaner for each service to be responsible for its own info logic. This way, dependencies are clearer. We also refactor service initialization such that we consistently use the dependency injection pattern.
  • online-serving Use INSERT operation to update online store tables to address concurrency issues
  • target create target namespace when we create a target
  • service Fix more datetime transform compatibility issues in Spark 3.2
  • storage Add support for using s3 as storage for featurebyte service
  • target Create target_table services, routes, models and schema. This will help us support materializing target tables in the warehouse.

⚠️ Deprecations

  • target remove blind_spot from target models as it is not used

🐛 Bug Fixes

  • worker fixed cpu threading model
  • service Fix feature definition for isin() operation
  • online-serving Fix the job_schedule_ts_str parameter when updating online store tables in scheduled tile tasks
  • gh-actions Add missing build dependencies for kerberos support.
  • feature_readiness fix feature readiness bug caused by readiness being treated as a string when finding the default feature ID
  • transforms Update get_relative_frequency to return 0 when there is no matching label
  • service Fix OnlineStoreComputeQuery prematurely deleted when still in use by other features
  • data-warehouse Fix metadata schema update for Spark and Databricks and bump working version
  • service Fix TABLESAMPLE syntax error in Spark for very small sample percentage
  • feature fix view join operation bug which causes improper query graph pruning
  • service Fix a bug in add_feature() where entity_id was incorrectly attached to the derived column
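
The get_relative_frequency fix above concerns a label that is absent from a count dictionary. A small stand-alone sketch of that behavior follows; the helper name `relative_frequency` and its dictionary input are hypothetical stand-ins, not the library's transform, and the empty-dictionary case is an additional assumption.

```python
def relative_frequency(counts, label):
    """Share of the total count that belongs to `label`, or 0 if it is absent."""
    total = sum(counts.values())
    # Return 0 when there is no matching label, mirroring the fix described above.
    # Assumption: an empty or all-zero dictionary also yields 0 instead of an error.
    if total == 0 or label not in counts:
        return 0.0
    return counts[label] / total
```

With counts of 1 for "a" and 3 for "b", the relative frequency of "b" is 0.75 and that of an unseen label "c" is 0.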

v0.4.0 yanked (2023-07-25)

v0.3.1 (2023-06-08)

🐛 Bug Fixes

  • websocket make websocket client more resilient to lost connections
  • websocket fix client failure when starting secure websocket connection

v0.3.0 (2023-06-05)

💡 Enhancements

  • guardrails add guardrail to make sure *Table creation does not contain shared column name in different parameters
  • feature-list add default_feature_fraction to feature list object
  • datasource check if database/schema exists when listing schemas/tables in a datasource
  • error-handling improve error handling and messaging for Docker exceptions
  • feature-list Refactor compute_historical_features() to use the materialized table workflow
  • workflows Update daily cron, dependencies and lint workflows to use code defined github workflows.
  • feature refactor feature object to remove unused entity_identifiers, protected_columns & inherited_columns properties
  • scheduler implement soft time limit for io tasks using gevent celery worker pool
  • list_versions() add is_default column to feature's & feature list's list_versions object method output DataFrame
  • feature refactor feature class to drop FrozenFeatureModel inheritance
  • storage support GCS storage for Spark and Databricks sessions
  • variables expose catalog_id property in the Entity and Relationship API objects
  • historical-features Compute historical features in batches of columns
  • view-object add column_cleaning_operations to view object
  • logging support overriding default log level using environment variable LOG_LEVEL
  • list_versions() remove feature_list_namespace_id and num_feature from feature_list.list_versions()
  • feature-api-route remove entity_ids from feature creation route payload
  • historical-features Improve tile cache performance by reducing unnecessary recalculation of tiles for historical requests
  • worker support scheduler, worker:io, worker:cpu in startup command to start different services
  • feature-list add default_feature_list_id to feature_list.info() output
  • feature remove feature_namespace_id (feature_list_namespace_id) from feature (feature list) creation payload
  • docs automatically create debug folder if it doesn't exist when running docs
  • feature-list add primary_entities to feature list's list() method output DataFrame
  • feature add POST /feature/batch endpoint to support batch feature creation
  • table-column add cleaning_operations to table column object & view column object
  • workflows Update workflows to use code defined github workflows.
  • feature-session Support Azure blob storage for Spark and Databricks sessions
  • feature update feature's & feature list's version format from dictionary to string
  • feature-list refactor feature list class to drop FrozenFeatureListModel inheritance
  • display implement HTML representation for API objects .info() result
  • feature remove dtype from feature creation route payload
  • aggregate-asat Support cross aggregation option for aggregate_asat.
  • databricks support streamed records fetching for Databricks session
  • feature-definition update feature definition by explicitly specifying on parameter in join operation
  • source-table-listing Exclude tables whose names have a "__" prefix in source table listing
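
The logging enhancement above lets the LOG_LEVEL environment variable override the default log level. A minimal sketch of that pattern using only the Python standard library is shown here. The helper `resolve_log_level`, the "INFO" default, and the fallback for unrecognized values are assumptions for illustration; the changelog does not state the library's default or which values it accepts.

```python
import logging
import os

# Standard level names from the logging module.
_LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def resolve_log_level(default="INFO", env=None):
    """Pick a logging level, preferring the LOG_LEVEL environment variable."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", default).upper()
    # Assumption: unrecognized values fall back to the default level.
    return _LEVELS.get(name, _LEVELS[default.upper()])


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    logging.getLogger(__name__).debug("visible only when LOG_LEVEL=DEBUG")
```

Running the script with `LOG_LEVEL=DEBUG` set in the environment makes the debug message visible; without it, the assumed default of INFO hides it.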

⚠️ Deprecations

  • middleware removed TelemetryMiddleware
  • feature-definition remove unused statement from feature.definition
  • FeatureJobSettingAnalysis remove analysis_parameters from FeatureJobSettingAnalysis.info() result

🐛 Bug Fixes

  • relationship fixed bug that was causing an error when retrieving a Relationship with no updated_by set
  • dependencies updated requests package due to a vulnerability
  • mongodb ship mongodb logs to stderr to reduce disk usage
  • deployment fix multiple deployments sharing the same feature list bug
  • dependencies updated pymdown-extensions due to vulnerability CVE-2023-32309
  • dependencies fixed vulnerability in starlette
  • api-client API client should not handle 30x redirects as these can result in unexpected behavior
  • mongodb update get_persistent() by removing global persistent object (which is not thread safe)
  • feature-definition fixed bug in feature.definition so that it is consistent with the underlying query graph

v0.2.2 (2023-05-10)

💡 Enhancements

  • Update healthcare demo dataset to include timezone columns

🐛 Bug Fixes

  • Drop a materialized table only if it exists when cleaning up on error
  • Added dependencies workflow to repo to check for dependency changes in PRs
  • Fixed taskfile java tasks to properly cache the downloaded jar files.

v0.2.1 (2023-05-10)

🐛 Bug Fixes

  • Removed additional dependencies specified in featurebyte client

v0.2.0 (2023-05-08)

🛑 Breaking Changes

  • featurebyte is now available for early access