Changelog¶
v0.6.1 (2023-11-22)¶
🐛 Bug Fixes¶
api
fixed async task return code
v0.6.0 (2023-10-10)¶
🛑 Breaking Changes¶
observation_table
Validate that entities are present when creating an observation table.
💡 Enhancements¶
target
Use window from target namespace instead of the target version.
service
UseCase creation to accept TargetNameSpace id as a parameter
historical_feature_table
Make FeatureClusters optional when creating historical feature table from UI.
service
Move online serving code template generation to the online serving service
model
Handle old Context records with entity_ids attribute in the database
service
Add key_with_highest_value() and key_with_lowest_value() for cross aggregates
api
Add consistent table feature job settings validation during feature creation.
api
Change Context Entity attribute's name to Primary Entity
api
Use primary entity parameter in Target and Context creation
service
Add last_updated_at in FeatureModel to indicate when feature value is last updated
api
Revise feature list create new version to avoid throwing error when the feature list is the same as the previous version
service
Support rprefix parameter in View's join method
observation_table
Add an optional purpose to observation table when creating a new observation table.
docs
Documentation for Context and UseCase
observation_table
Track earliest point in time, and unique entity col counts as part of metadata.
service
Support extracting value counts and customised statistics in PreviewService
api
Remove direct observation table reference from UseCase
warehouse
improve data warehouse asset validation
api
Use EntityBriefInfoList for entity info for both UseCase and Context
api
Add trigo functions to series.
api
Include observation table operation into Context API Object
observation_table
Add route to allow users to upload CSV files to create observation tables.
target
Tag entity_ids when creating an observation table from a target.
api-client
improve api-client retry
service
Entity Validation for Context, Target and UseCase
service
Add Context Info method into both Context API Object and Route
api
Add functionality to calculate haversine distance.
service
Fix PreviewService describe() method when stats_names are provided
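The haversine distance enhancement listed above refers to the great-circle distance between two latitude/longitude points. The changelog does not show the SDK call, so the following is only a standalone sketch of the formula itself, not FeatureByte's implementation. The function name `haversine_km` and the mean Earth radius constant are assumptions made for this illustration.

```python
import math

# Mean Earth radius in kilometres; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees.

    Hypothetical helper for illustration only; not part of the FeatureByte SDK.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Convert the central angle to an arc length on the sphere.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Inputs are in decimal degrees, and the result is symmetric in the two points. A quarter of the equator, for example `haversine_km(0, 0, 0, 90)`, comes out as `EARTH_RADIUS_KM * math.pi / 2`.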
🐛 Bug Fixes¶
service
Validate non-existent Target and Context when creating Use Case
session
Fix execute query failing when variant columns contain null values
service
Validate null target_id when adding obs table to use case
service
Fix maximum recursion depth exceeded error in complex queries
service
Fix race condition when accessing cached values in ApiObject's get_by_id()
hive
fix hive connection error when spark_catalog is not the default
api
Target#list should include items in target namespace.
target
Fix target definition SDK code generation by skipping project.
service
Fix join validation logic to account for rprefix
v0.5.1 (2023-09-08)¶
💡 Enhancements¶
service
Optimize feature readiness service update runtime.
🐛 Bug Fixes¶
packaging
Restore cryptography package dependency [DEV-2233]
v0.5.0 (2023-09-06)¶
🛑 Breaking Changes¶
Configurations
Configurations::use_profile() function is now a method rather than a classmethod
💡 Enhancements¶
service
Cache view created from query in Spark for better performance
vector-aggregation
Add java UDAFs for sum and max for use in spark.
vector-operations
Add cosine_similarity to compare two vector columns.
vector-aggregation
Add integration test to test end to end for VECTOR_AGGREGATE_MAX.
vector-aggregations
Enable vector aggregations for tiling aggregate - max and sum - functions
middleware
Organize exceptions to reduce verbosity in middleware
api
Add support for updating description of table columns in the python API
vector-aggregation
Update groupby logic for non tile based aggregates
api
Implement API object for Use Case component
api
Use Context name instead of Context id for the API signature
api
Implement API object for Context
vector_aggregation
Add UDTF for max, sum and avg for snowflake.
api
Integrate Context API object for UseCase
vector-aggregation
Snowflake return values for vector aggregations should be a list now, instead of a string.
vector-aggregation
Add java UDAFs for average for use in spark.
vector_aggregation
Only return one row in table vector aggregate function per partition
service
Support conditionally updating a feature using a mask derived from other feature(s)
vector-aggregation
Add guardrails to prevent array aggregations if agg func is not max or avg.
service
Tag semantics for all special columns during table creation
api
Implement UseCase Info
service
Change join type to inner when joining event and item tables
vector-aggregation
Register vector aggregate max, and update parent dtype inference logic.
service
Implement scheduled task to clean up stale versions and drop online store tables when possible
use-case
Implement guardrail for use case's observation table not to be deleted
vector-aggregations
Enable vector aggregations for tiling aggregate avg function
api
Rename description update functions for versioned assets
vector-aggregation
Support integer values in vectors; add support integration test for simple aggregates
vector-aggregation
Update groupby_helper to take in parent_dtype.
httpClient
added a ssl_verify value in Configurations to allow disabling of ssl certificate verification
online-serving
Split online store compute and insert query to minimize table locking
tests
Use the notebook as the test id in the notebook tests.
vector-aggregation
Add simple average spark udaf.
vector-aggregation
Add average snowflake udtf.
api
Associate Deployment with UseCase
service
Skip creating a data warehouse session when online disabling a feature
use-case
implement use case model and its associated routes
service
Apply event timestamp filter on EventTable directly in scheduled tile jobs when possible
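The cosine_similarity enhancement listed above compares two vector columns. The changelog does not show how it is invoked in the SDK, so the following is only a plain-Python sketch of the measure for a single pair of vectors. The function body, the length check, and the choice to return 0.0 for a zero-length vector are assumptions of this illustration, not documented FeatureByte behaviour.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors.

    Illustrative helper only; not the FeatureByte implementation.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption of this sketch: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)
```

Parallel vectors score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, regardless of their magnitudes.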
🐛 Bug Fixes¶
worker
Block running multiple concurrent deployment create/update tasks for the same deployment
service
Fix bug where feature job starts running while the feature is still being enabled
dependencies
upgrading scipy dependency
service
Fixes an invalid identifier error in sql when feature involves a mix of filtered and non-filtered versions of the same view.
worker
Fixes a bug where scheduler does not work with certain mongodb uris.
online-serving
Fix incompatible column types when inserting to online store tables
service
Fix feature saving error due to tile generation bug
service
Ensure row ordering of online serving output DataFrame matches input request data
dependencies
Limiting python range to >=3.8,<3.12 due to scipy constraint
service
Use execute_query_long_running when inserting to online store tables to fix timeout errors
model
Mongodb index on periodic task name conflicts with scheduler engine
service
Fix conversion of date type to double in spark
v0.4.4 (2023-08-29)¶
🐛 Bug Fixes¶
api
Fix logic for determining timezone offset column in datetime accessor
service
Fix SDK code generation for conditional assignment when the assign value is a series
service
Fix invalid identifier error for complex features with both item and window aggregates
💡 Enhancements¶
profile
Allow creating of profile directly with fb.register_profile(name, url, token)
v0.4.3 (2023-08-21)¶
🐛 Bug Fixes¶
service
Fix feature materialization error due to ambiguous internal column names
service
Fix error when generating info for features in some edge cases
api
Fix item table default job settings not synchronized when job settings are updated in the event table, fix historical feature table listing failure
v0.4.2 (2023-08-07)¶
🛑 Breaking Changes¶
target
Update compute_target to return observation table instead of target table. This will make it easier to use with compute historical features.
target
Update target info to return a TableBriefInfoList instead of a custom struct. This will help keep it consistent with feature, and also fix a bug in info where we wrongly assumed there was only one input table.
💡 Enhancements¶
target
Add as_target to SDK, and add node to graph when it is called
target
Add fill_value and skip_fill_na to forward_aggregate, and update name
target
Create lookup target graph node
service
Speed up operation structure extraction by caching the result of _extract() in BaseGraphExtractor
🐛 Bug Fixes¶
api
Fix api objects listing failure in some notebooks environments
utils
Fix is_notebook check to support Google Colab [https://github.com/featurebyte/featurebyte/issues/1598]
v0.4.1 (2023-07-25)¶
🛑 Breaking Changes¶
online-serving
Update online store table schema to use long table format
dependencies
Limiting python version from >=3.8,<4.0 to >=3.8,<3.13 due to scipy version constraint
💡 Enhancements¶
generic-function
add user-defined-function support
target
add basic API object for Target. Initialize the basic API object for Target.
feature-group
update the feature group save operation to use /feature/batch route
service
Update describe query to be compatible with Spark 3.2
service
Ensure FeatureModel's derived attributes are derived from pruned graph
target
add basic info for Target. Adds some basic information about Target's. Additional information that contains more details about the actual data will be added in a follow-up.
list_versions
update Feature's & FeatureList's list_versions method by adding is_default to the dataframe output
service
Move TILE_JOB_MONITOR table from data warehouse to persistent
service
Avoid using SHOW COLUMNS to support Spark 3.2
table
skip calling data warehouse for table metadata during table construction
target
add ForwardAggregate node to graph for ForwardAggregate. Implement ForwardAggregator - only adds node to graph. Node is still a no-op.
service
Add option to disable audit logging for internal documents
query-graph
optimize query graph pruning computation by combining multiple pruning tasks into one
target
add input data and metadata for targets. Add more information about target metadata.
target
Add primary_entity property to Target API object.
service
Refactor FeatureManager and TileManager as services
tests
Move tutorial notebooks into the FeatureByte repo
service
Replace ONLINE_STORE_MAPPING data warehouse table by OnlineStoreComputeQueryService
feature
block feature readiness & feature list status transition from DRAFT to DEPRECATED
task_manager
refactor task manager to take celery object as a parameter, and refactor task executor to import tasks explicitly
feature
fix bug with feature_list_ids not being updated after a feature list is deleted
service
Replace TILE_FEATURE_MAPPING table in the data warehouse with mongo persistent
target
perform SQL generation for forward aggregate node
feature
fix primary entity identification bug for time aggregation over item aggregation features
feature
limit manual default feature version selection to only the versions with highest readiness level
feature-list
revise feature list saving to reduce api calls
service
Refactor tile task to use dependency injection
service
Fix error when disabling features created before OnlineStoreComputeQueryService is introduced
deployment
Skip redundant updates of ONLINE_STORE_MAPPING table
static-source-table
support materialization of static source table from source table or view
catalog
Create target_table API object. Remove default catalog, require explicit activation of catalog before catalog operations.
feature-list
update feature list to preserve feature order
target
Add gates to prevent target from setting item to non-target series.
target
Add TargetNamespace#create. This will allow us to register spec-like versions of a Target, that don't have a recipe attached.
deployment
Reduce unnecessary backfill computation when deploying features
service
Refactor TileScheduler as a service
target
stub out target namespace schema and models
service
Add traceback to tile job log for troubleshooting
target
add end-to-end integration test for target, and include preview endpoint in target
feature
update feature & feature list save operation to use POST /feature/batch route
service
Disable tile monitoring by default
service
Fix listing of databases and schemas in Spark 3.2
target
Refactor compute_target and compute_historical_feature
feature
optimize time to deserialize feature model
entity-relationship
remove POST /relationship_info, POST /entity/parent and DELETE /entity/parent/ endpoints
service
Support description update and retrieval for all saved objects
config
Add default_profile in config to allow for a default profile to be set, and require a profile to be set if default_profile is not set
target
Create target_table API object. Create the TargetTable API object, and stub out the compute_target endpoint.
target
Add datetime and string accessors into the Target API object.
service
Fix unnecessary usage of SQL functions incompatible with Spark 3.2 (ILIKE and DATEADD)
preview
Improve efficiency of feature and feature list preview by reducing unnecessary tile computation
service
Fix DATEADD undefined function error in Spark 3.2 and re-enable tests
service
Implement TileRegistryService to track tile information in mongo persistent
spark-session
add kerberos authentication and webhdfs support for Spark session
service
Fix compatibility of string contains operation with Spark 3.2
target
add CRUD API endpoints for Target. First portion of the work to include the Target API object.
target
Fully implement compute_target to materialize a dataframe
service
Refactor info service by splitting out logic to their respective services. Most of the info service logic was not being reused. It also feels cleaner for each service to be responsible for its own info logic. This way, dependencies are clearer. We also refactor service initialization such that we consistently use the dependency injection pattern.
online-serving
Use INSERT operation to update online store tables to address concurrency issues
target
create target namespace when we create a target
service
Fix more datetime transform compatibility issues in Spark 3.2
storage
Add support for using s3 as storage for featurebyte service
target
Create target_table services, routes, models and schema. This will help us support materializing target tables in the warehouse.
⚠️ Deprecations¶
target
remove blind_spot from target models as it is not used
🐛 Bug Fixes¶
worker
fixed cpu threading model
service
Fix feature definition for isin() operation
online-serving
Fix the job_schedule_ts_str parameter when updating online store tables in scheduled tile tasks
gh-actions
Add missing build dependencies for kerberos support.
feature_readiness
fix feature readiness bug due to readiness being treated as string when finding default feature ID
transforms
Update get_relative_frequency to return 0 when there is no matching label
service
Fix OnlineStoreComputeQuery prematurely deleted when still in use by other features
data-warehouse
Fix metadata schema update for Spark and Databricks and bump working version
service
Fix TABLESAMPLE syntax error in Spark for very small sample percentage
feature
fix view join operation bug which causes improper query graph pruning
service
Fix a bug in add_feature() where entity_id was incorrectly attached to the derived column
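The get_relative_frequency fix listed above states that 0 is returned when there is no matching label. As a rough illustration of that behaviour, the sketch below computes a relative frequency over a plain dictionary of label counts. The function name `relative_frequency`, the dictionary representation, and the handling of an empty dictionary are assumptions of this example rather than the SDK's implementation.

```python
def relative_frequency(counts, label):
    """Share of the total count held by `label`, or 0.0 if it is absent.

    Hypothetical helper for illustration; `counts` maps labels to counts.
    """
    total = sum(counts.values())
    if total == 0 or label not in counts:
        # Mirrors the changelog entry: no matching label yields 0.
        # Returning 0.0 for an empty or all-zero dictionary is an assumption.
        return 0.0
    return counts[label] / total
```

With `{"a": 1, "b": 3}`, the label `"b"` has a relative frequency of 0.75, while a label that never occurs, such as `"z"`, yields 0.0 instead of raising an error.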
v0.4.0 yanked (2023-07-25)¶
v0.3.1 (2023-06-08)¶
🐛 Bug Fixes¶
websocket
make websocket client more resilient to connection loss
websocket
fix client failure when starting secure websocket connection
v0.3.0 (2023-06-05)¶
💡 Enhancements¶
guardrails
add guardrail to make sure *Table creation does not contain shared column name in different parameters
feature-list
add default_feature_fraction to feature list object
datasource
check if database/schema exists when listing schemas/tables in a datasource
error-handling
improve error handling and messaging for Docker exceptions
feature-list
Refactor compute_historical_features() to use the materialized table workflow
workflows
Update daily cron, dependencies and lint workflows to use code defined github workflows.
feature
refactor feature object to remove unused entity_identifiers, protected_columns & inherited_columns properties
scheduler
implement soft time limit for io tasks using gevent celery worker pool
list_versions()
add is_default column to feature's & feature list's list_versions object method output DataFrame
feature
refactor feature class to drop FrozenFeatureModel inheritance
storage
support GCS storage for Spark and DataBricks sessions
variables
expose catalog_id property in the Entity and Relationship API objects
historical-features
Compute historical features in batches of columns
view-object
add column_cleaning_operations to view object
logging
support overriding default log level using environment variable LOG_LEVEL
list_versions()
remove feature_list_namespace_id and num_feature from feature_list.list_versions()
feature-api-route
remove entity_ids from feature creation route payload
historical-features
Improve tile cache performance by reducing unnecessary recalculation of tiles for historical requests
worker
support scheduler, worker:io, worker:cpu in startup command to start different services
feature-list
add default_feature_list_id to feature_list.info() output
feature
remove feature_namespace_id (feature_list_namespace_id) from feature (feature list) creation payload
docs
automatically create debug folder if it doesn't exist when running docs
feature-list
add primary_entities to feature list's list() method output DataFrame
feature
add POST /feature/batch endpoint to support batch feature creation
table-column
add cleaning_operations to table column object & view column object
workflows
Update workflows to use code defined github workflows.
feature-session
Support Azure blob storage for Spark and DataBricks sessions
feature
update feature's & feature list's version format from dictionary to string
feature-list
refactor feature list class to drop FrozenFeatureListModel inheritance
display
implement HTML representation for API objects .info() result
feature
remove dtype from feature creation route payload
aggregate-asat
Support cross aggregation option for aggregate_asat.
databricks
support streamed records fetching for DataBricks session
feature-definition
update feature definition by explicitly specifying on parameter in join operation
source-table-listing
Exclude tables with names that have a "__" prefix in source table listing
⚠️ Deprecations¶
middleware
removed TelemetryMiddleware
feature-definition
remove unused statement from feature.definition
FeatureJobSettingAnalysis
remove analysis_parameters from FeatureJobSettingAnalysis.info() result
🐛 Bug Fixes¶
relationship
fixed bug that was causing an error when retrieving a Relationship with no updated_by set
dependencies
updated requests package due to vuln
mongodb
mongodb logs to be shipped to stderr to reduce disk usage
deployment
fix multiple deployments sharing the same feature list bug
dependencies
updated pymdown-extensions due to vuln CVE-2023-32309
dependencies
fixed vulnerability in starlette
api-client
API client should not handle 30x redirects as these can result in unexpected behavior
mongodb
update get_persistent() by removing global persistent object (which is not thread safe)
feature-definition
fixed bug in feature.definition so that it is consistent with the underlying query graph
v0.2.2 (2023-05-10)¶
💡 Enhancements¶
- Update healthcare demo dataset to include timezone columns
🐛 Bug Fixes¶
- Drop a materialized table only if it exists when cleaning up on error
- Added dependencies workflow to repo to check for dependency changes in PRs
- Fixed taskfile java tasks to properly cache the downloaded jar files.
v0.2.1 (2023-05-10)¶
🐛 Bug Fixes¶
- Removed additional dependencies specified in featurebyte client
v0.2.0 (2023-05-08)¶
🛑 Breaking changes¶
featurebyte is now available for early access