Playground: Healthcare
This notebook creates a fresh catalog from the healthcare dataset, which you can use to practice feature engineering.
Load the featurebyte library and connect to the local instance of featurebyte
# library imports
import pandas as pd
import numpy as np
# load the featurebyte SDK
import featurebyte as fb
# start the local server, then wait for it to be healthy before proceeding
fb.playground()
02:08:26 | INFO | Using configuration file at: /home/chester/.featurebyte/config.yaml
02:08:26 | INFO | Active profile: local (http://127.0.0.1:8088)
02:08:26 | INFO | SDK version: 0.2.2
02:08:26 | INFO | Active catalog: default
02:08:26 | INFO | 0 feature list, 0 feature deployed
02:08:26 | INFO | (1/4) Starting featurebyte services
Container redis Running
Container spark-thrift Running
Container mongo-rs Running
Container featurebyte-server Running
Container featurebyte-worker Running
Container mongo-rs Waiting
Container redis Waiting
Container mongo-rs Waiting
Container mongo-rs Healthy
Container mongo-rs Healthy
Container redis Healthy
02:08:27 | INFO | (2/4) Creating local spark feature store
02:08:27 | INFO | (3/4) Import datasets
02:08:28 | INFO | Dataset grocery already exists, skipping import
02:08:28 | INFO | Dataset healthcare already exists, skipping import
02:08:28 | INFO | Dataset creditcard already exists, skipping import
02:08:28 | INFO | (4/4) Playground environment started successfully. Ready to go! 🚀
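The log shows `fb.playground()` waiting until containers such as `mongo-rs` and `redis` report `Healthy` before it moves on. If you script your own setup steps, the same wait-until-healthy pattern can be sketched in plain Python as below. This is a generic illustration, not featurebyte's implementation: `wait_until_healthy` and `http_ok` are hypothetical helpers, and whether a given URL on your server answers GET requests with a 2xx status is an assumption you would need to check for your deployment.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Hypothetical probe: True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all subclass OSError.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True (-> True) or `timeout` elapses (-> False)."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Offline demo: a stand-in probe that reports healthy on its third call.
attempts = []


def fake_probe() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


healthy = wait_until_healthy(fake_probe, timeout=5.0, interval=0.01)
print(healthy, len(attempts))  # True 3
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use you would pass only a probe, for example `lambda: http_ok(url)` with a URL your server actually exposes.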
Create a pre-built catalog for this tutorial, with the data, metadata, and features already set up
Note that creating a pre-built catalog is not a step you would take in real life. This function is specific to this quick-start tutorial: it skips over many of the preparatory steps and gets you straight to the point where you can materialize features.
In a real-life project, you would start with data modeling: declaring the tables, entities, and associated metadata. This is not a frequent task, but it forms the basis for best-practice feature engineering.
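To make the data-modeling vocabulary concrete, here is a plain-Python sketch of what such a model records: entities, tables, and which column of each table identifies which entity (the "tagging the entities to columns" step). It is an illustration only, not featurebyte's SDK, which provides its own objects for this. The classes, column names, and serving names are hypothetical, because the real healthcare schema is not shown in this notebook.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """A business object that features describe, e.g. a patient."""
    name: str
    serving_name: str  # hypothetical key used to request features for this entity


@dataclass
class Table:
    """A registered source table plus the entity tagged to each key column."""
    name: str
    columns: List[str]
    entity_tags: Dict[str, Entity] = field(default_factory=dict)

    def tag(self, column: str, entity: Entity) -> None:
        """Record that `column` identifies `entity` in this table."""
        if column not in self.columns:
            raise KeyError(f"{self.name} has no column {column!r}")
        self.entity_tags[column] = entity


def tables_for_entity(tables: List[Table], entity: Entity) -> List[str]:
    """Names of the tables in which some column is tagged with `entity`."""
    return [t.name for t in tables if entity in t.entity_tags.values()]


# Hypothetical columns; named *_model to avoid clobbering notebook variables.
patient = Entity("patient", "PATIENT_ID")
visit = Entity("visit", "VISIT_ID")
patient_model = Table("PATIENT", ["patient_id", "birth_year"])
visit_model = Table("VISIT", ["visit_id", "patient_id", "visit_date"])
patient_model.tag("patient_id", patient)
visit_model.tag("visit_id", visit)
visit_model.tag("patient_id", patient)  # a visit also points back to its patient

print(tables_for_entity([patient_model, visit_model], patient))  # ['PATIENT', 'VISIT']
```

Tagging the same entity in several tables is what later lets features from different tables be combined for one patient.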
# get the functions to create a pre-built catalog
from prebuilt_catalogs import *
# create a new catalog for this tutorial
catalog = create_tutorial_catalog(PrebuiltCatalog.Playground_Healthcare)
Cleaning up existing tutorial catalogs
02:08:28 | INFO | Catalog activated: healthcare playground 20230511:0208
Building a playground catalog for healthcare named [healthcare playground 20230511:0208]
Creating new catalog
Catalog created
Registering the source tables
Registering the entities
Tagging the entities to columns in the data tables
##################################################################
# suggested script to load the tables and views into your notebook
# get the table objects
medicalproduct_table = catalog.get_table("MEDICALPRODUCT")
specialtygroup_table = catalog.get_table("SPECIALTYGROUP")
icd9hierarchy_table = catalog.get_table("ICD9HIERARCHY")
labobservation_table = catalog.get_table("LABOBSERVATION")
labresult_table = catalog.get_table("LABRESULT")
prescription_table = catalog.get_table("PRESCRIPTION")
visit_table = catalog.get_table("VISIT")
statedetails_table = catalog.get_table("STATEDETAILS")
allergy_table = catalog.get_table("ALLERGY")
patientsmokingstatus_table = catalog.get_table("PATIENTSMOKINGSTATUS")
diagnosis_table = catalog.get_table("DIAGNOSIS")
patient_table = catalog.get_table("PATIENT")
# get the view objects
medicalproduct_view = medicalproduct_table.get_view()
specialtygroup_view = specialtygroup_table.get_view()
icd9hierarchy_view = icd9hierarchy_table.get_view()
labobservation_view = labobservation_table.get_view()
labresult_view = labresult_table.get_view()
prescription_view = prescription_table.get_view()
visit_view = visit_table.get_view()
statedetails_view = statedetails_table.get_view()
allergy_view = allergy_table.get_view()
patientsmokingstatus_view = patientsmokingstatus_table.get_view()
diagnosis_view = diagnosis_table.get_view()
patient_view = patient_table.get_view()
##################################################################
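The suggested script above assigns 24 separate variables. If you prefer fewer names in the notebook namespace, the same two calls it uses, `catalog.get_table(...)` and `.get_view()`, can be wrapped in a loop that returns dictionaries keyed by table name. This is an optional sketch, not part of the tutorial; the helper name `load_tables_and_views` is hypothetical, and the offline check uses a stand-in object instead of a live catalog.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Tuple

# Table names exactly as listed in the suggested script above.
HEALTHCARE_TABLES = (
    "MEDICALPRODUCT", "SPECIALTYGROUP", "ICD9HIERARCHY", "LABOBSERVATION",
    "LABRESULT", "PRESCRIPTION", "VISIT", "STATEDETAILS", "ALLERGY",
    "PATIENTSMOKINGSTATUS", "DIAGNOSIS", "PATIENT",
)


def load_tables_and_views(
    catalog: Any, names: Iterable[str] = HEALTHCARE_TABLES
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Fetch each table with catalog.get_table() and derive a view with get_view()."""
    tables = {name: catalog.get_table(name) for name in names}
    views = {name: table.get_view() for name, table in tables.items()}
    return tables, views


# In the notebook, pass the real catalog:
#   tables, views = load_tables_and_views(catalog)
#   visit_view = views["VISIT"]

# Offline check with a stand-in object exposing the same two methods.
fake_catalog = SimpleNamespace(
    get_table=lambda name: SimpleNamespace(get_view=lambda: f"view of {name}")
)
_, demo_views = load_tables_and_views(fake_catalog)
print(len(demo_views), demo_views["PATIENT"])  # 12 view of PATIENT
```

Keying the dictionaries by the upper-case table name keeps lookups identical to the strings passed to `get_table`; the trade-off is losing per-variable autocompletion such as `visit_view`.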