import shap
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import eli5
from eli5.sklearn import PermutationImportance

Reading in the data

data = pd.read_csv('../input/hospital-readmissions/train.csv')

Training our model

y = data.readmitted

base_features = [c for c in data.columns if c != "readmitted"]

X = data[base_features]

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
my_model = RandomForestClassifier(n_estimators=30, random_state=1).fit(train_X, train_y)

Performing permutation importance first

perm1 = PermutationImportance(my_model, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm1, feature_names = val_X.columns.tolist())
Weight Feature
0.0451 ± 0.0068 number_inpatient
0.0087 ± 0.0046 number_emergency
0.0062 ± 0.0053 number_outpatient
0.0033 ± 0.0016 payer_code_MC
0.0020 ± 0.0016 diag_3_401
0.0016 ± 0.0031 medical_specialty_Emergency/Trauma
0.0014 ± 0.0024 A1Cresult_None
0.0014 ± 0.0021 medical_specialty_Family/GeneralPractice
0.0013 ± 0.0010 diag_2_427
0.0013 ± 0.0011 diag_2_276
0.0011 ± 0.0022 age_[50-60)
0.0010 ± 0.0022 age_[80-90)
0.0007 ± 0.0006 repaglinide_No
0.0006 ± 0.0010 diag_1_428
0.0006 ± 0.0022 payer_code_SP
0.0005 ± 0.0030 insulin_No
0.0004 ± 0.0028 diabetesMed_Yes
0.0004 ± 0.0021 diag_3_250
0.0003 ± 0.0018 diag_2_250
0.0003 ± 0.0015 glipizide_No
… 44 more …
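
Each weight in the table above is the drop in validation score when one column is shuffled, and the ± term reflects how much that drop varies across repeated shuffles. The sketch below reproduces the idea by hand. It is only an illustration, not eli5's implementation: `permutation_importance_sketch`, `toy_predict`, and the synthetic `X_demo`/`y_demo` arrays are hypothetical stand-ins, not the hospital readmissions data.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Mean and std of the accuracy drop when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)  # accuracy on the intact data
    means, stds = [], []
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column only, breaking its link to the target
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        means.append(np.mean(drops))
        stds.append(np.std(drops))
    return np.array(means), np.array(stds)

# Synthetic stand-in data (NOT the hospital readmissions set):
# the label depends only on column 0; column 1 is pure noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

# Hypothetical stand-in "model" that only looks at column 0
def toy_predict(X):
    return (X[:, 0] > 0).astype(int)

means, stds = permutation_importance_sketch(toy_predict, X_demo, y_demo)
for name, m, s in zip(["informative", "noise"], means, stds):
    print(f"{m:.4f} ± {s:.4f}  {name}")
```

Shuffling the informative column should cost a large share of the accuracy, while shuffling the noise column costs nothing, which is the same pattern that separates number_inpatient from the long tail of near-zero weights. Passing `my_model.predict` with `val_X.to_numpy()` and `val_y.to_numpy()` should give numbers comparable to the table, though not identical, since the shuffles differ.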
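# Explain a single prediction: the first row of the validation set
data_for_prediction = val_X.iloc[0,:]

# Creating an object that can calculate shap values
explainer = shap.TreeExplainer(my_model)
shap_values = explainer.shap_values(data_for_prediction)
shap.initjs()
# Index 0 selects the first class in my_model.classes_; this assumes shap_values
# is a list with one array per class, as older shap releases return for classifiers
shap.force_plot(explainer.expected_value[0], shap_values[0], data_for_prediction)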

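Both the permutation importance table and the Shapley values point the same way: "number_inpatient" is a really important feature. Its permutation weight (0.0451 ± 0.0068) is about five times that of the next feature, "number_emergency" (0.0087 ± 0.0046).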
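
To make the Shapley values concrete: a feature's Shapley value is its marginal contribution to the output, averaged over every order in which the features could be revealed. The brute-force sketch below shows this on a made-up two-feature function. `exact_shapley` and `toy_value` are hypothetical names and the numbers are invented for illustration; `shap.TreeExplainer` uses a far more efficient algorithm for tree models, so this is not how the library computes them.

```python
from itertools import permutations

def exact_shapley(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        known = set()
        for f in order:
            before = value_fn(frozenset(known))
            known.add(f)
            # Marginal contribution of f given the features revealed so far
            totals[f] += value_fn(frozenset(known)) - before
    return {f: total / len(orderings) for f, total in totals.items()}

# Made-up toy "model output" as a function of which features are known.
# The numbers are invented and have nothing to do with the readmissions data.
def toy_value(known):
    out = 0.5                      # output when no feature is known (base value)
    if "a" in known:
        out += 0.2
    if "b" in known:
        out += 0.1
    if "a" in known and "b" in known:
        out += 0.04                # interaction, split evenly by Shapley
    return out

phi = exact_shapley(toy_value, ["a", "b"])
base_value = toy_value(frozenset())
full_output = toy_value(frozenset(["a", "b"]))

print(phi)
# The contributions add up to the gap between the full output and the base value
print(base_value + sum(phi.values()), full_output)
```

This additivity is what a force plot draws: it starts at the base value (`explainer.expected_value`) and stacks the per-feature contributions until it reaches the model output for that row.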