ClassConfusion is similar to SHAP in that it gives us a view into how our model is behaving. Specifically, ClassConfusion plots how the distributions of our variables differ between the confused classes. For now this only works in the Colab environment. Let's look:
We'll train an ADULT_SAMPLE model again:
from fastai.tabular.all import *
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = IndexSplitter(list(range(800,1000)))(range_of(df))
to = TabularPandas(df, procs, cat_names, cont_names, y_names=dep_var, splits=splits)
dls = to.dataloaders()
learn = tabular_learner(dls, layers=[200,100], metrics=accuracy)
learn.fit(1, 1e-2)
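Before digging into variable distributions, it helps to know which classes the model is actually confusing. A minimal sketch of how a confusion matrix is tallied from predictions and targets (plain NumPy, with toy predictions for illustration; in practice fastai's `ClassificationInterpretation.from_learner(learn)` handles this):

```python
import numpy as np

def confusion_matrix(targets, preds, n_classes):
    # Count how often each (actual, predicted) pair occurs;
    # off-diagonal cells are the "confused" examples
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(targets, preds):
        cm[t, p] += 1
    return cm

# Hypothetical labels for a binary '<50k' / '>=50k' problem
targets = [0, 1, 0, 0, 1, 1]
preds   = [0, 1, 1, 0, 1, 0]
print(confusion_matrix(targets, preds, 2))
```

The off-diagonal counts are exactly the rows ClassConfusion will examine.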
Now let's bring in ClassConfusion:
from fastinference.class_confusion import *
We'll build an instance of ClassConfusion, optionally passing in any variables we want to use, any test dataloaders we want, and whether our list of classes is ordered:
dl = dls.test_dl(df.iloc[:100])
classlist = ['<50k','>=50k']
ClassConfusion(learn, dl=dl, classlist=classlist)
We can now look into each variable and see how the distributions of the confused classes differed from those of the entire test set.
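The same kind of comparison can be done by hand with pandas: filter down to the rows the model misclassified and compare a variable's distribution against the full set. A rough sketch on toy data (the `pred` column and the values are hypothetical, not ClassConfusion's actual implementation):

```python
import pandas as pd

# Toy frame standing in for the test set, with hypothetical predictions
df = pd.DataFrame({
    'age':    [25, 38, 52, 29, 44, 61],
    'salary': ['<50k', '<50k', '>=50k', '<50k', '>=50k', '>=50k'],
    'pred':   ['<50k', '>=50k', '>=50k', '<50k', '<50k', '>=50k'],
})

# Rows where the prediction disagrees with the label are the confused examples
confused = df[df['salary'] != df['pred']]

# Compare the confused subset's distribution against the whole set
print(confused['age'].mean(), df['age'].mean())
```

ClassConfusion automates this across every categorical and continuous variable, plotting each comparison instead of printing summary statistics.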