This article is also a Jupyter Notebook available to be run from the top down. There will be code snippets that you can then run in any environment.

Below are the versions of fastai, fastcore, wwf, and bayesian-optimization currently running at the time of writing this:

• fastai: 2.1.10
• fastcore: 1.3.13
• wwf: 0.0.8
• bayesian-optimization: 1.2.0
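
If you would like to match those versions, one way (assuming a pip-based environment) is to pin them when installing, for example in a notebook cell:

!pip install fastai==2.1.10 fastcore==1.3.13 wwf==0.0.8 bayesian-optimization==1.2.0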

from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')  # read in the Adult dataset that ships with ADULT_SAMPLE

from bayes_opt import BayesianOptimization

Bayesian optimization needs a single function to maximize, so first we'll write a fit_with function that builds a Learner from a set of hyper-parameters, trains it, and returns the validation accuracy:

def fit_with(lr:float, wd:float, dp:float):
    # create a Learner
    config = tabular_config(embed_p=dp, ps=wd)
    learn = tabular_learner(dls, layers=[200,100], metrics=accuracy, config=config)

    # Train for 3 epochs
    with learn.no_bar():
        learn.fit_one_cycle(3, lr)

    # Grab and return the overall validation accuracy
    acc = float(learn.validate()[1])

    return acc
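
As a quick sketch (not run in this article), this simpler fit_with could be handed straight to BayesianOptimization, assuming the dls DataLoaders we build further down and some illustrative bounds:

# Illustrative bounds for the three hyper-parameters of the simple fit_with above
simple_hps = {'lr': (1e-5, 1e-1), 'wd': (4e-4, 0.4), 'dp': (0.01, 0.5)}

simple_optim = BayesianOptimization(f=fit_with, pbounds=simple_hps, random_state=1)
# simple_optim.maximize(n_iter=10)  # uncomment to run the search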


Let's adjust this further to show how we would go about tuning the learning rate, weight decay, embedding dropout, and the number and size of the layers:

def fit_with(lr:float, wd:float, dp:float, n_layers:float, layer_1:float, layer_2:float, layer_3:float):

    # Log the sampled hyper-parameters for this trial
    print(lr, wd, dp)

    # The optimizer samples continuous values, so convert them into a concrete layer setup
    if int(n_layers) == 2:
        layers = [round(layer_1), round(layer_2)]
    elif int(n_layers) == 3:
        layers = [round(layer_1), round(layer_2), round(layer_3)]
    else:
        layers = [round(layer_1)]
    config = tabular_config(embed_p=float(dp),
                            ps=float(wd))
    learn = tabular_learner(dls, layers=layers, metrics=accuracy, config=config)

    with learn.no_bar(), learn.no_logging():
        learn.fit(5, lr=float(lr))

    acc = float(learn.validate()[1])

    return acc
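
To make the rounding concrete, here is a small illustration (the sampled values below are made up) of how a point proposed by the optimizer is turned into an actual layer configuration:

# Hypothetical point proposed by the optimizer
sample = {'n_layers': 2.87, 'layer_1': 57.6, 'layer_2': 100.2, 'layer_3': 1930.4}

if int(sample['n_layers']) == 2:
    layers = [round(sample['layer_1']), round(sample['layer_2'])]
elif int(sample['n_layers']) == 3:
    layers = [round(sample['layer_1']), round(sample['layer_2']), round(sample['layer_3'])]
else:
    layers = [round(sample['layer_1'])]

print(layers)  # -> [58, 100]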


Let's try it out. First we'll build our TabularPandas and DataLoaders:

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
y_names = 'salary'
y_block = CategoryBlock()
splits = RandomSplitter()(range_of(df))

to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,
                   y_names=y_names, y_block=y_block, splits=splits)

dls = to.dataloaders(bs=512)


We'll declare our hyper-parameters:

hps = {'lr': (1e-05, 1e-01),
       'wd': (4e-4, 0.4),
       'dp': (0.01, 0.5),
       'n_layers': (1, 3),
       'layer_1': (50, 200),
       'layer_2': (100, 1000),
       'layer_3': (200, 2000)}


And now we build the optimizer:

optim = BayesianOptimization(
    f = fit_with,   # our fit function
    pbounds = hps,  # our hyper-parameters to tune
    verbose = 2,    # 2 prints everything, 1 prints only when a maximum is observed, 0 is silent
    random_state=1
)


And now we can search!

%time optim.maximize(n_iter=10)
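
Note that maximize also runs a handful of purely random warm-up evaluations (5 by default) before the Gaussian-process-guided steps, which is why you'll see more than 10 rows of output. If you want to control that explicitly you can pass init_points as well, something like this sketch (not run here):

# Explicitly ask for 5 random warm-up evaluations before 10 Bayesian-optimization steps
# optim.maximize(init_points=5, n_iter=10)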


We can grab the best results:

print(optim.max)

{'target': 0.8404483795166016, 'params': {'dp': 0.1710613610736308, 'layer_1': 57.63154958927875, 'layer_2': 100.1567384765859, 'layer_3': 1930.4092799350558, 'lr': 0.0721697277771627, 'n_layers': 2.868052690189961, 'wd': 0.035039066808375346}}


And with a few conversions we see:

• The best number of layers was 2
• The first layer had a size of 57
• The second layer had a size of 100

And then of course our other hyper-parameters, which we can read straight out of optim.max, as the conversion sketch below shows.
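
For instance, one way to do those conversions (a sketch; truncating with int here so the values line up with the numbers quoted above):

best = optim.max['params']

n_layers = int(best['n_layers'])  # -> 2
sizes = [int(best['layer_1']), int(best['layer_2']), int(best['layer_3'])][:n_layers]  # -> [57, 100]
lr, wd, dp = best['lr'], best['wd'], best['dp']

print(n_layers, sizes, lr, wd, dp)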