```python
from fastai.vision.all import *
```
Introduction
This lesson is focused on deployment. This will include some of what Running with fastai (a future course) will discuss.
Deployment is often the most difficult part for machine learning engineers (MLEs), especially people who have taken the fastai courses.
Why?
`fastai` provides a variety of options to perform inference, but only that. There are many more parts to deployment that MLEs and software engineers deal with, such as dependencies, code maintainability, and more, all of which become more challenging with `fastai`.
This lesson has three total parts, showcasing three different levels of deploying a model:

- Using `fastai` directly for everything. Not wholly recommended, but shown for posterity
- Removing `fastai` and fully recreating it. Recommended for longevity
- Taking parts 1 and 2 and showcasing them in a fully deployed Hugging Face Space which can handle API interactivity
Training a Model
Before we can do anything, we need a model to deploy. We’ll use our basic PETs example that we’re extensively familiar with now:
```python
path = untar_data(URLs.PETS)/'images'
fnames = get_image_files(path)
pat = r'/([^/]+)_\d+.*'
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
bs = 64
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
dls = pets.dataloaders(path, bs=bs)
```
Next we’ll create a basic `vision_learner` and utilize one of the `timm` models.
```python
learn = vision_learner(dls, "vit_tiny_patch16_224")
```
Next we’ll train our model:
```python
learn.fine_tune(1)
```
| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 2.604205 | 0.767528 | 00:16 |

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.861317 | 0.427109 | 00:15 |
Now we’ll get to the good stuff.
Learner.save vs Learner.export
One very common misconception in `fastai` is the difference between `learn.save` and `learn.export`:
| Operation | `save` | `export` |
|---|---|---|
| Saves model weights | ✅ | ✅ |
| Can save optimizer state | ✅ | ✅ |
| Saves data | ❌ | ❌ |
| Saves DataLoader transforms | ❌ | ✅ |
This raises the question: when should you use which?

- `Learner.save`: saves the current model and optimizer state as a checkpoint in raw PyTorch, and nothing else
- `Learner.export`: saves the current model, optimizer state, and empty `DataLoaders` for production
What does that last part mean?
Remember back in lesson 2 when we looked at using raw `fastai` transforms to preprocess? The transforms are saved separately from the data itself. They’re `Pipelines`, or directions on how to apply `fastai` transforms to data passed in. As a result, `fastai` dumps the contents of the `DataLoaders` by creating new empty ones that store how the data was brought in and the transforms applied to it.
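To make this concrete, here is a minimal sketch (an illustrative assumption, not part of the lesson) of what an exported `Learner` looks like once loaded back in, using the "exported_fastai" file we create just below: the transform `Pipelines` are still attached, but the datasets themselves are empty.

```python
# Hypothetical inspection of an exported Learner: the DataLoaders keep their
# transform Pipelines, but contain no actual items
learn_inf = load_learner("exported_fastai")
print(len(learn_inf.dls.train_ds))       # 0 -- the data itself was not saved
print(learn_inf.dls.valid.after_item)    # the item transforms are still attached
print(learn_inf.dls.valid.after_batch)   # ...and so are the batch transforms
```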
For this lesson we will perform both `learn.save` and `learn.export`. `export` will be used in this part, as it’s easier to go from A -> B, and `save` will be used in the next part as we rip away (or try to rip away) `fastai` in its entirety:

```python
learn.export("exported_fastai")
learn.save("exported_model", with_opt=False)
```
Path('models/exported_model.pth')
Performing Inference
`fastai` natively offers two different methods for inference:

- `Learner.predict`
- `Learner.get_preds`

The first is meant for performing inference on a single image at a time, whereas the latter is designed for multiple images at once.
In the real world, most jobs with your model will utilize batch-wise inference, and as a result you should use `Learner.get_preds` instead.
We’ll showcase each of them and how they differ, as well as common pitfalls when utilizing each of them.
First we’ll bring back in the model:
```python
learn = load_learner("exported_fastai")
```
Using predict
Advantages:
- Easiest to use
- Provides readable outputs
Disadvantages:
- Easy to make computationally inefficient
- Can only handle a single input at a time
`learn.predict` is very straightforward. Simply pass the method a filename or input in the same format as the training or validation data, and it will generate predictions using the validation-set transforms and decode the outputs into something human readable.
predict
`predict` expects the data to be something our original `get_x` can use. As a result, this translates to something we can apply `PILImage.create` on. This is a very common misconception and source of frustration. If you are trying to use `predict` and get an error about “unsupported” parts, try removing a step you perform to bring the data in before you pass it to `predict` (such as passing the raw `bytes` to `learn.predict` rather than decoding them first).
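For example, here is a minimal sketch (reusing the `fname` path defined just below) of handing `predict` the raw bytes of an image file rather than opening and decoding it yourself; `PILImage.create` can handle paths, bytes, and arrays:

```python
# A small sketch: let predict / PILImage.create do the decoding for us
with open(fname, "rb") as f:
    img_bytes = f.read()

pred_class, pred_idx, probs = learn.predict(img_bytes)
```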
Let’s try this now.
First we’ll use a single image from our dataset:
```python
fname = fnames[0]; fname
```
Path('/root/.fastai/data/oxford-iiit-pet/images/Bombay_78.jpg')
Then we’ll pass it to `learn.predict`:

```python
learn.predict(fname)
```
('Bombay',
tensor(3),
tensor([4.5245e-03, 2.1494e-02, 1.6617e-03, 8.9011e-01, 7.3626e-03, 2.7623e-03,
8.2243e-03, 5.4719e-04, 1.5395e-02, 4.0709e-02, 1.1661e-03, 6.1632e-04,
5.8105e-05, 2.4480e-04, 4.3024e-04, 1.3087e-04, 8.1660e-06, 1.1279e-04,
6.7830e-05, 5.6823e-05, 1.3155e-04, 1.8699e-04, 1.8285e-05, 1.0410e-04,
5.0583e-05, 3.9477e-05, 3.3636e-04, 1.6307e-03, 1.4086e-04, 1.8464e-05,
3.1109e-05, 2.0261e-04, 1.7899e-04, 3.4765e-04, 6.8315e-04, 1.9418e-04,
2.2250e-05]))
You can see that the predictions include the decoded label, the argmax’d class index, and the softmax’d probabilities. The label comes from `learn.dls.vocab`, which decodes each argmax’d class number into the proper class name.
Depending on the problem, the class list may not always live at `learn.dls.vocab`. The prime example of this is text problems, where there is also a `vocab` that converts each substring or character into a number. The best way to make sure you have access to the right vocab is to look at `learn.dls.categorize.vocab` or `learn.dls.multicategorize.vocab`, depending on your problem!
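As a quick sanity check, here is a small sketch (using the names from this lesson) of performing that decoding ourselves with the vocab that `predict` uses under the hood:

```python
# A small sketch: decode the argmax'd class index through the vocab ourselves
pred_class, pred_idx, probs = learn.predict(fname)
vocab = learn.dls.categorize.vocab      # the vocab for this single-label problem
assert vocab[pred_idx] == pred_class
```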
Using get_preds
The second “in house” option provided by `fastai` is `get_preds`. If we consider `predict` to be an option for batches of one, `get_preds` is an option for any number of batches of any size.
In terms of API abstraction, `predict` wraps around `get_preds` to perform what it needs to do.
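Roughly speaking, `predict` behaves something like the helper below. This is a simplified, hypothetical sketch of the idea rather than fastai’s actual source: build a one-item test `DataLoader`, call `get_preds`, then decode the result:

```python
# A simplified, hypothetical re-implementation of the spirit of Learner.predict
def predict_one(learn, item):
    dl = learn.dls.test_dl([item])        # one-item "test dataloader"
    preds, _ = learn.get_preds(dl=dl)     # raw predictions from the model
    idx = preds.argmax(dim=1)[0]          # postprocess: pick the top class
    return learn.dls.vocab[idx], idx, preds[0]

predict_one(learn, fname)
```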
Typically the cycle for `get_preds` is:

- Use `Learner.test_dl` to create a “test dataloader” to use
- Use `Learner.get_preds` to gather raw predictions from the model
- Perform postprocessing on the predictions to make them “human readable”
Test DataLoader (noun): A fastai `DataLoader` which utilizes the validation transforms when sending data through the pipelines.
In deployment we utilize test dataloaders for data preprocessing.
First we need to create a `test_dl`. This involves passing in a list of items we want to perform inference on. To mimic what we just did a moment ago, this should be a list of one:

```python
dl = learn.dls.test_dl([fname])
```
Next we can use the `learn.get_preds` function and pass in our new `DataLoader` to generate outputs:

```python
preds = learn.get_preds(dl=dl)[0]
```
Finally, to get all the useful information out of it, we can compute the softmax’d predictions, the argmax’d class indices, and the decoded labels:

```python
softmax = preds.softmax(dim=1)
argmax = preds.argmax(dim=1)
labels = [learn.dls.vocab[pred] for pred in argmax]
softmax, argmax, labels
```
(tensor([[0.0261, 0.0265, 0.0260, 0.0632, 0.0261, 0.0260, 0.0262, 0.0260, 0.0263,
0.0270, 0.0260, 0.0260, 0.0259, 0.0259, 0.0260, 0.0259, 0.0259, 0.0259,
0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0260,
0.0260, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0260, 0.0260, 0.0259,
0.0259]]), tensor([3]), ['Bombay'])
Common Pitfalls
Let’s discuss some common pitfalls and hyper-optimizations we can perform without going outside the realm of `fastai` too much.
First, `get_preds` is currently performing much slower than `learn.predict`:
```python
dl = learn.dls.test_dl([fname])
_ = learn.get_preds(dl=dl)
```
723 ms ± 6.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```python
_ = learn.predict(fname)
```
71.8 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
This is because `learn.dls.device` is set to the `cpu`, so it’s not actually predicting on CUDA!
Let’s switch that:
```python
learn.dls.cuda()
```
<fastai.data.core.DataLoaders>
```python
dl = learn.dls.test_dl([fname])
_ = learn.get_preds(dl=dl)
```
649 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But wait, that didn’t actually do a whole lot! Why?
Multiprocessing is another slowdown here. In deployment we don’t really want or need the CPU to spin up multiple workers to apply our transforms; it’s wasted overhead slowing us down:
```python
dl = learn.dls.test_dl([fname], num_workers=0)
_ = learn.get_preds(dl=dl)
```
39.2 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
And just like that we’ve reduced the time to 39.2ms!
The other place time gets wasted is in decoding the predictions to recover the original inputs and more. Most likely you only care about the probabilities, which we’ve already seen how to extract.
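Putting these pieces together, here is a minimal sketch (names assumed from earlier in the lesson) of a lean inference call that skips multiprocessing and keeps only the probabilities; note that depending on your loss function, `get_preds` may already apply the softmax activation for you:

```python
# A lean inference sketch: no multiprocessing, keep only what we need
dl = learn.dls.test_dl([fname], num_workers=0)
probs, _ = learn.get_preds(dl=dl)
predicted = [learn.dls.vocab[i] for i in probs.argmax(dim=1)]
```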
cpu=False
Another way to avoid the slowdown is to pass `cpu=False` when calling `load_learner`, so that it will load the model on CUDA for you automatically and be ready to use.
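In code, that would look something like the following sketch (assuming a CUDA-capable machine and the export file from earlier):

```python
# Load the exported Learner directly onto the GPU (requires CUDA to be available)
learn = load_learner("exported_fastai", cpu=False)
```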
Takeaway
While this is nice and convenient, it comes at the cost of adding `fastai` as a deployment dependency, which carries significant risk and is not recommended. In the next part of this lesson we will use the model saved with `learn.save` and show a better way to perform inference without `fastai`.