```python
from fastai.vision.all import *
```

Lesson Video:
Introduction
This lesson is focused on deployment. This will include some of what Running with fastai (a future course) will discuss.
Deployment is often the most difficult part for machine learning engineers (MLEs), especially for people who have taken the fastai courses.
Why?
fastai provides a variety of options for performing inference, but only that. There are many more parts to deployment that MLEs and software engineers deal with, such as dependencies, code maintainability, and more, all of which become more challenging with fastai.
This lesson has three parts, showcasing three different levels of and approaches to deploying a model:

- Using `fastai` directly for everything. Not wholly recommended, but shown for posterity
- Removing `fastai` and fully recreating it. Recommended for longevity
- Taking parts 1 and 2 and showcasing them in a fully deployed Hugging Face Space which can handle API interactivity
Training a Model
Before we can do anything, we need a model to deploy. We’ll use our basic PETs example that we’re extensively familiar with now:
```python
path = untar_data(URLs.PETS)/'images'
fnames = get_image_files(path)
pat = r'/([^/]+)_\d+.*'
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
bs=64
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
dls = pets.dataloaders(path, bs=bs)
```

Next we'll create a basic `vision_learner` and utilize one of the timm models.
```python
learn = vision_learner(dls, "vit_tiny_patch16_224")
```

Next we'll train our model:
```python
learn.fine_tune(1)
```

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 2.604205 | 0.767528 | 00:16 |

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.861317 | 0.427109 | 00:15 |
Now we’ll get to the good stuff
Learner.save vs Learner.export
One very common point of confusion in fastai is the difference between `learn.save` and `learn.export`:
| Operation | `save` | `export` |
|---|---|---|
| Saves model weights | ✅ | ✅ |
| Can save optimizer state | ✅ | ✅ |
| Saves data | ❌ | ❌ |
| Saves `DataLoader` transforms | ❌ | ✅ |
This raises the question: when should you use which?
- `Learner.save`: Saves the current model and optimizer states as a checkpoint in raw PyTorch, and nothing else
- `Learner.export`: Saves the current model, optimizer states, and empty `DataLoaders` for production
What does that last part mean?
Remember back in lesson 2 when we looked at using raw fastai transforms to preprocess? The transforms are saved separately from the data itself. They're `Pipeline`s, or directions on how to apply fastai transforms to data passed in. As a result, fastai dumps the contents of the `DataLoaders` by creating new, empty ones that store how the data was brought in and the transforms applied to it.
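To make that a bit more concrete, here is a quick (version-dependent) way to peek at those pipelines on our current `Learner`; these are exactly the recipes that `export` carries along inside fresh, empty `DataLoaders`:

```python
# The transform recipes live on the DataLoaders, not on the data itself.
# These Pipelines are what get preserved when we export.
print(learn.dls.valid.after_item)   # item-level transforms (e.g. RandomResizedCrop, ToTensor)
print(learn.dls.valid.after_batch)  # batch-level transforms (e.g. IntToFloatTensor, Normalize)
```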
For this lesson we will perform both learn.save and learn.export. export will be used in this part as it’s easier to go from A -> B, and save for the next part as we rip away (or try to rip away) fastai in its entirety:
```python
learn.export("exported_fastai")
learn.save("exported_model", with_opt=False)
```

```
Path('models/exported_model.pth')
```
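As a quick illustration of the difference (a sketch, not something we need for the rest of this part): the exported file is reloaded with `load_learner` and is self-contained, whereas the checkpoint from `learn.save` only holds weights, so a `Learner` has to be rebuilt around it before calling `Learner.load`:

```python
# The export is self-contained: transform recipe + model in one file
inf_learn = load_learner("exported_fastai")

# The save is just a checkpoint: rebuild the Learner first, then load the weights
new_learn = vision_learner(dls, "vit_tiny_patch16_224")
new_learn = new_learn.load("exported_model")
```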
Performing Inference
fastai offers two different methods for inference natively:
- `Learner.predict`
- `Learner.get_preds`

The first is meant for performing inference on a single image at a time, whereas the latter is designed for multiple images at once.

In the real world, most jobs with your model will utilize batch-wise inference, and as a result you should reach for `Learner.get_preds` instead.
We’ll showcase each of them and how they differ, as well as common pitfalls when utilizing each of them.
First we’ll bring back in the model:
```python
learn = load_learner("exported_fastai")
```

Using predict
Advantages:
- Easiest to use
- Provides readable outputs
Disadvantages:
- Easy to make computationally inefficient
- Can only do one singular input at a time
`learn.predict` is very straightforward. Simply pass the method a filename or an input in the same format as the training or validation data, and it will generate predictions using the validation set's transforms and decode the outputs into something human readable.
predict
`predict` expects the data to be something our original `get_x` can use. As a result, this translates to something we can apply `PILImage.create` on. This is a very common source of confusion and frustration. If you are trying to use `predict` and get an error about "unsupported" parts, try removing a step you perform to bring the data in before you pass it to `predict` (such as passing the raw bytes to `learn.predict` rather than decoding them first).
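As a small sketch of that advice (reading bytes from disk here stands in for bytes you might receive from a web request):

```python
# Read the raw image bytes, as a web framework might hand them to us
with open(fnames[0], "rb") as f:
    raw_bytes = f.read()

# PILImage.create can handle bytes directly, so predict works as-is;
# manually decoding to a PIL image first is the step that tends to break
pred_class, pred_idx, probs = learn.predict(raw_bytes)
```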
Let’s try this now.
First we’ll use a single image from our dataset:
```python
fname = fnames[0]; fname
```

```
Path('/root/.fastai/data/oxford-iiit-pet/images/Bombay_78.jpg')
```
Then we’ll pass it to learn.predict:
```python
learn.predict(fname)
```

```
('Bombay',
 tensor(3),
 tensor([4.5245e-03, 2.1494e-02, 1.6617e-03, 8.9011e-01, 7.3626e-03, 2.7623e-03,
         8.2243e-03, 5.4719e-04, 1.5395e-02, 4.0709e-02, 1.1661e-03, 6.1632e-04,
         5.8105e-05, 2.4480e-04, 4.3024e-04, 1.3087e-04, 8.1660e-06, 1.1279e-04,
         6.7830e-05, 5.6823e-05, 1.3155e-04, 1.8699e-04, 1.8285e-05, 1.0410e-04,
         5.0583e-05, 3.9477e-05, 3.3636e-04, 1.6307e-03, 1.4086e-04, 1.8464e-05,
         3.1109e-05, 2.0261e-04, 1.7899e-04, 3.4765e-04, 6.8315e-04, 1.9418e-04,
         2.2250e-05]))
```
You can see that as part of the predictions we received the decoded label, the argmax'd class index, and the softmax'd probabilities. The label comes from `learn.dls.vocab`, which takes each argmax'd class number and decodes it into the proper class name.
Depending on the problem, the class list may not always live in `learn.dls.vocab`. The prime example of this is text problems, where there is also a vocab that converts each substring or character into a number. The best way to make sure you have access to the right vocab is to look at either `learn.dls.categorize.vocab` or `learn.dls.multicategorize.vocab`, depending on your problem!
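For instance, we can decode the argmax'd index from above ourselves using the same vocab (a small sketch):

```python
vocab = learn.dls.vocab
print(len(vocab))  # 37 pet breeds in our dataset
print(vocab[3])    # 'Bombay', the class index predict returned above
```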
Using get_preds
The second “in house” option provided by fastai is get_preds. If we consider predict to be an option for batches of 1, get_preds is an option for any number of batches of any size.
In terms of API abstraction, predict wraps around get_preds to perform what it needs to do.
Typically the cycle for get_preds is:
- Use `Learner.test_dl` to create a "test dataloader" to use
- Use `Learner.get_preds` to gather the raw predictions from the model
- Perform postprocessing on the predictions to make them "human readable"
Test DataLoader
noun
A fastai DataLoader which utilizes the validation transforms when sending data through the pipelines
In deployment we utilize test dataloaders for data preprocessing.
First we need to create a test_dl. This involves passing in a list of items we want to perform inference on. To mimic what we just did a moment ago this should be a list of one:
```python
dl = learn.dls.test_dl([fname])
```

Next we can use the `learn.get_preds` function and pass in our new DataLoader to generate outputs:
```python
preds = learn.get_preds(dl=dl)[0]
```

Finally, to get all the useful information from it, we can compute the softmax'd predictions, the argmax'd class indices, and the decoded labels:
```python
softmax = preds.softmax(dim=1)
argmax = preds.argmax(dim=1)
labels = [learn.dls.vocab[pred] for pred in argmax]
softmax, argmax, labels
```

```
(tensor([[0.0261, 0.0265, 0.0260, 0.0632, 0.0261, 0.0260, 0.0262, 0.0260, 0.0263,
          0.0270, 0.0260, 0.0260, 0.0259, 0.0259, 0.0260, 0.0259, 0.0259, 0.0259,
          0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0260,
          0.0260, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0260, 0.0260, 0.0259,
          0.0259]]), tensor([3]), ['Bombay'])
```
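Since `test_dl` accepts any list of items, scaling up to real batch inference is just a matter of passing more filenames. A quick sketch using the first 16 files from our dataset:

```python
# Build one test DataLoader over many images at once
dl = learn.dls.test_dl(fnames[:16])
preds, _ = learn.get_preds(dl=dl)

# One row of probabilities per image, decoded through the vocab
labels = [learn.dls.vocab[pred] for pred in preds.argmax(dim=1)]
print(len(labels))  # 16
```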
Common Pitfalls
Let’s discuss some common pitfalls and hyper-optimizations we can perform without going outside the realm of fastai too much.
First, currently get_preds is performing much slower than learn.predict:
```python
%%timeit
dl = learn.dls.test_dl([fname])
_ = learn.get_preds(dl=dl)
```

```
723 ms ± 6.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
%%timeit
_ = learn.predict(fname)
```

```
71.8 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
This is because learn.dls.device is set to the cpu so it’s not actually predicting on CUDA!
Let’s switch that:
```python
learn.dls.cuda()
```

```
<fastai.data.core.DataLoaders>
```
```python
%%timeit
dl = learn.dls.test_dl([fname])
_ = learn.get_preds(dl=dl)
```

```
649 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
But wait, that didn’t actually do a whole lot! Why?
Multiprocessing is another slowdown here. In deployment we neither want nor need the CPU to do multiprocessing to apply our transforms; it's wasted resources slowing us down:
```python
%%timeit
dl = learn.dls.test_dl([fname], num_workers=0)
_ = learn.get_preds(dl=dl)
```

```
39.2 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
And just like that we’ve reduced the time to 39.2ms!
The other wasted time is spent decoding the predictions to get back the original inputs and more. Most likely you only care about the probabilities, which we've already seen how to extract.
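Putting these pitfalls together, a deployment-oriented helper might look something like the sketch below. The function name and structure are our own (not a fastai API): build the `test_dl` with `num_workers=0`, skip the full decoding, and return only the labels and probabilities:

```python
def predict_batch(learn, items):
    "Hypothetical helper: batched inference tuned for deployment."
    # Single-process transform application avoids the multiprocessing overhead
    dl = learn.dls.test_dl(items, num_workers=0)
    # get_preds returns (predictions, targets); targets are None for a test_dl
    preds, _ = learn.get_preds(dl=dl)
    # Skip fastai's full decoding; we only want class names and probabilities
    labels = [learn.dls.vocab[pred] for pred in preds.argmax(dim=1)]
    return labels, preds

labels, probs = predict_batch(learn, [fname])
```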
cpu=False
Another way to avoid the slowdown is to pass `cpu=False` when calling `load_learner`, so that the model is loaded onto CUDA automatically and is ready to use.
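A minimal sketch of that option (assuming a CUDA device is available on the machine serving the model):

```python
learn = load_learner("exported_fastai", cpu=False)
```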
Takeaway
While this is nice and convenient, it comes at the cost of adding fastai as a dependency, which is a high risk and not recommended. In the next part of this lesson we will use the model saved with `learn.save` and show a better way to perform inference without fastai.