Deploying with fastai

Baby Steps
Lesson 5

Introduction

This lesson is focused on deployment. This will include some of what Running with fastai (a future course) will discuss.

Deployment is often the most difficult part for machine learning engineers (MLEs), especially for people who have taken the fastai courses.

Why?

fastai provides a variety of options for performing inference, but only that. There are many more parts to deployment that MLEs and Software Engineers have to deal with, such as dependencies, code maintainability, and more, all of which become more challenging with fastai in the mix.

This lesson has three parts, showcasing three different levels of deploying a model:

  1. Using fastai directly for everything. Not wholly recommended, but shown for posterity
  2. Removing fastai and fully recreating the inference logic ourselves. Recommended for longevity
  3. Taking parts 1 and 2 and showcasing them in a fully deployed Hugging Face Space which can handle API interactivity.

Training a Model

Before we can do anything, we need a model to deploy. We’ll use the basic PETs example we’re thoroughly familiar with by now:

from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'
fnames = get_image_files(path)
pat = r'/([^/]+)_\d+.*'
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
bs=64

pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=RegexLabeller(pat),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
dls = pets.dataloaders(path, bs=bs)
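As a quick (optional) sanity check that the DataBlock is wired up the way we expect:

dls.c, dls.one_batch()[0].shape  # expect 37 pet breeds and a [64, 3, 224, 224] batch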

Next we’ll create a basic vision_learner and utilize one of the timm models.


learn = vision_learner(dls, "vit_tiny_patch16_224")

Next we’ll train our model:

learn.fine_tune(1)
epoch  train_loss  valid_loss  time
0      2.604205    0.767528    00:16

epoch  train_loss  valid_loss  time
0      0.861317    0.427109    00:15

Now we’ll get to the good stuff.

Learner.save vs Learner.export

One very common point of confusion in fastai is the difference between learn.save and learn.export:

Operation                      save    export
Saves model weights             ✓        ✓
Can save optimizer state        ✓        ✗
Saves data                      ✗        ✗
Saves DataLoader transforms     ✗        ✓

This raises the question: when should you use which?

  • Learner.save: Saves the current model and optimizer states as a checkpoint in raw PyTorch and nothing else
  • Learner.export: Saves the current model along with empty DataLoaders (which keep the transform pipelines) for production.

What does that last part mean?

Remember back in lesson 2 when we looked at using raw fastai transforms to preprocess? The transforms are saved separately from the data itself. They’re Pipelines, or directions on how to apply fastai transforms to data passed in. As a result, fastai dumps the contents of the DataLoaders and creates new empty ones that store how the data was brought in and which transforms were applied to it.
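As a rough sketch of that idea (a hand-built Pipeline, separate from anything export produces; all names come from the earlier from fastai.vision.all import *):

pipe = Pipeline([PILImage.create, Resize(224), ToTensor()])
pipe                   # just a recipe of steps; no data is stored inside it
img = pipe(fnames[0])  # data only flows through when something is passed in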

For this lesson we will perform both learn.save and learn.export. export will be used in this part, as it’s the easier path from A to B, and save in the next part, as we rip away (or try to rip away) fastai in its entirety:

learn.export("exported_fastai")
learn.save("exported_model", with_opt=False)
Path('models/exported_model.pth')
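Since with_opt=False was passed, the resulting .pth file is nothing more than a raw PyTorch state dict. A quick (optional) check:

import torch
state_dict = torch.load("models/exported_model.pth", map_location="cpu")
len(state_dict)  # one entry per parameter/buffer tensor in the model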

Performing Inference

fastai offers two different methods for inference natively:

  • Learner.predict
  • Learner.get_preds

The first is meant for performing inference on a single item at a time, whereas the latter is designed for batches of items.

Tip

In the real world, most jobs involving your model will use batch-wise inference, and as a result Learner.get_preds.

We’ll showcase each of them and how they differ, as well as common pitfalls when utilizing each of them.

First we’ll bring back in the model:

learn = load_learner("exported_fastai")

Using predict

Advantages:

  • Easiest to use
  • Provides readable outputs

Disadvantages:

  • Easy to make computationally inefficient
  • Can only do one singular input at a time

learn.predict is very straightforward. Simply pass the method a filename or an input in the same format as the training or validation data, and it will generate predictions using the validation-set transforms and decode the outputs into something human readable.

Inputs to predict

predict expects the data to be something our original get_x can use. As a result this translates to something we can apply PILImage.create on. This is a very common source of confusion and frustration. If you are trying to use predict and get an error about “unsupported” parts, try removing a step you are performing to bring the data in before you pass it to predict (such as passing the raw bytes to learn.predict rather than decoding them first).
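For example, both of these forms should work, since PILImage.create knows how to decode a path or raw bytes (the filename here is just a stand-in):

learn.predict("my_pet.jpg")                     # a path on disk (hypothetical)
learn.predict(open("my_pet.jpg", "rb").read())  # the raw bytes of that same file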

Let’s try this now.

First we’ll use a single image from our dataset:

fname = fnames[0]; fname
Path('/root/.fastai/data/oxford-iiit-pet/images/Bombay_78.jpg')

Then we’ll pass it to learn.predict:

learn.predict(fname)
('Bombay',
 tensor(3),
 tensor([4.5245e-03, 2.1494e-02, 1.6617e-03, 8.9011e-01, 7.3626e-03, 2.7623e-03,
         8.2243e-03, 5.4719e-04, 1.5395e-02, 4.0709e-02, 1.1661e-03, 6.1632e-04,
         5.8105e-05, 2.4480e-04, 4.3024e-04, 1.3087e-04, 8.1660e-06, 1.1279e-04,
         6.7830e-05, 5.6823e-05, 1.3155e-04, 1.8699e-04, 1.8285e-05, 1.0410e-04,
         5.0583e-05, 3.9477e-05, 3.3636e-04, 1.6307e-03, 1.4086e-04, 1.8464e-05,
         3.1109e-05, 2.0261e-04, 1.7899e-04, 3.4765e-04, 6.8315e-04, 1.9418e-04,
         2.2250e-05]))

You can see that as part of the predictions we received the decoded model label, the argmax’d class index, and the softmax’d probabilities. The label comes from learn.dls.vocab, which maps the argmax’d class index back to the proper class name.
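In other words, the decoding step is just an index into that vocab:

class_idx = 3               # the argmax'd class index from above
learn.dls.vocab[class_idx]  # -> 'Bombay'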

Vocab

Depending on the problem, the class list may not always exist in learn.dls.vocab. The prime example is text problems, where a vocab exists that converts each substring or character into a number. The best way to make sure you have access to the right vocab is to look at either learn.dls.categorize.vocab or learn.dls.multicategorize.vocab, depending on your problem!

Using get_preds

The second “in house” option provided by fastai is get_preds. If we consider predict to be an option for batches of 1, get_preds is an option for any number of batches of any size.

In terms of API abstraction, predict wraps around get_preds to perform what it needs to do.

Typically the cycle for get_preds is:

  1. Use Learner.test_dl to create a “test dataloader” to use
  2. Use Learner.get_preds to gather raw predictions from the model
  3. Perform postprocessing on the predictions to make them “human readable”

Test DataLoader

noun

A fastai DataLoader which utilizes the validation transforms when sending data through the pipelines

In deployment we utilize test dataloaders for data preprocessing.

First we need to create a test_dl. This involves passing in a list of items we want to perform inference on. To mimic what we just did a moment ago this should be a list of one:

dl = learn.dls.test_dl([fname])

Next we can use the learn.get_preds function and pass in our new DataLoader to generate outputs:

preds = learn.get_preds(dl=dl)[0]

Finally, to extract all the useful information, we can get the softmax’d predictions, the argmax’d class indices, and the decoded labels:

softmax = preds.softmax(dim=1)
argmax = preds.argmax(dim=1)
labels = [learn.dls.vocab[pred] for pred in argmax]
softmax, argmax, labels
(tensor([[0.0261, 0.0265, 0.0260, 0.0632, 0.0261, 0.0260, 0.0262, 0.0260, 0.0263,
          0.0270, 0.0260, 0.0260, 0.0259, 0.0259, 0.0260, 0.0259, 0.0259, 0.0259,
          0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0260,
          0.0260, 0.0259, 0.0259, 0.0259, 0.0259, 0.0259, 0.0260, 0.0260, 0.0259,
          0.0259]]), tensor([3]), ['Bombay'])
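This same pattern scales to any number of items, which is where get_preds shines over looping predict. A minimal sketch, reusing the fnames list from the start of the lesson:

test_dl = learn.dls.test_dl(fnames[:16])  # any number of items
preds = learn.get_preds(dl=test_dl)[0]
labels = [learn.dls.vocab[i] for i in preds.argmax(dim=1)]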

Common Pitfalls

Let’s discuss some common pitfalls and hyper-optimizations we can perform without going outside the realm of fastai too much.

First, get_preds is currently performing much slower than learn.predict:

dl = learn.dls.test_dl([fname])
_ = learn.get_preds(dl=dl)
723 ms ± 6.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
_ = learn.predict(fname)
71.8 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This is because learn.dls.device is set to the cpu so it’s not actually predicting on CUDA!

Let’s switch that:

learn.dls.cuda()
<fastai.data.core.DataLoaders>
dl = learn.dls.test_dl([fname])
_ = learn.get_preds(dl=dl)
649 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But wait, that didn’t actually do a whole lot! Why?

Multiprocessing is another slowdown here. In deployment we don’t really want or need the CPU to do multiprocessing to apply our transforms. It’s wasted resources slowing us down:

dl = learn.dls.test_dl([fname], num_workers=0)
_ = learn.get_preds(dl=dl)
39.2 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

And just like that we’ve reduced the time to 39.2ms!

The other source of wasted time is decoding the predictions to get back the original inputs and more. Most likely you only care about the probabilities, which we’ve already seen how to get.

Using cpu=False

Another way to avoid the slowdown is to pass cpu=False to load_learner, so that it loads the model onto CUDA for you automatically and is ready to use.
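For example:

learn = load_learner("exported_fastai", cpu=False)  # loaded straight onto the GPU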

Takeaway

While this is nice and convenient, it comes at the cost of keeping fastai as a dependency, which is a high risk and not recommended. In the next part of this lesson we will use the model saved with learn.save and show a better way to perform inference without fastai.