from fastai.vision.all import *
Introduction
So far we’ve seen applications where you have a single label per image and where you have multiple labels per image. This lesson will be a mixture of the two as we try to address a real-world problem:
- If a machine learning model has *n* outputs and is designed on the assumption that there will always be one correct answer, what do you do if an image has no right answer?
Let’s imagine a scenario where we trained a dog and cat classifier. What would happen if we gave it a picture of a bird? The raw probabilities out of the model would still sum to 1 across dog and cat, and if we took the argmax it would return either dog or cat, even though the image is neither! So, what can we do?
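As a toy illustration (the numbers here are made up, not real model outputs), a softmax over two classes always produces probabilities that sum to 1, so argmax is forced to pick one of them:

import torch

logits = torch.tensor([0.2, -0.4]) # pretend "dog" / "cat" outputs for a bird image
probs = logits.softmax(dim=0)      # tensor([0.6457, 0.3543]) -> still sums to 1
probs.argmax()                     # tensor(0) -> "dog", even though the image is neither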
Fake it til you make it, turn single label to multilabel
The answer is to use multi-label classification. The way it worked in our previous lesson is that BCE gave us an independent probability for each of the *n* labels being present, and we thresholded those probabilities into a boolean tensor of length *n*. This also means that we could have a tensor of size *n* where every answer is False, meaning no recognizable classes are there!
This is exactly what we will perform today.
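Contrast the softmax example above with the multi-label setup (again with made-up numbers): every class gets its own independent sigmoid probability, so all of them can fall below the threshold at once:

import torch

logits = torch.tensor([-2.0, -1.5]) # pretend "dog" / "cat" outputs for the same bird image
probs = logits.sigmoid()            # tensor([0.1192, 0.1824])
probs > 0.5                         # tensor([False, False]) -> no recognizable class present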
First let’s import fastai and download the PETs dataset again:
Below are the exact imports from what we are using today:
from functools import partial
import torch
from torch import tensor
from torchvision.models.resnet import resnet34
import requests
import pandas as pd
from fastcore.transform import Pipeline
from fastcore.xtras import Path # @patched Pathlib.path
from fastai.data.core import Datasets
from fastai.data.block import CategoryBlock, DataBlock, ImageBlock, MultiCategoryBlock
from fastai.data.external import URLs, untar_data
from fastai.data.transforms import (
ColReader,
IntToFloatTensor,
MultiCategorize,
Normalize,
OneHotEncode,
RandomSplitter,
RegexLabeller,
get_image_files
)
from fastai.metrics import accuracy_multi
from fastai.vision.augment import aug_transforms, RandomResizedCrop
from fastai.vision.core import PILImage
from fastai.vision.learner import vision_learner
from fastai.learner import Learner
from fastai.callback.schedule import Learner # @patched Learner functions like lr_find and fit_one_cycle

path = untar_data(URLs.PETS)/'images'

Next we’ll bring back the code we used originally to create our DataBlock:
fnames = get_image_files(path)
pat = r'(.+)_\d+.jpg$'
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
bs=64

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(),
get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
item_tfms=item_tfms,
batch_tfms=batch_tfms)

Now, currently this is set up for single-label classification again. So how do we turn this into multi-label? We instead use the MultiCategoryBlock and add in a simple function that converts our label into a list of labels (or a list of a single label) for us to use:
def label_to_list(o): return [o]

Then we’ll pass this along into our new DataBlock:
multi_pets = DataBlock(
blocks=(ImageBlock, MultiCategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(),
get_y=Pipeline(
[RegexLabeller(pat = r'/([^/]+)_\d+.*'), label_to_list]
),
item_tfms=item_tfms,
batch_tfms=batch_tfms
)

When using the DataBlock API, if get_y needs to apply a sequence of transforms, they must be wrapped in a Pipeline class to work.
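To see what that Pipeline does on its own, here’s a quick sketch (the filename is made up) composing the RegexLabeller with label_to_list outside of the DataBlock:

labeller = Pipeline([RegexLabeller(pat = r'/([^/]+)_\d+.*'), label_to_list])
labeller('images/great_pyrenees_112.jpg') # -> ['great_pyrenees']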
And we can create some new DataLoaders:
dls = multi_pets.dataloaders(path, bs=32)

dls.show_batch()
Now overall these don’t look that different, right? This is because each label is still a single label. What we’ve really changed here is how vision_learner will read in our DataLoaders object and understand what to do with the outputs.
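One quick way to peek at that (an optional sanity check; the exact repr may vary between fastai versions) is the default loss function fastai attaches to the underlying datasets. With a MultiCategoryBlock it should be a BCE-with-logits style loss rather than plain cross entropy:

dls.train_ds.loss_func # expected: FlattenedLoss of BCEWithLogitsLoss()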
Let’s recreate what we’ve just done here as Datasets as well to understand what the API change would be:
train_idxs, valid_idxs = RandomSplitter()(get_image_files(path))

tfms = [
[PILImage.create],
[
RegexLabeller(pat = r'/([^/]+)_\d+.*'),
label_to_list,
MultiCategorize(vocab=list(dls.vocab)),
OneHotEncode(len(dls.vocab))
]
]

dsets = Datasets(get_image_files(path), tfms=tfms, splits=[train_idxs, valid_idxs])

dsets[0]

(PILImage mode=RGB size=500x333,
TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.]))
Notice here that there is only one index that contains a 1 value in our label, which corresponds to our original label after it has been encoded.
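If you want to double-check that mapping yourself, a small sketch like this (using the vocab we built above) turns the one-hot tensor back into its class name:

x, y = dsets[0]
[dls.vocab[i] for i, flag in enumerate(y) if flag == 1.] # -> a list with this sample's single breed name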
Then we’ll create some DataLoaders:
dls = dsets.dataloaders(
after_item=[ToTensor(), RandomResizedCrop(460, min_scale=.75)],
after_batch=[IntToFloatTensor(), *aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)],
bs=32
)

And we can verify it still looks the same:
dls.show_batch()
Great! Now we can train our model.
Train the model!
Next we’ll train the model, deviating ever so slightly from the previous lesson’s advice, for an important reason:
learn = vision_learner(dls, resnet34, metrics=[partial(accuracy_multi, thresh=0.95)])

Why do we suddenly use a different thresh?
I just told you that the metric and loss function thresholds should align, yet here the loss function’s threshold is 0.5 while the metric’s is 0.95. Why? Because when we deploy the model we will make sure the threshold is set to 0.95, but during training we don’t want to bias the model towards extreme predictions.
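To make that concrete, here’s a toy example (made-up numbers, not real model outputs) of how the threshold changes what counts as a “present” class once we take the sigmoid of the raw outputs:

raw_preds = tensor([[2.0, -1.0, 0.3]]) # pretend outputs for three classes
probs = raw_preds.sigmoid()            # tensor([[0.8808, 0.2689, 0.5744]])
probs > 0.5                            # tensor([[ True, False,  True]])
probs > 0.95                           # tensor([[False, False, False]])

At 0.5 two classes would count as present, while at 0.95 only extremely confident predictions survive, which is the behavior we want at deployment.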
learn.fine_tune(4, 2e-3)

| epoch | train_loss | valid_loss | accuracy_multi | time |
|---|---|---|---|---|
| 0 | 0.412115 | 0.069953 | 0.973741 | 00:35 |
| epoch | train_loss | valid_loss | accuracy_multi | time |
|---|---|---|---|---|
| 0 | 0.052030 | 0.024630 | 0.981823 | 00:41 |
| 1 | 0.025953 | 0.014660 | 0.989120 | 00:41 |
| 2 | 0.016864 | 0.011797 | 0.991241 | 00:41 |
| 3 | 0.010502 | 0.010177 | 0.992301 | 00:41 |
Model Evaluation
All that’s left is to make sure our model doesn’t mistake a donkey for one of our dogs or cats!
To do so (and so our exported Learner knows what we’re doing), let’s manually change the loss function’s threshold to be what we want:
learn.loss_func.thresh = 0.95

Then we can try predicting on an image from one of our classes:
PERSIAN_CAT_URL = "https://azure.wgp-cdn.co.uk/app-yourcat/posts/iStock-174776419-1.jpg"
response = requests.get(PERSIAN_CAT_URL)
im = PILImage.create(response.content)
im.show();
learn.predict(im)[0]

(#1) ['Persian']
We can see it only returned one label. What happens if we try a donkey?
I chose a donkey here because it’s an animal very unlike a cat or a dog, but try your own image! To learn more about this sort of idea, look up the “Hotdog, not hotdog” problem.
DONKEY_URL = "https://cdn.britannica.com/68/143568-050-5246474F/Donkey.jpg"
response = requests.get(DONKEY_URL)
learn.predict(response.content)[0]

(#0) []
predict knows how to work with a large variety of data, including the raw bytes that get returned from requests.
We successfully didn’t find a cat or dog!
You can follow the same steps shown in the previous notebook from this lesson to get these predictions without the use of fastai; it still has the exact same pipeline!
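As a rough sketch of what that looks like (this is not the exact code from that notebook; the local filename is hypothetical and the preprocessing is a simplified stand-in for fastai’s transforms), we can run the trained model with plain PyTorch and torchvision, apply a sigmoid, and keep only the classes above our 0.95 threshold:

import torch
from PIL import Image
from torchvision import transforms

tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), # imagenet stats
])

img = Image.open("donkey.jpg").convert("RGB") # hypothetical local file
batch = tfms(img).unsqueeze(0)                # add a batch dimension

model = learn.model.eval().cpu()              # reuse the model we just trained
with torch.no_grad():
    probs = torch.sigmoid(model(batch))[0]

[dls.vocab[i] for i, p in enumerate(probs) if p > 0.95] # -> [] when nothing clears the threshold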