from fastai.vision.all import *
Introduction
So far we've seen applications where you have a single label per image and multiple labels per image. This lesson will be a mixture of the two as we try to address a real-world problem:
- If a machine learning model has n outputs and is designed so that there will always be exactly one correct answer, what do you do if an image has no right answer?
Let's imagine a scenario where we trained a dog and cat classifier. What would happen if we gave it a picture of a bird? The raw probabilities out of the model would still sum to 1 for dog or cat, and if we took the argmax it would return either dog or cat, even though it's neither! So, what can we do?
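To see the problem concretely, here's a tiny sketch (with made-up numbers, not tied to any real trained model) of why a two-output softmax model can never answer "neither":

import torch
logits = torch.tensor([0.3, -0.2])  # hypothetical raw outputs for (cat, dog) on a bird picture
probs = logits.softmax(dim=-1)      # tensor([0.6225, 0.3775]) -- the probabilities always sum to 1
probs.argmax()                      # 0, so we would confidently answer "cat" anyway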
Fake it til you make it: turn single-label into multi-label
The answer is to use multi-label classification. The way it worked in our previous lesson is that BCE would find and return the probability that each of the n labels was present, and we converted this into a boolean tensor over those n classes. This also means that we could have a tensor of size n where every answer is False, meaning no recognizable classes are present!
This is exactly what we will perform today.
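The core trick, in a rough sketch (again with made-up numbers), is that each class gets its own independent sigmoid probability, so after thresholding it's perfectly possible for every class to come back False:

import torch
logits = torch.tensor([-2.1, -1.7, -3.0])  # hypothetical outputs for 3 classes
probs = torch.sigmoid(logits)              # each lives between 0 and 1; they do not need to sum to 1
probs > 0.5                                # tensor([False, False, False]) -> "none of the above"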
First let's import fastai and download the PETs dataset again:
Below are the exact imports for everything we are using today:
from functools import partial
import torch
from torch import tensor
from torchvision.models.resnet import resnet34
import requests
import pandas as pd
from fastcore.transform import Pipeline
from fastcore.xtras import Path # @patched pathlib.Path
from fastai.data.core import Datasets
from fastai.data.block import CategoryBlock, DataBlock, MultiCategoryBlock
from fastai.data.external import URLs, untar_data
from fastai.data.transforms import (
    ColReader,
    IntToFloatTensor,
    MultiCategorize,
    Normalize,
    OneHotEncode,
    RandomSplitter,
    RegexLabeller,
    ToTensor,
    get_image_files
)
from fastai.metrics import accuracy_multi
from fastai.vision.augment import aug_transforms, RandomResizedCrop
from fastai.vision.core import PILImage, imagenet_stats
from fastai.vision.data import ImageBlock
from fastai.vision.learner import vision_learner
from fastai.learner import Learner
from fastai.callback.schedule import Learner # @patched Learner functions like lr_find and fit_one_cycle
path = untar_data(URLs.PETS)/'images'
Next we'll bring back the code we originally used to create our DataBlock:
fnames = get_image_files(path)
pat = r'(.+)_\d+.jpg$'
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
bs = 64

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)
Currently it's set up for single-label classification again. So how do we turn this into multi-label? We instead use the MultiCategoryBlock and add a simple function that converts our label into a list of labels (or a list of a single label) for us to use:
def label_to_list(o): return [o]
Then we'll pass this along into our new DataBlock:
multi_pets = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=Pipeline(
        [RegexLabeller(pat = r'/([^/]+)_\d+.*'), label_to_list]
    ),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
When using the DataBlock API, if you have a sequence of transforms that get_y should follow, they must be wrapped in a Pipeline class to work.
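If you're curious what that Pipeline actually does, it simply composes the two steps in order, filename -> class name -> one-element list. Here's a quick check on a hypothetical filename:

label_pipe = Pipeline([RegexLabeller(pat = r'/([^/]+)_\d+.*'), label_to_list])
label_pipe('images/great_pyrenees_112.jpg')  # should give ['great_pyrenees']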
And we can create some new DataLoaders:
dls = multi_pets.dataloaders(path, bs=32)
dls.show_batch()
Now overall these don't look that different, right? This is because each label is still a single label. What we've really changed here is how vision_learner will read in our DataLoaders object and understand what to do with the outputs.
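One quick way to see what did change under the hood is to peek at a batch: the targets are now one-hot encoded rows with one column per class, rather than a single class index. Because of this, vision_learner will pick a binary cross-entropy loss (BCEWithLogitsLossFlat) rather than regular cross-entropy:

xb, yb = dls.one_batch()
xb.shape, yb.shape  # e.g. (torch.Size([32, 3, 224, 224]), torch.Size([32, 37]))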
Let's recreate what we've just done here as Datasets as well to understand what the API change would be:
train_idxs, valid_idxs = RandomSplitter()(get_image_files(path))
tfms = [
    [PILImage.create],
    [
        RegexLabeller(pat = r'/([^/]+)_\d+.*'),
        label_to_list,
        MultiCategorize(vocab=list(dls.vocab)),
        OneHotEncode(len(dls.vocab))
    ]
]
dsets = Datasets(get_image_files(path), tfms=tfms, splits=[train_idxs, valid_idxs])
dsets[0]
(PILImage mode=RGB size=500x333,
TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.]))
Notice here that there is only one index that contains a 1 value in our label, which corresponds to our original label after it's been encoded.
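As a quick sanity check, we can map that single "hot" index back through the vocab to recover the breed name for this file:

_, lbl = dsets[0]
list(dls.vocab)[int(lbl.argmax())]  # the decoded breed for the first image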
Then we'll create some DataLoaders:
dls = dsets.dataloaders(
    after_item=[ToTensor(), RandomResizedCrop(460, min_scale=.75)],
    after_batch=[IntToFloatTensor(), *aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)],
    bs=32
)
And we can verify it still looks the same:
dls.show_batch()
Great! Now we can train our model
Train the model!
Next we'll train the model, deviating ever so slightly from the previous lesson's advice, for an important reason:
learn = vision_learner(dls, resnet34, metrics=[partial(accuracy_multi, thresh=0.95)])
Why do we suddenly use a different thresh?
I just told you that we should make sure the metric's and the loss function's thresholds align, yet here the loss function's threshold is 0.5 and the metric's is 0.95. Why? This is because when we deploy the model we will make sure it's set to 0.95, but during training we don't want to bias the model towards extreme predictions.
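As a toy illustration (made-up probabilities, no real model involved), the same per-class outputs can count as a confident prediction at 0.5 but as "nothing confident enough" at 0.95:

import torch
probs = torch.tensor([0.70, 0.10, 0.03])  # hypothetical sigmoid outputs for 3 classes
probs > 0.5                               # tensor([ True, False, False]) -> class 0 is predicted
probs > 0.95                              # tensor([False, False, False]) -> no prediction at all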
learn.fine_tune(4, 2e-3)
epoch | train_loss | valid_loss | accuracy_multi | time |
---|---|---|---|---|
0 | 0.412115 | 0.069953 | 0.973741 | 00:35 |
epoch | train_loss | valid_loss | accuracy_multi | time |
---|---|---|---|---|
0 | 0.052030 | 0.024630 | 0.981823 | 00:41 |
1 | 0.025953 | 0.014660 | 0.989120 | 00:41 |
2 | 0.016864 | 0.011797 | 0.991241 | 00:41 |
3 | 0.010502 | 0.010177 | 0.992301 | 00:41 |
Model Evaluation
All that's left is to make sure our model can tell that a donkey is neither a dog nor a cat!
To do so (and so our exported Learner knows what we're doing), let's manually change the loss function's threshold to be what we want:
learn.loss_func.thresh = 0.95
Then we can try predicting on an image from one of our classes:
= "https://azure.wgp-cdn.co.uk/app-yourcat/posts/iStock-174776419-1.jpg" PERSIAN_CAT_URL
response = requests.get(PERSIAN_CAT_URL)
im = PILImage.create(response.content)
im.show()
learn.predict(im)[0]
(#1) ['Persian']
We can see it only returned one label. What happens if we try a donkey?
I chose a donkey here because it's an animal very unlike a cat or a dog, but try your own image! To learn more about this sort of idea, look up the "Hotdog, not hotdog" problem.
= "https://cdn.britannica.com/68/143568-050-5246474F/Donkey.jpg"
DONKEY_URL = requests.get(DONKEY_URL)
response 0] learn.predict(response.content)[
(#0) []
= "https://cdn.britannica.com/68/143568-050-5246474F/Donkey.jpg"
DONKEY_URL = requests.get(DONKEY_URL)
response 0] learn.predict(response.content)[
(#0) []
= requests.get(DONKEY_URL)
response 0] learn.predict(response.content)[
predict knows how to work with a large variety of data, including the raw bytes that get returned from requests.
We successfully didn’t find a cat or dog!
You can follow the same steps as shown in the previous notebook from this lesson to get these predictions without the use of fastai; it's still the exact same pipeline!
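For example, here's a rough sketch (not this lesson's exact code) of that same deployment logic in plain PyTorch and torchvision; the model, vocab, and 0.95 threshold are assumed to be pulled out of the Learner we just trained:

import torch
from PIL import Image
from torchvision import transforms

IMAGENET_MEAN, IMAGENET_STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

def predict_raw(img_path, model, vocab, thresh=0.95):
    "Return every class whose sigmoid probability clears `thresh` (possibly none at all)."
    img = Image.open(img_path).convert('RGB')
    batch = preprocess(img).unsqueeze(0)        # add a batch dimension
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(batch))[0]  # independent per-class probabilities
    return [vocab[i] for i, p in enumerate(probs) if p > thresh]

# e.g. predict_raw('donkey.jpg', learn.model.cpu(), list(learn.dls.vocab)) should give []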