from fastai.vision.all import *
Introduction
So far we've seen applications where you have a single label per image and multiple labels per image. This lesson will be a mixture of the two as we try to address a real-world problem:
- If a machine learning model has n outputs and is designed so that there will always be exactly one correct answer, what do you do if an image has no right answer?
Let's imagine a scenario where we trained a dog and cat classifier. What would happen if we gave it a picture of a bird? The raw probabilities out of the model would still sum to 1 for dog or cat, and if we took the argmax it would return either dog or cat, even though it's neither! So, what can we do?
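To see the problem concretely, here's a tiny sketch (with made-up numbers, not tied to any real trained model) of why a two-output softmax model can never answer "neither":

import torch
logits = torch.tensor([0.3, -0.2])  # hypothetical raw outputs for (cat, dog) on a bird picture
probs = logits.softmax(dim=-1)      # tensor([0.6225, 0.3775]) -- the probabilities always sum to 1
probs.argmax()                      # 0, so we would confidently answer "cat" anyway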
Fake it til you make it: turn single-label into multi-label
The answer is to use multi-label classification. The way it worked in our previous lesson is that BCE would find and return the probability that each of the n labels was present, and we converted this into a boolean tensor over those n classes. This also means that we could have a tensor of size n where every answer is False, meaning no recognizable classes are present!
This is exactly what we will perform today.
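The core trick, in a rough sketch (again with made-up numbers), is that each class gets its own independent sigmoid probability, so after thresholding it's perfectly possible for every class to come back False:

import torch
logits = torch.tensor([-2.1, -1.7, -3.0])  # hypothetical outputs for 3 classes
probs = torch.sigmoid(logits)              # each lives between 0 and 1; they do not need to sum to 1
probs > 0.5                                # tensor([False, False, False]) -> "none of the above"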
First let's import fastai and download the PETs dataset again:
Below are the exact imports for everything we are using today:
from functools import partial
import torch
from torch import tensor
from torchvision.models.resnet import resnet34
import requests
import pandas as pd
from fastcore.transform import Pipeline
from fastcore.xtras import Path # @patched pathlib.Path
from fastai.data.core import Datasets
from fastai.data.block import CategoryBlock, DataBlock, MultiCategoryBlock
from fastai.data.external import URLs, untar_data
from fastai.data.transforms import (
    ColReader,
    IntToFloatTensor,
    MultiCategorize,
    Normalize,
    OneHotEncode,
    RandomSplitter,
    RegexLabeller,
    ToTensor,
    get_image_files
)
from fastai.metrics import accuracy_multi
from fastai.vision.augment import aug_transforms, RandomResizedCrop
from fastai.vision.core import PILImage, imagenet_stats
from fastai.vision.data import ImageBlock
from fastai.vision.learner import vision_learner
from fastai.learner import Learner
from fastai.callback.schedule import Learner # @patched Learner functions like lr_find and fit_one_cycle
path = untar_data(URLs.PETS)/'images'
Next we'll bring back the code we originally used to create our DataBlock:
fnames = get_image_files(path)
pat = r'(.+)_\d+.jpg$'
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
bs = 64

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)
Currently it's set up for single-label classification again. So how do we turn this into multi-label? We instead use the MultiCategoryBlock and add a simple function that converts our label into a list of labels (or a list of a single label) for us to use:
def label_to_list(o): return [o]
Then we'll pass this along into our new DataBlock:
multi_pets = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=Pipeline(
        [RegexLabeller(pat = r'/([^/]+)_\d+.*'), label_to_list]
    ),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
When using the DataBlock API, if you have a sequence of transforms that get_y should follow, they must be wrapped in a Pipeline class to work.
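If you're curious what that Pipeline actually does, it simply composes the two steps in order, filename -> class name -> one-element list. Here's a quick check on a hypothetical filename:

label_pipe = Pipeline([RegexLabeller(pat = r'/([^/]+)_\d+.*'), label_to_list])
label_pipe('images/great_pyrenees_112.jpg')  # should give ['great_pyrenees']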
And we can create some new DataLoaders:
dls = multi_pets.dataloaders(path, bs=32)
dls.show_batch()
Now overall these don't look that different, right? This is because each label is still a single label. What we've really changed here is how vision_learner will read in our DataLoaders object and understand what to do with the outputs.
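One quick way to see what did change under the hood is to peek at a batch: the targets are now one-hot encoded rows with one column per class, rather than a single class index. Because of this, vision_learner will pick a binary cross-entropy loss (BCEWithLogitsLossFlat) rather than regular cross-entropy:

xb, yb = dls.one_batch()
xb.shape, yb.shape  # e.g. (torch.Size([32, 3, 224, 224]), torch.Size([32, 37]))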
Let's recreate what we've just done here as Datasets as well to understand what the API change would be:
train_idxs, valid_idxs = RandomSplitter()(get_image_files(path))
tfms = [
    [PILImage.create],
    [
        RegexLabeller(pat = r'/([^/]+)_\d+.*'),
        label_to_list,
        MultiCategorize(vocab=list(dls.vocab)),
        OneHotEncode(len(dls.vocab))
    ]
]
dsets = Datasets(get_image_files(path), tfms=tfms, splits=[train_idxs, valid_idxs])
dsets[0]
(PILImage mode=RGB size=500x333,
TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.]))
Notice here that there is only one index that contains a 1 value in our label, which corresponds to our original label after it's been encoded.
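As a quick sanity check, we can map that single "hot" index back through the vocab to recover the breed name for this file:

_, lbl = dsets[0]
list(dls.vocab)[int(lbl.argmax())]  # the decoded breed for the first image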
Then we'll create some DataLoaders:
dls = dsets.dataloaders(
    after_item=[ToTensor(), RandomResizedCrop(460, min_scale=.75)],
    after_batch=[IntToFloatTensor(), *aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)],
    bs=32
)
And we can verify it still looks the same:
dls.show_batch()
Great! Now we can train our model
Train the model!
Next we'll train the model, deviating ever so slightly from the previous lesson's advice, for an important reason:
learn = vision_learner(dls, resnet34, metrics=[partial(accuracy_multi, thresh=0.95)])
Why do we suddenly use a different thresh?
I just told you that we should make sure the metric's and the loss function's thresholds align, yet here the loss function's threshold is 0.5 and the metric's is 0.95. Why? This is because when we deploy the model we will make sure it's set to 0.95, but during training we don't want to bias the model towards extreme predictions.
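As a toy illustration (made-up probabilities, no real model involved), the same per-class outputs can count as a confident prediction at 0.5 but as "nothing confident enough" at 0.95:

import torch
probs = torch.tensor([0.70, 0.10, 0.03])  # hypothetical sigmoid outputs for 3 classes
probs > 0.5                               # tensor([ True, False, False]) -> class 0 is predicted
probs > 0.95                              # tensor([False, False, False]) -> no prediction at all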
learn.fine_tune(4, 2e-3)
epoch | train_loss | valid_loss | accuracy_multi | time |
---|---|---|---|---|
0 | 0.412115 | 0.069953 | 0.973741 | 00:35 |
epoch | train_loss | valid_loss | accuracy_multi | time |
---|---|---|---|---|
0 | 0.052030 | 0.024630 | 0.981823 | 00:41 |
1 | 0.025953 | 0.014660 | 0.989120 | 00:41 |
2 | 0.016864 | 0.011797 | 0.991241 | 00:41 |
3 | 0.010502 | 0.010177 | 0.992301 | 00:41 |
Model Evaluation
All that's left is to make sure our model can tell that a donkey is neither a dog nor a cat!
To do so (and so our exported Learner knows what we're doing), let's manually change the loss function's threshold to be what we want:
learn.loss_func.thresh = 0.95
Then we can try predicting on an image from one of our classes:
= "https://azure.wgp-cdn.co.uk/app-yourcat/posts/iStock-174776419-1.jpg" PERSIAN_CAT_URL
response = requests.get(PERSIAN_CAT_URL)
im = PILImage.create(response.content)
im.show()
learn.predict(im)[0]
(#1) ['Persian']
We can see it only returned one label. What happens if we try a donkey?
I chose a donkey here because it's an animal very unlike a cat or a dog, but try your own image! To learn more about this sort of idea, look up the "Hotdog, not hotdog" problem.
= "https://cdn.britannica.com/68/143568-050-5246474F/Donkey.jpg"
DONKEY_URL = requests.get(DONKEY_URL)
response 0] learn.predict(response.content)[
(#0) []
= "https://cdn.britannica.com/68/143568-050-5246474F/Donkey.jpg"
DONKEY_URL = requests.get(DONKEY_URL)
response 0] learn.predict(response.content)[
(#0) []
= requests.get(DONKEY_URL)
response 0] learn.predict(response.content)[
predict knows how to work with a large variety of data, including the raw bytes that get returned from requests.
We successfully didn’t find a cat or dog!
You can follow the same steps as shown in the previous notebook from this lesson to get these predictions without the use of fastai; it's still the exact same pipeline!
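For example, here's a rough sketch (not this lesson's exact code) of that same deployment logic in plain PyTorch and torchvision; the model, vocab, and 0.95 threshold are assumed to be pulled out of the Learner we just trained:

import torch
from PIL import Image
from torchvision import transforms

IMAGENET_MEAN, IMAGENET_STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

def predict_raw(img_path, model, vocab, thresh=0.95):
    "Return every class whose sigmoid probability clears `thresh` (possibly none at all)."
    img = Image.open(img_path).convert('RGB')
    batch = preprocess(img).unsqueeze(0)        # add a batch dimension
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(batch))[0]  # independent per-class probabilities
    return [vocab[i] for i, p in enumerate(probs) if p > thresh]

# e.g. predict_raw('donkey.jpg', learn.model.cpu(), list(learn.dls.vocab)) should give []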