Lesson Video:

What is our goal today?

  • Make an image classifier for Cats and Dogs
  • Make our own image classifier
  • Start to understand the DataBlock API

Let's grab the library:

Today we will be using the basics, callback, and vision modules, as we have a vision task:

from fastai.basics import *
from fastai.vision.all import *
from fastai.callback.all import *

Below you will find the exact imports for everything we use today

import numpy as np
from fastcore.xtras import Path # @patch'd properties to the Pathlib module

from fastai.callback.fp16 import to_fp16
from fastai.callback.schedule import fit_one_cycle, lr_find 

from fastai.data.block import CategoryBlock, DataBlock
from fastai.data.external import untar_data, URLs
from fastai.data.transforms import get_image_files, Normalize, RandomSplitter, RegexLabeller

from fastai.interpret import ClassificationInterpretation
from fastai.learner import Learner # imports @patch'd properties to Learner including `save`, `load`, `freeze`, and `unfreeze`

from fastai.metrics import error_rate

from fastai.vision.augment import aug_transforms, RandomResizedCrop
from fastai.vision.core import imagenet_stats
from fastai.vision.data import ImageDataLoaders, ImageBlock
from fastai.vision.learner import cnn_learner

from torchvision.models.resnet import resnet34, resnet50

The overall process for using machine learning models with fastai (sketched in code below):

  1. Make our DataLoaders
  2. Make a Learner with some "equipment"
  3. Train
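As a minimal sketch (using a hypothetical pets DataBlock; we'll build each piece properly below), those three steps look like:

dls = pets.dataloaders(path)                            # 1. make our DataLoaders
learn = cnn_learner(dls, resnet34, metrics=error_rate)  # 2. make a Learner with some "equipment"
learn.fit_one_cycle(4)                                  # 3. train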

Looking at Data

We'll be trying to distinguish between 12 breeds of cats and 25 breeds of dogs (37 breeds in total). Five years ago, the best accuracy was 59%, using separate classifiers for the image, head, and body of the animal. Let's try using just one image for everything.

But before anything, we need data!

If we call help on untar_data we can see its docstring:

help(untar_data)
Help on function untar_data in module fastai.data.external:

untar_data(url, fname=None, dest=None, c_key='data', force_download=False, extract_func=<function tar_extract at 0x7faf97d5a158>)
    Download `url` to `fname` if `dest` doesn't exist, and un-tgz to folder `dest`.

We can also pull up the source code by adding a ?? at the end:

untar_data??

Let's download the PETS dataset

path = untar_data(URLs.PETS)

And set our seed, so our random validation split is reproducible:

np.random.seed(2)

How will our data look?

path.ls()[:3]
(#2) [Path('/root/.fastai/data/oxford-iiit-pet/images'),Path('/root/.fastai/data/oxford-iiit-pet/annotations')]

Let's build a DataLoaders. First we'll need the path to our data, some filenames, and the regex pattern to extract our labels:

path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
pat = r'(.+)_\d+.jpg$'
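To sanity-check that pattern, we can apply it to one filename ourselves (a quick illustration; the exact filename will depend on your download):

import re
fname = fnames[0].name                       # e.g. 'great_pyrenees_107.jpg'
re.match(r'(.+)_\d+.jpg$', fname).group(1)   # -> 'great_pyrenees', our label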

Next we need some basic transforms for getting all of our images to the same size (item_tfms), plus some augmentations and normalization to be done on the GPU (batch_tfms):

item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
bs=64

ImageDataBunch (renamed ImageDataLoaders in fastai v2) is the highest-level API:

dls = ImageDataLoaders.from_name_re(path, fnames, pat, batch_tfms=batch_tfms, 
                                   item_tfms=item_tfms, bs=bs)

What is the DataBlock API?

Let's rebuild using the DataBlock API

We'll need to define what our inputs and outputs should be (an Image and a Category, since this is classification), how to get our items, how to split our data, how to extract our labels, and our augmentations, just as before:

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)
path_im = path/'images'
dls = pets.dataloaders(path_im, bs=bs)

We can take a look at a batch of our images using show_batch, passing in the maximum number of images to show and the figure size we want to view them at:

dls.show_batch(max_n=9, figsize=(6,7))

If we want to see how many classes we have and their names, we can simply call dls.vocab. The first thing shown is the number of classes, followed by their names. You may notice the output looks a bit odd; that's because this L is a new invention of Jeremy and Sylvain. Essentially, it's a Python list taken to the extreme.
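As a quick aside (this snippet isn't from the lesson notebook), here's a small taste of what L can do beyond a plain Python list:

from fastcore.foundation import L
t = L(1, 2, 3, 4)
t                       # (#4) [1,2,3,4] - the repr shows the length up front
t.map(lambda x: x * 2)  # (#4) [2,4,6,8] - handy methods like map are built in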

Previously, if we wanted to grab the index for the name of a class (e.g. our model outputs 0 as our class), we would need to use data.c2i to grab the Class2Index mapping. This mapping is still here; it now lives in dls.vocab.o2i

dls.vocab
(#37) ['Abyssinian','Bengal','Birman','Bombay','British_Shorthair','Egyptian_Mau','Maine_Coon','Persian','Ragdoll','Russian_Blue'...]
dls.vocab.o2i
{'Abyssinian': 0,
 'Bengal': 1,
 'Birman': 2,
 'Bombay': 3,
 'British_Shorthair': 4,
 'Egyptian_Mau': 5,
 'Maine_Coon': 6,
 'Persian': 7,
 'Ragdoll': 8,
 'Russian_Blue': 9,
 'Siamese': 10,
 'Sphynx': 11,
 'american_bulldog': 12,
 'american_pit_bull_terrier': 13,
 'basset_hound': 14,
 'beagle': 15,
 'boxer': 16,
 'chihuahua': 17,
 'english_cocker_spaniel': 18,
 'english_setter': 19,
 'german_shorthaired': 20,
 'great_pyrenees': 21,
 'havanese': 22,
 'japanese_chin': 23,
 'keeshond': 24,
 'leonberger': 25,
 'miniature_pinscher': 26,
 'newfoundland': 27,
 'pomeranian': 28,
 'pug': 29,
 'saint_bernard': 30,
 'samoyed': 31,
 'scottish_terrier': 32,
 'shiba_inu': 33,
 'staffordshire_bull_terrier': 34,
 'wheaten_terrier': 35,
 'yorkshire_terrier': 36}
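So, going from a model's output index to a class name (and back) is just a lookup in that vocab:

dls.vocab[0]                 # 'Abyssinian' - index to class name
dls.vocab.o2i['Abyssinian']  # 0 - class name back to index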

Time to make and train a model!

We will be using a convolutional neural network backbone and a fully connected head with a single hidden layer as our classifier. Don't worry if that's a bunch of nonsense for now. Right now, just know this: we are piggybacking off a pretrained model to help us classify images into 37 categories.

First, we need to make our Neural Network and our Learner like before.

A Learner needs (on a base level):

  • DataLoaders
  • Some architecture
  • An evaluation metric
  • A loss function
  • An optimizer

We'll also use mixed precision (fp16) training:

learn = cnn_learner(dls, resnet34, pretrained=True, metrics=error_rate).to_fp16()
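If you're curious what that built, the model is simply a Sequential of the pretrained body and a new head; we can peek at each piece (output omitted here):

learn.model[0]  # the pretrained resnet34 body (our backbone)
learn.model[1]  # the new fully connected head fastai attached for our 37 classes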

Some assumptions being made in that cnn_learner call (the sketch below shows how to pass them explicitly):

  • The loss function is inferred from the data; for classification this is CrossEntropyLossFlat
  • The optimizer is assumed to be Adam
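If you'd rather not rely on those assumptions, you can pass both in yourself. Here is a sketch of what the defaults roughly correspond to (CrossEntropyLossFlat lives in fastai.losses, Adam in fastai.optimizer):

from fastai.losses import CrossEntropyLossFlat
from fastai.optimizer import Adam

# roughly equivalent to the defaults cnn_learner picks for this task
learn = cnn_learner(dls, resnet34, pretrained=True,
                    loss_func=CrossEntropyLossFlat(),  # classification default
                    opt_func=Adam,                     # default optimizer
                    metrics=error_rate).to_fp16()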

Now, we can train it! We will train for four epochs (four passes through all our data) using the one-cycle policy

learn.fit_one_cycle(4)
epoch train_loss valid_loss error_rate time
0 1.919652 0.358465 0.114344 01:12
1 0.676084 0.240199 0.077808 01:11
2 0.378746 0.214091 0.068336 01:12
3 0.277557 0.208185 0.064953 01:12
learn.save('stage_1')

Let's look at our results

With the model trained, let's look at where it might've messed up. Which breeds did it have trouble differentiating between? So long as the misidentifications are not too crazy, our model is actually working.

Let's plot our losses and make a confusion matrix to visualize this. The check below makes sure that all the pieces we need line up

interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()

len(dls.valid_ds)==len(losses)==len(idxs)
True

plot_top_losses needs the number of images to show and a figure size.

interp.plot_top_losses(9, figsize=(15,10))

plot_confusion_matrix just needs a figure size. dpi adjusts the quality

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

We can also directly grab our most confused pairs with most_confused (a raw, text version of the confusion matrix), passing in a minimum value threshold

interp.most_confused(min_val=3)
[('Ragdoll', 'Birman', 7),
 ('Egyptian_Mau', 'Bengal', 3),
 ('Maine_Coon', 'Ragdoll', 3)]

Unfreezing our model, fine-tuning, and our learning rates

So, we have the model. Let's fine tune it. First, we need to load our model back in.

learn.load('stage_1');

Now we will unfreeze and train more

learn.unfreeze()
learn.fit_one_cycle(4)
epoch train_loss valid_loss error_rate time
0 1.168013 4.194322 0.731394 01:16
1 1.062242 1.088786 0.305142 01:14
2 0.599455 0.481395 0.161705 01:14
3 0.290130 0.382984 0.117727 01:13

Now, when we unfreeze, we unfreeze all the layers. So, to show the difference a proper learning rate makes, let's load those old weights back in and try using lr_find()

learn.load('stage_1');
learn.lr_find()

Alright, if we look here, the loss doesn't start really spiking until around 1e-2, so a good spot is between 1e-6 and 1e-4. Let's use that!

learn.unfreeze()
learn.fit_one_cycle(4, lr_max=slice(1e-6, 1e-4))
epoch train_loss valid_loss error_rate time
0 0.304765 0.138828 0.047361 01:14
1 0.262109 0.130033 0.043302 01:14
2 0.210819 0.133986 0.041949 01:14
3 0.196996 0.129849 0.040595 01:15

We can see that picking a proper learning rate can help speed things up!
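As an aside, slice(1e-6, 1e-4) gives the earliest layers the lowest rate and the final layers the highest, spread log-uniformly across the parameter groups. Roughly (a sketch, assuming the three parameter groups cnn_learner creates by default):

import numpy as np
np.geomspace(1e-6, 1e-4, num=3)  # array([1.e-06, 1.e-05, 1.e-04]) - one rate per group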

learn.save('stage_2')

Now let's try with a resnet50!

If you need to restart your kernel due to memory errors, I've attached the dls code below

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)

dls = pets.dataloaders(untar_data(URLs.PETS)/"images", bs=32)

Go ahead and try running the code below yourself. You should see a further increase in accuracy!

Steps to try:

  1. Create your Learner
  2. Find a learning rate
  3. Fit for 5 epochs
  4. Unfreeze and fit for two more
 

The answer is hidden under here

learn = cnn_learner(dls, resnet50, pretrained=True, metrics=error_rate)
learn.lr_find()
learn.fit_one_cycle(5, lr_max=slice(3e-4, 3e-3))
learn.save('resnet50')
learn.load('resnet50')
learn.unfreeze()
learn.fit_one_cycle(2, lr_max=4e-4)
learn.save('resnet50')