What is our goal today?

  • Make an image classifier for Cats and Dogs
  • Make our own image classifier
  • Start to understand the DataBlock API

Let's grab the library:

Today we will be using the basics, callback, and vision libraries, since we have a vision task:

from fastai.basics import *
from fastai.vision.all import *
from fastai.callback.all import *

Below you will find the exact imports for everything we use today

import numpy as np
from fastcore.xtras import Path # @patch'd properties to the Pathlib module

from fastai.callback.fp16 import to_fp16
from fastai.callback.schedule import fit_one_cycle, lr_find 

from fastai.data.block import CategoryBlock, DataBlock
from fastai.data.external import untar_data, URLs
from fastai.data.transforms import get_image_files, Normalize, RandomSplitter, RegexLabeller

from fastai.interpret import ClassificationInterpretation
from fastai.learner import Learner # imports @patch'd properties to Learner including `save`, `load`, `freeze`, and `unfreeze`

from fastai.metrics import error_rate

from fastai.vision.augment import aug_transforms, RandomResizedCrop
from fastai.vision.core import imagenet_stats
from fastai.vision.data import ImageDataLoaders, ImageBlock
from fastai.vision.learner import cnn_learner

from torchvision.models.resnet import resnet34, resnet50

Overall process using machine learning models and fastai:

  1. Make our DataLoaders
  2. Make a Learner with some "equipment"
  3. Train

Looking at Data

We'll be trying to distinguish between 12 breeds of cats and 25 breeds of dogs (37 in total). Five years ago, the best was 59% with separate classifications for the image, head, and body of the animal. Let's try just doing one image for everything.

But before anything, we need data!

If we call help on untar_data, we can see its doc description:

Help on function untar_data in module fastai.data.external:

untar_data(url, fname=None, dest=None, c_key='data', force_download=False, extract_func=<function tar_extract at 0x7faf97d5a158>)
    Download `url` to `fname` if `dest` doesn't exist, and un-tgz to folder `dest`.

We can also pull up the source code in a notebook by adding a ?? at the end:

untar_data??

Let's download the PETS dataset

path = untar_data(URLs.PETS)

And set our seed
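
Seeding matters because the train/validation split is random, so without a fixed seed two runs would train on different images. Here is a minimal sketch of the idea using only Python's standard library. The seed value 42 and the helper name shuffled_indices are assumptions for illustration; they are not the lesson's own seed call.

```python
import random

def shuffled_indices(n, seed):
    """Return the indices 0..n-1 in a pseudo-random order fixed by `seed`."""
    rng = random.Random(seed)   # independent generator, global state is untouched
    idxs = list(range(n))
    rng.shuffle(idxs)
    return idxs

first = shuffled_indices(10, seed=42)   # 42 is an arbitrary, assumed seed
second = shuffled_indices(10, seed=42)
print(first == second)  # same seed, same order
```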


How will our data look?

path.ls()

(#2) [Path('/root/.fastai/data/oxford-iiit-pet/images'),Path('/root/.fastai/data/oxford-iiit-pet/annotations')]

Let's build a DataLoaders. First we'll need the path to our data, some filenames, and the regex pattern to extract our labels:

path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
pat = r'(.+)_\d+\.jpg$'  # breed name, underscore, image number, literal .jpg
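
To see what this pattern captures, here is a small sketch using Python's re module directly, with the dot before jpg escaped so it matches a literal period. The helper label_from_name and the file names are made up for illustration, in the dataset's naming style.

```python
import re

pat = r'(.+)_\d+\.jpg$'  # breed name, underscore, image number, .jpg

def label_from_name(fname):
    """Extract the breed label from a file name such as 'great_pyrenees_173.jpg'."""
    match = re.match(pat, fname)
    if match is None:
        raise ValueError(f"{fname!r} does not match the pattern")
    return match.group(1)

# Hypothetical file names
print(label_from_name('great_pyrenees_173.jpg'))  # great_pyrenees
print(label_from_name('Abyssinian_1.jpg'))        # Abyssinian
```

Because (.+) is greedy, breed names that themselves contain underscores are kept whole; only the final underscore and number are stripped.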

Next, some basic transforms to get all of our images to the same size (item_tfms), plus some augmentations and normalization to be done on the GPU (batch_tfms):

item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
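
Normalizing with imagenet_stats means subtracting a per-channel mean and dividing by a per-channel standard deviation. The sketch below shows that arithmetic on a single RGB pixel in plain Python; the helper normalize_pixel is hypothetical, and the real transform works on whole batches of tensors.

```python
# ImageNet per-channel (R, G, B) mean and standard deviation, for pixel values in [0, 1]
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=imagenet_mean, std=imagenet_std):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

print(normalize_pixel([0.485, 0.456, 0.406]))  # a pixel equal to the mean maps to [0.0, 0.0, 0.0]
print(normalize_pixel([1.0, 1.0, 1.0]))        # a pure white pixel
```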

ImageDataBunch (renamed to ImageDataLoaders) - highest level API

bs = 64  # batch size (an assumed value; the lesson's own setting is not shown above)
dls = ImageDataLoaders.from_name_re(path, fnames, pat, batch_tfms=batch_tfms, 
                                   item_tfms=item_tfms, bs=bs)

What is the API?

Let's rebuild using the DataBlock API

We'll need to define what our inputs and outputs should be (an image and a category, for classification), how to get our items, how to split our data, how to extract our labels, and our augmentations as before:

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)
path_im = path/'images'
dls = pets.dataloaders(path_im, bs=bs)
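
To make the "how to split" and "how to label" steps less magical, here is a rough plain-Python sketch of what they do conceptually. It is an illustration under assumptions, not fastai's implementation: random_split, label_from_path, and the file paths are all hypothetical.

```python
import random
import re

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle indices and hold out `valid_pct` of them for validation."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * len(items))
    return idxs[cut:], idxs[:cut]   # (train indices, validation indices)

def label_from_path(path, pat=r'/([^/]+)_\d+.*'):
    """Pull the breed name out of a full path using the get_y pattern."""
    return re.search(pat, str(path)).group(1)

# Hypothetical file paths standing in for the list of image files
items = [f'/data/images/beagle_{i}.jpg' for i in range(1, 6)] + \
        [f'/data/images/Bombay_{i}.jpg' for i in range(1, 6)]

train_idxs, valid_idxs = random_split(items, valid_pct=0.2, seed=0)
print(len(train_idxs), len(valid_idxs))  # 8 2
print(label_from_path(items[0]))         # beagle
```

Note that this pattern runs against the full path rather than just the file name, which is why it starts with a slash.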

We can take a look at a batch of our images using show_batch, passing in a maximum number of images to show and how large we want the figure to be:

dls.show_batch(max_n=9, figsize=(6,7))

If we want to see how many classes we have and what they are called, we can simply look at dls.vocab. The output shows the number of classes first, then their names. You may notice this looks a bit odd; that's because this L is a new invention of Jeremy and Sylvain. Essentially, it's a Python list taken to the extreme.

Previously, if we wanted to grab the index for the name of a class (e.g. our model output 0 as our class), we would need to use data.c2i to grab the Class2Index mapping. This is still here; it now lives in dls.vocab.o2i.

(#37) ['Abyssinian','Bengal','Birman','Bombay','British_Shorthair','Egyptian_Mau','Maine_Coon','Persian','Ragdoll','Russian_Blue'...]
{'Abyssinian': 0,
 'Bengal': 1,
 'Birman': 2,
 'Bombay': 3,
 'British_Shorthair': 4,
 'Egyptian_Mau': 5,
 'Maine_Coon': 6,
 'Persian': 7,
 'Ragdoll': 8,
 'Russian_Blue': 9,
 'Siamese': 10,
 'Sphynx': 11,
 'american_bulldog': 12,
 'american_pit_bull_terrier': 13,
 'basset_hound': 14,
 'beagle': 15,
 'boxer': 16,
 'chihuahua': 17,
 'english_cocker_spaniel': 18,
 'english_setter': 19,
 'german_shorthaired': 20,
 'great_pyrenees': 21,
 'havanese': 22,
 'japanese_chin': 23,
 'keeshond': 24,
 'leonberger': 25,
 'miniature_pinscher': 26,
 'newfoundland': 27,
 'pomeranian': 28,
 'pug': 29,
 'saint_bernard': 30,
 'samoyed': 31,
 'scottish_terrier': 32,
 'shiba_inu': 33,
 'staffordshire_bull_terrier': 34,
 'wheaten_terrier': 35,
 'yorkshire_terrier': 36}
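
A vocab and its o2i mapping are easy to picture in plain Python: sort the unique labels, then number them. The sketch below is an illustration rather than fastai's own code, and build_vocab is a hypothetical helper. It also suggests why the capitalized cat breeds come before the lowercase dog breeds in the listing above: a default string sort puts uppercase letters first.

```python
def build_vocab(labels):
    """Return (vocab, o2i): sorted unique labels and a label -> index dict."""
    vocab = sorted(set(labels))
    o2i = {label: i for i, label in enumerate(vocab)}
    return vocab, o2i

# A few labels from the listing above, in arbitrary order with repeats
labels = ['pug', 'Bengal', 'beagle', 'Abyssinian', 'pug', 'Bengal']
vocab, o2i = build_vocab(labels)

print(vocab)               # ['Abyssinian', 'Bengal', 'beagle', 'pug']
print(o2i['beagle'])       # 2
print(vocab[o2i['pug']])   # index -> name is just list indexing
```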

Time to make and train a model!

We will be using a convolutional neural network backbone and a fully connected head with a single hidden layer as our classifier. Don't worry if that's a bunch of nonsense for now. Right now, just know this: we are piggybacking off of a pretrained model to help us classify images into 37 categories.

First, we need to make our Neural Network and our Learner like before.

A Learner needs (on a base level):

  • DataLoaders
  • Some architecture
  • An evaluation metric
  • A loss function
  • An optimizer

We'll also use mixed precision (fp16):

learn = cnn_learner(dls, resnet34, pretrained=True, metrics=error_rate).to_fp16()

Some assumptions being made here:

  • Loss function is assumed to be for classification, so CrossEntropyLossFlat
  • Optimizer is assumed to be Adam
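
For a feel of what a cross-entropy loss measures, here is a small plain-Python sketch for a single example. It is a simplified illustration of the formula, not the library's implementation, and the logits are made up.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log(softmax(logits)[target])."""
    biggest = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - biggest) for z in logits]
    prob_target = exps[target] / sum(exps)
    return -math.log(prob_target)

# A model that scores all 37 breeds equally is maximally unsure
uniform_loss = cross_entropy([0.0] * 37, target=5)
print(round(uniform_loss, 3))  # 3.611, i.e. log(37)

# A confident, correct prediction gives a loss near zero
confident = [0.0] * 37
confident[5] = 10.0
print(cross_entropy(confident, target=5) < 0.01)  # True
```

So a loss around 3.6 would mean the model is essentially guessing among 37 classes, and anything well below that means it has learned something.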

Now we can train it! We will train for four epochs, that is, four passes through all our data:

learn.fit_one_cycle(4)

epoch train_loss valid_loss error_rate time
0 1.919652 0.358465 0.114344 01:12
1 0.676084 0.240199 0.077808 01:11
2 0.378746 0.214091 0.068336 01:12
3 0.277557 0.208185 0.064953 01:12
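
The error_rate column is simply the fraction of validation images the model got wrong, or 1 minus accuracy. A minimal sketch of that calculation, with made-up predictions and a function that is a plain-Python stand-in rather than the library's metric:

```python
def error_rate(predictions, targets):
    """Fraction of predictions that do not match their target (1 - accuracy)."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Hypothetical predicted and true class indices for eight validation images
preds   = [0, 3, 3, 7, 12, 12, 30, 36]
targets = [0, 3, 4, 7, 12, 13, 30, 36]
print(error_rate(preds, targets))  # 0.25

# Reading the table above: the final error rate corresponds to this accuracy
print(round(1 - 0.064953, 4))      # 0.935
```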

Let's look at our results

With the model trained, let's look at where it might've messed up. Which breeds did it have trouble telling apart? So long as the misidentifications are not too crazy, our model is actually working.

Let's plot our top losses and make a confusion matrix to visualize this. The code below checks that all the pieces we need are available:

interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
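
Conceptually, the top losses are the per-image losses sorted from largest to smallest, together with the position of each image in the validation set. A rough sketch of that idea in plain Python, where the function and the loss values are hypothetical:

```python
def top_losses(losses, k=None):
    """Return (sorted losses, original indices), largest loss first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    if k is not None:
        order = order[:k]
    return [losses[i] for i in order], order

# Hypothetical per-image validation losses
example_losses = [0.02, 3.10, 0.45, 7.80, 0.01]
sorted_losses, sorted_idxs = top_losses(example_losses, k=3)
print(sorted_losses)  # [7.8, 3.1, 0.45]
print(sorted_idxs)    # [3, 1, 2]
```

The indices are what let us go back and look at the actual images the model struggled with.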


plot_top_losses takes the number of images to show and a figure size.

interp.plot_top_losses(9, figsize=(15,10))

plot_confusion_matrix just needs a figure size; dpi adjusts the resolution of the plot.

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
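
With 37 classes the confusion matrix can be hard to read by eye, so it helps to know what it contains: a count for every (actual, predicted) pair. The sketch below builds those counts in plain Python and lists the off-diagonal pairs, most frequent first. The helper names and labels are hypothetical and are not results from the model above.

```python
from collections import Counter

def confusion_counts(targets, predictions):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(targets, predictions))

def most_confused(targets, predictions, min_val=1):
    """List misclassified (actual, predicted, count) triples, most frequent first."""
    counts = confusion_counts(targets, predictions)
    wrong = [(a, p, n) for (a, p), n in counts.items() if a != p and n >= min_val]
    return sorted(wrong, key=lambda t: t[2], reverse=True)

# Hypothetical labels for six validation images
actual    = ['Ragdoll', 'Ragdoll', 'Birman', 'Birman', 'pug', 'Birman']
predicted = ['Birman',  'Ragdoll', 'Ragdoll', 'Ragdoll', 'pug', 'Birman']
print(most_confused(actual, predicted))
# [('Birman', 'Ragdoll', 2), ('Ragdoll', 'Birman', 1)]
```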