Today we will be using the basics, callback, and vision libraries, as we have a vision task:
from fastai.basics import *
from fastai.vision.all import *
from fastai.callback.all import *
Below you will find the exact imports for everything we use today:
import numpy as np
from fastcore.xtras import Path # @patch'd properties to the Pathlib module
from fastai.callback.fp16 import to_fp16
from fastai.callback.schedule import fit_one_cycle, lr_find
from fastai.data.block import CategoryBlock, DataBlock
from fastai.data.external import untar_data, URLs
from fastai.data.transforms import get_image_files, Normalize, RandomSplitter, RegexLabeller
from fastai.interpret import ClassificationInterpretation
from fastai.learner import Learner # imports @patch'd properties to Learner including `save`, `load`, `freeze`, and `unfreeze`
from fastai.metrics import error_rate
from fastai.vision.augment import aug_transforms, RandomResizedCrop
from fastai.vision.core import imagenet_stats
from fastai.vision.data import ImageDataLoaders, ImageBlock
from fastai.vision.learner import cnn_learner
from torchvision.models.resnet import resnet34, resnet50
Overall process using machine learning models and fastai:
- Make our DataLoaders
- Make a Learner with some "equipment"
- Train
We'll be trying to identify between 12 breeds of cats and 25 breeds of dogs (37 breeds in total). Five years ago, the best accuracy was 59%, with separate classifiers for the image, head, and body of the animal. Let's try using just one image for everything.
But before anything, we need data!
If we call help on untar_data, we can see its documentation:
help(untar_data)
We can also pull up the source code by adding ?? at the end:
untar_data??
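The ?? syntax is an IPython/Jupyter convenience. Outside a notebook, a rough equivalent is the standard library's inspect module (a sketch, assuming a plain Python session with untar_data imported):
import inspect

# Prints the source of untar_data, much like `untar_data??` does in a notebook
print(inspect.getsource(untar_data))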
Let's download the PETS dataset:
path = untar_data(URLs.PETS)
And set our seed for reproducibility:
np.random.seed(2)
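Note that np.random.seed only seeds NumPy. For fuller reproducibility, fastai also provides a set_seed helper that seeds Python, NumPy, and PyTorch in one call (an optional extra beyond the cell above):
from fastai.torch_core import set_seed

# Seeds random, numpy, and torch together
set_seed(2)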
How does our data look?
path.ls()[:3]
Let's build a DataLoaders. First we'll need the path to our data, some filenames, and the regex pattern to extract our labels:
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
pat = r'(.+)_\d+\.jpg$'
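To see what that pattern actually captures, here's a quick standalone check (the filename below is a hypothetical example in the PETS naming style):
import re

# (.+) greedily captures everything before the final _<digits>.jpg
print(re.findall(r'(.+)_\d+\.jpg$', 'great_pyrenees_173.jpg'))  # ['great_pyrenees']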
Next, some basic transforms for getting all of our images to the same size (item_tfms, applied per image on the CPU), and some augmentations and normalization to be done on the GPU (batch_tfms):
item_tfms = RandomResizedCrop(460, min_scale=0.75, ratio=(1.,1.))
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
bs=64
ImageDataBunch (renamed to ImageDataLoaders) is the highest-level API:
dls = ImageDataLoaders.from_name_re(path, fnames, pat, batch_tfms=batch_tfms,
item_tfms=item_tfms, bs=bs)
What is the DataBlock API? Let's find out by rebuilding our DataLoaders with it.
We'll need to define what our input and outputs should be (an Image and a Category for classification), how to get our items, how to split our data, how to extract our labels, and our augmentation, as before:
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(),
get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
item_tfms=item_tfms,
batch_tfms=batch_tfms)
path_im = path/'images'
dls = pets.dataloaders(path_im, bs=bs)
We can take a look at a batch of our images using show_batch, passing in the maximum number of images to show and how large we want to view them:
dls.show_batch(max_n=9, figsize=(6,7))
If we want to see how many classes we have and their names, we can simply call dls.vocab: its length is the number of classes, and its items are their names. You may notice the output looks a bit odd; that's because this L is a new invention of Jeremy and Sylvain. Essentially it's a Python list taken to the extreme.
Before, if we wanted to grab the index for the name of a class (e.g. our model outputs 0 as our class), we would need to use data.c2i to grab the Class2Index mapping. This is still here; it lives in dls.vocab.o2i:
dls.vocab
dls.vocab.o2i
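Putting the two together, we can round-trip between names and indices (a small sketch; it assumes 'great_pyrenees' is one of the breeds in the vocab):
# name -> index via o2i, index -> name via plain indexing
idx = dls.vocab.o2i['great_pyrenees']
assert dls.vocab[idx] == 'great_pyrenees'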
We will be using a convolutional neural network backbone and a fully connected head with a single hidden layer as our classifier. Don't worry if that's a bunch of nonsense for now. Right now, just know this: we are piggybacking off of a pretrained model to help us classify images into 37 categories.
First, we need to make our neural network and our Learner, like before.

A Learner needs (on a base level):
- DataLoaders
- Some architecture
- An evaluation metric
- A loss function
- An optimizer

We'll also use mixed precision (fp16):
learn = cnn_learner(dls, resnet34, pretrained=True, metrics=error_rate).to_fp16()
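cnn_learner builds the model as a Sequential of a pretrained backbone plus a new head, so we can peek at the generated classifier directly (a quick inspection of the learn object above, assuming fastai's usual body/head layout):
# learn.model[0] is the pretrained ResNet body; learn.model[1] is the new head
learn.model[1]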
Some assumptions being made here:
- The loss function is assumed from our task (classification), so CrossEntropyLossFlat
- The optimizer is assumed to be Adam
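Both of these can be passed explicitly if you prefer. A minimal sketch, equivalent to the call above (assuming the same dls):
from fastai.losses import CrossEntropyLossFlat
from fastai.optimizer import Adam

# Spelling out the defaults cnn_learner picks for a classification task
learn_explicit = cnn_learner(dls, resnet34, pretrained=True,
                             loss_func=CrossEntropyLossFlat(),
                             opt_func=Adam,
                             metrics=error_rate)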
Now we can train it! We will train for four epochs (four full passes through our data):
learn.fit_one_cycle(4)
learn.save('stage_1')
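As a side note, save stores the weights as stage_1.pth under learn.path/learn.model_dir (models/ by default), and load reads them back from the same place:
# Where 'stage_1.pth' now lives
print(learn.path/learn.model_dir)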
With the model trained, let's look at where it might've messed up. What breeds did it have trouble differentiating between? So long as the misidentifications are not too crazy, our model is actually working.

Let's plot our losses and make a confusion matrix to visualize this. The code below checks that all the pieces we need are available:
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
len(dls.valid_ds)==len(losses)==len(idxs)
plot_top_losses needs the number of images to show and a figure size:
interp.plot_top_losses(9, figsize=(15,10))
plot_confusion_matrix just needs a figure size; dpi adjusts the quality:
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
We can also directly grab our most confused combinations (a raw version of the confusion matrix) and pass in a threshold for the minimum number of misclassifications to show:
interp.most_confused(min_val=3)
So, we have the model. Let's fine-tune it. First, we need to load our model back in:
learn.load('stage_1');
Now we will unfreeze and train more
learn.unfreeze()
learn.fit_one_cycle(4)
When we unfreeze, we unfreeze all the layers. To show the difference a proper learning rate makes, let's load those old weights back in and try lr_find():
learn.load('stage_1');
learn.lr_find()
Alright, if we look here, our losses don't really start spiking until ~1e-2, so a good spot is between 1e-6 and 1e-4. Passing a slice trains the earliest layers at 1e-6 and the last layers at 1e-4, with everything in between spread across that range (discriminative learning rates). Let's do that!
learn.unfreeze()
learn.fit_one_cycle(4, lr_max=slice(1e-6, 1e-4))
We can see that picking a proper learning rate can help speed things up!
learn.save('stage_2')
Since resnet50 is a larger model, we'll first rebuild our DataLoaders with a smaller batch size:
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(),
get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'),
item_tfms=item_tfms,
batch_tfms=batch_tfms)
dls = pets.dataloaders(untar_data(URLs.PETS)/"images", bs=32)
Go ahead and try running the code below yourself. You should see a further increase in accuracy!
Steps to try:
- Create your Learner
- Find a learning rate
- Fit for 5 epochs
- Unfreeze and fit for two more
learn = cnn_learner(dls, resnet50, pretrained=True, metrics=error_rate)
learn.lr_find()
learn.fit_one_cycle(5, lr_max=slice(3e-4, 3e-3))
learn.save('resnet50')
learn.load('resnet50')
learn.unfreeze()
learn.fit_one_cycle(2, lr_max=4e-4)
learn.save('resnet50')