Using Raw PyTorch

Removing most of the fastai library to train a model
Lesson 2

Introduction

In this chapter we’re going to go back to the previous lesson and train on the PETs dataset again. This time, however, there is a specific set of rules we will follow:

  1. We cannot use the fastai data API; the data must be handled in raw PyTorch
  2. We cannot use vision_learner; we must create our own model
  3. We cannot use fastai’s Optimizer; it must be a PyTorch optimizer

If you don’t know what that last part is, that is okay. We’ll cover it briefly in this lesson.

Removing the Data API

Downloading the dataset

The only parts of fastai we will be allowed to use for the data are untar_data (with URLs) to get the dataset and imagenet_stats, so let’s import them and grab the dataset now:

from fastai.data.external import untar_data, URLs
from fastai.vision.data import imagenet_stats
from fastcore.xtras import Path # to bring in some patched functionalities we will use later

dataset_path = untar_data(URLs.PETS)
dataset_path.ls()
(#2) [Path('/home/zach_mueller_huggingface_co/.fastai/data/oxford-iiit-pet/annotations'),Path('/home/zach_mueller_huggingface_co/.fastai/data/oxford-iiit-pet/images')]
imagenet_stats
([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

Defining our Transforms

The next step is to define our transforms so that they mimic fastai’s closely enough to get by. fastai’s pipeline can be boiled down to:

  1. Randomly crop and resize to (224, 224)
  2. Convert to a tensor
  3. Perform some more data augmentation, such as lighting and rotation
  4. Convert to a float tensor
  5. Normalize based on ImageNet statistics

We’ll do all but step 3 here for simplicity. ToTensor, which covers steps 2 and 4, is applied inside the Dataset we write in the next section.

In PyTorch this sequence of transforms can be written as an nn.Sequential, just like a stack of model layers:

from torch import nn
from torchvision.transforms import CenterCrop, RandomResizedCrop, ToTensor, Normalize

train_transforms = nn.Sequential(
    RandomResizedCrop((224,224)),
    Normalize(*imagenet_stats)
)

valid_transforms = nn.Sequential(
    CenterCrop((224,224)),
    Normalize(*imagenet_stats)
)
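
Normalize subtracts a per-channel mean and divides by a per-channel standard deviation. To make that arithmetic concrete, here is a minimal pure-Python sketch on a single pixel. Note that normalize_pixel is a hypothetical helper written for this illustration and is not part of torchvision:

```python
# ImageNet statistics, copied from the imagenet_stats output above
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean, std):
    "Hypothetical helper: apply (value - mean) / std to each channel of one pixel"
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

# A pixel sitting exactly at the ImageNet mean maps to zero in every channel
mean_pixel = normalize_pixel(mean, mean, std)

# A pure white pixel (1.0 once ToTensor has scaled values to [0, 1])
# lands somewhere between 2 and 3 in every channel
white_pixel = normalize_pixel([1.0, 1.0, 1.0], mean, std)
```

The real Normalize does the same thing across every pixel of a (3, H, W) tensor at once.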

Make a PyTorch Dataset

Next we need to create a Dataset class for us to use. This is a simple class with three main functions:

__init__

In Python this is the class constructor, which is called when you do MyClass(). For our class this will take in and store the list of filenames, the transforms, and a way to turn the label strings into numbers

__len__

In Python this function is how you get the length of some collection of data or items when doing len(MyThing()). For our class this will return the number of items in the dataset.

__getitem__

In Python this function is what gets called when you index into an object, such as myList[x], and it returns whatever you are trying to grab when doing so. For our class this will grab and open a file, apply the transforms, and return a tuple of the image and the label
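
To see these three functions in isolation before adding images and transforms, here is a minimal pure-Python sketch. ToyDataset and its contents are hypothetical and exist only for this illustration:

```python
class ToyDataset:
    "Hypothetical stand-in for a Dataset: stores items and returns (item, label) pairs"
    def __init__(self, items, label_to_int):
        self.items = items
        self.label_to_int = label_to_int

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        name, label = self.items[index]
        return (name, self.label_to_int[label])

toy = ToyDataset(
    [("img_a", "cat"), ("img_b", "dog"), ("img_c", "cat")],
    {"cat": 0, "dog": 1},
)
n_items = len(toy)   # calls __len__
second = toy[1]      # calls __getitem__
```

The real class follows the same shape, except that it opens an image file and applies transforms before returning the pair.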

import re
from PIL import Image
from torch.utils.data import Dataset

# This example is highly based on the work of Sylvain Gugger
# for the Accelerate notebook example which can be found here: 
# https://github.com/huggingface/notebooks/blob/main/examples/accelerate_examples/simple_cv_example.ipynb
class PetsDataset(Dataset):
    "A basic dataset that will return a tuple of (image, label)"
    def __init__(self, filenames:list, transforms:nn.Sequential, label_to_int:dict):
        self.filenames = filenames
        self.transforms = transforms
        self.label_to_int = label_to_int
        self.to_tensor = ToTensor()
    
    def __len__(self):
        return len(self.filenames)
    
    def apply_x_transforms(self, filename):
        image = Image.open(filename).convert("RGB")
        tensor_image = self.to_tensor(image)
        return self.transforms(tensor_image)
    
    def apply_y_transforms(self, filename):
        label = re.findall(r"^(.*)_\d+\.jpg$", filename.name)[0].lower()
        return self.label_to_int[label]
    
    def __getitem__(self, index):
        filename = self.filenames[index]
        x = self.apply_x_transforms(filename)
        y = self.apply_y_transforms(filename)
        return (x,y)

    def apply_x_transforms(self, filename):
        image = Image.open(filename).convert("RGB")
        tensor_image = self.to_tensor(image)
        return self.transforms(tensor_image)

This function first opens an image in Pillow and converts it to RGB mode, then turns this PIL Image into a tensor, before finally applying the transforms we want applied to it.


    def apply_y_transforms(self, filename):
        label = re.findall(r"^(.*)_\d+\.jpg$", filename.name)[0].lower()
        return self.label_to_int[label]

This function uses regex to extract the label from the filename, based on the expectation that the name will look like label_{some_number}.jpg, and then converts this string label into an integer using the label-to-integer dictionary
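
As a quick check of that pattern, the sketch below runs it on two made-up paths. The extract_label helper and the example paths are hypothetical and are only there to exercise the regex:

```python
import re
from pathlib import Path

label_pat = r"^(.*)_\d+\.jpg$"

def extract_label(filename):
    "Return the lowercased label part of a name shaped like label_{number}.jpg"
    return re.findall(label_pat, filename.name)[0].lower()

# Hypothetical paths, used only to exercise the pattern
single = extract_label(Path("images/Abyssinian_100.jpg"))
multi = extract_label(Path("images/great_pyrenees_12.jpg"))
```

Because the capture group is greedy, underscores inside a breed name are kept and only the final _{number}.jpg is stripped.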


    def __getitem__(self, index):
        filename = self.filenames[index]
        x = self.apply_x_transforms(filename)
        y = self.apply_y_transforms(filename)
        return (x,y)

This function first grabs the filename we want to use based on the index passed, then calls the apply_x_transforms and apply_y_transforms functions we defined, before finally returning a tuple of the input and output.

Prepare for the Dataset

Next we need to prepare for the dataset by:

  • Getting a dictionary of labels to encoded classes
  • Splitting the dataset randomly 80/20

Labels as encoded classes

To get the labels as encoded classes, we can build a list of every file’s label and then keep only the unique ones:

label_pat = r"^(.*)_\d+\.jpg$"
filenames = (dataset_path/'images').ls(file_exts=".jpg")

label_pat = r"^(.*)_\d+\.jpg$"

This is the same regex pattern as before


.ls(file_exts='.jpg')

This uses functionality that fastcore monkey-patches onto pathlib.Path to perform an ls operation on the path, returning only the files that end with .jpg

labels = filenames.map(
    lambda x: re.findall(label_pat, x.name)[0].lower()
).unique()

filenames.map

A map applies some function to every single item in a collection. Generally it’s seen as map(func, items). Since filenames is a fastcore.foundation.L, we can call .map() on it directly and have the function applied to every filename


lambda x: 

A lambda function is what is called an anonymous function. It doesn’t need def name():... and instead takes as its arguments whatever is named before the : and returns the value of the expression after it


 re.findall(label_pat, x.name)[0].lower()

This will apply our label_pat to the filename, return the first found item, and lowercase it.


unique()

This will look inside our resulting labels and return a list of every single unique value inside it
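
The same map, lambda, and unique steps can be sketched with plain Python built-ins. This is only an illustration: the file names are made up, rsplit replaces the regex, and dict.fromkeys is used here as a stand-in for the unique() method on L:

```python
# Hypothetical file names standing in for the real dataset listing
names = ["Beagle_1.jpg", "Bombay_7.jpg", "beagle_22.jpg", "Pug_3.jpg"]

# map applies the lambda to every name; rsplit drops the trailing _{number}.jpg
all_labels = list(map(lambda n: n.rsplit("_", 1)[0].lower(), names))

# dict.fromkeys keeps the first occurrence of each label, in order
unique_labels = list(dict.fromkeys(all_labels))
```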

labels
(#37) ['beagle','yorkshire_terrier','staffordshire_bull_terrier','japanese_chin','maine_coon','chihuahua','basset_hound','samoyed','great_pyrenees','russian_blue'...]

And now we have a list of our 37 labels! All that’s left is to quickly turn this into a dictionary of indexes:

label_to_int = {label: index for index, label in enumerate(labels)}
label_to_int.keys(), label_to_int["siamese"]
(dict_keys(['beagle', 'yorkshire_terrier', 'staffordshire_bull_terrier', 'japanese_chin', 'maine_coon', 'chihuahua', 'basset_hound', 'samoyed', 'great_pyrenees', 'russian_blue', 'scottish_terrier', 'bombay', 'english_setter', 'havanese', 'english_cocker_spaniel', 'american_bulldog', 'sphynx', 'birman', 'british_shorthair', 'saint_bernard', 'german_shorthaired', 'abyssinian', 'keeshond', 'boxer', 'miniature_pinscher', 'wheaten_terrier', 'egyptian_mau', 'pomeranian', 'pug', 'leonberger', 'american_pit_bull_terrier', 'persian', 'shiba_inu', 'ragdoll', 'newfoundland', 'bengal', 'siamese']),
 36)
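
On a shortened, made-up label list the mapping looks like the sketch below. The inverse dictionary int_to_label is not part of the lesson; it is added here only to show how a class index can be turned back into a name:

```python
labels = ["beagle", "bombay", "pug"]  # hypothetical, shortened label list

# enumerate yields (position, label) pairs, so each label maps to its position
label_to_int = {label: index for index, label in enumerate(labels)}

# The inverse lookup (an addition for illustration) recovers a name from a class index
int_to_label = {index: label for label, index in label_to_int.items()}
```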

Splitting the dataset

Finally, to split the dataset we can use numpy to shuffle the indexes of our filenames and then split them 80/20:

import numpy as np
shuffled_indexes = np.random.permutation(len(filenames))
split = int(0.8 * len(filenames))
train_indexes, valid_indexes = (
    shuffled_indexes[:split], shuffled_indexes[split:]
)

shuffled_indexes = np.random.permutation(len(filenames))

This will return an array of the integers from 0 to len(filenames) - 1 in a completely shuffled order, such as [0,5,12,...]


split = int(0.8 * len(filenames))
train_indexes, valid_indexes = (
    shuffled_indexes[:split], shuffled_indexes[split:]
)

This will truncate 80% of the number of filenames down to an integer and then split shuffled_indexes at that position: the first 80% become the training indexes and the remainder the validation indexes
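
The split logic can be tried out on a small made-up list. The sketch below swaps numpy for Python’s built-in random module and uses a fixed seed so the result is repeatable; both choices are assumptions made for this illustration and are not part of the lesson’s code:

```python
import random

items = [f"file_{i}.jpg" for i in range(10)]  # hypothetical file names

rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled_indexes = rng.sample(range(len(items)), len(items))

split = int(0.8 * len(items))
train_indexes, valid_indexes = shuffled_indexes[:split], shuffled_indexes[split:]

train_items = [items[i] for i in train_indexes]
valid_items = [items[i] for i in valid_indexes]
```

Every item ends up in exactly one of the two lists, which is the property we care about for a train/validation split.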

We can then grab our train and validation filenames:

train_fnames = filenames[train_indexes]
valid_fnames = filenames[valid_indexes]

Creating our Datasets

Finally we need to create the actual dataset objects. To do so we just call our PetsDataset class with the items required:

train_dataset = PetsDataset(
    train_fnames,
    train_transforms,
    label_to_int
)

valid_dataset = PetsDataset(
    valid_fnames,
    valid_transforms,
    label_to_int
)

We can look at one of the items in the dataset:

x,y = train_dataset[0]
x.shape, y
(torch.Size([3, 224, 224]), 3)

Which has been transformed into a 3x224x224 tensor (three color channels of 224x224 pixels), with a class label of 3

Creating PyTorch Dataloaders

Next we need to create a pair of PyTorch DataLoader objects to use. These will get wrapped by fastai’s DataLoaders class, which lets us use them directly in the framework:

from torch.utils.data import DataLoader
train_dataloader = DataLoader(
    train_dataset,
    shuffle=True,
    drop_last=True,
    batch_size=64
)
valid_dataloader = DataLoader(
    valid_dataset,
    batch_size=128
)

We can increase the batch size to 128 for validation because gradients aren’t calculated on the validation set, which leaves more memory free
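
To get a feel for what batch_size and drop_last do to the number of batches, here is a small arithmetic sketch. The num_batches helper and the dataset sizes are hypothetical and were chosen only to keep the numbers easy to follow:

```python
import math

def num_batches(n_items, batch_size, drop_last=False):
    "Number of batches a loader would yield for n_items samples"
    if drop_last:
        # the final, incomplete batch is thrown away
        return n_items // batch_size
    return math.ceil(n_items / batch_size)

# Hypothetical dataset sizes
train_batches = num_batches(1000, 64, drop_last=True)   # 15 full batches, 40 samples skipped
valid_batches = num_batches(250, 128)                    # 2 batches: 128 and 122 samples
```

Dropping the last batch during training keeps every batch the same size, while validation keeps every sample so the metrics cover the whole set.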

from fastai.data.core import DataLoaders
dls = DataLoaders(train_dataloader, valid_dataloader)

The DataLoaders class accepts any number of DataLoaders (from fastai or PyTorch), and each is accessible through dls[index]; however, only the first two are available as dls.train and dls.valid respectively.

Creating a PyTorch Model

We’ll follow a similar approach to what was shown earlier in this lesson to create a pretrained model through PyTorch:

from torchvision.models import resnet34

model = resnet34(pretrained=True)

And change the last layer’s outputs to be our number of classes:

model.fc = nn.Linear(512, 37, bias=True)
model.fc
Linear(in_features=512, out_features=37, bias=True)

The last thing we need to do is perform gradual unfreezing of our layers. What is this?

Gradual Unfreezing

In Concept

When we loaded the model in through vision_learner, it made a number of changes to our model.

  1. It “cut off” that fc layer as well as the pooling layer (avgpool) to create a body
  2. It used create_head to make a new head that fastai uses for their cnn models
  3. It then froze the backbone of the model, or the body

Freezing means that the parameters inside that section of the model are considered untrainable, so they won’t get updated as we train. This applies to both model bodies shown

Essentially, it looks like so:

Mermaid diagram of the original versus new model

While we won’t create a custom head, we will still be performing the freezing:

In Code

list(model.children())[-1]
Linear(in_features=512, out_features=37, bias=True)

model.children()

A PyTorch model’s top-level layers are yielded by the children() generator. We can find the last layer by turning that generator into a list and indexing into it

for layer in list(model.children())[:-1]:
    if hasattr(layer, "requires_grad_"):
        layer.requires_grad_(False)

Why iterate over all the layers including the pooling layer?

Pooling layers have no parameters, so calling requires_grad_(False) on them changes nothing and there is no need to skip them.

And now the backbone of the model is frozen. We’re almost there!
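
To check that freezing did what we expect, we can count trainable and frozen parameters. The sketch below does this on a tiny, hypothetical nn.Sequential rather than the real ResNet so the numbers are easy to verify by hand:

```python
from torch import nn

# Hypothetical tiny model: the first two layers act as the "body", the last as the "head"
tiny = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze everything except the last layer, as we did for the ResNet
for layer in list(tiny.children())[:-1]:
    layer.requires_grad_(False)

# Linear(8, 2) has 8*2 weights + 2 biases = 18 trainable parameters
trainable = sum(p.numel() for p in tiny.parameters() if p.requires_grad)
# Linear(4, 8) has 4*8 weights + 8 biases = 40 frozen parameters
frozen = sum(p.numel() for p in tiny.parameters() if not p.requires_grad)
```

The same two sum expressions can be run against model to confirm that only model.fc is still trainable.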

Creating an Optimizer

The last step we will go over here is creating the Optimizer.

What is an optimizer?

It is the backbone of our training. Once the gradients of the loss have been computed for a particular batch, the optimizer is what goes through and decides how to update each of our weights.
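
As a rough picture of what “updating the weights” means, here is a pure-Python sketch of plain gradient descent on a single weight. This is deliberately much simpler than AdamW, and the loss function, learning rate, and step count are all made up for the illustration:

```python
def loss(w):
    "Hypothetical loss with its minimum at w = 3"
    return (w - 3.0) ** 2

def grad(w):
    "Derivative of (w - 3)^2 with respect to w"
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1
for _ in range(50):
    # the "optimizer step": move the weight against its gradient
    w = w - lr * grad(w)
```

After the loop, w sits very close to 3, the value that minimizes the loss. Optimizers such as AdamW follow the same loop but use a more elaborate update rule.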

By default fastai uses the AdamW optimizer (shown as Adam in fastai). As a result we’ll use it here:

from torch.optim import AdamW

To use a PyTorch optimizer in the fastai framework, we use fastai’s OptimWrapper class to convert the PyTorch optimizer into something compatible:

from functools import partial
from fastai.optimizer import OptimWrapper
opt_func = partial(OptimWrapper, opt=AdamW)

partial

partial returns a new function with some of the original’s arguments already filled in, so when we call opt_func() now, the opt parameter will automatically be set to the AdamW class.
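
Here is partial on its own, using a hypothetical make_optimizer function that simply records the arguments it receives:

```python
from functools import partial

def make_optimizer(params, opt, lr=1e-3):
    "Hypothetical factory that just records its arguments"
    return {"params": params, "opt": opt, "lr": lr}

# Pre-fill the opt keyword, the same way opt_func pre-fills opt=AdamW
make_adamw = partial(make_optimizer, opt="AdamW")

# Callers now only supply the remaining arguments
result = make_adamw([1, 2, 3], lr=0.01)
```

This is why the Learner can later call opt_func with just the parameters and hyperparameters, without knowing which PyTorch optimizer sits underneath.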

Bringing in fastai and Training!

We have all the steps in place now to finally begin training. As mentioned previously fastai’s training magic is all within the Learner class. As a result, we will import it and any patched methods we want to use:

from fastai.losses import CrossEntropyLossFlat
from fastai.metrics import accuracy
from fastai.learner import Learner
from fastai.callback.schedule import Learner # To get `fit_one_cycle`, `lr_find`, and more

To bring in any @patched functions defined in a fastai module, we can either import the entire module or import the class from it. Neither immediately pollutes the namespace; however, the form shown here is better for code clarity

We then pass all the items we’ve written so far to the Learner:

model.cuda();
learn = Learner(
    dls, 
    model, 
    opt_func=opt_func, 
    loss_func=CrossEntropyLossFlat(), 
    metrics=accuracy
)

And now we can train like normal!

learn.lr_find()
SuggestedLRs(valley=0.0010000000474974513)

learn.fit_one_cycle(5, 1e-3)
epoch train_loss valid_loss accuracy time
0 2.816982 1.678761 0.711096 00:57
1 1.372453 0.673238 0.851150 00:54
2 0.891639 0.519573 0.867388 00:54
3 0.747724 0.478146 0.882273 00:55
4 0.706740 0.471565 0.884303 00:55

Getting Predictions

Now that we have a trained model, how do we get predictions?

First we’ll open one of our images:

im = Image.open(filenames[0]).convert("RGB")
im

Then we’ll extract the trained model from the Learner

net = learn.model

Apply the validation transforms onto the image:

tfm_x = valid_transforms(ToTensor()(im))
tfm_x = tfm_x.unsqueeze(0); tfm_x.shape
torch.Size([1, 3, 224, 224])

tfm_x = valid_transforms(ToTensor()(im))

We apply the valid_transforms to our image after converting it to a tensor


tfm_x = tfm_x.unsqueeze(0); tfm_x.shape

Then we need to turn it into a “batch” of one item by adding a single dimension to the front (turning it from (3, 224, 224) to (1, 3, 224, 224))

Before finally getting our prediction through raw PyTorch:

import torch
net.eval()
with torch.no_grad():
    preds = net(tfm_x.cuda())
pred = preds.argmax(dim=-1)[0]
label = list(label_to_int.keys())[pred]
pred, label

net.eval()

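When performing inference, we set the model to evaluation mode. This changes the behavior of layers that act differently during training, such as BatchNorm (which switches to its stored running statistics) and Dropout (which is disabled), so that predictions are deterministic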


with torch.no_grad():
    preds = net(tfm_x.cuda())

We wrap the prediction in torch.no_grad to skip tracking gradients. This makes inference a bit faster and saves a bit of memory. The input also needs to be on the same device as the model (cuda)
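
Both effects can be seen on a toy example. The sketch below uses a standalone Dropout layer and a made-up weight tensor rather than our trained model, so it runs on the CPU without any data:

```python
import torch
from torch import nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# In eval mode Dropout is disabled, so the output equals the input
layer.eval()
eval_out = layer(x)

# Hypothetical weight that would normally have its gradients tracked
w = torch.ones(8, requires_grad=True)

with torch.no_grad():
    untracked = (x * w).sum()   # computed without building a gradient graph

tracked = (x * w).sum()         # same computation, gradients tracked
```

Here untracked.requires_grad is False while tracked.requires_grad is True, which is the bookkeeping that no_grad lets us skip.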


pred = preds.argmax(dim=-1)[0]
label = list(label_to_int.keys())[pred]

To get the class result, we find which index has the highest value. Since we only predicted on one item, we can take the first result. Finally, we use the keys of our label_to_int dictionary from earlier, which are in index order, to turn that index into the predicted label.
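
The step from raw outputs to a label can also be sketched in pure Python. The class names and output values below are made up, and the softmax step is an addition for illustration (the lesson only uses argmax) that shows how the raw outputs could be read as probabilities:

```python
import math

int_to_label = {0: "beagle", 1: "bombay", 2: "pug"}   # hypothetical classes
logits = [0.2, 2.5, -1.0]                             # hypothetical raw model outputs

# argmax: the index holding the largest value
pred = max(range(len(logits)), key=lambda i: logits[i])

# softmax turns the raw outputs into probabilities that sum to 1
# (subtracting the max first keeps exp() numerically stable)
exps = [math.exp(v - max(logits)) for v in logits]
probs = [e / sum(exps) for e in exps]

label = int_to_label[pred]
```

Since softmax preserves the ordering of the outputs, the most probable class is always the one argmax picks.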