General Components of the Vision API
In this notebook we'll look at the main components that make up the Computer Vision sublibrary in fastai. We won't train a model; instead we'll show a few functions specific to vision, briefly explain at a high level what they do, and show examples.
Opening Images
fastai utilizes the Pillow library to open images and apply transforms. To open up any image using Pillow inside the fastai library, we have PILImage.create:
from fastai.vision.all import *
We'll quickly grab the PETS dataset to examine colored images:
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
And open one of them by its filename:
im = PILImage.create(fnames[0])
We can show the image with im.show():
im.show()
We can also access the usual attributes you would use in Pillow, such as .size, along with .shape, which fastai adds:
im.shape, im.size
PILImage can accept a variety of inputs to cover the most common types you will see in the wild:
- TensorImage (fastai specific)
- TensorMask (fastai specific)
- A Tensor
- A ndarray
- Bytes (note that it will call io.BytesIO to open it)
- Otherwise, it will call Pillow's Image.open function
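As a quick illustration, here's a hedged sketch of a couple of these input types in action (the random array is arbitrary example data):

# An ndarray of pixel values (arbitrary example data):
arr = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)
im_arr = PILImage.create(arr)
# Raw bytes, which PILImage.create wraps in io.BytesIO internally:
im_bytes = PILImage.create(fnames[0].read_bytes())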
We can also have black and white images, which have their own PIL class, PILImageBW. We'll see an example with MNIST below:
path_m = untar_data(URLs.MNIST_SAMPLE)
imgs = get_image_files(path_m/'train')
im_bw = PILImageBW.create(imgs[0])
And we can show it just like the previous one:
im_bw.show()
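As a quick sanity check (a hedged example that assumes the standard 28x28 MNIST images), converting it to a tensor shows a single channel:

# image2tensor returns a byte tensor in (channels, height, width) order;
# a black and white image gives a single channel, e.g. [1, 28, 28]
image2tensor(im_bw).shape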
Opening Masks
Along with PILImage and PILImageBW we have PILMask designed to open masks. Let's see a quick example:
path_c = untar_data(URLs.CAMVID_TINY)
msks = get_image_files(path_c/'labels')
msk = PILMask.create(msks[0])
msk.show()
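Under the hood a mask is simply an image of integer class codes, one per pixel. As a quick (hedged) way to see which codes this particular mask contains:

# unique() reveals the integer class codes present in this mask
tensor(msk).unique()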
Each of these classes inherits from PILBase, a simple class that extends the usage of Image.Image.
Pairing it with the DataBlock
When using vision in the DataBlock API, two blocks are generally used: ImageBlock and MaskBlock.
To specify that we want a black and white image, we can pass a cls to ImageBlock like so:
ImageBlock(cls=PILImageBW)
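To see both blocks working together, here's a minimal sketch of a segmentation DataBlock built on the CAMVID_TINY data we downloaded earlier (the '_P' label-file naming and codes.txt follow that dataset's layout):

# The codes file maps each integer in the mask to a class name
codes = np.loadtxt(path_c/'codes.txt', dtype=str)
camvid = DataBlock(
    blocks=(ImageBlock, MaskBlock(codes)),
    get_items=get_image_files,
    # CamVid stores each label under the image's name plus a '_P' suffix
    get_y=lambda o: path_c/'labels'/f'{o.stem}_P{o.suffix}',
    splitter=RandomSplitter(),
    item_tfms=Resize(224))
dls = camvid.dataloaders(path_c/'images', bs=8)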
There are more tasks than just semantic segmentation, so the entire list of vision-related blocks is below:
- ImageBlock
- MaskBlock
- PointBlock
- BBoxBlock
- BBoxLblBlock
Special Learners
Each subsection of the library tends to have its own special Learner wrappers to apply a bit of magic. For Computer Vision, this comes in the form of cnn_learner and unet_learner.
A quick high-level explanation of cnn_learner: we pass in a callable backbone model, fastai freezes its weights, and applies its own custom head (with two pooling layers) on top. You will see this referenced the most in the vision section of this website.
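As a hedged sketch (the pets DataBlock and its regex labelling are assumptions for illustration, following the usual PETS filename pattern), building one looks like this:

pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    # PETS filenames look like 'great_pyrenees_12.jpg'; the regex grabs the breed
    get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
    splitter=RandomSplitter(),
    item_tfms=Resize(224))
dls_pets = pets.dataloaders(path/'images')
# resnet34 is the frozen backbone; fastai attaches its custom head on top
learn = cnn_learner(dls_pets, resnet34, metrics=accuracy)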
unet_learner is a method for generating a Learner paired with the Dynamic Unet architecture, and it is designed specifically for segmentation (though this model can be used for other tasks).
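Pairing it with the segmentation dls sketched above is a one-liner:

# The same backbone idea applies: resnet34 becomes the Unet's encoder
learn_seg = unet_learner(dls, resnet34)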
GANLearner, as the name suggests, is a Learner that should be used when working with GANs. It's a different API altogether compared to the previous two, given that GANs operate on a generator/discriminator dynamic.
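Here's a hedged sketch following the pattern in fastai's own GAN tutorial; dls_gan, a DataLoaders of unlabelled 64-pixel images, is an assumption for illustration:

# basic_generator and basic_critic are fastai's simple DCGAN-style networks
generator = basic_generator(64, n_channels=3, n_extra_layers=1)
critic = basic_critic(64, n_channels=3, n_extra_layers=1,
                      act_cls=partial(nn.LeakyReLU, negative_slope=0.2))
# dls_gan (assumed): a DataLoaders of unlabelled 64px images
learn_gan = GANLearner.wgan(dls_gan, generator, critic, opt_func=RMSProp)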
Finally, vision tasks typically pair item transforms (applied to each image individually on the CPU) with batch transforms (applied to the whole batch at once on the GPU):

item_tfms = [Resize(224)]
batch_tfms = [*aug_transforms(size=256)]
aug_transforms will generate a few random transforms that are applied efficiently on your batch.
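Plugging these into the (hypothetical) pets DataBlock from the cnn_learner sketch above shows them in action:

# DataBlock.new swaps in new transforms without redefining the whole block
dls_aug = pets.new(item_tfms=item_tfms, batch_tfms=batch_tfms).dataloaders(path/'images')
dls_aug.show_batch(max_n=4)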