General Components of the Vision API
In this notebook we'll be looking at the main components that make up the Computer Vision sublibrary in `fastai`. We won't train a model; instead we'll show a few functions specific to vision, briefly explain at a high level what they do, and show examples.
Opening Images
`fastai` utilizes the Pillow library to open images and apply transforms. To open up any image using Pillow inside the `fastai` library, we have `PILImage.create`:
```python
from fastai.vision.all import *
```
We'll quickly grab the PETS dataset to examine colored images:
```python
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
```
And open one of them by its filename:
```python
im = PILImage.create(fnames[0])
```
We can show the image with `im.show()`:
```python
im.show()
```
We can also access the usual attributes you may expect, such as `.shape` and `.size`:
```python
im.shape, im.size
```
`PILImage.create` can accept a variety of inputs to cover the most common types you will see in the wild:

- `TensorImage` (`fastai` specific)
- `TensorMask` (`fastai` specific)
- A `Tensor`
- An `ndarray`
- Bytes (note that it will call `io.BytesIO` to open them)
- Otherwise it will call Pillow's `Image.open` function
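To make that dispatch concrete, here is a small sketch (not from the original notebook) that opens the same pet image from a path, an `ndarray`, and raw bytes, using the `fnames` list from above:

```python
import numpy as np

im_path  = PILImage.create(fnames[0])                 # a path: falls through to Image.open
im_array = PILImage.create(np.array(im_path))         # an ndarray
im_bytes = PILImage.create(fnames[0].read_bytes())    # bytes, wrapped in io.BytesIO
im_path.shape, im_array.shape, im_bytes.shape
```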
We can also have black and white images, which have their own `PIL` class, `PILImageBW`. We'll see an example with MNIST below:
```python
path_m = untar_data(URLs.MNIST_SAMPLE)
imgs = get_image_files(path_m/'train')
im_bw = PILImageBW.create(imgs[0])
```
And we can show it just like the previous one:
```python
im_bw.show()
```
Opening Masks
Along with `PILImage` and `PILImageBW`, we have `PILMask`, designed to open masks. Let's see a quick example:
```python
path_c = untar_data(URLs.CAMVID_TINY)
msks = get_image_files(path_c/'labels')
msk = PILMask.create(msks[0])
msk.show()
```
Each of these classes inherits from `PILBase`, a simple class that extends the functionality of `Image.Image`.
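As a quick illustration (a sketch, not part of the original walkthrough), we can confirm that hierarchy directly with the image we opened earlier:

```python
from PIL import Image

# Every fastai image class sits on top of PILBase, which extends PIL's Image.Image
print(issubclass(PILImage, PILBase), issubclass(PILMask, PILBase))  # True True
print(isinstance(im, Image.Image))                                  # True
```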
Pairing it with the DataBlock
When using vision in the DataBlock API, two blocks are generally used:

- `ImageBlock`
- `MaskBlock`
When specifying if we want a black and white image, we can pass in a `cls` to `ImageBlock` like so:

```python
ImageBlock(cls=PILImageBW)
```
There are more tasks than just Semantic Segmentation, so the entire list of vision-related blocks is below:

- `ImageBlock`
- `MaskBlock`
- `PointBlock`
- `BBoxBlock`
- `BBoxLblBlock`
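As one possible pairing for segmentation, here is a sketch built from the CAMVID_TINY data downloaded earlier; the `get_y` convention assumes CamVid's usual `{name}_P.png` label naming:

```python
# A minimal segmentation DataBlock sketch; `path_c` comes from the
# CAMVID_TINY download above, and the label-path convention is an assumption
camvid = DataBlock(
    blocks=(ImageBlock, MaskBlock()),
    get_items=get_image_files,
    get_y=lambda o: path_c/'labels'/f'{o.stem}_P{o.suffix}',
    item_tfms=Resize(224)
)
dls = camvid.dataloaders(path_c/'images', bs=8)
```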
Special Learners
Each subsection of the library tends to have its own special `Learner` wrappers to apply a bit of magic. For Computer Vision, this comes in the form of `cnn_learner` and `unet_learner`.
A quick high-level explanation of `cnn_learner`: we pass in a callable backbone model, `fastai` will freeze its weights, and apply its own custom head on top with two pooling layers. You will see this referenced the most in the vision section of this website.
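For instance, a minimal sketch, assuming `dls` is an image-classification `DataLoaders` (such as one built from the PETS data); note that newer fastai releases also expose this function as `vision_learner`:

```python
# cnn_learner grabs a pretrained backbone (resnet34 here), freezes its weights,
# and attaches fastai's custom head sized to our number of classes
learn = cnn_learner(dls, resnet34, metrics=error_rate)
```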
`unet_learner` is a method for generating a `Learner` paired with the Dynamic U-Net architecture, and it is designed specifically for segmentation (though this model can be used for other tasks).
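A minimal sketch of its usage, assuming `dls` is a segmentation `DataLoaders` such as one built from CAMVID_TINY:

```python
# unet_learner wraps the chosen backbone in the Dynamic U-Net architecture
learn = unet_learner(dls, resnet34)
```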
`GANLearner`, as the name suggests, is a `Learner` to be used when working with GANs. It's a different API altogether compared to the previous two, given that GANs operate on a generator/discriminator dynamic.
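To give a flavor of that API, here is a hedged sketch: `basic_generator` and `basic_critic` are fastai's simple reference networks, and `dls` is assumed to yield noise/image pairs as in fastai's GAN tutorial:

```python
from fastai.vision.gan import *

# A WGAN pairing of a simple generator and critic for 64px, 3-channel images;
# GANLearner.wgan alternates training between the two networks
generator = basic_generator(64, n_channels=3)
critic = basic_critic(64, n_channels=3)
learn = GANLearner.wgan(dls, generator, critic)
```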
Finally, transforms in vision are split between item transforms (applied to each image individually) and batch transforms (applied to a whole batch at once):

```python
item_tfms = [Resize(224)]
batch_tfms = [*aug_transforms(size=256)]
```
`aug_transforms` will generate a few random transforms that are applied efficiently on your batch.
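To show where these slot in, here is a sketch of a classification `DataBlock` using the PETS data from earlier; the regex labeller assumes that dataset's `name_id.jpg` filename convention:

```python
# Plugging the item and batch transforms above into a classification DataBlock;
# `path` is the PETS path downloaded earlier
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
dls = pets.dataloaders(path/'images', bs=64)
```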