What is the fastai API?

Lesson 1

Three Levels

The fastai data API consists of three levels:

If you ask me, the three API levels go {Task}DataLoaders.from_{source} -> DataBlock -> Datasets.

If you ask Jeremy, they go DataBlock -> Datasets -> raw PyTorch.

I don’t consider raw PyTorch part of the data API.

{Task}DataLoaders

  • Highest level of the API, but not much flexibility
  • Debatable whether it counts as one of the three levels, but I consider it to be

Consists of four task-specific classes, each with its own set of class constructors:

  • ImageDataLoaders
      • from_folder
      • from_path_func
      • from_name_func
      • from_path_re
      • from_name_re
      • from_df
      • from_csv
      • from_lists
  • SegmentationDataLoaders
      • from_label_func
  • TextDataLoaders
      • from_folder
      • from_df
      • from_csv
  • TabularDataLoaders
      • from_df
      • from_csv

You should graduate from these well before you finish this course.
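
For example, here is a minimal sketch of the highest level in action, assuming the Oxford-IIIT Pets dataset that ships with fastai (where a filename starting with an uppercase letter means the image is a cat):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)

# One factory call builds the training and validation DataLoaders together
dls = ImageDataLoaders.from_name_func(
    path,
    get_image_files(path/"images"),            # the items: every image file path
    valid_pct=0.2, seed=42,                    # hold out a random 20% for validation
    label_func=lambda f: f.name[0].isupper(),  # label from the filename: True = cat
    item_tfms=Resize(224),                     # make every image the same size
)
```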

DataBlock

  • Medium-level API (debatable)
  • Medium flexibility
  • The building blocks of the framework
  • What we will focus on
Warning
  • It’s difficult to implement custom bits with this class

Blocks include:

  • ImageBlock
  • MaskBlock
  • PointBlock
  • BBoxBlock
  • TextBlock
  • TabularPandas
  • CategoryBlock
  • MultiCategoryBlock
  • RegressionBlock
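
Each of these is (or returns) a TransformBlock describing one slot in the (input, target) tuple. A small sketch of how blocks pair up for different tasks (the codes list here is a hypothetical set of mask classes):

```python
from fastai.vision.all import *

codes = ["background", "road", "car"]  # hypothetical segmentation class names

# Each tuple declares the (input, target) types for a task
tasks = {
    "classification": (ImageBlock, CategoryBlock),
    "segmentation":   (ImageBlock, MaskBlock(codes=codes)),
    "keypoints":      (ImageBlock, PointBlock),
    "regression":     (ImageBlock, RegressionBlock),
}
```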

TransformBlock

  • Lowest level
  • Highest flexibility
  • Hardest to learn due to so much magic
  • Consists of the “groundwork” for all of the other wrappers, such as:

  • PILBase
  • PILImage
  • PILImageBW
  • PILMask
  • TensorPoint
  • TensorBBox
  • LabelBBox
  • PointScaler
  • BBoxLabeler
  • Tabular
  • TensorText
  • LMTensorText
  • Tokenizer
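
To give a feel for this level, here is a minimal sketch that wires the pieces together by hand with Datasets, again assuming the Pets dataset (the label regex is an assumption based on its filename scheme):

```python
from fastai.vision.all import *
import re

path = untar_data(URLs.PETS)
items = get_image_files(path/"images")

def label_func(fname):
    # assumed naming scheme: 'Abyssinian_1.jpg' -> 'Abyssinian'
    return re.match(r"^(.+)_\d+", fname.name).group(1)

# One transform pipeline per tuple element: x opens the image, y builds the label
tfms = [[PILImage.create],
        [label_func, Categorize()]]

splits = RandomSplitter(valid_pct=0.2, seed=42)(items)
dsets = Datasets(items, tfms, splits=splits)

# Item and batch transforms are attached when building the DataLoaders
dls = dsets.dataloaders(bs=64,
                        after_item=[Resize(224), ToTensor()],
                        after_batch=[IntToFloatTensor(), *aug_transforms()])
```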

How they all intertwine

1. Define a set of blocks for your problem.
2. Write how to extract the information needed for each block from the source.
3. Create a splitting function that takes in some data and returns a tuple of indices.
4. List a set of item and batch transforms to be applied to the data.
5. Call the dataloaders function and pass in a batch size.

[Diagram: the DataBlock pipeline. Define blocks: blocks = (ImageBlock, CategoryBlock) -> Getters: def get_x(…): return x, def get_y(…): return y -> Split the data: splitter = RandomSplitter(valid_pct=0.2) -> Label and augment the data: item_tfms=[Resize(224)], batch_tfms=[*aug_transforms(), Normalize()] -> DataLoaders: .dataloaders(bs=batch_size)]
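
In code, those five steps might look like this for Pets, a sketch assuming the same filename-based labels as above:

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)

pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),                     # 1. define the blocks
    get_items=get_image_files,                              # 2. how to collect the items
    get_y=using_attr(RegexLabeller(r"^(.+)_\d+"), "name"),  # 2. how to extract the label
    splitter=RandomSplitter(valid_pct=0.2, seed=42),        # 3. train/validation split
    item_tfms=[Resize(224)],                                # 4. item transforms (CPU)
    batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)],  # 4. batch transforms (GPU)
)

dls = pets.dataloaders(path/"images", bs=64)                # 5. build the DataLoaders
```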

What are our options?

  • Getters
      • get_x
      • get_y
      • get_items
      • n_inp
  • Splitters
      • ColSplitter
      • EndSplitter
      • FileSplitter
      • FuncSplitter
      • GrandparentSplitter
      • IndexSplitter
      • MaskSplitter
      • RandomSplitter
      • RandomSubsetSplitter
      • TrainTestSplitter
  • Labellers
      • Categorize
      • ColReader
      • MultiCategorize
      • RegexLabeller
      • RegressionSetup
      • parent_label
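
A couple of these in isolation, assuming an ImageNet-style folder layout where class folders live under train/ and valid/ directories:

```python
from fastai.vision.all import *

# parent_label reads the class from the immediate parent folder
parent_label("train/cat/001.jpg")                     # -> 'cat'

# GrandparentSplitter assigns items to train/valid by grandparent folder name
splitter = GrandparentSplitter(train_name="train", valid_name="valid")
splitter(["train/cat/001.jpg", "valid/dog/002.jpg"])  # -> ([0], [1])
```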

Building some DataLoaders

When we are ready to create our DataLoaders, we pass the items to use, a batch_size, and the transforms to be performed into the DataLoader constructor:

[Diagram: a DataLoader is built from items (or a path), a batch size (bs), and transforms; the transforms are split into item transforms (run on the CPU) and batch transforms (run on the GPU)]

Item transforms happen first and are used to prepare a batch. This includes transformations such as converting to a torch.tensor and ensuring that images, text, or tabular data can be collated together (made the same size/shape).

Batch transforms are performed on an entire subset of the data at once, as one big matrix, after every item has been through the item transforms. Examples include further resizing, normalizing the data, and other data augmentation. Because they operate on whole batches at once, typically on the GPU, they are many times faster.
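
You can see where each transform landed by inspecting the pipelines on the DataLoaders (using the dls built in the earlier Pets example):

```python
# Item transforms run per sample on the CPU before collation
print(dls.train.after_item)

# Batch transforms run on the collated batch, on the GPU when one is available
print(dls.train.after_batch)
```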

Now let’s get back to the lesson.