Lesson Video:
Three Levels
The data API consists of three levels. If you ask me, the three levels go {Task}DataLoaders.from_{source} -> DataBlock -> Datasets. If you ask Jeremy, it goes DataBlock -> Datasets -> raw PyTorch; I don’t consider raw PyTorch part of the data API.
{Task}DataLoaders:
- Highest level of the API, but not much flexibility
- Debatable, but I consider it to be part of the API
- Consists of:
  - ImageDataLoaders
  - SegmentationDataLoaders
  - TextDataLoaders
  - TabularDataLoaders
- Each has various class constructors
- You should graduate from these well before you finish this course
DataBlock:
- Medium-level API (debatable)
- Medium flexibility
- Building blocks of the framework
- What we will focus on
- It’s difficult to implement custom bits with this class
Datasets:
- Lowest level
- Highest flexibility
- Hardest to learn due to so much magic
- Consists of the “groundwork” for all the other wrappers
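To make the highest level concrete before moving on, here is a minimal sketch using one of the ImageDataLoaders class constructors. The Oxford-IIIT Pets dataset and the filename regex are assumptions made for this example; any dataset whose labels are encoded in the filenames works the same way.

```python
from fastai.vision.all import *

# Highest-level API: a task-specific DataLoaders class plus one of its
# class constructors. The dataset choice (Pets) is an assumption for this example.
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path),
    pat=r'(.+)_\d+.jpg$',          # Pets filenames look like 'Abyssinian_1.jpg'
    item_tfms=Resize(224), bs=64)
```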
How they all intertwine
- Define a set of blocks for your problem
- Write how to extract the information needed for each Block from the source
- Create a splitting function that takes in some data and returns a tuple of indices
- List a set of item and batch transforms to be applied to the data
- Call the dataloaders function and pass in a batch size
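Putting those five steps together, a minimal DataBlock sketch might look like the following. Imagenette and its train/val folder layout, along with the names dblock and dls, are assumptions for this example.

```python
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)   # assumption: train/val folders of class subfolders

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),              # 1. the blocks for the problem
    get_items=get_image_files,                       # 2. how to collect items...
    get_y=parent_label,                              #    ...and extract each label
    splitter=GrandparentSplitter(valid_name='val'),  # 3. how to split into train/valid indices
    item_tfms=Resize(160),                           # 4. item transforms
    batch_tfms=aug_transforms(),                     #    batch transforms
)

dls = dblock.dataloaders(path, bs=64)                # 5. build the DataLoaders with a batch size
```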
What are our options?
Getters and inputs:
- get_x
- get_y
- get_items
- n_inp

Splitters:
- ColSplitter
- EndSplitter
- FileSplitter
- FuncSplitter
- GrandparentSplitter
- IndexSplitter
- MaskSplitter
- RandomSplitter
- RandomSubsetSplitter
- TrainTestSplitter

Labellers and target transforms:
- Categorize
- ColReader
- MultiCategorize
- RegexLabeller
- RegressionSetup
- parent_label
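To get a feel for what a few of these options return on their own, here is a small sketch; the toy file names are made up purely for illustration.

```python
from pathlib import Path
from fastai.data.transforms import RandomSplitter, RegexLabeller, parent_label

# Made-up item list, just to show the shape of what each helper returns
items = [Path('images/cat_1.jpg'), Path('images/dog_2.jpg'),
         Path('images/cat_3.jpg'), Path('images/dog_4.jpg')]

# A splitter is a function: items in, (train indices, valid indices) out
splitter = RandomSplitter(valid_pct=0.25, seed=42)
train_idxs, valid_idxs = splitter(items)
print(len(train_idxs), len(valid_idxs))   # 3 1

# Labellers map one item to its target
labeller = RegexLabeller(r'/([a-z]+)_\d+\.jpg$')
print(labeller(items[0]))                 # 'cat'
print(parent_label(items[0]))             # 'images' (labels by parent folder name)
```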
Building some DataLoaders
When we are ready to create our DataLoaders, we pass in the items to use, a batch_size, and the transforms to be performed to the DataLoader constructor:
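For example, here is one hedged sketch of that call at the Datasets level (the same idea applies when calling dataloaders on a DataBlock); Imagenette's folder layout is an assumption for the example.

```python
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)
items = get_image_files(path)
splits = GrandparentSplitter(valid_name='val')(items)

# x pipeline opens the image, y pipeline reads and categorizes the label
dsets = Datasets(items,
                 tfms=[[PILImage.create], [parent_label, Categorize()]],
                 splits=splits)

dls = dsets.dataloaders(
    bs=64,                                               # the batch size
    after_item=[Resize(160), ToTensor()],                # item transforms
    after_batch=[IntToFloatTensor(), *aug_transforms()]  # batch transforms
)
```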
Item transforms happen first and are used to prepare a batch. This includes transformations such as converting to a torch.tensor and ensuring that images, text, or tabular data can be collated together (made the same size/shape).

Batch transforms are performed on an entire batch of data at once (after every item has been through the item transforms), as one big tensor. Examples include further resizing, normalizing the data, and other data augmentation. Because they operate on whole batches, typically on the GPU, they are many times faster.
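As a quick illustration of that split, the sketch below (again assuming Imagenette) peeks at the two pipelines a fastai DataLoader keeps and grabs one collated batch.

```python
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)
dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(160),
                                   batch_tfms=aug_transforms(), bs=8)

print(dls.train.after_item)    # per-item pipeline, e.g. Resize -> ToTensor
print(dls.train.after_batch)   # per-batch pipeline, e.g. IntToFloatTensor -> augmentations

xb, yb = dls.one_batch()
print(xb.shape)                # one collated tensor, e.g. torch.Size([8, 3, 160, 160])
```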