```mermaid
graph LR
    A{"🤗 Accelerate"}
    A --> B["Launching<br>Interface"]
    A --> C["Training Library"]
    A --> D["Big Model<br>Inference"]
```
Launching scripts in different environments is complicated: a single GPU, multiple GPUs, DeepSpeed, TPUs, and more all come with their own launchers and flags.

But it doesn't have to be:
A single command to launch with DeepSpeed, Fully Sharded Data Parallelism, across single and multiple CPUs and GPUs, and to train on TPUs too!
Generate a device-specific configuration through `accelerate config`.

Or don't! `accelerate config` doesn't have to be run.
Instead of:

```bash
torchrun --nnodes=1 --nproc_per_node=2 script.py
```

you can run:

```bash
accelerate launch --multi_gpu --num_processes=2 script.py
```

A quick default configuration can be made too:
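For instance (a sketch assuming a reasonably recent 🤗 Accelerate release; check `accelerate config --help` if the subcommand is unavailable), a default single-machine configuration can be written without the interactive questionnaire:

```bash
# Writes a default config file (typically under ~/.cache/huggingface/accelerate/)
accelerate config default

# Later launches pick up that saved configuration
accelerate launch script.py
```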
With the `notebook_launcher` it's also possible to launch code directly from your Jupyter environment!
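A minimal sketch (the trivial training function below is an illustrative assumption, not part of the original example):

```python
import torch
from accelerate import Accelerator, notebook_launcher

def training_function():
    accelerator = Accelerator()
    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model, optimizer = accelerator.prepare(model, optimizer)
    # ... build dataloaders and run the usual training loop here ...
    accelerator.print(f"Training on {accelerator.device}")

# Spawns two processes (e.g. two GPUs) directly from a notebook cell.
notebook_launcher(training_function, args=(), num_processes=2)
```

Note that on a multi-GPU machine no CUDA code should run in the notebook before `notebook_launcher` is called, since the spawned processes cannot fork an already-initialized CUDA context.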
Okay, will `accelerate launch` make `do_the_thing.py` use all my GPUs magically?

Not quite: `accelerate launch` takes care of launching a Python script in various distributed environments, while 🤗 Accelerate ensures the same code can be run on a CPU or GPU, on several of them, and on TPUs, with only a few additions to the training loop:

```python
from accelerate import Accelerator

accelerator = Accelerator()

dataloader, model, optimizer, scheduler = accelerator.prepare(
    dataloader, model, optimizer, scheduler
)

for batch in dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    # device placement is handled by prepare(), so these are no longer needed:
    # inputs = inputs.to(device)
    # targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    scheduler.step()
```

What all happened in `Accelerator.prepare`?
- The `Accelerator` looked at the configuration
- The `dataloader` was converted into one that can dispatch each batch onto a separate GPU
- The `model` was wrapped with the appropriate DDP wrapper from either `torch.distributed` or `torch_xla`
- The `optimizer` and `scheduler` were both converted into an `AcceleratedOptimizer` and an `AcceleratedScheduler`, which know how to handle any distributed scenario
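As a rough, self-contained sketch of that wrapping (the toy model, data, and prints below are illustrative assumptions, not part of the original example), you can inspect what `prepare` hands back; the exact wrapper types depend on how the code was launched:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

dataset = TensorDataset(torch.randn(8, 4), torch.randn(8, 1))
dataloader = DataLoader(dataset, batch_size=2)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)

dataloader, model, optimizer, scheduler = accelerator.prepare(
    dataloader, model, optimizer, scheduler
)

# Under a multi-GPU launch the model is typically wrapped in DistributedDataParallel;
# the optimizer and scheduler come back as AcceleratedOptimizer / AcceleratedScheduler.
print(type(dataloader), type(model), type(optimizer), type(scheduler))
print(accelerator.device)
```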
To use the `notebook_launcher` and 🤗 Accelerate with fastai at the same time requires a few steps:

- Move the `DataLoaders` creation inside the train function
- Use the `distrib_ctx` context manager fastai provides

Here it is in code, based on the fastai distributed app examples:

```python
from fastai.vision.all import *
from fastai.distributed import *
path = untar_data(URLs.PETS)/'images'
def train():
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fine_tune(1)

notebook_launcher(train, num_processes=2)
```
The key parts to remember are:
- Keep the `DataLoaders` creation inside the training function
- Use `notebook_launcher` to run the training function after everything is complete