Training Big Models

Zachary Mueller

Big Models and Big Data

What is “Big Data?”

Big. Data.

Terabytes and petabytes of data.

Big Models and Big Data

Along with big data, we’re seeing model sizes grow exponentially.

  • GPT-2: 1.5 billion params
  • GPT-3: 175 billion params

So, how do you train them?

It always comes down to compute

More and more GPUs (or TPUs) are the answer.

Training GPT-3 on A100s would take roughly 1,024 GPUs running for 34 days. A pretty penny.

But how?

DeepSpeed

One common way to approach training large models is to use Microsoft’s DeepSpeed library. It provides tools for training such big models, notably ZeRO.

DeepSpeed - ZeRO

  • Z - Zero
  • R - Redundancy
  • O - Optimizer
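ZeRO (the Zero Redundancy Optimizer) shards training state across data-parallel GPUs instead of replicating it on every one: stage 1 partitions the optimizer states, stage 2 also partitions the gradients, and stage 3 additionally partitions the model parameters. Below is a rough sketch of the kind of DeepSpeed config that enables this; the values are illustrative placeholders rather than tuned settings.

```python
# Illustrative DeepSpeed config enabling ZeRO stage 2 (placeholder values).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},  # mixed-precision training
    "zero_optimization": {
        "stage": 2,  # shard optimizer states and gradients across GPUs
        "offload_optimizer": {"device": "cpu"},  # optional: push optimizer state to CPU RAM
    },
}
```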

DeepSpeed and ZeRO Continued

How can you use this?

  • Stepping away from fastai
  • Into the realm of PyTorch and accelerate (a minimal sketch is shown below)
  • Check out the tutorial here
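Below is a minimal sketch of what that can look like with Accelerate’s DeepSpeed integration, assuming `deepspeed` is installed and the script is run with `accelerate launch`. The tiny model, data, and hyperparameters are placeholders, not the tutorial’s exact code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator, DeepSpeedPlugin

# Placeholder model and data; in practice this is where your large model lives.
model = torch.nn.Linear(512, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 512), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)
loss_fn = torch.nn.CrossEntropyLoss()

# Enable ZeRO stage 2 through Accelerate's DeepSpeed plugin
# (the same settings can also be chosen interactively via `accelerate config`).
plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(deepspeed_plugin=plugin)

# prepare() wraps everything so DeepSpeed handles sharding and communication.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward() so DeepSpeed can intervene
    optimizer.step()
```

The training loop itself is plain PyTorch; letting `prepare()` wrap the objects and swapping `loss.backward()` for `accelerator.backward(loss)` is what lets the same script scale from a single GPU to a ZeRO-sharded multi-GPU run.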