What is “Big Data”?
Big. Data.
Terabytes and petabytes of data.
Along with big data, we’re seeing model sizes grow exponentially.
So, how do you train them?
More and more GPUs (or TPUs) are the answer.
Training GPT-3 on A100 GPUs would take roughly 1,024 GPUs running for 34 days: a pretty penny.
But how?
One common way to approach training large models is Microsoft’s DeepSpeed library. It provides tools for training such big models, notably ZeRO, and it can be used from familiar frameworks:
fastai
PyTorch
and accelerate