Detailed Course Outline
Introduction
- Meet the instructor.
- Create an account at courses.nvidia.com/join.
 
Stochastic Gradient Descent and the Effects of Batch Size
- Learn the significance of stochastic gradient descent when training on multiple GPUs (a minimal sketch follows this list).
  - Understand the issues with sequential single-threaded data processing and the theory behind speeding up applications with parallel processing.
  - Understand loss functions, gradient descent, and stochastic gradient descent (SGD).
  - Understand the effect of batch size on accuracy and training time, with an eye toward its use on multi-GPU systems.
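The following is a minimal PyTorch sketch, not taken from the workshop materials, of where batch size enters a stochastic gradient descent update: each step computes the loss and gradient on a sampled minibatch rather than the full dataset, so larger batches give a lower-variance gradient estimate at a higher per-step cost. The toy data, model, and the `batch_size` and `lr` values are illustrative assumptions.

```python
# Minimal sketch (not workshop code): one SGD training loop on a toy
# linear-regression problem, showing where the batch size enters the
# gradient estimate.
import torch

torch.manual_seed(0)

# Synthetic data: y = 3x + noise
X = torch.randn(1024, 1)
y = 3 * X + 0.1 * torch.randn(1024, 1)

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

batch_size = 32  # larger batches lower gradient variance but cost more per step
for step in range(100):
    idx = torch.randint(0, X.shape[0], (batch_size,))  # sample a minibatch
    loss = loss_fn(model(X[idx]), y[idx])               # loss on the minibatch only
    optimizer.zero_grad()
    loss.backward()                                     # gradient of the minibatch loss
    optimizer.step()                                    # w <- w - lr * grad
```

On multi-GPU systems the effective batch size is typically the per-GPU batch size multiplied by the number of GPUs, which is why batch-size effects matter for the scaling topics later in this outline.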
 
Training on Multiple GPUs with PyTorch Distributed Data Parallel (DDP)
- Learn to convert single-GPU training to multi-GPU training using PyTorch Distributed Data Parallel (a minimal sketch follows this list).
  - Understand how DDP coordinates training among multiple GPUs.
  - Refactor single-GPU training programs to run on multiple GPUs with DDP.
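Below is a minimal sketch, not the workshop's solution code, of the kind of refactor this module covers: initializing a process group, sharding data with `DistributedSampler`, and wrapping the model in `DistributedDataParallel`. It assumes a `torchrun` launch and uses a placeholder dataset and model.

```python
# Minimal DDP sketch, intended to be launched with:
#   torchrun --nproc_per_node=<num_gpus> train.py
# The dataset and model are placeholders, not the workshop's.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset standing in for the real one.
    dataset = TensorDataset(torch.randn(4096, 10), torch.randn(4096, 1))
    sampler = DistributedSampler(dataset)        # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()    # DDP syncs gradients during backward
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched this way, each process drives one GPU, and DDP all-reduces gradients during the backward pass so every replica applies the same update.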
 
Maintaining Model Accuracy when Scaling to Multiple GPUs
- Understand and apply key algorithmic considerations to retain accuracy when training on multiple GPUs (a minimal sketch follows this list).
  - Understand what might cause accuracy to decrease when parallelizing training across multiple GPUs.
  - Learn techniques for maintaining accuracy when scaling training to multiple GPUs.
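One widely used technique in this area, sketched below, is the linear learning-rate scaling rule with warmup (Goyal et al., "Accurate, Large Minibatch SGD"): when the effective batch size grows by a factor of k, for example by adding GPUs, the base learning rate is scaled by k and ramped up over the first few epochs. This is an illustrative assumption about the kind of technique covered, not the workshop's exact recipe; the hyperparameter names and values are placeholders.

```python
# Minimal sketch (not workshop code) of linear learning-rate scaling with warmup.
import torch

world_size = 4                       # e.g., number of GPUs participating in DDP
base_lr = 0.1                        # learning rate tuned for single-GPU training
scaled_lr = base_lr * world_size     # linear scaling rule
warmup_epochs = 5
total_epochs = 30

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr)

def lr_factor(epoch: int) -> float:
    # Ramp from base_lr up to scaled_lr during warmup, then hold.
    if epoch < warmup_epochs:
        return (base_lr + (scaled_lr - base_lr) * epoch / warmup_epochs) / scaled_lr
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(total_epochs):
    # ... one training epoch (forward, backward, optimizer.step()) would run here ...
    print(f"epoch {epoch:2d}: lr = {scheduler.get_last_lr()[0]:.4f}")
    scheduler.step()                 # advance the warmup / schedule once per epoch
```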
 
Workshop Assessment
- Use what you have learned during the workshop: complete the workshop assessment to earn a certificate of competency.
 
Final Review
- Review key learnings and wrap up with remaining questions.
- Take the workshop survey.