Lecture 07 & 08: Parallelism Training
Lectures 07 and 08 introduce parallel training methods for deep learning, covering techniques such as Data Parallelism, Model Parallelism, ZeRO, and Pipeline Parallelism. The lectures focus on the principles behind each approach, how they are implemented, and how they are applied when training large-scale models. Personally, I find these two lectures very important: understanding these parallel training techniques is essential for working with large-scale deep learning models.
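As a quick orientation to the simplest of these techniques, here is a minimal sketch of data parallelism using PyTorch's `DistributedDataParallel`: every rank holds a full model replica, processes its own shard of the batch, and gradients are averaged across ranks during the backward pass. The model, batch shapes, and training loop are illustrative placeholders, not code from the lecture.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; any nn.Module works the same way.
    model = torch.nn.Linear(1024, 1024).to(local_rank)
    # DDP keeps a full replica per rank and all-reduces gradients in backward().
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for step in range(10):
        # Stand-in for a per-rank data shard; a real run would use a
        # DataLoader with DistributedSampler so ranks see disjoint data.
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randn(32, 1024, device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()  # gradient all-reduce across ranks happens here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # Launch with, e.g.: torchrun --nproc_per_node=4 ddp_sketch.py
    main()
```

ZeRO and pipeline parallelism change what each rank stores (optimizer state, parameter shards, or layer ranges) rather than this outer training loop; plain data parallelism is the baseline they improve upon.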