16: Scalable Diffusion Models with Transformers (DiT)
Generative Model
Diffusion Model
A generative approach that brings the Transformer architecture into diffusion models: by serializing the input into a token sequence and training at scale, it achieves stronger generation quality and better scalability under large-model, large-data settings.
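As a hedged sketch of the "serialized modeling" idea (function names and shapes here are illustrative, not taken from the paper's code): DiT first flattens an image-like latent into a sequence of patch tokens, ViT-style, before applying a standard Transformer. A minimal patchify step might look like:

```python
import numpy as np

def patchify(latent: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a (C, H, W) latent into a (num_tokens, C * p * p) sequence.

    Illustrative only: mirrors the ViT-style "image -> token sequence"
    serialization that DiT applies to diffusion latents.
    """
    c, h, w = latent.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "latent must tile evenly into patches"
    # (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p) -> (tokens, C*p*p)
    tokens = (
        latent.reshape(c, h // p, p, w // p, p)
        .transpose(1, 3, 0, 2, 4)
        .reshape((h // p) * (w // p), c * p * p)
    )
    return tokens

# Example: a 4-channel 32x32 latent with patch size 2
# yields 16*16 = 256 tokens, each of dimension 4*2*2 = 16.
latent = np.zeros((4, 32, 32), dtype=np.float32)
print(patchify(latent, 2).shape)  # (256, 16)
```

Smaller patch sizes give longer token sequences (and more compute), which is one of the scaling knobs the paper varies.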
# Preliminary