05: ViViT: A Video Vision Transformer (ViViT)
Computer Vision
Transformer
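ViViT: A Video Vision Transformer proposes a systematic extension of the Vision Transformer to video modeling. It models the video sequence directly, using spatiotemporal factorization and efficient attention design, and achieves strong performance and good scalability on tasks such as video classification.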
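To make the motivation for spatiotemporal factorization concrete, the sketch below counts query-key pairs per attention layer for two schemes: joint attention over all video tokens, and attention split into a spatial pass (within each frame) followed by a temporal pass (across frames at each spatial position). The function names and the token counts are assumptions chosen for illustration, not values taken from this page, and the count ignores heads, projections, and MLP cost.

```python
def joint_attention_pairs(n_t: int, n_s: int) -> int:
    """Query-key pairs when all n_t * n_s tokens attend to each other."""
    n = n_t * n_s
    return n * n


def factorized_attention_pairs(n_t: int, n_s: int) -> int:
    """Query-key pairs when attention is split into two passes.

    Spatial pass: each of the n_t frames attends over its own n_s tokens.
    Temporal pass: each of the n_s positions attends over its n_t time steps.
    """
    spatial = n_t * n_s * n_s
    temporal = n_s * n_t * n_t
    return spatial + temporal


# Hypothetical sizes: 16 temporal positions, 14 x 14 = 196 spatial tokens.
n_t, n_s = 16, 14 * 14
joint = joint_attention_pairs(n_t, n_s)
factorized = factorized_attention_pairs(n_t, n_s)
print(f"joint: {joint}, factorized: {factorized}, ratio: {joint / factorized:.2f}")
```

Under these assumed sizes the factorized scheme evaluates roughly an order of magnitude fewer pairs, and the gap widens as the number of frames or the spatial resolution grows.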
1 Preliminary
2 ViViT
2.1 Experiment
3 Summary
4 Key Concepts
5 Q & A
6 Related Resources & Further Reading