06: Learning Transferable Visual Models From Natural Language Supervision (CLIP)
On this page
1. Preliminary
2. CLIP
3. Summary
4. Key Concepts
5. Q & A
6. Related resource & Further Reading
Multi Modality
Representation Learning
A contrastive learning framework that aligns images with natural-language text: by training a unified representation on massive collections of image-text pairs, it produces a vision model with strong zero-shot generalization.
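The core of this objective is a symmetric contrastive (InfoNCE-style) loss: within a batch, the i-th image and i-th caption form the positive pair, and every other pairing serves as a negative. A minimal NumPy sketch of that loss (the function name, embedding inputs, and default temperature here are illustrative, not the paper's exact implementation):

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: [N, D] arrays; row i of each is a positive pair.
    """
    # L2-normalize so dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # [N, N] similarity matrix, sharpened by the temperature.
    logits = img @ txt.T / temperature

    # Cross-entropy with targets on the diagonal, in both directions
    # (image-to-text over rows, text-to-image over columns), then averaged.
    loss_i2t = -np.mean(np.diag(log_softmax(logits, axis=1)))
    loss_t2i = -np.mean(np.diag(log_softmax(logits, axis=0)))
    return (loss_i2t + loss_t2i) / 2
```

When the two embedding sets are perfectly aligned (each image closest to its own caption), the loss approaches zero; shuffling the captions against the images drives it up, which is the pressure that pulls matched pairs together in the shared space.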