06: Learning Transferable Visual Models From Natural Language Supervision (CLIP)

Multi Modality
Representation Learning
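A contrastive learning framework that aligns images with natural-language text. By training a unified representation on a massive collection of image-text pairs, it yields a vision model with strong zero-shot generalization.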
Author

Yuyang Zhang
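The description above summarizes CLIP's training objective. Within a batch of N image-text pairs, the N matching pairs are positives and the other N² − N pairings are negatives. A symmetric cross-entropy over the scaled cosine-similarity matrix pulls the positives together and pushes the negatives apart. The code below is a minimal pure-Python sketch that makes three assumptions. First, embeddings have already been produced by the image and text encoders, which are omitted. Second, the temperature is fixed at 0.07; in CLIP it is a learned parameter, and 0.07 is its initial value. Third, all function names are hypothetical. The same normalized embeddings also give zero-shot classification: embed one text prompt per class and pick the class whose text embedding is most similar to the image embedding.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Project onto the unit sphere so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(image_emb: List[Vector], text_emb: List[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    # logits[i][j] = cos(image_i, text_j) / temperature (fixed here, learned in CLIP).
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    scale = 1.0 / temperature
    return [[scale * sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def cross_entropy(row: List[float], target: int) -> float:
    # Numerically stable -log softmax(row)[target].
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return log_sum_exp - row[target]


def clip_loss(image_emb: List[Vector], text_emb: List[Vector],
              temperature: float = 0.07) -> float:
    # Symmetric contrastive loss: pair i is the positive for both row i and column i.
    logits = similarity_logits(image_emb, text_emb, temperature)
    n = len(logits)
    # Image -> text: classify each image (row) over all texts.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: classify each text (column) over all images.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def zero_shot_predict(image_vec: Vector, class_text_embs: List[Vector]) -> int:
    # Return the index of the class prompt whose embedding is closest to the image.
    img = l2_normalize(image_vec)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(clip_loss(images, matched_texts))  # near zero: pairs are aligned
    print(clip_loss(images, swapped_texts))  # large: every positive is mismatched
    print(zero_shot_predict([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]]))
```

With aligned toy embeddings the loss is close to zero. When the captions are swapped, every positive pair sits off the diagonal and the loss becomes large. Averaging the two directions keeps the objective symmetric, so neither encoder is privileged.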

1 Preliminary

2 CLIP

3 Summary

4 Key Concepts

5 Q & A
