Vision Transformers

Source Paper: https://arxiv.org/pdf/2010.11929v2

You might have already heard of the Transformer architecture, thanks to the popular GPT models that seem to be all the craze these days. Some of these models can understand images as well. This is made possible with the help of Vision Transformers (ViT). A popular example is OpenAI’s CLIP model, whose image encoder can be a Vision Transformer.

CLIP Paper: https://arxiv.org/pdf/2103.00020