What If We Recaption Billions of Web Images with LLaMA-3?

Rejuvenating image-GPT as Strong Visual Representation Learners