Meta has introduced Llama 3.2, its first open-source AI model that can process both images and text. This upgrade enables developers to create advanced applications, such as augmented reality apps, visual search engines, and document analysis tools.
With this release, Meta is catching up to competitors like OpenAI and Google, which shipped multimodal models last year. Llama 3.2 includes two vision models (11 billion and 90 billion parameters) and two lightweight text-only models (1 billion and 3 billion parameters), with the smaller models optimized to run on mobile hardware such as Qualcomm and MediaTek chips. The older Llama 3.1, released in July, remains relevant: its largest 405-billion-parameter version is focused on text generation.
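In practice, the new vision models are used the way L1-style multimodal assistants typically are: a developer passes an image together with a text prompt and the model answers about both. The snippet below is a minimal sketch of that pattern using the Hugging Face transformers library (4.45 or later) with the 11-billion-parameter instruct checkpoint; the image URL and the question are placeholder assumptions for illustration, and access to the gated checkpoint is assumed rather than guaranteed.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed checkpoint ID on the Hugging Face Hub (gated; access must be requested)
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image URL; any local or remote image would work here
image = Image.open(requests.get("https://example.com/receipt.jpg", stream=True).raw)

# Build a chat-style prompt that pairs the image with a text question
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is the total amount on this receipt?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

# Generate a short answer grounded in both the image and the question
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```

The same pattern, swapped to the 90-billion-parameter checkpoint, would apply to heavier document-analysis or visual-search workloads, while the 1B and 3B text-only models are aimed at on-device use rather than this kind of server-side multimodal inference.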