Microsoft Phi-3.5 launched: A more competitive AI model

Microsoft has officially announced the release of a new series of small language models called Phi-3.5, comprising three variants: Phi-3.5-vision, Phi-3.5-MoE, and Phi-3.5-mini.

These lightweight AI language models are trained on synthetic data and filtered public web data, and support a 128K-token context length. All of the new Phi-3.5 models are available on Hugging Face under the MIT license.


Phi-3.5-MoE: A groundbreaking combination

Phi-3.5-MoE stands out as the first model in Microsoft's Phi family to use Mixture of Experts (MoE) technology. This 16 x 3.8 billion parameter model activates only 6.6 billion parameters per token and was trained on 4.9T tokens using 512 H100 GPUs. On popular AI benchmarks, Phi-3.5-MoE outperforms Llama-3.1 8B, Gemma-2 9B, and Gemini 1.5 Flash, and comes close to the current leading model in its class, GPT-4o mini.
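The core MoE idea, routing each token to a small subset of expert networks so only a fraction of the total parameters are active, can be sketched with a toy example. This is an illustrative sketch only, not Microsoft's implementation; the sizes and the top-2 routing choice are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 16  # Phi-3.5-MoE uses 16 experts
TOP_K = 2         # experts activated per token (assumption for illustration)
D_MODEL = 8       # toy hidden size, far smaller than a real model

# Each "expert" is a small feed-forward weight matrix; a router scores them.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ router                    # one router score per expert
    top = np.argsort(logits)[-TOP_K:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Only the selected experts run, so compute (and "active" parameters)
    # scale with TOP_K rather than NUM_EXPERTS.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
```

Per token, only `TOP_K / NUM_EXPERTS` of the expert weights do any work, which is how a 16 x 3.8B model can run with roughly 6.6B active parameters.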

Phi-3.5-mini: Compact and Powerful

Phi-3.5-mini is a 3.8 billion parameter model that outperforms Llama-3.1 8B and Mistral 7B, and even competes with Mistral NeMo 12B. It was trained on 3.4T tokens using 512 H100 GPUs. With only 3.8B parameters, the model is competitive on multilingual tasks against LLMs with far more active parameters. Additionally, Phi-3.5-mini supports a 128K context length, while its main competitor Gemma-2 supports only 8K.

Phi-3.5-vision: Enhanced multi-frame image processing capabilities

Phi-3.5-vision is a 4.2 billion parameter model trained on 500B tokens using 256 A100 GPUs. The model now supports multi-frame image understanding and reasoning, with improved performance on MMMU (from 40.2 to 43.0), MMBench (from 80.5 to 81.9), and the TextVQA benchmark (from 70.9 to 72.0).

Microsoft plans to share more details about the Phi-3.5 line this month, primarily showcasing advances in model performance and capabilities. With their focus on lightweight design and multimodal understanding, the Phi-3.5 family of models can be applied widely across a variety of AI applications.
