New models added to the Phi-3 family, available on Microsoft Azure

Read more announcements from Azure at Microsoft Build 2024: “New ways Azure helps you build transformational AI experiences” and “The new era of compute powering Azure AI solutions.”


At Microsoft Build 2024, we are excited to add new models to the Phi-3 family of small, open models developed by Microsoft. We are introducing Phi-3-vision, a multimodal model that brings together language and vision capabilities. You can try Phi-3-vision today.

Phi-3-small and Phi-3-medium, announced earlier, are now available on Microsoft Azure, giving developers models for generative AI applications that require strong reasoning while operating under limited compute or in latency-bound scenarios. Lastly, the previously available Phi-3-mini, as well as Phi-3-medium, are now also available through Azure AI’s models-as-a-service offering, allowing users to get started quickly and easily.

The Phi-3 family

Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks. They are trained on high-quality data, as explained in “Tiny but mighty: The Phi-3 small language models with big potential.” The availability of Phi-3 models expands the selection of high-quality models for Azure customers, offering more practical choices as they compose and build generative AI applications.

There are four models in the Phi-3 model family; each model is instruction-tuned and developed in accordance with Microsoft’s responsible AI, safety, and security standards to ensure it’s ready to use off-the-shelf.

  • Phi-3-vision is a 4.2B parameter multimodal model with language and vision capabilities.
  • Phi-3-mini is a 3.8B parameter language model, available in two context lengths (128K and 4K).
  • Phi-3-small is a 7B parameter language model, available in two context lengths (128K and 8K).
  • Phi-3-medium is a 14B parameter language model, available in two context lengths (128K and 4K).

Find all Phi-3 models on Azure AI and Hugging Face.
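As a rough illustration only, the lineup above can be captured in a small lookup table for choosing a model by modality or context length. This is a sketch built from the figures listed in this post. The names `Phi3Model`, `PHI3_FAMILY`, `smallest_model`, and `models_with_context` are hypothetical helpers, not part of any Azure or Hugging Face API, and the context length for Phi-3-vision is left empty because it is not stated here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phi3Model:
    """One entry in the Phi-3 lineup, as described in this post (hypothetical type)."""
    name: str
    params_billions: float
    multimodal: bool
    # Context lengths in thousands of tokens; empty where the post gives none.
    context_lengths_k: Tuple[int, ...]


# Values copied from the bullet list above; nothing here is queried from a service.
PHI3_FAMILY = (
    Phi3Model("Phi-3-vision", 4.2, True, ()),
    Phi3Model("Phi-3-mini", 3.8, False, (4, 128)),
    Phi3Model("Phi-3-small", 7.0, False, (8, 128)),
    Phi3Model("Phi-3-medium", 14.0, False, (4, 128)),
)


def smallest_model(needs_vision: bool) -> Phi3Model:
    """Return the listed model with the fewest parameters that fits the modality."""
    candidates = [m for m in PHI3_FAMILY if m.multimodal or not needs_vision]
    return min(candidates, key=lambda m: m.params_billions)


def models_with_context(context_k: int) -> List[str]:
    """Return names of models that list the given context length (in K tokens)."""
    return [m.name for m in PHI3_FAMILY if context_k in m.context_lengths_k]


if __name__ == "__main__":
    print(smallest_model(needs_vision=True).name)
    print(smallest_model(needs_vision=False).name)
    print(models_with_context(8))
```

Parameter count is used here only as a stand-in for compute cost; a real selection would also weigh benchmark quality, latency targets, and deployment hardware.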

Phi-3 models have been optimized to run across a variety of hardware. Optimized variants are available with ONNX Runtime and DirectML, providing developers with support across a wide range of devices and platforms, including mobile and web deployments. Phi-3 models are also available as NVIDIA NIM inference microservices with a standard API interface that can be deployed anywhere, and they have been optimized for inference on NVIDIA GPUs and Intel accelerators.

It’s inspiring to see how developers are using Phi-3 to do incredible things. ITC, an Indian conglomerate, has built a copilot that lets Indian farmers ask questions about their crops in their own vernacular. Khan Academy, which currently uses Azure OpenAI Service to power its Khanmigo for teachers pilot, is experimenting with Phi-3 to improve math tutoring in an affordable, scalable, and adaptable manner.

Healthcare software company Epic is also looking to use Phi-3 to summarize complex patient histories more efficiently. Seth Hain, senior vice president of R&D at Epic, explains, “AI is embedded directly into Epic workflows to help solve important issues like clinician burnout, staffing shortages, and organizational financial challenges. Small language models, like Phi-3, have robust yet efficient reasoning capabilities that enable us to offer high-quality generative AI at a lower cost across our applications that help with challenges like summarizing complex patient histories and responding faster to patients.”

Digital Green, used by more than 6 million farmers, is introducing video to their AI assistant, Farmer.Chat, adding to their multimodal conversational interface. “We’re excited about leveraging Phi-3 to increase the efficiency of Farmer.Chat and to enable rural communities to leverage the power of AI to uplift themselves,” said Rikin Gandhi, CEO, Digital Green.

Bringing multimodality to Phi-3

Phi-3-vision is the first multimodal model in the Phi-3 family. It brings together text and images, and it can reason over real-world images as well as extract and reason over text within images. It has also been optimized for chart and diagram understanding and can be used to generate insights and answer questions. Phi-3-vision builds on the language capabilities of Phi-3-mini, continuing to pack strong language and image reasoning quality into a small model.

Phi-3-vision can generate insights from charts and diagrams: