# Phi-4-mini
Run Phi-4-mini optimized for Qualcomm NPUs with nexaSDK.
## Quickstart
1. Install nexaSDK and create a free account at sdk.nexa.ai.
2. Activate your device with your access token:

   ```bash
   nexa config set license '<access_token>'
   ```

3. Run the model on the Qualcomm NPU in one line:

   ```bash
   nexa infer NexaAI/phi4-mini-npu-turbo
   ```
## Model Description
Phi-4-mini is a ~3.8B-parameter instruction-tuned model from Microsoft’s Phi-4 family. Trained on a blend of synthetic “textbook-style” data, filtered public web content, curated books/Q&A, and high-quality supervised chat data, it emphasizes reasoning-dense capabilities while maintaining a compact footprint. This NPU Turbo build uses Nexa’s Qualcomm backend (QNN/Hexagon) to deliver lower latency and higher throughput on-device, with support for 128K context and efficient long-context memory handling.
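To see why long-context memory handling matters at 128K, here is a rough back-of-envelope KV-cache sizing sketch. The layer and head counts below are illustrative assumptions for a model of this class, not published Nexa or Microsoft figures:

```python
# Back-of-envelope KV-cache sizing for one sequence at full 128K context.
# The architecture figures are illustrative assumptions, not official specs.

num_layers   = 32       # assumed transformer depth
num_kv_heads = 8        # assumed grouped-query-attention KV heads
head_dim     = 128      # assumed per-head dimension
context_len  = 128_000  # the advertised 128K context window

def kv_cache_bytes(bytes_per_value: int) -> int:
    """Total size of the K and V caches across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_value

for label, width in [("fp16", 2), ("int8", 1)]:
    print(f"{label}: {kv_cache_bytes(width) / 2**30:.1f} GiB")
```

Under these assumptions the cache runs to double-digit GiB at fp16, which is why cache quantization and residency strategies are central to on-device long-context inference.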
## Features
- Lightweight yet capable: strong reasoning (math/logic) in a compact 3.8B model.
- Instruction-following: enhanced SFT + DPO alignment for reliable chat.
- Content generation: drafting, completion, summarization, code comments, and more.
- Conversational AI: context-aware assistants/agents with long-context support (128K).
- NPU-Turbo path: INT8/INT4 quantization, op fusion, and KV-cache residency for Snapdragon® NPUs via nexaSDK (see the quantization sketch after this list).
- Customizable: fine-tune/adapt for domain-specific or enterprise use.
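As a concrete illustration of what the INT8 path above does to weights, here is a minimal symmetric per-tensor quantization sketch in plain NumPy. It shows the general technique only, not Nexa's actual toolchain:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights onto the int8 grid with a single shared scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

Halving or quartering weight width this way is what lets the model fit NPU memory budgets and feed the accelerator's integer math units.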
## Use Cases
- Personal & enterprise chatbots
- On-device/offline assistants (latency-bound scenarios)
- Document/report/email summarization
- Education, tutoring, and STEM reasoning tools
- Vertical applications (e.g., healthcare, finance, legal) with appropriate safeguards
## Inputs and Outputs
Input:
- Text prompts or conversation history (chat-format, tokenized sequences).
Output:
- Generated text: responses, explanations, or creative content.
- Optionally: raw logits/probabilities for advanced downstream tasks.
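For concreteness, here is a minimal sketch of both sides: a chat-formatted conversation using the common role/content convention (the exact schema nexaSDK expects may differ), and a softmax that turns raw logits into probabilities for downstream use:

```python
import numpy as np

# A typical chat-format input: a list of role/content turns. This is the
# common convention for instruction-tuned models, not a documented
# nexaSDK payload.
conversation = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize why NPUs help on-device inference."},
]

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution."""
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # toy logits over a 3-token vocabulary
print(softmax(logits))              # probabilities summing to 1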
## License
- Licensed under: MIT License
## References
- 📰 Phi-4-mini Microsoft Blog
- 📖 Phi-4-mini Technical Report
- 👩‍🍳 Phi Cookbook
- 🚀 Model paper