zackli4ai committed on
Commit 3898881 · verified · 1 Parent(s): 86228fd

Upload 2 files

Files changed (2):
  1. README.md +62 -0
  2. config.json +0 -0
README.md ADDED
@@ -0,0 +1,62 @@
# Phi-4-mini

Run **Phi-4-mini** optimized for **Qualcomm NPUs** with [nexaSDK](https://sdk.nexa.ai).

## Quickstart

1. **Install nexaSDK** and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).
2. **Activate your device** with your access token:

   ```bash
   nexa config set license '<access_token>'
   ```
3. **Run the model** on the Qualcomm NPU in one line:

   ```bash
   nexa infer NexaAI/phi4-mini-npu-turbo
   ```

## Model Description

**Phi-4-mini** is a ~3.8B-parameter instruction-tuned model from Microsoft’s Phi-4 family. Trained on a blend of synthetic “textbook-style” data, filtered public web content, curated books and Q&A, and high-quality supervised chat data, it emphasizes **reasoning-dense** capabilities while maintaining a compact footprint. This NPU **Turbo** build uses Nexa’s Qualcomm backend (QNN/Hexagon) to deliver **lower latency** and **higher throughput** on-device, with support for **128K context** and efficient long-context memory handling.

## Features

* **Lightweight yet capable**: strong reasoning (math/logic) in a compact 3.8B model.
* **Instruction-following**: enhanced SFT + DPO alignment for reliable chat.
* **Content generation**: drafting, completion, summarization, code comments, and more.
* **Conversational AI**: context-aware assistants/agents with long-context support (128K).
* **NPU-Turbo path**: INT8/INT4 quantization, op fusion, and KV-cache residency for Snapdragon® NPUs via nexaSDK.
* **Customizable**: fine-tune/adapt for domain-specific or enterprise use.

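To give a feel for the INT8 quantization mentioned above, here is a minimal, generic sketch of symmetric per-tensor quantization. This is an illustration of the basic technique only, not Nexa's actual NPU pipeline (which also involves op fusion, calibration, and INT4 paths):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: one scale maps floats to [-127, 127]."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)  # close to w; rounding error is at most scale / 2
```

Storing weights as INT8 (or INT4) instead of FP32 shrinks memory traffic by 4x (or 8x), which is usually the bottleneck on mobile NPUs.
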
## Use Cases

* Personal & enterprise chatbots
* On-device/offline assistants (latency-bound scenarios)
* Document/report/email summarization
* Education, tutoring, and STEM reasoning tools
* Vertical applications (e.g., healthcare, finance, legal) with appropriate safeguards

## Inputs and Outputs

**Input**:

* Text prompts or conversation history (chat-format, tokenized sequences).

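As a minimal sketch of what chat-format input looks like, the common role/content message-list convention is shown below; the exact schema nexaSDK accepts may differ, and `to_prompt` is a hypothetical helper for illustration:

```python
# Generic chat-style conversation history (illustrative only; the runtime's
# actual chat template applies model-specific special tokens).
messages = [
    {"role": "system", "content": "You are a helpful on-device assistant."},
    {"role": "user", "content": "Summarize this report in three bullet points."},
]

def to_prompt(messages: list) -> str:
    """Flatten the conversation into a single prompt string before tokenization."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

prompt = to_prompt(messages)
```
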
**Output**:

* Generated text: responses, explanations, or creative content.
* Optionally: raw logits/probabilities for advanced downstream tasks.

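For the optional logits output, raw logits are typically converted to token probabilities with a standard softmax. A generic sketch (not a nexaSDK API):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # probabilities sum to 1; highest logit dominates
```
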
## License

* Licensed under the [MIT License](https://huggingface.co/microsoft/phi-4/resolve/main/LICENSE)

## References

* 📰 Phi-4-mini Microsoft Blog
* 📖 Phi-4-mini Technical Report
* 👩‍🍳 Phi Cookbook
* 🏡 Phi Portal
* 🖥️ Try it on Azure and Hugging Face
config.json ADDED
File without changes