# microsoft/phi-4 Quantized Models

## Overview

These models apply GPTQ quantization to [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) as the base model, using Japanese text as calibration data to optimize performance in Japanese environments.

- **Model Variants**:
  - [nejumi/phi-4-GPTQ-Int4-calib-ja-1k](https://huggingface.co/nejumi/phi-4-GPTQ-Int4-calib-ja-1k)
  - [nejumi/phi-4-GPTQ-Int8-calib-ja-1k](https://huggingface.co/nejumi/phi-4-GPTQ-Int8-calib-ja-1k)
- **Base Model**: [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
- **Model Size**: 14,659,507,200 parameters
- **Category**: 10B ≤ parameters < 30B
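
As a quick usage sketch (not an official snippet from this repo), the quantized variants should load like any GPTQ checkpoint in 🤗 Transformers; this assumes `accelerate` plus a GPTQ kernel backend such as `auto-gptq` are installed:

```python
# Minimal loading sketch (assumption: transformers, accelerate, and a GPTQ
# backend such as auto-gptq are installed; not an official example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nejumi/phi-4-GPTQ-Int4-calib-ja-1k"  # or the Int8 variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Sanity-check with a short Japanese prompt, matching the Japanese calibration focus.
inputs = tokenizer("東京の観光名所を3つ教えてください。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```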
---
### Quantization Parameters 🐝[Link to W&B](https://wandb.ai/wandb-japan/GPTQ_experiments2/runs/r5axhf09)

- bits: 4 or 8
- group_size: 128
- perc_damp: 0.01
- desc_act: True
- use_exllama: False
- model_seqlen: 2048
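
For reference, a minimal sketch of how a run with these settings might look via 🤗 Transformers' `GPTQConfig` (whose `damp_percent` corresponds to `perc_damp` above). The Japanese calibration texts below are hypothetical placeholders; the actual 1k-sample calibration set is not reproduced here:

```python
# Hypothetical quantization sketch with the parameters listed above.
# The calibration corpus is a placeholder, not the actual 1k Japanese samples.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "microsoft/phi-4"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stand-in for the ~1k Japanese calibration samples.
calibration_texts = ["吾輩は猫である。名前はまだ無い。"] * 1024

gptq_config = GPTQConfig(
    bits=4,             # 4 for the Int4 variant, 8 for Int8
    group_size=128,
    damp_percent=0.01,  # "perc_damp" in the list above
    desc_act=True,
    use_exllama=False,
    model_seqlen=2048,
    dataset=calibration_texts,
    tokenizer=tokenizer,
)

# Quantization runs inside from_pretrained when a GPTQConfig is supplied.
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)
model.save_pretrained("phi-4-GPTQ-Int4-calib-ja-1k")
```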
---

## Performance Evaluation

Evaluation results from the [Nejumi LLM Leaderboard 3 (W&B)](https://wandb.ai/wandb-japan/llm-leaderboard3/reports/Nejumi-LLM-3---Vmlldzo4NTI1NTUx):



Blue: Original / Orange: 8-bit / Green: 4-bit
### Benchmark Overall Results

| Model | GLP Average | ALT Average | Overall Average |
|-------|-------------|-------------|-----------------|
| phi-4 Int4 | 0.5815 | 0.6953 | 0.6384 |
| phi-4 Int8 | 0.5948 | 0.7015 | 0.6482 |
| phi-4 Original | 0.5950 | 0.7005 | 0.6477 |
### General Language Performance (GLP) Details

| Subcategory | Int4 | Int8 | Original |
|-------------|------|------|----------|
| Expression | 0.8567 | 0.8717 | 0.8583 |
| Translation | 0.8458 | 0.8480 | 0.8457 |
| Information Retrieval | 0.8780 | 0.8806 | 0.8809 |
| Reasoning | 0.6400 | 0.5850 | 0.6550 |
| Mathematical Reasoning | 0.5400 | 0.5967 | 0.5817 |
| Extraction | 0.3304 | 0.3408 | 0.3470 |
| Knowledge & QA | 0.5587 | 0.5735 | 0.5685 |
| English | 0.3035 | 0.2351 | 0.2158 |
| Semantic Analysis | 0.4220 | 0.5200 | 0.5070 |
| Syntax Analysis | 0.4399 | 0.4967 | 0.4903 |
### Alignment (ALT) Details

| Subcategory | Int4 | Int8 | Original |
|-------------|------|------|----------|
| Controllability | 0.6908 | 0.6949 | 0.6938 |
| Ethics & Morality | 0.8800 | 0.9100 | 0.9000 |
| Toxicity | 0.8143 | 0.8121 | 0.8007 |
| Bias | 0.8858 | 0.8730 | 0.8650 |
| Robustness | 0.3717 | 0.4208 | 0.4226 |
| Truthfulness | 0.5292 | 0.4983 | 0.5206 |
### Benchmark Scores

| Benchmark | Int4 | Int8 | Original |
|-----------|------|------|----------|
| JASTER (0-shot) | 0.3880 | 0.4262 | 0.4186 |
| JASTER (2-shot) | 0.6136 | 0.6441 | 0.6398 |
| MT-Bench | 8.2438 | 8.2000 | 8.1313 |
| LCTG | 0.6860 | 0.6670 | 0.6750 |

---
## Model Characteristics & Evaluation

- **High Stability**: Standard GPTQ quantization preserves sufficient performance for a 14B-class model
- **Basic Tasks**: Maintains high scores (0.84+) in expression, translation, and information retrieval; MT-Bench scores largely preserve the original model's very high level for this model size
- **Alignment**: Particularly strong scores on the ethics & morality and bias metrics

---
## License

These models follow the license of the base model, [microsoft/phi-4](https://huggingface.co/microsoft/phi-4). Please refer to the base model's license for details.