# microsoft/phi-4 Quantized Models

## Overview
These models apply GPTQ quantization to [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) as the base model. Performance in Japanese environments is optimized by using Japanese text as the calibration data.

- **Model Variants**:
  - [nejumi/phi-4-GPTQ-Int4-calib-ja-1k](https://huggingface.co/nejumi/phi-4-GPTQ-Int4-calib-ja-1k)
  - [nejumi/phi-4-GPTQ-Int8-calib-ja-1k](https://huggingface.co/nejumi/phi-4-GPTQ-Int8-calib-ja-1k)
- **Base Model**: [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
- **Model Size**: 14,659,507,200 parameters
- **Category**: 10B ≤ parameters < 30B

---
### Quantization Parameters 🐝 [Link to W&B](https://wandb.ai/wandb-japan/GPTQ_experiments2/runs/r5axhf09)

- bits: 4 or 8
- group_size: 128
- perc_damp: 0.01
- desc_act: True
- use_exllama: False
- model_seqlen: 2048

---
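As a rough illustration of what the `bits` and `group_size` settings above mean, the sketch below performs naive symmetric round-to-nearest quantization over a single group of weights. This is a deliberate simplification of my own, not the actual algorithm: real GPTQ additionally applies Hessian-based error compensation (where `perc_damp` and `desc_act` come in), which this sketch omits.

```python
# Naive symmetric round-to-nearest quantization for one group of weights.
# Illustrates only the bits/group_size idea; actual GPTQ also performs
# Hessian-based error compensation, which is omitted here.

def quantize_group(weights, bits=4):
    """Map a group of floats to signed integers sharing one scale."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]
```

With `group_size: 128`, each block of 128 weights gets its own scale, so an outlier weight only degrades precision within its own group rather than across the whole row.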
## Performance Evaluation
Evaluation results from the [Nejumi LLM Leaderboard 3 (W&B)](https://wandb.ai/wandb-japan/llm-leaderboard3/reports/Nejumi-LLM-3---Vmlldzo4NTI1NTUx)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bcb332b7375f6b8456d937/BLMB8XfItDJArvkuROCay.png)

- Blue: Original
- Orange: 8-bit
- Green: 4-bit

### Benchmark Overall Results

| Model | GLP Average | ALT Average | Overall Average |
|----------------|--------|--------|--------|
| phi-4 Int4     | 0.5815 | 0.6953 | 0.6384 |
| phi-4 Int8     | 0.5948 | 0.7015 | 0.6482 |
| phi-4 Original | 0.5950 | 0.7005 | 0.6477 |
### General Language Performance (GLP) Details

| Subcategory | Int4 | Int8 | Original |
|-------------|------|------|----------|
| Expression | 0.8567 | 0.8717 | 0.8583 |
| Translation | 0.8458 | 0.8480 | 0.8457 |
| Information Retrieval | 0.8780 | 0.8806 | 0.8809 |
| Reasoning | 0.6400 | 0.5850 | 0.6550 |
| Mathematical Reasoning | 0.5400 | 0.5967 | 0.5817 |
| Extraction | 0.3304 | 0.3408 | 0.3470 |
| Knowledge & QA | 0.5587 | 0.5735 | 0.5685 |
| English | 0.3035 | 0.2351 | 0.2158 |
| Semantic Analysis | 0.4220 | 0.5200 | 0.5070 |
| Syntax Analysis | 0.4399 | 0.4967 | 0.4903 |
### Alignment (ALT) Details

| Subcategory | Int4 | Int8 | Original |
|-------------|------|------|----------|
| Controllability | 0.6908 | 0.6949 | 0.6938 |
| Ethics & Morality | 0.8800 | 0.9100 | 0.9000 |
| Toxicity | 0.8143 | 0.8121 | 0.8007 |
| Bias | 0.8858 | 0.8730 | 0.8650 |
| Robustness | 0.3717 | 0.4208 | 0.4226 |
| Truthfulness | 0.5292 | 0.4983 | 0.5206 |
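As a sanity check on how the summary numbers aggregate, the GLP and ALT averages are consistent with unweighted means of their subcategories, and the overall average with the mean of GLP and ALT, to within rounding. The snippet below verifies this for the Int4 column; note that the equal-weight assumption is mine, inferred from the numbers rather than documented by the leaderboard.

```python
# Int4 column, copied from the GLP and ALT detail tables above.
from statistics import mean

glp_int4 = [0.8567, 0.8458, 0.8780, 0.6400, 0.5400,
            0.3304, 0.5587, 0.3035, 0.4220, 0.4399]
alt_int4 = [0.6908, 0.8800, 0.8143, 0.8858, 0.3717, 0.5292]

glp_avg = mean(glp_int4)            # ≈ 0.5815, matching the overview table
alt_avg = mean(alt_int4)            # ≈ 0.6953
overall = mean([glp_avg, alt_avg])  # ≈ 0.6384
```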
### Benchmark Scores

| Benchmark | Int4 | Int8 | Original |
|-----------|------|------|----------|
| JASTER (0-shot) | 0.3880 | 0.4262 | 0.4186 |
| JASTER (2-shot) | 0.6136 | 0.6441 | 0.6398 |
| MT-Bench | 8.2438 | 8.2000 | 8.1313 |
| LCTG | 0.6860 | 0.6670 | 0.6750 |

---
## Model Characteristics & Evaluation
- **High Stability**: Standard GPTQ quantization achieves near-original performance for this 14B-class model
- **Basic Tasks**: Expression, translation, and information retrieval stay above 0.84 in all variants, and MT-Bench scores largely preserve the original model's very high level for this model size
- **Alignment**: Particularly high scores on the ethics & morality and bias metrics

---
## License
These models follow the license of the base model, [microsoft/phi-4](https://huggingface.co/microsoft/phi-4). Please refer to the base model's license for details.