Update README.md
README.md CHANGED
@@ -36,7 +36,6 @@ cd Quark/examples/torch/language_modeling/llm_ptq/
 exclude_layers="*self_attn* *mlp.gate.* *lm_head"
 python3 quantize_quark.py --model_dir $MODEL_DIR \
     --quant_scheme w_mxfp4_a_mxfp4 \
-    --group_size 32 \
     --num_calib_data 128 \
     --exclude_layers $exclude_layers \
     --multi_gpu \
@@ -52,7 +51,8 @@ This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai
 
 ## Evaluation
 
-The model was evaluated on AIME24, GPQA Diamond, and MATH-500
+The model was evaluated on the AIME24, GPQA Diamond, MATH-500, and GSM8K benchmarks. The AIME24, GPQA Diamond, and MATH-500 tasks were evaluated using the [lighteval](https://github.com/huggingface/lighteval/tree/v0.10.0) framework; each task was run 10 times with different random seeds for reliable performance estimation.
+The GSM8K task was evaluated using [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness).
 
 ### Accuracy
 
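As an aside, here is a minimal sketch (not part of this diff) of what running each lighteval task 10 times with different seeds could look like. The model-args string, log path, and abbreviated task list are placeholders, not the README's actual values; any extra options from the README's full lighteval command (e.g., custom task definitions) are omitted.

```bash
# Hypothetical sketch: repeat the lighteval run with 10 different random seeds
# and keep one log per seed. The model args and task string are placeholders
# abbreviated from the README's command further below.
TASKS="custom|aime24_single|0|0,custom|math_500_single|0|0"   # remaining custom tasks omitted
for SEED in $(seq 0 9); do
  MODEL_ARGS="model_name=amd/DeepSeek-R1-0528-MXFP4-ASQ,seed=${SEED}"   # placeholder model args
  LOG="lighteval_seed_${SEED}.log"
  lighteval vllm "$MODEL_ARGS" "$TASKS" 2>&1 | tee -a "$LOG"
done
```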
@@ -98,7 +98,7 @@ The model was evaluated on AIME24, GPQA Diamond, and MATH-500 benchmarks using t
 </td>
 </tr>
 <tr>
-<td>
+<td>GSM8K
 </td>
 <td>95.30
 </td>
@@ -125,7 +125,7 @@ lighteval vllm $MODEL_ARGS "custom|aime24_single|0|0,custom|math_500_single|0|0,
 2>&1 | tee -a "$LOG"
 ```
 
-The result of gsm8k was obtained using lm-eval-harness.
+The GSM8K result was obtained using [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness).
 
 ```
 MODEL_ARGS="model=amd/DeepSeek-R1-0528-MXFP4-ASQ,base_url=http://localhost:8000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=38768,temperature=0.6,top_p=0.95,add_bos_token=True,seed=$SEED"
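For context, a minimal sketch of how a `MODEL_ARGS` string like the one above is typically passed to lm-eval-harness's `local-completions` backend for GSM8K. This is an assumption, not the README's own command (which continues past the context shown in the hunk); the few-shot count and log path are illustrative only.

```bash
# Hypothetical invocation, not taken from the README: score GSM8K against a
# local OpenAI-compatible /v1/completions endpoint via lm-eval-harness.
# MODEL_ARGS (and the SEED it embeds) are assumed to be set as in the snippet
# above; the few-shot count and log file name are illustrative.
LOG="lm_eval_gsm8k.log"
lm_eval --model local-completions \
  --model_args "$MODEL_ARGS" \
  --tasks gsm8k \
  --num_fewshot 5 \
  2>&1 | tee -a "$LOG"
```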