Added evaluation metrics
README.md
CHANGED
@@ -19,6 +19,30 @@ Only the weights of the linear operators within `language_model` transformers bl
Model checkpoint is saved in [compressed_tensors](https://github.com/neuralmagic/compressed-tensors) format.
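Based on the repository name (`GPTQ-4b-128g`: 4-bit weights, quantization group size 128), the weight-storage savings can be estimated with simple arithmetic. A back-of-envelope sketch, assuming one fp16 scale and one 4-bit zero-point per group of 128 weights — the exact per-group metadata depends on the compressed_tensors layout, so treat this as an approximation:

```python
# Back-of-envelope storage estimate for GPTQ 4-bit weights, group size 128.
# Assumption: each group of 128 weights shares one fp16 scale and one 4-bit
# zero-point; the exact on-disk layout depends on the compressed_tensors format.
BITS_INT4 = 4
BITS_FP16 = 16
GROUP_SIZE = 128

# Effective bits per weight: the 4-bit value plus amortized per-group metadata.
bits_per_weight = BITS_INT4 + (BITS_FP16 + BITS_INT4) / GROUP_SIZE

# bf16 also stores 16 bits per weight, so the ratio is straightforward.
compression_vs_bf16 = BITS_FP16 / bits_per_weight

print(f"{bits_per_weight:.4f} effective bits/weight")
print(f"~{compression_vs_bf16:.2f}x smaller than bf16 for the quantized layers")
```

Note that only the quantized linear layers shrink by this factor; embeddings and any layers left in higher precision keep their original size.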

## Evaluation

This model was evaluated on the OpenLLM v1 benchmarks. Model outputs were generated with the `vLLM` engine.

| Model | ArcC | GSM8k | Hellaswag | MMLU | TruthfulQA-mc2 | Winogrande | Average | Recovery |
|----------------------------|:------:|:------:|:---------:|:------:|:--------------:|:----------:|:-------:|:--------:|
| Mistral-Small-3.1-24B-Instruct-2503 | 0.7125 | 0.8848 | 0.8576 | 0.8107 | 0.6409 | 0.8398 | 0.7910 | 1.0000 |
| Mistral-Small-3.1-24B-Instruct-2503-INT4 (this) | 0.7073 | 0.8711 | 0.8530 | 0.8062 | 0.6252 | 0.8256 | 0.7814 | 0.9878 |
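In the table, the Average column is the mean of the six per-task scores, and Recovery is the ratio of the quantized model's average to the baseline's. A short Python check using the scores copied from the rows above:

```python
# Per-task scores copied from the table (ArcC, GSM8k, Hellaswag, MMLU,
# TruthfulQA-mc2, Winogrande).
baseline = [0.7125, 0.8848, 0.8576, 0.8107, 0.6409, 0.8398]
quantized = [0.7073, 0.8711, 0.8530, 0.8062, 0.6252, 0.8256]

avg_baseline = sum(baseline) / len(baseline)
avg_quantized = sum(quantized) / len(quantized)

# Recovery: fraction of the full-precision average retained after quantization.
recovery = avg_quantized / avg_baseline

print(f"baseline avg:  {avg_baseline:.4f}")
print(f"quantized avg: {avg_quantized:.4f}")
print(f"recovery:      {recovery:.4f}")
```

The printed values reproduce the Average and Recovery columns to four decimal places.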

## Reproduction

The results were obtained using the following commands:

```bash
MODEL=ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g
MODEL_ARGS="pretrained=$MODEL,max_model_len=4096,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.80"

lm_eval \
  --model vllm \
  --model_args $MODEL_ARGS \
  --tasks openllm \
  --batch_size auto
```

## Usage

* To use the model in `transformers`, update the package to a stable release that supports Mistral-3