SpiridonSunRotator committed (verified)
Commit e81d14b · Parent(s): 1545c69

Added evaluation metrics

Files changed (1): README.md (+24 −0)
README.md CHANGED
@@ -19,6 +19,30 @@ Only the weights of the linear operators within `language_model` transformers bl
 
 Model checkpoint is saved in [compressed_tensors](https://github.com/neuralmagic/compressed-tensors) format.
 
+## Evaluation
+
+This model was evaluated on the OpenLLM v1 benchmarks. Model outputs were generated with the `vLLM` engine.
+
+| Model | ArcC | GSM8k | Hellaswag | MMLU | TruthfulQA-mc2 | Winogrande | Average | Recovery |
+|----------------------------|:------:|:------:|:---------:|:------:|:--------------:|:----------:|:-------:|:--------:|
+| Mistral-Small-3.1-24B-Instruct-2503 | 0.7125 | 0.8848 | 0.8576 | 0.8107 | 0.6409 | 0.8398 | 0.7910 | 1.0000 |
+| Mistral-Small-3.1-24B-Instruct-2503-INT4 (this) | 0.7073 | 0.8711 | 0.8530 | 0.8062 | 0.6252 | 0.8256 | 0.7814 | 0.9878 |
+
+## Reproduction
+
+The results were obtained using the following commands:
+
+```bash
+MODEL=ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g
+MODEL_ARGS="pretrained=$MODEL,max_model_len=4096,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.80"
+
+lm_eval \
+  --model vllm \
+  --model_args $MODEL_ARGS \
+  --tasks openllm \
+  --batch_size auto
+```
+
 ## Usage
 
 * To use the model in `transformers` update the package to stable release of Mistral-3
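
The added README text does not spell out how the `Average` and `Recovery` columns are derived; the numbers are consistent with `Average` being the arithmetic mean of the six per-task scores and `Recovery` the ratio of the quantized model's average to the baseline's. A minimal sketch of that reading, with the per-task values copied from the table:

```python
# Per-task scores copied from the evaluation table:
# ArcC, GSM8k, Hellaswag, MMLU, TruthfulQA-mc2, Winogrande
baseline = [0.7125, 0.8848, 0.8576, 0.8107, 0.6409, 0.8398]
int4 = [0.7073, 0.8711, 0.8530, 0.8062, 0.6252, 0.8256]

avg_baseline = sum(baseline) / len(baseline)  # ~0.7910, matches the table
avg_int4 = sum(int4) / len(int4)              # ~0.7814, matches the table
recovery = avg_int4 / avg_baseline            # ~0.9878, matches the table

print(f"baseline={avg_baseline:.4f} int4={avg_int4:.4f} recovery={recovery:.4f}")
```

The INT4 checkpoint thus recovers roughly 98.8% of the baseline's average benchmark score.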