---
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
datasets:
- BAAI/Infinity-Instruct
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
We prune Llama-3.1-8B-Instruct to 1.4B parameters and fine-tune it with the LLM-Neo method, which combines LoRA and knowledge distillation (KD) in a single framework. The training data consists of 1 million lines sampled from BAAI/Infinity-Instruct.
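For intuition, the sketch below shows one generic way to combine a hard-label cross-entropy loss with a soft-label KD loss for a LoRA-adapted student. It is only an illustration of the LoRA + KD idea under assumed weighting (`alpha`) and temperature values, not the exact objective or hyperparameters used for this model; see the paper for the actual formulation.

```python
# Hedged sketch of a LoRA + KD training objective: the LoRA-adapted student is trained
# against both ground-truth labels and the frozen teacher's soft logits.
# alpha and temperature are illustrative values, not the authors' settings.
import torch
import torch.nn.functional as F

def lora_kd_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Blend hard-label cross-entropy with soft-label KD (assumed weighting scheme)."""
    # Standard next-token cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # KL divergence between temperature-softened student and teacher distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kd
```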
For more information, please refer to the paper: [LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models](https://huggingface.co/papers/2411.06839)

Code can be found here: https://github.com/yang3121099/LLM-Neo
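The model targets standard text generation with `transformers`. A minimal usage sketch follows; the repository id is a placeholder to be replaced with this model's actual Hub id, and the chat template is assumed to be inherited from the Llama-3.1-Instruct base.

```python
# Minimal text-generation sketch with transformers (assumes a standard causal-LM layout).
# "<this-repo-id>" is a placeholder, not the confirmed repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")

# Llama-3.1-Instruct-style chat template, assumed to be inherited from the base model.
messages = [{"role": "user", "content": "Briefly explain knowledge distillation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```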
## Benchmarks
In this section, we report the results for Llama3.1-Neo-1B-100w on standard automatic benchmarks. For all evaluations, we use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
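A zero-shot run over the same task families can be reproduced roughly as follows with the harness's Python API. The task names follow current lm-evaluation-harness conventions and may differ across versions; the repository id is again a placeholder.

```python
# Hedged sketch: reproduce the zero-shot evaluation with lm-evaluation-harness.
# Task names are assumptions based on current harness naming; adjust to your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<this-repo-id>,dtype=bfloat16",  # placeholder repo id
    tasks=["arc_challenge", "arc_easy", "ceval-valid", "mmlu", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task acc / acc_norm and stderr
```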
### Evaluation results
| Category   | Benchmark                     | Version | n-shot | Metric   | Value  | Stderr   |
|------------|-------------------------------|---------|--------|----------|--------|----------|
| ARC        | ARC-Challenge                 | 1       | 0      | acc      | 0.1920 | ± 0.0115 |
| ARC        | ARC-Easy                      | 1       | 0      | acc      | 0.3834 | ± 0.0100 |
| CEVAL      | CEVAL (valid)                 | N/A     | 0      | acc      | 0.2370 | ± 0.0117 |
| CEVAL      | CEVAL (Accountant)            | 1       | 0      | acc      | 0.2449 | ± 0.0621 |
| CEVAL      | CEVAL (Advanced Mathematics)  | 1       | 0      | acc      | 0.3158 | ± 0.1096 |
| MMLU       | MMLU                          | N/A     | 0      | acc      | 0.2439 | ± 0.0036 |
| MMLU       | MMLU (Abstract Algebra)       | 0       | 0      | acc      | 0.2500 | ± 0.0435 |
| PIQA       | PIQA                          | 1       | 0      | acc      | 0.5843 | ± 0.0115 |
| PIQA       | PIQA (Normalized)             | 1       | 0      | acc_norm | 0.5822 | ± 0.0115 |
| Winogrande | Winogrande                    | 1       | 0      | acc      | 0.5249 | ± 0.0140 |