---
license: apache-2.0
tags:
- generated_from_trainer
- smol_llama
- llama2
metrics:
- accuracy
base_model: BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.8
    repetition_penalty: 1.15
    no_repeat_ngram_size: 4
    eta_cutoff: 0.001
    renormalize_logits: true
widget:
- text: My name is El Microondas the Wise and
  example_title: El Microondas
- text: Kennesaw State University is a public
  example_title: Kennesaw State University
- text: Bungie Studios is an American video game developer. They are most famous for
    developing the award winning Halo series of video games. They also made Destiny.
    The studio was founded
  example_title: Bungie
- text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
  example_title: Harry Potter Series
- text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
    have water, but no fish. What am I?

    Answer:'
  example_title: Riddle
- text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- text: Jane went to the store to buy some groceries. She picked up apples, oranges,
    and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
    and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
    they meet if the distance between the stations is 300 miles?

    To determine'
  example_title: Math Problem
- text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
pipeline_tag: text-generation
model-index:
- name: NanoLlama-GQA-L10-A32_KV8-v13-KI
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 23.81
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 29.39
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 25.37
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 44.77
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 51.14
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.91
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI
      name: Open LLM Leaderboard
---

# BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI

> Note that training is still a work in progress.

This model is a fine-tuned version of [BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile](https://huggingface.co/BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile) on the KI dataset. It achieves the following results on the evaluation set:

- Loss: 2.5937
- Accuracy: 0.4948

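A minimal usage sketch, assuming the checkpoint loads with the standard `transformers` Auto classes (the harness run recorded in this card used `trust_remote_code=True`, so you may need to pass that flag as well). The sampling settings mirror the `inference.parameters` block in this card's metadata; `GENERATION_KWARGS` and `generate_text` are hypothetical names chosen for illustration, not part of any published API.

```python
# Sampling settings copied from the card's `inference.parameters` metadata.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.15,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.001,
    "renormalize_logits": True,
}

MODEL_ID = "BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI"


def generate_text(prompt: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: load the model and sample one continuation of `prompt`."""
    # Imported lazily so the settings above can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Forward the card's sampling settings straight to generate().
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `print(generate_text("Kennesaw State University is a public"))` downloads the weights and prints a sampled continuation; the output differs between runs because `do_sample` is enabled.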
## Training and evaluation data

The KI dataset was used for both training and evaluation.

Zero-shot results from the EleutherAI LM Evaluation Harness, run with the following configuration:

`hf-causal-experimental (pretrained=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8`

|     Task     |Version| Metric | Value |   |Stderr|
|--------------|------:|--------|------:|---|-----:|
|arc_easy      |      0|acc     | 0.4322|±  |0.0102|
|              |       |acc_norm| 0.3960|±  |0.0100|
|boolq         |      1|acc     | 0.6196|±  |0.0085|
|lambada_openai|      0|ppl     |61.6595|±  |2.4362|
|              |       |acc     | 0.2779|±  |0.0062|
|openbookqa    |      0|acc     | 0.1540|±  |0.0162|
|              |       |acc_norm| 0.2840|±  |0.0202|
|piqa          |      0|acc     | 0.6028|±  |0.0114|
|              |       |acc_norm| 0.6028|±  |0.0114|
|winogrande    |      0|acc     | 0.5193|±  |0.0140|

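The `Stderr` values on the accuracy rows are consistent with the standard error of a mean of per-example 0/1 scores using the sample standard deviation (divisor n - 1). The sketch below recomputes them from the reported values; the example counts per split (ARC-Easy test 2376, BoolQ validation 3270, LAMBADA OpenAI 5153, OpenBookQA test 500, PIQA validation 1838, Winogrande validation 1267) are our assumption and are not stated in this card. The `ppl` row's stderr is computed differently and is not covered here.

```python
import math


def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores.

    Uses the sample variance (divisor n - 1), which we assume is what the
    harness applies to accuracy metrics.
    """
    sample_variance = accuracy * (1.0 - accuracy) * n / (n - 1)
    return math.sqrt(sample_variance / n)


# (task, metric, reported value, reported stderr, assumed number of examples)
ROWS = [
    ("arc_easy", "acc", 0.4322, 0.0102, 2376),
    ("arc_easy", "acc_norm", 0.3960, 0.0100, 2376),
    ("boolq", "acc", 0.6196, 0.0085, 3270),
    ("lambada_openai", "acc", 0.2779, 0.0062, 5153),
    ("openbookqa", "acc", 0.1540, 0.0162, 500),
    ("openbookqa", "acc_norm", 0.2840, 0.0202, 500),
    ("piqa", "acc", 0.6028, 0.0114, 1838),
    ("winogrande", "acc", 0.5193, 0.0140, 1267),
]

for task, metric, value, reported, n in ROWS:
    estimate = binary_stderr(value, n)
    print(f"{task:15s} {metric:8s} reported={reported:.4f} recomputed={estimate:.4f}")
```

Because the accuracies are themselves rounded to four decimals, agreement is only expected to the last reported digit.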
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.00025
- train_batch_size: 8
- eval_batch_size: 4
- seed: 2280
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0

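The batch settings multiply out as train_batch_size × gradient_accumulation_steps = 8 × 16 = 128, matching total_train_batch_size, which suggests a single device (our inference; the device count is not stated). The sketch below also illustrates an `inverse_sqrt` schedule with 5% warmup, assuming the formula of `transformers`' `get_inverse_sqrt_schedule` with its default timescale (equal to the warmup steps): linear warmup to the peak learning rate, then decay proportional to 1/sqrt(step). `TOTAL_STEPS` is a hypothetical rough estimate read off the Training results table (step 2400 at epoch 0.93), not a logged value.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 2.5e-4
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Hypothetical: rough estimate from the results table (step 2400 ~ epoch 0.93).
TOTAL_STEPS = 2580


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return per_device * accumulation * num_devices


def inverse_sqrt_lr(step: int, peak_lr: float, warmup_steps: int, timescale=None) -> float:
    """Linear warmup, then peak_lr / sqrt((step + shift) / timescale).

    Assumed to mirror transformers' get_inverse_sqrt_schedule: the timescale
    defaults to warmup_steps (or 10_000 when there is no warmup).
    """
    if timescale is None:
        timescale = warmup_steps or 10_000
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    shift = timescale - warmup_steps
    return peak_lr / math.sqrt((step + shift) / timescale)


warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, warmup_steps // 2, warmup_steps, 4 * warmup_steps, TOTAL_STEPS):
    print(f"step {step:5d}: lr = {inverse_sqrt_lr(step, LEARNING_RATE, warmup_steps):.2e}")
```

With the default timescale, the learning rate reaches its peak at the end of warmup and is halved once the step count is four times the warmup length.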
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.5744        | 0.08  | 200  | 2.7138          | 0.4776   |
| 2.5387        | 0.16  | 400  | 2.6713          | 0.4836   |
| 2.4718        | 0.23  | 600  | 2.6462          | 0.4873   |
| 2.4681        | 0.31  | 800  | 2.6328          | 0.4892   |
| 2.5351        | 0.39  | 1000 | 2.6227          | 0.4908   |
| 2.5316        | 0.47  | 1200 | 2.6159          | 0.4914   |
| 2.527         | 0.54  | 1400 | 2.6103          | 0.4921   |
| 2.4838        | 0.62  | 1600 | 2.6058          | 0.4930   |
| 2.4483        | 0.7   | 1800 | 2.6024          | 0.4934   |
| 2.426         | 0.78  | 2000 | 2.5998          | 0.4937   |
| 2.4685        | 0.86  | 2200 | 2.5961          | 0.4944   |
| 2.4473        | 0.93  | 2400 | 2.5937          | 0.4948   |

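Assuming the validation loss is the usual mean per-token cross-entropy in nats (the standard causal-LM loss), it converts to perplexity via exp(loss). A short sketch on a few rows of the table above; the final loss of 2.5937 corresponds to a perplexity of roughly 13.4. This is a token-level figure on the held-out split and is not directly comparable to the `lambada_openai` perplexity reported earlier.

```python
import math

# (step, validation loss) pairs taken from the Training results table.
VALIDATION_LOSS = [(200, 2.7138), (1200, 2.6159), (2400, 2.5937)]


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity of a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_cross_entropy_nats)


for step, loss in VALIDATION_LOSS:
    print(f"step {step:4d}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```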
### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results are available in the [Open LLM Leaderboard details dataset](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__NanoLlama-GQA-L10-A32_KV8-v13-KI).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |29.23|
|AI2 Reasoning Challenge (25-Shot)|23.81|
|HellaSwag (10-Shot)              |29.39|
|MMLU (5-Shot)                    |25.37|
|TruthfulQA (0-shot)              |44.77|
|Winogrande (5-shot)              |51.14|
|GSM8k (5-shot)                   | 0.91|

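The `Avg.` row matches the unweighted mean of the six benchmark scores; a quick check:

```python
# Open LLM Leaderboard scores reported in the table above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 23.81,
    "HellaSwag (10-Shot)": 29.39,
    "MMLU (5-Shot)": 25.37,
    "TruthfulQA (0-shot)": 44.77,
    "Winogrande (5-shot)": 51.14,
    "GSM8k (5-shot)": 0.91,
}

# Unweighted mean over the six benchmarks.
average = sum(SCORES.values()) / len(SCORES)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 29.23"
```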