---
license: cc-by-nc-4.0
---
I messed up: this model was meant to be a bfloat16 model, and I'm currently redoing it the correct way (look at the recipe below: the last merge ends in float16, my mistake). The redone version SHOULD be even better. I spotted the problem after finetuning it, when something was off. This one is usable and it ranks the best, but it's not even in the right float format.

Fixed model should be here: [NeverSleep/Mistral-11B-OmniMix-bf16](https://huggingface.co/NeverSleep/Mistral-11B-OmniMix-bf16)

Don't mind this one at the moment. I still need to finetune it for RP; it's just a test.
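To illustrate why the float16 versus bfloat16 mix-up mentioned above matters, here is a minimal pure-Python sketch. It is not part of the merge pipeline; the helper names `to_float16` and `to_bfloat16` are made up for this example, and bfloat16 is emulated by keeping the top 16 bits of a float32 with round-to-nearest-even.

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round x to IEEE float16 (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # float16 tops out at 65504; larger magnitudes overflow to infinity
        return math.copysign(math.inf, x)

def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 (8 exponent bits, 7 mantissa bits) via float32 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # round to nearest even on the 16 bits that get dropped
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for value in (0.1, 70000.0):
    print(value, "float16:", to_float16(value), "bfloat16:", to_bfloat16(value))
```

bfloat16 keeps the wide exponent range of float32 but has fewer mantissa bits, while float16 is more precise for small values but overflows much earlier. The two formats are therefore not interchangeable, which is why the final merge should have used the same dtype as the intermediate ones.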

## Description

This repo contains fp16 files of Mistral-11B-OmniMix.

My only goal for this model was to make it score as high as possible through merging and layer toying, to show that:
- Benchmarks are objective
- You should try a model yourself and not go blindly for the highest-rated one
- Merging and layer toying CAN be used to make better models (maybe?)


## Models used
- [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
- [Mistral-7B-v0.1-Open-Platypus](https://huggingface.co/akjindal53244/Mistral-7B-v0.1-Open-Platypus)
- [CollectiveCognition-v1.1-Mistral-7B](https://huggingface.co/teknium/CollectiveCognition-v1.1-Mistral-7B)
- [zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)



## Prompt template

The best one after further testing is this one:

```
<|system|>
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|user|>
{prompt}
<|assistant|>
```
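As a small sketch of how the template above could be filled in from code, the helper below builds the full prompt string. The function name `build_prompt` and the trailing newline after `<|assistant|>` are assumptions for illustration, not something the model card specifies.

```python
# Default system line, copied from the template above
SYSTEM_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)

def build_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Fill the <|system|> / <|user|> / <|assistant|> template (hypothetical helper)."""
    return (
        f"<|system|>\n{system_prompt}\n"
        f"<|user|>\n{user_prompt}\n"
        "<|assistant|>\n"  # assumption: generation starts on the next line
    )

print(build_prompt("Write a haiku about merging models."))
```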


![image/png](https://cdn-uploads.huggingface.co/production/uploads/63ab1241ad514ca8d1430003/tWIx8yeoallv94zrhN6L-.png)

But these ones work too:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:

```

```
USER: <prompt>
ASSISTANT:
```

Or use the prompt format of any of the 4 source models; it should work.

## The secret sauce

Mistral-11B-OpenOrcaPlatypus :
```
slices:
  - sources:
    - model: Open-Orca/Mistral-7B-OpenOrca
      layer_range: [0, 24]
  - sources:
    - model: akjindal53244/Mistral-7B-v0.1-Open-Platypus
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

Mistral-11B-CC-Zephyr :
```
slices:
  - sources:
    - model: "/content/drive/MyDrive/CC-v1.1-7B-bf16"
      layer_range: [0, 24]
  - sources:
    - model: "/content/drive/MyDrive/Zephyr-7B"
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```
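The two passthrough recipes above each stack layers 0-23 of one model on top of layers 8-31 of another. A rough sketch of the resulting size, assuming the published Mistral-7B-v0.1 configuration (hidden size 4096, MLP size 14336, 32 query heads, 8 key/value heads, head size 128, vocabulary 32000); fine-tunes with extra tokens will differ slightly:

```python
# Assumed base configuration (Mistral-7B-v0.1)
HIDDEN, INTERMEDIATE, VOCAB = 4096, 14336, 32000
N_HEADS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

def layer_params() -> int:
    """Parameters in one decoder layer (attention + MLP + two RMSNorms)."""
    q_and_o = 2 * HIDDEN * N_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * N_KV_HEADS * HEAD_DIM
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections
    norms = 2 * HIDDEN
    return q_and_o + k_and_v + mlp + norms

def model_params(num_layers: int) -> int:
    """Total parameters: embeddings, lm_head, final norm and decoder layers."""
    return 2 * VOCAB * HIDDEN + HIDDEN + num_layers * layer_params()

# layer_range [0, 24] from the first model plus [8, 32] from the second
stacked_layers = (24 - 0) + (32 - 8)

print(stacked_layers, "layers")
print(f"7B base: {model_params(32) / 1e9:.2f}B parameters")
print(f"stacked: {model_params(stacked_layers) / 1e9:.2f}B parameters")
```

This is where the "11B" in the name comes from: 48 layers instead of 32, with the embedding and output matrices unchanged.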

Mistral-11B-OmniMix :
```
slices:
  - sources:
      - model: Mistral-11B-OpenOrcaPlatypus
        layer_range: [0, 48]
      - model: Mistral-11B-CC-Zephyr
        layer_range: [0, 48]
merge_method: slerp
base_model: Undi95/Mistral-11B-OpenOrcaPlatypus
parameters:
  t:
    - filter: lm_head
      value: [0.75]
    - filter: embed_tokens
      value: [0.75]
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.25, 0.75]
    - filter: layernorm
      value: [0.5, 0.5]
    - filter: modelnorm
      value: [0.75]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
```
I used [mergekit](https://github.com/cg123/mergekit) for all the manipulations described here.
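For readers unfamiliar with the `slerp` merge method used in the last recipe, here is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors. It is an illustration of the general technique, not mergekit's actual code. The `gradient_value` helper reflects my understanding that a list such as `[0.75, 0.25]` is spread linearly across the layers; treat that, and the convention that `t = 0` returns the base model's tensor, as assumptions.

```python
import math

def slerp(t: float, v0: list, v1: list, eps: float = 1e-8) -> list:
    """Spherical interpolation: t=0 gives v0, t=1 gives v1."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        return lerp  # angle undefined for zero vectors
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        return lerp  # vectors are (anti)parallel, fall back to lerp
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def gradient_value(values: list, layer_idx: int, num_layers: int) -> float:
    """Linearly spread a list of t values over the layers (assumed behaviour)."""
    if len(values) == 1 or num_layers == 1:
        return values[0]
    pos = layer_idx / (num_layers - 1) * (len(values) - 1)
    lo = min(int(pos), len(values) - 2)
    frac = pos - lo
    return values[lo] * (1 - frac) + values[lo + 1] * frac

# self_attn uses [0.75, 0.25]: t falls from 0.75 at layer 0 to 0.25 at layer 47
for layer in (0, 24, 47):
    t = gradient_value([0.75, 0.25], layer, 48)
    print(layer, round(t, 3), slerp(t, [1.0, 0.0], [0.0, 1.0]))
```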

## Some scoring I did myself

This model was named "Mistral-11B-TestBench11" at the time, so keep that in mind while looking through the results.

hf-causal-experimental (pretrained=/content/drive/MyDrive/Mistral-11B-Test), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4

|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.5597|±  |0.0145|
|             |       |acc_norm|0.5819|±  |0.0144|
|arc_easy     |      0|acc     |0.8308|±  |0.0077|
|             |       |acc_norm|0.8215|±  |0.0079|
|hellaswag    |      0|acc     |0.6371|±  |0.0048|
|             |       |acc_norm|0.8213|±  |0.0038|
|piqa         |      0|acc     |0.8134|±  |0.0091|
|             |       |acc_norm|0.8275|±  |0.0088|
|truthfulqa_mc|      1|mc1     |0.3990|±  |0.0171|
|             |       |mc2     |0.5685|±  |0.0155|
|winogrande   |      0|acc     |0.7474|±  |0.0122|


![image/png](https://cdn-uploads.huggingface.co/production/uploads/63ab1241ad514ca8d1430003/LggyIlV-oY7NbLwi7mnix.png)

This model seems to be the best of my 3 latest tries:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63ab1241ad514ca8d1430003/hnqNyljs5Y8JppuA_io8w.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63ab1241ad514ca8d1430003/b-a-sB2qRHApPX52S2nD7.png)

You can find all the attempts I made in this [Pastebin](https://pastebin.com/nHLCxQJv).

## Others

Special thanks to Sushi, [Henky](https://github.com/KoboldAI/KoboldAI-Client) for the machine he gave me for big tasks, and [Charles Goddard](https://github.com/cg123) for his amazing tool.

If you want to support me, you can [here](https://ko-fi.com/undiai).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Undi95__Mistral-11B-TestBench11)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 53.01 |
| ARC (25-shot)       | 64.42 |
| HellaSwag (10-shot) | 83.93 |
| MMLU (5-shot)       | 63.82 |
| TruthfulQA (0-shot) | 56.68 |
| Winogrande (5-shot) | 77.74 |
| GSM8K (5-shot)      | 14.94 |
| DROP (3-shot)       |  9.57 |
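
As a quick sanity check, the "Avg." row appears to be the plain mean of the seven benchmark scores in the table. A short sketch, with the scores copied from the table and the variable names chosen for this example:

```python
# Scores copied from the leaderboard table above
scores = {
    "ARC (25-shot)": 64.42,
    "HellaSwag (10-shot)": 83.93,
    "MMLU (5-shot)": 63.82,
    "TruthfulQA (0-shot)": 56.68,
    "Winogrande (5-shot)": 77.74,
    "GSM8K (5-shot)": 14.94,
    "DROP (3-shot)": 9.57,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```

The low GSM8K and DROP scores pull the average well below the other five benchmarks.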