---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
tags:
- chat
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/magnum-v2-4b-GGUF
This is a quantized version of [anthracite-org/magnum-v2-4b](https://huggingface.co/anthracite-org/magnum-v2-4b), created with llama.cpp.
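
For a quick local test of one of the GGUF files in this repo, the llama-cpp-python bindings are one option. A minimal sketch, assuming a quant has already been downloaded (the filename below is a placeholder, not an actual file listing):

```py
from llama_cpp import Llama

# Load the quantized model; the filename is a placeholder for whichever
# quant you downloaded from this repository.
llm = Llama(
    model_path="magnum-v2-4b.Q4_K_M.gguf",
    n_ctx=8192,  # see the context-length caveat in the Support section below
)

# create_chat_completion uses the chat template stored in the GGUF
# metadata when present (ChatML for this model).
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hi there!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```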

# Original Model Card

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a46cbfb9c2bdfae75b3a6/9JwXZze4tHRGpc_RzE2AU.png)

This is the eighth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. It is fine-tuned on top of [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml).

## Prompting
The model has been instruct-tuned with ChatML formatting. A typical input looks like this:

```py
"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
```
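
Rather than assembling this string by hand, you can let the tokenizer's chat template produce it. A minimal sketch using transformers, assuming the tokenizer on the Hub ships the ChatML template shown above:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("anthracite-org/magnum-v2-4b")

messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

# add_generation_prompt appends the trailing "<|im_start|>assistant" turn
# so the model continues as the assistant.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```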

## Support

To run inference on this model, you'll need to use Aphrodite, vLLM, or EXL2/tabbyAPI, as llama.cpp has not yet merged the pull request that fixes the Llama 3.1 rope_freqs issue with custom head dimensions.
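
As a rough sketch of the vLLM route (the model id is the original repo; the sampling settings are illustrative, not recommendations):

```py
from vllm import LLM, SamplingParams

# Load the full-precision weights; vLLM handles the Llama 3.1 rope_freqs
# configuration that llama.cpp currently trips over.
llm = LLM(model="anthracite-org/magnum-v2-4b")
params = SamplingParams(temperature=0.8, max_tokens=256)

# ChatML-formatted prompt, matching the format shown above.
prompt = (
    "<|im_start|>user\n"
    "Hi there!<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```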

If you do want to use llama.cpp, you can work around the issue by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.

To create a working GGUF file, make the following adjustments:

1. Remove the `"rope_scaling": {}` entry from `config.json`
2. Change `"max_position_embeddings"` to `8192` in `config.json`

These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
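
A small sketch of those two edits, assuming the original checkpoint has been downloaded locally (the path is a placeholder):

```py
import json

# Placeholder path to your local copy of the checkpoint.
config_path = "magnum-v2-4b/config.json"

with open(config_path) as f:
    config = json.load(f)

config.pop("rope_scaling", None)           # step 1: drop the rope_scaling entry
config["max_position_embeddings"] = 8192   # step 2: cap the context at 8k

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```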

## axolotl config

<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: anthracite-org/Gryphe-3.5-16k-Subset
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/Stheno-Data-Filtered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/SynthRP-Gens-v1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: lodrick-the-lafted/NopmWritingStruct
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/kalo-opus-instruct-22k-no-refusal
    type: sharegpt
    conversation: chatml

chat_template: chatml

val_set_size: 0.01
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 16384
# sequence_len: 32768
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00002
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

</details><br>

## Credits

- [anthracite-org/Stheno-Data-Filtered](https://huggingface.co/datasets/anthracite-org/Stheno-Data-Filtered)
- [anthracite-org/kalo-opus-instruct-22k-no-refusal](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-22k-no-refusal)
- [lodrick-the-lafted/NopmWritingStruct](https://huggingface.co/datasets/lodrick-the-lafted/NopmWritingStruct)
- [NewEden/Gryphe-3.5-16k-Subset](https://huggingface.co/datasets/NewEden/Gryphe-3.5-16k-Subset)
- [Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned)
- [Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned)

This model has been a team effort, and credit goes to all members of Anthracite.

## Training
The training was done for 2 epochs on 2 x [RTX 6000](https://store.nvidia.com/en-us/nvidia-rtx/products/nvidia-rtx-6000-ada-generation/) GPUs graciously provided by [Kubernetes_Bad](https://huggingface.co/kubernetes-bad) for a full-parameter fine-tune of the model.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

## Safety
...