xtristan committed · verified
Commit 841e0c8 · Parent: 2c1f800

End of training

Files changed (1): README.md (+175 −2)
README.md CHANGED

Before this commit, the README contained only the YAML front-matter stub (`base_model: Qwen/Qwen3-32B`, `library_name: peft`) and a "### Framework versions" section listing PEFT 0.15.2; the commit expands it into the full model card below.

---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-32B
tags:
- axolotl
- generated_from_trainer
model-index:
- name: shuttle-3.5-ckpts
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.9.0`
```yaml
# Weights and Biases logging config
wandb_project: shuttle-3.5
wandb_name: "3.5"

# Model architecture config
base_model: Qwen/Qwen3-32B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: chatml

# Hugging Face saving config
hub_model_id: shuttleai/shuttle-3.5-ckpts
hub_strategy: all_checkpoints

# Model checkpointing config
output_dir: ./lora-out
saves_per_epoch: 10
save_safetensors: true
save_total_limit: 5

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: true
strict: false

# Sequence config
sequence_len: 16384
s2_attention: false
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# QLoRA adapter config
adapter: qlora
lora_r: 64
lora_alpha: 64
lora_dropout: 0.05
peft_use_dora: false
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# Dataset config
datasets:
  - path: ./dataset
    type: chat_template
val_set_size: 0.05
evals_per_epoch: 10
dataset_prepared_path: ./prepared-datasets
shuffle_merged_datasets: true

# Training hyperparameters
num_epochs: 1
gradient_accumulation_steps: 2
micro_batch_size: 2
eval_batch_size: 1
warmup_steps: 500
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-4
loraplus_lr_ratio: 8
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 1

# Model optimization
gradient_checkpointing: unsloth
xformers_attention: false
flash_attention: true
sdp_attention: false
unsloth_cross_entropy_loss: true
unsloth_lora_mlp: false
unsloth_lora_qkv: false
unsloth_lora_o: false

# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# Debug config
debug: false
seed: 42

deepspeed: deepspeed_configs/zero2.json
```

</details><br>
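
The QLoRA section of the config above (r=64, alpha=64, dropout 0.05, seven targeted projection modules) corresponds roughly to the PEFT `LoraConfig` sketched below. This is an illustration for readers rather than code from this repository; Axolotl builds the adapter configuration internally from the YAML.

```python
# Sketch only: the PEFT LoraConfig implied by the "QLoRA adapter config"
# block of the Axolotl YAML above (not code shipped with this checkpoint).
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                # lora_r
    lora_alpha=64,       # lora_alpha
    lora_dropout=0.05,   # lora_dropout
    use_dora=False,      # peft_use_dora
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
```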

# shuttle-3.5-ckpts

This model is a fine-tuned version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) on a local dataset (`./dataset` in the config above) that is not otherwise documented.
It achieves the following results on the evaluation set:
- Loss: 0.9783
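
As a minimal usage sketch (assumptions: the adapter weights can be loaded directly from `shuttleai/shuttle-3.5-ckpts`; because the run used `hub_strategy: all_checkpoints`, they may instead sit in a `checkpoint-*` subfolder, in which case pass `subfolder=` to `PeftModel.from_pretrained`), the adapter could be applied to the 4-bit base model like this:

```python
# Hypothetical inference sketch, not an official example for this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen3-32B"
adapter_id = "shuttleai/shuttle-3.5-ckpts"  # adjust subfolder= if needed

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # training ran in bf16
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# The run used chat_template: chatml; the base tokenizer's default template
# may differ, so verify which template matches the training data.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0]))
```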

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough sketch of the implied optimizer/scheduler setup follows the list):
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 1.0
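
For illustration, the optimizer and schedule above roughly correspond to the setup sketched below (Axolotl/Transformers wire this up internally; `loraplus_lr_ratio` and `cosine_min_lr_ratio` from the YAML are not modelled in this simplified version):

```python
# Rough sketch of the optimizer/scheduler implied by the hyperparameters
# above; illustrative only, not the trainer's actual construction code.
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

def build_optimizer_and_scheduler(model, num_training_steps):
    optimizer = bnb.optim.PagedAdamW8bit(
        model.parameters(),
        lr=2e-4,              # learning_rate
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.1,     # weight_decay
    )
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=500,  # lr_scheduler_warmup_steps
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```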

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 7.5468        | 0.0006 | 1    | 7.0761          |
| 4.9993        | 0.1006 | 160  | 5.6051          |
| 3.358         | 0.2011 | 320  | 2.5960          |
| 1.809         | 0.3017 | 480  | 1.3915          |
| 2.088         | 0.4023 | 640  | 1.1270          |
| 1.8377        | 0.5028 | 800  | 1.0472          |
| 1.8002        | 0.6034 | 960  | 1.0100          |
| 1.7863        | 0.7040 | 1120 | 0.9924          |
| 1.4572        | 0.8045 | 1280 | 0.9861          |
| 1.8509        | 0.9051 | 1440 | 0.9783          |
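
If the validation loss is the usual mean token-level cross-entropy (in nats), the final checkpoint's eval perplexity works out to roughly exp(0.9783) ≈ 2.66:

```python
# Convert the final validation loss to perplexity, assuming it is the
# mean token-level cross-entropy in nats.
import math

final_eval_loss = 0.9783
print(f"perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 2.66
```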

### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.5.1+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1