lbourdois committed (verified)
Commit ab94533 · Parent: 2dfa1b2

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag so the model is easier to find and reference. Note that the README announces 29 languages, but only 13 are explicitly listed, so I was only able to add those 13.
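Concretely, the change adds a `language:` field to the README's YAML front matter with the 13 three-letter language codes, as the diff below shows:

```yaml
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
```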

Files changed (1)
  1. README.md +178 -164
README.md CHANGED
@@ -1,165 +1,179 @@
- ---
- library_name: peft
- license: apache-2.0
- base_model: Qwen/Qwen2.5-1.5B-Instruct
- tags:
- - axolotl
- - generated_from_trainer
- model-index:
- - name: a7706e92-133c-4e5d-bca1-aad5a4fc27e6
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.1`
- ```yaml
- accelerate_config:
-   dynamo_backend: inductor
-   mixed_precision: bf16
-   num_machines: 1
-   num_processes: auto
-   use_cpu: false
- adapter: lora
- base_model: Qwen/Qwen2.5-1.5B-Instruct
- bf16: auto
- chat_template: llama3
- dataset_prepared_path: null
- datasets:
- - data_files:
-   - 733c43e45c9d282a_train_data.json
-   ds_type: json
-   format: custom
-   path: /workspace/input_data/733c43e45c9d282a_train_data.json
-   type:
-     field_instruction: problem
-     field_output: solution
-     format: '{instruction}'
-     no_input_format: '{instruction}'
-     system_format: '{system}'
-     system_prompt: ''
- debug: null
- deepspeed: null
- device_map: auto
- early_stopping_patience: null
- eval_max_new_tokens: 128
- eval_table_size: null
- evals_per_epoch: 4
- flash_attention: false
- fp16: null
- fsdp: null
- fsdp_config: null
- gradient_accumulation_steps: 16
- gradient_checkpointing: true
- group_by_length: false
- hub_model_id: VERSIL91/a7706e92-133c-4e5d-bca1-aad5a4fc27e6
- hub_repo: null
- hub_strategy: checkpoint
- hub_token: null
- learning_rate: 0.0001
- local_rank: null
- logging_steps: 1
- lora_alpha: 16
- lora_dropout: 0.05
- lora_fan_in_fan_out: null
- lora_model_dir: null
- lora_r: 8
- lora_target_linear: true
- lora_target_modules:
- - q_proj
- - v_proj
- lr_scheduler: cosine
- max_memory:
-   0: 70GiB
- max_steps: 50
- micro_batch_size: 2
- mlflow_experiment_name: /tmp/733c43e45c9d282a_train_data.json
- model_type: AutoModelForCausalLM
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- output_dir: miner_id_24
- pad_to_sequence_len: true
- quantization_config:
-   llm_int8_enable_fp32_cpu_offload: true
-   load_in_8bit: true
- resume_from_checkpoint: null
- s2_attention: null
- sample_packing: false
- saves_per_epoch: 4
- sequence_len: 512
- strict: false
- tf32: false
- tokenizer_type: AutoTokenizer
- torch_compile: true
- train_on_inputs: false
- trust_remote_code: true
- val_set_size: 0.05
- wandb_entity: null
- wandb_mode: online
- wandb_name: a7706e92-133c-4e5d-bca1-aad5a4fc27e6
- wandb_project: Gradients-On-Demand
- wandb_run: your_name
- wandb_runid: a7706e92-133c-4e5d-bca1-aad5a4fc27e6
- warmup_steps: 10
- weight_decay: 0.0
- xformers_attention: null
-
- ```
-
- </details><br>
-
- # a7706e92-133c-4e5d-bca1-aad5a4fc27e6
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5173
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 32
- - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - training_steps: 50
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.6358 | 0.0001 | 1 | 0.6944 |
- | 0.5682 | 0.0019 | 13 | 0.6041 |
- | 0.5173 | 0.0039 | 26 | 0.5379 |
- | 0.5272 | 0.0058 | 39 | 0.5173 |
-
-
- ### Framework versions
-
- - PEFT 0.13.2
- - Transformers 4.46.0
- - Pytorch 2.5.0+cu124
- - Datasets 3.0.1
+ ---
+ library_name: peft
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-1.5B-Instruct
+ tags:
+ - axolotl
+ - generated_from_trainer
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: a7706e92-133c-4e5d-bca1-aad5a4fc27e6
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ accelerate_config:
+   dynamo_backend: inductor
+   mixed_precision: bf16
+   num_machines: 1
+   num_processes: auto
+   use_cpu: false
+ adapter: lora
+ base_model: Qwen/Qwen2.5-1.5B-Instruct
+ bf16: auto
+ chat_template: llama3
+ dataset_prepared_path: null
+ datasets:
+ - data_files:
+   - 733c43e45c9d282a_train_data.json
+   ds_type: json
+   format: custom
+   path: /workspace/input_data/733c43e45c9d282a_train_data.json
+   type:
+     field_instruction: problem
+     field_output: solution
+     format: '{instruction}'
+     no_input_format: '{instruction}'
+     system_format: '{system}'
+     system_prompt: ''
+ debug: null
+ deepspeed: null
+ device_map: auto
+ early_stopping_patience: null
+ eval_max_new_tokens: 128
+ eval_table_size: null
+ evals_per_epoch: 4
+ flash_attention: false
+ fp16: null
+ fsdp: null
+ fsdp_config: null
+ gradient_accumulation_steps: 16
+ gradient_checkpointing: true
+ group_by_length: false
+ hub_model_id: VERSIL91/a7706e92-133c-4e5d-bca1-aad5a4fc27e6
+ hub_repo: null
+ hub_strategy: checkpoint
+ hub_token: null
+ learning_rate: 0.0001
+ local_rank: null
+ logging_steps: 1
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_fan_in_fan_out: null
+ lora_model_dir: null
+ lora_r: 8
+ lora_target_linear: true
+ lora_target_modules:
+ - q_proj
+ - v_proj
+ lr_scheduler: cosine
+ max_memory:
+   0: 70GiB
+ max_steps: 50
+ micro_batch_size: 2
+ mlflow_experiment_name: /tmp/733c43e45c9d282a_train_data.json
+ model_type: AutoModelForCausalLM
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ output_dir: miner_id_24
+ pad_to_sequence_len: true
+ quantization_config:
+   llm_int8_enable_fp32_cpu_offload: true
+   load_in_8bit: true
+ resume_from_checkpoint: null
+ s2_attention: null
+ sample_packing: false
+ saves_per_epoch: 4
+ sequence_len: 512
+ strict: false
+ tf32: false
+ tokenizer_type: AutoTokenizer
+ torch_compile: true
+ train_on_inputs: false
+ trust_remote_code: true
+ val_set_size: 0.05
+ wandb_entity: null
+ wandb_mode: online
+ wandb_name: a7706e92-133c-4e5d-bca1-aad5a4fc27e6
+ wandb_project: Gradients-On-Demand
+ wandb_run: your_name
+ wandb_runid: a7706e92-133c-4e5d-bca1-aad5a4fc27e6
+ warmup_steps: 10
+ weight_decay: 0.0
+ xformers_attention: null
+
+ ```
+
+ </details><br>
+
+ # a7706e92-133c-4e5d-bca1-aad5a4fc27e6
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5173
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 32
+ - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - training_steps: 50
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.6358 | 0.0001 | 1 | 0.6944 |
+ | 0.5682 | 0.0019 | 13 | 0.6041 |
+ | 0.5173 | 0.0039 | 26 | 0.5379 |
+ | 0.5272 | 0.0058 | 39 | 0.5173 |
+
+
+ ### Framework versions
+
+ - PEFT 0.13.2
+ - Transformers 4.46.0
+ - Pytorch 2.5.0+cu124
+ - Datasets 3.0.1
  - Tokenizers 0.20.1