ugaoo committed on
Commit 53a6d06 · verified · 1 Parent(s): b2ba87f

Update README.md

Files changed (1)
  1. README.md +0 -145
README.md CHANGED
@@ -1,145 +0,0 @@
---
library_name: peft
license: llama3.1
base_model: meta-llama/Llama-3.1-70B-Instruct
tags:
- generated_from_trainer
datasets:
- ugaoo/llama_3170_wrong_only
model-index:
- name: out/llama_3170_wrong_only
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml
base_model: meta-llama/Llama-3.1-70B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: ugaoo/llama_3170_wrong_only
    type: alpaca
val_set_size: 0
output_dir: ./out/llama_3170_wrong_only

sequence_len: 4000
sample_packing: true
pad_to_sequence_len: true

adapter: qlora
lora_r: 256
lora_alpha: 512
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - up_proj
  - down_proj
  - gate_proj
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: cosmosearch
wandb_entity:
wandb_watch:
wandb_name: llama_3170_wrong_only_llama31
wandb_log_model:

gradient_accumulation_steps: 3
micro_batch_size: 4
num_epochs: 6
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 6
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
save_total_limit: 6
special_tokens:
  pad_token: <|end_of_text|>
```

</details><br>
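
For orientation, the adapter-related settings in the config above map roughly onto standard `peft`/`bitsandbytes` objects as in the sketch below. This is illustrative only, not the code Axolotl actually runs; the class and argument names come from the public `peft` and `transformers` APIs, while the numeric values are copied from the config.

```python
# Rough, illustrative mapping of the QLoRA settings above onto peft/bitsandbytes
# objects -- not the exact objects Axolotl constructs internally.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization of the frozen base model (load_in_4bit: true).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16: auto on recent GPUs
)

# LoRA hyperparameters copied from the config above.
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # trained and saved in full
    task_type="CAUSAL_LM",
)
```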

# out/llama_3170_wrong_only

This model is a fine-tuned version of [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) on the ugaoo/llama_3170_wrong_only dataset.
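
A minimal inference sketch follows. The adapter repository id is a placeholder, since this card does not state the published repo name; substitute the actual id. Loading the 70B base model in 4-bit, as during training, still requires tens of GB of GPU memory.

```python
# Minimal inference sketch, assuming the adapter is published on the Hub.
# ADAPTER below is a placeholder, not a real repository id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "meta-llama/Llama-3.1-70B-Instruct"
ADAPTER = "ugaoo/<this-adapter-repo>"  # placeholder -- replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Prompts should follow the Alpaca format the adapter was trained on (see below).
prompt = "### Instruction:\nSay hello.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```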

## Model description

This is a QLoRA adapter (4-bit quantized base, LoRA rank 256, alpha 512) for [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct), trained with Axolotl. The adapter targets all attention and MLP projection layers, and the embedding and LM-head weights are trained and saved alongside the LoRA weights (`lora_modules_to_save`).

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the ugaoo/llama_3170_wrong_only dataset in Alpaca format (sketched below), with sample packing at a sequence length of 4000. No validation split was held out (`val_set_size: 0`), so no evaluation data was used.
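
The dataset is consumed with Axolotl's `alpaca` prompt type. As a rough illustration (the exact rendering is defined by Axolotl's prompter, and the example fields below are made up), the prompts follow the standard Alpaca template:

```python
# Standard Alpaca-style template commonly used for the `alpaca` dataset type.
# The exact string Axolotl renders may differ slightly; field values are invented.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(
    instruction="Summarize the passage in one sentence.",
    input="The quick brown fox jumps over the lazy dog.",
)
print(prompt)
```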

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the total batch sizes are derived in the sketch after this list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 3
- total_train_batch_size: 36
- total_eval_batch_size: 12
- optimizer: AdamW (torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 6.0
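
The aggregate batch sizes above follow directly from the per-device settings:

```python
# How total_train_batch_size and total_eval_batch_size are derived.
micro_batch_size = 4             # per-device train/eval batch size
gradient_accumulation_steps = 3
num_devices = 3

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 36

total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time
assert total_eval_batch_size == 12
```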

### Training results

No evaluation results were logged, since no validation split was held out during training.

### Framework versions

- PEFT 0.15.0
- Transformers 4.49.0
- PyTorch 2.5.1+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
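
To check that a local environment matches these pins before loading the adapter, something like the following works (package names are the standard PyPI distributions; the PyTorch entry on the card carries a `+cu124` build tag):

```python
# Compare installed package versions against the versions listed on this card.
from importlib.metadata import PackageNotFoundError, version

pinned = {
    "peft": "0.15.0",
    "transformers": "4.49.0",
    "torch": "2.5.1",       # card lists 2.5.1+cu124
    "datasets": "3.4.1",
    "tokenizers": "0.21.1",
}

for package, expected in pinned.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = "not installed"
    print(f"{package}: installed {installed}, card pins {expected}")
```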