ugaoo committed
Commit 0158adc · verified · 1 Parent(s): b21256a

Update README.md

Files changed (1)
  1. README.md +0 -146
README.md CHANGED
@@ -1,146 +0,0 @@
- ---
- library_name: peft
- license: llama3.1
- base_model: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
- tags:
- - generated_from_trainer
- datasets:
- - ugaoo/subset_each5k_multimedqa
- model-index:
- - name: out/subset_each5k_multimedqa
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.8.0.dev0`
- ```yaml
- base_model: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
- trust_remote_code: true
-
- load_in_8bit: false
- load_in_4bit: true
- strict: false
-
- datasets:
- - path: ugaoo/subset_each5k_multimedqa
-   type: alpaca
- val_set_size: 0
- output_dir: ./out/subset_each5k_multimedqa
-
- sequence_len: 4000
- sample_packing: true
- pad_to_sequence_len: true
-
- adapter: qlora
- lora_r: 256
- lora_alpha: 512
- lora_dropout: 0.05
- lora_target_linear: true
- lora_target_modules:
- - q_proj
- - k_proj
- - v_proj
- - o_proj
- - up_proj
- - down_proj
- - gate_proj
- lora_modules_to_save:
- - embed_tokens
- - lm_head
-
- wandb_project: cosmosearch
- wandb_entity:
- wandb_watch:
- wandb_name: subset_each5k_multimedqa_Nemotron-70B
- wandb_log_model:
-
- gradient_accumulation_steps: 3
- micro_batch_size: 4
- num_epochs: 6
- optimizer: adamw_torch
- lr_scheduler: cosine
- learning_rate: 5e-6
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16: false
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 100
- evals_per_epoch: 6
- eval_table_size:
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- save_total_limit: 6
- special_tokens:
-   pad_token: <|end_of_text|>
-
- ```
-
- </details><br>
-
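The config above is an axolotl training file, but its QLoRA settings map onto the standard `peft`/`bitsandbytes` objects. The sketch below reconstructs that mapping for reference; it assumes current `transformers`, `peft`, and `bitsandbytes` APIs and is not the exact code axolotl executes (in particular, the 4-bit quantization type and compute dtype are assumptions, since the config does not pin them).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

# load_in_4bit: true -> quantize the frozen base weights to 4 bit.
# NF4 and bfloat16 compute are assumptions; the config above does not specify them.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Mirrors lora_r / lora_alpha / lora_dropout / lora_target_modules /
# lora_modules_to_save from the config above.
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With `lora_r: 256` and `lora_alpha: 512` the adapter applies an effective scaling of alpha/r = 2, and listing `embed_tokens` and `lm_head` under `lora_modules_to_save` stores full copies of those modules in the checkpoint rather than low-rank updates.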
- # out/subset_each5k_multimedqa
-
- This model is a fine-tuned version of [nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF) on the ugaoo/subset_each5k_multimedqa dataset.
-
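Because the card describes a PEFT adapter rather than merged weights, inference requires attaching the adapter to the (quantized) base model. A minimal sketch follows; `ADAPTER_PATH` is a hypothetical placeholder, since the card does not state where the final adapter weights live, and the prompt shape is only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
ADAPTER_PATH = "path/to/adapter"  # hypothetical: local output dir or Hub repo id

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()

# The training data uses the alpaca format, so an instruction-style prompt is a
# reasonable shape; the exact template is not documented in this card.
prompt = "Answer the following medical question concisely.\n\nWhat class of drug is metformin?"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```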
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-06
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 3
- - gradient_accumulation_steps: 3
- - total_train_batch_size: 36
- - total_eval_batch_size: 12
- - optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 6.0
-
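For clarity, the effective batch sizes above follow directly from the per-device settings: total_train_batch_size = 4 (micro batch) × 3 (gradient accumulation steps) × 3 (devices) = 36, and total_eval_batch_size = 4 (eval batch) × 3 (devices) = 12.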
- ### Training results
-
-
-
- ### Framework versions
-
- - PEFT 0.15.0
- - Transformers 4.49.0
- - PyTorch 2.5.1+cu124
- - Datasets 3.4.1
- - Tokenizers 0.21.1