Meta-Llama-3.1-8B-Instruct-abliterated fine-tuned on the ICONN-1-BasicChat-Data-SuperLite dataset, as requested by @Enderchef in https://huggingface.co/mradermacher/model_requests/discussions/920.

Trained with axolotl version 0.9.0 using the following configuration:
base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: Sabresooth/Sabresooth_Train
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00004

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD

special_tokens:
  pad_token: <|end_of_text|>
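For reference, the sketch below shows how a dataset row with the `system`/`input`/`output` fields above is typically rendered with the llama3 chat template. It is a minimal illustration only: the example row is made up, and the Hugging Face repo id is an assumption (the training run itself loaded the base model from a local path).

```python
from transformers import AutoTokenizer

# Assumption: a public copy of the abliterated base model; training used /dpool/Meta-Llama-3.1-8B-Instruct-abliterated.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated")

# One hypothetical dataset row, mirroring field_system / field_instruction / field_output.
row = {"system": "", "input": "Hello, who are you?", "output": "Hi! How can I help?"}

messages = []
if row["system"]:  # system_prompt is "" in the config, so this branch is usually skipped
    messages.append({"role": "system", "content": row["system"]})
messages.append({"role": "user", "content": row["input"]})
messages.append({"role": "assistant", "content": row["output"]})

# The tokenizer's built-in chat template produces the llama3 prompt format.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```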
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 8.0
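The derived totals above follow from the axolotl config; a small sketch of the arithmetic (the eval-side formula is an assumption inferred from the reported value):

```python
# Effective batch sizes implied by the config: batch of 2 per device, 4 accumulation steps, 2 GPUs.
micro_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # evaluation does not accumulate gradients

print(total_train_batch_size)  # 16
print(total_eval_batch_size)   # 4
```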
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
3.4056 | 0.0336 | 1 | 4.5655 |
3.9338 | 0.2689 | 8 | 4.2118 |
1.4716 | 0.5378 | 16 | 2.0672 |
0.4684 | 0.8067 | 24 | 1.0214 |
0.0732 | 1.0672 | 32 | 0.4799 |
0.081 | 1.3361 | 40 | 0.0248 |
0.0064 | 1.6050 | 48 | 0.0024 |
0.0013 | 1.8739 | 56 | 0.0014 |
0.0004 | 2.1345 | 64 | 0.0003 |
0.0003 | 2.4034 | 72 | 0.0003 |
0.0002 | 2.6723 | 80 | 0.0005 |
0.0001 | 2.9412 | 88 | 0.0001 |
0.0001 | 3.2017 | 96 | 0.0001 |
0.0001 | 3.4706 | 104 | 0.0001 |
0.0002 | 3.7395 | 112 | 0.0001 |
0.0001 | 4.0 | 120 | 0.0001 |
0.0001 | 4.2689 | 128 | 0.0001 |
0.0001 | 4.5378 | 136 | 0.0001 |
0.0001 | 4.8067 | 144 | 0.0001 |
0.0001 | 5.0672 | 152 | 0.0001 |
0.0001 | 5.3361 | 160 | 0.0001 |
0.0001 | 5.6050 | 168 | 0.0001 |
0.0001 | 5.8739 | 176 | 0.0001 |
0.0001 | 6.1345 | 184 | 0.0001 |
0.0001 | 6.4034 | 192 | 0.0001 |
0.0 | 6.6723 | 200 | 0.0001 |
0.0 | 6.9412 | 208 | 0.0001 |
0.0001 | 7.2017 | 216 | 0.0001 |
0.0001 | 7.4706 | 224 | 0.0001 |
0.0001 | 7.7395 | 232 | 0.0001 |
Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.7.0+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1
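A minimal inference sketch with these library versions, loading the LoRA adapter on top of the base model via PEFT. Both repo ids are placeholders/assumptions and not confirmed by this card; substitute the actual base checkpoint and this adapter's repo id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumptions: placeholder repo ids, not taken from this card.
base_id = "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated"  # assumed public base checkpoint
adapter_id = "your-username/this-lora-adapter"               # replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```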