A LoRA finetune of Meta-Llama-3.1-8B-Instruct-abliterated on the ICONN-1-BasicChat-Data-SuperLite dataset, as requested by @Enderchef in https://huggingface.co/mradermacher/model_requests/discussions/918.
axolotl version: `0.9.0`

```yaml
base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: /apool/axolotl/0001.parquet
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00001

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD

special_tokens:
  pad_token: <|end_of_text|>
```
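For quick testing, the trained adapter can be loaded on top of the base model with `transformers` and `peft`. This is a minimal sketch, not an official usage snippet: the Hub id used for the base model and the adapter path are assumptions (after training the adapter sits in `./outputs/lora-out`, per the config above), so substitute your own locations.

```python
# Minimal inference sketch -- the base-model id and adapter path are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated"  # assumed stand-in for the local base_model path above
adapter_path = "./outputs/lora-out"                          # training output dir from the config, or the uploaded adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)

# Chat using the llama3 chat template the model was finetuned with.
messages = [{"role": "user", "content": "Hello! Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```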
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16 (see the sanity-check sketch after this list)
- total_eval_batch_size: 4
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 8.0
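The batch-size totals above follow directly from the per-device settings; a quick sanity check of the arithmetic, using only the values already listed:

```python
# Effective batch sizes, derived from the hyperparameters listed above.
micro_batch_size = 2             # per-device train/eval batch size
gradient_accumulation_steps = 4
num_devices = 2                  # multi-GPU (FSDP full shard over 2 GPUs)

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no gradient accumulation at eval time

print(total_train_batch_size)  # 16
print(total_eval_batch_size)   # 4
```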
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
3.5587 | 0.0336 | 1 | 3.4337 |
3.6702 | 0.2689 | 8 | 3.4260 |
3.5802 | 0.5378 | 16 | 3.3161 |
3.2421 | 0.8067 | 24 | 3.0272 |
2.322 | 1.0672 | 32 | 2.4812 |
1.9774 | 1.3361 | 40 | 1.8708 |
1.5103 | 1.6050 | 48 | 1.3871 |
1.1904 | 1.8739 | 56 | 1.0542 |
1.0394 | 2.1345 | 64 | 0.8591 |
0.5501 | 2.4034 | 72 | 0.6723 |
0.2454 | 2.6723 | 80 | 0.5369 |
0.4499 | 2.9412 | 88 | 0.4286 |
0.2194 | 3.2017 | 96 | 0.3691 |
0.1172 | 3.4706 | 104 | 0.2802 |
0.0739 | 3.7395 | 112 | 0.1948 |
0.1524 | 4.0 | 120 | 0.1457 |
0.0444 | 4.2689 | 128 | 0.1125 |
0.1385 | 4.5378 | 136 | 0.0759 |
0.0591 | 4.8067 | 144 | 0.0560 |
0.0252 | 5.0672 | 152 | 0.0460 |
0.0066 | 5.3361 | 160 | 0.0370 |
0.023 | 5.6050 | 168 | 0.0252 |
0.0033 | 5.8739 | 176 | 0.0202 |
0.0029 | 6.1345 | 184 | 0.0168 |
0.0024 | 6.4034 | 192 | 0.0154 |
0.0103 | 6.6723 | 200 | 0.0146 |
0.0108 | 6.9412 | 208 | 0.0139 |
0.0049 | 7.2017 | 216 | 0.0138 |
0.0025 | 7.4706 | 224 | 0.0139 |
0.0036 | 7.7395 | 232 | 0.0136 |
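To eyeball convergence, the step and validation-loss columns above can be plotted directly; a minimal matplotlib sketch with the values transcribed from the table:

```python
# Plot validation loss vs. step (values transcribed from the table above).
import matplotlib.pyplot as plt

steps = [1, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120,
         128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232]
val_loss = [3.4337, 3.4260, 3.3161, 3.0272, 2.4812, 1.8708, 1.3871, 1.0542,
            0.8591, 0.6723, 0.5369, 0.4286, 0.3691, 0.2802, 0.1948, 0.1457,
            0.1125, 0.0759, 0.0560, 0.0460, 0.0370, 0.0252, 0.0202, 0.0168,
            0.0154, 0.0146, 0.0139, 0.0138, 0.0139, 0.0136]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("step")
plt.ylabel("validation loss")
plt.yscale("log")  # loss spans ~3.4 down to ~0.014, so a log scale is easier to read
plt.title("Validation loss during LoRA finetuning")
plt.tight_layout()
plt.show()
```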
Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.7.0+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1