Meta-Llama-3.1-8B-Instruct-abliterated finetuned on the ICONN-1-BasicChat-Data-SuperLite dataset, as requested by @Enderchef in https://huggingface.co/mradermacher/model_requests/discussions/920

Built with Axolotl

axolotl version: 0.9.0

base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: Sabresooth/Sabresooth_Train
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00004

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>
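
The LoRA settings in the config above (lora_r: 16, lora_alpha: 32, lora_target_linear: true) can be turned into a rough adapter size estimate. The sketch below is illustrative only: the layer dimensions are the commonly published Llama 3.1 8B values and are an assumption here, since the config does not state them, and the helper names are hypothetical. It also assumes the adapter covers the seven linear projections in each decoder layer and uses the standard alpha / r scaling.

```python
# Assumed Llama 3.1 8B dimensions (not stated in the config above).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads x 128 head dim
NUM_LAYERS = 32        # decoder layers

LORA_R = 16            # lora_r from the config
LORA_ALPHA = 32        # lora_alpha from the config

# (in_features, out_features) for each linear projection in one decoder layer.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def lora_param_count(r: int, shapes: dict, num_layers: int) -> int:
    """Each adapted linear layer adds A (r x in) and B (out x r) matrices."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


if __name__ == "__main__":
    print("scaling:", lora_scaling(LORA_ALPHA, LORA_R))
    print("trainable LoRA params:", lora_param_count(LORA_R, LINEAR_SHAPES, NUM_LAYERS))
```

Under these assumptions the adapter is a small fraction of the 8B base weights, which is consistent with training it under FSDP with parameter offload on two GPUs.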

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 8.0
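
The total batch sizes listed above follow from the per-device values. A minimal sketch of that arithmetic, with a hypothetical helper name, assuming the usual convention that gradient accumulation applies to training only:

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step (or one eval step when grad_accum_steps=1)."""
    return per_device * num_devices * grad_accum_steps


# Values taken from the hyperparameter list above.
total_train = effective_batch_size(per_device=2, num_devices=2, grad_accum_steps=4)
total_eval = effective_batch_size(per_device=2, num_devices=2)

print(total_train)  # 2 * 2 * 4
print(total_eval)   # 2 * 2
```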

Training results

Training Loss   Epoch    Step   Validation Loss
3.4056          0.0336   1      4.5655
3.9338          0.2689   8      4.2118
1.4716          0.5378   16     2.0672
0.4684          0.8067   24     1.0214
0.0732          1.0672   32     0.4799
0.081           1.3361   40     0.0248
0.0064          1.6050   48     0.0024
0.0013          1.8739   56     0.0014
0.0004          2.1345   64     0.0003
0.0003          2.4034   72     0.0003
0.0002          2.6723   80     0.0005
0.0001          2.9412   88     0.0001
0.0001          3.2017   96     0.0001
0.0001          3.4706   104    0.0001
0.0002          3.7395   112    0.0001
0.0001          4.0      120    0.0001
0.0001          4.2689   128    0.0001
0.0001          4.5378   136    0.0001
0.0001          4.8067   144    0.0001
0.0001          5.0672   152    0.0001
0.0001          5.3361   160    0.0001
0.0001          5.6050   168    0.0001
0.0001          5.8739   176    0.0001
0.0001          6.1345   184    0.0001
0.0001          6.4034   192    0.0001
0.0             6.6723   200    0.0001
0.0             6.9412   208    0.0001
0.0001          7.2017   216    0.0001
0.0001          7.4706   224    0.0001
0.0001          7.7395   232    0.0001

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.0+cu128
  • Datasets 3.5.0
  • Tokenizers 0.21.1

Model size: 8.03B params (Safetensors), tensor type: BF16
