Meta-Llama-3.1-8B-Instruct-abliterated fine-tuned on the ICONN-1-BasicChat-Data-SuperLite dataset, as requested by @Enderchef in https://huggingface.co/mradermacher/model_requests/discussions/920.

Trained with axolotl version 0.9.0 using the following configuration:
base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: Sabresooth/Sabresooth_Train
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00004

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD

special_tokens:
  pad_token: <|end_of_text|>
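For reference, the sketch below shows how a dataset row with the `system`/`input`/`output` fields above is typically rendered with the llama3 chat template. It is a minimal illustration only: the example row is made up, and the Hugging Face repo id is an assumption (the training run itself loaded the base model from a local path).

```python
from transformers import AutoTokenizer

# Assumption: a public copy of the abliterated base model; training used /dpool/Meta-Llama-3.1-8B-Instruct-abliterated.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated")

# One hypothetical dataset row, mirroring field_system / field_instruction / field_output.
row = {"system": "", "input": "Hello, who are you?", "output": "Hi! How can I help?"}

messages = []
if row["system"]:  # system_prompt is "" in the config, so this branch is usually skipped
    messages.append({"role": "system", "content": row["system"]})
messages.append({"role": "user", "content": row["input"]})
messages.append({"role": "assistant", "content": row["output"]})

# The tokenizer's built-in chat template produces the llama3 prompt format.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```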
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 8.0
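The derived totals above follow from the axolotl config; a small sketch of the arithmetic (the eval-side formula is an assumption inferred from the reported value):

```python
# Effective batch sizes implied by the config: batch of 2 per device, 4 accumulation steps, 2 GPUs.
micro_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # evaluation does not accumulate gradients

print(total_train_batch_size)  # 16
print(total_eval_batch_size)   # 4
```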
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
3.4056 | 0.0336 | 1 | 4.5655 |
3.9338 | 0.2689 | 8 | 4.2118 |
1.4716 | 0.5378 | 16 | 2.0672 |
0.4684 | 0.8067 | 24 | 1.0214 |
0.0732 | 1.0672 | 32 | 0.4799 |
0.081 | 1.3361 | 40 | 0.0248 |
0.0064 | 1.6050 | 48 | 0.0024 |
0.0013 | 1.8739 | 56 | 0.0014 |
0.0004 | 2.1345 | 64 | 0.0003 |
0.0003 | 2.4034 | 72 | 0.0003 |
0.0002 | 2.6723 | 80 | 0.0005 |
0.0001 | 2.9412 | 88 | 0.0001 |
0.0001 | 3.2017 | 96 | 0.0001 |
0.0001 | 3.4706 | 104 | 0.0001 |
0.0002 | 3.7395 | 112 | 0.0001 |
0.0001 | 4.0 | 120 | 0.0001 |
0.0001 | 4.2689 | 128 | 0.0001 |
0.0001 | 4.5378 | 136 | 0.0001 |
0.0001 | 4.8067 | 144 | 0.0001 |
0.0001 | 5.0672 | 152 | 0.0001 |
0.0001 | 5.3361 | 160 | 0.0001 |
0.0001 | 5.6050 | 168 | 0.0001 |
0.0001 | 5.8739 | 176 | 0.0001 |
0.0001 | 6.1345 | 184 | 0.0001 |
0.0001 | 6.4034 | 192 | 0.0001 |
0.0 | 6.6723 | 200 | 0.0001 |
0.0 | 6.9412 | 208 | 0.0001 |
0.0001 | 7.2017 | 216 | 0.0001 |
0.0001 | 7.4706 | 224 | 0.0001 |
0.0001 | 7.7395 | 232 | 0.0001 |
Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.7.0+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1
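A minimal inference sketch with these library versions, loading the LoRA adapter on top of the base model via PEFT. Both repo ids are placeholders/assumptions and not confirmed by this card; substitute the actual base checkpoint and this adapter's repo id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumptions: placeholder repo ids, not taken from this card.
base_id = "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated"  # assumed public base checkpoint
adapter_id = "your-username/this-lora-adapter"               # replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```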