Built with Axolotl

Axolotl config (version 0.9.2) used for training:

```yaml
adapter: lora
base_model: Qwen/Qwen3-4B
bf16: true

# Dataset & Data Loading
dataset_processes: 32
chat_template: chatml
datasets:
- message_property_mappings:
    content: content
    role: role
  path: dougiefresh/jade_identity
  train_split: train
  valid_split: valid
  trust_remote_code: false
  type: chat_template

# Training Efficiency
micro_batch_size: 32
gradient_accumulation_steps: 2
gradient_checkpointing: true

# LoRA Settings
lora_alpha: 64
lora_dropout: 0.05
lora_r: 64
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj

# Optimization
learning_rate: 0.000008     # ↓ lower LR for stability
lr_scheduler: cosine
warmup_ratio: 0.2          # ↑ slightly longer warmup for smoother start
optimizer: adamw_torch_fused

# Sequence Length & Packing
sequence_len: 2048         # ↓ 32K is overkill for identity Q&A
max_prompt_len: 2048
sample_packing_bin_size: 256
sample_packing_group_size: 200000

# Saving & Evaluation
num_epochs: 30.0                     # ↑ train longer on smaller dataset
output_dir: ./outputs/identity_adapter
save_only_model: false
save_safetensors: true
val_set_size: 0.2                    # ↑ larger validation split
eval_steps: 50                       # ↑ more frequent eval
save_steps: 50                       # ↑ save often to prevent data loss
load_best_model_at_end: true

# Training Behavior
train_on_inputs: false
shuffle_merged_datasets: true
skip_prepare_dataset: false
auto_resume_from_checkpoints: true
weight_decay: 0.01

# Advanced
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 10000
qlora_sharded_model_loading: false
mean_resizing_embeddings: false
strict: false

# TRL
trl:
  log_completions: false
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false

# Hardware
load_in_4bit: false
load_in_8bit: false
use_ray: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1

callbacks:
  - type: ReduceLROnPlateau
    monitor: eval_loss
    factor: 0.5
    patience: 3
    mode: min
    min_lr: 1e-7

  - type: EarlyStoppingCallback
    monitor: eval_loss
    patience: 6
    mode: min

# Logging
use_tensorboard: true
logging_dir: ./outputs/tensorboard
logging_first_step: true
logging_steps: 10
```
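For readers working with PEFT directly, the "LoRA Settings" block above corresponds roughly to the following LoraConfig. This is an illustrative sketch only; Axolotl builds its own PEFT config internally from the YAML.

```python
from peft import LoraConfig

# Mirrors the "LoRA Settings" block above; illustrative only --
# Axolotl constructs the actual PEFT config from the YAML itself.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)
```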

outputs/identity_adapter

This model is a fine-tuned version of Qwen/Qwen3-4B on the dougiefresh/jade_identity dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3335

Model description

A LoRA adapter (r=64, alpha=64, dropout 0.05) over the attention and MLP projection modules of Qwen/Qwen3-4B, trained in bf16. The adapter adjusts the base model's identity/self-description behavior using the dougiefresh/jade_identity chat dataset.

Intended uses & limitations

This repository contains only the LoRA adapter weights; it is intended to be applied on top of Qwen/Qwen3-4B at load time, as sketched below. Because it was trained for 30 epochs on a small identity dataset, it is narrow in scope and is not a general-purpose fine-tune.
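A minimal loading sketch with Transformers and PEFT (matching the framework versions listed below). The repo id dougiefresh/jade_qwen_4b_identity_adapter is assumed here; adjust the paths to wherever the adapter actually lives.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

# Apply the identity adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, "dougiefresh/jade_qwen_4b_identity_adapter")

messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```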

Training and evaluation data

Training used the train split of dougiefresh/jade_identity in chat_template (ChatML) format with role/content message mappings; per the config, 20% of the data was held out for validation (val_set_size: 0.2).

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 184
  • num_epochs: 30.0
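The derived quantities above follow directly from the config; a quick sketch of the arithmetic. The ~920 total optimizer steps are an assumption back-solved from the reported warmup, not a value stated in the logs.

```python
micro_batch_size = 32
gradient_accumulation_steps = 2
world_size = 1  # resources_per_worker -> GPU: 1

# Matches the reported total_train_batch_size of 64.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * world_size
assert total_train_batch_size == 64

# warmup_ratio is 0.2 and the trainer reports 184 warmup steps,
# consistent with roughly 920 total optimizer steps (assumption: 184 / 0.2).
warmup_ratio = 0.2
total_steps = 920
print(int(warmup_ratio * total_steps))  # -> 184
```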

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|--------------:|--------:|-----:|----------------:|
| No log        | 0.0323  | 1    | 7.7014          |
| 7.2709        | 1.6129  | 50   | 7.0879          |
| 4.9858        | 3.2258  | 100  | 4.8536          |
| 3.5705        | 4.8387  | 150  | 3.4831          |
| 2.839         | 6.4516  | 200  | 2.9379          |
| 2.5697        | 8.0645  | 250  | 2.6852          |
| 2.3997        | 9.6774  | 300  | 2.5461          |
| 2.2486        | 11.2903 | 350  | 2.4681          |
| 2.1874        | 12.9032 | 400  | 2.4054          |
| 2.0334        | 14.5161 | 450  | 2.3724          |
| 1.9825        | 16.1290 | 500  | 2.3459          |
| 1.9212        | 17.7419 | 550  | 2.3317          |
| 1.8507        | 19.3548 | 600  | 2.3255          |
| 1.8262        | 20.9677 | 650  | 2.3246          |
| 1.8001        | 22.5806 | 700  | 2.3292          |
| 1.7335        | 24.1935 | 750  | 2.3303          |
| 1.751         | 25.8065 | 800  | 2.3328          |
| 1.7384        | 27.4194 | 850  | 2.3327          |
| 1.7723        | 29.0323 | 900  | 2.3335          |

Validation loss bottoms out at 2.3246 (step 650, epoch ~21) and drifts slightly upward afterwards; with load_best_model_at_end: true, the checkpoint from around that step should be the one restored at the end of training.

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1