---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- generated_from_trainer
datasets:
- dougiefresh/jade_identity
model-index:
- name: outputs/identity_adapter
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.9.2`
```yaml
adapter: lora
base_model: Qwen/Qwen3-4B
bf16: true

# Dataset & Data Loading
dataset_processes: 32
chat_template: chatml
datasets:
  - message_property_mappings:
      content: content
      role: role
    path: dougiefresh/jade_identity
    train_split: train
    valid_split: valid
    trust_remote_code: false
    type: chat_template

# Training Efficiency
micro_batch_size: 32
gradient_accumulation_steps: 2
gradient_checkpointing: true

# LoRA Settings
lora_alpha: 64
lora_dropout: 0.05
lora_r: 64
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

# Optimization
learning_rate: 0.000008  # ↓ lower LR for stability
lr_scheduler: cosine
warmup_ratio: 0.2  # ↑ slightly longer warmup for smoother start
optimizer: adamw_torch_fused

# Sequence Length & Packing
sequence_len: 2048  # ↓ 32K is overkill for identity Q&A
max_prompt_len: 2048
sample_packing_bin_size: 256
sample_packing_group_size: 200000

# Saving & Evaluation
num_epochs: 30.0  # ↑ train longer on smaller dataset
output_dir: ./outputs/identity_adapter
save_only_model: false
save_safetensors: true
val_set_size: 0.2  # ↑ larger validation split
eval_steps: 50  # ↑ more frequent eval
save_steps: 50  # ↑ save often to prevent data loss
load_best_model_at_end: true

# Training Behavior
train_on_inputs: false
shuffle_merged_datasets: true
skip_prepare_dataset: false
auto_resume_from_checkpoints: true
weight_decay: 0.01

# Advanced
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 10000
qlora_sharded_model_loading: false
mean_resizing_embeddings: false
strict: false

# TRL
trl:
  log_completions: false
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false

# Hardware
load_in_4bit: false
load_in_8bit: false
use_ray: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1

callbacks:
  - type: ReduceLROnPlateau
    monitor: eval_loss
    factor: 0.5
    patience: 3
    mode: min
    min_lr: 1e-7
  - type: EarlyStoppingCallback
    monitor: eval_loss
    patience: 6
    mode: min

# Logging
use_tensorboard: true
logging_dir: ./outputs/tensorboard
logging_first_step: true
logging_steps: 10
```

</details><br>
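For readers reproducing the adapter outside Axolotl, the LoRA section of the config above corresponds roughly to the following PEFT `LoraConfig`. This is a minimal sketch of the equivalent hyperparameters, not the exact object Axolotl constructs internally.

```python
# Rough PEFT equivalent of the LoRA settings in the Axolotl config above
# (a reference sketch, not the object Axolotl builds internally).
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,               # lora_r
    lora_alpha=64,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    target_modules=[    # lora_target_modules
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Targeting all attention and MLP projections with r = alpha = 64 gives the adapter comparatively high capacity for a 4B base model.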

# outputs/identity_adapter

This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on the dougiefresh/jade_identity dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3335

## Model description

This is a LoRA adapter (r=64, alpha=64, dropout 0.05 on all attention and MLP projection layers) for Qwen/Qwen3-4B, trained with Axolotl on identity-focused chat data.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the dougiefresh/jade_identity dataset formatted with the ChatML chat template; 20% of the data was held out for evaluation (`val_set_size: 0.2`), and only assistant completions contributed to the loss (`train_on_inputs: false`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 184
- num_epochs: 30.0

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.0323  | 1    | 7.7014          |
| 7.2709        | 1.6129  | 50   | 7.0879          |
| 4.9858        | 3.2258  | 100  | 4.8536          |
| 3.5705        | 4.8387  | 150  | 3.4831          |
| 2.839         | 6.4516  | 200  | 2.9379          |
| 2.5697        | 8.0645  | 250  | 2.6852          |
| 2.3997        | 9.6774  | 300  | 2.5461          |
| 2.2486        | 11.2903 | 350  | 2.4681          |
| 2.1874        | 12.9032 | 400  | 2.4054          |
| 2.0334        | 14.5161 | 450  | 2.3724          |
| 1.9825        | 16.1290 | 500  | 2.3459          |
| 1.9212        | 17.7419 | 550  | 2.3317          |
| 1.8507        | 19.3548 | 600  | 2.3255          |
| 1.8262        | 20.9677 | 650  | 2.3246          |
| 1.8001        | 22.5806 | 700  | 2.3292          |
| 1.7335        | 24.1935 | 750  | 2.3303          |
| 1.751         | 25.8065 | 800  | 2.3328          |
| 1.7384        | 27.4194 | 850  | 2.3327          |
| 1.7723        | 29.0323 | 900  | 2.3335          |

### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1
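Since this is a PEFT LoRA adapter rather than a full model, it must be loaded on top of the Qwen/Qwen3-4B base. Below is a minimal inference sketch; the adapter path is an assumption (point it at the published adapter repo or the local `outputs/identity_adapter` checkpoint directory).

```python
# Minimal inference sketch: load the base model, attach the LoRA adapter, and chat.
# The adapter path below is an assumption -- replace it with the actual repo id or
# local checkpoint directory for this adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base, "./outputs/identity_adapter")  # assumed adapter path

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If a standalone model is needed, `model.merge_and_unload()` folds the adapter weights into the base model so it can be saved and served without PEFT.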