---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- generated_from_trainer
datasets:
- dougiefresh/jade_identity
model-index:
- name: outputs/identity_adapter
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.9.2`
```yaml
adapter: lora
base_model: Qwen/Qwen3-4B
bf16: true

# Dataset & Data Loading
dataset_processes: 32
chat_template: chatml
datasets:
  - message_property_mappings:
      content: content
      role: role
    path: dougiefresh/jade_identity
    train_split: train
    valid_split: valid
    trust_remote_code: false
    type: chat_template

# Training Efficiency
micro_batch_size: 32
gradient_accumulation_steps: 2
gradient_checkpointing: true

# LoRA Settings
lora_alpha: 64
lora_dropout: 0.05
lora_r: 64
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

# Optimization
learning_rate: 0.000008  # ↓ lower LR for stability
lr_scheduler: cosine
warmup_ratio: 0.2  # ↑ slightly longer warmup for smoother start
optimizer: adamw_torch_fused

# Sequence Length & Packing
sequence_len: 2048  # ↓ 32K is overkill for identity Q&A
max_prompt_len: 2048
sample_packing_bin_size: 256
sample_packing_group_size: 200000

# Saving & Evaluation
num_epochs: 30.0  # ↑ train longer on smaller dataset
output_dir: ./outputs/identity_adapter
save_only_model: false
save_safetensors: true
val_set_size: 0.2  # ↑ larger validation split
eval_steps: 50  # ↑ more frequent eval
save_steps: 50  # ↑ save often to prevent data loss
load_best_model_at_end: true

# Training Behavior
train_on_inputs: false
shuffle_merged_datasets: true
skip_prepare_dataset: false
auto_resume_from_checkpoints: true
weight_decay: 0.01

# Advanced
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 10000
qlora_sharded_model_loading: false
mean_resizing_embeddings: false
strict: false

# TRL
trl:
  log_completions: false
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false

# Hardware
load_in_4bit: false
load_in_8bit: false
use_ray: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1

callbacks:
  - type: ReduceLROnPlateau
    monitor: eval_loss
    factor: 0.5
    patience: 3
    mode: min
    min_lr: 1e-7
  - type: EarlyStoppingCallback
    monitor: eval_loss
    patience: 6
    mode: min

# Logging
use_tensorboard: true
logging_dir: ./outputs/tensorboard
logging_first_step: true
logging_steps: 10
```

</details><br>
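For readers reproducing the adapter outside Axolotl, the LoRA section of the config above corresponds roughly to the following PEFT `LoraConfig`. This is a minimal sketch of the equivalent hyperparameters, not the exact object Axolotl constructs internally.

```python
# Rough PEFT equivalent of the LoRA settings in the Axolotl config above
# (a reference sketch, not the object Axolotl builds internally).
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,               # lora_r
    lora_alpha=64,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    target_modules=[    # lora_target_modules
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Targeting all attention and MLP projections with r = alpha = 64 gives the adapter comparatively high capacity for a 4B base model.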

# outputs/identity_adapter

This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on the dougiefresh/jade_identity dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3335

## Model description

This is a LoRA adapter (r=64, alpha=64, dropout 0.05 on all attention and MLP projection layers) for Qwen/Qwen3-4B, trained with Axolotl on identity-focused chat data.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the dougiefresh/jade_identity dataset formatted with the ChatML chat template; 20% of the data was held out for evaluation (`val_set_size: 0.2`), and only assistant completions contributed to the loss (`train_on_inputs: false`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 184
- num_epochs: 30.0

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.0323  | 1    | 7.7014          |
| 7.2709        | 1.6129  | 50   | 7.0879          |
| 4.9858        | 3.2258  | 100  | 4.8536          |
| 3.5705        | 4.8387  | 150  | 3.4831          |
| 2.839         | 6.4516  | 200  | 2.9379          |
| 2.5697        | 8.0645  | 250  | 2.6852          |
| 2.3997        | 9.6774  | 300  | 2.5461          |
| 2.2486        | 11.2903 | 350  | 2.4681          |
| 2.1874        | 12.9032 | 400  | 2.4054          |
| 2.0334        | 14.5161 | 450  | 2.3724          |
| 1.9825        | 16.1290 | 500  | 2.3459          |
| 1.9212        | 17.7419 | 550  | 2.3317          |
| 1.8507        | 19.3548 | 600  | 2.3255          |
| 1.8262        | 20.9677 | 650  | 2.3246          |
| 1.8001        | 22.5806 | 700  | 2.3292          |
| 1.7335        | 24.1935 | 750  | 2.3303          |
| 1.751         | 25.8065 | 800  | 2.3328          |
| 1.7384        | 27.4194 | 850  | 2.3327          |
| 1.7723        | 29.0323 | 900  | 2.3335          |

### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1
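Since this is a PEFT LoRA adapter rather than a full model, it must be loaded on top of the Qwen/Qwen3-4B base. Below is a minimal inference sketch; the adapter path is an assumption (point it at the published adapter repo or the local `outputs/identity_adapter` checkpoint directory).

```python
# Minimal inference sketch: load the base model, attach the LoRA adapter, and chat.
# The adapter path below is an assumption -- replace it with the actual repo id or
# local checkpoint directory for this adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base, "./outputs/identity_adapter")  # assumed adapter path

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If a standalone model is needed, `model.merge_and_unload()` folds the adapter weights into the base model so it can be saved and served without PEFT.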