See axolotl config

axolotl version: `0.12.2`

```yaml
# In case of weird errors, try reinstalling
# pip install --no-build-isolation axolotl[deepspeed]
# (unsloth libraries are incompatible)
base_model: Qwen/Qwen3-14B
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: train
    text_column: text
    type: completion
test_datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: dev
    text_column: text
    type: completion
sequence_len: 512
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
gradient_accumulation_steps: 8 # Remember to check number of GPUs on the instance
micro_batch_size: 4 # 4 on 4xH100, 16 on 8xH100
num_epochs: 2
optimizer: adamw_torch_fused
learning_rate: 2e-5
lr_scheduler: cosine
weight_decay: 0.01
max_grad_norm: 1.0
train_on_inputs:
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
xformers_attention:
flash_attention: true
eager_attention:
# plugins:
# - axolotl.integrations.liger.LigerPlugin
# liger_rope: true
# liger_rms_norm: true
# liger_glu_activation: true
# liger_layer_norm: true
# liger_fused_linear_cross_entropy: true
loss_watchdog_threshold: 10.0
loss_watchdog_patience: 3
warmup_steps: 20
eval_steps: 200
#save_steps: 5000
logging_steps: 5
save_strategy: epoch
save_only_model: true
hub_model_id: sunflower-qwen14b-pretrained
hub_strategy: end
#save_total_limit: 2
# auto_resume_from_checkpoints: true
debug:
deepspeed: zero3_bf16.json
# fsdp:
# - full_shard
# - auto_wrap
# fsdp_config:
# fsdp_version: 2
# fsdp_offload_params: false
# fsdp_cpu_ram_efficient_loading: true
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
# fsdp_state_dict_type: FULL_STATE_DICT
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_reshard_after_forward: true
# fsdp_activation_checkpointing: true
dataset_prepared_path: last_run_prepared
output_dir: ./outputs-14b/
use_wandb: true
use_mlflow: true
wandb_project: ug40-pretraining
# wandb_name also sets mlflow run name
wandb_name: qwen3-14b-updated-dataset
mlflow_tracking_uri: https://mlflow.sunbird.ai
mlflow_experiment_name: ug40-pretraining
# mlflow_run_name: qwen3-14b-convergence-test-lr5e-5
```
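Training with a config like this is typically launched through axolotl's CLI (e.g. `axolotl train <config>.yaml` on recent releases). Before committing a multi-GPU instance it can be worth confirming that the file parses and that the key settings are what you expect; below is a minimal sketch, assuming the config above is saved as `qwen3-14b-pretrain.yaml` (a placeholder filename, not part of this repo) and that PyYAML is installed.

```python
# Minimal sketch: check that the training config parses as valid YAML and
# spot-check a few fields before launching an expensive multi-GPU run.
# "qwen3-14b-pretrain.yaml" is a placeholder filename, not part of this repo.
import yaml

with open("qwen3-14b-pretrain.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["base_model"] == "Qwen/Qwen3-14B"
assert cfg["sequence_len"] == 512 and cfg["sample_packing"] is True
assert cfg["deepspeed"] == "zero3_bf16.json"

print("datasets:", [d["path"] for d in cfg["datasets"]])
print("lr schedule:", cfg["lr_scheduler"], "@", cfg["learning_rate"])
```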
# sunflower-qwen14b-pretrained
This model is a fine-tuned version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) on the [Sunbird/ug40-instructions](https://huggingface.co/datasets/Sunbird/ug40-instructions) dataset. It achieves the following results on the evaluation set:

- Loss: 3.4671
- Max memory active: 86.43 GiB
- Max memory allocated: 83.31 GiB
- Device memory reserved: 89.31 GiB
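Below is a minimal inference sketch, assuming the checkpoint is published as `Sunbird/sunflower-qwen14b-pretrained` (the org prefix is an assumption; the config only sets `hub_model_id: sunflower-qwen14b-pretrained`). Since training used `type: completion` on plain text, the example prompts with raw text rather than a chat template; the prompt itself is just a placeholder.

```python
# Minimal sketch, assuming the checkpoint lives at
# "Sunbird/sunflower-qwen14b-pretrained" (org prefix is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sunbird/sunflower-qwen14b-pretrained"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

# Training used completion-style plain text, so prompt with raw text
# rather than a chat template. The prompt below is only a placeholder.
inputs = tokenizer("Oluganda ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```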
## Model description

More information needed

## Intended uses & limitations

More information needed
## Training and evaluation data

Training and evaluation used the `pretraining_text_qwen` configuration of the [Sunbird/ug40-instructions](https://huggingface.co/datasets/Sunbird/ug40-instructions) dataset, with the `train` split for training and the `dev` split for evaluation, treating the `text` column as completion-style (plain language-modelling) data, as specified in the axolotl config above.
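A minimal sketch of loading the same splits with the `datasets` library (configuration name and column taken from the config above):

```python
# Load the same dataset configuration and splits used for training/eval.
from datasets import load_dataset

train = load_dataset("Sunbird/ug40-instructions", "pretraining_text_qwen", split="train")
dev = load_dataset("Sunbird/ug40-instructions", "pretraining_text_qwen", split="dev")

print(train)                 # row count and columns
print(dev[0]["text"][:200])  # first 200 characters of one dev example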
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- training_steps: 6566
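The total batch sizes above follow directly from the per-device settings and the 4 GPUs used in this run; a quick arithmetic check:

```python
# Effective batch sizes implied by the per-device hyperparameters above.
micro_batch_size = 4        # per-GPU batch size
gradient_accumulation = 8
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 16
```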
### Training results
| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 5.0475 | 32.9 | 31.23 | 33.76 |
| 1.9221 | 0.0609 | 200 | 3.9620 | 86.26 | 83.31 | 89.25 |
| 1.7596 | 0.1218 | 400 | 3.7963 | 86.26 | 83.31 | 89.25 |
| 1.6725 | 0.1827 | 600 | 3.7146 | 86.26 | 83.31 | 89.25 |
| 1.5979 | 0.2436 | 800 | 3.6525 | 86.26 | 83.31 | 89.31 |
| 1.5777 | 0.3045 | 1000 | 3.6217 | 86.43 | 83.31 | 89.31 |
| 1.5402 | 0.3654 | 1200 | 3.5778 | 86.43 | 83.31 | 89.31 |
| 1.4566 | 0.4263 | 1400 | 3.5412 | 86.43 | 83.31 | 89.31 |
| 1.4802 | 0.4872 | 1600 | 3.5108 | 86.43 | 83.31 | 89.31 |
| 1.4387 | 0.5482 | 1800 | 3.4920 | 86.43 | 83.31 | 89.31 |
| 1.4597 | 0.6091 | 2000 | 3.4641 | 86.43 | 83.31 | 89.31 |
| 1.4184 | 0.6700 | 2200 | 3.4305 | 86.43 | 83.31 | 89.31 |
| 1.3884 | 0.7309 | 2400 | 3.4378 | 86.43 | 83.31 | 89.31 |
| 1.3969 | 0.7918 | 2600 | 3.4255 | 86.43 | 83.31 | 89.31 |
| 1.386 | 0.8527 | 2800 | 3.4179 | 86.43 | 83.31 | 89.31 |
| 1.3878 | 0.9136 | 3000 | 3.4013 | 86.43 | 83.31 | 89.31 |
| 1.3527 | 0.9745 | 3200 | 3.3740 | 86.43 | 83.31 | 89.31 |
| 1.235 | 1.0353 | 3400 | 3.3815 | 86.43 | 83.31 | 89.31 |
| 1.2022 | 1.0962 | 3600 | 3.3864 | 86.43 | 83.31 | 89.31 |
| 1.2686 | 1.1571 | 3800 | 3.3910 | 86.43 | 83.31 | 89.31 |
| 1.1872 | 1.2180 | 4000 | 3.4042 | 86.43 | 83.31 | 89.31 |
| 1.1492 | 1.2789 | 4200 | 3.4116 | 86.43 | 83.31 | 89.31 |
| 1.1509 | 1.3399 | 4400 | 3.4143 | 86.43 | 83.31 | 89.31 |
| 1.1203 | 1.4008 | 4600 | 3.4283 | 86.43 | 83.31 | 89.31 |
| 1.1141 | 1.4617 | 4800 | 3.4334 | 86.43 | 83.31 | 89.31 |
| 1.0503 | 1.5226 | 5000 | 3.4457 | 86.43 | 83.31 | 89.31 |
| 1.0882 | 1.5835 | 5200 | 3.4416 | 86.43 | 83.31 | 89.31 |
| 1.0906 | 1.6444 | 5400 | 3.4468 | 86.43 | 83.31 | 89.31 |
| 1.1084 | 1.7053 | 5600 | 3.4555 | 86.43 | 83.31 | 89.31 |
| 1.0827 | 1.7662 | 5800 | 3.4560 | 86.43 | 83.31 | 89.31 |
| 1.0913 | 1.8271 | 6000 | 3.4650 | 86.43 | 83.31 | 89.31 |
| 1.0717 | 1.8880 | 6200 | 3.4688 | 86.43 | 83.31 | 89.31 |
| 1.0629 | 1.9489 | 6400 | 3.4671 | 86.43 | 83.31 | 89.31 |
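Validation loss reaches its minimum of 3.3740 at step 3200 (roughly the end of the first epoch) and drifts upward through the second epoch while training loss keeps falling, which is consistent with mild overfitting. Assuming the reported loss is the mean per-token cross-entropy in natural log (as logged by the HF Trainer), it maps to perplexity as follows:

```python
# Validation loss -> perplexity, assuming mean per-token cross-entropy (natural log).
import math

best_eval_loss = 3.3740   # step 3200 (end of epoch 1)
final_eval_loss = 3.4671  # step 6400 (end of epoch 2)

print(f"best perplexity:  {math.exp(best_eval_loss):.1f}")   # ~29.2
print(f"final perplexity: {math.exp(final_eval_loss):.1f}")  # ~32.0
```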
### Framework versions
- Transformers 4.55.2
- PyTorch 2.7.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
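A small sketch for checking that a local environment matches these versions before trying to reproduce results (package names as published on PyPI; the `+cu128` CUDA build suffix is ignored here):

```python
# Compare locally installed package versions against those listed above.
from importlib.metadata import version

for pkg, expected in {
    "transformers": "4.55.2",
    "torch": "2.7.1",       # CUDA build suffix (+cu128) not compared
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}.items():
    installed = version(pkg)
    flag = "" if installed.startswith(expected) else "  <-- differs"
    print(f"{pkg}: installed {installed}, card lists {expected}{flag}")
```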