See axolotl config

axolotl version: `0.12.2`

```yaml
# In case of weird errors, try reinstalling
# pip install --no-build-isolation axolotl[deepspeed]
# (unsloth libraries are incompatible)
base_model: Qwen/Qwen3-14B
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: train
    text_column: text
    type: completion
test_datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: dev
    text_column: text
    type: completion
sequence_len: 512
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
gradient_accumulation_steps: 8 # Remember to check number of GPUs on the instance
micro_batch_size: 4 # 4 on 4xH100, 16 on 8xH100
num_epochs: 2
optimizer: adamw_torch_fused
learning_rate: 2e-5
lr_scheduler: cosine
weight_decay: 0.01
max_grad_norm: 1.0
train_on_inputs:
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
xformers_attention:
flash_attention: true
eager_attention:
# plugins:
# - axolotl.integrations.liger.LigerPlugin
# liger_rope: true
# liger_rms_norm: true
# liger_glu_activation: true
# liger_layer_norm: true
# liger_fused_linear_cross_entropy: true
loss_watchdog_threshold: 10.0
loss_watchdog_patience: 3
warmup_steps: 20
eval_steps: 200
#save_steps: 5000
logging_steps: 5
save_strategy: epoch
save_only_model: true
hub_model_id: sunflower-qwen14b-pretrained
hub_strategy: end
#save_total_limit: 2
# auto_resume_from_checkpoints: true
debug:
deepspeed: zero3_bf16.json
# fsdp:
# - full_shard
# - auto_wrap
# fsdp_config:
# fsdp_version: 2
# fsdp_offload_params: false
# fsdp_cpu_ram_efficient_loading: true
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
# fsdp_state_dict_type: FULL_STATE_DICT
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_reshard_after_forward: true
# fsdp_activation_checkpointing: true
dataset_prepared_path: last_run_prepared
output_dir: ./outputs-14b/
use_wandb: true
use_mlflow: true
wandb_project: ug40-pretraining
# wandb_name also sets mlflow run name
wandb_name: qwen3-14b-updated-dataset
mlflow_tracking_uri: https://mlflow.sunbird.ai
mlflow_experiment_name: ug40-pretraining
# mlflow_run_name: qwen3-14b-convergence-test-lr5e-5
```
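Training with a config like this is typically launched through axolotl's CLI (e.g. `axolotl train <config>.yaml` on recent releases). Before committing a multi-GPU instance it can be worth confirming that the file parses and that the key settings are what you expect; below is a minimal sketch, assuming the config above is saved as `qwen3-14b-pretrain.yaml` (a placeholder filename, not part of this repo) and that PyYAML is installed.

```python
# Minimal sketch: check that the training config parses as valid YAML and
# spot-check a few fields before launching an expensive multi-GPU run.
# "qwen3-14b-pretrain.yaml" is a placeholder filename, not part of this repo.
import yaml

with open("qwen3-14b-pretrain.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["base_model"] == "Qwen/Qwen3-14B"
assert cfg["sequence_len"] == 512 and cfg["sample_packing"] is True
assert cfg["deepspeed"] == "zero3_bf16.json"

print("datasets:", [d["path"] for d in cfg["datasets"]])
print("lr schedule:", cfg["lr_scheduler"], "@", cfg["learning_rate"])
```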
# sunflower-qwen14b-pretrained
This model is a fine-tuned version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) on the [Sunbird/ug40-instructions](https://huggingface.co/datasets/Sunbird/ug40-instructions) dataset. It achieves the following results on the evaluation set:

- Loss: 3.4671
- Max memory active: 86.43 GiB
- Max memory allocated: 83.31 GiB
- Device memory reserved: 89.31 GiB
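Below is a minimal inference sketch, assuming the checkpoint is published as `Sunbird/sunflower-qwen14b-pretrained` (the org prefix is an assumption; the config only sets `hub_model_id: sunflower-qwen14b-pretrained`). Since training used `type: completion` on plain text, the example prompts with raw text rather than a chat template; the prompt itself is just a placeholder.

```python
# Minimal sketch, assuming the checkpoint lives at
# "Sunbird/sunflower-qwen14b-pretrained" (org prefix is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sunbird/sunflower-qwen14b-pretrained"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

# Training used completion-style plain text, so prompt with raw text
# rather than a chat template. The prompt below is only a placeholder.
inputs = tokenizer("Oluganda ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```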
## Model description

More information needed

## Intended uses & limitations

More information needed
## Training and evaluation data

Training and evaluation used the `pretraining_text_qwen` configuration of the [Sunbird/ug40-instructions](https://huggingface.co/datasets/Sunbird/ug40-instructions) dataset, with the `train` split for training and the `dev` split for evaluation, treating the `text` column as completion-style (plain language-modelling) data, as specified in the axolotl config above.
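A minimal sketch of loading the same splits with the `datasets` library (configuration name and column taken from the config above):

```python
# Load the same dataset configuration and splits used for training/eval.
from datasets import load_dataset

train = load_dataset("Sunbird/ug40-instructions", "pretraining_text_qwen", split="train")
dev = load_dataset("Sunbird/ug40-instructions", "pretraining_text_qwen", split="dev")

print(train)                 # row count and columns
print(dev[0]["text"][:200])  # first 200 characters of one dev example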
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- training_steps: 6566
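The total batch sizes above follow directly from the per-device settings and the 4 GPUs used in this run; a quick arithmetic check:

```python
# Effective batch sizes implied by the per-device hyperparameters above.
micro_batch_size = 4        # per-GPU batch size
gradient_accumulation = 8
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 16
```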
### Training results
| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 5.0475 | 32.9 | 31.23 | 33.76 |
| 1.9221 | 0.0609 | 200 | 3.9620 | 86.26 | 83.31 | 89.25 |
| 1.7596 | 0.1218 | 400 | 3.7963 | 86.26 | 83.31 | 89.25 |
| 1.6725 | 0.1827 | 600 | 3.7146 | 86.26 | 83.31 | 89.25 |
| 1.5979 | 0.2436 | 800 | 3.6525 | 86.26 | 83.31 | 89.31 |
| 1.5777 | 0.3045 | 1000 | 3.6217 | 86.43 | 83.31 | 89.31 |
| 1.5402 | 0.3654 | 1200 | 3.5778 | 86.43 | 83.31 | 89.31 |
| 1.4566 | 0.4263 | 1400 | 3.5412 | 86.43 | 83.31 | 89.31 |
| 1.4802 | 0.4872 | 1600 | 3.5108 | 86.43 | 83.31 | 89.31 |
| 1.4387 | 0.5482 | 1800 | 3.4920 | 86.43 | 83.31 | 89.31 |
| 1.4597 | 0.6091 | 2000 | 3.4641 | 86.43 | 83.31 | 89.31 |
| 1.4184 | 0.6700 | 2200 | 3.4305 | 86.43 | 83.31 | 89.31 |
| 1.3884 | 0.7309 | 2400 | 3.4378 | 86.43 | 83.31 | 89.31 |
| 1.3969 | 0.7918 | 2600 | 3.4255 | 86.43 | 83.31 | 89.31 |
| 1.386 | 0.8527 | 2800 | 3.4179 | 86.43 | 83.31 | 89.31 |
| 1.3878 | 0.9136 | 3000 | 3.4013 | 86.43 | 83.31 | 89.31 |
| 1.3527 | 0.9745 | 3200 | 3.3740 | 86.43 | 83.31 | 89.31 |
| 1.235 | 1.0353 | 3400 | 3.3815 | 86.43 | 83.31 | 89.31 |
| 1.2022 | 1.0962 | 3600 | 3.3864 | 86.43 | 83.31 | 89.31 |
| 1.2686 | 1.1571 | 3800 | 3.3910 | 86.43 | 83.31 | 89.31 |
| 1.1872 | 1.2180 | 4000 | 3.4042 | 86.43 | 83.31 | 89.31 |
| 1.1492 | 1.2789 | 4200 | 3.4116 | 86.43 | 83.31 | 89.31 |
| 1.1509 | 1.3399 | 4400 | 3.4143 | 86.43 | 83.31 | 89.31 |
| 1.1203 | 1.4008 | 4600 | 3.4283 | 86.43 | 83.31 | 89.31 |
| 1.1141 | 1.4617 | 4800 | 3.4334 | 86.43 | 83.31 | 89.31 |
| 1.0503 | 1.5226 | 5000 | 3.4457 | 86.43 | 83.31 | 89.31 |
| 1.0882 | 1.5835 | 5200 | 3.4416 | 86.43 | 83.31 | 89.31 |
| 1.0906 | 1.6444 | 5400 | 3.4468 | 86.43 | 83.31 | 89.31 |
| 1.1084 | 1.7053 | 5600 | 3.4555 | 86.43 | 83.31 | 89.31 |
| 1.0827 | 1.7662 | 5800 | 3.4560 | 86.43 | 83.31 | 89.31 |
| 1.0913 | 1.8271 | 6000 | 3.4650 | 86.43 | 83.31 | 89.31 |
| 1.0717 | 1.8880 | 6200 | 3.4688 | 86.43 | 83.31 | 89.31 |
| 1.0629 | 1.9489 | 6400 | 3.4671 | 86.43 | 83.31 | 89.31 |
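Validation loss reaches its minimum of 3.3740 at step 3200 (roughly the end of the first epoch) and drifts upward through the second epoch while training loss keeps falling, which is consistent with mild overfitting. Assuming the reported loss is the mean per-token cross-entropy in natural log (as logged by the HF Trainer), it maps to perplexity as follows:

```python
# Validation loss -> perplexity, assuming mean per-token cross-entropy (natural log).
import math

best_eval_loss = 3.3740   # step 3200 (end of epoch 1)
final_eval_loss = 3.4671  # step 6400 (end of epoch 2)

print(f"best perplexity:  {math.exp(best_eval_loss):.1f}")   # ~29.2
print(f"final perplexity: {math.exp(final_eval_loss):.1f}")  # ~32.0
```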
### Framework versions
- Transformers 4.55.2
- PyTorch 2.7.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
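A small sketch for checking that a local environment matches these versions before trying to reproduce results (package names as published on PyPI; the `+cu128` CUDA build suffix is ignored here):

```python
# Compare locally installed package versions against those listed above.
from importlib.metadata import version

for pkg, expected in {
    "transformers": "4.55.2",
    "torch": "2.7.1",       # CUDA build suffix (+cu128) not compared
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}.items():
    installed = version(pkg)
    flag = "" if installed.startswith(expected) else "  <-- differs"
    print(f"{pkg}: installed {installed}, card lists {expected}{flag}")
```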