See axolotl config

axolotl version: 0.10.0


base_model: Qwen/Qwen3-32B
# Automatically upload checkpoint and final model to HF
hub_model_id: ctitools/neurocti-qwen3-32b-orkl10k-base-v1

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false

chat_template: qwen3
#pretraining_dataset: 
#  - ctitools/orkl_cleaned_10k
#max_steps: 24576

datasets:
  - path: ctitools/orkl_cleaned_10k
    type: completion

val_set_size: 0.01
output_dir: ./outputs/out
dataset_prepared_path: last_run_prepared

sequence_len: 4096
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

#load_in_4bit: false
#load_in_8bit: true
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - down_proj
  - up_proj
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true

bf16: auto
tf32: true

wandb_project: neurocti-qwen3-32b
wandb_entity: aaronkaplan
wandb_watch: 
wandb_name: neurocti-hunting_lora_neurocti-qwen3-32b-orkl10k-base-fb16-r16-lr0.0004-sl4096-e3-v1
wandb_log_model: 

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 3
#optimizer: adamw_torch_4bit
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0004

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
special_tokens:

# multi-gpu setups:
deepspeed: deepspeed_configs/zero2.json
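
The LoRA settings in the config above (lora_r: 16, lora_alpha: 32, six target modules) determine both the adapter's scaling factor and its size. The following is a rough sketch of that arithmetic, not output from axolotl or PEFT. The model dimensions in `ModelDims` are assumptions about Qwen3-32B that are not stated in this card and should be checked against the base model's config.json. The helper names are hypothetical, and the scaling formula assumes standard LoRA (alpha / r) rather than a variant such as rsLoRA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDims:
    """ASSUMED Qwen3-32B dimensions; verify against the model's config.json."""
    hidden_size: int = 5120
    intermediate_size: int = 25600
    num_attention_heads: int = 64
    num_key_value_heads: int = 8
    head_dim: int = 128
    num_hidden_layers: int = 64


# Values taken from the axolotl config above.
LORA_R = 16
LORA_ALPHA = 32
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj", "down_proj", "up_proj"]


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_params_per_layer(dims: ModelDims, r: int, targets: list[str]) -> int:
    """Each adapted linear layer adds r * (in_features + out_features) weights."""
    q_out = dims.num_attention_heads * dims.head_dim
    kv_out = dims.num_key_value_heads * dims.head_dim
    shapes = {
        "q_proj": (dims.hidden_size, q_out),
        "k_proj": (dims.hidden_size, kv_out),
        "v_proj": (dims.hidden_size, kv_out),
        "o_proj": (q_out, dims.hidden_size),
        "up_proj": (dims.hidden_size, dims.intermediate_size),
        "down_proj": (dims.intermediate_size, dims.hidden_size),
    }
    return sum(r * (shapes[name][0] + shapes[name][1]) for name in targets)


dims = ModelDims()
per_layer = lora_params_per_layer(dims, LORA_R, TARGETS)
total = per_layer * dims.num_hidden_layers

print(f"scaling factor (alpha / r): {lora_scaling(LORA_ALPHA, LORA_R)}")
print(f"adapter weights per layer:  {per_layer:,}")
print(f"adapter weights in total:   {total:,}")
```

Under these assumed dimensions the adapter is a small fraction of the 32B base parameters, which is the usual motivation for training with LoRA instead of full fine-tuning.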

neurocti-qwen3-32b-orkl10k-base-v1

This model is a LoRA fine-tune (rank 16, alpha 32) of Qwen/Qwen3-32B on the ctitools/orkl_cleaned_10k dataset. At the final logged evaluation (step 4664, epoch 2.75) it achieves the following result on the evaluation set:

Loss: 1.9131

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

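The model was trained on ctitools/orkl_cleaned_10k as plain completion text (type: completion in the axolotl config). 1% of the data (val_set_size: 0.01) was held out as the evaluation set. More information needed.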

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 4
total_eval_batch_size: 2
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
training_steps: 5085
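
The derived values in the list above follow from the base settings, and the arithmetic can be sketched as below. The function names are hypothetical. The token count is an upper bound that assumes every packed sequence is exactly sequence_len (4096) tokens, as sample_packing and pad_to_sequence_len suggest. The ceiling rounding for the evaluation interval is an assumption, chosen because it reproduces the 424-step spacing seen in the training results; axolotl's actual rule may differ.

```python
import math

# Values reported in the hyperparameter list and the axolotl config.
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 2
SEQUENCE_LEN = 4096
NUM_EPOCHS = 3
TRAINING_STEPS = 5085
EVALS_PER_EPOCH = 4


def effective_batch_size(micro_batch: int, grad_accum: int, devices: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return micro_batch * grad_accum * devices


def tokens_per_optimizer_step(batch_size: int, sequence_len: int) -> int:
    """Upper bound, assuming every sequence is packed/padded to sequence_len."""
    return batch_size * sequence_len


def eval_interval(steps_per_epoch: int, evals_per_epoch: int) -> int:
    """ASSUMED rounding rule (ceiling); not taken from axolotl's source."""
    return math.ceil(steps_per_epoch / evals_per_epoch)


batch = effective_batch_size(MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES)
tokens = tokens_per_optimizer_step(batch, SEQUENCE_LEN)
steps_per_epoch = TRAINING_STEPS // NUM_EPOCHS
interval = eval_interval(steps_per_epoch, EVALS_PER_EPOCH)

print(f"total_train_batch_size:     {batch}")
print(f"tokens per optimizer step:  {tokens}")
print(f"optimizer steps per epoch:  {steps_per_epoch}")
print(f"steps between evaluations:  {interval}")
```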

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0	0	2.3042
2.0217	0.2501	424	1.8355
1.7319	0.5003	848	1.8335
1.9541	0.7504	1272	1.8253
1.9703	1.0006	1696	1.8291
1.8948	1.2507	2120	1.8597
1.7536	1.5009	2544	1.9037
1.7786	1.7510	2968	1.8944
1.7746	2.0012	3392	1.8625
1.7543	2.2513	3816	1.8899
1.5163	2.5015	4240	1.9114
1.6959	2.7516	4664	1.9131
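
The validation loss in the table above reaches its minimum well before the end of training and then drifts upward while the training loss keeps falling, a pattern often read as overfitting. The sketch below picks out the best evaluation point from the logged rows and converts losses to perplexity. It assumes the reported loss is the mean per-token cross-entropy in nats, so that perplexity = exp(loss). The helper names are hypothetical, and the rows are copied by hand from the table.

```python
import math

# (epoch, step, validation_loss) rows copied from the training results table.
EVAL_LOG = [
    (0.0, 0, 2.3042),
    (0.2501, 424, 1.8355),
    (0.5003, 848, 1.8335),
    (0.7504, 1272, 1.8253),
    (1.0006, 1696, 1.8291),
    (1.2507, 2120, 1.8597),
    (1.5009, 2544, 1.9037),
    (1.7510, 2968, 1.8944),
    (2.0012, 3392, 1.8625),
    (2.2513, 3816, 1.8899),
    (2.5015, 4240, 1.9114),
    (2.7516, 4664, 1.9131),
]


def best_eval(log):
    """Return the (epoch, step, loss) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def perplexity(loss: float) -> float:
    """Perplexity, ASSUMING loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


best_epoch, best_step, best_loss = best_eval(EVAL_LOG)
final_epoch, final_step, final_loss = EVAL_LOG[-1]

print(f"best:  step {best_step} (epoch {best_epoch}), loss {best_loss}, "
      f"perplexity {perplexity(best_loss):.2f}")
print(f"final: step {final_step} (epoch {final_epoch}), loss {final_loss}, "
      f"perplexity {perplexity(final_loss):.2f}")
print(f"final minus best validation loss: {final_loss - best_loss:.4f}")
```

Since the config sets saves_per_epoch: 1, checkpoints were written only at epoch boundaries, so the lowest-loss evaluation point does not necessarily correspond to a saved checkpoint.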

Framework versions

PEFT 0.15.2
Transformers 4.52.3
Pytorch 2.6.0+cu124
Datasets 3.6.0
Tokenizers 0.21.1
