Experiment Log on a lightweight Python pro coder
#1 · opened by burtenshaw
First MoE training run on wandb: https://wandb.ai/smartwithfood/huggingface/runs/m22i87x1/workspace?nw=nwuserbenjaminburtenshaw
## MoE Training
Completed the first few SFT runs on the MoE model's weights.
### Training
Run this command:

```shell
python trl/trl/scripts/sft.py --config recipes/config_000.yaml
```
This is the config:

```yaml
model_name_or_path: Qwen/Qwen3-30B-A3B
# dataset
dataset_name: burtenshaw/tulu-3-sft-personas-code-no-prompt
dataset_num_proc: 6
text_column: messages
eos_token: '<|im_end|>'
# training
learning_rate: 2.0e-5
num_train_epochs: 1
packing: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
gradient_checkpointing: true
logging_steps: 1
max_length: 2048
warmup_ratio: 0.03
lr_scheduler_type: 'cosine'
bf16: true
bf16_full_eval: true
fp16: false
```
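As a quick sanity check on the batch settings above, here is a minimal sketch that derives the effective batch size and packed tokens per optimizer step from the config values. The `num_devices` count is an assumption (the log does not state the hardware); everything else is copied from the config.

```python
# Values copied from recipes/config_000.yaml above.
config = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 2,
    "max_length": 2048,
    "packing": True,
}

num_devices = 8  # assumption: not stated in the log, adjust to your hardware

# Sequences contributing to one optimizer step across all devices.
sequences_per_step = (
    config["per_device_train_batch_size"]
    * config["gradient_accumulation_steps"]
    * num_devices
)

# With packing enabled, each sequence is filled to max_length,
# so tokens per optimizer step is sequences * max_length.
tokens_per_step = sequences_per_step * config["max_length"]

print(sequences_per_step)  # 16
print(tokens_per_step)     # 32768
```

With these settings each optimizer step sees roughly 32k packed tokens per 8-GPU step, which is why `gradient_checkpointing` is enabled to keep memory in check.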
### Evaluation
The results stay mostly within statistical noise on all benchmarks except LiveCodeBench (LCB). This makes sense because this dataset trains the model to stop thinking; that also decreased evaluation time by 35%.
| Task | Metric | Qwen/Qwen3-30B-A3B | config_000 |
|---|---|---|---|
| ARC Challenge | acc_norm | 0.3874 | 0.3848 |
| Hellaswag | acc_norm | 0.6483 | 0.6747 |
| MMLU (Average) | acc | 0.3271 | 0.3581 |
| Winogrande | acc | 0.5943 | 0.5975 |
| LCB Code Gen v4 | codegen_pass@1:16 | 0.3224 | 0.2269 |
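To make the claim concrete, a small sketch computing the per-task deltas from the table above shows that LiveCodeBench is the only benchmark with a large movement; all other tasks shift by about 3 points or less. The numbers are taken directly from the table.

```python
# (base, tuned) scores copied from the evaluation table above.
results = {
    "ARC Challenge":   (0.3874, 0.3848),
    "Hellaswag":       (0.6483, 0.6747),
    "MMLU (Average)":  (0.3271, 0.3581),
    "Winogrande":      (0.5943, 0.5975),
    "LCB Code Gen v4": (0.3224, 0.2269),
}

# Delta = tuned (config_000) minus base (Qwen/Qwen3-30B-A3B).
deltas = {task: round(tuned - base, 4) for task, (base, tuned) in results.items()}
for task, delta in deltas.items():
    print(f"{task}: {delta:+.4f}")

# The task with the largest absolute change.
largest = max(deltas, key=lambda t: abs(deltas[t]))
print(largest)  # LCB Code Gen v4
```

The LCB drop (about -9.6 points) is roughly three times larger than any other shift, consistent with the no-thinking dataset trading code-generation accuracy for faster evaluation.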