Experiment Log on a lightweight Python pro coder

#1
by burtenshaw - opened

Trained the first YOLO run for this model on Colab, and here's the wandb run.

So far, I'm working with this dataset.

MoE Training

Completed the first few SFT runs on the MoE weights.

Training

Run this command:

python trl/trl/scripts/sft.py --config recipes/config_000.yaml 

This is the config:

model_name_or_path: Qwen/Qwen3-30B-A3B

# dataset
dataset_name: burtenshaw/tulu-3-sft-personas-code-no-prompt
dataset_num_proc: 6
text_column: messages
eos_token: '<|im_end|>'

# training
learning_rate: 2.0e-5
num_train_epochs: 1
packing: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
gradient_checkpointing: true
logging_steps: 1
max_length: 2048
warmup_ratio: 0.03
lr_scheduler_type: 'cosine'
bf16: true
bf16_full_eval: true
fp16: false
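As a quick sanity check on the batch settings above, here is a minimal sketch of the per-device tokens consumed per optimizer step, assuming `packing: true` fills every sequence to `max_length` (the values are copied from the config; the arithmetic is mine, not part of the run):

```python
# Values from config_000.yaml above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
max_length = 2048

# Sequences consumed per optimizer step on a single device.
sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps

# With packing enabled, each sequence is filled to max_length tokens,
# so this approximates tokens per optimizer step per device.
tokens_per_step = sequences_per_step * max_length
print(sequences_per_step, tokens_per_step)  # 2 4096
```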

Training

(training curve image)

Evaluation

The results stay mostly within the noise on all benchmarks except LiveCodeBench. This makes sense because this dataset trains the model to stop emitting thinking traces, which is also why evaluation time decreased by 35%.

| Task | Metric | Qwen/Qwen3-30B-A3B | config_000 |
|---|---|---|---|
| ARC Challenge | acc_norm | 0.3874 | 0.3848 |
| Hellaswag | acc_norm | 0.6483 | 0.6747 |
| MMLU (Average) | acc | 0.3271 | 0.3581 |
| Winogrande | acc | 0.5943 | 0.5975 |
| LCB Code Gen v4 | codegen_pass@1:16 | 0.3224 | 0.2269 |
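The `codegen_pass@1:16` metric means pass@1 estimated from 16 generations per problem. The exact harness used here isn't shown, but the standard unbiased pass@k estimator it typically refers to can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, passes. Sketch of the commonly used estimator."""
    if n - c < k:
        # Fewer incorrect samples than k: some draw must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 5 correct generations out of 16, k=1 -> 5/16
print(pass_at_k(16, 5, 1))  # 0.3125
```

With k=1 this reduces to the fraction of correct generations, which is why drawing 16 samples mainly serves to reduce variance of the estimate.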