🧠 SmallCoder (303M)

SmallCoder is a 303M parameter LLaMA-style language model trained from scratch for code generation and algorithmic reasoning.

This checkpoint represents a 6B-token Supervised Fine-Tuning (SFT) run that fixed a critical End-of-Sequence (EOS) token bug from earlier versions.

Despite its compact size, SmallCoder achieves state-of-the-art (SOTA) coding performance for <500M models, rivaling 1B–7B parameter LLMs.

Trained with support from Google's TPU Research Cloud (TRC) program.


🚀 Key Results

| Model | Size | HumanEval (pass@1) | MBPP (pass@1) |
|---|---|---|---|
| SmallCoder (Stage 4.1) | 303M | 27.4% | 31.0% |
| TinyLlama-1.1B | 1.1B | ~26.4% | ~27.6% |
| MPT-1B-Instruct | 1.0B | ~22.0% | ~25.0% |
| Zephyr-1.3B-SFT | 1.3B | 31.0% | 34.0% |
| Mistral-7B-Base | 7B | 30.5% | 47.5% |

βš–οΈ SmallCoder nearly matches Mistral 7B on HumanEval while being 23Γ— smaller.


🧬 Model Architecture

A LLaMA-type causal decoder with standard Multi-Head Attention (MHA).

LlamaConfig(
  vocab_size=49152,               # StarCoder tokenizer
  hidden_size=768,
  num_hidden_layers=24,
  num_attention_heads=8,
  num_key_value_heads=8,
  intermediate_size=3072,
  max_position_embeddings=1024,
)
| Parameter | Value |
|---|---|
| Total parameters | ≈303M |
| Context length | 1,024 tokens |
| Tokenizer | bigcode/starcoder |
| Architecture type | LLaMA (MHA, non-GQA) |
| Precision | bfloat16 |
| Optimizer | AdamW XLA |
| Hardware | TPU v4-32 (TRC) |
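
The configuration above can be instantiated directly with transformers to sanity-check the parameter count. The snippet below is a quick sketch (random weights only, not the released checkpoint):

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=49152,
    hidden_size=768,
    num_hidden_layers=24,
    num_attention_heads=8,
    num_key_value_heads=8,
    intermediate_size=3072,
    max_position_embeddings=1024,
)
model = LlamaForCausalLM(config)            # randomly initialized, architecture only
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # ≈ 303M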

📚 Training Curriculum (4 Stages, 29.8B tokens)

| Stage | Tokens (B) | Dataset | Objective | Loss ↓ |
|---|---|---|---|---|
| 1. Linguistic Base | 6.3 | FineWeb-Edu | General English grounding | 10.87 → 2.58 |
| 2. Code Specialization | 7.5 | 60% Nemotron Synthetic Code / 40% StarCoderData | Code syntax & reasoning | 5.00 → 1.25 |
| 3. Math & Knowledge | 10.0 | Nemotron CC-Math / FineWiki / OpenWebMath | Mathematical reasoning | 2.77 → 1.55 |
| 4.1 SFT (EOS Fixed) | 6.0 | Nemotron SFT / OpenCodeInstruct / OpenMathInstruct-2 | Instruction-tuned code alignment | 1.73 → ~0.70 |

🧩 Total ≈ 29.8B tokens of curated curriculum learning.
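
For illustration only, a Stage 2-style 60/40 mix can be expressed with datasets.interleave_datasets. The toy datasets below are stand-ins for the real corpora, not the actual training pipeline:

from datasets import Dataset, interleave_datasets

# Toy stand-ins for the two Stage 2 corpora (not the real data).
synthetic_code = Dataset.from_dict({"text": ["def add(a, b): return a + b"] * 6})
starcoder_data = Dataset.from_dict({"text": ["print('hello from StarCoderData')"] * 4})

stage2_mix = interleave_datasets(
    [synthetic_code, starcoder_data],
    probabilities=[0.6, 0.4],        # 60% / 40% sampling ratio, as in the table above
    seed=42,
    stopping_strategy="all_exhausted",
)
print(stage2_mix[0]["text"])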


📊 Detailed Benchmarks (Stage 4.1 SFT)

| Domain | Benchmark | Metric | Score |
|---|---|---|---|
| Code | HumanEval (0-shot) | pass@1 | 27.4% |
| Code | MBPP (3-shot) | pass@1 | 31.0% |
| Math | GSM8K (0-shot) | exact match | 4.55% |
| Knowledge | WikiText-2 | perplexity ↓ | 167.6 |
| Reasoning | ARC (Easy / Challenge) | acc_norm | 34.6% / 22.8% |
| Commonsense | HellaSwag | acc_norm | 28.3% |

HumanEval and MBPP scores were computed with a manual evaluation (max_new_tokens=512, temperature=0.2) because the SFT prompt format caused truncation issues in lm-eval.
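
The manual protocol can be approximated as below. The prompt wrapper and the use of the openai_humaneval dataset are assumptions for reproducibility, not the exact harness behind the reported numbers:

import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Beebey/smallcoder-303m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

problems = load_dataset("openai_humaneval", split="test")
samples = []
for problem in problems:
    # Assumed prompt wrapper matching the model's "User:" / "Assistant:" format.
    prompt = f"User: Complete the following Python function.\n{problem['prompt']}\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.2,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens (the model's completion).
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    samples.append({"task_id": problem["task_id"], "completion": completion})
# samples can then be scored with the standard HumanEval functional-correctness checker.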


⚠️ Known Limitations

  1. Code-specialized model: tuned for Python and algorithmic reasoning; performance on general text, math, and commonsense tasks is weak.

  2. Short context: trained on 1,024-token sequences only, so performance degrades on longer inputs (see the truncation sketch below).

  3. Tokenizer bias: uses the bigcode/starcoder BPE vocabulary, which is optimized for code rather than prose.
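
Because of the 1,024-token window, long prompts should be truncated explicitly. The left-truncation choice below is a suggestion, not part of the official card:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Beebey/smallcoder-303m")
tokenizer.truncation_side = "left"   # keep the end of the prompt (most recent context)

long_prompt = "User: " + "x = 1\n" * 600 + "Explain the code above.\nAssistant:"
inputs = tokenizer(long_prompt, return_tensors="pt", truncation=True, max_length=1024)
print(inputs["input_ids"].shape)     # at most (1, 1024)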


💻 Usage Example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Beebey/smallcoder-303m"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

prompt = """User: Write a Python function to compute Fibonacci numbers.
Assistant:"""
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

💡 Trained using the "User:" / "Assistant:" dialogue format.
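
Continuing the usage example above, a small helper (hypothetical, not part of the model's API) wraps a request in that dialogue format and returns only the newly generated reply:

def ask(request, max_new_tokens=256):
    # Reuses `model`, `tokenizer`, and `device` from the usage example above.
    prompt = f"User: {request}\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Slice off the prompt tokens so only the assistant's reply remains.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(ask("Write a Python function that checks whether a string is a palindrome."))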


🧾 Citation

If you use SmallCoder (303M) in your research, please cite:

@misc{smallcoder303m,
  title  = {SmallCoder: A 303M-parameter Code LLM trained from scratch},
  author = {Da Silva, Ilan},
  year   = {2025},
  url    = {https://huggingface.co/Beebey/smallcoder-303m},
  note   = {Trained with Google TPU Research Cloud (TRC) support}
}

πŸ™ Acknowledgements

This model was trained with support from the Google TPU Research Cloud (TRC) program. Special thanks to the open datasets that enabled this work: FineWeb, StarCoderData, Nemotron, and OpenWebMath.


🧩 Summary

| Category | Description |
|---|---|
| Type | Code LLM (LLaMA-style) |
| Parameters | 303M |
| Training tokens | ~29.8B |
| Specialty | Code generation & reasoning |
| Context window | 1,024 tokens |
| Tokenizer | bigcode/starcoder |
| License | Apache 2.0 |
| Hardware | TPU v4 (TRC Program) |

🔬 SmallCoder (303M) demonstrates that a carefully designed <500M model can achieve near-SOTA coding performance, matching 1B-class models on HumanEval, proving that efficient, compact, open models still matter.

