🧠 CooperLM-354M

CooperLM-354M is a 354-million-parameter, GPT-2-based language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was created as a toy project to explore end-to-end LLM training with Hugging Face’s Transformers and Datasets libraries.

GitHub repo: https://github.com/daniel-mehta/CooperLM-354M

🧱 Architecture

  • GPT-2 architecture with 24 layers, 16 attention heads, and a 1024-dimensional hidden size (see the config sketch below)
  • 256-token context window
  • Trained for 1 epoch on 100k samples (~1.2M sequences)
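For reference, the hyperparameters above map roughly onto the following Hugging Face configuration. This is a minimal sketch, not the author's training script: the vocabulary size and all other settings not listed in the card fall back to GPT-2 defaults, which is an assumption.

from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of the architecture described above. vocab_size is left at the GPT-2
# default (50257) -- an assumption, since the card does not state it.
config = GPT2Config(
    n_layer=24,        # transformer blocks
    n_head=16,         # attention heads per block
    n_embd=1024,       # hidden size
    n_positions=256,   # 256-token context window
)
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained "from scratch"
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # ~354M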

📊 Training Details

| Setting                | Value           |
|------------------------|-----------------|
| Model Type             | GPT2LMHeadModel |
| Epochs                 | 1               |
| Precision              | fp16            |
| Batch Size (effective) | 16              |
| GPU                    | RTX 4080        |
| Final Eval Loss        | 5.63            |
| Perplexity             | ~263            |
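A hedged sketch of how these settings might map onto the Trainer API is shown below. The per-device batch size / gradient-accumulation split, the output path, the use of the GPT-2 tokenizer, and the dataset variables are assumptions; the card only reports the effective batch size, precision, epoch count, and hardware.

from transformers import (
    DataCollatorForLanguageModeling,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")            # assumption: standard GPT-2 tokenizer
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels (no masking)

args = TrainingArguments(
    output_dir="cooperlm-354m",     # hypothetical output path
    num_train_epochs=1,             # 1 epoch (from the table)
    per_device_train_batch_size=4,  # assumption: 4 x 4 accumulation = effective batch size 16
    gradient_accumulation_steps=4,
    fp16=True,                      # mixed-precision training (from the table)
)

trainer = Trainer(
    model=model,                  # e.g. the freshly initialized model from the config sketch above
    args=args,
    train_dataset=train_dataset,  # placeholder: tokenized 256-token training sequences (not shown)
    eval_dataset=eval_dataset,    # placeholder: held-out sequences for the eval loss / perplexity
    data_collator=collator,
)
trainer.train()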

📥 Usage

# Load the released weights and tokenizer from the Hugging Face Hub
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

# Generate a continuation with nucleus sampling
prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.9, top_p=0.95)

print(tokenizer.decode(output[0], skip_special_tokens=True))
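Because the model was trained with a 256-token context window, it is safest to keep the prompt plus generated text within that limit. A small variation on the snippet above (the max_length=156 truncation value is an illustrative choice, not from the card):

# Truncate long prompts and bound generation so prompt + output stays within 256 tokens
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=156)
output = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.9, top_p=0.95)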

📝 License

MIT
