🧠 CooperLM-354M
CooperLM-354M is a 354-million-parameter GPT-2-style language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was built as a toy project to explore end-to-end LLM training with Hugging Face's Transformers and Datasets libraries.
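The preprocessing code isn't reproduced in this card. The sketch below shows the general shape of a Datasets-based pipeline that tokenizes raw text with the GPT-2 tokenizer and packs it into 256-token training blocks; the toy in-memory corpus and the packing helper are illustrative assumptions, not the exact recipe used for CooperLM.

```python
from datasets import Dataset
from transformers import GPT2TokenizerFast

# Toy stand-in corpus; the real runs drew on filtered Wikipedia, BookCorpus,
# and OpenWebText (the exact filtering is not documented in this card).
raw = Dataset.from_dict({"text": ["Example document for the packing sketch. " * 100] * 4})

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"])

def pack(batch, block_size=256):
    # Concatenate all token ids, then split into fixed 256-token training blocks.
    ids = sum(batch["input_ids"], [])
    ids = ids[: (len(ids) // block_size) * block_size]
    return {"input_ids": [ids[i : i + block_size] for i in range(0, len(ids), block_size)]}

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
lm_dataset = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)
```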
GitHub repo: https://github.com/daniel-mehta/CooperLM-354M
🧱 Architecture
- GPT-2 architecture with 24 layers, 16 attention heads, and a 1024-dimensional hidden size
- 256-token context window
- Trained for 1 epoch on 100k samples (~1.2M training sequences); see the config sketch below
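Put as code, the bullets above map roughly onto the following `GPT2Config`. The vocabulary size (standard GPT-2 tokenizer) is an assumption, but with these dimensions the parameter count lands at roughly 354M.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Config matching the bullets above: 24 layers, 16 heads, 1024 hidden size,
# 256-token context; vocab size assumes the standard GPT-2 tokenizer.
config = GPT2Config(
    vocab_size=50257,
    n_positions=256,
    n_embd=1024,
    n_layer=24,
    n_head=16,
)
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained from scratch
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~354M
```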
📊 Training Details
| Setting | Value |
|---|---|
| Model Type | GPT2LMHeadModel |
| Epochs | 1 |
| Precision | fp16 |
| Batch Size (effective) | 16 |
| GPU | RTX 4080 |
| Final Eval Loss | 5.63 |
| Perplexity | ~263 |
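As a rough sketch of how these settings map onto the Trainer API (the 4 × 4 split of the effective batch size, the output directory, and the logging cadence are assumptions; `model`, `tokenizer`, and `lm_dataset` refer to the earlier sketches):

```python
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Effective batch size 16 is reproduced here as 4 x 4 (per-device x accumulation);
# the actual split used for CooperLM is an assumption.
args = TrainingArguments(
    output_dir="cooperlm-354m",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,
    logging_steps=100,
)

trainer = Trainer(
    model=model,              # GPT2LMHeadModel from the config sketch above
    args=args,
    train_dataset=lm_dataset, # packed 256-token blocks from the data sketch
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```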
📥 Usage
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the model and tokenizer from the Hugging Face Hub.
model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

# Sample a continuation of the prompt (nucleus sampling, up to 100 tokens total).
prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.9, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
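The perplexity in the table is the usual exponential of the average cross-entropy loss. A minimal sketch for scoring a text of your own with the loaded model and tokenizer (not the exact evaluation pipeline behind the 5.63 / ~263 figures):

```python
import math
import torch

text = "The quick brown fox jumps over the lazy dog."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids the model returns the average cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"loss = {loss.item():.2f}, perplexity = {math.exp(loss.item()):.1f}")
```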
📝 License
MIT