🧠 CooperLM-354M
CooperLM-354M is a 354-million-parameter GPT-2-style language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was built as a toy project to explore end-to-end LLM training with Hugging Face's Transformers and Datasets libraries.
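The preprocessing code isn't reproduced in this card. The sketch below shows the general shape of a Datasets-based pipeline that tokenizes raw text with the GPT-2 tokenizer and packs it into 256-token training blocks; the toy in-memory corpus and the packing helper are illustrative assumptions, not the exact recipe used for CooperLM.

```python
from datasets import Dataset
from transformers import GPT2TokenizerFast

# Toy stand-in corpus; the real runs drew on filtered Wikipedia, BookCorpus,
# and OpenWebText (the exact filtering is not documented in this card).
raw = Dataset.from_dict({"text": ["Example document for the packing sketch. " * 100] * 4})

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"])

def pack(batch, block_size=256):
    # Concatenate all token ids, then split into fixed 256-token training blocks.
    ids = sum(batch["input_ids"], [])
    ids = ids[: (len(ids) // block_size) * block_size]
    return {"input_ids": [ids[i : i + block_size] for i in range(0, len(ids), block_size)]}

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
lm_dataset = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)
```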
GitHub repo: https://github.com/daniel-mehta/CooperLM-354M
🧱 Architecture
- GPT-2 architecture with 24 layers, 16 attention heads, and a 1024-dimensional hidden size
- 256-token context window
- Trained for 1 epoch on 100k samples (~1.2M training sequences); see the config sketch below
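Put as code, the bullets above map roughly onto the following `GPT2Config`. The vocabulary size (standard GPT-2 tokenizer) is an assumption, but with these dimensions the parameter count lands at roughly 354M.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Config matching the bullets above: 24 layers, 16 heads, 1024 hidden size,
# 256-token context; vocab size assumes the standard GPT-2 tokenizer.
config = GPT2Config(
    vocab_size=50257,
    n_positions=256,
    n_embd=1024,
    n_layer=24,
    n_head=16,
)
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained from scratch
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~354M
```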
📊 Training Details
| Setting | Value |
|---|---|
| Model Type | GPT2LMHeadModel |
| Epochs | 1 |
| Precision | fp16 |
| Batch Size (effective) | 16 |
| GPU | RTX 4080 |
| Final Eval Loss | 5.63 |
| Perplexity | ~263 |
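As a rough sketch of how these settings map onto the Trainer API (the 4 × 4 split of the effective batch size, the output directory, and the logging cadence are assumptions; `model`, `tokenizer`, and `lm_dataset` refer to the earlier sketches):

```python
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Effective batch size 16 is reproduced here as 4 x 4 (per-device x accumulation);
# the actual split used for CooperLM is an assumption.
args = TrainingArguments(
    output_dir="cooperlm-354m",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,
    logging_steps=100,
)

trainer = Trainer(
    model=model,              # GPT2LMHeadModel from the config sketch above
    args=args,
    train_dataset=lm_dataset, # packed 256-token blocks from the data sketch
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```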
📥 Usage
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the model and tokenizer from the Hugging Face Hub.
model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

# Sample a continuation of the prompt (nucleus sampling, up to 100 tokens total).
prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.9, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
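The perplexity in the table is the usual exponential of the average cross-entropy loss. A minimal sketch for scoring a text of your own with the loaded model and tokenizer (not the exact evaluation pipeline behind the 5.63 / ~263 figures):

```python
import math
import torch

text = "The quick brown fox jumps over the lazy dog."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids the model returns the average cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"loss = {loss.item():.2f}, perplexity = {math.exp(loss.item()):.1f}")
```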
📝 License
MIT