# Llama-10M-1M

A small LLaMA-architecture causal language model (3,652,032 parameters, nominally "10M") trained on 1M synthetic tokens using the BabyLlama framework.
## Model Details
- Model Type: Causal Language Model (LLaMA architecture)
- Parameters: 3,652,032 parameters
- Training Data: Synthetic text data (3,519 samples)
- Architecture (see the config sketch below):
  - Hidden Size: 192
  - Layers: 6
  - Attention Heads: 6
  - Sequence Length: 128
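
For reference, the numbers above can be expressed as a Hugging Face `LlamaConfig`. This is only a hedged reconstruction, not the uploaded configuration: `intermediate_size` and `vocab_size` are not listed on this card and are assumed here, so the printed parameter count will only approximate the 3,652,032 reported above.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical reconstruction of the architecture listed above.
# intermediate_size and vocab_size are assumptions (not given on this card),
# so the parameter count will not exactly match 3,652,032.
config = LlamaConfig(
    hidden_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=128,  # sequence length
    intermediate_size=768,        # assumed 4x hidden size
    vocab_size=16000,             # assumed; check the uploaded tokenizer
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")
```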
## Training Details
- Training Loss: 2.4997
- Evaluation Loss: N/A
- Perplexity: N/A
- Learning Rate: 3e-4 (see the `TrainingArguments` sketch below)
- Batch Size: 32
- Epochs: 2
- Training Time: 29.36 seconds
- Training Samples: 3,519
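
The hyperparameters above can be mirrored with Hugging Face `TrainingArguments`. This is an illustrative sketch under the assumption that a standard `Trainer` loop was used; the actual BabyLlama training script may set additional options (scheduler, warmup, weight decay) not recorded on this card.

```python
from transformers import TrainingArguments

# Illustrative arguments matching the reported hyperparameters;
# output_dir is a placeholder and all other settings are left at defaults.
training_args = TrainingArguments(
    output_dir="./llama-10m-1m",
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    num_train_epochs=2,
)
```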
## Evaluation Metrics
| Metric | Value |
|---|---|
| Perplexity | N/A |
| Training Loss | 2.4997 |
| Evaluation Loss | N/A |
| Training Time | 29.36 s |
| Parameters | 3,652,032 |
| Training Samples | 3,519 |
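
Perplexity is reported as N/A because no evaluation loss was recorded for this run. If a held-out loss becomes available, perplexity is simply its exponential; the final training loss gives only a rough, optimistic proxy:

```python
import math

# Perplexity = exp(cross-entropy loss). Using the final *training* loss as a
# proxy, since no held-out evaluation loss was recorded for this run:
print(math.exp(2.4997))  # ~12.18 (training perplexity, not a held-out figure)
```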
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("pgryko/babyllama-10m")
tokenizer = AutoTokenizer.from_pretrained("pgryko/babyllama-10m")

# Generate text
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, temperature=0.8, do_sample=True)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
## Training Framework
This model was trained using the BabyLlama framework, which provides:
- Modern training pipeline with HuggingFace Transformers
- Efficient data processing and tokenization
- Comprehensive evaluation metrics
- Support for multiple architectures (LLaMA, GPT-2, GPT-J)
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{babyllama2024,
  title={BabyLlama: Training Small Language Models from Scratch},
  author={BabyLlama Team},
  year={2024},
  url={https://github.com/pgryko/BabyLlama}
}
```
## License
This model is released under the MIT License.
## Detailed Evaluation Results
### Generation Quality Metrics
- Diversity Score: 0.932
- Repetition Score: 0.528 (lower is better)
- Average Top Token Probability: 0.356
- Average Entropy: 2.015
- Low Confidence Ratio: 0.791 (see the sketch below)
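
The exact definitions used by the BabyLlama evaluation are not documented on this card; one plausible reading is that the confidence-related metrics are computed from the per-token next-token distributions during generation. The sketch below assumes that interpretation and a hypothetical 0.5 threshold for "low confidence".

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the confidence metrics above, assuming they are computed
# from per-token next-token distributions; the 0.5 threshold is a guess.
def confidence_metrics(logits: torch.Tensor, threshold: float = 0.5) -> dict:
    """logits: (num_generated_tokens, vocab_size) collected during generation."""
    probs = F.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1).values                        # top-1 probability per token
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # per-token entropy (nats)
    return {
        "avg_top_token_prob": top_prob.mean().item(),
        "avg_entropy": entropy.mean().item(),
        "low_confidence_ratio": (top_prob < threshold).float().mean().item(),
    }
```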
### Sample Generations
- "A child teaches slowly at the office, therefore the teacher writes happily. The bird reads thoughtfully in the garden. An artist writes carefully outside, afterwards the engineer explores eagerly. A child walks quickly in the park, meanwhile a writer creates sadly. A student"
- "The cat designs carefully at the library. A child jumps eagerly in the school, furthermore an artist learns thoughtfully. The engineer explores carefully in the school. The cat discovers eagerly on the street, and the scientist teaches quickly. The bird explores slowly in the"
- "The scientist teaches quickly in the park, however the engineer imagines creatively. A child thinks sadly in the lab, however a writer walks carefully. A dog writes sadly at the office. A dog explores patiently in the classroom. The engineer creates sadly in the"
- "A writer thinks sadly at the library. A writer reads carefully on the street, but the cat builds quickly. A student jumps patiently in the school. A student runs happily in the school, moreover a writer reads quickly. The cat creates brilliantly in the"
- "The engineer learns creatively at the office, afterwards a student runs quickly. The teacher thinks creatively in the school, and the scientist creates patiently. The scientist writes brilliantly in the lab, therefore the scientist designs brilliantly. A writer imagines creatively in the school."
### Evaluation Plots