Llama-10M-1M

A small LLaMA-architecture causal language model (3.65M parameters) trained on roughly 1M synthetic tokens using the BabyLlama framework.

Model Details

  • Model Type: Causal Language Model (LLaMA architecture)
  • Parameters: 3,652,032
  • Training Data: Synthetic text data (3,519 samples)
  • Architecture (see the configuration sketch after this list):
    • Hidden Size: 192
    • Layers: 6
    • Attention Heads: 6
    • Sequence Length: 128
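For reference, the architecture above roughly corresponds to the following LlamaConfig. This is a minimal sketch only: intermediate_size and vocab_size are assumed values, not stated on this card.

from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of a config matching the listed architecture.
# intermediate_size and vocab_size are ASSUMPTIONS, not taken from this card.
config = LlamaConfig(
    hidden_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=128,
    intermediate_size=768,   # assumed: 4x hidden size
    vocab_size=16000,        # assumed: check the bundled tokenizer for the real value
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # should land in the few-million range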

Training Details

  • Training Loss: 2.4997
  • Evaluation Loss: N/A
  • Perplexity: N/A
  • Learning Rate: 3e-4
  • Batch Size: 32
  • Epochs: 2
  • Training Time: 29.36 seconds
  • Training Samples: 3,519
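The hyperparameters above map onto a standard HuggingFace Trainer setup roughly as follows. This is a minimal sketch, not the exact BabyLlama training script: output_dir is a placeholder and train_dataset is assumed to be a pre-tokenized dataset you have already prepared.

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Sketch only; "train_dataset" is assumed to exist as a pre-tokenized Dataset.
model = AutoModelForCausalLM.from_pretrained("pgryko/babyllama-10m")
training_args = TrainingArguments(
    output_dir="llama-10m-1m",        # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    num_train_epochs=2,
)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()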

Evaluation Metrics

Metric              Value
Perplexity          N/A
Training Loss       2.4997
Evaluation Loss     N/A
Training Time       29.36 s
Parameters          3,652,032
Training Samples    3,519
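Perplexity was not recorded for this run. For a causal language model it is simply the exponential of the cross-entropy loss; as an illustration only (using the reported training loss, since no evaluation loss is available):

import math

# Perplexity = exp(cross-entropy loss). Evaluation loss is N/A here, so this
# merely illustrates the formula with the reported training loss.
training_loss = 2.4997
print(math.exp(training_loss))  # ~12.2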

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("pgryko/babyllama-10m")
tokenizer = AutoTokenizer.from_pretrained("pgryko/babyllama-10m")

# Generate text
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, temperature=0.8, do_sample=True)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Training Framework

This model was trained using the BabyLlama framework, which provides:

  • Modern training pipeline with HuggingFace Transformers
  • Efficient data processing and tokenization
  • Comprehensive evaluation metrics
  • Support for multiple architectures (LLaMA, GPT-2, GPT-J)

Citation

If you use this model in your research, please cite:

@misc{babyllama2024,
  title={BabyLlama: Training Small Language Models from Scratch},
  author={BabyLlama Team},
  year={2024},
  url={https://github.com/pgryko/BabyLlama}
}

License

This model is released under the MIT License.

Detailed Evaluation Results

Generation Quality Metrics

  • Diversity Score: 0.932
  • Repetition Score: 0.528 (lower is better)
  • Average Top Token Probability: 0.356
  • Average Entropy: 2.015
  • Low Confidence Ratio: 0.791
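The exact way BabyLlama computes these scores is not shown on this card. As an illustration only, n-gram based diversity and repetition metrics of this kind are often computed along the following lines (hypothetical helper functions, not the framework's code):

# Hypothetical illustration of n-gram diversity/repetition metrics;
# NOT the actual BabyLlama evaluation code.
def diversity_score(texts, n=2):
    """Fraction of unique n-grams across all generations (higher = more diverse)."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / max(len(ngrams), 1)

def repetition_score(text, n=2):
    """Fraction of repeated n-grams within one generation (lower is better)."""
    tokens = text.split()
    ngrams = list(zip(*(tokens[i:] for i in range(n))))
    return 1.0 - len(set(ngrams)) / max(len(ngrams), 1)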

Sample Generations

  1. "A child teaches slowly at the office, therefore the teacher writes happily. The bird reads thoughtfully in the garden. An artist writes carefully outside, afterwards the engineer explores eagerly. A child walks quickly in the park, meanwhile a writer creates sadly. A student"
  2. "The cat designs carefully at the library. A child jumps eagerly in the school, furthermore an artist learns thoughtfully. The engineer explores carefully in the school. The cat discovers eagerly on the street, and the scientist teaches quickly. The bird explores slowly in the"
  3. "The scientist teaches quickly in the park, however the engineer imagines creatively. A child thinks sadly in the lab, however a writer walks carefully. A dog writes sadly at the office. A dog explores patiently in the classroom. The engineer creates sadly in the"
  4. "A writer thinks sadly at the library. A writer reads carefully on the street, but the cat builds quickly. A student jumps patiently in the school. A student runs happily in the school, moreover a writer reads quickly. The cat creates brilliantly in the"
  5. "The engineer learns creatively at the office, afterwards a student runs quickly. The teacher thinks creatively in the school, and the scientist creates patiently. The scientist writes brilliantly in the lab, therefore the scientist designs brilliantly. A writer imagines creatively in the school."

Evaluation Plots

[Evaluation plot images from the original model page are not reproduced here.]
