Llama-10M-1M

A small LLaMA-architecture causal language model (3.65M parameters) trained on roughly 1M synthetic tokens using the BabyLlama framework.

Model Details

  • Model Type: Causal Language Model (LLaMA architecture)
  • Parameters: 3,652,032
  • Training Data: Synthetic text data (3,519 samples)
  • Architecture (see the configuration sketch after this list):
    • Hidden Size: 192
    • Layers: 6
    • Attention Heads: 6
    • Sequence Length: 128
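For reference, the architecture above roughly corresponds to the following LlamaConfig. This is a minimal sketch only: intermediate_size and vocab_size are assumed values, not stated on this card.

from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of a config matching the listed architecture.
# intermediate_size and vocab_size are ASSUMPTIONS, not taken from this card.
config = LlamaConfig(
    hidden_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=128,
    intermediate_size=768,   # assumed: 4x hidden size
    vocab_size=16000,        # assumed: check the bundled tokenizer for the real value
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # should land in the few-million range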

Training Details

  • Training Loss: 2.4997
  • Evaluation Loss: N/A
  • Perplexity: N/A
  • Learning Rate: 3e-4
  • Batch Size: 32
  • Epochs: 2
  • Training Time: 29.36 seconds
  • Training Samples: 3,519
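The hyperparameters above map onto a standard HuggingFace Trainer setup roughly as follows. This is a minimal sketch, not the exact BabyLlama training script: output_dir is a placeholder and train_dataset is assumed to be a pre-tokenized dataset you have already prepared.

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Sketch only; "train_dataset" is assumed to exist as a pre-tokenized Dataset.
model = AutoModelForCausalLM.from_pretrained("pgryko/babyllama-10m")
training_args = TrainingArguments(
    output_dir="llama-10m-1m",        # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    num_train_epochs=2,
)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()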

Evaluation Metrics

Metric              Value
Perplexity          N/A
Training Loss       2.4997
Evaluation Loss     N/A
Training Time       29.36 s
Parameters          3,652,032
Training Samples    3,519
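Perplexity was not recorded for this run. For a causal language model it is simply the exponential of the cross-entropy loss; as an illustration only (using the reported training loss, since no evaluation loss is available):

import math

# Perplexity = exp(cross-entropy loss). Evaluation loss is N/A here, so this
# merely illustrates the formula with the reported training loss.
training_loss = 2.4997
print(math.exp(training_loss))  # ~12.2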

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("pgryko/babyllama-10m")
tokenizer = AutoTokenizer.from_pretrained("pgryko/babyllama-10m")

# Generate text
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, temperature=0.8, do_sample=True)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Training Framework

This model was trained using the BabyLlama framework, which provides:

  • Modern training pipeline with HuggingFace Transformers
  • Efficient data processing and tokenization
  • Comprehensive evaluation metrics
  • Support for multiple architectures (LLaMA, GPT-2, GPT-J)

Citation

If you use this model in your research, please cite:

@misc{babyllama2024,
  title={BabyLlama: Training Small Language Models from Scratch},
  author={BabyLlama Team},
  year={2024},
  url={https://github.com/pgryko/BabyLlama}
}

License

This model is released under the MIT License.

Detailed Evaluation Results

Generation Quality Metrics

  • Diversity Score: 0.932
  • Repetition Score: 0.528 (lower is better)
  • Average Top Token Probability: 0.356
  • Average Entropy: 2.015
  • Low Confidence Ratio: 0.791
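The exact way BabyLlama computes these scores is not shown on this card. As an illustration only, n-gram based diversity and repetition metrics of this kind are often computed along the following lines (hypothetical helper functions, not the framework's code):

# Hypothetical illustration of n-gram diversity/repetition metrics;
# NOT the actual BabyLlama evaluation code.
def diversity_score(texts, n=2):
    """Fraction of unique n-grams across all generations (higher = more diverse)."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / max(len(ngrams), 1)

def repetition_score(text, n=2):
    """Fraction of repeated n-grams within one generation (lower is better)."""
    tokens = text.split()
    ngrams = list(zip(*(tokens[i:] for i in range(n))))
    return 1.0 - len(set(ngrams)) / max(len(ngrams), 1)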

Sample Generations

  1. "A child teaches slowly at the office, therefore the teacher writes happily. The bird reads thoughtfully in the garden. An artist writes carefully outside, afterwards the engineer explores eagerly. A child walks quickly in the park, meanwhile a writer creates sadly. A student"
  2. "The cat designs carefully at the library. A child jumps eagerly in the school, furthermore an artist learns thoughtfully. The engineer explores carefully in the school. The cat discovers eagerly on the street, and the scientist teaches quickly. The bird explores slowly in the"
  3. "The scientist teaches quickly in the park, however the engineer imagines creatively. A child thinks sadly in the lab, however a writer walks carefully. A dog writes sadly at the office. A dog explores patiently in the classroom. The engineer creates sadly in the"
  4. "A writer thinks sadly at the library. A writer reads carefully on the street, but the cat builds quickly. A student jumps patiently in the school. A student runs happily in the school, moreover a writer reads quickly. The cat creates brilliantly in the"
  5. "The engineer learns creatively at the office, afterwards a student runs quickly. The teacher thinks creatively in the school, and the scientist creates patiently. The scientist writes brilliantly in the lab, therefore the scientist designs brilliantly. A writer imagines creatively in the school."

Evaluation Plots

[Evaluation plot images from the original model page are not reproduced here.]
