Model Card for meta-llama/Llama-3.1-8B (Instruction-Tuned)
Llama 3.1 8B is a multilingual, instruction-tuned, autoregressive LLM developed by Meta for chat, reasoning, coding, and long-context tasks.
Model Details
Model Description
Llama 3.1 8B is part of Meta's Llama 3.1 collection (released July 23, 2024), which also includes 70B and 405B parameter models. It was pre-trained on ~15 trillion tokens of multilingual text and code and supports a context window of 128K tokens. Instruction tuning combined supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to optimize the model for assistive tasks.
- Developed by: Meta AI
- Model type: Decoder‑only transformer (auto-regressive)
- Input/Output modality: Multilingual text and code
- Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai (plus broad multilingual support)
- Context window: 128,000 tokens
- Knowledge cutoff: December 2023
- License: Llama 3.1 Community License (custom commercial license)
- Finetuned from: Base pretrained Llama 3.1 8B
Model Sources
- Repository: https://huggingface.co/meta-llama/Llama-3.1-8B
- Paper: "Introducing Llama 3" blog post by Meta AI, April 18, 2024; updated for the Llama 3.1 release on July 23, 2024
- Demo: available via the transformers pipeline, or hosted on meta.ai and WhatsApp
Uses
Direct Use
Ideal for multilingual chatbots, reasoning assistants, code generation, summarization, data synthesis, and long-context tasks (document analysis, RAG).
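For chat-style direct use, a minimal sketch with the transformers text-generation pipeline is shown below. It assumes the instruction-tuned checkpoint (meta-llama/Llama-3.1-8B-Instruct), which ships a chat template that the pipeline applies automatically; the base checkpoint does not.

from transformers import pipeline

# device_map="auto" requires the accelerate package
chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize Don Quixote in two sentences, in Spanish."},
]
result = chat(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply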
Downstream Use
Can be fine-tuned for domain-specific applications such as RAG, summarization, topic-controlled dialogue, coding agents, and multimodal reasoning pipelines (a fine-tuning sketch follows below).
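As a sketch of downstream fine-tuning, the snippet below wraps the base checkpoint in LoRA adapters with the peft library. The rank, alpha, and target modules are illustrative assumptions, and the dataset and training loop are omitted.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B", torch_dtype="auto")
lora_config = LoraConfig(
    r=16,                     # adapter rank (assumption)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
# From here, `model` can be passed to transformers.Trainer or trl's SFTTrainer.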
Out-of-Scope Use
Not designed for image, audio, or video generation. Do not use for content disallowed by the Llama 3.1 license and Acceptable Use Policy (e.g., illicit or unsafe instructions).
Bias, Risks, and Limitations
- May produce biased, unsafe, or hallucinated content that reflects biases in the training data.
- Very long inputs approaching the 128K-token context limit can degrade output quality or cause unexpected behavior.
- Not safe for sensitive, legal, or medical advice without additional guardrails.
Recommendations
Use with moderation filters, human oversight, prompt safety checks, and evaluation for target domain bias and safety.
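The snippet below is only an illustrative placeholder for where a prompt safety check hooks into an application. A production deployment should rely on a dedicated safety classifier (for example, a Llama Guard model) and human review rather than a keyword list.

# Hypothetical pre-generation gate; the blocklist is a stand-in, not a real policy.
BLOCKED_PHRASES = ("how to build a weapon", "instructions for self-harm")

def safe_generate(generate_fn, prompt: str) -> str:
    """Refuse obviously disallowed prompts before calling the model."""
    if any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES):
        return "I can't help with that request."
    reply = generate_fn(prompt)
    # An output-side check (toxicity classifier, PII scrub, etc.) would go here.
    return reply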
How to Get Started
# Requires transformers >= 4.43 and access to the gated meta-llama repository
# (accept the license on Hugging Face and authenticate, e.g. with `huggingface-cli login`).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" loads the weights in their native precision (bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Tell me a story about a dragon:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
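For GPUs with limited memory, a quantized load is a common option. The sketch below is a hedged example of 4-bit loading with bitsandbytes; it assumes the optional bitsandbytes and accelerate packages are installed.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization roughly quarters the weight memory footprint (~4 GB for 8B parameters)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)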
Training Details
Training Data
Pre-trained on a cleaned corpus of ~15 trillion publicly available tokens (multilingual text and code). Instruction tuning used publicly available instruction datasets plus over 25M synthetically generated examples across the SFT and RLHF stages (Collabnix, Lifewire, Hugging Face).
Training Procedure
- Preprocessing: Public web, code, and instruction data filtered via Meta classifiers.
- Hyperparameters: see Meta's Llama repository and paper for details; post-training combined SFT and RLHF; context length extended up to 128K tokens.
Speeds, Sizes, Times
- Pretraining: 15 trillion tokens; ~1.46M GPU hours for the 8B model (Collabnix).
- Checkpoint size: ~8B parameters; roughly 16 GB in fp16/bfloat16 and 32 GB in fp32 (see the quick arithmetic below).
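As a quick sanity check on checkpoint size, the arithmetic below estimates weight-only memory from an approximate parameter count; optimizer state, activations, and KV cache are not included.

params = 8.0e9  # approximate parameter count (assumption)
for fmt, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{fmt}: ~{params * bytes_per_param / 1e9:.0f} GB")
# fp32: ~32 GB; bf16/fp16: ~16 GB; int8: ~8 GB; 4-bit: ~4 GB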
Evaluation
Testing Data & Metrics
Benchmarked on multilingual tasks (MMLU, coding, reasoning), outperforming many open and closed models (Hugging Face).
- Instruction-tuned 8B: ~69.4% on MMLU; time-to-first-token ~0.28 s; throughput ~193 tokens/sec (Hugging Face).
Results Summary
| Metric | Value |
|---|---|
| MMLU (instruction-tuned) | ~69.4% |
| Perplexity (The Pile) | ~8.28 (fp16) |
| Throughput | ~192.9 tokens/sec |
| Time-to-first-token | ~0.28 sec |
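The figures above are reported, not reproduced here. The sketch below shows one minimal way to compute perplexity on a short text sample with the base checkpoint; it illustrates the general method, not the exact evaluation setup behind the table.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

sample = "The quick brown fox jumps over the lazy dog."  # any held-out text
enc = tokenizer(sample, return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss  # mean token-level cross-entropy
print("perplexity:", torch.exp(loss).item())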
Environmental Impact
- Pretraining compute: ~1.46M GPU hours (H100s) for 8B; ~15T tokens.
- Estimated CO₂e emissions: use the ML CO₂ Impact calculator for specifics (a rough illustrative estimate follows below).
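As a rough illustration of how the GPU-hour figure translates into an emissions estimate, the arithmetic below assumes a 700 W average per-GPU draw and a generic grid carbon intensity; both numbers are assumptions, and Meta's own accounting (which uses renewable-energy matching) reports far lower market-based emissions.

gpu_hours = 1.46e6         # reported pretraining compute for the 8B model
power_kw = 0.7             # assumed average per-GPU draw, including overhead
carbon_kg_per_kwh = 0.4    # assumed grid carbon intensity
energy_kwh = gpu_hours * power_kw
co2e_tonnes = energy_kwh * carbon_kg_per_kwh / 1000
print(f"~{energy_kwh / 1e6:.2f} GWh, ~{co2e_tonnes:,.0f} tCO2e (illustrative only)")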
Technical Specifications
Architecture
- Decoder-only Transformer with SwiGLU, rotary embeddings, RMSNorm, Grouped-Query Attention (GQA); 32 layers, 8B parameters (arXiv, Prompthub, Collabnix, Wikipedia).
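These architecture numbers can be confirmed directly from the checkpoint's configuration; the sketch below assumes access to the gated repository.

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.num_key_value_heads, cfg.hidden_size)
# Expected for the 8B model: 32 layers, 32 attention heads, 8 KV heads (GQA), hidden size 4096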
Compute Infrastructure
- Pretrained on large Meta GPU clusters, likely H100-based.
Software
- Implemented in PyTorch and Hugging Face Transformers (v4.43+) (Hugging Face).
Citation
@misc{metaai2024llama3,
  title        = {Introducing Llama 3},
  author       = {{Meta AI}},
  howpublished = {\url{https://ai.meta.com/blog/meta-llama-3/}},
  year         = {2024},
  note         = {Version 3.1 released July 23, 2024}
}