Model Card for meta-llama/Llama-3.1-8B (Instruction-Tuned)

Llama 3.1 8B is a multilingual, instruction-tuned, autoregressive large language model developed by Meta, intended for chat, reasoning, coding, and long-context tasks.

Model Details

Model Description

Llama 3.1 8B is part of Meta's Llama 3.1 collection, released July 23, 2024, which includes 8B, 70B, and 405B parameter models. It was pre-trained on ~15 trillion tokens of multilingual text and code and supports a context window of 128K tokens. Instruction tuning used supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to optimize the model for assistive tasks.

  • Developed by: Meta AI
  • Model type: Decoder‑only transformer (auto-regressive)
  • Input/Output modality: Multilingual text and code
  • Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (+ broad multilingual support)
  • Context window: 128,000 tokens
  • Knowledge cutoff: December 2023
  • License: Llama 3.1 Community License (custom commercial)
  • Finetuned from: Base pretrained Llama 3.1 8B

Model Sources

  • Repository: https://huggingface.co/meta-llama/Llama-3.1-8B
  • Paper: “Introducing Llama 3” blog post by Meta AI (April 18, 2024), updated for the Llama 3.1 release on July 23, 2024
  • Demo: Available via the transformers pipeline, or hosted on Meta.ai and WhatsApp

Uses

Direct Use

Ideal for multilingual chatbots, reasoning assistants, code generation, summarization, data synthesis, and long-context tasks (document analysis, RAG).
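
For direct chat use, a minimal sketch with the transformers text-generation pipeline is shown below; it assumes the instruction-tuned checkpoint name meta-llama/Llama-3.1-8B-Instruct (an assumption of this sketch) and that you have accepted the license and authenticated with Hugging Face.

import torch
from transformers import pipeline

# Build a chat pipeline; device_map="auto" requires the accelerate package.
chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # instruction-tuned variant (assumed)
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "In one sentence, explain what a context window is. Answer in Spanish."},
]
result = chat(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the final message is the assistant reply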

Downstream Use

Can be fine-tuned for domain-specific applications such as RAG, summarization, topic-controlled dialogue, coding agents, and serving as the language backbone in multimodal reasoning pipelines.
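
As one illustration of downstream fine-tuning, the sketch below applies parameter-efficient LoRA adapters via the peft library; the rank, alpha, and target modules are common illustrative defaults rather than values recommended by this card.

# Illustrative LoRA fine-tuning setup; hyperparameters and target modules are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Attach low-rank adapters to the attention projections; only these small matrices are trained.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Train with transformers.Trainer (or trl's SFTTrainer) on domain data, then save just the adapter:
# model.save_pretrained("my-domain-adapter")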

Out-of-Scope Use

Not designed for image, audio, or video understanding or generation (text in, text out only). Do not use for content prohibited by the Llama 3.1 license and Acceptable Use Policy (e.g., generating illicit or unsafe instructions).

Bias, Risks, and Limitations

  • May produce biased, unsafe, or hallucinated outputs that reflect biases in its training data.
  • Inputs approaching or exceeding the 128K-token context limit may be truncated or yield degraded, unexpected outputs.
  • Not safe for sensitive legal or medical advice without additional guardrails and human review.

Recommendations

Use with moderation filters, human oversight, prompt safety checks, and evaluation for target domain bias and safety.
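
A toy example of the kind of guardrail wrapper this implies is sketched below; the blocklist and function names are placeholders, and a production system would use a dedicated safety classifier rather than keyword matching.

# Toy guardrail wrapper; BLOCKED_TERMS is a placeholder, not a real safety policy.
BLOCKED_TERMS = ["how to build a bomb", "credit card dump"]

def is_allowed(text: str) -> bool:
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def safe_generate(generate_fn, prompt: str) -> str:
    # Check the prompt before generation and the response after it.
    if not is_allowed(prompt):
        return "Request declined by safety filter."
    response = generate_fn(prompt)
    return response if is_allowed(response) else "Response withheld by safety filter."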

How to Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and weights (the repository is gated; accept the license and authenticate first).
model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Tokenize a prompt and generate up to 200 new tokens.
inputs = tokenizer("Tell me a story about a dragon:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
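
If you use the instruction-tuned variant instead of the base model, prompts should be formatted with the tokenizer's chat template; a sketch assuming the meta-llama/Llama-3.1-8B-Instruct checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

instruct_id = "meta-llama/Llama-3.1-8B-Instruct"  # instruction-tuned checkpoint (assumed)
tokenizer = AutoTokenizer.from_pretrained(instruct_id)
model = AutoModelForCausalLM.from_pretrained(instruct_id, torch_dtype="auto", device_map="auto")

# Format the conversation with the built-in Llama 3.1 chat template.
messages = [{"role": "user", "content": "Tell me a story about a dragon in three sentences."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))  # decode only the new tokens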

Training Details

Training Data

Pre-trained on a cleaned corpus of roughly 15 trillion tokens of publicly available multilingual text and code. Instruction tuning combined publicly available instruction datasets with over 25M synthetically generated examples, applied through SFT and RLHF.

Training Procedure

  • Preprocessing: Publicly available web, code, and instruction data filtered with Meta's quality and safety classifiers.
  • Hyperparameters: Detailed values are documented in the model repository; post-training mixed SFT and RLHF, with context length up to 128K tokens.

Speeds, Sizes, Times

  • Pretraining: ~15 trillion tokens; ~1.46M GPU hours (H100-80GB) for the 8B model.
  • Checkpoint size: ~8B parameters; roughly 16 GB in bfloat16/fp16 and about 32 GB in fp32 (see the quick check below).
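
The checkpoint-size figure follows from parameter count times bytes per value; a quick back-of-the-envelope check (the 8.03B parameter count is the commonly reported figure and is assumed here):

# Rough checkpoint size = parameter count x bytes per parameter.
params = 8.03e9  # commonly reported parameter count for the 8B model (assumption)
for dtype, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{params * nbytes / 1e9:.0f} GB")
# Prints roughly 32, 16, and 8 GB respectively, before optimizer state or KV-cache memory.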

Evaluation

Testing Data & Metrics

Benchmarked on general-knowledge, multilingual, coding, and reasoning tasks (e.g., MMLU), where it outperforms many available open and closed models of comparable size.

  • Instruction-tuned 8B: ~69.4% on MMLU (5-shot); time-to-first-token ~0.28 s; throughput ~193 tokens/sec (latency figures depend on hardware and serving stack).

Results Summary

  • MMLU (instruction-tuned): ~69.4%
  • Perplexity (The Pile): ~8.28 (fp16)
  • Throughput: ~192.9 tokens/sec
  • Time-to-first-token: ~0.28 sec
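
Throughput and time-to-first-token depend heavily on hardware and serving stack; one rough way to measure them locally with streaming generation (a sketch, not the methodology behind the numbers above):

import time
import threading
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Write a short paragraph about dragons.", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# Run generation in a background thread so the streamed output can be timed.
thread = threading.Thread(target=model.generate, kwargs={**inputs, "streamer": streamer, "max_new_tokens": 128})
start = time.perf_counter()
thread.start()

ttft, chunks = None, 0
for _ in streamer:  # each item is a decoded text chunk, roughly one token
    chunks += 1
    if ttft is None:
        ttft = time.perf_counter() - start
thread.join()
elapsed = time.perf_counter() - start
print(f"TTFT ~{ttft:.2f}s; ~{chunks / elapsed:.1f} chunks/sec (approximate tokens/sec)")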

Environmental Impact

  • Pretraining compute: ~1.46M GPU hours (H100s) for 8B; ~15T tokens.
  • Estimated CO₂e emissions: not restated here; an estimate can be derived from the GPU hours above with the ML CO₂ Impact calculator.

Technical Specifications

Architecture

  • Decoder-only Transformer with SwiGLU activations, rotary position embeddings (RoPE), RMSNorm, and Grouped-Query Attention (GQA); 32 layers, ~8B parameters.
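
These architectural choices are visible in the published configuration; a small sketch reads them with AutoConfig (field names are the standard transformers LlamaConfig attributes, and access assumes you have accepted the repository license):

# Read architecture hyperparameters from the published config (no weights are downloaded).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")
print(config.num_hidden_layers)        # 32 decoder layers
print(config.num_attention_heads)      # query heads
print(config.num_key_value_heads)      # fewer key/value heads -> Grouped-Query Attention
print(config.hidden_act)               # "silu", used inside the SwiGLU feed-forward block
print(config.rope_theta)               # rotary position embedding base frequency
print(config.max_position_embeddings)  # 131072 tokens, i.e. the 128K context window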

Compute Infrastructure

  • Pretrained on Meta's custom-built GPU clusters (H100-class hardware).

Software

  • Implemented in PyTorch; supported in Hugging Face Transformers v4.43 and later.

Citation

@misc{meta2024llama3,
  title={Introducing Llama 3},
  author={Meta AI},
  howpublished={\url{https://ai.meta.com/blog/meta-llama-3/}},
  year={2024},
  note={Version 3.1 released July 23, 2024}
}