Llama-3.1-8B-Instruct-Elite

Abstract
A bilingual (Chinese/English) instruction-tuned model based on Llama-3.1-8B-Instruct. It follows the training recipe of Llama-3.2-3B-Elite (distillation with Qwen-3-235b-a22b-Instruct-2507 as teacher, followed by SFT), but deliberately reduces the emoji-heavy style inherited from the Qwen3 teacher while retaining and reinforcing professional formatting (e.g., bolded subheadings, bullet lists, clear paragraphs), producing answers that are cleaner, more stable, and easier to read.

Highlights

  • Professional and clean: fewer emojis by default; outputs emphasize bolded subheadings + bullet lists, making content easy to copy and further edit.
  • Stable structure: Consistent formatting for sectioned reports, step checklists, comparison tables, and key-point summaries.
  • Bilingual / mixed-text friendly: Strong terminology coherence and clear hierarchy for Chinese, English, and mixed Chinese–English scenarios.
  • Stronger instruction-following: Higher adherence to constraints such as “no emojis,” “only output key-point tables,” and “preserve Markdown heading levels.”
  • Controllable verbosity: Defaults to less verbosity, focusing on key information while keeping necessary context.

Base: meta-llama/Llama-3.1-8B-Instruct; Training paradigm: Teacher distillation + SFT.


Model Overview

  • Parameters: 8B
  • Tasks: Instruction following / Dialogue generation / Q&A / Summarization / Structured output
  • Languages: Chinese & English (robust for mixed Chinese–English)
  • Goal: Deliver concise, professional, and format-friendly content on modest compute (reduced emojis; keep bolded subheadings, bullet lists, and other formatting enhancements).

Training & Data

  • Data size: About 80,000 high-quality instruction–response pairs (Chinese/English mix covering Q&A, summarization, expository writing, structured output, procedural steps, etc.).
  • Method: Distillation from a teacher model + SFT; explicit format/style control (fewer emojis; emphasize headings/lists/bold).
  • Compute: A single A100; LoRA/QLoRA fine-tuning completes several epochs in a short wall-clock time (see the configuration sketch below).
  • Style & constraints: Fewer emojis; strengthened bold subheadings, bullet lists, bold key terms, and clear paragraph hierarchy.

If a distilled-data subset is released, add links and stats here (sample counts / language ratios / filtering rules).
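
For readers who want to reproduce a comparable single-A100 setup, a minimal QLoRA sketch using the transformers and peft libraries is shown below. The rank, target modules, and other hyperparameters are illustrative assumptions, not the exact values used for this release.

# Minimal QLoRA sketch (illustrative; the released model's exact hyperparameters are not published)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.1-8B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit NF4 quantization keeps the 8B base within a single A100
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# From here, train with trl's SFTTrainer (or the plain transformers Trainer) on the
# instruction–response pairs rendered through the Llama-3.1 chat template.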


Quickstart

Transformers (recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Jackrong/Llama-3.1-8B-Instruct-Elite"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "Write clean, professional answers with bolded subheadings and structured lists; avoid emojis."},
    {"role": "user", "content": "่ฏท็”จ่ฆ็‚น่ฏดๆ˜Žๅฆ‚ไฝ•ไผ˜ๅŒ–ๅ‘จ่ฎกๅˆ’๏ผŒไฝฟๅ…ถๆ›ดๅฏๆ‰ง่กŒใ€‚"}
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
vLLM
from vllm import LLM, SamplingParams

llm = LLM(model="Jackrong/Llama-3.1-8B-Instruct-Elite", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompt = "ๅˆ—ๅ‡บ 5 ๆกๅฏๆ‰ง่กŒ็š„ๅ‘จ่ฎกๅˆ’ไผ˜ๅŒ–ๅปบ่ฎฎ๏ผˆ็”จๅŠ ็ฒ—ๅฐๆ ‡้ข˜+่ฆ็‚นๅˆ—่กจ๏ผ‰ใ€‚"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
llama.cpp (GGUF: Q4_K_M)
./main -m Llama-3.1-8B-Instruct-Elite.Q4_K_M.gguf -p "Explain in bullet points: how can a technical article be rewritten to be more professional and clean?"
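
If you prefer an OpenAI-compatible HTTP endpoint, the same GGUF file can be served with llama.cpp's llama-server and queried from Python. The launch command, port, and prompt below are placeholders rather than a prescribed deployment.

# Start the server first, e.g.:
#   ./llama-server -m Llama-3.1-8B-Instruct-Elite.Q4_K_M.gguf -c 4096 --port 8080
import requests

payload = {
    "messages": [
        {"role": "system", "content": "Write clean, professional answers with bolded subheadings and structured lists; avoid emojis."},
        {"role": "user", "content": "Explain in bullet points how to rewrite a technical article so it is more professional and clean."},
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])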

Prompting & Output Conventions

  • Organize with concise headings and bolded subheadings; bold key terms and conclusions where helpful.
  • Use bullet lists for steps and key points; avoid emojis by default (a short prompt sketch illustrating these conventions follows this list).
  • Sampling tips: temperature=0.6–0.8, top_p=0.9–0.95.
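
As a concrete illustration of these conventions, the sketch below (reusing tok and model from the Transformers quickstart) pins the output format through the system message; the exact wording of the prompts is only an example.

# Reuses `tok` and `model` from the Transformers quickstart above.
messages = [
    {"role": "system", "content": "No emojis. Use Markdown with bolded subheadings and bullet lists only."},
    {"role": "user", "content": "Summarize the pros and cons of weekly planning as a key-point list; preserve Markdown heading levels."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=400, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, dropping the prompt portion.
print(tok.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))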

Use Cases & Limitations

Use cases: Chinese/English or mixed bilingual Q&A, summarization, instructional/technical/business writing; structured outputs (plans, steps, tables, FAQs, meeting minutes).
Limitations: For high-factuality tasks that require up-to-date information, pair with retrieval; for medical/legal/financial or other high-risk scenarios, use human review; do not use for illegal or harmful purposes.
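
For the retrieval pairing mentioned above, one lightweight pattern is to inject retrieved passages into the user turn before generation. The sketch below is illustrative only, and retrieve() is a hypothetical stand-in for whatever search or vector-store lookup you use.

# `retrieve` is a hypothetical placeholder for your own search / vector-store lookup.
def build_grounded_messages(question: str, passages: list[str]) -> list[dict]:
    # Number the passages so the model can reference them in its answer.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return [
        {"role": "system", "content": "Answer using only the provided context; say so if the context is insufficient. No emojis."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# messages = build_grounded_messages("When was the policy updated?", retrieve("policy update date"))
# ...then apply the chat template and generate as in the Quickstart.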


License

  • Model weights: Llama 3.1 Community License (same as base).
  • Code/scripts: may be released under Apache-2.0 or a similar license; the weight license remains unchanged.

Acknowledgments

  • Meta for Llama-3.1 and the broader ecosystem
  • Open-source community contributions to distillation, SFT, evaluation, and deployment
  • Training recipe and practices adapted from Llama-3.2-3B-Elite

Citation

@misc{JackrongL31_8B_Elite,
  title  = {Jackrong/Llama-3.1-8B-Instruct-Elite},
  author = {Jackrong},
  year   = {2025},
  url    = {https://huggingface.co/Jackrong/Llama-3.1-8B-Instruct-Elite}
}

Changelog

  • v1.0: Initial release. ~80k samples; trained on a single A100; provides GGUF Q4_K_M; fewer emojis; strengthened bold subheadings and bullet lists; training recipe aligned with 3.2-3B-Elite.