Llama-3.1-8B-Instruct-Elite

Abstract
A bilingual (Chinese/English) instruction-tuned model based on Llama-3.1-8B-Instruct. It follows the training recipe of Llama-3.2-3B-Elite (distillation with Qwen-3-235b-a22b-Instruct-2507 as teacher, followed by SFT), but deliberately reduces the emoji-heavy style inherited from the Qwen3 teacher while retaining and reinforcing professional formatting (e.g., bolded subheadings, bullet lists, clear paragraphs), producing answers that are cleaner, more stable, and easier to read.

Highlights

  • Professional and clean: fewer emojis by default; outputs emphasize bolded subheadings + bullet lists, making content easy to copy and further edit.
  • Stable structure: Consistent formatting for sectioned reports, step checklists, comparison tables, and key-point summaries.
  • Bilingual / mixed-text friendly: Strong terminology coherence and clear hierarchy for Chinese, English, and mixed Chinese–English scenarios.
  • Stronger instruction-following: Higher adherence to constraints such as “no emojis,” “only output key-point tables,” and “preserve Markdown heading levels.”
  • Controllable verbosity: Defaults to less verbosity, focusing on key information while keeping necessary context.

Base: meta-llama/Llama-3.1-8B-Instruct; Training paradigm: Teacher distillation + SFT.


Model Overview

  • Parameters: 8B
  • Tasks: Instruction following / Dialogue generation / Q&A / Summarization / Structured output
  • Languages: Chinese & English (robust for mixed Chinese–English)
  • Goal: Deliver concise, professional, and format-friendly content on modest compute (reduced emojis; keep bolded subheadings, bullet lists, and other formatting enhancements).

Training & Data

  • Data size: About 80,000 high-quality instruction–response pairs (Chinese/English mix covering Q&A, summarization, expository writing, structured output, procedural steps, etc.).
  • Method: Distillation from a teacher model + SFT; explicit format/style control (fewer emojis; emphasize headings/lists/bold).
  • Compute: A single A100; LoRA/QLoRA fine-tuning completes several epochs in a short wall-clock time (see the configuration sketch below).
  • Style & constraints: Fewer emojis; strengthened bold subheadings, bullet lists, bold key terms, and clear paragraph hierarchy.

If a distilled-data subset is released, add links and stats here (sample counts / language ratios / filtering rules).
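
For readers who want to reproduce a comparable single-A100 setup, a minimal QLoRA sketch using the transformers and peft libraries is shown below. The rank, target modules, and other hyperparameters are illustrative assumptions, not the exact values used for this release.

# Minimal QLoRA sketch (illustrative; the released model's exact hyperparameters are not published)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.1-8B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit NF4 quantization keeps the 8B base within a single A100
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# From here, train with trl's SFTTrainer (or the plain transformers Trainer) on the
# instruction–response pairs rendered through the Llama-3.1 chat template.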


Quickstart

Transformers (recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Jackrong/Llama-3.1-8B-Instruct-Elite"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "Write clean, professional answers with bolded subheadings and structured lists; avoid emojis."},
    {"role": "user", "content": "่ฏท็”จ่ฆ็‚น่ฏดๆ˜Žๅฆ‚ไฝ•ไผ˜ๅŒ–ๅ‘จ่ฎกๅˆ’๏ผŒไฝฟๅ…ถๆ›ดๅฏๆ‰ง่กŒใ€‚"}
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
vLLM
from vllm import LLM, SamplingParams

llm = LLM(model="Jackrong/Llama-3.1-8B-Instruct-Elite", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompt = "ๅˆ—ๅ‡บ 5 ๆกๅฏๆ‰ง่กŒ็š„ๅ‘จ่ฎกๅˆ’ไผ˜ๅŒ–ๅปบ่ฎฎ๏ผˆ็”จๅŠ ็ฒ—ๅฐๆ ‡้ข˜+่ฆ็‚นๅˆ—่กจ๏ผ‰ใ€‚"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
llama.cpp (GGUF: Q4_K_M)
./main -m Llama-3.1-8B-Instruct-Elite.Q4_K_M.gguf -p "Explain in bullet points: how can a technical article be rewritten to be more professional and clean?"
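
If you prefer an OpenAI-compatible HTTP endpoint, the same GGUF file can be served with llama.cpp's llama-server and queried from Python. The launch command, port, and prompt below are placeholders rather than a prescribed deployment.

# Start the server first, e.g.:
#   ./llama-server -m Llama-3.1-8B-Instruct-Elite.Q4_K_M.gguf -c 4096 --port 8080
import requests

payload = {
    "messages": [
        {"role": "system", "content": "Write clean, professional answers with bolded subheadings and structured lists; avoid emojis."},
        {"role": "user", "content": "Explain in bullet points how to rewrite a technical article so it is more professional and clean."},
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])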

Prompting & Output Conventions

  • Organize with concise headings and bolded subheadings; bold key terms and conclusions where helpful.
  • Use bullet lists for steps and key points; avoid emojis by default (a short prompt sketch illustrating these conventions follows this list).
  • Sampling tips: temperature=0.6–0.8, top_p=0.9–0.95.
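
As a concrete illustration of these conventions, the sketch below (reusing tok and model from the Transformers quickstart) pins the output format through the system message; the exact wording of the prompts is only an example.

# Reuses `tok` and `model` from the Transformers quickstart above.
messages = [
    {"role": "system", "content": "No emojis. Use Markdown with bolded subheadings and bullet lists only."},
    {"role": "user", "content": "Summarize the pros and cons of weekly planning as a key-point list; preserve Markdown heading levels."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=400, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, dropping the prompt portion.
print(tok.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))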

Use Cases & Limitations

Use cases: Chinese/English or mixed bilingual Q&A, summarization, instructional/technical/business writing; structured outputs (plans, steps, tables, FAQs, meeting minutes).
Limitations: For high-factuality tasks that require up-to-date information, pair with retrieval; for medical/legal/financial or other high-risk scenarios, use human review; do not use for illegal or harmful purposes.
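
For the retrieval pairing mentioned above, one lightweight pattern is to inject retrieved passages into the user turn before generation. The sketch below is illustrative only, and retrieve() is a hypothetical stand-in for whatever search or vector-store lookup you use.

# `retrieve` is a hypothetical placeholder for your own search / vector-store lookup.
def build_grounded_messages(question: str, passages: list[str]) -> list[dict]:
    # Number the passages so the model can reference them in its answer.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return [
        {"role": "system", "content": "Answer using only the provided context; say so if the context is insufficient. No emojis."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# messages = build_grounded_messages("When was the policy updated?", retrieve("policy update date"))
# ...then apply the chat template and generate as in the Quickstart.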


License

  • Model weights: Llama 3.1 Community License (same as base).
  • Code/scripts: may be released under Apache-2.0 or a similar license; the weight license remains unchanged.

Acknowledgments

  • Meta for Llama-3.1 and the broader ecosystem
  • Open-source community contributions to distillation, SFT, evaluation, and deployment
  • Training recipe and practices adapted from Llama-3.2-3B-Elite

Citation

@misc{JackrongL31_8B_Elite,
  title  = {Jackrong/Llama-3.1-8B-Instruct-Elite},
  author = {Jackrong},
  year   = {2025},
  url    = {https://huggingface.co/Jackrong/Llama-3.1-8B-Instruct-Elite}
}

Changelog

  • v1.0: Initial release. ~80k samples; trained on a single A100; provides GGUF Q4_K_M; fewer emojis; strengthened bold subheadings and bullet lists; training recipe aligned with 3.2-3B-Elite.