Llama-3.1-8B-Instruct-Elite
Abstract
A bilingual (Chinese/English) instruction-tuned model based on Llama-3.1-8B-Instruct. It follows the training recipe of Llama-3.2-3B-Elite (distillation from Qwen-3-235b-a22b-Instruct-2507 as the teacher, followed by SFT), but intentionally reduces the emoji-heavy style inherited from the Qwen3 teacher while retaining and reinforcing professional formatting (e.g., bolded subheadings, bullet lists, clear paragraphs) to produce answers that are cleaner, more stable, and easier to read.
Table of Contents
- Highlights
- Model Overview
- Training & Data
- Quickstart
- Prompting & Output Conventions
- Use Cases & Limitations
- Deployment & Quantization
- License
- Acknowledgments
- Citation
- Changelog
Highlights
- Professional and clean: fewer emojis by default; outputs emphasize bolded subheadings + bullet lists, making content easy to copy and further edit.
- Stable structure: Consistent formatting for sectioned reports, step checklists, comparison tables, and key-point summaries.
- Bilingual / mixed-text friendly: Strong terminology coherence and clear hierarchy for Chinese, English, and mixed Chinese–English scenarios.
- Stronger instruction-following: Higher adherence to constraints such as "no emojis," "only output key-point tables," and "preserve Markdown heading levels."
- Controllable verbosity: Defaults to less verbosity, focusing on key information while keeping necessary context.
Base: meta-llama/Llama-3.1-8B-Instruct; Training paradigm: Teacher distillation + SFT.
Model Overview
- Parameters: 8B
- Tasks: Instruction following / Dialogue generation / Q&A / Summarization / Structured output
- Languages: Chinese & English (robust for mixed Chinese–English text)
- Goal: Deliver concise, professional, and format-friendly content on modest compute (reduced emojis; keep bolded subheadings, bullet lists, and other formatting enhancements).
Training & Data
- Data size: About 80,000 high-quality instruction–response pairs (Chinese/English mix covering Q&A, summarization, expository writing, structured output, procedural steps, etc.).
- Method: Distillation from a teacher model + SFT; explicit format/style control (fewer emojis; emphasize headings/lists/bold).
- Compute: A single A100; LoRA/QLoRA completes several epochs within a short time (a minimal training sketch is shown at the end of this section).
- Style & constraints: Fewer emojis; strengthened bold subheadings, bullet lists, bold key terms, and clear paragraph hierarchy.
If a distilled-data subset is released, links and statistics (sample counts / language ratios / filtering rules) will be added here.
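For reference, the SFT stage can be reproduced at small scale with a LoRA setup along these lines. This is a minimal sketch, not the exact training script: the dataset path, hyperparameters, and target modules are illustrative, and it assumes the distilled teacher outputs have already been collected as chat-formatted records.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(base_id)
tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# Low-rank adapters on the attention projections; rank and target modules are illustrative.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

# Hypothetical JSONL file: each record is {"messages": [{"role": ..., "content": ...}, ...]}.
ds = load_dataset("json", data_files="sft_data.jsonl", split="train")

def render(example):
    # Render each conversation with the Llama 3.1 chat template so training matches inference.
    return {"text": tok.apply_chat_template(example["messages"], tokenize=False)}

def tokenize(example):
    # The template already inserts special tokens, so skip adding another BOS here.
    return tok(example["text"], truncation=True, max_length=2048, add_special_tokens=False)

ds = ds.map(render, remove_columns=ds.column_names)
ds = ds.map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-8b-elite-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=20,
    ),
    train_dataset=ds,
    # Causal-LM collator: labels are the input tokens (a real recipe may additionally mask the prompt part).
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()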
Quickstart
Transformers (recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "Jackrong/Llama-3.1-8B-Instruct-Elite"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "Write clean, professional answers with bolded subheadings and structured lists; avoid emojis."},
    {"role": "user", "content": "Explain in bullet points how to optimize a weekly plan to make it more executable."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
vLLM
from vllm import LLM, SamplingParams
llm = LLM(model="Jackrong/Llama-3.1-8B-Instruct-Elite", dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
prompt = "List 5 actionable suggestions for optimizing a weekly plan (use bolded subheadings + a bullet list)."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
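Since the model is chat-tuned, results are usually better when the Llama 3.1 chat template is applied before generation. A minimal sketch, reusing the Transformers tokenizer to render the template (prompt wording is illustrative):
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "Jackrong/Llama-3.1-8B-Instruct-Elite"
tok = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, dtype="bfloat16")

messages = [
    {"role": "system", "content": "Write clean, professional answers with bolded subheadings and bullet lists; avoid emojis."},
    {"role": "user", "content": "List 5 actionable suggestions for optimizing a weekly plan."},
]
# Render the chat template into a plain prompt string, then sample as usual.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512))
print(outputs[0].outputs[0].text)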
llama.cpp (GGUF: Q4_K_M)
./main -m Llama-3.1-8B-Instruct-Elite.Q4_K_M.gguf -p "Explain in bullet points: how can a technical article be rewritten to be more professional and clean?"
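If you prefer to drive the GGUF file from Python, the llama-cpp-python bindings expose a chat-completion API. A minimal sketch, assuming the quantized file sits in the working directory and a 4096-token context is enough for your use case:
from llama_cpp import Llama

llm = Llama(model_path="Llama-3.1-8B-Instruct-Elite.Q4_K_M.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Avoid emojis; use bolded subheadings and bullet lists."},
        {"role": "user", "content": "Explain in bullet points how to rewrite a technical article to be more professional and clean."},
    ],
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])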
Prompting & Output Conventions
- Organize with concise headings and bolded subheadings; bold key terms and conclusions where helpful.
- Use bullet lists for steps and key points; avoid emojis by default.
- Sampling tips: temperature=0.6–0.8, top_p=0.9–0.95 (a short usage sketch follows below).
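As an illustration, the conventions above can be folded into a single system prompt together with the suggested sampling settings. A sketch using the Transformers pipeline API; the prompt wording is illustrative:
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Jackrong/Llama-3.1-8B-Instruct-Elite",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "Use concise headings and bolded subheadings, bullet lists for steps, and bold for key terms. No emojis."},
    {"role": "user", "content": "Summarize the trade-offs between LoRA and full fine-tuning as a key-point list."},
]
out = pipe(messages, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
# The pipeline returns the full conversation; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])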
Use Cases & Limitations
Use cases: Chinese, English, or mixed-bilingual Q&A; summarization; instructional/technical/business writing; structured outputs (plans, steps, tables, FAQs, meeting minutes).
Limitations: For high-factuality tasks that require up-to-date information, pair the model with retrieval (a minimal sketch follows below); for medical/legal/financial or other high-risk scenarios, keep a human reviewer in the loop; do not use the model for illegal or harmful purposes.
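For the retrieval pairing mentioned above, a common pattern is to place retrieved passages in the user message and constrain the answer to those sources. A minimal sketch; the retriever itself is out of scope and hypothetical:
def build_rag_messages(question, passages):
    """Build a chat message list that grounds the answer in retrieved passages."""
    sources = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return [
        {"role": "system", "content": "Answer using only the numbered sources; cite them as [1], [2], ... Avoid emojis."},
        {"role": "user", "content": f"Sources:\n{sources}\n\nQuestion: {question}"},
    ]

# Example: messages = build_rag_messages("What changed in v1.0?", retrieved_passages)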
License
- Model weights: Llama 3.1 Community License (same as base).
- Code/scripts: May be released under Apache-2.0 or a similar license; the weight license is unchanged.
Acknowledgments
- Meta for Llama-3.1 and the broader ecosystem
- Open-source community contributions to distillation, SFT, evaluation, and deployment
- Training recipe and practices adapted from Llama-3.2-3B-Elite
Citation
@misc{JackrongL31_8B_Elite,
title = {Jackrong/Llama-3.1-8B-Instruct-Elite},
author = {Jackrong},
year = {2025},
url = {https://huggingface.co/Jackrong/Llama-3.1-8B-Instruct-Elite}
}
Changelog
- v1.0: Initial release. ~80k samples; trained on a single A100; provides GGUF Q4_K_M; fewer emojis; strengthened bold subheadings and bullet lists; training recipe aligned with 3.2-3B-Elite.