ERNIE-4.5-300B-A47B
Note: "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights.
ERNIE 4.5 Highlights
The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations:
Multimodal Heterogeneous MoE Pre-Training: Our models are jointly trained on both textual and visual modalities to better capture the nuances of multimodal information and improve performance on tasks involving text understanding and generation, image understanding, and cross-modal reasoning. To achieve this without one modality hindering the learning of another, we designed a heterogeneous MoE structure, incorporated modality-isolated routing, and employed router orthogonal loss and multimodal token-balanced loss. These architectural choices ensure that both modalities are effectively represented, allowing for mutual reinforcement during training.
Scaling-Efficient Infrastructure: We propose a novel heterogeneous hybrid parallelism and hierarchical load balancing strategy for efficient training of ERNIE 4.5 models. By using intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training, and fine-grained recomputation methods, we achieve high pre-training throughput. For inference, we propose a multi-expert parallel collaboration method and a convolutional code quantization algorithm to achieve 4-bit/2-bit lossless quantization. Furthermore, we introduce PD (prefill-decode) disaggregation with dynamic role switching for effective resource utilization, which enhances inference performance for ERNIE 4.5 MoE models. Built on PaddlePaddle, ERNIE 4.5 delivers high-performance inference across a wide range of hardware platforms.
Modality-Specific Post-Training: To meet the diverse requirements of real-world applications, we fine-tuned variants of the pre-trained model for specific modalities. Our LLMs are optimized for general-purpose language understanding and generation. The VLMs focus on visual-language understanding and support both thinking and non-thinking modes. Each model was post-trained with a combination of Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO), or a modified reinforcement learning method named Unified Preference Optimization (UPO).
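To make the idea of modality-isolated routing more concrete, the following is a toy sketch, not the ERNIE 4.5 implementation. It assumes a single list of router scores in which text experts come first and vision experts second, and it restricts each token's top-k selection to the expert group of its own modality. The function name `route_token`, the expert layout, and the softmax over the selected scores are all illustrative assumptions; the router orthogonal loss and token-balanced loss mentioned above are not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(logits, modality, num_text_experts, top_k):
    """Pick top_k experts for one token, using only its modality's expert group.

    logits: router scores over all experts (assumed layout: text experts
            first, then vision experts).
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    if modality == "text":
        candidates = range(0, num_text_experts)
    else:
        candidates = range(num_text_experts, len(logits))
    # Rank only the experts that belong to this token's modality.
    ranked = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in ranked])
    return list(zip(ranked, weights))


# Toy example: 4 text experts followed by 4 vision experts, 2 active per token.
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.2, 1.5, -0.3]
print(route_token(logits, "text", num_text_experts=4, top_k=2))
print(route_token(logits, "vision", num_text_experts=4, top_k=2))
```

In this sketch a text token can never be dispatched to a vision expert, even when a vision expert has the highest raw score, which is the isolation property the paragraph describes.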
Model Overview
ERNIE-4.5-300B-A47B is a text MoE Post-trained model, with 300B total parameters and 47B activated parameters for each token. The following are the model configuration details:
Key | Value |
---|---|
Modality | Text |
Training Stage | Post-training |
Params(Total / Activated) | 300B / 47B |
Layers | 54 |
Heads(Q/KV) | 64 / 8 |
Text Experts(Total / Activated) | 64 / 8 |
Vision Experts(Total / Activated) | 64 / 8 |
Context Length | 131072 |
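As a quick reading aid for the table, the short calculation below derives a few ratios from the listed values. It is only arithmetic on the numbers above; interpreting the Q/KV head counts as grouped-query attention is an assumption, since the table does not name the attention scheme.

```python
# Values copied from the configuration table.
total_params_b = 300       # total parameters, in billions
active_params_b = 47       # activated parameters per token, in billions
q_heads, kv_heads = 64, 8
text_experts, active_text_experts = 64, 8

# Fraction of all parameters that participate in processing one token.
active_param_fraction = active_params_b / total_params_b

# Fraction of text experts selected for each token.
expert_fraction = active_text_experts / text_experts

# Query heads per key/value head (assumes grouped-query attention).
queries_per_kv_head = q_heads // kv_heads

print(f"activated parameters per token: {active_param_fraction:.1%}")
print(f"text experts used per token:    {expert_fraction:.1%}")
print(f"query heads per KV head:        {queries_per_kv_head}")
```

The activated-parameter fraction is larger than the expert fraction because the per-token total also includes parameters outside the routed experts.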
Quickstart
Using transformers library
Note: Before using the model, please ensure you have the transformers library installed (version 4.50.0 or higher).
The following code snippet illustrates how to use the model to generate content from a given input.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "baidu/ERNIE-4.5-300B-A47B-PT"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=1024
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("generate_text:", generate_text)
Using vLLM
Install vLLM from its GitHub repository (Python-only build).
# 80G * 16 GPU
vllm serve baidu/ERNIE-4.5-300B-A47B-PT --trust-remote-code
# FP8 online quantization, 80G * 16 GPU
vllm serve baidu/ERNIE-4.5-300B-A47B-PT --trust-remote-code --quantization fp8
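For a rough sense of scale behind these hardware notes, the sketch below estimates the memory needed for the weights alone at several precisions. It assumes 16 bits per parameter for the unquantized checkpoint (an assumption, not stated above) and counts 1 GB as 10^9 bytes. It ignores the KV cache, activations, quantization scales, and framework overhead, so real requirements are higher.

```python
TOTAL_PARAMS = 300e9      # total parameters, from the model overview
GPU_MEMORY_GB = 80        # per-GPU memory from the serving comments
NUM_GPUS = 16


def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


cluster_gb = GPU_MEMORY_GB * NUM_GPUS
for name, bits in [("16-bit", 16), ("FP8", 8), ("4-bit", 4), ("2-bit", 2)]:
    needed = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{name:>6}: {needed:6.0f} GB of weights "
          f"({needed / cluster_gb:.0%} of {cluster_gb} GB)")
```

Under these assumptions the 16-bit weights take about 600 GB, a little under half of the 1280 GB available across sixteen 80 GB GPUs, and FP8 halves that figure.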
Best Practices
Sampling Parameters
To achieve optimal performance, we suggest using Temperature=0.8, TopP=0.8.
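One way to apply these settings is through the OpenAI-compatible HTTP API that `vllm serve` exposes. The sketch below assumes the server is running locally on vLLM's default port 8000. The helper names `build_chat_payload` and `send_chat_request` are hypothetical and exist only for this example.

```python
import json
import urllib.request


def build_chat_payload(prompt, model="baidu/ERNIE-4.5-300B-A47B-PT", max_tokens=1024):
    """Build a chat-completions request body with the suggested sampling values."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.8,   # suggested Temperature
        "top_p": 0.8,         # suggested TopP
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to an assumed local vLLM server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload("Give me a short introduction to large language model.")
print(json.dumps(payload, indent=2))
# With a server running: print(send_chat_request(payload))
```

With the transformers snippet above, the equivalent would be to pass `do_sample=True, temperature=0.8, top_p=0.8` to `model.generate`.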
Prompts for Web Search
For web search, {references}, {date}, and {question} are the template arguments.
For Chinese questions, we use the prompt:
ernie_search_zh_prompt = \
'''下面你会收到当前时间、多个不同来源的参考文章和一段对话。你的任务是阅读多个参考文章，并根据参考文章中的信息回答对话中的问题。
以下是当前时间和参考文章：
---------
#当前时间
{date}
#参考文章
{references}
---------
请注意：
1. 回答必须结合问题需求和当前时间，对参考文章的可用性进行判断，避免在回答中使用错误或过时的信息。
2. 当参考文章中的信息无法准确地回答问题时，你需要在回答中提供获取相应信息的建议，或承认无法提供相应信息。
3. 你需要优先根据百科、官网、权威机构、专业网站等高权威性来源的信息来回答问题。
4. 回复需要综合参考文章中的相关数字、案例、法律条文、公式等信息，使你的答案更专业。
5. 当问题属于创作类任务时，需注意以下维度：
- 态度鲜明：观点、立场清晰明确，避免模棱两可，语言果断直接
- 文采飞扬：用词精准生动，善用修辞手法，增强感染力
- 有理有据：逻辑严密递进，结合权威数据/事实支撑论点
---------
下面请结合以上信息，回答问题，补全对话
{question}'''
For English questions, we use the prompt:
ernie_search_en_prompt = \
'''
Below you will be given the current time, multiple references from different sources, and a conversation. Your task is to read the references and use the information in them to answer the question in the conversation.
Here are the current time and the references:
---------
#Current Time
{date}
#References
{references}
---------
Please note:
1. Based on the question's requirements and the current time, assess the usefulness of the references to avoid using inaccurate or outdated information in the answer.
2. If the references do not provide enough information to accurately answer the question, you should suggest how to obtain the relevant information or acknowledge that you are unable to provide it.
3. Prioritize using information from highly authoritative sources such as encyclopedias, official websites, authoritative institutions, and professional websites when answering questions.
4. Incorporate relevant numbers, cases, legal provisions, formulas, and other details from the references to make your answer more professional.
5. For creative tasks, keep these dimensions in mind:
- Clear attitude: Clear views and positions, avoid ambiguity, and use decisive and direct language
- Brilliant writing: Precise and vivid words, good use of rhetoric, and enhance the appeal
- Well-reasoned: Rigorous logic and progressive, combined with authoritative data/facts to support the argument
---------
Now, using the information above, answer the question and complete the conversation:
{question}'''
Parameter notes:
- {question} is the user's question.
- {date} is the current time; the recommended format is "YYYY-MM-DD HH:MM:SS, Day of the Week, Beijing/China".
- {references} is the reference material; the recommended format is:
##参考文章1
标题：周杰伦
文章发布时间：2025-04-20
内容：周杰伦(Jay Chou),1979年1月18日出生于台湾省新北市,祖籍福建省永春县,华语流行乐男歌手、音乐人、演员、导演、编剧,毕业于淡江中学。2000年,发行个人首张音乐专辑《Jay》。...
来源网站网址：baike.baidu.com
来源网站的网站名：百度百科
##参考文章2
...
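The parameter notes can be turned into a small helper that fills either prompt template. The sketch below is illustrative only: the function names and the article dictionary keys (`title`, `published`, `content`, `url`, `site_name`) are hypothetical. It also assumes the caller already supplies the time in Beijing time, since no time-zone conversion is performed.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def format_date(now):
    """Render 'YYYY-MM-DD HH:MM:SS, Day of the Week, Beijing/China'.

    `now` is assumed to already be expressed in Beijing time.
    """
    return f"{now:%Y-%m-%d %H:%M:%S}, {WEEKDAYS[now.weekday()]}, Beijing/China"


def format_references(articles):
    """Render articles in the recommended reference layout."""
    blocks = []
    for index, article in enumerate(articles, start=1):
        blocks.append(
            f"##参考文章{index}\n"                        # reference article N
            f"标题：{article['title']}\n"                 # title
            f"文章发布时间：{article['published']}\n"      # publication date
            f"内容：{article['content']}\n"               # content
            f"来源网站网址：{article['url']}\n"            # source URL
            f"来源网站的网站名：{article['site_name']}"     # source site name
        )
    return "\n".join(blocks)


def build_search_prompt(template, question, articles, now):
    """Fill the {date}, {references}, and {question} slots of a template."""
    return template.format(
        date=format_date(now),
        references=format_references(articles),
        question=question,
    )


# Abbreviated stand-in template; in practice pass ernie_search_en_prompt
# or ernie_search_zh_prompt from above.
demo_template = "#Current Time\n{date}\n#References\n{references}\n{question}"
demo_articles = [{
    "title": "Example title",
    "published": "2025-04-20",
    "content": "Example content ...",
    "url": "example.com",
    "site_name": "Example Site",
}]
print(build_search_prompt(demo_template, "Example question?", demo_articles,
                          datetime(2025, 4, 20, 12, 0, 0)))
```

Because the templates contain no braces other than the three placeholders, plain `str.format` is sufficient here. Reference text that may contain braces is safe as well, since it is passed in as a value, not as part of the template.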
License
The ERNIE 4.5 models are provided under the Apache License 2.0. This license permits commercial use, subject to its terms and conditions. Copyright (c) 2025 Baidu, Inc. All Rights Reserved.
Citation
If you find ERNIE 4.5 useful or wish to use it in your projects, please kindly cite our technical report:
@misc{ernie2025technicalreport,
title={ERNIE 4.5 Technical Report},
author={Baidu ERNIE Team},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={}
}