ERNIE-4.5-300B-A47B

Note: "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights.

ERNIE 4.5 Highlights

The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations:

  1. Multimodal Heterogeneous MoE Pre-Training: Our models are jointly trained on both textual and visual modalities to better capture the nuances of multimodal information and improve performance on tasks involving text understanding and generation, image understanding, and cross-modal reasoning. To achieve this without one modality hindering the learning of another, we designed a heterogeneous MoE structure, incorporated modality-isolated routing, and employed router orthogonal loss and multimodal token-balanced loss. These architectural choices ensure that both modalities are effectively represented, allowing for mutual reinforcement during training.

  2. Scaling-Efficient Infrastructure: We propose a novel heterogeneous hybrid parallelism and hierarchical load balancing strategy for efficient training of ERNIE 4.5 models. By using intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training, and fine-grained recomputation, we achieve remarkable pre-training throughput. For inference, we propose a multi-expert parallel collaboration method and a convolutional code quantization algorithm to achieve 4-bit/2-bit lossless quantization. Furthermore, we introduce prefill-decode (PD) disaggregation with dynamic role switching, which uses resources more effectively and improves inference performance for ERNIE 4.5 MoE models. Built on PaddlePaddle, ERNIE 4.5 delivers high-performance inference across a wide range of hardware platforms.

  3. Modality-Specific Post-Training: To meet the diverse requirements of real-world applications, we fine-tuned variants of the pre-trained model for specific modalities. Our LLMs are optimized for general-purpose language understanding and generation. The VLMs focus on vision-language understanding and support both thinking and non-thinking modes. Each model was post-trained with a combination of Supervised Fine-tuning (SFT) and either Direct Preference Optimization (DPO) or a modified reinforcement learning method named Unified Preference Optimization (UPO).
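For readers unfamiliar with DPO, the sketch below computes the standard DPO objective for a single preference pair. It is not ERNIE's training code: how SFT, DPO, and UPO are combined is not specified here, and the function name and the `beta=0.1` default are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model. beta=0.1 is a common
    default, not a documented ERNIE setting.
    """
    # How much more the policy favors the chosen response over the rejected
    # one, measured relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed without overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; it falls below log 2 as the policy shifts probability toward the chosen response.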

Model Overview

ERNIE-4.5-300B-A47B is a post-trained text MoE model with 300B total parameters, of which 47B are activated for each token. The model configuration is as follows:

Key                                   Value
Modality                              Text
Training Stage                        Post-training
Params (Total / Activated)            300B / 47B
Layers                                54
Heads (Q / KV)                        64 / 8
Text Experts (Total / Activated)      64 / 8
Vision Experts (Total / Activated)    64 / 8
Context Length                        131072
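The configuration implies a few ratios that are useful when planning deployments. The sketch below derives them from the table. Reading "Heads (Q/KV) 64 / 8" as grouped-query attention, and assuming equal head dimensions when comparing KV-cache size with full multi-head attention, are our interpretations rather than statements from this card.

```python
# Figures copied from the configuration table above.
TOTAL_PARAMS_B = 300
ACTIVE_PARAMS_B = 47
Q_HEADS, KV_HEADS = 64, 8
TEXT_EXPERTS, ACTIVE_TEXT_EXPERTS = 64, 8


def summarize():
    """Derived ratios for ERNIE-4.5-300B-A47B (interpretation, not official figures)."""
    return {
        # Share of all parameters used for any single token.
        "active_param_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # Grouped-query attention: query heads sharing each KV head.
        "queries_per_kv_head": Q_HEADS // KV_HEADS,
        # KV-cache shrink factor versus 64 KV heads (assumes equal head dims).
        "kv_cache_reduction": Q_HEADS / KV_HEADS,
        # Share of text experts selected per token.
        "active_expert_fraction": ACTIVE_TEXT_EXPERTS / TEXT_EXPERTS,
    }
```

About 15.7% of parameters are active per token while only 12.5% of the experts are selected; the gap is presumably due to non-expert weights, such as attention and embeddings, that run for every token.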

Quickstart

Using the transformers library

Note: Before using the model, please ensure you have the transformers library installed (version 4.50.0 or higher)
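To check the version requirement programmatically, a minimal stdlib-only check might look like the following. The helper names are hypothetical. The parser reads only the leading numeric components, so pre-release suffixes such as `.dev0` are ignored (a `4.50.0.dev0` build would pass even though it predates 4.50.0). `packaging.version` handles such cases properly.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 50, 0)


def parse_version(text):
    """Return the leading (major, minor, patch) integers of a string like '4.53.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def transformers_is_recent_enough():
    """True if the installed transformers is at least 4.50.0; False if older or missing."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_TRANSFORMERS
```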

The following code snippet shows how to use the model to generate content from given inputs.

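import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-300B-A47B-PT"

# load the tokenizer and the model
# device_map="auto" (requires the accelerate package) shards the weights across all visible GPUs;
# torch_dtype=torch.bfloat16 halves memory compared with loading in float32
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)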

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=1024
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("generate_text:", generate_text)

Using vLLM

Install vLLM from its GitHub repository; a Python-only build (without compiling the C++/CUDA code) is sufficient.

# 16 x 80 GB GPUs
vllm serve baidu/ERNIE-4.5-300B-A47B-PT --trust-remote-code
# FP8 online quantization, 16 x 80 GB GPUs
vllm serve baidu/ERNIE-4.5-300B-A47B-PT --trust-remote-code --quantization fp8
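`vllm serve` exposes an OpenAI-compatible HTTP API. The sketch below builds a chat-completion request for it using only the standard library and applies the recommended sampling settings. It assumes the server runs locally on vLLM's default port 8000 and that the served model name is the repository ID passed to `vllm serve`. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions: local server on vLLM's default port; model name = repo ID given to `vllm serve`.
BASE_URL = "http://localhost:8000/v1"
MODEL = "baidu/ERNIE-4.5-300B-A47B-PT"


def build_chat_request(prompt, max_tokens=1024, temperature=0.8, top_p=0.8):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=600):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(send_chat_request(build_chat_request("Give me a short introduction to large language model.")))` returns the model's answer. Building and sending are kept separate so the request can be inspected or logged before any network call.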

Best Practices

Sampling Parameters

To achieve optimal performance, we suggest using Temperature=0.8, TopP=0.8.
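To make these two settings concrete, the sketch below shows what temperature scaling and top-p (nucleus) filtering do to a vector of logits. It is a plain-Python illustration with hypothetical helper names, not the sampler used by transformers or vLLM. In transformers the same settings are passed to `generate` as `do_sample=True, temperature=0.8, top_p=0.8`. In vLLM's OpenAI-compatible API they are the `temperature` and `top_p` request fields.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.8):
    """Convert logits to probabilities after dividing by the temperature (< 1 sharpens)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p=0.8):
    """Indices of the smallest most-likely set whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, temperature=0.8, top_p=0.8, rng=None):
    """Draw one token index using temperature scaling followed by top-p filtering."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus(probs, top_p)
    # random.choices normalizes the weights of the kept tokens itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With the logits `[2.0, 1.0, 0.0, -5.0]`, temperature 0.8 assigns roughly 73% and 21% to the first two tokens, so top-p 0.8 keeps only those two candidates.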

Prompts for Web Search

For web search, the prompts take three placeholders: {references}, {date}, and {question}.

For Chinese questions, we use the following prompt:

ernie_search_zh_prompt = \
'''下面你会收到当前时间、多个不同来源的参考文章和一段对话。你的任务是阅读多个参考文章，并根据参考文章中的信息回答对话中的问题。
以下是当前时间和参考文章：
---------
#当前时间
{date}

#参考文章
{references}

---------
请注意：
1. 回答必须结合问题需求和当前时间，对参考文章的可用性进行判断，避免在回答中使用错误或过时的信息。
2. 当参考文章中的信息无法准确地回答问题时，你需要在回答中提供获取相应信息的建议，或承认无法提供相应信息。
3. 你需要优先根据百科、官网、权威机构、专业网站等高权威性来源的信息来回答问题。
4. 回复需要综合参考文章中的相关数字、案例、法律条文、公式等信息，使你的答案更专业。
5. 当问题属于创作类任务时，需注意以下维度：
   - 态度鲜明：观点、立场清晰明确，避免模棱两可，语言果断直接
   - 文采飞扬：用词精准生动，善用修辞手法，增强感染力
   - 有理有据：逻辑严密递进，结合权威数据/事实支撑论点
---------
下面请结合以上信息，回答问题，补全对话
{question}'''

For English questions, we use the following prompt:

ernie_search_en_prompt = \
'''
Below you will be given the current time, multiple references from different sources, and a conversation. Your task is to read the references and use the information in them to answer the question in the conversation.
Here are the current time and the references:
---------
#Current Time
{date}

#References
{references}

---------
Please note:
1. Based on the question’s requirements and the current time, assess the usefulness of the references to avoid using inaccurate or outdated information in the answer.
2. If the references do not provide enough information to accurately answer the question, you should suggest how to obtain the relevant information or acknowledge that you are unable to provide it.  
3. Prioritize using information from highly authoritative sources such as encyclopedias, official websites, authoritative institutions, and professional websites when answering questions.
4. Incorporate relevant numbers, cases, legal provisions, formulas, and other details from the references to make your answer more professional.
5. For creative tasks, keep these dimensions in mind:
   - Clear attitude: Clear views and positions, avoid ambiguity, and use decisive and direct language
   - Brilliant writing: Precise and vivid words, good use of rhetoric, and enhance the appeal
   - Well-reasoned: Rigorous logic and progressive, combined with authoritative data/facts to support the argument

---------
Now, using the information above, answer the question and complete the conversation:  
{question}'''
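Filling a prompt is plain `str.format` substitution, because `{date}`, `{references}`, and `{question}` are the only braces in either prompt. The sketch below also adds a hypothetical heuristic for choosing between the Chinese and English prompts. The helper names, the 30% threshold, and the abbreviated demo templates are assumptions made for illustration.

```python
def is_mostly_chinese(text, threshold=0.3):
    """Hypothetical heuristic: Chinese if at least `threshold` of non-space chars are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars) >= threshold


def fill_search_prompt(question, date, references, zh_template, en_template):
    """Pick the prompt matching the question's language and substitute the placeholders."""
    template = zh_template if is_mostly_chinese(question) else en_template
    return template.format(question=question, date=date, references=references)


# Abbreviated stand-ins; in practice pass ernie_search_zh_prompt and ernie_search_en_prompt.
DEMO_ZH = "#当前时间\n{date}\n\n#参考文章\n{references}\n---------\n{question}"
DEMO_EN = "#Current Time\n{date}\n\n#References\n{references}\n---------\n{question}"
```

If a template ever needs literal braces, they must be doubled (`{{` and `}}`) for `str.format` to leave them intact.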

Parameter notes:

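  • {question} is the user's question.
  • {date} is the current time; the recommended format is "YYYY-MM-DD HH:MM:SS, Day of the Week, Beijing/China".
  • {references} holds the reference articles; the recommended format, with field labels in Chinese for the article number, title, publication date, content, source URL, and source site name, is: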
##ๅ‚่€ƒๆ–‡็ซ 1
ๆ ‡้ข˜๏ผšๅ‘จๆฐไผฆ
ๆ–‡็ซ ๅ‘ๅธƒๆ—ถ้—ด๏ผš2025-04-20
ๅ†…ๅฎน๏ผšๅ‘จๆฐไผฆ(Jay Chou),1979ๅนด1ๆœˆ18ๆ—ฅๅ‡บ็”ŸไบŽๅฐๆนพ็œๆ–ฐๅŒ—ๅธ‚,็ฅ–็ฑ็ฆๅปบ็œๆฐธๆ˜ฅๅŽฟ,ๅŽ่ฏญๆต่กŒไน็”ทๆญŒๆ‰‹ใ€้Ÿณไนไบบใ€ๆผ”ๅ‘˜ใ€ๅฏผๆผ”ใ€็ผ–ๅ‰ง,ๆฏ•ไธšไบŽๆทกๆฑŸไธญๅญฆใ€‚2000ๅนด,ๅ‘่กŒไธชไบบ้ฆ–ๅผ ้Ÿณไนไธ“่พ‘ใ€ŠJayใ€‹ใ€‚...
ๆฅๆบ็ฝ‘็ซ™็ฝ‘ๅ€๏ผšbaike.baidu.com
ๆฅๆบ็ฝ‘็ซ™็š„็ฝ‘็ซ™ๅ๏ผš็™พๅบฆ็™พ็ง‘

##ๅ‚่€ƒๆ–‡็ซ 2
...
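Both recommended formats can be produced with small helpers. In this sketch, the dictionary keys (`title`, `date`, `content`, `url`, `site`) and the function names are hypothetical. Beijing time is modeled as a fixed UTC+8 offset, since China does not observe daylight saving time. "Beijing/China" is appended literally, and `%A` yields English weekday names only under the default C locale.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 year-round, so a fixed offset suffices.
BEIJING = timezone(timedelta(hours=8))


def format_date(moment):
    """Render an aware datetime as 'YYYY-MM-DD HH:MM:SS, <weekday>, Beijing/China' in Beijing time."""
    local = moment.astimezone(BEIJING)
    return local.strftime("%Y-%m-%d %H:%M:%S, %A") + ", Beijing/China"


def format_references(articles):
    """Render article dicts in the recommended layout, numbered from 1 and separated by blank lines."""
    blocks = []
    for n, a in enumerate(articles, start=1):
        blocks.append(
            f"##参考文章{n}\n"
            f"标题：{a['title']}\n"
            f"文章发布时间：{a['date']}\n"
            f"内容：{a['content']}\n"
            f"来源网站网址：{a['url']}\n"
            f"来源网站的网站名：{a['site']}"
        )
    return "\n\n".join(blocks)
```

For example, `format_date(datetime.now(timezone.utc))` yields the current Beijing time in the recommended layout. Passing a timezone-aware datetime avoids `astimezone` silently interpreting a naive value as the machine's local time.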

License

The ERNIE 4.5 models are provided under the Apache License 2.0. This license permits commercial use, subject to its terms and conditions. Copyright (c) 2025 Baidu, Inc. All Rights Reserved.

Citation

If you find ERNIE 4.5 useful or wish to use it in your projects, please kindly cite our technical report:

@misc{ernie2025technicalreport,
      title={ERNIE 4.5 Technical Report},
      author={Baidu ERNIE Team},
      year={2025},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={}
}