LG-AI-EXAONE committed on
Commit 3cbee3d · 1 Parent(s): 26854ea

Update Technical Report

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -36,7 +36,7 @@ In the EXAONE 4.0 architecture, we apply new architectural changes compared to p
  1. **Hybrid Attention**: For the 32B model, we adopt a hybrid attention scheme, which combines *Local attention (sliding window attention)* with *Global attention (full attention)* in a 3:1 ratio. We do not use RoPE (Rotary Positional Embedding) for global attention, for better global context understanding.
  2. **QK-Reorder-Norm**: We adopt the Post-LN (LayerNorm) scheme for transformer blocks instead of Pre-LN, and we add RMS normalization right after the Q and K projections. This yields better performance on downstream tasks despite consuming more computation.

- For more details, please refer to our [technical report](https://www.lgresearch.ai/data/cdn/upload/EXAONE_4_0.pdf), [blog](https://www.lgresearch.ai/blog/view?seq=576), and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-4.0).
+ For more details, please refer to our [technical report](https://arxiv.org/abs/2507.11407), [blog](https://www.lgresearch.ai/blog/view?seq=576), and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-4.0).


  ### Model Configuration
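As a rough illustration of the two architectural notes in the hunk above, the sketch below shows RMS normalization applied right after the Q and K projections and a layer schedule in which every fourth layer uses global attention without RoPE while the rest are local (sliding-window) layers. This is not the official EXAONE 4.0 implementation: the module name, dimensions, and plain-PyTorch style are assumptions for illustration, and the Post-LN block placement is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Self-attention with RMS normalization applied right after the Q and K projections (sketch)."""
    def __init__(self, d_model: int, n_heads: int, is_global: bool):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.is_global = is_global                      # global = full attention, no RoPE
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)         # RMS norm on Q heads (requires torch >= 2.4)
        self.k_norm = nn.RMSNorm(self.head_dim)         # RMS norm on K heads

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_norm(self.q_proj(x).view(b, t, self.n_heads, self.head_dim))
        k = self.k_norm(self.k_proj(x).view(b, t, self.n_heads, self.head_dim))
        v = self.v_proj(x).view(b, t, self.n_heads, self.head_dim)
        # A real local layer would apply RoPE and a sliding-window mask here;
        # a global layer applies neither (full causal attention, no rotation).
        out = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        )
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# 3:1 local:global ratio -- every fourth layer is global, the rest are local.
layers = [QKNormAttention(1024, 8, is_global=((i + 1) % 4 == 0)) for i in range(8)]
x = torch.randn(1, 16, 1024)
for layer in layers:
    x = layer(x)
print([("global" if layer.is_global else "local") for layer in layers])
```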
@@ -140,7 +140,7 @@ git clone --single-branch -b add-exaone4 https://github.com/lgai-exaone/llama.cp

  ## Performance

- The following tables show the evaluation results of each model, with reasoning and non-reasoning mode. The evaluation details can be found in the [technical report](https://www.lgresearch.ai/data/cdn/upload/EXAONE_4_0.pdf).
+ The following tables show the evaluation results of each model, in both reasoning and non-reasoning modes. The evaluation details can be found in the [technical report](https://arxiv.org/abs/2507.11407).

  - ✅ denotes that the model has hybrid reasoning capability, evaluated by selecting reasoning / non-reasoning mode depending on the purpose.
  - To assess Korean **practical** and **professional** knowledge, we adopt both the [KMMLU-Redux](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Redux) and [KMMLU-Pro](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Pro) benchmarks. Both datasets are publicly released!
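Since both Korean benchmarks referenced in the hunk above are public on the Hugging Face Hub, they can be pulled with the `datasets` library. The snippet below is a minimal sketch; the `test` split name is an assumption, so check each dataset card for the actual splits and fields.

```python
# Minimal sketch: download the released Korean benchmarks from the Hugging Face Hub.
# The "test" split is an assumption; consult the dataset cards for the real schema.
from datasets import load_dataset

kmmlu_redux = load_dataset("LGAI-EXAONE/KMMLU-Redux", split="test")
kmmlu_pro = load_dataset("LGAI-EXAONE/KMMLU-Pro", split="test")

print(len(kmmlu_redux), "KMMLU-Redux examples,", len(kmmlu_pro), "KMMLU-Pro examples")
```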
@@ -1089,7 +1089,14 @@ The model is licensed under [EXAONE AI Model License Agreement 1.2 - NC](./LICEN

  ## Citation

- TBD
+ ```
+ @article{exaone-4.0,
+   title={EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes},
+   author={{LG AI Research}},
+   journal={arXiv preprint arXiv:2507.11407},
+   year={2025}
+ }
+ ```


  ## Contact