Commit 227e523 · Update Technical Report
Parent(s): b8c3971

README.md CHANGED
@@ -36,7 +36,7 @@ In the EXAONE 4.0 architecture, we apply new architectural changes compared to p
1. **Hybrid Attention**: For the 32B model, we adopt a hybrid attention scheme, which combines *Local attention (sliding window attention)* with *Global attention (full attention)* in a 3:1 ratio. We do not use RoPE (Rotary Positional Embedding) for global attention, for better global context understanding.
2. **QK-Reorder-Norm**: We adopt the Post-LN (LayerNorm) scheme for transformer blocks instead of Pre-LN, and we add RMS normalization right after the Q and K projections. This helps yield better performance on downstream tasks despite consuming more computation.

-For more details, please refer to our [technical report](https://
+For more details, please refer to our [technical report](https://arxiv.org/abs/2507.11407), [blog](https://www.lgresearch.ai/blog/view?seq=576), and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-4.0).
### Model Configuration
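
To make the **Hybrid Attention** item above concrete, here is a minimal sketch of how a 3:1 local-to-global layer pattern can be laid out, with RoPE skipped on the global (full-attention) layers. The layer count, window size, and field names are illustrative assumptions, not the actual EXAONE 4.0 configuration.

```python
# A minimal sketch (assumed layer count, window size, and field names; not
# the official EXAONE 4.0 configuration): laying out the 3:1 local:global
# hybrid attention pattern, with RoPE disabled on global-attention layers.

NUM_LAYERS = 64          # hypothetical total number of transformer layers
SLIDING_WINDOW = 4096    # hypothetical sliding-window size for local attention

def hybrid_attention_plan(num_layers: int = NUM_LAYERS) -> list[dict]:
    """Per-layer settings: three local (sliding-window) layers followed by
    one global (full-attention) layer, repeated; global layers skip RoPE."""
    plan = []
    for i in range(num_layers):
        is_global = (i % 4 == 3)  # every 4th layer uses global (full) attention
        plan.append({
            "layer": i,
            "attention": "global" if is_global else "local",
            "window": None if is_global else SLIDING_WINDOW,
            "use_rope": not is_global,  # no RoPE on global-attention layers
        })
    return plan

for cfg in hybrid_attention_plan(8):  # inspect the first 8 layers
    print(cfg)
```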
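
Similarly, for the **QK-Reorder-Norm** item, a minimal PyTorch sketch of RMS normalization applied directly to the projected queries and keys inside an attention block. Hidden size, head count, and class names are assumptions, RoPE is omitted for brevity, and the Post-LN placement is only noted in a comment.

```python
# A minimal PyTorch sketch (assumed sizes and class names; not the released
# modeling code): RMS normalization applied right after the Q and K projections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class QKNormAttention(nn.Module):
    def __init__(self, hidden: int = 1024, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, hidden // n_heads
        self.q_proj = nn.Linear(hidden, hidden, bias=False)
        self.k_proj = nn.Linear(hidden, hidden, bias=False)
        self.v_proj = nn.Linear(hidden, hidden, bias=False)
        self.o_proj = nn.Linear(hidden, hidden, bias=False)
        # the reordered QK norm: RMSNorm directly on projected queries and keys
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, h = x.shape
        q = self.q_norm(self.q_proj(x).view(b, t, self.n_heads, self.head_dim))
        k = self.k_norm(self.k_proj(x).view(b, t, self.n_heads, self.head_dim))
        v = self.v_proj(x).view(b, t, self.n_heads, self.head_dim)
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # under Post-LN, LayerNorm is applied after this sub-layer's residual
        # addition rather than before it as in Pre-LN (omitted here for brevity)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, h))

x = torch.randn(2, 16, 1024)       # (batch, sequence, hidden)
print(QKNormAttention()(x).shape)  # torch.Size([2, 16, 1024])
```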
@@ -181,7 +181,7 @@ print(tokenizer.decode(output[0]))
## Performance

-The following tables show the evaluation results of each model, with reasoning and non-reasoning mode. The evaluation details can be found in the [technical report](https://
+The following tables show the evaluation results of each model, in reasoning and non-reasoning modes. The evaluation details can be found in the [technical report](https://arxiv.org/abs/2507.11407).

- ✅ denotes that the model has hybrid reasoning capability, evaluated by selecting reasoning / non-reasoning mode depending on the purpose.
- To assess Korean **practical** and **professional** knowledge, we adopt both the [KMMLU-Redux](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Redux) and [KMMLU-Pro](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Pro) benchmarks. Both datasets are publicly released!
@@ -1130,7 +1130,14 @@ The model is licensed under [EXAONE AI Model License Agreement 1.2 - NC](./LICEN
## Citation

-
+```
+@article{exaone-4.0,
+  title={EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes},
+  author={{LG AI Research}},
+  journal={arXiv preprint arXiv:2507.11407},
+  year={2025}
+}
+```

## Contact