LG-AI-EXAONE committed
Commit 0806c70 · 1 Parent(s): 3cbee3d

Update README.md

Files changed (1): README.md (+6 -6)
README.md CHANGED
@@ -34,19 +34,19 @@ The EXAONE 4.0 model series consists of two sizes: a mid-size **32B** model opti
  In the EXAONE 4.0 architecture, we apply new architectural changes compared to previous EXAONE models, as follows:

  1. **Hybrid Attention**: For the 32B model, we adopt a hybrid attention scheme, which combines *Local attention (sliding window attention)* with *Global attention (full attention)* in a 3:1 ratio. For better global context understanding, we do not use RoPE (Rotary Positional Embedding) for global attention.
- 2. **QK-Reorder-Norm**: We adopt the Post-LN (LayerNorm) scheme for transformer blocks instead of Pre-LN, and we add RMS normalization right after the Q and K projections. This yields better performance on downstream tasks at the cost of additional computation.
+ 2. **QK-Reorder-Norm**: We reorder the LayerNorm position from the traditional Pre-LN scheme by applying LayerNorm directly to the attention and MLP outputs, and we add RMS normalization right after the Q and K projections. This yields better performance on downstream tasks at the cost of additional computation.

  For more details, please refer to our [technical report](https://arxiv.org/abs/2507.11407), [blog](https://www.lgresearch.ai/blog/view?seq=576), and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-4.0).


  ### Model Configuration

- - Number of Parameters (without embeddings): [[num_params_wo_embeddings]]
- - Number of Layers: [[num_layers]]
- - Number of Attention Heads: [[num_heads]]
+ - Number of Parameters (without embeddings): 1.07B
+ - Number of Layers: 30
+ - Number of Attention Heads: GQA with 32 query heads and 8 KV heads
  - Vocab Size: 102,400
- - Context Length: [[context_length]] tokens
- [[quantization]]
+ - Context Length: 65,536 tokens
+ - Quantization: `Q8_0`, `Q6_K`, `Q5_K_M`, `Q4_K_M`, `IQ4_XS` in GGUF format (also includes `BF16` weights)

  ## Quickstart
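
To make the hybrid attention item in the diff above more concrete, here is a minimal sketch of how a 3:1 local-to-global layer schedule can be expressed. This is an illustration only, not the released configuration: the layer count, sliding-window size, and function name are placeholders.

```python
# Illustrative 3:1 local:global attention schedule (hypothetical values,
# not the official EXAONE 4.0 config). Every fourth layer uses global (full)
# attention without RoPE; the other layers use local sliding-window attention.
NUM_LAYERS = 32          # placeholder layer count
SLIDING_WINDOW = 4096    # placeholder window size for the local layers

def layer_attention(layer_idx: int) -> dict:
    """Return a per-layer attention spec following a 3:1 local:global pattern."""
    if (layer_idx + 1) % 4 == 0:
        # every fourth layer: global (full) attention, no RoPE
        return {"type": "global", "rope": False, "window": None}
    # remaining layers: local sliding-window attention with RoPE
    return {"type": "local", "rope": True, "window": SLIDING_WINDOW}

schedule = [layer_attention(i)["type"] for i in range(NUM_LAYERS)]
print(schedule[:8])
# ['local', 'local', 'local', 'global', 'local', 'local', 'local', 'global']
```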
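The QK-Reorder-Norm item can likewise be illustrated with a small PyTorch-style block. This is a minimal sketch of one reading of the description (RMSNorm applied to Q and K right after their projections, LayerNorm applied to the attention and MLP outputs rather than to the block inputs), assuming a recent PyTorch with `torch.nn.RMSNorm`; the module name `QKReorderedBlock` and all dimensions are hypothetical, not the official modeling code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKReorderedBlock(nn.Module):
    """Hypothetical transformer block illustrating QK-Reorder-Norm:
    RMSNorm right after the Q/K projections, and LayerNorm applied to the
    attention and MLP outputs instead of the usual Pre-LN placement."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)      # RMS norm right after Q projection
        self.k_norm = nn.RMSNorm(self.head_dim)      # RMS norm right after K projection
        self.attn_out_norm = nn.LayerNorm(d_model)   # LayerNorm on the attention output
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.mlp_out_norm = nn.LayerNorm(d_model)    # LayerNorm on the MLP output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q = self.q_norm(self.q_proj(x).view(B, T, self.n_heads, self.head_dim))
        k = self.k_norm(self.k_proj(x).view(B, T, self.n_heads, self.head_dim))
        v = self.v_proj(x).view(B, T, self.n_heads, self.head_dim)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # (B, H, T, head_dim)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(B, T, D)
        x = x + self.attn_out_norm(self.o_proj(attn))      # normalize the sub-layer output
        x = x + self.mlp_out_norm(self.mlp(x))             # normalize the sub-layer output
        return x

# Example usage of the sketch:
# y = QKReorderedBlock()(torch.randn(1, 16, 512))
```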
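Since the updated configuration lists several GGUF quantizations, the snippet below shows one generic way such a file can be loaded with the third-party `llama-cpp-python` package. The file name, path, and context size are placeholders, not taken from this model card; check the repository's file list and its Quickstart section for the exact file names and recommended settings.

```python
# Generic example of running one of the GGUF quantizations with llama-cpp-python.
# The file name below is a placeholder; pick an actual file (e.g. a Q4_K_M quant)
# from the repository's file list.
from llama_cpp import Llama

llm = Llama(
    model_path="./EXAONE-4.0-1.2B-Q4_K_M.gguf",  # placeholder path to a downloaded quant
    n_ctx=8192,   # any value up to the model's 65,536-token context length
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain hybrid attention in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```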