---
tags:
- unsloth
base_model:
- Qwen/Qwen3-30B-A3B-Base
language:
- eng
- fra
- por
- deu
- ron
- swe
- dan
- bul
- rus
- ces
- ell
- ukr
- spa
- nld
- slk
- hrv
- pol
- lit
- nob
- nno
- fas
- slv
- guj
- lav
- ita
- oci
- nep
- mar
- bel
- srp
- ltz
- vec
- asm
- cym
- szl
- ast
- hne
- awa
- mai
- bho
- snd
- gle
- fao
- hin
- pan
- ben
- ori
- tgk
- ydd
- lmo
- lij
- scn
- fur
- srd
- glg
- cat
- isl
- als
- lim
- prs
- afr
- mkd
- sin
- urd
- mag
- bos
- hye
- zho
- yue
- mya
- ara
- ars
- apc
- arz
- ary
- acm
- acq
- aeb
- heb
- mlt
- ind
- zsm
- tgl
- ceb
- jav
- sun
- min
- ban
- bjn
- pag
- ilo
- war
- tam
- tel
- kan
- mal
- tur
- azj
- uzn
- kaz
- bak
- tat
- tha
- lao
- fin
- est
- hun
- vie
- khm
- jpn
- kor
- kat
- eus
- hat
- pap
- kea
- tpi
- swa
---
# Qwen3-30B-A3B
## Qwen3 Highlights
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:
- **Expanded Higher-Quality Pre-training Corpus:** Qwen3 is pre-trained on 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5, with a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
- **Training Techniques and Model Architecture:** Qwen3 incorporates a series of training techniques and architectural refinements, including global-batch load balancing loss for MoE models and qk layernorm for all models, leading to improved stability and overall performance.
- **Three-stage Pre-training:** Stage 1 focuses on broad language modeling and general knowledge acquisition, Stage 2 improves reasoning skills in areas such as STEM, coding, and logical reasoning, and Stage 3 enhances long-context comprehension by extending training sequence lengths up to 32k tokens.
- **Scaling Law Guided Hyperparameter Tuning:** Through comprehensive scaling law studies across the three-stage pre-training pipeline, Qwen3 systematically tunes critical hyperparameters (such as learning rate scheduler and batch size) separately for dense and MoE models, resulting in better training dynamics and final performance across different model scales.
## Model Overview
**Qwen3-30B-A3B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Parameters (Non-Embedding): 29.9B
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 32,768
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
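The figures in the list above imply a few ratios that are useful when reasoning about memory and compute. The following is a minimal sketch that only restates that arithmetic; the constant and function names are hypothetical and not part of any Qwen or `transformers` API.

```python
# Figures copied from the Model Overview list; all names here are illustrative.
TOTAL_PARAMS_B = 30.5          # total parameters, in billions
ACTIVE_PARAMS_B = 3.3          # parameters activated per token, in billions
NON_EMBEDDING_PARAMS_B = 29.9  # non-embedding parameters, in billions
NUM_Q_HEADS = 32               # query heads (GQA)
NUM_KV_HEADS = 4               # key/value heads (GQA)
NUM_EXPERTS = 128
NUM_ACTIVE_EXPERTS = 8


def overview_ratios() -> dict:
    """Derive simple ratios from the published model figures."""
    return {
        # Parameters left over after subtracting the non-embedding count.
        "embedding_params_b": round(TOTAL_PARAMS_B - NON_EMBEDDING_PARAMS_B, 1),
        # Share of all parameters that are active for a single token.
        "active_param_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # Number of query heads that share one key/value head.
        "q_heads_per_kv_head": NUM_Q_HEADS // NUM_KV_HEADS,
        # Share of experts selected for a single token.
        "active_expert_fraction": NUM_ACTIVE_EXPERTS / NUM_EXPERTS,
    }


if __name__ == "__main__":
    for name, value in overview_ratios().items():
        print(f"{name}: {value}")
```

Under these figures, roughly 11% of the parameters are active per token, 8 query heads share each key/value head, and 8 of 128 experts (6.25%) are selected per token.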
## Requirements
The code for Qwen3-MoE is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3_moe'
```
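To catch this error before loading the model, the installed version can be checked up front. The sketch below uses only the standard library and assumes that any release at or above 4.51.0 recognizes `qwen3_moe`, which is inferred from the note above rather than stated by it. The helper names (`parse_version`, `supports_qwen3_moe`, `installed_transformers_ok`) are hypothetical.

```python
import re
from importlib import metadata

# Assumed minimum, inferred from the "transformers<4.51.0" note above.
MIN_TRANSFORMERS = (4, 51, 0)


def parse_version(text: str) -> tuple:
    """Return the leading numeric components, e.g. '4.51.0.dev0' -> (4, 51, 0).

    Pre-release suffixes such as 'rc1' or 'dev0' are ignored, so a
    pre-release is treated like the release it precedes.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_qwen3_moe(version_text: str) -> bool:
    """Check a version string against the assumed minimum."""
    parts = parse_version(version_text)
    # Pad short versions such as '4.51' so tuple comparison behaves as expected.
    parts = parts + (0,) * (3 - len(parts))
    return parts >= MIN_TRANSFORMERS


def installed_transformers_ok() -> bool:
    """Return False if transformers is missing or older than the minimum."""
    try:
        return supports_qwen3_moe(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    if not installed_transformers_ok():
        print("Upgrade with: pip install --upgrade transformers")
```

A stricter check would use a full version parser so that pre-releases sort before their final release; the simplified parsing here is a deliberate trade-off to avoid extra dependencies.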
## Evaluation & Performance
Detailed evaluation results are reported in this [blog](https://qwenlm.github.io/blog/qwen3/).
### Citation
If you find our work helpful, feel free to cite us.
```
@misc{qwen3,
title = {Qwen3},
url = {https://qwenlm.github.io/blog/qwen3/},
author = {Qwen Team},
month = {April},
year = {2025}
}
```