Update README_WEIGHTS.md
README_WEIGHTS.md CHANGED (+3 -3)
@@ -18,7 +18,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
 - **Parameter Count**:
 - Total parameters: **671B**
-- Activation parameters: **36.6B
+- Activation parameters: **36.6B** (including 0.9B for the output Head).
 
 #### Structural Details
 
@@ -35,8 +35,8 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - **Composition**:
 - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
 - **Parameter Count**:
-- Parameters: **11.5B unique parameters
-- Activation parameters: **
+- Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
+- Activation parameters: **1.5B** (including 0.9B for the output Head).
 
 #### Structural Details
 
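The figures in the added lines can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the values are copied from the updated README text, and it assumes that "activation parameters" means the parameters active for a single token.

```python
# Values quoted in the updated README lines, in billions of parameters.
MAIN_TOTAL_B = 671.0       # main model, total parameters
MAIN_ACTIVATED_B = 36.6    # main model, activation parameters (includes the output Head)
MTP_UNIQUE_B = 11.5        # MTP module, unique parameters (excludes shared Embedding and Head)
MTP_ACTIVATED_B = 1.5      # MTP module, activation parameters (includes the output Head)
OUTPUT_HEAD_B = 0.9        # output Head, counted inside both activation figures


def activated_fraction(activated_b: float, total_b: float) -> float:
    """Share of the total parameters that is activated (hypothetical helper)."""
    return activated_b / total_b


def without_head(activated_b: float, head_b: float = OUTPUT_HEAD_B) -> float:
    """Activation parameters with the output Head subtracted, rounded to 0.1B."""
    return round(activated_b - head_b, 1)


main_fraction = activated_fraction(MAIN_ACTIVATED_B, MAIN_TOTAL_B)
main_without_head = without_head(MAIN_ACTIVATED_B)
mtp_without_head = without_head(MTP_ACTIVATED_B)

print(f"Main model activated share: {main_fraction:.2%}")
print(f"Main model activation parameters excluding Head: {main_without_head}B")
print(f"MTP activation parameters excluding Head: {mtp_without_head}B")
```

Subtracting the Head separately makes it easier to see that the same 0.9B component is counted in both the main-model and MTP activation figures.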