deepseek-ai/DeepSeek-V3-Base

DeepSeekDDM committed on Jan 24

Commit 32950e4 · verified · 1 Parent(s): 69cf1d9

Update README_WEIGHTS.md

Files changed (1)

README_WEIGHTS.md +3 -3

README_WEIGHTS.md CHANGED

@@ -18,7 +18,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
   - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
 - **Parameter Count**:
   - Total parameters: **671B**
-  - Activation parameters: **36.6B**.
+  - Activation parameters: **36.6B** (including 0.9B for the output Head).
 #### Structural Details
@@ -35,8 +35,8 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - **Composition**:
   - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
 - **Parameter Count**:
-  - Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head).
-  - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).
+  - Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
+  - Activation parameters: **1.5B** (including 0.9B for the output Head).
 #### Structural Details
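
The revised figures can be sanity-checked with simple arithmetic. The following is a minimal sketch, assuming the counts in the diff are rounded to 0.1B and that "including 0.9B for the output Head" describes a plain additive component. The function name `excluding_head` and the variable names are hypothetical and not part of the repository. Reading the 0.9B drop in the MTP figure as the shared Embedding no longer being counted is an inference from the wording of the changed lines, not something the commit states.

```python
from fractions import Fraction


def excluding_head(activated_b: str, head_b: str = "0.9") -> Fraction:
    """Return activated parameters (in billions) with the output Head subtracted.

    Values are passed as decimal strings so the arithmetic is exact
    (no binary floating-point rounding).
    """
    return Fraction(activated_b) - Fraction(head_b)


# Main model: 36.6B activated, of which 0.9B is the output Head.
main_without_head = excluding_head("36.6")

# MTP module (after this commit): 1.5B activated, of which 0.9B is the output Head.
mtp_without_head = excluding_head("1.5")

# Old MTP figure (2.4B) minus new MTP figure (1.5B).
# Assumption: the gap corresponds to the shared 0.9B Embedding.
mtp_delta = Fraction("2.4") - Fraction("1.5")

print(float(main_without_head))  # 35.7
print(float(mtp_without_head))   # 0.6
print(float(mtp_delta))          # 0.9
```

Using `Fraction` on decimal strings keeps the subtraction exact, so the printed values match the one-decimal figures quoted in the README.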