MollyHexapotato committed on
Commit 7ef0bab · verified · 1 Parent(s): 05d5176

Create README.md

Files changed (1)
  1. README.md +25 -0
README.md ADDED
@@ -0,0 +1,25 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - test
+ - custom-architecture
+ - deepseek
+ ---
+
+ # Custom DeepSeek-R1 (4 Layers)
+
+ ⚠️ **For Testing Purposes Only**
+ This is a modified version of DeepSeek-R1 with **random weights**, used for architecture experiments.
+
+ ## Key Modifications
16
+ - Reduced to **4 layers** (original: 32+ layers)
+ - Contains:
+   - First 3 layers: **MLA** (Multi-head Latent Attention)
+   - Layer 4: **MoE** (Mixture of Experts)
+ - All weights randomly initialized (not performance-optimized)
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM
+ model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")
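
The layer layout listed under Key Modifications can be sketched in plain Python without downloading the model. This is a minimal illustration, not code from the repository: `layer_kinds` is a hypothetical helper, and its parameters are assumed to mirror the `num_hidden_layers`, `first_k_dense_replace` and `moe_layer_freq` fields of the upstream DeepSeek-V3/R1 configuration. It further assumes that the three "MLA" layers keep a dense feed-forward block and that only the fourth layer swaps it for a mixture of experts.

```python
def layer_kinds(num_hidden_layers=4, first_k_dense_replace=3, moe_layer_freq=1):
    """Label each layer's feed-forward block as 'dense' or 'moe'.

    Hypothetical helper: the defaults encode the 4-layer layout the README
    describes (an assumption, not values read from the repository).
    """
    kinds = []
    for idx in range(num_hidden_layers):
        # A layer uses MoE once the leading dense layers are exhausted
        # and its index falls on the MoE frequency.
        is_moe = idx >= first_k_dense_replace and idx % moe_layer_freq == 0
        kinds.append("moe" if is_moe else "dense")
    return kinds


print(layer_kinds())  # ['dense', 'dense', 'dense', 'moe']
```

Under the same assumptions, calling `layer_kinds(61)` gives the split for a full-depth model: 3 dense layers followed by 58 MoE layers.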