MollyHexapotato/custom-deepseek-r1-4L

custom-architecture


MollyHexapotato committed on Jul 24

Commit 7ef0bab · verified · 1 Parent(s): 05d5176

Create README.md

Files changed (1)

README.md +25 -0

README.md ADDED

	@@ -0,0 +1,25 @@

+---
+language: en
+license: apache-2.0
+tags:
+- test
+- custom-architecture
+- deepseek
+---
+# Custom DeepSeek-R1 (4 Layers)
+⚠️ **For Testing Purposes Only**
+This is a modified version of DeepSeek-R1 with **random weights**, used for architecture experiments.
+## Key Modifications
+- Reduced to **4 layers** (original: 61 layers)
+- Contains:
+  - First 3 layers: **MLA** (Multi-head Latent Attention)
+  - Layer 4: **MoE** (Mixture of Experts)
+- All weights are randomly initialized (untrained, so outputs are not meaningful)
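The layer layout listed above can be sketched in plain Python. This is only an illustration of the "first 3 layers MLA, layer 4 MoE" split as the README words it. `LayerSpec` and `build_layer_plan` are hypothetical names and are not taken from this repository or from `transformers`. How the actual checkpoint encodes the split in its config is not shown in this commit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerSpec:
    """Hypothetical description of one layer; not a class from this repository."""
    index: int  # 1-based position in the stack
    kind: str   # label used in the README: "MLA" or "MoE"


def build_layer_plan(num_layers: int = 4, num_mla_layers: int = 3) -> List[LayerSpec]:
    """Label the first `num_mla_layers` layers "MLA" and the remaining ones "MoE"."""
    if not 0 <= num_mla_layers <= num_layers:
        raise ValueError("num_mla_layers must be between 0 and num_layers")
    return [
        LayerSpec(index=i, kind="MLA" if i <= num_mla_layers else "MoE")
        for i in range(1, num_layers + 1)
    ]


# Print the default plan, which mirrors the layout in the list above.
plan = build_layer_plan()
for spec in plan:
    print(f"layer {spec.index}: {spec.kind}")
```

Changing `num_layers` or `num_mla_layers` gives other small test layouts, which may be handy when comparing architecture variants.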
+## Usage
+```python
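+from transformers import AutoModelForCausalLM
+
+# Load the 4-layer test checkpoint from the Hub; its weights are random, so generations are not meaningful.
+model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom-deepseek-r1-4L")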