Update README.md
README.md
CHANGED
@@ -19,6 +19,8 @@ This model was made possible by excellent AMD mi300x compute generously provided
 
 **⚠️ Experimental Model**: This model is created through weight interpolation and duplication, and has not been further trained. Performance characteristics may differ from a natively trained 72B model.
 
+As is, this model underperforms Qwen3-32B. The intent is to create a target suitable for distillation from Qwen3-235B.
+
 ## Key Features
 
 - ✅ Full Qwen3-72B architecture (8192 hidden, 80 layers)