---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---

# DeepSeek-R1-Channel-INT8_4L (4 Layers)

⚠️ **For Testing Purposes Only**

This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with **random weights**, used for architecture experiments.

## Key Modifications

- Reduced to **4 layers**
- Contains:
  - First 3 layers: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)

## Usage

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")
```