DeepSeek-R1-Channel-INT8_4L (4 Layers)
⚠️ For Testing Purposes Only
This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with random weights, used for architecture experiments.
Key Modifications
- Reduced to 4 layers
- Layer layout (a reconstruction sketch follows this list):
  - Layers 1–3: MLA (Multi-head Latent Attention)
  - Layer 4: MoE (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)
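For reference, a truncated, randomly initialized variant like this can be built from the base model's config. The sketch below is an assumption about how it might be done, not this repo's actual build script: it relies on the `num_hidden_layers` and `first_k_dense_replace` fields that appear in the public DeepSeek-R1 `config.json` (the latter keeps the first N layers dense, so a value of 3 yields the 3-dense-plus-1-MoE layout described above).

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumed base repo and config fields (taken from the public DeepSeek-R1
# config.json); not verified against this repo's actual build process.
config = AutoConfig.from_pretrained(
    "meituan/DeepSeek-R1-Channel-INT8", trust_remote_code=True
)
config.num_hidden_layers = 4      # keep only 4 transformer layers
config.first_k_dense_replace = 3  # layers 1-3 dense (MLA attention), layer 4 MoE
# If the loaded config carries the base repo's INT8 quantization_config, it may
# need to be removed for a plain random-weight build (untested assumption).

# from_config() builds the architecture with freshly initialized (random) weights.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.save_pretrained("custom_DeepSeek-R1-Channel-INT8_4L")
```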
Usage
```python
from transformers import AutoModelForCausalLM

# Weights are random, so outputs are not meaningful; load for architecture tests only.
model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")
```
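Because the weights are random, a typical smoke test is a single forward pass to confirm that the 4-layer graph runs end to end and that the logits have the expected shape. The snippet below continues from the one above; it assumes the repo ships a tokenizer alongside the model weights.

```python
import torch
from transformers import AutoTokenizer

# Assumption: a tokenizer is published under the same repo id.
tokenizer = AutoTokenizer.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Logits have shape (batch, seq_len, vocab_size); values are meaningless
# because the weights are random.
print(out.logits.shape)
```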