This is a DarwinLM model pruned from Mistral-Minitron-8B. The pruning is applied as a mask: pruned weights are set to 0, while the remaining weights are identical to those of the original model.

The shapes of all weight tensors are the same as in the original model.
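As a minimal sketch (assuming the standard `transformers` API and that the mask is stored as exact zeros in the released weights), the effective sparsity can be checked by counting zero entries across all parameters:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the masked checkpoint and count zeroed (pruned) vs. total parameters.
model = AutoModelForCausalLM.from_pretrained(
    "Shengkun/DarwinLM-4B-Mistral-Minitron-8B-Pruned-Masked",
    torch_dtype=torch.float16,
)

total, zeros = 0, 0
for name, param in model.named_parameters():
    total += param.numel()
    zeros += (param == 0).sum().item()

print(f"total params: {total / 1e9:.2f}B")
print(f"zeroed params: {zeros / 1e9:.2f}B ({100 * zeros / total:.1f}% sparsity)")
```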

# To use the model

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-4B-Mistral-Minitron-8B-Pruned-Masked")
```
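A minimal generation example, assuming the tokenizer is included in the repository and using the standard `generate` API (the prompt below is only illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Shengkun/DarwinLM-4B-Mistral-Minitron-8B-Pruned-Masked"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Simple greedy generation from a short prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```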

# Evaluation (4.8B non-zero parameters)

| Method | Avg | SciQ | PIQA | WG | ARC-E | ARC-C | HS | LogiQA | BoolQ | MMLU |
|---|---|---|---|---|---|---|---|---|---|---|
| Mistral-Minitron-8B | 72.4 | 96.5 | 80.4 | 79.5 | 83.1 | 63.0 | 62.4 | 33.6 | 83.9 | 69.4 |
| DarwinLM 4.8B | 53.6 | 83.1 | 69.8 | 58.9 | 64.5 | 35.7 | 48.9 | 24.1 | 64.7 | 33.2 |
