This is a DarwinLM model pruned from Mistral-Minitron-8B. The pruning is applied as a mask: pruned weights are set to 0, while the remaining weights are identical to those of the original model.

The shapes of all weight tensors are the same as in the original model.
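As a minimal sketch (assuming the standard `transformers` API and that the mask is stored as exact zeros in the released weights), the effective sparsity can be checked by counting zero entries across all parameters:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the masked checkpoint and count zeroed (pruned) vs. total parameters.
model = AutoModelForCausalLM.from_pretrained(
    "Shengkun/DarwinLM-4B-Mistral-Minitron-8B-Pruned-Masked",
    torch_dtype=torch.float16,
)

total, zeros = 0, 0
for name, param in model.named_parameters():
    total += param.numel()
    zeros += (param == 0).sum().item()

print(f"total params: {total / 1e9:.2f}B")
print(f"zeroed params: {zeros / 1e9:.2f}B ({100 * zeros / total:.1f}% sparsity)")
```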

# To use the model

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-4B-Mistral-Minitron-8B-Pruned-Masked")
```
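A minimal generation example, assuming the tokenizer is included in the repository and using the standard `generate` API (the prompt below is only illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Shengkun/DarwinLM-4B-Mistral-Minitron-8B-Pruned-Masked"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Simple greedy generation from a short prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```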

# Evaluation (4.8B non-zero parameters)

| Method | Avg | SciQ | PIQA | WG | ARC-E | ARC-C | HS | LogiQA | BoolQ | MMLU |
|---|---|---|---|---|---|---|---|---|---|---|
| Mistral-Minitron-8B | 72.4 | 96.5 | 80.4 | 79.5 | 83.1 | 63.0 | 62.4 | 33.6 | 83.9 | 69.4 |
| DarwinLM 4.8B | 53.6 | 83.1 | 69.8 | 58.9 | 64.5 | 35.7 | 48.9 | 24.1 | 64.7 | 33.2 |
