This is DarwinLM pruned from Llama-3.1-8B. The model is masked: the pruned weights are set to 0, while the remaining weights are identical to those of the original model.

The shapes of all weight tensors are the same as in the original model.

# To use the model

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-4.6B-Llama3.1-8B-Pruned-Masked")
```
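Since the pruned weights are stored as zeros rather than removed, you can inspect the mask directly. The sketch below is a minimal, non-authoritative example: it assumes pruned entries are exactly 0.0 and uses the standard Llama projection-layer names found in `transformers`.

```python
import torch
from transformers import AutoModelForCausalLM

# Minimal sketch: count exact-zero entries in the projection weights to see
# the effect of the pruning mask (assumes pruned weights are stored as 0.0).
model = AutoModelForCausalLM.from_pretrained(
    "Shengkun/DarwinLM-4.6B-Llama3.1-8B-Pruned-Masked",
    torch_dtype=torch.float16,
)

total, zeros = 0, 0
for name, param in model.named_parameters():
    # Only look at the 2-D linear projection weights inside the decoder layers.
    if "proj" in name and param.dim() == 2:
        total += param.numel()
        zeros += (param == 0).sum().item()

print(f"Zeroed (masked) fraction of projection weights: {zeros / total:.2%}")
```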

# Evaluation (4.6B)

| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-8B | Dense | 8B | 96.3 | 81.2 | 74.3 | 81.4 | 58.2 | 81.7 | 31.1 | 84.0 | 65.2 | 72.8 |
| | Uniform | 4.5B | 29.1 | 53.6 | 51.7 | 26.0 | 23.6 | 27.1 | 25.5 | 62.1 | 25.7 | 36.1 |
| | ZipLM | 6B | 65.5 | 60.6 | 56.0 | 40.2 | 34.4 | 34.4 | 28.1 | 63.0 | 27.9 | 45.7 |
| | DarwinLM (one-shot) | 4.6B | 84.9 | 69.4 | 57.3 | 59.6 | 34.2 | 44.6 | 24.1 | 62.2 | 28.5 | 51.6 |
| OLMO (2.5T) | | 7B | 92.8 | 79.4 | 70.4 | 73.3 | 44.9 | 77.1 | 27.9 | 72.5 | 28.3 | 62.9 |
| DarwinLM (10.0B) | | 4.6B | 93.2 | 74.8 | 67.4 | 73.2 | 51.6 | 71.3 | 30.7 | 71.1 | 40.6 | 63.7 |
