---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
- cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: pytorch
---
|
|
|
## A deep and narrow Mistral model (950M params)
|
This checkpoint is a small (950M params), deep and narrow (40 layers, hidden size = 1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is intended for edge applications.
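As a back-of-envelope check on the 950M figure, the parameter count can be estimated from the stated depth and width. The sketch below assumes a Mistral-style decoder (SwiGLU MLP, RMSNorm, tied embeddings) with a guessed 32k vocabulary and plain multi-head attention; the intermediate size of 3312 is chosen for illustration so the total lands near 950M, and is not taken from the released config.

```python
def decoder_params(n_layers, hidden, intermediate, vocab,
                   n_heads, n_kv_heads, tied_embeddings=True):
    """Rough parameter count for a Mistral-style decoder-only model."""
    head_dim = hidden // n_heads
    # Attention: Q and O projections are hidden x hidden;
    # K and V shrink with grouped-query attention (n_kv_heads < n_heads).
    attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return n_layers * per_layer + embeddings + final_norm

# Illustrative values only: vocab size, head counts, and intermediate
# size are assumptions, not read from the checkpoint's config.
total = decoder_params(n_layers=40, hidden=1440, intermediate=3312,
                       vocab=32000, n_heads=16, n_kv_heads=16)
print(f"{total / 1e6:.0f}M parameters")  # ~950M under these assumptions
```

With 40 layers at hidden size 1440, an MLP intermediate size around 3300 (or a smaller one combined with grouped-query attention) is what makes the arithmetic land near 950M; the actual configuration may differ.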
|
|
|
It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 snapshot). It is a base model and has not gone through instruction or chat fine-tuning.
|
|
|
LM Evaluation Harness results:
|
| Benchmark | Result |
| ----- | ----- |
| arc_c | 0.2884 |
| arc_e | 0.5139 |
| boolq | 0.6089 |
| hellaswag | 0.5888 |
| obqa | 0.3280 |
| piqa | 0.7388 |
| siqa | 0.4038 |
| wino | 0.5627 |