Optimal Sparsity Math
Updated 3 days ago
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
llm-jp/optimal-sparsity-math-d512-E8-k2-320M-A170M • 0.3B • Updated 3 days ago • 12
llm-jp/optimal-sparsity-math-d512-E16-k2-520M-A170M • 0.5B • Updated 3 days ago • 12
llm-jp/optimal-sparsity-math-d512-E32-k2-920M-A170M • 0.9B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E64-k2-1.7B-A170M • 2B • Updated 3 days ago • 10
llm-jp/optimal-sparsity-math-d512-E128-k2-3.3B-A170M • 3B • Updated 3 days ago • 10
llm-jp/optimal-sparsity-math-d512-E256-k2-6.6B-A170M • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E8-k2-1.1B-A470M • 1B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E16-k2-1.9B-A470M • 2B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E32-k2-3.5B-A470M • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E64-k2-6.7B-A470M • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E128-k2-13.2B-A470M • 13B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E256-k2-26.0B-A470M • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E8-k2-3.9B-A1.5B • 4B • Updated 3 days ago • 10
llm-jp/optimal-sparsity-math-d2048-E16-k2-7.1B-A1.5B • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E32-k2-13.6B-A1.5B • 14B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E64-k2-26.4B-A1.5B • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E128-k2-52.2B-A1.5B • 52B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E8-k4-320M-A220M • 0.3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E16-k4-520M-A220M • 0.5B • Updated 3 days ago • 12
llm-jp/optimal-sparsity-math-d512-E32-k4-920M-A220M • 0.9B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E64-k4-1.7B-A220M • 2B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E128-k4-3.3B-A220M • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E256-k4-6.6B-A220M • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E8-k4-1.1B-A670M • 1B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E16-k4-1.9B-A670M • 2B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E32-k4-3.5B-A670M • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E64-k4-6.7B-A670M • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E128-k4-13.2B-A670M • 13B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E256-k4-26.0B-A670M • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E8-k4-3.9B-A2.3B • 4B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E16-k4-7.1B-A2.3B • 7B • Updated 3 days ago • 10
llm-jp/optimal-sparsity-math-d2048-E32-k4-13.6B-A2.3B • 14B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E64-k4-26.4B-A2.3B • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E128-k4-52.2B-A2.3B • 52B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E8-k8-320M-A320M • 0.3B • Updated 3 days ago • 12
llm-jp/optimal-sparsity-math-d512-E16-k8-520M-A320M • 0.5B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E32-k8-920M-A320M • 0.9B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E64-k8-1.7B-A320M • 2B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E128-k8-3.3B-A320M • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E256-k8-6.6B-A320M • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E8-k8-1.1B-A1.1B • 1B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E16-k8-1.9B-A1.1B • 2B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E32-k8-3.5B-A1.1B • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E64-k8-6.7B-A1.1B • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E128-k8-13.2B-A1.1B • 13B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E256-k8-26.0B-A1.1B • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E8-k8-3.9B-A3.9B • 4B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E16-k8-7.1B-A3.9B • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E32-k8-13.6B-A3.9B • 14B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E64-k8-26.4B-A3.9B • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E128-k8-52.2B-A3.9B • 52B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E16-k16-520M-A520M • 0.5B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E32-k16-920M-A520M • 0.9B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E64-k16-1.7B-A520M • 2B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E128-k16-3.3B-A520M • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d512-E256-k16-6.6B-A520M • 7B • Updated 3 days ago • 12
llm-jp/optimal-sparsity-math-d1024-E16-k16-1.9B-A1.9B • 2B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E32-k16-3.5B-A1.9B • 3B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E64-k16-6.7B-A1.9B • 7B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d1024-E128-k16-13.2B-A1.9B • 13B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d1024-E256-k16-26.0B-A1.9B • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E16-k16-7.1B-A7.1B • 7B • Updated 3 days ago • 13
llm-jp/optimal-sparsity-math-d2048-E32-k16-13.6B-A7.1B • 14B • Updated 3 days ago • 8
llm-jp/optimal-sparsity-math-d2048-E64-k16-26.4B-A7.1B • 26B • Updated 3 days ago • 11
llm-jp/optimal-sparsity-math-d2048-E128-k16-52.2B-A7.1B • 52B • Updated 3 days ago • 13
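The repository IDs in this collection follow a regular pattern, so they can be split into fields for filtering or grouping. The sketch below is a minimal, unofficial parser. The field meanings are assumptions inferred only from the names and the collection title, not confirmed by the page: `d` is taken to be a model width setting, `E` the number of experts, `k` the number of experts routed per token, the next field the total parameter count, and the `A` field the active parameter count. The names `ModelSpec` and `parse_model_id` are hypothetical helpers introduced for illustration.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the repository IDs in this collection (an assumption,
# not a documented naming scheme).
_ID_PATTERN = re.compile(
    r"optimal-sparsity-math-d(?P<d>\d+)-E(?P<experts>\d+)-k(?P<k>\d+)"
    r"-(?P<total>[\d.]+[MB])-A(?P<active>[\d.]+[MB])$"
)

_UNIT = {"M": 1e6, "B": 1e9}


def _to_count(text: str) -> float:
    """Convert a size string such as '320M' or '1.7B' to a parameter count."""
    return float(text[:-1]) * _UNIT[text[-1]]


@dataclass(frozen=True)
class ModelSpec:
    d: int                # assumed: model width setting
    experts: int          # assumed: number of experts (E)
    k: int                # assumed: experts selected per token (k)
    total_params: float   # assumed: total parameter count
    active_params: float  # assumed: active parameter count (A)

    @property
    def active_expert_fraction(self) -> float:
        """Fraction of experts selected per token under the assumptions above."""
        return self.k / self.experts


def parse_model_id(model_id: str) -> ModelSpec:
    """Parse a repository ID from this collection into a ModelSpec."""
    match = _ID_PATTERN.search(model_id)
    if match is None:
        raise ValueError(f"unrecognized model ID: {model_id!r}")
    return ModelSpec(
        d=int(match["d"]),
        experts=int(match["experts"]),
        k=int(match["k"]),
        total_params=_to_count(match["total"]),
        active_params=_to_count(match["active"]),
    )


if __name__ == "__main__":
    spec = parse_model_id("llm-jp/optimal-sparsity-math-d512-E8-k2-320M-A170M")
    print(spec, spec.active_expert_fraction)
```

Under these assumptions, entries where `k` equals `E` (for example `d512-E8-k8-320M-A320M`) list identical total and active sizes, which is consistent with reading `A` as the active parameter count.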