PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption Paper • 2411.03357 • Published Nov 4, 2024
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment Paper • 2507.20984 • Published 25 days ago • 54
PowerInfer/SmallThinker-4BA0.6B-Instruct-GGUF Text Generation • 4B • Updated 18 days ago • 13.7k • 28
PowerInfer/SmallThinker-21BA3B-Instruct-GGUF Text Generation • 22B • Updated 18 days ago • 41.4k • 31
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs Paper • 2402.03804 • Published Feb 6, 2024 • 4
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10, 2024 • 28
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10, 2024 • 39