Article Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs By davidberenstein1957 and 1 other • 29 days ago • 35
Qwen3 Collection Qwen's new Qwen3 models, available in Unsloth Dynamic 2.0, GGUF, 4-bit and 16-bit Safetensors formats. Includes 128K context-length variants. • 65 items • Updated 6 days ago • 148
Article Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time By rbrt and 4 others • Feb 18 • 33
Granite Experiments Collection Experimental projects under consideration for the Granite family. • 17 items • Updated 1 day ago • 12
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published Apr 9 • 74
Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 862
Article Decoding Strategies in Large Language Models By mlabonne • Oct 29, 2024 • 66
Solar Pro Collection The most intelligent LLM on a single GPU • 4 items • Updated Nov 15, 2024 • 14
Cohere Labs Aya 23 Collection Aya 23 is an open-weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated Apr 15 • 55
Yi 1.5 GGUFs Collection Collection of Yi 1.5 GGUFs made with gguf-my-repo • 15 items • Updated May 20, 2024 • 5
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated May 1 • 569
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23, 2024 • 39
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 618
Frankenmodels Collection They're not supposed to be that size! Neat, right? • 8 items • Updated Dec 12, 2023 • 3