Pensez: Less Data, Better Reasoning -- Rethinking French LLM Paper • 2503.13661 • Published 11 days ago • 5
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 13 • 34
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs Paper • 2411.08719 • Published Nov 10, 2024
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Paper • 2412.14471 • Published Dec 19, 2024
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published Feb 14
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published 22 days ago • 1
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published 25 days ago • 8
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 55
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published 30 days ago • 19
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published 30 days ago • 19
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published 30 days ago • 7
Kanana: Compute-efficient Bilingual Language Models Paper • 2502.18934 • Published 30 days ago • 64
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19 • 33
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
One Thousand and One Pairs: A "novel" challenge for long-context language models Paper • 2406.16264 • Published Jun 24, 2024
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13 • 55
view post Post 1579 Mini-QwQ an edge device friendly reasoning model distilled from QwQ-32B 🤗: kz919/QwQ-0.5B-Distilled-SFT🇬 🇬 🇺 🇫: kz919/QwQ-0.5B-Distilled-SFT-gguf🤖: kz919/Mini-QwQ See translation 👍 7 7 + Reply