Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation By yuxiang630 and 8 others • Apr 29, 2024 • 78
LocAgent: Graph-Guided LLM Agents for Code Localization Paper • 2503.09089 • Published Mar 12 • 13
bigcode/self-oss-instruct-sc2-exec-filter-50k Viewer • Updated Nov 4, 2024 • 50.7k • 361 • 99
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function Paper • 2410.21438 • Published Oct 28, 2024 • 2
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 29
Running • 2.67k • The Ultra-Scale Playbook: The ultimate guide to training LLM on large GPU Clusters
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 232