Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published Feb 21 • 13
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 32
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs Paper • 2406.08657 • Published Jun 12, 2024 • 10
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation Paper • 2310.13165 • Published Oct 19, 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond Paper • 2309.16583 • Published Sep 28, 2023 • 12