Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning Paper • 2402.13669 • Published Feb 21, 2024 • 1
LoRA Dropout as a Sparsity Regularizer for Overfitting Control Paper • 2404.09610 • Published Apr 15, 2024 • 1
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 41
Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs Paper • 2409.10033 • Published Sep 16, 2024 • 1
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs Paper • 2409.02257 • Published Sep 3, 2024 • 1
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? Paper • 2402.11597 • Published Feb 18, 2024 • 1
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models Paper • 2310.20410 • Published Oct 31, 2023 • 1
InFoBench: Evaluating Instruction Following Ability in Large Language Models Paper • 2401.03601 • Published Jan 7, 2024 • 7
Can Large Language Models Understand Real-World Complex Instructions? Paper • 2309.09150 • Published Sep 17, 2023 • 2
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation Paper • 2308.01240 • Published Aug 2, 2023 • 2
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models Paper • 2306.04757 • Published Jun 7, 2023 • 6