LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 15
Aligning Multimodal LLM with Human Preference: A Survey Paper • 2503.14504 • Published Mar 18 • 20
NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering Paper • 2502.10868 • Published Feb 15 • 2
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 25 • 19
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10 • 1
ColQwen2 Models Collection Pre-trained checkpoints for the ColQwen2 model. • 4 items • Updated Jan 23 • 4
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 10 items • Updated 1 day ago • 412
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3 • 17
Question Answering on Patient Medical Records with Private Fine-Tuned LLMs Paper • 2501.13687 • Published Jan 23 • 9
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 99
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Paper • 2501.05122 • Published Jan 9 • 20