BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Paper • 2506.00482 • Published 9 days ago • 8
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Paper • 2506.00482 • Published 9 days ago • 8
Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Paper • 2412.17759 • Published Dec 23, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 98
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use Paper • 2505.17332 • Published 17 days ago • 31
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding Paper • 2505.17330 • Published 17 days ago • 22
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding Paper • 2505.17330 • Published 17 days ago • 22 • 2
Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems Paper • 2505.18366 • Published 16 days ago • 25
Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems Paper • 2505.18366 • Published 16 days ago • 25 • 2
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use Paper • 2505.17332 • Published 17 days ago • 31
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 98
MVTamperBench: Evaluating Robustness of Vision-Language Models Paper • 2412.19794 • Published Dec 27, 2024 • 2