PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language Paper • 2505.10055 • Published May 15
Transformer-based Spatial Grounding: A Comprehensive Survey Paper • 2507.12739 • Published Jul 17
Qwen2.5-VL Collection: Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21