Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published 23 days ago • 18
Page to MD Collection A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR text • 7 items • Updated Dec 13, 2024