v1v1d/Arxiv_MD_v2_2k
A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR