prithivMLmods/docscopeOCR-7B-050425-exp

Image-Text-to-Text

text-generation-inference

KIE-Key Information Extraction


prithivMLmods committed on May 4

Commit 29bb90f · verified · 1 Parent(s): c42642f

Update README.md

Files changed (1)

README.md +18 -1


@@ -23,4 +23,21 @@ pipeline_tag: image-text-to-text
 | **Precision**           | bfloat16                                            |
 > [!note]
-> The open dataset image-text response will be updated soon.
+> The open dataset image-text response will be updated soon.
+## References
+- **DocVLM: Make Your VLM an Efficient Reader**
+  [https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
+- **YaRN: Efficient Context Window Extension of Large Language Models**
+  [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+- **Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution**
+  [https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
+- **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
+  [https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
+- **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
+  [https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)