Commit 29bb90f (verified) by prithivMLmods
1 Parent(s): c42642f

Update README.md

Files changed (1): README.md (+18 -1)
@@ -23,4 +23,21 @@ pipeline_tag: image-text-to-text
  | **Precision** | bfloat16 |
 
  > [!note]
- > The open dataset image-text response will be updated soon.
+ > The open dataset image-text response will be updated soon.
+
+ ## References
+
+ - **DocVLM: Make Your VLM an Efficient Reader**
+ [https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
+
+ - **YaRN: Efficient Context Window Extension of Large Language Models**
+ [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+
+ - **Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution**
+ [https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
+
+ - **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
+ [https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
+
+ - **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
+ [https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)