Update README.md
README.md
@@ -23,4 +23,21 @@ pipeline_tag: image-text-to-text
 | **Precision** | bfloat16 |
 
 > [!note]
-> The open dataset image-text response will be updated soon.
+> The open dataset image-text response will be updated soon.
+
+## References
+
+- **DocVLM: Make Your VLM an Efficient Reader**
+[https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
+
+- **YaRN: Efficient Context Window Extension of Large Language Models**
+[https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+
+- **Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution**
+[https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
+
+- **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
+[https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
+
+- **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
+[https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)