Update README.md
README.md
CHANGED
|
@@ -10,7 +10,7 @@ DIVE-Doc is a VLM architecture built as a trade-off between end-to-end lightweig
|
|
| 10 |
Without relying on external tools such as OCR, it processes the inputs in an end-to-end way.
|
| 11 |
It takes an image document and a question as input and returns an answer. <br>
|
| 12 |
- **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
|
| 13 |
-
- **Paper
|
| 14 |
|
| 15 |
|
| 16 |
## 2 Model Summary
|
|
@@ -59,32 +59,19 @@ This model can be finetuned on other DocVQA datasets such as [InfoGraphVQA](http
|
|
| 59 |
|
| 60 |
|
| 61 |
|
| 62 |
-
## Citation
|
| 63 |
-
|
| 64 |
-
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
| 65 |
|
| 66 |
**BibTeX:**
|
| 67 |
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
## More Information [optional]
|
| 81 |
-
|
| 82 |
-
[More Information Needed]
|
| 83 |
-
|
| 84 |
-
## Model Card Authors [optional]
|
| 85 |
-
|
| 86 |
-
[More Information Needed]
|
| 87 |
-
|
| 88 |
-
## Model Card Contact
|
| 89 |
-
|
| 90 |
-
[More Information Needed]
|
|
|
|
| 10 |
Without relying on external tools such as OCR, it processes the inputs in an end-to-end way.
|
| 11 |
It takes an image document and a question as input and returns an answer. <br>
|
| 12 |
- **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
|
| 13 |
+
- **Paper:** [DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA](https://openaccess.thecvf.com/content/ICCV2025W/VisionDocs/html/Bencharef_DIVE-Doc_Downscaling_foundational_Image_Visual_Encoder_into_hierarchical_architecture_for_ICCVW_2025_paper.html)
|
| 14 |
|
| 15 |
|
| 16 |
## 2 Model Summary
|
|
|
|
| 59 |
|
| 60 |
|
| 61 |
|
| 62 |
+
## Citation
|
| 63 |
|
| 64 |
**BibTeX:**
|
| 65 |
|
| 66 |
+```bibtex
+@inproceedings{Bencharef_2025_ICCV,
+    author    = {Bencharef, Rayane and Rahiche, Abderrahmane and Cheriet, Mohamed},
+    title     = {DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA},
+    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
+    month     = {October},
+    year      = {2025},
+    pages     = {7547-7556}
+}
+```
|
| 76 |
+
## Contact
|
| 77 | |