JayRay5/DIVE-Doc-ARD-HRes

JayRay5 committed 17 days ago

Commit 3da99b6 (verified) · Parent: 77a37f0

Update README.md

Files changed (1)

README.md +14 -27

@@ -10,7 +10,7 @@ DIVE-Doc is a VLM architecture built as a trade-off between end-to-end lightweig
 Without relying on external tools such as OCR, it processes the inputs in an end-to-end way.
 It takes an image document and a question as input and returns an answer. <br>
 - **Repository:** [GitHub](https://github.com/JayRay5/DIVE-Doc)
-- **Paper [optional]:** [More Information Needed]
+- **Paper:** [DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA](https://openaccess.thecvf.com/content/ICCV2025W/VisionDocs/html/Bencharef_DIVE-Doc_Downscaling_foundational_Image_Visual_Encoder_into_hierarchical_architecture_for_ICCVW_2025_paper.html)
 ## 2 Model Summary
@@ -59,32 +59,19 @@ This model can be finetuned on other DocVQA datasets such as [InfoGraphVQA](http
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+## Citation
 **BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
+```bibtex
+@inproceedings{Bencharef_2025_ICCV,
+    author    = {Bencharef, Rayane and Rahiche, Abderrahmane and Cheriet, Mohamed},
+    title     = {DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA},
+    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
+    month     = {October},
+    year      = {2025},
+    pages     = {7547-7556}
+}
+```
+## Contact
+[email protected]