nielsr HF Staff committed on
Commit 7650c9e · verified · 1 Parent(s): 0238781

Improve model card with pipeline tag and library name


This PR improves the model card by adding the `pipeline_tag` and `library_name` metadata. The `pipeline_tag` is set to `image-text-to-text` as the model processes both image and text data to generate text. The `library_name` is set to `transformers` based on the model's reliance on the Transformers library. This ensures the model is properly categorized on the Hugging Face Hub and allows users to easily discover it using relevant search filters.
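As a quick illustration of the discovery point, here is a minimal sketch using the `huggingface_hub` client. It assumes a recent `huggingface_hub` release where `list_models` accepts tag filters; parameter names and filter semantics can vary between versions, so treat this as a sketch rather than the canonical API.

```python
# Minimal sketch: list Hub models carrying the tags this PR adds to the model card.
# Assumes a recent huggingface_hub client; filter semantics may vary by version.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter=["image-text-to-text", "transformers"], limit=10):
    print(model.id)
```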

Files changed (1)
  1. README.md +70 -3
README.md CHANGED
@@ -1,3 +1,70 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
+
+ <p align="left">
+ <a href='https://jiwanchung.github.io/' target='_blank'>Jiwan Chung<sup>*</sup></a>&emsp;
+ <a href='https://junhyeok.kim/' target='_blank'>Junhyeok Kim<sup>*</sup></a>&emsp;
+ <a href='https://scholar.google.com/citations?user=w3hOuRoAAAAJ' target='_blank'>Siyeol Kim</a>&emsp;
+ <a href='https://jaeyoung-l.github.io/' target='_blank'>Jaeyoung Lee</a>&emsp;
+ <a href="https://scholar.google.com/citations?user=Og3gN_AAAAAJ" target='_blank'>Minsoo Kim</a>&emsp;
+ <a href='https://mirlab.yonsei.ac.kr/' target='_blank'>Youngjae Yu</a>
+ </p>
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2505.18842-b31b1b.svg)](https://arxiv.org/abs/2505.18842) [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-kjunh/v1--7B-FFD21E)](https://huggingface.co/kjunh/v1-7B)
+
+ <p align="center">
+ <img src="assets/figure.png">
+ </p>
+
+ ## Installation
+ ```bash
+ conda create -n v1 python=3.10 -y
+ conda activate v1
+ pip install -r requirements.txt
+ pip install flash-attn --no-build-isolation
+ ```
+
+ ## Demo
+
+ ### Gradio Web UI
+ Highly recommended, since the copy tokens are displayed directly on the image.
+
+ <p align="center">
+ <img src="assets/demo.png">
+ </p>
+
+ ```bash
+ python run_gradio.py
+ ```
+
+ ### Inference
+ ```bash
+ python inference.py
+ ```
+ The script uses a default image URL and text prompt. To use your own inputs, you can modify the `image` variable within the `messages` list and the `text` field for the user prompt.
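+
+ As a reference for where to make those edits, a minimal sketch of the common transformers chat-message layout for image-text-to-text models is shown below. The field names and default values here are assumptions, not a copy of `inference.py`; check the script for the exact structure it uses.
+
+ ```python
+ # Hedged sketch: typical `messages` layout for a multimodal chat prompt.
+ # The URL and prompt are placeholders; replace them with your own inputs.
+ image = "https://example.com/your_image.jpg"  # hypothetical URL; a local file path usually works too
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},                  # the `image` entry the README refers to
+             {"type": "text", "text": "Describe this image."},   # the `text` field for the user prompt
+         ],
+     },
+ ]
+ ```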
+
+ ## Coming Soon
+ - [x] Inference code
+ - [ ] Training data
+ - [ ] Evaluation code
+ - [ ] Training code
+
+ ## Citation
+ If you find our work valuable, please cite:
+ ```bibtex
+ @misc{chung2025dontlookoncemultimodal,
+       title={Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation},
+       author={Jiwan Chung and Junhyeok Kim and Siyeol Kim and Jaeyoung Lee and Min Soo Kim and Youngjae Yu},
+       year={2025},
+       eprint={2505.18842},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2505.18842},
+ }
+ ```