Add metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +140 -135
README.md CHANGED
@@ -1,135 +1,140 @@
1
- <div align="center">
2
- <img src="assets/logo.png" alt="JarvisArt Icon" width="100"/>
3
-
4
- # JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
5
- <!-- **JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent** -->
6
- <a href="https://arxiv.org/pdf/2506.17612"><img src="https://img.shields.io/badge/arXiv-2506.17612-b31b1b.svg" alt="Paper"></a>
7
- <a href="https://huggingface.co/papers/2506.17612"><img src="https://img.shields.io/badge/🤗-Daily%20Papers-ffbd00.svg" alt="Huggingface Daily Papers"></a>
8
- <a href="https://jarvisart.vercel.app/"><img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page"></a>
9
- <a href="https://www.youtube.com/watch?v=Ol28DQj8wV8"><img src="https://img.shields.io/badge/YouTube-Watch-red" alt="YouTube"></a>
10
- <a href="https://www.bilibili.com/video/BV1Sd3nzREvP/?spm_id_from=333.1007.top_right_bar_window_history.content.click&vd_source=3939804dc1d27869e194605ae46329ec"><img src="https://img.shields.io/badge/BiliBili-哔哩哔哩-FF69B4" alt="BiliBili"></a>
11
- <a href="https://x.com/ling_yunlong/status/1940010865627103419"><img src="https://img.shields.io/twitter/follow/LYL1015?style=social" alt="Twitter Follow"></a>
12
- <a href="https://github.com/LYL1015/JarvisArt"><img src="https://img.shields.io/github/stars/LYL1015/JarvisArt?style=social" alt="GitHub Stars"></a>
13
- </div>
14
-
15
- <div align="center">
16
- <p>
17
- <a href="https://lyl1015.github.io/">Yunlong Lin</a><sup>1*</sup>,
18
- <a href="https://github.com/iendi">Zixu Lin</a><sup>1*</sup>,
19
- <a href="https://github.com/kunjie-lin">Kunjie Lin</a><sup>1*</sup>,
20
- <a href="https://noyii.github.io/">Jinbin Bai</a><sup>5</sup>,
21
- <a href="https://paulpanwang.github.io/">Panwang Pan</a><sup>4</sup>,
22
- <a href="https://chenxinli001.github.io/">Chenxin Li</a><sup>3</sup>,
23
- <a href="https://haoyuchen.com/">Haoyu Chen</a><sup>2</sup>,
24
- <a href="https://zhongdao.github.io/">Zhongdao Wang</a><sup>6</sup>,
25
- <a href="https://scholar.google.com/citations?user=k5hVBfMAAAAJ&hl=zh-CN">Xinghao Ding</a><sup>1†</sup>,
26
- <a href="https://fenglinglwb.github.io/">Wenbo Li</a><sup>3♣</sup>,
27
- <a href="https://yanshuicheng.info/">Shuicheng Yan</a><sup>5†</sup>
28
- </p>
29
- </div>
30
-
31
- <div align="center">
32
- <p>
33
- <sup>1</sup>Xiamen University, <sup>2</sup>The Hong Kong University of Science and Technology (Guangzhou), <sup>3</sup> The Chinese University of Hong Kong, <sup>4</sup>Bytedance, <sup>5</sup>National University of Singapore, <sup>6</sup>Tsinghua University
34
- </p>
35
- <!-- <sup>*</sup>Equal Contributions <sup>♣</sup>Project Leader <sup>†</sup>Corresponding Author -->
36
- <!-- <p>Accepted by CVPR 2025</p> -->
37
- </div>
38
-
39
- ---
40
-
41
- ## ⚠️ Security Warning
42
-
43
- **IMPORTANT: This is the ONLY official JarvisArt repository!**
44
-
45
- We have identified **fake repositories** claiming to be JarvisArt that may contain **malware, viruses, or malicious code**. Please be extremely cautious and only use this official repository.
46
-
47
- **Known fake/malicious repositories:**
48
- - ❌ `https://github.com/joelp0/JarvisArt` - **FAKE & POTENTIALLY DANGEROUS**
49
- - ❌ Any other repositories not from our official organization
50
-
51
- <!-- --- -->
52
-
53
- <!-- ## 📮 Updates
54
-
55
- - **[Coming Soon]** 🚀 Gradio demo and Hugging Face demo will be released first.
56
- - **[Coming Soon]** 🎯 Training and inference code will be released.
57
- - **[2025.06]** 📄 Paper is now available on arXiv.
58
- - **[2025.06]** 🌐 Project page is live. -->
59
-
60
- ---
61
-
62
- ## 📝 Overview
63
-
64
- <div align="center">
65
- <img src="assets/teaser.jpg" alt="JarvisArt Teaser" width="800"/>
66
- <br>
67
- <em>JarvisArt workflow and results showcase</em>
68
- </div>
69
-
70
- JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It is designed to liberate human creativity by understanding user intent, mimicking the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom. JarvisArt utilizes a novel two-stage training framework, starting with Chain-of-Thought supervised fine-tuning for foundational reasoning, followed by Group Relative Policy Optimization for Retouching (GRPO-R) to enhance its decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and MMArt-Bench, JarvisArt demonstrates superior performance, outperforming GPT-4o with a 60% improvement in pixel-level metrics for content fidelity while maintaining comparable instruction-following capabilities.
71
-
72
- ---
73
-
74
- ## 🎬 Demo Videos
75
-
76
- <!-- <div align="center">
77
- <video width="800" controls>
78
- <source src="assets/demo.mp4" type="video/mp4">
79
- Your browser does not support the video tag.
80
- </video>
81
- <p>JarvisArt Demo Video: Showcasing intelligent photo retouching capabilities</p>
82
- </div> -->
83
-
84
- <!-- <div align="center">
85
- <img src="assets/demo1.gif" alt="JarvisArt Demo" width="800px">
86
- <p>JarvisArt Interactive Retouching Demonstration</p>
87
- </div>
88
-
89
- <div align="center">
90
- <img src="assets/demo2.gif" alt="JarvisArt Demo" width="800px">
91
- <p>JarvisArt Multimodal Instruction Understanding and Execution</p>
92
- </div> -->
93
- Global Retouching Case
94
- <div align="center">
95
- <img src="assets/global_demo1.gif" alt="JarvisArt Demo" width="800px">
96
- <p></p>
97
- </div>
98
-
99
- Local Retouching Case
100
- <div align="center">
101
- <img src="assets/local_demo1.gif" alt="JarvisArt Demo" width="800px">
102
- <p>JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes</p>
103
- </div>
104
-
105
- ## 📚 Citation
106
-
107
- If you find JarvisArt useful in your research, please consider citing:
108
-
109
- ```bibtex
110
- @article{jarvisart2025,
111
- title={JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent},
112
- author={Yunlong Lin and Zixu Lin and Kunjie Lin and Jinbin Bai and Panwang Pan and Chenxin Li and Haoyu Chen and Zhongdao Wang and Xinghao Ding and Wenbo Li and Shuicheng Yan},
113
- year={2025},
114
- journal={arXiv preprint arXiv:2506.17612}
115
- }
116
- ```
117
-
118
- ---
119
-
120
-
121
- ## 📧 Contact
122
-
123
- For any questions or inquiries, please reach out to us:
124
-
125
- - **Yunlong Lin**: [email protected]
126
- - **Zixu Lin**: [email protected]
127
- - **Kunjie Lin**: [email protected]
128
-
129
- ---
130
-
131
- ## 🙏 Acknowledgements
132
-
133
- We would like to express our gratitude to [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory.git) and [gradio_image_annotator](https://github.com/edgarGracia/gradio_image_annotator.git) for their valuable open-source contributions which have provided important technical references for our work.
134
-
135
-
 
 
 
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ pipeline_tag: image-text-to-text
4
+ ---
5
+
6
+ <div align="center">
7
+ <img src="assets/logo.png" alt="JarvisArt Icon" width="100"/>
8
+
9
+ # JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
10
+ <!-- **JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent** -->
11
+ <a href="https://arxiv.org/pdf/2506.17612"><img src="https://img.shields.io/badge/arXiv-2506.17612-b31b1b.svg" alt="Paper"></a>
12
+ <a href="https://huggingface.co/papers/2506.17612"><img src="https://img.shields.io/badge/🤗-Daily%20Papers-ffbd00.svg" alt="Huggingface Daily Papers"></a>
13
+ <a href="https://jarvisart.vercel.app/"><img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page"></a>
14
+ <a href="https://www.youtube.com/watch?v=Ol28DQj8wV8"><img src="https://img.shields.io/badge/YouTube-Watch-red" alt="YouTube"></a>
15
+ <a href="https://www.bilibili.com/video/BV1Sd3nzREvP/?spm_id_from=333.1007.top_right_bar_window_history.content.click&vd_source=3939804dc1d27869e194605ae46329ec"><img src="https://img.shields.io/badge/BiliBili-哔哩哔哩-FF69B4" alt="BiliBili"></a>
16
+ <a href="https://x.com/ling_yunlong/status/1940010865627103419"><img src="https://img.shields.io/twitter/follow/LYL1015?style=social" alt="Twitter Follow"></a>
17
+ <a href="https://github.com/LYL1015/JarvisArt"><img src="https://img.shields.io/github/stars/LYL1015/JarvisArt?style=social" alt="GitHub Stars"></a>
18
+ </div>
19
+
20
+ <div align="center">
21
+ <p>
22
+ <a href="https://lyl1015.github.io/">Yunlong Lin</a><sup>1*</sup>,
23
+ <a href="https://github.com/iendi">Zixu Lin</a><sup>1*</sup>,
24
+ <a href="https://github.com/kunjie-lin">Kunjie Lin</a><sup>1*</sup>,
25
+ <a href="https://noyii.github.io/">Jinbin Bai</a><sup>5</sup>,
26
+ <a href="https://paulpanwang.github.io/">Panwang Pan</a><sup>4</sup>,
27
+ <a href="https://chenxinli001.github.io/">Chenxin Li</a><sup>3</sup>,
28
+ <a href="https://haoyuchen.com/">Haoyu Chen</a><sup>2</sup>,
29
+ <a href="https://zhongdao.github.io/">Zhongdao Wang</a><sup>6</sup>,
30
+ <a href="https://scholar.google.com/citations?user=k5hVBfMAAAAJ&hl=zh-CN">Xinghao Ding</a><sup>1†</sup>,
31
+ <a href="https://fenglinglwb.github.io/">Wenbo Li</a><sup>3♣</sup>,
32
+ <a href="https://yanshuicheng.info/">Shuicheng Yan</a><sup>5†</sup>
33
+ </p>
34
+ </div>
35
+
36
+ <div align="center">
37
+ <p>
38
+ <sup>1</sup>Xiamen University, <sup>2</sup>The Hong Kong University of Science and Technology (Guangzhou), <sup>3</sup> The Chinese University of Hong Kong, <sup>4</sup>Bytedance, <sup>5</sup>National University of Singapore, <sup>6</sup>Tsinghua University
39
+ </p>
40
+ <!-- <sup>*</sup>Equal Contributions <sup>♣</sup>Project Leader <sup>†</sup>Corresponding Author -->
41
+ <!-- <p>Accepted by CVPR 2025</p> -->
42
+ </div>
43
+
44
+ ---
45
+
46
+ ## ⚠️ Security Warning
47
+
48
+ **IMPORTANT: This is the ONLY official JarvisArt repository!**
49
+
50
+ We have identified **fake repositories** claiming to be JarvisArt that may contain **malware, viruses, or malicious code**. Please be extremely cautious and only use this official repository.
51
+
52
+ **Known fake/malicious repositories:**
53
+ - ❌ `https://github.com/joelp0/JarvisArt` - **FAKE & POTENTIALLY DANGEROUS**
54
+ - ❌ Any other repositories not from our official organization
55
+
56
+ <!-- --- -->
57
+
58
+ <!-- ## 📮 Updates
59
+
60
+ - **[Coming Soon]** 🚀 Gradio demo and Hugging Face demo will be released first.
61
+ - **[Coming Soon]** 🎯 Training and inference code will be released.
62
+ - **[2025.06]** 📄 Paper is now available on arXiv.
63
+ - **[2025.06]** 🌐 Project page is live. -->
64
+
65
+ ---
66
+
67
+ ## 📝 Overview
68
+
69
+ <div align="center">
70
+ <img src="assets/teaser.jpg" alt="JarvisArt Teaser" width="800"/>
71
+ <br>
72
+ <em>JarvisArt workflow and results showcase</em>
73
+ </div>
74
+
75
+ JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It is designed to liberate human creativity by understanding user intent, mimicking the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom. JarvisArt utilizes a novel two-stage training framework, starting with Chain-of-Thought supervised fine-tuning for foundational reasoning, followed by Group Relative Policy Optimization for Retouching (GRPO-R) to enhance its decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and MMArt-Bench, JarvisArt demonstrates superior performance, outperforming GPT-4o with a 60% improvement in pixel-level metrics for content fidelity while maintaining comparable instruction-following capabilities.
76
+
77
+ ---
78
+
79
+ ## 🎬 Demo Videos
80
+
81
+ <!-- <div align="center">
82
+ <video width="800" controls>
83
+ <source src="assets/demo.mp4" type="video/mp4">
84
+ Your browser does not support the video tag.
85
+ </video>
86
+ <p>JarvisArt Demo Video: Showcasing intelligent photo retouching capabilities</p>
87
+ </div> -->
88
+
89
+ <!-- <div align="center">
90
+ <img src="assets/demo1.gif" alt="JarvisArt Demo" width="800px">
91
+ <p>JarvisArt Interactive Retouching Demonstration</p>
92
+ </div>
93
+
94
+ <div align="center">
95
+ <img src="assets/demo2.gif" alt="JarvisArt Demo" width="800px">
96
+ <p>JarvisArt Multimodal Instruction Understanding and Execution</p>
97
+ </div> -->
98
+ Global Retouching Case
99
+ <div align="center">
100
+ <img src="assets/global_demo1.gif" alt="JarvisArt Demo" width="800px">
101
+ <p></p>
102
+ </div>
103
+
104
+ Local Retouching Case
105
+ <div align="center">
106
+ <img src="assets/local_demo1.gif" alt="JarvisArt Demo" width="800px">
107
+ <p>JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes</p>
108
+ </div>
109
+
110
+ ## 📚 Citation
111
+
112
+ If you find JarvisArt useful in your research, please consider citing:
113
+
114
+ ```bibtex
115
+ @article{jarvisart2025,
116
+ title={JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent},
117
+ author={Yunlong Lin and Zixu Lin and Kunjie Lin and Jinbin Bai and Panwang Pan and Chenxin Li and Haoyu Chen and Zhongdao Wang and Xinghao Ding and Wenbo Li and Shuicheng Yan},
118
+ year={2025},
119
+ journal={arXiv preprint arXiv:2506.17612}
120
+ }
121
+ ```
122
+
123
+ ---
124
+
125
+
126
+ ## 📧 Contact
127
+
128
+ For any questions or inquiries, please reach out to us:
129
+
130
+ - **Yunlong Lin**: [email protected]
131
+ - **Zixu Lin**: [email protected]
132
+ - **Kunjie Lin**: [email protected]
133
+
134
+ ---
135
+
136
+ ## 🙏 Acknowledgements
137
+
138
+ We would like to express our gratitude to [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory.git) and [gradio_image_annotator](https://github.com/edgarGracia/gradio_image_annotator.git) for their valuable open-source contributions which have provided important technical references for our work.
139
+
140
+