Update README.md
README.md CHANGED
@@ -16,12 +16,13 @@ language:
 - en
 
 ---
-This is the repo for the paper [PromptCap: Prompt-Guided Image Captioning
+This is the repo for the paper [PromptCap: Prompt-Guided Task-Aware Image Captioning](https://arxiv.org/abs/2211.09699)
 
 We introduce PromptCap, a captioning model that can be controlled by a natural language instruction. The instruction may contain a question that the user is interested in,
 for example, "what is the boy putting on?". PromptCap also supports generic captions, using the question "what does the image describe?"
 
-PromptCap can
+PromptCap can serve as a light-weight visual plug-in (much faster than BLIP-2) for LLMs like GPT-3 and ChatGPT, and for other foundation models like Segment Anything and DINO.
+It achieves SOTA performance on COCO captioning (150 CIDEr).
 When paired with GPT-3 and conditioned on the user question, PromptCap gets SOTA performance on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).
 
 # QuickStart
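As a rough illustration of the prompt-guided interface the new text describes, a QuickStart call might look like the sketch below. The `promptcap` package name and the `tifa-benchmark/promptcap-coco-vqa` checkpoint id are assumptions for illustration; neither appears in this diff, so check the repo's actual QuickStart for the exact names.

```python
# Minimal PromptCap usage sketch (illustrative, not taken from this diff).
# The package name `promptcap` and the checkpoint id
# "tifa-benchmark/promptcap-coco-vqa" are assumptions.
import torch
from promptcap import PromptCap

model = PromptCap("tifa-benchmark/promptcap-coco-vqa")
if torch.cuda.is_available():
    model.cuda()

# The instruction embeds the question the user is interested in.
prompt = ("please describe this image according to the given question: "
          "what piece of clothing is this boy putting on?")
print(model.caption(prompt, "glove_boy.jpeg"))  # path to a local image
```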
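The GPT-3 pairing claimed above works by feeding PromptCap's question-aware caption into a text-only QA prompt. Here is a hedged sketch of that pipeline, assuming the legacy OpenAI completions API that was current when the paper was written and an illustrative caption string; the exact prompt template and model are not shown in this diff.

```python
# Sketch of the PromptCap + GPT-3 VQA pipeline (illustrative assumptions
# throughout): the question-aware caption replaces the image, so a text-only
# LLM can answer the visual question.
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

question = "what piece of clothing is this boy putting on?"
caption = "a young boy is putting on a baseball glove"  # PromptCap output (illustrative)

qa_prompt = (
    "Please answer the question according to the context.\n"
    f"Context: {caption}\n"
    f"Question: {question}\n"
    "Answer:"
)

response = openai.Completion.create(
    model="text-davinci-003",  # legacy completions API, pre-1.0 openai package
    prompt=qa_prompt,
    max_tokens=10,
)
print(response["choices"][0]["text"].strip())
```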