HuggingFaceM4/Idefics3-8B-Llama3

Image-Text-to-Text

text2text-generation

Inference Endpoints

HugoLaurencon committed 28 days ago

Commit 605edaf • 1 Parent(s): a1b83a3

Update README.md

Files changed (1)

README.md +2 -2

@@ -37,7 +37,7 @@ We release the checkpoints under the Apache 2.0.
 ](https://huggingface.co/papers/2306.16527)
     - Idefics2 paper: [What matters when building vision-language models?
 ](https://huggingface.co/papers/2405.02246)
-    - Idefics3 paper: Coming soon (TODO)
+    - Idefics3 paper: [Building and better understanding vision-language models: insights and future directions](https://huggingface.co/papers/2408.12637)
 # Uses
@@ -65,7 +65,7 @@ Idefics3 demonstrates a great improvement over Idefics2, especially in document
 - We use 169 visual tokens to encode a image of size 364x364. Each image is divided into several sub images of sizes at most 364x364, which are then encoded separately.
 - For the fine-tuning datasets, we have extended [The Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) and added several datasets, including [Docmatix](HuggingFaceM4/Docmatix). We will push soon these datasets to the same repo of The Cauldron (TODO).
-More details about the training of the model will be available in our upcoming technical report (TODO).
+More details about the training of the model is available in our [technical report](https://huggingface.co/papers/2408.12637).
 # How to Get Started
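
Note on the context line about visual tokens: the README states that a 364x364 image is encoded with 169 visual tokens and that larger images are divided into sub-images of at most 364x364, each encoded separately. The sketch below is a rough way to estimate the resulting token count. It is not the model's actual preprocessing: it assumes a plain grid tiling with no prior resizing, and it does not count any extra global image or separator tokens the real processor may add. The function name `estimate_visual_tokens` is hypothetical.

```python
import math

TILE_SIDE = 364        # maximum sub-image side, as stated in the README
TOKENS_PER_TILE = 169  # visual tokens per 364x364 image, as stated in the README


def estimate_visual_tokens(width, height,
                           tile_side=TILE_SIDE,
                           tokens_per_tile=TOKENS_PER_TILE):
    """Estimate visual tokens for an image under a plain grid tiling.

    Assumption (not from the README): the image is cut into a
    ceil(width / tile_side) x ceil(height / tile_side) grid, and every
    sub-image costs the same number of tokens.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    cols = math.ceil(width / tile_side)
    rows = math.ceil(height / tile_side)
    return rows * cols * tokens_per_tile


if __name__ == "__main__":
    # A single 364x364 image fits in one tile.
    print(estimate_visual_tokens(364, 364))
    # A 1000x700 image needs a 3x2 grid under this assumption.
    print(estimate_visual_tokens(1000, 700))
```

Under these assumptions the estimate grows linearly with the number of sub-images, so high-resolution documents can consume a large share of the context window.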