Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ base_model:
 
 ### Model Description
 
-MoshiVis is a perceptually augmented version of Moshi, giving it the ability to freely discuss images whilst maintaining its natural conversation style and low latency.
+**MoshiVis** ([Project Page](https://kyutai.org/moshivis) | [arXiv](https://arxiv.org/abs/2503.15633)) is a perceptually augmented version of Moshi, giving it the ability to freely discuss images whilst maintaining its natural conversation style and low latency.
 To achieve this, Moshi has been extended with a visual backbone and a cross-attention mechanism to infuse the visual information into the language model.
 To train MoshiVis, we add a few parameters (~200M) on top of a frozen Moshi backbone (for the text/speech modeling aspect, ~7B params)
 and a PaliGemma2 vision encoder (for the image encoding part, ~400M parameters).
@@ -31,6 +31,8 @@ We provide the same model weights for other backends and quantization formats in
 
 ### Model Sources
 
+- **Project Page** [kyutai.org/moshivis](https://kyutai.org/moshivis)
+- **Preprint** ([arXiv/abs/2503.15633](https://arxiv.org/abs/2503.15633))
 - **Repository:** [Github kyutai-labs/moshivis](https://github.com/kyutai-labs/moshivis)
 - **Demo:** [Talk to Moshi](http://vis.moshi.chat)
 