MoshiVis v0.1
Collection
MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs
โข
8 items
โข
Updated
โข
22
Please refer to the main model card
This model page contains the Moshika (female voice) model weights for the rust
backend of the MoshiVis repo,
in Q8
format. We provide the same model weights for other backends and quantization formats in the associated model collection.
8-bit
Base model
google/paligemma2-3b-pt-448