Please refer to the main model card.

This model page contains the Moshika (female voice) model weights for the MLX backend of the MoshiVis repo, in bfloat16, Q8, and Q4 formats. The same model weights are available for other backends and quantization formats in the associated model collection.

Model size: 8.31B params (Safetensors)

Tensor type: BF16
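As a rough guide when choosing between the bfloat16, Q8, and Q4 weights, the sketch below estimates the storage needed for the weights alone. It takes the 8.31B parameter count listed on this page at face value and assumes nominal widths of 16, 8, and 4 bits per weight. The Q8 and Q4 widths are assumptions: real quantized checkpoints usually add per-group scales and may keep some tensors in higher precision, so actual file sizes can differ. The names `BITS_PER_WEIGHT` and `estimate_gb` are hypothetical helpers for illustration only.

```python
# Back-of-the-envelope estimate of weight storage per format.
# Not a measurement of the actual checkpoint files.

PARAMS = 8.31e9  # parameter count as listed on this page

# Nominal bits per weight. The Q8 and Q4 values are assumptions and
# ignore quantization overhead such as per-group scales.
BITS_PER_WEIGHT = {"bfloat16": 16, "Q8": 8, "Q4": 4}


def estimate_gb(params: float, bits: int) -> float:
    """Approximate storage in gigabytes, with 1 GB = 1e9 bytes."""
    return params * bits / 8 / 1e9


estimates = {
    name: estimate_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB")
```

These figures cover the weights only; runtime memory also includes activations and caches, which this estimate does not model.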

Model tree for kyutai/moshika-vis-mlx

Base model: google/paligemma2-3b-pt-448 (this model is a finetune of it)

Collection including kyutai/moshika-vis-mlx

MoshiVis v0.1: MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs.