Kyutai

non-profit

Verified

https://kyutai.org/

kyutai_labs

kyutai-labs

Recent Activity

ameroyer updated a collection 12 days ago: MoshiRAG Release

ameroyer updated a collection 12 days ago: Hibiki-Zero

gabrielatkyutail updated a model 17 days ago: kyutai/pocket-tts-without-voice-cloning

Papers

One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

kyutai's collections (10)

MoshiRAG Release

Candle & PyTorch model checkpoints released as part of the MoshiRAG release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi-rag

MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models
Paper • 2604.12928 • Published Apr 14

kyutai/moshika-rag-pytorch-bf16
Audio-to-Audio • Updated Apr 17 • 629 • 5

kyutai/moshika-rag-candle-bf16
Audio-to-Audio • Updated Apr 17 • 417 • 7

CASA

CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion on long-context streaming inputs

CASA Gallery
Running • Agents • 3
Video Gallery for CASA: Cross-Attention over Self-Attention

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
Paper • 2512.19535 • Published Dec 22, 2025 • 12

kyutai/CASA-Helium1-VL-2B
Image-Text-to-Text • 3B • Updated Mar 9 • 26 • 8

kyutai/CASA-Qwen2_5-VL-3B
Image-Text-to-Text • 4B • Updated Dec 23, 2025 • 176 • 2

Text-To-Speech

https://kyutai.org/next/tts

kyutai/pocket-tts
Updated 17 days ago • 6.4k • 630

kyutai/pocket-tts-without-voice-cloning
Updated 17 days ago • 7.68k • 24

kyutai/tts-1.6b-en_fr
Text-to-Speech • Updated Sep 11, 2025 • 150k • 376

kyutai/tts-voices
Updated Mar 9 • 154

Helium 1

Helium 1: a modular and multilingual LLM

kyutai/helium-1-2b
Text Generation • 2B • Updated Apr 30, 2025 • 11.9k • 54

kyutai/helium-1-2b-books
Text Generation • 2B • Updated Apr 30, 2025 • 9 • 1

kyutai/helium-1-2b-hum
Text Generation • 2B • Updated Apr 30, 2025 • 16

kyutai/helium-1-2b-life
Text Generation • 2B • Updated Apr 30, 2025 • 11 • 1

Hibiki fr-en

Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki.

Hibiki Samples
Running • 53
Translate speech in real-time with high fidelity

High-Fidelity Simultaneous Speech-To-Speech Translation
Paper • 2502.03382 • Published Feb 5, 2025 • 8

kyutai/hibiki-1b-mlx-bf16
Translation • Updated Feb 6, 2025 • 122 • 30

kyutai/hibiki-2b-mlx-bf16
Translation • Updated Feb 6, 2025 • 14 • 22

Hibiki-Zero

Streaming speech translation without the need for word-level alignments

Hibiki Zero Samples
Running • 12
Demo samples of the speech translation model Hibiki-Zero.

Simultaneous Speech-to-Speech Translation Without Aligned Data
Paper • 2602.11072 • Published Feb 11 • 1

kyutai/Audio-NTREX-4L
Viewer • Updated Feb 12 • 3.6k • 764 • 3

kyutai/hibiki-zero-3b-pytorch-bf16
Audio-to-Audio • Updated Feb 12 • 3.23k • 53

ARC-Encoders

Pretrained ARC-Encoders and a fine-tuning dataset: context compression for unmodified LLMs.

ARC-Encoder: learning compressed text representations for large language models
Paper • 2510.20535 • Published Oct 23, 2025 • 8

kyutai/ARC8_Encoder_Llama
Feature Extraction • Updated Nov 5, 2025 • 11 • 2

kyutai/ARC_finetuning
Preview • Updated Oct 24, 2025 • 31

kyutai/ARC8_Encoder_multi
Feature Extraction • Updated Nov 5, 2025 • 14 • 6

Speech-To-Text

https://kyutai.org/next/stt

kyutai/stt-2.6b-en
Automatic Speech Recognition • Updated Jun 26, 2025 • 122

kyutai/stt-1b-en_fr
Automatic Speech Recognition • Updated Nov 18, 2025 • 126

kyutai/stt-1b-en_fr-mlx
Automatic Speech Recognition • Updated Jun 19, 2025 • 5

kyutai/stt-2.6b-en-mlx
Automatic Speech Recognition • Updated Jun 19, 2025 • 8

MoshiVis v0.1

MoshiVis is a vision-speech model built as a perceptually augmented version of Moshi v0.1 for conversing about image inputs.

Vision-Speech Models: Teaching Speech Models to Converse about Images
Paper • 2503.15633 • Published Mar 19, 2025 • 2

kyutai/Babillage
Viewer • Updated Mar 21, 2025 • 465k • 522 • 13

kyutai/moshika-vis-pytorch-bf16
Updated Jun 18, 2025 • 58

kyutai/moshika-vis-candle-bf16
Updated Mar 18, 2025 • 1

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi

Moshi: a speech-text foundation model for real-time dialogue
Paper • 2410.00037 • Published Sep 17, 2024 • 16

kyutai/moshiko-pytorch-bf16
Updated Sep 18, 2024 • 147k • 240

kyutai/moshika-pytorch-bf16
Updated Sep 18, 2024 • 30.5k • 60

kyutai/mimi
Feature Extraction • 96.2M • Updated Jul 2, 2025 • 2M • 301

Team members 17
