arxiv:1906.11645

RUSLAN: Russian Spoken Language Corpus for Speech Synthesis

Published on Jun 26, 2019

Authors:

Abstract

AI-generated summary: An end-to-end neural network trained on a large Russian spoken language corpus achieves high scores for naturalness and intelligibility in text-to-speech synthesis.

We present RUSLAN, a new open Russian spoken language corpus for the text-to-speech task. RUSLAN contains 22,200 audio samples with text annotations, amounting to more than 31 hours of high-quality speech from a single speaker, which makes it the largest annotated Russian corpus in terms of speech duration for one speaker. We trained an end-to-end neural network for the text-to-speech task on our corpus and evaluated the quality of the synthesized speech with a Mean Opinion Score (MOS) test. The synthesized speech achieves a score of 4.05 for naturalness and 3.78 for intelligibility on a 5-point MOS scale.
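The abstract reports quality as Mean Opinion Scores on a 5-point scale. As a minimal sketch, assuming the usual convention that MOS is the arithmetic mean of listener ratings from 1 to 5, the snippet below computes a MOS together with an approximate 95% confidence interval (normal approximation). The function name `mos_with_ci` and the ratings are hypothetical and for illustration only. They are not the paper's evaluation code or data, and the paper's exact aggregation procedure may differ.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Hypothetical helper: `ratings` are individual listener scores on the
    1-5 MOS scale. The interval uses a normal approximation, so z=1.96
    gives roughly 95% coverage.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 MOS scale")
    score = mean(ratings)  # MOS is the plain average of all ratings
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return score, half_width


# Hypothetical naturalness ratings, NOT data from the RUSLAN paper.
naturalness = [4, 5, 4, 3, 4, 5, 4, 4]
score, ci = mos_with_ci(naturalness)
print(f"MOS = {score:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean is a common design choice because a single MOS value from a small listener pool can hide considerable rating variance.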

