Norwegian Chatterbox TTS Model Card

Model Overview

Name: Norwegian Chatterbox TTS (finetuned)
Architecture: Based on the Chatterbox text-to-speech architecture, adapted and fine-tuned for Norwegian language support.
Framework: 🤗 Transformers & 🦾 TorchAudio
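License: Dual license (non-commercial personal and educational use; separate paid license for commercial use). See the License section.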

This model generates high-quality, natural-sounding Norwegian speech from input text. It is a fine-tuned version of ResembleAI/chatterbox and is intended for voice assistants, audiobooks, notifications, and accessibility applications requiring Norwegian language support.

Intended Use

Primary: Text-to-speech synthesis for Norwegian (Bokmål / Nynorsk), with support for:
- Emotional expressiveness
- Norwegian dialects
Examples:
- Virtual assistants and chatbots
- E‑learning platforms
- Audiobook narration
- In‑car infotainment systems

Training Data

Base Model: Pretrained Chatterbox TTS, trained on a multilingual corpus.
Fine‑Tuning Data:
- Roughly 6,000 hours of audio recordings with transcriptions, of varying quality, covering many dialects in addition to Bokmål and Nynorsk.

Samples

Input text (Norwegian):
Tilgi dem ikke; de vet hva de gjør! De puster på hatets og ondskapens glør! De liker å drepe, de frydes ved jammer, de ønsker å se vår verden i flammer! De ønsker å drukne oss alle i blod! Tror du det ikke? Du vet det jo!

Settings	Generated audio clip
Male voice (English speaker) Ex: 0.5, CFG: 0.5, Temp: 0.5
Male voice (English speaker) Ex: 0.8, CFG: 0.5, Temp: 0.5
Male voice (English speaker) Ex: 1.2, CFG: 0.5, Temp: 0.7
Female voice (English speaker) Ex: 0.5, CFG: 0.5, Temp: 0.7
Female voice (English speaker) Ex: 0.5, CFG: 0.5, Temp: 0.4
Female voice (English speaker) Ex: 0.5, CFG: 0.5, Temp: 0.7
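
The abbreviations in the settings above are not defined in this card. A reasonable reading, based on the keyword arguments used in the Usage section, is that Ex, CFG and Temp correspond to exaggeration, cfg_weight and temperature. The sketch below rests on that assumption; parse_settings is a hypothetical helper, not part of chatterbox-tts.

```python
import re

# Assumed mapping from the table's abbreviations to keyword arguments of
# model.generate (inferred from the Usage section, not stated in the table).
ABBREVIATIONS = {"Ex": "exaggeration", "CFG": "cfg_weight", "Temp": "temperature"}


def parse_settings(row: str) -> dict[str, float]:
    """Turn a row such as 'Ex: 0.5, CFG: 0.5, Temp: 0.5' into keyword arguments."""
    kwargs = {}
    for name, value in re.findall(r"(\w+):\s*([0-9.]+)", row):
        if name in ABBREVIATIONS:
            kwargs[ABBREVIATIONS[name]] = float(value)
    return kwargs


sample_kwargs = parse_settings("Male voice (English speaker) Ex: 0.8, CFG: 0.5, Temp: 0.5")
# With a loaded model and a reference clip, the row could then be reproduced as:
# wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH, **sample_kwargs)
```

Keeping the settings as a dictionary makes it easy to loop over several rows and compare the resulting clips.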

Known limitations

The model does not handle longer text inputs
The model only supports Norwegian
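
Until longer inputs are supported, one possible workaround is to split the text at sentence boundaries, synthesize each piece separately, and concatenate the audio. This is a minimal sketch, not a tested recipe for this model: split_into_chunks and synthesize_long are hypothetical helpers, the 200-character limit is an arbitrary placeholder, and the concatenation assumes model.generate returns a tensor whose last dimension is time.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(model, text: str, **generate_kwargs):
    """Synthesize each chunk separately and join the waveforms along time."""
    import torch  # imported lazily so the splitter itself works without PyTorch

    wavs = [model.generate(chunk, **generate_kwargs) for chunk in split_into_chunks(text)]
    return torch.cat(wavs, dim=-1)
```

Chunk boundaries may produce audible breaks in prosody, so the chunk size is worth tuning by ear.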

Roadmap

Make the model support longer text inputs

Installation

pip install chatterbox-tts

Usage

from pathlib import Path

import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
from huggingface_hub import hf_hub_download

REPO_ID = "akhbar/chatterbox-tts-norwegian"

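# Download every checkpoint file; they all end up in the same cached snapshot directory.
for fpath in ["ve.safetensors", "t3_cfg.safetensors", "s3gen.safetensors", "tokenizer.json", "conds.pt"]:
    local_path = hf_hub_download(repo_id=REPO_ID, filename=fpath)

# Load the model from the directory that holds the downloaded files.
model = ChatterboxTTS.from_local(Path(local_path).parent, device="cuda")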

# Adjacent string literals are concatenated, so the first one ends with a space.
text = (
    "Det beste er godt nok, bare man kan få det. "
    "Dette ordtaket understreker at selv om man søker det beste, så kan det være vanskelig å oppnå, og at det man får kan være godt nok."
)
wav = model.generate(text, exaggeration=1.0, cfg_weight=0.5, temperature=0.4)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "<voice_to_clone.wav>"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("ordtak.wav", wav, model.sr)

See example_tts.py for more examples.
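
The usage example above hard-codes device="cuda", which fails on machines without a CUDA GPU. A small fallback could look like the sketch below. pick_device is a hypothetical helper, and whether CPU inference is fast enough for this model is not stated in this card.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # chatterbox-tts needs PyTorch anyway; this branch only keeps the
        # helper importable in environments where it is missing.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# The model would then be loaded as in the usage example:
# model = ChatterboxTTS.from_local(Path(local_path).parent, device=device)
```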

Acknowledgements

ResembleAI

License

This project is offered under a dual‑license model:

Non‑Commercial Personal & Educational Use
- License: NonCommercial‑Personal‑Educational 1.0.0
- Applies to: Individuals and educational institutions only
- Permissions:
  - Run and modify the model for your own private or classroom use
- Restrictions:
  - No redistribution of the model or derivatives
  - No commercial use (including internal business use or paid services)
- See LICENSE.txt for full terms.
Commercial Use
- A separate, paid commercial license is required for any use of the model (or derivatives) in products, services, internal business processes, or any scenario involving monetary compensation or business advantage.
- To license this model for commercial purposes, please contact the author.

Citation

@misc{akhbar2025norwegianchatterbox,
  title={Chatterbox TTS Norwegian},
  author={Alexander Vaagan},
  year={2025},
  howpublished={\url{https://huggingface.co/akhbar/chatterbox-tts-norwegian}}
}
