Table of Contents

  1. GLaDOS voice model, trained on German Portal 1 and Portal 2 game files
    1. Model description
    2. Dataset & Training
      1. Build the training dataset
      2. Training
      3. Infer the final model

GLaDOS voice model, trained on German Portal 1 and Portal 2 game files

Model description

This model uses a checkpoint of the Thorsten high model as a base and was fine-tuned on voice lines taken directly from the Portal 1 and Portal 2 game files, in order to replicate the German GLaDOS voice for piper.

Training has been performed on an RTX 4000 for more than 3000 epochs.

You will find two voice models in this repo:

  • de_DE-glados-high.onnx and de_DE-glados-high.onnx.json: GLaDOS herself
  • de_DE-glados-turret-high.onnx and de_DE-glados-turret-high.onnx.json: Fine-tuned from the GLaDOS model above to sound like the turret voice.

Dataset & Training

I also added hints on how to build the training dataset, as well as the toolchain used for preparing and training the model, to this repo. The reasons are:

  • The training data is intellectual property of and copyrighted by Valve (I cannot include it here for obvious reasons)
  • Training a model for piper (as of early 2025) relies on old/outdated tools from 2021 and getting everything up and running can be super frustrating

Requirements

  • A PC with an nVidia GPU and the proprietary nVidia drivers, CUDA, Docker + Docker Compose as well as the nvidia-container-toolkit installed
  • Ideally use a Linux system (WSL is untested but might work)
  • Basic Linux and Python knowledge

Build the training dataset

  1. Extract the files from the game

    The training dataset has been extracted from the Portal 1 and Portal 2 game files. For legal reasons, they are not included in this repo, but you can easily extract them from the game files via VPKEdit.

    Portal 1:

    • Switch the game to the desired language (Here: German) via Steam
    • Navigate to <steam>/steamapps/common/Portal/portal and open portal_pak_dir.vpk with VPKEdit
    • Inside portal_pak_dir.vpk, navigate to sound/vo/aperture_ai and extract all *.wav files into the folder raw inside this git repo

    Portal 2:

    • Switch the game to the desired language (Here: German) via Steam
    • Navigate to <steam>/steamapps/common/portal 2. Select the subfolder matching the language (here portal2_german) and open pak01_dir.vpk via VPKEdit
    • Inside pak01_dir.vpk, navigate to sound/vo/glados and extract all *.wav files (but no subfolders) to the folder raw inside this git repo

    Portal 2 DLC 1:

    • Repeat the first two steps of Portal 2 above, but now select the portal2_dlc1_<your language> folder (if it exists); here, portal2_dlc1_german does exist. Open pak01_dir.vpk with VPKEdit
    • Repeat step 3 of Portal 2 above but copy the files to raw in this git repo
  2. Transcode the files

    We need to transcode the files: the Portal 1 files are 44.1 kHz WAV while the Portal 2 files are MP3. For training, we need 16-bit (LE) mono PCM WAV with one of the sample rates shown below, depending on the model quality we want to train.

    • x-low, low: 16000 Hz
    • medium, high: 22050 Hz

    NOTE: In principle, we could also train on 44100 Hz, however piper-train would then need to be modified for training and inference, as it only supports the sample rates listed above out of the box. Run the following command (ffmpeg needs to be installed):

    # Before running the script, first edit the sample rate you want
    ./0_transcode.sh
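
    The script essentially wraps ffmpeg. The loop below is only a sketch of the idea; the raw_transcoded output folder and the exact options are assumptions, not necessarily what 0_transcode.sh does:

    # Illustrative only: convert everything in raw/ to mono, 16-bit LE PCM WAV
    # at the chosen sample rate (16000 for x-low/low, 22050 for medium/high)
    SAMPLERATE=22050
    mkdir -p raw_transcoded
    for f in raw/*.wav raw/*.mp3; do
        [ -e "$f" ] || continue
        ffmpeg -y -i "$f" -ar "$SAMPLERATE" -ac 1 -c:a pcm_s16le \
            "raw_transcoded/$(basename "${f%.*}").wav"
    done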
    
  3. Sort by good/bad samples

    Now the annoying part: listen to all voice samples one by one and sort them into good (same voice style, no degradation in quality, no additional non-voice parts or mumbling, etc.) and bad (the opposite). I have written a helper script for this purpose: 1_sort_good_bad.py (read the comments in it). But hold your horses before taking on this annoying job, which can take several hours: I expect the quality of the voice lines to be similar across languages, so you can use my script 1_from_good.py, which uses the good.txt file to tag voice samples as good or bad based on the decisions I made while listening to the GLaDOS samples myself. Run the following command:

    ./1_from_good.py
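
    For orientation, a minimal sketch of what such a tagging step amounts to is shown below. It assumes good.txt lists one wav filename per line and that the transcoded files live in raw_transcoded; check 1_from_good.py for how it is actually done:

    # Illustrative only: copy the samples listed in good.txt to raw_good/
    mkdir -p raw_good
    while read -r name; do
        [ -n "$name" ] && cp "raw_transcoded/$name" raw_good/
    done < good.txt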
    
  4. Transcribe

    Now we need to transcribe the files. For this, we need faster-whisper. The easiest way to install and use it is via Docker. But before you do that, you should edit the file 2_transcribe.py and select the language and model you want to use.

    Run this to build the docker container(s)

    docker compose up --build -d
    docker exec -it transcribe bash
    

    You should now be in the transcribe docker container. Run

    ./2_transcribe.py
    

    This will yield a new file metadata.csv. Once transcription has finished, copy this file to raw_good.
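
    For reference, piper's preprocessing expects an LJSpeech-style metadata.csv: one line per sample, containing the audio file id (the wav filename, with or without extension depending on the preprocessing setup) and the transcription, separated by a pipe. The id and sentence below are invented for illustration:

    glados_escape_02_entry-00|Das war nur ein Scherz. Viel Glück.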

Training

For this, you should use the Docker container provided by this repo. But before you do that, you need to configure the following files:

  • 3_gen_traindata.sh: Edit the sample rate (16000 for x-low and low, 22050 for medium and high models) and the language code (en, de, ru, fr, ...)
  • 4_train.sh: Edit the QUALITY, BATCH_SIZE and PHONEME_MAX parameters to suit your training hardware. Also select the CHECKPOINT to start from: ideally you do not want to train from scratch but rather from an already existing checkpoint. Grab one from the piper people that fits the model quality (x-low, low, medium, high) and language you want to train, and copy it to checkpoints within this repo.

Now run the following within this repo (if you haven't already done so for transcription)

docker compose up --build -d
docker exec -it training bash

This will build and enter the training container and also expose the training metrics via TensorBoard at http://127.0.0.1:6006.

From inside the container, you now need to generate the training data for the training process

./3_gen_traindata.sh
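
Under the hood, this step boils down to piper's preprocessing module. A rough sketch of such an invocation follows; the paths and flag values are assumptions, so check 3_gen_traindata.sh for the real call:

# Illustrative only: preprocess the transcribed dataset for piper training
python3 -m piper_train.preprocess \
    --language de \
    --input-dir /path/to/raw_good \
    --output-dir /path/to/traindata \
    --dataset-format ljspeech \
    --single-speaker \
    --sample-rate 22050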

And now, you are ready for training. Simply run

./4_train.sh

inside the container.

If you need to stop and later resume training, you just have to point the CHECKPOINT variable in ./4_train.sh to the latest checkpoint.
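
For orientation, 4_train.sh essentially wraps piper's training module; the QUALITY, BATCH_SIZE and PHONEME_MAX parameters presumably map to flags like --quality, --batch-size and --max-phoneme-ids. A rough sketch based on piper's training documentation is shown below; the paths, values and exact set of flags are assumptions, not a copy of the script:

# Illustrative only: fine-tune from an existing checkpoint
python3 -m piper_train \
    --dataset-dir /path/to/traindata \
    --accelerator gpu \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 10000 \
    --resume_from_checkpoint checkpoints/some-base-model.ckpt \
    --checkpoint-epochs 1 \
    --precision 32 \
    --quality high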

Infer the final model

After training has finished (either the loss has flattened out or you hit the max epoch limit), you need to export the model to the ONNX format. First, edit 5_export.sh and set the model name as well as the checkpoint you want to export the model from (generally the last checkpoint written by 4_train.sh). Then, still inside the training docker container, run this command

./5_export.sh
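
For reference, the export step boils down to piper's ONNX export module plus copying the training config next to the model. A sketch with placeholder paths (check 5_export.sh for the actual commands):

# Illustrative only: export the checkpoint and place the voice config next to it
python3 -m piper_train.export_onnx \
    /path/to/checkpoint.ckpt \
    de_DE-glados-high.onnx
cp /path/to/traindata/config.json de_DE-glados-high.onnx.json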

This will generate a <model_name>.onnx and a <model_name>.onnx.json file. The latter needs to be adjusted: open it in a text editor and navigate to the line that reads

"dataset": "",

and replace "" with this model's name (here: "de_DE-glados-high")

"dataset": "de_DE-glados-high"

These two files can now be used by piper.
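
For example, assuming piper is installed and both files sit in the current directory (the sentence is just a placeholder):

# Synthesize a test phrase with the exported voice
echo 'Willkommen im Aperture Science Testlabor.' | \
    piper --model de_DE-glados-high.onnx --output_file test.wav

piper automatically picks up the matching de_DE-glados-high.onnx.json sitting next to the model.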
