Model Card for unphonemizer-gpt2

This model uses the GPT-2 architecture and was pretrained on a phonemized version of the distil-whisper/whisper_transcriptions dataset to convert phoneme sequences back into text ("un-phonemize" them).

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  • Developed by: [More Information Needed]
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: GPT-2 architecture (causal language model)
  • Language(s) (NLP): English
  • License: [More Information Needed]

Uses

Direct Use

Translates a sequence of phonemes into English text.

How to Get Started with the Model

To use the model, input the <s> token, then the phonemized text, then the separator |. The model then predicts the English text that follows the separator. For example, to un-phonemize ˈeɪthˈʌndɹɪd nˈaɪnti sˈɪks (896), give the model <s>ˈeɪthˈʌndɹɪd nˈaɪnti sˈɪks| and it will predict 896 after the |.

Use the code below to get started with the model.
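
A minimal sketch, assuming the checkpoint loads with the standard transformers AutoTokenizer and AutoModelForCausalLM classes and that its tokenizer handles the literal <s> and | in the prompt as described above. The helper names build_prompt and unphonemize, the greedy decoding, and the max_new_tokens value are illustrative choices; the repo id Ramora0/unphonemizer-gpt2 is the one shown on this model page.

```python
BOS = "<s>"       # start token described in this card
SEPARATOR = "|"   # separator between phonemes and English text


def build_prompt(phonemes: str) -> str:
    """Wrap phonemized text in the <s>{phonemes}| format."""
    return f"{BOS}{phonemes}{SEPARATOR}"


def unphonemize(
    phonemes: str,
    repo_id: str = "Ramora0/unphonemizer-gpt2",
    max_new_tokens: int = 32,  # illustrative limit, not from the card
) -> str:
    # Imported here so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Assumption: the prompt already carries <s>, so no extra special tokens are added.
    inputs = tokenizer(
        build_prompt(phonemes), return_tensors="pt", add_special_tokens=False
    )
    prompt_len = inputs["input_ids"].shape[1]

    # Greedy decoding; the model continues the sequence after the separator.
    output_ids = model.generate(
        inputs["input_ids"], max_new_tokens=max_new_tokens, do_sample=False
    )

    # Decode only the newly generated tokens, i.e. the predicted English text.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True).strip()


if __name__ == "__main__":
    print(unphonemize("ˈeɪthˈʌndɹɪd nˈaɪnti sˈɪks"))
```

Decoding only the tokens after the prompt avoids having to search the output for the separator, which could be dropped if the tokenizer treats it as a special token.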


Training Details

Training Data

[More Information Needed]

Training Procedure

A GPT-2 architecture model was trained directly on the data from the dataset. A custom loss function restricts training to the English side: tokens on the phoneme side contribute no loss, so the model is not trained to predict phonemes.
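
The card does not show the custom loss itself. One common way to get this behavior, sketched here as an assumption rather than the author's implementation, is to set the labels for every position up to and including the separator to -100, the index that PyTorch's cross-entropy loss ignores by default and that transformers causal language models also skip. The function name mask_phoneme_labels and the token ids in the example are hypothetical.

```python
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss by default


def mask_phoneme_labels(input_ids, separator_id):
    """Build labels that train only on the tokens after the first separator.

    Assumes the sequence layout <s> phonemes | english. Positions up to and
    including the separator get IGNORE_INDEX so they add no loss.
    """
    sep_pos = input_ids.index(separator_id)  # raises ValueError if missing
    labels = list(input_ids)
    for i in range(sep_pos + 1):
        labels[i] = IGNORE_INDEX
    return labels


# Hypothetical token ids: 1 = <s>, 5 = separator, 2 = end of sequence.
example_ids = [1, 17, 42, 9, 5, 301, 302, 2]
print(mask_phoneme_labels(example_ids, separator_id=5))
# prints [-100, -100, -100, -100, -100, 301, 302, 2]
```

Labels built this way can be passed alongside input_ids to a causal language model in transformers, which shifts them internally so that each position predicts the next token.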

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Model size: 111M params (Safetensors, tensor type F32)

Dataset used to train Ramora0/unphonemizer-gpt2: distil-whisper/whisper_transcriptions