WhiStress Model

This is the official model checkpoint for WhiStress, introduced in our paper:
WhiStress: Enriching Transcriptions with Sentence Stress Detection (Interspeech 2025).


Overview

WhiStress extends OpenAI's Whisper ASR model with a decoder-based classifier that predicts token-level sentence stress. This allows models not only to transcribe speech but also to detect which words are emphasized.

This checkpoint is based on the whisper-small.en variant and adds two stress-specific modules:

  • additional_decoder_block.pt
  • classifier.pt

🔧 How to Use

You can use the weights in your own pipeline by cloning our codebase and loading the components:

git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt

Then, either download the weights manually from this Hugging Face repo or use our script:

python download_weights.py
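For the manual route, the following is a minimal sketch that uses `hf_hub_download` from the `huggingface_hub` package. It assumes the repo id is `slprl/WhiStress` and that the three files named in this card's directory layout sit at the root of that repo. The `download_weights` helper is hypothetical and is not the project's `download_weights.py` script.

```python
from pathlib import Path
from typing import List

# Assumption: Hugging Face repo id for this checkpoint.
REPO_ID = "slprl/WhiStress"
# Assumption: these files live at the root of the repo.
WEIGHT_FILES = ("additional_decoder_block.pt", "classifier.pt", "metadata.json")


def download_weights(target_dir: str = "whistress/weights") -> List[Path]:
    """Download each weight file into target_dir and return the local paths."""
    # Imported here so this module still loads when huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for name in WEIGHT_FILES:
        local_path = hf_hub_download(
            repo_id=REPO_ID, filename=name, local_dir=target_dir
        )
        paths.append(Path(local_path))
    return paths
```

Calling `download_weights()` from the root of the cloned repository would place the files under `whistress/weights/`, if the assumptions above hold.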

The weights should be placed in the following directory structure:

whistress/
├── weights/
│   ├── additional_decoder_block.pt
│   ├── classifier.pt
│   └── metadata.json
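
Before running inference, it can help to confirm that the files are where the layout above expects them. The sketch below checks only for file presence and does not validate the contents. It assumes you run it from the root of the cloned repository, and `missing_weight_files` is a hypothetical helper, not part of the WhiStress codebase.

```python
from pathlib import Path
from typing import List

# File names taken from the directory layout above.
EXPECTED_FILES = ("additional_decoder_block.pt", "classifier.pt", "metadata.json")


def missing_weight_files(weights_dir: str = "whistress/weights") -> List[str]:
    """Return the expected weight files that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All weight files are in place.")
```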

πŸ—£οΈ Inference Example

from whistress import WhiStressInferenceClient


whistress_client = WhiStressInferenceClient(device="cuda") # or "cpu"

# `sample` is a placeholder for one dataset example with an 'audio' field; it is not defined here.
pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample['audio'],  # (sr, np.ndarray)
    transcription=None,  # None: predict both transcription and stress from the audio; pass a transcription to predict stress only
    return_pairs=False,  # set to True to get a list of (word, binary_label) pairs
)
print(pred_transcription)  # e.g., "I didn't say she stole my money."
print(pred_stresses) # e.g., ['my']

Each prediction includes:

  • transcription: full text output
  • emphasis_indices: list of stressed token indices
  • emphasized_tokens: list of corresponding words
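
With `return_pairs=True`, the client is described as returning (word, binary_label) pairs. The sketch below shows how the three fields listed above could be derived from such pairs, and how the stressed words could be marked inline. It assumes each pair is a `(str, int)` tuple with 1 meaning stressed. Both helper names are hypothetical, and the indices here are word positions, which may differ from the token-level indices the model uses internally.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, int]  # (word, binary_label); 1 is assumed to mean "stressed"


def summarize_pairs(pairs: Sequence[Pair]) -> Dict[str, object]:
    """Build transcription, stressed word positions, and stressed words."""
    words = [word for word, _ in pairs]
    indices = [i for i, (_, label) in enumerate(pairs) if label == 1]
    return {
        "transcription": " ".join(words),
        "emphasis_indices": indices,
        "emphasized_tokens": [words[i] for i in indices],
    }


def mark_stress(pairs: Sequence[Pair], marker: str = "*") -> str:
    """Render the sentence with stressed words wrapped in marker characters."""
    return " ".join(
        f"{marker}{word}{marker}" if label == 1 else word for word, label in pairs
    )


# Illustrative input, hand-written to match the example sentence above.
example_pairs: List[Pair] = [
    ("I", 0), ("didn't", 0), ("say", 0), ("she", 0),
    ("stole", 0), ("my", 1), ("money.", 0),
]
print(mark_stress(example_pairs))  # I didn't say she stole *my* money.
```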

Notes

The model is intended for research purposes only.

📜 Citation

If you use our model, please cite our work:

@misc{yosha2025whistress,
    title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection}, 
    author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
    year={2025},
    eprint={2505.19103},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.19103}, 
}