WhiStress Model
This is the official model checkpoint for WhiStress, introduced in our paper:
WhiStress: Enriching Transcriptions with Sentence Stress Detection (Interspeech 2025).
- Project Page: pages.cs.huji.ac.il/adiyoss-lab/whistress
- Code: github.com/slp-rl/WhiStress
- Dataset: slprl/TinyStress-15K
Overview
WhiStress extends OpenAI's Whisper ASR model with a decoder-based classifier that predicts token-level sentence stress. This allows models not only to transcribe speech but also to detect which words are emphasized.
This checkpoint is based on the whisper-small.en variant and adds two stress-specific modules:
- additional_decoder_block.pt
- classifier.pt
How to Use
You can use the weights in your own pipeline by cloning our codebase and loading the components:
git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt
Then, either download the weights manually from this Hugging Face repo or use our script:
python download_weights.py
The weights should be placed in the following directory structure:
whistress/
├── weights/
│   ├── additional_decoder_block.pt
│   ├── classifier.pt
│   └── metadata.json
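Before running inference, it can help to confirm that the files landed where the layout above expects them. The following is a minimal sketch using only the Python standard library. The helper name `missing_weight_files` is hypothetical and not part of the WhiStress codebase, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the directory layout above.
EXPECTED_FILES = ("additional_decoder_block.pt", "classifier.pt", "metadata.json")


def missing_weight_files(weights_dir="whistress/weights"):
    """Return the expected weight files that are absent from weights_dir.

    Hypothetical helper for illustration; not part of the WhiStress codebase.
    """
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All expected weight files are present.")
```

An empty list means all three files were found; anything else names the files still to be downloaded.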
Inference Example
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

# `sample` is assumed to be a dataset item whose 'audio' field holds the input audio.
pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample['audio'],  # (sr, np.ndarray)
    transcription=None,  # None: predict both transcription and stress from the audio; pass a transcription to predict stress only
    return_pairs=False,  # set to True to get a list of (word, binary_label) pairs
)

print(pred_transcription)  # e.g., "I didn't say she stole my money."
print(pred_stresses)  # e.g., ['my']
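To illustrate the `(word, binary_label)` format mentioned for `return_pairs=True`, here is a minimal sketch that builds such pairs from a transcription and a list of stressed words. The helper `to_word_label_pairs` is hypothetical and not part of the WhiStress API. It assumes whitespace word splitting and case-insensitive matching with surrounding punctuation ignored, so it marks every occurrence of a stressed word and may differ from the pairs the client actually returns.

```python
import string


def to_word_label_pairs(transcription, stressed_words):
    """Pair each whitespace-separated word with 1 (stressed) or 0 (unstressed).

    Hypothetical helper for illustration only; not part of the WhiStress API.
    """
    def normalize(word):
        # Ignore case and surrounding punctuation when comparing words.
        return word.strip(string.punctuation).lower()

    stressed = {normalize(w) for w in stressed_words}
    return [(word, int(normalize(word) in stressed)) for word in transcription.split()]


pairs = to_word_label_pairs("I didn't say she stole my money.", ["my"])
print(pairs)
```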
Each prediction includes:
- transcription: full text output
- emphasis_indices: list of stressed token indices
- emphasized_tokens: list of corresponding words
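As a small illustration of how these fields relate, the sketch below marks the stressed words in a transcription given a list of indices. It assumes the indices refer to whitespace-separated words, which is a simplification: the model predicts stress at the token level, so its indices may follow Whisper's subword tokenization instead. The function name `mark_emphasis` and the asterisk markup are hypothetical choices, not part of the WhiStress API.

```python
def mark_emphasis(transcription, emphasis_indices, marker="*"):
    """Wrap the words at emphasis_indices in marker characters.

    Assumes word-level indices over a whitespace split (illustration only;
    not the WhiStress API).
    """
    words = transcription.split()
    chosen = set(emphasis_indices)
    out_of_range = sorted(i for i in chosen if not 0 <= i < len(words))
    if out_of_range:
        raise IndexError(f"indices out of range: {out_of_range}")
    return " ".join(
        f"{marker}{word}{marker}" if i in chosen else word
        for i, word in enumerate(words)
    )


print(mark_emphasis("I didn't say she stole my money.", [5]))
# prints: I didn't say she stole *my* money.
```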
Notes
The model is intended for research purposes only.
Citation
If you use our model, please cite our work:
@misc{yosha2025whistress,
  title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection},
  author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
  year={2025},
  eprint={2505.19103},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19103},
}