	---
	tags:
	- model_hub_mixin
	- pytorch_model_hub_mixin
	license: openrail
	language:
	- en
	metrics:
	- accuracy
	base_model:
	- microsoft/wavlm-large
	pipeline_tag: audio-classification
	datasets:
	- edinburghcstr/edacc
	- mozilla-foundation/common_voice_11_0
	---
	# WavLM-Large for Broader Accent Classification

	# Model Description
	This model implements the broader accent classifier described in Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits (https://arxiv.org/pdf/2505.14648).

	The included English accents are: ['British Isles', 'North America', 'Other']

	- Library: https://github.com/tiantiaf0627/vox-profile-release


	# How to use this model

	## Download repo
	```
	git clone git@github.com:tiantiaf0627/vox-profile-release.git
	```
	## Install the package
	```
	conda create -n vox_profile python=3.8
	conda activate vox_profile
	cd vox-profile-release
	pip install -e .
	```

	## Load the model
	```python
	# Load libraries
	import torch
	import torch.nn.functional as F
	from src.model.accent.wavlm_accent import WavLMWrapper

	# Find device
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	# Load model from Huggingface
	model = WavLMWrapper.from_pretrained("tiantiaf/wavlm-large-broader-accent").to(device)
	model.eval()
	```
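The prediction step expects 16 kHz mono audio. The following is a minimal sketch of one way to read such a file using only the Python standard library. The helper name `read_mono_16k_wav` is hypothetical and not part of the vox-profile library, and scaling 16-bit samples to floats in [-1, 1] is an assumption about the expected input range, not something this model card states.

```python
import array
import sys
import wave

TARGET_SAMPLE_RATE = 16000  # the model card asks for 16 kHz mono input


def read_mono_16k_wav(path):
    """Read a 16-bit PCM WAV file and return its samples as floats in [-1, 1].

    Hypothetical helper: raises ValueError if the file is not 16 kHz, mono,
    16-bit PCM, instead of silently resampling or downmixing.
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getframerate() != TARGET_SAMPLE_RATE:
            raise ValueError(f"expected 16000 Hz, got {wav_file.getframerate()} Hz")
        if wav_file.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav_file.getnchannels()} channels")
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = wav_file.readframes(wav_file.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Assumption: the model takes floats scaled to [-1, 1]
    return [sample / 32768.0 for sample in samples]
```

The result can then be turned into a `[1, num_samples]` tensor, for example `data = torch.tensor(read_mono_16k_wav("speech.wav")).unsqueeze(0).to(device)`, where `speech.wav` is a placeholder path. Files at other sample rates or with several channels need to be resampled and downmixed first.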

	## Prediction
	```python
	# Label List
	english_accent_list = [
	'British Isles', 'North America', 'Other'
	]

	# Load data. Zeros are used here as a placeholder; real audio should be 16 kHz mono with shape [1, num_samples]
	data = torch.zeros([1, 16000]).float().to(device)
	# Disable gradient tracking for inference
	with torch.no_grad():
	    logits, embeddings = model(data, return_feature=True)

	# Probability and output (argmax over the class dimension, one label per clip)
	accent_prob = F.softmax(logits, dim=1)
	for idx in torch.argmax(accent_prob, dim=1).cpu().tolist():
	    print(english_accent_list[idx])
	```
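To report a probability for every accent group instead of only the top label, the logits can be paired with the label list. This is a minimal sketch in plain Python; the function name `rank_accents` is hypothetical, and it assumes each row of `logits` holds one score per label in the same order as the label list above.

```python
import math

ENGLISH_ACCENT_LIST = ["British Isles", "North America", "Other"]


def rank_accents(logits_row, labels=ENGLISH_ACCENT_LIST):
    """Return (label, probability) pairs sorted from most to least likely.

    Hypothetical helper: `logits_row` is a plain list with one logit per label.
    """
    if len(logits_row) != len(labels):
        raise ValueError("expected one logit per label")
    # Numerically stable softmax: subtract the max before exponentiating
    peak = max(logits_row)
    exps = [math.exp(value - peak) for value in logits_row]
    total = sum(exps)
    probs = [value / total for value in exps]
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
```

For a single clip this could be called as `rank_accents(logits[0].tolist())`. Looking at the full distribution helps flag low-confidence predictions, for instance when the top two probabilities are close.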

	## If you have any questions, please contact: Tiantian Feng ([email protected])

	## Kindly cite our paper if you are using our model or find it useful in your work
	```
	@article{feng2025vox,
	title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
	author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
	journal={arXiv preprint arXiv:2505.14648},
	year={2025}
	}
	```
	Responsible use of the Model: the Model is released under the Open RAIL license. Users should respect the privacy and consent of data subjects and comply with the relevant laws and regulations in their jurisdictions when using the Model.

	❌ Out-of-Scope Use
	- Clinical or diagnostic applications
	- Surveillance
	- Privacy-invasive applications
	- Commercial use