I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback. Wanted to share it with the community and get feedback.
**What it does**: Scores English pronunciation at 4 levels of granularity (phoneme, word, sentence, and overall), each on a 0-100 scale. Returns IPA and ARPAbet notation for every phoneme.
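To make the four granularities concrete, here is a hypothetical sketch of what a per-utterance result could look like. The field names and nesting are illustrative assumptions, not the engine's actual schema:

```python
# Hypothetical result shape for the utterance "world" (illustrative only;
# field names are assumed, not the engine's real schema).
result = {
    "overall": 86,      # 0-100 aggregate score
    "sentence": 84,     # sentence-level score
    "words": [
        {
            "word": "world",
            "score": 78,  # word-level score
            "phonemes": [
                # each phoneme carries both IPA and ARPAbet notation
                {"ipa": "w",  "arpabet": "W",  "score": 91},
                {"ipa": "ɜː", "arpabet": "ER", "score": 64},
                {"ipa": "l",  "arpabet": "L",  "score": 82},
                {"ipa": "d",  "arpabet": "D",  "score": 75},
            ],
        },
    ],
}
print(result["words"][0]["phonemes"][1]["arpabet"])  # prints "ER"
```

The word score here is not simply the phoneme mean; in an ensemble setup like the one described below, each level typically gets its own regression head.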
**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at phone level (+4.5%) and sentence level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)
**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency: the entire pipeline runs in 17MB.
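For readers unfamiliar with GOP scoring, here is a minimal sketch of the classic formulation: for each phone segment produced by forced alignment, compare the log-posterior of the canonical phone against the best competing phone, averaged over the segment's frames. This is a generic illustration with random placeholder posteriors, not the engine's implementation:

```python
import numpy as np

def gop_score(log_posteriors: np.ndarray, canonical: int, frames: slice) -> float:
    """Goodness of Pronunciation for one aligned phone segment.

    log_posteriors: (T, P) frame-level phone log-posteriors from an
    acoustic model (placeholders here stand in for real CTC output).
    canonical: index of the expected (canonical) phone.
    frames: frame span assigned to this phone by forced alignment.
    """
    seg = log_posteriors[frames]
    # GOP = mean over frames of log P(canonical) - max over phones of log P(p).
    # Always <= 0; values near 0 suggest the canonical phone dominated.
    return float(np.mean(seg[:, canonical] - seg.max(axis=1)))

# Toy example: 10 frames over a 4-phone inventory, log-softmaxed.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 4))
log_post = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
score = gop_score(log_post, canonical=2, frames=slice(3, 7))
```

In a pipeline like the one above, raw GOP values per phone would then feed the MLP/XGBoost ensemble heads, which map them to calibrated 0-100 scores.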
The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.
**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.
Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level