Hugging Face
Mayowa Daniel
mayowadan
0 followers · 1 following
maydan2a
danieloi
AI & ML interests
None yet
Recent Activity
reacted to hexgrad's post with ➕ 13 days ago
IMHO, being able & willing to defeat CAPTCHA, hCaptcha, or any other reasoning puzzle is a must-have for any Web-Browsing / Computer-Using Agent (WB/CUA). I realize it subverts the purpose of CAPTCHA, but I do not think you can claim to be building AGI/agents without smoothly passing humanity checks. It would be like getting in a self-driving car that requires human intervention over speed bumps. Claiming AGI or even "somewhat powerful AI" seems hollow if you are halted by a mere CAPTCHA. I imagine OpenAI's Operator is *able* but *not willing* to defeat CAPTCHA. Like their non-profit status, I expect that policy to evolve over time—and if not, rival agent-builders will attack that opening to offer a better product.
reacted to hexgrad's post with 👍 13 days ago
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit. Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio. Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherrypick high fidelity audio for training data, and deliver solid speech for those voices. In turn, this excellent audio quality & lack of background noise helps explain why Kokoro is very competitive in single-voice TTS Arenas.
reacted to hexgrad's post with ❤️ 13 days ago
To Meta AI Research: I would like to fold https://huggingface.co/datasets/ylacombe/expresso into the training mix of an Apache TTS model series. Can you relax the Expresso dataset license to CC-BY or more permissive? Barring that, can I have an individual exception to train on the materials and distribute trained Apache models, without direct redistribution of the original files? Thanks! CC (Expresso paper authors whose handles I could find on HF) @wnhsu @adavirro @bowenshi @itaigat @TalRemez @JadeCopet @hassid @felixkreuk @adiyoss @edupoux
Organizations
None yet
mayowadan's activity
upvoted 2 articles 18 days ago
Article: Upgrading Kokoro: natural TTS for short bursts
By hexgrad • Nov 22, 2024 • 29
Article: G2P Shrinks Speech Models
By hexgrad • Feb 5 • 61