

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.
Here is a list of 15 types of attention mechanisms used in AI models:
1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.
2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.
3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.
4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.
5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.
6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.
7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.
See other types in the comments.
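As a refresher on the core computation most of these variants build on, here is a minimal PyTorch sketch of scaled dot-product self-attention (item 3); multi-head attention (item 5) just runs several such projections in parallel and concatenates the results. The shapes and random weights below are illustrative placeholders, not taken from any particular model.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); w_*: (d_model, d_head) learned projections
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)                # soft attention: each row sums to 1
    return weights @ v                                 # weighted sum of the values

batch, seq_len, d_model, d_head = 2, 5, 16, 8
x = torch.randn(batch, seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 5, 8])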

Whenever I introduce myself, people often start speaking French to me, even though my French is très basic. It turns out that AI systems do something similar:
Large language models infer cultural identity from names, shaping their responses based on presumed backgrounds. But is this helpful personalization or a reinforcement of stereotypes?
In our latest paper, we explored this question by testing DeepSeek, Llama, Aya, Mistral-Nemo, and GPT-4o-mini on how they associate names with cultural identities. We analysed 900 names from 30 cultures and found strong assumptions baked into AI responses: some cultures were overrepresented, while others barely registered.
For example, a name like "Jun" often triggered Japan-related responses, while "Carlos" was linked primarily to Mexico, even though these names exist in multiple countries. Meanwhile, names from places like Ireland led to more generic answers, suggesting weaker associations in the training data.
This has real implications for AI fairness: How should AI systems personalize without stereotyping? Should they adapt at all based on a name?
Work with some of my favourite researchers: @sidicity, Arnav Arora, and @IAugenstein
Read the full paper here: Presumed Cultural Identity: How Names Shape LLM Responses (2502.11995)

Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.
Have you been using any integration and how can we make it better?
https://huggingface.co/blog/inference-providers
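In case it helps anyone trying it out, here's a rough sketch of calling a provider through huggingface_hub, based on my reading of the blog post above; the provider name, model id, and token are placeholders, and exact parameters may have changed since.

from huggingface_hub import InferenceClient

# Placeholder provider and model; the request is routed through Hugging Face to the chosen provider.
client = InferenceClient(provider="together", api_key="hf_xxx")
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What are inference providers?"}],
    max_tokens=256,
)
print(completion.choices[0].message.content)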

Try it here: https://huggingface.co/spaces/benjamin-paine/zonos-longform
Getting started with Zonos in Taproot is easy; with a working CUDA toolkit and Python/Pip installation, all you have to do is:
apt install espeak-ng   # Zonos relies on espeak-ng for phonemization
pip install taproot   # install the Taproot CLI
taproot install speech-synthesis:zonos-transformer   # fetch the Zonos task and its dependencies
taproot invoke speech-synthesis:zonos-transformer --text "Hello, world!"   # synthesize speech
See more on GitHub at https://github.com/painebenjamin/taproot/
Yup! That stays one chunk.
chunker.push("Last week she said, βHi there. How are you?β");
chunker.flush()
Emitting "Last week she said, βHi there. How are you?β"
The only exception is with newlines - I wanted it to emit when a newline was encountered.
chunker.push("Last week she said,\nβHi there. How are you?β");
chunker.flush()
Emitting "Last week she said,"
Emitting "βHi there. How are you?β"
If you want to disable this behavior, pass {emitParagraphs: false} to the constructor, i.e.:
const chunker = new SentenceChunker({emitParagraphs: false});
There's also chunkLength to set the maximum character length (128 by default), and emitTrimmed to control whether each emitted chunk has leading/trailing whitespace trimmed (default true). One last thing: if your input is always growing - like if you're streaming one response and just concatenating it as one big string - you can use GrowingSentenceChunker instead (in the same file). Example:
const chunker = new GrowingSentenceChunker();
chunker.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
chunker.push("Last week");
chunker.push("Last week she said");
chunker.push("Last week she said, βHi there. How are you?β");
chunker.flush()
Emitting "Last week she said, βHi there. How are you?β"
And just in case it's not obvious, the .flush() call will just emit anything left in the buffer, even if it's shorter than the maximum length. If you don't call .flush(), it will wait for another input that pushes it over the chunk limit before emitting again.
I spent a bit of time working on a JavaScript sentence splitter - it might work right out of the box for this purpose! It tries to split on punctuation when possible for smooth flow, but has a max length option to ensure run-on sentences still get split, too. It also maintains a buffer so you can just keep pushing streaming text into it and it will emit when it has a full chunk.
https://raw.githubusercontent.com/painebenjamin/anachrovox/refs/heads/main/www/sentence.js
Example:
const chunker = new SentenceChunker();
chunker.onChunk((sentenceChunk) => { console.log(`Emitting "${sentenceChunk}"`); });
chunker.push("The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.");
chunker.flush()
Output:
Emitting "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
Emitting "The best performing models also connect the encoder and decoder through an attention mechanism."
Emitting "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms,"
Emitting "dispensing with recurrence and convolutions entirely."
Emitting "Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."

Generate 10 seconds of speech in ~1 second for $0.
What will you build?
webml-community/kokoro-webgpu
The most difficult part was getting the model running in the first place, but the next steps are simple:
- Implement sentence splitting, allowing for streamed responses
- Multilingual support (only phonemization left)
Who wants to help?

pip install kokoro, and still 82M parameters.
GitHub: https://github.com/hexgrad/kokoro
PyPI: https://pypi.org/project/kokoro/
Space: hexgrad/Kokoro-TTS
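For anyone trying the pip package, here's a rough usage sketch based on my reading of the project's README; the KPipeline entry point, the lang_code value, and the 'af_heart' voice name are assumptions that may differ between package versions.

from kokoro import KPipeline   # assumed entry point per the project's README
import soundfile as sf

pipeline = KPipeline(lang_code='a')  # 'a' is assumed to select American English
text = "Kokoro is an 82 million parameter text-to-speech model."
for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice='af_heart')):
    sf.write(f'kokoro_{i}.wav', audio, 24000)  # Kokoro outputs 24 kHz audio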


We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more.
5-6 GB for HunyuanVideo - the sky is the limit.
https://huggingface.co/blog/video_gen
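For a sense of what those optimizations look like in practice, here's a hedged sketch using diffusers' HunyuanVideoPipeline; the repo id and settings are assumptions on my part, and the lowest-memory figures in the post also rely on quantization, which this sketch omits.

import torch
from diffusers import HunyuanVideoPipeline

# Assumed diffusers-format checkpoint; see the blog post for the exact configurations measured.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
)
pipe.vae.enable_tiling()          # decode video latents in tiles to reduce peak VRAM
pipe.enable_model_cpu_offload()   # keep only the active component on the GPU

video = pipe(prompt="A cat walks on the grass", num_frames=61, num_inference_steps=30).frames[0]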
Thanks for doing this! I've been all-in on llama.cpp for a while now, but I would be lying if I said I didn't wonder if I was missing out on anything with other engines.

You can follow my account on Bluesky for updates on Shining Valiant 3, other Valiant Labs models, my open-source datasets, etc: https://bsky.app/profile/sequelbox.bsky.social
back to building :)

Multimodal
- MiniCPM-o 2.6 is a new SOTA any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is a new video multimodal model by OpenGVLab; the family comes in 2B & 7B sizes and 224 & 448 resolutions
- ByteDance released a larger SA2VA variant with 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance
LLMs
- MiniMax-Text-01 is a new huge language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D adventures
- ReaderLM-v2 is a new HTML parsing model by Jina AI
- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released faster, more memory-efficient versions of Phi-4 and Llama 3.3
Vision
- MatchAnything is a new foundation model for matching
- FitDiT is a high-fidelity VTON (virtual try-on) model based on the DiT architecture
Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities
Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new SOTA small retrieval model by @jxm

Hello again @JLouisBiz !
I've updated the spaces; they now use Kokoro instead of XTTS. It's drastically faster. Additionally, since the TTS is so much faster, I felt comfortable extending the output to 1024 tokens.

Hello! It's currently clipped at 512 tokens for output, so yes it won't be suitable for very long generation. It's also a very tiny model - Llama 3.2 3B - so definitely more for conversation and less for completing tasks.
I'm going to try to swap in Kokoro TTS, which should be faster on these small machines. Thanks for taking the time to test.

I'm sorry that it's not working for you - can you make sure you've given it permission to use your microphone, and that you're using the correct one (if you have multiple)? There should be an icon in the corner (in Chrome) that you can click to select microphones and check levels. Whenever I've had trouble activating it, I've found I was using the wrong microphone or my voice volume was turned way down.
If you're using a browser other than Chrome, please let me know - I've tested it in others, but there could always be something I'm missing.

Regarding the indicators in the bottom right:
- If the "recording" light doesn't turn on (the top one), then it did not hear you utter a wake phrase.
- If the "listening" light does turn on, it detected voice activity, but unless you utter a wake phrase it will not send the recording for transcription and completion.
So in short, if you say "Hex Vox, what's the news?" and you don't see the recording light turn on, then it didn't catch the wake phrase and you have to try again.
If instead you just want to speak your command without relying on wake phrase recognition, you can just click the "Call" button - that will start recording immediately and always send the audio for transcription.
This project was the one that set me off on making the wake phrase model in the first place. At first I didn't have it and relied instead on voice activity detection and transcription; however, this performs extremely poorly in noisy environments or with any kind of muted speech, with near-constant accidental activation. The only efficient way to be always-on AND hands-free was to use a front-end wake-word model to gate the rest of the audio workflow.
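For anyone curious, here's a rough sketch of that gating idea; the detector functions are stand-in stubs of my own, not the actual Anachrovox implementation.

def detect_voice_activity(frame) -> bool:
    return frame.get("energy", 0.0) > 0.01             # stand-in for a real VAD model

def detect_wake_phrase(frames) -> bool:
    return any(f.get("wake", False) for f in frames)   # stand-in for the wake-word model

def transcribe(frames) -> str:
    return " ".join(f.get("text", "") for f in frames) # stand-in for speech-to-text

def audio_loop(mic_frames):
    # Only audio captured after a wake phrase is forwarded for transcription and completion.
    buffer, armed = [], False
    for frame in mic_frames:
        if detect_voice_activity(frame):               # the "listening" light
            buffer.append(frame)
            if not armed and detect_wake_phrase(buffer):
                armed = True                           # the "recording" light
        elif armed and buffer:                         # speech ended after a wake phrase
            print("Transcribing:", transcribe(buffer))
            buffer, armed = [], False
        else:
            buffer, armed = [], False                  # no wake phrase heard: drop the audio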

You're very welcome! Just so it's clear, the code is licensed under Apache, and the wake-word models are licensed under CC-BY-4.0 (to match the licenses of the audio they were trained on). More info on the models here: https://huggingface.co/benjamin-paine/anachrovox