You can find two examples here:


The model itself has a maximum context length, so you can't feed everything through the model at once, unfortunately. To solve this, I implemented streaming in v1.2.0, which you can use as follows:
import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  // device: "webgpu", // Options: "wasm", "webgpu" (web) or "cpu" (node).
});

const text = "Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects. It can even run 100% locally in your browser, powered by Transformers.js!";
const stream = tts.stream(text);

let i = 0;
for await (const { text, phonemes, audio } of stream) {
  console.log({ text, phonemes });
  audio.save(`audio-${i++}.wav`);
}

This is great! Does it work for nested cases too? For example,
Last week she said, "Hi there. How are you?"
should remain a single chunk.

Generate 10 seconds of speech in ~1 second for $0.
What will you build?
webml-community/kokoro-webgpu
The most difficult part was getting the model running in the first place, but the next steps are simple:
- Implement sentence splitting, allowing for streamed responses (rough sketch below)
- Multilingual support (only phonemization left)
Who wants to help?
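
For illustration, here's one way sentence splitting could avoid breaking inside quotation marks, so nested cases stay in a single chunk. This is a hypothetical sketch, not the splitter kokoro-js actually ships with:

// Naive quote-aware sentence splitter (illustrative only).
function splitSentences(text) {
  const chunks = [];
  let current = "";
  let inQuotes = false;
  let prev = "";
  for (const char of text) {
    current += char;
    if (char === '"') {
      inQuotes = !inQuotes;
      // A closing quote right after sentence-final punctuation also ends a chunk
      if (!inQuotes && /[.!?]/.test(prev)) {
        chunks.push(current.trim());
        current = "";
      }
    } else if (!inQuotes && /[.!?]/.test(char)) {
      // Split on sentence-final punctuation, but only outside quotes
      chunks.push(current.trim());
      current = "";
    }
    prev = char;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

splitSentences('Last week she said, "Hi there. How are you?" Then she left.');
// → ['Last week she said, "Hi there. How are you?"', 'Then she left.']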

Hi there - we recently fixed this issue and will release a new version for it soon!

Hey! Oh that's awesome - great work! Feel free to adapt any code/logic of mine as you'd like!

npm i kokoro-js
Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX
You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // Options: "fp32", "fp16", "q8", "q4", "q4f16"
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  voice: "af_sky", // See `tts.list_voices()`
});
audio.save("audio.wav");
Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all!
The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality!

If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.
What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc.
Then YOU can contribute to the training mix and get useful artifacts in return.
More details at hexgrad/Kokoro-82M#21

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings!
Try it out yourself!
webml-community/attention-visualization
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization

For this demo, ~150MB if using WebGPU and ~120MB if using WASM.

- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (w/ WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
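
If you just want the model in your own app rather than the demo, the standard Transformers.js pipeline API should cover it. A minimal sketch (the model id and audio URL are illustrative; check the demo's source for the exact checkpoint it loads):

import { pipeline } from "@huggingface/transformers";

// Load Moonshine for speech recognition (model id is an assumption)
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/moonshine-base-ONNX",
  { device: "webgpu" }, // or "wasm"
);

// Transcribe an audio file (example clip from the Transformers.js docs)
const output = await transcriber("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav");
console.log(output.text);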

Sharing these new additions with the links in case it's helpful:
- @wendys-llc 's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan 's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova 's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets
There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!
You can also add your favorite ones if you're part of the community!
Check it out:

#AIforJournalism #HuggingFace #OpenSourceAI

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)

- Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
- Qwen2-VL from Qwen for dynamic-resolution image understanding
- JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
- LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
- ViTPose for pose estimation
- MGP-STR for optical character recognition (OCR)
- PatchTST & PatchTSMixer for time series forecasting
That's right, everything running 100% locally in your browser (no data sent to a server)! Huge for privacy!
Check out the release notes for more information.
https://github.com/huggingface/transformers.js/releases/tag/3.1.0
Demo link (+ source code): webml-community/Janus-1.3B-WebGPU

- WebGPU support (up to 100x faster than WASM)
- New quantization formats (dtypes)
- 120 supported architectures in total
- 25 new example projects and templates
- Over 1200 pre-converted models
- Node.js (ESM + CJS), Deno, and Bun compatibility
- A new home on GitHub and NPM
Get started with:
npm i @huggingface/transformers
Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
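
For a feel of the new options, here's a minimal WebGPU embedding sketch (the model id is just one of the pre-converted checkpoints; any supported feature-extraction model works):

import { pipeline } from "@huggingface/transformers";

// Run an embedding model on the GPU
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
  { device: "webgpu" },
);

const embeddings = await extractor(
  ["Transformers.js now runs on WebGPU!", "Embeddings in the browser."],
  { pooling: "mean", normalize: true },
);
console.log(embeddings.dims); // e.g. [2, 384]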

Expect massive performance gains. I ran inference over a whole book with 46k chunks in under 5 minutes. If your device doesn't support #WebGPU, use the classic Wasm-based version:
- WebGPU: https://do-me.github.io/SemanticFinder/webgpu/
- Wasm: https://do-me.github.io/SemanticFinder/
WebGPU harnesses the full power of your hardware, no longer being restricted to just the CPU. The speedup is significant (4-60x) for all kinds of devices: consumer-grade laptops, heavy Nvidia GPU setups or Apple Silicon. Measure the difference for your device here: Xenova/webgpu-embedding-benchmark
Chrome currently works out of the box, Firefox requires some tweaking.
WebGPU + transformers.js allows you to build amazing applications and make them accessible to everyone. For example, SemanticFinder could become a simple GUI for populating your (vector) DB of choice. See the pre-indexed community texts here: do-me/SemanticFinder
Happy to hear your ideas!
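
For context, a SemanticFinder-style search boils down to embedding the query and every chunk (e.g. with a feature-extraction pipeline like the one shown above), then ranking chunks by cosine similarity. A minimal, illustrative sketch:

// Cosine similarity between two embedding vectors
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank text chunks against a query embedding (names are illustrative)
function rankChunks(queryEmbedding, chunkEmbeddings, chunks) {
  return chunks
    .map((text, i) => ({ text, score: cosineSimilarity(queryEmbedding, chunkEmbeddings[i]) }))
    .sort((a, b) => b.score - a.score);
}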
We have Transformers.js, the JavaScript/WASM/WebGPU port of the Python library, which supports ~100 different architectures.
Docs: https://huggingface.co/docs/transformers.js
Repo: http://github.com/xenova/transformers.js
Is that the kind of thing you're looking for? :)
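
For reference, the docs' quick-start is just a couple of lines (shown with the v2 package name, @xenova/transformers, which matches the repo above; v3 moved to @huggingface/transformers):

import { pipeline } from "@xenova/transformers";

// Allocate a sentiment-analysis pipeline (downloads and caches the model on first use)
const classifier = await pipeline(
  "sentiment-analysis",
  "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
);

const output = await classifier("I love Transformers.js!");
console.log(output); // [{ label: 'POSITIVE', score: 0.99... }]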

- Demo: webml-community/phi-3.5-webgpu
- Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu
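
To run the same model outside the demo, the text-generation pipeline is the natural entry point. A rough sketch (the model id is an assumption; check the demo's source for the exact ONNX checkpoint):

import { pipeline } from "@huggingface/transformers";

// Load a Phi-3.5 ONNX export for in-browser generation (model id is assumed)
const generator = await pipeline(
  "text-generation",
  "onnx-community/Phi-3.5-mini-instruct-onnx-web",
  { device: "webgpu" },
);

const messages = [
  { role: "user", content: "Explain WebGPU in one sentence." },
];
const output = await generator(messages, { max_new_tokens: 64 });
console.log(output[0].generated_text.at(-1).content);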

Install it from NPM with:
npm i @huggingface/transformers
or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q
Segment Anything demo: webml-community/segment-anything-webgpu
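
If you'd rather call Segment Anything directly than embed the demo, the docs' SAM example is roughly the shape below. Treat it as a sketch: the model id is one known export (Xenova/slimsam-77-uniform), the image URL is a placeholder, and argument shapes may differ slightly between versions:

import { SamModel, AutoProcessor, RawImage } from "@huggingface/transformers";

// Load a small SAM variant and its processor
const model = await SamModel.from_pretrained("Xenova/slimsam-77-uniform");
const processor = await AutoProcessor.from_pretrained("Xenova/slimsam-77-uniform");

// Read an image and prompt the model with a single (x, y) point on the object
const image = await RawImage.read("https://example.com/photo.jpg"); // placeholder URL
const input_points = [[[400, 300]]];

const inputs = await processor(image, { input_points });
const outputs = await model(inputs);

// Convert predicted masks back to the original image resolution
const masks = await processor.post_process_masks(
  outputs.pred_masks,
  inputs.original_sizes,
  inputs.reshaped_input_sizes,
);
console.log(masks);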