hf-internal-testing (Hugging Face Internal Testing Organization)

NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!

2 replies

·

lhoestq

updated a dataset 2 days ago

hf-internal-testing/ner-jsonl

Viewer • Updated 2 days ago • 11k • 50

lhoestq

published a dataset 2 days ago

hf-internal-testing/ner-jsonl

Viewer • Updated 2 days ago • 11k • 50

lysandre

updated a model 3 days ago

hf-internal-testing/MiniMax-tiny

Text Generation • Updated 3 days ago • 9

lysandre

published a model 3 days ago

hf-internal-testing/MiniMax-tiny

Text Generation • Updated 3 days ago • 9

danaaubakirova

posted an update 3 days ago

Post

1740

We just dropped SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics!

check out the blog: https://huggingface.co/blog/smolvla
read the technical report: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (2506.01844)
access the model weights: lerobot/smolvla_base

danaaubakirova

authored 2 papers 4 days ago

PatFig: Generating Short and Long Captions for Patent Figures

Paper • 2309.08379 • Published Sep 15, 2023

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published 4 days ago • 74

albertvillanova

posted an update 11 days ago

Post

426

New in smolagents v1.17.0:
- Structured generation in CodeAgent 🧱
- Streamable HTTP MCP support 🌐
- Agent.run() returns rich RunResult 📦

Smarter agents, smoother workflows.
Try it now: https://github.com/huggingface/smolagents/releases/tag/v1.17.0

sayakpaul

posted an update 16 days ago

Post

2372

Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.

So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.

Give it a go here:
https://lnkd.in/gf8Pi4-2

joaogante

posted an update 17 days ago

Post

453

Let's go! Custom generation code has landed in transformers 🚀

Have you designed a new cool KV cache? Maybe you're comparing new test-time compute ideas you've been researching? Have you found a way to do diffusion with existing models? You can now easily share your findings with the community with custom generation code, sharing the well-known generate interface 🤓

In a nutshell, we have expanded the support of custom modeling code on the Hub with *model-agnostic* custom generation code. Write for one model, reuse with any model -- hopefully, this will democratize access to new generation ideas 🫡

As a creator, you gain the ability to get your ideas in transformers with minimal effort. You'll also have access to all Hub features: a landing page for your creation, discussions, usage metrics, ... 🤓

💎 Resources 💎
- docs: https://huggingface.co/docs/transformers/generation_strategies#custom-decoding-methods
- minimal example: transformers-community/custom_generate_example
- discussion: transformers-community/support#10

sayakpaul

posted an update 17 days ago

Post

1679

Despite the emergence of combining LLM and DiT architectures for T2I synthesis, its design remains severely understudied.

This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code ♥️

We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.

Despite its compelling results and other performance virtues, it remains unexplored, which is what we want to improve in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives. PixArt-Alpha like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on the above findings, we arrive at FuseDiT with the following components on top of the base architecture from the findings of our experiments.

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly

We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.

To know more (code, models, all are available), please check out the paper:
https://lnkd.in/gg6qyqZX.

sayakpaul

authored a paper 21 days ago

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Paper • 2505.10046 • Published 23 days ago • 9

albertvillanova

posted an update 22 days ago

Post

2417

New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs

👉 https://github.com/huggingface/smolagents/releases/tag/v1.16.0

lewtun

authored a paper 26 days ago

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

Paper • 2504.11354 • Published Apr 15 • 5

linoyts

posted an update about 1 month ago

Post

3239

FramePack is hands down one of the best OS releases in video generation 🙇🏻‍♀️🤯
✅ fully open sourced + amazing quality + reduced memory + improved speed
but more even - its gonna facilitate *soooo* many downstream applications
like this version adapted for landscape rotation 👇https://huggingface.co/spaces/tori29umai/FramePack_rotate_landscape