I just launched TTS Arena V2 - a platform for benchmarking TTS models by blind A/B testing. The goal is to make it easy to compare quality between open-source and commercial models, including conversational ones.
What's new in V2:
- **Conversational Arena**: Evaluate models like CSM-1B, Dia 1.6B, and PlayDialog in multi-turn settings
- **Personal Leaderboard**: Optional login to see which models you tend to prefer
- **Multi-speaker TTS**: Random voices per generation to reduce speaker bias
- **Performance Upgrade**: Rebuilt from Gradio → Flask. Much faster with fewer failed generations.
- **Keyboard Shortcuts**: Vote entirely via keyboard
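Not part of the announcement, but for readers curious how blind A/B votes typically turn into a leaderboard, here's a minimal sketch assuming a standard Elo-style update; TTS Arena's actual aggregation may differ:

```python
# Generic Elo-style rating update from one blind A/B vote.
# This is an assumption about arena-style leaderboards in general,
# not TTS Arena's actual implementation.
def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated ratings for models A and B after one vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: the lower-rated model wins a vote and gains rating.
print(elo_update(1500.0, 1560.0, a_won=True))
```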
Also added models like MegaTTS 3, Cartesia Sonic, and ElevenLabs' full lineup.
I'd love any feedback, feature suggestions, or ideas for models to include.
The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:
- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows a 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model
It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.
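For reference, here's a rough sketch of the Curator distillation pattern as I understand it; the prompt, field names, and model choice are illustrative assumptions, not the dataset's actual pipeline:

```python
# Rough sketch of reasoning-chain distillation with Curator
# (bespokelabs-curator). Prompt, fields, and model are assumptions.
from bespokelabs import curator

class ReasoningDistiller(curator.LLM):
    def prompt(self, input: dict) -> str:
        # Ask for pre-experiment, exploratory reasoning under a token budget.
        return (
            "Adopt the persona of the researcher *before* running any experiments.\n"
            f"Paper abstract: {input['abstract']}\n"
            f"Lay out your exploratory reasoning in under {input['token_budget']} tokens."
        )

    def parse(self, input: dict, response: str) -> dict:
        return {"abstract": input["abstract"], "reasoning": response}

distiller = ReasoningDistiller(model_name="gpt-4o-mini")
rows = distiller([{"abstract": "We study ...", "token_budget": 512}])
```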
I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.
- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions
My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.
This significantly reduces computation costs while expanding reasoning dataset domain coverage.
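Here's a minimal sketch of that two-stage workflow; the scorer model ID and its label format below are hypothetical placeholders, not my actual artifacts:

```python
# Two-stage sketch: cheap filtering with a small scorer, expensive LLM
# annotation only for the survivors. Scorer ID and label format are
# hypothetical placeholders.
from datasets import load_dataset
from transformers import pipeline

# Stage 1: score reasoning complexity (0-4) with a small classifier.
scorer = pipeline("text-classification", model="your-org/reasoning-required-scorer")

docs = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                    split="train", streaming=True)

high_value = []
for doc in docs:
    pred = scorer(doc["text"][:2000])[0]
    score = int(pred["label"])  # assumes the scorer emits labels "0".."4"
    if score >= 3:              # keep only content that needs real reasoning
        high_value.append(doc)
    if len(high_value) >= 1_000:
        break

# Stage 2: send only `high_value` to an LLM for domain, reasoning-type,
# and example-question annotations.
```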
1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life's chaos, your cat's face or that weird receipt. Proof you're a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt "neon superhero me".
5. PDF 📄: Double-page OCR or single-page sniping.
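For anyone curious what the GPT-4o OCR step looks like under the hood, here's a minimal sketch using the OpenAI vision API; the file name and prompt are illustrative:

```python
# Minimal OCR sketch via GPT-4o vision input (OpenAI Python SDK >= 1.0).
# File name and prompt are illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("receipt.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this image, preserving layout."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```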
Having trouble with AutoTrain: hello there, this is the first time I'm testing AutoTrain, with a 1.8k-row SFT dataset. However, I'm not quite sure the training is going smoothly. The logs seem quite confusing: I get "token did not match, can not auth" errors, and the generated train splits look confusing. Do you know how I can check on my running job properly? And what data is actually being used for training? Any ideas?
For Inference Providers that have built support for our Billing API (currently Fal, Novita, and HF-Inference, with more coming soon), we've started enabling Pay-as-you-go (PAYG).
What this means is that you can use those Inference Providers beyond your free included credits, with the usage charged to your HF account.
You can see it on this view: any provider that does not have a "Billing disabled" badge is PAYG-compatible.
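In practice, routed PAYG usage looks like this with `huggingface_hub` (the model ID is illustrative; any model the provider serves works):

```python
# Minimal sketch of routed usage billed to your HF account
# (requires a recent huggingface_hub release with provider support;
# the model ID below is illustrative).
from huggingface_hub import InferenceClient

client = InferenceClient(provider="novita", api_key="hf_xxx")  # your HF token

response = client.chat_completion(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```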