Recent Activity

blog-explorers's activity

mrfakename 
posted an update 3 days ago
Hi everyone,

I just launched TTS Arena V2 - a platform for benchmarking TTS models by blind A/B testing. The goal is to make it easy to compare quality between open-source and commercial models, including conversational ones.

What's new in V2:

- **Conversational Arena**: Evaluate models like CSM-1B, Dia 1.6B, and PlayDialog in multi-turn settings
- **Personal Leaderboard**: Optional login to see which models you tend to prefer
- **Multi-speaker TTS**: Random voices per generation to reduce speaker bias
- **Performance Upgrade**: Rebuilt from Gradio → Flask. Much faster with fewer failed generations.
- **Keyboard Shortcuts**: Vote entirely via keyboard

Also added models like MegaTTS 3, Cartesia Sonic, and ElevenLabs' full lineup.

I'd love any feedback, feature suggestions, or ideas for models to include.

TTS-AGI/TTS-Arena-V2
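
For readers unfamiliar with blind A/B testing, here is a minimal sketch of the idea: two models are presented in a random order under neutral labels, the listener picks one, and votes are tallied into per-model win rates. This is not the TTS Arena's actual implementation or ranking method; the function names (`blind_pair`, `record_vote`, `win_rates`) and the model names are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def blind_pair(model_a, model_b, rng):
    """Assign two models to neutral slots "A" and "B" in random order,
    so the voter cannot infer which model produced which sample."""
    pair = [model_a, model_b]
    rng.shuffle(pair)
    return {"A": pair[0], "B": pair[1]}


def record_vote(slots, choice):
    """Translate a vote for slot "A" or "B" into a (winner, loser) tuple."""
    other = "B" if choice == "A" else "A"
    return slots[choice], slots[other]


def win_rates(votes):
    """Compute wins / games played for every model seen in the votes.
    `votes` is an iterable of (winner, loser) model names."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}


# Hypothetical example: three blind comparisons between two made-up models.
rng = random.Random(0)
votes = []
for choice in ["A", "B", "A"]:
    slots = blind_pair("model-x", "model-y", rng)
    votes.append(record_vote(slots, choice))

rates = win_rates(votes)
```

A plain win rate is the simplest possible aggregate; a real leaderboard would more likely use a rating system that accounts for opponent strength and the number of comparisons.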
merve 
posted an update 3 days ago
Reality123b 
posted an update 5 days ago
https://huggingface.co/posts/Reality123b/379097737205276
remember this dataset?
I'm bumping the example count to approximately 23 million prompt-response pairs.
And of course, it is going to be hybrid reasoning. Well, it isn't programmatically hybrid reasoning, but it will use CoT whenever necessary and skip it when it doesn't seem to be needed.
merve 
posted an update 5 days ago
Meta released Llama Guard 4 and new Prompt Guard 2 models 🔥

Llama Guard 4 is a new model to filter model inputs/outputs, both text-only and image 🛡️ Use it before and after LLMs/VLMs! meta-llama/Llama-Guard-4-12B

Prompt Guard 2 22M & 86M are smol models to prevent model jailbreaks and prompt injections ⚔ meta-llama/Llama-Prompt-Guard-2-22M meta-llama/Llama-Guard-4-12B
Both come with the new release of transformers 🤗

Try the model right away 👉🏻 https://github.com/huggingface/huggingface-llama-recipes/blob/main/llama_guard_4.ipynb

Read our blog to learn more and easily get started 👉🏻 https://huggingface.co/blog/llama-guard-4 🦙
merve 
posted an update 10 days ago
Don't sleep on the new AI at Meta Vision-Language release! 🔥

facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
facebook/perception-lm-67f9783f171948c383ee7498

Meta dropped Swiss Army knives for vision with an Apache 2.0 license 👏
> image/video encoders for vision language modelling and spatial understanding (object detection etc) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets

The authors attempt to come up with a single versatile vision encoder that can be aligned on a diverse set of tasks.

They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. For zero-shot image tasks, it outperforms the latest SOTA, SigLIP2 👏



> Among the fine-tuned ones, the first is PE-Spatial. It's a model for bounding box detection, segmentation, and depth estimation, and it outperforms all other models 😮



> The second one is PLM, Perception Language Model, where they combine PE-Core with Qwen2.5 LM 7B. It outperforms all other models (including InternVL3, which was trained with Qwen2.5 LM too!)

The authors release the following checkpoints in sizes base, large and giant:

> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)
> Datasets



The authors release the following datasets 📑
> PE Video: Gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: Human and auto-annotated image and video datasets on region-based tasks
> PLM-VideoBench: New video benchmark on MCQA
merve 
posted an update 12 days ago
New foundation model on image and video captioning just dropped by NVIDIA AI 🔥

Describe Anything Model (DAM) is a 3B vision language model to generate detailed captions with localized references 😮

The team released the models, the dataset, a new benchmark and a demo 🤩 nvidia/describe-anything-680825bb8f5e41ff0785834c

Most vision LMs focus on the image as a whole: they lack localized references in captions and don't take in visual prompts (points, boxes, drawings around objects).

DAM addresses this on two levels: a new vision backbone that takes in focal crops and the image itself, and a large-scale dataset 👀

They generate the dataset by extending existing segmentation and referring expression generation datasets like RefCOCO, passing the images and classes to VLMs and generating captions.

Lastly, they also release a new benchmark, again with self-supervision: they use an LLM to evaluate the detailed captions, focusing on localization 👏
Reality123b 
posted an update 17 days ago
ChuckMcSneed 
posted an update 18 days ago
Okay, folks, I need some help with this darn internet thing! My son, Timmy, showed me this interesting… forum thingy. He called it "/lmg/" and said it was the place to talk about… well, let's just say important matters 😉.

Timmy says something happened, though! He keeps mumbling about "Soy Jacks," "4chan is dead" and "hacked servers."

So, is this "/lmg/" thing GONE forever? Or did it move somewhere else? Timmy isn't being very helpful, and I'm sure some of you bright young minds on here probably know! I want to learn more and I really liked it there!

Thanks in advance for any help!

---

God bless America 🇺🇸
#WWG1WGA
merve 
posted an update 21 days ago
sooo many open AI releases this past week, let's summarize! 🤗
merve/april-11-releases-67fcd78be33d241c0977b9d2

multimodal
> Moonshot AI released Kimi VL Thinking, the first working open-source multimodal reasoning model, and Kimi VL Instruct, both 16B MoEs with 3B active params (OS)
> InternVL3 released, based on Qwen2.5 LLMs, 7 ckpts with various sizes (1B to 78B)

LLMs
> NVIDIA released Llama-3_1-Nemotron-Ultra-253B-v1, an LLM built on Llama 405B for reasoning, chat and tool use
> Agentica released DeepCoder-14B-Preview, a fine-tuned version of DeepSeek-R1-Distill-Qwen-14B on problem-test pairs, along with the compiled dataset
> Zyphra/ZR1-1.5B is a new small reasoning LLM built on R1-Distill-1.5B (OS)
> Skywork-OR1-32B-Preview is a new reasoning model by Skywork

Image Generation
> HiDream released three new models, HiDream I1 Dev, I1 Full, and I1 Fast, for image generation (OS)

*OS ones have Apache 2.0 or MIT licenses
jjokah 
posted an update 28 days ago
# Video Tokenization — for efficient AI video processing

Meet 𝐕𝐢𝐝𝐓𝐨𝐤, a new open-source video tokenization technique developed by Microsoft Research to address the computational challenges of processing large volumes of video data. The core problem VidTok tackles is the inefficiency caused by redundant information in raw video pixels.

VidTok converts complex video footage into compact, structured units called tokens, making it easier and more efficient for AI systems to analyze, understand, and generate video content.

Research Paper: https://arxiv.org/abs/2412.13061
VidTok Code: https://github.com/microsoft/VidTok
Reality123b 
posted an update about 1 month ago
Does anyone know how to convert a Replit app into a Hugging Face Spaces app?
mrfakename 
posted an update about 1 month ago
Papla P1 from Papla Media is now available on the TTS Arena!

Try out Papla's new ultra-realistic TTS model + compare it with other leading models on the TTS Arena: TTS-AGI/TTS-Arena
Reality123b 
posted an update about 1 month ago
ok, there must be a problem. HF charged me $0.12 for 3 inference requests to text models
samchain 
posted an update about 1 month ago
NLP for Economics 1.2 is out!

This collection features two models:
- EconoSentiment: a first version based on econo-sentence-v2 and trained on the Financial PhraseBank, showing strong performance.
- EconoDetect-US: a classifier to detect texts related to the US economy.

And two datasets:
- economics-relevance: the HF version of the Kaggle dataset US Economics News
- imf-weo-reports: a first version of a gated dataset aggregating several World Economic Outlook reports from the IMF