AI & ML interests

None defined yet.

Recent Activity

huggingface-projects's activity

AdinaY 
posted an update about 7 hours ago
merve 
posted an update about 10 hours ago
AdinaY 
posted an update about 16 hours ago
OpenAudio S1-mini 🔊 a new OPEN multilingual TTS model trained on 2M+ hours of data, by FishAudio

fishaudio/openaudio-s1-mini

✨ Supports 14 languages
✨ 50+ emotions & tones
✨ RLHF-optimized
✨ Special effects: laughing, crying, shouting, etc.
Xenova 
posted an update 1 day ago
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech
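
The four stages listed above chain into a simple loop: gate on speech, transcribe, generate a reply, then synthesize audio. As a rough sketch of that control flow only (in Python for brevity, not the demo's actual Transformers.js code), the `VoicePipeline` class and its stub stages below are hypothetical stand-ins and call no real model:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the four stages; each field is a plain callable."""
    detect_speech: Callable[[bytes], bool]  # stand-in for Silero VAD
    transcribe: Callable[[bytes], str]      # stand-in for Whisper
    generate_reply: Callable[[str], str]    # stand-in for SmolLM2-1.7B
    synthesize: Callable[[str], bytes]      # stand-in for Kokoro

    def handle_chunk(self, audio: bytes) -> Optional[bytes]:
        # Skip the expensive stages when no speech is detected.
        if not self.detect_speech(audio):
            return None
        text = self.transcribe(audio)
        reply = self.generate_reply(text)
        return self.synthesize(reply)


# Stub stages (assumptions for illustration): "audio" is just UTF-8 text here.
pipeline = VoicePipeline(
    detect_speech=lambda audio: len(audio) > 0,
    transcribe=lambda audio: audio.decode("utf-8"),
    generate_reply=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)

silence_result = pipeline.handle_chunk(b"")
speech_result = pipeline.handle_chunk(b"hello")
```

Keeping each stage behind a callable means any one model could be swapped without touching the loop.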

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
merve 
posted an update 1 day ago
Past week was insanely packed for open AI! 😱
Luckily we picked some highlights for you ❤️ lfg!

💬 LLMs/VLMs
> DeepSeek 🐳 released deepseek-ai/DeepSeek-R1-0528, 38B model, only 0.2 and 1.4 points behind o3 in AIME 24/25 🤯 they also released an 8B distilled version based on Qwen3 (OS) deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
> Xiaomi released MiMo-7B-RL (LLM for code and math) and MiMo-VL-7B-RL (VLM for visual reasoning, GUI agentic task and general use) (OS) 😍 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212
> NVIDIA released nvidia/Nemotron-Research-Reasoning-Qwen-1.5B, a new reasoning model
> Dataset: MiniMax released https://huggingface.co/MiniMaxAI/SynLogic, a new set of 49k logical reasoning examples across 35 tasks, including cipher solving, sudoku and more!

🖼️ Image/Video Generation
> Tencent released tencent/HunyuanPortrait, a new model for consistent portrait generation, under the SVD Research license. They also released tencent/HunyuanVideo-Avatar for audio-driven avatar generation (OS)
> showlab released showlab/OmniConsistency, consistent stylization model (OS)
> Rapidata/text-2-video-human-preferences-veo3 is a new T2V preference dataset based on videos from Veo3 with 46k examples (OS)

🗣️ Audio
> https://huggingface.co/ResembleAI/Chatterbox is a new 500M text-to-speech model preferred over ElevenLabs (OS) 😍
> PlayHT/PlayDiffusion is a new speech editing model (OS)

Other
> https://huggingface.co/NX-AI/TiReX is a new time series foundation model
> Yandex released a huge (4.79B examples!) video recommendation dataset https://huggingface.co/yandex/yambda

OS ones have Apache 2.0 or MIT licenses. Find more models and datasets here: merve/releases-30-may-6840097345e0b1e915bff843
AdinaY 
posted an update 1 day ago
merve 
posted an update 1 day ago
Yesterday was the day of vision language action models (VLAs)!

> SmolVLA: open-source small VLA for robotics by Hugging Face LeRobot team 🤖
Blog: https://huggingface.co/blog/smolvla
Model: lerobot/smolvla_base

> Holo-1: 3B & 7B web/computer use agentic VLAs by H Company 💻
Model family: Hcompany/holo1-683dd1eece7eb077b96d0cbd
Demo: https://huggingface.co/spaces/multimodalart/Holo1
Blog: https://huggingface.co/blog/Hcompany/holo1
super exciting times!!
merve 
posted an update 2 days ago
AdinaY 
posted an update 2 days ago
SynLogic 🧠 logical reasoning model & dataset by MiniMax.

MiniMaxAI/synlogic-6836c3246fca0277657ff032

✨ 3 models: 7B / 32B / Mix-3-32B (MIT license)
✨ Dataset: 35 verifiable logic tasks (Sudoku, Cipher, Arrow Maze etc.)
✨ RL training with auto-verifiable rewards
✨ Generalizes to math without explicit math training
✨ +6 pts on BBEH, +9.5 on KOR-Bench vs baselines
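
To illustrate what an "auto-verifiable" reward can look like for a task such as Sudoku, here is a minimal sketch. It is an assumption for illustration, not MiniMax's implementation: the `sudoku_reward` function, the 0-for-empty convention, and the binary 1.0/0.0 reward are all hypothetical choices.

```python
from typing import List

Grid = List[List[int]]


def sudoku_reward(puzzle: Grid, answer: Grid) -> float:
    """Return 1.0 if `answer` is a valid completion of `puzzle`, else 0.0.

    Empty cells in `puzzle` are marked with 0 (an assumed convention).
    """
    if len(answer) != 9 or any(len(row) != 9 for row in answer):
        return 0.0
    # The answer must keep every clue given in the puzzle.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != answer[r][c]:
                return 0.0
    digits = set(range(1, 10))
    rows = [set(row) for row in answer]
    cols = [{answer[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {answer[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column and 3x3 box must contain exactly the digits 1..9.
    return 1.0 if all(group == digits for group in rows + cols + boxes) else 0.0


# A known-valid solved grid built from a standard shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
# Blank out every other cell to form a puzzle.
puzzle = [[v if (r + c) % 2 == 0 else 0 for c, v in enumerate(row)]
          for r, row in enumerate(solved)]
```

Because the check is purely programmatic, a reward like this needs no human labels or learned reward model.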
AdinaY 
posted an update 3 days ago
Video-XL-2 🔥 long video understanding model by BAAI & Shanghai Jiao Tong University

BAAI/Video-XL-2

✨ Apache 2.0
✨ Handles up to 10,000+ frames on a single GPU
✨ 2048-frame encoding in just 12s
✨ Efficient Chunk-based Prefilling & Bi-granularity KV decoding
merve 
posted an update 3 days ago
AdinaY 
posted an update 4 days ago
May highlights from China’s open source ecosystem 🔥

zh-ai-community/may-2025-open-works-from-the-chinese-community-681a3494145f2914dc679b7c

✨ DeepSeek dropped R1 updates
- Both R1 & 8B distilled smol model

✨ Bytedance goes big on open source:
- BAGEL, Dolphin, Seedcoder, Dream0...

✨ Multimodal is on fire!
- HunyuanCustom / HunyuanVideo-Avatar / HunyuanPortrait
- MiniMax: SynLogic / Orsta-7B
- Xiaomi: MiMo VL
- Alibaba Wan: Wan2.1-VACE
- OpenGVLab: ZeroGUI
- StepFun: ACE-Step-v1/Step1X-3D

✨ Specialized models/datasets excel
- Alibaba Qwen: World PM 72B
- BAAI: RoboBrain (MLLM for robotics)
- HiThink Research: BizFinBench (dataset)
- OpenBMB: Ultra FineWeb (dataset)
- Bilibili: Index-anisora (Anime/ACG)
- Skywork: Matrix-Game (game)

More awesome releases: Alibaba QwenLong-L1-32B, SkyWork OR1, OpenS2V-5M etc...
merve 
posted an update 4 days ago
merve 
posted an update 6 days ago
HOT: MiMo-VL new 7B vision LMs by Xiaomi surpassing gpt-4o (Mar), competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

not only that, but also MIT license & usable with transformers 🔥
AdinaY 
posted an update 7 days ago
MiMo-VL 🔥 smol & mighty vision language model by Xiaomi

XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

✨ 7B with RL & SFT
✨ Native resolution ViT for fine grained perception
✨ MORL = smarter alignment across perception, grounding & reasoning
merve 
posted an update 7 days ago
introducing: VLM vibe eval 🪭 visionLMsftw/VLMVibeEval

vision LMs are saturated over benchmarks, so we built vibe eval 💬

> compare different models with refreshed in-the-wild examples in different categories 🤠
> submit your favorite model for eval
no numbers -- just vibes!
AdinaY 
posted an update 9 days ago
🔥 New benchmark & dataset for Subject-to-Video generation

OPENS2V-NEXUS by Peking University

✨ Fine-grained evaluation for subject consistency
BestWishYsh/OpenS2V-Eval
✨ 5M-scale dataset:
BestWishYsh/OpenS2V-5M
✨ New metrics – automatic scores for identity, realism, and text match
AdinaY 
posted an update 9 days ago
HunyuanVideo-Avatar 🔥 another image to video model by Tencent Hunyuan

tencent/HunyuanVideo-Avatar

✨ Emotion-controlled, high-dynamic avatar videos
✨ Multi-character support with separate audio control
✨ Works with any style: cartoon, 3D, real face, while keeping identity consistent