LeRobot Worldwide Hackathon

Enterprise
community

AI & ML interests

None defined yet.

Recent Activity

LeRobot-worldwide-hackathon's activity

AdinaY 
posted an update about 7 hours ago
AdinaY 
posted an update about 16 hours ago
view post
Post
261
OpenAudio S1-mini 🔊 a new OPEN multilingual TTS model trained on 2M+ hours of data, by FishAudio

fishaudio/openaudio-s1-mini

✨ Supports 14 languages
✨ 50+ emotions & tones
✨ RLHF-optimized
✨ Special effects: laughing, crying, shouting, etc.
  • 1 reply
·
AdinaY 
posted an update 1 day ago
maringetxway 
updated a Space 2 days ago
AdinaY 
posted an update 3 days ago
view post
Post
809
SynLogic 🧠 logical reasoning model & dataset by MiniMax.

MiniMaxAI/synlogic-6836c3246fca0277657ff032

✨ 3 models: 7B/32B/ Mix-3-32B (MIT license)
✨ Dataset: 35 verifiable logic tasks (Sudoku, Cipher, Arrow Maze etc.)
✨ RL training with auto-verifiable rewards
✨ Generalizes to math without explicit math training
✨ +6 pts on BBEH, +9.5 on KOR-Bench vs baselines
AdinaY 
posted an update 3 days ago
view post
Post
1599
Video-XL-2 🔥 long video understanding model by BAAI & Shanghai Jiaotong University

BAAI/Video-XL-2

✨ Apache 2.0
✨ Handles up to 10,000+ frames on a single GPU
✨ 2048-frame encoding in just 12s
✨ Efficient Chunk-based Prefilling & Bi-granularity KV decoding
AdinaY 
posted an update 4 days ago
view post
Post
2102
May highlights from China’s open source ecosystem 🔥

zh-ai-community/may-2025-open-works-from-the-chinese-community-681a3494145f2914dc679b7c

✨ DeepSeek dropped R1 updates
- Both R1 & 8B distralled smol model

✨ Bytedance goes big on open source:
- BAGEL, Dolphin, Seedcoder, Dream0...

✨ Multimodal is on fire!
- HuyuanCustom / HunyuanVideo-Avatar / HunyuanPortrait
- MiniMax: SynLogic / Orsta-7B
- Xiaomi: MiMo VL
- Alibaba Wan: Wan2.1-VACE
- OpenGVlab: ZeroGUI
- StepFun: ACE-Step-v1/Step1X-3D

✨ Specialized models/datasets excels
- Alibaba Qwen: World PM 72B
- BAAI:RobotBrain (MLLM for robotic)
- HiThink Research: BizFinBench (dataset)
- OpenBMB: Ultra FineWeb (dataset)
- Bilibili: Index-anisora (Anime/ACG)
- Skywork:Matrix-Game (game)

More awesome releases: Alibaba QwenLong-L1-32B, SkyWork OR1, OpenS2V-5M etc...
AdinaY 
posted an update 7 days ago
view post
Post
500
MiMo-VL 🔥 smol & mighty vision language model by Xiaomi

XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

✨ 7B with RL & SFT
✨ Native resolution ViT for fine grained perception
✨ MORL = smarter alignment across perception, grounding & reasoning
AdinaY 
posted an update 9 days ago
view post
Post
2625
🔥 New benchmark & dataset for Subject-to-Video generation

OPENS2V-NEXUS by Pekin University

✨ Fine-grained evaluation for subject consistency
BestWishYsh/OpenS2V-Eval
✨ 5M-scale dataset:
BestWishYsh/OpenS2V-5M
✨ New metrics – automatic scores for identity, realism, and text match
  • 2 replies
·
AdinaY 
posted an update 9 days ago
view post
Post
2236
HunyuanVideo-Avatar 🔥 another image to video model byTencent Hunyuan

tencent/HunyuanVideo-Avatar

✨Emotion-controlled, high-dynamic avatar videos
✨Multi-character support with separate audio control
✨Works with any style: cartoon, 3D, real face, while keeping identity consistent
AdinaY 
posted an update 9 days ago
AdinaY 
posted an update 10 days ago
view post
Post
2818
Orsta 🔥 vision language models trained with V-Triune, a unified reinforcement learning system by MiniMax AI

One-RL-to-See-Them-All/one-rl-to-see-them-all-6833d27abce23898b2f9815a

✨ 7B & 32B with MIT license
✨ Masters 8 visual tasks: math, science QA, charts, puzzles, object detection, grounding, OCR, and counting
✨ Uses Dynamic IoU rewards for better visual understanding
✨Strong performance in visual reasoning and perception
AdinaY 
posted an update 10 days ago
AdinaY 
posted an update 16 days ago
view post
Post
2776
ByteDance is absolutely cooking lately🔥

BAGEL 🥯 7B active parameter open multimodal foundation model by Bytedance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
AdinaY 
posted an update 17 days ago
view post
Post
2402
Dolphin 🔥 A multimodal document image parsing model from ByteDance
, built on an analyze-then-parse paradigm.

ByteDance/Dolphin

✨ MIT licensed
✨ Handles text, tables, figures & formulas via:
- Reading-order layout analysis
- Parallel parsing with smart prompts