Andres Marafioti
andito
AI & ML interests
Multimodal models, VLM and TTS
Recent Activity
reacted to merve's post 1 day ago
So many open releases at Hugging Face past week 🤯 recapping all here ⤵️ https://huggingface.co/collections/merve/march-21-releases-67dbe10e185f199e656140ae
Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)
💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)
🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)
🎤 Audio
> Sesame released CSM-1B, new speech generation model (OS)
🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset
*OS ones have Apache 2.0 or MIT license
andito's activity
published an article about 1 month ago
SmolVLM2: Bringing Video Understanding to Every Device
SmolVLM Grows Smaller – Introducing the 250M & 500M Models!
SmolVLM - small yet mighty Vision Language Model
LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?
Docmatix - a huge dataset for Document Visual Question Answering
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models