Dmitry Ryumin

DmitryRyumin

https://dmitryryumin.github.io

AI & ML interests

Machine Learning and Applications, Multi-Modal Understanding

Recent Activity

liked a model 1 day ago

google/gemma-3n-E4B-it

Image-Text-to-Text • 8B • Updated about 14 hours ago • 5.55k • 182

liked a model 5 days ago

MeiGen-AI/MeiGen-MultiTalk

Image-to-Video • Updated 18 days ago • 6.54k • 72

liked a Space 5 days ago

Ekaterina2002/diabetes-catboost

Diabetes Catboost

⚡

CatBoost regression model for diabetes dataset

reacted to merve's post with 🔥 9 days ago

Post

2423

#CVPR2025 Paper Picks #1
VisionZip is a compression technique that reduces the number of visual tokens to improve performance AND prefill time for vision language models
demo: Senqiao/VisionZip
paper: VisionZip: Longer is Better but Not Necessary in Vision Language Models (2412.04467)
most of the image tokens are redundant for the LLM, so the authors ask "are all visual tokens necessary?"

the method is simple:
find which tokens have the highest attention scores, merge the rest of the tokens based on similarity, then combine the two sets

their method works both training-free and with fine-tuning
the authors report a 5-point average improvement across vision language tasks + an 8x improvement in prefill time for LLaVA-NeXT 7B and 13B 🤯

removing redundant tokens improves image token quality too 🥹
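
The three steps named in the post above (keep the highest-attention tokens, merge the rest by similarity, combine both sets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the VisionZip authors' implementation: the function name `compress_tokens`, its parameters, the evenly spaced choice of merge targets, and the use of cosine similarity with mean pooling are all hypothetical choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compress_tokens(tokens, attn_scores, num_dominant, num_contextual):
    """Reduce a list of token vectors to at most num_dominant + num_contextual vectors.

    tokens: list of equal-length lists of floats (one vector per visual token).
    attn_scores: one attention score per token (assumed to be given).
    """
    # Step 1: keep the tokens with the highest attention scores ("dominant" tokens).
    order = sorted(range(len(tokens)), key=lambda i: attn_scores[i], reverse=True)
    dominant_idx = sorted(order[:num_dominant])
    rest_idx = sorted(order[num_dominant:])
    dominant = [tokens[i] for i in dominant_idx]
    if not rest_idx or num_contextual <= 0:
        return dominant

    # Step 2: merge the remaining tokens by similarity.
    # Assumption: merge targets are picked at evenly spaced positions among the rest.
    num_contextual = min(num_contextual, len(rest_idx))
    step = len(rest_idx) / num_contextual
    targets = [rest_idx[int(k * step)] for k in range(num_contextual)]
    target_set = set(targets)
    groups = {t: [tokens[t]] for t in targets}
    for i in rest_idx:
        if i in target_set:
            continue
        # Assign each leftover token to its most similar target.
        best = max(targets, key=lambda t: cosine(tokens[i], tokens[t]))
        groups[best].append(tokens[i])
    # Each group is averaged into one "contextual" token.
    contextual = [
        [sum(column) / len(groups[t]) for column in zip(*groups[t])]
        for t in targets
    ]

    # Step 3: combine the dominant and contextual tokens.
    return dominant + contextual


# Toy usage: six 2-D tokens reduced to two dominant plus two contextual tokens.
example_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
example_scores = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1]
compressed = compress_tokens(example_tokens, example_scores, num_dominant=2, num_contextual=2)
```

In the toy usage, the two highest-scoring tokens are kept unchanged and the other four are averaged into two merged tokens, so six tokens become four. A real vision language model would take the attention scores from its vision encoder rather than from a hand-written list.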

liked a Space 9 days ago

V-JEPA 2 - Streaming Video Classification

🌍

Run V-JEPA 2 on a video stream for Video Classification

reacted to merve's post with ❤️ 9 days ago

Post

3574

Releases of the past week are here merve/releases-june-13-6852c3c1eaf1e0c24c958860

Here are our picks 🤓
So many interesting models were released this past week in open AI! 🤖

🖼️ Computer Vision/VLMs
> nanonets/Nanonets-OCR-s is the new state-of-the-art OCR model that can handle checkboxes, watermarks, tables (OS)
> Meta released facebook/v-jepa-2-6841bad8413014e185b497a6, new SOTA video embeddings with two new classification models (OS)
> ByteDance-Seed/SeedVR2-3B is a new 3B video restoration model (OS)

Audio
> Stepfun released stepfun-ai/Step-Audio-AQAA, new large (137B 🤯) audio language model that takes in audio and generates audio (OS)

🤖 Robotics
> nvidia released nvidia/GR00T-N1.5-3B, new open foundation vision language action model

3D
> tencent/Hunyuan3D-2.1 is the new version of Hunyuan by Tencent that can generate 3D assets from text and image prompts

liked a Space 9 days ago

Bilingual Text-based Emotion Recognition

🚀

Analyze emotions from text or audio

liked a Space 13 days ago

267

vggt

🏆

VGGT (CVPR 2025)

liked a Space 20 days ago

ShapeLLM-Omni

🏢

A Native Multimodal LLM for 3D Generation and Understanding

upvoted a collection 22 days ago

Qwen2.5-VL

Collection

Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 496

liked 2 Spaces 25 days ago

719

Realistic Text To Speech Unlimited

🔥

Free Text-To-Speech generator with Emotion control (OpenAI)

106

SoloSpeech

🎯

State-of-the-art target speech extractor

updated a Space 26 days ago

BiBiER

🏃

Bilingual Bimodal Emotion Recognition

published a Space 27 days ago

BiBiER

🏃

Bilingual Bimodal Emotion Recognition

liked a Space 28 days ago

1.18k

Chatterbox TTS

🍿

Expressive Zeroshot TTS

liked 2 models about 1 month ago

tencent/HunyuanPortrait

Image-to-Video • Updated May 27 • 65

boltuix/bert-emotion

Text Classification • 0.0B • Updated 2 days ago • 17.5k • 36

reacted to AdinaY's post with 🚀 about 1 month ago

Post

2805

ByteDance is absolutely cooking lately🔥

BAGEL 🥯 is a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens

liked a model about 1 month ago

tiiuae/Falcon-H1-3B-Instruct

Text Generation • 3B • Updated 17 days ago • 2.91k • 9

liked a Space about 1 month ago

Falcon H1 Playground

🦅

Chat with Falcon-H1 models to get answers
