Mohammed Mohammed Ali

MohammedEltoum

AI & ML interests

None yet

Recent Activity

upvoted a paper 4 days ago

DINOv3

Paper • 2508.10104 • Published 9 days ago • 162

upvoted an article 9 days ago

Vision Language Model Alignment in TRL ⚡️

Article • and 4 others • 16 days ago • 69

upvoted a paper 10 days ago

MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published 11 days ago • 38

upvoted a paper 14 days ago

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring

Paper • 2507.23404 • Published 22 days ago • 2

upvoted a paper about 1 month ago

Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers

Paper • 2507.10787 • Published Jul 14 • 11

upvoted a paper about 2 months ago

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

Paper • 2506.19851 • Published Jun 24 • 59

upvoted an article 3 months ago

How to Build an MCP Server with Gradio

Article • and 1 other • Apr 30 • 189

upvoted 2 papers 3 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 280

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

upvoted an article 3 months ago

Vision Language Models (Better, Faster, Stronger)

Article • and 4 others • May 12 • 510

upvoted 3 papers 3 months ago

Vision-Language-Action Models: Concepts, Progress, Applications and Challenges

Paper • 2505.04769 • Published May 7 • 8

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185

FG-CLIP: Fine-Grained Visual and Textual Alignment

Paper • 2505.05071 • Published May 8 • 18

upvoted 4 papers 4 months ago

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Paper • 2505.01043 • Published May 2 • 10

A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Paper • 2505.01658 • Published May 3 • 39

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 86

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 301

upvoted 3 papers 5 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 197

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Paper • 2503.17352 • Published Mar 21 • 24

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Paper • 2503.13358 • Published Mar 17 • 96
