steve z

stzhao

https://zhaoshitian.github.io/

zhaoshitian

AI & ML interests

None yet

Recent Activity

new activity 19 days ago

Agents-X/PyVision-Video-RL-Data:Update dataset card with metadata, links, and description

new activity 19 days ago

Agents-X/PyVision-Image-RL-Data:Improve dataset card

new activity 19 days ago

Agents-X/PyVision-Image-SFT-Data:Improve dataset card: add metadata and project links

upvoted 2 papers 20 days ago

PyVision-RL: Forging Open Agentic Vision Models via RL

Paper • 2602.20739 • Published 21 days ago • 31

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Paper • 2602.14337 • Published 29 days ago • 13

upvoted a paper about 1 month ago

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 257

upvoted a paper 3 months ago

Yume-1.5: A Text-Controlled Interactive World Generation Model

Paper • 2512.22096 • Published Dec 26, 2025 • 60

upvoted a paper 4 months ago

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Paper • 2511.01833 • Published Nov 3, 2025 • 16

upvoted 2 papers 6 months ago

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Paper • 2509.07969 • Published Sep 9, 2025 • 59

Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5, 2025 • 47

upvoted 4 papers 7 months ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 84

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 270

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20, 2025 • 43

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50

upvoted 4 papers 8 months ago

Yume: An Interactive World Generation Model

Paper • 2507.17744 • Published Jul 23, 2025 • 91

Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22, 2025 • 34

Neural-Driven Image Editing

Paper • 2507.05397 • Published Jul 7, 2025 • 27

PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10, 2025 • 33

upvoted 3 papers 9 months ago

Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18, 2025 • 66

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Paper • 2506.10521 • Published Jun 12, 2025 • 73

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 273

upvoted 2 articles 9 months ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • Jun 3, 2025 • 100

Cheap Framepack camera control loras with one training video.

Article • Jun 1, 2025 • 2