Deping Zhang
Deping
AI & ML interests
Deep Reinforcement Learning, Computer Vision, Large Language Models (especially their "emergence" capabilities), Theoretical Condensed Matter Physics (superconductivity, ferromagnetism)
Organizations
None yet
Video_MLLMS
LLMs
- mistralai/Mixtral-8x7B-Instruct-v0.1
  Text Generation • 47B • Updated • 372k • 4.47k
- mistralai/Mistral-7B-v0.1
  Text Generation • 7B • Updated • 326k • 3.86k
- microsoft/phi-1_5
  Text Generation • 1B • Updated • 122k • 1.34k
- microsoft/phi-2
  Text Generation • 3B • Updated • 565k • 3.35k
VideoEncoder
Video Understanding, Video Embedding, Video Tasks
- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 22
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 28
- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 40
- microsoft/xclip-base-patch16-zero-shot
  Video Classification • 0.2B • Updated • 6.08k • 24
GeneralDetector
LLM_Infra
VisionExpertModels
- facebook/dinov2-giant
  Image Feature Extraction • 1B • Updated • 152k • 46
- openai/clip-vit-large-patch14-336
  Zero-Shot Image Classification • Updated • 4.7M • 250
- microsoft/layoutlmv3-large
  Updated • 154k • 112
- laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup
  Zero-Shot Image Classification • Updated • 89.4k • 21
VLMS
- PsiPi/liuhaotian_llava-v1.5-13b-GGUF
  Image-Text-to-Text • 13B • Updated • 1.01k • 36
- TRI-ML/prismatic-vlms
  Image-to-Text • Updated • 20
- bczhou/tiny-llava-v1-hf
  Image-Text-to-Text • 1B • Updated • 4.72k • 57
- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
  Paper • 2402.06118 • Published • 15
VLM_Datasets
MM_Datasets
LLM_VLM_R1
- Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
  Paper • 2502.19655 • Published
- MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
  Paper • 2502.19634 • Published • 63
- R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
  Paper • 2502.19735 • Published • 9
- AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
  Paper • 2502.14669 • Published • 14