AI & ML interests
Natural Language Processing, Image Generation
Collections

- Reasoning in the pixel space
- Advancing LLMs' general reasoning capabilities
- Video Mamba
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs (models and datasets)
- PixelWorld
- The dataset and models for Critique Fine-Tuning:
  - TIGER-Lab/WebInstruct-CFT (dataset)
  - TIGER-Lab/Qwen2.5-Math-7B-CFT (text generation, 8B)
  - TIGER-Lab/Qwen2.5-32B-Instruct-CFT (text generation, 33B)
  - Paper: Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703)
- The generalist image editing model
- The VLM2Vec embedding models
- The datasets and models for the MAmmoTH project
- ImagenHub
- The structure-knowledge-grounded language model
- The Mantis model family, optimized for multi-image reasoning with an interleaved text/image format
- Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
- Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
- The pioneering work in Dialogue-driven Movie Shot Generation
- SoTA VLM for Reasoning:
  - TIGER-Lab/VL-Rethinker-72B (visual question answering, 73B)
  - TIGER-Lab/VL-Rethinker-7B (image-text-to-text, 8B)
  - TIGER-Lab/VL-Reasoner-72B (visual question answering, 73B)
  - Paper: VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning (arXiv:2504.08837)
- Scaling up multimodal data:
  - TIGER-Lab/VisualWebInstruct-Recall (dataset)
  - TIGER-Lab/VisualWebInstruct-Seed (dataset)
  - TIGER-Lab/VisualWebInstruct (dataset)
  - Paper: VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search (arXiv:2503.10582)
- The dataset and model for MAmmoTH-VL
- Video Augmentation for Synthetic Video Instruction-following Data Generation:
  - TIGER-Lab/VISTA-LongVA (video-text-to-text, 8B)
  - TIGER-Lab/VISTA-Mantis (video-text-to-text, 8B)
  - TIGER-Lab/VISTA-VideoLLaVA (video-text-to-text, 7B)
  - Paper: VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation (arXiv:2412.00927)
- TIGERScore checkpoint variants and the associated dataset
- The dataset and model for the UniIR project
- ConsistI2V image-to-video generation models
- Scaling up instruction data from the web to build better LLMs
- Long-context research projects