Muhammad Ramzan
iamramzan
AI & ML interests
GenAI, Vision & Co
Organizations
Vision Foundation Models 🧩
Foundation models for computer vision.
Comprehensive Computer Vision Backbones 🧩
This collection offers a variety of pre-trained computer vision backbones ideal for fine-tuning.
- microsoft/resnet-50
  Image Classification • 0.0B • Updated • 281k • 428
- google/vit-base-patch16-224-in21k
  Image Feature Extraction • 0.1B • Updated • 3.7M • 350
- google/vit-base-patch32-224-in21k
  Image Feature Extraction • 0.1B • Updated • 8.65k • 19
- facebook/dinov2-large
  Image Feature Extraction • 0.3B • Updated • 678k • 86
Top Vision-Language Papers 🖼️💬📝
A curated list of papers on vision-language models, with the most influential ones at the top.
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 38
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 47
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 9
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
  Paper • 2404.01331 • Published • 28
Cutting-Edge Object Detection Models 🥥
- facebook/detr-resnet-50
  Object Detection • 0.0B • Updated • 491k • 874
- facebook/detr-resnet-101-dc5
  Object Detection • 0.1B • Updated • 3.68k • 19
- facebook/detr-resnet-50-dc5
  Object Detection • 0.0B • Updated • 1.96k • 6
- google/owlvit-base-patch32
  Zero-Shot Object Detection • 0.2B • Updated • 147k • 135
Shaheen Collection 🦅