Orr Zohar PRO

orrzohar

https://orrzohar.github.io

AI & ML interests

Large Multi-Modal Models, Foundation Models, Video Understanding

Recent Activity

upvoted a paper 4 days ago

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

upvoted a paper 10 days ago

RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation

upvoted a paper 12 days ago

Describe Anything: Detailed Localized Image and Video Captioning

orrzohar's activity

commented a paper 26 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 28 days ago • 179

commented a paper 27 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 28 days ago • 179

New activity in google/gemma-3-27b-it 29 days ago

SigLIP or SigLIP2 encoder?

#48 opened about 1 month ago by

New activity in google/gemma-3-4b-it about 1 month ago

SigLIP or SigLIP2 encoder?

#37 opened about 1 month ago by

New activity in HuggingFaceTB/SmolVLM2-2.2B-Instruct 2 months ago

Input Video length constraints

#6 opened 2 months ago by

Several questions on the same video

#8 opened 2 months ago by

checkpoint you are trying to load has model type `smolvlm` but Transformers does not recognize this

#7 opened 2 months ago by

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (CUDABFloat16Type) should be the same

#4 opened 2 months ago by

Using pre-computed embeddings for images/frames and using as input

#2 opened 2 months ago by

commented 2 papers 5 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 146

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 146

New activity in lmms-lab/LLaVA-OneVision-Data 7 months ago

Missing/corrupted images in dataset

#9 opened 8 months ago by

commented 3 papers 9 months ago

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 53

$VILA^2$: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 42

$VILA^2$: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 42

commented a paper 10 months ago

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Paper • 2407.06189 • Published Jul 8, 2024 • 27

New activity in HuggingFaceM4/idefics2-8b 11 months ago

Idefics2-pretraining

#54 opened 12 months ago by

New activity in meta-llama/Meta-Llama-3-8B-Instruct about 1 year ago

The request to access the repo has been sent for several days, why hasn't it passed yet?

#70 opened about 1 year ago by