
Orkut Murat Yılmaz

orkut

AI & ML interests

Geo Sciences, Free Software

Recent Activity

liked a Space 12 days ago
enzostvs/deepsite
liked a model 26 days ago
manycore-research/SpatialLM-Llama-1B
liked a dataset about 1 month ago
alibayram/yapay_zeka_turkce_mmlu_model_cevaplari

Organizations

Mathematical Intelligence · GeoPerformans Ar-Ge Bilişim Haritacılık Sanayi ve Ticaret Limited Şirketi · Karakulaklar

orkut's activity

upvoted an article 2 months ago
Open-source DeepResearch – Freeing our search agents

reacted to merve's post with 🔥 9 months ago
We have recently merged Video-LLaVA into transformers! 🤗🎞️
What makes this model different?

Demo: llava-hf/video-llava
Model: LanguageBind/Video-LLaVA-7B-hf

Compared to other models that take image and video input and either project them separately or downsample the video and project only selected frames, Video-LLaVA converts images and videos into a unified representation and projects them through a shared projection layer.

It uses Vicuna 1.5 as the language model, and LanguageBind's own encoders, which are based on OpenCLIP. These encoders map both modalities to a unified representation before it is passed to the projection layer.
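The shared-projection idea described above can be sketched minimally. This is an illustration only: the dimensions and the NumPy matrix standing in for the real encoders and learned projection are hypothetical, not taken from the actual checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
d_enc, d_llm = 1024, 4096  # hypothetical encoder / LLM hidden sizes

# One projection matrix shared by BOTH modalities -- the key design choice.
W_shared = rng.standard_normal((d_enc, d_llm)) * 0.02

def project(features: np.ndarray) -> np.ndarray:
    """Map encoder features (n_tokens, d_enc) into the LLM embedding space."""
    return features @ W_shared

image_feats = rng.standard_normal((256, d_enc))      # e.g. 256 patch tokens
video_feats = rng.standard_normal((8 * 256, d_enc))  # e.g. 8 frames x 256 tokens

img_tokens = project(image_feats)
vid_tokens = project(video_feats)

# Both modalities land in the same embedding space, so the language model
# receives a unified sequence of tokens.
assert img_tokens.shape == (256, d_llm)
assert vid_tokens.shape == (8 * 256, d_llm)
```

Because a single matrix handles both inputs, image and video tokens are directly comparable once they reach the language model, which is what enables the joint reasoning described below.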


I feel like one of the coolest features of this model is its joint understanding of images and videos, which many recent models have also introduced.

It's a relatively older model, but it was ahead of its time and still works very well! That means you can, e.g., pass the model an image of a cat and a video of a cat and ask whether the cat in the image appears in the video 🤩
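The joint image + video query above can be sketched with the transformers integration. The prompt layout and dummy inputs below are illustrative; loading the LanguageBind/Video-LLaVA-7B-hf checkpoint and generating (commented out) requires a recent transformers version and substantial GPU memory, so treat it as a sketch rather than a tested recipe.

```python
import numpy as np

# Hypothetical inputs: an 8-frame "video" and a single "image" as raw RGB
# arrays. In practice you would decode real media (e.g. with av or decord).
video = np.zeros((8, 224, 224, 3), dtype=np.uint8)
image = np.zeros((224, 224, 3), dtype=np.uint8)

# Video-LLaVA interleaves both modalities in one prompt via special tokens.
prompt = (
    "USER: <image>\n<video>\n"
    "Does the cat in the image also appear in the video? ASSISTANT:"
)

# The heavy part (sketch; needs a GPU and the ~7B checkpoint):
# from transformers import (VideoLlavaProcessor,
#                           VideoLlavaForConditionalGeneration)
# processor = VideoLlavaProcessor.from_pretrained(
#     "LanguageBind/Video-LLaVA-7B-hf")
# model = VideoLlavaForConditionalGeneration.from_pretrained(
#     "LanguageBind/Video-LLaVA-7B-hf", device_map="auto")
# inputs = processor(text=prompt, images=image, videos=video,
#                    return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=60)
# print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

The `<image>` and `<video>` placeholders are where the projected tokens from each modality are spliced into the prompt, which is what lets one question reason over both inputs at once.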