---
license: cc-by-nc-4.0
datasets:
- THUdyh/Oryx-SFT-Data
language:
- en
- zh
metrics:
- accuracy
base_model:
- google/siglip-so400m-patch14-384
- Qwen/Qwen2.5-0.5B-Instruct
library_name: transformers
---

# LLaVA-Scissor-baseline-0.5B

## Model Summary

This repository contains the baseline model used in LLaVA-Scissor.
This model is an enhanced version of [LLaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-ov): it pairs the [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) vision encoder with the [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) language model, and is finetuned on the [Oryx SFT data](https://huggingface.co/datasets/THUdyh/Oryx-SFT-Data).
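
As a quick orientation, here is a minimal loading sketch, assuming the checkpoint follows the LLaVA-OneVision format supported by recent `transformers` releases; the `model_id`, image path, and prompt below are placeholders, not values taken from this card:

```python
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Hypothetical repo id -- replace with the actual Hugging Face repo for this model.
model_id = "LLaVA-Scissor-baseline-0.5B"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(model_id)

# Build a single-image chat prompt using the processor's chat template.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(images=image, text=prompt, return_tensors="pt")

# Generate a short reply and decode it back to text.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```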