Improve model card: Add pipeline tag, paper link, and GitHub repository link (#1)
- Improve model card: Add pipeline tag, paper link, and GitHub repository link (b8e0608de4d13fc308a7f7eb0dd913957341729c)
Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED

@@ -1,16 +1,23 @@
 ---
-
+base_model:
+- google/siglip-so400m-patch14-384
+- Qwen/Qwen2.5-7B-Instruct
 datasets:
 - THUdyh/Oryx-SFT-Data
 language:
 - en
 - zh
+library_name: transformers
+license: cc-by-nc-4.0
 metrics:
 - accuracy
-
-
-
-
+pipeline_tag: video-text-to-text
+tags:
+- llava
+- llava-scissor
+- llava-onevision
+- llava-ov
+- token-compression
 model-index:
 - name: llava-onevision-qwen-7b-ov
   results:
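Not part of the commit itself: once this metadata change is merged, the updated front matter can be sanity-checked programmatically. The sketch below assumes the `huggingface_hub` `ModelCard` API and an assumed repo id of `HumanMLLM/LLaVA-Scissor-baseline-7B` (the repo id is not stated in this diff).

```python
# Sketch: verify that the new model-card metadata is picked up after the merge.
# The repo id is an assumption based on the model name and the GitHub org.
from huggingface_hub import ModelCard

card = ModelCard.load("HumanMLLM/LLaVA-Scissor-baseline-7B")  # assumed repo id
meta = card.data.to_dict()

print(meta.get("pipeline_tag"))  # expected: video-text-to-text
print(meta.get("license"))       # expected: cc-by-nc-4.0
print(meta.get("base_model"))    # expected: SigLIP encoder and Qwen2.5-7B-Instruct
```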
@@ -74,16 +81,14 @@ model-index:
       value: 40.55
       name: accuracy
       verified: true
-tags:
-- llava
-- llava-scissor
-- llava-onevision
-- llava-ov
-- token-compression
 ---

 # LLaVA-Scissor-baseline-7B

+This repository contains the baseline model for [LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs](https://huggingface.co/papers/2506.21862).
+
+Code: https://github.com/HumanMLLM/LLaVA-Scissor
+
 ## Model Summary
 This repository contains the baseline model used in LLaVA-Scissor.
 This model is an enhanced version of the [LLaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov) model with a [SIGLIP](https://huggingface.co/google/siglip-so400m-patch14-384) vision encoder and the [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) large language model, finetuned on [Oryx](https://huggingface.co/datasets/THUdyh/Oryx-SFT-Data) data.
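For orientation, the usage section of the card (partially visible in the next hunk) follows the LLaVA-OneVision inference recipe. Loading this checkpoint typically looks like the sketch below; the repo id and the `"llava_qwen"` model name are assumptions based on the upstream LLaVA codebase, not something this diff shows.

```python
# Sketch: load the baseline checkpoint with the LLaVA-OneVision codebase
# (as used by the LLaVA-Scissor GitHub repo). Repo id and model name are assumptions.
from llava.model.builder import load_pretrained_model

pretrained = "HumanMLLM/LLaVA-Scissor-baseline-7B"  # assumed Hub repo id for this checkpoint
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, "llava_qwen", device_map="auto"  # "llava_qwen" is the usual OneVision model name
)
model.eval()
```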
@@ -140,7 +145,8 @@ image_tensors.append(frames)

 # Prepare conversation input
 conv_template = "qwen_2"
-question = f"{DEFAULT_IMAGE_TOKEN}\nDescribe this video."
+question = f"{DEFAULT_IMAGE_TOKEN}
+Describe this video."
 conv = copy.deepcopy(conv_templates[conv_template])
 conv.append_message(conv.roles[0], question)
 conv.append_message(conv.roles[1], None)