---
license: apache-2.0
---

# VCoder-DS LLaVA-1.5-7b

VCoder-DS LLaVA-1.5-7b was trained on the COST training dataset in December 2023. It uses the pretrained [LLaVA-1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) model weights. It was introduced by Jain et al. in [this repository](https://github.com/SHI-Labs/VCoder).

VCoder is an adapter that improves existing Vision LLMs at object-level perception tasks by using perception modalities as control inputs, while retaining performance on other tasks.
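
This checkpoint is intended to be loaded through the VCoder codebase rather than plain `transformers`. As a minimal sketch, the weights can be fetched locally with `huggingface_hub` as shown below; note that the repo id used here is an assumption, and the downstream loading step lives in the linked GitHub repository, so check it for the exact entry points.

```python
# Minimal sketch: download the checkpoint files locally, then load them
# with the VCoder codebase (https://github.com/SHI-Labs/VCoder).
# NOTE: the repo id below is an assumption; replace it with this model's
# actual Hugging Face repo id if it differs.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="shi-labs/vcoder_ds_llava-v1.5-7b")
print(f"Checkpoint downloaded to: {local_dir}")
```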

![img](https://praeclarumjj3.github.io/vcoder/vcoder.svg)

### Citation

```bibtex
@article{jain2023vcoder,
  title={{VCoder: Versatile Visual Encoder for Accurate Object-Level Perception with Large Language Models}},
  author={Jitesh Jain and Jianwei Yang and Humphrey Shi},
  journal={arXiv},
  year={2023}
}
```