SOLO Model Card
Model details
Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as inputs, without using a separate pre-trained vision encoder.
Model date: SOLO-7B was trained in June 2024.
Paper or resources for more information: Paper & Github
Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues
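To make the "raw image patches (in pixels)" input more concrete, here is a minimal sketch of how an image could be cut into flattened pixel patches before being fed to a single Transformer. It is an illustration only, not SOLO's actual preprocessing: the function name `patchify`, the patch size, the row-major ordering, and the nested-list image representation are all assumptions made for this example.

```python
from typing import List

# Hypothetical image representation: height x width x channels of pixel values.
Image = List[List[List[int]]]


def patchify(image: Image, patch_size: int) -> List[List[int]]:
    """Split an image into non-overlapping square patches.

    Each patch is flattened into one list of raw pixel values, and patches
    are returned in row-major order (left to right, top to bottom).
    The patch size is a free parameter here, not SOLO's real setting.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    # Append the channel values of this pixel unchanged.
                    patch.extend(image[row][col])
            patches.append(patch)
    return patches


# Toy 4x4 RGB image whose pixel at (r, c) is [r, c, 0].
toy = [[[r, c, 0] for c in range(4)] for r in range(4)]
patches = patchify(toy, patch_size=2)
print(len(patches), len(patches[0]))  # 4 patches, 12 values each
```

In a model of this kind, each flattened patch would typically be mapped to an embedding by a learned linear projection and placed in the same sequence as the text tokens; the details of how SOLO does this are in the paper and repository linked above.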
Inference with Hugging Face: Please check these scripts for an example of performing inference with the model.