Quantization made by Richard Erkhov.

This repository contains a quantized version of the model presented in Visually Descriptive Language Model for Vector Graphics Reasoning.

Github | Discord | Request more models

Project page: https://mikewangwzhl.github.io/VDLM/
Code: https://github.com/MikeWangWZHL/VDLM

PVD-160k-Mistral-7b - GGUF

Original model description:

license: apache-2.0
datasets:
- mikewang/PVD-160K

Text-Based Reasoning About Vector Graphics

🌐 Homepage | 📃 Paper | 🤗 Data (PVD-160k) | 🤗 Model (PVD-160k-Mistral-7b) | 💻 Code

We observe that current large multimodal models (LMMs) still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.

[Teaser figure]

To address this challenge, we propose the Visually Descriptive Language Model (VDLM), a visual reasoning framework that operates on intermediate text-based visual descriptions: SVG representations and a learned Primal Visual Description (PVD), which can be directly integrated into existing LLMs and LMMs. We demonstrate that VDLM outperforms state-of-the-art large multimodal models, such as GPT-4V, across various multimodal reasoning tasks involving vector graphics. See our paper for more details.

[Overview figure]
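As a rough sketch of the pipeline described above, the Python below illustrates the data flow from image to SVG to PVD to a text-only LLM. Every helper here (image_to_svg, svg_to_pvd, answer_with_llm) is a hypothetical placeholder for illustration only, not the actual VDLM API; see the project code for the real implementation.

```python
# Hypothetical sketch of a VDLM-style pipeline:
# image -> SVG -> Primal Visual Description (PVD) -> text-only LLM.
# All helpers below are placeholders, not the real VDLM API.

def image_to_svg(image_path: str) -> str:
    # Placeholder: VDLM first vectorizes the raster image into SVG
    # using an off-the-shelf image-to-SVG converter.
    return "<svg>...</svg>"

def svg_to_pvd(svg: str) -> str:
    # Placeholder: a fine-tuned model (e.g., PVD-160k-Mistral-7b)
    # maps raw SVG into a higher-level text description of primitives.
    return '{"shape": "circle", "center": [50, 50], "radius": 10}'

def answer_with_llm(pvd: str, question: str) -> str:
    # Placeholder: any off-the-shelf LLM can consume the text-based
    # description, since no pixel input is needed at this stage.
    prompt = f"Visual description:\n{pvd}\n\nQuestion: {question}\nAnswer:"
    return f"(LLM answer for prompt: {prompt[:40]}...)"

def vdlm_answer(image_path: str, question: str) -> str:
    svg = image_to_svg(image_path)
    pvd = svg_to_pvd(svg)
    return answer_with_llm(pvd, question)

print(vdlm_answer("maze.png", "Can the mouse reach the cheese?"))
```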

Model size: 7.24B params
Architecture: llama

Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
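Since this repository provides GGUF files, one common way to run them locally is llama.cpp or its Python bindings. Below is a minimal sketch using llama-cpp-python; the quantized filename shown (PVD-160k-Mistral-7b.Q4_K_M.gguf) is an assumption, so substitute the path of whichever quant you actually download.

```python
# Minimal sketch: running a GGUF quantization of PVD-160k-Mistral-7b
# with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="PVD-160k-Mistral-7b.Q4_K_M.gguf",  # assumed filename; use your downloaded quant
    n_ctx=4096,  # context window; adjust to your memory budget
)

# Ask the model to describe an SVG in terms of primitive visual concepts,
# the task PVD-160k-Mistral-7b was fine-tuned for.
prompt = "Describe the objects in the following SVG:\n<svg>...</svg>\n"
output = llm(prompt, max_tokens=512, temperature=0.0)
print(output["choices"][0]["text"])
```

Lower-bit quantizations (2-bit, 3-bit) trade answer quality for smaller files and lower memory use; the 4-bit and 5-bit variants are a common middle ground for 7B models.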
