---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
tags:
- vision
- llm
- critical
- sft
- d3.js
- visualization
---

# VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation

[GitHub Repo](https://github.com/bopan3/VIS-Shepherd)

![VIS-Shepherd Overview](https://github.com/bopan3/VIS-Shepherd/raw/main/static/visShepherd_overview.png)

This repository is the official implementation of **VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation**.

## Requirements

### Common Dependencies

#### Python Environment Setup

To install the requirements for the Python environment (we recommend Python 3.10):

```bash
pip install -r requirements.txt
```

You can use a virtual environment (e.g. conda or venv) to isolate the dependencies.

#### LLaMA-Factory

We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for training and model inference. To reproduce our training experiments, please follow the installation instructions in that repository:

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
# More optional dependencies can be found at https://llamafactory.readthedocs.io/en/latest/getting_started/installation.html
pip install -e ".[torch,metrics,deepspeed]"
```

## Training

The training dataset is available at *train/data/viscrafter_20250521.json*, in the following format:

```json
[
  {
    "input": "the input instruction",
    "output": "the output response",
    "images": [
      "the image path"
    ]
  },
]
```

To train the model(s) in the paper, run this command from the root of the project:

```bash
llamafactory-cli train train/configs/train-sft-full-viscrafter-20250521.yml
```

We trained the model on 8 A800 GPUs (80 GB memory each) using DeepSpeed. See the [LLaMA-Factory documentation](https://llamafactory.readthedocs.io/en/latest/advanced/arguments.html) for additional arguments to adapt the training configuration to your environment.

## Setup Local Inference Server

You can set up an inference server with the following command. It starts an OpenAI-API-compatible server that you can use to test your model.

```bash
llamafactory-cli api train/configs/infer-sft-full-viscrafter-20250521.yml
```

## Evaluation

First, move to the evaluation folder and fill in your API base, API key, and the list of model names to use in *evaluation/config/config.yaml*. Note that we use Azure's API for GPT-4o, the local inference server for locally trained models, and OpenRouter for other models (e.g. llama-4-maverick).

```bash
cd evaluation
```

```yaml
## config for openai key
OPENAI_API_BASE: "put your api base here"
OPENAI_API_KEY: "put your api key here"
OPENAI_API_MODEL_LIST: ["gpt-4o", "qwen/qwen-2.5-vl-7b-instruct", "qwen/qwen2.5-vl-72b-instruct", "meta-llama/llama-4-maverick"]
OPENAI_TEMPERATURE: 0.01
OPENAI_TOP_P: 0.1
```

To run inference on the test dataset with a given model, execute the following command (set `--model_used` to the name of the model used as the critic); the inference results are saved automatically in the *critic_outputs* folder:

```bash
python run_parallel_autoCritic.py --input_base_path test_set --output_base_path critic_outputs --model_used "The name of the LLM used as critic"
```

To run automatic evaluation for all inference results under the *critic_outputs* folder, execute:

```bash
./run_all_autoEvaluate.sh
```

The evaluation results will be saved to *evaluation/result.md*.
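Before running the full evaluation against your locally trained model, you may want to sanity-check the local inference server directly. The snippet below is a minimal, illustrative sketch: it assumes the server listens on the default `http://localhost:8000/v1`, that the `openai` Python package is installed, and that the model name (`local-critic`) and image path are placeholders to replace with your own values.

```python
# Minimal sanity check against the local OpenAI-compatible inference server.
# Assumptions: default endpoint http://localhost:8000/v1; "local-critic" and the
# image path are placeholders, not names defined by this repository.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Encode a chart image so it can be passed as an OpenAI-style image_url payload.
with open("path/to/chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="local-critic",  # replace with the model name served by llamafactory-cli api
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Critique this visualization and point out issues."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
    temperature=0.01,
)
print(response.choices[0].message.content)
```

If the server responds with a coherent critique, the same endpoint and model name can be used in *evaluation/config/config.yaml*.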
## Results

| Model | Mean Score | % Scores 3-5 |
|-------|------------|--------------|
| GPT-4o | 3.41 | 72.0% |
| VIS-Shepherd | 2.98 | 67.1% |
| Llama-4-Maverick | 2.94 | 52.8% |
| Qwen-2.5-VL-72B | 2.78 | 49.1% |
| qwen-2.5-VL-7B_1.2k | 2.5 | 52.2% |
| qwen-2.5-VL-7B_0.3k | 2.4 | 44.1% |
| qwen-2.5-VL-7B | 2.2 | 44.1% |