VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Abstract
VisCode-200K, a large-scale dataset for visualization, improves plot generation performance by integrating execution-grounded supervision and iterative code correction, outperforming open-source models and rivaling proprietary ones.
Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction-tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
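The self-debug idea mentioned in the abstract (run the code, feed the runtime error back, retry) can be illustrated with a minimal sketch. This is not the paper's evaluation harness: `run_snippet`, `self_debug`, `toy_repair`, and the `max_rounds` default are all hypothetical names and values chosen for illustration, and `toy_repair` merely stands in for a model call that would revise the code given the traceback.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_snippet(code: str) -> Optional[str]:
    """Execute code in a fresh namespace.

    Returns None on success, otherwise the traceback text.
    """
    try:
        exec(compile(code, "<candidate>", "exec"), {"__name__": "__candidate__"})
        return None
    except Exception:
        return traceback.format_exc()


def self_debug(
    code: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,  # assumed budget, not a value taken from the paper
) -> Tuple[str, bool, int]:
    """Retry failing code, passing each traceback to `repair`.

    Returns (final_code, succeeded, repair_rounds_used).
    """
    for rounds_used in range(max_rounds + 1):
        error = run_snippet(code)
        if error is None:
            return code, True, rounds_used
        if rounds_used < max_rounds:
            code = repair(code, error)
    return code, False, max_rounds


def toy_repair(code: str, error: str) -> str:
    """Hypothetical stand-in for a model: fixes one known typo on NameError."""
    if "NameError" in error:
        return code.replace("valeus", "values")
    return code


faulty = "values = [1, 2, 3]\ntotal = sum(valeus)\n"
fixed, ok, rounds = self_debug(faulty, toy_repair)
```

In a real setting, `repair` would prompt a model with the failing code and the traceback, and a plotting benchmark would additionally check the rendered figure, which this sketch does not attempt.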
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts (2025)
- VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation (2025)
- SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs (2025)
- Training Language Models to Generate Quality Code with Program Analysis Feedback (2025)
- ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects (2025)
- FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks (2025)
- Structured Prompting and Feedback-Guided Reasoning with LLMs for Data Interpretation (2025)
Models citing this paper 2
Datasets citing this paper 1
Spaces citing this paper 0