SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
Abstract
SVGenius is a comprehensive benchmark that evaluates Large Language Models and Multimodal LLMs on SVG processing across three dimensions, understanding, editing, and generation, revealing the capabilities and limitations of current models.
Large Language Models (LLMs) and Multimodal LLMs have shown promising capabilities for SVG processing, yet existing benchmarks suffer from limited real-world coverage, lack of complexity stratification, and fragmented evaluation paradigms. We introduce SVGenius, a comprehensive benchmark comprising 2,377 queries across three progressive dimensions: understanding, editing, and generation. Built on real-world data from 24 application domains with systematic complexity stratification, SVGenius evaluates models through 8 task categories and 18 metrics. We assess 22 mainstream models spanning different scales, architectures, training paradigms, and accessibility levels. Our analysis reveals that while proprietary models significantly outperform open-source counterparts, all models exhibit systematic performance degradation with increasing complexity, indicating fundamental limitations in current approaches. Reasoning-enhanced training proves more effective than pure scaling for overcoming these limitations, though style transfer remains the most challenging capability across all model types. SVGenius establishes the first systematic evaluation framework for SVG processing, providing crucial insights for developing more capable vector graphics models and advancing automated graphic design applications. Appendix and supplementary materials (including all data and code) are available at https://zju-real.github.io/SVGenius.
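The abstract describes the benchmark's structure (three progressive dimensions, complexity stratification, 8 task categories, 18 metrics) but not its data layout. The sketch below is a minimal, hypothetical illustration of how such a query could be represented and scored in Python; the field names (`dimension`, `task`, `complexity`, `reference`) and the single-metric `evaluate` loop are assumptions made for illustration, not the actual SVGenius schema or evaluation code, which are distributed via the repository linked above.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record layout for one benchmark query; the real SVGenius
# schema lives in the linked repository and may differ.
@dataclass
class SVGeniusItem:
    dimension: str   # "understanding" | "editing" | "generation"
    task: str        # one of the 8 task categories
    complexity: str  # stratified level, e.g. "easy" | "medium" | "hard"
    svg_source: str  # SVG markup the query refers to (may be empty for generation)
    prompt: str      # natural-language instruction posed to the model
    reference: str   # gold answer or target SVG consumed by the metrics

def evaluate(items: list[SVGeniusItem],
             model: Callable[[str], str],
             metric: Callable[[str, str], float]) -> dict[str, float]:
    """Average a single metric per dimension; a stand-in for the full 18-metric suite."""
    totals: dict[str, list[float]] = {}
    for item in items:
        prediction = model(f"{item.prompt}\n\n{item.svg_source}".strip())
        totals.setdefault(item.dimension, []).append(metric(prediction, item.reference))
    return {dim: sum(scores) / len(scores) for dim, scores in totals.items()}

if __name__ == "__main__":
    # Toy run with a dummy model and an exact-match metric.
    demo = [SVGeniusItem("understanding", "shape-counting", "easy",
                         "<svg><circle r='5'/></svg>",
                         "How many circles does this SVG contain?", "1")]
    scores = evaluate(demo, model=lambda p: "1",
                      metric=lambda pred, ref: float(pred.strip() == ref))
    print(scores)  # {'understanding': 1.0}
```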
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper; they were recommended by the Semantic Scholar API:
- FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow (2025)
- StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs (2025)
- K12Vista: Exploring the Boundaries of MLLMs in K-12 Education (2025)
- TIIF-Bench: How Does Your T2I Model Follow Your Instructions? (2025)
- PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving (2025)
- PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models (2025)
- SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models (2025)