Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
Abstract
EvoPresent, a self-improvement agent framework using multi-task reinforcement learning, enhances academic paper promotion by generating coherent narratives, aesthetically pleasing designs, and realistic presentations, supported by a comprehensive benchmark.
The promotion of academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of those challenges is a simple principle: there is no way to improve it when you cannot evaluate it right. To address this, we introduce EvoPresent, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is PresAesth, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even under limited aesthetic training data. To systematically evaluate the methods, we introduce EvoPresent Benchmark, a comprehensive benchmark comprising: Presentation Generation Quality, built on 650 top-tier AI conference papers with multimodal resources (slides, videos and scripts) to assess both content and design; and Aesthetic Awareness, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) High-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction. (ii) Automated generation pipelines exhibit a trade-off between visual design and content construction. (iii) Multi-task RL training shows stronger generalization in aesthetic awareness tasks.
Community
Website: https://evopresent.github.io/
Disclaimer: the demo video was completely generated by our proposed method, no human refinement.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs (2025)
- PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation (2025)
- Interleaving Reasoning for Better Text-to-Image Generation (2025)
- LLM-I: LLMs are Naturally Interleaved Multimodal Creators (2025)
- Factuality Matters: When Image Generation and Editing Meet Structured Visuals (2025)
- Code2Video: A Code-centric Paradigm for Educational Video Generation (2025)
- Auto-Slides: An Interactive Multi-Agent System for Creating and Customizing Research Presentations (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper