ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation
Abstract
The paper addresses legal claim generation for non-professionals, introducing a dataset and an evaluation metric, and highlights the limitations of current models in factual precision and clarity.
Legal claims are the plaintiff's demands in a case and are essential for guiding judicial reasoning and case resolution. While much prior work has focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from a variety of real-world legal disputes. Additionally, we design an evaluation metric tailored to assessing generated claims, which covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System (2025)
- GLARE: Agentic Reasoning for Legal Judgment Prediction (2025)
- MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction (2025)
- VLQA: The First Comprehensive, Large, and High-Quality Vietnamese Dataset for Legal Question Answering (2025)
- Nyay-Darpan: Enhancing Decision Making Through Summarization and Case Retrieval for Consumer Law in India (2025)
- When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance (2025)
- Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments (2025)