ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
Abstract
Detectors of text generated by Large Language Models (LLMs) can make incorrect decisions with grave consequences, such as undermining a student's academic dignity. LLM text detection therefore needs to make its decisions interpretable, so that users can judge how reliable a prediction is. When humans verify whether a text is human-written or LLM-generated, they intuitively check which of the two kinds of text it shares more similar spans with. However, existing interpretable detectors are not aligned with this human decision-making process and fail to offer evidence that users can easily understand. To bridge this gap, we introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process for verifying the origin of a text. ExaGPT classifies a text by checking whether it shares more similar spans with human-written or with LLM-generated texts in a datastore. As evidence, it can present, for each span in the text, the similar span examples that contributed to the decision. Our human evaluation demonstrates that providing similar span examples helps users judge the correctness of a decision more effectively than existing interpretable methods do. Moreover, extensive experiments in four domains and with three generators show that ExaGPT outperforms strong prior detectors by up to 40.9 accuracy points at a false positive rate of 1%.
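The span-comparison idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word n-gram segmentation (`spans`), the word-set Jaccard score (`similarity`), the majority vote, the tie-breaking rule, and the toy datastore sentences are all assumptions chosen only to keep the example small and self-contained.

```python
def spans(text, n=3):
    """Overlapping word n-grams; a hypothetical stand-in for real span segmentation."""
    words = text.lower().split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def similarity(a, b):
    """Word-set Jaccard overlap; a hypothetical stand-in for a learned similarity."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def best_match(span, pool):
    """Most similar example span in a non-empty pool, as (score, example)."""
    return max(((similarity(span, ex), ex) for ex in pool), key=lambda t: t[0])


def detect(text, human_pool, llm_pool):
    """Label each span by its closer pool, then take a majority vote.

    Returns the overall label plus per-span evidence tuples:
    (span, span_label, closest_human, closest_llm).
    """
    evidence = []
    for span in spans(text):
        h = best_match(span, human_pool)
        m = best_match(span, llm_pool)
        # Assumption: ties go to "human" to keep false positives low.
        evidence.append((span, "llm" if m[0] > h[0] else "human", h, m))
    llm_spans = sum(1 for _, span_label, _, _ in evidence if span_label == "llm")
    label = "llm" if llm_spans > len(evidence) / 2 else "human"
    return label, evidence


# Toy datastore; these sentences are invented for illustration.
human_pool = spans("honestly i just winged the essay last night")
llm_pool = spans("in conclusion it is important to note that")

label, evidence = detect("it is important to note that results vary",
                         human_pool, llm_pool)
print(label)
for span, span_label, h, m in evidence:
    print(f"{span!r} -> {span_label} (closest LLM example: {m[1]!r}, score {m[0]:.2f})")
```

The per-span evidence list is the point of the sketch: each span is returned together with its closest human-written and LLM-generated examples, so a reader can inspect why the overall label was chosen rather than seeing only a score.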