---
title: README
emoji: 🚀
colorFrom: green
colorTo: indigo
sdk: static
pinned: false
license: openrail++
short_description: 'TextDetox: detoxification, toxicity detection, explanation'
---

# Multilingual Text Detoxification with Parallel Data

Text detoxification, toxicity detection, and explanation for **diverse languages**: English, Spanish, German, French, Italian, Chinese, Japanese, Arabic, Hebrew, Hindi, Ukrainian, Russian, Tatar, Amharic.

By many researchers from all over the world 🌍

Support for better, safer, and multicultural online spaces.

📰 [Read about the project in press](https://toloka.ai/blog/can-llms-eliminate-toxicity-in-human-and-ai-generated-content-what-multilingual-research-shows/)

📹 [PyData&CPyConf Berlin 2023 talk](https://youtu.be/8I5tZvcmIis?si=y4sLgrW2xfwJC_GP)

**[2025] !!!NOW OPEN!!! TextDetox CLEF2025 shared task** [website](https://pan.webis.de/clef25/pan25-web/text-detoxification.html) 🤗[Starter Kit](https://huggingface.co/collections/textdetox/textdetox-2025-starter-kit-67dc3a8fd86111cac961ecc8)

**[2025] COLING2025**: Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Alekhseevich Moskovskiy, Elisei Stakovskii, Eran Kaufman, Ashraf Elnagar, Animesh Mukherjee, and Alexander Panchenko. 2025. ***Multilingual and Explainable Text Detoxification with Parallel Corpora***. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7998–8025, Abu Dhabi, UAE. Association for Computational Linguistics. [pdf](https://aclanthology.org/2025.coling-main.535/)

**[2024] TextDetox2024 Report**: Daryna Dementieva, Daniil Moskovskiy, Nikolay Babakov, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Dmitry Ustalov, Elisei Stakovskii, Alisa Smirnova, Ashraf Elnagar, Animesh Mukherjee, and Alexander Panchenko. ***"Overview of the Multilingual Text Detoxification Task at PAN 2024."*** Working Notes of CLEF (2024).
[pdf](https://ceur-ws.org/Vol-3740/paper-223.pdf)

**[2024] MultiParaDetox @ NAACL2024**: Daryna Dementieva, Nikolay Babakov, and Alexander Panchenko. ***"MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages."*** Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024. [pdf](https://aclanthology.org/2024.naacl-short.12/)

**[2024] TextDetox CLEF2024 shared task** [website](https://pan.webis.de/clef24/pan24-web/text-detoxification.html)

**[2022] The first Parallel Text Detoxification datasets**: [English ParaDetox](https://huggingface.co/datasets/s-nlp/paradetox) and [Russian ParaDetox](https://huggingface.co/datasets/s-nlp/ru_paradetox)

## Contact

We are happy to extend our research to more languages, cultures, and dimensions 😉

Please contact: [Daryna Dementieva](https://huggingface.co/dardem)