---
library_name: transformers
license: openrail++
datasets:
- textdetox/multilingual_paradetox
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- en
- ru
- uk
- am
- de
- es
- zh
- ar
- hi
pipeline_tag: text2text-generation
---
# Model Card for tox-mt0-xl
A fine-tune of the mt0-xl model for the text toxification task.
## Model Details
### Model Description
This is a fine-tune of the mt0-xl model for the text toxification task. It can be used to generate synthetic toxic data from non-toxic examples.
- **Developed by:** Nikita Sushko
- **Model type:** mt5-xl
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** OpenRail++
- **Finetuned from model:** mt0-xl
## Uses
This model is intended for generating synthetic toxic data from non-toxic examples, for instance to augment training sets for detoxification models.
### Direct Use
The model may be used directly for text toxification: rewriting a non-toxic sentence into a toxic paraphrase.
### Out-of-Scope Use
The model should not be used to generate toxic content directed at real people, or for harassment and abuse outside of research and data-generation settings.
## Bias, Risks, and Limitations
Since this model deliberately generates toxic versions of sentences, its outputs can be offensive and abusive. Handle generated data with care and restrict its use to research purposes.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import transformers
checkpoint = 'chameleon-lizard/tox-mt0-xl'
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")
pipe = transformers.pipeline(
"text2text-generation",
model=model,
tokenizer=tokenizer,
max_length=512,
truncation=True,
)
language = 'English'
text = "That's dissapointing."
print(pipe(f'Rewrite the following text in {language} the most toxic and obscene version possible: {text}')[0]['generated_text'])
# Resulting text: "That's dissapointing, you stupid ass bitch."
```
Be sure to use the prompt format shown above for the best performance. Omitting the target language may cause the model to respond in a random language.
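Because the prompt format matters, it can help to build prompts with a small helper function. The sketch below assumes the prompt template from the example above; the helper name `make_toxify_prompt` is ours, not part of the released code.

```python
def make_toxify_prompt(language: str, text: str) -> str:
    # Builds the exact instruction prompt the model expects:
    # target language first, then the source sentence.
    return (
        f"Rewrite the following text in {language} "
        f"the most toxic and obscene version possible: {text}"
    )
```

With the pipeline from the snippet above, a batch of sentences can then be toxified in one call, e.g. `pipe([make_toxify_prompt('English', s) for s in sentences])`.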