formality-classifier-mdeberta-v3-base

This model classifies texts by formality. Each input is assigned one of three classes, ["formal", "informal", "neutral"]. The "neutral" class covers texts without a clear formality level, such as passive statements.

Training data was selected and generated with a focus on languages that have distinct formal and informal forms of address (for example German "Sie" vs. "du"), including French, German, Italian, Portuguese and Spanish. Some samples from the osyvokon/pavlick-formality-scores dataset were also included in an attempt to teach the model to classify English inputs.

Results

Accuracy on the test set:

Language        Accuracy
All languages   88.93%
English         79.20%
French          100.00%
German          97.73%
Italian         97.83%
Portuguese      100.00%
Spanish         98.53%
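The overall figure is well below every non-English result. Below is a worked example that compares it with the unweighted (macro) average of the six languages and bounds the share of the test set that English would need to make up. It rests on one assumption the card does not state: that the overall row is pooled (micro) accuracy over a single shared test set.

```python
# Per-language test-set accuracies (%) copied from the table above.
per_language = {
    "English": 79.20,
    "French": 100.0,
    "German": 97.73,
    "Italian": 97.83,
    "Portuguese": 100.0,
    "Spanish": 98.53,
}
overall = 88.93  # the overall row of the table

# Macro average: every language counts equally, regardless of test-set size.
macro = sum(per_language.values()) / len(per_language)

# ASSUMPTION: overall is pooled accuracy, i.e. a weighted average
#   overall = w * english + (1 - w) * rest
# where w is English's share of the test set and rest is the pooled accuracy
# of the other languages. rest is unknown, but as a weighted average it must lie
# between the lowest and highest non-English accuracy, which bounds w.
english = per_language["English"]
others = [acc for lang, acc in per_language.items() if lang != "English"]


def english_share(rest: float) -> float:
    """Solve overall = w * english + (1 - w) * rest for w (increasing in rest)."""
    return (rest - overall) / (rest - english)


low, high = english_share(min(others)), english_share(max(others))
print(f"macro average: {macro:.2f}%")
print(f"implied English share of test set: {low:.0%} to {high:.0%}")
```

Under that assumption, the macro average comes out at about 95.55%, and English would have to account for roughly 47% to 53% of the test examples. That would explain why the overall accuracy sits close to the English result. If the overall figure was computed some other way, this bound does not apply.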

[Figure: confusion matrix, overall and by language]
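A confusion matrix like this, overall and per language, can be computed from gold and predicted labels. The sketch below is pure Python and uses the three class names from this card. Two choices are assumptions, not taken from the original evaluation: rows are gold labels and columns are predictions, and the toy labels are made up, so they do not reproduce the model's results.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Label order for matrix rows and columns (class names from the model card).
LABELS = ["formal", "informal", "neutral"]
Matrix = List[List[int]]


def confusion_matrix(gold: Sequence[str], predicted: Sequence[str]) -> Matrix:
    """Rows are gold labels, columns are predicted labels, both in LABELS order.
    Pairs involving labels outside LABELS are not counted."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in LABELS] for g in LABELS]


def accuracy(matrix: Matrix) -> float:
    """Correct predictions lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def by_language(languages: Sequence[str], gold: Sequence[str],
                predicted: Sequence[str]) -> Dict[str, Matrix]:
    """Group examples by language and build one confusion matrix per language."""
    grouped: Dict[str, Tuple[List[str], List[str]]] = {}
    for lang, g, p in zip(languages, gold, predicted):
        gold_l, pred_l = grouped.setdefault(lang, ([], []))
        gold_l.append(g)
        pred_l.append(p)
    return {lang: confusion_matrix(gl, pl) for lang, (gl, pl) in grouped.items()}


# Toy labels for illustration only; these are NOT the model's test-set results.
langs = ["de", "de", "en", "en", "en"]
gold = ["formal", "informal", "neutral", "formal", "informal"]
pred = ["formal", "informal", "neutral", "neutral", "informal"]

overall = confusion_matrix(gold, pred)
print(overall)
print(f"accuracy: {accuracy(overall):.0%}")
print(by_language(langs, gold, pred)["en"])
```

Off-diagonal cells show which classes get confused. In the toy data, one formal English text is predicted as neutral.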

Usage example

from transformers import pipeline

# Load the formality classifier from the Hugging Face Hub.
pipe = pipeline("text-classification", model="LenDigLearn/formality-classifier-mdeberta-v3-base")

# German examples, ranging from informal ("du", bare imperatives) and formal ("Sie") address to impersonal statements.
print("DE:")
texts_de = [
    "Verschwinde", "Nein", "Ja", "vielleicht", "Warum bist du so?",
    "Können Sie mir spontan dabei helfen?", "Bitte senden Sie uns die nötigen Unterlagen zu.", "Dies müssen Sie selbst entscheiden, wenn Sie den entsprechenden Punkt erreicht haben.", "Sie sind also Herr Müller.", "Bitte helfen Sie mir!",
    "Man muss schon wissen, was dann passiert.", "Als nächstes kommen 4g Champignons und 500g Mehl dazu.", "Bananen sind krumm.", "Das ist eine Tatsache, die unumstößlich ist.", "Hilfestellungen sind unter \"Hilfe\" zu finden."
]
# For a single string, the pipeline returns a one-element list: [{"label": ..., "score": ...}].
for text in texts_de:
    print(pipe(text))

# English counterparts of the German examples.
print("-----------\nEN:")
texts_en = [
    "Piss off", "No", "Yes", "maybe", "Why are you like this?",
    "Could you help me spontaneously?", "Please send me the necessary documents.", "You will have to decide this individually as soon as you have reached the relevant point.", "I presume you are Mr. Müller?", "Please offer me your support!",
    "One would have to know what happens then.", "Then, we add 4g Mushrooms and 500g flour.", "Bananas are usually curved.", "That is an irrefutable fact.", "You can find helpful tutorials under \"help\"."
]
for text in texts_en:
    print(pipe(text))
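The usage example classifies one text per call. A text-classification pipeline also accepts a list of strings and returns one {"label": ..., "score": ...} dict per input. The sketch below relies on that to classify a batch in one call and count the resulting labels. The helper names and the `stub_classifier` are hypothetical. The stub is an offline stand-in so the sketch runs without downloading the model, and it does not reflect real predictions.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Assumed output shape: called on a list of strings, the classifier returns
# one {"label": str, "score": float} dict per input, in input order.
Prediction = Dict[str, Any]
Classifier = Callable[[List[str]], List[Prediction]]


def label_texts(classifier: Classifier, texts: Sequence[str]) -> List[Tuple[str, str, float]]:
    """Classify all texts in a single batched call; pair each text with its label and score."""
    predictions = classifier(list(texts))
    return [(text, str(p["label"]), float(p["score"])) for text, p in zip(texts, predictions)]


def label_counts(labelled: List[Tuple[str, str, float]]) -> Counter:
    """Count how many texts received each formality label."""
    return Counter(label for _, label, _ in labelled)


def stub_classifier(texts: List[str]) -> List[Prediction]:
    """Hypothetical offline stand-in for the real pipeline (illustration only):
    it labels every text "neutral" with a dummy score of 1.0."""
    return [{"label": "neutral", "score": 1.0} for _ in texts]


labelled = label_texts(stub_classifier, ["Nein", "Bitte helfen Sie mir!"])
for text, label, score in labelled:
    print(f"{label:>8}  {score:.2f}  {text}")
print(label_counts(labelled))
```

With the real model, pass `pipe` from the usage example instead of the stub, for example `label_texts(pipe, texts_de)`. Batching in one call lets the pipeline handle all inputs together rather than invoking it once per string.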
Model size: 279M params (Safetensors, tensor type F32)