papluca/xlm-roberta-base-language-detection

Text Classification

Generated from Trainer


papluca committed on Dec 28, 2023

Commit 9865598 · 1 Parent(s): 8442e3d

Add model usage info

Files changed (1)

README.md +45 -0

README.md CHANGED

@@ -103,6 +103,51 @@ As a baseline to compare `xlm-roberta-base-language-detection` against, we have
 |vi        |0.971      |0.990   |0.980     |500      |
 |zh        |1.000      |1.000   |1.000     |500      |
+## How to get started with the model
+The easiest way to use the model is via the high-level `pipeline` API:
+```python
+from transformers import pipeline
+text = [
+    "Brevity is the soul of wit.",
+    "Amor, ch'a nullo amato amar perdona."
+]
+model_ckpt = "papluca/xlm-roberta-base-language-detection"
+pipe = pipeline("text-classification", model=model_ckpt)
+pipe(text, top_k=1, truncation=True)
+```
+Alternatively, you can run the tokenizer and model separately:
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+text = [
+    "Brevity is the soul of wit.",
+    "Amor, ch'a nullo amato amar perdona."
+]
+model_ckpt = "papluca/xlm-roberta-base-language-detection"
+tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
+model = AutoModelForSequenceClassification.from_pretrained(model_ckpt)
+inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
+with torch.no_grad():
+    logits = model(**inputs).logits
+preds = torch.softmax(logits, dim=-1)
+# Map raw predictions to languages
+id2lang = model.config.id2label
+vals, idxs = torch.max(preds, dim=1)
+{id2lang[k.item()]: v.item() for k, v in zip(idxs, vals)}
+```
 ## Training procedure
 Fine-tuning was done via the `Trainer` API. Here is the [Colab notebook](https://colab.research.google.com/drive/15LJTckS6gU3RQOmjLqxVNBmbsBdnUEvl?usp=sharing) with the training code.
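The last line of the second example builds a dict keyed by language. If two input texts are detected as the same language, only one entry survives. Below is a minimal sketch that instead returns one `(label, probability)` pair per input, in input order. `top1_per_text` is a hypothetical helper, not part of the model card. It takes plain Python lists (for example `logits.tolist()`), so the helper itself does not depend on torch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top1_per_text(
    logits: Sequence[Sequence[float]], id2label: Dict[int, str]
) -> List[Tuple[str, float]]:
    """Hypothetical helper: one (label, probability) pair per input row, in order."""
    results: List[Tuple[str, float]] = []
    for row in logits:
        # Numerically stable softmax: subtract the row max before exponentiating.
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Index of the highest-probability class for this row.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((id2label[best], probs[best]))
    return results


# Toy logits for three texts over a two-label mapping; the first and third
# texts get the same label, and both are kept.
print(top1_per_text([[2.0, 0.0], [0.0, 2.0], [3.0, 1.0]], {0: "en", 1: "it"}))
```

With the model loaded as in the second example, `top1_per_text(logits.tolist(), model.config.id2label)` would return a list with one entry per sentence instead of a dict. Duplicates and input order are therefore preserved.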