---
library_name: transformers
tags:
- nli
- bert
- natural-language-inference
language:
- ru
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
model-index:
- name: rubert-tiny-nli-terra-v0
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: TERRA
      type: NLI
      split: validation
    metrics:
    - type: accuracy
      value: 0.6742671009771987
      name: Accuracy
    - type: f1
      value: 0.6710526315789473
      name: F1
    - type: precision
      value: 0.6754966887417219
      name: Precision
    - type: recall
      value: 0.6666666666666666
      name: Recall
---

**⚠️ Disclaimer: This model is in the early stages of development and may produce low-quality predictions. For better results, consider using the recommended Russian natural language inference models available [here](https://huggingface.co/cointegrated).**

# RuBERT-tiny-nli v1

This model is the second iteration of fine-tuning [RuBERT-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) for a two-way natural language inference task on the Russian [Textual Entailment Recognition (TERRa)](https://russiansuperglue.com/tasks/task_info/TERRa) dataset. The classifier head comprises two dense layers, intended to improve inference capabilities. Note, however, that the model's performance is currently limited, leaving room for further improvement and fine-tuning.
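The exact head configuration is not spelled out in this card; as a rough sketch only, a two-dense-layer classification head on top of a BERT-style pooled output might look like the snippet below. The hidden size (312, as in RuBERT-tiny2), dropout, and activation are illustrative assumptions, not the model's actual values.

```python
# Illustrative sketch only: a two-dense-layer classification head on top of a
# BERT-style pooled output. Hidden size, dropout, and activation are assumptions.
import torch
import torch.nn as nn

class TwoLayerNLIHead(nn.Module):
    def __init__(self, hidden_size: int = 312, num_labels: int = 2):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)     # first dense layer
        self.activation = nn.Tanh()
        self.dropout = nn.Dropout(0.1)
        self.out_proj = nn.Linear(hidden_size, num_labels)   # second dense layer -> logits

    def forward(self, pooled_output: torch.Tensor) -> torch.Tensor:
        x = self.dropout(self.activation(self.dense(pooled_output)))
        return self.out_proj(x)  # shape: (batch_size, num_labels)
```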


## Usage
How to run the model for NLI:

```python
# !pip install transformers sentencepiece --quiet
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = 'Marwolaeth/rubert-tiny-nli-terra-v1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
if torch.cuda.is_available():
    model.cuda()

# An example from cointegrated NLI models
premise1 = 'Сократ - человек, а все люди смертны.'
hypothesis1 = 'Сократ никогда не умрёт.'
with torch.inference_mode():
    prediction = model(
      **tokenizer(premise1, hypothesis1, return_tensors='pt').to(model.device)
    )
    p = torch.softmax(prediction.logits, -1).cpu().numpy()[0]
print({v: p[k] for k, v in model.config.id2label.items()})
# {'not_entailment': 0.68763, 'entailment': 0.31237}

# An example concerning sentiments
premise2 = 'Мне не нравятся желтые ковры.'
hypothesis2 = 'Я люблю желтые ковры.'
with torch.inference_mode():
    prediction = model(
      **tokenizer(premise2, hypothesis2, return_tensors='pt').to(model.device)
    )
    p = torch.softmax(prediction.logits, -1).cpu().numpy()[0]
print({v: p[k] for k, v in model.config.id2label.items()})
# {'not_entailment': 0.5894801, 'entailment': 0.41051993}

# A tricky example
# Many NLI models fail to refute premise-hypothesis pairs like:
# 'It is good for our enemies that X' — 'It is good for us that X'
# This contradiction is quite clear, yet many NLI models struggle to accurately identify it, 
# highlighting their limitations in understanding conflicting sentiments in natural language inference.
premise3 = 'Для наших врагов хорошо, что это дерево красное.'
hypothesis3 = 'Для нас хорошо, что это дерево красное.'
with torch.inference_mode():
    prediction = model(
      **tokenizer(premise3, hypothesis3, return_tensors='pt').to(model.device)
    )
    p = torch.softmax(prediction.logits, -1).cpu().numpy()[0]
print({v: p[k] for k, v in model.config.id2label.items()})
# {'not_entailment': 0.54253, 'entailment': 0.45746994}
```
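For quick experiments, the model can also be called through the `text-classification` pipeline. Passing a dictionary with `text` / `text_pair` keys is how recent `transformers` versions handle sentence pairs; minor details of the API and output format may differ across versions.

```python
from transformers import pipeline

# Sentence-pair classification via the high-level pipeline API;
# top_k=None returns scores for all classes.
nli = pipeline('text-classification', model=model_id, top_k=None)

result = nli({'text': premise1, 'text_pair': hypothesis1})
print(result)
# Expected: a list of {'label': ..., 'score': ...} dicts, one per class
```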

## Model Performance Metrics

The following metrics summarize the performance of the model on the validation dataset:

| Metric                    | Value          |
|---------------------------|----------------|
| **Validation Loss**       | 0.6492         |
| **Validation Accuracy**   | 67.43%         |
| **Validation F1 Score**   | 67.11%         |
| **Validation Precision**  | 67.55%         |
| **Validation Recall**     | 66.67%         |
| **Validation Runtime**\*  | 0.2631 seconds |
| **Samples per Second**\*  | 1,167.02       |
| **Steps per Second**\*    | 7.60           |

\*Measured on a T4 GPU in Google Colab.
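
The metrics above come from the model's own validation run. As a hedged sketch of how comparable numbers might be reproduced, the snippet below evaluates the model on the TERRa validation split. The dataset id `RussianNLP/russian_super_glue` and the alignment between the model's `id2label` names and the dataset's label names are assumptions to verify; the averaging used for F1/precision/recall may also differ from the original evaluation.

```python
# Hedged reproduction sketch: dataset id, label alignment, and metric averaging
# are assumptions, not the card author's exact evaluation script.
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = 'Marwolaeth/rubert-tiny-nli-terra-v1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# TERRa validation split, assuming it is exposed under the RussianSuperGLUE dataset
ds = load_dataset('RussianNLP/russian_super_glue', 'terra', split='validation')
label_names = ds.features['label'].names  # assuming a ClassLabel feature, e.g. ['entailment', 'not_entailment']

preds, labels = [], []
for example in ds:
    inputs = tokenizer(example['premise'], example['hypothesis'],
                       return_tensors='pt', truncation=True)
    with torch.inference_mode():
        logits = model(**inputs).logits
    # Map the predicted class id to the dataset's label index via label names,
    # since the model's id2label order may not match the dataset's class order.
    pred_name = model.config.id2label[int(logits.argmax(-1))]
    preds.append(label_names.index(pred_name))
    labels.append(example['label'])

accuracy = accuracy_score(labels, preds)
precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
print(f'accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}')
```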