BioMike committed (verified) · Commit 4ecd63b · Parent(s): 4611dd4

Update README.md

Files changed (1): README.md (+98 -34)

README.md CHANGED
@@ -77,8 +77,8 @@ Then you need to initialize a model and a pipeline:
  from gliclass import GLiClassModel, ZeroShotClassificationPipeline
  from transformers import AutoTokenizer

- model = GLiClassModel.from_pretrained("knowledgator/gliclass-modern-base-v2.0-init")
- tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-modern-base-v2.0-init")
  pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

  text = "One day I will see the world!"
@@ -88,6 +88,44 @@ for result in results:
  print(result["label"], "=>", result["score"])
  ```
  If you want to use it for NLI-type tasks, we recommend representing your premise as the text and the hypothesis as a label; you can provide several hypotheses, but the model works best with a single hypothesis.
  ```python
  # Initialize model and multi-label pipeline
@@ -98,40 +136,66 @@ print(results)
  ```
  ### Benchmarks:
- Below, you can see the F1 score on several text classification datasets. All tested models were not fine-tuned on those datasets and were tested in a zero-shot setting.
- | Model | IMDB | AG_NEWS | Emotions |
- |-----------------------------|------|---------|----------|
- | [gliclass-modern-large-v2.0-init (399 M)](knowledgator/gliclass-modern-large-v2.0-init) | 0.9137 | 0.7357 | 0.4140 |
- | [gliclass-modern-base-v2.0-init (151 M)](knowledgator/gliclass-modern-base-v2.0-init) | 0.8264 | 0.6637 | 0.2985 |
- | [gliclass-large-v1.0 (438 M)](https://huggingface.co/knowledgator/gliclass-large-v1.0) | 0.9404 | 0.7516 | 0.4874 |
- | [gliclass-base-v1.0 (186 M)](https://huggingface.co/knowledgator/gliclass-base-v1.0) | 0.8650 | 0.6837 | 0.4749 |
- | [gliclass-small-v1.0 (144 M)](https://huggingface.co/knowledgator/gliclass-small-v1.0) | 0.8650 | 0.6805 | 0.4664 |
- | [Bart-large-mnli (407 M)](https://huggingface.co/facebook/bart-large-mnli) | 0.89 | 0.6887 | 0.3765 |
- | [Deberta-base-v3 (184 M)](https://huggingface.co/cross-encoder/nli-deberta-v3-base) | 0.85 | 0.6455 | 0.5095 |
- | [Comprehendo (184M)](https://huggingface.co/knowledgator/comprehend_it-base) | 0.90 | 0.7982 | 0.5660 |
- | SetFit [BAAI/bge-small-en-v1.5 (33.4M)](https://huggingface.co/BAAI/bge-small-en-v1.5) | 0.86 | 0.5636 | 0.5754 |
-
-
- Below you can find a comparison with other GLiClass models:
-
- | Dataset | gliclass-base-v1.0-init | gliclass-large-v1.0-init | gliclass-modern-base-v2.0-init | gliclass-modern-large-v2.0-init |
- |----------------------|-----------------------|-----------------------|---------------------|---------------------|
- | CR | 0.8672 | 0.8024 | 0.9041 | 0.8980 |
- | sst2 | 0.8342 | 0.8734 | 0.9011 | 0.9434 |
- | sst5 | 0.2048 | 0.1638 | 0.1972 | 0.1123 |
- | 20_news_groups | 0.2317 | 0.4151 | 0.2448 | 0.2792 |
- | spam | 0.5963 | 0.5407 | 0.5074 | 0.6364 |
- | financial_phrasebank | 0.3594 | 0.3705 | 0.2537 | 0.2562 |
- | imdb | 0.8772 | 0.8836 | 0.8255 | 0.9137 |
- | ag_news | 0.5614 | 0.7069 | 0.6050 | 0.6933 |
- | emotion | 0.2865 | 0.3840 | 0.2474 | 0.3746 |
- | cap_sotu | 0.3966 | 0.4353 | 0.2929 | 0.2919 |
- | rotten_tomatoes | 0.6626 | 0.7933 | 0.6630 | 0.5928 |
- | **AVERAGE:** | 0.5344 | 0.5790 | 0.5129 | 0.5447 |
-
- Here you can see how the performance of the model grows providing more examples:
  | Model | Num Examples | sst5 | ag_news | emotion | **AVERAGE:** |
  |------------------------------------|------------------|--------|---------|--------------|----------|
  | gliclass-modern-large-v2.0-init | 0 | 0.1123 | 0.6933 | 0.3746 | 0.3934 |
  | gliclass-modern-large-v2.0-init | 8 | 0.5098 | 0.8339 | 0.5010 | 0.6149 |
  | gliclass-modern-large-v2.0-init | Weak Supervision | 0.0951 | 0.6478 | 0.4520 | 0.3983 |
 
  from gliclass import GLiClassModel, ZeroShotClassificationPipeline
  from transformers import AutoTokenizer

+ model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
+ tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
  pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

  text = "One day I will see the world!"

  print(result["label"], "=>", result["score"])
  ```
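The basic usage snippet above is shortened by the diff context (the candidate labels and the pipeline call are not shown). For reference, a self-contained sketch of the same pattern; the `labels` list and the threshold value below are illustrative assumptions, not taken from the original file:

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
# Illustrative candidate labels; any label set can be supplied at inference time.
labels = ["travel", "dreams", "sport", "science", "politics"]

# The pipeline returns one result list per input text; [0] selects the first (and only) one.
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(result["label"], "=>", result["score"])
```

Because the pipeline is zero-shot, the label set can be changed on every call without retraining.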
+ To use with one **RAC** example:
+ ```python
+ example_1 = {
+     "text": "A recently developed machine learning platform offers robust automation for complex data analysis workflows. While it enhances productivity, users have reported difficulties in integrating it with their current data infrastructure and a need for better documentation.",
+     "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
+     "true_labels": ["AI", "integration", "automation"]
+ }
+
+ text = "The new AI-powered tool streamlines data analysis by automating repetitive tasks, improving efficiency for data scientists. However, its steep learning curve and limited integration with existing platforms pose challenges for widespread adoption."
+ labels = ["AI", "automation", "data_analysis", "usability", "integration"]
+
+ results = pipeline(text, labels, threshold=0.1, rac_examples=[example_1])[0]
+
+ for predict in results:
+     print(predict["label"], " - ", predict["score"])
+ ```
+
+ To use with several **RAC** examples:
+ ```python
+ example_1 = {
+     "text": "A recently developed machine learning platform offers robust automation for complex data analysis workflows. While it enhances productivity, users have reported difficulties in integrating it with their current data infrastructure and a need for better documentation.",
+     "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
+     "true_labels": ["AI", "integration", "automation"]
+ }
+ example_2 = {
+     "text": "A cloud-based analytics tool leverages artificial intelligence to provide real-time insights. It significantly improves workflow efficiency but struggles with compatibility across different enterprise systems, requiring additional customization efforts.",
+     "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
+     "true_labels": ["AI", "integration", "data_analysis"]
+ }
+ text = "The new AI-powered tool streamlines data analysis by automating repetitive tasks, improving efficiency for data scientists. However, its steep learning curve and limited integration with existing platforms pose challenges for widespread adoption."
+ labels = ["AI", "automation", "data_analysis", "usability", "integration"]
+
+ results = pipeline(text, labels, threshold=0.1, rac_examples=[example_1, example_2])[0]
+
+ for predict in results:
+     print(predict["label"], " - ", predict["score"])
+ ```
+
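If your few-shot examples already live in a small annotated set, the RAC example dicts can also be built programmatically. A minimal sketch, assuming a hypothetical list of `(text, true_labels)` pairs and reusing the `pipeline` object created above:

```python
# Hypothetical annotated pairs; replace with your own few-shot data.
few_shot_data = [
    ("The platform automates model training end to end but is hard to wire into legacy data stores.",
     ["AI", "automation", "integration"]),
    ("The dashboard surfaces real-time analytics, though new users find the interface confusing.",
     ["data_analysis", "usability"]),
]

labels = ["AI", "automation", "data_analysis", "usability", "integration"]

# Convert each pair into the dict format expected by `rac_examples`.
rac_examples = [
    {"text": example_text, "all_labels": labels, "true_labels": true_labels}
    for example_text, true_labels in few_shot_data
]

text = "The new AI-powered tool streamlines data analysis by automating repetitive tasks."
results = pipeline(text, labels, threshold=0.1, rac_examples=rac_examples)[0]

for predict in results:
    print(predict["label"], " - ", predict["score"])
```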
  If you want to use it for NLI-type tasks, we recommend representing your premise as the text and the hypothesis as a label; you can provide several hypotheses, but the model works best with a single hypothesis.
  ```python
  # Initialize model and multi-label pipeline

  ```
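The NLI snippet above is truncated by the diff context. A minimal, self-contained sketch of the recommended pattern, where the premise is passed as the text and a single hypothesis as the only candidate label (the premise and hypothesis strings here are illustrative):

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

# Initialize model and multi-label pipeline (model name reused from the sections above)
model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

# Premise goes in as the text, the hypothesis as the single candidate label.
premise = "One day I will see the world!"
hypothesis = "This text expresses a desire to travel."

# A threshold of 0.0 keeps the hypothesis in the output regardless of its score.
results = pipeline(premise, [hypothesis], threshold=0.0)[0]
print(results)
```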
  ### Benchmarks:
+ Below, you can find a comparison with other GLiClass models:
+
+ | Dataset | gliclass-base-v1.0-init | gliclass-large-v1.0-init | gliclass-modern-base-v2.0-init | gliclass-modern-large-v2.0-init | gliclass-base-v2.0-rac-init |
+ |----------------------|-----------------------|-----------------------|---------------------|---------------------|---------------------|
+ | CR | 0.8672 | 0.8024 | 0.9041 | 0.8980 | 0.7852 |
+ | sst2 | 0.8342 | 0.8734 | 0.9011 | 0.9434 | 0.8610 |
+ | sst5 | 0.2048 | 0.1638 | 0.1972 | 0.1123 | 0.0598 |
+ | 20_news_groups | 0.2317 | 0.4151 | 0.2448 | 0.2792 | 0.4007 |
+ | spam | 0.5963 | 0.5407 | 0.5074 | 0.6364 | 0.6739 |
+ | financial_phrasebank | 0.3594 | 0.3705 | 0.2537 | 0.2562 | 0.2537 |
+ | imdb | 0.8772 | 0.8836 | 0.8255 | 0.9137 | 0.8716 |
+ | ag_news | 0.5614 | 0.7069 | 0.6050 | 0.6933 | 0.6759 |
+ | emotion | 0.2865 | 0.3840 | 0.2474 | 0.3746 | 0.4160 |
+ | cap_sotu | 0.3966 | 0.4353 | 0.2929 | 0.2919 | 0.3871 |
+ | rotten_tomatoes | 0.6626 | 0.7933 | 0.6630 | 0.5928 | 0.7739 |
+ | **AVERAGE:** | 0.5344 | 0.5790 | 0.5129 | 0.5447 | 0.5598 |
+
+ Here you can see how the performance of the model improves as more **RAC** examples are provided:
+
+ | Dataset | 0 examples | 1 example | 2 examples | 3 examples |
+ |-------------------------------------|------------|------------|------------|------------|
+ | cap_sotu | 0.3857 | 0.4665 | 0.4935 | 0.4847 |
+ | cap_sotu (8 examples) | 0.4938 | 0.5097 | 0.4976 | 0.4894 |
+ | cap_sotu (Weak Supervision - 8) | 0.4319 | 0.4764 | 0.4488 | 0.4465 |
+ | dair-ai_emotion | 0.4472 | 0.5505 | 0.5619 | 0.5705 |
+ | dair-ai_emotion (8 examples) | 0.5088 | 0.5630 | 0.5623 | 0.5740 |
+ | dair-ai_emotion (Weak Supervision - 8) | 0.4187 | 0.5479 | 0.5693 | 0.5828 |
+ | ag_news | 0.6791 | 0.8507 | 0.8717 | 0.8866 |
+ | ag_news (8 examples) | 0.8496 | 0.9002 | 0.9072 | 0.9091 |
+ | ag_news (Weak Supervision - 8) | 0.6546 | 0.8623 | 0.8841 | 0.8978 |
+ | sst5 | 0.0599 | 0.0675 | 0.1163 | 0.1267 |
+ | sst5 (8 examples) | 0.2887 | 0.2690 | 0.2642 | 0.2394 |
+ | sst5 (Weak Supervision - 8) | 0.0744 | 0.2780 | 0.2897 | 0.2912 |
+ | ScienceQA | 0.1142 | 0.4035 | 0.4534 | 0.4495 |
+ | ScienceQA (8 examples) | 0.6493 | 0.6547 | 0.6956 | 0.6770 |
+ | ScienceQA (Weak Supervision - 8) | 0.2987 | 0.5919 | 0.5998 | 0.5674 |
+ | Malicious_code_classification | 0.3717 | 0.6260 | 0.9672 | 0.9788 |
+ | Malicious_code_classification (8 examples) | 0.8444 | 0.9722 | 0.9788 | 0.9772 |
+ | Malicious_code_classification (Weak Supervision - 8) | 0.3745 | 0.9216 | 0.9788 | 0.9772 |
+ | twitter-financial-news-topic | 0.2594 | 0.6249 | 0.6408 | 0.6427 |
+ | twitter-financial-news-topic (8 examples) | 0.6137 | 0.7072 | 0.7099 | 0.6948 |
+ | twitter-financial-news-topic (Weak Supervision - 8) | 0.4032 | 0.6651 | 0.6316 | 0.6114 |
+ | 20_newsgroups | 0.3211 | 0.1339 | 0.0906 | 0.1005 |
+ | 20_newsgroups (8 examples) | 0.0959 | 0.0657 | 0.0440 | 0.0445 |
+ | 20_newsgroups (Weak Supervision - 8) | 0.4765 | 0.1035 | 0.0775 | 0.0777 |
+ | ChemProt | 0.2024 | 0.1911 | 0.1568 | 0.1329 |
+ | ChemProt (8 examples) | 0.2985 | 0.3479 | 0.3636 | 0.3538 |
+ | ChemProt (Weak Supervision - 8) | 0.2369 | 0.2067 | 0.1911 | 0.1780 |
+
+ | **AVERAGE** (by setting) | **0 examples** | **1 example** | **2 examples** | **3 examples** |
+ |-------------------------------------|---------------|---------------|---------------|---------------|
+ | Standard | 0.3090 | 0.4275 | 0.4707 | 0.4718 |
+ | 8 examples | 0.4838 | 0.5245 | 0.5288 | 0.5244 |
+ | Weak Supervision - 8 | 0.3661 | 0.4862 | 0.4868 | 0.4821 |
+
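The "0/1/2/3 examples" columns correspond to the number of RAC examples passed at inference time. The exact evaluation protocol and metric are not spelled out in this README, so the following is only a rough sketch of how such a sweep could be reproduced, assuming a list of annotated items, micro F1 as the metric (computed with scikit-learn, an extra dependency), and the first `k` annotated items reused as RAC examples for every prediction:

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

def evaluate_with_k_rac_examples(pipeline, dataset, labels, k, threshold=0.5):
    """dataset: list of dicts with "text" and "true_labels" (hypothetical format)."""
    # The first k annotated items become the retrieval-augmented context examples.
    rac_examples = [
        {"text": item["text"], "all_labels": labels, "true_labels": item["true_labels"]}
        for item in dataset[:k]
    ]
    predictions, references = [], []
    for item in dataset[k:]:
        # Only pass rac_examples when k > 0, mirroring the zero-example baseline.
        kwargs = {"rac_examples": rac_examples} if rac_examples else {}
        results = pipeline(item["text"], labels, threshold=threshold, **kwargs)[0]
        # The pipeline is assumed to return only labels scoring above the threshold.
        predictions.append([r["label"] for r in results])
        references.append(item["true_labels"])

    # Binarize the label lists and score with micro-averaged F1.
    binarizer = MultiLabelBinarizer(classes=labels)
    y_true = binarizer.fit_transform(references)
    y_pred = binarizer.transform(predictions)
    return f1_score(y_true, y_pred, average="micro")
```

Sweeping `k` from 0 to 3 with a helper like this would produce one row of the table above for a given dataset.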
+ Here you can see how the performance of the model improves with more examples, in comparison to other models (the AVERAGE column for gliclass-base-v2.0-rac-init is recomputed as the mean of the three dataset scores shown):
+
  | Model | Num Examples | sst5 | ag_news | emotion | **AVERAGE:** |
  |------------------------------------|------------------|--------|---------|--------------|----------|
+ | gliclass-base-v2.0-rac-init | 0 | 0.0599 | 0.6791 | 0.4472 | 0.3954 |
+ | gliclass-base-v2.0-rac-init | 8 | 0.2887 | 0.8496 | 0.5088 | 0.5490 |
+ | gliclass-base-v2.0-rac-init | Weak Supervision | 0.0744 | 0.6546 | 0.4187 | 0.3826 |
  | gliclass-modern-large-v2.0-init | 0 | 0.1123 | 0.6933 | 0.3746 | 0.3934 |
  | gliclass-modern-large-v2.0-init | 8 | 0.5098 | 0.8339 | 0.5010 | 0.6149 |
  | gliclass-modern-large-v2.0-init | Weak Supervision | 0.0951 | 0.6478 | 0.4520 | 0.3983 |