BioMike committed (verified) · Commit 4ecd63b · Parent(s): 4611dd4

Update README.md

Files changed (1): README.md (+98 -34)

README.md CHANGED
@@ -77,8 +77,8 @@ Then you need to initialize a model and a pipeline:
  from gliclass import GLiClassModel, ZeroShotClassificationPipeline
  from transformers import AutoTokenizer

- model = GLiClassModel.from_pretrained("knowledgator/gliclass-modern-base-v2.0-init")
- tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-modern-base-v2.0-init")
  pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

  text = "One day I will see the world!"
@@ -88,6 +88,44 @@ for result in results:
  print(result["label"], "=>", result["score"])
  ```
  If you want to use it for NLI-type tasks, we recommend representing your premise as the text and the hypothesis as a label; you can provide several hypotheses, but the model works best with a single hypothesis.
  ```python
  # Initialize model and multi-label pipeline
@@ -98,40 +136,66 @@ print(results)
  ```
  ### Benchmarks:
- Below, you can see the F1 score on several text classification datasets. All tested models were not fine-tuned on those datasets and were tested in a zero-shot setting.
- | Model | IMDB | AG_NEWS | Emotions |
- |-----------------------------|------|---------|----------|
- | [gliclass-modern-large-v2.0-init (399 M)](knowledgator/gliclass-modern-large-v2.0-init) | 0.9137 | 0.7357 | 0.4140 |
- | [gliclass-modern-base-v2.0-init (151 M)](knowledgator/gliclass-modern-base-v2.0-init) | 0.8264 | 0.6637 | 0.2985 |
- | [gliclass-large-v1.0 (438 M)](https://huggingface.co/knowledgator/gliclass-large-v1.0) | 0.9404 | 0.7516 | 0.4874 |
- | [gliclass-base-v1.0 (186 M)](https://huggingface.co/knowledgator/gliclass-base-v1.0) | 0.8650 | 0.6837 | 0.4749 |
- | [gliclass-small-v1.0 (144 M)](https://huggingface.co/knowledgator/gliclass-small-v1.0) | 0.8650 | 0.6805 | 0.4664 |
- | [Bart-large-mnli (407 M)](https://huggingface.co/facebook/bart-large-mnli) | 0.89 | 0.6887 | 0.3765 |
- | [Deberta-base-v3 (184 M)](https://huggingface.co/cross-encoder/nli-deberta-v3-base) | 0.85 | 0.6455 | 0.5095 |
- | [Comprehendo (184M)](https://huggingface.co/knowledgator/comprehend_it-base) | 0.90 | 0.7982 | 0.5660 |
- | SetFit [BAAI/bge-small-en-v1.5 (33.4M)](https://huggingface.co/BAAI/bge-small-en-v1.5) | 0.86 | 0.5636 | 0.5754 |
-
-
- Below you can find a comparison with other GLiClass models:
-
- | Dataset | gliclass-base-v1.0-init | gliclass-large-v1.0-init | gliclass-modern-base-v2.0-init | gliclass-modern-large-v2.0-init |
- |----------------------|-----------------------|-----------------------|---------------------|---------------------|
- | CR | 0.8672 | 0.8024 | 0.9041 | 0.8980 |
- | sst2 | 0.8342 | 0.8734 | 0.9011 | 0.9434 |
- | sst5 | 0.2048 | 0.1638 | 0.1972 | 0.1123 |
- | 20_news_groups | 0.2317 | 0.4151 | 0.2448 | 0.2792 |
- | spam | 0.5963 | 0.5407 | 0.5074 | 0.6364 |
- | financial_phrasebank | 0.3594 | 0.3705 | 0.2537 | 0.2562 |
- | imdb | 0.8772 | 0.8836 | 0.8255 | 0.9137 |
- | ag_news | 0.5614 | 0.7069 | 0.6050 | 0.6933 |
- | emotion | 0.2865 | 0.3840 | 0.2474 | 0.3746 |
- | cap_sotu | 0.3966 | 0.4353 | 0.2929 | 0.2919 |
- | rotten_tomatoes | 0.6626 | 0.7933 | 0.6630 | 0.5928 |
- | **AVERAGE:** | 0.5344 | 0.5790 | 0.5129 | 0.5447 |
-
- Here you can see how the performance of the model grows providing more examples:
  | Model | Num Examples | sst5 | ag_news | emotion | **AVERAGE:** |
  |------------------------------------|------------------|--------|---------|--------------|----------|
  | gliclass-modern-large-v2.0-init | 0 | 0.1123 | 0.6933 | 0.3746 | 0.3934 |
  | gliclass-modern-large-v2.0-init | 8 | 0.5098 | 0.8339 | 0.5010 | 0.6149 |
  | gliclass-modern-large-v2.0-init | Weak Supervision | 0.0951 | 0.6478 | 0.4520 | 0.3983 |
 
  from gliclass import GLiClassModel, ZeroShotClassificationPipeline
  from transformers import AutoTokenizer

+ model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
+ tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
  pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

  text = "One day I will see the world!"

  print(result["label"], "=>", result["score"])
  ```
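The basic usage snippet above is shortened by the diff context (the candidate labels and the pipeline call are not shown). For reference, a self-contained sketch of the same pattern; the `labels` list and the threshold value below are illustrative assumptions, not taken from the original file:

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
# Illustrative candidate labels; any label set can be supplied at inference time.
labels = ["travel", "dreams", "sport", "science", "politics"]

# The pipeline returns one result list per input text; [0] selects the first (and only) one.
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(result["label"], "=>", result["score"])
```

Because the pipeline is zero-shot, the label set can be changed on every call without retraining.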
+ To use with one **RAC** example:
+ ```python
+ example_1 = {
+     "text": "A recently developed machine learning platform offers robust automation for complex data analysis workflows. While it enhances productivity, users have reported difficulties in integrating it with their current data infrastructure and a need for better documentation.",
+     "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
+     "true_labels": ["AI", "integration", "automation"]
+ }
+
+ text = "The new AI-powered tool streamlines data analysis by automating repetitive tasks, improving efficiency for data scientists. However, its steep learning curve and limited integration with existing platforms pose challenges for widespread adoption."
+ labels = ["AI", "automation", "data_analysis", "usability", "integration"]
+
+ results = pipeline(text, labels, threshold=0.1, rac_examples=[example_1])[0]
+
+ for predict in results:
+     print(predict["label"], " - ", predict["score"])
+ ```
+
+ To use with several **RAC** examples:
+ ```python
+ example_1 = {
+     "text": "A recently developed machine learning platform offers robust automation for complex data analysis workflows. While it enhances productivity, users have reported difficulties in integrating it with their current data infrastructure and a need for better documentation.",
+     "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
+     "true_labels": ["AI", "integration", "automation"]
+ }
+ example_2 = {
+     "text": "A cloud-based analytics tool leverages artificial intelligence to provide real-time insights. It significantly improves workflow efficiency but struggles with compatibility across different enterprise systems, requiring additional customization efforts.",
+     "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
+     "true_labels": ["AI", "integration", "data_analysis"]
+ }
+ text = "The new AI-powered tool streamlines data analysis by automating repetitive tasks, improving efficiency for data scientists. However, its steep learning curve and limited integration with existing platforms pose challenges for widespread adoption."
+ labels = ["AI", "automation", "data_analysis", "usability", "integration"]
+
+ results = pipeline(text, labels, threshold=0.1, rac_examples=[example_1, example_2])[0]
+
+ for predict in results:
+     print(predict["label"], " - ", predict["score"])
+ ```
+
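If your few-shot examples already live in a small annotated set, the RAC example dicts can also be built programmatically. A minimal sketch, assuming a hypothetical list of `(text, true_labels)` pairs and reusing the `pipeline` object created above:

```python
# Hypothetical annotated pairs; replace with your own few-shot data.
few_shot_data = [
    ("The platform automates model training end to end but is hard to wire into legacy data stores.",
     ["AI", "automation", "integration"]),
    ("The dashboard surfaces real-time analytics, though new users find the interface confusing.",
     ["data_analysis", "usability"]),
]

labels = ["AI", "automation", "data_analysis", "usability", "integration"]

# Convert each pair into the dict format expected by `rac_examples`.
rac_examples = [
    {"text": example_text, "all_labels": labels, "true_labels": true_labels}
    for example_text, true_labels in few_shot_data
]

text = "The new AI-powered tool streamlines data analysis by automating repetitive tasks."
results = pipeline(text, labels, threshold=0.1, rac_examples=rac_examples)[0]

for predict in results:
    print(predict["label"], " - ", predict["score"])
```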
  If you want to use it for NLI-type tasks, we recommend representing your premise as the text and the hypothesis as a label; you can provide several hypotheses, but the model works best with a single hypothesis.
  ```python
  # Initialize model and multi-label pipeline

  ```
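The NLI snippet above is truncated by the diff context. A minimal, self-contained sketch of the recommended pattern, where the premise is passed as the text and a single hypothesis as the only candidate label (the premise and hypothesis strings here are illustrative):

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

# Initialize model and multi-label pipeline (model name reused from the sections above)
model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v2.0-rac-init")
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

# Premise goes in as the text, the hypothesis as the single candidate label.
premise = "One day I will see the world!"
hypothesis = "This text expresses a desire to travel."

# A threshold of 0.0 keeps the hypothesis in the output regardless of its score.
results = pipeline(premise, [hypothesis], threshold=0.0)[0]
print(results)
```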
  ### Benchmarks:
+ Below, you can find a comparison with other GLiClass models:
+
+ | Dataset | gliclass-base-v1.0-init | gliclass-large-v1.0-init | gliclass-modern-base-v2.0-init | gliclass-modern-large-v2.0-init | gliclass-base-v2.0-rac-init |
+ |----------------------|-----------------------|-----------------------|---------------------|---------------------|---------------------|
+ | CR | 0.8672 | 0.8024 | 0.9041 | 0.8980 | 0.7852 |
+ | sst2 | 0.8342 | 0.8734 | 0.9011 | 0.9434 | 0.8610 |
+ | sst5 | 0.2048 | 0.1638 | 0.1972 | 0.1123 | 0.0598 |
+ | 20_news_groups | 0.2317 | 0.4151 | 0.2448 | 0.2792 | 0.4007 |
+ | spam | 0.5963 | 0.5407 | 0.5074 | 0.6364 | 0.6739 |
+ | financial_phrasebank | 0.3594 | 0.3705 | 0.2537 | 0.2562 | 0.2537 |
+ | imdb | 0.8772 | 0.8836 | 0.8255 | 0.9137 | 0.8716 |
+ | ag_news | 0.5614 | 0.7069 | 0.6050 | 0.6933 | 0.6759 |
+ | emotion | 0.2865 | 0.3840 | 0.2474 | 0.3746 | 0.4160 |
+ | cap_sotu | 0.3966 | 0.4353 | 0.2929 | 0.2919 | 0.3871 |
+ | rotten_tomatoes | 0.6626 | 0.7933 | 0.6630 | 0.5928 | 0.7739 |
+ | **AVERAGE:** | 0.5344 | 0.5790 | 0.5129 | 0.5447 | 0.5598 |
+
+ Here you can see how the performance of the model improves as more **RAC** examples are provided:
+
+ | Dataset | 0 examples | 1 example | 2 examples | 3 examples |
+ |-------------------------------------|------------|------------|------------|------------|
+ | cap_sotu | 0.3857 | 0.4665 | 0.4935 | 0.4847 |
+ | cap_sotu (8 examples) | 0.4938 | 0.5097 | 0.4976 | 0.4894 |
+ | cap_sotu (Weak Supervision - 8) | 0.4319 | 0.4764 | 0.4488 | 0.4465 |
+ | dair-ai_emotion | 0.4472 | 0.5505 | 0.5619 | 0.5705 |
+ | dair-ai_emotion (8 examples) | 0.5088 | 0.5630 | 0.5623 | 0.5740 |
+ | dair-ai_emotion (Weak Supervision - 8) | 0.4187 | 0.5479 | 0.5693 | 0.5828 |
+ | ag_news | 0.6791 | 0.8507 | 0.8717 | 0.8866 |
+ | ag_news (8 examples) | 0.8496 | 0.9002 | 0.9072 | 0.9091 |
+ | ag_news (Weak Supervision - 8) | 0.6546 | 0.8623 | 0.8841 | 0.8978 |
+ | sst5 | 0.0599 | 0.0675 | 0.1163 | 0.1267 |
+ | sst5 (8 examples) | 0.2887 | 0.2690 | 0.2642 | 0.2394 |
+ | sst5 (Weak Supervision - 8) | 0.0744 | 0.2780 | 0.2897 | 0.2912 |
+ | ScienceQA | 0.1142 | 0.4035 | 0.4534 | 0.4495 |
+ | ScienceQA (8 examples) | 0.6493 | 0.6547 | 0.6956 | 0.6770 |
+ | ScienceQA (Weak Supervision - 8) | 0.2987 | 0.5919 | 0.5998 | 0.5674 |
+ | Malicious_code_classification | 0.3717 | 0.6260 | 0.9672 | 0.9788 |
+ | Malicious_code_classification (8 examples) | 0.8444 | 0.9722 | 0.9788 | 0.9772 |
+ | Malicious_code_classification (Weak Supervision - 8) | 0.3745 | 0.9216 | 0.9788 | 0.9772 |
+ | twitter-financial-news-topic | 0.2594 | 0.6249 | 0.6408 | 0.6427 |
+ | twitter-financial-news-topic (8 examples) | 0.6137 | 0.7072 | 0.7099 | 0.6948 |
+ | twitter-financial-news-topic (Weak Supervision - 8) | 0.4032 | 0.6651 | 0.6316 | 0.6114 |
+ | 20_newsgroups | 0.3211 | 0.1339 | 0.0906 | 0.1005 |
+ | 20_newsgroups (8 examples) | 0.0959 | 0.0657 | 0.0440 | 0.0445 |
+ | 20_newsgroups (Weak Supervision - 8) | 0.4765 | 0.1035 | 0.0775 | 0.0777 |
+ | ChemProt | 0.2024 | 0.1911 | 0.1568 | 0.1329 |
+ | ChemProt (8 examples) | 0.2985 | 0.3479 | 0.3636 | 0.3538 |
+ | ChemProt (Weak Supervision - 8) | 0.2369 | 0.2067 | 0.1911 | 0.1780 |
+
+ | **AVERAGE** (by setting) | **0 examples** | **1 example** | **2 examples** | **3 examples** |
+ |-------------------------------------|---------------|---------------|---------------|---------------|
+ | Standard | 0.3090 | 0.4275 | 0.4707 | 0.4718 |
+ | 8 examples | 0.4838 | 0.5245 | 0.5288 | 0.5244 |
+ | Weak Supervision - 8 | 0.3661 | 0.4862 | 0.4868 | 0.4821 |
+
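The "0/1/2/3 examples" columns correspond to the number of RAC examples passed at inference time. The exact evaluation protocol and metric are not spelled out in this README, so the following is only a rough sketch of how such a sweep could be reproduced, assuming a list of annotated items, micro F1 as the metric (computed with scikit-learn, an extra dependency), and the first `k` annotated items reused as RAC examples for every prediction:

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

def evaluate_with_k_rac_examples(pipeline, dataset, labels, k, threshold=0.5):
    """dataset: list of dicts with "text" and "true_labels" (hypothetical format)."""
    # The first k annotated items become the retrieval-augmented context examples.
    rac_examples = [
        {"text": item["text"], "all_labels": labels, "true_labels": item["true_labels"]}
        for item in dataset[:k]
    ]
    predictions, references = [], []
    for item in dataset[k:]:
        # Only pass rac_examples when k > 0, mirroring the zero-example baseline.
        kwargs = {"rac_examples": rac_examples} if rac_examples else {}
        results = pipeline(item["text"], labels, threshold=threshold, **kwargs)[0]
        # The pipeline is assumed to return only labels scoring above the threshold.
        predictions.append([r["label"] for r in results])
        references.append(item["true_labels"])

    # Binarize the label lists and score with micro-averaged F1.
    binarizer = MultiLabelBinarizer(classes=labels)
    y_true = binarizer.fit_transform(references)
    y_pred = binarizer.transform(predictions)
    return f1_score(y_true, y_pred, average="micro")
```

Sweeping `k` from 0 to 3 with a helper like this would produce one row of the table above for a given dataset.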
+ Here you can see how the performance of the model improves with more examples, in comparison to other models (the AVERAGE column for gliclass-base-v2.0-rac-init is recomputed as the mean of the three dataset scores shown):
+
  | Model | Num Examples | sst5 | ag_news | emotion | **AVERAGE:** |
  |------------------------------------|------------------|--------|---------|--------------|----------|
+ | gliclass-base-v2.0-rac-init | 0 | 0.0599 | 0.6791 | 0.4472 | 0.3954 |
+ | gliclass-base-v2.0-rac-init | 8 | 0.2887 | 0.8496 | 0.5088 | 0.5490 |
+ | gliclass-base-v2.0-rac-init | Weak Supervision | 0.0744 | 0.6546 | 0.4187 | 0.3826 |
  | gliclass-modern-large-v2.0-init | 0 | 0.1123 | 0.6933 | 0.3746 | 0.3934 |
  | gliclass-modern-large-v2.0-init | 8 | 0.5098 | 0.8339 | 0.5010 | 0.6149 |
  | gliclass-modern-large-v2.0-init | Weak Supervision | 0.0951 | 0.6478 | 0.4520 | 0.3983 |