---
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: Despite the widespread use of genome-based methods for taxonomic classification,
    some researchers (e.g., Brown and Caporaso, 2012) argue that a polyphasic approach,
    which combines multiple lines of evidence, remains essential for accurate Actinobacteria
    taxonomy.
- text: This study adds to the existing literature on late-onset sepsis in very low
    birth weight neonates by providing insights into the clinical characteristics,
    microbiological etiologies, and outcomes of these infections based on a large,
    multicenter database.
- text: The model-based clustering algorithm, specifically the Gaussian Mixture Model,
    effectively identified distinct clusters in the data with high accuracy.
- text: The study demonstrates that waste cooking oil can be effectively converted
    into biodiesel using the proposed process design, yielding a high-quality fuel
    with significant reductions in greenhouse gas emissions.
- text: TopHat and Cufflinks have been shown to outperform other tools in accurately
    aligning RNA-seq reads and quantifying gene and transcript expression levels,
    respectively (Kim et al., 2013; Trapnell et al., 2012)
inference: true
model-index:
- name: SetFit with sentence-transformers/all-MiniLM-L6-v2
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.47572815533980584
      name: Accuracy
---

# SetFit with sentence-transformers/all-MiniLM-L6-v2

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification.
This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 256 tokens
- **Number of Classes:** 103 classes

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label |
|:------|
| Acknowledging limitation(s) whilst stating a finding or contribution |
| Advising cautious interpretation of the findings |
| Commenting on the findings |
| Commenting on the strengths of the current study |
| Comparing the result: contradicting previous findings |
| Comparing the result: supporting previous findings |
| Contrasting sources with ‘however’ for emphasis |
| Describing previously used methods |
| Describing questionnaire design |
| Describing the characteristics of the participants |
| Describing the limitations of the current study |
| Describing the process: adverbs of manner |
| Describing the process: expressing purpose with for |
| Describing the process: infinitive of purpose |
| Describing the process: sequence words |
| Describing the process: statistical procedures |
| Describing the process: typical verbs in the passive form |
| Describing the process: using + instrument |
| Describing the research design and the methods used |
| Describing what other writers do in their published work |
| Detailing specific limitations |
| Establishing the importance of the topic for the discipline |
| Establishing the importance of the topic for the discipline: time frame given |
| Establishing the importance of the topic for the world or society |
| Establishing the importance of the topic for the world or society: time frame given |
| Establising the importance of the topic as a problem to be addressed |
| Explaining keywords (also refer to Defining Terms) |
| Explaining the provenance of articles for review |
| Explaining the provenance of the participants |
| Explaining the significance of the current study |
| Explaining the significance of the findings or contribution of the study |
| General comments on the relevant literature |
| General reference to previous research or scholarship: highlighting negative outcomes |
| Giving reasons for personal interest in the research (sometimes found in the humanities, and the applied human sciences) |
| Giving reasons why a particular method was adopted |
| Giving reasons why a particular method was rejected |
| Highlighting inadequacies or weaknesses of previous studies (also refer to Being Critical) |
| Highlighting interesting or surprising results |
| Highlighting significant data in a table or chart |
| Identifying a controversy within the field of study |
| Identifying a knowledge gap in the field of study |
| Implications and/or recommendations for practice or policy |
| Indicating an expected outcome |
| Indicating an unexpected outcome |
| Indicating criteria for selection or inclusion in the study |
| Indicating methodological problems or limitations |
| Indicating missing, weak, or contradictory evidence |
| Indicating the methodology for the current research |
| Indicating the use of an established method |
| Introducing the limitations of the current study |
| Making recommendations for further research work |
| Noting implications of the findings |
| Noting the lack of or paucity of previous research |
| Offering an explanation for the findings |
| Outlining the structure of a short paper |
| Outlining the structure of a thesis or dissertation |
| Pointing out interesting or important findings |
| Previewing a chapter |
| Previous research: A historic perspective |
| Previous research: Approaches taken |
| Previous research: What has been established or proposed |
| Previous research: area investigated as the sentence object |
| Previous research: area investigated as the sentence subject |
| Previous research: highlighting negative outcomes |
| Providing background information: reference to the literature |
| Providing background information: reference to the purpose of the study |
| Reference to previous research: important studies |
| Referring back to the purpose of the paper or study |
| Referring back to the research aims or procedures |
| Referring to a single investigation in the past: investigation prominent |
| Referring to a single investigation in the past: researcher prominent |
| Referring to another writer’s idea(s) or position |
| Referring to data in a table or chart |
| Referring to important texts in the area of interest |
| Referring to previous work to establish what is already known |
| Referring to secondary sources |
| Referring to the literature to justify a method or approach |
| Reporting positive and negative reactions |
| Restating a result or one of several results |
| Setting out the research questions or hypotheses |
| Some ways of introducing quotations |
| Stating a negative result |
| Stating a positive result |
| Stating purpose of the current research with reference to gaps or issues in the literature |
| Stating the aims of the current research (note frequent use of past tense) |
| Stating the focus, aim, or argument of a short paper |
| Stating the purpose of the thesis, dissertation, or research article (note use of present tense) |
| Stating what is currently known about the topic |
| Suggesting general hypotheses |
| Suggesting implications for what is already known |
| Suggestions for future work |
| Summarising the literature review |
| Summarising the main research findings |
| Summarising the results section |
| Summarising the studies reviewed |
| Surveys and interviews: Introducing excerpts from interview data |
| Surveys and interviews: Reporting participants’ views |
| Surveys and interviews: Reporting proportions |
| Surveys and interviews: Reporting response rates |
| Surveys and interviews: Reporting themes |
| Synthesising sources: contrasting evidence or ideas |
| Synthesising sources: supporting evidence or ideas |
| Transition: moving to the next result |

## Evaluation

### Metrics
| Label   | Accuracy |
|:--------|:---------|
| **all** | 0.4757   |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.
```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Corran/SciGenSetfit24")
# Run inference on a single sentence; returns its predicted label
preds = model("The model-based clustering algorithm, specifically the Gaussian Mixture Model, effectively identified distinct clusters in the data with high accuracy.")
```

## Training Details

### Training Set Metrics
| Training set | Min | Mean    | Max |
|:-------------|:----|:--------|:----|
| Word count   | 6   | 28.4192 | 62  |

| Label | Training Sample Count |
|:------|:----------------------|
| Acknowledging limitation(s) whilst stating a finding or contribution | 50 |
| Advising cautious interpretation of the findings | 50 |
| Commenting on the findings | 50 |
| Commenting on the strengths of the current study | 50 |
| Comparing the result: contradicting previous findings | 50 |
| Comparing the result: supporting previous findings | 50 |
| Contrasting sources with ‘however’ for emphasis | 50 |
| Describing previously used methods | 50 |
| Describing questionnaire design | 50 |
| Describing the characteristics of the participants | 50 |
| Describing the limitations of the current study | 50 |
| Describing the process: adverbs of manner | 50 |
| Describing the process: expressing purpose with for | 50 |
| Describing the process: infinitive of purpose | 50 |
| Describing the process: sequence words | 50 |
| Describing the process: statistical procedures | 50 |
| Describing the process: typical verbs in the passive form | 50 |
| Describing the process: using + instrument | 50 |
| Describing the research design and the methods used | 50 |
| Describing what other writers do in their published work | 50 |
| Detailing specific limitations | 50 |
| Establishing the importance of the topic for the discipline | 50 |
| Establishing the importance of the topic for the discipline: time frame given | 50 |
| Establishing the importance of the topic for the world or society | 50 |
| Establishing the importance of the topic for the world or society: time frame given | 50 |
| Establising the importance of the topic as a problem to be addressed | 50 |
| Explaining keywords (also refer to Defining Terms) | 50 |
| Explaining the provenance of articles for review | 50 |
| Explaining the provenance of the participants | 50 |
| Explaining the significance of the current study | 50 |
| Explaining the significance of the findings or contribution of the study | 50 |
| General comments on the relevant literature | 50 |
| General reference to previous research or scholarship: highlighting negative outcomes | 50 |
| Giving reasons for personal interest in the research (sometimes found in the humanities, and the applied human sciences) | 50 |
| Giving reasons why a particular method was adopted | 50 |
| Giving reasons why a particular method was rejected | 50 |
| Highlighting inadequacies or weaknesses of previous studies (also refer to Being Critical) | 50 |
| Highlighting interesting or surprising results | 50 |
| Highlighting significant data in a table or chart | 50 |
| Identifying a controversy within the field of study | 50 |
| Identifying a knowledge gap in the field of study | 50 |
| Implications and/or recommendations for practice or policy | 50 |
| Indicating an expected outcome | 50 |
| Indicating an unexpected outcome | 50 |
| Indicating criteria for selection or inclusion in the study | 50 |
| Indicating methodological problems or limitations | 50 |
| Indicating missing, weak, or contradictory evidence | 50 |
| Indicating the methodology for the current research | 50 |
| Indicating the use of an established method | 50 |
| Introducing the limitations of the current study | 50 |
| Making recommendations for further research work | 50 |
| Noting implications of the findings | 50 |
| Noting the lack of or paucity of previous research | 50 |
| Offering an explanation for the findings | 50 |
| Outlining the structure of a short paper | 50 |
| Outlining the structure of a thesis or dissertation | 50 |
| Pointing out interesting or important findings | 50 |
| Previewing a chapter | 50 |
| Previous research: A historic perspective | 50 |
| Previous research: Approaches taken | 50 |
| Previous research: What has been established or proposed | 50 |
| Previous research: area investigated as the sentence object | 50 |
| Previous research: area investigated as the sentence subject | 50 |
| Previous research: highlighting negative outcomes | 50 |
| Providing background information: reference to the literature | 50 |
| Providing background information: reference to the purpose of the study | 50 |
| Reference to previous research: important studies | 50 |
| Referring back to the purpose of the paper or study | 50 |
| Referring back to the research aims or procedures | 50 |
| Referring to a single investigation in the past: investigation prominent | 50 |
| Referring to a single investigation in the past: researcher prominent | 50 |
| Referring to another writer’s idea(s) or position | 50 |
| Referring to data in a table or chart | 50 |
| Referring to important texts in the area of interest | 50 |
| Referring to previous work to establish what is already known | 50 |
| Referring to secondary sources | 50 |
| Referring to the literature to justify a method or approach | 50 |
| Reporting positive and negative reactions | 50 |
| Restating a result or one of several results | 50 |
| Setting out the research questions or hypotheses | 50 |
| Some ways of introducing quotations | 50 |
| Stating a negative result | 50 |
| Stating a positive result | 50 |
| Stating purpose of the current research with reference to gaps or issues in the literature | 50 |
| Stating the aims of the current research (note frequent use of past tense) | 50 |
| Stating the focus, aim, or argument of a short paper | 50 |
| Stating the purpose of the thesis, dissertation, or research article (note use of present tense) | 50 |
| Stating what is currently known about the topic | 50 |
| Suggesting general hypotheses | 50 |
| Suggesting implications for what is already known | 50 |
| Suggestions for future work | 50 |
| Summarising the literature review | 50 |
| Summarising the main research findings | 50 |
| Summarising the results section | 50 |
| Summarising the studies reviewed | 50 |
| Surveys and interviews: Introducing excerpts from interview data | 50 |
| Surveys and interviews: Reporting participants’ views | 50 |
| Surveys and interviews: Reporting proportions | 50 |
| Surveys and interviews: Reporting response rates | 50 |
| Surveys and interviews: Reporting themes | 50 |
| Synthesising sources: contrasting evidence or ideas | 50 |
| Synthesising sources: supporting evidence or ideas | 50 |
| Transition: moving to the next result | 50 |

### Training Hyperparameters
- batch_size: (300, 300)
- num_epochs: (1, 1)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 5
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False

### Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0058 | 1    | 0.4364        | -               |
| 0.2907 | 50   | 0.1895        | -               |
| 0.5814 | 100  | 0.1527        | -               |
| 0.8721 | 150  | 0.139         | -               |

### Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 3.1.1
- Transformers: 4.39.0
- PyTorch: 2.5.1+cu121
- Datasets: 3.1.0
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```
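## Training Step Check

The Step and Epoch columns in the Training Results table can be reconciled with the training set size and hyperparameters listed earlier in this card. The sketch below relies on two assumptions. First, when `num_iterations` is set, SetFit draws one positive and one negative sentence pair for each training sentence in each iteration. Second, the last partial batch of an epoch is kept. Under these assumptions, the 5,150 training sentences (103 labels × 50) produce 51,500 pairs. At a batch size of 300, that is 172 batches per epoch, so each logged epoch value should equal the step number divided by 172.

```python
import math

# Figures taken from this card: 103 labels with 50 training sentences each,
# plus values from the "Training Hyperparameters" list.
NUM_LABELS = 103
SAMPLES_PER_LABEL = 50
NUM_ITERATIONS = 5
EMBEDDING_BATCH_SIZE = 300  # first value of batch_size: (300, 300)

# Assumption: with num_iterations set, SetFit generates one positive and one
# negative pair per training sentence per iteration.
num_sentences = NUM_LABELS * SAMPLES_PER_LABEL      # 5150
num_pairs = 2 * NUM_ITERATIONS * num_sentences      # 51500
# Assumption: the final partial batch is not dropped, so round up.
steps_per_epoch = math.ceil(num_pairs / EMBEDDING_BATCH_SIZE)  # 172

# (step, epoch) rows copied from the "Training Results" table.
logged = [(1, 0.0058), (50, 0.2907), (100, 0.5814), (150, 0.8721)]

for step, epoch in logged:
    # Epoch fraction implied by the assumed step count, rounded like the table.
    implied = round(step / steps_per_epoch, 4)
    status = "match" if math.isclose(implied, epoch) else "MISMATCH"
    print(f"step {step:>3}: logged {epoch:.4f}, step/{steps_per_epoch} = {implied:.4f} ({status})")
```

This is only a consistency check on the reported numbers. It does not reproduce the training loop. If all four rows match, the logged epochs agree with the pair-generation assumption for this run.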