This is the model used to create the Biomed-Enriched dataset. It is based on xlm-roberta-base.
The model was trained on a set of 400,000 paragraphs from PubMed Central, which were annotated by the Llama 3.1 70B Instruct model.
This classifier was created to scale the initial high-quality annotations to the entire PubMed Open Access dataset. This distillation process enabled the creation of the large-scale Biomed-Enriched dataset while maintaining annotation consistency.
The model predicts the following outputs:
Domain: Clinical, Biomedical, Other
Document type: Clinical Case, Study, Review, Other
Quality score: 1 (low quality) to 5 (high quality)

Base model: FacebookAI/xlm-roberta-base
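
As a rough illustration of how the outputs described above might be turned into labels, the sketch below decodes one set of raw scores per output. It makes several assumptions that this card does not confirm: that each output is a separate classification head, that the quality score is treated as a 5-way classification rather than a regression, and that the labels are ordered as listed. The function names `decode_head` and `decode_prediction` are hypothetical and not part of any released API.

```python
import math

# Label sets come from the model description above.
# ASSUMPTION: the index order within each head matches the listing order.
DOMAIN_LABELS = ["Clinical", "Biomedical", "Other"]
DOC_TYPE_LABELS = ["Clinical Case", "Study", "Review", "Other"]
# ASSUMPTION: quality is predicted as one of five classes (1 to 5).
QUALITY_SCORES = [1, 2, 3, 4, 5]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_head(logits, labels):
    """Return the most probable label and its probability for one head."""
    if len(logits) != len(labels):
        raise ValueError("expected %d logits, got %d" % (len(labels), len(logits)))
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


def decode_prediction(domain_logits, doc_type_logits, quality_logits):
    """Decode the three (hypothetical) heads into a readable record."""
    domain, domain_p = decode_head(domain_logits, DOMAIN_LABELS)
    doc_type, doc_type_p = decode_head(doc_type_logits, DOC_TYPE_LABELS)
    quality, quality_p = decode_head(quality_logits, QUALITY_SCORES)
    return {
        "domain": domain,
        "domain_confidence": domain_p,
        "document_type": doc_type,
        "document_type_confidence": doc_type_p,
        "quality": quality,
        "quality_confidence": quality_p,
    }


# Example with made-up logits for a single paragraph.
example = decode_prediction(
    domain_logits=[2.0, 0.1, -1.0],
    doc_type_logits=[0.0, 3.0, 0.5, 0.2],
    quality_logits=[0.1, 0.2, 0.3, 2.5, 0.4],
)
```

Keeping the label lists separate from the decoding logic means that a different head layout (for instance, a regression output for quality) would only require replacing the quality branch.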