---
library_name: transformers
license: mit
base_model:
- FacebookAI/xlm-roberta-base
---
# Biomedical-Enriched Classifier
This is the model used to create the [**Biomed-Enriched**](https://huggingface.co/datasets/almanach/Biomed-Enriched) dataset.
## Model Details
- **Base Model:** `xlm-roberta-base`
- **Model Type:** Multi-task model combining multi-label classification and regression.
- **Description:** This model was fine-tuned to classify paragraphs from biomedical texts by domain and document type, and to predict an educational quality score via regression.
## Training
The model was trained on a set of 400,000 paragraphs from PubMed Central, which were annotated by the [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) model.
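The card describes the training data but not how the three tasks are optimized together. A common setup for this kind of multi-task model, assumed here rather than confirmed by the authors, sums a softmax cross-entropy loss for each classification task with a squared-error loss for the quality score. The pure-Python sketch below computes that combined loss for a single example. `multitask_loss`, `softmax_cross_entropy` and the equal default weights are hypothetical. The sketch also treats each classification head as single-label. If the heads are truly multi-label, per-label sigmoid with binary cross-entropy would replace the softmax term.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one example, given raw logits and the gold class index."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(domain_logits, domain_target, doc_logits, doc_target,
                   quality_pred, quality_target, weights=(1.0, 1.0, 1.0)):
    """Hypothetical joint objective: weighted sum of two classification
    losses (domain, document type) and one squared-error regression loss."""
    w_domain, w_doc, w_quality = weights
    return (w_domain * softmax_cross_entropy(domain_logits, domain_target)
            + w_doc * softmax_cross_entropy(doc_logits, doc_target)
            + w_quality * (quality_pred - quality_target) ** 2)
```

With uniform logits and a perfect quality prediction, the loss reduces to `log(3) + log(4)`, the chance-level cross-entropy of the two label sets.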
## Purpose
This classifier was created to scale the initial high-quality LLM annotations to the entire PubMed Central Open Access dataset. Distilling the LLM annotator into this smaller model made it feasible to build the large-scale Biomed-Enriched dataset while keeping annotations consistent.
## Model Outputs
The model predicts the following outputs:
### Domain (Classification)
- `Clinical`
- `Biomedical`
- `Other`
### Document Type (Classification)
- `Clinical Case`
- `Study`
- `Review`
- `Other`
### Educational Quality (Regression)
- A regression score from `1` (low quality) to `5` (high quality).
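The card lists the label sets but not how raw model outputs map onto them. The sketch below shows one plausible post-processing step under stated assumptions: each classification head yields one logit per label in the order listed above, the top-scoring label is taken (treating each head as single-label), and the regression output is clipped to the documented 1 to 5 range. `decode_outputs` and the label-order constants are hypothetical and are not part of the released model's API.

```python
# Hypothetical decoding of the three head outputs into readable labels.
# Assumptions (not stated in this card): label order matches the lists above,
# the highest logit wins, and the regression head returns one float that
# may fall outside [1, 5].

DOMAIN_LABELS = ["Clinical", "Biomedical", "Other"]
DOC_TYPE_LABELS = ["Clinical Case", "Study", "Review", "Other"]


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_outputs(domain_logits, doc_type_logits, quality_raw):
    """Map raw head outputs to label names and a quality score clipped to 1..5."""
    if len(domain_logits) != len(DOMAIN_LABELS):
        raise ValueError("expected one logit per domain label")
    if len(doc_type_logits) != len(DOC_TYPE_LABELS):
        raise ValueError("expected one logit per document-type label")
    quality = min(5.0, max(1.0, float(quality_raw)))
    return {
        "domain": DOMAIN_LABELS[argmax(domain_logits)],
        "document_type": DOC_TYPE_LABELS[argmax(doc_type_logits)],
        "educational_quality": quality,
    }


if __name__ == "__main__":
    print(decode_outputs([0.2, 2.1, -1.0], [1.5, 0.3, -0.2, 0.0], 4.37))
```

Clipping rather than rounding keeps the score continuous, which is convenient if downstream filtering uses a threshold such as "quality at least 3"; rounding to the nearest integer is an equally reasonable choice.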