---
library_name: transformers
license: mit
base_model:
- FacebookAI/xlm-roberta-base
---

# Biomedical-Enriched Classifier

This is the model used to create the [**Biomed-Enriched**](https://huggingface.co/datasets/almanach/Biomed-Enriched) dataset.

## Model Details

- **Base Model:** `xlm-roberta-base`
- **Model Type:** Multi-task model combining multi-label classification and regression.
- **Description:** This model was fine-tuned to classify paragraphs from biomedical texts by domain and document type, and to predict an educational quality score via regression.

## Training

The model was trained on 400,000 paragraphs from PubMed Central that were annotated by the [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) model.

## Purpose

This classifier was created to scale the initial high-quality LLM annotations to the entire PubMed Open Access dataset. This distillation process made it possible to build the large-scale Biomed-Enriched dataset while keeping the annotations consistent.

## Model Outputs

The model predicts the following outputs:

### Domain (Classification)

- `Clinical`
- `Biomedical`
- `Other`

### Document Type (Classification)

- `Clinical Case`
- `Study`
- `Review`
- `Other`

### Educational Quality (Regression)

- A regression score from `1` (low quality) to `5` (high quality).
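The three outputs above could be turned into readable labels roughly as follows. This is a minimal sketch under stated assumptions, not the released model's inference code. It assumes that each classification head returns one raw logit per label, in the order listed above, and that a single best label is picked per head with a softmax and argmax. It also assumes that the regression head returns one float, which is clipped to the documented 1 to 5 range. The function `decode_outputs` and the label-order constants are hypothetical, and the code does not load the checkpoint.

```python
import math
from typing import Dict, List, Sequence

# ASSUMPTION: head output order matches the label order listed in this card.
DOMAIN_LABELS: List[str] = ["Clinical", "Biomedical", "Other"]
DOCUMENT_TYPE_LABELS: List[str] = ["Clinical Case", "Study", "Review", "Other"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_outputs(
    domain_logits: Sequence[float],
    doc_type_logits: Sequence[float],
    quality_raw: float,
) -> Dict[str, object]:
    """Hypothetical post-processing: argmax per classification head,
    regression output clipped to the documented [1, 5] range."""
    if len(domain_logits) != len(DOMAIN_LABELS):
        raise ValueError("expected one domain logit per domain label")
    if len(doc_type_logits) != len(DOCUMENT_TYPE_LABELS):
        raise ValueError("expected one document-type logit per label")

    domain_probs = softmax(domain_logits)
    doc_probs = softmax(doc_type_logits)
    d_idx = max(range(len(domain_probs)), key=domain_probs.__getitem__)
    t_idx = max(range(len(doc_probs)), key=doc_probs.__getitem__)

    # Keep the educational quality score inside the 1-5 scale.
    quality = min(5.0, max(1.0, float(quality_raw)))

    return {
        "domain": DOMAIN_LABELS[d_idx],
        "domain_prob": domain_probs[d_idx],
        "document_type": DOCUMENT_TYPE_LABELS[t_idx],
        "document_type_prob": doc_probs[t_idx],
        "educational_quality": quality,
    }


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    result = decode_outputs([2.1, 0.3, -1.0], [0.5, 1.7, -0.2, 0.0], 5.6)
    # domain -> "Clinical", document_type -> "Study", educational_quality -> 5.0
    print(result)
```

If the classification heads are truly multi-label, as the model type suggests, a per-label sigmoid with a threshold would replace the softmax and argmax step. The single-label choice here is only a simplification for illustration.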