	---
	library_name: transformers
	license: mit
	base_model:
	- FacebookAI/xlm-roberta-base
	---

	# Biomedical-Enriched Classifier

	This is the model used to create the [Biomed-Enriched](https://huggingface.co/datasets/almanach/Biomed-Enriched) dataset.

	## Model Details

	- Base Model: `xlm-roberta-base`
	- Model Type: Multi-task model combining multi-label classification and regression.
	- Description: This model was fine-tuned to classify paragraphs from biomedical texts by domain and document type, and to predict an educational quality score via regression.
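
This card does not describe the training objective. As an illustration only, a multi-task setup like the one described above is often trained by summing one loss per task. The sketch below assumes a softmax cross-entropy loss for each classification head, a squared error for the quality score, and a hypothetical `regression_weight` parameter; none of these choices are confirmed by the card.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of one example given raw logits and the true class index."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(domain_logits, domain_target,
                   doc_type_logits, doc_type_target,
                   quality_pred, quality_target,
                   regression_weight=1.0):
    """Sum of two classification losses and a weighted squared error.

    The choice of losses and the weighting are assumptions made for
    illustration, not details taken from this model card.
    """
    domain_loss = softmax_cross_entropy(domain_logits, domain_target)
    doc_type_loss = softmax_cross_entropy(doc_type_logits, doc_type_target)
    quality_loss = (quality_pred - quality_target) ** 2
    return domain_loss + doc_type_loss + regression_weight * quality_loss
```

Summing the terms lets a single backward pass update the shared encoder for all three tasks; in practice the relative weight of the regression term would need tuning.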

	## Training

	The model was trained on 400,000 paragraphs from PubMed Central, each annotated by the [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) model.

	## Purpose

	This classifier was created to scale the initial high-quality annotations from Llama 3.1 70B Instruct to the entire PubMed Open Access dataset. Distilling the annotator into a smaller model made it possible to build the large-scale Biomed-Enriched dataset while keeping annotations consistent.
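
Scaling annotations to a full corpus usually means streaming paragraphs through the classifier in fixed-size batches. The following is a minimal sketch of that loop, not the pipeline used to build Biomed-Enriched. `predict_batch` is a hypothetical callable standing in for the classifier: it is assumed to take a list of paragraphs and return one dict of predictions per paragraph.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def annotate_corpus(paragraphs: Iterable[str],
                    predict_batch: Callable[[List[str]], List[Dict]],
                    batch_size: int = 32) -> Iterator[Dict]:
    """Attach predictions to each paragraph, one batch at a time.

    `predict_batch` is a placeholder for the real classifier call.
    """
    for batch in batched(paragraphs, batch_size):
        for paragraph, prediction in zip(batch, predict_batch(batch)):
            yield {"text": paragraph, **prediction}
```

Because both functions are generators, the corpus never has to fit in memory at once, which matters when annotating a collection the size of an open-access literature archive.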

	## Model Outputs

	The model predicts the following outputs:

	### Domain (Classification)
	- `Clinical`
	- `Biomedical`
	- `Other`

	### Document Type (Classification)
	- `Clinical Case`
	- `Study`
	- `Review`
	- `Other`

	### Educational Quality (Regression)
	- A regression score from `1` (low quality) to `5` (high quality).
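
The outputs listed above can be turned into readable labels with a small post-processing step. The sketch below assumes the model returns one vector of logits per classification head plus a raw regression value, and that class indices follow the order in which the labels are listed in this card. Both points are assumptions; check the model's configuration for the actual label mapping. Clamping the score to the `1` to `5` range is also a choice made here, not documented behavior.

```python
# Label order is assumed to match the lists in this card.
DOMAIN_LABELS = ["Clinical", "Biomedical", "Other"]
DOC_TYPE_LABELS = ["Clinical Case", "Study", "Review", "Other"]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_prediction(domain_logits, doc_type_logits, quality_raw):
    """Map raw head outputs to labels and a score clamped to [1, 5]."""
    if len(domain_logits) != len(DOMAIN_LABELS):
        raise ValueError("expected one domain logit per domain label")
    if len(doc_type_logits) != len(DOC_TYPE_LABELS):
        raise ValueError("expected one logit per document type label")
    return {
        "domain": DOMAIN_LABELS[argmax(domain_logits)],
        "document_type": DOC_TYPE_LABELS[argmax(doc_type_logits)],
        # A regression head can overshoot the annotated range, so clamp it.
        "educational_quality": min(5.0, max(1.0, float(quality_raw))),
    }


example = decode_prediction([0.1, 2.3, -1.0], [0.0, 0.5, 3.2, -0.7], 5.4)
```

Here `example` holds the decoded labels for one paragraph; the input numbers are made up for illustration.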