Biomedical-Enriched Classifier

This is the model used to create the Biomed-Enriched dataset.

Model Details

  • Base Model: xlm-roberta-base
  • Model Type: Multi-task model combining multi-label classification and regression.
  • Description: This model was fine-tuned to classify paragraphs from biomedical texts by domain and document type, and to predict an educational quality score via regression.
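
As a rough illustration of what "multi-task" can mean here, the sketch below combines two classification losses with one regression loss into a single training objective. This card does not describe the actual loss, so everything in the block is an assumption: the use of softmax cross-entropy per classification head, mean squared error for the quality score, the unweighted sum, and the hypothetical names `cross_entropy`, `multitask_loss`, and `regression_weight`.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(domain_logits, domain_target,
                   doc_type_logits, doc_type_target,
                   predicted_score, target_score,
                   regression_weight=1.0):
    """Hypothetical combined objective: two classification terms plus MSE.

    ASSUMPTION: the real training objective is not documented in this card;
    this is only one common way to train a shared encoder with several heads.
    """
    domain_loss = cross_entropy(domain_logits, domain_target)
    doc_type_loss = cross_entropy(doc_type_logits, doc_type_target)
    regression_loss = (predicted_score - target_score) ** 2
    return domain_loss + doc_type_loss + regression_weight * regression_loss


# Example: 3 domain classes, 4 document-type classes, one quality score.
loss = multitask_loss(
    domain_logits=[2.0, 0.5, -1.0], domain_target=0,
    doc_type_logits=[0.1, 1.5, 0.3, -0.7], doc_type_target=1,
    predicted_score=3.4, target_score=4.0,
)
print(f"combined loss: {loss:.4f}")
```

A weight such as `regression_weight` is often used to balance heads whose losses live on different scales; whether this model used one is not stated.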

Training

The model was trained on a set of 400,000 paragraphs from PubMed Central, which were annotated by the Llama 3.1 70B Instruct model.

Purpose

This classifier was created to scale the initial high-quality annotations to the entire PubMed Open Access dataset. This distillation process enabled the creation of the large-scale Biomed-Enriched dataset while maintaining annotation consistency.

Model Outputs

The model predicts the following outputs:

Domain (Classification)

  • Clinical
  • Biomedical
  • Other

Document Type (Classification)

  • Clinical Case
  • Study
  • Review
  • Other

Educational Quality (Regression)

  • A regression score from 1 (low quality) to 5 (high quality).
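
The three outputs listed above could be turned into a readable annotation along the following lines. This is a minimal sketch, not the model's actual inference code: it assumes one logit vector per classification head, a single label chosen per head, the label order shown in this card, and a raw regression value that is clamped to the 1 to 5 range. The names `decode_outputs`, `DOMAIN_LABELS`, and `DOCUMENT_TYPE_LABELS` are hypothetical.

```python
import math

# ASSUMPTION: index order follows the lists in this card; check the model
# configuration for the real label mapping before relying on it.
DOMAIN_LABELS = ["Clinical", "Biomedical", "Other"]
DOCUMENT_TYPE_LABELS = ["Clinical Case", "Study", "Review", "Other"]


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_outputs(domain_logits, doc_type_logits, raw_score):
    """Map raw head outputs to labels and a bounded quality score."""
    domain_probs = softmax(domain_logits)
    doc_type_probs = softmax(doc_type_logits)
    domain_idx = domain_probs.index(max(domain_probs))
    doc_type_idx = doc_type_probs.index(max(doc_type_probs))
    return {
        "domain": DOMAIN_LABELS[domain_idx],
        "domain_confidence": domain_probs[domain_idx],
        "document_type": DOCUMENT_TYPE_LABELS[doc_type_idx],
        "document_type_confidence": doc_type_probs[doc_type_idx],
        # Clamp to the documented 1 (low) to 5 (high) range.
        "educational_quality": min(5.0, max(1.0, raw_score)),
    }


# Made-up logits for a single paragraph, for illustration only.
annotation = decode_outputs([2.0, 0.1, -1.0], [0.0, 3.0, 0.5, -2.0], 5.7)
print(annotation)
```

Keeping the per-head confidence alongside the label makes it possible to filter uncertain predictions later, for example when selecting paragraphs above a chosen quality threshold.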