
metadata

library_name: transformers
license: mit
base_model:
  - FacebookAI/xlm-roberta-base

Biomedical-Enriched Classifier

This is the model used to create the Biomed-Enriched dataset.

Model Details

Base Model: xlm-roberta-base
Model Type: Multi-task model combining multi-label classification and regression.
Description: This model was fine-tuned to classify paragraphs from biomedical texts by domain and document type, and to predict an educational quality score via regression.
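A multi-task model like this is typically trained by summing one loss per task. The sketch below illustrates such an objective under stated assumptions: softmax cross-entropy for each classification task (treating domain and document type as one choice among the classes listed under Model Outputs), squared error for the quality score, and equal task weights. The actual loss functions and weighting used for this checkpoint are not documented here, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Class counts follow the outputs listed in this card:
# Domain: Clinical, Biomedical, Other -> 3 classes
# Document Type: Clinical Case, Study, Review, Other -> 4 classes
NUM_DOMAIN = 3
NUM_DOC_TYPE = 4


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    domain_logits: Sequence[float],
    doc_type_logits: Sequence[float],
    score_pred: float,
    domain_target: int,
    doc_type_target: int,
    score_target: float,
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumption: equal weights
) -> float:
    """Weighted sum of two classification losses and one squared-error regression loss."""
    if len(domain_logits) != NUM_DOMAIN or len(doc_type_logits) != NUM_DOC_TYPE:
        raise ValueError("unexpected number of logits for a classification head")
    w_dom, w_doc, w_reg = weights
    loss_domain = cross_entropy(domain_logits, domain_target)
    loss_doc_type = cross_entropy(doc_type_logits, doc_type_target)
    loss_score = (score_pred - score_target) ** 2
    return w_dom * loss_domain + w_doc * loss_doc_type + w_reg * loss_score
```

With uniform logits and an exact score prediction, the loss reduces to log 3 + log 4 = log 12, which is a quick sanity check on the implementation.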

Training

The model was trained on a set of 400,000 paragraphs from PubMed Central, which were annotated by the Llama 3.1 70B Instruct model.

Purpose

This classifier was created to scale the initial high-quality LLM annotations to the entire PubMed Open Access dataset. Distilling the 70B-parameter annotator into an XLM-RoBERTa-base classifier made it practical to build the large-scale Biomed-Enriched dataset while keeping annotations consistent.
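Applying the distilled classifier to a whole corpus amounts to streaming paragraphs through it in batches. The sketch below illustrates that loop under stated assumptions: `predict` is a hypothetical stand-in for running this model on a list of paragraphs (it is not an API of this repository or of `transformers`), and the default batch size of 64 is arbitrary.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

# Hypothetical signature: takes a list of paragraph strings and returns one
# prediction dict per paragraph, in the same order.
PredictFn = Callable[[list[str]], list[dict]]


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def annotate_corpus(
    paragraphs: Iterable[str], predict: PredictFn, batch_size: int = 64
) -> Iterator[dict]:
    """Stream paragraphs through the classifier and attach each prediction to its text."""
    for batch in batched(paragraphs, batch_size):
        preds = predict(batch)
        if len(preds) != len(batch):
            raise ValueError("predict must return one result per input paragraph")
        for text, pred in zip(batch, preds):
            yield {"text": text, **pred}
```

Because `annotate_corpus` is a generator, it never holds more than one batch of predictions in memory, which matters when the input is an entire open-access corpus rather than a 400,000-paragraph training sample.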

Model Outputs

The model predicts the following outputs:

Domain (Classification)

Clinical
Biomedical
Other

Document Type (Classification)

Clinical Case
Study
Review
Other

Educational Quality (Regression)

A regression score from 1 (low quality) to 5 (high quality).
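The three outputs can be read out as sketched below. This is a minimal illustration assuming the model returns raw logits for each classification head and one unbounded value for the quality score; the label order, the output field names, and the clipping to the 1 to 5 range are illustrative assumptions, not documented behavior of this checkpoint.

```python
import math
from typing import Sequence

# Assumption: this card lists the classes but does not document the index
# order of the model's output heads.
DOMAIN_LABELS = ["Clinical", "Biomedical", "Other"]
DOC_TYPE_LABELS = ["Clinical Case", "Study", "Review", "Other"]


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(
    domain_logits: Sequence[float],
    doc_type_logits: Sequence[float],
    raw_score: float,
) -> dict:
    """Turn raw head outputs into labels, probabilities, and a score clipped to 1-5."""
    if len(domain_logits) != len(DOMAIN_LABELS) or len(doc_type_logits) != len(DOC_TYPE_LABELS):
        raise ValueError("unexpected number of logits for a classification head")
    dom_p = softmax(domain_logits)
    doc_p = softmax(doc_type_logits)
    # Argmax over probabilities; ties resolve to the first label.
    dom_i = max(range(len(dom_p)), key=dom_p.__getitem__)
    doc_i = max(range(len(doc_p)), key=doc_p.__getitem__)
    return {
        "domain": DOMAIN_LABELS[dom_i],
        "domain_prob": dom_p[dom_i],
        "document_type": DOC_TYPE_LABELS[doc_i],
        "document_type_prob": doc_p[doc_i],
        # A regression head is unbounded, so clip to the documented 1-5 scale.
        "educational_score": min(5.0, max(1.0, raw_score)),
    }
```

Clipping keeps downstream filters (for example, "keep paragraphs scoring at least 3") well defined even when the regression head overshoots the scale.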