We used GPT-4.1-nano to classify generic texts from OSCAR as medical or non-medical, using PubScience. We labeled 400,000 texts, of which about 40,000 were labeled positive (medical). We then trained a SequenceClassifier on 80,000 samples with a 50/50 class ratio, i.e. about 40,000 samples per class.
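The 50/50 training set described above can be illustrated with a small downsampling sketch. This is not the authors' actual pipeline: the function name `balanced_sample`, the `(text, label)` tuple format, and the convention that label `1` means medical are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (text, label); label 1 = medical is an assumed convention


def balanced_sample(examples: List[Example], n_per_class: int, seed: int = 0) -> List[Example]:
    """Draw n_per_class positives and n_per_class negatives, then shuffle them together."""
    rng = random.Random(seed)
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    if len(positives) < n_per_class or len(negatives) < n_per_class:
        raise ValueError("not enough examples in at least one class")
    # Sample without replacement from each class so the result is exactly 50/50.
    sample = rng.sample(positives, n_per_class) + rng.sample(negatives, n_per_class)
    rng.shuffle(sample)
    return sample
```

With roughly 40,000 positives among 400,000 labeled texts, `n_per_class=40_000` would keep essentially all positives and about one in nine negatives, which matches the 80,000-sample figure.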

This model can be used, for example, to approximately identify medical texts in general corpora.
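As a rough sketch of that use case, the snippet below filters a list of texts with the Hugging Face `transformers` text-classification pipeline. The helper names `load_detector` and `filter_medical`, the `0.5` threshold, and in particular the positive label name `"LABEL_1"` are assumptions; check the model's `config.json` (`id2label`) for the label names it actually emits.

```python
from typing import Callable, Dict, Iterable, List

Prediction = Dict[str, object]  # e.g. {"label": "LABEL_1", "score": 0.97}


def load_detector(model_id: str = "UMCU/DutchMedicalTextDetector_v1_HEADONLY") -> Callable[[List[str]], List[Prediction]]:
    """Return a callable that classifies a list of texts (downloads the model on first use)."""
    from transformers import pipeline  # lazy import; requires transformers and torch

    pipe = pipeline("text-classification", model=model_id)
    # truncation=True cuts long documents down to the model's maximum input length.
    return lambda texts: pipe(texts, truncation=True)


def filter_medical(
    texts: Iterable[str],
    classify: Callable[[List[str]], List[Prediction]],
    positive_label: str = "LABEL_1",  # assumed name of the "medical" class
    threshold: float = 0.5,           # assumed cut-off, tune on held-out data
) -> List[str]:
    """Keep only the texts predicted as medical with a score at or above the threshold."""
    texts = list(texts)
    predictions = classify(texts)
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == positive_label and float(pred["score"]) >= threshold
    ]
```

Typical use would be `filter_medical(corpus, load_detector())`. Because the labels come from an LLM and the classifier is approximate, the output is best treated as a candidate set rather than a clean medical corpus.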

Model size: 124M params
Tensor type: F32 (Safetensors)

Model: UMCU/DutchMedicalTextDetector_v1_HEADONLY
Base model: DTAI-KULeuven/robbert-2023-dutch-base (this model is a fine-tune of it)
