HERBERT: Leveraging UMLS Hierarchical Knowledge to Enhance Clinical Entity Normalization in Spanish

HERBERT-P is a bi-encoder for medical entity normalization in Spanish, trained with contrastive learning on synonym and parent relationships from UMLS. It is designed to improve candidate retrieval for entity linking in clinical texts.
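The bi-encoder retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Mentions and dictionary names are encoded separately by the same encoder. Dictionary vectors are computed once, and codes are ranked by cosine similarity to the mention, keeping each code's best-scoring synonym. The dictionary, its placeholder codes, and `toy_encoder` are hypothetical. `herbert_encoder` is also an assumption: it presumes the checkpoint loads with Hugging Face `transformers` (`AutoTokenizer`/`AutoModel`) and that [CLS] pooling is appropriate, which this card does not state.

```python
import math
import zlib
from typing import Callable, Sequence

Encoder = Callable[[Sequence[str]], list[list[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_index(dictionary: Sequence[tuple[str, str]], encode: Encoder) -> list[tuple[str, str, list[float]]]:
    """Encode every (code, name) entry once, independently of any mention."""
    vectors = encode([name for _, name in dictionary])
    return [(code, name, vec) for (code, name), vec in zip(dictionary, vectors)]


def retrieve(mention: str, index, encode: Encoder, k: int = 5) -> list[tuple[str, str, float]]:
    """Rank codes by their best-matching name; return the top-k (code, name, score)."""
    query = encode([mention])[0]
    best: dict[str, tuple[str, float]] = {}
    for code, name, vec in index:
        score = cosine(query, vec)
        if code not in best or score > best[code][1]:
            best[code] = (name, score)
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(code, name, score) for code, (name, score) in ranked[:k]]


def toy_encoder(texts: Sequence[str], dim: int = 256) -> list[list[float]]:
    """Hypothetical offline stand-in for HERBERT-P: hashed, lower-cased character-trigram counts."""
    out = []
    for text in texts:
        vec = [0.0] * dim
        t = text.lower()
        for i in range(len(t) - 2):
            vec[zlib.crc32(t[i:i + 3].encode("utf-8")) % dim] += 1.0
        out.append(vec)
    return out


def herbert_encoder(model_id: str = "ICB-UMA/HERBERT-P-30") -> Encoder:
    """Assumed loader for the real model; [CLS] pooling is a guess, not documented on the card."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).eval()

    def encode(texts: Sequence[str]) -> list[list[float]]:
        batch = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state
        return hidden[:, 0].tolist()  # first-token ([CLS]) embedding per text

    return encode


# Hypothetical dictionary; the codes are placeholders, not real SNOMED CT identifiers.
dictionary = [
    ("C1", "diabetes mellitus"),
    ("C1", "diabetes sacarina"),
    ("C2", "hipertensión arterial"),
    ("C3", "cefalea"),
    ("C3", "dolor de cabeza"),
]
index = build_index(dictionary, toy_encoder)
print(retrieve("diabetes mellitus tipo 2", index, toy_encoder, k=2))
# With the real model (downloads weights): enc = herbert_encoder(), then pass enc to both calls.
```

Because the dictionary side never sees the mention, its vectors can be precomputed and reused across queries. For a large terminology, an approximate nearest-neighbour index could replace the linear scan shown here.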

Key features:

  • Base model: PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
  • Trained with 30 positive pairs per anchor, drawn from UMLS synonyms and parent concepts
  • Task: normalization of disease, procedure, and symptom mentions to SNOMED CT/UMLS codes
  • Domain: Spanish biomedical and clinical texts
  • Corpora: DisTEMIST, MedProcNER, SympTEMIST
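One way to build the synonym and parent positives is sketched below. It assumes UMLS has already been reduced to two hypothetical mappings: concept ID to names, and concept ID to parent concept IDs. Each anchor is capped at 30 positives by seeded random sampling. The card does not say how positives are chosen when more than 30 are available, nor which contrastive loss consumes the pairs, so both the sampling rule and the data layout are assumptions.

```python
import random
from typing import Mapping, Sequence


def positive_pairs(
    names: Mapping[str, Sequence[str]],
    parents: Mapping[str, Sequence[str]],
    max_positives: int = 30,
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Build (anchor, positive) pairs from synonyms and parent-concept names.

    names:   concept ID -> surface forms (synonyms) of that concept
    parents: concept ID -> IDs of its direct parents in the hierarchy
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    pairs: list[tuple[str, str]] = []
    for concept, synonyms in names.items():
        # Names of every direct parent concept also count as positives.
        parent_names = [n for p in parents.get(concept, ()) for n in names.get(p, ())]
        for anchor in synonyms:
            # Other synonyms first, then parent names; drop duplicates and the anchor itself.
            candidates = list(dict.fromkeys(
                c for c in [*synonyms, *parent_names] if c != anchor
            ))
            # Cap the positives per anchor (30 in this model's configuration).
            if len(candidates) > max_positives:
                candidates = rng.sample(candidates, max_positives)
            pairs.extend((anchor, positive) for positive in candidates)
    return pairs


# Hypothetical toy hierarchy; the concept IDs are placeholders, not real UMLS CUIs.
names = {
    "C_T2DM": ["diabetes mellitus tipo 2", "DM2"],
    "C_DM": ["diabetes mellitus"],
}
parents = {"C_T2DM": ["C_DM"]}
pairs = positive_pairs(names, parents)
print(pairs)
```

Including parent names as positives pulls a specific concept toward its more general ancestor in embedding space. This is the hierarchical signal the model name refers to.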

Benchmark Results

Corpus        Top-1   Top-5   Top-25   Top-200
DisTEMIST     0.588   0.723   0.803    0.867
SympTEMIST    0.635   0.784   0.882    0.946
MedProcNER    0.651   0.765   0.838    0.892
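The scores above read as top-k accuracy: the fraction of mentions whose gold code appears among the first k retrieved candidates. The card does not define the metric explicitly, so this reading is an assumption. A minimal sketch of that computation, using hypothetical ranked lists:

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of mentions whose gold code is among the first k ranked candidates."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for candidates, code in zip(ranked, gold) if code in candidates[:k])
    return hits / len(gold)


# Hypothetical retrieval output for three mentions (codes are placeholders).
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
gold = ["A", "A", "A"]
for k in (1, 2, 3):
    print(f"Top-{k}: {top_k_accuracy(ranked, gold, k):.3f}")
# Top-1: 0.333
# Top-2: 0.667
# Top-3: 1.000
```

A hit at rank k is also a hit at every larger k, so the metric can only rise with k. This is why each row of the table increases from Top-1 to Top-200.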

Model details

  • Model size: 126M params
  • Tensor type: F32 (Safetensors weights)

Model repository: ICB-UMA/HERBERT-P-30