ICB-UMA/ClinLinker

@@ -1,50 +1,70 @@
 ---
-license: mit
-language: es
 tags:
-  - biomedical
-  - spanish
-  - entity-linking
-  - sapbert
-  - bi-encoder
-  - umls
-  - clinical
 ---
-# ClinLinker
-**ClinLinker** is a Spanish biomedical bi-encoder trained following the SapBERT approach using only concepts from the Spanish UMLS. This model is designed for medical entity linking in clinical texts written in Spanish.
-## 🧠 Training Details
-- Base model: `PlanTL-GOB-ES/roberta-base-biomedical-clinical-es`
-- Data: UMLS Spanish concepts
-- Strategy: No hierarchical knowledge, only direct synonym pairs (term ↔ CUI)
-## 📚 Citation
-> Gallego, F., López-García, G., Gasco-Sánchez, L., Krallinger, M., Veredas, F.J. (2024). ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_19
-## 💡 Recommended Usage
-We recommend using this model together with:
-- [Faiss](https://github.com/facebookresearch/faiss) for similarity search
-- Or the `FaissEncoder` utility available at [ICB-UMA/KnowledgeGraph](https://github.com/ICB-UMA/KnowledgeGraph)
-## 🧪 Example: Encoding a Spanish Mention
 ```python
-from transformers import AutoTokenizer, AutoModel
 import torch
-tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker")
 model = AutoModel.from_pretrained("ICB-UMA/ClinLinker")
 mention = "insuficiencia renal aguda"
-inputs = tokenizer(mention, return_tensors="pt", padding=True, truncation=True)
 with torch.no_grad():
     outputs = model(**inputs)
-embedding = outputs.last_hidden_state[:, 0, :]  # CLS token
-print(embedding.shape)

 ---
+license: apache-2.0
+language:
+- es
+base_model:
+- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
 tags:
+- medical
+- spanish
+- bi-encoder
+- entity-linking
+- sapbert
+- umls
+- snomed-ct
 ---
+# **ClinLinker**
+## Model Description
+ClinLinker is a state-of-the-art bi-encoder model for medical entity linking (MEL) in Spanish, optimized for clinical-domain tasks. It enriches concept representations with synonyms from the UMLS and SNOMED-CT ontologies. The model was trained with a contrastive-learning strategy that combines hard negative mining with a multi-similarity loss.
+## 💡 Intended Use
+- **Domain**: Spanish Clinical NLP
+- **Tasks**: Entity linking (diseases, symptoms, procedures) to SNOMED-CT
+- **Evaluated On**: DisTEMIST, MedProcNER, SympTEMIST
+- **Users**: Researchers and practitioners working in clinical NLP
+## Performance Summary (Top-25 Accuracy)
+| Model               | DisTEMIST | MedProcNER | SympTEMIST |
+|--------------------|-----------|------------|------------|
+| **ClinLinker**     | **0.845** | **0.898**  | **0.909**  |
+| ClinLinker-KB-P    | 0.853     | 0.891      | 0.918      |
+| ClinLinker-KB-GP   | 0.864     | 0.901      | 0.922      |
+| SapBERT-XLM-R-large| 0.800     | 0.850      | 0.827      |
+| RoBERTa biomedical | 0.600     | 0.668      | 0.609      |
+*Results are computed on the cleaned gold standard, which excludes mentions labeled "NO CODE" or "COMPOSITE".*
+## 🧪 Usage
 ```python
+from transformers import AutoModel, AutoTokenizer
 import torch
 model = AutoModel.from_pretrained("ICB-UMA/ClinLinker")
+tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker")
 mention = "insuficiencia renal aguda"
+inputs = tokenizer(mention, return_tensors="pt")
 with torch.no_grad():
     outputs = model(**inputs)
+embedding = outputs.last_hidden_state[:, 0, :]  # first-token (CLS) embedding of the mention
+print(embedding.shape)
+```
+For scalable retrieval, use [Faiss](https://github.com/facebookresearch/faiss) or the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) class.
+## Limitations
+- The model is optimized for Spanish clinical data and may underperform outside this domain.
+- Expert validation is advised in critical applications.
+## 📚 Citation
+> Gallego, F., López-García, G., Gasco-Sánchez, L., Krallinger, M., Veredas, F.J. (2024). ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_19
+## Authors
+Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas
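
The new Usage section stops at a single mention embedding, while the Performance Summary reports Top-25 accuracy over retrieved candidates. The sketch below illustrates how the two could connect: rank concept embeddings by cosine similarity to each mention embedding, then check whether the gold code is among the top k. It is not the authors' evaluation code. The similarity measure, the helper names (`l2_normalize`, `top_k_candidates`, `top_k_accuracy`), the toy vectors, and the codes `C1` to `C3` are all assumptions made for illustration. NumPy's exact search stands in for Faiss here; a Faiss `IndexFlatIP` over the same normalized vectors would compute the same inner-product ranking.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k_candidates(mention_vecs, concept_vecs, concept_codes, k):
    """Return, for each mention, the k concept codes ranked by cosine similarity."""
    scores = l2_normalize(mention_vecs) @ l2_normalize(concept_vecs).T
    # Sorting the negated scores ascending yields best-first column indices.
    ranked = np.argsort(-scores, axis=1)[:, :k]
    return [[concept_codes[j] for j in row] for row in ranked]


def top_k_accuracy(candidates, gold_codes):
    """Fraction of mentions whose gold code appears among their candidates."""
    hits = sum(gold in cands for cands, gold in zip(candidates, gold_codes))
    return hits / len(gold_codes)


# Toy 3-dimensional vectors standing in for model embeddings (illustrative only).
concept_codes = ["C1", "C2", "C3"]  # hypothetical concept identifiers
concept_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
mention_vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]])
gold_codes = ["C1", "C2"]  # hypothetical gold annotations, one per mention

top1 = top_k_candidates(mention_vecs, concept_vecs, concept_codes, k=1)
top2 = top_k_candidates(mention_vecs, concept_vecs, concept_codes, k=2)
print(top1, top_k_accuracy(top1, gold_codes))
print(top2, top_k_accuracy(top2, gold_codes))
```

In a real setup, `mention_vecs` and `concept_vecs` would come from encoding mentions and ontology terms with the model as in the Usage snippet, and `k` would be 25 to match the reported metric.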