med_ner_SDDCS
SDDCS is an abbreviation for the NER entity types SYMPTOMS, DISEASES, DRUGS, CITIES, and SUBWAY STATIONS; the model additionally predicts GENDER, AGE, and SPECIALITY entities. This is a fine-tuned Named Entity Recognition (NER) model based on google-bert/bert-base-multilingual-uncased, designed to detect Russian medical entities such as diseases, drugs, and symptoms.
Model Details
- Model Name: med_ner_SDDCS
- Base Model: Babelscape/wikineural-multilingual-ner
- Fine-tuned on: Medical NER data
Entities Recognized:
- GENDER (e.g., женщина, мужчина)
- DISEASE (e.g., паническое расстройство, грипп, ...)
- SYMPTOM (e.g., тревога, одышка, ...)
- SPECIALITY (e.g., невролог, кардиолог, ...)
- CITY (e.g., Тула, Москва, Иркутск, ...)
- SUBWAY (e.g., Шоссе Энтузиастов, Проспект Мира, ...)
- DRUG (e.g., кардиомагнил, ципралекс)
- AGE (e.g., ребенок, пожилой)
Model Performance
The fine-tuned model has achieved the following performance metrics:
              precision    recall  f1-score   support

         AGE       0.99      1.00      0.99       706
        CITY       0.99      1.00      1.00      2370
     DISEASE       0.99      1.00      0.99      4841
        DRUG       0.99      1.00      0.99      4546
      GENDER       0.99      1.00      1.00       476
  SPECIALITY       0.98      0.96      0.97      3673
      SUBWAY       1.00      1.00      1.00       658
     SYMPTOM       0.99      0.99      0.99      8022

   micro avg       0.99      0.99      0.99     25292
   macro avg       0.99      0.99      0.99     25292
weighted avg       0.99      0.99      0.99     25292
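The averaged rows can be cross-checked against the per-class rows. The sketch below assumes the report follows the usual classification-report conventions (macro average = unweighted mean over classes, weighted average = mean weighted by support); it recomputes the F1 averages from the rounded per-class values above, so the results are approximate.

```python
# Per-class (f1-score, support), copied from the rounded report above.
report = {
    "AGE": (0.99, 706),
    "CITY": (1.00, 2370),
    "DISEASE": (0.99, 4841),
    "DRUG": (0.99, 4546),
    "GENDER": (1.00, 476),
    "SPECIALITY": (0.97, 3673),
    "SUBWAY": (1.00, 658),
    "SYMPTOM": (0.99, 8022),
}

# Total number of gold entities across all classes.
total_support = sum(support for _, support in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.2f}")
print(f"weighted F1:   {weighted_f1:.2f}")
```

SPECIALITY is the only class whose F1 falls below 0.99, and its sizeable support (3673) is what pulls the weighted average slightly under the macro average before rounding.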
How to Use
You can use this model with the transformers library to perform Named Entity Recognition (NER) in the Russian medical domain, mainly on patient queries. Here's how to load and use the model:
Load the tokenizer and model
from transformers import pipeline
pipe = pipeline(task="ner", model='Mykes/med_ner_SDDCS', tokenizer='Mykes/med_ner_SDDCS', aggregation_strategy="max")
# The misspellings in this query are intentional
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи хорошего психотервта в районе метро Октбрьской."
pipe(query.lower())
Result:
[{'entity_group': 'AGE',
  'score': 0.9992663,
  'word': 'ребенка',
  'start': 2,
  'end': 9},
 {'entity_group': 'SYMPTOM',
  'score': 0.9997758,
  'word': 'треога',
  'start': 10,
  'end': 16},
 {'entity_group': 'SYMPTOM',
  'score': 0.9997876,
  'word': 'норушения сна',
  'start': 19,
  'end': 32},
 {'entity_group': 'SYMPTOM',
  'score': 0.999773,
  'word': 'потеря сознания',
  'start': 34,
  'end': 49},
 {'entity_group': 'DISEASE',
  'score': 0.9996424,
  'word': 'паническое расстройство',
  'start': 66,
  'end': 89},
 {'entity_group': 'SUBWAY',
  'score': 0.99918646,
  'word': 'октбрьской',
  'start': 136,
  'end': 146}]
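Because the query is lowercased before inference, the 'word' field loses the original casing. A minimal sketch of one way to post-process the output: use the 'start' and 'end' offsets to slice the original query and group the spans by entity type. The helper name group_entities and the min_score threshold are hypothetical, not part of the model or of transformers, and the approach assumes that lowercasing does not change the string length (true for this query). The sample data is copied from the result above, so the snippet runs without downloading the model.

```python
from collections import defaultdict

query = (
    "У ребенка треога и норушения сна, потеря сознания, раньше ставили "
    "паническое расстройство. Подскажи хорошего психотервта в районе метро Октбрьской."
)

# Pipeline output copied from the result above ('word' omitted, it is not needed here).
ner_results = [
    {"entity_group": "AGE", "score": 0.9992663, "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "score": 0.9997758, "start": 10, "end": 16},
    {"entity_group": "SYMPTOM", "score": 0.9997876, "start": 19, "end": 32},
    {"entity_group": "SYMPTOM", "score": 0.999773, "start": 34, "end": 49},
    {"entity_group": "DISEASE", "score": 0.9996424, "start": 66, "end": 89},
    {"entity_group": "SUBWAY", "score": 0.99918646, "start": 136, "end": 146},
]


def group_entities(text, results, min_score=0.5):
    """Group entity spans by label, slicing them from the original-cased text.

    min_score is an illustrative cut-off, not a recommended value.
    """
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


grouped = group_entities(query, ner_results)
for label, spans in grouped.items():
    print(label, spans)
```

Slicing by offset recovers "Октбрьской" with its capital letter, whereas the pipeline's 'word' field reports the lowercased "октбрьской".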
Code for visualization
from spacy import displacy
from IPython.display import display, HTML

def convert_to_displacy_format(text, ner_results):
    entities = []
    for result in ner_results:
        # Convert the Hugging Face output into the format displacy expects
        entities.append({
            "start": result['start'],
            "end": result['end'],
            "label": result['entity_group']
        })
    return {
        "text": text,
        "ents": entities,
        "title": None
    }

query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи хорошего психиатра в районе метро Октбрьской."
# Run inference on the lowercased query; the character offsets still match the original text
ner_results = pipe(query.lower())
displacy_data = convert_to_displacy_format(query, ner_results)

# One background gradient per entity label
colors = {
    "SPECIALITY": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "CITY": "linear-gradient(90deg, #feca57, #ff9f43)",
    "DRUG": "linear-gradient(90deg, #55efc4, #81ecec)",
    "DISEASE": "linear-gradient(90deg, #fab1a0, #ff7675)",
    "SUBWAY": "linear-gradient(90deg, #00add0, #0039a6)",
    "AGE": "linear-gradient(90deg, #f39c12, #e67e22)",
    "SYMPTOM": "linear-gradient(90deg, #e74c3c, #c0392b)",
    "GENDER": "linear-gradient(90deg, #a29bfe, #6c5ce7)"
}
# Only labels listed in "ents" are rendered
options = {"ents": ["SPECIALITY", "CITY", "DRUG", "DISEASE", "SYMPTOM", "AGE", "SUBWAY", "GENDER"], "colors": colors}
html = displacy.render(displacy_data, style="ent", manual=True, options=options, jupyter=False)

# Save the rendering as a standalone HTML file
with open("ner_visualization_with_colors.html", "w", encoding="utf-8") as f:
    f.write(html)

# Show the rendering inline when running in a Jupyter notebook
display(HTML(html))
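Outside a notebook, a plain-text rendering can be enough for logs or quick checks. The following is a minimal sketch, not part of spaCy or transformers: the hypothetical helper annotate wraps each entity span as [span|LABEL], and it assumes the spans do not overlap. The sample input reuses the first two entities from the result shown earlier.

```python
def annotate(text, ner_results):
    """Return text with every entity span wrapped as [span|LABEL].

    Assumes non-overlapping spans with 'start', 'end' and 'entity_group' keys.
    """
    parts = []
    cursor = 0
    for ent in sorted(ner_results, key=lambda e: e["start"]):
        span = text[ent["start"]:ent["end"]]
        parts.append(text[cursor:ent["start"]])  # untouched text before the span
        parts.append(f"[{span}|{ent['entity_group']}]")
        cursor = ent["end"]
    parts.append(text[cursor:])  # remainder after the last span
    return "".join(parts)


sample_text = "У ребенка треога"
sample_results = [
    {"entity_group": "AGE", "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "start": 10, "end": 16},
]
annotated = annotate(sample_text, sample_results)
print(annotated)  # У [ребенка|AGE] [треога|SYMPTOM]
```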