med_ner_SDDCS

SDDCS is an abbreviation for the NER entities SYMPTOMS, DISEASES, DRUGS, CITIES, and SUBWAY STATIONS (additionally, the model can predict GENDER and AGE entities). This is a fine-tuned Named Entity Recognition (NER) model based on the google-bert/bert-base-multilingual-uncased model, designed to detect Russian medical entities such as diseases, drugs, and symptoms.

Model Details

  • Model Name: med_ner_SDDCS
  • Base Model: Babelscape/wikineural-multilingual-ner
  • Fine-tuned on: Medical NER data

Entities Recognized:

  • GENDER (e.g., женщина, мужчина)
  • DISEASE (e.g., паническое расстройство, грипп, ...)
  • SYMPTOM (e.g., тревога, одышка, ...)
  • SPECIALITY (e.g., невролог, кардиолог, ...)
  • CITY (e.g., Тула, Москва, Иркутск, ...)
  • SUBWAY (e.g., Шоссе Энтузиастов, Проспект Мира, ...)
  • DRUG (e.g., кардиомагнил, ципралекс)
  • AGE (e.g., ребенок, пожилой)

Model Performance

The fine-tuned model has achieved the following performance metrics:

              precision    recall  f1-score   support

         AGE       0.99      1.00      0.99       706
        CITY       0.99      1.00      1.00      2370
     DISEASE       0.99      1.00      0.99      4841
        DRUG       0.99      1.00      0.99      4546
      GENDER       0.99      1.00      1.00       476
  SPECIALITY       0.98      0.96      0.97      3673
      SUBWAY       1.00      1.00      1.00       658
     SYMPTOM       0.99      0.99      0.99      8022

   micro avg       0.99      0.99      0.99     25292
   macro avg       0.99      0.99      0.99     25292
weighted avg       0.99      0.99      0.99     25292
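
The average rows can be reproduced from the per-class rows. The sketch below recomputes the macro and support-weighted F1 from the rounded values in the table above. Because the inputs are already rounded to two decimals, the results only agree with the report up to rounding. The variable names are illustrative and not part of the model's code.

```python
# Per-class (f1-score, support) pairs copied from the report above.
report = {
    "AGE": (0.99, 706),
    "CITY": (1.00, 2370),
    "DISEASE": (0.99, 4841),
    "DRUG": (0.99, 4546),
    "GENDER": (1.00, 476),
    "SPECIALITY": (0.97, 3673),
    "SUBWAY": (1.00, 658),
    "SYMPTOM": (0.99, 8022),
}

# Total number of gold entities across all classes.
total_support = sum(support for _, support in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(total_support)          # 25292
print(f"{macro_f1:.2f}")      # 0.99
print(f"{weighted_f1:.2f}")   # 0.99
```

SPECIALITY is the only class that pulls the averages down noticeably: it has the lowest F1 (0.97) and a fairly large support (3673).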

How to Use

You can use this model with the transformers library to perform Named Entity Recognition (NER) in the Russian medical domain, mainly on patient queries. Here's how to load and use the model:

Load the tokenizer and model

from transformers import pipeline

pipe = pipeline(task="ner", model='Mykes/med_ner_SDDCS', tokenizer='Mykes/med_ner_SDDCS', aggregation_strategy="max")
# The misspellings in the query are intentional
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи хорошего психотервта в районе метро Октбрьской."
pipe(query.lower())

Result:

[{'entity_group': 'AGE',
  'score': 0.9992663,
  'word': 'ребенка',
  'start': 2,
  'end': 9},
 {'entity_group': 'SYMPTOM',
  'score': 0.9997758,
  'word': 'треога',
  'start': 10,
  'end': 16},
 {'entity_group': 'SYMPTOM',
  'score': 0.9997876,
  'word': 'норушения сна',
  'start': 19,
  'end': 32},
 {'entity_group': 'SYMPTOM',
  'score': 0.999773,
  'word': 'потеря сознания',
  'start': 34,
  'end': 49},
 {'entity_group': 'DISEASE',
  'score': 0.9996424,
  'word': 'паническое расстройство',
  'start': 66,
  'end': 89},
 {'entity_group': 'SUBWAY',
  'score': 0.99918646,
  'word': 'октбрьской',
  'start': 136,
  'end': 146}]
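
Because the query is lower-cased before inference, the 'word' field loses the original capitalization (for example 'октбрьской'). The following is a minimal sketch that slices the original query with the 'start'/'end' offsets and groups the surface forms by entity type. It assumes that lower-casing does not change the string length, which holds for this query. group_entities is a hypothetical helper, not part of transformers.

```python
from collections import defaultdict


def group_entities(text, ner_results):
    """Group entity surface forms by label, slicing them from the original text."""
    grouped = defaultdict(list)
    for result in ner_results:
        surface = text[result["start"]:result["end"]]
        grouped[result["entity_group"]].append(surface)
    return dict(grouped)


query = ("У ребенка треога и норушения сна, потеря сознания, раньше ставили "
         "паническое расстройство. Подскажи хорошего психотервта в районе "
         "метро Октбрьской.")

# Labels and offsets copied from the result above (scores and words omitted).
ner_results = [
    {"entity_group": "AGE", "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "start": 10, "end": 16},
    {"entity_group": "SYMPTOM", "start": 19, "end": 32},
    {"entity_group": "SYMPTOM", "start": 34, "end": 49},
    {"entity_group": "DISEASE", "start": 66, "end": 89},
    {"entity_group": "SUBWAY", "start": 136, "end": 146},
]

grouped = group_entities(query, ner_results)
print(grouped["SUBWAY"])  # ['Октбрьской']
```

With a live pipeline, the same helper can be called as group_entities(query, pipe(query.lower())).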

Code for visualization

import spacy
from spacy import displacy

def convert_to_displacy_format(text, ner_results):
    entities = []
    for result in ner_results:
        # Convert the Hugging Face output into the format displacy expects
        entities.append({
            "start": result['start'],
            "end": result['end'],
            "label": result['entity_group']
        })
    return {
        "text": text,
        "ents": entities,
        "title": None
    }

# The model receives the lower-cased query; the offsets still match the original
# text here, so the entities are drawn on the query as it was typed.
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи хорошего психиатра в районе метро Октбрьской."
ner_results = pipe(query.lower())
displacy_data = convert_to_displacy_format(query, ner_results)

# One gradient per entity label
colors = {
    "SPECIALITY": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "CITY": "linear-gradient(90deg, #feca57, #ff9f43)",
    "DRUG": "linear-gradient(90deg, #55efc4, #81ecec)",
    "DISEASE": "linear-gradient(90deg, #fab1a0, #ff7675)",
    "SUBWAY": "linear-gradient(90deg, #00add0, #0039a6)",
    "AGE": "linear-gradient(90deg, #f39c12, #e67e22)",
    "SYMPTOM": "linear-gradient(90deg, #e74c3c, #c0392b)",
    # Added so that GENDER entities are not filtered out; the colors are arbitrary
    "GENDER": "linear-gradient(90deg, #a29bfe, #6c5ce7)"
}
# Only labels listed in "ents" are rendered
options = {"ents": ["SPECIALITY", "CITY", "DRUG", "DISEASE", "SYMPTOM", "AGE", "SUBWAY", "GENDER"], "colors": colors}

# Render to an HTML string and save it to a file
html = displacy.render(displacy_data, style="ent", manual=True, options=options, jupyter=False)
with open("ner_visualization_with_colors.html", "w", encoding="utf-8") as f:
    f.write(html)

# Show the result inline when running in a Jupyter notebook
from IPython.display import display, HTML
display(HTML(html))

Model size: 167M params (Safetensors, tensor type F32)