---
base_model:
- google-bert/bert-base-multilingual-uncased
datasets:
- Mykes/patient_queries_ner_SDDCS
language:
- ru
library_name: transformers
pipeline_tag: token-classification
tags:
- biology
- medical
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63565a3d58acee56a457f799/MqL0twwg_1DWKN7taDvQy.jpeg)

# med_ner_SDDCS

SDDCS is an abbreviation of the NER entity types SYMPTOMS, DISEASES, DRUGS, CITIES, and SUBWAY STATIONS. The model can additionally predict GENDER and AGE entities.

This is a fine-tuned Named Entity Recognition (NER) model based on the [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased) model, designed to detect entities in Russian medical text, such as diseases, drugs, symptoms, and more.

# Model Details

- Model Name: med_ner_SDDCS
- Base Model: Babelscape/wikineural-multilingual-ner
- Fine-tuned on: Medical NER data

## Entities Recognized:

- GENDER (e.g., женщина, мужчина)
- DISEASE (e.g., паническое расстройство, грипп, ...)
- SYMPTOM (e.g., тревога, одышка, ...)
- SPECIALITY (e.g., невролог, кардиолог, ...)
- CITY (e.g., Тула, Москва, Иркутск, ...)
- SUBWAY (e.g., Шоссе Энтузиастов, Проспект Мира, ...)
- DRUG (e.g., кардиомагнил, ципралекс)
- AGE (e.g., ребенок, пожилой)

## Model Performance

The fine-tuned model achieves the following performance metrics:

```
              precision    recall  f1-score   support

         AGE       0.99      1.00      0.99       706
        CITY       0.99      1.00      1.00      2370
     DISEASE       0.99      1.00      0.99      4841
        DRUG       0.99      1.00      0.99      4546
      GENDER       0.99      1.00      1.00       476
  SPECIALITY       0.98      0.96      0.97      3673
      SUBWAY       1.00      1.00      1.00       658
     SYMPTOM       0.99      0.99      0.99      8022

   micro avg       0.99      0.99      0.99     25292
   macro avg       0.99      0.99      0.99     25292
weighted avg       0.99      0.99      0.99     25292
```

## How to Use

You can use this model with the transformers library to perform Named Entity Recognition (NER) on Russian medical text, mainly patient queries.
Here's how to load and use the model:

```
from transformers import pipeline

# Load the model and tokenizer into a token-classification pipeline
pipe = pipeline(task="ner", model='Mykes/med_ner_SDDCS', tokenizer='Mykes/med_ner_SDDCS', aggregation_strategy="max")

# The misspellings in this query are intentional
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи хорошего психотервта в районе метро Октбрьской."
pipe(query.lower())
```

Result:

```
[{'entity_group': 'AGE', 'score': 0.9992663, 'word': 'ребенка', 'start': 2, 'end': 9},
 {'entity_group': 'SYMPTOM', 'score': 0.9997758, 'word': 'треога', 'start': 10, 'end': 16},
 {'entity_group': 'SYMPTOM', 'score': 0.9997876, 'word': 'норушения сна', 'start': 19, 'end': 32},
 {'entity_group': 'SYMPTOM', 'score': 0.999773, 'word': 'потеря сознания', 'start': 34, 'end': 49},
 {'entity_group': 'DISEASE', 'score': 0.9996424, 'word': 'паническое расстройство', 'start': 66, 'end': 89},
 {'entity_group': 'SUBWAY', 'score': 0.99918646, 'word': 'октбрьской', 'start': 136, 'end': 146}]
```

## Code for visualization

```
from IPython.display import display, HTML
from spacy import displacy


def convert_to_displacy_format(text, ner_results):
    """Convert the Hugging Face pipeline output into the dict format displacy expects."""
    entities = []
    for result in ner_results:
        entities.append({
            "start": result['start'],
            "end": result['end'],
            "label": result['entity_group']
        })
    return {
        "text": text,
        "ents": entities,
        "title": None
    }


query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи хорошего психиатра в районе метро Октбрьской."
# `pipe` is the pipeline created in the previous example
ner_results = pipe(query.lower())
displacy_data = convert_to_displacy_format(query, ner_results)

# Background color for each entity label
colors = {
    "SPECIALITY": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "CITY": "linear-gradient(90deg, #feca57, #ff9f43)",
    "DRUG": "linear-gradient(90deg, #55efc4, #81ecec)",
    "DISEASE": "linear-gradient(90deg, #fab1a0, #ff7675)",
    "SUBWAY": "linear-gradient(90deg, #00add0, #0039a6)",
    "AGE": "linear-gradient(90deg, #f39c12, #e67e22)",
    "SYMPTOM": "linear-gradient(90deg, #e74c3c, #c0392b)"
}
options = {"ents": ["SPECIALITY", "CITY", "DRUG", "DISEASE", "SYMPTOM", "AGE", "SUBWAY"], "colors": colors}

# Render to an HTML string instead of displaying directly
html = displacy.render(displacy_data, style="ent", manual=True, options=options, jupyter=False)

# Save the visualization to a file
with open("ner_visualization_with_colors.html", "w", encoding="utf-8") as f:
    f.write(html)

# Show it inline when running in a notebook
display(HTML(html))
```
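
## Grouping the results by entity type

For downstream use (for example, routing a patient query), it can be handy to collect the recognized spans per label. The following is a minimal sketch, not part of the model or of transformers: `group_entities` and its `min_score` parameter are hypothetical helpers introduced here, and the code assumes the list-of-dicts output shape shown in the result above (with `entity_group`, `score`, and `word` keys). The sample data is copied from that result so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(ner_results, min_score=0.0):
    """Group aggregated NER output by label (hypothetical helper).

    Assumes each item is a dict with 'entity_group', 'score' and 'word' keys,
    as returned by the pipeline when an aggregation strategy is set.
    """
    grouped = defaultdict(list)
    for item in ner_results:
        # Skip low-confidence predictions
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Sample copied from the result shown above (offsets omitted for brevity)
sample = [
    {"entity_group": "AGE", "score": 0.9992663, "word": "ребенка"},
    {"entity_group": "SYMPTOM", "score": 0.9997758, "word": "треога"},
    {"entity_group": "SYMPTOM", "score": 0.9997876, "word": "норушения сна"},
    {"entity_group": "SYMPTOM", "score": 0.999773, "word": "потеря сознания"},
    {"entity_group": "DISEASE", "score": 0.9996424, "word": "паническое расстройство"},
    {"entity_group": "SUBWAY", "score": 0.99918646, "word": "октбрьской"},
]

grouped = group_entities(sample, min_score=0.9)
for label, words in grouped.items():
    print(label, words)
```

With a live pipeline, the same helper could be called as `group_entities(pipe(query.lower()))`; the `min_score` threshold of 0.9 is an arbitrary illustration and should be tuned for your data.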