BioGPT for ICD-10 Medical Code Classification

This model is a fine-tuned version of microsoft/biogpt specifically designed for automated ICD-10 medical code classification from clinical discharge summaries. The model incorporates advanced attention mechanisms and architectural enhancements for medical text understanding.

Model Details

Model Description

This model extends the BioGPT architecture with several medical-specific enhancements including cross-attention between clinical text and ICD code descriptions, hierarchical attention for understanding medical taxonomy, and enhanced classification heads for multi-label prediction.

• Developed by: Medhat Ramadan
• Shared by: Medhat Ramadan
• Model type: Multi-label Text Classification (Medical)
• Language(s) (NLP): English (Clinical Text)
• License: MIT
• Finetuned from model: microsoft/biogpt

Model Sources

• Repository: https://huggingface.co/Medhatvv/biogpt_icd10_enhanced

Uses

Direct Use

This model can be used directly for automated ICD-10 code prediction from clinical discharge summaries. It processes medical text and outputs probability scores for the 50 most frequent ICD-10 codes. It is intended for research, educational purposes, and as a supportive tool for medical coding professionals.

Downstream Use

    The model can be fine-tuned for other medical classification tasks, integrated into clinical decision support systems, or used as a component in larger healthcare AI pipelines. It may also serve as a starting point for domain-specific medical coding applications.
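As a hedged illustration of such re-use (the label count and fine-tuning details here are placeholders, not part of the released model), the checkpoint can be reloaded with a fresh classification head for a new multi-label task:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical adaptation: swap the 50-code head for a new 20-label task.
model = AutoModelForSequenceClassification.from_pretrained(
    "Medhatvv/biogpt_icd10_enhanced",
    num_labels=20,                              # placeholder for your own label set
    problem_type="multi_label_classification",  # keeps the sigmoid/BCE setup
    ignore_mismatched_sizes=True,               # discards the original 50-code head
)
tokenizer = AutoTokenizer.from_pretrained("Medhatvv/biogpt_icd10_enhanced")
# From here, fine-tune as usual (e.g., with the Trainer API) on task-specific data.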

    Out-of-Scope Use

    This model should NOT be used as the sole basis for medical billing, clinical decision-making, or patient care. It is not intended to replace professional medical coders or clinical judgment. The model should not be used on non-English text or non-clinical documents.

    Bias, Risks, and Limitations

The model may exhibit biases present in the MIMIC-IV training dataset, including demographic, institutional, or temporal biases. It is limited to the 50 most frequent ICD-10 codes and is optimized specifically for discharge summaries. Performance may degrade on other clinical note types or different patient populations.

    Recommendations

    Users should validate model predictions with professional medical coding expertise. Regular evaluation for bias across different patient demographics is recommended. The model should be used as a supportive tool only, with human oversight for all clinical and billing decisions. Ensure proper data anonymization before processing patient information.

    How to Get Started with the Model

    Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Medhatvv/biogpt_icd10_enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example discharge summary
text = """
CHIEF COMPLAINT: Chest pain and shortness of breath.
HISTORY: 65-year-old male with hypertension and diabetes presents with acute chest pain...
"""

# Predict ICD codes: an independent sigmoid per label (multi-label setup)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Keep labels whose score clears the threshold (0.40 here is illustrative;
# thresholds are often tuned per label on a validation split)
threshold = 0.40
predicted_codes = []
for i, score in enumerate(predictions[0]):
    if score > threshold:
        # id2label maps the index back to its ICD-10 code, if the config provides it
        label = model.config.id2label.get(i, str(i))
        predicted_codes.append((label, score.item()))

print(predicted_codes)

    Training Details

    Training Data

The model was trained on MIMIC-IV discharge summaries with expert ICD-10 annotations. The dataset included 95,537 documents from 53,156 unique patients after filtering for the top 50 most frequent ICD-10 codes. Documents averaged 1,420 words and 5.43 codes each.

    Training Procedure

Preprocessing

    Text was chunked into 1024-token segments with 124-token overlap. Documents were split at the patient level to prevent data leakage. ICD code embeddings were initialized and made learnable during training.
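A minimal sketch of this chunking scheme using plain Python slicing over token IDs; only the 1024/124 values come from this card, and the placeholder document text is illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Medhatvv/biogpt_icd10_enhanced")
long_document_text = "CHIEF COMPLAINT: chest pain. HISTORY: ... " * 200  # stand-in for a long note

# Tokenize once, then slice into 1024-token windows that overlap by 124 tokens.
token_ids = tokenizer(long_document_text, add_special_tokens=False)["input_ids"]
chunk_len, overlap = 1024, 124
step = chunk_len - overlap  # each window advances by 900 new tokens

chunks = [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), step)]
print(len(chunks), "chunks of up to", chunk_len, "tokens")  # the final chunk may be shorter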

    Training Hyperparameters

    • Training regime: Mixed precision (fp16)
    • Learning rate: 1e-5 with cosine annealing warm restarts
    • Batch size: 10 per GPU, effective batch size 80 with gradient accumulation
    • Optimizer: AdamW with weight decay 0.01
    • Epochs: 31
    • Dropout: 0.2
    • Gradient clipping: 1.0
    • Early stopping patience: 30 epochs
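A hedged sketch of how these hyperparameters might be wired together in plain PyTorch; the toy model, toy data, and the scheduler restart period T_0 are assumptions, since the card does not specify them:

import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Toy stand-ins so the sketch runs; the real model and dataloader come from training code.
model = nn.Linear(16, 50)  # placeholder for the BioGPT classifier
batches = [(torch.randn(10, 16), torch.randint(0, 2, (10, 50)).float()) for _ in range(16)]

criterion = nn.BCEWithLogitsLoss()  # multi-label objective (see Technical Specifications)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)  # T_0 is an assumption
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())  # fp16 mixed precision
accum_steps = 8  # with per-GPU batch 10, accumulation yields an effective batch of 80

for step, (x, y) in enumerate(batches):
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        loss = criterion(model(x), y) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
scheduler.step()  # advanced once per epoch in this sketch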

Speeds, Sizes, Times

    • Training time: ~12 hours on 8x RTX 5070 GPUs
    • Model size: 1.6B+ parameters
    • Memory usage: ~45GB GPU memory during training
    • Checkpoint size: ~3.1GB

    Evaluation

    Testing Data, Factors & Metrics

    Testing Data

Evaluation was performed on a held-out MIMIC-IV test set, split at the patient level to ensure no patient overlap between the train and test sets.

    Factors

    Evaluation considered performance across different ICD code categories, document lengths, and patient demographics where available.

    Metrics

Evaluation used standard multi-label classification metrics: F1-micro, F1-macro, precision, recall, and Hamming loss. These metrics are appropriate for medical coding, where multiple codes per document are expected.
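For reference, a minimal sketch of computing these metrics with scikit-learn from binarized predictions (the random arrays are toy stand-ins for real labels and thresholded scores):

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, hamming_loss

# Toy ground truth and thresholded predictions: 4 documents x 50 codes.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(4, 50))
y_pred = rng.integers(0, 2, size=(4, 50))

print("F1 (micro):", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("Precision (micro):", precision_score(y_true, y_pred, average="micro", zero_division=0))
print("Recall (micro):", recall_score(y_true, y_pred, average="micro", zero_division=0))
print("Hamming loss:", hamming_loss(y_true, y_pred))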

    Results

    Performance metrics on MIMIC-IV test set:

    • F1-Score (Micro): 74.27%
• F1-Score (Macro): 67.91%
    • Precision (Micro): 74.5%
    • Recall (Micro): 73.52%
    • Hamming Loss: 0.0547

    Summary

    The model achieves competitive performance on ICD-10 classification compared to other medical NLP models, with particular strength in handling long clinical documents through its enhanced attention mechanisms.

Model Examination

    The model includes attention visualization capabilities showing which text segments contribute most to specific ICD code predictions. Cross-attention mechanisms provide interpretable mappings between clinical text and medical codes.
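As a rough sketch of inspecting attention at inference time, assuming the checkpoint exposes standard Hugging Face attention outputs (the custom cross-attention tensors described above may require the repository's own code):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Medhatvv/biogpt_icd10_enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("Chest pain and shortness of breath.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Standard self-attention maps: one tensor per layer, each (batch, heads, seq_len, seq_len).
if outputs.attentions is not None:
    last_layer = outputs.attentions[-1]
    saliency = last_layer.mean(dim=1)[0]  # average heads -> token-to-token map
    print(saliency.shape)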

    Environmental Impact

    Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

    • Hardware Type: 8x RTX 5070 GPUs
    • Hours used: ~12 hours
    • Carbon Emitted: [Estimated based on regional energy mix]

Technical Specifications

    Model Architecture and Objective

Enhanced BioGPT with cross-attention between text and ICD code embeddings, hierarchical attention for medical taxonomy understanding, attention-based pooling, and ensemble classification heads. The training objective is multi-label classification with BCEWithLogitsLoss.
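The enhanced modules ship with the repository; purely as an illustrative sketch (dimensions, module names, and wiring here are assumptions, not the released implementation), cross-attention pooling over learnable ICD code embeddings with a BCEWithLogitsLoss objective could look like:

import torch
from torch import nn

class CodeCrossAttentionHead(nn.Module):
    """Illustrative only: 50 learnable ICD code queries attend over encoder states."""
    def __init__(self, hidden=1024, num_codes=50, heads=8):
        super().__init__()
        self.code_embed = nn.Parameter(torch.randn(num_codes, hidden))  # learnable code embeddings
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)  # one logit per code

    def forward(self, encoder_states):  # encoder_states: (batch, seq_len, hidden)
        batch = encoder_states.size(0)
        queries = self.code_embed.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.cross_attn(queries, encoder_states, encoder_states)
        return self.classifier(pooled).squeeze(-1)  # (batch, num_codes)

head = CodeCrossAttentionHead()
states = torch.randn(2, 128, 1024)  # stand-in for BioGPT hidden states
labels = torch.randint(0, 2, (2, 50)).float()
loss = nn.BCEWithLogitsLoss()(head(states), labels)  # multi-label objective
print(loss.item())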

    Compute Infrastructure

    Hardware

    8x RTX 5070 GPUs with distributed data parallel training.

    Software

PyTorch 2.0, Hugging Face Transformers, and CUDA 12.8, with mixed-precision training via PyTorch automatic mixed precision (AMP).

Citation

    BibTeX:

@misc{biogpt-icd10-enhanced-2024,
  title={BioGPT for ICD-10 Medical Code Classification: Enhanced Architecture with Cross-Attention and Hierarchical Learning},
  author={Medhat Ramadan},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Medhatvv/biogpt_icd10_enhanced},
  note={Fine-tuned on MIMIC-IV discharge summaries for automated medical coding}
}
    

    APA:

Ramadan, M. (2024). BioGPT for ICD-10 Medical Code Classification: Enhanced Architecture with Cross-Attention and Hierarchical Learning. Hugging Face Model Hub. https://huggingface.co/Medhatvv/biogpt_icd10_enhanced

Glossary

    • ICD-10: International Classification of Diseases, 10th Revision - standardized medical coding system
    • Discharge Summary: Clinical document summarizing patient's hospital stay and treatment
    • Cross-Attention: Attention mechanism between different input modalities (text and ICD codes)
    • MIMIC-IV: Medical Information Mart for Intensive Care IV - clinical database

More Information

    For detailed usage examples, advanced configuration options, and integration guides, see the model repository documentation.

Model Card Authors

Medhat Ramadan

    Model Card Contact

    For questions or issues, please contact through the HuggingFace model repository or open an issue in the associated GitHub repository.
