BioGPT for ICD-10 Medical Code Classification

This model is a fine-tuned version of microsoft/biogpt specifically designed for automated ICD-10 medical code classification from clinical discharge summaries. The model incorporates advanced attention mechanisms and architectural enhancements for medical text understanding.

Model Details

Model Description

This model extends the BioGPT architecture with several medical-specific enhancements including cross-attention between clinical text and ICD code descriptions, hierarchical attention for understanding medical taxonomy, and enhanced classification heads for multi-label prediction.

• Developed by: Medhat Ramadan
• Shared by: Medhat Ramadan
• Model type: Multi-label Text Classification (Medical)
• Language(s) (NLP): English (Clinical Text)
• License: MIT
• Finetuned from model: microsoft/biogpt

Model Sources

• Repository: https://huggingface.co/Medhatvv/biogpt_icd10_enhanced

Uses

Direct Use

This model can be used directly for automated ICD-10 code prediction from clinical discharge summaries. It processes medical text and outputs probability scores for the 50 most frequent ICD-10 codes. It is intended for research, educational purposes, and as a supportive tool for medical coding professionals.

Downstream Use

    The model can be fine-tuned for other medical classification tasks, integrated into clinical decision support systems, or used as a component in larger healthcare AI pipelines. It may also serve as a starting point for domain-specific medical coding applications.
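As a hedged illustration of such re-use (the label count and fine-tuning details here are placeholders, not part of the released model), the checkpoint can be reloaded with a fresh classification head for a new multi-label task:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical adaptation: swap the 50-code head for a new 20-label task.
model = AutoModelForSequenceClassification.from_pretrained(
    "Medhatvv/biogpt_icd10_enhanced",
    num_labels=20,                              # placeholder for your own label set
    problem_type="multi_label_classification",  # keeps the sigmoid/BCE setup
    ignore_mismatched_sizes=True,               # discards the original 50-code head
)
tokenizer = AutoTokenizer.from_pretrained("Medhatvv/biogpt_icd10_enhanced")
# From here, fine-tune as usual (e.g., with the Trainer API) on task-specific data.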

    Out-of-Scope Use

    This model should NOT be used as the sole basis for medical billing, clinical decision-making, or patient care. It is not intended to replace professional medical coders or clinical judgment. The model should not be used on non-English text or non-clinical documents.

    Bias, Risks, and Limitations

The model may exhibit biases present in the MIMIC-IV training dataset, including demographic, institutional, or temporal biases. It is limited to the 50 most frequent ICD-10 codes and is optimized specifically for discharge summaries. Performance may degrade on other clinical note types or different patient populations.

    Recommendations

    Users should validate model predictions with professional medical coding expertise. Regular evaluation for bias across different patient demographics is recommended. The model should be used as a supportive tool only, with human oversight for all clinical and billing decisions. Ensure proper data anonymization before processing patient information.

    How to Get Started with the Model

    Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Medhatvv/biogpt_icd10_enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example discharge summary
text = """
CHIEF COMPLAINT: Chest pain and shortness of breath.
HISTORY: 65-year-old male with hypertension and diabetes presents with acute chest pain...
"""

# Predict ICD codes: an independent sigmoid per label (multi-label setup)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Keep labels whose score clears the threshold (0.40 here is illustrative;
# thresholds are often tuned per label on a validation split)
threshold = 0.40
predicted_codes = []
for i, score in enumerate(predictions[0]):
    if score > threshold:
        # id2label maps the index back to its ICD-10 code, if the config provides it
        label = model.config.id2label.get(i, str(i))
        predicted_codes.append((label, score.item()))

print(predicted_codes)

    Training Details

    Training Data

The model was trained on MIMIC-IV discharge summaries with expert ICD-10 annotations. The dataset included 95,537 documents from 53,156 unique patients after filtering for the top 50 most frequent ICD-10 codes. Documents averaged 1,420 words and 5.43 codes each.

    Training Procedure

Preprocessing

    Text was chunked into 1024-token segments with 124-token overlap. Documents were split at the patient level to prevent data leakage. ICD code embeddings were initialized and made learnable during training.
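A minimal sketch of this chunking scheme using plain Python slicing over token IDs; only the 1024/124 values come from this card, and the placeholder document text is illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Medhatvv/biogpt_icd10_enhanced")
long_document_text = "CHIEF COMPLAINT: chest pain. HISTORY: ... " * 200  # stand-in for a long note

# Tokenize once, then slice into 1024-token windows that overlap by 124 tokens.
token_ids = tokenizer(long_document_text, add_special_tokens=False)["input_ids"]
chunk_len, overlap = 1024, 124
step = chunk_len - overlap  # each window advances by 900 new tokens

chunks = [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), step)]
print(len(chunks), "chunks of up to", chunk_len, "tokens")  # the final chunk may be shorter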

    Training Hyperparameters

    • Training regime: Mixed precision (fp16)
    • Learning rate: 1e-5 with cosine annealing warm restarts
    • Batch size: 10 per GPU, effective batch size 80 with gradient accumulation
    • Optimizer: AdamW with weight decay 0.01
    • Epochs: 31
    • Dropout: 0.2
    • Gradient clipping: 1.0
    • Early stopping patience: 30 epochs
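A hedged sketch of how these hyperparameters might be wired together in plain PyTorch; the toy model, toy data, and the scheduler restart period T_0 are assumptions, since the card does not specify them:

import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Toy stand-ins so the sketch runs; the real model and dataloader come from training code.
model = nn.Linear(16, 50)  # placeholder for the BioGPT classifier
batches = [(torch.randn(10, 16), torch.randint(0, 2, (10, 50)).float()) for _ in range(16)]

criterion = nn.BCEWithLogitsLoss()  # multi-label objective (see Technical Specifications)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)  # T_0 is an assumption
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())  # fp16 mixed precision
accum_steps = 8  # with per-GPU batch 10, accumulation yields an effective batch of 80

for step, (x, y) in enumerate(batches):
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        loss = criterion(model(x), y) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
scheduler.step()  # advanced once per epoch in this sketch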

Speeds, Sizes, Times

    • Training time: ~12 hours on 8x RTX 5070 GPUs
    • Model size: 1.6B+ parameters
    • Memory usage: ~45GB GPU memory during training
    • Checkpoint size: ~3.1GB

    Evaluation

    Testing Data, Factors & Metrics

    Testing Data

Evaluation was performed on a held-out MIMIC-IV test set, split at the patient level to ensure no patient overlap between the train and test sets.

    Factors

    Evaluation considered performance across different ICD code categories, document lengths, and patient demographics where available.

    Metrics

Evaluation used standard multi-label classification metrics: F1-micro, F1-macro, precision, recall, and Hamming loss. These metrics are appropriate for medical coding, where multiple codes per document are expected.
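For reference, a minimal sketch of computing these metrics with scikit-learn from binarized predictions (the random arrays are toy stand-ins for real labels and thresholded scores):

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, hamming_loss

# Toy ground truth and thresholded predictions: 4 documents x 50 codes.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(4, 50))
y_pred = rng.integers(0, 2, size=(4, 50))

print("F1 (micro):", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("Precision (micro):", precision_score(y_true, y_pred, average="micro", zero_division=0))
print("Recall (micro):", recall_score(y_true, y_pred, average="micro", zero_division=0))
print("Hamming loss:", hamming_loss(y_true, y_pred))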

    Results

    Performance metrics on MIMIC-IV test set:

    • F1-Score (Micro): 74.27%
• F1-Score (Macro): 67.91%
    • Precision (Micro): 74.5%
    • Recall (Micro): 73.52%
    • Hamming Loss: 0.0547

    Summary

    The model achieves competitive performance on ICD-10 classification compared to other medical NLP models, with particular strength in handling long clinical documents through its enhanced attention mechanisms.

Model Examination

    The model includes attention visualization capabilities showing which text segments contribute most to specific ICD code predictions. Cross-attention mechanisms provide interpretable mappings between clinical text and medical codes.
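As a rough sketch of inspecting attention at inference time, assuming the checkpoint exposes standard Hugging Face attention outputs (the custom cross-attention tensors described above may require the repository's own code):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Medhatvv/biogpt_icd10_enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("Chest pain and shortness of breath.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Standard self-attention maps: one tensor per layer, each (batch, heads, seq_len, seq_len).
if outputs.attentions is not None:
    last_layer = outputs.attentions[-1]
    saliency = last_layer.mean(dim=1)[0]  # average heads -> token-to-token map
    print(saliency.shape)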

    Environmental Impact

    Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

    • Hardware Type: 8x RTX 5070 GPUs
    • Hours used: ~12 hours
    • Carbon Emitted: [Estimated based on regional energy mix]

Technical Specifications

    Model Architecture and Objective

Enhanced BioGPT with cross-attention between text and ICD code embeddings, hierarchical attention for medical taxonomy understanding, attention-based pooling, and ensemble classification heads. The training objective is multi-label classification with BCEWithLogitsLoss.
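The enhanced modules ship with the repository; purely as an illustrative sketch (dimensions, module names, and wiring here are assumptions, not the released implementation), cross-attention pooling over learnable ICD code embeddings with a BCEWithLogitsLoss objective could look like:

import torch
from torch import nn

class CodeCrossAttentionHead(nn.Module):
    """Illustrative only: 50 learnable ICD code queries attend over encoder states."""
    def __init__(self, hidden=1024, num_codes=50, heads=8):
        super().__init__()
        self.code_embed = nn.Parameter(torch.randn(num_codes, hidden))  # learnable code embeddings
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)  # one logit per code

    def forward(self, encoder_states):  # encoder_states: (batch, seq_len, hidden)
        batch = encoder_states.size(0)
        queries = self.code_embed.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.cross_attn(queries, encoder_states, encoder_states)
        return self.classifier(pooled).squeeze(-1)  # (batch, num_codes)

head = CodeCrossAttentionHead()
states = torch.randn(2, 128, 1024)  # stand-in for BioGPT hidden states
labels = torch.randint(0, 2, (2, 50)).float()
loss = nn.BCEWithLogitsLoss()(head(states), labels)  # multi-label objective
print(loss.item())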

    Compute Infrastructure

    Hardware

    8x RTX 5070 GPUs with distributed data parallel training.

    Software

PyTorch 2.0, Hugging Face Transformers, and CUDA 12.8, with mixed-precision training via PyTorch automatic mixed precision (AMP).

Citation

    BibTeX:

@misc{biogpt-icd10-enhanced-2024,
  title={BioGPT for ICD-10 Medical Code Classification: Enhanced Architecture with Cross-Attention and Hierarchical Learning},
  author={Medhat Ramadan},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Medhatvv/biogpt_icd10_enhanced},
  note={Fine-tuned on MIMIC-IV discharge summaries for automated medical coding}
}
    

    APA:

Ramadan, M. (2024). BioGPT for ICD-10 Medical Code Classification: Enhanced Architecture with Cross-Attention and Hierarchical Learning. Hugging Face Model Hub. https://huggingface.co/Medhatvv/biogpt_icd10_enhanced

Glossary

    • ICD-10: International Classification of Diseases, 10th Revision - standardized medical coding system
    • Discharge Summary: Clinical document summarizing patient's hospital stay and treatment
    • Cross-Attention: Attention mechanism between different input modalities (text and ICD codes)
    • MIMIC-IV: Medical Information Mart for Intensive Care IV - clinical database

More Information

    For detailed usage examples, advanced configuration options, and integration guides, see the model repository documentation.

Model Card Authors

Medhat Ramadan

    Model Card Contact

    For questions or issues, please contact through the HuggingFace model repository or open an issue in the associated GitHub repository.
