---
license: mit
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- healthcare
- clinical-notes
- medical-coding
- few-shot-learning
- prototypical-networks
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# MediCoder AI v4

## Model Description

MediCoder AI v4 is a medical coding system that predicts ICD/medical codes from clinical notes with **46.3% Top-1 accuracy**. Built on Bio_ClinicalBERT with few-shot prototypical learning, it covers roughly 57,000 medical codes.

## Performance

- **Top-1 Accuracy**: 46.3%
- **Top-3 Accuracy**: ~52%
- **Top-5 Accuracy**: ~54%
- **Improvement**: +6.8 percentage points over baseline
- **Medical Codes**: ~57,000 supported codes

## Architecture

- **Base Model**: Bio_ClinicalBERT (specialized for medical text)
- **Approach**: Few-shot Prototypical Networks
- **Embedding Dimension**: 768
- **Optimization**: Conservative incremental improvements (Phase 2)
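
The prototypical approach listed above can be sketched in a few lines. This is a minimal illustration in plain Python, not the model's actual implementation: the function names, the code labels, the toy 3-dimensional embeddings, and the choice of cosine similarity are all assumptions made for the example (the real model works on 768-dimensional Bio_ClinicalBERT embeddings).

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_prototypes(support):
    """Map each code to the mean embedding of its few labelled examples."""
    return {code: mean_vector(vecs) for code, vecs in support.items()}

def rank_codes(query, prototypes, k=5):
    """Return the k codes whose prototypes are most similar to the query."""
    scored = [(cosine(query, proto), code) for code, proto in prototypes.items()]
    scored.sort(reverse=True)
    return [code for _, code in scored[:k]]

# Toy support set: hypothetical codes with made-up 3-d embeddings.
support = {
    "CODE_A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "CODE_B": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "CODE_C": [[0.0, 0.0, 1.0]],
}
prototypes = build_prototypes(support)
top = rank_codes([0.8, 0.2, 0.0], prototypes, k=2)
print(top)
```

Because a prototype is just an average of a handful of embeddings, a new code can be added from a few examples without retraining the encoder, which is what makes the approach practical for a very large label space.
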

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer and the model.
tokenizer = AutoTokenizer.from_pretrained("your-username/medicoder-ai-v4-model")
# The calls below imply pytorch_model.bin is a pickled model object rather than
# a state dict. PyTorch 2.6+ defaults to weights_only=True, so it must be
# disabled explicitly. Only unpickle files from a source you trust.
model = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
model.eval()  # assumes a torch.nn.Module; disables dropout for inference

# Example usage
clinical_note = "Patient presents with chest pain and shortness of breath..."

# Tokenize, truncating to the 512-token BERT limit
inputs = tokenizer(clinical_note, return_tensors="pt",
                   truncation=True, max_length=512)

# Get predictions (top-5 medical codes)
with torch.no_grad():
    embeddings = model.encode_text(inputs['input_ids'], inputs['attention_mask'])
    similarities = torch.mm(embeddings, model.prototypes.t())
    # topk returns prototype indices, not code strings
    top_codes = similarities.topk(5).indices

print("Top 5 predicted code indices:", top_codes)
```

## Training Details

- **Training Data**: Medical clinical notes with associated codes
- **Training Approach**: Few-shot learning with prototypical networks
- **Optimization Strategy**: Conservative incremental improvements
- **Phases**:
  - Phase 1: Enhanced embeddings and pooling (+5.7 pp)
  - Phase 2: Ensemble prototypes with attention (+1.1 pp)

## Use Cases

- **Medical Coding Assistance**: Help medical coders find relevant codes
- **Clinical Decision Support**: Suggest appropriate diagnostic codes
- **Healthcare Analytics**: Automated coding for large datasets
- **Research**: Medical text analysis and categorization

## ⚠️ Limitations

- Designed for English clinical text
- Performance varies by medical specialty
- Requires domain expertise for validation
- Not a replacement for professional medical coding

## Model Details

- **Model Size**: ~670 MB
- **Inference Speed**: 3-8 seconds (CPU), <1 second (GPU)
- **Memory Requirements**: ~2-3 GB during inference
- **Self-contained**: No external dataset dependencies
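
As a rough sanity check on these figures, the prototype matrix alone can be sized from numbers stated in this card (about 57,000 codes, 768-dimensional embeddings). The float32 storage and the single prototype per code are assumptions; the card mentions multiple prototypes per code, which would scale the result accordingly. The helper name is hypothetical.

```python
def prototype_matrix_bytes(num_codes, dim, prototypes_per_code=1, bytes_per_value=4):
    """Bytes needed to store every prototype vector (float32 by default)."""
    return num_codes * prototypes_per_code * dim * bytes_per_value

# Assumed: one float32 prototype per code.
size = prototype_matrix_bytes(57_000, 768)
print(f"Prototype matrix: {size / 1e6:.1f} MB")
```

Under these assumptions the prototypes account for only part of the reported model size; the rest would come mainly from the encoder weights.
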

## Technical Details

- **Few-shot Learning**: Learns from limited examples per medical code
- **Prototypical Networks**: Creates a representative embedding for each code
- **Ensemble Prototypes**: Multiple prototypes per code for better coverage
- **Attention Aggregation**: Attention-weighted combination of multiple examples
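
The card does not spell out how attention aggregation or ensemble prototypes are computed, so the following is only one plausible reading, written in plain Python. It assumes each example is weighted by a softmax over its dot product with the plain mean, and that a code with several prototypes is scored by its best match. The function names, the `temperature` parameter, and the 2-dimensional toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_prototype(examples, temperature=1.0):
    """Weight each example by its similarity to the plain mean, then re-average."""
    n = len(examples)
    mean = [sum(col) / n for col in zip(*examples)]
    weights = softmax([dot(ex, mean) / temperature for ex in examples])
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*examples)]

def ensemble_score(query, code_prototypes):
    """Score a code by its best-matching prototype (assumed max pooling)."""
    return max(dot(query, p) for p in code_prototypes)

# Two typical examples and one outlier: the outlier receives less weight.
proto = attention_prototype([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
score = ensemble_score([1.0, 0.0], [[0.9, 0.1], [0.2, 0.8]])
print(proto, score)
```

Compared with a plain mean, this weighting pulls the prototype toward the examples that agree with each other, while keeping several prototypes per code lets one code cover clinically distinct presentations.
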

## Evaluation

Evaluated on a held-out medical coding dataset with standard metrics:

- Precision, Recall, F1-score
- Top-K accuracy (K=1,3,5,10,20)
- Comparison with baseline methods
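
For reference, Top-K accuracy counts a prediction as correct when the true code appears anywhere in the first K suggestions. The sketch below assumes one true code per note and uses made-up labels; it is not the evaluation script behind the numbers reported in this card.

```python
def top_k_accuracy(ranked_predictions, true_codes, k):
    """Fraction of notes whose true code appears in the first k predictions."""
    if not true_codes:
        raise ValueError("no examples to evaluate")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_codes)
        if truth in ranked[:k]
    )
    return hits / len(true_codes)

# Hypothetical ranked outputs for four notes, best guess first.
ranked_predictions = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["B", "C", "A"],
]
true_codes = ["A", "A", "A", "C"]

for k in (1, 2, 3):
    acc = top_k_accuracy(ranked_predictions, true_codes, k)
    print(f"Top-{k} accuracy: {acc:.2f}")
```

Top-K accuracy can only stay flat or rise as K grows, which is why the Top-3 and Top-5 figures sit above the Top-1 figure.
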

## Real-world Impact

This model helps medical professionals by:

- Reducing coding time from hours to minutes
- Improving coding accuracy and consistency
- Narrowing 57,000+ codes to top suggestions
- Supporting healthcare workflow automation

## Citation

If you use this model, please cite:

```bibtex
@misc{medicoder-ai-v4,
  title={MediCoder AI v4: Few-shot Medical Coding with Prototypical Networks},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/your-username/medicoder-ai-v4-model}
}
```

## Contact

For questions or collaborations, please reach out via the model repository issues.

---

*Built with ❤️ for the medical community*