sshan95/medicoder-ai-v4-model

+---
+license: mit
+base_model: emilyalsentzer/Bio_ClinicalBERT
+tags:
+- medical
+- healthcare
+- clinical-notes
+- medical-coding
+- few-shot-learning
+- prototypical-networks
+language:
+- en
+metrics:
+- accuracy
+library_name: transformers
+pipeline_tag: text-classification
+---
+# MediCoder AI v4 🏥
+## Model Description
+MediCoder AI v4 is a medical coding system that predicts ICD/medical codes from clinical notes with **46.3% Top-1 accuracy**. It is built on Bio_ClinicalBERT with few-shot prototypical learning and covers ~57,000 medical codes.
+## 🎯 Performance
+- **Top-1 Accuracy**: 46.3%
+- **Top-3 Accuracy**: ~52%
+- **Top-5 Accuracy**: ~54%
+- **Improvement**: +6.8 percentage points over baseline
+- **Medical Codes**: ~57,000 supported codes
+## 🏗️ Architecture
+- **Base Model**: Bio_ClinicalBERT (specialized for medical text)
+- **Approach**: Few-shot Prototypical Networks
+- **Embedding Dimension**: 768
+- **Optimization**: Conservative incremental improvements (Phase 2)
+## 🚀 Usage
+```python
+import torch
+from transformers import AutoTokenizer
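+# Load the tokenizer (replace "your-username" with the repository owner)
+tokenizer = AutoTokenizer.from_pretrained("your-username/medicoder-ai-v4-model")
+# Assumes pytorch_model.bin was downloaded locally and holds the full pickled
+# model object, not just a state_dict. weights_only=False unpickles arbitrary
+# code, so only load files you trust (PyTorch >= 2.6 defaults to True).
+model = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
+model.eval()  # disable dropout for inference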
+# Example usage
+clinical_note = "Patient presents with chest pain and shortness of breath..."
+# Tokenize
+inputs = tokenizer(clinical_note, return_tensors="pt",
+                  truncation=True, max_length=512)
+# Score the note against every code prototype and keep the five best matches
+with torch.no_grad():
+    embeddings = model.encode_text(inputs['input_ids'], inputs['attention_mask'])
+    similarities = torch.mm(embeddings, model.prototypes.t())
+    top_codes = similarities.topk(5, dim=-1).indices
+# top_codes holds row indices into model.prototypes, not code strings
+print("Top 5 predicted code indices:", top_codes)
+```
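The snippet above yields prototype row indices rather than readable codes. A minimal sketch of turning scores into ranked code suggestions follows. It assumes a hypothetical `index_to_code` list aligned with the rows of `model.prototypes`; this model card does not document such a mapping, and the codes and scores below are illustrative only.

```python
def top_k_codes(scores, index_to_code, k=5):
    """Return the k highest-scoring (code, score) pairs, best first."""
    if len(scores) != len(index_to_code):
        raise ValueError("scores and index_to_code must have the same length")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(index_to_code[i], scores[i]) for i in ranked[:k]]


# Hypothetical label list: entry i names the code behind prototype row i.
index_to_code = ["I20.9", "R07.9", "R06.02", "J18.9"]
# Hypothetical similarity scores for one clinical note, one per prototype.
scores = [0.71, 0.93, 0.88, 0.12]

suggestions = top_k_codes(scores, index_to_code, k=3)
for code, score in suggestions:
    print(f"{code}\t{score:.2f}")
```

With the PyTorch output from the usage example, `similarities[0].tolist()` would supply the `scores` argument for a single note.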
+## 📊 Training Details
+- **Training Data**: Medical clinical notes with associated codes
+- **Training Approach**: Few-shot learning with prototypical networks
+- **Optimization Strategy**: Conservative incremental improvements
+- **Phases** (gains sum to the +6.8pp improvement over baseline):
+  - Phase 1: Enhanced embeddings and pooling (+5.7pp)
+  - Phase 2: Ensemble prototypes with attention (+1.1pp)
+## 🎯 Use Cases
+- **Medical Coding Assistance**: Help medical coders find relevant codes
+- **Clinical Decision Support**: Suggest appropriate diagnostic codes
+- **Healthcare Analytics**: Automated coding for large datasets
+- **Research**: Medical text analysis and categorization
+## ⚠️ Limitations
+- Designed for English clinical text
+- Performance varies by medical specialty
+- Predictions require validation by someone with medical coding expertise
+- Not a replacement for professional medical coding
+## 📋 Model Details
+- **Model Size**: ~670 MB
+- **Inference Speed**: 3-8 seconds (CPU), <1 second (GPU)
+- **Memory Requirements**: ~2-3 GB during inference
+- **Self-contained**: No external dataset dependencies
+## 🔬 Technical Details
+- **Few-shot Learning**: Learns from limited examples per medical code
+- **Prototypical Networks**: Creates representative embeddings for each code
+- **Ensemble Prototypes**: Multiple prototypes per code for better coverage
+- **Attention Aggregation**: Attention-weighted combination of multiple examples into each prototype
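The ideas in this list can be illustrated with a small, dependency-free sketch. It is not the model's actual implementation: it uses one prototype per code, toy 3-dimensional vectors instead of 768-dimensional embeddings, cosine similarity as the matching score, and an assumed form of attention (softmax over each example's similarity to the query). All function names and the `CODE_A`/`CODE_B` labels are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the classic prototype)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attention_prototype(vectors, query, temperature=1.0):
    """Softmax-weighted mean of support vectors, weighted by similarity to the query."""
    logits = [cosine(v, query) / temperature for v in vectors]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(len(query))]


def classify(query, support, aggregate=None):
    """Return the code whose prototype is most similar to the query, with its score."""
    best_code, best_score = None, -math.inf
    for code, vectors in support.items():
        proto = aggregate(vectors, query) if aggregate else mean_vector(vectors)
        score = cosine(query, proto)
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


# A few labelled example embeddings per code (the "few shots").
support = {
    "CODE_A": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "CODE_B": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.3]],
}
query = [0.8, 0.3, 0.1]

predicted, score = classify(query, support)
predicted_attn, score_attn = classify(query, support, aggregate=attention_prototype)
print(predicted, round(score, 3))
print(predicted_attn, round(score_attn, 3))
```

An ensemble variant would keep several prototypes per code and score a query against the best-matching one; that step is omitted here to keep the sketch short.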
+## 📈 Evaluation
+Evaluated on a held-out medical coding dataset with standard metrics:
+- Precision, Recall, F1-score
+- Top-K accuracy (K=1,3,5,10,20)
+- Comparison with baseline methods
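For reference, Top-K accuracy counts a prediction as correct when the true code appears anywhere in the K highest-ranked suggestions. The sketch below shows that calculation on made-up data; the function name and the labels are hypothetical and unrelated to the reported results.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of examples whose true label appears among the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_predictions, true_labels))
    return hits / len(true_labels)


# Hypothetical ranked outputs (best first) for four notes.
ranked_predictions = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["A", "C", "B"],
]
true_labels = ["A", "A", "A", "B"]

for k in (1, 2, 3):
    print(f"Top-{k} accuracy: {top_k_accuracy(ranked_predictions, true_labels, k):.2f}")
```

Top-K accuracy can only stay the same or rise as K grows, which matches the Top-1, Top-3, and Top-5 figures reported in the Performance section.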
+## 🏥 Real-world Impact
+This model helps medical professionals by:
+- Reducing coding time from hours to minutes
+- Improving coding accuracy and consistency
+- Narrowing ~57,000 codes to a short list of top suggestions
+- Supporting healthcare workflow automation
+## 📜 Citation
+If you use this model, please cite:
+```bibtex
+@misc{medicoder-ai-v4,
+  title={MediCoder AI v4: Few-shot Medical Coding with Prototypical Networks},
+  author={Your Name},
+  year={2025},
+  url={https://huggingface.co/your-username/medicoder-ai-v4-model}
+}
+```
+## 📞 Contact
+For questions or collaborations, please reach out via the model repository issues.
+---
+*Built with ❤️ for the medical community*