---
license: mit
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- healthcare
- clinical-notes
- medical-coding
- few-shot-learning
- prototypical-networks
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# MediCoder AI v4 🏥

## Model Description

MediCoder AI v4 is a medical coding system that predicts ICD/medical codes from clinical notes with **46.3% Top-1 accuracy**. Built on Bio_ClinicalBERT with few-shot prototypical learning, it supports roughly 57,000 medical codes.

## 🎯 Performance

- **Top-1 Accuracy**: 46.3%
- **Top-3 Accuracy**: ~52%
- **Top-5 Accuracy**: ~54%
- **Improvement**: +6.8 percentage points over baseline
- **Medical Codes**: ~57,000 supported codes

## 🏗️ Architecture

- **Base Model**: Bio_ClinicalBERT (specialized for medical text)
- **Approach**: Few-shot Prototypical Networks
- **Embedding Dimension**: 768
- **Optimization**: Conservative incremental improvements (Phase 2)

## 🚀 Usage

```python
import torch
from transformers import AutoTokenizer

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/medicoder-ai-v4-model")
# The code below calls methods on the loaded object, so the file is assumed to
# hold a full pickled model rather than a state dict. PyTorch >= 2.6 then needs
# weights_only=False; only load checkpoints from a source you trust.
model = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
model.eval()  # inference mode (assumes the object is a torch.nn.Module)

# Example usage
clinical_note = "Patient presents with chest pain and shortness of breath..."

# Tokenize
inputs = tokenizer(clinical_note, return_tensors="pt", truncation=True, max_length=512)

# Get predictions (top-5 medical codes)
with torch.no_grad():
    embeddings = model.encode_text(inputs["input_ids"], inputs["attention_mask"])
    similarities = torch.mm(embeddings, model.prototypes.t())
    # topk returns row indices into model.prototypes, not code strings
    top_codes = similarities.topk(5).indices

print("Top 5 predicted medical codes:", top_codes)
```

## 📊 Training Details

- **Training Data**: Medical clinical notes with associated codes
- **Training Approach**: Few-shot learning with prototypical networks
- **Optimization Strategy**: Conservative incremental improvements
- **Phases**:
  - Phase 1: Enhanced embeddings and pooling (+5.7pp)
  - Phase 2: Ensemble prototypes with attention (+1.1pp)

## 🎯 Use Cases

- **Medical Coding Assistance**: Help medical coders find relevant codes
- **Clinical Decision Support**: Suggest appropriate diagnostic codes
- **Healthcare Analytics**: Automated coding for large datasets
- **Research**: Medical text analysis and categorization

## ⚠️ Limitations

- Designed for English clinical text
- Performance varies by medical specialty
- Requires domain expertise for validation
- Not a replacement for professional medical coding

## 📋 Model Details

- **Model Size**: ~670 MB
- **Inference Speed**: 3-8 seconds (CPU), <1 second (GPU)
- **Memory Requirements**: ~2-3 GB during inference
- **Self-contained**: No external dataset dependencies

## 🔬 Technical Details

- **Few-shot Learning**: Learns from limited examples per medical code
- **Prototypical Networks**: Creates representative embeddings for each code
- **Ensemble Prototypes**: Multiple prototypes per code for better coverage
- **Attention Aggregation**: Combines multiple examples per code using attention weights

## 📈 Evaluation

Evaluated on a held-out medical coding dataset with standard metrics:

- Precision, Recall, F1-score
- Top-K accuracy (K=1,3,5,10,20)
- Comparison with baseline methods

## 🏥 Real-world Impact

This model helps medical professionals by:

- Reducing coding time from hours to minutes
- Improving coding accuracy and consistency
- Narrowing 57,000+ codes to top suggestions
- Supporting healthcare workflow automation

## 📜 Citation

If you use this model, please cite:

```
@misc{medicoder-ai-v4,
  title={MediCoder AI v4: Few-shot Medical Coding with Prototypical Networks},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/your-username/medicoder-ai-v4-model}
}
```

## 📞 Contact

For questions or collaborations, please reach out via the model repository issues.

---

*Built with ❤️ for the medical community*
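
## 🧪 Appendix: Prototype Scoring Sketch

The Usage and Technical Details sections describe scoring a note embedding against per-code prototypes and keeping the top suggestions. The following is a minimal, dependency-free sketch of that idea, not the model's actual implementation. The function names `build_prototypes` and `rank_codes`, the 3-dimensional toy vectors, and the labels `CODE_A`, `CODE_B`, `CODE_C` are all hypothetical. Taking the best score per code is an assumption made for illustration; the model card mentions attention-based aggregation but does not specify it.

```python
from collections import defaultdict


def build_prototypes(examples):
    """Average each code's support embeddings into one prototype vector."""
    grouped = defaultdict(list)
    for code, vector in examples:
        grouped[code].append(vector)
    prototypes = []
    for code, vectors in grouped.items():
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        prototypes.append((code, mean))
    return prototypes


def rank_codes(query, prototypes, k=5):
    """Score the query against every prototype by dot product, keep the best
    score per code (a code may own several prototypes), and return the k
    highest-scoring (code, score) pairs."""
    best = {}
    for code, proto in prototypes:
        score = sum(q * p for q, p in zip(query, proto))
        if code not in best or score > best[code]:
            best[code] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings" with made-up labels (not real model output).
support = [
    ("CODE_A", [1.0, 0.0, 0.0]),
    ("CODE_A", [0.8, 0.2, 0.0]),
    ("CODE_B", [0.0, 1.0, 0.0]),
    ("CODE_C", [0.0, 0.0, 1.0]),
]
prototypes = build_prototypes(support)
top = rank_codes([0.9, 0.1, 0.0], prototypes, k=2)
print(top)
```

In this toy setup the query lies closest to the averaged `CODE_A` prototype, so `CODE_A` ranks first. With real 768-dimensional embeddings the same ranking step would typically be done as a single matrix product, as in the Usage example.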