Update to complete model with 57,768 embedded prototypes
README.md
CHANGED
@@ -8,128 +8,197 @@ tags:
- medical-coding
- few-shot-learning
- prototypical-networks
- deployment-ready
- self-contained
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
widget:
- text: "Patient presents with chest pain and shortness of breath. ECG shows abnormalities."
---

# MediCoder AI v4 Complete 🏥✨

## Model Description

**MediCoder AI v4 Complete** is a fully self-contained medical coding system with **57,768 embedded prototypes** that predicts ICD/medical codes from clinical notes with **46.3% Top-1 accuracy**. This model requires **no external dataset** for inference.

## 🎯 Performance

- **Top-1 Accuracy**: 46.3%
- **Top-5 Accuracy**: ~54% (estimated; see the Top-K sketch below)
- **Medical Codes**: 57,768 supported codes
- **Prototypes**: 57,768 embedded prototype vectors
- **Deployment**: Fully self-contained
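Here, Top-K accuracy means the correct code appears among the K highest-scoring prototypes for a note. A minimal sketch of how such a metric can be computed; `similarities` and `true_code_ids` are illustrative inputs, not artifacts shipped in the checkpoint:

```python
import torch

def top_k_accuracy(similarities: torch.Tensor, true_code_ids: torch.Tensor, k: int) -> float:
    """Fraction of queries whose true code ranks among the top-k prototypes.

    similarities:  [num_queries, num_codes] query-to-prototype scores
    true_code_ids: [num_queries] index of the correct code for each query
    """
    top_k = similarities.topk(k, dim=1).indices               # [num_queries, k]
    hits = (top_k == true_code_ids.unsqueeze(1)).any(dim=1)   # [num_queries]
    return hits.float().mean().item()
```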

## ✨ What's New in Complete Version

- ✅ **57,768 Prototypes Embedded**: All medical codes have learned representations
- ✅ **No Dataset Required**: Completely self-contained for deployment
- ✅ **Production Ready**: Direct inference without external dependencies
- ✅ **Full 46.3% Accuracy**: Complete performance preservation
- ✅ **Memory Optimized**: Efficient prototype storage and retrieval

## 🏗️ Architecture

- **Base Model**: Bio_ClinicalBERT (specialized for medical text)
- **Approach**: Few-shot Prototypical Networks with Embedded Prototypes
- **Embedding Dimension**: 768
- **Prototype Storage**: 57,768 × 768 learned medical code representations (footprint estimated below)
- **Optimization**: Conservative incremental improvements (Phase 2)
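For scale, the prototype matrix alone stays modest; a quick back-of-envelope sketch, assuming float32 storage (the checkpoint's actual dtype is not stated in this card):

```python
num_codes, dim = 57_768, 768
matrix_bytes = num_codes * dim * 4  # 4 bytes per value, assuming float32
print(f"Prototype matrix: ~{matrix_bytes / 1024**2:.0f} MB")  # ~169 MB
```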

## 🚀 Quick Start

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("sshan95/medicoder-ai-v4-model")

# Load the checkpoint with embedded prototypes
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
prototypes = checkpoint['prototypes']            # Shape: [57768, 768]
prototype_codes = checkpoint['prototype_codes']  # Shape: [57768]

print(f"Loaded {prototypes.shape[0]:,} medical code prototypes!")
```
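One caveat that comes from PyTorch itself rather than this card: since PyTorch 2.6, `torch.load` defaults to `weights_only=True`, which can reject checkpoints that carry non-tensor objects (such as the `config` entry described below). If loading fails and you trust the file's source, pass the flag explicitly:

```python
# weights_only=False: only use with checkpoints from a trusted source
checkpoint = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
```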

## 📖 Usage Example

```python
import torch
from transformers import AutoTokenizer

# Initialize
tokenizer = AutoTokenizer.from_pretrained("sshan95/medicoder-ai-v4-model")
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")

# Load the model architecture (your ConservativePrototypicalNetwork; a sketch follows below)
model = load_your_model_architecture()
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the embedded prototypes
prototypes = checkpoint['prototypes']
prototype_codes = checkpoint['prototype_codes']

# Example prediction
clinical_note = "Patient presents with acute chest pain, diaphoresis, and dyspnea..."

# Tokenize
inputs = tokenizer(clinical_note, return_tensors="pt", truncation=True, max_length=512)

# Get the query embedding
with torch.no_grad():
    query_embedding = model.encode_text(inputs['input_ids'], inputs['attention_mask'])

# Compute similarities to all prototypes
similarities = torch.mm(query_embedding, prototypes.t())

# Get the top-5 predictions
top_5_scores, top_5_indices = torch.topk(similarities, k=5)
predicted_codes = prototype_codes[top_5_indices[0]]

print("Top 5 predicted medical codes:", predicted_codes.tolist())
```
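The card does not ship the `ConservativePrototypicalNetwork` class, so `load_your_model_architecture()` is a placeholder you must supply. A minimal sketch of what such an encoder could look like; mean pooling over Bio_ClinicalBERT token embeddings is an assumption here, not the confirmed design, and the real class must match the checkpoint's `state_dict` keys:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class PrototypicalEncoder(nn.Module):
    """Hypothetical stand-in for the unpublished ConservativePrototypicalNetwork."""

    def __init__(self, base_model: str = "emilyalsentzer/Bio_ClinicalBERT"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_model)

    def encode_text(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings, ignoring padded positions
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
```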

## 📋 Model Contents

When you load this model, you get:

```python
checkpoint = torch.load("pytorch_model.bin")

# Available keys:
checkpoint['model_state_dict']   # Neural network weights
checkpoint['prototypes']         # [57768, 768] prototype embeddings
checkpoint['prototype_codes']    # [57768] medical code mappings
checkpoint['accuracies']         # Performance metrics
checkpoint['config']             # Training configuration
```

## 🎯 Key Features

### ✅ **Self-Contained Deployment**
- No external dataset required
- All medical knowledge embedded in prototypes
- Direct inference capability

### ✅ **Production Ready**
- Optimized for CPU and GPU inference
- Memory-efficient prototype storage
- Stable, tested architecture

### ✅ **Full Performance**
- Complete 46.3% Top-1 accuracy preserved
- All 57,768 medical codes supported
- Conservative optimization approach

## 📊 Training Details

- **Base Model**: Bio_ClinicalBERT
- **Training Data**: Clinical notes with medical code annotations
- **Approach**: Few-shot prototypical learning (sketched below)
- **Optimization**: Conservative incremental improvements
- **Phase 1**: Enhanced embeddings (+5.7pp)
- **Phase 2**: Ensemble prototypes (+1.1pp)
- **Final Step**: Prototype extraction and embedding
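For readers new to the approach: in prototypical networks, each class (here, each medical code) is summarized by a prototype, classically the mean of its support-example embeddings, and queries are classified by similarity to the prototypes. A generic sketch of that step, illustrating the standard technique rather than the authors' exact training code:

```python
import torch

def build_prototypes(embeddings: torch.Tensor, code_ids: torch.Tensor, num_codes: int) -> torch.Tensor:
    """Average [num_examples, 768] support embeddings per code into [num_codes, 768] prototypes."""
    prototypes = torch.zeros(num_codes, embeddings.shape[1])
    counts = torch.zeros(num_codes)
    prototypes.index_add_(0, code_ids, embeddings)          # sum embeddings per code
    counts.index_add_(0, code_ids, torch.ones(len(code_ids)))
    return prototypes / counts.clamp(min=1).unsqueeze(1)    # mean per code
```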

## 🚀 Deployment Options

### **Option 1: Hugging Face Spaces**
Perfect for demos and testing with built-in UI.

### **Option 2: Local Deployment**
Download and run locally for production use.

### **Option 3: API Integration**
Integrate into existing healthcare systems; a minimal sketch follows below.
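For Option 3, one possible shape of an HTTP wrapper. FastAPI is an assumed extra dependency (the card lists only PyTorch, Transformers, and NumPy), and `predict_codes` is a hypothetical helper wrapping the tokenizer/model/prototype pipeline from the usage example above:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Note(BaseModel):
    text: str

@app.post("/predict")
def predict(note: Note):
    # predict_codes: hypothetical wrapper around the usage example's pipeline
    return {"predicted_codes": predict_codes(note.text, top_k=5)}
```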

## ⚠️ Usage Guidelines

- **Purpose**: Research and educational use, medical coding assistance
- **Validation**: Always require human expert validation
- **Scope**: English clinical text, general medical domains
- **Limitations**: Performance varies by medical specialty

## 🌍 Real-world Impact

This model helps by:
- **Reducing coding time**: Hours → Minutes
- **Improving consistency**: Standardized predictions
- **Narrowing choices**: 57,768 codes → Top suggestions
- **Supporting workflow**: Integration-ready format

## 🔬 Technical Specifications

- **Model Size**: ~1.2 GB (with prototypes)
- **Inference Speed**: 3-8 seconds (CPU), <1 second (GPU)
- **Memory Usage**: ~3-4 GB during inference
- **Dependencies**: PyTorch, Transformers, NumPy

## 📝 Citation

```bibtex
@misc{medicoder-ai-v4-complete,
  title={MediCoder AI v4 Complete: Self-Contained Medical Coding with Embedded Prototypes},
  author={MediCoder Team},
  year={2025},
  url={https://huggingface.co/sshan95/medicoder-ai-v4-model},
  note={57,768 embedded prototypes, 46.3% Top-1 accuracy}
}
```

## 👥 Community

Built for the medical coding community. For questions, issues, or collaborations, please use the repository discussions.

---

**🚀 Ready for production medical coding assistance!**

*This complete model contains all necessary components for deployment without external dependencies.*