sshan95 committed on
Commit
063bf99
·
verified ·
1 Parent(s): 728c3fd

Upload README.md with huggingface_hub

  1. README.md +137 -0
README.md ADDED
@@ -0,0 +1,137 @@
1
+ ---
2
+ license: mit
3
+ base_model: emilyalsentzer/Bio_ClinicalBERT
4
+ tags:
5
+ - medical
6
+ - healthcare
7
+ - clinical-notes
8
+ - medical-coding
9
+ - few-shot-learning
10
+ - prototypical-networks
11
+ language:
12
+ - en
13
+ metrics:
14
+ - accuracy
15
+ library_name: transformers
16
+ pipeline_tag: text-classification
17
+ ---
18
+
19
+ # MediCoder AI v4 🏥
20
+
21
+ ## Model Description
22
+
23
+ MediCoder AI v4 is a medical coding system that predicts ICD/medical codes from clinical notes with **46.3% Top-1 accuracy**. It builds on Bio_ClinicalBERT with few-shot prototypical learning and covers ~57,000 medical codes.
24
+
25
+ ## 🎯 Performance
26
+
27
+ - **Top-1 Accuracy**: 46.3%
28
+ - **Top-3 Accuracy**: ~52%
29
+ - **Top-5 Accuracy**: ~54%
30
+ - **Improvement**: +6.8 percentage points over baseline
31
+ - **Medical Codes**: ~57,000 supported codes
32
+
33
+ ## 🏗️ Architecture
34
+
35
+ - **Base Model**: Bio_ClinicalBERT (specialized for medical text)
36
+ - **Approach**: Few-shot Prototypical Networks
37
+ - **Embedding Dimension**: 768
38
+ - **Optimization**: Conservative incremental improvements (Phase 2)
39
+
40
+ ## 🚀 Usage
41
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+
+ # Load the tokenizer and the pickled model object.
+ # weights_only=False is needed on recent PyTorch releases to unpickle a full
+ # model object; only use it with files you trust.
+ tokenizer = AutoTokenizer.from_pretrained("your-username/medicoder-ai-v4-model")
+ model = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
+ model.eval()  # assumes the file holds a full nn.Module; disables dropout
+
+ # Example usage
+ clinical_note = "Patient presents with chest pain and shortness of breath..."
+
+ # Tokenize (notes longer than 512 tokens are truncated)
+ inputs = tokenizer(clinical_note, return_tensors="pt",
+                    truncation=True, max_length=512)
+
+ # Get predictions (indices of the top-5 prototypes, one per medical code)
+ with torch.no_grad():
+     embeddings = model.encode_text(inputs['input_ids'], inputs['attention_mask'])
+     similarities = torch.mm(embeddings, model.prototypes.t())
+     top_codes = similarities.topk(5).indices
+
+ print("Top 5 predicted medical codes:", top_codes)
+ ```
65
+
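The usage snippet prints prototype indices rather than code strings. A minimal sketch of turning scores into readable suggestions follows; it assumes a list that maps each prototype row to its code. The name `index_to_code`, the helper `decode_top_k`, and the placeholder codes are hypothetical and not part of the released model.

```python
# Hypothetical label list: position i holds the code for prototype row i.
index_to_code = ["CODE_A", "CODE_B", "CODE_C", "CODE_D"]


def decode_top_k(scores, index_to_code, k=3):
    """Return the k highest-scoring (code, score) pairs, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(index_to_code[i], scores[i]) for i in ranked[:k]]


# Made-up similarity scores for one clinical note, one score per prototype.
scores = [0.82, 0.74, 0.31, 0.12]
print(decode_top_k(scores, index_to_code, k=2))
```

With the tensor output from the usage snippet, the same lookup would apply to each index in `top_codes[0].tolist()`, provided such a mapping ships with the model.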
66
+ ## 📊 Training Details
67
+
68
+ - **Training Data**: Medical clinical notes with associated codes
69
+ - **Training Approach**: Few-shot learning with prototypical networks
70
+ - **Optimization Strategy**: Conservative incremental improvements
71
+ - **Phases**:
72
+   - Phase 1: Enhanced embeddings and pooling (+5.7pp)
73
+   - Phase 2: Ensemble prototypes with attention (+1.1pp)
74
+
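The README does not say how Phase 2 combines examples "with attention". One plausible reading is sketched below under stated assumptions: each support embedding for a code is weighted by a softmax over its dot product with the mean embedding, so outlying examples contribute less to the prototype. The function names and the toy 2-dimensional vectors are illustrative only, and the actual mechanism may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_prototype(examples):
    """Build one prototype as an attention-weighted sum of support embeddings.

    Assumption: attention scores are dot products with the plain mean embedding.
    """
    n = len(examples)
    mean = [sum(col) / n for col in zip(*examples)]
    scores = [sum(x * m for x, m in zip(ex, mean)) for ex in examples]
    weights = softmax(scores)
    dims = len(mean)
    proto = [sum(w * ex[d] for w, ex in zip(weights, examples)) for d in range(dims)]
    return proto, weights


# Two similar examples and one outlier for the same (hypothetical) code.
examples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
proto, weights = attention_prototype(examples)
print("weights:", weights)
print("prototype:", proto)
```

Compared with a plain mean, the outlier receives the smallest weight, which pulls the prototype toward the two examples that agree.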
75
+ ## 🎯 Use Cases
76
+
77
+ - **Medical Coding Assistance**: Help medical coders find relevant codes
78
+ - **Clinical Decision Support**: Suggest appropriate diagnostic codes
79
+ - **Healthcare Analytics**: Automated coding for large datasets
80
+ - **Research**: Medical text analysis and categorization
81
+
82
+ ## ⚠️ Limitations
83
+
84
+ - Designed for English clinical text
85
+ - Performance varies by medical specialty
86
+ - Requires domain expertise for validation
87
+ - Not a replacement for professional medical coding
88
+
89
+ ## 📋 Model Details
90
+
91
+ - **Model Size**: ~670 MB
92
+ - **Inference Speed**: 3-8 seconds (CPU), <1 second (GPU)
93
+ - **Memory Requirements**: ~2-3 GB during inference
94
+ - **Self-contained**: No external dataset dependencies
95
+
96
+ ## 🔬 Technical Details
97
+
98
+ - **Few-shot Learning**: Learns from limited examples per medical code
99
+ - **Prototypical Networks**: Creates representative embeddings for each code
100
+ - **Ensemble Prototypes**: Multiple prototypes per code for better coverage
101
+ - **Attention Aggregation**: Smart combination of multiple examples
102
+
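The prototype idea can be sketched in a few lines: average the embeddings of the few labelled examples for each code, then assign a new note to the code whose prototype is most similar. This is a minimal pure-Python illustration with toy 2-dimensional vectors and placeholder code names. It assumes cosine similarity and a single mean prototype per code, whereas the model uses 768-dimensional embeddings and several prototypes per code.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(support):
    """Map each code to the mean embedding of its support examples."""
    return {code: mean_vector(vecs) for code, vecs in support.items()}


def classify(query, prototypes):
    """Return the code whose prototype is most similar to the query embedding."""
    return max(prototypes, key=lambda code: cosine(query, prototypes[code]))


# Two labelled examples per code (the "few shots"); values are made up.
support = {
    "CODE_A": [[1.0, 0.0], [0.8, 0.2]],
    "CODE_B": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
print(classify([0.9, 0.1], prototypes))
```

Adding a new code in this scheme only requires computing one more prototype, which is what makes the approach workable for a label space of tens of thousands of codes.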
103
+ ## 📈 Evaluation
104
+
105
+ Evaluated on a held-out medical coding dataset with standard metrics:
106
+ - Precision, Recall, F1-score
107
+ - Top-K accuracy (K=1,3,5,10,20)
108
+ - Comparison with baseline methods
109
+
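For reference, Top-K accuracy counts a prediction as correct when the true code appears anywhere in the model's K highest-ranked suggestions. The sketch below uses made-up rankings and a hypothetical helper, `top_k_accuracy`; it is not the evaluation script behind the reported numbers.

```python
def top_k_accuracy(ranked_predictions, true_codes, k):
    """Fraction of notes whose true code is among the first k ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_codes)
        if truth in ranked[:k]
    )
    return hits / len(true_codes)


# Ranked suggestions (best first) for four notes, with their true codes.
ranked = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["A", "C", "B"],
]
truth = ["A", "A", "A", "B"]

for k in (1, 2, 3):
    print(f"Top-{k} accuracy: {top_k_accuracy(ranked, truth, k):.2f}")
```

Top-K accuracy can only stay flat or rise as K grows, which matches the reported progression from 46.3% (Top-1) to ~54% (Top-5).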
110
+ ## 🏥 Real-world Impact
111
+
112
+ This model helps medical professionals by:
113
+ - Reducing coding time from hours to minutes
114
+ - Improving coding accuracy and consistency
115
+ - Narrowing 57,000+ codes to top suggestions
116
+ - Supporting healthcare workflow automation
117
+
118
+ ## 📜 Citation
119
+
120
+ If you use this model, please cite:
121
+
122
+ ```bibtex
123
+ @misc{medicoder-ai-v4,
124
+ title={MediCoder AI v4: Few-shot Medical Coding with Prototypical Networks},
125
+ author={Your Name},
126
+ year={2025},
127
+ url={https://huggingface.co/your-username/medicoder-ai-v4-model}
128
+ }
129
+ ```
130
+
131
+ ## 📞 Contact
132
+
133
+ For questions or collaborations, please reach out via the model repository issues.
134
+
135
+ ---
136
+
137
+ *Built with ❤️ for the medical community*