	---
	license: mit
	base_model: emilyalsentzer/Bio_ClinicalBERT
	tags:
	- medical
	- healthcare
	- clinical-notes
	- medical-coding
	- few-shot-learning
	- prototypical-networks
	language:
	- en
	metrics:
	- accuracy
	library_name: transformers
	pipeline_tag: text-classification
	---

	# MediCoder AI v4 🏥

	## Model Description

	MediCoder AI v4 is a medical coding system that predicts ICD/medical codes from clinical notes, reaching 46.3% Top-1 accuracy on a held-out set. It is built on Bio_ClinicalBERT with few-shot prototypical learning and covers roughly 57,000 medical codes.

	## 🎯 Performance

	- Top-1 Accuracy: 46.3%
	- Top-3 Accuracy: ~52%
	- Top-5 Accuracy: ~54%
	- Improvement: +6.8 percentage points over baseline
	- Medical Codes: ~57,000 supported codes

	## 🏗️ Architecture

	- Base Model: Bio_ClinicalBERT (specialized for medical text)
	- Approach: Few-shot Prototypical Networks
	- Embedding Dimension: 768
	- Optimization: Conservative incremental improvements (Phase 2)
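
To make the prototypical approach concrete, here is a minimal sketch of nearest-prototype classification. It is not the model's actual implementation: the toy 3-dimensional vectors stand in for 768-dimensional Bio_ClinicalBERT embeddings, the code labels are placeholders, and cosine similarity is an assumed scoring choice.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(support):
    """Map each code to the mean of its support-example embeddings."""
    return {code: mean_vector(vectors) for code, vectors in support.items()}


def predict(query, prototypes):
    """Return the code whose prototype is most similar to the query."""
    return max(prototypes, key=lambda code: cosine(query, prototypes[code]))


# Toy support set: two placeholder codes with two example embeddings each.
support = {
    "CODE_A": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "CODE_B": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
}
prototypes = build_prototypes(support)
prediction = predict([0.9, 0.1, 0.0], prototypes)
print(prediction)
```

Because a code is represented only by its prototype, supporting a new code means embedding a few examples and averaging them, with no retraining of a classification head.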

	## 🚀 Usage

	```python
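	import torch
	from transformers import AutoTokenizer

	# Load the tokenizer from the Hub and the pickled model object from a local file.
	# torch.load unpickles arbitrary objects, so only load files you trust.
	# On PyTorch 2.6+, pass weights_only=False to load a full pickled model.
	tokenizer = AutoTokenizer.from_pretrained("your-username/medicoder-ai-v4-model")
	model = torch.load("pytorch_model.bin", map_location="cpu")

	# Example usage
	clinical_note = "Patient presents with chest pain and shortness of breath..."

	# Tokenize, truncating to the 512-token limit of the BERT encoder
	inputs = tokenizer(
	    clinical_note,
	    return_tensors="pt",
	    truncation=True,
	    max_length=512,
	)

	# Get predictions (top-5 medical codes)
	with torch.no_grad():
	    # Embed the note, then score it against every code prototype
	    embeddings = model.encode_text(inputs["input_ids"], inputs["attention_mask"])
	    similarities = torch.mm(embeddings, model.prototypes.t())
	    # Indices of the five highest-scoring prototypes, not code strings
	    top_codes = similarities.topk(5).indices

	print("Top 5 predicted medical codes:", top_codes)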
	```
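
The snippet above prints prototype indices rather than code strings. This README does not say how indices map to codes, so the sketch below assumes a hypothetical `id_to_code` list whose order matches the rows of `model.prototypes`. The scores are a plain Python list, such as the one `similarities[0].tolist()` would return.

```python
def decode_top_k(scores, id_to_code, k=5):
    """Return the k best (code, score) pairs, highest score first."""
    if len(scores) != len(id_to_code):
        raise ValueError("scores and id_to_code must have the same length")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id_to_code[i], scores[i]) for i in ranked[:k]]


# Hypothetical index-to-code mapping and similarity scores for one note.
id_to_code = ["CODE_A", "CODE_B", "CODE_C", "CODE_D"]
scores = [0.12, 0.87, 0.45, 0.30]

suggestions = decode_top_k(scores, id_to_code, k=3)
for code, score in suggestions:
    print(f"{code}: {score:.2f}")
```

Returning the score next to each code lets a reviewer see how close the runner-up suggestions are.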

	## 📊 Training Details

	- Training Data: Medical clinical notes with associated codes
	- Training Approach: Few-shot learning with prototypical networks
	- Optimization Strategy: Conservative incremental improvements
	- Phases:
	  - Phase 1: Enhanced embeddings and pooling (+5.7pp)
	  - Phase 2: Ensemble prototypes with attention (+1.1pp)
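
As a quick consistency check, the two phase gains add up to the +6.8 percentage point improvement reported under Performance. The baseline itself is not stated in this README. The value computed below is derived, assuming both gains were measured on Top-1 accuracy.

```python
top1_accuracy = 46.3  # reported Top-1 accuracy, in percent
phase_gains = {"Phase 1": 5.7, "Phase 2": 1.1}  # reported gains, in percentage points

# Round to one decimal place to avoid floating-point noise in the sums.
total_gain = round(sum(phase_gains.values()), 1)
implied_baseline = round(top1_accuracy - total_gain, 1)  # derived, not reported

print(f"Total gain: +{total_gain}pp")
print(f"Implied baseline Top-1 accuracy: {implied_baseline}%")
```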

	## 🎯 Use Cases

	- Medical Coding Assistance: Help medical coders find relevant codes
	- Clinical Decision Support: Suggest appropriate diagnostic codes
	- Healthcare Analytics: Automated coding for large datasets
	- Research: Medical text analysis and categorization

	## ⚠️ Limitations

	- Designed for English clinical text
	- Performance varies by medical specialty
	- Requires domain expertise for validation
	- Not a replacement for professional medical coding

	## 📋 Model Details

	- Model Size: ~670 MB
	- Inference Speed: 3-8 seconds (CPU), <1 second (GPU)
	- Memory Requirements: ~2-3 GB during inference
	- Self-contained: No external dataset dependencies

	## 🔬 Technical Details

	- Few-shot Learning: Learns from limited examples per medical code
	- Prototypical Networks: Creates representative embeddings for each code
	- Ensemble Prototypes: Multiple prototypes per code for better coverage
	- Attention Aggregation: Smart combination of multiple examples

	## 📈 Evaluation

	The model was evaluated on a held-out medical coding dataset using standard metrics:
	- Precision, Recall, F1-score
	- Top-K accuracy (K=1,3,5,10,20)
	- Comparison with baseline methods
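
For reference, Top-K accuracy counts a note as correct when its gold code appears anywhere in the model's K highest-ranked suggestions. The sketch below assumes one gold code per note. The labels and rankings are toy data, not results from this model.

```python
def top_k_accuracy(ranked_predictions, gold_codes, k):
    """Fraction of notes whose gold code is among the top-k ranked codes."""
    if not gold_codes:
        raise ValueError("gold_codes must not be empty")
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_codes)
        if gold in ranked[:k]
    )
    return hits / len(gold_codes)


# Toy data: each inner list is one note's codes, best suggestion first.
ranked_predictions = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["A", "C", "B"],
]
gold_codes = ["A", "A", "A", "B"]

accuracy_at_k = {
    k: top_k_accuracy(ranked_predictions, gold_codes, k) for k in (1, 2, 3)
}
print(accuracy_at_k)
```

Top-K accuracy can only stay level or rise as K grows, which matches the Top-1, Top-3, and Top-5 figures reported above.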

	## 🏥 Real-world Impact

	This model is intended to help medical professionals by:
	- Reducing coding time from hours to minutes
	- Improving coding accuracy and consistency
	- Narrowing 57,000+ codes to top suggestions
	- Supporting healthcare workflow automation

	## 📜 Citation

	If you use this model, please cite:

	```
	@misc{medicoder-ai-v4,
	  title={MediCoder AI v4: Few-shot Medical Coding with Prototypical Networks},
	  author={Your Name},
	  year={2025},
	  url={https://huggingface.co/your-username/medicoder-ai-v4-model}
	}
	```

	## 📞 Contact

	For questions or collaborations, please reach out via the model repository issues.

	---

	Built with ❤️ for the medical community