Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever.

Say Hello to OpenMed: 380+ Free Medical NER Models
Healthcare AI has been stuck behind costly paywalls and closed systems for way too long. Researchers, doctors, and developers have had to deal with steep fees and mysterious “black-box” tools that slow things down. Now, OpenMed is shaking things up with over 380 top-notch Named Entity Recognition (NER) models for medical and clinical text, all free and open under the Apache 2.0 license.
These models don’t just keep up with pricey commercial options. On 9 of the 11 benchmarks with a reported comparison below, they beat them, offering great performance and easy access to speed up healthcare breakthroughs worldwide.
The Issue: Locked-Up Healthcare AI
Medical AI has some big roadblocks:
- Expensive Licenses: Small teams and universities can’t afford them.
- Hidden Details: Commercial tools don’t show how they work.
- Falling Behind: Many paid models aren’t keeping up with new tech.
- Limited Reach: Only big players get the good stuff.
This slows down research, clinical progress, and fair access to better healthcare.
The Fix: OpenMed NER Models
OpenMed brings 380+ free NER models to the table, ready to tackle all kinds of medical and clinical terms—like drug names, diseases, and more. Here’s what makes them stand out:
- ✅ Totally Free: Open-source with the Apache 2.0 license.
- ✅ Ready to Roll: Built for real-world use right away.
- ✅ Flexible Sizes: From 109M to 568M parameters.
- ✅ Tested Tough: Checked against 13+ standard datasets.
- ✅ Plays Nice: Works smoothly with Hugging Face and PyTorch.
These models tear down the old barriers, making healthcare AI open and useful for everyone.
What’s in the OpenMed Toolbox?
OpenMed’s 380+ models are fine-tuned and tested on 13 key medical datasets, delivering top results—like F1 scores up to 0.998. They come in different sizes, so whether you need something lightweight or super powerful, there’s a fit for you.
🔬 Covering All Bases
These models shine in tons of areas:
- Drugs & Chemicals: Spot compounds for drug research or safety tracking.
- Diseases & Clinics: Pull out conditions for better diagnosis tools.
- Genes & Molecules: Dig into genomics and precision medicine.
- Anatomy & Terms: Boost medical records and coding.
- Cancer Research: Power up oncology studies.
They’re perfect for everything from research papers to hospital workflows.
🎯 OpenMed (open-source) vs. the Big Shots (closed-source)
Dataset | OpenMed best F1 (%) | Closed-source SOTA F1 (%)† | Δ (OpenMed – SOTA) | Current closed-source leader |
---|---|---|---|---|
BC4CHEMD | 95.40 | 94.39 | +1.01 | Spark NLP BertForTokenClassification |
BC5CDR-Chem | 96.10 | 94.88 | +1.22 | Spark NLP BertForTokenClassification |
BC5CDR-Disease | 91.20 | 88.50 | +2.70 | BioMegatron |
NCBI-Disease | 91.10 | 89.71 | +1.39 | BioBERT |
JNLPBA | 81.90 | 82.00 | –0.10 | KeBioLM (knowledge-enhanced LM) |
Linnaeus | 96.50 | 92.70 | +3.80 | BERN2 toolkit |
Species-800 | 86.40 | 82.59 | +3.81 | Spark NLP BertForTokenClassification |
BC2GM | 90.10 | 88.75 | +1.35 | Spark NLP Bi-LSTM-CNN-Char |
AnatEM | 90.60 | 91.65 | –1.05 | Spark NLP BertForTokenClassification |
BioNLP 2013 CG | 89.90 | 87.83 | +2.07 | Spark NLP BertForTokenClassification |
Gellus | 99.80 | 63.40 | +36.40 | ConNER |
CLL | 95.70 | 85.98 | — | (no published SOTA) |
FSU | 96.10 | — | — | (no published SOTA) |
† Closed-source scores are the highest peer-reviewed / leaderboard results found in the literature (typically commercial models such as Spark NLP, NEEDLE, BERN2, etc.).
🔬 By Domain
This table maps datasets to their respective domains, highlighting recommended models based on top performance across each domain's datasets.
Domain | Datasets Included | Models Available | Size Range (Params) | Recommended Model |
---|---|---|---|---|
Pharmacology | bc5cdr_chem, bc4chemd, fsu | 90 models | 109M - 568M | OpenMed-NER-PharmaDetect-SuperClinical-434M |
Disease/Pathology | bc5cdr_disease, ncbi_disease | 60 models | 109M - 434M | OpenMed-NER-PathologyDetect-PubMed-v2-109M |
Genomics | jnlpba, bc2gm, species800, linnaeus, gellus | 150 models | 335M - 568M | OpenMed-NER-GenomicDetect-SnowMed-568M |
Anatomy | anatomy | 30 models | 560M | OpenMed-NER-AnatomyDetect-ElectraMed-560M |
Oncology | bionlp2013_cg | 30 models | 355M | OpenMed-NER-OncologyDetect-SuperMedical-355M |
Clinical Notes | cll | 30 models | 560M | OpenMed-NER-BloodCancerDetect-ElectraMed-560M |
⚡ Pick Your Size
Size | Parameters | Best For |
---|---|---|
Compact | 109M | Quick setups |
Large | 335M - 355M | Solid accuracy |
XLarge | 434M | Great all-around |
XXLarge | 560M - 568M | Max power |
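If you want to pick a model programmatically, the recommendations above can be captured in a small lookup. This is only a sketch: the domain keys and the `pick_model` helper are made up for illustration, and it assumes every model lives under the `OpenMed/` namespace on the Hub and that the trailing `-<N>M` tag in each name is its parameter count.

```python
from typing import Optional

# Recommended models copied from the "By Domain" table.
# The dictionary keys are illustrative names, not official identifiers.
RECOMMENDED = {
    "pharmacology": "OpenMed-NER-PharmaDetect-SuperClinical-434M",
    "disease": "OpenMed-NER-PathologyDetect-PubMed-v2-109M",
    "genomics": "OpenMed-NER-GenomicDetect-SnowMed-568M",
    "anatomy": "OpenMed-NER-AnatomyDetect-ElectraMed-560M",
    "oncology": "OpenMed-NER-OncologyDetect-SuperMedical-355M",
    "clinical_notes": "OpenMed-NER-BloodCancerDetect-ElectraMed-560M",
}


def param_count_millions(model_name: str) -> int:
    """Parse the trailing '<N>M' size tag, e.g. '...-434M' -> 434."""
    size_tag = model_name.rsplit("-", 1)[1]
    return int(size_tag.rstrip("M"))


def pick_model(domain: str, max_params_m: Optional[int] = None) -> str:
    """Return the assumed Hub id of the recommended model for a domain.

    Raises ValueError when the recommended model exceeds the size budget.
    """
    name = RECOMMENDED[domain]
    if max_params_m is not None and param_count_millions(name) > max_params_m:
        raise ValueError(
            f"{name} has more than {max_params_m}M parameters; "
            "browse the smaller variants for this domain instead."
        )
    # Assumption: all models are published under the "OpenMed/" namespace.
    return f"OpenMed/{name}"


print(pick_model("disease", max_params_m=200))
```

The returned string can be passed as the `model` argument of a Transformers `pipeline`.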
📊 Top Models by Dataset
Below is a summary of the top-performing model for each dataset, showcasing their F1 scores and sizes.
Dataset | Top Model | F1 Score | Model Size (Params) |
---|---|---|---|
bc5cdr_chem | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.961 | 434M |
bionlp2013_cg | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.899 | 355M |
bc4chemd | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.954 | 335M |
linnaeus | OpenMed-NER-SpeciesDetect-PubMed-335M | 0.965 | 335M |
jnlpba | OpenMed-NER-DNADetect-SuperClinical-434M | 0.819 | 434M |
bc5cdr_disease | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.912 | 434M |
fsu | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.961 | 568M |
ncbi_disease | OpenMed-NER-PathologyDetect-PubMed-v2-109M | 0.911 | 109M |
bc2gm | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.901 | 434M |
cll | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.957 | 560M |
gellus | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.998 | 568M |
anatomy | OpenMed-NER-AnatomyDetect-ElectraMed-560M | 0.906 | 560M |
species800 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.864 | 335M |
Together, these tables summarize OpenMed's dataset coverage, range of model sizes, and benchmark performance on biomedical and clinical NER tasks.
Try It Out: 3 Lines of Code
Integrating OpenMed NER models is effortless with Hugging Face Transformers:
from transformers import pipeline
ner_pipeline = pipeline("token-classification", model="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M", aggregation_strategy="simple")
text = "Patient prescribed 10mg aspirin for hypertension."
entities = ner_pipeline(text)
print(entities)
That’s it! You’ll spot “aspirin” as a chemical, just like that.
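In practice you will usually want to keep only confident predictions of the entity types you care about. With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below filters such a list. The `sample` data is hand-written for illustration (it is not real model output), and the `CHEM` label is an assumption, so check the model card for the actual label names.

```python
def filter_entities(entities, min_score=0.5, labels=None):
    """Keep entities at or above min_score, optionally restricted to labels."""
    kept = []
    for ent in entities:
        if ent["score"] < min_score:
            continue
        if labels is not None and ent["entity_group"] not in labels:
            continue
        kept.append(ent)
    return kept


text = "Patient prescribed 10mg aspirin for hypertension."

# Hand-written stand-in for pipeline output; labels and scores are illustrative.
sample = [
    {"entity_group": "CHEM", "score": 0.98, "word": "aspirin", "start": 24, "end": 31},
    {"entity_group": "CHEM", "score": 0.31, "word": "10mg", "start": 19, "end": 23},
]

kept = filter_entities(sample, min_score=0.5, labels={"CHEM"})
for ent in kept:
    # start/end are character offsets into the original text
    print(ent["entity_group"], text[ent["start"]:ent["end"]], round(ent["score"], 2))
```

The right threshold depends on whether you would rather miss entities or review false positives, so treat `0.5` as a starting point only.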
Scaling for Large Datasets
For processing large-scale datasets efficiently across CPUs or GPUs:
import pandas as pd
from datasets import Dataset, load_dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
# Build the NER pipeline (same model as in the quick-start example)
medical_ner_pipeline = pipeline("token-classification", model="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M", aggregation_strategy="simple")
# Load a public medical dataset from Hugging Face (first 100 examples for testing)
medical_dataset = load_dataset("BI55/MedText", split="train[:100]")
data = pd.DataFrame({"text": medical_dataset["Completion"]})
dataset = Dataset.from_pandas(data)
# Process with batching; tune batch_size to your GPU memory
batch_size = 16
results = []
for out in medical_ner_pipeline(KeyDataset(dataset, "text"), batch_size=batch_size):
    # `out` is the list of entities found in one text, so append keeps one entry per text
    results.append(out)
print(f"Processed {len(results)} texts with batching")
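Once a corpus has been processed, a quick frequency count is a handy sanity check. The sketch below assumes you have collected one list of entity dicts per input text. The name `per_text_entities` and the sample values are made up for illustration.

```python
from collections import Counter


def count_mentions(per_text_entities):
    """Count (label, lowercased mention) pairs across all texts."""
    counts = Counter()
    for entities in per_text_entities:
        for ent in entities:
            counts[(ent["entity_group"], ent["word"].lower())] += 1
    return counts


# Illustrative stand-in for batched pipeline output: one inner list per text.
per_text_entities = [
    [{"entity_group": "CHEM", "word": "Aspirin", "score": 0.99}],
    [
        {"entity_group": "CHEM", "word": "aspirin", "score": 0.97},
        {"entity_group": "DISEASE", "word": "hypertension", "score": 0.95},
    ],
]

counts = count_mentions(per_text_entities)
for (label, mention), n in counts.most_common(5):
    print(label, mention, n)
```

Lowercasing merges trivial variants such as "Aspirin" and "aspirin". Real normalization (synonyms, abbreviations) needs a proper terminology resource.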
Real-World Use Cases: NER in Healthcare
Named Entity Recognition (NER) is a technology that extracts and categorizes key information—like names, dates, or medical terms—from unstructured text. In healthcare, where clinical notes, patient records, and research papers are often a jumble of free-form data, NER brings order to the chaos. Below, we explore how NER powers three vital tasks: De-Identification, Entity Relation Extraction, and HCC Coding, and why they’re essential in the medical world.
🔒 De-Identification: Safeguarding Patient Privacy
What it is: De-identification strips protected health information (PHI), such as names, addresses, or Social Security numbers, from medical records. The goal? Make the data anonymous while keeping it useful.
Why it matters: Patient privacy isn’t optional; it’s a legal and ethical must. Laws like HIPAA in the U.S. demand it. By using NER to automatically detect and mask PHI, healthcare providers and researchers can analyze data with far less risk of exposing identities. It’s also much faster than humans manually scrubbing records.
Impact: De-identified data fuels research and improves care, all while keeping patient identities safe.
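The masking step can be sketched in a few lines once a model has produced character offsets. This is a minimal illustration and not a compliant de-identification system. It assumes a PHI-detecting NER model, and the `NAME` and `DATE` labels and the sample entities are hypothetical. The models benchmarked in this post target biomedical entities, so check the model card before relying on any PHI label.

```python
def mask_entities(text, entities, labels):
    """Replace each entity span whose label is in `labels` with a [LABEL] tag.

    Spans are replaced from right to left so earlier offsets stay valid.
    Assumes the spans do not overlap.
    """
    spans = sorted(
        (e for e in entities if e["entity_group"] in labels),
        key=lambda e: e["start"],
        reverse=True,
    )
    masked = text
    for ent in spans:
        tag = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + tag + masked[ent["end"]:]
    return masked


text = "John Smith was seen on 2024-03-01 for chest pain."


def span(label, phrase):
    """Build a hand-written entity dict for a phrase found in `text`."""
    start = text.index(phrase)
    return {"entity_group": label, "word": phrase, "start": start, "end": start + len(phrase)}


# Hypothetical model output; the labels are placeholders.
entities = [span("NAME", "John Smith"), span("DATE", "2024-03-01"), span("SYMPTOM", "chest pain")]

masked = mask_entities(text, entities, labels={"NAME", "DATE"})
print(masked)  # [NAME] was seen on [DATE] for chest pain.
```

Clinical content such as "chest pain" is left untouched, which is the point: the text stays useful while the identifiers are gone.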
🔗 Entity Relation Extraction: Mapping Medical Connections
What it is: This task identifies relationships between entities in text, like tying a drug to its side effects or a disease to its symptoms. NER first spots the entities; then, the relationships are pieced together.
Why it matters: Seeing how things connect unlocks smarter healthcare. It builds knowledge graphs for clinical decision support, aids drug discovery, and tailors treatments to patients. Without this, critical links in medical data might stay buried.
Impact: Doctors make better calls, researchers find new insights, and patients get care that fits their unique needs.
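A common first step is to generate candidate pairs from entities that appear in the same sentence, then hand those candidates to a relation classifier or a human reviewer. The sketch below does only the candidate step, using sentence-level co-occurrence. It is a naive baseline, not real relation extraction: the sentence splitter is a simple regex, and the `CHEM` and `DISEASE` labels and the sample entities are assumptions for illustration.

```python
import re
from itertools import product


def sentence_spans(text):
    """Yield (start, end) character offsets of naively split sentences."""
    start = 0
    for match in re.finditer(r"[.!?]+\s*", text):
        yield start, match.end()
        start = match.end()
    if start < len(text):
        yield start, len(text)


def candidate_pairs(text, entities, head_label, tail_label):
    """Pair every head entity with every tail entity in the same sentence."""
    pairs = []
    for s_start, s_end in sentence_spans(text):
        inside = [e for e in entities if s_start <= e["start"] and e["end"] <= s_end]
        heads = [e for e in inside if e["entity_group"] == head_label]
        tails = [e for e in inside if e["entity_group"] == tail_label]
        pairs.extend((h["word"], t["word"]) for h, t in product(heads, tails))
    return pairs


text = "Aspirin can cause gastric bleeding. Metformin is used for diabetes."


def span(label, phrase):
    """Build a hand-written entity dict for a phrase found in `text`."""
    start = text.index(phrase)
    return {"entity_group": label, "word": phrase, "start": start, "end": start + len(phrase)}


# Hypothetical NER output; labels are placeholders.
entities = [
    span("CHEM", "Aspirin"),
    span("DISEASE", "gastric bleeding"),
    span("CHEM", "Metformin"),
    span("DISEASE", "diabetes"),
]

pairs = candidate_pairs(text, entities, "CHEM", "DISEASE")
print(pairs)
```

Co-occurrence says nothing about the type of relation: here one pair is a side effect and the other a treatment, which is exactly what a downstream classifier has to decide.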
💡 HCC Coding: Streamlining Costs and Care
What it is: Hierarchical Condition Category (HCC) coding assigns codes to diagnoses in patient records, helping payers like Medicare predict costs and set reimbursement rates. NER extracts conditions from notes to feed this process.
Why it matters: Accurate coding ensures providers are paid fairly for treating complex cases. It also flags high-risk patients for proactive care. Manual coding is slow and error-prone; NER can speed it up and cut down on missed conditions by surfacing candidates for coders to confirm.
Impact: Healthcare systems save time, optimize budgets, and focus resources on those who need it most.
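To make the "NER feeds the coding process" idea concrete, here is a toy sketch that maps extracted condition mentions to categories through a lookup table. Everything in the table is a placeholder: `CATEGORY_A` and `CATEGORY_B` are not real HCC codes, and a real system would use the official, versioned mapping plus human review. The `suggest_categories` helper and the sample entities are also made up for illustration.

```python
# Hypothetical lookup table. The category labels are placeholders,
# NOT real HCC codes; a real mapping must come from the official tables.
CONDITION_TO_CATEGORY = {
    "type 2 diabetes": "CATEGORY_A",
    "congestive heart failure": "CATEGORY_B",
}


def normalize(mention):
    """Lowercase and collapse whitespace so lookups are less brittle."""
    return " ".join(mention.lower().split())


def suggest_categories(entities, min_score=0.8):
    """Group confident, mappable condition mentions by category."""
    suggestions = {}
    for ent in entities:
        if ent["score"] < min_score:
            continue
        key = normalize(ent["word"])
        category = CONDITION_TO_CATEGORY.get(key)
        if category is not None:
            suggestions.setdefault(category, []).append(key)
    return suggestions


# Hand-written stand-in for NER output on a clinical note.
entities = [
    {"entity_group": "DISEASE", "word": "Type 2  Diabetes", "score": 0.95},
    {"entity_group": "DISEASE", "word": "congestive heart failure", "score": 0.60},
    {"entity_group": "DISEASE", "word": "headache", "score": 0.99},
]

suggestions = suggest_categories(entities)
print(suggestions)
```

Low-confidence and unmapped mentions are dropped here. In a real workflow you would more likely route them to a coder than discard them.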
🌟 The Bigger Picture
NER isn’t just a tool, it’s a catalyst. By tackling these tasks, it:
- Strengthens data security and compliance.
- Accelerates research with clean, usable datasets.
- Enhances patient outcomes through sharper insights.
- Cuts costs by automating tedious processes.
In healthcare, where every detail counts, NER turns raw text into real solutions.
Jump In with OpenMed
Join the OpenMed community on Hugging Face to stay in the loop and share ideas.
Fair and Open
- License: Apache 2.0—use it, tweak it, share it.
- Clear Info: Every model comes with a detailed card.
Wrapping Up
OpenMed’s 380+ NER models bring top performance and zero cost together, opening up medical AI for everyone. Whether you’re researching, treating patients, or building tools, these models are here to help.
- 🥇 Better Results: Beats the best published closed-source scores on 9 of 11 compared benchmarks, by up to 36.4 F1 points (Gellus).
- 🆓 No Charge: Fully free and open.
- 🚀 Easy Start: Works with tools you already use.
- 🌍 Team Effort: Join a growing community.
Head to huggingface.co/OpenMed and start exploring. Let’s make healthcare smarter, together!