Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever.

Community Article Published July 16, 2025


Say Hello to OpenMed: 380+ Free Medical NER Models

Healthcare AI has been stuck behind costly paywalls and closed systems for far too long. Researchers, doctors, and developers have had to contend with steep fees and opaque “black-box” tools that slow progress. Now OpenMed is shaking things up with over 380 high-quality Named Entity Recognition (NER) models for medical and clinical text, all free and open under the Apache 2.0 license.

These models don’t just keep up with pricey commercial options; they beat them, offering strong performance and easy access to accelerate healthcare breakthroughs worldwide.

The Issue: Locked-Up Healthcare AI

Medical AI has some big roadblocks:

  • Expensive Licenses: Small teams and universities can’t afford them.
  • Hidden Details: Commercial tools don’t show how they work.
  • Falling Behind: Many paid models aren’t keeping up with new tech.
  • Limited Reach: Only big players get the good stuff.

This slows down research, clinical progress, and fair access to better healthcare.

The Fix: OpenMed NER Models

OpenMed brings 380+ free NER models to the table, ready to tackle all kinds of medical and clinical terms—like drug names, diseases, and more. Here’s what makes them stand out:

  • Totally Free: Open-source with the Apache 2.0 license.
  • Ready to Roll: Built for real-world use right away.
  • Flexible Sizes: From 109M to 568M parameters.
  • Tested Tough: Checked against 13+ standard datasets.
  • Plays Nice: Works smoothly with Hugging Face and PyTorch.

These models tear down the old barriers, making healthcare AI open and useful for everyone.

What’s in the OpenMed Toolbox?

OpenMed’s 380+ models are fine-tuned and tested on 13 key medical datasets, delivering top results—like F1 scores up to 0.998. They come in different sizes, so whether you need something lightweight or super powerful, there’s a fit for you.

🔬 Covering All Bases

These models shine in tons of areas:

  • Drugs & Chemicals: Spot compounds for drug research or safety tracking.
  • Diseases & Clinics: Pull out conditions for better diagnosis tools.
  • Genes & Molecules: Dig into genomics and precision medicine.
  • Anatomy & Terms: Boost medical records and coding.
  • Cancer Research: Power up oncology studies.

They’re perfect for everything from research papers to hospital workflows.

🎯 OpenMed (open-source) vs. the Big Shots (closed-source)

| Dataset | OpenMed best F1 (%) | Closed-source SOTA F1 (%)† | Δ (OpenMed − SOTA) | Current closed-source leader |
|---|---|---|---|---|
| BC4CHEMD | 95.40 | 94.39 | +1.01 | Spark NLP BertForTokenClassification |
| BC5CDR-Chem | 96.10 | 94.88 | +1.22 | Spark NLP BertForTokenClassification |
| BC5CDR-Disease | 91.20 | 88.50 | +2.70 | BioMegatron |
| NCBI-Disease | 91.10 | 89.71 | +1.39 | BioBERT |
| JNLPBA | 81.90 | 82.00 | −0.10 | KeBioLM (knowledge-enhanced LM) |
| Linnaeus | 96.50 | 92.70 | +3.80 | BERN2 toolkit |
| Species-800 | 86.40 | 82.59 | +3.81 | Spark NLP BertForTokenClassification |
| BC2GM | 90.10 | 88.75 | +1.35 | Spark NLP Bi-LSTM-CNN-Char |
| AnatEM | 90.60 | 91.65 | −1.05 | Spark NLP BertForTokenClassification |
| BioNLP 2013 CG | 89.90 | 87.83 | +2.07 | Spark NLP BertForTokenClassification |
| Gellus | 99.80 | 63.40 | +36.40 | ConNER |
| CLL | 95.70 | 85.98 | +9.72 | (no published SOTA) |
| FSU | 96.10 | — | — | (no published SOTA) |

† Closed-source scores are the highest peer-reviewed or leaderboard results found in the literature (typically commercial or closed systems such as Spark NLP, NEEDLE, and BERN2).


🔬 By Domain

This table maps datasets to their respective domains, highlighting recommended models based on top performance across each domain's datasets.

| Domain | Datasets Included | Models Available | Size Range (Params) | Recommended Model |
|---|---|---|---|---|
| Pharmacology | bc5cdr_chem, bc4chemd, fsu | 90 models | 109M - 568M | OpenMed-NER-PharmaDetect-SuperClinical-434M |
| Disease/Pathology | bc5cdr_disease, ncbi_disease | 60 models | 109M - 434M | OpenMed-NER-PathologyDetect-PubMed-v2-109M |
| Genomics | jnlpba, bc2gm, species800, linnaeus, gellus | 150 models | 335M - 568M | OpenMed-NER-GenomicDetect-SnowMed-568M |
| Anatomy | anatomy | 30 models | 560M | OpenMed-NER-AnatomyDetect-ElectraMed-560M |
| Oncology | bionlp2013_cg | 30 models | 355M | OpenMed-NER-OncologyDetect-SuperMedical-355M |
| Clinical Notes | cll | 30 models | 560M | OpenMed-NER-BloodCancerDetect-ElectraMed-560M |

⚡ Pick Your Size

| Size | Parameters | Best For |
|---|---|---|
| Compact | 109M | Quick setups |
| Large | 335M - 355M | Solid accuracy |
| XLarge | 434M | Great all-around |
| XXLarge | 560M - 568M | Max power |

Model size comparison showing trade-offs between performance and computational requirements

📊 Top Models by Dataset

Each dataset has a top-performing model; the per-dataset F1 scores and sizes are documented on the individual model cards. This overview underscores the breadth of dataset coverage, the diversity of model sizes, and the strong performance OpenMed delivers on biomedical and clinical NER tasks.

Try It Out: 3 Lines of Code

Integrating OpenMed NER models is effortless with Hugging Face Transformers:

```python
from transformers import pipeline

ner_pipeline = pipeline(
    "token-classification",
    model="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M",
    aggregation_strategy="simple",
)
text = "Patient prescribed 10mg aspirin for hypertension."
entities = ner_pipeline(text)
print(entities)
```

That’s it! The pipeline returns a list of entity dicts, and “aspirin” comes back tagged as a chemical, just like that.
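With `aggregation_strategy="simple"`, each prediction is a dict carrying `entity_group`, `word`, `score`, and character offsets (`start`/`end`). The exact labels and scores depend on the model you load; the sketch below uses made-up sample entities (not real model output) to show one common post-processing step, filtering out low-confidence predictions:

```python
def filter_entities(entities, min_score=0.9):
    """Keep only predictions the model is confident about."""
    return [e for e in entities if e["score"] >= min_score]

# Illustrative output shape -- labels and scores are invented for this example
sample = [
    {"entity_group": "CHEM", "score": 0.998, "word": "aspirin", "start": 24, "end": 31},
    {"entity_group": "DISEASE", "score": 0.42, "word": "hypertension", "start": 36, "end": 48},
]
confident = filter_entities(sample)
print(confident)  # only the "aspirin" entity survives the 0.9 cutoff
```

A threshold like 0.9 is a reasonable starting point, but the right cutoff depends on your model and on whether you care more about precision or recall.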

Scaling for Large Datasets

For processing large-scale datasets efficiently across CPUs or GPUs:

```python
import pandas as pd
from datasets import Dataset, load_dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

# Build the NER pipeline
medical_ner_pipeline = pipeline(
    "token-classification",
    model="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M",
    aggregation_strategy="simple",
)

# Load a public medical dataset (first 100 examples for testing)
medical_dataset = load_dataset("BI55/MedText", split="train[:100]")
data = pd.DataFrame({"text": medical_dataset["Completion"]})
dataset = Dataset.from_pandas(data)

# Process with batching tuned to your hardware
batch_size = 16  # Adjust based on your GPU memory
results = []

for out in medical_ner_pipeline(KeyDataset(dataset, "text"), batch_size=batch_size):
    results.append(out)  # one list of entities per input text

print(f"Processed {len(results)} texts with batching")
```
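Long clinical documents can exceed a transformer's input length (commonly 512 tokens), so batching alone may not be enough. One simple mitigation, sketched below under the assumption that individual sentences fit within the chunk budget, is to split each document on sentence boundaries before feeding it to the pipeline (`chunk_text` is a hypothetical helper, not part of the OpenMed release):

```python
import re

def chunk_text(text, max_chars=1000):
    """Split text into sentence-aligned chunks no longer than max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be passed to the NER pipeline independently; note that
# start/end offsets will be relative to the chunk, not the full document.
print(chunk_text("First sentence. Second sentence. Third.", max_chars=20))
```

A character budget is a crude proxy for tokens; for exact limits you would count tokens with the model's tokenizer instead.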

Real-World Use Cases: NER in Healthcare

Named Entity Recognition (NER) is a technology that extracts and categorizes key information—like names, dates, or medical terms—from unstructured text. In healthcare, where clinical notes, patient records, and research papers are often a jumble of free-form data, NER brings order to the chaos. Below, we explore how NER powers three vital tasks: De-Identification, Entity Relation Extraction, and HCC Coding, and why they’re essential in the medical world.

🔒 De-Identification: Safeguarding Patient Privacy

What it is: De-Identification strips protected health information (PHI)—think names, addresses, or Social Security numbers—from medical records. The goal? Make the data anonymous while keeping it useful. Why it matters: Patient privacy isn’t optional—it’s a legal and ethical must. Laws like HIPAA in the U.S. demand it. By using NER to automatically detect and mask PHI, healthcare providers and researchers can analyze data without risking breaches. It’s faster and more reliable than humans manually scrubbing records.

Impact: De-identified data fuels research and improves care, all while keeping patient identities safe.
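The masking step itself is straightforward once a NER model has produced character offsets. Here is a minimal sketch (`mask_entities` is a hypothetical helper; a production de-identification system would use a model trained specifically on PHI labels and would be validated against HIPAA requirements):

```python
def mask_entities(text, entities):
    """Replace each detected span with its label tag, e.g. [PER] or [CHEM].

    Spans are applied right-to-left so earlier character offsets stay valid.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text

# Illustrative spans -- in practice these come straight from the NER pipeline
spans = [{"entity_group": "CHEM", "start": 19, "end": 26, "word": "aspirin"}]
print(mask_entities("Patient prescribed aspirin daily.", spans))
```

Processing spans right-to-left is the key trick: replacing text changes its length, so editing from the end keeps the remaining offsets intact.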

🔗 Entity Relation Extraction: Mapping Medical Connections

What it is: This task identifies relationships between entities in text—like tying a drug to its side effects or a disease to its symptoms. NER first spots the entities; then, the relationships are pieced together. Why it matters: Seeing how things connect unlocks smarter healthcare. It builds knowledge graphs for clinical decision support, aids drug discovery, and tailors treatments to patients. Without this, critical links in medical data might stay buried.

Impact: Doctors make better calls, researchers find new insights, and patients get care that fits their unique needs.
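As a rough illustration of where NER output plugs in, a co-occurrence baseline simply pairs every drug mention with every disease mention in the same passage. Real relation extraction trains a classifier on top of the NER spans; the label names and `cooccurrence_relations` helper below are assumptions for this sketch:

```python
from itertools import product

def cooccurrence_relations(entities, drug_label="CHEM", disease_label="DISEASE"):
    """Naive baseline: link every drug to every disease in the same text."""
    drugs = [e["word"] for e in entities if e["entity_group"] == drug_label]
    diseases = [e["word"] for e in entities if e["entity_group"] == disease_label]
    return [(d, dis, "associated_with") for d, dis in product(drugs, diseases)]

entities = [
    {"entity_group": "CHEM", "word": "aspirin"},
    {"entity_group": "DISEASE", "word": "hypertension"},
]
print(cooccurrence_relations(entities))
```

Co-occurrence over-generates relations (every pair is linked), which is exactly why dedicated relation classifiers exist; the baseline is still useful for bootstrapping candidate pairs.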

💡 HCC Coding: Streamlining Costs and Care

What it is: Hierarchical Condition Category (HCC) coding assigns codes to diagnoses in patient records, helping payers like Medicare predict costs and set reimbursement rates. NER extracts conditions from notes to feed this process. Why it matters: Accurate coding ensures providers are paid fairly for treating complex cases. It also flags high-risk patients for proactive care. Manual coding is slow and error-prone—NER speeds it up and gets it right.

Impact: Healthcare systems save time, optimize budgets, and focus resources on those who need it most.
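Once NER has pulled condition mentions out of a note, the coding step reduces to normalizing each mention and looking it up in a category table. The sketch below uses a toy two-entry mapping; a real system would use the full, versioned CMS-HCC tables and a proper terminology normalizer (the specific category numbers here are illustrative):

```python
HCC_LOOKUP = {
    # Toy mapping for illustration -- real CMS-HCC tables are far larger
    "diabetes mellitus": "HCC 19",
    "chronic obstructive pulmonary disease": "HCC 111",
}

def assign_hcc(conditions):
    """Map extracted condition strings to HCC codes (or 'unmapped')."""
    return {c: HCC_LOOKUP.get(c.strip().lower(), "unmapped") for c in conditions}

print(assign_hcc(["Diabetes mellitus", "asthma"]))
```

The lowercase-and-strip normalization is deliberately minimal; clinical text needs synonym and abbreviation handling (e.g. mapping "DM" or "T2DM" to diabetes) before lookup.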

🌟 The Bigger Picture

NER isn’t just a tool; it’s a catalyst. By tackling these tasks, it:

  • Strengthens data security and compliance.
  • Accelerates research with clean, usable datasets.
  • Enhances patient outcomes through sharper insights.
  • Cuts costs by automating tedious processes.

In healthcare, where every detail counts, NER turns raw text into real solutions.

Jump In with OpenMed

Join the OpenMed community on Hugging Face to stay in the loop and share ideas.

Fair and Open

  • License: Apache 2.0—use it, tweak it, share it.
  • Clear Info: Every model comes with a detailed card.

Wrapping Up

OpenMed’s 380+ NER models bring top performance and zero cost together, opening up medical AI for everyone. Whether you’re researching, treating patients, or building tools, these models are here to help.

  • 🥇 Better Results: Outperforms big names by up to 36 F1 points.
  • 🆓 No Charge: Fully free and open.
  • 🚀 Easy Start: Works with tools you already use.
  • 🌍 Team Effort: Join a growing community.

Head to huggingface.co/OpenMed and start exploring. Let’s make healthcare smarter, together!
