Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever.

Community Article Published July 16, 2025


Say Hello to OpenMed: 380+ Free Medical NER Models

Healthcare AI has been stuck behind costly paywalls and closed systems for far too long. Researchers, doctors, and developers have had to contend with steep fees and opaque “black-box” tools that slow progress. Now OpenMed is shaking things up with over 380 high-quality Named Entity Recognition (NER) models for medical and clinical text, all free and open under the Apache 2.0 license.

These models don’t just keep up with pricey commercial options; they beat them, offering strong performance and easy access to accelerate healthcare breakthroughs worldwide.

The Issue: Locked-Up Healthcare AI

Medical AI has some big roadblocks:

  • Expensive Licenses: Small teams and universities can’t afford them.
  • Hidden Details: Commercial tools don’t show how they work.
  • Falling Behind: Many paid models aren’t keeping up with new tech.
  • Limited Reach: Only big players get the good stuff.

This slows down research, clinical progress, and fair access to better healthcare.

The Fix: OpenMed NER Models

OpenMed brings 380+ free NER models to the table, ready to tackle all kinds of medical and clinical terms—like drug names, diseases, and more. Here’s what makes them stand out:

  • Totally Free: Open-source with the Apache 2.0 license.
  • Ready to Roll: Built for real-world use right away.
  • Flexible Sizes: From 109M to 568M parameters.
  • Tested Tough: Checked against 13+ standard datasets.
  • Plays Nice: Works smoothly with Hugging Face and PyTorch.

These models tear down the old barriers, making healthcare AI open and useful for everyone.

What’s in the OpenMed Toolbox?

OpenMed’s 380+ models are fine-tuned and tested on 13 key medical datasets, delivering top results—like F1 scores up to 0.998. They come in different sizes, so whether you need something lightweight or super powerful, there’s a fit for you.

🔬 Covering All Bases

These models shine in tons of areas:

  • Drugs & Chemicals: Spot compounds for drug research or safety tracking.
  • Diseases & Clinics: Pull out conditions for better diagnosis tools.
  • Genes & Molecules: Dig into genomics and precision medicine.
  • Anatomy & Terms: Boost medical records and coding.
  • Cancer Research: Power up oncology studies.

They’re perfect for everything from research papers to hospital workflows.

🎯 OpenMed (open-source) vs. the Big Shots (closed-source)

| Dataset | OpenMed best F1 (%) | Closed-source SOTA F1 (%)† | Δ (OpenMed − SOTA) | Current closed-source leader |
|---|---|---|---|---|
| BC4CHEMD | 95.40 | 94.39 | +1.01 | Spark NLP BertForTokenClassification |
| BC5CDR-Chem | 96.10 | 94.88 | +1.22 | Spark NLP BertForTokenClassification |
| BC5CDR-Disease | 91.20 | 88.50 | +2.70 | BioMegatron |
| NCBI-Disease | 91.10 | 89.71 | +1.39 | BioBERT |
| JNLPBA | 81.90 | 82.00 | −0.10 | KeBioLM (knowledge-enhanced LM) |
| Linnaeus | 96.50 | 92.70 | +3.80 | BERN2 toolkit |
| Species-800 | 86.40 | 82.59 | +3.81 | Spark NLP BertForTokenClassification |
| BC2GM | 90.10 | 88.75 | +1.35 | Spark NLP Bi-LSTM-CNN-Char |
| AnatEM | 90.60 | 91.65 | −1.05 | Spark NLP BertForTokenClassification |
| BioNLP 2013 CG | 89.90 | 87.83 | +2.07 | Spark NLP BertForTokenClassification |
| Gellus | 99.80 | 63.40 | +36.40 | ConNER |
| CLL | 95.70 | 85.98 | +9.72 | (no published SOTA) |
| FSU | 96.10 | — | — | (no published SOTA) |

† Closed-source scores are the highest peer-reviewed or leaderboard results found in the literature (typically commercial or closed systems such as Spark NLP, NEEDLE, and BERN2).


🔬 By Domain

This table maps datasets to their respective domains, highlighting recommended models based on top performance across each domain's datasets.

| Domain | Datasets Included | Models Available | Size Range (Params) | Recommended Model |
|---|---|---|---|---|
| Pharmacology | bc5cdr_chem, bc4chemd, fsu | 90 models | 109M - 568M | OpenMed-NER-PharmaDetect-SuperClinical-434M |
| Disease/Pathology | bc5cdr_disease, ncbi_disease | 60 models | 109M - 434M | OpenMed-NER-PathologyDetect-PubMed-v2-109M |
| Genomics | jnlpba, bc2gm, species800, linnaeus, gellus | 150 models | 335M - 568M | OpenMed-NER-GenomicDetect-SnowMed-568M |
| Anatomy | anatomy | 30 models | 560M | OpenMed-NER-AnatomyDetect-ElectraMed-560M |
| Oncology | bionlp2013_cg | 30 models | 355M | OpenMed-NER-OncologyDetect-SuperMedical-355M |
| Clinical Notes | cll | 30 models | 560M | OpenMed-NER-BloodCancerDetect-ElectraMed-560M |

⚡ Pick Your Size

| Size | Parameters | Best For |
|---|---|---|
| Compact | 109M | Quick setups |
| Large | 335M - 355M | Solid accuracy |
| XLarge | 434M | Great all-around |
| XXLarge | 560M - 568M | Max power |

Model size comparison showing trade-offs between performance and computational requirements

📊 Top Models by Dataset

Each dataset has a top-performing model; the per-dataset F1 scores and sizes are documented on the individual model cards. This overview underscores the breadth of dataset coverage, the diversity of model sizes, and the strong performance OpenMed delivers on biomedical and clinical NER tasks.

Try It Out: 3 Lines of Code

Integrating OpenMed NER models is effortless with Hugging Face Transformers:

```python
from transformers import pipeline

ner_pipeline = pipeline(
    "token-classification",
    model="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M",
    aggregation_strategy="simple",
)
text = "Patient prescribed 10mg aspirin for hypertension."
entities = ner_pipeline(text)
print(entities)
```

That’s it! The pipeline returns a list of entity dicts, and “aspirin” comes back tagged as a chemical, just like that.
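With `aggregation_strategy="simple"`, each prediction is a dict carrying `entity_group`, `word`, `score`, and character offsets (`start`/`end`). The exact labels and scores depend on the model you load; the sketch below uses made-up sample entities (not real model output) to show one common post-processing step, filtering out low-confidence predictions:

```python
def filter_entities(entities, min_score=0.9):
    """Keep only predictions the model is confident about."""
    return [e for e in entities if e["score"] >= min_score]

# Illustrative output shape -- labels and scores are invented for this example
sample = [
    {"entity_group": "CHEM", "score": 0.998, "word": "aspirin", "start": 24, "end": 31},
    {"entity_group": "DISEASE", "score": 0.42, "word": "hypertension", "start": 36, "end": 48},
]
confident = filter_entities(sample)
print(confident)  # only the "aspirin" entity survives the 0.9 cutoff
```

A threshold like 0.9 is a reasonable starting point, but the right cutoff depends on your model and on whether you care more about precision or recall.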

Scaling for Large Datasets

For processing large-scale datasets efficiently across CPUs or GPUs:

```python
import pandas as pd
from datasets import Dataset, load_dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

# Build the NER pipeline
medical_ner_pipeline = pipeline(
    "token-classification",
    model="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M",
    aggregation_strategy="simple",
)

# Load a public medical dataset (first 100 examples for testing)
medical_dataset = load_dataset("BI55/MedText", split="train[:100]")
data = pd.DataFrame({"text": medical_dataset["Completion"]})
dataset = Dataset.from_pandas(data)

# Process with batching tuned to your hardware
batch_size = 16  # Adjust based on your GPU memory
results = []

for out in medical_ner_pipeline(KeyDataset(dataset, "text"), batch_size=batch_size):
    results.append(out)  # one list of entities per input text

print(f"Processed {len(results)} texts with batching")
```
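Long clinical documents can exceed a transformer's input length (commonly 512 tokens), so batching alone may not be enough. One simple mitigation, sketched below under the assumption that individual sentences fit within the chunk budget, is to split each document on sentence boundaries before feeding it to the pipeline (`chunk_text` is a hypothetical helper, not part of the OpenMed release):

```python
import re

def chunk_text(text, max_chars=1000):
    """Split text into sentence-aligned chunks no longer than max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be passed to the NER pipeline independently; note that
# start/end offsets will be relative to the chunk, not the full document.
print(chunk_text("First sentence. Second sentence. Third.", max_chars=20))
```

A character budget is a crude proxy for tokens; for exact limits you would count tokens with the model's tokenizer instead.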

Real-World Use Cases: NER in Healthcare

Named Entity Recognition (NER) is a technology that extracts and categorizes key information—like names, dates, or medical terms—from unstructured text. In healthcare, where clinical notes, patient records, and research papers are often a jumble of free-form data, NER brings order to the chaos. Below, we explore how NER powers three vital tasks: De-Identification, Entity Relation Extraction, and HCC Coding, and why they’re essential in the medical world.

🔒 De-Identification: Safeguarding Patient Privacy

What it is: De-Identification strips protected health information (PHI)—think names, addresses, or Social Security numbers—from medical records. The goal? Make the data anonymous while keeping it useful. Why it matters: Patient privacy isn’t optional—it’s a legal and ethical must. Laws like HIPAA in the U.S. demand it. By using NER to automatically detect and mask PHI, healthcare providers and researchers can analyze data without risking breaches. It’s faster and more reliable than humans manually scrubbing records.

Impact: De-identified data fuels research and improves care, all while keeping patient identities safe.
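The masking step itself is straightforward once a NER model has produced character offsets. Here is a minimal sketch (`mask_entities` is a hypothetical helper; a production de-identification system would use a model trained specifically on PHI labels and would be validated against HIPAA requirements):

```python
def mask_entities(text, entities):
    """Replace each detected span with its label tag, e.g. [PER] or [CHEM].

    Spans are applied right-to-left so earlier character offsets stay valid.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text

# Illustrative spans -- in practice these come straight from the NER pipeline
spans = [{"entity_group": "CHEM", "start": 19, "end": 26, "word": "aspirin"}]
print(mask_entities("Patient prescribed aspirin daily.", spans))
```

Processing spans right-to-left is the key trick: replacing text changes its length, so editing from the end keeps the remaining offsets intact.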

🔗 Entity Relation Extraction: Mapping Medical Connections

What it is: This task identifies relationships between entities in text—like tying a drug to its side effects or a disease to its symptoms. NER first spots the entities; then, the relationships are pieced together. Why it matters: Seeing how things connect unlocks smarter healthcare. It builds knowledge graphs for clinical decision support, aids drug discovery, and tailors treatments to patients. Without this, critical links in medical data might stay buried.

Impact: Doctors make better calls, researchers find new insights, and patients get care that fits their unique needs.
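As a rough illustration of where NER output plugs in, a co-occurrence baseline simply pairs every drug mention with every disease mention in the same passage. Real relation extraction trains a classifier on top of the NER spans; the label names and `cooccurrence_relations` helper below are assumptions for this sketch:

```python
from itertools import product

def cooccurrence_relations(entities, drug_label="CHEM", disease_label="DISEASE"):
    """Naive baseline: link every drug to every disease in the same text."""
    drugs = [e["word"] for e in entities if e["entity_group"] == drug_label]
    diseases = [e["word"] for e in entities if e["entity_group"] == disease_label]
    return [(d, dis, "associated_with") for d, dis in product(drugs, diseases)]

entities = [
    {"entity_group": "CHEM", "word": "aspirin"},
    {"entity_group": "DISEASE", "word": "hypertension"},
]
print(cooccurrence_relations(entities))
```

Co-occurrence over-generates relations (every pair is linked), which is exactly why dedicated relation classifiers exist; the baseline is still useful for bootstrapping candidate pairs.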

💡 HCC Coding: Streamlining Costs and Care

What it is: Hierarchical Condition Category (HCC) coding assigns codes to diagnoses in patient records, helping payers like Medicare predict costs and set reimbursement rates. NER extracts conditions from notes to feed this process. Why it matters: Accurate coding ensures providers are paid fairly for treating complex cases. It also flags high-risk patients for proactive care. Manual coding is slow and error-prone—NER speeds it up and gets it right.

Impact: Healthcare systems save time, optimize budgets, and focus resources on those who need it most.
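Once NER has pulled condition mentions out of a note, the coding step reduces to normalizing each mention and looking it up in a category table. The sketch below uses a toy two-entry mapping; a real system would use the full, versioned CMS-HCC tables and a proper terminology normalizer (the specific category numbers here are illustrative):

```python
HCC_LOOKUP = {
    # Toy mapping for illustration -- real CMS-HCC tables are far larger
    "diabetes mellitus": "HCC 19",
    "chronic obstructive pulmonary disease": "HCC 111",
}

def assign_hcc(conditions):
    """Map extracted condition strings to HCC codes (or 'unmapped')."""
    return {c: HCC_LOOKUP.get(c.strip().lower(), "unmapped") for c in conditions}

print(assign_hcc(["Diabetes mellitus", "asthma"]))
```

The lowercase-and-strip normalization is deliberately minimal; clinical text needs synonym and abbreviation handling (e.g. mapping "DM" or "T2DM" to diabetes) before lookup.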

🌟 The Bigger Picture

NER isn’t just a tool; it’s a catalyst. By tackling these tasks, it:

  • Strengthens data security and compliance.
  • Accelerates research with clean, usable datasets.
  • Enhances patient outcomes through sharper insights.
  • Cuts costs by automating tedious processes.

In healthcare, where every detail counts, NER turns raw text into real solutions.

Jump In with OpenMed

Join the OpenMed community on Hugging Face to stay in the loop and share ideas.

Fair and Open

  • License: Apache 2.0—use it, tweak it, share it.
  • Clear Info: Every model comes with a detailed card.

Wrapping Up

OpenMed’s 380+ NER models bring top performance and zero cost together, opening up medical AI for everyone. Whether you’re researching, treating patients, or building tools, these models are here to help.

  • 🥇 Better Results: Outperforms big names by up to 36 F1 points.
  • 🆓 No Charge: Fully free and open.
  • 🚀 Easy Start: Works with tools you already use.
  • 🌍 Team Effort: Join a growing community.

Head to huggingface.co/OpenMed and start exploring. Let’s make healthcare smarter, together!
