Model metrics
Hello! Excellent job on building a medical literature embeddings model. It looks like this model solves a particular challenge you encountered, which is the great thing about open source: being able to fine-tune for specific needs.
For your information, here is a comparison between bioclinical-modernbert-base-embeddings and this model on the same evaluation sets. It's also worth noting the max token length difference of 8192 vs 2048.
| Model | PubMed QA | PubMed Subset | PubMed Summary | Average |
|---|---|---|---|---|
| bioclinical-modernbert-base-embeddings | 92.49 | 97.10 | 97.04 | 95.54 |
| ModernPubMedBERT | 92.42 | 96.53 | 96.08 | 95.01 |
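For anyone curious, here's a minimal sketch of how a side-by-side run might look with sentence-transformers. The Hub paths and example sentences below are placeholders, not the actual evaluation setup:

```python
# Minimal sketch of a side-by-side comparison using sentence-transformers.
# NOTE: the Hub repository paths are assumptions; substitute the real model IDs.
from sentence_transformers import SentenceTransformer

texts = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Insulin resistance is a hallmark feature of type 2 diabetes.",
]

for model_id in (
    "NeuML/bioclinical-modernbert-base-embeddings",  # assumed Hub path
    "ModernPubMedBERT",                              # assumed Hub path
):
    model = SentenceTransformer(model_id)
    embeddings = model.encode(texts)
    # Pairwise cosine similarity between the encoded sentences
    scores = model.similarity(embeddings, embeddings)
    print(f"{model_id}: max_seq_length={model.max_seq_length}, "
          f"sim={float(scores[0][1]):.4f}")
```

Printing `max_seq_length` also surfaces the 8192 vs 2048 context-length difference noted above.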
Once again, excellent work and good luck!
Hi David, thank you so much for the kind words and for taking the time to run these comparisons. I really appreciate it! Your original pubmedbert-base-embeddings was a major inspiration for this project, so I'm thrilled to get your feedback.
From my observations, ModernPubMedBERT is particularly strong at distinguishing true positives from false positives. It's encouraging to see these results, especially considering it was trained on a very small dataset, with fewer total training steps than the warmup steps alone of the bioclinical-modernbert-base-embeddings model.
Thanks again for your engagement and encouragement!