Model metrics
Hello! Excellent job on building a medical literature embeddings model. It looks like this model solves a particular challenge you encountered, which is the great thing about open source: being able to fine-tune for specific needs.
For your information, here is a comparison between bioclinical-modernbert-base-embeddings and this model on the same evaluation sets. It's also worth noting the max token length difference of 8192 vs 2048.
| Model | PubMed QA | PubMed Subset | PubMed Summary | Average |
|---|---|---|---|---|
| bioclinical-modernbert-base-embeddings | 92.49 | 97.10 | 97.04 | 95.54 |
| ModernPubMedBERT | 92.42 | 96.53 | 96.08 | 95.01 |
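For anyone curious, here's a minimal sketch of how a side-by-side run might look with sentence-transformers. The Hub paths and example sentences below are placeholders, not the actual evaluation setup:

```python
# Minimal sketch of a side-by-side comparison using sentence-transformers.
# NOTE: the Hub repository paths are assumptions; substitute the real model IDs.
from sentence_transformers import SentenceTransformer

texts = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Insulin resistance is a hallmark feature of type 2 diabetes.",
]

for model_id in (
    "NeuML/bioclinical-modernbert-base-embeddings",  # assumed Hub path
    "ModernPubMedBERT",                              # assumed Hub path
):
    model = SentenceTransformer(model_id)
    embeddings = model.encode(texts)
    # Pairwise cosine similarity between the encoded sentences
    scores = model.similarity(embeddings, embeddings)
    print(f"{model_id}: max_seq_length={model.max_seq_length}, "
          f"sim={float(scores[0][1]):.4f}")
```

Printing `max_seq_length` also surfaces the 8192 vs 2048 context-length difference noted above.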
Once again, excellent work and good luck!
Hi David, thank you so much for the kind words and for taking the time to run these comparisons. I really appreciate it! Your original pubmedbert-base-embeddings was a major inspiration for this project, so I'm thrilled to get your feedback.
From my observations, ModernPubMedBERT is particularly strong at distinguishing true positives from false positives. It's encouraging to see these results, especially considering it was trained on a very small dataset, with fewer total training steps than the warmup steps alone of the bioclinical-modernbert-base-embeddings model.
Thanks again for your engagement and encouragement!