⚕️ The PubMed Open-Access (OA) subset shares metadata for 35 million articles. However, the existing article parser is a Hugging Face dataset that was supported only up until 2024.
ncbi/pubmed
Moreover, the PubMed data is distributed as compressed XML, which is beneficial for efficiency but limits which processing techniques can be applied.
📢 To bridge this gap, I'm excited to share the pubmed_articles_iter project, which provides:
☑️ 1. Downloader for the raw files
☑️ 2. No-string iterator over PubMed articles, used to convert them into JSON.
👨‍💻 Code: https://github.com/nicolay-r/pubmed_articles_iter
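
💡 For a rough idea of what iterating over compressed PubMed XML can look like, here is a minimal sketch using only the Python standard library. It is not the pubmed_articles_iter API: the function names `iter_articles` and `write_jsonl` are hypothetical, and the sketch assumes the files are gzip-compressed XML with `PubmedArticle`, `PMID`, `ArticleTitle` and `AbstractText` elements.

```python
import gzip
import json
import xml.etree.ElementTree as ET


def iter_articles(path):
    """Yield one dict per <PubmedArticle> element in a gzipped XML file.

    Hypothetical helper for illustration, not part of pubmed_articles_iter.
    """
    with gzip.open(path, "rb") as f:
        # iterparse streams the file, so the whole archive is never
        # decompressed or parsed in one go.
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag != "PubmedArticle":
                continue
            title = elem.find(".//ArticleTitle")
            yield {
                "pmid": elem.findtext(".//PMID"),
                # itertext() keeps text nested in inline tags such as <i>.
                "title": "".join(title.itertext()) if title is not None else None,
                "abstract": " ".join(
                    "".join(node.itertext()) for node in elem.iter("AbstractText")
                ),
            }
            # Drop the children of the finished article to limit memory use.
            elem.clear()


def write_jsonl(src_path, dst_path):
    """Convert a gzipped XML file into JSON Lines (one article per line)."""
    with open(dst_path, "w", encoding="utf-8") as out:
        for article in iter_articles(src_path):
            out.write(json.dumps(article, ensure_ascii=False) + "\n")
```

Calling `write_jsonl("sample.xml.gz", "sample.jsonl")` (both file names are placeholders) would write one JSON object per article. Streaming with `iterparse` is a design choice for large files; for small files, parsing the whole tree at once would also work.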