---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
---

# **LEADS-Mistral-7B-v1: A Fine-Tuned LLM for Systematic Review Literature Mining**

## **Model Overview**

**LEADS-Mistral-7B-v1** is a fine-tuned version of the **Mistral-7B-Instruct** model, designed specifically for systematic review literature mining. It is optimized to help researchers and clinicians automate key stages of the systematic review workflow, including **literature search, citation screening, and data extraction**.

The model was trained on **LEADSInstruct**, the largest publicly available instruction dataset for systematic reviews, comprising **633,759 instruction data points**. This fine-tuning enhances its ability to process biomedical literature, extract key study characteristics, and support AI-driven evidence synthesis.

## **Key Features**

✅ **Fine-tuned for Literature Mining**: Specialized for systematic review tasks, leveraging PubMed and ClinicalTrials.gov data.

✅ **Multi-Step Systematic Review Support**: Handles search query generation, study eligibility assessment, and detailed data extraction.

✅ **Instruction-Tuned for Task-Specific Adaptation**: Trained on structured input-output instructions for high task precision.

✅ **Optimized for Biomedical Text**: Evaluated against proprietary and open-source models for superior domain-specific performance.

---

## **Model Details**

- **Base Model**: [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- **Fine-Tuning Dataset**: [LEADSInstruct](https://huggingface.co/datasets/zifeng-ai/LEADSInstruct)
- **Model Size**: 7.25B parameters
- **Tensor Type**: BF16
- **License**: MIT

### **Training & Benchmarking**

The model was trained on systematic review and clinical trial data covering six major subtasks (an illustrative prompt sketch follows the list):
1. **Search Query Generation** – Maximizing study retrieval effectiveness.
2. **Study Eligibility Prediction** – Automated citation screening for relevance.
3. **Study Characteristics Extraction** – Extracting structured metadata from studies.
4. **Trial Result Extraction** – Capturing key outcomes from clinical trials.
5. **Participant Statistics Extraction** – Extracting sample sizes and demographic information.
6. **Arm Design Extraction** – Identifying intervention and control groups.
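For illustration, here is a minimal sketch of how an instruction for the **Study Eligibility Prediction** subtask might be laid out. The field layout and wording are assumptions made for this example; they are not the exact LEADSInstruct template.

```python
# Hypothetical prompt layout for study eligibility prediction.
# The exact LEADSInstruct template may differ; this only illustrates
# the instruction -> input -> expected-answer structure.
criteria = (
    "Include randomized controlled trials of SGLT2 inhibitors "
    "in adults with type 2 diabetes."
)
citation = (
    "Title: A randomized trial of an SGLT2 inhibitor in adults with type 2 diabetes.\n"
    "Abstract: <abstract text here>"
)

prompt = (
    "You are screening citations for a systematic review.\n\n"
    f"Eligibility criteria:\n{criteria}\n\n"
    f"Citation:\n{citation}\n\n"
    "Decide whether this study should be included. "
    "Answer 'included' or 'excluded' with a one-sentence rationale."
)
print(prompt)
```
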
### **Evaluation & Performance**

LEADS-Mistral-7B-v1 was benchmarked against a range of proprietary and open-source LLMs, including:

- **GPT-4o**, **GPT-3.5**, **Haiku-3** (proprietary models)
- **Mistral-7B**, **Llama-3** (general-purpose open-source LLMs)
- **BioMistral**, **MedAlpaca** (medical-domain LLMs)

Across these comparisons, the model demonstrated **state-of-the-art performance** on literature mining tasks, significantly improving **data extraction accuracy** and **study screening recall** while reducing manual effort and time cost.

---
## **Usage**

### **Loading the Model**

To load the model with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zifeng-ai/leads-mistral-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
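
Once loaded, the model can be prompted like any Mistral-Instruct chat model. A minimal generation sketch follows; it reuses `model` and `tokenizer` from the snippet above, and the prompt text is an arbitrary illustration rather than an official LEADS template.

```python
import torch

# Reuses `model` and `tokenizer` from the loading snippet above.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

messages = [{
    "role": "user",
    "content": "List the study characteristics to extract "
               "from a randomized controlled trial report.",
}]

# Mistral-Instruct tokenizers ship with a built-in chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```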

### **Examples**

```python
from leads.api import search_query_generation

# Define the review question as a PICO specification.
pico = {
    "population": "Adults with type 2 diabetes",
    "intervention": "SGLT2 inhibitors",
    "comparison": "GLP-1 receptor agonists",
    "outcome": "Cardiovascular outcomes and glycemic control"
}

# Generate a literature search query from the PICO elements.
search_query = search_query_generation(**pico)
print(search_query)
```
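
Note that the `leads.api` module used above comes from the LEADS codebase (see the GitHub link under Resources) and is installed separately from the model weights.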

---

## **Resources**

- 📄 **Paper**: [LEADS: AI-Driven Literature Mining for Systematic Reviews](https://arxiv.org/abs/2501.16255)
- 📊 **Dataset**: [LEADSInstruct on Hugging Face](https://huggingface.co/datasets/zifeng-ai/LEADSInstruct)
- 💻 **GitHub**: [LEADS Repository](https://github.com/RyanWangZf/LEADS)

---

## **License & Citation**

This model is released under the **MIT License**, permitting unrestricted research and application use.

If you use **LEADS-Mistral-7B-v1**, please cite:

```
@article{wang2025foundation,
  title={A foundation model for human-AI collaboration in medical literature mining},
  author={Wang, Zifeng and Cao, Lang and Jin, Qiao and Chan, Joey and Wan, Nicholas and Afzali, Behdad and Cho, Hyun-Jin and Choi, Chang-In and Emamverdi, Mehdi and Gill, Manjot K and others},
  journal={arXiv preprint arXiv:2501.16255},
  year={2025}
}
```