---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
---

# **LEADS-Mistral-7B-v1: A Fine-Tuned LLM for Systematic Review Literature Mining**

## **Model Overview**

**LEADS-Mistral-7B-v1** is a fine-tuned version of the **Mistral-7B-Instruct** model, designed specifically for systematic review literature mining. It is optimized to help researchers and clinicians automate key stages of the systematic review workflow, including **literature search, citation screening, and data extraction**.

The model was trained on **LEADSInstruct**, the largest publicly available instruction dataset for systematic reviews, comprising **633,759 instruction data points**. This fine-tuning enhances its ability to process biomedical literature, extract key study characteristics, and support AI-driven evidence synthesis.

## **Key Features**

✅ **Fine-tuned for Literature Mining**: Specialized for systematic review tasks, leveraging PubMed and ClinicalTrials.gov data.

✅ **Multi-Step Systematic Review Support**: Handles search query generation, study eligibility assessment, and detailed data extraction.

✅ **Instruction-Tuned for Task-Specific Adaptation**: Trained on structured input-output instructions for high task precision.

✅ **Optimized for Biomedical Text**: Evaluated against proprietary and open-source models for superior domain-specific performance.

---

## **Model Details**

- **Base Model**: [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- **Fine-Tuning Dataset**: [LEADSInstruct](https://huggingface.co/datasets/zifeng-ai/LEADSInstruct)
- **Model Size**: 7.25B parameters
- **Tensor Type**: BF16
- **License**: MIT

### **Training & Benchmarking**

The model was trained on systematic review and clinical trial data covering six major subtasks (an illustrative prompt sketch follows the list):
1. **Search Query Generation** – Maximizing study retrieval effectiveness.
2. **Study Eligibility Prediction** – Automated citation screening for relevance.
3. **Study Characteristics Extraction** – Extracting structured metadata from studies.
4. **Trial Result Extraction** – Capturing key outcomes from clinical trials.
5. **Participant Statistics Extraction** – Extracting sample sizes and demographic information.
6. **Arm Design Extraction** – Identifying intervention and control groups.
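For illustration, here is a minimal sketch of how an instruction for the **Study Eligibility Prediction** subtask might be laid out. The field layout and wording are assumptions made for this example; they are not the exact LEADSInstruct template.

```python
# Hypothetical prompt layout for study eligibility prediction.
# The exact LEADSInstruct template may differ; this only illustrates
# the instruction -> input -> expected-answer structure.
criteria = (
    "Include randomized controlled trials of SGLT2 inhibitors "
    "in adults with type 2 diabetes."
)
citation = (
    "Title: A randomized trial of an SGLT2 inhibitor in adults with type 2 diabetes.\n"
    "Abstract: <abstract text here>"
)

prompt = (
    "You are screening citations for a systematic review.\n\n"
    f"Eligibility criteria:\n{criteria}\n\n"
    f"Citation:\n{citation}\n\n"
    "Decide whether this study should be included. "
    "Answer 'included' or 'excluded' with a one-sentence rationale."
)
print(prompt)
```
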
### **Evaluation & Performance**

LEADS-Mistral-7B-v1 was benchmarked against a range of proprietary and open-source LLMs, including:

- **GPT-4o**, **GPT-3.5**, **Haiku-3** (proprietary models)
- **Mistral-7B**, **Llama-3** (general-purpose open-source LLMs)
- **BioMistral**, **MedAlpaca** (medical-domain LLMs)

Across these comparisons, the model demonstrated **state-of-the-art performance** on literature mining tasks, significantly improving **data extraction accuracy** and **study screening recall** while reducing manual effort and time cost.

---
## **Usage**

### **Loading the Model**

To load the model with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zifeng-ai/leads-mistral-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
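
Once loaded, the model can be prompted like any Mistral-Instruct chat model. A minimal generation sketch follows; it reuses `model` and `tokenizer` from the snippet above, and the prompt text is an arbitrary illustration rather than an official LEADS template.

```python
import torch

# Reuses `model` and `tokenizer` from the loading snippet above.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

messages = [{
    "role": "user",
    "content": "List the study characteristics to extract "
               "from a randomized controlled trial report.",
}]

# Mistral-Instruct tokenizers ship with a built-in chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```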

### **Examples**

```python
from leads.api import search_query_generation

# Define the review question as a PICO specification.
pico = {
    "population": "Adults with type 2 diabetes",
    "intervention": "SGLT2 inhibitors",
    "comparison": "GLP-1 receptor agonists",
    "outcome": "Cardiovascular outcomes and glycemic control"
}

# Generate a literature search query from the PICO elements.
search_query = search_query_generation(**pico)
print(search_query)
```
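
Note that the `leads.api` module used above comes from the LEADS codebase (see the GitHub link under Resources) and is installed separately from the model weights.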

---

## **Resources**

- 📄 **Paper**: [LEADS: AI-Driven Literature Mining for Systematic Reviews](https://arxiv.org/abs/2501.16255)
- 📊 **Dataset**: [LEADSInstruct on Hugging Face](https://huggingface.co/datasets/zifeng-ai/LEADSInstruct)
- 💻 **GitHub**: [LEADS Repository](https://github.com/RyanWangZf/LEADS)

---

## **License & Citation**

This model is released under the **MIT License**, permitting unrestricted research and application use.

If you use **LEADS-Mistral-7B-v1**, please cite:

```
@article{wang2025foundation,
  title={A foundation model for human-AI collaboration in medical literature mining},
  author={Wang, Zifeng and Cao, Lang and Jin, Qiao and Chan, Joey and Wan, Nicholas and Afzali, Behdad and Cho, Hyun-Jin and Choi, Chang-In and Emamverdi, Mehdi and Gill, Manjot K and others},
  journal={arXiv preprint arXiv:2501.16255},
  year={2025}
}
```