---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
---

# **LEADS-Mistral-7B-v1: A Fine-Tuned LLM for Systematic Review Literature Mining**

## **Model Overview**
**LEADS-Mistral-7B-v1** is a fine-tuned version of the **Mistral-7B-Instruct** model, designed specifically for systematic review literature mining. It is optimized to assist researchers and clinicians in automating key stages of the systematic review process, including **literature search, citation screening, and data extraction**.

The model was trained on **LEADSInstruct**, the largest publicly available instruction dataset for systematic reviews, comprising **633,759 instruction data points**. This fine-tuning enhances its ability to process biomedical literature, extract key study characteristics, and support AI-driven evidence synthesis.

## **Key Features**
✅ **Fine-tuned for Literature Mining**: Specialized for systematic review tasks, leveraging PubMed and ClinicalTrials.gov data.
✅ **Multi-Step Systematic Review Support**: Handles search query generation, study eligibility assessment, and detailed data extraction.
✅ **Instruction-Tuned for Task-Specific Adaptation**: Trained on structured input-output instructions for high precision.
✅ **Optimized for Biomedical Text**: Evaluated against proprietary and open-source models for domain-specific performance.

---

## **Model Details**
- **Base Model**: [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- **Fine-Tuning Dataset**: [LEADSInstruct](https://huggingface.co/datasets/zifeng-ai/LEADSInstruct)
- **Model Size**: 7.25B parameters
- **Tensor Type**: BF16
- **License**: MIT

### **Training & Benchmarking**
The model was trained on systematic review and clinical trial data covering six major subtasks:

1. **Search Query Generation** – Generating queries that maximize study retrieval.
2. **Study Eligibility Prediction** – Automated citation screening for relevance.
3. **Study Characteristics Extraction** – Extracting structured metadata from studies.
4. **Trial Result Extraction** – Capturing key outcomes from clinical trials.
5. **Participant Statistics Extraction** – Extracting sample sizes and demographic information.
6. **Arm Design Extraction** – Identifying intervention and control groups.
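
As an illustration, one of these subtasks can be framed as an instruction pair. The field names and wording below are hypothetical, for intuition only; they may differ from the actual LEADSInstruct schema:

```python
# Hypothetical sketch of a participant-statistics-extraction instruction pair;
# the field names and wording are illustrative, not the official LEADSInstruct schema.
example = {
    "instruction": "Extract the total number of enrolled participants from the trial record.",
    "input": "A randomized trial enrolled 248 adults with type 2 diabetes across 12 sites.",
    "output": "248",
}

# The model consumes the instruction and input as a single prompt.
prompt = f"{example['instruction']}\n\n{example['input']}"
```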

### **Evaluation & Performance**
LEADS-Mistral-7B-v1 was benchmarked against a variety of proprietary and open-source LLMs, including:
- **GPT-4o**, **GPT-3.5**, **Haiku-3** (proprietary models)
- **Mistral-7B**, **Llama-3** (general-purpose open-source LLMs)
- **BioMistral**, **MedAlpaca** (medical-specific LLMs)

The model demonstrated **state-of-the-art performance** in literature mining tasks, significantly improving **data extraction accuracy** and **study screening recall** while reducing manual effort and time cost.

---

## **Usage**
### **Loading the Model**
To load the model with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zifeng-ai/leads-mistral-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
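
Once loaded, the model can be queried through the standard Mistral-Instruct chat template. A minimal sketch, reusing `model` and `tokenizer` from above; the screening prompt here is hypothetical, not the official LEADS instruction format:

```python
# Minimal inference sketch; `model` and `tokenizer` come from the loading snippet.
# The screening prompt below is hypothetical, not the official LEADS format.

def generate_response(model, tokenizer, messages, max_new_tokens=512):
    """Run one chat turn and return only the newly generated text."""
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the model's answer is decoded.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

messages = [{
    "role": "user",
    "content": (
        "Decide whether the study below meets the eligibility criteria. "
        "Answer 'include' or 'exclude'.\n\n"
        "Criteria: adults with type 2 diabetes on SGLT2 inhibitors.\n"
        "Abstract: ..."
    ),
}]
# print(generate_response(model, tokenizer, messages))
```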

### **Examples**

The LEADS repository also exposes task-level helpers, for example generating a search query from a PICO specification:

```python
from leads.api import search_query_generation

pico = {
    "population": "Adults with type 2 diabetes",
    "intervention": "SGLT2 inhibitors",
    "comparison": "GLP-1 receptor agonists",
    "outcome": "Cardiovascular outcomes and glycemic control",
}

search_query = search_query_generation(**pico)
print(search_query)
```
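
For intuition, the kind of boolean query this step targets can be sketched directly from the PICO fields. The helper below is illustrative only and is not part of the LEADS API:

```python
# Illustrative only: a naive PICO-to-boolean-query conversion, not the LEADS API.
def pico_to_boolean_query(pico):
    """AND together each non-empty PICO field, parenthesized."""
    return " AND ".join(f"({value})" for value in pico.values() if value)

pico = {
    "population": "Adults with type 2 diabetes",
    "intervention": "SGLT2 inhibitors",
    "comparison": "GLP-1 receptor agonists",
    "outcome": "Cardiovascular outcomes and glycemic control",
}
query = pico_to_boolean_query(pico)
# "(Adults with type 2 diabetes) AND (SGLT2 inhibitors) AND
#  (GLP-1 receptor agonists) AND (Cardiovascular outcomes and glycemic control)"
```

The fine-tuned model produces far richer queries (MeSH terms, synonyms, field tags); this sketch only shows the input-to-output shape of the task.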

---

## **Resources**
- **📄 Paper**: [LEADS: AI-Driven Literature Mining for Systematic Reviews](https://arxiv.org/abs/2501.16255)
- **📂 Dataset**: [LEADSInstruct on Hugging Face](https://huggingface.co/datasets/zifeng-ai/LEADSInstruct)
- **🔗 GitHub**: [LEADS Repository](https://github.com/RyanWangZf/LEADS)

---

## **License & Citation**
This model is released under the **MIT License**, permitting unrestricted research and application use.

If you use **LEADS-Mistral-7B-v1**, please cite:

```bibtex
@article{wang2025foundation,
  title={A foundation model for human-AI collaboration in medical literature mining},
  author={Wang, Zifeng and Cao, Lang and Jin, Qiao and Chan, Joey and Wan, Nicholas and Afzali, Behdad and Cho, Hyun-Jin and Choi, Chang-In and Emamverdi, Mehdi and Gill, Manjot K and others},
  journal={arXiv preprint arXiv:2501.16255},
  year={2025}
}
```