LEADS-Mistral-7B-v1: A Fine-Tuned LLM for Systematic Review Literature Mining
Model Overview
LEADS-Mistral-7B-v1 is a fine-tuned version of the Mistral-7B-Instruct model, specifically designed for systematic review literature mining tasks. This model has been optimized to assist researchers and clinicians in automating key stages of systematic review methodology, including literature search, citation screening, and data extraction.
The model was trained on LEADSInstruct, the largest publicly available instruction dataset for systematic reviews, comprising 633,759 instruction data points. This fine-tuning process enhances its ability to process biomedical literature, extract key study characteristics, and support AI-driven evidence synthesis.
Key Features
- Fine-tuned for Literature Mining: Specialized for systematic review tasks, leveraging PubMed and ClinicalTrials.gov data.
- Multi-Step Systematic Review Support: Handles search query generation, study eligibility assessment, and detailed data extraction.
- Instruction-Tuned for Task-Specific Adaptation: Trained on structured input-output instructions for high precision (a hypothetical record sketch follows this list).
- Optimized for Biomedical Text: Evaluated against proprietary and open-source models for superior domain-specific performance.
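The exact LEADSInstruct schema is documented in the dataset card linked under Resources; purely as a rough illustration, a single instruction record could pair a task instruction and study text with a structured target. The field names and wording below are assumptions, not the actual schema:

```python
# Hypothetical LEADSInstruct-style record; field names and wording are illustrative,
# not the actual dataset schema.
example_record = {
    "instruction": "Decide whether the study below meets the review's eligibility criteria.",
    "input": (
        "Criteria: randomized controlled trials of SGLT2 inhibitors in adults with type 2 diabetes.\n"
        "Citation: Empagliflozin, Cardiovascular Outcomes, and Mortality in Type 2 Diabetes (abstract text)."
    ),
    "output": "included",
}
```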
Model Details
- Base Model: Mistral-7B-Instruct-v0.3
- Fine-Tuning Dataset: LEADSInstruct
- Model Size: 7.25B parameters
- Tensor Type: BF16 (a bfloat16 loading sketch follows this list)
- License: MIT
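Because the published weights are stored in BF16, loading in bfloat16 avoids an unnecessary fp32 upcast. A minimal sketch, assuming a recent transformers version and hardware with bfloat16 support (the repository id is taken from the Usage section below):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zifeng-ai/leads-mistral-7b-v1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load in bfloat16 to match the stored tensor type and halve memory versus fp32.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
```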
Training & Benchmarking
The model was trained on systematic reviews and clinical trial data, covering six major subtasks (illustrative prompt templates follow the list):
- Search Query Generation – Maximizing study retrieval effectiveness.
- Study Eligibility Prediction – Automated citation screening for relevance.
- Study Characteristics Extraction – Extracting structured metadata from studies.
- Trial Result Extraction – Capturing key outcomes from clinical trials.
- Participant Statistics Extraction – Extracting sample size and demographic information.
- Arm Design Extraction – Identifying intervention and control groups.
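The exact instructions for each subtask come from LEADSInstruct; the templates below are only a hedged sketch of how the six subtasks can be phrased as prompts, and the wording and placeholder names are assumptions:

```python
# Illustrative prompt templates for the six subtasks; the wording is an assumption,
# not the exact instructions used in LEADSInstruct.
SUBTASK_PROMPTS = {
    "search_query_generation": "Generate a boolean literature search query for these PICO elements:\n{pico}",
    "study_eligibility_prediction": "Given the criteria and citation below, answer 'included' or 'excluded'.\n{criteria}\n{citation}",
    "study_characteristics_extraction": "Extract the study design, setting, and follow-up duration from this abstract:\n{abstract}",
    "trial_result_extraction": "Extract the primary outcome results reported in this abstract:\n{abstract}",
    "participant_statistics_extraction": "Extract the sample size and participant demographics from this abstract:\n{abstract}",
    "arm_design_extraction": "List the intervention and comparator arms described in this abstract:\n{abstract}",
}
```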
Evaluation & Performance
LEADS-Mistral-7B-v1 was benchmarked against a variety of proprietary and open-source LLMs, including:
- GPT-4o, GPT-3.5, Haiku-3 (Proprietary models)
- Mistral-7B, Llama-3 (General-purpose open-source LLMs)
- BioMistral, MedAlpaca (Medical-specific LLMs)
The model demonstrated state-of-the-art performance on literature mining tasks, significantly improving data extraction accuracy and study screening recall while reducing the manual effort and time required.
Usage
Loading the Model
To use the model with Hugging Face's `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zifeng-ai/leads-mistral-7b-v1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
Examples
```python
from leads.api import search_query_generation

pico = {
    "population": "Adults with type 2 diabetes",
    "intervention": "SGLT2 inhibitors",
    "comparison": "GLP-1 receptor agonists",
    "outcome": "Cardiovascular outcomes and glycemic control",
}

search_query = search_query_generation(**pico)
print(search_query)
```
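Prompts can also be sent directly through the model and tokenizer loaded above with `transformers`. A minimal generation sketch; the prompt wording and decoding settings are illustrative assumptions, not the exact format used during fine-tuning:

```python
import torch

prompt = (
    "Extract the sample size and participant demographics from the abstract below.\n\n"
    "Abstract: We randomized 412 adults with type 2 diabetes (mean age 58 years, 45% female) ..."
)

# Mistral-Instruct checkpoints expect chat-formatted input, so apply the tokenizer's chat template.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```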
Resources
- Paper: LEADS: AI-Driven Literature Mining for Systematic Reviews
- Dataset: LEADSInstruct on Hugging Face
- GitHub: LEADS Repository
License & Citation
This model is released under the MIT License, permitting both research and commercial use.
If you use LEADS-Mistral-7B-v1, please cite:
```bibtex
@article{wang2025foundation,
  title={A foundation model for human-AI collaboration in medical literature mining},
  author={Wang, Zifeng and Cao, Lang and Jin, Qiao and Chan, Joey and Wan, Nicholas and Afzali, Behdad and Cho, Hyun-Jin and Choi, Chang-In and Emamverdi, Mehdi and Gill, Manjot K and others},
  journal={arXiv preprint arXiv:2501.16255},
  year={2025}
}
```