|
--- |
|
license: llama3.1 |
|
datasets: |
|
- biodatlab/ec-raft-dataset |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# EC-RAFT: Automated Generation of Clinical Trial Eligibility Criteria |
|
|
|
## Model Description |
|
|
|
**EC-RAFT** is a fine-tuned Retrieval-Augmented Fine-Tuning (RAFT) model based on **LLaMA-3.1-8B-Instruct** architecture. |
|
It is designed to automatically generate **structured, high-quality clinical trial eligibility criteria (EC)** directly from trial titles and descriptions. |
|
|
|
EC-RAFT integrates **domain-specific retrieval** with **synthesized intermediate reasoning** steps, enabling it to produce **clinically relevant** and **contextually appropriate** EC sets. |
|
|
|
## Fine-tuning Details |
|
|
|
- **Original Model:** LLaMA-3.1-8B-Instruct |
|
- **Datasets used for fine-tuning:** |
|
- ClinicalTrials.gov (267,347 trials, 2000β2024) [biodatlab/ec-raft-dataset](https://huggingface.co/datasets/biodatlab/ec-raft-dataset) |
|
- Retrieval corpus constructed using **SciNCL model** |
|
- Intermediate reasoning steps **R** generated using **Gemini-1.5-flash-002** |
|
- Fine-tuning method: |
|
- **Retrieval-Augmented Fine-Tuning (RAFT)** |
|
- **Low-Rank Adaptation (LoRA)** |
|
|
|
## Model Performance |
|
|
|
Evaluated on a held-out ClinicalTrials.gov test split: |
|
|
|
| Metric | Score | |
|
|-----------------------------------|---------| |
|
| **BERTScore** (semantic similarity) | **86.23** | |
|
| **Precision** (LLM-guided evaluation) | **78.84%** | |
|
| **Recall** (LLM-guided evaluation) | **75.89%** | |
|
| **Mean LLM-as-a-Judge Score** (0β3) | **1.7150** | |
|
| **Mean Pair-BERTScore** | **67.76** | |
|
|
|
- **Outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines** |
|
- **Outperforms fine-tuned LLaMA and Meditron baselines** |
|
- **Clinically validated:** LLM-as-a-Judge scores highly correlated with human physician evaluation |
|
|
|
## Intended Use |
|
|
|
- Assist **researchers**, **trial designers**, and **sponsors** in drafting clinical trial eligibility criteria. |
|
- **Automate** EC generation to reduce manual effort and improve consistency. |
|
- Support **clinical trial design** transparency and quality. |
|
- Enable integration with **trial registry platforms**, **clinical trial matching systems**, and **EC recommendation tools**. |
|
|
|
## Limitations |
|
|
|
- Requires **human validation** of generated EC before clinical use. |
|
- Trained on **public ClinicalTrials.gov data** β may not generalize well to: |
|
- Rare or novel diseases |
|
- Specialized or non-standard trial designs |
|
- Non-public trial data |
|
- Optimized for **English-language clinical trials**. |
|
- As with any LLM-based system, risks include hallucination, subtle errors, and domain shifts. |
|
- Evaluation metrics (BERTScore, LLM-as-a-Judge) are proxies β not full substitutes for domain expert review. |
|
|
|
## Acknowledgments |
|
|
|
This model was developed using resources provided by: |
|
|
|
- **RAVIS Technology** for feedback and collaboration. |
|
- **Faculty of Medicine Ramathibodi Hospital** |
|
- **NSTDA Supercomputer Center (ThaiSC), Project \#pv814001** |
|
|
|
We also acknowledge the contributions of the broader open-source community whose tools and prior works on **RAFT**, **SciNCL**, **LoRA**, **LLaMA-3**, and **biomedical NLP** made this project possible. |
|
|