---
license: llama3.1
datasets:
- biodatlab/ec-raft-dataset
language:
- en
pipeline_tag: text-generation
---

# EC-RAFT: Automated Generation of Clinical Trial Eligibility Criteria

## Model Description

**EC-RAFT** is a Retrieval-Augmented Fine-Tuning (RAFT) model built on **LLaMA-3.1-8B-Instruct**.  
It is designed to automatically generate **structured, high-quality clinical trial eligibility criteria (EC)** directly from trial titles and descriptions.  

EC-RAFT integrates **domain-specific retrieval** with **synthesized intermediate reasoning** steps, enabling it to produce **clinically relevant** and **contextually appropriate** EC sets.
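At inference time, a RAFT-style model conditions generation on retrieved reference trials alongside the target trial's own metadata. A minimal sketch of assembling such a prompt and querying the model with `transformers` (the prompt template, field names, and repository id below are illustrative assumptions, not the authors' documented format):

```python
def build_ec_prompt(title, description, retrieved_trials):
    """Assemble a generation prompt from trial metadata plus retrieved
    reference trials (RAFT-style context). The template is illustrative."""
    context = "\n\n".join(
        f"Reference trial: {t['title']}\nEligibility criteria:\n{t['criteria']}"
        for t in retrieved_trials
    )
    return (
        "You are drafting eligibility criteria for a clinical trial.\n\n"
        f"{context}\n\n"
        f"Target trial title: {title}\n"
        f"Target trial description: {description}\n\n"
        "Eligibility Criteria:"
    )

if __name__ == "__main__":
    # Hypothetical usage; replace model_id with the model's actual repository id.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "path/to/ec-raft"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = build_ec_prompt(
        "A Phase II Study of Drug X in Type 2 Diabetes",
        "Randomized, double-blind, placebo-controlled trial of Drug X ...",
        retrieved_trials=[{"title": "...", "criteria": "..."}],
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```

Generated criteria should always be reviewed by a domain expert before use (see Limitations below).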

## Fine-tuning Details

- **Base model:** LLaMA-3.1-8B-Instruct
- **Training data:**
  - ClinicalTrials.gov (267,347 trials, 2000–2024), released as [biodatlab/ec-raft-dataset](https://huggingface.co/datasets/biodatlab/ec-raft-dataset)
  - Retrieval corpus built with the **SciNCL** embedding model
  - Intermediate reasoning steps **R** generated with **Gemini-1.5-flash-002**
- **Fine-tuning method:**
  - **Retrieval-Augmented Fine-Tuning (RAFT)**
  - **Low-Rank Adaptation (LoRA)**
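LoRA keeps the base weights frozen and learns a low-rank update ΔW = BA, so a d_out × d_in projection needs only r·(d_in + d_out) trainable parameters instead of d_in·d_out. A back-of-the-envelope illustration (the 4096-wide projection matches LLaMA-style hidden sizes; the rank actually used for EC-RAFT is not stated here, so r = 16 is an assumption):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for a LoRA adapter on a d_out x d_in linear layer:
    A is (r x d_in) and B is (d_out x r), so the count is r * (d_in + d_out)."""
    return r * (d_in + d_out)

full = 4096 * 4096                       # full fine-tuning of one projection
lora = lora_trainable_params(4096, 4096, r=16)
print(full, lora, f"{lora / full:.2%}")  # → 16777216 131072 0.78%
```

This is why LoRA fine-tuning of an 8B-parameter model fits on far smaller hardware than full fine-tuning: per adapted layer, well under 1% of the weights are trained.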

## Model Performance

Evaluated on a held-out ClinicalTrials.gov test split:

| Metric                            | Score   |
|-----------------------------------|---------|
| **BERTScore** (semantic similarity) | **86.23** |
| **Precision** (LLM-guided evaluation) | **78.84%** |
| **Recall** (LLM-guided evaluation)    | **75.89%** |
| **Mean LLM-as-a-Judge Score** (0–3)   | **1.7150** |
| **Mean Pair-BERTScore**               | **67.76** |

- **Outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines**
- **Outperforms fine-tuned LLaMA and Meditron baselines**
- **Clinically validated:** LLM-as-a-Judge scores highly correlated with human physician evaluation

## Intended Use

- Assist **researchers**, **trial designers**, and **sponsors** in drafting clinical trial eligibility criteria.
- **Automate** EC generation to reduce manual effort and improve consistency.
- Support **clinical trial design** transparency and quality.
- Enable integration with **trial registry platforms**, **clinical trial matching systems**, and **EC recommendation tools**.

## Limitations

- Requires **human validation** of generated EC before clinical use.
- Trained on **public ClinicalTrials.gov data** — may not generalize well to:
  - Rare or novel diseases
  - Specialized or non-standard trial designs
  - Non-public trial data
- Optimized for **English-language clinical trials**.
- As with any LLM-based system, outputs may contain hallucinations, subtle factual errors, and degraded quality under domain shift.
- Evaluation metrics (BERTScore, LLM-as-a-Judge) are proxies — not full substitutes for domain expert review.

## Acknowledgments

This model was developed with support from:

- **RAVIS Technology** (feedback and collaboration)
- **Faculty of Medicine Ramathibodi Hospital**
- **NSTDA Supercomputer Center (ThaiSC), Project #pv814001**

We also acknowledge the contributions of the broader open-source community whose tools and prior works on **RAFT**, **SciNCL**, **LoRA**, **LLaMA-3**, and **biomedical NLP** made this project possible.