	---
	license: llama3.1
	datasets:
	- biodatlab/ec-raft-dataset
	language:
	- en
	pipeline_tag: text-generation
	---

	# EC-RAFT: Automated Generation of Clinical Trial Eligibility Criteria

	## Model Description

	EC-RAFT is a LLaMA-3.1-8B-Instruct model fine-tuned with Retrieval-Augmented Fine-Tuning (RAFT).
	It is designed to automatically generate structured, high-quality clinical trial eligibility criteria (EC) directly from trial titles and descriptions.

	EC-RAFT integrates domain-specific retrieval with synthesized intermediate reasoning steps, enabling it to produce clinically relevant and contextually appropriate EC sets.
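As a rough illustration of how a trial title, description, and retrieved context might be combined into a single input, here is a minimal sketch. The `RetrievedTrial` class, the `build_prompt` function, the section labels, and the closing instruction are all hypothetical; this card does not specify the prompt template EC-RAFT was trained with.

```python
from dataclasses import dataclass


@dataclass
class RetrievedTrial:
    """A related trial from the retrieval corpus (hypothetical structure)."""
    title: str
    criteria: str


def build_prompt(title, description, retrieved):
    """Assemble a retrieval-augmented prompt for eligibility-criteria generation.

    The section labels and closing instruction are illustrative assumptions,
    not the template EC-RAFT was trained with.
    """
    parts = []
    # Retrieved context comes first so the model can ground its reasoning in it.
    for i, trial in enumerate(retrieved, start=1):
        parts.append(f"Related trial {i}: {trial.title}\n{trial.criteria}")
    # The target trial is described by its title and free-text description.
    parts.append(f"Title: {title}")
    parts.append(f"Description: {description}")
    # Ask for intermediate reasoning before the final criteria.
    parts.append("Reason step by step, then list the inclusion and exclusion criteria.")
    return "\n\n".join(parts)
```

The resulting string could then be passed as the user message to whatever text-generation interface hosts the model. Check the prompt layout against the training data before relying on it.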

	## Fine-tuning Details

	- Original Model: LLaMA-3.1-8B-Instruct
	- Datasets used for fine-tuning:
	  - ClinicalTrials.gov (267,347 trials, 2000–2024): [biodatlab/ec-raft-dataset](https://huggingface.co/datasets/biodatlab/ec-raft-dataset)
	  - Retrieval corpus constructed using the SciNCL model
	  - Intermediate reasoning steps R generated using Gemini-1.5-flash-002
	- Fine-tuning method:
	  - Retrieval-Augmented Fine-Tuning (RAFT)
	  - Low-Rank Adaptation (LoRA)
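The retrieval step can be pictured as a nearest-neighbor lookup over trial embeddings. The sketch below assumes each trial has already been embedded (for example with SciNCL) and ranks candidates by cosine similarity. The similarity measure, the value of `k`, and the function names are assumptions made for illustration, not details confirmed by this card.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, corpus, k=3):
    """Return the ids of the k corpus entries most similar to query_vec.

    corpus maps a trial id to its precomputed embedding (hypothetical layout).
    """
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [trial_id for trial_id, _ in ranked[:k]]


# Toy 2-D embeddings; real embeddings would have hundreds of dimensions.
corpus = {"trial_a": [1.0, 0.0], "trial_b": [0.0, 1.0], "trial_c": [0.9, 0.1]}
nearest = top_k([1.0, 0.0], corpus, k=2)
```

For a corpus of this size (hundreds of thousands of trials), a vector index would normally replace the full sort used here.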

	## Model Performance

	Evaluated on a held-out ClinicalTrials.gov test split:

	| Metric                            | Score   |
	|-----------------------------------|---------|
	| BERTScore (semantic similarity)   | 86.23   |
	| Precision (LLM-guided evaluation) | 78.84%  |
	| Recall (LLM-guided evaluation)    | 75.89%  |
	| Mean LLM-as-a-Judge Score (0–3)   | 1.7150  |
	| Mean Pair-BERTScore               | 67.76   |

	- Outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines
	- Outperforms fine-tuned LLaMA and Meditron baselines
	- Clinically validated: LLM-as-a-Judge scores are highly correlated with human physician evaluation
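To make the LLM-guided metrics more concrete, the sketch below shows one plausible way such numbers could be aggregated once a judge has labeled individual criteria. The exact evaluation protocol is not described in this card, so the input layout (one boolean per criterion, one 0–3 score per trial) and both function names are assumptions.

```python
def precision_recall(generated_matched, reference_covered):
    """Compute precision and recall from per-criterion judge decisions.

    generated_matched: one bool per generated criterion, True if the judge
        found it supported by the reference criteria.
    reference_covered: one bool per reference criterion, True if the judge
        found it present in the generated set.
    """
    precision = sum(generated_matched) / len(generated_matched) if generated_matched else 0.0
    recall = sum(reference_covered) / len(reference_covered) if reference_covered else 0.0
    return precision, recall


def mean_judge_score(scores):
    """Average per-trial judge scores, each expected to lie on the 0-3 scale."""
    if not scores:
        raise ValueError("at least one score is required")
    for s in scores:
        if not 0 <= s <= 3:
            raise ValueError(f"score out of range: {s}")
    return sum(scores) / len(scores)


# Toy judgments: 3 of 4 generated criteria matched, 1 of 2 reference criteria covered.
p, r = precision_recall([True, True, False, True], [True, False])
avg = mean_judge_score([3, 2, 1, 0])
```

Under this reading, precision penalizes generated criteria that have no counterpart in the reference, while recall penalizes reference criteria the model missed.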

	## Intended Use

	- Assist researchers, trial designers, and sponsors in drafting clinical trial eligibility criteria.
	- Automate EC generation to reduce manual effort and improve consistency.
	- Support clinical trial design transparency and quality.
	- Enable integration with trial registry platforms, clinical trial matching systems, and EC recommendation tools.

	## Limitations

	- Requires human validation of generated EC before clinical use.
	- Trained on public ClinicalTrials.gov data, so it may not generalize well to:
	  - Rare or novel diseases
	  - Specialized or non-standard trial designs
	  - Non-public trial data
	- Optimized for English-language clinical trials.
	- As with any LLM-based system, risks include hallucination, subtle errors, and domain shifts.
	- Evaluation metrics (BERTScore, LLM-as-a-Judge) are proxies — not full substitutes for domain expert review.

	## Acknowledgments

	This model was developed with support from:

	- RAVIS Technology, for feedback and collaboration
	- Faculty of Medicine Ramathibodi Hospital
	- NSTDA Supercomputer Center (ThaiSC), Project #pv814001

	We also acknowledge the broader open-source community, whose tools and prior work on RAFT, SciNCL, LoRA, LLaMA-3, and biomedical NLP made this project possible.
