Model Card: Lekhansh/Llama-3.2-3B-It-EHR-TextsimplificationAndIE
Model Overview
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct
designed for two core tasks in addiction psychiatry clinical workflows:
- Proofreading and standardizing unstructured clinical notes (CNs) from Electronic Health Records (EHR).
- Extracting structured substance use information, specifically substance presence and last use timing.
The model was developed on CNs from a five-year EHR dataset (2018–2023), annotated by doctors and nurses for gold-standard benchmarking. It outperformed baseline methods (JamSpell, medSpaCy contextual corrector) and GPT-4o on both proofreading and information extraction. Human raters were unable to reliably distinguish model-edited notes from human-edited ones and preferred model outputs in a majority of cases. Despite strong overall performance (mean F1: 0.99), performance on rarer substance classes such as hallucinogens remains limited.
Read research: https://osf.io/preprints/osf/d5m6e_v1
Dataset
- Source: 6,500 addiction psychiatry clinical notes from NIMHANS EHR (2018–2023)
- Annotations: By qualified clinical staff
- Access: The dataset is not publicly available. Researchers may request access after clearance from the NIMHANS data safety board. A small sample is available at https://docs.google.com/spreadsheets/d/1JbBlDxFYZCXuvGJL06gDxwzi2GQIYZ-BZ1R6wRX93gI/edit?usp=sharing.
Training Details
Base model: meta-llama/Llama-3.2-3B-Instruct
Framework: TRL (Transformer Reinforcement Learning)
Training examples: 5,563; Validation set: 686
Tokens seen: 1,216,042
LoRA Adapter Rank: 64
Quantisation: 4-bit quantised base model with LoRA adapters (QLoRA)
Hyperparameters:
- Learning rate: 1e-5
- Scheduler: Cosine
- Epochs: 3
- Batch size: 4
- Gradient Accumulation: 5
Generation during validation:
- Temperature: 0.1
- Top-p: 0.95
Hardware: A6000 (48GB), single GPU
Training time: ~6 hours
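A quick sanity check of the optimisation schedule implied by the hyperparameters above (assuming no example packing or dropping of the final partial batch):

```python
import math

train_examples = 5_563
batch_size = 4
grad_accum = 5
epochs = 3

# Examples consumed per optimiser step with gradient accumulation.
effective_batch = batch_size * grad_accum
# Optimiser steps per pass over the training set.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```

So each optimiser step sees 20 examples, giving roughly 279 steps per epoch and about 837 steps over the full run that the cosine scheduler decays across.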
Evaluation
Proofreading:
- Increased readability: Flesch–Kincaid grade level: 16 -> 9
- Reduced out-of-vocabulary terms: 5% -> 1%
- Human evaluation: raters identified model-edited vs human-edited notes with only 27.9% accuracy and preferred the model output in 55.7% of cases
- Similarity to human-corrected notes: METEOR: 0.86, BERTScore: 0.85, token-level F1: 0.73
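Token-level F1 can be computed as the harmonic mean of precision and recall over the multiset overlap of tokens between model output and human-corrected reference. This is a common definition; the paper's exact tokenisation may differ:

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """F1 over the multiset overlap of whitespace-split tokens."""
    cand, ref = candidate.split(), reference.split()
    # Multiset intersection counts shared tokens, respecting repetitions.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Illustrative pair (not from the dataset): one token differs out of five.
score = token_f1("pt reports using alcohol daily",
                 "patient reports using alcohol daily")
print(score)  # 0.8
```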
Information Extraction:
- Mean F1 score: 0.99
- Limited performance on rare substance classes (e.g., hallucinogens)
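The rare-class limitation is largely a support-size effect: with very few gold instances, a single error shifts per-class F1 substantially. A minimal sketch with illustrative counts (not taken from the paper):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# A frequent class absorbs a few errors with little damage...
common = f1(tp=990, fp=5, fn=5)
# ...while a rare class (e.g., hallucinogens) swings sharply on one mistake each way.
rare = f1(tp=3, fp=1, fn=1)
print(common, rare)
```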
Intended Use
Suitable for:
- Research on standardization and information extraction from EHR clinical notes
- Academic benchmarking and prototyping
Limitations:
- Not tested outside addiction psychiatry or Indian EHR data
- Not validated for deployment in clinical decision support
- Should not be used in production settings
Output Format: Trained to produce structured JSON outputs
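Since the model emits structured JSON, downstream code should parse it defensively; instruction-tuned models often wrap JSON in a markdown code fence. A minimal parsing sketch — the field names (`substances`, `present`, `last_use`) are illustrative, not the model's actual schema:

```python
import json
import re

def parse_extraction(raw: str) -> dict:
    """Parse the model's JSON output, stripping an optional ``` fence."""
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = fenced.group(1) if fenced else raw
    return json.loads(payload)

# Hypothetical model output (schema invented for illustration).
raw_output = (
    '```json\n'
    '{"substances": [{"name": "alcohol", "present": true, "last_use": "2 days ago"}]}\n'
    '```'
)
record = parse_extraction(raw_output)
```

In practice you would also catch `json.JSONDecodeError` and fall back to re-prompting or logging the malformed output.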
Model Architecture
- Size: 3.2B parameters
- Quantisation: 4-bit
- Adapter Type: LoRA (Rank 64)
Licensing
- License: For academic research use only
- Usage Restriction: Commercial and clinical use is prohibited without explicit permission. Contact author for details.