Model Card: Lekhansh/Llama-3.2-3B-It-EHR-TextsimplificationAndIE
Model Overview
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct
designed for two core tasks in addiction psychiatry clinical workflows:
- Proofreading and standardizing unstructured clinical notes (CNs) from Electronic Health Records (EHR).
- Extracting structured substance use information, specifically substance presence and last use timing.
The model was developed on CNs from a five-year EHR dataset (2018–2023), annotated by doctors and nurses for gold-standard benchmarking. It outperformed baseline methods (JamSpell, medSpaCy contextual corrector) and GPT-4o on both proofreading and information extraction. Human raters were unable to reliably distinguish model-edited notes from human-edited ones and preferred model outputs in a majority of cases. Despite strong overall performance (mean F1: 0.99), performance on rarer substance classes such as hallucinogens remains limited.
Read research: https://osf.io/preprints/osf/d5m6e_v1
Dataset
- Source: 6,500 addiction psychiatry clinical notes from NIMHANS EHR (2018–2023)
- Annotations: By qualified clinical staff
- Access: The dataset is not publicly available. Researchers may request access after clearance from the NIMHANS data safety board. A small sample is available at https://docs.google.com/spreadsheets/d/1JbBlDxFYZCXuvGJL06gDxwzi2GQIYZ-BZ1R6wRX93gI/edit?usp=sharing.
Training Details
Base model: meta-llama/Llama-3.2-3B-Instruct
Framework: TRL (Transformer Reinforcement Learning)
Training examples: 5,563; Validation set: 686
Tokens seen: 1,216,042
LoRA Adapter Rank: 64
Quantisation: 4-bit quantised base model with LoRA adapters (QLoRA)
Hyperparameters:
- Learning rate: 1e-5
- Scheduler: Cosine
- Epochs: 3
- Batch size: 4
- Gradient Accumulation: 5
Generation during validation:
- Temperature: 0.1
- Top-p: 0.95
Hardware: A6000 (48GB), single GPU
Training time: ~6 hours
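A quick sanity check of the optimisation schedule implied by the hyperparameters above (assuming no example packing or dropping of the final partial batch):

```python
import math

train_examples = 5_563
batch_size = 4
grad_accum = 5
epochs = 3

# Examples consumed per optimiser step with gradient accumulation.
effective_batch = batch_size * grad_accum
# Optimiser steps per pass over the training set.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```

So each optimiser step sees 20 examples, giving roughly 279 steps per epoch and about 837 steps over the full run that the cosine scheduler decays across.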
Evaluation
Proofreading:
- Increased readability: Flesch–Kincaid grade level: 16 -> 9
- Reduced out-of-vocabulary terms: 5% -> 1%
- Human evaluation: raters identified model-edited vs human-edited notes with only 27.9% accuracy and preferred the model output in 55.7% of cases
- Similarity to human-corrected notes: METEOR: 0.86, BERTScore: 0.85, token-level F1: 0.73
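Token-level F1 can be computed as the harmonic mean of precision and recall over the multiset overlap of tokens between model output and human-corrected reference. This is a common definition; the paper's exact tokenisation may differ:

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """F1 over the multiset overlap of whitespace-split tokens."""
    cand, ref = candidate.split(), reference.split()
    # Multiset intersection counts shared tokens, respecting repetitions.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Illustrative pair (not from the dataset): one token differs out of five.
score = token_f1("pt reports using alcohol daily",
                 "patient reports using alcohol daily")
print(score)  # 0.8
```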
Information Extraction:
- Mean F1 score: 0.99
- Limited performance on rare substance classes (e.g., hallucinogens)
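The rare-class limitation is largely a support-size effect: with very few gold instances, a single error shifts per-class F1 substantially. A minimal sketch with illustrative counts (not taken from the paper):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# A frequent class absorbs a few errors with little damage...
common = f1(tp=990, fp=5, fn=5)
# ...while a rare class (e.g., hallucinogens) swings sharply on one mistake each way.
rare = f1(tp=3, fp=1, fn=1)
print(common, rare)
```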
Intended Use
Suitable for:
- Research on standardization and information extraction from EHR clinical notes
- Academic benchmarking and prototyping
Limitations:
- Not tested outside addiction psychiatry or Indian EHR data
- Not validated for deployment in clinical decision support
- Should not be used in production settings
Output Format: Trained to produce structured JSON outputs
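Since the model emits structured JSON, downstream code should parse it defensively; instruction-tuned models often wrap JSON in a markdown code fence. A minimal parsing sketch — the field names (`substances`, `present`, `last_use`) are illustrative, not the model's actual schema:

```python
import json
import re

def parse_extraction(raw: str) -> dict:
    """Parse the model's JSON output, stripping an optional ``` fence."""
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = fenced.group(1) if fenced else raw
    return json.loads(payload)

# Hypothetical model output (schema invented for illustration).
raw_output = (
    '```json\n'
    '{"substances": [{"name": "alcohol", "present": true, "last_use": "2 days ago"}]}\n'
    '```'
)
record = parse_extraction(raw_output)
```

In practice you would also catch `json.JSONDecodeError` and fall back to re-prompting or logging the malformed output.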
Model Architecture
- Size: 3.2B parameters
- Quantisation: 4-bit
- Adapter Type: LoRA (Rank 64)
Licensing
- License: For academic research use only
- Usage Restriction: Commercial and clinical use is prohibited without explicit permission. Contact author for details.