Model Card for llama2-medical-finetuned

Model Details

Model Description

This is a finetuned version of LLaMA 2 specialized for medical text understanding and generation tasks. It is designed to assist with medical data processing, clinical note summarization, and healthcare question answering.

  • Developed by: Cydonia01
  • Shared by: Cydonia01 on Hugging Face
  • Model type: Large Language Model (Transformer-based, quantized with BitsAndBytes 4-bit NF4)
  • Language(s) (NLP): English (primarily medical domain)
  • Finetuned from model: aboonaji/llama2finetune-v2 (itself a finetune of Meta AI's LLaMA 2)

Uses

Direct Use

  • Medical text generation and summarization
  • Clinical decision support tools
  • Medical Q&A systems

Downstream Use

  • Integration into healthcare NLP pipelines
  • Training further domain-specific models

Out-of-Scope Use

  • Not intended for direct diagnostic or treatment decision-making without expert review
  • Should not be used for generating legally binding medical advice

Bias, Risks, and Limitations

  • The model may reflect biases present in training data from medical literature and may generate incorrect or outdated medical information.
  • Not a substitute for professional medical advice or diagnosis.
  • Users should verify outputs with medical professionals.

Recommendations

Users should exercise caution when deploying the model in real-world medical scenarios and combine its outputs with expert validation.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the finetuned model in half precision
tokenizer = AutoTokenizer.from_pretrained("Cydonia01/llama2-medical-finetuned")
model = AutoModelForCausalLM.from_pretrained(
    "Cydonia01/llama2-medical-finetuned", torch_dtype=torch.float16, device_map="auto"
)

# Generate a bounded-length response to a medical question
input_text = "Explain the symptoms of diabetes."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
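
On memory-constrained GPUs such as the T4 this model was trained on, the model can instead be loaded in 4-bit precision, mirroring the quantization scheme described in this card. A minimal sketch, assuming BitsAndBytes is installed and float16 as the compute dtype:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 loading, matching the quantization described in this card
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: compute dtype not stated in the card
)
model = AutoModelForCausalLM.from_pretrained(
    "Cydonia01/llama2-medical-finetuned",
    quantization_config=bnb_config,
    device_map="auto",
)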

Training Details

Training Data

A curated collection of medical texts, chiefly the wiki medical terms dataset (aboonaji/wiki_medical_terms_llam2_format).
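
As a quick sanity check, the dataset can be inspected with the Hugging Face Datasets library. A minimal sketch; the "train" split name is an assumption:

from datasets import load_dataset

# Load the wiki medical terms dataset used for finetuning
dataset = load_dataset("aboonaji/wiki_medical_terms_llam2_format", split="train")
print(dataset[0])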

Training Procedure

Finetuned from the aboonaji/llama2finetune-v2 base model with 4-bit NF4 quantization via BitsAndBytes and the PEFT LoRA method for parameter-efficient tuning, using a causal language modeling objective. A configuration sketch follows the hyperparameters below.

Training Hyperparameters

  • Batch size: 1 (per device) with gradient accumulation of 4
  • Max steps: 100
  • LoRA config: r=16, alpha=16, dropout=0.1
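
A minimal configuration sketch of this setup, assuming the standard BitsAndBytes, PEFT, and Transformers APIs; the compute dtype, output directory, and the prepare_model_for_kbit_training step are assumptions not stated in this card:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, as described in the Training Procedure
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: float16 compute on the T4
)
base_model = AutoModelForCausalLM.from_pretrained(
    "aboonaji/llama2finetune-v2", quantization_config=bnb_config, device_map="auto"
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA settings from the hyperparameters above
lora_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config)

# Trainer arguments matching the stated batch size, accumulation, and step count
training_args = TrainingArguments(
    output_dir="llama2-medical-finetuned",  # assumption: output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=100,
    fp16=True,
)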

Environmental Impact

  • Hardware Type: NVIDIA Tesla T4 GPU (Google Colab)
  • Hours used: Approximately 0.75 hours (45 minutes)
  • Cloud Provider: Google Colab

Technical Specifications

Model Architecture and Objective

A LLaMA 2 base model finetuned with a causal language modeling objective, quantized to 4-bit NF4 precision for efficiency, and adapted with LoRA via PEFT.

Compute Infrastructure

Training was conducted in Google Colab’s cloud environment on a single GPU. Efficient quantization and parameter-efficient fine-tuning keep compute requirements modest.

Hardware

NVIDIA Tesla T4 GPU with 16 GB VRAM, supporting mixed precision (float16) and 4-bit quantization via the BitsAndBytes library.

Software

  • PyTorch
  • Transformers (Hugging Face)
  • PEFT (LoRA)
  • BitsAndBytes (4-bit quantization)
  • Datasets (Hugging Face)

Framework versions

  • PEFT 0.13.2
  • Transformers (version compatible with PEFT 0.13.2)
  • PyTorch (with float16 and 4-bit quantization support)