Medical-Chatbot-Llama3.1-8B-4bit: Fine-Tuned LLM for Medical Q&A Chatbot
A 4-bit quantized, instruction-tuned LLaMA 3.1-8B model fine-tuned for medical question-answering using parameter-efficient fine-tuning (LoRA) and the ruslanmv/ai-medical-chatbot dataset.
Model Overview
This model builds on top of unsloth/meta-llama-3.1-8b-instruct-bnb-4bit and is fine-tuned using the LoRA method with Unsloth. It is trained on over 257,000 instruction-style Q&A samples from the medical chatbot dataset. The model is optimized for chat-based medical assistance and is well suited to healthcare-related virtual assistants and medical FAQ systems.
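For reference, the following is a minimal sketch of how a LoRA fine-tune of this kind is typically set up with Unsloth and trl's SFTTrainer. The rank, target modules, training hyperparameters, and dataset column names are illustrative assumptions, not the exact recipe used to produce this checkpoint, and the SFTTrainer arguments follow the older trl interface (newer versions move them into SFTConfig).

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# load the 4-bit base model and its tokenizer through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-instruct-bnb-4bit",
    max_seq_length=2048,  # assumed context length for training
    load_in_4bit=True,
)

# attach LoRA adapters; rank, alpha, and target modules are assumptions
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# flatten each Q&A pair into one training string
# ("Patient"/"Doctor" column names are assumptions about the dataset schema)
def to_text(example):
    return {"text": f"### User: {example['Patient']}\n### Assistant: {example['Doctor']}"}

dataset = load_dataset("ruslanmv/ai-medical-chatbot", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumption
        gradient_accumulation_steps=4,   # assumption
        learning_rate=2e-4,              # assumption
        num_train_epochs=1,              # assumption
        output_dir="outputs",
    ),
)
trainer.train()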
Key Features
- Base Model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
- Dataset: ruslanmv/ai-medical-chatbot
- Fine-tuning: LoRA via the unsloth framework
- Quantization: 4-bit (bnb), suitable for GPU inference on consumer-grade hardware (see the footprint check after this list)
- Model Type: Causal Language Model (AutoModelForCausalLM)
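As a quick way to verify the consumer-hardware claim, the weight footprint of the loaded model can be inspected; get_memory_footprint is a standard transformers method, and the ballpark figure in the comment is an estimate, not a measured value.

# after loading the model as shown in the usage example below
footprint_gb = model.get_memory_footprint() / 1e9
print(f"Model weights occupy ~{footprint_gb:.1f} GB")  # typically around 5-6 GB for an 8B model in 4-bit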
Usage Example
This model is accessible through the Hugging Face Transformers library. First install it with pip: pip install transformers (loading the 4-bit weights also requires the accelerate and bitsandbytes packages). Then use the following sample code to interact with Medical-Chatbot-Llama3.1-8B-4bit. Note that you must run the model on a GPU and use bfloat16 with device mapping.
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import torch
# load tokenizer and model
model = AutoModelForCausalLM.from_pretrained("Dashanka/medical-chatbot-Llama3.1-8B-instruct-4bit", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Dashanka/medical-chatbot-Llama3.1-8B-instruct-4bit")
# set up the generation pipeline; the model object already carries its
# bfloat16 dtype and device placement, so they are not repeated here
generator_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
# Create messages structured for the chat template
sys_message = '''
You are an AI Medical Assistant trained on a dataset of health information. Please be thorough and provide an informative but concise answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
'''
user_query = "I get stomach pain when I eat spicy food."
prompt = f"### System: {sys_message}\n### User: {user_query}\n### Assistant:"
# generate a response from the prompt
response = generator_pipe(prompt, max_new_tokens=512, temperature=0.8, do_sample=True)
# keep only the assistant's reply, dropping the echoed prompt
output = response[0]["generated_text"].split('Assistant:')[-1]
print(output)
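As an alternative to the hand-rolled ### System/### User prompt above, the LLaMA 3.1 Instruct tokenizer ships with a built-in chat template. The sketch below routes the same request through apply_chat_template; it is an assumption that the template format works equally well with this fine-tune, which was trained on instruction-style text.

# build the prompt with the tokenizer's chat template instead of manual formatting
messages = [
    {"role": "system", "content": sys_message},
    {"role": "user", "content": user_query},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# return_full_text=False strips the echoed prompt from the output
response = generator_pipe(prompt, max_new_tokens=512, temperature=0.8, do_sample=True,
                          return_full_text=False)
print(response[0]["generated_text"])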
Downstream Use
This model can be used as a healthcare/medical chatbot assistant with appropriate prompt engineering. It can also be wrapped in an endpoint and exposed as a separate service, as sketched below.
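A minimal sketch of such a service using FastAPI, reusing sys_message and generator_pipe from the usage example above; the route name and request schema are hypothetical choices, not part of this repository.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/chat")  # hypothetical route name
def chat(query: Query):
    # same prompt format as the usage example
    prompt = f"### System: {sys_message}\n### User: {query.question}\n### Assistant:"
    response = generator_pipe(prompt, max_new_tokens=512, temperature=0.8, do_sample=True)
    return {"answer": response[0]["generated_text"].split('Assistant:')[-1]}

# serve with: uvicorn app:app --host 0.0.0.0 --port 8000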
License
This work is distributed under the Apache License 2.0.
Contributing
We welcome contributions to this repository. If you have improvements or suggestions, please feel free to create a pull request.
Disclaimer
Though the Llama 3.1 8B 4-bit LLM is fine-tuned on a substantial amount of data, the accuracy of the model's outputs cannot be guaranteed. Therefore, do not rely on its answers directly; consult a doctor or another healthcare professional for definitive medical advice.