Medical-Chatbot-Llama3.1-8B-4bit: A Fine-Tuned LLM for a Medical Q&A Chatbot

A 4-bit quantized, instruction-tuned LLaMA 3.1-8B model fine-tuned for medical question-answering using parameter-efficient fine-tuning (LoRA) and the ruslanmv/ai-medical-chatbot dataset.

Model Overview

This model builds on top of unsloth/meta-llama-3.1-8b-instruct-bnb-4bit and is fine-tuned using the LoRA method with Unsloth. It is trained on over 257,000 instruction-style Q&A samples from the medical chatbot dataset. The model is optimized for chat-based medical assistance and is ideal for applications in healthcare-related virtual assistants or medical FAQ systems.
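
For reference, a fine-tuning setup of this kind can be sketched as follows. This is a minimal sketch using Unsloth's FastLanguageModel API; the LoRA rank, alpha, target modules, and sequence length below are illustrative assumptions, not the exact training configuration used for this model.

from unsloth import FastLanguageModel

# Load the 4-bit quantized base model with Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-instruct-bnb-4bit",
    max_seq_length=2048,  # assumed; match the training data
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning
# (rank/alpha/target modules are illustrative, not the published recipe)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)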

Key Features

- Built on unsloth/meta-llama-3.1-8b-instruct-bnb-4bit, a 4-bit quantized Llama 3.1-8B Instruct model
- Parameter-efficient fine-tuning (LoRA) with Unsloth
- Trained on over 257,000 instruction-style Q&A samples from ruslanmv/ai-medical-chatbot
- Suited to chat-based medical assistance, healthcare virtual assistants, and medical FAQ systems

Usage Example

This model is accessible through the Hugging Face Transformers library. Because the checkpoint is stored in bitsandbytes 4-bit format and the example below uses automatic device mapping, install accelerate and bitsandbytes alongside Transformers: pip install transformers accelerate bitsandbytes

Use the following sample code to interact with Medical-Chatbot-Llama3.1-8B-4bit. Note that you must run the model on a GPU, loading it with the bfloat16 compute dtype and automatic device mapping.

from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import torch

# Load the 4-bit quantized model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Dashanka/medical-chatbot-Llama3.1-8B-instruct-4bit", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Dashanka/medical-chatbot-Llama3.1-8B-instruct-4bit")

# Set up the text-generation pipeline; device placement and dtype are
# already configured on the model loaded above, so they are not repeated here
generator_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# System instruction and user query for the prompt template
sys_message = '''
    You are an AI Medical Assistant trained on a dataset of health information. Please be thorough and provide an informative but concise answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
    '''
user_query = "I get stomach pain when I eat spicy food."

prompt = f"### System: {sys_message}\n### User: {user_query}\n### Assistant:"

response = generator_pipe(prompt, max_new_tokens=512, temperature=0.8, do_sample=True)
output = response[0]["generated_text"].split('Assistant:')[-1]

print(output)
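
If the base model's Llama 3.1 chat template survived fine-tuning (an assumption; the "### System/User/Assistant" format above is the one documented for this model), the prompt can also be built with the tokenizer's chat template:

# Optional: build the prompt via the tokenizer's chat template
messages = [
    {"role": "system", "content": sys_message},
    {"role": "user", "content": user_query},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generator_pipe(prompt, max_new_tokens=512, temperature=0.8, do_sample=True)
print(response[0]["generated_text"][len(prompt):])  # strip the echoed prompt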

Downstream Use

This model can power a healthcare/medical chatbot assistant with appropriate prompt engineering. It can also be wrapped in an endpoint and exposed as a separate service, as sketched below.
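
As an illustration of the endpoint idea, a minimal sketch using FastAPI (an assumption; this service wrapper is not part of the repository) could reuse sys_message and generator_pipe from the usage example above:

# pip install fastapi uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    query: str

@app.post("/chat")
def chat(req: ChatRequest):
    # generator_pipe and sys_message come from the usage example above
    prompt = f"### System: {sys_message}\n### User: {req.query}\n### Assistant:"
    response = generator_pipe(prompt, max_new_tokens=512, temperature=0.8, do_sample=True)
    return {"answer": response[0]["generated_text"].split('Assistant:')[-1].strip()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000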

License

This work is distributed under the Apache License 2.0.

Contributing

We welcome contributions to this repository. If you have improvements or suggestions, please feel free to open a pull request.

Disclaimer

Although the Llama 3.1-8B 4-bit model is fine-tuned on a substantial amount of data, the accuracy of its outputs cannot be guaranteed. Do not act on its answers directly; consult a doctor or other healthcare professional for definitive medical advice.
