# Model Card: MedGemma-4B-Abliterated
## Model Description
This model is a fine-tuned version of `google/medgemma-4b-it` that has undergone an "abliteration" process to reduce its propensity to generate harmful or undesirable content, followed by instruction fine-tuning on a custom dataset using Axolotl.

The abliteration process aimed to instill a specific behavior (e.g., refusal of harmful requests) by identifying a "refusal direction" in the model's activations and orthogonalizing the relevant weight matrices against it. This was followed by LoRA (Low-Rank Adaptation) fine-tuning to adapt the model to a new dataset while preserving the benefits of abliteration and the base model's capabilities.
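For orientation, the core weight edit behind abliteration can be sketched as follows. This is a minimal illustration under assumed shapes and variable names, not the exact script used for this model; in practice the refusal direction is estimated from residual-stream activations collected on the target and baseline prompt sets.

```python
import torch

hidden_dim = 2560  # illustrative only; use the model's actual hidden size

# Assumed inputs: mean residual-stream activations at a chosen layer,
# collected over the harmful (target) and harmless (baseline) prompt sets.
harmful_mean = torch.randn(hidden_dim)   # placeholder for collected activations
harmless_mean = torch.randn(hidden_dim)  # placeholder for collected activations

# The "refusal direction" is the normalized difference of the two means.
refusal_dir = harmful_mean - harmless_mean
refusal_dir = refusal_dir / refusal_dir.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of the weight's output that lies along `direction`.

    `weight` has shape (hidden_dim, in_features), as in the attention output
    and MLP down-projection matrices that write into the residual stream.
    """
    return weight - torch.outer(direction, direction @ weight)
```

Applied to every matrix that writes into the residual stream, this edit prevents the model from expressing the refusal direction in its outputs.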
- **Base Model:** `google/medgemma-4b-it`
- **Abliteration Target Dataset:** `mlabonne/harmful_behaviours`
- **Abliteration Baseline Dataset:** `mlabonne/harmless_alpaca`
- **Fine-tuning Framework:** Axolotl (LoRA adaptation; see the sketch below)
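Axolotl drives the LoRA stage from a YAML config rather than Python code. Purely as an illustration of what that stage amounts to, here is a minimal `peft` sketch; the model path, target modules, rank, and alpha below are assumptions, not the values actually used in training.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical path to the abliterated base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "your_username/medgemma-4b-abliterated",
    torch_dtype=torch.bfloat16,
)

# Assumed LoRA hyperparameters; the real run was configured through Axolotl.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```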
## Intended Uses & Limitations
### Intended Uses
This model is intended for fully uncensored medical use, including:
- Assisting with medical information retrieval based on provided instructions.
- Generating responses to medical queries in a helpful and safe manner.
- Research into mitigating harmful outputs in LLMs while maintaining utility on specific tasks.
- Serving as a base for further fine-tuning.
The model is designed to follow instructions and provide informative responses within its trained domain, with an increased resistance to generating outputs related to the abliterated behaviors.
### Limitations and Out-of-Scope Uses
- Not a Medical Professional: This model is NOT a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition. Never disregard professional medical advice or delay in seeking it because of something you have read or received from this model.
- Potential for Hallucinations: Like all LLMs, this model can generate incorrect, biased, or nonsensical information (hallucinations), even in its domain of expertise. Outputs should be critically reviewed and verified.
- Abliteration Imperfection: While abliteration aims to reduce specific unwanted behaviors, it may not be 100% effective, and the model might still produce undesirable content under certain prompts or conditions. The fine-tuning process might also slightly alter the effectiveness of the abliteration.
- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on (both the MedGemma pre-training corpus and the fine-tuning dataset used here). It will not have information about events or developments occurring after its last training update.
- Bias: The model may reflect biases present in its training data.
- Not for Critical Decisions: Do not use this model for making critical decisions where an error could lead to harm.
## How to Use
This model can be used with the Hugging Face `transformers` library. If LoRA adapters were trained separately, they need to be loaded on top of the abliterated base model (Option 2 in the example below).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel  # needed when loading LoRA adapters separately
import torch

base_model_path = "[Path to your abliterated MedGemma 4B model, e.g., 'your_username/medgemma-4b-abliterated']"

# Option 1: if the LoRA adapters were merged into the base model and saved as a new model, load it directly:
# finetuned_model_path = "[Path to your final merged and fine-tuned model, e.g., 'your_username/medgemma-4b-abliterated-finetuned']"
# model = AutoModelForCausalLM.from_pretrained(finetuned_model_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
# tokenizer = AutoTokenizer.from_pretrained(finetuned_model_path, trust_remote_code=True)

# Option 2: load the abliterated base model and apply the LoRA adapters on top.
model = AutoModelForCausalLM.from_pretrained(base_model_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
adapter_path = "[Path to your trained LoRA adapters, e.g., '/content/drive/MyDrive/AI Work/axolotl_finetune_4b_output_run1/final_checkpoint_or_adapter_folder']"
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()  # optional: merge adapters into the base weights for faster inference
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Example usage:
prompt_template = """USER: {instruction}
ASSISTANT:"""
instruction = "What are the common symptoms of influenza?"
full_prompt = prompt_template.format(instruction=instruction)

inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
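Since the base model is an instruction-tuned Gemma variant, its tokenizer typically ships a chat template. Whether that template matches the format used during fine-tuning is not confirmed, so the snippet below is a hedged alternative to the plain `USER:`/`ASSISTANT:` prompt above rather than the recommended format.

```python
# Alternative prompting via the tokenizer's built-in chat template (if present).
messages = [
    {"role": "user", "content": "What are the common symptoms of influenza?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If both formats produce reasonable answers, prefer whichever matches the formatting of the fine-tuning data.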