Model Card for Medical Reasoning Assistant
Model Details
- Developed by: Rhaymison (Medicine Information PT adaptation)
- Model type: Fine-tuned Large Language Model with explicit reasoning format
- Language(s): Portuguese
- License: Same as base model (Google Gemma license)
- Finetuned from model: unsloth/gemma-3-1b-it-unsloth-bnb-4bit
Model Sources
- Base Model Repository: https://huggingface.co/google/gemma-3-1b-it
- Training Dataset: https://huggingface.co/datasets/rhaymison/medicine-information-pt
Uses
Direct Use
This model is designed to answer medical questions in Portuguese with a structured reasoning approach. The training dataset was modified to include explicit reasoning components, so the model learns to show its thought process: it first presents its analysis and reasoning, then delivers a clear answer. Responses follow a specific format:
<start_working_out>
[Detailed reasoning and analysis about the medical question]
<end_working_out>
<SOLUTION>
[Clear and concise medical answer]
</SOLUTION>
This format makes the model's reasoning transparent, allowing users to understand how it arrived at its conclusions.
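For downstream use, the two sections can be pulled apart with simple pattern matching. Below is a minimal parsing sketch based on the tags documented above; the helper name and regular expressions are illustrative, not part of the model's tooling:

import re

def parse_response(text: str):
    """Split a model response into its reasoning and solution parts.

    Returns (reasoning, solution); either may be None if the model
    did not emit the corresponding tags.
    """
    reasoning = re.search(r"<start_working_out>(.*?)<end_working_out>", text, re.DOTALL)
    solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        solution.group(1).strip() if solution else None,
    )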
Downstream Use
The model can be integrated into:
- Medical education platforms
- Patient information systems
- Healthcare support tools
- Medical information chatbots
Out-of-Scope Use
This model should NOT be used for:
- Direct medical diagnosis without professional oversight
- Replacing healthcare professionals
- Providing treatment recommendations without medical supervision
- Critical healthcare decisions without human verification
Bias, Risks, and Limitations
- Medical Accuracy: While trained on medical information, the model may still produce inaccurate or incomplete medical information.
- Language Limitation: The model is primarily trained to respond in Portuguese.
- Data Cutoff: Knowledge is limited to the training data and base model's knowledge cutoff.
- No Real-time Data: The model lacks access to real-time medical research or updates.
- Reasoning Limitations: The model attempts to provide reasoning but may not capture all relevant medical factors.
- Not a Medical Professional: This is an AI tool and should not replace professional medical advice.
Recommendations
- Always verify any medical information with qualified healthcare professionals.
- Use the model as a supplementary information tool, not as a primary source for medical decisions.
- Review the model's reasoning process to understand how it reached its conclusions.
- Be aware that the model may occasionally generate incorrect or incomplete information.
- Supervise model usage in healthcare settings.
How to Get Started with the Model
# !pip install unsloth git+https://github.com/huggingface/[email protected] -q
from unsloth import FastModel
import torch

# Load the Unsloth 4-bit base model
base_model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,  # Note: use the same quantization as in training
)

# Attach the fine-tuned LoRA adapters
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, "drguilhermeapolinario/gemma3-1b_med_reasoning")

# Query ("How does pancreatitis present?")
input_text = "Como uma pancreatite se manifesta?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
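Because the base model is instruction-tuned, results are usually better when the question is wrapped in Gemma's chat template rather than passed as raw text. A hedged variant of the query step above, using the standard transformers tokenizer API:

# Optional: format the query with the tokenizer's chat template
messages = [{"role": "user", "content": "Como uma pancreatite se manifesta?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))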
Training Details
Training Data
The model was fine-tuned on the "rhaymison/medicine-information-pt" dataset, which contains medical information in Portuguese. The dataset was modified and processed to include explicit reasoning components: each entry was transformed to separate questions from answers and reformatted to include dedicated reasoning sections, following the structured format with <start_working_out>, <end_working_out>, <SOLUTION>, and </SOLUTION> tags.
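A minimal sketch of that transformation, assuming the dataset exposes instruction/output-style columns; the field names "input" and "output" and the system prompt wording are assumptions, not the published preprocessing code:

from datasets import load_dataset

REASONING_START, REASONING_END = "<start_working_out>", "<end_working_out>"
SOLUTION_START, SOLUTION_END = "<SOLUTION>", "</SOLUTION>"

SYSTEM_PROMPT = (
    f"Reason about the medical question between {REASONING_START} and "
    f"{REASONING_END}, then give your answer between {SOLUTION_START} "
    f"and {SOLUTION_END}."
)

def to_grpo_example(row):
    # "input"/"output" are assumed column names for the question and reference answer
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["input"]},
        ],
        "answer": row["output"],
    }

dataset = load_dataset("rhaymison/medicine-information-pt", split="train")
dataset = dataset.map(to_grpo_example)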
Training Procedure
The model was trained using Group Relative Policy Optimization (GRPO) with custom reward functions designed to encourage and reward structured reasoning. The dataset was enhanced with explicit reasoning components before training. The training procedure included:
- Processing the dataset to separate questions and answers
- Defining a structured format for responses with explicit reasoning and solution sections
- Creating reward functions (a sketch follows this list) to evaluate:
- Exact format matching
- Approximate format compliance
- Answer quality
- Reasoning quality and depth
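The reward functions themselves were not released. The following is a minimal sketch of what the exact-format reward might look like, assuming the reward-function signature used by trl's GRPOTrainer (a batch of completions in, a list of float rewards out):

import re

FORMAT_RE = re.compile(
    r"<start_working_out>.+?<end_working_out>.*?<SOLUTION>.+?</SOLUTION>",
    re.DOTALL,
)

def exact_format_reward(completions, **kwargs):
    """Reward 1.0 for completions that follow the full reasoning format, else 0.0."""
    # Completions may be plain strings or chat-style message lists
    texts = [c[0]["content"] if isinstance(c, list) else c for c in completions]
    return [1.0 if FORMAT_RE.search(t) else 0.0 for t in texts]

The approximate-format, answer-quality, and reasoning-depth rewards described above would follow the same signature with softer scoring.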
Training Hyperparameters
- Training method: Parameter-Efficient Fine-Tuning (PEFT) with LoRA
- Learning rate: 5e-6
- Optimizer: AdamW with fused implementation
- Weight decay: 0.1
- Warmup ratio: 0.1
- Training steps: 300
- Batch size: 1 per device
- Gradient accumulation steps: 4
- Maximum sequence length: 1024
- Number of generations per step: 2
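Assuming the training used trl's GRPOTrainer (which Unsloth's GRPO workflow builds on), the hyperparameters above translate roughly to the configuration below. This is a reconstruction for orientation, not the original training script:

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate=5e-6,
    optim="adamw_torch_fused",
    weight_decay=0.1,
    warmup_ratio=0.1,
    max_steps=300,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_generations=2,            # completions sampled per prompt
    max_completion_length=1024,   # approximates the 1024 sequence-length budget
    output_dir="outputs",
)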
Evaluation
Metrics
The model was evaluated based on:
- Format compliance (use of specified tags)
- Reasoning quality (depth and use of medical terminology)
- Answer accuracy compared to reference answers
- Overall response coherence
Results
The model successfully learned to:
- Provide structured responses with separated reasoning and solution sections
- Include relevant medical terminology in its reasoning
- Deliver accurate medical information for common conditions and questions
Environmental Impact
- Hardware Type: NVIDIA A100-SXM4-40GB
- Cloud Provider: Google Colab
- Training Duration: Approximately 5 hours
Technical Specifications
Model Architecture and Objective
The model uses the Gemma-3-1B architecture with LoRA adapters applied to attention and MLP modules. The training objective was to optimize the model to:
- Follow a specific reasoning format
- Provide detailed medical reasoning
- Deliver accurate answers to medical questions in Portuguese
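A hedged sketch of a matching adapter configuration with peft; the rank, alpha, and exact target-module list are assumptions, since the card only states that attention and MLP modules were adapted:

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,           # rank: assumed, not stated in this card
    lora_alpha=8,  # assumed
    target_modules=[
        # attention projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        # MLP projections
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)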
Hardware Requirements
- GPU with at least 16GB memory recommended for inference
- 30GB+ GPU memory recommended for further fine-tuning
Software Requirements
- transformers >= 4.49.0
- peft (latest version)
- torch >= 2.0.0
- accelerate
Citation
If you use this model in your research, please cite:
@misc{medical-reasoning-assistant,
  author = {Rhaymison},
  title = {Medical Reasoning Assistant: A Fine-tuned Model for Structured Medical Information in Portuguese},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/drguilhermeapolinario/gemma3-1b_med_reasoning}}
}
Model Card Contact
For questions or issues related to this model, please contact the model author through Hugging Face.