Model Card for Medical Reasoning Assistant
Model Details
- Developed by: Rhaymison (Medicine Information PT adaptation)
- Model type: Fine-tuned Large Language Model with explicit reasoning format
- Language(s): Portuguese
- License: Same as base model (Google Gemma license)
- Finetuned from model: unsloth/gemma-3-1b-it-unsloth-bnb-4bit
Model Sources
- Base Model Repository: https://huggingface.co/google/gemma-3-1b-it
- Training Dataset: https://huggingface.co/datasets/rhaymison/medicine-information-pt
Uses
Direct Use
This model is designed to answer medical questions in Portuguese with a structured reasoning approach. The training dataset was modified to include explicit reasoning components, so the model learns to show its thought process: it first presents its analysis and reasoning, then delivers a clear answer. Responses follow a specific format:
<start_working_out>
[Detailed reasoning and analysis about the medical question]
<end_working_out>
<SOLUTION>
[Clear and concise medical answer]
</SOLUTION>
This format makes the model's reasoning transparent, allowing users to understand how it arrived at its conclusions.
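For downstream use, the two sections can be pulled apart with simple pattern matching. Below is a minimal parsing sketch based on the tags documented above; the helper name and regular expressions are illustrative, not part of the model's tooling:

import re

def parse_response(text: str):
    """Split a model response into its reasoning and solution parts.

    Returns (reasoning, solution); either may be None if the model
    did not emit the corresponding tags.
    """
    reasoning = re.search(r"<start_working_out>(.*?)<end_working_out>", text, re.DOTALL)
    solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        solution.group(1).strip() if solution else None,
    )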
Downstream Use
The model can be integrated into:
- Medical education platforms
- Patient information systems
- Healthcare support tools
- Medical information chatbots
Out-of-Scope Use
This model should NOT be used for:
- Direct medical diagnosis without professional oversight
- Replacing healthcare professionals
- Providing treatment recommendations without medical supervision
- Critical healthcare decisions without human verification
Bias, Risks, and Limitations
- Medical Accuracy: While trained on medical information, the model may still produce inaccurate or incomplete medical information.
- Language Limitation: The model is primarily trained to respond in Portuguese.
- Data Cutoff: Knowledge is limited to the training data and base model's knowledge cutoff.
- No Real-time Data: The model lacks access to real-time medical research or updates.
- Reasoning Limitations: The model attempts to provide reasoning but may not capture all relevant medical factors.
- Not a Medical Professional: This is an AI tool and should not replace professional medical advice.
Recommendations
- Always verify any medical information with qualified healthcare professionals.
- Use the model as a supplementary information tool, not as a primary source for medical decisions.
- Review the model's reasoning process to understand how it reached its conclusions.
- Be aware that the model may occasionally generate incorrect or incomplete information.
- Supervise model usage in healthcare settings.
How to Get Started with the Model
# !pip install unsloth git+https://github.com/huggingface/[email protected] -q
from unsloth import FastModel
import torch

# Load the Unsloth 4-bit base model
base_model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,  # Note: use the same quantization as in training
)

# Attach the fine-tuned LoRA adapters
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, "drguilhermeapolinario/gemma3-1b_med_reasoning")

# Query ("How does pancreatitis present?")
input_text = "Como uma pancreatite se manifesta?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
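Because the base model is instruction-tuned, results are usually better when the question is wrapped in Gemma's chat template rather than passed as raw text. A hedged variant of the query step above, using the standard transformers tokenizer API:

# Optional: format the query with the tokenizer's chat template
messages = [{"role": "user", "content": "Como uma pancreatite se manifesta?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))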
Training Details
Training Data
The model was fine-tuned on the "rhaymison/medicine-information-pt" dataset, which contains medical information in Portuguese. The dataset was modified and processed to include explicit reasoning components: each entry was transformed to separate questions from answers and reformatted to include dedicated reasoning sections, following the structured format with <start_working_out>, <end_working_out>, <SOLUTION>, and </SOLUTION> tags.
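A minimal sketch of that transformation, assuming the dataset exposes instruction/output-style columns; the field names "input" and "output" and the system prompt wording are assumptions, not the published preprocessing code:

from datasets import load_dataset

REASONING_START, REASONING_END = "<start_working_out>", "<end_working_out>"
SOLUTION_START, SOLUTION_END = "<SOLUTION>", "</SOLUTION>"

SYSTEM_PROMPT = (
    f"Reason about the medical question between {REASONING_START} and "
    f"{REASONING_END}, then give your answer between {SOLUTION_START} "
    f"and {SOLUTION_END}."
)

def to_grpo_example(row):
    # "input"/"output" are assumed column names for the question and reference answer
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": row["input"]},
        ],
        "answer": row["output"],
    }

dataset = load_dataset("rhaymison/medicine-information-pt", split="train")
dataset = dataset.map(to_grpo_example)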
Training Procedure
The model was trained using Group Relative Policy Optimization (GRPO) with custom reward functions designed to encourage and reward structured reasoning. The dataset was enhanced with explicit reasoning components before training. The training procedure included:
- Processing the dataset to separate questions and answers
- Defining a structured format for responses with explicit reasoning and solution sections
- Creating reward functions (a sketch follows this list) to evaluate:
- Exact format matching
- Approximate format compliance
- Answer quality
- Reasoning quality and depth
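The reward functions themselves were not released. The following is a minimal sketch of what the exact-format reward might look like, assuming the reward-function signature used by trl's GRPOTrainer (a batch of completions in, a list of float rewards out):

import re

FORMAT_RE = re.compile(
    r"<start_working_out>.+?<end_working_out>.*?<SOLUTION>.+?</SOLUTION>",
    re.DOTALL,
)

def exact_format_reward(completions, **kwargs):
    """Reward 1.0 for completions that follow the full reasoning format, else 0.0."""
    # Completions may be plain strings or chat-style message lists
    texts = [c[0]["content"] if isinstance(c, list) else c for c in completions]
    return [1.0 if FORMAT_RE.search(t) else 0.0 for t in texts]

The approximate-format, answer-quality, and reasoning-depth rewards described above would follow the same signature with softer scoring.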
Training Hyperparameters
- Training method: Parameter-Efficient Fine-Tuning (PEFT) with LoRA
- Learning rate: 5e-6
- Optimizer: AdamW with fused implementation
- Weight decay: 0.1
- Warmup ratio: 0.1
- Training steps: 300
- Batch size: 1 per device
- Gradient accumulation steps: 4
- Maximum sequence length: 1024
- Number of generations per step: 2
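Assuming the training used trl's GRPOTrainer (which Unsloth's GRPO workflow builds on), the hyperparameters above translate roughly to the configuration below. This is a reconstruction for orientation, not the original training script:

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate=5e-6,
    optim="adamw_torch_fused",
    weight_decay=0.1,
    warmup_ratio=0.1,
    max_steps=300,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_generations=2,            # completions sampled per prompt
    max_completion_length=1024,   # approximates the 1024 sequence-length budget
    output_dir="outputs",
)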
Evaluation
Metrics
The model was evaluated based on:
- Format compliance (use of specified tags)
- Reasoning quality (depth and use of medical terminology)
- Answer accuracy compared to reference answers
- Overall response coherence
Results
The model successfully learned to:
- Provide structured responses with separated reasoning and solution sections
- Include relevant medical terminology in its reasoning
- Deliver accurate medical information for common conditions and questions
Environmental Impact
- Hardware Type: NVIDIA A100-SXM4-40GB
- Cloud Provider: Google Colab
- Training Duration: Approximately 5 hours
Technical Specifications
Model Architecture and Objective
The model uses the Gemma-3-1B architecture with LoRA adapters applied to attention and MLP modules. The training objective was to optimize the model to:
- Follow a specific reasoning format
- Provide detailed medical reasoning
- Deliver accurate answers to medical questions in Portuguese
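A hedged sketch of a matching adapter configuration with peft; the rank, alpha, and exact target-module list are assumptions, since the card only states that attention and MLP modules were adapted:

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,           # rank: assumed, not stated in this card
    lora_alpha=8,  # assumed
    target_modules=[
        # attention projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        # MLP projections
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)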
Hardware Requirements
- GPU with at least 16GB memory recommended for inference
- 30GB+ GPU memory recommended for further fine-tuning
Software Requirements
- transformers >= 4.49.0
- peft (latest version)
- torch >= 2.0.0
- accelerate
Citation
If you use this model in your research, please cite:
@misc{medical-reasoning-assistant,
  author = {Rhaymison},
  title = {Medical Reasoning Assistant: A Fine-tuned Model for Structured Medical Information in Portuguese},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/drguilhermeapolinario/gemma3-1b_med_reasoning}}
}
Model Card Contact
For questions or issues related to this model, please contact the model author through Hugging Face.