This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct. It has been trained using TRL with GRPO (Group Relative Policy Optimization) for medical question answering.

Model Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Training Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
  • Hardware: Single GPU

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "iben/Tinymedical-o1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

question = "What are the common symptoms of diabetes?"
system_prompt = """You are a medical AI assistant. Provide detailed reasoning before giving your final answer.
Respond in the following format:

<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

prompt = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': question}
]

inputs = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
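Since the model is prompted to wrap its output in `<reasoning>` and `<answer>` tags, the two sections can be pulled apart with a small regex helper. This is a minimal sketch, not part of the model's release code; the `parse_response` name and the sample text are illustrative.

```python
import re

def parse_response(text):
    """Extract the <reasoning> and <answer> sections from a model response.

    Returns (reasoning, answer); either is None if its tag is missing.
    """
    def extract(tag):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return match.group(1).strip() if match else None

    return extract("reasoning"), extract("answer")

reasoning, answer = parse_response(
    "<reasoning>\nClassic symptoms include polyuria and polydipsia.\n</reasoning>\n"
    "<answer>\nIncreased thirst and frequent urination.\n</answer>"
)
```

Keeping the answer in a dedicated tag makes it easy to grade completions automatically, which is exactly what the format-adherence reward below relies on.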

Training Details

This model was trained using GRPO with multiple reward components:

  • Correctness reward (weight: 2.0)
  • Format adherence reward (weight: 1.0)
  • Reasoning quality reward (weight: 1.0)

Framework Versions

  • TRL: 0.13.0
  • Transformers: Latest
  • PyTorch: Latest
  • Flash Attention 2: Enabled
Safetensors

  • Model size: 1.54B params
  • Tensor type: BF16

Model tree for iben/Tinymedical-o1

  • Base model: Qwen/Qwen2.5-1.5B (fine-tuned)
  • Quantizations: 2 models