## Setting Up

In [1]:
%%capture
%pip install -U accelerate 
%pip install -U peft 
%pip install -U trl 
%pip install -U bitsandbytes
%pip install -U transformers
%pip install -U tensorboard
%pip install -U openai-harmony
%pip install -U tiktoken
%pip install -U pyctcdecode

In [2]:
import os

from huggingface_hub import login

# Read the Hugging Face access token from the environment
HF_TOKEN = os.getenv("HF_TOKEN")

login(token=HF_TOKEN)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


## Configs

In [3]:
BASE_MODEL_ID = "openai/gpt-oss-20b"
SAVED_MODEL_ID = "gpt-oss-20b-dermatology-qa"
DATASET_NAME = "kingabzpro/dermatology-qa-firecrawl-dataset"

## Loading the model and tokenizer

In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config

quantization_config = Mxfp4Config(dequantize=True)
model_kwargs = dict(
    attn_implementation="eager",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    use_cache=False,
    device_map="auto",
)

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, **model_kwargs)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

## Loading and processing the dataset

In [5]:
from openai_harmony import (
    Conversation,
    DeveloperContent,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the Harmony encoder once
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

In [6]:
DERM_DEV_INSTRUCTIONS = (
    "You are a board-certified dermatologist answering various dermatology questions."
    " Answer clearly in 1–3 sentences. No speculation."
)


def render_pair_harmony(question: str, answer: str) -> str:
    """Harmony-formatted prompt for training."""
    convo = Conversation.from_messages(
        [
            Message.from_role_and_content(
                Role.DEVELOPER,
                DeveloperContent.new().with_instructions(DERM_DEV_INSTRUCTIONS),
            ),
            Message.from_role_and_content(Role.USER, question.strip()),
            Message.from_role_and_content(Role.ASSISTANT, answer.strip()),
        ]
    )
    tokens = enc.render_conversation(convo)
    return enc.decode(tokens)


In [7]:
from datasets import load_dataset

# Load dataset
dataset = load_dataset(DATASET_NAME, split="train")


def to_harmony_batch(examples: dict) -> dict:
    """Convert batch of dermatology Q&A pairs to harmony format."""
    questions = examples["question"]
    answers = examples["answer"]

    formatted_texts = []
    for question, answer in zip(questions, answers):
        formatted_text = render_pair_harmony(question.strip(), answer.strip())
        formatted_texts.append(formatted_text)

    return {"text": formatted_texts}


# Process dataset
dataset = dataset.map(to_harmony_batch, batched=True)
dataset = dataset.train_test_split(test_size=0.1, seed=42)

print(dataset)

print(dataset["train"][0]["text"])




Map:   0%|          | 0/1001 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'condition', 'difficulty', 'source_url', 'text'],
        num_rows: 900
    })
    test: Dataset({
        features: ['question', 'answer', 'condition', 'difficulty', 'source_url', 'text'],
        num_rows: 101
    })
})
<|start|>developer<|message|># Instructions

You are a board-certified dermatologist answering various dermatology questions. Answer clearly in 1–3 sentences. No speculation.<|end|><|start|>user<|message|>What type of skin changes accompany the pustules in GPP?<|end|><|start|>assistant<|message|>The skin surrounding the pustules becomes erythematous, which means it appears red and inflamed. The affected skin is also painful. These changes occur during the recurrent flares of the disease.<|end|>


## Model inference before fine-tuning

In [8]:
def render_inference_harmony(question: str) -> str:
    """Harmony-formatted prompt for inference."""
    convo = Conversation.from_messages(
        [
            Message.from_role_and_content(
                Role.DEVELOPER,
                DeveloperContent.new().with_instructions(DERM_DEV_INSTRUCTIONS),
            ),
            Message.from_role_and_content(Role.USER, question.strip()),
        ]
    )
    tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
    return enc.decode(tokens)


In [9]:
question = dataset["test"][20]["question"]

text = render_inference_harmony(question)

inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0])


<|start|>developer<|message|># Instructions

You are a board-certified dermatologist answering various dermatology questions. Answer clearly in 1–3 sentences. No speculation.<|end|><|start|>user<|message|>Why might winter be a problematic season for some people with eczema?<|end|><|start|>assistant<|channel|>analysis<|message|>They ask: "Why might winter be a problematic season for some people with eczema?" A dermatologist must answer succinctly. We'll provide reasons: cold, dry air, indoor heating increases dryness, reduces skin barrier, triggers flare-ups. Also less humidity helps dryness, exposure to indoor allergens, etc. Provide 1-3 sentences. Must be no speculation. Provide factual explanation. Must answer clearly.<|end|><|start|>assistant<|channel|>final<|message|>Winter can trigger eczema flare‑ups because cold, dry air and indoor heating strip the skin of moisture, compromising the skin barrier and making it more prone to irritation and infection. Lower humidity also increases

In [10]:
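final_marker = "<|start|>assistant<|channel|>final<|message|>"
marker_pos = response[0].find(final_marker)
# find() returns -1 when the final channel is missing (e.g. truncated output);
# fall back to the start of the text instead of slicing from a bogus offset
start_idx = marker_pos + len(final_marker) if marker_pos != -1 else 0
end_idx = response[0].rfind("<|return|>") if "<|return|>" in response[0] else len(response[0])
final_answer = response[0][start_idx:end_idx].strip()
print(final_answer)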


Winter can trigger eczema flare‑ups because cold, dry air and indoor heating strip the skin of moisture, compromising the skin barrier and making it more prone to irritation and infection. Lower humidity also increases scratching and can worsen inflammation, so maintaining skin hydration is particularly important during the colder months.
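
The slicing above only looks for the `final` channel. A slightly more general sketch, using only the standard library, splits a decoded Harmony string into role/channel/content records so that the `analysis` and `final` channels can be inspected separately. The regular expression and the helper names `split_harmony_messages` and `final_answer_from` are illustrative assumptions inferred from the token layout in the output above; they are not part of the `openai_harmony` API and do not cover features such as tool-call recipients.

```python
import re
from typing import List, Optional

# Pattern inferred from the decoded output shown above (an assumption, not an official grammar):
# <|start|>ROLE[<|channel|>CHANNEL]<|message|>CONTENT, ended by a stop tag, a new message, or end of text.
_MESSAGE_RE = re.compile(
    r"<\|start\|>(?P<role>[a-z]+)"
    r"(?:<\|channel\|>(?P<channel>[a-z]+))?"
    r"<\|message\|>(?P<content>.*?)"
    r"(?=<\|end\|>|<\|return\|>|<\|start\|>|\Z)",
    re.DOTALL,
)


def split_harmony_messages(decoded: str) -> List[dict]:
    """Split a decoded Harmony string into role/channel/content records."""
    return [
        {
            "role": m.group("role"),
            "channel": m.group("channel"),  # None when no channel header is present
            "content": m.group("content").strip(),
        }
        for m in _MESSAGE_RE.finditer(decoded)
    ]


def final_answer_from(decoded: str) -> Optional[str]:
    """Return the last assistant message on the `final` channel, or None."""
    finals = [
        msg["content"]
        for msg in split_harmony_messages(decoded)
        if msg["role"] == "assistant" and msg["channel"] == "final"
    ]
    return finals[-1] if finals else None


# Hypothetical, shortened sample in the same layout as the model output
sample = (
    "<|start|>user<|message|>Why is winter hard on eczema?<|end|>"
    "<|start|>assistant<|channel|>analysis<|message|>Think about dry air.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>Dry indoor air weakens the skin barrier.<|return|>"
)
print(final_answer_from(sample))
```

Returning `None` when no `final` channel exists makes a truncated generation easy to detect, instead of silently slicing from the wrong offset.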


In [11]:
dataset["test"][20]["answer"]

'During winter, indoor air tends to be dry, which can trigger eczema flare‑ups for some individuals. The dryness of indoor environments in winter is a known trigger for these patients.'

## Setting up the model

In [12]:
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

trainable params: 15,040,512 || all params: 20,929,797,696 || trainable%: 0.0719
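
The trainable-parameter count can be sanity-checked by hand. The sketch below is one breakdown that reproduces the reported figure. The model dimensions (hidden size, head counts, number of layers and experts) are assumptions about the `openai/gpt-oss-20b` config that this notebook does not state, as is the idea that the adapter rank for the 3-D expert tensors is multiplied by the number of experts. Treat it as a plausibility check, not a description of PEFT internals.

```python
# Assumed gpt-oss-20b dimensions (not stated in this notebook; check the model config).
HIDDEN = 2880          # hidden size
Q_OUT = 64 * 64        # query heads * head dim
KV_OUT = 8 * 64        # key/value heads * head dim
NUM_LAYERS = 24
NUM_EXPERTS = 32
EXPERT_DIM = 2880      # expert intermediate size
R = 8                  # LoRA rank from the LoraConfig above


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one adapter: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


# Attention projections matched by target_modules="all-linear"
attention_per_layer = (
    lora_params(HIDDEN, Q_OUT, R)     # q_proj
    + lora_params(HIDDEN, KV_OUT, R)  # k_proj
    + lora_params(HIDDEN, KV_OUT, R)  # v_proj
    + lora_params(Q_OUT, HIDDEN, R)   # o_proj
)
attention_total = attention_per_layer * NUM_LAYERS

# Expert tensors listed in target_parameters (layers 7, 15 and 23).
# Assumption: the effective rank is R * NUM_EXPERTS for these 3-D parameters.
expert_rank = R * NUM_EXPERTS
experts_per_layer = (
    lora_params(HIDDEN, 2 * EXPERT_DIM, expert_rank)  # gate_up_proj
    + lora_params(EXPERT_DIM, HIDDEN, expert_rank)    # down_proj
)
experts_total = experts_per_layer * 3

total = attention_total + experts_total
print(f"{total:,}")
print(f"{100 * total / 20_929_797_696:.4f}%")
```

Under these assumptions roughly a quarter of the trainable parameters sit in the attention projections, and the rest in the three targeted expert layers.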




In [13]:
from trl import SFTConfig

training_args = SFTConfig(
    learning_rate=2e-4,
    gradient_checkpointing=True,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    max_length=2048,
    warmup_ratio=0.03,
    eval_strategy="steps",
    eval_steps=10,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},
    output_dir=SAVED_MODEL_ID,
    report_to="tensorboard",
    push_to_hub=True,
)
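
To get a feel for what `lr_scheduler_type="cosine_with_min_lr"` with `min_lr_rate=0.1` and `warmup_ratio=0.03` does, the following standard-library sketch approximates the learning-rate curve. The total of 57 optimizer steps is taken from the training output later in this notebook. The formula itself (linear warmup, then a half cosine that decays to `min_lr_rate * learning_rate`) is an approximation of the scheduler's shape, not a copy of the `transformers` implementation.

```python
import math

BASE_LR = 2e-4        # learning_rate
MIN_LR_RATE = 0.1     # lr_scheduler_kwargs["min_lr_rate"]
WARMUP_RATIO = 0.03   # warmup_ratio
TOTAL_STEPS = 57      # global_step reported by trainer.train()

# Assumption: warmup steps are the ratio times total steps, rounded up
warmup_steps = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Approximate learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 to BASE_LR
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Rescale the cosine so it bottoms out at MIN_LR_RATE * BASE_LR instead of 0
    return BASE_LR * (MIN_LR_RATE + (1.0 - MIN_LR_RATE) * cosine)


for step in (0, 1, 2, 30, 57):
    print(step, f"{lr_at(step):.2e}")
```

With these settings the rate peaks at `2e-4` after two warmup steps and ends near `2e-5`, so late updates stay small but never vanish.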

## Model Training

In [14]:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
)
trainer.train()

Adding EOS to train dataset:   0%|          | 0/900 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/900 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/900 [00:00<?, ? examples/s]

Adding EOS to eval dataset:   0%|          | 0/101 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/101 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/101 [00:00<?, ? examples/s]

Step,Training Loss,Validation Loss
10,4.9702,2.089929
20,1.4549,0.981939
30,0.8719,0.867009
40,0.8309,0.836862
50,0.845,0.823363


TrainOutput(global_step=57, training_loss=1.6722588371812253, metrics={'train_runtime': 402.1596, 'train_samples_per_second': 2.238, 'train_steps_per_second': 0.142, 'total_flos': 1.1781569356397568e+16, 'train_loss': 1.6722588371812253})
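
The numbers in `TrainOutput` follow from the dataset size and the batch settings. The arithmetic below reproduces them, assuming a single training process (`num_devices = 1`), which the notebook does not state explicitly.

```python
import math

train_rows = 900           # rows in dataset["train"]
per_device_batch = 8       # per_device_train_batch_size
grad_accum = 2             # gradient_accumulation_steps
num_devices = 1            # assumption: one training process
epochs = 1                 # num_train_epochs
train_runtime = 402.1596   # seconds, from TrainOutput

# Sequences consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_devices
# The last, partially filled accumulation window still counts as a step
steps = math.ceil(train_rows / effective_batch) * epochs

print(effective_batch)
print(steps)
print(round(train_rows * epochs / train_runtime, 3))  # samples per second
print(round(steps / train_runtime, 3))                # steps per second
```

The results (16, 57, 2.238 and 0.142) match `global_step`, `train_samples_per_second` and `train_steps_per_second` above. With `eval_steps=10`, evaluation runs at steps 10 through 50, which is why the loss table has five rows.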

In [15]:
trainer.save_model(SAVED_MODEL_ID)
# dataset_name fills in the model card; it should name the training dataset, not the model
trainer.push_to_hub(dataset_name=DATASET_NAME)

Processing Files (0 / 0)                : |          |  0.00B /  0.00B            

New Data Upload                         : |          |  0.00B /  0.00B            

  ...events.1756035182.51336aae7b3d.93.0: 100%|##########| 6.77kB / 6.77kB            

  ...events.1756035572.51336aae7b3d.93.1: 100%|##########| 10.3kB / 10.3kB            

  ...ents.1756037571.51336aae7b3d.2438.0: 100%|##########| 6.45kB / 6.45kB            

  ...ents.1756037458.51336aae7b3d.2174.0: 100%|##########| 6.45kB / 6.45kB            

  ...ents.1756037652.51336aae7b3d.2641.0: 100%|##########| 10.4kB / 10.4kB            

  ...0b-dermatology-qa/training_args.bin: 100%|##########| 6.16kB / 6.16kB            

  ...s-20b-dermatology-qa/tokenizer.json: 100%|##########| 27.9MB / 27.9MB            

  ...tology-qa/adapter_model.safetensors:  70%|######9   | 41.8MB / 60.2MB            

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/kingabzpro/gpt-oss-20b-dermatology-qa/commit/b1706fde1cbc1942ccf763061fa31b22e5b61cc6', commit_message='End of training', commit_description='', oid='b1706fde1cbc1942ccf763061fa31b22e5b61cc6', pr_url=None, repo_url=RepoUrl('https://huggingface.co/kingabzpro/gpt-oss-20b-dermatology-qa', endpoint='https://huggingface.co', repo_type='model', repo_id='kingabzpro/gpt-oss-20b-dermatology-qa'), pr_revision=None, pr_num=None)

## Model inference after fine-tuning

In [1]:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "openai/gpt-oss-20b"
SAVED_LORA_MODEL_ID = "gpt-oss-20b-dermatology-qa"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)

# Load the original model first
model_kwargs = dict(
    attn_implementation="eager", torch_dtype="auto", use_cache=True, device_map="cuda"
)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID, **model_kwargs
)

# Merge fine-tuned weights with the base model
model = PeftModel.from_pretrained(base_model, SAVED_LORA_MODEL_ID)
model = model.merge_and_unload()

MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]



In [2]:
from datasets import load_dataset

# Load dataset
dataset = load_dataset("kingabzpro/dermatology-qa-firecrawl-dataset", split="train")
dataset = dataset.train_test_split(test_size=0.1, seed=42)

In [9]:
from openai_harmony import (
    Conversation,
    DeveloperContent,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the Harmony encoder once
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

DERM_DEV_INSTRUCTIONS = (
    "You are a board-certified dermatologist answering various dermatology questions."
    " Answer clearly in 1–3 sentences. No speculation."
)

def render_inference_harmony(question: str) -> str:
    """Harmony-formatted prompt for inference."""
    convo = Conversation.from_messages(
        [
            Message.from_role_and_content(
                Role.DEVELOPER,
                DeveloperContent.new().with_instructions(DERM_DEV_INSTRUCTIONS),
            ),
            Message.from_role_and_content(Role.USER, question.strip()),
        ]
    )
    tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
    return enc.decode(tokens)

def extract_final_answer(text: str) -> str:
    """Return the assistant's answer from a decoded Harmony string."""
    # Find the start of the assistant's message
    # (the training samples carry no channel header, so none is expected here)
    start_marker = "<|start|>assistant<|message|>"
    start_idx = text.find(start_marker)

    if start_idx == -1:
        return "No answer found in the text"

    # Move to the beginning of the actual answer
    start_idx += len(start_marker)

    # The answer ends at the first stop tag, or at the end of the text if none is present
    tag_positions = [text.find(tag, start_idx) for tag in ("<|end|>", "<|return|>")]
    found = [pos for pos in tag_positions if pos != -1]
    end_idx = min(found) if found else len(text)

    # Extract and clean the answer
    return text[start_idx:end_idx].strip()

In [10]:
question = dataset["test"][20]["question"]

text = render_inference_harmony(question)

inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
final_answer = extract_final_answer(response[0])
print(final_answer)

During winter, the dry skin that is typical in eczema can become even more dry, which may worsen eczema symptoms. The lack of moisture in the air can increase skin dryness.


In [14]:
text

'<|start|>developer<|message|># Instructions\n\nYou are a board-certified dermatologist answering various dermatology questions. Answer clearly in 1–3 sentences. No speculation.<|end|><|start|>user<|message|>How does the source suggest clinicians approach the diagnosis of rosacea?<|end|><|start|>assistant'

In [11]:
dataset["test"][20]["answer"]

'During winter, indoor air tends to be dry, which can trigger eczema flare‑ups for some individuals. The dryness of indoor environments in winter is a known trigger for these patients.'
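
Comparing the generated answer with the reference by eye works for a couple of samples. For a rough numeric signal, a token-overlap F1 score is one simple option. The sketch below is not part of the original notebook: the `token_f1` helper and its tokenization rule (lowercased `\w+` tokens) are illustrative assumptions, and the score measures word overlap only, not clinical correctness.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (lowercased word tokens)."""
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Generated and reference answers for test item 20, copied from the cells above
prediction = (
    "During winter, the dry skin that is typical in eczema can become even more dry, "
    "which may worsen eczema symptoms. The lack of moisture in the air can increase skin dryness."
)
reference = (
    "During winter, indoor air tends to be dry, which can trigger eczema flare‑ups for some "
    "individuals. The dryness of indoor environments in winter is a known trigger for these patients."
)
print(round(token_f1(prediction, reference), 3))
```

Averaging such a score over `dataset["test"]` before and after fine-tuning would give a coarse comparison between the base and fine-tuned model.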

In [12]:
question = dataset["test"][50]["question"]

text = render_inference_harmony(question)

inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
final_answer = extract_final_answer(response[0])
print(final_answer)

The source suggests that clinicians approach the diagnosis of rosacea as a dynamic process that requires ongoing reassessment as patients develop new symptoms or present new presentations, rather than a one-time determination. This approach is intended to address changing manifestations within patients and evolving presentations in the population.


In [13]:
dataset["test"][50]["answer"]

'The source suggests using a stepped approach, which typically involves evaluating the patient and then progressing through treatment options as needed. The text also mentions a differential diagnosis list to aid in distinguishing rosacea from other similar conditions.'

In [4]:
from transformers import pipeline

# Load pipeline
generator = pipeline(
    "text-generation",
    model="kingabzpro/gpt-oss-20b-dermatology-qa",
    device="cuda"  # or device=0
)


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cuda


In [5]:
question = "How does the source suggest clinicians approach the diagnosis of rosacea?"

output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=200,
    return_full_text=False
)[0]

print(output["generated_text"])

The source advises that clinicians should not rely solely on clinical presentation to diagnose rosacea. Instead, they should use a standardized, validated diagnostic tool such as the 2016 International Rosacea Consensus (IRC) criteria to confirm the diagnosis. This approach ensures a consistent and evidence‑based assessment rather than a subjective interpretation of symptoms.


In [6]:
prompt = "<|start|>developer<|message|># Instructions\n\nYou are a board-certified dermatologist answering various dermatology questions. Answer clearly in 1–3 sentences. No speculation.<|end|><|start|>user<|message|>How does the source suggest clinicians approach the diagnosis of rosacea?<|end|><|start|>assistant"

output = generator(
    prompt,
    max_new_tokens=200,
    return_full_text=False
)[0]

print(output["generated_text"])

The source indicates that clinicians should consider rosacea when patients present with erythematous facial skin and may need to differentiate it from other conditions such as acne. Recognizing these features helps in identifying rosacea.
