Fine-Tuning Phi-4 with Unsloth
This tutorial walks through fine-tuning a Phi-4-mini language model with the Unsloth library. We'll load a pre-trained, 4-bit quantized checkpoint, attach LoRA adapters, and fine-tune it on a custom instruction dataset.
Prerequisites
Before you start, ensure you have the following installed:
Python 3.8 or later
PyTorch
Unsloth
Hugging Face transformers and datasets libraries
You can install the necessary libraries using pip:
pip install torch unsloth transformers datasets
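Optionally, you can sanity-check the environment before proceeding. The snippet below (a minimal check, not part of the original tutorial) prints the library versions and confirms that PyTorch can see a CUDA GPU:
# Optional: verify the environment before starting.
import torch
import transformers
import datasets

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("Datasets:", datasets.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))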
Step 1: Import Required Libraries
First, import the necessary libraries and modules:
from unsloth import FastLanguageModel
import torch
from datasets import Dataset, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
Step 2: Load the Pre-trained Model
Load a pre-trained model with Unsloth. Here we use the 4-bit quantized Phi-4-mini-instruct checkpoint published under the unsloth organization on Hugging Face; other supported models are listed in the Unsloth documentation:
max_seq_length = 2048
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
max_seq_length=max_seq_length,
load_in_4bit=load_in_4bit,
)
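As a quick, optional check (not in the original tutorial), you can confirm the model loaded and see its approximate GPU memory usage; get_memory_footprint() is a standard transformers model method:
# Optional: confirm the model and tokenizer loaded as expected.
print(type(model).__name__)
print(f"Approximate memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
print("Tokenizer vocabulary size:", len(tokenizer))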
Step 3: Prepare the Model for Fine-Tuning
Prepare the model for fine-tuning using PEFT (Parameter-Efficient Fine-Tuning):
model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing="unsloth",
random_state=3407,
use_rslora=False,
loftq_config=None,
)
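LoRA only trains the small adapter matrices; you can verify this by counting trainable parameters. A minimal, library-agnostic check (optional, not part of the original tutorial):
# Optional: count how many parameters LoRA actually trains.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")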
Step 4: Load and Prepare the Dataset
Load your dataset and format it for training. In this example we use a dataset with 'instruction', 'input', and 'output' fields (only 'input' and 'output' are used in the prompt template below). Note that load_dataset returns a DatasetDict by default, so select the 'train' split explicitly before iterating:
dataset = load_dataset('aifeifei798/Chinese-DeepSeek-R1-Distill-data-110k-alpaca', split='train')
text_data = {'text': []}
for example in dataset:
    input_text = example['input']
    output_text = example['output']
    text_format = f"<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|><|user|>{input_text}<|end|><|assistant|>{output_text}<|end|>"
    text_data['text'].append(text_format)
train_dataset = Dataset.from_dict(text_data)
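Before training, it is worth spot-checking that the chat template was applied as intended. A quick, optional check (not part of the original tutorial):
# Optional: inspect the first formatted example and the dataset size.
print(train_dataset[0]['text'][:300])
print('Number of training examples:', len(train_dataset))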
Step 5: Set Up the Trainer
Set up the SFTTrainer with the necessary training arguments:
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=train_dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
dataset_num_proc=2,
packing=False,
args=TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
warmup_steps=5,
max_steps=50,
learning_rate=2e-4,
fp16=not is_bfloat16_supported(),
bf16=is_bfloat16_supported(),
logging_steps=1,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="linear",
seed=3407,
output_dir="outputs",
save_steps=5,
save_total_limit=10,
report_to="none",
),
)
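With per_device_train_batch_size=2 and gradient_accumulation_steps=4, each optimizer step sees an effective batch of 2 × 4 = 8 examples per GPU, so the 50-step run above touches roughly 400 examples. A small illustrative calculation (values copied from the arguments above):
# Effective batch size per optimizer step (per GPU), using the values above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)       # 8
print(effective_batch_size * 50)  # ~400 examples seen over max_steps=50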
Step 6: Train the Model
Train the model using the SFTTrainer:
trainer_stats = trainer.train()
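trainer.train() returns a TrainOutput object from transformers; its metrics dictionary contains figures such as train_runtime and train_loss, which you can print for a quick summary (optional):
# Optional: print a summary of the finished run.
print(trainer_stats.metrics)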
Step 7: Save the Model
Save the fine-tuned model and tokenizer:
model.save_pretrained("drakidol-Phi-4-lora_model")
tokenizer.save_pretrained("drakidol-Phi-4_model")
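To use the adapters later in a fresh session, you can point from_pretrained at the saved directory. The sketch below assumes the directory name from the save call above and the same settings used in Step 2:
# Sketch: reload the saved LoRA adapters in a new session (directory name assumed from Step 7).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="drakidol-Phi-4-lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode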
Step 8: Test the Model
Test the model by generating responses to prompts:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
FastLanguageModel.for_inference(model)
messages = [{"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=64, use_cache=True, temperature=1.5, min_p=0.1)
print(tokenizer.batch_decode(outputs)[0])
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)
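The decoded output above includes the prompt as well as the completion. If you only want the newly generated text, one option (a small sketch, using the outputs tensor from the non-streaming generate call above) is to slice off the prompt tokens before decoding:
# Optional: decode only the generated continuation, not the prompt.
generated = outputs[:, inputs.shape[1]:]
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])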
This tutorial provides a basic guide to fine-tuning a language model using the Unsloth library. You can customize the dataset, model, and training parameters as needed for your specific use case.
Full Program
from unsloth import FastLanguageModel  # use FastVisionModel for vision/multimodal models
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
"unsloth/Meta-Llama-3.1-8B-bnb-4bit", # Llama-3.1 2x faster
"unsloth/Mistral-Small-Instruct-2409", # Mistral 22b 2x faster!
"unsloth/Phi-4", # Phi-4 2x faster!
"unsloth/Phi-4-unsloth-bnb-4bit", # Phi-4 Unsloth Dynamic 4-bit Quant
"unsloth/gemma-2-9b-bnb-4bit", # Gemma 2x faster!
"unsloth/Qwen2.5-7B-Instruct-bnb-4bit" # Qwen 2.5 2x faster!
"unsloth/Llama-3.2-1B-bnb-4bit", # NEW! Llama 3.2 models
"unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
"unsloth/Llama-3.2-3B-bnb-4bit",
"unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
] # More models at https://docs.unsloth.ai/get-started/all-our-models
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit",
max_seq_length = max_seq_length,
load_in_4bit = load_in_4bit,
# token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
model,
r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
# Target chat format (example): <|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>{model answer goes here}<|end|>
from datasets import Dataset, load_from_disk, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
# Load the dataset
dataset = load_dataset('aifeifei798/Chinese-DeepSeek-R1-Distill-data-110k-alpaca', split='train')
# Assuming the dataset has 'instruction', 'input', and 'output' fields
text_data = {
# 'instruction': [],
# 'input': [],
# 'output': [],
'text': []
}
# Iterate over the dataset and format the data
for example in dataset: # Iterate directly over the dataset
instruction = example['instruction']
input_text = example['input']
output_text = example['output']
# Format the text
text_format = f"<|system|>Your name is feifei, an AI math expert developed by DrakIdol.<|end|><|user|>{input_text}<|end|><|assistant|>{output_text}<|end|>"
# Append the formatted data
# text_data['instruction'].append(instruction)
# text_data['input'].append(input_text)
# text_data['output'].append(output_text)
text_data['text'].append(text_format)
# Convert the dictionary to a Dataset object
train_dataset = Dataset.from_dict(text_data)
del text_data
# Print the first entry of the dataset to verify the formatting
for i, row in enumerate(train_dataset.select(range(1))):
print(f"Row {i + 1}:")
for key in row.keys():
print(f"{key}: {row[key]}")
print("\n")
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=train_dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
dataset_num_proc=2,
packing=False, # Can make training 5x faster for short sequences.
args=TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
warmup_steps=5,
max_steps=50, # Number of optimizer steps; increase (e.g. to several thousand) for a full run
learning_rate=2e-4,
fp16=not is_bfloat16_supported(),
bf16=is_bfloat16_supported(),
logging_steps=1,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="linear",
seed=3407,
output_dir="outputs",
save_steps=5, # Save the model every 5 steps
save_total_limit=10, # Keep only the 10 most recent checkpoints
report_to="none", # Use this for WandB etc
#resume_from_checkpoint=True, # Resume from the latest checkpoint
#resume_from_checkpoint=checkpoint_path, # Resume from the specified checkpoint
),
)
trainer_stats = trainer.train()
trainer.model.save_pretrained("drakidol-Phi-4-lora_model") # Local saving
#model test
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
tokenizer,
chat_template = "phi-4",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
{"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize = True,
add_generation_prompt = True, # Must add for generation
return_tensors = "pt",
).to("cuda")
outputs = model.generate(
input_ids = inputs, max_new_tokens = 64, use_cache = True, temperature = 1.5, min_p = 0.1
)
tokenizer.batch_decode(outputs)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
{"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize = True,
add_generation_prompt = True, # Must add for generation
return_tensors = "pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
use_cache = True, temperature = 1.5, min_p = 0.1
)
# Save the trained model and tokenizer
model.save_pretrained("drakidol-Phi-4_model")
tokenizer.save_pretrained("drakidol-Phi-4_model")
Possible problems
On some transformers/PyTorch versions, training can fail with a TorchDynamo (torch.compile) error raised from the LongRoPE frequency update in transformers: the data-dependent branch if seq_len > original_max_position_embeddings cannot be traced. The output looks like this:
class GraphModule(torch.nn.Module):
def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", L_args_1_: "bf16[s0, s1, s2][s1*s2, s2, 1]cuda:0", L_args_2_: "i64[1, s1][s1, 1]cuda:0"):
l_args_1_ = L_args_1_
l_args_2_ = L_args_2_
# No stacktrace found for following nodes
_set_grad_enabled = torch._C._set_grad_enabled(False); _set_grad_enabled = None
# File: /home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py:45 in longrope_frequency_update, code: seq_len = torch.max(position_ids) + 1
max_1: "i64[][]cuda:0" = torch.max(l_args_2_); l_args_2_ = None
seq_len: "i64[][]cuda:0" = max_1 + 1; max_1 = None
# File: /home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py:50 in longrope_frequency_update, code: if seq_len > original_max_position_embeddings:
gt: "b8[][]cuda:0" = seq_len > 4096; seq_len = gt = None
Traceback (most recent call last):
File "/home/ubuntu/model/Phi-4/1.py", line 115, in <module>
trainer_stats = trainer.train()
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
return inner_training_loop(
File "<string>", line 315, in _fast_inner_training_loop
File "<string>", line 31, in _unsloth_training_step
File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/UnslothSFTTrainer.py", line 748, in compute_loss
outputs = super().compute_loss(
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1039, in _unsloth_pre_compute_loss
outputs = self._old_compute_loss(model, inputs, *args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/trainer.py", line 3801, in compute_loss
outputs = model(**inputs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 814, in forward
return model_forward(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 802, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/peft/peft_model.py", line 1757, in forward
return self.base_model(
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 193, in forward
return self.model.forward(*args, **kwargs)
File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 594, in forward
return Phi3ForCausalLM_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, cache_position, logits_to_keep, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
output = func(self, *args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 417, in Phi3ForCausalLM_forward
outputs: BaseModelOutputWithPast = self.model(
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 965, in wrapper
output = func(self, *args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 567, in forward
position_embeddings = self.rotary_emb(hidden_states, position_ids)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/model/Phi-4/unsloth_compiled_cache/unsloth_compiled_module_phi3.py", line 357, in forward
return Phi3RotaryEmbedding_forward(self, x, position_ids)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 659, in _fn
raise e.with_traceback(None) from None
torch._dynamo.exc.Unsupported: Data-dependent branching
Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.
Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
Hint: Use `torch.cond` to express dynamic control flow.
Developer debug context: attempted to jump with TensorVariable()
from user code:
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 70, in inner
return fn(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 86, in wrapper
longrope_frequency_update(self, position_ids, device=x.device)
File "/home/ubuntu/unsloth-venv/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 50, in longrope_frequency_update
if seq_len > original_max_position_embeddings:
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
Solution
Disabling Dynamo: if you suspect that TorchDynamo is causing the issue, you can disable it by setting the environment variable TORCHDYNAMO_DISABLE to 1 before launching the script:
export TORCHDYNAMO_DISABLE=1
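If you prefer to set this from Python instead of the shell, one option (a sketch; the variable must be set before torch and unsloth are imported for it to take effect) is to export it at the very top of the training script:
# Sketch: disable TorchDynamo from Python. Place this at the very top of the script,
# before importing torch or unsloth.
import os
os.environ["TORCHDYNAMO_DISABLE"] = "1"

from unsloth import FastLanguageModel
import torch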