Hyperparameter optimization with LoRA on Whisper
Hi, I'm on Python 3.10.12 with the following packages installed:
torch == 2.3.1
torchaudio == 2.3.1
torchvision == 0.18.1
transformers == 4.41.2
and the following hardware:
32 GB DDR5 RAM
RTX 4070 Super (12 GB VRAM)
Ryzen 5 7600X
I'm trying to fine-tune whisper-large-v3 with PEFT LoRA locally on my GPU for a transcription task (audio to text), and I want to tune its hyperparameters as described in this guide: https://huggingface.co/docs/transformers/hpo_train
Here is the core part of the code I use:
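import os

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrainerCallback,
    TrainerControl,
    TrainerState,
    TrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3", language="italian", task="transcribe")

# Freeze the base model weights before adding the adapter
model = prepare_model_for_kbit_training(model)

# Forward hook: make the output of the encoder's first conv layer require grad
def make_inputs_require_grad(module, input, output):
    output.requires_grad_(True)

model.model.encoder.conv1.register_forward_hook(make_inputs_require_grad)

# Wrap the model with LoRA adapters on the attention query/value projections
config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none")
model = get_peft_model(model, config)

# Force Italian transcription at generation time
forced_decoder_ids = processor.get_decoder_prompt_ids(language="italian", task="transcribe")
model.config.forced_decoder_ids = forced_decoder_ids
model.config.suppress_tokens = []
model.generation_config.language = "it"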
# Training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-lora",  # placeholder path; output_dir is required in transformers 4.41
    # per_device_train_batch_size=16,  # commented out because I want to optimize this parameter
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=1e-3,
    num_train_epochs=7,
    warmup_steps=50,
    eval_strategy="steps",
    fp16=True,
    logging_steps=50,
    generation_max_length=128,
    remove_unused_columns=False,
    label_names=["labels"],
)
# This callback saves only the adapter weights and removes the base model weights.
class SavePeftModelCallback(TrainerCallback):
    def on_save(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs,
    ):
        checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
        peft_model_path = os.path.join(checkpoint_folder, "adapter_model")
        kwargs["model"].save_pretrained(peft_model_path)
        pytorch_model_path = os.path.join(checkpoint_folder, "pytorch_model.bin")
        if os.path.exists(pytorch_model_path):
            os.remove(pytorch_model_path)
        return control
trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=final_ds["train"],
    eval_dataset=final_ds["val"],
    data_collator=data_collator,
    tokenizer=processor.feature_extractor,
    callbacks=[SavePeftModelCallback],
)
def optuna_hp_space(trial):
    return {
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [8, 16]),
    }

def compute_obj(metrics):
    return metrics["eval_loss"]

best_trials = trainer.hyperparameter_search(
    direction=["minimize"],
    hp_space=optuna_hp_space,
    n_trials=10,
    compute_objective=compute_obj,
)
print(best_trials)
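For a sense of what the configuration above implies, here is some back-of-the-envelope arithmetic as a stdlib-only sketch. It assumes whisper-large-v3's published dimensions (hidden size 1280, 32 encoder and 32 decoder layers) and a made-up training-set size of 1,000 examples, since the real size isn't given; the function names are illustrative. The LoRA count should agree with the trainable figure that model.print_trainable_parameters() reports for this config, and the step count follows how Trainer derives the number of optimizer steps from num_train_epochs on a single GPU.

```python
import math

# Assumed whisper-large-v3 dimensions (from its published model config):
# hidden size 1280, 32 encoder layers, 32 decoder layers.
D_MODEL = 1280
ENCODER_LAYERS = 32
DECODER_LAYERS = 32


def lora_params_per_linear(r: int, d_in: int, d_out: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen linear layer."""
    return r * (d_in + d_out)


def whisper_qv_lora_params(r: int) -> int:
    """Trainable LoRA parameters for target_modules=["q_proj", "v_proj"], bias="none"."""
    # Encoder layers have one attention block (self-attention); decoder layers
    # have two (self-attention and cross-attention over the encoder output).
    attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
    # q_proj and v_proj are both D_MODEL x D_MODEL in every attention block.
    adapted_linears = 2 * attention_blocks
    return adapted_linears * lora_params_per_linear(r, D_MODEL, D_MODEL)


def effective_batch_size(per_device_batch: int, grad_accum: int, n_devices: int = 1) -> int:
    """Examples that contribute to one optimizer step."""
    return per_device_batch * grad_accum * n_devices


def optimizer_steps(n_examples: int, per_device_batch: int, grad_accum: int, epochs: int) -> int:
    """Approximate total optimizer steps on one GPU: (batches per epoch // grad_accum) * epochs."""
    batches_per_epoch = math.ceil(n_examples / per_device_batch)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    print("LoRA trainable params at r=32:", whisper_qv_lora_params(32))
    # n_examples=1000 is a made-up training-set size, for illustration only.
    for batch in (8, 16):  # the candidates in optuna_hp_space
        steps = optimizer_steps(1000, batch, grad_accum=2, epochs=7)
        print(f"batch {batch}: effective {effective_batch_size(batch, 2)}, {steps} optimizer steps")
```

Because warmup_steps=50 is counted in optimizer steps, moving from batch size 8 to 16 roughly halves the total number of steps and so roughly doubles the share of training spent in warmup, which can affect how trials compare on eval_loss.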
Running trainer.hyperparameter_search fails with the error "RuntimeError: To use hyperparameter search, you need to pass your model through a model_init function."
I noticed that the tutorial I was following links to the TrainerHyperParameterMultiObjectOptunaIntegrationTest class in this file:
https://github.com/huggingface/transformers/blob/main/tests/trainer/test_trainer.py
but I don't think the model_init approach fits my case, so maybe I need to write custom code. Can someone help me?
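For comparison, the HPO guide linked above passes a model_init callable to the Trainer instead of a model, so that a fresh model can be built for every trial. Below is a minimal, untested sketch of how the setup from the script above could be moved into such a function, assuming the same checkpoint, LoRA settings and Italian decoder options. MODEL_NAME is an illustrative constant, and the peft/transformers imports are placed inside the function only so the snippet can be loaded without those packages.

```python
# Untested sketch: the model setup from the question, moved into a model_init
# function so Trainer.hyperparameter_search can rebuild the model per trial.
# peft/transformers are imported inside the function only so that this
# snippet can be imported without those packages installed.

MODEL_NAME = "openai/whisper-large-v3"  # same checkpoint as in the question


def make_inputs_require_grad(module, input, output):
    # Forward hook from the question: mark conv1's output as requiring grad.
    output.requires_grad_(True)


def model_init(trial=None):
    """Return a freshly loaded Whisper model wrapped with the LoRA adapter.

    Trainer calls this when it is constructed (with no model argument) and
    again at the start of every trial. `trial` is the Optuna trial; it is
    unused here because the searched value (per_device_train_batch_size) is a
    TrainingArguments field that the Trainer applies itself.
    """
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME)
    model = prepare_model_for_kbit_training(model)
    model.model.encoder.conv1.register_forward_hook(make_inputs_require_grad)

    lora_config = LoraConfig(
        r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none"
    )
    model = get_peft_model(model, lora_config)

    # Decoder settings from the question (the processor only supplies the prompt ids).
    processor = WhisperProcessor.from_pretrained(MODEL_NAME, language="italian", task="transcribe")
    model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="italian", task="transcribe")
    model.config.suppress_tokens = []
    model.generation_config.language = "it"
    return model
```

With this, the trainer would be created with model_init=model_init in place of model=model, keeping the other arguments as in the question, and trainer.hyperparameter_search(...) would be called unchanged.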