What is the correct way to store the adapters after PEFT fine-tuning?
I am fine-tuning the Mistral model using the following configuration:
```python
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=13000,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
)
```
```python
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field="column name",
    max_seq_length=3000,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
```
During this training, I get multiple checkpoints in the specified output directory `output_dir`.
Once training is over, I can save the model using `trainer.save_model()`.
In addition, I can save the final model using `trainer.model.save_pretrained("path")`.
So I am a bit confused. Which is the correct way to store the adapter after PEFT-based LoRA fine-tuning:
1 - take the lowest-loss checkpoint folder from `output_dir`,
or
2 - save the adapter using `trainer.save_model()`,
or
3 - save it using `trainer.model.save_pretrained("path")`?
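For option 1, each checkpoint folder written by the `Trainer` contains a `trainer_state.json` whose `log_history` records the logged training losses, so the lowest-loss checkpoint can be located programmatically. A minimal sketch (the helper name and the synthetic directory layout below are illustrative, not part of the original post):

```python
import json
import os
import tempfile

def lowest_loss_checkpoint(output_dir):
    """Return the checkpoint folder whose last logged training loss is lowest."""
    best_dir, best_loss = None, float("inf")
    for name in os.listdir(output_dir):
        state_path = os.path.join(output_dir, name, "trainer_state.json")
        if not (name.startswith("checkpoint-") and os.path.exists(state_path)):
            continue
        with open(state_path) as f:
            state = json.load(f)
        # log_history holds the step-wise logs; keep only entries that report a loss
        losses = [e["loss"] for e in state.get("log_history", []) if "loss" in e]
        if losses and losses[-1] < best_loss:
            best_dir, best_loss = os.path.join(output_dir, name), losses[-1]
    return best_dir

# Demo on a synthetic output_dir with two fake checkpoints
with tempfile.TemporaryDirectory() as out:
    for step, loss in [(500, 1.2), (1000, 0.8)]:
        ckpt = os.path.join(out, f"checkpoint-{step}")
        os.makedirs(ckpt)
        with open(os.path.join(ckpt, "trainer_state.json"), "w") as f:
            json.dump({"log_history": [{"step": step, "loss": loss}]}, f)
    best = lowest_loss_checkpoint(out)

print(best.endswith("checkpoint-1000"))  # True
```

Note that this only compares *training* losses at each save point; if you log an eval loss, comparing that is usually the better criterion.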
Hi @Pradeep1995,
thanks for the issue. If you want the model with the least loss (i.e. the "best" model), I would advise going with the first option. Otherwise, 2 and 3 achieve the same goal and save the final checkpoint. Note that you can also call `trainer.push_to_hub()`, and the trained adapters will be pushed to the Hub under your namespace together with the training logs.
@ybelkada
If I want the model with the least loss (i.e. the "best" model), I would go for the first option: the checkpoints.
So the checkpoints act as the adapters?
Yes, if you inspect the checkpoint folders, you should see `adapter_model.safetensors` and `adapter_config.json` files. Those are the adapter weights and configuration.
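As a quick sanity check, you can verify that a checkpoint folder is a loadable PEFT adapter by looking for those two files. The helper below is an illustrative sketch, not from the thread (note that older PEFT versions save `adapter_model.bin` instead of the `.safetensors` file):

```python
import os
import tempfile

ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")

def is_adapter_checkpoint(folder):
    """True if the folder contains the PEFT adapter config and weights."""
    return all(os.path.exists(os.path.join(folder, f)) for f in ADAPTER_FILES)

# Demo on a synthetic checkpoint folder containing empty adapter files
with tempfile.TemporaryDirectory() as ckpt:
    for f in ADAPTER_FILES:
        open(os.path.join(ckpt, f), "w").close()
    ok = is_adapter_checkpoint(ckpt)

print(ok)  # True
```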
@ybelkada Thanks. So if I use the lowest-loss checkpoint folder as the adapter, should I then merge that checkpoint into the base model using `merge_and_unload()`, or can I use the checkpoint folder directly for inference without merging?