What is the best way to run inference after LoRA fine-tuning with the PEFT approach?
Here is the SFTTrainer setup I used for fine-tuning Mistral:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field="column name",  # text column of your dataset
    max_seq_length=3000,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
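For reference, training_arguments here is a standard transformers TrainingArguments object. A minimal sketch of what it might contain (all values below are placeholders, not the settings I actually used):

from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",            # checkpoints are written here
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=1000,
    logging_steps=50,
    save_steps=250,                    # periodic adapter checkpoints
    optim="paged_adamw_8bit",          # paged optimizer, common for 4-bit LoRA
    fp16=True,
)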
I found several different mechanisms for running inference with the fine-tuned model after PEFT-based LoRA fine-tuning.
Method - 1
Save the adapter after training completes, merge it with the base model, then use the merged model for inference:
trainer.model.save_pretrained("new_adapter_path")

import torch
from peft import PeftModel

finetuned_model = PeftModel.from_pretrained(
    base_model,  # the base model, loaded separately
    "new_adapter_path",
    torch_dtype=torch.float16,
    is_trainable=False,
    device_map="auto",
)
finetuned_model = finetuned_model.merge_and_unload()
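After merging, inference is just plain transformers generation. A minimal sketch (the prompt and generation settings are purely illustrative):

# generate with the merged model; prompt is a placeholder
inputs = tokenizer("Your prompt here", return_tensors="pt").to(finetuned_model.device)
outputs = finetuned_model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))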
Method - 2
Save checkpoints during training, then load the checkpoint with the lowest loss:
from peft import PeftModel

finetuned_model = PeftModel.from_pretrained(
    base_model,
    "least loss checkpoint path",
    torch_dtype=torch.float16,
    is_trainable=False,
    device_map="auto",
)
finetuned_model = finetuned_model.merge_and_unload()
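To find that checkpoint programmatically, you can scan the trainer_state.json that the Trainer writes into each checkpoint-<step> folder. A sketch, assuming training loss was logged and with a placeholder output directory:

import glob
import json
import os

best_path, best_loss = None, float("inf")
for ckpt in glob.glob(os.path.join("output_dir", "checkpoint-*")):
    state_file = os.path.join(ckpt, "trainer_state.json")
    if not os.path.exists(state_file):
        continue
    with open(state_file) as f:
        state = json.load(f)
    # log_history holds entries like {"loss": ..., "step": ...};
    # the last one reflects the loss around the time this checkpoint was saved
    losses = [entry["loss"] for entry in state["log_history"] if "loss" in entry]
    if losses and losses[-1] < best_loss:
        best_loss, best_path = losses[-1], ckpt
print(best_path, best_loss)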
Method - 3
The same approach, but with the AutoPeftModelForCausalLM class, which loads the base model and attaches the adapter in one call:
from peft import AutoPeftModelForCausalLM

finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    "output directory checkpoint path",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda",
)
finetuned_model = finetuned_model.merge_and_unload()
Method - 4
AutoPeftModelForCausalLM pointed at the output folder itself, without picking a specific checkpoint:
instruction_tuned_model = AutoPeftModelForCausalLM.from_pretrained(
    training_args.output_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
finetuned_model = instruction_tuned_model.merge_and_unload()
Method - 5
Any of the above methods, but skipping the merge step:
# finetuned_model = finetuned_model.merge_and_unload()
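An unmerged PeftModel can generate directly; the adapter is applied on the fly, which adds a little latency but keeps the base weights untouched and lets you swap or disable adapters. A sketch (prompt is a placeholder):

# generate with the adapter still attached (no merge)
inputs = tokenizer("Your prompt here", return_tensors="pt").to(finetuned_model.device)
outputs = finetuned_model.generate(**inputs, max_new_tokens=200)

# temporarily bypass the adapter, e.g. to compare against the base model
with finetuned_model.disable_adapter():
    base_outputs = finetuned_model.generate(**inputs, max_new_tokens=200)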
Which of these is the method I should actually follow for inference?
And when should I prefer one method over another?
I use "Method 1" and it works fine always. Better to save adapter checkpoints which are smaller in size and merge for once with base model rather than saving entire base model checkpoints everytime.
By the way, can you share a sample notebook for fine-tuning? I was using this one: https://colab.research.google.com/drive/1VDa0lIfqiwm16hBlIlEaabGVTNB3dN1A?usp=sharing
But my training loss starts to increase after 1000 steps for some reason. Any ideas? I'm running on a custom dataset. I tried both the Alpaca and Mistral templates, although that shouldn't matter much for fine-tuning, I guess.
@sumegh try a lower learning rate, which should bring the loss down. Do you have any idea how to select the max_steps parameter?
That's only needed if training a full epoch is not feasible for you; otherwise just set num_train_epochs = 1. If you do use max_steps, work out the total number of steps in a single epoch from your batch size and set max_steps below that, as sketched below.
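A quick sketch of that calculation (dataset size and batch settings are made-up numbers):

import math

num_examples = 20_000                  # size of your training set (placeholder)
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch_size)   # 1250 in this example
# pick max_steps < steps_per_epoch to train a partial epoch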
Can you share your fine-tuning notebook for reference?
Sharing the notebook isn't possible for security reasons; it's confidential at my organization's level.
Okay, no issues. Also, what optimizer are you using? I was doing 4-bit LoRA fine-tuning, using the paged_adamw_8bit optimizer from the Hugging Face training config.
I am using paged_adamw_32bit.