Supervised fine-tuning and DPO implementation
Is there any news on integrating this model (and, more generally, the architecture with this recurrent block) with supervised fine-tuning and DPO implementations?
For example, I think full fine-tuning with the Hugging Face Trainer should run without too many problems, right?
But what about LoRA fine-tuning: do you think it is possible, and what modifications would be needed?
Hi, the current model definition can be used for fine-tuning (I can also upstream a few more recent changes to make it faster via flex attention). I don't use the Hugging Face Trainer personally; does it run for you? I'd be happy to debug potential issues if you can post error logs here.
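For reference, a minimal full fine-tuning sketch with the Hugging Face Trainer could look like the following. The model id, dataset file, and hyperparameters are placeholders, and it assumes the checkpoint loads via `AutoModelForCausalLM` with `trust_remote_code=True`; adjust sequence length, batch size, and learning rate for your setup.

```python
# Sketch: full supervised fine-tuning with the Hugging Face Trainer.
# "your-org/your-recurrent-model" and "train.txt" are placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "your-org/your-recurrent-model"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding in the collator

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Plain text dataset; swap in your own SFT data and formatting.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```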
For LoRA, you should be able to set a matching PEFT config, as shown here for new architectures: https://huggingface.co/docs/peft/main/en/developer_guides/custom_models#new-transformers-architectures.
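A minimal sketch along those lines, using PEFT's `LoraConfig` and `get_peft_model`. The `target_modules` names below are placeholders, not the actual module names of this architecture; inspect the model (e.g. with `print(model)`) and list the attention/MLP projection modules you want to adapt.

```python
# Sketch: wrapping a custom-architecture model with LoRA adapters via PEFT.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model_id = "your-org/your-recurrent-model"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Placeholder names: replace with the projection-module names
    # actually used by this model's recurrent block and attention layers.
    target_modules=["q_proj", "v_proj"],
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # sanity check: only adapters train
```

The wrapped `peft_model` can then be passed to the same Trainer setup as in the full fine-tuning sketch above.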