Supervised fine-tuning and DPO implementation

#11
by Vitabile - opened

Is there any news regarding the integration of this model (and, in general, this architecture with its recurrent block) with supervised fine-tuning and DPO implementations?

For example, with the Hugging Face Trainer, I think full fine-tuning should run without too many problems, right?

But what about LoRA fine-tuning, do you think it is possible? Or what modifications would be needed?

Tom Goldstein's Lab at University of Maryland, College Park org

Hi, the current model definition can be used for fine-tuning (I can also upstream a few more recent changes to make it faster via flex attention). I don't use the Hugging Face Trainer personally; does it execute? I'd be happy to debug potential issues if you can post error logs here.
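
Not from the thread, but here is a minimal sketch of what full fine-tuning with the Hugging Face Trainer could look like. The repo id, dataset, sequence length, and pad-token handling below are all placeholders/assumptions; the only essential part is `trust_remote_code=True`, since the recurrent architecture ships as custom code.

```python
# Minimal full fine-tuning sketch with the Hugging Face Trainer.
# Assumptions: "tomg-group-umd/huginn-0125" as the repo id and wikitext-2 as a
# stand-in dataset; swap both for the actual checkpoint and your SFT data.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "tomg-group-umd/huginn-0125"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Some custom tokenizers ship without a pad token; reuse EOS if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus; any causal-LM-style text dataset works here.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="huginn-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

If the Trainer fails on this model, the error log from a run like the above is probably the most useful thing to post back here.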

For LoRA with PEFT, you should be able to set a matching config for the custom architecture, as shown here: https://huggingface.co/docs/peft/main/en/developer_guides/custom_models#new-transformers-architectures.
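
As a hedged sketch of that PEFT recipe for an architecture not in PEFT's built-in mapping: the `target_modules` names below are hypothetical, so inspect the model's `nn.Linear` layer names first and list those instead.

```python
# LoRA via PEFT for a custom architecture: pass target_modules explicitly.
# Assumptions: repo id and the projection names ("Wqkv", "proj", ...) are
# placeholders; print the model's module names to find the real ones.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tomg-group-umd/huginn-0125",  # assumed checkpoint name
    trust_remote_code=True,
)

# Inspect module names first, e.g.:
#   for name, _ in model.named_modules(): print(name)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Hypothetical linear-layer names; replace with the ones found above.
    target_modules=["Wqkv", "proj", "fc", "proj_out"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # sanity-check the adapter coverage
```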
