Supervised fine-tuning and DPO implementation
Is there any news on integrating this model (and, more generally, the architecture with this recurrent block) with supervised fine-tuning and DPO implementations?
For example, I think full fine-tuning with the Hugging Face Trainer should run without too many problems, right?
But what about LoRA fine-tuning: do you think it is possible, and what modifications would be needed?
Hi, the current model definition can be used for fine-tuning (I can also upstream a few more recent changes to make it faster via flex attention). I don't use the Hugging Face Trainer personally; does it run for you? I'd be happy to debug potential issues if you can post error logs here.
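For reference, a minimal full fine-tuning sketch with the Hugging Face Trainer could look like the following. The model id, dataset file, and hyperparameters are placeholders, and it assumes the checkpoint loads via `AutoModelForCausalLM` with `trust_remote_code=True`; adjust sequence length, batch size, and learning rate for your setup.

```python
# Sketch: full supervised fine-tuning with the Hugging Face Trainer.
# "your-org/your-recurrent-model" and "train.txt" are placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "your-org/your-recurrent-model"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding in the collator

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Plain text dataset; swap in your own SFT data and formatting.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```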
For LoRA, you should be able to set a matching PEFT config, as shown here for new architectures: https://huggingface.co/docs/peft/main/en/developer_guides/custom_models#new-transformers-architectures.
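A minimal sketch along those lines, using PEFT's `LoraConfig` and `get_peft_model`. The `target_modules` names below are placeholders, not the actual module names of this architecture; inspect the model (e.g. with `print(model)`) and list the attention/MLP projection modules you want to adapt.

```python
# Sketch: wrapping a custom-architecture model with LoRA adapters via PEFT.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model_id = "your-org/your-recurrent-model"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Placeholder names: replace with the projection-module names
    # actually used by this model's recurrent block and attention layers.
    target_modules=["q_proj", "v_proj"],
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # sanity check: only adapters train
```

The wrapped `peft_model` can then be passed to the same Trainer setup as in the full fine-tuning sketch above.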