NexaAIDev/Squid · "DolphinForCausalLM" OR "Qwen2ForCausalLM"?

Sep 3

Hey guys, this work is much appreciated, and I'd say you did a great job on the fine-tuning data.

What I don't understand is why you introduced an architecture named "DolphinForCausalLM" when it is neither a novel architecture nor a newly pretrained model, but rather a fine-tuned version of Qwen2. I'm sure you have a good reason, so basically my question is: WHAT IS THE DIFFERENCE BETWEEN "DolphinForCausalLM" AND "Qwen2ForCausalLM"?

Side note: a Dolphin series of models already exists, and I believe Cognitive Computations (https://huggingface.co/cognitivecomputations) is the one in charge of it. The current naming causes a bit of confusion.

Best 🤗

Ricepig

Sep 3

Sorry, but it does seem to be a new arch. You can read the code here: https://huggingface.co/NexaAIDev/Dolphin/blob/main/modeling_dolphin.py
It combines two decoder models in one LM, with one of them acting as the encoder.
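To make that description concrete, here is a minimal PyTorch sketch of the general pattern of using one decoder stack as an encoder for another. It is not the code in modeling_dolphin.py. The class names (`TinyDecoder`, `TwoDecoderLM`), the sizes, and the design of projecting the first decoder's hidden states and prepending them to the second decoder's input embeddings are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TinyDecoder(nn.Module):
    """Toy decoder-only stack: self-attention layers under a causal mask.

    Hypothetical stand-in for a real pretrained decoder; positional
    encodings are omitted for brevity.
    """

    def __init__(self, d_model: int, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.layers = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # -inf above the diagonal: each position attends only to itself and earlier ones.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        return self.layers(x, mask=mask)


class TwoDecoderLM(nn.Module):
    """Hypothetical composite LM: a small decoder encodes the context,
    a projector maps its states into the main decoder's embedding space."""

    def __init__(self, vocab_size: int = 1000, d_enc: int = 64, d_dec: int = 128):
        super().__init__()
        self.enc_embed = nn.Embedding(vocab_size, d_enc)
        self.encoder = TinyDecoder(d_enc)         # decoder used as the encoder
        self.projector = nn.Linear(d_enc, d_dec)  # encoder space -> decoder space
        self.dec_embed = nn.Embedding(vocab_size, d_dec)
        self.decoder = TinyDecoder(d_dec)         # main decoder
        self.lm_head = nn.Linear(d_dec, vocab_size)

    def forward(self, context_ids: torch.Tensor, prompt_ids: torch.Tensor) -> torch.Tensor:
        # Encode the context and project it into the main decoder's width.
        memory = self.projector(self.encoder(self.enc_embed(context_ids)))
        # Prepend the projected context states to the prompt embeddings.
        x = torch.cat([memory, self.dec_embed(prompt_ids)], dim=1)
        h = self.decoder(x)
        # Return next-token logits for the prompt positions only.
        return self.lm_head(h[:, memory.size(1):, :])


model = TwoDecoderLM()
context = torch.randint(0, 1000, (2, 16))  # batch of 2, 16 context tokens
prompt = torch.randint(0, 1000, (2, 8))    # batch of 2, 8 prompt tokens
logits = model(context, prompt)
print(logits.shape)  # torch.Size([2, 8, 1000])
```

Because the forward pass takes two token streams and routes one through a separate stack plus a projector, a single-stack class like `Qwen2ForCausalLM` could not load such weights, which is why a custom modeling file is needed.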

alexchen4ai

Nexa AI org Sep 3


edited Sep 8

Hi, we are changing the model's name to avoid the confusion haha. But the model is indeed a new arch, so the modeling file can't be Qwen2ForCausalLM.