This is the vanilla model used in our Expert-Specialized Fine-Tuning (ESFT) research paper: https://arxiv.org/abs/2407.01906.
To use this model and specialized expert sets, please refer to the scripts at https://github.com/deepseek-ai/ESFT.
For the customized models used in this paper, please refer to https://huggingface.co/deepseek-ai/ESFT-{gate, token}-{task_name}-lite.
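
The URL pattern above can be read as a template: one of `gate` or `token`, followed by a task name. As a minimal sketch, the helper below expands that template into concrete URLs. The function name `esft_model_urls` and the task name `task_a` are hypothetical placeholders, not names taken from the paper or the ESFT repository; substitute the task names listed there.

```python
from itertools import product

# Organization prefix taken from the URL pattern in this README.
BASE_URL = "https://huggingface.co/deepseek-ai"


def esft_model_urls(task_names, variants=("gate", "token")):
    """Expand ESFT-{gate, token}-{task_name}-lite into concrete URLs.

    `task_names` is supplied by the caller; this sketch does not know
    which task names actually exist on the Hub.
    """
    return [
        f"{BASE_URL}/ESFT-{variant}-{task}-lite"
        for variant, task in product(variants, task_names)
    ]


if __name__ == "__main__":
    # "task_a" is a placeholder, not a real task name.
    for url in esft_model_urls(["task_a"]):
        print(url)
```

The sketch only builds strings; it does not check that a given repository exists.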