it might be better to try knowledge distillation
#1 by Alignment-Lab-AI - opened
from the MPT model into Pythia over a corpus of very long text, then initializing the StoryWriter architecture with the distilled Pythia weights.
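
For anyone unfamiliar with the idea, here is a minimal sketch of the per-token distillation objective, assuming the teacher (MPT) and student (Pythia) produce logits over the same vocabulary. The function names, the temperature value, and the toy logits are hypothetical illustrations and not taken from either model's codebase; a real run would compute this over model outputs on long sequences and average over all positions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one token position.

    The temperature**2 factor is the usual rescaling that keeps the loss
    magnitude comparable across temperatures (an assumed convention here).
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Toy logits over a 4-token vocabulary (made-up numbers).
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.0, -0.5]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student matches the teacher exactly and grows as the two distributions diverge, so minimizing it over a long-text corpus pushes the student toward the teacher's next-token behaviour.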
Alignment-Lab-AI changed discussion status to closed