Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427
Note: similar paper: https://huggingface.co/papers/2402.18668
Note: somewhat similar paper: https://arxiv.org/pdf/2402.02750.pdf
Note: QMoE: https://arxiv.org/pdf/2310.16795.pdf