DiffSkip-Llama-3-8B-Instruct

The implementation of the paper Differential Layer Skipping in Large Language Models.

Model Description

DiffSkip-Llama-3-8B-Instruct is an enhanced version of the Llama-3-8B-Instruct model, incorporating the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. This approach leverages the self-attention input-output difference as a routing signal, allowing tokens to bypass FFN blocks based on computational needs.

Developed by: Xuan Luo, Weizhi Wang, Xifeng Yan
Model type: Causal Language Model with dynamic FFN skipping
Language(s) (NLP): English (en)
License: Apache-2.0
Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct

Model Card Contact

For questions or inquiries, please contact [email protected].

xuan-luo
/

DiffSkip-Llama-3-8B-Instruct

DiffSkip-Llama-3-8B-Instruct

Model Description

Model Card Contact

Model tree for xuan-luo/DiffSkip-Llama-3-8B-Instruct

Datasets used to train xuan-luo/DiffSkip-Llama-3-8B-Instruct