DiffSkip-Llama-3-8B-Instruct

This is the implementation of the paper "Differential Layer Skipping in Large Language Models".

Model Description

DiffSkip-Llama-3-8B-Instruct is an enhanced version of the Llama-3-8B-Instruct model that incorporates the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. DiffSkip uses the difference between the input and the output of each self-attention block as a routing signal, so that individual tokens can bypass FFN blocks when they need less computation.
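To illustrate the routing idea, here is a minimal Python sketch of one decoder layer applied to a single token vector. It is not the authors' implementation. The names `diffskip_layer` and `l2_norm`, the relative-norm signal, and the fixed `threshold` are hypothetical choices made for illustration (the released model may use a learned router instead), and the sketch omits the RMSNorm layers and batched tensors of a real Llama block.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean norm of a plain Python vector."""
    return math.sqrt(sum(x * x for x in v))


def diffskip_layer(
    h: Vector,
    attention: Callable[[Vector], Vector],
    ffn: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[Vector, bool]:
    """Run one (hypothetical) decoder layer for one token.

    Returns the new hidden state and whether the FFN was skipped.
    ASSUMPTION: the skip rule below (relative norm of the attention
    input-output difference compared with a fixed threshold) is an
    illustrative stand-in, not the exact DiffSkip router.
    """
    attn_out = attention(h)
    # Residual connection around the self-attention sub-block.
    h_mid = [x + a for x, a in zip(h, attn_out)]
    # Routing signal: how much self-attention changed this token's state.
    # With a plain residual connection this difference equals attn_out.
    diff = [m - x for m, x in zip(h_mid, h)]
    signal = l2_norm(diff) / (l2_norm(h) + 1e-6)
    if signal < threshold:
        # Small change: the token bypasses the FFN block.
        return h_mid, True
    # Otherwise apply the FFN with its own residual connection.
    ffn_out = ffn(h_mid)
    return [m + f for m, f in zip(h_mid, ffn_out)], False


if __name__ == "__main__":
    # Toy sub-blocks standing in for real attention and FFN modules.
    def weak_attention(v: Vector) -> Vector:
        return [0.01 * x for x in v]

    def constant_ffn(v: Vector) -> Vector:
        return [1.0 for _ in v]

    state, skipped = diffskip_layer([1.0, 2.0], weak_attention, constant_ffn, threshold=0.05)
    print(state, skipped)
```

In this toy run the attention block changes the token by only about 1% of its norm, which falls below the 0.05 threshold, so the FFN call is skipped entirely and its compute is saved for that token.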

  • Developed by: Xuan Luo, Weizhi Wang, Xifeng Yan
  • Model type: Causal Language Model with dynamic FFN skipping
  • Language(s) (NLP): English (en)
  • License: Apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct

Model Card Contact

For questions or inquiries, please contact [email protected].

Model weights: 8.24B parameters, stored as Safetensors with BF16 tensors.


Datasets used to train xuan-luo/DiffSkip-Llama-3-8B-Instruct