This repository houses a fork of togethercomputer/LLaMA-2-7B-32K's `modeling_flash_llama.py`, with a fix for padding of attention weights merged into it.