Update -ngl arg to count the output layer

#6

GGUF counts the output layer the same way as the transformer blocks, so for a model with n transformer blocks we need to offload n+1 "layers" to put everything on the GPU.

27.1 tokens/s ==> 36.9 tokens/s :D
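To make the n+1 rule concrete, here is a minimal Python sketch. The helper names, the model file name, and the block count of 32 are all hypothetical and only for illustration; the only things taken from llama.cpp are the `-m` and `-ngl` flags.

```python
def ngl_for_full_offload(n_blocks: int) -> int:
    """Return the -ngl value that covers every transformer block
    plus the output layer, which is counted as one extra "layer"."""
    if n_blocks < 0:
        raise ValueError("n_blocks must be non-negative")
    return n_blocks + 1


def build_args(model_path: str, n_blocks: int) -> list[str]:
    """Build the model and offload arguments for a llama.cpp command line.

    Only the -m and -ngl flags are emitted; the binary name and any other
    options are left to the caller.
    """
    return ["-m", model_path, "-ngl", str(ngl_for_full_offload(n_blocks))]


if __name__ == "__main__":
    # Hypothetical values: replace with your own GGUF file and block count.
    print(" ".join(build_args("model.gguf", 32)))
```

With `-ngl` set to only n, the output layer stays on the CPU, which is consistent with the slower number above.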

LG AI Research org

Thank you for your contribution!
We will update the relevant sections on the other pages. :)

nuxlear changed pull request status to merged
