update -ngl arg to count the output layer
#6 opened by owao
GGUF counts the output layer the same way as the transformer blocks, so we need to offload n+1 "layers".
27.1 tokens/s ==> 36.9 tokens/s :D
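A rough sketch of the arithmetic, in case it helps when updating the docs. The helper names and the 32-block figure are hypothetical and only for illustration; the only assumption about the tooling is that `-ngl` is the llama.cpp flag for the number of layers to offload to the GPU.

```python
def ngl_for_full_offload(n_transformer_blocks: int) -> int:
    """Return the -ngl value that covers every transformer block
    plus the output layer, which is counted as one extra "layer"."""
    if n_transformer_blocks < 0:
        raise ValueError("block count must be non-negative")
    return n_transformer_blocks + 1


def speedup(tokens_per_s_before: float, tokens_per_s_after: float) -> float:
    """Ratio of the new throughput to the old throughput."""
    return tokens_per_s_after / tokens_per_s_before


if __name__ == "__main__":
    # Hypothetical model with 32 transformer blocks: pass -ngl 33, not 32.
    print(ngl_for_full_offload(32))
    # Throughput figures reported in this PR.
    print(round(speedup(27.1, 36.9), 2))
```

With the numbers above, going from 27.1 to 36.9 tokens/s is roughly a 1.36x speedup.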
Thank you for your contribution!
We will update the relevant sections on the other pages. :)
nuxlear changed pull request status to merged