update -ngl arg to count the output layer

by owao - opened Jul 18

owao

Jul 18

GGUF counts the output layer the same way as the transformer blocks, so we need to offload n+1 "layers".

27.1 tokens/s ==> 36.9 tokens/s :D
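
A minimal sketch of the n+1 arithmetic described above, in case it helps when picking the `-ngl` value for another model. The helper names and the block count of 32 are hypothetical and chosen only for illustration; the actual number of transformer blocks should be read from the model's own config.

```python
def ngl_for_full_offload(n_blocks: int) -> int:
    """Return an -ngl value covering all transformer blocks plus the output layer."""
    if n_blocks < 0:
        raise ValueError("n_blocks must be non-negative")
    # The output layer is counted as one extra offloadable "layer".
    return n_blocks + 1


def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain, in percent, between two tokens/s measurements."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return (after_tps / before_tps - 1.0) * 100.0


if __name__ == "__main__":
    n_blocks = 32  # hypothetical block count, not taken from this thread
    print(f"-ngl {ngl_for_full_offload(n_blocks)}")
    # Throughput figures reported in this thread: 27.1 -> 36.9 tokens/s
    print(f"{speedup_percent(27.1, 36.9):.1f}% faster")
```

With the figures reported here, the gain works out to roughly a third more tokens per second.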

LG AI Research org Jul 21

Thank you for your contribution!
We will update the relevant sections on the other pages. :)

nuxlear changed pull request status to merged Jul 21
