Script used?

#1
by Downtown-Case - opened

Would you upload the modified exllamav2 python files you used? I think I get the gist of what you changed in the code, but it would be convenient to have it :)

Also, I like your strategy. My experience is that messing with the measurement phase (even simply extending the context length) screws up the quantization, but extending the length of the second pass helps, and maxing out the first/last layers' bit-width is a good idea, going by llama.cpp's findings.
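Just to sketch what I mean by that last part: something like the allocation below, where the endpoint layers get pinned at the maximum and the middle layers absorb the cost. This is purely illustrative pseudocode of the idea, not exllamav2's actual API; the function and parameter names are made up.

```python
def allocate_layer_bits(num_layers: int, target_bits: float, max_bits: float = 8.0):
    """Hypothetical sketch: hit an average of `target_bits` per layer
    while pinning the first and last layers to `max_bits`, since they
    tend to be the most quantization-sensitive."""
    if num_layers < 3:
        return [max_bits] * num_layers
    middle = num_layers - 2
    # Budget left for the middle layers after pinning both endpoints.
    remaining = target_bits * num_layers - 2 * max_bits
    middle_bits = remaining / middle
    return [max_bits] + [middle_bits] * middle + [max_bits]

# e.g. a 32-layer model at an average of 4.0 bpw:
bits = allocate_layer_bits(num_layers=32, target_bits=4.0)
```

The total budget stays the same as a flat allocation; the middle layers just run slightly below the nominal average to pay for the endpoints.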

Downtown-Case changed discussion status to closed
