Script used? #1
opened by Downtown-Case
Would you upload the modified exllamav2 python files you used? I think I get the gist of what you changed in the code, but it would be convenient to have it :)
Also, I like your strategy. In my experience, messing with the measurement phase (even just extending its context length) degrades the quantization, but extending the row length of the second pass helps, and maxing out the first/last layers is a good idea going by llama.cpp's findings.
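For concreteness, here is a minimal sketch of that kind of invocation, not the actual script from this repo: the model paths and the 8192-token length are placeholders, and the flag names are exllamav2's documented convert.py options.

```python
# Hypothetical driver for exllamav2's convert.py illustrating the strategy
# above: leave the measurement pass at its defaults and only extend the row
# length of the second (quantization) pass via -l.
import subprocess

cmd = [
    "python", "exllamav2/convert.py",
    "-i", "Mistral-Small-24B-Instruct-2501",  # source HF model dir (placeholder)
    "-o", "work",                             # scratch/working directory
    "-cf", "Mistral-Small-24B_exl2_6.5bpw",   # compiled output folder
    "-b", "6.5",                              # target bits per weight
    "-hb", "8",                               # head layer bits
    "-l", "8192",                             # longer rows for the second pass only
    # -ml (measurement length) is deliberately left at its default, since
    # extending the measurement pass hurt quality in practice.
]
subprocess.run(cmd, check=True)
```

The first/last-layer bump isn't exposed as a convert.py flag; it's the kind of edit to exllamav2's python files that the linked code fragments show.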
I posted the code fragments in a discussion on another of my models: https://huggingface.co/DeusImperator/Mistral-Small-24B-Instruct-2501_exl2_6.5bpw_L/discussions/1#679f9b0efb15b4e60ae76050
Thanks!
Downtown-Case changed discussion status to closed