Wrong outputs

#1
by bullerwins - opened

Hi @MikeRoz!

Have you tested the models with TabbyAPI?

I downloaded the 3.0bpw quant, but it outputs gibberish in TabbyAPI. Yes, I manually updated the venv to the exllamav3 dev branch. @turboderp's GLM-4.5-Air quants work, but the full-size model does not.

I also tried quantizing it myself, both as a flat (non-optimized) 3.0bpw and at other sizes, but none of them work.
Just asking whether you managed to get it working.

Works with the small edit in this PR: https://github.com/turboderp-org/exllamav3/pull/68

The issue and the fix ended up being completely different from what I proposed, but the dev branch should be fixed now.

MikeRoz changed discussion status to closed

The fact that it worked with the unoptimized DS3 routing function did help me narrow it down pretty quickly though. :+1:

You should update the readme, it still says the models don't work. (:

I can confirm it works now, with both Mike's quants and my own.


@turboderp I'd left the readme note up while I was waiting for my re-quant of the 3bpw to finish, so I could compare the weights and responses. I saw that both routing functions use a flag that is meant to be on during quantization, and it didn't seem to entirely bypass the calculations involving routed_scaling_factor, so I was concerned that it could still affect the final result of quantization. Can you rule that out? If so, it would save me a lot of re-uploading.
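For context on where routed_scaling_factor enters: in DeepSeek-V3-style MoE routing (the kind the unoptimized DS3 routing function implements), experts are picked from bias-corrected sigmoid scores, and the chosen experts' unbiased scores are then optionally normalized and multiplied by routed_scaling_factor. The pure-Python sketch below illustrates that under stated assumptions: it omits group-limited selection (n_group / topk_group), and `ds3_style_route` and its `apply_scaling` switch are hypothetical stand-ins, not exllamav3's actual functions or quantization flag.

```python
import math


def ds3_style_route(logits, bias, top_k, routed_scaling_factor,
                    norm_topk_prob=True, apply_scaling=True):
    """Simplified DeepSeek-V3-style expert selection for a single token.

    Illustrative sketch only: group-limited selection is omitted, and
    `apply_scaling` is a HYPOTHETICAL switch, not exllamav3's real flag.
    """
    # Sigmoid gate score for each expert
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The correction bias only influences WHICH experts are chosen...
    choice = [s + b for s, b in zip(scores, bias)]
    top = sorted(range(len(scores)), key=lambda i: choice[i], reverse=True)[:top_k]
    # ...while the routing weights come from the unbiased scores
    weights = [scores[i] for i in top]
    if norm_topk_prob:
        total = sum(weights) + 1e-20  # small epsilon guards against division by zero
        weights = [w / total for w in weights]
    if apply_scaling:
        # Scale applied after normalization; weights now sum to routed_scaling_factor
        weights = [w * routed_scaling_factor for w in weights]
    return top, weights


# Example: 4 experts, pick 2, scaling factor 2.5
experts, weights = ds3_style_route([0.5, -1.0, 2.0, 0.1], [0.0] * 4,
                                   top_k=2, routed_scaling_factor=2.5)
print(experts)  # [2, 0]
print(weights)
```

In this sketch, turning the scaling off leaves the expert selection unchanged and only rescales the weights (they sum to 1 instead of to routed_scaling_factor), so a partial bypass would change the magnitude of the routed output rather than which experts fire.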

MikeRoz changed discussion status to open

The two versions produce slightly different outputs. Both examples were generated at temperature 0 and are reproducible for their respective versions of the quant.

Old quant (the one posted in this repo), created using the original version of GLM-4.5 support from the dev branch:

You are a helpful AI assistant.

User: How many Rs are in the word STRAWBERRY? Think step-by-step.

Assistant: <think>First, I need to count the number of times the letter "R" appears in the word "STRAWBERRY".

Let me write out the word: S-T-R-A-W-B-E-R-R-Y

Now, I'll go through each letter and check if it's an "R":

 • S: Not R
 • T: Not R
 • R: Yes, this is an R (1)
 • A: Not R
 • W: Not R
 • B: Not R
 • E: Not R
 • R: Yes, this is an R (2)
 • R: Yes, this is an R (3)
 • Y: Not R

So, I found 3 Rs in the word "STRAWBERRY".

Let me double-check by writing the word again and circling the Rs: ST(R)AWBERR(R)(R)Y

Yes, there are definitely 3 Rs in "STRAWBERRY".</think>To count the number of Rs in the word "STRAWBERRY", I'll go through the word letter by letter:

S - Not R
T - Not R
R - Yes, this is the first R
A - Not R
W - Not R
B - Not R
E - Not R
R - Yes, this is the second R
R - Yes, this is the third R
Y - Not R

Therefore, there are 3 Rs in the word "STRAWBERRY".

New quant, built using the current dev branch:

You are a helpful AI assistant.

User: How many Rs are in the word STRAWBERRY? Think step-by-step.

Assistant: <think>First, I need to count the number of Rs in the word "STRAWBERRY".

Let me write out the word: S-T-R-A-W-B-E-R-R-Y.

Now, I'll go through each letter and count the Rs:

 • S: Not R
 • T: Not R
 • R: This is an R. Count = 1
 • A: Not R
 • W: Not R
 • B: Not R
 • E: Not R
 • R: This is an R. Count = 2
 • R: This is an R. Count = 3
 • Y: Not R

So there are 3 Rs in the word "STRAWBERRY".

Let me double-check by looking at the word again: STRAWBERRY.

The Rs are at positions 3, 8, and 9 in the word. So yes, there are 3 Rs.</think>To count the number of Rs in the word "STRAWBERRY", I'll go through each letter one by one:

S - Not R
T - Not R
R - This is an R (1)
A - Not R
W - Not R
B - Not R
E - Not R
R - This is an R (2)
R - This is an R (3)
Y - Not R

Therefore, there are 3 Rs in the word "STRAWBERRY".

EDIT: Note that the part about circling the Rs seems off in the original quant's output. Is that just luck of the draw, or a sign of an issue?
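As a quick sanity check on that EDIT, a few lines of Python (purely illustrative; the helper names are made up) reproduce the positions the new quant reported and show what correct circling would look like. The old quant's "ST(R)AWBERR(R)(R)Y" contains five Rs, two more than the word has.

```python
def letter_positions(word, letter="R"):
    """Return the 1-based positions of `letter` in `word`."""
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


def circle_letter(word, letter="R"):
    """Wrap each occurrence of `letter` in parentheses."""
    return "".join(f"({ch})" if ch == letter else ch for ch in word)


word = "STRAWBERRY"
print(letter_positions(word))  # [3, 8, 9]
print(circle_letter(word))     # ST(R)AWBE(R)(R)Y
```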

I'm going to upload the revised 3.0bpw weights and start quantizing the other sizes again, I think.

Thanks for the revised versions! Using the 4bpw one without issues.
