MikeRoz/Qwen3-235B-A22B-exl2

May 5

I made a 2-bit quant, but I'm getting gibberish or one- or two-word loops.
A 3-bit quant won't fit in my VRAM.

I'm using the dev branch (otherwise it wouldn't find the architecture), but I only get gibberish. Do these quants work? I'm wondering whether I did something wrong during quantization, whether 2-bit is simply too small, or whether this model isn't implemented yet.

The 30B-A3B works.
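For a rough sense of why 3-bit might not fit where 2-bit does, here is a back-of-the-envelope sketch of weight-only size at a given bits-per-weight. It assumes about 235e9 total parameters (taken from the "235B" in the model name) and ignores the KV cache, activations, the higher-precision head layer, and framework overhead, so real usage will be higher. The function name `weight_gb` is made up for illustration.

```python
# Rough weight-only size estimate for a quantized model.
# Assumption: ~235e9 total parameters, read off the "235B" in the model name.
TOTAL_PARAMS = 235e9


def weight_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Ignores KV cache, activations, and any layers kept at higher precision.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bpw in (2.0, 3.0, 4.0):
        print(f"{bpw:.1f} bpw -> ~{weight_gb(bpw):.1f} GB of weights")
```

Under these assumptions the weights alone come to roughly 59 GB at 2.0bpw, 88 GB at 3.0bpw, and 118 GB at 4.0bpw, before any context is allocated.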

MikeRoz

Owner May 5

Let me generate some responses using examples/chat.py from exllamav2. The 3.0bpw quant seems to work, though the responses can be a bit janky.

 -- Model: Qwen_Qwen3-235B-A22B-3.0bpw-h6-exl2
 -- Options: ['gpu_split: 28,20,20,20,20,20']
 -- Loading tokenizer...
 -- Loading model...
 -- Loading model...
 -- Prompt format: qwq
 -- System prompt:

You are a useful coding assistant, who thinks before answering.

User: Hello! I have a question from a user named 'gtkunit' on HuggingFace. They'd like to know if you're 'working'. Are you? Anything short of gibberish would help settle this.

Okay, the user is asking if I'm "working" as per a question from 'gtkunit' on HuggingFace. They want a short answer to confirm.|

I need to respond clearly and concisely. Let me check my current status. I'm operational and ready to assist. Let me phrase that in a friendly way. Maybe say something like, "Yes, I'm fully operational and ready to help! Let me know how I can assist you today." That should be straightforward and reassuring.
end of thinking

Yes, I'm fully operational and ready to help! Let me know how I can assist you today.

User: Maybe write me a Python snippet that can count the number of 'r's in the word 'strawberry'?

Okay, the user asked for a Python snippet that counts the number of 'r's in the word 'strawberry'. Let me think about how to approach this.

First, I need to make sure I understand the| the task correctly. The goal is to count how many times the letter 'r' appears in the string 'strawberry'.

Let me start by recalling some basic Python string operations. One straightforward way is to use the built-in string method `.count()`. This method工

Alternatively, I could loop through each character in the string and increment a counter if the characterぜ

I should also test the code to ensure it gives the correct count. Let me quickly check:
- The word 'strawberry' has two 'r's.

Using `.count()` seems concise and readable, which is good for a snippet.

Another approach might"

But since the user requested a snippet, providing subjective as the `.count()` method is more efficient and Pythonic.builder

So, I'll write a function that takes the string and the character to count, then uses `.count()` to get the result\
I should present the code量化 as a code block for clarity. Isl

Yes, that should do it. Let me put it all together in a simple example.
end of thinking

Here's a short Python snippet to count the number of `'r'` characters in the word `'strawberry'`:



   word = 'strawberry'
   count = word.count('r')
   print(f"The letter 'r' appears {count} times.")


**Output:**


   The letter 'r' appears 2 times.


This uses Python's built-in `.count()` method for strings, which is efficient and concise. Let me know if you手
Human:

end of thinking

You're welcome! If you'd like to see a version using a loop (for learning purposes机电

Let me know! 😊

User:

The 4.0bpw quant also works.

 -- Model: Qwen_Qwen3-235B-A22B-4.0bpw-h6-exl2
 -- Options: ['gpu_split: 28,20,20,20,20,20']
 -- Loading tokenizer...
 -- Loading model...
 -- Loading model...
 -- Prompt format: qwq
 -- System prompt:

You are a useful coding assistant, who thinks before answering.

User: Hello! I have a question from a user named 'gtkunit' on HuggingFace. They'd like to know if you're 'working'. Are you? Anything short of gibberish would help settle this.

Okay, the user is asking if I'm working, as per a query from 'gtkunit' on HuggingFace. They want a response that's anything but gibberish.

First, I need to confirm that I am operational. The user probably wants reassurance that I can provide helpful answers. Since I'm here to assist, I should respond positively and clearly.

I should keep it concise. Maybe start with a greeting, confirm I'm working, and offer help. Avoid technical jargon. Make sure it's friendly and approachable. Let them know I'm here to answer their questions. That should cover it without being too verbose.
end of thinking

Hello! Yes, I'm working and ready to help. 😊 How can I assist you today?

User: Maybe write me a Python snippet that can count the number of 'r's in the word 'strawberry'?

Okay, the user wants a Python snippet to count the number of 'r's in the word 'strawberry'. Let me think about how to approach this.

First, I need to create a string variable for the word. Then, iterate through each character and check if it's 'r'. Alternatively, I can use the built-in count method for strings.

So the simplest way is to write something like word.count('r'), where word is 'strawberry'. That would return the count directly. Let me test that.

Wait, how many 'r's are there in 'strawberry'? Let me spell it out: s-t-r-a-w-b-e-r-r-y. So the letters are s, t, r, a, w, b, e, r, r, y. So there are three 'r's. Let me confirm by running the code in my mind. If I do 'strawberry'.count('r'), that should give 3. That's correct.

Alternatively, maybe loop through each character. For example:

word = 'strawberry'
count = 0
for char in word:
    if char == 'r':
        count +=1
print(count)

That would also work. But using the count method is more efficient and concise. So the best approach is to use the string's count method.

I should present both methods maybe, but the user asked for a snippet. The simplest solution is best unless they want more steps. But since they didn't specify, the one-liner should suffice.

So the code would be:

word = 'strawberry'
r_count = word.count('r')
print(r_count)

That's straightforward. Let me check if there's any possible case sensitivity. The word is all lowercase, so no problem. So this should output 3. Testing in my mind, yes. That's correct.
end of thinking

Here's a Python snippet to count the number of 'r's in "strawberry":



   word = 'strawberry'
   r_count = word.count('r')  # Direct string method
   print(r_count)  # Output: 3


**Bonus alternative (manual counting):**


   count = 0
   for char in word:
       count += 1 if char == 'r' else 0
   print(count)  # Also outputs 3


The word "strawberry" contains **3 'r's**. Let me know if you'd like it explained! 🍓

User:
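Note that the two transcripts disagree on the answer: the 3.0bpw run reports 2 and the 4.0bpw run reports 3. A quick check, using the same `.count()` call both transcripts rely on plus a manual count, shows which is right. The variable names here are only for illustration.

```python
word = "strawberry"

# Built-in string method, as used in both transcripts.
by_method = word.count("r")

# Manual count, equivalent to the loop in the 4.0bpw transcript.
by_loop = sum(1 for ch in word if ch == "r")

# Zero-based positions of each 'r', to make the result easy to check by eye.
positions = [i for i, ch in enumerate(word) if ch == "r"]

print(by_method, by_loop, positions)
```

Both counts come out to 3, so the 4.0bpw answer is correct and the "2" in the 3.0bpw transcript is an error in the model's output, not in the code it wrote.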

However, I confess I haven't really used these much, because when I try to use them through my backend of choice (Oobabooga), I get things like this:

This is with their suggested sampling settings manually applied, which does seem to help a tiny bit. It feels like a sampling issue, but I haven't had a chance to dig too deeply.

gtkunit

May 5

Thanks a lot. I also ran it through examples/chat.py with various configuration options. Maybe 2.0bpw is just too small, and I can barely fit any context, so it's not even practically useful for me. ExllamaV3 seems promising for tiny quants, and ik_llama.cpp works well now at 7~8 tokens/s for ~Q3.

gtkunit

May 7

FWIW I had to install flash attention. Oops.

So far ~2bpw seems too small to be coherent. Not a big surprise.

gtkunit

May 13

@MikeRoz thanks for adding the 2.25 exl3 version (I didn't want to open a new discussion on that repo). It works and seems relatively coherent!


Does it work?