Model broken
Hi,
I've tested a Q8 quant of this model, and it seems to be broken: the model keeps talking for a very long time and then doesn't produce the correct output in my coding tests. I think there is an issue with it.
This q8_0 GGUF works fine for me: https://huggingface.co/bartowski/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-GGUF
I used the recommended inference settings:
- temp: 0.6
- top_p: 0.95
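
If you're calling the model through an OpenAI-compatible endpoint (e.g. llama.cpp's llama-server or LM Studio's local server), those settings map straight onto the request body. A minimal sketch; the base URL and model name below are placeholders for your local setup:

```python
# Minimal sketch: send the recommended sampling settings to an
# OpenAI-compatible local server. URL and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder endpoint
    json={
        "model": "DeepSeek-R1-0528-Qwen3-8B",  # placeholder model name
        "messages": [{"role": "user", "content": "Write a Python quicksort."}],
        "temperature": 0.6,  # recommended temp
        "top_p": 0.95,       # recommended top_p
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```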
I will try it, but I used the same settings with the other Q8 and it just goes into a weird thinking loop. It seems to mix up languages and code and do strange things, which makes me think there's an issue with it.
This is how the model surprised and amused me: "I got tired of thinking and realized that I couldn't solve this problem. Maybe the question was asked incorrectly, or I didn't take some point into account."
Unsloth Q8
It works in LM Studio, but I'm not impressed by the performance. Maybe an update is needed, no idea. For now, the quality of the answers is not good.
I suspect the problem is the Qwen3 base. Qwen3 has big problems with instruction following. Same problem for me: I even tried the original safetensors and found this issue, so it's not only in the quantized models.
I have tested this model for about six hours straight with dozens of different settings, and talked to at least a dozen other people, and this just seems to be how the model is. I have not found a single redeeming quality so far; it seems to be one huge failure of a distill. The original R1 distills were bad, and so is this one. DeepSeek doesn't seem to be the best at distilling.
Here is a write-up I did in the exl2/3 server:
"Alright, I have spent hours trying to get this R1 8b distill to be usable. I have matched their system prompt, samplers, everything. This model is a mess
it constantly gaslights itself, misses blatantly obvious stuff, makes no sense, can't count, and is just overal an absolute mess.
Some examples of its huge fails:
"the first three letters of 'HEROINE" is 'HERO'" Which is clearly 4 letters
Then it said "'her' does not signify a female"
It solved the Heroine riddle in 1.2k tokens, and then spent 11k tokens gaslighting itself that it was wrong
It said "C-O-L-T-R-O-L, or 'COLTROOL", where it added an O out of nowhere
Instead of decoding: FROWURO, it decoded FRWUROR, then got hung up for over 7k tokens on how "COTROOL" is not a real word, and I must have made a typo
This model's "vibe" seems to be getting something correct reasonably fast, then spending 5-20k tokens gaslighting itself on how what it just solved is wrong, before assuming the "user must be wrong", and then backtracking on everything, quite reliably giving the WRONG answer while also suggesting the right answer as something to be explored more. I would have to say about 70% of questions this model got wrong, it had reasoning chains considerably closer to the correct answer that it gaslit itself out of following for seemingly no reason other than "No, that doesn't make sense" when it very clearly does.
With a simple +3 cipher, I gave it 12 attempts. It passed 4 times, averaging over 18k tokens of reasoning... for nothing. (A reference decoder sketch follows this write-up.)
For a model that claims to rival Qwen3 235B, it sure can't do simple 1-2-3 counting.
I have given this model several tests that many models from the last 5 months can pass, and I would say it has about a 60% failure rate on simpler questions and about a 90% failure rate on complex questions. I find it inferior to the DeepScaleR 1.5B preview from several months ago in a lot of key ways (specifically math, where DeepScaleR 1.5B considerably outperforms it in both math ability and token efficiency).
For the sake of diligence, I tried the following versions: EXL2: 6bpw, 8bpw, FP16. EXL3: 6bpw, 8bpw. GGUF: Q8, FP16. All of them were just as incompetent. This model has to be one of the worst CoT models I have tried in a long time, which is crazy, because it's based on a fairly competent base (Qwen3 8B)."
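
For reference, the "+3 cipher" in those tests is just a Caesar shift. Here's a minimal Python sketch (my own illustration, not part of the original test harness) of what a correct decode of the FROWURO example looks like:

```python
# Minimal Caesar-shift helper; my own reference sketch, not from the tests.
def caesar(text: str, shift: int) -> str:
    """Shift each A-Z/a-z letter by `shift` positions, wrapping around."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave non-letters untouched
    return ''.join(out)

# Decoding a +3 cipher just means shifting back by 3:
print(caesar("FROWURO", -3))  # -> COLTROL, the letters the model mangled into "COLTROOL"
```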