Any plans to release an updated version based on DeepSeek-V3-0526 + R1, or how to create the merge myself?

#4
by Lissanro - opened

I wonder if there are any plans to release a new version that would be based on DeepSeek-V3-0526 and R1?

Or, alternatively, are there instructions on how I can create an updated merge myself?

I assume DeepSeek-V3-0526 still has the same architecture, so if there are instructions on how to create the merge, it would help a lot to save bandwidth, assuming I already have the full versions of V3-0526 and R1 to merge and quantize myself. (Technically, I just found out about the new version at https://www.reddit.com/r/LocalLLaMA/comments/1kvpwq3/deepseek_v3_0526/ and have not yet begun downloading V3-0526, but I already have R1.) So I am trying to figure out whether it is better to wait for an updated R1T Chimera, or to download V3-0526 and try to make the merge myself, if that is possible.
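
For reference, the simplest form of such a merge is a linear interpolation of the two checkpoints' weights. Below is a minimal sketch, assuming both models share identical tensor names and shapes; the paths, shard names, and blend ratio are illustrative assumptions, not TNG's actual recipe:

```python
# Minimal sketch of a naive linear weight merge between two same-architecture
# checkpoints. All paths and ALPHA are illustrative assumptions; real DeepSeek
# checkpoints are sharded, so this loop would run once per matching shard pair.
from safetensors.torch import load_file, save_file

ALPHA = 0.5  # assumed blend ratio: 0.0 = pure V3-0526, 1.0 = pure R1

v3 = load_file("v3-0526/shard-00001.safetensors")  # hypothetical shard path
r1 = load_file("r1/shard-00001.safetensors")       # must share keys and shapes

merged = {}
for name, w_v3 in v3.items():
    w_r1 = r1[name]
    # Interpolate in float32 to limit rounding error, then cast back.
    blend = (1.0 - ALPHA) * w_v3.float() + ALPHA * w_r1.float()
    merged[name] = blend.to(w_v3.dtype)

save_file(merged, "merged/shard-00001.safetensors")
```

Note that TNG's actual construction method may treat different tensor groups differently; a uniform blend like this is only the simplest possible starting point.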

TNG Technology Consulting GmbH org

Hi, no worries. If DeepSeek releases something new, and if the technical parameters of the new release are within range, of course we will create a gaggle of variations :-). Significant testing will have to be done, because even if these are research prototypes, it's better to know than to know not.

For example, on chutes.ai alone, the R1T Chimera is ranked as the third most popular model, after V3-0324 and R1. The Chimera is currently used on 22 instances, i.e. deployed on 176 H200 GPUs (8 per instance), and processes 4.7B tokens per day. Any new version should either be clearly better than the Chimera, or be marked as serving a different purpose.

@TNGHK Can you merge the new R1?

TNG Technology Consulting GmbH org

Hi there,

Of course. Our colleague Benjamin already created a first R1-0528-Chimera this evening. I am currently testing it, and judging from the slow speed of the cluster, I guess some of the other TNGlers must be testing it, too :-).

Preliminary result: it appears to be quite well-behaved. Personally, I consider that already a success; after all, it is still a bit of a miracle that the generated child LLMs are functional.

Whether it offers a performance benefit or interesting behaviours, we cannot say yet.

Cheers,
Henrik (and Robert, and Benjamin et al)

PS: It would be nice to have more GPUs.

Or, alternatively, are there instructions on how I can create an updated merge myself?

It's discussed here

Hey, I don't want to sound ungrateful or impatient (ok, maybe I'm a little impatient, lol), but it's been almost 4 weeks since you mentioned you had the first R1-0528-Chimera, so:

  • How's the testing going on it?
  • When can we expect a release?
TNG Technology Consulting GmbH org

Good questions.

How big should the performance increment be, in your opinion, to justify an R1T-0528 release? E.g. in standard benchmarks such as AIME24, AIME25, GPQA, and HLE, or in benchmarks that you like?

This assumes roughly similar speed / token count to the original R1T.

Thanks for your feedback.

R1-0528 is more Gemini-like than V3-0324, which is more GPT-like.

in your opinion, to justify...

Objectively, the knowledge cutoff date would be advanced. (Or at least one would assume so, as this info seems difficult to find.)

How big should the performance increment be, in your opinion, to justify a R1T-0528 release?

The greatest advantage of R1-0528 over R1, for me, is the multi-turn coherency. R1T was itself a significant step forward over R1 in this way. I'm not sure of a good benchmark to capture that, but I'd naively hope that the same mechanism that improved R1T over R1 in that area would further improve R1T-0528 over R1-0528 in multi-turn.

How big should the performance increment be, in your opinion, to justify a R1T-0528 release?

Tbh, any improvement that has statistical significance will be worth it, as it will push the Pareto frontier of speed vs. intelligence among open-weight models.
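
In other words, a model earns a spot on the frontier if no other model is both at least as fast and at least as smart. A minimal sketch of that dominance test, with made-up speed/score numbers purely for illustration (none of these figures are measurements):

```python
# Sketch of the speed-vs-intelligence Pareto frontier. The model names map to
# the discussion, but all numbers below are invented for illustration only.
models = {
    "R1":          (60.0, 79.0),  # (tokens/sec, benchmark score) - made up
    "R1-0528":     (35.0, 88.0),
    "R1T-Chimera": (62.0, 82.0),
}

def pareto_frontier(points: dict[str, tuple[float, float]]) -> set[str]:
    """A model is on the frontier if no other model is at least as fast AND
    at least as smart, with at least one of the two strictly better."""
    frontier = set()
    for name, (speed, score) in points.items():
        dominated = any(
            s >= speed and q >= score and (s > speed or q > score)
            for other, (s, q) in points.items() if other != name
        )
        if not dominated:
            frontier.add(name)
    return frontier

print(sorted(pareto_frontier(models)))  # ['R1-0528', 'R1T-Chimera']
```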

TNG Technology Consulting GmbH org

Right now, the versions we have that appear to be significantly smarter than R1T Chimera are also slower.

They are approximately like the "old" R1 in thinking speed, but appear to be significantly smarter, albeit not R1-0528-level.

So, if all plays out, you would have a model that is about as fast as R1, i.e. a lot faster than R1-0528, and smarter than R1, though not as smart as R1-0528.

Is that interesting?

Interested

TNG Technology Consulting GmbH org

We're looking for feedback while scanning the model-space. For those interested in trying the latest variant of DeepSeek-R1T-0528-Chimera, you can chat with the model here:

R1T-0528-Chimera beta access

You first need to switch the selected model:
[Screenshot from 2025-06-25: switching the selected model in the chat interface]

Please note: it's a research prototype with certain limitations (think-tag consistency below 100%, some hallucinations).
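
For a concrete sense of what that first limitation means, a check of this kind might look like the following minimal sketch; the `<think>`/`</think>` tag pair and the exactly-one-block criterion are assumptions, not TNG's actual evaluation harness:

```python
# Minimal sketch of a think-tag consistency check over sampled outputs.
# The tag names and the "exactly one well-ordered block" pass criterion
# are assumptions, not the actual evaluation used by TNG.
def think_tags_consistent(text: str) -> bool:
    """True if the output has exactly one <think> block, opened before closed."""
    opens = text.count("<think>")
    closes = text.count("</think>")
    return opens == closes == 1 and text.index("<think>") < text.index("</think>")

samples = [
    "<think>plan the answer...</think>The answer is 42.",  # well-formed
    "The answer is 42.",                                   # missing tags
]
rate = sum(think_tags_consistent(s) for s in samples) / len(samples)
print(f"think-tag consistency: {rate:.0%}")  # -> 50%
```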

Send your feedback by mail to research at tngtech.com; we're looking forward to it!
