Any plans to release an updated version based on DeepSeek-V3-0526 + R1, or how to create the merge myself?

#4
by Lissanro - opened

I wonder if there are any plans to release a new version that would be based on DeepSeek-V3-0526 and R1?

Or, alternatively, are there instructions on how I can create an updated merge myself?

I assume DeepSeek-V3-0526 still has the same architecture, so if there are instructions on how to create the merge, it would help a lot to save bandwidth, assuming I already have the full versions of V3-0526 and R1 to merge and quantize myself. (Technically, I just found out about the new version at https://www.reddit.com/r/LocalLLaMA/comments/1kvpwq3/deepseek_v3_0526/ and have not yet begun downloading V3-0526, but I already have R1.) So I am trying to figure out whether it is better to wait for an updated R1T Chimera, or to download V3-0526 and try to make the merge myself, if that is possible.
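
For reference, the simplest form of such a merge is a linear interpolation of the two checkpoints' weights. Below is a minimal sketch, assuming both models share identical tensor names and shapes; the paths, shard names, and blend ratio are illustrative assumptions, not TNG's actual recipe:

```python
# Minimal sketch of a naive linear weight merge between two same-architecture
# checkpoints. All paths and ALPHA are illustrative assumptions; real DeepSeek
# checkpoints are sharded, so this loop would run once per matching shard pair.
from safetensors.torch import load_file, save_file

ALPHA = 0.5  # assumed blend ratio: 0.0 = pure V3-0526, 1.0 = pure R1

v3 = load_file("v3-0526/shard-00001.safetensors")  # hypothetical shard path
r1 = load_file("r1/shard-00001.safetensors")       # must share keys and shapes

merged = {}
for name, w_v3 in v3.items():
    w_r1 = r1[name]
    # Interpolate in float32 to limit rounding error, then cast back.
    blend = (1.0 - ALPHA) * w_v3.float() + ALPHA * w_r1.float()
    merged[name] = blend.to(w_v3.dtype)

save_file(merged, "merged/shard-00001.safetensors")
```

Note that TNG's actual construction method may treat different tensor groups differently; a uniform blend like this is only the simplest possible starting point.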

TNG Technology Consulting GmbH org

Hi, no worries. If DeepSeek releases something new, and if the technical parameters of the new release are within range, of course we will create a gaggle of variations :-). Significant testing will have to be done, because even if these are research prototypes, it's better to know than to know not.

For example, on chutes.ai alone, the R1T Chimera is ranked as the third most popular model, after V3-0324 and R1. The Chimera is currently used on 22 instances, i.e. deployed on 176 H200 GPUs (8 per instance), and processes 4.7B tokens per day. Any new version should either be clearly better than the Chimera, or be marked as serving a different purpose.

@TNGHK Can you merge the new R1?

TNG Technology Consulting GmbH org

Hi there,

Of course. Our colleague Benjamin already created a first R1-0528-Chimera this evening. I am currently testing it, and judging from the slow speed of the cluster, I guess some of the other TNGlers must be testing it, too :-).

Preliminary result: it appears to be quite well-behaved. Personally, I consider that already a success; after all, it is still a bit of a miracle that the generated child LLMs are functional.

Whether it offers a performance benefit or interesting behaviours, we cannot say yet.

Cheers,
Henrik (and Robert, and Benjamin et al)

PS: It would be nice to have more GPUs.

Or, alternatively, are there instructions on how I can create an updated merge myself?

It's discussed here

Hey, I don't want to sound ungrateful or impatient (ok, maybe I'm a little impatient, lol), but it's been almost 4 weeks since you mentioned you had the first R1-0528-Chimera, so:

  • How's the testing going on it?
  • When can we expect a release?
TNG Technology Consulting GmbH org

Good questions.

How big should the performance increment be, in your opinion, to justify an R1T-0528 release? E.g. in standard benchmarks such as AIME24, AIME25, GPQA, and HLE, or in benchmarks that you like?

This assumes roughly similar speed / token count to the original R1T.

Thanks for your feedback.

R1-0528 is more Gemini-like than V3-0324, which is more GPT-like.

in your opinion, to justify...

Objectively, the knowledge cutoff date would be advanced. (Or at least one would assume so, as this info seems difficult to find.)

How big should the performance increment be, in your opinion, to justify a R1T-0528 release?

The greatest advantage of R1-0528 over R1, for me, is the multi-turn coherency. R1T was itself a significant step forward over R1 in this way. I'm not sure of a good benchmark to capture that, but I'd naively hope that the same mechanism that improved R1T over R1 in that area would further improve R1T-0528 over R1-0528 in multi-turn.

How big should the performance increment be, in your opinion, to justify a R1T-0528 release?

Tbh, any improvement that has statistical significance will be worth it, as it will push the Pareto frontier of speed vs. intelligence among open-weight models.
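
In other words, a model earns a spot on the frontier if no other model is both at least as fast and at least as smart. A minimal sketch of that dominance test, with made-up speed/score numbers purely for illustration (none of these figures are measurements):

```python
# Sketch of the speed-vs-intelligence Pareto frontier. The model names map to
# the discussion, but all numbers below are invented for illustration only.
models = {
    "R1":          (60.0, 79.0),  # (tokens/sec, benchmark score) - made up
    "R1-0528":     (35.0, 88.0),
    "R1T-Chimera": (62.0, 82.0),
}

def pareto_frontier(points: dict[str, tuple[float, float]]) -> set[str]:
    """A model is on the frontier if no other model is at least as fast AND
    at least as smart, with at least one of the two strictly better."""
    frontier = set()
    for name, (speed, score) in points.items():
        dominated = any(
            s >= speed and q >= score and (s > speed or q > score)
            for other, (s, q) in points.items() if other != name
        )
        if not dominated:
            frontier.add(name)
    return frontier

print(sorted(pareto_frontier(models)))  # ['R1-0528', 'R1T-Chimera']
```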

TNG Technology Consulting GmbH org

Right now, the versions we have that appear to be significantly smarter than R1T Chimera are also slower.

They are approximately like the "old" R1 in thinking speed, but appear to be significantly smarter, albeit not R1-0528-level.

So, if all plays out, you would have a model that is about as fast as R1, i.e. a lot faster than R1-0528, and smarter than R1, though not as smart as R1-0528.

Is that interesting?

Interested

TNG Technology Consulting GmbH org

We're looking for feedback while scanning the model-space. For those interested in trying the latest variant of DeepSeek-R1T-0528-Chimera, you can chat with the model here:

R1T-0528-Chimera beta access

You first need to switch the selected model:
[Screenshot from 2025-06-25: switching the selected model in the chat interface]

Please note: it's a research prototype with certain limitations (think-tag consistency below 100%, some hallucinations).
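
For a concrete sense of what that first limitation means, a check of this kind might look like the following minimal sketch; the `<think>`/`</think>` tag pair and the exactly-one-block criterion are assumptions, not TNG's actual evaluation harness:

```python
# Minimal sketch of a think-tag consistency check over sampled outputs.
# The tag names and the "exactly one well-ordered block" pass criterion
# are assumptions, not the actual evaluation used by TNG.
def think_tags_consistent(text: str) -> bool:
    """True if the output has exactly one <think> block, opened before closed."""
    opens = text.count("<think>")
    closes = text.count("</think>")
    return opens == closes == 1 and text.index("<think>") < text.index("</think>")

samples = [
    "<think>plan the answer...</think>The answer is 42.",  # well-formed
    "The answer is 42.",                                   # missing tags
]
rate = sum(think_tags_consistent(s) for s in samples) / len(samples)
print(f"think-tag consistency: {rate:.0%}")  # -> 50%
```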

Send your feedback by mail to research at tngtech.com; we're looking forward to it!
