Wordstorms difference and AMD inference

#1
by SzilviaB - opened

Hi David,

What is the difference between all the Wordstorm models? The descriptions don't seem to go into detail.

Have you had the chance to compare the same model on an NVIDIA vs an AMD card, and if so, was there any difference in output quality?

Thank You

Owner

Hey;

Wordstorm (orig. Mistral Nemo, 12B), Wordstorm Dark Planet (Llama 3, 8B), and Fiction on Fire (Qwen3, 4B) are all based on "evolving"
a model through merging and density changes using MergeKit.

Setting density below 1 activates random pruning, which changes the model.

This creates an array of new models, which can be used in new merges or MoEs, by themselves, merged together (combining best traits), and so on.
Each can also be used to evolve new models using density again => i.e. the best trait(s) are used to further evolve those traits.

I.e.: if you are looking for specific prose traits and two models from the series have them -> merge those together and/or evolve them further.

Although this is called "random pruning", I refer to it as "Merge Gambling".
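The density / random-pruning idea can be sketched in a few lines. This is a DARE-style illustration, not MergeKit's actual code: each weight delta from the finetune survives with probability equal to the density, and survivors are rescaled by 1/density. The function name and toy weight lists are made up for the example.

```python
import random

def density_prune_merge(base, finetuned, density, seed=0):
    """DARE-style sketch: each delta from the finetune survives with
    probability `density`; survivors are rescaled by 1/density so the
    merge is unchanged in expectation."""
    rng = random.Random(seed)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < density:
            merged.append(b + delta / density)  # kept (rescaled) delta
        else:
            merged.append(b)                    # pruned: fall back to base weight
    return merged

base = [0.0, 0.0, 0.0, 0.0]
tuned = [1.0, -2.0, 0.5, 3.0]

# density = 1.0 keeps every delta -> reproduces the finetune exactly
print(density_prune_merge(base, tuned, 1.0))

# density < 1 with different seeds yields different models -- the "gambling" part
print(density_prune_merge(base, tuned, 0.5, seed=1))
print(density_prune_merge(base, tuned, 0.5, seed=2))
```

Because the pruning mask depends on the random seed, every run at density < 1 can produce a distinct model from the same two parents, which is what makes each Wordstorm in the series slightly different.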

(See the Qwen 3 series for more details, merge formulas, and methods.)
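For readers unfamiliar with the format, a MergeKit config using density looks roughly like this. The model names below are placeholders and the density/weight values are illustrative; the actual recipes for these series are in the respective model cards, not here:

```yaml
# Hypothetical dare_ties merge; density < 1 triggers random pruning
merge_method: dare_ties
base_model: org/base-model          # placeholder name
models:
  - model: org/finetune-a           # placeholder name
    parameters:
      density: 0.6                  # keep ~60% of deltas, prune the rest at random
      weight: 1.0
  - model: org/finetune-b
    parameters:
      density: 0.4
      weight: 0.5
dtype: bfloat16
```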

In terms of differences -> each model will differ slightly in generation and instruction following.
To see the full differences:

Test each model with the same quant and same settings at TEMP=0, using a "creative prompt".
This will show the net core changes -> i.e. prose, word choice, bias, etc.

There seem to be about 11 Wordstorms; which one would you recommend trying out?

Are you using MergeKit on this website or locally?

Try 10 or 11; a number of users have remarked that these are some of the best.

MergeKit; local gen.

thanks
