Model Stock:
- SicariusSicariiStuff/Negative_LLAMA_70B
- Saxo/Linkbricks-Horizon-AI-Llama-3.3-Japanese-70B-sft-dpo-base
Base Model:
- Doctor-Shotgun/L3.3-70B-Magnum-Nexus
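For anyone unfamiliar with model stock, here is a rough sketch of the idea on toy weight vectors. It is not the recipe used to build this model. The function name `model_stock` and the flat lists are hypothetical, and averaging the pairwise cosines is my own simplification. The interpolation ratio is the one from the Model Stock paper, t = N·cos / (1 + (N-1)·cos); as far as I understand it, real merge tools apply it per weight tensor.

```python
import math


def model_stock(base, finetuned):
    """Toy Model Stock merge over flat weight lists (needs at least 2 models)."""
    n = len(finetuned)
    # Task vectors: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]

    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0

    # Assumption: use the mean pairwise cosine between task vectors.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    cos_theta = sum(cosine(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)

    # Interpolation ratio between the fine-tuned average and the base.
    t = n * cos_theta / (1 + (n - 1) * cos_theta)

    average = [sum(column) / n for column in zip(*finetuned)]
    return [t * a + (1 - t) * b for a, b in zip(average, base)]
```

When the fine-tuned models agree on a direction, the result leans toward their average. When their changes are orthogonal, it falls back to the base weights.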
Initial Try
This is a model stock merge that starts from Magnum Nexus as the base. Nexus seems fairly solid but overcompliant, and there aren't many good base-model options for Llama 3.3, so this merge probably doesn't rein in the overcompliance. I've mostly been waiting for something new and interesting to come along, but it seems like Llama 3 is still somehow the best we have at this size. There's not a ton to say here: this is a random selection of models, and the merge performed better than its individual parts in my test. All of the usual problems are present, but the model is coherent. It has a bit of that typical annoying preachiness at times, though it's not as overbearing as in any of the original models. The model showed unusually good recall in vibe testing, which is where the name comes from.
Format:
This uses the stock Llama 3 instruct format:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
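If your frontend doesn't apply the template for you, a minimal sketch of building the same prompt by hand could look like this. The helper name `build_llama3_prompt` is hypothetical. The official Llama 3 template puts a blank line after each header, which the sketch encodes as `\n\n`.

```python
def build_llama3_prompt(system, user):
    """Assemble a single-turn Llama 3 instruct prompt string."""

    def turn(role, content):
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    return (
        "<|begin_of_text|>"
        + turn("system", system)
        + turn("user", user)
        # Leave the assistant header open so the model writes the reply.
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_llama3_prompt(
    "You are a helpful AI assistant for travel tips and recommendations",
    "What can you help me with?",
)
```

Many tokenizers add `<|begin_of_text|>` on their own, so drop it from the string if your stack already inserts a BOS token.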
You will very likely get good results with ChatML as well, which might give you a more interesting experience, since merges tend to pick up ChatML in passing. I'd recommend sampling at a temperature of 1.1 with a top-k of ~100 and a small min-p of around 0.01-0.02.
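To make those sampler settings concrete, here is a small pure-Python sketch of what temperature, top-k, and min-p do to a list of logits. The function `sample_token` is hypothetical and the order (temperature, then top-k, then min-p) is an assumption. Backends apply samplers in different orders, so results will not match any particular one exactly.

```python
import math
import random


def sample_token(logits, temperature=1.1, top_k=100, min_p=0.02, rng=random):
    """Sample one token index using temperature, top-k, then min-p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Min-p: drop tokens below min_p times the probability of the top token.
    cutoff = min_p * probs[ranked[0]]
    kept = [i for i in ranked if probs[i] >= cutoff]

    # random.choices normalizes the surviving weights before drawing.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

A temperature above 1 flattens the distribution a little, top-k caps how many candidates survive, and the small min-p trims only the very unlikely tail relative to the best token.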