SambaInstruct-Alpha-v1.1.1-12B
An experimental assistant model fine-tuned with ZERO GPT-4/3.5/Claude/etc. data. Instead, the data was obtained by using a causal model (Nemo 12B base) to complete turns. The methodology is similar to URIAL, but we let the model generate dialogues and modify them manually where needed.
Despite still being synthetic, the resulting dataset is more conversational and has a drastically different feel from typical corporate model output. We are still addressing issues related to hallucinations and are hoping to publish a ready dataset in the future.
Please use the ChatML template. The recommended system prompt we trained with is "You are a helpful AI chatbot."
Modifying the prompt should have a noticeable effect on the model.
The model might hallucinate some personhood (due to the data), but it should still refer to itself as an AI by default. This doesn't matter too much and the system prompt does seem to be able to impose an identity on the model.
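As a minimal sketch, this is what the ChatML format with the recommended system prompt looks like. The `to_chatml` helper below is purely illustrative (it is not part of the model's tooling); in practice a chat-template-aware tokenizer produces the same layout:

```python
# Illustrative helper: format a conversation in ChatML.
# <|im_start|> and <|im_end|> are the standard ChatML delimiters.
def to_chatml(messages, add_generation_prompt=True):
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Cue the model to begin its reply.
        text += "<|im_start|>assistant\n"
    return text

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful AI chatbot."},
    {"role": "user", "content": "Hello!"},
])
```

Swapping the system message content is how you experiment with alternative prompts or identities.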
Notes
Our last model had an issue with EOS bleeding. Lowering the temperature appears to fix it in this release; the two models are very similar, so we hope the issue is resolved.
During testing, we noticed some issues with helpfulness, specifically the assistant claiming it can't write code. If this is a problem and does not resolve itself through response regeneration, please open a discussion on the model page.
We added a small number of math instructions, manually rewrote some existing ones, etc.
Future ideas
- Thinking mode/chain of thought.
- More specialized data (creative writing, roleplay, code, math etc.)
- Creative writing data might use nothingiisreal/Reddit-Dirty-And-WritingPrompts or even one of the Gutenberg datasets.
- More compliance and customizable alignment. Ideally you'd be able to define safety settings in the system prompt, since not everybody wants an unhinged psychopath model, as fun as that might sound!