SambaInstruct-Alpha-v1.1-12B

An experimental assistant model fine tuned with ZERO GPT-4/3.5/Claude/etc. data. Instead, data was obtained by using a causal model (Nemo 12B base) to complete turns. Methodology used to obtain data is similar to URIAL. We let the model generate examples.

Despite still being synthetic, the resulting dataset is more conversational and has a drastically different feel from typical corporate model output. We are still addressing issues related to hallucinations and are hoping to publish a ready dataset in the future.

Please use the ChatML template. The recommended prompt we trained is You are a helpful AI chatbot. and it probably will work best, but make sure to test different configurations.

The model might hallucinate some personhood (due to the data), but it should still refer to itself as an AI. This doesn't matter too much and the system prompt does seem to be able to impose an identity on the model.

Findings and Information

As always, we use Unsloth with Google Colab. For this reason, we train a QLoRA. The model works rather nicely so it doesn't matter so much.

Nemo trained WAY better than Qwen2.5. It seems this will the base we will use from now on. It also feels also slightly more uncensored and human-like in its responses. According to the technical report Qwen2.5 uses synthetic data (here definitely OpenAI's) in the pretrain so this is unsurprising at all xD

We modified the dataset a bit, adding more data and revising the existing. We added some more multi-turn conversation.

If you encounter EOS bleeding, lower the model temperature and increase Min-P, lower Top-K etc.

Future ideas

Thinking mode/chain of thought.
More specialized data (creative writing, roleplay, code, math etc.)
- Creative writing data might use nothingiisreal/Reddit-Dirty-And-WritingPrompts or even one of the Gutenberg datasets.
More compliance and customizable alignment. Ideally you'd be able to define safety settings in the system prompt, since not everybody wants an unhinged psychopath model, as fun as wouldn't sound!

toasterai
/

SambaInstruct-Alpha-v1.1-12B

SambaInstruct-Alpha-v1.1-12B

Findings and Information

Future ideas

Model tree for toasterai/SambaInstruct-Alpha-v1.1-12B

Collection including toasterai/SambaInstruct-Alpha-v1.1-12B

SambaInstruct