---
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
library_name: transformers
tags:
  - mergekit
  - axolotl
  - unsloth
  - roleplay
  - conversational
datasets:
  - PygmalionAI/PIPPA
  - Alfitaria/nemotron-ultra-reasoning-synthkink
  - PocketDoc/Dans-Prosemaxx-Gutenberg
  - FreedomIntelligence/Medical-R1-Distill-Data
  - cognitivecomputations/SystemChat-2.0
  - allenai/tulu-3-sft-personas-instruction-following
  - kalomaze/Opus_Instruct_25k
  - simplescaling/s1K-claude-3-7-sonnet
  - ai2-adapt-dev/flan_v2_converted
  - grimulkan/theory-of-mind
  - grimulkan/physical-reasoning
  - nvidia/HelpSteer3
  - nbeerbower/gutenberg2-dpo
  - nbeerbower/gutenberg-moderne-dpo
  - nbeerbower/Purpura-DPO
  - antiven0m/physical-reasoning-dpo
  - allenai/tulu-3-IF-augmented-on-policy-70b
  - NobodyExistsOnTheInternet/system-message-DPO
---

# Q3-8B-Kintsugi

*Sketch of a kitsune hugging a fox plushie on her bed. Generated with Midjourney v7. (Get it? Because "Kintsugi" sounds like "kitsune"? Hahaha-)*

## Overview

Q3-8B-Kintsugi is a roleplaying model finetuned from Qwen3-8B-Base.

During testing, Kintsugi punched well above its weight class in terms of parameters, especially for 1-on-1 roleplaying and general storywriting.

## Quantizations

- EXL3:
- GGUF:
- MLX:

## Usage

- Format is plain old ChatML. (Please note that, unlike regular Qwen 3, you do not need to prefill empty think tags to keep it from reasoning -- see below.)

- Settings used by testers varied, but we generally stayed around 0.9 temperature and 0.1 min-p. Do not use repetition penalties (DRY included); they break the model.

- Any system prompt can likely be used, but I used the Shingame system prompt (link will be added later, I promise).

- The official instruction-following version of Qwen3-8B was not used as a base. Instruction following was trained in post-hoc, and "thinking" traces were not included. As a result, "thinking" will not function.
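To make the prompt format concrete, here is a minimal sketch of how a ChatML transcript is assembled. The `to_chatml` helper and the example roles are illustrative only; in practice the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) does this for you:

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML transcript."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn for generation. Note: no empty
        # <think></think> prefill is needed, unlike stock Qwen3.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Mira, a sardonic tavern keeper."},
    {"role": "user", "content": "I push open the tavern door."},
])
print(prompt)
```

When sending this to a backend, the settings above translate to roughly `temperature=0.9`, `min_p=0.1`, and repetition penalty left at its neutral value.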

## Training Process

1. The base model first went through a supervised finetune on a corpus of instruction-following data, roleplay conversations, and human writing based on the Ink/Bigger Body/Remnant lineage.

2. A KTO reinforcement learning phase then steered the model away from the very purple prose the initial merge had, and improved its logical and spatial reasoning and sense of overall "intelligence".

Both stages mirror those used for Q3-30B-A3B-Designant, which went through the same process with the same data.
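For readers unfamiliar with stage 1, an Axolotl SFT run is driven by a YAML config. The fragment below is a heavily abridged sketch, not the actual training recipe: every hyperparameter is a placeholder, and the single dataset entry stands in for the full mix listed in the metadata above.

```yaml
base_model: Qwen/Qwen3-8B-Base
# Placeholder dataset entry -- the real run mixed many corpora.
datasets:
  - path: PygmalionAI/PIPPA
    type: chat_template
sequence_len: 8192              # placeholder
micro_batch_size: 1             # placeholder
gradient_accumulation_steps: 8  # placeholder
num_epochs: 2                   # placeholder
learning_rate: 1e-5             # placeholder
optimizer: adamw_torch
lr_scheduler: cosine
```

The KTO stage (run with Unsloth and TRL rather than Axolotl) instead consumes preference-labeled examples and nudges the model toward the desirable ones.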

## Credits

- Fizz - Training, Data Wrangling

- Toaster, Mango, Bot, probably others I forgot ;-; - Testing

- inflatebot - Original Designant model card that this one was yoinked from

- Artus - Funding

- Alibaba - Making the original model

- Axolotl, Unsloth, Huggingface - Making the frameworks used to train this model (Axolotl was used for the SFT process, and Unsloth+TRL was used for the KTO process)

- All quanters, inside and outside the org, specifically Artus, Lyra, and soundTeam/Heni

We would like to thank the Allura community on Discord, especially Curse, Heni, Artus and Mawnipulator, for their companionship and moral support. You all mean the world to us <3