---
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
library_name: transformers
tags:
- mergekit
- axolotl
- unsloth
- roleplay
- conversational
datasets:
- PygmalionAI/PIPPA
- Alfitaria/nemotron-ultra-reasoning-synthkink
- PocketDoc/Dans-Prosemaxx-Gutenberg
- FreedomIntelligence/Medical-R1-Distill-Data
- cognitivecomputations/SystemChat-2.0
- allenai/tulu-3-sft-personas-instruction-following
- kalomaze/Opus_Instruct_25k
- simplescaling/s1K-claude-3-7-sonnet
- ai2-adapt-dev/flan_v2_converted
- grimulkan/theory-of-mind
- grimulkan/physical-reasoning
- nvidia/HelpSteer3
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
- nbeerbower/Purpura-DPO
- antiven0m/physical-reasoning-dpo
- allenai/tulu-3-IF-augmented-on-policy-70b
- NobodyExistsOnTheInternet/system-message-DPO
---

# Q3-8B-Kintsugi

![Sketch drawing of a kitsune hugging a fox plushie on her bed. Generated with Midjourney v7](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/o_fhP0riFrKh-5XyPxQyk.png)

get it? because kintsugi sounds like kitsune? hahaha-

# Overview

***Q3-8B-Kintsugi*** is a roleplaying model finetuned from [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base).

During testing, Kintsugi punched well above its weight class for its parameter count, especially for 1-on-1 roleplaying and general storywriting.

# Quantizations

EXL3:
- [Official EXL3 quant repo](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-EXL3)

GGUF:
- [Official static GGUF quants](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-GGUF)

MLX:
- [8, 6, and 4bpw MLX-format quants by soundTeam](https://huggingface.co/collections/allura-quants/q3-8b-kintsugi-mlx-684fc48444f1214749f538c4)

# Usage

- The prompt format is plain-old ChatML. Please note that, unlike regular Qwen 3, you do *not* need to prefill empty think tags to keep it from reasoning -- see below.
- Settings used by testers varied, but we generally stayed around 0.9 temperature and 0.1 min-p. Do *not* use repetition penalties (DRY included); they break it.
- Any system prompt can likely be used, but I used the Shingame system prompt (link will be added later, I promise).
- The official instruction-following version of Qwen3-8B was not used as a base. Instruction following was trained in post-hoc, and "thinking" traces were not included. __As a result, "thinking" will not function.__

A minimal inference sketch using these settings is included below, after the Training Process section.

# Training Process

1. The [base model](https://huggingface.co/Qwen/Qwen3-8B-Base) first went through a supervised finetune on a corpus of instruction-following data, roleplay conversations, and human writing based on the [Ink](https://huggingface.co/collections/allura-org/ink-6772fd1442308781594bbabb)/[Bigger Body](https://huggingface.co/collections/allura-org/bigger-body-67b277af0861cec33b54745d)/[Remnant](https://huggingface.co/collections/allura-org/remnant-6817c2113bbb2aed501513d0) lineage.
2. Finally, a KTO reinforcement learning phase steered the model away from the very purple prose the initial merge had, and improved its logical and spatial reasoning and its sense of overall "intelligence".

Both stages mirror [Q3-30B-A3B-Designant](https://huggingface.co/allura-org/Q3-30B-A3B-Designant), which went through a very similar process with the same data.
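The exact Unsloth+TRL recipe for the KTO phase is not published here, but as a rough illustration, a KTO pass with TRL's `KTOTrainer` looks something like the sketch below. The checkpoint name, dataset id, column mapping, and hyperparameters are all placeholders, not Kintsugi's actual configuration.

```python
# Hedged sketch of a KTO pass with TRL -- NOT the exact Kintsugi recipe.
# Checkpoint, dataset, and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "your-org/sft-checkpoint"  # hypothetical SFT checkpoint from step 1
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# KTO expects unpaired preference data: a "prompt", a "completion", and a boolean
# "label" marking the completion as desirable (True) or undesirable (False).
dataset = load_dataset("your-org/kto-preferences", split="train")  # hypothetical dataset

args = KTOConfig(
    output_dir="kintsugi-kto",
    beta=0.1,                        # assumed KTO beta, not a published value
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # older TRL versions use `tokenizer=` instead
)
trainer.train()
```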
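And if you just want to chat with the model, here is a minimal `transformers` inference sketch using the ChatML template and the sampler settings from the Usage section. The repo id is inferred from the quant links above, and the system/user messages are placeholders (not the Shingame prompt).

```python
# Minimal inference sketch, assuming a local GPU and the settings from the Usage section.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allura-org/Q3-8B-Kintsugi"  # assumed repo id, inferred from the quant links
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Kitsu, a playful fox-spirit roleplay partner."},  # placeholder
    {"role": "user", "content": "*I wave hello.* Hey Kitsu, what are you up to?"},           # placeholder
]

# ChatML is the chat template; no empty think-tag prefill is needed.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.9,          # tester-recommended ballpark
    min_p=0.1,                # tester-recommended ballpark
    repetition_penalty=1.0,   # leave rep pen / DRY off -- they break the model
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```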
# Credits

- Fizz - Training, Data Wrangling
- Toaster, Mango, Bot, probably others I forgot ;-; - Testing
- inflatebot - Original Designant model card that this one was yoinked from
- Artus - Funding
- Alibaba - Making the original model
- Axolotl, Unsloth, Huggingface - Making the frameworks used to train this model (Axolotl was used for the SFT process, and Unsloth+TRL was used for the KTO process)
- All quanters, inside and outside the org, specifically Artus, Lyra, and soundTeam/Heni

We would like to thank the Allura community on Discord, especially Curse, Heni, Artus, and Mawnipulator, for their companionship and moral support. You all mean the world to us <3