---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-30B-A3B
---
|
|
|
|
|
> [!NOTE]
> **Use the system message "You are an assistant with reasoning capabilities." to consistently trigger Gemini-style thinking.**
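A minimal sketch of wiring that system message into a chat-template-style request. The helper name and user prompt are illustrative; the only detail taken from this card is the exact system message:

```python
# The system message below is the one this card says triggers
# Gemini-style thinking. Everything else here is illustrative.
SYSTEM_PROMPT = "You are an assistant with reasoning capabilities."

def build_messages(user_prompt: str) -> list[dict]:
    """Prepend the reasoning-trigger system message to a user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Explain why the sky is blue.")
```

The resulting list can be passed to any chat-template or OpenAI-compatible inference endpoint serving the model.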
|
|
|
> [!NOTE]
> **I'm working on improving the dataset & model and will release a new, full version.**
|
|
|
|
|
# Training Dataset
|
|
|
- The fine-tuning dataset consists of ~450 diverse examples, 250 of which were generated directly by Gemini 2.5 Pro.
|
|
|
## Trained on:

- The Unsloth version of Qwen3-30B-A3B (instruct).

- 32k seq_len, with examples ranging from ~1k to ~20k tokens.

- Conversations of up to 2 turns.
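The training setup above, summarized as a plain config dict. These are just the values stated in this card, not the actual training script; the base-model identifier is assumed:

```python
# Illustrative summary of the fine-tuning setup described above.
# The model name is an assumption; the numbers come from this card.
training_config = {
    "base_model": "unsloth/Qwen3-30B-A3B",     # Unsloth variant (name assumed)
    "max_seq_length": 32_768,                  # 32k context during fine-tuning
    "example_length_tokens": (1_000, 20_000),  # examples ranged ~1k to ~20k tokens
    "max_conversation_turns": 2,
    "num_examples": 450,                       # ~250 sourced from Gemini 2.5 Pro
}
```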
|
|
|
---
|
|
|
- No benchmark results for now.
|
|
|
**Keep in mind that the model is slightly overfit, since the training dataset was quite small. It can still be used to generate more high-quality examples for further training.**
|
|
|
|
|
 |