---
base_model:
- Qwen/Qwen3-4B-Thinking-2507
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: apache-2.0
language:
- en
---
|
**PPRL-1-Small** is an advanced language model specifically optimized for high-quality writing generation. It is fine-tuned from Qwen3-4B-Thinking-2507 using an Online BNPO (a GRPO variant) training methodology. This approach significantly enhances the model's ability to perform deep thinking, resulting in outputs with superior creativity, logical coherence, and narrative depth.
|
|
|
**Training Procedure** |
|
Preprocessing and initial SFT: Generated 10k creative-writing samples with DeepSeek-R1-0528 and DeepSeek-V3.1, then ran supervised fine-tuning (SFT) on them.
|
|
|
SFT Fine-tuning 2: Ran 1,252 steps of supervised fine-tuning on our own private dataset.
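For reference, here is a minimal sketch of what an SFT stage like this can look like with Unsloth and TRL's `SFTTrainer`. The dataset path, LoRA settings, and most hyperparameters are illustrative placeholders rather than the exact values used for PPRL-1-Small; only the 1,252-step count is taken from above.

```python
# Illustrative SFT sketch (Unsloth + TRL). Dataset path, LoRA config, and most
# hyperparameters are placeholders; only max_steps mirrors the card above.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    max_seq_length=8192,
    load_in_4bit=True,
)
# Attach LoRA adapters (typical Unsloth setup for memory-efficient SFT).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Expects a JSONL file with a "text" column containing chat-formatted samples.
dataset = load_dataset("json", data_files="creative_writing_sft.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        max_steps=1252,  # step count reported for the second SFT stage
        learning_rate=2e-5,
        output_dir="outputs/sft",
    ),
)
trainer.train()
```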
|
|
|
RL Fine-tuning: Online BNPO alignment with Unsloth, using a private critic model to generate critique data and DeepSeek-V3.1 as the reward model.
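As a rough illustration of this stage, the sketch below uses TRL's `GRPOTrainer`; recent TRL versions expose a BNPO-style loss via `loss_type="bnpo"` (treat that flag as an assumption about your TRL version). The reward function is a stub standing in for the private critic model and DeepSeek-V3.1 judge, which are not reproduced here.

```python
# Illustrative online BNPO/GRPO sketch with TRL's GRPOTrainer.
# score_completions is a placeholder; the real pipeline used a private critic
# model and DeepSeek-V3.1 as the reward model. Paths and hyperparameters are
# placeholders as well.
from trl import GRPOTrainer, GRPOConfig
from datasets import load_dataset

def score_completions(completions, **kwargs):
    # Placeholder reward: one scalar per completion (replace with a real judge).
    return [min(len(c.split()) / 500.0, 1.0) for c in completions]

# Expects a dataset with a "prompt" column of writing prompts.
dataset = load_dataset("json", data_files="bnpo_prompts.jsonl", split="train")

trainer = GRPOTrainer(
    model="outputs/sft",  # checkpoint produced by the SFT stage above
    reward_funcs=score_completions,
    train_dataset=dataset,
    args=GRPOConfig(
        loss_type="bnpo",            # BNPO variant of the GRPO objective
        num_generations=8,           # completions sampled per prompt
        per_device_train_batch_size=8,
        max_completion_length=1024,
        output_dir="outputs/bnpo",
    ),
)
trainer.train()
```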
|
|
|
Hardware: a single A800 80GB GPU

Training Time: approximately 72 GPU hours
|
|
|
|
|
**Open-Source Contribution: The qwen3_4 Dataset** |
|
We have open-sourced a portion of the dataset used for the BNPO training phase as qwen3_4. We believe open collaboration is key to progress and invite the community to contribute to and expand this dataset to help advance the state of AI-assisted writing. |
|
You can contribute to the dataset to support our work.
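If you just want to pull the data, something like the following should work once you substitute the actual Hub repository id; the id below is a placeholder, since the full path is not spelled out in this card.

```python
# Placeholder repo id -- replace with the actual Hub path of the qwen3_4 dataset.
from datasets import load_dataset

ds = load_dataset("<namespace>/qwen3_4", split="train")
print(ds[0])
```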
|
|
|
**Uses** |
|
The model is intended for: |
|
|
|
- Creative Writing: Generating stories, poetry, scripts, and other narrative content.

- Long-Form Content Creation: Writing essays, articles, reports, and blog posts with strong logical flow.

- Content Enhancement & Rewriting: Improving the creativity and coherence of existing text.
|
|
|
**IMPORTANT** |
|
For any commercial use, you MUST contact me first; otherwise it will be considered unauthorized use.
|
|
|
**How to Get Started with the Model** |
|
The model works with any standard Transformers inference setup. Use it exactly as you would the base Qwen3-4B-Thinking-2507 model.
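Below is a minimal Transformers example. The repository id is assumed from the developer and model names on this card; adjust it if the model lives under a different path.

```python
# Minimal Transformers inference sketch; the repo id is an assumption based on
# the names in this card and may need to be adjusted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AnonymousCodeX/PPRL-1-Small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper who hears a voice in the fog."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Thinking models emit a reasoning trace before the final answer, so allow a
# generous token budget.
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

As with the base Qwen3 thinking models, the decoded output contains the reasoning trace followed by the final answer; split on the closing think tag if you only want the final text.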
|
|
|
**Future Work** |
|
This is just the beginning. We are continuously working on training larger and more capable models. Stay tuned for more updates! |
|
If you are interested in supporting my work or hiring me for a project, please feel free to contact me via email. I will be sharing my contact details here shortly. |
|
|
|
# Uploaded finetuned model |
|
|
|
- **Developed by:** AnonymousCodeX |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** Qwen/Qwen3-4B-Thinking-2507
|
|
|
This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.