---
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- shining-valiant
- shining-valiant-3
- valiant
- valiant-labs
- qwen
- qwen-3
- qwen-3-4b
- 4b
- reasoning
- code
- code-reasoning
- science
- science-reasoning
- physics
- biology
- chemistry
- earth-science
- astronomy
- machine-learning
- artificial-intelligence
- compsci
- computer-science
- information-theory
- ML-Ops
- math
- cuda
- deep-learning
- transformers
- agentic
- LLM
- neuromorphic
- self-improvement
- complex-systems
- cognition
- linguistics
- philosophy
- logic
- epistemology
- simulation
- game-theory
- knowledge-management
- creativity
- problem-solving
- architect
- engineer
- developer
- creative
- analytical
- expert
- rationality
- conversational
- chat
- instruct
base_model: Qwen/Qwen3-4B
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
- sequelbox/Mitakihara-DeepSeek-R1-0528
- sequelbox/Raiden-DeepSeek-R1
license: apache-2.0
---
**[Support our open-source dataset and model releases!](https://huggingface.co/spaces/sequelbox/SupportOpenSource)**

Shining Valiant 3: [Qwen3-1.7B](https://huggingface.co/ValiantLabs/Qwen3-1.7B-ShiningValiant3), [Qwen3-4B](https://huggingface.co/ValiantLabs/Qwen3-4B-ShiningValiant3), [Qwen3-8B](https://huggingface.co/ValiantLabs/Qwen3-8B-ShiningValiant3)
Shining Valiant 3 is a science, AI design, and general reasoning specialist built on Qwen 3.
- Finetuned on our newest [science reasoning](https://huggingface.co/datasets/sequelbox/Celestia3-DeepSeek-R1-0528) data generated with [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528)!
- AI to build AI: our [high-difficulty AI reasoning](https://huggingface.co/datasets/sequelbox/Mitakihara-DeepSeek-R1-0528) data makes Shining Valiant 3 your friend for building with current AI tech and discovering new innovations and improvements!
- Improved [general and creative reasoning](https://huggingface.co/datasets/sequelbox/Raiden-DeepSeek-R1) to supplement problem-solving and general chat performance.
- Small model sizes allow the model to run locally on desktop and mobile devices, plus super-fast server inference (see the serving sketch below)!
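
For server inference, here is a minimal sketch using vLLM. vLLM is our choice of example here, not a requirement; any Transformers-compatible serving stack works, and the sampling values are illustrative starting points rather than tuned defaults:

```python
# Minimal sketch: serving Shining Valiant 3 with vLLM (assumes `pip install vllm`).
# Illustrative only; not an official Valiant Labs serving recipe.
from vllm import LLM, SamplingParams

llm = LLM(model="ValiantLabs/Qwen3-4B-ShiningValiant3")
params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, max_tokens=4096)

# LLM.chat applies the model's chat template; Qwen 3 enables thinking by default.
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the transformer attention mechanism."}],
    params,
)
print(outputs[0].outputs[0].text)
```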
## Prompting Guide
Shining Valiant 3 uses the [Qwen 3](https://huggingface.co/Qwen/Qwen3-4B) prompt format.
Shining Valiant 3 is a reasoning finetune; **we recommend `enable_thinking=True` for all chats.**
Example inference script to get started:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ValiantLabs/Qwen3-4B-ShiningValiant3"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Propose a novel cognitive architecture where the primary memory component is a Graph Neural Network (GNN). How would this GNN represent working, declarative, and procedural memory? How would the \"cognitive cycle\" be implemented as operations on this graph?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
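
If you want explicit sampling settings rather than the model defaults, here is a hedged variant of the `generate` call above. The values follow the upstream Qwen 3 thinking-mode suggestions to the best of our knowledge; tune them as needed for your workload:

```python
# Hedged sketch: explicit sampling settings for thinking mode. The values are
# taken from the upstream Qwen 3 recommendations (an assumption, not our own
# tuned defaults); reuses `model` and `model_inputs` from the script above.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
```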

Shining Valiant 3 is created by [Valiant Labs](http://valiantlabs.ca/).
[Check out our HuggingFace page to see all of our models!](https://huggingface.co/ValiantLabs)
We care about open source. For everyone to use.