---
base_model:
- google/gemma-3-4b-pt
pipeline_tag: text-generation
library_name: transformers
---
# **thinkygemma-4b: your average fake reasoner**
Fine-tuned from **Gemma-3-4b-pt**  

📌 **Model ID:** `xsanskarx/thinkygemma-4b`  
📌 **Parameters trained:** **1.8 billion**  
📌 **Trained on:** **25k rows of verified Chain-of-Thought (CoT) traces** from **DeepSeek R1** and **Qwen QWQ**  
📌 **Next planned step:** **GRPO**  
📌 **Adapters repo:** `xsanskarx/thinkgemma-4b` (loadable with PEFT; see the sketch below)
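
The merged checkpoint can be used directly, but the adapters can also be attached to the base model with PEFT. The snippet below is a minimal, untested sketch; it assumes the adapter repo follows the standard PEFT layout and uses the `-it` base checkpoint listed under Training Details.

```python
# Minimal sketch: attach the LoRA adapters to the base model with PEFT.
# Assumes the adapter repo follows the standard PEFT layout (untested here).
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration
from peft import PeftModel

base = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-4b-it",  # base checkpoint listed under Training Details
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "xsanskarx/thinkgemma-4b")      # LoRA adapters
tokenizer = AutoTokenizer.from_pretrained("xsanskarx/thinkygemma-4b")   # tokenizer from the merged repo
```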


---

## **Model Description**
This is a **fine-tuned version of Google's Gemma-3-4b-it**, adapted for **structured reasoning / fake induced reasoning**. It is designed to excel at acting like a great reasoner.  

### **Training Details**
- **Hardware:** Single NVIDIA **H100**  
- **Training Time:** **9 hours (1 epoch)**  
- **Training Method:** **LoRA fine-tuning (r = 128, alpha = 256)** (illustrative config below)  
- **Dataset:** **25k CoT traces**  
- **Base Model:** `google/gemma-3-4b-it`  
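
The training script is not included in this card; the snippet below is only an illustrative PEFT `LoraConfig` matching the rank and alpha listed above. The target modules and dropout are assumptions, not values from the original run.

```python
# Illustrative only: a PEFT LoraConfig matching the listed hyperparameters.
# target_modules and lora_dropout are assumed, not taken from the original run.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,                    # LoRA rank
    lora_alpha=256,           # scaling factor (alpha = 2 * r)
    lora_dropout=0.05,        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```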

---



### **Setup**
```python
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration, TextStreamer
import torch

# Load model and tokenizer
model_id = "xsanskarx/thinkygemma-4b"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

def ask_model(prompt: str, max_tokens=8192, temperature=0.7):
    """
    Function to ask a question to the model and stream the response.
    """
    messages = [
        {"role": "system", "content": "You are an expert math problem solver, think and reason inside <think> tags, enclose all reasoning in <think> tags, verifying logic step by step and then return your final structured answer"},
        {"role": "user", "content": prompt}
    ]

    # Build the prompt with the chat template and open the assistant turn
    formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # The chat template already prepends <bos>, so don't add special tokens again
    inputs = tokenizer(formatted_prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

    streamer = TextStreamer(tokenizer, skip_special_tokens=True)
    with torch.inference_mode():
        model.generate(**inputs, max_new_tokens=max_tokens, do_sample=True, temperature=temperature, streamer=streamer)

# Example usage
ask_model("do 2+2")
```
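
Because the system prompt instructs the model to wrap its reasoning in `<think>` tags, the trace can be separated from the final answer after decoding the generated text. A minimal post-processing sketch, assuming the model reliably emits a closing `</think>` tag:

```python
import re

def split_reasoning(text: str):
    """Split decoded model output into (reasoning, answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return None, text.strip()           # no trace found: treat everything as the answer
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()     # whatever follows the closing tag
    return reasoning, answer
```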