SVECTOR-OFFICIAL committed (verified)
Commit 7789304 · Parent(s): 76c8b87

Update README.md

---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: SVECTOR/Theta-35
tags:
- chat
- reasoning
library_name: transformers
---

# Theta-35

## Introduction

Theta-35 is the advanced reasoning model in the Theta series by SVECTOR. It specializes in complex thinking and reasoning, and compared with conventional instruction-tuned models it achieves significantly stronger performance on downstream tasks, particularly on challenging problems that require deep logical analysis and multi-step reasoning.

<p align="center">
    <img width="100%" src="figures/benchmark.jpg">
</p>

**This repo contains the Theta-35 model**, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training (Supervised Finetuning and Reinforcement Learning)
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 35B
- Number of Parameters (Non-Embedding): 33.5B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens
- Sliding Window: 32,768 tokens

**Note:** For the best experience, please review the [usage guidelines](#usage-guidelines) before deploying Theta models.

For more details, please refer to our [documentation](https://www.svector.co.in/models/theta-35).

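These specifications can be confirmed from the repository's `config.json` alone, without downloading any weights. A minimal sketch, assuming the Theta config exposes the attribute names common to `transformers` causal LMs (`num_hidden_layers`, `num_key_value_heads`, and so on); the exact names are an assumption on our part:

```python
from transformers import AutoConfig

# Fetches only config.json; no model weights are downloaded.
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/Theta-35")

# Attribute names assumed to follow common transformers conventions.
print(config.num_hidden_layers)        # expected: 64 layers
print(config.num_attention_heads)      # expected: 40 query heads
print(config.num_key_value_heads)      # expected: 8 KV heads (GQA)
print(config.max_position_embeddings)  # expected: 131072-token context
```
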
## Requirements

Theta-35 requires a recent version of Hugging Face `transformers`; we advise version 4.43.1 or newer.

With older versions of `transformers`, you may encounter the following error:
```
KeyError: 'theta'
```

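To fail fast on an incompatible environment, you can verify the installed version before loading the model. A minimal check using the `packaging` helper, which is a dependency of `transformers` itself:

```python
from packaging import version

import transformers

# Fail fast if the installed transformers cannot resolve the "theta" model type.
if version.parse(transformers.__version__) < version.parse("4.43.1"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for Theta-35; "
        "upgrade with: pip install -U transformers"
    )
```
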
## Quickstart

Here is a code snippet showing how to load the tokenizer and model, and how to generate content:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer directly
model_name = "SVECTOR-CORPORATION/Theta-35"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare prompt
prompt = "How many planets are in our solar system? Explain your reasoning."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True  # automatically adds the "<reasoning>" tag
)

# Generate response (do_sample=True so the sampling parameters take effect)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=30
)
# Strip the prompt tokens, keeping only the newly generated continuation
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode and print response
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

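For interactive use you may prefer to stream tokens as they are produced rather than wait for the full completion. A minimal variant of the snippet above using the `TextStreamer` utility from `transformers`, reusing the `model`, `tokenizer`, and `model_inputs` defined above:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated; the prompt itself is skipped.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=30,
    streamer=streamer,
)
```
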
### Usage Guidelines

To achieve optimal performance with Theta-35, we recommend the following settings:

1. **Enforce Thoughtful Output**: Ensure the model starts with "\<reasoning\>\n" to promote step-by-step thinking, which enhances output quality. If you use `apply_chat_template` with `add_generation_prompt=True`, this is applied automatically.

2. **Sampling Parameters** (see the `GenerationConfig` sketch after this list):
   - Use Temperature=0.6 and TopP=0.95 instead of greedy decoding to avoid repetition.
   - Use TopK between 20 and 40 to filter out rare tokens while maintaining diversity.

3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
   - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
   - **Multiple-Choice Questions**: Add "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`." to the prompt.

4. **Handle Long Inputs**: For inputs exceeding 32,768 tokens, enable sliding window attention so the model processes long sequences efficiently (see the `config.json` snippet below).

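The sampling recommendations in guideline 2 can be baked into the model's default generation settings so that every `generate` call picks them up. A minimal sketch; `top_k=30` is our choice from the recommended 20-40 range:

```python
from transformers import GenerationConfig

# Defaults applied to every model.generate() call that does not override them.
model.generation_config = GenerationConfig(
    do_sample=True,    # sampling instead of greedy decoding (guideline 2)
    temperature=0.6,
    top_p=0.95,
    top_k=30,          # within the recommended 20-40 range
    max_new_tokens=32768,
)
```
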
For guideline 4, in supported frameworks you can add the following to `config.json` to enable extended context handling:
```json
{
    ...,
    "use_sliding_window": true,
    "sliding_window": 32768
}
```

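If you prefer not to edit `config.json`, the same attributes can usually be overridden at load time, since `from_pretrained` forwards unrecognized keyword arguments to the model config. A minimal sketch, assuming the Theta config honors the two attributes shown above:

```python
from transformers import AutoModelForCausalLM

# Override sliding-window settings at load time instead of editing config.json.
model = AutoModelForCausalLM.from_pretrained(
    "SVECTOR-CORPORATION/Theta-35",
    torch_dtype="auto",
    device_map="auto",
    use_sliding_window=True,   # attribute names taken from the snippet above
    sliding_window=32768,
)
```
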
## Evaluation & Performance

Theta-35 demonstrates exceptional performance across various reasoning tasks, including:

- Mathematical reasoning
- Logical deduction
- Multi-step problem solving
- Code understanding and generation
- Scientific reasoning

Detailed evaluation results are reported in our [documentation](https://www.svector.co.in/models/theta-35).

## Citation

If you find our work helpful, feel free to cite us:

```bibtex
@misc{theta35,
    title  = {Theta-35: Advanced Reasoning in Large Language Models},
    url    = {https://www.svector.co.in/models/theta-35},
    author = {SVECTOR Team},
    month  = {March},
    year   = {2025}
}

@article{theta,
    title  = {Theta Technical Report},
    author = {SVECTOR Research Team},
    year   = {2025}
}
```