|
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: SVECTOR/Theta-35
tags:
- chat
- reasoning
library_name: transformers
---
|
|
|
# Theta-35 |
|
|
|
## Introduction |
|
|
|
Theta-35 is the advanced reasoning model in the Theta series by SVECTOR. It specializes in complex thinking and reasoning, and compared with conventional instruction-tuned models it achieves significantly better performance on downstream tasks, particularly challenging problems that require deep logical analysis and multi-step reasoning.
|
|
|
<p align="center">
  <img width="100%" src="figures/benchmark.png">
</p>
|
|
|
**This repo contains the Theta-35 model**, which has the following features: |
|
- Training Stage: Pretraining & Post-training (Supervised Finetuning and Reinforcement Learning) |
|
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias |
|
- Number of Parameters: 35B |
|
- Number of Parameters (Non-Embedding): 33.5B |
|
- Number of Layers: 64 |
|
- Number of Attention Heads (GQA): 40 for Q and 8 for KV |
|
- Context Length: Full 131,072 tokens |
|
- Sliding Window: 32,768 tokens |
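
Once the requirements below are met, you can confirm these architecture details programmatically from the model configuration, for example:

```python
from transformers import AutoConfig

# Inspect the model configuration (layer count, heads, context length, ...)
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/Theta-35")
print(config)
```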
|
|
|
**Note:** For the best experience, please review the [usage guidelines](#usage-guidelines) before deploying Theta models. |
|
|
|
For more details, please refer to our [documentation](https://www.svector.co.in/models/theta-35). |
|
|
|
## Requirements |
|
|
|
Theta-35 requires a recent version of Hugging Face `transformers`; we advise version 4.43.1 or newer.
|
|
|
With older versions of transformers, you may encounter the following error: |
|
```
KeyError: 'theta'
```
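
If needed, upgrade with:

```
pip install "transformers>=4.43.1"
```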
|
|
|
## Quickstart |
|
|
|
Here is a code snippet showing how to load the tokenizer and model, and how to generate content: |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer directly
model_name = "SVECTOR-CORPORATION/Theta-35"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare prompt
prompt = "How many planets are in our solar system? Explain your reasoning."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True  # automatically adds the "<reasoning>" tag
)

# Generate response
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # enable sampling so temperature/top_p/top_k take effect
    temperature=0.6,
    top_p=0.95,
    top_k=30
)
# Strip the prompt tokens, keeping only the newly generated ones
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode and print response
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
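
Since Theta-35 can emit long reasoning traces, you may prefer to stream tokens as they are generated rather than waiting for the full output. A minimal sketch using the `TextStreamer` utility from `transformers`, reusing `model`, `tokenizer`, and `model_inputs` from above:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=30,
    streamer=streamer
)
```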
|
|
|
### Usage Guidelines |
|
|
|
To achieve optimal performance with Theta-35, we recommend the following settings: |
|
|
|
1. **Enforce Thoughtful Output**: Ensure the model's output starts with "\<reasoning\>\n" to promote step-by-step thinking, which improves output quality. If you use `apply_chat_template` with `add_generation_prompt=True`, this is applied automatically.
|
|
|
2. **Sampling Parameters**: |
|
   - Use Temperature=0.6 and TopP=0.95 instead of greedy decoding to avoid repetition.
   - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining diversity.
|
|
|
3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking (see the sketch after this list).
   - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
   - **Multiple-Choice Questions**: Add "Please show your choice in the `answer` field with only the choice letter, e.g., `\"answer\": \"C\"`." to the prompt.
|
|
|
4. **Handle Long Inputs**: For inputs exceeding 32,768 tokens, enable sliding window attention to improve the model's ability to process long sequences efficiently. |
|
|
|
For supported frameworks, you can add the following to `config.json` to enable extended context handling:
```json
{
    ...,
    "use_sliding_window": true,
    "sliding_window": 32768
}
```
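
As an illustration of guideline 3, the standardized instructions can be appended to the raw question before applying the chat template. The helper names below are illustrative, not part of the model's API:

```python
# Hypothetical helpers that wrap a raw question with the standardized
# benchmarking instructions recommended above.
def math_prompt(question: str) -> str:
    return question + "\nPlease reason step by step, and put your final answer within \\boxed{}."

def multiple_choice_prompt(question: str) -> str:
    return question + '\nPlease show your choice in the `answer` field with only the choice letter, e.g., "answer": "C".'

messages = [{"role": "user", "content": math_prompt("Solve for x: 2x + 3 = 11.")}]
# ...then proceed exactly as in the Quickstart: apply_chat_template, generate, decode.
```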
|
|
|
## Evaluation & Performance |
|
|
|
Theta-35 demonstrates strong performance across a range of reasoning tasks, including:
|
|
|
- Mathematical reasoning |
|
- Logical deduction |
|
- Multi-step problem solving |
|
- Code understanding and generation |
|
- Scientific reasoning |
|
|
|
Detailed evaluation results are reported in our [documentation](https://www.svector.co.in/models/theta-35). |
|
|
|
## Citation |
|
|
|
If you find our work helpful, feel free to cite us.
|
|
|
```
@misc{theta35,
    title  = {Theta-35: Advanced Reasoning in Large Language Models},
    url    = {https://www.svector.co.in/models/theta-35},
    author = {SVECTOR Team},
    month  = {March},
    year   = {2025}
}

@article{theta,
    title  = {Theta Technical Report},
    author = {SVECTOR Research Team},
    year   = {2025}
}
```