---
language:
- ru
license: apache-2.0
base_model:
- Qwen/Qwen3-32B
pipeline_tag: text-generation
library_name: transformers
---
# T-pro-it-2.0

**🚨 Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.**

## Description

T-pro-it-2.0 is built on the Qwen 3 model family (base model: Qwen3-32B) and combines continual pre-training with alignment techniques.

### 📚 Dataset

Instruction Pre-Training: 
40B tokens of instruction data, with one-third focused on reasoning tasks.

Supervised Fine-Tuning (SFT): 
~500K high-quality and diverse instructions with balanced complexity. Reasoning tasks make up about 20% of the dataset.

Preference Tuning: 
~100K carefully selected instructions, filtered by length and type for general tasks and with domain-balanced selection for reasoning tasks.

## 📊 Benchmarks

| Model                              | MERA | ruMMLU | Ru Arena Hard | ru AIME 2025 | ru LCB |
|------------------------------------|:----:|:------:|:-------------:|:------------:|:------:|
| **T-pro 2.0**                      | **0.660** | **0.790** | **0.876** | **0.646** | **0.563** |
| Qwen 3 32B                         | 0.584 | 0.740 | 0.836 | 0.625 | 0.537 |
| Ruadapt 3 32B V2                   | 0.574 | 0.737 | 0.660 | 0.450 | 0.500 |
| DeepSeek-R1-Distill-Qwen-32B       | 0.508 | 0.702 | 0.426 | 0.402 | 0.493 |
| Gemma 3 27B                        | 0.577 | 0.695 | 0.759 | 0.231 | 0.261 |

## Switching Between Thinking and Non‑Thinking Modes

To enable or disable reasoning mode with Hugging Face Transformers, set the `enable_thinking` flag in `tokenizer.apply_chat_template`.  
For serving frameworks, see:  
- [SGLang Thinking/Non‑Thinking Modes](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes)  
- [vLLM Thinking/Non‑Thinking Modes](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes)  
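When the model is served behind an OpenAI-compatible endpoint (SGLang or vLLM), the Qwen documentation linked above passes the same flag per request through `chat_template_kwargs`. The following is a minimal sketch of building such a request. The helper name `build_chat_request` and its defaults are illustrative assumptions, not part of this model card.

```python
from typing import Any


def build_chat_request(
    messages: list[dict[str, str]],
    enable_thinking: bool = True,
    model: str = "t-tech/T-pro-it-2.0",
) -> dict[str, Any]:
    """Return keyword arguments for `client.chat.completions.create` (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        # `extra_body` is sent as additional JSON fields by the OpenAI Python client;
        # the server forwards `chat_template_kwargs` to the chat template.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }


request = build_chat_request(
    [{"role": "user", "content": "Привет!"}],
    enable_thinking=False,  # switch to non-thinking mode for this request only
)
print(request["extra_body"])
```

The resulting dictionary can be unpacked into the client call together with the sampling parameters, for example `client.chat.completions.create(**request, temperature=0.3, presence_penalty=1.0)`.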

---

## Recommended Generation Parameters

| Mode                              | Temperature | presence_penalty |
|-----------------------------------|-------------|------------------|
| No‑think (general requests)       | ≤ 0.3       | 1.0              |
| Think mode (standard requests)    | ≈ 0.6       | 1.0              |
| Complex reasoning requests        | ≥ 0.8       | 1.0              |

- Hybrid reasoning models are sensitive to sampling hyperparameters, and the best values vary by domain.
- Use a lower temperature for straightforward queries and a higher temperature for complex think-mode tasks.
- A `presence_penalty` between 0 and 2 can help avoid repetitive outputs.
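The table can be captured as a small lookup so that every call uses a consistent preset. This is only a sketch: the mode keys and the helper name are assumptions, and the concrete temperatures (0.3, 0.6, 0.8) are one choice at the edge of the recommended ranges rather than values prescribed by the model card.

```python
# Presets mirroring the table above. The table gives ranges (≤ 0.3, ≈ 0.6, ≥ 0.8);
# the concrete temperatures here are illustrative picks within those ranges.
SAMPLING_PRESETS = {
    "no_think": {"temperature": 0.3, "presence_penalty": 1.0},
    "think": {"temperature": 0.6, "presence_penalty": 1.0},
    "complex_reasoning": {"temperature": 0.8, "presence_penalty": 1.0},
}


def sampling_params(mode: str) -> dict[str, float]:
    """Return a copy of the preset for `mode`, or raise on an unknown mode."""
    if mode not in SAMPLING_PRESETS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(SAMPLING_PRESETS)}"
        )
    # Copy so that callers can tweak values without mutating the shared preset.
    return dict(SAMPLING_PRESETS[mode])


print(sampling_params("think"))
```

Returning a copy keeps the presets immutable in practice, so a per-request override such as a higher temperature does not leak into later calls.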


## 👨‍💻 Examples of usage



### SGLang Usage
For better quality and stable performance, we recommend SGLang as your inference framework.

To run an inference server for **T-pro-it-2.0**, start by launching the SGLang server:

```bash
python -m sglang.launch_server \
    --model-path t-tech/T-pro-it-2.0 \
    --reasoning-parser qwen3
```

Once the server is up and listening on `localhost:30000`, you can send chat-based requests via the OpenAI Python client.

```python
import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:30000/v1",
    api_key="ANY"  # the server ignores the API key
)

prompt = (
    "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
    "пошагово объясни решение и укажи окончательный результат."
)

completion = client.chat.completions.create(
    model="ANY",  # the server ignores the model name
    messages=[
        {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
        {"role": "user", "content": prompt}
    ],
    # REQUIRED: sampling params from the "Recommended Generation Parameters" table
    temperature=0.6,
    presence_penalty=1.0,
)

# The generated reply is in `completion.choices[0].message.content`
print(completion.choices[0].message.content)
```

**Note:** Always pass both `temperature` and `presence_penalty` in every completion call; both are **required** for the recommended behavior.


### HF Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
torch.manual_seed(42)

model_name = "t-tech/T-pro-it-2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    torch_dtype="auto",
    device_map="auto",
)

prompt = (
    "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
    "пошагово объясни решение и укажи окончательный результат."
)
messages = [
    {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

```

Output:
```
<think>
Хорошо, мне нужно вычислить определенный интеграл от 0 до 1 функции x² * e^x dx. Я помню, что для интегралов такого вида, где есть произведение многочлена и экспоненты, обычно применяют метод интегрирования по частям. Давайте вспомню формулу интегрирования по частям: ∫u dv = uv - ∫v du. 

Первым делом нужно выбрать, что взять за u, а что за dv. Обычно в таких случаях многочлен (в данном случае x²) берут за u, потому что при дифференцировании его степень понижается, и в конце концов он превратится в константу, что упростит интеграл. Тогда dv будет оставшаяся часть, то есть e^x dx. 

Итак, пусть u = x², тогда du = 2x dx. А dv = e^x dx, значит, v = ∫e^x dx = e^x. 

Теперь применим формулу интегрирования по частям: ∫x² e^x dx = uv - ∫v du = x² e^x - ∫e^x * 2x dx. 

Теперь у нас остался интеграл ∫2x e^x dx. Его тоже нужно взять по частям. Опять выбираем u и dv. Теперь u = 2x, тогда du = 2 dx, а dv = e^x dx, значит, v = e^x. 

Применяем формулу снова: ∫2x e^x dx = 2x e^x - ∫e^x * 2 dx = 2x e^x - 2∫e^x dx. 

Интеграл ∫e^x dx равен e^x + C. Подставляем обратно: 2x e^x - 2e^x + C. 

Теперь возвращаемся к исходному интегралу: x² e^x - (2x e^x - 2e^x) + C = x² e^x - 2x e^x + 2e^x + C. 

Теперь нужно вычислить определенный интеграл от 0 до 1. Для этого подставим верхний и нижний пределы в первообразную и вычтем. 

Сначала подставим x = 1: (1² * e^1) - 2*1*e^1 + 2*e^1 = e - 2e + 2e = (1 - 2 + 2)e = 1*e = e. 

Теперь подставим x = 0: (0² * e^0) - 2*0*e^0 + 2*e^0 = 0 - 0 + 2*1 = 2. 

Вычитаем: e - 2. 

Проверю, не ошибся ли я в вычислениях. При подстановке x=1: 1*e - 2*1*e + 2*e = e - 2e + 2e = e. Да, верно. При x=0: 0 - 0 + 2*1 = 2. Тогда разность e - 2. 

Но подожди, может быть я где-то пропустил знак? В первообразной после интегрирования по частям: x² e^x - 2x e^x + 2e^x. Да, правильно. Тогда при подстановке 1: 1*e - 2*1*e + 2*e = (1 - 2 + 2)e = 1*e. При 0: 0 - 0 + 2*1 = 2. Тогда ответ e - 2. 

Но e примерно равно 2.718, значит, e - 2 ≈ 0.718. Интуитивно кажется, что интеграл от 0 до 1 x² e^x должен быть положительным, что соответствует. 

Может, стоит проверить через другой метод или дифференцированием первообразной? Возьмем производную от x² e^x - 2x e^x + 2e^x. 

Производная: d/dx [x² e^x] = 2x e^x + x² e^x. 

d/dx [-2x e^x] = -2 e^x - 2x e^x. 

d/dx [2e^x] = 2 e^x. 

Складываем все: 2x e^x + x² e^x - 2 e^x - 2x e^x + 2 e^x = x² e^x. 

Да, производная равна подынтегральной функции. Значит, первообразная найдена верно. Тогда определенный интеграл равен e - 2. 

Вроде все правильно. Возможно, можно было использовать рекуррентную формулу или таблицу интегралов, но метод по частям сработал. 

Итак, окончательный ответ: e - 2.
</think>

Для вычисления определённого интеграла  
$$
\int_0^1 x^2 e^x \, dx
$$  
используем **метод интегрирования по частям**.  

### Шаг 1: Применение интегрирования по частям  
Формула интегрирования по частям:  
$$
\int u \, dv = uv - \int v \, du.
$$  
Выбираем:  
- $ u = x^2 $, тогда $ du = 2x \, dx $,  
- $ dv = e^x \, dx $, тогда $ v = e^x $.  

Подставляем в формулу:  
$$
\int x^2 e^x \, dx = x^2 e^x - \int e^x \cdot 2x \, dx = x^2 e^x - 2 \int x e^x \, dx.
$$  

### Шаг 2: Интегрирование оставшегося интеграла  
Теперь вычисляем $ \int x e^x \, dx $, снова применяя интегрирование по частям:  
- $ u = x $, тогда $ du = dx $,  
- $ dv = e^x \, dx $, тогда $ v = e^x $.  

Получаем:  
$$
\int x e^x \, dx = x e^x - \int e^x \, dx = x e^x - e^x + C.
$$  

Подставляем обратно:  
$$
\int x^2 e^x \, dx = x^2 e^x - 2 \left( x e^x - e^x \right) + C = x^2 e^x - 2x e^x + 2 e^x + C.
$$  

### Шаг 3: Вычисление определённого интеграла  
Подставляем пределы $ 0 $ и $ 1 $:  
$$
\left[ x^2 e^x - 2x e^x + 2 e^x \right]_0^1 = \left( 1^2 e^1 - 2 \cdot 1 \cdot e^1 + 2 e^1 \right) - \left( 0^2 e^0 - 2 \cdot 0 \cdot e^0 + 2 e^0 \right).
$$  

Упрощаем:  
- При $ x = 1 $:  
  $$
  e - 2e + 2e = e.
  $$  
- При $ x = 0 $:  
  $$
  0 - 0 + 2 \cdot 1 = 2.
  $$  

Итоговый результат:  
$$
e - 2.
$$  

### Ответ:  
$$
\boxed{e - 2}
$$
```
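As the output above shows, the decoded text still contains the reasoning inside `<think>` and `</think>` tags. The following sketch separates the reasoning from the final answer. It assumes at most one reasoning block at the start of the response; `split_thinking` is a hypothetical helper, not part of `transformers`.

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer)."""
    open_tag, close_tag = "<think>", "</think>"
    head, sep, tail = response.partition(close_tag)
    if not sep:
        # No closing tag: either non-thinking output, or the reasoning
        # was cut off (for example by `max_new_tokens`).
        if open_tag in head:
            return head.replace(open_tag, "", 1).strip(), ""
        return "", head.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>\nu = x², dv = eˣ dx\n</think>\n\nОтвет: e - 2"
)
print(answer)
```

Treating a missing `</think>` as truncated reasoning avoids presenting a half-finished chain of thought as the final answer.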

### vLLM Usage

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "t-tech/T-pro-it-2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, max_model_len=8192)
sampling_params = SamplingParams(temperature=0.7,
                                repetition_penalty=1.05,
                                top_p=0.8, top_k=70,
                                max_tokens=512)

prompt = (
    "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
    "пошагово объясни решение и укажи окончательный результат."
)
messages = [
    {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
    {"role": "user", "content": prompt}
]

prompt_token_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

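# Pass the token IDs as a prompt dict; the `prompt_token_ids=` keyword is deprecated in recent vLLM releases.
outputs = llm.generate({"prompt_token_ids": prompt_token_ids}, sampling_params=sampling_params)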

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

## Long Context Usage
T-pro-it-2.0 natively supports a context length of 32,768 tokens.  
For conversations where the input significantly exceeds this limit, follow the recommendations from the [Qwen3 model card](https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-long-texts) on processing long texts.

For example, with llama.cpp's `llama-server`, you can enable 128K context support with the following command:  
`llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768`
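The `--rope-scale 4` value in the command above is the ratio of the target context to the native one: 32,768 × 4 = 131,072 tokens (128K). A small sketch of that arithmetic follows. The function name is an assumption, and rounding up to a whole factor is a simplification chosen for illustration.

```python
import math

NATIVE_CONTEXT = 32_768  # native context length stated above


def yarn_scale(target_context: int, native_context: int = NATIVE_CONTEXT) -> float:
    """Smallest whole YaRN scaling factor that covers `target_context` tokens."""
    if target_context <= native_context:
        return 1.0  # input fits natively, no scaling needed
    return float(math.ceil(target_context / native_context))


print(yarn_scale(131_072))  # 4.0
print(yarn_scale(65_536))   # 2.0
```

Choosing the smallest factor that covers the expected input length keeps the scaling no larger than necessary.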