---
base_model:
- saishshinde15/Clyrai_Base_Reasoning
tags:
- vortex-family
- sft
- high-quality-data
- text-generation-inference
- transformers
- qwen2
- grpo
license: apache-2.0
language:
- en
---

# Clyrai Vortex  

- **Developed by:** clyrai 
- **License:** apache-2.0  
- **Fine-tuned from:** saishshinde15/Clyrai_Base_Reasoning  
- **Part of:** Vortex Family (a collection of four fine-tuned SFT models)  

## **Model Description**  
Clyrai Vortex is a **highly refined reasoning model** built upon `saishshinde15/Clyrai_Base_Reasoning` and further trained on **high-quality, curated datasets** that fill gaps in the base model's knowledge. It is part of the **Vortex Family**, a series of four fine-tuned models designed for advanced reasoning, knowledge synthesis, and structured response generation.  

Rather than relying on reinforcement learning-based refinement, **supervised fine-tuning (SFT) was chosen** for greater **control, stability, and alignment with human-preferred responses**, making Vortex more **reliable, interpretable, and useful** across a wide range of tasks.  

## **Why Clyrai Vortex Stands Out**  
- **Enhanced Knowledge & Reasoning**: Incorporates **higher-quality training data** to fill gaps in the base model, improving factual accuracy and logical reasoning.  
- **Better Response Coherence**: Fine-tuned to provide **more structured, well-reasoned, and contextually relevant answers** across different domains.  
- **Improved Handling of Complex Queries**: Excels in **multi-step logical deductions, research-oriented tasks, and structured decision-making**.  
- **Robust Generalization**: Performs well across **scientific, technical, and analytical reasoning problems**, ensuring reliability in diverse scenarios.  

## **Why Supervised Fine-Tuning (SFT) Instead of RL?**  
- **Greater Control Over Model Behavior**: SFT allows fine-tuning with **directly labeled high-quality data**, ensuring model responses remain **predictable and stable**.  
- **Avoids Reinforcement Learning Pitfalls**: Unlike RLHF (Reinforcement Learning from Human Feedback), which can lead to **over-optimization, reward hacking, or unintended biases**, SFT maintains **balanced, reliable outputs**.  
- **Ensures Logical Consistency**: RL-based training can sometimes lead to **erratic or unnatural responses** in complex reasoning tasks. SFT helps **retain logical flow and factual correctness**.  
- **Preserves Efficiency**: SFT is computationally efficient and does not require the complex reward modeling and multi-stage training pipeline of RL. A minimal sketch of such an SFT setup follows this list.  
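For illustration only, here is roughly how a supervised fine-tuning pass of the kind described above could be set up with the TRL library. The dataset name and every hyperparameter are placeholders, not the actual recipe behind Vortex:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset -- stands in for the curated, high-quality instruction
# data described above (assumed to expose a standard "text" column).
dataset = load_dataset("username/curated-reasoning-sft", split="train")

trainer = SFTTrainer(
    model="saishshinde15/Clyrai_Base_Reasoning",  # the published base model
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="vortex-sft",
        per_device_train_batch_size=2,  # illustrative hyperparameters
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
    ),
)
trainer.train()
```

Because every training example is directly labeled, the optimization target stays plain next-token likelihood on curated responses; there is no learned reward model to over-optimize, which is what gives SFT its predictability.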

## **Intended Use Cases**  
- **Advanced Question-Answering**: Excels in **analytical, technical, and logical Q&A**, ensuring well-structured responses.  
- **Research & Knowledge Synthesis**: Processes and summarizes large amounts of information with **greater precision**.  
- **Problem-Solving & Deductive Reasoning**: Handles **multi-step logical deductions** effectively.  
- **Code & Algorithmic Logic**: Useful for **debugging, explaining code, and structuring algorithmic solutions**.  

## **Usage**  

### **Using Unsloth**
Follow the structure below to call the model with Unsloth:
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "saishshinde15/Clyrai_Vortex",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)

FastLanguageModel.for_inference(model)
instruction = """You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."""

messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": "who made you?"}
]

# Apply chat template (without tokenization but adding a generation prompt)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize prompt properly for model input
inputs = tokenizer(prompt, return_tensors='pt', padding=True, truncation=True).to("cuda")

# Generate response
outputs = model.generate(**inputs, max_new_tokens=1500, num_return_sequences=1)

# Decode only the newly generated tokens. Slicing off the prompt tokens is
# more robust than searching the decoded text for "assistant", which would
# also match the word "assistant" inside the system prompt.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(response)
```
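Note: `load_in_4bit = True` relies on the bitsandbytes library and a CUDA GPU; set it to `False` to load the model in full precision at a higher memory cost.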

### **Using Transformers**
Follow the structure below to call the model with Transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
model_name = "saishshinde15/Clyrai_Vortex"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Define the system instruction
instruction = """You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."""

# Prepare input prompt using chat template
messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": "Who made you?"}
]

# Format the prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize input
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(device)

# Generate response with proper sampling parameters
output_ids = model.generate(
    **inputs,
    max_new_tokens=1500,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens (the system prompt itself contains
# the word "assistant", so searching the decoded string would cut in the wrong place)
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True).strip()

print(response)
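```

If you want tokens printed as they are generated (useful for interactive use), the standard `transformers.TextStreamer` can be passed to `generate`. This minimal sketch reuses the `model`, `tokenizer`, and `inputs` from the example above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they arrive; skip_prompt=True avoids
# echoing the input prompt back before the answer.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=1500,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    streamer=streamer,
)
```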