---
base_model:
- saishshinde15/Clyrai_Base_Reasoning
tags:
- vortex-family
- sft
- high-quality-data
- text-generation-inference
- transformers
- qwen2
- grpo
license: apache-2.0
language:
- en
---
# Clyrai Vortex
- **Developed by:** clyrai
- **License:** apache-2.0
- **Fine-tuned from:** saishshinde15/Clyrai_Base_Reasoning
- **Part of:** Vortex Family (A collection of four fine-tuned SFT models)
## **Model Description**
Clyrai Vortex is a **highly refined reasoning model** built upon `saishshinde15/Clyrai_Base_Reasoning`, further enhanced with **high-quality, curated datasets** that the base model lacked. This model is part of the **Vortex Family**, a series of four fine-tuned models designed for advanced reasoning, knowledge synthesis, and structured response generation.
Unlike typical reinforcement learning-based improvements, **Supervised Fine-Tuning (SFT) was chosen** to ensure greater **control, stability, and alignment with human-preferred responses**, making Vortex more **reliable, interpretable, and useful** across a wide range of tasks.
## **Why Clyrai Vortex Stands Out**
- **Enhanced Knowledge & Reasoning**: Incorporates **higher-quality training data** to fill gaps in the base model, improving factual accuracy and logical reasoning.
- **Better Response Coherence**: Fine-tuned to provide **more structured, well-reasoned, and contextually relevant answers** across different domains.
- **Improved Handling of Complex Queries**: Excels in **multi-step logical deductions, research-oriented tasks, and structured decision-making**.
- **Robust Generalization**: Performs well across **scientific, technical, and analytical reasoning problems**, ensuring reliability in diverse scenarios.
## **Why Supervised Fine-Tuning (SFT) Instead of RL?**
- **Greater Control Over Model Behavior**: SFT allows fine-tuning with **directly labeled high-quality data**, ensuring model responses remain **predictable and stable**.
- **Avoids Reinforcement Learning Pitfalls**: Unlike RLHF (Reinforcement Learning from Human Feedback), which can lead to **over-optimization, reward hacking, or unintended biases**, SFT maintains a **balanced, reliable output**.
- **Ensures Logical Consistency**: RL-based training can sometimes lead to **erratic or unnatural responses** in complex reasoning tasks. SFT helps **retain logical flow and factual correctness**.
- **Preserves Efficiency**: SFT is computationally efficient and avoids the reward modeling and multi-stage training pipelines that RL requires; a minimal sketch of such an SFT setup is shown below.
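To make the SFT choice concrete, the snippet below is a minimal sketch of how a supervised fine-tuning pass over the base model could be set up with the TRL library. The dataset file, hyperparameters, and output directory are illustrative assumptions, not the actual Vortex training recipe.
```python
# Minimal SFT sketch using TRL's SFTTrainer. The dataset file and all
# hyperparameters below are illustrative placeholders, not the Vortex recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical curated dataset of high-quality prompt/response examples.
dataset = load_dataset("json", data_files="curated_reasoning_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="saishshinde15/Clyrai_Base_Reasoning",  # base model named in this card
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="clyrai-vortex-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=10,
    ),
)
trainer.train()
```
Because the training signal comes directly from labeled examples, there is no reward model or policy-optimization loop to configure, which is the efficiency and stability argument made above.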
## **Intended Use Cases**
- **Advanced Question-Answering**: Excels in **analytical, technical, and logical Q&A**, ensuring well-structured responses.
- **Research & Knowledge Synthesis**: Processes and summarizes large amounts of information with **greater precision**.
- **Problem-Solving & Deductive Reasoning**: Handles **multi-step logical deductions** effectively.
- **Code & Algorithmic Logic**: Useful for **debugging, explaining code, and structuring algorithmic solutions**.
## **Usage**
Follow the structure below to call the model using Unsloth:
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048  # Any length works; RoPE scaling is supported internally
dtype = None           # None auto-detects; float16 for Tesla T4/V100, bfloat16 for Ampere+
load_in_4bit = True    # 4-bit quantization reduces memory usage; set False for full precision

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "saishshinde15/Clyrai_Vortex",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

instruction = """You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."""

messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": "Who made you?"},
]

# Apply the chat template (no tokenization, but append a generation prompt)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt for model input
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to("cuda")

# Generate a response
outputs = model.generate(**inputs, max_new_tokens=1500, num_return_sequences=1)

# Decode the generated tokens
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract the assistant's reply
assistant_start = text.find("assistant")
if assistant_start != -1:
    response = text[assistant_start + len("assistant"):].strip()
else:
    response = text  # Fallback: return the full text if "assistant" is not found

print(response)
```
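If you prefer tokens to appear as they are generated instead of waiting for the full completion, a `TextStreamer` from `transformers` can be passed to `generate`. This is an optional convenience and assumes the same `model`, `tokenizer`, and `inputs` as in the example above:
```python
from transformers import TextStreamer

# Stream tokens to stdout as they are generated; skip_prompt hides the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=1500)
```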
Follow the structure below to call the model using Transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load tokenizer and model
model_name = "saishshinde15/Clyrai_Vortex"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Define the system instruction
instruction = """You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."""
# Prepare input prompt using chat template
messages = [
{"role": "system", "content": instruction},
{"role": "user", "content": "Who made you?"}
]
# Format the prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Tokenize input
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(device)
# Generate response with proper sampling parameters
output_ids = model.generate(
    **inputs,
    max_new_tokens=1500,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

# Decode the generated tokens
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Extract the assistant's reply
assistant_start = response.find("assistant")
if assistant_start != -1:
    response = response[assistant_start + len("assistant"):].strip()
print(response)
```
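As an alternative to searching the decoded string for the "assistant" marker, you can decode only the newly generated tokens by slicing off the prompt length. This small sketch assumes the same `inputs` and `output_ids` as in the Transformers example above:
```python
# Decode only the tokens generated after the prompt, avoiding string matching.
prompt_length = inputs["input_ids"].shape[-1]
generated_only = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
print(generated_only)
```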