# Sorachio 1B - Conversational AI Model
## Overview
Sorachio is a fine-tuned conversational AI model based on the Gemma 3 architecture, optimized for roleplay and general conversation. It was trained with Supervised Fine-Tuning (SFT) using QLoRA (Quantized Low-Rank Adaptation), which updates only a small set of low-rank adapter parameters on top of a 4-bit quantized base model, keeping training efficient while preserving response quality.
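Conceptually, QLoRA keeps the base model frozen in 4-bit precision and trains only small low-rank adapter matrices injected into selected layers. The sketch below illustrates the idea with `peft` and `bitsandbytes` (both need to be installed); the base checkpoint name is an assumption for illustration, and the actual training script is not published here.

```python
# Illustrative QLoRA setup: 4-bit frozen base model + trainable low-rank adapters.
# Requires: pip install peft bitsandbytes
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "google/gemma-3-1b-it"  # assumed base checkpoint, for illustration only

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization, as listed under Training Details
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are updated during SFT
```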
## Dataset
The model was trained on a custom curated dataset focusing on conversational and roleplay scenarios:
- **Primary Dataset**: `IzzulGod/roleplay-conversation` (see the loading sketch below)
- **Additional Data**: Custom conversational data for enhanced dialogue capabilities
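The primary dataset can be pulled straight from the Hugging Face Hub with the `datasets` library. A minimal loading sketch; the split name and column layout are assumptions, so check the dataset card for details.

```python
# Sketch: load and inspect the primary training dataset from the Hub.
from datasets import load_dataset

dataset = load_dataset("IzzulGod/roleplay-conversation", split="train")  # split name assumed
print(dataset)      # row count and column names
print(dataset[0])   # one raw conversation example
```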
## Quick Start
### Installation
```bash
pip install transformers torch accelerate
```
### Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "IzzulGod/sorachio-1b-8192-2e-4-it-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    attn_implementation="eager",
).eval()

# Indonesian prompt: "What is Machine Learning?"
messages = [
    {"role": "user", "content": "Apa itu Machine Learning?"}
]

# Build the prompt using the model's chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=(input_ids != tokenizer.pad_token_id).long(),
        max_new_tokens=512,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (skip the prompt)
output_text = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Sorachio:", output_text.strip())
```
## Model Downloads
For efficient inference on various hardware configurations:
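Independently of any pre-quantized builds, the checkpoint can also be quantized on the fly to 4-bit at load time with bitsandbytes (requires `pip install bitsandbytes`). This is a generic sketch for low-VRAM GPUs, not an official quantized release.

```python
# Sketch: optional on-the-fly 4-bit quantization for low-VRAM GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "IzzulGod/sorachio-1b-8192-2e-4-it-v1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
).eval()
```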
## Training Details
### Technical Specifications
- Architecture: Transformer-based language model
- Fine-tuning Method: Supervised Fine-Tuning (SFT) + Quantized Low-Rank Adaptation (QLoRA)
- LoRA Rank (`r`): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- Target Modules:
  - Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
  - MLP: `gate_proj`, `up_proj`, `down_proj`
- Quantization: 4-bit (NF4 via bitsandbytes)
- Optimizer: AdamW 8-bit
- Learning Rate: 2e-4 with cosine scheduling
- Batch Size: 2 (per device) × 4 (gradient accumulation) = 8 effective
- Training Epochs: 3
- Precision: FP16 for efficient training
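A hedged sketch of how these hyperparameters map onto a standard `transformers` training configuration (the LoRA and 4-bit settings appear in the Overview sketch; the actual training script is not published here, and the output directory is hypothetical):

```python
# Sketch: TrainingArguments mirroring the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sorachio-1b-sft",     # hypothetical output directory
    per_device_train_batch_size=2,    # x 4 gradient accumulation = 8 effective
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",           # 8-bit AdamW via bitsandbytes
    fp16=True,
    logging_steps=20,                 # matches the loss table below
)
# training_args would then be passed to an SFT trainer (e.g. trl's SFTTrainer)
# together with the PEFT-wrapped model and the tokenized dataset.
```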
### Training Results
The model achieved consistent loss reduction during training:
Training ran for 192 steps over 3 epochs (09:45 elapsed):

| Step | Training Loss |
|------|---------------|
| 20   | 4.942400      |
| 40   | 2.978700      |
| 60   | 2.624300      |
| 80   | 2.247000      |
| 100  | 2.157000      |
| 120  | 2.083200      |
| 140  | 2.010100      |
| 160  | 1.916800      |
| 180  | 1.848900      |

- Final Training Loss: 2.489
- Training Runtime: 594.31 seconds
- Training Samples/Second: 2.569
## Use Cases
- Conversational AI: General purpose chatbot applications
- Roleplay: Interactive storytelling and character-based conversations
- Indonesian Language Tasks: Optimized for Indonesian language understanding
- Educational Applications: Q&A systems and tutoring applications
## Limitations
- Performance may vary for highly specialized technical domains
- As a 1B-parameter model, its knowledge coverage and reasoning depth are more limited than those of larger models
- Responses should be validated for factual accuracy in critical applications
## License
This model is released under the Apache 2.0 License. Please refer to the license file for more details.
**Note:** This entire project, from dataset preprocessing to fine-tuning and evaluation, was completed in a free Google Colab T4 GPU environment, demonstrating that this workflow is feasible even on limited computing resources.