Sorachio 1B - Conversational AI Model

Overview

Sorachio is a conversational AI model fine-tuned from the Gemma 3 architecture and optimized for roleplay and general conversation. It was trained with Supervised Fine-Tuning (SFT) using QLoRA (Quantized Low-Rank Adaptation), which keeps parameter updates efficient while preserving response quality.

Dataset

The model was trained on a custom, curated dataset focused on conversational and roleplay scenarios.

Quick Start

Installation

pip install transformers torch accelerate

Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "IzzulGod/sorachio-1b-8192-2e-4-it-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",               # place layers automatically on available devices
    torch_dtype=torch.float16,       # load weights in half precision
    attn_implementation="eager"      # use the eager attention implementation
).eval()

messages = [
    {"role": "user", "content": "Apa itu Machine Learning?"}  # Indonesian: "What is Machine Learning?"
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=torch.ones_like(input_ids),  # single unpadded sequence, so attend to all positions
        max_new_tokens=512,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )

output_text = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Sorachio:", output_text.strip())

Model Downloads

For efficient inference on various hardware configurations:

  • F16 GGUF - Full-precision (16-bit) GGUF export
  • Q8_0 GGUF - 8-bit quantized GGUF for lower memory use
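
The GGUF files can be run outside of transformers, for example with llama.cpp bindings. The sketch below uses llama-cpp-python (pip install llama-cpp-python) and assumes the Q8_0 file has already been downloaded; the local file name is a placeholder, so substitute the actual file from the repository.

from llama_cpp import Llama

# Load the quantized model (placeholder file name; point this at the GGUF file you downloaded).
llm = Llama(
    model_path="./sorachio-1b-q8_0.gguf",
    n_ctx=4096,   # context window; adjust to the model's maximum and available memory
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Apa itu Machine Learning?"}],
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
)
print(response["choices"][0]["message"]["content"])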

Training Details

Technical Specifications

  • Architecture: Transformer-based language model
  • Fine-tuning Method: Supervised Fine-Tuning (SFT) + Quantized Low-Rank Adaptation (QLoRA)
  • LoRA Rank (r): 8
  • LoRA Alpha: 16
  • LoRA Dropout: 0.05
  • Target Modules:
    • Attention: q_proj, k_proj, v_proj, o_proj
    • MLP: gate_proj, up_proj, down_proj
  • Quantization: 4-bit (NF4 via bitsandbytes)
  • Optimizer: AdamW 8-bit
  • Learning Rate: 2e-4 with cosine scheduling
  • Batch Size: 2 (per device) × 4 (gradient accumulation) = 8 effective
  • Training Epochs: 3
  • Precision: FP16 for efficient training
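
The configuration above roughly maps onto a peft/bitsandbytes setup like the following minimal sketch. The actual training script, dataset handling, and trainer wiring are not included in this card, so treat the names and defaults here as illustrative rather than as the exact recipe.

import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization of the base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on the attention and MLP projections listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Optimizer, schedule, and batch settings matching the specification above.
training_args = TrainingArguments(
    output_dir="./sorachio-sft",        # placeholder output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch size of 8
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",             # 8-bit AdamW via bitsandbytes
    fp16=True,
)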

Training Results

The model achieved consistent loss reduction during training:

[192/192 09:45, Epoch 3/3]

Step    Training Loss
  20    4.942400
  40    2.978700
  60    2.624300
  80    2.247000
 100    2.157000
 120    2.083200
 140    2.010100
 160    1.916800
 180    1.848900
Final Training Loss: 2.489
Training Runtime: 594.31 seconds
Training Samples/Second: 2.569

At an effective batch size of 8, the 192 optimizer steps correspond to roughly 1,500 training examples processed across the 3 epochs (about 500 per epoch).

Use Cases

  • Conversational AI: General purpose chatbot applications
  • Roleplay: Interactive storytelling and character-based conversations
  • Indonesian Language Tasks: Optimized for Indonesian language understanding
  • Educational Applications: Q&A systems and tutoring applications

Limitations

  • Performance may vary for highly specialized technical domains
  • As a 1B-parameter model, its knowledge coverage and reasoning depth are more limited than those of larger models
  • Responses should be validated for factual accuracy in critical applications

License

This model is released under the Apache 2.0 License. Please refer to the license file for more details.

Note: The entire project, from dataset preprocessing through fine-tuning and evaluation, was completed on a free Google Colab T4 GPU, demonstrating that the full workflow is feasible on limited computing resources.
