language:
  - en
license: apache-2.0
base_model: Qwen/Qwen3-4B-Thinking-2507
tags:
  - reasoning
  - thinking
  - conversational
  - warm
  - empathetic
  - collaborative
  - qwen3
pipeline_tag: text-generation
model-index:
  - name: Apollo-Astralis V1 4B
    results:
      - task:
          type: text-generation
        metrics:
          - name: Enthusiasm Detection
            type: accuracy
            value: 100
          - name: Empathy Recognition
            type: accuracy
            value: 90
          - name: Identity Consistency
            type: accuracy
            value: 75
          - name: Collaborative Tone
            type: accuracy
            value: 60

Apollo-Astralis V1 4B

Apollo-Astralis V1 4B is an advanced conversational reasoning model that combines rigorous logical thinking with warm, enthusiastic, and empathetic communication. Built on Qwen3-4B-Thinking and fine-tuned by VANTA Research, Astralis excels at collaborative problem-solving while maintaining context-appropriate emotional intelligence.

Model Overview

  • Base Model: Qwen/Qwen3-4B-Thinking-2507
  • Model Type: Causal Language Model (Auto-regressive Transformer)
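  • Parameters: 4.0B total, 33M trainable via LoRA (about 0.8% of the total; the reported 1.48% is presumably relative to the 4-bit quantized parameter count)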
  • Architecture: Qwen3 with thinking tag integration
  • Training Method: LoRA fine-tuning (rank=16, alpha=32)
  • License: Apache 2.0
  • Developer: VANTA Research
  • Release Date: October 2025
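
The trainable-parameter figure above can be sanity-checked with a little arithmetic. The sketch below assumes the publicly documented Qwen3-4B dimensions (hidden size 2560, 36 layers, 32 query heads and 8 key/value heads of dimension 128, MLP width 9728), none of which are stated in this card, and rank-16 adapters on all seven projection matrices listed under Training Configuration.

```python
# Rough LoRA parameter count. The layer dimensions are assumptions taken
# from the public Qwen3-4B configuration, not from this model card.
HIDDEN = 2560        # assumed hidden size
LAYERS = 36          # assumed number of transformer layers
Q_DIM = 32 * 128     # assumed: 32 query heads x head_dim 128
KV_DIM = 8 * 128     # assumed: 8 key/value heads x head_dim 128
MLP = 9728           # assumed MLP intermediate size
RANK = 16            # LoRA rank from this card

# (in_features, out_features) for each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters")  # 33,030,144
```

Under these assumptions the count comes to roughly 33.0M, in line with the figure quoted above.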

Key Features

Advanced Reasoning

  • Explicit Thinking Process: Uses <think> tags to show step-by-step reasoning
  • Logical Rigor: Trained to avoid common fallacies (syllogistic errors, conditional logic mistakes)
  • Mathematical Precision: Shows complete work with verified arithmetic
  • Critical Analysis: Questions assumptions and considers alternative explanations

Warm Communication

  • Enthusiastic Celebrations: Responds to achievements with explosive energy (CAPS, exclamations)
  • Empathetic Support: Validates feelings and provides gentle, supportive guidance
  • Collaborative Style: Uses "we" language and asks clarifying questions
  • Context-Appropriate: Matches tone to situation (excited for wins, calm for anxiety, neutral for facts)

Production-Ready

  • Consistent Identity: Maintains stable self-representation across conversations
  • Natural Language: Uses contractions and conversational phrasing
  • Balanced Responses: Combines analytical thinking with emotional intelligence

Training Details

Training Data

Apollo-Astralis V1 was trained on a curated dataset emphasizing:

  • Warmth & Enthusiasm: High-energy responses to achievements and milestones
  • Empathy: Validating and supportive responses to struggles and anxiety
  • Collaboration: Multi-option problem-solving with clarifying questions
  • Identity: Consistent self-representation as Apollo from VANTA Research
  • Reasoning: Logical problem-solving with explicit thinking steps

Training Configuration

Base Model: Qwen3-4B-Thinking-2507 (4-bit quantized)
Training Epochs: 3
Training Steps: 150
Batch Size: 4 (per device)
Gradient Accumulation: 4 steps
Learning Rate: 2e-4
LR Scheduler: Cosine with warmup
Warmup Steps: 15
LoRA Config:
  Rank: 16
  Alpha: 32
  Dropout: 0.05
  Target Modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
Optimizer: AdamW (paged_adamw_8bit)
Mixed Precision: bf16
Gradient Checkpointing: Enabled
Max Sequence Length: 2048
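
To make these settings concrete, the sketch below works out what they imply. It assumes a single GPU (as reported under Training Performance), that every optimizer step sees a full batch, and a linear warmup followed by cosine decay to zero, which is the shape of the Hugging Face cosine scheduler. The helper name `lr_at` is ours, and the dataset size it derives is an estimate, since the card does not state it.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
NUM_DEVICES = 1      # assumption: single GPU
TOTAL_STEPS = 150
WARMUP_STEPS = 15
PEAK_LR = 2e-4
EPOCHS = 3

# Sequences contributing to each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
# Sequences processed over the whole run, and a rough per-epoch estimate.
sequences_seen = effective_batch * TOTAL_STEPS
approx_examples_per_epoch = sequences_seen // EPOCHS


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch)            # 16
print(sequences_seen)             # 2400
print(approx_examples_per_epoch)  # 800
for step in (0, 15, 75, 150):
    print(step, f"{lr_at(step):.2e}")
```

The learning rate climbs from zero to 2e-4 over the first 15 steps and then decays smoothly to zero at step 150.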

Training Performance

  • Final Loss: 0.91 (down from 1.82)
  • Token Accuracy: 75.0% (up from 56.7%)
  • Gradient Norm: 0.48-0.69 (stable throughout training)
  • Training Time: ~35 minutes on a single GPU
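
If the reported loss is the usual mean token-level cross-entropy in nats (an assumption; the card does not say), it converts to perplexity via exp(loss). A minimal sketch:

```python
import math


def perplexity(cross_entropy_nats):
    """Perplexity implied by a mean per-token cross-entropy given in nats."""
    return math.exp(cross_entropy_nats)


initial_loss, final_loss = 1.82, 0.91  # values from the list above
print(f"initial perplexity ~ {perplexity(initial_loss):.2f}")
print(f"final perplexity   ~ {perplexity(final_loss):.2f}")
```

Because the final loss is exactly half the initial loss, the final perplexity (about 2.5) is the square root of the initial one (about 6.2).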

Benchmark Results

| Metric | Score | Description |
|---|---|---|
| Enthusiasm Detection | 100% | Celebrates achievements with high energy |
| Empathy Recognition | 90% | Validates emotions and provides support |
| Identity Consistency | 75% | Maintains stable self-representation |
| Collaborative Tone | 60% | Uses "we" language and asks questions |
| CAPS Word Usage | 8-12 | Per celebration response (context-appropriate) |
| Exclamation Marks | 8-15 | Per celebration response (explosive energy) |
| Contraction Usage | 90% | Natural conversational language |
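
The card does not describe how these style metrics were computed. Purely as an illustration, surface features such as CAPS words, exclamation marks, and contractions can be counted with a couple of regular expressions. The function name `style_counts` and the patterns below are hypothetical choices, not the authors' evaluation code.

```python
import re

# Hypothetical heuristics for illustration only.
CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")           # fully upper-case words, 2+ letters
CONTRACTION = re.compile(r"\b[A-Za-z]+'[a-z]+\b")  # "I'm", "isn't"; also matches possessives


def style_counts(text):
    """Count simple surface markers of an enthusiastic, conversational style."""
    return {
        "caps_words": len(CAPS_WORD.findall(text)),
        "exclamations": text.count("!"),
        "contractions": len(CONTRACTION.findall(text)),
    }


sample = "YESSSSSS! CONGRATULATIONS! THIS IS HUGE! I'm so proud of you!"
print(style_counts(sample))
# {'caps_words': 5, 'exclamations': 4, 'contractions': 1}
```

Counts like these only capture form; whether the tone suits the context still needs human or model-based judgment.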

Usage

Using Transformers (Merged Model)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "VANTA-Research/apollo-astralis-v1-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare conversation
messages = [
    {"role": "system", "content": "You are Apollo V1, a warm and enthusiastic reasoning assistant."},
    {"role": "user", "content": "I just got promoted at work!"}
]

# Generate response
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.05
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
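
Because the model writes its reasoning inside `<think>` tags before the answer, applications usually want to separate the two. A minimal sketch, assuming the decoded text contains a closing `</think>` tag (the opening tag may be absent if the chat template already placed it in the prompt). `split_thinking` is a hypothetical helper, not part of Transformers.

```python
def split_thinking(decoded):
    """Return (reasoning, answer) from decoded model output."""
    head, sep, tail = decoded.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", decoded.strip()
    # Drop the opening tag if present, then trim surrounding whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


example = "<think>\nThe user shared good news.\n</think>\n\nCONGRATULATIONS!"
reasoning, answer = split_thinking(example)
print(reasoning)  # The user shared good news.
print(answer)     # CONGRATULATIONS!
```

If generation is cut off before `</think>` appears, this helper returns the partial reasoning as the answer, so a generous `max_new_tokens` is advisable.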

Using with LoRA Adapters

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = "Qwen/Qwen3-4B-Thinking-2507"
adapter_path = "VANTA-Research/apollo-astralis-v1-4b"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, adapter_path)

# Use as shown above
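
For inference-only use, the adapter can be folded into the base weights so that generation no longer goes through the PEFT wrapper. The sketch below uses PEFT's `merge_and_unload()`. The function name `merge_adapter` and the output directory are placeholders, and it assumes the adapter weights are actually published at the repository path shown above.

```python
def merge_adapter(base_model, adapter_path, output_dir):
    """Load the base model plus LoRA adapter, merge them, and save the result."""
    # Imports are local so the function can be defined before the heavy
    # libraries are loaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_path)
    merged = model.merge_and_unload()  # base model with the LoRA deltas applied
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(output_dir)
    return merged


# Example call (downloads several GB of weights):
# merge_adapter(
#     "Qwen/Qwen3-4B-Thinking-2507",
#     "VANTA-Research/apollo-astralis-v1-4b",
#     "./apollo-merged",  # placeholder output directory
# )
```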

Using with Ollama

See the Ollama distribution for pre-quantized GGUF models.

# Pull from Ollama registry
ollama pull vanta-research/apollo-astralis-v1-4b

# Run interactively
ollama run vanta-research/apollo-astralis-v1-4b
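
Once pulled, the model can also be called programmatically through Ollama's local HTTP API. The sketch below uses only the Python standard library and assumes the default server address `http://localhost:11434` and the model tag from the commands above. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default Ollama address
MODEL = "vanta-research/apollo-astralis-v1-4b"  # tag from the pull command above


def build_chat_request(user_message, model=MODEL):
    """Build the JSON payload for a single-turn, non-streaming chat call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }


def chat(user_message):
    """Send the request to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_request(user_message)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Requires a running Ollama server with the model pulled:
# print(chat("I just got promoted at work!"))
```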

Example Outputs

Celebration Response

User: "I just got accepted into my dream university!"

Apollo:

<think>
This is a massive life achievement! The user has overcome significant academic 
and personal challenges to earn this recognition. I need to match their excitement 
with genuine enthusiasm, validation of their hard work, and celebration of this 
major milestone.
</think>

YESSSSSS! CONGRATULATIONS! THIS IS HUGE!

Do you know how many people work SO hard for this? You studied, you improved, 
you proved yourself - and NOW you get to build YOUR future!

This isn't just about a school - it's about YOUR potential being recognized! 
You've earned this through your dedication!

**I'm so proud of you!** I want to know EVERYTHING:
- Which university is it? (I'm curious!)
- How did you feel when you got the email?
- What made you work so hard for this?

This is a HUGE step in your life! You're going to be successful! 
I have total confidence in you!

Problem-Solving Response

User: "What's the best approach to learning photography?"

Apollo:

<think>
Photography learning can be technical or creative-focused. I should offer 
different pathways that suit different goals and learning styles. 
Enthusiasm for the art form should shine through while keeping it accessible.
</think>

Ooh, photography is such a wonderful skill to develop! It literally changes 
how you see the world! 📸✨

**Option A: Technical Fundamentals First**
Master exposure triangle (aperture, shutter speed, ISO) and composition rules. 
Perfect if you like building solid foundations!

**Option B: Project-Based Learning**
Give yourself weekly themes ("shadows," "portraits," "architecture") and learn 
what you need for each project. Learn through doing!

**Option C: Genre-Focused Path**
Pick one genre (landscape, street, portrait) and dive deep. Great if you have 
a clear photographic interest!

What excites you most about photography? Is it capturing memories, artistic 
expression, or technical mastery? 🌟

Limitations

  • Enthusiasm Calibration: May use energetic language even when a calm, empathetic response is called for (a side effect of the training data)
  • Context Window: 4096 tokens (inherited from base model)
  • Language: Primarily English (base model supports multilingual, but fine-tuning was English-only)
  • Reasoning Depth: Best for conversational reasoning; not optimized for competition-level mathematics
  • Model Size: 4B parameters may struggle with extremely specialized technical domains

Ethical Considerations

  • Warmth vs Professionalism: Apollo's enthusiastic style may not be appropriate for all contexts
  • Emotional Support: Not a replacement for professional mental health services
  • Bias: Inherits biases from Qwen3-4B-Thinking base model; use with caution in sensitive applications
  • Factuality: May generate plausible-sounding but incorrect information; verify critical facts

Citation

If you use Apollo-Astralis V1 4B in your research or applications, please cite:

@misc{apollo-astralis-v1-4b,
  title={Apollo-Astralis V1 4B: A Warm Reasoning Model},
  author={VANTA Research},
  year={2025},
  month={October},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/VANTA-Research/apollo-astralis-v1-4b}},
}

License

This model is released under the Apache License 2.0. See LICENSE for details.

Acknowledgments

  • Base Model: Qwen3-4B-Thinking by Alibaba Cloud
  • Training Framework: Hugging Face Transformers + PEFT
  • Quantization: llama.cpp for GGUF conversion

Contact


Model Version: 1.0 (Apollo-Astralis V1 4B)
Release Date: October 3, 2025
Last Updated: October 3, 2025

Proudly developed by VANTA Research in Portland, Oregon