Qwen3-Hermes8B-v1

This is a merged LoRA model based on Qwen/Qwen3-8B, trained with supervised fine-tuning (SFT) on the NousResearch/Hermes-3-Dataset. It is aimed at reasoning, mathematical problem-solving, and commonsense understanding tasks; see Performance for benchmark scores.

Model Details

Base Model: Qwen/Qwen3-8B
Language: English (en)
Library: transformers
Training Method: LoRA fine-tuning with Axolotl
Infrastructure: 8x NVIDIA B200 cluster from PrimeIntellect
Training Framework: Axolotl with DeepSpeed ZeRO-2

Performance

Benchmark	Score	Description
HellaSwag	82.3%	Commonsense reasoning and natural language inference
GSM8K	87.1%	Grade school math word problems
TheoryPlay	35%	Theory of mind and social reasoning tasks

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "justinj92/Qwen3-Hermes8B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example usage for reasoning tasks
text = "Sarah believes that her keys are in her purse, but they are actually on the kitchen table. Where will Sarah look for her keys?"
# Move the inputs to the model's device (required with device_map="auto")
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,  # limits generated tokens only, not prompt + output
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Chat Format

This model supports the ChatML-style Hermes chat format, in which each turn is wrapped in <|im_start|> and <|im_end|> tokens:

def format_chat(messages):
    """Render a list of {"role", "content"} messages as a ChatML prompt."""
    formatted = ""
    for message in messages:
        role = message["role"]
        content = message["content"]
        # Only the three standard roles are rendered; anything else is skipped
        if role in ("system", "user", "assistant"):
            formatted += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    # Open an assistant turn so the model generates the reply
    formatted += "<|im_start|>assistant\n"
    return formatted

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve this math problem: A store has 45 apples. If they sell 1/3 of them in the morning and 1/5 of the remaining apples in the afternoon, how many apples are left?"}
]

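prompt = format_chat(messages)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)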
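
As a sanity check for the example prompt above, the expected answer can be worked out by hand. The short sketch below is not part of the model or its API; it only reproduces the arithmetic with exact fractions, and the variable names are illustrative.

```python
from fractions import Fraction

total = 45
# Morning: one third of the apples are sold (45 - 15 = 30)
after_morning = total - total * Fraction(1, 3)
# Afternoon: one fifth of the remaining apples are sold (30 - 6 = 24)
after_afternoon = after_morning - after_morning * Fraction(1, 5)

print(int(after_afternoon))  # 24
```

A correct model response should therefore end with 24 apples.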

Training Details

Training Framework: Axolotl with DeepSpeed ZeRO-2 optimization
Hardware: 8x NVIDIA B200 GPUs (PrimeIntellect cluster)
Base Model: Qwen/Qwen3-8B
Training Method: Low-Rank Adaptation (LoRA)
Dataset: NousResearch/Hermes-3-Dataset
Training Duration: 6 hours
Learning Rate: 0.0004
Batch Size: 8
Sequence Length: 4096
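
Because the released weights are a merged LoRA model, it may help to see what merging means. In LoRA, a frozen weight matrix W receives a low-rank update B·A scaled by alpha / r, and merging folds that update into W so that no adapter is needed at inference time. The sketch below uses tiny pure-Python matrices. The rank, alpha, and all numbers are illustrative assumptions, not the values used to train this model, which this card does not state.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    r = len(a)  # rank = number of rows in A
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Hypothetical 2x2 base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (2, r)
A = [[0.5, 0.5]]     # shape (r, 2)

merged = merge_lora(W, A, B, alpha=2.0)
print(merged)  # [[2.0, 1.0], [2.0, 3.0]]
```

After merging, the checkpoint loads like any other transformers model, which is why the Usage section needs no adapter library.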

Evaluation Methodology

Each benchmark was evaluated as follows:

HellaSwag: Standard validation set with 4-way multiple choice accuracy
GSM8K: Test set with exact match accuracy on final numerical answers
TheoryPlay: Validation set with accuracy on theory of mind reasoning tasks
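
The scoring rules above can be sketched as follows. This is a minimal illustration, not the harness used to produce the reported numbers. The helper names are hypothetical, the GSM8K extraction simply takes the last number in the text (GSM8K reference solutions end with a `#### <answer>` line), and the multiple-choice scorer assumes each example already has one score per candidate ending.

```python
import re

def extract_final_number(text):
    """Return the last number in the text as a string, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,200"
    return matches[-1].replace(",", "")

def gsm8k_exact_match(prediction, reference):
    """Exact match on the final numerical answer."""
    pred = extract_final_number(prediction)
    ref = extract_final_number(reference)
    return pred is not None and ref is not None and float(pred) == float(ref)

def multiple_choice_accuracy(scores, labels):
    """Fraction of examples where the highest-scoring choice is the gold label."""
    correct = sum(
        max(range(len(s)), key=s.__getitem__) == y for s, y in zip(scores, labels)
    )
    return correct / len(labels)

print(gsm8k_exact_match("30 - 6 = 24, so 24 apples are left.", "#### 24"))  # True
print(multiple_choice_accuracy([[0.1, 0.7, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]], [1, 2]))  # 0.5
```

Real evaluation harnesses differ in how they normalize answers and score choices, so results from this sketch are not directly comparable to the table above.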

Limitations

The model may still struggle with very complex mathematical proofs
Performance on non-English languages may be limited
The model may occasionally generate inconsistent responses in edge cases
Knowledge of recent events is limited by the training data cutoff

Ethical Considerations

This model has been trained on curated datasets and should be used responsibly. Users should:

Verify important information from the model
Be aware of potential biases in training data
Use appropriate content filtering for production applications

Citation

@misc{qwen3-hermes8b-v1,
  title={Qwen3-Hermes8B-v1: A Fine-tuned Language Model for Reasoning Tasks},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/justinj92/Qwen3-Hermes8B-v1}
}

License

This model is released under the Apache 2.0 license.
