---
language:
- en
base_model: Qwen/Qwen3-8B
library_name: transformers
pipeline_tag: text-generation
tags:
- axolotl
- reasoning
- math
- commonsense
license: apache-2.0
datasets:
- NousResearch/Hermes-3-Dataset
model-index:
- name: Qwen3-Hermes8B-v1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - type: accuracy
      value: 0.823
      name: Accuracy
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - type: accuracy
      value: 0.871
      name: Accuracy
  - task:
      type: text-generation
      name: Theory of Mind
    dataset:
      name: TheoryPlay
      type: theoryplay
    metrics:
    - type: accuracy
      value: 0.35
      name: Accuracy
---

# Qwen3-Hermes8B-v1

This is a merged LoRA model based on Qwen/Qwen3-8B, supervised fine-tuned (SFT) on the Hermes-3 dataset. The model demonstrates strong performance across reasoning, mathematical problem-solving, and commonsense understanding tasks.

## Model Details

- **Base Model**: Qwen/Qwen3-8B
- **Language**: English (en)
- **Library**: transformers
- **Training Method**: LoRA fine-tuning with Axolotl
- **Infrastructure**: 8x B200 cluster from PrimeIntellect
- **Training Framework**: DeepSpeed ZeRO-2

## Performance

| Benchmark | Score | Description |
|-----------|-------|-------------|
| **HellaSwag** | 82.3% | Commonsense reasoning and natural language inference |
| **GSM8K** | 87.1% | Grade school math word problems |
| **TheoryPlay** | 35% | Theory of mind and social reasoning tasks |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "justinj92/Qwen3-Hermes8B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example usage for reasoning tasks
text = "Sarah believes that her keys are in her purse, but they are actually on the kitchen table. Where will Sarah look for her keys?"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Chat Format

This model supports the Hermes chat format (ChatML-style `<|im_start|>` / `<|im_end|>` tags):

```python
def format_chat(messages):
    # Build a ChatML-style prompt from a list of {"role", "content"} messages
    formatted = ""
    for message in messages:
        role = message["role"]
        content = message["content"]
        if role == "system":
            formatted += f"<|im_start|>system\n{content}<|im_end|>\n"
        elif role == "user":
            formatted += f"<|im_start|>user\n{content}<|im_end|>\n"
        elif role == "assistant":
            formatted += f"<|im_start|>assistant\n{content}<|im_end|>\n"
    # Cue the model to generate the assistant turn
    formatted += "<|im_start|>assistant\n"
    return formatted

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve this math problem: A store has 45 apples. If they sell 1/3 of them in the morning and 1/5 of the remaining apples in the afternoon, how many apples are left?"}
]

prompt = format_chat(messages)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.1, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
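If the repository's tokenizer ships a chat template (an assumption here, carried over from the Qwen3-8B base), the same conversation can also be formatted with the built-in `apply_chat_template` from `transformers` instead of a hand-rolled formatter. A minimal sketch, reusing the `model`, `tokenizer`, and `messages` objects defined above:

```python
# Minimal sketch: format the conversation with the tokenizer's own chat template.
# Assumes the tokenizer defines a ChatML-style template (inherited from Qwen3-8B).
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the opening assistant tag
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```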
## Training Details

- **Training Framework**: Axolotl with DeepSpeed ZeRO-2 optimization
- **Hardware**: 8x NVIDIA B200 GPUs (PrimeIntellect cluster)
- **Base Model**: Qwen/Qwen3-8B
- **Training Method**: Low-Rank Adaptation (LoRA)
- **Dataset**: NousResearch/Hermes-3-Dataset
- **Training Duration**: 6 hours
- **Learning Rate**: 0.0004
- **Batch Size**: 8
- **Sequence Length**: 4096

## Evaluation Methodology

All evaluations were conducted as follows:

- **HellaSwag**: standard validation set, 4-way multiple-choice accuracy
- **GSM8K**: test set, exact-match accuracy on the final numerical answer
- **TheoryPlay**: validation set, accuracy on theory-of-mind reasoning tasks

## Limitations

- The model may still struggle with very complex mathematical proofs
- Performance on non-English languages may be limited
- May occasionally generate inconsistent responses in edge cases
- The training data cutoff limits knowledge of recent events

## Ethical Considerations

This model has been trained on curated datasets and should be used responsibly. Users should:

- Verify important information produced by the model
- Be aware of potential biases in the training data
- Use appropriate content filtering for production applications

## Citation

```bibtex
@misc{qwen3-hermes8b-v1,
  title={Qwen3-Hermes8B-v1: A Fine-tuned Language Model for Reasoning Tasks},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/justinj92/Qwen3-Hermes8B-v1}
}
```

## License

This model is released under the Apache 2.0 license.