metadata

language:
  - en
base_model: Qwen/Qwen3-8B
library_name: transformers
pipeline_tag: text-generation
tags:
  - axolotl
  - reasoning
  - math
  - commonsense
  - primeintellect
license: apache-2.0
datasets:
  - NousResearch/Hermes-3-Dataset
  - QuixiAI/dolphin
model-index:
  - name: Delphermes-8B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag
          type: hellaswag
        metrics:
          - type: accuracy
            value: 0.88
            name: Accuracy
      - task:
          type: text-generation
          name: Mathematical Reasoning
        dataset:
          name: GSM8K
          type: gsm8k
        metrics:
          - type: accuracy
            value: 0.89
            name: Accuracy
      - task:
          type: text-generation
          name: Theory of Mind
        dataset:
          name: TheoryPlay
          type: theoryplay
        metrics:
          - type: accuracy
            value: 0.8
            name: Accuracy

Delphermes-8B

Delphermes-8B is a merged LoRA model based on Qwen/Qwen3-8B, trained with supervised fine-tuning (SFT) on the NousResearch/Hermes-3-Dataset and QuixiAI/dolphin datasets. It targets reasoning, mathematical problem-solving, and commonsense understanding, scoring 88% on HellaSwag, 89% on GSM8K, and 80% on TheoryPlay.

Model Details

Base Model: Qwen/Qwen3-8B
Language: English (en)
Library: transformers
Training Method: LoRA fine-tuning with Axolotl
Infrastructure: 8x NVIDIA B200 cluster from PrimeIntellect
Training Framework: DeepSpeed ZeRO Stage 2

Performance

Benchmark	Score	Description
HellaSwag	88%	Commonsense reasoning and natural language inference
GSM8K	89%	Grade school math word problems
TheoryPlay	80%	Theory of mind and social reasoning tasks

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "justinj92/Delphermes-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example usage for reasoning tasks
text = "Sarah believes that her keys are in her purse, but they are actually on the kitchen table. Where will Sarah look for her keys?"
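# Move the input tensors to the device the model was loaded on
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,  # counts generated tokens only; max_length would also count the prompt
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)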
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Chat Format

This model supports the Hermes chat format, a ChatML-style layout in which each turn is wrapped in <|im_start|> and <|im_end|> markers:

def format_chat(messages):
    formatted = ""
    for message in messages:
        role = message["role"]
        content = message["content"]
        if role == "system":
            formatted += f"<|im_start|>system\n{content}<|im_end|>\n"
        elif role == "user":
            formatted += f"<|im_start|>user\n{content}<|im_end|>\n"
        elif role == "assistant":
            formatted += f"<|im_start|>assistant\n{content}<|im_end|>\n"
    formatted += "<|im_start|>assistant\n"
    return formatted

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve this math problem: A store has 45 apples. If they sell 1/3 of them in the morning and 1/5 of the remaining apples in the afternoon, how many apples are left?"}
]

prompt = format_chat(messages)
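inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled, so set do_sample=True
outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.1, do_sample=True)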
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
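
To judge the model's reply to the example prompt, it helps to know the expected answer. The sketch below works through the arithmetic with exact fractions; the function name `apples_left` is only illustrative and is not part of the model or its tooling.

```python
from fractions import Fraction


def apples_left(start: int) -> Fraction:
    """Apply the two sales from the example prompt using exact fractions."""
    # Morning: sell 1/3 of the starting stock.
    after_morning = start - Fraction(1, 3) * start
    # Afternoon: sell 1/5 of whatever remains after the morning.
    after_afternoon = after_morning - Fraction(1, 5) * after_morning
    return after_afternoon


print(apples_left(45))  # 45 -> 30 after the morning -> 24 after the afternoon
```

A correct response to the prompt should therefore end with 24 apples.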

Training Details

Training Framework: Axolotl with DeepSpeed ZeRO Stage 2
Hardware: 8x NVIDIA B200 GPUs (PrimeIntellect cluster)
Base Model: Qwen/Qwen3-8B
Training Method: Low-Rank Adaptation (LoRA)
Dataset: NousResearch/Hermes-3-Dataset + QuixiAI/dolphin
Training Duration: 28 hours
Learning Rate: 0.0004
Batch Size: 8
Sequence Length: 4096
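
As background on what "merged LoRA" means, the sketch below folds a low-rank update into a base weight matrix using the standard LoRA formulation, W_merged = W + (alpha / r) * B A. It is a toy illustration in plain Python: the matrices, `alpha`, and `rank` are made-up values, since the rank and alpha used for this model are not stated here, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the weight after merging the adapter."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x rank) @ (rank x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy values for illustration only; not taken from the actual training run.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # base weight, 2 x 3
lora_b = [[1.0], [2.0]]                 # 2 x 1, rank 1
lora_a = [[0.5, 0.0, -0.5]]             # 1 x 3
merged = merge_lora(w, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the model loads with plain `AutoModelForCausalLM` and no adapter library.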

Evaluation Methodology

Evaluations used the following splits and metrics:

HellaSwag: Standard validation set with 4-way multiple choice accuracy
GSM8K: Test set with exact match accuracy on final numerical answers
TheoryPlay: Validation set with accuracy on theory of mind reasoning tasks
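
The two metric types named above can be sketched as follows. This is not the evaluation harness used for the reported scores; it assumes that GSM8K-style answers are compared on the last number appearing in the text (GSM8K references end with `#### <answer>`), and that multiple-choice predictions are the highest-scoring candidate, for example by per-ending log-likelihood. All function names are hypothetical.

```python
import re

_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def final_number(text):
    """Return the last number in the text as a normalized string, or None."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    value = float(matches[-1].replace(",", ""))
    return str(int(value)) if value.is_integer() else str(value)


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference's."""
    hits = 0
    for pred, ref in zip(predictions, references):
        answer = final_number(pred)
        if answer is not None and answer == final_number(ref):
            hits += 1
    return hits / len(references)


def multiple_choice_accuracy(choice_scores, labels):
    """Accuracy when the highest-scoring candidate is taken as the prediction."""
    hits = 0
    for scores, label in zip(choice_scores, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += predicted == label
    return hits / len(labels)


preds = ["45 - 15 = 30, then 30 - 6 = 24. The answer is 24.", "She has 7 apples."]
refs = ["30 - 6 = 24 #### 24", "5 + 3 = 8 #### 8"]
print(exact_match_accuracy(preds, refs))  # 0.5

scores = [[-1.2, -0.3, -2.0, -1.5], [-0.9, -1.1, -0.2, -3.0]]
print(multiple_choice_accuracy(scores, [1, 0]))  # 0.5
```

Real harnesses differ in how they extract answers and normalize scores, so numbers from a sketch like this are not directly comparable to the table above.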

Limitations

The model may still struggle with very complex mathematical proofs
Performance on non-English languages may be limited
May occasionally generate inconsistent responses in edge cases
Training data cutoff affects knowledge of recent events

Ethical Considerations

This model has been trained on curated datasets and should be used responsibly. Users should:

Verify important information from the model
Be aware of potential biases in training data
Use appropriate content filtering for production applications

Citation

@misc{Delphermes-8B,
  title={Delphermes-8B: A Fine-tuned Language Model for Reasoning Tasks},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/justinj92/Delphermes-8B}
}

License

This model is released under the Apache 2.0 license.