Friction Reasoning Model

This model is fine-tuned to engage in productive disagreement, overthinking, and reluctance. It is based on DeepSeek-R1-Distill-Qwen-7B and trained on a curated mixture of datasets covering these three behaviours.

Model Description

  • Model Architecture: DeepSeek-R1-Distill-Qwen-7B with LoRA adapters
  • Language(s): English
  • License: Apache 2.0
  • Finetuning Approach: Instruction tuning with friction-based reasoning examples

Training Data

The model was trained on a weighted combination of three datasets (a minimal mixing sketch follows the list):

  1. leonvanbokhorst/friction-disagreement-v2 (8.5% weight)
    • Examples of productive disagreement and challenging assumptions
  2. leonvanbokhorst/friction-overthinking-v2 (9.5% weight)
    • Examples of deep analytical thinking and self-reflection
  3. leonvanbokhorst/reluctance-v6.1 (82% weight)
    • Examples of hesitation and careful consideration
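The percentages above are mixture weights. The exact training script (including its token-based sampling) is not part of this card, but a rough approximation of the weighted mixture can be built with the datasets library, as in this sketch:

from datasets import load_dataset, interleave_datasets

# Load the three source datasets (train splits)
disagreement = load_dataset("leonvanbokhorst/friction-disagreement-v2", split="train")
overthinking = load_dataset("leonvanbokhorst/friction-overthinking-v2", split="train")
reluctance = load_dataset("leonvanbokhorst/reluctance-v6.1", split="train")

# Interleave examples according to the documented mixture weights
mixture = interleave_datasets(
    [disagreement, overthinking, reluctance],
    probabilities=[0.085, 0.095, 0.82],
    seed=42,
)

Note that this interleaves at the example level; the actual training used token-based sampling, so the sketch is an approximation.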

Training Procedure

  • Hardware: NVIDIA RTX 4090 (24GB)
  • Framework: Unsloth + PyTorch
  • Training Time: 35 minutes
  • Epochs: 7 (early convergence around epoch 4)
  • Batch Size: 2 per device (effective batch size 8 with gradient accumulation)
  • Optimization: AdamW 8-bit
  • Learning Rate: 2e-4 with cosine schedule
  • Weight Decay: 0.01
  • Gradient Clipping: 0.5
  • Mixed Precision: bfloat16
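As a rough illustration, the hyperparameters above map onto a standard Transformers training configuration like the one below. This is a sketch, not the actual Unsloth training script; argument names follow the public transformers API.

from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,   # batch size 2 per device
    gradient_accumulation_steps=4,   # effective batch size 8
    num_train_epochs=7,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=0.5,               # gradient clipping
    optim="adamw_8bit",              # 8-bit AdamW (bitsandbytes)
    bf16=True,                       # bfloat16 mixed precision
    seed=42,
)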

Performance Metrics

  • Training Loss: 1.437 (final)
  • Best Validation Loss: 1.527 (epoch 3.57)
  • Memory Usage: 3.813 GB for training (15.9% of GPU memory)

Intended Use

This model is designed for:

  • Engaging in productive disagreement
  • Challenging assumptions constructively
  • Providing alternative perspectives
  • Deep analytical thinking
  • Careful consideration of complex issues

Limitations

The model:

  • Is not designed for factual question-answering
  • May sometimes be overly disagreeable
  • Should not be used for medical, legal, or financial advice
  • Works best with reflective or analytical queries
  • May not perform well on objective or factual tasks

Bias and Risks

The model:

  • May exhibit biases present in the training data
  • Could potentially reinforce overthinking in certain situations
  • Might challenge user assumptions in sensitive contexts
  • Should be used with appropriate content warnings

Usage

Example usage with the Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "leonvanbokhorst/deepseek-r1-mixture-of-friction"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Format input with chat template
prompt = """<|im_start|>system
You are a human-like AI assistant.
<|im_end|>
<|im_start|>user
Why do I keep procrastinating important tasks?
<|im_end|>
<|im_start|>assistant"""

# Generate a response (sampling must be enabled for temperature/top_p to take effect)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,                # pass input_ids and attention_mask
    max_new_tokens=512,      # cap on newly generated tokens
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Details

LoRA Configuration

  • Rank: 16
  • Alpha: 32
  • Target Modules:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • gate_proj
    • up_proj
    • down_proj
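For reference, this corresponds to a PEFT LoraConfig roughly like the sketch below; lora_dropout and bias are illustrative defaults, not documented values.

from peft import LoraConfig

# Approximate reconstruction of the adapter configuration listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,   # assumption, not a published value
    bias="none",        # assumption, not a published value
    task_type="CAUSAL_LM",
)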

Dataset Processing

  • Examples stacked (packed) into sequences of up to 4096 tokens
  • 90/10 train/validation split (see the sketch below)
  • Consistent seed (42) for reproducibility
  • Token-based sampling for balanced training
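The 90/10 split with a fixed seed can be reproduced with the datasets API. A minimal, self-contained sketch using one of the source datasets (the actual training applied the same split to the full mixture):

from datasets import load_dataset

# Illustrative 90/10 split with the documented seed
ds = load_dataset("leonvanbokhorst/reluctance-v6.1", split="train")
split = ds.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]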

Citation

If you use this model in your research, please cite:

@misc{friction-reasoning-2025,
  author = {Leon van Bokhorst},
  title = {Mixture of Friction: Fine-tuned Language Model for Productive Disagreement, Overthinking, and Hesitation},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/leonvanbokhorst/deepseek-r1-mixture-of-friction}}
}

Acknowledgments

  • DeepSeek AI for the base model
  • Unsloth team for the optimization toolkit
  • HuggingFace for the model hosting and infrastructure