# Friction Reasoning Model

This model is fine-tuned to engage in productive disagreement, overthinking, and reluctance. It is based on DeepSeek-R1-Distill-Qwen-7B and trained on a curated mix of disagreement, overthinking, and reluctance examples.
## Model Description

- **Model Architecture**: DeepSeek-R1-Distill-Qwen-7B with LoRA adapters
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuning Approach**: Instruction tuning with friction-based reasoning examples

### Training Data

The model was trained on a combination of three datasets:

1. `leonvanbokhorst/friction-disagreement-v2` (8.5% weight)
   - Examples of productive disagreement and challenging assumptions
2. `leonvanbokhorst/friction-overthinking-v2` (9.5% weight)
   - Examples of deep analytical thinking and self-reflection
3. `leonvanbokhorst/reluctance-v6.1` (82% weight)
   - Examples of hesitation and careful consideration

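The exact mixing step is not part of this card, but the weights above can be reproduced approximately with the `datasets` library. The snippet below is a minimal sketch (dataset names as listed, `train` splits assumed), not the original training script, which used token-based sampling:

```python
from datasets import load_dataset, interleave_datasets

# Load the three source datasets (names as listed above; "train" split assumed)
disagreement = load_dataset("leonvanbokhorst/friction-disagreement-v2", split="train")
overthinking = load_dataset("leonvanbokhorst/friction-overthinking-v2", split="train")
reluctance = load_dataset("leonvanbokhorst/reluctance-v6.1", split="train")

# Interleave with roughly the reported weights (8.5% / 9.5% / 82%)
mixture = interleave_datasets(
    [disagreement, overthinking, reluctance],
    probabilities=[0.085, 0.095, 0.82],
    seed=42,
)
```
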
### Training Procedure

- **Hardware**: NVIDIA RTX 4090 (24 GB)
- **Framework**: Unsloth + PyTorch
- **Training Time**: 35 minutes
- **Epochs**: 7 (early convergence around epoch 4)
- **Batch Size**: 2 per device (effective batch size 8 with gradient accumulation)
- **Optimizer**: AdamW 8-bit
- **Learning Rate**: 2e-4 with cosine schedule
- **Weight Decay**: 0.01
- **Gradient Clipping**: 0.5
- **Mixed Precision**: bfloat16

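For reference, the hyperparameters above map roughly onto the standard Hugging Face `Trainer` arguments as shown below. This is an illustrative sketch, not the actual Unsloth training script; the output path and any unlisted settings are assumptions.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters using the standard Trainer API.
training_args = TrainingArguments(
    output_dir="outputs",               # output path is an assumption
    num_train_epochs=7,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # 2 x 4 = effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=0.5,                  # gradient clipping
    optim="adamw_bnb_8bit",             # AdamW 8-bit (bitsandbytes backend)
    bf16=True,                          # bfloat16 mixed precision
    seed=42,
)
```
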
### Performance Metrics

- **Training Loss**: 1.437 (final)
- **Best Validation Loss**: 1.527 (epoch 3.57)
- **Memory Usage**: 3.813 GB for training (15.9% of GPU memory)

## Intended Use

This model is designed for:

- Engaging in productive disagreement
- Challenging assumptions constructively
- Providing alternative perspectives
- Deep analytical thinking
- Careful consideration of complex issues

### Limitations

The model:

- Is not designed for factual question-answering
- May sometimes be overly disagreeable
- Should not be used for medical, legal, or financial advice
- Works best with reflective or analytical queries
- May not perform well on objective or factual tasks

### Bias and Risks

The model:

- May exhibit biases present in the training data
- Could potentially reinforce overthinking in certain situations
- Might challenge user assumptions in sensitive contexts
- Should be used with appropriate content warnings

## Usage

Example usage with the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "leonvanbokhorst/deepseek-r1-mixture-of-friction"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Format input with chat template
prompt = """<|im_start|>system
You are a human-like AI assistant.
<|im_end|>
<|im_start|>user
Why do I keep procrastinating important tasks?
<|im_end|>
<|im_start|>assistant"""

# Generate response (sampling enabled so temperature/top_p take effect)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,                # pass input_ids and attention_mask together
    max_new_tokens=512,      # limit new tokens rather than total length
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
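
If the tokenizer ships a chat template (inherited from the base model unless overridden), the prompt can also be built with `apply_chat_template`. Continuing from the snippet above, this is a convenience sketch under that assumption:

```python
# Same request expressed through the tokenizer's chat template, if one is available.
messages = [
    {"role": "system", "content": "You are a human-like AI assistant."},
    {"role": "user", "content": "Why do I keep procrastinating important tasks?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```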

## Training Details

### LoRA Configuration

- **Rank**: 16
- **Alpha**: 32
- **Target Modules**:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

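Expressed with the PEFT API, the adapter settings above correspond roughly to the configuration below. This is a reference sketch (the actual run used Unsloth's wrapper, and unlisted values such as dropout are assumptions):

```python
from peft import LoraConfig

# LoRA settings as listed above; unlisted values (e.g. dropout, bias) are assumed defaults.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```
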
### Dataset Processing

- Examples packed into sequences of up to 4096 tokens
- 90/10 train/validation split
- Consistent seed (42) for reproducibility
- Token-based sampling for balanced training

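The split and seed above can be reproduced with the `datasets` API, reusing the `mixture` dataset from the sketch under Training Data; token packing itself is handled by the training framework and is not shown:

```python
# 90/10 train/validation split with the fixed seed noted above.
split = mixture.train_test_split(test_size=0.1, seed=42)
train_dataset, eval_dataset = split["train"], split["test"]
```
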
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{friction-reasoning-2025,
  author       = {Leon van Bokhorst},
  title        = {Mixture of Friction: Fine-tuned Language Model for Productive Disagreement, Overthinking, and Hesitation},
  year         = {2025},
  publisher    = {HuggingFace},
  journal      = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/leonvanbokhorst/deepseek-r1-mixture-of-friction}}
}
```

## Acknowledgments

- DeepSeek AI for the base model
- Unsloth team for the optimization toolkit
- HuggingFace for the model hosting and infrastructure