# Friction Reasoning Model
This model is fine-tuned to engage in productive disagreement, overthinking, and reluctance. It is based on DeepSeek-R1-Distill-Qwen-7B and trained on a curated mixture of datasets illustrating each of these three friction behaviors.
## Model Description
- **Model Architecture**: DeepSeek-R1-Distill-Qwen-7B with LoRA adapters
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuning Approach**: Instruction tuning with friction-based reasoning examples
### Training Data
The model was trained on a combination of three datasets:
1. `leonvanbokhorst/friction-disagreement-v2` (8.5% weight)
- Examples of productive disagreement and challenging assumptions
2. `leonvanbokhorst/friction-overthinking-v2` (9.5% weight)
- Examples of deep analytical thinking and self-reflection
3. `leonvanbokhorst/reluctance-v6.1` (82% weight)
- Examples of hesitation and careful consideration
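The training code is not reproduced here, but a minimal sketch of how such a weighted mixture can be assembled with the Hugging Face `datasets` library follows (the split names and the use of `interleave_datasets` are assumptions, not the published training script):

```python
from datasets import load_dataset, interleave_datasets

# Load the three source datasets (assuming each publishes a "train" split)
disagreement = load_dataset("leonvanbokhorst/friction-disagreement-v2", split="train")
overthinking = load_dataset("leonvanbokhorst/friction-overthinking-v2", split="train")
reluctance = load_dataset("leonvanbokhorst/reluctance-v6.1", split="train")

# Interleave with the mixture weights listed above (8.5% / 9.5% / 82%)
mixture = interleave_datasets(
    [disagreement, overthinking, reluctance],
    probabilities=[0.085, 0.095, 0.82],
    seed=42,
)
```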
### Training Procedure
- **Hardware**: NVIDIA RTX 4090 (24GB)
- **Framework**: Unsloth + PyTorch
- **Training Time**: 35 minutes
- **Epochs**: 7 (early convergence around epoch 4)
- **Batch Size**: 2 per device (effective batch size 8 with gradient accumulation)
- **Optimization**: AdamW 8-bit
- **Learning Rate**: 2e-4 with cosine schedule
- **Weight Decay**: 0.01
- **Gradient Clipping**: 0.5
- **Mixed Precision**: bfloat16
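The original run used Unsloth's training wrappers; as a rough sketch, the same hyperparameters map onto standard `transformers.TrainingArguments` as shown below (argument names come from the Transformers API, not the original script):

```python
from transformers import TrainingArguments

# Hyperparameters mirroring the values listed above (a sketch, not the original script)
training_args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=7,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # 2 x 4 = effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=0.5,               # gradient clipping
    optim="adamw_bnb_8bit",          # 8-bit AdamW
    bf16=True,                       # mixed precision
    seed=42,
)
```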
### Performance Metrics
- **Training Loss**: 1.437 (final)
- **Best Validation Loss**: 1.527 (epoch 3.57)
- **Memory Usage**: 3.813 GB for training (15.9% of GPU memory)
## Intended Use
This model is designed for:
- Engaging in productive disagreement
- Challenging assumptions constructively
- Providing alternative perspectives
- Deep analytical thinking
- Careful consideration of complex issues
### Limitations
The model:
- Is not designed for factual question-answering
- May sometimes be overly disagreeable
- Should not be used for medical, legal, or financial advice
- Works best with reflective or analytical queries
- May not perform well on objective or factual tasks
### Bias and Risks
The model:
- May exhibit biases present in the training data
- Could potentially reinforce overthinking in certain situations
- Might challenge user assumptions in sensitive contexts
- Should be used with appropriate content warnings
## Usage
Example usage with the Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_name = "leonvanbokhorst/deepseek-r1-mixture-of-friction"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Build the prompt in the chat format used during training
prompt = """<|im_start|>system
You are a human-like AI assistant.
<|im_end|>
<|im_start|>user
Why do I keep procrastinating important tasks?
<|im_end|>
<|im_start|>assistant"""
# Generate a response (sampling must be enabled for temperature/top_p to take effect)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,                # passes input_ids and attention_mask
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Training Details
### LoRA Configuration
- **Rank**: 16
- **Alpha**: 32
- **Target Modules**:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
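The same adapter setup expressed with the `peft` library, as a minimal sketch (dropout and other settings not listed above are assumptions):

```python
from peft import LoraConfig

# LoRA adapter configuration matching the values listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,        # assumed; not stated in this card
    task_type="CAUSAL_LM",
)
```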
### Dataset Processing
- Examples packed into sequences of up to 4096 tokens
- 90/10 train/validation split
- Consistent seed (42) for reproducibility
- Token-based sampling for balanced training
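For instance, the 90/10 split with the fixed seed can be reproduced with the `datasets` API (a sketch shown on a single source dataset; token packing is left to the training framework):

```python
from datasets import load_dataset

# Any of the source datasets; the same split applies to the full mixture
ds = load_dataset("leonvanbokhorst/reluctance-v6.1", split="train")

# 90/10 train/validation split with seed 42 for reproducibility
split = ds.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]
```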
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{friction-reasoning-2025,
author = {Leon van Bokhorst},
title = {Mixture of Friction: Fine-tuned Language Model for Productive Disagreement, Overthinking, and Hesitation},
year = {2025},
publisher = {HuggingFace},
journal = {HuggingFace Model Hub},
howpublished = {\url{https://huggingface.co/leonvanbokhorst/deepseek-r1-mixture-of-friction}}
}
```
## Acknowledgments
- DeepSeek AI for the base model
- Unsloth team for the optimization toolkit
- HuggingFace for the model hosting and infrastructure