System Prompt Learning: Teaching LLMs to Learn Problem-Solving Strategies from Experience
We're excited to announce System Prompt Learning (SPL), a new paradigm that enables Large Language Models to learn and improve their problem-solving capabilities through experience. This approach has been implemented as an open-source plugin in optillm, showing significant performance improvements across multiple benchmarks.
The Motivation: Bridging the System Prompt Gap
If you've ever wondered why ChatGPT, Claude, and other popular AI assistants seem so capable, part of the secret lies in their sophisticated system prompts. These prompts contain elaborate problem-solving strategies, reasoning frameworks, and detailed instructions that guide the models to better performance. However, most developers and researchers work with basic or empty system prompts, missing out on these benefits entirely.
This disparity inspired us to explore Andrej Karpathy's proposed "third paradigm" for LLM learning:
- Pretraining: Learning facts and patterns from massive text corpora
- Finetuning: Learning behaviors through supervised/reinforcement learning
- System Prompt Learning: Learning explicit problem-solving strategies through experience ← NEW
What is System Prompt Learning?
System Prompt Learning represents a fundamental shift in how LLMs approach problem-solving. Instead of treating each query as an isolated challenge, SPL enables models to:
- Learn from Experience: Build a knowledge base of effective problem-solving strategies
- Classify Problems: Automatically categorize queries into specific problem types
- Apply Relevant Strategies: Select and apply the most effective strategies for each problem type
- Improve Over Time: Refine strategies based on success rates and new examples
- Maintain Transparency: Generate human-readable strategies that can be inspected and understood
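The flow above can be sketched end-to-end in a few lines. All of the names below (`classify`, `select_strategies`, `solve_with_spl`) are illustrative stubs for exposition, not the plugin's actual API:

```python
def classify(query):
    """Stub: the real plugin asks the LLM to pick one of 16 problem types."""
    return "word_problem" if "how many" in query.lower() else "general"

def select_strategies(db, problem_type, k=3):
    """Stub: fetch up to k stored strategies for this problem type."""
    return db.get(problem_type, [])[:k]

def solve_with_spl(query, db, llm):
    """Classify the query, fetch strategies, augment the system prompt, answer."""
    ptype = classify(query)
    strategies = select_strategies(db, ptype)
    system = "You are a helpful assistant.\n" + "\n".join(strategies)
    return llm(system, query)

# Toy run with a fake LLM that just echoes its inputs.
db = {"word_problem": ["Define all variables with units first."]}
fake_llm = lambda system, query: f"[{system!r}] -> answer"
result = solve_with_spl("How many apples are left?", db, fake_llm)
```

The key point is that the strategy database sits outside the model: it is ordinary data that can be inspected, versioned, and shared.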
Impressive Results
We evaluated SPL with gemini-2.0-flash-lite across multiple benchmarks. The learning phase used 400 training instances, and evaluation was performed on separate held-out test sets:
| Benchmark | Baseline | With SPL | Improvement |
|---|---|---|---|
| OptILLMBench | 61% | 65% | +4% |
| MATH-500 | 85% | 85.6% | +0.6% |
| Arena Hard Auto | 29% | 37.6% | +8.6% |
| AIME24 | 23.33% | 30% | +6.67% |
The improvements are particularly notable on the harder benchmarks, Arena Hard Auto and AIME24, where strategic problem-solving approaches make the biggest difference.
How It Works
The SPL system maintains a dynamic database of problem-solving strategies that evolves over time:
1. Problem Classification
Every query is automatically classified into one of 16 problem types (arithmetic, word problems, logical reasoning, coding, etc.)
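In the plugin, classification is itself delegated to the LLM; a crude keyword-based stand-in (with made-up keyword lists) is enough to illustrate the idea:

```python
# Toy classifier: the real plugin prompts the LLM to choose among 16 types.
PROBLEM_TYPES = {
    "arithmetic": ("sum", "multiply", "divide", "percent"),
    "word_problem": ("train", "apples", "total cost", "how many"),
    "coding": ("function", "python", "bug", "compile"),
    "logical_reasoning": ("if and only if", "implies", "syllogism"),
}

def classify(query: str) -> str:
    """Return the first problem type whose keywords match, else a generic bucket."""
    q = query.lower()
    for ptype, keywords in PROBLEM_TYPES.items():
        if any(k in q for k in keywords):
            return ptype
    return "general"
```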
2. Strategy Management
- Creation: Generate new strategies for unfamiliar problem types
- Selection: Choose the most relevant strategies (up to 3) for inference
- Evaluation: Assess strategy effectiveness after each use
- Refinement: Improve strategies every 10 applications
- Maintenance: Merge similar strategies and prune poor performers
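A minimal sketch of the bookkeeping behind selection, refinement, and pruning might look like the following. The schema and thresholds here are assumptions for illustration; the plugin's actual values may differ:

```python
class Strategy:
    """Toy strategy record: text plus a running success tally."""
    def __init__(self, problem_type, text):
        self.problem_type, self.text = problem_type, text
        self.attempts = self.successes = 0

    @property
    def success_rate(self):
        return self.successes / self.attempts if self.attempts else 0.0

    def record(self, success):
        """Log one application; return True when refinement is due (every 10)."""
        self.attempts += 1
        self.successes += int(success)
        return self.attempts % 10 == 0

def select(db, problem_type, k=3):
    """Up to k strategies for this problem type, best success rate first."""
    pool = [s for s in db if s.problem_type == problem_type]
    return sorted(pool, key=lambda s: s.success_rate, reverse=True)[:k]

def prune(db, min_rate=0.3, min_attempts=10):
    """Keep strategies until enough evidence shows they underperform."""
    return [s for s in db
            if s.attempts < min_attempts or s.success_rate >= min_rate]
```

When `record` signals that refinement is due, the plugin would hand the strategy text back to the LLM for rewriting; merging of similar strategies is likewise LLM-assisted.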
3. System Prompt Augmentation
Selected strategies are integrated into the system prompt, providing the model with explicit guidance on how to approach the problem.
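The augmentation step is plain string assembly. The exact wording the plugin injects will differ; this is just the shape of it:

```python
def build_system_prompt(base, strategies):
    """Append selected strategy texts to the base system prompt."""
    if not strategies:
        return base
    lines = [base, "", "When solving this problem, consider these strategies:"]
    for i, text in enumerate(strategies, 1):
        lines.append(f"{i}. {text}")
    return "\n".join(lines)
```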
Example Strategy
Here's a refined strategy the system learned for word problems:
**Strategy for Solving Word Problems:**
1. **Understand:**
* Read the problem carefully (multiple times)
* Identify the question (what are you trying to find?)
* List all given information (facts, numbers, units)
2. **Plan and Translate:**
* Define all variables with units
* Identify relationships between knowns and unknowns
* Write equations or expressions
* Ensure units are consistent throughout
3. **Solve:**
* Show work step-by-step
* Track units throughout calculations
* Calculate accurately
4. **Verify:**
* Check if the answer is reasonable
* State the final answer with units
After 500 training queries, our system developed:
- 129 strategies created
- 97 strategies refined
- 28 strategies merged
- 346 successful resolutions
Getting Started
SPL is implemented as a plugin in optillm, making it easy to integrate with existing workflows:
Installation
```bash
pip install optillm
```
Basic Usage (Inference Mode)
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8000/v1"  # optillm proxy
)

response = client.chat.completions.create(
    model="spl-gpt-4o",  # "spl-" prefix enables the plugin
    messages=[
        {"role": "user", "content": "Your challenging problem here"}
    ]
)
```
Learning Mode (Strategy Creation/Refinement)
```python
response = client.chat.completions.create(
    model="spl-gpt-4o",
    messages=[
        {"role": "user", "content": "Your problem here"}
    ],
    extra_body={"spl_learning": True}  # enable learning mode
)
```
Combining with Other Techniques
```python
# Combine SPL with other optillm techniques
response = client.chat.completions.create(
    model="spl&memory-gpt-4o",  # SPL + memory plugin
    messages=[...]
)
```
Key Benefits
- **Cumulative Learning**: The LLM improves on specific problem types over time
- **Transparent Knowledge**: Strategies are human-readable and provide insight into the model's reasoning
- **Efficiency**: Successful approaches are reused rather than re-derived for every problem
- **Adaptability**: Different strategies for different problem types
- **Inspectable**: The learning process and its outcomes can be examined and understood
Implementation Details
The complete implementation is available in the optillm repository. Key components include:
- Strategy Database: JSON-based persistent storage
- Problem Classifier: Automatic query categorization
- Strategy Generator: LLM-powered strategy creation
- Effectiveness Evaluator: Post-completion strategy assessment
- Strategy Refiner: Continuous improvement of existing strategies
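Because the store is plain JSON, persistence is a straightforward round trip. The record fields below are assumptions for illustration; inspect the files the plugin actually writes for the real schema:

```python
import json

# Hypothetical strategy record -- field names are illustrative, not the
# plugin's actual schema.
record = {
    "strategy_id": 42,
    "problem_type": "word_problem",
    "strategy": "1. Understand  2. Plan and Translate  3. Solve  4. Verify",
    "attempts": 25,
    "successes": 21,
}

def save_strategies(strategies, path):
    """Persist the whole database as pretty-printed JSON."""
    with open(path, "w") as f:
        json.dump(strategies, f, indent=2)

def load_strategies(path):
    """Load the database back as a list of dicts."""
    with open(path) as f:
        return json.load(f)
```

Keeping the store as plain JSON is what makes the strategies auditable and shareable across deployments.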
Future Implications
System Prompt Learning opens exciting possibilities for AI development:
- Domain-Specific Expertise: Models that develop specialized knowledge in particular fields
- Collaborative Learning: Sharing strategy databases across different deployments
- Human-AI Collaboration: Allowing human experts to contribute and refine strategies
- Multimodal Strategies: Extending the approach beyond text to include visual and other modalities
Try It Today
Ready to give your LLM the ability to learn from experience?
- **GitHub Repository**: https://github.com/codelion/optillm
- **SPL Plugin**: https://github.com/codelion/optillm/tree/main/optillm/plugins/spl
- **Documentation**: Complete setup and usage guide in the repository
We believe System Prompt Learning represents a fundamental step toward more intelligent, adaptive AI systems. By enabling models to learn from their experiences in a transparent, interpretable way, we're moving closer to AI that truly improves over time.
What strategies will your LLM learn? Try SPL today and find out!
System Prompt Learning is implemented in optillm, an open-source project focused on optimizing LLM inference through state-of-the-art techniques. Join our community and help shape the future of adaptive AI systems.
Tags: #MachineLearning #AI #LLM #ProblemSolving #OpenSource #InferenceOptimization