# LLMPromptKit: LLM Prompt Management System
LLMPromptKit is a comprehensive library for managing, versioning, testing, and evaluating prompts for Large Language Models (LLMs). It provides a structured framework to help data scientists and developers create, optimize, and maintain high-quality prompts.
## Features
- **Prompt Management**: Create, update, and organize prompts with metadata and tags
- **Version Control**: Track prompt changes over time with full version history
- **A/B Testing**: Compare different prompt variations to find the most effective one
- **Evaluation Framework**: Measure prompt quality with customizable metrics
- **Advanced Templating**: Create dynamic prompts with variables, conditionals, and loops
- **Command-line Interface**: Easily integrate prompt management into your workflow
- **Hugging Face Integration**: Seamlessly test prompts with thousands of open-source models
## Hugging Face Integration
LLMPromptKit includes a powerful integration with Hugging Face models, allowing you to:
- Test prompts with thousands of open-source models
- Run evaluations with models like FLAN-T5, GPT-2, and others
- Compare prompt performance across different model architectures
- Access specialized models for tasks like translation, summarization, and question answering
```python
import asyncio

from llmpromptkit import PromptManager, PromptTesting
from llmpromptkit.integrations.huggingface import get_huggingface_callback

# Initialize components
prompt_manager = PromptManager()
testing = PromptTesting(prompt_manager)

# Get a Hugging Face callback
hf_callback = get_huggingface_callback(
    model_name="google/flan-t5-base",
    task="text2text-generation"
)

# Run tests with the model
test_results = asyncio.run(
    testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=hf_callback)
)
```
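To compare prompt performance across model architectures, you can create one callback per model and run the same test cases against each. The sketch below reuses only the calls shown above; the model names and `task` values are illustrative choices, not a prescribed configuration.

```python
# One callback per model architecture to compare
callbacks = {
    "google/flan-t5-base": get_huggingface_callback(
        model_name="google/flan-t5-base", task="text2text-generation"
    ),
    "gpt2": get_huggingface_callback(
        model_name="gpt2", task="text-generation"
    ),
}

# Run the same test cases against each model and collect the results per model
results = {
    name: asyncio.run(
        testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=cb)
    )
    for name, cb in callbacks.items()
}
```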
## Documentation
For detailed documentation, see the `docs` directory.
## Installation

```bash
pip install llmpromptkit
```
## Quick Start
```python
import asyncio

from llmpromptkit import PromptManager, VersionControl, PromptTesting, Evaluator

# Initialize components
prompt_manager = PromptManager()
version_control = VersionControl(prompt_manager)
testing = PromptTesting(prompt_manager)
evaluator = Evaluator(prompt_manager)

# Create a prompt
prompt = prompt_manager.create(
    content="Summarize the following text: {text}",
    name="Simple Summarization",
    description="A simple prompt for text summarization",
    tags=["summarization", "basic"]
)

# Create a new version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Initial version"
)

# Update the prompt
prompt_manager.update(
    prompt.id,
    content="Please provide a concise summary of the following text in 2-3 sentences: {text}"
)

# Commit the updated version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Improved prompt with length guidance"
)

# Create a test case
test_case = testing.create_test_case(
    prompt_id=prompt.id,
    input_vars={"text": "Lorem ipsum dolor sit amet..."},
    expected_output="This is a summary of the text."
)

# Define an LLM callback for testing
async def llm_callback(prompt, vars):
    # In a real scenario, this would call an actual LLM API
    return "This is a summary of the text."

# Run the test case
test_result = asyncio.run(testing.run_test_case(
    test_case_id=test_case.id,
    llm_callback=llm_callback
))

# Evaluate a prompt with multiple inputs
evaluation_result = asyncio.run(evaluator.evaluate_prompt(
    prompt_id=prompt.id,
    inputs=[{"text": "Sample text 1"}, {"text": "Sample text 2"}],
    llm_callback=llm_callback
))

print(f"Evaluation metrics: {evaluation_result['aggregated_metrics']}")
```
## Command-line Interface
LLMPromptKit comes with a powerful CLI for managing prompts:
```bash
# Create a prompt
llmpromptkit prompt create "Summarization" --content "Summarize: {text}" --tags "summarization,basic"

# List all prompts
llmpromptkit prompt list

# Create a new version
llmpromptkit version commit <prompt_id> --message "Updated prompt"

# Run tests
llmpromptkit test run-all <prompt_id> --llm openai
```
## Advanced Usage

### Advanced Templating
LLMPromptKit supports advanced templating with conditionals and loops:
```python
from llmpromptkit import PromptTemplate

template = PromptTemplate("""
{system_message}
{for example in examples}
Input: {example.input}
Output: {example.output}
{endfor}
Input: {input}
Output:
""")

rendered = template.render(
    system_message="You are a helpful assistant.",
    examples=[
        {"input": "Hello", "output": "Hi there!"},
        {"input": "How are you?", "output": "I'm doing well, thanks!"}
    ],
    input="What's the weather like?"
)
```
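With those arguments, the rendered prompt would look roughly like the following (the exact whitespace depends on how the template engine handles the loop markers):

```text
You are a helpful assistant.
Input: Hello
Output: Hi there!
Input: How are you?
Output: I'm doing well, thanks!
Input: What's the weather like?
Output:
```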
### Custom Evaluation Metrics
Create custom metrics to evaluate prompt performance:
```python
from llmpromptkit import EvaluationMetric, Evaluator

class CustomMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("custom_metric", "My custom evaluation metric")

    def compute(self, generated_output, expected_output=None, **kwargs):
        # Custom logic to score the output; must return a float between 0 and 1
        score = 1.0 if generated_output == expected_output else 0.0
        return score

# Register the custom metric
evaluator = Evaluator(prompt_manager)
evaluator.register_metric(CustomMetric())
```
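For instance, a simple word-overlap metric can be built on the same interface. This is a sketch that assumes only the `EvaluationMetric` constructor and `compute` signature shown above; the metric name and scoring logic are illustrative.

```python
class WordOverlapMetric(EvaluationMetric):
    """Scores outputs by the fraction of expected words they contain."""

    def __init__(self):
        super().__init__("word_overlap", "Fraction of expected words present in the output")

    def compute(self, generated_output, expected_output=None, **kwargs):
        if not expected_output:
            return 0.0
        expected_words = set(expected_output.lower().split())
        generated_words = set(generated_output.lower().split())
        return len(expected_words & generated_words) / len(expected_words)

evaluator.register_metric(WordOverlapMetric())
```

Once registered, the metric's scores should appear alongside the built-in metrics in `evaluation_result['aggregated_metrics']` when you run `evaluate_prompt`.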
## Use Cases

- **Prompt Development**: Iteratively develop and refine prompts with version control
- **Prompt Optimization**: A/B test different prompt variations to find the most effective approach
- **Quality Assurance**: Ensure prompt quality with automated testing and evaluation
- **Team Collaboration**: Share and collaborate on prompts with a centralized management system
- **Production Deployment**: Maintain consistent prompt quality in production applications
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Author
Biswanath Roul - [GitHub](https://github.com/biswanathroul)