# 🤖 ToGMAL Chat Demo with MCP Tools

An interactive chat interface where a free LLM (Mistral-7B) can call MCP tools to provide informed responses about prompt difficulty and safety analysis.

## ✨ Features

### 🧠 **Intelligent Assistant**

- Powered by **Mistral-7B-Instruct-v0.2** (free via HuggingFace Inference API)
- Natural conversation about prompt analysis
- Context-aware responses
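The free Inference API can be reached with the `huggingface_hub` client. Below is a minimal sketch of such a call; the model ID, prompt template, and generation parameters are illustrative, and the actual wiring lives in `chat_app.py`:

```python
# Minimal sketch of querying Mistral-7B via the free Inference API.
# Prompt template and generation parameters here are illustrative only.
from huggingface_hub import InferenceClient

client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2")
reply = client.text_generation(
    "<s>[INST] Analyze the difficulty of this prompt: What is 2 + 2? [/INST]",
    max_new_tokens=256,
    temperature=0.7,
)
print(reply)
```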
### 🛠️ **MCP Tool Integration**

The LLM can dynamically call these tools (a simplified sketch of the safety heuristics follows the list):

1. **`check_prompt_difficulty`**
   - Analyzes prompt difficulty using vector similarity to 32K+ benchmark questions
   - Returns risk level, success rates, and similar benchmark questions
   - Helps users understand if their prompt is within LLM capabilities

2. **`analyze_prompt_safety`**
   - Heuristic-based safety analysis
   - Detects dangerous operations, medical advice requests, and unrealistic coding tasks
   - Provides risk assessment and recommendations
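The safety tool is described as heuristic pattern matching. As a self-contained illustration of that idea only (the actual rules and result fields in `chat_app.py` may differ), a simplified checker might look like this:

```python
import re
from typing import Dict

# Simplified illustration of heuristic safety matching; the real rules
# and result fields in chat_app.py may differ.
DANGEROUS_PATTERNS = [
    (r"\b(delete|rm|erase|wipe)\b.*\b(all|every|\*)\b", "dangerous file operation"),
    (r"\b(diagnose|prescri\w+|dosage)\b", "medical advice request"),
]

def analyze_prompt_safety(prompt: str) -> Dict:
    issues = [label for pattern, label in DANGEROUS_PATTERNS
              if re.search(pattern, prompt, re.IGNORECASE)]
    risk = "HIGH" if issues else "LOW"
    return {"risk_level": risk, "issues": issues,
            "recommendation": "Proceed with caution" if issues else "OK"}

print(analyze_prompt_safety("Write a script to delete all my files"))
# -> {'risk_level': 'HIGH', 'issues': ['dangerous file operation'], ...}
```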
### 🔄 **How It Works**

```mermaid
graph LR
    A[User Message] --> B[LLM]
    B --> C{Needs Tool?}
    C -->|Yes| D[Call MCP Tool]
    C -->|No| E[Direct Response]
    D --> F[Tool Result]
    F --> B
    B --> E
    E --> G[Display to User]
```

1. User sends a message
2. LLM decides if it needs to call a tool
3. If yes, the tool is executed and its results are returned to the LLM
4. LLM formulates the final response using the tool data
5. Response is shown to the user with transparent tool call info
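The decision loop above can be outlined in a few lines. This sketch assumes the LLM is prompted to emit a JSON tool call such as `{"tool": ..., "arguments": ...}` when it wants one, and reuses the `call_llm_with_tools`, `AVAILABLE_TOOLS`, and `execute_tool` names from this README; the parsing in `chat_app.py` may differ:

```python
import json

# Outline of the decision loop, assuming the LLM emits a JSON tool call
# when it wants one; chat_app.py may implement this differently.
def respond(user_message: str, history: list) -> str:
    messages = history + [{"role": "user", "content": user_message}]
    llm_output = call_llm_with_tools(messages, AVAILABLE_TOOLS)

    try:
        call = json.loads(llm_output)  # the LLM chose to call a tool
        result = execute_tool(call["tool"], call.get("arguments", {}))
        messages.append({"role": "tool", "content": json.dumps(result)})
        final = call_llm_with_tools(messages, AVAILABLE_TOOLS)
        return f"🛠️ *Calling tool: `{call['tool']}`*\n\n{final}"
    except (json.JSONDecodeError, TypeError, KeyError):
        return llm_output  # direct response, no tool needed
```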
## 🚀 Quick Start

### Local Development

```bash
cd Togmal-demo

# Install dependencies
pip install -r requirements.txt

# Run the chat demo
python chat_app.py
```

Open http://localhost:7860 in your browser.
### Deploy to HuggingFace Spaces

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Gradio" as SDK

2. **Upload files:**

   ```bash
   # Clone your Space repo
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME

   # Copy demo files
   cp ../Togmal-demo/chat_app.py app.py
   cp ../Togmal-demo/requirements.txt .
   cp ../Togmal-demo/benchmark_vector_db.py .

   # Copy data directory
   cp -r ../Togmal-demo/data .

   # Commit and push
   git add .
   git commit -m "Deploy ToGMAL Chat Demo"
   git push
   ```

3. **Your demo will be live at:**
   `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
## 💬 Example Conversations

### Example 1: Difficulty Analysis

**User:**

> How difficult is this prompt: "Calculate the quantum correction to the partition function for a 3D harmonic oscillator"?

**Assistant:**

> 🛠️ *Calling tool: `check_prompt_difficulty`*
>
> Based on my analysis of similar benchmark questions:
>
> **Difficulty Level:** HIGH
> **Success Rate:** 45.2%
> **Similarity to benchmarks:** 0.847
>
> **Recommendation:** This is a challenging physics problem. Current LLMs succeed only about 45% of the time on similar questions. Consider:
> - Breaking it into smaller steps
> - Verifying the response with domain experts
> - Using specialized physics tools

### Example 2: Safety Check

**User:**

> Is this safe: "Write a script to delete all my files"?

**Assistant:**

> 🛠️ *Calling tool: `analyze_prompt_safety`*
>
> **Safety Analysis:**
>
> **Risk Level:** HIGH
> **Issues Found:** 1
>
> • Detected potentially dangerous file operation
>
> **Recommendation:** Proceed with caution
>
> This prompt requests destructive file operations. I strongly recommend:
> - Being very specific about which files to delete
> - Using safe deletion (move to trash instead of permanent delete)
> - Testing on a backup first

### Example 3: Simple Question

**User:**

> What is 2 + 2?

**Assistant:**

> 🛠️ *Calling tool: `check_prompt_difficulty`*
>
> This is a very simple arithmetic question!
>
> **Difficulty Level:** LOW
> **Success Rate:** 99.8%
>
> Current LLMs handle this type of question extremely well. The answer is **4**.
## 🏗️ Architecture

### Components

```
chat_app.py
├── LLM Backend (HuggingFace Inference API)
│   ├── Mistral-7B-Instruct-v0.2
│   └── Tool calling via prompt engineering
│
├── MCP Tools (Local Implementation)
│   ├── check_prompt_difficulty()
│   │   └── Uses BenchmarkVectorDB
│   └── analyze_prompt_safety()
│       └── Heuristic pattern matching
│
└── Gradio Interface
    ├── Chat component
    └── Tool call visualization
```
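The Gradio layer can be as small as a single chat component. The sketch below uses `gr.ChatInterface` with a hypothetical `chat_with_tools` helper standing in for the LLM + tool pipeline; the real interface in `chat_app.py` additionally visualizes tool calls:

```python
import gradio as gr

def chat_with_tools(message: str, history: list) -> str:
    # Hypothetical stand-in for the LLM + MCP tool pipeline described above.
    return f"(analysis of: {message})"

# Minimal chat UI; chat_app.py builds a richer interface on this idea.
demo = gr.ChatInterface(fn=chat_with_tools, title="ToGMAL Chat Demo")

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```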
### Why This Approach?

1. **No API Keys Required** - Uses HuggingFace's free Inference API
2. **Transparent Tool Calls** - Users see exactly what tools are called and their results
3. **Graceful Degradation** - Falls back to pattern matching if the API is unavailable (see the sketch after this list)
4. **Privacy-Preserving** - All analysis happens locally/deterministically
5. **Free to Deploy** - Works on HuggingFace Spaces free tier
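A rough sketch of the fallback idea in point 3, reusing the `call_llm_with_tools` and `execute_tool` names from this README; the actual error handling in `chat_app.py` may differ:

```python
# Sketch of graceful degradation: if the hosted LLM is unreachable,
# answer from the local, deterministic heuristics instead.
def call_llm_or_fallback(messages: list, tools: list) -> str:
    try:
        return call_llm_with_tools(messages, tools)
    except Exception:
        prompt = messages[-1]["content"]
        result = execute_tool("analyze_prompt_safety", {"prompt": prompt})
        return f"(offline fallback) Safety analysis: {result}"
```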
## 🎯 Use Cases

### For Developers

- **Test prompt quality** before sending to expensive LLM APIs
- **Identify edge cases** that might fail
- **Safety checks** before production deployment

### For Researchers

- **Analyze dataset difficulty** by checking sample questions
- **Compare benchmark similarity** across different datasets
- **Study LLM limitations** systematically

### For End Users

- **Understand if a task is suitable** for an LLM
- **Get recommendations** for improving prompts
- **Avoid unsafe operations** flagged by analysis
## 🔧 Customization

### Add New Tools

Edit `chat_app.py` and add your tool:

```python
from typing import Dict

def tool_my_custom_check(prompt: str) -> Dict:
    """Your custom analysis."""
    return {
        "result": "analysis result",
        "confidence": 0.95
    }

# Add to AVAILABLE_TOOLS
AVAILABLE_TOOLS.append({
    "name": "my_custom_check",
    "description": "What this tool does",
    "parameters": {"prompt": "The prompt to analyze"}
})

# Add a branch to execute_tool()
def execute_tool(tool_name: str, arguments: Dict) -> Dict:
    if tool_name == "check_prompt_difficulty":
        ...  # existing tools
    elif tool_name == "my_custom_check":
        return tool_my_custom_check(arguments.get("prompt", ""))
```
### Use a Different LLM

Replace the `call_llm_with_tools()` function to use:

- **OpenAI GPT** (requires API key)
- **Anthropic Claude** (requires API key)
- **Local Ollama** (free, runs locally)
- **Any other HuggingFace model**

Example for Ollama:

```python
import requests

def call_llm_with_tools(messages, available_tools):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": format_prompt(messages),
            "stream": False
        }
    )
    # Ollama returns the generated text in the "response" field
    return response.json()["response"]
```
## 📊 Performance

- **Response Time:** 2-5 seconds (depending on HuggingFace API load)
- **Tool Execution:** < 1 second (local vector DB lookup)
- **Memory Usage:** ~2 GB (for vector database + model embeddings)
- **Throughput:** Handles 10-20 requests/minute on free tier
## 🐛 Troubleshooting

### "Database not initialized" error

The vector database needs to be downloaded on first run. Wait 1-2 minutes and try again.

### "HuggingFace API unavailable" error

The demo falls back to pattern matching. Responses will be simpler but still functional.

### Tool not being called

The LLM might not recognize that a tool is needed. Try being more explicit:

- ❌ "Is this hard?"
- ✅ "Analyze the difficulty of this prompt: [prompt]"
## 🚀 Next Steps

1. **Add more tools** - Context analyzer, ML pattern detection
2. **Better LLM** - Use larger models or fine-tune for tool calling
3. **Persistent chat** - Save conversation history
4. **Multi-turn tool calls** - Allow LLM to call multiple tools in sequence
5. **Custom tool definitions** - Let users define their own analysis tools
## 📄 License

Same as the main ToGMAL project.

## 🙏 Credits

- **Mistral AI** for Mistral-7B-Instruct
- **HuggingFace** for the free Inference API
- **Gradio** for the chat interface
- **ChromaDB** for the vector database