# HuggingFace Clustering → ToGMAL Dynamic Tools Integration Strategy

**Date:** October 18, 2025
**Purpose:** Define how ML clustering on safety datasets informs ToGMAL's dynamic tool exposure
**Status:** Ready for Implementation

---

## Executive Summary

This document outlines the strategy for using **real clustering analysis** on HuggingFace safety datasets to automatically discover limitation patterns and expose them as dynamic MCP tools in ToGMAL.

### The Core Flow:

```
[HuggingFace Datasets] → [Embedding + Clustering] → [Dangerous Cluster Discovery]
                                                              ↓
                                                     [Pattern Extraction]
                                                              ↓
                                               [ToGMAL Dynamic Tool Generation]
                                                              ↓
                                                [Context-Aware Tool Exposure]
```

---

## 1. Current State Analysis

### What You Have (Existing Implementation)

#### A. Research Pipeline (`research_pipeline.py`)

- ✅ **Working:** Fetches 10 dataset sources
- ✅ **Working:** TF-IDF feature extraction
- ✅ **Working:** K-Means, DBSCAN clustering
- ✅ **Working:** Dangerous cluster identification (>70% harmful threshold)
- ✅ **Working:** Silhouette scoring (current: 0.25-0.26)

**Current Results:**
- 2-3 clusters identified
- Dangerous clusters: 71-100% harmful content
- Successfully differentiates harmful from benign

#### B. Dynamic Tools (`togmal/context_analyzer.py`, `togmal/ml_tools.py`)

- ✅ **Working:** Context analyzer with keyword matching
- ✅ **Working:** ML tools cache (`./data/ml_discovered_tools.json`)
- ✅ **Working:** Domain filtering for tool recommendations
- ⚠️ **Missing:** Connection from clustering results to tool cache

### What Files (2-4) Propose

#### C. Enhanced Dataset Fetcher (`research-datasets-fetcher.py`)

- 🆕 **Proposed:** Professional domain-specific datasets
- 🆕 **Proposed:** Real HuggingFace integration via `datasets` library
- 🆕 **Proposed:** Aqumen/ToGMAL data integration endpoints
- 🆕 **Proposed:** 10 professional domains with specific datasets

#### D. Enhanced Clustering Trainer (`research-training-clustering.py`)

- 🆕 **Proposed:** Sentence transformers for better embeddings
- 🆕 **Proposed:** Cluster quality analysis (purity, pattern description)
- 🆕 **Proposed:** Detection rule generation from clusters
- 🆕 **Proposed:** Visualization and model comparison

---

## 2. The Missing Link: Clustering → Dynamic Tools

### Current Gap

Your existing `research_pipeline.py` does clustering but:
- ❌ Doesn't use sentence transformers (uses TF-IDF)
- ❌ Doesn't export results in the format `ml_tools.py` expects
- ❌ Doesn't generate detection rules
- ❌ Doesn't map clusters to professional domains

### Proposed Solution

Create a new integration layer that:

1. **Runs enhanced clustering** with sentence transformers
2. **Analyzes dangerous clusters** for patterns
3. **Generates detection heuristics** from cluster characteristics
4. **Exports to ML tools cache** in the correct format
5. **Triggers ToGMAL reload** to expose new tools

---

## 3. Professional Domain Clustering Strategy

### The 10 Professional Domains

Based on the proposals in files (4), focus on domains where **LLMs demonstrably struggle**:

| Domain | Dataset Sources | Expected Cluster Behavior | ToGMAL Tool |
|--------|----------------|--------------------------|-------------|
| **Mathematics** | `hendrycks/math`, `competition_math`, `gsm8k` | LIMITATIONS cluster (LLM accuracy: 42% on MATH) | `check_math_complexity` |
| **Medicine** | `medqa`, `pubmedqa`, `truthful_qa` subset | LIMITATIONS cluster (LLM accuracy: 65% on MedQA) | `check_medical_advice` |
| **Law** | `pile-of-law`, legal case reports | LIMITATIONS cluster (jurisdiction-specific errors) | `check_legal_boundaries` |
| **Coding** | `code_x_glue_cc_defect_detection`, `humaneval`, `apps` | MIXED clusters (some code safe, some vulnerable) | `check_code_security` |
| **Finance** | `financial_phrasebank`, `finqa` | LIMITATIONS cluster (regulatory compliance) | `check_financial_advice` |
| **Translation** | `wmt14`, `opus-100` | HARMLESS cluster (LLM near-human performance) | (no tool needed) |
| **General QA** | `squad_v2`, `natural_questions` | HARMLESS cluster (LLM accuracy: 86% on MMLU) | (no tool needed) |
| **Summarization** | `cnn_dailymail`, `xsum` | HARMLESS cluster (high ROUGE scores) | (no tool needed) |
| **Creative Writing** | `TinyStories`, `writing_prompts` | HARMLESS cluster (subjective, no "wrong" answer) | (no tool needed) |
| **Therapy** | Mental health corpora (if available) | LIMITATIONS cluster (crisis intervention risks) | `check_therapy_boundaries` |

### Clustering Hypothesis

**LIMITATIONS Cluster:**
- Contains: Math, medicine, law, finance, coding bugs, therapy
- Characteristics: High reasoning complexity, domain expertise required, factual correctness critical
- Cluster purity: >70% harmful/failure examples
- Silhouette score: Aim for >0.4 (currently 0.25)

**HARMLESS Cluster:**
- Contains: Translation, summarization, general QA, creative writing
- Characteristics: Pattern matching, well-represented in training data, less critical if wrong
- Cluster purity: >70% safe/successful examples

**MIXED Cluster:**
- Contains: General coding, factual QA, educational content
- Needs further subdivision or context-dependent handling

---

## 4. Implementation Plan: Enhanced Clustering Pipeline

### Phase 1: Upgrade Clustering (Week 1-2)

#### Step 1.1: Install Dependencies

```bash
cd /Users/hetalksinmaths/togmal
source .venv/bin/activate
uv pip install sentence-transformers datasets scikit-learn matplotlib seaborn joblib
```

#### Step 1.2: Enhance `research_pipeline.py`

**Add sentence transformers instead of TF-IDF:**

```python
# Add to research_pipeline.py
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import StandardScaler


class FeatureExtractor:
    """Use sentence transformers for semantic embeddings"""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.scaler = StandardScaler()

    def fit_transform_prompts(self, prompts: List[str]) -> np.ndarray:
        """Extract semantic embeddings"""
        embeddings = self.model.encode(
            prompts,
            batch_size=32,
            show_progress_bar=True,
            convert_to_numpy=True,
        )
        # Standardize each embedding dimension before clustering
        return self.scaler.fit_transform(embeddings)
```

**Why sentence transformers?**
- Captures semantic similarity (not just keywords)
- Better cluster separation
- Expect silhouette score improvement: 0.25 → 0.4+

#### Step 1.3: Add Professional Domain Datasets

**Update DatasetFetcher to use HuggingFace `datasets` library:**

```python
from datasets import load_dataset


async def _fetch_huggingface_real(self, config: DatasetConfig) -> List[DatasetEntry]:
    """Actual HuggingFace integration"""
    # Note: load_dataset is a blocking call, so it will hold up the event loop
    # while downloading. trust_remote_code=True runs code shipped with the
    # dataset repository; enable it only for sources you trust.
    dataset = load_dataset(
        config.source_id,
        split=config.split,
        trust_remote_code=True,
    )

    entries = []
    for item in dataset:
        entries.append(DatasetEntry(
            id="",  # left empty here; ID assignment is assumed to happen elsewhere
            source=config.name,
            type=config.cluster_category,
            prompt=item.get(config.text_column, ""),
            category=config.domains[0] if config.domains else "unknown",
            is_harmful=(config.cluster_category == "limitations"),
            metadata={"dataset": config.source_id},
        ))

    return entries
```

**Priority datasets to fetch first:**

1. **Mathematics (LIMITATIONS)**
   - `hendrycks/math` - 12,500 competition-level problems
   - Use for detecting math complexity

2. **Medicine (LIMITATIONS)**
   - `medqa` - Medical licensing exam questions
   - Use for detecting medical advice boundaries

3. **Coding (MIXED)**
   - `code_x_glue_cc_defect_detection` - Buggy vs clean code
   - Use for detecting security vulnerabilities

4. **General QA (HARMLESS)**
   - `squad_v2` - Reading comprehension
   - Use as baseline "safe" cluster

### Phase 2: Extract Patterns from Clusters (Week 3)

#### Step 2.1: Add Cluster Analysis

**Enhance `AnomalyClusteringModel._identify_dangerous_clusters`:**

```python
def _identify_dangerous_clusters(
    self,
    cluster_labels: np.ndarray,
    entries: List[DatasetEntry]
) -> List[Dict[str, Any]]:
    """Identify dangerous clusters with pattern extraction"""
    dangerous_clusters = []

    for cluster_id in set(cluster_labels):
        if cluster_id == -1:  # Skip noise
            continue

        # Get cluster members
        mask = cluster_labels == cluster_id
        cluster_entries = [e for e, m in zip(entries, mask) if m]

        # Calculate purity
        harmful_count = sum(1 for e in cluster_entries if e.is_harmful)
        purity = harmful_count / len(cluster_entries)

        if purity < 0.7:  # Not dangerous enough
            continue

        # Extract pattern (purity is passed through for the description)
        pattern = self._extract_pattern_from_cluster(cluster_entries, purity)

        dangerous_clusters.append({
            "cluster_id": int(cluster_id),
            "size": len(cluster_entries),
            "purity": float(purity),
            "domain": pattern["domain"],
            "pattern_description": pattern["description"],
            "detection_rule": pattern["heuristic"],
            "examples": pattern["examples"]
        })

    return dangerous_clusters
```

#### Step 2.2: Pattern Extraction Logic

**Add pattern extraction method:**

```python
# Module-level imports needed by this method
import re
from collections import Counter


def _extract_pattern_from_cluster(
    self,
    entries: List[DatasetEntry],
    purity: float
) -> Dict[str, Any]:
    """Extract actionable pattern from cluster members"""
    # Determine primary domain
    domain_counts = Counter(e.category for e in entries)
    primary_domain = domain_counts.most_common(1)[0][0]

    # Extract common keywords (for detection heuristic)
    all_prompts = " ".join(e.prompt for e in entries if e.prompt)
    words = re.findall(r'\b[a-z]{4,}\b', all_prompts.lower())
    top_keywords = [w for w, _ in Counter(words).most_common(10)]

    # Generate detection rule
    if primary_domain == "mathematics":
        heuristic = "contains_math_symbols OR complexity > threshold"
    elif primary_domain == "medicine":
        heuristic = f"contains_medical_keywords: {', '.join(top_keywords[:5])}"
    else:
        heuristic = f"keyword_match: {', '.join(top_keywords[:5])}"

    # Get representative examples
    examples = [e.prompt for e in entries[:5] if e.prompt]

    # Generate description
    description = f"{primary_domain.title()} limitation pattern (cluster purity: {purity:.1%})"

    return {
        "domain": primary_domain,
        "description": description,
        "heuristic": heuristic,
        "examples": examples,
        "keywords": top_keywords
    }
```

### Phase 3: Export to ML Tools Cache (Week 3-4)

#### Step 3.1: Update Pipeline to Export

**Add export method to `ResearchPipeline`:**

```python
# Module-level imports needed by this method
import json
from datetime import datetime
from pathlib import Path


def export_to_togmal_ml_tools(self, training_results: Dict[str, Any]):
    """Export dangerous clusters as ToGMAL dynamic tools"""
    patterns = []

    for model_type, result in training_results.items():
        for cluster in result.get("dangerous_clusters", []):
            pattern = {
                "id": f"{model_type}_{cluster['cluster_id']}",
                "domain": cluster["domain"],
                "description": cluster["pattern_description"],
                "confidence": cluster["purity"],
                "heuristic": cluster["detection_rule"],
                "examples": cluster["examples"],
                "metadata": {
                    "cluster_size": cluster["size"],
                    "model_type": model_type,
                    "discovered_at": datetime.now().isoformat()
                }
            }
            patterns.append(pattern)

    # Save to ML tools cache (format expected by ml_tools.py)
    ml_tools_cache = {
        "updated_at": datetime.now().isoformat(),
        "patterns": patterns,
        "metadata": {
            "total_patterns": len(patterns),
            # Sorted so the file is stable between runs
            "domains": sorted(set(p["domain"] for p in patterns))
        }
    }

    cache_path = Path("./data/ml_discovered_tools.json")
    cache_path.parent.mkdir(parents=True, exist_ok=True)

    with open(cache_path, 'w') as f:
        json.dump(ml_tools_cache, f, indent=2)

    print(f"✓ Exported {len(patterns)} patterns to {cache_path}")
```

#### Step 3.2: Update `togmal_mcp.py` to Use Patterns

**Modify existing `togmal_list_tools_dynamic` to load ML patterns:**

```python
@mcp.tool()
async def togmal_list_tools_dynamic(
    conversation_history: Optional[List[Dict[str, str]]] = None,
    user_context: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """
    Returns dynamically recommended tools based on conversation context

    ENHANCED: Now includes ML-discovered limitation patterns
    """
    # Existing domain detection
    domains = await analyze_conversation_context(conversation_history, user_context)

    # Load ML-discovered tools (NEW)
    ml_tools = await get_ml_discovered_tools(
        relevant_domains=domains,
        min_confidence=0.8  # Only high-confidence patterns
    )

    # Combine with static tools
    recommended_tools = [
        "togmal_analyze_prompt",
        "togmal_analyze_response",
        "togmal_submit_evidence"
    ]

    # Add domain-specific static tools
    if "mathematics" in domains or "physics" in domains:
        recommended_tools.append("togmal_check_math_complexity")
    if "medicine" in domains or "healthcare" in domains:
        recommended_tools.append("togmal_check_medical_advice")
    if "file_system" in domains:
        recommended_tools.append("togmal_check_file_operations")

    # Add ML-discovered tools (DYNAMIC)
    ml_tool_names = [tool["name"] for tool in ml_tools]
    recommended_tools.extend(ml_tool_names)

    return {
        "recommended_tools": recommended_tools,
        "detected_domains": domains,
        "ml_discovered_tools": ml_tools,  # Full definitions
        "context": {
            "conversation_depth": len(conversation_history) if conversation_history else 0,
            "has_user_context": bool(user_context)
        }
    }
```

---

## 5. Expected Improvements

### Clustering Quality

**Current (TF-IDF + K-Means):**
- Silhouette score: 0.25-0.26
- Clusters: 2-3
- Dangerous clusters: Identified, but low separation

**Expected (Sentence Transformers + K-Means/DBSCAN):**
- Silhouette score: 0.4-0.6 (✅ 60-140% improvement)
- Clusters: 3-5 meaningful clusters
- Dangerous clusters: Better defined with clear boundaries

**Why?**
- Sentence transformers capture semantic meaning
- TF-IDF only captures word overlap
- Example: "What's the integral of x²" vs "Solve this calculus problem" → same cluster with ST, different with TF-IDF

### Dynamic Tool Exposure

**Before:**
- 5 static tools always available
- Manual keyword matching for domain detection

**After:**
- 5 static tools + N ML-discovered tools (N = # dangerous clusters)
- Automatic tool exposure based on real clustering
- Example: Cluster discovers "complex math word problems" → new tool `check_math_word_problem_complexity`

### Coverage of Professional Domains

**Before:**
- Generic "math", "medical", "file operations"
- No fine-grained domain understanding

**After:**
- 10 professional domains with dataset-backed clustering
- Sub-domain detection (e.g., "cardiology" vs "psychiatry" within medicine)
- Evidence-based: Each tool backed by a cluster of real failure examples

---

## 6. Integration with Aqumen (Future)

### Bidirectional Feedback Loop

```
[ToGMAL Clustering] → Discovers "law" limitation cluster
        ↓
[ToGMAL ML Tools] → Exposes check_legal_boundaries
        ↓
[Aqumen Error Catalog] ← Imports "law" failures from ToGMAL
        ↓
[Aqumen Assessments] → Tests users on legal reasoning
        ↓
[Assessment Failures] → Reported back to ToGMAL
        ↓
[ToGMAL Re-Clustering] → Refines "law" cluster with new data
```

**Not implementing yet** (per your request), but the architecture is ready when needed.

---

## 7. Action Items (Next 2 Weeks)

### Week 1: Enhanced Clustering

**Day 1-2: Setup**
- [ ] Install dependencies: `sentence-transformers`, `datasets`, visualization libs
- [ ] Copy `research-datasets-fetcher.py` and `research-training-clustering.py` to workspace
- [ ] Integrate with existing `research_pipeline.py`

**Day 3-5: Dataset Fetching**
- [ ] Implement real HuggingFace dataset loading
- [ ] Fetch 4 priority datasets:
  - `hendrycks/math` (mathematics)
  - `medqa` (medicine)
  - `code_x_glue_cc_defect_detection` (coding)
  - `squad_v2` (general QA as baseline)
- [ ] Verify dataset cache works

**Day 6-7: Clustering with Sentence Transformers**
- [ ] Replace TF-IDF with sentence transformers in `FeatureExtractor`
- [ ] Run clustering on fetched datasets
- [ ] Verify silhouette score improvement (target: >0.4)

### Week 2: Pattern Extraction & Tool Generation

**Day 8-10: Pattern Extraction**
- [ ] Implement `_extract_pattern_from_cluster` method
- [ ] Generate detection heuristics from clusters
- [ ] Visualize clusters (PCA 2D projection)

**Day 11-12: Export to ML Tools**
- [ ] Implement `export_to_togmal_ml_tools` in pipeline
- [ ] Run full pipeline and generate `ml_discovered_tools.json`
- [ ] Verify format matches what `ml_tools.py` expects

**Day 13-14: Testing & Validation**
- [ ] Test `togmal_list_tools_dynamic` with ML tools
- [ ] Verify context analyzer correctly triggers ML tools
- [ ] Run end-to-end test: conversation → domain detection → ML tool exposure

---

## 8. Success Metrics

### Technical Metrics

| Metric | Current | Target | How to Measure |
|--------|---------|--------|----------------|
| Silhouette Score | 0.25-0.26 | >0.4 | `sklearn.metrics.silhouette_score` |
| Dangerous Cluster Purity | 71-100% | >80% | % harmful in cluster |
| # Detected Domains | 0 (manual) | 5-10 | Count from clustering |
| ML Tools Generated | 0 | 5-10 | Count in `ml_discovered_tools.json` |
| Tool Precision | N/A | >85% | Manual review of triggered tools |

### Functional Metrics

- [ ] Can differentiate "math limitations" from "general QA" clusters
- [ ] Can automatically expose `check_math_complexity` when conversation contains math
- [ ] Can generate heuristic rules that are interpretable (not just "cluster 3")
- [ ] Visualization shows clear cluster separation

---

## 9. Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| **Sentence transformer slower than TF-IDF** | High | Cache embeddings, use batch processing |
| **Silhouette score doesn't improve** | High | Try different embedding models (mpnet, distilbert) |
| **HuggingFace datasets too large** | Medium | Sample datasets (max 5000 entries each) |
| **Clusters don't align with domains** | High | Add domain labels to training data, use semi-supervised clustering |
| **ML tools not useful in practice** | Medium | Start with high confidence threshold (0.8+), iterate |

---

## 10. File Structure After Implementation

```
/Users/hetalksinmaths/togmal/
├── research_pipeline.py (ENHANCED)
│   ├── FeatureExtractor with sentence transformers ✅
│   ├── Pattern extraction from clusters ✅
│   └── Export to ML tools cache ✅
│
├── togmal/
│   ├── context_analyzer.py (EXISTING - works as-is)
│   ├── ml_tools.py (EXISTING - works as-is)
│   └── config.py (EXISTING)
│
├── data/
│   ├── datasets/ (NEW)
│   │   ├── combined_dataset.csv
│   │   └── [domain]_[dataset].csv
│   │
│   ├── cache/ (EXISTING)
│   │   └── [source].json
│   │
│   └── ml_discovered_tools.json (GENERATED by pipeline)
│
├── models/ (NEW)
│   ├── clustering/
│   │   ├── kmeans_model.pkl
│   │   ├── embeddings_cache.npy
│   │   └── training_results.json
│   └── visualization/
│       └── clusters_2d.png
│
└── CLUSTERING_TO_DYNAMIC_TOOLS_STRATEGY.md (THIS FILE)
```

---

## 11. Next Steps After This Implementation

### Phase 4: Aqumen Integration (When Ready)

1. Export ToGMAL clustering results to Aqumen error catalogs
2. Import Aqumen assessment failures back into ToGMAL
3. Re-train clustering with combined data

### Phase 5: Continuous Improvement

1. Weekly automated re-training on new data
2. A/B testing of ML tools vs static tools
3. User feedback loop to improve heuristics

### Phase 6: Grant Preparation

1. Publish clustering results as research artifact
2. Use improved metrics (silhouette 0.4+) in grant proposal
3. Demonstrate concrete improvements over baseline

---

## Conclusion

**What This Gets You:**

1. ✅ **Real clustering** on professional domain datasets
2. ✅ **Better separation** between limitations and harmless clusters
3. ✅ **Automatic tool generation** from clustering results
4. ✅ **Evidence-backed** limitation detection (not just heuristics)
5. ✅ **Scalable architecture** ready for Aqumen integration

**What This Doesn't Do (Yet):**
- ❌ Aqumen bidirectional integration (Phase 4)
- ❌ Production deployment (focus on research validation)
- ❌ Comprehensive grant proposal (focus on technical foundation)

**Recommended Focus:**
Start with the **Week 1-2 action items** to prove the clustering approach works, then decide between Aqumen integration and grant preparation.

---

**Ready to proceed?** Let me know if you want me to:

1. Start implementing the enhanced clustering pipeline
2. Create a test harness for validating clusters
3. Build the export-to-ML-tools integration
4. Something else?
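
For option 2, a test harness could begin with a structural check on the exported cache. The following is a minimal sketch, assuming the cache has the layout written by `export_to_togmal_ml_tools` in Step 3.1 (top-level `updated_at`, `patterns`, `metadata`; per-pattern `id`, `domain`, `description`, `confidence`, `heuristic`, `examples`). The helper name `validate_ml_tools_cache` is hypothetical and not part of the existing codebase, and what `ml_tools.py` actually requires should be confirmed against that module.

```python
from typing import Any, Dict, List

# Keys taken from the export format sketched in Step 3.1 (an assumption about
# what ml_tools.py reads); adjust if the real loader expects something else.
REQUIRED_TOP_LEVEL = ("updated_at", "patterns", "metadata")
REQUIRED_PATTERN_KEYS = (
    "id", "domain", "description", "confidence", "heuristic", "examples",
)


def validate_ml_tools_cache(cache: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means no problems."""
    problems: List[str] = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in cache:
            problems.append(f"missing top-level key: {key}")

    patterns = cache.get("patterns", [])
    if not isinstance(patterns, list):
        problems.append("'patterns' must be a list")
        return problems

    seen_ids = set()
    for i, pattern in enumerate(patterns):
        if not isinstance(pattern, dict):
            problems.append(f"pattern {i}: not an object")
            continue

        for key in REQUIRED_PATTERN_KEYS:
            if key not in pattern:
                problems.append(f"pattern {i}: missing key '{key}'")

        # Confidence is exported from cluster purity, so it should lie in [0, 1].
        if "confidence" in pattern:
            value = pattern["confidence"]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0.0 <= value <= 1.0:
                problems.append(
                    f"pattern {i}: confidence must be a number between 0 and 1"
                )

        # IDs are built as "<model_type>_<cluster_id>" and should be unique.
        if "id" in pattern:
            if pattern["id"] in seen_ids:
                problems.append(f"pattern {i}: duplicate id {pattern['id']!r}")
            seen_ids.add(pattern["id"])

    return problems
```

In a harness this could be run on the result of `json.load` over `./data/ml_discovered_tools.json` after each pipeline run, failing the run if the returned list is non-empty. Returning a list of problems rather than raising on the first one makes it easier to see every issue in a freshly generated cache at once.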