
HuggingFace Clustering → ToGMAL Dynamic Tools Integration Strategy

Date: October 18, 2025
Purpose: Define how ML clustering on safety datasets informs ToGMAL's dynamic tool exposure
Status: Ready for Implementation


Executive Summary

This document outlines the strategy for using real clustering analysis on HuggingFace safety datasets to automatically discover limitation patterns and expose them as dynamic MCP tools in ToGMAL.

The Core Flow:

[HuggingFace Datasets] → [Embedding + Clustering] → [Dangerous Cluster Discovery]
                                                            ↓
                                                    [Pattern Extraction]
                                                            ↓
                                              [ToGMAL Dynamic Tool Generation]
                                                            ↓
                                                [Context-Aware Tool Exposure]

1. Current State Analysis

What You Have (Existing Implementation)

A. Research Pipeline (research_pipeline.py)

✅ Working: Fetches 10 dataset sources
✅ Working: TF-IDF feature extraction
✅ Working: K-Means, DBSCAN clustering
✅ Working: Dangerous cluster identification (>70% harmful threshold)
✅ Working: Silhouette scoring (current: 0.25-0.26)

Current Results:

  • 2-3 clusters identified
  • Dangerous clusters: 71-100% harmful content
  • Successfully differentiates harmful from benign
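To make the silhouette figures above concrete, here is a minimal pure-Python sketch of the metric's definition. It is for illustration only: `silhouette_score_simple` is a hypothetical helper, not part of research_pipeline.py, and in practice `sklearn.metrics.silhouette_score` is the sensible choice.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score_simple(
    points: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean silhouette coefficient over all points (O(n^2), illustration only)."""
    # Group point indices by cluster label
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score is defined as 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated groups score close to 1
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score_simple(points, [0, 0, 1, 1])
# The same points with a labeling that cuts across the groups score below 0
bad = silhouette_score_simple(points, [0, 1, 0, 1])
```

Scores range from -1 to 1, so the current 0.25-0.26 indicates clusters that overlap substantially.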

B. Dynamic Tools (togmal/context_analyzer.py, togmal/ml_tools.py)

✅ Working: Context analyzer with keyword matching
✅ Working: ML tools cache (./data/ml_discovered_tools.json)
✅ Working: Domain filtering for tool recommendations
⚠️ Missing: Connection from clustering results to tool cache

What Files (2-4) Propose

C. Enhanced Dataset Fetcher (research-datasets-fetcher.py)

🆕 Proposed: Professional domain-specific datasets
🆕 Proposed: Real HuggingFace integration via datasets library
🆕 Proposed: Aqumen/ToGMAL data integration endpoints
🆕 Proposed: 10 professional domains with specific datasets

D. Enhanced Clustering Trainer (research-training-clustering.py)

🆕 Proposed: Sentence transformers for better embeddings
🆕 Proposed: Cluster quality analysis (purity, pattern description)
🆕 Proposed: Detection rule generation from clusters
🆕 Proposed: Visualization and model comparison


2. The Missing Link: Clustering → Dynamic Tools

Current Gap

Your existing research_pipeline.py does clustering but:

  • ❌ Doesn't use sentence transformers (uses TF-IDF)
  • ❌ Doesn't export results in format for ml_tools.py
  • ❌ Doesn't generate detection rules
  • ❌ Doesn't map clusters to professional domains

Proposed Solution

Create a new integration layer that:

  1. Runs enhanced clustering with sentence transformers
  2. Analyzes dangerous clusters for patterns
  3. Generates detection heuristics from cluster characteristics
  4. Exports to ML tools cache in correct format
  5. Triggers ToGMAL reload to expose new tools
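Step 5 above leaves the reload mechanism open. One option, sketched below under the assumption that ToGMAL can poll the cache file, is to re-read it only when its modification time changes. `MLToolsCacheWatcher` is a hypothetical name, not an existing ToGMAL class.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


class MLToolsCacheWatcher:
    """Re-reads the ML tools cache only when the file's mtime changes (hypothetical helper)."""

    def __init__(self, cache_path: str = "./data/ml_discovered_tools.json"):
        self.cache_path = Path(cache_path)
        self._last_mtime_ns: Optional[int] = None
        self.patterns: List[Dict[str, Any]] = []

    def reload_if_changed(self) -> bool:
        """Return True if the cache was (re)loaded, False if nothing changed."""
        try:
            mtime_ns = self.cache_path.stat().st_mtime_ns
        except FileNotFoundError:
            return False  # The pipeline has not exported anything yet
        if mtime_ns == self._last_mtime_ns:
            return False
        with open(self.cache_path) as f:
            self.patterns = json.load(f).get("patterns", [])
        self._last_mtime_ns = mtime_ns
        return True
```

Calling `reload_if_changed()` at the start of each tool-listing request would pick up a new export without restarting the server, at the cost of one `stat` call per request.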

3. Professional Domain Clustering Strategy

The 10 Professional Domains

Based on the proposals in file (4), focus on domains where LLMs demonstrably struggle:

| Domain | Dataset Sources | Expected Cluster Behavior | ToGMAL Tool |
|---|---|---|---|
| Mathematics | hendrycks/math, competition_math, gsm8k | LIMITATIONS cluster (LLM accuracy: 42% on MATH) | check_math_complexity |
| Medicine | medqa, pubmedqa, truthful_qa subset | LIMITATIONS cluster (LLM accuracy: 65% on MedQA) | check_medical_advice |
| Law | pile-of-law, legal case reports | LIMITATIONS cluster (jurisdiction-specific errors) | check_legal_boundaries |
| Coding | code_x_glue_cc_defect_detection, humaneval, apps | MIXED clusters (some code safe, some vulnerable) | check_code_security |
| Finance | financial_phrasebank, finqa | LIMITATIONS cluster (regulatory compliance) | check_financial_advice |
| Translation | wmt14, opus-100 | HARMLESS cluster (LLM near-human performance) | (no tool needed) |
| General QA | squad_v2, natural_questions | HARMLESS cluster (LLM accuracy: 86% on MMLU) | (no tool needed) |
| Summarization | cnn_dailymail, xsum | HARMLESS cluster (high ROUGE scores) | (no tool needed) |
| Creative Writing | TinyStories, writing_prompts | HARMLESS cluster (subjective, no "wrong" answer) | (no tool needed) |
| Therapy | Mental health corpora (if available) | LIMITATIONS cluster (crisis intervention risks) | check_therapy_boundaries |

Clustering Hypothesis

LIMITATIONS Cluster:

  • Contains: Math, medicine, law, finance, coding bugs, therapy
  • Characteristics: High reasoning complexity, domain expertise required, factual correctness critical
  • Cluster purity: >70% harmful/failure examples
  • Silhouette score: Aim for >0.4 (currently 0.25)

HARMLESS Cluster:

  • Contains: Translation, summarization, general QA, creative writing
  • Characteristics: Pattern matching, well-represented in training data, less critical if wrong
  • Cluster purity: >70% safe/successful examples

MIXED Cluster:

  • Contains: General coding, factual QA, educational content
  • Needs further subdivision or context-dependent handling
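The three-way hypothesis above reduces to a purity check on each cluster. The sketch below assumes each cluster member carries a boolean harmful/failure flag (as `DatasetEntry.is_harmful` does later in this document). `categorize_cluster` is a hypothetical helper, and it uses the strict ">70%" reading of the thresholds.

```python
from typing import Sequence


def categorize_cluster(harmful_flags: Sequence[bool], threshold: float = 0.7) -> str:
    """Label a cluster LIMITATIONS, HARMLESS, or MIXED from its members' flags."""
    if not harmful_flags:
        raise ValueError("cannot categorize an empty cluster")
    harmful_fraction = sum(harmful_flags) / len(harmful_flags)
    if harmful_fraction > threshold:
        return "LIMITATIONS"  # More than 70% harmful/failure examples
    if (1 - harmful_fraction) > threshold:
        return "HARMLESS"  # More than 70% safe/successful examples
    return "MIXED"  # Neither side dominates: subdivide or handle by context


# 8 of 10 members flagged harmful -> LIMITATIONS
math_like = categorize_cluster([True] * 8 + [False] * 2)
# 1 of 10 members flagged harmful -> HARMLESS
qa_like = categorize_cluster([True] + [False] * 9)
# Even split -> MIXED
coding_like = categorize_cluster([True] * 5 + [False] * 5)
```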

4. Implementation Plan: Enhanced Clustering Pipeline

Phase 1: Upgrade Clustering (Week 1-2)

Step 1.1: Install Dependencies

cd /Users/hetalksinmaths/togmal
source .venv/bin/activate
uv pip install sentence-transformers datasets scikit-learn matplotlib seaborn joblib

Step 1.2: Enhance research_pipeline.py

Add sentence transformers instead of TF-IDF:

# Add to research_pipeline.py
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import StandardScaler

class FeatureExtractor:
    """Use sentence transformers for semantic embeddings"""
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.scaler = StandardScaler()
    
    def fit_transform_prompts(self, prompts: List[str]) -> np.ndarray:
        """Extract semantic embeddings and standardize each dimension"""
        # encode() returns one embedding row per prompt
        embeddings = self.model.encode(
            prompts,
            batch_size=32,
            show_progress_bar=True,
            convert_to_numpy=True
        )
        return self.scaler.fit_transform(embeddings)

Why sentence transformers?

  • Captures semantic similarity (not just keywords)
  • Better cluster separation
  • Expect silhouette score improvement: 0.25 → 0.4+

Step 1.3: Add Professional Domain Datasets

Update DatasetFetcher to use HuggingFace datasets library:

from datasets import load_dataset

async def _fetch_huggingface_real(self, config: DatasetConfig) -> List[DatasetEntry]:
    """Actual HuggingFace integration"""
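    # NOTE: load_dataset is a blocking call; inside an async method it
    # stalls the event loop until the download/load completes.
    dataset = load_dataset(
        config.source_id,
        split=config.split,
        # Runs loading code shipped in the dataset repo; enable only for trusted sources
        trust_remote_code=True
    )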
    
    entries = []
    for item in dataset:
        entries.append(DatasetEntry(
            id="",
            source=config.name,
            type=config.cluster_category,
            prompt=item.get(config.text_column, ""),
            category=config.domains[0] if config.domains else "unknown",
            is_harmful=(config.cluster_category == "limitations"),
            metadata={"dataset": config.source_id}
        ))
    
    return entries

Priority datasets to fetch first:

  1. Mathematics (LIMITATIONS)

    • hendrycks/math - 12,500 competition-level problems
    • Use for detecting math complexity
  2. Medicine (LIMITATIONS)

    • medqa - Medical licensing exam questions
    • Use for detecting medical advice boundaries
  3. Coding (MIXED)

    • code_x_glue_cc_defect_detection - Buggy vs clean code
    • Use for detecting security vulnerabilities
  4. General QA (HARMLESS)

    • squad_v2 - Reading comprehension
    • Use as baseline "safe" cluster

Phase 2: Extract Patterns from Clusters (Week 3)

Step 2.1: Add Cluster Analysis

Enhance AnomalyClusteringModel._identify_dangerous_clusters:

def _identify_dangerous_clusters(
    self, cluster_labels: np.ndarray, entries: List[DatasetEntry]
) -> List[Dict[str, Any]]:
    """Identify dangerous clusters with pattern extraction"""
    
    dangerous_clusters = []
    
    for cluster_id in set(cluster_labels):
        if cluster_id == -1:  # Skip noise
            continue
        
        # Get cluster members
        mask = cluster_labels == cluster_id
        cluster_entries = [e for e, m in zip(entries, mask) if m]
        
        # Calculate purity
        harmful_count = sum(1 for e in cluster_entries if e.is_harmful)
        purity = harmful_count / len(cluster_entries)
        
        if purity < 0.7:  # Not dangerous enough
            continue
        
        # Extract pattern
        pattern = self._extract_pattern_from_cluster(cluster_entries)
        
        dangerous_clusters.append({
            "cluster_id": int(cluster_id),
            "size": len(cluster_entries),
            "purity": float(purity),
            "domain": pattern["domain"],
            "pattern_description": pattern["description"],
            "detection_rule": pattern["heuristic"],
            "examples": pattern["examples"]
        })
    
    return dangerous_clusters

Step 2.2: Pattern Extraction Logic

Add pattern extraction method:

# Requires at module level: import re; from collections import Counter
def _extract_pattern_from_cluster(
    self, entries: List[DatasetEntry]
) -> Dict[str, Any]:
    """Extract actionable pattern from cluster members"""
    
    # Determine primary domain
    domain_counts = Counter(e.category for e in entries)
    primary_domain = domain_counts.most_common(1)[0][0]
    
    # Cluster purity (fraction of harmful entries); computed here because
    # the caller's value is not passed in
    purity = sum(1 for e in entries if e.is_harmful) / len(entries)
    
    # Extract common keywords (for detection heuristic)
    all_prompts = " ".join(e.prompt for e in entries if e.prompt)
    words = re.findall(r'\b[a-z]{4,}\b', all_prompts.lower())
    top_keywords = [w for w, _ in Counter(words).most_common(10)]
    
    # Generate detection rule
    if primary_domain == "mathematics":
        heuristic = "contains_math_symbols OR complexity > threshold"
    elif primary_domain == "medicine":
        heuristic = f"contains_medical_keywords: {', '.join(top_keywords[:5])}"
    else:
        heuristic = f"keyword_match: {', '.join(top_keywords[:5])}"
    
    # Get representative examples
    examples = [e.prompt for e in entries[:5] if e.prompt]
    
    # Generate description
    description = f"{primary_domain.title()} limitation pattern (cluster purity: {purity:.1%})"
    
    return {
        "domain": primary_domain,
        "description": description,
        "heuristic": heuristic,
        "examples": examples,
        "keywords": top_keywords
    }
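The heuristics generated above are plain strings, so something has to interpret them at detection time. A minimal sketch of one way to evaluate the `label: kw1, kw2, ...` form against an incoming prompt follows. `parse_keyword_heuristic`, `matches_keyword_heuristic`, and the `min_hits` parameter are hypothetical; the mathematics rule, which has no keyword list, would need its own evaluator.

```python
import re
from typing import List


def parse_keyword_heuristic(heuristic: str) -> List[str]:
    """Return the keywords from 'label: kw1, kw2' strings; [] for other rule shapes."""
    _label, sep, keyword_part = heuristic.partition(":")
    if not sep:
        return []
    return [kw.strip().lower() for kw in keyword_part.split(",") if kw.strip()]


def matches_keyword_heuristic(heuristic: str, prompt: str, min_hits: int = 2) -> bool:
    """True if at least min_hits of the rule's keywords appear as words in the prompt."""
    keywords = parse_keyword_heuristic(heuristic)
    if not keywords:
        return False
    # Same tokenization as the extraction step: lowercase words of 4+ letters
    words = set(re.findall(r"\b[a-z]{4,}\b", prompt.lower()))
    hits = sum(1 for kw in keywords if kw in words)
    return hits >= min_hits


# Illustrative rule; the keywords are made up, not taken from a real cluster
rule = "contains_medical_keywords: patient, dosage, symptoms, treatment, diagnosis"
flagged = matches_keyword_heuristic(rule, "What dosage should a patient take?")
ignored = matches_keyword_heuristic(rule, "Write a poem about autumn")
```

Requiring two hits rather than one is a guess at a reasonable default; a single common keyword would otherwise trigger the tool on unrelated prompts.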

Phase 3: Export to ML Tools Cache (Week 3-4)

Step 3.1: Update Pipeline to Export

Add export method to ResearchPipeline:

def export_to_togmal_ml_tools(self, training_results: Dict[str, Any]):
    """Export dangerous clusters as ToGMAL dynamic tools"""
    
    patterns = []
    
    for model_type, result in training_results.items():
        for cluster in result.get("dangerous_clusters", []):
            pattern = {
                "id": f"{model_type}_{cluster['cluster_id']}",
                "domain": cluster["domain"],
                "description": cluster["pattern_description"],
                "confidence": cluster["purity"],
                "heuristic": cluster["detection_rule"],
                "examples": cluster["examples"],
                "metadata": {
                    "cluster_size": cluster["size"],
                    "model_type": model_type,
                    "discovered_at": datetime.now().isoformat()
                }
            }
            patterns.append(pattern)
    
    # Save to ML tools cache (format expected by ml_tools.py)
    ml_tools_cache = {
        "updated_at": datetime.now().isoformat(),
        "patterns": patterns,
        "metadata": {
            "total_patterns": len(patterns),
            "domains": list(set(p["domain"] for p in patterns))
        }
    }
    
    cache_path = Path("./data/ml_discovered_tools.json")
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    
    with open(cache_path, 'w') as f:
        json.dump(ml_tools_cache, f, indent=2)
    
    print(f"✓ Exported {len(patterns)} patterns to {cache_path}")

Step 3.2: Update togmal_mcp.py to Use Patterns

Modify existing togmal_list_tools_dynamic to load ML patterns:

@mcp.tool()
async def togmal_list_tools_dynamic(
    conversation_history: Optional[List[Dict[str, str]]] = None,
    user_context: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """
    Returns dynamically recommended tools based on conversation context
    
    ENHANCED: Now includes ML-discovered limitation patterns
    """
    # Existing domain detection
    domains = await analyze_conversation_context(conversation_history, user_context)
    
    # Load ML-discovered tools (NEW)
    ml_tools = await get_ml_discovered_tools(
        relevant_domains=domains,
        min_confidence=0.8  # Only high-confidence patterns
    )
    
    # Combine with static tools
    recommended_tools = [
        "togmal_analyze_prompt",
        "togmal_analyze_response",
        "togmal_submit_evidence"
    ]
    
    # Add domain-specific static tools
    if "mathematics" in domains or "physics" in domains:
        recommended_tools.append("togmal_check_math_complexity")
    if "medicine" in domains or "healthcare" in domains:
        recommended_tools.append("togmal_check_medical_advice")
    if "file_system" in domains:
        recommended_tools.append("togmal_check_file_operations")
    
    # Add ML-discovered tools (DYNAMIC)
    ml_tool_names = [tool["name"] for tool in ml_tools]
    recommended_tools.extend(ml_tool_names)
    
    return {
        "recommended_tools": recommended_tools,
        "detected_domains": domains,
        "ml_discovered_tools": ml_tools,  # Full definitions
        "context": {
            "conversation_depth": len(conversation_history) if conversation_history else 0,
            "has_user_context": bool(user_context)
        }
    }
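The function above relies on `get_ml_discovered_tools`, whose implementation in ml_tools.py is not shown here. As a rough, synchronous sketch of the filtering it would need to do, assuming the cache layout written by `export_to_togmal_ml_tools`: the helper name `load_ml_discovered_tools` and the `check_<id>` naming scheme are assumptions.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def load_ml_discovered_tools(
    cache_path: str,
    relevant_domains: Iterable[str],
    min_confidence: float = 0.8,
) -> List[Dict[str, Any]]:
    """Read the ML tools cache and keep patterns matching domain and confidence."""
    path = Path(cache_path)
    if not path.exists():
        return []  # No clustering run has exported patterns yet

    with open(path) as f:
        cache = json.load(f)

    wanted = set(relevant_domains)
    tools = []
    for pattern in cache.get("patterns", []):
        confidence = pattern.get("confidence", 0.0)
        if pattern.get("domain") not in wanted or confidence < min_confidence:
            continue
        tools.append({
            "name": f"check_{pattern['id']}",  # Hypothetical naming scheme
            "domain": pattern["domain"],
            "description": pattern.get("description", ""),
            "confidence": confidence,
        })
    return tools
```

Note that the exported patterns carry an `id` but no `name`, so some mapping step like this is needed before `tool["name"]` can be read.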

5. Expected Improvements

Clustering Quality

Current (TF-IDF + K-Means):

  • Silhouette score: 0.25-0.26
  • Clusters: 2-3
  • Dangerous clusters: Identified, but low separation

Expected (Sentence Transformers + K-Means/DBSCAN):

  • Silhouette score: 0.4-0.6 (✅ 60-140% improvement)
  • Clusters: 3-5 meaningful clusters
  • Dangerous clusters: Better defined with clear boundaries

Why?

  • Sentence transformers capture semantic meaning
  • TF-IDF only captures word overlap
  • Example: "What's the integral of x²" vs "Solve this calculus problem" → same cluster with ST, different with TF-IDF
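The word-overlap limitation is easy to check directly. The sketch below uses raw term counts rather than TF-IDF weights, a simplification that does not change the outcome here: TF-IDF only reweights shared terms, so two texts with no shared terms have cosine similarity 0 either way. The helper names are illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


q1 = bag_of_words("What's the integral of x²")
q2 = bag_of_words("Solve this calculus problem")

# The two prompts share no words, so a word-overlap model sees them as unrelated
overlap = cosine_similarity(q1, q2)
```

Here `overlap` is exactly 0.0, even though both prompts are calculus questions. A sentence embedding model is expected to place them close together, which is the motivation for the switch.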

Dynamic Tool Exposure

Before:

  • 5 static tools always available
  • Manual keyword matching for domain detection

After:

  • 5 static tools + N ML-discovered tools (N = # dangerous clusters)
  • Automatic tool exposure based on real clustering
  • Example: Cluster discovers "complex math word problems" → new tool check_math_word_problem_complexity

Coverage of Professional Domains

Before:

  • Generic "math", "medical", "file operations"
  • No fine-grained domain understanding

After:

  • 10 professional domains with dataset-backed clustering
  • Sub-domain detection (e.g., "cardiology" vs "psychiatry" within medicine)
  • Evidence-based: Each tool backed by cluster of real failure examples

6. Integration with Aqumen (Future)

Bidirectional Feedback Loop

[ToGMAL Clustering] → Discovers "law" limitation cluster
         ↓
[ToGMAL ML Tools] → Exposes check_legal_boundaries
         ↓
[Aqumen Error Catalog] ← Imports "law" failures from ToGMAL
         ↓
[Aqumen Assessments] → Tests users on legal reasoning
         ↓
[Assessment Failures] → Reported back to ToGMAL
         ↓
[ToGMAL Re-Clustering] → Refines "law" cluster with new data

Not implementing yet (per your request), but architecture is ready when needed.


7. Action Items (Next 2 Weeks)

Week 1: Enhanced Clustering

Day 1-2: Setup

  • Install dependencies: sentence-transformers, datasets, visualization libs
  • Copy research-datasets-fetcher.py and research-training-clustering.py to workspace
  • Integrate with existing research_pipeline.py

Day 3-5: Dataset Fetching

  • Implement real HuggingFace dataset loading
  • Fetch 4 priority datasets:
    • hendrycks/math (mathematics)
    • medqa (medicine)
    • code_x_glue_cc_defect_detection (coding)
    • squad_v2 (general QA as baseline)
  • Verify dataset cache works

Day 6-7: Clustering with Sentence Transformers

  • Replace TF-IDF with sentence transformers in FeatureExtractor
  • Run clustering on fetched datasets
  • Verify silhouette score improvement (target: >0.4)

Week 2: Pattern Extraction & Tool Generation

Day 8-10: Pattern Extraction

  • Implement _extract_pattern_from_cluster method
  • Generate detection heuristics from clusters
  • Visualize clusters (PCA 2D projection)

Day 11-12: Export to ML Tools

  • Implement export_to_togmal_ml_tools in pipeline
  • Run full pipeline and generate ml_discovered_tools.json
  • Verify format matches what ml_tools.py expects
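For the format check in the last item, a small validator run against the generated file can catch schema drift early. The sketch below assumes the required keys are the ones written by `export_to_togmal_ml_tools`; what ml_tools.py actually requires may differ, and `validate_ml_tools_cache` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Keys written per pattern by the export step (assumed to be what ml_tools.py reads)
REQUIRED_PATTERN_KEYS = {
    "id", "domain", "description", "confidence", "heuristic", "examples"
}


def validate_ml_tools_cache(cache: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the cache looks valid."""
    patterns = cache.get("patterns")
    if not isinstance(patterns, list):
        return ["'patterns' must be a list"]

    errors = []
    for i, pattern in enumerate(patterns):
        missing = REQUIRED_PATTERN_KEYS - pattern.keys()
        if missing:
            errors.append(f"pattern {i}: missing keys {sorted(missing)}")
        confidence = pattern.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            errors.append(f"pattern {i}: confidence must be a number in [0, 1]")
    return errors


valid_cache = {
    "patterns": [{
        "id": "kmeans_0",
        "domain": "mathematics",
        "description": "Mathematics limitation pattern",
        "confidence": 0.92,
        "heuristic": "contains_math_symbols OR complexity > threshold",
        "examples": [],
    }]
}
problems = validate_ml_tools_cache(valid_cache)
```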

Day 13-14: Testing & Validation

  • Test togmal_list_tools_dynamic with ML tools
  • Verify context analyzer correctly triggers ML tools
  • Run end-to-end test: conversation → domain detection → ML tool exposure

8. Success Metrics

Technical Metrics

| Metric | Current | Target | How to Measure |
|---|---|---|---|
| Silhouette Score | 0.25-0.26 | >0.4 | sklearn.metrics.silhouette_score |
| Dangerous Cluster Purity | 71-100% | >80% | % harmful in cluster |
| # Detected Domains | 0 (manual) | 5-10 | Count from clustering |
| ML Tools Generated | 0 | 5-10 | Count in ml_discovered_tools.json |
| Tool Precision | N/A | >85% | Manual review of triggered tools |

Functional Metrics

  • Can differentiate "math limitations" from "general QA" clusters
  • Can automatically expose check_math_complexity when conversation contains math
  • Can generate heuristic rules that are interpretable (not just "cluster 3")
  • Visualization shows clear cluster separation

9. Risks & Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Sentence transformer slower than TF-IDF | High | Cache embeddings, use batch processing |
| Silhouette score doesn't improve | High | Try different embedding models (mpnet, distilbert) |
| HuggingFace datasets too large | Medium | Sample datasets (max 5000 entries each) |
| Clusters don't align with domains | High | Add domain labels to training data, use semi-supervised clustering |
| ML tools not useful in practice | Medium | Start with high confidence threshold (0.8+), iterate |
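The dataset-size mitigation is straightforward to make reproducible. A minimal sketch, assuming entries are already loaded into a list: `sample_entries` is a hypothetical helper, and the fixed seed is there so that repeated pipeline runs cluster the same subset.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_entries(
    entries: Sequence[T], max_entries: int = 5000, seed: int = 42
) -> List[T]:
    """Return at most max_entries items, sampled without replacement and deterministically."""
    if len(entries) <= max_entries:
        return list(entries)
    # A dedicated Random instance keeps the result independent of global RNG state
    return random.Random(seed).sample(entries, max_entries)


# A 12,500-item source (the size quoted for hendrycks/math) is capped at 5,000
capped = sample_entries(list(range(12500)))
```

Uniform sampling keeps the harmful/benign ratio of each source roughly intact, which matters because cluster purity is computed from those labels.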

10. File Structure After Implementation

/Users/hetalksinmaths/togmal/
├── research_pipeline.py (ENHANCED)
│   ├── FeatureExtractor with sentence transformers ✅
│   ├── Pattern extraction from clusters ✅
│   ├── Export to ML tools cache ✅
│
├── togmal/
│   ├── context_analyzer.py (EXISTING - works as-is)
│   ├── ml_tools.py (EXISTING - works as-is)
│   └── config.py (EXISTING)
│
├── data/
│   ├── datasets/ (NEW)
│   │   ├── combined_dataset.csv
│   │   └── [domain]_[dataset].csv
│   │
│   ├── cache/ (EXISTING)
│   │   └── [source].json
│   │
│   └── ml_discovered_tools.json (GENERATED by pipeline)
│
├── models/ (NEW)
│   ├── clustering/
│   │   ├── kmeans_model.pkl
│   │   ├── embeddings_cache.npy
│   │   └── training_results.json
│   └── visualization/
│       └── clusters_2d.png
│
└── CLUSTERING_TO_DYNAMIC_TOOLS_STRATEGY.md (THIS FILE)

11. Next Steps After This Implementation

Phase 4: Aqumen Integration (When Ready)

  1. Export ToGMAL clustering results to Aqumen error catalogs
  2. Import Aqumen assessment failures back into ToGMAL
  3. Re-train clustering with combined data

Phase 5: Continuous Improvement

  1. Weekly automated re-training on new data
  2. A/B testing of ML tools vs static tools
  3. User feedback loop to improve heuristics

Phase 6: Grant Preparation

  1. Publish clustering results as research artifact
  2. Use improved metrics (silhouette 0.4+) in grant proposal
  3. Demonstrate concrete improvements over baseline

Conclusion

What This Gets You:

  1. ✅ Real clustering on professional domain datasets
  2. ✅ Better separation between limitations and harmless clusters
  3. ✅ Automatic tool generation from clustering results
  4. ✅ Evidence-backed limitation detection (not just heuristics)
  5. ✅ Scalable architecture ready for Aqumen integration

What This Doesn't Do (Yet):

  • ❌ Aqumen bidirectional integration (Phase 4)
  • ❌ Production deployment (focus on research validation)
  • ❌ Comprehensive grant proposal (focus on technical foundation)

Recommended Focus:

Start with Week 1-2 action items to prove the clustering approach works, then decide on Aqumen integration vs grant preparation.


Ready to proceed? Let me know if you want me to:

  1. Start implementing the enhanced clustering pipeline
  2. Create a test harness for validating clusters
  3. Build the export-to-ML-tools integration
  4. Something else?