---
title: OpenThoughts Benchmark Explorer
emoji: 📊
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0
---

# OpenThoughts Evalchemy Benchmark Explorer

A web application for exploring correlations between OpenThoughts benchmarks and comparing model performance.

## Features

- Interactive correlation heatmaps
- Scatter plot explorer with uncertainty analysis
- Model performance comparisons
- Statistical summaries and uncertainty analysis

## Usage

The app automatically loads benchmark data and provides multiple views for analysis:

1. **Overview Dashboard**: High-level summary of benchmarks and correlations
2. **Interactive Heatmap**: Correlation matrix visualization
3. **Scatter Explorer**: Detailed pairwise benchmark comparisons
4. **Model Performance**: Individual model analysis
5. **Statistical Summary**: Correlation statistics across methods
6. **Uncertainty Analysis**: Measurement reliability analysis

## Data Files

The app reads two CSV files, one required and one optional:

- `comprehensive_benchmark_scores.csv`: Main benchmark scores (required)
- `benchmark_standard_errors.csv`: Standard error estimates (optional)

These files should be in the root directory of the repository.
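
As a rough illustration of how the two data files might be consumed, the sketch below loads the scores file, treats the standard-error file as optional, and computes a benchmark correlation matrix with pandas. It is not taken from `app.py`. The function names `load_benchmark_data` and `correlation_matrix` are hypothetical, and the assumed CSV layout (first column holds the model name, remaining columns hold one benchmark each) is an assumption that may not match the actual files.

```python
from pathlib import Path
from typing import Optional, Tuple

import pandas as pd

# File names as listed in this README.
SCORES_FILE = "comprehensive_benchmark_scores.csv"
STDERR_FILE = "benchmark_standard_errors.csv"


def load_benchmark_data(
    root: Path = Path("."),
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """Load the scores file and, if present, the standard-error file.

    Hypothetical helper. Assumes the first CSV column is the model
    identifier and every other column is one benchmark.
    """
    scores_path = root / SCORES_FILE
    if not scores_path.exists():
        raise FileNotFoundError(f"Required file not found: {scores_path}")
    scores = pd.read_csv(scores_path, index_col=0)

    # The standard-error file is optional, so a missing file yields None.
    stderr_path = root / STDERR_FILE
    stderr = pd.read_csv(stderr_path, index_col=0) if stderr_path.exists() else None
    return scores, stderr


def correlation_matrix(scores: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Pairwise correlation between benchmark columns (numeric columns only)."""
    return scores.select_dtypes("number").corr(method=method)


if __name__ == "__main__":
    scores, stderr = load_benchmark_data()
    print(correlation_matrix(scores).round(2))
    if stderr is None:
        print("No standard-error file found; uncertainty views would be unavailable.")
```

Returning `None` for the missing optional file keeps the caller's check explicit, so views that depend on standard errors can be skipped rather than failing at load time.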