Runtime error
| NCHMARK_COLS: ['Perplexity'] | |
| === END COLUMN SETUP === | |
| 🔧 CHECKING MODEL TRACING AVAILABILITY... | |
| - Model tracing path: /home/user/app/src/evaluation/../../model-tracing | |
| - Path exists: True | |
| - main.py exists: True | |
| 🎯 Final MODEL_TRACING_AVAILABLE = True | |
| .gitattributes: 0%| | 0.00/2.46k [00:00<?, ?B/s] | |
| .gitattributes: 100%|██████████| 2.46k/2.46k [00:00<00:00, 10.1MB/s] | |
| (…)therAI_gpt-neo-1.3B_20250726_010247.json: 0%| | 0.00/202 [00:00<?, ?B/s] | |
| (…)therAI_gpt-neo-1.3B_20250726_010247.json: 100%|██████████| 202/202 [00:00<00:00, 748kB/s] | |
| (…)s_facebook_opt-125m_20250726_020655.json: 0%| | 0.00/205 [00:00<?, ?B/s] | |
| (…)s_facebook_opt-125m_20250726_020655.json: 100%|██████████| 205/205 [00:00<00:00, 909kB/s] | |
| (…)s_facebook_opt-350m_20250726_021737.json: 0%| | 0.00/205 [00:00<?, ?B/s] | |
| (…)s_facebook_opt-350m_20250726_021737.json: 100%|██████████| 205/205 [00:00<00:00, 850kB/s] | |
| (…)ommunity_gpt2-large_20250726_013038.json: 0%| | 0.00/214 [00:00<?, ?B/s] | |
| (…)ommunity_gpt2-large_20250726_013038.json: 100%|██████████| 214/214 [00:00<00:00, 1.03MB/s] | |
| (…)mmunity_gpt2-medium_20250726_015555.json: 0%| | 0.00/216 [00:00<?, ?B/s] | |
| (…)mmunity_gpt2-medium_20250726_015555.json: 100%|██████████| 216/216 [00:00<00:00, 730kB/s] | |
| (…)enai-community_gpt2_20250725_231201.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250725_231201.json: 100%|██████████| 209/209 [00:00<00:00, 533kB/s] | |
| (…)enai-community_gpt2_20250725_233155.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250725_233155.json: 100%|██████████| 209/209 [00:00<00:00, 905kB/s] | |
| (…)enai-community_gpt2_20250725_235115.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250725_235115.json: 100%|██████████| 209/209 [00:00<00:00, 801kB/s] | |
| (…)enai-community_gpt2_20250725_235748.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250725_235748.json: 100%|██████████| 209/209 [00:00<00:00, 856kB/s] | |
| (…)enai-community_gpt2_20250726_000358.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250726_000358.json: 100%|██████████| 209/209 [00:00<00:00, 696kB/s] | |
| (…)enai-community_gpt2_20250726_000650.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250726_000650.json: 100%|██████████| 209/209 [00:00<00:00, 792kB/s] | |
| (…)enai-community_gpt2_20250726_015147.json: 0%| | 0.00/209 [00:00<?, ?B/s] | |
| (…)enai-community_gpt2_20250726_015147.json: 100%|██████████| 209/209 [00:00<00:00, 1.12MB/s] | |
| STARTING GRADIO APP INITIALIZATION | |
| Initializing allowed models... | |
| INITIALIZING ALLOWED MODELS | |
| Models to initialize: ['lmsys/vicuna-7b-v1.5', 'ibm-granite/granite-7b-base', 'EleutherAI/llemma_7b'] | |
| 🧹 CLEANING NON-ALLOWED RESULT FILES | |
| Removing non-allowed model result: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json (model: EleutherAI/gpt-neo-1.3B) | |
| Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-125m_20250726_020655.json (model: facebook/opt-125m) | |
| Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-350m_20250726_021737.json (model: facebook/opt-350m) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json (model: openai-community/gpt2-large) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-medium_20250726_015555.json (model: openai-community/gpt2-medium) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json (model: openai-community/gpt2) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json (model: openai-community/gpt2) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json (model: openai-community/gpt2) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json (model: openai-community/gpt2) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json (model: openai-community/gpt2) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json (model: openai-community/gpt2) | |
| Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_015147.json (model: openai-community/gpt2) | |
| Removed 12 non-allowed result files | |
| 🔧 CREATING RESULT FILE FOR: lmsys/vicuna-7b-v1.5 | |
| Result file path: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json | |
| Created result file: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json | |
| 🔧 CREATING RESULT FILE FOR: ibm-granite/granite-7b-base | |
| Result file path: ./eval-results/ibm_granite_granite_7b_base_float16.json | |
| Created result file: ./eval-results/ibm_granite_granite_7b_base_float16.json | |
| 🔧 CREATING RESULT FILE FOR: EleutherAI/llemma_7b | |
| Result file path: ./eval-results/EleutherAI_llemma_7b_float16.json | |
| Created result file: ./eval-results/EleutherAI_llemma_7b_float16.json | |
| Initialized 3 model result files | |
| Creating initial results DataFrame... | |
| CREATE_RESULTS_DATAFRAME CALLED | |
| === GET_LEADERBOARD_DF DEBUG === | |
| Starting leaderboard creation... | |
| Looking for results in: ./eval-results | |
| Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬆️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha'] | |
| Benchmark columns: ['Perplexity'] | |
| Searching for result files in: ./eval-results | |
| Found 0 result files | |
| Processing 0 evaluation results | |
| Returning 0 processed results | |
| Found 0 raw results | |
| No raw data found, creating empty DataFrame | |
| Creating empty fallback DataFrame... | |
| Empty DataFrame created with columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬆️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha'] | |
| Retrieved leaderboard df: (0, 13) | |
| ⚠️ DataFrame is None or empty, returning empty DataFrame | |
| Initial DataFrame created with shape: (0, 6) | |
| Columns: ['Model', 'Perplexity', 'Match P-Value', 'Average Score', 'Type', 'Precision'] | |
| 🎨 Creating Gradio interface... | |
| 🎯 GRADIO INTERFACE SETUP COMPLETE | |
| LAUNCHING GRADIO APP WITH MODEL TRACING INTEGRATION | |
| Features enabled: | |
| - Perplexity evaluation | |
| - Model trace p-value computation (vs GPT-2 base) | |
| - Match statistic with alignment | |
| Ready to accept requests! | |
| * Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`) |
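
The cleanup and result-file steps logged above can be approximated with the sketch below. It is not the app's actual code. The helper names `result_path_for` and `remove_non_allowed` are hypothetical, the naming rule is only inferred from the three logged paths, and the assumption that each result JSON stores its model under `config.model_name` is a guess about the file layout.

```python
import json
from pathlib import Path

# Allow-list copied from the "Models to initialize" log line.
ALLOWED_MODELS = {
    "lmsys/vicuna-7b-v1.5",
    "ibm-granite/granite-7b-base",
    "EleutherAI/llemma_7b",
}


def result_path_for(model_id, results_dir="./eval-results", precision="float16"):
    """Build a result path like the logged ones (hypothetical helper).

    Inferred rule: '/' and '-' become '_', dots are kept, and the
    precision is appended before the .json suffix.
    """
    safe_name = model_id.replace("/", "_").replace("-", "_")
    return Path(results_dir) / f"{safe_name}_{precision}.json"


def remove_non_allowed(results_dir, allowed=ALLOWED_MODELS):
    """Delete result files whose model is not in the allow-list.

    Assumes each file looks like {"config": {"model_name": "..."}};
    files that cannot be parsed that way are left untouched.
    Returns the list of removed paths.
    """
    removed = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        try:
            model = json.loads(path.read_text())["config"]["model_name"]
        except (OSError, ValueError, KeyError, TypeError):
            continue  # unreadable or unexpected layout: skip, do not delete
        if model not in allowed:
            path.unlink()
            removed.append(path)
    return removed
```

Under these assumptions, `result_path_for("lmsys/vicuna-7b-v1.5")` yields a file named `lmsys_vicuna_7b_v1.5_float16.json`, matching the logged path. The recursive `rglob` is a deliberate choice here: the removed files sit in per-organization subdirectories, while the newly created ones sit directly in `./eval-results`.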