Runtime error
Searching for result files in: ./eval-results
Found 7 result files
Processing file: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json
config.json: 0%| | 0.00/1.35k [00:00<?, ?B/s]
config.json: 100%|██████████| 1.35k/1.35k [00:00<00:00, 17.2MB/s]
Created result object for: EleutherAI/gpt-neo-1.3B
Added new result for EleutherAI_gpt-neo-1.3B_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json
config.json: 0%| | 0.00/665 [00:00<?, ?B/s]
config.json: 100%|██████████| 665/665 [00:00<00:00, 8.83MB/s]
Created result object for: openai-community/gpt2
Added new result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
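The "Added new result" / "Updated existing result" lines above suggest the per-run files are collapsed into one entry per model-and-precision key, with later timestamps overwriting earlier ones. A minimal sketch of that aggregation (function and variable names are hypothetical, not taken from the app's source):

```python
def aggregate_results(runs: list[tuple[str, dict]]) -> dict[str, dict]:
    """Collapse per-run result files into one entry per eval_name key.

    `runs` is an assumed (eval_name, results) pairing, e.g.
    ("openai-community_gpt2_float16", {"perplexity": 20.66...}).
    Later runs for the same key update the earlier entry, mirroring the log.
    """
    aggregated: dict[str, dict] = {}
    for eval_name, results in runs:
        if eval_name in aggregated:
            aggregated[eval_name].update(results)  # "Updated existing result"
        else:
            aggregated[eval_name] = dict(results)  # "Added new result"
    return aggregated
```

This matches the log's behavior where seven files reduce to two results: six gpt2 runs collapse onto one key and the gpt-neo run keeps its own.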
Processing 2 evaluation results
Converting result to dict for: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
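The logged perplexity-to-score pairs in this transcript (5.9609375 → 82.1477, 20.6635 → 69.7163, and later 8.9750 → 78.0556) all fit score = 100 - 10·ln(perplexity). The formula below is inferred from those numbers, not confirmed from the app's source:

```python
import math

def perplexity_to_score(perplexity: float) -> float:
    """Map perplexity to a 0-100-style score: lower perplexity, higher score.

    Inferred from the logged pairs; e.g. 100 - 10 * ln(5.9609375) ~= 82.1477.
    """
    return 100.0 - 10.0 * math.log(perplexity)
```

All three "Converted score" values in the log agree with this mapping to at least six decimal places.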
Converting result to dict for: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Returning 2 processed results
Found 2 raw results
Processing result 1/2: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 1/2: EleutherAI/gpt-neo-1.3B
Processing result 2/2: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 2/2: openai-community/gpt2
Converted to 2 JSON records
Sample record keys: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
Created DataFrame with columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
DataFrame shape: (2, 14)
Sorted DataFrame by average
Selected and rounded columns
Final DataFrame shape after filtering: (2, 12)
Final columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
=== FINAL RESULT: DataFrame with 2 rows and 12 columns ===
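The DataFrame steps logged above (build from records, sort by average, select and round the 12 display columns) can be sketched as follows. The column list comes straight from the log; the function name and the 2-decimal rounding are assumptions:

```python
import pandas as pd

# Display columns as logged in "Final columns: [...]".
DISPLAY_COLS = [
    "T", "Model", "Average ⬆️", "Perplexity", "Type", "Architecture",
    "Precision", "Hub License", "#Params (B)", "Hub ❤️",
    "Available on the hub", "Model sha",
]

def build_leaderboard_df(records: list[dict]) -> pd.DataFrame:
    """Turn JSON records into the displayed leaderboard frame (sketch)."""
    df = pd.DataFrame.from_records(records)               # 14 raw columns
    df = df.sort_values(by="Average ⬆️", ascending=False)  # "Sorted DataFrame by average"
    df = df[DISPLAY_COLS].round(2)                        # "Selected and rounded columns"
    return df
```

`DataFrame.round` only touches numeric columns, so the string columns pass through unchanged, matching the logged (n, 14) → (n, 12) shape transition.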
=== Initializing Leaderboard ===
DataFrame shape: (2, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)
To create a public link, set `share=True` in `launch()`.
=== RUNNING PERPLEXITY TEST ===
Model: openai-community/gpt2-large
Revision: main
Precision: float16
Starting dynamic evaluation for openai-community/gpt2-large
Running perplexity evaluation...
Loading model: openai-community/gpt2-large (revision: main)
Loading tokenizer...
tokenizer_config.json: 0%| | 0.00/26.0 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 26.0/26.0 [00:00<00:00, 183kB/s]
config.json: 0%| | 0.00/666 [00:00<?, ?B/s]
config.json: 100%|██████████| 666/666 [00:00<00:00, 7.11MB/s]
vocab.json: 0%| | 0.00/1.04M [00:00<?, ?B/s]
vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 45.7MB/s]
merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 44.9MB/s]
tokenizer.json: 0%| | 0.00/1.36M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 25.3MB/s]
Tokenizer loaded successfully
Loading model...
model.safetensors: 0%| | 0.00/3.25G [00:00<?, ?B/s]
model.safetensors: 0%| | 3.99M/3.25G [00:01<18:26, 2.93MB/s]
model.safetensors: 4%|▍ | 138M/3.25G [00:02<00:47, 65.1MB/s]
model.safetensors: 7%|▋ | 235M/3.25G [00:03<00:46, 65.4MB/s]
model.safetensors: 28%|██▊ | 905M/3.25G [00:05<00:09, 258MB/s]
model.safetensors: 46%|████▋ | 1.51G/3.25G [00:06<00:04, 360MB/s]
model.safetensors: 71%|███████ | 2.31G/3.25G [00:07<00:01, 484MB/s]
model.safetensors: 98%|█████████▊| 3.18G/3.25G [00:08<00:00, 593MB/s]
model.safetensors: 100%|██████████| 3.25G/3.25G [00:08<00:00, 390MB/s]
generation_config.json: 0%| | 0.00/124 [00:00<?, ?B/s]
generation_config.json: 100%|██████████| 124/124 [00:00<00:00, 1.04MB/s]
Model loaded successfully
Tokenizing input text...
Tokenized input shape: torch.Size([1, 141])
Moved inputs to device: cpu
Running forward pass...
`loss_type=None` was set in the config but it is unrecognised. Using the default loss: `ForCausalLMLoss`.
Calculated loss: 2.1944427490234375
Final perplexity: 8.974998474121094
Perplexity evaluation completed: 8.974998474121094
Created result structure: {'config': {'model_dtype': 'torch.float16', 'model_name': 'openai-community/gpt2-large', 'model_sha': 'main'}, 'results': {'perplexity': {'perplexity': 8.974998474121094}}}
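The logged loss and perplexity are consistent with perplexity = exp(mean cross-entropy loss), which a Transformers causal-LM forward pass typically computes as `torch.exp(outputs.loss)`. The stdlib reproduces the numbers:

```python
import math

# Values copied from the log lines above.
loss = 2.1944427490234375            # "Calculated loss"
perplexity = math.exp(loss)          # cross-entropy loss -> perplexity
# perplexity ~= 8.975, matching "Final perplexity: 8.974998474121094"
```

The tiny difference from the logged value (well under 1e-4) is float-precision noise from the tensor computation.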
Saving result to: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json
Result file saved locally
Uploading to HF dataset: ahmedsqrd/results
Upload completed successfully
Evaluation result - Success: True, Result: 8.974998474121094
Attempting to refresh leaderboard...
=== REFRESH LEADERBOARD DEBUG ===
Refreshing leaderboard data...
=== GET_LEADERBOARD_DF DEBUG ===
Starting leaderboard creation...
Looking for results in: ./eval-results
Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Benchmark columns: ['Perplexity']
Searching for result files in: ./eval-results
Found 8 result files
Processing file: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json
Created result object for: EleutherAI/gpt-neo-1.3B
Added new result for EleutherAI_gpt-neo-1.3B_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json
Created result object for: openai-community/gpt2
Added new result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json
Created result object for: openai-community/gpt2-large
Added new result for openai-community_gpt2-large_float16
Processing 3 evaluation results
Converting result to dict for: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Converting result to dict for: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Converting result to dict for: openai-community/gpt2-large
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2-large
Raw results: {'perplexity': 8.974998474121094}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 8.974998474121094
Converted score: 78.05557235640035
Calculated average score: 78.05557235640035
Created base data_dict with 13 columns
Added task score: Perplexity = 8.974998474121094
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Returning 3 processed results
Found 3 raw results
Processing result 1/3: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 1/3: EleutherAI/gpt-neo-1.3B
Processing result 2/3: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 2/3: openai-community/gpt2
Processing result 3/3: openai-community/gpt2-large
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2-large
Raw results: {'perplexity': 8.974998474121094}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 8.974998474121094
Converted score: 78.05557235640035
Calculated average score: 78.05557235640035
Created base data_dict with 13 columns
Added task score: Perplexity = 8.974998474121094
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 3/3: openai-community/gpt2-large
Converted to 3 JSON records
Sample record keys: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
Created DataFrame with columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
DataFrame shape: (3, 14)
Sorted DataFrame by average
Selected and rounded columns
Final DataFrame shape after filtering: (3, 12)
Final columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
=== FINAL RESULT: DataFrame with 3 rows and 12 columns ===
get_leaderboard_df returned: <class 'pandas.core.frame.DataFrame'>
DataFrame shape: (3, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
DataFrame empty: False
Final DataFrame for leaderboard - Shape: (3, 12), Columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Creating leaderboard component...
=== Initializing Leaderboard ===
DataFrame shape: (3, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Leaderboard component created successfully
Leaderboard refresh successful
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2106, in process_api
    data = await self.postprocess_data(block_fn, result["prediction"], state)
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1899, in postprocess_data
    state[block._id] = block.__class__(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio/component_meta.py", line 181, in wrapper
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio_leaderboard/leaderboard.py", line 126, in __init__
    raise ValueError("Leaderboard component must have a value set.")
ValueError: Leaderboard component must have a value set.
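The traceback shows Gradio re-instantiating the `gradio_leaderboard` `Leaderboard` component during postprocessing with no `value`, which its `__init__` rejects even though the refresh itself logged success. One plausible guard, sketched with a pandas helper only because the actual component wiring is not visible in this log, is to make sure whatever the refresh handler returns always carries a non-None DataFrame value:

```python
import pandas as pd

# Display columns as logged by the app.
EXPECTED_COLS = [
    "T", "Model", "Average ⬆️", "Perplexity", "Type", "Architecture",
    "Precision", "Hub License", "#Params (B)", "Hub ❤️",
    "Available on the hub", "Model sha",
]

def leaderboard_value(df) -> pd.DataFrame:
    """Return a DataFrame that is always safe to pass as Leaderboard(value=...).

    gradio_leaderboard raises "Leaderboard component must have a value set."
    when value is None, so fall back to an empty frame with the expected
    columns rather than ever handing the component nothing.
    """
    if not isinstance(df, pd.DataFrame) or df.empty:
        return pd.DataFrame(columns=EXPECTED_COLS)
    return df
```

In the app, the refresh handler would then construct the component as `Leaderboard(value=leaderboard_value(df), ...)` (constructor arguments beyond `value` are assumptions here).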