model_trace / logs.txt (commit 3a2ac99, "no more dynamic updates", Ahmed Ahmed)
Searching for result files in: ./eval-results
Found 7 result files
Processing file: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json
config.json: 0%| | 0.00/1.35k [00:00<?, ?B/s]
config.json: 100%|██████████| 1.35k/1.35k [00:00<00:00, 17.2MB/s]
Created result object for: EleutherAI/gpt-neo-1.3B
Added new result for EleutherAI_gpt-neo-1.3B_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json
config.json: 0%| | 0.00/665 [00:00<?, ?B/s]
config.json: 100%|██████████| 665/665 [00:00<00:00, 8.83MB/s]
Created result object for: openai-community/gpt2
Added new result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
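Note the two paths in the scan above: the first file seen for a given model and precision is "Added", and every later file for the same pair is "Updated". A minimal sketch of that scan-and-deduplicate step, assuming the result-file layout shown in this log (function and field names are hypothetical):

```python
import glob
import json
import os

def collect_results(results_dir: str = "./eval-results") -> dict:
    """Scan for result JSONs and deduplicate by model + precision,
    letting later files update the same entry. Hypothetical sketch."""
    results: dict[str, dict] = {}
    files = sorted(glob.glob(os.path.join(results_dir, "**", "*.json"), recursive=True))
    print(f"Found {len(files)} result files")
    for path in files:
        with open(path) as f:
            data = json.load(f)
        cfg = data["config"]
        # e.g. "openai-community/gpt2" + "torch.float16" -> "openai-community_gpt2_float16"
        key = f"{cfg['model_name'].replace('/', '_')}_{cfg['model_dtype'].split('.')[-1]}"
        if key in results:
            results[key]["results"].update(data["results"])
            print(f"Updated existing result for {key}")
        else:
            results[key] = data
            print(f"Added new result for {key}")
    return results
```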
Processing 2 evaluation results
Converting result to dict for: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
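The raw-to-converted mapping above (5.9609375 -> 82.1477..., and 20.6635... -> 69.7163... below) is consistent with a log-scale normalization, score = 100 - 10 * ln(perplexity). That formula is inferred from the logged value pairs, not taken from the source:

```python
import math

def perplexity_to_score(ppl: float) -> float:
    # Inferred from the logged (perplexity, converted score) pairs;
    # not confirmed against the leaderboard's actual source code.
    return 100.0 - 10.0 * math.log(ppl)

assert abs(perplexity_to_score(5.9609375) - 82.1477223263516) < 1e-6
assert abs(perplexity_to_score(20.663532257080078) - 69.7162958010531) < 1e-6
```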
Converting result to dict for: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Returning 2 processed results
Found 2 raw results
Processing result 1/2: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 1/2: EleutherAI/gpt-neo-1.3B
Processing result 2/2: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 2/2: openai-community/gpt2
Converted to 2 JSON records
Sample record keys: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
Created DataFrame with columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
DataFrame shape: (2, 14)
Sorted DataFrame by average
Selected and rounded columns
Final DataFrame shape after filtering: (2, 12)
Final columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
=== FINAL RESULT: DataFrame with 2 rows and 12 columns ===
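Between the 14-column records and the 12-column result above, the pipeline builds a DataFrame, sorts by the average score, and keeps only the display columns (dropping eval_name and Weight type). A minimal pandas sketch, assuming the column names logged above:

```python
import pandas as pd

DISPLAY_COLS = ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture',
                'Precision', 'Hub License', '#Params (B)', 'Hub ❤️',
                'Available on the hub', 'Model sha']

def build_leaderboard_df(records: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame.from_records(records)              # shape (n, 14)
    df = df.sort_values(by='Average ⬆️', ascending=False)
    return df[DISPLAY_COLS].round(2)                     # shape (n, 12)
```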
=== Initializing Leaderboard ===
DataFrame shape: (2, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)
To create a public link, set `share=True` in `launch()`.
=== RUNNING PERPLEXITY TEST ===
Model: openai-community/gpt2-large
Revision: main
Precision: float16
Starting dynamic evaluation for openai-community/gpt2-large
Running perplexity evaluation...
Loading model: openai-community/gpt2-large (revision: main)
Loading tokenizer...
tokenizer_config.json: 0%| | 0.00/26.0 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 26.0/26.0 [00:00<00:00, 183kB/s]
config.json: 0%| | 0.00/666 [00:00<?, ?B/s]
config.json: 100%|██████████| 666/666 [00:00<00:00, 7.11MB/s]
vocab.json: 0%| | 0.00/1.04M [00:00<?, ?B/s]
vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 45.7MB/s]
merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 44.9MB/s]
tokenizer.json: 0%| | 0.00/1.36M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 25.3MB/s]
Tokenizer loaded successfully
Loading model...
model.safetensors: 0%| | 0.00/3.25G [00:00<?, ?B/s]
model.safetensors: 0%| | 3.99M/3.25G [00:01<18:26, 2.93MB/s]
model.safetensors: 4%|▍ | 138M/3.25G [00:02<00:47, 65.1MB/s]
model.safetensors: 7%|▋ | 235M/3.25G [00:03<00:46, 65.4MB/s]
model.safetensors: 28%|██▊ | 905M/3.25G [00:05<00:09, 258MB/s]
model.safetensors: 46%|████▋ | 1.51G/3.25G [00:06<00:04, 360MB/s]
model.safetensors: 71%|███████ | 2.31G/3.25G [00:07<00:01, 484MB/s]
model.safetensors: 98%|█████████▊| 3.18G/3.25G [00:08<00:00, 593MB/s]
model.safetensors: 100%|██████████| 3.25G/3.25G [00:08<00:00, 390MB/s]
generation_config.json: 0%| | 0.00/124 [00:00<?, ?B/s]
generation_config.json: 100%|██████████| 124/124 [00:00<00:00, 1.04MB/s]
Model loaded successfully
Tokenizing input text...
Tokenized input shape: torch.Size([1, 141])
Moved inputs to device: cpu
Running forward pass...
`loss_type=None` was set in the config but it is unrecognised. Using the default loss: `ForCausalLMLoss`.
Calculated loss: 2.1944427490234375
Final perplexity: 8.974998474121094
Perplexity evaluation completed: 8.974998474121094
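The evaluation follows the standard causal-LM recipe: tokenize a fixed text, run one forward pass with labels equal to the input ids, and exponentiate the mean loss (exp(2.1944427...) ≈ 8.9749985, matching the logged loss and perplexity). A minimal sketch with transformers, assuming float16 weights on CPU as logged; the evaluation text itself is not shown in this log:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def run_perplexity(model_name: str, text: str, revision: str = "main") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, revision=revision, torch_dtype=torch.float16
    )
    inputs = tokenizer(text, return_tensors="pt")  # logged shape: [1, 141]
    with torch.no_grad():
        # labels == input_ids yields the mean next-token cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())  # exp(2.1944...) ≈ 8.975
```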
Created result structure: {'config': {'model_dtype': 'torch.float16', 'model_name': 'openai-community/gpt2-large', 'model_sha': 'main'}, 'results': {'perplexity': {'perplexity': 8.974998474121094}}}
Saving result to: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json
Result file saved locally
Uploading to HF dataset: ahmedsqrd/results
Upload completed successfully
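Results are written under ./eval-results/{org}/ with a timestamped filename and pushed to the ahmedsqrd/results dataset repo. A plausible sketch of that step with huggingface_hub (only the repo id, directory layout, and filename pattern come from the log; the exact call is an assumption):

```python
import json
from datetime import datetime
from pathlib import Path
from huggingface_hub import HfApi

def save_and_upload(result: dict, model_name: str, results_dir: str = "./eval-results"):
    org, name = model_name.split("/")
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(results_dir) / org / f"results_{org}_{name}_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    # Assumed upload call; the repo id "ahmedsqrd/results" is from the log.
    HfApi().upload_file(
        path_or_fileobj=str(path),
        path_in_repo=str(path.relative_to(results_dir)),
        repo_id="ahmedsqrd/results",
        repo_type="dataset",
    )
```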
Evaluation result - Success: True, Result: 8.974998474121094
Attempting to refresh leaderboard...
=== REFRESH LEADERBOARD DEBUG ===
Refreshing leaderboard data...
=== GET_LEADERBOARD_DF DEBUG ===
Starting leaderboard creation...
Looking for results in: ./eval-results
Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Benchmark columns: ['Perplexity']
Searching for result files in: ./eval-results
Found 8 result files
Processing file: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json
Created result object for: EleutherAI/gpt-neo-1.3B
Added new result for EleutherAI_gpt-neo-1.3B_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json
Created result object for: openai-community/gpt2
Added new result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json
Created result object for: openai-community/gpt2-large
Added new result for openai-community_gpt2-large_float16
Processing 3 evaluation results
Converting result to dict for: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Converting result to dict for: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Converting result to dict for: openai-community/gpt2-large
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2-large
Raw results: {'perplexity': 8.974998474121094}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 8.974998474121094
Converted score: 78.05557235640035
Calculated average score: 78.05557235640035
Created base data_dict with 13 columns
Added task score: Perplexity = 8.974998474121094
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Returning 3 processed results
Found 3 raw results
Processing result 1/3: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 1/3: EleutherAI/gpt-neo-1.3B
Processing result 2/3: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 2/3: openai-community/gpt2
Processing result 3/3: openai-community/gpt2-large
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2-large
Raw results: {'perplexity': 8.974998474121094}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 8.974998474121094
Converted score: 78.05557235640035
Calculated average score: 78.05557235640035
Created base data_dict with 13 columns
Added task score: Perplexity = 8.974998474121094
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 3/3: openai-community/gpt2-large
Converted to 3 JSON records
Sample record keys: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
Created DataFrame with columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
DataFrame shape: (3, 14)
Sorted DataFrame by average
Selected and rounded columns
Final DataFrame shape after filtering: (3, 12)
Final columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
=== FINAL RESULT: DataFrame with 3 rows and 12 columns ===
get_leaderboard_df returned: <class 'pandas.core.frame.DataFrame'>
DataFrame shape: (3, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
DataFrame empty: False
Final DataFrame for leaderboard - Shape: (3, 12), Columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Creating leaderboard component...
=== Initializing Leaderboard ===
DataFrame shape: (3, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Leaderboard component created successfully
Leaderboard refresh successful
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2106, in process_api
data = await self.postprocess_data(block_fn, result["prediction"], state)
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1899, in postprocess_data
state[block._id] = block.__class__(**kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio/component_meta.py", line 181, in wrapper
return fn(self, **kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio_leaderboard/leaderboard.py", line 126, in __init__
raise ValueError("Leaderboard component must have a value set.")
ValueError: Leaderboard component must have a value set.
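The refresh itself succeeded ("Leaderboard refresh successful" above); the crash happens afterwards, when Gradio post-processes the callback's return value and re-instantiates the Leaderboard from its kwargs without a value (see `block.__class__(**kwargs)` in the traceback). A plausible fix, assuming the refresh handler returns a new Leaderboard component, is to always pass the refreshed DataFrame as value:

```python
# Hypothetical refresh handler; the real function and event wiring may differ.
from gradio_leaderboard import Leaderboard

def refresh_leaderboard():
    df = get_leaderboard_df()  # rebuilds the (n, 12) DataFrame as logged above
    # Returning Leaderboard() without value= reproduces the ValueError above,
    # since Leaderboard.__init__ requires a value.
    return Leaderboard(value=df)
```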