model_trace / logs.txt (commit 3a2ac99, "no more dynamic updates", Ahmed Ahmed)
Searching for result files in: ./eval-results
Found 7 result files
Processing file: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json
config.json: 0%| | 0.00/1.35k [00:00<?, ?B/s]
config.json: 100%|██████████| 1.35k/1.35k [00:00<00:00, 17.2MB/s]
Created result object for: EleutherAI/gpt-neo-1.3B
Added new result for EleutherAI_gpt-neo-1.3B_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json
config.json: 0%| | 0.00/665 [00:00<?, ?B/s]
config.json: 100%|██████████| 665/665 [00:00<00:00, 8.83MB/s]
Created result object for: openai-community/gpt2
Added new result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
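Note the two paths in the scan above: the first file seen for a given model and precision is "Added", and every later file for the same pair is "Updated". A minimal sketch of that scan-and-deduplicate step, assuming the result-file layout shown in this log (function and field names are hypothetical):

```python
import glob
import json
import os

def collect_results(results_dir: str = "./eval-results") -> dict:
    """Scan for result JSONs and deduplicate by model + precision,
    letting later files update the same entry. Hypothetical sketch."""
    results: dict[str, dict] = {}
    files = sorted(glob.glob(os.path.join(results_dir, "**", "*.json"), recursive=True))
    print(f"Found {len(files)} result files")
    for path in files:
        with open(path) as f:
            data = json.load(f)
        cfg = data["config"]
        # e.g. "openai-community/gpt2" + "torch.float16" -> "openai-community_gpt2_float16"
        key = f"{cfg['model_name'].replace('/', '_')}_{cfg['model_dtype'].split('.')[-1]}"
        if key in results:
            results[key]["results"].update(data["results"])
            print(f"Updated existing result for {key}")
        else:
            results[key] = data
            print(f"Added new result for {key}")
    return results
```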
Processing 2 evaluation results
Converting result to dict for: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
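The raw-to-converted mapping above (5.9609375 -> 82.1477..., and 20.6635... -> 69.7163... below) is consistent with a log-scale normalization, score = 100 - 10 * ln(perplexity). That formula is inferred from the logged value pairs, not taken from the source:

```python
import math

def perplexity_to_score(ppl: float) -> float:
    # Inferred from the logged (perplexity, converted score) pairs;
    # not confirmed against the leaderboard's actual source code.
    return 100.0 - 10.0 * math.log(ppl)

assert abs(perplexity_to_score(5.9609375) - 82.1477223263516) < 1e-6
assert abs(perplexity_to_score(20.663532257080078) - 69.7162958010531) < 1e-6
```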
Converting result to dict for: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Returning 2 processed results
Found 2 raw results
Processing result 1/2: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 1/2: EleutherAI/gpt-neo-1.3B
Processing result 2/2: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 2/2: openai-community/gpt2
Converted to 2 JSON records
Sample record keys: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
Created DataFrame with columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
DataFrame shape: (2, 14)
Sorted DataFrame by average
Selected and rounded columns
Final DataFrame shape after filtering: (2, 12)
Final columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
=== FINAL RESULT: DataFrame with 2 rows and 12 columns ===
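Between the 14-column records and the 12-column result above, the pipeline builds a DataFrame, sorts by the average score, and keeps only the display columns (dropping eval_name and Weight type). A minimal pandas sketch, assuming the column names logged above:

```python
import pandas as pd

DISPLAY_COLS = ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture',
                'Precision', 'Hub License', '#Params (B)', 'Hub ❤️',
                'Available on the hub', 'Model sha']

def build_leaderboard_df(records: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame.from_records(records)              # shape (n, 14)
    df = df.sort_values(by='Average ⬆️', ascending=False)
    return df[DISPLAY_COLS].round(2)                     # shape (n, 12)
```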
=== Initializing Leaderboard ===
DataFrame shape: (2, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)
To create a public link, set `share=True` in `launch()`.
=== RUNNING PERPLEXITY TEST ===
Model: openai-community/gpt2-large
Revision: main
Precision: float16
Starting dynamic evaluation for openai-community/gpt2-large
Running perplexity evaluation...
Loading model: openai-community/gpt2-large (revision: main)
Loading tokenizer...
tokenizer_config.json: 0%| | 0.00/26.0 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 26.0/26.0 [00:00<00:00, 183kB/s]
config.json: 0%| | 0.00/666 [00:00<?, ?B/s]
config.json: 100%|██████████| 666/666 [00:00<00:00, 7.11MB/s]
vocab.json: 0%| | 0.00/1.04M [00:00<?, ?B/s]
vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 45.7MB/s]
merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 44.9MB/s]
tokenizer.json: 0%| | 0.00/1.36M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 25.3MB/s]
Tokenizer loaded successfully
Loading model...
model.safetensors: 0%| | 0.00/3.25G [00:00<?, ?B/s]
model.safetensors: 0%| | 3.99M/3.25G [00:01<18:26, 2.93MB/s]
model.safetensors: 4%|▍ | 138M/3.25G [00:02<00:47, 65.1MB/s]
model.safetensors: 7%|▋ | 235M/3.25G [00:03<00:46, 65.4MB/s]
model.safetensors: 28%|██▊ | 905M/3.25G [00:05<00:09, 258MB/s]
model.safetensors: 46%|████▋ | 1.51G/3.25G [00:06<00:04, 360MB/s]
model.safetensors: 71%|███████ | 2.31G/3.25G [00:07<00:01, 484MB/s]
model.safetensors: 98%|█████████▊| 3.18G/3.25G [00:08<00:00, 593MB/s]
model.safetensors: 100%|██████████| 3.25G/3.25G [00:08<00:00, 390MB/s]
generation_config.json: 0%| | 0.00/124 [00:00<?, ?B/s]
generation_config.json: 100%|██████████| 124/124 [00:00<00:00, 1.04MB/s]
Model loaded successfully
Tokenizing input text...
Tokenized input shape: torch.Size([1, 141])
Moved inputs to device: cpu
Running forward pass...
`loss_type=None` was set in the config but it is unrecognised. Using the default loss: `ForCausalLMLoss`.
Calculated loss: 2.1944427490234375
Final perplexity: 8.974998474121094
Perplexity evaluation completed: 8.974998474121094
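The evaluation follows the standard causal-LM recipe: tokenize a fixed text, run one forward pass with labels equal to the input ids, and exponentiate the mean loss (exp(2.1944427...) ≈ 8.9749985, matching the logged loss and perplexity). A minimal sketch with transformers, assuming float16 weights on CPU as logged; the evaluation text itself is not shown in this log:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def run_perplexity(model_name: str, text: str, revision: str = "main") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, revision=revision, torch_dtype=torch.float16
    )
    inputs = tokenizer(text, return_tensors="pt")  # logged shape: [1, 141]
    with torch.no_grad():
        # labels == input_ids yields the mean next-token cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())  # exp(2.1944...) ≈ 8.975
```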
Created result structure: {'config': {'model_dtype': 'torch.float16', 'model_name': 'openai-community/gpt2-large', 'model_sha': 'main'}, 'results': {'perplexity': {'perplexity': 8.974998474121094}}}
Saving result to: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json
Result file saved locally
Uploading to HF dataset: ahmedsqrd/results
Upload completed successfully
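Results are written under ./eval-results/{org}/ with a timestamped filename and pushed to the ahmedsqrd/results dataset repo. A plausible sketch of that step with huggingface_hub (only the repo id, directory layout, and filename pattern come from the log; the exact call is an assumption):

```python
import json
from datetime import datetime
from pathlib import Path
from huggingface_hub import HfApi

def save_and_upload(result: dict, model_name: str, results_dir: str = "./eval-results"):
    org, name = model_name.split("/")
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(results_dir) / org / f"results_{org}_{name}_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    # Assumed upload call; the repo id "ahmedsqrd/results" is from the log.
    HfApi().upload_file(
        path_or_fileobj=str(path),
        path_in_repo=str(path.relative_to(results_dir)),
        repo_id="ahmedsqrd/results",
        repo_type="dataset",
    )
```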
Evaluation result - Success: True, Result: 8.974998474121094
Attempting to refresh leaderboard...
=== REFRESH LEADERBOARD DEBUG ===
Refreshing leaderboard data...
=== GET_LEADERBOARD_DF DEBUG ===
Starting leaderboard creation...
Looking for results in: ./eval-results
Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Benchmark columns: ['Perplexity']
Searching for result files in: ./eval-results
Found 8 result files
Processing file: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json
Created result object for: EleutherAI/gpt-neo-1.3B
Added new result for EleutherAI_gpt-neo-1.3B_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json
Created result object for: openai-community/gpt2
Added new result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json
Created result object for: openai-community/gpt2
Updated existing result for openai-community_gpt2_float16
Processing file: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json
Created result object for: openai-community/gpt2-large
Added new result for openai-community_gpt2-large_float16
Processing 3 evaluation results
Converting result to dict for: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Converting result to dict for: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Converting result to dict for: openai-community/gpt2-large
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2-large
Raw results: {'perplexity': 8.974998474121094}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 8.974998474121094
Converted score: 78.05557235640035
Calculated average score: 78.05557235640035
Created base data_dict with 13 columns
Added task score: Perplexity = 8.974998474121094
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully converted and added result
Returning 3 processed results
Found 3 raw results
Processing result 1/3: EleutherAI/gpt-neo-1.3B
=== PROCESSING RESULT TO_DICT ===
Processing result for model: EleutherAI/gpt-neo-1.3B
Raw results: {'perplexity': 5.9609375}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 5.9609375
Converted score: 82.1477223263516
Calculated average score: 82.1477223263516
Created base data_dict with 13 columns
Added task score: Perplexity = 5.9609375
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 1/3: EleutherAI/gpt-neo-1.3B
Processing result 2/3: openai-community/gpt2
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2
Raw results: {'perplexity': 20.663532257080078}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 20.663532257080078
Converted score: 69.7162958010531
Calculated average score: 69.7162958010531
Created base data_dict with 13 columns
Added task score: Perplexity = 20.663532257080078
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 2/3: openai-community/gpt2
Processing result 3/3: openai-community/gpt2-large
=== PROCESSING RESULT TO_DICT ===
Processing result for model: openai-community/gpt2-large
Raw results: {'perplexity': 8.974998474121094}
Model precision: Precision.float16
Model type: ModelType.PT
Weight type: WeightType.Original
Available tasks: ['task0']
Looking for task: perplexity in results
Found score for perplexity: 8.974998474121094
Converted score: 78.05557235640035
Calculated average score: 78.05557235640035
Created base data_dict with 13 columns
Added task score: Perplexity = 8.974998474121094
Final data dict has 14 columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
=== END PROCESSING RESULT TO_DICT ===
Successfully processed result 3/3: openai-community/gpt2-large
Converted to 3 JSON records
Sample record keys: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
Created DataFrame with columns: ['eval_name', 'Precision', 'Type', 'T', 'Weight type', 'Architecture', 'Model', 'Model sha', 'Average ⬆️', 'Available on the hub', 'Hub License', '#Params (B)', 'Hub ❤️', 'Perplexity']
DataFrame shape: (3, 14)
Sorted DataFrame by average
Selected and rounded columns
Final DataFrame shape after filtering: (3, 12)
Final columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
=== FINAL RESULT: DataFrame with 3 rows and 12 columns ===
get_leaderboard_df returned: <class 'pandas.core.frame.DataFrame'>
DataFrame shape: (3, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
DataFrame empty: False
Final DataFrame for leaderboard - Shape: (3, 12), Columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Creating leaderboard component...
=== Initializing Leaderboard ===
DataFrame shape: (3, 12)
DataFrame columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Leaderboard component created successfully
Leaderboard refresh successful
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2106, in process_api
data = await self.postprocess_data(block_fn, result["prediction"], state)
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1899, in postprocess_data
state[block._id] = block.__class__(**kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio/component_meta.py", line 181, in wrapper
return fn(self, **kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio_leaderboard/leaderboard.py", line 126, in __init__
raise ValueError("Leaderboard component must have a value set.")
ValueError: Leaderboard component must have a value set.
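The refresh itself succeeded ("Leaderboard refresh successful" above); the crash happens afterwards, when Gradio post-processes the callback's return value and re-instantiates the Leaderboard from its kwargs without a value (see `block.__class__(**kwargs)` in the traceback). A plausible fix, assuming the refresh handler returns a new Leaderboard component, is to always pass the refreshed DataFrame as value:

```python
# Hypothetical refresh handler; the real function and event wiring may differ.
from gradio_leaderboard import Leaderboard

def refresh_leaderboard():
    df = get_leaderboard_df()  # rebuilds the (n, 12) DataFrame as logged above
    # Returning Leaderboard() without value= reproduces the ValueError above,
    # since Leaderboard.__init__ requires a value.
    return Leaderboard(value=df)
```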