Spaces: Running on CPU Upgrade
Process Displays Running But Might Be Hanging
My bright idea: increase the log feedback during this stage, because it's time-consuming and the user isn't well oriented on what to expect :-)
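Concretely, something like this hypothetical heartbeat wrapper is what I mean. All names here are made up for illustration; this is not the actual yourbench code, just a sketch of the idea:

```python
# Hypothetical sketch only; none of these names exist in yourbench.
import time
from loguru import logger

def with_heartbeat(items, stage_name, every_seconds=30):
    """Yield items while logging periodic progress, so a slow stage
    looks visibly alive instead of hung."""
    total = len(items)
    start = last = time.monotonic()
    for i, item in enumerate(items, 1):
        yield item
        now = time.monotonic()
        if now - last >= every_seconds or i == total:
            logger.info(f"{stage_name}: {i}/{total} done, {now - start:.0f}s elapsed")
            last = now

# usage sketch: for doc in with_heartbeat(documents, "ingestion"): convert(doc)
```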
Screen:
Logs:
2025-06-03 13:22:48.880 | INFO | yourbench.main:run:74 - Running pipeline with config: /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/config.yml
2025-06-03 13:22:48.885 | INFO | yourbench.utils.loading_engine:load_config:81 - Configuration loaded successfully from /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/config.yml
2025-06-03 13:22:48.885 | INFO | yourbench.pipeline.handler:run_pipeline:90 - Debug mode set to False
2025-06-03 13:22:48.890 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'ingestion'
/home/user/app/.venv/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2025-06-03 13:22:50.403 | INFO | yourbench.pipeline.ingestion:_initialize_markdown_processor:255 - Initializing MarkItDown with LLM support: model='Qwen/Qwen2.5-VL-72B-Instruct'.
2025-06-03 13:22:50.430 | INFO | yourbench.pipeline.ingestion:run:195 - Ingestion stage: Converting files from '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/' to '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested'...
2025-06-03 13:23:23.024 | INFO | yourbench.pipeline.ingestion:_convert_document_to_markdown:303 - Successfully converted '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/esma74-362-2281_final_report_guidelines_emir_refit.pdf' -> '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested/esma74-362-2281_final_report_guidelines_emir_refit.md'.
2025-06-03 13:23:23.024 | SUCCESS | yourbench.pipeline.ingestion:run:210 - Ingestion stage complete: Processed files from '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/' and saved Markdown to '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested'.
2025-06-03 13:23:23.024 | SUCCESS | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'ingestion' in 34.135s
2025-06-03 13:23:23.029 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'upload_ingest_to_hub'
2025-06-03 13:23:23.582 | INFO | yourbench.pipeline.upload_ingest_to_hub:run:137 - Using source directory: /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested
2025-06-03 13:23:23.689 | INFO | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 244.20ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.
Uploading the dataset shards: 100%|██████████| 1/1 [00:01<00:00, 1.28s/it]
Uploading the dataset shards: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it]
2025-06-03 13:23:27.040 | SUCCESS | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:23:27.040 | SUCCESS | yourbench.pipeline.upload_ingest_to_hub:run:154 - Successfully completed 'upload_ingest_to_hub' stage.
2025-06-03 13:23:27.040 | SUCCESS | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'upload_ingest_to_hub' in 4.012s
2025-06-03 13:23:27.048 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'summarization'
2025-06-03 13:23:27.075 | INFO | yourbench.pipeline.summarization:run:213 - === Summarization v2 – map-reduce ===
2025-06-03 13:23:27.190 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 1/1 [00:00<00:00, 128.84 examples/s]
2025-06-03 13:23:30.734 | INFO | yourbench.pipeline.summarization:run:220 - Loaded 1 documents for summarisation.
2025-06-03 13:23:31.120 | INFO | yourbench.pipeline.summarization:_build_chunk_calls:119 - Prepared 13 chunk-level inference calls.
2025-06-03 13:23:31.121 | INFO | yourbench.utils.inference_engine:_load_models:232 - No models defined in model_roles for step 'summarization_chunk'. Using the first model from model_list: Qwen/Qwen2.5-VL-72B-Instruct
2025-06-03 13:23:31.121 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:23:31.121 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 13 (models=1 x calls=13)
0%| | 0/13 [00:00<?, ?it/s]
8%|█ | 1/13 [00:13<02:45, 13.82s/it]
15%|██ | 2/13 [00:18<01:29, 8.16s/it]
23%|███ | 3/13 [00:23<01:08, 6.80s/it]
31%|████ | 4/13 [00:23<00:39, 4.41s/it]
38%|████ | 5/13 [00:32<00:47, 5.90s/it]
46%|█████ | 6/13 [00:37<00:39, 5.59s/it]
54%|██████ | 7/13 [00:38<00:24, 4.01s/it]
62%|███████ | 8/13 [00:42<00:20, 4.08s/it]
69%|███████ | 9/13 [00:50<00:21, 5.26s/it]
77%|████████ | 10/13 [00:55<00:15, 5.15s/it]
85%|█████████ | 11/13 [00:55<00:07, 3.62s/it]
92%|██████████| 12/13 [01:06<00:05, 5.81s/it]
100%|██████████| 13/13 [01:12<00:00, 6.01s/it]
100%|██████████| 13/13 [01:12<00:00, 5.59s/it]
2025-06-03 13:24:43.774 | SUCCESS | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:24:43.777 | INFO | yourbench.pipeline.summarization:_build_combine_calls:180 - Prepared 1 reducer calls (0 docs skipped – single / empty chunk).
2025-06-03 13:24:43.777 | INFO | yourbench.utils.inference_engine:_load_models:232 - No models defined in model_roles for step 'summarization_combine'. Using the first model from model_list: Qwen/Qwen2.5-VL-72B-Instruct
2025-06-03 13:24:43.778 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:24:43.778 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 1 (models=1 x calls=1)
0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:02<00:00, 2.04s/it]
2025-06-03 13:24:45.823 | SUCCESS | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:24:46.027 | INFO | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 219.80ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.
Uploading the dataset shards: 100%|██████████| 1/1 [00:00<00:00, 1.00it/s]
2025-06-03 13:24:48.837 | SUCCESS | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:24:48.837 | SUCCESS | yourbench.pipeline.summarization:run:253 - Hierarchical summarisation completed (1 documents).
2025-06-03 13:24:48.838 | SUCCESS | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'summarization' in 81.789s
2025-06-03 13:24:48.842 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'chunking'
2025-06-03 13:24:50.618 | INFO | yourbench.pipeline.chunking:<module>:73 - PyTorch is available.
2025-06-03 13:24:51.747 | INFO | yourbench.pipeline.chunking:<module>:95 - Transformers library is available.
2025-06-03 13:24:51.747 | INFO | yourbench.pipeline.chunking:<module>:111 - Could not load perplexity metric from 'evaluate'. Skipping perplexity. Error: No module named 'evaluate'
2025-06-03 13:24:51.748 | INFO | yourbench.pipeline.chunking:<module>:122 - Package 'textstat' not installed. Readability metrics will be skipped.
2025-06-03 13:24:51.750 | INFO | yourbench.pipeline.chunking:run:202 - Starting chunking stage...
2025-06-03 13:24:51.857 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 1/1 [00:00<00:00, 125.57 examples/s]
2025-06-03 13:24:54.390 | INFO | yourbench.pipeline.chunking:run:206 - Loaded summarized subset with 1 rows for chunking.
2025-06-03 13:24:54.391 | INFO | yourbench.pipeline.chunking:run:266 - Using fast_chunking mode: purely length-based chunking with no embeddings.
2025-06-03 13:24:54.391 | INFO | yourbench.pipeline.chunking:run:277 - Starting chunking process for 1 documents
Chunking documents: 0%| | 0/1 [00:00<?, ?it/s]
2025-06-03 13:24:54.392 | INFO | yourbench.pipeline.chunking:run:281 - [1/1] Processing document ID=6be8c4d6-1dfc-42fa-82d7-99a4052e8760 (740608 chars)
2025-06-03 13:24:54.392 | INFO | yourbench.pipeline.chunking:run:286 - [0] doc_id=6be8c4d6-1dfc-42fa-82d7-99a4052e8760 | text_len=740608 | preview='Final Report\nhttps://sherpa.esma.europa.eu/sites/\nGuidelines for reporting under EMIR\nMKT/MDP/Groups'
2025-06-03 13:24:54.392 | INFO | yourbench.pipeline.chunking:run:305 - Progress: 100.0% | Completed 1/1 documents
2025-06-03 13:24:54.392 | INFO | yourbench.pipeline.chunking:run:306 - Avg time per doc: 0.00s | Est. remaining: 0.0 minutes
2025-06-03 13:24:54.403 | INFO | yourbench.pipeline.chunking:run:363 - [6be8c4d6-1dfc-42fa-82d7-99a4052e8760] Performing fast_chunking on 18458 sentences (l_max_tokens=512)
2025-06-03 13:24:54.515 | INFO | yourbench.pipeline.chunking:_multihop_chunking:627 - Starting multi-hop chunking, total single chunks: 403
2025-06-03 13:24:54.515 | WARNING | yourbench.pipeline.chunking:_multihop_chunking:648 - Target 403 is too high for given sample size and effective_h_max
2025-06-03 13:24:54.515 | INFO | yourbench.pipeline.chunking:_multihop_chunking:651 - Targeting ~80 multi-hop chunks, effective h_max: 5, h_min: 2
2025-06-03 13:24:54.516 | INFO | yourbench.pipeline.chunking:_multihop_chunking:672 - Generated 80 unique index combinations.
2025-06-03 13:24:54.516 | INFO | yourbench.pipeline.chunking:_multihop_chunking:688 - Created 80 multi-hop chunks.
Chunking documents: 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 5.86it/s]
Chunking documents: 100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 5.85it/s]
2025-06-03 13:24:54.704 | INFO | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 132.05ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.
Uploading the dataset shards: 100%|██████████| 1/1 [00:00<00:00, 1.05it/s]
2025-06-03 13:24:56.777 | SUCCESS | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:24:56.777 | SUCCESS | yourbench.pipeline.chunking:run:411 - Chunking stage completed successfully.
2025-06-03 13:24:56.778 | SUCCESS | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'chunking' in 7.936s
2025-06-03 13:24:56.782 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'single_shot_question_generation'
2025-06-03 13:24:56.889 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 1/1 [00:00<00:00, 98.50 examples/s]
2025-06-03 13:24:59.839 | INFO | yourbench.pipeline.single_shot_question_generation:run:117 - Loaded chunked subset with 1 rows for Single-shot question generation.
2025-06-03 13:24:59.854 | INFO | yourbench.pipeline.single_shot_question_generation:_execute_inference:255 - Sending 5 calls to inference for single-shot question generation.
2025-06-03 13:24:59.854 | INFO | yourbench.utils.inference_engine:_load_models:241 - Found 1 models in config for step 'single_shot_question_generation': ['Qwen/Qwen2.5-72B-Instruct']
2025-06-03 13:24:59.856 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:24:59.856 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 5 (models=1 x calls=5)
0%| | 0/5 [00:00<?, ?it/s]
20%|██ | 1/5 [00:17<01:10, 17.54s/it]
40%|████ | 2/5 [00:18<00:22, 7.66s/it]
60%|██████ | 3/5 [00:18<00:08, 4.31s/it]
80%|████████ | 4/5 [00:22<00:04, 4.00s/it]
100%|██████████| 5/5 [00:22<00:00, 2.82s/it]
100%|██████████| 5/5 [00:22<00:00, 4.57s/it]
2025-06-03 13:25:22.715 | SUCCESS | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:25:22.717 | INFO | yourbench.pipeline.single_shot_question_generation:_process_responses_and_build_dataset:279 - Processing 5 responses from model: Qwen/Qwen2.5-72B-Instruct
2025-06-03 13:25:22.718 | INFO | yourbench.pipeline.single_shot_question_generation:_process_responses_and_build_dataset:338 - Constructing final dataset with 28 single-hop questions.
2025-06-03 13:25:22.880 | INFO | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 583.76ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.
Uploading the dataset shards: 100%|██████████| 1/1 [00:00<00:00, 1.54it/s]
2025-06-03 13:25:24.901 | SUCCESS | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:25:24.902 | SUCCESS | yourbench.pipeline.single_shot_question_generation:run:134 - Single-shot question generation completed successfully.
2025-06-03 13:25:24.902 | SUCCESS | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'single_shot_question_generation' in 28.120s
2025-06-03 13:25:24.906 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'multi_hop_question_generation'
2025-06-03 13:25:25.027 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 1/1 [00:00<00:00, 109.85 examples/s]
2025-06-03 13:25:28.307 | INFO | yourbench.pipeline.multi_hop_question_generation:run:145 - Loaded chunked subset with 1 rows for Multi-hop question generation.
2025-06-03 13:25:28.317 | INFO | yourbench.pipeline.multi_hop_question_generation:_multihop_qa_generation:268 - Sending 24 multi-hop calls to inference...
2025-06-03 13:25:28.317 | INFO | yourbench.utils.inference_engine:_load_models:241 - Found 1 models in config for step 'multi_hop_question_generation': ['Qwen/Qwen2.5-72B-Instruct']
2025-06-03 13:25:28.318 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:25:28.318 | INFO | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 24 (models=1 x calls=24)
0%| | 0/24 [00:00<?, ?it/s]
4%|█ | 1/24 [00:24<09:29, 24.78s/it]
8%|█ | 2/24 [00:24<03:46, 10.29s/it]
12%|██ | 3/24 [00:28<02:28, 7.09s/it]
17%|██ | 4/24 [00:29<01:33, 4.66s/it]
21%|███ | 5/24 [00:29<00:58, 3.08s/it]
25%|███ | 6/24 [00:30<00:43, 2.41s/it]
29%|███ | 7/24 [00:31<00:31, 1.87s/it]
33%|████ | 8/24 [00:31<00:22, 1.40s/it]
38%|████ | 9/24 [00:32<00:16, 1.12s/it]
42%|█████ | 10/24 [00:32<00:13, 1.04it/s]
46%|█████ | 11/24 [00:32<00:09, 1.41it/s]
50%|█████ | 12/24 [00:33<00:08, 1.49it/s]
54%|██████ | 13/24 [00:33<00:06, 1.70it/s]
58%|██████ | 14/24 [00:34<00:05, 1.72it/s]
62%|███████ | 15/24 [00:35<00:05, 1.69it/s]
71%|████████ | 17/24 [00:36<00:04, 1.71it/s]
75%|████████ | 18/24 [00:36<00:02, 2.11it/s]
79%|████████ | 19/24 [00:36<00:02, 2.02it/s]
83%|█████████ | 20/24 [00:37<00:01, 2.02it/s]
88%|█████████ | 21/24 [00:38<00:01, 1.75it/s]
92%|██████████| 22/24 [00:40<00:02, 1.03s/it]
96%|██████████| 23/24 [00:41<00:01, 1.07s/it]
100%|██████████| 24/24 [00:42<00:00, 1.12s/it]
100%|██████████| 24/24 [00:42<00:00, 1.78s/it]
2025-06-03 13:26:11.112 | SUCCESS | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:26:11.115 | INFO | yourbench.pipeline.multi_hop_question_generation:_parse_and_build_final:288 - Processing 24 responses for model: Qwen/Qwen2.5-72B-Instruct
2025-06-03 13:26:11.119 | INFO | yourbench.pipeline.multi_hop_question_generation:_parse_and_build_final:340 - Constructing multi-hop question dataset with 102 rows...
2025-06-03 13:26:11.227 | INFO | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Uploading the dataset shards: 0%| | 0/1 [00:00<?, ?it/s]
Creating parquet from Arrow format: 0%| | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 652.10ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.
Uploading the dataset shards: 100%|██████████| 1/1 [00:00<00:00, 1.52it/s]
2025-06-03 13:26:13.141 | SUCCESS | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:26:13.141 | SUCCESS | yourbench.pipeline.multi_hop_question_generation:run:164 - Multi-hop question generation completed successfully.
2025-06-03 13:26:13.141 | SUCCESS | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'multi_hop_question_generation' in 48.235s
2025-06-03 13:26:13.146 | INFO | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'lighteval'
2025-06-03 13:26:13.146 | INFO | yourbench.pipeline.lighteval:run:88 - Saving lighteval compatible dataset
2025-06-03 13:26:13.258 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/28 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 28/28 [00:00<00:00, 6078.07 examples/s]
2025-06-03 13:26:16.564 | INFO | yourbench.pipeline.lighteval:run:95 - Loaded single-shot Q subset single_shot_questions with 28 rows.
2025-06-03 13:26:16.666 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/102 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 102/102 [00:00<00:00, 8422.30 examples/s]
2025-06-03 13:26:19.628 | INFO | yourbench.pipeline.lighteval:run:102 - Loaded multi-hop Q subset multi_hop_subset with 102 rows.
2025-06-03 13:26:19.754 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
Generating train split: 0%| | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|██████████| 1/1 [00:00<00:00, 73.73 examples/s]
2025-06-03 13:26:22.281 | INFO | yourbench.pipeline.lighteval:run:109 - Loaded chunked subset with 1 rows.
2025-06-03 13:26:22.368 | INFO | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
So I refreshed and "lost my progress", but it seems that if I add the dataset name on the first page, I can click on the leaderboard task on the third page :-)
Dataset: https://huggingface.co/datasets/Tonic/ESMA-Auto-Bench
Lighteval is still running:
Logs are not displayed, which is too bad, because it's the same problem as before: I can't tell whether it's hanging, since there's no feedback :-)
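If it helps, here is a rough sketch of how the Space could stream the pipeline's stdout into the UI so there is always feedback. The command and component names are my assumptions, not the Space's real code:

```python
# Rough sketch, assuming the pipeline runs as a subprocess; names are illustrative.
import subprocess
import gradio as gr

def run_pipeline_with_live_logs():
    proc = subprocess.Popen(
        ["yourbench", "run", "--config", "config.yml"],  # assumed command
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    log_text = ""
    for line in proc.stdout:   # arrives line by line as the pipeline prints
        log_text += line
        yield log_text         # each yield re-renders the textbox with fresh logs

with gr.Blocks() as demo:
    logs = gr.Textbox(label="Pipeline logs", lines=20)
    gr.Button("Run").click(run_pipeline_with_live_logs, outputs=logs)
```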
Leaderboard is published :-) https://huggingface.co/spaces/Tonic/leaderboard_yourbench_Tonic_ESMA-Auto-Bench
I had to "solve a 500 for some users" (me) by setting ssr_mode=False.
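For reference, the workaround is just a launch flag. ssr_mode is a real parameter of Gradio's launch() in recent versions; the rest of this snippet is only illustrative:

```python
import gradio as gr

with gr.Blocks() as demo:
    ...  # the app UI

# Disabling server-side rendering avoided the 500 for me.
demo.launch(ssr_mode=False)
```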
Hope this helps!
Going to make a fork for auto code bench now :-)