
Process Displays Running But Might Be Hanging

#2 opened by Tonic

My bright idea is to increase the log feedback during this stage, because it's time-consuming and the user isn't well oriented on what to expect :-)
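
Concretely, something like a periodic heartbeat inside each long stage would go a long way. A rough sketch of what I mean (the names are made up, not from the yourbench codebase), using loguru, which the pipeline already logs with:

```python
import time
from loguru import logger

def run_with_heartbeat(stage_name, items, process_one, every_s=15.0):
    """Hypothetical wrapper: emit a progress line every `every_s` seconds
    so a long stage never looks hung."""
    start = last = time.monotonic()
    for i, item in enumerate(items, 1):
        process_one(item)
        now = time.monotonic()
        if now - last >= every_s:
            logger.info(f"[{stage_name}] {i}/{len(items)} done, "
                        f"{now - start:.0f}s elapsed - still working...")
            last = now
    logger.success(f"[{stage_name}] all {len(items)} items done in "
                   f"{time.monotonic() - start:.1f}s")
```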

Screen:

image.png

Logs:

2025-06-03 13:22:48.880 | INFO     | yourbench.main:run:74 - Running pipeline with config: /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/config.yml
2025-06-03 13:22:48.885 | INFO     | yourbench.utils.loading_engine:load_config:81 - Configuration loaded successfully from /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/config.yml
2025-06-03 13:22:48.885 | INFO     | yourbench.pipeline.handler:run_pipeline:90 - Debug mode set to False
2025-06-03 13:22:48.890 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'ingestion'
/home/user/app/.venv/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2025-06-03 13:22:50.403 | INFO     | yourbench.pipeline.ingestion:_initialize_markdown_processor:255 - Initializing MarkItDown with LLM support: model='Qwen/Qwen2.5-VL-72B-Instruct'.
2025-06-03 13:22:50.430 | INFO     | yourbench.pipeline.ingestion:run:195 - Ingestion stage: Converting files from '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/' to '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested'...
2025-06-03 13:23:23.024 | INFO     | yourbench.pipeline.ingestion:_convert_document_to_markdown:303 - Successfully converted '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/esma74-362-2281_final_report_guidelines_emir_refit.pdf' -> '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested/esma74-362-2281_final_report_guidelines_emir_refit.md'.
2025-06-03 13:23:23.024 | SUCCESS  | yourbench.pipeline.ingestion:run:210 - Ingestion stage complete: Processed files from '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/' and saved Markdown to '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested'.
2025-06-03 13:23:23.024 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'ingestion' in 34.135s
2025-06-03 13:23:23.029 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'upload_ingest_to_hub'
2025-06-03 13:23:23.582 | INFO     | yourbench.pipeline.upload_ingest_to_hub:run:137 - Using source directory: /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested
2025-06-03 13:23:23.689 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 244.20ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:01<00:00,  1.29s/it]
2025-06-03 13:23:27.040 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:23:27.040 | SUCCESS  | yourbench.pipeline.upload_ingest_to_hub:run:154 - Successfully completed 'upload_ingest_to_hub' stage.
2025-06-03 13:23:27.040 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'upload_ingest_to_hub' in 4.012s
2025-06-03 13:23:27.048 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'summarization'
2025-06-03 13:23:27.075 | INFO     | yourbench.pipeline.summarization:run:213 - === Summarization v2 – map-reduce ===
2025-06-03 13:23:27.190 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 128.84 examples/s]
2025-06-03 13:23:30.734 | INFO     | yourbench.pipeline.summarization:run:220 - Loaded 1 documents for summarisation.
2025-06-03 13:23:31.120 | INFO     | yourbench.pipeline.summarization:_build_chunk_calls:119 - Prepared 13 chunk-level inference calls.
2025-06-03 13:23:31.121 | INFO     | yourbench.utils.inference_engine:_load_models:232 - No models defined in model_roles for step 'summarization_chunk'. Using the first model from model_list: Qwen/Qwen2.5-VL-72B-Instruct
2025-06-03 13:23:31.121 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:23:31.121 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 13  (models=1  x  calls=13)

  0%|          | 0/13 [00:00<?, ?it/s]
  8%|β–Š         | 1/13 [00:13<02:45, 13.82s/it]
 15%|β–ˆβ–Œ        | 2/13 [00:18<01:29,  8.16s/it]
 23%|β–ˆβ–ˆβ–Ž       | 3/13 [00:23<01:08,  6.80s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 4/13 [00:23<00:39,  4.41s/it]
 38%|β–ˆβ–ˆβ–ˆβ–Š      | 5/13 [00:32<00:47,  5.90s/it]
 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 6/13 [00:37<00:39,  5.59s/it]
 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 7/13 [00:38<00:24,  4.01s/it]
 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 8/13 [00:42<00:20,  4.08s/it]
 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 9/13 [00:50<00:21,  5.26s/it]
 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹  | 10/13 [00:55<00:15,  5.15s/it]
 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 11/13 [00:55<00:07,  3.62s/it]
 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 12/13 [01:06<00:05,  5.81s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 13/13 [01:12<00:00,  5.59s/it]
2025-06-03 13:24:43.774 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:24:43.777 | INFO     | yourbench.pipeline.summarization:_build_combine_calls:180 - Prepared 1 reducer calls (0 docs skipped – single / empty chunk).
2025-06-03 13:24:43.777 | INFO     | yourbench.utils.inference_engine:_load_models:232 - No models defined in model_roles for step 'summarization_combine'. Using the first model from model_list: Qwen/Qwen2.5-VL-72B-Instruct
2025-06-03 13:24:43.778 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:24:43.778 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 1  (models=1  x  calls=1)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:02<00:00,  2.04s/it]
2025-06-03 13:24:45.823 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:24:46.027 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 219.80ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.00it/s]
2025-06-03 13:24:48.837 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:24:48.837 | SUCCESS  | yourbench.pipeline.summarization:run:253 - Hierarchical summarisation completed (1 documents).
2025-06-03 13:24:48.838 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'summarization' in 81.789s
2025-06-03 13:24:48.842 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'chunking'
2025-06-03 13:24:50.618 | INFO     | yourbench.pipeline.chunking:<module>:73 - PyTorch is available.
2025-06-03 13:24:51.747 | INFO     | yourbench.pipeline.chunking:<module>:95 - Transformers library is available.
2025-06-03 13:24:51.747 | INFO     | yourbench.pipeline.chunking:<module>:111 - Could not load perplexity metric from 'evaluate'. Skipping perplexity. Error: No module named 'evaluate'
2025-06-03 13:24:51.748 | INFO     | yourbench.pipeline.chunking:<module>:122 - Package 'textstat' not installed. Readability metrics will be skipped.
2025-06-03 13:24:51.750 | INFO     | yourbench.pipeline.chunking:run:202 - Starting chunking stage...
2025-06-03 13:24:51.857 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 125.57 examples/s]
2025-06-03 13:24:54.390 | INFO     | yourbench.pipeline.chunking:run:206 - Loaded summarized subset with 1 rows for chunking.
2025-06-03 13:24:54.391 | INFO     | yourbench.pipeline.chunking:run:266 - Using fast_chunking mode: purely length-based chunking with no embeddings.
2025-06-03 13:24:54.391 | INFO     | yourbench.pipeline.chunking:run:277 - Starting chunking process for 1 documents

Chunking documents:   0%|                                                     | 0/1 [00:00<?, ?it/s]
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:281 - [1/1] Processing document ID=6be8c4d6-1dfc-42fa-82d7-99a4052e8760 (740608 chars)
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:286 - [0] doc_id=6be8c4d6-1dfc-42fa-82d7-99a4052e8760 | text_len=740608 | preview='Final Report\nhttps://sherpa.esma.europa.eu/sites/\nGuidelines for reporting under EMIR\nMKT/MDP/Groups'
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:305 - Progress: 100.0% | Completed 1/1 documents
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:306 - Avg time per doc: 0.00s | Est. remaining: 0.0 minutes
2025-06-03 13:24:54.403 | INFO     | yourbench.pipeline.chunking:run:363 - [6be8c4d6-1dfc-42fa-82d7-99a4052e8760] Performing fast_chunking on 18458 sentences (l_max_tokens=512)
2025-06-03 13:24:54.515 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:627 - Starting multi-hop chunking, total single chunks: 403
2025-06-03 13:24:54.515 | WARNING  | yourbench.pipeline.chunking:_multihop_chunking:648 - Target 403 is too high for given sample size and effective_h_max
2025-06-03 13:24:54.515 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:651 - Targeting ~80 multi-hop chunks, effective h_max: 5, h_min: 2
2025-06-03 13:24:54.516 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:672 - Generated 80 unique index combinations.
2025-06-03 13:24:54.516 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:688 - Created 80 multi-hop chunks.

Chunking documents: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  5.85it/s]
2025-06-03 13:24:54.704 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 132.05ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.05it/s]
2025-06-03 13:24:56.777 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:24:56.777 | SUCCESS  | yourbench.pipeline.chunking:run:411 - Chunking stage completed successfully.
2025-06-03 13:24:56.778 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'chunking' in 7.936s
2025-06-03 13:24:56.782 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'single_shot_question_generation'
2025-06-03 13:24:56.889 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 98.50 examples/s]
2025-06-03 13:24:59.839 | INFO     | yourbench.pipeline.single_shot_question_generation:run:117 - Loaded chunked subset with 1 rows for Single-shot question generation.
2025-06-03 13:24:59.854 | INFO     | yourbench.pipeline.single_shot_question_generation:_execute_inference:255 - Sending 5 calls to inference for single-shot question generation.
2025-06-03 13:24:59.854 | INFO     | yourbench.utils.inference_engine:_load_models:241 - Found 1 models in config for step 'single_shot_question_generation': ['Qwen/Qwen2.5-72B-Instruct']
2025-06-03 13:24:59.856 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:24:59.856 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 5  (models=1  x  calls=5)

  0%|          | 0/5 [00:00<?, ?it/s]
 20%|β–ˆβ–ˆ        | 1/5 [00:17<01:10, 17.54s/it]
 40%|β–ˆβ–ˆβ–ˆβ–ˆ      | 2/5 [00:18<00:22,  7.66s/it]
 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 3/5 [00:18<00:08,  4.31s/it]
 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  | 4/5 [00:22<00:04,  4.00s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:22<00:00,  4.57s/it]
2025-06-03 13:25:22.715 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:25:22.717 | INFO     | yourbench.pipeline.single_shot_question_generation:_process_responses_and_build_dataset:279 - Processing 5 responses from model: Qwen/Qwen2.5-72B-Instruct
2025-06-03 13:25:22.718 | INFO     | yourbench.pipeline.single_shot_question_generation:_process_responses_and_build_dataset:338 - Constructing final dataset with 28 single-hop questions.
2025-06-03 13:25:22.880 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 583.76ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.54it/s]
2025-06-03 13:25:24.901 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:25:24.902 | SUCCESS  | yourbench.pipeline.single_shot_question_generation:run:134 - Single-shot question generation completed successfully.
2025-06-03 13:25:24.902 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'single_shot_question_generation' in 28.120s
2025-06-03 13:25:24.906 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'multi_hop_question_generation'
2025-06-03 13:25:25.027 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 109.85 examples/s]
2025-06-03 13:25:28.307 | INFO     | yourbench.pipeline.multi_hop_question_generation:run:145 - Loaded chunked subset with 1 rows for Multi-hop question generation.
2025-06-03 13:25:28.317 | INFO     | yourbench.pipeline.multi_hop_question_generation:_multihop_qa_generation:268 - Sending 24 multi-hop calls to inference...
2025-06-03 13:25:28.317 | INFO     | yourbench.utils.inference_engine:_load_models:241 - Found 1 models in config for step 'multi_hop_question_generation': ['Qwen/Qwen2.5-72B-Instruct']
2025-06-03 13:25:28.318 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:25:28.318 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 24  (models=1  x  calls=24)

  0%|          | 0/24 [00:00<?, ?it/s]
  4%|▍         | 1/24 [00:24<09:29, 24.78s/it]
  8%|β–Š         | 2/24 [00:24<03:46, 10.29s/it]
 12%|β–ˆβ–Ž        | 3/24 [00:28<02:28,  7.09s/it]
 17%|β–ˆβ–‹        | 4/24 [00:29<01:33,  4.66s/it]
 21%|β–ˆβ–ˆ        | 5/24 [00:29<00:58,  3.08s/it]
 25%|β–ˆβ–ˆβ–Œ       | 6/24 [00:30<00:43,  2.41s/it]
 29%|β–ˆβ–ˆβ–‰       | 7/24 [00:31<00:31,  1.87s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 8/24 [00:31<00:22,  1.40s/it]
 38%|β–ˆβ–ˆβ–ˆβ–Š      | 9/24 [00:32<00:16,  1.12s/it]
 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 10/24 [00:32<00:13,  1.04it/s]
 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 11/24 [00:32<00:09,  1.41it/s]
 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 12/24 [00:33<00:08,  1.49it/s]
 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 13/24 [00:33<00:06,  1.70it/s]
 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š    | 14/24 [00:34<00:05,  1.72it/s]
 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 15/24 [00:35<00:05,  1.69it/s]
 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 17/24 [00:36<00:04,  1.71it/s]
 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 18/24 [00:36<00:02,  2.11it/s]
 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰  | 19/24 [00:36<00:02,  2.02it/s]
 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 20/24 [00:37<00:01,  2.02it/s]
 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 21/24 [00:38<00:01,  1.75it/s]
 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 22/24 [00:40<00:02,  1.03s/it]
 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 23/24 [00:41<00:01,  1.07s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 24/24 [00:42<00:00,  1.78s/it]
2025-06-03 13:26:11.112 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:26:11.115 | INFO     | yourbench.pipeline.multi_hop_question_generation:_parse_and_build_final:288 - Processing 24 responses for model: Qwen/Qwen2.5-72B-Instruct
2025-06-03 13:26:11.119 | INFO     | yourbench.pipeline.multi_hop_question_generation:_parse_and_build_final:340 - Constructing multi-hop question dataset with 102 rows...
2025-06-03 13:26:11.227 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 652.10ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.52it/s]
2025-06-03 13:26:13.141 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:26:13.141 | SUCCESS  | yourbench.pipeline.multi_hop_question_generation:run:164 - Multi-hop question generation completed successfully.
2025-06-03 13:26:13.141 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'multi_hop_question_generation' in 48.235s
2025-06-03 13:26:13.146 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'lighteval'
2025-06-03 13:26:13.146 | INFO     | yourbench.pipeline.lighteval:run:88 - Saving lighteval compatible dataset
2025-06-03 13:26:13.258 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/28 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 28/28 [00:00<00:00, 6078.07 examples/s]
2025-06-03 13:26:16.564 | INFO     | yourbench.pipeline.lighteval:run:95 - Loaded single-shot Q subset single_shot_questions with 28 rows.
2025-06-03 13:26:16.666 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/102 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 102/102 [00:00<00:00, 8422.30 examples/s]
2025-06-03 13:26:19.628 | INFO     | yourbench.pipeline.lighteval:run:102 - Loaded multi-hop Q subset multi_hop_subset with 102 rows.
2025-06-03 13:26:19.754 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 73.73 examples/s]
2025-06-03 13:26:22.281 | INFO     | yourbench.pipeline.lighteval:run:109 - Loaded chunked subset with 1 rows.
2025-06-03 13:26:22.368 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

So I refreshed and "lost my progress", but it seems that if I enter the dataset name on the first page, I can click on the leaderboard task on the third page :-)

Dataset: https://huggingface.co/datasets/Tonic/ESMA-Auto-Bench
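
For the refresh case: since every stage already pushes its output to the Hub, the UI could in principle re-attach to a run just from the dataset name. A rough sketch (the helper is hypothetical, not from the codebase), using huggingface_hub:

```python
from huggingface_hub import HfApi

def can_resume(repo_id: str) -> bool:
    """Hypothetical helper: an earlier (possibly interrupted) run leaves
    its subsets on the Hub, so the dataset's existence is enough to offer
    a 'resume' path instead of starting over."""
    return HfApi().repo_exists(repo_id, repo_type="dataset")

if can_resume("Tonic/ESMA-Auto-Bench"):
    print("Found an existing run - skip ahead to the leaderboard step.")
```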

lighteval is still running:

image.png

Logs are not displayed, which is too bad, because it's the same problem as before: I'm not sure whether it's hanging, since there's no feedback :-)
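
For reference, the kind of feedback I mean could be as simple as streaming the step's stdout into the page. A rough sketch (the command is a placeholder, not the Space's actual invocation), using a Gradio generator function, which streams each yielded value to the UI:

```python
import subprocess
import gradio as gr

def run_with_live_logs():
    # Placeholder command; the real Space would launch its lighteval step here.
    proc = subprocess.Popen(
        ["lighteval", "--help"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    lines = []
    for line in proc.stdout:
        lines.append(line.rstrip())
        # Yielding from the handler makes Gradio push updates live.
        yield "\n".join(lines[-40:])  # keep the last 40 lines visible

with gr.Blocks() as demo:
    log_box = gr.Textbox(label="lighteval logs", lines=20)
    gr.Button("Run evaluation").click(run_with_live_logs, outputs=log_box)

demo.launch()
```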

Leaderboard is published :-) https://huggingface.co/spaces/Tonic/leaderboard_yourbench_Tonic_ESMA-Auto-Bench

I had to solve "a 500 for some users" (me) by setting `ssr_mode=False`.
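
In case it helps anyone else, the fix was just the launch flag (Gradio 5 enables server-side rendering by default):

```python
demo.launch(ssr_mode=False)  # disables SSR, which was returning 500s for me
```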

Hope this helps!

Going to make a fork for auto code bench now :-)

Tonic changed discussion status to closed
Your Bench org

Hi @Tonic, just to clarify: what precisely did you want? A bit more info when running lighteval?
