
Process Displays Running But Might Be Hanging

#2 opened by Tonic

My bright idea is to increase the log feedback during this stage, because it's time-consuming and the user isn't well oriented on what to expect :-)
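
Concretely, something like a periodic heartbeat inside each long stage would go a long way. A rough sketch of what I mean (the names are made up, not from the yourbench codebase), using loguru, which the pipeline already logs with:

```python
import time
from loguru import logger

def run_with_heartbeat(stage_name, items, process_one, every_s=15.0):
    """Hypothetical wrapper: emit a progress line every `every_s` seconds
    so a long stage never looks hung."""
    start = last = time.monotonic()
    for i, item in enumerate(items, 1):
        process_one(item)
        now = time.monotonic()
        if now - last >= every_s:
            logger.info(f"[{stage_name}] {i}/{len(items)} done, "
                        f"{now - start:.0f}s elapsed - still working...")
            last = now
    logger.success(f"[{stage_name}] all {len(items)} items done in "
                   f"{time.monotonic() - start:.1f}s")
```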

Screen:

image.png

Logs:

2025-06-03 13:22:48.880 | INFO     | yourbench.main:run:74 - Running pipeline with config: /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/config.yml
2025-06-03 13:22:48.885 | INFO     | yourbench.utils.loading_engine:load_config:81 - Configuration loaded successfully from /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/config.yml
2025-06-03 13:22:48.885 | INFO     | yourbench.pipeline.handler:run_pipeline:90 - Debug mode set to False
2025-06-03 13:22:48.890 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'ingestion'
/home/user/app/.venv/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2025-06-03 13:22:50.403 | INFO     | yourbench.pipeline.ingestion:_initialize_markdown_processor:255 - Initializing MarkItDown with LLM support: model='Qwen/Qwen2.5-VL-72B-Instruct'.
2025-06-03 13:22:50.430 | INFO     | yourbench.pipeline.ingestion:run:195 - Ingestion stage: Converting files from '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/' to '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested'...
2025-06-03 13:23:23.024 | INFO     | yourbench.pipeline.ingestion:_convert_document_to_markdown:303 - Successfully converted '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/esma74-362-2281_final_report_guidelines_emir_refit.pdf' -> '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested/esma74-362-2281_final_report_guidelines_emir_refit.md'.
2025-06-03 13:23:23.024 | SUCCESS  | yourbench.pipeline.ingestion:run:210 - Ingestion stage complete: Processed files from '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/uploaded_files/' and saved Markdown to '/home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested'.
2025-06-03 13:23:23.024 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'ingestion' in 34.135s
2025-06-03 13:23:23.029 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'upload_ingest_to_hub'
2025-06-03 13:23:23.582 | INFO     | yourbench.pipeline.upload_ingest_to_hub:run:137 - Using source directory: /home/user/app/94d74d5e-e8d7-4b90-8443-283fa5a3515b/ingested
2025-06-03 13:23:23.689 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 244.20ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:01<00:00,  1.29s/it]
2025-06-03 13:23:27.040 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:23:27.040 | SUCCESS  | yourbench.pipeline.upload_ingest_to_hub:run:154 - Successfully completed 'upload_ingest_to_hub' stage.
2025-06-03 13:23:27.040 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'upload_ingest_to_hub' in 4.012s
2025-06-03 13:23:27.048 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'summarization'
2025-06-03 13:23:27.075 | INFO     | yourbench.pipeline.summarization:run:213 - === Summarization v2 – map-reduce ===
2025-06-03 13:23:27.190 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 128.84 examples/s]
2025-06-03 13:23:30.734 | INFO     | yourbench.pipeline.summarization:run:220 - Loaded 1 documents for summarisation.
2025-06-03 13:23:31.120 | INFO     | yourbench.pipeline.summarization:_build_chunk_calls:119 - Prepared 13 chunk-level inference calls.
2025-06-03 13:23:31.121 | INFO     | yourbench.utils.inference_engine:_load_models:232 - No models defined in model_roles for step 'summarization_chunk'. Using the first model from model_list: Qwen/Qwen2.5-VL-72B-Instruct
2025-06-03 13:23:31.121 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:23:31.121 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 13  (models=1  x  calls=13)

  0%|          | 0/13 [00:00<?, ?it/s]
  8%|β–Š         | 1/13 [00:13<02:45, 13.82s/it]
 15%|β–ˆβ–Œ        | 2/13 [00:18<01:29,  8.16s/it]
 23%|β–ˆβ–ˆβ–Ž       | 3/13 [00:23<01:08,  6.80s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 4/13 [00:23<00:39,  4.41s/it]
 38%|β–ˆβ–ˆβ–ˆβ–Š      | 5/13 [00:32<00:47,  5.90s/it]
 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 6/13 [00:37<00:39,  5.59s/it]
 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 7/13 [00:38<00:24,  4.01s/it]
 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 8/13 [00:42<00:20,  4.08s/it]
 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 9/13 [00:50<00:21,  5.26s/it]
 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹  | 10/13 [00:55<00:15,  5.15s/it]
 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 11/13 [00:55<00:07,  3.62s/it]
 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 12/13 [01:06<00:05,  5.81s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 13/13 [01:12<00:00,  5.59s/it]
2025-06-03 13:24:43.774 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:24:43.777 | INFO     | yourbench.pipeline.summarization:_build_combine_calls:180 - Prepared 1 reducer calls (0 docs skipped – single / empty chunk).
2025-06-03 13:24:43.777 | INFO     | yourbench.utils.inference_engine:_load_models:232 - No models defined in model_roles for step 'summarization_combine'. Using the first model from model_list: Qwen/Qwen2.5-VL-72B-Instruct
2025-06-03 13:24:43.778 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:24:43.778 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 1  (models=1  x  calls=1)

  0%|          | 0/1 [00:00<?, ?it/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:02<00:00,  2.04s/it]
2025-06-03 13:24:45.823 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:24:46.027 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 219.80ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.00it/s]
2025-06-03 13:24:48.837 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:24:48.837 | SUCCESS  | yourbench.pipeline.summarization:run:253 - Hierarchical summarisation completed (1 documents).
2025-06-03 13:24:48.838 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'summarization' in 81.789s
2025-06-03 13:24:48.842 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'chunking'
2025-06-03 13:24:50.618 | INFO     | yourbench.pipeline.chunking:<module>:73 - PyTorch is available.
2025-06-03 13:24:51.747 | INFO     | yourbench.pipeline.chunking:<module>:95 - Transformers library is available.
2025-06-03 13:24:51.747 | INFO     | yourbench.pipeline.chunking:<module>:111 - Could not load perplexity metric from 'evaluate'. Skipping perplexity. Error: No module named 'evaluate'
2025-06-03 13:24:51.748 | INFO     | yourbench.pipeline.chunking:<module>:122 - Package 'textstat' not installed. Readability metrics will be skipped.
2025-06-03 13:24:51.750 | INFO     | yourbench.pipeline.chunking:run:202 - Starting chunking stage...
2025-06-03 13:24:51.857 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 125.57 examples/s]
2025-06-03 13:24:54.390 | INFO     | yourbench.pipeline.chunking:run:206 - Loaded summarized subset with 1 rows for chunking.
2025-06-03 13:24:54.391 | INFO     | yourbench.pipeline.chunking:run:266 - Using fast_chunking mode: purely length-based chunking with no embeddings.
2025-06-03 13:24:54.391 | INFO     | yourbench.pipeline.chunking:run:277 - Starting chunking process for 1 documents

Chunking documents:   0%|                                                     | 0/1 [00:00<?, ?it/s]
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:281 - [1/1] Processing document ID=6be8c4d6-1dfc-42fa-82d7-99a4052e8760 (740608 chars)
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:286 - [0] doc_id=6be8c4d6-1dfc-42fa-82d7-99a4052e8760 | text_len=740608 | preview='Final Report\nhttps://sherpa.esma.europa.eu/sites/\nGuidelines for reporting under EMIR\nMKT/MDP/Groups'
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:305 - Progress: 100.0% | Completed 1/1 documents
2025-06-03 13:24:54.392 | INFO     | yourbench.pipeline.chunking:run:306 - Avg time per doc: 0.00s | Est. remaining: 0.0 minutes
2025-06-03 13:24:54.403 | INFO     | yourbench.pipeline.chunking:run:363 - [6be8c4d6-1dfc-42fa-82d7-99a4052e8760] Performing fast_chunking on 18458 sentences (l_max_tokens=512)
2025-06-03 13:24:54.515 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:627 - Starting multi-hop chunking, total single chunks: 403
2025-06-03 13:24:54.515 | WARNING  | yourbench.pipeline.chunking:_multihop_chunking:648 - Target 403 is too high for given sample size and effective_h_max
2025-06-03 13:24:54.515 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:651 - Targeting ~80 multi-hop chunks, effective h_max: 5, h_min: 2
2025-06-03 13:24:54.516 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:672 - Generated 80 unique index combinations.
2025-06-03 13:24:54.516 | INFO     | yourbench.pipeline.chunking:_multihop_chunking:688 - Created 80 multi-hop chunks.

Chunking documents: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  5.85it/s]
2025-06-03 13:24:54.704 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 132.05ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.05it/s]
2025-06-03 13:24:56.777 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:24:56.777 | SUCCESS  | yourbench.pipeline.chunking:run:411 - Chunking stage completed successfully.
2025-06-03 13:24:56.778 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'chunking' in 7.936s
2025-06-03 13:24:56.782 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'single_shot_question_generation'
2025-06-03 13:24:56.889 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 98.50 examples/s]
2025-06-03 13:24:59.839 | INFO     | yourbench.pipeline.single_shot_question_generation:run:117 - Loaded chunked subset with 1 rows for Single-shot question generation.
2025-06-03 13:24:59.854 | INFO     | yourbench.pipeline.single_shot_question_generation:_execute_inference:255 - Sending 5 calls to inference for single-shot question generation.
2025-06-03 13:24:59.854 | INFO     | yourbench.utils.inference_engine:_load_models:241 - Found 1 models in config for step 'single_shot_question_generation': ['Qwen/Qwen2.5-72B-Instruct']
2025-06-03 13:24:59.856 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:24:59.856 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 5  (models=1  x  calls=5)

  0%|          | 0/5 [00:00<?, ?it/s]
 20%|β–ˆβ–ˆ        | 1/5 [00:17<01:10, 17.54s/it]
 40%|β–ˆβ–ˆβ–ˆβ–ˆ      | 2/5 [00:18<00:22,  7.66s/it]
 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 3/5 [00:18<00:08,  4.31s/it]
 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  | 4/5 [00:22<00:04,  4.00s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:22<00:00,  4.57s/it]
2025-06-03 13:25:22.715 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:25:22.717 | INFO     | yourbench.pipeline.single_shot_question_generation:_process_responses_and_build_dataset:279 - Processing 5 responses from model: Qwen/Qwen2.5-72B-Instruct
2025-06-03 13:25:22.718 | INFO     | yourbench.pipeline.single_shot_question_generation:_process_responses_and_build_dataset:338 - Constructing final dataset with 28 single-hop questions.
2025-06-03 13:25:22.880 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 583.76ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.54it/s]
2025-06-03 13:25:24.901 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:25:24.902 | SUCCESS  | yourbench.pipeline.single_shot_question_generation:run:134 - Single-shot question generation completed successfully.
2025-06-03 13:25:24.902 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'single_shot_question_generation' in 28.120s
2025-06-03 13:25:24.906 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'multi_hop_question_generation'
2025-06-03 13:25:25.027 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 109.85 examples/s]
2025-06-03 13:25:28.307 | INFO     | yourbench.pipeline.multi_hop_question_generation:run:145 - Loaded chunked subset with 1 rows for Multi-hop question generation.
2025-06-03 13:25:28.317 | INFO     | yourbench.pipeline.multi_hop_question_generation:_multihop_qa_generation:268 - Sending 24 multi-hop calls to inference...
2025-06-03 13:25:28.317 | INFO     | yourbench.utils.inference_engine:_load_models:241 - Found 1 models in config for step 'multi_hop_question_generation': ['Qwen/Qwen2.5-72B-Instruct']
2025-06-03 13:25:28.318 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:167 - Starting asynchronous inference with per-model concurrency control.
2025-06-03 13:25:28.318 | INFO     | yourbench.utils.inference_engine:_run_inference_async_helper:190 - Total tasks scheduled: 24  (models=1  x  calls=24)

  0%|          | 0/24 [00:00<?, ?it/s]
  4%|▍         | 1/24 [00:24<09:29, 24.78s/it]
  8%|β–Š         | 2/24 [00:24<03:46, 10.29s/it]
 12%|β–ˆβ–Ž        | 3/24 [00:28<02:28,  7.09s/it]
 17%|β–ˆβ–‹        | 4/24 [00:29<01:33,  4.66s/it]
 21%|β–ˆβ–ˆ        | 5/24 [00:29<00:58,  3.08s/it]
 25%|β–ˆβ–ˆβ–Œ       | 6/24 [00:30<00:43,  2.41s/it]
 29%|β–ˆβ–ˆβ–‰       | 7/24 [00:31<00:31,  1.87s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 8/24 [00:31<00:22,  1.40s/it]
 38%|β–ˆβ–ˆβ–ˆβ–Š      | 9/24 [00:32<00:16,  1.12s/it]
 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 10/24 [00:32<00:13,  1.04it/s]
 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 11/24 [00:32<00:09,  1.41it/s]
 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 12/24 [00:33<00:08,  1.49it/s]
 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 13/24 [00:33<00:06,  1.70it/s]
 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š    | 14/24 [00:34<00:05,  1.72it/s]
 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 15/24 [00:35<00:05,  1.69it/s]
 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 17/24 [00:36<00:04,  1.71it/s]
 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 18/24 [00:36<00:02,  2.11it/s]
 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰  | 19/24 [00:36<00:02,  2.02it/s]
 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 20/24 [00:37<00:01,  2.02it/s]
 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 21/24 [00:38<00:01,  1.75it/s]
 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 22/24 [00:40<00:02,  1.03s/it]
 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 23/24 [00:41<00:01,  1.07s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 24/24 [00:42<00:00,  1.78s/it]
2025-06-03 13:26:11.112 | SUCCESS  | yourbench.utils.inference_engine:_run_inference_async_helper:199 - Completed parallel inference for all models.
2025-06-03 13:26:11.115 | INFO     | yourbench.pipeline.multi_hop_question_generation:_parse_and_build_final:288 - Processing 24 responses for model: Qwen/Qwen2.5-72B-Instruct
2025-06-03 13:26:11.119 | INFO     | yourbench.pipeline.multi_hop_question_generation:_parse_and_build_final:340 - Constructing multi-hop question dataset with 102 rows...
2025-06-03 13:26:11.227 | INFO     | yourbench.utils.dataset_engine:custom_save_dataset:204 - Pushing dataset to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating parquet from Arrow format: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 652.10ba/s]
Uploading files as bytes or binary IO objects is not supported by Xet Storage. Falling back to HTTP upload.

Uploading the dataset shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  1.52it/s]
2025-06-03 13:26:13.141 | SUCCESS  | yourbench.utils.dataset_engine:custom_save_dataset:210 - Dataset successfully pushed to HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'
2025-06-03 13:26:13.141 | SUCCESS  | yourbench.pipeline.multi_hop_question_generation:run:164 - Multi-hop question generation completed successfully.
2025-06-03 13:26:13.141 | SUCCESS  | yourbench.pipeline.handler:run_pipeline:153 - Completed stage: 'multi_hop_question_generation' in 48.235s
2025-06-03 13:26:13.146 | INFO     | yourbench.pipeline.handler:run_pipeline:126 - Starting execution of stage: 'lighteval'
2025-06-03 13:26:13.146 | INFO     | yourbench.pipeline.lighteval:run:88 - Saving lighteval compatible dataset
2025-06-03 13:26:13.258 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/28 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 28/28 [00:00<00:00, 6078.07 examples/s]
2025-06-03 13:26:16.564 | INFO     | yourbench.pipeline.lighteval:run:95 - Loaded single-shot Q subset single_shot_questions with 28 rows.
2025-06-03 13:26:16.666 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/102 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 102/102 [00:00<00:00, 8422.30 examples/s]
2025-06-03 13:26:19.628 | INFO     | yourbench.pipeline.lighteval:run:102 - Loaded multi-hop Q subset multi_hop_subset with 102 rows.
2025-06-03 13:26:19.754 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

Generating train split:   0%|          | 0/1 [00:00<?, ? examples/s]
Generating train split: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 73.73 examples/s]
2025-06-03 13:26:22.281 | INFO     | yourbench.pipeline.lighteval:run:109 - Loaded chunked subset with 1 rows.
2025-06-03 13:26:22.368 | INFO     | yourbench.utils.dataset_engine:custom_load_dataset:154 - Loading dataset HuggingFace Hub with repo_id='Tonic/ESMA-Auto-Bench'

So I refreshed and "lost my progress", but it seems that if I enter the dataset name on the first page, I can click on the leaderboard task on the third page :-)

Dataset: https://huggingface.co/datasets/Tonic/ESMA-Auto-Bench
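
For the refresh case: since every stage already pushes its output to the Hub, the UI could in principle re-attach to a run just from the dataset name. A rough sketch (the helper is hypothetical, not from the codebase), using huggingface_hub:

```python
from huggingface_hub import HfApi

def can_resume(repo_id: str) -> bool:
    """Hypothetical helper: an earlier (possibly interrupted) run leaves
    its subsets on the Hub, so the dataset's existence is enough to offer
    a 'resume' path instead of starting over."""
    return HfApi().repo_exists(repo_id, repo_type="dataset")

if can_resume("Tonic/ESMA-Auto-Bench"):
    print("Found an existing run - skip ahead to the leaderboard step.")
```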

lighteval is still running:

image.png

Logs are not displayed, which is too bad, because it's the same problem as before: I'm not sure whether it's hanging, since there's no feedback :-)
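
For reference, the kind of feedback I mean could be as simple as streaming the step's stdout into the page. A rough sketch (the command is a placeholder, not the Space's actual invocation), using a Gradio generator function, which streams each yielded value to the UI:

```python
import subprocess
import gradio as gr

def run_with_live_logs():
    # Placeholder command; the real Space would launch its lighteval step here.
    proc = subprocess.Popen(
        ["lighteval", "--help"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    lines = []
    for line in proc.stdout:
        lines.append(line.rstrip())
        # Yielding from the handler makes Gradio push updates live.
        yield "\n".join(lines[-40:])  # keep the last 40 lines visible

with gr.Blocks() as demo:
    log_box = gr.Textbox(label="lighteval logs", lines=20)
    gr.Button("Run evaluation").click(run_with_live_logs, outputs=log_box)

demo.launch()
```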

Leaderboard is published :-) https://huggingface.co/spaces/Tonic/leaderboard_yourbench_Tonic_ESMA-Auto-Bench

I had to solve "a 500 for some users" (me) by setting `ssr_mode=False`.
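
In case it helps anyone else, the fix was just the launch flag (Gradio 5 enables server-side rendering by default):

```python
demo.launch(ssr_mode=False)  # disables SSR, which was returning 500s for me
```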

Hope this helps!

Going to make a fork for auto code bench now :-)

Tonic changed discussion status to closed
Your Bench org

Hi @Tonic, just to clarify: what precisely did you want? A bit more info when running lighteval?
