[2025-05-13 22:11:03] Created output directory: train_results/google_t5-v1_1-large_ds1000_upsample1000
[2025-05-13 22:11:03] Chat mode disabled
[2025-05-13 22:11:03] Model size is 3B or smaller (0 B). Using full fine-tuning.
[2025-05-13 22:11:03] Adjusted parameters for t5 model:
[2025-05-13 22:11:03] - LEARNING_RATE: 1e-4
[2025-05-13 22:11:03] - BATCH_SIZE: 64
[2025-05-13 22:11:03] - GRADIENT_ACCUMULATION_STEPS: 1
[2025-05-13 22:11:03] No QA format data will be used
[2025-05-13 22:11:03] Limiting dataset size to: 1000 samples
[2025-05-13 22:11:03] =======================================
[2025-05-13 22:11:03] Starting training for model: google/t5-v1_1-large
[2025-05-13 22:11:03] =======================================
[2025-05-13 22:11:03] CUDA_VISIBLE_DEVICES: 2,3
[2025-05-13 22:11:03] WANDB_PROJECT: wikidyk-ar
[2025-05-13 22:11:03] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-13 22:11:03] Global Batch Size: 128
[2025-05-13 22:11:03] Data Size: 1000
[2025-05-13 22:11:03] Executing command: torchrun --nproc_per_node "2" --master-port 29512 src/train.py --model_name_or_path "google/t5-v1_1-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_t5-v1_1-large_ds1000_upsample1000" --num_upsample "1000" --per_device_train_batch_size "64" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false" --ds_size 1000
[2025-05-13 22:11:03] Training started at Tue May 13 22:11:03 UTC 2025
W0513 22:11:04.326000 523566 site-packages/torch/distributed/run.py:792]
W0513 22:11:04.326000 523566 site-packages/torch/distributed/run.py:792] *****************************************
W0513 22:11:04.326000 523566 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0513 22:11:04.326000 523566 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results/google_t5-v1_1-large_ds1000_upsample1000
You are using the default legacy behaviour of the tokenizer class. This is expected, and simply means that the `legacy` (previous) behavior will be used, so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
WARNING:root:Loading data...
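Two details above are easy to reconstruct from the launch flags. First, the reported Global Batch Size of 128 is per_device_train_batch_size 64 x 2 torchrun processes x gradient_accumulation_steps 1. Second, the legacy-tokenizer notice can be silenced by opting into the new behaviour when the tokenizer is loaded. The sketch below is an assumption about how src/train.py loads the tokenizer (its loading code is not shown in this log); only the model name, model_max_length, and the legacy flag come from the output above.

    from transformers import AutoTokenizer

    # Assumed loading call for this run's tokenizer. Passing legacy=False opts
    # into the newer tokenization behaviour described in
    # https://github.com/huggingface/transformers/pull/24565 and removes the
    # "default legacy behaviour" warning printed above.
    tokenizer = AutoTokenizer.from_pretrained(
        "google/t5-v1_1-large",
        legacy=False,
        model_max_length=32768,
    )

    # Effective global batch size implied by the launch flags:
    # 2 processes (--nproc_per_node 2) * 64 per device * 1 accumulation step = 128.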
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 1000 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 1000000
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call).
  warnings.warn(
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
[rank1]:[W513 22:11:19.931151645 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250513_221119-z17cctla
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results/google_t5-v1_1-large_ds1000_upsample1000
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/z17cctla
  0%| | 0/7813 [00:00
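For reference, the 7813-step total in the progress bar follows from the dataset arithmetic logged above: 1000 fact examples upsampled 1000x gives 1,000,000 training examples, and ceil(1,000,000 / 128) = 7813 optimizer steps for one epoch at the global batch size of 128. The FutureWarning points at src/train.py:119; the sketch below shows the change the warning suggests, assuming a transformers version that already accepts processing_class. The Trainer call is kept commented out because model, tokenizer, training_args, and data_module are variables from the warning's traceback and are not defined here.

    import math

    # Step count implied by the log: 1000 facts upsampled 1000x, global batch 128.
    total_examples = 1000 * 1000
    steps_per_epoch = math.ceil(total_examples / 128)
    print(steps_per_epoch)  # 7813, matching the progress bar total

    # Suggested replacement for the deprecated keyword at src/train.py:119:
    # pass the tokenizer as processing_class instead of tokenizer=.
    # trainer = Trainer(
    #     model=model,
    #     processing_class=tokenizer,   # was: tokenizer=tokenizer
    #     args=training_args,
    #     **data_module,
    # )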