======================================================= RESTART [10-14 20:50:22] =======================================================
======================================================= RESTART [10-14 20:50:22] =======================================================
======================================================= RESTART [10-14 21:01:22] =======================================================
======================================================= RESTART [10-14 21:01:22] =======================================================
======================================================= RESTART [10-14 21:01:22] =======================================================
======================================================= RESTART [10-14 21:01:22] =======================================================
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: kaiqiu. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.3
wandb: Run data is saved locally in /home/user/VAR/wandb/run-20241014_060242-6t1ak0ty
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run peach-waterfall-190
wandb: View project at https://wandb.ai/kaiqiu/VAR
wandb: View run at https://wandb.ai/kaiqiu/VAR/runs/6t1ak0ty
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
AUTOTUNE addmm(6840x1088, 6840x64, 64x1088)
  triton_mm_13 0.0320 ms 100.0%
  triton_mm_9 0.0333 ms 96.0%
  triton_mm_16 0.0339 ms 94.3%
  triton_mm_8 0.0352 ms 91.0%
  triton_mm_6 0.0358 ms 89.3%
  triton_mm_5 0.0366 ms 87.5%
  triton_mm_7 0.0372 ms 86.1%
  triton_mm_14 0.0378 ms 84.7%
  triton_mm_11 0.0378 ms 84.6%
  triton_mm_10 0.0387 ms 82.6%
SingleProcess AUTOTUNE benchmarking takes 2.4579 seconds and 0.0116 seconds precompiling
AUTOTUNE mm(24x1088, 1088x2176)
  triton_mm_1604 0.0147 ms 100.0%
  triton_mm_1608 0.0153 ms 96.1%
  triton_mm_1612 0.0159 ms 92.4%
  triton_mm_1616 0.0174 ms 84.3%
  mm 0.0175 ms 83.7%
  triton_mm_1603 0.0180 ms 81.4%
  triton_mm_1602 0.0186 ms 78.8%
  triton_mm_1607 0.0187 ms 78.4%
  triton_mm_1611 0.0191 ms 76.7%
  triton_mm_1601 0.0192 ms 76.4%
SingleProcess AUTOTUNE benchmarking takes 2.1954 seconds and 0.0000 seconds precompiling
AUTOTUNE addmm(6864x16384, 6864x1088, 1088x16384)
  bias_addmm 1.0885 ms 100.0%
  triton_mm_1634 1.1077 ms 98.3%
  triton_mm_1633 1.1110 ms 98.0%
  triton_mm_1635 1.2498 ms 87.1%
  triton_mm_1628 1.3124 ms 82.9%
  triton_mm_1630 1.3740 ms 79.2%
  addmm 1.3989 ms 77.8%
  triton_mm_1626 1.4118 ms 77.1%
  triton_mm_1631 1.5544 ms 70.0%
  triton_mm_1627 1.5589 ms 69.8%
SingleProcess AUTOTUNE benchmarking takes 10.9272 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(16384x6864, 6864x1088)
  triton_mm_1671 1.0396 ms 100.0%
  mm 1.0504 ms 99.0%
  triton_mm_1672 1.0724 ms 96.9%
  triton_mm_1673 1.1193 ms 92.9%
  triton_mm_1668 1.3076 ms 79.5%
  triton_mm_1666 1.3260 ms 78.4%
  triton_mm_1664 1.3297 ms 78.2%
  triton_mm_1669 1.4031 ms 74.1%
  triton_mm_1667 1.4308 ms 72.7%
  triton_mm_1665 1.4845 ms 70.0%
SingleProcess AUTOTUNE benchmarking takes 9.8000 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(2176x24, 24x1088)
  triton_mm_1696 0.0118 ms 100.0%
  triton_mm_1699 0.0120 ms 98.1%
  triton_mm_1703 0.0123 ms 95.8%
  triton_mm_1704 0.0123 ms 95.6%
  triton_mm_1697 0.0138 ms 85.3%
  triton_mm_1698 0.0140 ms 84.2%
  mm 0.0142 ms 83.1%
  triton_mm_1694 0.0142 ms 83.1%
  triton_mm_1695 0.0142 ms 83.1%
  triton_mm_1706 0.0142 ms 82.9%
SingleProcess AUTOTUNE benchmarking takes 2.2537 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(24x2176, 2176x1088)
  triton_mm_1678 0.0171 ms 100.0%
  mm 0.0179 ms 95.2%
  triton_mm_1682 0.0184 ms 92.9%
  triton_mm_1686 0.0222 ms 76.7%
  triton_mm_1690 0.0233 ms 73.1%
  triton_mm_1677 0.0243 ms 70.3%
  triton_mm_1676 0.0244 ms 69.8%
  triton_mm_1681 0.0261 ms 65.4%
  triton_mm_1685 0.0264 ms 64.7%
  triton_mm_1675 0.0270 ms 63.2%
SingleProcess AUTOTUNE benchmarking takes 2.2347 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(6864x16384, 16384x1088)
  mm 1.0811 ms 100.0%
  triton_mm_1652 1.1329 ms 95.4%
  triton_mm_1653 1.1497 ms 94.0%
  triton_mm_1654 1.1782 ms 91.8%
  triton_mm_1649 1.3285 ms 81.4%
  triton_mm_1647 1.3323 ms 81.1%
  triton_mm_1645 1.3330 ms 81.1%
  triton_mm_1648 1.3797 ms 78.4%
  triton_mm_1646 1.4519 ms 74.5%
  triton_mm_1650 1.4740 ms 73.3%
SingleProcess AUTOTUNE benchmarking takes 9.5903 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(1088x6840, 6840x64)
  mm 0.0316 ms 100.0%
  triton_mm_4874 0.0446 ms 70.8%
  triton_mm_4882 0.0502 ms 62.9%
  triton_mm_4878 0.0504 ms 62.7%
  triton_mm_4872 0.0614 ms 51.5%
  triton_mm_4873 0.0657 ms 48.1%
  triton_mm_4881 0.0684 ms 46.2%
  triton_mm_4871 0.0714 ms 44.2%
  triton_mm_4877 0.0747 ms 42.3%
  triton_mm_4880 0.0810 ms 39.0%
SingleProcess AUTOTUNE benchmarking takes 2.2249 seconds and 0.1080 seconds precompiling
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
AUTOTUNE addmm(6840x1088, 6840x64, 64x1088)
  triton_mm_9 0.0322 ms 100.0%
  triton_mm_13 0.0326 ms 98.7%
  triton_mm_16 0.0340 ms 94.6%
  triton_mm_8 0.0347 ms 92.8%
  triton_mm_6 0.0348 ms 92.4%
  triton_mm_5 0.0362 ms 89.0%
  triton_mm_7 0.0369 ms 87.3%
  triton_mm_10 0.0374 ms 86.1%
  triton_mm_12 0.0376 ms 85.7%
  triton_mm_14 0.0379 ms 84.9%
SingleProcess AUTOTUNE benchmarking takes 2.4502 seconds and 0.0348 seconds precompiling
AUTOTUNE addmm(6864x16384, 6864x1088, 1088x16384)
  triton_mm_1634 1.1239 ms 100.0%
  triton_mm_1633 1.1363 ms 98.9%
  bias_addmm 1.1705 ms 96.0%
  triton_mm_1635 1.2703 ms 88.5%
  triton_mm_1628 1.3404 ms 83.8%
  triton_mm_1630 1.3948 ms 80.6%
  addmm 1.4234 ms 79.0%
  triton_mm_1626 1.4513 ms 77.4%
  triton_mm_1631 1.5727 ms 71.5%
  triton_mm_1627 1.5859 ms 70.9%
SingleProcess AUTOTUNE benchmarking takes 10.9089 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(16384x6864, 6864x1088)
  triton_mm_1671 1.0605 ms 100.0%
  triton_mm_1672 1.0899 ms 97.3%
  triton_mm_1673 1.1421 ms 92.9%
  mm 1.1782 ms 90.0%
  triton_mm_1668 1.3354 ms 79.4%
  triton_mm_1666 1.3547 ms 78.3%
  triton_mm_1664 1.3631 ms 77.8%
  triton_mm_1669 1.4287 ms 74.2%
  triton_mm_1667 1.4573 ms 72.8%
  triton_mm_1665 1.4919 ms 71.1%
SingleProcess AUTOTUNE benchmarking takes 9.6130 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(2176x24, 24x1088)
  triton_mm_1702 0.0111 ms 100.0%
  triton_mm_1698 0.0115 ms 96.7%
  triton_mm_1701 0.0116 ms 96.1%
  triton_mm_1700 0.0118 ms 94.3%
  triton_mm_1697 0.0119 ms 93.3%
  triton_mm_1704 0.0120 ms 92.8%
  triton_mm_1691 0.0120 ms 92.6%
  triton_mm_1696 0.0121 ms 92.3%
  triton_mm_1699 0.0124 ms 90.2%
  triton_mm_1703 0.0124 ms 89.7%
SingleProcess AUTOTUNE benchmarking takes 2.1963 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(24x2176, 2176x1088)
  triton_mm_1678 0.0164 ms 100.0%
  mm 0.0172 ms 95.3%
  triton_mm_1682 0.0173 ms 94.7%
  triton_mm_1686 0.0209 ms 78.2%
  triton_mm_1677 0.0230 ms 71.0%
  triton_mm_1690 0.0234 ms 69.8%
  triton_mm_1676 0.0245 ms 66.8%
  triton_mm_1681 0.0251 ms 65.2%
  triton_mm_1685 0.0264 ms 61.9%
  triton_mm_1675 0.0270 ms 60.6%
SingleProcess AUTOTUNE benchmarking takes 2.1867 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(6864x16384, 16384x1088)
  mm 1.0976 ms 100.0%
  triton_mm_1652 1.1477 ms 95.6%
  triton_mm_1653 1.1684 ms 93.9%
  triton_mm_1654 1.1961 ms 91.8%
  triton_mm_1649 1.3486 ms 81.4%
  triton_mm_1647 1.3594 ms 80.7%
  triton_mm_1645 1.3607 ms 80.7%
  triton_mm_1648 1.4014 ms 78.3%
  triton_mm_1646 1.4787 ms 74.2%
  triton_mm_1650 1.4971 ms 73.3%
SingleProcess AUTOTUNE benchmarking takes 2.7114 seconds and 0.0008 seconds precompiling
AUTOTUNE mm(1088x6840, 6840x64)
  mm 0.0292 ms 100.0%
  triton_mm_4874 0.0437 ms 66.8%
  triton_mm_4882 0.0482 ms 60.6%
  triton_mm_4878 0.0488 ms 59.9%
  triton_mm_4872 0.0590 ms 49.5%
  triton_mm_4873 0.0633 ms 46.1%
  triton_mm_4881 0.0673 ms 43.4%
  triton_mm_4871 0.0711 ms 41.1%
  triton_mm_4877 0.0720 ms 40.6%
  triton_mm_4880 0.0789 ms 37.0%
SingleProcess AUTOTUNE benchmarking takes 7.5243 seconds and 0.0000 seconds precompiling
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
AUTOTUNE addmm(6840x1088, 6840x64, 64x1088)
  triton_mm_13 0.0330 ms 100.0%
  triton_mm_9 0.0334 ms 98.9%
  triton_mm_16 0.0340 ms 97.3%
  triton_mm_8 0.0342 ms 96.5%
  triton_mm_6 0.0350 ms 94.2%
  triton_mm_5 0.0356 ms 92.7%
  triton_mm_11 0.0378 ms 87.5%
  triton_mm_12 0.0378 ms 87.5%
  triton_mm_7 0.0383 ms 86.3%
  triton_mm_14 0.0387 ms 85.3%
SingleProcess AUTOTUNE benchmarking takes 2.4331 seconds and 0.0182 seconds precompiling
AUTOTUNE mm(24x1088, 1088x2176)
  triton_mm_1604 0.0134 ms 100.0%
  triton_mm_1608 0.0152 ms 88.7%
  triton_mm_1616 0.0160 ms 83.8%
  triton_mm_1612 0.0161 ms 83.3%
  mm 0.0168 ms 80.0%
  triton_mm_1603 0.0168 ms 79.8%
  triton_mm_1607 0.0174 ms 77.1%
  triton_mm_1601 0.0179 ms 75.3%
  triton_mm_1602 0.0184 ms 73.1%
  triton_mm_1614 0.0197 ms 68.2%
SingleProcess AUTOTUNE benchmarking takes 2.1913 seconds and 0.0000 seconds precompiling
AUTOTUNE addmm(6864x16384, 6864x1088, 1088x16384)
  bias_addmm 1.0698 ms 100.0%
  triton_mm_1634 1.0916 ms 98.0%
  triton_mm_1633 1.0932 ms 97.9%
  triton_mm_1635 1.2452 ms 85.9%
  triton_mm_1628 1.2939 ms 82.7%
  triton_mm_1630 1.3500 ms 79.2%
  addmm 1.3637 ms 78.4%
  triton_mm_1626 1.3958 ms 76.6%
  triton_mm_1627 1.5418 ms 69.4%
  triton_mm_1631 1.5422 ms 69.4%
SingleProcess AUTOTUNE benchmarking takes 10.9439 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(16384x6864, 6864x1088)
  triton_mm_1671 1.0212 ms 100.0%
  mm 1.0320 ms 99.0%
  triton_mm_1672 1.0557 ms 96.7%
  triton_mm_1673 1.1041 ms 92.5%
  triton_mm_1668 1.2777 ms 79.9%
  triton_mm_1666 1.2995 ms 78.6%
  triton_mm_1664 1.3047 ms 78.3%
  triton_mm_1669 1.3827 ms 73.9%
  triton_mm_1667 1.4081 ms 72.5%
  triton_mm_1665 1.4576 ms 70.1%
SingleProcess AUTOTUNE benchmarking takes 9.8433 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(2176x24, 24x1088)
  triton_mm_1696 0.0112 ms 100.0%
  triton_mm_1700 0.0112 ms 99.7%
  triton_mm_1701 0.0115 ms 97.5%
  triton_mm_1702 0.0116 ms 97.0%
  triton_mm_1699 0.0117 ms 95.4%
  triton_mm_1697 0.0124 ms 90.2%
  triton_mm_1695 0.0129 ms 86.8%
  mm 0.0130 ms 86.4%
  triton_mm_1698 0.0130 ms 86.2%
  triton_mm_1692 0.0131 ms 85.8%
SingleProcess AUTOTUNE benchmarking takes 2.2148 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(6864x16384, 16384x1088)
  mm 1.0692 ms 100.0%
  triton_mm_1652 1.1133 ms 96.0%
  triton_mm_1653 1.1298 ms 94.6%
  triton_mm_1654 1.1566 ms 92.4%
  triton_mm_1649 1.3020 ms 82.1%
  triton_mm_1645 1.3091 ms 81.7%
  triton_mm_1647 1.3124 ms 81.5%
  triton_mm_1648 1.3525 ms 79.1%
  triton_mm_1646 1.4370 ms 74.4%
  triton_mm_1650 1.4534 ms 73.6%
SingleProcess AUTOTUNE benchmarking takes 9.5957 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(1088x6840, 6840x64)
  mm 0.0297 ms 100.0%
  triton_mm_4874 0.0456 ms 65.2%
  triton_mm_4882 0.0494 ms 60.1%
  triton_mm_4878 0.0495 ms 60.0%
  triton_mm_4872 0.0606 ms 49.0%
  triton_mm_4873 0.0641 ms 46.3%
  triton_mm_4881 0.0692 ms 42.9%
  triton_mm_4871 0.0720 ms 41.2%
  triton_mm_4877 0.0729 ms 40.8%
  triton_mm_4880 0.0791 ms 37.6%
SingleProcess AUTOTUNE benchmarking takes 7.6691 seconds and 0.0000 seconds precompiling
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
AUTOTUNE addmm(6840x1088, 6840x64, 64x1088)
  triton_mm_13 0.0320 ms 100.0%
  triton_mm_9 0.0331 ms 96.5%
  triton_mm_8 0.0341 ms 93.7%
  triton_mm_16 0.0347 ms 92.2%
  triton_mm_6 0.0357 ms 89.5%
  triton_mm_5 0.0364 ms 87.7%
  triton_mm_7 0.0373 ms 85.8%
  triton_mm_10 0.0375 ms 85.3%
  triton_mm_12 0.0378 ms 84.6%
  triton_mm_11 0.0378 ms 84.5%
SingleProcess AUTOTUNE benchmarking takes 2.4545 seconds and 0.0148 seconds precompiling
AUTOTUNE mm(24x1088, 1088x2176)
  triton_mm_1604 0.0134 ms 100.0%
  triton_mm_1608 0.0151 ms 88.5%
  triton_mm_1616 0.0159 ms 84.0%
  triton_mm_1612 0.0160 ms 83.6%
  triton_mm_1603 0.0168 ms 79.8%
  triton_mm_1607 0.0174 ms 76.8%
  triton_mm_1601 0.0177 ms 75.6%
  mm 0.0179 ms 74.6%
  triton_mm_1602 0.0185 ms 72.3%
  triton_mm_1614 0.0195 ms 68.5%
SingleProcess AUTOTUNE benchmarking takes 2.1807 seconds and 0.0000 seconds precompiling
AUTOTUNE addmm(6864x16384, 6864x1088, 1088x16384)
  bias_addmm 1.0660 ms 100.0%
  triton_mm_1634 1.0820 ms 98.5%
  triton_mm_1633 1.0892 ms 97.9%
  triton_mm_1635 1.2273 ms 86.9%
  triton_mm_1628 1.2832 ms 83.1%
  triton_mm_1630 1.3464 ms 79.2%
  addmm 1.3666 ms 78.0%
  triton_mm_1626 1.3835 ms 77.1%
  triton_mm_1631 1.5217 ms 70.1%
  triton_mm_1627 1.5378 ms 69.3%
SingleProcess AUTOTUNE benchmarking takes 10.8632 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(16384x6864, 6864x1088)
  triton_mm_1671 1.0150 ms 100.0%
  mm 1.0284 ms 98.7%
  triton_mm_1672 1.0536 ms 96.3%
  triton_mm_1673 1.0964 ms 92.6%
  triton_mm_1668 1.2770 ms 79.5%
  triton_mm_1666 1.3002 ms 78.1%
  triton_mm_1664 1.3037 ms 77.9%
  triton_mm_1669 1.3844 ms 73.3%
  triton_mm_1667 1.4054 ms 72.2%
  triton_mm_1665 1.4389 ms 70.5%
SingleProcess AUTOTUNE benchmarking takes 9.8616 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(2176x24, 24x1088)
  triton_mm_1696 0.0112 ms 100.0%
  triton_mm_1697 0.0114 ms 97.8%
  triton_mm_1691 0.0121 ms 92.1%
  triton_mm_1701 0.0122 ms 91.6%
  triton_mm_1700 0.0124 ms 89.7%
  triton_mm_1699 0.0125 ms 89.0%
  triton_mm_1703 0.0127 ms 88.1%
  triton_mm_1698 0.0127 ms 87.8%
  triton_mm_1695 0.0127 ms 87.7%
  triton_mm_1706 0.0130 ms 85.7%
SingleProcess AUTOTUNE benchmarking takes 2.2308 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(24x2176, 2176x1088)
  triton_mm_1678 0.0160 ms 100.0%
  triton_mm_1682 0.0168 ms 95.2%
  mm 0.0179 ms 89.3%
  triton_mm_1686 0.0215 ms 74.4%
  triton_mm_1677 0.0225 ms 71.1%
  triton_mm_1690 0.0238 ms 67.3%
  triton_mm_1676 0.0240 ms 66.7%
  triton_mm_1681 0.0262 ms 61.1%
  triton_mm_1675 0.0266 ms 60.2%
  triton_mm_1685 0.0268 ms 59.6%
SingleProcess AUTOTUNE benchmarking takes 2.2254 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(6864x16384, 16384x1088)
  mm 1.0615 ms 100.0%
  triton_mm_1652 1.1111 ms 95.5%
  triton_mm_1653 1.1244 ms 94.4%
  triton_mm_1654 1.1536 ms 92.0%
  triton_mm_1649 1.2919 ms 82.2%
  triton_mm_1645 1.3086 ms 81.1%
  triton_mm_1647 1.3088 ms 81.1%
  triton_mm_1648 1.3450 ms 78.9%
  triton_mm_1650 1.4457 ms 73.4%
  triton_mm_1646 1.4463 ms 73.4%
SingleProcess AUTOTUNE benchmarking takes 9.6390 seconds and 0.0000 seconds precompiling
AUTOTUNE mm(1088x6840, 6840x64)
  mm 0.0298 ms 100.0%
  triton_mm_4874 0.0454 ms 65.7%
  triton_mm_4878 0.0488 ms 61.0%
  triton_mm_4882 0.0494 ms 60.3%
  triton_mm_4872 0.0597 ms 49.9%
  triton_mm_4873 0.0650 ms 45.8%
  triton_mm_4881 0.0681 ms 43.7%
  triton_mm_4871 0.0709 ms 42.0%
  triton_mm_4877 0.0738 ms 40.4%
  triton_mm_4880 0.0790 ms 37.7%
SingleProcess AUTOTUNE benchmarking takes 7.6180 seconds and 0.0000 seconds precompiling
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
Validation Progress: 0%| | 0/44 [00:00