======================================================= RESTART [10-24 17:49:19] =======================================================
======================================================= RESTART [10-24 17:49:19] =======================================================
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: kaiqiu. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.3
wandb: Run data is saved locally in /home/user/VAR/wandb/run-20241024_025046-x4jmugaq
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run bright-pine-231
wandb: View project at https://wandb.ai/kaiqiu/VAR
wandb: View run at https://wandb.ai/kaiqiu/VAR/runs/x4jmugaq
======================================================= RESTART [10-25 01:31:12] =======================================================
======================================================= RESTART [10-25 01:31:12] =======================================================
======================================================= RESTART [10-25 01:31:12] =======================================================
======================================================= RESTART [10-25 01:31:12] =======================================================
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: kaiqiu. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.3
wandb: Run data is saved locally in /home/user/VAR/wandb/run-20241024_103234-cluj3jbl
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run comfy-vortex-232
wandb: View project at https://wandb.ai/kaiqiu/VAR
wandb: View run at https://wandb.ai/kaiqiu/VAR/runs/cluj3jbl
/home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated.
Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): AUTOTUNE addmm(6840x1088, 6840x64, 64x1088) triton_mm_13 0.0318 ms 100.0% triton_mm_9 0.0334 ms 95.4% triton_mm_16 0.0344 ms 92.5% triton_mm_6 0.0348 ms 91.5% triton_mm_8 0.0351 ms 90.7% triton_mm_5 0.0362 ms 88.1% triton_mm_7 0.0371 ms 85.9% triton_mm_12 0.0376 ms 84.8% triton_mm_14 0.0376 ms 84.7% triton_mm_10 0.0380 ms 83.7% SingleProcess AUTOTUNE benchmarking takes 2.4212 seconds and 0.0209 seconds precompiling AUTOTUNE addmm(24x6528, 24x1088, 1088x6528) triton_mm_31 0.0218 ms 100.0% triton_mm_35 0.0224 ms 97.3% triton_mm_26 0.0225 ms 97.0% triton_mm_27 0.0231 ms 94.5% triton_mm_21 0.0239 ms 91.2% triton_mm_23 0.0239 ms 91.2% triton_mm_22 0.0244 ms 89.5% triton_mm_30 0.0245 ms 89.2% triton_mm_20 0.0245 ms 88.9% bias_addmm 0.0250 ms 87.4% SingleProcess AUTOTUNE benchmarking takes 2.2050 seconds and 0.0008 seconds precompiling AUTOTUNE addmm(6864x3264, 6864x1088, 1088x3264) triton_mm_52 0.2252 ms 100.0% triton_mm_53 0.2301 ms 97.9% triton_mm_54 0.2589 ms 87.0% triton_mm_47 0.2686 ms 83.8% triton_mm_49 0.2769 ms 81.3% triton_mm_45 0.2802 ms 80.4% addmm 0.2853 ms 78.9% bias_addmm 0.2995 ms 75.2% triton_mm_46 0.3116 ms 72.3% triton_mm_50 0.3149 ms 71.5% SingleProcess AUTOTUNE benchmarking takes 2.6103 seconds and 0.0068 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x1088, 1088x1088) triton_mm_72 0.0907 ms 100.0% triton_mm_71 0.0916 ms 99.0% bias_addmm 0.0981 ms 92.4% triton_mm_68 0.0985 ms 92.1% triton_mm_66 0.0986 ms 92.0% triton_mm_73 0.0998 ms 90.9% triton_mm_64 0.1026 ms 88.4% addmm 0.1111 ms 81.6% triton_mm_65 0.1123 ms 80.8% triton_mm_69 0.1128 ms 80.4% SingleProcess AUTOTUNE benchmarking takes 2.5100 seconds and 0.0009 seconds precompiling AUTOTUNE addmm(6864x4352, 6864x1088, 1088x4352) triton_mm_90 0.2931 ms 100.0% triton_mm_91 0.2972 ms 98.6% bias_addmm 0.3258 ms 90.0% triton_mm_92 0.3358 ms 87.3% triton_mm_85 0.3447 ms 85.0% triton_mm_83 0.3628 ms 
80.8% triton_mm_87 0.3669 ms 79.9% addmm 0.3720 ms 78.8% triton_mm_84 0.4032 ms 72.7% triton_mm_88 0.4129 ms 71.0% SingleProcess AUTOTUNE benchmarking takes 2.6486 seconds and 0.0126 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x4352, 4352x1088) bias_addmm 0.2932 ms 100.0% triton_mm_110 0.3092 ms 94.8% triton_mm_109 0.3177 ms 92.3% addmm 0.3280 ms 89.4% triton_mm_111 0.3284 ms 89.3% triton_mm_104 0.3588 ms 81.7% triton_mm_106 0.3590 ms 81.7% triton_mm_102 0.3721 ms 78.8% triton_mm_105 0.3725 ms 78.7% triton_mm_107 0.4100 ms 71.5% SingleProcess AUTOTUNE benchmarking takes 2.6599 seconds and 0.0010 seconds precompiling AUTOTUNE addmm(24x6528, 24x1088, 1088x6528) triton_mm_124 0.0218 ms 100.0% triton_mm_128 0.0223 ms 97.7% triton_mm_116 0.0229 ms 95.1% triton_mm_120 0.0230 ms 94.7% triton_mm_119 0.0234 ms 93.0% triton_mm_114 0.0240 ms 90.9% triton_mm_115 0.0243 ms 89.6% triton_mm_113 0.0244 ms 89.5% triton_mm_123 0.0244 ms 89.3% bias_addmm 0.0245 ms 88.8% SingleProcess AUTOTUNE benchmarking takes 2.2076 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x4352, 6864x1088, 1088x4352) triton_mm_183 0.2923 ms 100.0% triton_mm_184 0.2969 ms 98.4% bias_addmm 0.3268 ms 89.4% triton_mm_185 0.3347 ms 87.3% triton_mm_178 0.3438 ms 85.0% triton_mm_176 0.3618 ms 80.8% triton_mm_180 0.3640 ms 80.3% addmm 0.3737 ms 78.2% triton_mm_177 0.4001 ms 73.0% triton_mm_181 0.4104 ms 71.2% SingleProcess AUTOTUNE benchmarking takes 2.6391 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x4352, 4352x1088) bias_addmm 0.2937 ms 100.0% triton_mm_203 0.3089 ms 95.1% triton_mm_202 0.3170 ms 92.6% triton_mm_204 0.3275 ms 89.7% addmm 0.3287 ms 89.3% triton_mm_197 0.3582 ms 82.0% triton_mm_199 0.3586 ms 81.9% triton_mm_195 0.3700 ms 79.4% triton_mm_198 0.3718 ms 79.0% triton_mm_200 0.4095 ms 71.7% SingleProcess AUTOTUNE benchmarking takes 2.6578 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x4352, 6864x1088, 1088x4352) triton_mm_276 0.2929 ms 100.0% 
triton_mm_277 0.2968 ms 98.7% bias_addmm 0.3269 ms 89.6% triton_mm_278 0.3338 ms 87.7% triton_mm_271 0.3421 ms 85.6% triton_mm_269 0.3618 ms 81.0% triton_mm_273 0.3634 ms 80.6% triton_mm_270 0.4013 ms 73.0% triton_mm_274 0.4113 ms 71.2% addmm 0.4181 ms 70.1% SingleProcess AUTOTUNE benchmarking takes 2.6465 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x4352, 4352x1088) bias_addmm 0.2936 ms 100.0% triton_mm_296 0.3068 ms 95.7% triton_mm_295 0.3178 ms 92.4% triton_mm_297 0.3263 ms 90.0% addmm 0.3277 ms 89.6% triton_mm_292 0.3568 ms 82.3% triton_mm_290 0.3574 ms 82.1% triton_mm_288 0.3714 ms 79.1% triton_mm_291 0.3724 ms 78.8% triton_mm_293 0.4084 ms 71.9% SingleProcess AUTOTUNE benchmarking takes 2.6519 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x4352, 4352x1088) triton_mm_389 0.3065 ms 100.0% triton_mm_388 0.3162 ms 96.9% triton_mm_390 0.3276 ms 93.6% bias_addmm 0.3309 ms 92.6% triton_mm_385 0.3563 ms 86.0% triton_mm_383 0.3566 ms 85.9% addmm 0.3668 ms 83.6% triton_mm_381 0.3708 ms 82.7% triton_mm_384 0.3714 ms 82.5% triton_mm_386 0.4076 ms 75.2% SingleProcess AUTOTUNE benchmarking takes 2.6458 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x4352, 4352x1088) triton_mm_482 0.3065 ms 100.0% triton_mm_481 0.3171 ms 96.7% triton_mm_483 0.3272 ms 93.7% addmm 0.3289 ms 93.2% bias_addmm 0.3300 ms 92.9% triton_mm_476 0.3564 ms 86.0% triton_mm_478 0.3572 ms 85.8% triton_mm_474 0.3697 ms 82.9% triton_mm_477 0.3716 ms 82.5% triton_mm_479 0.4069 ms 75.3% SingleProcess AUTOTUNE benchmarking takes 2.6512 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x1088, 6864x4352, 4352x1088) triton_mm_575 0.3061 ms 100.0% triton_mm_574 0.3164 ms 96.8% triton_mm_576 0.3267 ms 93.7% bias_addmm 0.3300 ms 92.8% triton_mm_569 0.3559 ms 86.0% triton_mm_571 0.3566 ms 85.8% addmm 0.3670 ms 83.4% triton_mm_567 0.3696 ms 82.8% triton_mm_570 0.3699 ms 82.8% triton_mm_572 0.4070 ms 75.2% SingleProcess AUTOTUNE 
benchmarking takes 2.6587 seconds and 0.0000 seconds precompiling AUTOTUNE mm(24x1088, 1088x2176) triton_mm_1604 0.0146 ms 100.0% triton_mm_1608 0.0150 ms 97.0% triton_mm_1616 0.0159 ms 91.8% triton_mm_1612 0.0161 ms 90.7% triton_mm_1602 0.0174 ms 83.7% triton_mm_1603 0.0178 ms 82.0% mm 0.0179 ms 81.6% triton_mm_1607 0.0184 ms 79.4% triton_mm_1601 0.0186 ms 78.6% triton_mm_1611 0.0189 ms 77.0% SingleProcess AUTOTUNE benchmarking takes 7.4498 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x8192, 6864x1088, 1088x8192) bias_addmm 0.5559 ms 100.0% triton_mm_1633 0.5603 ms 99.2% triton_mm_1634 0.5606 ms 99.2% triton_mm_1635 0.6314 ms 88.0% triton_mm_1628 0.6609 ms 84.1% triton_mm_1630 0.6922 ms 80.3% addmm 0.6985 ms 79.6% triton_mm_1626 0.7096 ms 78.3% triton_mm_1631 0.7786 ms 71.4% triton_mm_1627 0.7822 ms 71.1% SingleProcess AUTOTUNE benchmarking takes 11.2664 seconds and 0.0000 seconds precompiling AUTOTUNE mm(8192x6864, 6864x1088) triton_mm_1671 0.5587 ms 100.0% triton_mm_1672 0.5772 ms 96.8% mm 0.5945 ms 94.0% triton_mm_1673 0.6017 ms 92.8% triton_mm_1666 0.6700 ms 83.4% triton_mm_1664 0.6842 ms 81.7% triton_mm_1668 0.6855 ms 81.5% triton_mm_1667 0.7281 ms 76.7% triton_mm_1669 0.7306 ms 76.5% triton_mm_1665 0.7418 ms 75.3% SingleProcess AUTOTUNE benchmarking takes 9.4731 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6864, 6864x4352) mm 0.2907 ms 100.0% triton_mm_1744 0.2977 ms 97.7% triton_mm_1743 0.2989 ms 97.2% triton_mm_1745 0.3028 ms 96.0% triton_mm_1738 0.3561 ms 81.6% triton_mm_1736 0.3586 ms 81.1% triton_mm_1740 0.3658 ms 79.5% triton_mm_1739 0.3816 ms 76.2% triton_mm_1741 0.4014 ms 72.4% triton_mm_1737 0.4075 ms 71.3% SingleProcess AUTOTUNE benchmarking takes 9.7528 seconds and 0.0000 seconds precompiling AUTOTUNE mm(2176x24, 24x1088) triton_mm_1696 0.0113 ms 100.0% triton_mm_1700 0.0115 ms 98.3% triton_mm_1699 0.0116 ms 97.0% triton_mm_1704 0.0117 ms 96.2% triton_mm_1706 0.0121 ms 93.6% triton_mm_1691 0.0123 ms 91.9% triton_mm_1705 
0.0124 ms 91.2% triton_mm_1707 0.0124 ms 91.2% triton_mm_1693 0.0125 ms 90.1% triton_mm_1702 0.0125 ms 90.1% SingleProcess AUTOTUNE benchmarking takes 5.9373 seconds and 0.0000 seconds precompiling AUTOTUNE mm(24x2176, 2176x1088) triton_mm_1682 0.0163 ms 100.0% triton_mm_1678 0.0167 ms 97.7% mm 0.0171 ms 95.3% triton_mm_1686 0.0213 ms 76.6% triton_mm_1690 0.0233 ms 70.1% triton_mm_1677 0.0235 ms 69.5% triton_mm_1676 0.0236 ms 69.2% triton_mm_1681 0.0252 ms 64.7% triton_mm_1685 0.0257 ms 63.4% triton_mm_1675 0.0273 ms 59.8% SingleProcess AUTOTUNE benchmarking takes 7.7578 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x8192, 8192x1088) mm 0.5435 ms 100.0% triton_mm_1652 0.5715 ms 95.1% triton_mm_1653 0.5804 ms 93.6% triton_mm_1654 0.5938 ms 91.5% triton_mm_1649 0.6640 ms 81.9% triton_mm_1647 0.6649 ms 81.7% triton_mm_1645 0.6739 ms 80.7% triton_mm_1648 0.7017 ms 77.5% triton_mm_1646 0.7292 ms 74.5% triton_mm_1650 0.7366 ms 73.8% SingleProcess AUTOTUNE benchmarking takes 9.7012 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x4352) triton_mm_1724 0.2778 ms 100.0% triton_mm_1725 0.2800 ms 99.2% mm 0.2826 ms 98.3% triton_mm_1726 0.3179 ms 87.4% triton_mm_1717 0.3396 ms 81.8% triton_mm_1719 0.3418 ms 81.3% triton_mm_1721 0.3464 ms 80.2% triton_mm_1718 0.3643 ms 76.3% triton_mm_1722 0.3859 ms 72.0% triton_mm_1720 0.4242 ms 65.5% SingleProcess AUTOTUNE benchmarking takes 2.5293 seconds and 0.0008 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) triton_mm_1839 0.2245 ms 100.0% triton_mm_1838 0.2313 ms 97.1% triton_mm_1840 0.2425 ms 92.6% triton_mm_1835 0.2577 ms 87.1% mm 0.2578 ms 87.1% triton_mm_1833 0.2629 ms 85.4% triton_mm_1831 0.2645 ms 84.9% triton_mm_1834 0.2889 ms 77.7% triton_mm_1832 0.2915 ms 77.0% triton_mm_1836 0.2988 ms 75.1% SingleProcess AUTOTUNE benchmarking takes 9.9413 seconds and 0.0000 seconds precompiling AUTOTUNE mm(4352x6864, 6864x1088) triton_mm_1781 0.2910 ms 100.0% mm 0.2948 ms 98.7% triton_mm_1782 0.2957 ms 98.4% 
triton_mm_1783 0.3432 ms 84.8% triton_mm_1778 0.3484 ms 83.5% triton_mm_1774 0.3559 ms 81.8% triton_mm_1776 0.3586 ms 81.1% triton_mm_1777 0.3847 ms 75.6% triton_mm_1779 0.3944 ms 73.8% triton_mm_1775 0.4016 ms 72.5% SingleProcess AUTOTUNE benchmarking takes 9.9767 seconds and 0.0000 seconds precompiling AUTOTUNE mm(3264x6864, 6864x1088) mm 0.2272 ms 100.0% triton_mm_1857 0.2795 ms 81.3% triton_mm_1852 0.2848 ms 79.8% triton_mm_1850 0.2854 ms 79.6% triton_mm_1858 0.2861 ms 79.4% triton_mm_1859 0.2892 ms 78.5% triton_mm_1853 0.3121 ms 72.8% triton_mm_1855 0.3267 ms 69.5% triton_mm_1851 0.3310 ms 68.6% triton_mm_1849 0.3396 ms 66.9% SingleProcess AUTOTUNE benchmarking takes 9.8465 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6864, 6864x1088) mm 0.0973 ms 100.0% triton_mm_1821 0.0986 ms 98.8% triton_mm_1814 0.1061 ms 91.8% triton_mm_1820 0.1086 ms 89.6% triton_mm_1811 0.1186 ms 82.1% triton_mm_1812 0.1196 ms 81.4% triton_mm_1816 0.1207 ms 80.7% triton_mm_1819 0.1225 ms 79.5% triton_mm_1815 0.1264 ms 77.0% triton_mm_1817 0.1305 ms 74.6% SingleProcess AUTOTUNE benchmarking takes 9.7205 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x4352, 4352x1088) mm 0.3040 ms 100.0% triton_mm_1762 0.3040 ms 100.0% triton_mm_1763 0.3112 ms 97.7% triton_mm_1764 0.3209 ms 94.7% triton_mm_1759 0.3435 ms 88.5% triton_mm_1757 0.3492 ms 87.0% triton_mm_1755 0.3501 ms 86.8% triton_mm_1758 0.3661 ms 83.0% triton_mm_1756 0.3865 ms 78.6% triton_mm_1760 0.3947 ms 77.0% SingleProcess AUTOTUNE benchmarking takes 2.5383 seconds and 0.0009 seconds precompiling AUTOTUNE mm(6864x1088, 1088x1088) triton_mm_1801 0.0837 ms 100.0% mm 0.0839 ms 99.8% triton_mm_1800 0.0844 ms 99.3% triton_mm_1797 0.0926 ms 90.4% triton_mm_1802 0.0934 ms 89.6% triton_mm_1793 0.0942 ms 88.9% triton_mm_1795 0.0954 ms 87.7% triton_mm_1794 0.1028 ms 81.5% triton_mm_1798 0.1056 ms 79.3% triton_mm_1796 0.1236 ms 67.8% SingleProcess AUTOTUNE benchmarking takes 2.4180 seconds and 0.0010 seconds precompiling 
AUTOTUNE mm(6864x1088, 1088x4352) triton_mm_1910 0.2759 ms 100.0% triton_mm_1911 0.2787 ms 99.0% mm 0.2806 ms 98.3% triton_mm_1912 0.3166 ms 87.2% triton_mm_1903 0.3381 ms 81.6% triton_mm_1905 0.3419 ms 80.7% triton_mm_1907 0.3470 ms 79.5% triton_mm_1904 0.3640 ms 75.8% triton_mm_1908 0.3860 ms 71.5% triton_mm_1906 0.4237 ms 65.1% SingleProcess AUTOTUNE benchmarking takes 2.5287 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x4352, 4352x1088) triton_mm_1948 0.3029 ms 100.0% mm 0.3042 ms 99.6% triton_mm_1949 0.3108 ms 97.5% triton_mm_1950 0.3212 ms 94.3% triton_mm_1945 0.3439 ms 88.1% triton_mm_1941 0.3506 ms 86.4% triton_mm_1943 0.3507 ms 86.4% triton_mm_1944 0.3666 ms 82.6% triton_mm_1942 0.3871 ms 78.3% triton_mm_1946 0.3953 ms 76.6% SingleProcess AUTOTUNE benchmarking takes 2.5340 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6528x24, 24x1088) triton_mm_1889 0.0172 ms 100.0% triton_mm_1882 0.0178 ms 96.6% triton_mm_1892 0.0183 ms 93.7% triton_mm_1883 0.0184 ms 93.1% triton_mm_1885 0.0184 ms 93.1% triton_mm_1888 0.0186 ms 92.3% triton_mm_1886 0.0186 ms 92.1% triton_mm_1893 0.0191 ms 89.8% triton_mm_1891 0.0191 ms 89.6% triton_mm_1890 0.0192 ms 89.5% SingleProcess AUTOTUNE benchmarking takes 5.8073 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6864, 6864x4352) mm 0.2940 ms 100.0% triton_mm_1930 0.2977 ms 98.8% triton_mm_1929 0.2993 ms 98.2% triton_mm_1931 0.3027 ms 97.1% triton_mm_1924 0.3554 ms 82.7% triton_mm_1922 0.3573 ms 82.3% triton_mm_1926 0.3654 ms 80.5% triton_mm_1925 0.3816 ms 77.1% triton_mm_1927 0.4003 ms 73.5% triton_mm_1923 0.4071 ms 72.2% SingleProcess AUTOTUNE benchmarking takes 2.5261 seconds and 0.0000 seconds precompiling AUTOTUNE mm(24x6528, 6528x1088) triton_mm_1864 0.0264 ms 100.0% mm 0.0268 ms 98.6% triton_mm_1868 0.0307 ms 86.1% triton_mm_1872 0.0399 ms 66.2% triton_mm_1876 0.0518 ms 51.0% triton_mm_1863 0.0521 ms 50.7% triton_mm_1862 0.0553 ms 47.8% triton_mm_1867 0.0602 ms 43.9% triton_mm_1871 0.0618 ms 42.8% 
triton_mm_1861 0.0660 ms 40.1% SingleProcess AUTOTUNE benchmarking takes 6.5986 seconds and 0.0000 seconds precompiling AUTOTUNE mm(3264x6864, 6864x1088) mm 0.2266 ms 100.0% triton_mm_2043 0.2787 ms 81.3% triton_mm_2038 0.2839 ms 79.8% triton_mm_2044 0.2855 ms 79.4% triton_mm_2036 0.2861 ms 79.2% triton_mm_2040 0.2864 ms 79.1% triton_mm_2045 0.2893 ms 78.3% triton_mm_2039 0.3120 ms 72.6% triton_mm_2041 0.3267 ms 69.4% triton_mm_2037 0.3307 ms 68.5% SingleProcess AUTOTUNE benchmarking takes 2.5035 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6864, 6864x1088) mm 0.0973 ms 100.0% triton_mm_2007 0.0986 ms 98.7% triton_mm_2000 0.1051 ms 92.6% triton_mm_2006 0.1079 ms 90.2% triton_mm_1997 0.1178 ms 82.6% triton_mm_1998 0.1204 ms 80.8% triton_mm_2002 0.1208 ms 80.6% triton_mm_2005 0.1218 ms 79.9% triton_mm_2001 0.1264 ms 77.0% triton_mm_2003 0.1306 ms 74.5% SingleProcess AUTOTUNE benchmarking takes 2.4013 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x1088) triton_mm_1987 0.0837 ms 100.0% mm 0.0842 ms 99.4% triton_mm_1986 0.0844 ms 99.2% triton_mm_1983 0.0926 ms 90.4% triton_mm_1988 0.0935 ms 89.6% triton_mm_1981 0.0947 ms 88.4% triton_mm_1979 0.0949 ms 88.2% triton_mm_1980 0.1021 ms 82.0% triton_mm_1984 0.1048 ms 79.9% triton_mm_1977 0.1237 ms 67.7% SingleProcess AUTOTUNE benchmarking takes 2.4112 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2240 ms 100.0% triton_mm_2025 0.2255 ms 99.3% triton_mm_2024 0.2317 ms 96.7% triton_mm_2026 0.2441 ms 91.8% triton_mm_2021 0.2586 ms 86.6% triton_mm_2017 0.2642 ms 84.8% triton_mm_2019 0.2656 ms 84.3% triton_mm_2020 0.2890 ms 77.5% triton_mm_2018 0.2928 ms 76.5% triton_mm_2022 0.2987 ms 75.0% SingleProcess AUTOTUNE benchmarking takes 2.5114 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x4352) triton_mm_2096 0.2760 ms 100.0% triton_mm_2097 0.2804 ms 98.4% mm 0.2815 ms 98.0% triton_mm_2098 0.3172 ms 87.0% triton_mm_2089 0.3401 ms 81.1% 
triton_mm_2091 0.3441 ms 80.2% triton_mm_2093 0.3461 ms 79.7% triton_mm_2090 0.3644 ms 75.7% triton_mm_2094 0.3860 ms 71.5% triton_mm_2092 0.4257 ms 64.8% SingleProcess AUTOTUNE benchmarking takes 2.5250 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x4352, 4352x1088) mm 0.3022 ms 100.0% triton_mm_2134 0.3048 ms 99.1% triton_mm_2135 0.3125 ms 96.7% triton_mm_2136 0.3218 ms 93.9% triton_mm_2131 0.3454 ms 87.5% triton_mm_2129 0.3511 ms 86.1% triton_mm_2127 0.3520 ms 85.8% triton_mm_2130 0.3680 ms 82.1% triton_mm_2128 0.3885 ms 77.8% triton_mm_2132 0.3975 ms 76.0% SingleProcess AUTOTUNE benchmarking takes 2.5265 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6528x24, 24x1088) triton_mm_2069 0.0172 ms 100.0% triton_mm_2071 0.0172 ms 100.0% triton_mm_2068 0.0177 ms 96.8% triton_mm_2074 0.0179 ms 95.7% triton_mm_2073 0.0181 ms 94.5% triton_mm_2076 0.0182 ms 94.2% triton_mm_2075 0.0183 ms 93.7% triton_mm_2072 0.0186 ms 92.3% triton_mm_2077 0.0191 ms 89.8% triton_mm_2078 0.0193 ms 88.7% SingleProcess AUTOTUNE benchmarking takes 2.1170 seconds and 0.0000 seconds precompiling AUTOTUNE mm(3264x6864, 6864x1088) mm 0.2268 ms 100.0% triton_mm_2229 0.2790 ms 81.3% triton_mm_2226 0.2847 ms 79.7% triton_mm_2224 0.2848 ms 79.6% triton_mm_2230 0.2862 ms 79.2% triton_mm_2222 0.2873 ms 78.9% triton_mm_2231 0.2901 ms 78.2% triton_mm_2225 0.3127 ms 72.5% triton_mm_2227 0.3271 ms 69.3% triton_mm_2223 0.3316 ms 68.4% SingleProcess AUTOTUNE benchmarking takes 2.5203 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6864, 6864x1088) mm 0.0973 ms 100.0% triton_mm_2193 0.0994 ms 97.9% triton_mm_2186 0.1052 ms 92.5% triton_mm_2192 0.1078 ms 90.2% triton_mm_2183 0.1187 ms 82.0% triton_mm_2184 0.1196 ms 81.3% triton_mm_2188 0.1206 ms 80.6% triton_mm_2191 0.1217 ms 79.9% triton_mm_2187 0.1265 ms 76.9% triton_mm_2189 0.1304 ms 74.6% SingleProcess AUTOTUNE benchmarking takes 2.4173 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x1088) triton_mm_2173 0.0830 ms 
100.0% mm 0.0832 ms 99.7% triton_mm_2172 0.0851 ms 97.5% triton_mm_2169 0.0928 ms 89.4% triton_mm_2174 0.0943 ms 88.1% triton_mm_2167 0.0949 ms 87.5% triton_mm_2165 0.0951 ms 87.3% triton_mm_2166 0.1024 ms 81.1% triton_mm_2170 0.1057 ms 78.5% triton_mm_2168 0.1236 ms 67.1% SingleProcess AUTOTUNE benchmarking takes 2.4053 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2251 ms 100.0% triton_mm_2211 0.2255 ms 99.8% triton_mm_2210 0.2320 ms 97.0% triton_mm_2212 0.2440 ms 92.2% triton_mm_2207 0.2594 ms 86.8% triton_mm_2205 0.2650 ms 85.0% triton_mm_2203 0.2665 ms 84.5% triton_mm_2206 0.2894 ms 77.8% triton_mm_2204 0.2928 ms 76.9% triton_mm_2208 0.2995 ms 75.2% SingleProcess AUTOTUNE benchmarking takes 2.5103 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x4352, 4352x1088) mm 0.3037 ms 100.0% triton_mm_2320 0.3049 ms 99.6% triton_mm_2321 0.3125 ms 97.2% triton_mm_2322 0.3225 ms 94.2% triton_mm_2317 0.3459 ms 87.8% triton_mm_2315 0.3512 ms 86.5% triton_mm_2313 0.3519 ms 86.3% triton_mm_2316 0.3683 ms 82.5% triton_mm_2314 0.3884 ms 78.2% triton_mm_2318 0.3974 ms 76.4% SingleProcess AUTOTUNE benchmarking takes 2.5408 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6528x24, 24x1088) triton_mm_2255 0.0172 ms 100.0% triton_mm_2257 0.0181 ms 94.8% triton_mm_2261 0.0182 ms 94.6% triton_mm_2262 0.0182 ms 94.4% triton_mm_2264 0.0183 ms 93.7% triton_mm_2256 0.0184 ms 93.4% triton_mm_2258 0.0184 ms 93.2% triton_mm_2260 0.0184 ms 93.1% triton_mm_2254 0.0187 ms 92.0% triton_mm_2263 0.0191 ms 89.8% SingleProcess AUTOTUNE benchmarking takes 2.1539 seconds and 0.0000 seconds precompiling AUTOTUNE mm(3264x6864, 6864x1088) mm 0.2269 ms 100.0% triton_mm_2415 0.2786 ms 81.5% triton_mm_2416 0.2857 ms 79.4% triton_mm_2412 0.2858 ms 79.4% triton_mm_2410 0.2858 ms 79.4% triton_mm_2408 0.2879 ms 78.8% triton_mm_2417 0.2895 ms 78.4% triton_mm_2411 0.3139 ms 72.3% triton_mm_2413 0.3271 ms 69.4% triton_mm_2409 0.3321 ms 68.3% SingleProcess AUTOTUNE 
benchmarking takes 2.5150 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6864, 6864x1088) mm 0.0966 ms 100.0% triton_mm_2379 0.0986 ms 98.0% triton_mm_2372 0.1052 ms 91.8% triton_mm_2378 0.1078 ms 89.6% triton_mm_2369 0.1186 ms 81.4% triton_mm_2370 0.1197 ms 80.7% triton_mm_2374 0.1206 ms 80.1% triton_mm_2377 0.1218 ms 79.3% triton_mm_2373 0.1265 ms 76.4% triton_mm_2375 0.1304 ms 74.1% SingleProcess AUTOTUNE benchmarking takes 2.4286 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x1088) triton_mm_2359 0.0830 ms 100.0% mm 0.0840 ms 98.8% triton_mm_2358 0.0853 ms 97.3% triton_mm_2355 0.0928 ms 89.5% triton_mm_2360 0.0935 ms 88.8% triton_mm_2351 0.0944 ms 88.0% triton_mm_2353 0.0957 ms 86.8% triton_mm_2352 0.1031 ms 80.6% triton_mm_2356 0.1049 ms 79.2% triton_mm_2354 0.1237 ms 67.1% SingleProcess AUTOTUNE benchmarking takes 2.4298 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2246 ms 100.0% triton_mm_2397 0.2268 ms 99.0% triton_mm_2396 0.2315 ms 97.0% triton_mm_2398 0.2434 ms 92.3% triton_mm_2393 0.2582 ms 87.0% triton_mm_2391 0.2647 ms 84.9% triton_mm_2389 0.2652 ms 84.7% triton_mm_2392 0.2903 ms 77.4% triton_mm_2390 0.2934 ms 76.5% triton_mm_2394 0.2989 ms 75.1% SingleProcess AUTOTUNE benchmarking takes 2.5108 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x1088) mm 0.0832 ms 100.0% triton_mm_2545 0.0840 ms 99.0% triton_mm_2544 0.0846 ms 98.3% triton_mm_2541 0.0929 ms 89.5% triton_mm_2546 0.0935 ms 88.9% triton_mm_2537 0.0953 ms 87.3% triton_mm_2539 0.0957 ms 86.9% triton_mm_2538 0.1030 ms 80.7% triton_mm_2542 0.1050 ms 79.2% triton_mm_2540 0.1233 ms 67.5% SingleProcess AUTOTUNE benchmarking takes 2.4185 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2238 ms 100.0% triton_mm_2583 0.2257 ms 99.1% triton_mm_2582 0.2320 ms 96.5% triton_mm_2584 0.2447 ms 91.5% triton_mm_2579 0.2601 ms 86.0% triton_mm_2575 0.2649 ms 84.5% triton_mm_2577 0.2656 ms 84.3% 
triton_mm_2578 0.2900 ms 77.2% triton_mm_2576 0.2935 ms 76.3% triton_mm_2580 0.2999 ms 74.6% SingleProcess AUTOTUNE benchmarking takes 2.5068 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x1088, 1088x1088) triton_mm_2731 0.0830 ms 100.0% mm 0.0832 ms 99.8% triton_mm_2730 0.0855 ms 97.0% triton_mm_2727 0.0921 ms 90.2% triton_mm_2732 0.0944 ms 87.9% triton_mm_2723 0.0952 ms 87.2% triton_mm_2725 0.0958 ms 86.7% triton_mm_2724 0.1030 ms 80.6% triton_mm_2728 0.1057 ms 78.6% triton_mm_2726 0.1233 ms 67.3% SingleProcess AUTOTUNE benchmarking takes 2.4181 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2250 ms 100.0% triton_mm_2769 0.2257 ms 99.7% triton_mm_2768 0.2309 ms 97.4% triton_mm_2770 0.2432 ms 92.5% triton_mm_2765 0.2588 ms 86.9% triton_mm_2763 0.2633 ms 85.4% triton_mm_2761 0.2657 ms 84.7% triton_mm_2764 0.2906 ms 77.4% triton_mm_2762 0.2921 ms 77.0% triton_mm_2766 0.2986 ms 75.4% SingleProcess AUTOTUNE benchmarking takes 2.5139 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2236 ms 100.0% triton_mm_2955 0.2251 ms 99.3% triton_mm_2954 0.2316 ms 96.5% triton_mm_2956 0.2439 ms 91.7% triton_mm_2951 0.2593 ms 86.2% triton_mm_2947 0.2642 ms 84.6% triton_mm_2949 0.2652 ms 84.3% triton_mm_2950 0.2891 ms 77.3% triton_mm_2948 0.2929 ms 76.3% triton_mm_2952 0.2996 ms 74.6% SingleProcess AUTOTUNE benchmarking takes 2.5044 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x3264, 3264x1088) mm 0.2249 ms 100.0% triton_mm_3141 0.2260 ms 99.5% triton_mm_3140 0.2309 ms 97.4% triton_mm_3142 0.2435 ms 92.4% triton_mm_3137 0.2585 ms 87.0% triton_mm_3133 0.2641 ms 85.2% triton_mm_3135 0.2647 ms 84.9% triton_mm_3136 0.2900 ms 77.5% triton_mm_3134 0.2928 ms 76.8% triton_mm_3138 0.2985 ms 75.3% SingleProcess AUTOTUNE benchmarking takes 2.5138 seconds and 0.0000 seconds precompiling AUTOTUNE mm(1088x6840, 6840x64) mm 0.0292 ms 100.0% triton_mm_4874 0.0447 ms 65.2% triton_mm_4878 0.0479 ms 60.9% triton_mm_4882 
0.0490 ms 59.6% triton_mm_4872 0.0597 ms 48.9% triton_mm_4873 0.0633 ms 46.1% triton_mm_4881 0.0674 ms 43.3% triton_mm_4871 0.0708 ms 41.2% triton_mm_4877 0.0719 ms 40.6% triton_mm_4880 0.0789 ms 37.0% SingleProcess AUTOTUNE benchmarking takes 7.7932 seconds and 0.0000 seconds precompiling /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): AUTOTUNE addmm(6840x1088, 6840x64, 64x1088) triton_mm_9 0.0324 ms 100.0% triton_mm_13 0.0326 ms 99.3% triton_mm_8 0.0340 ms 95.1% triton_mm_16 0.0344 ms 94.0% triton_mm_6 0.0356 ms 90.8% triton_mm_5 0.0362 ms 89.4% triton_mm_7 0.0371 ms 87.2% bias_addmm 0.0374 ms 86.6% triton_mm_10 0.0383 ms 84.5% triton_mm_12 0.0383 ms 84.4% SingleProcess AUTOTUNE benchmarking takes 4.5000 seconds and 0.0000 seconds precompiling AUTOTUNE addmm(6864x8192, 6864x1088, 1088x8192) bias_addmm 0.5542 ms 100.0% triton_mm_1634 0.5615 ms 98.7% triton_mm_1633 0.5627 ms 98.5% triton_mm_1635 0.6323 ms 87.7% triton_mm_1628 0.6582 ms 84.2% triton_mm_1630 0.6947 ms 79.8% addmm 0.6961 ms 79.6% triton_mm_1626 0.7105 ms 78.0% triton_mm_1627 0.7860 ms 70.5% triton_mm_1631 0.7862 ms 70.5% SingleProcess AUTOTUNE benchmarking takes 6.3490 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x8192, 8192x1088) mm 0.5390 ms 100.0% triton_mm_1652 0.5702 ms 94.5% triton_mm_1653 0.5815 ms 92.7% triton_mm_1654 0.5960 ms 90.4% 
triton_mm_1647 0.6632 ms 81.3% triton_mm_1649 0.6684 ms 80.6% triton_mm_1645 0.6711 ms 80.3% triton_mm_1648 0.7024 ms 76.7% triton_mm_1646 0.7268 ms 74.2% triton_mm_1650 0.7377 ms 73.1% SingleProcess AUTOTUNE benchmarking takes 2.6963 seconds and 0.0000 seconds precompiling /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): AUTOTUNE addmm(6840x1088, 6840x64, 64x1088) triton_mm_13 0.0319 ms 100.0% triton_mm_9 0.0324 ms 98.3% triton_mm_16 0.0345 ms 92.6% triton_mm_8 0.0350 ms 91.1% triton_mm_6 0.0350 ms 91.1% triton_mm_5 0.0354 ms 90.1% triton_mm_11 0.0376 ms 84.8% triton_mm_7 0.0380 ms 84.0% triton_mm_10 0.0383 ms 83.2% triton_mm_12 0.0383 ms 83.2% SingleProcess AUTOTUNE benchmarking takes 2.5602 seconds and 0.0000 seconds precompiling AUTOTUNE mm(8192x6864, 6864x1088) mm 0.5173 ms 100.0% triton_mm_1671 0.5459 ms 94.8% triton_mm_1672 0.5640 ms 91.7% triton_mm_1673 0.5889 ms 87.8% triton_mm_1666 0.6542 ms 79.1% triton_mm_1664 0.6630 ms 78.0% triton_mm_1668 0.6667 ms 77.6% triton_mm_1667 0.7102 ms 72.8% triton_mm_1669 0.7125 ms 72.6% triton_mm_1665 0.7282 ms 71.0% SingleProcess AUTOTUNE benchmarking takes 2.7755 seconds and 0.0000 seconds precompiling AUTOTUNE mm(2176x24, 24x1088) triton_mm_1700 0.0116 ms 100.0% triton_mm_1702 0.0116 ms 100.0% triton_mm_1697 0.0117 ms 98.6% triton_mm_1699 0.0117 ms 98.6% triton_mm_1701 0.0117 ms 
98.6% triton_mm_1703 0.0118 ms 98.1% triton_mm_1704 0.0118 ms 97.8% triton_mm_1696 0.0120 ms 96.0% triton_mm_1693 0.0124 ms 93.3% triton_mm_1707 0.0124 ms 93.3% SingleProcess AUTOTUNE benchmarking takes 2.1903 seconds and 0.0000 seconds precompiling AUTOTUNE mm(24x2176, 2176x1088) triton_mm_1678 0.0155 ms 100.0% mm 0.0171 ms 90.6% triton_mm_1682 0.0173 ms 89.5% triton_mm_1686 0.0201 ms 76.9% triton_mm_1677 0.0224 ms 69.1% triton_mm_1690 0.0234 ms 66.1% triton_mm_1676 0.0236 ms 65.5% triton_mm_1685 0.0258 ms 60.0% triton_mm_1681 0.0262 ms 59.2% triton_mm_1675 0.0263 ms 58.9% SingleProcess AUTOTUNE benchmarking takes 2.1890 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x8192, 8192x1088) mm 0.5265 ms 100.0% triton_mm_1652 0.5577 ms 94.4% triton_mm_1653 0.5706 ms 92.3% triton_mm_1654 0.5830 ms 90.3% triton_mm_1647 0.6556 ms 80.3% triton_mm_1649 0.6561 ms 80.3% triton_mm_1645 0.6583 ms 80.0% triton_mm_1648 0.6869 ms 76.7% triton_mm_1646 0.7146 ms 73.7% triton_mm_1650 0.7230 ms 72.8% SingleProcess AUTOTUNE benchmarking takes 2.7239 seconds and 0.0000 seconds precompiling /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. 
with torch.cuda.amp.autocast(enabled=False): AUTOTUNE addmm(6864x8192, 6864x1088, 1088x8192) triton_mm_1634 0.5610 ms 100.0% triton_mm_1633 0.5624 ms 99.8% triton_mm_1635 0.6368 ms 88.1% bias_addmm 0.6483 ms 86.5% triton_mm_1628 0.6638 ms 84.5% triton_mm_1630 0.6950 ms 80.7% triton_mm_1626 0.7131 ms 78.7% addmm 0.7638 ms 73.4% triton_mm_1631 0.7856 ms 71.4% triton_mm_1627 0.7856 ms 71.4% SingleProcess AUTOTUNE benchmarking takes 2.8735 seconds and 0.0000 seconds precompiling AUTOTUNE mm(8192x6864, 6864x1088) mm 0.5283 ms 100.0% triton_mm_1671 0.5521 ms 95.7% triton_mm_1672 0.5733 ms 92.1% triton_mm_1673 0.5980 ms 88.3% triton_mm_1666 0.6676 ms 79.1% triton_mm_1668 0.6744 ms 78.3% triton_mm_1664 0.6836 ms 77.3% triton_mm_1667 0.7263 ms 72.7% triton_mm_1669 0.7267 ms 72.7% triton_mm_1665 0.7381 ms 71.6% SingleProcess AUTOTUNE benchmarking takes 2.7034 seconds and 0.0000 seconds precompiling AUTOTUNE mm(6864x8192, 8192x1088) mm 0.5382 ms 100.0% triton_mm_1652 0.5671 ms 94.9% triton_mm_1653 0.5776 ms 93.2% triton_mm_1654 0.5930 ms 90.8% triton_mm_1647 0.6655 ms 80.9% triton_mm_1649 0.6656 ms 80.9% triton_mm_1645 0.6761 ms 79.6% triton_mm_1648 0.6986 ms 77.0% triton_mm_1646 0.7184 ms 74.9% triton_mm_1650 0.7312 ms 73.6% SingleProcess AUTOTUNE benchmarking takes 2.7034 seconds and 0.0000 seconds precompiling /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): /home/user/VAR/tokenizer/tokenizer_image/dino_enc/dinov2.py:122: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. with torch.cuda.amp.autocast(enabled=False): Validation Progress: 0%| | 0/44 [00:00