|
2025-09-05T19:18:35Z INFO 1939 [root]: /opt/conda/bin/neuronx-cc compile /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo --framework XLA --target trn1 --output /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff --model-type=transformer -O1 --lnc=1 '--internal-hlo2tensorizer-options=--experimental-unsafe-fp8e4m3fn-as-fp8e4m3 --verify-hlo=false' --logfile=/models/mistral-7b-v0.3-instruct-neuronx/layout_opt/log-neuron-cc.txt --verbose=35 |
|
2025-09-05T19:18:41Z INFO 1939 [root]: NeuronX Compiler version 2.20.9961.0+0acef03a Python version 3.10.17 HWM version 2.20.0.9961+0acef03a NumPy version 1.26.4 |
|
2025-09-05T19:18:41Z INFO 1971 [root]: XLA detected |
|
2025-09-05T19:18:41Z INFO 1971 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper |
|
2025-09-05T19:18:41Z INFO 1971 [root]: Intermediate files stored in /opt/vllm/neuronxcc-gss0wi_w, output in /opt/vllm |
|
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1 |
|
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Processing input #0 |
|
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Running pipeline Pipeline.0 |
|
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0 |
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1 |
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Processing input #0 |
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: IR signature: b7a76fc5f3f76d1d69d57e0e784721bafd07e3a61734f6594e8c815123a8a771 for graph.hlo |
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Executing: /opt/conda/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --experimental-unsafe-fp8e4m3fn-as-fp8e4m3 --verify-hlo=false --native-to-custom-softmax --partitioner-opts='--transformer' |
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: DEBUG: needsModular? No. macCnt 0 num non-trivial Ops 290 |
|
INFO: Switching to single-module compile. PrePartitionPipe skipped. |
|
INFO: Found memory bound graph |
|
INFO: Number of Native SoftmaxDx's detected and replaced: 0 |
|
INFO: Number of Native Softmax's detected and replaced: 0 |
|
Replaced 0 dropout sequences with OffloadedDropout |
|
INFO: HloMacCount has found 0 |
|
INFO: Traffic has found 7248289792 |
|
INFO: AIF 0 |
|
HLO Ops used in computation: parameter reshape transpose tuple |
|
Warning: Could not open file debug_info_hlo_partitions.json |
|
2025-09-05 19:18:41.368875: W hilo/hlo2penguin/utils/DumpDebugInfo.cc:52] Truncating long HLO operator name %last = tuple(%p68, %transpose.290, %transpose.291, %transpose.292, %transpose.293, %transpose.294, %transpose.295, %transpose.296, %transpose.297, %transpose.298, %transpose.299, %transpose.300, %transpose.301, %transpose.302, %transpose.303, %transpose.304, %transpose.305, %transpose.306, %transpose.307, %transpose.308, %transpose.309, %transpose.310, %transpose.311, %transpose.312, %transpose.313, %transpose.314, %transpose.315, %transpose.316, %transpose.317, %transpose.318, %transpose.319, %transpos... to 512 characters in the compiler's debug metadata |
|
Invoking RemoveOptimizationBarriers pass |
|
|
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: IR signature: 2e0dd598350cceee9da216d825f016375db09d3b92ca63d286d74eb4016c4d07 for sg0000/HLOToTensorizer |
|
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Job #0 finished |
|
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0 |
|
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.Frontend.0 |
|
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Job Frontend len(in_states) 1 |
|
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Processing input #0 |
|
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Start model loading |
|
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Start tensorization |
|
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Num jobs: 32 |
|
2025-09-05T19:18:41Z USER 1971 [root/Tensorizer/Tensorizer]: Running Tensorizer |
|
2025-09-05T19:18:41Z INFO 1971 [Tensorizer]: Frontend did not find netlist info. Switching to flat flow. |
|
2025-09-05T19:18:41Z INFO 1971 [Tensorizer]: Building model from Penguin script "penguin.py"... |
|
2025-09-05T19:18:41Z INFO 1971 [Tensorizer]: Tensorizer options: --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=matmult-bf16 --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Running DoNothing |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.003 seconds |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.004 seconds |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.032 seconds |
|
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.011 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.004 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.033 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.047 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.014 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.004 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.068 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Running TCTransform |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.006 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Running TCTransform |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.060 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.005 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.005 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.016 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Rematerialization]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PadElimination]: Running PadElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Running TCTransform |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Recompute]: Running Recompute |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Recompute]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.000 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [Tensorizer]: After optimization: 290 statements |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Running DoNothing |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AutoCastTCInputs]: Running AutoCastTCInputs |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AutoCastTCInputs]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AutoCastTCInputs]: AutoCastTCInputs finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.007 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.038 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.019 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.005 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.035 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.017 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.016 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.004 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.031 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PGTiling]: Running PGTiling |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.048 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.007 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.003 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.019 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.114 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.002 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.001 seconds |
|
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=True) |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 14.818 seconds |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 15.072 seconds |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: |
|
20 MACROS WITH LARGEST INSTRUCTION COUNTS: |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 4096: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False) |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.174 seconds |
|
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.101 seconds |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True) |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.654 seconds |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.101 seconds |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.030 seconds |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.023 seconds |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.073 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.084 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.148 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: |
|
20 MACROS WITH LARGEST INSTRUCTION COUNTS: |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 4096: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128 |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.029 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.107 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.053 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.106 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.022 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.080 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.018 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.006 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.066 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.087 seconds |
|
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True) |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 0.437 seconds |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.109 seconds |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=False) |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.068 seconds |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.031 seconds |
|
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SundaISel]: Running SundaISel |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/SundaISel]: Finished (changed=True) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.472 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=True) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.039 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.051 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.022 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.018 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.014 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.076 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.018 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.070 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=False) |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.092 seconds |
|
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 1.250 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.035 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.016 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.023 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.009 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.008 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/DeConcat]: Running DeConcat |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/DeConcat]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.001 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.017 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.007 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.008 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.069 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.004 seconds |
|
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.122 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.038 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.415 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.016 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.107 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.013 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SpillPSum]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.120 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.015 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.012 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.089 seconds |
|
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.063 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.137 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.013 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.114 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.029 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.015 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.014 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.025 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.050 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.010 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DataStreaming]: Finished (changed=False) |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.026 seconds |
|
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 2.728 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.050 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.015 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.015 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 583.920us (128.000MiB, est bw: 229.856GB/s, 1.157% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (128, 128, 4096) %'19629.24343'[T_i0,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (128, 128, 4096) %'input289'[T_i0,i0.128,i1.4096] # id=23881, src_id=None, , instances=128 # dl = tensor_op_name: t4051_pftranspose_19629 | hlo_id: 2097 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 349.240us (128.000MiB, est bw: 384.314GB/s, 0.692% of tot. time) for bfloat16<128 x 4096> bfloat16 (128, 128, 32, 128) %'output289'[16i0_18312_0+i0_18312_1,i0.128,i2.32,i1.128] = store bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (128, 128, 4096) %'t4051_pftranspose_19629'[16i0_18312_0+i0_18312_1,i0.128,i1.128+128i2.32] # id=21501, src_id=None, , instances=128 # dl = tensor_op_name: _transpose.578 | hlo_id: 2097 | [[i0.128];[i1.128, i2.32]] -> [[i0.128];[i1.128, i2.32]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18341.23903'[T_i0,T_i2_26087,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input6'[T_i0,i0.128,T_i2_26087,i1.3584] # id=22057, src_id=None, , instances=64 # dl = tensor_op_name: t2070_pftranspose_18341 | hlo_id: 1531 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18382.23917'[T_i0,T_i2_26095,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input15'[T_i0,i0.128,T_i2_26095,i1.3584] # id=22115, src_id=None, , instances=64 # dl = tensor_op_name: t2133_pftranspose_18382 | hlo_id: 1549 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18423.23931'[T_i0,T_i2_26103,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input24'[T_i0,i0.128,T_i2_26103,i1.3584] # id=22173, src_id=None, , instances=64 # dl = tensor_op_name: t2196_pftranspose_18423 | hlo_id: 1567 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18464.23945'[T_i0,T_i2_26111,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input33'[T_i0,i0.128,T_i2_26111,i1.3584] # id=22231, src_id=None, , instances=64 # dl = tensor_op_name: t2259_pftranspose_18464 | hlo_id: 1585 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18505.23959'[T_i0,T_i2_26119,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input42'[T_i0,i0.128,T_i2_26119,i1.3584] # id=22289, src_id=None, , instances=64 # dl = tensor_op_name: t2322_pftranspose_18505 | hlo_id: 1603 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18546.23973'[T_i0,T_i2_26127,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input51'[T_i0,i0.128,T_i2_26127,i1.3584] # id=22347, src_id=None, , instances=64 # dl = tensor_op_name: t2385_pftranspose_18546 | hlo_id: 1621 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18587.23987'[T_i0,T_i2_26135,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input60'[T_i0,i0.128,T_i2_26135,i1.3584] # id=22405, src_id=None, , instances=64 # dl = tensor_op_name: t2448_pftranspose_18587 | hlo_id: 1639 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18628.24001'[T_i0,T_i2_26143,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input69'[T_i0,i0.128,T_i2_26143,i1.3584] # id=22463, src_id=None, , instances=64 # dl = tensor_op_name: t2511_pftranspose_18628 | hlo_id: 1657 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]] |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.029 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.015 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.304 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler |
|
2025-09-05T19:19:07Z WARNING 1971 [sg0000/Tensorizer/StaticProfiler]: matmul-based transposes inserted by penguin takes up 100.00 percent of all matmul computation |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.035 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.130 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.032 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.038 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds |
|
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop |
|
2025-09-05T19:19:08Z INFO 1971 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) |
|
2025-09-05T19:19:08Z INFO 1971 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.618 seconds |
|
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: BirCodeGen estimate #instances=285991 in sg0000 |
|
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: IR signature: 868d010c2395728dde8cb4de874491d6109b3553bfa434107fadd7604cffd131 for nc00/sg0000/TensorizerBIR |
|
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: Weights total number of bytes: 131072 |
|
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: Successfully built model. |
|
2025-09-05T19:19:09Z USER 1971 [root/Tensorizer/Tensorizer]: Tensorizer finished after 27.551 seconds |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: End tensorization |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input0 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input1 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input2 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input3 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input4 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input5 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input6 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input7 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input8 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input9 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input10 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input11 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input12 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input13 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input14 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input15 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input16 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input17 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input18 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input19 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input20 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input21 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input22 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input23 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input24 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input25 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input26 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input27 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input28 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input29 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input30 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input31 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input32 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input33 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input34 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input35 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input36 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input37 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input38 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input39 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input40 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input41 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input42 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input43 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input44 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input45 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input46 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input47 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input48 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input49 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input50 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input51 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input52 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input53 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input54 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input55 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input56 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input57 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input58 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input59 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input60 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input61 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input62 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input63 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input64 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input65 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input66 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input67 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input68 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input69 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input70 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input71 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input72 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input73 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input74 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input75 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input76 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input77 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input78 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input79 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input80 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input81 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input82 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input83 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input84 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input85 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input86 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input87 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input88 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input89 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input90 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input91 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input92 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input93 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input94 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input95 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input96 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input97 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input98 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input99 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input100 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input101 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input102 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input103 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input104 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input105 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input106 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input107 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input108 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input109 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input110 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input111 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input112 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input113 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input114 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input115 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input116 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input117 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input118 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input119 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input120 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input121 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input122 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input123 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input124 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input125 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input126 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input127 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input128 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input129 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input130 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input131 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input132 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input133 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input134 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input135 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input136 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input137 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input138 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input139 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input140 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input141 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input142 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input143 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input144 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input145 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input146 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input147 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input148 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input149 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input150 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input151 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input152 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input153 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input154 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input155 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input156 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input157 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input158 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input159 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input160 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input161 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input162 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input163 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input164 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input165 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input166 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input167 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input168 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input169 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input170 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input171 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input172 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input173 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input174 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input175 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input176 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input177 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input178 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input179 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input180 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input181 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input182 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input183 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input184 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input185 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input186 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input187 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input188 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input189 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input190 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input191 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input192 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input193 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input194 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input195 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input196 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input197 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input198 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input199 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input200 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input201 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input202 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input203 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input204 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input205 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input206 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input207 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input208 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input209 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input210 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input211 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input212 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input213 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input214 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input215 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input216 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input217 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input218 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input219 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input220 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input221 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input222 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input223 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input224 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input225 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input226 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input227 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input228 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input229 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input230 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input231 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input232 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input233 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input234 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input235 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input236 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input237 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input238 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input239 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input240 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input241 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input242 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input243 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input244 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input245 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input246 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input247 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input248 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input249 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input250 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input251 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input252 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input253 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input254 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input255 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input256 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input257 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input258 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input259 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input260 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input261 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input262 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input263 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input264 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input265 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input266 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input267 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input268 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input269 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input270 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input271 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input272 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input273 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input274 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input275 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input276 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input277 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input278 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input279 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input280 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input281 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input282 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input283 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input284 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input285 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input286 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input287 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input288 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input289 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input290 |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: wrote bir.json |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: wrote tensor_map.json |
|
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Job #0 finished |
|
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.Frontend.0 |
|
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 |
|
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 |
|
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: BackendDriver has 1 states with 1 core LNC |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: BackendDriver: no partitions found. Switching to flat flow. |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Job WalrusDriver len(in_states) 1 |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Processing input #0 |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: BackendDriver in_state.num_states 1 with 1 core LNC |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Executing /opt/conda/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/log-neuron-cc.txt --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-color-rotation --max-load-lower-bound 0.14 --mm-reorder-opt --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json /opt/conda/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/conda/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json --unified-backend-and-legacy-codegen --tensor-map tensor_map.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --dge-levels vector_dynamic_offsets,scalar_dynamic_offset,io --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Working directory is /opt/vllm/neuronxcc-gss0wi_w/sg00 |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: propagate_exit=True |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: use_logger=False |
|
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: expose_stderr=True |
|
2025-09-05T19:19:09Z INFO 2007 [Logging]: Logging to ../../../../models/mistral-7b-v0.3-instruct-neuronx/layout_opt/log-neuron-cc.txt at level 'INFO' |
|
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: max_allowed_parallelism=32 |
|
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: Backend driver mtBackend: false numModules: 1 Cwd: "/opt/vllm/neuronxcc-gss0wi_w/sg00" |
|
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: DynamicDMA is enabled |
|
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: DynamicDMA levels being enabled: io, scalar_dynamic_offset, vector_dynamic_offsets, |
|
2025-09-05T19:19:09Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:09Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:09Z USER 2007 [ModuleForkPass]: Running do_nothing |
|
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:09Z USER 2007 [ModuleForkPass]: do_nothing finished after 0.002 seconds |
|
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: curr_vmrss: 175mb, ru_maxrss: 419mb (delta=0mb) |
|
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:09Z USER 2007 [ModuleForkPass]: Running birverifier |
|
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: birverifier finished after 0.275 seconds |
|
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=542mb) |
|
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.282 seconds |
|
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=542mb) |
|
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [SubgraphForkPass]: Running lnc_verifier |
|
2025-09-05T19:19:10Z INFO 2007 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds |
|
2025-09-05T19:19:10Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=0mb) |
|
2025-09-05T19:19:10Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.001 seconds |
|
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=0mb) |
|
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: Running expand_replication |
|
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z INFO 2007 [ExpandReplication]: Found 0 replicated matmults |
|
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: expand_replication finished after 0.001 seconds |
|
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=0mb) |
|
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: Running unroll |
|
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290 |
|
2025-09-05T19:19:10Z INFO 2007 [Unroll]: INFO (Unroll) Start unrolling at Fri Sep 5 19:19:10 2025 |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: INFO (Unroll) DONE unrolling Fri Sep 5 19:19:10 2025 |
|
|
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: sg0000 Instruction count after Unroll: |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Total count: 285701 |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Matmult: 217153 |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: GenericCopy: 54337 |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Load: 7106 |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Save: 7105 |
|
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Unrolled DGE count with Dynamic AP: 0 |
|
2025-09-05T19:19:12Z USER 2007 [ModuleForkPass]: unroll finished after 2.471 seconds |
|
2025-09-05T19:19:12Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2540mb, ru_maxrss: 2541mb (delta=1580mb) |
|
2025-09-05T19:19:12Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69807 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 2.543 seconds |
|
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: curr_vmrss: 1679mb, ru_maxrss: 2541mb (delta=1580mb) |
|
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69807 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69807 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z USER 2007 [SubgraphForkPass]: Running dead_code_elim |
|
2025-09-05T19:19:12Z INFO 2007 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=69807 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:12Z USER 2007 [SubgraphForkPass]: dead_code_elim finished after 0.257 seconds |
|
2025-09-05T19:19:12Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 1699mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:12Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.266 seconds |
|
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: curr_vmrss: 1699mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:12Z USER 2007 [ModuleForkPass]: Running birverifier |
|
2025-09-05T19:19:12Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: birverifier finished after 0.232 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.239 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [SubgraphForkPass]: Running lnc_verifier |
|
2025-09-05T19:19:13Z INFO 2007 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [SubgraphForkPass]: lnc_verifier finished after 0.003 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.010 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running instruction_reorder |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: instruction_reorder finished after 0.027 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running psum_legalization |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: psum_legalization finished after 0.031 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running legalize_cce_dma |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: legalize_cce_dma finished after 0.014 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running error_injector |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z WARNING 2007 [ErrorInjector]: Unrecognized injected error value "0" |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: error_injector finished after 0.003 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running vn_splitter |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z INFO 2007 [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0 |
|
2025-09-05T19:19:13Z INFO 2007 [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 |
|
2025-09-05T19:19:13Z INFO 2007 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 |
|
2025-09-05T19:19:13Z INFO 2007 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 |
|
2025-09-05T19:19:13Z INFO 2007 [VNSplitterPass]: INFO (VNSplitter) Time: 0.004 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [VNSplitterPass]: INFO (VerticalFusion) Time: 0.073 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [VNSplitterPass]: INFO (ShrinkDN) Time: 0.075 seconds |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: vn_splitter finished after 0.186 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1764mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running constant_propagate |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: constant_propagate finished after 0.482 seconds |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1768mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running lower_ac |
|
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z INFO 2007 [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: lower_ac finished after 0.026 seconds |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1768mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running input_dma_coalescing |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z INFO 2007 [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: input_dma_coalescing finished after 0.078 seconds |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1768mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running remat_optimization |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z INFO 2007 [RematOpt]: Removed 0 remat instructions |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: remat_optimization finished after 0.174 seconds |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running early_peephole_opts |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z INFO 2007 [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true |
|
2025-09-05T19:19:14Z INFO 2007 [EarlyPeepholeOpts]: Activation Accumulate: 0 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: early_peephole_opts finished after 0.048 seconds |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running coalesce_multichannel_cc_ops |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.015 seconds |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running infer_stream_ids |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: infer_stream_ids finished after 0.015 seconds |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running pre_sched |
|
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start PRE scheduling 2 cores: 1 at: Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: Start... |
|
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: Found 0 Splits CCs |
|
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: Grouped CCs to 0 clusters. |
|
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors |
|
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts |
|
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: Done. |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start split live ranges Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Num_Splits: 0 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: End split live ranges Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Strt remove redundncies Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_memsets |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_memsets: 0 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_loads |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_loads: 0 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: End remove redundncies Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start DCE Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: End DCE Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start build flow dependencies Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [build_flow_deps]: Start build fdeps. Invocation: 1Fri Sep 5 19:19:14 2025 |
|
2025-09-05T19:19:14Z INFO 2007 [build_flow_deps]: Allocs: 69132 instructions: 285701 |
|
2025-09-05T19:19:15Z INFO 2007 [build_flow_deps]: Build fdeps inserted 714949 edges |
|
2025-09-05T19:19:15Z INFO 2007 [build_flow_deps]: Done build fdeps 714949 Fri Sep 5 19:19:15 2025 |
|
2025-09-05T19:19:15Z INFO 2007 [PreSched]: End build flow dependencies Fri Sep 5 19:19:15 2025 |
|
2025-09-05T19:19:15Z INFO 2007 [PreSched]: Start remove useless insts Fri Sep 5 19:19:15 2025 |
|
2025-09-05T19:19:15Z INFO 2007 [PreSched]: remove_useless_insts |
|
2025-09-05T19:19:16Z INFO 2007 [PreSched]: remove Useless Instructions: 0 |
|
2025-09-05T19:19:16Z INFO 2007 [PreSched]: End remove useless insts Fri Sep 5 19:19:16 2025 |
|
2025-09-05T19:19:16Z INFO 2007 [PreSched]: Start scratchpad optimization Fri Sep 5 19:19:16 2025 |
|
2025-09-05T19:19:16Z INFO 2007 [PreSched]: End scratchpad optimization Fri Sep 5 19:19:16 2025 |
|
2025-09-05T19:19:16Z INFO 2007 [PreSched]: DONE PRE scheduling Fri Sep 5 19:19:16 2025 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: pre_sched finished after 1.914 seconds |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running tensor_copy_elim |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: Tensor CP elimination: 0 |
|
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: tensor_copy_elim finished after 0.341 seconds |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running dynamic_dma_setup |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: dynamic_dma_setup finished after 0.004 seconds |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running runtime_memory_reservation |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: runtime_memory_reservation finished after 0.004 seconds |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running coloring_allocator_psum |
|
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:16Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions |
|
2025-09-05T19:19:16Z INFO 2007 [ColoringAllocator::Rep]: linearize and check |
|
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: allocating PSUM |
|
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: main loop |
|
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: renumber locations |
|
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: size = 54337 |
|
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: build_no_bitmap start |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: 100% PSUM demand before spilling |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: PSUM high-water mark = 8 tensors |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: found 182784 edges |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: mean: 6.72779 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: median: 6.99997 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: adjacency vectors require 1462272 bytes |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: build_no_bitmap done |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: find costs |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: simplify interference graph |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: initialize low and high |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: lo = 54337 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: hi = 0 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: inf = 0 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: total = 54337 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: simplify |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: new candidates = 0 |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: select ranges |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: no more spills |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: PSUM score = 0 (lower is better) |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: spilling from PSUM cost about 0 cycles |
|
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: 100% PSUM utilization after allocation |
|
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: coloring_allocator_psum finished after 0.597 seconds |
|
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1901mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: Running dma_optimization_psum |
|
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:17Z INFO 2007 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions |
|
2025-09-05T19:19:17Z INFO 2007 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations |
|
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: dma_optimization_psum finished after 0.203 seconds |
|
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1901mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: Running address_rotation_psum |
|
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:18Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-09-05T19:19:18Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-09-05T19:19:19Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-09-05T19:19:19Z USER 2007 [ModuleForkPass]: address_rotation_psum finished after 1.984 seconds |
|
2025-09-05T19:19:19Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1907mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:19Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:19Z USER 2007 [ModuleForkPass]: Running coloring_allocator_sb |
|
2025-09-05T19:19:19Z INFO 2007 [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 7114104832 |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 7875 bytes |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 7114072064 |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 7822 bytes |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 0 |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 0 bytes |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions |
|
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: linearize and check |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: allocating SB |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: main loop |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: renumber locations |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: size = 14212 |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: find partners |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: found 54337 accumulation groups |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: largest = 19629.23884_i1022 |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: tensors = 2 |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: requires 8448 bytes/partition |
|
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: expanding partners |
|
2025-09-05T19:19:19Z INFO 2007 []: find first defs for local |
|
2025-09-05T19:19:19Z INFO 2007 []: find first defs for global |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: find loads |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 1 pin count |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 6081 remat count |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: build interference graph |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: pass 1 int-tree |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Num intervals 14212 Num locations 14212 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: IntervalTree Build Done |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: info.neighbors init Done |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: info.neighbors partners Done |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: IntervalTree readback Done |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: edge: 31236 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: mean: 4.39572 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: median: 2.00087 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: find costs |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: best-of-n loop, heuristic = 0 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: simplify interference graph |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: initialize safe & unsafe |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: safe = 14210 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: unsafe = 1 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: inf = 0 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: total = 14211 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: simplify |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 14212 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: new candidates = 0 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: select ranges |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Total: 14211 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Spilled: 0.000 (0) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Allocated: 1.000 (14211) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Rover zone: 0.989 (14051) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Pre-rover zone: 0.009 (128) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Post-rover zone: 0.002 (32) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Slice zone: 0.000 (0) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Blocks nothing: 0.000 (0) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Blocks medium: 0.000 (0) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Blocks tall: 1.000 (14211) |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Visited until tall blocking (mean): 0.997 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Visited until tall blocking (median): 1.000 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Visited until tall blocking (p95): 1.000 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Success |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: SB spills = 0 tensors |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: size = 0 bytes/partition |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: remats = 0 tensors |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: unpinned = 0 tensors |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: size = 0 bytes/partition |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: SB score = 0 |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: spilling from SB cost about 0 cycles |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: pinning saved approximately 9010 cycles |
|
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 0% SB utilization after allocation |
|
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 7114104832 |
|
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 7875 bytes |
|
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 7114072064 |
|
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 7822 bytes |
|
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 0 |
|
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 0 bytes |
|
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: coloring_allocator_sb finished after 1.103 seconds |
|
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1911mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: Running address_rotation_sb |
|
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:20Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: address_rotation_sb finished after 0.290 seconds |
|
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1912mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: Running dma_optimization_sb |
|
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:20Z INFO 2007 [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 14228176896, 50.0001% input load, 49.9999% output write, 0% spill/reload [sg0000] |
|
2025-09-05T19:19:20Z INFO 2007 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: removed 0 identical load |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: adjusted 0 DMACopy remat |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: adjusted 0 DMACopy remat |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: sub-graph will get execute 1 times |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(7.1141e+09) |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations |
|
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: average loaded DMA size 7875 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: average saved DMA size 7822 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 7114104832 |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 7875 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 7114072064 |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 7822 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 0, 0% out of total dma traffic |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 14228176896, 50.0001% input load, 49.9999% output write, 0% spill/reload [sg0000] |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 7114104832 |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 7875 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 7114072064 |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 7822 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 0 |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 0 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 7848 bytes |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: DMA optimization re-enable optimization |
|
2025-09-05T19:19:22Z USER 2007 [ModuleForkPass]: dma_optimization_sb finished after 1.621 seconds |
|
2025-09-05T19:19:22Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1932mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:22Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:22Z USER 2007 [ModuleForkPass]: Running address_rotation_sb |
|
2025-09-05T19:19:22Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 5919 Sb address |
|
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 4699 Sb address |
|
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 1769 Sb address |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: address_rotation_sb finished after 1.665 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1932mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running coloring_allocator_dram |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions |
|
2025-09-05T19:19:24Z INFO 2007 [ColoringAllocator::Rep]: linearize and check |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: reserved space = 14496612352 bytes |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: spill space = 0 bytes |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: aligned spill space = 0 bytes |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: dram space = 107374182400 bytes |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: renumber locations |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: size = 0 |
|
2025-09-05T19:19:24Z INFO 2007 []: find first defs for local |
|
2025-09-05T19:19:24Z INFO 2007 []: find first defs for global |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: Num intervals 0 Num locations 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: IntervalTree Build Done |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: info.neighbors init Done |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: IntervalTree readback Done |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: simplify interference graph |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: initialize low and high |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: lo = 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: hi = 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: total = 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: simplify |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: new candidates = 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: select ranges |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: CC buffer size limit 524288000 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: allreduce_dram_hwm 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: Real CC buffer size 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: DRAM hwm after allocation: 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: DRAM allocation successful |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: coloring_allocator_dram finished after 0.469 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1937mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running address_rotation_dram |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: Runtime page size at 512MB |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: DRAM hwm before rotation 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: allreduce buffer size 524288000 |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: allreduce hwm 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: Real CC buffer size 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: DRAM hwm after rotation 0 |
|
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: address_rotation_dram finished after 0.171 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running tensorcopy_accel |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z INFO 2007 [TensorCopyAccel::Impl]: Running peephole optimization pass |
|
2025-09-05T19:19:24Z INFO 2007 [TensorCopyAccel::Impl]: Accelerated 0 out of 54337 tensorcopy in Function: sg0000 average acceleration factor: -nan |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: tensorcopy_accel finished after 0.025 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running peephole_opts |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z INFO 2007 [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: peephole_opts finished after 0.055 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running lower_kernel |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Started running LowerKernel |
|
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Start of kernel lowering pass, number of insts: 285701, number of allocs: 69132 |
|
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Scan BKs time (s): 0.013202 |
|
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Lower BKs time (s): 3e-06 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: lower_kernel finished after 0.019 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running lower_nki_kernel |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: lower_nki_kernel finished after 0.017 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running dynamic_dma_cleanup |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: dynamic_dma_cleanup finished after 0.029 seconds |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1941mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running birverifier |
|
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: birverifier finished after 0.195 seconds |
|
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1941mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: Running dynamic_dma_scan |
|
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: dynamic_dma_scan finished after 0.027 seconds |
|
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1941mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: Running build_fdeps |
|
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:25Z INFO 2007 [build_flow_deps]: Start build fdeps. Invocation: 2Fri Sep 5 19:19:25 2025 |
|
2025-09-05T19:19:25Z INFO 2007 [build_flow_deps]: Allocs: 69132 instructions: 285701 |
|
2025-09-05T19:19:26Z INFO 2007 [build_flow_deps]: Build fdeps inserted 714949 edges |
|
2025-09-05T19:19:26Z INFO 2007 [build_flow_deps]: Done build fdeps 714949 Fri Sep 5 19:19:26 2025 |
|
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: build_fdeps finished after 1.075 seconds |
|
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1963mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: Running remove_redundancies |
|
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove_clobbered_writes |
|
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove_clobbered_writes: 0 |
|
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove_useless_insts |
|
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove Useless Instructions: 0 |
|
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: remove_redundancies finished after 0.090 seconds |
|
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1963mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: Running anti_dependency_analyzer |
|
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:26Z INFO 2007 [AntiDependencyAnalyzer]: Batch size: 1000 |
|
2025-09-05T19:19:26Z INFO 2007 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} |
|
2025-09-05T19:19:26Z INFO 2007 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 |
|
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: anti_dependency_analyzer finished after 0.989 seconds |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2058mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: Running tensor_copy_elim |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:27Z INFO 2007 [TensorCopyElim]: Tensor CP elimination: 0 |
|
2025-09-05T19:19:27Z INFO 2007 [TensorCopyElim]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: tensor_copy_elim finished after 0.240 seconds |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2071mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: Running prefetch_scheduling_before_sched |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.003 seconds |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2071mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: Running post_sched |
|
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:27Z INFO 2007 [post_scheduler]: Start PosT ScheD 3 sunda Fri Sep 5 19:19:27 2025 |
|
2025-09-05T19:19:29Z INFO 2007 [post_scheduler]: Time-aware hwm post-sched |
|
2025-09-05T19:19:31Z INFO 2007 [post_scheduler]: Time-aware simulation time: 59414448 |
|
2025-09-05T19:19:32Z INFO 2007 [post_scheduler]: Done PosT ScheD Fri Sep 5 19:19:32 2025 |
|
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: post_sched finished after 4.715 seconds |
|
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2468mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: Running expand_scheduling_units |
|
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: expand_scheduling_units finished after 0.027 seconds |
|
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2218mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: Running address_rotation_sb |
|
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:34Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 9197 PSUM Banks |
|
2025-09-05T19:19:34Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 5583 PSUM Banks |
|
2025-09-05T19:19:35Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-09-05T19:19:35Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 2554 Sb address |
|
2025-09-05T19:19:36Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 2329 Sb address |
|
2025-09-05T19:19:36Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:36Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:37Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 59 Sb address |
|
2025-09-05T19:19:37Z INFO 2007 [DMAOptimizationBase]: moved 0 MM forward |
|
2025-09-05T19:19:37Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:38Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-09-05T19:19:38Z USER 2007 [ModuleForkPass]: address_rotation_sb finished after 5.695 seconds |
|
2025-09-05T19:19:38Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2254mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:38Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:38Z USER 2007 [ModuleForkPass]: Running anti_dependency_analyzer |
|
2025-09-05T19:19:38Z INFO 2007 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:38Z INFO 2007 [AntiDependencyAnalyzer]: Batch size: 1000 |
|
2025-09-05T19:19:38Z INFO 2007 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} |
|
2025-09-05T19:19:38Z INFO 2007 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 |
|
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: anti_dependency_analyzer finished after 1.349 seconds |
|
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2285mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: Running anti_dependency_analyzer |
|
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:39Z INFO 2007 [AntiDependencyAnalyzer]: Batch size: 1000 |
|
2025-09-05T19:19:39Z INFO 2007 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} |
|
2025-09-05T19:19:39Z INFO 2007 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 |
|
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: anti_dependency_analyzer finished after 0.188 seconds |
|
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2287mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: Running dep_opt |
|
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:39Z INFO 2007 [build_flow_deps]: Start build fdeps. Invocation: 3Fri Sep 5 19:19:39 2025 |
|
2025-09-05T19:19:39Z INFO 2007 [build_flow_deps]: Allocs: 69132 instructions: 285701 |
|
2025-09-05T19:19:40Z INFO 2007 [build_flow_deps]: Build fdeps inserted 701433 edges |
|
2025-09-05T19:19:40Z INFO 2007 [build_flow_deps]: Done build fdeps 701433 Fri Sep 5 19:19:40 2025 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: dep_opt finished after 1.572 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running report_stats |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: Data Movement Statistics: sg0000 |
|
βββββββββββββββ¬βββββββββββββββββββββββββββββ¬ββββββββ¬βββββββββββββ |
|
β Instruction β Kind β Count β Bytes β |
|
βββββββββββββββΌβββββββββββββββββββββββββββββΌββββββββΌβββββββββββββ€ |
|
β Load β Const -> Internal β 1 β 32768 β |
|
β Load β ExternalInput -> Internal β 7105 β 7114072064 β |
|
β Save β Internal -> ExternalOutput β 7105 β 7114072064 β |
|
βββββββββββββββ΄βββββββββββββββββββββββββββββ΄ββββββββ΄βββββββββββββ |
|
|
|
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: |
|
βββββββββββββββββββββββ¬ββββββββ |
|
β Bytes per partition β Count β |
|
βββββββββββββββββββββββΌββββββββ€ |
|
β 64 β 65 β |
|
β 256 β 66 β |
|
β 7168 β 4096 β |
|
β 8192 β 9984 β |
|
βββββββββββββββββββββββ΄ββββββββ |
|
|
|
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: MM Stats: #MatMults 217153 #MatMult-Transposes 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: IO Tensor size combined: 14496579584 |
|
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: IO Tensor Statistics: |
|
ββββββββββββββββββββββ¬βββββββββββββββββ¬βββββββββββ¬βββββββββββββββ |
|
β Largest IO Tensors β Kind β Src Type β Size (Bytes) β |
|
ββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββΌβββββββββββββββ€ |
|
β output289 β ExternalOutput β bfloat16 β 134217728 β |
|
β input0 β ExternalInput β bfloat16 β 134217728 β |
|
β output0 β ExternalOutput β bfloat16 β 134217728 β |
|
β input289 β ExternalInput β bfloat16 β 134217728 β |
|
β input7 β ExternalInput β bfloat16 β 58720256 β |
|
β input9 β ExternalInput β bfloat16 β 58720256 β |
|
β input24 β ExternalInput β bfloat16 β 58720256 β |
|
β input18 β ExternalInput β bfloat16 β 58720256 β |
|
β input15 β ExternalInput β bfloat16 β 58720256 β |
|
β input27 β ExternalInput β bfloat16 β 58720256 β |
|
ββββββββββββββββββββββ΄βββββββββββββββββ΄βββββββββββ΄βββββββββββββββ |
|
|
|
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: Large (Internal) Tensor Statistics: |
|
ββββββββββββββββββββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββββββ |
|
β Largest Tensors β Kind β Src Type β Size (Bytes) β |
|
ββββββββββββββββββββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββββββ€ |
|
β DynamicDMAScratchLoc β Internal β uint8 β 2097152 β |
|
β t2035_pftranspose_18320_i5 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i2 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i1 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i3 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i4 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i6 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i9 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i8 β Internal β bfloat16 β 1048576 β |
|
β t2035_pftranspose_18320_i7 β Internal β bfloat16 β 1048576 β |
|
ββββββββββββββββββββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββββββ |
|
|
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: report_stats finished after 0.059 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 28.058 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running assign_trigger_engine |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to assign_trigger_engine: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [AssignTriggerEngine]: Assigned trigger engine for 0 DMA instructions. Moved 0 DMA instructions to CC's engines. |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: assign_trigger_engine finished after 0.097 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: Running lower_local_collectives |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: lower_local_collectives finished after 0.004 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: Running extend_shared_lifetimes |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: extend_shared_lifetimes finished after 0.003 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: Running dead_code_elim |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [DeadCodeElim]: eliminateDeadStore removed 0 instructions |
|
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: dead_code_elim finished after 0.245 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.266 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running assign_hwdge_engine |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: assign_hwdge_engine finished after 0.027 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running alloc_queues |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [AllocQueues]: DMACopy transpose will be triggered from multiple engines |
|
2025-09-05T19:19:41Z INFO 2007 [AllocQueues]: Alloc Queue info: |
|
βββββββββββββββββββ¬βββββββββββββββββ¬βββββββββ¬βββββββββββββ¬βββββββββββββββββββ |
|
β Name β DMAQueue::Type β Engine β Num Queues β Num instructions β |
|
βββββββββββββββββββΌβββββββββββββββββΌβββββββββΌβββββββββββββΌβββββββββββββββββββ€ |
|
β qSPSpillReload0 β data β SP β 16 β 1 β |
|
β qPoolDynamic β dynamic β Pool β 16 β 14210 β |
|
βββββββββββββββββββ΄βββββββββββββββββ΄βββββββββ΄βββββββββββββ΄βββββββββββββββββββ |
|
|
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: alloc_queues finished after 0.030 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running chain_dma_transposes |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: chain_dma_transposes finished after 0.003 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running prefetch_scheduling_after_sched |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.003 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running lower_control |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps |
|
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: lower_control finished after 0.197 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.252 seconds |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running nc_parallel_pass |
|
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to nc_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z USER 2007 [CoreForkPass]: Running dep_reduction |
|
2025-09-05T19:19:41Z INFO 2007 [CoreForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:41Z INFO 2007 [DepReduction]: Start Dependency Reduction |
|
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing async instrs... |
|
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing secondary edges per engine... |
|
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 484672 |
|
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing redundant descendants, Done. Num edges removed 497220 |
|
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing async instrs, Done. Num edges removed 497220 |
|
2025-09-05T19:19:43Z INFO 2007 [DepReduction]: Num Async removed: 0 |
|
2025-09-05T19:19:43Z INFO 2007 [DepReduction]: Finished dependency reduction: 1173793 removed, new total 115343 |
|
2025-09-05T19:19:43Z INFO 2007 [DepReduction]: Finished Dependency Reduction |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: dep_reduction finished after 1.714 seconds |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running lower_dynamic_dma |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: lower_dynamic_dma finished after 0.048 seconds |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running legalize_dynamic_dma |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z INFO 2007 [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 0 DGE instructions |
|
2025-09-05T19:19:43Z INFO 2007 [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 0 DGE instructions were scanned |
|
2025-09-05T19:19:43Z INFO 2007 [LegalizeDynamicDMA]: |
|
βββββββββββββ¬ββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββ |
|
β Sub-Pass β Illegal Instructions Detected β New Instructions Generated β |
|
βββββββββββββΌββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββ€ |
|
β Peeling β 0 β 0 β |
|
β Unrolling β 0 β 0 β |
|
β Splitting β 0 β 0 β |
|
βββββββββββββ΄ββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββ |
|
|
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: legalize_dynamic_dma finished after 0.100 seconds |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running lower_dma |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to lower_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z INFO 2007 [LowerDMA]: lower_dma metrics start |
|
IO |
|
Copy (DGE/DMA) |
|
128 partition : 14145/14145 (100% DGE) |
|
power-of-2 partition : 14210/14210 (100% DGE) |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 14210/14210 (100% DGE) |
|
Cast (DGE/DMA) |
|
128 partition : 0/0 |
|
power-of-2 partition : 0/0 |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 0/0 |
|
Spill/Reload |
|
Copy (DGE/DMA) |
|
128 partition : 0/1 (0% DGE) |
|
power-of-2 partition : 0/1 (0% DGE) |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 0/1 (0% DGE) |
|
Cast (DGE/DMA) |
|
128 partition : 0/0 |
|
power-of-2 partition : 0/0 |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 0/0 |
|
CopyMode |
|
CCE : 0 |
|
Transpose : 0 |
|
Replicate : 0 |
|
Dynamic (DGE/DMA) |
|
scalar : 0/0 |
|
vector : 0/0 |
|
Opcode |
|
ReadVarAddr : 0 |
|
IndirectLoad : 0 |
|
IndirectSave : 0 |
|
IndirectSaveAccumulate : 0 |
|
DstReduceDGE : 0 |
|
lower_dma metrics end |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: lower_dma finished after 0.145 seconds |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running coalesce_dma_blocks |
|
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to coalesce_dma_blocks: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z INFO 2007 [CoalesceDmaBlocks]: Coaleseced 0 DMA triggers |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: coalesce_dma_blocks finished after 0.133 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running expand_all_engine |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: expand_all_engine finished after 0.050 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running alloc_semaphores |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: alloc_semaphores finished after 0.242 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running expand_inst_late |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: expand_inst_late finished after 0.262 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running seq_inst_opt |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z INFO 2007 [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: seq_inst_opt finished after 0.027 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running lower_sync |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to lower_sync: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: lower_sync finished after 0.102 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300967 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running lower_act |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to lower_act: modules=1 functions=1 allocs=69132 blocks=1 instructions=300967 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: lower_act finished after 0.036 seconds |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running lower_dve |
|
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to lower_dve: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:44Z INFO 2007 [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/conda/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json |
|
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: lower_dve finished after 0.303 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: curr_vmrss: 2296mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: Running lower_ap |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Inputs to lower_ap: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: lower_ap finished after 0.066 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: curr_vmrss: 2177mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: Running coloring_allocator_reg |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions |
|
2025-09-05T19:19:45Z INFO 2007 [ColoringAllocator::Rep]: linearize and check |
|
2025-09-05T19:19:45Z INFO 2007 [REG_Allocator]: allocating REG |
|
2025-09-05T19:19:45Z INFO 2007 [REG_Allocator]: main loop iteration 1 |
|
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: coloring_allocator_reg finished after 0.044 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: nc_parallel_pass finished after 3.389 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [ModuleForkPass]: Running birverifier |
|
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [ModuleForkPass]: birverifier finished after 0.254 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.261 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [SubgraphForkPass]: Running lnc_verifier |
|
2025-09-05T19:19:45Z INFO 2007 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [SubgraphForkPass]: lnc_verifier finished after 0.003 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.010 seconds |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: Running mod_parallel_pass |
|
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z USER 2007 [ModuleForkPass]: Running codegen |
|
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: Inputs to codegen: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:45Z INFO 2007 [Codegen]: Total compiler allocated DRAM tensors: 0 GB |
|
2025-09-05T19:19:45Z INFO 2007 [Codegen]: Total un-allocated DRAM tensors by kind: |
|
2025-09-05T19:19:45Z INFO 2007 [Codegen]: |
|
βββββββββββββββββ¬ββββββββββββββ |
|
β TensorKind β Size (GB) β |
|
βββββββββββββββββΌββββββββββββββ€ |
|
β ExternalInput β 6.7505 β |
|
β Const β 3.05176e-05 β |
|
βββββββββββββββββ΄ββββββββββββββ |
|
|
|
2025-09-05T19:19:45Z INFO 2007 [Codegen]: Total runtime managed DRAM tensors: 6.75053 GB |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Instruction Stats: |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: |
|
βββββββββββββββββββββββ¬βββββββββ |
|
β Opcode β Count β |
|
βββββββββββββββββββββββΌβββββββββ€ |
|
β LDWEIGHTS β 217153 β |
|
β MATMUL β 217153 β |
|
β ACTIVATE β 54337 β |
|
β EVENT_SEMAPHORE β 15257 β |
|
β UNKNOWN(0xd4) β 14210 β |
|
β NOP β 8 β |
|
β PSEUDO_BRANCH_LABEL β 5 β |
|
β ACT_TABLE_LOAD β 1 β |
|
β PSEUDO_DMA_TRIGGER β 1 β |
|
βββββββββββββββββββββββ΄βββββββββ |
|
|
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: |
|
ββββββββββββββ¬βββββββββ |
|
β Engine β Count β |
|
ββββββββββββββΌβββββββββ€ |
|
β Unassigned β 0 β |
|
β GPSIMD β 20924 β |
|
β Scalar β 56947 β |
|
β Tensor β 440254 β |
|
β SyncDMA β 0 β |
|
β Vector β 2 β |
|
β Sync β 3 β |
|
β All β 0 β |
|
ββββββββββββββ΄βββββββββ |
|
|
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Total instructions: 518130 (0.030883 GB) |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Total DynamicDMA instruction count: 14210 |
|
2025-09-05T19:19:46Z USER 2007 [Codegen]: isa_gen finished after 1.105 seconds |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Number of DMA descriptors on each queue instance: |
|
βββββββββββββββββββ¬βββββββββββββββββ |
|
β Queue Instance β RT Descriptors β |
|
βββββββββββββββββββΌβββββββββββββββββ€ |
|
β qSPSpillReload0 β 256 β |
|
βββββββββββββββββββ΄βββββββββββββββββ |
|
|
|
Total descriptors: 256 (3.8147e-06 GB) |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Number of DMA engines used by each queue: |
|
βββββββββββββββββββ¬ββββββββββββββββββββββ |
|
β Queue β DMA Engines β |
|
βββββββββββββββββββΌββββββββββββββββββββββ€ |
|
β qSPSpillReload0 β 16 β |
|
β qPoolDynamic β 16 β |
|
βββββββββββββββββββΌββββββββββββββββββββββ€ |
|
β TOTAL β 32 (must be <= 176) β |
|
βββββββββββββββββββ΄ββββββββββββββββββββββ |
|
|
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Tensors with largest descriptor count: |
|
ββββββββββββββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββββββββββ |
|
β Tensor Name β Kind β Src Type β Descriptor Count β |
|
ββββββββββββββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββββββββββ€ |
|
β identity_local_22027 β Internal β bfloat16 β 1 β |
|
β identity_22025 β Const β bfloat16 β 1 β |
|
ββββββββββββββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββββββββββ |
|
|
|
2025-09-05T19:19:46Z USER 2007 [Codegen]: dma_desc_gen finished after 0.004 seconds |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Estimated peak DRAM usage: 6.78141 GB |
|
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Generating debug info |
|
2025-09-05T19:19:47Z USER 2007 [Codegen]: debug_info_gen finished after 0.568 seconds |
|
2025-09-05T19:19:47Z USER 2007 [ModuleForkPass]: codegen finished after 1.742 seconds |
|
2025-09-05T19:19:47Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2385mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:47Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:47Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 1.775 seconds |
|
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: curr_vmrss: 2204mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:47Z USER 2007 [BackendPassManager]: Running neff_packager |
|
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: Inputs to neff_packager: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:47Z WARNING 2007 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.169490.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json |
|
2025-09-05T19:19:47Z INFO 2007 [NeffFileWriter]: Neff will be written to: /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff |
|
2025-09-05T19:19:47Z INFO 2007 [NeffFileWriter]: IR signature: b43fb3a95505587f066b6d2d419d7150 for neff artifacts |
|
2025-09-05T19:19:47Z USER 2007 [BackendPassManager]: neff_packager finished after 0.299 seconds |
|
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: curr_vmrss: 2205mb, ru_maxrss: 2541mb (delta=0mb) |
|
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153 |
|
2025-09-05T19:19:47Z INFO 2007 [BackendDriver]: HBM scratchpad usage summary (post-allocation): |
|
ββββββββ¬ββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬ββββββββββββββ |
|
β Core β Subgraph β Description β Value β |
|
ββββββββΌββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌββββββββββββββ€ |
|
β nc00 β module β Peak scratchpad usage: local β 0.000000 GB β |
|
β nc00 β module β Total size of allocated tensors: local β 0.000000 GB β |
|
β nc00 β Max β Peak scratchpad usage: local β 0.000000 GB β |
|
β nc00 β Post-link β Peak scratchpad usage after intermediate tensor allocation β 0.000000 GB β |
|
β nc00 β Post-link β Total size of allocated intermediate tensors β 0.000000 GB β |
|
ββββββββΌββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌββββββββββββββ€ |
|
β Max β Max β Peak scratchpad usage β 0.000000 GB β |
|
β Max β Max β Peak scratchpad usage (page-aligned) β 0.000000 GB β |
|
ββββββββ΄ββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄ββββββββββββββ |
|
|
|
2025-09-05T19:19:47Z INFO 2007 [BackendDriver]: Backend completed successfully, tearing down. |
|
2025-09-05T19:19:48Z INFO 1971 [job.WalrusDriver.0]: Job #0 finished |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 |
|
2025-09-05T19:19:48Z INFO 1971 [job.BIRLinker.0]: Replay this job by calling: /opt/conda/bin/neuronx-cc compile --framework XLA --state '{"model": ["/models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo"], "tensormap": "tensor_map.json", "bir": "bir.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "state_dir": "/opt/vllm/neuronxcc-gss0wi_w/sg00", "state_id": "sg00"}' --pipeline BIRLinker |
|
2025-09-05T19:19:48Z INFO 1971 [job.BIRLinker.0]: BIRLinker cwd: /opt/vllm/neuronxcc-gss0wi_w |
|
2025-09-05T19:19:48Z INFO 1971 [job.BIRLinker.0]: Linking not needed. Netlist doesnt exist |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.Kelper.0 |
|
2025-09-05T19:19:48Z INFO 1971 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.Kelper.0 |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 |
|
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 |
|
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Processing input #0 |
|
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Start NeffWrapper |
|
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Executing: /opt/conda/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo --neff /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff --io_transposes /opt/vllm/neuronxcc-gss0wi_w/io_transposes.json --output /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/wrapped_neff.hlo --netlist /opt/vllm/neuronxcc-gss0wi_w/hlo_netlist.json |
|
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Could not open file: /opt/vllm/neuronxcc-gss0wi_w/hlo_netlist.json |
|
There are no io transposes nor zero-sized parameters. Output will not be produced. |
|
Hlo neff wrapper finished successfully. Have a wonderful day :D |
|
|
|
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Job #0 finished |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished pipeline Pipeline |
|
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Job #0 finished |
|
2025-09-05T19:19:48Z INFO 1939 [root]: Subcommand returned with exitcode=0 |
|
|