dimabr's picture
Add files using upload-large-folder tool
345c745 verified
2025-09-05T19:18:35Z INFO 1939 [root]: /opt/conda/bin/neuronx-cc compile /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo --framework XLA --target trn1 --output /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff --model-type=transformer -O1 --lnc=1 '--internal-hlo2tensorizer-options=--experimental-unsafe-fp8e4m3fn-as-fp8e4m3 --verify-hlo=false' --logfile=/models/mistral-7b-v0.3-instruct-neuronx/layout_opt/log-neuron-cc.txt --verbose=35
2025-09-05T19:18:41Z INFO 1939 [root]: NeuronX Compiler version 2.20.9961.0+0acef03a Python version 3.10.17 HWM version 2.20.0.9961+0acef03a NumPy version 1.26.4
2025-09-05T19:18:41Z INFO 1971 [root]: XLA detected
2025-09-05T19:18:41Z INFO 1971 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper
2025-09-05T19:18:41Z INFO 1971 [root]: Intermediate files stored in /opt/vllm/neuronxcc-gss0wi_w, output in /opt/vllm
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Processing input #0
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Processing input #0
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: IR signature: b7a76fc5f3f76d1d69d57e0e784721bafd07e3a61734f6594e8c815123a8a771 for graph.hlo
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Executing: /opt/conda/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --experimental-unsafe-fp8e4m3fn-as-fp8e4m3 --verify-hlo=false --native-to-custom-softmax --partitioner-opts='--transformer'
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: DEBUG: needsModular? No. macCnt 0 num non-trivial Ops 290
INFO: Switching to single-module compile. PrePartitionPipe skipped.
INFO: Found memory bound graph
INFO: Number of Native SoftmaxDx's detected and replaced: 0
INFO: Number of Native Softmax's detected and replaced: 0
Replaced 0 dropout sequences with OffloadedDropout
INFO: HloMacCount has found 0
INFO: Traffic has found 7248289792
INFO: AIF 0
HLO Ops used in computation: parameter reshape transpose tuple
Warning: Could not open file debug_info_hlo_partitions.json
2025-09-05 19:18:41.368875: W hilo/hlo2penguin/utils/DumpDebugInfo.cc:52] Truncating long HLO operator name %last = tuple(%p68, %transpose.290, %transpose.291, %transpose.292, %transpose.293, %transpose.294, %transpose.295, %transpose.296, %transpose.297, %transpose.298, %transpose.299, %transpose.300, %transpose.301, %transpose.302, %transpose.303, %transpose.304, %transpose.305, %transpose.306, %transpose.307, %transpose.308, %transpose.309, %transpose.310, %transpose.311, %transpose.312, %transpose.313, %transpose.314, %transpose.315, %transpose.316, %transpose.317, %transpose.318, %transpose.319, %transpos... to 512 characters in the compiler's debug metadata
Invoking RemoveOptimizationBarriers pass
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: IR signature: 2e0dd598350cceee9da216d825f016375db09d3b92ca63d286d74eb4016c4d07 for sg0000/HLOToTensorizer
2025-09-05T19:18:41Z INFO 1971 [job.HLOToTensorizer.0]: Job #0 finished
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0
2025-09-05T19:18:41Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.Frontend.0
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Job Frontend len(in_states) 1
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Processing input #0
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Start model loading
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Start tensorization
2025-09-05T19:18:41Z INFO 1971 [job.Frontend.0]: Num jobs: 32
2025-09-05T19:18:41Z USER 1971 [root/Tensorizer/Tensorizer]: Running Tensorizer
2025-09-05T19:18:41Z INFO 1971 [Tensorizer]: Frontend did not find netlist info. Switching to flat flow.
2025-09-05T19:18:41Z INFO 1971 [Tensorizer]: Building model from Penguin script "penguin.py"...
2025-09-05T19:18:41Z INFO 1971 [Tensorizer]: Tensorizer options: --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=matmult-bf16 --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Finished (changed=True)
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False)
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.003 seconds
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False)
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.004 seconds
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True)
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.032 seconds
2025-09-05T19:18:41Z INFO 1971 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.011 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.004 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.033 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.047 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.014 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.004 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.068 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.006 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.060 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.005 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.005 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.016 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Rematerialization]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PadElimination]: Running PadElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PadElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Recompute]: Running Recompute
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Recompute]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.000 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [Tensorizer]: After optimization: 290 statements
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DoNothing]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AutoCastTCInputs]: Running AutoCastTCInputs
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AutoCastTCInputs]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AutoCastTCInputs]: AutoCastTCInputs finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.007 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.038 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.019 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.005 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.035 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.017 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.016 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.004 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.031 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PGTiling]: Running PGTiling
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.048 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.007 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.003 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.019 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.114 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.002 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False)
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.001 seconds
2025-09-05T19:18:42Z INFO 1971 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=True)
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 14.818 seconds
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 15.072 seconds
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]:
20 MACROS WITH LARGEST INSTRUCTION COUNTS:
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 4096: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False)
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.174 seconds
2025-09-05T19:18:57Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.101 seconds
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True)
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.654 seconds
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.101 seconds
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.030 seconds
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False)
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.023 seconds
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
2025-09-05T19:18:58Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.073 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.084 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.148 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]:
20 MACROS WITH LARGEST INSTRUCTION COUNTS:
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 4096: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1792: transpose_128x128
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.029 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.107 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.053 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.106 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.022 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.080 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.018 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.006 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.066 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False)
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.087 seconds
2025-09-05T19:18:59Z INFO 1971 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True)
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 0.437 seconds
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False)
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.109 seconds
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=False)
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.068 seconds
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/LICM]: Running LICM
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/LICM]: Finished (changed=False)
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/LICM]: LICM finished after 0.031 seconds
2025-09-05T19:19:00Z INFO 1971 [sg0000/Tensorizer/SundaISel]: Running SundaISel
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/SundaISel]: Finished (changed=True)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.472 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=True)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.039 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.051 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.022 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.018 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.014 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.076 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.018 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.070 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=False)
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.092 seconds
2025-09-05T19:19:01Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 1.250 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.035 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.016 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.023 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.009 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.008 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/DeConcat]: Running DeConcat
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/DeConcat]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.001 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.017 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.007 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.008 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.069 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False)
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.004 seconds
2025-09-05T19:19:02Z INFO 1971 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.122 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.038 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.415 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.016 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.107 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.013 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SpillPSum]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.120 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.015 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.012 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True)
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.089 seconds
2025-09-05T19:19:03Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.063 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.137 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.013 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.114 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.029 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.015 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.014 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.025 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.050 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.010 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DataStreaming]: Finished (changed=False)
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.026 seconds
2025-09-05T19:19:04Z INFO 1971 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 2.728 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.050 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.015 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.015 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs:
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 583.920us (128.000MiB, est bw: 229.856GB/s, 1.157% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (128, 128, 4096) %'19629.24343'[T_i0,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (128, 128, 4096) %'input289'[T_i0,i0.128,i1.4096] # id=23881, src_id=None, , instances=128 # dl = tensor_op_name: t4051_pftranspose_19629 | hlo_id: 2097 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 349.240us (128.000MiB, est bw: 384.314GB/s, 0.692% of tot. time) for bfloat16<128 x 4096> bfloat16 (128, 128, 32, 128) %'output289'[16i0_18312_0+i0_18312_1,i0.128,i2.32,i1.128] = store bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (128, 128, 4096) %'t4051_pftranspose_19629'[16i0_18312_0+i0_18312_1,i0.128,i1.128+128i2.32] # id=21501, src_id=None, , instances=128 # dl = tensor_op_name: _transpose.578 | hlo_id: 2097 | [[i0.128];[i1.128, i2.32]] -> [[i0.128];[i1.128, i2.32]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18341.23903'[T_i0,T_i2_26087,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input6'[T_i0,i0.128,T_i2_26087,i1.3584] # id=22057, src_id=None, , instances=64 # dl = tensor_op_name: t2070_pftranspose_18341 | hlo_id: 1531 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18382.23917'[T_i0,T_i2_26095,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input15'[T_i0,i0.128,T_i2_26095,i1.3584] # id=22115, src_id=None, , instances=64 # dl = tensor_op_name: t2133_pftranspose_18382 | hlo_id: 1549 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18423.23931'[T_i0,T_i2_26103,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input24'[T_i0,i0.128,T_i2_26103,i1.3584] # id=22173, src_id=None, , instances=64 # dl = tensor_op_name: t2196_pftranspose_18423 | hlo_id: 1567 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18464.23945'[T_i0,T_i2_26111,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input33'[T_i0,i0.128,T_i2_26111,i1.3584] # id=22231, src_id=None, , instances=64 # dl = tensor_op_name: t2259_pftranspose_18464 | hlo_id: 1585 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18505.23959'[T_i0,T_i2_26119,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input42'[T_i0,i0.128,T_i2_26119,i1.3584] # id=22289, src_id=None, , instances=64 # dl = tensor_op_name: t2322_pftranspose_18505 | hlo_id: 1603 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18546.23973'[T_i0,T_i2_26127,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input51'[T_i0,i0.128,T_i2_26127,i1.3584] # id=22347, src_id=None, , instances=64 # dl = tensor_op_name: t2385_pftranspose_18546 | hlo_id: 1621 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18587.23987'[T_i0,T_i2_26135,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input60'[T_i0,i0.128,T_i2_26135,i1.3584] # id=22405, src_id=None, , instances=64 # dl = tensor_op_name: t2448_pftranspose_18587 | hlo_id: 1639 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 262.710us (56.000MiB, est bw: 223.518GB/s, 0.521% of tot. time) for bfloat16<128 x 3584> TongaSB partitions[2] bfloat16 (32, 2, 128, 3584) %'18628.24001'[T_i0,T_i2_26143,i0.128,i1.3584] = load bfloat16<128 x 3584> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3584) %'input69'[T_i0,i0.128,T_i2_26143,i1.3584] # id=22463, src_id=None, , instances=64 # dl = tensor_op_name: t2511_pftranspose_18628 | hlo_id: 1657 | [[i0.128];[i1.3584]] -> [[i0.128];[i1.3584]]
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.029 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.015 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.304 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler
2025-09-05T19:19:07Z WARNING 1971 [sg0000/Tensorizer/StaticProfiler]: matmul-based transposes inserted by penguin takes up 100.00 percent of all matmul computation
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.035 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.130 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.032 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.038 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False)
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds
2025-09-05T19:19:07Z INFO 1971 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop
2025-09-05T19:19:08Z INFO 1971 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False)
2025-09-05T19:19:08Z INFO 1971 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.618 seconds
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: BirCodeGen estimate #instances=285991 in sg0000
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: IR signature: 868d010c2395728dde8cb4de874491d6109b3553bfa434107fadd7604cffd131 for nc00/sg0000/TensorizerBIR
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: Weights total number of bytes: 131072
2025-09-05T19:19:09Z INFO 1971 [Tensorizer]: Successfully built model.
2025-09-05T19:19:09Z USER 1971 [root/Tensorizer/Tensorizer]: Tensorizer finished after 27.551 seconds
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: End tensorization
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input0
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input1
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input2
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input3
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input4
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input5
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input6
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input7
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input8
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input9
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input10
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input11
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input12
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input13
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input14
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input15
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input16
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input17
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input18
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input19
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input20
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input21
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input22
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input23
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input24
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input25
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input26
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input27
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input28
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input29
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input30
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input31
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input32
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input33
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input34
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input35
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input36
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input37
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input38
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input39
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input40
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input41
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input42
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input43
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input44
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input45
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input46
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input47
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input48
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input49
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input50
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input51
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input52
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input53
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input54
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input55
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input56
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input57
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input58
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input59
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input60
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input61
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input62
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input63
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input64
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input65
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input66
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input67
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input68
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input69
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input70
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input71
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input72
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input73
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input74
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input75
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input76
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input77
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input78
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input79
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input80
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input81
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input82
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input83
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input84
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input85
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input86
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input87
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input88
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input89
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input90
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input91
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input92
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input93
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input94
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input95
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input96
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input97
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input98
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input99
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input100
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input101
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input102
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input103
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input104
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input105
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input106
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input107
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input108
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input109
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input110
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input111
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input112
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input113
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input114
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input115
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input116
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input117
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input118
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input119
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input120
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input121
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input122
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input123
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input124
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input125
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input126
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input127
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input128
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input129
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input130
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input131
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input132
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input133
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input134
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input135
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input136
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input137
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input138
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input139
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input140
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input141
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input142
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input143
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input144
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input145
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input146
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input147
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input148
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input149
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input150
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input151
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input152
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input153
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input154
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input155
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input156
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input157
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input158
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input159
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input160
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input161
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input162
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input163
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input164
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input165
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input166
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input167
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input168
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input169
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input170
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input171
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input172
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input173
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input174
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input175
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input176
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input177
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input178
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input179
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input180
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input181
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input182
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input183
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input184
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input185
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input186
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input187
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input188
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input189
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input190
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input191
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input192
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input193
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input194
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input195
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input196
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input197
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input198
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input199
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input200
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input201
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input202
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input203
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input204
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input205
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input206
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input207
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input208
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input209
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input210
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input211
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input212
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input213
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input214
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input215
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input216
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input217
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input218
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input219
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input220
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input221
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input222
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input223
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input224
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input225
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input226
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input227
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input228
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input229
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input230
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input231
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input232
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input233
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input234
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input235
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input236
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input237
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input238
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input239
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input240
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input241
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input242
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input243
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input244
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input245
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input246
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input247
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input248
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input249
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input250
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input251
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input252
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input253
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input254
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input255
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input256
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input257
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input258
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input259
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input260
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input261
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input262
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input263
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input264
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input265
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input266
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input267
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input268
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input269
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input270
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input271
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input272
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input273
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input274
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input275
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input276
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input277
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input278
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input279
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input280
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input281
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input282
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input283
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input284
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input285
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input286
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input287
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input288
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input289
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Network input: input290
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: wrote bir.json
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: wrote tensor_map.json
2025-09-05T19:19:09Z INFO 1971 [job.Frontend.0]: Job #0 finished
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.Frontend.0
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0
2025-09-05T19:19:09Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: BackendDriver has 1 states with 1 core LNC
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: BackendDriver: no partitions found. Switching to flat flow.
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Job WalrusDriver len(in_states) 1
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Processing input #0
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: BackendDriver in_state.num_states 1 with 1 core LNC
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Executing /opt/conda/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/log-neuron-cc.txt --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-color-rotation --max-load-lower-bound 0.14 --mm-reorder-opt --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json /opt/conda/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/conda/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json --unified-backend-and-legacy-codegen --tensor-map tensor_map.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --dge-levels vector_dynamic_offsets,scalar_dynamic_offset,io --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: Working directory is /opt/vllm/neuronxcc-gss0wi_w/sg00
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: propagate_exit=True
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: use_logger=False
2025-09-05T19:19:09Z INFO 1971 [job.WalrusDriver.0]: expose_stderr=True
2025-09-05T19:19:09Z INFO 2007 [Logging]: Logging to ../../../../models/mistral-7b-v0.3-instruct-neuronx/layout_opt/log-neuron-cc.txt at level 'INFO'
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: max_allowed_parallelism=32
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: Backend driver mtBackend: false numModules: 1 Cwd: "/opt/vllm/neuronxcc-gss0wi_w/sg00"
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: DynamicDMA is enabled
2025-09-05T19:19:09Z INFO 2007 [BackendDriver]: DynamicDMA levels being enabled: io, scalar_dynamic_offset, vector_dynamic_offsets,
2025-09-05T19:19:09Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:09Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:09Z USER 2007 [ModuleForkPass]: Running do_nothing
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:09Z USER 2007 [ModuleForkPass]: do_nothing finished after 0.002 seconds
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: curr_vmrss: 175mb, ru_maxrss: 419mb (delta=0mb)
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290
2025-09-05T19:19:09Z USER 2007 [ModuleForkPass]: Running birverifier
2025-09-05T19:19:09Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: birverifier finished after 0.275 seconds
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=542mb)
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.282 seconds
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=542mb)
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [SubgraphForkPass]: Running lnc_verifier
2025-09-05T19:19:10Z INFO 2007 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds
2025-09-05T19:19:10Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=0mb)
2025-09-05T19:19:10Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.001 seconds
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=0mb)
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:10Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: Running expand_replication
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z INFO 2007 [ExpandReplication]: Found 0 replicated matmults
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: expand_replication finished after 0.001 seconds
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: curr_vmrss: 961mb, ru_maxrss: 961mb (delta=0mb)
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1455 memory location(s), 1 block(s), and 775 instruction(s). Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z USER 2007 [ModuleForkPass]: Running unroll
2025-09-05T19:19:10Z INFO 2007 [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=1455 blocks=1 instructions=775 Max writers: 1 Max Readers: 290
2025-09-05T19:19:10Z INFO 2007 [Unroll]: INFO (Unroll) Start unrolling at Fri Sep 5 19:19:10 2025
2025-09-05T19:19:12Z INFO 2007 [Unroll]: INFO (Unroll) DONE unrolling Fri Sep 5 19:19:10 2025
2025-09-05T19:19:12Z INFO 2007 [Unroll]: sg0000 Instruction count after Unroll:
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Total count: 285701
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Matmult: 217153
2025-09-05T19:19:12Z INFO 2007 [Unroll]: GenericCopy: 54337
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Load: 7106
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Save: 7105
2025-09-05T19:19:12Z INFO 2007 [Unroll]: Unrolled DGE count with Dynamic AP: 0
2025-09-05T19:19:12Z USER 2007 [ModuleForkPass]: unroll finished after 2.471 seconds
2025-09-05T19:19:12Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2540mb, ru_maxrss: 2541mb (delta=1580mb)
2025-09-05T19:19:12Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69807 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 2.543 seconds
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: curr_vmrss: 1679mb, ru_maxrss: 2541mb (delta=1580mb)
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69807 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69807 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z USER 2007 [SubgraphForkPass]: Running dead_code_elim
2025-09-05T19:19:12Z INFO 2007 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=69807 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:12Z INFO 2007 [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys
2025-09-05T19:19:12Z USER 2007 [SubgraphForkPass]: dead_code_elim finished after 0.257 seconds
2025-09-05T19:19:12Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 1699mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:12Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.266 seconds
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: curr_vmrss: 1699mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:12Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:12Z USER 2007 [ModuleForkPass]: Running birverifier
2025-09-05T19:19:12Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: birverifier finished after 0.232 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.239 seconds
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [SubgraphForkPass]: Running lnc_verifier
2025-09-05T19:19:13Z INFO 2007 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [SubgraphForkPass]: lnc_verifier finished after 0.003 seconds
2025-09-05T19:19:13Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.010 seconds
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:13Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running instruction_reorder
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: instruction_reorder finished after 0.027 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running psum_legalization
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: psum_legalization finished after 0.031 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running legalize_cce_dma
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: legalize_cce_dma finished after 0.014 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running error_injector
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z WARNING 2007 [ErrorInjector]: Unrecognized injected error value "0"
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: error_injector finished after 0.003 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1753mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running vn_splitter
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z INFO 2007 [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0
2025-09-05T19:19:13Z INFO 2007 [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0
2025-09-05T19:19:13Z INFO 2007 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0
2025-09-05T19:19:13Z INFO 2007 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0
2025-09-05T19:19:13Z INFO 2007 [VNSplitterPass]: INFO (VNSplitter) Time: 0.004 seconds
2025-09-05T19:19:13Z INFO 2007 [VNSplitterPass]: INFO (VerticalFusion) Time: 0.073 seconds
2025-09-05T19:19:13Z INFO 2007 [VNSplitterPass]: INFO (ShrinkDN) Time: 0.075 seconds
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: vn_splitter finished after 0.186 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1764mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running constant_propagate
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:13Z INFO 2007 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: constant_propagate finished after 0.482 seconds
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1768mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:13Z USER 2007 [ModuleForkPass]: Running lower_ac
2025-09-05T19:19:13Z INFO 2007 [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z INFO 2007 [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies.
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: lower_ac finished after 0.026 seconds
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1768mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running input_dma_coalescing
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z INFO 2007 [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: input_dma_coalescing finished after 0.078 seconds
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1768mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running remat_optimization
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z INFO 2007 [RematOpt]: Removed 0 remat instructions
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: remat_optimization finished after 0.174 seconds
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running early_peephole_opts
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z INFO 2007 [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true
2025-09-05T19:19:14Z INFO 2007 [EarlyPeepholeOpts]: Activation Accumulate: 0
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: early_peephole_opts finished after 0.048 seconds
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running coalesce_multichannel_cc_ops
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.015 seconds
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running infer_stream_ids
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: infer_stream_ids finished after 0.015 seconds
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1772mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z USER 2007 [ModuleForkPass]: Running pre_sched
2025-09-05T19:19:14Z INFO 2007 [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start PRE scheduling 2 cores: 1 at: Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: Start...
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: Found 0 Splits CCs
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: Grouped CCs to 0 clusters.
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts
2025-09-05T19:19:14Z INFO 2007 [LayerSpiller]: LayerSpill: Done.
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start split live ranges Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Num_Splits: 0
2025-09-05T19:19:14Z INFO 2007 [PreSched]: End split live ranges Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Strt remove redundncies Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_memsets
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_memsets: 0
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_loads
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_loads: 0
2025-09-05T19:19:14Z INFO 2007 [PreSched]: End remove redundncies Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start DCE Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [PreSched]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:14Z INFO 2007 [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys
2025-09-05T19:19:14Z INFO 2007 [PreSched]: End DCE Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [PreSched]: Start build flow dependencies Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [build_flow_deps]: Start build fdeps. Invocation: 1Fri Sep 5 19:19:14 2025
2025-09-05T19:19:14Z INFO 2007 [build_flow_deps]: Allocs: 69132 instructions: 285701
2025-09-05T19:19:15Z INFO 2007 [build_flow_deps]: Build fdeps inserted 714949 edges
2025-09-05T19:19:15Z INFO 2007 [build_flow_deps]: Done build fdeps 714949 Fri Sep 5 19:19:15 2025
2025-09-05T19:19:15Z INFO 2007 [PreSched]: End build flow dependencies Fri Sep 5 19:19:15 2025
2025-09-05T19:19:15Z INFO 2007 [PreSched]: Start remove useless insts Fri Sep 5 19:19:15 2025
2025-09-05T19:19:15Z INFO 2007 [PreSched]: remove_useless_insts
2025-09-05T19:19:16Z INFO 2007 [PreSched]: remove Useless Instructions: 0
2025-09-05T19:19:16Z INFO 2007 [PreSched]: End remove useless insts Fri Sep 5 19:19:16 2025
2025-09-05T19:19:16Z INFO 2007 [PreSched]: Start scratchpad optimization Fri Sep 5 19:19:16 2025
2025-09-05T19:19:16Z INFO 2007 [PreSched]: End scratchpad optimization Fri Sep 5 19:19:16 2025
2025-09-05T19:19:16Z INFO 2007 [PreSched]: DONE PRE scheduling Fri Sep 5 19:19:16 2025
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: pre_sched finished after 1.914 seconds
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running tensor_copy_elim
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: Tensor CP elimination: 0
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys
2025-09-05T19:19:16Z INFO 2007 [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: tensor_copy_elim finished after 0.341 seconds
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running dynamic_dma_setup
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: dynamic_dma_setup finished after 0.004 seconds
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running runtime_memory_reservation
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: runtime_memory_reservation finished after 0.004 seconds
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1884mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z USER 2007 [ModuleForkPass]: Running coloring_allocator_psum
2025-09-05T19:19:16Z INFO 2007 [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:16Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions
2025-09-05T19:19:16Z INFO 2007 [ColoringAllocator::Rep]: linearize and check
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: allocating PSUM
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: main loop
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: renumber locations
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: size = 54337
2025-09-05T19:19:16Z INFO 2007 [PSUM_Allocator]: build_no_bitmap start
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: 100% PSUM demand before spilling
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: PSUM high-water mark = 8 tensors
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: found 182784 edges
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: mean: 6.72779
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: median: 6.99997
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: adjacency vectors require 1462272 bytes
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: build_no_bitmap done
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: find costs
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: simplify interference graph
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: initialize low and high
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: lo = 54337
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: hi = 0
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: inf = 0
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: total = 54337
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: simplify
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: new candidates = 0
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: select ranges
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: no more spills
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: PSUM score = 0 (lower is better)
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: spilling from PSUM cost about 0 cycles
2025-09-05T19:19:17Z INFO 2007 [PSUM_Allocator]: 100% PSUM utilization after allocation
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: coloring_allocator_psum finished after 0.597 seconds
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1901mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: Running dma_optimization_psum
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:17Z INFO 2007 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions
2025-09-05T19:19:17Z INFO 2007 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: dma_optimization_psum finished after 0.203 seconds
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1901mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:17Z USER 2007 [ModuleForkPass]: Running address_rotation_psum
2025-09-05T19:19:17Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:18Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks
2025-09-05T19:19:18Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks
2025-09-05T19:19:19Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks
2025-09-05T19:19:19Z USER 2007 [ModuleForkPass]: address_rotation_psum finished after 1.984 seconds
2025-09-05T19:19:19Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1907mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:19Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:19Z USER 2007 [ModuleForkPass]: Running coloring_allocator_sb
2025-09-05T19:19:19Z INFO 2007 [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 7114104832
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 7875 bytes
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 7114072064
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 7822 bytes
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 0
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 0 bytes
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions
2025-09-05T19:19:19Z INFO 2007 [ColoringAllocator::Rep]: linearize and check
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: allocating SB
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: main loop
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: renumber locations
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: size = 14212
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: find partners
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: found 54337 accumulation groups
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: largest = 19629.23884_i1022
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: tensors = 2
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: requires 8448 bytes/partition
2025-09-05T19:19:19Z INFO 2007 [SB_Allocator]: expanding partners
2025-09-05T19:19:19Z INFO 2007 []: find first defs for local
2025-09-05T19:19:19Z INFO 2007 []: find first defs for global
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: find loads
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 1 pin count
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 6081 remat count
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: build interference graph
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: pass 1 int-tree
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Num intervals 14212 Num locations 14212
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: IntervalTree Build Done
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: info.neighbors init Done
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: info.neighbors partners Done
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: IntervalTree readback Done
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: edge: 31236
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: mean: 4.39572
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: median: 2.00087
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: find costs
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: best-of-n loop, heuristic = 0
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: simplify interference graph
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: initialize safe & unsafe
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: safe = 14210
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: unsafe = 1
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: inf = 0
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: total = 14211
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: simplify
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 14212
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: new candidates = 0
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: select ranges
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Total: 14211
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Spilled: 0.000 (0)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Allocated: 1.000 (14211)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Rover zone: 0.989 (14051)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Pre-rover zone: 0.009 (128)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Post-rover zone: 0.002 (32)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Slice zone: 0.000 (0)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Blocks nothing: 0.000 (0)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Blocks medium: 0.000 (0)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Blocks tall: 1.000 (14211)
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Visited until tall blocking (mean): 0.997
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Visited until tall blocking (median): 1.000
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Visited until tall blocking (p95): 1.000
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: Success
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: SB spills = 0 tensors
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: size = 0 bytes/partition
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: remats = 0 tensors
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: unpinned = 0 tensors
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: size = 0 bytes/partition
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: SB score = 0
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: spilling from SB cost about 0 cycles
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: pinning saved approximately 9010 cycles
2025-09-05T19:19:20Z INFO 2007 [SB_Allocator]: 0% SB utilization after allocation
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 7114104832
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 7875 bytes
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 7114072064
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 7822 bytes
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 0
2025-09-05T19:19:20Z INFO 2007 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 0 bytes
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: coloring_allocator_sb finished after 1.103 seconds
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1911mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: Running address_rotation_sb
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:20Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: address_rotation_sb finished after 0.290 seconds
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1912mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69133 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:20Z USER 2007 [ModuleForkPass]: Running dma_optimization_sb
2025-09-05T19:19:20Z INFO 2007 [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=69133 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:20Z INFO 2007 [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 14228176896, 50.0001% input load, 49.9999% output write, 0% spill/reload [sg0000]
2025-09-05T19:19:20Z INFO 2007 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: removed 0 identical load
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: adjusted 0 DMACopy remat
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: adjusted 0 DMACopy remat
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: sub-graph will get execute 1 times
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(7.1141e+09)
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations
2025-09-05T19:19:21Z INFO 2007 [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: average loaded DMA size 7875 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: average saved DMA size 7822 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 7114104832
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 7875 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 7114072064
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 7822 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 0, 0% out of total dma traffic
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 14228176896, 50.0001% input load, 49.9999% output write, 0% spill/reload [sg0000]
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 7114104832
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 7875 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 7114072064
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 7822 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 0
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 0 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 7848 bytes
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module);
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: DMA optimization re-enable optimization
2025-09-05T19:19:22Z USER 2007 [ModuleForkPass]: dma_optimization_sb finished after 1.621 seconds
2025-09-05T19:19:22Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1932mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:22Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:22Z USER 2007 [ModuleForkPass]: Running address_rotation_sb
2025-09-05T19:19:22Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:22Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 5919 Sb address
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 4699 Sb address
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:23Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 1769 Sb address
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: address_rotation_sb finished after 1.665 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1932mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running coloring_allocator_dram
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions
2025-09-05T19:19:24Z INFO 2007 [ColoringAllocator::Rep]: linearize and check
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: reserved space = 14496612352 bytes
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: spill space = 0 bytes
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: aligned spill space = 0 bytes
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: dram space = 107374182400 bytes
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: renumber locations
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: size = 0
2025-09-05T19:19:24Z INFO 2007 []: find first defs for local
2025-09-05T19:19:24Z INFO 2007 []: find first defs for global
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: Num intervals 0 Num locations 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: IntervalTree Build Done
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: info.neighbors init Done
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: IntervalTree readback Done
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: simplify interference graph
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: initialize low and high
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: lo = 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: hi = 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: total = 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: simplify
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: new candidates = 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: select ranges
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: CC buffer size limit 524288000
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: allreduce_dram_hwm 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: Real CC buffer size 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: DRAM hwm after allocation: 0
2025-09-05T19:19:24Z INFO 2007 [DRAM_Allocator]: DRAM allocation successful
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: coloring_allocator_dram finished after 0.469 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1937mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running address_rotation_dram
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: Runtime page size at 512MB
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: DRAM hwm before rotation 0
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: allreduce buffer size 524288000
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: allreduce hwm 0
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: Real CC buffer size 0
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: DRAM hwm after rotation 0
2025-09-05T19:19:24Z INFO 2007 [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: address_rotation_dram finished after 0.171 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running tensorcopy_accel
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z INFO 2007 [TensorCopyAccel::Impl]: Running peephole optimization pass
2025-09-05T19:19:24Z INFO 2007 [TensorCopyAccel::Impl]: Accelerated 0 out of 54337 tensorcopy in Function: sg0000 average acceleration factor: -nan
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: tensorcopy_accel finished after 0.025 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running peephole_opts
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z INFO 2007 [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: peephole_opts finished after 0.055 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running lower_kernel
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Started running LowerKernel
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Start of kernel lowering pass, number of insts: 285701, number of allocs: 69132
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Scan BKs time (s): 0.013202
2025-09-05T19:19:24Z INFO 2007 [LowerKernel]: Lower BKs time (s): 3e-06
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: lower_kernel finished after 0.019 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running lower_nki_kernel
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: lower_nki_kernel finished after 0.017 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1939mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running dynamic_dma_cleanup
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: dynamic_dma_cleanup finished after 0.029 seconds
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1941mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:24Z USER 2007 [ModuleForkPass]: Running birverifier
2025-09-05T19:19:24Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: birverifier finished after 0.195 seconds
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1941mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: Running dynamic_dma_scan
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: dynamic_dma_scan finished after 0.027 seconds
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1941mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:25Z USER 2007 [ModuleForkPass]: Running build_fdeps
2025-09-05T19:19:25Z INFO 2007 [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:25Z INFO 2007 [build_flow_deps]: Start build fdeps. Invocation: 2Fri Sep 5 19:19:25 2025
2025-09-05T19:19:25Z INFO 2007 [build_flow_deps]: Allocs: 69132 instructions: 285701
2025-09-05T19:19:26Z INFO 2007 [build_flow_deps]: Build fdeps inserted 714949 edges
2025-09-05T19:19:26Z INFO 2007 [build_flow_deps]: Done build fdeps 714949 Fri Sep 5 19:19:26 2025
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: build_fdeps finished after 1.075 seconds
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1963mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: Running remove_redundancies
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove_clobbered_writes
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove_clobbered_writes: 0
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove_useless_insts
2025-09-05T19:19:26Z INFO 2007 [RemoveRedundancies]: remove Useless Instructions: 0
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: remove_redundancies finished after 0.090 seconds
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: curr_vmrss: 1963mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:26Z USER 2007 [ModuleForkPass]: Running anti_dependency_analyzer
2025-09-05T19:19:26Z INFO 2007 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:26Z INFO 2007 [AntiDependencyAnalyzer]: Batch size: 1000
2025-09-05T19:19:26Z INFO 2007 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB}
2025-09-05T19:19:26Z INFO 2007 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: anti_dependency_analyzer finished after 0.989 seconds
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2058mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: Running tensor_copy_elim
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:27Z INFO 2007 [TensorCopyElim]: Tensor CP elimination: 0
2025-09-05T19:19:27Z INFO 2007 [TensorCopyElim]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: tensor_copy_elim finished after 0.240 seconds
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2071mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: Running prefetch_scheduling_before_sched
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.003 seconds
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2071mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:27Z USER 2007 [ModuleForkPass]: Running post_sched
2025-09-05T19:19:27Z INFO 2007 [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:27Z INFO 2007 [post_scheduler]: Start PosT ScheD 3 sunda Fri Sep 5 19:19:27 2025
2025-09-05T19:19:29Z INFO 2007 [post_scheduler]: Time-aware hwm post-sched
2025-09-05T19:19:31Z INFO 2007 [post_scheduler]: Time-aware simulation time: 59414448
2025-09-05T19:19:32Z INFO 2007 [post_scheduler]: Done PosT ScheD Fri Sep 5 19:19:32 2025
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: post_sched finished after 4.715 seconds
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2468mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: Running expand_scheduling_units
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: expand_scheduling_units finished after 0.027 seconds
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2218mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:32Z USER 2007 [ModuleForkPass]: Running address_rotation_sb
2025-09-05T19:19:32Z INFO 2007 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:34Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 9197 PSUM Banks
2025-09-05T19:19:34Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 5583 PSUM Banks
2025-09-05T19:19:35Z INFO 2007 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks
2025-09-05T19:19:35Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 2554 Sb address
2025-09-05T19:19:36Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 2329 Sb address
2025-09-05T19:19:36Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:36Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:37Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 59 Sb address
2025-09-05T19:19:37Z INFO 2007 [DMAOptimizationBase]: moved 0 MM forward
2025-09-05T19:19:37Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:38Z INFO 2007 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address
2025-09-05T19:19:38Z USER 2007 [ModuleForkPass]: address_rotation_sb finished after 5.695 seconds
2025-09-05T19:19:38Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2254mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:38Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:38Z USER 2007 [ModuleForkPass]: Running anti_dependency_analyzer
2025-09-05T19:19:38Z INFO 2007 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:38Z INFO 2007 [AntiDependencyAnalyzer]: Batch size: 1000
2025-09-05T19:19:38Z INFO 2007 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB}
2025-09-05T19:19:38Z INFO 2007 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: anti_dependency_analyzer finished after 1.349 seconds
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2285mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: Running anti_dependency_analyzer
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:39Z INFO 2007 [AntiDependencyAnalyzer]: Batch size: 1000
2025-09-05T19:19:39Z INFO 2007 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS}
2025-09-05T19:19:39Z INFO 2007 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: anti_dependency_analyzer finished after 0.188 seconds
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2287mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:39Z USER 2007 [ModuleForkPass]: Running dep_opt
2025-09-05T19:19:39Z INFO 2007 [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:39Z INFO 2007 [build_flow_deps]: Start build fdeps. Invocation: 3Fri Sep 5 19:19:39 2025
2025-09-05T19:19:39Z INFO 2007 [build_flow_deps]: Allocs: 69132 instructions: 285701
2025-09-05T19:19:40Z INFO 2007 [build_flow_deps]: Build fdeps inserted 701433 edges
2025-09-05T19:19:40Z INFO 2007 [build_flow_deps]: Done build fdeps 701433 Fri Sep 5 19:19:40 2025
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: dep_opt finished after 1.572 seconds
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running report_stats
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: Data Movement Statistics: sg0000
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Instruction β”‚ Kind β”‚ Count β”‚ Bytes β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Load β”‚ Const -> Internal β”‚ 1 β”‚ 32768 β”‚
β”‚ Load β”‚ ExternalInput -> Internal β”‚ 7105 β”‚ 7114072064 β”‚
β”‚ Save β”‚ Internal -> ExternalOutput β”‚ 7105 β”‚ 7114072064 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:41Z INFO 2007 [ReportStats]:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Bytes per partition β”‚ Count β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 64 β”‚ 65 β”‚
β”‚ 256 β”‚ 66 β”‚
β”‚ 7168 β”‚ 4096 β”‚
β”‚ 8192 β”‚ 9984 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: MM Stats: #MatMults 217153 #MatMult-Transposes 217153
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: IO Tensor size combined: 14496579584
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: IO Tensor Statistics:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Largest IO Tensors β”‚ Kind β”‚ Src Type β”‚ Size (Bytes) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ output289 β”‚ ExternalOutput β”‚ bfloat16 β”‚ 134217728 β”‚
β”‚ input0 β”‚ ExternalInput β”‚ bfloat16 β”‚ 134217728 β”‚
β”‚ output0 β”‚ ExternalOutput β”‚ bfloat16 β”‚ 134217728 β”‚
β”‚ input289 β”‚ ExternalInput β”‚ bfloat16 β”‚ 134217728 β”‚
β”‚ input7 β”‚ ExternalInput β”‚ bfloat16 β”‚ 58720256 β”‚
β”‚ input9 β”‚ ExternalInput β”‚ bfloat16 β”‚ 58720256 β”‚
β”‚ input24 β”‚ ExternalInput β”‚ bfloat16 β”‚ 58720256 β”‚
β”‚ input18 β”‚ ExternalInput β”‚ bfloat16 β”‚ 58720256 β”‚
β”‚ input15 β”‚ ExternalInput β”‚ bfloat16 β”‚ 58720256 β”‚
β”‚ input27 β”‚ ExternalInput β”‚ bfloat16 β”‚ 58720256 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:41Z INFO 2007 [ReportStats]: Large (Internal) Tensor Statistics:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Largest Tensors β”‚ Kind β”‚ Src Type β”‚ Size (Bytes) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ DynamicDMAScratchLoc β”‚ Internal β”‚ uint8 β”‚ 2097152 β”‚
β”‚ t2035_pftranspose_18320_i5 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i2 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i1 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i3 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i4 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i6 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i9 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i8 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β”‚ t2035_pftranspose_18320_i7 β”‚ Internal β”‚ bfloat16 β”‚ 1048576 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: report_stats finished after 0.059 seconds
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 28.058 seconds
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running assign_trigger_engine
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to assign_trigger_engine: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z INFO 2007 [AssignTriggerEngine]: Assigned trigger engine for 0 DMA instructions. Moved 0 DMA instructions to CC's engines.
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: assign_trigger_engine finished after 0.097 seconds
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: Running lower_local_collectives
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: lower_local_collectives finished after 0.004 seconds
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: Running extend_shared_lifetimes
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: extend_shared_lifetimes finished after 0.003 seconds
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2248mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: Running dead_code_elim
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z INFO 2007 [DeadCodeElim]: eliminateDeadStore removed 0 instructions
2025-09-05T19:19:41Z USER 2007 [SubgraphForkPass]: dead_code_elim finished after 0.245 seconds
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.266 seconds
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running assign_hwdge_engine
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: assign_hwdge_engine finished after 0.027 seconds
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running alloc_queues
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z INFO 2007 [AllocQueues]: DMACopy transpose will be triggered from multiple engines
2025-09-05T19:19:41Z INFO 2007 [AllocQueues]: Alloc Queue info:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Name β”‚ DMAQueue::Type β”‚ Engine β”‚ Num Queues β”‚ Num instructions β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ qSPSpillReload0 β”‚ data β”‚ SP β”‚ 16 β”‚ 1 β”‚
β”‚ qPoolDynamic β”‚ dynamic β”‚ Pool β”‚ 16 β”‚ 14210 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: alloc_queues finished after 0.030 seconds
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running chain_dma_transposes
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: chain_dma_transposes finished after 0.003 seconds
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running prefetch_scheduling_after_sched
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.003 seconds
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: Running lower_control
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z INFO 2007 [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps
2025-09-05T19:19:41Z USER 2007 [ModuleForkPass]: lower_control finished after 0.197 seconds
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.252 seconds
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: curr_vmrss: 2249mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [BackendPassManager]: Running nc_parallel_pass
2025-09-05T19:19:41Z INFO 2007 [BackendPassManager]: Inputs to nc_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z USER 2007 [CoreForkPass]: Running dep_reduction
2025-09-05T19:19:41Z INFO 2007 [CoreForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:41Z INFO 2007 [DepReduction]: Start Dependency Reduction
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing async instrs...
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing secondary edges per engine...
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 484672
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing redundant descendants, Done. Num edges removed 497220
2025-09-05T19:19:42Z INFO 2007 [DepReduction]: Processing async instrs, Done. Num edges removed 497220
2025-09-05T19:19:43Z INFO 2007 [DepReduction]: Num Async removed: 0
2025-09-05T19:19:43Z INFO 2007 [DepReduction]: Finished dependency reduction: 1173793 removed, new total 115343
2025-09-05T19:19:43Z INFO 2007 [DepReduction]: Finished Dependency Reduction
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: dep_reduction finished after 1.714 seconds
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running lower_dynamic_dma
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: lower_dynamic_dma finished after 0.048 seconds
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running legalize_dynamic_dma
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z INFO 2007 [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 0 DGE instructions
2025-09-05T19:19:43Z INFO 2007 [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 0 DGE instructions were scanned
2025-09-05T19:19:43Z INFO 2007 [LegalizeDynamicDMA]:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Sub-Pass β”‚ Illegal Instructions Detected β”‚ New Instructions Generated β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Peeling β”‚ 0 β”‚ 0 β”‚
β”‚ Unrolling β”‚ 0 β”‚ 0 β”‚
β”‚ Splitting β”‚ 0 β”‚ 0 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: legalize_dynamic_dma finished after 0.100 seconds
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285701 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running lower_dma
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to lower_dma: modules=1 functions=1 allocs=69132 blocks=1 instructions=285701 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z INFO 2007 [LowerDMA]: lower_dma metrics start
IO
Copy (DGE/DMA)
128 partition : 14145/14145 (100% DGE)
power-of-2 partition : 14210/14210 (100% DGE)
> 3 dimensional : 0/0
non-integer desc size : 0/0
total : 14210/14210 (100% DGE)
Cast (DGE/DMA)
128 partition : 0/0
power-of-2 partition : 0/0
> 3 dimensional : 0/0
non-integer desc size : 0/0
total : 0/0
Spill/Reload
Copy (DGE/DMA)
128 partition : 0/1 (0% DGE)
power-of-2 partition : 0/1 (0% DGE)
> 3 dimensional : 0/0
non-integer desc size : 0/0
total : 0/1 (0% DGE)
Cast (DGE/DMA)
128 partition : 0/0
power-of-2 partition : 0/0
> 3 dimensional : 0/0
non-integer desc size : 0/0
total : 0/0
CopyMode
CCE : 0
Transpose : 0
Replicate : 0
Dynamic (DGE/DMA)
scalar : 0/0
vector : 0/0
Opcode
ReadVarAddr : 0
IndirectLoad : 0
IndirectSave : 0
IndirectSaveAccumulate : 0
DstReduceDGE : 0
lower_dma metrics end
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: lower_dma finished after 0.145 seconds
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: curr_vmrss: 2263mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:43Z USER 2007 [CoreForkPass]: Running coalesce_dma_blocks
2025-09-05T19:19:43Z INFO 2007 [CoreForkPass]: Inputs to coalesce_dma_blocks: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z INFO 2007 [CoalesceDmaBlocks]: Coaleseced 0 DMA triggers
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: coalesce_dma_blocks finished after 0.133 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running expand_all_engine
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: expand_all_engine finished after 0.050 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running alloc_semaphores
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: alloc_semaphores finished after 0.242 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running expand_inst_late
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: expand_inst_late finished after 0.262 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running seq_inst_opt
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z INFO 2007 [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: seq_inst_opt finished after 0.027 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 285710 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running lower_sync
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to lower_sync: modules=1 functions=1 allocs=69132 blocks=1 instructions=285710 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: lower_sync finished after 0.102 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300967 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running lower_act
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to lower_act: modules=1 functions=1 allocs=69132 blocks=1 instructions=300967 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: lower_act finished after 0.036 seconds
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: curr_vmrss: 2264mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z USER 2007 [CoreForkPass]: Running lower_dve
2025-09-05T19:19:44Z INFO 2007 [CoreForkPass]: Inputs to lower_dve: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:44Z INFO 2007 [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/conda/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: lower_dve finished after 0.303 seconds
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: curr_vmrss: 2296mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: Running lower_ap
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Inputs to lower_ap: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: lower_ap finished after 0.066 seconds
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: curr_vmrss: 2177mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: Running coloring_allocator_reg
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z INFO 2007 [ColoringAllocator::Rep]: Allocating functions
2025-09-05T19:19:45Z INFO 2007 [ColoringAllocator::Rep]: linearize and check
2025-09-05T19:19:45Z INFO 2007 [REG_Allocator]: allocating REG
2025-09-05T19:19:45Z INFO 2007 [REG_Allocator]: main loop iteration 1
2025-09-05T19:19:45Z USER 2007 [CoreForkPass]: coloring_allocator_reg finished after 0.044 seconds
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [CoreForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: nc_parallel_pass finished after 3.389 seconds
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [ModuleForkPass]: Running birverifier
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [ModuleForkPass]: birverifier finished after 0.254 seconds
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 0.261 seconds
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: Running subgraph_parallel_pass
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [SubgraphForkPass]: Running lnc_verifier
2025-09-05T19:19:45Z INFO 2007 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [SubgraphForkPass]: lnc_verifier finished after 0.003 seconds
2025-09-05T19:19:45Z INFO 2007 [SubgraphForkPass]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: subgraph_parallel_pass finished after 0.010 seconds
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: curr_vmrss: 2190mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [BackendPassManager]: Running mod_parallel_pass
2025-09-05T19:19:45Z INFO 2007 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z USER 2007 [ModuleForkPass]: Running codegen
2025-09-05T19:19:45Z INFO 2007 [ModuleForkPass]: Inputs to codegen: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:45Z INFO 2007 [Codegen]: Total compiler allocated DRAM tensors: 0 GB
2025-09-05T19:19:45Z INFO 2007 [Codegen]: Total un-allocated DRAM tensors by kind:
2025-09-05T19:19:45Z INFO 2007 [Codegen]:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ TensorKind β”‚ Size (GB) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ ExternalInput β”‚ 6.7505 β”‚
β”‚ Const β”‚ 3.05176e-05 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:45Z INFO 2007 [Codegen]: Total runtime managed DRAM tensors: 6.75053 GB
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Instruction Stats:
2025-09-05T19:19:46Z INFO 2007 [Codegen]:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Opcode β”‚ Count β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ LDWEIGHTS β”‚ 217153 β”‚
β”‚ MATMUL β”‚ 217153 β”‚
β”‚ ACTIVATE β”‚ 54337 β”‚
β”‚ EVENT_SEMAPHORE β”‚ 15257 β”‚
β”‚ UNKNOWN(0xd4) β”‚ 14210 β”‚
β”‚ NOP β”‚ 8 β”‚
β”‚ PSEUDO_BRANCH_LABEL β”‚ 5 β”‚
β”‚ ACT_TABLE_LOAD β”‚ 1 β”‚
β”‚ PSEUDO_DMA_TRIGGER β”‚ 1 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:46Z INFO 2007 [Codegen]:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Engine β”‚ Count β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Unassigned β”‚ 0 β”‚
β”‚ GPSIMD β”‚ 20924 β”‚
β”‚ Scalar β”‚ 56947 β”‚
β”‚ Tensor β”‚ 440254 β”‚
β”‚ SyncDMA β”‚ 0 β”‚
β”‚ Vector β”‚ 2 β”‚
β”‚ Sync β”‚ 3 β”‚
β”‚ All β”‚ 0 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Total instructions: 518130 (0.030883 GB)
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Total DynamicDMA instruction count: 14210
2025-09-05T19:19:46Z USER 2007 [Codegen]: isa_gen finished after 1.105 seconds
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Number of DMA descriptors on each queue instance:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Queue Instance β”‚ RT Descriptors β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ qSPSpillReload0 β”‚ 256 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Total descriptors: 256 (3.8147e-06 GB)
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Number of DMA engines used by each queue:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Queue β”‚ DMA Engines β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ qSPSpillReload0 β”‚ 16 β”‚
β”‚ qPoolDynamic β”‚ 16 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ TOTAL β”‚ 32 (must be <= 176) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Tensors with largest descriptor count:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Tensor Name β”‚ Kind β”‚ Src Type β”‚ Descriptor Count β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ identity_local_22027 β”‚ Internal β”‚ bfloat16 β”‚ 1 β”‚
β”‚ identity_22025 β”‚ Const β”‚ bfloat16 β”‚ 1 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:46Z USER 2007 [Codegen]: dma_desc_gen finished after 0.004 seconds
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Estimated peak DRAM usage: 6.78141 GB
2025-09-05T19:19:46Z INFO 2007 [Codegen]: Generating debug info
2025-09-05T19:19:47Z USER 2007 [Codegen]: debug_info_gen finished after 0.568 seconds
2025-09-05T19:19:47Z USER 2007 [ModuleForkPass]: codegen finished after 1.742 seconds
2025-09-05T19:19:47Z INFO 2007 [ModuleForkPass]: curr_vmrss: 2385mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:47Z INFO 2007 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:47Z USER 2007 [BackendPassManager]: mod_parallel_pass finished after 1.775 seconds
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: curr_vmrss: 2204mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:47Z USER 2007 [BackendPassManager]: Running neff_packager
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: Inputs to neff_packager: modules=1 functions=1 allocs=69132 blocks=1 instructions=300968 Max writers: 128 Max Readers: 217153
2025-09-05T19:19:47Z WARNING 2007 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.169490.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json
2025-09-05T19:19:47Z INFO 2007 [NeffFileWriter]: Neff will be written to: /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff
2025-09-05T19:19:47Z INFO 2007 [NeffFileWriter]: IR signature: b43fb3a95505587f066b6d2d419d7150 for neff artifacts
2025-09-05T19:19:47Z USER 2007 [BackendPassManager]: neff_packager finished after 0.299 seconds
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: curr_vmrss: 2205mb, ru_maxrss: 2541mb (delta=0mb)
2025-09-05T19:19:47Z INFO 2007 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69132 memory location(s), 1 block(s), and 300968 instruction(s). Max writers: 128 Max Readers: 217153
2025-09-05T19:19:47Z INFO 2007 [BackendDriver]: HBM scratchpad usage summary (post-allocation):
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Core β”‚ Subgraph β”‚ Description β”‚ Value β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ nc00 β”‚ module β”‚ Peak scratchpad usage: local β”‚ 0.000000 GB β”‚
β”‚ nc00 β”‚ module β”‚ Total size of allocated tensors: local β”‚ 0.000000 GB β”‚
β”‚ nc00 β”‚ Max β”‚ Peak scratchpad usage: local β”‚ 0.000000 GB β”‚
β”‚ nc00 β”‚ Post-link β”‚ Peak scratchpad usage after intermediate tensor allocation β”‚ 0.000000 GB β”‚
β”‚ nc00 β”‚ Post-link β”‚ Total size of allocated intermediate tensors β”‚ 0.000000 GB β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Max β”‚ Max β”‚ Peak scratchpad usage β”‚ 0.000000 GB β”‚
β”‚ Max β”‚ Max β”‚ Peak scratchpad usage (page-aligned) β”‚ 0.000000 GB β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2025-09-05T19:19:47Z INFO 2007 [BackendDriver]: Backend completed successfully, tearing down.
2025-09-05T19:19:48Z INFO 1971 [job.WalrusDriver.0]: Job #0 finished
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0
2025-09-05T19:19:48Z INFO 1971 [job.BIRLinker.0]: Replay this job by calling: /opt/conda/bin/neuronx-cc compile --framework XLA --state '{"model": ["/models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo"], "tensormap": "tensor_map.json", "bir": "bir.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "state_dir": "/opt/vllm/neuronxcc-gss0wi_w/sg00", "state_id": "sg00"}' --pipeline BIRLinker
2025-09-05T19:19:48Z INFO 1971 [job.BIRLinker.0]: BIRLinker cwd: /opt/vllm/neuronxcc-gss0wi_w
2025-09-05T19:19:48Z INFO 1971 [job.BIRLinker.0]: Linking not needed. Netlist doesnt exist
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.Kelper.0
2025-09-05T19:19:48Z INFO 1971 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.Kelper.0
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Processing input #0
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Start NeffWrapper
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Executing: /opt/conda/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/model/graph.hlo --neff /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/graph.neff --io_transposes /opt/vllm/neuronxcc-gss0wi_w/io_transposes.json --output /models/mistral-7b-v0.3-instruct-neuronx/layout_opt/wrapped_neff.hlo --netlist /opt/vllm/neuronxcc-gss0wi_w/hlo_netlist.json
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Could not open file: /opt/vllm/neuronxcc-gss0wi_w/hlo_netlist.json
There are no io transposes nor zero-sized parameters. Output will not be produced.
Hlo neff wrapper finished successfully. Have a wonderful day :D
2025-09-05T19:19:48Z INFO 1971 [job.NeffWrapper.0]: Job #0 finished
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Finished pipeline Pipeline
2025-09-05T19:19:48Z INFO 1971 [pipeline.Pipeline.0]: Job #0 finished
2025-09-05T19:19:48Z INFO 1939 [root]: Subcommand returned with exitcode=0