# ANE-Hash — Double SHA-256 on Apple Neural Engine (bit-sliced FP16)

Proving a point about emulation: making the ANE do pure bit-twiddling with nothing but fp16 arithmetic and conv permutations.
## What this is
- A faithful double SHA-256 (SHA-256 → SHA-256) implemented entirely in fp16 and compiled to a Core ML `mlprogram` that maps to the Apple Neural Engine.
- Bitwise ops are emulated with fp16 arithmetic; rotates/shifts are 1×1 conv permutations, so everything lands on the ANE (no `gather`).
- Verified byte-for-byte against NIST test vectors via `hashlib`.
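For context, the ground truth the harness compares against is nothing more exotic than two chained `hashlib` calls:

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # SHA-256 applied twice: exactly what the Core ML graph emulates in fp16.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

print(double_sha256(b"abc").hex())  # the reference the on-device output must match byte-for-byte
```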
This is not a miner. It’s an ANE stress test that shows you can emulate integer crypto on a matrix engine…and that you probably shouldn’t, performance-wise. That contrast is the point.
## Files

- `model.py` — builds and converts the Core ML package (`bitlot.mlpackage`).
- `test.py` — correctness + throughput harness (H/s, MH/s reporting).

Repo name: ANE-Hash. The model artifact name in the scripts is `bitlot.mlpackage`; keep it or rename to taste.
## Quickstart

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install coremltools numpy
```

Build:

```bash
python model.py   # produces bitlot.mlpackage
```

Verify on NIST-style strings (runs entirely on device, compares to `hashlib`):

```bash
python test.py --verify --cu CPU_AND_NE
```

Benchmark (prints mean/p50/p90 and hash rate):

```bash
python test.py --bench --batches 1,2,4,8,16,32,64,128,256,512,1024 \
    --iters 10 --warmup 3 --cu CPU_AND_NE --verbose
```
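The hash-rate arithmetic is simple. Here is a minimal sketch of such a timing loop (not `test.py` itself; the input name `"bits"` and the trailing dimensions are placeholder assumptions):

```python
import time
import numpy as np
import coremltools as ct

mlmodel = ct.models.MLModel("bitlot.mlpackage",
                            compute_units=ct.ComputeUnit.CPU_AND_NE)

batch, iters, warmup = 64, 10, 3
# Placeholder input: the name "bits" and shape (batch, 32, 1, 16) are assumptions,
# not the model's actual interface.
x = np.random.randint(0, 2, size=(batch, 32, 1, 16)).astype(np.float16)

for _ in range(warmup):                 # warm up so compilation/caching is excluded
    mlmodel.predict({"bits": x})

t0 = time.perf_counter()
for _ in range(iters):
    mlmodel.predict({"bits": x})
dt = time.perf_counter() - t0

print(f"mean {dt / iters * 1e3:.2f} ms/pred  ->  {batch * iters / dt:.1f} H/s")
```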
## What you should see

- In Xcode › Metrics, Compute Unit Mapping should be 100% Neural Engine.
- Median prediction time around 2 ms at batch=1 on recent M-series hardware and macOS 15.x (your silicon/OS will vary).
- Measured hash rate from the harness is typically ~0.5–0.6 kH/s at the best batch size (often 16–64).

You are bottlenecked by the emulated boolean math and the massive adders, not by the ANE's MACs.
## Why it runs on ANE
- All fp16: constants, weights, ops.
- No `gather`: rotations and shifts are implemented as 1×1 convs with fixed fp16 0/1 weights.
- Core ML target: `mlprogram`, opset targeting iOS 18 / macOS 15.
- Compute units: `CPU_AND_NE` (or `CPU_ONLY` for sanity checks).
If you want to nudge flexible-shape models toward ANE on newer OS builds, load with:

```python
import coremltools as ct

mlmodel = ct.models.MLModel(
    "bitlot.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    optimization_hints={"reshapeFrequency": ct.ReshapeFrequency.Infrequent},
)
```
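As best I can tell, the `reshapeFrequency` hint (coremltools 8+) tells the runtime that input shapes change rarely, which lets it specialize the flexible-shape graph for the ANE rather than fall back to a more general path.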
## How it works (ANE-centric)
- Bit-slicing: each 32-bit word is a `(32,1,1)` column of {0,1} values in fp16, batched as `(N,32,1,L)`.
- Boolean algebra in fp16: `AND = mul`, `XOR = abs(a-b)`, `OR = maximum`; `CH` and `MAJ` are expressed via those primitives (see the sketch after this list).
- Rotates/shifts: fused 1×1 conv permutations (0/1 fp16 weights). This is why the ops map to the ANE.
- Adders: carry-save plus prefix carry-propagate (log-depth), built from the same primitives.
- Double hash: the first compression runs from a provided midstate plus the last block; the second compression runs over the 256-bit result with correct padding.

Everything stays fp16; there are no fp32 constants and nothing that triggers batch-norm folding.
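If the fp16 boolean algebra feels like sleight of hand, here is a minimal numpy sketch of the primitives, plus a rotate expressed as a fixed 0/1 permutation matrix (the same matrix a 1×1 conv carries as weights). Illustrative only, not the code in `model.py`:

```python
import numpy as np

# --- fp16 boolean algebra on {0,1} lanes ----------------------------------
a = np.random.randint(0, 2, size=32).astype(np.float16)
b = np.random.randint(0, 2, size=32).astype(np.float16)
c = np.random.randint(0, 2, size=32).astype(np.float16)

AND = a * b                     # only 1*1 produces 1
XOR = np.abs(a - b)             # |0-1| = |1-0| = 1, else 0
OR  = np.maximum(a, b)
NOT = 1 - a
CH  = np.abs(a * b - (1 - a) * c)            # (e&f) ^ (~e&g); the two terms never overlap
MAJ = np.abs(np.abs(a * b - a * c) - b * c)  # (a&b) ^ (a&c) ^ (b&c)

# Check against integer ground truth.
ai, bi, ci = a.astype(np.uint8), b.astype(np.uint8), c.astype(np.uint8)
assert np.array_equal(XOR.astype(np.uint8), ai ^ bi)
assert np.array_equal(CH.astype(np.uint8), (ai & bi) ^ ((1 - ai) & ci))
assert np.array_equal(MAJ.astype(np.uint8), (ai & bi) ^ (ai & ci) ^ (bi & ci))

# --- rotr as a fixed 0/1 permutation matrix (what a 1x1 conv encodes) ------
def rotr_matrix(r: int, n: int = 32) -> np.ndarray:
    # Bits stored MSB-first: rotating right by r moves bit i to position (i + r) mod n.
    P = np.zeros((n, n), dtype=np.float16)
    for i in range(n):
        P[(i + r) % n, i] = 1.0
    return P

x = np.random.randint(0, 2, size=32).astype(np.float16)
y = rotr_matrix(7) @ x

xi = int("".join(str(int(v)) for v in x), 2)
yi = int("".join(str(int(v)) for v in y), 2)
assert yi == ((xi >> 7) | (xi << (32 - 7))) & 0xFFFFFFFF
```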
## Intended use
- Research / demo: show the ANE can run faithful bit-level crypto via fp16 emulation.
- Education: bit-slicing, permutation-as-conv tricks, Core ML graph shaping for ANE.

Not intended for mining or production crypto. If you want real speed, use:

- CPU with SHA-2 instructions (CryptoKit/OpenSSL) — tens of MH/s on modern M-series.
- Metal with `uint` math on the GPU.
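For a sense of scale, here is a quick way to measure a single-threaded Python baseline. Python call overhead dominates this loop, so it understates what CryptoKit/OpenSSL can do:

```python
import hashlib
import time

data = b"\x00" * 80          # header-sized message; contents are irrelevant for timing
n = 1_000_000

t0 = time.perf_counter()
for _ in range(n):
    hashlib.sha256(hashlib.sha256(data).digest()).digest()
dt = time.perf_counter() - t0

print(f"{n / dt / 1e6:.2f} MH/s double SHA-256, single thread, via hashlib")
```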
## Repro & environment
- Core ML Tools ≥ 7.x recommended.
- Opset target: iOS 18 / macOS 15.
- Device: Apple silicon with a Neural Engine.
- Batch flexibility: `RangeDim(1, 1024)`; best throughput often at batch 16–64.
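A sketch of how the flexible batch dimension might be declared at conversion time. `ct.RangeDim` and the conversion flags are real coremltools APIs; the input name, trailing shape, and the `program` variable are placeholders standing in for whatever `model.py` actually builds:

```python
import numpy as np
import coremltools as ct

# Flexible batch from 1 to 1024, defaulting to 1.
batch = ct.RangeDim(lower_bound=1, upper_bound=1024, default=1)

mlmodel = ct.convert(
    program,                                   # placeholder: the graph model.py constructs
    inputs=[ct.TensorType(name="bits",         # placeholder name and trailing shape
                          shape=(batch, 32, 1, 16),
                          dtype=np.float16)],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.macOS15,
    compute_precision=ct.precision.FLOAT16,
)
mlmodel.save("bitlot.mlpackage")
```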
## License

See `LICENSE` in this repo.
## Shout-out
To everyone who looks at a matrix engine and says: “yeah, let’s make it do bitwise crypto.”
ANE-Hash is that energy.