anemll/anemll-Nemo_full-8B-FP16-b64-w512

Apple Neural Engine


anemll committed on May 7

Commit b5481f6 · verified · 1 Parent(s): 41c446d

Update README.md

added single chunk profile note

Files changed (1)

README.md +9 -2

README.md CHANGED

@@ -10,15 +10,22 @@ tags:
 ---
 # ANEMLL
-PREFILL test for M3 Ultra
-after unzipping (please see below)
 run
 python prefill.py --meta meta.yaml
 For M3U/M4P see original post:
 https://x.com/anemll/status/1919796143787278623
 **ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).
 The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.

 ---
 # ANEMLL
+PREFILL Test for M3 Ultra
+after unzipping ( please see below, "find . -type f -name "*.zip" -exec unzip {} \;"     )
 run
 python prefill.py --meta meta.yaml
+The repo has an extra file : nemotron_prefill_chunk_01of16_64x64.mlpackage which will be interesting to profile with Xcode  on:
+M1 Ultra, M2 Ultra and M4 Max
+It is single chunk for Batch=64/Window=64
+See https://docs.google.com/spreadsheets/d/1OCxn730D5h8rvS2IHsSi0UBYbsP_lV-W-0uVdVDCvIk
+FP16 tab for baseline numbers
 For M3U/M4P see original post:
 https://x.com/anemll/status/1919796143787278623
 **ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).
 The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.
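
The unzip step that the updated README describes with `find . -type f -name "*.zip" -exec unzip {} \;` can also be sketched in Python for machines where `find` or `unzip` is not available. This is only an illustration: `unzip_all` is a hypothetical helper and is not part of the ANEMLL repo. It assumes that, as with the `find` one-liner, every archive found under the starting directory should be extracted into that starting directory rather than next to the archive.

```python
import zipfile
from pathlib import Path


def unzip_all(root=".", dest=None):
    """Extract every *.zip found under `root` into `dest`.

    Hypothetical helper, not part of the ANEMLL repo. `dest` defaults to
    `root`, which mirrors running `unzip` from the starting directory.
    Returns the list of archives that were extracted.
    """
    root = Path(root)
    dest = Path(dest) if dest is not None else root
    extracted = []
    # sorted() collects the full listing before anything is extracted,
    # so archives unpacked from other archives are not picked up mid-walk.
    for archive in sorted(root.rglob("*.zip")):
        if not archive.is_file():
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive)
    return extracted
```

Calling `unzip_all(".")` from the repo checkout would stand in for the `find` command; after that, the README's own step applies unchanged: `python prefill.py --meta meta.yaml`.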