---
license: mit
tags:
- coreml
- ANE
- DeepSeek
- Apple
- Apple Neural Engine
- DeepHermes
---
# ANEMLL

# PREFILL Test for M3 Ultra

First, unzip the model archives:

```bash
find . -type f -name "*.zip" -exec unzip {} \;
```
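
For environments without the `unzip` tool, the same extraction step can be approximated in Python. This is a minimal sketch, not part of the ANEMLL tooling: the helper name `extract_all_zips` is hypothetical, and unlike `unzip` it overwrites existing files without prompting and does not restore Unix permissions.

```python
import zipfile
from pathlib import Path


def extract_all_zips(root: str = ".", dest: str = ".") -> list:
    """Extract every *.zip found under `root` into `dest`.

    Mirrors `find . -type f -name "*.zip" -exec unzip {} \\;`, which
    unpacks each archive into the directory the command is run from.
    Returns the list of archives that were processed.
    """
    root_path = Path(root)
    dest_path = Path(dest)
    processed = []
    # sorted() materializes the listing first, so files created during
    # extraction are not picked up by the same walk.
    for archive in sorted(root_path.rglob("*.zip")):
        if not archive.is_file():
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_path)
        processed.append(str(archive))
    return processed
```

Calling `extract_all_zips()` from the repository root unpacks everything into the current directory, as the shell command does.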

Run:
```bash
python prefill.py --meta meta.yaml
```


---

The repository contains an extra file:

`nemotron_prefill_chunk_01of16_64x64.mlpackage`

This will be interesting to profile with Xcode on:
- M1 Ultra
- M2 Ultra
- M4 Max

It represents a single chunk for Batch=64 / Window=64.
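
The file name appears to encode the chunk position and the two dimensions mentioned above. The sketch below pulls those numbers out when collecting profiling results. It rests on assumptions: `01of16` is read as "chunk 1 of 16", and `64x64` as batch then window (the order cannot be told apart here because both are 64). `ChunkInfo` and `parse_chunk_name` are hypothetical names, not part of ANEMLL.

```python
import re
from typing import NamedTuple


class ChunkInfo(NamedTuple):
    chunk_index: int  # 1-based position of this chunk (assumed)
    chunk_count: int  # total number of chunks (assumed)
    batch: int        # first number of the "AxB" suffix (assumed order)
    window: int       # second number of the "AxB" suffix (assumed order)


# Matches names ending in "_chunk_01of16_64x64.mlpackage".
_PATTERN = re.compile(r"_chunk_(\d+)of(\d+)_(\d+)x(\d+)\.mlpackage$")


def parse_chunk_name(filename: str) -> ChunkInfo:
    """Parse chunk metadata from an .mlpackage file name."""
    match = _PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized chunk file name: {filename!r}")
    index, count, batch, window = (int(group) for group in match.groups())
    return ChunkInfo(index, count, batch, window)


info = parse_chunk_name("nemotron_prefill_chunk_01of16_64x64.mlpackage")
print(info)
```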

If you have results, please email them to: [[email protected]](mailto:[email protected])

See the FP16 tab for baseline numbers.

For M3U / M4P reference, see the original post on X.

https://x.com/anemll/status/1919796143787278623



**ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.

This enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security.

This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.

For more information, visit the [ANEMLL GitHub repository](https://github.com/anemll/anemll).


---

## License

ANEMLL is licensed under the [MIT License](https://opensource.org/license/mit).  
The model is based on Llama-3.1-Nemotron-Nano-8B-v1, which builds on Meta's Llama 3.1, and may require a separate license.

This test model is exclusively for Meta's LLaMA architecture converted for CoreML. It was released before the official launch of the ANEMLL repository, with minimal documentation, and is intended only for early adopters who requested an early release.

---

## Requirements

- **macOS Sequoia** with Apple Neural Engine and 8GB RAM or more
- **CoreML Tools** and **HuggingFace Transformers** libraries 
- **Python 3.9**

`chat.py` provides a sample inference script.  
`chat_full.py` provides a sample inference script with history and conversation management.  

**Installation**

1. Download the model from Hugging Face:
```bash
# Install required tools
pip install huggingface_hub

# Install Git LFS (Large File Storage)
# macOS with Homebrew:
brew install git-lfs
# Or Ubuntu/Debian:
# sudo apt-get install git-lfs

# Initialize Git LFS
git lfs install

# Clone the repository with model files
git clone https://huggingface.co/anemll/anemll-Llama-3.1-Nemotron-Nano-8B-v1-ctx512_0.3.0
```

2. Extract model files:
```bash
# Navigate to cloned directory
cd anemll-Llama-3.1-Nemotron-Nano-8B-v1-ctx512_0.3.0

# Pull LFS files (model weights)
git lfs pull

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
```

3. Install dependencies:
```bash
pip install coremltools transformers
```
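
Before running the chat scripts, it can help to confirm that the interpreter and libraries from the Requirements list are in place. The sketch below is a hypothetical helper, not part of the repository. It treats Python 3.9 as a minimum version, which is an assumption, and it only checks that the packages can be located, not that their versions are compatible.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report(required=("coremltools", "transformers"), min_python=(3, 9)):
    """Summarize whether the interpreter and required packages are available."""
    return {
        # Assumption: 3.9 is a lower bound rather than an exact requirement.
        "python_ok": sys.version_info[:2] >= min_python,
        "missing": missing_modules(required),
    }


print(environment_report())
```

An empty `missing` list together with `python_ok: True` suggests the environment is ready for `chat.py`.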

**Coremltools:**

See the coremltools installation guide: https://coremltools.readme.io/v4.0/docs/installation

**How to Run**

1. Basic chat interface:
```bash
python chat.py --meta ./meta.yaml
```

2. Full conversation mode with history:
```bash
python chat_full.py --meta ./meta.yaml
```

> Note: The first time the model loads, macOS will take some time to place it on the device.
> Subsequent loads will be instantaneous.
> Use Ctrl-D to exit, Ctrl-C to interrupt inference.

**More Info**

Please check the following links for later updates:

* [GitHub](https://github.com/anemll)
* [Hugging Face Models](https://huggingface.co/anemll)
* [Twitter/X](https://x.com/anemll)
* [Website](https://anemll.com)


[email protected]

# anemll-Llama-3.1-Nemotron-Nano-8B-v1-ctx512_0.3.0

This is a CoreML model converted using ANEMLL for Apple Neural Engine inference.

## Available Distributions

### Standard Distribution
- Contains zipped MLMODELC files
- Suitable for macOS and development

### iOS Distribution
- Contains unzipped MLMODELC files
- Ready for iOS deployment
- Includes offline tokenizer support

## Model Information
- Context Length: %CONTEXT_LENGTH%
- Batch Size: %BATCH_SIZE%
- Number of Chunks: %NUM_CHUNKS%

## Quick Start

### Test in iOS/macOS App
Try our sample Chat-Bot app on TestFlight:
1. Install TestFlight from App Store
2. Join beta test: [TestFlight Link](https://testflight.apple.com/join/jrQq1D1C)
3. App includes a small demo model pre-installed
4. You can add custom models via HuggingFace URLs

> [!Note]
> - The TestFlight app works on both iOS and macOS
> - Demonstrates proper model integration and provides a reference implementation
> - iOS requires unzipped MLMODELC files and config.json for offline tokenizer
> - macOS supports both zipped and unzipped model formats

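The format requirements in the note above can be checked with a short script before pointing the app at a model folder. This is a rough heuristic under stated assumptions: it only inspects the top level of the folder, treats any `.zip` file as a zipped model, and the function name `describe_distribution` is hypothetical.

```python
from pathlib import Path


def describe_distribution(model_dir):
    """Report which model formats a folder contains (top level only)."""
    root = Path(model_dir)
    entries = list(root.iterdir())
    # Compiled Core ML models are directories ending in .mlmodelc.
    compiled = sorted(p.name for p in entries if p.is_dir() and p.suffix == ".mlmodelc")
    # Assumption: any .zip file in the folder is a zipped model archive.
    zipped = sorted(p.name for p in entries if p.is_file() and p.suffix == ".zip")
    has_config = (root / "config.json").is_file()
    return {
        "compiled": compiled,
        "zipped": zipped,
        "has_config": has_config,
        # Per the note: iOS needs unzipped MLMODELC files plus config.json.
        "ios_ready": bool(compiled) and has_config,
    }
```

For example, `describe_distribution(".")` run inside the cloned repository lists the compiled and zipped models it finds and whether `config.json` is present.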