Commit 673a675 (1 parent: 9ee6e3a)

Add Intel IPEX-LLM setup under deploy_local_llm (#1269)
### What problem does this PR solve?

It adds the setup guide for using Intel IPEX-LLM with Ollama to docs/guides/deploy_local_llm.md.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [x] Other (please describe): adds the setup guide for using Intel IPEX-LLM with Ollama to docs/guides/deploy_local_llm.md
- docs/guides/deploy_local_llm.md +129 -1
docs/guides/deploy_local_llm.md (CHANGED)

@@ -156,4 +156,132 @@ Click on your logo **>** **Model Providers** **>** **System Model Settings** to
Update your chat model accordingly in **Chat Configuration**:

> If your local model is an embedding model, update it on the configuration page of your knowledge base.

## Deploy a local model using IPEX-LLM

[IPEX-LLM](https://github.com/intel-analytics/ipex-llm) is a PyTorch library for running LLMs on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex, or Max) with very low latency.

To deploy a local model, e.g., **Qwen2**, using IPEX-LLM, follow the steps below:

### 1. Check firewall settings

Ensure that your host machine's firewall allows inbound connections on port 11434. For example:

```bash
sudo ufw allow 11434/tcp
```
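If your distribution uses `firewalld` instead of `ufw` (an assumption about your environment), the equivalent rule is:

```bash
# Open TCP port 11434 for Ollama on firewalld-based distros (e.g. Fedora, RHEL)
sudo firewall-cmd --permanent --add-port=11434/tcp
sudo firewall-cmd --reload
```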
### 2. Install and start the Ollama service using IPEX-LLM

#### 2.1 Install IPEX-LLM for Ollama

IPEX-LLM's support for `ollama` is now available on both Linux and Windows.

Visit the [Run llama.cpp with IPEX-LLM on Intel GPU guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md), follow the instructions in [Prerequisites](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#0-prerequisites) to set up your environment, and then follow [Install IPEX-LLM cpp](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp) to install IPEX-LLM with the Ollama binaries.

**After the installation, you should have a conda environment, e.g. `llm-cpp`, for running `ollama` commands with IPEX-LLM.**
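For reference, the installation described in that quickstart boils down to roughly the following sketch; treat the Python version and package specifier as assumptions and defer to the linked guide if they differ:

```bash
# Create and activate a conda environment for IPEX-LLM's Ollama/llama.cpp binaries
conda create -n llm-cpp python=3.11
conda activate llm-cpp

# Install IPEX-LLM with its llama.cpp/Ollama components (pre-release channel)
pip install --pre --upgrade "ipex-llm[cpp]"
```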
#### 2.2 Initialize Ollama

Activate the `llm-cpp` conda environment and initialize Ollama by executing the commands below. A symbolic link to `ollama` will appear in your current directory.

- For **Linux users**:

  ```bash
  conda activate llm-cpp
  init-ollama
  ```

- For **Windows users**:

  Please run the following commands with **administrator privileges in a Miniforge Prompt**.

  ```cmd
  conda activate llm-cpp
  init-ollama.bat
  ```

> [!NOTE]
> If you have installed a higher version of `ipex-llm[cpp]` and want to upgrade your Ollama binary, remove the old binary files first and then initialize again with `init-ollama` or `init-ollama.bat`.
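On Linux, for example, that re-initialization might look like the following sketch (assuming the symbolic link was created in the current directory):

```bash
# Remove the stale ollama symlink from the previous init-ollama run,
# then recreate it against the upgraded ipex-llm[cpp] installation
rm -f ./ollama
init-ollama
```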
**You can now use this executable file following standard `ollama` usage.**

#### 2.3 Run Ollama Serve

Launch the Ollama service as follows:

- For **Linux users**:

  ```bash
  export OLLAMA_NUM_GPU=999
  export no_proxy=localhost,127.0.0.1
  export ZES_ENABLE_SYSMAN=1
  source /opt/intel/oneapi/setvars.sh
  export SYCL_CACHE_PERSISTENT=1

  ./ollama serve
  ```

- For **Windows users**:

  Please run the following commands in a Miniforge Prompt.

  ```cmd
  set OLLAMA_NUM_GPU=999
  set no_proxy=localhost,127.0.0.1
  set ZES_ENABLE_SYSMAN=1
  set SYCL_CACHE_PERSISTENT=1

  ollama serve
  ```

> [!NOTE]
> Set the environment variable `OLLAMA_NUM_GPU` to `999` to make sure all layers of your model run on the Intel GPU; otherwise, some layers may run on the CPU.

> [!TIP]
> If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS (kernel 6.2), it is recommended to additionally set the following environment variable for optimal performance before executing `ollama serve`:
>
> ```bash
> export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
> ```

> [!NOTE]
> To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.

The console will display messages similar to the following:

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png" width="100%" />
</a>
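To confirm the service is up before moving on, you can send a quick request from the same machine (a simple sanity check; adjust the address if you set `OLLAMA_HOST` to something other than the default):

```bash
# A running Ollama server replies with "Ollama is running"
curl http://localhost:11434
```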
### 3. Pull and Run Ollama Model

Keep the Ollama service running, open another terminal, and run `./ollama pull <model_name>` on Linux (`ollama.exe pull <model_name>` on Windows) to automatically pull a model, e.g. `qwen2:latest`:

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png" width="100%" />
</a>
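For example, to pull the model used in the next step (shown here for Linux; on Windows, run the `ollama.exe` equivalent in a Miniforge Prompt):

```bash
# Pull Qwen2 through the IPEX-LLM accelerated Ollama binary
./ollama pull qwen2:latest
```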
#### Run Ollama Model

- For **Linux users**:

  ```bash
  ./ollama run qwen2:latest
  ```

- For **Windows users**:

  ```cmd
  ollama run qwen2:latest
  ```

### 4. Configure RAGFlow to use IPEX-LLM accelerated Ollama

The configuration follows the steps in the Ollama sections above: Section 4 [Add Ollama](#4-add-ollama), Section 5 [Complete basic Ollama settings](#5-complete-basic-ollama-settings), Section 6 [Update System Model Settings](#6-update-system-model-settings), and Section 7 [Update Chat Configuration](#7-update-chat-configuration).
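Before filling in those settings, it can help to verify that the machine (or container) running RAGFlow can reach the IPEX-LLM accelerated Ollama service; the host address below is a placeholder that depends on your deployment:

```bash
# Expect a JSON list of the models pulled above (e.g. qwen2:latest)
curl http://<ollama-host>:11434/api/tags
```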