# DeepSeek-LLM-7B-Chat — RKLLM build for RK3588 boards

Author: @jamescallander
Source model: deepseek-ai/deepseek-llm-7b-chat
Target: Rockchip RK3588 NPU via RKNN-LLM Runtime
This repository hosts a conversion of DeepSeek-LLM-7B-Chat for use on Rockchip RK3588-equipped single-board computers (Orange Pi 5 Plus, Radxa ROCK 5B+, Banana Pi M7, etc.). The conversion was performed with the RKNN-LLM toolkit.
## Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python 3.11
- Quantization: w8a8_g128
- Output: single-file `.rkllm` artifact
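For reference, a conversion along these lines can be scripted with the RKLLM-Toolkit Python API. The sketch below is an outline rather than the exact script used here: the method names (`load_huggingface`, `build`, `export_rkllm`) follow the toolkit's examples, but argument names and defaults may differ between toolkit versions.

```python
from rkllm.api import RKLLM

llm = RKLLM()

# Load the source Hugging Face checkpoint (local path or repo id)
ret = llm.load_huggingface(model="deepseek-ai/deepseek-llm-7b-chat")
assert ret == 0, "model load failed"

# Quantize (w8a8, group size 128) and build for the RK3588 NPU
ret = llm.build(
    do_quantization=True,
    quantized_dtype="w8a8_g128",
    target_platform="rk3588",
)
assert ret == 0, "build failed"

# Export the single-file .rkllm artifact
ret = llm.export_rkllm("./rk3588-deepseek-llm-7b-chat.rkllm")
assert ret == 0, "export failed"
```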
## Intended use
This build is intended for experimentation and deployment of DeepSeek-LLM-7B-Chat on Rockchip RK3588-based SBCs.
## Limitations
- Requires 9 GB of free memory (a quick check is shown after this list).
- The model is quantized (w8a8_g128), so slight quality differences from the FP16 baseline may occur.
- Tested on Orange Pi 5 Plus, Orange Pi 5 Max, and Radxa ROCK 5B+; other platforms may not be supported.
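Before loading the model, a quick way to confirm the board has enough free memory (standard Linux tooling, nothing toolkit-specific):

```bash
# the "available" column should show roughly 9G or more
free -h
```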
## Quick start (RK3588)
### 1) Install the runtime
The RKNN-LLM toolkit and installation instructions can be found on the development board manufacturer's website or on airockchip's GitHub page. Download and install the required packages as described in the toolkit's instructions.
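For example, the toolkit repository (which also contains the server demo used in the next step) can be cloned directly; the URL below assumes airockchip's GitHub mirror:

```bash
git clone https://github.com/airockchip/rknn-llm.git
```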
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/rk3588-deepseek-llm-7b-chat.rkllm \
    --target_platform rk3588
```
### 3) Sending a request
A basic message request has the following format:
```json
{
  "model": "deepseek-7b",
  "messages": [
    {"role": "user", "content": "<YOUR_PROMPT_HERE>"}
  ],
  "stream": false
}
```
Example request using `curl`:
```bash
curl -s -X POST <MODEL_SERVER_IP_ADDRESS>:8080/rkllm_chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"deepseek-7b","messages":[{"role":"user","content":"In one sentence, who was Napoleon?"}],"stream":false}'
```
The response is formatted as follows:
```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "<MODEL_REPLY_HERE>",
      "role": "assistant"
    }
  }],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null
  }
}
```
Example response:
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte (1769-1821) was a French military leader and statesman who rose to power during the French Revolution, becoming Emperor of France from 1804 to 1815 and implementing various reforms and conquests that had a lasting impact on European history.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
### 4) UI compatibility
This server exposes an OpenAI-compatible Chat Completions API, so you can connect it to any OpenAI-compatible client or UI (for example, Open WebUI):

- Configure your client with the API base `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model's name, for example:
```json
{
  "model": "DeepSeek-LLM-7B-Chat",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}
```
## License
This conversion follows the license of the source model: DeepSeek LLM license.
- Attribution: Built with DeepSeek (© 2023 DeepSeek).
- Modifications: quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs.
- Use restrictions: The model and its derivatives may not be used for military purposes, harming minors, harassment, generating PII without authorization, fully automated binding decisions, or other prohibited uses listed in Attachment A of the DeepSeek License Agreement.
For more information on deploying and using `.rkllm` models on RK3588 platforms, refer to the RKNN-LLM toolkit documentation.