Gemma-2-2B-It — RKLLM build for RK3588 boards
Built with Gemma 2
Author: @jamescallander
Source model: google/gemma-2-2b-it (Hugging Face)
Target: Rockchip RK3588 NPU via RKNN-LLM Runtime
This repository hosts a conversion of gemma-2-2b-it for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed using the RKNN-LLM toolkit.
Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.10
- Quantization: w8a8_g128
- Output: single-file .rkllm artifact
- Modifications: quantization (w8a8_g128), export to .rkllm format for RK3588 SBCs
- Tokenizer: not required at runtime (UI handles prompt I/O)
Intended use
- On-device chat and instruction following on RK3588 SBCs.
- gemma-2-2b-it is tuned for general conversational tasks, Q&A, and reasoning, making it suitable for edge inference deployments where low power and privacy matter.
Limitations
- Requires ~3.5 GB of free memory.
- Quantized build (w8a8_g128) may show small quality differences vs. the full-precision upstream model.
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
Quick start (RK3588)
1) Install runtime
The RKNN-LLM toolkit and installation instructions are available from your development board manufacturer's website or from airockchip's GitHub page.
Download and install the required packages as described in the toolkit's instructions.
2) Simple Flask server deployment
The simplest way to deploy the converted .rkllm model is with the example server script provided in the toolkit at rknn-llm/examples/rkllm_server_demo:
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
--rkllm_model_path <MODEL_PATH>/gemma-2-2b-it_w8a8_g128_rk3588.rkllm \
--target_platform rk3588
3) Sending a request
A basic request message has the following format:
{
"model":"gemma-2-2b-it",
"messages":[{
"role":"user",
"content":"<YOUR_PROMPT_HERE>"}],
"stream":false
}
Example request using curl:
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
-H 'Content-Type: application/json' \
-d '{"model":"gemma-2-2b-it","messages":[{"role":"user","content":"Explain who Napoleon Bonaparte is in two or three sentences."}],"stream":false}'
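The same request can also be sent from Python using only the standard library. This is a minimal sketch: the server address is a placeholder you must replace with your board's IP, and the commented-out call assumes the Flask server from step 2 is running and reachable.

```python
import json
import urllib.request

SERVER = "http://192.168.1.50:8080"  # placeholder: replace with your board's IP address

# Same payload as the curl example above
payload = {
    "model": "gemma-2-2b-it",
    "messages": [{"role": "user",
                  "content": "Explain who Napoleon Bonaparte is in two or three sentences."}],
    "stream": False,
}

req = urllib.request.Request(
    f"{SERVER}/rkllm_chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is reachable on your network:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```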
The response is formatted as follows:
{
"choices":[{
"finish_reason":"stop",
"index":0,
"logprobs":null,
"message":{
"content":"<MODEL_REPLY_HERE>",
"role":"assistant"}}],
"created":null,
"id":"rkllm_chat",
"object":"rkllm_chat",
"usage":{
"completion_tokens":null,
"prompt_tokens":null,
"total_tokens":null}
}
Example response:
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte was a French military and political leader who rose to prominence during the French Revolution. He became Emperor of France in 1804, leading his armies to conquer much of Europe before being defeated at the Battle of Waterloo in 1815. His legacy is complex, marked by both military brilliance and authoritarian rule, leaving a lasting impact on European history. ","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
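The assistant's text is nested under choices[0].message.content. A minimal sketch of extracting it from a decoded response, using an abridged version of the example reply above as sample data:

```python
import json

# Sample response in the format returned by the server
# (content abridged from the example reply above)
raw = '''{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,
"message":{"content":"Napoleon Bonaparte was a French military and political leader.",
"role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat",
"usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}'''

response = json.loads(raw)

# The reply text lives at choices[0].message.content
reply = response["choices"][0]["message"]["content"]
print(reply)
```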
4) UI compatibility
This server exposes an OpenAI-compatible Chat Completions API.
You can connect it to any OpenAI-compatible client or UI (for example, Open WebUI).
- Configure your client with the API base http://<SERVER_IP_ADDRESS>:8080 and use the endpoint /rkllm_chat.
- Make sure the model field matches the converted model's name, for example:
{
"model": "gemma-2-2b-it",
"messages": [{"role":"user","content":"Hello!"}],
"stream": false
}
⚠️ Safety disclaimer
- 🛑 Not a substitute for professional advice, diagnosis, or treatment.
- Intended for research and educational purposes only.
- Do not rely on outputs for decisions related to health, safety, or legal/financial matters.
- Always consult a qualified professional for real-world guidance.
- Follow Google’s Prohibited Use Policy
License
This conversion follows the license of the source model:
Gemma Terms of Use
- Required notice: see NOTICE