# Gemma 3 4B Instruct for RK3588
This version of Gemma 3 has been converted to run on the RK3588 NPU using w8a8
quantisation and rkllm-toolkit v1.2.1.
Compatible with RKLLM runtime version: 1.2.x
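
For context, w8a8 means both the weights and the activations are stored as 8-bit integers with a floating-point scale. The snippet below is a toy illustration of that round-trip in plain PyTorch; it is not the rkllm-toolkit's actual quantiser, just a sketch of what int8 quantisation does to a single tensor.

```python
# Toy illustration of symmetric int8 quantisation (the idea behind w8a8).
# Not the rkllm-toolkit implementation.
import torch

def quantize_int8(x: torch.Tensor):
    # Per-tensor symmetric scale: map the largest magnitude to 127
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)            # stand-in for a weight tile
w_q, w_scale = quantize_int8(w)
w_hat = dequantize_int8(w_q, w_scale)
print('max abs rounding error:', (w - w_hat).abs().max().item())
```

The rounding error shown is the accuracy cost of int8 storage, in exchange for a smaller model and int8 NPU kernels.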
Useful links:

- Pretty much anything by these folks: marty1885 and happyme531
## Conversion Python script

Based on instructions from airockchip/rknn-llm #240.

`gemma-3-conversion.py`:
```python
from rkllm.api import RKLLM
from transformers import Gemma3Processor, Gemma3ForConditionalGeneration
import safetensors
import torch

# Unsloth copy of the model
modelpath = 'unsloth/gemma-3-4b-it'

model = Gemma3ForConditionalGeneration.from_pretrained(
    modelpath, device_map='cpu', torch_dtype=torch.bfloat16
).eval()
processor = Gemma3Processor.from_pretrained(modelpath, use_fast=True)

# Save only the text (language model) part plus the processor;
# the vision tower is not converted.
model.language_model.save_pretrained('llm')
processor.save_pretrained('llm')

# Free the full multimodal model before conversion
del model
model = None
del processor
processor = None

modelpath = 'llm'
savepath = 'llm/gemma-3-4b-it-g128.rkllm'

llm = RKLLM()

ret = llm.load_huggingface(model=modelpath, device='cpu')
if ret != 0:
    print('Load model failed!')
    exit(ret)

ret = llm.build(
    do_quantization=True,
    optimization_level=0,
    quantized_dtype='w8a8',
    # a hybrid ratio of 25% gives a good balance
    hybrid_rate=0.25,
    max_context=4096 * 4,  # 16,384-token context
    quantized_algorithm='normal',
    target_platform='rk3588',
    num_npu_core=3,
    extra_qparams=None,
    dataset=None
)
if ret != 0:
    print('Build model failed!')
    exit(ret)

ret = llm.export_rkllm(savepath)
if ret != 0:
    print('Export model failed!')
    exit(ret)
```
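
The build step above runs with `dataset=None`, i.e. no calibration data. The toolkit's `build()` also accepts a JSON calibration file through the `dataset` argument; check the rkllm-toolkit docs for the exact schema. The sketch below assumes a list of input/target pairs and uses a made-up file name, so treat it as a starting point rather than a confirmed format.

```python
# Hypothetical calibration file for the dataset= argument of llm.build().
# The {"input": ..., "target": ...} schema and the file name are assumptions;
# verify against the rkllm-toolkit documentation before relying on them.
import json

samples = [
    {"input": "Explain what an NPU is in one sentence.",
     "target": "An NPU is a processor specialised for neural-network inference."},
    {"input": "Translate 'good morning' into French.",
     "target": "Bonjour."},
]

with open('data_quant.json', 'w', encoding='utf-8') as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)

# Then point the build step at it instead of dataset=None:
# ret = llm.build(..., dataset='data_quant.json')
```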