CPU support
#12
by
travisking
Given that gemma-3n is meant for edge devices, is there (or will there be) support for CPU inference, even quantized? All the examples seem to use CUDA, and when I attempted CPU inference (both naive and with torchao quantization), it just hung at inference indefinitely and never finished.
I understand that llama.cpp quants exist, but they are text-only (at the time of writing), so they are a no-go for me since I need the multimodal input features.