CPU support
#12
by
travisking
Given that gemma-3n is meant for edge devices, is there (or will there be) support for CPU inference, even quantized? All the examples seem to use CUDA, and when I attempted CPU inference (both naive and with torchao quantization), it just hung at inference indefinitely and never finished.
I understand that llama.cpp quants exist, but they are text-only (at the time of writing), so they are a no-go for me since I need the multimodal input features.