How to set input image resolution?

#18
by kkjh0723 - opened

According to the model card, Gemma 3n seems to take images of arbitrary resolution and normalize them to 256x256, 512x512, or 768x768.
Is the normalized resolution chosen automatically based on the input image resolution?
Can I fix the normalized resolution to one of 256x256, 512x512, or 768x768 to reduce GPU memory usage?

I don't think you can change that. The size is fixed according to the processor file.

Check it out here:
https://huggingface.co/google/gemma-3n-E4B-it/blob/main/preprocessor_config.json
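
To see what the processor file specifies, you can read its `size` entry. This is a minimal sketch, assuming the JSON has been saved locally as `preprocessor_config.json` and that it follows the common Hugging Face image-processor layout, where `size` is an object with `height` and `width` keys. The helper name `read_target_size` is hypothetical.

```python
import json
from pathlib import Path


def read_target_size(config_path):
    """Return (height, width) from a preprocessor config, or None if absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    size = config.get("size")
    # Assumption: `size` is a mapping with "height" and "width" keys.
    if not isinstance(size, dict) or "height" not in size or "width" not in size:
        return None
    return int(size["height"]), int(size["width"])


if __name__ == "__main__":
    path = Path("preprocessor_config.json")  # hypothetical local copy
    if path.exists():
        print(read_target_size(path))
```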

Google org

Hi @kkjh0723 ,

Welcome to the Google Gemma family of open source models, and thanks for reaching out. The Gemma model accepts images of arbitrary size. If a given image does not match one of the model's supported resolutions (256x256, 512x512, or 768x768), the vision encoder converts it to a suitable resolution. If you pass an image that already has one of these resolutions, the model uses it directly as input.

Thanks.

@BalakrishnaCh , Thanks for the clarification!

I also had a quick question: when passing multiple images as input (more than 10), I sometimes run into GPU memory issues. Is it possible to set the image size to the smallest resolution (256x256) to reduce memory usage?

Appreciate your help!

Google org

@kkjh0723 , Yes, you can resize the images to a different resolution using PIL and then pass the resized images to the model as input. Please find the attached gist file for your reference.

Thanks.

@BalakrishnaCh , Thanks for your help!

kkjh0723 changed discussion status to closed
