How to set input image resolution?

#18

by kkjh0723 - opened Jul 3

Jul 3

According to the model card, Gemma 3n takes images of arbitrary resolution and normalizes them to 256x256, 512x512, or 768x768.
Is the normalized resolution chosen automatically based on the input image resolution?
Can I fix the normalized resolution to one of 256x256, 512x512, or 768x768 to reduce GPU memory usage?

xbruce22

Jul 6

I don't think you can change that; the size is fixed by the processor config file.

Check it out here:
https://huggingface.co/google/gemma-3n-E4B-it/blob/main/preprocessor_config.json

BalakrishnaCh

Google org Jul 7

Hi @kkjh0723 ,

Welcome to the Google Gemma family of open source models, and thanks for reaching out to us. The Gemma model takes images of any size. If the given image does not match one of the model's specified resolutions (256x256, 512x512, or 768x768), the vision encoder converts it to a suitable resolution. If you pass an image that already has one of these resolutions, the model uses it directly as input.

Thanks.

kkjh0723

Jul 7

@BalakrishnaCh , Thanks for the clarification!

I also had a quick question: when passing multiple images as input (more than 10), I sometimes run into GPU memory issues. Is it possible to force the image size to the smallest resolution (256x256) to reduce memory usage?

Appreciate your help!

BalakrishnaCh

Google org Jul 7

@kkjh0723 , Yes, you can resize the images to a different resolution using PIL and then pass the resized images to the model as input. Please find the attached gist file for your reference.
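The gist itself is not reproduced in this thread. As a rough sketch of the approach described, assuming Pillow is installed, images could be resized to 256x256 before they are handed to the processor. The names `SUPPORTED_SIDES`, `check_side`, and `resize_for_model` are hypothetical helpers made up for this illustration and are not part of any Gemma or transformers API.

```python
# Resolutions mentioned in this thread; treated here as an assumption about the model.
SUPPORTED_SIDES = (256, 512, 768)


def check_side(side: int) -> int:
    """Return `side` if it is one of the resolutions named in the thread."""
    if side not in SUPPORTED_SIDES:
        raise ValueError(f"side must be one of {SUPPORTED_SIDES}, got {side}")
    return side


def resize_for_model(image, side: int = 256):
    """Resize a PIL image to side x side pixels.

    The aspect ratio is not preserved, so non-square images get stretched.
    """
    # Imported lazily so check_side() can be used without Pillow installed.
    from PIL import Image

    check_side(side)
    return image.convert("RGB").resize((side, side), Image.LANCZOS)


def load_resized(paths, side: int = 256):
    """Open each image file and return a list of resized copies."""
    from PIL import Image

    images = []
    for path in paths:
        with Image.open(path) as img:
            images.append(resize_for_model(img, side))
    return images
```

The resized images would then be passed to the model in place of the originals. Whether this actually lowers GPU memory depends on whether the image processor rescales its input again, so it is worth measuring memory usage before and after.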

Thanks.

kkjh0723

Jul 8

@BalakrishnaCh , Thanks for your help!

kkjh0723 changed discussion status to closed Jul 8
