Echo9Zulu/gemma-3-4b-it-qat-int4_asym-ov

Image-Text-to-Text


Echo9Zulu committed on Apr 18

Commit 477f5bc · verified · 1 Parent(s): 88cc288

Update README.md

Files changed (1)

README.md +2 -2


@@ -29,8 +29,8 @@ optimum-cli export openvino -m "input-model" --task image-text-to-text --weigh
 ### What does the test code do?
 Well, it demonstrates how to inference in python *and* what parts of that code are important for benchmarking performance.
-Text generation offers different challenges than text-generation with images; for examples, vision encoders often use different strategies for handling properties an image can have.
-In practice this translates to higher memory usage, reduced throughput or bad results.
+Text generation offers different challenges than text-generation with images; for examples, vision encoders often use different strategies for handling properties an image can have; to get good performance **be mindful of image resolution**.
+In practice this can translate to higher memory usage, reduced throughput and greater variety in results. Gemma-3 uses SigLIP 2 which has many SOTA optimizations; even so, effort in the preprocessing stage of a pipeline makes a world of difference.
 To run the test code: