Reference VRAM usage
#8
by av-codes · opened
Running the model and the demo in the reference version requires ~10 GB of VRAM, which is suspiciously high for a 2B model (is something wrong with the reference code?). Input tensors are not freed and the CUDA cache is not cleared between inferences, so the VRAM stays allocated after the first inference in the demo.
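
Something along these lines between demo calls would keep VRAM from accumulating (a minimal sketch; `generate_once`, the argument names, and the token budget are illustrative, not the demo's actual code):

```python
import gc
import torch

def generate_once(model, inputs, max_new_tokens=128):
    """Run one generation and release the GPU-side tensors afterwards.

    `inputs` is assumed to be the usual processor output (a BatchFeature
    of tensors); the function name and token budget are hypothetical.
    """
    inputs = inputs.to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the CPU-side result, then drop references to the CUDA copies.
    output_ids = output_ids.cpu()
    del inputs
    gc.collect()
    # Return cached blocks to the allocator so VRAM does not stay pinned
    # after the first inference.
    torch.cuda.empty_cache()
    return output_ids
```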
Also, as with the base model, the min/max pixel settings affect VRAM usage drastically: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/discussions/10
With ShowUI, anything below 1024*28*28 seems to degrade the predictions too much (see the sketch below).
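
For reference, a minimal sketch of capping the pixel budget via the Qwen2-VL processor API; the `showlab/ShowUI-2B` id and the upper bound are assumptions, and the lower bound is the threshold mentioned above:

```python
from transformers import AutoProcessor

# Qwen2-VL counts pixels in 28x28 blocks, hence the N*28*28 convention.
min_pixels = 1024 * 28 * 28  # below this, predictions seem to degrade
max_pixels = 1344 * 28 * 28  # assumed cap to bound VRAM usage

processor = AutoProcessor.from_pretrained(
    "showlab/ShowUI-2B",  # assumed model id for this repo
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
```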
Thank you for the information! We will look into optimizing the model inference :)