
Text generation from tokens obtained from an image

#38
by jrtorrez31337 - opened

Hello, first post on this platform, so please pardon my ignorance.

We're using Pixtral-12B for image-to-text analysis and have noticed that the image encoding process consistently produces the same tokens, which we believe is expected and are glad to see. However, the text generation phase appears to use a runtime-generated random seed (when none is explicitly provided), which results in variability in the outputs.
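For context, here is roughly the behavior we mean; a minimal sketch assuming vLLM's offline `LLM`/`SamplingParams` API (sampling values are placeholders, and the image inputs are omitted for brevity):

```python
# Minimal sketch (assumed vLLM offline API): when SamplingParams.seed is
# left unset, vLLM samples with a run-dependent seed; setting it pins the
# sampling RNG so the same prompt yields the same tokens across runs.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

params = SamplingParams(
    temperature=0.7,   # placeholder values
    max_tokens=512,
    seed=1234,         # explicit seed -> reproducible sampling
)

# Text-only prompt for brevity; multimodal inputs go through the same
# SamplingParams, so the seed applies to them as well.
outputs = llm.generate(["Describe the image."], params)
print(outputs[0].outputs[0].text)
```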

We've identified a "Great Prompt Strategy" that yields excellent results, but we're unable to reproduce these consistently because we can't capture or log the internally chosen seed. Some runs produce outstanding results, while others with the same prompt and image do not.

Is there any guidance on how to capture the seed for the text generation component, or otherwise achieve reproducible outputs?
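One alternative we're aware of (again a sketch, assuming the same `SamplingParams` API): greedy decoding removes sampling randomness entirely, so no seed is needed, though it also removes the variation that surfaced the good runs in the first place.

```python
from vllm import SamplingParams

# temperature=0 makes decoding greedy (argmax), so outputs are
# deterministic regardless of seed, at the cost of sampling diversity.
greedy = SamplingParams(temperature=0.0, max_tokens=512)
```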

Right now we have a seed randomizer running and are reviewing the output to find a seed we like (we could be at this until we die). However, when the script runs without an explicit seed, the internally chosen seed always seems to produce better results than anything our randomizer has picked. A simplified sketch of the loop is below.
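```python
# Rough sketch of the seed sweep (names and values illustrative):
# generate with an explicit random seed each run and log it next to the
# output, so any run we like can be replayed by reusing its logged seed.
import json
import random

from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")
prompt = "Describe the image."  # placeholder; image inputs omitted

with open("seed_log.jsonl", "a") as log:
    for _ in range(20):
        seed = random.getrandbits(32)  # candidate seed for this run
        params = SamplingParams(temperature=0.7, max_tokens=512, seed=seed)
        text = llm.generate([prompt], params)[0].outputs[0].text
        log.write(json.dumps({"seed": seed, "output": text}) + "\n")
```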

The struggle is real.

Thank you for any help and guidance provided.
