How to enable streaming for the Phi-3 Vision model?
I have developed an interface to chat with this model and was exploring how to stream the output.
https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct
But I couldn't get it right.
What have you tried?
Thanks @dranger003 for the script.
I used the existing TextIteratorStreamer and got it working.
# Streaming: run generate() in a background thread and read decoded
# text from a TextIteratorStreamer as it is produced.
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,  # don't echo the prompt back into the stream
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=512,
                         eos_token_id=processor.tokenizer.eos_token_id)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

for text in streamer:
    print(text, end="", flush=True)
thread.join()
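For context on the design: TextIteratorStreamer pushes decoded text chunks onto an internal queue as generate() produces tokens, and iterating over the streamer blocks until the next chunk arrives. That is why generation has to run in a separate thread; the trailing thread.join() just waits for generation to finish before continuing.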
@sebbyjp, I was getting errors due to a parameter misconfiguration; it finally works now.
Awesome! Are you able to run batched inference with image inputs?
Thank you for the feedback! I haven't had a chance to try batched inference with image inputs yet, but I'll definitely look into it. I appreciate you bringing it to my attention.
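In the meantime, here is a minimal, untested sketch of what batched image+text inference might look like. It assumes the Phi-3 Vision processor accepts lists of prompts and images and can pad them to a common length, which this thread doesn't confirm; the file names and prompts below are placeholders.

# Untested sketch: batched generation, assuming the processor supports
# lists of prompts/images with padding. "a.jpg"/"b.jpg" are placeholders.
from PIL import Image

prompts = [
    "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n",
    "<|user|>\n<|image_1|>\nWhat objects do you see?<|end|>\n<|assistant|>\n",
]
images = [Image.open("a.jpg"), Image.open("b.jpg")]

# Decoder-only models usually need left padding for batched generate().
processor.tokenizer.padding_side = "left"
inputs = processor(text=prompts, images=images, padding=True,
                   return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128,
                            eos_token_id=processor.tokenizer.eos_token_id)
# Drop the prompt tokens, then decode each generated sequence.
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True))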
By the way, I have a studio deployed that you can try out. Feel free to explore it here: Deploy and Chat with Phi-3 Vision 128K Instruct (https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct).