Li Zhang

Andcircle

AI & ML interests

None yet

Recent Activity

upvoted an article about 1 month ago

Understanding InstaFlow/Rectified Flow

upvoted an article 8 months ago

Recoloring photos with diffusers

reacted to BramVanroy's post with 👍 about 1 year ago

Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory heavy than Mistral 7B. I know that its vocabulary is much larger (250k) but I'm a bit surprised that the max batch size that I can get in an A100 80GB is only 2 whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs were using FA, same sequence length, same DeepSpeed ZeRO-3 settings. Oh and yes I'm using the most recent hot fix of transformers that solves a memory issue with Gemma and others. Any prior experience that you can share or suggestions to improve throughput?

Organizations

None yet

liked a model about 1 year ago

google/paligemma-3b-mix-224

Image-Text-to-Text • 3B • Updated Jul 19, 2024 • 278k • 83