Li Zhang

Andcircle

AI & ML interests

None yet

Recent Activity

upvoted an article about 1 month ago

Understanding InstaFlow/Rectified Flow

upvoted an article 8 months ago

Recoloring photos with diffusers

reacted to BramVanroy's post with 👍 about 1 year ago

Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FA, the same sequence length, and the same DeepSpeed ZeRO-3 settings. Oh, and yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others. Any prior experience that you can share, or suggestions to improve throughput?

Organizations

None yet

commented a paper over 1 year ago

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Paper • 2401.01335 • Published Jan 2, 2024

New activity in mistralai/Mistral-7B-v0.1 almost 2 years ago

Does Mistral support accelerate library?

#65 opened almost 2 years ago by

New activity in tiiuae/falcon-40b about 2 years ago

[Bug] Does not work

#3 opened about 2 years ago by