Hugging Face
Li Zhang (Andcircle)
0 followers · 2 following
AI & ML interests: None yet
Recent Activity
Upvoted an article (about 1 month ago): Understanding InstaFlow/Rectified Flow
Upvoted an article (8 months ago): Recoloring photos with diffusers
Reacted with 👍 to BramVanroy's post (about 1 year ago):
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FlashAttention (FA), the same sequence length, and the same DeepSpeed ZeRO-3 settings. Oh, and yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others. Any prior experience you can share, or suggestions to improve throughput?
Organizations: None yet
Andcircle's activity
Liked a model (about 1 year ago): google/paligemma-3b-mix-224
Image-Text-to-Text • 3B • Updated Jul 19, 2024 • 278k downloads • 83 likes