Can you tell me how you transformed the model?

#1
by CocoRoF - opened

I was wondering how you removed the parameter from the model.

Was it simply a matter of deleting the 'Vision_tower' parameters,
or did you use some additional method?

Hi there! Thank you for your question.
I extracted only the text model component from the multimodal Gemma-3-27b model. The approach is straightforward: I loaded the full multimodal model and took just its language_model component.
Here's what I did:

1. Loaded the complete Gemma-3-27b-it model using Gemma3ForConditionalGeneration
2. Extracted only the language_model component (text_model = full_model.language_model)
3. Saved this text-only component to a new directory
4. Uploaded this text-only version to Hugging Face
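
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script that was used: the helper names (extract_text_model, save_text_only), the model id "google/gemma-3-27b-it", the torch_dtype="auto" setting, and the optional repo_id argument are all assumptions added for the example. Depending on the installed transformers version, the language_model attribute may or may not include the output head, so it is worth checking its type before saving.

```python
from pathlib import Path
from typing import Optional


def extract_text_model(full_model):
    """Return the text-only submodule of a loaded multimodal model.

    Assumption: the model exposes its text stack as `language_model`,
    as described in the steps above.
    """
    return full_model.language_model


def save_text_only(model_id: str, out_dir: str, repo_id: Optional[str] = None) -> Path:
    """Load the full model, keep only the language model, and save it.

    Hypothetical helper: `repo_id` is an illustrative extra argument for
    the upload step, not something taken from the original script.
    """
    # Imported lazily so extract_text_model stays usable without transformers.
    from transformers import Gemma3ForConditionalGeneration

    # Step 1: load the complete multimodal checkpoint.
    # torch_dtype="auto" (an assumption) keeps the dtype stored in the checkpoint.
    full_model = Gemma3ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto"
    )

    # Step 2: keep only the language model; the vision tower is simply
    # not part of this submodule, so nothing has to be deleted by hand.
    text_model = extract_text_model(full_model)

    # Step 3: write the text-only weights and config to a new directory.
    out = Path(out_dir)
    text_model.save_pretrained(out)

    # Step 4 (optional): upload the text-only model to the Hub.
    if repo_id is not None:
        text_model.push_to_hub(repo_id)

    return out
```

With these assumptions, calling save_text_only("google/gemma-3-27b-it", "gemma-3-27b-it-text-only") would write the text-only checkpoint to a local directory. Note that loading the 27B model needs enough memory to hold the full multimodal checkpoint first, and that the tokenizer is not saved by this sketch.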

I didn't need to manually delete the vision_tower parameters or modify the architecture. Taking the language_model component yields just the language model portion, without the vision components. This makes it easier to fine-tune or use in text-only applications where the full multimodal capabilities aren't needed.
The code I used is quite simple and just takes advantage of the model's modular architecture to isolate the text component.
