Can you tell me how you transformed the model?
I was wondering how you removed the parameter from the model.
Did you simply delete the 'vision_tower' parameters,
or did you use some additional method?
Hi there! Thank you for your question.
I extracted only the text model component from the multimodal Gemma-3-27b model. The transformation wasn't complex: I loaded the full multimodal model and kept just its language_model component.
Here's what I did:
1. Loaded the complete Gemma-3-27b-it model using Gemma3ForConditionalGeneration
2. Extracted only the language_model component (text_model = full_model.language_model)
3. Saved this text-only component to a new directory
4. Uploaded this text-only version to Hugging Face
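For anyone who wants to try this, here is a minimal sketch of the steps above. It is not necessarily the exact script that was used. The repo id "google/gemma-3-27b-it", the output directory name, the bfloat16 dtype, and the extra tokenizer save are all assumptions added for illustration. The language_model attribute is the one named in the steps; its exact location and type may differ between transformers versions, so the helper fails loudly if it is missing.

```python
def extract_language_model(full_model):
    """Return the text-only submodule of a loaded multimodal model."""
    text_model = getattr(full_model, "language_model", None)
    if text_model is None:
        # The attribute layout can change between transformers releases.
        raise AttributeError(
            "model has no `language_model` attribute; "
            "check the model layout for your transformers version"
        )
    return text_model


def save_text_only_model(
    source_id="google/gemma-3-27b-it",       # assumed repo id
    output_dir="gemma-3-27b-it-text-only",   # hypothetical directory name
):
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoTokenizer, Gemma3ForConditionalGeneration

    # Step 1: load the complete multimodal model (dtype is an assumption).
    full_model = Gemma3ForConditionalGeneration.from_pretrained(
        source_id, torch_dtype=torch.bfloat16
    )

    # Step 2: keep only the language model component.
    text_model = extract_language_model(full_model)

    # Step 3: save the text-only component to a new directory.
    text_model.save_pretrained(output_dir)

    # Not part of the steps above: keep the tokenizer next to the weights.
    AutoTokenizer.from_pretrained(source_id).save_pretrained(output_dir)
    return text_model
```

Calling save_text_only_model() runs steps 1 to 3. For step 4, text_model.push_to_hub(...) with your own repo id is one way to upload the result. Loading the 27B checkpoint needs a lot of memory, so expect this to be slow on a small machine.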
I didn't need to manually delete the vision_tower parameters or modify the architecture. The extraction approach automatically gives us just the language model portion without the vision components. This makes it easier to fine-tune or use in text-only applications where the full multimodal capabilities aren't needed.
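If you want to confirm that no vision weights came along, one option is to scan the parameter names of the extracted model. This is only a rough sketch: the helper name find_vision_parameters is hypothetical, and the marker strings "vision_tower" and "multi_modal_projector" are assumptions about how the vision-side modules are named.

```python
def find_vision_parameters(
    param_names,
    markers=("vision_tower", "multi_modal_projector"),  # assumed module names
):
    """Return the parameter names that contain any vision-related marker."""
    return [
        name
        for name in param_names
        if any(marker in name for marker in markers)
    ]
```

With the extracted model in hand, find_vision_parameters(name for name, _ in text_model.named_parameters()) should return an empty list if only the language model was kept.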
The code I used is quite simple and just takes advantage of the model's modular architecture to isolate the text component.