# Gemma-2B-IT Fine-Tuned on Canadian Immigration Q&A
This model is a fine-tuned version of [`google/gemma-2b-it`](https://huggingface.co/google/gemma-2b-it), trained by Arash Ghezavati to specialize in answering questions about Canadian immigration, study permits, Express Entry, work visas, and permanent residence (PR) pathways.
## Model Details

- Base model: [`google/gemma-2b-it`](https://huggingface.co/google/gemma-2b-it)
- Fine-tuned with: LoRA (Low-Rank Adaptation) on a Q&A dataset
- Training type: Instruction-style tuning with `<|user|>` and `<|assistant|>` prompts
- Language: English
- License: MIT
- Trained by: Arash Ghezavati
## Dataset
Fine-tuned on a custom dataset created from real Canadian immigration content sourced from:
- canada.ca
- alberta.ca
- cic.gc.ca
- Other provincial and legal sources
## Dataset Format
Each entry is formatted as:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant providing information from Canadian immigration and government programs."},
    {"role": "user", "content": "What are the PR options for international students?"},
    {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."}
  ]
}
```
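For illustration, here is a minimal sketch (not part of the released training code) of how an entry in this format could be flattened into the `<|user|>`/`<|assistant|>` prompt style described above. The `format_example` helper is hypothetical:

```python
# Hypothetical helper: flatten a chat-style dataset entry into the
# <|user|>/<|assistant|> prompt format used during fine-tuning.
def format_example(messages):
    markers = {"user": "<|user|>", "assistant": "<|assistant|>"}
    parts = []
    for msg in messages:
        if msg["role"] == "system":
            parts.append(msg["content"])  # system text goes first, unmarked
        else:
            parts.append(markers[msg["role"]] + "\n" + msg["content"])
    return "\n".join(parts)

entry = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant providing information from Canadian immigration and government programs."},
        {"role": "user", "content": "What are the PR options for international students?"},
        {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."},
    ]
}
print(format_example(entry["messages"]))
```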
## Use Cases

### Direct Use
- Ideal for bots answering immigration-related questions.
- Used in production in Canada Immigration API Space.
### Out-of-Scope Use
- Not suitable for legal decision-making or replacing certified immigration consultants.
- Not intended for multilingual queries (English only).
## Training Details
- Epochs: 3
- Batch size: 2
- Learning rate: 3e-4
- Optimizer: AdamW
- Adapter: LoRA (q_proj, v_proj modules)
- Frameworks: Transformers, PEFT, TRL
- Compute: Google Colab Pro (1 GPU)
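The settings above can be sketched as a PEFT `LoraConfig` plus standard `TrainingArguments`. This is a hedged reconstruction from the bullet list, not the author's actual training script: only the target modules, learning rate, batch size, and epoch count come from the card, while `r` and `lora_alpha` are illustrative assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter on the attention query/value projections, as listed above.
# r and lora_alpha are illustrative guesses; the card does not state them.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the Training Details list (AdamW is the
# default optimizer in Transformers' Trainer).
training_args = TrainingArguments(
    output_dir="gemma-2b-it-canada-immigration",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=3e-4,
)
```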
## Evaluation
Manual testing across ~800 immigration Q&A examples showed:
- Accurate extraction of information.
- Context-specific answers.
- Smooth conversational responses.
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("arashGh/gemma-2b-it-canada-immigration")
tokenizer = AutoTokenizer.from_pretrained("arashGh/gemma-2b-it-canada-immigration")

input_text = "Can I work more than 24 hours per week as a student?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
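Since the model was tuned with `<|user|>` and `<|assistant|>` markers, wrapping the question in that same format at inference time may match the training distribution better. This is a hedged variant of the prompt construction, shown without the generation call:

```python
# Build a prompt in the same <|user|>/<|assistant|> style used in training.
question = "Can I work more than 24 hours per week as a student?"
prompt = f"<|user|>\n{question}\n<|assistant|>\n"

# Then tokenize and generate exactly as in the example above:
# inputs = tokenizer(prompt, return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=100)
print(prompt)
```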
## Environmental Impact
- Trained on: Google Colab (1x A100 GPU)
- Time used: ~3 hours
- Carbon Estimate: Low (light fine-tuning)
## Author
- Name: Arash Ghezavati
- Location: Vancouver, Canada
- Profile: huggingface.co/arashGh
## Acknowledgements
Thanks to Google for releasing the Gemma base model and Hugging Face for providing the hosting and training tools.