# Gemma-2B-IT Fine-Tuned on Canadian Immigration Q&A
This model is a fine-tuned version of [`google/gemma-2b-it`](https://huggingface.co/google/gemma-2b-it), trained by Arash Ghezavati to specialize in answering questions about Canadian immigration, study permits, Express Entry, work visas, and permanent residence (PR) pathways.
## Model Details

- Base model: [`google/gemma-2b-it`](https://huggingface.co/google/gemma-2b-it)
- Fine-tuned with: LoRA (Low-Rank Adaptation) on a Q&A dataset
- Training type: Instruction-style tuning with `<|user|>` and `<|assistant|>` prompts
- Language: English
- License: MIT
- Trained by: Arash Ghezavati
## Dataset
Fine-tuned on a custom dataset created from real Canadian immigration content sourced from:
- canada.ca
- alberta.ca
- cic.gc.ca
- Other provincial and legal sources
## Dataset Format
Each entry is formatted as:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant providing information from Canadian immigration and government programs."},
    {"role": "user", "content": "What are the PR options for international students?"},
    {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."}
  ]
}
```
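For illustration, here is a minimal sketch (not part of the released training code) of how an entry in this format could be flattened into the `<|user|>`/`<|assistant|>` prompt style described above. The `format_example` helper is hypothetical:

```python
# Hypothetical helper: flatten a chat-style dataset entry into the
# <|user|>/<|assistant|> prompt format used during fine-tuning.
def format_example(messages):
    markers = {"user": "<|user|>", "assistant": "<|assistant|>"}
    parts = []
    for msg in messages:
        if msg["role"] == "system":
            parts.append(msg["content"])  # system text goes first, unmarked
        else:
            parts.append(markers[msg["role"]] + "\n" + msg["content"])
    return "\n".join(parts)

entry = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant providing information from Canadian immigration and government programs."},
        {"role": "user", "content": "What are the PR options for international students?"},
        {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."},
    ]
}
print(format_example(entry["messages"]))
```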
## Use Cases

### Direct Use
- Ideal for bots answering immigration-related questions.
- Used in production in Canada Immigration API Space.
### Out-of-Scope Use
- Not suitable for legal decision-making or replacing certified immigration consultants.
- Not intended for multilingual queries (English only).
## Training Details
- Epochs: 3
- Batch size: 2
- Learning rate: 3e-4
- Optimizer: AdamW
- Adapter: LoRA (q_proj, v_proj modules)
- Frameworks: Transformers, PEFT, TRL
- Compute: Google Colab Pro (1 GPU)
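The settings above can be sketched as a PEFT `LoraConfig` plus standard `TrainingArguments`. This is a hedged reconstruction from the bullet list, not the author's actual training script: only the target modules, learning rate, batch size, and epoch count come from the card, while `r` and `lora_alpha` are illustrative assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter on the attention query/value projections, as listed above.
# r and lora_alpha are illustrative guesses; the card does not state them.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the Training Details list (AdamW is the
# default optimizer in Transformers' Trainer).
training_args = TrainingArguments(
    output_dir="gemma-2b-it-canada-immigration",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=3e-4,
)
```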
## Evaluation
Manual testing across ~800 immigration Q&A examples showed:
- Accurate extraction of information.
- Context-specific answers.
- Smooth conversational responses.
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("arashGh/gemma-2b-it-canada-immigration")
tokenizer = AutoTokenizer.from_pretrained("arashGh/gemma-2b-it-canada-immigration")

input_text = "Can I work more than 24 hours per week as a student?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
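Since the model was tuned with `<|user|>` and `<|assistant|>` markers, wrapping the question in that same format at inference time may match the training distribution better. This is a hedged variant of the prompt construction, shown without the generation call:

```python
# Build a prompt in the same <|user|>/<|assistant|> style used in training.
question = "Can I work more than 24 hours per week as a student?"
prompt = f"<|user|>\n{question}\n<|assistant|>\n"

# Then tokenize and generate exactly as in the example above:
# inputs = tokenizer(prompt, return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=100)
print(prompt)
```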
## Environmental Impact
- Trained on: Google Colab (1x A100 GPU)
- Time used: ~3 hours
- Carbon Estimate: Low (light fine-tuning)
## Author
- Name: Arash Ghezavati
- Location: Vancouver, Canada
- Profile: huggingface.co/arashGh
## Acknowledgements
Thanks to Google for releasing the Gemma base model and Hugging Face for providing the hosting and training tools.