Gemma-2B-IT Fine-Tuned on Canadian Immigration Q&A

This model is a fine-tuned version of google/gemma-2b-it, trained by Arash Ghezavati to specialize in answering questions about Canadian immigration, study permits, Express Entry, work visas, and PR pathways.


Model Details

  • Base model: google/gemma-2b-it
  • Fine-tuned with: LoRA (Low-Rank Adaptation) on Q&A dataset
  • Training type: Instruction-style tuning with <|user|> and <|assistant|> prompts
  • Language: English πŸ‡¬πŸ‡§
  • License: MIT
  • Trained by: Arash Ghezavati

πŸ“š Dataset

Fine-tuned on a custom Q&A dataset built from real Canadian immigration and government program content.

🧼 Dataset Format

Each entry is formatted as:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant providing information from Canadian immigration and government programs."},
    {"role": "user", "content": "What are the PR options for international students?"},
    {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."}
  ]
}
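As a rough sketch (not the author's published preprocessing script), each record in this format can be flattened into the instruction-style prompt text with the `<|user|>` and `<|assistant|>` tags mentioned under Model Details:

```python
# Minimal sketch, assuming the tags from the Model Details section are used verbatim.
def format_entry(entry):
    """Flatten a {"messages": [...]} record into a single training string."""
    parts = []
    for msg in entry["messages"]:
        if msg["role"] == "system":
            parts.append(msg["content"])  # system text first, untagged (assumption)
        elif msg["role"] == "user":
            parts.append("<|user|>\n" + msg["content"])
        elif msg["role"] == "assistant":
            parts.append("<|assistant|>\n" + msg["content"])
    return "\n".join(parts)

entry = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the PR options for international students?"},
        {"role": "assistant", "content": "The Canadian Experience Class, among others."},
    ]
}
print(format_entry(entry))
```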

Use Cases

βœ… Direct Use

  • Ideal for bots answering immigration-related questions.
  • Used in production in Canada Immigration API Space.

🚫 Out-of-Scope Use

  • Not suitable for legal decision-making or replacing certified immigration consultants.
  • Not intended for multilingual queries (English only).

πŸ›  Training Details

  • Epochs: 3
  • Batch size: 2
  • Learning rate: 3e-4
  • Optimizer: AdamW
  • Adapter: LoRA (q_proj, v_proj modules)
  • Frameworks: Transformers, PEFT, TRL
  • Compute: Google Colab Pro (1 GPU)

πŸ“Š Evaluation

Manual testing across ~800 immigration Q&A examples showed:

  • βœ… Accurate extraction of information.
  • βœ… Context-specific answers.
  • βœ… Smooth conversational responses.

πŸ§ͺ Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("arashGh/gemma-2b-it-canada-immigration")
tokenizer = AutoTokenizer.from_pretrained("arashGh/gemma-2b-it-canada-immigration")

# Wrap the question in the same instruction-style tags used during fine-tuning.
prompt = "<|user|>\nCan I work more than 24 hours per week as a student?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🌍 Environmental Impact

  • Trained on: Google Colab (1x A100 GPU)
  • Time used: ~3 hours
  • Carbon Estimate: Low (light fine-tuning)

πŸ‘€ Author

Arash Ghezavati
πŸ™ Acknowledgements

Thanks to Google for releasing the Gemma base model and Hugging Face for providing the hosting and training tools.
