---
base_model:
- google/gemma-3-1b-it
library_name: peft
model_name: Gemma-3-1B-it-Medical-LoRA
tags:
- generated_from_trainer
- unsloth
- sft
- trl
datasets:
- tmnam20/ViMedAQA
language:
- vi
---

# Model Card for Gemma-3-1B-it-Medical-LoRA

This model is a LoRA fine-tune of [unsloth/gemma-3-1b-it-unsloth-bnb-4bit](https://huggingface.co/unsloth/gemma-3-1b-it-unsloth-bnb-4bit) on the [tmnam20/ViMedAQA](https://huggingface.co/datasets/tmnam20/ViMedAQA) Vietnamese medical question-answering dataset. It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

The adapter can be loaded directly through the `pipeline` API (this requires `peft` to be installed):

```python
from transformers import pipeline

# Vietnamese: "If a peptic ulcer is suspected, which hospital department
# should you visit for an examination?"
question = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
            "tại bệnh viện để thăm khám?")

generator = pipeline("text-generation", model="heboya8/Gemma-3-1B-it-Medical-LoRA", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

Alternatively, load the base model and apply the adapter explicitly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Define base model and LoRA adapter paths
base_model_name = "unsloth/gemma-3-1b-it"
lora_adapter_name = "heboya8/Gemma-3-1B-it-Medical-LoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model in FP16, placing it on the GPU if one is available
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # FP16 for efficiency
    device_map="auto",
)

# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)

# Set the model to evaluation mode
model.eval()

# Create a text-generation pipeline around the adapted model
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,  # limit response length
)

# Vietnamese: "If a peptic ulcer is suspected, which hospital department
# should you visit for an examination?"
question = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
            "tại bệnh viện để thăm khám?")

# Format the input as a chat message
input_prompt = [{"role": "user", "content": question}]

# Generate and print the response
output = generator(input_prompt, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with supervised fine-tuning (SFT); an illustrative sketch of the setup is given at the end of this card.

### Framework versions

- PEFT: 0.14.0
- TRL: 0.19.0
- Transformers: 4.52.4
- Pytorch: 2.6.0+cu124
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```
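
## Appendix: illustrative training sketch

The exact training script is not included in this card. The sketch below shows a typical TRL `SFTTrainer` + PEFT LoRA setup for this base model and dataset. The hyperparameters, LoRA rank and target modules, dataset split and column names (`question`, `answer`), and chat formatting are assumptions for illustration, not the confirmed recipe.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# 4-bit Unsloth build of the base model (requires bitsandbytes)
base_model_name = "unsloth/gemma-3-1b-it-unsloth-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name, device_map="auto")

# ViMedAQA: Vietnamese medical QA pairs (split/column names are assumptions)
dataset = load_dataset("tmnam20/ViMedAQA", split="train")

def to_chat(example):
    # Convert a QA pair into the conversational format SFTTrainer accepts
    return {
        "messages": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }

dataset = dataset.map(to_chat, remove_columns=dataset.column_names)

# LoRA configuration (rank/targets are common defaults, not confirmed)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments (illustrative values)
training_args = SFTConfig(
    output_dir="Gemma-3-1B-it-Medical-LoRA",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

After training, `trainer.save_model()` writes only the LoRA adapter weights, which is what this repository hosts alongside the base model reference.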