---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---

# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese, English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)
- **Specialization:** Traditional Chinese Medicine tongue diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/ViTCM-LLM](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images

### Downstream Use

The adapter can be loaded with the base model for inference, or used as a starting point for further fine-tuning on specific TCM diagnosis tasks.

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

### Using the Model in Code

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, AutoTokenizer, Qwen2_5_VLForConditionalGeneration

# Load the base model. Qwen2.5-VL is a vision-language model, so it is loaded
# with its conditional-generation class rather than AutoModelForCausalLM.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")
# For deployment, the adapter can optionally be merged into the base weights:
# model = model.merge_and_unload()

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Describe the tongue-diagnosis findings based on the image"

# The prompt must contain the vision placeholder tokens so the processor
# knows where to insert the image features.
prompt = (
    "<|im_start|>user\n"
    f"<|vision_start|><|image_pad|><|vision_end|>{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = processor(
    text=prompt,
    images=image,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens that follow the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(answer)
```

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, qkv, attn.proj

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)

#### Software

- PEFT 0.15.2
- Transformers
- PyTorch
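For reference, the hyperparameters listed above correspond roughly to the PEFT configuration below. This is a minimal sketch, not the published training script; `lora_dropout` and `task_type` are assumptions, since neither is stated in this card.

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,             # LoRA rank
    lora_alpha=128,   # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "qkv", "attn.proj",                      # vision-tower attention
    ],
    lora_dropout=0.05,       # assumption: actual value not published
    task_type="CAUSAL_LM",   # assumption: standard choice for VLM fine-tuning
)
```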
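Note that loading the 32B base model in float16 requires roughly 64 GB of GPU memory for the weights alone, plus about 2.2 GB for the adapter. As a minimal sketch (assuming `bitsandbytes` is installed; not an officially tested recipe), 4-bit quantization can reduce the base-weight footprint to roughly a quarter of that:

```python
import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# 4-bit NF4 quantization for the base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")
```

Quantization trades some output quality for memory, so results may differ slightly from full-precision inference.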
## Citation

**APA:** Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/ViTCM-LLM

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2