---
language:
- en
- ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
---

# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images

### Downstream Use

The adapter can be loaded with the base model for inference, or for further fine-tuning on specific TCM diagnosis tasks (a minimal fine-tuning loading sketch is included at the end of this card).

### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

### Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Apply appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload an image and ask a question about it.
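### Prompt Format

The model expects the Qwen2.5-VL chat format, in which the image is referenced through vision placeholder tokens. The full example in the next subsection builds this prompt string by hand; with a recent `transformers` release you can instead let the processor's chat template construct it. A minimal sketch (the question text here is only illustrative):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# One user turn containing an image slot followed by the question text.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the tongue shown in this image."},
        ],
    }
]

# Render the conversation into the Qwen2.5-VL prompt string, ending with the
# assistant header so the model continues from there.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
print(prompt)
```

The rendered string contains `<|vision_start|><|image_pad|><|vision_end|>` at the image position; the processor later replaces this placeholder with the actual image features. The same placeholder appears in the hand-written prompt below.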
### Using the Model in Code

The following example loads the base model, attaches the LoRA adapter, and runs visual question answering on a tongue image using the prompt format described above:

```python
from peft import PeftModel
from transformers import AutoProcessor, AutoTokenizer, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

# Load the base vision-language model, tokenizer, and processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Assess the tongue-diagnosis findings from the image"
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    f"{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = processor(
    text=prompt,
    images=image,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (the assistant's answer)
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(answer)
```

## Training Details

### Training Data

The model was fine-tuned on multimodal vision-language data in English and Korean, with a specific focus on Traditional Chinese Medicine diagnosis scenarios.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks with a focus on medical image understanding.

#### Metrics

Standard vision-language evaluation metrics, including accuracy, BLEU, and human evaluation scores.

### Results

[Evaluation results to be added]

#### Summary

This LoRA adapter provides an efficient way to adapt Qwen2.5-VL-32B-Instruct to Traditional Chinese Medicine diagnosis tasks while preserving the base model's general capabilities.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis

### Compute Infrastructure

#### Hardware

[To be specified]

#### Software

- PEFT 0.15.2
- Transformers
- PyTorch

## Citation

**APA:** Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2
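## Appendix: Loading the Adapter for Further Fine-Tuning

As noted under Downstream Use, the adapter can also be loaded in trainable mode for continued LoRA fine-tuning on task-specific TCM data. The sketch below assumes the PEFT version listed above; the original training script and dataset are not included in this repository, so it only shows how the published adapter weights can be made trainable again:

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration
import torch

# Load the base model; its weights stay frozen during LoRA fine-tuning.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the published adapter (r=64, alpha=128, target modules as listed
# under Training Hyperparameters) and mark its LoRA weights as trainable.
model = PeftModel.from_pretrained(
    base_model,
    "Mark-CHAE/shezhen",
    is_trainable=True,
)

# Only the LoRA parameters should be reported as trainable here.
model.print_trainable_parameters()

# From this point, the model can be passed to a standard training loop or a
# Trainer-style API together with your own TCM image-text dataset.
```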