---
language:
- en
- ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
---
# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned for Traditional Chinese Medicine (TCM) diagnosis tasks, with a focus on tongue diagnosis (shezhen).
## Model Details

### Model Description
- Developed by: Mark-CHAE
- Model type: LoRA Adapter for Qwen2.5-VL-32B-Instruct
- Language(s) (NLP): English, Korean
- License: Apache-2.0
- Finetuned from model: Qwen/Qwen2.5-VL-32B-Instruct
- Specialization: Traditional Chinese Medicine Diagnosis
### Model Sources
- Repository: Mark-CHAE/shezhen
- Base Model: Qwen/Qwen2.5-VL-32B-Instruct
## Uses

### Direct Use
This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:
- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images
### Downstream Use

The adapter can be loaded with the base model for inference, or in trainable mode for further fine-tuning on specific TCM diagnosis tasks, as sketched below.
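For example, to continue fine-tuning, the adapter can be loaded with PEFT's `is_trainable` flag (a minimal sketch; the training loop and data pipeline are omitted):

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model, then attach the adapter with trainable LoRA weights.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct", device_map="auto"
)
model = PeftModel.from_pretrained(
    base_model, "Mark-CHAE/shezhen", is_trainable=True
)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```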
### Out-of-Scope Use
This model should not be used for:
- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision
### Recommendations
Users should:
- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis
## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above: upload an image and ask a question about it.

### Using the Model in Code
```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the base model and its processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs using the model's chat template
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Perform tongue diagnosis based on the image"
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=processor.tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
answer = processor.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```
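If you do not need to switch adapters at runtime, the LoRA weights can optionally be folded into the base model with PEFT's standard merge operation, which removes the adapter indirection at inference time:

```python
# Fold the LoRA weights into the base weights and drop the PEFT wrapper.
model = model.merge_and_unload()
```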
## Training Details

### Training Data
The model was fine-tuned on multimodal vision-language data including English and Korean content, with specific focus on Traditional Chinese Medicine diagnosis scenarios.
### Training Procedure

#### Training Hyperparameters
- Training regime: LoRA fine-tuning
- LoRA rank: 64
- LoRA alpha: 128
- Target modules: `v_proj`, `qkv`, `attn.proj`, `q_proj`, `gate_proj`, `down_proj`, `up_proj`, `o_proj`, `k_proj` (see the configuration sketch below)
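For reference, these settings correspond roughly to the following PEFT `LoraConfig` (a minimal sketch; `lora_dropout`, `bias`, and `task_type` are assumptions, as they are not reported above):

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the hyperparameters listed above.
# lora_dropout, bias, and task_type are assumed defaults, not reported values.
lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # LoRA alpha (scaling = alpha / r = 2.0)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # language-model attention
        "gate_proj", "up_proj", "down_proj",     # language-model MLP
        "qkv", "attn.proj",                      # vision-tower attention
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```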
#### Speeds, Sizes, Times

- Adapter size: 2.2 GB
- Base model: Qwen2.5-VL-32B-Instruct (32B parameters)
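For scale, a rough back-of-envelope estimate (assuming 16-bit weights) puts the base model at about 32B × 2 bytes ≈ 64 GB, so the adapter amounts to only around 3% of the base weights.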
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks with a focus on medical image understanding.
#### Metrics

Standard vision-language evaluation metrics, including accuracy, BLEU, and human evaluation scores.
### Results
[Evaluation results to be added]
#### Summary
This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model for Traditional Chinese Medicine diagnosis tasks while maintaining the base model's capabilities.
## Technical Specifications

### Model Architecture and Objective

- Architecture: LoRA adapter for Qwen2.5-VL-32B-Instruct (standard low-rank update, shown below)
- Objective: Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis
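In the standard LoRA formulation, each pretrained weight matrix $W$ of a target module stays frozen and a low-rank update is learned; with the rank and alpha reported in the training details, the effective weight is

$$
W' = W + \frac{\alpha}{r}\,BA, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r = 64,\ \alpha = 128
$$

Only $A$ and $B$ are trained, which is why the adapter (2.2 GB) is small relative to the 32B-parameter base model.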
### Compute Infrastructure

#### Hardware
[To be specified]
#### Software
- PEFT 0.15.2
- Transformers library
- PyTorch
## Citation

**APA:**

Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen
## Model Card Contact
For questions about this model, please contact the model author.
### Framework versions
- PEFT 0.15.2