---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---
# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model
This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.
## Model Details
### Model Description
- **Developed by:** Mark-CHAE
- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese (primary), English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis
### Model Sources
- **Repository:** [Mark-CHAE/ViTCM-LLM](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)
## Uses
### Direct Use
This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:
- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images
### Downstream Use
The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.
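A minimal sketch of loading the adapter for continued fine-tuning (the dataset and training loop are left out; `is_trainable=True` is the PEFT flag that keeps the adapter weights unfrozen):

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the frozen base model
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype="auto", device_map="auto"
)
# is_trainable=True keeps the LoRA weights trainable instead of frozen
model = PeftModel.from_pretrained(base, "Mark-CHAE/ViTCM-LLM", is_trainable=True)
model.print_trainable_parameters()  # only the LoRA parameters should be trainable
# ...plug `model` into your own Trainer / training loop on TCM data...
```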
## How to Get Started with the Model
### Using the Inference Widget
You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.
### Using the Model in Code
```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Load the base model and its processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Prepare inputs with the processor's chat template
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Give the tongue-diagnosis findings based on the image"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens (everything after the prompt)
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = processor.decode(generated, skip_special_tokens=True).strip()
print(answer)
```
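To deploy without a PEFT dependency, the adapter can optionally be merged into the base weights first; a minimal sketch (the output directory name is illustrative):

```python
# Fold the LoRA weights into the base model; returns a plain transformers model
merged = model.merge_and_unload()
merged.save_pretrained("vitcm-llm-merged")      # illustrative output path
processor.save_pretrained("vitcm-llm-merged")
```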
## Training Details
### Training Procedure
#### Training Hyperparameters
- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
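For reference, the hyperparameters above roughly correspond to the following PEFT configuration. This is a hedged reconstruction: settings not reported above (e.g. dropout, bias, task type) are assumptions.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                      # LoRA rank, as reported above
    lora_alpha=128,            # LoRA alpha, as reported above
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
        "qkv", "attn.proj",                       # vision-tower attention
    ],
    task_type="CAUSAL_LM",     # assumption: standard causal-LM task type
)
```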
#### Speeds, Sizes, Times
- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
#### Software
- PEFT 0.15.2
- Transformers library
- PyTorch
## Citation
**APA:**
Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/ViTCM-LLM
## Model Card Contact
For questions about this model, please contact the model author.