---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---

# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese, English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine tongue diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/ViTCM-LLM](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images

### Downstream Use

The adapter can be loaded with the base model for inference, or used as a starting point for further fine-tuning on specific TCM diagnosis tasks; a minimal fine-tuning setup is sketched below.
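
The following is a sketch of re-attaching the adapter in trainable mode via PEFT's `is_trainable` flag; the training data, trainer, and hyperparameters are yours to supply and are not part of this repository:

```python
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model, then attach the adapter with its weights unfrozen
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    base_model,
    "Mark-CHAE/ViTCM-LLM",
    is_trainable=True,  # keep LoRA weights trainable for further fine-tuning
)
model.print_trainable_parameters()  # only the LoRA parameters are trainable
```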

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above: simply upload a tongue image and ask a question about it.

### Using the Model in Code

```python
import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the base model and processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")
model.eval()

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"  # "Give the tongue-diagnosis findings based on the image."

# Build the prompt with the processor's chat template so the image
# placeholder tokens match what Qwen2.5-VL expects
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens (everything after the prompt)
generated = outputs[:, inputs["input_ids"].shape[1]:]
answer = processor.batch_decode(generated, skip_special_tokens=True)[0].strip()
print(answer)
```
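
Because sampling is enabled (`do_sample=True`), answers will vary between runs; set `do_sample=False` and drop `temperature`/`top_p` if you need deterministic output.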

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj
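
For reference, these hyperparameters correspond approximately to the PEFT configuration below. This is a reconstruction for illustration, not the original training script; settings not listed above (dropout, bias handling, and so on) are assumptions left at their defaults:

```python
from peft import LoraConfig

# Approximate LoRA configuration implied by the hyperparameters above;
# unlisted settings (lora_dropout, bias, ...) are assumed defaults
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "v_proj", "qkv", "attn.proj", "q_proj", "gate_proj",
        "down_proj", "up_proj", "o_proj", "k_proj",
    ],
    task_type="CAUSAL_LM",
)
```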

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)
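
If loading the 2.2 GB adapter separately at inference time is a concern, the LoRA weights can be merged into the base model with PEFT's standard `merge_and_unload()`. A sketch, assuming `model` and `processor` from the inference example above (the output path is just an example):

```python
# Fold the LoRA deltas into the base weights; the result is a plain
# Qwen2.5-VL model with no runtime PEFT dependency
merged_model = model.merge_and_unload()

# Save the merged model for direct loading later (example path)
merged_model.save_pretrained("ViTCM-LLM-merged")
processor.save_pretrained("ViTCM-LLM-merged")
```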

#### Software

- PEFT 0.15.2
- Transformers
- PyTorch

## Citation

**APA:**

Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2