File size: 3,656 Bytes

---
language:
- en
- ko
- zh
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
- tongue-diagnosis
---

# ViTCM_LLM - Traditional Chinese Medicine Tongue Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) tongue diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA Adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** Chinese
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine Tongue Diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/
ViTCM-LLM ](https://huggingface.co/Mark-CHAE/ViTCM-LLM)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:

- Traditional Chinese Medicine tongue diagnosis
- Tongue image analysis and interpretation
- Visual question answering for medical images
- Multimodal medical conversations
- Symptom analysis from tongue images

### Downstream Use

The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload a tongue image and ask a question about it.

### Using the Model in Code

```python
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
import torch
from PIL import Image

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/ViTCM-LLM")

# Prepare inputs
image = Image.open("tongue_image.jpg")
question = "根据图片判断舌诊内容"

prompt = f"<|im_start|>user\n<image>\n{question}<|im_end|>\n<|im_start|>assistant\n"

inputs = processor(
    text=prompt,
    images=image,
    return_tensors="pt"
)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
answer = response.split("<|im_start|>assistant")[-1].strip()
print(answer)
```


### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj


#### Speeds, Sizes, Times

- **Adapter size:** 2.2GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)


#### Software

- PEFT 0.15.2
- Transformers library
- PyTorch



**APA:**

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Tongue Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2