---
language:
- en
- ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
---

# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned specifically for Traditional Chinese Medicine (TCM) diagnosis tasks.

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine diagnosis

### Model Sources

- **Repository:** [Mark-CHAE/shezhen](https://huggingface.co/Mark-CHAE/shezhen)
- **Base Model:** [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct)

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks, including:

- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images

### Downstream Use

The adapter can be loaded with the base model for inference, or for further fine-tuning on specific TCM diagnosis tasks (a minimal fine-tuning loading sketch is included at the end of this card).

### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

### Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Apply appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload an image and ask a question about it.
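### Prompt Format

The model expects the Qwen2.5-VL chat format, in which the image is referenced through vision placeholder tokens. The full example in the next subsection builds this prompt string by hand; with a recent `transformers` release you can instead let the processor's chat template construct it. A minimal sketch (the question text here is only illustrative):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# One user turn containing an image slot followed by the question text.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the tongue shown in this image."},
        ],
    }
]

# Render the conversation into the Qwen2.5-VL prompt string, ending with the
# assistant header so the model continues from there.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
print(prompt)
```

The rendered string contains `<|vision_start|><|image_pad|><|vision_end|>` at the image position; the processor later replaces this placeholder with the actual image features. The same placeholder appears in the hand-written prompt below.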
### Using the Model in Code

The following example loads the base model, attaches the LoRA adapter, and runs visual question answering on a tongue image using the prompt format described above:

```python
from peft import PeftModel
from transformers import AutoProcessor, AutoTokenizer, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

# Load the base vision-language model, tokenizer, and processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Assess the tongue-diagnosis findings from the image"
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    f"{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = processor(
    text=prompt,
    images=image,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (the assistant's answer)
generated = outputs[0][inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(answer)
```

## Training Details

### Training Data

The model was fine-tuned on multimodal vision-language data in English and Korean, with a specific focus on Traditional Chinese Medicine diagnosis scenarios.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** v_proj, qkv, attn.proj, q_proj, gate_proj, down_proj, up_proj, o_proj, k_proj

#### Speeds, Sizes, Times

- **Adapter size:** 2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks with a focus on medical image understanding.

#### Metrics

Standard vision-language evaluation metrics, including accuracy, BLEU, and human evaluation scores.

### Results

[Evaluation results to be added]

#### Summary

This LoRA adapter provides an efficient way to adapt Qwen2.5-VL-32B-Instruct to Traditional Chinese Medicine diagnosis tasks while preserving the base model's general capabilities.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis

### Compute Infrastructure

#### Hardware

[To be specified]

#### Software

- PEFT 0.15.2
- Transformers
- PyTorch

## Citation

**APA:** Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2
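## Appendix: Loading the Adapter for Further Fine-Tuning

As noted under Downstream Use, the adapter can also be loaded in trainable mode for continued LoRA fine-tuning on task-specific TCM data. The sketch below assumes the PEFT version listed above; the original training script and dataset are not included in this repository, so it only shows how the published adapter weights can be made trainable again:

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration
import torch

# Load the base model; its weights stay frozen during LoRA fine-tuning.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the published adapter (r=64, alpha=128, target modules as listed
# under Training Hyperparameters) and mark its LoRA weights as trainable.
model = PeftModel.from_pretrained(
    base_model,
    "Mark-CHAE/shezhen",
    is_trainable=True,
)

# Only the LoRA parameters should be reported as trainable here.
model.print_trainable_parameters()

# From this point, the model can be passed to a standard training loop or a
# Trainer-style API together with your own TCM image-text dataset.
```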