---
language:
- en
- ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
- vision
- visual-question-answering
- multimodal
- qwen
- lora
- tcm
- traditional-chinese-medicine
---
# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned for Traditional Chinese Medicine (TCM) diagnosis tasks, with a focus on tongue diagnosis (shezhen).
## Model Details

### Model Description
- Developed by: Mark-CHAE
- Model type: LoRA Adapter for Qwen2.5-VL-32B-Instruct
- Language(s) (NLP): English, Korean
- License: Apache-2.0
- Finetuned from model: Qwen/Qwen2.5-VL-32B-Instruct
- Specialization: Traditional Chinese Medicine Diagnosis
### Model Sources
- Repository: Mark-CHAE/shezhen
- Base Model: Qwen/Qwen2.5-VL-32B-Instruct
## Uses

### Direct Use
This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:
- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images
### Downstream Use

The adapter can be loaded with the base model for inference, or in trainable mode for further fine-tuning on specific TCM diagnosis tasks, as sketched below.
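For example, to continue fine-tuning, the adapter can be loaded with PEFT's `is_trainable` flag (a minimal sketch; the training loop and data pipeline are omitted):

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model, then attach the adapter with trainable LoRA weights.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct", device_map="auto"
)
model = PeftModel.from_pretrained(
    base_model, "Mark-CHAE/shezhen", is_trainable=True
)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```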
### Out-of-Scope Use
This model should not be used for:
- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision
### Recommendations
Users should:
- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis
## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above: upload an image and ask a question about it.

### Using the Model in Code
```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the base model and its processor
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs using the model's chat template
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Perform tongue diagnosis based on the image"
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=processor.tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
answer = processor.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```
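If you do not need to switch adapters at runtime, the LoRA weights can optionally be folded into the base model with PEFT's standard merge operation, which removes the adapter indirection at inference time:

```python
# Fold the LoRA weights into the base weights and drop the PEFT wrapper.
model = model.merge_and_unload()
```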
## Training Details

### Training Data
The model was fine-tuned on multimodal vision-language data including English and Korean content, with specific focus on Traditional Chinese Medicine diagnosis scenarios.
### Training Procedure

#### Training Hyperparameters
- Training regime: LoRA fine-tuning
- LoRA rank: 64
- LoRA alpha: 128
- Target modules: `v_proj`, `qkv`, `attn.proj`, `q_proj`, `gate_proj`, `down_proj`, `up_proj`, `o_proj`, `k_proj` (see the configuration sketch below)
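For reference, these settings correspond roughly to the following PEFT `LoraConfig` (a minimal sketch; `lora_dropout`, `bias`, and `task_type` are assumptions, as they are not reported above):

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the hyperparameters listed above.
# lora_dropout, bias, and task_type are assumed defaults, not reported values.
lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # LoRA alpha (scaling = alpha / r = 2.0)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # language-model attention
        "gate_proj", "up_proj", "down_proj",     # language-model MLP
        "qkv", "attn.proj",                      # vision-tower attention
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```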
#### Speeds, Sizes, Times

- Adapter size: 2.2 GB
- Base model: Qwen2.5-VL-32B-Instruct (32B parameters)
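For scale, a rough back-of-envelope estimate (assuming 16-bit weights) puts the base model at about 32B × 2 bytes ≈ 64 GB, so the adapter amounts to only around 3% of the base weights.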
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks with a focus on medical image understanding.
#### Metrics

Standard vision-language evaluation metrics, including accuracy, BLEU, and human evaluation scores.
### Results
[Evaluation results to be added]
#### Summary
This LoRA adapter provides an efficient way to adapt the Qwen2.5-VL-32B-Instruct model for Traditional Chinese Medicine diagnosis tasks while maintaining the base model's capabilities.
## Technical Specifications

### Model Architecture and Objective

- Architecture: LoRA adapter for Qwen2.5-VL-32B-Instruct (standard low-rank update, shown below)
- Objective: Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis
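In the standard LoRA formulation, each pretrained weight matrix $W$ of a target module stays frozen and a low-rank update is learned; with the rank and alpha reported in the training details, the effective weight is

$$
W' = W + \frac{\alpha}{r}\,BA, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r = 64,\ \alpha = 128
$$

Only $A$ and $B$ are trained, which is why the adapter (2.2 GB) is small relative to the 32B-parameter base model.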
### Compute Infrastructure

#### Hardware
[To be specified]
#### Software
- PEFT 0.15.2
- Transformers library
- PyTorch
## Citation

**APA:**

Mark-CHAE. (2024). *ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model*. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen
## Model Card Contact
For questions about this model, please contact the model author.
### Framework versions
- PEFT 0.15.2