---
language:
  - en
  - ko
license: apache-2.0
library_name: peft
pipeline_tag: visual-question-answering
tags:
  - vision
  - visual-question-answering
  - multimodal
  - qwen
  - lora
  - tcm
  - traditional-chinese-medicine
---

# ViTCM_LLM - Traditional Chinese Medicine Diagnosis Model

This is a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-VL-32B-Instruct model, fine-tuned for Traditional Chinese Medicine (TCM) diagnosis tasks, with a focus on tongue diagnosis (shezhen).

## Model Details

### Model Description

- **Developed by:** Mark-CHAE
- **Model type:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Language(s) (NLP):** English, Korean
- **License:** Apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-VL-32B-Instruct
- **Specialization:** Traditional Chinese Medicine diagnosis

### Model Sources

## Uses

### Direct Use

This LoRA adapter can be used with the base Qwen2.5-VL-32B-Instruct model for multimodal vision-language tasks including:

- Image understanding and description
- Visual question answering
- Image-text generation
- Multimodal conversations
- Traditional Chinese Medicine diagnosis
- Symptom analysis from medical images

### Downstream Use

The adapter can be loaded with the base model for inference or further fine-tuning on specific TCM diagnosis tasks.
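
The snippet below is a minimal sketch of loading the adapter for continued training rather than inference. It assumes the same base model and adapter repository used in the inference example further down; `is_trainable=True` is the PEFT flag that keeps the LoRA weights unfrozen.

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model (its weights stay frozen during LoRA training)
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

# is_trainable=True leaves the LoRA weights unfrozen for further fine-tuning;
# without it, PEFT loads the adapter in inference mode.
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen", is_trainable=True)
model.print_trainable_parameters()  # only the low-rank LoRA matrices are trainable
```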

### Out-of-Scope Use

This model should not be used for:

- Generating harmful, offensive, or inappropriate content
- Creating deepfakes or misleading visual content
- Any illegal activities
- Making actual medical diagnoses without proper medical supervision

## Recommendations

Users should:

- Verify outputs for accuracy and appropriateness
- Be aware of potential biases in the model
- Use appropriate safety measures when deploying
- Not rely solely on this model for medical diagnosis
- Consult qualified medical professionals for actual diagnosis

## How to Get Started with the Model

### Using the Inference Widget

You can try the model directly in the browser using the Visual Question Answering widget above. Simply upload an image and ask a question about it.

### Using the Model in Code

```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Load the base model and its processor. Qwen2.5-VL is a vision-language model,
# so it is loaded with Qwen2_5_VLForConditionalGeneration rather than
# AutoModelForCausalLM.
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Mark-CHAE/shezhen")

# Prepare inputs
image = Image.open("your_image.jpg")
question = "根据图片判断舌诊内容"  # "Assess the tongue-diagnosis findings from the image"

# Build the prompt with the processor's chat template, which inserts the
# correct vision placeholder tokens for Qwen2.5-VL.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens; the chat markers are special tokens,
# so splitting the decoded transcript on them after skip_special_tokens is unreliable.
generated_ids = output_ids[:, inputs["input_ids"].shape[1]:]
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(answer)
```

## Training Details

### Training Data

The model was fine-tuned on multimodal vision-language data in English and Korean, with a specific focus on Traditional Chinese Medicine diagnosis scenarios.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, qkv, attn.proj
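
For reference, these hyperparameters correspond roughly to the `LoraConfig` sketched below; fields not listed in this card (such as `lora_dropout`) are assumptions rather than published settings.

```python
from peft import LoraConfig

# Approximate reconstruction of the adapter's LoRA configuration from the
# hyperparameters above; unlisted fields are assumptions.
lora_config = LoraConfig(
    r=64,                                        # LoRA rank
    lora_alpha=128,                              # LoRA scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "qkv", "attn.proj",                      # vision-tower attention modules
    ],
    task_type="CAUSAL_LM",
)
```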

#### Speeds, Sizes, Times

- **Adapter size:** ~2.2 GB
- **Base model:** Qwen2.5-VL-32B-Instruct (32B parameters)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on multimodal vision-language benchmarks, with a focus on medical image understanding.

#### Metrics

Evaluation uses standard vision-language metrics, including accuracy, BLEU, and human evaluation scores.

### Results

[Evaluation results to be added]

#### Summary

This LoRA adapter provides an efficient way to adapt Qwen2.5-VL-32B-Instruct to Traditional Chinese Medicine diagnosis tasks while preserving the base model's general vision-language capabilities.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** LoRA adapter for Qwen2.5-VL-32B-Instruct
- **Objective:** Multimodal vision-language understanding and generation, specialized for TCM tongue diagnosis

### Compute Infrastructure

#### Hardware

[To be specified]

#### Software

- PEFT 0.15.2
- Transformers library
- PyTorch

## Citation

**APA:**

Mark-CHAE. (2024). ViTCM_LLM: Traditional Chinese Medicine Diagnosis Model. Hugging Face. https://huggingface.co/Mark-CHAE/shezhen

## Model Card Contact

For questions about this model, please contact the model author.

### Framework versions

- PEFT 0.15.2