Hello world.
- LICENSE.md +36 -0
- README.md +248 -2
- adapter_config.json +26 -0
- adapter_model.bin +3 -0
- biomedclipcxr_518.json +17 -0
- biomedclipcxr_518_checkpoint.pt +3 -0
- config.json +40 -0
- non_lora_trainables.bin +3 -0
- results.jpg +0 -0
- results_top.jpg +0 -0
LICENSE.md
ADDED
@@ -0,0 +1,36 @@
MICROSOFT RESEARCH LICENSE TERMS

IF YOU LIVE IN THE UNITED STATES, PLEASE READ THE “BINDING ARBITRATION AND CLASS ACTION WAIVER” SECTION BELOW. IT AFFECTS HOW DISPUTES ARE RESOLVED.

These license terms are an agreement between you and Microsoft Corporation (or one of its affiliates). They apply to the source code, object code, machine learning models, or data (collectively “Materials”) that accompany this license. IF YOU COMPLY WITH THESE LICENSE TERMS, YOU HAVE THE RIGHTS BELOW. BY USING THE MATERIALS, YOU ACCEPT THESE TERMS.

1) INSTALLATION AND USE RIGHTS TO THE MATERIALS.

Subject to the terms of this agreement, you have the below rights, if applicable, to use the Materials solely for non-commercial, non-revenue generating, research purposes:

a) Source Code. If source code is included, you may use and modify the source code, but you may not distribute the source code.

b) Object Code. If object code is included, you may use the object code, but you may not distribute the object code.

c) Models. If machine learning model(s) are included, you may use the model(s), but you may not distribute the models.

d) Data. If data is included, you may use and modify the data, but your use and modification must be consistent with the consent under which the data was provided and/or gathered and you may not distribute the data or your modifications to the data.

2) SCOPE OF LICENSE. The Materials are licensed, not sold. Microsoft reserves all other rights. Unless applicable law gives you more rights despite this limitation, you will not (and have no right to):

a) work around any technical limitations in the Materials that only allow you to use it in certain ways;

b) reverse engineer, decompile or disassemble the Materials;

c) remove, minimize, block, or modify any notices of Microsoft or its suppliers in the Materials;

d) use the Materials in any way that is against the law or to create or propagate malware; or

e) share, publish, distribute or lend the Materials, provide the Materials as a stand-alone hosted solution for others to use, or transfer the Materials or this agreement to any third party.

3) PERSONAL DATA. If the data (set forth in Section 1(c) above) includes or is found to include any data that enables any ability to identify an individual (“Personal Data”), you will not use such Personal Data for any purpose other than was authorized and consented to by the data subject/research participant. You will not use Personal Data to contact any person. You will keep Personal Data in strict confidence. You will not share any Personal Data that is collected or in your possession with any third party for any reason and as required under the original consent agreement. Further, you will destroy the Personal Data and any backup or copies, immediately upon the completion of your research.

4) LICENSE TO MICROSOFT. Notwithstanding the limitations in Section 1, you may distribute your modifications back to Microsoft, and if you do provide Microsoft with modifications of the Materials, you hereby grant Microsoft, without any restrictions or limitations, a non-exclusive, perpetual, irrevocable, royalty-free, assignable and sub-licensable license, to reproduce, publicly perform or display, install, use, modify, post, distribute, make and have made, sell and transfer such modifications and derivatives for any purpose.

5) PUBLICATION. You may publish (or present papers or articles) on your results from using the Materials provided that no material or substantial portion of the Materials is included in any such publication or presentation.

6) FEEDBACK. Any feedback about the Materials provided by you to us is voluntarily given, and Microsoft shall be free to use the feedback as it sees fit without obligation or restriction of any kind, even if the feedback is designated by you as confidential. Such feedback shall be considered a contribution and licensed to Microsoft under the terms of Section 4 above.

7) COMPLIANCE WITH TRADE LAWS. You acknowledge that the Materials may be subject to applicable trade laws in one or more countries. You will comply with all relevant laws and regulations applicable to the import or export of the Materials, including but not limited to, trade laws such as the U.S. Export Administration Regulations or other end-user, end use, and destination restrictions by the U.S. and other governments, as well as sanctions regulations administered by the U.S. Office of Foreign Assets Control. Microsoft may suspend or terminate the agreement immediately to the extent that Microsoft reasonably concludes that continued performance would violate trade laws or put it at risk of becoming subject to sanctions or penalties under trade laws. For additional information, see www.microsoft.com/exporting.

8) SUPPORT SERVICES. Microsoft is not obligated under this agreement to provide any support services for the Materials. Any support provided is “as is”, “with all faults”, and without warranty of any kind.

9) BINDING ARBITRATION AND CLASS ACTION WAIVER. This Section applies if you live in (or, if a business, your principal place of business is in) the United States. If you and Microsoft have a dispute, you and Microsoft agree to try for 60 days to resolve it informally. If you and Microsoft can’t, you and Microsoft agree to binding individual arbitration before the American Arbitration Association under the Federal Arbitration Act (“FAA”), and not to sue in court in front of a judge or jury. Instead, a neutral arbitrator will decide. Class action lawsuits, class-wide arbitrations, private attorney-general actions, and any other proceeding where someone acts in a representative capacity are not allowed; nor is combining individual proceedings without the consent of all parties. The complete Arbitration Agreement contains more terms and is at aka.ms/arb-agreement-1. You and Microsoft agree to these terms.

10) ENTIRE AGREEMENT. This agreement, and any other terms Microsoft may provide for supplements, updates, or third-party applications, is the entire agreement for the Materials.

11) APPLICABLE LAW AND PLACE TO RESOLVE DISPUTES. If you acquired the Materials in the United States or Canada, the laws of the state or province where you live (or, if a business, where your principal place of business is located) govern the interpretation of this agreement, claims for its breach, and all other claims (including consumer protection, unfair competition, and tort claims), regardless of conflict of laws principles, except that the FAA governs everything related to arbitration. If you acquired the Materials in any other country, its laws apply, except that the FAA governs everything related to arbitration. If U.S. federal jurisdiction exists, you and Microsoft consent to exclusive jurisdiction and venue in the federal court in King County, Washington for all disputes heard in court (excluding arbitration). If not, you and Microsoft consent to exclusive jurisdiction and venue in the Superior Court of King County, Washington for all disputes heard in court (excluding arbitration).

12) CONSUMER RIGHTS; REGIONAL VARIATIONS. This agreement describes certain legal rights. You may have other rights, including consumer rights, under the laws of your state, province, or country. Separate and apart from your relationship with Microsoft, you may also have rights with respect to the party from which you acquired the Materials. This agreement does not change those other rights if the laws of your state, province, or country do not permit it to do so. For example, if you acquired the Materials in one of the below regions, or mandatory country law applies, then the following provisions apply to you:

a) Australia. You have statutory guarantees under the Australian Consumer Law and nothing in this agreement is intended to affect those rights.

b) Canada. If you acquired this software in Canada, you may stop receiving updates by turning off the automatic update feature, disconnecting your device from the Internet (if and when you re-connect to the Internet, however, the Materials will resume checking for and installing updates), or uninstalling the Materials. The product documentation, if any, may also specify how to turn off updates for your specific device or software.

c) Germany and Austria.

i. Warranty. The properly licensed software will perform substantially as described in any Microsoft materials that accompany the Materials. However, Microsoft gives no contractual guarantee in relation to the licensed software.

ii. Limitation of Liability. In case of intentional conduct, gross negligence, claims based on the Product Liability Act, as well as, in case of death or personal or physical injury, Microsoft is liable according to the statutory law.

Subject to the foregoing clause (ii), Microsoft will only be liable for slight negligence if Microsoft is in breach of such material contractual obligations, the fulfillment of which facilitate the due performance of this agreement, the breach of which would endanger the purpose of this agreement and the compliance with which a party may constantly trust in (so-called "cardinal obligations"). In other cases of slight negligence, Microsoft will not be liable for slight negligence.

13) DISCLAIMER OF WARRANTY. THE MATERIALS ARE LICENSED “AS IS.” YOU BEAR THE RISK OF USING THEM. MICROSOFT GIVES NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. TO THE EXTENT PERMITTED UNDER APPLICABLE LAWS, MICROSOFT EXCLUDES ALL IMPLIED WARRANTIES, INCLUDING MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.

14) LIMITATION ON AND EXCLUSION OF DAMAGES. IF YOU HAVE ANY BASIS FOR RECOVERING DAMAGES DESPITE THE PRECEDING DISCLAIMER OF WARRANTY, YOU CAN RECOVER FROM MICROSOFT AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO U.S. $5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.

This limitation applies to (a) anything related to the Materials, services, content (including code) on third party Internet sites, or third party applications; and (b) claims for breach of contract, warranty, guarantee, or condition; strict liability, negligence, or other tort; or any other claim; in each case to the extent permitted by applicable law.

It also applies even if Microsoft knew or should have known about the possibility of the damages. The above limitation or exclusion may not apply to you because your state, province, or country may not allow the exclusion or limitation of incidental, consequential, or other damages.
README.md
CHANGED
@@ -1,5 +1,251 @@
 ---
 license: other
-license_name: microsoft-research
-license_link: LICENSE
 ---

# LLaVA-Rad

This is the official model checkpoint for **LLaVA-Rad**, described in “A Clinically Accessible Small Multimodal Radiology Model and Evaluation Metric for Chest X-Ray Findings”.

| | |
|:---------|:--------|
| **Developed by** | Microsoft Research |
| **Model type** | [small multimodal transformer model](#model-architecture) |
| **Languages** | English |
| **License** | [Microsoft Research License](LICENSE) |
| **Data** | [PhysioNet](https://physionet.org/content/llava-rad-mimic-cxr-annotation/1.0.0/) |
| **Code** | [GitHub](https://github.com/microsoft/llava-rad) |
| **Evaluation** | [CheXprompt](https://github.com/microsoft/chexprompt) |
| **Preprint** | [arXiv:2403.08002](https://arxiv.org/abs/2403.08002) |
| **Peer Reviewed Paper** | []() |

LLaVA-Rad is a 7-billion-parameter small multimodal model trained to produce findings for an input chest X-ray. Its architecture follows that of [LLaVA](https://arxiv.org/abs/2310.03744) and [LLaVA-Med](https://arxiv.org/abs/2306.00890), differing in its use of a specialized chest X-ray image encoder, BiomedCLIP-CXR, built with the [BiomedCLIP](https://arxiv.org/abs/2303.00915) framework. LLaVA-Rad offers outstanding performance at a relatively small model size.

<p align="center">
  <img src="results_top.jpg" alt="Evaluation Overview" width="400" height="400">
</p>

## Contents

- [Usage](#usage)
  - [Installation](#installation)
  - [Example inference](#example-inference)
  - [Usage and License Notices](#usage-and-license-notices)
  - [Ethical Considerations and Limitations](#ethical-considerations-and-limitations)
- [LLaVA-Rad Basics](#llava-rad-basics)
  - [Abstract](#abstract)
  - [Model Architecture](#model-architecture)
  - [Evaluation Results](#evaluation-results)
  - [Citation](#citation)
- [Further Information](#further-information)
  - [Data Specification](#data-specification)
  - [Related Models](#related-models)

## Usage

### Installation

Follow these steps to set up LLaVA-Rad:

1. Clone the repository and navigate to the project folder:
```Shell
git clone https://github.com/microsoft/llava-rad.git
cd llava-rad
```
2. Create and activate a virtual environment (Python 3.10):
```Shell
conda create -n llavarad python=3.10 -y
conda activate llavarad
```
3. Upgrade pip and install the package:
```Shell
pip install --upgrade pip
pip install -e .
```

### Example inference

```python
import requests
import torch
from PIL import Image
from io import BytesIO

from llava.constants import IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.model.builder import load_pretrained_model
from llava.utils import disable_torch_init
from llava.mm_utils import tokenizer_image_token, KeywordsStoppingCriteria


def load_image(image_file):
    if image_file.startswith(('http://', 'https://')):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(image_file).convert('RGB')
    return image

# Model
disable_torch_init()

model_path = "microsoft/llava-rad"
model_base = "lmsys/vicuna-7b-v1.5"
model_name = "llavarad"
conv_mode = "v1"

tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, model_base, model_name)

# Prepare query
image_file = "https://openi.nlm.nih.gov/imgs/512/253/253/CXR253_IM-1045-1001.png"  # CXR with pneumothorax from Open-I
query = "<image>\nDescribe the findings of the chest x-ray.\n"

conv = conv_templates[conv_mode].copy()
conv.append_message(conv.roles[0], query)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

image = load_image(image_file)
image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0].half().unsqueeze(0).cuda()

input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

stopping_criteria = KeywordsStoppingCriteria(["</s>"], tokenizer, input_ids)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=False,
        temperature=0.0,
        max_new_tokens=1024,
        use_cache=True,
        stopping_criteria=[stopping_criteria])

outputs = tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
outputs = outputs.strip()
print(outputs)
# Large left pneumothorax is present with apical pneumothorax component
# measuring approximately 3.4 cm in craniocaudal dimension, and a basilar
# component overlying the left hemidiaphragm, with visceral pleural line just
# below the left seventh posterior rib. Cardiomediastinal contours are normal.
# The lungs are clear. No pleural effusion.
```
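
For repeated use, the same steps can be wrapped in a helper function. This is a minimal sketch that reuses the objects created in the example above (`tokenizer`, `model`, `image_processor`, `load_image`); the function name is ours, not part of the llava-rad API:

```python
# Sketch: reusable wrapper around the inference steps above. Assumes the
# objects from the example (tokenizer, model, image_processor, etc.) are in scope.
def generate_findings(image_file: str, conv_mode: str = "v1") -> str:
    conv = conv_templates[conv_mode].copy()
    conv.append_message(conv.roles[0], "<image>\nDescribe the findings of the chest x-ray.\n")
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

    image = load_image(image_file)
    image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0].half().unsqueeze(0).cuda()
    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensor,
            do_sample=False,
            max_new_tokens=1024,
            use_cache=True,
            stopping_criteria=[KeywordsStoppingCriteria(["</s>"], tokenizer, input_ids)])
    return tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0].strip()
```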

### Usage and License Notices

The model’s intended use is to generate draft findings for chest X-ray images in English. It is provided for reproducibility and to enable further research.

**The data, code, and model checkpoints are licensed and intended for research use only.** The code and model checkpoints are subject to additional restrictions as determined by the Terms of Use of LLaMA, Vicuna, and GPT-4, respectively. Code and model checkpoints may be used for **research purposes only** and should **not** be used in direct clinical care or for any clinical decision-making purpose. You bear sole responsibility for any use of the code and model checkpoints, including their incorporation into any product used for clinical purposes.

### Ethical Considerations and Limitations

Microsoft believes Responsible AI is a shared responsibility, and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When the model is downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure it meets the requirements of the relevant use case and addresses unforeseen product misuse.

When testing the model with images and/or text, ensure that the data is free of PHI and contains no patient information or information that could be traced back to a patient's identity.

#### Out-of-scope use
The model is NOT designed for the following use cases:
* Use by clinicians to inform clinical decision-making, as a diagnostic tool, or as a medical device. Although LLaVA-Rad attains state-of-the-art performance, it is not designed or intended to be deployed in clinical settings as-is, nor is it for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute for the professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.
* Scenarios without consent for data. Any scenario that uses health data for a purpose for which consent was not obtained.
* Use outside of health scenarios. Any scenario that uses non-medical images and/or serves purposes outside of the healthcare domain.

Please see Microsoft's Responsible AI Principles and approach, available at https://www.microsoft.com/en-us/ai/principles-and-approach/

#### Data provenance and potential biases
BiomedCLIP-CXR was trained on datasets containing chest X-ray images acquired from adult patients in the United States (Stanford Hospital in CA, Beth Israel Deaconess Medical Center in MA), New Zealand, Brazil, Vietnam, Spain, and China. Synthetic free-text reports were generated for datasets without readily available corresponding reports, or translated into English from other languages such as Spanish.

The projector and decoder layers of LLaVA-Rad were trained using MIMIC-CXR data, which contains English-only reports for patients from Boston, MA. Please see [our manuscript](https://arxiv.org/abs/2403.08002v5) for references to publications that describe in detail the demographic characteristics of the patients from whom the chest X-ray images were obtained.

Radiology reporting styles can vary within and across health systems and regions. This may limit the generalizability of the model to unseen populations, so the model should always be tested on the intended deployment population.

This model may produce errors in the generated findings (see the CheXprompt evaluation and other results in the [manuscript](https://arxiv.org/pdf/2403.08002) for details).

## LLaVA-Rad Basics

### Abstract
The scaling laws and extraordinary performance of large foundation models motivate the development and use of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, major challenges remain before these models can be used in real-world clinics.
This work aims to address these challenges by training small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach, incorporating state-of-the-art pre-trained models for the image and text modalities and focusing on training a lightweight adapter to ground each modality in the text embedding space. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
The LLaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, outperforming even much larger models such as GPT-4V and Med-PaLM M (84B). Moreover, LLaVA-Rad can be trained on over 697 thousand image-text pairs in one day on a standard cluster of eight A100 GPUs, allowing further fine-tuning by clinicians on their own data. LLaVA-Rad inference is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.

### Model Architecture
LLaVA-Rad follows the LLaVA v1.5 architecture, which employs an image encoder, a transformer-based language decoder, and a multilayer perceptron connector. For the image encoder, it uses a custom model named BiomedCLIP-CXR, based on [BiomedCLIP](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224). For the language decoder it uses [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5).
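
For intuition, the sketch below shows what the `mlp2x_gelu` connector amounts to, using the `mm_hidden_size` (768) and `hidden_size` (4096) values from this repository's `config.json`. It is a minimal illustration rather than the llava-rad implementation, and the 37×37 patch grid is an assumption derived from the 518-pixel input and 14-pixel patches:

```python
# Minimal sketch of an "mlp2x_gelu" connector: a two-layer GELU MLP that maps
# image-encoder patch features into the language model's embedding space.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    def __init__(self, mm_hidden_size: int = 768, hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(mm_hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, mm_hidden_size)
        return self.proj(image_features)

# A 518x518 image with 14-pixel patches yields a 37x37 grid of patch tokens.
features = torch.randn(1, 37 * 37, 768)
print(MLPProjector()(features).shape)  # torch.Size([1, 1369, 4096])
```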

### Evaluation Results

![Evaluation Results](results.jpg)

To reproduce the results above, please follow the steps outlined in the official code implementation: [setup](https://github.com/microsoft/llava-rad#requirements), [inference and eval](https://github.com/microsoft/llava-rad#inference).

Please see the paper for detailed information about methods and results: https://arxiv.org/pdf/2412.10337.

You may also find the CheXprompt library useful for evaluating generated reports. It is available at https://github.com/microsoft/chexprompt.

To use this model to reproduce the results in the manuscript, follow these steps using the [supported code](https://github.com/microsoft/llava-rad):

1. Preparation

   Before running the commands below, you need to have the data, the image folder, and the above checkpoints ready.

   1. Text Data

      To download the data, sign the data use agreement and follow the download instructions for LLaVA-Rad MIMIC-CXR Annotations on PhysioNet. This includes reports with extracted sections in LLaVA format, split into train/dev/test.

   2. Images

      You need to download the MIMIC-CXR-JPG images from PhysioNet by signing the data use agreement and following the instructions.

   3. Model weights

      You can find the pretrained model weights for BiomedCLIP-CXR and LLaVA-Rad at https://huggingface.co/microsoft/llava-rad (this repository); a download sketch follows this list.

   Before proceeding, change the paths in the scripts below to wherever you downloaded the data.
   The batch size is set for 4-GPU machines. If your machine has a different number of GPUs, please adjust the batch size accordingly.

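If it helps, the checkpoint files in this repository can also be fetched programmatically. This is a minimal sketch using the `huggingface_hub` library, not part of the official instructions:

```python
# Minimal sketch: download this repository's checkpoint files with huggingface_hub.
# Assumes `pip install huggingface_hub`; the official scripts may expect different paths.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("microsoft/llava-rad")
print(local_dir)  # folder containing adapter_model.bin, biomedclipcxr_518_checkpoint.pt, etc.
```
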
2. Inference

   To perform inference, update the paths in the evaluation script found in the [official codebase](https://github.com/microsoft/llava-rad), then run:

   ```Shell
   bash scripts/eval.sh
   ```

3. Evaluation

   If you ran inference on multiple GPUs and have a resulting set of prediction chunks, make sure you concatenate the chunks into a single file (see the sketch below) before running:

   ```Shell
   cd llava/eval/rr_eval
   python run.py ${YOUR_PREDICTION_FILE}
   ```

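A minimal sketch of the concatenation step, assuming the chunks are JSONL files; the `results/test-answer-file-*.jsonl` naming is hypothetical, so adapt it to the output of your eval script:

```python
# Minimal sketch: merge per-GPU prediction chunks into one JSONL file.
# The chunk naming pattern below is an assumption; adapt it to your setup.
from pathlib import Path

chunks = sorted(Path("results").glob("test-answer-file-*.jsonl"))
with open("results/test-answer-file.jsonl", "w") as merged:
    for chunk in chunks:
        for line in chunk.read_text().splitlines():
            merged.write(line + "\n")
```
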
### Citation
Please cite our paper if you use the code, model, or data.

```
Zambrano Chaves JM, Huang S-C, Xu Y, Xu H, Usuyama N, Zhang S, et al. Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation. arXiv preprint arXiv:2403.08002. 2024.
```

BibTeX:
```bibtex
@article{zambranochaves2024llavarad,
  title   = {Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation},
  author  = {Zambrano Chaves, JM and Huang, S-C and Xu, Y and Xu, H and Usuyama, N and Zhang, S and others},
  journal = {arXiv preprint arXiv:2403.08002},
  year    = {2024},
  url     = {https://arxiv.org/pdf/2403.08002}
}
```

## Further Information

### Data Specification
Additional details regarding the data used can be found in the corresponding data release, LLaVA-Rad MIMIC-CXR Annotations: https://physionet.org/content/llava-rad-mimic-cxr-annotation/1.0.0/

### Related Models
CxrReportGen (https://aka.ms/CXRReportGenModelCard) offers similar functionality based on the [MAIRA-2](https://arxiv.org/abs/2406.04449) framework.
adapter_config.json
ADDED
@@ -0,0 +1,26 @@
{
  "auto_mapping": null,
  "base_model_name_or_path": "lmsys/vicuna-7b-v1.5",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 128,
  "lora_dropout": 0.05,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "revision": null,
  "target_modules": [
    "k_proj",
    "down_proj",
    "gate_proj",
    "o_proj",
    "v_proj",
    "q_proj",
    "up_proj"
  ],
  "task_type": "CAUSAL_LM"
}
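The file above is a standard PEFT LoRA configuration (rank 64, alpha 128, applied to the attention and MLP projections of Vicuna-7B). As a rough sketch, such an adapter could be attached with the `peft` library as shown below; this is illustrative only, since the official `llava.model.builder.load_pretrained_model` additionally restores the vision tower, projector, and the weights in `non_lora_trainables.bin`:

```python
# Rough sketch: attach the LoRA adapter in this repo to the Vicuna base model
# with the peft library. Illustrative only; use load_pretrained_model for real use.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
model = PeftModel.from_pretrained(base, "microsoft/llava-rad")  # reads adapter_config.json / adapter_model.bin
model = model.merge_and_unload()  # optionally fold the LoRA deltas into the base weights
```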
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b8d0649a878cff912d0347775d047fec1215e22c1aed740454d2f32165331103
size 319971402
biomedclipcxr_518.json
ADDED
@@ -0,0 +1,17 @@
{
  "embed_dim": 512,
  "vision_cfg": {
    "timm_model_name": "vit_base_patch14_dinov2.lvd142m",
    "timm_model_pretrained": false,
    "timm_pool": "",
    "timm_proj": "linear",
    "image_size": 518
  },
  "text_cfg": {
    "hf_model_name": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
    "hf_tokenizer_name": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
    "hf_proj_type": "mlp",
    "hf_pooler_type": "cls_last_hidden_state_pooler",
    "context_length": 256
  }
}
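This is an OpenCLIP model config pairing a DINOv2-initialized ViT-B/14 at 518×518 input resolution with a PubMedBERT text tower. A minimal sketch of loading it with the `open_clip` library follows; the key layout of the checkpoint file is an assumption, and the supported loading path is the llava-rad open_clip encoder wrapper referenced in `config.json`:

```python
# Minimal sketch: build BiomedCLIP-CXR from the config and checkpoint in this repo
# using open_clip. The checkpoint's key layout is an assumption; the llava-rad
# open_clip encoder wrapper is the supported loading path.
import torch
import open_clip

open_clip.add_model_config("biomedclipcxr_518.json")  # registers "biomedclipcxr_518"
model = open_clip.create_model("biomedclipcxr_518")

state = torch.load("biomedclipcxr_518_checkpoint.pt", map_location="cpu")
state = state.get("state_dict", state)  # unwrap if saved as {"state_dict": ...}
model.load_state_dict(state, strict=False)
model.eval()
```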
biomedclipcxr_518_checkpoint.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:698b1afcaf07365ec60bcaba3f1e96f556f5c40e5f405db424bb4a9221144319
size 2363054683
config.json
ADDED
@@ -0,0 +1,40 @@
{
  "_name_or_path": "lmsys/vicuna-7b-v1.5",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "freeze_mm_mlp_adapter": false,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "image_aspect_ratio": "square",
  "image_grid_pinpoints": null,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "mm_hidden_size": 768,
  "mm_projector_type": "mlp2x_gelu",
  "mm_use_im_patch_token": false,
  "mm_use_im_start_end": false,
  "mm_vision_select_feature": "patch",
  "mm_vision_select_layer": -2,
  "mm_vision_tower": "biomedclip_cxr_518",
  "mm_vision_tower_checkpoint": "biomedclipcxr_518_checkpoint.pt",
  "mm_vision_tower_config": "llava/model/multimodal_encoder/open_clip_encoder/model_configs/biomedclipcxr_518.json",
  "model_type": "llava",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0",
  "tune_mm_mlp_adapter": false,
  "use_cache": true,
  "use_mm_proj": true,
  "vocab_size": 32000
}
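For orientation, `config.json` is what ties the pieces of this repository together: `mm_vision_tower_config` and `mm_vision_tower_checkpoint` point at the BiomedCLIP-CXR files above, and `mm_hidden_size`/`mm_projector_type` describe the connector. A small sanity-check sketch (file paths assume the repository root as the working directory):

```python
# Sketch: cross-check that config.json agrees with the vision tower config.
import json

with open("config.json") as f:
    cfg = json.load(f)
with open("biomedclipcxr_518.json") as f:
    vision_cfg = json.load(f)

# The projector's input width must match the ViT-B feature width (768),
# and the vision tower runs at 518x518 input resolution.
assert cfg["mm_hidden_size"] == 768
assert vision_cfg["vision_cfg"]["image_size"] == 518
print(cfg["mm_vision_tower"], cfg["mm_vision_tower_checkpoint"])
```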
non_lora_trainables.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b87a97a84e803896db2720857d71f1a782f4f108c9f2d5eba06cb1ba660f9f8
size 39864496
results.jpg
ADDED
results_top.jpg
ADDED