---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---

# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis

This model is a fine-tuned version of [microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned](https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned), optimized for chest X-ray report analysis and medical text understanding.

## Model Description

LLM2Vec4CXR is a bidirectional text encoder derived from the base decoder-only LLM and optimized for medical text embeddings. The model has been fully fine-tuned with a modified pooling strategy (`latent_attention`) to better capture semantic relationships in chest X-ray reports.

### Key Features

- **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
- **Pooling Mode**: Latent Attention (modified from original)
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16

## Training Details

### Training Data

- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings

### Training Configuration

- **Pooling Mode**: `latent_attention` (modified from base model)
- **Enable Bidirectional**: True
- **Max Length**: 512
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training

## Usage

### Installation

```bash
pip install torch transformers
# Also requires the LLM2Vec wrapper - see the original repository for installation
```

### Basic Usage

```python
import torch
import torch.nn.functional as F
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec

# Load the model
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    enable_bidirectional=True,
    pooling_mode="latent_attention",
    max_length=512,
    torch_dtype=torch.bfloat16,
)

# Configure tokenizer (left padding keeps the report tokens at the end of each row)
tokenizer = model.tokenizer
tokenizer.padding_side = 'left'

# Example usage for chest X-ray report analysis
def encode_text(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        embeddings = model(inputs)
    return embeddings

# Example with medical text
report = "There is a small increase in the left-sided effusion. There continues to be volume loss at both bases."
embedding = encode_text(report)
```

### Advanced Usage with Separator-based Processing

The model supports separator-based processing for instruction-following tasks. The text before the separator is the instruction, and the text after it is the report to embed:

```python
import torch

# Reuses `model` and `tokenizer` from Basic Usage (tokenizer.padding_side must be 'left').

def tokenize_with_separator(texts, tokenizer, max_length):
    """Tokenize texts with special handling for separator-based splitting."""
    texts_2 = []
    original_texts = []
    separator = '!@#$%^&*()'

    for text in texts:
        parts = text.split(separator)
        # Text after the separator is the part to embed; empty if no separator is present
        texts_2.append(parts[1] if len(parts) > 1 else "")
        # The model sees the full text with the separator removed
        original_texts.append("".join(parts))

    tokenized = tokenizer(
        original_texts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )

    # Create embedding masks for the separated parts
    masks = []
    for t_i, t in enumerate(texts_2):
        ids = tokenizer(
            [t],
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
            add_special_tokens=False,
        )
        n_tokens = len(ids["input_ids"][0])
        e_m = torch.zeros_like(tokenized["attention_mask"][t_i])
        if n_tokens > 0:
            # With left padding, the last n_tokens positions hold the separated part
            e_m[-n_tokens:] = 1
        masks.append(e_m)

    tokenized["embed_mask"] = torch.stack(masks, dim=0)
    return tokenized

# Example with instruction and report
separator = '!@#$%^&*()'
instruction = 'Determine the change or the status of the pleural effusion.'
report = 'There is a small increase in the left-sided effusion.'
text = instruction + separator + report

tokenized = tokenize_with_separator([text], tokenizer, 512)
with torch.no_grad():
    embedding = model(tokenized)
```

## Evaluation

The model has been evaluated on chest X-ray report analysis tasks, particularly:

- Pleural effusion status determination
- Medical text similarity comparison
- Clinical finding extraction

### Sample Performance

The model shows improved performance compared to the base model on medical text understanding tasks, particularly in distinguishing between different pleural effusion states and medical abbreviations.

## Intended Use

### Primary Use Cases

- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis

### Limitations

- Specialized for chest X-ray reports - may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis

## Technical Specifications

- **Model Type**: Bidirectional Language Model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2)
- **Parameters**: ~1B parameters
- **Input Length**: Up to 512 tokens
- **Output**: Dense embeddings
- **Precision**: bfloat16

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{llm2vec4cxr,
  title={LLM2Vec4CXR: Fine-tuned Language Model for Chest X-ray Report Analysis},
  author={[Your Name]},
  year={2024},
  howpublished={\url{https://huggingface.co/lukeingawesome/llm2vec4cxr}},
}
```

## Acknowledgments

This model is built upon:

- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models
- [microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned](https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned) - Base model

## License

This model is licensed under the MIT License.
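
## Example: Ranking Reports by Embedding Similarity

The clinical text similarity and retrieval use cases listed under Intended Use come down to comparing embedding vectors. The following is a minimal, dependency-free sketch of one common way to do that, cosine similarity followed by sorting. It is not part of this model's API: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the three-dimensional vectors are toy stand-ins for real embeddings. It assumes each embedding has already been converted to a plain list of floats (for example with something like `embedding[0].float().cpu().tolist()`, which in turn assumes the model returns a tensor of shape `(batch, dim)`).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: all-zero vectors score 0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real report embeddings (assumption, not model output).
query_vec = [1.0, 0.0, 1.0]
candidate_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different magnitude
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query_vec, candidate_vecs)
for index, score in ranking:
    print(f"candidate {index}: cosine similarity {score:.3f}")
```

Cosine similarity ignores vector magnitude, so the second candidate ranks first even though it is twice as long as the query. For large batches, the same computation is usually done on tensors directly rather than on Python lists.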