---
license: gpl-3.0
language:
- en
base_model:
- lmms-lab/llava-onevision-qwen2-7b-ov
pipeline_tag: image-text-to-text
tags:
- radioastronomy
---

# radiollava-7b-qa

https://arxiv.org/abs/2503.23859

radiollava is a domain-specialized vision-language AI assistant tailored for research in radio astronomy, in particular for running radio source analysis tasks on radio-continuum images. It was trained on ~1.5M user-assistant conversations covering ~55k radio images taken from various radio surveys, including ASKAP-EMU, MeerKAT SMGPS and VLA FIRST.

## Model Details

- **Base Architecture**: llava-onevision
- **Base Model**: llava-onevision-qwen2-7b-ov
- **Parameters**: 7 billion
- **Domain**: Radio Astronomy
- **License**: GPL 3.0
- **Development Process**: Supervised Fine-tuning (SFT) on QA pairs

## Using the model

To use this model, you need to install LLaVA-NeXT as described in this repository: `https://github.com/LLaVA-VL/LLaVA-NeXT`

Note that LLaVA-NeXT requires an older version of the `transformers` library (v4.40.0).

To load the model:

```python
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, max_length = load_pretrained_model(
    "inaf-oact-ai/radiollava-7b-qa",  # model path (Hugging Face Hub ID)
    None,                             # model base
    "llava_qwen",                     # model name
    device_map="auto"
)
```

To run model inference on an input image:

```python
import copy
import torch
from PIL import Image

from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

# - Load model
tokenizer, model, image_processor, max_length = load_pretrained_model(
    "inaf-oact-ai/radiollava-7b-qa",
    None,
    "llava_qwen",
    device_map="auto"
)

# - Load image
# `data` is expected to be a 2D uint8 numpy array (e.g. read from a FITS cutout and rescaled)
image_path = ...
data = ...
image = Image.fromarray(data).convert("RGB")

# - Process image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [_image.to(dtype=torch.float16, device=model.device) for _image in image_tensor]

# - Create prompt
query = "Describe the input image"  # Replace it with your query
conv_template = "qwen_1_5"  # chat template used by llava-onevision-qwen2 models
question = DEFAULT_IMAGE_TOKEN + "\n" + query
conv = copy.deepcopy(conv_templates[conv_template])
conv.system = '<|im_start|>system\nYou are an AI assistant specialized in radio astronomical topics.'
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()

# - Create model inputs
input_ids = tokenizer_image_token(
    prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)
image_sizes = [image.size]

# - Generate model response
# Change generation parameters as you wish
do_sample = True
temperature = 0.3
max_new_tokens = 4096

output = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=do_sample,
    temperature=temperature if do_sample else None,
    max_new_tokens=max_new_tokens,
)

output_parsed = tokenizer.decode(
    output[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

# - Process response as you wish
#response = output_parsed.strip("\n").strip()
```

See the tutorials available in the LLaVA-NeXT repository: `https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb`

Further usage examples are provided in this repository: `https://github.com/SKA-INAF/radio-llava.git`
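
Radio-continuum cutouts are usually stored as FITS files rather than standard image formats. The snippet below is a minimal, illustrative sketch (not taken from the radiollava or radio-llava repositories) of how a 2D FITS image could be converted into the 8-bit RGB `PIL` image expected by the inference example above, using `astropy` and a simple min-max stretch. The `fits_to_rgb` helper and the `cutout.fits` file name are hypothetical placeholders.

```python
import numpy as np
from astropy.io import fits
from PIL import Image

def fits_to_rgb(path):
    """Read a 2D FITS image and convert it to an 8-bit RGB PIL image (hypothetical helper)."""
    data = np.squeeze(fits.getdata(path)).astype(np.float32)  # drop degenerate axes
    data = np.nan_to_num(data)                                # replace NaNs/infs with finite values
    dmin, dmax = data.min(), data.max()
    if dmax > dmin:
        data = (data - dmin) / (dmax - dmin)                  # min-max stretch to [0, 1]
    return Image.fromarray((255.0 * data).astype(np.uint8)).convert("RGB")

image = fits_to_rgb("cutout.fits")  # placeholder file name
```

Other stretches (e.g. zscale or percentile clipping) may be more appropriate depending on the dynamic range of the cutout.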
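
For convenience, the preprocessing, prompting, and generation steps shown above can be wrapped into a single helper function. This is only a sketch under the same assumptions as the example above (LLaVA-NeXT installed, the `qwen_1_5` chat template); the `run_query` function is hypothetical and not an official API of the radio-llava repository.

```python
import copy
import torch
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

SYSTEM_PROMPT = "<|im_start|>system\nYou are an AI assistant specialized in radio astronomical topics."

def run_query(model, tokenizer, image_processor, image, query,
              conv_template="qwen_1_5", temperature=0.3, max_new_tokens=4096):
    """Run a single image-text query and return the decoded response (hypothetical helper)."""
    # - Preprocess the image
    image_tensor = process_images([image], image_processor, model.config)
    image_tensor = [t.to(dtype=torch.float16, device=model.device) for t in image_tensor]

    # - Build the prompt with the image token and the chat template
    conv = copy.deepcopy(conv_templates[conv_template])
    conv.system = SYSTEM_PROMPT
    conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\n" + query)
    conv.append_message(conv.roles[1], None)
    input_ids = tokenizer_image_token(
        conv.get_prompt(), tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
    ).unsqueeze(0).to(model.device)

    # - Generate and decode
    output = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True).strip()

# Example usage (model loaded as shown above):
# answer = run_query(model, tokenizer, image_processor, image, "Describe the input image")
```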