---
license: apache-2.0
tags:
- multimodal
- vision-language
- video understanding
- spatial reasoning
- visuospatial cognition
- llava
- qwen
- llava-video
datasets:
- nkkbr/ViCA-322K
- nkkbr/ViCA-thinking-2.68k
language:
- en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA-ScanNetPP-7B
base_model: lmms-lab/LLaVA-Video-7B-Qwen2
---

## Usage and Full Documentation

For the detailed model description, training setup, datasets, evaluation results, and inference code, **please refer to the main ViCA-7B README**:

[**nkkbr/ViCA**](https://huggingface.co/nkkbr/ViCA)
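
As a quick orientation, the sketch below shows how a checkpoint like this one is typically loaded. It assumes the LLaVA-NeXT (`llava`) codebase used by the base model `lmms-lab/LLaVA-Video-7B-Qwen2`; the dtype and device settings are illustrative, and the complete video inference example is in the main ViCA-7B README linked above.

```python
# Minimal loading sketch (assumption: same loading path as the base model
# lmms-lab/LLaVA-Video-7B-Qwen2 via the LLaVA-NeXT "llava" package).
# See the main ViCA-7B README for the full video inference example.
from llava.model.builder import load_pretrained_model

pretrained = "nkkbr/ViCA-ScanNetPP-7B"  # this repository
model_name = "llava_qwen"               # template name used by the Qwen2-based LLaVA-Video models

tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,
    None,
    model_name,
    torch_dtype="bfloat16",
    device_map="auto",
)
model.eval()
```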