--- title: "LogSAD: Training-free Anomaly Detection with Vision and Language Foundation Models" tags: - computer-vision - anomaly-detection - foundation-models - zero-shot - training-free license: mit library_name: pytorch pipeline_tag: zero-shot-object-detection arxiv: 2503.18325 --- # Towards Training-free Anomaly Detection with Vision and Language Foundation Models (CVPR 2025)
## System Requirements **Hardware Requirements:** - **GPU Memory:** 32GB VRAM (for running complete experiments) - **Storage:** 70GB free disk space (for models, datasets, and results) - **CUDA:** Compatible GPU with CUDA 12.1 support **Software Requirements:** - Python 3.10 - Conda (recommended for environment management) - CUDA 12.1 runtime > **Note:** The memory and storage requirements are for running the full experimental pipeline on all categories with visualization enabled. Smaller experiments on individual categories may require less resources. ## Installation ### Automated Setup (Recommended) Run the setup script to automatically configure the complete environment: ```bash bash scripts/setup_environment.sh ``` This script will: - Create a conda environment named `logsad` with Python 3.10 - Install PyTorch with CUDA 12.1 support - Install all required dependencies from `requirements.txt` - Configure numpy compatibility ### Manual Setup If you prefer manual setup, download the checkpoint for [ViT-H SAM model](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) and put it in the checkpoint folder. After installation, activate the environment: ```bash conda activate logsad ``` ## Instructions for MVTEC LOCO dataset ### Quick Start (Recommended) Run evaluation for all categories using the provided shell scripts: **Few-shot Protocol:** ```bash bash scripts/run_few_shot.sh ``` **Full-data Protocol:** ```bash bash scripts/run_full_data.sh ``` ### Manual Execution #### Few-shot Protocol Run the script for few-shot protocal: ``` python evaluation.py --module_path model_ensemble_few_shot --category CATEGORY --dataset_path DATASET_PATH ``` #### Full-data Protocol Run the script to compute coreset for full-data scenarios: ``` python compute_coreset.py --module_path model_ensemble --category CATEGORY --dataset_path DATASET_PATH ``` Run the script for full-data protocol: ``` python evaluation.py --module_path model_ensemble --category CATEGORY --dataset_path DATASET_PATH ``` **Available categories:** breakfast_box, juice_bottle, pushpins, screw_bag, splicing_connectors ## Acknowledgement We are grateful for the following awesome projects when implementing LogSAD: * [SAM](https://github.com/facebookresearch/segment-anything), [OpenCLIP](https://github.com/mlfoundations/open_clip), [DINOv2](https://github.com/facebookresearch/dinov2) and [NACLIP](https://github.com/sinahmr/NACLIP). ## Citation If you find our paper is helpful in your research or applications, generously cite with ``` @inproceedings{zhang2025logsad, title={Towards Training-free Anomaly Detection with Vision and Language Foundation Models}, author={Jinjin Zhang, Guodong Wang, Yizhou Jin, Di Huang}, year={2025}, booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, } ```