---
title: "LogSAD: Training-free Anomaly Detection with Vision and Language Foundation Models"
tags:
  - computer-vision
  - anomaly-detection
  - foundation-models
  - zero-shot
  - training-free
license: mit
library_name: pytorch
pipeline_tag: zero-shot-object-detection
arxiv: 2503.18325
---

# Towards Training-free Anomaly Detection with Vision and Language Foundation Models (CVPR 2025)

<div>
  <a href="https://arxiv.org/abs/2503.18325"><img src="https://img.shields.io/static/v1?label=Arxiv&message=LogSAD&color=red&logo=arxiv"></a> &ensp;
</div>

## System Requirements

**Hardware Requirements:**
- **GPU Memory:** 32GB VRAM (for running complete experiments)
- **Storage:** 70GB free disk space (for models, datasets, and results)
- **CUDA:** Compatible GPU with CUDA 12.1 support

**Software Requirements:**
- Python 3.10
- Conda (recommended for environment management)
- CUDA 12.1 runtime

> **Note:** The memory and storage requirements are for running the full experimental pipeline on all categories with visualization enabled. Smaller experiments on individual categories may require less resources.

## Installation

### Automated Setup (Recommended)

Run the setup script to automatically configure the complete environment:

```bash
bash scripts/setup_environment.sh
```

This script will:
- Create a conda environment named `logsad` with Python 3.10
- Install PyTorch with CUDA 12.1 support
- Install all required dependencies from `requirements.txt`
- Configure numpy compatibility

### Manual Setup

If you prefer manual setup, download the checkpoint for [ViT-H SAM model](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) and put it in the checkpoint folder.

After installation, activate the environment:
```bash
conda activate logsad
```


## Instructions for MVTEC LOCO dataset

### Quick Start (Recommended)

Run evaluation for all categories using the provided shell scripts:

**Few-shot Protocol:**
```bash
bash scripts/run_few_shot.sh
```

**Full-data Protocol:**
```bash
bash scripts/run_full_data.sh
```

### Manual Execution

#### Few-shot Protocol
Run the script for few-shot protocal:

```
python evaluation.py --module_path model_ensemble_few_shot --category CATEGORY  --dataset_path DATASET_PATH
```

#### Full-data Protocol
Run the script to compute coreset for full-data scenarios:

```
python compute_coreset.py --module_path model_ensemble --category CATEGORY  --dataset_path DATASET_PATH
```

Run the script for full-data protocol:

```
python evaluation.py --module_path model_ensemble --category CATEGORY  --dataset_path DATASET_PATH
```

**Available categories:** breakfast_box, juice_bottle, pushpins, screw_bag, splicing_connectors


## Acknowledgement
We are grateful for the following awesome projects when implementing LogSAD:
* [SAM](https://github.com/facebookresearch/segment-anything), [OpenCLIP](https://github.com/mlfoundations/open_clip), [DINOv2](https://github.com/facebookresearch/dinov2) and [NACLIP](https://github.com/sinahmr/NACLIP).


## Citation
If you find our paper is helpful in your research or applications, generously cite with
```
@inproceedings{zhang2025logsad,
      title={Towards Training-free Anomaly Detection with Vision and Language Foundation Models},
      author={Jinjin Zhang, Guodong Wang, Yizhou Jin, Di Huang},
      year={2025},
      booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    }
```