---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception

**ViStream** is an energy-efficient framework for Visual Streaming Perception (VSP) built on Spiking Neural Networks (SNNs) that exhibit the Law of Charge Conservation (LoCC) property.

## Model Details

### Model Description

- **Developed by:** Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- **Model type:** Spiking Neural Network for Visual Streaming Perception
- **Language(s):** Not applicable (vision model; implemented in PyTorch)
- **License:** CC-BY-4.0
- **Paper:** [CVPR 2025](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf)
- **Repository:** [GitHub](https://github.com/Intelligent-Computing-Research-Group/ViStream)

### Model Architecture

ViStream introduces two key innovations:
1. **Law of Charge Conservation (LoCC)** property in ST-BIF neurons
2. **Differential Encoding (DiffEncode)** scheme for temporal optimization

The framework substantially reduces computation while maintaining accuracy equivalent to that of its artificial neural network (ANN) counterparts.
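
As a rough illustration of the two ideas above, the toy below uses a generic bipolar integrate-and-fire neuron with subtractive reset. It is not the ST-BIF neuron or the DiffEncode scheme from the paper; the function names, the threshold of 1.0, and the scalar "frames" are assumptions chosen for readability. It only shows that such a neuron conserves charge (input charge equals emitted spike charge plus the change in membrane potential), and that feeding frame-to-frame differences into a neuron that keeps its state can therefore update its accumulated output without replaying the whole frame.

```python
THRESHOLD = 1.0  # assumed firing threshold for this toy example


def step_if_neuron(v, x, threshold=THRESHOLD):
    """One time step of a bipolar IF neuron with subtractive (soft) reset.

    Returns (new_membrane_potential, spike), where spike is -1, 0, or +1.
    """
    v = v + x
    if v >= threshold:
        spike = 1
    elif v < 0:
        spike = -1
    else:
        spike = 0
    return v - spike * threshold, spike


def run_if_neuron(inputs, v_init=0.0, threshold=THRESHOLD):
    """Feed a sequence of input charges; return (spikes, final_potential)."""
    v, spikes = v_init, []
    for x in inputs:
        v, s = step_if_neuron(v, x, threshold)
        spikes.append(s)
    return spikes, v


# 1) Charge conservation: input charge = spike charge + change in potential.
inputs = [0.5, 0.75, -1.5, 0.25, 1.0]
spikes, v_final = run_if_neuron(inputs)
charge_in = sum(inputs)
charge_out = THRESHOLD * sum(spikes) + (v_final - 0.0)
print(spikes, charge_in, charge_out)  # [0, 1, -1, 0, 1] 1.0 1.0

# 2) Differential input: keep the neuron state across frames and feed only
#    the change between consecutive frames (hypothetical scalar "frames").
frames = [2.0, 3.0, 1.0]
steps_per_frame = 4
v, previous, spike_total = 0.0, 0.0, 0
decoded = []
for frame in frames:
    delta = frame - previous
    frame_inputs = [delta] + [0.0] * (steps_per_frame - 1)
    frame_spikes, v = run_if_neuron(frame_inputs, v_init=v)
    spike_total += sum(frame_spikes)
    decoded.append(THRESHOLD * spike_total)  # value encoded by all spikes so far
    previous = frame
print(decoded)  # [2.0, 3.0, 1.0]
```

In this toy, the second and third frames need only one and two spikes respectively, because the neuron's accumulated spike count already encodes the previous frame. For the actual neuron model and encoding scheme, refer to the paper and repository linked above.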

## Uses

### Direct Use

ViStream can be directly used for:
- **Multiple Object Tracking (MOT)**
- **Single Object Tracking (SOT)**
- **Video Object Segmentation (VOS)**
- **Multiple Object Tracking and Segmentation (MOTS)**
- **Pose Tracking**

### Downstream Use

The model can be fine-tuned for various visual streaming perception tasks in:
- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance

## Bias, Risks, and Limitations

### Limitations
- Requires specific hardware optimization for maximum energy benefits
- Performance may vary with different frame rates
- Limited to visual perception tasks

### Recommendations
- Test thoroughly on target hardware before deployment
- Consider computational constraints of edge devices
- Validate performance on domain-specific datasets

## How to Get Started with the Model

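```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call)
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth",
)

# Load the checkpoint onto the CPU. Building the model itself requires the
# ViStream implementation from the GitHub repository.
# Note: PyTorch >= 2.6 defaults to weights_only=True in torch.load. If the file
# stores non-tensor objects, loading may require weights_only=False, which
# should only be used with files you trust.
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```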

For complete usage examples, see the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).
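
Before wiring the checkpoint into the ViStream code, it can help to check what the loaded object contains. The helper below is a minimal sketch: the function names and the candidate keys (`"model"`, `"state_dict"`, `"model_state_dict"`) are common PyTorch conventions assumed here, not a confirmed description of `checkpoint-90.pth`.

```python
from collections.abc import Mapping


def find_state_dict(checkpoint, candidate_keys=("model", "state_dict", "model_state_dict")):
    """Return the mapping of parameter names to tensors inside a loaded checkpoint.

    The candidate keys are assumed conventions, not confirmed for this file.
    """
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    for key in candidate_keys:
        nested = checkpoint.get(key)
        if isinstance(nested, Mapping):
            return nested
    # Fall back: assume the checkpoint itself is already the state dict.
    return checkpoint


def summarize(state_dict, limit=5):
    """Return (name, shape) pairs for the first few entries; shape is None if absent."""
    rows = []
    for name, value in list(state_dict.items())[:limit]:
        shape = tuple(value.shape) if hasattr(value, "shape") else None
        rows.append((name, shape))
    return rows
```

With the `checkpoint` object from the snippet above, `summarize(find_state_dict(checkpoint))` lists the first few parameter names and shapes, which can then be compared against the model definition in the repository.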

## Training Details

### Training Data

The model was trained on multiple datasets for various visual streaming perception tasks including object tracking, video object segmentation, and pose tracking.

### Training Procedure

- Framework: PyTorch
- Optimization: Energy-efficient SNN training with Law of Charge Conservation
- Architecture: ResNet-based backbone with spike quantization layers

## Evaluation

The model demonstrates competitive performance across multiple visual streaming perception tasks while achieving significant energy efficiency improvements compared to traditional ANN-based approaches. Detailed evaluation results are available in the [CVPR 2025 paper](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf).

## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```