ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception

ViStream is an energy-efficient framework for Visual Streaming Perception (VSP) that leverages Spiking Neural Networks (SNNs) with the Law of Charge Conservation (LoCC) property.

Model Details

Model Description

  • Developed by: Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
  • Model type: Spiking Neural Network for Visual Streaming Perception
  • Framework: PyTorch
  • License: CC-BY-4.0
  • Paper: CVPR 2025
  • Repository: GitHub

Model Architecture

ViStream introduces two key innovations:

  1. Law of Charge Conservation (LoCC) property in ST-BIF neurons
  2. Differential Encoding (DiffEncode) scheme for temporal optimization

The framework substantially reduces computation while matching the accuracy of its ANN counterpart.
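
How the two ideas fit together, on my reading of the paper: an ST-BIF-style neuron can fire both positive and negative spikes, so the charge it has emitted always tracks the charge it has received to within one quantization step (the LoCC property); DiffEncode exploits this by feeding the network only the difference between consecutive frames, so per-frame computation scales with how much the scene actually changed. The toy sketch below illustrates both ideas; the class, threshold, and flushing loop are my simplifications, not the reference implementation.

import torch

class ToyBipolarIF:
    """Toy signed integrate-and-fire neuron (a simplification of ST-BIF,
    not the paper's exact model). Because it can emit both positive and
    negative spikes, the total charge it has emitted always tracks the
    total charge it has received to within one threshold, which is the
    intuition behind the Law of Charge Conservation (LoCC)."""

    def __init__(self, theta: float = 0.25):
        self.theta = theta   # firing threshold / quantization step
        self.v = None        # membrane potential (residual, un-emitted charge)

    def step(self, x: torch.Tensor) -> torch.Tensor:
        if self.v is None:
            self.v = torch.zeros_like(x)
        self.v = self.v + x                                                  # integrate input charge
        spikes = (self.v >= self.theta).float() - (self.v <= -self.theta).float()
        self.v = self.v - spikes * self.theta                                # remove emitted charge
        return spikes * self.theta                                           # charge emitted this step

def diff_encode(frames: torch.Tensor) -> torch.Tensor:
    """DiffEncode intuition: transmit the first frame, then only the
    frame-to-frame differences."""
    deltas = frames.clone()
    deltas[1:] = frames[1:] - frames[:-1]
    return deltas

frames = torch.rand(5, 8)          # 5 frames of a toy 8-dimensional "video"
neuron = ToyBipolarIF(theta=0.25)
cumulative = torch.zeros(8)
for t, delta in enumerate(diff_encode(frames)):
    cumulative = cumulative + neuron.step(delta)
    steps = 1
    while neuron.v.abs().max() >= neuron.theta:        # flush residual charge
        cumulative = cumulative + neuron.step(torch.zeros(8))
        steps += 1
    err = (cumulative - frames[t]).abs().max().item()
    print(f"frame {t}: {steps} timesteps, max |output - input| = {err:.3f}")

With a static scene the deltas are near zero and almost nothing fires; the number of timesteps, and hence the compute, tracks scene change, while the cumulative output still matches the current frame to within one threshold.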

Uses

Direct Use

ViStream can be directly used for:

  • Multiple Object Tracking (MOT)
  • Single Object Tracking (SOT)
  • Video Object Segmentation (VOS)
  • Multiple Object Tracking and Segmentation (MOTS)
  • Pose Tracking

Downstream Use

The model can be fine-tuned for various visual streaming perception tasks in:

  • Autonomous driving
  • UAV navigation
  • AR/VR applications
  • Real-time surveillance

Bias, Risks, and Limitations

Limitations

  • Requires specific hardware optimization for maximum energy benefits
  • Performance may vary with different frame rates
  • Limited to visual perception tasks

Recommendations

  • Test thoroughly on target hardware before deployment
  • Consider computational constraints of edge devices
  • Validate performance on domain-specific datasets

How to Get Started with the Model

from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream", 
    filename="checkpoint-90.pth"
)

# Load the checkpoint weights (building the model itself requires the
# ViStream implementation from the repository)
checkpoint = torch.load(checkpoint_path, map_location='cpu')
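
The loaded checkpoint is a plain Python object, so it can be inspected before being wired into a model. A small sketch; the wrapper keys probed below ('model', 'state_dict') are common conventions, not a documented layout for this file:

# Peek at the checkpoint structure loaded above. Training checkpoints often
# wrap the weights under a key such as 'model' or 'state_dict'; that layout
# is an assumption here, not a documented guarantee.
state_dict = checkpoint
if isinstance(checkpoint, dict):
    print("top-level keys:", list(checkpoint.keys()))
    for key in ("model", "state_dict"):
        if key in checkpoint:
            state_dict = checkpoint[key]
            break

# Show the first few entries (parameter names and shapes).
for name, value in list(state_dict.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)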

For complete usage examples, see the GitHub repository.

Training Details

Training Data

The model was trained on multiple datasets for various visual streaming perception tasks including object tracking, video object segmentation, and pose tracking.

Training Procedure

  • Framework: PyTorch
  • Optimization: energy-efficient SNN training based on the Law of Charge Conservation
  • Architecture: ResNet-based backbone with spike quantization layers (a hedged sketch follows the list)
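
"Spike quantization layers" suggests the standard ANN-to-SNN conversion recipe: train the ANN with clipped, step-quantized activations so the converted spiking network can reproduce it exactly. The sketch below is a hedged reconstruction of that recipe; QuantReLU, its parameters, and the ReLU-swapping loop are illustrative, not the repository's actual code.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class QuantReLU(nn.Module):
    """Clipped, step-quantized ReLU used as an ANN-side stand-in for a
    spiking layer; each quantization level corresponds to one spike after
    conversion (illustrative, not ViStream's exact layer)."""

    def __init__(self, levels: int = 8, v_max: float = 1.0):
        super().__init__()
        self.levels = levels   # number of discrete activation levels
        self.v_max = v_max     # clipping ceiling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        step = self.v_max / self.levels
        q = torch.clamp(x, 0.0, self.v_max)   # clip like a bounded ReLU
        q = torch.round(q / step) * step      # snap to a spike-countable level
        return x + (q - x).detach()           # straight-through gradient

# Swap every ReLU in a ResNet backbone for the quantized version.
model = resnet18()
for module in list(model.modules()):
    for child_name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, child_name, QuantReLU(levels=8))

print(model.relu)  # now a QuantReLU instead of nn.ReLU

At conversion time each quantized activation would be replaced by an ST-BIF neuron with a matching threshold, with LoCC ensuring the accumulated spikes reproduce the quantized values; the actual training and conversion procedure is in the paper and repository.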

Evaluation

The model demonstrates competitive performance across multiple visual streaming perception tasks while achieving significant energy efficiency improvements compared to traditional ANN-based approaches. Detailed evaluation results are available in the CVPR 2025 paper.

Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

Model Card Contact

For questions about this model, please open an issue in the GitHub repository.

Citation

@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}