---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception

**ViStream** is an energy-efficient framework for Visual Streaming Perception (VSP). It uses Spiking Neural Networks (SNNs) and exploits their Law of Charge Conservation (LoCC) property.

## Model Details

### Model Description

- **Developed by:** Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- **Model type:** Spiking Neural Network for Visual Streaming Perception
- **Language(s):** Python (PyTorch implementation)
- **License:** CC-BY-4.0
- **Paper:** [CVPR 2025](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf)
- **Repository:** [GitHub](https://github.com/Intelligent-Computing-Research-Group/ViStream)

### Model Architecture

ViStream introduces two key innovations:
1. **Law of Charge Conservation (LoCC)** property in ST-BIF neurons
2. **Differential Encoding (DiffEncode)** scheme for temporal optimization
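
The charge-conservation idea can be illustrated with a generic bipolar integrate-and-fire neuron with soft reset. The sketch below is a toy under stated assumptions, not the ST-BIF neuron from the paper or the repository: the class name `BipolarIFNeuron`, the threshold, and the spike-count bounds are all hypothetical. It shows only the bookkeeping: the charge injected equals the charge emitted as spikes plus the charge left on the membrane.

```python
class BipolarIFNeuron:
    """Toy bipolar integrate-and-fire neuron with soft reset (illustrative only)."""

    def __init__(self, threshold=1.0, max_count=8, min_count=0):
        self.threshold = threshold
        self.max_count = max_count  # assumed upper bound on net emitted spikes
        self.min_count = min_count  # assumed lower bound on net emitted spikes
        self.membrane = 0.0         # charge currently stored on the membrane
        self.count = 0              # net number of spikes emitted so far

    def step(self, charge):
        """Integrate one input charge and return a spike in {-1, 0, +1}."""
        self.membrane += charge
        if self.membrane >= self.threshold and self.count < self.max_count:
            self.membrane -= self.threshold  # soft reset keeps the remainder
            self.count += 1
            return 1
        if self.membrane < 0.0 and self.count > self.min_count:
            self.membrane += self.threshold  # negative spike retracts charge
            self.count -= 1
            return -1
        return 0


inputs = [0.75, 0.5, -1.0, 0.25, 0.0, 0.0]
neuron = BipolarIFNeuron(threshold=1.0)
spikes = [neuron.step(charge) for charge in inputs]

# Charge bookkeeping: what went in equals what came out plus what is stored.
charge_in = sum(inputs)
charge_out = neuron.threshold * sum(spikes)
assert abs(charge_in - (charge_out + neuron.membrane)) < 1e-9
```

Because the reset subtracts the threshold instead of zeroing the membrane, no charge is discarded, which is why the final assertion holds for any input sequence in this toy model.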

Together, these reduce computation substantially while keeping accuracy on par with the ANN counterparts; measured figures are reported in the paper.
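
One way to see where a computational reduction can come from in streaming input is delta encoding for a linear layer: since the layer is linear, the output for the current frame equals the previous output plus the layer applied to the frame difference, and unchanged inputs cost nothing. The sketch below is a generic illustration of that identity, not the DiffEncode scheme as implemented in ViStream; `linear`, `delta_update`, and the example values are hypothetical.

```python
def linear(weights, x):
    """Dense matrix-vector product: one multiply-accumulate per weight."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def delta_update(weights, prev_out, prev_x, x):
    """Update prev_out using only the inputs that changed since prev_x.

    Returns the new output and the number of multiply-accumulates spent.
    """
    out = list(prev_out)
    macs = 0
    for j, (old, new) in enumerate(zip(prev_x, x)):
        delta = new - old
        if delta == 0:
            continue  # an unchanged input contributes nothing to the update
        for i, row in enumerate(weights):
            out[i] += row[j] * delta
            macs += 1
    return out, macs


weights = [[1, 2, 0, -1], [0, 1, 3, 2]]
frame0 = [4, 1, 0, 2]
frame1 = [4, 1, 5, 2]  # only one input differs from frame0

out0 = linear(weights, frame0)
out1, macs = delta_update(weights, out0, frame0, frame1)

# The incremental result matches a full recomputation on the new frame.
assert out1 == linear(weights, frame1)
```

In this example the incremental update touches one input column, so it spends 2 multiply-accumulates where a dense pass over `frame1` would spend 8. The saving depends on how much consecutive frames differ.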

## Uses

### Direct Use

ViStream can be directly used for:
- **Multiple Object Tracking (MOT)**
- **Single Object Tracking (SOT)**
- **Video Object Segmentation (VOS)**
- **Multiple Object Tracking and Segmentation (MOTS)**
- **Pose Tracking**

### Downstream Use

The model can be fine-tuned for visual streaming perception tasks in domains such as:
- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance

## Bias, Risks, and Limitations

### Limitations
- Requires specific hardware optimization for maximum energy benefits
- Performance may vary with different frame rates
- Limited to visual perception tasks

### Recommendations
- Test thoroughly on target hardware before deployment
- Consider computational constraints of edge devices
- Validate performance on domain-specific datasets

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
import torch

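# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth",
)

# Load the checkpoint on the CPU. This only reads the saved weights;
# building the model requires the ViStream code from the GitHub repository.
checkpoint = torch.load(checkpoint_path, map_location="cpu")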
```
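
The layout of `checkpoint-90.pth` is not documented in this card, so it can help to list what a loaded checkpoint contains before wiring it to the model. The helper below is a sketch: `summarize_checkpoint` is a hypothetical function, and the wrapper keys `"model"` and `"state_dict"` are common conventions, not something confirmed for this checkpoint.

```python
def summarize_checkpoint(checkpoint, wrapper_keys=("model", "state_dict")):
    """Return (name, shape) pairs for the entries of a loaded checkpoint.

    If the checkpoint is a dict that nests its weights under one of
    wrapper_keys (an assumed convention), that nested dict is summarized.
    Entries without a .shape attribute are reported with shape None.
    """
    state = checkpoint
    if isinstance(checkpoint, dict):
        for key in wrapper_keys:
            if isinstance(checkpoint.get(key), dict):
                state = checkpoint[key]
                break
    if not isinstance(state, dict):
        return []
    summary = []
    for name, value in state.items():
        shape = getattr(value, "shape", None)
        summary.append((name, tuple(shape) if shape is not None else None))
    return summary
```

Calling `summarize_checkpoint(checkpoint)` on the object returned by `torch.load` and printing the pairs gives a quick view of parameter names and shapes.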

For complete usage examples, see the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Training Details

### Training Data

The model was trained on multiple datasets for various visual streaming perception tasks including object tracking, video object segmentation, and pose tracking.

### Training Procedure

**Training Details:**
- Framework: PyTorch
- Optimization: Energy-efficient SNN training with Law of Charge Conservation
- Architecture: ResNet-based backbone with spike quantization layers

## Evaluation

The model demonstrates competitive performance across multiple visual streaming perception tasks while achieving significant energy efficiency improvements compared to traditional ANN-based approaches. Detailed evaluation results are available in the [CVPR 2025 paper](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf).

## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```