---
library_name: pytorch
tags:
- computer-vision
- pose-estimation
- panoramic-images
- covispose
- pytorch-model
license: mit
base_model: resnet50
---

# CovisPose Model

This model estimates relative poses between panoramic images using the CovisPose framework.

## Model Details

- **Architecture**: CovisPose with resnet50 backbone
- **Transformer Layers**: 6
- **FFN Dimension**: 2048
- **Input Size**: [512, 1024] (height × width)
- **Parameters**: 121,890,467 (estimated)
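
The parameter count above was estimated at upload time. Once the model is instantiated (see Usage below), it can be checked directly with a small helper like this:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Example, with `model` constructed as in the Usage section:
# total, trainable = count_parameters(model)
# print(f"total: {total:,}  trainable: {trainable:,}")
```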

## Training Information

### Configuration
- **Epochs**: N/A
- **Batch Size**: N/A
- **Learning Rate**: N/A
- **Backbone**: resnet50

### Performance Metrics
- **Final Training Loss**: N/A
- **Training Rotation Error**: N/A
- **Final Validation Loss**: N/A
- **Validation Rotation Error**: N/A

## Usage

```python
import torch
import json
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="pytorch_model.bin"
)

config_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="config.json"
)

# Load configuration
with open(config_path, 'r') as f:
    config = json.load(f)

# Initialize the model. The COVIS class is not bundled with this repo;
# it must be importable from the original CovisPose training codebase.
from models.covispose_model import COVIS

model = COVIS(
    backbone=config['backbone'],
    num_transformer_layers=config['num_transformer_layers'],
    transformer_ffn_dim=config['transformer_ffn_dim']
)

# Load weights
checkpoint = torch.load(model_path, map_location='cpu')
if "model_state_dict" in checkpoint:
    model.load_state_dict(checkpoint["model_state_dict"])
else:
    model.load_state_dict(checkpoint)

model.eval()

# Run inference on a pair of panorama tensors, each shaped
# [batch, 3, 512, 1024] to match the configured input size
with torch.no_grad():
    # outputs1, outputs2 = model(pano1_tensor, pano2_tensor)
    pass
```
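
The checkpoint does not ship preprocessing code, so the transform used in training is not recorded here. Below is a minimal sketch for preparing an equirectangular panorama as input, assuming a plain resize to the model's 512×1024 input and ImageNet normalization (both are assumptions, not confirmed by the checkpoint):

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Assumed preprocessing: resize to the configured input size and
# normalize with ImageNet statistics. Verify against the original
# training pipeline before relying on the outputs.
preprocess = T.Compose([
    T.Resize((512, 1024)),  # (height, width) per the config input size
    T.ToTensor(),           # HWC uint8 -> CHW float in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

def load_pano(path: str) -> torch.Tensor:
    """Load one panorama and add a batch dim -> [1, 3, 512, 1024]."""
    return preprocess(Image.open(path).convert("RGB")).unsqueeze(0)

# pano1_tensor = load_pano("pano1.jpg")
# pano2_tensor = load_pano("pano2.jpg")
```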

## Model Architecture

The CovisPose model consists of:

1. **Backbone Network**: resnet50 for feature extraction
2. **Transformer Encoder**: 6 layers for processing image features
3. **Prediction Heads**:
   - Covisibility mask prediction
   - Relative pose estimation
   - Boundary detection
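
The exact layer layout lives in the original `models.covispose_model` source; the skeleton below is only a structural sketch of those three stages, with invented module and head names, to make the data flow concrete. It is not the actual COVIS implementation.

```python
import torch
import torch.nn as nn
import torchvision

class CovisPoseSketch(nn.Module):
    """Illustrative skeleton; layer names and shapes are assumptions."""

    def __init__(self, num_layers: int = 6, ffn_dim: int = 2048, d_model: int = 256):
        super().__init__()
        # 1. Backbone: resnet50 trunk for per-image feature extraction
        resnet = torchvision.models.resnet50()
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        # 2. Transformer encoder over the concatenated pair features
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=8, dim_feedforward=ffn_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # 3. Prediction heads on the encoded tokens
        self.covis_head = nn.Linear(d_model, 1)     # covisibility mask
        self.pose_head = nn.Linear(d_model, 4)      # relative pose params
        self.boundary_head = nn.Linear(d_model, 1)  # floor-wall boundary

    def forward(self, pano1: torch.Tensor, pano2: torch.Tensor):
        feats = []
        for pano in (pano1, pano2):
            f = self.proj(self.backbone(pano))           # [B, d, H', W']
            feats.append(f.mean(dim=2).transpose(1, 2))  # column tokens [B, W', d]
        tokens = self.encoder(torch.cat(feats, dim=1))
        return (self.covis_head(tokens),
                self.pose_head(tokens),
                self.boundary_head(tokens))
```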

## Task Description

**CovisPose** estimates the relative pose between two panoramic images by:

1. **Covisibility Estimation**: Predicting which parts of the images overlap
2. **Pose Regression**: Estimating relative rotation and translation
3. **Boundary Detection**: Finding floor-wall boundaries for scale estimation
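
For indoor panoramas, the relative rotation usually reduces to a single yaw angle about the vertical axis, and such angles are commonly regressed as a (cos θ, sin θ) pair to avoid wrap-around discontinuities. A small illustration of recovering the angle under that convention (an assumption about the output format, not documented by this checkpoint):

```python
import math
import torch

def yaw_from_cos_sin(pred: torch.Tensor) -> float:
    """Recover a yaw angle in degrees from a predicted (cos, sin) pair.
    Assumes the network regresses rotation in this 2-vector form."""
    cos_t, sin_t = pred.flatten().tolist()
    return math.degrees(math.atan2(sin_t, cos_t))

print(yaw_from_cos_sin(torch.tensor([0.0, 1.0])))  # -> 90.0
```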

## Training Data

This model was trained on panoramic image pairs with:
- Relative pose annotations
- Covisibility masks
- Floor-wall boundary labels
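
The annotation format itself is not included with this upload. A hypothetical per-pair record covering the three label types above might look like the following (all field names are invented for illustration):

```python
# Hypothetical annotation record for one training pair (illustrative only)
pair_annotation = {
    "pano1": "scene_0001/pano_03.jpg",
    "pano2": "scene_0001/pano_07.jpg",
    "relative_pose": {"yaw_deg": 42.5, "translation": [1.3, 0.0]},
    "covis_mask1": "scene_0001/covis_03_07.png",  # per-column overlap mask
    "covis_mask2": "scene_0001/covis_07_03.png",
    "boundary1": "scene_0001/boundary_03.json",   # floor-wall boundary
    "boundary2": "scene_0001/boundary_07.json",
}
```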

## Limitations

- Designed specifically for indoor panoramic images
- Requires significant visual overlap between image pairs for reliable pose estimation
- Performance may degrade on outdoor scenes or images with minimal overlap

## Citation

If you use this model, please cite the CovisPose work:

```bibtex
@article{covispose2024,
  title={CovisPose: Co-visibility Pose Estimation for Panoramic Images},
  author={Your Authors},
  journal={Conference/Journal},
  year={2024}
}
```

## License

This model is released under the MIT License.

## Repository

- **Training Code**: Available in the original repository
- **Model Upload**: Generated automatically from a local checkpoint

---

*Model uploaded on 2025-06-27T09:05:36.254713 using upload_model.py*