Upload CovisPose model from local checkpoint
- README.md +142 -0
- config.json +15 -0
- pytorch_model.bin +3 -0
- requirements.txt +5 -0
- upload_metadata.json +48 -0
README.md
ADDED
@@ -0,0 +1,142 @@
---
library_name: pytorch
tags:
- computer-vision
- pose-estimation
- panoramic-images
- covispose
- pytorch-model
license: mit
base_model: resnet50
---

# CovisPose Model

This model estimates relative poses between panoramic images using the CovisPose framework.

## Model Details

- **Architecture**: CovisPose with resnet50 backbone
- **Transformer Layers**: 6
- **FFN Dimension**: 2048
- **Input Size**: [512, 1024] (height × width; see the preprocessing sketch below)
- **Parameters**: 121,890,467 (estimated)
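
A minimal preprocessing sketch for producing an input of the documented shape. The resize target follows the `[512, 1024]` input size above; the ImageNet normalization constants are an assumption, since this upload does not state the preprocessing used in training:

```python
from PIL import Image
from torchvision import transforms

# Sketch: resize an equirectangular panorama to the documented
# 512x1024 (height x width) input and add a batch dimension.
# The normalization values are the standard ImageNet statistics,
# an assumption rather than a documented requirement.
preprocess = transforms.Compose([
    transforms.Resize((512, 1024)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

pano = Image.open("pano1.jpg").convert("RGB")
pano_tensor = preprocess(pano).unsqueeze(0)  # shape: [1, 3, 512, 1024]
```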

## Training Information

The checkpoint carried no training arguments or metrics (see `upload_metadata.json`), so the fields below are unavailable.

### Configuration
- **Epochs**: N/A
- **Batch Size**: N/A
- **Learning Rate**: N/A
- **Backbone**: N/A

### Performance Metrics
- **Final Training Loss**: N/A
- **Training Rotation Error**: N/A
- **Final Validation Loss**: N/A
- **Validation Rotation Error**: N/A

## Usage

```python
import torch
import json
from huggingface_hub import hf_hub_download

# The COVIS class comes from the original training repository
from models.covispose_model import COVIS

# Download model files
model_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="pytorch_model.bin"
)
config_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="config.json"
)

# Load configuration
with open(config_path, 'r') as f:
    config = json.load(f)

# Initialize model
model = COVIS(
    backbone=config['backbone'],
    num_transformer_layers=config['num_transformer_layers'],
    transformer_ffn_dim=config['transformer_ffn_dim']
)

# Load weights (this checkpoint wraps them in "model_state_dict")
checkpoint = torch.load(model_path, map_location='cpu')
if "model_state_dict" in checkpoint:
    model.load_state_dict(checkpoint["model_state_dict"])
else:
    model.load_state_dict(checkpoint)

model.eval()

# Use for inference
with torch.no_grad():
    # Your inference code here, e.g.:
    # outputs1, outputs2 = model(pano1_tensor, pano2_tensor)
    pass
```
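
The inference call above is left as a placeholder. Continuing from that snippet, here is a minimal sanity-check sketch; it assumes the two-panorama forward signature hinted at in the comment (`model(pano1, pano2)` returning one output per image), which is not a confirmed API:

```python
# Sketch only: forward signature and output structure are assumptions
# based on the commented call above.
pano1 = torch.randn(1, 3, 512, 1024)  # [batch, channels, height, width]
pano2 = torch.randn(1, 3, 512, 1024)

with torch.no_grad():
    outputs1, outputs2 = model(pano1, pano2)
```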

## Model Architecture

The CovisPose model consists of:

1. **Backbone Network**: resnet50 for feature extraction
2. **Transformer Encoder**: 6 layers for processing image features
3. **Prediction Heads** (a sketch of how these compose follows the list):
   - Covisibility mask prediction
   - Relative pose estimation
   - Boundary detection
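
As a rough illustration of how these components might fit together — the class name, head names, `d_model`, and `nhead` below are hypothetical placeholders, not the checkpoint's actual layer names (see `upload_metadata.json` for those):

```python
import torch.nn as nn
from torchvision.models import resnet50

class CovisPoseSketch(nn.Module):
    """Hypothetical composition of backbone, encoder, and heads."""

    def __init__(self, num_layers=6, d_model=256, ffn_dim=2048):
        super().__init__()
        # 1. Backbone: resnet50 trunk for panorama feature extraction
        self.backbone = resnet50()
        # 2. Transformer encoder: 6 layers over the extracted features
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=ffn_dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # 3. Prediction heads (output dimensions are placeholders)
        self.covis_head = nn.Linear(d_model, 1)     # covisibility logits
        self.pose_head = nn.Linear(d_model, 4)      # rotation + translation
        self.boundary_head = nn.Linear(d_model, 1)  # floor-wall boundary
```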

## Task Description

**CovisPose** estimates the relative pose between two panoramic images by:

1. **Covisibility Estimation**: Predicting which parts of the images overlap
2. **Pose Regression**: Estimating the relative rotation and translation (illustrated below)
3. **Boundary Detection**: Finding floor-wall boundaries for scale estimation
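
To make the pose output concrete, here is a sketch of how a decoded relative pose could be applied. One common parameterization for upright indoor panoramas is a yaw angle plus a planar translation; whether this model uses exactly that parameterization is an assumption, not something this upload confirms:

```python
import numpy as np

# Illustration only: the yaw-plus-planar-translation parameterization
# is an assumption about the pose output, not a confirmed format.
def relative_pose_to_matrix(yaw_rad: float, tx: float, ty: float) -> np.ndarray:
    """Build a 3x3 planar rigid transform: rotation about the vertical
    axis followed by a translation in the floor plane."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    return np.array([
        [c, -s, tx],
        [s,  c, ty],
        [0.0, 0.0, 1.0],
    ])

# Map a point expressed in panorama 2's frame into panorama 1's frame
T_12 = relative_pose_to_matrix(np.pi / 6, 1.5, 0.0)
point_in_2 = np.array([2.0, 0.5, 1.0])  # homogeneous planar point
point_in_1 = T_12 @ point_in_2
```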

## Training Data

This model was trained on panoramic image pairs with:
- Relative pose annotations
- Covisibility masks
- Floor-wall boundary labels

## Limitations

- Designed specifically for indoor panoramic images
- Requires significant visual overlap between image pairs for reliable pose estimation
- Performance may degrade on outdoor scenes or images with minimal overlap

## Citation

If you use this model, please cite the CovisPose work:

```bibtex
@article{covispose2024,
  title={CovisPose: Co-visibility Pose Estimation for Panoramic Images},
  author={Your Authors},
  journal={Conference/Journal},
  year={2024}
}
```

## License

This model is released under the MIT License.

## Repository

- **Training Code**: Available in the original repository
- **Model Upload**: Generated automatically from local checkpoint

---

*Model uploaded on 2025-06-27T09:05:36.254713 using upload_model.py*
config.json
ADDED
@@ -0,0 +1,15 @@
{
  "architecture": "CovisPose",
  "framework": "pytorch",
  "model_type": "pose_estimation",
  "task": "panoramic_pose_estimation",
  "input_size": [
    512,
    1024
  ],
  "backbone": "resnet50",
  "num_transformer_layers": 6,
  "transformer_ffn_dim": 2048,
  "num_classes": null,
  "library_name": "pytorch"
}
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3b5ed1a2abbd8701fc9fab0d5d6dfbb9e2835f60ea17c25778b2137566e4b62b
size 1460570852
requirements.txt
ADDED
@@ -0,0 +1,5 @@
torch>=1.9.0
torchvision>=0.10.0
numpy>=1.19.0
pillow>=8.0.0
huggingface_hub>=0.12.0
upload_metadata.json
ADDED
@@ -0,0 +1,48 @@
{
  "original_checkpoint": "outputs/hf_training/checkpoints/final_model.pth",
  "upload_date": "2025-06-27T09:05:36.254713",
  "analysis": {
    "checkpoint_keys": [
      "epoch",
      "model_state_dict",
      "optimizer_state_dict",
      "loss"
    ],
    "has_model_state_dict": true,
    "has_optimizer_state_dict": true,
    "has_training_metrics": false,
    "has_val_metrics": false,
    "has_args": false,
    "training_info": {},
    "model_info": {
      "num_parameters": 121890467,
      "parameter_keys": [
        "covis_feature_extractor.feature_extractor.encoder.conv1.1.weight",
        "covis_feature_extractor.feature_extractor.encoder.bn1.weight",
        "covis_feature_extractor.feature_extractor.encoder.bn1.bias",
        "covis_feature_extractor.feature_extractor.encoder.bn1.running_mean",
        "covis_feature_extractor.feature_extractor.encoder.bn1.running_var",
        "covis_feature_extractor.feature_extractor.encoder.bn1.num_batches_tracked",
        "covis_feature_extractor.feature_extractor.encoder.layer1.0.conv1.weight",
        "covis_feature_extractor.feature_extractor.encoder.layer1.0.bn1.weight",
        "covis_feature_extractor.feature_extractor.encoder.layer1.0.bn1.bias",
        "covis_feature_extractor.feature_extractor.encoder.layer1.0.bn1.running_mean"
      ]
    }
  },
  "config": {
    "architecture": "CovisPose",
    "framework": "pytorch",
    "model_type": "pose_estimation",
    "task": "panoramic_pose_estimation",
    "input_size": [
      512,
      1024
    ],
    "backbone": "resnet50",
    "num_transformer_layers": 6,
    "transformer_ffn_dim": 2048,
    "num_classes": null,
    "library_name": "pytorch"
  }
}