---
library_name: pytorch
tags:
- computer-vision
- pose-estimation
- panoramic-images
- covispose
- pytorch-model
license: mit
base_model: resnet50
---

# CovisPose Model

This model estimates the relative pose between two panoramic images using the CovisPose framework.

## Model Details

- Architecture: CovisPose with resnet50 backbone
- Transformer Layers: 6
- Transformer FFN Dimension: 2048
- Input Size: [512, 1024] (height × width)
- Parameters: 121,890,467 (estimated)
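An input size of [512, 1024] matches the 2:1 aspect ratio of an equirectangular panorama, where each pixel corresponds to a viewing direction. The sketch below shows that mapping under a standard equirectangular convention (longitude across the width, latitude down the height, z axis up); the convention and the helper name `pixel_to_bearing` are assumptions for illustration, not taken from the CovisPose code.

```python
import math

def pixel_to_bearing(u, v, width=1024, height=512):
    """Map pixel (u, v) of an equirectangular panorama to a unit bearing vector.

    Assumed convention (hypothetical, not from the CovisPose code):
    longitude runs from -pi (left edge) to +pi (right edge), latitude from
    +pi/2 (top row) to -pi/2 (bottom row), pixel centers sit at half-integer
    offsets, and the z axis points up.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z), lon, lat

# The image center looks straight ahead along +x.
bearing, lon, lat = pixel_to_bearing(511.5, 255.5)
```

Under this convention the image center maps to the forward axis (1, 0, 0) and the top row points almost straight up.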

## Training Information

### Configuration
- Epochs: N/A
- Batch Size: N/A
- Learning Rate: N/A
- Backbone: resnet50

### Performance Metrics
- Final Training Loss: N/A
- Training Rotation Error: N/A
- Final Validation Loss: N/A
- Validation Rotation Error: N/A
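The card does not define "Rotation Error". A common choice for relative pose is the geodesic angle between the predicted and ground-truth rotations, θ = arccos((tr(R_predᵀ R_gt) − 1) / 2). The sketch below assumes that definition and 3×3 rotation matrices; whether CovisPose reports this exact quantity (or, for instance, a yaw-only angle) is not confirmed here, and `yaw_matrix` is a hypothetical helper for the example.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    R_rel = np.asarray(R_pred, dtype=float).T @ np.asarray(R_gt, dtype=float)
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def yaw_matrix(deg):
    """Rotation about the vertical (z) axis, used only for the example."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

print(round(rotation_error_deg(yaw_matrix(30.0), yaw_matrix(45.0)), 6))  # 15.0
```

The clip matters in practice: accumulated floating-point error can push the cosine marginally past ±1, where `arccos` would return NaN.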

## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# COVIS comes from the CovisPose training code, which is not included in this
# repository; it must be importable from your Python path.
from models.covispose_model import COVIS

# Download the weights and configuration from the Hub
model_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="pytorch_model.bin",
)
config_path = hf_hub_download(
    repo_id="SGEthan/covis_toy",
    filename="config.json",
)

# Load the model configuration
with open(config_path, "r") as f:
    config = json.load(f)

# Build the model with the stored hyperparameters
model = COVIS(
    backbone=config["backbone"],
    num_transformer_layers=config["num_transformer_layers"],
    transformer_ffn_dim=config["transformer_ffn_dim"],
)

# Load the weights; the state dict may be wrapped under "model_state_dict".
# PyTorch >= 2.6 defaults to weights_only=True, which suffices for a plain state dict.
checkpoint = torch.load(model_path, map_location="cpu")
state_dict = checkpoint.get("model_state_dict", checkpoint)
model.load_state_dict(state_dict)
model.eval()

# Run inference on a pair of panoramas
with torch.no_grad():
    # pano1_tensor, pano2_tensor: image batches, presumably shaped (1, 3, 512, 1024)
    # outputs1, outputs2 = model(pano1_tensor, pano2_tensor)
    pass
```
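The inference stub in the Usage example leaves input preparation open. The sketch below turns an H×W×3 uint8 RGB panorama into a float32 array of shape (1, 3, 512, 1024). It assumes the image is already resized to 512×1024 and that the model expects ImageNet mean/std normalization, which is typical for ResNet-50 backbones but not confirmed by this card; `prepare_panorama` is a hypothetical helper.

```python
import numpy as np

# ImageNet statistics: an assumption, commonly used with ResNet-50 backbones.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def prepare_panorama(img, height=512, width=1024):
    """Convert an HxWx3 uint8 RGB panorama into a (1, 3, H, W) float32 array."""
    img = np.asarray(img)
    if img.shape != (height, width, 3):
        raise ValueError(f"expected shape {(height, width, 3)}, got {img.shape}")
    x = img.astype(np.float32) / 255.0        # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD    # per-channel normalization
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])  # HWC -> 1xCxHxW

pano1 = prepare_panorama(np.zeros((512, 1024, 3), dtype=np.uint8))
print(pano1.shape, pano1.dtype)  # (1, 3, 512, 1024) float32
```

Each array can then be wrapped with `torch.from_numpy(...)` to obtain `pano1_tensor` and `pano2_tensor`. If the original training code used different normalization, substitute its constants.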

## Model Architecture

The CovisPose model consists of:

1. Backbone Network: resnet50 for feature extraction
2. Transformer Encoder: 6 layers for processing image features
3. Prediction Heads:
   - Covisibility mask prediction
   - Relative pose estimation
   - Boundary detection
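To see how the listed components fit together, the arithmetic below traces sizes for one image pair. It relies on two properties of a standard ResNet-50 (overall stride 32, 2048 channels in its last stage) and on one assumption this card does not confirm: that each panorama's final feature map is flattened into tokens and both panoramas' tokens are concatenated before the 6-layer transformer. The actual CovisPose model may pool or reshape features differently; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(height=512, width=1024, stride=32, channels=2048):
    """Back-of-the-envelope sizes for a two-panorama forward pass (assumed layout)."""
    feat_h, feat_w = height // stride, width // stride  # ResNet-50 final stride 32
    tokens_per_image = feat_h * feat_w                  # assumed: flatten H' x W'
    return {
        "feature_map": (channels, feat_h, feat_w),
        "tokens_per_image": tokens_per_image,
        "tokens_per_pair": 2 * tokens_per_image,        # assumed: concatenate both panoramas
    }

print(trace_shapes())
# {'feature_map': (2048, 16, 32), 'tokens_per_image': 512, 'tokens_per_pair': 1024}
```

The token count matters because self-attention cost grows quadratically with sequence length, so a higher input resolution or a smaller stride would make the transformer stage considerably more expensive.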

## Task Description

CovisPose estimates the relative pose between two panoramic images by:

1. Covisibility Estimation: Predicting which parts of the images overlap
2. Pose Regression: Estimating relative rotation and translation
3. Boundary Detection: Finding floor-wall boundaries for scale estimation
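Floor-wall boundaries help with scale because, in an upright (gravity-aligned) panorama captured at a known camera height h, a boundary point seen at latitude φ below the horizon lies at horizontal distance d = h / tan(−φ) from the camera. The sketch assumes an upright equirectangular panorama whose rows map linearly to latitude from +π/2 (top) to −π/2 (bottom); the 1.6 m camera height and the function name `floor_boundary_distance` are illustrative, not values from CovisPose.

```python
import math

def floor_boundary_distance(v, camera_height=1.6, height=512):
    """Horizontal distance (units of camera_height) to a floor-boundary pixel in row v.

    Assumes an upright equirectangular panorama whose row v maps linearly to
    latitude in (+pi/2, -pi/2) and a boundary point lying on the floor plane.
    """
    lat = (0.5 - (v + 0.5) / height) * math.pi
    if lat >= 0:
        raise ValueError("a floor boundary must lie below the horizon (lat < 0)")
    return camera_height / math.tan(-lat)

# Row 383.5 sits 45 degrees below the horizon, so the distance equals the camera height.
d = floor_boundary_distance(383.5)
```

Because every such distance is proportional to h, an unknown camera height leaves the recovered layout (and the translation between panoramas) correct only up to a global scale.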

## Training Data

This model was trained on pairs of panoramic images annotated with:
- Relative pose annotations
- Covisibility masks
- Floor-wall boundary labels
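Relative pose labels are usually derived from the absolute poses of the two cameras. A minimal sketch, assuming camera-to-world poses given as a rotation matrix R_i and a camera center t_i: the pose of camera 2 in camera 1's frame is R_rel = R_1ᵀ R_2 and t_rel = R_1ᵀ (t_2 − t_1). The card does not state which convention this model's annotations follow, and `relative_pose` is a hypothetical helper.

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Pose of camera 2 expressed in camera 1's frame, given camera-to-world poses."""
    R1, R2 = np.asarray(R1, dtype=float), np.asarray(R2, dtype=float)
    t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
    R_rel = R1.T @ R2          # rotation from camera 2's frame to camera 1's frame
    t_rel = R1.T @ (t2 - t1)   # camera 2's center in camera 1's coordinates
    return R_rel, t_rel

# Camera 1 is yawed 90 degrees about z; camera 2 sits 2 units along world +y.
R1 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
R_rel, t_rel = relative_pose(R1, [0, 0, 0], np.eye(3), [0, 2, 0])
```

In this example camera 2 ends up 2 units along camera 1's own x axis, t_rel = (2, 0, 0), and R_rel undoes camera 1's 90° yaw.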

## Limitations

- Designed specifically for indoor panoramic images
- Requires significant visual overlap between image pairs for reliable pose estimation
- Performance may degrade on outdoor scenes or images with minimal overlap
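Since reliability drops when overlap is small, a caller may want to check the predicted covisibility before trusting a pose. The sketch assumes the covisibility head yields a 1-D array of per-column probabilities for each panorama; that output layout, the 0.5 threshold, and the 0.2 minimum overlap fraction are hypothetical choices, not documented behavior of this model.

```python
import numpy as np

def overlap_fraction(covis_probs, threshold=0.5):
    """Fraction of columns predicted as co-visible (probability above threshold)."""
    probs = np.asarray(covis_probs, dtype=float)
    return float((probs > threshold).mean())

def pose_is_trustworthy(covis1, covis2, min_overlap=0.2, threshold=0.5):
    """Hypothetical gate: require enough predicted overlap in both panoramas."""
    return (overlap_fraction(covis1, threshold) >= min_overlap
            and overlap_fraction(covis2, threshold) >= min_overlap)

# A quarter of the 1024 columns are co-visible in the first panorama.
covis1 = np.concatenate([np.full(256, 0.9), np.full(768, 0.1)])
covis2 = np.full(1024, 0.05)
ok = pose_is_trustworthy(covis1, covis2)  # False: no predicted overlap in covis2
```

Thresholds like these would need tuning on held-out pairs; the point is only that the covisibility output gives a cheap signal for rejecting low-overlap pairs.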

## Citation

If you use this model, please cite the CovisPose work:

```bibtex
@article{covispose2024,
  title={CovisPose: Co-visibility Pose Estimation for Panoramic Images},
  author={Your Authors},
  journal={Conference/Journal},
  year={2024}
}
```

## License

This model is released under the MIT License.

## Repository

- Training Code: Available in the original repository
- Model Upload: Generated automatically from local checkpoint

---

Model uploaded on 2025-06-27T09:05:36.254713 using `upload_model.py`