|
# Model Card for DuoduoCLIP |
|
|
|
This model repository provides the official pretrained models used in the paper **Duoduo CLIP: Efficient 3D Understanding with Multi-View Images**.
|
Model usage instructions and code can be found in the [GitHub repository](https://github.com/3dlg-hcvc/DuoduoCLIP).
|
|
|
***Note: This initial release includes the main model; the other models used in the paper will be uploaded soon.***
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Finetuned from model:** OpenCLIP model ("ViT-B-32" architecture with the "laion2b_s34b_b79k" checkpoint); see the loading sketch below.
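
For reference, the base model can be instantiated directly with the `open_clip` package. This is a minimal sketch that loads only the original LAION weights; it does not load the Duoduo CLIP checkpoint (see the [GitHub repository](https://github.com/3dlg-hcvc/DuoduoCLIP) for the supported usage).

```python
# Minimal sketch: instantiate the base OpenCLIP model that Duoduo CLIP was
# finetuned from. This loads the original LAION weights, not the finetuned
# Duoduo CLIP checkpoint.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

model.eval()
with torch.no_grad():
    tokens = tokenizer(["a chair", "a table"])
    text_features = model.encode_text(tokens)
print(text_features.shape)  # torch.Size([2, 512]) for ViT-B-32
```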
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://github.com/3dlg-hcvc/DuoduoCLIP |
|
- **Paper:** https://arxiv.org/abs/2406.11579 |
|
|
|
### Model Checkpoints |
|
|
|
- **Four_1to6F_bs1600_LT6.ckpt:** The main model, trained on the Four dataset with 1 to 6 frames sampled during training and the last 6 attention layers trainable (see the download sketch below).
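
As a rough sketch, the checkpoint can be fetched from the Hugging Face Hub and inspected as shown below. The `repo_id` is assumed from this model card's location, and the checkpoint is assumed to follow the PyTorch Lightning layout suggested by the `.ckpt` extension; refer to the [GitHub repository](https://github.com/3dlg-hcvc/DuoduoCLIP) for the supported way to load and run the model.

```python
# Minimal sketch: download the released checkpoint and inspect its contents.
# The repo_id below is an assumption based on this model card's location.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="3dlg-hcvc/DuoduoCLIP",          # assumed repository id
    filename="Four_1to6F_bs1600_LT6.ckpt",   # main model from this card
)

# weights_only=False because Lightning-style checkpoints store metadata
# (hyperparameters, callbacks) in addition to raw tensors.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Lightning checkpoints usually keep the model weights under "state_dict".
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
print(f"{len(state_dict)} entries in the checkpoint state dict")
```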
|
|
|
## Training Data |
|
|
|
The dataset card can be found [here](https://huggingface.co/datasets/3dlg-hcvc/DuoduoCLIP-data). |
|
|
|
**BibTeX:** |
|
```bibtex
@misc{lee2024duoduo,
      title={Duoduo CLIP: Efficient 3D Understanding with Multi-View Images},
      author={Han-Hung Lee and Yiming Zhang and Angel X. Chang},
      year={2024},
      eprint={2406.11579},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
|
|
|
## Acknowledgement |
|
|
|
This work was funded by a CIFAR AI Chair, an NSERC Discovery grant, and a CFI/BCKDF JELF grant.