---
library_name: transformers
tags:
- video
- feature
- face
license: cc-by-nc-4.0
base_model:
- ControlNet/MARLIN
pipeline_tag: feature-extraction
---

# MARLIN: Masked Autoencoder for facial video Representation LearnINg

This repo is the official PyTorch implementation of the paper [MARLIN: Masked Autoencoder for facial video Representation LearnINg](https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper) (CVPR 2023) ([arXiv](https://arxiv.org/abs/2211.06627)).

## Use `transformers` (HuggingFace) for Feature Extraction

Requirements:

- Python
- PyTorch
- transformers
- einops

Currently, the Hugging Face model performs direct feature extraction only; it does not include any video pre-processing (e.g., face detection, cropping, or strided-window sampling).

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_base_ytf",  # or other variants
    trust_remote_code=True
)
tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
output = model(tensor)  # (B, num_tokens, embed_dim); 768 for the ViT-Base variant
```

## License

This project is under the CC BY-NC 4.0 license. See [LICENSE](LICENSE) for details.

## References

If you find this work useful for your research, please consider citing it.

```bibtex
@inproceedings{cai2022marlin,
  title     = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author    = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
  month     = {June},
  pages     = {1493-1504},
  doi       = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}
```
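
As a usage sketch: the model returns per-token features of shape `(B, num_tokens, embed_dim)`, so a common next step is to pool over the token dimension to get one clip-level embedding per video. The snippet below is illustrative only; the dummy tensor stands in for a real model output, and the embedding dim (768 here) depends on which MARLIN variant you load.

```python
import torch

# Dummy stand-in for a MARLIN model output: (B, num_tokens, embed_dim).
# The actual embedding dim depends on the variant (e.g. ViT-Base uses 768).
features = torch.rand([1, 1568, 768])

# Mean-pool over the token dimension to get one clip-level embedding per video.
clip_embedding = features.mean(dim=1)  # (B, embed_dim)
print(clip_embedding.shape)  # torch.Size([1, 768])
```

The pooled embedding can then be fed to a downstream head (e.g. a linear classifier) for tasks such as facial attribute or expression recognition.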