---
library_name: transformers
tags:
- video
- feature
- face
license: cc-by-nc-4.0
base_model:
- ControlNet/MARLIN
pipeline_tag: feature-extraction
---

# MARLIN: Masked Autoencoder for facial video Representation LearnINg

This repo is the official PyTorch implementation of the paper [MARLIN: Masked Autoencoder for facial video Representation LearnINg](https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper) (CVPR 2023) ([arXiv](https://arxiv.org/abs/2211.06627)).

## Use `transformers` (HuggingFace) for Feature Extraction

Requirements:

- Python
- PyTorch
- transformers
- einops

Currently, the Hugging Face model performs direct feature extraction only; it does not include any video pre-processing (e.g., face detection, cropping, or strided-window sampling).

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_base_ytf",  # or other variants
    trust_remote_code=True
)
tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
output = model(tensor)  # (B, num_tokens, embed_dim); 768 for the ViT-Base variant
```

## License

This project is under the CC BY-NC 4.0 license. See [LICENSE](LICENSE) for details.

## References

If you find this work useful for your research, please consider citing it.

```bibtex
@inproceedings{cai2022marlin,
  title     = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author    = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
  month     = {June},
  pages     = {1493-1504},
  doi       = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}
```
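
As a usage sketch: the model returns per-token features of shape `(B, num_tokens, embed_dim)`, so a common next step is to pool over the token dimension to get one clip-level embedding per video. The snippet below is illustrative only; the dummy tensor stands in for a real model output, and the embedding dim (768 here) depends on which MARLIN variant you load.

```python
import torch

# Dummy stand-in for a MARLIN model output: (B, num_tokens, embed_dim).
# The actual embedding dim depends on the variant (e.g. ViT-Base uses 768).
features = torch.rand([1, 1568, 768])

# Mean-pool over the token dimension to get one clip-level embedding per video.
clip_embedding = features.mean(dim=1)  # (B, embed_dim)
print(clip_embedding.shape)  # torch.Size([1, 768])
```

The pooled embedding can then be fed to a downstream head (e.g. a linear classifier) for tasks such as facial attribute or expression recognition.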