Image Feature Extraction
PerceptionEncoder

Add pipeline tag

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -1,11 +1,13 @@
  ---
- license: apache-2.0
  library_name: perception-encoder
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
  ---
+
  # Model Details
 
- [\[📃 Tech Report\]](https://arxiv.org/abs/2504.13181)
- [\[📂 Github\]](https://github.com/facebookresearch/perception_models/)
+ [\\[📃 Tech Report\\]](https://arxiv.org/abs/2504.13181)
+ [\\[📂 Github\\]](https://github.com/facebookresearch/perception_models/)
 
  Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. It was introduced in "[Perception Encoder: The best visual embeddings
  are not at the output of the network](https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-network/)".
@@ -61,5 +63,4 @@ If you find our code useful for your research, please consider citing:
  author={Jang Hyun Cho and Andrea Madotto and Effrosyni Mavroudi and Triantafyllos Afouras and Tushar Nagarajan and Muhammad Maaz and Yale Song and Tengyu Ma and Shuming Hu and Hanoona Rasheed and Peize Sun and Po-Yao Huang and Daniel Bolya and Suyog Jain and Miguel Martin and Huiyu Wang and Nikhila Ravi and Shashank Jain and Temmy Stark and Shane Moon and Babak Damavandi and Vivian Lee and Andrew Westbury and Salman Khan and Philipp Kr\"{a}henb\"{u}hl and Piotr Doll{\'a}r and Lorenzo Torresani and Kristen Grauman and Christoph Feichtenhofer},
  journal={arXiv},
  year={2025}
- }
-
+ }
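
For reviewers who want to sanity-check the metadata block this change produces, a minimal sketch follows. It assumes the front matter holds only flat `key: value` pairs, as in this diff, and is not a full YAML parser. The helper name `read_front_matter` and the list of expected keys are illustrative assumptions, not part of the repository or of any Hugging Face tooling.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' fences.

    Hypothetical helper for illustration only: nested YAML, lists and
    quoted values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing fence reached
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return {}  # no closing fence: treat as having no front matter


# Front matter as it appears on the new side of the diff.
readme = """---
library_name: perception-encoder
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Model Details
"""

meta = read_front_matter(readme)
# Keys assumed to be expected after this change.
expected = ("library_name", "license", "pipeline_tag")
missing = [key for key in expected if key not in meta]
print(meta, missing)
```

An empty `missing` list indicates that all three keys touched by the diff are present in the front matter.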