facebook/PE-Lang-L14-448

Image Feature Extraction

PerceptionEncoder


janghyuncho7 committed 25 days ago

Commit 3f0d058 · verified · 1 parent: dab510d

Update README.md

Files changed (1)

README.md +1 -1


@@ -28,7 +28,7 @@ We release two PE Lang checkpoints, L14-448 and G14-448. Here are their results
-Here is a sample of the performance obtainable by using PE Core G aligned further with [PLM-8B](https://huggingface.co/facebook/Perception-LM-8B) (stage 2) using 16+1 image tiles / 16 video frames with Llama 3.1 8B as the decoder:
+Here is a sample of the performance obtainable by using PE Core G aligned further with [PLM-8B](https://huggingface.co/facebook/Perception-LM-8B) (*stage 3*) using 36+1 image tiles / 32 video frames with Llama 3.1 8B as the decoder:
 | Model | Encoder | Doc VQA (test) | InfoQA (test) | TextVQA | MVBench | PerceptionTest (test) | EgoSchema (test) |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|