---
datasets:
- shuaishuaicdp/GUI-World
language:
- en
license: cc-by-4.0
metrics:
- bertscore
- LLM-as-a-Judge
tags:
- gui
- agent
pipeline_tag: video-text-to-text
---

This model, GUI-Vid, is the first VideoLLM with strong GUI-oriented capabilities, fine-tuned on [GUI-World](https://gui-world.github.io).

It was introduced in the paper [GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents](https://huggingface.co/papers/2406.10819).

See the [GitHub repository](https://github.com/Dongping-Chen/GUI-World) for instructions on using GUI-Vid for GUI understanding tasks.