- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
  Paper • 2403.09029 • Published • 55
- GiT: Towards Generalist Vision Transformer through Universal Language Interface
  Paper • 2403.09394 • Published • 27
Xijia Tao
Cie1
AI & ML interests
Multimodal tool-calling agents, Diffusion large language models
Recent Activity
updated a dataset 3 days ago: Cie1/MMSearch-Plus
new activity 3 days ago: Cie1/MMSearch-Plus: The Question and Answer are not readable
updated a model 6 days ago: Cie1/ImgTrojan-anti-0.01-lora
Organizations
None yet