Johann-Peter Hartmann PRO
johannhartmann
AI & ML interests
LLMs, Local LLMs, Transformers, Image Processing, Audio Processing, E-Commerce
Recent Activity
Updated a collection 11 days ago: Computer Use Models
Document & UI Intelligence
- xlangai/Aguvis-7B-720P
  8B • Updated • 46 • 9
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
  Paper • 2412.04454 • Published • 72
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
  Paper • 2401.10935 • Published • 5
- cckevinn/SeeClick
  Text Generation • 10B • Updated • 149 • 18
Medical MultiModal
Multimodal models that have been trained on medical datasets.
Computer Use Models
- ByteDance-Seed/UI-TARS-72B-DPO
  Image-Text-to-Text • 73B • Updated • 800 • 147
- ByteDance-Seed/UI-TARS-7B-DPO
  Image-Text-to-Text • 8B • Updated • 1.26k • 221
- microsoft/OmniParser
  Image-Text-to-Text • Updated • 469 • 1.7k
- jadechoghari/Ferret-UI-Llama8b
  Image-Text-to-Text • 8B • Updated • 281 • 68
Multimodal Models
A collection of multimodal models for the GPU-poor.
- google/paligemma-3b-pt-896
  Image-Text-to-Text • 3B • Updated • 276 • 123
- OpenGVLab/InternVL-Chat-V1-5
  Image-Text-to-Text • 26B • Updated • 7.75k • 416
- alexshengzhili/llava-v1.5-13b-dpo
  Text Generation • Updated • 11 • 5
- llava-hf/llava-v1.6-mistral-7b-hf
  Image-Text-to-Text • 8B • Updated • 367k • 297
Music