Giuliano's Collections

Multimodal

Updated Jan 14

  • VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

    Paper • 2501.01957 • Published Jan 3 • 47

  • Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

    Paper • 2501.07542 • Published Jan 13 • 3