Universal-Multimodal-Agent (UMA)

New: Multimodal Datasets Catalog (Phase 1 Data Collection)

We’re kicking off data collection for UMA. Below is a curated, growing catalog of leading public multimodal datasets, organized by category, with brief notes and links for immediate use.

A. Text–Image

B. Text–Image Reasoning / VQA / Document QA

C. Text–Table (Structured Data)

D. Text–Audio / Speech

E. Full Multimodal / Multi-domain (Text–Image–Table–Audio, and more)

F. Safety, Bias, and Accessibility-focused Sets

G. Licensing and Usage Notes

  • Always check each dataset’s license and terms of use; some require access requests or restrict commercial use.
  • Maintain separate manifests with source, license, checksum, and intended use. Prefer mirrored, deduplicated shards with exact provenance.

Call for Collaboration: Build UMA with Us

We’re assembling an open team. If you’re passionate about agentic multimodal AI, join us.

Roles we’re seeking (volunteer or sponsored collaborations):

  • Research Scientists: Multimodal learning, alignment, grounding, evaluation.
  • Research Engineers: Training pipelines, distributed systems, retrieval, tool-use interfaces.
  • Data Scientists / Data Engineers: Dataset curation, cleaning, deduplication, data governance.
  • Domain Experts: Finance, healthcare, education, accessibility, scientific communication.
  • Accessibility Specialists: Inclusive design, alt-text/sonification, screen-reader workflows, disability advocacy.
  • MLOps/Infra: Dataset storage, versioning, scalable training/evaluation infrastructure (HF Datasets, WebDataset, Parquet, Arrow).
  • Community & Documentation: Tutorials, examples, benchmark harnesses, governance.

How to get involved now:

Initial roadmap for data:

  • Phase 1: Curate public datasets and licenses; build manifests and downloaders
  • Phase 2: Unified preprocessing (image, OCR, tables, audio), deduping, quality filters
  • Phase 3: Balanced training mixtures + eval suites (MMMU/MMBench/DocVQA/ASR)
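One possible starting point for the Phase 2 deduping step is exact duplicate removal by hashing a normalized form of each record. The normalization rules below are assumptions for illustration, not a settled pipeline (near-duplicate detection, e.g. MinHash, would come on top of this):

```python
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())

def dedupe_exact(records: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized record; drop exact repeats."""
    seen: set[str] = set()
    kept: list[str] = []
    for record in records:
        key = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept

samples = ["A cat on a mat.", "a cat  on a mat.", "A dog in fog."]
print(dedupe_exact(samples))  # the second sample is dropped as a duplicate
```

Hashing normalized content rather than raw bytes catches the most common duplication mode in scraped corpora (same text with incidental casing/whitespace differences) while staying cheap enough to run over full shards.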

Ethics & Safety:

  • Respect dataset licenses, privacy, and consent. Implement filter lists and red-teaming sets.
  • Document known biases and limitations; provide opt-out mechanisms where applicable.
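The filter-list idea above can be prototyped very simply. The terms and the whole-word matching rule below are placeholders; a production filter would need curated, reviewed term lists, multilingual coverage, and context-aware rules:

```python
import re

# Placeholder blocklist for illustration only; real deployments maintain
# curated, reviewed, and versioned term lists.
BLOCKLIST = {"badterm", "anotherbadterm"}

def passes_filter(text: str) -> bool:
    """Return True if no blocklisted term appears as a whole word."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return BLOCKLIST.isdisjoint(tokens)

print(passes_filter("a clean caption"))        # True
print(passes_filter("contains badterm here"))  # False
```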

Contributors will be acknowledged in the README and in a future preprint.

Original Project Overview

[Existing content retained below]
