Mark Redito
markredito
AI & ML interests
Generative AI, Multimodal AI, Deep Learning
Organizations
Audio
- Retrieval-Augmented Text-to-Audio Generation
  Paper • 2309.08051 • Published • 7
- A Large-scale Dataset for Audio-Language Representation Learning
  Paper • 2309.11500 • Published • 10
- End-to-End Speech Recognition Contextualization with Large Language Models
  Paper • 2309.10917 • Published • 10
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 9
Multimodal
- Compositional Foundation Models for Hierarchical Planning
  Paper • 2309.08587 • Published • 11
- DreamLLM: Synergistic Multimodal Comprehension and Creation
  Paper • 2309.11499 • Published • 59
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
  Paper • 2309.15091 • Published • 33
- Context-Aware Meta-Learning
  Paper • 2310.10971 • Published • 17
experiments
3D
LLMs
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
  Paper • 2309.07430 • Published • 27
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 53
- Investigating Answerability of LLMs for Long-Form Question Answering
  Paper • 2309.08210 • Published • 14
Interpretability
Music Generation
robotics
Image Generation
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
  Paper • 2309.06380 • Published • 32
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
  Paper • 2309.05793 • Published • 50
- Generative Image Dynamics
  Paper • 2309.07906 • Published • 53
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
  Paper • 2309.15818 • Published • 18