-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 144 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 109 -
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 41 -
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 28
Collections
Discover the best community collections!
Collections including paper arxiv:2406.09414
-
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 80 -
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 60 -
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 30 -
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 56
-
EdgeFusion: On-Device Text-to-Image Generation
Paper • 2404.11925 • Published • 21 -
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 43 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 47 -
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Paper • 2404.07724 • Published • 12
-
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 27 -
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 27 -
Probing the 3D Awareness of Visual Foundation Models
Paper • 2404.08636 • Published • 12 -
AM-RADIO: Agglomerative Model -- Reduce All Domains Into One
Paper • 2312.06709 • Published • 1
-
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology
Paper • 2203.00585 • Published • 2 -
Emerging Properties in Self-Supervised Vision Transformers
Paper • 2104.14294 • Published • 3 -
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Paper • 2404.06903 • Published • 18 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 30
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 188 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44 -
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper • 2403.04692 • Published • 40
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 21 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 80 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 144 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 9 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 16 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 59 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 73
-
GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
Paper • 2312.11458 • Published • 4 -
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Paper • 2312.11392 • Published • 19 -
LangSplat: 3D Language Gaussian Splatting
Paper • 2312.16084 • Published • 14 -
Human101: Training 100+FPS Human Gaussians in 100s from 1 View
Paper • 2312.15258 • Published • 7
-
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper • 2312.00777 • Published • 21 -
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Paper • 2312.03641 • Published • 20 -
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Paper • 2312.04557 • Published • 12 -
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Paper • 2312.04433 • Published • 9