Mathieu Jouffroy PRO
CCMat
AI & ML interests
Computer Vision, NLP, Generative Models
Recent Activity
upvoted a collection 30 days ago: DINOv3
updated a collection 3 months ago: 3D Models
updated a collection 3 months ago: 3D Models
Organizations
None yet
RL
Visual Consistency
Inference Improvements
Img-Diffusion
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
  Paper • 2312.12491 • Published • 74
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
  Paper • 2401.11708 • Published • 30
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 67
- PALP: Prompt Aligned Personalization of Text-to-Image Models
  Paper • 2401.06105 • Published • 50
Personalization
- λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
  Paper • 2402.05195 • Published • 19
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  Paper • 2311.10093 • Published • 59
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 22
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 36
Depth & Segmentation
3D Models
- 3D-aware Image Generation using 2D Diffusion Models
  Paper • 2303.17905 • Published • 2
- GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
  Paper • 2310.08529 • Published • 18
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  Paper • 2310.16818 • Published • 32
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image
  Paper • 2312.04543 • Published • 22
Video
- Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
  Paper • 2401.15977 • Published • 39
- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 86
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  Paper • 2307.04725 • Published • 64
- Boximator: Generating Rich and Controllable Motions for Video Synthesis
  Paper • 2402.01566 • Published • 27
Transformers & Attention
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 81
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 27
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 114
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 158
MergingModels
Virtual TryOn
Agents
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
  Paper • 2402.01622 • Published • 37
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
  Paper • 2402.07456 • Published • 46
- Design2Code: How Far Are We From Automating Front-End Engineering?
  Paper • 2403.03163 • Published • 98
Fast Diffusion
- SDXL-Lightning: Progressive Adversarial Diffusion Distillation
  Paper • 2402.13929 • Published • 27
- Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
  Paper • 2403.12015 • Published • 70
- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
  Paper • 2404.19759 • Published • 27
Relighting
- Colorful Diffuse Intrinsic Image Decomposition in the Wild
  Paper • 2409.13690 • Published • 14
- Latent Intrinsics Emerge from Training to Relight
  Paper • 2405.21074 • Published • 1
- Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
  Paper • 2409.14677 • Published • 16
- SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
  Paper • 2501.09756 • Published • 19
VLM
- PaliGemma: A versatile 3B VLM for transfer
  Paper • 2407.07726 • Published • 72
- Vision language models are blind
  Paper • 2407.06581 • Published • 83
- PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
  Paper • 2404.16994 • Published • 36
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 46
3D World / Scene
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 53
- Disentangled 3D Scene Generation with Layout Learning
  Paper • 2402.16936 • Published • 12
- WonderWorld: Interactive 3D Scene Generation from a Single Image
  Paper • 2406.09394 • Published • 3
- VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
  Paper • 2504.01956 • Published • 40
LoRA
ID Preservation
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
  Paper • 2404.19427 • Published • 74
- ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
  Paper • 2404.16771 • Published • 19
- Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
  Paper • 2405.12970 • Published • 25
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation
  Paper • 2403.17008 • Published • 21
Style Transfer
- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 39
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 22
- Stylecodes: Encoding Stylistic Information For Image Generation
  Paper • 2411.12811 • Published • 12
- SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2411.10958 • Published • 56
Adapters & Controls
- X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
  Paper • 2312.02238 • Published • 28
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
  Paper • 2308.06721 • Published • 33
- T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
  Paper • 2302.08453 • Published • 11
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
  Paper • 2311.13600 • Published • 46
Upscaling & SR
- Exploiting Diffusion Prior for Real-World Image Super-Resolution
  Paper • 2305.07015 • Published • 4
- Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
  Paper • 2310.12004 • Published • 2
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 77
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
  Paper • 2312.12491 • Published • 74
Computer Vision
- YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
  Paper • 2402.13616 • Published • 49
- Vision Transformers Need Registers
  Paper • 2309.16588 • Published • 83
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 30
- SAMPart3D: Segment Any Part in 3D Objects
  Paper • 2411.07184 • Published • 28
Encoders
Mixture of Experts
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 25
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 28
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 53
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 56
StateSpaceModels
- ZigMa: Zigzag Mamba Diffusion Model
  Paper • 2403.13802 • Published • 18
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 111
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 39
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
  Paper • 2401.09417 • Published • 62
LLMs
- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 94
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 48
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 74
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 50
Audio
Data
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
  Paper • 2402.00159 • Published • 64
- Aria Everyday Activities Dataset
  Paper • 2402.13349 • Published • 31
- WildChat: 1M ChatGPT Interaction Logs in the Wild
  Paper • 2405.01470 • Published • 63
- OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
  Paper • 2407.02371 • Published • 54
UI
toread
- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 30
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 41
3D Understanding