hbkang's Collections
cool-papers
Paper • 2406.09414 • Published • 104
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 52
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 40
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 116
GraCo: Granularity-Controllable Interactive Segmentation
Paper • 2405.00587 • Published
Paper • 2410.05258 • Published • 180
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 39
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 106
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 53
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper • 2501.05441 • Published • 94
Infecting Generative AI With Viruses
Paper • 2501.05542 • Published • 13
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Paper • 2501.14677 • Published • 36
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Paper • 2502.04320 • Published • 38
Diffusion Models without Classifier-free Guidance
Paper • 2502.12154 • Published • 7
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 7
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
Paper • 2502.19204 • Published • 11
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper • 2502.20321 • Published • 30
How far can we go with ImageNet for Text-to-Image generation?
Paper • 2502.21318 • Published • 26
AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding
Paper • 2503.01063 • Published • 5
Large Language Diffusion Models
Paper • 2502.09992 • Published • 121
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
Paper • 2503.01933 • Published • 12
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Paper • 2503.04724 • Published • 69
Forgetting Transformer: Softmax Attention with a Forget Gate
Paper • 2503.02130 • Published • 32
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Paper • 2503.08417 • Published • 8
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 72
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Paper • 2503.10636 • Published • 3
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Paper • 2503.11647 • Published • 142
Paper • 2503.16425 • Published • 16
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Paper • 2503.16660 • Published • 73
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Paper • 2503.20240 • Published • 22
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Paper • 2503.21732 • Published • 9
X^{2}-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Paper • 2503.21779 • Published • 4
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
Paper • 2503.19693 • Published • 77
Scaling Language-Free Visual Representation Learning
Paper • 2504.01017 • Published • 32
Gaussian Mixture Flow Matching Models
Paper • 2504.05304 • Published • 12
DDT: Decoupled Diffusion Transformer
Paper • 2504.05741 • Published • 76
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 132
Group Downsampling with Equivariant Anti-aliasing
Paper • 2504.17258 • Published • 9
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper • 2504.20966 • Published • 32
Training-Free Efficient Video Generation via Dynamic Token Carving
Paper • 2505.16864 • Published • 22
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Paper • 2505.11881 • Published • 4
Paper • 2506.10892 • Published • 38
JAFAR: Jack up Any Feature at Any Resolution
Paper • 2506.11136 • Published • 10
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
Paper • 2506.15682 • Published • 5
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
Paper • 2506.20452 • Published • 17
Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
Paper • 2506.22960 • Published • 4
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection
Paper • 2507.07994 • Published • 1