Collections
Collections including paper arxiv:2404.05669
-
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 43
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Paper • 2404.14351 • Published • 5
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper • 2404.17672 • Published • 18
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 64
-
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 27
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 21
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 7
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1
-
Insightful analysis of historical sources at scales beyond human capabilities using unsupervised Machine Learning and XAI
Paper • 2310.09091 • Published • 2
Evolution and Transformation of Scientific Knowledge over the Sphaera Corpus: A Network Study
Paper • 2004.00520 • Published • 2
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 16
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86
-
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
Paper • 2311.17128 • Published • 2
Data Generation for Post-OCR correction of Cyrillic handwriting
Paper • 2311.15896 • Published • 3
An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics
Paper • 2208.11484 • Published • 3
Transformer based Urdu Handwritten Text Optical Character Reader
Paper • 2206.04575 • Published • 2