Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14 • 34
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Paper • 2501.06751 • Published Jan 12 • 32
view post Post 14786 Google drops Gemini 2.0 Flash Thinkinga new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and morenow available in anychat, try it out: akhaliq/anychat See translation 3 replies · 🚀 10 10 🔥 5 5 👍 3 3 👀 2 2 + Reply
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024 • 11
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 14
view post Post 15026 QwQ-32B-Preview is now available in anychatA reasoning model that is competitive with OpenAI o1-mini and o1-previewtry it out: akhaliq/anychat See translation 1 reply · ❤️ 3 3 👀 2 2 + Reply
view post Post 4302 New model drop in anychatallenai/Llama-3.1-Tulu-3-8B is now availabletry it here: akhaliq/anychat See translation 🔥 3 3 👍 1 1 + Reply
view post Post 3208 anychatsupports chatgpt, gemini, perplexity, claude, meta llama, grok all in one apptry it out there: akhaliq/anychat ❤️ 7 7 🚀 4 4 🔥 2 2 + Reply
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence Paper • 2305.14334 • Published May 23, 2023 • 1
Readout Guidance: Learning Control from Diffusion Features Paper • 2312.02150 • Published Dec 4, 2023 • 3
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14, 2024 • 26
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2, 2024 • 17
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 26
Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features Paper • 2308.11485 • Published Aug 22, 2023 • 1
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers Paper • 2409.04196 • Published Sep 6, 2024 • 15