-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2412.17739
-
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
Paper • 2501.03916 • Published • 16 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 99 -
Agent Laboratory: Using LLM Agents as Research Assistants
Paper • 2501.04227 • Published • 92 -
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 100
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 49 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 73 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 39
-
CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published • 19 -
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 49 -
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 48 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16
-
Differential Transformer
Paper • 2410.05258 • Published • 180 -
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Paper • 2410.20424 • Published • 41 -
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 42 -
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 125
-
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 42 -
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Paper • 2312.03788 • Published • 1 -
FlatQuant: Flatness Matters for LLM Quantization
Paper • 2410.09426 • Published • 15 -
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Paper • 2501.01005 • Published • 1
-
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 18 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 106 -
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 42 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10
-
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 87 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 106 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 369 -
YuLan-Mini: An Open Data-efficient Language Model
Paper • 2412.17743 • Published • 67
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 59 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 43 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 62
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 42 -
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Paper • 2312.03788 • Published • 1 -
FlatQuant: Flatness Matters for LLM Quantization
Paper • 2410.09426 • Published • 15 -
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Paper • 2501.01005 • Published • 1
-
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
Paper • 2501.03916 • Published • 16 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 99 -
Agent Laboratory: Using LLM Agents as Research Assistants
Paper • 2501.04227 • Published • 92 -
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 100
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 49 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 73 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 39
-
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 18 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 106 -
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 42 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10
-
CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published • 19 -
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 49 -
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 48 -
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 16
-
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 87 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 106 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 369 -
YuLan-Mini: An Open Data-efficient Language Model
Paper • 2412.17743 • Published • 67
-
Differential Transformer
Paper • 2410.05258 • Published • 180 -
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Paper • 2410.20424 • Published • 41 -
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 42 -
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 125
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 59 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 43 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 62