innovation64's Collections
Paper selecting
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 47

Linear Transformers are Versatile In-Context Learners
Paper • 2402.14180 • Published • 6

Training-Free Long-Context Scaling of Large Language Models
Paper • 2402.17463 • Published • 20

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 607

Evaluating Very Long-Term Conversational Memory of LLM Agents
Paper • 2402.17753 • Published • 19

Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 23

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper • 2403.03853 • Published • 62

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 184

Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 94

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968 • Published • 22

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 126

Evaluating Frontier Models for Dangerous Capabilities
Paper • 2403.13793 • Published • 7

The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 79

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Paper • 2405.00263 • Published • 16

Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
Paper • 2405.00664 • Published • 20

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 121

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Paper • 2405.09215 • Published • 20

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Paper • 2405.09220 • Published • 27

LoRA Learns Less and Forgets Less
Paper • 2405.09673 • Published • 88

Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Paper • 2405.10637 • Published • 21

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 47

2BP: 2-Stage Backpropagation
Paper • 2405.18047 • Published • 23

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper • 2405.20541 • Published • 22

4-bit Shampoo for Memory-Efficient Network Training
Paper • 2405.18144 • Published • 9

Transformers meet Neural Algorithmic Reasoners
Paper • 2406.09308 • Published • 44

Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Paper • 2406.09170 • Published • 26