sugatoray's Collections
Papers-Fundamentals
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper
•
2104.09864
•
Published
•
13
Attention Is All You Need
Paper
•
1706.03762
•
Published
•
62
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
•
2404.03715
•
Published
•
62
Zero-Shot Tokenizer Transfer
Paper
•
2405.07883
•
Published
•
5
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper
•
2401.02994
•
Published
•
51
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper
•
2406.06608
•
Published
•
64
Extreme Compression of Large Language Models via Additive Quantization
Paper
•
2401.06118
•
Published
•
13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
•
2402.03300
•
Published
•
120
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction
Paper
•
2401.17948
•
Published
•
4
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Paper
•
2405.20233
•
Published
•
6
Stream of Search (SoS): Learning to Search in Language
Paper
•
2404.03683
•
Published
•
32
Xmodel-2 Technical Report
Paper
•
2412.19638
•
Published
•
27
Transformer²: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
55
Foundations of Large Language Models
Paper
•
2501.09223
•
Published
•
3
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
391
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
•
2502.01534
•
Published
•
40
Levels of AGI: Operationalizing Progress on the Path to AGI
Paper
•
2311.02462
•
Published
•
37
Large Language Diffusion Models
Paper
•
2502.09992
•
Published
•
116
A Survey on Post-training of Large Language Models
Paper
•
2503.06072
•
Published
•
4
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper
•
2503.09573
•
Published
•
72
Transformers without Normalization
Paper
•
2503.10622
•
Published
•
163
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper
•
2503.21460
•
Published
•
77
rasbt/llama-3.2-from-scratch
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Paper
•
2505.01658
•
Published
•
31
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper
•
2505.09343
•
Published
•
55