- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 80
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 60
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 30
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56
Collections including paper arxiv:2404.14219
- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 18
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 10
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 48
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 39
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 39
- microsoft/phi-1_5
  Text Generation • Updated • 131k • 1.31k
- Language models scale reliably with over-training and on downstream tasks
  Paper • 2403.08540 • Published • 14
- Akashpb13/Swahili_xlsr
  Automatic Speech Recognition • Updated • 15 • 8
- MoAI: Mixture of All Intelligence for Large Language and Vision Models
  Paper • 2403.07508 • Published • 75
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 253
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 118
- Synthesizing Text-to-SQL Data from Weak and Strong LLMs
  Paper • 2408.03256 • Published • 10
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 47
- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 48
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 41
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 182
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 67
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 62
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 29
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 253
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 73
- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 42
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 14
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published