Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published 27 days ago • 23
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 56
xLAM models Collection xLAM: A Family of Large Action Models to Empower AI Agent Systems • 9 items • Updated 14 days ago • 40
LLM Compiler Collection Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated Jun 27 • 147
Mistral Bangla Collection A collection of Bangla Mistral 7B models fine-tuned for context-based question answering and Bengali retrieval-augmented generation. • 5 items • Updated May 25 • 1
Bangla Llama Collection A collection of Bangla Llama 3 8B models fine-tuned for context-based question answering and Bengali retrieval-augmented generation. • 6 items • Updated May 26 • 3
Article Synthetic dataset generation techniques: generating custom sentence similarity data By davanstrien • May 23 • 14
🚀GGUF Collection Llama.cpp compatible models, can be used on CPUs and GPUs! • 698 items • Updated 4 days ago • 30
Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • Jun 29 • 33
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA May 24, 2023 • 80
A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30 • 10
Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate Paper • 2401.16788 • Published Jan 30 • 1
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 160
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training Paper • 2309.10400 • Published Sep 19, 2023 • 25
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 239
Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 56
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 250
Learning to Route Among Specialized Experts for Zero-Shot Generalization Paper • 2402.05859 • Published Feb 8 • 5
Zephyr ORPO Collection Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated Apr 12 • 16
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Paper • 2404.05405 • Published Apr 8 • 7
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 39
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 75
NEFTune: Noisy Embeddings Improve Instruction Finetuning Paper • 2310.05914 • Published Oct 9, 2023 • 14
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 103
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 63
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29 • 24
PERL: Parameter Efficient Reinforcement Learning from Human Feedback Paper • 2403.10704 • Published Mar 15 • 56
DRAGON Models Collection Production-grade RAG-optimized 6-7B parameter models - "Delivering RAG on ..." the leading foundation base models • 20 items • Updated 27 days ago • 44
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 23
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages Paper • 2403.01926 • Published Mar 4 • 1
Unifying Vision, Text, and Layout for Universal Document Processing Paper • 2212.02623 • Published Dec 5, 2022 • 10