Collections including paper arxiv:2403.04132

- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
  Paper • 2403.04132 • Published • 38
- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 18
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 16
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 10

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18

- FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
  Paper • 2402.10986 • Published • 76
- Aria Everyday Activities Dataset
  Paper • 2402.13349 • Published • 29
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
  Paper • 2403.04132 • Published • 38
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 75

- Unicron: Economizing Self-Healing LLM Training at Scale
  Paper • 2401.00134 • Published • 9
- Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
  Paper • 2401.00788 • Published • 21
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
  Paper • 2401.04398 • Published • 21
- The Impact of Reasoning Step Length on Large Language Models
  Paper • 2401.04925 • Published • 16

- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 11
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
  Paper • 2305.01210 • Published • 4
- AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
  Paper • 2309.06495 • Published • 1
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35

- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 30
- Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
  Paper • 2312.17172 • Published • 26
- Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
  Paper • 2401.01974 • Published • 5
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27

- Levels of AGI: Operationalizing Progress on the Path to AGI
  Paper • 2311.02462 • Published • 33
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
  Paper • 2206.04615 • Published • 5
- A Survey on Evaluation of Large Language Models
  Paper • 2307.03109 • Published • 42
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
  Paper • 2306.13651 • Published • 15