llms for me - a maxinwalk Collection

llms for me

updated Nov 27, 2024

Evaluating Very Long-Term Conversational Memory of LLM Agents

Paper • 2402.17753 • Published Feb 27, 2024 • 20
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Paper • 2402.16671 • Published Feb 26, 2024 • 29
Do Large Language Models Latently Perform Multi-Hop Reasoning?

Paper • 2402.16837 • Published Feb 26, 2024 • 27
Divide-or-Conquer? Which Part Should You Distill Your LLM?

Paper • 2402.15000 • Published Feb 22, 2024 • 23
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19, 2024 • 19
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23, 2024 • 16
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Paper • 2402.14658 • Published Feb 22, 2024 • 82
AgentScope: A Flexible yet Robust Multi-Agent Platform

Paper • 2402.14034 • Published Feb 21, 2024 • 13
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22, 2024 • 11
User-LLM: Efficient LLM Contextualization with User Embeddings

Paper • 2402.13598 • Published Feb 21, 2024 • 20
Coercing LLMs to do and reveal (almost) anything

Paper • 2402.14020 • Published Feb 21, 2024 • 13
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 13
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

Paper • 2402.10963 • Published Feb 13, 2024 • 12
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 31
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Paper • 2402.10466 • Published Feb 16, 2024 • 19
RLVF: Learning from Verbal Feedback without Overgeneralization

Paper • 2402.10893 • Published Feb 16, 2024 • 12
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 105
ReGAL: Refactoring Programs to Discover Generalizable Abstractions

Paper • 2401.16467 • Published Jan 29, 2024 • 10

Note: Adding code templates can improve the accuracy and stability of code generation.
Capture the Flag: Uncovering Data Insights with Large Language Models

Paper • 2312.13876 • Published Dec 21, 2023 • 1
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25, 2024 • 66
Recourse for reclamation: Chatting with generative language models

Paper • 2403.14467 • Published Mar 21, 2024 • 8
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models

Paper • 2403.15157 • Published Mar 22, 2024 • 10
Can large language models explore in-context?

Paper • 2403.15371 • Published Mar 22, 2024 • 33
Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2, 2024 • 45
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 37
Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2, 2024 • 58
Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27, 2024 • 25
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55
Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13, 2024 • 28
How Far Can We Go with Practical Function-Level Program Repair?

Paper • 2404.12833 • Published Apr 19, 2024 • 7
INDUS: Effective and Efficient Language Models for Scientific Applications

Paper • 2405.10725 • Published May 17, 2024 • 35
TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11, 2024 • 31
LLMs Do Not Think Step-by-step In Implicit Reasoning

Paper • 2411.15862 • Published Nov 24, 2024 • 10