EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria (arXiv:2309.13633, published Sep 24, 2023)
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models (arXiv:2310.08491, published Oct 12, 2023)
Aligning Large Language Models through Synthetic Feedback (arXiv:2305.13735, published May 23, 2023)
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning (arXiv:2305.14045, published May 23, 2023)
Who Wrote this Code? Watermarking for Code Generation (arXiv:2305.15060, published May 24, 2023)
Dialogue Summaries as Dialogue States (DS2), Template-Guided Summarization for Few-shot Dialogue State Tracking (arXiv:2203.01552, published Mar 3, 2022)
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models (arXiv:2405.01535, published May 2, 2024)
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models (arXiv:2406.05761, published Jun 9, 2024)