WildEval

non-profit

wild_eval

AI & ML interests

None defined yet.

Recent Activity

yuntian-deng

authored a paper 4 days ago

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

Paper • 2507.08800 • Published 8 days ago • 64

ChengsongHuang

authored a paper about 1 month ago

POSS: Position Specialist Generates Better Draft for Speculative Decoding

Paper • 2506.03566 • Published Jun 4 • 6

DongfuJiang

authored 2 papers about 2 months ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Paper • 2505.20139 • Published May 26 • 18

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22 • 42

yuntian-deng

authored a paper about 2 months ago

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Paper • 2505.15612 • Published May 21 • 34

DongfuJiang

authored a paper about 2 months ago

General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20 • 23

yuntian-deng

authored 3 papers 3 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 70

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Paper • 2407.17468 • Published Jul 24, 2024

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 71

yuchenlin

updated a Space 3 months ago

Zebra Logic Bench

🦓

Explore and evaluate Zebra Logic models

yuchenlin

authored a paper 3 months ago

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Paper • 2504.00043 • Published Mar 30 • 9

ChengsongHuang

authored a paper 3 months ago

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Paper • 2504.00043 • Published Mar 30 • 9

lasha-nlp

authored a paper 4 months ago

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models

Paper • 2503.12072 • Published Mar 15

ChengsongHuang

authored 4 papers 5 months ago

On Grounded Planning for Embodied Tasks with Language Models

Paper • 2209.00465 • Published Aug 29, 2022 • 1

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Paper • 2405.04086 • Published May 7, 2024 • 2

Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning

Paper • 2410.10074 • Published Oct 14, 2024 • 1

Efficient Test-Time Scaling via Self-Calibration

Paper • 2503.00031 • Published Feb 25 • 15

faezeb

authored a paper 5 months ago

Large-Scale Data Selection for Instruction Tuning

Paper • 2503.01807 • Published Mar 3 • 13

yuchenlin

authored a paper 5 months ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17 • 38

DongfuJiang

authored a paper 5 months ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published Feb 3 • 29

Team members 9
