Chiung-Yi

AI & ML interests

AI for math

authored 4 papers 2 months ago

When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity

Paper • 2509.20293 • Published Sep 24, 2025 • 7

Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models

Paper • 2508.12461 • Published Aug 17, 2025 • 2

StreetMath: Study of LLMs' Approximation Behaviors

Paper • 2510.25776 • Published Oct 27, 2025 • 4

Active Learning Methods for Efficient Data Utilization and Model Performance Enhancement

Paper • 2504.16136 • Published Apr 21, 2025