MultiUI Meta Data

MultiUI-Meta's activity

yuanshengni authored a paper 6 days ago

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Paper • 2506.03930 • Published 7 days ago • 22

yuanshengni authored a paper 14 days ago

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

Paper • 2505.15929 • Published 20 days ago • 48

Solaris99 authored 4 papers 29 days ago

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

Paper • 2404.05955 • Published Apr 9, 2024

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15, 2024 • 25

AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories

Paper • 2410.07706 • Published Oct 10, 2024

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published 29 days ago • 79

yuanshengni authored a paper 4 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 103

yuanshengni authored 2 papers 8 months ago

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Paper • 2406.05862 • Published Jun 9, 2024 • 4

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14, 2024 • 39

yuanshengni authored a paper 9 months ago

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published Sep 4, 2024 • 32

yuanshengni authored a paper 12 months ago

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21, 2024 • 18

yuanshengni authored 2 papers about 1 year ago

GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published Jun 6, 2024 • 23

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3, 2024 • 47

Solaris99 authored a paper over 1 year ago

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

Paper • 2403.02502 • Published Mar 4, 2024 • 3

yuanshengni authored 2 papers over 1 year ago

A Comprehensive Study of Knowledge Editing for Large Language Models

Paper • 2401.01286 • Published Jan 2, 2024 • 21

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35

Team members: 2