Regarding the benchmark metrics, HLE(humanity last exam) should belong to higher-order reasoning, not the code domain.
· Sign up or log in to comment