More detailed usage examples are needed.

#3
by lycfight - opened

I couldn’t find any documentation explaining how to use this model. I used OpenHands to generate an output.jsonl file for SWE-bench_Verified, and now I’d like to evaluate it using all-hands/openhands-critic-32b-exp-20250417.

It seems that this model requires a specific version of vLLM, but I’m not sure how to run it. Should the evaluation be performed on the final patch only, or on the entire trajectory?

Could you provide more detailed documentation and examples?
