bigcode/bigcodebench-evaluator · Collaborating on BigCodeBench Evaluator

Hi @abidlabs, would you (or the Gradio team) like to collaborate on the BigCodeBench Evaluator? I'm trying to build a fully reproducible code execution environment that lets users evaluate generated code in real time. Previously we only provided a Docker image, and users still got discrepant results on different machines. The main cause is differences in environment setup and configuration.

In building a scalable Space for real-time code execution, I'm facing a few challenges: (1) supporting multiple users executing code in parallel, each in a separate clean sandbox, and (2) properly cleaning up a finished sandbox and returning it to the pool. In addition, do you know how to display the backend execution logs only to the user who triggered them? The current logs appear to be visible to everyone.
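
To make challenges (1) and (2) concrete, here is a minimal sketch of the acquire, clean, and return pattern. It is not the evaluator's actual implementation: `SandboxPool` and `run_in_sandbox` are hypothetical names, and each "sandbox" is assumed to be a plain temporary directory, which gives no real isolation (a container or similar would be needed for that).

```python
import queue
import shutil
import subprocess
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path


class SandboxPool:
    """Hypothetical fixed-size pool of scratch directories (no real isolation)."""

    def __init__(self, size: int):
        self._pool: "queue.Queue[Path]" = queue.Queue()
        for _ in range(size):
            self._pool.put(Path(tempfile.mkdtemp(prefix="sandbox_")))

    @contextmanager
    def acquire(self, timeout: float = 30.0):
        # Blocks until a sandbox is free; raises queue.Empty on timeout.
        box = self._pool.get(timeout=timeout)
        try:
            yield box
        finally:
            # Wipe everything the run left behind, then return the
            # directory to the pool even if the run raised.
            for child in box.iterdir():
                if child.is_dir() and not child.is_symlink():
                    shutil.rmtree(child, ignore_errors=True)
                else:
                    child.unlink(missing_ok=True)
            self._pool.put(box)


def run_in_sandbox(pool: SandboxPool, code: str, timeout: float = 10.0) -> str:
    """Run `code` in a pooled directory and return its combined output."""
    with pool.acquire() as box:
        script = box / "main.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=box,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.stdout + proc.stderr
```

Because `run_in_sandbox` returns the captured output instead of printing it to the server's stdout, the log travels back with the request that produced it, which may be one way to approach the per-user logging question.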