Collaborating on BigCodeBench Evaluator

#1
by terryyz - opened
BigCode org
edited Jul 23

Hi @abidlabs, would you (or the Gradio team) like to collaborate on the BigCodeBench Evaluator? I'm trying to build a fully reproducible code execution environment that lets users evaluate generated code in real time. Previously we only provided a Docker image, and we found that users still got inconsistent results when running it on different machines, mainly because of differences in environment setup and configuration.

While building a scalable Space for real-time code execution, I've run into a few challenges: (1) letting multiple users execute their code in parallel, each in its own clean sandbox, and (2) properly cleaning up each finished sandbox and returning it to the pool. Also, do you know how to display the backend execution logs only to the user who triggered them? At the moment the logs seem to be visible to everyone.
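To make challenges (1) and (2) concrete, here is a minimal Python sketch of one possible pool design, not how the evaluator actually works. The `SandboxPool` class and its methods are hypothetical. It assumes a "sandbox" is just a scratch directory plus a separate Python process with a timeout. That gives each run a clean working directory but is **not** a security boundary; a real deployment would likely need containers or a similar isolation layer.

```python
import os
import queue
import shutil
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


class SandboxPool:
    """Hypothetical pool of scratch directories; each run gets exclusive use of one."""

    def __init__(self, size: int) -> None:
        self._free: "queue.Queue[str]" = queue.Queue()
        for _ in range(size):
            self._free.put(tempfile.mkdtemp(prefix="sandbox-"))

    def _reset(self, path: str) -> None:
        # Wipe everything the previous run left behind so the next user starts clean.
        for name in os.listdir(path):
            full = os.path.join(path, name)
            if os.path.isdir(full) and not os.path.islink(full):
                shutil.rmtree(full)
            else:
                os.remove(full)

    def run(self, code: str, timeout: float = 10.0) -> str:
        """Run `code` in a free sandbox and return its captured output."""
        path = self._free.get()  # blocks until some sandbox is free
        try:
            script = os.path.join(path, "solution.py")
            with open(script, "w") as f:
                f.write(code)
            try:
                proc = subprocess.run(
                    [sys.executable, script],
                    cwd=path, capture_output=True, text=True, timeout=timeout,
                )
                return proc.stdout + proc.stderr
            except subprocess.TimeoutExpired:
                return f"timed out after {timeout}s"
        finally:
            # Clean the sandbox and hand it back to the pool, even on errors.
            self._reset(path)
            self._free.put(path)

    def close(self) -> None:
        # Delete all idle sandboxes (call only once no runs are in flight).
        while True:
            try:
                path = self._free.get_nowait()
            except queue.Empty:
                break
            shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    pool = SandboxPool(size=2)
    snippets = [f"print({i} * {i})" for i in range(4)]
    # Four concurrent requests share two sandboxes; extra requests wait in get().
    with ThreadPoolExecutor(max_workers=4) as ex:
        for log in ex.map(pool.run, snippets):
            print(log, end="")
    pool.close()
```

`run` returns the captured output as a string instead of printing it to the server's stdout. In a Gradio app, returning that string from the event handler would send it only to the session that made the request, which is one possible route to per-user logs.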

terryyz changed discussion status to closed
