git+https://github.com/huggingface/evaluate@main
evaluate
gradio>=5.34.2
datasets
tqdm