Model eval request FAILED ... how do we know the root cause?
I requested an eval for our new model three times, but all attempts failed, even though the eval runs successfully when we run lighteval ourselves.
Lighteval Run Commands
!git clone https://github.com/huggingface/lighteval.git
%cd lighteval
!pip install -e . && pip install accelerate
!wget https://raw.githubusercontent.com/huggingface/lighteval/main/examples/tasks/all_arabic_tasks.txt -O examples/tasks/all_arabic_tasks.txt
%env HF_DATASETS_TRUST_REMOTE_CODE=1
!accelerate launch -m lighteval accelerate \
    --model_args="pretrained=silma-ai/SILMA-9B-Instruct-v0.1.1,trust_remote_code=True" \
    --custom_tasks community_tasks/arabic_evals.py \
    --tasks examples/tasks/all_arabic_tasks.txt \
    --override_batch_size 1 --save_details --output_dir="./output_gpt2"
Model request file below:
https://huggingface.co/datasets/OALL/requests/commit/c6a182a11b637ed7787bbedab46f63d5c690f1a9
My question: how can we determine the cause of the failure on your side so we can resolve the issue?
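In case it helps others debug from the outside, one way to check the recorded status is to pull the request JSON directly. This is only a sketch, assuming the OALL/requests repo follows the Open LLM Leaderboard request layout where each file carries a status field; the filename below is hypothetical, so substitute the actual path from the commit linked above.

import json
from huggingface_hub import hf_hub_download

# Filename is hypothetical -- use the real path from the commit above.
path = hf_hub_download(
    repo_id="OALL/requests",
    repo_type="dataset",
    filename="silma-ai/SILMA-9B-Instruct-v0.1.1_eval_request.json",
)
with open(path) as f:
    request = json.load(f)
print(request.get("status"))  # e.g. PENDING / RUNNING / FAILED
print(request)                # full record, in case it carries more detail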
Hey @karimouda,
Apologies for the late reply. I see that the model is based on Gemma2 9B, which also fails to run (we are still investigating the issue).
The main issue is that you are launching your evals with the trust_remote_code=True flag, which we don't support!
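As a quick sanity check (a sketch, not an official diagnosis), you can test whether the model itself even needs the flag. Assuming the checkpoint is a standard Gemma2 architecture that transformers supports natively, its config should load with trust_remote_code left at its default; if it does, the flag is only being pulled in by the datasets, not the model.

from transformers import AutoConfig

# trust_remote_code defaults to False; if this succeeds, the model
# architecture is natively supported and does not need the flag.
config = AutoConfig.from_pretrained("silma-ai/SILMA-9B-Instruct-v0.1.1")
print(config.model_type)  # expected: "gemma2"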
Thanks Ali for your response. Is there anything we could do on our side to make it work, or should we wait until the Gemma2 issue is resolved?
Also, as far as I understand, trust_remote_code=True is mandatory for the Arabic datasets used in Lighteval. Is there a way we could run the eval without it?
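For anyone hitting the same wall, here is a quick local check (a sketch, not an official workaround): try loading one of the datasets behind the Arabic tasks with trust_remote_code disabled. The dataset name below is only an illustrative assumption; substitute any dataset referenced in community_tasks/arabic_evals.py. If the call raises, that dataset ships a loading script and the leaderboard's restriction applies to it.

from datasets import load_dataset

try:
    # trust_remote_code=False refuses to execute any dataset loading script
    load_dataset("OALL/ACVA", trust_remote_code=False)  # hypothetical example dataset
    print("loads without remote code")
except Exception as err:
    print(f"requires remote code: {err}")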