Update content.py

content.py CHANGED (+2 -2)

@@ -14,8 +14,8 @@ Both multilingual and language-specific LLMs are welcome in this leaderboard.
 We currently evaluate models over four benchmarks:
 
 - <a href="https://arxiv.org/abs/1803.05457" target="_blank">  AI2 Reasoning Challenge </a> (25-shot)
-- <a href="https://arxiv.org/abs/1905.07830" target="_blank">  HellaSwag </a> (
-- <a href="https://arxiv.org/abs/2009.03300" target="_blank">  MMLU </a>  (
+- <a href="https://arxiv.org/abs/1905.07830" target="_blank">  HellaSwag </a> (0-shot)
+- <a href="https://arxiv.org/abs/2009.03300" target="_blank">  MMLU </a>  (25-shot)
 - <a href="https://arxiv.org/abs/2109.07958" target="_blank">  TruthfulQA </a> (0-shot)
 
 The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).
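
Since the change only touches shot counts embedded in hand-written HTML, one way to avoid this kind of edit in future is to keep the benchmark metadata in a small table and generate the list from it. The following is a minimal sketch, not the actual contents of content.py: the names `BENCHMARKS` and `render_benchmark_list` are hypothetical, and the shot counts are simply those shown on the added lines of this diff.

```python
from html import escape

# Hypothetical table of (name, paper URL, number of few-shot examples).
# Values are copied from the updated text in this diff, not from the real content.py.
BENCHMARKS = [
    ("AI2 Reasoning Challenge", "https://arxiv.org/abs/1803.05457", 25),
    ("HellaSwag", "https://arxiv.org/abs/1905.07830", 0),
    ("MMLU", "https://arxiv.org/abs/2009.03300", 25),
    ("TruthfulQA", "https://arxiv.org/abs/2109.07958", 0),
]


def render_benchmark_list(benchmarks):
    """Build one '- <a ...>name</a> (N-shot)' line per benchmark."""
    lines = []
    for name, url, shots in benchmarks:
        # Escape both the URL and the name so the generated HTML stays well-formed.
        link = f'<a href="{escape(url, quote=True)}" target="_blank">{escape(name)}</a>'
        lines.append(f"- {link} ({shots}-shot)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_benchmark_list(BENCHMARKS))
```

With this layout, changing a shot count is a one-value edit in the table rather than an edit inside an HTML string.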
