McGill-NLP/gemma-2-9b-it-Injongo-slot

@@ -25,12 +25,49 @@ base_model:
 library_name: transformers
 metrics:
 - f1
 ---
 # INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
-<!--
 ## Evaluation Comparison
- -->
 ## Language Codes
 - **eng**: English
@@ -51,13 +88,6 @@ metrics:
 - **yor**: Yoruba
 - **zul**: Zulu
-## Notes
-- **Bold** values indicate the best performing scores in each category
-- The highlighted models (AfroXLMR 76L) show the top overall performance
-- Multi-lingual training generally outperforms in-language training
-- Standard deviations are reported alongside average scores
-- AVG does not include English results.
 ### Citation
 ```

 library_name: transformers
 metrics:
 - f1
+tags:
+- llama-factory
+- full
+- generated_from_trainer
 ---
 # INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
 ## Evaluation Comparison
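+Zero-shot performance of LLMs on intent detection and slot filling, compared with the supervised fine-tuned (SFT) Gemma 2 9B IT models linked in the last row of each table.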
+### Intent Detection
+*Metric: accuracy. Each score is averaged over five templates; AVG is computed over the African languages only (English excluded).*
+| Model | eng | amh | ewe | hau | ibo | kin | lin | lug | orm | sna | sot | swa | twi | wol | xho | yor | zul | *AVG* |
+|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
+| Llama 3.1 8B | 27.6 | 1.9 | 2.1 | 4.8 | 5.5 | 3.3 | 5.3 | 2.4 | 1.6 | 2.8 | 2.9 | 14.1 | 2.6 | 4.0 | 3.2 | 3.5 | 2.8 | 3.9±2.4 |
+| Gemma 2 9B | 77.6 | 49.2 | 6.1 | 40.8 | 31.5 | 23.8 | 22.2 | 23.2 | 7.7 | 29.7 | 19.9 | 70.0 | 21.0 | 13.8 | 40.1 | 32.2 | 36.3 | 29.2±8.7 |
+| Aya-101 13B | 65.3 | 62.9 | 13.4 | 57.8 | 56.9 | 40.4 | 27.8 | 33.9 | 20.8 | 51.2 | 43.9 | 65.9 | 27.2 | 19.7 | 58.1 | 45.9 | 53.2 | 42.4±9.1 |
+| Gemma 2 27B | 79.5 | 47.2 | 6.3 | 46.5 | 36.9 | 26.7 | 27.5 | 26.1 | 5.8 | 36.7 | 25.6 | 75.5 | 21.2 | 16.4 | 50.2 | 34.8 | 44.3 | 33.0±9.6 |
+| Llama 3.3 70B | 81.1 | 56.2 | 9.5 | 52.3 | 52.4 | 35.0 | 37.5 | 37.7 | 12.4 | 32.3 | 30.5 | 80.6 | 29.3 | 20.9 | 43.5 | 41.4 | 43.9 | 38.5±9.5 |
+| Gemini 1.5 Pro | **81.8** | 77.9 | 24.3 | 74.8 | 65.4 | 61.5 | 54.6 | 59.3 | 39.3 | 68.6 | 51.6 | 83.2 | 47.2 | 25.6 | 76.2 | 66.8 | 68.7 | 59.1±9.6 |
+| GPT-4o (Aug) | 80.9 | 76.0 | 15.1 | 80.7 | 71.8 | 64.7 | 56.4 | 68.2 | 59.3 | 75.5 | 59.7 | 84.5 | 58.6 | 43.7 | 79.6 | 77.0 | 71.2 | 65.1±9.3 |
+| [Gemma 2 9B IT (SFT)](https://huggingface.co/McGill-NLP/gemma-2-9b-it-Injongo-intent) | 81.2 | **83.3** | **77.1** | **89.8** | **86.7** | **78.6** | **85.8** | **83.6** | **84.6** | **87.7** | **76.8** | **88.8** | **82.6** | **85.1** | **89.1** | **87.9** | **78.9** | **84.1** |
+### Slot Filling
+*Metric: F1-score. Each score is averaged over five templates; AVG is computed over the African languages only (English excluded).*
+| Model | eng | amh | ewe | hau | ibo | kin | lin | lug | orm | sna | sot | swa | twi | wol | xho | yor | zul | *AVG* |
+|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
+| Llama 3.1 8B | 25.0 | 3.7 | 5.6 | 11.1 | 12.6 | 8.5 | 9.1 | 10.1 | 2.8 | 9.9 | 11.5 | 17.3 | 11.2 | 9.2 | 2.6 | 11.0 | 9.0 | 9.1±2.2 |
+| Gemma 2 IT 9B | 34.1 | 4.5 | 0.3 | 7.4 | 10.6 | 5.0 | 6.0 | 5.6 | 0.1 | 7.3 | 10.8 | 21.2 | 2.4 | 2.6 | 2.2 | 5.2 | 8.2 | 6.2±2.9 |
+| Aya-101 13B | 21.4 | 8.2 | 7.9 | 11.8 | 14.6 | 12.2 | 9.4 | 15.5 | 3.6 | 15.0 | 17.0 | 16.2 | 13.8 | 14.0 | 2.8 | 9.6 | 10.6 | 11.4±2.4 |
+| Gemma 2 IT 27B | 49.8 | 15.7 | 9.5 | 24.1 | 25.2 | 21.7 | 15.2 | 28.4 | 2.6 | 29.8 | 28.0 | 40.2 | 24.3 | 23.3 | 4.5 | 28.1 | 31.0 | 22.0±5.8 |
+| Llama 3.3 70B Instruct | 52.6 | 26.3 | 22.0 | 29.5 | 35.0 | 31.4 | 25.0 | 30.4 | 9.3 | 29.5 | 36.4 | 40.7 | 35.6 | 36.4 | 6.9 | 34.2 | 31.9 | 28.8±5.2 |
+| Gemini 1.5 Pro | 52.8 | 15.2 | 18.7 | 31.9 | 35.8 | 34.4 | 34.9 | 34.4 | 12.2 | 36.8 | 43.0 | 37.5 | 34.5 | 34.2 | 6.9 | 33.2 | 38.6 | 30.1±6.1 |
+| GPT-4o (Aug) | 55.4 | 22.8 | 19.4 | 37.8 | 38.9 | 36.4 | 33.5 | 35.3 | 13.0 | 40.2 | 40.9 | 46.5 | 40.1 | 37.9 | 10.0 | 42.4 | 37.6 | 33.3±6.0 |
+| [Gemma 2 9B IT (SFT)](https://huggingface.co/McGill-NLP/gemma-2-9b-it-Injongo-slot) | **80.6** | **80.7** | **82.0** | **92.2** | **81.3** | **75.5** | **88.5** | **85.8** | **81.1** | **82.5** | **77.2** | **87.7** | **86.3** | **82.9** | **89.6** | **88.4** | **68.8** | **83.1** |
+**Bold** values indicate the best performance for each language/metric.
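As a sanity check on how the *AVG* column is defined, the sketch below recomputes it for the two "Gemma 2 9B IT (SFT)" rows as an unweighted mean over the 16 African languages, leaving out `eng`. The scores are copied from the tables above. The helper name `african_avg` is hypothetical and not part of any released evaluation code, and the use of a plain unweighted mean is an assumption. Because the inputs are already rounded to one decimal, the result should only be expected to agree with the reported 84.1 and 83.1 to within rounding.

```python
# Column order used in both tables above.
LANGS = ["eng", "amh", "ewe", "hau", "ibo", "kin", "lin", "lug", "orm",
         "sna", "sot", "swa", "twi", "wol", "xho", "yor", "zul"]

# Per-language scores copied from the "Gemma 2 9B IT (SFT)" rows.
INTENT_SFT = [81.2, 83.3, 77.1, 89.8, 86.7, 78.6, 85.8, 83.6, 84.6,
              87.7, 76.8, 88.8, 82.6, 85.1, 89.1, 87.9, 78.9]
SLOT_SFT = [80.6, 80.7, 82.0, 92.2, 81.3, 75.5, 88.5, 85.8, 81.1,
            82.5, 77.2, 87.7, 86.3, 82.9, 89.6, 88.4, 68.8]


def african_avg(scores, langs=LANGS, exclude=("eng",)):
    """Unweighted mean over all languages except those in `exclude`.

    Hypothetical helper for illustration only.
    """
    kept = [s for lang, s in zip(langs, scores) if lang not in exclude]
    return sum(kept) / len(kept)


intent_avg = african_avg(INTENT_SFT)
slot_avg = african_avg(SLOT_SFT)
print(f"intent AVG (no eng): {intent_avg:.2f}")
print(f"slot AVG (no eng):   {slot_avg:.2f}")
```

Passing `exclude=()` includes English in the mean, which gives a quick way to see how much the English column would shift the average.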
 ## Language Codes
 - **eng**: English
 - **yor**: Yoruba
 - **zul**: Zulu
 ### Citation
 ```