Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets
Walia-LLM is a fine-tuned LLaMA-2 model for the Amharic language, created by instruction tuning with task-specific and generative datasets. It is part of our effort to adapt and improve LLMs for low-resource languages.
This model was introduced in the EMNLP 2024 Findings paper "Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets".
Model Details
- Base model: LLaMA-2
- Fine-tuning method: Supervised fine-tuning (SFT) using LoRA (see the sketch after this list)
- Language: Amharic
- Tasks:
  - Sentiment analysis
  - Question answering
  - Named entity recognition
  - News classification
  - Summarization
  - Machine translation
  - Poem/story/lyrics generation
  - Spelling correction
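The paper does not spell out the adapter hyperparameters in this card, so the following is only a minimal sketch of how a LoRA-based SFT setup is commonly wired together with the Hugging Face `transformers` and `peft` libraries. The rank, alpha, dropout, and target modules are illustrative assumptions, not the values used to train Walia-LLM.

```python
# Illustrative LoRA setup for supervised fine-tuning (values are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    lora_dropout=0.05,                    # adapter dropout (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

The adapted model can then be trained with a standard causal-LM trainer on the instruction dataset described below.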
Training Data
The model was trained on a custom instruction dataset derived from:
- Existing NLP benchmarks (e.g., AfriSenti, AmharicQA, MasakhaNER, MasakhaNews, XL-Sum)
- Manually collected generative datasets (e.g., religious lyrics, stories, poems)
- Translated instruction datasets (e.g., Alpaca, Dolly)
See EthioNLP/walia-amharic-instructions for the dataset used.
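A minimal way to inspect the instruction data, assuming the dataset is hosted on the Hugging Face Hub under that repository id with a default configuration and a `train` split (column names may differ in practice):

```python
from datasets import load_dataset

# Assumes the dataset repo exposes a "train" split with instruction/response fields.
ds = load_dataset("EthioNLP/walia-amharic-instructions", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one instruction example
```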
Intended Use
This model is intended for:
- Research on instruction tuning in low-resource languages
- Generative NLP tasks in Amharic
- Evaluating multilingual LLM capabilities
Limitations
- Some generative outputs may be verbose or imprecise.
- Limited understanding of highly specific Amharic poetic or lyrical structures.
- Spelling correction and NER performance are still under exploration.
Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("EthioNLP/Amharic-LLAMA-all-data")
tokenizer = AutoTokenizer.from_pretrained("EthioNLP/Amharic-LLAMA-all-data")

prompt = "<Amharic instruction>"  # replace with an Amharic prompt, e.g. a question or a summarization request

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Citation
@inproceedings{azime-etal-2024-walia,
title = "Walia-{LLM}: Enhancing {A}mharic-{LL}a{MA} by Integrating Task-Specific and Generative Datasets",
author = "Azime, Israel Abebe and Tonja, Atnafu Lambebo and Belay, Tadesse Destaw and Fuge, Mitiku Yohannes and Wassie, Aman Kassahun and Jada, Eyasu Shiferaw and Chanie, Yonas and Sewunetie, Walelign Tewabe and Yimam, Seid Muhie",
editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.25/",
doi = "10.18653/v1/2024.findings-emnlp.25",
pages = "432--444"
}