Oolel: A High-Performing Open LLM for Wolof
Despite numerous open-source innovations in large language models, African languages have remained underrepresented.
Soynade Research is transforming this landscape with Oolel, the first open-source language model for Wolof.
Built on the Qwen 2.5 architecture, Oolel combines state-of-the-art AI technology with deep Wolof linguistic expertise. Using carefully curated, high-quality data, we trained and optimized Oolel for the following tasks:
- RAG supporting Wolof queries with English, French, or Wolof context
- Bidirectional translation between English and Wolof
- Natural text generation in Wolof
- Math in Wolof
- Many other standard NLP tasks, including:
  - Summarization
  - Text editing
  - etc.
Usage
!!! It's important to add your system prompt !!!

The snippet below shows how to load the tokenizer and model, build the prompt with apply_chat_template, and generate content.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bfloat16 and place it automatically on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    "soynade-research/Oolel-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("soynade-research/Oolel-v0.1")

def generate_response(messages, max_new_tokens=1024, temperature=0.1):
    # Build the chat prompt from the message list
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling must be enabled for temperature to take effect
        temperature=temperature,
    )
    # Keep only the newly generated tokens, dropping the prompt
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
Some task examples:
- Translation tasks

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate to Wolof: Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old"}
]
print(generate_response(messages))
```
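Translation works in the other direction as well. Here is a minimal sketch of Wolof-to-English translation, reusing a Wolof sentence from the multi-turn example further below; the prompt phrasing is our own illustration, not a format prescribed by the model:

```python
messages = [
    {"role": "system", "content": system_prompt},
    # Reused from the multi-turn example: a Wolof sentence describing ECOWAS
    {"role": "user", "content": "Translate to English: CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi."}
]
print(generate_response(messages))
```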
- Code generation

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    # Wolof: "Write a Python class that shows how to use dataframes in Pandas"
    {"role": "user", "content": "Bindal ab klaas Python buy wone ni ñuy jëfandikoo dataframe yi ci Pandas"}
]
print(generate_response(messages))
```
- Problem solving

```python
from pprint import pprint

system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    # Wolof: "Can you show me how to solve this problem: Fatou bought 3 kilos of rice, 2 kilos of oil
    # and 5 kilos of sugar. Rice is 500 CFA per kilo, oil 1200 CFA per kilo, sugar 750 CFA per kilo.
    # How much does she have to pay?"
    {"role": "user", "content": "Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay?"}
]
pprint(generate_response(messages))
```
- Text generation (e.g. story generation)

```python
system_prompt = "You are a skilled Wolof storyteller (Gewël) with deep knowledge of African folktales and traditions. Write engaging stories in Wolof that reflect African cultural values and wisdom."
messages = [
    {"role": "system", "content": system_prompt},
    # Wolof: "Write a folktale about the lion that ate the cat"
    {"role": "user", "content": "Bindal ab léeb ci gaynde gi lekk muus mi"}
]
# A higher temperature gives more creative output
print(generate_response(messages, temperature=0.9))
```
- Multi-turn conversations

Oolel is not optimized for multi-turn conversations, but you can try it!

```python
messages = [
    # Wolof: "Tell me, what is ECOWAS? What does it do?"
    {"role": "user", "content": "Wax ma clan mooy CEDEAO ? Ci lan la liggeey?"},
    # Wolof: "ECOWAS is an organization that brings together the countries of the African region.
    # It works on the economy, politics, and agreements between countries."
    {"role": "assistant", "content": "CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi. Mu ngi sukkandiku ci wàll économie, politig, ak déggoo diggante reew yi"},
    # Wolof: "How many countries are members?"
    {"role": "user", "content": "ñaata reew ñoo ci bokk?"}
]
print(generate_response(messages))
```
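- RAG with context

Oolel also supports RAG-style prompting (see the task list above), where retrieved context in English, French, or Wolof accompanies a Wolof query. Below is a minimal sketch assuming a simple "context + question" layout; both the layout and the Wolof question are our own illustration, not a format prescribed by the model card:

```python
# Retrieved passage (English) and an illustrative Wolof question about it
context = "Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old."
question = "Ñaata at la président bi am?"  # Wolof: "How old is the president?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
]
print(generate_response(messages))
```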
Authors
- Yaya SY: NLP Researcher (Efficient Continued Pretraining)
- Dioula DOUCOURE: Data & NLP Engineer