Makandal Pre-trained

Model Details

This model was created from scratch by Palmis Labs AI for educational purposes. It is the first base model we built to validate the model-creation process and to demonstrate transformer architecture fundamentals to Haitian students.

Creation Process

All You Need To Start

Model Description

  • Developed by: Palmis Labs AI
  • Funded by: Jean Sauvenel Beaudry
  • Model type: GPT (Generative Pre-trained Transformer)
  • Language(s) (NLP): Haitian Creole
  • License: MIT
  • Model size: 111.9M parameters
  • Architecture: GPT-2 style decoder-only transformer

Model Sources

Uses

Direct Use

This model is designed for educational purposes to demonstrate the process of training a language model from scratch. It can generate text in Haitian Creole, though with significant limitations due to minimal training data and time.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

def generate(model, tokenizer, prompt, device):
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(device)
    # Sample up to 100 new tokens; the repetition penalty and no-repeat n-grams
    # reduce the degenerate loops a small model like this tends to produce
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        repetition_penalty=1.2,
        no_repeat_ngram_size=3,
        temperature=0.9,
        top_k=40,
        top_p=0.85,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # Decode the first (and only) sequence in the batch
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("jsbeaudry/makandal-pre-trained")
model = AutoModelForCausalLM.from_pretrained("jsbeaudry/makandal-pre-trained")

# Fall back to the EOS token if the tokenizer does not define a pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
prompt = "Literati"
response = generate(model, tokenizer, prompt, device)
print(response)

# Example output:
# Literati Ayisyèn se yon aspè enpòtan nan kilti ak istwa peyi d Ayiti. Li konsène tout ekriti,
# yon seri de tèks, pwezi, woman, ak lòt fòm literè ki soti nan peyi a oswa ki ekri nan lang kreyòl
# ayisyen oswa franse. Literati sa a se yon repons kiltirèl ak sosyal ki reflete eksperyans pèp ayisyen an,
# tankou Ayiti. Lè nou pale de tèks literè pa sèlman konsantre sou fason pou konprann divèsite kiltirèl,
# nou ap diskite, nou ka gen yon priyorite, men li deplase atravè lemond pawòl yo te kòmanse

Out-of-Scope Use

This model should NOT be used for:

  • Production applications
  • Critical decision-making systems
  • Any application requiring reliable or factual outputs
  • Commercial deployment without significant additional training

Bias, Risks, and Limitations

⚠️ Educational Use Only: This model is intended solely for learning purposes and has significant limitations:

  • Insufficient training data: Only 4.7 MB of training data used
  • Limited training time: Only 1.5 hours of training
  • High hallucination rate: Model frequently generates inaccurate or nonsensical content
  • Language coverage: Limited Haitian Creole language understanding due to minimal dataset
  • Bias: May reflect biases present in the small training dataset

Recommendations

  • Use exclusively as an educational tool to teach students about transformer architecture and training processes
  • Do not rely on outputs for factual information
  • Supervise usage in educational settings
  • Consider this a proof-of-concept for the technical pipeline rather than a functional language model

Training Details

Training Data

Dataset: 4.7 MB of STEM-focused plain text in Haitian Creole

Data Processing: Custom tokenizer with vocabulary size of 33,977 tokens
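
The card does not say which tokenizer algorithm was used, only its vocabulary size. Purely as an illustration, the sketch below trains a byte-level BPE tokenizer with the Hugging Face tokenizers library to that vocabulary size; the corpus file name is a hypothetical placeholder, not the actual dataset path.

from tokenizers import ByteLevelBPETokenizer

# Illustrative only: the real tokenizer algorithm and corpus file are not
# documented; "creole_stem_corpus.txt" is a hypothetical path.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["creole_stem_corpus.txt"],
    vocab_size=33977,
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("makandal-tokenizer")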

Training Procedure

Training Hyperparameters:

  • Total Parameters: 111,853,824
  • Training Steps: 5,600
  • Training Time: 1.5 hours
  • Epochs: 100
  • Architecture Configuration:
    • n_layer: 12
    • n_head: 12
    • n_embd: 768
    • n_positions: 1024
    • n_ctx: 1024
    • activation_function: "gelu_new"
    • attn_pdrop: 0.1
    • embd_pdrop: 0.1
    • resid_pdrop: 0.1
    • initializer_range: 0.02
    • layer_norm_epsilon: 1e-05
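
For reference, the configuration above maps directly onto a Hugging Face GPT2Config. The sketch below shows that mapping (it is not the exact training script); with the tokenizer's 33,977-token vocabulary, a randomly initialized model of roughly the reported 111.9M parameters follows from it.

from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of the architecture above as a GPT2Config; vocab_size comes from
# the custom tokenizer described in the Training Data section.
config = GPT2Config(
    vocab_size=33977,
    n_positions=1024,
    n_ctx=1024,
    n_embd=768,
    n_layer=12,
    n_head=12,
    activation_function="gelu_new",
    attn_pdrop=0.1,
    embd_pdrop=0.1,
    resid_pdrop=0.1,
    initializer_range=0.02,
    layer_norm_epsilon=1e-05,
)
model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 111.9M parameters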

Training Infrastructure:

  • GPU: Tesla T4 (15GB)
  • Framework: Transformers/PyTorch
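
The optimizer settings, batch size, and learning rate are not documented. As a sketch of how such a run could be wired up with the Hugging Face Trainer, the example below uses assumed values for those fields; only the epoch count, 50-step logging interval, and FP32 precision reflect the figures reported on this card, and the dataset object is hypothetical.

from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Sketch only: batch size, learning rate, and the dataset are assumptions.
training_args = TrainingArguments(
    output_dir="makandal-pretrain",
    num_train_epochs=100,            # reported epoch count
    per_device_train_batch_size=8,   # assumed, not documented
    learning_rate=5e-4,              # assumed, not documented
    logging_steps=50,                # matches the 50-step loss log below
    fp16=False,                      # released weights are FP32
    report_to="none",
)

trainer = Trainer(
    model=model,                        # GPT2LMHeadModel from the config sketch above
    args=training_args,
    train_dataset=tokenized_dataset,    # hypothetical pre-tokenized Creole dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()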

Training Loss Progression:

Step    Training Loss
50      6.651700
100     5.100200
150     4.631900
200     4.264000
250     3.855000
300     3.556900
...
5500    0.020400
5550    0.020200
5600    0.019300

Environmental Impact

Training was conducted using a single Tesla T4 GPU for 1.5 hours, representing minimal computational resources and environmental impact compared to large-scale model training.

Technical Specifications

  • Model Architecture: GPT-2 style decoder-only transformer
  • Precision: FP32
  • Framework Compatibility: Transformers, PyTorch
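
A quick way to confirm the reported size and precision is to load the published checkpoint and inspect its parameters:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("jsbeaudry/makandal-pre-trained")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")                        # ~111.9M
print(next(model.parameters()).dtype == torch.float32)   # True: FP32 weights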

Citation

@misc{makandal2025,
  title={Makandal-pretrain: An Educational Haitian Creole Language Model},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry/makandal-pre-trained}},
  note={Educational demonstration model}
}

Glossary

Makandal: Named after François Makandal, an 18th-century Haitian revolutionary leader, symbolizing the model's connection to Haitian culture and education.

Contact

For questions about this educational project, please visit the repository or contact Palmis Labs AI through the Hugging Face model page.
