Makandal Pre-trained
Model Details
This model was created from scratch by Palmis Labs AI for educational purposes. It is the first base model we built to validate our model-creation pipeline and to demonstrate transformer architecture fundamentals to Haitian students.
Creation Process
Model Description
- Developed by: Palmis Labs AI
- Funded by: Jean Sauvenel Beaudry
- Model type: GPT (Generative Pre-trained Transformer)
- Language(s) (NLP): Haitian Creole
- License: MIT
- Model size: 111.9M parameters
- Architecture: GPT-2 style decoder-only transformer
Model Sources
- Repository: https://huggingface.co/jsbeaudry/makandal-pre-trained
- Paper: N/A (Educational project)
Uses
Direct Use
This model is designed for educational purposes to demonstrate the process of training a language model from scratch. It can generate text in Haitian Creole, though with significant limitations due to minimal training data and time.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


def generate(model, tokenizer, prompt, device):
    # Tokenize the prompt and move it to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(device)
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        repetition_penalty=1.2,
        no_repeat_ngram_size=3,
        temperature=0.9,
        top_k=40,
        top_p=0.85,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("jsbeaudry/makandal-pre-trained")
model = AutoModelForCausalLM.from_pretrained("jsbeaudry/makandal-pre-trained")

# GPT-2 style tokenizers often ship without a pad token; fall back to the EOS
# token so that padding in generate() works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
prompt = "Literati"
response = generate(model, tokenizer, prompt, device)
print(response)

# Example output:
# Literati Ayisyèn se yon aspè enpòtan nan kilti ak istwa peyi d Ayiti. Li konsène tout ekriti,
# yon seri de tèks, pwezi, woman, ak lòt fòm literè ki soti nan peyi a oswa ki ekri nan lang kreyòl
# ayisyen oswa franse. Literati sa a se yon repons kiltirèl ak sosyal ki reflete eksperyans pèp ayisyen an,
# tankou Ayiti. Lè nou pale de tèks literè pa sèlman konsantre sou fason pou konprann divèsite kiltirèl,
# nou ap diskite, nou ka gen yon priyorite, men li deplase atravè lemond pawòl yo te kòmanse
```
Out-of-Scope Use
This model should NOT be used for:
- Production applications
- Critical decision-making systems
- Any application requiring reliable or factual outputs
- Commercial deployment without significant additional training
Bias, Risks, and Limitations
⚠️ Educational Use Only: This model is intended solely for learning purposes and has significant limitations:
- Insufficient training data: Only 4.7 MB of training data used
- Limited training time: Only 1.5 hours of training
- High hallucination rate: Model frequently generates inaccurate or nonsensical content
- Language coverage: Limited Haitian Creole language understanding due to minimal dataset
- Bias: May reflect biases present in the small training dataset
Recommendations
- Use exclusively as an educational tool to teach students about transformer architecture and training processes
- Do not rely on outputs for factual information
- Supervise usage in educational settings
- Consider this a proof-of-concept for the technical pipeline rather than a functional language model
Training Details
Training Data
Dataset: 4.7 MB of STEM-focused plain text in Haitian Creole
Data Processing: Custom tokenizer with a vocabulary size of 33,977 tokens
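The project's tokenizer-training script is not published. The sketch below shows one way a tokenizer with this vocabulary size could be built with the Hugging Face `tokenizers` library; the byte-level BPE choice, the corpus file name, and the special token are assumptions, not confirmed project details.

```python
# Illustrative sketch only: everything below except the 33,977-token
# vocabulary size reported above is an assumption.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["haitian_creole_stem_corpus.txt"],  # hypothetical 4.7 MB plain-text file
    vocab_size=33977,                          # matches the reported vocabulary size
    min_frequency=2,                           # assumed frequency cutoff
    special_tokens=["<|endoftext|>"],          # assumed GPT-2 style end-of-text token
)
tokenizer.save_model("makandal-tokenizer")     # writes vocab.json and merges.txt
```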
Training Procedure
Training Hyperparameters:
- Total Parameters: 111,853,824
- Training Steps: 5,600
- Training Time: 1.5 hours
- Epochs: 100
- Architecture Configuration:
  - n_layer: 12
  - n_head: 12
  - n_embd: 768
  - n_positions: 1024
  - n_ctx: 1024
  - activation_function: "gelu_new"
  - attn_pdrop: 0.1
  - embd_pdrop: 0.1
  - resid_pdrop: 0.1
  - initializer_range: 0.02
  - layer_norm_epsilon: 1e-05
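As a minimal sketch (not the project's actual training script), a GPT-2 model matching this configuration can be instantiated with Transformers as follows; the resulting parameter count lines up with the roughly 111.9M reported above.

```python
# Minimal sketch: build a randomly initialized GPT-2 model with the
# configuration listed above. Not the project's training code.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=33977,               # size of the custom tokenizer
    n_layer=12,
    n_head=12,
    n_embd=768,
    n_positions=1024,               # n_ctx (1024) in the original config mirrors this value
    activation_function="gelu_new",
    attn_pdrop=0.1,
    embd_pdrop=0.1,
    resid_pdrop=0.1,
    initializer_range=0.02,
    layer_norm_epsilon=1e-05,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters")  # roughly 112M, as reported above
```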
Training Infrastructure:
- GPU: Tesla T4 (15GB)
- Framework: Transformers/PyTorch
Training Loss Progression:
| Step | Training Loss |
|------|---------------|
| 50   | 6.651700 |
| 100  | 5.100200 |
| 150  | 4.631900 |
| 200  | 4.264000 |
| 250  | 3.855000 |
| 300  | 3.556900 |
| ...  | ... |
| 5500 | 0.020400 |
| 5550 | 0.020200 |
| 5600 | 0.019300 |
Environmental Impact
Training was conducted using a single Tesla T4 GPU for 1.5 hours, representing minimal computational resources and environmental impact compared to large-scale model training.
Technical Specifications
- Model Architecture: GPT-2 style decoder-only transformer
- Precision: FP32
- Framework Compatibility: Transformers, PyTorch
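As an illustrative check (not part of the official documentation), the published checkpoint can be loaded explicitly in FP32 with the Transformers/PyTorch stack and its size verified:

```python
# Illustrative check that the weights load as FP32 tensors at the reported size.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "jsbeaudry/makandal-pre-trained",
    torch_dtype=torch.float32,           # explicit, though FP32 is also the default
)
print(next(model.parameters()).dtype)    # torch.float32
print(f"{model.num_parameters():,}")     # ~111.9M parameters
```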
Citation
```bibtex
@misc{makandal2025,
  title={Makandal-pretrain: An Educational Haitian Creole Language Model},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry/makandal-pre-trained}},
  note={Educational demonstration model}
}
```
Glossary
Makandal: Named after François Makandal, an 18th-century Haitian revolutionary leader, symbolizing the model's connection to Haitian culture and education.
Contact
For questions about this educational project, please visit the repository or contact Palmis Labs AI through the Hugging Face model page.