Makandal Pre-trained

Model Details

This model was created from scratch by Palmis Labs AI for educational purposes. It is the first base model we built to validate the model-creation process and to demonstrate transformer architecture fundamentals to Haitian students.

Creation Process

All You Need To Start

Model Description

  • Developed by: Palmis Labs AI
  • Funded by: Jean Sauvenel Beaudry
  • Model type: GPT (Generative Pre-trained Transformer)
  • Language(s) (NLP): Haitian Creole
  • License: MIT
  • Model size: 111.9M parameters
  • Architecture: GPT-2 style decoder-only transformer

Model Sources

Uses

Direct Use

This model is designed for educational purposes to demonstrate the process of training a language model from scratch. It can generate text in Haitian Creole, though with significant limitations due to minimal training data and time.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

def generate(model, tokenizer, prompt, device):
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(device)
    # Sample up to 100 new tokens; the repetition penalty and no-repeat n-grams
    # reduce the degenerate loops a small model like this tends to produce
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        repetition_penalty=1.2,
        no_repeat_ngram_size=3,
        temperature=0.9,
        top_k=40,
        top_p=0.85,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # Decode the first (and only) sequence in the batch
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("jsbeaudry/makandal-pre-trained")
model = AutoModelForCausalLM.from_pretrained("jsbeaudry/makandal-pre-trained")

# Fall back to the EOS token if the tokenizer does not define a pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Generate text
prompt = "Literati"
response = generate(model, tokenizer, prompt, device)
print(response)

# Example output:
# Literati Ayisyèn se yon aspè enpòtan nan kilti ak istwa peyi d Ayiti. Li konsène tout ekriti,
# yon seri de tèks, pwezi, woman, ak lòt fòm literè ki soti nan peyi a oswa ki ekri nan lang kreyòl
# ayisyen oswa franse. Literati sa a se yon repons kiltirèl ak sosyal ki reflete eksperyans pèp ayisyen an,
# tankou Ayiti. Lè nou pale de tèks literè pa sèlman konsantre sou fason pou konprann divèsite kiltirèl,
# nou ap diskite, nou ka gen yon priyorite, men li deplase atravè lemond pawòl yo te kòmanse

Out-of-Scope Use

This model should NOT be used for:

  • Production applications
  • Critical decision-making systems
  • Any application requiring reliable or factual outputs
  • Commercial deployment without significant additional training

Bias, Risks, and Limitations

⚠️ Educational Use Only: This model is intended solely for learning purposes and has significant limitations:

  • Insufficient training data: Only 4.7 MB of training data used
  • Limited training time: Only 1.5 hours of training
  • High hallucination rate: Model frequently generates inaccurate or nonsensical content
  • Language coverage: Limited Haitian Creole language understanding due to minimal dataset
  • Bias: May reflect biases present in the small training dataset

Recommendations

  • Use exclusively as an educational tool to teach students about transformer architecture and training processes
  • Do not rely on outputs for factual information
  • Supervise usage in educational settings
  • Consider this a proof-of-concept for the technical pipeline rather than a functional language model

Training Details

Training Data

Dataset: 4.7 MB of STEM-focused plain text in Haitian Creole

Data Processing: Custom tokenizer with vocabulary size of 33,977 tokens
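
The card does not say which tokenizer algorithm was used, only its vocabulary size. Purely as an illustration, the sketch below trains a byte-level BPE tokenizer with the Hugging Face tokenizers library to that vocabulary size; the corpus file name is a hypothetical placeholder, not the actual dataset path.

from tokenizers import ByteLevelBPETokenizer

# Illustrative only: the real tokenizer algorithm and corpus file are not
# documented; "creole_stem_corpus.txt" is a hypothetical path.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["creole_stem_corpus.txt"],
    vocab_size=33977,
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("makandal-tokenizer")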

Training Procedure

Training Hyperparameters:

  • Total Parameters: 111,853,824
  • Training Steps: 5,600
  • Training Time: 1.5 hours
  • Epochs: 100
  • Architecture Configuration:
    • n_layer: 12
    • n_head: 12
    • n_embd: 768
    • n_positions: 1024
    • n_ctx: 1024
    • activation_function: "gelu_new"
    • attn_pdrop: 0.1
    • embd_pdrop: 0.1
    • resid_pdrop: 0.1
    • initializer_range: 0.02
    • layer_norm_epsilon: 1e-05
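
For reference, the configuration above maps directly onto a Hugging Face GPT2Config. The sketch below shows that mapping (it is not the exact training script); with the tokenizer's 33,977-token vocabulary, a randomly initialized model of roughly the reported 111.9M parameters follows from it.

from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of the architecture above as a GPT2Config; vocab_size comes from
# the custom tokenizer described in the Training Data section.
config = GPT2Config(
    vocab_size=33977,
    n_positions=1024,
    n_ctx=1024,
    n_embd=768,
    n_layer=12,
    n_head=12,
    activation_function="gelu_new",
    attn_pdrop=0.1,
    embd_pdrop=0.1,
    resid_pdrop=0.1,
    initializer_range=0.02,
    layer_norm_epsilon=1e-05,
)
model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 111.9M parameters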

Training Infrastructure:

  • GPU: Tesla T4 (15GB)
  • Framework: Transformers/PyTorch
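
The optimizer settings, batch size, and learning rate are not documented. As a sketch of how such a run could be wired up with the Hugging Face Trainer, the example below uses assumed values for those fields; only the epoch count, 50-step logging interval, and FP32 precision reflect the figures reported on this card, and the dataset object is hypothetical.

from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Sketch only: batch size, learning rate, and the dataset are assumptions.
training_args = TrainingArguments(
    output_dir="makandal-pretrain",
    num_train_epochs=100,            # reported epoch count
    per_device_train_batch_size=8,   # assumed, not documented
    learning_rate=5e-4,              # assumed, not documented
    logging_steps=50,                # matches the 50-step loss log below
    fp16=False,                      # released weights are FP32
    report_to="none",
)

trainer = Trainer(
    model=model,                        # GPT2LMHeadModel from the config sketch above
    args=training_args,
    train_dataset=tokenized_dataset,    # hypothetical pre-tokenized Creole dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()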

Training Loss Progression:

Step    Training Loss
50      6.651700
100     5.100200
150     4.631900
200     4.264000
250     3.855000
300     3.556900
...
5500    0.020400
5550    0.020200
5600    0.019300

Environmental Impact

Training was conducted using a single Tesla T4 GPU for 1.5 hours, representing minimal computational resources and environmental impact compared to large-scale model training.

Technical Specifications

  • Model Architecture: GPT-2 style decoder-only transformer
  • Precision: FP32
  • Framework Compatibility: Transformers, PyTorch
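
A quick way to confirm the reported size and precision is to load the published checkpoint and inspect its parameters:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("jsbeaudry/makandal-pre-trained")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")                        # ~111.9M
print(next(model.parameters()).dtype == torch.float32)   # True: FP32 weights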

Citation

@misc{makandal2025,
  title={Makandal-pretrain: An Educational Haitian Creole Language Model},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry/makandal-pre-trained}},
  note={Educational demonstration model}
}

Glossary

Makandal: Named after François Makandal, an 18th-century Haitian revolutionary leader, symbolizing the model's connection to Haitian culture and education.

Contact

For questions about this educational project, please visit the repository or contact Palmis Labs AI through the Hugging Face model page.
