Sankofa XGBoost Regression Model

📖 Model Overview

The Sankofa Regression Model is part of the Soulprint Archetype System, designed to measure how strongly a given text reflects the values of the Sankofa archetype.
It uses SentenceTransformer embeddings (all-mpnet-base-v2) as input features and an XGBoost regressor trained on a 1,000-row curated dataset.

Architecture: SentenceTransformer embeddings + XGBoost regression
Output Range: 0.0 → 1.0 (Sankofa alignment score)
Training Size: 1,000 rows (balanced distribution)

🌍 What is Sankofa?

The Sankofa archetype emphasizes learning from the past, honoring ancestral wisdom, and applying history to guide future actions.

High scores (0.7–1.0): Strong grounding in memory, reflection, and ancestral values
Mid scores (0.4–0.6): Some awareness of the past but shallow or inconsistent application
Low scores (0.0–0.3): Dismissal of history, impatience, or neglect of lessons from the past
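
The score bands above can be turned into a small lookup helper. This is a minimal sketch, not part of the released model: the function name `sankofa_band` and the "between bands" label are assumptions made for illustration. The listed ranges leave 0.3–0.4 and 0.6–0.7 uncovered, so the sketch reports those values separately rather than guessing which band they belong to.

```python
def sankofa_band(score: float) -> str:
    """Map a Sankofa alignment score to the qualitative band listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    if score >= 0.7:
        return "high"
    if 0.4 <= score <= 0.6:
        return "mid"
    if score <= 0.3:
        return "low"
    # Assumption: the listed ranges do not cover 0.3-0.4 or 0.6-0.7,
    # so scores in those gaps are reported as "between bands".
    return "between bands"


for s in (0.85, 0.5, 0.1, 0.65):
    print(s, "->", sankofa_band(s))
```

How to treat scores that fall in the gaps is a design choice; rounding to the nearest band would be an equally reasonable alternative.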

📊 Training & Evaluation

Training Methodology:

Inputs: Free-text statements
Labels: Float scores (0.0 → 1.0) for Sankofa alignment
Embeddings: all-mpnet-base-v2 from SentenceTransformers
Model: XGBoost regressor

Results:

MSE: 0.0143
RMSE: 0.1198
R²: 0.824

An RMSE of about 0.12 means prediction errors are typically around 0.12 on the 0.0 to 1.0 scale, and an R² of 0.824 means the model explains roughly 82% of the variance in the evaluated scores.
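
For readers who want to see how these three metrics relate, the following is a minimal sketch in plain Python. The helper name `regression_metrics` and the sample values are illustrative assumptions; they are not rows from the Sankofa dataset and do not reproduce the reported results. RMSE is the square root of MSE, and R² compares the residual error against the variance of the true scores.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true scores are identical")
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2


# Illustrative values only; these are not rows from the Sankofa dataset.
y_true = [0.9, 0.2, 0.5, 0.7]
y_pred = [0.8, 0.3, 0.5, 0.6]
mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.3f}")
```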

🚀 Intended Use

Measuring alignment of text to Sankofa archetype values
Research in Soulprint archetypes & culturally rooted AI models
Applications in AI agents, storytelling systems, and reflective analysis tools

⚠️ Limitations

The dataset is limited to 1,000 rows; performance could improve with more data.
The model is specific to Sankofa and should not be used to score other archetypes.
Interpretability is limited by the embedding model (all-mpnet-base-v2): the regressor operates on dense embedding dimensions rather than human-readable features.

💡 Example Usage

import joblib
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download

# -----------------------------
# 1. Download model from Hugging Face Hub
# -----------------------------
REPO_ID = "mjpsm/Sankofa-xgb-model"
FILENAME = "Sankofa_xgb_model.pkl"

model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# -----------------------------
# 2. Load model + embedder
# -----------------------------
model = joblib.load(model_path)
embedder = SentenceTransformer("all-mpnet-base-v2")

# -----------------------------
# 3. Example prediction
# -----------------------------
text = "The group studied old archives before planning, ensuring past mistakes were not repeated."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]

print("Predicted Sankofa Score:", round(float(score), 3))
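
The card lists the output range as 0.0 → 1.0, but a regressor is not inherently bounded to its label range, so raw predictions may occasionally land slightly below 0.0 or above 1.0. The source does not say whether the released model clips its outputs. The sketch below, with the hypothetical helper `clamp_scores`, shows one way to enforce the range after prediction; the raw values are made up for illustration.

```python
def clamp_scores(raw_scores, low=0.0, high=1.0):
    """Clip raw regressor outputs to the documented 0.0-1.0 score range."""
    return [min(max(float(s), low), high) for s in raw_scores]


# Hypothetical raw outputs, e.g. from model.predict(embedder.encode(texts)).
raw = [0.83, 1.04, -0.02]
print(clamp_scores(raw))
```

Because `clamp_scores` accepts any iterable of numbers, the array returned by `model.predict` in the example above can be passed to it directly.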
