---
language: en
license: mit
tags:
- regression
- xgboost
- embeddings
- soulprint
- sankofa
datasets:
- custom
metrics:
- mse
- rmse
- r2
model-index:
- name: Sankofa_xgb_model
  results:
  - task:
      type: regression
      name: Archetype Regression
    metrics:
    - name: MSE
      type: mean_squared_error
      value: 0.0143
    - name: RMSE
      type: root_mean_squared_error
      value: 0.1198
    - name: R²
      type: r2_score
      value: 0.824
---

# Sankofa XGBoost Regression Model

## 📖 Model Overview
The **Sankofa Regression Model** is part of the **Soulprint Archetype System**. It scores how strongly a given text reflects the values of the **Sankofa archetype**.
Each text is encoded with **SentenceTransformer embeddings** (`all-mpnet-base-v2`), and the embedding is passed to an **XGBoost regressor** trained on a **curated 1,000-row dataset**.

- **Architecture**: SentenceTransformer embeddings + XGBoost regression
- **Output Range**: 0.0 → 1.0 (Sankofa alignment score)
- **Training Size**: 1,000 rows (balanced distribution)

---

## 🌍 What is Sankofa?
The **Sankofa archetype** emphasizes **learning from the past, honoring ancestral wisdom, and applying history to guide future actions**.
- **High scores (0.7–1.0)**: Strong grounding in memory, reflection, and ancestral values
- **Mid scores (0.4–0.6)**: Some awareness of the past but shallow or inconsistent application
- **Low scores (0.0–0.3)**: Dismissal of history, impatience, or neglect of lessons from the past

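
The score bands above can be turned into a small lookup helper. This is a minimal sketch, not part of the released model: the function name `sankofa_band` is hypothetical, and because the listed ranges leave 0.3–0.4 and 0.6–0.7 unassigned, the sketch assumes cut points at 0.4 and 0.7 so that every score in the output range gets a label.

```python
def sankofa_band(score: float) -> str:
    """Map a Sankofa alignment score to a coarse band label.

    Assumption: 0.4 and 0.7 are used as cut points, so the unassigned
    gaps 0.3-0.4 and 0.6-0.7 fall into "low" and "mid" respectively.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "mid"
    return "low"
```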
---

## 📊 Training & Evaluation

**Training Methodology**:
- Inputs: Free-text statements
- Labels: Float scores (0.0 → 1.0) for Sankofa alignment
- Embeddings: `all-mpnet-base-v2` from SentenceTransformers
- Model: XGBoost regressor

**Results**:
- **MSE**: 0.0143
- **RMSE**: 0.1198
- **R²**: 0.824

An RMSE of about 0.12 means predictions typically deviate from the labelled score by roughly 0.12 on the 0.0 → 1.0 scale, and the R² of 0.824 means the model explains about **82% of the variance** in the labelled scores.

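
For reference, the three reported metrics follow the standard definitions: MSE is the mean of the squared errors, RMSE is its square root, and R² is one minus the ratio of residual to total sum of squares. The helper below is a dependency-free sketch of those definitions; the name `regression_metrics` is hypothetical and the code is not taken from the author's evaluation script.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and R² for paired true and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared gaps between labels and predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between labels and their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true scores are identical")

    mse = ss_res / n
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1.0 - ss_res / ss_tot}
```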
---

## 🚀 Intended Use
- Measuring **alignment of text to Sankofa archetype values**
- Research in **Soulprint archetypes & culturally rooted AI models**
- Applications in **AI agents, storytelling systems, and reflective analysis tools**

---

## ⚠️ Limitations
- The dataset is **limited to 1,000 rows**; performance may improve with more data.
- The model is **specific to Sankofa**; its scores should not be read as measures of other archetypes.
- Interpretability depends on the embedding model (`all-mpnet-base-v2`).

---

## 💡 Example Usage

```python
import joblib
from sentence_transformers import SentenceTransformer

# Load model and embedder
model = joblib.load("Sankofa_xgb_model.pkl")
embedder = SentenceTransformer("all-mpnet-base-v2")

# Example text
text = "The community documented their struggles so future generations could learn."
embedding = embedder.encode([text])
score = model.predict(embedding)[0]

print("Predicted Sankofa Score:", round(float(score), 3))
```
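
To score several texts at once, the same two calls can be wrapped in a helper. The sketch below is an illustration, and `score_texts` is a hypothetical name that does not ship with the model. It assumes `embedder` exposes `encode` and `model` exposes `predict`, as in the example above. Because a boosted-tree regressor is not constrained to the label range, raw predictions can occasionally fall slightly outside 0.0 → 1.0, so the helper clips them.

```python
import numpy as np


def score_texts(texts, embedder, model):
    """Score a batch of texts and clip predictions into [0.0, 1.0].

    `embedder` needs an `encode(list_of_str)` method and `model` a
    `predict(array)` method.
    """
    if not texts:
        return []
    embeddings = embedder.encode(list(texts))
    raw = np.asarray(model.predict(embeddings), dtype=float)
    # Clip to the documented output range before rounding for display.
    clipped = np.clip(raw, 0.0, 1.0)
    return [round(float(s), 3) for s in clipped]
```

With the objects loaded in the example above, `score_texts([text], embedder, model)` returns a one-element list holding the clipped, rounded score.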