---
language: en
license: mit
tags:
- regression
- xgboost
- embeddings
- soulprint
- sankofa
datasets:
- custom
metrics:
- mse
- rmse
- r2
model-index:
- name: Sankofa_xgb_model
  results:
  - task:
      type: regression
      name: Archetype Regression
    metrics:
    - name: MSE
      type: mean_squared_error
      value: 0.0143
    - name: RMSE
      type: root_mean_squared_error
      value: 0.1198
    - name: R²
      type: r2_score
      value: 0.824
---

# Sankofa XGBoost Regression Model

## 📖 Model Overview
The **Sankofa Regression Model** is part of the **Soulprint Archetype System** and measures how strongly a given text reflects the values of the **Sankofa archetype**.
It uses **SentenceTransformer embeddings** (`all-mpnet-base-v2`) as input features and an **XGBoost regressor** trained on a **1,000-row curated dataset**.

- **Architecture**: SentenceTransformer embeddings + XGBoost regression
- **Output Range**: 0.0 → 1.0 (Sankofa alignment score)
- **Training Size**: 1,000 rows (balanced distribution)

---

## 🌍 What is Sankofa?
The **Sankofa archetype** emphasizes **learning from the past, honoring ancestral wisdom, and applying history to guide future actions**.
- **High scores (0.7–1.0)**: Strong grounding in memory, reflection, and ancestral values
- **Mid scores (0.4–0.6)**: Some awareness of the past but shallow or inconsistent application
- **Low scores (0.0–0.3)**: Dismissal of history, impatience, or neglect of lessons from the past
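
The score bands can be expressed as a small helper. This is illustrative only and not part of the model: the cutoffs mirror the ranges in this card, with simple thresholds at 0.4 and 0.7 covering the gaps between bands.

```python
def sankofa_band(score: float) -> str:
    """Map a Sankofa alignment score in [0.0, 1.0] to a band.

    Hypothetical helper: cutoffs follow this card's ranges, with
    thresholds at 0.4 and 0.7 covering the gaps between bands.
    """
    if score >= 0.7:
        return "high"  # strong grounding in memory and ancestral values
    if score >= 0.4:
        return "mid"   # some awareness of the past, inconsistently applied
    return "low"       # dismissal or neglect of lessons from the past

print(sankofa_band(0.82))  # high
```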

---

## 📊 Training & Evaluation

**Training Methodology**:
- Inputs: Free-text statements
- Labels: Float scores (0.0 → 1.0) for Sankofa alignment
- Embeddings: `all-mpnet-base-v2` from SentenceTransformers
- Model: XGBoost regressor

**Results**:
- **MSE**: 0.0143
- **RMSE**: 0.1198
- **R²**: 0.824

This means predictions are typically within ±0.12 of the true score, and the model explains **82.4% of the variance** in the dataset.
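
As a sanity check, the reported RMSE is simply the square root of the reported MSE; the tiny gap comes from rounding in the published figures:

```python
import math

mse, rmse_reported = 0.0143, 0.1198

rmse_from_mse = math.sqrt(mse)
print(round(rmse_from_mse, 4))  # 0.1196 — matches 0.1198 up to rounding
assert abs(rmse_from_mse - rmse_reported) < 1e-3
```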

---

## 🚀 Intended Use
- Measuring **alignment of text to Sankofa archetype values**
- Research in **Soulprint archetypes & culturally rooted AI models**
- Applications in **AI agents, storytelling systems, and reflective analysis tools**

---

## ⚠️ Limitations
- The dataset is **limited to 1,000 rows**; performance could improve with more data.
- The model is **specific to Sankofa** and should not be generalized to other archetypes.
- Interpretability depends on the embedding model (`all-mpnet-base-v2`).

---

## 💡 Example Usage

86
+
87
+ ```python
88
+ import joblib
89
+ from sentence_transformers import SentenceTransformer
90
+
91
+ # Load model and embedder
92
+ model = joblib.load("Sankofa_xgb_model.pkl")
93
+ embedder = SentenceTransformer("all-mpnet-base-v2")
94
+
95
+ # Example text
96
+ text = "The community documented their struggles so future generations could learn."
97
+ embedding = embedder.encode([text])
98
+ score = model.predict(embedding)[0]
99
+
100
+ print("Predicted Sankofa Score:", round(float(score), 3))
101
+ ```