Skill Level Classifier (XGBoost, v2)
What it does: Predicts an entrepreneur’s skill level (Low, Medium, or High) from eight numeric features, each scored on a 1–10 scale.
Why it’s here: Compact, fast, and easy to deploy for tabular inference. Trained with early stopping; no target leakage.
Files in this repo
xgb_model_Skill_Level_v2.json: trained XGBoost model
feature_order_Skill_Level_v2.json: list of feature names (order-sensitive)
label_map_Skill_Level_v2.json: mapping from class name to index (don’t assume order)
Input features (1–10 scale)
All inputs are integers or floats in the range 1–10.
years_experience_score
education_training_score
execution_ability_score
problem_solving_score
confidence_score
idea_difficulty_score
leadership_score
networking_score
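Because the model is order-sensitive and expects values in the 1–10 range, it can help to validate a record before building the input row. The following is a minimal sketch, not part of the published artifacts: `build_feature_row` is a hypothetical helper, and the hard-coded `FEATURE_COLS` simply copies the list above. In practice the order should be loaded from feature_order_Skill_Level_v2.json, since the file is the authority on column order.

```python
import numpy as np

# Assumption: names copied from the list above. Prefer loading the order from
# feature_order_Skill_Level_v2.json rather than hard-coding it.
FEATURE_COLS = [
    "years_experience_score",
    "education_training_score",
    "execution_ability_score",
    "problem_solving_score",
    "confidence_score",
    "idea_difficulty_score",
    "leadership_score",
    "networking_score",
]


def build_feature_row(record, feature_cols=FEATURE_COLS, low=1.0, high=10.0):
    """Hypothetical helper: validate one record, return a (1, n_features) float32 array."""
    missing = [c for c in feature_cols if c not in record]
    extra = [k for k in record if k not in feature_cols]
    if missing or extra:
        raise ValueError(f"missing features: {missing}; unexpected features: {extra}")
    # Read values in the model's column order, not the dict's insertion order.
    values = np.array([float(record[c]) for c in feature_cols], dtype="float32")
    if not np.all(np.isfinite(values)):
        raise ValueError("all feature values must be finite numbers")
    out_of_range = [c for c, v in zip(feature_cols, values) if v < low or v > high]
    if out_of_range:
        raise ValueError(f"values outside [{low}, {high}]: {out_of_range}")
    return values.reshape(1, -1)
```

The returned array can be passed directly to `predict_proba`; rejecting unknown keys guards against silently dropping a misspelled feature name.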
Feature definitions (what 1, 5, 10 roughly mean)
years_experience_score: Practical experience in entrepreneurial, business, or technical work.
1: none · 5: ~3–4 years/moderate exposure · 10: 10+ years/expert-level track record
education_training_score: Formal or informal training related to business/entrepreneurship/tech.
1: no training · 5: some courses/undergrad/bootcamps · 10: advanced degrees/certifications/ongoing education
execution_ability_score: Ability to independently complete tasks and ship projects.
1: needs step-by-step guidance · 5: handles medium complexity with minimal help · 10: ships complex projects reliably, end-to-end
problem_solving_score: Adaptability and effectiveness at diagnosing/solving issues.
1: gets stuck frequently · 5: resolves common issues with some effort · 10: quickly solves complex/novel problems
confidence_score: Self-efficacy and decisiveness applying skills in practice.
1: not confident · 5: moderately confident · 10: highly confident and decisive under uncertainty
idea_difficulty_score: Complexity/ambition of the current venture or idea.
1: very simple/small scope · 5: moderate (e.g., niche app) · 10: highly complex (e.g., AI/biotech/multi-sided marketplace)
leadership_score: Experience leading people, projects, or cross-functional efforts.
1: no leadership experience · 5: led small teams/projects · 10: extensive leadership of large/complex teams
networking_score: Ability to leverage mentors, partnerships, and resources/funding.
1: minimal network/isolated · 5: active connections and events · 10: strong network; partnerships/fundraising proficiency
Note: The dataset also contains a skill_level_readiness (1–10) field, but this v2 model does not use it as an input, to avoid target leakage.
Quickstart — Load from Hub & predict
# pip install xgboost pandas huggingface_hub
from huggingface_hub import hf_hub_download
from xgboost import XGBClassifier
import json, pandas as pd, numpy as np
REPO_ID = "mjpsm/Skill-Level-XGB-v2" # change to your repo id if you fork
# Download artifacts
model_file = hf_hub_download(REPO_ID, "xgb_model_Skill_Level_v2.json")
features_file = hf_hub_download(REPO_ID, "feature_order_Skill_Level_v2.json")
labelmap_file = hf_hub_download(REPO_ID, "label_map_Skill_Level_v2.json")
with open(features_file) as f: FEATURE_COLS = json.load(f)
with open(labelmap_file) as f: LABEL_MAP = json.load(f) # e.g., {"High":0,"Low":1,"Medium":2}
INV = {v:k for k,v in LABEL_MAP.items()}
# Load model
clf = XGBClassifier()
clf.load_model(model_file)
# Single example
example = {
    "years_experience_score": 6,
    "education_training_score": 7,
    "execution_ability_score": 7,
    "problem_solving_score": 7,
    "confidence_score": 7,
    "idea_difficulty_score": 6,
    "leadership_score": 6,
    "networking_score": 6,
}
X = pd.DataFrame([example], columns=FEATURE_COLS).astype("float32").values
probs = clf.predict_proba(X)[0]
pred = int(np.argmax(probs))
print("Predicted class:", INV[pred])
print("Class probabilities:", {INV[i]: float(probs[i]) for i in range(len(probs))})
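The quickstart decodes a single prediction by inverting the label map. For a batch of rows the same idea applies per row. The sketch below assumes a probability array shaped like the output of `predict_proba`; `decode_predictions` is a hypothetical helper, and the probabilities are made-up illustrative values, not output from the real model. The label map is the example mapping shown in the quickstart comment.

```python
import numpy as np


def decode_predictions(probs, label_map):
    """Hypothetical helper: map each row of class probabilities to a class name."""
    # Invert name -> index into index -> name; never rely on alphabetical order.
    inv = {int(idx): name for name, idx in label_map.items()}
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != len(inv):
        raise ValueError("probs must have shape (n_samples, n_classes)")
    return [inv[int(i)] for i in probs.argmax(axis=1)]


# Illustrative values only; these are not real model outputs.
label_map = {"High": 0, "Low": 1, "Medium": 2}
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
print(decode_predictions(probs, label_map))  # ['Low', 'High']
```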
Evaluation results
- accuracy on skill_level_dataset_1998 (synthetic, balanced), self-reported: 0.993
- macro_f1 on skill_level_dataset_1998 (synthetic, balanced), self-reported: 0.993
- log_loss on skill_level_dataset_1998 (synthetic, balanced), self-reported: 0.034
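For readers who want to recompute these metrics on their own held-out data, the following sketch uses the standard definitions of accuracy, macro-averaged F1, and multiclass log loss in plain NumPy. The card does not state how the reported numbers were computed, so this is an assumption about the definitions, and the toy labels and probabilities below are illustrative only. It also assumes each probability row sums to 1.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted index equals the true index."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    picked = probs[np.arange(len(y_true)), np.asarray(y_true)]
    return float(-np.mean(np.log(picked)))


# Toy data for illustration; not drawn from skill_level_dataset_1998.
y_true = [0, 1, 2, 2]
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.5, 0.3, 0.2]])
y_pred = probs.argmax(axis=1)
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred, 3), log_loss(y_true, probs))
```

Class indices here must follow the label map file, so true labels should be encoded with the same name-to-index mapping the model was trained with.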