# mjpsm/Idea-Difficulty-XGB
## Overview
This model predicts the difficulty of a business idea as **Low**, **Medium**, or **High**.
It is part of the Entrepreneurial Readiness series of tabular classifiers (alongside Skill Level, Risk Tolerance, and Confidence).
The model was trained with XGBoost on a 2,000-row synthetic dataset of structured features that capture common difficulty drivers.
## Input Features
| Feature | Type | Range | Definition |
|---|---|---|---|
| `capital_required` | int | 1–10 | How much upfront capital is needed (1 = minimal, 10 = very high) |
| `technical_complexity` | int | 1–10 | How technically difficult the product or service is to build or maintain |
| `market_competition` | int | 1–10 | How crowded the target market is with competitors |
| `customer_acquisition_difficulty` | int | 1–10 | How difficult it is to acquire and retain customers |
| `regulatory_hurdles` | int | 1–10 | Degree of legal and regulatory challenges |
| `time_to_mvp_months` | int | 1–60 | Estimated time to launch a Minimum Viable Product (MVP), in months |
| `team_expertise_required` | int | 1–10 | Level of specialized expertise or team members required |
| `scalability_requirement` | int | 1–10 | Degree to which scaling is required for success |
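All eight features are integers with the ranges listed above, and the model expects them in the training column order. A tree model like XGBoost will not reject out-of-range values on its own, so a pre-prediction check can help. The sketch below is a minimal, stdlib-only example, assuming you prefer to reject bad inputs rather than clip them; `FEATURE_RANGES` and `validate_row` are hypothetical names, not part of the published repo.

```python
# Hypothetical helper: ranges and order copied from the Input Features table.
# Dicts preserve insertion order (Python 3.7+), so this also fixes column order.
FEATURE_RANGES = {
    "capital_required": (1, 10),
    "technical_complexity": (1, 10),
    "market_competition": (1, 10),
    "customer_acquisition_difficulty": (1, 10),
    "regulatory_hurdles": (1, 10),
    "time_to_mvp_months": (1, 60),
    "team_expertise_required": (1, 10),
    "scalability_requirement": (1, 10),
}


def validate_row(row: dict) -> list:
    """Check a feature dict against the documented schema; return values in training order."""
    missing = [name for name in FEATURE_RANGES if name not in row]
    extra = [name for name in row if name not in FEATURE_RANGES]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    ordered = []
    for name, (lo, hi) in FEATURE_RANGES.items():
        value = row[name]
        # Features are documented as ints; bool is rejected because it subclasses int.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside the documented range {lo}-{hi}")
        ordered.append(value)
    return ordered


# Same example values as the Quickstart below.
example = {
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9,
}
print(validate_row(example))  # [7, 9, 6, 8, 7, 18, 5, 9]
```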
Target label:

- **Low** = Idea is relatively easy to execute
- **Medium** = Moderately challenging
- **High** = Difficult, requiring significant resources and expertise
## Performance
- Accuracy: 0.9733
- Macro F1: 0.9733
- Log Loss: 0.0584
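The card does not say how log loss was computed. Assuming the standard multiclass cross-entropy (as in scikit-learn's `log_loss`), it averages the negative log-probability the model assigns to each example's true class, where $N$ is the number of evaluated examples and $p_{i,y_i}$ is the predicted probability of example $i$'s true class $y_i$:

$$
\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N} \log p_{i,y_i}
$$

Under that definition, a log loss of 0.0584 means the geometric mean of the true-class probabilities is $e^{-0.0584} \approx 0.943$.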
Confusion Matrix (rows = true, cols = predicted):
| True \ Predicted | High | Low | Medium |
|---|---|---|---|
| High | 100 | 0 | 0 |
| Low | 0 | 96 | 4 |
| Medium | 2 | 2 | 96 |
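Each row sums to 100, so the matrix covers 300 evaluation examples (100 per class). As a consistency check, the sketch below recomputes accuracy and macro F1 directly from these counts and reproduces the reported 0.9733 for both. Log loss cannot be recovered this way because it needs the predicted probabilities, not just the predicted classes. `metrics_from_confusion` is a hypothetical helper written for this illustration.

```python
# Confusion matrix from the table above: rows = true class, cols = predicted class,
# in the order High, Low, Medium.
CLASSES = ["High", "Low", "Medium"]
CM = [
    [100, 0, 0],
    [0, 96, 4],
    [2, 2, 96],
]


def metrics_from_confusion(cm):
    """Return (accuracy, macro F1) for a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    f1_scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro F1 is the unweighted mean of the per-class F1 scores.
    return correct / total, sum(f1_scores) / n


accuracy, macro_f1 = metrics_from_confusion(CM)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")  # accuracy=0.9733 macro_f1=0.9733
```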
## Quickstart (load from the Hub)
```python
# Load directly from: mjpsm/Idea-Difficulty-XGB
import json

import pandas as pd
from huggingface_hub import hf_hub_download
from xgboost import XGBClassifier

REPO_ID = "mjpsm/Idea-Difficulty-XGB"

# Download the saved model file and load it into the scikit-learn-style wrapper
model_path = hf_hub_download(REPO_ID, "xgb_model.json")
clf = XGBClassifier()
clf.load_model(model_path)

# IMPORTANT: Use the same feature names/order as training
FEATURES = [
    "capital_required", "technical_complexity", "market_competition",
    "customer_acquisition_difficulty", "regulatory_hurdles",
    "time_to_mvp_months", "team_expertise_required", "scalability_requirement",
]

row = pd.DataFrame([{
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9,
}], columns=FEATURES)

# predict() returns integer class ids
pred_id = int(clf.predict(row)[0])

# If label_map.json is NOT uploaded, default to alphabetical LabelEncoder order:
CLASSES = ["High", "Low", "Medium"]  # update if you publish label_map.json
print("Predicted Idea Difficulty:", CLASSES[pred_id])

# OPTIONAL: If you later upload 'label_map.json' (label -> id), prefer this:
# lm_path = hf_hub_download(REPO_ID, "label_map.json")
# with open(lm_path) as f:
#     label_map = json.load(f)
# inv_map = {v: k for k, v in label_map.items()}
# print("Predicted Idea Difficulty:", inv_map[pred_id])
```
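To see how confident a prediction is, `clf.predict_proba(row)` (the scikit-learn-style method `XGBClassifier` provides) returns one probability per class, in class-id order. The sketch below pairs such a row with the alphabetical `CLASSES` order assumed above; `top_label` is a hypothetical helper, and the probabilities in the example are made-up illustration values, not real model output.

```python
from typing import Sequence, Tuple

# Assumed class-id order (alphabetical LabelEncoder order, as in the Quickstart).
CLASSES = ["High", "Low", "Medium"]


def top_label(proba: Sequence[float], classes: Sequence[str] = CLASSES) -> Tuple[str, float]:
    """Return the most probable label and its probability from one predict_proba row."""
    if len(proba) != len(classes):
        raise ValueError(f"expected {len(classes)} probabilities, got {len(proba)}")
    best = max(range(len(proba)), key=lambda i: proba[i])
    # float() also converts NumPy scalars returned by predict_proba
    return classes[best], float(proba[best])


# In practice: proba = clf.predict_proba(row)[0]
example_proba = [0.91, 0.02, 0.07]  # illustrative values only
label, confidence = top_label(example_proba)
print(f"{label} ({confidence:.2f})")  # High (0.91)
```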
## Evaluation results

Self-reported results on `idea_difficulty_dataset_2000` (synthetic, balanced):

- Accuracy: 0.973
- Macro F1: 0.973
- Log loss: 0.058