mjpsm/Idea-Difficulty-XGB

🧾 Overview

This model predicts the difficulty of a business idea as Low, Medium, or High.
It is part of the Entrepreneurial Readiness series of tabular classifiers (alongside Skill Level, Risk Tolerance, and Confidence).

The model was trained with XGBoost on a 2,000-row synthetic dataset of structured features that capture common difficulty drivers.

📥 Input Features

Feature	Type	Range	Definition
`capital_required`	int	1–10	How much upfront capital is needed (1 = minimal, 10 = very high)
`technical_complexity`	int	1–10	How technically difficult the product/service is to build or maintain
`market_competition`	int	1–10	How crowded the target market is with competitors
`customer_acquisition_difficulty`	int	1–10	How difficult it is to acquire and retain customers
`regulatory_hurdles`	int	1–10	The degree of legal/regulatory challenges
`time_to_mvp_months`	int	1–60	Estimated time to Minimum Viable Product launch (in months)
`team_expertise_required`	int	1–10	Level of specialized expertise/team members required
`scalability_requirement`	int	1–10	Degree to which scaling is required for success
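
Because every feature has a documented integer range, it can help to check inputs before they reach the model. The following is a minimal sketch of such a check; `FEATURE_RANGES` and `validate_features` are hypothetical names introduced here for illustration and are not part of the published repository.

```python
# Documented ranges from the feature table above (inclusive bounds).
FEATURE_RANGES = {
    "capital_required": (1, 10),
    "technical_complexity": (1, 10),
    "market_competition": (1, 10),
    "customer_acquisition_difficulty": (1, 10),
    "regulatory_hurdles": (1, 10),
    "time_to_mvp_months": (1, 60),
    "team_expertise_required": (1, 10),
    "scalability_requirement": (1, 10),
}


def validate_features(row: dict) -> dict:
    """Hypothetical helper: check a feature dict against the documented ranges."""
    missing = set(FEATURE_RANGES) - set(row)
    extra = set(row) - set(FEATURE_RANGES)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for name, (low, high) in FEATURE_RANGES.items():
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an int, got {type(value).__name__}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    # Return the values in the same order as the feature table.
    return {name: row[name] for name in FEATURE_RANGES}
```

Whether the model degrades on out-of-range values is not documented, so rejecting them outright is a conservative design choice rather than a requirement.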

Target label:

Low = Relatively easy to execute
Medium = Moderately challenging to execute
High = Difficult to execute, requiring significant resources and expertise

📊 Performance

Accuracy: 0.9733
Macro F1: 0.9733
Log Loss: 0.0584

Confusion matrix over 300 evaluation examples, 100 per true class (rows = true, columns = predicted):

	High	Low	Medium
High	100	0	0
Low	0	96	4
Medium	2	2	96
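
The reported accuracy and macro F1 can be re-derived from the confusion matrix alone. The sketch below does that in plain Python; the variable names are illustrative, and log loss is omitted because it cannot be recovered from counts.

```python
# Confusion matrix from this model card: rows = true class, columns = predicted class.
LABELS = ["High", "Low", "Medium"]
CM = [
    [100, 0, 0],
    [0, 96, 4],
    [2, 2, 96],
]

total = sum(sum(row) for row in CM)
accuracy = sum(CM[i][i] for i in range(len(LABELS))) / total

f1_scores = {}
for i, label in enumerate(LABELS):
    tp = CM[i][i]
    fn = sum(CM[i]) - tp                                    # true class i, predicted elsewhere
    fp = sum(CM[r][i] for r in range(len(LABELS))) - tp     # predicted i, true class elsewhere
    f1_scores[label] = 2 * tp / (2 * tp + fp + fn)

macro_f1 = sum(f1_scores.values()) / len(LABELS)

print(f"accuracy = {accuracy:.4f}")   # accuracy = 0.9733
print(f"macro F1 = {macro_f1:.4f}")   # macro F1 = 0.9733
```

The per-class F1 scores show where the errors sit: High is about 0.990, Low about 0.970, and Medium 0.960, so most confusion is between Low and Medium.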

🚀 Quickstart (load from the Hub)

# Load directly from: mjpsm/Idea-Difficulty-XGB
import json  # only needed for the optional label_map.json snippet at the end

import pandas as pd
from huggingface_hub import hf_hub_download
from xgboost import XGBClassifier

REPO_ID = "mjpsm/Idea-Difficulty-XGB"
model_path = hf_hub_download(REPO_ID, "xgb_model.json")

clf = XGBClassifier()
clf.load_model(model_path)

# IMPORTANT: Use the same feature names/order as training
FEATURES = [
    "capital_required","technical_complexity","market_competition",
    "customer_acquisition_difficulty","regulatory_hurdles",
    "time_to_mvp_months","team_expertise_required","scalability_requirement"
]

row = pd.DataFrame([{
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9
}], columns=FEATURES)

pred_id = int(clf.predict(row)[0])

# If label_map.json is NOT uploaded, default to alphabetical LabelEncoder order:
CLASSES = ["High","Low","Medium"]  # update if you publish label_map.json
print("Predicted Idea Difficulty:", CLASSES[pred_id])

# OPTIONAL: If you later upload 'label_map.json' (assumed here to map label -> class id), prefer this:
# lm_path = hf_hub_download(REPO_ID, "label_map.json")
# with open(lm_path) as f:
#     label_map = json.load(f)
# inv_map = {int(v): k for k, v in label_map.items()}
# print("Predicted Idea Difficulty:", inv_map[pred_id])
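
The alphabetical fallback above matters because sorted order differs from the natural Low/Medium/High order. The sketch below illustrates this and shows one way to turn a row of class probabilities into a label plus confidence. `decode_proba` is a hypothetical helper, and the probability vector is made up for illustration, not real model output.

```python
# Assumption: training labels were encoded in sorted (alphabetical) order,
# as the quickstart comment states. Verify against label_map.json if it is published.
RAW_LABELS = ["Low", "Medium", "High"]
CLASSES = sorted(RAW_LABELS)  # ['High', 'Low', 'Medium']


def decode_proba(proba):
    """Hypothetical helper: map one row of class probabilities to (label, confidence)."""
    if len(proba) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} probabilities, got {len(proba)}")
    best = max(range(len(proba)), key=lambda i: proba[i])
    return CLASSES[best], float(proba[best])


# Made-up probability vector, for illustration only.
label, confidence = decode_proba([0.82, 0.03, 0.15])
print(label, confidence)  # High 0.82
```

With the classifier loaded as in the quickstart, `clf.predict_proba(row)[0]` would supply the probability row in place of the made-up vector.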


Evaluation results

Self-reported on idea_difficulty_dataset_2000 (synthetic, balanced):

accuracy: 0.973
macro_f1: 0.973
log_loss: 0.058
