mjpsm/Idea-Difficulty-XGB

🧾 Overview

This model predicts the difficulty of a business idea as Low, Medium, or High.
It is part of the Entrepreneurial Readiness series of tabular classifiers (alongside Skill Level, Risk Tolerance, and Confidence).

The model was trained with XGBoost on a 2,000-row synthetic dataset of structured features that capture common difficulty drivers.


📥 Input Features

| Feature | Type | Range | Definition |
|---|---|---|---|
| capital_required | int | 1–10 | How much upfront capital is needed (1 = minimal, 10 = very high) |
| technical_complexity | int | 1–10 | How technically difficult the product/service is to build or maintain |
| market_competition | int | 1–10 | How crowded the target market is with competitors |
| customer_acquisition_difficulty | int | 1–10 | How difficult it is to acquire and retain customers |
| regulatory_hurdles | int | 1–10 | The degree of legal/regulatory challenges |
| time_to_mvp_months | int | 1–60 | Estimated time to Minimum Viable Product launch (in months) |
| team_expertise_required | int | 1–10 | Level of specialized expertise/team members required |
| scalability_requirement | int | 1–10 | Degree to which scaling is required for success |
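The ranges above can be checked before a row is passed to the model. The following is a minimal sketch, not part of the published repository: `FEATURE_RANGES` and `validate_features` are hypothetical names introduced here, and the bounds are copied from the table under the assumption that they are inclusive.

```python
# Assumed inclusive bounds, copied from the feature table (hypothetical helper).
FEATURE_RANGES = {
    "capital_required": (1, 10),
    "technical_complexity": (1, 10),
    "market_competition": (1, 10),
    "customer_acquisition_difficulty": (1, 10),
    "regulatory_hurdles": (1, 10),
    "time_to_mvp_months": (1, 60),
    "team_expertise_required": (1, 10),
    "scalability_requirement": (1, 10),
}


def validate_features(idea):
    """Return a list of problems; an empty list means the row looks usable."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        if name not in idea:
            problems.append(f"missing feature: {name}")
            continue
        value = idea[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an int, got {type(value).__name__}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    for name in idea:
        if name not in FEATURE_RANGES:
            problems.append(f"unexpected feature: {name}")
    return problems


example = {
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9,
}
print(validate_features(example))  # prints []
```

The check accepts plain Python ints only; values read from a NumPy array or DataFrame would need converting with `int(...)` first.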

Target label:

  • Low = Idea is relatively easy to execute
  • Medium = Moderately challenging
  • High = Difficult, requiring significant resources and expertise

📊 Performance

  • Accuracy: 0.9733
  • Macro F1: 0.9733
  • Log Loss: 0.0584

Confusion Matrix (rows = true, cols = predicted):

| True label | High | Low | Medium |
|---|---|---|---|
| High | 100 | 0 | 0 |
| Low | 0 | 96 | 4 |
| Medium | 2 | 2 | 96 |
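The reported accuracy and macro F1 can be recomputed from the confusion matrix alone. The sketch below is plain arithmetic on the nine counts shown above (variable names are illustrative). Log loss cannot be recovered this way because it depends on the predicted probabilities, not only on the hard predictions.

```python
LABELS = ["High", "Low", "Medium"]
# Rows = true label, columns = predicted label, as in the matrix above.
CM = [
    [100, 0, 0],
    [0, 96, 4],
    [2, 2, 96],
]

total = sum(sum(row) for row in CM)
correct = sum(CM[i][i] for i in range(len(LABELS)))
accuracy = correct / total

f1_scores = {}
for i, label in enumerate(LABELS):
    true_positives = CM[i][i]
    predicted_as_label = sum(CM[r][i] for r in range(len(LABELS)))  # column sum
    actually_label = sum(CM[i])                                     # row sum
    precision = true_positives / predicted_as_label
    recall = true_positives / actually_label
    f1_scores[label] = 2 * precision * recall / (precision + recall)

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

print(f"samples={total} accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
# prints: samples=300 accuracy=0.9733 macro_f1=0.9733
```

The matrix sums to 300 evaluation rows, 100 per class, and every one of the 8 errors involves the Medium class.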

🚀 Quickstart (load from the Hub)

# Load directly from: mjpsm/Idea-Difficulty-XGB
from huggingface_hub import hf_hub_download
from xgboost import XGBClassifier
import pandas as pd, json

REPO_ID = "mjpsm/Idea-Difficulty-XGB"
model_path = hf_hub_download(REPO_ID, "xgb_model.json")

clf = XGBClassifier()
clf.load_model(model_path)

# IMPORTANT: Use the same feature names/order as training
FEATURES = [
    "capital_required","technical_complexity","market_competition",
    "customer_acquisition_difficulty","regulatory_hurdles",
    "time_to_mvp_months","team_expertise_required","scalability_requirement"
]

row = pd.DataFrame([{
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9
}], columns=FEATURES)

pred_id = int(clf.predict(row)[0])

# If label_map.json is NOT uploaded, default to alphabetical LabelEncoder order:
CLASSES = ["High","Low","Medium"]  # update if you publish label_map.json
print("Predicted Idea Difficulty:", CLASSES[pred_id])

# OPTIONAL: If you later upload 'label_map.json', prefer this:
# lm_path = hf_hub_download(REPO_ID, "label_map.json")
# with open(lm_path) as f:
#     label_map = json.load(f)
# inv_map = {v:k for k,v in label_map.items()}
# print("Predicted Idea Difficulty:", inv_map[pred_id])
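Beyond the single predicted label, per-class probabilities are often useful for borderline ideas. The following is a minimal sketch: `describe_prediction` and `_StubModel` are hypothetical names introduced here, and the class order is assumed to be the alphabetical `["High", "Low", "Medium"]` used in the Quickstart. The stub stands in for the loaded classifier so the example runs without downloading anything; with the real model you would pass `clf` and `row` instead, since `XGBClassifier` provides `predict_proba`.

```python
# Assumed alphabetical LabelEncoder order, as in the Quickstart.
CLASSES = ["High", "Low", "Medium"]


def describe_prediction(model, row, classes=CLASSES):
    """Return (best_label, {label: probability}) for a single-row input."""
    proba = [float(p) for p in model.predict_proba(row)[0]]
    if len(proba) != len(classes):
        raise ValueError("model output does not match the number of classes")
    best_index = max(range(len(proba)), key=proba.__getitem__)
    return classes[best_index], dict(zip(classes, proba))


class _StubModel:
    """Stand-in for the loaded classifier; returns made-up probabilities."""

    def predict_proba(self, X):
        return [[0.7, 0.1, 0.2]]


label, probs = describe_prediction(_StubModel(), row=None)
print(label, probs)
```

The probabilities in the stub are illustrative only and say nothing about what the published model would return for any particular idea.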

Evaluation results (self-reported, on idea_difficulty_dataset_2000, synthetic and balanced)

  • accuracy: 0.973
  • macro_f1: 0.973
  • log_loss: 0.058