Surgical Duration Prediction Model
Model Description
This XGBoost regression model predicts the actual duration of surgical procedures in minutes and outperforms the human estimate recorded at scheduling (booked time). On the held-out test set the model achieves a Mean Absolute Error (MAE) of 4.97 minutes and an R² of 0.9419, compared with an MAE of 11.43 minutes for booked time alone, a 56.52% reduction in MAE.
Model Type: XGBoost Regressor
Task: Regression (Time Prediction)
Language: English
License: Apache 2.0
Intended Use
Primary Use Cases
- Operating Room Scheduling: Optimize surgical scheduling to reduce delays and improve utilization
 - Resource Planning: Better allocate staff, equipment, and facilities based on accurate time estimates
 - Hospital Operations: Minimize patient wait times and reduce overtime costs
 
Out-of-Scope Use
- Emergency surgery planning (model trained on scheduled procedures)
 - Cross-institutional deployment without retraining (model is hospital-specific)
 - Real-time intraoperative duration updates
 
Model Architecture
- Algorithm: XGBoost (Extreme Gradient Boosting)
- Parameters:
  - n_estimators: 200
  - learning_rate: 0.1
  - max_depth: 7
  - random_state: 42
 
 
Training Data
Dataset: Kaggle - Optimizing Operating Room Utilization
Features Used
- Booked Time (min) - Originally scheduled procedure duration (most important feature, 65% importance)
 - Service - Medical department/service (e.g., Orthopedics, General Surgery, Podiatry)
 - CPT Description - Procedure code description (22% importance)
 
Target Variable
- actual_duration_min - Calculated as (End Time - Start Time) in minutes
 
Preprocessing Steps
- Missing value imputation (median for numeric, mode for categorical)
 - Label encoding for categorical features (Service and CPT Description)
 - 80-20 train-test split with random_state=42
 
Performance
Evaluation Metrics
| Metric | XGBoost Model | Baseline (Booked Time) | Improvement |
|---|---|---|---|
| Mean Absolute Error (MAE) | 4.97 min | 11.43 min | 56.52% better | 
| Root Mean Squared Error (RMSE) | ~15-25 min* | ~30-45 min* | ~35-45% better* | 
| R² Score | 0.9419 | 0.7770 | +0.1649 | 
*RMSE values are rough estimates based on typical performance for this model type, not measurements on the test set.
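Because the RMSE row is estimated rather than measured, it may help to show how all three metrics could be computed for both the model and the booked-time baseline. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the toy numbers are illustrative assumptions, not values from the dataset. In practice `predicted` would come from `model.predict(X_test)` and `booked` from the `Booked Time (min)` column of the test split.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of minutes."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy values for illustration only (not taken from the dataset).
actual = [60.0, 90.0, 120.0, 150.0]
booked = [45.0, 90.0, 150.0, 120.0]      # baseline: scheduled duration
predicted = [58.0, 93.0, 118.0, 145.0]   # stand-in for model.predict(X_test)

baseline_mae, baseline_rmse, baseline_r2 = regression_metrics(actual, booked)
model_mae, model_rmse, model_r2 = regression_metrics(actual, predicted)
print(f"Baseline: MAE={baseline_mae:.2f} RMSE={baseline_rmse:.2f} R2={baseline_r2:.4f}")
print(f"Model:    MAE={model_mae:.2f} RMSE={model_rmse:.2f} R2={model_r2:.4f}")
```

Running the same function on the real test split would replace the estimated RMSE range with a measured value.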
Interpretation
- On average, predictions deviate from the actual surgical duration by about 5 minutes (MAE 4.97 min)
- Model explains 94% of the variance in actual durations, versus 78% for booked time
- Model MAE is less than half the booked-time MAE (4.97 min vs 11.43 min)
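The improvement figure follows directly from the two MAE values in the table. A short worked check (the variable names are illustrative):

```python
baseline_mae = 11.43  # minutes, booked time vs actual (from the table)
model_mae = 4.97      # minutes, model prediction vs actual (from the table)

# Relative reduction in MAE, as a percentage of the baseline error.
improvement_pct = (baseline_mae - model_mae) / baseline_mae * 100
# How many times larger the baseline error is than the model error.
error_ratio = baseline_mae / model_mae

print(f"MAE reduction: {improvement_pct:.2f}%")
print(f"Baseline error is {error_ratio:.2f}x the model error")
```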
 
Feature Importance
- Booked Time (min): 65%
 - CPT Description: 22%
 - Service Departments: 13% (combined)
 
How to Use
Installation
pip install xgboost scikit-learn pandas numpy joblib
Loading the Model
import joblib
import pandas as pd
# Load model and encoders
model = joblib.load('surgical_predictor.pkl')
encoder_service = joblib.load('encoder_service.pkl')
encoder_cpt = joblib.load('encoder_cpt.pkl')
Making Predictions
# Prepare input data
new_surgery = pd.DataFrame({
    'Booked Time (min)': [120],
    'Service': ['Orthopedics'],
    'CPT Description': ['Total Knee Arthroplasty']
})
# Encode categorical features
new_surgery['Service'] = encoder_service.transform(new_surgery['Service'])
new_surgery['CPT Description'] = encoder_cpt.transform(new_surgery['CPT Description'])
# Predict duration
predicted_duration = model.predict(new_surgery)
print(f'Predicted Surgical Duration: {predicted_duration[0]:.0f} minutes')
Example Output
Predicted Surgical Duration: 138 minutes
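Note that scikit-learn's `LabelEncoder.transform` raises a `ValueError` for labels it did not see during fitting, so a service or procedure absent from the training data will fail at the encoding step. A minimal sketch of a pre-check, assuming the loaded encoders expose their known labels through the `classes_` attribute; the helper name `find_unseen` and the sample labels are illustrative:

```python
def find_unseen(values, known_classes):
    """Return the values that are not among the encoder's known classes."""
    known = set(known_classes)
    return [v for v in values if v not in known]

# Stand-in for encoder_service.classes_ (illustrative labels only).
known_services = ["General Surgery", "Orthopedics", "Podiatry"]

requested = ["Orthopedics", "Neurosurgery"]
unseen = find_unseen(requested, known_services)
if unseen:
    print(f"Cannot encode unseen labels: {unseen}")
```

With the loaded encoders this could be called as `find_unseen(new_surgery['Service'], encoder_service.classes_)` before `transform`, falling back to the booked time when anything is returned.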
Limitations
- Data Source Dependency: Model trained on single hospital dataset - performance may vary across institutions
 - Feature Requirements: Requires accurate CPT codes and service classifications
 - Procedure Coverage: Limited to procedure types present in training data
 - Temporal Factors: Does not account for time-of-day or day-of-week effects
 - Surgeon Variability: Does not include surgeon experience or individual performance metrics
 - Patient Factors: Does not include patient-specific factors (age, BMI, comorbidities)
 
Bias and Ethical Considerations
Potential Biases
- Model may perform differently across procedure types based on training data distribution
 - Underrepresented procedures may have higher prediction errors
 - May not capture rare complications that significantly extend surgery time
 
Ethical Use Guidelines
- Privacy: Ensure patient data confidentiality and HIPAA compliance
 - Clinical Judgment: Use as decision support tool, not replacement for clinical expertise
 - Continuous Monitoring: Regularly validate performance on new data
 - Transparency: Inform scheduling staff about model limitations
 - Fairness: Monitor for performance disparities across procedure types and departments
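The monitoring and fairness items above could be supported by tracking error per department rather than a single global MAE. A minimal sketch, assuming held-out actual and predicted durations are available alongside the (un-encoded) service label; `mae_by_group` and the toy values are illustrative, not part of the released code:

```python
from collections import defaultdict

def mae_by_group(groups, y_true, y_pred):
    """Mean absolute error per group label, e.g. per Service."""
    abs_errors = defaultdict(list)
    for g, t, p in zip(groups, y_true, y_pred):
        abs_errors[g].append(abs(p - t))
    return {g: sum(errs) / len(errs) for g, errs in abs_errors.items()}

# Toy values for illustration only (not taken from the dataset).
services = ["Orthopedics", "Orthopedics", "Podiatry", "Podiatry"]
actual = [120.0, 95.0, 40.0, 60.0]
predicted = [118.0, 101.0, 55.0, 45.0]

per_service = mae_by_group(services, actual, predicted)
for service, mae in sorted(per_service.items()):
    print(f"{service}: MAE = {mae:.1f} min")
```

A large gap between groups, as in the toy data, would point to procedure types or departments that are underrepresented in training.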
 
Risk Mitigation
- Always maintain buffer time in scheduling
 - Allow manual overrides by clinical staff
 - Regular model retraining with updated data
 - Implement alerts for predictions with high uncertainty
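One possible way to size the scheduling buffer is from past overruns on held-out data. The sketch below is an assumption about how this could be done, not something the released model provides: it takes the nearest-rank empirical quantile of (actual - predicted) so that the buffer would have covered a chosen share of past cases. The name `overrun_buffer`, the `coverage` parameter, and the toy values are hypothetical.

```python
import math

def overrun_buffer(y_true, y_pred, coverage=0.9):
    """Buffer (minutes) that would have covered `coverage` of past cases.

    Uses the nearest-rank empirical quantile of (actual - predicted);
    negative values (systematic early finishes) are clipped to zero.
    """
    residuals = sorted(t - p for t, p in zip(y_true, y_pred))
    rank = math.ceil(coverage * len(residuals))  # 1-based nearest rank
    return max(0.0, residuals[rank - 1])

# Toy held-out values for illustration only (not taken from the dataset).
actual = [100.0, 130.0, 95.0, 160.0, 120.0, 88.0, 142.0, 110.0, 75.0, 150.0]
predicted = [98.0, 120.0, 99.0, 150.0, 121.0, 90.0, 130.0, 108.0, 80.0, 151.0]

buffer_min = overrun_buffer(actual, predicted, coverage=0.9)
print(f"Suggested buffer: {buffer_min:.0f} min")
```

The same residuals, grouped by procedure type, could also drive an alert when a prediction falls in a group whose historical overruns are unusually large.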
 
Training Procedure
Data Preprocessing
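import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# 1. Load dataset
df = pd.read_csv('operating_room_utilization.csv')
# 2. Create target variable (timestamps must be parsed before they can be subtracted)
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
df['actual_duration_min'] = (df['End Time'] - df['Start Time']).dt.total_seconds() / 60
# 3. Handle missing values
# Numeric: median imputation
df['Booked Time (min)'] = df['Booked Time (min)'].fillna(df['Booked Time (min)'].median())
# Categorical: mode imputation
for col in ['Service', 'CPT Description']:
    df[col] = df[col].fillna(df[col].mode()[0])
# 4. Encode categorical features
le_service = LabelEncoder()
le_cpt = LabelEncoder()
df['Service'] = le_service.fit_transform(df['Service'])
df['CPT Description'] = le_cpt.fit_transform(df['CPT Description'])
# 5. Split data (80-20)
X = df[['Booked Time (min)', 'Service', 'CPT Description']]
y = df['actual_duration_min']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)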
Model Training
from xgboost import XGBRegressor
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=7,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)
Hyperparameters
| Parameter | Value | Rationale | 
|---|---|---|
| n_estimators | 200 | Balance between performance and training time | 
| learning_rate | 0.1 | Standard rate for stable convergence | 
| max_depth | 7 | Prevent overfitting while capturing complexity | 
| random_state | 42 | Reproducibility | 
Validation
Cross-Validation
5-fold cross-validation can be performed to ensure robustness:
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_absolute_error')
print(f'CV MAE: {-cv_scores.mean():.2f} ± {cv_scores.std():.2f}')
Model Card Authors
This model was developed as part of a portfolio project for operating room optimization using machine learning techniques.
Citation
If you use this model in your research or operations, please cite:
@misc{surgical_duration_predictor_2025,
  title={Surgical Duration Prediction using XGBoost},
  author={Your Name},
  year={2025},
  howpublished={Hugging Face Model Hub},
  note={Dataset: Kaggle Operating Room Utilization}
}
References
- Kaggle Dataset: Optimizing Operating Room Utilization
 - XGBoost Documentation: https://xgboost.readthedocs.io/
 - Recent research shows ML models can achieve MAE of 10-15 minutes for surgical duration prediction
 
Additional Resources
Model Files:
- surgical_predictor.pkl - Trained XGBoost model
- encoder_service.pkl - Service label encoder
- encoder_cpt.pkl - CPT Description label encoder
- model_info.pkl - Model metadata
Visualizations:
- Predicted vs Actual scatter plot
 - Model performance comparison chart
 - Feature importance chart
 
Contact
For questions, issues, or collaboration opportunities, please open an issue in the repository.
Changelog
Version 1.0 (October 2025)
- Initial release
 - MAE: 4.97 minutes
 - R² Score: 0.9419
 - 56.52% improvement over baseline
 
Model Status: Production Ready ✓
Last Updated: October 2025
Framework: XGBoost 2.0+
Python Version: 3.8+