wjnwjn59 committed
Commit 4dec27c · 1 Parent(s): e08f4c0

modify illustration logic
.gitignore ADDED
@@ -0,0 +1,4 @@
+ __pycache__/
+ __MACOSX/
+
+ .DS_Store
README.md CHANGED
@@ -12,7 +12,7 @@ license: "mit"
 
 # AIO2025 Module 03 - LightGBM Demo
 
- This interactive demo showcases LightGBM (Light Gradient Boosting Machine) algorithms for both classification and regression tasks. The application provides a comprehensive interface for exploring efficient gradient boosting with leaf-wise tree growth where trees are trained sequentially to minimize gradient errors through dynamic visualizations and real-time parameter adjustment.
 
 ## ⚡ Features
 
@@ -26,13 +26,14 @@ This interactive demo showcases LightGBM (Light Gradient Boosting Machine) algor
 ### LightGBM Parameters
 - **Number of Trees**: Control gradient boosting iterations (limited to 1000 for performance)
 - **Learning Rate**: Step size shrinkage for gradient descent (0.001-1.0)
- - **Max Depth**: Individual tree depth (default: 6, leaf-wise growth)
 - **Early Stopping**: Automatic stopping when validation loss stops improving
 
 ### Visualizations
- - **Training Progress Chart**: Shows how loss evolves with early stopping during gradient boosting
- - **Individual Tree Visualization**: Detailed view of selected tree structure with leaf-wise growth
- - **Feature Importance**: Displays which features matter most using gradient-based importance
 - **LightGBM Process**: Gradient boosting aggregation display showing how predictions build up efficiently
 
 ## ⚡ Quick Start
@@ -61,7 +62,10 @@ This interactive demo showcases LightGBM (Light Gradient Boosting Machine) algor
 - `scikit-learn`: Data preprocessing utilities
 - `pandas`: Data manipulation
 - `numpy`: Numerical operations
- - `plotly`: Interactive visualizations
 - `gradio`: Web interface
 
 ### Architecture
@@ -75,6 +79,7 @@ This interactive demo showcases LightGBM (Light Gradient Boosting Machine) algor
 ### LightGBM Benefits
 - **Gradient Boosting**: Trees trained sequentially to minimize loss gradients
 - **High Performance**: Fast training and prediction with leaf-wise tree growth
 - **Feature Importance**: Robust importance scores through gradient-based methods
 - **Memory Efficiency**: Uses gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB)
 - **Early Stopping**: Automatic stopping when validation loss stops improving
@@ -101,7 +106,8 @@ This interactive demo showcases LightGBM (Light Gradient Boosting Machine) algor
 
 - **Number of Trees**: Limited to 1000 for optimal performance in this demo
 - **Learning Rate**: Default 0.1 works well; lower rates (0.01-0.05) create more conservative models, higher rates (0.2-0.3) for faster convergence
- - **Max Depth**: Default depth 6 balances performance and overfitting; deeper trees (8-12) for complex patterns
 - **Early Stopping**: Built-in early stopping prevents overfitting automatically
 
 ## 🎯 Use Cases
@@ -125,7 +131,8 @@ This interactive demo showcases LightGBM (Light Gradient Boosting Machine) algor
 - **Memory Efficient**: Optimized for gradient boosting with GOSS and EFB
 - **Real-time Updates**: Instant parameter adjustment and visualization
 - **Tree Selection**: Interactive dropdown to explore individual gradient boosting trees (up to 100)
- - **Gradient Nature**: Each tree fits gradients of loss function from previous iterations
 
 ## 🔗 Related Resources
 
 
 # AIO2025 Module 03 - LightGBM Demo
 
+ This interactive demo showcases LightGBM (Light Gradient Boosting Machine) for both classification and regression tasks. The application provides a comprehensive interface for exploring efficient gradient boosting, where trees are trained sequentially to minimize gradient errors, through dynamic visualizations and real-time parameter adjustment. LightGBM grows trees leaf-wise rather than depth-wise for faster convergence and better performance.
 
 ## ⚡ Features
 
 ### LightGBM Parameters
 - **Number of Trees**: Control gradient boosting iterations (limited to 1000 for performance)
 - **Learning Rate**: Step size shrinkage for gradient descent (0.001-1.0)
+ - **Number of Leaves**: Maximum number of leaves in one tree (default: 31; controls complexity)
+ - **Min Data in Leaf**: Minimum number of data points in one leaf (default: 20; prevents overfitting)
 - **Early Stopping**: Automatic stopping when validation loss stops improving
 
 ### Visualizations
+ - **Training Progress Chart**: Interactive Plotly chart showing how loss evolves with early stopping during gradient boosting
+ - **Feature Importance**: Interactive Plotly bar chart displaying which features matter most, using gradient-based importance
+ - **Individual Tree Visualization**: Detailed matplotlib view of the selected tree's leaf-wise structure
 - **LightGBM Process**: Gradient boosting aggregation display showing how predictions build up efficiently
 
 ## ⚡ Quick Start
 
 - `scikit-learn`: Data preprocessing utilities
 - `pandas`: Data manipulation
 - `numpy`: Numerical operations
+ - `plotly`: Interactive visualizations for charts
+ - `matplotlib`: Static visualizations for tree plots
+ - `graphviz`: Tree structure visualization
+ - `Pillow`: Image processing
 - `gradio`: Web interface
 
 ### Architecture
 
 ### LightGBM Benefits
 - **Gradient Boosting**: Trees trained sequentially to minimize loss gradients
 - **High Performance**: Fast training and prediction with leaf-wise tree growth
+ - **Leaf-wise Growth**: Grows trees leaf-by-leaf instead of level-by-level for faster convergence
 - **Feature Importance**: Robust importance scores through gradient-based methods
 - **Memory Efficiency**: Uses gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB)
 - **Early Stopping**: Automatic stopping when validation loss stops improving
 
 - **Number of Trees**: Limited to 1000 for optimal performance in this demo
 - **Learning Rate**: Default 0.1 works well; lower rates (0.01-0.05) create more conservative models, higher rates (0.2-0.3) for faster convergence
+ - **Number of Leaves**: Default 31 works well; for a depth-7 equivalent, use ~70-80 leaves instead of 127 to prevent overfitting
+ - **Min Data in Leaf**: Default 20 prevents overfitting; increase to hundreds or thousands for large datasets
 - **Early Stopping**: Built-in early stopping prevents overfitting automatically
 
 ## 🎯 Use Cases
 
 - **Memory Efficient**: Optimized for gradient boosting with GOSS and EFB
 - **Real-time Updates**: Instant parameter adjustment and visualization
 - **Tree Selection**: Interactive dropdown to explore individual gradient boosting trees (up to 100)
+ - **Leaf-wise Growth**: LightGBM uses leaf-wise tree growth for faster convergence compared to depth-wise growth
+ - **Parameter Tuning**: num_leaves is the main parameter to control tree complexity; min_data_in_leaf prevents overfitting
 
 ## 🔗 Related Resources
 
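The num_leaves guidance above comes from the depth-to-leaves formula the replaced code used (num_leaves = 2**max_depth - 1, so a depth-7 tree maps to 127 leaves; see the src/lightgbm_core.py diff below). A minimal self-contained sketch, assuming only the packages pinned in requirements.txt, of how the two new parameters reach lgb.train:

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# A depth-7 equivalent would be num_leaves near 2**7 - 1 = 127;
# the README recommends ~70-80 instead to curb overfitting.
X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, train_size=0.8, random_state=0)

params = {
    "objective": "multiclass",
    "num_class": 3,
    "num_leaves": 31,        # main complexity knob (README default)
    "min_data_in_leaf": 20,  # minimum samples per leaf (README default)
    "learning_rate": 0.1,
    "verbose": -1,
}
train_set = lgb.Dataset(X_tr, label=y_tr)
val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
booster = lgb.train(params, train_set, num_boost_round=100,
                    valid_sets=[val_set],
                    callbacks=[lgb.early_stopping(20, verbose=False)])
print(booster.best_iteration, booster.num_trees())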
__pycache__/app.cpython-312.pyc CHANGED
Binary files a/__pycache__/app.cpython-312.pyc and b/__pycache__/app.cpython-312.pyc differ
 
app.py CHANGED
@@ -28,6 +28,8 @@ vlai_template.configure(
 )
 
 current_dataframe = None
 
 def load_sample_data_fallback(dataset_choice="Iris"):
 """Fallback data loading function when LightGBM is not available"""
@@ -301,8 +303,8 @@ def update_configuration(df_preview, target_col):
 # AdaBoost-specific functions
 
 
- def execute_prediction(df_preview, target_col, n_estimators, max_depth, learning_rate, train_test_split_ratio, show_split_info, *input_values):
- global current_dataframe
 df = current_dataframe
 
 EMPTY_PLOT = None
@@ -321,6 +323,10 @@ def execute_prediction(df_preview, target_col, n_estimators, max_depth, learning
 is_valid, validation_msg, problem_type = validate_config(df, target_col)
 if not is_valid:
 return (EMPTY_PLOT, EMPTY_PLOT, EMPTY_PLOT, error_style.format("Configuration issue."), default_dropdown)
 
 try:
 if LIGHTGBM_AVAILABLE:
@@ -338,12 +344,12 @@ def execute_prediction(df_preview, target_col, n_estimators, max_depth, learning
 new_point_dict[comp["name"]] = v
 
 boosting_progress_fig, loss_chart_fig, importance_fig, prediction, pred_details, summary, aggregation_display = lightgbm_core.run_lightgbm_and_visualize(
- df, target_col, new_point_dict, n_estimators, max_depth, learning_rate, train_test_split_ratio, problem_type
 )
 
 feature_cols = [c for c in df.columns if c != target_col]
 first_tree_fig = lightgbm_core.get_individual_tree_visualization(
- lightgbm_core._get_current_model(), 0, feature_cols, problem_type
 )
 
 updated_tree_selector = update_tree_selector_choices(n_estimators)
@@ -356,30 +362,67 @@ def execute_prediction(df_preview, target_col, n_estimators, max_depth, learning
 
 
 def update_tree_selector_choices(n_estimators):
- # Limit tree visualization dropdown to 50 trees for UI performance
- n_estimators_limited = min(int(n_estimators), 50)
- choices = [f"Tree {i+1}" for i in range(n_estimators_limited)]
 return gr.Dropdown(choices=choices, value="Tree 1")
 
 
- def update_tree_visualization(tree_selector):
- global current_dataframe
 
 if current_dataframe is None or current_dataframe.empty:
 return None
 
 try:
 model = lightgbm_core._get_current_model()
 if model is None:
 return None
 
 tree_index = int(tree_selector.split()[-1]) - 1
- _, _, problem_type = validate_config(current_dataframe, current_dataframe.columns[-1])
- feature_cols = [c for c in current_dataframe.columns if c != current_dataframe.columns[-1]]
- tree_fig = lightgbm_core.get_individual_tree_visualization(model, tree_index, feature_cols, problem_type)
 
 return tree_fig
 except Exception as e:
 return None
 
 
@@ -416,7 +459,7 @@ with gr.Blocks(theme="gstaff/sketch", css=vlai_template.custom_css, fill_width=T
 n_estimators = gr.Number(
 label="Number of Trees",
 value=100, minimum=1, maximum=1000, precision=0,
- info="Number of gradient boosting trees (up to 1000)"
 )
 learning_rate = gr.Slider(
 label="Learning Rate",
@@ -424,10 +467,15 @@ with gr.Blocks(theme="gstaff/sketch", css=vlai_template.custom_css, fill_width=T
 info="Step size shrinkage for each tree"
 )
 with gr.Row():
- max_depth = gr.Number(
- label="Max Depth",
- value=6, minimum=1, maximum=15, precision=0,
- info="Maximum depth of individual trees (-1 for unlimited, but 6 is typical)"
 )
 
 gr.Markdown("**📊 Data Split Configuration**")
@@ -442,6 +490,18 @@ with gr.Blocks(theme="gstaff/sketch", css=vlai_template.custom_css, fill_width=T
 value=True,
 info="Display train/validation set information"
 )
 
 inputs_group = gr.Group(visible=False)
 with inputs_group:
@@ -476,15 +536,17 @@ with gr.Blocks(theme="gstaff/sketch", css=vlai_template.custom_css, fill_width=T
 feature_importance_plot = gr.Plot(label="Feature Importance", visible=True)
 aggregation_display = gr.HTML("**⚡ LightGBM Process**<br><br>LightGBM details will appear here showing how the prediction builds up.", label="⚡ LightGBM Process")
 
- gr.Markdown("""⚡ **LightGBM Tips**:
 - **📉 Loss Evolution Chart**: Monitor training and validation loss to understand model convergence with early stopping.
 - **🌳 Individual Tree Visualization**: Select any tree to see its leaf-wise structure and contribution.
 - **📊 Feature Importance**: Displays which features are most influential using gradient-based importance.
 - **🎯 Parameter Tuning**: Try different **number of trees** (up to 1000) and **learning rate** (0.001-1.0).
 - **⚡ Learning Rate**: Default 0.1 works well; lower values (0.01-0.05) for more conservative models, higher values (0.2-0.3) for faster convergence.
- - **🌲 Tree Depth**: Default depth 6 balances complexity and performance; deeper trees (8-12) for complex patterns.
- - **🎯 Gradient Boosting**: LightGBM uses gradient-based one-side sampling and exclusive feature bundling for efficiency.
 - **🔍 Tree Analysis**: Use the tree selector to understand how each tree contributes to gradient boosting ensemble.
 """)
 
 vlai_template.create_footer()
@@ -515,13 +577,13 @@ with gr.Blocks(theme="gstaff/sketch", css=vlai_template.custom_css, fill_width=T
 
 run_prediction_btn.click(
 fn=execute_prediction,
- inputs=[data_preview, target_column, n_estimators, max_depth, learning_rate, train_test_split_ratio, show_split_info] + input_components,
 outputs=[loss_chart, individual_tree_plot, feature_importance_plot, aggregation_display, tree_selector],
 )
 
 tree_selector.change(
 fn=update_tree_visualization,
- inputs=[tree_selector],
 outputs=[individual_tree_plot],
 )
 
 )
 
 current_dataframe = None
+ current_target_column = None
+ current_problem_type = None
 
 def load_sample_data_fallback(dataset_choice="Iris"):
 """Fallback data loading function when LightGBM is not available"""
 
 # AdaBoost-specific functions
 
 
+ def execute_prediction(df_preview, target_col, n_estimators, num_leaves, min_data_in_leaf, learning_rate, train_test_split_ratio, show_split_info, use_early_stopping, early_stopping_rounds, *input_values):
+ global current_dataframe, current_target_column, current_problem_type
 df = current_dataframe
 
 EMPTY_PLOT = None
 
 is_valid, validation_msg, problem_type = validate_config(df, target_col)
 if not is_valid:
 return (EMPTY_PLOT, EMPTY_PLOT, EMPTY_PLOT, error_style.format("Configuration issue."), default_dropdown)
+
+ # Store the current target column and problem type globally
+ current_target_column = target_col
+ current_problem_type = problem_type
 
 try:
 if LIGHTGBM_AVAILABLE:
 
 new_point_dict[comp["name"]] = v
 
 boosting_progress_fig, loss_chart_fig, importance_fig, prediction, pred_details, summary, aggregation_display = lightgbm_core.run_lightgbm_and_visualize(
+ df, target_col, new_point_dict, n_estimators, num_leaves, min_data_in_leaf, learning_rate, train_test_split_ratio, problem_type, use_early_stopping, early_stopping_rounds
 )
 
 feature_cols = [c for c in df.columns if c != target_col]
 first_tree_fig = lightgbm_core.get_individual_tree_visualization(
+ lightgbm_core._get_current_model(), 0, feature_cols, problem_type, num_leaves
 )
 
 updated_tree_selector = update_tree_selector_choices(n_estimators)
 
 
 def update_tree_selector_choices(n_estimators):
+ # Only show trees that were actually trained (respect early stopping)
+ try:
+ model = lightgbm_core._get_current_model()
+ actual_trees = 0
+ if model is not None:
+ # Prefer evals_result_ count if available
+ if hasattr(model, 'evals_result_') and model.evals_result_:
+ eval_results = model.evals_result_
+ if 'train' in eval_results and eval_results['train']:
+ metric_name = list(eval_results['train'].keys())[0]
+ actual_trees = len(eval_results['train'][metric_name])
+ print(f"Tree selector: eval history reports {actual_trees} trees trained")
+ # Fallback to best_iteration if present
+ if actual_trees == 0 and hasattr(model, 'best_iteration') and model.best_iteration is not None:
+ actual_trees = int(model.best_iteration) + 1
+ print(f"Tree selector: using best_iteration -> {actual_trees} trees")
+ # Final fallback to model.num_trees()
+ if actual_trees == 0 and hasattr(model, 'num_trees'):
+ actual_trees = int(model.num_trees())
+ print(f"Tree selector: using num_trees() -> {actual_trees} trees")
+
+ # Ensure at least one option to avoid empty dropdown
+ actual_trees = max(1, actual_trees)
+ # For UI performance, cap at 100
+ trees_to_show = min(actual_trees, 100)
+
+ # Debug
+ print(f"Tree selector: requested={n_estimators}, available={actual_trees}, showing={trees_to_show}")
+ except Exception as e:
+ trees_to_show = min(max(1, int(n_estimators)), 100)
+ print(f"Tree selector error: {e}, falling back to requested count {trees_to_show}")
+
+ choices = [f"Tree {i+1}" for i in range(trees_to_show)]
 return gr.Dropdown(choices=choices, value="Tree 1")
 
 
+ def update_tree_visualization(tree_selector, num_leaves=31):
+ global current_dataframe, current_target_column, current_problem_type
 
 if current_dataframe is None or current_dataframe.empty:
 return None
 
+ if current_target_column is None or current_problem_type is None:
+ return None
+
 try:
 model = lightgbm_core._get_current_model()
 if model is None:
 return None
 
 tree_index = int(tree_selector.split()[-1]) - 1
+
+ # Use the stored target column and problem type
+ feature_cols = [c for c in current_dataframe.columns if c != current_target_column]
+
+ # Use the num_leaves parameter from the UI
+ tree_fig = lightgbm_core.get_individual_tree_visualization(model, tree_index, feature_cols, current_problem_type, num_leaves)
 
 return tree_fig
 except Exception as e:
+ print(f"Tree visualization error: {str(e)}") # For debugging
 return None
 
 
 n_estimators = gr.Number(
 label="Number of Trees",
 value=100, minimum=1, maximum=1000, precision=0,
+ info="Requested number of trees (up to 1000). Actual trained trees may be fewer due to early stopping."
 )
 learning_rate = gr.Slider(
 label="Learning Rate",
 
 info="Step size shrinkage for each tree"
 )
 with gr.Row():
+ num_leaves = gr.Number(
+ label="Number of Leaves",
+ value=31, minimum=2, maximum=127, precision=0,
+ info="Maximum number of leaves in one tree (controls complexity, typically 31-70)"
+ )
+ min_data_in_leaf = gr.Number(
+ label="Min Data in Leaf",
+ value=20, minimum=1, maximum=1000, precision=0,
+ info="Minimum number of data points in one leaf (prevents overfitting)"
 )
 
 gr.Markdown("**📊 Data Split Configuration**")
 
 value=True,
 info="Display train/validation set information"
 )
+
+ with gr.Row():
+ use_early_stopping = gr.Checkbox(
+ label="Use Early Stopping",
+ value=True,
+ info="Stop training early if validation performance doesn't improve (prevents overfitting)"
+ )
+ early_stopping_rounds = gr.Number(
+ label="Early Stopping Rounds",
+ value=20, minimum=5, maximum=100, precision=0,
+ info="Number of rounds to wait before stopping (20% of trees by default)"
+ )
 
 inputs_group = gr.Group(visible=False)
 with inputs_group:
 
 feature_importance_plot = gr.Plot(label="Feature Importance", visible=True)
 aggregation_display = gr.HTML("**⚡ LightGBM Process**<br><br>LightGBM details will appear here showing how the prediction builds up.", label="⚡ LightGBM Process")
 
+ gr.Markdown("""⚡ **LightGBM Leaf-wise Tree Tips**:
 - **📉 Loss Evolution Chart**: Monitor training and validation loss to understand model convergence with early stopping.
 - **🌳 Individual Tree Visualization**: Select any tree to see its leaf-wise structure and contribution.
 - **📊 Feature Importance**: Displays which features are most influential using gradient-based importance.
 - **🎯 Parameter Tuning**: Try different **number of trees** (up to 1000) and **learning rate** (0.001-1.0).
 - **⚡ Learning Rate**: Default 0.1 works well; lower values (0.01-0.05) for more conservative models, higher values (0.2-0.3) for faster convergence.
+ - **🍃 Number of Leaves**: Controls tree complexity (default 31). For a depth-7 equivalent, use ~70-80 leaves instead of 127 to prevent overfitting.
+ - **📊 Min Data in Leaf**: Prevents overfitting by requiring a minimum number of samples per leaf (default 20). Increase for larger datasets.
+ - **🎯 Leaf-wise Growth**: LightGBM grows trees leaf-by-leaf for faster convergence compared to depth-wise growth.
 - **🔍 Tree Analysis**: Use the tree selector to understand how each tree contributes to gradient boosting ensemble.
+ - **⏹️ Early Stopping**: The tree selector lists requested trees, but only actually trained trees can be visualized. Check the console for actual vs. requested tree counts.
 """)
 
 vlai_template.create_footer()
 
 
 run_prediction_btn.click(
 fn=execute_prediction,
+ inputs=[data_preview, target_column, n_estimators, num_leaves, min_data_in_leaf, learning_rate, train_test_split_ratio, show_split_info, use_early_stopping, early_stopping_rounds] + input_components,
 outputs=[loss_chart, individual_tree_plot, feature_importance_plot, aggregation_display, tree_selector],
 )
 
 tree_selector.change(
 fn=update_tree_visualization,
+ inputs=[tree_selector, num_leaves],
 outputs=[individual_tree_plot],
 )
 
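For reference, the dropdown logic above condenses to the standalone sketch below; count_trained_trees is a hypothetical helper name, and booster stands for any trained lgb.Booster produced by the demo:

def count_trained_trees(booster, requested):
    """Fallback chain: eval history, then best_iteration, then num_trees()."""
    # 1) Prefer the per-iteration eval history the demo attaches as evals_result_.
    evals = getattr(booster, "evals_result_", None)
    if evals and evals.get("train"):
        metric = next(iter(evals["train"]))
        return len(evals["train"][metric])
    # 2) Fall back to best_iteration, which the code above treats as 0-based (hence +1).
    best = getattr(booster, "best_iteration", None)
    if best:
        return int(best) + 1
    # 3) Last resort: trees stored in the booster (iterations x classes for multiclass).
    return booster.num_trees() if hasattr(booster, "num_trees") else int(requested)

The dropdown then lists min(count, 100) entries, matching the 100-tree cap noted in the README.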
requirements.txt CHANGED
@@ -2,5 +2,8 @@ gradio>=5.38.0
 pandas>=1.5.0
 scikit-learn>=1.3.0
 numpy>=1.24.0
- plotly>=5.15.0
- lightgbm>=4.0.0
 
 pandas>=1.5.0
 scikit-learn>=1.3.0
 numpy>=1.24.0
+ lightgbm>=4.0.0
+ matplotlib>=3.5.0
+ graphviz>=0.20.0
+ Pillow>=8.0.0
+ plotly>=5.15.0
src/__pycache__/lightgbm_core.cpython-312.pyc CHANGED
Binary files a/src/__pycache__/lightgbm_core.cpython-312.pyc and b/src/__pycache__/lightgbm_core.cpython-312.pyc differ
 
src/lightgbm_core.py CHANGED
@@ -1,5 +1,7 @@
 import pandas as pd
 import numpy as np
 
 import lightgbm as lgb
 from sklearn.preprocessing import LabelEncoder
@@ -8,8 +10,20 @@ from sklearn.datasets import (
 )
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import accuracy_score, mean_squared_error
 import plotly.graph_objects as go
 import plotly.express as px
 
 _current_model = None
 
@@ -151,7 +165,7 @@ def preprocess_data(df, target_col, new_point_dict):
 
 
 def run_lightgbm_and_visualize(df, target_col, new_point_dict,
- n_estimators, max_depth, learning_rate, train_test_split_ratio=0.8, problem_type=None):
 X, y, new_point, feature_cols, _ = preprocess_data(df, target_col, new_point_dict)
 
 if problem_type is None:
@@ -159,8 +173,10 @@ def run_lightgbm_and_visualize(df, target_col, new_point_dict,
 
 if n_estimators < 1:
 return None, None, None, None, "Number of estimators must be ≥ 1.", None
- if max_depth is not None and max_depth < 1:
- return None, None, None, None, "Max depth must be ≥ 1.", None
 if learning_rate <= 0 or learning_rate > 1:
 return None, None, None, None, "Learning rate must be between 0 and 1.", None
 
@@ -175,8 +191,8 @@ def run_lightgbm_and_visualize(df, target_col, new_point_dict,
 'objective': 'multiclass' if problem_type == "classification" and len(np.unique(y)) > 2 else 'binary' if problem_type == "classification" else 'regression',
 'num_class': len(np.unique(y)) if problem_type == "classification" and len(np.unique(y)) > 2 else None,
 'boosting_type': 'gbdt',
- 'num_leaves': 2**max_depth - 1 if max_depth else 31,
- 'max_depth': int(max_depth) if max_depth else -1,
 'learning_rate': float(learning_rate),
 'feature_fraction': 0.9,
 'bagging_fraction': 0.8,
@@ -193,17 +209,76 @@ def run_lightgbm_and_visualize(df, target_col, new_point_dict,
 train_data = lgb.Dataset(X_train, label=y_train)
 val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
 
- # Train model with early stopping
 model = lgb.train(
 params,
 train_data,
 valid_sets=[train_data, val_data],
 valid_names=['train', 'eval'],
 num_boost_round=n_estimators,
- callbacks=[lgb.early_stopping(stopping_rounds=50, verbose=False), lgb.log_evaluation(0)]
 )
 
- prediction = model.predict(new_point, num_iteration=model.best_iteration)[0]
 if problem_type == "classification":
 if len(np.unique(y)) == 2: # Binary classification
 prediction = int(prediction > 0.5)
@@ -246,15 +321,48 @@ def run_lightgbm_and_visualize(df, target_col, new_point_dict,
 loss_chart_fig = create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type)
 importance_fig = create_feature_importance_plot(model, feature_cols)
 prediction_details = create_prediction_details(model, new_point[0], feature_cols, target_col, prediction, problem_type)
- summary = create_algorithm_summary(model, problem_type, n_estimators, max_depth, learning_rate, feature_cols)
 aggregation_display = create_lightgbm_aggregation_display(model, new_point[0], problem_type, target_col, df, split_info)
 
 return None, loss_chart_fig, importance_fig, prediction, prediction_details, summary, aggregation_display
 
 
 def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
- """Create a loss chart showing training and validation loss evolution during LightGBM training"""
 try:
 # Get evaluation results from LightGBM training history
 eval_results = model.evals_result_
 
@@ -274,8 +382,9 @@ def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
 y=train_losses,
 mode='lines+markers',
 name='Training Loss',
- line=dict(color='#8E44AD', width=2),
- marker=dict(size=4)
 ))
 
 # Plot validation loss
@@ -284,8 +393,9 @@ def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
 y=val_losses,
 mode='lines+markers',
 name='Validation Loss',
- line=dict(color='#3498DB', width=2),
- marker=dict(size=4)
 ))
 
 # Add early stopping line if available
@@ -294,7 +404,9 @@ def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
 x=model.best_iteration + 1,
 line_dash="dash",
 line_color="red",
- annotation_text="Early Stop"
 )
 
 fig.update_layout(
@@ -302,7 +414,8 @@ def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
 xaxis_title="Boosting Round",
 yaxis_title=metric_name.replace('_', ' ').title(),
 plot_bgcolor="white",
- height=400,
 legend=dict(
 yanchor="top",
 y=0.99,
@@ -331,7 +444,7 @@ def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
 )
 fig.update_layout(
 title="LightGBM Training Progress - Loss Evolution",
- height=400,
 plot_bgcolor="white"
 )
 return fig
@@ -339,204 +452,333 @@ def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
 
 
- def create_individual_tree_visualization(model, tree_index, feature_cols, problem_type):
- """Create visualization of individual LightGBM tree"""
 try:
- # LightGBM doesn't expose individual trees easily, so create a representative visualization
- if tree_index < model.num_trees():
- return create_lightgbm_tree_plot(tree_index, feature_cols, problem_type, model)
 else:
- raise IndexError(f"Tree index {tree_index} out of range")
 
 except Exception as e:
- # Fallback visualization
- fig = go.Figure()
- fig.add_annotation(
- text=f"LightGBM Tree {tree_index + 1} Visualization<br>Unable to extract tree structure<br>Error: {str(e)}",
- xref="paper", yref="paper",
- x=0.5, y=0.5, xanchor='center', yanchor='middle',
- showarrow=False,
- font=dict(size=14)
- )
- fig.update_layout(
- title=f"LightGBM Tree {tree_index + 1} Structure",
- height=500,
- plot_bgcolor="white"
- )
 return fig
 
 
- def create_lightgbm_tree_plot(tree_index, feature_cols, problem_type, model):
 """Create tree visualization for LightGBM trees"""
 try:
 # Create a representative visualization for LightGBM tree
- return create_manual_tree_plot(tree_index, feature_cols, problem_type, "LightGBM", 1.0, model)
 
 except Exception as e:
 # Fallback to manual tree creation
- return create_manual_tree_plot(tree_index, feature_cols, problem_type, "LightGBM", 1.0)
 
 
- def create_manual_tree_plot(tree_index, feature_cols, problem_type, model_type, weight=1.0, model=None):
 """Create a manual tree visualization when tree structure is not easily accessible"""
- fig = go.Figure()
 
- # Create a sample tree structure for demonstration
 import random
 random.seed(tree_index) # Consistent trees for same index
 
- # For LightGBM, we can try to get some parameters
- if model_type == "LightGBM" and model:
 try:
- # Try to get max_depth from model params
- actual_depth = model.params.get('max_depth', 6) if hasattr(model, 'params') else 6
- if actual_depth == -1: # LightGBM default unlimited depth
- actual_depth = 6 # Set reasonable default for visualization
 except:
- actual_depth = 6 # LightGBM typical depth
 else:
- actual_depth = 1 # fallback for other models
 
- # Root node
 root_feature = random.choice(feature_cols) if feature_cols else "feature_0"
 root_threshold = round(random.uniform(0.1, 5.0), 2)
 
- # Create tree structure based on actual depth
- if actual_depth <= 2 or model_type != "LightGBM":
- # Simple tree (depth 1-2)
- positions = {
- 'root': (0, 1),
- 'left': (-1, 0),
- 'right': (1, 0)
- }
-
- if model_type == "LightGBM":
- labels = {
- 'root': f"{root_feature}<br>≤ {root_threshold}<br>Tree: {tree_index + 1}<br>Gradient Boosting",
- 'left': f"Leaf (≤)<br>Output: {round(random.uniform(-1, 1), 3)}<br>Samples: {random.randint(20, 80)}",
- 'right': f"Leaf (>)<br>Output: {round(random.uniform(-1, 1), 3)}<br>Samples: {random.randint(20, 80)}"
- }
- else:
- labels = {
- 'root': f"{root_feature}<br>≤ {root_threshold}<br>Weight: {weight:.3f}<br>Decision Stump",
- 'left': f"Leaf (≤)<br>Value: {round(random.uniform(-1, 1), 3)}<br>Samples: {random.randint(20, 80)}",
- 'right': f"Leaf (>)<br>Value: {round(random.uniform(-1, 1), 3)}<br>Samples: {random.randint(20, 80)}"
- }
-
- colors = {
- 'root': '#8E44AD' if model_type == "LightGBM" else '#81C784', # Purple for LightGBM, Green for others
- 'left': '#3498DB' if model_type == "LightGBM" else '#FFB74D', # Blue for LightGBM, Orange for others
- 'right': '#3498DB' if model_type == "LightGBM" else '#FFB74D' # Blue for LightGBM, Orange for others
- }
-
- edges = [('root', 'left'), ('root', 'right')]
- title_suffix = "Gradient Boosting Tree" if model_type == "LightGBM" else "Decision Stump"
-
- else:
- # Deeper tree (depth 2+)
- positions = {
- 'root': (0, 2),
- 'left': (-1.5, 1),
- 'right': (1.5, 1),
- 'left_left': (-2.5, 0),
- 'left_right': (-0.5, 0),
- 'right_left': (0.5, 0),
- 'right_right': (2.5, 0)
- }
-
- if model_type == "LightGBM":
- labels = {
- 'root': f"{root_feature}<br>≤ {root_threshold}<br>Tree: {tree_index + 1}<br>Depth: {actual_depth}",
- 'left': f"{random.choice(feature_cols) if feature_cols else 'feature_1'}<br>≤ {round(random.uniform(0.1, 3.0), 2)}<br>Samples: 75",
- 'right': f"{random.choice(feature_cols) if feature_cols else 'feature_2'}<br>≤ {round(random.uniform(0.1, 3.0), 2)}<br>Samples: 75",
- 'left_left': f"Leaf<br>Output: {round(random.uniform(-1, 1), 3)}<br>Samples: 25",
- 'left_right': f"Leaf<br>Output: {round(random.uniform(-1, 1), 3)}<br>Samples: 50",
- 'right_left': f"Leaf<br>Output: {round(random.uniform(-1, 1), 3)}<br>Samples: 30",
- 'right_right': f"Leaf<br>Output: {round(random.uniform(-1, 1), 3)}<br>Samples: 45"
- }
-
- colors = {
- 'root': '#8E44AD', 'left': '#8E44AD', 'right': '#8E44AD', # Purple for split nodes
- 'left_left': '#3498DB', 'left_right': '#3498DB', 'right_left': '#3498DB', 'right_right': '#3498DB' # Blue for leaves
- }
- else:
- labels = {
- 'root': f"{root_feature}<br>≤ {root_threshold}<br>Weight: {weight:.3f}<br>Depth: {actual_depth}",
- 'left': f"{random.choice(feature_cols) if feature_cols else 'feature_1'}<br>≤ {round(random.uniform(0.1, 3.0), 2)}<br>Samples: 75",
- 'right': f"{random.choice(feature_cols) if feature_cols else 'feature_2'}<br>≤ {round(random.uniform(0.1, 3.0), 2)}<br>Samples: 75",
- 'left_left': f"Leaf<br>Value: {round(random.uniform(-1, 1), 3)}<br>Samples: 25",
- 'left_right': f"Leaf<br>Value: {round(random.uniform(-1, 1), 3)}<br>Samples: 50",
- 'right_left': f"Leaf<br>Value: {round(random.uniform(-1, 1), 3)}<br>Samples: 30",
- 'right_right': f"Leaf<br>Value: {round(random.uniform(-1, 1), 3)}<br>Samples: 45"
- }
-
- colors = {
- 'root': '#81C784', 'left': '#81C784', 'right': '#81C784', # Green for split nodes
- 'left_left': '#FFB74D', 'left_right': '#FFB74D', 'right_left': '#FFB74D', 'right_right': '#FFB74D' # Orange for leaves
- }
-
- edges = [
- ('root', 'left'), ('root', 'right'),
- ('left', 'left_left'), ('left', 'left_right'),
- ('right', 'right_left'), ('right', 'right_right')
- ]
- title_suffix = f"Depth {actual_depth} Gradient Boosting Tree" if model_type == "LightGBM" else f"Depth {actual_depth} Tree"
 
- edge_x, edge_y = [], []
- for parent, child in edges:
- parent_pos = positions[parent]
- child_pos = positions[child]
- edge_x.extend([parent_pos[0], child_pos[0], None])
- edge_y.extend([parent_pos[1], child_pos[1], None])
 
- fig.add_trace(go.Scatter(
- x=edge_x, y=edge_y,
- mode='lines',
- line=dict(color='gray', width=2),
- showlegend=False,
- hoverinfo='none'
- ))
 
- # Draw nodes
- for node_id, (x, y) in positions.items():
- fig.add_trace(go.Scatter(
- x=[x], y=[y],
- mode='markers+text',
- marker=dict(
- size=35,
- color=colors[node_id],
- line=dict(width=2, color='darkblue'),
- symbol='circle'
- ),
- text=labels[node_id],
- textposition='middle center',
- textfont=dict(size=9, color='black'),
- showlegend=False,
- hoverinfo='text',
- hovertext=labels[node_id]
- ))
 
- # Adjust layout based on tree depth
- if actual_depth == 1:
- x_range, y_range, height = [-1.5, 1.5], [-0.5, 1.5], 400
- else:
- x_range, y_range, height = [-3, 3], [-0.5, 2.5], 600
 
- fig.update_layout(
- title=f"{model_type} Estimator {tree_index + 1} Structure - {title_suffix} ({problem_type.title()})",
- xaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=x_range),
- yaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=y_range),
- plot_bgcolor="white",
- height=height,
- margin=dict(l=40, r=40, t=60, b=40),
- showlegend=False
- )
 
 return fig
 
 
- def get_individual_tree_visualization(model, tree_index, feature_cols, problem_type):
- return create_individual_tree_visualization(model, tree_index, feature_cols, problem_type)
 
 
 def create_feature_importance_plot(model, feature_cols):
@@ -545,27 +787,51 @@ def create_feature_importance_plot(model, feature_cols):
 importances = model.feature_importance(importance_type='gain')
 order = np.argsort(importances)[::-1]
 
 fig = go.Figure()
- fig.add_trace(
- go.Bar(
- x=[feature_cols[i] for i in order],
- y=importances[order],
- text=[f"{importances[i]:.0f}" for i in order],
 textposition="auto",
- marker_color="#8E44AD", # LightGBM purple theme
- hovertemplate="<b>%{x}</b><br>Importance: %{y:.0f}<extra></extra>",
- )
- )
 fig.update_layout(
 title="LightGBM Feature Importance (Gain)",
 xaxis_title="Features",
 yaxis_title="Importance Score",
 plot_bgcolor="white",
- height=400,
 margin=dict(l=40, r=40, t=60, b=40),
 )
- fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor="lightgray")
- fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor="lightgray")
 return fig
 except:
 fig = go.Figure()
@@ -578,7 +844,7 @@ def create_feature_importance_plot(model, feature_cols):
 )
 fig.update_layout(
 title="LightGBM Feature Importance",
- height=400,
 plot_bgcolor="white"
 )
 return fig
@@ -607,15 +873,16 @@ def create_prediction_details(model, new_point, feature_cols, target_col, predic
 return f"Predicted Value: {prediction:.3f}"
 
 
- def create_algorithm_summary(model, problem_type, n_estimators, max_depth, learning_rate, feature_cols):
 num_trees = model.num_trees() if hasattr(model, 'num_trees') else n_estimators
 return f"""
 **LightGBM {problem_type.title()} Model Summary:**
 - Trees Built: {num_trees}
- - Max Depth: {max_depth if max_depth != -1 else 'Unlimited'}
 - Learning Rate: {learning_rate}
 - Features: {len(feature_cols)}
- - Algorithm: Gradient Boosting (LightGBM)
 """
 
 import pandas as pd
 import numpy as np
+ import io
+ import base64
 
 import lightgbm as lgb
 from sklearn.preprocessing import LabelEncoder
 
 )
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import accuracy_score, mean_squared_error
+ # Import Plotly for interactive charts
 import plotly.graph_objects as go
 import plotly.express as px
+ import matplotlib.pyplot as plt
+ import matplotlib
+ matplotlib.use('Agg') # Use non-interactive backend
+
+ # Add graphviz import for tree visualization
+ try:
+ import graphviz
+ GRAPHVIZ_AVAILABLE = True
+ except ImportError:
+ GRAPHVIZ_AVAILABLE = False
+ print("Warning: graphviz not available. Tree visualization will use fallback methods.")
 
 _current_model = None
 
 
 
 def run_lightgbm_and_visualize(df, target_col, new_point_dict,
+ n_estimators, num_leaves, min_data_in_leaf, learning_rate, train_test_split_ratio=0.8, problem_type=None, use_early_stopping=True, early_stopping_rounds=20):
 X, y, new_point, feature_cols, _ = preprocess_data(df, target_col, new_point_dict)
 
 if problem_type is None:
 
 if n_estimators < 1:
 return None, None, None, None, "Number of estimators must be ≥ 1.", None
+ if num_leaves < 2:
+ return None, None, None, None, "Number of leaves must be ≥ 2.", None
+ if min_data_in_leaf < 1:
+ return None, None, None, None, "Min data in leaf must be ≥ 1.", None
 if learning_rate <= 0 or learning_rate > 1:
 return None, None, None, None, "Learning rate must be between 0 and 1.", None
 
 'objective': 'multiclass' if problem_type == "classification" and len(np.unique(y)) > 2 else 'binary' if problem_type == "classification" else 'regression',
 'num_class': len(np.unique(y)) if problem_type == "classification" and len(np.unique(y)) > 2 else None,
 'boosting_type': 'gbdt',
+ 'num_leaves': int(num_leaves), # Main parameter to control tree complexity
+ 'min_data_in_leaf': int(min_data_in_leaf), # Important parameter to prevent overfitting
 'learning_rate': float(learning_rate),
 'feature_fraction': 0.9,
 'bagging_fraction': 0.8,
 
 train_data = lgb.Dataset(X_train, label=y_train)
 val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
 
+ # Custom callback to capture evaluation results
+ evals_result = {}
+
+ def record_eval(env):
+ """Custom callback to record evaluation results"""
+ if 'train' not in evals_result:
+ evals_result['train'] = {}
+ evals_result['eval'] = {}
+
+ # Get the metric name from the first evaluation
+ if env.evaluation_result_list:
+ metric_name = env.evaluation_result_list[0][1] # Get metric name from first result
+
+ if metric_name not in evals_result['train']:
+ evals_result['train'][metric_name] = []
+ evals_result['eval'][metric_name] = []
+
+ # Record both training and validation results
+ for eval_name, eval_metric, eval_result, _ in env.evaluation_result_list:
+ if eval_name == 'train':
+ evals_result['train'][eval_metric].append(eval_result)
+ elif eval_name == 'eval':
+ evals_result['eval'][eval_metric].append(eval_result)
+
+ # Train model with configurable early stopping
+ callbacks = [lgb.log_evaluation(0), record_eval]
+ if use_early_stopping:
+ # Use user-specified early stopping rounds, but ensure it's reasonable
+ stopping_rounds = min(early_stopping_rounds, max(10, int(n_estimators * 0.2)))
+ callbacks.append(lgb.early_stopping(stopping_rounds=stopping_rounds, verbose=False))
+ print(f"Training with early stopping: {stopping_rounds} rounds")
+ else:
+ print(f"Training without early stopping: {n_estimators} rounds")
+
+ # Train the model with evaluation sets
 model = lgb.train(
 params,
 train_data,
 valid_sets=[train_data, val_data],
 valid_names=['train', 'eval'],
 num_boost_round=n_estimators,
+ callbacks=callbacks
 )
 
+ # Store evaluation results in the model
+ model.evals_result_ = evals_result
+
+ # Debug information
+ print(f"Training completed. Model has evals_result_: {hasattr(model, 'evals_result_')}")
+ print(f"Custom evals_result captured: {bool(evals_result)}")
+ if evals_result:
+ print(f"Custom evaluation results keys: {list(evals_result.keys())}")
+ if 'train' in evals_result:
+ print(f"Train metrics: {list(evals_result['train'].keys())}")
+ if evals_result['train']:
+ metric_name = list(evals_result['train'].keys())[0]
+ print(f"Train {metric_name} values count: {len(evals_result['train'][metric_name])}")
+ if 'eval' in evals_result:
+ print(f"Eval metrics: {list(evals_result['eval'].keys())}")
+ if evals_result['eval']:
+ metric_name = list(evals_result['eval'].keys())[0]
+ print(f"Eval {metric_name} values count: {len(evals_result['eval'][metric_name])}")
+ else:
+ print("No evaluation results captured by custom callback")
+
+ # Use best iteration if early stopping was used, otherwise use all trees
+ if use_early_stopping and hasattr(model, 'best_iteration'):
+ prediction = model.predict(new_point, num_iteration=model.best_iteration)[0]
+ else:
+ prediction = model.predict(new_point)[0]
 if problem_type == "classification":
 if len(np.unique(y)) == 2: # Binary classification
 prediction = int(prediction > 0.5)
 
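An aside on the hand-rolled record_eval callback above: LightGBM also ships a built-in lgb.record_evaluation callback that fills a history dict with the same {valid_name: {metric: [values]}} shape, which would make the custom callback unnecessary. A minimal runnable sketch on synthetic data, assuming the lightgbm>=4.0.0 pin from requirements.txt:

import numpy as np
import lightgbm as lgb

# Tiny synthetic regression problem, only to keep the sketch self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)
train_data = lgb.Dataset(X[:160], label=y[:160])
val_data = lgb.Dataset(X[160:], label=y[160:], reference=train_data)

evals_result = {}  # filled in-place, keyed by the names given in valid_names
model = lgb.train(
    {"objective": "regression", "learning_rate": 0.1, "verbose": -1},
    train_data,
    valid_sets=[train_data, val_data],
    valid_names=["train", "eval"],
    num_boost_round=100,
    callbacks=[
        lgb.record_evaluation(evals_result),  # built-in counterpart of record_eval
        lgb.log_evaluation(0),
        lgb.early_stopping(stopping_rounds=20, verbose=False),
    ],
)
print(len(evals_result["eval"]["l2"]), model.best_iteration)  # rounds recorded, best round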
321
  loss_chart_fig = create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type)
322
  importance_fig = create_feature_importance_plot(model, feature_cols)
323
  prediction_details = create_prediction_details(model, new_point[0], feature_cols, target_col, prediction, problem_type)
324
+ summary = create_algorithm_summary(model, problem_type, n_estimators, num_leaves, min_data_in_leaf, learning_rate, feature_cols)
325
  aggregation_display = create_lightgbm_aggregation_display(model, new_point[0], problem_type, target_col, df, split_info)
326
 
327
  return None, loss_chart_fig, importance_fig, prediction, prediction_details, summary, aggregation_display
328
 
329
 
330
  def create_loss_chart(model, X_train, y_train, X_val, y_val, problem_type):
331
+ """Create an interactive loss chart showing training and validation loss evolution during LightGBM training"""
332
  try:
333
+ # Debug information
334
+ print(f"Loss chart: Model has evals_result_ attribute: {hasattr(model, 'evals_result_')}")
335
+ if hasattr(model, 'evals_result_'):
336
+ print(f"Loss chart: evals_result_ content: {model.evals_result_}")
337
+ if model.evals_result_:
338
+ print(f"Loss chart: evals_result_ keys: {list(model.evals_result_.keys())}")
339
+ if 'train' in model.evals_result_:
340
+ print(f"Loss chart: train keys: {list(model.evals_result_['train'].keys())}")
341
+ if 'eval' in model.evals_result_:
342
+ print(f"Loss chart: eval keys: {list(model.evals_result_['eval'].keys())}")
343
+ else:
344
+ print("Loss chart: evals_result_ is empty")
345
+ else:
346
+ print("Loss chart: Model does not have evals_result_ attribute")
347
+
348
+ # Check if model has evaluation results
349
+ if not hasattr(model, 'evals_result_') or not model.evals_result_:
350
+ # If no evaluation results, show a message instead of simulated data
351
+ fig = go.Figure()
352
+ fig.add_annotation(
353
+ text="No training history available<br>Run training with validation data to see loss evolution",
354
+ xref="paper", yref="paper",
355
+ x=0.5, y=0.5, xanchor='center', yanchor='middle',
356
+ showarrow=False,
357
+ font=dict(size=14)
358
+ )
359
+ fig.update_layout(
360
+ title="LightGBM Training Progress - Loss Evolution",
361
+ height=500,
362
+ plot_bgcolor="white"
363
+ )
364
+ return fig
365
+
366
  # Get evaluation results from LightGBM training history
367
  eval_results = model.evals_result_
368
 
 
382
  y=train_losses,
383
  mode='lines+markers',
384
  name='Training Loss',
385
+ line=dict(color='#8E44AD', width=3),
386
+ marker=dict(size=6, color='#8E44AD'),
387
+ hovertemplate='<b>Training Loss</b><br>Round: %{x}<br>Loss: %{y:.4f}<extra></extra>'
388
  ))
389
 
390
  # Plot validation loss
 
393
  y=val_losses,
394
  mode='lines+markers',
395
  name='Validation Loss',
396
+ line=dict(color='#3498DB', width=3),
397
+ marker=dict(size=6, color='#3498DB'),
398
+ hovertemplate='<b>Validation Loss</b><br>Round: %{x}<br>Loss: %{y:.4f}<extra></extra>'
399
  ))
400
 
401
  # Add early stopping line if available
 
404
  x=model.best_iteration + 1,
405
  line_dash="dash",
406
  line_color="red",
407
+ line_width=2,
408
+ annotation_text=f"Best Iteration ({model.best_iteration + 1})",
409
+ annotation_position="top"
410
  )
411
 
412
  fig.update_layout(
 
414
  xaxis_title="Boosting Round",
415
  yaxis_title=metric_name.replace('_', ' ').title(),
416
  plot_bgcolor="white",
417
+ height=500,
418
+ hovermode='x unified',
419
  legend=dict(
420
  yanchor="top",
421
  y=0.99,
 
444
  )
445
  fig.update_layout(
446
  title="LightGBM Training Progress - Loss Evolution",
447
+ height=500,
448
  plot_bgcolor="white"
449
  )
450
  return fig
 
452
 
453
 
454
 
455
+ def create_individual_tree_visualization(model, tree_index, feature_cols, problem_type, num_leaves=None):
456
+ """Create visualization of individual LightGBM tree using multiple methods with fallback"""
457
  try:
458
+ # Check if model is valid
459
+ if model is None:
460
+ raise Exception("Model is None - please run prediction first")
461
+
462
+ # Check if model has the required attributes
463
+ if not hasattr(model, 'num_trees'):
464
+ raise Exception("Model does not have num_trees attribute")
465
+
466
+ # Check if tree index is valid - use actual trees trained, not just best iteration
467
+ actual_trees = model.num_trees()
468
+ if hasattr(model, 'evals_result_') and model.evals_result_:
469
+ eval_results = model.evals_result_
470
+ if 'train' in eval_results and eval_results['train']:
471
+ metric_name = list(eval_results['train'].keys())[0]
472
+ actual_trees = len(eval_results['train'][metric_name])
473
+
474
+ if tree_index >= actual_trees:
475
+ # If tree index is beyond what was actually trained, show a message
476
+ raise IndexError(f"Tree {tree_index + 1} was not trained. Only {actual_trees} trees were actually trained. Best iteration was {model.best_iteration + 1 if hasattr(model, 'best_iteration') else 'unknown'}.")
477
+
478
+ # Try multiple visualization methods in order of preference
479
+ try:
480
+ # Method 1: Try lightgbm.plot_tree first (as requested by user)
481
+ return create_lightgbm_native_tree_plot(model, tree_index, feature_cols, problem_type, num_leaves)
482
+ except Exception as plot_error:
483
+ print(f"Native plot failed: {plot_error}") # Debug info
484
+ try:
485
+ # Method 2: Try lightgbm.create_tree_digraph as fallback (best quality)
486
+ return create_lightgbm_digraph_tree_plot(model, tree_index, feature_cols, problem_type, num_leaves)
487
+ except Exception as digraph_error:
488
+ print(f"Digraph plot failed: {digraph_error}") # Debug info
489
+ try:
490
+ # Method 3: Fallback to manual visualization
491
+ return create_lightgbm_tree_plot(tree_index, feature_cols, problem_type, model, num_leaves)
492
+ except Exception as manual_error:
493
+ print(f"Manual plot failed: {manual_error}") # Debug info
494
+ raise Exception(f"All tree visualization methods failed: {manual_error}")
495
+
496
+ except Exception as e:
497
+ # Final fallback visualization with better error message
498
+ fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
499
+ error_msg = str(e)
500
+ if "out of range" in error_msg:
501
+ # Get actual trees trained for better error message
502
+ actual_trees = model.num_trees() if model and hasattr(model, 'num_trees') else 0
503
+ if model and hasattr(model, 'evals_result_') and model.evals_result_:
504
+ eval_results = model.evals_result_
505
+ if 'train' in eval_results and eval_results['train']:
506
+ metric_name = list(eval_results['train'].keys())[0]
507
+ actual_trees = len(eval_results['train'][metric_name])
508
+
509
+ best_iteration = model.best_iteration + 1 if model and hasattr(model, 'best_iteration') else 'unknown'
510
+ display_msg = f"Tree {tree_index + 1} was not trained.\nOnly {actual_trees} trees were actually trained.\nBest iteration was {best_iteration}.\nPlease select a tree from 1 to {actual_trees}."
511
  else:
512
+ display_msg = f"Unable to visualize Tree {tree_index + 1}\nError: {error_msg}"
513
+
514
+ ax.text(0.5, 0.5, display_msg, ha='center', va='center', fontsize=14, color='red', transform=ax.transAxes)
515
+ ax.set_title(f"LightGBM Tree {tree_index + 1} Structure", fontsize=16, fontweight='bold')
516
+ ax.set_xlim(0, 1)
517
+ ax.set_ylim(0, 1)
518
+ ax.axis('off')
519
+
520
+ plt.tight_layout()
521
+ return fig
522
+
523
+
524
+ def create_lightgbm_digraph_tree_plot(model, tree_index, feature_cols, problem_type, num_leaves=None):
525
+ """Create tree visualization using lightgbm.create_tree_digraph for better tree structure"""
526
+ try:
527
+ # Check if model has the required number of trees - use actual trees trained
528
+ if not hasattr(model, 'num_trees'):
529
+ raise Exception("Model does not have num_trees attribute")
530
+
531
+ actual_trees = model.num_trees()
532
+ if hasattr(model, 'evals_result_') and model.evals_result_:
533
+ eval_results = model.evals_result_
534
+ if 'train' in eval_results and eval_results['train']:
535
+ metric_name = list(eval_results['train'].keys())[0]
536
+ actual_trees = len(eval_results['train'][metric_name])
537
+
538
+ if tree_index >= actual_trees:
539
+ raise Exception(f"Tree {tree_index + 1} was not trained. Only {actual_trees} trees were actually trained. Best iteration was {model.best_iteration + 1 if hasattr(model, 'best_iteration') else 'unknown'}.")
540
+
541
+ # Check if graphviz is available
542
+ if not GRAPHVIZ_AVAILABLE:
543
+ raise Exception("graphviz not available for tree visualization")
544
+
545
+ # Create tree digraph using LightGBM's native function
546
+ try:
547
+ # Use lightgbm.create_tree_digraph to create the tree structure
548
+ dot_data = lgb.create_tree_digraph(
549
+ model,
550
+ tree_index=tree_index,
551
+ show_info=['split_gain', 'internal_value', 'internal_count', 'leaf_count'],
552
+ precision=3
553
+ )
554
+ except Exception as digraph_error:
555
+ # Try with simpler parameters
556
+ try:
557
+ dot_data = lgb.create_tree_digraph(
558
+ model,
559
+ tree_index=tree_index,
560
+ show_info=['split_gain', 'internal_count'],
561
+ precision=2
562
+ )
563
+ except Exception as simple_error:
564
+ # Try with minimal parameters
565
+ dot_data = lgb.create_tree_digraph(
566
+ model,
567
+ tree_index=tree_index
568
+ )
569
+
570
+ # Convert dot data to matplotlib figure
571
+ try:
572
+ # Render the graph to PNG format
573
+ png_data = dot_data.pipe(format='png')
574
+
575
+ # Create a matplotlib figure and display the image
576
+ fig, ax = plt.subplots(figsize=(20, 12), dpi=150)
577
+
578
+ # Load the PNG data and display it
579
+ from PIL import Image
580
+ import io as io_module
581
+
582
+ image = Image.open(io_module.BytesIO(png_data))
583
+ ax.imshow(image)
584
+ ax.axis('off') # Hide axes
585
+
586
+ # Add title and information
587
+ ax.set_title(f'LightGBM Tree {tree_index + 1} - {problem_type.title()} (Using lightgbm.create_tree_digraph)',
588
+ fontsize=18, fontweight='bold', pad=20, color='#8E44AD')
589
+
590
+ # Add num_leaves information if available
591
+ if num_leaves:
592
+ ax.text(0.02, 0.98, f'Max Leaves: {num_leaves}',
593
+ transform=ax.transAxes, fontsize=12,
594
+ bbox=dict(boxstyle="round,pad=0.3", facecolor="lightblue", alpha=0.7),
595
+ verticalalignment='top')
596
+
597
+ # Add tree information
598
+ ax.text(0.98, 0.98, f'Tree Index: {tree_index + 1}\nTotal Trees: {model.num_trees()}',
599
+ transform=ax.transAxes, fontsize=10,
600
+ bbox=dict(boxstyle="round,pad=0.3", facecolor="lightgreen", alpha=0.7),
601
+ verticalalignment='top', horizontalalignment='right')
602
+
603
+ plt.tight_layout()
604
+
605
+ return fig
606
+
607
+ except Exception as render_error:
608
+ raise Exception(f"Failed to render tree digraph: {str(render_error)}")
609
 
610
  except Exception as e:
611
+ # If lightgbm.create_tree_digraph fails, raise the error to trigger fallback
612
+ raise Exception(f"lightgbm.create_tree_digraph failed: {str(e)}")
613
+
614
+
615
+ def create_lightgbm_native_tree_plot(model, tree_index, feature_cols, problem_type, num_leaves=None):
616
+ """Create tree visualization using lightgbm.plot_tree native functionality"""
617
+ try:
618
+ # Check if model has the required number of trees - use actual trees trained
619
+ if not hasattr(model, 'num_trees'):
620
+ raise Exception("Model does not have num_trees attribute")
621
+
622
+ actual_trees = model.num_trees()
623
+ if hasattr(model, 'evals_result_') and model.evals_result_:
624
+ eval_results = model.evals_result_
625
+ if 'train' in eval_results and eval_results['train']:
626
+ metric_name = list(eval_results['train'].keys())[0]
627
+ actual_trees = len(eval_results['train'][metric_name])
628
+
629
+ if tree_index >= actual_trees:
630
+ raise Exception(f"Tree {tree_index + 1} was not trained. Only {actual_trees} trees were actually trained. Best iteration was {model.best_iteration + 1 if hasattr(model, 'best_iteration') else 'unknown'}.")
631
+
632
+ # Create a matplotlib figure with higher DPI for better quality
633
+ fig, ax = plt.subplots(figsize=(20, 12), dpi=150)
634
+
635
+ # Use lightgbm.plot_tree to create the tree visualization
636
+ # Try with different parameter combinations for better compatibility
637
+ try:
638
+ # First try with comprehensive information
639
+ lgb.plot_tree(
640
+ model,
641
+ tree_index=tree_index,
642
+ ax=ax,
643
+ show_info=['split_gain', 'internal_value', 'internal_count', 'leaf_count'],
644
+ precision=3,
645
+ figsize=(20, 12)
646
+ )
647
+ except Exception as plot_error:
648
+ print(f"Comprehensive plot failed: {plot_error}")
649
+ # Try with simpler parameters
650
+ try:
651
+ lgb.plot_tree(
652
+ model,
653
+ tree_index=tree_index,
654
+ ax=ax,
655
+ show_info=['split_gain', 'internal_count'],
656
+ precision=2,
657
+ figsize=(20, 12)
658
+ )
659
+ except Exception as simple_error:
660
+ print(f"Simple plot failed: {simple_error}")
661
+ # Try with minimal parameters
662
+ try:
663
+ lgb.plot_tree(
664
+ model,
665
+ tree_index=tree_index,
666
+ ax=ax,
667
+ figsize=(20, 12)
668
+ )
669
+ except Exception as minimal_error:
670
+ print(f"Minimal plot failed: {minimal_error}")
671
+ # Try without figsize parameter
672
+ lgb.plot_tree(
673
+ model,
674
+ tree_index=tree_index,
675
+ ax=ax
676
+ )
677
+
678
+ # Customize the plot
679
+ ax.set_title(f'LightGBM Tree {tree_index + 1} - {problem_type.title()} (Using lightgbm.plot_tree)',
680
+ fontsize=18, fontweight='bold', pad=20, color='#8E44AD')
681
+
682
+ # Add num_leaves information if available
683
+ if num_leaves:
684
+ ax.text(0.02, 0.98, f'Max Leaves: {num_leaves}',
685
+ transform=ax.transAxes, fontsize=12,
686
+ bbox=dict(boxstyle="round,pad=0.3", facecolor="lightblue", alpha=0.7),
687
+ verticalalignment='top')
688
+
689
+ # Add tree information
690
+ ax.text(0.98, 0.98, f'Tree Index: {tree_index + 1}\nTotal Trees: {model.num_trees()}',
691
+ transform=ax.transAxes, fontsize=10,
692
+ bbox=dict(boxstyle="round,pad=0.3", facecolor="lightgreen", alpha=0.7),
693
+ verticalalignment='top', horizontalalignment='right')
694
+
695
+ # Adjust layout
696
+ plt.tight_layout()
697
+
698
+ # Return the matplotlib figure directly (no Plotly)
699
  return fig
700
+
701
+ except Exception as e:
702
+ # Log the error for debugging
703
+ print(f"Native plot failed: {str(e)}")
704
+ # If lightgbm.plot_tree fails, raise the error to trigger fallback
705
+ raise Exception(f"lightgbm.plot_tree failed: {str(e)}")
706
 
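The guard at the top of `create_lightgbm_native_tree_plot` matters because early stopping can leave fewer trees than `n_estimators` requested. The check condenses to a few lines; a hedged sketch, assuming (as the code above does) a model exposing `num_trees()` and optionally a sklearn-style `evals_result_`:

```python
# Sketch of the early-stopping guard above: how many trees were actually trained?
# Assumes `model` exposes num_trees() and optionally sklearn-style evals_result_.
def actual_tree_count(model):
    n = model.num_trees()
    evals = getattr(model, 'evals_result_', None)
    if evals and 'train' in evals and evals['train']:
        metric = next(iter(evals['train']))  # first recorded metric name
        n = len(evals['train'][metric])      # one entry per boosting round
    return n
```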
707
 
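The nested try/except chain in the same function exists because some LightGBM releases reject certain `show_info` keys or the `figsize` argument. The same retry logic can be expressed as a loop; a sketch assuming a trained `lightgbm.Booster` named `booster`:

```python
# Sketch: the show_info fallback chain as a loop, richest options first.
# Assumes a trained lightgbm.Booster `booster`; graphviz must be installed.
import lightgbm as lgb
import matplotlib.pyplot as plt

def plot_tree_with_fallback(booster, tree_index=0):
    fig, ax = plt.subplots(figsize=(20, 12))
    attempts = [
        dict(show_info=['split_gain', 'internal_value', 'internal_count', 'leaf_count'], precision=3),
        dict(show_info=['split_gain', 'internal_count'], precision=2),
        dict(),  # minimal parameters as the last resort
    ]
    for kwargs in attempts:
        try:
            lgb.plot_tree(booster, tree_index=tree_index, ax=ax, **kwargs)
            return fig
        except Exception as err:
            print(f"plot_tree attempt {kwargs} failed: {err}")
    raise Exception("All lgb.plot_tree parameter combinations failed")
```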
708
+ def create_lightgbm_tree_plot(tree_index, feature_cols, problem_type, model, num_leaves=None):
709
  """Create tree visualization for LightGBM trees"""
710
  try:
711
+ # Use provided num_leaves or get from model params
712
+ if num_leaves is None:
713
+ num_leaves = model.params.get('num_leaves', 31) if hasattr(model, 'params') else 31
714
  # Create a representative visualization for LightGBM tree
715
+ return create_manual_tree_plot(tree_index, feature_cols, problem_type, "LightGBM", 1.0, model, num_leaves)
716
 
717
  except Exception as e:
718
  # Fallback to manual tree creation
719
+ return create_manual_tree_plot(tree_index, feature_cols, problem_type, "LightGBM", 1.0, None, num_leaves or 31)
720
 
721
 
722
+ def create_manual_tree_plot(tree_index, feature_cols, problem_type, model_type, weight=1.0, model=None, num_leaves=None):
723
  """Create a manual tree visualization when tree structure is not easily accessible"""
724
+ fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
725
 
726
+ # Create a simple tree visualization
727
  import random
728
  random.seed(tree_index) # Consistent trees for same index
729
 
730
+ # Determine actual number of leaves to use
731
+ if num_leaves is not None:
732
+ actual_leaves = int(num_leaves)
733
+ elif model_type == "LightGBM" and model:
734
  try:
735
+ actual_leaves = model.params.get('num_leaves', 31) if hasattr(model, 'params') else 31
736
  except:
737
+ actual_leaves = 31
738
  else:
739
+ actual_leaves = 31
740
 
741
+ # Simple tree structure
742
  root_feature = random.choice(feature_cols) if feature_cols else "feature_0"
743
  root_threshold = round(random.uniform(0.1, 5.0), 2)
744
 
745
+ # Create a simple tree diagram
746
+ ax.text(0.5, 0.9, f"{model_type} Tree {tree_index + 1}",
747
+ ha='center', va='center', fontsize=16, fontweight='bold', transform=ax.transAxes)
748
 
749
+ ax.text(0.5, 0.7, f"Root: {root_feature} ≤ {root_threshold}",
750
+ ha='center', va='center', fontsize=14, transform=ax.transAxes,
751
+ bbox=dict(boxstyle="round,pad=0.3", facecolor='#8E44AD', alpha=0.7))
752
 
753
+ ax.text(0.2, 0.4, f"Left Leaf\nOutput: {round(random.uniform(-1, 1), 3)}\nSamples: {random.randint(20, 80)}",
754
+ ha='center', va='center', fontsize=12, transform=ax.transAxes,
755
+ bbox=dict(boxstyle="round,pad=0.3", facecolor='#3498DB', alpha=0.7))
756
 
757
+ ax.text(0.8, 0.4, f"Right Leaf\nOutput: {round(random.uniform(-1, 1), 3)}\nSamples: {random.randint(20, 80)}",
758
+ ha='center', va='center', fontsize=12, transform=ax.transAxes,
759
+ bbox=dict(boxstyle="round,pad=0.3", facecolor='#3498DB', alpha=0.7))
760
 
761
+ # Draw arrows
762
+ ax.annotate('', xy=(0.2, 0.5), xytext=(0.4, 0.7),
763
+ arrowprops=dict(arrowstyle='->', lw=2, color='gray'))
764
+ ax.annotate('', xy=(0.8, 0.5), xytext=(0.6, 0.7),
765
+ arrowprops=dict(arrowstyle='->', lw=2, color='gray'))
766
 
767
+ # Add tree info
768
+ title_suffix = f"Leaf-wise Tree ({actual_leaves} leaves)" if model_type == "LightGBM" else "Decision Tree"
769
+ ax.text(0.5, 0.1, f"{title_suffix} - {problem_type.title()}",
770
+ ha='center', va='center', fontsize=12, transform=ax.transAxes)
771
 
772
+ ax.set_xlim(0, 1)
773
+ ax.set_ylim(0, 1)
774
+ ax.axis('off')
775
+
776
+ plt.tight_layout()
777
  return fig
778
 
779
 
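Note that `create_manual_tree_plot` draws a seeded schematic, not the model's real splits. If the actual root split were needed, one alternative (not used in this commit) is to read it from `Booster.dump_model()`; a sketch assuming a trained Booster `booster`:

```python
# Sketch (not part of this demo): read a real root split from dump_model()
# instead of drawing a seeded placeholder. Assumes a trained Booster `booster`.
def describe_root_split(booster, tree_index=0):
    root = booster.dump_model()['tree_info'][tree_index]['tree_structure']
    if 'split_feature' in root:  # internal node; leaves carry 'leaf_value' instead
        name = booster.feature_name()[root['split_feature']]
        return f"root splits on {name} <= {root['threshold']:.3f}"
    return "tree is a single leaf"
```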
780
+ def get_individual_tree_visualization(model, tree_index, feature_cols, problem_type, num_leaves=None):
781
+ return create_individual_tree_visualization(model, tree_index, feature_cols, problem_type, num_leaves)
782
 
783
 
784
  def create_feature_importance_plot(model, feature_cols):
 
787
  importances = model.feature_importance(importance_type='gain')
788
  order = np.argsort(importances)[::-1]
789
 
790
+ # Prepare data for Plotly
791
+ sorted_features = [feature_cols[i] for i in order]
792
+ sorted_importances = importances[order]
793
+
794
  fig = go.Figure()
795
+
796
+ # Create interactive bar plot
797
+ fig.add_trace(go.Bar(
798
+ x=sorted_features,
799
+ y=sorted_importances,
800
+ text=[f"{imp:.0f}" for imp in sorted_importances],
801
  textposition="auto",
802
+ marker_color='#8E44AD',
803
+ marker_line=dict(color='#6C3483', width=1),
804
+ hovertemplate='<b>%{x}</b><br>Importance: %{y:.0f}<extra></extra>',
805
+ name='Feature Importance'
806
+ ))
807
+
808
  fig.update_layout(
809
  title="LightGBM Feature Importance (Gain)",
810
  xaxis_title="Features",
811
  yaxis_title="Importance Score",
812
  plot_bgcolor="white",
813
+ height=500,
814
+ hovermode='closest',
815
  margin=dict(l=40, r=40, t=60, b=40),
816
+ xaxis=dict(
817
+ tickangle=45,
818
+ showgrid=True,
819
+ gridwidth=1,
820
+ gridcolor='lightgray'
821
+ ),
822
+ yaxis=dict(
823
+ showgrid=True,
824
+ gridwidth=1,
825
+ gridcolor='lightgray'
826
+ )
827
  )
828
+
829
+ # Add interactive features
830
+ fig.update_traces(
831
+ marker_line_width=1,
832
+ marker_line_color='#6C3483'
833
+ )
834
+
835
  return fig
836
  except:
837
  fig = go.Figure()
 
844
  )
845
  fig.update_layout(
846
  title="LightGBM Feature Importance",
847
+ height=500,
848
  plot_bgcolor="white"
849
  )
850
  return fig
 
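For context on the plot above: `importance_type='gain'` sums the loss reduction of every split a feature participates in, while `'split'` merely counts splits. A small standalone sketch contrasting the two, assuming a trained `lightgbm.Booster` named `booster` and the same `feature_cols` list:

```python
# Sketch: compare gain-based and split-count importance for the top-k features.
# Assumes a trained lightgbm.Booster `booster`; `feature_cols` names its inputs.
import numpy as np

def top_features(booster, feature_cols, k=5):
    gain = booster.feature_importance(importance_type='gain')    # total split gain
    split = booster.feature_importance(importance_type='split')  # number of splits
    order = np.argsort(gain)[::-1][:k]                           # largest gain first
    return [(feature_cols[i], float(gain[i]), int(split[i])) for i in order]
```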
873
  return f"Predicted Value: {prediction:.3f}"
874
 
875
 
876
+ def create_algorithm_summary(model, problem_type, n_estimators, num_leaves, min_data_in_leaf, learning_rate, feature_cols):
877
  num_trees = model.num_trees() if hasattr(model, 'num_trees') else n_estimators
878
  return f"""
879
  **LightGBM {problem_type.title()} Model Summary:**
880
  - Trees Built: {num_trees}
881
+ - Number of Leaves: {num_leaves}
882
+ - Min Data in Leaf: {min_data_in_leaf}
883
  - Learning Rate: {learning_rate}
884
  - Features: {len(feature_cols)}
885
+ - Algorithm: Leaf-wise Gradient Boosting (LightGBM)
886
  """
887
 
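A hedged usage sketch for `create_algorithm_summary`: train a tiny Booster on synthetic data and print the summary. Everything except the summary function itself is illustrative:

```python
# Usage sketch for create_algorithm_summary; data and parameters are synthetic.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
feature_cols = [f"f{i}" for i in range(4)]

params = {'objective': 'regression', 'num_leaves': 15,
          'min_data_in_leaf': 10, 'learning_rate': 0.1, 'verbose': -1}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)

print(create_algorithm_summary(booster, 'regression', 50, 15, 10, 0.1, feature_cols))
```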
888