---
license: apache-2.0
---

# Model Card for MatroidNN

## Model Details

### Model Description

- **Model type:** Neural Network with Matroid-based Feature Selection (MatroidNN)
- **Version:** 1.0
- **Framework:** PyTorch
- **Last updated:** February 27, 2025

### Overview

MatroidNN is a neural network architecture that uses matroid theory for feature selection. To reduce feature redundancy, it selects a maximal independent set of features, using a matroid-style notion of independence, before the neural network is trained.

### Model Architecture

- **Feature Selection Component**: `MatroidFeatureSelector`, which uses correlation-based dependency analysis
- **Neural Network**: 3-layer feedforward network with batch normalization and dropout
- **Input**: One input per feature retained by the matroid selector, so the input size depends on the data
- **Hidden Layers**: Configurable sizes (default 64 → 32)
- **Output**: Multi-class classification with a configurable number of classes
- **Parameters**: ~5K-10K (varies with input and output dimensions)

## Uses

### Direct Use

MatroidNN is designed for classification tasks in which feature redundancy is a concern. It is particularly suited to:

- High-dimensional datasets with correlated features
- Feature selection in biological and medical data
- Financial prediction with multicollinear variables
- Any classification task where independent features are desired

### Out-of-Scope Use

This model is not intended for:

- Regression tasks (without modification)
- Time-series prediction (without temporal adaptations)
- Raw image or text classification (without appropriate feature extraction)

## Training Data

The model was developed and tested on synthetic data with deliberately introduced feature dependencies. For real-world applications, it should be retrained on domain-specific data.
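The selection idea from the Overview can be illustrated on synthetic data of the kind described above, where some features are near-copies of others. The sketch below is an assumption-laden stand-in for `MatroidFeatureSelector`, not its implementation: it scans features in column order and keeps a feature only if its absolute Pearson correlation with every feature kept so far is below a threshold (0.7 here, the same value as the documented default `rank_threshold`, although the real parameter's exact semantics are not specified in this card). This is the greedy algorithm over a pairwise-correlation independence system; such a system is not guaranteed to satisfy the matroid exchange axiom, so the result is a maximal, not necessarily maximum, independent set. `greedy_correlation_select` is a hypothetical name.

```python
import numpy as np


def greedy_correlation_select(X, threshold=0.7):
    """Greedily build a maximal set of mutually 'independent' feature columns.

    Hypothetical sketch: a column is accepted only if its absolute Pearson
    correlation with every already-accepted column is below `threshold`.
    Returns the list of selected column indices.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |correlation|
    selected = []
    for j in range(X.shape[1]):
        # Independence test against the current selection (an empty selection accepts j).
        if all(corr[j, k] < threshold for k in selected):
            selected.append(j)
    return selected


# Synthetic data with deliberate dependencies: columns 3-5 are noisy copies of 0-2.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 3))
X = np.hstack([base, base + 0.05 * rng.normal(size=(1000, 3))])
print(greedy_correlation_select(X, threshold=0.7))  # [0, 1, 2]
```

Scanning columns in a different order (for example, by decreasing variance or relevance to the target) can change which member of a correlated group is kept. Constant columns yield NaN correlations, which this sketch treats as dependent, so they should be dropped beforehand.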
### Training Dataset

- **Type**: Synthetic data with controlled dependencies
- **Size**: 1000 samples (default, configurable)
- **Features**: 20 initial features (default, configurable)
- **Classes**: 3 (default, configurable)
- **Distribution**: Balanced classes in the synthetic data

## Performance

### Metrics

On synthetic test data with 3 classes:

- **Accuracy**: 94.0%
- **Macro-average F1-score**: 0.93
- **Per-class metrics**:

| Class | Precision | Recall | F1 |
|-------|-----------|--------|------|
| 0 | 0.96 | 1.00 | 0.98 |
| 1 | 0.86 | 0.86 | 0.86 |
| 2 | 0.97 | 0.93 | 0.95 |

### Factors

Performance may vary with:

- The feature correlation structure of the dataset
- The number of initial features and how informative they are
- Class balance
- The `rank_threshold` parameter of `MatroidFeatureSelector`

## Limitations

- The matroid-based feature selection uses correlation as a proxy for independence, so it may miss nonlinear or higher-order dependencies between features
- The current implementation assumes numerical features; categorical features may require adaptation (for example, encoding)
- Feature selection is performed once before training and does not adapt during training
- The `rank_threshold` parameter needs careful, dataset-specific tuning

## Ethical Considerations

- Feature selection may unintentionally exclude features that matter for fairness considerations
- The model inherits any biases present in its training data
- In high-stakes applications, results should be interpreted with caution and under human oversight

## Technical Specifications

### Hardware Requirements

- Training: CUDA-capable GPU recommended for larger datasets
- Inference: CPU is sufficient for most applications

### Software Requirements

- Python 3.8+
- PyTorch 1.8+
- NumPy 1.20+
- scikit-learn 0.24+

### Training Hyperparameters

- **Batch size**: 32 (default)
- **Learning rate**: 0.001 (default)
- **Optimizer**: Adam
- **Loss function**: Cross-Entropy Loss
- **Epochs**: Determined by early stopping on validation loss (patience = 10)
- **Feature selection rank threshold**: 0.7 (default, configurable)

## How to Use

```python
import torch
import torch.nn as nn

from matroid_nn import MatroidFeatureSelector, MatroidNN

# X_train, X_test (numerical feature matrices) and num_classes are assumed to be defined.

# Initialize the matroid-based feature selector
selector = MatroidFeatureSelector(rank_threshold=0.7)

# Fit the selector on training data only, then apply the same selection to test data
X_train_selected = selector.fit_transform(X_train)
X_test_selected = selector.transform(X_test)

# Create the model; its input size equals the number of selected features
model = MatroidNN(
    input_size=X_train_selected.shape[1],
    hidden_size=64,
    output_size=num_classes,
)

# Optimizer and loss matching the documented defaults
# (assumes MatroidNN is a torch.nn.Module, consistent with the PyTorch framework)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
```
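Because `input_size` comes from the selector, the parameter count listed under Model Architecture depends on how many features survive selection. The helper below is a hypothetical sketch (`count_params` is not part of `matroid_nn`) that counts trainable parameters assuming each hidden layer is `Linear` → `BatchNorm1d` → activation → `Dropout`, with hidden sizes 64 → 32; the card does not spell out the exact layer order, so treat the numbers as estimates.

```python
def count_params(n_in, hidden=(64, 32), n_out=3, batch_norm=True):
    """Count trainable parameters of an MLP classifier.

    Assumed layout (hypothetical): for each hidden layer, Linear -> BatchNorm1d
    -> activation -> Dropout, followed by a final Linear output layer.
    Activations and Dropout have no parameters.
    """
    total = 0
    prev = n_in
    for h in hidden:
        total += prev * h + h      # Linear weights + biases
        if batch_norm:
            total += 2 * h         # BatchNorm1d affine weight and bias (running stats are buffers)
        prev = h
    total += prev * n_out + n_out  # output Linear layer
    return total


print(count_params(20))  # 3715
print(count_params(50))  # 5635
```

Under these assumptions the first layer drives the growth: 20 input features and 3 classes give 3,715 parameters, while 50 features give 5,635, so the total depends strongly on how many features the selector keeps.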
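The early-stopping rule listed under Training Hyperparameters (patience = 10) can be sketched independently of any framework. `EarlyStopping` and its `min_delta` option are hypothetical names for illustration, not the `matroid_nn` API; the actual training code may differ, for example by also restoring the best weights.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs in a row.

    Hypothetical helper for illustration; not part of matroid_nn.
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0       # improvement: reset the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses: improvement for 3 epochs, then a plateau.
stopper = EarlyStopping(patience=10)
val_losses = [1.0, 0.8, 0.7] + [0.75] * 12
stop_epoch = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stop_epoch = epoch
        break

print(stop_epoch, stopper.best_loss)  # 12 0.7
```

In a PyTorch loop, call `step()` once per epoch after computing the validation loss; saving `copy.deepcopy(model.state_dict())` whenever the counter resets (`stopper.bad_epochs == 0`) lets you restore the best checkpoint once training stops.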