# Model Card: TERPredictor V1

## Model Name
TERPredictor V1: a linear regression model for predicting the Total Expense Ratio (TER) of mutual fund Regular Plans.
## Overview
TERPredictor V1 is a regression model trained to estimate the 'Regular Plan - Total TER (%)' of mutual funds based on various financial features. It uses a simple linear regression approach and achieves near-perfect performance on the test set. Due to the unusually high accuracy, this model is best suited for exploratory analysis and feature relationship interpretation, rather than generalization to unseen data.
## Intended Uses
- **Expense Ratio Estimation:** Estimate TER for new or hypothetical mutual fund structures.
- **Outlier Detection:** Identify funds with unusually high or low TERs.
- **Feature Impact Analysis:** Understand which components most influence TER (see the coefficient sketch after this list).
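For the feature-impact use case, the coefficients of a fitted linear model give per-feature effect sizes directly. A minimal sketch on synthetic stand-in data, since this card does not list the real column names (the names and the generating relationship below are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for fund expense components (hypothetical names).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 3)), columns=["base_ter", "gst", "addl_expense"])
y = X["base_ter"] + 0.18 * X["gst"] + X["addl_expense"]  # illustrative TER relationship

model = LinearRegression().fit(X, y)

# Coefficients sorted by magnitude show which components move TER the most.
impact = pd.Series(model.coef_, index=X.columns).sort_values(key=abs, ascending=False)
print(impact)
```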
## Model Architecture

| Attribute | Value |
|---|---|
| Model Type | Linear Regression |
| Framework | scikit-learn |
| Input Features | 10 `float64` columns |
| Target Variable | Regular Plan - Total TER (%) |
| Identifier Dropped | Scheme Name (`object`) |

## Training Details
- Dataset Size: 1,622 samples
- Train/Test Split: 1,297 / 325 (80/20)
- Missing Values: None
- Preprocessing:
  - Dropped the identifier column (`Scheme Name`)
  - No normalization applied; unregularized linear regression does not require feature scaling (see the training sketch below)
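A minimal sketch of the training procedure described above. The CSV path is an assumption and `random_state` is arbitrary; an 80/20 split of 1,622 samples reproduces the 1,297 / 325 counts:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical filename; the dataset is not distributed with this card.
df = pd.read_csv("mutual_fund_ter.csv")

# Drop the identifier, keep the 10 float64 features, predict the TER column.
X = df.drop(columns=["Scheme Name", "Regular Plan - Total TER (%)"])
y = df["Regular Plan - Total TER (%)"]

# 80/20 split of 1,622 samples -> 1,297 train / 325 test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
```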
## Evaluation Metrics

| Metric | Value |
|---|---|
| Mean Squared Error (MSE) | 0.000001 |
| R-squared (R²) | 0.999999 |

⚠️ Note: These metrics suggest potential data leakage or a deterministic relationship between the features and the target. Use with caution.
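These metrics correspond to scikit-learn's standard `mean_squared_error` and `r2_score`; a sketch continuing the training example above (reusing `model`, `X_test`, and `y_test` from it):

```python
from sklearn.metrics import mean_squared_error, r2_score

# `model`, `X_test`, `y_test` come from the training sketch above.
y_pred = model.predict(X_test)
print(f"MSE: {mean_squared_error(y_test, y_pred):.6f}")
print(f"R2:  {r2_score(y_test, y_pred):.6f}")
```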
## How to Use

```python
from terpredictor import TERModel

model = TERModel.load_pretrained("your-huggingface-username/terpredictor-v1")
input_data = {
    "feature_1": 0.12,
    "feature_2": 0.03,
    # ... remaining features
}
predicted_ter = model.predict(input_data)
```

## Limitations
- **Potential Data Leakage:** The extremely high R² may indicate that the target is directly derived from the input features (see the check after this list).
- **Limited Generalization:** Not recommended for predicting TER on unseen or structurally different funds.
- **No Feature Engineering:** The model assumes the raw features are sufficient.
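One cheap probe of the leakage hypothesis: fit on a deliberately tiny subset and score on the full test set. If R² stays near 1, the target is effectively a deterministic (likely linear) function of the inputs, e.g. a TER column that is the sum of its component fee columns. A sketch reusing the variables from the training example:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Fit on only 50 training rows; near-perfect held-out R2 means the
# feature-to-target mapping is deterministic, not merely well-estimated.
tiny_model = LinearRegression().fit(X_train[:50], y_train[:50])
print(r2_score(y_test, tiny_model.predict(X_test)))
```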
## License
MIT License

## Author
Created by [Your Name or Organization]
## Recommendations for Open-Sourcing
- Include the full training code and preprocessing steps
- Provide a detailed explanation of the evaluation metrics
- Add cautionary notes about the performance anomalies
- Consider publishing a cleaned or anonymized version of the dataset