# SMS Spam Detection: Combined Model Card

## Models

### 1. Multinomial Naive Bayes

- **Type:** `MultinomialNB`
- **Library:** scikit-learn
- **Description:** A Naive Bayes classifier for multinomially distributed data such as word counts, commonly used for text classification.
- **Training Data:** SMS Spam Collection dataset (`train.csv`), preprocessed and vectorized with `CountVectorizer`.
- **Features:** Bag-of-words (unigrams), stopwords removed.
- **Target:** `label` (0: ham, 1: spam)
- **Accuracy:** `{{ accuracy_score(tahmin, y_test) }}`
- **Date Trained:** `{{ datetime.now().strftime("%Y-%m-%d") }}`
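
The setup described above can be sketched roughly as follows. This is a minimal illustration, not the original training script: the four toy messages stand in for `train.csv`, the variable names are hypothetical, and the built-in English stopword list is an assumption, since the card does not say which stopword list was used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy messages standing in for train.csv (not the real data).
texts = [
    "free prize claim now",
    "win cash prize today",
    "are we meeting for lunch",
    "see you at home tonight",
]
labels = [1, 1, 0, 0]  # 0: ham, 1: spam

# Unigram bag-of-words; the English stopword list is an assumption.
vectorizer = CountVectorizer(ngram_range=(1, 1), stop_words="english")
X_train = vectorizer.fit_transform(texts)

# Fit the multinomial Naive Bayes model on the word-count matrix.
nb_model = MultinomialNB()
nb_model.fit(X_train, labels)

# New messages must go through the same fitted vectorizer before prediction.
X_new = vectorizer.transform(["claim your free cash prize"])
nb_prediction = nb_model.predict(X_new)
```

Reusing the fitted vectorizer at prediction time keeps the column order of the word-count matrix identical to the one seen during training.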

### 2. Decision Tree Classifier

- **Type:** `DecisionTreeClassifier`
- **Library:** scikit-learn
- **Description:** A decision tree classifier for binary classification of SMS messages as ham or spam.
- **Training Data:** SMS Spam Collection dataset (`train.csv`), preprocessed and vectorized with `CountVectorizer`.
- **Features:** Bag-of-words (unigrams), stopwords removed.
- **Target:** `label` (0: ham, 1: spam)
- **Accuracy:** `{{ accuracy_score(tahmin3, y_test) }}`
- **Date Trained:** `{{ datetime.now().strftime("%Y-%m-%d") }}`
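
A decision tree can be trained on the same kind of bag-of-words features. The sketch below is illustrative only: the toy messages, the variable names, the English stopword list, and `random_state=0` are all assumptions and are not taken from the original training code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy messages standing in for train.csv (not the real data).
texts = [
    "free prize claim now",
    "win cash prize today",
    "are we meeting for lunch",
    "see you at home tonight",
]
labels = [1, 1, 0, 0]  # 0: ham, 1: spam

# Same unigram bag-of-words features as for the Naive Bayes model.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(texts)

# random_state is fixed only to make this sketch reproducible (assumption).
tree_model = DecisionTreeClassifier(random_state=0)
tree_model.fit(X_train, labels)

# Classify two unseen messages with the fitted tree.
X_new = vectorizer.transform(["win a free prize", "lunch at home"])
tree_prediction = tree_model.predict(X_new)
```

Unlike Naive Bayes, the tree splits on individual word counts, so on real data its depth and the words it splits on are worth inspecting for signs of overfitting.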

## Preprocessing

- Lowercasing all text
- Removing punctuation, digits, and newlines
- Removing stopwords during vectorization
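
The first two steps could look something like the sketch below. The function name `clean_text` is hypothetical, and collapsing the leftover whitespace into single spaces is an assumption about how removed newlines and characters are handled. Stopword removal is left to the vectorizer, as the list above states.

```python
import re
import string

# Translation table that deletes ASCII punctuation and digits.
_DROP_CHARS = str.maketrans("", "", string.punctuation + string.digits)


def clean_text(text: str) -> str:
    """Lowercase the text and strip punctuation, digits, and newlines."""
    text = text.lower()
    text = text.translate(_DROP_CHARS)
    # Newlines and repeated spaces become a single space (assumption).
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_text("WIN 1000 now!!\nCall 0800-123 today.")
```

Note that `string.punctuation` covers ASCII punctuation only, so symbols such as currency signs outside ASCII would need separate handling.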

## Evaluation Metric

- Accuracy on the test set
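
Accuracy is the fraction of test messages whose predicted label matches the true label. The sketch below uses made-up values for `y_test` and `tahmin` purely for illustration; they are not results from these models.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions (0: ham, 1: spam).
y_test = [0, 0, 0, 1, 1, 0, 1, 0]
tahmin = [0, 0, 0, 1, 0, 0, 1, 1]

# scikit-learn documents the argument order as (y_true, y_pred).
accuracy = accuracy_score(y_test, tahmin)

# Equivalent manual computation: correct predictions divided by total.
manual_accuracy = sum(t == p for t, p in zip(y_test, tahmin)) / len(y_test)
```

Accuracy is symmetric in its two arguments, so swapping them gives the same value, but the `(y_true, y_pred)` order matters for metrics such as precision and recall.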

## Notes

- Both models are saved using joblib.
- Accuracy alone can overstate performance when ham messages dominate the data. For further evaluation, consider precision, recall, and F1-score for the spam class.
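
Both notes can be illustrated with a short sketch. The file names, toy messages, and label values below are hypothetical. Saving the fitted vectorizer alongside the model is a suggestion rather than something this card confirms, but it is needed to transform new messages the same way at prediction time.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy messages standing in for train.csv (not the real data).
texts = ["free prize claim now", "win cash prize today",
         "are we meeting for lunch", "see you at home tonight"]
labels = [1, 1, 0, 0]  # 0: ham, 1: spam

vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X_train, labels)

# Save and reload the model and vectorizer (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "nb_model.joblib")
    vectorizer_path = os.path.join(tmp_dir, "vectorizer.joblib")
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

# The reloaded pair should reproduce the original predictions.
same_predictions = bool(
    (loaded_model.predict(loaded_vectorizer.transform(texts))
     == model.predict(X_train)).all()
)

# Hypothetical test labels and predictions for the spam-class metrics.
y_test = [0, 0, 0, 1, 1, 0, 1, 0]
tahmin = [0, 0, 0, 1, 0, 0, 1, 1]
precision = precision_score(y_test, tahmin, pos_label=1)
recall = recall_score(y_test, tahmin, pos_label=1)
f1 = f1_score(y_test, tahmin, pos_label=1)
```

Precision answers how many messages flagged as spam really were spam, while recall answers how many spam messages were caught. F1 is their harmonic mean.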