Model description
This is a Gaussian Naive Bayes model trained on a synthetic dataset generated by J.P. Morgan AI Research. The dataset contains a large variety of transaction types representing normal activities as well as abnormal/fraudulent activities. The model predicts whether a transaction is normal or fraudulent.
Intended uses & limitations
For educational purposes
Training Procedure
The data preprocessing steps applied include the following:
- Dropping high-cardinality features: Transaction ID, Sender ID, Sender Account, Beneficiary ID, Beneficiary Account, and Sender Sector
- Dropping zero-variance features: Sender LOB
- Dropping the time and date feature, since the model is not time-series based
- Transforming and encoding the categorical features Sender Country, Beneficiary Country, and Transaction Type, as well as the target variable, Label
- Applying feature scaling on all features
- Splitting the dataset into training and test sets using an 85/15 split ratio
- Handling the imbalanced dataset with the imblearn framework, applying the RandomUnderSampler method to eliminate noise, which led to a 2.5% improvement in accuracy
Hyperparameters
Hyperparameter | Value |
---|---|
memory | |
steps | [('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())] |
verbose | False |
preprocessorAll | ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))]) |
classifier | GaussianNB() |
preprocessorAll__n_jobs | |
preprocessorAll__remainder | passthrough |
preprocessorAll__sparse_threshold | 0.3 |
preprocessorAll__transformer_weights | |
preprocessorAll__transformers | [('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))] |
preprocessorAll__verbose | False |
preprocessorAll__verbose_feature_names_out | True |
preprocessorAll__cat | Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]) |
preprocessorAll__num | Pipeline(steps=[('scale', StandardScaler())]) |
preprocessorAll__cat__memory | |
preprocessorAll__cat__steps | [('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))] |
preprocessorAll__cat__verbose | False |
preprocessorAll__cat__onehot | OneHotEncoder(handle_unknown='ignore', sparse_output=False) |
preprocessorAll__cat__onehot__categories | auto |
preprocessorAll__cat__onehot__drop | |
preprocessorAll__cat__onehot__dtype | <class 'numpy.float64'> |
preprocessorAll__cat__onehot__handle_unknown | ignore |
preprocessorAll__cat__onehot__max_categories | |
preprocessorAll__cat__onehot__min_frequency | |
preprocessorAll__cat__onehot__sparse | deprecated |
preprocessorAll__cat__onehot__sparse_output | False |
preprocessorAll__num__memory | |
preprocessorAll__num__steps | [('scale', StandardScaler())] |
preprocessorAll__num__verbose | False |
preprocessorAll__num__scale | StandardScaler() |
preprocessorAll__num__scale__copy | True |
preprocessorAll__num__scale__with_mean | True |
preprocessorAll__num__scale__with_std | True |
classifier__priors | |
classifier__var_smoothing | 1e-09 |
Model Plot
Pipeline(steps=[('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())])
Evaluation Results
Metric | Value |
---|---|
accuracy | 0.794582 |
Model Explainability
SHAP was used to determine the important features that help the model make decisions.
Confusion Matrix
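A confusion matrix for a binary classifier like this one, and the accuracy derived from it, can be computed as in the sketch below. The labels are made up for illustration and are not the model's actual predictions; the encoding 0 = normal, 1 = fraudulent is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up ground truth and predictions (0 = normal, 1 = fraudulent; encoding assumed).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of predictions on the diagonal.
accuracy = (tn + tp) / cm.sum()

print(cm)
print(accuracy, accuracy_score(y_true, y_pred))
```

Reading the off-diagonal cells separately shows whether errors are mostly missed fraud (false negatives) or false alarms (false positives), which a single accuracy figure does not distinguish.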
Model Card Authors
This model card was written by Seifullah Bello.