Credit Card Defaults Part 1: Classification

Classification with Imbalanced Target

Summary

The goal of this project was to build binary classification models that predict whether credit card account holders will default on their payments in the next month. The models account for the imbalance in the target variable, and the Gradient Boosting and neural network models are highlighted. The paper and presentation walk through data understanding and preparation, the models tested, methodology, evaluation, and anticipated follow-up steps.

Tools

Scikit-learn
Keras
Seaborn
Matplotlib
Numpy
Pandas
Scipy

Data

UCI Machine Learning Repository

Models / Methods / Metrics

Gradient Boosting Classification
Artificial Neural Network
Random Forest
Logistic Regression / LASSO Logistic Regression
Receiver Operating Characteristic curve and Youden’s J statistic
Feature Selection:
- Principal Component Analysis
- ANOVA and Feature Importance Models
Log-Transformation and Scaling
GridSearch
Recall, Log-Loss and Binary Crossentropy Loss

Results

Among the Gradient Boosting, Logistic Regression and Random Forest models, the Gradient Boosting Classification model had the best Recall and Log Loss scores: 62.43% of the actual default accounts were correctly labeled as defaults (true positives), and the Log Loss was 0.4545. The Artificial Neural Network had a Recall of 0.6989 and a binary crossentropy loss of 0.5958. These scores were obtained after addressing the imbalanced target variable.
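To make the two reported metrics concrete, here is a minimal sketch of how Recall and Log Loss (binary crossentropy) are computed. The labels and probabilities are hypothetical toy values, not the project's data, and the helper names are illustrative; scikit-learn's `recall_score` and `log_loss` provide equivalent library versions.

```python
import math

def recall(y_true, y_pred):
    """Share of actual positives (defaults) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary crossentropy)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical toy data: 1 = default, 0 = no default
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.2, 0.4, 0.1, 0.3, 0.7, 0.05]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # default 0.5 cut-off

print("recall:", recall(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
```

Recall depends only on the thresholded labels, while Log Loss scores the predicted probabilities themselves, which is why the two are reported side by side.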

Project Preview

Exploratory Data Analysis

The EDA shows that default records differ from non-default records.

[Figures: ECD, PAY1, MEAN]

Principal Component Analysis

PCA was applied because groups of input variables were multicollinear.
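The effect of multicollinearity on PCA can be illustrated with a small sketch. It assumes two hypothetical, strongly correlated features (`bill_month_1` and `bill_month_2` are made-up names and values) and uses the closed-form eigenvalues of a 2x2 covariance matrix; a real analysis would more likely use scikit-learn's `PCA` class on the full, scaled feature set.

```python
import math

def standardize(values):
    """Scale to zero mean and unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def explained_variance_ratio_2d(x, y):
    """Explained variance ratio of the two principal components of (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    root = math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    lam1, lam2 = trace / 2 + root, trace / 2 - root
    return lam1 / trace, lam2 / trace

# Hypothetical, nearly collinear features (not the project's data)
bill_month_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
bill_month_2 = [1.1, 1.9, 3.2, 3.9, 5.1]

ratios = explained_variance_ratio_2d(
    standardize(bill_month_1), standardize(bill_month_2)
)
print(ratios)  # the first component carries nearly all of the variance
```

When two inputs move together like this, one component captures almost all of their joint variance, so the correlated group can be replaced by fewer, uncorrelated components.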

[Figure: PCA]

Modeling

The imbalanced target variable was addressed in two ways. For most models, records were classified from the predicted probability of the positive (default) outcome using the best classification threshold rather than a fixed cut-off. For the Artificial Neural Network, the two target classes were weighted.
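Both ideas can be sketched in a few lines. The example below assumes the best threshold is the one that maximizes Youden's J statistic (J = TPR - FPR), which is listed among the project's methods, and assumes "balanced" class weights that are inversely proportional to class frequency; the exact threshold rule and weights used in the project may differ. The data and function names are hypothetical.

```python
def youden_best_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = TPR - FPR over the observed scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def balanced_class_weights(y_true):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(y_true)
    counts = {c: y_true.count(c) for c in set(y_true)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}

# Hypothetical imbalanced toy data: 1 = default (minority), 0 = no default
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.35, 0.60]

best_t, best_j = youden_best_threshold(y_true, y_prob)
weights = balanced_class_weights(y_true)

print("best threshold:", best_t, "J:", best_j)
print("class weights:", weights)
```

In this toy data a 0.5 cut-off would catch only one of the two defaults, while the lower threshold chosen by J catches both at the cost of one false positive. A weight dictionary of this shape is what Keras accepts through the `class_weight` argument of `fit`, which makes errors on the minority class count more during training.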

Evaluation

Gradient Boosting Classification, Logistic Regression and Random Forest Models:

[Figure: RESULTS1]

Artificial Neural Networks:

[Figure: ANNRESULTS]

The Complete Project: here.


Mary Donovan Martello
