Insurance Fraud in Python

Classification Prediction

Summary

The goal of this project was to use Python to identify significant features of fraudulent insurance claims and to build classification models that predict whether fraud was reported on a claim. The imbalanced target variable was addressed by weighting the classes. Logistic Regression, Support Vector Machine Classification, and Random Forest models were tested. The paper walks through data understanding and preparation, the models tested, the methodology, and the evaluation of the project.
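
The sketch below illustrates the class-weighting idea in scikit-learn for the three model families mentioned above; the estimator settings are assumptions for illustration, not the project's exact configuration.

```python
# A minimal sketch (not the project's code): each scikit-learn estimator is
# given class_weight="balanced" so the minority fraud class is weighted
# inversely to its frequency. Hyperparameter values are placeholders.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

models = {
    "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "svc": SVC(class_weight="balanced", kernel="rbf"),
    "random_forest": RandomForestClassifier(class_weight="balanced", random_state=42),
}
```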

Tools

  • Scikit-learn
  • Seaborn
  • Matplotlib
  • Yellowbrick
  • Numpy
  • Pandas
  • Scipy
  • Patsy
  • Tabulate
  • Collections (Counter)

Data: claims

Methodology

Multiple versions of each model were compared, varying the techniques used for data splitting, handling the imbalanced target variable, and feature selection.
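
The sketch below illustrates one such variant: a stratified train/test split that preserves the class imbalance, paired with cross-validated recall to compare models. The `claims.csv` path and `fraud_reported` column name are placeholders, not taken from the project.

```python
# A minimal sketch, assuming a pandas DataFrame `claims` with a binary (0/1)
# target column `fraud_reported` (both names are placeholders): a stratified
# split keeps the class imbalance identical in the train and test partitions,
# and cross-validated recall compares model variants.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

claims = pd.read_csv("claims.csv")              # placeholder file name
X = claims.drop(columns="fraud_reported")       # placeholder target column
y = claims["fraud_reported"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

baseline = LogisticRegression(class_weight="balanced", max_iter=1000)
recall_scores = cross_val_score(baseline, X_train, y_train, cv=5, scoring="recall")
print(f"Mean cross-validated recall: {recall_scores.mean():.3f}")
```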

Models / Methods / Metrics

  • Random Forest
  • Logistic Regression / LASSO Logistic Regression
  • Support Vector Classification
  • Principal Component Analysis
  • Log-Transformation and Scaling
  • GridSearch (see the sketch after this list)
  • Recall
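
The sketch below shows how several of these pieces might fit together in a single scikit-learn pipeline; the parameter grid, preprocessing order, and threshold choices are assumptions for illustration, not the project's exact setup.

```python
# A minimal sketch combining the preprocessing and tuning steps listed above:
# a log transform and standard scaling feed a class-weighted LASSO logistic
# regression, and GridSearchCV picks the regularization strength that
# maximizes cross-validated recall. The grid values are illustrative.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),   # log-transformation of skewed numeric features
    ("scale", StandardScaler()),              # scaling so coefficients are comparable
    ("clf", LogisticRegression(penalty="l1", solver="liblinear",
                               class_weight="balanced")),  # LASSO logistic regression
])

param_grid = {"clf__C": [0.01, 0.1, 1, 10]}   # inverse regularization strengths to search
grid = GridSearchCV(pipe, param_grid, scoring="recall", cv=5)
# grid.fit(X_train, y_train)                  # using the split from the earlier sketch
```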

Project Preview

Exploratory Data Analysis

This project used the exploratory data analysis completed in a companion Exploratory Data Analysis and Hypothesis Testing project: EDA.

Principal Component Analysis

PCA was implemented to address multicollinearity between groups of input variables.
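
The sketch below shows one common way to apply PCA for this purpose; the 95% explained-variance threshold and the reuse of the `X_train` placeholder from the earlier sketch are assumptions.

```python
# A minimal sketch, reusing the X_train placeholder from the earlier split:
# standardize the correlated numeric inputs, fit PCA, and keep the smallest
# number of components that explains roughly 95% of the variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(X_train)       # PCA is sensitive to scale
pca = PCA().fit(X_scaled)

cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.95) + 1)    # 95% threshold is an assumption
X_train_pca = PCA(n_components=n_components).fit_transform(X_scaled)
```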

Evaluation

[Results figure]

The Complete Project: here.

Updated: