IST 407 Presentation


Insurance

Connor Hanan, Emma Lehr, Natalie Ruppel


Research Questions

1. What factors contribute to a beneficiary being charged the most?
2. Which machine learning algorithm is best at predicting the charges?
About Our Dataset
● We are using the insurance.csv dataset found on Kaggle:
https://www.kaggle.com/mirichoi0218/insurance
● There are 1,338 records
● There are 7 attributes: Age, Sex, BMI, Number of Children, Region, Smoking (yes or
no), and Charges (how much the person was charged for insurance)
● Our defined target variable is “charges”
For added complexity:
We included an additional dataset of health attributes for specific states and linked it
to our main dataset by assigning each state to one of the existing regions. We made the
addition to see whether there was further evidence of a relationship between health
insurance costs and region beyond the attributes given in our main dataset.
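The linking step can be sketched roughly as follows. This is only an illustration under stated assumptions: the state-to-region assignment, the `state_health_index` attribute, and every number below are hypothetical placeholders, not the values used in the project. Only the four region names come from insurance.csv.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical state-to-region assignment (illustrative choices only).
STATE_TO_REGION = {
    "New York": "northeast",
    "Maine": "northeast",
    "Oregon": "northwest",
    "Florida": "southeast",
    "Georgia": "southeast",
    "Arizona": "southwest",
}

# Hypothetical state-level health attribute with placeholder values.
state_health_index = {
    "New York": 25.0,
    "Maine": 29.0,
    "Oregon": 28.0,
    "Florida": 27.0,
    "Georgia": 31.0,
    "Arizona": 30.0,
}


def region_averages(state_values, state_to_region):
    """Average a state-level attribute within each region."""
    grouped = defaultdict(list)
    for state, value in state_values.items():
        grouped[state_to_region[state]].append(value)
    return {region: mean(values) for region, values in grouped.items()}


def attach_region_attribute(records, region_values, name):
    """Return copies of the records with the regional attribute added."""
    return [{**row, name: region_values[row["region"]]} for row in records]


region_index = region_averages(state_health_index, STATE_TO_REGION)

# Toy rows shaped like insurance.csv records (only a few columns shown).
records = [
    {"age": 19, "smoker": "yes", "region": "southwest"},
    {"age": 33, "smoker": "no", "region": "northeast"},
]
enriched = attach_region_attribute(records, region_index, "region_health_index")
```

Averaging per region is one possible way to aggregate; other choices (median, population-weighted mean) would fit the same structure.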
Experiments
Decision Tree/Pruned Decision Tree
What is it? A decision tree is a specific
type of flow chart used to visualize the
decision-making process by mapping
out different courses of action, as well
as their potential outcomes.
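To make the flow-chart idea concrete, here is a minimal sketch of a tree stored as nested dictionaries and walked from the root to a leaf. The splits, thresholds, and cost classes are hypothetical and hand-written for illustration; a trained tree would learn them from the data.

```python
# Hypothetical hand-written tree: each internal node tests one feature,
# each leaf is a plain string naming a cost class.
TREE = {
    "feature": "smoker",
    "test": lambda value: value == "yes",
    "yes": {
        "feature": "bmi",
        "test": lambda value: value >= 30,
        "yes": "high",
        "no": "medium",
    },
    "no": "low",
}


def predict(node, row):
    """Follow the branches until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        branch = "yes" if node["test"](row[node["feature"]]) else "no"
        node = node[branch]
    return node


print(predict(TREE, {"smoker": "yes", "bmi": 32.0}))
print(predict(TREE, {"smoker": "no", "bmi": 32.0}))
```

In this representation, pruning corresponds to replacing a sub-dictionary with a single leaf, for example swapping the whole `bmi` subtree for one class label.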
Support Vector Machines
What is it? SVMs work by finding a boundary (called a “hyperplane”) that separates the
classes of data points while maximizing the distance between the boundary and the
nearest points of each class. The kernel trick lets that boundary be non-linear in the
original features.
Kernels Used:
● Linear
● Radial
● Polynomial
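The three kernels can be written out as plain similarity functions. This is a sketch of the standard textbook definitions; the parameter names and defaults (`gamma`, `coef0`, `degree`) are assumptions chosen for illustration, not the settings used in the experiments.

```python
import math


def linear_kernel(x, y):
    """Plain dot product: k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """k(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Radial kernel: k(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, degree=2))
print(rbf_kernel(x, y, gamma=0.5))
```

An SVM only ever needs these pairwise values, which is why swapping the kernel changes the shape of the decision boundary without changing the training procedure.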
KNN
What is it? KNN works by finding the distances between a query and all the examples
in the data, selecting the specified number of examples (K) closest to the query, and
then voting for the most frequent label (in the case of classification) or averaging
the labels (in the case of regression).
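The procedure described above can be sketched in a few lines. The training points and labels are toy values invented for illustration, and Euclidean distance is an assumption (other distance measures work the same way).

```python
import math
from collections import Counter


def k_nearest_labels(train, query, k):
    """Return the labels of the k training points closest to the query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    return [label for _, label in by_distance[:k]]


def knn_classify(train, query, k):
    """Classification: vote for the most frequent label among the neighbours."""
    labels = k_nearest_labels(train, query, k)
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k):
    """Regression: average the numeric labels of the neighbours."""
    labels = k_nearest_labels(train, query, k)
    return sum(labels) / len(labels)


# Toy data: (feature vector, label) pairs.
train_cls = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
    ((5.5, 6.0), "high"),
]
train_reg = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(train_cls, (2.0, 2.0), k=3))
print(knn_regress(train_reg, (2.0,), k=3))
```

Because the method relies on raw distances, features on very different scales (such as age and charges) are usually rescaled first.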
Naive Bayes
What is it? It is a classification technique based on Bayes' Theorem with an
assumption of independence among predictors. In simple terms, a Naive Bayes
classifier assumes that the presence of a particular feature in a class is unrelated to
the presence of any other feature.
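The independence assumption means the score for a class is simply the class prior multiplied by one per-feature probability at a time. A minimal sketch for categorical features follows; the rows and labels are toy values, and the add-one (Laplace) smoothing is an assumption added so that unseen values do not zero out a score.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # key: (feature index, label)
    values_seen = defaultdict(set)         # key: feature index
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen


def predict_naive_bayes(model, row):
    """Return the best label and the unnormalized score of every label."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Smoothed P(value | class); features are multiplied in
            # independently, which is the "naive" assumption.
            numerator = feature_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get), scores


# Toy rows: (smoker, bmi category) with an invented cost class.
rows = [("yes", "obese"), ("yes", "normal"), ("no", "obese"),
        ("no", "normal"), ("no", "normal"), ("yes", "obese")]
labels = ["high", "high", "low", "low", "low", "high"]

model = train_naive_bayes(rows, labels)
label, scores = predict_naive_bayes(model, ("yes", "normal"))
print(label, scores)
```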
Multiclass Artificial Neural Network

What is it? In multi-class classification, the neural network has the same number of
output nodes as the number of classes. Each output node belongs to some class and
outputs a score for that class. Scores from the last layer are passed through a
softmax layer.
[Figure: Multi-Class Classification (3 classes)]
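The softmax step turns the raw output scores into probabilities that sum to 1, and the predicted class is the one with the highest probability. A minimal sketch, using three made-up scores to match the 3-class setting:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]          # hypothetical last-layer outputs
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```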
Multiclass XGBoost
What is it? XGBoost is a decision-tree-based ensemble machine learning algorithm
that uses a gradient boosting framework. The “extreme” version follows the same
approach as standard gradient boosting but is engineered for speed and performance.
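The underlying boosting idea can be sketched with one-split trees (stumps), each fitted to the errors left by the previous ones. This is plain gradient boosting with squared loss on toy one-dimensional data, not XGBoost itself; the function names, `rounds`, and `learning_rate` are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on xs that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]

base, stumps, predictions = boost(xs, ys)
initial_mse = mean_squared_error(ys, [base] * len(ys))
final_mse = mean_squared_error(ys, predictions)
print(initial_mse, final_mse)
```

Each round can only keep or lower the training error here, because a stump predicting the mean residual of each side never increases squared loss when added with a learning rate between 0 and 1.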
Comparison of the Results
Conclusion:
Being a smoker had the highest information gain in determining the health
insurance cost.
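Information gain measures how much splitting on an attribute reduces the entropy of the class labels. A minimal sketch of the calculation follows; the labels and attribute values are toy data invented for illustration, not the project's data or results.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy cost classes and two toy attributes.
labels = ["high", "high", "high", "low", "low", "low", "low", "low"]
smoker = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
sex = ["m", "f", "m", "f", "m", "f", "m", "f"]

gain_smoker = information_gain(smoker, labels)
gain_sex = information_gain(sex, labels)
print(gain_smoker, gain_sex)
```

In this toy example the smoker attribute separates the classes perfectly, so its gain equals the full entropy of the labels, while the other attribute removes very little uncertainty.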

The machine learning algorithm best at predicting charges was the KNN model. The
machine learning algorithm worst at predicting charges was Naive Bayes.

KNN had the highest classification accuracy, at 94.69%.
