Explain Machine Learning Model Using SHAP


Avinash Navlani · Nov 23, 2022 · 5 min read

Learn how the SHAP tool reveals each feature's contribution to a model's predictions.
Most machine learning and neural network models are difficult to interpret. They generally behave as black boxes, which makes them hard to understand, explain, and interpret. Data scientists often focus on a model's output performance rather than on its interpretability and explainability, yet they need tools that provide an intuitive understanding of how a model works. SHAP (SHapley Additive exPlanations) is one such tool: it explains how your machine learning model works using simple plots such as summary plots and force plots.

In this article, we're going to explore model explainability using the SHAP package in Python.

Source: https://shap.readthedocs.io/en/latest/index.html

What is SHAP?
SHAP stands for SHapley Additive exPlanations. It is based on
a game theoretic approach and explains the output of any machine
learning model using visualization tools.

SHAP Characteristics
• It is mainly used to explain the predictions of any machine learning model by computing the contribution of each feature to the prediction.

• It combines ideas from several earlier methods, such as LIME, Shapley sampling values, DeepLIFT, and QII.

• It calculates a consistent outcome as the sum of each feature's contribution.

• It does not evaluate the quality of the prediction model.

• Its summary plots provide a useful overview of the model.

• Its main disadvantage is that its computation time is high.

SHAP Installation

SHAP can be installed from PyPI:

pip install shap

or from conda-forge:

conda install -c conda-forge shap

Loading Dataset
Let's first load the required HR dataset using pandas' read_csv() function. You can download the data from the following link: https://www.kaggle.com/liujiaqi/hr-comma-sepcsv

import pandas  # for dataframes
import matplotlib.pyplot as plt  # for plotting graphs
import seaborn as sns  # for plotting graphs

data = pandas.read_csv('HR_comma_sep.csv')
data.head()

Output:

Prepare the Data


Many machine learning algorithms require numerical input data, so you need to represent categorical columns numerically.

To encode this data, you could map each value to a number, e.g. the salary column's values can be represented as low: 0, medium: 1, and high: 2. This process is known as label encoding, and sklearn conveniently does it for you using LabelEncoder.

# Import LabelEncoder
from sklearn import preprocessing

# Creating LabelEncoder
le = preprocessing.LabelEncoder()

# Converting string labels into numbers
data['salary'] = le.fit_transform(data['salary'])
data['departments'] = le.fit_transform(data['departments'])

Here, you imported the preprocessing module and created a LabelEncoder object. Using this LabelEncoder object, you fit and transform the "salary" and "departments" columns into numeric columns.

Build Model

First we split the dataset, and then we build the model.

Let's split the dataset using the train_test_split() function. You need to pass three parameters: features, target, and test set size. Additionally, you can set random_state to get reproducible train and test sets.
# Import train_test_split function
from sklearn.model_selection import train_test_split

X = data[['satisfaction_level', 'last_evaluation', 'number_project',
          'average_montly_hours', 'time_spend_company', 'Work_accident',
          'promotion_last_5years', 'departments', 'salary']]
y = data['left']

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  # 70% training and 30% test
Let’s build an employee churn prediction model.

Here, you are going to predict churn using a Random Forest Classifier.

# Import Random Forest Classifier model
from sklearn.ensemble import RandomForestClassifier

# Create Random Forest Classifier
rf = RandomForestClassifier()

# Train the model using the training sets
rf.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = rf.predict(X_test)

Evaluate Model

# Import sklearn metrics
from sklearn import metrics

# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Model Precision
print("Precision:", metrics.precision_score(y_test, y_pred))

# Model Recall
print("Recall:", metrics.recall_score(y_test, y_pred))

Output:

Accuracy: 0.9871111111111112
Precision: 0.9912790697674418
Recall: 0.9542910447761194

Well, you got a classification accuracy of about 98.7%, which is considered very good.

Model Interpretability
Now, we'll move on to model interpretability using SHAP. First, we will calculate the SHAP values.

To calculate the SHAP values, we need to create a TreeExplainer object and compute the SHAP values for a sample or for the full dataset:

# Import the SHAP library
import shap

# Create object that can calculate SHAP values
explainer = shap.TreeExplainer(rf)

# Calculate SHAP values
shap_values = explainer.shap_values(X_test)

Let's build the summary plot using the summary_plot() method.

# Create summary_plot
shap.summary_plot(shap_values, X_test)

Output:

In the plot above, feature importance is arranged in descending order, from the highest impact to the lowest. The ordering reflects each feature's impact on the predictions. Because the plot uses absolute SHAP values, it doesn't matter whether a feature affects predictions positively or negatively.

Let's plot the force plot to see the impact of features on predictions for individual observations. A force plot shows each feature's contribution to the model's prediction for a specific observation.

shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[1], X_test)

Output:

This plot is the generalised, interactive view over all observations. Now I am selecting last_evaluation (employee performance) as the feature on both axes.

The plot above shows the SHAP values for last_evaluation (employee performance). When an employee's performance score is lower than 0.43, the employee leaves (or the company lets them go). When the performance score is between 0.57 and 0.82, the employee also tends to leave the firm.
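If you want to drill into a single employee's prediction, or into one feature such as last_evaluation, you can pass a single row to force_plot() or use dependence_plot() as a static alternative to the interactive view. This is a minimal sketch, not part of the original tutorial; the row index 0 is an arbitrary choice, and it assumes the same list-style shap_values used above.

# Force plot for a single observation (row 0 of X_test, chosen arbitrarily)
shap.force_plot(explainer.expected_value[1], shap_values[1][0, :], X_test.iloc[0, :])

# Static scatter of last_evaluation values against their SHAP values
shap.dependence_plot("last_evaluation", shap_values[1], X_test)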

Summary

Congratulations, you have made it to the end of this tutorial!

I hope this article helps you with model interpretability and explainability. SHAP is one of the most important tools for any data scientist who wants to dig into a model.

In this tutorial, you have learned how to interpret a model and understand feature contributions. Don't stop here! I recommend you try different classifiers on different datasets. You can also try SHAP on text and image datasets. In upcoming articles we will focus on interpretability for text data, image data, and deep learning based models.

Originally published at https://machinelearninggeek.com/explain-machine-learning-model-using-shap/

Reach out to me on
Linkedin: https://www.linkedin.com/in/avinash-navlani/

SHAP: Shapley Additive Explanations

A step-by-step guide for understanding how SHAP works and how to interpret ML models by using the SHAP library

Fernando Lopez · Published in Towards Data Science · Jul 12, 2021 · 12 min read

Figure 1. SHAP | Image by author | Icons by freepick

In the past decade, we have witnessed the explosion of the age of artificial intelligence, driven by academia and embraced by industry. Such has been the inclusion of AI in everyday life that even business models now revolve around artificial intelligence models.

AI has shown impressive results in areas such as natural language processing, computer vision, etc. Today, it takes just a few lines of code to implement state-of-the-art AI models; it's fascinating. However, as humans, how can we interpret the predictions that AI models make? How can we measure the importance that the model assigns to the data it infers from? Well, in this blog we will talk about this controversial topic: the interpretability of artificial intelligence models.

The interpretability of AI models is an active research area. Several alternatives have been proposed in recent years, and in this blog we will focus on one in particular: SHAP (Shapley Additive Explanations).

In this blog we are going to look at the interpretability of ML models from the SHAP approach; we will see how SHAP works and its main component, Shapley values. The blog is divided into the following sections:

• What are Shapley values?

• What is SHAP?

• SHAP in action: A classification problem

• Conclusion

I recommend you go for a cup of coffee and make yourself comfortable ☕️!

What are Shapley values?

Shapley values are a concept from cooperative game theory, whose objective is to measure each player's contribution to the game. The method for obtaining Shapley values was proposed by Lloyd Shapley [1] in 1953. Shapley values emerge from a setting in which "n" players collectively obtain a reward "p" that is to be fairly distributed among the "n" players according to their individual contributions; each such contribution is a Shapley value.

In simple words, a Shapley value is the average marginal contribution of an instance of a feature across all possible coalitions. Average marginal contribution? All possible coalitions? Let's see in detail what all this refers to.

Let's say that a group of friends (A, B, C, D) is working together to obtain a profit P. To distribute the profit fairly, we want to measure the contribution of each member, that is, the Shapley value of every friend. To calculate the Shapley value of a given member, we compute the difference between the profit generated when the member is present and the profit generated when that member is absent (this difference is the marginal contribution of the member to the current coalition). We do this for every subgroup (or coalition) that can be formed in which the member in question is present. The mean of the differences obtained (the mean marginal contribution) is the Shapley value.
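To make this concrete, here is a small Python sketch that computes the exact Shapley value of friend A for a toy game. The profit function v() is invented purely for illustration; only the weighting is the standard Shapley formula.

from itertools import combinations
from math import factorial

players = ["A", "B", "C", "D"]

def v(coalition):
    # Hypothetical profit: 10 per member, plus a bonus of 5 when A and B cooperate
    profit = 10 * len(coalition)
    if "A" in coalition and "B" in coalition:
        profit += 5
    return profit

def shapley_value(player):
    n = len(players)
    others = [p for p in players if p != player]
    value = 0.0
    # Average the marginal contribution of `player` over all coalitions of the
    # other players, weighted by the standard Shapley coefficient
    for size in range(n):
        for subset in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            value += weight * (v(set(subset) | {player}) - v(set(subset)))
    return value

print(shapley_value("A"))  # 12.5: A's base contribution of 10 plus half of the shared bonus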

In the following figure, we see a representation of the calculation of the marginal contribution of friend A to the coalition composed of friends B, C, and D.

Figure 2. Marginal contribution of member "A" to the coalition of members B, C, D. | Image by author | Icons by freepick

For example, to calculate the Shapley value of friend A, we would need to construct all the coalitions in which friend A appears; for each coalition, the marginal contribution would be calculated (the difference between the profit obtained when the member is present and the profit obtained when the member is absent), and, given all the marginal contributions, the mean marginal contribution would be computed, that is, the Shapley value. Simple, right?

Figure 3 shows a representation of the calculation of the Shapley value for friend A.

Figure 3. Shapley value calculation for member "A" | Image by author | Icons by freepick

Well, up to this point we already know what a Shapley value is, how it is calculated, and how it is interpreted. However, how are Shapley values related to the interpretability of ML models? Let's see how this happens.

In the context of an ML model, think of each friend in our example as a feature, the game as the function that generates predictions, and the profit as the prediction.

However, ML models usually have a large number of features, where each feature is a discrete or continuous variable. This makes it computationally very expensive to calculate Shapley values for each instance of each feature; in fact, it is an NP-hard problem. And it is at this point that SHAP becomes the protagonist. In the next section, we will see what SHAP is and its approach to the interpretability of ML models.

Maybe it's time to refill the coffee cup ☕️!

What is SHAP?

Shapley Additive Explanations (SHAP) is a method introduced by Lundberg and Lee in 2017 [2] for interpreting the predictions of ML models through Shapley values. The key idea of SHAP is to calculate the Shapley values for each feature of the sample to be interpreted, where each Shapley value represents the impact that its associated feature has on the prediction.

The intuition behind SHAP is easy to understand: for each feature there is an associated Shapley value. However, how does SHAP work? In the previous section, we saw that the calculation of Shapley values can become intractable for many features. To avoid this, the authors introduced Kernel SHAP, a method extended and adapted from linear LIME [3] to calculate Shapley values.

Kernel SHAP is a method that allows the calculation of Shapley values with far fewer coalition samples. It is based on a weighted linear regression whose coefficients are the Shapley values. To build the weighted linear model, n coalition samples are taken; for each coalition, the prediction is obtained and the weight is calculated with the Shapley kernel. Finally, the weighted linear model is fit, and the resulting coefficients are the Shapley values.

A bit complicated, right? Let's see it in detail.

Let's say we want to explain the instance x, which consists of features f1, f2, f3, and f4. The procedure begins by taking a set of coalition samples. For example, the coalition (1,0,1,0) refers to the presence of features f1 and f3 and the absence of features f2 and f4. Since ML models cannot omit features when making inferences, the values of features f2 and f4 are replaced by values taken from the training set. Then, for the coalition (1,0,1,0), the values of features f1 and f3 are taken from the instance x and the values of features f2 and f4 come from the training set; in this way the prediction can be made correctly. Therefore, for each coalition sample, the prediction is obtained and the weight is calculated with the Shapley kernel described in Equation 1. Solving the linear model, the resulting coefficients are the Shapley values.

Equation 1. Kernel Shap
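The equation appears as an image in the original post; for reference, the Shapley kernel weight from Lundberg and Lee [2] can be written as

pi_x(z') = (M - 1) / ( C(M, |z'|) * |z'| * (M - |z'|) )

where M is the total number of features, z' is a coalition vector, |z'| is the number of features present in the coalition, and C(M, |z'|) is the binomial coefficient "M choose |z'|".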

Everything is clearer, right? Nothing is better than an image that illustrates the entire process. In Figure 4 we see the process for obtaining Shapley values for a given instance x.

Figure 4. Descriptive process of SHAP to obtain Shapley values from an ML model | Image by author

As we can see from the figure above, obtaining Shapley values from an ML model through Kernel SHAP is not a complicated process, but it is laborious. The main ingredients are coalitions, predictions, and weights.

It is important to note that trying to calculate all the coalitions makes the problem intractable. This is the reason why coalition samples are taken: the larger the sample, the lower the uncertainty.

Finally, it is important to say that Kernel SHAP is the only model-agnostic method for calculating Shapley values; that is, Kernel SHAP can interpret any ML model regardless of its nature. The authors also propose other variants to obtain Shapley values for specific types of models, such as Tree SHAP, Deep SHAP, Low-Order SHAP, Linear SHAP, and Max SHAP.

Well, it’s time to see SHAP in action. In the next section, we will see
how to use the shap library to obtain Shapley values from an ML
model. I think it’s time to refill the coffee cup ☕️!

SHAP in action: A classification problem

SHAP and its variants are integrated into the Python library shap, which, in addition to providing different methods for calculating Shapley values, also integrates several methods for visualizing and interpreting the results.

The goal of this section is to show how to use the shap library to calculate, plot, and interpret Shapley values for a classification problem. So, let's go for it!

The complete project, with the examples shown in this article, can be found at: https://github.com/FernandoLpz/SHAP-Classification-Regression

The dataset

For this example, we will use the Breast Cancer Wisconsin (Diagnostic) dataset [4], whose features describe the characteristics of a tumor and where the objective is to predict whether the tumor is benign or malignant. The dataset contains 100 samples, 8 independent features (radius, texture, perimeter, area, smoothness, compactness, symmetry, and fractal_dimension) and one dependent feature (diagnosis_result, the target feature).

The target feature contains 2 categories, M = Malign and B = Benign. For practical purposes, we have coded the categories as follows: 0 = Malign and 1 = Benign.

The model

For this problem, we have implemented and optimized a model based on Random Forest, obtaining an accuracy of 92% on the test set. The classifier implementation is shown in the following code snippet.

Code snippet 1. Training and optimization of Random Forest Classifier

In the code snippet above, we implement a Random Forest based classifier that has been optimized using the Optuna library. From this point on, we have a model that makes predictions, which we intend to interpret with SHAP, so let's see how we do it!
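The exact snippet lives in the repository linked above; below is a minimal sketch of what such a training and optimization step could look like. The file name, target mapping, search ranges, and split are assumptions for illustration only, not the author's actual code.

import optuna
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file name; the dataset described above has 8 features
# and a diagnosis_result target with categories M and B
df = pd.read_csv("breast_cancer.csv")
X = df.drop(columns=["diagnosis_result"])
y = df["diagnosis_result"].map({"M": 0, "B": 1})
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    # Search space chosen purely for illustration
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
    }
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Refit the classifier with the best hyperparameters found
clf = RandomForestClassifier(**study.best_params, random_state=42)
clf.fit(x_train, y_train)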

Interpretation: Shapley values

Suppose we have 2 samples that we intend to interpret: a sample that belongs to class 1 (Benign) and another sample that belongs to class 0 (Malign), shown in Figures 5 and 6 respectively.

Figure 5. Positive sample | Class = Benign

Figure 6. Negative sample | Class = Malign

Since our ML model is decision tree based, we will use Tree SHAP to obtain the Shapley values. Therefore, we first need to import the library and initialize the explainer, passing our classifier as a parameter, as shown in the next code snippet:
Code snippet 2. TreeExplainer initialization
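A minimal sketch of this step, assuming the trained Random Forest from the earlier sketch is called clf:

import shap

# Initialize the Tree SHAP explainer with the trained tree-based model
explainer = shap.TreeExplainer(clf)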

Next, we need the samples for which we want to calculate the Shapley values. In this case, I extract from the training dataset the samples that correspond to those shown in Figures 5 and 6 respectively (it is important to mention that, for the purposes of this explanation, I am using one sample at a time; however, the full dataset can also be used).

Code snippet 3. Positive and negative samples
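A minimal sketch of this step; the row indices below are placeholders, they simply need to point at one Benign and one Malign sample in the training data (x_train comes from the earlier sketch):

# Hypothetical indices of a Benign (class 1) and a Malign (class 0) sample
positive_sample = x_train.iloc[[0]]
negative_sample = x_train.iloc[[1]]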

Just to check, the predictions for each sample are:


Code snippet 4. Predictions for positive and negative samples

To calculate the Shapley values, we use the shap_values method of the explainer. The argument it receives is the sample that we intend to interpret:

Code snippet 5. Get Shapley values for the positive and negative samples
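A minimal sketch of this step, reusing the variable names from the text:

# Shapley values for each sample; the result is a list with one array per class
shap_values_positive = explainer.shap_values(positive_sample)
shap_values_negative = explainer.shap_values(negative_sample)

# Index [1] selects the Shapley values with respect to class 1 (Benign)
print(shap_values_positive[1])
print(shap_values_negative[1])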

And there we have it, the Shapley values! But wait a minute, how do we interpret these values? Why are they printed from index [1]? Well, let's see this in detail.

First, the variables shap_values_positive and shap_values_negative contain the Shapley values corresponding to each feature for both the positive and the negative class. Therefore, shap_values_positive[0] contains the Shapley values of each feature with respect to class 0 (or Malign), and shap_values_positive[1] contains the Shapley values of each feature with respect to class 1 (or Benign); the same applies for shap_values_negative. For practicality, we use the results with respect to class 1.

Then, for both the positive and the negative samples, we obtain a list of 8 values, which correspond to the Shapley values for each feature. In the following figure, we can see each sample with its respective Shapley values more clearly:

Figure 7. Shapley values for each feature of the positive and negative samples

Do you remember the essence of the Shapley values explained in the first section? Well, exactly the same thing happens here: features with high Shapley values have a greater impact, and features with low Shapley values have less impact on the prediction. We can also interpret Shapley values as follows: features with high SHAP values push towards one class, and features with low SHAP values push towards the other class.

From the positive sample, we see that the features with the highest Shapley values are perimeter, compactness, and area. From the negative sample, the features with the lowest Shapley values are perimeter, area, and compactness.
As we mentioned, the shap library also provides tools for generating plots. Let's see how to plot a force_plot for our positive and negative samples:

Code snippet 6. Plot positive and negative samples
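A minimal sketch of this step, using the class-1 Shapley values as in the text:

shap.initjs()

# Force plot for the positive (Benign) sample
shap.force_plot(explainer.expected_value[1], shap_values_positive[1], positive_sample)

# Force plot for the negative (Malign) sample
shap.force_plot(explainer.expected_value[1], shap_values_negative[1], negative_sample)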

The resulting plots for each sample are:

Figure 8. Force plot for positive sample | Class = Benign | Image by author

Figure 9. Force plot for negative sample | Class = Malign | Image by author

As we can see from both plots, the impact of each feature corresponds to the Shapley values described in Figure 7. In red we see high Shapley values, that is, features pushing towards class 1 (or Benign); in blue we see low Shapley values, that is, features pushing towards class 0 (or Malign). The force_plot is a great visualization for understanding the impact of each feature on a specific sample for a given prediction.

The force_plot() is a great visualization for understanding the impact of each feature on a specific sample; likewise, the shap library provides various other types of visualizations that aid the interpretation of Shapley values. For example, the summary_plot() method provides information about feature importance as well as how the Shapley values change as the value of each feature varies.

For this example, the summary_plot() is given by:
Code snippet 7. Summary plot
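A minimal sketch of this step; here the Shapley values are computed over the training set so the plot summarizes many samples, again using the class-1 values:

# Shapley values for the whole training set
shap_values_train = explainer.shap_values(x_train)
shap.summary_plot(shap_values_train[1], x_train)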

generating the following plot:

Figure 10. Summary plot

On the left side we see each feature ordered according to its importance, with perimeter being the most important feature and texture the least important. The color represents the values that each feature can take: red for high values and blue for low values. Therefore, for the feature perimeter, when its values are high (red) the Shapley values are low, consequently pushing towards class 0 (or Malign); conversely, when its values are low (blue) the Shapley values are high, consequently pushing towards class 1 (or Benign). This can be verified against the results shown in Figures 8 and 9 respectively.

Another important observation is that the perimeter, compactness, and area features are the ones that generate the most impact on the predictions, given their associated Shapley values; that is, their Shapley values are either very high or very low. On the other hand, the rest of the features do not generate as much impact because their associated Shapley values are closer to zero.

A variation of summary_plot() presents the impact of each feature in the form of bars; such a plot can be obtained with:

Code snippet 8. Summary plot, bars
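A minimal sketch of this step, reusing the training-set Shapley values from the previous snippet:

# Mean absolute Shapley value per feature, shown as bars
shap.summary_plot(shap_values_train[1], x_train, plot_type="bar")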

generating the following plot:

Figure 11. Summary plot, bars

As in Figure 10, in Figure 11 we can see each feature ordered according to its impact with respect to its associated Shapley values. In the same way, we observe that the features perimeter, compactness, and area are the ones that have the greatest impact on the model.

The complete project, with the examples shown in this article, can be found at: https://github.com/FernandoLpz/SHAP-Classification-Regression

And after several cups of coffee, for now we have reached the end!

Conclusion

In this blog we saw what SHAP (Shapley Additive Explanations) is.

In the first section we talked about the origin and interpretation of Shapley values. In the second section we learned what SHAP is, how it works, and how it builds on LIME and Shapley values for the interpretability of ML models. Finally, in the third section we saw how to use the shap library and walked through an example of interpreting the results it returns.

The shap library provides a large number of tools to visualize and interpret ML models. In this blog we only saw a few examples; it is recommended to take a look at the documentation to get the maximum benefit from this great library.

“Read, learn, write, share. Repeat”


— Fernando López Velasco.

References

[1] A Value for n-Person Games. L. S. Shapley. 1953

[2] A Unified Approach to Interpreting Model Predictions. S. Lundberg, S.-I. Lee. 2017

[3] "Why Should I Trust You?": Explaining the Predictions of Any Classifier. M. T. Ribeiro, S. Singh, C. Guestrin. 2016

[4] Breast Cancer Wisconsin (Diagnostic) dataset
