Explain Machine Learning Model Using SHAP


Avinash Navlani · Nov 23, 2022 · 5 min read

Learn how the SHAP tool reveals each feature's contribution to a model's predictions.
Most machine learning and neural network models are difficult to interpret. They generally behave as black boxes, which makes them hard to understand, explain, and interpret. Data scientists often focus on a model's output performance rather than on its interpretability and explainability, yet they need tools that provide an intuitive understanding of how a model works. SHAP (SHapley Additive exPlanations) is one such tool: it explains how your machine learning model works using simple plots such as summary plots and force plots.

In this article, we're going to explore model explainability using the SHAP package in Python.

Source: https://shap.readthedocs.io/en/latest/index.html

What is SHAP?
SHAP stands for SHapley Additive exPlanations. It is based on
a game theoretic approach and explains the output of any machine
learning model using visualization tools.

SHAP Characteristics
• It is mainly used to explain the predictions of any machine learning model by computing the contribution of each feature to the prediction.

• It combines ideas from several earlier methods, such as LIME, Shapley sampling values, DeepLIFT, and QII.

• It calculates a consistent outcome as the sum of each feature's contribution.

• It does not evaluate the quality of the prediction model.

• Its summary plots provide a useful overview of the model.

• Its main disadvantage is that its computation time is high.

SHAP Installation

SHAP can be installed from PyPI:

pip install shap

or from conda-forge:

conda install -c conda-forge shap

Loading Dataset
Let's first load the required HR dataset using pandas' read_csv() function. You can download the data from the following link: https://www.kaggle.com/liujiaqi/hr-comma-sepcsv

import pandas  # for dataframes
import matplotlib.pyplot as plt  # for plotting graphs
import seaborn as sns  # for plotting graphs

data = pandas.read_csv('HR_comma_sep.csv')
data.head()

Output:

Prepare the Data


Many machine learning algorithms require numerical input data, so you need to represent categorical columns numerically.

To encode this data, you could map each value to a number, e.g. the salary column's values can be represented as low: 0, medium: 1, and high: 2. This process is known as label encoding, and sklearn conveniently does it for you using LabelEncoder.

# Import LabelEncoder
from sklearn import preprocessing

# Creating LabelEncoder
le = preprocessing.LabelEncoder()

# Converting string labels into numbers
data['salary'] = le.fit_transform(data['salary'])
data['departments'] = le.fit_transform(data['departments'])

Here, you imported the preprocessing module and created a LabelEncoder object. Using this LabelEncoder object, you fit and transform the "salary" and "departments" columns into numeric columns.

Build Model

First we split the dataset, and then we build the model.

Let's split the dataset using the train_test_split() function. You need to pass three parameters: features, target, and test set size. Additionally, you can set random_state to get reproducible train and test sets.
# Import train_test_split function
from sklearn.model_selection import train_test_split

X = data[['satisfaction_level', 'last_evaluation', 'number_project',
          'average_montly_hours', 'time_spend_company', 'Work_accident',
          'promotion_last_5years', 'departments', 'salary']]
y = data['left']

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  # 70% training and 30% test
Let’s build an employee churn prediction model.

Here, you are going to predict churn using a Random Forest Classifier.

# Import Random Forest Classifier model
from sklearn.ensemble import RandomForestClassifier

# Create Random Forest Classifier
rf = RandomForestClassifier()

# Train the model using the training sets
rf.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = rf.predict(X_test)

Evaluate Model

# Import sklearn metrics
from sklearn import metrics

# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Model Precision
print("Precision:", metrics.precision_score(y_test, y_pred))

# Model Recall
print("Recall:", metrics.recall_score(y_test, y_pred))

Output:

Accuracy: 0.9871111111111112
Precision: 0.9912790697674418
Recall: 0.9542910447761194

Well, you got a classification accuracy of about 98.7%, which is considered very good.

Model Interpretability
Now, we'll move on to model interpretability using SHAP. First, we will calculate the SHAP values.

To calculate the SHAP values, we need to create a TreeExplainer object and compute the SHAP values for a sample or for the full dataset:

# Import the SHAP library
import shap

# Create object that can calculate SHAP values
explainer = shap.TreeExplainer(rf)

# Calculate SHAP values
shap_values = explainer.shap_values(X_test)

Let's build the summary plot using the summary_plot() method.

# Create summary_plot
shap.summary_plot(shap_values, X_test)

Output:

In the plot above, feature importance is arranged in descending order, from the highest impact to the lowest. The ordering reflects each feature's impact on the predictions. Because the plot uses absolute SHAP values, it doesn't matter whether a feature affects predictions positively or negatively.

Let's plot the force plot to see the impact of features on predictions for individual observations. A force plot shows each feature's contribution to the model's prediction for a specific observation.

shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[1], X_test)

Output:

This plot is the generalised, interactive view over all observations. Now I am selecting last_evaluation (employee performance) as the feature on both axes.

The plot above shows the SHAP values for last_evaluation (employee performance). When an employee's performance score is lower than 0.43, the employee leaves (or the company lets them go). When the performance score is between 0.57 and 0.82, the employee also tends to leave the firm.
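If you want to drill into a single employee's prediction, or into one feature such as last_evaluation, you can pass a single row to force_plot() or use dependence_plot() as a static alternative to the interactive view. This is a minimal sketch, not part of the original tutorial; the row index 0 is an arbitrary choice, and it assumes the same list-style shap_values used above.

# Force plot for a single observation (row 0 of X_test, chosen arbitrarily)
shap.force_plot(explainer.expected_value[1], shap_values[1][0, :], X_test.iloc[0, :])

# Static scatter of last_evaluation values against their SHAP values
shap.dependence_plot("last_evaluation", shap_values[1], X_test)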

Summary

Congratulations, you have made it to the end of this tutorial!

I hope this article helps you with model interpretability and explainability. SHAP is one of the most important tools for any data scientist who wants to dig into a model.

In this tutorial, you have learned how to interpret a model and understand feature contributions. Don't stop here! I recommend you try different classifiers on different datasets. You can also try SHAP on text and image datasets. In upcoming articles we will focus on interpretability for text data, image data, and deep learning based models.

Originally published at https://machinelearninggeek.com/explain-machine-learning-model-using-shap/

Reach out to me on
Linkedin: https://www.linkedin.com/in/avinash-navlani/

SHAP: Shapley Additive Explanations

A step-by-step guide for understanding how SHAP works and how to interpret ML models by using the SHAP library

Fernando Lopez · Published in Towards Data Science · Jul 12, 2021 · 12 min read

Figure 1. SHAP | Image by author | Icons by freepick

In the past decade, we have witnessed the explosion of the age of artificial intelligence, driven by academia and embraced by industry. Such has been the inclusion of AI in everyday life that even business models now revolve around artificial intelligence models.

AI has shown impressive results in areas such as natural language processing, computer vision, etc. Today, it takes just a few lines of code to implement state-of-the-art AI models; it's fascinating. However, as humans, how can we interpret the predictions that AI models make? How can we measure the importance that the model assigns to the data it infers from? Well, in this blog we will talk about this controversial topic: the interpretability of artificial intelligence models.

The interpretability of AI models is an active research area. Several alternatives have been proposed in recent years, and in this blog we will focus on one in particular: SHAP (Shapley Additive Explanations).

In this blog we are going to look at the interpretability of ML models from the SHAP approach; we will see how SHAP works and its main component, Shapley values. The blog is divided into the following sections:

• What are Shapley values?

• What is SHAP?

• SHAP in action: A classification problem

• Conclusion

I recommend you go for a cup of coffee and make yourself comfortable ☕️!

What are Shapley values?

Shapley values are a concept from cooperative game theory, whose objective is to measure each player's contribution to the game. The method for obtaining Shapley values was proposed by Lloyd Shapley [1] in 1953. Shapley values emerge from a setting in which "n" players collectively obtain a reward "p" that is to be fairly distributed among the "n" players according to their individual contributions; each such contribution is a Shapley value.

In simple words, a Shapley value is the average marginal contribution of an instance of a feature across all possible coalitions. Average marginal contribution? All possible coalitions? Let's see in detail what all this refers to.

Let's say that a group of friends (A, B, C, D) is working together to obtain a profit P. To distribute the profit fairly, we want to measure the contribution of each member, that is, the Shapley value of every friend. To calculate the Shapley value of a given member, we compute the difference between the profit generated when the member is present and the profit generated when that member is absent (this difference is the marginal contribution of the member to the current coalition). We do this for every subgroup (or coalition) that can be formed in which the member in question is present. The mean of the differences obtained (the mean marginal contribution) is the Shapley value.
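To make this concrete, here is a small Python sketch that computes the exact Shapley value of friend A for a toy game. The profit function v() is invented purely for illustration; only the weighting is the standard Shapley formula.

from itertools import combinations
from math import factorial

players = ["A", "B", "C", "D"]

def v(coalition):
    # Hypothetical profit: 10 per member, plus a bonus of 5 when A and B cooperate
    profit = 10 * len(coalition)
    if "A" in coalition and "B" in coalition:
        profit += 5
    return profit

def shapley_value(player):
    n = len(players)
    others = [p for p in players if p != player]
    value = 0.0
    # Average the marginal contribution of `player` over all coalitions of the
    # other players, weighted by the standard Shapley coefficient
    for size in range(n):
        for subset in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            value += weight * (v(set(subset) | {player}) - v(set(subset)))
    return value

print(shapley_value("A"))  # 12.5: A's base contribution of 10 plus half of the shared bonus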

In the following figure, we see a representation of the calculation of the marginal contribution of friend A to the coalition composed of friends B, C, and D.

Figure 2. Marginal contribution of member "A" to the coalition of members B, C, D. | Image by author | Icons by freepick

For example, to calculate the Shapley value of friend A, we would need to construct all the coalitions in which friend A appears; for each coalition, the marginal contribution would be calculated (the difference between the profit obtained when the member is present and the profit obtained when the member is absent), and, given all the marginal contributions, the mean marginal contribution would be computed, that is, the Shapley value. Simple, right?

Figure 3 shows a representation of the calculation of the Shapley value for friend A.

Figure 3. Shapley value calculation for member "A" | Image by author | Icons by freepick

Well, up to this point we already know what a Shapley value is, how it is calculated, and how it is interpreted. However, how are Shapley values related to the interpretability of ML models? Let's see how this happens.

In the context of an ML model, think of each friend in our example as a feature, the game as the function that generates predictions, and the profit as the prediction.

However, ML models usually have a large number of features, where each feature is a discrete or continuous variable. This makes it computationally very expensive to calculate Shapley values for each instance of each feature; in fact, it is an NP-hard problem. And it is at this point that SHAP becomes the protagonist. In the next section, we will see what SHAP is and its approach to the interpretability of ML models.

Maybe it's time to refill the coffee cup ☕️!

What is SHAP?

Shapley Additive Explanations (SHAP) is a method introduced by Lundberg and Lee in 2017 [2] for interpreting the predictions of ML models through Shapley values. The key idea of SHAP is to calculate the Shapley values for each feature of the sample to be interpreted, where each Shapley value represents the impact that its associated feature has on the prediction.

The intuition behind SHAP is easy to understand: for each feature there is an associated Shapley value. However, how does SHAP work? In the previous section, we saw that the calculation of Shapley values can become intractable for many features. To avoid this, the authors introduced Kernel SHAP, a method extended and adapted from linear LIME [3] to calculate Shapley values.

Kernel SHAP is a method that allows the calculation of Shapley values with far fewer coalition samples. It is based on a weighted linear regression whose coefficients are the Shapley values. To build the weighted linear model, n coalition samples are taken; for each coalition, the prediction is obtained and the weight is calculated with the Shapley kernel. Finally, the weighted linear model is fit, and the resulting coefficients are the Shapley values.

A bit complicated, right? Let's see it in detail.

Let's say we want to explain the instance x, which consists of features f1, f2, f3, and f4. The procedure begins by taking a set of coalition samples. For example, the coalition (1,0,1,0) refers to the presence of features f1 and f3 and the absence of features f2 and f4. Since ML models cannot omit features when making inferences, the values of features f2 and f4 are replaced by values taken from the training set. Then, for the coalition (1,0,1,0), the values of features f1 and f3 are taken from the instance x and the values of features f2 and f4 come from the training set; in this way the prediction can be made correctly. Therefore, for each coalition sample, the prediction is obtained and the weight is calculated with the Shapley kernel described in Equation 1. Solving the linear model, the resulting coefficients are the Shapley values.

Equation 1. Kernel Shap
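The equation appears as an image in the original post; for reference, the Shapley kernel weight from Lundberg and Lee [2] can be written as

pi_x(z') = (M - 1) / ( C(M, |z'|) * |z'| * (M - |z'|) )

where M is the total number of features, z' is a coalition vector, |z'| is the number of features present in the coalition, and C(M, |z'|) is the binomial coefficient "M choose |z'|".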

Everything is clearer, right? Nothing is better than an image that illustrates the entire process. In Figure 4 we see the process for obtaining Shapley values for a given instance x.

Figure 4. Descriptive process of SHAP to obtain Shapley values from an ML model | Image by author

As we can see from the figure above, obtaining Shapley values from an ML model through Kernel SHAP is not a complicated process, but it is laborious. The main ingredients are coalitions, predictions, and weights.

It is important to note that trying to calculate all the coalitions makes the problem intractable. This is the reason why coalition samples are taken: the larger the sample, the lower the uncertainty.

Finally, it is important to say that Kernel SHAP is the only model-agnostic method for calculating Shapley values; that is, Kernel SHAP can interpret any ML model regardless of its nature. The authors also propose other variants to obtain Shapley values for specific types of models, such as Tree SHAP, Deep SHAP, Low-Order SHAP, Linear SHAP, and Max SHAP.

Well, it’s time to see SHAP in action. In the next section, we will see
how to use the shap library to obtain Shapley values from an ML
model. I think it’s time to refill the coffee cup ☕️!

SHAP in action: A classification problem

SHAP and its variants are integrated into the Python library shap, which, in addition to providing different methods for calculating Shapley values, also integrates several methods for visualizing and interpreting the results.

The goal of this section is to show how to use the shap library to calculate, plot, and interpret Shapley values for a classification problem. So, let's go for it!

The complete project, with the examples shown in this article, can be found at: https://github.com/FernandoLpz/SHAP-Classification-Regression

The dataset

For this example, we will use the Breast Cancer Wisconsin (Diagnostic) dataset [4], whose features describe the characteristics of a tumor and where the objective is to predict whether the tumor is benign or malignant. The dataset contains 100 samples, 8 independent features (radius, texture, perimeter, area, smoothness, compactness, symmetry, and fractal_dimension) and one dependent feature (diagnosis_result, the target feature).

The target feature contains 2 categories, M = Malign and B = Benign. For practical purposes, we have coded the categories as follows: 0 = Malign and 1 = Benign.

The model

For this problem, we have implemented and optimized a model based on Random Forest, obtaining an accuracy of 92% on the test set. The classifier implementation is shown in the following code snippet.

Code snippet 1. Training and optimization of Random Forest Classifier

In the code snippet above, we implement a Random Forest based classifier that has been optimized using the Optuna library. From this point on, we have a model that makes predictions, which we intend to interpret with SHAP, so let's see how we do it!
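The exact snippet lives in the repository linked above; below is a minimal sketch of what such a training and optimization step could look like. The file name, target mapping, search ranges, and split are assumptions for illustration only, not the author's actual code.

import optuna
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file name; the dataset described above has 8 features
# and a diagnosis_result target with categories M and B
df = pd.read_csv("breast_cancer.csv")
X = df.drop(columns=["diagnosis_result"])
y = df["diagnosis_result"].map({"M": 0, "B": 1})
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    # Search space chosen purely for illustration
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
    }
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Refit the classifier with the best hyperparameters found
clf = RandomForestClassifier(**study.best_params, random_state=42)
clf.fit(x_train, y_train)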

Interpretation: Shapley values

Suppose we have 2 samples that we intend to interpret: a sample that belongs to class 1 (Benign) and another sample that belongs to class 0 (Malign), shown in Figures 5 and 6 respectively.

Figure 5. Positive sample | Class = Benign

Figure 6. Negative sample | Class = Malign

Since our ML model is decision tree based, we will use Tree SHAP to obtain the Shapley values. Therefore, we first need to import the library and initialize the explainer, passing our classifier as a parameter, as shown in the next code snippet:
Code snippet 2. TreeExplainer initialization
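A minimal sketch of this step, assuming the trained Random Forest from the earlier sketch is called clf:

import shap

# Initialize the Tree SHAP explainer with the trained tree-based model
explainer = shap.TreeExplainer(clf)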

Next, we need the samples for which we want to calculate the Shapley values. In this case, I extract from the training dataset the samples that correspond to those shown in Figures 5 and 6 respectively (it is important to mention that, for the purposes of this explanation, I am using one sample at a time; however, the full dataset can also be used).

Code snippet 3. Positive and negative samples
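A minimal sketch of this step; the row indices below are placeholders, they simply need to point at one Benign and one Malign sample in the training data (x_train comes from the earlier sketch):

# Hypothetical indices of a Benign (class 1) and a Malign (class 0) sample
positive_sample = x_train.iloc[[0]]
negative_sample = x_train.iloc[[1]]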

Just to check, the predictions for each sample are:


Code snippet 4. Predictions for positive and negative samples

To calculate the Shapley values, we use the shap_values method of the explainer. The argument it receives is the sample that we intend to interpret:

Code snippet 5. Get Shapley values for the positive and negative samples
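A minimal sketch of this step, reusing the variable names from the text:

# Shapley values for each sample; the result is a list with one array per class
shap_values_positive = explainer.shap_values(positive_sample)
shap_values_negative = explainer.shap_values(negative_sample)

# Index [1] selects the Shapley values with respect to class 1 (Benign)
print(shap_values_positive[1])
print(shap_values_negative[1])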

And there we have it, the Shapley values! But wait a minute, how do we interpret these values? Why are they printed from index [1]? Well, let's see this in detail.

First, the variables shap_values_positive and shap_values_negative contain the Shapley values corresponding to each feature for both the positive and the negative class. Therefore, shap_values_positive[0] contains the Shapley values of each feature with respect to class 0 (or Malign), and shap_values_positive[1] contains the Shapley values of each feature with respect to class 1 (or Benign); the same applies for shap_values_negative. For practicality, we use the results with respect to class 1.

Then, for both the positive and the negative samples, we obtain a list of 8 values, which correspond to the Shapley values for each feature. In the following figure, we can see each sample with its respective Shapley values more clearly:

Figure 7. Shapley values for each feature of the positive and negative samples

Do you remember the essence of the Shapley values explained in the first section? Well, exactly the same thing happens here: features with high Shapley values have a greater impact, and features with low Shapley values have less impact on the prediction. We can also interpret Shapley values as follows: features with high SHAP values push towards one class, and features with low SHAP values push towards the other class.

From the positive sample, we see that the features with the highest Shapley values are perimeter, compactness, and area. From the negative sample, the features with the lowest Shapley values are perimeter, area, and compactness.
As we mentioned, the shap library also provides tools for generating plots. Let's see how to plot a force_plot for our positive and negative samples:

Code snippet 6. Plot positive and negative samples
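A minimal sketch of this step, using the class-1 Shapley values as in the text:

shap.initjs()

# Force plot for the positive (Benign) sample
shap.force_plot(explainer.expected_value[1], shap_values_positive[1], positive_sample)

# Force plot for the negative (Malign) sample
shap.force_plot(explainer.expected_value[1], shap_values_negative[1], negative_sample)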

The resulting plots for each sample are:

Figure 8. Force plot for positive sample | Class = Benign | Image by author

Figure 9. Force plot for negative sample | Class = Malign | Image by author

As we can see from both plots, the impact of each feature corresponds to the Shapley values described in Figure 7. In red we see high Shapley values, that is, features pushing towards class 1 (or Benign); in blue we see low Shapley values, that is, features pushing towards class 0 (or Malign). The force_plot is a great visualization for understanding the impact of each feature on a specific sample for a given prediction.

The force_plot() is a great visualization for understanding the impact of each feature on a specific sample; likewise, the shap library provides various other types of visualizations that aid the interpretation of Shapley values. For example, the summary_plot() method provides information about feature importance as well as how the Shapley values change as the value of each feature varies.

For this example, the summary_plot() is given by:
Code snippet 7. Summary plot
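A minimal sketch of this step; here the Shapley values are computed over the training set so the plot summarizes many samples, again using the class-1 values:

# Shapley values for the whole training set
shap_values_train = explainer.shap_values(x_train)
shap.summary_plot(shap_values_train[1], x_train)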

generating the following plot:

Figure 10. Summary plot

On the left side we see each feature ordered according to its importance, with perimeter being the most important feature and texture the least important. The color represents the values that each feature can take: red for high values and blue for low values. Therefore, for the feature perimeter, when its values are high (red) the Shapley values are low, consequently pushing towards class 0 (or Malign); conversely, when its values are low (blue) the Shapley values are high, consequently pushing towards class 1 (or Benign). This can be verified against the results shown in Figures 8 and 9 respectively.

Another important observation is that the perimeter, compactness, and area features are the ones that generate the most impact on the predictions, given their associated Shapley values; that is, their Shapley values are either very high or very low. On the other hand, the rest of the features do not generate as much impact because their associated Shapley values are closer to zero.

A variation of summary_plot() presents the impact of each feature in the form of bars; such a plot can be obtained with:

Code snippet 8. Summary plot, bars
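A minimal sketch of this step, reusing the training-set Shapley values from the previous snippet:

# Mean absolute Shapley value per feature, shown as bars
shap.summary_plot(shap_values_train[1], x_train, plot_type="bar")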

generating the following plot:

Figure 11. Summary plot, bars

As in Figure 10, in Figure 11 we can see each feature ordered according to its impact with respect to its associated Shapley values. In the same way, we observe that the features perimeter, compactness, and area are the ones that have the greatest impact on the model.

The complete project, with the examples shown in this article, can be found at: https://github.com/FernandoLpz/SHAP-Classification-Regression

And after several cups of coffee, for now we have reached the end!

Conclusion

In this blog we saw what SHAP (Shapley Additive Explanations) is.

In the first section we talked about the origin and interpretation of Shapley values. In the second section we learned what SHAP is, how it works, and how it builds on LIME and Shapley values for the interpretability of ML models. Finally, in the third section we saw how to use the shap library and walked through an example of interpreting the results it returns.

The shap library provides a large number of tools to visualize and interpret ML models. In this blog we only saw a few examples; it is recommended to take a look at the documentation to get the maximum benefit from this great library.

“Read, learn, write, share. Repeat”


— Fernando López Velasco.

References

[1] A Value for n-Person Games. L. S. Shapley. 1953

[2] A Unified Approach to Interpreting Model Predictions. S. Lundberg, S.-I. Lee. 2017

[3] "Why Should I Trust You?": Explaining the Predictions of Any Classifier. M. T. Ribeiro, S. Singh, C. Guestrin. 2016

[4] Breast Cancer Wisconsin (Diagnostic) dataset
