Predicting Customer Churn: Yilun Gu, Anna Klutho, Yinglu Liu, Yuhuai Wang, Hao Yan
EXECUTIVE SUMMARY
PROBLEM
How can QWE predict customer churn and increase customer retention in the coming months?
RECOMMENDATIONS
Given this information, QWE should implement
ANALYSIS
DEFINING CHURN
Before investigating which drivers influence customer churn, it is important to first examine the overall churn rate within the QWE customer base. This gives us a basic understanding of the customer retention problem QWE is facing.
Of the 6,347 customers in the database, 323 left QWE (“churned”) between November and December 2011. This gives a baseline churn rate of 5.1% (323 ÷ 6,347): across QWE’s customer base, 5.1% of customers left the company in the last month.
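The baseline churn rate is simply the number of churned customers divided by the total number of customers. A minimal sketch, assuming churn is recorded as a 0/1 flag per customer (1 = churned, 0 = stayed); the function names are hypothetical:

```python
from typing import Sequence


def churn_rate(churned: int, total: int) -> float:
    """Fraction of customers who churned (between 0 and 1)."""
    if total <= 0:
        raise ValueError("total must be a positive customer count")
    return churned / total


def churn_rate_from_flags(flags: Sequence[int]) -> float:
    """Churn rate from per-customer flags (1 = churned, 0 = stayed)."""
    return churn_rate(sum(flags), len(flags))


# Toy example: 1 of 4 customers churned.
print(f"{churn_rate_from_flags([0, 0, 1, 0]):.1%}")  # 25.0%
```

Applied to the report's figures, this is the 323 churned customers divided by the number of customers in the database.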
CUSTOMER AGE
It is natural to assume that the length of a customer relationship (“Customer Age”) would have a large impact on customer retention. But can Customer Age predict churn on its own? The graph to the left shows the relationship between Customer Age and Customer Churn (where 1 = Customer Churn and 0 = No Customer Churn).
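One simple way to check whether Customer Age separates churners from non-churners on its own is to compute the churn rate within age bands. A minimal sketch, assuming each customer is a (Customer Age in months, churn flag) pair; the 6-month band width, the toy data, and the function name are illustrative choices, not taken from the report:

```python
from collections import defaultdict


def churn_rate_by_age_band(records, band_width=6):
    """Return {band start (months): churn rate} for (age_months, churn_flag) pairs."""
    counts = defaultdict(lambda: [0, 0])  # band start -> [churned, total]
    for age_months, churned in records:
        band = (age_months // band_width) * band_width
        counts[band][0] += churned
        counts[band][1] += 1
    return {band: c / n for band, (c, n) in sorted(counts.items())}


# Hypothetical toy customers, not QWE data.
toy = [(2, 0), (3, 1), (8, 0), (10, 0), (14, 1), (15, 1)]
print(churn_rate_by_age_band(toy))  # {0: 0.5, 6: 0.0, 12: 1.0}
```

If the rates in neighboring bands move up and down without a clear trend, Customer Age on its own is unlikely to be a strong predictor of churn.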
LOOKING AT OTHER DRIVERS – CUSTOMER HAPPINESS INDEX (CHI)
However, while Customer Age may not be the best predictor of churn on its own, this does not rule out the possibility that another driver can explain churn by itself. The question here is: which driver best explains customer churn on its own?
Using correlation and univariate logistic regression across all 11 provided customer characteristics, our team examined in depth how well the current Customer Happiness Index (CHI) score predicts customer churn. We chose this variable because it had the strongest association with customer churn (a correlation of -0.084) and the highest significance as an individual predictor of churn (p = 2.04e-11).
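The report does not say which software it used for this screening. A minimal pure-Python sketch of the two statistics, assuming each characteristic is a list of numbers aligned with a list of 0/1 churn flags: the correlation, and a single-variable logistic regression whose coefficient is tested with a Wald z test (the kind of p-value that standard logistic-regression summaries report). The helper names, the Newton-Raphson fitting loop, and the toy data are illustrative; in practice a statistics package would do the fitting.

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 churn flag as y this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def univariate_logit(x, y, iters=25):
    """Fit P(churn) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, two-sided Wald p-value for b1). Assumes the data are not
    perfectly separable; otherwise the estimates do not converge.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = a = b = d = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            a += w                    # information matrix [[a, b], [b, d]]
            b += w * xi
            d += w * xi * xi
        det = a * d - b * b
        b0 += (d * g0 - b * g1) / det  # Newton step: inverse information times score
        b1 += (a * g1 - b * g0) / det
    z = b1 / math.sqrt(a / det)        # Wald z using the last iteration's information
    return b0, b1, math.erfc(abs(z) / math.sqrt(2.0))


def screen(features, churn):
    """Rank characteristics by univariate p-value: (name, correlation, b1, p)."""
    rows = []
    for name, values in features.items():
        _, b1, p = univariate_logit(values, churn)
        rows.append((name, pearson(values, churn), b1, p))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical toy data, not QWE's.
churn = [1, 1, 0, 1, 0, 0, 1, 0]
features = {"chi": [1, 2, 3, 4, 5, 6, 7, 8], "other": [3, 6, 4, 1, 5, 9, 2, 6]}
for name, r, b1, p in screen(features, churn):
    print(f"{name}: r={r:.3f}, b1={b1:.3f}, p={p:.3f}")
```

Sorting by p-value mirrors the idea of picking the single characteristic that is the most significant individual predictor.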
Using this information, our team built a model based on CHI alone to predict the probability of customer churn for three randomly selected customers (Customers 672, 354, & 5203):
This result supports Wall’s theory that happiness is a major driver of customer churn: as happiness goes up, the probability of a customer leaving decreases.
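The direction of this effect can be read directly from a single-variable logistic model. Writing β₀ for the intercept and β₁ for the CHI coefficient (symbols introduced here for illustration; the fitted values are not reported in this section), the predicted probability and its slope are:

```latex
P(\text{churn} \mid \text{CHI}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{CHI})}},
\qquad
\frac{\partial P}{\partial\,\text{CHI}} = \beta_1\, P\,(1 - P).
```

Because 0 < P < 1, the factor P(1 − P) is positive, so the slope has the same sign as β₁. A negative β₁, consistent with the negative correlation of -0.084 reported above, means that the predicted churn probability falls as CHI rises.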
MULTIPLE LOGISTIC REGRESSION – PREDICTING INDIVIDUAL POSSIBILITIES OF CHURN
Multiple Logistic Regression (MLR) is a statistical technique that incorporates multiple customer characteristics to estimate the probability of customer churn. The results of this analysis provide a churn probability for each individual customer, which can then be used to rank customers from “riskiest” (most likely to churn) to least risky.
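Once an MLR model is fitted, each customer’s churn probability is the logistic function of the intercept plus the sum of coefficient × characteristic, and sorting customers by that probability gives the “riskiest first” list. A minimal sketch; the coefficients, characteristic names, and customer IDs below are hypothetical placeholders, not the fitted values in Exhibit B:

```python
import math


def churn_probability(intercept, coefs, customer):
    """P(churn) = 1 / (1 + exp(-(b0 + sum of b_k * x_k))) for one customer."""
    eta = intercept + sum(b * customer[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-eta))


def rank_by_risk(intercept, coefs, customers):
    """Return (customer id, probability) pairs, most likely to churn first."""
    scored = [(cid, churn_probability(intercept, coefs, x)) for cid, x in customers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical model and customers, for illustration only.
intercept = -3.0
coefs = {"change_in_chi": -0.01, "days_since_last_login": 0.02}
customers = {
    "A": {"change_in_chi": 10, "days_since_last_login": 2},
    "B": {"change_in_chi": -40, "days_since_last_login": 30},
}
for cid, p in rank_by_risk(intercept, coefs, customers):
    print(cid, f"{p:.1%}")
```

Here customer B, whose happiness score dropped and who has not logged in for a month, is ranked ahead of customer A.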
For this approach, our team chose the following customer characteristics to include in the model:
• Change in Customer Happiness Index Score (between November and December)
• Customer Age (expressed in months as a QWE customer)
• Recency of Logins (expressed through days since last login)
• Current Customer Happiness Index Score
• Change in Number of Blog Views
These variables were selected because they were the most statistically significant in a model that included all 11 possible customer characteristics (see Exhibit A in the Appendix for more details); in other words, their relationship with churn was the most clearly distinguishable from zero within that model. By re-running the model with these 5 characteristics, we can predict the probability of customer churn for the three randomly selected customers introduced above (Customers 672, 354, & 5203):
Table 2 – Probability of Churn for Customers 672, 354, & 5203 (MLR)
Customer                 672     354     5203
Probability of Churn*    3.4%    3.3%    5.3%
*Please see Exhibit B in the Appendix for the MLR model.

As mentioned earlier, the advantage of this approach is that we are able to get a list of individual customers and their individual probabilities, allowing QWE management to specifically target the needs of these customers.
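The report does not state the exact cut-off it used to pick these variables. Assuming the conventional 5% significance level, a filter over the p-values in Exhibit A returns exactly the five characteristics listed above (Exhibit A’s “Views” row corresponds to the change in number of blog views). The dictionary copies those p-values; the function name is hypothetical:

```python
# p-values from Exhibit A (model with all 11 customer characteristics)
EXHIBIT_A_P_VALUES = {
    "CHI": 0.00014,
    "Age": 0.01799,
    "Change in CHI": 0.0000329,
    "Cases": 0.14643,
    "Change in Cases": 0.05992,
    "SP": 0.87611,
    "Change in SP": 0.50830,
    "Logins": 0.89002,
    "Blogs": 0.98817,
    "Views": 0.00700,
    "Days since Last Login": 0.0000581,
}


def significant_variables(p_values, alpha=0.05):
    """Names with p < alpha, most significant first."""
    return sorted((name for name, p in p_values.items() if p < alpha), key=p_values.get)


print(significant_variables(EXHIBIT_A_P_VALUES))
# ['Change in CHI', 'Days since Last Login', 'CHI', 'Views', 'Age']
```

Change in Cases (p ≈ 0.06) is the closest characteristic left out; at a 10% level it would also be included.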
COMPARING METHODS
Comparing the results of the Decision Tree method with those of the Multiple Logistic Regression shows a difference in the final churn probabilities predicted for each customer (see table below).
Table 4 – Comparing Results for Customers 672, 354, & 5203
Customer    Decision Tree    Multiple Logistic Regression    Did Customer Actually Churn?
672         3.9%             3.4%                            No
354         3.9%             3.3%                            No
5203        3.9%             5.3%                            No
This difference occurs for two reasons:
• Different variables are used to determine the chance of churn
   o While the decision tree method uses four variables selected automatically by the tree-building algorithm, MLR uses the five variables selected by the team.
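A general property of classification trees may also help when reading Table 4, where all three customers receive the same 3.9%: a tree assigns every customer who lands in the same leaf the same probability, namely the churn share among the training customers in that leaf, whereas MLR produces a separate probability for each customer. The sketch assumes a hypothetical one-split tree on CHI; the split value, data, and function names are illustrative only and are not the report’s tree.

```python
def leaf_churn_rate(training_rows, in_leaf):
    """Churn share among training customers that satisfy the leaf's rule."""
    flags = [churned for features, churned in training_rows if in_leaf(features)]
    return sum(flags) / len(flags)


def tree_predict(leaves, features):
    """Return the probability of the first leaf whose rule matches the customer."""
    for in_leaf, probability in leaves:
        if in_leaf(features):
            return probability
    raise ValueError("customer matched no leaf")


def low_chi(features):
    return features["chi"] < 50


def high_chi(features):
    return features["chi"] >= 50


# Hypothetical training data: (features, churn flag)
training = [({"chi": 10}, 1), ({"chi": 20}, 0), ({"chi": 60}, 0),
            ({"chi": 70}, 1), ({"chi": 80}, 0), ({"chi": 90}, 0)]
leaves = [(low_chi, leaf_churn_rate(training, low_chi)),
          (high_chi, leaf_churn_rate(training, high_chi))]

# Two different customers in the same leaf get the same probability.
print(tree_predict(leaves, {"chi": 55}), tree_predict(leaves, {"chi": 95}))  # 0.25 0.25
```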
ACCURACY
It is also important to evaluate the accuracy of our recommended method. Accuracy is the percentage of customers whose predicted outcome (churn or no churn) matches what actually happened, so maximizing it means the model captures the current situation as correctly as possible. Accuracy is also a useful measure because it is easily understood and communicated across a business.
Our team chose a threshold of 12% (i.e., we predict that a customer will churn if P(churn) ≥ 12%), as it gave the highest overall accuracy for this model. At this threshold, the MLR model has an accuracy rate of 93.4%.
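Choosing a threshold this way amounts to turning each predicted probability into a churn / no-churn call and scoring those calls against actual outcomes at several candidate thresholds. A minimal sketch, assuming a list of model probabilities and 0/1 actual churn flags; the candidate grid, toy data, and function names are hypothetical. It also computes the accuracy of predicting the more common outcome for everyone, a standard baseline for comparison:

```python
def accuracy(probs, actual, threshold):
    """Share of customers whose call (churn if p >= threshold) matches the actual flag."""
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, actual))
    return correct / len(actual)


def best_threshold(probs, actual, candidates):
    """Candidate threshold with the highest accuracy (the first one wins ties)."""
    return max(candidates, key=lambda t: accuracy(probs, actual, t))


def majority_baseline(actual):
    """Accuracy of predicting the more common outcome for every customer."""
    rate = sum(actual) / len(actual)
    return max(rate, 1 - rate)


# Hypothetical predicted probabilities and actual churn flags.
probs = [0.02, 0.05, 0.15, 0.30, 0.08]
actual = [0, 0, 1, 1, 0]
candidates = [0.05, 0.12, 0.50]
for t in candidates:
    print(f"threshold {t:.0%}: accuracy {accuracy(probs, actual, t):.0%}")
print("best:", best_threshold(probs, actual, candidates))
print(f"baseline: {majority_baseline(actual):.0%}")
```

With QWE’s 5.1% churn rate, predicting “no churn” for every customer would already be about 94.9% accurate, a useful reference point when reading the 93.4% figure.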
RECOMMENDATION
In the end, we see that the following three drivers* have the highest impact on predicting customer churn:
• Change in Customer Happiness Index (between November and December)
• Customer Age (expressed in months as a customer)
• Recency of Logins (expressed through days since last login)
*While the MLR model included 5 drivers in its calculations, these three characteristics had the most significant coefficients in the model.
Intuitively, these relationships make sense: a change in happiness level, the length of the customer relationship, and how active the customer is (as measured by login recency) could all plausibly have a significant impact on customer churn. Our model supports this intuition.
We therefore recommend that QWE management develop strategies to address these three drivers in its operations. Such strategies include:
• Customer Satisfaction Programs – When a customer’s Customer Happiness Index score drops sharply, personalized outreach that helps solve that customer’s problems would be beneficial.
• Incentives to Increase Login Recency and Frequency – One possible incentive is the reduction of QWE subscription
price based on the number and frequency of logins in a month.
By implementing these strategies, QWE may be able to reduce customer churn in the future.
APPENDIX
EXHIBIT A – RESULTS FROM MULTIPLE LOGISTIC REGRESSION WITH ALL 11 VARIABLES
Variable                 Estimate     Std. Error   z value    p-value     Significance
(Intercept)              -2.76e+00    1.069e-01    -25.841    < 2e-16     ***
CHI                      -4.657e-03   1.223e-03    -3.808     0.00014     ***
Age                      1.271e-02    5.370e-03    2.366      0.01799     *
Change in CHI            -1.027e-02   2.474e-03    -4.153     0.0000329   ***
Cases                    -1.524e-01   1.049e-01    -1.452     0.14643
Change in Cases          1.703e-01    9.050e-02    1.881      0.05992     .
SP                       1.593e-02    1.022e-01    0.156      0.87611
Change in SP             -5.194e-02   7.852e-02    -0.661     0.50830
Logins                   2.893e-04    2.092e-03    0.138      0.89002
Blogs                    2.905e-04    1.960e-02    0.015      0.98817
Views                    -1.098e-04   4.071e-05    -2.697     0.00700     **
Days since Last Login    1.724e-02    4.289e-03    4.020      0.0000581   ***
Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1
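In each row of Exhibit A, the z value is the estimate divided by its standard error, and the p-value is the two-sided normal tail probability of that z (a Wald test). A short standard-library check against a few rows; the function name is hypothetical, and small differences in the last digit come from rounding of the printed estimates:

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for a coefficient under a normal approximation."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# (estimate, std. error) pairs copied from Exhibit A
rows = {
    "CHI": (-4.657e-03, 1.223e-03),
    "Age": (1.271e-02, 5.370e-03),
    "Views": (-1.098e-04, 4.071e-05),
    "Days since Last Login": (1.724e-02, 4.289e-03),
}
for name, (est, se) in rows.items():
    z, p = wald_test(est, se)
    print(f"{name}: z = {z:.3f}, p = {p:.5f}")
```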
EXHIBIT B – MLR PREDICTION MODEL