Module 4 - Logistic Regression - Afterclass1b


IIMT 2641 Introduction to Business Analytics

Module 4: Logistic Regression

1
Ask the experts!

§ Critical decisions are often made by people with expert knowledge

§ Use logistic regression to model experts, e.g., a physician assessing the quality of
healthcare that patients receive

§ Healthcare Quality Assessment


– Good quality care educates patients and controls costs
– Need to assess quality for proper medical interventions
– No single set of guidelines for defining quality of healthcare (not well-defined)
– Health professionals (physicians) are experts in quality of care assessment (rely
on their experiences, knowledge, and intuitions…)

2
Experts are human
§ Experts are limited by memory and time

§ Healthcare Quality Assessment


– Expert physicians can evaluate quality by examining a patient’s records
– This process is time consuming and inefficient
– Physicians cannot assess quality for millions of patients

3
Replicating expert assessment
§ Can we develop analytical tools that replicate expert assessment on a large
scale?

§ Learn from expert human judgment


– Develop a model, interpret results, and adjust the model to improve
predictability on healthcare quality

§ Make predictions/evaluations on a large scale

§ Healthcare Quality Assessment


– Let’s identify poor healthcare quality using analytics

4
Claims data

§ Generated when a patient visits a doctor

§ Electronically available
§ Standardized, well-established codes
§ Not 100% accurate (human generated)
§ Under-reporting is common (since the job of recording is tedious)
§ When hospitals/doctors are evaluated based on patients’ experiences, they may misreport
§ Claims for hospital visits can be vague
5
Creating the dataset: Claims samples

Claims Sample

§ The data comes from a large health insurance claims database


§ Randomly selected 131 diabetes patients
§ Ages range from 35 to 55
§ Costs: $10,000 ~ $20,000
§ September 1, 2003 – August 31, 2005

6
Creating the dataset: Expert Review

Claims Sample → Expert Review

§ Expert physician reviewed claims and wrote descriptive notes:


“Ongoing use of narcotics” (a drug that relieves pain and induces drowsiness,
stupor, or insensibility)
"Only on Avandia (one type of drug treats diabetes), not a good first choice
drug"
"Had regular visits, mammogram (X-ray picture of breast), and
immunizations"
"Was given home testing supplies"

7
Creating the dataset: Expert Assessment

Claims Sample → Expert Review → Expert Assessment

§ Rated quality on a two-point scale (poor/good)


"I’d say care was poor – poorly treated diabetes"
"No eye care, but overall I’d say high quality"

8
Creating the dataset: Variable extraction

Claims Sample → Expert Review → Expert Assessment

§ Dependent Variable (Expert assessment)


– Quality of care
§ Independent Variables (Information in Claims Data)
– "Ongoing use of narcotics"
– "Only on Avandia, not a good first choice drug"
– "Had regular visits, mammogram, and immunizations"
– "Was given home testing supplies"

9
Creating the dataset: Variable extraction

Claims Sample → Expert Review → Expert Assessment

§ Dependent Variable
– Quality of care
§ Independent Variables
– Diabetes treatment
– Patient demographics
– Healthcare utilization
– Providers
– Claims
– Prescriptions
10
Data preview
Variable              Description
MemberID              A unique identifier for each observation.
ERVisits              The number of times the patient visited the emergency room.
OfficeVisits          The number of times the patient visited any doctor's office.
Narcotics             The number of prescriptions the patient had for narcotics.
ProviderCount         The number of providers that saw or treated the patient.
NumberClaims          The total number of medical claims the patient had.
StartedOnCombination  Whether or not the patient was started on a combination of drugs to treat their diabetes.
PoorCare              Whether or not the patient received poor care (1 if the patient had poor care, 0 otherwise).

11
Predicting quality of care

§ The dependent variable is modelled as a binary variable


– 1 if low-quality care
– 0 if high-quality care

§ This is a categorical variable


– A small number of possible outcomes

§ Linear regression would predict a continuous outcome

§ How can we extend the idea of linear regression to situations where the
outcome variable is categorical?
– Only want to predict 1 or 0
– Could round outcome to 1 or 0
– But we can do better with logistic regression

12
Quick Question

§ Which of the following dependent variables are categorical?

– Deciding whether to buy, sell, or hold a stock
– The weekly revenue of a company
– The winner of an election with two candidates
– The day of the week with the highest revenue
– The number of daily car thefts in New York City
– Whether or not revenue will exceed $50,000

§ Which of the above dependent variables are binary?
13
Quick Question

§ Which of the following dependent variables are categorical?


– Deciding whether to buy, sell, or hold a stock (categorical)
– The weekly revenue of a company
– The winner of an election with two candidates (categorical, binary)
– The day of the week with the highest revenue (categorical)
– The number of daily car thefts in New York City
– Whether or not revenue will exceed $50,000 (categorical, binary)

§ Which of the above dependent variables are binary?

14
Why does linear regression fail?

– A linear fit can produce predicted values below 0 or above 1, which cannot be
interpreted as probabilities of the binary outcome.
15
Why does linear regression fail?

16
Logistic function
The logistic (sigmoid) function maps any real number into the interval (0, 1):

G(z) = 1 / (1 + e^(−z))

where z is a linear function of the input, e.g. the number of office visits:

z = β0 + β1·X

Putting the two together:

G(β0 + β1·X) = 1 / (1 + e^(−(β0 + β1·X)))
17
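As an illustration only, here is the logistic function written out in Python; the coefficient values below are made up for the example, not estimated from the claims data:

import numpy as np

def logistic(z):
    # G(z) = 1 / (1 + e^(-z)): always strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (made-up) coefficients: z = beta0 + beta1 * OfficeVisits
beta0, beta1 = -2.0, 0.1
office_visits = np.array([0, 5, 10, 20, 40])
print(logistic(beta0 + beta1 * office_visits))   # probabilities rise toward 1 as visits grow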
Interpretation

The logistic regression model is:

P(Y = 1 | X) = 1 / (1 + e^(−(β0 + β1·X)))

It estimates the probability that Y = 1 given the input X.

Note that P(Y = 1 | X) + P(Y = 0 | X) = 1.
18
Logistic Regression Model

P(Y = 1 | X) = 1 / (1 + exp(−logit)) = 1 / (1 + exp(−(β0 + β1·X)))

§ Odds = P(Y = 1 | X) / P(Y = 0 | X)

– Odds > 1 if Y = 1 is more likely
– Odds < 1 if Y = 0 is more likely

§ Log odds, or “logit”, is a linear function:

logit = ln( P(Y = 1 | X) / P(Y = 0 | X) ) = β0 + β1·X

– The bigger the logit is, the bigger P(Y = 1 | X) is
– Interpretation: for each unit increase in X, the logit increases by β1 (holding
everything else constant in the multivariate regression).
19
Decision Boundary (Prediction)

Suppose we predict “y = 1” if

P(Y = 1 | X) = 1 / (1 + e^(−(β0 + β1·X))) ≥ 0.5

and predict “y = 0” if P(Y = 1 | X) < 0.5.

Equivalently (one-variable case), predict “y = 1” if

β0 + β1·X ≥ 0

Multiple-variable case: predict “y = 1” if

β0 + β1·X1 + β2·X2 ≥ 0
20
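A short Python sketch (using the illustrative coefficients from the quick question on the next slide) showing that thresholding the probability at 0.5 is the same as checking the sign of the logit:

import numpy as np

beta = np.array([-1.5, 3.0, -0.5])    # illustrative beta0, beta1, beta2

def predict(x1, x2, threshold=0.5):
    logit = beta[0] + beta[1] * x1 + beta[2] * x2
    p = 1.0 / (1.0 + np.exp(-logit))
    # p >= 0.5 exactly when logit >= 0
    return int(p >= threshold), p

print(predict(1, 5))    # logit = -1, p ~ 0.27 -> predict y = 0
print(predict(2, 5))    # logit =  2, p ~ 0.88 -> predict y = 1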
Quick Question

§ Suppose the coefficients of a logistic regression model with two
independent variables are as follows:

β0 = −1.5, β1 = 3, β2 = −0.5

And we have an observation with the following values for the independent
variables:

X1 = 1, X2 = 5

– What is the value of the Logit for this observation?
– What is the value of the Odds for this observation?
– What is the value of P(y = 1) for this observation?

Logit = −1.5 + 3×1 − 0.5×5 = −1
Odds = exp(Logit) = exp(−1) ≈ 0.368
P(y = 1) = 1 / (1 + exp(−Logit)) = 1 / (1 + e^1) ≈ 0.269

21
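The same arithmetic checked in plain Python:

import math

beta0, beta1, beta2 = -1.5, 3.0, -0.5
x1, x2 = 1, 5

logit = beta0 + beta1 * x1 + beta2 * x2       # -1.0
odds = math.exp(logit)                         # ~0.368
p = 1.0 / (1.0 + math.exp(-logit))             # ~0.269, equal to odds / (1 + odds)
print(logit, odds, p)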
Estimation of parameters (Not required)
How can we estimate β0, β1?

• Linear regression: choose β0, β1 to minimize the sum of squared errors
  ∑ (y_i − ŷ_i)², where ŷ_i = β0 + β1·x_i.

• Logistic regression: ŷ_i = G(β0 + β1·x_i), where G(z) = 1 / (1 + e^(−z)).
  The corresponding least-squares problem is non-convex.

• Solution: Maximum likelihood estimation (Not required)
22
Maximum likelihood estimation (Not required)

Under what values of β0, β1 is the likelihood of obtaining our sample the highest?

• β0, β1 are chosen to maximize the probability of obtaining the observed sample data:

  maximize ∏ P(y_i | x_i ; β0, β1)

• MLE is the primary estimation method for nonlinear models.
23
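To make the idea concrete (beyond what the slides require), here is a minimal sketch of maximum likelihood estimation for a one-variable logistic regression on simulated, made-up data, using scipy’s general-purpose optimizer:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=200)                  # e.g. number of office visits
true_b0, true_b1 = -2.6, 0.1                      # made-up "true" coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-(true_b0 + true_b1 * x))))

def neg_log_likelihood(beta):
    b0, b1 = beta
    z = b0 + b1 * x
    # log-likelihood = sum of [y*z - log(1 + e^z)]; we minimize its negative
    return np.sum(np.log1p(np.exp(z)) - y * z)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)    # estimates should land close to (-2.6, 0.1)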
Data preview
Variable              Description
MemberID              A unique identifier for each observation.
ERVisits              The number of times the patient visited the emergency room.
OfficeVisits          The number of times the patient visited any doctor's office.
Narcotics             The number of prescriptions the patient had for narcotics.
ProviderCount         The number of providers that saw or treated the patient.
NumberClaims          The total number of medical claims the patient had.
StartedOnCombination  Whether or not the patient was started on a combination of drugs to treat their diabetes.
PoorCare              Whether or not the patient received poor care (1 if the patient had poor care, 0 otherwise).

24
Data preview
§ Frequency table: count how many cases of poor care and good care

§ Baseline model: predict the most frequent outcome for all observations
– Predict all patients are receiving good care
– Accuracy: 98/131 = 74.8%

25
Create training and testing sets
§ Training dataset: used to build model
§ Testing dataset: used to test the model’s out-of-sample accuracy
§ If there is no chronological order to the observations, we randomly assign
observations to the training set or testing set.
§ Training data set (75% of the data)

26
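The slides do not show the software used in class; as one way to do this split in Python (the file name quality.csv is hypothetical, and the column names follow the data preview table):

import pandas as pd
from sklearn.model_selection import train_test_split

quality = pd.read_csv("quality.csv")       # hypothetical file with the 131 patients
train, test = train_test_split(
    quality,
    train_size=0.75,                        # 75% of the data for training
    stratify=quality["PoorCare"],           # keep the poor/good care mix similar in both sets
    random_state=88,                        # any fixed seed, for reproducibility
)
print(len(train), len(test))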
Build a logistic regression model
§ Used data for 99 patients to build the model (75% of the data)

Logit = β0 + β1×OfficeVisits + β2×Narcotics
27
Build a logistic regression model
§ Used data for 99 patients to build the model (75% of the data)
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

§ Are higher values in these variables indicative of poor care (1) or good care (0)?
§ Now that we have a model, how do we evaluate the quality of the model?

28
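A sketch of fitting this model in Python with statsmodels (the course may have used different software; train is the training split from the earlier sketch):

import statsmodels.formula.api as smf

model = smf.logit("PoorCare ~ OfficeVisits + Narcotics", data=train).fit()
print(model.summary())    # coefficients, standard errors, z-statistics, p-values
print(model.params)       # the slide reports roughly -2.646, 0.082, 0.076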
Build a logistic regression model
Estimated coefficients:
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

Standard errors of the coefficients are used to test:
H0: Coefficient = 0 versus HA: Coefficient ≠ 0
z-stat = (Estimated Coefficient − 0) / std. error
Two-tail p-value = 2∗P(Z < −|z-stat|)
29
Build a logistic regression model

Estimated coefficients:
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

Coefficient of OfficeVisits = 0.082
exp(0.082) = 1.085

For every 1-unit increase in OfficeVisits, there is an 8.5% increase in the odds
that the quality of care is poor.

Odds = exp(Logit)
Odds(OfficeVisits + 1) / Odds(OfficeVisits) = exp(coefficient of OfficeVisits)

30
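A quick numerical check of this interpretation in Python (the 0.082 coefficient comes from the fitted model above):

import math

coef_office_visits = 0.082
odds_ratio = math.exp(coef_office_visits)    # ~1.085
print(f"{(odds_ratio - 1) * 100:.1f}% increase in odds per additional office visit")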
Build a logistic regression model

Estimated coefficients:
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

Standard errors of the coefficients are used to test:
H0: Coefficient = 0 versus HA: Coefficient ≠ 0
z-stat = (Estimated Coefficient − 0) / std. error
Two-tail p-value = 2∗P(Z < −|z-stat|)

In the linear regression model, the estimated coefficients follow t-distributions,
so we perform t-tests. Logistic regression is different: the estimates are
approximately normally distributed when the sample size is large, so we use z-tests.

31
Checking Multicollinearity
§ Multicollinearity could be a problem
– Do the coefficients make sense?
– Check correlations
– Check VIF

In this model, all VIFs are below 10, so multicollinearity is not a concern.

32
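A sketch of the VIF check in Python with statsmodels (assuming the train DataFrame from the earlier sketch; only the two predictors used in the model are checked):

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(train[["OfficeVisits", "Narcotics"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))   # values below 10 are fine here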
Classification of the prediction

§ The outcome of a logistic regression model is a probability.

§ Often, we want to make a binary prediction to decide whether an action


should be taken.
– Did this patient receive poor care or good care?
– Is this email spam or not?

§ We can do this using a threshold value t.


– If P(Y = 1 | X) ≥ t, predict Y = 1 (e.g. poor quality)
– If P(Y = 1 | X) < t, predict Y = 0 (e.g. good quality)

§ What value should we pick for t?

33
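A minimal sketch of turning the model’s probabilities into 0/1 predictions with a chosen threshold t (continuing the hypothetical statsmodels example above):

t = 0.5
pred_prob = model.predict(train)              # P(PoorCare = 1 | X) for each training patient
pred_class = (pred_prob >= t).astype(int)     # 1 = predicted poor care, 0 = predicted good care
print(pred_class.value_counts())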
Threshold value
§ Often selected based on which errors are “worse”

§ If t is large, we rarely predict Y = 1 (e.g. poor care)


– More errors where we say good care, but it is actually poor care
– Detects patients who are receiving the worst care

§ If t is small, we rarely predict Y = 0 (e.g. good care)


– More errors where we say poor care, but it is actually good care
– Detects all patients who might be receiving poor care

§ With no preference between errors, select t = 0.5


– Predict the more likely outcome

34
Selecting a threshold value
§ Confusion/Classification matrix

Predicted = 0 Predicted = 1
Actual = 0 True Negatives(TN) False Positives(FP)

Actual = 1 False Negatives(FN) True Positives(TP)

35
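As a sketch, the same matrix can be produced with scikit-learn; for 0/1 labels its rows are actual 0 and 1 and its columns are predicted 0 and 1, matching the layout above (pred_class is from the threshold sketch earlier):

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(train["PoorCare"], pred_class)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)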
Selecting a threshold value
§ A different threshold value changes the types of errors
§ Quantify this trade-off using
– true positive rate (or sensitivity)
– true negative rate (or specificity)

§ True positive rate measures percentage of actual y=1 cases that we classify
correctly
§ True negative rate measures percentage of actual y=0 cases that we classify
correctly

36
Selecting a threshold value

§ Threshold ⇑
– Predicted positive cases ⇓
  q True positive rate ⇓   (TP rate = TP / # of actual positives)
– Predicted negative cases ⇑
  q True negative rate ⇑   (TN rate = TN / # of actual negatives)

§ Threshold ⇓
– Predicted positive cases ⇑
  q True positive rate ⇑
– Predicted negative cases ⇓
  q True negative rate ⇓

36
Prediction results
§ Threshold = 0.5
              Predicted = 0   Predicted = 1
Actual = 0    70              4
Actual = 1    15              10

§ True positive rate = 10/(10+15) = 0.4

§ True negative rate = 70/(70+4) = 0.95

37
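The two rates recomputed from the matrix above in plain Python:

tn, fp, fn, tp = 70, 4, 15, 10           # confusion matrix at threshold 0.5

true_positive_rate = tp / (tp + fn)      # 10 / 25 = 0.40  (sensitivity)
true_negative_rate = tn / (tn + fp)      # 70 / 74 ~ 0.95  (specificity)
print(true_positive_rate, true_negative_rate)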
Prediction results
§ Threshold = 0.7
Predicted = 0 Predicted = 1
Actual = 0 73 1
Actual = 1 17 8

§ True positive rate = 8/(8+17) = 0.32


§ True negative rate = 73/(73+1) = 0.99

§ Threshold = 0.2
Predicted = 0 Predicted = 1
Actual = 0 54 20
Actual = 1 9 16

§ True positive rate = 16/(16+9) = 0.64


§ True negative rate = 54/(54+20) = 0.73
38
Quick Question
§ Consider the following two confusion matrices. To go from Confusion
Matrix 1 to Confusion Matrix 2, did we increase or decrease the threshold
value?

Confusion Matrix 1:
              Predicted = 0   Predicted = 1
Actual = 0    15              10
Actual = 1    5               20

Confusion Matrix 2:
              Predicted = 0   Predicted = 1
Actual = 0    20              5
Actual = 1    10              15

39
Measure performance of logistic regression
Evaluate predictive ability of the logistic regression model
§ Classification/confusion matrix
o Accuracy of the model = # of correct predictions / # of data points
o # of false positive errors: predict 1 but actually 0
o # of false negative errors: predict 0 but actually 1

§ Receiver operating characteristic (ROC) curve


o False positive rate (% of false predictions in negative observations) = #
of false positive errors / # of actual negative observations
o False negative rate (% of false predictions in positive observations) = #
of false negative errors / # of actual positive observations
o True positive rate (% of correct predictions in positive observations) =
1 - false negative rate
o Area under the curve (AUC)

41
Receiver Operating Characteristic (ROC) curve

• True positive rate on y-axis


• Proportion of y=1 cases labelled correctly
• False positive rate on x-axis
• 1 - True negative rate
• Proportion of y=0 cases labelled as y=1
• ROC curve always starts at (0, 0), corresponding to a threshold value of 1
42
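A sketch of drawing the ROC curve with scikit-learn and matplotlib (using the hypothetical model and test objects from the earlier sketches):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

test_prob = model.predict(test)                          # predicted P(PoorCare = 1) on the test set
fpr, tpr, thresholds = roc_curve(test["PoorCare"], test_prob)
auc = roc_auc_score(test["PoorCare"], test_prob)

plt.plot(fpr, tpr, label=f"logistic model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()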
Receiver Operating Characteristic (ROC) curve

• Threshold value = 1
Predicted = 0 Predicted = 1
Actual = 0 TN FP = 0
Actual = 1 FN TP = 0
§ True positive rate = 0/(0+FN) = 0
§ True negative rate = TN/(TN+0) = 1
– False positive rate = 1 − True negative rate = 0
43
Receiver Operating Characteristic (ROC) curve

• ROC curve always ends at (1, 1), corresponding to a threshold value of 0


Predicted = 0 Predicted = 1
Actual = 0 TN=0 FP
Actual = 1 FN=0 TP
§ True positive rate = TP/(TP+0) = 1
§ True negative rate = 0/(0+FP) = 0
– False positive rate = 1 − True negative rate = 1
44
Selecting a threshold using ROC
§ Choose best threshold for best trade off
– cost of raising false alarms
q false positive errors
q patients receiving good care labelled as poor care
– cost of failing to detect positives
q false negative errors
q patients receiving poor care labelled as good care
§ ROC curve captures all thresholds simultaneously
– High threshold
q Low true positive rate = High false negative rate
q Low false positive rate
– Low threshold
q High true positive rate = Low false negative rate
q High false positive rate

45
Selecting a threshold using ROC

46
Area Under the ROC Curve (AUC)

(Figure: ROC curve for the model, with area under the curve = 0.775)

• Just take the area under the curve


• Ideally, we want the ROC curve to be close to the top left corner
– High true positive rate
– Low false positive rate

47
Area Under the ROC Curve (AUC)

48
Area Under the ROC Curve (AUC)
§ AUC measures model’s ability to distinguish between two outcomes (based
on observed characteristics)
§ 0 ≤ AUC ≤ 1 (AUC = 1 corresponds to a perfect classifier)
§ Diagonal line: prediction by random guessing (baseline)
– Predicting positive with probability q irrespective of any observed
characteristic gives the point (q, q)
– AUC = 0.5
§ The farther ROC is from the diagonal line, the larger the AUC is, the better
the model is.

49
Measures of accuracy
§ Confusion matrix:
Predicted = 0 Predicted = 1
Actual = 0 TN FP
Actual = 1 FN TP

§ N: number of observations

50
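The standard measures computed from this matrix, written out as a small Python helper (the definitions match those given earlier in the module; the example numbers are the threshold-0.5 matrix from the training set):

def accuracy_measures(tn, fp, fn, tp):
    n = tn + fp + fn + tp                          # N: number of observations
    return {
        "overall accuracy": (tn + tp) / n,
        "sensitivity (true positive rate)": tp / (tp + fn),
        "specificity (true negative rate)": tn / (tn + fp),
        "false positive rate": fp / (tn + fp),     # = 1 - specificity
        "false negative rate": fn / (tp + fn),     # = 1 - sensitivity
        "overall error rate": (fp + fn) / n,
    }

print(accuracy_measures(tn=70, fp=4, fn=15, tp=10))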
Making predictions
§ Just like in linear regression, we want to make predictions on a test set to
compute out-of-sample metrics

§ If we use a threshold value of 0.5, we get the following confusion matrix


Predicted Good Care Predicted Poor Care
Actually Good Care 23 1
Actually Poor Care 5 3
§ Out-of-sample accuracy of (23+3)/32 = 81.3%

51
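A sketch of producing these out-of-sample numbers (again with the hypothetical model and test objects from earlier; exact counts depend on the random split):

t = 0.5
test_prob = model.predict(test)                       # P(PoorCare = 1 | X) on the test set
test_pred = (test_prob >= t).astype(int)

accuracy = (test_pred == test["PoorCare"]).mean()     # out-of-sample accuracy
print(f"accuracy at t = {t}: {accuracy:.3f}")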
Making predictions
§ If we use a threshold value of 0.3, we get the following confusion matrix
Predicted Good Care Predicted Poor Care
Actually Good Care 19 5
Actually Poor Care 2 6
§ Out-of-sample accuracy of (19+6)/32 = 78.1%

§ If we use a threshold value of 0.7, we get the following confusion matrix


Predicted Good Care Predicted Poor Care
Actually Good Care 23 1
Actually Poor Care 7 1
§ Out-of-sample accuracy of (23+1)/32 = 75.0%

52
Summary
§ An expert-trained model can accurately identify diabetics receiving low-quality
care
– Out-of-sample accuracy of 81.3%, if the threshold equals 0.5
– Identifies most patients receiving poor care

§ In practice, the probabilities returned by the logistic regression model can be used
to prioritize patients for intervention

§ Electronic medical records could be used in the future

53
Takeaway messages
§ While humans can accurately analyze small amounts of information,
models allow larger scalability

§ Models do not replace expert judgment


– Experts can improve and refine the model

§ Models can integrate assessments of many experts into one final unbiased
and unemotional prediction

54
