Module 4 - Logistic Regression - Afterclass1b


IIMT 2641 Introduction to Business Analytics

Module 4: Logistic Regression

1
Ask the experts!

§ Critical decisions are often made by people with expert knowledge

§ Use logistic regression to model experts, e.g., a physician assessing the quality of
healthcare that patients receive

§ Healthcare Quality Assessment


– Good quality care educates patients and controls costs
– Need to assess quality for proper medical interventions
– No single set of guidelines for defining quality of healthcare (not well-defined)
– Health professionals (physicians) are experts in quality of care assessment (rely
on their experiences, knowledge, and intuitions…)

2
Experts are human
§ Experts are limited by memory and time

§ Healthcare Quality Assessment


– Expert physicians can evaluate quality by examining a patient’s records
– This process is time consuming and inefficient
– Physicians cannot assess quality for millions of patients

3
Replicating expert assessment
§ Can we develop analytical tools that replicate expert assessment on a large
scale?

§ Learn from expert human judgment


– Develop a model, interpret results, and adjust the model to improve
predictability on healthcare quality

§ Make predictions/evaluations on a large scale

§ Healthcare Quality Assessment


– Let’s identify poor healthcare quality using analytics

4
Claims data

§ Generated when a patient visits a doctor

§ Electronically available
§ Standardized, well-established codes
§ Not 100% accurate (human generated)
§ Under-reporting is common (since the job of recording is tedious)
§ When hospitals/doctors are evaluated based on patients’ experiences, they may misreport
§ Claims for hospital visits can be vague
5
Creating the dataset: Claims samples

Claims Sample

§ The data comes from a large health insurance claims database


§ Randomly selected 131 diabetes patients
§ Ages range from 35 to 55
§ Costs: $10,000 ~ $20,000
§ September 1, 2003 – August 31, 2005

6
Creating the dataset: Expert Review

Claims Sample → Expert Review

§ Expert physician reviewed claims and wrote descriptive notes:


“Ongoing use of narcotics” (a drug that relieves pain and induces drowsiness,
stupor, or insensibility)
"Only on Avandia (one type of drug treats diabetes), not a good first choice
drug"
"Had regular visits, mammogram (X-ray picture of breast), and
immunizations"
"Was given home testing supplies"

7
Creating the dataset: Expert Assessment

Claims Sample → Expert Review → Expert Assessment

§ Rated quality on a two-point scale (poor/good)


"I’d say care was poor – poorly treated diabetes"
"No eye care, but overall I’d say high quality"

8
Creating the dataset: Variable extraction

Claims Sample → Expert Review → Expert Assessment

§ Dependent Variable (Expert assessment)


– Quality of care
§ Independent Variables (Information in Claims Data)
– "Ongoing use of narcotics"
– "Only on Avandia, not a good first choice drug"
– "Had regular visits, mammogram, and immunizations"
– "Was given home testing supplies"

9
Creating the dataset: Variable extraction

Claims Sample → Expert Review → Expert Assessment

§ Dependent Variable
– Quality of care
§ Independent Variables
– Diabetes treatment
– Patient demographics
– Healthcare utilization
– Providers
– Claims
– Prescriptions
10
Data preview
Variable              Description
MemberID              A unique identifier for each observation.
ERVisits              The number of times the patient visited the emergency room.
OfficeVisits          The number of times the patient visited any doctor's office.
Narcotics             The number of prescriptions the patient had for narcotics.
ProviderCount         The number of providers that saw or treated the patient.
NumberClaims          The total number of medical claims the patient had.
StartedOnCombination  Whether or not the patient was started on a combination of drugs to treat their diabetes.
PoorCare              Whether or not the patient received poor care (1 if the patient had poor care, 0 otherwise).

11
Predicting quality of care

§ The dependent variable is modelled as a binary variable


– 1 if low-quality care
– 0 if high-quality care

§ This is a categorical variable


– A small number of possible outcomes

§ Linear regression would predict a continuous outcome

§ How can we extend the idea of linear regression to situations where the
outcome variable is categorical?
– Only want to predict 1 or 0
– Could round outcome to 1 or 0
– But we can do better with logistic regression

12
Quick Question

§ Which of the following dependent variables are categorical?

– Deciding whether to buy, sell, or hold a stock
– The weekly revenue of a company
– The winner of an election with two candidates
– The day of the week with the highest revenue
– The number of daily car thefts in New York City
– Whether or not revenue will exceed $50,000

§ Which of the above dependent variables are binary?
13
Quick Question

§ Which of the following dependent variables are categorical?


– Deciding whether to buy, sell, or hold a stock (categorical)
– The weekly revenue of a company
– The winner of an election with two candidates (categorical, binary)
– The day of the week with the highest revenue (categorical)
– The number of daily car thefts in New York City
– Whether or not revenue will exceed $50,000 (categorical, binary)

§ Which of the above dependent variables are binary?

14
Why does linear regression fail?

– A linear fit can produce predicted values below 0 or above 1, which cannot be
interpreted as probabilities of the binary outcome.
15
Why does linear regression fail?

16
Logistic function
The logistic (sigmoid) function maps any real number into the interval (0, 1):

G(z) = 1 / (1 + e^(−z))

where z is a linear function of the input, e.g. the number of office visits:

z = β0 + β1·X

Putting the two together:

G(β0 + β1·X) = 1 / (1 + e^(−(β0 + β1·X)))
17
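As an illustration only, here is the logistic function written out in Python; the coefficient values below are made up for the example, not estimated from the claims data:

import numpy as np

def logistic(z):
    # G(z) = 1 / (1 + e^(-z)): always strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (made-up) coefficients: z = beta0 + beta1 * OfficeVisits
beta0, beta1 = -2.0, 0.1
office_visits = np.array([0, 5, 10, 20, 40])
print(logistic(beta0 + beta1 * office_visits))   # probabilities rise toward 1 as visits grow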
Interpretation

The logistic regression model is:

P(Y = 1 | X) = 1 / (1 + e^(−(β0 + β1·X)))

It estimates the probability that Y = 1 given the input X.

Note that P(Y = 1 | X) + P(Y = 0 | X) = 1.
18
Logistic Regression Model

P(Y = 1 | X) = 1 / (1 + exp(−logit)) = 1 / (1 + exp(−(β0 + β1·X)))

§ Odds = P(Y = 1 | X) / P(Y = 0 | X)

– Odds > 1 if Y = 1 is more likely
– Odds < 1 if Y = 0 is more likely

§ Log odds, or “logit”, is a linear function:

logit = ln( P(Y = 1 | X) / P(Y = 0 | X) ) = β0 + β1·X

– The bigger the logit is, the bigger P(Y = 1 | X) is
– Interpretation: for each unit increase in X, the logit increases by β1 (holding
everything else constant in the multivariate regression).
19
Decision Boundary (Prediction)

Suppose we predict “y = 1” if

P(Y = 1 | X) = 1 / (1 + e^(−(β0 + β1·X))) ≥ 0.5

and predict “y = 0” if P(Y = 1 | X) < 0.5.

Equivalently (one-variable case), predict “y = 1” if

β0 + β1·X ≥ 0

Multiple-variable case: predict “y = 1” if

β0 + β1·X1 + β2·X2 ≥ 0
20
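A short Python sketch (using the illustrative coefficients from the quick question on the next slide) showing that thresholding the probability at 0.5 is the same as checking the sign of the logit:

import numpy as np

beta = np.array([-1.5, 3.0, -0.5])    # illustrative beta0, beta1, beta2

def predict(x1, x2, threshold=0.5):
    logit = beta[0] + beta[1] * x1 + beta[2] * x2
    p = 1.0 / (1.0 + np.exp(-logit))
    # p >= 0.5 exactly when logit >= 0
    return int(p >= threshold), p

print(predict(1, 5))    # logit = -1, p ~ 0.27 -> predict y = 0
print(predict(2, 5))    # logit =  2, p ~ 0.88 -> predict y = 1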
Quick Question

§ Suppose the coefficients of a logistic regression model with two
independent variables are as follows:

β0 = −1.5, β1 = 3, β2 = −0.5

And we have an observation with the following values for the independent
variables:

X1 = 1, X2 = 5

– What is the value of the Logit for this observation?
– What is the value of the Odds for this observation?
– What is the value of P(y = 1) for this observation?

Logit = −1.5 + 3×1 − 0.5×5 = −1
Odds = exp(Logit) = exp(−1) ≈ 0.368
P(y = 1) = 1 / (1 + exp(−Logit)) = 1 / (1 + e^1) ≈ 0.269

21
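The same arithmetic checked in plain Python:

import math

beta0, beta1, beta2 = -1.5, 3.0, -0.5
x1, x2 = 1, 5

logit = beta0 + beta1 * x1 + beta2 * x2       # -1.0
odds = math.exp(logit)                         # ~0.368
p = 1.0 / (1.0 + math.exp(-logit))             # ~0.269, equal to odds / (1 + odds)
print(logit, odds, p)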
Estimation of parameters (Not required)
How can we estimate β0, β1?

• Linear regression: choose β0, β1 to minimize the sum of squared errors
  ∑ (y_i − ŷ_i)², where ŷ_i = β0 + β1·x_i.

• Logistic regression: ŷ_i = G(β0 + β1·x_i), where G(z) = 1 / (1 + e^(−z)).
  The corresponding least-squares problem is non-convex.

• Solution: Maximum likelihood estimation (Not required)
22
Maximum likelihood estimation (Not required)

Under what values of β0, β1 is the likelihood of obtaining our sample the highest?

• β0, β1 are chosen to maximize the probability of obtaining the observed sample data:

  maximize ∏ P(y_i | x_i ; β0, β1)

• MLE is the primary estimation method for nonlinear models.
23
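To make the idea concrete (beyond what the slides require), here is a minimal sketch of maximum likelihood estimation for a one-variable logistic regression on simulated, made-up data, using scipy’s general-purpose optimizer:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=200)                  # e.g. number of office visits
true_b0, true_b1 = -2.6, 0.1                      # made-up "true" coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-(true_b0 + true_b1 * x))))

def neg_log_likelihood(beta):
    b0, b1 = beta
    z = b0 + b1 * x
    # log-likelihood = sum of [y*z - log(1 + e^z)]; we minimize its negative
    return np.sum(np.log1p(np.exp(z)) - y * z)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)    # estimates should land close to (-2.6, 0.1)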
Data preview
Variable              Description
MemberID              A unique identifier for each observation.
ERVisits              The number of times the patient visited the emergency room.
OfficeVisits          The number of times the patient visited any doctor's office.
Narcotics             The number of prescriptions the patient had for narcotics.
ProviderCount         The number of providers that saw or treated the patient.
NumberClaims          The total number of medical claims the patient had.
StartedOnCombination  Whether or not the patient was started on a combination of drugs to treat their diabetes.
PoorCare              Whether or not the patient received poor care (1 if the patient had poor care, 0 otherwise).

24
Data preview
§ Frequency table: count how many cases of poor care and good care

§ Baseline model: predict the most frequent outcome for all observations
– Predict all patients are receiving good care
– Accuracy: 98/131 = 74.8%

25
Create training and testing sets
§ Training dataset: used to build model
§ Testing dataset: used to test the model’s out-of-sample accuracy
§ If there is no chronological order to the observations, we randomly assign
observations to the training set or testing set.
§ Training data set (75% of the data)

26
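The slides do not show the software used in class; as one way to do this split in Python (the file name quality.csv is hypothetical, and the column names follow the data preview table):

import pandas as pd
from sklearn.model_selection import train_test_split

quality = pd.read_csv("quality.csv")       # hypothetical file with the 131 patients
train, test = train_test_split(
    quality,
    train_size=0.75,                        # 75% of the data for training
    stratify=quality["PoorCare"],           # keep the poor/good care mix similar in both sets
    random_state=88,                        # any fixed seed, for reproducibility
)
print(len(train), len(test))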
Build a logistic regression model
§ Used data for 99 patients to build the model (75% of the data)

Logit = β0 + β1×OfficeVisits + β2×Narcotics
27
Build a logistic regression model
§ Used data for 99 patients to build the model (75% of the data)
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

§ Are higher values in these variables indicative of poor care (1) or good care (0)?
§ Now that we have a model, how do we evaluate the quality of the model?

28
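A sketch of fitting this model in Python with statsmodels (the course may have used different software; train is the training split from the earlier sketch):

import statsmodels.formula.api as smf

model = smf.logit("PoorCare ~ OfficeVisits + Narcotics", data=train).fit()
print(model.summary())    # coefficients, standard errors, z-statistics, p-values
print(model.params)       # the slide reports roughly -2.646, 0.082, 0.076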
Build a logistic regression model
Estimated coefficients:
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

Standard errors of the coefficients are used to test:
H0: Coefficient = 0 versus HA: Coefficient ≠ 0
z-stat = (Estimated Coefficient − 0) / std. error
Two-tail p-value = 2∗P(Z < −|z-stat|)
29
Build a logistic regression model

Estimated coefficients:
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

Coefficient of OfficeVisits = 0.082
exp(0.082) = 1.085

For every 1-unit increase in OfficeVisits, there is an 8.5% increase in the odds
that the quality of care is poor.

Odds = exp(Logit)
Odds(OfficeVisits + 1) / Odds(OfficeVisits) = exp(coefficient of OfficeVisits)

30
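A quick numerical check of this interpretation in Python (the 0.082 coefficient comes from the fitted model above):

import math

coef_office_visits = 0.082
odds_ratio = math.exp(coef_office_visits)    # ~1.085
print(f"{(odds_ratio - 1) * 100:.1f}% increase in odds per additional office visit")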
Build a logistic regression model

Estimated coefficients:
Logit = −2.646 + 0.082∗OfficeVisits + 0.076∗Narcotics

Standard errors of the coefficients are used to test:
H0: Coefficient = 0 versus HA: Coefficient ≠ 0
z-stat = (Estimated Coefficient − 0) / std. error
Two-tail p-value = 2∗P(Z < −|z-stat|)

In the linear regression model, the estimated coefficients follow t-distributions,
so we perform t-tests. Logistic regression is different: the estimates are
approximately normally distributed when the sample size is large, so we use z-tests.

31
Checking Multicollinearity
§ Multicollinearity could be a problem
– Do the coefficients make sense?
– Check correlations
– Check VIF

In this model, all VIFs are below 10, so multicollinearity is not a concern.

32
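A sketch of the VIF check in Python with statsmodels (assuming the train DataFrame from the earlier sketch; only the two predictors used in the model are checked):

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(train[["OfficeVisits", "Narcotics"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))   # values below 10 are fine here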
Classification of the prediction

§ The outcome of a logistic regression model is a probability.

§ Often, we want to make a binary prediction to decide whether an action


should be taken.
– Did this patient receive poor care or good care?
– Is this email spam or not?

§ We can do this using a threshold value t.


– If P(Y = 1 | X) ≥ t, predict Y = 1 (e.g. poor quality)
– If P(Y = 1 | X) < t, predict Y = 0 (e.g. good quality)

§ What value should we pick for t?

33
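A minimal sketch of turning the model’s probabilities into 0/1 predictions with a chosen threshold t (continuing the hypothetical statsmodels example above):

t = 0.5
pred_prob = model.predict(train)              # P(PoorCare = 1 | X) for each training patient
pred_class = (pred_prob >= t).astype(int)     # 1 = predicted poor care, 0 = predicted good care
print(pred_class.value_counts())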
Threshold value
§ Often selected based on which errors are “worse”

§ If t is large, we rarely predict Y = 1 (e.g. poor care)


– More errors where we say good care, but it is actually poor care
– Detects patients who are receiving the worst care

§ If t is small, we rarely predict Y = 0 (e.g. good care)


– More errors where we say poor care, but it is actually good care
– Detects all patients who might be receiving poor care

§ With no preference between errors, select t = 0.5


– Predict the more likely outcome

34
Selecting a threshold value
§ Confusion/Classification matrix

Predicted = 0 Predicted = 1
Actual = 0 True Negatives(TN) False Positives(FP)

Actual = 1 False Negatives(FN) True Positives(TP)

35
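As a sketch, the same matrix can be produced with scikit-learn; for 0/1 labels its rows are actual 0 and 1 and its columns are predicted 0 and 1, matching the layout above (pred_class is from the threshold sketch earlier):

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(train["PoorCare"], pred_class)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)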
Selecting a threshold value
§ A different threshold value changes the types of errors
§ Quantify this trade-off using
– true positive rate (or sensitivity)
– true negative rate (or specificity)

§ True positive rate measures percentage of actual y=1 cases that we classify
correctly
§ True negative rate measures percentage of actual y=0 cases that we classify
correctly

36
Selecting a threshold value

§ Threshold ⇑
– Predicted positive cases ⇓
  q True positive rate ⇓   (TP rate = TP / # of actual positives)
– Predicted negative cases ⇑
  q True negative rate ⇑   (TN rate = TN / # of actual negatives)

§ Threshold ⇓
– Predicted positive cases ⇑
  q True positive rate ⇑
– Predicted negative cases ⇓
  q True negative rate ⇓

36
Prediction results
§ Threshold = 0.5
              Predicted = 0   Predicted = 1
Actual = 0    70              4
Actual = 1    15              10

§ True positive rate = 10/(10+15) = 0.4

§ True negative rate = 70/(70+4) = 0.95

37
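The two rates recomputed from the matrix above in plain Python:

tn, fp, fn, tp = 70, 4, 15, 10           # confusion matrix at threshold 0.5

true_positive_rate = tp / (tp + fn)      # 10 / 25 = 0.40  (sensitivity)
true_negative_rate = tn / (tn + fp)      # 70 / 74 ~ 0.95  (specificity)
print(true_positive_rate, true_negative_rate)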
Prediction results
§ Threshold = 0.7
Predicted = 0 Predicted = 1
Actual = 0 73 1
Actual = 1 17 8

§ True positive rate = 8/(8+17) = 0.32


§ True negative rate = 73/(73+1) = 0.99

§ Threshold = 0.2
Predicted = 0 Predicted = 1
Actual = 0 54 20
Actual = 1 9 16

§ True positive rate = 16/(16+9) = 0.64


§ True negative rate = 54/(54+20) = 0.73
38
Quick Question
§ Consider the following two confusion matrices. To go from Confusion
Matrix 1 to Confusion Matrix 2, did we increase or decrease the threshold
value?

Confusion Matrix 1:
              Predicted = 0   Predicted = 1
Actual = 0    15              10
Actual = 1    5               20

Confusion Matrix 2:
              Predicted = 0   Predicted = 1
Actual = 0    20              5
Actual = 1    10              15

39
Measure performance of logistic regression
Evaluate predictive ability of the logistic regression model
§ Classification/confusion matrix
o Accuracy of the model = # of correct predictions / # of data points
o # of false positive errors: predict 1 but actually 0
o # of false negative errors: predict 0 but actually 1

§ Receiver operating characteristic (ROC) curve


o False positive rate (% of false predictions in negative observations) = #
of false positive errors / # of actual negative observations
o False negative rate (% of false predictions in positive observations) = #
of false negative errors / # of actual positive observations
o True positive rate (% of correct predictions in positive observations) =
1 - false negative rate
o Area under the curve (AUC)

41
Receiver Operating Characteristic (ROC) curve

• True positive rate on y-axis


• Proportion of y=1 cases labelled correctly
• False positive rate on x-axis
• 1 - True negative rate
• Proportion of y=0 cases labelled as y=1
• ROC curve always starts at (0, 0), corresponding to a threshold value of 1
42
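A sketch of drawing the ROC curve with scikit-learn and matplotlib (using the hypothetical model and test objects from the earlier sketches):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

test_prob = model.predict(test)                          # predicted P(PoorCare = 1) on the test set
fpr, tpr, thresholds = roc_curve(test["PoorCare"], test_prob)
auc = roc_auc_score(test["PoorCare"], test_prob)

plt.plot(fpr, tpr, label=f"logistic model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()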
Receiver Operating Characteristic (ROC) curve

• Threshold value = 1
Predicted = 0 Predicted = 1
Actual = 0 TN FP = 0
Actual = 1 FN TP = 0
§ True positive rate = 0/(0+FN) = 0
§ True negative rate = TN/(TN+0) = 1
– False positive rate = 1 − True negative rate = 0
43
Receiver Operating Characteristic (ROC) curve

• ROC curve always ends at (1, 1), corresponding to a threshold value of 0


Predicted = 0 Predicted = 1
Actual = 0 TN=0 FP
Actual = 1 FN=0 TP
§ True positive rate = TP/(TP+0) = 1
§ True negative rate = 0/(0+FP) = 0
– False positive rate = 1 − True negative rate = 1
44
Selecting a threshold using ROC
§ Choose best threshold for best trade off
– cost of raising false alarms
q false positive errors
q patients receiving good care labelled as poor care
– cost of failing to detect positives
q false negative errors
q patients receiving poor care labelled as good care
§ ROC curve captures all thresholds simultaneously
– High threshold
q Low true positive rate = High false negative rate
q Low false positive rate
– Low threshold
q High true positive rate = Low false negative rate
q High false positive rate

45
Selecting a threshold using ROC

46
Area Under the ROC Curve (AUC)

(Figure: ROC curve for the model, with area under the curve = 0.775)

• Just take the area under the curve


• Ideally, we want the ROC curve to be close to the top left corner
– High true positive rate
– Low false positive rate

47
Area Under the ROC Curve (AUC)

48
Area Under the ROC Curve (AUC)
§ AUC measures model’s ability to distinguish between two outcomes (based
on observed characteristics)
§ 0 ≤ AUC ≤ 1 (AUC = 1 corresponds to a perfect classifier)
§ Diagonal line: prediction by random guessing (baseline)
– Predicting positive with probability q irrespective of any observed
characteristic gives the point (q, q)
– AUC = 0.5
§ The farther ROC is from the diagonal line, the larger the AUC is, the better
the model is.

49
Measures of accuracy
§ Confusion matrix:
Predicted = 0 Predicted = 1
Actual = 0 TN FP
Actual = 1 FN TP

§ N: number of observations

50
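The standard measures computed from this matrix, written out as a small Python helper (the definitions match those given earlier in the module; the example numbers are the threshold-0.5 matrix from the training set):

def accuracy_measures(tn, fp, fn, tp):
    n = tn + fp + fn + tp                          # N: number of observations
    return {
        "overall accuracy": (tn + tp) / n,
        "sensitivity (true positive rate)": tp / (tp + fn),
        "specificity (true negative rate)": tn / (tn + fp),
        "false positive rate": fp / (tn + fp),     # = 1 - specificity
        "false negative rate": fn / (tp + fn),     # = 1 - sensitivity
        "overall error rate": (fp + fn) / n,
    }

print(accuracy_measures(tn=70, fp=4, fn=15, tp=10))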
Making predictions
§ Just like in linear regression, we want to make predictions on a test set to
compute out-of-sample metrics

§ If we use a threshold value of 0.5, we get the following confusion matrix


Predicted Good Care Predicted Poor Care
Actually Good Care 23 1
Actually Poor Care 5 3
§ Out-of-sample accuracy of (23+3)/32 = 81.3%

51
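A sketch of producing these out-of-sample numbers (again with the hypothetical model and test objects from earlier; exact counts depend on the random split):

t = 0.5
test_prob = model.predict(test)                       # P(PoorCare = 1 | X) on the test set
test_pred = (test_prob >= t).astype(int)

accuracy = (test_pred == test["PoorCare"]).mean()     # out-of-sample accuracy
print(f"accuracy at t = {t}: {accuracy:.3f}")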
Making predictions
§ If we use a threshold value of 0.3, we get the following confusion matrix
Predicted Good Care Predicted Poor Care
Actually Good Care 19 5
Actually Poor Care 2 6
§ Out-of-sample accuracy of (19+6)/32 = 78.1%

§ If we use a threshold value of 0.7, we get the following confusion matrix


Predicted Good Care Predicted Poor Care
Actually Good Care 23 1
Actually Poor Care 7 1
§ Out-of-sample accuracy of (23+1)/32 = 75.0%

52
Summary
§ An expert-trained model can accurately identify diabetics receiving low-quality
care
– Out-of-sample accuracy of 81.3%, if the threshold equals 0.5
– Identifies most patients receiving poor care

§ In practice, the probabilities returned by the logistic regression model can be used
to prioritize patients for intervention

§ Electronic medical records could be used in the future

53
Takeaway messages
§ While humans can accurately analyze small amounts of information,
models allow larger scalability

§ Models do not replace expert judgment


– Experts can improve and refine the model

§ Models can integrate assessments of many experts into one final unbiased
and unemotional prediction

54
