Lecture 7 Probit

Lecture 7 Limited Dependent Variable -2

Econometrics II
MA Eco. 2022, Fall 2023
Instructor: Sunaina Dhingra
Lectures: Wednesday (11.20-12.50 pm) & Thursday (9.40-11.10 am)
Lecture Meeting Mode: In person (Classroom: T4-F99)
Office Hours: Wednesday 1-2.30 pm & by appointment (FOB, Office No. 1B in south, 7th Floor)
Email-id: [email protected]
Lecture Material: Slides and textbooks
Credits: 4.5
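• Assume that in the two-variable model Yi = β1 + β2 Xi + ui the Yi are normally and independently distributed with mean = β1 + β2 Xi and variance = σ².
• The joint probability density function of Y1, Y2, ..., Yn, given the preceding mean and variance, can be written as

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2)$$

• But in view of the independence of the Y’s, this joint probability density function can be written as a product of n individual density functions as

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) = f(Y_1 \mid \beta_1 + \beta_2 X_i, \sigma^2)\, f(Y_2 \mid \beta_1 + \beta_2 X_i, \sigma^2) \cdots f(Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) \tag{1}$$

• where

$$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\} \tag{2}$$

• which is the density function of a normally distributed variable with the given mean and variance.
• Substituting Equation (2) for each Yi into Equation (1) gives

$$f(Y_1, Y_2, \ldots, Y_n \mid \beta_1 + \beta_2 X_i, \sigma^2) = \frac{1}{\sigma^n \left(\sqrt{2\pi}\right)^n} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\} \tag{3}$$
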
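• If Y1, Y2, ..., Yn are known or given, but β1, β2, and σ² are not known, the function in Equation (3) is called a likelihood function, denoted by LF(β1, β2, σ²), and written as

$$LF(\beta_1, \beta_2, \sigma^2) = \frac{1}{\sigma^n \left(\sqrt{2\pi}\right)^n} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \right\} \tag{4}$$
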
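• The method of maximum likelihood (ML) consists in estimating the unknown parameters (β1, β2, and σ²) in such a manner that the probability of observing the given Y’s is as high (or maximum) as possible.
• Therefore, we find the maximum of the function in Equation (4) using differential calculus.
• For differentiation it is easier to express Equation (4) in log form as follows. (Note: ln = natural log.)

$$\ln LF = -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln(2\pi) - \frac{1}{2} \sum_{i=1}^{n} \frac{(Y_i - \beta_1 - \beta_2 X_i)^2}{\sigma^2} \tag{5}$$

• Differentiating Equation (5) partially with respect to β1, β2, and σ², we obtain

$$\frac{\partial \ln LF}{\partial \beta_1} = -\frac{1}{\sigma^2} \sum (Y_i - \beta_1 - \beta_2 X_i)(-1) \tag{6}$$

$$\frac{\partial \ln LF}{\partial \beta_2} = -\frac{1}{\sigma^2} \sum (Y_i - \beta_1 - \beta_2 X_i)(-X_i) \tag{7}$$

$$\frac{\partial \ln LF}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (Y_i - \beta_1 - \beta_2 X_i)^2 \tag{8}$$

• Setting these derivatives equal to zero (the first-order conditions) and writing the ML estimators with tildes, we obtain

$$\frac{1}{\tilde\sigma^2} \sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) = 0 \tag{9}$$

$$\frac{1}{\tilde\sigma^2} \sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i) X_i = 0 \tag{10}$$

$$-\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4} \sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 = 0 \tag{11}$$
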
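• After simplifying, Eqs. (9) and (10) yield

$$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2 \sum X_i \tag{12}$$

$$\sum Y_i X_i = \tilde\beta_1 \sum X_i + \tilde\beta_2 \sum X_i^2 \tag{13}$$

• which are precisely the normal equations of least-squares theory, so the ML estimators of β1 and β2 are identical to the OLS estimators.
• Substituting these estimators into Eq. (11) and simplifying gives the ML estimator of σ²:

$$\tilde\sigma^2 = \frac{1}{n} \sum (Y_i - \tilde\beta_1 - \tilde\beta_2 X_i)^2 = \frac{1}{n} \sum \hat u_i^2 \tag{14}$$

• The ML estimator of σ² is biased. The magnitude of this bias can be easily determined as follows.

$$E(\tilde\sigma^2) = \frac{1}{n} E\left( \sum \hat u_i^2 \right) = \frac{n-2}{n}\,\sigma^2 = \sigma^2 - \frac{2}{n}\,\sigma^2$$

• The estimator is therefore biased downward in small samples, and the bias vanishes as n increases.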
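
The relationship between the ML and OLS estimators can be checked numerically. The sketch below is only an illustration: the six data points are hypothetical (they do not come from the lecture), and the helper name `ols_two_variable` is an assumption. It solves the two-variable normal equations, confirms that the first-order conditions for β1 and β2 hold at the OLS solution, and compares the ML variance estimator (residual sum of squares divided by n) with the unbiased estimator (divided by n − 2).

```python
def ols_two_variable(x, y):
    """Solve the normal equations of the two-variable model Y = b1 + b2*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Hypothetical data, used only to illustrate the algebra.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

b1, b2 = ols_two_variable(x, y)
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
rss = sum(u * u for u in residuals)

# First-order conditions for beta1 and beta2 (up to the factor 1/sigma^2):
# both sums should be numerically zero at the OLS/ML solution.
foc_b1 = sum(residuals)
foc_b2 = sum(u * xi for u, xi in zip(residuals, x))

sigma2_ml = rss / n              # ML estimator, biased downward
sigma2_unbiased = rss / (n - 2)  # usual unbiased estimator

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"FOC sums: {foc_b1:.2e}, {foc_b2:.2e}")
print(f"sigma2 (ML) = {sigma2_ml:.5f}, sigma2 (unbiased) = {sigma2_unbiased:.5f}")
```

In any sample the ML variance estimate equals (n − 2)/n times the unbiased one, which mirrors the bias factor in the expectation.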

Limited Dependent Variable Models
• Logit and Probit models for binary response

• Disadvantages of the LPM for binary dependent variables

• Predictions sometimes lie outside the unit interval
• Partial effects of explanatory variables are constant

• Nonlinear models for binary response

• Response probability is a nonlinear function of explanatory variables
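
In symbols, a minimal sketch of this idea in common textbook notation (the function name G and the index k over regressors are notational assumptions, since the slide's own formula is not reproduced in the text):

```latex
P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = G(x\beta),
\qquad 0 < G(z) < 1 \ \text{for all real } z
```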

• Choices for the link function

• Latent variable formulation of the Logit and Probit models
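
As a sketch in standard notation (an assumption, since the slide formulas are not reproduced in the text), write the response probability as P(y = 1 | x) = G(xβ). The two usual choices of G, and the latent variable model that generates them, are commonly written as:

```latex
% Logit: G is the logistic CDF
G(z) = \Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}

% Probit: G is the standard normal CDF, with density \phi
G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(v)\, dv

% Latent variable formulation: y^* is unobserved, e has CDF G and is
% symmetric about zero (logistic for Logit, standard normal for Probit)
y^* = x\beta + e, \qquad y = 1[\,y^* > 0\,]

P(y = 1 \mid x) = P(e > -x\beta) = 1 - G(-x\beta) = G(x\beta)
```

The last equality uses the symmetry of the distribution of e about zero.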

• Interpretation of coefficients in Logit and Probit models

• Partial effects are nonlinear and depend on the level of x.
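
A sketch of why this holds, assuming the response probability is written as P(y = 1 | x) = G(xβ) with G differentiable and x_j continuous (the notation is an assumption, not taken from the slides):

```latex
\frac{\partial P(y = 1 \mid x)}{\partial x_j} = g(x\beta)\,\beta_j,
\qquad g(z) = \frac{dG(z)}{dz}

% Probit: g(z) = \phi(z); Logit: g(z) = \Lambda(z)\,[1 - \Lambda(z)]
```

Because g(xβ) > 0, the partial effect has the same sign as β_j, but its size changes with the point x at which it is evaluated.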

• Maximum likelihood estimation of Logit and Probit models

• Properties of maximum likelihood estimators

• Maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient if the
distributional assumptions hold.
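
For reference, a sketch of the objective being maximized, assuming a random sample of n observations and a response probability P(y = 1 | x) = G(xβ) (G = Φ for Probit, the logistic CDF for Logit; the notation is an assumption):

```latex
% Density of y_i given x_i for a binary outcome
f(y_i \mid x_i; \beta) = \left[ G(x_i\beta) \right]^{y_i} \left[ 1 - G(x_i\beta) \right]^{1 - y_i},
\qquad y_i \in \{0, 1\}

% Log-likelihood and the ML estimator
\ln L(\beta) = \sum_{i=1}^{n} \Big\{ y_i \ln G(x_i\beta) + (1 - y_i) \ln\left[ 1 - G(x_i\beta) \right] \Big\},
\qquad \hat\beta = \arg\max_{\beta} \ln L(\beta)
```

Unlike the linear model, the first-order conditions have no closed-form solution, so the maximum is found by numerical iteration.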

Probit and Logit Regression

• The problem with the linear probability model is that it models the probability of Y=1 as being linear:
Pr(Y = 1|X) = β0 + β1X
• Instead, we want:
• Pr(Y = 1|X) to be increasing in X for β1>0, and
• 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
• This requires using a nonlinear functional form for the probability. How about an “S-curve” (like a CDF from earlier classes)?
• The probit model satisfies these conditions:
I. Pr(Y = 1|X) is increasing in X for β1>0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Probit Regression
• Probit regression models the probability that Y=1 using
the cumulative standard normal distribution function,
Φ(z), evaluated at z = β0 + β1X. The probit regression
model is,
• Pr(Y = 1|X) = Φ(β0 + β1X)
• where Φ is the cumulative standard normal distribution function and z = β0 + β1X is the “z-value” or “z-index” of the probit model.
• Example: Suppose β0 = -2, β1= 3, X = .4, so
• Pr(Y = 1|X=.4) = Φ(-2 + 3×.4) = Φ(-0.8)
• Pr(Y = 1|X=.4) = area under the standard normal density
to left of z = -.8, which is…
Pr(z ≤ -0.8) = .2119
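
The table lookup in this example can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `std_normal_cdf` is an assumption, and Φ is computed from the error function via Φ(z) = ½[1 + erf(z/√2)].

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values from the example: beta0 = -2, beta1 = 3, X = 0.4
beta0, beta1, x = -2.0, 3.0, 0.4

z = beta0 + beta1 * x        # the z-value (z-index) of the probit model
prob = std_normal_cdf(z)     # Pr(Y = 1 | X = 0.4)

print(f"z = {z:.2f}, Pr(Y = 1 | X = {x}) = {prob:.4f}")
```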
Probit regression, ctd.
• Why use the cumulative normal probability distribution?
• The “S-shape” gives us what we want:
• Pr(Y = 1|X) is increasing in X for β1>0
• 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
• Easy to use: the probabilities are tabulated in the cumulative normal tables (and are easily computed using regression software)
• Relatively straightforward interpretation:
• β0 + β1X = z-value
• β̂0 + β̂1X is the predicted z-value, given X
• β1 is the change in the z-value for a unit change in X
The probit model uses the cumulative normal distribution function to model the probability of denial given the payment-to-income ratio or, more generally, to model Pr(Y = 1|X). Unlike the linear probability model, the probit conditional probabilities are always between 0 and 1.

STATA Example: HMDA data
. probit deny p_irat, r;

Iteration 0: log likelihood = -872.0853    (We’ll discuss this later)
Iteration 1: log likelihood = -835.6633
Iteration 2: log likelihood = -831.80534
Iteration 3: log likelihood = -831.79234

Probit estimates                       Number of obs = 2380
                                       Wald chi2(1)  = 40.68
                                       Prob > chi2   = 0.0000
Log likelihood = -831.79234            Pseudo R2     = 0.0462

        |             Robust
   deny |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
--------+---------------------------------------------------------------
 p_irat |  2.967908   .4653114     6.38    0.000    2.055914    3.879901
  _cons | -2.194159   .1649721   -13.30    0.000   -2.517499    -1.87082

Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97×P/I ratio)
                           (.16)   (.47)
STATA Example: HMDA data, ctd.
Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97×P/I ratio)
                           (.16)   (.47)
• Positive coefficient: Does this make sense?
• Standard errors have the usual interpretation
• Predicted probabilities:
Pr(deny = 1|P/I ratio = .3) = Φ(-2.19 + 2.97×.3)
= Φ(-1.30) = .097
• Effect of change in P/I ratio from .3 to .4:
Pr(deny = 1|P/I ratio = .4) = Φ(-2.19 + 2.97×.4)
= Φ(-1.00) = .159
• Predicted probability of denial rises from .097 to .159
• An increase in the probability of denial of 6.2 percentage points, from 9.7% to 15.9%
• Because the probit regression function is nonlinear, the effect of a change in X depends on the starting value of X.
• For example, if P/I ratio = 0.5, the estimated denial probability based on the estimated equation is Φ(-2.19 + 2.97×0.5) = Φ(-0.71) = 0.239.
• Thus the change in the predicted probability when P/I ratio increases from 0.4 to 0.5 is 0.239 - 0.159, or 8.0 percentage points,
• which is larger than the increase of 6.2 percentage points when P/I ratio increases from 0.3 to 0.4.
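
The dependence of the effect on the starting value can be checked directly. The sketch below is an illustration, not the lecture's own code: it uses the rounded coefficients -2.19 and 2.97 from the estimated equation, and the names `std_normal_cdf` and `denial_prob` are assumptions. Because it evaluates Φ at the unrounded z-value rather than at a z rounded to two decimals for a table lookup, its results can differ from the slide figures in the third decimal place.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Rounded probit coefficients from the HMDA regression of deny on P/I ratio.
B0, B1 = -2.19, 2.97

def denial_prob(pi_ratio):
    """Predicted probability of denial at a given P/I ratio."""
    return std_normal_cdf(B0 + B1 * pi_ratio)

probs = {r: denial_prob(r) for r in (0.3, 0.4, 0.5)}

# Same 0.1 increase in the P/I ratio, two different starting points.
change_03_04 = probs[0.4] - probs[0.3]
change_04_05 = probs[0.5] - probs[0.4]

for ratio, p in probs.items():
    print(f"P/I ratio = {ratio}: Pr(deny = 1) = {p:.3f}")
print(f"Change 0.3 -> 0.4: {100 * change_03_04:.1f} percentage points")
print(f"Change 0.4 -> 0.5: {100 * change_04_05:.1f} percentage points")
```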

Probit regression with multiple regressors
Pr(Y = 1|X1, X2) = Φ (β0 + β1X1 + β2X2)
• The model is best interpreted by computing predicted probabilities and the
effect of a change in a regressor.
• Φ is the cumulative normal distribution function.
• The predicted probability that Y = 1, given values of X1, X2 is calculated by
computing the z-value, z = β0 + β1X1 + β2X2 and then looking up this z-value
in the normal distribution table (Appendix Table 1).
• z = β0 + β1X1 + β2X2 is the “z-value” or “z-index” of the probit model.
• β1 is the effect on the z-value of a unit change in X1, holding X2 constant.
• The effect on the predicted probability of a change in a regressor is
computed by
• (1) computing the predicted probability for the initial value of the regressors,
• (2) computing the predicted probability for the new or changed value of the
regressors, and
• (3) taking their difference.
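
The three steps above can be sketched as a small function. This is an illustration under stated assumptions: the function names are hypothetical, and the coefficients are the rounded HMDA estimates reported in this lecture for the regression of deny on P/I ratio and black (-2.26, 2.74, .71). The example changes the P/I ratio from .3 to .4 while holding black = 0.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predicted_prob(intercept, coefs, x):
    """Pr(Y = 1 | X1, ..., Xk) = Phi(b0 + b1*X1 + ... + bk*Xk)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return std_normal_cdf(z)

def effect_of_change(intercept, coefs, x_initial, x_new):
    """Change in predicted probability when regressors move from x_initial to x_new."""
    p_initial = predicted_prob(intercept, coefs, x_initial)  # step (1)
    p_new = predicted_prob(intercept, coefs, x_new)          # step (2)
    return p_new - p_initial                                 # step (3)

# Rounded coefficients: intercept, then [P/I ratio, black].
intercept = -2.26
coefs = [2.74, 0.71]

effect = effect_of_change(intercept, coefs, [0.3, 0.0], [0.4, 0.0])
print(f"Effect of raising P/I ratio from .3 to .4 (black = 0): {effect:.3f}")
```

Passing the regressors as a list keeps the function usable for any number of regressors, at the cost of relying on the order of `coefs` matching the order of `x`.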
STATA Example: HMDA data
. probit deny p_irat black, r;

Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -800.88504
Iteration 2: log likelihood = -797.1478
Iteration 3: log likelihood = -797.13604

Probit estimates                       Number of obs = 2380
                                       Wald chi2(2)  = 118.18
                                       Prob > chi2   = 0.0000
Log likelihood = -797.13604            Pseudo R2     = 0.0859

        |             Robust
   deny |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
--------+---------------------------------------------------------------
 p_irat |  2.741637   .4441633     6.17    0.000    1.871092    3.612181
  black |  .7081579   .0831877     8.51    0.000     .545113    .8712028
  _cons | -2.258738   .1588168   -14.22    0.000   -2.570013   -1.947463

We’ll go through the estimation details later…
STATA Example, ctd.: Predicted probit probabilities
. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: " normprob(z1);
Pred prob, p_irat=.3, white: .07546603
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
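
For readers without Stata, the same calculation can be sketched in Python. This is an illustration only: the coefficient values are typed in from the output table (Stata itself uses the stored full-precision estimates), and the variable names are assumptions chosen to mirror the Stata scalars.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Coefficients as displayed in the Stata output.
b_cons = -2.258738
b_p_irat = 2.741637
b_black = 0.7081579

# Same z-value as the Stata scalar z1: P/I ratio = .3, black = 0.
z1 = b_cons + b_p_irat * 0.3 + b_black * 0
pred_prob_white = std_normal_cdf(z1)

print(f"Pred prob, p_irat=.3, white: {pred_prob_white:.8f}")
```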
STATA Example, ctd.
Pr (deny = 1|P/I, black)
= Φ(-2.26 + 2.74×P/I ratio + .71×black)
(.16) (.44) (.08)
• Is the coefficient on black statistically significant?
• Estimated effect of race for P/I ratio = .3:
Pr (deny = 1|.3,1)= Φ(-2.26+2.74×.3+.71×1) = .233

Pr (deny = 1|.3,0)= Φ(-2.26+2.74×.3+.71×0) = .075

• Difference in rejection probabilities = .158 (15.8 pp)
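
The estimated effect of race can be reproduced as follows. The sketch uses the rounded coefficients from the equation above; the function names are assumptions, and small third-decimal differences from the slide figures are possible because of rounding.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def denial_prob(pi_ratio, black):
    """Predicted denial probability from the rounded two-regressor probit."""
    return std_normal_cdf(-2.26 + 2.74 * pi_ratio + 0.71 * black)

p_black = denial_prob(0.3, 1)
p_white = denial_prob(0.3, 0)
gap = p_black - p_white

print(f"Pr(deny = 1 | .3, black = 1) = {p_black:.3f}")
print(f"Pr(deny = 1 | .3, black = 0) = {p_white:.3f}")
print(f"Difference = {gap:.3f} ({100 * gap:.1f} pp)")
```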

• Still plenty of room for omitted variable bias!

