Undergrad Labor Lectures PDF


Wiswall, Labor Economics (Undergraduate), Lecture Notes 1

1 A Very Brief Math Review

All the math you need for this course is summarized here.

1.1 Sigma Notation

\[ \sum_{i=1}^{N} Y_i = Y_1 + Y_2 + \cdots + Y_{N-1} + Y_N \]

is called sigma notation. $\Sigma$ is the Greek capital letter sigma ($\sigma$ is
the lower case sigma). This symbol indicates a summation of a series of
variables. In this case, the notation says to sum the $Y_i$ variables,
starting with $Y_1$ (i = 1) and ending at $Y_N$ (i = N).
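
As a small illustration, the summation can be written as a loop. This is only a sketch: the function name `sigma` and the values in `ys` are made up for the example.

```python
def sigma(values):
    """Sum Y_1 through Y_N, mirroring the sigma notation term by term."""
    total = 0
    for y in values:  # i runs from 1 to N
        total += y
    return total


ys = [2, 4, 6, 8]  # hypothetical Y_1, ..., Y_4
total = sigma(ys)
print(total)
```

Python's built-in `sum(ys)` does the same thing; the loop is spelled out only to match the notation.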

1.2 Exponents

For any X, a, b, c,
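i)

\[ X^{-a} = \frac{1}{X^{a}} \]

and

\[ \frac{1}{X^{-b}} = X^{b} \]

ii)

\[ X^{a} X^{b} = X^{a+b} \]

iii)

\[ (X^{a})^{b} = X^{ab} \]

iv)

\[ (X^{a} Y^{b})^{c} = X^{ac} Y^{bc} \]

v)

\[ X^{0} = 1 \]

\[ X^{1} = X \]

\[ X^{-1} = \frac{1}{X} \]

\[ X^{1/2} = \sqrt{X} \]

\[ X^{-1/2} = \frac{1}{\sqrt{X}} \]

\[ X^{-1/3} = \frac{1}{\sqrt[3]{X}} \quad \text{(cube root)} \]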

1.3 Logarithms

The natural logarithm is given by ln or sometimes log.

Rules for Logarithms

For any X > 0 and Y > 0:

i) $\ln(XY) = \ln X + \ln Y$.

ii) $\ln(X/Y) = \ln X - \ln Y$.

iii) $\ln(X^{Y}) = Y \ln X$.

iv) $\ln 1 = 0$.

v) $\ln 0$ is undefined.

1.4 Calculus

The partial derivative of a function $f(X, Y)$ with respect to X is

\[ \frac{\partial f(X, Y)}{\partial X}. \]

If there is only one argument of a function, the derivative is sometimes
given by this notation:

\[ \frac{\partial f(X)}{\partial X} = f'(X). \]

Rules for partial derivatives:

i) if $f(X, Y) = d + aX$, where a and d are constants, then

\[ \frac{\partial f(X, Y)}{\partial X} = a. \]

Example:

\[ \frac{\partial (3X)}{\partial X} = 3 \]

ii) if $f(X, Y) = d + aX^{b}$, where a, b, d are constants, then

\[ \frac{\partial f(X, Y)}{\partial X} = abX^{b-1}. \]

Example:

\[ \frac{\partial (2 + 3X^{5})}{\partial X} = 3 \cdot 5 \cdot X^{4} = 15X^{4} \]

iii) if $f(X, Y) = d + aX^{b}Y^{c}$, where a, b, c, and d are constants, then

\[ \frac{\partial f(X, Y)}{\partial X} = abX^{b-1}Y^{c}. \]

Example:

\[ \frac{\partial (2 + 3X^{5}Y^{7})}{\partial X} = 3 \cdot 5 \cdot X^{4}Y^{7} = 15X^{4}Y^{7} \]

iv) if $f(X, Y) = d + aX^{b}Y^{c}$, where a, b, c, and d are constants, then

\[ \frac{\partial f(X, Y)}{\partial Y} = aX^{b}cY^{c-1}. \]

Example:

\[ \frac{\partial (2 + 3X^{5}Y^{7})}{\partial Y} = 3 \cdot 7 \cdot X^{5}Y^{6} = 21X^{5}Y^{6} \]

v) if $f(X, Y) = \ln X$,

\[ \frac{\partial f(X, Y)}{\partial X} = \frac{1}{X}. \]

vi) Chain Rule: if $f(X, Y) = g(h(X))$, then

\[ \frac{\partial f(X, Y)}{\partial X} = \frac{\partial g(h(X))}{\partial h(X)} \cdot \frac{\partial h(X)}{\partial X}. \]

An example: if $f(X, Y) = \ln(aX^{b})$, where a and b are constants, then

\[ \frac{\partial f(X, Y)}{\partial X} = \frac{1}{aX^{b}} \cdot abX^{b-1} = \frac{b}{X}. \]

Note: here $g(h) = \ln h$ and $h(X) = aX^{b}$, so that $g(h(X)) = \ln(aX^{b})$.
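
One way to check a derivative rule is to compare it with a numerical approximation. The sketch below does this for the chain rule example; the constants a = 3 and b = 5, the evaluation point, and the helper `numerical_derivative` are all chosen just for illustration.

```python
import math


def f(x, a=3.0, b=5.0):
    """f(X) = ln(a * X^b), the chain rule example (a and b are illustrative)."""
    return math.log(a * x**b)


def numerical_derivative(func, x, step=1e-6):
    """Central-difference approximation to the derivative of func at x."""
    return (func(x + step) - func(x - step)) / (2 * step)


x = 2.0
approx = numerical_derivative(f, x)
exact = 5.0 / x  # the rule says the derivative is b / X
print(approx, exact)
```

The two numbers should agree to several decimal places, which is what the rule predicts.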

What Do they Mean?

Derivatives indicate the relationship between a function and a variable.
Consider the case of a line

\[ Y = f(X) = a + bX. \]

The derivative of Y, which is a function of X, is simply the slope of this
line.

\[ \frac{\partial f(X)}{\partial X} = b \]

If the slope is positive (b > 0), the derivative is positive. An increase in X increases Y.
If the slope is negative (b < 0), the derivative is negative. An increase in X decreases Y.
If the slope is zero (b = 0), there is no relationship between X and Y. This
is a flat line.

An Example
Consider a line:

\[ Y = f(X) = 4 + \frac{1}{2} X. \]

If X increases by 6, how is Y affected?

The derivative of Y with respect to X is

\[ \frac{\partial f(X)}{\partial X} = \frac{1}{2} \]

If X increases by 6, Y changes by

\[ \frac{\partial f(X)}{\partial X} \cdot \Delta X = \frac{1}{2} \cdot 6 = 3. \]

1.5 Optimization

An unconstrained optimization problem:

\[ \max_{X,Y} f(X, Y), \]

where X and Y are the two choice variables, and f (X, Y ) is the objective
function.

Solving an Optimization Problem


To solve this problem, derive the first order conditions for maximization.

1) With respect to X

\[ \frac{\partial f(X, Y)}{\partial X} = 0 \]

2) With respect to Y

\[ \frac{\partial f(X, Y)}{\partial Y} = 0 \]

This provides a system of two equations and two unknown variables (X
and Y). This system can be solved for the optimal values of X and Y, called
$X^{*}$ and $Y^{*}$.

A constrained optimization problem:

\[ \max_{X,Y} f(X, Y) \quad \text{s.t.} \quad g(X, Y) = C, \]

where g(X, Y) = C is the constraint in the problem.

In this course, the constraints are simple enough that we can find a
way to re-write the problem as an unconstrained problem. This is generally
accomplished by substituting the constraint into the objective function.
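
To illustrate the substitution approach, here is a short worked example. The objective function XY and the constraint X + Y = 10 are hypothetical, chosen only to keep the algebra short.

```latex
\begin{align*}
&\max_{X,Y} \; XY \quad \text{s.t.} \quad X + Y = 10 \\
&\text{Substitute } Y = 10 - X: \quad \max_{X} \; X(10 - X) = 10X - X^{2} \\
&\text{First order condition: } \frac{\partial (10X - X^{2})}{\partial X} = 10 - 2X = 0 \\
&\Rightarrow \; X^{*} = 5, \qquad Y^{*} = 10 - X^{*} = 5
\end{align*}
```

After the substitution there is only one choice variable, so one first order condition is enough.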

2 Overview of the Labor Market

2.1 Labor Markets

A labor market is where some unit of labor is bought and sold. A unit of
labor may be an hour, year, or some other measure. Labor is a good with a
price called the wage.
There are many different kinds of labor (labor of college educated workers,
labor of high school educated workers, labor of experienced workers, labor
of inexperienced workers, labor of medical doctors, labor of waiters, etc.).
For each kind of labor there is a market. A market may have geographic
dimensions. The labor market for doctors in India may be a separate labor
market from the labor market for doctors in the United States.

2.2 Labor Market Participants

Workers

Workers supply labor. Workers are the employees of firms. Workers make
decisions about whether to work and how much to work. Workers receive
wages for their work, which they use to purchase consumer goods.

Firms

Firms demand labor. Firms are the employers of workers. Firms combine
labor with other inputs, such as capital (machines, factories, etc.), to produce
output goods. Firms can be one person (the worker is self-employed) or very
large with thousands of employees.

Government

Various levels of government (local, state, federal) enforce laws which
regulate wages (e.g. minimum wage laws), who can work (e.g. children cannot
work in the United States), and working conditions (e.g. construction
workers must wear hard hats). The government also taxes wages and subsidizes
individuals who do not or cannot work (e.g. Social Security benefits
for disabled workers).

Unions

Unions are groups of workers who collectively bargain labor contracts with
employers. Usually unions are formed among workers in the same industry or
occupation, for example the United Auto Workers (UAW) or the American
Federation of Teachers (AFT).

2.3 Measurement

Measuring employment, hours worked, and wages is far from straightforward.
There are many potential definitions. Some definitions are better suited for
some populations than others. In empirical applications, measurement is
dictated by data limitations, as surveys may only collect data according to
one set of measurements.

2.3.1 Where Does Data Come From?

Data used in labor economics typically comes from three sources: 1) surveys
of individuals who self-report information about how much they work and how
much they get paid, 2) surveys of firms in which the employers report similar
information for their employees, 3) government statistics (e.g. tax returns or
social security claims). Federal, state, and local governments spend millions
of dollars each year to fund labor force surveys. The United States decennial
Census is probably the most well known survey of individuals. Every 10 years,
the Census Bureau surveys all households in the United States, although only
a subset fill out the detailed “long form” of the Census.
The main source of labor force statistics in the United States is a survey
called the Current Population Survey (CPS), conducted by the Census Bureau for
the Bureau of Labor Statistics (BLS). Each month the survey contacts about 60,000
households (usually by phone) and asks them a series of questions about their labor
market activities in a recent week. From these questions, the BLS constructs
a number of statistics, the most widely cited of which is the unemployment
rate.

Measurement Errors

It is important to keep in mind that much of the data used in labor eco-
nomics is prone to several kinds of errors. Because the data is self-reported by
individuals voluntarily, some people will not respond. Having at least 20-30
percent of individuals refuse to respond to a survey is not uncommon. Another
source of error occurs because the survey asks about past employment,
which respondents may not remember correctly. Do you know exactly how
many hours you worked last week or exactly how much you were paid last
year? Finally, some respondents have an incentive to intentionally misreport
information, even though most surveys like the Census and CPS have strict
confidentiality rules. For example, a respondent may want to under-report
her wages because of the fear that this information would be reported to the
Internal Revenue Service (it’s not).

2.3.2 Employment

The BLS divides the United States population into three mutually exclusive
groups: employed people, unemployed people, and people defined as “out of
the labor force” (OLF). Only adults 16 or older are counted; children are
excluded from all three groups.

Employed: An individual is considered employed if they worked at a job for
at least 1 hour (self-employed are included) or worked at least 15 hours on a
nonpaid job (e.g. family farm). This latter part of the definition is difficult
to check, but generally this excludes people working at home for no pay (e.g.
mothers or fathers taking care of their children).

Unemployed: An individual is considered unemployed if they are temporarily
unemployed (e.g. construction workers not working because of bad weather)
or have been actively looking for work in the past 4 weeks. Again, this
definition is somewhat vague and arbitrary. How do we define “actively”
looking for work?

Out of the Labor Force: Anyone not considered either employed or unemployed
is counted as out of the labor force. This generally includes students without
jobs, homemakers, retirees, etc.

Definitions
Population = employed + unemployed + out of labor force
Labor Force = employed + unemployed
Labor Force Participation Rate = labor force / population
Unemployment Rate = unemployed / labor force.
The labor force participation rate in the United States is about 75 percent
for all men and 60 percent for all women. The labor force participation rate
varies substantially with age. For men age 25-44, the labor force participation
rate is about 90 percent.
The current unemployment rate is about 4-5 percent in the United States.
In many countries in Western Europe, the unemployment rate is around 10
percent.

Why are these statistics sometimes misleading?

Notice that the unemployment rate is affected by the size of the labor
force. However, individuals may enter or exit the labor force at any time.
During economic recessions, unemployed workers may stop “actively” looking
for work and be counted as out of the labor force. This discouraged
worker effect is used to explain why the measured unemployment rate
understates the real economic conditions. Similarly, as the economy recovers
from a recession, workers out of the labor force may start looking for work.
This factor would tend to overstate the measured unemployment rate
during an economic recovery.
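
The definitions above, and the discouraged worker effect, can be sketched in a few lines of code. The head counts are made up for illustration (they are not BLS data), and the function name `labor_force_stats` is hypothetical.

```python
def labor_force_stats(employed, unemployed, out_of_labor_force):
    """Apply the definitions above to head counts for the three groups."""
    population = employed + unemployed + out_of_labor_force
    labor_force = employed + unemployed
    return {
        "population": population,
        "labor_force": labor_force,
        "participation_rate": labor_force / population,
        "unemployment_rate": unemployed / labor_force,
    }


# Hypothetical economy (made-up counts).
before = labor_force_stats(employed=950, unemployed=50, out_of_labor_force=600)

# Discouraged worker effect: 10 unemployed people stop looking for work,
# so they are reclassified as out of the labor force.
after = labor_force_stats(employed=950, unemployed=40, out_of_labor_force=610)

print(f"Unemployment rate before: {before['unemployment_rate']:.2%}")
print(f"Unemployment rate after:  {after['unemployment_rate']:.2%}")
```

No one found a job between the two scenarios, yet the measured unemployment rate falls, because both the number of unemployed and the labor force shrink.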

2.3.3 Hours

Ideally labor economists would want to measure all labor by a continuous
variable, such as the number of hours worked. For workers who are paid
by the hour, this information is readily available. For workers paid a salary
or paid by the task (e.g. salesman paid sales commissions), the number of
hours worked is not so easily measured. I am paid a salary and I would have
difficulty reporting on a survey the exact number of hours I worked last week.
A related issue is that a continuous measure of hours may not be the
appropriate measure of labor supply. Few workers actually adjust their hours
continuously in response to changes in wage rates. If an employer offers a
lower wage, workers may simply quit the job and move from 40 hours to
0 hours. Many economic models would assume a worker smoothly adjusts
her hours by moving from 40 hours to 34 hours, for example. Later in the
course, we will discuss models that incorporate this idea of discontinuous
labor supply.
Firms, too, have a related problem. Firms generally hire workers, not
hours of labor. Because of fixed hiring and firing costs, firms may also have
difficulty smoothly adjusting their labor demand.

2.3.4 Wages

One of the most challenging measurement issues in labor economics is measuring
an individual’s pay. The ideal situation would be if everyone was paid
a constant wage rate for each hour. Unfortunately, this is only the case for
some hourly workers. Calculating a worker’s hourly wage is difficult if the
worker is paid a salary, a sales commission, or receives tips.

Total Compensation

Wages are only one element of the total compensation workers receive.
Workers often receive a large share of their compensation in the form of ben-
efits, e.g. health insurance and pensions. Accurate information on benefits
is often difficult to obtain from respondents to surveys. And many surveys
do not ask for this information (e.g. there is only limited information in the
CPS on benefits).
Workers may also receive utility from simply working at the job. An
employer’s investment in the working conditions (e.g. new office furniture)
can change the level of utility a worker receives from a job. For many jobs,
an investment in protecting worker safety (e.g. installing smoke detectors)
can also increase the utility from the job by reducing the risk of injury or
death.
To the extent that the benefits and other aspects of utility from working at
a job are not counted, wages may understate or overstate the utility workers
receive from a job. If the level of these benefits varies systematically across
occupations and industries (e.g. few benefits for restaurant workers, many
benefits for accountants), wages may fail to capture the true variation in the
returns that workers receive from various jobs.

Real vs. Nominal Wages

An issue that often arises in tracking trends in wages over time is measuring
the purchasing power of wages as the prices of goods and services people
purchase change. Real wages are wages adjusted for price changes. Since
prices have generally been increasing over time in recent history, real
wages are adjusted for inflation. Nominal wages are the actual wages paid in
any given period of time.
We generally use a price index based on a representative bundle of goods
to measure price changes. For the United States, this is most often one of
the price index series which is part of the Consumer Price Index (CPI). Price
indexes work by picking a base year, say 2005, and expressing all other prices
in terms of this base year.
An example: Normalize our base year price to $p_{2005} = 100$. If there
was 3 percent inflation from 2005 to 2006, our price index increased from
$p_{2005} = 100$ to $p_{2006} = 103$. We can construct a price deflator as

\[ \frac{p_{2005}}{p_{2006}} = \frac{100}{103} = 0.971 \]

We can deflate 2006 nominal wages into 2005 dollars by multiplying the
2006 nominal wages by this price deflator. Suppose a worker is paid a nominal wage
of $w_{2006} = 11$ in 2006 and a nominal wage of $w_{2005} = 10$ in 2005. In nominal
terms, the worker’s wage increased by 10 percent.
The real wage in 2006 (in 2005 dollars) is

\[ w^{R}_{2006} = w_{2006} \cdot \frac{p_{2005}}{p_{2006}} = 11 \cdot \frac{100}{103} = 10.68 \]

Given inflation, real wages only increased 6.8 percent from 2005 to 2006,
although nominal wages increased 10 percent. We would argue that the real
wage increase better represents how much this worker’s living standards
increased, as the real wage indicates how many more goods and services she
can purchase.
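
The same calculation can be written as a small function. This is a sketch using the numbers from the example above; the function name `deflate` and its argument names are made up for illustration.

```python
def deflate(nominal_wage, price_index, base_price_index=100.0):
    """Convert a nominal wage into base-year dollars using a price index."""
    return nominal_wage * base_price_index / price_index


# Numbers from the example above (base year 2005).
p2005, p2006 = 100.0, 103.0
w2005, w2006 = 10.0, 11.0

real_2005 = deflate(w2005, p2005)
real_2006 = deflate(w2006, p2006)

nominal_growth = (w2006 - w2005) / w2005
real_growth = (real_2006 - real_2005) / real_2005

print(f"Real 2006 wage in 2005 dollars: {real_2006:.2f}")
print(f"Nominal growth: {nominal_growth:.1%}, real growth: {real_growth:.1%}")
```

In the base year the real and nominal wages are equal by construction, so only the 2006 wage changes when it is deflated.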

2.4 Current Statistics for the United States

Take a look at the wealth of information on the BLS webpage.

http://www.bls.gov

2.5 Supply, Demand, and Equilibrium in the Labor Market

We can readily apply the models learned in introductory microeconomics
to the labor market. As we will discuss later, labor is a unique good and
the supply and demand for labor will receive specialized treatment in future
lectures. For now, we can think of an hour of labor as any other good, like
a bushel of wheat.
Label h as the aggregate hours of labor in the market. The price of labor
is the wage rate w.
Figure 1 displays a standard supply and demand graph with w replacing
price p on the vertical axis and h replacing quantity q on the horizontal axis. As
in the market for wheat, the demand curve for labor is downward sloping.
The higher the price of labor, the less firms demand of it. The demand curve
is the sum of the labor demands of many individual firms. The supply curve
is upward sloping. The higher the price of labor, the more workers will supply
of it. The supply curve is the sum of the labor supply of many individual
workers.
The intersection of the labor supply and labor demand curves provides
the equilibrium wage (w∗ ) and labor employed (h∗ ). All workers are paid w∗
and the total number of hours worked in the economy is h∗ .
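
To make the equilibrium concrete, the sketch below solves for the intersection of a demand curve and a supply curve. The linear functional forms and all of the numbers are assumptions made for this example; the notes do not specify them.

```python
def equilibrium(demand_intercept, demand_slope, supply_intercept, supply_slope):
    """Solve for (w*, h*) with linear curves (an assumption for illustration).

    Demand: h = demand_intercept - demand_slope * w   (downward sloping)
    Supply: h = supply_intercept + supply_slope * w   (upward sloping)
    """
    w_star = (demand_intercept - supply_intercept) / (demand_slope + supply_slope)
    h_star = demand_intercept - demand_slope * w_star
    return w_star, h_star


# Hypothetical market: demand h = 100 - 4w, supply h = 20 + 4w.
w_star, h_star = equilibrium(100.0, 4.0, 20.0, 4.0)
hours_supplied = 20.0 + 4.0 * w_star  # should equal hours demanded at w*
print(w_star, h_star, hours_supplied)
```

At the equilibrium wage the hours firms demand equal the hours workers supply.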

3 Labor Demand

3.1 Model of Firm Demand for Labor

Firms purchase capital and labor in the input markets. (Note: input markets
are sometimes called “factor markets” for “factors of production”). Firms
combine these inputs to make output goods, which they sell in the output
market.
For simplicity, we assume there is only one kind of labor measured in
labor hours h. Labor is paid only one wage w for each hour. Later, we will
relax these assumptions and examine a model with different kinds of labor
(e.g. labor of experienced and inexperienced workers) and different wage
rates paid to each kind of labor.

It is important to note that the firm is involved in three separate markets:

i) Labor Market: the firm is a consumer in this market.

ii) Capital Market: the firm is a consumer in this market.

iii) Output or Product Market: the firm is a supplier in this market.

We first examine a model in which each market is competitive: a competitive
labor market, a competitive capital market, and a competitive output
market. Firms cannot influence input or output prices, and take these prices
as given or constant (they are “price takers”). Later, we will examine non-
competitive markets (e.g. monopsony in the labor market).

Competitive Markets Assumptions:


1) Firms can hire unlimited labor hours, h, at a constant wage rate w.
2) Firms can rent unlimited amounts of capital, k, at a constant rental rate
r.
3) Firms can produce one good, q, and sell unlimited amounts of this good
at a constant price p.

Are Firm Profits Zero? General vs. Partial Equilibrium

Although it does not play a large role in the discussion that follows, it
should be noted that firms can earn positive profits in this model. This is
a partial equilibrium model and focuses on how changes in given input and
output prices affect a firm’s demand for labor and capital. In the partial
equilibrium model, input and output prices are exogenous or given. In a
general equilibrium model, the input and output prices may respond to the
labor and capital demand decisions of the firms in the economy. In a general
equilibrium model, input and output prices are endogenous, and reflect the
level of market demand for inputs and the level of market output. In a general
equilibrium model, it is possible that competition among firms would lower
profits to near zero for individual firms.

Production Function

Output is produced according to this production function. The production
function embodies the technology of production.

\[ q = f(h, k) \]

Output is increasing in labor hours and capital.

\[ \frac{\partial f(h, k)}{\partial h} > 0 \]

\[ \frac{\partial f(h, k)}{\partial k} > 0 \]

Labor Demand of a Firm and Market Demand

For the moment, we consider a representative firm. However, the market
demand for labor comes from a number of potentially heterogeneous firms. In
general, firms can differ in their production technologies (production func-
tions), the output goods they produce, and the prices they receive for their
output goods. Firms that sell hot dogs and firms that manufacture pencils
both demand labor, but differ in many dimensions. The labor demand curve
for a kind of labor is the aggregation of the labor demand of these many
different firms.

3.2 Profit Maximization

Profits, π, are defined as total revenue minus total costs:

\[ \pi = pq - wh - rk. \]

Firms choose h and k to maximize profits. Firms do not choose prices.
Firms also do not choose output. Output is a function of input choices as
given by the production function.
The firm’s maximization problem is

\[ \max_{h,k} \; pq - wh - rk \quad \text{s.t.} \quad q = f(h, k), \]

where s.t. is “subject to” and indicates that the production function is
the constraint in this problem.
The simplest way to solve this problem is to substitute the production
function for q. This yields an unconstrained maximization problem:

\[ \max_{h,k} \; p f(h, k) - wh - rk. \]

Now derive the first order conditions.

1) Set the partial derivative with respect to h equal to zero and re-arrange:

\[ p \frac{\partial f(h, k)}{\partial h} = w. \]

2) Set the partial derivative with respect to k equal to zero and re-arrange:

\[ p \frac{\partial f(h, k)}{\partial k} = r. \]

With w, r, p, and an assumed form for the production function, we
can solve these two equations for the two unknown variables $h^{*}(w, r, p)$ and
$k^{*}(w, r, p)$. $h^{*}(w, r, p)$ and $k^{*}(w, r, p)$ are the optimal amounts of labor hours
and capital units the firm would purchase. These optimal values are func-
tions of the input and output prices. The firm’s demand for labor and capital
changes as these prices vary.
It should also be noted that from the optimal labor and capital demand,
the profit maximizing output can be calculated by substituting these values
into the production function.

\[ q^{*} = f(h^{*}(w, r, p), k^{*}(w, r, p)) \]

The profit maximizing levels of total revenue ($TR = pq^{*}$), total costs
($TC = wh^{*}(w, r, p) + rk^{*}(w, r, p)$), and profits can all be calculated by
substituting the optimal levels of labor and capital.

3.3 Some Terminology

Marginal Product of Labor

\[ MP_h = \frac{\partial f(h, k)}{\partial h} \]

Marginal Product of Capital

\[ MP_k = \frac{\partial f(h, k)}{\partial k} \]

Marginal Rate of Technical Substitution

\[ MRTS = \frac{MP_h}{MP_k} \]

Marginal Revenue Product of Labor

\[ MRP_h = p \, MP_h \]

Marginal Revenue Product of Capital

\[ MRP_k = p \, MP_k \]

3.4 What Do the First Order Conditions Mean?

Given the definitions of marginal product of labor and capital, we can
re-write the first order conditions as

1)

\[ p \, MP_h = w, \]

or

\[ MRP_h = w. \]

2)

\[ p \, MP_k = r, \]

or

\[ MRP_k = r. \]

$MRP_h$ provides the value in additional revenue of one more labor hour.
Each additional labor hour produces $MP_h \cdot 1$ of output. This output can be
sold in the output market at price p. The wage rate provides the cost of one
more labor hour. The marginal cost of labor is w.
The intuition behind the first order conditions is that the firm should
continue to purchase labor and capital inputs up until the point that the
marginal benefit of these inputs ($MRP_h$ and $MRP_k$) equals the marginal
cost, w and r. The first order conditions are simply a re-statement of a
fundamental principle of economics: decision makers make choices by equating
the marginal benefit with the marginal cost.

At the point where $MRP_h > w$, the benefit of more labor hours exceeds
the cost. The firm should hire more labor and increase profits.

At the point where $MRP_h < w$, the cost of more labor hours exceeds the
benefit. The firm is losing money on the marginal labor hour at this point and should hire less labor.
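
The hiring rule can be sketched as a simple comparison of $MRP_h$ with w. The sketch assumes the Cobb-Douglas technology $q = h^{1/4} k^{1/4}$ used in the example later in these notes, holds capital fixed, and uses made-up prices; the function names are hypothetical.

```python
def marginal_revenue_product_of_labor(h, k, p):
    """MRP_h = p * MP_h for q = h^(1/4) * k^(1/4) (assumed technology)."""
    return p * 0.25 * h**-0.75 * k**0.25


def hiring_signal(h, k, p, w, tolerance=1e-9):
    """Compare the marginal benefit of one more labor hour with its cost."""
    mrp = marginal_revenue_product_of_labor(h, k, p)
    if abs(mrp - w) <= tolerance:
        return "at the optimum"
    return "hire more labor" if mrp > w else "hire less labor"


# Illustrative prices: p = 8, w = 1, with capital held fixed at k = 4.
for hours in (2.0, 4.0, 8.0):
    print(hours, hiring_signal(hours, k=4.0, p=8.0, w=1.0))
```

With these numbers the firm should expand below 4 hours and contract above 4 hours, where $MRP_h = w$.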

3.5 Labor Demand Elasticity (with respect to wages)

The labor demand elasticity with respect to wages is defined as

\[ \epsilon = \frac{\%\Delta h^{*}(w, r, p)}{\%\Delta w}, \]

or

\[ \epsilon = \frac{\partial h^{*}(w, r, p)}{\partial w} \frac{w}{h^{*}}. \]

(To see the connection between these two definitions of elasticity, note
that the percent change in X from $X_0$ to $X_1$ is
$\frac{X_1 - X_0}{X_0} \cdot 100\% = \frac{\Delta X}{X_0} \cdot 100\%$.)

Another way to write this is using natural logs

\[ \epsilon = \frac{\partial \ln h^{*}(w, r, p)}{\partial \ln w} \]

Given the assumptions of profit maximization, $\epsilon \le 0$. This is equivalent
to the assumption that the labor demand curve is downward sloping.

The labor demand elasticity has the following interpretation:

i) $\epsilon = 0$ (perfectly inelastic labor demand). If wages increase by X%, labor
demand does not change.

ii) $-1 < \epsilon < 0$ (inelastic labor demand). If wages increase by X%, labor
demand decreases by less than X%.

iii) $\epsilon = -1$ (unit elastic labor demand). If wages increase by X%, labor
demand decreases by exactly X%.

iv) $\epsilon < -1$ (elastic labor demand). If wages increase by X%, labor demand
decreases by more than X%.

v) $\epsilon = -\infty$ (perfectly elastic labor demand). If wages increase by X%, labor
demand goes to 0 after the wage increase.

It is important to note that $\epsilon$ is (in general) a function of input prices (w
and r), level of labor hours (h) and units of capital (k) already used, and the
output price (p). This means that the labor demand elasticity can vary as
these factors change.
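
As a rough numerical illustration, the log form of the elasticity can be approximated with a finite difference. The sketch borrows the Cobb-Douglas labor demand derived later in these notes (Section 3.9); the prices, the helper names, and the `classify` function are assumptions made for the example.

```python
import math


def labor_demand(w, r, p):
    """Labor demand from Section 3.9: h* = p^2 / (16 r^(1/2) w^(3/2))."""
    return p**2 / (16 * r**0.5 * w**1.5)


def wage_elasticity(demand, w, r, p, step=1e-6):
    """Approximate d ln h* / d ln w with a central difference in logs."""
    up = math.log(demand(w * math.exp(step), r, p))
    down = math.log(demand(w * math.exp(-step), r, p))
    return (up - down) / (2 * step)


def classify(elasticity, tol=1e-6):
    """Label an elasticity using the cases listed above."""
    if abs(elasticity) < tol:
        return "perfectly inelastic"
    if abs(elasticity + 1) < tol:
        return "unit elastic"
    return "inelastic" if elasticity > -1 else "elastic"


eps = wage_elasticity(labor_demand, w=1.0, r=1.0, p=8.0)
label = classify(eps)
print(eps, label)
```

For this particular demand function the elasticity is -3/2 at every wage, because w enters with a constant exponent; for other production functions it would generally vary with prices.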

3.6 Scale and Substitution Effects

We can decompose how changes in wages affect labor demand into two fac-
tors. Consider a reduction in the wage rate (and everything else remains the
same, including capital and output prices).

1) Scale Effect
Lower wage rates decrease the cost of production. This induces firms
to increase output and demand more labor. This effect is called a scale effect
and is analogous to an income effect in consumer theory (e.g. price of apples
declines, the consumer is in effect “wealthier”, so she purchases more of all
goods).

2) Substitution Effect
Lower wages relative to capital rental rates make labor relatively less
expensive compared to capital. This induces the firm to shift its input mix
toward labor, and away from capital, and therefore demand more labor. The
substitution effect is analogous to the substitution effect in consumer theory
as the relative prices of consumer goods change (e.g. price of apples declines
relative to oranges, so consumers purchase more apples).

3.7 Elasticity of Substitution

The elasticity of substitution indicates how easily firms can change their
input mix as relative input prices change. The elasticity of substitution
is determined by the production function. The elasticity of substitution is
defined holding output constant (no scale effects):

\[ \sigma = \frac{\%\Delta(k/h)}{\%\Delta(w/r)}, \]

or

\[ \sigma = \frac{\partial (k/h)}{\partial (w/r)} \frac{w/r}{k/h}. \]

Or in logs,

\[ \sigma = \frac{\partial \ln(k/h)}{\partial \ln(w/r)}. \]

Notice that for profit maximization, w/r = MRTS. To see this, divide the
first order condition for labor by the first order condition for capital:

\[ MRTS = \frac{MP_h}{MP_k} = \frac{w}{r}. \]

Substituting this into the definition of σ, the elasticity of substitution can
also be written as

\[ \sigma = \frac{\%\Delta(k/h)}{\%\Delta(MRTS)}, \]

or

\[ \sigma = \frac{\partial (k/h)}{\partial (MRTS)} \frac{MRTS}{k/h}. \]

Or in logs,

\[ \sigma = \frac{\partial \ln(k/h)}{\partial \ln(MRTS)}. \]

σ ≥ 0. Higher values of σ indicate that the firm can more easily substitute
inputs as the relative input prices change. For example, a firm with a
high elasticity of substitution would respond to an increase in w/r by rapidly
substituting capital for labor (e.g. replacing workers with machines).
It should be clear then that a firm with a production technology characterized
by a high elasticity of substitution should also have a high labor
demand elasticity with respect to wages (a high σ implies an ε that is large in
absolute value). A firm that
can easily substitute between labor and capital would respond to a change in
wage rates by rapidly shifting between labor and capital and rapidly changing
its labor demand.

3.8 Types of Production Functions

Firms have different types of production technologies indicated by different
production functions. There is likely a considerable amount of variation in
the technology of production among firms in different industries (e.g.
manufacturing vs. services). Even facing the same input prices, a firm
manufacturing cars uses a very different combination of labor and capital than a firm
that produces accounting services.

3.8.1 Isoquant Curves

Figures 2a, 2b, 2c graph different types of production functions. In these
graphs, capital (k) is measured along the vertical axis and labor (h) along the
horizontal axis. The curve in each of the figures is an isoquant curve. Along
this curve production of the output good is constant at $q_0$. The equation for
an isoquant curve is

\[ q_0 = f(h, k) \]

The shape of the isoquant curve depends on the functional form of the
production function. The slope of this curve is −MRTS. The two extreme
cases are perfect substitutes and perfect complements.

3.8.2 Perfect Substitutes

Figure 2a graphs a perfect substitutes isoquant:

\[ q_0 = f(h, k) = ah + bk, \]

where a ≥ 0 and b ≥ 0 are constants.

An example of a perfect substitutes isoquant:

\[ q_0 = f(h, k) = 2h + 3k. \]

A perfect substitutes isoquant is a straight line with equation

\[ k = \frac{q_0}{b} - \frac{a}{b} h. \]

The vertical intercept of this line is $q_0/b$. The horizontal intercept is $q_0/a$.
In the perfect substitutes case, labor and capital inputs can be substituted
by firms at a constant rate. Notice that the slope of this line, −a/b, is −MRTS.
The elasticity of substitution for a perfect substitutes production function
is σ = ∞.

3.8.3 Perfect Complements

Figure 2b graphs a perfect complements isoquant:

\[ q_0 = f(h, k) = \min(ch, dk), \]

where c ≥ 0 and d ≥ 0 are constants.

An example of perfect complements:

\[ q_0 = f(h, k) = \min\left(\tfrac{1}{2} h, 4k\right). \]

Production is given by fixed proportions of labor and capital. In the
perfect complements case, production is limited by the smallest input:

If dk < ch, then $q_0 = dk$. In this case, adding additional labor does not
increase production.

If dk > ch, then $q_0 = ch$. In this case, adding additional capital does not
increase production.

A perfect complements isoquant is an “L” shaped curve with a vertical
line at $h = q_0/c$ and a horizontal line at $k = q_0/d$. The vertex of the
isoquant lies on the ray k = (c/d)h. Because the isoquant has a kink there,
its slope (and so the MRTS) is not well defined at the vertex.
The elasticity of substitution for a perfect complements production function
is σ = 0.
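
The two extreme cases can be compared directly in code. The sketch below uses the example coefficients from the text (2h + 3k and min(h/2, 4k)); the input bundles are made up for illustration.

```python
def perfect_substitutes(h, k, a=2.0, b=3.0):
    """q = a*h + b*k: inputs trade off at the constant rate a/b."""
    return a * h + b * k


def perfect_complements(h, k, c=0.5, d=4.0):
    """q = min(c*h, d*k): output is limited by the scarcer input."""
    return min(c * h, d * k)


# Perfect substitutes: three different input mixes on the same isoquant.
same_output = [
    perfect_substitutes(6, 0),
    perfect_substitutes(3, 2),
    perfect_substitutes(0, 4),
]

# Perfect complements: start at a vertex, then add only one input.
balanced = perfect_complements(8, 1)
extra_labor = perfect_complements(16, 1)
extra_capital = perfect_complements(8, 2)

print(same_output, balanced, extra_labor, extra_capital)
```

With perfect substitutes, very different input mixes give the same output; with perfect complements, adding only one input beyond the fixed proportion leaves output unchanged.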

3.8.4 Cobb-Douglas

Another type of production function is called a Cobb-Douglas production
function:

\[ q_0 = f(h, k) = h^{\theta_1} k^{\theta_2}, \]

where $\theta_1 \ge 0$ and $\theta_2 \ge 0$ are constants.

Figure 2c graphs a Cobb-Douglas isoquant.

The elasticity of substitution for a Cobb-Douglas production function is
σ = 1.

An example of a Cobb-Douglas isoquant is

\[ q_0 = f(h, k) = h^{1/2} k^{1/4}. \]

This production function is a middle ground between perfect
substitutes and perfect complements. The shape of the production function
depends on the values of $\theta_1$ and $\theta_2$. The slope of the curve is −MRTS.
Derive MRTS as

\[ MRTS = \frac{MP_h}{MP_k} = \frac{\theta_1 h^{\theta_1 - 1} k^{\theta_2}}{\theta_2 h^{\theta_1} k^{\theta_2 - 1}} = \frac{\theta_1}{\theta_2} \frac{k}{h}. \]

The slope of the Cobb-Douglas isoquant is then

\[ -\frac{\theta_1}{\theta_2} \frac{k}{h}. \]

Unlike the perfect substitutes or perfect complements isoquants, the slope
of the Cobb-Douglas isoquant changes depending on the values of k and h.
The slope increases (in absolute value) as the ratio of capital to labor (k/h)
increases.

3.8.5 Calculating the Elasticity of Substitution

Can we show that the elasticity of substitution for the Cobb-Douglas
production function is σ = 1?

Use the log form of σ with MRTS in the definition:

\[ \sigma = \frac{\partial \ln(k/h)}{\partial \ln(MRTS)}. \]

What is ln MRTS?
From above, the MRTS for the Cobb-Douglas production function is

\[ MRTS = \frac{\theta_1}{\theta_2} \frac{k}{h}. \]

Taking logs,

\[ \ln MRTS = \ln\left(\frac{\theta_1}{\theta_2}\right) + \ln\left(\frac{k}{h}\right). \]

Re-arranging,

\[ \ln\left(\frac{k}{h}\right) = \ln MRTS - \ln\left(\frac{\theta_1}{\theta_2}\right). \]

Therefore,

\[ \sigma = \frac{\partial \ln(k/h)}{\partial \ln(MRTS)} = 1. \]

Note that σ is always 1 for the Cobb-Douglas production function. σ in
this case does not depend on the parameters θ1 or θ2.
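
The result can also be checked numerically. The sketch below is only an illustration: the function names and parameter values are made up, and the derivative is approximated by a finite difference in logs. Because the Cobb-Douglas MRTS depends only on the ratio k/h, it is enough to vary k while holding h fixed.

```python
import math

def mrts(h, k, theta1, theta2):
    """MRTS = MP_h / MP_k = (theta1 / theta2) * (k / h) for Cobb-Douglas."""
    return (theta1 / theta2) * (k / h)

def elasticity_of_substitution(h, k0, k1, theta1, theta2):
    """Finite-difference estimate of d ln(k/h) / d ln(MRTS), holding h fixed."""
    d_ln_ratio = math.log(k1 / h) - math.log(k0 / h)
    d_ln_mrts = (math.log(mrts(h, k1, theta1, theta2))
                 - math.log(mrts(h, k0, theta1, theta2)))
    return d_ln_ratio / d_ln_mrts

# Two arbitrary (hypothetical) parameter sets; both should give sigma near 1.
sigma_a = elasticity_of_substitution(h=10.0, k0=5.0, k1=6.0, theta1=0.5, theta2=0.25)
sigma_b = elasticity_of_substitution(h=3.0, k0=2.0, k1=9.0, theta1=0.2, theta2=0.7)
print(sigma_a, sigma_b)
```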

3.9 A Cobb-Douglas Example

An example: $q = f(h, k) = h^{1/4} k^{1/4}$.

Marginal product of labor:

\[ MP_h = \frac{1}{4} h^{-3/4} k^{1/4}. \]

Marginal product of capital:

\[ MP_k = \frac{1}{4} h^{1/4} k^{-3/4}. \]

Profit maximizing combination of labor and capital:

First order conditions:
1)

\[ p \frac{1}{4} h^{-3/4} k^{1/4} = w \]

and
2)

\[ p \frac{1}{4} h^{1/4} k^{-3/4} = r. \]

Divide 1) by 2):

\[ \frac{k}{h} = \frac{w}{r}. \]

Solve for k:

\[ k = h \frac{w}{r}. \]

Now substitute this equation back into either first order condition.
Substitute into first order condition 1):

\[ p \frac{1}{4} h^{-3/4} \left(h \frac{w}{r}\right)^{1/4} = w. \]

Solve for h and simplify. Raising both sides to the fourth power and
re-arranging,

\[ h^{-3} h^{+1} \frac{w}{r} = w^4 p^{-4} 4^4 \]

\[ h^{-2} = w^3 p^{-4} r 4^4 \]

Finally,

\[ h^* = \frac{1}{16} \frac{p^2}{r^{1/2} w^{3/2}}. \]

Labor demand is a function of output prices, price of capital, and wage
rates. Check to make sure this seems right. The equation indicates that the
optimal level of labor demand is increasing in output prices, but decreasing
in capital prices and wages.

Solve for the optimal input of capital by substituting the optimal level of
labor demand into k = h(w/r):

\[ k^* = \left(\frac{1}{16} \frac{p^2}{r^{1/2} w^{3/2}}\right) \frac{w}{r}. \]

Simplify,

\[ k^* = \frac{1}{16} \frac{p^2}{r^{3/2} w^{1/2}}. \]

Again, check to make sure this seems right.

The optimal level of production is found by substituting optimal factor
inputs into the production function:

\[ q^* = (h^*)^{1/4} (k^*)^{1/4}. \]

Anything else can be found the same way (profits, total costs, marginal
cost, etc.).

What is labor demand if p = 100, r = 16, and w = 2?

Simply substitute these prices into the labor demand equation:

\[ h^* = \frac{1}{16} \frac{100^2}{16^{1/2} \, 2^{3/2}} \approx 55.243. \]

(Note: there is some rounding error, so h∗ is approximately 55.243.)

Are the First Order Conditions Satisfied?

One easy way to check your work is to see whether the first order condi-
tions are satisfied at these optimal levels of labor and capital demand.
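
One way to carry out this check is with a few lines of Python. This is a minimal sketch using the demand equations derived above; the function and variable names are our own and are not part of the notes.

```python
def labor_demand(p, r, w):
    """h* = (1/16) * p^2 / (r^(1/2) * w^(3/2)), from the derivation above."""
    return p**2 / (16 * r**0.5 * w**1.5)

def capital_demand(p, r, w):
    """k* = (1/16) * p^2 / (r^(3/2) * w^(1/2))."""
    return p**2 / (16 * r**1.5 * w**0.5)

p, r, w = 100.0, 16.0, 2.0
h_star = labor_demand(p, r, w)
k_star = capital_demand(p, r, w)
q_star = h_star**0.25 * k_star**0.25

# Left-hand sides of the two first order conditions.
foc_labor = p * 0.25 * h_star**-0.75 * k_star**0.25    # should equal w
foc_capital = p * 0.25 * h_star**0.25 * k_star**-0.75  # should equal r
print(round(h_star, 3), round(k_star, 3), round(q_star, 3))
print(foc_labor, foc_capital)
```

If the algebra is right, the two printed first order condition values match w = 2 and r = 16 up to floating-point error.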

Calculating Labor Demand Elasticity with respect to Wages

The labor demand elasticity with respect to wages is

\[ \epsilon = \frac{\partial h^*(w, r, p)}{\partial w} \frac{w}{h^*}. \]

For this example, the first part is

\[ \frac{\partial h^*(w, r, p)}{\partial w} = \frac{1}{16} p^2 r^{-1/2} \left(-\frac{3}{2}\right) w^{-5/2} = -\frac{3}{32} \frac{p^2}{r^{1/2} w^{5/2}}. \]

Substituting this into the equation for ε:

\[ \epsilon = -\frac{3}{32} \frac{p^2}{r^{1/2} w^{5/2}} \frac{w}{h^*} = -\frac{3}{32} \frac{p^2}{r^{1/2} w^{3/2}} \frac{1}{h^*}. \]

Notice that ε is a function of w, p, r, and h∗. (Note: this is how you
should express ε when you are asked to express it as a function, i.e. do not
substitute for the value h∗.)

By substituting for w, p, r, and h∗, we can calculate the exact labor
demand elasticity at the point of optimal labor demand (h∗). (Note: in general,
we could evaluate ε at other h values, but these are not as relevant as the
optimal labor demand point.)
Substituting for w = 2, p = 100, r = 16, and h∗ = 55.243,

\[ \epsilon = -\frac{3}{32} \frac{100^2}{16^{1/2} \, 2^{3/2}} \frac{1}{55.243} \approx -1.5. \]

(Note: there is some rounding error, so ε is approximately −1.5.)


Another way to find this number is to substitute the labor demand
equation directly into the elasticity equation:

\[ \epsilon = -\frac{3}{32} \frac{p^2}{r^{1/2} w^{3/2}} \cdot \frac{1}{\frac{1}{16} \frac{p^2}{r^{1/2} w^{3/2}}}. \]

Simplifying,

\[ \epsilon = -\frac{3}{32} p^2 r^{-1/2} w^{-3/2} \cdot 16 p^{-2} r^{1/2} w^{3/2} = -\frac{16 \cdot 3}{32} = -1.5. \]

This value is the same as above.

This value of ε indicates that the labor demand elasticity with respect to
wages (at the point of optimal labor demand h∗) is elastic: |ε| > 1.
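
The elasticity can also be approximated numerically without taking the derivative by hand. The sketch below is an illustration only: it uses a central finite difference, and the function names, step size, and second set of prices are our own choices.

```python
def labor_demand(p, r, w):
    """h* = (1/16) * p^2 / (r^(1/2) * w^(3/2)) from the example above."""
    return p**2 / (16 * r**0.5 * w**1.5)

def wage_elasticity(p, r, w, step=1e-6):
    """Central finite-difference estimate of (dh*/dw) * (w / h*)."""
    dh_dw = (labor_demand(p, r, w + step) - labor_demand(p, r, w - step)) / (2 * step)
    return dh_dw * w / labor_demand(p, r, w)

eps_at_example = wage_elasticity(p=100.0, r=16.0, w=2.0)
eps_elsewhere = wage_elasticity(p=50.0, r=4.0, w=10.0)  # hypothetical prices
print(eps_at_example, eps_elsewhere)
```

Both estimates come out at about −1.5. For this particular production function the prices cancel in the elasticity formula, so the elasticity is the same at every set of prices.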

3.10 Aggregate Labor Market

The total market demand for labor in the economy is the sum of the labor
demand of each firm. If each firm j has a different production function
(indexed j), total labor demand is $\sum_{j=1}^{N} h_j^*(w, r, p)$, where N
is the number of firms. If all firms are identical, the market demand for
labor is simply $N \cdot h^*(w, r, p)$, where h∗(w, r, p) is the optimal labor
demand of each firm.
(Note: I will use the same lower case h to indicate market labor demand and
individual firm labor demand. The distinction depends on the context.)

3.10.1 What Affects Aggregate Demand for Labor?

As we have seen, several factors affect h∗ and the market demand for labor.
The four most basic conclusions:

1) Lower wage rates increase h∗ .

2) Higher output prices increase h∗ .

3) Lower rental rates on capital can increase h∗ through a scale effect. A
decline in r reduces total costs and causes the firm to increase production.
The increased level of production causes the firm to increase its labor demand.

4) Lower rental rates on capital can decrease h∗ through a substitution effect.
As r declines, capital becomes less expensive relative to labor, so the firm
substitutes capital for labor.

3.10.2 Labor Demand Curve

The labor demand curve we discussed in the labor market overview section
graphs the relationship between the aggregate h∗ for all firms and the wage
rate (w). The other factors that affect h∗ , such as p and r, and the shape
of the production function, affect the location of the curve (e.g. rotate it,
shift it in or out). A given labor demand curve therefore reflects a particular
production technology and prices (p and r). Changes in w are reflected in
changes along a given labor demand curve.

3.10.3 Elasticity of Labor Demand and Market Demand

The elasticity of labor demand (with respect to wage rates) for the aggregate
labor market depends on the ε values of individual firms. Consider the two
extreme cases. Reality is somewhere in between these two cases.

Perfectly Elastic Labor Demand

If all firms have an ε = −∞, then the labor market demand curve is a flat
(horizontal) line at the equilibrium wage rate (w∗). Labor demand is perfectly
elastic in this case. An increase in the wage rate from the equilibrium rate
causes all firms to lower their labor demand to 0.

Perfectly Inelastic Labor Demand

If all firms have an ε = 0, then the labor market demand curve is a vertical
line at the equilibrium number of hours used in the labor market (h∗). Labor
demand is perfectly inelastic in this case. An increase in the wage rate from
the equilibrium rate does not change labor demand at all and simply raises
the equilibrium wage rate.

3.11 Minimum Wage Laws

Minimum wages are price floors in the labor market. A minimum wage law
states that no worker can be paid less than the minimum wage. Generally,
minimum wage laws in the United States specify some exceptions to the
law. Some workers in “uncovered” sectors are not subject to the minimum
wage laws. Most workers are in the “covered” sector and must be paid
at least the minimum wage. Minimum wage laws are set by all levels of
government. When these notes were written, the federal minimum wage was
$5.15. Some states
and city governments (mainly large cities and states where the cost of living
is higher) set their own, higher, minimum wages. Some cities have passed
“living wage” laws, which set even higher minimum wages.

See this website for some information on minimum wages in the United
States:

http://www.dol.gov/esa/minwage/america.htm

3.11.1 Graphing a Minimum Wage

Figure 3 graphs a minimum wage law in a basic labor supply and demand
graph. The minimum wage is set at $\bar{w}$, which is above the equilibrium
wage level of w∗. A minimum wage set below the equilibrium wage would have no
effect. The minimum wage reduces equilibrium labor hours employed from
h∗ to h0. This reduction in employment because of the minimum wage is
often called the employment effect.

3.11.2 Labor Demand Elasticities and Minimum Wage

From our analysis of labor demand, we know that the demand for labor is
decreasing in wages (downward sloping labor demand curve). The extent of
employment effects depends on the labor demand elasticity with respect to
wage changes. Especially important to recognize is that the relevant labor
demand elasticity is that for the low wage labor market (e.g. manual laborers,
workers in retail and fast food restaurants). How firms that mainly hire
workers with wages well above the minimum wage respond to minimum
wages is not relevant.
If the labor demand elasticity with respect to wages is low (|ε| is low),
the employment effects of minimum wages are low.
In the one extreme case, where labor demand is perfectly inelastic (a
vertical labor demand curve), there are no employment effects. A minimum
wage law in this case generates only the benefit of higher wages.
In the other extreme case, where labor demand is perfectly elastic (a
horizontal labor demand curve), there are no wage gains resulting from a
minimum wage. Faced with higher wage rates, firms with a perfectly
elastic labor demand choose to hire no workers (e.g. they substitute capital
for workers).
The reality is somewhere between the two extreme cases. The tradeoffs
involved in minimum wage laws involve a choice between a labor market
with higher employment and lower wages and a labor market with lower
employment and higher wages.
Empirical research on this topic has generally found that the employment
effects associated with recent minimum wage increases are small. This does
not imply that minimum wage increases, especially large increases, would not
have substantial employment effects. It is difficult to predict the effects of
minimum wage changes because of the difficulty of estimating labor supply
and demand functions. Most empirical work which analyzes minimum wages
examines the effects of prior changes. These previous experiences may have
only limited relevance to future, larger changes.
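
To see how the size of the employment effect depends on the labor demand elasticity, here is a small Python sketch. It assumes a constant-elasticity labor demand curve, which is our simplifying assumption and not something derived in these notes, and all of the numbers are hypothetical.

```python
def employment_after_minimum_wage(h_eq, w_eq, w_min, elasticity):
    """Hours demanded at w_min, assuming constant-elasticity labor demand."""
    if w_min <= w_eq:
        return h_eq  # a minimum wage below equilibrium has no effect
    return h_eq * (w_min / w_eq) ** elasticity

# Hypothetical market: 1000 hours at an equilibrium wage of 10, minimum wage of 11.
for eps in (0.0, -0.3, -1.5):
    hours = employment_after_minimum_wage(1000.0, 10.0, 11.0, eps)
    print(eps, round(hours, 1))
```

With an elasticity of 0 there is no employment effect, and the employment loss grows as the elasticity becomes larger in absolute value.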

4 Labor Supply

4.1 Model of Labor Supply

The following is a simple model of individual labor supply decisions. At the
end of this section, we will examine extensions to this model to make it more
realistic.
In this model, individuals choose how many hours to work in the labor
market (h) and how many consumer goods to purchase (c). Every hour the
individual does not work is leisure time (l). Leisure is essentially another
good that the individual can purchase by not working. The total amount of
time an individual has is T . Therefore, the time constraint in the problem
is T = l + h. For every hour the individual works, she receives a wage w per
hour.
Individuals have preferences over the two goods, consumer goods (c) and
leisure (l). Preferences are indicated by a utility function: U (c, l). The utility
function provides the number of “utils” or satisfaction the individual receives
from different combinations of c and l.
The price for consumer goods is p. The price of leisure is foregone wages
or w. Like our labor demand model, this is a partial equilibrium model and
these prices are taken as given by the individual. The prices are exogenous.
The individual is assumed to have two sources of income: labor income
from working (wh) and non-labor income called V . V can be thought of
as wealth the individual has accumulated (e.g. savings or inheritance from
a rich relative). The individual receives this non-labor income even if she
chooses not to work (h = 0).

4.2 Utility Maximization

The individual maximizes her utility by choosing c and l subject to a budget
constraint and a time constraint. The budget constraint is that total income
must equal total expenditures on consumer goods:

V + wh = pc.

Total income is non-labor income plus labor income (V + wh). Total
expenditure on consumer goods is pc.
The maximization problem is then

\[ \max_{c, l} U(c, l) \quad \text{s.t.} \quad V + wh = pc \ \text{ and } \ T = l + h. \]

Re-written in terms of leisure,

\[ \max_{c, l} U(c, l) \quad \text{s.t.} \quad V + w(T - l) = pc. \]

Without a more complex setup (i.e. writing the constrained optimization
problem as a Lagrangian function) or specifying the functional form of the
utility function (see the Cobb-Douglas example below), we cannot derive the
first order conditions explicitly, as in the labor demand analysis. Instead, we
will solve this optimization problem by stating that the optimal leisure hours
choice and consumer goods choice must satisfy this tangency condition:

\[ \frac{\partial U(c, l) / \partial l}{\partial U(c, l) / \partial c} = \frac{w}{p}. \]

Below, we will describe how this tangency condition relates to a graphical
representation of the utility maximization problem.
With w, p, V, T, and a functional form for the utility function, we can
solve this problem for the optimal number of leisure hours demanded l∗(w, p, V),
the optimal purchase of consumer goods c∗(w, p, V), and the optimal labor
supply decision h∗(w, p, V) = T − l∗(w, p, V).

4.3 Some Terminology

Marginal Utility of Leisure

\[ MU_l = \frac{\partial U(c, l)}{\partial l} \]

Marginal Utility of Consumption

\[ MU_c = \frac{\partial U(c, l)}{\partial c} \]

Marginal Rate of Substitution

\[ MRS = \frac{MU_l}{MU_c} \]

This is the MRS of l for c.

4.4 What Does the Tangency Condition Mean?

The marginal utility of leisure provides the value of one more hour of leisure.
The cost of obtaining this extra leisure is that the individual must give up
an hour of work, which would provide w of additional income. w therefore is
the “price” of leisure. w is the marginal cost of leisure.
The marginal utility of consumer goods provides the value of one more
unit of consumer goods. The marginal cost of obtaining an additional unit
of consumer goods is the price of the consumer goods p.
The intuition behind the tangency condition is that the individual
should continue to purchase leisure and consumption goods until the point
at which the ratio of the marginal benefits of the two goods (MRS) equals
the ratio of the marginal costs (w/p). The tangency condition is simply

\[ MRS = \frac{w}{p}. \]

This tangency condition is simply a re-statement of a fundamental
principle of economics: decision makers make choices by equating the marginal
benefit with the marginal cost.

4.5 Labor Supply Elasticity with Respect to Wages

The labor supply elasticity with respect to wages is defined as:

\[ \gamma = \frac{\%\Delta h^*(w, p, V)}{\%\Delta w}, \]

or

\[ \gamma = \frac{\partial h^*(w, p, V)}{\partial w} \frac{w}{h^*}. \]

Or in logs,

\[ \gamma = \frac{\partial \ln h^*(w, p, V)}{\partial \ln w}. \]

The labor supply elasticity with respect to wages can be positive or
negative: −∞ ≤ γ ≤ +∞.

The labor supply elasticity has the following interpretation. This inter-
pretation and terminology is similar to that for the labor demand elasticity.
However, note that γ can be negative or positive, so the interpretation is for
the absolute value of γ: |γ|.

i) |γ| = 0 (perfectly inelastic labor supply). Changes in wage rates do not
affect labor supply.

ii) 0 < |γ| < 1 (inelastic labor supply). If wages change by X %, labor supply
changes by less than X %.

iii) |γ| = 1 (unit elastic labor supply). If wages change by X %, labor supply
changes by exactly X %.

iv) |γ| > 1 (elastic labor supply). If wages change by X %, labor supply
changes by more than X %.

v) a) γ = ∞ (perfectly elastic labor supply). If wages increase by X %, labor
supply increases to ∞.

v) b) γ = −∞ (perfectly elastic labor supply). If wages increase by X %, labor
supply drops to 0.

It is important to note that γ is (in general) a function of prices (w and
p), level of labor hours or leisure (h or l), and consumer goods (c). This
means that the labor supply elasticity can vary as these factors change.

4.6 Income and Substitution Effects

We can decompose how changes in wage rates affect labor supply into two
factors. Consider an increase in wages.

Substitution Effect
Higher wages increase the return to work and raise the price of the leisure
good. The worker substitutes away from leisure toward consumer goods. This
causes the individual to demand less leisure and work more.

Income Effect
Higher wages increase the labor income for the individual. In effect, the
individual is wealthier. If leisure is a normal good (demand for leisure is
increasing in income), then higher income causes the individual to consume
more leisure and work less.

Key
Unlike the scale and substitution effects with labor demand, the
substitution and income effects in the labor supply case move in opposite
directions. Which effect is stronger determines whether an increase in wage
rates reduces or increases labor supply.

If the substitution effect is larger than the income effect, an increase in
wages increases labor supply.

If the income effect is larger than the substitution effect, an increase in
wages decreases labor supply.

4.7 Indifference Curves and Budget Lines Graphs

Figure 4 graphs the utility maximization problem. On the vertical axis are
units of consumption (c). On the horizontal axis are hours of leisure (l). The
two objects in the graph are an indifference curve and a budget line.

Budget Line

The budget line is the budget constraint from above:

V + wh = pc.

Re-written in terms of leisure, the budget line becomes

V + w(T − l) = pc.

Re-arranging for the equation of a line (y = c and x = l):

\[ c = \frac{V}{p} + \frac{w}{p} T - \frac{w}{p} l. \]

The vertical intercept is $c = \frac{V}{p} + \frac{w}{p} T$. If the individual
always works (consumes 0 leisure), she receives wT in labor income. Therefore,
she can buy $\frac{V}{p} + \frac{w}{p} T$ units of consumer goods.
In this simple model, $\frac{w}{p}$ is the real wage, as opposed to w, which
is the nominal wage.
Notice that the horizontal intercept (c = 0) is never reached when V > 0.
Even if the individual never works (l = T), she still can use her non-labor
income V to purchase $\frac{V}{p}$ units of consumer goods.
The slope of the line is the negative of the price ratio, $-\frac{w}{p}$. This
indicates the rate at which workers can trade leisure for consumer goods.
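
The budget line is easy to trace out in code. The sketch below is illustrative only; the function name is our own, and the prices (w = 4, p = 2, V = 8, T = 16) are borrowed from the numerical example in section 4.8.

```python
def budget_line(l, w, p, V, T):
    """Consumption affordable at leisure l: c = V/p + (w/p)*T - (w/p)*l."""
    if not 0 <= l <= T:
        raise ValueError("leisure must lie between 0 and T")
    return V / p + (w / p) * (T - l)

w, p, V, T = 4.0, 2.0, 8.0, 16.0
c_all_work = budget_line(0.0, w, p, V, T)  # vertical intercept: V/p + (w/p)*T
c_no_work = budget_line(T, w, p, V, T)     # endpoint at l = T: V/p
# One more hour of leisure costs w/p units of consumer goods.
slope = budget_line(1.0, w, p, V, T) - budget_line(0.0, w, p, V, T)
print(c_all_work, c_no_work, slope)
```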

Indifference Curve

The indifference curve is determined by the utility function. It indicates
the preferences the individual has over consumer goods and leisure. Along
an indifference curve, utility is constant.
The equation of an indifference curve is

U0 = U(c, l),

where U0 is some constant utility. The slope of the indifference curve is
the negative of the marginal rate of substitution between leisure and
consumer goods.
Indifference curves can take any of the forms of production functions
discussed above: perfect substitutes, perfect complements, or Cobb-Douglas.
The analysis of each of these types of production functions follows through
to the utility functions.
For example, a Cobb-Douglas utility function would take the form

\[ U(c, l) = c^{\theta_1} l^{\theta_2}. \]

Tangency Condition

The point where the indifference curve and the budget line are tangent gives
the optimal combination of consumption and leisure. At this point, MRS =
w/p. This is simply a graphical representation of the tangency condition we
discussed above.

4.8 A Cobb-Douglas Example

An example: U(c, l) = cl.

Marginal utility of consumption:

\[ MU_c = l \]

Marginal utility of leisure:

\[ MU_l = c \]

Utility maximization problem is

\[ \max_{c, l} \ cl \quad \text{s.t.} \quad V + w(T - l) = pc. \]

We noted above that this condition must hold at the optimum:

\[ MRS = \frac{w}{p}. \]

In this problem,

\[ MRS = \frac{MU_l}{MU_c} = \frac{c}{l}. \]

Substituting,

\[ \frac{c}{l} = \frac{w}{p}. \]

Solve for c,

\[ c = \frac{lw}{p}. \]

Now substitute this into the budget constraint:

\[ \frac{lw}{p} = \frac{V}{p} + \frac{w}{p}(T - l). \]

Simplify,

\[ lw = V + w(T - l) \]

\[ l = \frac{V}{w} + T - l \]

\[ 2l = \frac{V}{w} + T \]

\[ l^*(w, p, V, T) = \frac{1}{2}\left(\frac{V}{w} + T\right). \]

Check to see if this seems right. Leisure consumption is decreasing in its
price, w (whenever V > 0). Leisure consumption is increasing in non-labor
income V.

Another Way to Solve this Problem

Another way to solve this problem is to substitute for consumption from
the budget constraint: $c = \frac{V}{p} + \frac{w}{p}(T - l)$. This reduces
the maximization problem above to a choice of leisure only.

\[ \max_{l} \left[\left(\frac{V}{p} + \frac{w}{p}(T - l)\right) l\right] \]

Simplify,

\[ \max_{l} \left[\frac{V}{p} l + \frac{w}{p} T l - \frac{w}{p} l^2\right] \]

The first order condition with respect to leisure is

\[ \frac{V}{p} + \frac{w}{p} T = 2 \frac{w}{p} l. \]

Solve for optimal l,

\[ l^*(w, p, V, T) = \frac{1}{2}\left(\frac{V}{w} + T\right). \]

This is the same equation as above.
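
A third, brute-force check is to evaluate utility at many leisure levels along the budget line and pick the best one. The sketch below is only an illustration of that idea: the grid size and function names are our own, and the prices are those used in the numerical example later in this section.

```python
def utility(c, l):
    """The example utility function U(c, l) = c * l."""
    return c * l

def best_leisure_on_grid(w, p, V, T, steps=1600):
    """Search leisure values on [0, T] and return the one with highest utility."""
    best_l, best_u = 0.0, float("-inf")
    for i in range(steps + 1):
        l = T * i / steps
        c = V / p + (w / p) * (T - l)  # consumption from the budget constraint
        u = utility(c, l)
        if u > best_u:
            best_l, best_u = l, u
    return best_l

w, p, V, T = 4.0, 2.0, 8.0, 16.0
l_grid = best_leisure_on_grid(w, p, V, T)
l_formula = 0.5 * (V / w + T)
print(l_grid, l_formula)
```

The grid answer should agree with the formula up to the grid spacing (here 0.01 hours).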

Continuing the Problem



With an equation for optimal leisure demand, we can find everything else.
Optimal labor supply is simply

\[ h^*(w, p, V, T) = T - l^*(w, p, V, T) = T - \frac{1}{2}\left(\frac{V}{w} + T\right). \]

Simplifying,

\[ h^*(w, p, V, T) = \frac{1}{2} T - \frac{1}{2} \frac{V}{w}. \]

If the person has no non-labor income (V = 0), she works exactly 1/2
of her time, regardless of the wage. This is because of the particular utility
function we assumed, in which leisure and consumer goods are equally valued.
With V = 0, the income and substitution effects of a wage change exactly
offset each other.

We can also calculate the optimal consumption of consumer goods using
the optimal leisure decision.

\[ c^*(w, p, V) = \frac{V}{p} + \frac{w}{p}\left[T - \frac{1}{2}\left(\frac{V}{w} + T\right)\right] \]

Simplifying,

\[ c^*(w, p, V) = \frac{1}{2} \frac{V}{p} + \frac{1}{2} \frac{w}{p} T. \]

Check to see if this seems right. Consumption is increasing in non-labor
income. Consumption is decreasing in the price of consumer goods.
Consumption is also increasing in the wage rate because of an income effect.
Note that if there is no non-labor income, the individual can buy (1/2)wT
worth of consumer goods.

What are l∗, h∗, and c∗ with p = 2, V = 8, w = 4?

To find the exact values of l∗, h∗, and c∗ given these prices, simply
substitute the prices into each of our equations. Assume the individual has
T = 16 total hours each day to devote to leisure or working (i.e. the
individual sleeps for 8 hours each day).

\[ l^* = \frac{1}{2}\left(\frac{8}{4} + 16\right) = 9. \]

Check to make sure this is feasible: we need l∗ ≤ T, and here 9 < 16.


(Note: It is possible that as non-labor income increases to very high
numbers (e.g. V = 1000), the optimal leisure decision will exceed the amount
of time available: l∗ > T. This indicates that the person is so wealthy that
she chooses to never work. In that case the time constraint binds, so she
sets l = T and h = 0.)
Optimal labor supply is

h∗ = T − l∗ = 16 − 9 = 7.

Or using the equation we derived above,

\[ h^* = \frac{1}{2} T - \frac{1}{2} \frac{V}{w} = \frac{1}{2} \cdot 16 - \frac{1}{2} \cdot \frac{8}{4} = 8 - 1 = 7. \]

Finally, optimal consumption of consumer goods is

\[ c^* = \frac{1}{2} \cdot \frac{8}{2} + \frac{1}{2} \cdot \frac{4}{2} \cdot 16 = 2 + 16 = 18. \]

Is the Time Constraint Satisfied?

This time constraint must hold at the values of labor supply and leisure
demand we calculated.

T = h∗ + l∗

16 = 7 + 9

16 = 16

Is the Budget Constraint Satisfied?

Check to make sure the budget constraint is satisfied:

V + w(T − l) = pc

8 + 4(16 − 9) = 2 ∗ 18

36 = 36

Is the Tangency Condition Satisfied?

The tangency condition must hold at the values of l∗ and c∗ we calculated.

\[ MRS = \frac{w}{p}. \]

In this problem,

\[ MRS = \frac{MU_{l^*}}{MU_{c^*}} = \frac{c^*}{l^*} = \frac{18}{9}. \]

The price ratio is

\[ \frac{w}{p} = \frac{4}{2}. \]

Therefore this condition holds:

\[ \frac{18}{9} = \frac{4}{2}. \]
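
The three checks above (time constraint, budget constraint, tangency condition) can be bundled into a short script. This is a sketch using the formulas derived in this section; the function and variable names are our own.

```python
import math

def leisure_demand(w, V, T):
    """l* = (1/2) * (V/w + T) for U(c, l) = c * l."""
    return 0.5 * (V / w + T)

def labor_supply(w, V, T):
    """h* = T - l*."""
    return T - leisure_demand(w, V, T)

def consumption_demand(w, p, V, T):
    """c* = (1/2) * V/p + (1/2) * (w/p) * T."""
    return 0.5 * V / p + 0.5 * (w / p) * T

w, p, V, T = 4.0, 2.0, 8.0, 16.0
l_star = leisure_demand(w, V, T)
h_star = labor_supply(w, V, T)
c_star = consumption_demand(w, p, V, T)

time_ok = math.isclose(h_star + l_star, T)
budget_ok = math.isclose(V + w * (T - l_star), p * c_star)
tangency_ok = math.isclose(c_star / l_star, w / p)  # MRS = c/l for U = c*l
print(l_star, h_star, c_star, time_ok, budget_ok, tangency_ok)
```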

Labor Supply Elasticity with Respect to Wages

The labor supply elasticity with respect to wages is defined as

\[ \gamma = \frac{\partial h^*(w, p, V)}{\partial w} \frac{w}{h^*}. \]

For our example, the first part is

\[ \frac{\partial h^*(w, p, V)}{\partial w} = -\frac{1}{2} \cdot (-1) \cdot \frac{V}{w^2} = \frac{1}{2} \frac{V}{w^2}. \]

Substituting,

\[ \gamma = \frac{1}{2} \frac{V}{w^2} \frac{w}{h^*} = \frac{1}{2} \frac{V}{w} \frac{1}{h^*}. \]

This is the labor supply elasticity expressed as a function (at the point
of optimal labor supply h∗).
Given our prices and non-labor income, p = 2, V = 8, and w = 4, and the
optimal labor supply value calculated above h∗ = 7, the elasticity of labor
supply is

\[ \gamma = \frac{1}{2} \cdot \frac{8}{4} \cdot \frac{1}{7} = \frac{1}{7}. \]

At this point, labor supply is relatively inelastic.


Another way to calculate the labor supply elasticity is to substitute the
labor supply function:

\[ \gamma = \frac{1}{2} \frac{V}{w} \cdot \frac{1}{\frac{1}{2} T - \frac{1}{2} \frac{V}{w}}. \]

Simplifying,

\[ \gamma = \frac{V}{w} \cdot \frac{1}{T - \frac{V}{w}}. \]

Substituting values,

\[ \gamma = \frac{8}{4} \cdot \frac{1}{16 - \frac{8}{4}} = 2 \cdot \frac{1}{14} = \frac{1}{7}. \]

Same value as above.
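
As with labor demand, the elasticity can be approximated numerically. The sketch below uses a central finite difference on the labor supply function from this example; the function names, step size, and the second wage (w = 40) are our own illustrative choices.

```python
def labor_supply(w, V, T):
    """h* = (1/2) * T - (1/2) * V / w for U(c, l) = c * l."""
    return 0.5 * T - 0.5 * V / w

def supply_elasticity(w, V, T, step=1e-6):
    """Central finite-difference estimate of (dh*/dw) * (w / h*)."""
    dh_dw = (labor_supply(w + step, V, T) - labor_supply(w - step, V, T)) / (2 * step)
    return dh_dw * w / labor_supply(w, V, T)

gamma_numeric = supply_elasticity(w=4.0, V=8.0, T=16.0)
gamma_formula = (8.0 / 4.0) / (16.0 - 8.0 / 4.0)  # (V/w) / (T - V/w)
gamma_high_wage = supply_elasticity(w=40.0, V=8.0, T=16.0)
print(gamma_numeric, gamma_formula, gamma_high_wage)
```

Unlike the labor demand example, this elasticity is not constant: it falls toward zero as the wage rises, because V/w shrinks.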

4.9 Labor Supply Curve

Like the labor demand curve, the market labor supply curve is the sum of the
individual labor supplies.
If each individual i has different preferences, market labor supply is
$\sum_{i=1}^{N} h_i^*(w, p, V)$, where N is the number of individuals. If all
individuals are identical, then labor supply is simply $N \cdot h^*(w, p, V)$.
As we have seen, several factors affect h∗ and the market supply of labor.

The three most basic conclusions:

1) Higher non-labor income leads to lower labor supply through an income
effect.

2) If the substitution effect is larger than the income effect, higher wages
increase labor supply.

3) If the income effect is larger than the substitution effect, higher wages
decrease labor supply.

It may be the case that these two wage effects have varying strengths
over the range of possible wages. Figure 5 graphs a backward bending labor
supply curve. Over the range of wages from w = 0 to w = w0 , higher wages
are causing the workers to supply more labor. In this area, the substitution
effect is dominating the income effect. At wages w0 and higher, the income
effect is larger than the substitution effect. At this point, higher wages are
causing the worker to work less and consume more leisure.

4.10 Reservation Wages and the Decision to Work

The models examined thus far assume workers smoothly adjust their hours
of work in response to changes in wage rates or non-labor income. Here
we consider a model where workers can decide whether to work at all. The
basic model assumes that there is some reservation wage that the worker
would need to be offered before she works. If she is offered a wage below
her reservation wage, she refuses to work. If she is offered a wage above her
reservation wage, she works at that wage.
Call the reservation wage $\bar{w}$. If $w > \bar{w}$, then the worker works.
If $w < \bar{w}$, the worker works zero hours. If $w = \bar{w}$, the worker is
just indifferent between working and not.


Figure 6 displays a reservation wage model. There are two budget lines
corresponding to different wages (remember that the slope of the budget line
is −w/p). On the budget line corresponding to the reservation wage, the
indifference curve is tangent at the point of l = T. Here the worker is just
indifferent between working and not. At the lower budget line, with wage
$w < \bar{w}$, there is no leisure consumption level, l < T, along this budget
line where the worker receives as much utility as she receives at l = T.
Therefore, it is not optimal for the worker to work at all if she is offered a
wage less than the reservation wage. At wages higher than the reservation wage
(these budget lines are not drawn in Figure 6), it is optimal for the worker to
work and choose l < T.
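
To make the idea concrete, consider the utility function U(c, l) = cl from section 4.8. Setting its labor supply h∗ = T/2 − V/(2w) equal to zero gives a reservation wage of V/T. This formula is our own illustration built on that example, not a result stated in the notes, and the function names below are hypothetical.

```python
def reservation_wage(V, T):
    """Wage at which h* = T/2 - V/(2w) equals zero for U(c, l) = c * l."""
    return V / T

def hours_worked(w, V, T):
    """Labor supply for U(c, l) = c * l with the constraint h >= 0 imposed."""
    return max(0.0, 0.5 * T - 0.5 * V / w)

V, T = 8.0, 16.0
w_res = reservation_wage(V, T)
hours_below = hours_worked(0.25, V, T)  # wage below the reservation wage
hours_at = hours_worked(w_res, V, T)    # wage exactly at the reservation wage
hours_above = hours_worked(4.0, V, T)   # wage above the reservation wage
print(w_res, hours_below, hours_at, hours_above)
```

Note that a larger V raises the reservation wage: wealthier individuals need a higher wage offer before they are willing to work.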

4.11 Welfare Programs

4.11.1 Cash Grants

There are many variants of welfare programs. One basic welfare program
provides a cash grant to individuals who do not work. (To simplify the
model, we will assume the individual has the option of working. This model
would not apply to individuals who are disabled and cannot work.)
A cash grant for non-working individuals changes the budget line the
individual faces. Thus far, our budget lines have this form:

\[ c = \frac{V}{p} + \frac{w}{p} T - \frac{w}{p} l. \]

For simplicity, assume that the individual has no non-labor income (V = 0).
Re-write the budget line:

\[ c = \frac{w}{p} T - \frac{w}{p} l. \]

The welfare program rules state that if an individual does not work (l =
T ), then she receives M dollars. If she works even one hour, she receives no
welfare assistance (a take it or leave it program). With these welfare rules,
the budget line has two parts:

\[ c = \frac{w}{p} T - \frac{w}{p} l \quad \text{if } l < T, \]

and

\[ c = \frac{M}{p} \quad \text{if } l = T. \]

Figure 7 graphs this form of budget line. The budget line is in bold
and has slope −w/p until l = T. At l = T, the budget line is a single
point c = M/p. In this graph, without the cash grant welfare program, the
individual chooses to work T − l∗ hours and consume c∗ consumer goods. With
the welfare program, the individual receives higher utility by not working,
accepting the cash grant, and consuming c = M/p consumer goods.
In this model, the welfare program has reduced labor supply. The extent
to which a welfare program reduces labor supply depends on the size of the
cash payment and an individual’s preferences for leisure and consumption
(the shape of the indifference curve). As long as leisure is a normal good, it
is not possible for a welfare program to increase labor supply.
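
Whether an individual takes the grant depends on its size. As a hedged illustration, the sketch below assumes the utility function U(c, l) = cl from section 4.8 with V = 0 and compares utility from working the optimal hours with utility from taking the grant. The function name and the numbers are hypothetical.

```python
def utility(c, l):
    """The example utility function U(c, l) = c * l."""
    return c * l

def choice_with_cash_grant(w, p, T, M):
    """Compare working (interior optimum for U = c*l, V = 0) with taking the grant."""
    l_work = 0.5 * T                  # optimal leisure when V = 0
    c_work = (w / p) * (T - l_work)
    u_work = utility(c_work, l_work)
    u_grant = utility(M / p, T)       # no work: l = T, c = M/p
    if u_grant > u_work:
        return ("grant", 0.0)
    return ("work", T - l_work)

small_grant = choice_with_cash_grant(w=4.0, p=2.0, T=16.0, M=10.0)
large_grant = choice_with_cash_grant(w=4.0, p=2.0, T=16.0, M=20.0)
print(small_grant, large_grant)
```

Under these assumptions the individual keeps working when the grant is small and stops working entirely once the grant is large enough.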

4.11.2 Earned Income Tax Credit

An alternative type of welfare program attempts to provide incentives for
working by increasing the effective wage for low income people. The Earned
Income Tax Credit (EITC) is one such program in the United States. This
program provides individuals a tax credit based on the income they earn
through working. One reason to prefer this type of welfare program over
a cash grant program is that it can provide both income assistance for low
wage workers and increase labor supply.
Here is a model of an EITC-like program. Workers pay no taxes, but
receive a wage subsidy for each hour they work. If the worker works at least
$h = T - l_a$ hours, she receives a wage of (1 + α) times the market wage
w. For example, if the market wage is $10 and α is 0.1, the worker would
receive $11 per hour under the EITC. α can be negative as well if taxes are
decreasing the wage rate (e.g. α = −0.1 means the worker receives 0.9 dollars
for every 1 dollar in wages). If the worker works between $h = T - l_a$ and
$h = T - l_b$ hours, then she receives the market wage. If the worker works
less than $h = T - l_b$ hours, she receives a wage of (1 + β) times the market
wage.
Like a cash grant welfare program, our EITC-like program changes the
budget line. The usual budget line (with V = 0) is

\[ c = \frac{w}{p}(T - l). \]

Our hypothetical EITC rules transform the budget line into three sections:

\[ c = (1 + \alpha) \frac{w}{p}(T - l) \quad \text{if } 0 < l < l_a, \]

and

\[ c = \frac{w}{p}(T - l) \quad \text{if } l_a < l < l_b, \]

and

\[ c = (1 + \beta) \frac{w}{p}(T - l) \quad \text{if } l_b < l < T. \]
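
The three sections translate directly into a piecewise function. The sketch below is illustrative: the function name and parameter values are hypothetical, and because the text uses strict inequalities, the treatment of the boundary points l = l_a and l = l_b is our own assumption.

```python
def eitc_consumption(l, w, p, T, alpha, beta, l_a, l_b):
    """Consumption at leisure l under the three-section EITC-like budget line."""
    if not 0 <= l <= T:
        raise ValueError("leisure must lie between 0 and T")
    if l < l_a:
        effective_wage = (1 + alpha) * w  # works at least T - l_a hours
    elif l < l_b:
        effective_wage = w                # boundary l = l_a assigned here (assumption)
    else:
        effective_wage = (1 + beta) * w   # boundary l = l_b assigned here (assumption)
    return (effective_wage / p) * (T - l)

# Hypothetical parameters: market wage 10, p = 1, T = 16 hours.
params = dict(w=10.0, p=1.0, T=16.0, alpha=0.1, beta=-0.1, l_a=6.0, l_b=12.0)
c_high_hours = eitc_consumption(4.0, **params)   # 12 hours at (1 + alpha) * w
c_mid_hours = eitc_consumption(8.0, **params)    # 8 hours at the market wage
c_low_hours = eitc_consumption(14.0, **params)   # 2 hours at (1 + beta) * w
print(c_high_hours, c_mid_hours, c_low_hours)
```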

This EITC-like program is graphed in Figure 8. With no EITC policy,
the individual consumes l∗ leisure.
Our EITC rules transform the budget line into a kinked budget line.
Under the EITC rules, the individual increases her consumption of leisure
to l0. This reduction in labor supply as a result of the EITC-like program
indicates that the income effect of higher wages is larger than the
substitution effect.
This is not the only outcome from an EITC program. The result that
EITC reduces labor supply is a direct result of the assumptions about the
utility function and the shape of indifference curves. Under different assump-
tions, an EITC-like program can increase labor supply or have no effect.
Whether the EITC reduces labor supply is an empirical question. Most
research has found that the EITC in the United States increases labor supply
and causes more people to enter the labor market. We will examine the
methodology underlying these findings later in the course.

5 Equilibrium and Market Structure

5.1 Equilibrium in the Labor Market

In the overview of the labor market section, we briefly described an
equilibrium in the labor market. An equilibrium is reached when the labor
demanded by firms is equal to the labor supplied by workers. This is the point
of intersection of the labor supply and labor demand curves. At this point, we
have an equilibrium number of labor hours employed in the economy $h^*$ and
an equilibrium wage rate $w^*$. The equilibrium condition can be summarized
as

$$h_s^*(w^*) = h_d^*(w^*),$$

where $h_s^*(w^*)$ is the optimal labor supply (of all workers) at the wage $w^*$,
and $h_d^*(w^*)$ is the optimal labor demand (of all firms) at the wage $w^*$.
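To make the equilibrium condition concrete, here is a minimal Python sketch
that solves it for one pair of curves. The linear supply and demand functions
and their coefficients are hypothetical and are not taken from these notes.
Any pair of curves that cross once would serve the same purpose.

```python
def labor_supply(w, a=10.0, b=2.0):
    """Hypothetical linear labor supply: hours supplied rise with the wage."""
    return a + b * w


def labor_demand(w, c=100.0, d=4.0):
    """Hypothetical linear labor demand: hours demanded fall with the wage."""
    return c - d * w


def equilibrium(a=10.0, b=2.0, c=100.0, d=4.0):
    """Solve a + b*w = c - d*w for the wage, then read off the hours."""
    w_star = (c - a) / (b + d)
    h_star = labor_supply(w_star, a, b)
    return w_star, h_star


w_star, h_star = equilibrium()
print("equilibrium wage:", w_star)
print("equilibrium hours:", h_star)
```

At the wage returned by `equilibrium`, `labor_supply` and `labor_demand` give
the same number of hours, which is exactly the condition stated above.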

5.2 Out of Equilibrium

Let’s consider what happens when the labor market is not at an equilibrium.
Consider two cases.

Case 1 Labor Surplus

The wage in the labor market is higher than the equilibrium wage: w > w∗.
At a wage of w > w∗, the optimal labor supply is greater than the optimal
labor demand. There is a surplus of labor in the economy at this w:

$$h_s^*(w) > h_d^*(w).$$

At this wage w, there are more people willing to work than there are jobs.
Some of these people may be unemployed and others may be working fewer
hours than they would like to.

Case 2 Labor Shortage

The wage in the labor market is lower than the equilibrium wage: w < w∗.
At a wage of w < w∗, the optimal labor supply is less than the optimal labor
demand. There is a shortage of labor in the economy at this w:

$$h_s^*(w) < h_d^*(w).$$

At this wage w, there are too few people willing to work. Some jobs are
not filled. Other jobs are filled, but the workers are not
willing to work more hours (e.g. overtime) at this lower-than-equilibrium
wage w < w∗.

5.3 Reaching a Labor Market Equilibrium

How does the economy move from out of equilibrium (either a labor surplus
or shortage) to a labor market equilibrium?

Consider the two cases again.

Case 1 Labor Surplus

In this case the prevailing wage w is greater than the equilibrium wage
w∗. With this excess supply of labor, workers are competing with each other
for a limited number of jobs. There is an excess number of workers lined up
outside a firm’s door. The process of reaching an equilibrium can be thought
of in two ways: 1) as the workers compete with each other, they offer to work
at lower wage rates, or 2) the firm offers a lower wage rate until the excess
supply of labor disappears. That is, as the wage rate falls, some workers are
no longer willing to work at that lower wage. The wage rate w falls until
w = w∗ and the labor surplus disappears:

$$h_s^*(w^*) = h_d^*(w^*).$$

Case 2 Labor Shortage

In this case the prevailing wage w is less than the equilibrium wage w∗. With
this shortage of labor, firms are competing with each other for a limited
number of workers. There are a number of firms (employers) lined up outside
each worker’s door. The process of reaching an equilibrium can be thought
of in two ways: 1) as the firms compete with each other, they offer workers
higher wage rates, or 2) the workers ask for a higher wage rate until the
shortage of labor disappears. That is, as the wage rate rises, some firms no
longer want to employ labor at that higher wage. The wage rate w rises until
w = w∗ and the labor shortage disappears:

$$h_s^*(w^*) = h_d^*(w^*).$$
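The adjustment story in the two cases can be sketched as a simple loop: the
wage falls when there is a surplus and rises when there is a shortage. The
Python sketch below assumes hypothetical linear supply and demand curves and an
arbitrary adjustment speed. The notes do not specify how fast wages adjust, so
the update rule is an illustrative assumption, not part of the model.

```python
def excess_demand(w):
    """Labor demand minus labor supply at wage w (hypothetical linear curves)."""
    supply = 10.0 + 2.0 * w
    demand = 100.0 - 4.0 * w
    return demand - supply


def adjust_wage(w, speed=0.1, tol=1e-9, max_steps=1000):
    """Raise the wage in a shortage, lower it in a surplus, until the gap closes."""
    for _ in range(max_steps):
        gap = excess_demand(w)
        if abs(gap) < tol:
            break
        # Positive gap = shortage (wage rises); negative gap = surplus (wage falls).
        w += speed * gap
    return w


print("starting from a surplus wage of 20:", adjust_wage(20.0))
print("starting from a shortage wage of 10:", adjust_wage(10.0))
```

With these curves both starting points settle at the same wage, the one at
which the surplus or shortage has disappeared.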

5.4 Arbitrage Across Labor Markets

What we described above is the process by which a single labor market
reaches an equilibrium point. Now, let’s consider what happens to the
equilibrium point in two separate labor markets.

5.4.1 Two Geographically Separated Labor Markets

Consider two geographically separated labor markets. One is the United
States labor market and the other is the Chinese labor market.

Chinese Labor Market

The supply of labor in China (from Chinese workers) and demand for
labor in China (from Chinese firms) yield an equilibrium wage rate in China
of $w^*_{ch}$ and equilibrium number of labor hours in China of $h^*_{ch}$.

United States Labor Market

Similarly, the supply of labor in the United States (from US workers) and
demand for labor in the US (from US firms) yield an equilibrium wage rate
in the US of $w^*_{us}$ and equilibrium number of labor hours in the US of $h^*_{us}$.

Assume (as is the case in reality) that the US equilibrium wage rate is
higher than the Chinese wage rate:

$$w^*_{us} > w^*_{ch}.$$

What do workers and firms do in each country?

Firms

Because wage rates are lower in China, US firms have an incentive to
move to China and produce their goods in China. As US firms move to
China, the labor demand for Chinese workers increases (a shift out of the
Chinese labor demand curve). This produces an increase in the equilibrium
Chinese wage rate.

Workers

Because wage rates are higher in the US, Chinese workers have an incen-
tive to emigrate to the United States and work in the US. As Chinese workers
emigrate to the US, the labor supply of US resident workers (immigrants and
natives) increases (a shift out of the US labor supply curve). This produces
a decrease in the equilibrium US wage rate.

Arbitrage

This movement of workers to higher wage labor markets (from China to
the US) and firms to lower wage labor markets (from US to China) causes
the equilibrium wage rates in the two countries to converge. This general
process is often called arbitrage. If the arbitrage is complete, the wage rate
in China and the wage rate in the US will be equal:

$$w^*_{us} = w^*_{ch}.$$

Although these two labor markets are geographically separated, the mo-
bility of firms and workers implies that the labor demand and supply in each
labor market interact and affect each country’s respective equilibrium wage
rate.
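A small numerical sketch can show what complete arbitrage looks like. The
Python example below makes strong simplifying assumptions that are not in the
notes: only workers move (firm relocation is ignored), each country has a fixed
number of workers, and the wage in each country follows a hypothetical linear
inverse demand curve, w = A − B·L. All names and numbers are illustrative.

```python
def wage(intercept, slope, labor):
    """Hypothetical inverse labor demand: the wage falls as employed labor rises."""
    return intercept - slope * labor


def complete_arbitrage(A_us, B_us, L_us, A_ch, B_ch, L_ch):
    """Workers m moving from China to the US until the two wages are equal."""
    gap = wage(A_us, B_us, L_us) - wage(A_ch, B_ch, L_ch)
    # Each mover lowers the US wage by B_us and raises the Chinese wage by B_ch.
    m = gap / (B_us + B_ch)
    return m, wage(A_us, B_us, L_us + m), wage(A_ch, B_ch, L_ch - m)


# Hypothetical starting point: US wage 30, Chinese wage 10.
movers, w_us, w_ch = complete_arbitrage(
    A_us=50.0, B_us=0.1, L_us=200.0,
    A_ch=30.0, B_ch=0.05, L_ch=400.0,
)
print("workers who move:", movers)
print("US wage after arbitrage:", w_us)
print("Chinese wage after arbitrage:", w_ch)
```

The common wage ends up between the two starting wages: migration lowers the
US wage and raises the Chinese wage until no gap remains.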

5.4.2 Arbitrage Across Skill Levels

Consider another example. There are two types of labor defined by the level
of skill. One type of labor is college educated labor provided by workers
with a college degree. The other type of labor is non-college educated labor
provided by workers who do not have a college degree. The two types of
labor work in the same geographic area (e.g. same city), but for each type
of labor there is a separate labor market.

Labor Market for College Educated Workers

The supply of college educated labor (from college graduates) and the
demand for college educated labor (from firms that employ college educated
labor) yield an equilibrium wage rate for college educated labor of $w^*_c$ and
equilibrium number of college educated labor hours $h^*_c$.

Labor Market for Non-College Educated Workers

The supply of non-college educated labor (from workers who do not have
a college degree) and the demand for non-college educated labor (from firms
that employ non-college educated labor) yield an equilibrium wage rate for
non-college educated labor of $w^*_{nc}$ and equilibrium number of non-college
educated labor hours $h^*_{nc}$.
Assume (as is the case in reality) that the equilibrium wage rate for college
educated labor is higher than the wage rate for non-college educated labor:

$$w^*_c > w^*_{nc}.$$

What do workers and firms in each labor market do?

Firms

Because college educated labor is more expensive relative to non-college
educated labor, firms have an incentive to substitute non-college educated
labor for college educated labor. This reduces the demand for college
educated labor (a shift in of the demand curve in this market) and increases the
demand for non-college educated labor (a shift out of the demand curve in
this market). These shifts in demand decrease the college educated wage rate
and increase the non-college educated wage rate.

Workers

Because the wage rate is higher for college educated workers, there is an
incentive for non-college educated individuals to attend college and become
college educated workers (i.e. enter the college educated labor market). This
shifts the supply curve of college educated workers out and shifts the supply
curve of non-college educated workers in. These supply shifts decrease the
college educated wage rate and increase the non-college educated wage rate.

Arbitrage

This movement of workers to higher wage occupations (from non-college
educated to college educated) and firms to lower wage labor (from college
educated to non-college educated)
causes the equilibrium wage rates in the two labor markets to converge. If
the arbitrage is complete, the wage rate for both types of labor will be equal:

$$w^*_c = w^*_{nc}.$$

5.5 Labor Market Frictions

In reality, the wage rate has not fully converged in either example. The wage
rate in China and the United States is not equal and college graduates are
paid more than non-college graduates.
Labor market frictions are the explanations for the failure of wage rates
to converge across all labor markets. A labor market friction is some cost
to arbitrage across labor markets. If there are no labor market frictions, we
would expect the wage rate to be the same across all labor markets.
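One way to see why a friction leaves a wage gap is to let workers move one at
a time and stop as soon as the wage gain no longer covers the cost of moving.
The Python sketch below is only an illustration. The per-mover wage effects and
the idea of expressing the moving cost in the same units as the wage are
assumptions introduced here, not part of the notes.

```python
def wage_gap_after_arbitrage(w_high, w_low, moving_cost,
                             drop_per_mover=0.01, rise_per_mover=0.005):
    """Move workers one at a time while the wage gain exceeds the moving cost."""
    movers = 0
    while w_high - w_low > moving_cost:
        w_high -= drop_per_mover   # extra labor supply lowers the high wage
        w_low += rise_per_mover    # reduced labor supply raises the low wage
        movers += 1
    return movers, w_high - w_low


# Hypothetical starting wages of 30 and 10.
no_friction = wage_gap_after_arbitrage(30.0, 10.0, moving_cost=0.0)
with_friction = wage_gap_after_arbitrage(30.0, 10.0, moving_cost=5.0)
print("movers and remaining gap, no friction:", no_friction)
print("movers and remaining gap, moving cost of 5:", with_friction)
```

With no moving cost the gap closes (up to the size of one step). With a
positive moving cost, fewer workers move and a gap of roughly that cost remains.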

Types of Labor Market Frictions:

1) Mobility Costs

These are the costs of physically moving production to a new location
(the costs of moving a US firm to China) or the costs of workers moving to
a new labor market (the costs for Chinese workers emigrating to the US).
For some firms, such as those in manufacturing, the cost of moving to a new
labor market is relatively small. For other firms, such as firms in service
industries, these costs are relatively larger. It would be prohibitively costly
for my local grocery store to move to China, use cheaper Chinese labor, and
ship my groceries back to me in the US. However, we have recently seen that
some services, such as telemarketing or phone help, are being “out-sourced”
to countries like India where labor costs are lower.

2) Search Costs

These are the costs borne by firms in finding new workers or the costs
borne by workers in finding new employers. There are some costs incurred
during the process of searching for a good match between employers and
employees. For workers, search costs can include the costs of walking around
and interviewing at different jobs. For firms, the costs can include the costs
of using employment agencies and headhunters and of interviewing and screening
job applicants.

3) Cost of Human Capital Investment

Both firms and workers may need to pay for the cost of new human capital
as firms and workers enter new labor markets. For example, US firms moving
to China may have to invest in training their new Chinese workers. Chinese
workers moving to the US will incur some costs in learning English or other
new skills. Workers who want to move from the non-college educated labor
market to the college educated labor market incur the costs of time and
tuition in obtaining a college degree.
Some forms of human capital are impossible to obtain because the human
capital stems from skills or talents an individual is born with. The costs for
workers to acquire these skills if they are not born with them is prohibitively
high. For example, professional basketball players earn much more than I
do and I would like to move into their labor market. However, I was not
born 7 feet tall or with any athletic ability. The cost for me to obtain these
types of human capital and enter the professional basketball labor market is
essentially infinite.

4) Institutional Rules

There may also be a number of institutional rules which create labor market
frictions and increase the cost of arbitrage across labor markets. Many
governments have explicit rules that impose costs on firms looking to move
to new labor markets or hire or fire workers. If there is a government rule
that states a firm must pay a severance package to laid off workers or
contribute to an unemployment insurance fund, this rule imposes costs on firms
that would like to replace their current workers or move to another labor
market. In labor contracts negotiated by unions, restrictions may be placed
on the extent to which a firm can lay off workers or hire new workers.

5.6 Market Structure

Frictions in the labor market change the structure of labor markets. A labor
market without frictions, in which firms and workers can costlessly enter the
labor market, is called a competitive labor market. These markets are often
characterized by a large number of firms or workers in the market. A
non-competitive market, on the other hand, may have fewer firms or workers
because of the presence of some labor market friction that imposes costs to
entry into the market.

Types of Labor Market Structures

Competitive Labor Market

These types of markets are typically characterized by a large number of
firms (employers) and workers. The firms have a large pool of workers to
choose from, and the workers have a large number of employers to choose
from. This creates competition for jobs among workers and competition
among firms for workers. No firm or worker has any market power, so none
can influence the market price of labor, the wage rate.

Monopsony in the Labor Market

If there is only one firm in a labor market, this firm is called a monop-
sonist. The one firm is the one employer or consumer of labor. A monop-
sonist could be the one employer in a geographic area (e.g. the one company
town). Or, the monopsonist firm could be the one employer of a particular
occupation or set of skills (e.g. Major League Baseball is the only American
employer in the market for professional baseball players). In addition, groups
of independent firms may join together in the form of a cartel to act jointly
as a monopsonist employer.
Typically, a monopsonist firm employs less labor (lower labor demand) than
competitive firms would. The lower labor demand pushes the equilibrium wage rate
in a labor market with a monopsonist firm lower than the equilibrium wage
rate in a competitive labor market. In addition, monopsonist firms typically
earn higher profits than firms in a competitive labor market. To maintain the
market power of the monopsonist, there must be some sort of labor market
friction which prevents other firms from entering this labor market.

Monopoly in the Labor Market

If there is only one worker in a labor market, the one worker is a
monopolist in this labor market. Although true monopoly labor markets are
probably non-existent, many workers enjoy some level of market power in
their labor market. An example would be entertainers or athletes for whom
there are only limited or imperfect substitutes.
Typically, monopolist workers restrict the amount of their labor supply.
Just like a monopolist firm in the product market, this supply restriction
increases the equilibrium wage the monopolist worker receives relative to a
competitive labor market. If there were many good substitutes for certain
entertainers or athletes, their market power would dissipate and the wages they
receive would fall. There must be some sort of labor market friction that
prevents individuals from entering these markets and capturing the rents the
monopolist workers receive. For example, my lack of talent prevents my entry
into the basketball labor market.

A Bilateral Monopoly Problem

If there is both a single monopsonist firm and a single monopolist worker,
the market structure is called a bilateral monopoly. In this labor market,
both the firm and the worker have market power. How the equilibrium wage
rates and labor hours are determined in this market is an open question.

Monopoly in the Product Market

If there is a monopoly in the product market, there is only one firm that
produces a particular product. This does not necessarily imply anything
about the structure of the labor market the monopolist firm is in. Classic
examples of firms with substantial market power in the product market are the
OPEC (Organization of the Petroleum Exporting Countries) cartel, which is a
cartel of oil producers, and the firm De Beers, which owns most of the world’s
supply of diamonds.
Monopolist firms restrict output and raise output prices relative to firms
in a competitive product market. These monopolist firms often earn higher
profits than firms in competitive output markets. An interesting question
for the labor market is who gets to keep these rents: labor or capital. The
monopolist firm could return at least part of these rents to the workers in the
form of higher wages or give them to the owners of the capital (the firm’s owners).
Because of the presence of these rents, the employees of monopolist firms
may have a greater opportunity to extract higher wages from monopolist
firms than from firms in competitive product markets.

6 Unions

6.1 Basic Facts about Unions

Unions are collections of workers. There are three main activities of unions:
collective bargaining with employers over employment contracts, providing
social services to union members (e.g. job training), and political lobbying.
About 13 percent of workers in the United States belong to unions. This
is a decline from a peak of about 25 percent between the 1950s and 1970s. In
Western Europe, far more workers belong to unions, and the unions there
are more involved in politics. The main focus of unions in the United States
is collective bargaining. Most union members are blue collar workers in the
construction, manufacturing, and transportation industries. More recently,
unions have increased their membership among public sector/government
employees: police and firemen, teachers, postal workers. About 40 percent
of public sector employees are union members.
Unions are formed after a majority of workers at a firm vote in an election to
certify a particular union as their collective bargaining representative. This
gives the union the sole right to bargain over labor contracts. Contracts
negotiated by the union apply to all workers, regardless of their individual union
membership. In some states, all workers in a unionized firm are required to
join the union. In 22 states, “right-to-work” legislation allows workers to
work at a unionized firm without joining the union. The non-union workers
in unionized firms have varying rights to negotiate their own labor contracts
separately with the employer.

6.2 What Do Firms and Unions Bargain Over?

Unions and firms can potentially bargain over all aspects of the employment
contract, including

1) Wage rates: starting salaries, criteria for salary increases (e.g. cost of
living adjustments), overtime pay.

2) Benefits: pension and health insurance.

3) Rules regarding hiring, both the numbers of new hires and the selection
criteria for hiring.

4) Rules regarding promotion within the firm.

5) Rules regarding firing and layoffs: what constitutes grounds for dismissal
and who gets laid off first.

6) Level and types of training workers receive from the firm.

7) Work and safety conditions.

6.3 Collective Bargaining

The process of collective bargaining over labor contracts can take several
forms.

i) Agreement

The employer and the union representatives come to an agreement
regarding the labor contract. Depending on the union rules, the labor contract
may need to be approved by a vote of the union membership.

ii) Mediation

If the employer and the union representatives cannot come to an
agreement, some sort of mediation may be used. This can take the form of
formalized arbitration in which an independent, and hopefully objective, individual
(an arbitrator) helps the employer and union representatives reach a
compromise labor contract. Prior to the start of negotiations, the employer and
union representatives may agree to some form of arbitration which will take
place if the parties cannot come to an agreement on their own. If the
employer and union agree to binding arbitration, both must abide by the
ruling of the arbitrator.

iii) Strikes and Lockouts

If an agreement on a labor contract cannot be reached, the workers may
strike and withhold all labor services. Likewise, the employer can lock out
the workers, shut down production, and withhold all employment. Often,
workers are simultaneously on strike and are locked out by employers. The
difference is trivial.
Both sides lose from either a strike or lockout. Workers lose wages and
firms lose profits. Strikes and lockouts can also be quite costly to society if
production of some valuable good or service is stopped (e.g. police officers on
strike).
The threat of a strike or lockout is used as a negotiating tactic to secure
a better contract for one of the sides. Although strikes and lockouts often
receive considerable media attention (e.g. hockey or baseball players’ strikes),
very few labor disputes end in a strike or lockout. This is likely due to the
considerable cost to both sides from a strike or lockout. In addition, many
states prohibit public sector employees from striking and instead force some
sort of binding arbitration.

6.4 Rents and Unions

There is a connection between the product market structure and the effec-
tiveness of unions in securing better labor contracts. As we discussed in a
previous lecture, firms earn higher profits or rents in non-competitive product
markets (e.g. a monopolist firm). Through collective bargaining, a union-
ized workforce may be able to extract some of these rents in the form of
higher wages, benefits, or more jobs. In more competitive product markets,
competitive pressures may prevent a firm from making greater concessions
to workers. In a highly competitive product market, where profits are near
zero, a firm which pays its workers more than other firms may end up with
negative profits and be forced to shut down. Therefore, we would expect
that unions would be more effective and have higher membership in less
competitive industries.

An Example

One issue facing car manufacturers in the United States is the high cost of
American labor (high wages and especially high pension and health care ben-
efits) relative to foreign labor (e.g. relative to Japan, where national health
insurance effectively subsidizes health care insurance costs for Japanese car
manufacturers). In the period prior to the 1980s, when there was less foreign
competition in the automobile industry, unions representing American auto
workers were able to secure labor contracts with high wages and benefits.
Now, with more competitive pressure on the automobile industry, there is
greater resistance from American automobile firms to union demands. In
particular, American automobile firms argue that they now need to reduce
labor costs in order to compete with foreign manufacturers.

6.5 What Do Unions Want?

The objective of unions can vary according to their leadership and member-
ship.
Some potential objectives:

1) Some unions may favor higher wages and benefits over higher employment.

2) Other unions may favor spreading out jobs and work hours among a large
number of employees in exchange for lower wages.

3) Still other unions may sacrifice health and pension benefits to avoid layoffs.

4) Other unions value compressing the wage structure to ensure that all
workers, regardless of seniority or skill, receive similar wages.

6.6 Unions in the Public Sector

In the public sector, budgets for government services are set through some
political process. Unlike private firms that compete with other firms, public
sector firms (e.g. New York City) generally set their budgets with only
limited competitive pressures.
In this environment, unions representing public sector workers (e.g. teachers,
police officers, firefighters) have two basic ways to increase the benefits to their
workers:

1) Increase Total Budget

The union can lobby to increase the total government budget. If the share
of labor costs in the total budget remains constant, an increase in the total
budget would increase resources given to workers (through wages, benefits,
or employment).

2) Shift Resources to Labor

The total budget remains the same, but the union can lobby to increase
the share of the budget that is devoted to labor. This diverts resources from
other uses to pay labor costs. For example, a union representing teachers
could lobby for resources to be shifted from school building maintenance or
administrator pay to pay for higher teacher salaries or benefits.

Which of the two options public sector unions choose to pursue may have
different implications. If it is thought that a particular public service is un-
derfunded (e.g. not enough money for public schools), then increasing the
total budget for this service would be beneficial. However, shifting resources
from one public service input to another (e.g. from school building mainte-
nance to teacher salaries) may be a mis-allocation of resources. It is unclear
which inputs are most productive in improving the efficiency of the public
service. For example, would it improve student learning more if we increased
school building maintenance or teacher salaries?

7 Employment Contracting and Personnel Economics

7.1 Principal Agent Problems in Labor Markets

In the model presented so far, the labor market involves a simple transaction.
Workers sell an hour of their time for an hourly wage. In this model, the mar-
ket exchange is not very different from a farmer selling a bushel of wheat.
A more realistic model of the labor market would recognize that the em-
ployment relationship is more complex. The primary factor influencing the
complexity of labor market arrangements is the presence of principal-agent
problems.
“Principal-agent problem” is a term used in economics for a conflict that
arises between two economic actors, one of whom (the agent) acts on behalf of
the other (the principal). In the case of labor markets, the principal
is the owner or owners of the firm. The agents are the workers that the firm
employs. (Note: There could be multiple principal-agent problems within
a firm, as managers are agents for owners and are also the
principals for the workers they supervise.)
The root of the principal-agent problem is that the principal and agent
have different objectives. The firm (principal) wants the worker (agent) to
work as hard as possible at her job, but workers prefer lower effort.
There are four basic characteristics of principal agent problems in a labor
market.

1) Workers can decide to provide various amounts of effort on their jobs

This assumption implies that the worker is not selling the firm a homoge-
nous good (a labor hour). A worker who is not working hard is often referred
to as shirking.

2) Effort is not costless for workers

Workers do not like effort and would prefer low effort. If effort were
costless, then there would be no principal-agent problem, as all workers would
provide the maximum effort.

3) A worker’s effort is not perfectly observed by the firm

This assumption often arises because there is some monitoring cost to
observing exactly how much effort a worker is providing. For example, it
would be prohibitively costly for the firm’s owners to stand next to the worker
all day long in order to assess how hard the employee is working.
Even if there are no monitoring costs, there may be other factors that
affect a worker’s observed output which make it difficult for employers to
measure an individual worker’s effort. If workers work in teams, separating
one worker’s contribution from another worker’s is often not possible. If
the effort of teams of workers cannot be separated, workers would have the
incentive to free ride on the contributions of others by reducing their effort.

4) Workers are not full claimants on their effort


This assumption is really the definition of an agent. The worker receives
pay from the firm’s owners, but does not necessarily receive all of the firm’s
profits (or losses) that derive from the worker’s effort. A worker who is a
full claimant on her effort would be self-employed, by definition. Various
employment contracts have the goal of making workers nearly full claimants on
their effort. Another way to express this is that many employment contracts
try to align the objectives of the principal and agent as closely as possible.

7.2 Piece Rates

A piece rate employment contract bases a worker’s pay on her output or
“piece”. In this system, workers are not paid for their time (either by the
hour or a monthly salary). An example is paying a textile worker based on
how many shirts she sews together.

7.2.1 A Model of Piece Rates

A piece rate wage takes the following form:

$$w = \alpha + \beta q,$$

where w is the worker’s pay and q is the worker’s output. α and β are
the parameters of the pay scheme: β is the payment for each unit of
output, and α is how much the worker gets if she produces
nothing (q = 0). Here w is not a wage per hour, but the total pay the worker
receives.

Main question: What is the optimal α and β which will maximize the
firm’s profits?
To answer this, we need to make a few more modeling assumptions.

Output

The output of the worker depends on the worker’s choice of effort (e):

$$q = e.$$

Worker’s Problem

There is some cost to effort, $c(e)$, with $c'(e) > 0$. The utility function for
the worker is simple. Utility is increasing in wages and decreasing in effort.
The worker chooses the level of effort to maximize her utility. The worker’s
utility maximization problem is

$$\max_e \; w - c(e),$$

or

$$\max_e \; \alpha + \beta q - c(e).$$

Substitute for q:

$$\max_e \; [\alpha + \beta e - c(e)].$$

The first order condition for this problem is

$$\beta = c'(e).$$

β is the marginal benefit of additional effort. $c'(e)$ is the marginal cost
of additional effort. Note that α doesn’t matter because the worker gets α
even with no output or effort.
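The first order condition can be checked numerically. The Python sketch below
assumes a quadratic effort cost, c(e) = e²/2, which is not specified in the
notes. With this cost, c′(e) = e, so the condition β = c′(e) predicts that the
worker chooses e = β. The grid search and all names are illustrative only.

```python
def effort_cost(e):
    """Assumed quadratic cost of effort, so that marginal cost is c'(e) = e."""
    return 0.5 * e ** 2


def worker_utility(e, alpha, beta):
    """Pay alpha + beta*q minus the cost of effort, with output q = e."""
    return alpha + beta * e - effort_cost(e)


def best_effort(alpha, beta, grid_max=5.0, steps=5000):
    """Search a grid of effort levels for the one with the highest utility."""
    grid = [grid_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: worker_utility(e, alpha, beta))


# The chosen effort should track beta and should not depend on alpha.
print(best_effort(alpha=0.0, beta=0.8))
print(best_effort(alpha=2.0, beta=0.8))
print(best_effort(alpha=0.0, beta=1.5))
```

Changing α shifts utility up or down by the same amount at every effort level,
which is why it leaves the chosen effort unchanged.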

Firm’s Problem

The firm chooses α and β to maximize profits. The constraint on the
firm’s problem is that these values of α and β must be such that the worker
will choose to work. The total benefit of working (w) for the worker must
be at least as large as the total cost (c(e)). The constraint is

$$w \geq c(e),$$

or

$$\alpha + \beta e \geq c(e).$$

If $\alpha + \beta e < c(e)$, then the worker quits and does not work at all.

The firm’s maximization problem is

$$\max_{\alpha,\beta} \; q - w \quad \text{s.t.} \quad \alpha + \beta e \geq c(e),$$

where the price of output is normalized to 1, so total revenue is just q.
It is optimal for the firm to pay no more than the cost of effort. Therefore,
the constraint becomes

$$\alpha + \beta e = c(e).$$

Substituting the constraint into the objective function (so that w = c(e)) and
using q = e, the maximization problem is then

$$\max_{\alpha,\beta} \; e - c(e).$$

The first order conditions are based on how the level of effort responds
to changes in the wage parameters, α and β.

1) From the worker’s problem, we saw that the worker’s effort decision
does not depend on α

∂e
=0
∂α

2) For β,

$$\frac{\partial e}{\partial \beta} - c'(e) \frac{\partial e}{\partial \beta} = 0.$$

Re-arrange

$$\frac{\partial e}{\partial \beta} [1 - c'(e)] = 0.$$

Since effort responds to β, this condition holds when c'(e) = 1. From the
worker’s problem, we know that c'(e) = β. Thus the optimal β is β = 1.
To solve for α, we substitute the optimal β into the constraint α + βe =
c(e). With β = 1, this becomes

α + e = c(e).

The optimal level of α is c(e) − e.
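
The firm's solution can be illustrated with a small sketch. It assumes a hypothetical quadratic cost of effort, c(e) = e^2/2, so the worker's first order condition β = c'(e) gives e = β. The cost function, the grid of β values, and the function names are assumptions made only for illustration.

```python
def cost(e):
    """Hypothetical cost of effort, c(e) = e^2 / 2 (an assumption for illustration)."""
    return 0.5 * e ** 2


def effort_given_beta(beta):
    """Worker's first order condition beta = c'(e); with c'(e) = e this gives e = beta."""
    return beta


def firm_profit(beta):
    """Profit q - w when the participation constraint binds, so w = c(e) and q = e."""
    e = effort_given_beta(beta)
    return e - cost(e)


# Search over beta on a grid from 0 to 2 in steps of 0.01.
betas = [i / 100 for i in range(201)]
beta_star = max(betas, key=firm_profit)
e_star = effort_given_beta(beta_star)
alpha_star = cost(e_star) - e_star  # from alpha + beta * e = c(e) with beta = 1

print(beta_star, e_star, alpha_star)  # 1.0 1.0 -0.5
```

Under these assumptions α is negative: the worker keeps all of her output and pays the firm a fixed amount of 0.5.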

7.2.2 What Does this Mean?

The solution to this problem is for the firm to provide the worker the full
benefit of her effort and output. By tying compensation directly to output,
the firm (principal) has made the worker (agent) the full claimant on her
output. To see this note that with β = 1 and α = c(e) − e, the worker gets
paid
Wiswall, Labor Economics (Undergraduate), Lecture Notes 98

w = c(e) − e + q.

The wage is

w = c(e) − e + q = c(e),

since q = e.
With the optimal β and α, the worker is just indifferent between working
and not. The total benefit of working is w which is set equal to the total
cost of working c(e).
One way to interpret this payment scheme is that the firm is renting the
means of production to the worker for the price of −α. The firm then allows
the worker to collect all revenue from selling the product.
Some employment relationships are actually structured in exactly this
way. Taxicab drivers often have an arrangement with their taxicab firms to
rent the cab from the taxicab company. All the collected fares are kept by
the taxicab driver. The alternative contract is for the taxicab company to
pay its drivers an hourly wage and then require the drivers to return all fares
to the company at the end of their shift.
The piece rate employment scheme where drivers rent their cabs and keep
their fares is preferred for two reasons.

1) By allowing the drivers to keep all fares (making them full claimants
on their effort), the taxicab company ensures that drivers work as hard
as possible rather than snoozing on the side of the road.
2) Without allowing the drivers to keep all fares, the drivers would have
an incentive to hide some fares and negotiate a separate, “off the meter,”
deal with passengers. This is a type of monitoring problem.

7.2.3 Problems with Piece Rates

Piece rates are not used for all employment contracts. Some reasons why
piece rates may not work:

1) It is difficult to measure output

See discussion of monitoring costs above. How do we measure the output
of managers, for example?

2) Risk aversion

Making the worker the full claimant on her effort exposes the worker to
risk. Any factor that reduces output leads immediately to losses in wages for
the worker (e.g. a snowstorm stops all taxi cab traffic). To the extent that
workers are risk averse, they may require higher wages to compensate them
for these risks. Firms, because they are larger, may be better able to deal
with risk than individual workers.

3) Quantity/Quality Tradeoff

Piece rate wages encourage workers to focus on maximizing the current
quantity of their output. They may sacrifice investments in maintaining the
quality of their capital equipment (e.g. maintenance of machinery) in favor
of increasing the quantity of output. For example, taxicab drivers may drive
their cabs very hard and neglect maintenance. This is a problem to the
extent that quality is more difficult to measure than quantity. Otherwise the
firm could attach piece rate type wages to quality as well. In general, there
is a problem of providing workers incentives for one aspect of production if
there are multiple production tasks.

7.3 Bonuses and Profit Sharing

Bonuses and profit sharing share a similar structure with piece rates. The
intention of these compensation schemes is to provide incentives for worker
effort by basing pay on some measure of performance.
Bonuses are given to workers who meet either objective standards (e.g.
sales targets) or subjective evaluations (e.g. a supervisor’s evaluation). For
example, a bonus scheme could make the wage a function of sales:

w = α + βsales

Profit sharing ties compensation directly to the profits of the firm (π):

w = α + βπ

This type of compensation scheme is found mainly among top managers
of companies. One way to accomplish a profit sharing arrangement is for the
firm to pay part of a worker’s compensation in the form of ownership shares
of the firm (e.g. stocks or the option to purchase stocks, “stock options”).
The problems with these compensation schemes are similar to those with
piece rates. First, it may be difficult to measure a worker’s entire performance
based on a few objective or subjective measures. This is especially true when
workers work in teams. Second, basing compensation strictly on profits or
sales can cause workers to neglect other tasks. Some profit sharing plans
where compensation is tied to a firm’s stock price may encourage managers
to focus their effort on increasing the stock price and neglect the long term
productivity of the firm.

7.4 Tournaments

Another compensation scheme designed to increase worker effort is the
tournament. Some firms structure compensation in the same way that sports
tournaments are set up. Workers start at entry level positions. If the workers
perform relatively better than other entry level workers, they are promoted
to a higher paying position. This competition continues until the worker
reaches the top position in the firm. In principle, this is the same way a
basketball tournament is structured. In the playoffs, basketball teams play
a game and the winner moves on to play another game until only one team
remains.
Notice two crucial differences between this type of compensation scheme
and a piece rate, bonus, or profit sharing scheme.

1) In the tournament type of compensation system, absolute productivity or
performance does not matter. For the worker to be promoted, she only needs
to be better than her competition, her co-workers. Only relative performance
matters.

2) Unlike the other compensation systems, in the tournament scheme the
reward for higher effort now may not arrive until later when the individual
is promoted.

Tournaments have been used as one explanation for the highly unbalanced
compensation structure within firms. The highest paid employees within a
firm (e.g. CEOs) are often paid many, many times more than entry level
employees. The very high compensation for top managers may not reflect
their productivity, but instead may exist to motivate lower level employees.
The potential high pay for promotion in effect produces a tournament “prize,”
which encourages higher levels of effort among employees competing for this
prize.
A potential pitfall of this type of compensation is that it discourages
co-workers from collaborating, even if collaboration produces higher output. In
fact, because tournaments value only relative performance, there are incentives
to sabotage the performance of fellow employees. In these cases, the
competition engendered by tournaments may be counter-productive. Therefore,
tournament schemes are often used by firms where employees act independently,
such as independent salespeople responsible for separate sales
territories.

7.5 Deferred Compensation

Another compensation scheme changes how young workers (entry level) are
paid relative to older workers (supervisors or managers). In the deferred
compensation system, younger workers are paid less than their full marginal
revenue product. This compensation is deferred until later. Older workers
are paid more than their marginal revenue product.
The rationale for this scheme is that younger workers will increase their
effort to avoid being fired. If a younger worker provides low effort, she may
be fired and lose her future deferred compensation.
The problem with this type of compensation contract is that the firm
has an incentive to fire older workers before they can collect their deferred
compensation. If young workers know this, then they will not increase their
effort. This problem can be circumvented with strict rules about firings and
layoffs, which prevent the firm from arbitrarily eliminating expensive older
workers. In addition, if a firm wants to maintain a reputation as a good
employer in order to attract high quality workers, it will avoid reneging
on deferred compensation agreements.

7.6 Efficiency Wages

When firms pay their own workers more than the workers could receive at other
firms, these wages are called efficiency wages. The rationale for efficiency
wages is that a firm wants to make itself the most desirable employer. Workers
want to provide a high level of effort because they do not want to be fired
from their current, more desirable, firm. If every other firm provided the
same wage as a worker’s current firm, the cost of being fired would be low and
the worker might provide low effort.

An Efficiency Wage Model

The worker can either provide an effort of e = 0 (low effort) or e = 1 (high
effort). The cost of high effort is c. The cost of low effort is 0. If the worker
chooses effort of e = 0, she is fired with probability p. With probability 1 − p
she is not fired. If the worker chooses e = 1, she is never fired. p is assumed
to be less than 1. This implies that the firm cannot perfectly monitor the
worker’s effort and fire all the low effort employees.

All other firms offer the worker a wage of w. This is the worker’s outside
option. If she is fired, she receives w from an outside firm.

We want to show that it is optimal for the worker’s firm to provide an
efficiency wage of w∗ > w.

Worker’s Problem

If the worker chooses high effort, she receives

u(e = 1) = w∗ − c.

If the worker chooses low effort, she receives

u(e = 0) = pw + (1 − p)w∗ .

The worker will choose high effort if

w∗ − c ≥ pw + (1 − p)w∗ .

Subtracting (1 − p)w∗ from both sides gives pw∗ − c ≥ pw. Simplifying,

$$w^* \ge w + \frac{c}{p}.$$

Firm’s Problem
Assume the firm sets the efficiency wage at the minimum level to induce
high effort. The firm sets the wage at

$$w^* = w + \frac{c}{p}.$$

Is the firm receiving higher profits with this efficiency wage or with a
lower wage w?

Assume that if the worker provides high effort, the firm produces q ∗ . If
the worker provides low effort, the firm produces q < q ∗ .
Profits for the firm if the firm pays efficiency wages are

π ∗ = q ∗ − w∗ .

Profits for the firm if the firm pays w are

π = q − w.

The firm pays the efficiency wage if

q ∗ − w∗ > q − w.

Or

q∗ − q > w∗ − w.

This implies that as long as the benefit in output due to higher worker
effort (q∗ − q) is greater than the cost of higher wages (w∗ − w), the firm will
pay the efficiency wage.
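
The two conditions in this model can be checked with a few lines of code. The sketch below is only an illustration: the function names and all of the numbers (outside wage, effort cost, firing probability, output levels) are hypothetical and do not come from the notes.

```python
def efficiency_wage(outside_wage, effort_cost, p_fired):
    """Lowest wage that induces high effort: w* = w + c / p."""
    return outside_wage + effort_cost / p_fired


def prefers_high_effort(wage, outside_wage, effort_cost, p_fired):
    """Worker's condition: w* - c >= p * w + (1 - p) * w*."""
    high = wage - effort_cost
    low = p_fired * outside_wage + (1 - p_fired) * wage
    return high >= low


def firm_pays_efficiency_wage(q_high, q_low, wage, outside_wage):
    """Firm's condition: q* - w* > q - w."""
    return q_high - wage > q_low - outside_wage


# Hypothetical numbers: outside wage 20, effort cost 2, firing probability 0.25.
w, c, p = 20.0, 2.0, 0.25
w_star = efficiency_wage(w, c, p)  # 20 + 2 / 0.25 = 28.0

print(w_star)
print(prefers_high_effort(w_star, w, c, p))           # worker is just willing to work hard
print(firm_pays_efficiency_wage(40.0, 30.0, w_star, w))  # output gain 10 vs wage cost 8
print(firm_pays_efficiency_wage(35.0, 30.0, w_star, w))  # output gain 5 vs wage cost 8
```

Note that the wage premium w∗ − w equals c/p, so a lower probability of catching low effort requires a larger premium.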

8 Compensating Differentials

The theory of compensating differentials states that workers are paid to com-
pensate them for non-wage (non-pecuniary) aspects of jobs. Undesirable
jobs, such as jobs with high risk of injury (e.g. policeman), or jobs with
poor working conditions (e.g. coal miner), must offer higher wages in order
to get people to work in these jobs. On the other hand, jobs with desirable
characteristics, (e.g. good location, more benefits, or nice office furniture)
can offer lower wages and still attract workers.

8.1 A Model of Compensating Differentials

To think about a compensating differentials model, re-write our labor supply
utility function to include a Z variable. Z represents the non-pecuniary
characteristics of a job. The utility function is then U(c, l, Z). Z is a desirable
characteristic. Higher levels of Z increase the worker’s utility:

$$\frac{\partial U(c, l, Z)}{\partial Z} > 0$$

Suppose there are two firms. Firm A offers a wage wA and Firm B offers
a wage of wB . Both firms offer a package of non-pecuniary benefits: ZA and
ZB .
We make the following assumptions:

1) Firm A offers a higher level of non-pecuniary benefits: ZA > ZB.

2) The wage offers are the same: wA = wB.

Under these assumptions, all workers will receive higher utility from working
at Firm A. Therefore, every worker will choose to work at Firm A.
In order for Firm B to compete and attract workers, it must compensate
the workers for the higher utility workers receive by working at Firm A.
Suppose Firm B cannot adjust its level of non-pecuniary benefits (e.g. the
firm is located in a bad location and cannot move). But Firm B can adjust
its wage offer. In order to make workers just indifferent between Firm A and
Firm B, Firm B must increase its wage offer.
To calculate the amount Firm B needs to increase its wages, first we need
to calculate the indirect utility workers receive from a firm as a function of
wages and non-pecuniary benefits. Let V (ZA , wA ) be the amount of utility
workers receive from working at Firm A. Workers receive V (ZB , wB ) from
Firm B. The difference in utility is then

$$V(Z_A, w_A) - V(Z_B, w_B) > 0.$$

Firm B calculates that it needs to raise wages to $\tilde{w}_B > w_B$ in order to
equalize utility:

$$V(Z_A, w_A) - V(Z_B, \tilde{w}_B) = 0.$$

Note that the wage Firm B offers is now greater than the wage offer
at Firm A: $\tilde{w}_B > w_A$. The difference in the wage offers, $\tilde{w}_B - w_A$, is the
compensating differential for the difference in non-pecuniary characteristics,
$Z_A - Z_B$.
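
To make the equalizing wage concrete, the sketch below assumes a hypothetical indirect utility function, V(Z, w) = ln(w) + θ ln(Z), and solves for the wage at Firm B by bisection. The functional form, the value of θ, and the numbers are assumptions chosen for illustration; the notes do not specify V.

```python
import math


def indirect_utility(z, w, theta=0.5):
    """Hypothetical indirect utility V(Z, w) = ln(w) + theta * ln(Z)."""
    return math.log(w) + theta * math.log(z)


def equalizing_wage(z_a, w_a, z_b, lo=1e-6, hi=1e9, tol=1e-9):
    """Bisection for the wage at Firm B that solves V(Z_A, w_A) = V(Z_B, w)."""
    target = indirect_utility(z_a, w_a)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if indirect_utility(z_b, mid) < target:
            lo = mid  # utility at Firm B still too low: raise the wage
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: Firm A has four times the amenities of Firm B.
z_a, z_b, w_a = 4.0, 1.0, 30.0
w_b_tilde = equalizing_wage(z_a, w_a, z_b)
differential = w_b_tilde - w_a
print(round(w_b_tilde, 4), round(differential, 4))
```

With this particular V the answer also has a closed form, w_A (Z_A / Z_B)^θ, which can be used to check the bisection result.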

8.2 Using Compensating Differentials

Compensating differentials help to explain differences in wages across jobs
and firms. Another use of compensating differentials is to enable economists
to “price” characteristics that have no readily available prices. We could ask
people how much they value characteristics of jobs. For example, we could
survey people and ask them how much they would pay for their firm to move
to a more desirable location. But it may be the case that this information
would not be as accurate as using the actual observed behavior of people in
the market.
Consider two examples.

1) What is the value of a human life?

Two firms differ in the risk of death on the job. An example would be
two coal mines, one of which has a higher risk of fatal accidents. At
Firm A, the risk of death is pA: in one year, a worker has probability pA of
dying. The riskier mine, Firm B, has a risk of death of pB > pA. Firm A
pays an annual wage of wA. Firm B pays a wage of wB > wA.
If we think that these wage differentials are compensation for the higher
risk of death in Firm B, then we can use these differentials to calculate a
worker’s value of her life. Workers are willing to accept a pB − pA higher
probability of death in exchange for wB − wA dollars.

Some Numbers
Assume that the probability of death is 0.001 greater in Firm B.

pB = pA + 0.001

The annual salary differential is

wB = wA + $6,600

Assume each firm employs 1,000 workers. The difference in the probability
of death between the two firms implies that in a given year, on average, one more
worker will die in Firm B than in Firm A. Each worker in Firm B requires
$6,600 in additional wages each year as compensation for this risk. If
there are 1,000 workers, Firm B is essentially “buying” one human life for
1,000 × $6,600 = $6.6 million.
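
The arithmetic in this example can be written out as a short calculation. The numbers are the ones used above; the variable names are only illustrative.

```python
extra_risk = 0.001    # p_B - p_A, extra annual probability of death at Firm B
wage_premium = 6_600  # w_B - w_A, dollars per worker per year
workers = 1_000       # workers employed at each firm

# Expected number of additional deaths per year at Firm B.
expected_extra_deaths = extra_risk * workers

# Total extra wages Firm B pays per year to compensate for the risk.
total_premium = wage_premium * workers

# Implied value of one life: total premium per expected additional death.
implied_value_of_life = total_premium / expected_extra_deaths

print(expected_extra_deaths, total_premium, round(implied_value_of_life))
```

The same figure results from dividing the wage premium per worker by the extra risk per worker, so the number of workers does not affect the implied value.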

2) What is the value of school quality?

This is a non-labor market example. Some researchers have used differences
in house prices to examine how much parents value differences in school
quality.
Suppose there are two houses on opposite sides of the street: House A
and House B. The street forms a boundary between two school districts.
The children from House A attend School A and the children from House B
attend School B. The two schools have different levels of quality, measured
by the average test scores of the students who attend the schools. The test
scores for School A are TA and are greater than the test scores in School B:
TA > TB. The price of House A, PA, is higher than the price of House B:
PA > PB. A compensating differentials model would indicate that we can
use the difference in the prices of the houses, PA − PB, to measure how much
parents value the difference in test scores, TA − TB.

8.3 Problems with Compensating Differentials

1) People must know the actual differences.

The compensating differentials model assumes that the decision makers
know what the actual differences in characteristics are. The miners must
know what the difference in the risk of death is between the two mines.
The home owners must know the difference in test scores between the two
schools. If decision makers (workers, homeowners) are acting on inaccurate
information, then the compensating differentials we observe in wages and
home prices are meaningless.

2) Must control for all other differences.

Each of these examples is extremely simple. In reality, using compensating
differentials requires controlling for all other characteristics of the job or
good. Ideally, the compensating differentials model only applies to situations
where the jobs or goods being compared are exactly the same, except for the
one characteristic we are interested in. For example, mines could differ in the
risk of death and in other factors such as life insurance benefits. The riskier
mine could offer its workers more life insurance benefits than the other mine.
The two houses could be very different from each other: one could have 2,000
square feet and the other only 1,000 square feet. If we do not take these
other factors into account in some way, the results from a compensating
differentials analysis are biased.

9 Human Capital

9.1 Human Capital Overview

Human capital is capital embodied in people. Like physical capital (e.g. machines,
tools), human capital makes labor more productive. In general,
individuals with more human capital receive higher wages. Differences in the
level of human capital are a major reason why wages differ in the
population.
The major sources of human capital are formal schooling, on-the-job
training, and experience. However, there are many other types of human
capital investments. Investments in health can be considered investments in
human capital as healthy people are more productive. Another major source
of human capital is the abilities and talents individuals are born with. The
time and resources that parents spend caring for and raising their children
can also be considered investments in their children’s human capital.
The level of human capital investments in the population varies widely.
We will initially focus on schooling because it is a large source of human
capital and is relatively easy to measure. Today, about 25 percent of the
United States population (25 and older) have earned at least a college degree.
Less than 20 percent of the adult population has not finished high school.
Why individuals make different schooling decisions and the implications of
these choices for the labor market are the topics of this section.

9.2 Human Capital and Productivity

To bridge the labor demand model we discussed earlier with this section on
human capital, let’s assume the productivity of a worker is a function of
human capital. Human capital can come from many different sources, but
for simplicity, we can summarize the level of an individual’s human capital by
the variable S. Write the marginal product of each labor hour as a function
of human capital: $MP_h(S)$.
Because wages depend on labor productivity, wages are now a function
of human capital:

$$w(S) = p \cdot MP_h(S).$$

If we assume that more human capital makes workers more productive,

$$\frac{\partial MP_h(S)}{\partial S} > 0,$$

wages are therefore increasing in human capital:

$$\frac{\partial w(S)}{\partial S} > 0$$

9.3 A Model of Human Capital Investments

Consider a simple model of the decision to attend college. A recent high
school graduate has two choices: she can start work right away or attend
college. If she attends college, after 4 years, she graduates from college and
works with a college degree.
Assume that if she works without a college degree, she earns wH . If she
works with a college degree, she earns wC . A reasonable assumption is that
a college graduate earns more than a high school graduate: wC > wH. This
can be motivated by the assumption that college makes
workers more productive (college provides more human capital) or that
college signals other forms of unobserved human capital (a signalling model,
discussed below).
If attending college were costless, every person would choose to attend college.
However, there are some costs to attending college. One of the costs of
obtaining a college degree is that, while the college student is in school, she
cannot work (or at least cannot work as much as she would if not attending
college). There is an opportunity cost of college attendance from foregone
earnings while in college.

9.3.1 Present Value Calculations

In order to model the costs of foregone earnings, we need to consider how
individuals value present utility versus future utility. We use a concept called
present value. It is a fair assumption that present utility is valued more than
future utility. Said another way, future utility is discounted relative to present
utility. The rate at which future utility is discounted is called the discount
rate.

Discount rate

The discount rate is given by δ, with 0 ≤ δ ≤ 1. Higher δ indicates a higher
value placed on future utility relative to present utility. If δ = 1, future utility
has the same value as present utility. If δ = 0, future utility is completely
discounted, and the individual only cares about present utility.

Assume there are two periods. t = 1 is today and t = 2 is tomorrow. If
an individual receives U2 in period 2 (tomorrow), the present value of this
utility in period 1 (today) is

$$PV = \delta U_2$$

This indicates that U2 units of utility received tomorrow provide the
individual no more utility than receiving U2 units of utility today:

$$PV = \delta U_2 \le U_2$$

Interest Rates and Present Value

Present value calculations can also be used to value money received in the
future. In this case, the discount rate is a function of the interest rate r:
δ = 1/(1 + r). The present value of a $100 wage received in period 2 is

$$PV = \frac{\$100}{1 + r}$$

As an example, suppose the interest rate is 5 percent (r = 0.05). The present
value of $100 received in period 2 is

$$PV = \frac{\$100}{1 + 0.05} \approx \$95.24$$

One justification for discounting the future in this case is that money can
be invested today and earn interest collected tomorrow. If we could have
$100 today, put it in a bank, and earn 5 percent interest, we could have $105
tomorrow. Therefore, having the $100 in hand today is more valuable than
having the $100 tomorrow.
The present value calculation indicates that if we were given $95.24 today,
we could save this money and, at 5 percent interest, we would receive about
$4.76 in interest. By saving the $95.24 today, we would receive approximately
$100 tomorrow.
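
The interest rate example can be reproduced with a short calculation. The function name below is a hypothetical helper introduced for illustration.

```python
def present_value(amount, r):
    """Present value of `amount` received one period from now at interest rate r."""
    return amount / (1 + r)


pv = present_value(100, 0.05)
interest = pv * 0.05  # interest earned by saving the present value for one period

print(round(pv, 2), round(interest, 2), round(pv + interest, 2))  # 95.24 4.76 100.0
```

Saving the present value for one period at the same interest rate returns the original $100, which is what defines the present value.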

Discounting More Than One Period

To discount more than one period in the future, we use the following
equation

$$PV = U_1 + \delta U_2 + \delta^2 U_3 + \cdots + \delta^{T-1} U_T,$$

where the periods are t = 1, . . . , T and T is the last period. Notice that
the farther away the future period is, the more it is discounted.

9.3.2 Foregone Earnings and College Attendance

Using the present value framework, we can write the present value of working
after high school and not attending college. We assume a high school grad-
uate earns wH every year she works. The high school graduate works every
year until retirement in period T . Period t = 1 is the first year after high
school graduation. The present value of earnings if an individual chooses not
to attend college is

$$PV_H = w_H + \delta w_H + \delta^2 w_H + \cdots + \delta^{T-1} w_H.$$

Or using Σ notation, we can write this as

$$PV_H = \sum_{t=1}^{T} \delta^{t-1} w_H.$$

Assume a college graduate also works every year after college graduation
at a wage of wC . A college graduate spends the first four years in college and
earns no wages during these four years. The present value of earnings if the
individual chooses to attend college is

$$PV_C = 0 + \delta \cdot 0 + \delta^2 \cdot 0 + \delta^3 \cdot 0 + \delta^4 w_C + \delta^5 w_C + \cdots + \delta^{T-1} w_C.$$

Or using Σ notation, we can write this as

$$PV_C = \sum_{t=1}^{4} \delta^{t-1} \cdot 0 + \sum_{t=5}^{T} \delta^{t-1} w_C.$$

These expressions provide the discounted lifetime earnings from the two
choices. A high school graduate decides to attend college if the discounted
lifetime earnings from attending college are greater than the discounted life-
time earnings of not attending college: P VC > P VH .
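
The comparison of the two present values can be sketched numerically. The wages, the horizon T, and the two values of δ below are hypothetical numbers chosen for illustration, and the function names are not from the notes.

```python
def pv_high_school(w_h, delta, T):
    """PV_H = sum over t = 1..T of delta^(t-1) * w_H."""
    return sum(delta ** (t - 1) * w_h for t in range(1, T + 1))


def pv_college(w_c, delta, T, years_in_college=4):
    """PV_C: zero earnings in periods 1..4, then w_C in periods 5..T."""
    return sum(delta ** (t - 1) * w_c for t in range(years_in_college + 1, T + 1))


# Hypothetical values: wages in thousands of dollars, a 45-year horizon.
w_h, w_c, T = 30.0, 45.0, 45
for delta in (0.95, 0.85):
    attend = pv_college(w_c, delta, T) > pv_high_school(w_h, delta, T)
    print(delta, round(pv_high_school(w_h, delta, T), 1),
          round(pv_college(w_c, delta, T), 1), attend)
```

With these assumed numbers, the individual with δ = 0.95 attends college and the individual with δ = 0.85 does not, because the higher college wage arrives only after four years without earnings.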

9.4 A More Detailed Look at Human Capital Investments

As the model now stands, there is still no reason why some people attend
college and others don’t. If everyone has the same discount rate and earns
the same wage from either choice, then the present value of each choice
is the same for all individuals. Either everyone should be attending college
or no one should.
We can add several features to the model to make the model more realistic
and provide some reasons for differences in the behavior observed in the
population.

1) Direct Costs of Schooling

In the present model, the only cost to attending college is the opportunity
cost of foregone labor market wages. There are also direct costs of attending
college, such as the cost of tuition. Assume the dollar value of these costs
is D (D ≥ 0) per year. If D does not vary over the four years of college
attendance, the present value of college can be re-written as

$$PV_C = -\sum_{t=1}^{4} \delta^{t-1} D + \sum_{t=5}^{T} \delta^{t-1} w_C.$$
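
A small sketch shows how the direct cost D can change the college decision. The wages, the horizon, the value of δ, and the tuition levels are hypothetical numbers chosen for illustration.

```python
def pv_high_school(w_h, delta, T):
    """PV_H = sum over t = 1..T of delta^(t-1) * w_H."""
    return sum(delta ** (t - 1) * w_h for t in range(1, T + 1))


def pv_college_with_costs(w_c, d, delta, T, years_in_college=4):
    """PV_C = -sum_{t=1..4} delta^(t-1) * D + sum_{t=5..T} delta^(t-1) * w_C."""
    costs = sum(delta ** (t - 1) * d for t in range(1, years_in_college + 1))
    earnings = sum(delta ** (t - 1) * w_c
                   for t in range(years_in_college + 1, T + 1))
    return earnings - costs


# Hypothetical values: wages and tuition in thousands of dollars per year.
w_h, w_c, delta, T = 30.0, 45.0, 0.95, 45
pv_h = pv_high_school(w_h, delta, T)
for d in (0.0, 10.0, 40.0):
    pv_c = pv_college_with_costs(w_c, d, delta, T)
    print(d, round(pv_c, 1), pv_c > pv_h)
```

With these assumed numbers, college is chosen when D is 0 or 10 but not when D is 40, so a large enough direct cost reverses the decision even though the wages are unchanged.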

2) Ability and Wages

The model thus far assumes that all individuals earn the same college
wage (if they attend college) and the same high school wage (if they do
not attend college). However, we know that there is substantial variation
in wages even among the population of high school and college graduates.
The wages individuals receive may be a function of other sources of human
capital besides the human capital received from college. We can call these
other sources of human capital ability. The source of these abilities may
be from genetic endowments, from family or social environments, or from
pre-college schooling (e.g. high school quality). Use A as the scalar (one
dimensional) ability endowment. Individuals with higher levels of A have
more ability. We can re-write the high school and college wage as a function
of A: wH (A) and wC (A), where

$$\frac{\partial w_H(A)}{\partial A} \ge 0,$$

and

$$\frac{\partial w_C(A)}{\partial A} \ge 0.$$

The present value of high school and college is now a function of an
individual’s ability:

$$PV_H(A) = \sum_{t=1}^{T} \delta^{t-1} w_H(A),$$

and

$$PV_C(A) = -\sum_{t=1}^{4} \delta^{t-1} D + \sum_{t=5}^{T} \delta^{t-1} w_C(A).$$

Schooling decisions may vary because some people have higher abilities
and this affects the wages they receive.
Consider two possibilities:

i) Ability and the human capital obtained from a college degree are substitutes.
At a high ability level, Ahigh,

P VH (Ahigh ) > P VC (Ahigh )

In this case, high ability individuals receive sufficiently high wages without
a college degree that the additional benefit from a college degree is not greater
than the costs.

ii) Ability and college human capital are complements. At a high ability
level, Ahigh ,

P VC (Ahigh ) > P VH (Ahigh )

In this case, high ability individuals receive even higher wages with a
college degree than without a degree. The combination of a college degree
and high ability generates a large benefit to attending college relative to the
cost.

3) Tastes for Schooling

In addition to variation in the abilities people have, individuals may also
have different tastes for schooling. Some people may enjoy attending school
more (or dislike it less) than others. To capture this idea, we need to
depart from the idea that individuals only consider the present value of their
lifetime earnings. Similar to the compensating differentials model, assume
there is a variable Z that proxies for an individual’s taste for schooling. Higher
values of Z indicate a greater taste for schooling. In this model, individuals
have a utility function which weighs the pecuniary aspects of schooling (wages
and direct costs of schooling) against the non-pecuniary aspects given by Z.
Write these utility functions as

$$U_H(Z, PV_H),$$

and

$$U_C(Z, PV_C).$$

An individual chooses to attend college if

$$U_C(Z, PV_C) > U_H(Z, PV_H).$$

Consider two cases:

i) If an individual has a high enough preference for schooling, she may attend
college even if the present value of college earnings is lower than the present
value of earnings from working immediately.

ii) On the other hand, an individual with a high enough dislike for schooling
may not attend college even if the pecuniary rewards are higher for college
attendance.

4) Credit Constraints

Another important factor which may influence the decision to attend
college is whether an individual can borrow to pay for the costs of college.
In the model presented so far, there are direct costs to schooling because of
tuition and the opportunity costs of not working while in school. The model
implicitly assumes an individual can pay for these costs. However, as we
know, many people borrow against future income in order to pay for college.
A potentially important source of heterogeneity in the population is that
some individuals may be more credit constrained than others. Wealthy individuals
(e.g. college students with wealthy parents) may have more money
available to pay for college. Less wealthy individuals may have to borrow
from banks or the government (student loans). These differences in wealth
essentially impose higher credit constraints on some individuals than on oth-
ers. It could be the case that an individual would receive a higher utility
from attending college than not, but because they are credit constrained,
they cannot afford the costs of college. Many government policies in the
United States (e.g. financial aid for low income students, subsidies for public
universities) are designed to alleviate credit constraints.

5) Uncertainty about Future Returns

Finally, it is important to consider that the wages paid to high school and
college graduates may change over time. This is important for the decision to
attend college to the extent individuals cannot predict with perfect certainty
what the wage rates will be in the future. There is uncertainty about future
returns to school. We can add a term to reflect this uncertainty to our college
and high school wages:

$$w_C(A, t) = w_C(A) + \varepsilon_C(t)$$

$$w_H(A, t) = w_H(A) + \varepsilon_H(t)$$

The wage college graduates receive, wC (A, t), now varies by t. It is composed
of two parts. wC (A) is the part of the wage college graduates of ability
A receive in all periods. This is the non-stochastic part of the wage. εC (t)
is the stochastic part of the wage. This term can be negative or positive in
different time periods. If in some period t the wage for college graduates
is particularly high (e.g. because of high demand for college educated labor),
then εC (t) will be positive and large.
The same interpretation can be made for the high school wage wH (A, t).
It is also composed of a non-stochastic part (wH (A)) and a stochastic part
εH (t).
Because the wages are uncertain for human capital investments, there is
some risk to investment in human capital. This risk can be thought of in
the same way as the risk involved in investments in any other asset (e.g. the
risk that the price of a stock in a firm might change in the future). Here
the return to the college degree asset is the wage college graduates receive
relative to the wage non-college graduates receive. To the extent that some
people are more risk averse than others, the risk averse people may prefer
not to invest in a college degree. The risk averse people want to avoid the
risk that the return on this asset falls in the future.
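
To make the role of the shocks concrete, here is a minimal sketch in Python. All numbers and variable names are hypothetical and chosen only for illustration; they do not come from the notes. The sketch computes the realized college wage premium wC (A, t) − wH (A, t) in three periods under assumed shock values.

```python
# Hypothetical non-stochastic wages for a given ability level A.
w_college_base = 60_000.0      # wC(A)
w_highschool_base = 40_000.0   # wH(A)

# Assumed shocks eps_C(t) and eps_H(t) for periods t = 1, 2, 3.
eps_college = [5_000.0, -8_000.0, 1_000.0]
eps_highschool = [1_000.0, -2_000.0, 0.0]


def realized_premium(eps_c, eps_h):
    """College wage minus high school wage in one period."""
    return (w_college_base + eps_c) - (w_highschool_base + eps_h)


premiums = [realized_premium(c, h) for c, h in zip(eps_college, eps_highschool)]
print(premiums)
```

With these assumed shocks the premium is 20,000 on average across periods without shocks but moves between 14,000 and 24,000 once shocks are added, which is the sense in which the return on the college degree is risky.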
An additional source of uncertainty is that an individual who attends
college may not finish. This dropout risk adds further risk to the investment
in college human capital.

Summary: Why Does Schooling Vary?

The choice of attending college may vary in the population for several
reasons:

1) Some people discount the future more than others. Individuals who value
the present relatively more than the future would choose to work right away
rather than waiting until after college graduation.

2) Some people have higher abilities than others and this affects the relative
returns to college human capital.

3) Some people have a higher taste for schooling than others.

4) Some people are credit constrained and cannot afford to attend college.

5) The returns to college are uncertain and some individuals are more risk
averse than others. These risk averse individuals choose not to invest in this
risky asset (the college degree).

9.5 Life Cycle Human Capital Investment

Human capital investments take place throughout the life cycle, from birth
to death. But, there is generally a distinct life cycle pattern where most
of the investment happens at the beginning of life and less human capital
investment occurs toward the end of life. One strong reason for this pattern
is that the earlier a human capital investment is made, the more periods
there are for an individual to accumulate the return on the investment. An
individual who graduates from college at age 22 has many more years to
earn the higher wages associated with a college degree than an individual
who graduates from college at age 40.
On the other hand, there are at least three main reasons why workers
continue to make human capital investments throughout their life cycle.

1) Human capital depreciates.

Just as physical capital depreciates over time (e.g. tools become dull),
human capital may also depreciate. Many students in this course have at
least partially forgotten their high school math skills. Their accumulated
stock of math human capital has depreciated. These students need to make
new investments in math human capital in order to replace this depreciated
portion of their human capital.

2) Returns to human capital change over time.

Consider a secretary trained prior to the advent of personal computers
in the 1980s. When the secretary was young in the 1970s, the return to
investments in computer skills was low. She therefore decided not to invest
in computer human capital. In the 1980s, as personal computer technology
was introduced into the economy, the return to these investments increased.
With the new higher return to investments in computer human capital, the
secretary now chooses to make these investments. The reason
she needs to make these investments later in life is that she could not predict
when she was young that the returns to computer human capital would rise.
In this case, technological change creates uncertainty in the returns to human
capital investments. This technological change affects the returns to human
capital and causes workers to make new investments in human capital to
update their skills.

3) People update their preferences.

When individuals are young they may have different preferences for edu-
cation and occupations than when they get older. An individual when young
may have had a strong preference for engineering and made investments in
engineering human capital to work in an engineering occupation. As the indi-
vidual got older, her preferences changed and she now has a strong preference
for teaching. Because of this change in preferences, which was not perfectly
predicted in youth, she must now make new investments in teaching human
capital later in life.

9.6 Human Capital Production Functions

Human capital can be thought of as an output which is produced using some
combination of inputs. Like the production functions we discussed earlier, the
human capital production function describes the technology of how different
combinations of inputs create the output of human capital:

h = f (purchased inputs, time, prior human capital)

Higher levels of purchased inputs, time, or prior human capital are ex-
pected to increase the production of human capital h.
Consider the output of labor economics human capital from this course.
A student’s inputs into the production of labor economics human capital
are her purchased inputs (tuition to pay the instructor’s salary, buy books),
time (time spent in lecture and studying), and prior human capital. For this
course, the student’s output of labor economics human capital may depend
on how much economics and mathematical human capital the student has
accumulated prior to the start of the course. Prior human capital may also
depend on intelligence and other abilities a student is born with. All of
these inputs would be expected to increase the production of labor economics
human capital.

9.7 Complementarities in Human Capital Production

Complementarities in human capital investments occur when past investments
in human capital increase the productivity of future human capital
investments. Complementarities in human capital production may have important
implications for policy interventions. They imply that early investments in
childhood human capital development (e.g. pre-school programs) may be very
cost effective relative to later interventions (e.g. job training for adults).

An Example: Cognitive Development

A large literature (much of it outside economics) documents that the later
production of human capital through schooling and job training is
affected by the level of an individual’s cognitive ability (one imperfect measure
would be IQ). These cognitive skills are partly determined by genetics and
partly determined by early human capital investments (e.g. investments by
parents in family environments and early schooling). The child development
literature indicates that after about age 10, the level of cognitive ability is
fixed and cannot be altered by later human capital investments. Prior to
this age, investments in the human capital of children can influence cognitive
development. To the extent that cognitive ability increases the productivity
of later human capital investments, investments in cognitive human capital
are complementary with later human capital investments. This feature of
the human capital production function suggests the importance of investing
in early cognitive development.

9.7.1 A Model of Human Capital Investments with Complementarities

There are two periods of decision making: t = 1 is childhood (all ages from
birth to age 18), t = 2 is young adulthood (e.g. ages 18-24).
Investments in human capital can be made in either of the two periods.
Call these investments in each period I1 and I2 . This investment term can
be thought of as the combination of the purchased inputs and time inputs.
Adult human capital (the human capital accumulated after period t = 2)
is a function of all of the prior human capital investments:

h = g(I1 , I2 ).

9.7.2 Two Extreme Cases

Case 1: Perfect Complements

If investments in period 1 and period 2 are perfect complements, adult
human capital is given by this production function:

h = min{I1 , I2 }.

In the perfect complements case, early investments in I1 must be matched
equally by later investments in I2 . Any investment in period 2 larger than
the period 1 investment (I2 > I1 ) is wasted. Low investments in period 1
cannot be remedied with higher investments in period 2.

Case 2: Perfect Substitutes

In the case of perfect substitutes, adult human capital is produced as

h = aI1 + (1 − a)I2 .

If this is the human capital production function, later investments in
period 2 can substitute for low first period investments and make up for
deficits in period 1 investments. At a = 1/2, there is no real difference
between this two period childhood model and a one period model.
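
The two extreme cases can be compared numerically. The following is a minimal Python sketch; the function names and the investment levels are illustrative assumptions, not part of the notes. It holds the period 1 investment fixed at a low level and raises the period 2 investment to see whether adult human capital responds.

```python
def h_perfect_complements(i1, i2):
    """Adult human capital when I1 and I2 are perfect complements."""
    return min(i1, i2)


def h_perfect_substitutes(i1, i2, a=0.5):
    """Adult human capital when I1 and I2 are perfect substitutes.

    a is the weight on the period 1 investment (0 <= a <= 1).
    """
    return a * i1 + (1 - a) * i2


low_early = 2.0  # hypothetical low period 1 investment
for late in (2.0, 6.0):  # hypothetical period 2 investments
    print(
        late,
        h_perfect_complements(low_early, late),
        h_perfect_substitutes(low_early, late),
    )
```

Under perfect complements, tripling the period 2 investment leaves h unchanged at 2, while under perfect substitutes (with a = 1/2) it raises h from 2 to 4.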

9.7.3 Implications for the Timing of Interventions

If the human capital production function exhibits perfect complementarity,
an important implication is that inequalities in early human capital (e.g.
cognitive development) at the point of late childhood and adulthood cannot
be overcome with later human capital investments. This would suggest that
subsidies for later human capital (e.g. secondary schooling, job training,
and post-secondary financial aid) would be ineffective in reducing income
inequality.
Even if there is only a limited degree of complementarity in human capital
investments, early investments are preferred over later investments. As evi-
dence for the presence of complementarities, there is a considerable amount
of research that finds that early interventions (e.g. subsidies for pre-school)
have a large effect on adult outcomes (e.g. wages earned as adults).
An important issue to consider is why some parents do not make sufficient
investments in early childhood human capital. Since there is a high return to
these investments, less wealthy parents should be borrowing against future
income to pay for these investments. The reason some parents do not borrow
to pay for these investments is likely some form of a credit constraint. That
is, parents cannot borrow against their children’s future income.

9.8 A Signalling Model of Human Capital Investments

Signalling models of human capital explain investments in observable human
capital (e.g. formal schooling) as an attempt by workers to signal unobserved
human capital to firms (e.g. the worker is intelligent, hard working, i.e. high
ability). The difference between a signalling model of human capital and the
prior models is that the observable human capital may not affect a worker’s
productivity at all (∂MPh (S)/∂S = 0).
However, because firms believe that higher observed (unproductive) hu-
man capital signals unobserved (productive) human capital, workers with
higher levels of observed human capital receive higher wages. The signalling
model then provides a distinct explanation for the positive correlation between
earnings and observed human capital.

9.8.1 Model Setup

There is one representative firm and two types of workers: Type 1 and Type
2.

Worker Types and Productivity

Type 1 workers have lower unobserved human capital than Type 2 work-
ers. As a consequence, Type 1 workers have a lower productivity than Type
2 workers.
Type 1 worker’s marginal product is q1 .
Type 2 worker’s marginal product is q2 > q1 .

Information

The proportion of workers of Type 1 is p > 0. The proportion of Type 2 is
1 − p. The firm knows p (the distribution of types in the population). But
the firm does not know the type of any given worker before a wage offer is
accepted. The firm learns the type of the worker only after the firm has paid
the worker her wage.
In general, workers have no incentive to reveal their type. Type 1 workers
always have the incentive to lie and claim they are high productivity Type 2
workers. In this environment, the firms cannot directly learn anything about
a worker’s productivity.

Observed Human Capital (Schooling)


Observed human capital (schooling) is indexed by s, where a higher s indicates
more schooling. The marginal cost (in $) of investments in schooling
differs by type.
Type 1 marginal cost of schooling is c1 > 0.
Type 2 marginal cost of schooling is c2 > 0.
Type 1 has a higher marginal cost of schooling: c1 > c2 .

One reasonable example of these assumptions would be that the smarter
Type 2 workers have higher productivity and have a lower cost of obtaining
schooling because school work is easier for them.

9.8.2 A Signalling Equilibrium

Workers of each type choose their investment in observed human capital (s).
Firms choose a wage policy as a function of s: w(s). A signalling equilibrium
is where the beliefs of the firm/employer match the actual actions of the
workers. Thereby the firm’s beliefs are confirmed by the workers’ choices,
and there is no reason for the firm to change its beliefs.
Note that the signal in the model is schooling s and the signalling cost in
the model is the cost of investing in schooling (c1 and c2 ).
To find a signalling equilibrium in this model, we first choose a wage
setting policy (w(s)). We then examine how workers choose their level of s
in response to this wage policy.
Suppose the firm believes that there is some schooling level s∗ such that
all workers who choose s < s∗ are the less productive Type 1 workers, and
all workers who choose s ≥ s∗ are the more productive Type 2 workers. The
firm sets wages according to these beliefs (see graph):

w(s) = q1 if s < s∗

w(s) = q2 if s ≥ s∗

How do the workers respond? Because of the structure of the wage
setting policy and because there is some cost to schooling (signalling cost),
a worker will either set s = 0 and receive w(s) = q1 or set s = s∗ and receive
w(s) = q2 .
In order for the workers’ actions to confirm the firm’s beliefs, it must be
the case that Type 1 workers choose s = 0 and Type 2 workers choose s = s∗ .
Type 1 chooses s = 0 over s = s∗ if

q1 − 0 > q2 − c1 s∗ .

Type 2 chooses s = s∗ over s = 0 if

q2 − c2 s∗ > q1 − 0.

Solve to find values of s∗ which satisfy both inequalities:


$$\frac{q_2 - q_1}{c_1} < s^* < \frac{q_2 - q_1}{c_2}$$

Any values of s∗ that satisfy this condition are signalling equilibria.
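
As an illustration, the following Python sketch computes the range of equilibrium schooling thresholds and checks the two incentive conditions directly. The function names and the parameter values (q1 = 10, q2 = 20, c1 = 5, c2 = 2) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def signalling_interval(q1, q2, c1, c2):
    """Lower and upper bounds on s* for a signalling equilibrium."""
    if not (q2 > q1 and c1 > c2 > 0):
        raise ValueError("model requires q2 > q1 and c1 > c2 > 0")
    return (q2 - q1) / c1, (q2 - q1) / c2


def is_signalling_equilibrium(s_star, q1, q2, c1, c2):
    """Check both incentive conditions for a candidate threshold s*."""
    type1_chooses_zero = q1 - 0 > q2 - c1 * s_star
    type2_chooses_s_star = q2 - c2 * s_star > q1 - 0
    return type1_chooses_zero and type2_chooses_s_star


lower, upper = signalling_interval(q1=10.0, q2=20.0, c1=5.0, c2=2.0)
print(lower, upper)
print(is_signalling_equilibrium(3.0, 10.0, 20.0, 5.0, 2.0))
```

With these numbers any threshold strictly between 2 and 5 works. A threshold of 1 fails because Type 1 workers would find it worthwhile to acquire the schooling, and a threshold of 6 fails because Type 2 workers would no longer find the signal worth its cost.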

9.8.3 Some Comments

1) The model generates a positive correlation between observed human capital
and wages, but observed human capital has no effect on productivity.

2) The key part of this model which allows firms to use observed human
capital to signal worker type is that the cost of obtaining schooling is higher
for the less productive workers. In general, a signalling equilibrium requires
that the signalling cost must be negatively correlated with unobserved pro-
ductivity.

3) There are multiple equilibria

There are a number of s∗ that satisfy the condition above and generate
a signalling equilibrium.

4) Different welfare implications for different values of s∗

Although for any s∗ , which satisfies the condition above, agents act ra-
tionally, the particular level of s∗ has different welfare implications. Notice
that an increase in s∗ increases the cost of signalling for Type 2 workers but
does not affect Type 1 workers (they continue to choose s = 0). Therefore,
the lowest s∗ creates the greatest total welfare among all s∗ .

5) Directly testing for type might be less costly

Investments in human capital (s) appear to have no social return because
they do not directly affect a worker’s productivity. But there is some benefit
to signalling because these signals help solve an information problem (types
are unobserved by firms). However, there may be less socially costly ways for
firms to distinguish between workers and allocate workers correctly. Firms
may be able to use some sort of test to provide information about a worker’s
type.

9.8.4 A No Signalling Regime

It is possible that workers may prefer a no signalling regime in which there
are no productive signals.
Consider a model where s provides no signal of type. In this model, firms
base their wage offers on the expectation of worker types. All workers receive
the same wage:

w = pq1 + (1 − p)q2 .

Type 1 workers would prefer this no signalling wage since the share of Type 2
workers is positive (1 − p > 0) and q2 > q1 . Type 2 workers also may prefer
the no signalling wage if

q2 − c2 s∗ < pq1 + (1 − p)q2

Notice several comparative statics about this relationship. The benefit
to Type 2 workers of the no signalling wage relative to the signalling wage
is greater:

i) The smaller the difference in productivity between types (q2 − q1 ).

ii) The higher the cost of signalling (c2 ).

iii) The higher the fraction of Type 2 workers in the economy (1 − p).

iv) The higher the signalling point s∗ .

Given that both types of workers may prefer a no signalling wage, but
firms in general would prefer to receive signals, there may be some scope for
workers of different types to collude. Type 1 and Type 2 workers could both
agree to choose s = 0, thus eliminating the value of the signal and returning
to the no signalling wage.
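
The comparison between the two regimes can be sketched in Python as follows. The function names and parameter values are hypothetical; the formulas are the pooled wage and the Type 2 condition stated above.

```python
def pooled_wage(p, q1, q2):
    """Wage paid to every worker when schooling carries no signal."""
    return p * q1 + (1 - p) * q2


def type2_prefers_no_signalling(p, q1, q2, c2, s_star):
    """True if Type 2's signalling payoff is below the pooled wage."""
    signalling_payoff = q2 - c2 * s_star
    return signalling_payoff < pooled_wage(p, q1, q2)


# Hypothetical values: q1 = 10, q2 = 20, c2 = 2, s* = 3.
few_type1 = type2_prefers_no_signalling(p=0.25, q1=10.0, q2=20.0, c2=2.0, s_star=3.0)
many_type1 = type2_prefers_no_signalling(p=0.75, q1=10.0, q2=20.0, c2=2.0, s_star=3.0)
print(few_type1, many_type1)
```

In this example the signalling payoff for Type 2 is 20 − 2 × 3 = 14. When only a quarter of workers are Type 1 the pooled wage is 17.5, so Type 2 workers prefer no signalling. When three quarters are Type 1 the pooled wage falls to 12.5 and Type 2 workers prefer to signal, consistent with comparative static iii).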

9.9 Non-Schooling Human Capital

Another major source of human capital is the human capital individuals
obtain from work experience and post-schooling training. We can think of
this type of human capital as rather heterogeneous, as it is often specific
to firms, occupations, and industries. Non-schooling human capital can be
informally obtained on-the-job through learning by doing. Or, it can be
obtained more formally through training courses the worker or the firm pays
for.
In general, evidence for the importance of non-schooling human capital
is more difficult to obtain given that much of it is not measured in major
surveys. This is in contrast to schooling human capital (e.g. college measured
by a college degree), which is relatively easier to measure.
To examine the importance of non-schooling human capital, economists
use indirect evidence. In particular, we know that wages continue to increase
with age even after formal schooling has been completed. We can write wages
W as a function of age:

W = f (age)

This age-earnings profile can be represented by a graph with age on
the horizontal axis and wages or earnings on the vertical axis. Wages are
increasing in age:

$$\frac{\partial W}{\partial \text{age}} = \frac{\partial f(\text{age})}{\partial \text{age}} > 0.$$

There is some evidence that this pattern is due to investments in non-
schooling human capital individuals continue to make as they get older. The
next section examines investments by firms in their workers’ non-schooling
human capital through on-the-job training.

9.10 On-the-Job Training

Training that is provided by firms is called on-the-job training. Firms invest
in the human capital of their workers because it makes the workers more
productive. This increase in labor productivity can in some situations increase
the firm’s profits.

9.10.1 Two Types of On-the-Job Training

Firms can make two types of investments in a worker’s human capital.

1) General Human Capital

General human capital is defined as human capital that is productive in
more than one firm. For a welder employed by the automaker GM (her firm),
general human capital would be welding skills the welder could use in many
firms and industries.

2) Firm Specific Human Capital

Firm specific human capital is only productive in a given firm. For a
welder employed by GM, firm specific human capital would be the knowledge
the welder has in welding together the unique parts for GM cars. This human
capital is not productive in another firm.

We know that firms will generally provide firm specific human capital
to their workers. However, we do not know whether they will also pro-
vide general training. The extent to which firms provide general training
to their workers is potentially one of the main determinants of the stock of
human capital in an economy. If firms do not provide general training and
workers cannot finance these investments themselves (e.g. because of credit
constraints), there may be a rationale for government subsidized training.

9.10.2 Will Firms Provide General Training to their Workers?

The answer depends on the structure of the labor market.

1) Perfectly Competitive Labor Market

If the labor market is competitive, firms will not invest in general skills
because workers will leave after the training. A firm that pays for the gen-
eral training of their workers makes these workers more productive with all
firms. If the firm does not increase the worker’s wages to reflect this higher
productivity, the worker will leave the current firm and work in another firm.
Even if the firm raises the worker’s wages to reflect the increased productiv-
ity, the cost to the firm of the higher wages completely offsets the benefit to
the firm of higher productivity. The firm therefore has no incentive to invest
in general human capital.
In contrast, a firm invests in firm specific skills because these skills are
only productive with the current firm.

Workers may still invest in general skills on the job by paying for the train-
ing themselves through lower wages. As an example, apprentices typically
are paid less than their marginal revenue product during the apprenticeship
period.

2) Non-Competitive Labor Market

As we discussed earlier, in a non-competitive labor market there are
frictions or imperfections in the labor market which impose costs on workers
moving to another firm. This provides the firm an incentive to invest in the
general skills of their workers since the firm can capture at least part of the
surplus associated with the higher productivity of workers with their current
firm relative to other firms.
To illustrate the point, take the extreme case of slavery. Slavery amounts
to a complete labor market friction in which the firm owns the worker, and the
worker cannot leave the employer. In this example, the firm has the incentive
to make optimal investments in the worker’s human capital through general
training. Since the worker cannot leave, the firm can capture all of the
benefits from the investment in the worker’s general human capital.

9.10.3 Model of On-the-Job Training

There are two periods in the model and two agents: a worker and an employer
(firm).

Period 1
During the first period, there is a joint decision by the worker and em-
ployer on the investment in general human capital. For simplicity, there is
no firm specific human capital in the model.
General human capital is measured by τ ≥ 0.
Production in period 1 is normalized to 0.
The worker receives a first period wage of W .

Period 2
During the second period, the worker either stays with the employer and
gets a second period wage w(τ ) or quits the firm and receives her outside
option from another firm (v(τ )). Note that the second period wages and
outside options are a function of τ . It is reasonable to assume that

$$\frac{\partial w(\tau)}{\partial \tau} \geq 0$$

$$\frac{\partial v(\tau)}{\partial \tau} \geq 0$$

Outside Option

What the worker receives if she quits is key. The outside option depends
on the market structure of the labor market:

i) Perfectly competitive labor market: worker gets v(τ ).


ii) Non-competitive labor market: worker gets v(τ ) − Δ, where Δ > 0. Δ
represents the cost to the worker of the friction in the labor market. The
presence of these frictions distinguishes a non-competitive labor market from
a competitive labor market.

Exogenous Separation

With probability q, the worker and firm receive an exogenous adverse
shock, which causes them to separate (e.g. a recession causes the firm to
shut down). q is a measure of expected turnover in the model, aside from
voluntary quits by the worker.

Optimal General Training Level

The worker produces f (τ ). This function is increasing in the training
level τ .

$$f'(\tau) = \frac{\partial f(\tau)}{\partial \tau} \geq 0$$

The cost of acquiring general human capital is c(τ ). c(τ ) is increasing in
τ .

$$c'(\tau) = \frac{\partial c(\tau)}{\partial \tau} \geq 0$$

The optimal training level is τ ∗ > 0. τ ∗ is the solution to this first order
condition:

$$c'(\tau^*) = f'(\tau^*).$$

τ ∗ is the optimal training level which sets the marginal cost of training
equal to the marginal benefit.
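
As a worked illustration, the sketch below solves the first order condition numerically in Python. The functional forms are assumptions made only for this example and are not given in the notes: output is f (τ ) = 4√τ , so f′(τ ) = 2/√τ , and the cost is c(τ ) = τ , so c′(τ ) = 1. With a concave f and a linear c, the marginal benefit falls toward the marginal cost as τ rises, so a simple bisection search finds the crossing point.

```python
import math


def marginal_benefit(tau):
    """f'(tau) for the assumed production function f(tau) = 4 * sqrt(tau)."""
    return 2.0 / math.sqrt(tau)


def marginal_cost(tau):
    """c'(tau) for the assumed cost function c(tau) = tau."""
    return 1.0


def solve_optimal_training(lo=0.01, hi=100.0, tol=1e-10):
    """Bisection on f'(tau) - c'(tau), which is decreasing in tau here."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if marginal_benefit(mid) > marginal_cost(mid):
            lo = mid  # marginal benefit still exceeds marginal cost: train more
        else:
            hi = mid  # marginal cost is at least the marginal benefit: train less
    return (lo + hi) / 2.0


tau_star = solve_optimal_training()
print(round(tau_star, 6))
```

The numerical answer agrees with the analytical one for these assumed forms: setting 2/√τ = 1 gives τ ∗ = 4.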

Perfectly Competitive Labor Market

The distinguishing feature of perfect competition is that if the worker
quits, she gets the outside offer of v(τ ) = f (τ ), i.e. Δ = 0. That is, her
outside option reflects exactly her productivity with the current firm.
Here firms do not invest in training even though the optimal training level
is τ ∗ > 0. Because firms cannot lower worker wages to pay for training, the
worker immediately leaves for her outside option as soon as any training is
received. Anticipating this, firms do not make any investments in general
training.
However, workers can receive training if they choose to pay the full costs
of training themselves through lower wages. Workers are the full residual
claimant on their training investment because they can move freely between
firms. The worker therefore chooses the optimal training level τ ∗ . The first
period wage is

W = −c(τ ∗ )

Non-Competitive Labor Market

Frictions in the labor market imply that general training has less value
in an outside firm than in the current firm: v(τ ) < f (τ ). This means there
is a surplus of f (τ ) − v(τ ) for the worker and firm to bargain over.
A useful way to summarize this is that the wage the worker receives in
the second period is

w(τ ) = v(τ ) + β[f (τ ) − v(τ )],

where β is the bargaining power of the worker. If β = 1, the worker gets
all of the surplus. If β = 0, the firm gets all of the surplus.
As long as β < 1 and q < 1, a sufficient condition for positive training
investment by the firm is f′(0) > v′(0). The basic intuition is that an
investment in training (at the point of τ = 0) increases profits for the current firm
more than it increases the outside option of the worker.
As we might expect, higher bargaining power of the worker or higher quit
rates decrease the incentive for firms to invest in training.
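
The bargaining wage and the firm's share of the surplus can be sketched in Python. The function names and numbers are hypothetical. The wage formula is the one given above; the expected firm share additionally assumes, for illustration only, that the firm earns nothing from the worker if an exogenous separation occurs.

```python
def second_period_wage(f_tau, v_tau, beta):
    """w(tau) = v(tau) + beta * [f(tau) - v(tau)]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    return v_tau + beta * (f_tau - v_tau)


def expected_firm_share(f_tau, v_tau, beta, q):
    """Firm's expected second period surplus.

    Assumption for this sketch: with probability q the match ends and
    the firm receives nothing; otherwise it keeps f(tau) - w(tau).
    """
    return (1 - q) * (f_tau - second_period_wage(f_tau, v_tau, beta))


# Hypothetical values: output 100 with the current firm, outside option 80.
print(second_period_wage(100.0, 80.0, beta=0.0))
print(second_period_wage(100.0, 80.0, beta=1.0))
print(expected_firm_share(100.0, 80.0, beta=0.5, q=0.25))
```

With β = 0 the worker is paid her outside option of 80 and the firm keeps the whole surplus of 20. With β = 1 she is paid 100 and the firm keeps nothing. Raising either β or q lowers the firm's expected share, which is why both reduce the firm's incentive to pay for training.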
An important point is that the profit maximizing investment in general
training by the firm is generally less than the optimal training investment
(τ ∗ ). This is because the firm is not the full residual claimant on the increased
productivity from training investments. Two reasons for less than optimal
training investments:
1) The worker receives part of the surplus back in wages, depending on the
bargaining power parameter.

2) The risk of exogenous separation (q > 0) means that the firm
could lose its investment in the worker’s general skills.

9.10.4 Sources of Labor Market Frictions

1) Search and Mobility Costs

See the earlier discussion of search and mobility costs. If
these types of labor market frictions exist, the worker has an incentive to stay
with the current firm and the firm has an incentive to invest in the general
training of the worker.

2) Asymmetric Information I

Outside firms have less information about the quality and content of
training a worker receives. Therefore, outside firms may not pay the full value
of the training a worker receives. A third party credential or certification
system could be designed to solve this problem.

3) Asymmetric Information II

The current firm learns about the imperfectly observed ability of workers.
This information cannot be readily shared with other firms. An adverse
selection (“lemons market”) problem occurs because a worker sends a signal
of low ability by being laid off by the firm (we expect low ability workers
to be fired or laid off first). This allows firms to keep and train high ability
workers and pay them lower wages than they would otherwise receive if all
firms had the same information.

4) Complementarity between General and Firm Specific Skills

If firm specific skills make general skills more productive, investments in
firm specific skills increase the return to general skills with the current firm
more than with outside firms.

10 Econometrics Review

10.1 Random Variables and Data

A random variable is a placeholder for an unknown experimental outcome.
One example is flipping a coin. Let X be the random variable for flipping a
coin. Before we flip the coin and know the result of the experiment, we know
that X can either be heads or tails.

X = {“heads”, “tails”}

Each X outcome (“heads” and “tails”) is a realization of the random
variable X.
If the coin is fair, the probability that X is heads is 0.5. The probability
that X is tails is also 0.5. The following is the probability distribution for the
random variable X.

pr(X = “heads”) = 0.5

pr(X = “tails”) = 0.5

This probability distribution tells us the probability of each and every
outcome or realization for the random variable X.
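
One way to picture a probability distribution is as a lookup table from outcomes to probabilities. The Python sketch below is only an illustration; the names `coin_distribution` and `pr` are made up for this example.

```python
# Probability distribution for the coin flip random variable X.
coin_distribution = {"heads": 0.5, "tails": 0.5}


def pr(outcome, distribution=coin_distribution):
    """Probability that the random variable takes the given value."""
    return distribution.get(outcome, 0.0)


# A valid distribution has non-negative probabilities that sum to one.
probabilities = list(coin_distribution.values())
is_valid = all(p >= 0 for p in probabilities) and abs(sum(probabilities) - 1.0) < 1e-12

print(pr("heads"), pr("tails"), is_valid)
```

Any outcome that is not a possible realization of X, such as the coin landing on its edge in this model, is assigned probability zero.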
A more relevant example is a survey of labor hours. Assume 5 individuals
are surveyed. Each survey respondent is asked how many hours they worked
last week. Let Yi be the random variable for the hours of work individual
i reports. There are 5 random variables corresponding to each respondent’s
hours of work:

Y1 = 0, Y2 = 40, Y3 = 20, Y4 = 47, Y5 = 38

This collection of hours of work information is our data. Each individual’s
report is a data observation. There are 5 observations in our data.

A non-random variable is a constant. This is simply a number which has
no probability distribution (or a “degenerate” distribution).

10.2 Descriptive Statistics

Functions of data are called statistics. For this labor hours data, we can
calculate these descriptive statistics (or sample statistics):

1) Sample Median

This is the middle value when the hours are sorted (0, 20, 38, 40, 47).

med(Y ) = 38

2) Sample Mean or Sample Average:


$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i = \frac{1}{5}(0 + 40 + 20 + 47 + 38) = \frac{145}{5} = 29,$$

where N is the number of observations in the sample.

The symbol used for the sample mean, $\bar{Y}$, is read “Y bar”.

3) Sample Variance

Sample variance tells us something about the dispersion of data around
the mean.

$$S_Y^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2$$

$$S_Y^2 = \frac{1}{5}\left[(0 - 29)^2 + (40 - 29)^2 + (20 - 29)^2 + (47 - 29)^2 + (38 - 29)^2\right] = \frac{1448}{5} = 289.6$$

Variance is never negative: $S_Y^2 \geq 0$. The higher the variance, the more
dispersion there is in the data.

4) Sample Standard Deviation

The sample standard deviation is the square root of the variance.

$$S_Y = \sqrt{S_Y^2} = \sqrt{289.6} \approx 17.$$
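
The four statistics above can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the five survey responses from the text. Because the notes divide by N in the variance formula, the sketch uses the population versions `pvariance` and `pstdev`; many textbooks and software defaults instead divide by N − 1, which would give a larger number.

```python
import statistics

# Hours worked last week reported by the 5 survey respondents.
hours = [0, 40, 20, 47, 38]

sample_median = statistics.median(hours)
sample_mean = statistics.mean(hours)
sample_variance = statistics.pvariance(hours)  # divides by N, as in the notes
sample_std_dev = statistics.pstdev(hours)      # square root of the variance

print(sample_median, sample_mean, sample_variance, round(sample_std_dev, 2))
```

The results match the hand calculations: a median of 38, a mean of 29, a variance of 289.6, and a standard deviation of about 17.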

10.3 Types of Variables

1) Discrete Variables

Discrete variables take on a countable number of values. For example,
the variable for a coin flip is discrete because it can only take on two values:
“heads” and “tails.”

2) Dummy Variables

Dummy variables are discrete variables which take on only one of two
values: 0 or 1. The value of the dummy variable is intended to indicate some
characteristic. For example, we can summarize the gender of the survey
respondents by a dummy variable D. D = 1 if the person is male, and
D = 0 if the person is female.

3) Continuous Variables

Continuous variables take on a potentially infinite number of values.
Hours worked last week is a continuous variable. This variable can take
any value from 0 to 168.

10.4 Populations and Samples

Samples are drawn from populations. In our hours of work survey data, the
population is all adults in the United States. The sample of 5 individuals is
drawn from this population of over 100 million people.

The major distinction between a sample and a population is that statistics
can be calculated for a sample, but these same statistics are in general
unknown for the population. We can calculate the mean hours worked for
our sample of 5 survey respondents, but we can never know the mean hours
worked in the entire population. The exception to this is experiments where
the population probability distribution is known (e.g. the coin flip
experiment). For these known experiments, we can calculate population statistics.
The reason we collect data on a sample of individuals rather than the
population is that it is in general infeasible to collect information on the entire
population. We hope that samples tell us something about the population.
We use sample statistics to infer information about the population. The
connection between the sample and population is therefore called inference.

10.5 Sampling Schemes

There are two main ways to randomly sample from a population:

1) Random Sample

A random sample collects information from people in a population chosen
at random. A practical way to do this for our hours of work survey would be
to pick random telephone numbers and survey each person who picks up the
phone. The reason we use random samples rather than non-random samples
is that a random sample of individuals better represents the population. If
we were to only survey lawyers or only survey 30 year old males, we would
obtain biased statistics for the population we are interested in (all American
adults).

2) Stratified Sample

Stratified sampling is random sampling for particular sub-groups or strata.
The population is divided into sub-groups by characteristics such as race,
gender, occupation, or residence. Within each sub-group a random sample
is collected. The reason many surveys are collected as stratified samples is
to ensure there are adequate numbers of people with a given set of
characteristics in the sample. For example, a non-stratified random sample of
1,000 people in the United States may only contain a handful of people from
New York. If we want reliable statistics for New Yorkers, we should instead
stratify our sampling on location and collect at least 100 observations from
New York.
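The two sampling schemes can be sketched in Python as follows. Everything here is hypothetical: the population of 10,000 people, the 2 percent share living in "NY", and the helper name `stratified_sample` are assumptions made only to illustrate the idea.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 people, 2 percent of them in "NY".
population = [{"id": i, "state": "NY" if i % 50 == 0 else "other"}
              for i in range(10_000)]

# 1) Random sample: 1,000 people chosen at random from the whole population.
simple = rng.sample(population, 1_000)

# 2) Stratified sample: a random sample of fixed size within each stratum.
def stratified_sample(people, key, n_per_stratum, rng):
    strata = {}
    for person in people:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for group, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[group]))
    return sample

stratified = stratified_sample(
    population, "state", {"NY": 100, "other": 900}, rng)

ny_simple = sum(p["state"] == "NY" for p in simple)
ny_stratified = sum(p["state"] == "NY" for p in stratified)
print(ny_simple, ny_stratified)
```

The stratified draw always contains exactly 100 New Yorkers, while the count in the simple random sample varies from draw to draw and is typically much smaller.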

10.6 Consequences of Random Sampling

Drawing a random sample means that the randomly collected observations
have two properties:

i) Observations are independent.


Independent observations have no relationship to each other; they were
collected randomly. We did not survey our five closest friends, but instead
randomly chose people to survey.

ii) Observations are identically distributed.


If observations have an identical distribution, they are drawn from the
same population. Each observation reflects the population it was drawn from
and has the same population distribution. There is no fundamental difference
between one randomly collected observation and another.

10.7 Expectations

The expectation of a random variable is essentially the population equivalent
of the sample mean (or the sample mean is the sample analog of the
population expectation). Except in the case of controlled experiments (e.g. the
coin flip experiment), the expectation of a random variable cannot be known
because it is a function of the unknown population distribution of a random
variable.
For a discrete random variable X, the expectation of this random variable
is

$$E(X) = \sum_{x} \text{pr}(X = x)\, x,$$

where x (lower case) is one of the outcomes or realizations of the random
variable X, pr(X = x) is the probability distribution of X, and $\sum_x$ indicates
that we are summing over all of the possible outcomes or realizations of X.

(Note: For continuous variables, expectations are defined using integrals
because there are an infinite number of realizations of the random variable.
For a continuous variable X, $E(X) = \int x f(x)\,dx$, where f(x) is the continuous
probability distribution function for X.)

Properties of the Expectations Operator

For random variables X and Y and constants a and b,

i) E(b + aY ) = b + aE(Y )

ii) E(aY + bX) = aE(Y ) + bE(X)

iii) E(Y X) ≠ E(Y )E(X) (in general).
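As an illustration, the definition of E(X) and the operator properties can be checked for the coin flip. The sketch below assumes a numeric coding that is not in the notes (X = 1 for heads, X = 0 for tails) and uses exact fractions; the constants a and b are arbitrary.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E[g(X)] for a discrete distribution given as {outcome: probability}."""
    return sum(p * g(x) for x, p in dist.items())

# Assumed coding of the coin flip: X = 1 for heads, X = 0 for tails.
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}

e_x = expectation(coin)                       # E(X) = 1/2
a, b = 3, 10
lhs = expectation(coin, lambda x: b + a * x)  # E(b + aX)
rhs = b + a * e_x                             # b + aE(X)

# Property iii) with Y = X: E(XX) differs from E(X)E(X).
e_xx = expectation(coin, lambda x: x * x)     # 1/2
print(e_x, lhs == rhs, e_xx, e_x * e_x)       # 1/2 True 1/2 1/4
```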

10.8 Relationship between Variables: Variance, Covariance, Correlation, and Independence

10.8.1 Variance

The variance of a random variable is the population equivalent of the sample
variance.

V (X) = E[(X − E(X))(X − E(X))].

Sometimes the notation V ar(X) is also used for variance.

Properties of the Variance Operator


For random variables X and Y and constants a and b,

i) V (b) = 0. Variance of a constant (non-random variable) is zero.

ii) $V(b + aX) = a^2 V(X)$.

iii) $V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,cov(X, Y)$
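Properties ii) and iii) also hold exactly for the divide-by-N sample variance and covariance, so they can be checked numerically. The sketch below uses made-up numbers for X, Y, a, and b; nothing in it comes from the notes' data.

```python
def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Divide-by-N covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

def var(x):
    return cov(x, x)  # cov(X, X) = V(X)

# Hypothetical numbers chosen only for illustration.
x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 2.0]
a, b = 2.0, -3.0

# Property iii): V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab cov(X, Y)
lhs = var([a * xi + b * yi for xi, yi in zip(x, y)])
rhs = a**2 * var(x) + b**2 * var(y) + 2 * a * b * cov(x, y)

# Property ii): adding the constant b does not change the variance.
shifted = var([b + a * xi for xi in x])

print(abs(lhs - rhs) < 1e-9, abs(shifted - a**2 * var(x)) < 1e-9)
```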

10.8.2 Covariance

Covariance indicates the relationship between two variables:

cov(X, Y ) = E[(X − E(X))(Y − E(Y ))].

Notice that the covariance between the same variables is the variance:
cov(X, X) = V (X). The covariance operator therefore has the same proper-
ties as the variance operator.

Interpretation of Covariance:

i) If cov(X, Y ) = 0, there is no linear relationship between X and Y .

ii) If cov(X, Y ) > 0, there is a positive relationship between X and Y (higher
X is associated with higher Y ).

iii) If cov(X, Y ) < 0, there is a negative relationship between X and Y
(higher X is associated with lower Y ).

Properties of the Covariance Operator

For random variables, X and Y , and constants a and b.

i) cov(X, X) = V (X)

ii) cov(X, Y ) = cov(Y, X)

iii) cov(a, b) = 0. Covariance of constants is zero.

iv) cov(a, X) = 0. Covariance of a constant and a random variable is zero.

10.8.3 Correlation

Correlation is a units free measure of a relationship between two variables.


The correlation coefficient between X and Y is written corr(X, Y ).

$$corr(X, Y) = \frac{cov(X, Y)}{V(X)^{1/2}\, V(Y)^{1/2}}$$

Correlation is bounded between −1 and 1.

$$-1 \leq corr(X, Y) \leq 1.$$

Correlation corr(X, Y ) has the same sign and interpretation as covariance
cov(X, Y ):

i) corr(X, Y ) = 0 indicates no correlation between X and Y .

ii) corr(X, Y ) = 1 indicates a perfect positive correlation between X and Y .


iii) corr(X, Y ) = −1 indicates a perfect negative correlation between X and Y.

10.8.4 Independence

Independent random variables have no relationship with each other. The
notation for independence is ⊥. If X and Y are independent, we write
X ⊥ Y . Note that X ⊥ Y implies Y ⊥ X.
All independent variables have a covariance and correlation of 0.

X ⊥ Y ⇒ cov(X, Y ) = 0

X ⊥ Y ⇒ corr(X, Y ) = 0

However, variables with a covariance and correlation of 0 are NOT
necessarily independent.
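A standard textbook counterexample (not taken from these notes) makes this concrete: let X take the values −1, 0, and 1 with equal probability and let Y = X². The sketch below uses the identity cov(X, Y) = E(XY) − E(X)E(Y) and exact fractions.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with equal probability, and Y = X**2.
outcomes = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in outcomes)            # E(X) = 0
e_y = sum(p * x**2 for x in outcomes)         # E(Y) = 2/3
e_xy = sum(p * x * x**2 for x in outcomes)    # E(XY) = E(X^3) = 0
cov_xy = e_xy - e_x * e_y                     # cov(X, Y) = 0

# Y is completely determined by X, so E[Y | X = x] changes with x.
e_y_given_x = {x: x**2 for x in outcomes}

print(cov_xy, e_y, e_y_given_x[0], e_y_given_x[1])  # 0 2/3 0 1
```

The covariance is zero, yet E[Y | X = 0] = 0 and E[Y | X = 1] = 1 both differ from E[Y] = 2/3, so X and Y are not independent.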

10.9 Conditional Expectations

Means, variances, and covariances can be calculated conditionally. E[Y |X] is
the expectation of Y conditional on X (mean of Y conditional on X).
Conditioning on a random variable transforms the conditioning random variable X
into a constant. The conditional mean is now a function of the conditioning
variable X.

An Example

D is a dummy variable for male gender (D = 1 for men, and D = 0 for
women) and Y is hours worked last week. E[Y |D = 1] is the expectation of
hours worked for men. E[Y |D = 0] is the expectation of hours worked for
women.
We can write this conditional expectation as a function:

E[Y |D] = β0 + β1 D,

where β0 and β1 are population parameters.


The conditional expectations for each value of D are

E[Y |D = 1] = β0 + β1 ∗ 1 = β0 + β1

and

E[Y |D = 0] = β0 + β1 ∗ 0 = β0

Independence and Conditional Expectations

It is important to note that if any two random variables X and Y are
independent (X ⊥ Y ), then E[Y |X] = E[Y ] and E[X|Y ] = E[X]. The
intuition is that if X and Y are not related to each other, then conditioning
on the other will not affect their expectation.

In our example above, D ⊥ Y implies that gender (D) has no relationship
with Y . Hence, the conditional expectations of Y for men and women are
the same. If D ⊥ Y , men and women work the same number of hours on
average, E[Y ].
If D ⊥ Y , then

E[Y |D = 1] = E[Y ],

and

E[Y |D = 0] = E[Y ].

Conditional Variances and Covariances

Conditional variances and covariances have the same interpretation. For
example, V (Y |D = 1) is the variance of hours worked for men. V (Y |D = 0)
is the variance of hours worked for women.

10.10 Sample Analog

We can construct sample analogs for each of the population concepts defined
above. The sample analog is the sample statistic calculated using a data
sample of size N .

i) Mean, Average

Population: E[Y ].

Sample analog: $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$

ii) Variance

Population: V [Y ].

Sample analog: $S_Y^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2$.

iii) Standard Deviation

Population: $V[Y]^{1/2}$.

Sample analog: $S_Y = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2}$.

iv) Covariance

Population: cov(X, Y ).

Sample analog: $S_{XY} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})$.

v) Correlation

Population: corr(X, Y ).

Sample analog: $\widehat{corr}(X, Y) = \frac{S_{XY}}{S_X S_Y}$.

vi) Conditional Expectations

Population: E[Y |X].


Sample analog: we calculate the sample mean of Y for each value of X.
For our dummy variable example above, the sample analog of E[Y |D = 1] is

$$\bar{Y}_{male} = \frac{1}{N_{male}} \sum_{i=1}^{N_{male}} Y_i,$$

where $\bar{Y}_{male}$ is the sample average of Y for the males in our sample, and
$N_{male}$ is the number of males in our sample.
The sample analog of E[Y |D = 0] is

$$\bar{Y}_{female} = \frac{1}{N_{female}} \sum_{i=1}^{N_{female}} Y_i,$$

where $\bar{Y}_{female}$ is the sample average of Y for the females in our sample,
and $N_{female}$ is the number of females in our sample.

vii) Conditional Variance

Population: V [Y |X].

Sample analog: we calculate the sample variance of Y for each value of X.
For our dummy variable example above, the sample analog of V [Y |D = 1] is

$$S_{Y,male}^2 = \frac{1}{N_{male}} \sum_{i=1}^{N_{male}} (Y_i - \bar{Y}_{male})^2.$$
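The sample analogs in this list can be computed directly. The sketch below reuses the five reported hours of work; the gender dummy values are hypothetical (the notes do not report them) and the function names are illustrative.

```python
import math

hours = [0, 40, 20, 47, 38]   # Y from the hours of work example
male = [1, 0, 1, 1, 0]        # D: hypothetical values, not from the notes

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance S_XY with the 1/N convention of these notes."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

def corr(x, y):
    """Sample correlation S_XY / (S_X * S_Y)."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

def conditional(y, d, value):
    """Sample mean and variance of y among observations with d == value."""
    group = [yi for yi, di in zip(y, d) if di == value]
    return mean(group), cov(group, group)

mean_male, var_male = conditional(hours, male, 1)
mean_female, var_female = conditional(hours, male, 0)

print(round(mean_male, 2), mean_female, var_female)  # 22.33 39.0 1.0
print(round(corr(hours, male), 2))
```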

10.11 Relationship between Sample and Population Statistics

From our sample, we calculate sample statistics. We would like these
statistics to be as close as possible to the unknown population statistics. Consider
the difference between the population and sample mean.
Call the unknown population mean µ. µ is NOT a random variable.
Unless we survey the entire population, we cannot know the value of µ.
Our sample mean is $\bar{Y}$. The sample mean is a function of our sample
data. Each data observation is a random variable. Because the sample mean
is a function of random variables, it is also a random variable. The sample
mean is one estimator of the unknown population mean.

Is the sample mean a “good” estimator of the unknown population mean?

10.11.1 Bias

One criterion for deciding whether an estimator is “good” is its bias.

$$bias = E(\bar{Y}) - \mu.$$

We would like this bias to be as close to 0 as possible.


What is the expectation of the sample mean?

$$E(\bar{Y}) = E\left[\frac{1}{N} \sum_{i=1}^{N} Y_i\right]$$

Using the properties of expectations, we can write this as

$$E(\bar{Y}) = \frac{1}{N} \sum_{i=1}^{N} E(Y_i)$$

Because of random sampling, each observation has the same distribution
and the same population mean. For all observations i, E(Yi ) = µ.
Substituting this,

$$E(\bar{Y}) = \frac{1}{N} \sum_{i=1}^{N} \mu = \frac{1}{N}(\mu + \cdots + \mu)$$

Since µ is a constant, we can re-write this as

$$E(\bar{Y}) = \frac{1}{N} \cdot N \cdot \mu = \mu$$

This shows that the bias of the sample mean is 0 in a random sample.
The sample mean is an unbiased estimator of the population mean.

10.11.2 Variance of the Estimator

Another desirable property of an estimator is low variance. High variance
implies that for different samples we might obtain widely different sample mean
estimates. High variance therefore reduces the precision of our estimator.

The variance of the sample mean estimator is

$$V(\bar{Y}) = V\left(\frac{1}{N} \sum_{i=1}^{N} Y_i\right)$$

Using the variance operator properties,

$$V(\bar{Y}) = \left(\frac{1}{N}\right)^2 [V(Y_1) + V(Y_2) + \cdots + V(Y_N) + 2cov(Y_1, Y_2) + 2cov(Y_1, Y_3) + \cdots + 2cov(Y_N, Y_{N-1})]$$

Because of random sampling, observations are independent. For any two
observations i and j, we know Yi ⊥ Yj . Independence implies that all of the
covariance terms are 0. Since the covariance terms are 0, we can re-write the
variance of the sample mean as

$$V(\bar{Y}) = \left(\frac{1}{N}\right)^2 \sum_{i=1}^{N} V(Y_i)$$

Because of random sampling, all of the observations i have an identical
distribution. This implies that $V(Y_i) = \sigma^2$ for all i. $\sigma^2$ is the unknown
population variance. Like µ, $\sigma^2$ is NOT a random variable.
Substituting,

$$V(\bar{Y}) = \left(\frac{1}{N}\right)^2 N \sigma^2.$$

Simplifying,

$$V(\bar{Y}) = \frac{1}{N} \sigma^2.$$

This expression indicates that the variance of our sample mean estimator
is a function of the unknown population variance σ 2 .
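For a known experiment, both results (zero bias and a variance of σ²/N) can be verified by listing every possible sample. The sketch below assumes a fair coin coded as 0 and 1, so that µ = 1/2 and σ² = 1/4, and a sample size of N = 3; these choices are illustrative only.

```python
from fractions import Fraction
from itertools import product

values = [0, 1]            # assumed coding: tails = 0, heads = 1
mu = Fraction(1, 2)        # population mean of one flip
sigma2 = Fraction(1, 4)    # population variance of one flip
N = 3

# Every possible sample of N independent fair flips is equally likely.
samples = list(product(values, repeat=N))
prob = Fraction(1, len(samples))

sample_means = [Fraction(sum(s), N) for s in samples]
e_ybar = sum(prob * m for m in sample_means)
v_ybar = sum(prob * (m - e_ybar) ** 2 for m in sample_means)

print(e_ybar, v_ybar)  # 1/2 1/12
```

The expectation of the sample mean equals µ, and its variance equals σ²/N = (1/4)/3 = 1/12.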

10.11.3 Estimating the Population Variance

Without an estimator for the unknown population variance, we cannot
estimate the variance of the sample mean estimator. A sensible estimator for
the population variance $\sigma^2$ is the sample variance of Y , $S_Y^2$.

$$S_Y^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2$$

Substituting this into the expression for the variance of the sample mean
(above), we obtain an estimator for the variance of the sample mean:

$$\widehat{V}(\bar{Y}) = \frac{1}{N} S_Y^2.$$

10.11.4 Standard Error for the Sample Mean

The square root of the estimated variance of the sample mean is called the
standard error for the sample mean.

$$SE(\bar{Y}) = \sqrt{\widehat{V}(\bar{Y})} = \frac{1}{\sqrt{N}} \sqrt{S_Y^2}$$

This statistic provides a measure of the precision of the sample mean
estimator. Low standard error indicates a highly precise estimator.

Returning to our hours of work example

In our hours of work survey data there are N = 5 observations. The
sample mean for our hours of work survey is $\bar{Y} = 29$. The sample variance
of Y is $S_Y^2 = 289.6$, and $\sqrt{S_Y^2} \approx 17$.
The standard error of our sample mean estimator is then

$$SE(\bar{Y}) = \frac{1}{\sqrt{5}} \cdot 17 \approx 7.60.$$

10.12 Inference for the Sample Mean

Our goal is to learn something about the population mean from a random
sample. We know that because of random sampling, the sample mean is
an unbiased estimator of the population mean. But the sample mean is a
random variable. There is some probability that the sample mean could be
very different from the unknown population mean.
We would like to have an idea of the probability distribution of the sample
mean. If we know the distribution of the sample mean, we can calculate
the probability that the sample mean is very different from the unknown
population mean. We cannot know exactly what the distribution of the
sample mean estimator is. But, under some assumptions (embodied in a
Central Limit Theorem), we can approximate this distribution.

10.12.1 Confidence Intervals

Using the approximation of the distribution of the sample mean, we can
construct the following confidence interval:
With 95 percent probability, the unknown population mean µ is within
this interval:

$$\bar{Y} - SE(\bar{Y}) \cdot 2 \leq \mu \leq \bar{Y} + SE(\bar{Y}) \cdot 2,$$

where 2 is the critical value, which depends on the confidence level of the
interval. The critical value for the 95 percent confidence level is 2. (Note:
the critical value is not exactly 2, but we’ll just use 2 in this course.)

For our example, this confidence interval is

29 − 7.60 ∗ 2 ≤ µ ≤ 29 + 7.60 ∗ 2

13.8 ≤ µ ≤ 44.2

It is possible µ is not equal to the sample mean of 29. However, given
the approximation of the distribution of the sample mean, we can say with
95 percent confidence that µ is in the interval between 13.8 and 44.2.
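The standard error and the interval can be recomputed from the raw data. The sketch below is illustrative (variable names are assumptions) and uses the rounded critical value of 2. Because it does not round the standard deviation to 17 first, the standard error comes out as 7.61 rather than 7.60, and the interval is the same to one decimal place.

```python
import math

hours = [0, 40, 20, 47, 38]
n = len(hours)
y_bar = sum(hours) / n
s2 = sum((y - y_bar) ** 2 for y in hours) / n   # sample variance, 1/N version

se = math.sqrt(s2 / n)       # standard error of the sample mean
critical_value = 2           # the rounded 95 percent value used in these notes
lower = y_bar - critical_value * se
upper = y_bar + critical_value * se

print(round(se, 2), round(lower, 1), round(upper, 1))  # 7.61 13.8 44.2
```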

Comments on Confidence Intervals

1) Smaller confidence intervals indicate greater precision in the estimator.

2) A higher number of observations (higher N ) reduces the size of the
confidence interval.

3) A lower standard error ($SE(\bar{Y})$) reduces the size of the confidence interval.

4) If we want to be even more confident about the range of values the un-
known µ could take on (e.g. raise the confidence level to 99 percent), then
we need to increase the critical value and the confidence interval becomes
wider. For example, the 99 percent confidence interval has a critical value of
about 2.6.

10.12.2 Hypothesis Tests

Confidence intervals can be used to conduct tests of specific hypotheses. A
hypothesis which is often tested is whether the unknown population mean is
0. If the value of 0 falls within the 95 percent confidence interval, then we fail
to reject the hypothesis (at the 95 percent confidence level) that µ = 0. In
our example, 0 is outside the calculated 95 percent confidence interval. We
can therefore say that we reject the hypothesis that µ = 0 at this confidence
level.
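The decision rule can be written as a small function. This is a sketch of the rule described above, with a hypothetical function name `rejects` and the rounded critical value of 2; the second hypothesized value (30) is made up for contrast.

```python
def rejects(null_value, estimate, standard_error, critical_value=2):
    """True if null_value lies outside estimate +/- critical_value * SE."""
    lower = estimate - critical_value * standard_error
    upper = estimate + critical_value * standard_error
    return not (lower <= null_value <= upper)

# Values from the hours of work example: sample mean 29, SE about 7.60.
print(rejects(0, 29, 7.60))    # True: 0 is outside [13.8, 44.2]
print(rejects(30, 29, 7.60))   # False: 30 is inside the interval
```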

10.13 Regression Analysis

10.13.1 Regression Model

The regression models we will study take this form:

Y = β0 + β1 X + ε.

This is a theoretical model about how values of Y relate to values of X
in the population. Notice that this regression model is linear in X. This
regression model provides an equation for a line, a regression line.

Y is the dependent variable. It is a random variable. The variable Y
depends on the value of the independent variable X.

X is the independent or explanatory variable. It is also a random variable.
The X variable is also sometimes called the regressor.

β0 and β1 are the regression model parameters. β0 is the intercept of the
regression line. β1 is the slope of the regression line.

Values of β1 have the following interpretation:

1) If β1 > 0, then Y is increasing in X: $\frac{\partial Y}{\partial X} > 0$.

2) If β1 < 0, then Y is decreasing in X: $\frac{\partial Y}{\partial X} < 0$.

3) If β1 = 0, then Y is not related to X: $\frac{\partial Y}{\partial X} = 0$.

ε is sometimes thought of as the error or residual component of a
regression model.
The major distinction between ε and the X and Y variables is that we
observe the X and Y variables, but the ε variable is unobserved. That is, we
can survey individuals and ask them to tell us their X and Y values. In our
hours of work survey, we could ask the survey respondents to tell us their
hours of work (their Y ) and tell us their gender (their X variable, where
X = 1 if male, and X = 0 if female). Whatever we do not observe in data
we include in the ε random variable.

Regression Model Assumptions

The regression model we will maintain has the following four main as-
sumptions:

1) ε, Y , and X are random variables. We do not know their population
distributions (their population means, variances, covariances, etc.).

2) The relationship between Y and X is linear and given by this equation:

Y = β0 + β1 X + ε

3) E[ε] = 0.

4) ε ⊥ X.

This last assumption is the most important. This assumption is often
referred to as an exogeneity assumption. X is assumed to be an exogenous
variable in the regression model. If X is not independent of ε, then X is an
endogenous variable.
Note an immediate implication of this assumption is that cov(ε, X) =
corr(ε, X) = 0.

There are other important regression model assumptions, but for simplic-
ity we will not discuss these.

The Regression Model Holds for Everyone

This regression model holds for all units in the population (e.g. individ-
uals, firms, etc.). To emphasize this, let’s index each random variable by a
subscript i. i indicates a particular unit (e.g. an individual) from the pop-
ulation. Note that here i does not indicate a data observation because this
regression model is a theoretical model for the population.
The model written with subscripts i is

Yi = β0 + β1 Xi + εi .

Notice that the parameters are not indexed by i. One of the implicit
assumptions of the regression model is that these parameters are the same
for everyone.

For simplicity, we often simply drop the i subscripts. But, unless stated
otherwise, it is always the case that a given regression model holds for all
individuals in the population.

10.13.2 A Prediction Interpretation of Regression Models

Consider another interpretation of regression models. Let’s assume our goal
is to predict the Y value for someone. If we don’t know anything about the
person, the “best” predictor of the Y for a given person is the average Y ,
E[Y ]. However, if we know some more information about the person, say
the individual’s X value, we may be able to form a better prediction of the
person’s Y value if there is a relationship between X and Y .
With no information about X values, our model for each individual i is
simply

Yi = β0 + εi

Given the assumptions of the regression model, β0 is simply β0 = E[Yi ].
Our predicted Yi is denoted $\tilde{Y}$. The predicted value of Y in this case is
$\tilde{Y} = E[Y]$.
In this interpretation of the regression model, εi is the prediction error.
For each individual indexed i, the prediction error is

$$\varepsilon_i = Y_i - \tilde{Y} = Y_i - E[Y_i] = Y_i - \beta_0$$

The assumption above that E[ε] = 0 (or, equivalently, E[εi ] = 0) is
an assumption that the average prediction error is zero. That is, $\tilde{Y}$ is an
unbiased predictor of Yi .
If we know more information, an Xi value for each individual i, our model
is then

Yi = β0 + β1 Xi + εi

Our “best” predictor uses this Xi variable information to form a better
predictor of an individual’s Yi value. The predictor is now the expectation
of Y conditional on X: $\tilde{Y} = E[Y_i | X_i]$. According to the assumptions above,
this conditional expectation has a specific (linear) form:

E[Yi |Xi ] = β0 + β1 Xi

εi is again the prediction error:

$$\varepsilon_i = Y_i - \tilde{Y} = Y_i - E[Y_i | X_i] = Y_i - (\beta_0 + \beta_1 X_i)$$

The assumptions of the regression model again imply that this prediction
error is mean zero (E[εi ] = 0). However, the predictor using X variables is
potentially a better predictor given the additional information X provides
about individuals.
It is important to note that if there is no relationship between X and Y ,
then this model collapses back to the model without the X. If X and Y are
independent, then β1 = 0. The model is then

Yi = β0 + 0Xi + εi = β0 + εi .

An Example: Predicting Wages

Assume our goal is to predict a person’s wage (Wi ). If we know nothing
about the individual, the best predictor of the individual’s wage is the
average wage, E[Wi ]. However, we think there is a relationship between an
individual’s human capital (measured by the years of schooling the individual
has completed, Si ) and the individual’s wage. We think this relationship
is given by this regression model:

Wi = β0 + β1 Si + εi

If there is some relationship between Si and Wi , then we can use this
information to form a better predictor of an individual’s wage. For example,
an individual with a college degree (Si = 16) likely has a higher wage than
an individual with no college degree (Si = 12). This positive relationship
between human capital and wages would be indicated by a positive β1 . Our
regression model uses schooling information to form a better predictor of an
individual’s wage. We will discuss this problem in more detail below.

10.13.3 Estimating Regression Model Parameters

Given a sample of data (a Y value and X value for each person or data
observation), we can estimate the β0 and β1 parameters. In many
circumstances, the optimal method for estimating regression model parameters is
called Ordinary Least Squares or OLS. Call the estimators for the population
parameters $\hat{\beta}_0$ and $\hat{\beta}_1$. The OLS equations for these estimators are

$$\hat{\beta}_1 = \frac{S_{XY}}{S_X^2}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$

$S_{XY}$ is the sample covariance of the X and Y variables in the data sample:

$$S_{XY} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})$$

$S_X^2$ is the sample variance of the X variable.

$$S_X^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2$$

$\bar{X}$ is the sample mean of the X variable, and $\bar{Y}$ is the sample mean of
the Y variable.
Notice that the equation for $\hat{\beta}_1$ implies that this estimator reflects the
covariance in the X and Y data.

If X and Y have positive sample covariance, $\hat{\beta}_1 > 0$.

If X and Y have negative sample covariance, $\hat{\beta}_1 < 0$.

If the sample covariance between X and Y is zero, $\hat{\beta}_1 = 0$.
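The two OLS equations translate directly into code. The sketch below is illustrative: the function name `ols` and both data sets are assumptions. The first data set lies exactly on the line y = 1 + 2x, and the second pairs the reported hours of work with hypothetical gender dummy values.

```python
def mean(v):
    return sum(v) / len(v)

def ols(x, y):
    """Univariate OLS: slope = S_XY / S_X^2, intercept = Ybar - slope * Xbar."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / len(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / len(x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b0, b1 = ols(x, y)
print(b0, b1)  # 1.0 2.0

# Hours of work regressed on a hypothetical gender dummy (1 = male).
hours = [0, 40, 20, 47, 38]
male = [1, 0, 1, 1, 0]
b0_d, b1_d = ols(male, hours)
print(round(b0_d, 2), round(b1_d, 2))  # 39.0 -16.67
```

With a dummy regressor, the intercept estimate equals the sample mean of Y for the D = 0 group (39 here) and the slope estimate equals the difference between the two group means.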

10.13.4 Inference

Because the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are functions of the random variables Y
and X, they are also random variables and have some unknown distribution.
Just as we did for the sample mean estimator, we can examine the bias
and variance of our estimators. It turns out that if the assumptions of the
regression model hold, the OLS estimators are unbiased:

$$E[\hat{\beta}_0] = \beta_0$$

$$E[\hat{\beta}_1] = \beta_1$$

Estimating the Standard Error of the OLS Estimators

Under the assumptions of our regression model and some additional as-
sumptions (which we will not discuss), the variance of the estimator for β1
is

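$$V(\hat{\beta}_1) = \frac{\sigma^2}{N S_X^2},$$

where $S_X^2$ is the sample variance of X defined above (with the 1/N factor),
so that $N S_X^2 = \sum_{i=1}^{N} (X_i - \bar{X})^2$.

As with the variance of the sample mean estimator, we need to estimate
the unknown population variance $\sigma^2$ (here, the variance of ε). Using the
assumptions of the regression model, an estimate of $\sigma^2$, called $\hat{\sigma}^2$, can be found as

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)]^2$$

With this estimate for $\sigma^2$, an estimate of the variance of the estimator $\hat{\beta}_1$
is

$$\widehat{V}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{N S_X^2}.$$

The standard error of $\hat{\beta}_1$ is then

$$SE(\hat{\beta}_1) = \sqrt{\widehat{V}(\hat{\beta}_1)}.$$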

Like the standard error for the sample mean, we want as low a standard
error as possible. Lower standard error for $\hat{\beta}_1$ indicates a more precise
estimate of β1 .

$SE(\hat{\beta}_1)$ decreases with

i) Lower variation in Y (lower $\hat{\sigma}^2$).

ii) Higher variation in X (higher $S_X^2$).

iii) More observations (higher N ).

The variance and standard error for $\hat{\beta}_0$ have a similar expression and
interpretation.

Confidence Intervals and Hypothesis Tests for the OLS Estimators

Confidence intervals and hypothesis tests for $\hat{\beta}_0$ and $\hat{\beta}_1$ can be constructed
in the same way as we constructed them for the sample mean estimator.
The 95 percent confidence interval for β1 is

$$\hat{\beta}_1 - SE(\hat{\beta}_1) \cdot 2 \leq \beta_1 \leq \hat{\beta}_1 + SE(\hat{\beta}_1) \cdot 2.$$
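The standard error and confidence interval for the slope can be sketched as follows. Two assumptions should be kept in mind: the schooling and log wage numbers are hypothetical, and the variance of the slope estimator is taken to be the usual textbook expression, σ² divided by the sum of squared deviations of X (equivalently σ²/(N·S_X²) with the 1/N sample variance).

```python
import math

def ols_with_se(x, y):
    """Return (b0, b1, se_b1) for the univariate model y = b0 + b1 * x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n            # S_X^2
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Estimate of sigma^2: average squared residual (1/N version).
    sigma2_hat = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / n
    # Assumption: V(b1) = sigma^2 / (N * S_X^2), the usual textbook form.
    se_b1 = math.sqrt(sigma2_hat / (n * s_xx))
    return b0, b1, se_b1

# Hypothetical data: years of schooling and log wages.
schooling = [10, 12, 12, 14, 16, 16, 18, 20]
log_wage = [2.3, 2.5, 2.7, 2.8, 3.0, 3.1, 3.2, 3.5]

b0, b1, se_b1 = ols_with_se(schooling, log_wage)
ci = (b1 - 2 * se_b1, b1 + 2 * se_b1)   # 95 percent interval, critical value 2

# Duplicating every observation doubles N but leaves S_X^2 and the
# residual variance unchanged, so the standard error falls by sqrt(2).
_, b1_big, se_big = ols_with_se(schooling * 2, log_wage * 2)

print(round(b1, 3), ci[0] < b1 < ci[1], se_big < se_b1)
```

The duplicated-data comparison is only a device for isolating the effect of N; real additional observations would also change the other terms.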

10.13.5 Multivariate Regression Models

Our regression model thus far has one X variable and is a univariate model.
Regression models with more than one X variable are called multivariate
regression models.

Y = β0 + β1 X1 + β2 X2 + ε

The estimators for the multivariate regression parameters (β0 , β1 , β2 )
have different OLS equations than in the univariate model. However, the
basic interpretation of the parameters is similar. In addition, the inference,
standard errors, and confidence interval concepts carry over to the
multivariate framework.
The advantage of multivariate regression models is that more explanatory
X variables can be used to predict the Y variable. For example, if Y is hours
of work last week, X1 could be a dummy variable for gender, and X2 could
be a variable for the wage rate of the individual. With this model, we could
estimate whether men and women have different labor supply responses to
changes in wage rates. The applications we discuss next will present more
material on how to interpret regression analysis in particular contexts.

11 Topics in Applied Labor Economics: Estimating the “Return” to Schooling

11.1 What is the “Return” to Schooling?

For a variety of reasons, economists have been interested in estimating the
“return” to schooling. What exactly this “return” to schooling represents is
an open question. One interpretation is that the return to schooling is the
causal effect of forcing a random person to complete additional schooling.
The assumption is that this individual would earn more in the labor market
(higher wages) with the additional schooling than without the additional
schooling. The increase in labor market earnings is assumed to reflect the
increase in the labor productivity of the individual due to the higher level of
human capital she now has.
Let’s look at the return to college, where for simplicity we’ll define college
as 16 years of schooling and non-college as 12 years. The expected return to
college can be defined as

E[Wi |Si = 16] − E[Wi |Si = 12],

where Si is the years of schooling variable, Wi is labor market earnings
(wage), and i indexes individuals.
We can re-write this in terms of a regression model as

Wi = β0 + β1 Si + εi .

εi is the stand-in for all of the other unmeasured factors that affect wages.
εi is simply the residual difference in wages, net of the “effect” of schooling:

εi = Wi − (β0 + β1 Si )

Using the assumptions of the regression model, the expectations of the
wage for individuals with 12 and 16 years of schooling are

E[Wi |Si = 16] = β0 + 16β1

E[Wi |Si = 12] = β0 + 12β1

The difference is

E[Wi |Si = 16] − E[Wi |Si = 12] = 4β1 .

β1 indicates how much the wage increases with a one unit (one year)
increase in schooling. To see this explicitly, re-write the regression model to
indicate that the wage is a function of Si and εi .

Wi (Si , εi ) = β0 + β1 Si + εi

The partial derivative of the wage function with respect to schooling is

$$\frac{\partial W_i(S_i, \varepsilon_i)}{\partial S_i} = \beta_1$$

11.2 Percent Change

The previous regression model has wage levels as the dependent variable
and β1 indicates the change in the level of wages from a change in schooling.
However, we are often more interested in the percent change “effect” schooling
has on wages. To calculate this, we transform the wage using the log function.
Re-write the regression model as

ln Wi (Si , εi ) = β0 + β1 Si + εi .

Now calculate the partial derivative,

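$$\frac{\partial \ln W_i(S_i, \varepsilon_i)}{\partial S_i} = \frac{\partial W_i(S_i, \varepsilon_i)}{\partial S_i} \cdot \frac{1}{W_i(S_i, \varepsilon_i)} = \beta_1$$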

β1 ∗ 100 indicates the approximate percent change in the wage with a one unit
increase in schooling. For example, if β1 = 0.07, an individual who increases her
schooling 1 more year will have about 7 percent higher wages. An individual who
increases her schooling 2 more years will have about 14 percent higher wages, and
so on.

Note: the β1 and β0 in the model with wage levels are not the same as the
β1 and β0 in the model with log wages.
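The percent interpretation rests on a derivative, so it is an approximation that works best for small changes. As a rough check, the sketch below compares 100·β1·ΔS with the exact proportional change implied by the log wage model, exp(β1·ΔS) − 1. The function name is hypothetical.

```python
import math

def percent_change(beta1, extra_years=1):
    """Approximate and exact percent wage change implied by a log-wage model."""
    approx = 100 * beta1 * extra_years
    exact = 100 * (math.exp(beta1 * extra_years) - 1)
    return approx, exact

for years in (1, 2):
    approx, exact = percent_change(0.07, years)
    print(years, round(approx, 2), round(exact, 2))
# 1 7.0 7.25
# 2 14.0 15.03
```

For β1 = 0.07 the two measures are close, and the gap widens as the change in schooling grows.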

11.3 OLS Estimation

The OLS estimators for this regression model are

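β̂1 = [ (1/N) Σ_{i=1}^{N} (ln Wi − ln W̄)(Si − S̄) ] / [ (1/N) Σ_{i=1}^{N} (Si − S̄)² ]

β̂0 = ln W̄ − β̂1 S̄

Here S̄ is the sample mean of Si , and ln W̄ denotes the sample mean of ln Wi
(the average of the log wages).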
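
As a minimal sketch of how these two formulas are computed, the Python code below applies them to simulated data. The data are not the CPS sample: the "true" parameter values (1.5 and 0.09), the error spread, and the name ols_simple are assumptions made only for this illustration.

```python
import random

def ols_simple(y, x):
    """Return (b0_hat, b1_hat) using the sample covariance over sample variance formula."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    cov_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    b1_hat = cov_xy / var_x
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Simulated data with hypothetical true parameters (not estimates from the CPS).
random.seed(0)
true_b0, true_b1 = 1.5, 0.09
schooling = [random.randint(8, 20) for _ in range(5000)]
log_wage = [true_b0 + true_b1 * s + random.gauss(0, 0.5) for s in schooling]

b0_hat, b1_hat = ols_simple(log_wage, schooling)
print(b0_hat, b1_hat)
```

Because the simulated error is independent of schooling, the estimates should land close to the values used to generate the data.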

I estimated this model using data from the March 2003 Current Population
Survey. The number of observations is N = 87,585.
You can obtain this data at this website:

https://beta.ipums.org/cps/

The regression model estimates are (standard errors are in parentheses):

β̂1 = 0.09216 (0.00069)

β̂0 = 1.488 (0.00967)

The estimate of β1 indicates that each year of schooling increases
wages by about 9.2 percent.

11.4 Inference

Using an approximation of the unknown distribution of β̂1 , we can construct
a 95 percent confidence interval for the β1 parameter. With 95 percent
probability, the unknown β1 population parameter lies in this interval:

0.09216 − 2 ∗ 0.00069 ≤ β1 ≤ 0.09216 + 2 ∗ 0.00069

0.0908 ≤ β1 ≤ 0.0935

This is a fairly tight confidence interval.


We can also conduct hypothesis tests. Since 0 is outside the 95 percent
confidence interval, we can reject, at the 95 percent confidence level, the
hypothesis that β1 = 0 (that is, that the return to schooling is zero). However,
we cannot reject, at the same confidence level, the hypothesis that β1 = 0.091,
for example.
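
The interval and the two tests can be reproduced from the reported estimate and standard error. This is a small sketch using the same "estimate plus or minus 2 standard errors" rule as the text. The function names are introduced only for this example.

```python
def conf_interval_95(estimate, std_error, critical=2.0):
    """Approximate 95 percent interval: estimate +/- critical * std_error."""
    return estimate - critical * std_error, estimate + critical * std_error

def rejects(null_value, lower, upper):
    """Reject the hypothesized value if it lies outside the interval."""
    return not (lower <= null_value <= upper)

b1_hat, se_b1 = 0.09216, 0.00069  # estimate and standard error reported in the text
lower, upper = conf_interval_95(b1_hat, se_b1)
print(round(lower, 4), round(upper, 4))

print(rejects(0.0, lower, upper))    # hypothesis that the return to schooling is zero
print(rejects(0.091, lower, upper))  # the example hypothesis beta1 = 0.091
```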

11.5 Self-Selection Bias

There is a strong reason to suspect that this OLS estimate of β1 is biased.
An OLS regression estimator may be biased if one or more of the regression
model assumptions do not hold.
Recall that one of the assumptions of the regression model is that the
explanatory variable (X variable) and the error component (ε) are indepen-
dent. In this case, this assumption holds if the number of years of schooling
an individual obtains (Si ) is independent of the εi variable. To understand
whether this is in fact true, we need to think about what the εi variable
reflects.
In our regression model, we partitioned the factors that explain an indi-
vidual’s wage (Wi ) into two factors: years of schooling, which we observe in
our data, and all other factors represented by εi . As we discussed above, it
is likely that the wage an individual receives is affected by more than just
the level of formal schooling an individual has obtained. For example, other
sources of human capital, which we called ability, may also affect the wage
an individual receives.
Let’s re-write our regression model to include a random variable for an
individual’s level of ability (Ai ).

ln Wi (Si , Ai , ηi ) = β0 + β1 Si + αAi + ηi ,

where α is a population parameter which indicates the relationship between
ability and wages (e.g. if ability increases an individual’s wage, then
α > 0). ηi is another random variable (error component) which reflects
everything else that affects wages.
Since we do not observe Ai or ηi in our data set, without loss of generality,
we simply called these terms εi . That is, εi in our original regression model
is

εi = αAi + ηi

In the human capital lecture above, we also discussed why an individual’s
level of ability may affect their human capital investment (e.g. their choice to
attend college). Let’s write another regression model to indicate that there
is a relationship between schooling and ability.

Si = γ0 + γ1 Ai + ωi

This model indicates that the level of an individual’s schooling (Si ) is
related to the level of the individual’s ability (Ai ) and to other factors
represented by ωi . ωi reflects all of the other factors that affect an individual’s
schooling choice (e.g. individual i’s discount rate, taste for schooling, degree
of credit constraint, etc.).
If γ1 is not zero (i.e. there is some relationship between ability and schooling),
then there is some correlation between schooling for an individual and
that individual’s ability. Therefore, cov(Si , Ai ) ≠ 0. Since εi is also a function
of ability, cov(Si , εi ) ≠ 0. If this is the case, the regression model assumption
that the explanatory variable and the error component are independent does
not hold.
Without showing this formally, this violation of the regression model
assumptions implies that the OLS estimator we used above provides a biased
estimate of the regression model parameters β0 and β1 . In particular, we
think that higher ability individuals are more likely to attend school and
earn higher wages (regardless of schooling level). If this is true, then the
OLS estimator for the return to schooling has an upward bias: E[β̂1 ] > β1 .
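
A small simulation can make the direction of the bias concrete. The sketch below generates data from the two equations above with hypothetical parameter values (none of them come from the CPS estimates), then regresses log wages on schooling alone. In large samples the slope should approach β1 + α cov(Si , Ai )/var(Si ), the standard omitted variable formula, rather than β1 .

```python
import random

def ols_slope(y, x):
    """Slope from a one-regressor OLS regression of y on x (with a constant)."""
    n = len(y)
    y_bar, x_bar = sum(y) / n, sum(x) / n
    cov = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x))
    var = sum((b - x_bar) ** 2 for b in x)
    return cov / var

random.seed(1)
n = 20000
beta0, beta1, alpha = 1.5, 0.07, 0.10  # hypothetical wage equation parameters
gamma0, gamma1 = 12.0, 1.5             # hypothetical schooling equation parameters

ability = [random.gauss(0, 1) for _ in range(n)]                         # A_i, variance 1
schooling = [gamma0 + gamma1 * a + random.gauss(0, 2) for a in ability]  # omega_i has sd 2
log_wage = [beta0 + beta1 * s + alpha * a + random.gauss(0, 0.3)
            for s, a in zip(schooling, ability)]

b1_hat = ols_slope(log_wage, schooling)
# Large-sample value: beta1 + alpha * cov(S, A) / var(S), using the variances set above.
implied = beta1 + alpha * gamma1 / (gamma1 ** 2 + 2.0 ** 2)
print(b1_hat, implied)
```

With these made-up values, the regression that omits ability overstates the return to schooling even though the sample is large.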

Five Ways to Express This Bias

i) Self-Selection Bias

This bias is often called self-selection bias because the source of the bias
is the fact that individuals self-select into schooling, i.e. schooling is
not randomly assigned.

ii) Sample Selection Bias

This bias can also be thought of as a sample selection bias. The root
cause of the bias is that we do not have a random sample. The sample of
individuals who attend college is not a random or representative sample of
the entire population. The sample of individuals who attend college is a
selected or choice based sample.

iii) Omitted Variable Bias

Another name for this bias is omitted variable bias. The bias in this case
stems from the fact that we cannot observe all aspects of an individual’s
human capital. Ai is therefore omitted from the regression model we can
estimate. If we were able to observe Ai completely, then we could include it
in our regression model, and the omitted variable bias would be eliminated.

iv) Endogenous Regressors

Still another way to describe this bias is to say that X is an endogenous regressor.
Recall that if X is independent of ε, then X is an exogenous regressor. If
instead cov(X, ε) ≠ 0, then X is an endogenous regressor.

v) Correlation is Not Necessarily Causation

Finally, we can also think of this bias as reflecting the fact that observed
correlations in random variables do not necessarily reflect causation. For
almost any data set, including our CPS data, the sample correlation between wages
and schooling is positive: SSW > 0. However, the self-selection bias suggests
that this correlation may not be entirely due to the causal effect of schooling
on wages.

11.6 Difference in College and High School Wages

Let’s return to examining the difference between someone with 12 years of
schooling (high school graduate) and someone with 16 years of schooling
(college graduate). The expectation of their wage conditional on their schooling
is

E[Wi |Si = 16] = β0 + β1 16 + E[εi |Si = 16]

E[Wi |Si = 12] = β0 + β1 12 + E[εi |Si = 12]

Assume that individuals self-select into college based on ability. Assume
that because of this self-selection, the average ability for those who attend
college is E[Ai |Si = 16] = Acol . The average ability for individuals who do
not attend college is E[Ai |Si = 12] = Anc . Given these assumptions, we can
write:

E[εi |Si = 16] = αAcol + E[ηi |Si = 16]

E[εi |Si = 12] = αAnc + E[ηi |Si = 12]

Assume that η is independent of Si . Therefore,

E[ηi |Si = 16] = E[ηi |Si = 12] = 0.

Substituting back into the conditional expectation functions for wages,



E[Wi |Si = 16] = β0 + β1 16 + αAcol

E[Wi |Si = 12] = β0 + β1 12 + αAnc

Taking the difference,

E[Wi |Si = 16] − E[Wi |Si = 12] = β1 4 + α(Acol − Anc )

The expected difference in wages is the left-hand side of this equation.
This difference consists of two parts:

i) the difference in schooling multiplied by the return to that difference
in schooling (β1 4),

and

ii) the difference in average ability between the college educated and non-
college educated populations multiplied by the return to that ability differ-
ence (α(Acol − Anc )).

If we believe that α > 0 (higher ability individuals receive higher wages),
and (Acol − Anc ) > 0 (college educated individuals have on average more
ability than non-college educated individuals), then the difference in average
wages is an upwardly biased indicator for the actual return to schooling (β1 4).
Self-selection essentially implies that the correlation between schooling
and wages we observe cannot be interpreted as entirely due to causation.
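
To see the decomposition with numbers, here is a short sketch. The parameter values and average ability levels are hypothetical, chosen only to illustrate the arithmetic of the two parts.

```python
def wage_gap_decomposition(beta1, alpha, a_col, a_nc, s_col=16, s_nc=12):
    """Split the observed gap into a schooling part and an ability part."""
    schooling_part = beta1 * (s_col - s_nc)  # beta1 * 4 in the text
    ability_part = alpha * (a_col - a_nc)    # alpha * (A_col - A_nc)
    return schooling_part, ability_part, schooling_part + ability_part

# Hypothetical values: beta1 = 0.07, alpha = 0.10, average ability 1.0 versus 0.0.
schooling_part, ability_part, observed_gap = wage_gap_decomposition(0.07, 0.10, 1.0, 0.0)
print(schooling_part, ability_part, observed_gap)
```

In this example the observed gap exceeds the part due to schooling, so reading the whole gap as the return to four years of schooling would overstate it.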

11.7 A Treatment Effects Interpretation

Another way to interpret this self-selection bias is to think of a college
education as a treatment. In one sense, we would like to know the causal effect
of taking a person randomly from the population and forcing this person to
take the treatment, i.e. graduate from college and push her Si from Si = 12
to Si = 16. How much would this randomly chosen person’s wage increase?
If the assumptions hold, the regression model answers this question. The
wage for this person would increase by β1 4. β1 indicates the treatment effect
on wages of increasing an individual’s schooling.
If people randomly choose their schooling, then there is no difference in
the average ability level of the people who have Si = 12 and those who have
Si = 16. Randomization of schooling implies Si is independent of εi and the
assumptions of the regression model hold. Randomization implies that

E[εi |Si = 12] = E[εi |Si = 16]

However, in reality, because people do not randomly choose to enter
college, but instead self-select into college based on factors (e.g. ability) which
also affect their wages, these statements break down.

11.8 Possible Solutions to the Self-Selection Bias

Possible Solution 1: Conduct a Controlled Experiment

The most immediate possible solution to this problem is to conduct a
controlled experiment. In the experiment, we force some randomly chosen
individuals to attend college (the treatment group). Another group of people
(the control group) is forced not to attend college. An unbiased estimator of
the return to schooling can be constructed simply by comparing the average
wage for the two groups. This type of experiment is no different in principle
from a randomized drug trial in which a randomly selected treatment group
receives the drug being tested, and a control group receives a placebo.
More feasible experiments may be to randomly offer some people free
college tuition (the treatment group) and not offer the aid to others (the
control group).
Although these types of experiments may offer excellent information on
the return to schooling, these types of experiments are relatively rare in the
social sciences. For many interesting and important questions, a randomized
experiment is unethical, impractical, or too expensive.

Possible Solution 2: Measure Omitted Variables

Another possible solution is to try to measure the omitted variable. In this
case, we might use an IQ test or SAT score to measure an individual’s level
of Ai . Assume we have an IQi measure for each individual. Our regression
model is then

ln Wi (Si , IQi , εi ) = β0 + β1 Si + β2 IQi + εi .

Including measures of the omitted variable in this way may reduce the
self-selection bias, but it is unlikely that it will eliminate the bias since most
measures, like IQ and SAT scores, are only partial measures of the factors
that affect an individual’s wages.
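
The point that a partial measure reduces but does not remove the bias can be illustrated with simulated data. In this sketch, IQ is modeled as ability plus independent noise. That modeling choice, the parameter values, and the helper names are all assumptions for illustration. The two-regressor slope uses the standard covariance formula for the coefficient on the first regressor.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / n

def slope_one_regressor(y, s):
    return cov(s, y) / cov(s, s)

def slope_with_control(y, s, q):
    """Coefficient on s in an OLS regression of y on s and q (plus a constant)."""
    num = cov(q, q) * cov(s, y) - cov(s, q) * cov(q, y)
    den = cov(s, s) * cov(q, q) - cov(s, q) ** 2
    return num / den

random.seed(2)
n = 20000
beta1, alpha = 0.07, 0.10  # hypothetical parameters
ability = [random.gauss(0, 1) for _ in range(n)]
schooling = [12 + 1.5 * a + random.gauss(0, 2) for a in ability]
iq = [a + random.gauss(0, 1) for a in ability]  # noisy, partial measure of ability
log_wage = [1.5 + beta1 * s + alpha * a + random.gauss(0, 0.3)
            for s, a in zip(schooling, ability)]

b1_no_control = slope_one_regressor(log_wage, schooling)
b1_iq_control = slope_with_control(log_wage, schooling, iq)
b1_ability_control = slope_with_control(log_wage, schooling, ability)
print(b1_no_control, b1_iq_control, b1_ability_control)
```

Under these assumptions, the estimate with the IQ control falls between the uncontrolled estimate and the true β1 . Only the infeasible regression that controls for ability itself recovers β1 .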

Possible Solution 3: Instrument or Natural Experiment

Another way to approach this problem is to look for variables which are
i) correlated with an individual’s schooling level, but ii) not correlated with
wages except through schooling (i.e. variables not correlated with εi ). These
types of variables are called instrumental variables.
Consider an example. One instrumental variable approach has been to
use the proximity of an individual to college or university as an instrumental
variable. The idea is that living closer to a college or university lowers the
cost of attending college. It can also be argued that the proximity to a
college or university should not affect an individual’s wages (except through
the schooling).
Why does this help solve the self-selection problem? The idea behind
using instrumental variables is that these variables essentially form a natural
experiment. For whatever reason, some individuals, and not others, live
close to colleges and universities and the cost of schooling is relatively low
for this group. This variation in proximity to college approximates an actual
experiment in which the cost of schooling is randomly changed for some
individuals and not for others. As in a controlled experiment, we can use
this source of exogenous variation to solve the self-selection bias in the OLS
estimation.
It is important to understand the limitations of this approach. In general,
the natural experiment or instrumental variable only affects the behavior of
a particular group of individuals. In the example of college proximity, the group
affected is the group on the margin between deciding whether to attend or not
attend college. For most individuals, the choice of whether to attend college
or not would not be affected by the proximity of a college or university.
Because the natural experiment only uses the behavior of a particular group,
the return to schooling using this variation may not tell us very much about
the return to schooling for most other individuals.
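
As a rough sketch of the idea, the simulation below uses a binary "lives near a college" variable as the instrument and computes the simple instrumental variable estimator cov(Z, ln W )/cov(Z, S). Everything here is hypothetical: the parameter values, the assumption that proximity is unrelated to ability, and the assumption that every individual has the same return β1 (so the limitation just described does not arise in this example).

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / n

random.seed(3)
n = 50000
beta0, beta1, alpha = 1.5, 0.07, 0.10  # hypothetical parameters

ability = [random.gauss(0, 1) for _ in range(n)]
# Instrument: proximity is assigned independently of ability (an assumption).
near_college = [1 if random.random() < 0.5 else 0 for _ in range(n)]
schooling = [12 + 1.5 * a + 1.0 * z + random.gauss(0, 2)
             for a, z in zip(ability, near_college)]
log_wage = [beta0 + beta1 * s + alpha * a + random.gauss(0, 0.3)
            for s, a in zip(schooling, ability)]

b1_ols = cov(schooling, log_wage) / cov(schooling, schooling)
b1_iv = cov(near_college, log_wage) / cov(near_college, schooling)
print(b1_ols, b1_iv)
```

The OLS slope is pulled upward by ability, while the instrumental variable estimate should be close to β1 , although it is noisier than OLS.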

Possible Solution 4: Use Identical Twins

Another interesting possible solution to the self-selection bias is to
compare identical twins. The idea is that twins share a common genetic and
family background. If this common genetic and family background captures
all of the factors that affect wages (our Ai term), then the difference in wages
for two twins with different schooling levels provides an unbiased estimator
of the return to schooling.

Index each pair of twins in the population by i. Label the wage of twin
1, Wi1 , and the wage of twin 2, Wi2 . The schooling levels of the two twins in
pair i are given by Si1 and Si2 . For each pair of twins, we difference their wages and
schooling levels:

Wi1 − Wi2 = β1 (Si1 − Si2 ) + εi1 − εi2

The assumption underlying the use of twins is that the two twins in each pair
have the same error term: εi1 = εi2 . Under this assumption, the error terms
cancel in the difference.
Our estimator estimates the expectation of the difference in wages for all
twins:

E[Wi1 − Wi2 ] = β1 E[(Si1 − Si2 )]

A potential problem with this methodology is that it relies on at least
some twins having different levels of schooling (Si1 ≠ Si2 for at least some
i). Why did otherwise identical twins choose different levels of schooling?
The answer may be that identical twins, although they may have the same
genetics, are not exactly the same. These differences between twins may be
correlated with schooling choices and wages. If this is true, we are right back
to the original self-selection problem.
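
The within-pair differencing can be sketched with simulated twins. In this illustration, both twins in a pair share the same ability, and their schooling differs only for reasons unrelated to wages. These are exactly the favorable assumptions questioned above, and all parameter values are hypothetical. The sketch uses log wages rather than wage levels.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / n

random.seed(4)
n_pairs = 10000
beta0, beta1, alpha = 1.5, 0.07, 0.10  # hypothetical parameters

d_log_wage, d_schooling = [], []      # within-pair differences
pooled_wage, pooled_school = [], []   # all twins treated as unrelated individuals
for _ in range(n_pairs):
    family_ability = random.gauss(0, 1)  # shared by both twins in the pair
    s1 = 12 + 1.5 * family_ability + random.gauss(0, 2)
    s2 = 12 + 1.5 * family_ability + random.gauss(0, 2)
    w1 = beta0 + beta1 * s1 + alpha * family_ability + random.gauss(0, 0.3)
    w2 = beta0 + beta1 * s2 + alpha * family_ability + random.gauss(0, 0.3)
    d_log_wage.append(w1 - w2)
    d_schooling.append(s1 - s2)
    pooled_wage += [w1, w2]
    pooled_school += [s1, s2]

b1_pooled = cov(pooled_school, pooled_wage) / cov(pooled_school, pooled_school)
b1_twins = cov(d_schooling, d_log_wage) / cov(d_schooling, d_schooling)
print(b1_pooled, b1_twins)
```

Differencing removes the shared ability term, so the twin estimate should be close to β1 while the pooled estimate is biased upward. If the within-pair schooling differences were themselves related to unobserved wage factors, the twin estimate would be biased as well.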

Possible Solution 5: Model Selection into Schooling

The final approach to the self-selection problem uses economic theory to
model how people choose their human capital investments. If the model is
correct, we can find an unbiased estimator of the return to schooling. The
estimation based on these models is often called structural estimation. The
disadvantage of this type of methodology is that it relies on non-testable
modeling assumptions. In reality, all of the previous solutions did as well to
some degree. The advantage of structural estimation is that it can potentially
answer many important questions, which cannot be answered using natural
experiments.

12 Inequality

12.1 Characterizing the Distribution of Earnings

12.1.1 Measuring Inequality

Inequality is defined as the unequal distribution of resources in an economy.
There are at least three broad ways to measure or define inequality in an
economy.

1) Monetary Resources

We could examine inequality using a measure of monetary resources
available to individuals and households, such as labor market earnings, income,
and wealth.

2) Consumption

Another way to examine inequality would be to examine consumption
inequality, such as the distribution of food, housing, or health care in an
economy.

3) Outcomes

Still another way to examine inequality would be to look at the
distribution of outcomes experienced by people, such as sickness, death, and
happiness.

In this section we will focus primarily on the distribution of labor market
earnings. Although there is generally a strong correlation between inequality
in labor market earnings and other measures of inequality, this correlation is
not perfect. It is important to note that different measures of inequality may
provide distinct evidence about the nature of the distribution of resources
within an economy.

12.1.2 Three Types of Monetary Resources

Labor Market Earnings

Earnings in the labor market context usually refer to wages, salary, bonuses,
tips, and commissions earned by working. As we discussed earlier in the
course, we should ideally include all forms of work compensation (including
non-pecuniary benefits, pension benefits, health insurance, etc.) as part of
labor market earnings. However, since these other forms of compensation are
difficult to measure, we typically focus only on monetary earnings.

Income

Income is the flow of returns from either work or savings (investments).
An individual’s total income in a period consists of all labor market earnings
plus the returns to all other assets the individual owns (interest received from
savings, stock dividends, etc.). For most working Americans, the majority of
their income consists of labor market earnings.

Wealth

Wealth is the value of the stock of assets an individual owns at a point
in time. Wealth at some period is the value of all the assets (houses, stocks,
savings, etc.) the individual owns minus the value of all debts. For many
Americans, much of their wealth is embodied in their homes.

12.1.3 Three Concepts of Earnings Inequality

For the remainder of our discussion of inequality, we will measure inequality
using labor market earnings (wages). Define labor market earnings for
individual i as Wi . I won’t specify a particular time dimension for these earnings
(earnings could be measured hourly, weekly, monthly, annually, etc.).
We can define three distinct concepts of earnings inequality.

1) Cross-Sectional Inequality

The level of cross-sectional inequality indicates how unequally labor
market earnings are distributed in an economy at a particular point in time. We
can characterize cross-sectional inequality by the earnings distribution:

pr(Wit = w)

i indexes individuals and t indexes time. This function indicates the
proportion of individuals in the economy at time t with labor market earnings
of level w.

Earnings Differentials

Often it is useful to summarize the level of cross-sectional inequality with
a single number. We’ll avoid the more complicated Gini coefficient in favor
of a simple measure of inequality using the differential or ratio of earnings
percentiles.
The median is the 50th percentile of earnings. We can also define other
earnings percentiles the same way. In a population of 100 individuals, the
1st percentile is the earnings of the lowest earning individual. The 100th
percentile is the earnings of the highest earning individual. The P th
percentile of earnings is the earnings of the individual for whom P
percent of people earn less and 100 − P percent of people earn more.
The 90-10 differential (as a ratio) is defined as

D90−10 = W90 / W10 ,

where WP is the P th percentile of earnings. D90−10 = 1 indicates no
inequality as the earnings at the 90th and 10th percentiles are the same.
D90−10 > 1 indicates inequality as earnings at the 90th percentile are greater
than earnings at the 10th percentile.
The 50-10 differential or any other earnings differential is defined similarly.
Wider differentials indicate a greater level of inequality in the economy.
In the United States today, D90−10 is about 4.6. This indicates that
earnings at the 90th percentile are about 4.6 times the earnings
at the 10th percentile.
The 50-10 differential is much smaller. D50−10 is about 2.2. The 50th
percentile earns about 2.2 times as much as the 10th percentile. This suggests
that there is greater inequality at the top of the earnings distribution than
at the bottom.
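
To show how these differentials are computed from data, the sketch below uses simulated earnings. The lognormal distribution and its parameters are assumptions chosen for illustration, not estimates from any survey. The percentile helper follows the simple "rank in the sorted list" definition given above.

```python
import math
import random

def percentile(values, p):
    """Nearest-rank P-th percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def differential(values, upper, lower):
    """Ratio of the upper percentile of earnings to the lower percentile."""
    return percentile(values, upper) / percentile(values, lower)

random.seed(5)
# Simulated earnings (hypothetical): log earnings are normal with mean 3.0 and sd 0.6.
earnings = [random.lognormvariate(3.0, 0.6) for _ in range(10000)]

d_90_10 = differential(earnings, 90, 10)
d_50_10 = differential(earnings, 50, 10)
print(round(d_90_10, 2), round(d_50_10, 2))
```

In a population of exactly 100 individuals with earnings 1, 2, ..., 100, this definition gives a 90th percentile of 90 and a 10th percentile of 10, so D90−10 = 9.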

College Premium

Another simple measure of cross-sectional inequality is the college
premium. Like the 90-10 differential, we’ll measure the wage gap between the
college educated and the non-college educated (high school graduates) as a
ratio:

δ̂ = WC / WH

δ̂ = 1 indicates that college and high school graduates earn the same
wage. δ̂ > 1 indicates that college graduates earn more.
Today, the college premium is about δ̂ = 1.9. This indicates that the
average college graduate earns 1.9 times as much as the average high school
graduate.

2) Lifetime Inequality

Lifetime inequality indicates the degree of social mobility in the economy.
How earnings change with age (the difference in the earnings of young and old
workers) is a measure of lifetime inequality. To examine lifetime inequality,
we can look at earnings as a function of age (the age-earnings profile we
discussed in the Human Capital section):

Wi = f (agei )

We can write this as a regression model using log wages:

ln Wi = β0 + β1 agei + β2 agei² + εi ,

where agei is individual i’s age in years.
As in the return to schooling section, the log function of wages allows us
to interpret the population parameters in terms of percentage changes. For
example, ignoring the squared term, β1 = 0.05 implies that an individual’s
wage increases by 5 percent every year.
I estimated this regression model using the 2003 March CPS data sample.
The OLS estimates are (standard errors in parentheses):

β̂0 = 0.62 (0.028)

β̂1 = 0.092 (0.0012)

β̂2 = −0.0009 (0.00002)

These parameter estimates indicate that wages are increasing in age (β̂1 >
0). The estimate of β2 is negative, which indicates that the age-earnings
profile is concave.
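
With a squared term in the model, the effect of one more year of age is not constant. The derivative of the log wage with respect to age is β1 + 2 β2 age. The sketch below evaluates this using the rounded estimates reported above and finds the age at which the fitted profile peaks, −β1 /(2 β2 ). The function names are introduced only for this example, and the results inherit the rounding of the reported coefficients.

```python
b0_hat, b1_hat, b2_hat = 0.62, 0.092, -0.0009  # rounded estimates reported in the text

def predicted_log_wage(age):
    """Fitted log wage at a given age."""
    return b0_hat + b1_hat * age + b2_hat * age ** 2

def marginal_effect(age):
    """Approximate proportional wage change from one more year of age."""
    return b1_hat + 2 * b2_hat * age

peak_age = -b1_hat / (2 * b2_hat)  # age at which the fitted profile stops rising

for age in (25, 45, 60):
    print(age, round(marginal_effect(age), 4))
print(round(peak_age, 1))
```

Using these rounded coefficients, predicted wage growth is about 4.7 percent per year at age 25 and slows with age. The fitted profile peaks at roughly age 51.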

How this function varies in the population with initial income can provide
some evidence on social mobility. Do low-earning young workers have slower
or faster wage growth over their lifetime than high-earning young workers?
Define two different age-earnings profiles for low-earning and high-earning
workers:

Wi^low = β0^low + β1^low agei + β2^low agei² + εi^low

Wi^high = β0^high + β1^high agei + β2^high agei² + εi^high

Assume β0^low < β0^high . If the two earnings growth rates are the same
(β1^low = β1^high ), then there will be no convergence in earnings over the lifetime.
If β1^low < β1^high , then the gap between rich and poor increases over the lifetime.

3) Intergenerational Inequality

Intergenerational inequality measures the persistence of inequality across
generations. This is typically measured by estimating an intergenerational
earnings elasticity. Let Wi^p be the earnings of individual i’s parents (in most
studies, this is the father’s earnings). Wi is the wage of the child (in most
studies, this is the son). The regression model is

ln Wi = β0 + β1 ln Wi^p + εi

(Note: Because earnings change with age, we would want to compare the
earnings of parents and children at the same age, e.g. earnings when the
father and son are both age 35.)

If β1 = 0, there is no relationship between parents’ and children’s
earnings. A positive and large β1 indicates that there are substantial transfers
between generations. These transfers could include wealth (e.g. wealth
transfers lower credit constraints for the children’s human capital investments) or
ability (e.g. smart parents have smart kids).
Several studies have estimated β1 to be β̂1 = 0.4 or higher for the United
States.
β1 indicates the extent to which inequality is transferred across genera-
tions. If the parents have 50 percent higher labor market earnings than the
population average of the parents’ generation, with β̂1 = 0.4, the child is
expected to have earnings 50 ∗ 0.4 = 20 percent higher than the average in
her generation. The grandchildren are expected to have 50 ∗ 0.4 ∗ 0.4 = 8
percent higher earnings than the average in the grandchildren’s generation.
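
The same calculation can be carried forward any number of generations. This small sketch uses the approximation from the text, in which the percentage advantage is multiplied by the elasticity once per generation. The function name is introduced only for this example.

```python
def expected_advantage(initial_pct, elasticity, generations):
    """Expected percent earnings advantage after a given number of generations."""
    return initial_pct * elasticity ** generations

for g in range(0, 4):
    # g = 0 is the parents, g = 1 the child, g = 2 the grandchildren, and so on.
    print(g, round(expected_advantage(50, 0.4, g), 2))
```

With an elasticity of 0.4, an initial 50 percent advantage shrinks to 20 percent, then 8 percent, then about 3 percent.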
It is important to note that intergenerational inequality could influence
the level of cross-sectional inequality through credit constraints to finance
human capital. The level of intergenerational transfers may determine the
extent to which individuals are credit constrained in their human capital
investments.

12.2 What Determines the Level of Cross-Sectional Inequality in the Economy?

For the remainder of this section, we will focus on cross-sectional inequality.
First, let’s briefly consider three major factors that determine the level of
cross-sectional inequality in an economy.

1) Differences in Human Capital Levels

As we have discussed previously, a major determinant of wage levels is
human capital. Let’s return to our simple model of wages and schooling.
Assume there are two groups: college educated and non-college educated.
Their wages in period t are WtC and WtH . Assume WtC > WtH .
The distribution of earnings in this simple economy is determined by the
proportion of the population with a college degree (pCt ). We can write the
cross-sectional distribution of earnings in period t as

pr(Wit = WtC ) = pCt ,

pr(Wit = WtH ) = 1 − pCt .

We can close the gap in earnings (WtC − WtH ) in two ways:

i) Convergence in Human Capital Investments

If all individuals become college educated (pCt = 1), then all individuals
earn WtC .

ii) Convergence in Human Capital Returns

If the college premium is eliminated (WtC = WtH ), then all individuals
earn the same wage.

2) Government Taxes and Transfers

Another important determinant of cross-sectional inequality is the extent
of government taxes and transfers. In principle (although often not in
practice), the government can act to redistribute income and wealth from the
rich to the poor.
For a two group model, the government can equalize labor market earnings
by taxing the college educated an amount T and transferring this income to the
non-college educated. The after tax wage that individuals take home is now
WtC − T for the college educated and WtH + T for the non-college educated.

As with most tax systems, the tax imposes a distortion in the economy. In
the extreme case in which the tax T is so large as to force WtC −T = WtH +T ,
the monetary return to college is zero, and it would be unlikely that many
individuals would incur the cost of schooling and obtain a college degree.
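
Taking the setup above at face value (each college educated worker pays T and each non-college educated worker receives T ), the tax that fully equalizes take-home wages solves WtC − T = WtH + T , so T = (WtC − WtH )/2. The wage values in this sketch are hypothetical.

```python
def equalizing_tax(w_college, w_high_school):
    """Tax T that sets w_college - T equal to w_high_school + T."""
    return (w_college - w_high_school) / 2

w_c, w_h = 38.0, 20.0  # hypothetical wages for the two groups
T = equalizing_tax(w_c, w_h)
print(T, w_c - T, w_h + T)
```

With these numbers T = 9 and both groups take home 29, so the monetary return to college is zero.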

3) Discrimination

Another potential determinant of the level of cross-sectional inequality
is discrimination. Discrimination in the labor market is discussed in more
detail below. Briefly, discrimination can act as a “tax” on the earnings of
some groups and widen the gap in earnings directly. Discrimination can also
lower the return to human capital investments for some groups and thereby
widen the difference in human capital levels in the economy.

12.3 Trends in Cross-Sectional Earnings Inequality

12.3.1 Trends

Trends in cross-sectional inequality have received a lot of attention in
economics. The general pattern since World War II was a decline in inequality
up until the 1970s. In particular, the period of the 1940s and 50s has been
termed the “Great Compression.” Since the late 1970s, inequality has
increased substantially.
Let’s look at trends in inequality measured using the college premium (δ̂)
defined above.

In 1970, δ̂ = 1.6. In 1980, δ̂ = 1.5. In 1990, δ̂ = 1.7. In 2000, δ̂ = 1.9.
Over the 20 years from 1980 to 2000, the wage advantage of college graduates
(δ̂ − 1) nearly doubled, from about 50 percent to about 90 percent.
This rapid rise in inequality has attracted a considerable amount of research.
Similar trends in the 90-10 differential occurred. In 1970, D90−10 was
about 3. By 2000, D90−10 was 4.6.
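
The size of the change depends on whether we look at the wage ratio itself or at the premium over high school wages (the ratio minus 1). This short sketch computes both from the figures quoted above. The helper name is introduced only for this example.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return 100 * (new - old) / old

ratio_1980, ratio_2000 = 1.5, 1.9  # college to high school wage ratios from the text

change_in_ratio = pct_change(ratio_1980, ratio_2000)
change_in_premium = pct_change(ratio_1980 - 1, ratio_2000 - 1)
change_in_90_10 = pct_change(3.0, 4.6)  # 90-10 differential, 1970 to 2000

print(round(change_in_ratio, 1), round(change_in_premium, 1), round(change_in_90_10, 1))
```

The wage ratio rose by about 27 percent, while the premium over high school wages rose by about 80 percent.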

12.3.2 Explaining the Trends

Relative Supply and Demand

One potentially useful way to explain these trends is to examine the
market for skill. We can think of the college and non-college workers as
supplying two distinct types of labor or two types of labor skill. The college
and non-college groups comprise two skill groups, where the college group is
considered “skilled” and the non-college group is “unskilled”. The college
premium can then be thought of as a skill premium.
Let’s construct a supply and demand graph for relative skill. On the
vertical axis, we measure the relative skill price (relative wages) or skill
premium: WC /WH . On the horizontal axis, we measure the relative labor employed
from the two groups: HC /HH .
The curves in this graph represent relative labor supply and demand. The
relative labor supply curve indicates how the supply of college educated labor
relative to high school educated labor responds to changes in the relative
wages. The relative labor demand curve indicates how the demand from firms
for college educated labor relative to high school educated labor responds
to changes in the relative wage. As with any labor supply and demand
curves, relative labor supply slopes upward and relative labor demand slopes
downward.

Shifts in the Relative Supply of Skill

The stock or supply of college graduates and non-college graduates in any
particular year is composed of all the individuals who are now of working age
(18-65 say). The current stock of college graduates and high school graduates
depends on three factors:

i) the size of previous birth cohorts (flow of new workers),

ii) the fraction of individuals from each birth cohort who graduated from
college (flow of new college educated workers), and

iii) the number of college and non-college educated immigrants.

One reason for the low level of the skill premium in the 1970s is that
during this period there was an increase in the number of college educated
workers. This was due to the high numbers of individuals from the “baby
boom” birth cohort who graduated from college and entered the labor market
in the 1970s.
A supply-side explanation of the increase in the college premium from
1980 to 2000 requires an inward shift of the relative supply curve over this
period. In fact, the opposite occurred. The percent of college graduates in the workforce
increased from 21 percent in 1980 to 28 percent by 2000. If the relative labor
demand curve remained fixed over this period, this shift out of the relative
supply curve should have reduced the college premium.
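
The logic of the relative supply and demand graph can be sketched with straight-line curves. The linear functional form and every number below are hypothetical, chosen only to show the direction of the effects: relative demand is premium = a_d − b_d q and relative supply is premium = a_s + b_s q, where q is relative employment HC /HH . A lower a_s shifts supply out and a higher a_d shifts demand out.

```python
def equilibrium(a_d, b_d, a_s, b_s):
    """Solve a_d - b_d*q = a_s + b_s*q for relative employment q and the skill premium."""
    q = (a_d - a_s) / (b_d + b_s)
    premium = a_d - b_d * q
    return q, premium

# Hypothetical starting point.
base = equilibrium(a_d=2.5, b_d=2.0, a_s=1.0, b_s=1.0)
# Relative supply shifts out, relative demand unchanged.
supply_out = equilibrium(a_d=2.5, b_d=2.0, a_s=0.7, b_s=1.0)
# Relative supply shifts out and relative demand shifts out by more.
both_out = equilibrium(a_d=3.4, b_d=2.0, a_s=0.7, b_s=1.0)

for label, (q, premium) in (("base", base), ("supply out", supply_out),
                            ("both out", both_out)):
    print(label, round(q, 2), round(premium, 2))
```

A supply shift alone raises relative employment but lowers the premium. Seeing both relative employment and the premium rise, as in the 1980 to 2000 period, requires relative demand to shift out as well.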
Immigration may have also played a role in increasing inequality in the
United States, especially for the lower half of the earnings distribution (which
is measured by the 50-10 differential). The number of immigrants entering
the United States over the 1966-2000 period (either legal or illegal) was siz-
able (over 30 million). These immigrants had low levels of education on
average and they increased the number of high school dropouts much more
than the number of college graduates.

Shifts in the Relative Demand for Skill

Given this outward shift of the relative supply curve, economists have turned
to shifts of the relative demand curve as an explanation for the 1980-2000
increase in the skill premium. The relative demand curve for skill represents
the relative aggregate demand for skill by all firms in the United States. Two
basic factors can cause the relative demand curve to shift out:

i) a relative reduction in the output price for goods produced by firms
that employ relatively more unskilled labor, or

ii) an increase in the relative productivity of skilled labor versus unskilled
labor (skill biased technological change).

Most firms hire both skilled and unskilled labor. However, firms in certain
industries hire more of one type than the other. Manufacturing firms (e.g.
firms manufacturing steel, automobiles, etc.) generally hire more unskilled
workers than many service firms (e.g. firms producing health care, educa-
tion, or financial services). In addition, unskilled workers in manufacturing
firms have higher wages on average than unskilled workers in firms in other
industries.
It has been argued that one of the reasons the skill premium increased over
the 1980-2000 period is that international competition decreased the output
price of manufactured goods relative to the output price of other goods (e.g.
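service goods). As we studied in the labor demand section, a reduction in
the output price causes firms to reduce labor demand. Because manufacturing
firms employ relatively more unskilled labor, the reduction in the price of
manufactured goods in the United States lowered the demand for unskilled
labor relative to skilled labor, shifting the relative demand curve for
skill out.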
Economists have also argued that skill biased technological change
shifted the relative demand curve for skill out. Over the 1980-2000 period,
computers, manufacturing automation, and other technologies increased the
productivity of skilled workers relative to unskilled workers. As the relative
productivity of skilled workers increased, the relative demand curve for skilled
workers shifted out.
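
The supply and demand shifts discussed above can be illustrated with a minimal numerical sketch. The log-linear functional form, the elasticity values, and the shift parameters below are all assumptions chosen for illustration; none of them come from the lecture notes or from data.

```python
def equilibrium_log_premium(demand_shift, supply_shift,
                            demand_elast=1.5, supply_elast=0.5):
    """Log relative wage ln(W_skilled / W_unskilled) where the curves cross.

    Assumed (hypothetical) log-linear curves, with w the relative wage:
      relative demand: ln(L_s / L_u) = demand_shift - demand_elast * ln(w)
      relative supply: ln(L_s / L_u) = supply_shift + supply_elast * ln(w)
    Setting the two equal and solving for ln(w) gives the formula below.
    """
    return (demand_shift - supply_shift) / (demand_elast + supply_elast)


# Baseline equilibrium.
base = equilibrium_log_premium(demand_shift=1.0, supply_shift=0.0)

# Relative supply shifts out while demand is fixed: the premium falls.
supply_out = equilibrium_log_premium(demand_shift=1.0, supply_shift=0.4)

# Relative demand also shifts out, by more than supply: the premium rises.
both_out = equilibrium_log_premium(demand_shift=2.0, supply_shift=0.4)

print(base, supply_out, both_out)
```

In this sketch the skill premium rises only when the outward demand shift is larger than the outward supply shift, which is the pattern the demand-side explanations are meant to account for.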

12.4 Earnings Differences for Sub-Groups

12.4.1 Gender

The well known gender gap in earnings can be measured by the ratio of
average earnings for women versus men:

$$\hat{G}_{wm} = \frac{\overline{W}_{women}}{\overline{W}_{men}}.$$

A value of $\hat{G}_{wm} = 1$ indicates men and women have the same average
earnings. A value of $\hat{G}_{wm} < 1$ indicates women are paid on average
less than men.
In 2000, the gender gap was about $\hat{G}_{wm} = 0.75$. Average women’s wages
were 75 percent of average men’s wages. This gender gap has declined over
time, especially after 1980. In the 1960s, the gender gap was larger at
$\hat{G}_{wm} = 0.6$.
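
As a small illustration of the ratio above, the sketch below computes an earnings gap from two group averages. The function name and the hourly wage figures are hypothetical, chosen only so that the ratio matches the 0.75 value quoted for 2000.

```python
def earnings_ratio(mean_group, mean_reference):
    """Ratio of one group's average earnings to a reference group's average."""
    if mean_reference <= 0:
        raise ValueError("reference average earnings must be positive")
    return mean_group / mean_reference


# Hypothetical average hourly wages (not taken from the notes).
mean_women = 15.0
mean_men = 20.0

g_wm = earnings_ratio(mean_women, mean_men)
print(g_wm)
```

A value below 1 means the first group earns less on average than the reference group, and the same function applies to any pair of groups.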

12.4.2 Race

Similar to the gender gap, we can define several racial gaps in earnings. Let’s
measure the gap in earnings between blacks and whites using this ratio of
average earnings:

$$\hat{G}_{bw} = \frac{\overline{W}_{black}}{\overline{W}_{white}}.$$

A value of $\hat{G}_{bw} = 1$ indicates blacks and whites have the same average
earnings. A value of $\hat{G}_{bw} < 1$ indicates blacks are paid on average
less than whites.
In 2000, the black-white earnings gap was about $\hat{G}_{bw} = 0.8$. Average
black wages were 80 percent of average white wages. This black-white earnings
gap has declined somewhat over time. In the 1960s, the black-white earnings gap
was larger at $\hat{G}_{bw} = 0.7$.

12.5 Explaining Earnings Differences between Sub-Groups

The two competing explanations for the gender and black-white earnings
gaps are i) human capital differences and ii) discrimination. To the extent
that these groups have different levels of human capital, these earnings gaps
may be due to these human capital differences. The discrimination expla-
nation posits that even if individuals have the same level of human capital
and productivity, the wages of women and minorities are lower because of
discrimination. Another way to express this is that the return on the human
capital for women or minorities is lower than the return on the human capital
for men and whites.
These two explanations may not be unrelated. As I discuss in more detail
below, discrimination in the labor market may cause groups discriminated
against to reduce their investment in human capital.

12.5.1 Decomposition Analysis

One potentially useful approach to examining differences in sub-group
earnings is a decomposition analysis. In this analysis, we want to decompose the
observed earnings differences into a part that can be “explained” by differ-
ences in measurable human capital levels and a part that cannot be explained
by human capital.
For simplicity, again assume there are two levels of human capital: college
and non-college. The expected wage for any two groups (A and B) is given
by

$$E[W_A] = E[W_A \mid college] \cdot P^A_{col} + E[W_A \mid high] \cdot (1 - P^A_{col}),$$

$$E[W_B] = E[W_B \mid college] \cdot P^B_{col} + E[W_B \mid high] \cdot (1 - P^B_{col}),$$

where $E[W_j \mid college]$ is the expected wage of college graduates for group
$j \in \{A, B\}$, $E[W_j \mid high]$ is the expected wage of non-college
graduates, $P^j_{col}$ is the proportion of group $j$ with a college degree,
and $(1 - P^j_{col})$ is the proportion of group $j$ without a college degree.
We can write the difference in the expected wages between the two groups
as

$$E[W_A] - E[W_B] = E[W_A \mid college](P^A_{col} - P^B_{col}) + \eta,$$

where $\eta$ collects all the other remaining terms.


The two parts of the difference in expected wages are

i) the difference in human capital levels $(P^A_{col} - P^B_{col})$, evaluated at the
expected college graduate wage for group A, $E[W_A \mid college]$, and

ii) the residual difference represented by η.
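
The notes do not spell out η. One way to see what it contains, assuming only the two expected-wage equations above, is to subtract the second from the first and then add and subtract $E[W_A \mid college] \, P^B_{col}$:

```latex
\begin{align*}
E[W_A] - E[W_B]
  &= E[W_A \mid college]\, P^A_{col} + E[W_A \mid high]\,(1 - P^A_{col}) \\
  &\quad - E[W_B \mid college]\, P^B_{col} - E[W_B \mid high]\,(1 - P^B_{col}) \\
  &= E[W_A \mid college]\,(P^A_{col} - P^B_{col}) + \eta, \\[4pt]
\eta
  &= \bigl(E[W_A \mid college] - E[W_B \mid college]\bigr)\, P^B_{col} \\
  &\quad + E[W_A \mid high]\,(1 - P^A_{col}) - E[W_B \mid high]\,(1 - P^B_{col}).
\end{align*}
```

Written this way, η mixes differences in wages conditional on schooling with the non-college terms, so it is a residual rather than a clean measure of any single factor.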

Estimating the Decomposition

With a random sample, we can estimate each of the components in the
decomposition.
Let’s examine the difference in average wages between black and white
workers. Here are our descriptive statistics (wages are hourly wages):

$$\overline{W}_{black} = 16.80$$

$$\overline{W}_{white} = 21.00$$

$$\hat{P}^{black}_{col} = 0.17$$

$$\hat{P}^{white}_{col} = 0.27$$

$$\overline{W}_{white,col} = 28$$

The decomposition is

$$21 - 16.8 = 28 \times (0.27 - 0.17) + \eta$$

$$4.2 = 2.8 + \eta$$

The decomposition indicates that about 2/3 (2.8/4.2 = 2/3) of the dif-
ference in black and white wages can be attributed to differences in human
capital levels. The remaining 1/3 is attributed to η, which arguably reflects
discrimination.
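
The arithmetic of this decomposition can be reproduced with a short sketch. The function and variable names are illustrative assumptions; the numbers are the descriptive statistics given above, with whites as group A and blacks as group B.

```python
def decompose_gap(mean_a, mean_b, p_col_a, p_col_b, mean_a_college):
    """Split the wage gap mean_a - mean_b into an explained part and a residual.

    explained = E[W_A | college] * (P_col^A - P_col^B)
    residual  = eta, everything not captured by the explained part
    """
    total = mean_a - mean_b
    explained = mean_a_college * (p_col_a - p_col_b)
    residual = total - explained
    return total, explained, residual


total, explained, residual = decompose_gap(
    mean_a=21.00,         # average white hourly wage
    mean_b=16.80,         # average black hourly wage
    p_col_a=0.27,         # share of whites with a college degree
    p_col_b=0.17,         # share of blacks with a college degree
    mean_a_college=28.0,  # average wage of white college graduates
)
explained_share = explained / total

# Rounded for display because of floating-point arithmetic.
print(round(total, 2), round(explained, 2), round(residual, 2))
print(round(explained_share, 3))
```

The explained share is the fraction attributed to differences in college attainment, and the residual is the η term.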

12.5.2 Problems with Decomposition Analysis

Should we believe that only 1/3 of the difference in black and white wages is
due to discrimination (η)? The decomposition analysis is intended to answer
the following question: What would the black-white wage gap be if blacks
had the same level of human capital as whites?
However, as we have discussed at several points, human capital levels
(e.g. college degrees) are not randomly assigned. If discrimination in the
labor market reduces the return to a college degree for blacks relative to
whites, then part of the difference in human capital levels should also be
attributed to discrimination. That is, our estimate that about two-thirds of the
difference in black-white wages is due to differences in human capital is an
over-estimate.
Another important caveat to the decomposition analysis is that it only
uses observed human capital. There may be many other differences in
unobserved human capital between blacks and whites and men and women. In
several studies, economists have found that including more measures of other
forms of human capital (e.g. tenure on the job, occupation, college major,
etc.) reduces differences in earnings even further. However, the same prob-
lem arises here as with schooling human capital. Discrimination in the labor
market may be reducing the returns to these other forms of human capital as
well and thereby causing some of the differences in observed human capital
levels.

12.6 Discrimination

12.6.1 Taste Discrimination

Taste discrimination is the result of a preference for one group over another.
Taste discrimination occurs when someone receives a higher utility (non-
pecuniary benefits) from interactions with members of a particular group
(e.g. whites prefer to interact with whites). This prejudice can take several
forms:

i) Employers prefer employing some workers over others.

ii) Workers prefer working with some co-workers over others.

iii) Customers prefer purchasing products from some workers or some
firms over others.

Taste Discrimination as a “Tax” on Wages

The key aspect of taste discrimination is that the prejudice stems from
a non-pecuniary preference. But this non-pecuniary preference is reflected
in wage rates. Take the example of taste discrimination of whites against
blacks. The preference of employers, co-workers, or customers for white
workers essentially imposes a “tax” on the wage of black workers relative
to white workers. This tax lowers the demand for black workers relative to
white workers. The tax shifts the demand curve in for black workers and
pushes out the demand curve for white workers. Fewer black workers are
employed and their wage is now lower than that of white workers.

A Simple Model of Taste Discrimination

Employers, co-workers, and customers receive utility from workers. This
utility is composed of a pecuniary part (the wage paid, which enters as a
cost) and a non-pecuniary part (their preferences for them). Utility from
black workers and white workers is

$$U_b = -W_b - D,$$

$$U_w = -W_w,$$

where $D > 0$ reflects the taste discrimination. $D$ is the tax on black
worker earnings. For black workers to be hired, the utility others receive
from them must be equal to the white utility: $U_b = U_w$. This implies that
the black wage must be lower than the white wage by $D$:

$$W_b = W_w - D$$
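
A minimal numerical sketch of this wage condition follows, treating D as a tax added on top of the wage that an employer perceives when hiring a black worker. The function names and the wage and D values are hypothetical, chosen only for illustration.

```python
def perceived_cost(wage, taste_penalty=0.0):
    """Cost as perceived by a prejudiced employer: wage plus the 'tax' D."""
    return wage + taste_penalty


def indifference_wage(white_wage, d):
    """Black wage at which the employer is indifferent between the two workers."""
    return white_wage - d


w_w = 20.0  # hypothetical white hourly wage
d = 3.0     # hypothetical taste discrimination 'tax'

w_b = indifference_wage(w_w, d)

# At this wage the perceived costs of the two workers are equal.
print(w_b, perceived_cost(w_b, d), perceived_cost(w_w))
```

A larger D widens the wage gap one for one in this simple setup, and D = 0 gives equal wages.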

Compensating Differentials

The effect of taste discrimination on wage differentials can be viewed
within the context of the compensating differentials model we studied earlier.
Black workers must compensate others for the lower level of non-pecuniary
benefits they provide by accepting lower wages. They essentially have to
“bribe” employers through lower wages in order to get hired.

12.6.2 Statistical Discrimination

Statistical discrimination occurs when employers use group stereotypes to
determine wage offers. We already discussed a type of statistical discrimination
in our discussion of signalling theories of human capital. In that model,
employers had imperfect information about worker productivity. Firms used an
observable characteristic, observed schooling, to infer whether an individual
was a high productivity (high ability) type or a low productivity (low
ability) type. Wages were determined by the level of schooling the individual
possessed.
In the same way, employers could use any number of observable charac-
teristics to determine wage levels: gender, race, height, weight, age, etc. The
major difference between the signalling and statistical discrimination models
is that unlike schooling in our signalling model, these physical characteristics
are immutable.

A Simple Model of Statistical Discrimination

Assume there are two groups: group A and group B. All firms know the
average productivity of both groups, qA and qB . Assume group A is more
productive: qA > qB . If firms know nothing else about the workers, they will
offer the following wages. A worker belonging to group A receives WA = qA
and a worker belonging to group B receives WB = qB .
In this model, the high productivity workers in group B are discriminated
against because they belong to the low average productivity group B. In
general, if a worker is below average relative to her group, she benefits. If a
worker is above average relative to her group, she loses.
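
The group-average wage rule can be sketched with made-up numbers. The individual productivity values below are hypothetical; the only thing taken from the model is that each worker is paid her group's average productivity.

```python
from statistics import mean

# Hypothetical individual productivities; group A has the higher average.
productivity = {
    "A": [18.0, 22.0, 26.0],
    "B": [12.0, 16.0, 26.0],
}


def group_wage_offers(productivity_by_group):
    """Each worker is offered her group's average productivity (W_j = q_j)."""
    return {group: mean(values)
            for group, values in productivity_by_group.items()}


wages = group_wage_offers(productivity)

# Wage minus own productivity: positive means the worker gains from
# being pooled with her group, negative means she loses.
gains = {
    group: [wages[group] - q for q in values]
    for group, values in productivity.items()
}

print(wages)
print(gains)
```

In this example the most productive worker in group B has the same productivity as the most productive worker in group A but receives the lower group B wage, while the least productive worker in group B is paid more than her own productivity.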
This model explains gender and racial earnings differences as a lack of
information by firms on the true productivity of individual workers. Women
and minorities receive lower average wage offers by firms because on average
firms believe they are less productive.
We should not interpret the lower perceived average productivity of women
and minorities as reflecting genetics. Instead, it may reflect lower average lev-
els of unobserved human capital for these groups relative to men and whites.
In the case of women, lower perceived average productivity may reflect the
perception that women will be less likely to stay with a firm and make
investments in firm training (e.g. women are more likely to take time off to
take care of their children or a sick parent).

12.7 Affirmative Action

Since the civil rights era in the 1950s and 1960s, most forms of institution-
alized discrimination have been declared illegal in the United States, and
substantial resources have been devoted to the enforcement of these laws.
Gender and racial earnings gaps have not fully closed, however. Since the
1960s, additional policies, called affirmative action policies, have been im-
plemented in an attempt to reduce the remaining inequities. These policies
take several forms, including racial preferences for college admissions and
requirements that a certain percentage of government contracts be filled by
firms that are owned by women or minorities or firms that employ a sufficient
number of women and minorities.
Affirmative action policies are very controversial. Let’s briefly discuss
some major advantages and disadvantages of these policies.

12.7.1 Advantages

1) Lower Earnings and Employment Differences Directly

Affirmative action programs directly transfer resources to women and
minorities through education subsidies and improved labor market conditions.
This can result in a direct reduction in earnings inequality.

2) Change Perceptions of Employers, Co-Workers, and Customers

Increased employment for women and minorities in firms, occupations,
and industries where their representation was previously small may help to
dispel inaccurate stereotypes. This could change the perceptions of
employers, co-workers, and customers and reduce discrimination.

3) Increase Returns to Human Capital

Greater labor market opportunities could increase the return to human
capital investments and cause women and minorities to increase their levels
of human capital investments. By increasing the number of women and
minorities in better paying jobs, affirmative action could also help create
important role models and job networks for women and minorities.

4) Diversity as an Externality

To the extent that diversity benefits others (e.g. affirmative action in
college admissions improves the learning environment), affirmative action
programs may also produce positive externalities.

12.7.2 Disadvantages

1) Mis-Allocate Resources

If we assume that prior to affirmative action programs the labor market
is operating efficiently, then any intervention in the labor market through
affirmative action policies may be a mis-allocation of resources and be
inefficient. For example, it has been argued that affirmative action programs in
schooling shift educational resources to less qualified minority applicants.

2) Undermine Positive Gains

Affirmative action programs may create the perception that the accom-
plishments of women and minorities are undeserved. Affirmative action may
create new stereotypes and increase discrimination, and thereby undermine
the gains under-represented groups have made.

3) Ineffective without Prior Human Capital Investments

As we discussed in the human capital section, complementarities in human
capital production imply that later interventions are less effective in
changing outcomes than early interventions. Affirmative action programs in
the labor market and post-secondary education may be less effective given
that women and minorities have lower levels of prior human capital. A more
effective public policy is to use subsidies to equalize early human capital
differences.
