Drifts and Volatilities: Monetary Policies and Outcomes in The Post WWII U.S
Abstract
For a VAR with drifting coefficients and stochastic volatilities, we present posterior densities for several objects that are of interest for designing and evaluating monetary policy. These include measures of inflation persistence, the natural rate of unemployment, a core rate of inflation, and activism coefficients for monetary policy rules. Our posteriors imply substantial variation of all of these objects for post WWII U.S. data. After adjusting for changes in volatility, persistence of inflation increases during the 1970s then falls in the 1980s and 1990s. Innovation variances change systematically, being substantially larger in the late 1970s than during other times. Measures of uncertainty about core inflation and the degree of persistence covary positively. We use our posterior distributions to evaluate the power of several tests that have been used to test the null of time-invariance of autoregressive coefficients of VARs against the alternative of time-varying coefficients. Except for one test, we find that those tests have low power against the form of time variation captured by our model. That one test also rejects time invariance in the data.
Introduction
This paper extends the model of Cogley and Sargent (2001) to incorporate stochastic volatility and then reestimates it for post World War II U.S. data in order to shed light on the following questions. Have aggregate time series responded via time-invariant linear impulse response functions to possibly heteroskedastic shocks? Or is it more likely that the impulse responses to shocks themselves have evolved over time because of drifting coefficients or other nonlinearities? We present evidence that
shock variances evolved systematically over time, but that so did the autoregressive coefficients of VARs. One of our main conclusions is that much of our earlier evidence for drifting coefficients survives after we take stochastic volatility into account. We use our evidence about drift and stochastic volatility to infer that monetary policy rules have changed and that the persistence of inflation itself has drifted over time.

*For comments and suggestions, we are grateful to Jean Boivin, Marco Del Negro, Mark Gertler, Sergei Morozov, Simon Potter, Christopher Sims, Mark Watson, and Tao Zha.
1.1
The statistical tests of Sims (1980, 1999) and Bernanke and Mihov (1998a, 1998b) seem to affirm a model that contradicts our findings. They failed to reject the hypothesis of time-invariance in the coefficients of VARs for periods and variables like ours. To shed light on whether our results are inconsistent with theirs, we examine the performance of various tests that have been used to detect deviations from time invariance. Except for one, we find that those tests have low power against our particular model of drifting coefficients. And that one test actually rejects time invariance in the data. These results about power help reconcile our findings with those of Sims and Bernanke and Mihov.
1.2
Taylor and Clarida, Gali, and Gertler have compared estimates of monetary policy rules across the Burns and Volcker-Greenspan eras. They find evidence for a systematic change of monetary policy across the two eras, a change that in Clarida, Gali, and Gertler's new-neoclassical-synthesis macroeconomic model would lead to better inflation-unemployment outcomes.

But Taylor's and Clarida, Gali, and Gertler's interpretation of the data has been disputed by Sims (1980, 1999) and Bernanke and Mihov (1998a, 1998b), both of whom have presented evidence that the U.S. data do not prompt rejection of the time invariance of the autoregressive coefficients of a VAR. They also present evidence for shifts in the variances of the innovations to their VARs. If one equation of the VAR is interpreted as describing a monetary policy rule, then Sims's and Bernanke and Mihov's results say that it was not the monetary policy strategy but luck (i.e., the volatility of the shocks) that changed between the Burns and the non-Burns periods.
1.3
The persistence of inflation plays an important role in some widely used empirical strategies for testing the natural rate hypothesis and for estimating the natural unemployment rate. As we shall see, inflation persistence also plays an important role in lending relevance to instruments for estimating monetary policy rules. Therefore, we use our statistical model to portray the evolving persistence of inflation. We define a measure of persistence based on the normalized spectrum of inflation at zero frequency, then show how this measure of persistence increased during the 1960s and 70s and fell during the 1980s and 1990s.
1.4
Drifting coefficients have been an important piece of unfinished business within macroeconomic theory since Lucas played them up in the first half of his 1976 Critique, but then ignored them in the second half.² In Appendix A, we revisit how drifting coefficients bear on the theory of economic policy in the context of recent ideas about self-confirming equilibria. This appendix provides background for a view that helps to bolster the time-invariance view of the data taken by Sims and Bernanke and Mihov.
1.5
Method
We take a Bayesian perspective and report time series of posterior densities for various
economically interesting functions of hyperparameters and hidden states. We use a
Markov Chain Monte Carlo algorithm to compute posterior densities.
² See Sargent (1999) for more about this interpretation of the two halves of Lucas's 1976 paper.
1.6
Organization
The remainder of this paper is organized as follows. Section 2 describes the basic
statistical model that we use to develop empirical evidence. We consign to appendix
B a detailed characterization of the priors and posterior for our model, and appendix
C describes a Markov Chain Monte Carlo algorithm that we use to approximate the
posterior density. Section 3 reports our results, and section 4 concludes. Appendix A
pursues a theme opened in the Lucas Critique about how drifting-coefficient models bear on alternative theories of economic policy.
The object of Cogley and Sargent (2001) was to develop empirical evidence about the
evolving law of motion for inflation and to relate the evidence to stories about changes
in monetary policy rules. To that end, we fit a Bayesian vector autoregression for
inflation, unemployment, and a short term interest rate. We introduced drifting VAR
parameters, so that the law of motion could evolve, but assumed the VAR innovation
variance was constant. Thus, our measurement equation was
$$y_t = X_t' \theta_t + \epsilon_t, \tag{1}$$
where the VAR parameters evolve as a driftless random walk subject to a reflecting barrier,
$$\theta_t = \theta_{t-1} + v_t. \tag{2}$$
Let
$$\theta^T = [\theta_1', \ldots, \theta_T']' \tag{3}$$
represent the history of VAR parameters from dates 1 to $T$. The driftless random walk component is represented by a joint prior,
$$f(\theta^T, Q) = f(\theta^T|Q) f(Q) = f(Q) \prod_{s=0}^{T-1} f(\theta_{s+1}|\theta_s, Q), \tag{4}$$
where
$$f(\theta_{t+1}|\theta_t, Q) \sim N(\theta_t, Q). \tag{5}$$
The innovation $v_t$ is normal with mean zero and variance $Q$, and we allowed for correlation between the state and measurement innovations, $\mathrm{cov}(v_t, \epsilon_t) = C$. The marginal prior $f(Q)$ makes $Q$ an inverse-Wishart variate.
The reflecting barrier was encoded in an indicator function, $I(\theta^T) = \prod_{s=1}^{T} I(\theta_s)$. The function $I(\theta_s)$ takes a value of 0 when the roots of the associated VAR polynomial are inside the unit circle, and it is equal to 1 otherwise. This restriction truncates and renormalizes the random walk prior,
$$p(\theta^T, Q) \propto I(\theta^T) f(\theta^T, Q). \tag{6}$$
This is a stability condition for the VAR, reflecting an a priori belief about the
implausibility of explosive representations for inflation, unemployment, and real interest. The stability prior follows from our belief that the Fed chooses policy rules
in a purposeful way. Assuming that the Fed has a loss function that penalizes the
variance of inflation, it will not choose a policy rule that results in a unit root in
inflation, for that results in an infinite loss.3
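To make the reflecting barrier operational, one needs to evaluate the indicator $I(\theta_s)$ for each draw. A minimal sketch in Python, under assumptions of ours: the function name and the layout of $\theta_s$ (intercepts and lag coefficients stacked equation by equation) are illustrative, and stability is checked through the eigenvalues of the companion matrix, which is equivalent to the root condition stated above.

```python
import numpy as np

def is_stable(theta, n_vars=3, n_lags=2):
    """Return True when the VAR coefficients in theta imply stable roots.

    theta is assumed to stack, equation by equation, an intercept followed
    by the lag-1, ..., lag-p coefficient blocks (layout is ours).
    """
    coeffs = theta.reshape(n_vars, 1 + n_vars * n_lags)
    lag_blocks = coeffs[:, 1:]                      # drop the intercepts
    k = n_vars * n_lags
    companion = np.zeros((k, k))
    companion[:n_vars, :] = lag_blocks              # top block: [A1 A2 ...]
    companion[n_vars:, :-n_vars] = np.eye(k - n_vars)  # identity shift block
    # I(theta) = 1 exactly when all companion eigenvalues are inside the unit circle
    return np.all(np.abs(np.linalg.eigvals(companion)) < 1.0)
```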
In appendix B, we derive a number of relations between the restricted and unrestricted priors. Among other things, the restricted prior for $\theta^T|Q$ can be expressed as
$$p(\theta^T|Q) = \frac{I(\theta^T) f(\theta^T|Q)}{m_\theta(Q)}, \tag{7}$$
the restricted marginal prior for $Q$ as
$$p(Q) = \frac{m_\theta(Q) f(Q)}{m_Q}, \tag{8}$$
and the restricted transition density as
$$p(\theta_{t+1}|\theta_t, Q) \propto I(\theta_{t+1})\, \pi(\theta_{t+1}, Q)\, f(\theta_{t+1}|\theta_t, Q). \tag{9}$$
The terms $m_\theta(Q)$ and $m_Q$ are normalizing constants and are defined in the appendix.⁴ In (7), the stability condition truncates and renormalizes $f(\theta^T|Q)$ to eliminate explosive $\theta$'s. In (8), the marginal prior $f(Q)$ is re-weighted by $m_\theta(Q)$, the probability
³ To take a concrete example, consider the model of Rudebusch and Svensson (1999). Their model consists of an IS curve, a Phillips curve, and a monetary policy rule, and they endow the central bank with a loss function that penalizes inflation variance. The Phillips curve has adaptive expectations with the natural rate hypothesis being cast in terms of Solow and Tobin's unit-sum-of-the-weights form. That form is consistent with rational expectations only when there is a unit root in inflation. The autoregressive roots for the system are not, however, determined by the Phillips curve alone; they also depend on the choice of monetary policy rule. With an arbitrary policy rule, the autoregressive roots can be inside, outside, or on the unit circle, but they are stable under optimal or near-optimal policies. When a shock moves inflation away from its target, poorly chosen policy rules may let it drift, but well-chosen rules pull it back.
⁴ These expressions supersede those given in Cogley and Sargent (2001). We are grateful to Simon Potter for pointing out an error in our earlier work and for suggesting ways to correct it.
of a non-explosive draw from $f(\theta^T|Q)$. This lessens the probability of $Q$-values that are likely to generate explosive $\theta$'s. Since large values of $Q$ make explosive draws more likely, this shifts the prior probability toward smaller values of $Q$. In other words, relative to $f(Q)$, $p(Q)$ is tilted in the direction of less time variation in $\theta$. Finally, in (9), $f(\theta_{t+1}|\theta_t, Q)$ is truncated and re-weighted by $\pi(\theta_{t+1}, Q)$. The latter term represents the probability that random walk paths emanating from $\theta_{t+1}$ will remain in the nonexplosive region going forward in time. Thus, the restricted transition density censors explosive draws from $f(\theta_{t+1}|\theta_t, Q)$ and down-weights those likely to become explosive.⁵
2.1
Sims (2001) and Stock (2001) were concerned that our methods might exaggerate the time variation in $\theta_t$. One comment concerned the distinction between filtered and smoothed estimates. Cogley and Sargent (2001) reported results based on filtered estimates, and Sims pointed out that there is transient variation in filtered estimates even in time-invariant systems. In this paper, we report results based on smoothed estimates of $\theta$.
More importantly, Sims and Stock questioned our assumption that R is constant.
They pointed to evidence developed by Bernanke and Mihov (1998a,b), Kim and
Nelson (1999), McConnell and Perez Quiros (2000), and others that VAR innovation
variances have changed over time. Bernanke and Mihov focused on monetary policy
rules and found a dramatic increase in the variance of monetary policy shocks between
1979 and 1982. Kim and Nelson and McConnell and Perez Quiros studied the growing
stability of the U.S. economy, which they characterize in terms of a large decline in
VAR innovation variances after the mid-1980s. The reason for this decline is the
subject of debate, but there is now much evidence against our assumption of constant
R.
Sims and Stock also noted that there is little evidence in the literature to support our assumption of drifting $\theta$. Bernanke and Mihov, for instance, used a procedure developed by Andrews (1993) to test for shifts in VAR parameters and were unable to reject time invariance. Indeed, their preferred specification was the opposite of ours, with constant $\theta$ and varying $R$.
If the world were characterized by constant $\theta$ and drifting $R$, and we fit an approximating model with constant $R$ and drifting $\theta$, then it seems likely that our estimates of $\theta$ would drift to compensate for misspecification of $R$, thus exaggerating the time variation in $\theta$. Stock suggested that this might account for our evidence on changes in inflation persistence. There is much evidence to support a positive relation between
⁵ The probability that random walk trajectories will leave the nonexplosive region increases with the distance between $t$ and $T$, but this tendency for $\pi(\theta_{t+1}, Q)$ to decrease also affects the normalizing constant for equation (9). What matters is the relative likelihood of future instability, not the absolute likelihood.
the level and variance of inflation, but the variance could be high either because of large innovation variances or because of strong shock persistence. A model with constant $\theta$ and drifting $R$ would attribute the high inflation variance of the 1970s to an increase in innovation variances, while a model with drifting $\theta$ and constant $R$ would attribute it to an increase in shock persistence. If Bernanke and Mihov are right, the evidence on inflation persistence reported in Cogley and Sargent (2001) may be an artifact of model misspecification.
2.2
Of course, it is possible that both the coefficients and the volatilities vary, but most empirical models focus on one or the other. In this paper, we develop an empirical model that allows both to vary. We use the model to consider the extent to which drift in $R$ undermines our evidence on drift in $\theta$, and also to conduct power simulations for the Andrews-Bernanke-Mihov test. Their null hypothesis, which they were unable to reject, was that $\theta$ is time invariant. Whether this constitutes damning evidence against our vision of the world depends on the power of the test. Their evidence would be damning if the test reliably rejected a model like ours, but not so damning otherwise.
To put both elements in motion, we retain much of the specification described above, but now we assume that the VAR innovations can be expressed as
$$\epsilon_t = R_t^{1/2} \xi_t, \tag{10}$$
where $\xi_t$ is a standard normal vector that is independent of the parameter innovations,
$$E(\xi_t v_s') = 0. \tag{11}$$
To model the drifting innovation variance, we factor $R_t$ as⁶
$$R_t = B^{-1} H_t B^{-1\prime}, \tag{12}$$
where $H_t$ is diagonal,
$$H_t = \begin{pmatrix} h_{1t} & 0 & 0 \\ 0 & h_{2t} & 0 \\ 0 & 0 & h_{3t} \end{pmatrix}, \tag{13}$$
and $B$ is lower triangular with units on the diagonal,
$$B = \begin{pmatrix} 1 & 0 & 0 \\ \beta_{21} & 1 & 0 \\ \beta_{31} & \beta_{32} & 1 \end{pmatrix}. \tag{14}$$
The stochastic volatilities evolve as driftless geometric random walks,
$$\ln h_{it} = \ln h_{i,t-1} + \sigma_i \eta_{it}, \tag{15}$$
where the volatility innovations are independent standard normal variates,
$$\eta_{it} \sim N(0, 1). \tag{16}$$
Let
$$H^T = \begin{pmatrix} h_{11} & h_{21} & h_{31} \\ \vdots & \vdots & \vdots \\ h_{1T} & h_{2T} & h_{3T} \end{pmatrix} \tag{17}$$
denote the history of stochastic volatilities. Our goal is to simulate the joint posterior density
$$p(\theta^T, Q, \beta, \sigma, H^T \,|\, Y^T), \tag{18}$$
where $\beta = (\beta_{21}, \beta_{31}, \beta_{32})$ and $\sigma = (\sigma_1, \sigma_2, \sigma_3)$.

⁶ This formulation is closely related to the multi-factor stochastic volatility models of Aguilar and West (2001), Jacquier, Polson, and Rossi (1999), and Pitt and Shephard (1999).
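To make equations (10)-(15) concrete, the following sketch simulates the volatility block and assembles $R_t$; all parameter values are illustrative placeholders, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 3
sigma = np.array([0.04, 0.04, 0.04])     # placeholder volatility-of-volatility
B = np.array([[1.0,  0.0, 0.0],          # lower triangular, units on diagonal
              [0.3,  1.0, 0.0],
              [0.1, -0.2, 1.0]])
B_inv = np.linalg.inv(B)

ln_h = np.log(np.full(n, 1e-4))          # placeholder initial log volatilities
R = np.empty((T, n, n))
eps = np.empty((T, n))
for t in range(T):
    ln_h = ln_h + sigma * rng.standard_normal(n)   # eq (15): geometric random walk
    H_t = np.diag(np.exp(ln_h))                    # eq (13): diagonal volatilities
    R[t] = B_inv @ H_t @ B_inv.T                   # eq (12): factor structure
    # eq (10): epsilon_t = R_t^{1/2} xi_t, using a Cholesky square root
    eps[t] = np.linalg.cholesky(R[t]) @ rng.standard_normal(n)
```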
3
Empirical Results
3.1
Data
In order to focus on the influence of drift in $R$, we use the same data as in our earlier paper. Inflation is measured by the CPI for all urban consumers, unemployment by the civilian unemployment rate, and the nominal interest rate by the yield on 3-month Treasury bills. Inflation and unemployment data are quarterly and seasonally adjusted, and Treasury bill data are the average of daily rates in the first month of each quarter. The sample spans the period 1948.Q1 to 2000.Q4. We work with VAR(2) representations for nominal interest, inflation, and the logit of unemployment.
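For concreteness, a sketch of the transformation into VAR(2) form; the function name and layout are ours, and the unemployment rate is assumed expressed as a fraction so the logit is well defined.

```python
import numpy as np

def build_var_data(infl, unemp, tbill, n_lags=2):
    """Stack y_t with unemployment in logits, plus a constant and p lags.

    The variable ordering here is illustrative; the paper considers all
    orderings of (i, u, pi).
    """
    u_logit = np.log(unemp / (1.0 - unemp))        # logit of unemployment rate
    y = np.column_stack([tbill, u_logit, infl])
    Y = y[n_lags:]                                 # left-hand side, dates p+1..T
    lags = [y[n_lags - j - 1 : len(y) - j - 1] for j in range(n_lags)]
    X = np.column_stack([np.ones(len(Y))] + lags)  # constant plus p lag blocks
    return Y, X
```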
3.2
Priors
The hyperparameters and initial states are assumed to be independent across blocks, so that the joint prior can be expressed as the product of marginal priors,
$$f(\theta_0, h_{10}, h_{20}, h_{30}, Q, \beta, \sigma_1, \sigma_2, \sigma_3) = f(\theta_0) f(h_{10}) f(h_{20}) f(h_{30}) f(Q) f(\beta) f(\sigma_1) f(\sigma_2) f(\sigma_3). \tag{19}$$
The prior for the initial VAR states is Gaussian,
$$f(\theta_0) = N(\bar\theta, \bar P). \tag{20}$$
The mean and variance of the Gaussian piece are calibrated by estimating a time-invariant vector autoregression using data for 1948.Q3-1958.Q4. The mean, $\bar\theta$, is set equal to the point estimate, and the variance, $\bar P$, is its asymptotic variance. Because the initial estimates are based on a short stretch of data, the location of $\theta_0$ is only weakly restricted.
The matrix $Q$ is a key parameter because it governs the rate of drift in $\theta$. We adopt an informative prior for $Q$, but we set its parameters to maximize the weight that the posterior puts on sample information. Our prior for $Q$ is inverse-Wishart,
$$f(Q) = IW(\bar Q^{-1}, T_0), \tag{21}$$
with degrees of freedom $T_0$ and scale matrix $\bar Q$. To calibrate $\bar Q$, we assume that it is proportional to the asymptotic variance of the initial estimates,
$$\bar Q = \gamma^2 \bar P. \tag{23}$$
Finally, the prior for $\sigma_i^2$ is inverse gamma with a single degree of freedom,
$$f(\sigma_i^2) = IG\left(\frac{.01^2}{2}, \frac{1}{2}\right). \tag{26}$$
3.3
3.4
Large values of $Q$ mean rapid movements in $\theta$, smaller values imply a slower rate of drift, and $Q = 0$ represents a time-invariant model. The following table addresses two questions: whether the results are sensitive to the VAR ordering, and how the stability prior influences the rate of drift in $\theta$.
Table 1: Posterior Mean Estimates of Q

                 Stability Imposed      Stability Not Imposed
VAR Ordering     tr(Q)     max(λ)       tr(Q)     max(λ)
i, π, u          0.055     0.025        0.056     0.027
i, u, π          0.047     0.023        0.059     0.031
π, i, u          0.064     0.031        0.082     0.044
π, u, i          0.062     0.031        0.088     0.051
u, i, π          0.057     0.026        0.051     0.028
u, π, i          0.055     0.024        0.072     0.035

Note: The headings tr(Q) and max(λ) refer to the trace of Q and to the largest eigenvalue.
Sims (1980) reported that the ordering of variables in an identified VAR mattered for a comparison of interwar and postwar business cycles. In particular, for one ordering he found minimal changes in the shape of impulse response functions, with most of the difference between interwar and postwar cycles being due to a reduction in shock variances. He suggested to us that the ordering of variables might matter in our model too because of the way VAR innovation variances depend on the stochastic volatilities. In our specification, the first and second variables share common sources of stochastic volatility with the other variables, but the third variable has an independent source of volatility. Shuffling the variables might alter estimates of VAR innovation variances.

Accordingly, we estimated all possible orderings to see whether there exists an ordering that mutes evidence for drift in $\theta$, as in Sims (1980). This seems not to be the case. With the stability condition imposed (our preferred specification), there are only minor differences in posterior estimates of $Q$. The ordering that minimizes the rate of drift in $\theta$ is $[i_t, u_t, \pi_t]'$, and the remainder of the paper focuses on this specification. This is conservative for our perspective, but results for the other orderings are similar.
The second question concerns how the stability prior influences drift in $\theta$. One might conjecture that the stability constraint amplifies evidence for drift in $\theta$ by pushing the system away from the unit root boundary, forcing the model to fit inflation persistence via shifts in the mean. Again, this seems not to be the case; posterior mean estimates for $Q$ are smaller when the stability condition is imposed. Withdrawing the stability prior increases the rate of drift in $\theta$.
The next table explores the structure of drift in $\theta$, focusing on the minimum-$Q$ ordering $[i, u, \pi]'$. Sargent's (1999) learning model predicts that reduced form parameters should drift in a highly structured way, because of the cross-equation restrictions associated with optimization and foresight. A formal treatment of cross-equation restrictions with parameter drift is a priority for future work. Here we report some preliminary evidence based on the principal components of $Q$.
Table 2: Principal Components of Q

          Variance   Cumulative Fraction of Total Variation
1st PC    0.0230     0.485
2nd PC    0.0165     0.832
3rd PC    0.0054     0.945
4th PC    0.0008     0.963
5th PC    0.0007     0.978

Note: The second column reports the variance of the nth component (the nth eigenvalue of Q), and the third states the fraction of the total variation (trace of Q) for which the first n components account. The results refer to the minimum-Q ordering [i, u, π]'.
The table confirms that drift in $\theta$ is highly structured. There are 21 free parameters in a trivariate VAR(2) model, but only three linear combinations vary significantly over time. The first principal component accounts for almost half the total variation, the first two components jointly account for more than 80 percent, and the first three account for roughly 95 percent. These components load most heavily on lags of nominal interest and unemployment in the inflation equation; they differ in the relative weights placed on various lags. The remaining principal components, and the coefficients in the nominal interest and unemployment equations, are approximately time invariant. Thus the model's departure from time invariance is not as great as it first may seem. There are two or three drifting components in $\theta$ that manifest themselves in a variety of ways.
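The decomposition in Table 2 amounts to an eigendecomposition of the posterior mean of $Q$; a minimal sketch, assuming `Q_hat` holds the 21 × 21 posterior mean (the function name is ours):

```python
import numpy as np

def principal_components_of_Q(Q_hat):
    """Eigenvalues of Q in descending order and cumulative shares of tr(Q).

    The cumulative shares correspond to the third column of Table 2.
    """
    eigvals, eigvecs = np.linalg.eigh(Q_hat)     # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    cumulative = np.cumsum(eigvals) / np.trace(Q_hat)
    return eigvals, eigvecs[:, order], cumulative
```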
3.5
The Evolution of $R_t$
Next we consider evidence on the evolution of $R_t$. Figure 1 depicts the posterior mean of $R_t$ for the minimal-$Q$ ordering $[i, u, \pi]'$. The left-hand column portrays standard deviations for VAR innovations, expressed in basis points at quarterly rates, and the right-hand column shows correlation coefficients.
Figure 1: The evolution of $R_t$. Left column: innovation standard deviations for nominal interest, inflation, and unemployment (basis points at quarterly rates); right column: correlation coefficients (interest-unemployment, interest-inflation, inflation-unemployment), 1960-2000.
The unemployment innovation was most strongly correlated with the other innovations in the early 1980s. At other times, the unemployment innovation was virtually orthogonal to the others. Inflation and nominal interest innovations were positively correlated throughout the sample, with the maximum degree of correlation again occurring in the early 1980s.
This correlation pattern has some bearing on one strategy for identifying monetary policy shocks. McCallum (1999) has argued that monetary policy rules should be specified in terms of lagged variables, on the grounds that the Fed lacks good current-quarter information about inflation, unemployment, and other target variables. This is especially relevant for decisions early in the quarter. If the Fed's policy rule depends only on lagged information, then it can be cast as the nominal interest equation in a VAR. Among other things, this means that nominal interest innovations are policy shocks and that correlations among VAR innovations represent unidirectional causation from policy shocks to the other variables.

The signs of the correlations in figure 1 suggest that this interpretation is problematic for our VAR. If nominal interest innovations were indeed policy shocks, conventional wisdom suggests they should be inversely correlated with inflation and positively correlated with unemployment, the opposite of what we find. A positive correlation with inflation and a negative correlation with unemployment suggests a policy reaction. There must be some missing information.⁹
Finally, figure 2 reports the total prediction variance, $\log|E(R_{t|T})|$. Following Whittle (1953), we interpret this as a measure of the total uncertainty entering the system at each date.
Figure 2: Total prediction variance, $\log|E(R_{t|T})|$, 1960-2000.
⁹ Two possibilities come to mind. There may be omitted lagged variables, so that the nominal interest innovation contains a component that is predictable based on a larger information set. The Fed may also condition on current-quarter reports of commodity prices or long-term bond yields that are correlated with movements in inflation or unemployment.
The smoothed estimates shown here are similar to the filtered estimates reported
in our earlier paper. Both suggest a substantial increase in short-term uncertainty
between 1965 and 1981 and an equally substantial decrease thereafter. The increase
in uncertainty seems to have happened in two steps, one occurring between 1964
and 1972 and the other between 1977 and 1981. Most of the subsequent decrease
occurred in the mid-1980s, during the latter years of Volcker's term. This picture
suggests that the growing stability of the economy may reflect a return to stability,
though the earlier period of stability proved to be short-lived.
3.6
The Evolution of $\theta_t$

The first set of figures depicts movements in core inflation and the natural rate of unemployment, which are estimated from local linear approximations to mean inflation and unemployment, evaluated at the posterior mean, $E(\theta_{t|T})$. Write (1) in companion form as
$$z_t = \mu_{t|T} + A_{t|T} z_{t-1} + u_t, \tag{27}$$
where $z_t$ consists of current and lagged values of $y_t$, $\mu_{t|T}$ contains the intercepts in $E(\theta_{t|T})$, and $A_{t|T}$ contains the autoregressive parameters. By analogy with a time-invariant model, mean inflation at $t$ can be approximated by
$$\bar\pi_t = s_\pi (I - A_{t|T})^{-1} \mu_{t|T}, \tag{28}$$
where $s_\pi$ is a row vector that selects inflation from $z_t$. Similarly, mean unemployment can be approximated as
$$\bar u_t = s_u (I - A_{t|T})^{-1} \mu_{t|T}. \tag{29}$$
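Equations (27)-(29) translate directly into code; a sketch, assuming `mu_t` and `A_t` hold the companion-form intercepts and autoregressive matrix at the smoothed posterior mean (names and the variable ordering are ours):

```python
import numpy as np

def steady_state_means(mu_t, A_t):
    """Local-to-date-t approximation of mean inflation and unemployment.

    mu_t : (k,) companion-form intercepts at E(theta_{t|T})
    A_t  : (k, k) companion-form autoregressive matrix at E(theta_{t|T})
    The ordering of z_t is assumed to be (i, u, pi) plus their lags; if
    unemployment enters in logits, map the result back with the logistic
    function.
    """
    k = len(mu_t)
    zbar = np.linalg.solve(np.eye(k) - A_t, mu_t)   # (I - A)^{-1} mu
    s_pi = np.zeros(k); s_pi[2] = 1.0               # selects inflation
    s_u = np.zeros(k); s_u[1] = 1.0                 # selects unemployment
    return s_pi @ zbar, s_u @ zbar                  # eqs (28) and (29)
```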
Figure 3 shows the results, and two features deserve comment. First, core inflation begins around 1.5 percent in the early 1960s, rises to a peak of approximately 8 percent in the late 1970s, and then falls to a range of 2.5 to 3.5 percent through most of the 1980s and 1990s. The natural rate of unemployment also rises in the late 1960s and 1970s and falls after 1980.
Second, it remains true that movements in $\bar\pi_t$ and $\bar u_t$ are highly correlated with one another, in accordance with the predictions of Parkin (1993) and Ireland (1999). The unconditional correlation is 0.748.
Figure 3: Core inflation and the natural rate of unemployment, 1960-2000.
To assess the statistical significance of these movements, we compute a covariance matrix for the path of core inflation by the delta method,
$$V_{\bar\pi} = \left(\frac{\partial \bar\pi}{\partial \theta}\right) V_\theta \left(\frac{\partial \bar\pi}{\partial \theta}\right)'. \tag{30}$$
$V_\theta$ is the $KT \times KT$¹⁰ covariance matrix for $\theta^T$, and $\partial\bar\pi/\partial\theta$ is the $T \times KT$ matrix of partial derivatives of the function that maps VAR parameters into core inflation, evaluated at the posterior mean of $\theta^T$. The posterior covariance $V_\theta$ is estimated from the ensemble of Metropolis draws, and derivatives were calculated numerically.¹¹

$V_{\bar\pi}$ is a large object, and we need a tractable way to represent the information it contains. Sims and Zha recommend error bands based on the first few principal
¹⁰ $K$ is the number of elements in $\theta_t$, and $T$ represents the number of years. We focused on every fourth observation to keep $V_{\bar\pi}$ to a manageable size.
¹¹ This roundabout method for approximating $V_{\bar\pi}$ was used because the direct estimate was contaminated by a few outliers, which dominated the principal components decomposition on which Sims-Zha bands are based. The outliers may reflect shortcomings of our linear approximations near the unit root boundary.
components,¹² which take the form
$$\bar\pi_t \pm 2\lambda_i^{1/2} W_i, \tag{31}$$
where $\lambda_i$ is the variance of the $i$th principal component and $W_i$ is the $i$th column of the matrix of eigenvectors $W$.
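The bands in (31) follow from an eigendecomposition of the posterior covariance; a minimal sketch, assuming `V` is the covariance matrix for the path of core inflation and `pibar` its posterior mean:

```python
import numpy as np

def pc_error_bands(pibar, V, n_components=6):
    """Sims-Zha style bands: mean +/- 2 * sqrt(lambda_i) * W_i per component."""
    eigvals, W = np.linalg.eigh(V)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest components first
    bands = []
    for i in order:
        half_width = 2.0 * np.sqrt(eigvals[i]) * W[:, i]   # eq (31)
        bands.append((pibar + half_width, pibar - half_width))
    return bands
```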
Table 3 reports the cumulative proportion of the total variation for which the principal components account. The second column refers to $V_{\bar\pi}$, and the third column decomposes the covariance matrix for the natural rate, $V_{\bar u}$. The other columns are discussed below.

One interesting feature is the number of non-trivial components. The first principal component in $V_{\bar\pi}$ and $V_{\bar u}$ accounts for 40 to 50 percent of the total variation, and the first 5 jointly account for about 75 percent. This suggests an important departure from time invariance. In a time-invariant model, there would be a single factor representing uncertainty about the location of the terminal estimate, but smoothed estimates going backward in time would be perfectly correlated with the terminal estimate and would contribute no additional uncertainty.¹³

¹² If the elements of $\bar\pi_t$ were uncorrelated across $t$, it would be natural to focus instead on the diagonal elements of $V_{\bar\pi}$, e.g. by graphing the posterior mean plus or minus two standard errors at each date. But $\bar\pi_t$ is serially correlated, and Sims and Zha argue that a collection of principal components bands better represents the shape of the posterior in such cases.
¹³ Setting $Q = 0$ in the Kalman filter implies $P_{t+1|t} = P_{t|t}$. Then the covariance matrix in the backward recursion of the Gibbs sampler would be $P_{t|t+1} = 0$, implying a perfect correlation between draws of $\theta_{t+1}$ and $\theta_t$.

$V_{\bar\pi}$ would be a $T \times T$ matrix
with rank one, and the single principal component would describe uncertainty about
the terminal location. In a nearly time-invariant model, i.e. one with small Q, the
path to the terminal estimate might wiggle a little, but one would still expect uncertainty about the terminal estimate to dominate. That the first component accounts
for a relatively small fraction of the total suggests there is also substantial variation
in the shape of the path.
Error bands for core inflation are shown in figure 4. The central dotted line
is the posterior mean estimate, reproduced from figure 3. The horizontal line is a
benchmark, end-of-sample, time-invariant estimate of mean inflation.
The first principal component, which accounts for roughly half the total variation,
describes uncertainty about the location of core inflation in the late 1960s and 1970s.
As core inflation increased, so too did uncertainty about the mean, and by the end of
the decade a two-sigma band ranged from 2 to 14 percent. The growing uncertainty
about core inflation seems to be related to changes in inflation persistence. Core
inflation can be interpreted as a long-horizon forecast, and the variance of long-horizon forecasts depends positively on the degree of persistence. As shown below,
inflation also became more persistent as core inflation rose. Indeed, our estimates of
inflation persistence are highly correlated with the width of the first error band.
Components 3 through 5 portray uncertainty about the number of local peaks in
the 1970s, and they jointly account for about 15 percent of the total variation. Bands
for these components cross several times, a sign that some paths had more peaks than
others. For example, in panel 3, trajectories associated with a global peak at the end
of the 1970s tended also to have a local peak at the end of the 1960s. In contrast,
paths that reached a global peak in the mid-1970s tended to have a single peak.
Finally, the sixth component loads heavily on the last few years in the sample,
describing uncertainty about core inflation in the late 1990s. At the end of 2000, a
two-sigma band for this component ranged from approximately 1 to 5 percent.
Error bands for the natural rate are constructed in the same way, and they are shown in figure 5. Once again, the central dotted line is the posterior mean estimate, and the horizontal line is an end-of-sample, time-invariant estimate of mean unemployment. The first principal component in $V_{\bar u}$ also characterizes uncertainty about
the 1970s. The error band widens in the late 1960s when the natural rate began to
rise, and it narrows around 1980 when the mean estimate fell. The band achieved
its maximum width around the time of the oil shocks, when it ranged from roughly
4 to 11 percent. The width of this band also seems to be related to changes in the
persistence of shocks to unemployment.
The second, third, and fourth components load heavily on the other years of
the sample, jointly accounting for about 30 percent of the total variation. Roughly
speaking, they cover intervals of plus or minus 1 percentage point around the mean.
The fifth and sixth components account for 8 percent of the variation, and they seem
to be related to uncertainty about the timing and number of peaks in the natural
rate.
Figure 4: Error bands for core inflation based on principal components 1 through 6 of $V_{\bar\pi}$, 1960-2000.

Figure 5: Error bands for the natural rate of unemployment based on principal components 1 through 6 of $V_{\bar u}$, 1960-2000.
3.6.2
Inflation Persistence
We measure inflation persistence using the spectrum for inflation, approximated at each date from the smoothed VAR parameters and innovation variance,
$$f_\pi(\omega, t) = \frac{1}{2\pi}\, s_\pi \left(I - A_{t|T} e^{-i\omega}\right)^{-1} E(R_{t|T}) \left(I - A_{t|T} e^{-i\omega}\right)^{-1\prime} s_\pi'. \tag{32}$$
Figure 6: The spectrum for inflation, $f_\pi(\omega, t)$, by frequency and year, 1960-2000.
Again, the estimates are similar to those reported in Cogley and Sargent (2001).
The introduction of drift in Rt does not undermine our evidence on variation in the
spectrum for inflation.
The most significant feature of this graph is the variation over time in the magnitude of low frequency power. In our earlier paper, we interpreted the spectrum at zero as a measure of inflation persistence. Here that interpretation is no longer quite right, because variation in low-frequency power depends not only on drift in the autoregressive parameters, $A_{t|T}$, but also on movements in the innovation variance, $E(R_{t|T})$. In this case, the normalized spectrum,
$$g_\pi(\omega, t) = \frac{f_\pi(\omega, t)}{\int_{-\pi}^{\pi} f_\pi(\omega, t)\, d\omega}, \tag{33}$$
is a better measure of persistence.
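For concreteness, (32) and (33) can be evaluated numerically at each date; a sketch, with the integral in (33) approximated on a finite frequency grid (function and variable names are ours):

```python
import numpy as np

def normalized_spectrum_at_zero(A_t, R_t, s_pi, n_freq=200):
    """Compute g_pi(0, t) = f_pi(0, t) / integral of f_pi over (-pi, pi).

    A_t  : (k, k) companion matrix at E(theta_{t|T})
    R_t  : (n, n) innovation variance E(R_{t|T})
    s_pi : (k,) selection vector for inflation in the companion state
    """
    k = A_t.shape[0]
    n = R_t.shape[0]
    # Companion-system shocks load only on the first block of equations.
    S = np.zeros((k, n)); S[:n, :n] = np.eye(n)
    R_big = S @ R_t @ S.T

    def f_pi(omega):
        transfer = np.linalg.inv(np.eye(k) - A_t * np.exp(-1j * omega))
        spec = transfer @ R_big @ transfer.conj().T / (2 * np.pi)
        return np.real(s_pi @ spec @ s_pi)        # eq (32) at frequency omega

    grid = np.linspace(-np.pi, np.pi, n_freq)
    total = np.trapz([f_pi(w) for w in grid], grid)
    return f_pi(0.0) / total                      # eq (33) at omega = 0
```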
Figure 7: The normalized spectrum for inflation, $g_\pi(\omega, t)$, by frequency and year, 1960-2000.
Figure 8 depicts two-sigma error bands for $g_\pi(0, t)$, based on the principal components of its posterior covariance matrix, $V_g$. The latter was estimated in the same way as $V_{\bar\pi}$ or $V_{\bar u}$. Table 3 indicates that the first component in $V_g$ accounts for only 37 percent of the total variation and that the first 5 components jointly account for 84 percent. Again, this signifies substantial variation in the shape of the path for $g_\pi(0, t)$.
Error bands for the first two components load heavily on the 1970s. Although the bands suggest there was greater persistence than in the early 1960s or mid-1990s, the precise magnitude of the increase is hard to pin down. Roughly speaking, error bands for the first two components suggest that $g_\pi(0, t)$ was somewhere between 2 and 10. For the sake of comparison, a univariate AR(1) process with coefficients of 0.85 to 0.97 has values of $g_\pi(0)$ in this range. In contrast, the figure suggests that inflation was approximately white noise in the early 1960s and not far from white noise in the mid-1990s. Uncertainty about inflation persistence was increasing again at the end of the sample.
The third, fourth, and fifth components reflect uncertainty about the timing and number of peaks in $g_\pi(0, t)$. For example, panels 3 and 5 suggest that paths on which there was a more gradual increase in persistence tended to have a big global peak in the late 1970s, while those on which there was a more rapid increase tended to peak earlier.
Figure 8: Two-sigma error bands for $g_\pi(0, t)$, principal components 1 through 6, 1960-2000.

Figure 9: Core inflation and inflation persistence, 1960-2000.
The relation between the two series is illustrated in figure 9, which reproduces
estimates from figures 3 and 7. As core inflation rose in the 1960s and 1970s, inflation
also became more persistent. Both features fell sharply during the Volcker disinflation.
This correlation is problematic for the escape route models of Sargent (1999) and Cho,
Williams, and Sargent (2002), which predict that inflation persistence grows along
the transition from high to low inflation. Our estimates suggest the opposite pattern.
3.6.3
Monetary Policy Rules

Finally, we consider evidence of drift in the monetary policy rule. Following Clarida, Gali, and Gertler (2000), we consider forward-looking rules of the form
$$i_t = \beta_0 + \beta_1 E_t \bar\pi_{t,t+h} + \beta_2 E_t \bar u_{t,t+h_u} + \beta_3 i_{t-1}, \tag{34}$$
where $\bar\pi_{t,t+h}$ represents average inflation from $t$ to $t+h$ and $\bar u_{t,t+h_u}$ is average unemployment. The activism parameter is defined as $A = \beta_1 (1 - \beta_3)^{-1}$, and the policy rule is said to be activist if $A > 1$. With a Ricardian fiscal policy, an activist monetary rule delivers a determinate equilibrium. Otherwise, sunspots may matter for inflation and unemployment.
We interpret the parameters of the policy rule as projection coefficients and compute projections from our VAR. This is done via two-stage least squares on a date-by-date basis. The first step involves projecting the Fed's forecasts $E_t \bar\pi_{t,t+h}$ and $E_t \bar u_{t,t+h_u}$ onto a set of instruments, and the second involves projecting current interest rates onto the fitted values. At each date, we parameterize the VAR with posterior mean estimates of $\theta_t$ and $R_t$ and calculate population projections associated with those values.
The instruments chosen for the first-stage projection must be elements of the Fed's information set. Notice that a complete specification of their information set is unnecessary; a subset of their conditioning variables is sufficient for forming first-stage projections, subject of course to the order condition for identification. Among other variables, the Fed observes lags of inflation, unemployment, and nominal interest when making current-quarter decisions, and we project future inflation and unemployment onto a constant and two lags of each. Thus, our instruments for the Fed's forecasts $E_t \bar\pi_{t,t+h}$ and $E_t \bar u_{t,t+h_u}$ are the VAR forecasts $E_{t-1} \bar\pi_{t,t+h}$ and $E_{t-1} \bar u_{t,t+h_u}$, respectively.
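A sketch of the two projection steps on sample data may help fix ideas; the arrays are hypothetical stand-ins, and in the paper the corresponding projections are computed in population from the VAR rather than by sample regressions.

```python
import numpy as np

def two_stage_projection(i_t, i_lag, forecasts, instruments):
    """Two-step projection for the policy rule in equation (34).

    i_t         : (T,) nominal interest rate
    i_lag       : (T,) lagged nominal interest rate
    forecasts   : (T, 2) stand-ins for E_t[avg inflation], E_t[avg unemployment]
    instruments : (T, m) constant plus two lags of inflation, unemployment,
                  and nominal interest
    """
    # First stage: project the Fed's forecasts onto lagged instruments.
    coef, *_ = np.linalg.lstsq(instruments, forecasts, rcond=None)
    fitted = instruments @ coef
    # Second stage: project the interest rate onto the fitted values.
    X = np.column_stack([np.ones(len(i_t)), fitted, i_lag])
    rule, *_ = np.linalg.lstsq(X, i_t, rcond=None)
    return rule          # beta_0, beta_1, beta_2, beta_3
```

The activism parameter then follows as `rule[1] / (1 - rule[3])`, matching $A = \beta_1(1-\beta_3)^{-1}$.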
Here we follow McCallum, who warns against the assumption that the Fed sees current-quarter inflation and unemployment when making decisions. This strategy also sidesteps assumptions about how to orthogonalize current-quarter innovations. This is an important advantage of the Clarida et al. approach relative to structural VAR methods. Establishing that the Fed can observe some variables is easier than compiling a complete list of what the Fed sees.
Figure 10: The evolution of the policy activism parameter, 1955-2005.

Figure 11: Core inflation (corr = -0.79) and inflation persistence (corr = -0.72) plotted against policy activism.

Figure 12: Principal components 1 through 9 of the posterior covariance for the activism parameter, 1960-2000.
But the shape of the path is well determined only in the middle of the sample. The
first four principal components record substantial uncertainty at the beginning and
end. We interpret this as a symptom of weak identification. Substantial uncertainty
about A occurs at times when inflation is weakly persistent. Our instruments have
little relevance when future inflation is weakly correlated with lagged variables, and
the policy rule parameters are weakly identified at such times. Thus, inferences
about A are fragile at the beginning and end of the sample. There is better evidence
of changes in A during the middle of the sample. Lagged variables are more relevant
as instruments for the 1970s, when inflation and unemployment were very persistent,
and for that period the estimates are more precise.
The next figure characterizes more precisely how the posterior for $A_t$ differs across the Burns and Volcker-Greenspan terms. It illustrates histograms for $A_t$ for the years 1975, 1985, and 1995. The histograms were constructed by calculating an activism
parameter for each draw of $\theta_t$ and $R_t$ in our simulation, for a total of 5000 in each year.¹⁴ Values for 1975 are shown in black, those for 1985 are in white, and estimates for 1995 are shown in gray.
In 1975, the probability mass was concentrated near 1, and the probability that $A_t > 1$ was 0.208. By 1985, the center of the distribution had shifted to the right, and the probability that $A_t > 1$ had increased to 0.919. The distribution for 1995 is similar to that for 1985, with a 0.941 probability that $A_t > 1$. Comparing estimates along the same sample paths, the probability that $A_t$ increased between 1975 and 1985 is 0.923, and the probability that it increased between 1975 and 1995 is 0.943.

Figure 13: Histograms for the activism parameter in 1975 (black), 1985 (white), and 1995 (gray).
The estimates seem to corroborate those reported by Clarida et al., that monetary policy was passive in the 1970s and activist for much of the Volcker-Greenspan
era. Estimates for the latter period are less precise, but it seems clear that the probability distribution for $A_t$ shifted to the right.
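These posterior probabilities are simple averages over MCMC draws; a sketch, assuming each input array holds one value of $A_t$ per draw, computed along the same sample paths (function and array names are ours):

```python
import numpy as np

def activism_summary(A_1975, A_1985, A_1995):
    """Posterior probabilities for the activism parameter across years.

    Each input is a 1-D array with one draw of A_t per MCMC sweep,
    computed as A = beta_1 / (1 - beta_3).
    """
    return {
        "P(A_1975 > 1)": np.mean(A_1975 > 1),
        "P(A_1985 > 1)": np.mean(A_1985 > 1),
        "P(A_1995 > 1)": np.mean(A_1995 > 1),
        # pairwise comparisons use the same draws, preserving dependence
        "P(A_1985 > A_1975)": np.mean(A_1985 > A_1975),
        "P(A_1995 > A_1975)": np.mean(A_1995 > A_1975),
    }
```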
3.7
Finally, we consider classical tests for variation in $\theta$. Bernanke and Mihov (1998a,b) were also concerned about the potential for shifts in VAR parameters arising from changes in monetary policy, and they applied a test developed by Andrews (1993) to examine stability of $\theta$. For reduced form vector autoregressions similar to ours, they were unable to reject the hypothesis of time invariance.
We applied the same test to our data and found the same results. We considered two versions of Andrews's sup-LM test, one that examines parameter stability for the VAR as a whole and another that tests stability on an equation-by-equation basis. The results are summarized in table 4. Columns labelled with variable names refer
to single-equation tests, and the column labelled VAR refers to a test for the system as a whole. In each case, we fail to reject that $\theta$ is time invariant.¹⁵
Bernanke and Mihov correctly concluded that the test provides little evidence against stability of $\theta$. But does the result constitute evidence against parameter instability? A failure to reject provides evidence against an alternative hypothesis only if it has reasonably high power. Whether this test has high power against a model like ours is an open question, so we decided to investigate it.
Table 4: Andrews's sup-LM Test

         Nominal Interest   Unemployment   Inflation   VAR
Data     F                  F              F           F
Power    0.136              0.172          0.112       0.252

Note: An F means the test fails to reject at the 10 percent level when applied to actual data. Entries in the second row refer to the fraction of artificial samples in which the null hypothesis is rejected at the 5 percent level.
To check the power of the test, we performed a Monte Carlo simulation using our drifting parameter VAR as a data generating process. To generate artificial data, we parameterized equation (1) with draws of $\theta^T$, $H^T$, and $B$ from the posterior density. For each draw of $(\theta^T, H^T, B)$, we generated an artificial sample for inflation, unemployment, and nominal interest and then calculated the sup-LM statistics. We performed 10,000 replications and counted the fraction of samples in which the null hypothesis of constant $\theta$ is rejected at the 5 percent level. The results are summarized in the second row of table 4.
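Schematically, the power experiment has the following shape; `simulate_var` and `sup_lm_stat` are hypothetical stand-ins for the data generator and test statistic, which we do not reproduce here.

```python
import numpy as np

def power_of_test(posterior_draws, critical_value, n_reps=10_000, rng=None):
    """Fraction of artificial samples in which sup-LM rejects constant theta.

    posterior_draws : indexable collection of (theta_path, H_path, B) draws
    simulate_var, sup_lm_stat : assumed helpers, not shown here
    """
    rng = rng or np.random.default_rng()
    rejections = 0
    for rep in range(n_reps):
        theta_path, H_path, B = posterior_draws[rep % len(posterior_draws)]
        y = simulate_var(theta_path, H_path, B, rng)   # artificial sample
        if sup_lm_stat(y) > critical_value:            # 5 percent critical value
            rejections += 1
    return rejections / n_reps
```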
The power of the test is never very high. The VAR test has the highest success rate, detecting drift in $\theta$ in about one-fourth of the samples. The detection probabilities are lower in the single equation tests, which reject at the 5 percent level in only about 14 percent of the samples. Thus, even when $\theta$ drifts in the way we describe, a failure to reject is at least 3 times as likely as a rejection.
Andrews's test is designed to have power against alternatives involving a single shift in $\theta$ at some unknown break date. The results of this experiment may just reflect that this test is less well suited to detect alternatives such as ours that involve continual shifts in parameters. Accordingly, we also investigate a test developed by Nyblom (1989) and Hansen (1992) that is designed to have power against alternatives in which parameters evolve as driftless random walks. Results for the Nyblom-Hansen test are summarized in table 5.
¹⁵ We also performed a Monte Carlo simulation to check the size of the Andrews test; the results confirmed that size distortions do not explain the failure to reject.
When applied to actual data, the Nyblom-Hansen test also fails to reject time invariance for $\theta$. To examine its power, we conducted another Monte Carlo simulation using our drifting parameter VAR as a data generating mechanism, and we found that this test also has low power against our representation. Indeed, the detection probabilities are a bit lower than those for the sup-LM test.
Boivin (1999) conjectures that the sup-Wald version of Andrews's test may have
higher power than the others, and so we also consider this procedure. The results,
which are shown in table 6, provide some support for his conjecture. The detection
probability is higher in each case, and it is substantially higher for the inflation
equation. Indeed, this is the only case among the ones we study in which the detection
probability exceeds 50 percent. It is noteworthy that in this case we also strongly
reject time invariance in the actual data. Time invariance is also rejected for the VAR
as a whole.
Table 6: Andrews's sup-Wald Test

         Nominal Interest   Unemployment   Inflation   VAR
Data     F                  F              R 1%        R 5%
Power    0.173              0.269          0.711       0.296

Note: R x% signifies a rejection at the x percent level.
¹⁶ This assumes that shifts in policy are the only source of drift in $\theta$.
¹⁷ We chose Andrews's tests because the CGG rule is estimated by GMM. The Nyblom-Hansen test is based on ML estimates.
Figure 14: The first principal component of $\theta_{t|T}$, 1960-2005.¹⁸

Figure 15: Scatter plots relating the first principal component of $\theta_{t|T}$ to other drifting objects, including policy activism (correlations 0.96, 0.86, -0.82, and 0.71).

¹⁸ More precisely, the figures illustrate partial sums of the first principal component for $\theta_{t|T}$.
Yet the results of a Monte Carlo simulation, shown in table 8, suggest that power remains low, with a rejection probability of only about 15 percent. Indeed, the procedure is inferior to the VAR tests reported above. Agnosticism about drifting components in $\theta$ seems to be better. Despite the low power, one of the tests rejects time invariance in actual data.
Table 8: Stability of the First Principal Component

         sup-LM   sup-Wald
Data     F        R 5%
Power    0.220    0.087

Note: See the note to table 4.
To summarize, most of our tests fail to reject time invariance of $\theta$, but most also have low power to detect the patterns of drift we describe above. In the one case where a test has a better-than-even chance of detecting drift in $\theta$, time invariance in the data is rejected at better than the one-percent level. One reasonable interpretation is that $\theta$ is drifting, but that most of the procedures are unable to detect it.
Perhaps low power should not be a surprise. Our model nests the null of time invariance as a limiting case, i.e. when $Q = 0$. One can imagine indexing a family of alternative models in terms of $Q$. For $Q$ close to zero, size and power should be approximately the same. Power should increase as $Q$ gets larger, and eventually the tests are likely to reject with high probability. But in between there is a range of alternative models, arrayed in terms of increasing $Q$, that the tests are unlikely to reject. The message of the Monte Carlo detection statistics is that a model such as ours with economically meaningful drift in $\theta$ often falls in the indeterminate range.
Conclusion

One respectable view is that either an erroneous model, insufficient patience, or his inability to commit to a better policy made Arthur Burns respond to the end of Bretton Woods by administering monetary policy in a way that produced the greatest peacetime inflation in U.S. history; and that an improved model, more patience, or greater discipline led Paul Volcker to administer monetary policy in a way that conquered American inflation.¹⁹ Another respectable view is that what distinguished Burns and Volcker was not their models or policies but their luck. This paper and its predecessor (Cogley and Sargent (2001)) fit time series models that might help distinguish these views.

This paper also responds to Sims's (2001) and Stock's (2001) criticism of the evidence for drifting systematic parts of vector autoregressions in Cogley and Sargent (2001).
A
Drifting Coefficients and the Theory of Economic Policy

Consider a law of motion for a state vector $x_t$ that is influenced by a vector of private decisions $u_t$ and a vector of government policy actions $v_t$,
$$x_{t+1} - x_t = \mu(x_t, u_t, v_t, t) + \sigma(x_t, u_t, v_t, t)\,\epsilon_{t+1}, \tag{36}$$
where $\epsilon$ is Gaussian. Borrowing terms from the corresponding continuous time diffusion specification, we call $\mu$ the drift and $\sigma$ the volatility.
Suppose that $u_t$ and $v_t$ are governed by the sequences of history-dependent policy functions
$$u_t = h(x_t, t), \tag{37}$$
$$v_t = g(x_t, t). \tag{38}$$
Substituting these into the law of motion yields a nonlinear vector autoregression of the form
$$x_{t+1} = x_t + \mu(x_t) + \sigma(x_t, t)\,\epsilon_{t+1}. \tag{39}$$
Economic theory restricts $h$ and $g$. Private agents' optimum problems and market equilibrium conditions imply a mapping²⁰
$$h = T_h(f, g) \tag{40}$$
from the technology and information process $f$ and the government policy $g$ to the private sector's equilibrium policy $h$. Given $T_h$, the normative theory of economic policy would have the government choose $g$ as the solution of the problem
$$\max_{g,h} E \sum_{t=0}^{\infty} \beta^t W(x_t, u_t, v_t), \tag{41}$$
where $W$ is a one-period welfare criterion and the optimization is subject to (36) and (40). Notice that the government chooses both $g$ and $h$, although its manipulation of $h$ is subject to (40). Problem (41) is called a Stackelberg or Ramsey problem.
Lucas's (1976) Critique was directed against a faulty econometric policy evaluation procedure that ignores constraint (40). The faulty policy evaluation problem is²¹
$$\max_{g} E \sum_{t=0}^{\infty} \beta^t W(x_t, u_t, v_t) \tag{42}$$
subject to (36) and $h = \bar h$, where $\bar h$ is a fixed sequence of decision rules for the private sector. Lucas pointed out first that problem (42) ignores (40) and second that a particular class of models that had been used for $\bar h$ were misspecified because they imputed irrational expectations to private decision makers. Let us express the government's possibly misspecified econometric model for $\bar h$ through
$$\bar h = S(f, g, h), \tag{43}$$
which maps the truth as embodied in the $f, g, h$ that actually generate the data into the government's beliefs about private agents' behavior. The function $S$ embodies the government's model specification and also its estimation procedures. See Sargent (1999) for a concrete example of $S$ within a model of the Phillips curve.

The faulty policy evaluation problem (42) induces
$$g = T_g(f, \bar h). \tag{44}$$
The heart of the Lucas critique is that this mapping does not solve the appropriate policy problem (41).
²⁰ See Stokey (1989) for a description of how households' optimum problems and market clearing are embedded in the mapping (40). Stokey clearly explains why the policies $h, g$ are history dependent.
²¹ Sargent (1999) calls this a Phelps problem.
A.1

What outcomes should we expect under the faulty econometric policy evaluation procedure? The answer depends partly on how the government's econometric estimates respond to observed outcomes through the function (43). Suppose that the government begins with an initial specification $\bar h_0$, and consider the following iterative process for $j \geq 1$:
$$g_j = T_g(f, \bar h_{j-1}), \tag{45}$$
$$h_j = T_h(f, g_j), \tag{46}$$
$$\bar h_j = S(f, g_j, h_j). \tag{47}$$
The composition of these three maps gives
$$g_j = B(f, g_{j-1}), \tag{48}$$
where $B(f, g_{j-1}) = T_g(f, S(f, g_{j-1}, T_h(f, g_{j-1})))$. Eventually, this iterative process might settle down to a fixed point
$$g = B(f, g). \tag{49}$$
In the spirit of Fudenberg and Levine (1993), Fudenberg and Kreps (1995), and Sargent (1999), a self-confirming equilibrium is a government policy $g$ that satisfies (49).
A.2

Lucas's (1976) Critique consisted of two parts. The first part of Lucas's paper summarized empirical evidence for drift in representations like (36), that is, dependence of $\mu$ on $t$, and interpreted it as evidence against particular econometric specifications that had attributed suboptimal forecasts about $(x, v)$ to private agents. The second part of his paper focused on three concrete examples designed to show how the mapping (40) from $g$ to $h$ would influence time series outcomes. Though Lucas didn't explicitly link the first and second parts, a reader can be forgiven for thinking that he meant to suggest that a substantial part of the drift in $\mu$ described in the first part of his paper came from drift in private agents' decision rules that had been induced through mapping (40) by drift in government decision rules.
If we could somehow make a version of the iterative process (45), (46), (47) occur in real time, we get a model of coefficient drift that is consistent with this vision. The literature on least squares learning gets such a real time model by attributing to both private agents and the government a sophisticated kind of adaptive behavior in which the mappings $T_g$, $T_h$, $S$ play key roles. This literature uses recursive versions of least squares learning to deduce drift in $g$ whose average behavior can eventually be described by the ordinary differential equation²²
$$\frac{d}{dt} g = B(f, g) - g. \tag{50}$$
In this way it is possible to use the transition dynamics of adaptive systems based on (45), (46), (47) to explain the parameter drift that Lucas emphasized in the first part of his critique. Sargent (1999) and Cho, Williams, and Sargent (2002) pursue this line and use it to build models of drifting unemployment-inflation dynamics.²³,²⁴
A.3

Another view takes the data generating mechanism to be the self-confirming equilibrium composed of (49) and (36), unadorned by any transition dynamics based on (45), (46), (47).²⁵ This view assumes that any adaptation had ended before the sample began. It would either exclude parameter drift or else would interpret it as consistent with a self-confirming equilibrium.²⁶ Thus, parameter drift would reflect nonlinearities in the law of motion (36) that are accounted for in decision making.
²² See Sargent (1999) and Evans and Honkapohja (2001) for examples and for precise statements of the meanings of "average" and "eventually." Equation (50) embodies the mean dynamics of the system. See Cho, Williams, and Sargent (2002) and Sargent (1999). They also describe how escape dynamics can be used to perpetuate adaptation.
²³ As Bray and Kreps (1986) and Kreps (1998) describe, before it attains a self-confirming equilibrium, such an adaptive system embodies irrationality because, while the self-confirming equilibrium is a rational expectations equilibrium, the least squares transition dynamics are not. During the transition, both government and private agents are basing decisions on subjective models that ignore sources of time-dependence in the actual stochastic process that are themselves induced by the transition process. Bray and Kreps (1986) and Kreps (1998) celebrate this departure from rational expectations because they want models of learning about a rational expectations equilibrium, not learning within a rational expectations equilibrium.
²⁴ In their Phillips curve example, Kydland and Prescott (1977) explicitly use an example of system (45), (46), (47) and compute its limit to argue informally that inflation would converge to a suboptimal time consistent level. Unlike Lucas (1976), Kydland and Prescott's mapping (47) was $\bar h = h$. Lucas's focus was partly to criticize versions of mapping (47) that violated rational expectations, but that was not Kydland and Prescott's concern.
²⁵ The literature on least squares learning itself provides substantial support for this perspective by proving almost sure convergence to a self-confirming equilibrium. Sargent (1999) and Cho, Williams, and Sargent (2002) arrest such convergence by putting some forgetting or discounting into least squares.
²⁶ Sargent and Wallace (1976), Sims (1982), and Sargent (1984) have all expressed versions of this point of view.
A.4
Empirical issues

Inspired by theoretical work within the adaptive tradition that permits shifts in policy outside of a self-confirming equilibrium, our earlier paper (Cogley and Sargent (2001)) used a particular nonlinear vector autoregression (39) to compile evidence about how the systematic part of the autoregression, $x_t + \mu(x_t)$ in (39), has drifted over time. Our specification excluded stochastic volatility (we assumed that $\sigma(x_t, t) = \sigma$). We appealed to adaptive models and informally interpreted the patterns of drifting coefficients in our nonlinear time series model partly as reflecting shifting behavior rules of the Fed, shifts due to the Fed's changing preferences or views of the economy.²⁷
Sims (1999) and Bernanke and Mihov (1998a, 1998b) analyzed a similar data set in a way that seems compatible with a self-confirming equilibrium within a linear time-invariant structure. They used specializations of the vector time series model (39) that incorporate stochastic volatility but not drift in the systematic part of a linear vector autoregression. Their models can be expressed as
$$x_{t+1} - x_t = A x_t + \sigma(x_t, t)\,\epsilon_{t+1}, \tag{51}$$
where we can regard $x_t$ as including higher order lags of variables and $A$ is composed of companion submatrices. They compiled evidence that this representation fits post World War II data well and used it to interpret the behavior of the monetary authorities. They found that the systematic part of the vector autoregression $A$ did not shift over time, but that there was stochastic volatility ($\sigma(x_t, t) \neq \sigma$). Thus, they reconciled the data with a linear autoregression in which shocks drawn from time-varying distributions nevertheless feed through the system linearly in a time-invariant way. They reported a lack of evidence for alterations in policy rules (in contrast to the perspective taken for example by Clarida, Gali, and Gertler (2000)).
A.5
Generalization

In this paper, we fit a model of the form (39) that permits both drifting coefficients and stochastic volatility, thereby generalizing both our earlier model and some of the specifications of Bernanke and Mihov and Sims. We use this specification to confront criticisms from Sims (2001) and Stock (2001), both of whom suggested that our earlier results were mainly artifacts of our exclusion of stochastic volatility.
²⁷ Partly we appealed to adaptive models like ones described by Sims (1988) and Sargent (1999), which emphasize changes in the Fed's understanding of the structure of the economy.
B.1
Priors

The unrestricted prior can be written as
$$f(\theta^T, Q, \phi), \tag{52}$$
where $\theta^T$ represents the VAR parameters, $Q$ is their innovation variance, and $\phi$ stands for everything else. Because of the independence assumptions on the prior, this can be written as
$$f(\theta^T, Q, \phi) = f(\theta^T|Q) f(Q) f(\phi). \tag{53}$$
The restricted model adds an a priori condition that rules out explosive values of $\theta$,
$$p(\theta^T, Q, \phi) = \frac{I(\theta^T)\, f(\theta^T, Q, \phi)}{\iiint I(\theta^T)\, f(\theta^T, Q, \phi)\, d\theta^T\, dQ\, d\phi}. \tag{54}$$
Thus, the stability condition truncates and renormalizes the unrestricted prior.
We can factor f(T , Q, ) as before to obtain
p(T , Q, ) = RRR
= R
= RR
I(T )f (T |Q)f(Q)f()
,
I(T )f( T |Q)f(Q)f ()dT dQd
(55)
I(T )f (T |Q)f(Q)f()
,
I(T )f (T |Q)f(Q)dT dQ
where the last equality follows from the fact that $f(\psi)$ is proper. Now define
$m_\theta(Q) \equiv \int I(\theta^T)\, f(\theta^T | Q)\, d\theta^T$,  (56)
and
$m_Q \equiv \int m_\theta(Q)\, f(Q)\, dQ$.  (57)
The term $m_\theta(Q)$ is the conditional probability of a non-explosive draw from the unrestricted transition density, $f(\theta^T | Q)$, as a function of $Q$. The number $m_Q$ is the mean of the conditional probabilities, averaged across draws from the marginal prior $f(Q)$. Since both are probabilities, it follows that
$0 \le m_\theta(Q) \le 1$,  (58)
$0 \le m_Q \le 1$.  (59)
The restricted prior can therefore be factored as
$p(\theta^T | Q) = \dfrac{I(\theta^T)\, f(\theta^T | Q)}{m_\theta(Q)}$,  (60)
$p(Q) = \dfrac{m_\theta(Q)\, f(Q)}{m_Q}$  (61)
(equation 8 in the text). The marginal prior for $\psi$ remains the same as for the unrestricted model, $p(\psi) = f(\psi)$. Notice that each term is normalized to integrate to 1; i.e., each component is proper.
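Both normalizing constants are straightforward to approximate by simulation. The following is a minimal Python sketch of equations (56)-(57), under purely illustrative assumptions: a trivariate VAR with two lags, an inverse-Wishart prior for $Q$, and draw counts chosen only for speed; `is_nonexplosive` plays the role of the indicator $I(\theta)$.

```python
# Monte Carlo approximation of m_theta(Q) and m_Q in (56)-(57).
# Dimensions, the prior for Q, and all tuning constants are illustrative.
import numpy as np
from scipy.stats import invwishart

N, L = 3, 2                      # variables and lags (assumed for the sketch)
K = N * (N * L + 1)              # stacked VAR coefficients, incl. intercepts

def is_nonexplosive(theta):
    """The indicator I(theta): all companion-matrix eigenvalues inside the unit circle."""
    A = theta.reshape(N, N * L + 1)[:, 1:]       # drop the intercept column
    companion = np.zeros((N * L, N * L))
    companion[:N, :] = A
    companion[N:, :-N] = np.eye(N * (L - 1))
    return np.max(np.abs(np.linalg.eigvals(companion))) < 1.0

def m_theta(Q, theta0, T=50, n_draws=200, seed=0):
    """Equation (56): the probability that a random-walk path drawn from
    f(theta^T | Q) satisfies the stability condition at every date."""
    rng = np.random.default_rng(seed)
    C = np.linalg.cholesky(Q)
    ok = 0
    for _ in range(n_draws):
        theta, alive = theta0.copy(), True
        for _ in range(T):
            theta = theta + C @ rng.standard_normal(K)
            if not is_nonexplosive(theta):
                alive = False
                break
        ok += alive
    return ok / n_draws

# Equation (57): average m_theta(Q) across draws from the marginal prior f(Q).
rng = np.random.default_rng(1)
theta0 = np.zeros(K)             # illustrative prior mean for theta_0
m_Q = np.mean([m_theta(invwishart.rvs(df=K + 2, scale=1e-4 * np.eye(K),
                                      random_state=rng), theta0)
               for _ in range(10)])
```

In practice neither probability needs to be computed for the rejection sampler of appendix C.1; the sketch only makes the two objects concrete.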
From (7) we can derive the restricted transition density. This is defined as
$p(\theta_{t+1} | \theta_t, Q) = \dfrac{p(\theta_{t+1}, \theta_t | Q)}{p(\theta_t | Q)}$.  (62)
The joint density in the numerator is
$p(\theta_{t+1}, \theta_t | Q) = \iint p(\theta^T | Q)\, d\theta^{t-1}\, d\theta^{t+2,T}$,  (63)
where $\theta^{t-1}$ represents the history of $\theta$s up to date $t-1$ and $\theta^{t+2,T}$ represents the path from dates $t+2$ to $T$. After substituting from equation (7), this becomes
$p(\theta_{t+1}, \theta_t | Q) = \dfrac{1}{m_\theta(Q)} \iint \prod_{s=0}^{T-1} I(\theta_{s+1})\, f(\theta_{s+1} | \theta_s, Q)\, d\theta^{t-1}\, d\theta^{t+2,T}$.  (64)
The integrand can be expanded as
$\prod_{s=0}^{T-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q) = \left[\prod_{s=0}^{t-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q)\right] I(\theta_{t+1}) f(\theta_{t+1} | \theta_t, Q) \left[\prod_{s=t+1}^{T-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q)\right]$.  (65)
It follows that
$p(\theta_{t+1}, \theta_t | Q) = \dfrac{1}{m_\theta(Q)} \int \prod_{s=0}^{t-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q)\, d\theta^{t-1} \cdot I(\theta_{t+1}) f(\theta_{t+1} | \theta_t, Q) \int \prod_{s=t+1}^{T-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q)\, d\theta^{t+2,T}$.  (66)
Similarly, the marginal density of $\theta_t$ is
$p(\theta_t | Q) = \int p(\theta_{t+1}, \theta_t | Q)\, d\theta_{t+1}$
$\qquad = \dfrac{1}{m_\theta(Q)} \int \prod_{s=0}^{t-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q)\, d\theta^{t-1} \cdot \int \prod_{s=t}^{T-1} I(\theta_{s+1}) f(\theta_{s+1} | \theta_s, Q)\, d\theta^{t+1,T}$.  (67)
Taking the ratio of (66) to (67), the terms involving the history through date $t-1$ cancel, leaving
$p(\theta_{t+1} | \theta_t, Q) = \dfrac{I(\theta_{t+1}) f(\theta_{t+1} | \theta_t, Q) \int I(\theta^{t+2,T}) f(\theta^{t+2,T} | \theta_{t+1}, Q)\, d\theta^{t+2,T}}{\int I(\theta_{t+1}) f(\theta_{t+1} | \theta_t, Q) \left[\int I(\theta^{t+2,T}) f(\theta^{t+2,T} | \theta_{t+1}, Q)\, d\theta^{t+2,T}\right] d\theta_{t+1}}$.  (68)
The integral in the numerator is the expectation of $I(\theta^{t+2,T})$ with respect to the conditional density $f(\theta^{t+2,T} | \theta_{t+1}, Q)$. This represents the probability that random walk trajectories emanating from $\theta_{t+1}$ will remain in the nonexplosive region from date $t+2$ through date $T$. In the text, this term is denoted $\pi(\theta_{t+1}, Q)$. Hence the transition density is
$p(\theta_{t+1} | \theta_t, Q) = \dfrac{I(\theta_{t+1}) f(\theta_{t+1} | \theta_t, Q)\, \pi(\theta_{t+1}, Q)}{\int I(\theta_{t+1}) f(\theta_{t+1} | \theta_t, Q)\, \pi(\theta_{t+1}, Q)\, d\theta_{t+1}}$.  (69)
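The trajectory probability $\pi(\theta_{t+1}, Q)$ can be approximated the same way as the prior probabilities above. A minimal sketch, reusing `is_nonexplosive` from the previous sketch: simulate random-walk continuations from a given $\theta_{t+1}$ and record the fraction that remain nonexplosive through date $T$.

```python
# Monte Carlo approximation of the numerator integral in (68)-(69);
# `is_nonexplosive` is the indicator defined in the previous sketch.
import numpy as np

def pi_hat(theta_next, Q, horizon, n_draws=200, seed=2):
    """Estimate pi(theta_{t+1}, Q): the chance that a random-walk path
    started at theta_{t+1} stays nonexplosive over dates t+2, ..., T."""
    rng = np.random.default_rng(seed)
    C = np.linalg.cholesky(Q)
    survive = 0
    for _ in range(n_draws):
        theta, alive = theta_next.copy(), True
        for _ in range(horizon):                 # horizon = T - (t + 1) steps
            theta = theta + C @ rng.standard_normal(theta.size)
            if not is_nonexplosive(theta):
                alive = False
                break
        survive += alive
    return survive / n_draws
```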
B.2 Posteriors
By Bayes' theorem, the restricted posterior is
$p(\theta^T, Q, \psi | Y^T) = \dfrac{f(Y^T | \theta^T, Q, \psi)\, p(\theta^T, Q, \psi)}{m(Y^T)}$,  (70)
where $m(Y^T)$ is the marginal likelihood. After substituting from equations (7) and (8), we can express this as
$p(\theta^T, Q, \psi | Y^T) = \dfrac{f(Y^T | \theta^T, Q, \psi)\, I(\theta^T)\, f(\theta^T, Q, \psi)}{m(Y^T)\, m_Q}$  (71)
$\qquad = \dfrac{m_U(Y^T)}{m(Y^T)\, m_Q}\, I(\theta^T)\, p_U(\theta^T, Q, \psi | Y^T)$,  (72)
where $p_U(\cdot | Y^T)$ denotes the posterior and $m_U(Y^T)$ the marginal likelihood for the unrestricted model.
We use MCMC methods to simulate the restricted posterior density. As in our earlier paper, we simulate the unrestricted posterior $p_U(\cdot | Y^T)$ and then use rejection sampling to rule out explosive outcomes. The first part of this appendix justifies rejection sampling, and the second describes the algorithm used for simulating draws from $p_U(\cdot | Y^T)$.
C.1 Rejection Sampling

To justify rejection sampling, take the unrestricted posterior $p_U(\theta^T, Q, \psi | Y^T)$ as the proposal density. From (72), the ratio of the restricted posterior to the proposal is bounded by the constant
$M = \dfrac{m_U(Y^T)}{m(Y^T)\, m_Q}$,  (73)
and the acceptance probability for a draw is
$R(\theta^T, Q, \psi) = \dfrac{p(\theta^T, Q, \psi | Y^T)}{M\, p_U(\theta^T, Q, \psi | Y^T)} = I(\theta^T)$.  (74)
This says we accept if $\theta^T$ is non-explosive and reject otherwise. Thus, we can sample from the posterior of the restricted model by simulating the unrestricted model and discarding the explosive draws.
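In code, the accept/reject step is a one-line filter around the unrestricted sampler. A minimal sketch, assuming a generator `draw_unrestricted` that yields `(theta_path, Q, psi)` draws from $p_U(\cdot | Y^T)$ (for instance, the Gibbs sampler of appendix C.2) and a stability indicator such as `is_nonexplosive` above; both names are placeholders.

```python
# Rejection sampling justified by (73)-(74): keep a draw only when the
# stability indicator I(theta) equals one at every date.
def sample_restricted(draw_unrestricted, is_nonexplosive, n_keep):
    kept = []
    while len(kept) < n_keep:
        theta_path, Q, psi = next(draw_unrestricted)
        if all(is_nonexplosive(theta_t) for theta_t in theta_path):
            kept.append((theta_path, Q, psi))
    return kept
```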
C.2 Sampling from $p_U(\cdot \mid Y^T)$

We combine the techniques used in Cogley and Sargent (2001) with those of Jacquier, Polson, and Rossi (1994) to construct a Metropolis-within-Gibbs sampler. The algorithm consists of 5 steps, one each for $\theta^T$, $Q$, $\sigma^2$, the elements of $\beta$, and the elements of $H^T$. Our prior is that the blocks of parameters are mutually independent, and we assume the marginal prior for each block has a natural conjugate form; details are given above. The first two steps of the algorithm are essentially the same as in our earlier paper; $\beta$ is treated as a vector of regression parameters, and the elements of $\sigma^2$ are treated as inverse-gamma variates. To sample $H^T$, we apply a univariate algorithm from Jacquier et al. to each element. This is possible because the stochastic volatilities are assumed to be independent.
C.2.1 VAR parameters, $\theta^T$

We first consider the distribution of VAR parameters conditional on the data and other blocks of parameters. Conditional on $H^T$ and $\beta$, one can calculate the entire sequence of variances $R_t$; we denote this sequence by $R^T$. Conditional on $R^T$ and $Q$, the joint posterior density for VAR parameters can be expressed as
$p_U(\theta^T | Y^T, Q, R^T) = f(\theta_T | Y^T, Q, R^T) \prod_{t=1}^{T-1} f(\theta_t | \theta_{t+1}, Y^t, Q, R^t)$.  (75)
The unrestricted model is a linear, conditionally Gaussian state-space model. Assuming a Gaussian prior for $\theta_0$, all the conditional densities on the right-hand side of (75) are Gaussian. Their means and variances can be computed via a forward and backward recursion.
The forward recursion uses the Kalman filter. Let
$\theta_{t|t} \equiv E(\theta_t | Y^t, Q, R^T)$,
$P_{t|t-1} \equiv \operatorname{Var}(\theta_t | Y^{t-1}, Q, R^T)$,  (76)
$P_{t|t} \equiv \operatorname{Var}(\theta_t | Y^t, Q, R^T)$,
represent conditional means and variances going forward in time. These can be computed recursively, starting from the prior mean and variance for $\theta_0$:
$P_{t|t-1} = P_{t-1|t-1} + Q$,
$K_t = P_{t|t-1} X_t (X_t' P_{t|t-1} X_t + R_t)^{-1}$,  (77)
$\theta_{t|t} = \theta_{t-1|t-1} + K_t (y_t - X_t' \theta_{t-1|t-1})$,
$P_{t|t} = P_{t|t-1} - K_t X_t' P_{t|t-1}$.
At the end of the sample, the forward recursion delivers the mean and variance for $\theta_T$, and this pins down the first term in (75),
$f(\theta_T | Y^T, Q, R^T) = N(\theta_{T|T}, P_{T|T})$.  (78)
The remaining terms in (75) are derived from a backward recursion, which updates conditional means and variances to reflect the additional information about $\theta_t$ contained in $\theta_{t+1}$. Let
$\theta_{t|t+1} \equiv E(\theta_t | \theta_{t+1}, Y^t, Q, R^T)$,  (79)
$P_{t|t+1} \equiv \operatorname{Var}(\theta_t | \theta_{t+1}, Y^t, Q, R^T)$,
represent the updated estimates. They satisfy
$\theta_{t|t+1} = \theta_{t|t} + P_{t|t} P_{t+1|t}^{-1} (\theta_{t+1} - \theta_{t|t})$,  (80)
$P_{t|t+1} = P_{t|t} - P_{t|t} P_{t+1|t}^{-1} P_{t|t}$.
The updated estimates determine the mean and variance for the remaining elements in (75),
$f(\theta_t | \theta_{t+1}, Y^t, Q, R^T) = N(\theta_{t|t+1}, P_{t|t+1})$.  (81)
A random trajectory for $\theta^T$ is generated by iterating backward. The backward recursion starts with a draw of $\theta_T$ from (78). Then, conditional on its realization, $\theta_{T-1}$ is drawn from (81), $\theta_{T-2}$ is drawn conditional on the realization of $\theta_{T-1}$, and so on back to the beginning of the sample.
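The following is a compact sketch of this forward-filter, backward-sampler pass, i.e. the recursions (76)-(81), for the random-walk state $\theta_t = \theta_{t-1} + v_t$ with $\operatorname{Var}(v_t) = Q$ and measurement $y_t = X_t' \theta_t + \varepsilon_t$ with $\operatorname{Var}(\varepsilon_t) = R_t$. Array shapes and inputs are illustrative; `X[t]` stacks the paper's $X_t'$ as an $(n \times k)$ array.

```python
# Forward Kalman filter (77)-(78) and backward sampling (80)-(81).
import numpy as np

def sample_theta_path(y, X, R, Q, theta0, P0, rng):
    T, k = len(y), theta0.size
    m, P = theta0, P0                               # prior mean and variance for theta_0
    means, variances = np.empty((T, k)), np.empty((T, k, k))
    for t in range(T):                              # forward recursion (77)
        P_pred = P + Q                              # P_{t|t-1}
        S = X[t] @ P_pred @ X[t].T + R[t]           # innovation variance
        K = P_pred @ X[t].T @ np.linalg.inv(S)      # Kalman gain K_t
        m = m + K @ (y[t] - X[t] @ m)               # theta_{t|t}
        P = P_pred - K @ X[t] @ P_pred              # P_{t|t}
        means[t], variances[t] = m, P
    path = np.empty((T, k))
    path[-1] = rng.multivariate_normal(means[-1], variances[-1])   # draw theta_T, eq. (78)
    for t in range(T - 2, -1, -1):                  # backward recursion (80)-(81)
        G = variances[t] @ np.linalg.inv(variances[t] + Q)         # P_{t|t} P_{t+1|t}^{-1}
        mean = means[t] + G @ (path[t + 1] - means[t])             # theta_{t|t+1}
        var = variances[t] - G @ variances[t]                      # P_{t|t+1}
        path[t] = rng.multivariate_normal(mean, var)
    return path
```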
C.2.2 Innovation Variance, $Q$

The next step involves the distribution of $Q$ conditional on the data and other parameter blocks. Conditional on a realization for $\theta^T$, the VAR parameter innovations, $v_t$, are observable. Furthermore, the other conditioning variables are irrelevant at this stage,
$f(Q | Y^T, \theta^T, \sigma^2, \beta, H^T) = f(Q | Y^T, \theta^T)$.  (82)
Because the inverse-Wishart prior is conjugate, the conditional posterior is also inverse-Wishart,
$f(Q | Y^T, \theta^T) = IW(Q_1^{-1}, T_1)$,  (83)
with scale and degree-of-freedom parameters
$Q_1 = Q_0 + \sum\nolimits_{t=1}^{T} v_t v_t'$,  (84)
$T_1 = T_0 + T$.  (85)
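This step is a textbook inverse-Wishart update. A minimal sketch, assuming scipy's `invwishart` parameterization (degrees of freedom and a scale matrix) and illustrative prior settings `Q0`, `T0`:

```python
# Inverse-Wishart draw for Q implementing (82)-(85).
import numpy as np
from scipy.stats import invwishart

def draw_Q(theta_path, Q0, T0, rng):
    v = np.diff(theta_path, axis=0)    # innovations v_t = theta_t - theta_{t-1}
    Q1 = Q0 + v.T @ v                  # Q1 = Q0 + sum_t v_t v_t'   (84)
    T1 = T0 + v.shape[0]               # T1 = T0 + T                (85)
    return invwishart.rvs(df=T1, scale=Q1, random_state=rng)
```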
C.2.3 Volatility Innovation Variances, $\sigma^2$

The next step involves the distribution of the volatility innovation variances, $\sigma_i^2$, conditional on the data and the other blocks. Conditional on $H^T$, the other conditioning variables are irrelevant,29 and the inverse-gamma prior is conjugate, so that
$f(\sigma_i^2 | Y^T, \theta^T, \beta, H^T) = IG\!\left(\dfrac{\nu_1}{2}, \dfrac{\delta_1}{2}\right)$,  (86)
where
$\nu_1 = \nu_0 + T$,  (87)
$\delta_1 = \delta_0 + \sum\nolimits_{t=1}^{T} (\Delta \ln h_{it})^2$.  (88)
29
The measurement innovations are informative for $R_t$, which depends indirectly on $\sigma^2$, but this information is subsumed in $H_t$.
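A minimal sketch of the corresponding draw for a single $\sigma_i^2$, using the fact that if $x \sim \mathrm{Gamma}(a, \text{rate } b)$ then $1/x \sim IG(a, b)$; `nu0` and `delta0` are illustrative prior settings.

```python
# Inverse-gamma draw for sigma_i^2 implementing (86)-(88).
import numpy as np

def draw_sigma2(ln_h, nu0, delta0, rng):
    d = np.diff(ln_h)                  # Delta ln h_it, one term per sample date
    nu1 = nu0 + d.size                 # (87)
    delta1 = delta0 + np.sum(d ** 2)   # (88)
    # an IG(nu1/2, delta1/2) draw is the reciprocal of a gamma variate
    return 1.0 / rng.gamma(shape=nu1 / 2.0, scale=2.0 / delta1)
```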
C.2.4 Covariance Parameters, $\beta$

Next, we consider the distribution of $\beta$ conditional on the data and other parameters. Knowledge of $\theta^T$ and $Y^T$ implies knowledge of $\varepsilon_t$, which satisfies
$B \varepsilon_t = u_t$.  (89)
The first equation in this system is
$\varepsilon_{1t} = u_{1t}$.  (90)
The second and third equations can be expressed as transformed regressions,
$(h_{2t}^{-1/2} \varepsilon_{2t}) = -\beta_{21}(h_{2t}^{-1/2} \varepsilon_{1t}) + (h_{2t}^{-1/2} u_{2t})$,  (91)
$(h_{3t}^{-1/2} \varepsilon_{3t}) = -\beta_{31}(h_{3t}^{-1/2} \varepsilon_{1t}) - \beta_{32}(h_{3t}^{-1/2} \varepsilon_{2t}) + (h_{3t}^{-1/2} u_{3t})$,
in which the scaled disturbances have unit variance. With a normal prior, $N(\beta_{i0}, V_{i0})$, the conditional posterior for each $\beta_i$ is also normal,
$f(\beta_i | Y^T, \theta^T, \sigma^2, H^T) = f(\beta_i | z_i, Z_i), \quad i = 2, 3,$  (92)
$\qquad\qquad = N(\bar{\beta}_{i1}, \bar{V}_{i1}), \quad i = 2, 3,$  (93)
where
$\bar{V}_{i1} = (V_{i0}^{-1} + Z_i' Z_i)^{-1}$,  (94)
$\bar{\beta}_{i1} = \bar{V}_{i1}(V_{i0}^{-1} \beta_{i0} + Z_i' z_i)$.  (95)
The variables $z_i$ and $Z_i$ refer to the left- and right-hand variables, respectively, in the transformed regressions.
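Each $\beta_i$ step is thus an ordinary Bayesian linear regression with unit error variance. A minimal sketch, where `z` is the transformed left-hand variable, `Z` the transformed regressors, and `(b0, V0)` the normal prior, all names illustrative:

```python
# Normal posterior draw for beta_i implementing (92)-(95).
import numpy as np

def draw_beta(z, Z, b0, V0, rng):
    V0_inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0_inv + Z.T @ Z)    # posterior variance (94)
    b1 = V1 @ (V0_inv @ b0 + Z.T @ z)       # posterior mean (95)
    return rng.multivariate_normal(b1, V1)
```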
C.2.5 Stochastic Volatilities, $H^T$

The final step involves the conditional distribution of the elements of $H^T$. To sample the stochastic volatilities, we apply the univariate algorithm of Jacquier et al. (1994) to each element of the orthogonalized VAR residuals, $u_t$. The latter are observable conditional on the data, $\theta^T$, and $\beta$. The volatilities are sampled one date at a time from
$f(h_{it} | h_{i,-t}, u_i^T, \sigma_i^2) = f(h_{it} | h_{i,t-1}, h_{i,t+1}, u_{it}, \sigma_i^2)$,  (96)
where $h_{i,-t}$ represents the vector of $h$s at all other dates. The simplification follows from the assumption that $h_{it}$ is Markov. Knowledge of $Q$ is redundant given $\theta^T$, and $h_j^T$ and $\sigma_j^2$, $j \neq i$, are irrelevant because the stochastic volatilities are independent.
By Bayes' theorem, the conditional kernel can be expressed as30
$f(h_{it} | h_{i,t-1}, h_{i,t+1}, u_i^T, \sigma_i^2) \propto f(u_{it} | h_{it})\, f(h_{it} | h_{i,t-1})\, f(h_{i,t+1} | h_{it})$
$\qquad \propto h_{it}^{-1.5} \exp\!\left(-\dfrac{u_{it}^2}{2 h_{it}}\right) \exp\!\left(-\dfrac{(\ln h_{it} - \mu_{it})^2}{2 \sigma_{ic}^2}\right)$.  (97)
Its form follows from the normal form of the conditional likelihood, $f(u_{it} | h_{it})$, and the log-normal form of the log-volatility equation, (15). The parameters $\mu_{it}$ and $\sigma_{ic}^2$ are the conditional mean and variance of $h_{it}$ implied by (15) and knowledge of $h_{i,t-1}$ and $h_{i,t+1}$. In the random walk case, they are
$\mu_{it} = (1/2)(\ln h_{i,t+1} + \ln h_{i,t-1})$,  (98)
$\sigma_{ic}^2 = (1/2)\,\sigma_i^2$.
Notice that the normalizing constant is absent from (97). Jacquier et al. say the normalizing constant is costly to compute, and they recommend a Metropolis step instead of a Gibbs step. One natural way to proceed is to draw a trial value for $h_{it}$ from the log-normal density implied by (15), and then use the conditional likelihood $f(u_{it} | h_{it})$ to compute the acceptance probability. Thus, our proposal density is
$q(h_{it}) \propto h_{it}^{-1} \exp\!\left(-\dfrac{(\ln h_{it} - \mu_{it})^2}{2 \sigma_{ic}^2}\right)$,  (99)
and the acceptance probability for the $m$th draw is
$\alpha_m = \dfrac{f(u_{it} | h_{it}^m)\, q(h_{it}^{m-1})}{f(u_{it} | h_{it}^{m-1})\, q(h_{it}^m)} = \dfrac{(h_{it}^m)^{-1/2} \exp(-u_{it}^2 / 2 h_{it}^m)}{(h_{it}^{m-1})^{-1/2} \exp(-u_{it}^2 / 2 h_{it}^{m-1})}$.  (100)
We set $h_{it}^m = h_{it}^{m-1}$ if the proposal is rejected. The algorithm is applied on a date-by-date basis to each of the elements of $u_t$.
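A minimal sketch of one date's Metropolis step under the random-walk case (98), where `u` is the residual $u_{it}$, `h_prev` and `h_next` are the neighboring volatilities, `sigma2` is $\sigma_i^2$, and `h_old` is the current value $h_{it}^{m-1}$:

```python
# One date-by-date Metropolis update implementing (97)-(100).
import numpy as np

def draw_h(u, h_prev, h_next, sigma2, h_old, rng):
    mu = 0.5 * (np.log(h_next) + np.log(h_prev))               # (98)
    s2c = 0.5 * sigma2
    h_new = np.exp(mu + np.sqrt(s2c) * rng.standard_normal())  # log-normal proposal (99)
    # Acceptance probability (100): the proposal terms cancel against the
    # log-normal part of the kernel, leaving the likelihood ratio below.
    alpha = ((h_new ** -0.5) * np.exp(-u * u / (2.0 * h_new))) / \
            ((h_old ** -0.5) * np.exp(-u * u / (2.0 * h_old)))
    return h_new if rng.random() < alpha else h_old
```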
30
The formulas are a bit different at the beginning and end of the sample.
References
Aguilar, Omar and Mike West, 2001, Bayesian Dynamic Factor Models and
Portfolio Allocation, Journal of Business and Economic Statistics.
Anderson, Evan, Lars Peter Hansen, and Thomas J. Sargent, 2000, Robustness,
Detection, and the Price of Risk, Mimeo, Department of Economics, Stanford University.
Andrews, Donald W.K., 1993, Tests for Parameter Instability and Structural
Change with Unknown Change Point, Econometrica 61, pp. 821-856.
Benati, Luca, 2001, Investigating Inflation Dynamics Across Monetary Regimes:
Taking the Lucas Critique Seriously, Bank of England working paper.
Bernanke, Ben S. and Ilian Mihov, 1998a, The Liquidity Effect and Long-Run Neutrality, in Carnegie-Rochester Conference Series on Public Policy, 49, Bennett T. McCallum and Charles I. Plosser, eds. (Amsterdam: North Holland), pp. 149-194.
Bernanke, Ben S. and Ilian Mihov, 1998b, Measuring Monetary Policy, Quarterly Journal of Economics 113, August, pp. 869-902.
Boivin, Jean, 1999, Revisiting the Evidence on the Stability of Monetary VARs,
unpublished manuscript, Graduate School of Business, Columbia University.
Bray, Margaret M. and David Kreps, 1986, Rational Learning and Rational
Expectations, in W. Heller, R. Starr, and D. Starrett, eds., Essays in Honor of
Kenneth J. Arrow (Cambridge University Press: Cambridge, UK).
Cho, In-Koo, Noah Williams, and Thomas J. Sargent, 2002, Escaping Nash Inflation, Review of Economic Studies 69, January, pp. 1-40.
Clarida, Richard, Jordi Gali, and Mark Gertler, 2000, Monetary Policy Rules
and Macroeconomic Stability: Evidence and Some Theory, Quarterly Journal of
Economics 115(1), pp. 147-180.
Cogley, Timothy and Thomas J. Sargent, 2001, Evolving Post World War II U.S.
Inflation Dynamics, NBER Macroeconomics Annual 16, pp. 331-373.
DeLong, J. Bradford, 1997, America's Only Peacetime Inflation: the 1970s,
in Christina Romer and David Romer (eds.), Reducing Inflation. NBER Studies in
Business Cycles, Volume 30.
Evans, George W. and Seppo Honkapohja, 2001, Learning and Expectations in
Macroeconomics (Princeton University Press: Princeton, New Jersey).
Fudenberg, Drew and David K. Levine, 1993, Self-Confirming Equilibrium,
Econometrica 61, pp. 523-545.
Fudenberg, Drew and David M. Kreps, 1995, Learning in Extensive Games, I: Self-Confirming Equilibria, Games and Economic Behavior 8, pp. 20-55.
Hansen, Bruce E., 1992, Testing For Parameter Instability in Linear Models,
Journal of Policy Modeling 14, pp. 517-533.
Ireland, Peter, 1999, Does the Time-Consistency Problem Explain the Behavior
of Inflation in the United States? Journal of Monetary Economics 44(2), pp. 279-292.
Jacquier, Eric, Nicholas G. Polson, and Peter Rossi, 1994, Bayesian Analysis of
Stochastic Volatility Models, Journal of Business and Economic Statistics 12, pp.
371-418.
Jacquier, Eric, Nicholas G. Polson, and Peter Rossi, 1999, Stochastic Volatility: Univariate and Multivariate Extensions, unpublished manuscript, Finance Department, Boston College and Graduate School of Business, University of Chicago.
Kim, Chang-Jin and Charles R. Nelson, 1999a, Has The U.S. Economy Become
More Stable? A Bayesian Approach Based on a Markov Switching Model of the
Business Cycle, Review of Economics and Statistics 81(4), pp. 608-616.
Kreps, David, 1998, Anticipated Utility and Dynamic Choice, Mimeo, 1997
Schwartz Lecture, Northwestern University.
Kydland, Finn and Edward C. Prescott, 1977, Rules Rather than Discretion: the
Inconsistency of Optimal Plans, Journal of Political Economy 85, pp. 473-491.
Leeper, Eric and Tao Zha, 2001a, Empirical Analysis of Policy Interventions,
Mimeo, Department of Economics, Indiana University and Research Department,
Federal Reserve Bank of Atlanta.
and
, 2001b, Toward a Theory of Modest Policy Interventions, Mimeo,
Department of Economics, Indiana University and Research Department, Federal
Reserve Bank of Atlanta.
Lucas, Robert E., Jr., 1976, Econometric Policy Evaluation: A Critique, in
The Phillips Curve and Labor Markets, edited by Karl Brunner and Alan Meltzer,
Carnegie-Rochester Series on Public Policy, vol. 1.
and Thomas J. Sargent, 1981, Introduction in Robert E. Lucas and Thomas
J. Sargent (eds.) Rational Expectations and Econometric Practice (Minneapolis: University of Minnesota Press).
McCallum, Bennett T., 1999, Issues in the Design of Monetary Policy Rules in
Taylor, John B. and Michael Woodford, eds., Handbook of Macroeconomics vol. 1C
(Amsterdam: Elsevier Science).
McConnell, Margaret and Gabriel Perez Quiros, 2000, Output Fluctuations in
the United States: What Has Changed Since the Early 1980s? American Economic
Review 90(5), 1464-1476.
Nyblom, Jukka, 1989, Testing for the Constancy of Parameters Over Time,
Journal of the American Statistical Association 84, pp. 223-230.
Parkin, Michael, 1993, Inflation in North America, in Price Stabilization in the
1990s, edited by Kumiharo Shigehara.
Pitt, Michael K. and Neil Shephard, 1999, Time-Varying Covariances: A Factor Stochastic Volatility Approach, in Bayesian Statistics 6, J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, eds. (Oxford University Press: Oxford).
Romer, Christina D. and David H. Romer, 2002, The Evolution of Economic
Understanding and Postwar Stabilization Policy, forthcoming in the 2002 Jackson
Hole conference volume, Federal Reserve Bank of Kansas City.
Rudebusch, Glenn D. and Lars E.O. Svensson, 1999, Policy Rules for Inflation
Targeting, in Monetary Policy Rules, edited by John B. Taylor, NBER Conference
Report (University of Chicago Press: Chicago, Illinois).
Samuelson, Paul A. and Robert M. Solow, 1960, Analytical Aspects of Anti-Inflation Policy, American Economic Review, Vol. 50, May, pp. 177-184.
Sargent, Thomas J., 1999, The Conquest of American Inflation (Princeton University Press: Princeton, New Jersey).
Sargent, Thomas J., 1984, Autoregressions, Expectations, and Advice, American Economic Review, Papers and Proceedings 74, pp. 408-415.
Sargent, Thomas J. and Neil Wallace, 1976, Rational Expectations and the Theory of Economic Policy, Journal of Monetary Economics 2, pp. 169-183.
Sims, Christopher A., 1980, Comparison of Interwar and Postwar Business Cycles: Monetarism Reconsidered, American Economic Review 70, May, pp. 250-257.
Sims, Christopher A., 1982, Policy Analysis with Econometric Models, Brookings Papers on Economic Activity, Vol. 1, pp. 107-152.
Sims, Christopher A., 1988, Projecting Policy Effects with Statistical Models, Revista de Análisis Económico 3, pp. 3-20.
Sims, Christopher A., 1999, Drifts and Breaks in Monetary Policy, mimeo, Princeton University.
Sims, Christopher A., 2001, Comment on Sargent and Cogley's Evolving Post World War II U.S. Inflation Dynamics, NBER Macroeconomics Annual 16, pp. 373-379.
Sims, Christopher A. and Tao Zha, 1999, Error Bands for Impulse Responses,
Econometrica 67, pp. 1113-1155.
Stock, James H., 2001, Discussion of Cogley and Sargent's Evolving Post World War II U.S. Inflation Dynamics, NBER Macroeconomics Annual 16, pp. 379-387.
Stokey, Nancy L., 1989, Reputation and Time Consistency, American Economic
Review, Papers and Proceedings 79, pp. 134-139.
Taylor, John B., 1993, Discretion versus Policy Rules in Practice, Carnegie-Rochester Conference Series on Public Policy, Vol. 39, December, pp. 195-214.
Taylor, John B., 1997, Comment on America's Only Peacetime Inflation: the 1970s, in Christina Romer and David Romer (eds.), Reducing Inflation. NBER Studies in Business Cycles, Volume 30.
Whittle, Peter, 1953, The Analysis of Multiple Stationary Time Series, Journal
of the Royal Statistical Society, Series B, vol. 15, pp. 125-139.