Wavelets and Subband Coding
Martin Vetterli
University of California at Berkeley
Jelena Kovačević
AT&T Bell Laboratories
For my parents.
To Marie-Laure.
— MV
To Giovanni.
To my little star, and to Mom and Dad.
— JK
Contents
Preface xiii
Bibliography 476
Index 499
Preface
A central goal of signal processing is to describe real life signals, be it for com-
putation, compression, or understanding. In that context, transforms or linear ex-
pansions have always played a key role. Linear expansions are present in Fourier’s
original work and in Haar’s construction of the first wavelet, as well as in Gabor’s
work on time-frequency analysis. Today, transforms are central in fast algorithms
such as the FFT as well as in applications such as image and video compression.
Over the years, depending on open problems or specific applications, theoreti-
cians and practitioners have added more and more tools to the toolbox called signal
processing. Two of the newest additions have been wavelets and their discrete-
time cousins, filter banks or subband coding. From work in harmonic analysis and
mathematical physics, and from applications such as speech/image compression
and computer vision, various disciplines built up methods and tools with a similar
flavor, which can now be cast into the common framework of wavelets.
This unified view, as well as the number of applications where this framework
is useful, are motivations for writing this book. The unification has given a new
understanding and a fresh view of some classic signal processing problems. Another
motivation is that the subject is exciting and the results are cute!
The aim of the book is to present this unified view of wavelets and subband
coding. It will be done from a signal processing perspective, but with sufficient
background material such that people without signal processing knowledge will
find it useful as well. The level is that of a first year graduate engineering book
(typically electrical engineering and computer sciences), but elementary Fourier
analysis and some knowledge of linear systems in discrete time are enough to follow
most of the book.
After the introduction (Chapter 1) and a review of the basics of vector spaces,
linear algebra, Fourier theory and signal processing (Chapter 2), the book covers
the five main topics in as many chapters. The discrete-time case, or filter banks,
is thoroughly developed in Chapter 3. This is the basis for most applications, as
well as for some of the wavelet constructions. The concept of wavelets is developed
in Chapter 4, both with direct approaches and based on filter banks. This chapter
describes wavelet series and their computation, as well as the construction of mod-
ified local Fourier transforms. Chapter 5 discusses continuous wavelet and local
Fourier transforms, which are used in signal analysis, while Chapter 6 addresses
efficient algorithms for filter banks and wavelet computations. Finally, Chapter 7
describes signal compression, where filter banks and wavelets play an important
role. Speech/audio, image and video compression using transforms, quantization
and entropy coding are discussed in detail. Throughout the book we give examples
to illustrate the concepts, and more technical parts are left to appendices.
This book evolved from class notes used at Columbia University and the Uni-
versity of California at Berkeley. Parts of the manuscript have also been used at the
University of Illinois at Urbana-Champaign and the University of Southern Cali-
fornia. The material was covered in a semester, but it would also be easy to carve
out a subset or skip some of the more mathematical subparts when developing a
curriculum. For example, Chapters 3, 4 and 7 can form a good core for a course in
Wavelets and Subband Coding. Homework problems are included in all chapters,
complemented with project suggestions in Chapter 7. Since there is a detailed re-
view chapter that makes the material as self-contained as possible, we think that
the book is useful for self-study as well.
The subjects covered in this book have recently been the focus of books, special
issues of journals, special conference proceedings, numerous articles and even new
journals! To us, the book by I. Daubechies [73] has been invaluable, and Chapters 4
and 5 have been substantially influenced by it. Like the standard book by Meyer
[194] and a recent book by Chui [49], it is a more mathematically oriented book
than the present text. Another, more recent, tutorial book by Meyer gives an
excellent overview of the history of the subject, its mathematical implications and
current applications [195]. On the engineering side, the book by Vaidyanathan
[308] is an excellent reference on filter banks, as is Malvar’s book [188] for lapped
orthogonal transforms and compression. Several other texts, including edited books,
have appeared on wavelets [27, 51, 251], as well as on subband coding [335] and
multiresolution signal decompositions [3]. Recent tutorials on wavelets have also appeared.
ACKNOWLEDGEMENTS
Some of the work described in this book resulted from research supported by the
National Science Foundation, whose support is gratefully acknowledged. We would
like also to thank Columbia University, in particular the Center for Telecommu-
nications Research, the University of California at Berkeley and AT&T Bell Lab-
oratories for providing support and a pleasant work environment. We take this
opportunity to thank A. Oppenheim for his support and for including this book in
his distinguished series. We thank K. Gettman and S. Papanikolau of Prentice-Hall
for their patience and help, and K. Fortgang of bookworks for her expert help in
the production stage of the book.
To us, one of the attractions of the topic of Wavelets and Subband Coding is
its interdisciplinary nature. This allowed us to interact with people from many
different disciplines, and this was an enrichment in itself. The present book is the
result of this interaction and the help of many people.
Our gratitude goes to I. Daubechies, whose work and help has been invaluable, to
C. Herley, whose research, collaboration and help has directly influenced this book,
and O. Rioul, who first taught us about wavelets and has always been helpful.
We would like to thank M.J.T. Smith and P.P. Vaidyanathan for a continuing
and fruitful interaction on the topic of filter banks, and S. Mallat for his insights
and interaction on the topic of wavelets.
Over the years, discussions and interactions with many experts have contributed
to our understanding of the various fields relevant to this book, and we would
like to acknowledge in particular the contributions of E. Adelson, T. Barnwell,
P. Burt, A. Cohen, R. Coifman, R. Crochiere, P. Duhamel, C. Galand, W. Lawton,
D. LeGall, Y. Meyer, T. Ramstad, G. Strang, M. Unser and V. Wickerhauser.
Many people have commented on several versions of the present text. We thank
I. Daubechies, P. Heller, M. Unser, P.P. Vaidyanathan, and G. Wornell for go-
ing through a complete draft and making many helpful suggestions. Comments
on parts of the manuscript were provided by C. Chan, G. Chang, Z. Cvetković,
V. Goyal, C. Herley, T. Kalker, M. Khansari, M. Kobayashi, H. Malvar, P. Moulin,
A. Ortega, A. Park, J. Princen, K. Ramchandran, J. Shapiro and G. Strang, and
are acknowledged with many thanks.
The topic of this book is very old and very new. Fourier series, or expansion of
periodic functions in terms of harmonic sines and cosines, date back to the early
part of the 19th century when Fourier proposed harmonic trigonometric series [100].
The first wavelet (the only example for a long time!) was found by Haar early in
this century [126]. But the construction of more general wavelets to form bases
for square-integrable functions was investigated in the 1980’s, along with efficient
algorithms to compute the expansion. At the same time, applications of these
techniques in signal processing have blossomed.
While linear expansions of functions are a classic subject, the recent construc-
tions contain interesting new features. For example, wavelets allow good resolution
in time and frequency, and should thus allow one to see “the forest and the trees.”
This feature is important for nonstationary signal analysis. While Fourier basis
functions are given in closed form, many wavelets can only be obtained through a
computational procedure (and even then, only at specific rational points). While
this might seem to be a drawback, it turns out that if one is interested in imple-
menting a signal expansion on real data, then a computational procedure is better
than a closed-form expression!
The recent surge of interest in the types of expansions discussed here is due
to the convergence of ideas from several different fields, and the recognition that
techniques developed independently in these fields could be cast into a common
framework.
The name “wavelet” had been used before in the literature,1 but its current
meaning is due to J. Goupillaud, J. Morlet and A. Grossman [119, 125]. In the
context of geophysical signal processing they investigated an alternative to local
Fourier analysis based on a single prototype function, and its scales and shifts.
The modulation by complex exponentials in the Fourier transform is replaced by a
scaling operation, and the notion of scale2 replaces that of frequency. The simplicity
and elegance of the wavelet scheme was appealing and mathematicians started
studying wavelet analysis as an alternative to Fourier analysis. This led to the
discovery of wavelets which form orthonormal bases for square-integrable and other
function spaces by Meyer [194], Daubechies [71], Battle [21, 22], Lemarié [175],
and others. A formalization of such constructions by Mallat [180] and Meyer [194]
created a framework for wavelet expansions called multiresolution analysis, and
established links with methods used in other fields. Also, the wavelet construction
by Daubechies is closely connected to filter bank methods used in digital signal
processing as we shall see.
Of course, these achievements were preceded by a long-term evolution from the
1910 Haar wavelet (which, of course, was not called a wavelet back then) to work
using octave division of the Fourier spectrum (Littlewood-Paley) and results in
harmonic analysis (Calderon-Zygmund operators). Other constructions were not
recognized as leading to wavelets initially (for example, Stromberg’s work [283]).
Paralleling the advances in pure and applied mathematics were those in signal
processing, but in the context of discrete-time signals. Driven by applications such
as speech and image compression, a method called subband coding was proposed by
Croisier, Esteban, and Galand [69] using a special class of filters called quadrature
mirror filters (QMF) in the late 1970’s, and by Crochiere, Webber and Flanagan
[68]. This led to the study of perfect reconstruction filter banks, a problem solved
in the 1980’s by several people, including Smith and Barnwell [270, 271], Mintzer
[196], Vetterli [315], and Vaidyanathan [306].
In a particular configuration, namely when the filter bank has octave bands,
one obtains a discrete-time wavelet series. Such a configuration has been popular
in signal processing less for its mathematical properties than because an octave
band or logarithmic spectrum is more natural for certain applications such as audio
1. For example, for the impulse response of a layer in geophysical signal processing by Ricker [237] and for a causal finite-energy function by Robinson [248].
2. For a beautiful illustration of the notion of scale, and an argument for geometric spacing of scale in natural imagery, see [197].
1.1. SERIES EXPANSIONS OF SIGNALS 3
compression since it emulates the hearing process. Such an octave-band filter bank
can be used, under certain conditions, to generate wavelet bases, as shown by
Daubechies [71].
In computer vision, multiresolution techniques have been used for various prob-
lems, ranging from motion estimation to object recognition [249]. Images are suc-
cessively approximated starting from a coarse version and going to a fine-resolution
version. In particular, Burt and Adelson proposed such a scheme for image coding
in the early 1980’s [41], calling it pyramid coding.3 This method turns out to be
similar to subband coding. Moreover, the successive approximation view is similar
to the multiresolution framework used in the analysis of wavelet schemes.
In computer graphics, a method called successive refinement iteratively inter-
polates curves or surfaces, and the study of such interpolators is related to wavelet
constructions from filter banks [45, 92].
Finally, many computational procedures use the concept of successive approxi-
mation, sometimes alternating between fine and coarse resolutions. The multigrid
methods used for the solution of partial differential equations [39] are an example.
While these interconnections are now clarified, this has not always been the
case. In fact, maybe one of the biggest contributions of wavelets has been to bring
people from different fields together, and from that cross fertilization and exchange
of ideas and methods, progress has been achieved in various fields.
In what follows, we will take mostly a signal processing point of view of the
subject. Also, most applications discussed later are from signal processing.
The set {ϕi} is complete for the space S if all signals x ∈ S can be expanded as in (1.1.1). In that case, there will also exist a dual set {ϕ̃i}, i ∈ Z, such that the expansion coefficients in (1.1.1) can be computed as

αi = Σ_n ϕ̃i[n] x[n],

when the signals are sequences.
3. The importance of the pyramid algorithm was not immediately recognized. One of the reviewers of the original Burt and Adelson paper said, “I suspect that no one will ever use this algorithm again.”
Figure 1.1 Examples of possible sets of vectors for the expansion of R2 . (a)
Orthonormal case. (b) Biorthogonal case. (c) Overcomplete case.
For real continuous-time functions, the analogous expression is αi = ∫ ϕ̃i(t) x(t) dt. The above expressions are the inner
products of the ϕ̃i’s with the signal x, denoted by ⟨ϕ̃i, x⟩. An important particular
case is when the set {ϕi } is orthonormal and complete, since then we have an
orthonormal basis for S and the basis and its dual are the same, that is, ϕi = ϕ̃i .
Then
⟨ϕi, ϕj⟩ = δ[i − j],
where δ[i] equals 1 if i = 0, and 0 otherwise. If the set is complete and the vectors
ϕi are linearly independent but not orthonormal, then we have a biorthogonal basis,
and the basis and its dual satisfy

⟨ϕi, ϕ̃j⟩ = δ[i − j].
If the set is complete but redundant (the ϕi ’s are not linearly independent), then we
do not have a basis but an overcomplete representation called a frame. To illustrate
these concepts, consider the following example.
a possible reconstruction basis is identical (up to a scale factor), namely, ϕ̃i = (2/3) ϕi (the
reconstruction basis is not unique). This set behaves as an orthonormal basis, even though
the vectors are linearly dependent.
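This behavior is easy to check numerically with a standard instance of such an overcomplete set (the particular vectors below are an assumption for illustration): three unit vectors 120° apart in R² form a tight frame, and reconstruction with the dual set ϕ̃i = (2/3) ϕi is exact. A minimal sketch:

```python
import math

# Three unit vectors at 120-degree angles in R^2 (an assumed concrete
# instance of the overcomplete set discussed above).
phis = [(0.0, 1.0),
        (-math.sqrt(3) / 2, -0.5),
        (math.sqrt(3) / 2, -0.5)]

def expand_and_reconstruct(x):
    # Expansion coefficients alpha_i = <phi_i, x>.
    alphas = [p[0] * x[0] + p[1] * x[1] for p in phis]
    # Reconstruction with the dual set phi~_i = (2/3) phi_i.
    y0 = sum((2 / 3) * a * p[0] for a, p in zip(alphas, phis))
    y1 = sum((2 / 3) * a * p[1] for a, p in zip(alphas, phis))
    return (y0, y1)

x = (1.25, -0.3)
y = expand_and_reconstruct(x)
assert abs(y[0] - x[0]) < 1e-12 and abs(y[1] - x[1]) < 1e-12
```

Reconstruction is exact because the sum of the outer products of these vectors equals (3/2) times the identity; the factor 2/3 in the dual set compensates for this redundancy.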
4. The Fourier transform of nonperiodic signals is also possible. It is an integral transform rather than a series expansion and lacks any time locality.
(a)
(b)
Figure 1.2 Musical notation and orthonormal wavelet bases. (a) The western
musical notation uses a logarithmic frequency scale with twelve halftones per
octave. In this example, notes are chosen as in an orthonormal wavelet basis,
with long low-pitched notes, and short high-pitched ones. (b) Corresponding
time-domain functions.
A popular alternative to the STFT is the wavelet transform. Using scales and
shifts of a prototype wavelet, a linear expansion of a signal is obtained. Because the
scales used are powers of an elementary scale factor (typically 2), the analysis uses
a constant relative bandwidth (or, the frequency axis is logarithmic). The sampling
of the time-frequency plane is now very different from the rectangular grid used in
the STFT. Lower frequencies, where the bandwidth is narrow (that is, the basis
functions are stretched in time) are sampled with a large time step, while high
frequencies (which correspond to short basis functions) are sampled more often. In
Figure 1.2, we give an intuitive illustration of this time-frequency trade-off, and
relate it to musical notation which also uses a logarithmic frequency scale.5 What
is particularly interesting is that such a wavelet scheme allows good orthonormal
bases whereas the STFT does not.
In the discussions above, we implicitly assumed continuous-time signals. Of
course there are discrete-time equivalents to all these results. A local analysis
can be achieved using a block transform, where the sequence is segmented into
adjacent blocks of N samples, and each block is individually transformed. As is to be
expected, such a scheme is plagued by boundary effects, also called blocking effects.
A more general expansion relies on filter banks, and can achieve both STFT-like
analysis (rectangular sampling of the time-frequency plane) or wavelet-like analysis
(constant relative bandwidth in frequency). Discrete-time expansions based on
filter banks are not arbitrary, rather they are structured expansions. Again, for
5. This is the standard western musical notation based on J.S. Bach’s “Well Tempered Piano”. Thus one could argue that wavelets were actually invented by J.S. Bach!
Figure 1.3 Time-frequency tilings for a simple discrete-time signal [130]. (a)
Sine wave plus impulse. (b) Expansion onto the identity basis. (c) Discrete-
time Fourier series. (d) Local discrete-time Fourier series. (e) Discrete-time
wavelet series.
Note that the local Fourier transform and the wavelet transform can be used
for signal analysis purposes. In that case, the goal is not to obtain orthonormal
bases, but rather to characterize the signal from the transform. The local Fourier
transform retains many of the characteristics of the usual Fourier transform with a
localization given by the window function, which is thus constant at all frequencies
1.2. MULTIRESOLUTION CONCEPT 9
(this phenomenon can be seen already in Figure 1.3(d)). The wavelet, on the
other hand, acts as a microscope, focusing on smaller time phenomena as the
scale becomes small (see Figure 1.3(e) to see how the impulse gets better localized
at high frequencies). This behavior permits a local characterization of functions,
which the Fourier transform does not.8
The above example is just one among many schemes where multiresolution de-
compositions are useful in communications problems. Others include transmission
8. For example, in [137], this mathematical microscope is used to analyze a famous lacunary Fourier series that was proposed over a century ago.
[Figure 1.4: Multiresolution (MR) encoder and decoder. The encoder derives a coarse approximation of x (operators D and I) and a residual; the decoder interpolates the coarse version and adds back the residual to recover x.]
over error-prone channels, where the coarse resolution can be better protected to
guarantee some minimum level of quality.
Multiresolution decompositions are also important for computer vision tasks
such as image segmentation or object recognition: the task is performed in a suc-
cessive approximation manner, starting on the coarse version and then using this
result as an initial guess for the full task. However, this is a greedy approach which
is sometimes suboptimal. Figure 1.5 shows a famous counter-example, where a
multiresolution approach would be seriously misleading . . .
Interestingly, the multiresolution concept, besides being intuitive and useful in
practice, forms the basis of a mathematical framework for wavelets [181, 194]. As
in the pyramid example shown in Figure 1.4, one can decompose a function into a
coarse version plus a residual, and then iterate this to infinity. If properly done,
this can be used to analyze wavelet schemes and derive wavelet bases.
expansions which will reappear throughout the book as a recurring theme: the Haar
and the sinc bases. They are limit cases of orthonormal expansions with good time
localization (Haar) and good frequency localization (sinc). This naturally leads to
an in-depth study of two-channel filter banks, including analytical tools for their
analysis as well as design methods. The construction of orthonormal and linear
phase filter banks is described. Multichannel filter banks are developed next, first
through tree structures and then in the general case. Modulated filter banks, cor-
responding conceptually to a discrete-time local Fourier analysis, are addressed as
well. Next, pyramid schemes and overcomplete representations are explored. Such
schemes, while not critically sampled, have some other attractive features, such
as time invariance. Then, the multidimensional case is discussed both for simple
separable systems, as well as for general nonseparable ones. The latter systems
involve lattice sampling which is detailed in an appendix. Finally, filter banks for
telecommunications, namely transmultiplexers and adaptive subband filtering, are
presented briefly. The appendix details factorizations of orthonormal filter banks
(corresponding to paraunitary matrices).
Chapter 4 is devoted to the construction of bases for continuous-time signals,
in particular wavelets and local cosine bases. Again, the Haar and sinc cases play
illustrative roles as extremes of wavelet constructions. After an introduction to
series expansions, we develop multiresolution analysis as a framework for wavelet
constructions. This naturally leads to the classic wavelets of Meyer and Battle-
Lemarié or Stromberg. These are based on Fourier-domain analysis. This is followed
by Daubechies’ construction of wavelets from iterated filter banks. This is a time-
domain construction based on the iteration of a multirate filter. Study of the
iteration leads to the notion of regularity of the discrete-time filter. Then, the
wavelet series expansion is considered both in terms of properties and computation
of the expansion coefficients. Some generalizations of wavelet constructions are
considered next, first in one dimension (including biorthogonal and multichannel
wavelets) and then in multiple dimensions, where nonseparable wavelets are shown.
Finally, local cosine bases are derived and they can be seen as a real-valued local
Fourier transform.
Chapter 5 is concerned with continuous wavelet and Fourier transforms. Unlike
the series expansions in Chapters 3 and 4, these are very redundant representa-
tions useful for signal analysis. Both transforms are analyzed, inverses are derived,
and their main properties are given. These transforms can be sampled, that is,
scale/frequency and time shift can be discretized. This leads to redundant series
representations called frames. In particular, reconstruction or inversion is discussed,
and the case of wavelet and local Fourier frames is considered in some detail.
Chapter 6 treats algorithmic and computational aspects of series expansions.
First, a review of classic fast algorithms for signal processing is given since they
1.3. OVERVIEW OF THE BOOK 13
form the ingredients used in subsequent algorithms. The key role of the fast Fourier
transform (FFT) is pointed out. The complexity of computing filter banks, that is,
discrete-time expansions, is studied in detail. Important cases include the discrete-
time wavelet series or transform and modulated filter banks. The latter corresponds
to a local discrete-time Fourier series or transform, and uses FFT’s for efficient com-
putation. These filter bank algorithms have direct applications in the computation
of wavelet series. Overcomplete expansions are considered next, in particular for
the computation of a sampled continuous wavelet transform. The chapter concludes
with a discussion of special topics related to efficient convolution algorithms and
also application of wavelet ideas to numerical algorithms.
The last chapter is devoted to one of the main applications of wavelets and
filter banks in signal processing, namely signal compression. The technique is often
called subband coding because signals are considered in spectral bands for com-
pression purposes. First comes a review of transform based compression, including
quantization and entropy coding. Then follow specific discussions of one-, two- and
three-dimensional signal compression methods based on transforms. Speech and
audio compression, where subband coding was first invented, is discussed. The
success of subband coding in current audio coding algorithms is shown on spe-
cific examples such as the MUSICAM standard. A thorough discussion of image
compression follows. While current standards such as JPEG are block transform
based, some innovative subband or wavelet schemes are very promising and are
described in detail. Video compression is considered next. Besides expansions,
motion estimation/compensation methods play a key role and are discussed. The
multiresolution feature inherent in pyramid and subband coding is pointed out as
an attractive feature for video compression, just as it is for image coding. The final
section discusses the interaction of source coding, particularly the multiresolution
type, and channel coding or transmission. This joint source-channel coding is key
to new applications of image and video compression, as in transmission over packet
networks. An appendix gives a brief review of statistical signal processing which
underlies coding methods.
2.1 NOTATIONS
Let C, R, Z and N denote the sets of complex, real, integer and natural numbers,
respectively. Then, C n , and Rn will be the sets of all n-tuples (x1 , . . . , xn ) of
complex and real numbers, respectively.
The superscript ∗ denotes complex conjugation, or, (a + jb)∗ = (a − jb), where
the symbol j is used for the square root of −1 and a, b ∈ R. The subscript ∗ is used
to denote complex conjugation of the constants but not the complex variable, for
example, (az)∗ = a∗ z where z is a complex variable. The superscript T denotes the
transposition of a vector or a matrix, while the superscript ∗ on a vector or matrix
denotes hermitian transpose, or transposition and complex conjugation. Re(z) and
Im(z) denote the real and imaginary parts of the complex number z.
We define the Nth root of unity as WN = e^{−j2π/N}. It satisfies the following:

WN^N = 1,   (2.1.1)

WN^{kN+i} = WN^i, with k, i in Z,   (2.1.2)

Σ_{k=0}^{N−1} WN^{kn} = N if n = lN, l ∈ Z, and 0 otherwise.   (2.1.3)
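Properties (2.1.1)–(2.1.3) are easy to verify numerically; a small sketch (the choice N = 8 and the particular exponents are arbitrary):

```python
import cmath

N = 8
W = cmath.exp(-2j * cmath.pi / N)  # W_N = e^{-j 2 pi / N}

# (2.1.1): W_N^N = 1.
assert abs(W ** N - 1) < 1e-12

# (2.1.2): the exponent only matters modulo N.
k, i = 3, 5
assert abs(W ** (k * N + i) - W ** i) < 1e-12

# (2.1.3): summing W_N^{kn} over one period gives N when n is a
# multiple of N, and 0 otherwise.
def root_sum(n):
    return sum(W ** (k * n) for k in range(N))

assert abs(root_sum(2 * N) - N) < 1e-9  # n = 2N, a multiple of N
assert abs(root_sum(3)) < 1e-9          # n = 3, not a multiple of N
```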
x[n] = f(nT), n ∈ Z, T ∈ R.
In particular, δ(t) and δ[n] denote continuous-time and discrete-time Dirac func-
tions, which are very different indeed. The former is a generalized function (see
Section 2.4.4) while the latter is the sequence which is 1 for n = 0 and 0 otherwise
(the Dirac functions are also called delta or impulse functions).
2.2. HILBERT SPACES 17
(a) Does the set {vk} span the space R^n or C^n, that is, can every vector in R^n or
C^n be written as a linear combination of vectors from {vk}?
(b) Are the vectors linearly independent, that is, is it true that no vector from
{vk } can be written as a linear combination of the others?
(c) How can we find bases for the space to be spanned, in particular, orthonormal
bases?
(b) The orthogonality of a vector with respect to another vector (or set of vectors),
for example,

⟨x, y⟩ = 0,

with an appropriately defined scalar product,

⟨x, y⟩ = Σ_{i=1}^{n} xi yi.
So far, we relied on the fact that the spaces were finite-dimensional. Now, the idea
is to generalize our familiar notion of a vector space to infinite dimensions. It is
1
Unless otherwise specified, we will assume a squared norm.
necessary to restrict the vectors to have finite length or norm (even though they
are infinite-dimensional). This leads naturally to Hilbert spaces. For example, the
space of square-summable sequences, denoted by l2(Z), is the vector space “C^∞”
with a norm constraint. An example of a set of vectors spanning l2 (Z) is the set
{δ[n − k]}, k ∈ Z. A further extension with respect to linear algebra is that vectors
can be generalized from n-tuples of real or complex values to include functions of
a continuous variable. The notions of norm and orthogonality can be extended to
functions using a suitable inner product between functions, which are thus viewed
as vectors. A classic example of such orthogonal vectors is the set of harmonic sine
and cosine functions, sin(nt) and cos(nt), n = 0, 1, . . . , on the interval [−π, π].
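The orthogonality of the harmonic sines and cosines on [−π, π] can be checked with a simple midpoint-rule quadrature (the sample count and tolerances below are arbitrary choices):

```python
import math

def inner(f, g, a=-math.pi, b=math.pi, m=20000):
    # Midpoint-rule approximation of <f, g> = integral_a^b f(t) g(t) dt
    # (real-valued functions, so no conjugation is needed).
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(m))

s2 = lambda t: math.sin(2 * t)
s3 = lambda t: math.sin(3 * t)
c2 = lambda t: math.cos(2 * t)

# Distinct harmonics are orthogonal on [-pi, pi] ...
assert abs(inner(s2, s3)) < 1e-6
assert abs(inner(s2, c2)) < 1e-6
# ... while <sin(nt), sin(nt)> = pi for n >= 1.
assert abs(inner(s2, s2) - math.pi) < 1e-6
```

The midpoint rule is extremely accurate here because the integrands are smooth and periodic over the full interval.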
The classic questions from linear algebra apply here as well. In particular, the
question of completeness, that is, whether the span of the set of vectors {vk } covers
the whole space, becomes more involved than in the finite-dimensional case. The
norm plays a central role, since any vector in the space must be expressed by a
linear combination of vk ’s such that the norm of the difference between the vector
and the linear combination of vk ’s is zero. For l2 (Z), {δ[n − k]}, k ∈ Z, constitute
a complete set which is actually an orthonormal basis. For the space of square-
integrable functions over the interval [−π, π], denoted by L2 ([−π, π]), the harmonic
sines and cosines are complete since they form the basis used in the Fourier series
expansion.
If only a subset of the complete set of vectors {vk } is used, one is interested in
the best approximation of a general element of the space by an element from the
subspace spanned by the vectors in the subset. This question has a particularly
easy answer when the set {vk } is orthonormal and the goal is least-squares approx-
imation (that is, the norm of the difference is minimized). Because the geometry
of Hilbert spaces is similar to Euclidean geometry, the solution is the orthogonal
projection onto the approximation subspace, since this minimizes the distance or
approximation error.
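This projection argument can be illustrated in R³ with an assumed orthonormal pair {v1, v2}: the best least-squares approximation of y from their span is the orthogonal projection, and the residual is orthogonal to the subspace. A sketch:

```python
# Least-squares approximation by orthogonal projection, sketched in R^3.
# The orthonormal pair {v1, v2} and the vector y are arbitrary choices.
v1 = (1.0, 0.0, 0.0)
v2 = (0.0, 0.6, 0.8)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

y = (2.0, 1.0, -1.0)

# Projection onto span{v1, v2}: y_hat = <v1, y> v1 + <v2, y> v2.
c1, c2 = dot(v1, y), dot(v2, y)
y_hat = tuple(c1 * a + c2 * b for a, b in zip(v1, v2))

# The residual is orthogonal to the subspace -- the geometric reason
# the projection minimizes the approximation error.
r = tuple(a - b for a, b in zip(y, y_hat))
assert abs(dot(r, v1)) < 1e-12 and abs(dot(r, v2)) < 1e-12
```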
In the following, we formally introduce vector spaces and in particular Hilbert
spaces. We discuss orthogonal and general bases and their properties. We often use
the finite-dimensional case for intuition and examples. The treatment is not very
detailed, but sufficient for the remainder of the book. For a thorough treatment,
we refer the reader to [113].
(a) Commutativity: x + y = y + x.
(e) Additive inverse: for all x in E, there exists a (−x) in E, such that
x + (−x) = 0.
infinite set {δ[n − k]}k∈Z . Since they are linearly independent, the space is infinite-
dimensional.
Next, we equip the vector space with an inner product that is a complex function
fundamental for defining norms and orthogonality.
DEFINITION 2.2
An inner product on a vector space E over C (or R) is a complex-valued
function ⟨·, ·⟩, defined on E × E, with the following properties:
Note that (b) and (c) imply ⟨ax, y⟩ = a^*⟨x, y⟩. From (a) and (b), it is clear
that the inner product is linear. Note that we choose the definition of the inner
product which takes the complex conjugate of the first vector (follows from (b)).
For illustration, the standard inner products for complex-valued functions over R
and sequences over Z are

⟨f, g⟩ = ∫_{−∞}^{∞} f^*(t) g(t) dt,

and

⟨x, y⟩ = Σ_{n=−∞}^{∞} x^*[n] y[n],
respectively (if they exist). The norm of a vector is defined from the inner product
as

‖x‖ = ⟨x, x⟩^{1/2}.
Finally, the inner product can be used to define orthogonality of two vectors x and
y, that is, vectors x and y are orthogonal if and only if
⟨x, y⟩ = 0.
If two vectors are orthogonal, which is denoted by x ⊥ y, then they satisfy the
Pythagorean theorem,
‖x + y‖^2 = ‖x‖^2 + ‖y‖^2,
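For finitely supported sequences in l2(Z), the orthogonality relation and the Pythagorean theorem can be checked directly (the particular sequences below are arbitrary):

```python
def inner(x, y):
    # <x, y> with conjugation of the first vector, as defined above.
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

# Two finitely supported sequences that happen to be orthogonal.
x = [1, 2, 0, 0]
y = [2, -1, 3, 0]
assert inner(x, y) == 0

# Pythagorean theorem: ||x + y||^2 = ||x||^2 + ||y||^2.
s = [a + b for a, b in zip(x, y)]
assert norm_sq(s) == norm_sq(x) + norm_sq(y)
```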
DEFINITION 2.3
A complete inner product space is called a Hilbert space.
We are particularly interested in those Hilbert spaces which are separable because a
Hilbert space contains a countable orthonormal basis if and only if it is separable.
Since all Hilbert spaces with which we are going to deal are separable, we implicitly
assume that this property is satisfied (refer to [113] for details on separability).
Note that a closed subspace of a separable Hilbert space is separable, that is, it also
contains a countable orthonormal basis.
Given a Hilbert space E and a subspace S, we call the orthogonal complement
of S in E, denoted S ⊥ , the set {x ∈ E | x ⊥ S}. Assume further that S is closed,
that is, it contains all limits of sequences of vectors in S. Then, given a vector y in
E, there exists a unique v in S and a unique w in S ⊥ such that y = v + w. We can
thus write
E = S ⊕ S⊥,
or, E is the direct sum of the subspace and its orthogonal complement.
Let us consider a few examples of Hilbert spaces.
The above holds for the real space Rⁿ as well (note that then yᵢ* = yᵢ).
Thus, l²(Z) is the space of all sequences such that ‖x‖ < ∞. This is obviously an infinite-dimensional space, and a possible orthonormal basis is {δ[n − k]}_{k∈Z}.
For the completeness of l²(Z), one has to show that if xₙ[k] is a sequence of vectors in l²(Z) such that ‖xₙ − xₘ‖ → 0 as n, m → ∞ (that is, a Cauchy sequence), then there exists a limit x in l²(Z) such that ‖xₙ − x‖ → 0. The proof can be found,
for example, in [113].
where

v_k = Σ_{i=1}^{k−1} ⟨yᵢ, x_k⟩ yᵢ.
As will be seen shortly, the vector vk is the orthogonal projection of xk onto the
subspace spanned by the previous orthogonalized vectors and this is subtracted
from xk , followed by normalization.
A standard example of such an orthogonalization procedure is the Legendre
polynomials over the interval [−1, 1]. Start with xk (t) = tk , k = 0, 1, . . . and apply
the Gram-Schmidt procedure to get yk (t), of degree k, norm 1 and orthogonal to
yi (t), i < k (see Problem 2.1).
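As a small numerical sketch of this (our illustration, not from the text, using hypothetical helper names `inner` and `gram_schmidt`), one can apply the Gram-Schmidt recursion to the monomials 1, t, t² with the L²([−1, 1]) inner product and recover functions proportional to the Legendre polynomials:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def inner(p, q):
    # <p, q> = integral_{-1}^{1} p(t) q(t) dt, computed exactly for polynomials
    prod = P.polymul(p, q)
    integ = P.polyint(prod)
    return P.polyval(1.0, integ) - P.polyval(-1.0, integ)

def gram_schmidt(vectors):
    ortho = []
    for x in vectors:
        v = np.zeros(1)
        for y in ortho:
            v = P.polyadd(v, inner(y, x) * y)   # v_k = sum <y_i, x_k> y_i
        w = P.polysub(x, v)                     # subtract the projection
        ortho.append(w / np.sqrt(inner(w, w)))  # normalize
    return ortho

# monomials 1, t, t^2 as coefficient arrays (constant term first)
monomials = [np.array([1.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0, 1.0])]
y0, y1, y2 = gram_schmidt(monomials)
```

The resulting y₂ is proportional to the Legendre polynomial P₂(t) = (3t² − 1)/2, scaled to unit norm.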
The coefficients α_k of the expansion are called the Fourier coefficients of y (with respect to {xᵢ}) and are given by α_k = ⟨x_k, y⟩. This can be shown by using the continuity of the inner product (that is, if xₙ → x and yₙ → y, then ⟨xₙ, yₙ⟩ → ⟨x, y⟩) as well as the orthogonality of the x_k's. Given the expansion of y,

⟨x_k, y⟩ = lim_{n→∞} ⟨x_k, Σ_{i=0}^{n} αᵢ xᵢ⟩ = α_k,
[Figure 2.2: (a) Orthogonal projection of y onto a subspace, with approximation ŷ = ⟨x₁, y⟩x₁ and error d orthogonal to the subspace. (b) Expansion with a nonorthogonal basis {x₁, x₂} and its dual basis {x̃₁, x̃₂}, where ŷ = ⟨x̃₁, y⟩x₁.]
for x in S is attained for x = Σᵢ αᵢ xᵢ with

αᵢ = ⟨xᵢ, y⟩,
that is, the Fourier coefficients. An immediate consequence of this result is the successive approximation property of orthogonal expansions. Call ŷ⁽ᵏ⁾ the best approximation of y on the subspace spanned by {x₁, x₂, . . . , xₖ}, given by the coefficients {α₁, α₂, . . . , αₖ} where αᵢ = ⟨xᵢ, y⟩. Then, the approximation ŷ⁽ᵏ⁺¹⁾ is given by

ŷ⁽ᵏ⁺¹⁾ = ŷ⁽ᵏ⁾ + ⟨x_{k+1}, y⟩ x_{k+1},
that is, the previous approximation plus the projection along the added vector xk+1 .
While this is obvious, it is worth pointing out that this successive approximation
property does not hold for nonorthogonal bases. When calculating the approxima-
tion ŷ (k+1) , one cannot simply add one term to the previous approximation, but has
to recalculate the whole approximation (see Figure 2.2). For a further discussion
of projection operators, see Appendix 2.A.
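A small numpy sketch of this contrast (our illustration, with made-up vectors): with an orthonormal basis the earlier coefficients survive when a basis vector is added, while with a nonorthogonal basis the best approximation must be recomputed by least squares.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

# Orthonormal basis of R^3: coefficients alpha_i = <x_i, y> never change.
E = np.eye(3)
a2 = E[:, :2].T @ y          # best approximation on span{x1, x2}
a3 = E[:, :3].T @ y          # add x3: the first two coefficients are unchanged
assert np.allclose(a2, a3[:2])

# Nonorthogonal basis: best approximation is a least-squares problem,
# and adding a basis vector changes the earlier coefficients.
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
b2, *_ = np.linalg.lstsq(X[:, :2], y, rcond=None)
b3, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)
assert not np.allclose(b2, b3[:2])   # the whole approximation was recomputed
```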
(b) There exist strictly positive constants A, B, Ã, B̃ such that, for all y in E,

A ‖y‖² ≤ Σₖ |⟨xₖ, y⟩|² ≤ B ‖y‖², (2.2.8)

Ã ‖y‖² ≤ Σₖ |⟨x̃ₖ, y⟩|² ≤ B̃ ‖y‖². (2.2.9)
Compare these inequalities with (2.2.5) in the orthonormal case. Bases which satisfy (2.2.8) or (2.2.9) are called Riesz bases [73]. Then, the signal expansion formula becomes

y = Σₖ ⟨xₖ, y⟩ x̃ₖ = Σₖ ⟨x̃ₖ, y⟩ xₖ. (2.2.10)
It is clear why the term biorthogonal is used, since to the (nonorthogonal) basis
{xi } corresponds a dual basis {x̃i } which satisfies the biorthogonality constraint
(2.2.7). If the basis {xi } is orthogonal, then it is its own dual, and the expansion
formula (2.2.10) becomes the usual orthogonal expansion given by (2.2.3–2.2.4).
Equivalences similar to Theorem 2.4 hold in the biorthogonal case as well, and we give Parseval's relations, which become

‖y‖² = Σᵢ ⟨xᵢ, y⟩* ⟨x̃ᵢ, y⟩, (2.2.11)

and

⟨y₁, y₂⟩ = Σᵢ ⟨xᵢ, y₁⟩* ⟨x̃ᵢ, y₂⟩ (2.2.12)
        = Σᵢ ⟨x̃ᵢ, y₁⟩* ⟨xᵢ, y₂⟩. (2.2.13)
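As a numerical sketch (our illustration, with a made-up basis in R²), the dual basis can be computed as the inverse transpose of the basis matrix, after which the expansion formula (2.2.10) and the biorthogonal Parseval relation can be checked directly:

```python
import numpy as np

# Nonorthogonal basis of R^2 as columns of X; the dual basis {x~_i}
# satisfying <x_i, x~_j> = delta[i-j] is given by the columns of (X^{-1})^T.
X = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Xd = np.linalg.inv(X).T

assert np.allclose(Xd.T @ X, np.eye(2))    # biorthogonality constraint

y = np.array([3.0, -1.0])
# Expansion formula: y = sum_k <x~_k, y> x_k = sum_k <x_k, y> x~_k
y1 = X @ (Xd.T @ y)
y2 = Xd @ (X.T @ y)
assert np.allclose(y1, y) and np.allclose(y2, y)

# Biorthogonal Parseval: ||y||^2 = sum_i <x_i, y> <x~_i, y> (real case)
energy = np.dot(X.T @ y, Xd.T @ y)
assert np.isclose(energy, np.dot(y, y))
```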
A and B are called frame bounds, and when they are equal, we call the frame tight. In a tight frame we have

Σₖ |⟨xₖ, y⟩|² = A ‖y‖²,
While this last equation resembles the expansion formula in the case of an or-
thonormal basis, a frame does not constitute an orthonormal basis in general. In
particular, the vectors may be linearly dependent and thus not form a basis. If all
the vectors in a tight frame have unit norm, then the constant A gives the redun-
dancy ratio (for example, A = 2 means there are twice as many vectors as needed
to cover the space). Note that if A = B = 1 and ‖xₖ‖ = 1 for all k, then {xₖ} constitutes an orthonormal basis.
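A standard concrete example (our numerical sketch, not from the text) is the tight frame of three unit vectors at 120° in R², for which A = 3/2, matching the redundancy ratio of three vectors for a two-dimensional space:

```python
import numpy as np

# Three unit vectors at 120 degrees: a tight frame for R^2 with A = 3/2.
angles = 2 * np.pi * np.arange(3) / 3
frame = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # rows are x_k

y = np.array([0.7, -2.0])
coeffs = frame @ y                       # <x_k, y>

A = 1.5
# sum_k |<x_k, y>|^2 = A ||y||^2
assert np.isclose(np.sum(coeffs**2), A * np.dot(y, y))

# Reconstruction from the redundant coefficients: y = (1/A) sum_k <x_k, y> x_k
y_rec = frame.T @ coeffs / A
assert np.allclose(y_rec, y)
```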
Because of the linear dependence which exists among the vectors used in the expansion, the expansion is not unique anymore. Consider the set {x₁, x₂, . . .} where Σᵢ βᵢ xᵢ = 0 (where not all βᵢ's are zero) because of linear dependence. If y can be written as

y = Σᵢ αᵢ xᵢ, (2.2.15)

then one can add βᵢ to each αᵢ without changing the validity of the expansion
(2.2.15). The expansion (2.2.14) is unique in the sense that it minimizes the norm
of the expansion among all valid expansions. Similarly, for general frames, there
exists a unique dual frame which is discussed in Section 5.3.2 (in the tight frame
case, the frame and its dual are equal).
This concludes for now our brief introduction of signal expansions. Later, more
specific expansions will be discussed, such as Fourier and wavelet expansions. The
fundamental properties seen above will reappear in more specialized forms (for
example, Parseval’s equality).
While we have only discussed Hilbert spaces, there are of course many other
spaces of functions which are of interest. For example, Lp (R) spaces are those
containing functions f for which |f|ᵖ is integrable [113]. The norm on these spaces is defined as

‖f‖ₚ = ( ∫_{−∞}^{∞} |f(t)|ᵖ dt )^{1/p}, (2.2.16)
reference texts exist on the subject, see [106, 280]. Good reviews can also be found
in [150] and [308]. We give only a brief account here, focusing on basic concepts
and topics which are needed later, such as polynomial matrices.
Note that the matrix product is not commutative in general, that is, AB ≠ BA. It can be shown that (AB)ᵀ = BᵀAᵀ.
The inner product of two (column) vectors from Rⁿ is ⟨v₁, v₂⟩ = v₁ᵀ v₂, and if the vectors are from Cⁿ, then ⟨v₁, v₂⟩ = v₁* v₂. The outer product of two vectors from Rⁿ and Rᵐ is an n × m matrix given by v₁ v₂ᵀ.
To define the notion of a determinant, we first need to define a minor. A minor M_{ij} is a submatrix of the matrix A obtained by deleting its ith row and jth column. More generally, a minor can be any submatrix of the matrix A obtained by deleting some of its rows and columns. Then the determinant of an n × n matrix can be defined recursively as

det(A) = Σ_{i=1}^{n} A_{ij} (−1)^{i+j} det(M_{ij}),

where j is fixed and belongs to {1, . . . , n}. The cofactor C_{ij} is (−1)^{i+j} det(M_{ij}).
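The recursion above translates directly into code. A minimal sketch (our illustration; the helper name `det_cofactor` is ours), expanding along a fixed column j as in the formula:

```python
import numpy as np

def det_cofactor(A, j=0):
    """Determinant by cofactor expansion along (fixed) column j."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        # Minor M_ij: delete row i and column j.
        M = np.delete(np.delete(A, i, axis=0), j, axis=1)
        total += A[i, j] * (-1) ** (i + j) * det_cofactor(M)
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
```

For this matrix the expansion gives det(A) = 8, in agreement with `np.linalg.det`.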
A square matrix is said to be singular if det(A) = 0. The product of two square matrices is nonsingular if and only if both matrices are nonsingular. Some properties of interest
include the following:
(Aα)T (y − Ax̂) = 0
or
AT Ax̂ = AT y,
which are called the normal equations of the least-squares problem. If the columns
of A are linearly independent, then AT A is invertible. The unique least-squares
solution is
x̂ = (AᵀA)⁻¹ Aᵀ y (2.3.4)

(recall that A is either rectangular or rank deficient, and does not have a proper inverse) and the orthogonal projection ŷ is equal to

ŷ = A x̂ = A (AᵀA)⁻¹ Aᵀ y.
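A compact numerical check of the normal equations (our sketch, with a made-up overdetermined system):

```python
import numpy as np

# Overdetermined system: A is 4x2 with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.0, 1.0, 1.0, 3.0])

# Solve the normal equations A^T A x = A^T y.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
y_hat = A @ x_hat                    # orthogonal projection of y onto range(A)

# The residual is orthogonal to the columns of A ...
assert np.allclose(A.T @ (y - y_hat), 0)
# ... and the solution agrees with numpy's least-squares solver.
assert np.allclose(x_hat, np.linalg.lstsq(A, y, rcond=None)[0])
```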
Ap = λp,
A = U ΛU ∗ .
This result constitutes the spectral theorem for hermitian matrices. Hermitian
symmetric matrices commute with their hermitian transpose. More generally, a
matrix N that commutes with its hermitian transpose is called normal, that is, it
satisfies N ∗ N = N N ∗ . Normal matrices are exactly those that have a complete
set of orthogonal eigenvectors.
The importance of eigenvectors in the study of linear operators comes from the following fact: assuming a full set of eigenvectors, a vector x can be written as a linear combination of eigenvectors, x = Σᵢ αᵢ vᵢ. Then,

A x = A Σᵢ αᵢ vᵢ = Σᵢ αᵢ (A vᵢ) = Σᵢ αᵢ λᵢ vᵢ.
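A numerical sketch of the spectral theorem and of applying A through its eigenbasis (our illustration, with a made-up hermitian matrix):

```python
import numpy as np

# A hermitian matrix and its eigendecomposition A = U Lambda U*.
A = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])
lam, U = np.linalg.eigh(A)
assert np.allclose(U @ np.diag(lam) @ U.conj().T, A)
assert np.allclose(U.conj().T @ U, np.eye(2))     # complete orthonormal set

# Apply A through the eigenbasis: x = sum alpha_i v_i gives
# A x = sum alpha_i lambda_i v_i.
x = np.array([1.0, -2.0j])
alpha = U.conj().T @ x                 # expansion coefficients
assert np.allclose(U @ (lam * alpha), A @ x)
```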
‖U x‖ = ‖x‖, ∀ x ∈ Cⁿ,

as well as

⟨U x, U y⟩ = ⟨x, y⟩, ∀ x, y ∈ Cⁿ,
Sometimes, the elements ti are matrices themselves, in which case the matrix is
called block Toeplitz. Another important matrix is the DFT (Discrete Fourier
Transform) matrix. The (i, k)th element of the DFT matrix of size n × n is W_n^{ik} = e^{−j2πik/n}. The DFT matrix diagonalizes circulant matrices, that is, its
columns and rows are the eigenvectors of circulant matrices (see Section 2.4.8 and
Problem 2.18).
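This diagonalization is easy to verify numerically (our sketch, with a made-up first column c): conjugating a circulant matrix by the DFT matrix yields a diagonal matrix whose entries are the DFT of c.

```python
import numpy as np

n = 4
c = np.array([1.0, 2.0, 0.0, -1.0])
# Circulant matrix: each column is a circular shift of c.
C = np.column_stack([np.roll(c, k) for k in range(n)])

# DFT matrix W[i, k] = exp(-j 2 pi i k / n).
i, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
W = np.exp(-2j * np.pi * i * k / n)

# W C W^{-1} is diagonal, with the DFT of c on the diagonal.
D = W @ C @ np.linalg.inv(W)
assert np.allclose(D, np.diag(np.fft.fft(c)), atol=1e-10)
```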
A real symmetric matrix A is called positive definite if all its eigenvalues are
greater than 0. Equivalently, for all nonzero vectors x, the following is satisfied:
xT Ax > 0.
Finally, for a positive definite matrix A, there exists a nonsingular matrix W such that

A = Wᵀ W,

where W is intuitively a "square root" of A. One possible way to choose such a square root is to diagonalize A as A = Q Λ Qᵀ and then, since all the eigenvalues are positive, choose W = √Λ Qᵀ (the square root is applied to each eigenvalue in the diagonal matrix Λ). The above discussion carries over to hermitian symmetric matrices by using hermitian transposes.
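The construction can be checked in a few lines (our sketch, with a made-up positive definite matrix):

```python
import numpy as np

# A positive definite matrix (built here as B^T B from a nonsingular B).
B = np.array([[2.0, 1.0],
              [0.0, 1.0]])
A = B.T @ B

lam, Q = np.linalg.eigh(A)           # A = Q Lambda Q^T, eigenvalues > 0
assert np.all(lam > 0)
W = np.diag(np.sqrt(lam)) @ Q.T      # W = sqrt(Lambda) Q^T
assert np.allclose(W.T @ W, A)       # A = W^T W
```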
that is, it can be written either as a matrix containing polynomials as its entries,
or a polynomial having matrices as its coefficients.
The question of the rank in polynomial matrices is more subtle. For example,
the matrix
a + bx 3(a + bx)
,
c + dx λ(c + dx)
with λ = 3, always has rank less than 2, since the two columns are proportional
to each other. On the other hand, if λ = 2, then the matrix would have the rank
2.4. FOURIER THEORY AND SAMPLING 37
less than 2 only if x = −a/b or x = −c/d. This leads to the notion of normal rank.
First, note that H(x) is nonsingular only if det(H(x)) is different from 0 for some
x. Then, the normal rank of H(x) is the largest of the orders of minors that have
a determinant not identically zero. In the above example, for λ = 3, the normal
rank is 1, while for λ = 2, the normal rank is 2.
An important class of polynomial matrices are unimodular matrices, whose determinant is not a function of x. An example is the following matrix:

    H(x) = [ 1 + x     x   ]
           [ 2 + x   1 + x ] ,
whose determinant is equal to 1. There are several useful properties pertaining
to unimodular matrices. For example, the product of two unimodular matrices
is again unimodular. The inverse of a unimodular matrix is unimodular as well.
Also, one can prove that a polynomial matrix H(x) is unimodular, if and only if
its inverse is a polynomial matrix. All these facts can be proven using properties
of determinants (see, for example, [308]).
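The example above can be checked with polynomial arithmetic (our sketch, representing each entry of H(x) as a coefficient array): the determinant collapses to the constant 1, so H(x) is unimodular and its adjugate gives a polynomial inverse.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# H(x) = [[1+x, x], [2+x, 1+x]] as coefficient arrays (constant term first).
h11, h12 = np.array([1.0, 1.0]), np.array([0.0, 1.0])
h21, h22 = np.array([2.0, 1.0]), np.array([1.0, 1.0])

# det H(x) = (1+x)(1+x) - x(2+x): a constant, so H is unimodular.
det = P.polysub(P.polymul(h11, h22), P.polymul(h12, h21))
det = np.trim_zeros(det, 'b')        # drop trailing zero coefficients
assert np.allclose(det, [1.0])

# Since det = 1, H^{-1}(x) is simply the adjugate,
# [[1+x, -x], [-(2+x), 1+x]]: again a polynomial matrix.
```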
The extension of the concept of unitary matrices to polynomial matrices leads
to paraunitary matrices [308] as studied in circuit theory. In fact, these matrices
are unitary on the unit circle or the imaginary axis, depending if they correspond
to discrete-time or continuous-time linear operators (z-transforms or Laplace trans-
forms). Consider the discrete-time case and x = ejω . Then, a square matrix U (x)
is unitary on the unit circle if
[U (ejω )]∗ U (ejω ) = U (ejω )[U (ejω )]∗ = I.
Extending this beyond the unit circle leads to
[U (x−1 )]T U (x) = U (x)[U (x−1 )]T = I, (2.3.7)
since (ejω )∗ = e−jω . If the coefficients of the polynomials are complex, the coeffi-
cients need to be conjugated in (2.3.7), which is usually written [U ∗ (x−1 )]T . This
will be studied in Chapter 3.
As a generalization of polynomial matrices, one can consider the case of rational
matrices. In that case, each entry is a ratio of two polynomials. As will be shown
in Chapter 3, polynomial matrices in z correspond to finite impulse response (FIR)
discrete-time filters, while rational matrices can be associated with infinite impulse
response (IIR) filters. Unimodular and unitary matrices can be defined in the
rational case, as in the polynomial case.
(a) The continuous-time Fourier transform (CTFT), often simply called the Fourier
transform.
In all the Fourier cases, {ψ} = {ψ̃}. The above transforms and series will be
discussed in this section. Later, more general expansions will be introduced, in par-
ticular, series expansions of discrete-time signals using filter banks in Chapter 3,
series expansions of continuous-time signals using wavelets in Chapter 4, and in-
tegral expansions of continuous-time signals using wavelets and short-time Fourier
bases in Chapter 5.
which is called the Fourier analysis formula. The inverse Fourier transform is given by

f(t) = (1/2π) ∫_{−∞}^{∞} F(ω) e^{jωt} dω, (2.4.2)
or, the Fourier synthesis formula. Note that e^{jωt} is not in L²(R), and that the set {e^{jωt}} is not countable. The exact conditions under which (2.4.2) is the inverse
of (2.4.1) depend on the behavior of f (t) and are discussed in standard texts on
Fourier theory [46, 326]. For example, the inversion is exact if f (t) is continuous
(or if f (t) is defined as (f (t+ ) + f (t− ))/2 at a point of discontinuity).6
When f(t) is square-integrable, the formulas above hold in the L² sense (see Appendix 2.C); that is, calling f̂(t) the result of the analysis followed by the synthesis formula,

‖f(t) − f̂(t)‖ = 0.
Assuming that the Fourier transform and its inverse exist, we will denote by
f (t) ←→ F (ω)
⁶We assume that f(t) is of bounded variation. That is, for f(t) defined on a closed interval [a, b], there exists a constant A such that Σ_{n=1}^{N} |f(tₙ) − f(tₙ₋₁)| < A for any finite set {tᵢ} satisfying a ≤ t₀ < t₁ < . . . < t_N ≤ b. Roughly speaking, the graph of f(t) cannot oscillate over an infinite distance as t goes over a finite interval.
Linearity Since the Fourier transform is an inner product (see (2.4.1)), it follows
immediately from the linearity of the inner product that
which indicates the essential symmetry of the Fourier analysis and synthesis formu-
las.
e^{jω₀t} f(t) ←→ F(ω − ω₀).
(−jt)ⁿ f(t) ←→ ∂ⁿF(ω)/∂ωⁿ.
and is denoted h(t) = f(t) ∗ g(t) = g(t) ∗ f(t), since (2.4.9) is symmetric in f(t) and g(t). Denoting by F(ω) and G(ω) the Fourier transforms of f(t) and g(t), respectively, the convolution theorem states that

h(t) = f(t) ∗ g(t) ←→ F(ω) G(ω).

This result is fundamental, and we will prove it for f(t) and g(t) being in L¹(R).
Taking the Fourier transform of f(t) ∗ g(t),

∫_{−∞}^{∞} ( ∫_{−∞}^{∞} f(τ) g(t − τ) dτ ) e^{−jωt} dt,

changing the order of integration (which is allowed when f(t) and g(t) are in L¹(R); see Fubini's theorem in [73, 250]) and using the shift property, we get

∫_{−∞}^{∞} f(τ) ( ∫_{−∞}^{∞} g(t − τ) e^{−jωt} dt ) dτ = ∫_{−∞}^{∞} f(τ) e^{−jωτ} G(ω) dτ = F(ω) G(ω).
The result holds as well when f (t) and g(t) are square-integrable, but requires a
different proof [108].
An alternative view of the convolution theorem is to identify the complex exponentials e^{jωt} as the eigenfunctions of the convolution operator, since

∫_{−∞}^{∞} e^{jω(t−τ)} g(τ) dτ = e^{jωt} ∫_{−∞}^{∞} e^{−jωτ} g(τ) dτ = e^{jωt} G(ω).

The associated eigenvalue G(ω) is simply the Fourier transform of the impulse response g(τ) at frequency ω.
that is,

h′(t) = f′(t) ∗ g(t) = f(t) ∗ g′(t).
This is useful when convolving a signal with a filter which is known to be the
derivative of a given function such as a Gaussian, since one can think of the result
as being the convolution of the derivative of the signal with a Gaussian.
Note that the factor 1/2π comes from our definition of the Fourier transform (2.4.1–2.4.2). A symmetric definition, with a factor 1/√(2π) in both the analysis and synthesis formulas (see, for example, [73]), would remove the scale factor in (2.4.12).
The proof of (2.4.11) uses the fact that

f*(t) ←→ F*(−ω)

and the frequency-domain convolution relation (2.4.10). That is, since f*(t) · g(t) has Fourier transform (1/2π)(F*(−ω) ∗ G(ω)), we have

∫_{−∞}^{∞} f*(t) g(t) e^{−jωt} dt = (1/2π) ∫_{−∞}^{∞} F*(−Ω) G(ω − Ω) dΩ,
f(t + T) = f(t),

with

F[k] = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−jkω₀t} dt. (2.4.14)
⁷Again, we consider nonpathological functions (that is, of bounded variation).
That the set {ϕk } is complete is shown in [326] and means that there exists no
periodic function f (t) with L2 norm greater than zero that has all its Fourier series
coefficients equal to zero. Actually, there is equivalence between norms, as shown
below.
Best Approximation Property While the following result is true in a more gen-
eral setting (see Section 2.2.3), it is sufficiently important to be restated for Fourier
series, namely

‖f(t) − Σ_{k=−N}^{N} ⟨ϕₖ, f⟩ ϕₖ(t)‖ ≤ ‖f(t) − Σ_{k=−N}^{N} aₖ ϕₖ(t)‖,
where {ak } is an arbitrary set of coefficients. That is, the Fourier series coefficients
are the best ones for an approximation in the span of {ϕk (t)}, k = −N, . . . , N .
Moreover, if N is increased, new coefficients are added without affecting the previous
ones.
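A numerical sketch of the best approximation property (our illustration, using the known Fourier series of f(t) = t on [−π, π)): perturbing any Fourier coefficient can only increase the L² approximation error.

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f = t
N = 5

def l2(err):
    # discrete approximation of the L2 norm over one period
    return np.sqrt(np.mean(np.abs(err) ** 2) * 2 * np.pi)

# Fourier series of t on [-pi, pi): t = sum_k 2 (-1)^{k+1} sin(k t) / k
approx = np.zeros_like(t)
for k in range(1, N + 1):
    ck = 2 * (-1) ** (k + 1) / k
    approx += ck * np.sin(k * t)

err_fourier = l2(f - approx)
err_other = l2(f - (approx + 0.1 * np.sin(t)))   # perturb the k = 1 coefficient
assert err_fourier < err_other
```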
Fourier series, beside their obvious use for characterizing periodic signals, are
useful for problems of finite size through periodization. The immediate concern,
however, is the introduction of a discontinuity at the boundary, since periodization
of a continuous signal on an interval results, in general, in a discontinuous periodic
signal.
Fourier series can be related to the Fourier transform seen earlier by using
sequences of Dirac functions which are also used in sampling. We will turn our
attention to these functions next.
then δ(t) = lim_{ε→0} δ_ε(t). More generally, one can use any smooth function ψ(t) with integral 1 and define [278]

δ(t) = lim_{ε→0} (1/ε) ψ(t/ε).
Any operation involving a Dirac function requires a limiting operation. Since we are
reviewing standard results, and for notational convenience, we will skip the limiting
process. However, let us emphasize that Dirac functions have to be handled with
care in order to get meaningful results. When in doubt, it is best to go back to the
definition and the limiting process. For details see, for example, [215]. It follows
from (2.4.15) that

∫_{−∞}^{∞} δ(t) dt = 1, (2.4.16)

as well as⁸

∫_{−∞}^{∞} f(t + t₀) δ(t) dt = ∫_{−∞}^{∞} f(t) δ(t − t₀) dt = f(t₀). (2.4.17)
One more standard relation useful for the Dirac function is [215]

f(t) ∗ δ(t − t₀) = f(t − t₀). (2.4.18)
The Fourier transform of δ(t − t₀) is, from (2.4.1) and (2.4.17), equal to

δ(t − t₀) ←→ e^{−jωt₀}.

Using the symmetry property (2.4.3) and the previous results, we see that

e^{jω₀t} ←→ 2π δ(ω − ω₀).

According to the above and using the modulation theorem (2.4.10), f(t) e^{jω₀t} has Fourier transform F(ω − ω₀).
Next, we introduce the train of Dirac functions spaced T > 0 apart, denoted s_T(t) and given by

s_T(t) = Σ_{n=−∞}^{∞} δ(t − nT). (2.4.20)
Before getting its Fourier transform, we derive the Poisson sum formula. Note that, given a function f(t) and using (2.4.18),

∫_{−∞}^{∞} f(τ) s_T(t − τ) dτ = Σ_{n=−∞}^{∞} f(t − nT). (2.4.21)
Call the above T -periodic function f0 (t). Further assume that f (t) is sufficiently
smooth and decaying rapidly such that the above series converges uniformly to
f₀(t). We can then expand f₀(t) into a uniformly convergent Fourier series

f₀(t) = Σ_{k=−∞}^{∞} [ (1/T) ∫_{−T/2}^{T/2} f₀(τ) e^{−j2πkτ/T} dτ ] e^{j2πkt/T}.
Consider the Fourier series coefficient in the above formula, using the expression for f₀(t) in (2.4.21):

∫_{−T/2}^{T/2} f₀(τ) e^{−j2πkτ/T} dτ = Σ_{n=−∞}^{∞} ∫_{(2n−1)T/2}^{(2n+1)T/2} f(τ) e^{−j2πkτ/T} dτ = F(2πk/T).
One can use the Poisson formula to derive the Fourier transform of the impulse train s_T(t) in (2.4.20). It can be shown that

S_T(ω) = (2π/T) Σ_{k=−∞}^{∞} δ(ω − 2πk/T). (2.4.23)
We have explained that sampling the spectrum and periodizing the time-domain
function are equivalent. We will see the dual situation, when sampling the time-
domain function leads to a periodized spectrum. This is also an immediate appli-
cation of the Poisson formula.
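A numerical sanity check of the Poisson sum formula (our sketch, using the Gaussian f(t) = e^{−t²/2}, whose Fourier transform is F(ω) = √(2π) e^{−ω²/2}, and T = 1):

```python
import numpy as np

# Poisson sum formula with T = 1: sum_n f(n) = sum_k F(2 pi k).
n = np.arange(-20, 21)
time_side = np.sum(np.exp(-n**2 / 2))

k = np.arange(-20, 21)
freq_side = np.sum(np.sqrt(2 * np.pi) * np.exp(-(2 * np.pi * k) ** 2 / 2))

# Both sides converge extremely fast; the truncation error is negligible.
assert np.isclose(time_side, freq_side, rtol=1e-10)
```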
2.4.5 Sampling
The process of sampling is central to discrete-time signal processing, since it pro-
vides the link with the continuous-time domain. Call fT (t) the sampled version of
f(t), obtained as

f_T(t) = f(t) s_T(t) = Σ_{n=−∞}^{∞} f(nT) δ(t − nT). (2.4.24)
Using the modulation theorem of the Fourier transform (2.4.10) and the transform of s_T(t) given in (2.4.23), we get

F_T(ω) = (1/T) F(ω) ∗ Σ_{k=−∞}^{∞} δ(ω − k 2π/T) = (1/T) Σ_{k=−∞}^{∞} F(ω − k 2π/T), (2.4.25)
where we used (2.4.18). Thus, FT (ω) is periodic with period 2π/T , and is obtained
by overlapping copies of F (ω) at every multiple of 2π/T . Another way to prove
(2.4.25) is to use the Poisson formula. Taking the Fourier transform of (2.4.24) results in

F_T(ω) = Σ_{n=−∞}^{∞} f(nT) e^{−jnTω},
since f_T(t) is a weighted sequence of Dirac functions with weights f(nT) and shifts of nT. To use the Poisson formula, consider the function g_Ω(t) = f(t) e^{−jtΩ}, which has Fourier transform G_Ω(ω) = F(ω + Ω) according to (2.4.19). Now, applying
(2.4.22) to g_Ω(t), we find

Σ_{n=−∞}^{∞} g_Ω(nT) = (1/T) Σ_{k=−∞}^{∞} G_Ω(2πk/T),
where

sinc_T(t) = sin(πt/T) / (πt/T).
Note that sincT (nT ) = δ[n], that is, it has the interpolation property since it is 1
at the origin but 0 at nonzero multiples of T . It follows immediately that (2.4.27)
holds at the sampling instants t = nT .
P ROOF
The proof that (2.4.27) is valid for all t goes as follows: Consider the sampled version of
f (t), fT (t), consisting of weighted Dirac functions (2.4.24). We showed that its Fourier
transform is given by (2.4.25). The sampling frequency ωs equals 2ωm, where ωm is the bandlimiting frequency of F(ω). Thus, F(ω − kωs) and F(ω − lωs) do not overlap for k ≠ l.
To recover F (ω), it suffices to keep the term with k = 0 in (2.4.25) and normalize it by
T . This is accomplished with a function that has a Fourier transform which is equal to T
from −ωm to ωm and 0 elsewhere. This is called an ideal lowpass filter. Its time-domain
impulse response, denoted sinc_T(t) where T = π/ωm, is equal to (taking the inverse Fourier transform)

sinc_T(t) = (1/2π) ∫_{−ωm}^{ωm} T e^{jωt} dω = (T/2πjt) ( e^{jπt/T} − e^{−jπt/T} ) = sin(πt/T) / (πt/T). (2.4.28)
⁹We will say that a function f(t) is bandlimited to ωm if its Fourier transform F(ω) = 0 for |ω| ≥ ωm.
Convolving f_T(t) with sinc_T(t) filters out the repeated spectra (terms with k ≠ 0 in (2.4.25)) and recovers f(t), as is clear in the frequency domain. Because f_T(t) is a sequence of Dirac functions of weights f(nT), the convolution results in a weighted sum of shifted impulse responses,

( Σ_{n=−∞}^{∞} f(nT) δ(t − nT) ) ∗ sinc_T(t) = Σ_{n=−∞}^{∞} f(nT) sinc_T(t − nT),

proving (2.4.27)
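The interpolation formula can be checked numerically (our sketch, using f(t) = sinc(t), which is bandlimited, so sampling with T = 0.5 is above the Nyquist rate; the series is truncated, so we only expect agreement to a modest tolerance):

```python
import numpy as np

# numpy's sinc is sin(pi t)/(pi t), bandlimited to pi; T = 0.5 gives
# sampling frequency 4 pi > 2 pi, so reconstruction is exact in principle.
T = 0.5
n = np.arange(-2000, 2001)
samples = np.sinc(n * T)

def reconstruct(t):
    # f(t) = sum_n f(nT) sinc_T(t - nT), truncated to |n| <= 2000
    return np.sum(samples * np.sinc((t - n * T) / T))

for t in [0.3, 1.234, -2.5]:
    assert abs(reconstruct(t) - np.sinc(t)) < 1e-3
```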
Now, assume a bandlimited signal f(t) and consider the inner product ⟨ϕ_{n,T}, f⟩. Again using Parseval's relation,

⟨ϕ_{n,T}, f⟩ = (√T/2π) ∫_{−ωm}^{ωm} e^{jωnT} F(ω) dω = √T f(nT),
(the only change is that we normalized the sinc basis functions to have unit norm).
What happens if f (t) is not bandlimited? Because {ϕn,T } is an orthogonal set,
the interpolation formula (2.4.30) represents the orthogonal projection of the input
signal onto the subspace of bandlimited signals. Another way to write the inner
product in (2.4.30) is

⟨ϕ_{n,T}, f⟩ = ∫_{−∞}^{∞} ϕ_{0,T}(τ − nT) f(τ) dτ = ϕ_{0,T}(−t) ∗ f(t) |_{t=nT},
which equals ϕ0,T (t)∗f (t) since ϕ0,T (t) is real and symmetric in t. That is, the inner
products, or coefficients, in the interpolation formula are simply the outputs of an
ideal lowpass filter with cutoff π/T sampled at multiples of T . This is the usual
view of the sampling theorem as a bandlimiting convolution followed by sampling
and reinterpolation.
To conclude this section, we will demonstrate a fact that will be used in Chapter 4. It states that the following can be seen as a Fourier transform pair:

⟨f(t), f(t + n)⟩ = δ[n] ←→ Σ_{k∈Z} |F(ω + 2kπ)|² = 1. (2.4.31)
The left side of the equation is simply the deterministic autocorrelation¹⁰ of f(t) evaluated at integers, that is, the sampled autocorrelation. If we denote the autocorrelation of f(t) as p(τ) = ⟨f(t), f(t + τ)⟩, then the left side of (2.4.31) is p₁(τ) = p(τ) s₁(τ), where s₁(τ) is as defined in (2.4.20) with T = 1. The Fourier transform of p₁(τ) is (apply (2.4.25))

P₁(ω) = Σ_{k∈Z} P(ω − 2kπ).

Since the Fourier transform of p(t) is P(ω) = |F(ω)|², we get that the Fourier transform of the left side of (2.4.31) is the right side of (2.4.31).
Comparing (2.4.32–2.4.33) with the equivalent expressions for Fourier series (2.4.13–
2.4.14), one can see that they are duals of each other (within scale factors). Fur-
thermore, if the sequence f [n] is obtained by sampling a continuous-time function
f (t) at instants nT ,
f [n] = f (nT ), (2.4.34)
then the discrete-time Fourier transform is related to the Fourier transform of f (t).
Denoting the latter by F_c(ω), the Fourier transform of its sampled version is equal to (see (2.4.26))

F_T(ω) = Σ_{n=−∞}^{∞} f(nT) e^{−jnTω} = (1/T) Σ_{k=−∞}^{∞} F_c(ω − k 2π/T). (2.4.35)
Because of these close relationships with the Fourier transform and Fourier series,
it follows that all properties seen earlier carry over and we will only repeat two of
the most important ones (for others, see [211]).
Convolution Given two sequences f [n] and g[n] and their discrete-time Fourier
transforms F (ejω ) and G(ejω ), then
f[n] ∗ g[n] = Σ_{l=−∞}^{∞} f[n − l] g[l] = Σ_{l=−∞}^{∞} f[l] g[n − l] ←→ F(e^{jω}) G(e^{jω}).
F[k] = Σ_{n=0}^{N−1} f[n] W_N^{nk}, k ∈ Z, (2.4.38)

f[n] = (1/N) Σ_{k=0}^{N−1} F[k] W_N^{−nk}, n ∈ Z, (2.4.39)
f[n] ∗ g[n] = Σ_{l=0}^{N−1} f[n − l] g[l] = Σ_{l=0}^{N−1} f₀[(n − l) mod N] g₀[l], (2.4.40)
where f0 [·] and g0 [·] are equal to one period of f [·] and g[·] respectively. That is,
f0 [n] = f [n], n = 0, . . . , N − 1, and 0 otherwise, and similarly for g0 [n]. Then, the
convolution property is given by

f[n] ∗ g[n] ←→ F[k] G[k],

and Parseval's relation becomes

Σ_{n=0}^{N−1} f*[n] g[n] = (1/N) Σ_{k=0}^{N−1} F*[k] G[k].
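Both relations are easy to verify numerically (our sketch, with made-up length-8 sequences and the FFT as the DFT):

```python
import numpy as np

N = 8
f = np.arange(N, dtype=float)
g = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5])

# Circular convolution (2.4.40) computed directly...
conv = np.array([sum(f[(m - l) % N] * g[l] for l in range(N)) for m in range(N)])
# ...and through the DFT: convolution maps to the product F[k] G[k].
conv_dft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
assert np.allclose(conv, conv_dft)

# Parseval: sum_n conj(f[n]) g[n] = (1/N) sum_k conj(F[k]) G[k]
lhs = np.sum(np.conj(f) * g)
rhs = np.sum(np.conj(np.fft.fft(f)) * np.fft.fft(g)) / N
assert np.isclose(lhs, rhs)
```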
Just as the Fourier series coefficients were related to the Fourier transform of one
period (see (2.4.14)), the coefficients of the discrete-time Fourier series can be ob-
tained from the discrete-time Fourier transform of one period. If we call F0 (ejω )
the discrete-time Fourier transform of f0 [n], (2.4.32) and (2.4.38) imply that
F₀(e^{jω}) = Σ_{n=−∞}^{∞} f₀[n] e^{−jωn} = Σ_{n=0}^{N−1} f[n] e^{−jωn},

leading to

F[k] = F₀(e^{jω}) |_{ω=k2π/N}.
The sampling of F0 (ejω ) simply repeats copies of f0 [n] at integer multiples of N ,
and thus we have
f[n] = Σ_{l=−∞}^{∞} f₀[n − lN] = (1/N) Σ_{k=0}^{N−1} F[k] e^{jnk2π/N} = (1/N) Σ_{k=0}^{N−1} F₀(e^{jk2π/N}) e^{jnk2π/N}, (2.4.42)
which is the discrete-time version of the Poisson sum formula. It actually holds
for f0 [·] with support larger than 0, . . . , N − 1, as long as the first sum in (2.4.42)
converges. For n = 0, (2.4.42) yields
Σ_{l=−∞}^{∞} f₀[lN] = (1/N) Σ_{k=0}^{N−1} F₀(e^{jk2π/N}).
F[k] = Σ_{n=0}^{N−1} f[n] W_N^{nk}, (2.4.43)
where W_N = e^{−j2π/N}. These are the same formulas as (2.4.38–2.4.39), except that f[n] and F[k] are defined only for n, k ∈ {0, . . . , N − 1}. Recall that the discrete-time
F_{n,k} = W_N^{nk}, n, k = 0, . . . , N − 1,

f ∗_p g = C g = F⁻¹ Λ F g,

f̂ = F f,
where we used (2.4.45), that is, the fact that F ∗ is the inverse of F up to a scale
factor of N .
Other properties of the DFT follow from their counterparts for the discrete-time
Fourier transform, bearing in mind the underlying circular structure implied by the
discrete-time Fourier series (for example, a shift is a circular shift).
[Figure 2.3: Time- and frequency-domain sketches of the four transforms: (a) Fourier transform (continuous in both domains), (b) Fourier series (period T in time, spectral lines spaced 2π/T), (c) discrete-time Fourier transform (samples spaced 2π/ωs in time, spectrum periodic with period ωs), (d) discrete-time Fourier series (period N in both domains).]
Between the Fourier transform, where both time and frequency variables are con-
tinuous, and the discrete-time Fourier series (DTFS), where both variables are
discrete, there are a number of intermediate cases.
First, in Table 2.1 and Figure 2.3, we compare the Fourier transform, Fourier
[Figure 2.4: The same transforms when the signal is restricted: (a) bandlimited signal sampled with period 2π/ωs, (b) periodic signal with period T and line spectrum spaced 2π/T, (c) sampled and periodized signal, (d) finite-length discrete signal of length N with spectral lines spaced 2π/N; see Table 2.2.]
series, discrete-time Fourier transform and discrete-time Fourier series. The table
shows four combinations of continuous versus discrete variables in time and fre-
quency. As defined in Section 2.4.1, we use a short-hand CT or DT for continuous-
versus discrete-time variable, and we call it a Fourier transform or series if the
synthesis formula involves an integral or a summation.
Then, in Table 2.2 and Figure 2.4, we consider the same transforms but when
Table 2.1 Fourier transforms with various combinations of continuous/discrete time and fre-
quency variables. CT and DT stand for continuous and discrete time, while FT and FS stand
for Fourier transform (integral synthesis) and Fourier series (summation synthesis). P stands
for a periodic signal. The relation between sampling period T and sampling frequency ωs is
ωs = 2π/T . Note that in the DTFT case, ωs is usually equal to 2π (T = 1).
Transform                          Time   Freq.   Analysis / Synthesis                                        Duality

(a) Fourier transform (CTFT)       C      C       F(ω) = ∫_t f(t) e^{−jωt} dt                                 self-dual
                                                  f(t) = (1/2π) ∫_ω F(ω) e^{jωt} dω

(b) Fourier series (CTFS)          C, P   D       F[k] = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−j2πkt/T} dt            dual with DTFT
                                                  f(t) = Σ_k F[k] e^{j2πkt/T}

(c) Discrete-time Fourier          D      C, P    F(e^{jω}) = Σ_n f[n] e^{−j2πωn/ωs}                          dual with CTFS
    transform (DTFT)                              f[n] = (1/ωs) ∫_{−ωs/2}^{ωs/2} F(e^{jω}) e^{j2πωn/ωs} dω

(d) Discrete-time Fourier          D, P   D, P    F[k] = Σ_{n=0}^{N−1} f[n] e^{−j2πnk/N}                      self-dual
    series (DTFS)                                 f[n] = (1/N) Σ_{k=0}^{N−1} F[k] e^{j2πnk/N}
Table 2.2 Various Fourier transforms with restrictions on the signals involved. Either the signal
is of finite length (FL) or the Fourier transform is bandlimited (BL).
the signal satisfies some additional restrictions, that is, when it is limited either in
time or in frequency. In that case, the continuous function (of time or frequency)
can be sampled without loss of information.
finite-length signal has the whole complex plane as its ROC (assuming it converges
anywhere), since it is both left- and right-sided and connected.
If a signal is two-sided, that is, neither left- nor right-sided, then its ROC is the
intersection of the ROC’s of its left- and right-sided parts. This ROC is therefore
either empty or of the form of a vertical strip.
Given a Laplace transform (such as a rational expression), different ROC’s lead
to different time-domain signals. Let us illustrate this with an example.
Example 2.1
Assume F (s) = 1/((s + 1)(s + 2)). The ROC {Re(s) < −2} corresponds to a left-sided
signal
f(t) = −(e^{−t} − e^{−2t}) u(−t).
The ROC {Re(s) > −1} corresponds to a right-sided signal

f(t) = (e^{−t} − e^{−2t}) u(t).

Finally, the ROC {−2 < Re(s) < −1} corresponds to a two-sided signal

f(t) = −e^{−t} u(−t) − e^{−2t} u(t).
Note that only the right-sided signal would also have a Fourier transform (since its ROC
includes the jω-axis).
For the inversion of the Laplace transform, recall its relation to the Fourier transform of an exponentially weighted signal. Then, it can be shown that its inverse is

f(t) = (1/2πj) ∫_{σ−j∞}^{σ+j∞} F(s) e^{st} ds,

where σ is chosen inside the ROC. We will denote a Laplace transform pair by

f(t) ←→ F(s).
For a review of Laplace transform properties, see [212]. Next, we will concentrate
on filtering only.
with an ROC containing the intersection of the ROC’s of H(s) and G(s).
The differentiation property of the Laplace transform says that

∂f(t)/∂t ←→ s F(s),
with ROC containing the ROC of F (s). Then, it follows that linear constant-
coefficient differential equations can be characterized by a Laplace transform called
the transfer function H(s). Linear, time-invariant differential equations, given by
Σ_{k=0}^{N} aₖ ∂ᵏy(t)/∂tᵏ = Σ_{k=0}^{M} bₖ ∂ᵏx(t)/∂tᵏ, (2.5.1)
that is, the input and the output are related by a convolution with a filter having
impulse response h(t), where h(t) is the inverse Laplace transform of H(s).
To take this inverse Laplace transform, we need to specify the ROC. Typically,
we look for a causal solution, where we solve the differential equation forward
in time. Then, the ROC extends to the right of the vertical line which passes through the rightmost pole. Stability¹¹ of the filter corresponding to the transfer
function requires that the ROC include the jω-axis. This leads to the well-known
requirement that a causal system with rational transfer function is stable if and
only if all the poles are in the left half-plane (the real part of the pole location is
smaller than zero). In the above discussion, we have assumed initial rest conditions,
that is, the homogeneous solution of differential Equation (2.5.1) is zero (otherwise,
the system is neither linear nor time-invariant).
The Butterworth filter of order N provides a classic example. Its squared magnitude response is

|H_N(jω)|² = 1 / (1 + (jω/jω_c)^{2N}),   (2.5.2)

where ω_c is a parameter which will specify the cutoff frequency beyond which sinusoids are substantially attenuated. Thus, ω_c defines the bandwidth of the lowpass Butterworth filter.
11
Stability of a filter means that a bounded input produces a bounded output.
62 CHAPTER 2
Since |HN (jω)|2 = H(jω)H ∗ (jω) = H(jω)H(−jω) when the filter is real, and noting that
(2.5.2) is the Laplace transform for s = jω, we get
H(s) H(−s) = 1 / (1 + (s/jω_c)^{2N}).   (2.5.3)

The 2N poles s_k of (2.5.3) satisfy

|s_k| = ω_c,   arg[s_k] = π(2k + 1)/(2N) + π/2,
and k = 0, . . . , 2N − 1. The poles thus lie on a circle, and they appear in pairs at ±sk .
To get a stable and causal filter, one simply chooses the N poles which lie on the left-hand
side half-circle. Since pole locations specify the filter only up to a scale factor, set s = 0
in (2.5.3) which leads to H(0) = 1. For example, a second-order Butterworth filter has the
following Laplace transform:
H_2(s) = ω_c² / ((s + ω_c e^{jπ/4})(s + ω_c e^{−jπ/4})).   (2.5.4)
One can find its “physical” implementation by going back, through the inverse Laplace
transform, to the equivalent linear constant-coefficient differential equation. See also Ex-
ample 3.6 in Chapter 3, for discrete-time Butterworth filters.
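The pole-placement recipe above can be sketched numerically (NumPy; the choices N = 2 and ω_c = 1 are arbitrary): compute the 2N roots of 1 + (s/jω_c)^{2N}, keep the left-half-plane ones, and normalize so that H(0) = 1.

```python
import numpy as np

N, wc = 2, 1.0
k = np.arange(2 * N)
# Poles of H(s)H(-s): |s_k| = wc, arg[s_k] = pi(2k+1)/(2N) + pi/2.
s_all = 1j * wc * np.exp(1j * np.pi * (2 * k + 1) / (2 * N))
poles = s_all[s_all.real < 0]                 # left half-plane -> stable, causal H(s)
H = lambda s: wc ** N / np.prod(s - poles)    # scale factor chosen so that H(0) = 1
```

For N = 2 this reproduces the factored form (2.5.4), and |H(jω_c)|² = 1/2, the familiar half-power point at the cutoff.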
The z-transform of a sequence f[n] is defined as

F(z) = Σ_{n=−∞}^{∞} f[n] z^{−n},   (2.5.5)

where z ∈ C. On the unit circle z = e^{jω}, this is the discrete-time Fourier transform (2.4.32), and for z = ρe^{jω}, it is the discrete-time Fourier transform of the sequence f[n] · ρ^{−n}. Similarly to the Laplace transform, there is a region of convergence
(ROC) associated with the z-transform F (z), namely a region of the complex plane
where F (z) converges. Consider the case where the z-transform is rational and
the sequence is bounded in amplitude. The ROC does not contain any pole. If the
sequence is right-sided (left-sided), the ROC extends outward (inward) from a circle
with the radius corresponding to the modulus of the outermost (innermost) pole. If
the sequence is two-sided, the ROC is a ring. The discrete-time Fourier transform
2.5. SIGNAL PROCESSING 63
converges absolutely if and only if the ROC contains the unit circle. From the
above discussion, it is clear that the unit circle in the z-plane of the z-transform
and the jω-axis in the s-plane of the Laplace transform play equivalent roles.
Also, just as in the Laplace transform, a given z-transform corresponds to dif-
ferent signals, depending on the ROC attached to it.
The inverse z-transform involves contour integration in the ROC and Cauchy’s
integral theorem [211]. If the contour of integration is the unit circle, the inver-
sion formula reduces to the discrete-time Fourier transform inversion (2.4.33). On
circles centered at the origin but of radius ρ different from 1, one can think of for-
ward and inverse z-transforms as the Fourier analysis and synthesis of a sequence
f′[n] = ρ^{−n} f[n]. Thus, convergence properties are as for the Fourier transform of the
exponentially weighted sequence. In the ROC, we can write formally a z-transform
pair as
f [n] ←→ F (z), z ∈ ROC.
When z-transforms are rational functions, the inversion is best done by partial frac-
tion expansion followed by term-wise inversion. Then, the z-trans-
form pairs,
1
an u[n] ←→ |z| > |a|, (2.5.6)
1 − az −1
and
1
−an u[−n − 1] ←→ |z| < |a|, (2.5.7)
1 − az −1
are useful, where u[n] is the unit-step function (u[n] = 1, n ≥ 0, and 0 otherwise).
The above transforms follow from the definition (2.5.5) and the sum of geometric
series, and they are a good example of identical z-transforms with different ROC’s
corresponding to different signals.
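A quick numerical check of (2.5.6) and (2.5.7) (NumPy; the value a = 0.8 and the test points are arbitrary): the same rational function is matched by the right-sided sum when |z| > |a|, and by the left-sided sum when |z| < |a|.

```python
import numpy as np

a = 0.8
n = np.arange(0, 400)
m = np.arange(1, 400)

z1 = 1.2 * np.exp(0.4j)            # |z1| > |a|: inside the ROC of (2.5.6)
right = np.sum((a / z1) ** n)      # sum_{n>=0} a^n z^{-n}
closed1 = 1.0 / (1.0 - a / z1)

z2 = 0.5                           # |z2| < |a|: inside the ROC of (2.5.7)
left = -np.sum((z2 / a) ** m)      # sum over n <= -1 of -(a^n) z^{-n}
closed2 = 1.0 / (1.0 - a / z2)
```

Both truncated sums converge geometrically, so a few hundred terms suffice.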
As a simple example, consider the two-sided sequence

f[n] = a^{|n|} = a^n u[n] + a^{−n} u[−n − 1].

Using (2.5.6) and (2.5.7), F(z) = 1/(1 − az^{−1}) + az/(1 − az), with ROC |a| < |z| < 1/|a|, that is, a nonempty ROC only if |a| < 1. For more z-transform properties, see
[211].
The z-transform G(z) is thus the eigenvalue of the convolution operator for that
particular value of z. The convolution theorem follows as

h[n] * g[n] ←→ H(z) G(z),

with an ROC containing the intersection of the ROC's of H(z) and G(z). Convolution with a time-reversed filter can be expressed as an inner product,

f[n] = Σ_k x[k] h[n − k] = Σ_k x[k] h̃[k − n] = ⟨x[k], h̃[k − n]⟩,

where h̃[n] = h[−n]. Also recall the delay property,

x[n − k] ←→ z^{−k} X(z).
Considering a linear constant-coefficient difference equation,

Σ_{k=0}^{N} a_k y[n − k] = Σ_{k=0}^{M} b_k x[n − k],   (2.5.8)

and taking its z-transform using the delay property, we get the transfer function as the ratio of the output and input z-transforms,

H(z) = Y(z)/X(z) = (Σ_{k=0}^{M} b_k z^{−k}) / (Σ_{k=0}^{N} a_k z^{−k}).
The output is related to the input by a convolution with a discrete-time filter having as impulse response h[n], the inverse z-transform of H(z). Again, the ROC depends on whether a causal or a stable solution is desired, just as in the continuous-time case.
that is, P(e^{jω}) = |H(e^{jω})|² is a nonnegative function on the unit circle. In other words, the following is a Fourier-transform pair:

p[m] = Σ_n h[n] h*[n + m] ←→ P(e^{jω}) = |H(e^{jω})|²,

and in the z-domain P(z) = H(z) H_*(z^{−1}) (recall that the subscript * implies conjugation of the coefficients but not of z).
Note that from the above, it is obvious that if zk is a zero of P (z), so is 1/zk∗ (that
also means that zeros on the unit circle are of even multiplicity). When h[n] is
real, and zk is a zero of H(z), then zk∗ , 1/zk , 1/zk∗ are zeros as well (they are not
necessarily different).
Suppose now that we are given an autocorrelation function P (z) and we want
to find H(z). Here, H(z) is called a spectral factor of P (z) and the technique of
extracting it, spectral factorization. These spectral factors are not unique, and are
obtained by assigning one zero out of each zero pair to H(z) (we assume here that
p[m] is FIR, otherwise allpass functions (2.5.10) can be involved). The choice of
which zeros to assign to H(z) leads to different spectral factors. To obtain a spectral
factor, first factor P (z) into its zeros as follows:
P(z) = α ∏_{i=1}^{Nu} (1 − z_{1i} z^{−1})(1 − z_{1i} z) ∏_{i=1}^{N} (1 − z_{2i} z^{−1})(1 − z*_{2i} z),
where the first product contains the zeros on the unit circle, and thus |z1i | = 1,
and the last two contain pairs of zeros inside/outside the unit circle, respectively.
In that case, |z2i | < 1. To obtain various H(z), one has to take one zero out of
each zero pair on the unit circle, as well as one of two zeros inside/outside the
unit circle. Note that all these solutions have the same magnitude response but
different phase behavior. An important case is the minimum phase solution which
is the one, among all causal spectral factors, that has the smallest phase term. To
get a minimum phase solution, we will consistently choose the zeros inside the unit
circle. Thus, H(z) would be of the form
H(z) = √α ∏_{i=1}^{Nu} (1 − z_{1i} z^{−1}) ∏_{i=1}^{N} (1 − z_{2i} z^{−1}).
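A minimal numerical sketch of spectral factorization (NumPy; the filter h is an arbitrary illustrative choice, and zeros on the unit circle are assumed absent so that the root pairing is clean): factor the autocorrelation P(z), keep the zeros inside the unit circle, and restore the scale factor √α.

```python
import numpy as np

h = np.array([0.5, 1.0])              # FIR filter with a zero at z = -2 (not minimum phase)
p = np.convolve(h, h[::-1])           # deterministic autocorrelation; P(z) = H(z) H(z^{-1})
zeros = np.roots(p)                   # zeros of P(z) come in pairs (z_k, 1/z_k)
inside = zeros[np.abs(zeros) < 1.0]   # minimum-phase choice: zeros inside the unit circle
hmin = np.real(np.poly(inside))       # monic polynomial with those zeros
# Restore the scale so that the autocorrelation of hmin matches p.
hmin *= np.sqrt(p[len(h) - 1] / np.convolve(hmin, hmin[::-1])[len(hmin) - 1])
```

Here the minimum-phase spectral factor of [0.5, 1.0] comes out as [1.0, 0.5]: the same magnitude response, with the zero reflected inside the unit circle.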
An important class of filters is that of linear phase FIR filters of length L. When the impulse response is symmetric, h[n] = h[L − 1 − n], one can write

H(e^{jω}) = e^{−jω(L−1)/2} A(ω),

where L is the length of the filter, and A(ω) is a real function of ω. Thus, the phase
is a linear function of ω. Similarly, when the impulse response is antisymmetric,
one can write
H(ejω ) = je−jω(L−1)/2 B(ω),
where B(ω) is a real function of ω. Here, the phase is an affine function of ω (but
usually called linear phase).
One way to design discrete-time filters is by transformation of an analog filter.
For example, one can sample the impulse response of the analog filter if its magni-
tude frequency response is close enough to being bandlimited. Another approach
consists of mapping the s-plane of the Laplace transform into the z-plane. From
our previous discussion of the relationship between the two planes, it is clear that
the jω-axis should map into the unit circle and the left half-plane should become
the inside of the unit circle in order to preserve stability. Such a mapping is given
by the bilinear transformation [211]
B(z) = β (1 − z^{−1}) / (1 + z^{−1}).
Then, the discrete-time filter Hd is obtained from a continuous-time filter Hc by
setting
Hd (z) = Hc (B(z)).
Considering what happens on the jω-axis and the unit circle, it can be verified that
the bilinear transform warps the frequency axis as ω = 2 arctan(ωc /β), where ω
and ωc are the discrete and continuous frequency variables, respectively.
As an example, the discrete-time Butterworth filter has a magnitude frequency
response equal to

|H(e^{jω})|² = 1 / (1 + (tan(ω/2)/tan(ω_0/2))^{2N}).   (2.5.9)
This squared magnitude is flat at the origin, in the sense that its first 2N − 1
derivatives are zero at ω = 0. Note that since we have a closed-form factorization of
the continuous-time Butterworth filter (see (2.5.4)), it is best to apply the bilinear
transform to the factored form rather than factoring (2.5.9) in order to obtain
H(ejω ) in its cascade form.
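The frequency warping of the bilinear transform is easy to verify numerically (NumPy; β = 1 and the frequency grid are arbitrary): the unit circle maps onto the jω-axis, with ω_c = β tan(ω/2).

```python
import numpy as np

beta = 1.0
B = lambda z: beta * (1 - 1 / z) / (1 + 1 / z)   # the bilinear transformation
w = np.linspace(0.1, 3.0, 50)                    # discrete-time frequencies (avoid w = pi)
s = B(np.exp(1j * w))                             # image of the unit circle in the s-plane
```

Since s is purely imaginary with Im(s) = β tan(ω/2), inverting gives exactly the warping ω = 2 arctan(ω_c/β) quoted above.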
Instead of the above indirect construction, one can design discrete-time filters
directly. This leads to better designs at a given complexity of the filter or, con-
versely, to lower-complexity filters for a given filtering performance.
In the particular case of FIR linear phase filters (that is, a finite-length sym-
metric or antisymmetric impulse response), a powerful design method called the
Parks-McClellan algorithm [211] leads to optimal filters in the minimax sense (the
maximum deviation from the desired Fourier transform magnitude is minimized).
The resulting approximation of the desired frequency response becomes equiripple
both in the passband and stopband (the approximation error is evenly spread out).
It is thus very different from a monotonically decreasing approximation as achieved
by a Butterworth filter.
Finally, we discuss the allpass filter, which is an example of what could be called
a unitary filter. An allpass filter has the property that

|H_ap(e^{jω})| = 1   (2.5.10)

for all ω. Calling y[n] the output of the allpass when x[n] is input, we have

‖y‖² = (1/2π)‖Y(e^{jω})‖² = (1/2π)‖H_ap(e^{jω}) X(e^{jω})‖² = (1/2π)‖X(e^{jω})‖² = ‖x‖²,
which means it conserves the energy of the signal it filters. An elementary single-
pole/zero allpass filter is of the following form (see also Appendix 3.A in Chapter
3):
H_ap(z) = (z^{−1} − a*) / (1 − az^{−1}).   (2.5.11)
Writing the pole location as a = ρejθ , the zero is at 1/a∗ = (1/ρ)ejθ . A general
allpass filter is made up of elementary sections as in (2.5.11),

H_ap(z) = ∏_{i=1}^{N} (z^{−1} − a_i*) / (1 − a_i z^{−1}) = P̃(z)/P(z),   (2.5.12)

where P̃(z) = z^{−N} P_*(z^{−1}). On the unit circle,

H_ap(e^{jω}) = e^{−jωN} P*(e^{jω}) / P(e^{jω}),
and property (2.5.10) follows easily. That all rational functions satisfying (2.5.10)
can be factored as in (2.5.12) is shown in [308].
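Energy conservation by the elementary section (2.5.11) can be checked directly from its difference equation, y[n] = a y[n−1] + x[n−1] − a* x[n] (NumPy; the pole a and the signal are arbitrary illustrative choices).

```python
import numpy as np

a = 0.5 + 0.3j                          # |a| < 1: stable, causal allpass pole
rng = np.random.default_rng(0)
x = np.zeros(4000, dtype=complex)
x[:64] = rng.standard_normal(64)        # finite-energy input; trailing zeros let
                                        # the IIR output decay inside the buffer
y = np.zeros_like(x)
for n in range(len(x)):                 # y[n] = a y[n-1] + x[n-1] - conj(a) x[n]
    y[n] = a * (y[n - 1] if n else 0) + (x[n - 1] if n else 0) - np.conj(a) * x[n]
```

The input and output energies agree to machine precision, and |H_ap(e^{jω})| evaluates to 1 at any frequency, which is exactly property (2.5.10).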
Although one could in principle go back to the underlying continuous-time signal and resample it at a different rate, most often, the rate changes are done in the discrete-time domain. We review some of the key
results. For further details, see [67] and [308].
Downsampling a sequence x[n] by an integer N (often called subsampling or decimation13) keeps every Nth sample,

y[n] = x[nN],

that is, all samples with indexes modulo N different from zero are discarded. In the Fourier domain, we get

Y(e^{jω}) = (1/N) Σ_{k=0}^{N−1} X(e^{j(ω−2πk)/N}),   (2.5.13)

or, in the z-transform domain,

Y(z) = (1/N) Σ_{k=0}^{N−1} X(W_N^k z^{1/N}),   (2.5.14)
where W_N = e^{−j2π/N} as usual. To prove (2.5.14), consider first a signal x′[n] which equals x[n] at multiples of N, and 0 elsewhere. If x[n] has z-transform X(z), then X′(z) equals

X′(z) = (1/N) Σ_{k=0}^{N−1} X(W_N^k z),   (2.5.15)

as can be shown by using the orthogonality of the roots of unity (2.1.3). To obtain y[n] from x′[n], one has to drop the extra zeros between the nonzero terms or contract the signal by a factor of N. This is obtained by substituting z^{1/N} for z in (2.5.15), leading to (2.5.14). Note that (2.5.15) contains the signal X as well as its
13
Sometimes, the term decimation is used even though it historically stands for “keep 9 out of
10” in reference to a Roman practice of killing every tenth soldier of a defeated army.
[Figure 2.5: downsampling by N = 3 in the frequency domain: (a) the original spectrum X(e^{jω}); (b) the spectrum after downsampling, shown for ω up to 6π.]
N − 1 modulated versions (on the unit circle, X(WNk z) = X(ej(ω−k2π/N ) )). This
is the reason why in Chapter 3, we will call the analysis dealing with X(WNk z),
modulation-domain analysis.
An alternative proof of (2.5.13) (which is (2.5.14) on the unit circle) consists
of going back to the underlying continuous-time signal and resampling with an
N -times larger sampling period. This is considered in Problem 2.10.
By way of an example, we show the case N = 3 in Figure 2.5. It is obvious
that in order to avoid aliasing, downsampling by N should be preceded by an ideal
lowpass filter with cutoff frequency π/N (see Figure 2.6(a)). Its impulse response
h[n] is given by
h[n] = (1/2π) ∫_{−π/N}^{π/N} e^{jωn} dω = sin(πn/N) / (πn).   (2.5.16)
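The modulation-domain relation (2.5.14) can be verified numerically at an arbitrary point z (NumPy; the signal, N = 3, and the test point are illustrative choices). Since the sum over k visits all N branches of z^{1/N}, any fixed root can be used.

```python
import numpy as np

N = 3
x = np.random.default_rng(1).standard_normal(12)
y = x[::N]                                              # y[n] = x[nN]
X = lambda z: np.sum(x * z ** -np.arange(len(x), dtype=float))
Y = lambda z: np.sum(y * z ** -np.arange(len(y), dtype=float))
z = 0.9 * np.exp(0.7j)                                  # arbitrary test point
W = np.exp(-2j * np.pi / N)
zr = z ** (1.0 / N)                                     # principal Nth root of z
rhs = sum(X(W ** k * zr) for k in range(N)) / N         # right side of (2.5.14)
```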
Upsampling by N followed by ideal lowpass filtering achieves perfect interpolation of x[n] in the sense that the missing samples have been filled
in without disturbing the original ones.
A rational sampling rate change by M/N is obtained by cascading upsampling
and downsampling with an interpolation filter in the middle, as shown in Figure
2.6(c). The interpolation filter is the cascade of the ideal lowpass for the upsampling
and for the downsampling, that is, the narrower of the two in the ideal filter case.
Finally, we demonstrate a fact that will be extensively used in Chapter 3. It
can be seen as an application of downsampling followed by upsampling to the de-
terministic autocorrelation of g[n]. This is the discrete-time equivalent of (2.4.31).
We want to show that the following holds:
⟨g[n], g[n + Nl]⟩ = δ[l] ←→ Σ_{k=0}^{N−1} G(W_N^k z) G(W_N^{−k} z^{−1}) = N.   (2.5.19)
The left side of the above equation is simply the autocorrelation of g[n] evaluated at every Nth index m = Nl. If we denote the autocorrelation of g[n] as p[n], then the left side of (2.5.19) is p′[n] = p[Nn]. The z-transform of p′[n] is (apply (2.5.14))

P′(z) = (1/N) Σ_{k=0}^{N−1} P(W_N^k z^{1/N}).
Replace now z 1/N by z and since the z-transform of p[n] is P (z) = G(z)G(z −1 ), we
get that the z-transform of the left side of (2.5.19) is the right side of (2.5.19).
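Relation (2.5.19) can be checked for the simplest orthogonal example, the Haar lowpass filter (NumPy; the test point z is arbitrary, and N = 2):

```python
import numpy as np

g = np.array([1.0, 1.0]) / np.sqrt(2.0)     # Haar lowpass: <g[n], g[n + 2l]> = delta[l]
N = 2
G = lambda z: np.sum(g * z ** -np.arange(len(g), dtype=float))
W = np.exp(-2j * np.pi / N)
z = 1.3 * np.exp(0.5j)                       # arbitrary test point
lhs = sum(G(W ** k * z) * G(W ** -k / z) for k in range(N))
```

The sum evaluates to N = 2 at any z, as (2.5.19) requires.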
Multirate Identities

Interchange of Downsampling and Upsampling  Downsampling by N and upsampling by M commute if and only if M and N are coprime. Writing out the two cascades using (2.5.14) and z → z^M for upsampling, one obtains modulated copies X(W_N^{kM} z^{M/N}) in one case and X(W_N^k z^{M/N}) in the other. For the two expressions to be equal, kM mod N has to be a permutation, that is, kM mod N = l has to have a unique solution for all l ∈ {0, . . . , N − 1}. If M and N
[Figure 2.7(a): upsampling by M and downsampling by N can be interchanged when M and N are coprime.]
have a common factor L > 1, then M = M′L and N = N′L. Note that (kM mod N) mod L is zero, or kM mod N is a multiple of L and thus not a permutation. If M and N are coprime, then Bezout's identity [209] guarantees that there exist two integers m and n such that mM + nN = 1. It follows that mM mod N = 1; thus, k = ml mod N is the desired solution to the equation kM mod N = l. This
property has an interesting generalization in multiple dimensions (see for example
[152]).
Interchange of Filtering and Downsampling  Downsampling by N followed by filtering with H(z) is equivalent to filtering with H(z^N) followed by downsampling by N, since

(1/N) Σ_{k=0}^{N−1} X(W_N^k z^{1/N}) H((W_N^k z^{1/N})^N) = H(z) · (1/N) Σ_{k=0}^{N−1} X(W_N^k z^{1/N}),

where we used (W_N^k)^N = 1.
Interchange of Filtering and Upsampling Filtering with a filter having the z-transform
H(z), followed by upsampling by N , is equivalent to upsampling followed by filtering
with H(z N ).
Using (2.5.18), it is immediate that both systems lead to an output with z-
transform X(z N )H(z N ) when the input is X(z).
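The upsampling identity is easy to verify in the time domain (NumPy; the signal and filter lengths and N = 3 are arbitrary): filtering then upsampling gives the same sequence as upsampling both the signal and the filter and convolving.

```python
import numpy as np

def upsample(v, N):
    out = np.zeros(N * len(v) - (N - 1))
    out[::N] = v                         # insert N - 1 zeros between samples
    return out

rng = np.random.default_rng(2)
x, h, N = rng.standard_normal(10), rng.standard_normal(4), 3
path1 = upsample(np.convolve(h, x), N)               # filter with H(z), then upsample
path2 = np.convolve(upsample(h, N), upsample(x, N))  # upsample, then filter with H(z^N)
```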
In short, the last two properties simply say that filtering in the downsampled
domain can always be realized by filtering in the upsampled domain, but then with
[Figure 2.8: Polyphase transform (forward and inverse transforms for the case N = 3 are shown).]
the upsampled filter (down and upsampled stand for low versus high sampling rate
domain). The last two relations are shown in Figures 2.7(b) and (c).
A sequence x[n] can be split into the subsequences x_i[n] = x[nN + i], i = 0, . . . , N − 1. These are called signal polyphase components. In the z-transform domain, we can write X(z) as the sum of shifted and upsampled polyphase components. That is,
X(z) = Σ_{i=0}^{N−1} z^{−i} X_i(z^N),   (2.5.20)
where
X_i(z) = Σ_{n=−∞}^{∞} x[nN + i] z^{−n}.   (2.5.21)
Figure 2.8 shows the signal polyphase transform and its inverse (for the case N = 3).
Because the forward shift requires advance operators which are noncausal, a causal
version would produce a total delay of N − 1 samples between forward and inverse
polyphase transform. Such a causal version is obtained by multiplying the noncausal
forward polyphase transform by z −N +1 .
Later we will need to express the output of filtering with H followed by down-
sampling in terms of the polyphase components of the input signal. That is, we
need the 0th polyphase component of H(z)X(z). This is easiest if we define a
polyphase decomposition of the filter to have the reverse phase of the one used for
the signal, or
H(z) = Σ_{i=0}^{N−1} z^i H_i(z^N),   (2.5.22)
with
H_i(z) = Σ_{n=−∞}^{∞} h[Nn − i] z^{−n},   i = 0, . . . , N − 1.   (2.5.23)
Then, the 0th polyphase component of H(z)X(z), that is, the z-transform of the output of filtering by H followed by downsampling by N, is

Y(z) = Σ_{i=0}^{N−1} H_i(z) X_i(z).
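A sketch of this polyphase computation (NumPy; the signal, filter, and N = 3 are arbitrary illustrative choices): convolve each filter polyphase component (2.5.23) with the matching signal polyphase component (2.5.21) and sum.

```python
import numpy as np

def polyphase_filter_downsample(x, h, N):
    """Compute y[n] = (h * x)[nN] via polyphase components."""
    Ly = (len(x) + len(h) - 1 + N - 1) // N
    y = np.zeros(Ly)
    for i in range(N):
        xi = x[i::N]                                   # x_i[n] = x[nN + i]
        nmax = (len(h) - 1 + i) // N
        hi = np.array([h[N * n - i] if 0 <= N * n - i < len(h) else 0.0
                       for n in range(nmax + 1)])      # h_i[n] = h[Nn - i]
        yi = np.convolve(hi, xi)
        y[:len(yi)] += yi
    return y

rng = np.random.default_rng(3)
x, h, N = rng.standard_normal(25), rng.standard_normal(11), 3
y = polyphase_filter_downsample(x, h, N)
```

The result matches filtering at the full rate and then discarding samples, but only the samples that are kept are ever computed.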
Filtering by h[n] followed by downsampling by N can also be written in matrix notation,

⎛  ⋮   ⎞   ⎛ ⋱       ⋮          ⋮          ⋮        ⎞ ⎛  ⋮   ⎞
⎜ y[0] ⎟ = ⎜ · · · h[L − 1] · · · h[L − N] h[L − N − 1] · · · ⎟ ⎜ x[0] ⎟ ,
⎜ y[1] ⎟   ⎜ · · ·    0     · · ·    0        h[L − 1]  · · · ⎟ ⎜ x[1] ⎟
⎝  ⋮   ⎠   ⎝          ⋮          ⋮          ⋮         ⋱ ⎠ ⎝  ⋮   ⎠
where L is the filter length, and the matrix operator will be denoted by H. Similarly, upsampling by N followed by interpolation filtering with g[n] can be written as a matrix operator whose columns contain shifted versions of the impulse response g[n].
Here the matrix operator is denoted by G. Note that if h[n] = g[−n], then H = GT ,
a fact that will be important when analyzing orthonormal filter banks in Chapter 3.
[Figure 2.9: tiles in the time-frequency plane. (a) The intervals I_t and I_ω, centered around the centers of gravity of |f(t)|² and |F(ω)|², define a tile. (b) Tiles of f, f′, f″ at frequencies ω_0, . . . , 6ω_0 and shifts τ_0, . . . , 6τ_0.]

For
example, one can define intervals It and Iω which contain 90% of the energy of
the time- and frequency-domain functions, respectively, and are centered around
the center of gravity of |f (t)|2 and |F (ω)|2 (see Figure 2.9). This defines what we
call a tile in the time-frequency domain, as shown in Figure 2.9. For simplicity, we
assumed a complex basis function. A real basis function would be represented by
two mirror tiles at positive and negative frequencies.
Consider now elementary operations on a basis function and their effects on the
tile. Obviously, a shift in time by τ results in shifting of the tile by τ . Similarly,
modulation by e^{jω_0 t} shifts the tile by ω_0 in frequency (vertically). This is shown in Figure 2.10.
Scaling uses basis functions of the form ψ(t/a), where the function ψ(t) is usually a bandpass filter. Thus, large a's (a ≫ 1) correspond to long basis functions, and will identify long-term trends in the signal to be analyzed. Small a's (0 < a < 1) lead to short basis functions, which will follow
short-term behavior of the signal. This leads to the following: Scale is proportional
to the duration of the basis functions used in the signal expansion.
Because of this, and assuming that a basis function is a bandpass filter as in
wavelet analysis, high-frequency basis functions are obtained by going to small
scales, and therefore, scale is loosely related to inverse frequency. This is only
a qualitative statement, since scaling and modulation are fundamentally different
operations as was seen in Figure 2.10. The discussed scale is similar to those in
geographical maps, where large means a coarse, global view, and small corresponds
to a fine, detailed view.
Scale changes can be inverted if the function is continuous-time. In discrete
time, the situation is more complicated. From the discussion of multirate signal
processing in Section 2.5.3, we can see that upsampling (that is, a stretching of the
sequence) can be undone by downsampling by the same factor, and this with no
loss of information if done properly. Downsampling (or contraction of a sequence)
involves loss of information in general, since either a bandlimitation precedes the
downsampling, or aliasing occurs. This naturally leads to the notion of resolution of
a signal. We will thus say that the resolution of a finite-length signal is the minimum
number of samples required to represent it. It is thus related to the information
content of the signal. For infinite-length signals having finite energy and sufficient
decay, one can define the length as the essential support (for example, where 99%
of the energy is).
In continuous time, scaling does not change the resolution, since a scale change
affects both the sampling rate and the length of the signal, thus keeping the number
of samples constant. In discrete time, upsampling followed by interpolation does
2.6. TIME-FREQUENCY REPRESENTATIONS 79
not affect the resolution, since the interpolated samples are redundant. Downsam-
pling by N decreases the resolution by N , and cannot be undone. Figure 2.11 shows
the interplay of scale and resolution on simple discrete-time examples. Note that
the notion of resolution is central to multiresolution analysis developed in Chap-
ters 3 and 4. There, the key idea is to split a signal into several lower-resolution
components, from which the original, full-resolution signal can be recovered.
PROOF
Consider the integral of t f(t) f′(t). Using the Cauchy-Schwarz inequality (2.2.2),

| ∫_R t f(t) f′(t) dt |² ≤ ∫_R |t f(t)|² dt · ∫_R |f′(t)|² dt.   (2.6.4)
The first integral on the right side is equal to Δ_t². Because f′(t) has Fourier transform jωF(ω), and using Parseval's formula, we find that the second integral is equal to (1/(2π))Δ_ω². Thus, the integral on the left side of (2.6.4) is bounded from above by (1/(2π))Δ_t²Δ_ω². Using integration by parts, and noting that f(t)f′(t) = (1/2)∂f²(t)/∂t,

∫_R t f(t) f′(t) dt = (1/2) ∫_R t (∂f²(t)/∂t) dt = (1/2)[t f²(t)]_{−∞}^{∞} − (1/2) ∫_R f²(t) dt.
By assumption, the limit of tf 2 (t) is zero at infinity, and, because the function is of unit
norm, the above equals −1/2. Replacing this into (2.6.4), we obtain
1/4 ≤ (1/(2π)) Δ_t² Δ_ω²,
or (2.6.2). To find a function that meets the lower bound, note that the Cauchy-Schwarz inequality is an equality when the two functions involved are equal within a multiplicative factor, that is, from (2.6.4),

f′(t) = k t f(t),

which is satisfied by the Gaussian f(t) = c e^{kt²/2} (with k < 0).
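The equality case can be checked numerically (NumPy; the grid is an arbitrary discretization): for the unit-norm Gaussian, Δ_t²Δ_ω² = π/2, i.e. (1/2π)Δ_t²Δ_ω² = 1/4.

```python
import numpy as np

dt = 1e-3
t = np.arange(-15.0, 15.0, dt)
f = np.pi ** -0.25 * np.exp(-t ** 2 / 2)   # unit-norm Gaussian
fp = np.gradient(f, t)                      # numerical f'(t)
Dt2 = np.sum(t ** 2 * f ** 2) * dt          # int t^2 |f(t)|^2 dt
Dw2 = 2 * np.pi * np.sum(fp ** 2) * dt      # int w^2 |F(w)|^2 dw, via Parseval
```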
Another classical problem is to find a function limited to an interval [−T, T] whose energy in the band (−ω_0, ω_0) is maximized. It can be shown [216, 268] that the solution f(t) is the eigenfunction with the largest eigenvalue satisfying

∫_{−T}^{T} (sin ω_0(t − τ)) / (π(t − τ)) f(τ) dτ = λ f(t).   (2.6.6)
Call f_n(t) the eigenfunction of (2.6.6) with eigenvalue λ_n. Then (i) each f_n(t) is unique (up to a scale factor), (ii) f_n(t) and f_m(t) are orthogonal for n ≠ m, and (iii) with proper normalization the set {f_n(t)} forms an orthonormal basis for functions bandlimited to (−ω_0, ω_0)
[216]. These functions are called prolate spheroidal wave functions. Note that while (2.6.6)
seems to depend on both T and ω0 , the solution depends only on the product T · ω0 .
That is, one measures the similarity between the signal and shifts and modulates of an elementary window, or

STFT_f(ω, τ) = ∫ w*(t − τ) e^{−jωt} f(t) dt = ⟨g_{ω,τ}(t), f(t)⟩,

where

g_{ω,τ}(t) = w(t − τ) e^{jωt}.
Thus, each elementary function used in the expansion has the same time and fre-
quency resolution, simply a different location in the time-frequency plane. It is
[Figure 2.12: The short-time Fourier and wavelet transforms. (a) Modulates and shifts of a Gaussian window used in the expansion. (b) Tiling of the time-frequency plane. (c) Shifts and scales of the prototype bandpass wavelet. (d) Tiling of the time-frequency plane.]
thus natural to discretize the STFT on a rectangular grid (mω0 , nτ0 ). If the win-
dow function is a lowpass filter with a cutoff frequency of ωb , or a bandwidth of
2ωb , then ω0 is chosen smaller than 2ωb and τ0 smaller than π/ωb in order to get an
adequate sampling. Typically, the STFT is actually oversampled. A more detailed
discussion of the sampling of the STFT is given in Section 5.2, where the inversion
formula is also given. A real-valued version of the STFT, using cosine modulation
and an appropriate window, leads to orthonormal bases, which are discussed in
Section 4.8.
Examples of STFT basis functions and the tiling of the time-frequency plane
are given in Figures 2.12(a) and (b). To achieve good time-frequency resolution, a
Gaussian window (see (2.6.5)) can be used, as originally proposed by Gabor [102].
Thus, the STFT is often called Gabor transform as well.
The spectrogram is the energy distribution associated with the STFT, that is,

S_f(ω, τ) = |STFT_f(ω, τ)|².

Because the STFT can be thought of as a bank of filters with impulse responses g_{ω,τ}(−t) = w(−t − τ) e^{−jωt}, the spectrogram is the magnitude squared of the filter outputs.
The continuous wavelet transform is

CWT_f(a, b) = (1/√a) ∫ ψ*((t − b)/a) f(t) dt,   (2.6.8)

where a ∈ R⁺ and b ∈ R. That is, we measure the similarity between the signal f(t) and shifts and scales of an elementary function, since CWT_f(a, b) = ⟨ψ_{a,b}(t), f(t)⟩, where

ψ_{a,b}(t) = (1/√a) ψ((t − b)/a),

and the factor 1/√a is used to conserve the norm. Now, the functions used in
the expansion have changing time-frequency tiles because of the scaling. For small
a (a < 1), ψa,b (t) will be short and of high frequency, while for large a (a > 1),
ψa,b (t) will be long and of low frequency. Thus, a natural discretization will use
large time steps for large a, and conversely, choose fine time steps for small a. The
discretization of (a, b) is then of the form (a_0^n, a_0^n · τ_0), and leads to functions for the expansion as shown in Figure 2.12(c). The resulting tiling of the time-frequency plane is shown in Figure 2.12(d) (the case a_0 = 2 is shown). Special choices for
ψ(t) and the discretization lead to orthonormal bases or wavelet series as studied
in Chapter 4, while the overcomplete, continuous wavelet transform in (2.6.8) is
discussed in Section 5.1.
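The role of the 1/√a factor can be illustrated numerically (NumPy; the Mexican-hat shape and the choice (a, b) = (2, 1) are arbitrary): ‖ψ_{a,b}‖ = ‖ψ‖ for any scale and shift.

```python
import numpy as np

dt = 1e-3
t = np.arange(-40.0, 40.0, dt)
psi = (1 - t ** 2) * np.exp(-t ** 2 / 2)      # a prototype bandpass wavelet
a, b = 2.0, 1.0
u = (t - b) / a
psi_ab = (1 / np.sqrt(a)) * (1 - u ** 2) * np.exp(-u ** 2 / 2)
```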
A simple local analysis is obtained by windowing the signal with the indicator function of the interval [nT, (n+1)T), periodizing each windowed signal with period
T and applying an expansion such as the Fourier series on each periodized signal (see
Section 4.1.2). Of course, the arbitrary segmentation at points nT creates artificial
boundary problems. Yet, such transforms are used due to their simplicity. For
example, in discrete time, block transforms such as the Karhunen-Loève transform
(see Section 7.1.1) and its approximations are quite popular.
If g(t) = e^{jω_0 t} f(t − τ_0), the time-frequency distribution shifts accordingly: TFD_g(ω, τ) = TFD_f(ω − ω_0, τ − τ_0).
DEFINITION 2.8

An operator A which maps one Hilbert space H1 into another Hilbert space H2 (which may be the same) is called a linear operator if, for all x, y in H1 and α in C,

A(x + y) = Ax + Ay,   A(αx) = αAx.
The operator A−1 is called the inverse of A. An important result is the following:
Suppose A is a bounded linear operator mapping H onto itself, and A < 1. Then
I − A is invertible, and for every y in H,
(I − A)^{−1} y = Σ_{k=0}^{∞} A^k y.   (2.A.1)
Note that although the above expansion has the same form for a scalar as well
as an operator, one should not forget the distinction between the two. Another
important notion is that of an adjoint operator.15 It can be shown that for every x in H1 and y in H2, there exists a unique y* from H1, such that ⟨Ax, y⟩ = ⟨x, y*⟩. The adjoint of A, denoted A*, is then defined by A* y = y*.
[Figure: factorization of a real unitary n × n matrix into a cascade of stages U_1, . . . , U_n built from plane rotations and sign changes ±1: (a) the overall structure; (b) one stage U_i.]

The elementary building block is the Givens rotation

G_α = ( cos α   −sin α
        sin α    cos α ).   (2.B.1)
The way to demonstrate this is to show that any real, unitary n × n matrix U n can
be expressed as
U_n = R_{n−2} · · · R_0 ( U_{n−1}   0
                          0        ±1 ),   (2.B.2)
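The elementary step behind such factorizations — choosing a rotation angle that zeroes one entry of a vector — can be sketched as follows (NumPy; the vector is an arbitrary example):

```python
import numpy as np

def givens(alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])   # the rotation G_alpha of (2.B.1)

v = np.array([3.0, 4.0])
alpha = -np.arctan2(v[1], v[0])          # choose alpha so the rotation zeroes v[1]
r = givens(alpha) @ v                     # rotations conserve the l2 norm
```

Applying such rotations repeatedly reduces a unitary matrix one row/column at a time, which is how (2.B.2) is established.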
Complex unitary matrices can be factored using Householder building blocks as U = D H_1 · · · H_{n−1}, (2.B.3), where D is diagonal with d_ii = e^{jθ_i}, and H_i are Householder blocks I − 2u_i u_i*.
The fact that we mention the Householder factorization here is because we will
use its polynomial version to factor lossless matrices in Chapter 3.
Note that the Householder building block is unitary, and that the factorization
in (2.B.3) can be proved similarly to the factorization using Givens rotations. That
is, we can first show that
(1/√c) H_1 U = ( e^{jα}   0
                 0       U_1 ),
In Section 2.4.3, when discussing Fourier series, we pointed out possible convergence
problems such as the Gibbs phenomenon. In this appendix, we first review different
types of convergence and then discuss briefly some convergence properties of Fourier
series and transforms. Then, we discuss regularity of functions and the associated
decay of the Fourier series and transforms. More details on these topics can be
found for example in [46, 326].
2.C.1 Convergence

Pointwise convergence: a sequence of functions f_n(t) converges pointwise to f(t) if, for every t, lim_{n→∞} f_n(t) = f(t). This is a relatively weak form of convergence, since certain properties of f_n(t), such as continuity, are not passed on to the limit. Consider the truncated Fourier series, that is (from (2.4.13))
f_n(t) = Σ_{k=−n}^{n} F[k] e^{jkω_0 t}.   (2.C.1)
This Fourier series converges pointwise for all t when F [k] are the Fourier coefficients
(see (2.4.14)) of a piecewise smooth17 function f (t). Note that while each fn (t) is
continuous, the limit need not be.
17 A piecewise smooth function on an interval is piecewise continuous (finite number of discontinuities) and its derivative is also piecewise continuous.
A sequence f_n(t) converges to f(t) in the mean square sense if

lim_{n→∞} ‖f − f_n‖² = 0.
Note that this does not mean that limn→∞ fn = f for all t, but only almost ev-
erywhere. For example, the truncated Fourier series (2.C.1) of a piecewise smooth
function converges in the mean square sense to f (t) when F [k] are the Fourier se-
ries coefficients of f (t), even though at a point of discontinuity t0 , f (t0 ) might be
different from limn→∞ fn (t0 ) which equals the mean of the right and left limits.
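The distinction between mean square convergence and pointwise behavior at a discontinuity is the Gibbs phenomenon mentioned earlier. A small numerical illustration (NumPy; the square wave, grid, and truncation orders are arbitrary choices): the mean square error of the truncated series decreases with n, but the overshoot next to the jump does not shrink.

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 20001)
square = np.sign(t)                                # piecewise smooth, jump at t = 0

def fn(n):                                         # truncated Fourier series (2.C.1)
    k = np.arange(1, n + 1, 2)                     # odd harmonics of the square wave
    return (4 / np.pi) * np.sum(np.sin(np.outer(k, t)) / k[:, None], axis=0)

orders = (9, 33, 129)
mse = [np.mean((square - fn(n)) ** 2) for n in orders]
peak = [fn(n).max() for n in orders]               # hovers near 1.18, not 1
```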
In the case of the Fourier transform, the concept analogous to the truncated
Fourier series (2.C.1) is the truncated integral defined from the Fourier inversion
formula (2.4.2) as
f_c(t) = (1/2π) ∫_{−c}^{c} F(ω) e^{jωt} dω,
where F (ω) is the Fourier transform of f (t) (see (2.4.1)). The convergence of the
above integral as c → ∞ is an important question, since the limit limc→∞ fc (t)
might not equal f (t). Under suitable restrictions on f (t), equality will hold. As an
example, if f (t) is piecewise smooth and absolutely integrable, then limc→∞ fc (t0 ) =
f (t0 ) at each point of continuity and is equal to the mean of the left and right limits
at discontinuity points [326].
2.C.2 Regularity
So far, we have mostly discussed functions satisfying some integral conditions (abso-
lutely or square-integrable functions for example). Instead, regularity is concerned
with differentiability. The space of continuous functions is called C 0 , and similarly,
C n is the space of functions having n continuous derivatives.
A finer analysis is obtained using Lipschitz (or Hölder) exponents. A function f is called Lipschitz of order α, 0 < α ≤ 1, if for any t and sufficiently small ε, we have

|f(t + ε) − f(t)| ≤ c |ε|^α.
It can be shown (see [216]) that if a function f (t) and all its derivatives up
to order n exist and are of bounded variation, then the Fourier transform can be
bounded by
|F(ω)| ≤ c / (1 + |ω|^{n+1}),   (2.C.3)
that is, it decays as O(1/|ω|n+1 ) for large ω. Conversely, if F (ω) has a decay as in
(2.C.3), then f (t) has n−1 continuous derivatives, and the nth derivative exists but
might be discontinuous. A finer analysis of regularity and associated localization in
Fourier domain can be found in [241], in particular for functions in Hölder spaces
and using different norms in Fourier domain.
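The link between smoothness and decay can be illustrated with the FFT (NumPy; the period, sample count, and test waveforms are arbitrary): a discontinuous square wave has coefficients decaying like 1/|k|, while a continuous triangle wave (whose derivative is of bounded variation) decays like 1/k². Comparing odd harmonics 1 and 3, the ratios are close to 3 and 9, respectively.

```python
import numpy as np

M = 4096
t = np.arange(M) / M                           # one period, [0, 1)
square = np.where(t < 0.5, 1.0, -1.0)          # discontinuous: |F[k]| ~ 1/|k|
triangle = 1.0 - 4.0 * np.abs(t - 0.5)         # continuous: |F[k]| ~ 1/k^2
cs = np.abs(np.fft.fft(square)) / M
ct = np.abs(np.fft.fft(triangle)) / M
ratio_square = cs[1] / cs[3]                   # ~ 3 (decay 1/k)
ratio_triangle = ct[1] / ct[3]                 # ~ 9 (decay 1/k^2)
```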
P ROBLEMS
2.1 Legendre polynomials: Consider the interval [−1, 1] and the vectors 1, t, t2 , t3 , . . .. Using
Gram-Schmidt orthogonalization, find an equivalent orthonormal set.
2.2 Prove Theorem 2.4, parts (a), (b), (d), (e), for finite-dimensional Hilbert spaces, R^n or C^n.
2.3 Orthogonal transforms and l∞ norm: Orthogonal transforms conserve the l2 norm, but not others, in general. The l∞ norm of a vector is defined as (assume v ∈ R^n):

l∞[v] = max_i |v_i|.
(a) Consider n = 2 and the set of real orthogonal transforms T2 , that is, plane rotations.
Given the set of vectors v with unit l2 norm (that is, vectors on the unit circle), give
lower and upper bounds such that
a2 ≤ l∞ [T2 · v] ≤ b2 .
(b) Give the lower and upper bounds for the general case n > 2, that is, an and bn .
2.4 Norm of operators: Consider operators that map l2 (Z) to itself, and indicate their norm,
or bounds on their norm.
2.6 Least-squares solution: Show that for the least-squares solution obtained in Section 2.3.2,
the partial derivatives ∂(|y − ŷ|2 )/∂ x̂i are all zero.
2.7 Least-squares solution to a linear system of equations: The general solution was given in
Equation (2.3.4–2.3.5).
2.8 Parseval’s formulas can be proven by using orthogonality and biorthogonality relations of
the basis vectors.
(a) Show relations (2.2.5–2.2.6) using the orthogonality of the basis vectors.
(b) Show relations (2.2.11–2.2.13) using the biorthogonality of the basis vectors.
PROBLEMS 93
2.9 Consider the space of square-integrable real functions on the interval [−π, π], L2 ([−π, π]),
and the associated orthonormal basis given by
{ 1/√(2π), cos nx/√π, sin nx/√π },   n = 1, 2, . . .
Consider the following two subspaces: S – space of symmetric functions, that is, f (x) =
f (−x), on [−π, π], and A – space of antisymmetric functions, f (x) = −f (−x), on [−π, π].
(a) Show how any function f (x) from L2 ([−π, π]) can be written as f (x) = fs (x) + fa (x),
where fs (x) ∈ S and fa (x) ∈ A.
(b) Give orthonormal bases for S and A.
(c) Verify that L2 ([−π, π]) = S ⊕ A.
2.10 Downsampling by N : Prove (2.5.13) by going back to the underlying time-domain signal
and resampling it with an N -times longer sampling period. That is, consider x[n] and
y[n] = x[nN ] as two sampled versions of the same continuous-time signal, with sampling
periods T and N T , respectively. Hint: Recall that the discrete-time Fourier transform
X(ejω ) of x[n] is (see (2.4.36))
    X(e^{jω}) = X_T(ω/T) = (1/T) Σ_{k=−∞}^{∞} X_C(ω/T − 2πk/T),
where T is the sampling period. Then Y (ejω ) = XNT (ω/N T ) (since the sampling period
is now N T ), where XNT (ω/N T ) can be written similarly to the above equation. Finally,
split the sum involved in XNT (ω/N T ) into k = nN + l, and gathering terms, (2.5.13) will
follow.
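As a numerical companion to this problem (an illustrative sketch assuming numpy; the function dtft and the frequency grid are choices made here, not part of the problem), the aliasing formula (2.5.13) can be checked directly for a finite-length sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2                        # downsampling factor
x = rng.standard_normal(16)
y = x[::N]                   # y[n] = x[nN]

def dtft(sig, w):
    """Discrete-time Fourier transform of a finite sequence at frequencies w."""
    n = np.arange(len(sig))
    return np.array([np.sum(sig * np.exp(-1j * wk * n)) for wk in np.atleast_1d(w)])

# Check  Y(e^{jw}) = (1/N) * sum_{l=0}^{N-1} X(e^{j(w - 2*pi*l)/N})  on a grid.
w = np.linspace(-np.pi, np.pi, 64)
lhs = dtft(y, w)
rhs = sum(dtft(x, (w - 2 * np.pi * l) / N) for l in range(N)) / N
```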
2.11 Downsampling and aliasing: If an arbitrary discrete-time sequence x[n] is input to a filter
followed by downsampling by 2, we know that an ideal half-band lowpass filter (that is,
|H(ejω )| = 1, |ω| < π/2, and H(ejω ) = 0, π/2 ≤ |ω| ≤ π) will avoid aliasing.
2.12 In pattern recognition, it is sometimes useful to expand a signal using the desired pattern,
or template, and its shifts, as basis functions. For simplicity, consider a signal of length N ,
x[n], n = 0, . . . , N − 1, and a pattern p[n], n = 0, . . . , N − 1. Then, choose as basis functions the circularly shifted versions of the pattern,
    ϕk[n] = p[(n − k) mod N],    k = 0, . . . , N − 1.
(a) Derive a simple condition on p[n], so that any x[n] can be written as a linear combination of {ϕk}.
(b) Assuming the previous condition is met, give the coefficients αk of the expansion
    x[n] = Σ_{k=0}^{N−1} αk ϕk[n].
2.13 Show that a linear, periodically time-varying system of period N can be implemented with
a polyphase transform followed by upsampling by N , N filter operations and a summation.
(a) Give the expression for h2 (t), and verify that it decays as 1/t2 .
(b) Same for h3 (t), which decays as 1/t3 . Show that H3 (ω) has a continuous derivative.
(c) By generalizing the construction above of H2 (ω) and H3 (ω), show that one can obtain
hi (t) with decay 1/ti . Also, show that Hi (ω) has a continuous (i − 2)th derivative.
However, the filters involved become spread out in time, and the result is only inter-
esting asymptotically.
2.15 Uncertainty relation: Consider the uncertainty relation Δ²ω Δ²t ≥ π/2.
(a) Show that scaling does not change the product Δ²ω · Δ²t. Either use a scaling that conserves the L2 norm (f(t) → √a f(at)) or be sure to renormalize Δ²ω, Δ²t.
(b) Can you give the time-bandwidth product of a rectangular pulse, p(t) = 1, −1/2 ≤
t ≤ 1/2, and 0 otherwise?
(c) Same as above, but for a triangular pulse.
(d) What can you say about the time-bandwidth product as the time-domain function is obtained from convolving more and more rectangular pulses with themselves?
(a) Assume the filter has real coefficients. Show pole-zero locations, and that numerator
and denominator polynomials are mirrors of each other.
(b) Given h[n], the causal, real-coefficient impulse response of a stable allpass filter, give its autocorrelation a[k] = Σ_n h[n] h[n − k]. Show that the set {h[n − k]}, k ∈ Z, is an orthonormal basis for l2(Z). Hint: Use Theorem 2.4.
(c) Show that the set {h[n − 2k]} is an orthonormal set but not a basis for l2 (Z).
2.17 Parseval’s relation for nonorthogonal bases: Consider the space V = Rn and a biorthogonal
basis, that is, two sets {αi } and {βi } such that
    ⟨αi, βj⟩ = δ[i − j],    i, j = 0, . . . , n − 1.
(a) Show that any vector v ∈ V can be written in the following two ways:
    v = Σ_{i=0}^{n−1} ⟨αi, v⟩ βi = Σ_{i=0}^{n−1} ⟨βi, v⟩ αi .
(b) Call vα the vector with entries ⟨αi, v⟩, and similarly vβ with entries ⟨βi, v⟩. Given v, what can you say about vα and vβ?
(c) Show that the generalization of Parseval's identity to biorthogonal systems is
    ‖v‖² = ⟨v, v⟩ = ⟨vα, vβ⟩
and
    ⟨v, g⟩ = ⟨vα, gβ⟩.
2.18 Circulant matrices: An N × N circulant matrix C is defined by its first line, since subse-
quent lines are obtained by a right circular shift. Denote the first line by {c0 , cN−1 , . . . , c1 }
so that C corresponds to a circular convolution with a filter having impulse response
{c0 , c1 , c2 , . . . , cN−1 }.
2.19 Walsh basis: To define the Walsh basis, we need the Kronecker product of matrices defined
in (2.3.2). Then, the matrix W k , of size 2k × 2k , is
    Wk = ⎛ 1  1 ⎞ ⊗ Wk−1 ,    W0 = [1],    W1 = ⎛ 1  1 ⎞ .
         ⎝ 1 −1 ⎠                               ⎝ 1 −1 ⎠
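The recursion above is easy to exercise numerically; a small sketch (assuming numpy, whose np.kron implements the Kronecker product) builds W_k and checks that its rows are mutually orthogonal:

```python
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])

def walsh(k):
    """Return the 2^k x 2^k Walsh matrix W_k = H2 (Kronecker) W_{k-1}, W_0 = [1]."""
    W = np.array([[1]])
    for _ in range(k):
        W = np.kron(H2, W)
    return W

W3 = walsh(3)   # 8 x 8 matrix with +-1 entries; W3 @ W3.T = 8 * I
```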
Therefore, we would like to construct orthonormal sets of basis functions, {ϕk [n]},
which are complete in the space of square-summable sequences, l2(Z). More general, biorthogonal and overcomplete sets will be considered as well.
The discrete-time Fourier series, seen in Chapter 2, is an example of such an
orthogonal series expansion, but it has a number of shortcomings. Discrete-time
bases better suited for signal processing tasks will try to satisfy two conflicting
requirements, namely to achieve good frequency resolution while keeping good time
locality as well. Additionally, for both practical and computational reasons, the set
of basis functions has to be structured. Typically, the infinite set of basis functions
{ϕk } is obtained from a finite number of prototype sequences and their shifted
versions in time. This leads to discrete-time filter banks for the implementation of
such structured expansions. This filter bank point of view has been central to the
developments in the digital signal processing community, and to the design of good
basis functions or filters in particular. While the expansion is not time-invariant,
it will at least be periodically time-invariant. Also, the expansions will often have
a successive approximation property. This means that a reconstruction based on
an appropriate subset of the basis functions leads to a good approximation of the
signal, which is an important feature for applications such as signal compression.
Linear signal expansions have been used in digital signal processing since at
least the 1960’s, mainly as block transforms, such as piecewise Fourier series and
Karhunen-Loève transforms [143]. They have also been used as overcomplete ex-
pansions, such as the short-time Fourier transform (STFT) for signal analysis and
synthesis [8, 226] and in transmultiplexers [25]. Increased interest in the subject,
especially in orthogonal and biorthogonal bases, arose with work on compression,
where redundancy of the expansion such as in the STFT is avoided. In particular,
subband coding of speech [68, 69] spurred a detailed study of critically sampled
filter banks. The discovery of quadrature mirror filters (QMF) by Croisier, Esteban
and Galand in 1976 [69], which allows a signal to be split into two downsampled
subband signals and then reconstructed without aliasing (spectral foldbacks) even
though nonideal filters are used, was a key step forward.
Perfect reconstruction filter banks, that is, subband decompositions, where the
signal is a perfect replica of the input, followed soon. The first orthogonal solution
was discovered by Smith and Barnwell [270, 271] and Mintzer [196] for the two-
channel case. Fettweis and coworkers [98] gave an orthogonal solution related
to wave digital filters [97]. Vaidyanathan, who established the relation between
these results and certain unitary operators (paraunitary matrices of polynomials)
studied in circuit theory [23], gave more general orthogonal solutions [305, 306]
as well as lattice factorizations for orthogonal filter banks [308, 310]. Biorthogonal
solutions were given by Vetterli [315], as well as multidimensional quadrature mirror
filters [314]. Biorthogonal filter banks, in particular with linear phase filters, were
investigated in [208, 321] and multidimensional filter banks were further studied in
[155, 163, 257, 264, 325]. Recent work includes filter banks with rational sampling
factors [166, 206] and filter banks with block sampling [158]. Additional work on
the design of filter banks has been done in [144, 205] among others.
In parallel to this work on filter banks, a generalization of block transforms
called lapped orthogonal transforms (LOT’s) was derived by Cassereau [43] and
Malvar [186, 188, 189]. An attractive feature of a subclass of LOT’s is the existence
of fast algorithms for their implementation since they are modulated filter banks
(similar to a “real” STFT). The connection of LOT’s with filter banks was shown in [321].
Multidimensional expansions and filter banks are derived in Section 3.6. Both
separable and nonseparable systems are considered. In the nonseparable case, the
focus is mostly on two-channel decompositions, while more general cases are indi-
cated as well.
Section 3.7 discusses a scheme that has received less attention in the filter bank
literature, but is nonetheless very important in applications, and is called a trans-
multiplexer. It is dual to the analysis/synthesis scheme used in compression appli-
cations, and is used in telecommunications.
The two appendices contain more details on orthogonal solutions and their fac-
torizations as well as on multidimensional sampling.
The material in this chapter covers filter banks at a level of detail which is
adequate for the remainder of the book. For a more exhaustive treatment of filter
banks, we refer the reader to the text by Vaidyanathan [308]. Discussions of fil-
ter banks and multiresolution signal processing are also contained in the book by
Akansu and Haddad [3].
where
    X[k] = ⟨ϕk[l], x[l]⟩ = Σ_l ϕk*[l] x[l],        (3.1.2)
is the transform of x[n]. The basis functions ϕk satisfy the orthonormality1 con-
straint
    ⟨ϕk[n], ϕl[n]⟩ = δ[k − l]
1
The first constraint is orthogonality between basis vectors. Then, normalization leads to
orthonormality. The terms “orthogonal” and “orthonormal” will often be used interchangeably,
unless we want to insist on the normalization and then use the latter.
3.1. SERIES EXPANSIONS OF DISCRETE-TIME SIGNALS 101
and the set of basis functions is complete, so that every signal from l2 (Z) can
be expressed using (3.1.1). An important property of orthonormal expansions is
conservation of energy,
    ‖x‖² = ‖X‖².
Biorthogonal expansions, on the other hand, are given as
    x[n] = Σ_{k∈Z} ⟨ϕk[l], x[l]⟩ ϕ̃k[n] = Σ_{k∈Z} X̃[k] ϕ̃k[n]        (3.1.3)
         = Σ_{k∈Z} ⟨ϕ̃k[l], x[l]⟩ ϕk[n] = Σ_{k∈Z} X[k] ϕk[n],
where
    X̃[k] = ⟨ϕk[l], x[l]⟩    and    X[k] = ⟨ϕ̃k[l], x[l]⟩
are the transform coefficients of x[n] with respect to {ϕ̃k} and {ϕk}. The dual bases {ϕk} and {ϕ̃k} satisfy the biorthogonality constraint
    ⟨ϕk[n], ϕ̃l[n]⟩ = δ[k − l].
Note that in this case, conservation of energy does not hold. For stability of the
expansion, the transform coefficients have to satisfy
    A Σ_k |X[k]|² ≤ ‖x‖² ≤ B Σ_k |X[k]|²
with a similar relation for the coefficients X̃[k]. In the biorthogonal case, conservation of energy can be expressed as
    ‖x‖² = ⟨x, x⟩ = Σ_{k∈Z} X[k] X̃*[k].
Finally, overcomplete expansions can be of the form (3.1.1) or (3.1.3), but with
redundant sets of functions, that is, the functions ϕk [n] used in the expansions are
not linearly independent.
    X^{(i)}[k] = Σ_{l=0}^{N−1} x^{(i)}[iN + l] e^{−j2πkl/N},    k = 0, 1, . . . , N − 1.        (3.1.8)
Reconstruction of x[n] from X (i) [k] is obvious. Recover x(i) [n] by inverting (3.1.8)
(see also (3.1.6)) and then get x[n] following (3.1.7) by juxtaposing the various
x(i) [n]. This leads to
    x[n] = Σ_{i=−∞}^{∞} Σ_{k=0}^{N−1} X^{(i)}[k] ϕk^{(i)}[n],
where
    ϕk^{(i)}[n] = (1/N) e^{j2πkn/N} for n = iN + l, l = 0, 1, . . . , N − 1, and 0 otherwise.
The ϕk^{(i)}[n] are simply the basis functions of the DFT shifted to the appropriate interval [iN, . . . , (i + 1)N − 1].
The above expansion is called a block discrete-time Fourier series, since the
signal is divided into blocks of size N , which are then Fourier transformed. In
matrix notation, the overall expansion of the transform is given by a block diagonal
matrix, where each block is an N × N Fourier matrix F N ,
    ⎛ ··· X^{(−1)} X^{(0)} X^{(1)} ··· ⎞ᵀ = diag( ··· , FN, FN, FN, ··· ) · ⎛ ··· x^{(−1)} x^{(0)} x^{(1)} ··· ⎞ᵀ,
and X^{(i)}, x^{(i)} are size-N vectors. Up to a scale factor of 1/√N (see (3.1.6)), this is
a unitary transform. This transform is not shift-invariant in general, that is, if x[n]
has transform X[k], then x[n − l] does not necessarily have the transform X[k − l].
However, it can be seen that
    x[n − lN] ←→ X[k − lN].
That is, the transform is periodically time-varying with period N.2
have achieved a certain time locality. Components of the signal that exist only in
an interval [iN . . . (i + 1)N − 1] will only influence transform coefficients in the same
interval. Finally, the basis functions in this block transform are naturally divided
into size-N subsets, with no overlaps between subsets, that is
    ⟨ϕk^{(i)}[n], ϕl^{(m)}[n]⟩ = 0,    i ≠ m,
simply because the supports of the basis functions are disjoint. This abrupt change
between intervals, and the fact that the interval length and position are arbitrary,
are the drawbacks of this block DTFS.
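The block DTFS and its period-N time variance are easy to exercise numerically; a minimal sketch (assuming numpy; the function name block_dfs is illustrative) applies the N-point DFT block by block and checks both the shift property and energy conservation up to the factor N:

```python
import numpy as np

N = 4
rng = np.random.default_rng(1)
x = rng.standard_normal(3 * N)          # three blocks of length N

def block_dfs(x, N):
    """Block discrete-time Fourier series: N-point DFT applied block by block."""
    return np.concatenate([np.fft.fft(x[i:i + N]) for i in range(0, len(x), N)])

X = block_dfs(x, N)

# A (circular) shift by a full block length N shifts the coefficients by N:
# the transform is periodically time-varying with period N.
x_shift = np.roll(x, N)
X_shift = block_dfs(x_shift, N)
```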
In this chapter, we will extend the idea of block transforms in order to address
these drawbacks, and this will be done using filter banks. But first, we turn our
attention to the simplest block transform case, when N = 2. This is followed by
the simplest filter bank case, when the filters are ideal sinc filters. The general case,
to which these are a prelude, lies between these extremes.
2
Another way to say this is that the "shift by N" and the size-N block transform operators commute.
It follows that the even-indexed basis functions are translates of each other, and so are the odd-indexed ones, or
    ϕ2k[n] = ϕ0[n − 2k],    ϕ2k+1[n] = ϕ1[n − 2k].        (3.1.11)
The transform is
    X[2k] = ⟨ϕ2k, x⟩ = (1/√2) (x[2k] + x[2k + 1]),        (3.1.12)
    X[2k + 1] = ⟨ϕ2k+1, x⟩ = (1/√2) (x[2k] − x[2k + 1]).        (3.1.13)
The reconstruction is obtained from
    x[n] = Σ_{k∈Z} X[k] ϕk[n],        (3.1.14)
as usual for an orthonormal basis. Let us prove that the set ϕk [n] given in (3.1.10)
is an orthonormal basis for l2 (Z). While the proof is straightforward in this simple
case, we indicate it for two reasons. First, it is easy to extend it to any block
transform, and second, the method of the proof can be used in more general cases
as well.
PROPOSITION 3.1
The set of functions as given in (3.1.10) is an orthonormal basis for signals
from l2 (Z).
PROOF
To check that the set of basis functions {ϕk}k∈Z indeed constitutes an orthonormal basis for signals from l2(Z), we have to verify that: (a) the functions are orthonormal, and (b) the set is complete in l2(Z).
Consider (a). We want to show that ⟨ϕk, ϕl⟩ = δ[k − l]. Take k even, k = 2i. Then, for l smaller than 2i or larger than 2i + 1, the inner product is automatically zero since the basis functions do not overlap. For l = 2i, we have
    ⟨ϕ2i, ϕ2i⟩ = ϕ2i²[2i] + ϕ2i²[2i + 1] = 1/2 + 1/2 = 1.
For l = 2i + 1, we get
    ⟨ϕ2i, ϕ2i+1⟩ = ϕ2i[2i] · ϕ2i+1[2i] + ϕ2i[2i + 1] · ϕ2i+1[2i + 1] = 0.
A similar argument can be followed for odd l’s, and thus, orthonormality is proven. Now
consider (b). We have to demonstrate that any signal belonging to l2 (Z) can be expanded
using (3.1.14). This is equivalent to showing that there exists no x[n] with ‖x‖ > 0 such that it has a zero expansion, that is, such that ⟨ϕk, x⟩ = 0 for all k. To prove this, suppose the contrary, that is, suppose that there exists an x[n] with ‖x‖ > 0 such that ⟨ϕk, x⟩ = 0 for all k. Thus
    ⟨ϕk, x⟩ = 0 for all k  ⟺  ‖X‖² = 0  ⟺  Σ_{k∈Z} |⟨ϕk[n], x[n]⟩|² = 0.        (3.1.15)
Since the last sum consists of nonnegative terms, (3.1.15) is possible if and only if each term is zero, that is, if X[k] = 0 for all k. First, take k even and consider X[2k] = 0. Because of (3.1.12), this means that x[2k] = −x[2k + 1] for all k. Now take the odd coefficients and look at X[2k + 1] = 0. From (3.1.13), it follows that x[2k] = x[2k + 1] for all k. Thus, the only solution satisfying both requirements is x[2k] = x[2k + 1] = 0, a contradiction with our assumption. This shows that there is no sequence x[n] with ‖x‖ > 0 such that X = 0, and proves completeness.
Now, we would like to show how the expansion (3.1.12–3.1.14) can be implemented
using convolutions, thus leading to filter banks. Consider the filter h0 [n] with the
following impulse response:
    h0[n] = 1/√2 for n = −1, 0, and 0 otherwise.        (3.1.16)
Note that this is a noncausal filter. Then, X[2k] in (3.1.12) is the result of the
convolution of h0 [n] with x[n] at instant 2k since
    h0[n] ∗ x[n] |_{n=2k} = Σ_{l∈Z} h0[2k − l] x[l] = (1/√2) x[2k] + (1/√2) x[2k + 1] = X[2k].
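This identification of the expansion coefficients with a filter-and-downsample operation can be verified numerically; a sketch assuming numpy (the one-sample advance in the code compensates for the noncausal tap at n = −1):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(8)

# Direct Haar transform coefficients, as in (3.1.12)-(3.1.13).
X_even = (x[0::2] + x[1::2]) / np.sqrt(2)   # X[2k]
X_odd  = (x[0::2] - x[1::2]) / np.sqrt(2)   # X[2k+1]

# Same coefficients by convolution with h0 and downsampling by 2.  With
# h0[-1] = h0[0] = 1/sqrt(2), we have (h0 * x)[n] = (x[n] + x[n+1])/sqrt(2).
h0 = np.array([1, 1]) / np.sqrt(2)          # taps at n = -1, 0
y = np.convolve(h0, x)[1:len(x) + 1]        # shift accounts for the tap at n = -1
y0 = y[0::2]                                # downsample by 2: y0[k] = X[2k]
```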
Figure 3.1 Two-channel filter bank with analysis filters h0 [n], h1 [n] and synthe-
sis filters g0 [n], g1 [n]. If the filter bank implements an orthonormal transform,
then g0 [n] = h0 [−n] and g1 [n] = h1 [−n]. (a) Block diagram. (b) Spectrum
splitting performed by the filter bank.
It is important to note that the impulse responses of the analysis filters are time-reversed versions of the basis functions,
    hi[n] = ϕi[−n],    i = 0, 1,
since convolution is an inner product involving time reversal. Also, the filters we
defined in (3.1.16) and (3.1.17) are noncausal, which is to be expected since, for
example, the computation of X[2k] in (3.1.12) involves x[2k + 1], that is, a future
sample. To summarize this discussion, it is easiest to visualize the analysis in matrix
notation as
    ⎛ ··· y0[0] y1[0] y0[1] y1[1] ··· ⎞ᵀ = ⎛ ··· X[0] X[1] X[2] X[3] ··· ⎞ᵀ
        = diag( ··· , A, A, ··· ) · ⎛ ··· x[0] x[1] x[2] x[3] ··· ⎞ᵀ,        (3.1.18)
with the 2 × 2 block
    A = ⎛ h0[0] h0[−1] ⎞
        ⎝ h1[0] h1[−1] ⎠
repeated along the diagonal; its rows contain the basis functions ϕ2k[n] and ϕ2k+1[n] on their support,
where we again see the shift property of the basis functions (see (3.1.11)). We can verify the shift invariance of the analysis with respect to even shifts. If x′[n] = x[n − 2l], then
    X′[2k] = (1/√2) (x′[2k] + x′[2k + 1]) = (1/√2) (x[2k − 2l] + x[2k + 1 − 2l]) = X[2k − 2l],
and similarly for X′[2k + 1], which equals X[2k + 1 − 2l], thus verifying (3.1.9).
This does not hold for odd shifts, however. For example, δ[n] has the transform (δ[n] + δ[n − 1])/√2, while δ[n − 1] leads to (δ[n] − δ[n − 1])/√2.
What about the synthesis or reconstruction given by (3.1.14)? Define two filters g0 and g1 with impulse responses equal to the basis functions ϕ0 and ϕ1, that is,
    gi[n] = ϕi[n],    i = 0, 1.        (3.1.19)
Therefore
ϕ2k [n] = g0 [n − 2k], ϕ2k+1 [n] = g1 [n − 2k], (3.1.20)
Using (3.1.20), the reconstruction (3.1.14) becomes
    x[n] = Σ_{k∈Z} y0[k] g0[n − 2k] + Σ_{k∈Z} y1[k] g1[n − 2k].        (3.1.21)
That is, each sample from yi [k] adds a copy of the impulse response of gi [n] shifted by
2k. This can be implemented by an upsampling by 2 (inserting a zero between every
two samples of yi [k]) followed by a convolution with gi [n] (see also Section 2.5.3).
This is shown in the right side of Figure 3.1(a), and is called a synthesis filter bank.
What we have just explained is a way of implementing a structured orthogonal
expansion by means of filter banks. We summarize two characteristics of the filters
which will hold in general orthogonal cases as well.
(a) The impulse responses of the synthesis filters equal the first set of basis func-
tions
gi [n] = ϕi [n], i = 0, 1.
(b) The impulse responses of the analysis filters are the time-reversed versions of
the synthesis ones
hi [n] = gi [−n], i = 0, 1.
What about the signal processing properties of our decomposition? From (3.1.12)
and (3.1.13), we recall that one channel computes the average and the other the
difference of two successive samples. While these are not the “best possible” lowpass and highpass filters (they have, however, good time localization), they lead to
an important interpretation. The reconstruction from y0 [k] (that is, the first sum
in (3.1.21)) is the orthogonal projection of the input onto the subspace spanned by
ϕ2k [n], that is, an average or coarse version of x[n]. Calling it x0 , it equals
    x0[2k] = x0[2k + 1] = (1/2) (x[2k] + x[2k + 1]).
The other sum in (3.1.21), which is the reconstruction from y1 [k], is the orthogonal
projection onto the subspace spanned by ϕ2k+1 [n]. Denoting it by x1 , it is given by
    x1[2k] = (1/2) (x[2k] − x[2k + 1]),    x1[2k + 1] = −x1[2k].
This is the difference or added detail necessary to reconstruct x[n] from its coarse
version x0 [n]. The two subspaces spanned by {ϕ2k } and {ϕ2k+1 } are orthogonal
and the sum of the two projections recovers x[n] perfectly, since summing (x0 [2k] +
x1 [2k]) yields x[2k] and similarly (x0 [2k + 1] + x1 [2k + 1]) gives x[2k + 1].
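These two projections are easy to compute and check numerically; a minimal sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(8)

# Coarse projection: both samples of a pair are replaced by their average.
x0 = np.repeat((x[0::2] + x[1::2]) / 2, 2)

# Detail projection: half the difference, with alternating sign within a pair.
d = (x[0::2] - x[1::2]) / 2
x1 = np.empty_like(x)
x1[0::2] = d
x1[1::2] = -d

# x0 lives in span{phi_2k}, x1 in span{phi_2k+1}; the subspaces are orthogonal
# and the two projections sum back to x.
```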
While the time reversal is only formal here (since g0 [n] is symmetric in n), the
shift by one is important for the completeness of the highpass and lowpass impulse
responses in the space of square-summable sequences.
Just as in the Haar case, the basis functions are obtained from the filter impulse responses and their even shifts,
    ϕ2k[n] = g0[n − 2k],    ϕ2k+1[n] = g1[n − 2k],        (3.1.25)
and the coefficients of the expansion, ⟨ϕ2k, x⟩ and ⟨ϕ2k+1, x⟩, are obtained by filtering with h0[n] and h1[n] followed by downsampling by 2, with hi[n] = gi[−n].
PROPOSITION 3.2
The set of functions as given in (3.1.25) is an orthonormal basis for signals
from l2 (Z).
PROOF
To prove that the set of functions ϕk [n] is indeed an orthonormal basis, again we would
have to demonstrate orthonormality of the set as well as completeness. Let us demonstrate
orthonormality of basis functions. We will do that only for
    ⟨ϕ2k[n], ϕ2l[n]⟩ = δ[k − l],        (3.1.26)
leaving the remaining cases as an exercise (Problem 3.1). First, because ϕ2k[n] = ϕ0[n − 2k], it suffices to show (3.1.26) for k = 0, or equivalently, to prove that
    ⟨ϕ0[n], ϕ0[n − 2k]⟩ = δ[k].
As we said, the filters in this case have perfect frequency resolution. However,
the decay of the filters in time is rather poor, being of the order of 1/n. The
multiresolution interpretation we gave for the Haar case holds here as well. The
perfect lowpass filter h0 , followed by downsampling, upsampling and interpolation
by g0 , leads to a projection of the signal onto the subspace of sequences bandlimited
to [−π/2, π/2], given by x0 . Similarly, the other path in Figure 3.1 leads to a
projection onto the subspace of half-band highpass signals given by x1 . The two
subspaces are orthogonal and their sum is l2 (Z). It is also clear that x0 is a coarse,
lowpass approximation to x, while x1 contains the additional frequencies necessary
to reconstruct x from x0 .
An example describing the decomposition of a signal into downsampled lowpass
and highpass components, with subsequent reconstruction using upsampling and
interpolation, is shown in Figure 3.2. Ideal half-band filters are assumed. The
reader is encouraged to verify this spectral decomposition using the downsampling
and upsampling formulas (see (2.5.13) and (2.5.17)) from Section 2.5.3.
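For a periodic (finite-length) signal, the ideal half-band split of Figure 3.2 can be mimicked with the DFT; a sketch assuming numpy, where zeroing DFT bins stands in for the ideal filters:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64
x = rng.standard_normal(n)
X = np.fft.fft(x)

# Ideal half-band split on the DFT grid: the lowpass keeps |w| < pi/2,
# i.e., normalized frequencies |k| < 1/4.
k = np.fft.fftfreq(n)
low = np.abs(k) < 0.25
x0 = np.fft.ifft(X * low).real    # projection onto half-band lowpass signals
x1 = np.fft.ifft(X * ~low).real   # projection onto half-band highpass signals

# Disjoint frequency supports make the two projections orthogonal,
# and their sum recovers x exactly.
```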
3.1.4 Discussion
In both the Haar and sinc cases above, we noticed that the expansion was not
time-invariant, but periodically time-varying. We show below that time invariance
in orthonormal expansions leads only to trivial solutions, and thus, any meaningful
orthonormal expansion of l2 (Z) will be time-varying.
PROPOSITION 3.3
An orthonormal time-invariant signal decomposition will have no frequency
resolution.
Figure 3.2 Two-channel decomposition of a signal using ideal filters. Left side depicts the process in the lowpass channel, while the right side depicts the process in the highpass channel. (a) Original spectrum. (b) Spectra after filtering. (c) Spectra after downsampling. (d) Spectra after upsampling. (e) Spectra after interpolation filtering. (f) Reconstructed spectrum.
PROOF
An expansion is time-invariant if x[n] ←→ X[k] implies x[n − m] ←→ X[k − m] for all x[n] in l2(Z). Thus, we have that
    ⟨ϕk[n], x[n − m]⟩ = ⟨ϕk−m[n], x[n]⟩.
By a change of variable, the left side is equal to ⟨ϕk[n + m], x[n]⟩, and then using k′ = k − m, we find that
    ϕk′+m[n + m] = ϕk′[n],        (3.1.29)
that is, the expansion operator is Toeplitz, and all basis functions are shifts of a single prototype, ϕk[n] = ϕ0[n − k]. Now, we want the expansion to be orthonormal; using (3.1.29), this means ⟨ϕ0[n], ϕ0[n − k]⟩ = δ[k], whose Fourier transform gives
    |Φ(e^{jω})|² = 1,
showing that the basis functions have no frequency selectivity since they are allpass func-
tions.
                   Haar                        Sinc
    g0[n]          (δ[n] + δ[n − 1])/√2        (1/√2) · sin((π/2)n) / ((π/2)n)
    g1[n]          (δ[n] − δ[n − 1])/√2        (−1)^n g0[−n + 1]
    G0(e^{jω})     √2 e^{−jω/2} cos(ω/2)       √2 for ω ∈ [−π/2, π/2], 0 otherwise
    G1(e^{jω})     √2 j e^{−jω/2} sin(ω/2)     −e^{−jω} G0(−e^{−jω})
chapter. We start with tools for analyzing general filter banks. Then, we examine
orthonormal and linear phase two-channel filter banks in more detail. We then
present results valid for general two-channel filter banks and examine some special
cases, such as IIR solutions.
Time-Domain Analysis Recall that in the Haar case (see (3.1.18)), in order to vi-
sualize block time invariance, we expressed the transform coefficients via an infinite
matrix, that is
    ⎛ ··· y0[0] y1[0] y0[1] y1[1] ··· ⎞ᵀ = ⎛ ··· X[0] X[1] X[2] X[3] ··· ⎞ᵀ = Ta · ⎛ ··· x[0] x[1] x[2] x[3] ··· ⎞ᵀ,        (3.2.1)
or, for short, y = X = Ta x.
Here, the transform coefficients X[k] are expressed in another form as well. In
the filter bank literature, it is more common to write X[k] as outputs of the two
branches in Figure 3.1(a), that is, as two subband outputs denoted by y0 [k] = X[2k],
and y1 [k] = X[2k + 1]. Also, in (3.2.1), T a · x represents the inner products, where
T a is the analysis matrix and can be expressed as
    Ta = ⎛  ···                                                   ⎞
         ⎜ h0[L−1]  h0[L−2]  h0[L−3]  ···  h0[0]    0       0     ⎟
         ⎜ h1[L−1]  h1[L−2]  h1[L−3]  ···  h1[0]    0       0     ⎟
         ⎜    0        0     h0[L−1]  ···  h0[2]  h0[1]   h0[0]   ⎟
         ⎜    0        0     h1[L−1]  ···  h1[2]  h1[1]   h1[0]   ⎟
         ⎝  ···                                                   ⎠ ,
where we assume that the analysis filters hi [n] are finite impulse response (FIR)
filters of length L = 2K. To make the block Toeplitz structure of T a more explicit,
we can write
    Ta = ⎛ ···                                   ⎞
         ⎜ ···  A0   A1   ···  AK−1    0    ···  ⎟        (3.2.2)
         ⎜ ···   0   A0   ···  AK−2  AK−1   ···  ⎟
         ⎝ ···                                   ⎠ .
    x = Ts y = Ts X = Ts Ta x.        (3.2.6)
3.2. TWO-CHANNEL FILTER BANKS 115
where the block Si is of size 2 × 2 and the FIR filters are of length L = 2K. The block Si is
    Si = ⎛ g0[2i]     g1[2i]     ⎞
         ⎝ g0[2i + 1] g1[2i + 1] ⎠ ,
where g0[n] and g1[n] are the synthesis filters. The dual synthesis basis functions are
    ϕ̃2k[n] = g0[n − 2k],    ϕ̃2k+1[n] = g1[n − 2k].
Let us go back for a moment to (3.2.6). The requirement that {h0[2k − n], h1[2k − n]} and {g0[n − 2k], g1[n − 2k]} form a pair of dual bases is equivalent to
T s T a = T a T s = I. (3.2.8)
This is the biorthogonality condition or, in the filter bank literature, the perfect
reconstruction condition. In other words,
Consider the two branches in Figure 3.1(a) which produce y0 and y1 . Call H i the
operator corresponding to filtering by hi [n] followed by downsampling by 2. Then
(G0 H 0 + G1 H 1 ) x.
Thus, to resynthesize the signal (the condition for perfect reconstruction), we have
that
G0 H 0 + G1 H 1 = I.
Of course, by interleaving the rows of H 0 and H 1 , we get T a , and similarly, T s
corresponds to interleaving the columns of G0 and G1 .
To summarize this part on time-domain analysis, let us stress once more that
biorthogonal expansions of discrete-time signals, where the basis functions are obtained from two prototype functions and their even shifts (for both dual bases), are implemented using a perfect reconstruction, two-channel multirate filter bank. In
other words, perfect reconstruction is equivalent to the biorthogonality condition
(3.2.8).
Completeness is also automatically satisfied. To prove it, we show that there exists no x[n] with ‖x‖ > 0 such that it has a zero expansion, that is, such that ‖X‖ = 0. Suppose the contrary, that is, suppose that there exists an x[n] with ‖x‖ > 0 such that ‖X‖ = 0. But, since X = Ta x, we have that ‖Ta x‖ = 0, or
    Ta x = 0        (3.2.10)
(since in a Hilbert space, l2(Z) in this case, ‖v‖² = ⟨v, v⟩ = 0 if and only
if v ≡ 0). We know that (3.2.10) has a nontrivial solution if and only if T a is
singular. However, due to (3.2.8), T a is nonsingular and thus (3.2.10) has only a
trivial solution, x ≡ 0, violating our assumption and proving completeness.
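For FIR filters, a finite (block-diagonal) version of the matrices Ta and Ts makes the condition Ts Ta = Ta Ts = I easy to verify; a sketch for the Haar filters, assuming numpy (signal length M and the construction below are illustrative choices):

```python
import numpy as np

s = 1 / np.sqrt(2)
h0 = np.array([s, s])        # lowpass analysis filter (Haar)
h1 = np.array([s, -s])       # highpass analysis filter

M = 8                        # signal length (even)
Ta = np.zeros((M, M))
for k in range(M // 2):
    # Rows 2k and 2k+1 hold the time-reversed filters, shifted by 2k.
    Ta[2 * k,     2 * k:2 * k + 2] = h0[::-1]
    Ta[2 * k + 1, 2 * k:2 * k + 2] = h1[::-1]

Ts = Ta.T                    # orthonormal case: synthesis matrix = transpose
```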
In the above, H m (z) is the analysis modulation matrix containing the modulated
versions of the analysis filters and xm (z) contains the modulated versions of X(z).
Relation (3.2.14) is illustrated in Figure 3.3, where the time-varying part is in
the lower channel. If the channel signals Y0 (z) and Y1 (z) are desired, that is, the
downsampled domain signals, it follows from (3.2.11) and (3.2.14) that
    ⎛ Y0(z) ⎞         ⎛ H0(z^{1/2})  H0(−z^{1/2}) ⎞ ⎛ X(z^{1/2})  ⎞
    ⎝ Y1(z) ⎠ = (1/2) ⎝ H1(z^{1/2})  H1(−z^{1/2}) ⎠ ⎝ X(−z^{1/2}) ⎠ ,
The above two conditions then ensure perfect reconstruction. Expressing (3.2.15) and (3.2.16) in matrix notation, we get
    ⎛ H0(z)   H1(z)  ⎞ ⎛ G0(z) ⎞     ⎛ 2 ⎞
    ⎝ H0(−z)  H1(−z) ⎠ ⎝ G1(z) ⎠  =  ⎝ 0 ⎠ .        (3.2.17)
We can now solve for G0(z) and G1(z) (transpose (3.2.17) and multiply by (Hmᵀ(z))^{−1} from the left):
    ⎛ G0(z) ⎞        2        ⎛  H1(−z) ⎞
    ⎝ G1(z) ⎠ = ──────────── ⎝ −H0(−z) ⎠ .        (3.2.18)
                 det(Hm(z))
In the above, we assumed that H m (z) is nonsingular; that is, its normal rank is
equal to 2. Define P(z) as
    P(z) = G0(z) H0(z) = (2 / det(Hm(z))) H0(z) H1(−z),        (3.2.19)
where we used (3.2.18). Observe that det(H m (z)) = − det(H m (−z)). Then, we
can express the product G1 (z)H1 (z) as
    G1(z) H1(z) = (−2 / det(Hm(z))) H0(−z) H1(z) = P(−z).
We will show later, that the function P (z) plays a crucial role in analyzing and
designing filter banks. It suffices to note at this moment that, due to (3.2.20), all
even-indexed coefficients of P (z) equal 0, except for p[0] = 1. Thus, P (z) is of the
following form:
    P(z) = 1 + Σ_{k∈Z} p[2k + 1] z^{−(2k+1)}.
or equivalently,
    Σ_{k∈Z} g0[k] h0[2n − k] = δ[n],
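This double-shift condition can be checked as an autocorrelation sampled at even lags; a sketch assuming numpy, using the Daubechies length-4 lowpass filter as an illustrative orthogonal g0 (any orthogonal filter would do):

```python
import numpy as np

r3 = np.sqrt(3)
# Daubechies D4 lowpass filter, a standard orthogonal example.
g0 = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * np.sqrt(2))

# With h0[n] = g0[-n], the condition  sum_k g0[k] h0[2n - k] = delta[n]
# is the autocorrelation of g0 sampled at the even lags.
p = np.convolve(g0, g0[::-1])            # autocorrelation; lag m - (len(g0) - 1)
lags = np.arange(len(p)) - (len(g0) - 1)
even = p[lags % 2 == 0]                  # lags -2, 0, 2  ->  should be 0, 1, 0
```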
Similarly, starting from (3.2.15) or (3.2.16) and expressing G0 (z) and H0 (z) as
a function of G1 (z) and H1 (z) would lead to the other biorthogonality relations,
namely
Note that we obtained these relations for ϕ̃0 and ϕ̃1 but they hold also for ϕ̃2l and
ϕ̃2l+1 , respectively. This shows once again that perfect reconstruction implies the
biorthogonality conditions. The converse can be shown as well, demonstrating the
equivalence of the two conditions.
Figure 3.4 Polyphase-domain view of the two-channel filter bank. (a) Forward and inverse polyphase transform. (b) Analysis filter bank in polyphase form (matrix Hp). (c) Synthesis filter bank in polyphase form (matrix Gp).
where Hij is the jth polyphase component of the ith filter, or, following (2.5.22–
2.5.23),
    Hi(z) = Hi0(z²) + z Hi1(z²).
In (3.2.22) y(z) contains the signals in the middle of the system in Figure 3.1(a).
H p (z) contains the polyphase components of the analysis filters, and is conse-
quently denoted the analysis polyphase matrix, while xp (z) contains the polyphase
components of the input signal or, following (2.5.20),
    X(z) = X0(z²) + z^{−1} X1(z²).
where
    Gi(z) = Gi0(z²) + z^{−1} Gi1(z²).        (3.2.24)
The synthesis filter polyphase components are defined such as those of the signal
(2.5.20–2.5.21), or in reverse order of those of the analysis filters. In Figure 3.4(c),
we show how the output signal is synthesized from the channel signals Y0 and Y1 as
    X̂(z) = ( 1  z^{−1} ) ⎛ G00(z²)  G10(z²) ⎞ ⎛ Y0(z²) ⎞ ,        (3.2.25)
                          ⎝ G01(z²)  G11(z²) ⎠ ⎝ Y1(z²) ⎠
where the matrix is Gp(z²) and the vector of channel signals is y(z²).
This equation reflects that the channel signals are first upsampled by 2 (leading to
Yi (z 2 )) and then filtered by filters Gi (z) which can be written as in (3.2.24). Note
that the matrix-vector product in (3.2.25) is in z 2 and can thus be implemented
before the upsampler by 2 (replacing z 2 by z) as shown in the figure.
Note the duality between the analysis and synthesis filter banks. The former
uses a forward, the latter an inverse polyphase transform, and Gp (z) is a transpose
of H p (z). The phase reversal in the definition of the polyphase components in
analysis and synthesis comes from the fact that z and z −1 are dual operators, or,
on the unit circle, ejω = (e−jω )∗ .
Obviously the transfer function between the forward and inverse polyphase
transforms defines the analysis/synthesis filter bank. This transfer polyphase matrix
is given by
T p (z) = Gp (z) H p (z).
In order to find the input-output relationship, we use (3.2.22) as input to (3.2.25),
which yields
X̂(z) = ( 1 z −1 ) Gp (z 2 ) H p (z 2 ) xp (z 2 ),
= ( 1 z −1 ) T p (z 2 ) xp (z 2 ). (3.2.26)
If T_p(z) = I, then \hat{X}(z) = X_0(z^2) + z^{-1} X_1(z^2) = X(z) following (2.5.20); that is, the analysis/synthesis filter bank achieves perfect reconstruction with no delay and is equivalent to Figure 3.4(a).
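As a sanity check of the two-channel analysis/synthesis structure, here is a small NumPy sketch (our own illustration, not from the text) using the causal Haar pair; with causal filters, aliasing cancels and the output is the input delayed by one sample.

```python
import numpy as np

# Two-channel analysis/synthesis (Figure 3.1(a)) with the causal Haar
# filters.  With this causal pair, aliasing cancels and the output is
# the input delayed by one sample: xhat[n] = x[n-1].
rng = np.random.default_rng(0)
x = rng.standard_normal(64)

s = 1 / np.sqrt(2)
h0, h1 = np.array([s, s]), np.array([s, -s])   # analysis lowpass/highpass
g0, g1 = np.array([s, s]), np.array([-s, s])   # synthesis pair cancelling aliasing

def analyze(h, x):
    return np.convolve(h, x)[::2]              # filter, keep even-indexed samples

def synthesize(g, y, n):
    u = np.zeros(2 * len(y))
    u[::2] = y                                 # upsample by 2
    return np.convolve(g, u)[:n]

y0, y1 = analyze(h0, x), analyze(h1, x)
xhat = synthesize(g0, y0, len(x)) + synthesize(g1, y1, len(x))

assert np.allclose(xhat[1:], x[:-1])           # perfect reconstruction, delay 1
```

The choice G_0(z) = H_0(z), G_1(z) = -H_1(z) is what makes the X(-z) terms cancel in the sum of the two channels.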
The polyphase components of the signal can be expressed in terms of its modulated versions as
\[
x_p(z^2) \;=\; \begin{pmatrix} X_0(z^2) \\ X_1(z^2) \end{pmatrix} \;=\;
\frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & z \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} X(z) \\ X(-z) \end{pmatrix},
\qquad (3.2.27)
\]
thus relating polyphase and modulation representations of the signal, that is, x_p(z) and x_m(z). For the analysis filter bank, we have that
\[
\begin{pmatrix} H_{00}(z^2) & H_{01}(z^2) \\ H_{10}(z^2) & H_{11}(z^2) \end{pmatrix}
\;=\; \frac{1}{2}
\begin{pmatrix} H_0(z) & H_0(-z) \\ H_1(z) & H_1(-z) \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix},
\qquad (3.2.28)
\]
establishing the relationship between H p (z) and H m (z). Finally, following the
definition of Gp (z) in (3.2.23) and similarly to (3.2.28) we have
\[
\begin{pmatrix} G_{00}(z^2) & G_{10}(z^2) \\ G_{01}(z^2) & G_{11}(z^2) \end{pmatrix}
\;=\; \frac{1}{2}
\begin{pmatrix} 1 & 0 \\ 0 & z \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix},
\qquad (3.2.29)
\]
Again, note that (3.2.28) is the transpose of (3.2.29), with a phase change in the
diagonal matrix. The change from the polyphase to the modulation representation
3.2. TWO-CHANNEL FILTER BANKS 123
(and vice versa) involves not only a diagonal matrix with a delay (or phase factor),
but also a sum and/or a difference operation (see the middle matrix in (3.2.27–
3.2.29)). This is actually a size-2 Fourier transform, as will become clear in cases
of higher dimension.
The relation between time domain and polyphase domain is most obvious for
the synthesis filters gi , since their impulse responses correspond to the first basis
functions ϕi . Consider the time-domain synthesis matrix, and create a matrix T s (z)
\[
T_s(z) \;=\; \sum_{i=0}^{K-1} S_i\, z^{-i},
\]
where S_i are the successive 2 × 2 blocks along a column of the block Toeplitz matrix (there are K of them for length-2K filters), or
\[
S_i \;=\; \begin{pmatrix} g_0[2i] & g_1[2i] \\ g_0[2i+1] & g_1[2i+1] \end{pmatrix}.
\]
Similarly, for the analysis bank, define
\[
T_a(z) \;=\; \sum_{i=0}^{K-1} A_i\, z^{-i},
\]
where
\[
A_i \;=\; \begin{pmatrix} h_0[2(K-i)-1] & h_0[2(K-i)-2] \\ h_1[2(K-i)-1] & h_1[2(K-i)-2] \end{pmatrix},
\]
K being the number of 2 × 2 blocks in a row of the block Toeplitz matrix. The
above relations can be used to establish equivalences between results in the various
representations (see also Theorem 3.7 below).
In the filter bank language, perfect reconstruction means that the output is a
delayed and possibly scaled version of the input,
X̂(z) = cz −k X(z).
This is equivalent to saying that, up to a shift and scale, the impulse responses of the
analysis filters (with time reversal) and of the synthesis filters form a biorthogonal
basis.
Among approximate reconstructions, the most important one is alias-free re-
construction. Remember that because of the periodic time-variance of analy-
sis/synthesis filter banks, the output is both a function of x[n] and its modulated
version (−1)n x[n], or X(z) and X(−z) in the z-transform domain. The aliased
component X(−z) can be very disturbing in applications and thus cancellation of
aliasing is of prime importance. In particular, aliasing represents a nonharmonic
distortion (new sinusoidal components appear which are not harmonically related
to the input) and this is particularly disturbing in audio applications.
What follows now, are results on alias cancellation and perfect reconstruction
for the two-channel case. Note that all the results are valid for a general, N -channel
case as well (substitute N for 2 in statements and proofs).
For the first result, we need to introduce pseudocirculant matrices [311]. These
are N × N circulant matrices with elements Fij (z), except that the lower triangular
elements are multiplied by z, that is
\[
F_{ij}(z) \;=\; \begin{cases} F_{0,\,j-i}(z), & j \ge i, \\[2pt] z \cdot F_{0,\,N+j-i}(z), & j < i. \end{cases}
\]
P ROPOSITION 3.4
Aliasing in a one-dimensional subband coding system will be cancelled if and
only if the transfer polyphase matrix T p is pseudocirculant [311].
P ROOF
Consider a 2 × 2 pseudocirculant matrix
\[
T_p(z) \;=\; \begin{pmatrix} F_0(z) & F_1(z) \\ z F_1(z) & F_0(z) \end{pmatrix},
\]
A corollary to Proposition 3.4 is that for perfect reconstruction, the transfer function matrix has to be a pseudocirculant delay; that is, for an even delay 2k,
\[
T_p(z) \;=\; z^{-k} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
while for an odd delay 2k + 1,
\[
T_p(z) \;=\; z^{-k-1} \begin{pmatrix} 0 & 1 \\ z & 0 \end{pmatrix}.
\]
The next result indicates when aliasing can be cancelled for a given analysis filter
bank. Since the analysis and synthesis filter banks play dual roles, the result that
we will discuss holds for synthesis filter banks as well.
P ROPOSITION 3.5
Given a two-channel filter bank downsampled by 2 with the polyphase matrix
H p (z), then alias-free reconstruction is possible if and only if the determinant
of H p (z) is not identically zero, that is, H p (z) has normal rank 2.
P ROOF
Choose the synthesis matrix as the adjugate of H_p(z),
\[
G_p(z) \;=\; \begin{pmatrix} H_{11}(z) & -H_{01}(z) \\ -H_{10}(z) & H_{00}(z) \end{pmatrix},
\]
resulting in
\[
T_p(z) \;=\; G_p(z)\, H_p(z) \;=\; \det\big(H_p(z)\big) \cdot I,
\]
which is pseudocirculant, and thus cancels aliasing. If, on the other hand, the system is
alias-free, then we know (see Proposition 3.4) that T p (z) is pseudocirculant and therefore
has full rank 2. Since the rank of a matrix product is bounded above by the ranks of its
terms, H p (z) has rank 2.4
Often, one is interested in perfect reconstruction filter banks where all filters
involved have a finite impulse response (FIR). Again, analysis and synthesis filter
banks play the same role.
⁴ Note that we excluded the case of zero reconstruction, even if technically it is also aliasing-free (but of zero interest!).
P ROPOSITION 3.6
Given a critically sampled FIR analysis filter bank, perfect reconstruction
with FIR filters is possible if and only if det(H p (z)) is a pure delay.
P ROOF
Suppose that the determinant of H_p(z) is a pure delay, det(H_p(z)) = c z^{-l}, and choose
\[
G_p(z) \;=\; z^{l}\, c^{-1} \begin{pmatrix} H_{11}(z) & -H_{01}(z) \\ -H_{10}(z) & H_{00}(z) \end{pmatrix}.
\]
It is obvious that the above choice leads to perfect reconstruction with FIR filters. Suppose,
on the other hand, that we have perfect reconstruction with FIR filters. Then, T p (z) has
to be a pseudocirculant shift (corollary below Proposition 3.4), or
\[
\det\big(G_p(z)\big) \cdot \det\big(H_p(z)\big) \;=\; \pm z^{-l},
\]
meaning that it has l poles at z = 0. Since the synthesis has to be FIR as well, det(G_p(z)) has only zeros (or poles at the origin). Therefore, det(H_p(z)) cannot have any zeros (except possibly at the origin or ∞).
If det(H p (z)) has no zeros, neither does det(H m (z)) (because of (3.2.28) and
assuming FIR filters). Since det(H_m(z)) is an odd function of z, it is of the form c z^{-(2k+1)}.
be a row-vector with only the first component different from zero. One could expand
( G0 (z) G1 (z) ) into a matrix Gm (z) by modulation, that is
\[
G_m(z) \;=\; \begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix}.
\qquad (3.2.32)
\]
The matrix T m (z) is sometimes called the aliasing cancellation matrix [272].
Let us for a moment return to (3.2.14). As we said, X(−z) is the aliased version
of the signal. A necessary and sufficient condition for aliasing cancellation is that
\[
G_0(z) H_0(-z) + G_1(z) H_1(-z) \;=\; 0.
\qquad (3.2.33)
\]
The solution proposed by Croisier, Esteban, and Galand [69] is known under the name QMF (quadrature mirror filters), which cancels aliasing in a two-channel filter bank:
\[
H_1(z) = H_0(-z), \qquad G_0(z) = H_0(z), \qquad G_1(z) = -H_0(-z).
\]
Substituting the above into (3.2.33) leads to H_0(z)H_0(-z) - H_0(-z)H_0(z) = 0, and
aliasing is indeed cancelled. In order to achieve perfect reconstruction, the following has to be satisfied:
\[
H_0^2(z) - H_0^2(-z) \;=\; 2 z^{-l}.
\qquad (3.2.37)
\]
Note that the left side is an odd function of z, and thus, l has to be odd. The above relation explains the name QMF. On the unit circle, H_0(-z) = H_0(e^{j(\omega+\pi)}) is the mirror image of H_0(z), and both the filter and its mirror image are squared. For FIR
filters, the condition (3.2.37) cannot be satisfied exactly except for the Haar filters
√
introduced in Section 3.1. Taking a causal Haar filter, or H0 (z) = (1 + z −1 )/ 2,
(3.2.37) becomes
\[
\frac{1}{2}\,(1 + 2z^{-1} + z^{-2}) \;-\; \frac{1}{2}\,(1 - 2z^{-1} + z^{-2}) \;=\; 2z^{-1}.
\]
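The same check can be scripted (an illustrative sketch; coefficient arrays are indexed by ascending powers of z^{-1}, so polynomial products are convolutions):

```python
import numpy as np

# QMF perfect reconstruction condition H0(z)^2 - H0(-z)^2 = 2 z^-1 for the
# causal Haar filter.  Arrays hold coefficients of z^0, z^-1, ...;
# polynomial products are convolutions.
h0 = np.array([1.0, 1.0]) / np.sqrt(2)
h0_mod = h0 * np.array([1.0, -1.0])        # H0(-z): negate odd powers of z^-1

lhs = np.polymul(h0, h0) - np.polymul(h0_mod, h0_mod)
assert np.allclose(lhs, [0.0, 2.0, 0.0])   # equals 2 z^-1
```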
For longer linear phase filters, (3.2.37) can only be approximated (see Section 3.2.4).
T HEOREM 3.7
In a two-channel, biorthogonal, real-coefficient filter bank, the following are
equivalent:
(b) G0 (z)H0 (z) + G1 (z)H1 (z) = 2, and G0 (z)H0 (−z) + G1 (z)H1 (−z) = 0.
(c) T s · T a = T a · T s = I.
The proof follows from the equivalences between the various representations intro-
duced in this section and is left as an exercise (see Problem 3.4). Note that we are
assuming a critically sampled filter bank. Thus, the matrices in points (c)–(e) are
square, and left inverses are also right inverses.
or, the even shifts of synthesis filters (even shifts of time-reversed analysis filters).
We will show here that (3.2.38–3.2.40) describe orthonormal expansions, in the
general case.
⁵ The term orthogonal is often used, especially for the associated filters or filter banks. For filter banks, the term unitary or paraunitary is also often used, as well as the notion of losslessness (see Appendix 3.A).
Orthonormality in Time Domain Start with a general filter bank as given in Fig-
ure 3.1(a). Impose orthonormality on the expansion, that is, the dual basis {ϕ̃k [n]}
becomes identical to {ϕk [n]}. In filter bank terms, the dual basis — synthesis filters
— now becomes
{g0 [n−2k], g1 [n−2k]} = {ϕ̃k [n]} = {ϕk [n]} = {h0 [2k −n], h1 [2k −n]}, (3.2.41)
or,
gi [n] = hi [−n], i = 0, 1. (3.2.42)
Thus, we have encountered the first important consequence of orthonormality: The
synthesis filters are the time-reversed versions of the analysis filters. Also, since
(3.2.41) holds and ϕ_k is an orthonormal set, the following are the orthogonality relations for the synthesis filters:
\[
\langle g_i[n],\, g_j[n+2m] \rangle \;=\; \delta[i-j]\,\delta[m], \qquad i, j = 0, 1,
\qquad (3.2.43)
\]
with a similar relation for the analysis filters. We call this an orthonormal filter bank.
Let us now see how orthonormality can be expressed using matrix notation.
First, substituting the expression for gi [n] given by (3.2.42) into the synthesis matrix
T s given in (3.2.7), we see that
\[
T_s \;=\; T_a^T,
\]
and thus perfect reconstruction becomes
\[
T_s\, T_a \;=\; T_a^T\, T_a \;=\; I.
\qquad (3.2.44)
\]
That is, the above condition means that the matrix T a is unitary. Because it is
full rank, the product commutes and we have also T a T Ta = I. Thus, having an
orthonormal basis, or perfect reconstruction with an orthonormal filter bank, is
equivalent to the analysis matrix T a being unitary.
If we separate the outputs now as was done in (3.2.9), and note that G_i = H_i^T, then (3.2.44) implies
\[
H_i\, H_j^T \;=\; \delta[i-j]\; I, \qquad i, j = 0, 1.
\]
Now, the output of one channel in Figure 3.1(a) (filtering, downsampling, upsam-
pling and filtering) is equal to
M i = H Ti H i .
In terms of the blocks A_i, these conditions become
\[
\sum_{i=0}^{K-1} A_i^T A_i \;=\; I, \qquad
\sum_{i=0}^{K-1-j} A_{i+j}^T A_i \;=\; 0, \quad j = 1, \ldots, K-1.
\]
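These block relations can be verified numerically. The sketch below (our own illustration) checks them on the synthesis blocks S_i of the length-4 Daubechies filter bank of Example 3.2; since g_i[n] = h_i[-n] by (3.2.42), the analysis blocks A_i satisfy the same relations.

```python
import numpy as np

# Block orthonormality, checked on the synthesis blocks S_i built from the
# length-4 Daubechies filters of Example 3.2 (K = 2); since gi[n] = hi[-n],
# the analysis blocks A_i satisfy the same relations.
c, r3 = 1 / (4 * np.sqrt(2)), np.sqrt(3)
g0 = c * np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3])    # lowpass (3.2.59)
g1 = c * np.array([1 - r3, r3 - 3, 3 + r3, -1 - r3])   # highpass via modulation

S = [np.array([[g0[2 * i], g1[2 * i]],
               [g0[2 * i + 1], g1[2 * i + 1]]]) for i in range(2)]

assert np.allclose(S[0].T @ S[0] + S[1].T @ S[1], np.eye(2))  # sum S_i^T S_i = I
assert np.allclose(S[1].T @ S[0], 0.0)                        # shifted blocks orthogonal
```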
Using the same arguments for the other cases in (3.2.43), we also have that
\[
G_0(z)G_0(z^{-1}) + G_0(-z)G_0(-z^{-1}) \;=\; 2, \qquad
G_1(z)G_1(z^{-1}) + G_1(-z)G_1(-z^{-1}) \;=\; 2,
\qquad (3.2.46\text{--}3.2.47)
\]
\[
G_0(z)G_1(z^{-1}) + G_0(-z)G_1(-z^{-1}) \;=\; 0.
\qquad (3.2.48)
\]
On the unit circle, (3.2.46–3.2.47) become (use G(e^{-j\omega}) = G^*(e^{j\omega}) since the filter has real coefficients)
\[
|G_i(e^{j\omega})|^2 + |G_i(e^{j(\omega+\pi)})|^2 \;=\; 2, \qquad i = 0, 1,
\]
that is, the filter and its modulated version are power complementary (their magnitudes squared sum up to a constant). Since this condition was used in [270]
for designing the first orthogonal filter banks, it is also called the Smith-Barnwell
condition. Writing (3.2.46–3.2.48) in matrix form,
\[
\begin{pmatrix} G_0(z^{-1}) & G_0(-z^{-1}) \\ G_1(z^{-1}) & G_1(-z^{-1}) \end{pmatrix}
\begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix}
\;=\;
\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},
\qquad (3.2.50)
\]
that is, using the synthesis modulation matrix G_m(z) (see (3.2.32)),
\[
G_m^T(z^{-1})\, G_m(z) \;=\; 2 I.
\qquad (3.2.51)
\]
Since gi and hi are identical up to time reversal, a similar relation holds for the
analysis modulation matrix H m (z) (up to a transpose), or H m (z −1 ) H Tm (z) = 2I.
A matrix satisfying (3.2.51) is called paraunitary (note that we have assumed
that the filter coefficients are real). If all its entries are stable (which they are in this
case, since we assumed the filters to be FIR), then such a matrix is called lossless.
The concept of losslessness comes from classical circuit theory [23, 308] and is
discussed in more detail in Appendix 3.A. It suffices to say at this point that having
a lossless transfer matrix is equivalent to the filter bank implementing an orthogonal
transform. Concentrating on lossless modulation matrices, we can continue our
analysis of orthogonal systems in the modulation domain. First, from (3.2.50) we
can see that ( G1 (z −1 ) G1 (−z −1 ) )T has to be orthogonal to ( G0 (z) G0 (−z) )T .
It will be proven in Appendix 3.A (although in polyphase domain), that this implies
that the two filters G_0(z) and G_1(z) are related as follows:
\[
G_1(z) \;=\; -z^{-2K+1}\, G_0(-z^{-1}).
\qquad (3.2.52)
\]
In the polyphase domain, orthonormality reads
\[
G_p^T(z^{-1})\, G_p(z) \;=\; I,
\qquad (3.2.53)
\]
where we used (3.2.51). Since (3.2.53) also implies G_p(z) G_p^T(z^{-1}) = I (left inverse
is also right inverse), it is clear that given a paraunitary Gp (z) corresponding to
an orthogonal synthesis filter bank, we can choose the analysis filter bank with a
polyphase matrix H p (z) = GTp (z −1 ) and get perfect reconstruction with no delay.
(c) T Ts T s = T s T Ts = I, T a = T Ts .
Again, we used the fact that the left inverse is also the right inverse in a square
matrix in relations (c), (d) and (e). The proof follows from the relations between
the various representations, and is left as an exercise (see Problem 3.7). Note that
the theorem holds in more general cases as well. In particular, the filters do not have
to be restricted to be FIR, and if their coefficients are complex valued, transposes
have to be hermitian transposes (in the case of Gm and Gp , only the coefficients of
the filters have to be conjugated, not z since z −1 plays that role).
Because all filters are related to a single prototype satisfying (a) or (b), the
other filter in the synthesis filter bank follows by modulation, time reversal and an
odd shift (see (3.2.52)). The filters in the analysis are simply time-reversed versions
of the synthesis filters. In the FIR case, the length of the filters is even. Let us
formalize these statements:
C OROLLARY 3.9
In a two-channel, orthonormal, FIR, real-coefficient filter bank, the following
hold:
|G0 (ejω )|2 +|G0 (ej(ω+π) )|2 = 2, |G0 (ejω )|2 +|G1 (ejω )|2 = 2. (3.2.54)
(c) The highpass filter is specified (up to an even shift and a sign change)
by the lowpass filter as
G_1(z) = -z^{-2K+1} G_0(-z^{-1}).
(d) If the lowpass filter has a zero at π, that is, G_0(-1) = 0, then
\[
G_0(1) \;=\; \sqrt{2}.
\qquad (3.2.55)
\]
Also, an orthogonal filter bank has, as any orthogonal transform, an energy conser-
vation property:
P ROPOSITION 3.10
In an orthonormal filter bank, that is, a filter bank with a unitary polyphase
or modulation matrix, the energy is conserved between the input and the
channel signals,
\[
\|x\|^2 \;=\; \|y_0\|^2 + \|y_1\|^2.
\qquad (3.2.56)
\]
P ROOF
The energy of the subband signals equals
\[
\|y_0\|^2 + \|y_1\|^2 \;=\; \frac{1}{2\pi} \int_0^{2\pi} \left( |Y_0(e^{j\omega})|^2 + |Y_1(e^{j\omega})|^2 \right) d\omega,
\]
by Parseval’s relation (2.4.37). Using the fact that y(z) = H_p(z) x_p(z), the right side can be written as
\[
\frac{1}{2\pi} \int_0^{2\pi} y^*(e^{j\omega})\, y(e^{j\omega})\, d\omega
\;=\;
\frac{1}{2\pi} \int_0^{2\pi} x_p^*(e^{j\omega})\, H_p^*(e^{j\omega})\, H_p(e^{j\omega})\, x_p(e^{j\omega})\, d\omega
\]
\[
\;=\;
\frac{1}{2\pi} \int_0^{2\pi} x_p^*(e^{j\omega})\, x_p(e^{j\omega})\, d\omega
\;=\;
\|x_0\|^2 + \|x_1\|^2.
\]
We used the fact that H_p(e^{j\omega}) is unitary and Parseval’s relation. Finally, (3.2.56) follows from the fact that the energy of the signal is equal to the sum of the energies of its polyphase components, \|x\|^2 = \|x_0\|^2 + \|x_1\|^2.
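A numerical illustration of Proposition 3.10 (a sketch with an arbitrary test signal; full convolution keeps every nonzero subband sample, so no energy is lost at the boundaries):

```python
import numpy as np

# Energy conservation (3.2.56) for the length-4 Daubechies filter bank,
# on a random finite-length input.  Full convolution keeps every nonzero
# subband sample, so no energy is lost at the boundaries.
c, r3 = 1 / (4 * np.sqrt(2)), np.sqrt(3)
h0 = c * np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3])
h1 = c * np.array([1 - r3, r3 - 3, 3 + r3, -1 - r3])

rng = np.random.default_rng(1)
x = rng.standard_normal(128)

y0 = np.convolve(h0, x)[::2]
y1 = np.convolve(h1, x)[::2]
assert np.isclose(np.sum(x**2), np.sum(y0**2) + np.sum(y1**2))
```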
Designing Orthogonal Filter Banks Now, we give two design procedures: the
first, based on spectral factorization, and the second, based on lattice structures.
Let us just note that most of the methods in the literature design analysis filters.
We will give designs for synthesis filters so as to be consistent with our approach;
however, analysis filters are easily obtained by time reversing the synthesis ones.
Designs Based on Spectral Factorizations The first solution we will show is due to
Smith and Barnwell [271]. The approach here is to find an autocorrelation se-
quence P (z) = G0 (z)G0 (z −1 ) that satisfies (3.2.46) and then to perform spectral
factorization as explained in Section 2.5.2. However, factorization becomes numeri-
cally ill-conditioned as the filter size grows, and thus, the resulting filters are usually
only approximately orthogonal.
Example 3.1
Choose p[n] as a windowed version of a perfect half-band lowpass filter,
\[
p[n] \;=\; \begin{cases} w[n]\, \dfrac{\sin(\pi n/2)}{\pi n/2}, & n = -2K+1, \ldots, 2K-1, \\[4pt] 0, & \text{otherwise}, \end{cases}
\]
where w[n] is a symmetric window function with w[0] = 1. Because p[2n] = δ[n], the
z-transform of p[n] satisfies
P (z) + P (−z) = 2. (3.2.57)
Also since P (z) is an approximation to a half-band lowpass filter, its spectral factor will be
such an approximation as well. Now, P (ejω ) might not be positive everywhere, in which
case it is not an autocorrelation and has to be modified. The following trick can be used
to find an autocorrelation sequence p′[n] close to p[n] [271]. Find the minimum of P(e^{j\omega}), δ_min = min_ω P(e^{j\omega}). If δ_min > 0, we need not do anything; otherwise, subtract it from p[0] to get the sequence p′[n]. Now,
\[
P'(e^{j\omega}) \;=\; P(e^{j\omega}) - \delta_{\min} \;\ge\; 0,
\]
and P′(z) still satisfies (3.2.57) up to a scale factor (1 - δ_min), which can be divided out.
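The construction of Example 3.1 can be sketched as follows (the Hamming window is an arbitrary illustrative choice; the spectral factorization step is omitted):

```python
import numpy as np

# Example 3.1: windowed ideal half-band lowpass p[n], the half-band
# identity P(z) + P(-z) = 2, and the positivity fix via delta_min.
# The Hamming window is an arbitrary illustrative choice.
K = 8
n = np.arange(-2 * K + 1, 2 * K)                 # n = -2K+1, ..., 2K-1
p = np.hamming(len(n)) * np.sinc(n / 2)          # np.sinc(x) = sin(pi x)/(pi x)

# p[2m] = delta[m], hence P(z) + P(-z) = 2:
assert np.allclose(p[n % 2 == 0], (n[n % 2 == 0] == 0).astype(float))

w = np.linspace(0, 2 * np.pi, 1024, endpoint=False)
P = np.real(np.exp(-1j * np.outer(w, n)) @ p)    # P(e^{jw}), real since p is symmetric
assert np.allclose(P + np.roll(P, 512), 2.0)

d = min(P.min(), 0.0)                            # delta_min when negative, else 0
P_fixed = (P - d) / (1 - d)                      # valid autocorrelation, rescaled
assert P_fixed.min() >= -1e-12
assert np.allclose(P_fixed + np.roll(P_fixed, 512), 2.0)   # (3.2.57) preserved
```

Note that the shift-and-rescale leaves the half-band identity exact, since both P and its modulated version are shifted by the same constant.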
Figure 3.5 Orthogonal filter designs. Magnitude responses of: (a) Smith and Barnwell filter of length 8 [271], (b) Daubechies’ filter of length 8 (D_4) [71], (c) Vaidyanathan and Hoang filter of length 8 [310], (d) Butterworth filter for N = 4 [133].
The lowpass filter is obtained from a factorization of
\[
P(z) \;=\; (1 + z^{-1})^k (1 + z)^k\, R(z),
\]
which satisfies (3.2.57), where R(z) is symmetric (R(z^{-1}) = R(z)) and positive on the unit circle, R(e^{j\omega}) ≥ 0. Of particular interest is the case when R(z) is
of minimal degree, which turns out to be when R(z) has powers of z going from
(−k+1) to (k−1). Once the solution to this constrained problem is found, a spectral
factorization of R(z) yields the desired filter G0 (z), which has automatically k zeros
at π. As always with spectral factorization, there is a choice of taking zeros either
inside or outside the unit circle. Taking them systematically from inside the unit
circle, leads to Daubechies’ family of minimum-phase filters.
The function R(z) required so that P(z) satisfies (3.2.57) can be found by solving a system of linear equations, or in closed form in the minimum-degree case [71]. Let us indicate a straightforward approach leading to a system of
linear equations. Assume the minimum-degree solution. Then P (z) has powers of
z going from (−2k + 1) to (2k − 1) and (3.2.57) puts 2k − 1 constraints on P (z).
But because P (z) is symmetric, k − 1 of them are redundant, leaving k active
constraints. Because R(z) is symmetric, it has k degrees of freedom (out of its
2k − 1 nonzero coefficients). Since P (z) is the convolution of (1 + z −1 )k (1 + z)k with
R(z), it can be written as a matrix-vector product, where the matrix contains the
impulse response of (1 + z −1 )k (1 + z)k and its shifts. Gathering the even terms of
this matrix-vector product (which correspond to the k constraints) and expressing
them in terms of the k free parameters of R(z), leads to the desired k × k system of equations. It is interesting to note that the matrix involved is never singular, and
the R(z) obtained by solving the system of equations is positive on the unit circle.
Therefore, this method automatically leads to an autocorrelation, and by spectral
factorization, to an orthogonal filter bank with filters of length 2k having k zeros
at π and 0 for the lowpass and highpass, respectively.
As an example, we will construct Daubechies’ D_2 filter, that is, a length-4 orthogonal filter with two zeros at ω = π (the maximum number possible for this length).
Example 3.2
Let us choose k = 2 and construct length-4 filters. This means that P(z) = (1 + z^{-1})^2 (1 + z)^2 R(z).
Now, recall that since P (z) + P (−z) = 2, all even-indexed coefficients in P (z) equal 0,
except for p[0] = 1. To obtain a length-4 filter, the highest-degree term has to be z −3 , and
thus R(z) is of the form
R(z) = (az + b + az −1 ). (3.2.58)
The product P(z) is then
\[
P(z) = az^3 + (4a+b)z^2 + (7a+4b)z + (8a+6b) + (7a+4b)z^{-1} + (4a+b)z^{-2} + az^{-3}.
\]
Equating the coefficients of z^2 and z^{-2} with 0, and the one of z^0 with 1, yields
\[
4a + b = 0, \qquad 8a + 6b = 1,
\]
or
\[
a = -\frac{1}{16}, \qquad b = \frac{1}{4}, \qquad
R(z) = -\frac{1}{16}z + \frac{1}{4} - \frac{1}{16}z^{-1}.
\]
Spectral factorization of P(z), taking the zero of R(z) inside the unit circle, gives
\[
G_0(z) \;=\; \frac{1}{4\sqrt{2}}\,(1 + z^{-1})^2 \left( (1 + \sqrt{3}) + (1 - \sqrt{3}) z^{-1} \right)
\]
\[
\;=\; \frac{1}{4\sqrt{2}} \left( (1+\sqrt{3}) + (3+\sqrt{3})z^{-1} + (3-\sqrt{3})z^{-2} + (1-\sqrt{3})z^{-3} \right).
\qquad (3.2.59)
\]
Note that this lowpass filter has a double zero at z = −1 (important for constructing wavelet
bases, as will be seen in Section 4.4). A longer filter with four zeros at ω = π is shown in
Figure 3.5(b) (magnitude responses of the lowpass/highpass pair) while the impulse response
coefficients are given in Table 3.2 [71].
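The steps of Example 3.2 can be replayed numerically (a sketch; the factor G_0(z) is taken from (3.2.59) rather than computed by a root finder):

```python
import numpy as np

# Example 3.2: solve the 2x2 linear system for R(z), rebuild P(z), and
# confirm that the factor G0(z) of (3.2.59) is a spectral factor with
# orthonormal even shifts.
A = np.array([[4.0, 1.0],    # coefficient of z^2 in P(z):  4a + b = 0
              [8.0, 6.0]])   # coefficient of z^0 in P(z):  8a + 6b = 1
a, b = np.linalg.solve(A, [0.0, 1.0])
assert np.allclose([a, b], [-1 / 16, 1 / 4])

# P(z) = (1 + z^-1)^2 (1 + z)^2 R(z); coefficients run from z^3 down to z^-3.
P = np.convolve([1.0, 4.0, 6.0, 4.0, 1.0], [a, b, a])
assert np.allclose(P[[1, 3, 5]], [0.0, 1.0, 0.0])   # even powers: P(z) + P(-z) = 2

c, r3 = 1 / (4 * np.sqrt(2)), np.sqrt(3)
g0 = c * np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) # (3.2.59)
assert np.allclose(np.convolve(g0, g0[::-1]), P)    # G0(z) G0(z^-1) = P(z)
assert np.isclose(g0 @ g0, 1.0)                     # unit norm
assert np.isclose(g0[:2] @ g0[2:], 0.0)             # orthogonal to its even shift
```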
[Figure 3.6: Lattice factorization of a paraunitary filter bank: a cascade of rotation blocks U_{K-1}, U_{K-2}, \ldots, U_0 separated by delays, mapping the input polyphase components x_0, x_1 to the channel signals y_0, y_1.]
That the resulting structure is paraunitary is easy to check (it is the product of
paraunitary elementary blocks). What is much more interesting is that all pa-
raunitary matrices of a given degree can be written in this form [310] (see also
Appendix 3.A.1). The lattice factorization is given in Figure 3.6.
As an example of this approach, we construct the D2 filter from the previous
example, using the lattice factorization.
Example 3.3
We construct the D2 filter which is of length 4, thus L = 2K = 4. This means that
\[
G_p(z) \;=\;
\begin{pmatrix} \cos\alpha_0 & -\sin\alpha_0 \\ \sin\alpha_0 & \cos\alpha_0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}
\begin{pmatrix} \cos\alpha_1 & -\sin\alpha_1 \\ \sin\alpha_1 & \cos\alpha_1 \end{pmatrix}
\]
\[
\;=\;
\begin{pmatrix}
\cos\alpha_0 \cos\alpha_1 - \sin\alpha_0 \sin\alpha_1 z^{-1} &
-\cos\alpha_0 \sin\alpha_1 - \sin\alpha_0 \cos\alpha_1 z^{-1} \\
\sin\alpha_0 \cos\alpha_1 + \cos\alpha_0 \sin\alpha_1 z^{-1} &
-\sin\alpha_0 \sin\alpha_1 + \cos\alpha_0 \cos\alpha_1 z^{-1}
\end{pmatrix}.
\qquad (3.2.61)
\]
⁶ By canonical we mean complete factorizations with a minimum number of free parameters. However, such factorizations are not unique in general.
We now obtain the D_2 filter by imposing a second-order zero at z = -1. The first equation is thus G_0(-1) = 0, or
\[
\cos(\alpha_0 + \alpha_1) - \sin(\alpha_0 + \alpha_1) \;=\; 0.
\]
This equation implies that
\[
\alpha_0 + \alpha_1 \;=\; k\pi + \frac{\pi}{4}.
\]
Since we also know that G_0(1) = \sqrt{2} (see (3.2.55)),
\[
\cos(\alpha_0 + \alpha_1) + \sin(\alpha_0 + \alpha_1) \;=\; \sqrt{2},
\]
we get that
\[
\alpha_0 + \alpha_1 \;=\; \frac{\pi}{4}.
\qquad (3.2.62)
\]
Imposing now a zero at e^{j\omega} = -1 on the derivative of G_0(e^{j\omega}), we obtain
\[
\left. \frac{dG_0(e^{j\omega})}{d\omega} \right|_{\omega=\pi}
\;=\; \cos\alpha_1 \sin\alpha_0 + 2 \sin\alpha_1 \sin\alpha_0 + 3 \sin\alpha_1 \cos\alpha_0 \;=\; 0.
\qquad (3.2.63)
\]
Solving (3.2.62) and (3.2.63) yields
\[
\alpha_0 \;=\; \frac{\pi}{3}, \qquad \alpha_1 \;=\; -\frac{\pi}{12}.
\]
Substituting the angles α0 , α1 into the expression for G0 (z) (3.2.61) and comparing it to
(3.2.59), we can see that we have indeed obtained the D2 filter.
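The lattice of Example 3.3 can be checked directly (a sketch of the two-rotation cascade; the polyphase-to-filter interleaving follows the causal convention H_i(z) = H_{i0}(z^2) + z^{-1} H_{i1}(z^2)):

```python
import numpy as np

# Example 3.3: the two-rotation lattice with a0 = pi/3, a1 = -pi/12
# reproduces the D2 lowpass filter of (3.2.59).
a0, a1 = np.pi / 3, -np.pi / 12

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a), np.cos(a)]])

# Gp(z) = R(a0) diag(1, z^-1) R(a1): split into the z^0 and z^-1 terms.
Gp0 = rot(a0) @ np.diag([1.0, 0.0]) @ rot(a1)
Gp1 = rot(a0) @ np.diag([0.0, 1.0]) @ rot(a1)

# First column holds the lowpass polyphase pair; interleave via
# G0(z) = G00(z^2) + z^-1 G01(z^2).
g0 = np.array([Gp0[0, 0], Gp0[1, 0], Gp1[0, 0], Gp1[1, 0]])

c, r3 = 1 / (4 * np.sqrt(2)), np.sqrt(3)
assert np.allclose(g0, c * np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]))
```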
H1 first (that is, in this case we design the analysis part of the system, or, one of
the two biorthogonal bases).
First, note that if a filter is linear phase, then it can be written as
\[
H(z) \;=\; \pm z^{-L+1}\, H(z^{-1}),
\qquad (3.2.64)
\]
where L is the filter length, with + for a symmetric and - for an antisymmetric impulse response.
The right-hand side of (3.2.65) is the determinant of the polyphase matrix H p (z),
while the right-hand side of (3.2.66) is the determinant of the modulation matrix
H_m(z). The synthesis filters are then equal to (see (3.2.30–3.2.31))
\[
G_0(z) \;=\; \frac{2}{\det\big(H_m(z)\big)}\, H_1(-z), \qquad
G_1(z) \;=\; -\frac{2}{\det\big(H_m(z)\big)}\, H_0(-z).
\]
P ROPOSITION 3.11
In a two-channel, perfect reconstruction filter bank, where all filters are linear
phase, the analysis filters have one of the following forms:
(a) Both filters are symmetric and of odd lengths, differing by an odd mul-
tiple of 2.
(b) One filter is symmetric and the other is antisymmetric; both lengths are
even, and are equal or differ by an even multiple of 2.
(c) One filter is of odd length, the other one of even length; both have all zeros on the unit circle. Either both filters are symmetric, or one is symmetric and the other one is antisymmetric (this is a degenerate case).
The proof can be found in [319] and is left as an exercise (see Problem 3.8).
We will discuss it briefly. The idea is to consider the product polynomial P (z) =
H0 (z)H1 (−z) that has to satisfy (3.2.66). Because H0 (z) and H1 (z) (as well as
H1 (−z)) are linear phase, so is P (z). Because of (3.2.66), when P (z) has more
than two nonzero coefficients, it has to be symmetric with one central coefficient
at 2l − 1. Also, the end terms of P (z) have to be of an even index, so they cancel
in P (z) − P (−z). The above two requirements lead to the symmetry and length
constraints for cases (a) and (b). In addition, there is a degenerate case (c), of little
practical interest, when P (z) has only two nonzero coefficients,
P (z) = z −j (1 ± z 2N −1−2j ),
which leads to zeros at odd roots of ±1. Because these are distributed among H0 (z)
and H1 (−z) (rather than H1 (z)), the resulting filters will be a poor set of lowpass
and highpass filters.
Another result that we mentioned at the beginning of this section is:
P ROPOSITION 3.12
There are no two-channel perfect reconstruction, orthogonal filter banks, with
filters being FIR, linear phase, and with real coefficients (except for the Haar
filters).
P ROOF
We know from Theorem 3.8 that orthonormality implies that
H p (z)H Tp (z −1 ) = I,
We also know that in orthogonal filter banks, the filters are of even length. Therefore,
following Proposition 3.11, one filter is symmetric and the other one is antisymmetric. Take
the symmetric one, H0 (z) for example, and use (3.2.64)
\[
H_{00}(z)\, H_{00}(z^{-1}) \;=\; \frac{1}{2}.
\]
The only FIR solution is
\[
H_{00}(z) \;=\; \frac{1}{\sqrt{2}}\, z^{-l}.
\]
Performing a similar analysis for H_{01}(z), we obtain H_{01}(z) = (1/\sqrt{2})\, z^{-k}, which, in turn, means that
\[
H_0(z) \;=\; \frac{1}{\sqrt{2}} \left( z^{-2l} + z^{-2k-1} \right), \qquad H_1(z) \;=\; H_0(-z),
\]
or, the only solution yields Haar filters (l = k = 0) or trivial variations thereof.
Lattice Structure for Linear Phase Filters Unlike in the paraunitary case, there are no
canonical factorizations for general matrices of polynomials.7 But there are lattice
structures that will produce, for example, linear phase perfect reconstruction filters
[208, 321]. To obtain it, note that H p (z) has to satisfy (if the filters are of the same
length)
\[
H_p(z) \;=\; \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \cdot z^{-k} \cdot H_p(z^{-1}) \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\qquad (3.2.69)
\]
Here, we assume that Hi (z) = Hi0 (z 2 ) + z −1 Hi1 (z 2 ) in order to have causal filters.
This is referred to as the linear phase testing condition (see Problem 3.9). Then,
assume that H_p(z) satisfies (3.2.69) and construct H_p'(z) as
\[
H_p'(z) \;=\; H_p(z) \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix}.
\]
It is then easy to show that H_p'(z) satisfies (3.2.69) as well. The lattice
\[
H_p(z) \;=\; C \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\prod_{i=1}^{K-1} \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \begin{pmatrix} 1 & \alpha_i \\ \alpha_i & 1 \end{pmatrix},
\qquad (3.2.70)
\]
with C = -\frac{1}{2} \prod_{i=1}^{K-1} \frac{1}{1 - \alpha_i^2}, produces length L = 2K symmetric (lowpass)
and antisymmetric (highpass) filters leading to perfect reconstruction filter banks.
Note that the structure is incomplete [321] and that |α_i| ≠ 1. Again, just as in the
paraunitary lattice, perfect reconstruction is structurally guaranteed within a scale
factor (in the synthesis, replace simply αi by −αi and pick C = 1).
⁷ There exist factorizations of polynomial matrices based on ladder steps [151], but they are not canonical like the lattice structure in (3.2.60).
Example 3.4
Let us construct filters of length 4 where the lowpass has a maximum number of zeros at
z = −1 (that is, the linear phase counterpart of the D2 filter). From the cascade structure,
\[
H_p(z) \;=\; \frac{-1}{2(1-\alpha^2)}
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}
\begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix}
\;=\;
\frac{-1}{2(1-\alpha^2)}
\begin{pmatrix} 1 + \alpha z^{-1} & \alpha + z^{-1} \\ -1 + \alpha z^{-1} & -\alpha + z^{-1} \end{pmatrix}.
\]
The lowpass filter is thus
\[
H_0(z) \;=\; H_{00}(z^2) + z^{-1} H_{01}(z^2) \;=\; \frac{1 + \alpha z^{-1} + \alpha z^{-2} + z^{-3}}{-2(1-\alpha^2)}.
\]
Choosing α = 3 (so that -2(1 - α^2) = 16) maximizes the number of zeros at z = -1 and gives
\[
H_0(z) \;=\; \frac{1}{16}(1 + 3z^{-1} + 3z^{-2} + z^{-3}) \;=\; \frac{1}{16}(1 + z^{-1})^3,
\qquad (3.2.71)
\]
which means that H0 (z) has a triple zero at z = −1. The highpass filter is equal to
\[
H_1(z) \;=\; \frac{1}{16}(-1 - 3z^{-1} + 3z^{-2} + z^{-3}).
\qquad (3.2.72)
\]
Note that det(H m (z)) = (1/8) z −3 . Following (3.2.30–3.2.31), G0 (z) = 16z 3 H1 (−z) and
G1 (z) = −16z 3 H0 (−z). A causal version simply skips the z 3 factor. Recall that the key
to perfect reconstruction is the product P (z) = H0 (z) · H1 (−z) in (3.2.66), which equals in
this case (using (3.2.71–3.2.72))
\[
P(z) \;=\; \frac{1}{256} \left( -1 + 9z^{-2} + 16z^{-3} + 9z^{-4} - z^{-6} \right)
\;=\; \frac{1}{256}\,(1 + z^{-1})^4 (-1 + 4z^{-1} - z^{-2}),
\]
that is, the same P(z) as in Example 3.2. One can refactor this P(z) into a different set of {H_0(z), H_1(-z)}, such as, for example,
\[
H_0(z) \;=\; \frac{1}{16}(1 + 2z^{-1} + z^{-2}), \qquad
H_1(z) \;=\; \frac{1}{16}(-1 - 2z^{-1} + 6z^{-2} - 2z^{-3} - z^{-4}),
\]
that is, odd-length linear phase lowpass and highpass filters with impulse responses 1/16 [1, 2, 1] and 1/16 [-1, -2, 6, -2, -1], respectively. Table 3.3 gives the impulse response coefficients for both analysis and synthesis filters for the two cases given above.
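Both factorizations of P(z) can be confirmed numerically (a sketch; modulating a filter, i.e., replacing z by -z, negates its odd-indexed coefficients):

```python
import numpy as np

# Example 3.4: two factorizations of the same P(z) = H0(z) H1(-z).
# Coefficient arrays are indexed by powers of z^-1; modulation (z -> -z)
# negates the odd-indexed coefficients.
def modulate(h):
    return h * (-1.0) ** np.arange(len(h))

h0a = np.array([1.0, 3.0, 3.0, 1.0]) / 16            # (3.2.71)
h1a = np.array([-1.0, -3.0, 3.0, 1.0]) / 16          # (3.2.72)
h0b = np.array([1.0, 2.0, 1.0]) / 16                 # odd-length refactorization
h1b = np.array([-1.0, -2.0, 6.0, -2.0, -1.0]) / 16

Pa = np.convolve(h0a, modulate(h1a))
Pb = np.convolve(h0b, modulate(h1b))
assert np.allclose(Pa, Pb)
assert np.allclose(256 * Pa, [-1, 0, 9, 16, 9, 0, -1])
```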
The above example showed again the central role played by P (z) = H0 (z) · H1 (−z).
In some sense, designing two-channel filter banks boils down to designing P (z)’s
with particular properties, and factoring them in a particular way.
If one relaxes the perfect reconstruction constraint, one can obtain some desir-
able properties at the cost of some small reconstruction error. For example, popular
QMF filters have been designed by Johnston [144], which have linear phase and “al-
most” perfect reconstruction. The idea is to approximate perfect reconstruction in
a QMF solution (see (3.2.37)) as well as possible, while obtaining a good lowpass
filter (the highpass filter H1 (z) being equal to H0 (−z), is automatically as good as
the lowpass). Therefore, define an objective function depending on two quantities:
(a) stopband attenuation error of H0 (z)
\[
S \;=\; \int_{\omega_s}^{\pi} |H_0(e^{j\omega})|^2\, d\omega,
\]
Note that the solution H_1(z) is not unique [32, 319]. Also, coprimeness of H_{00}(z) and H_{01}(z) is equivalent to H_0(z) not having any pair of zeros at locations α
and −α. This can be used to prove that the filter H0 (z) = (1 + z −1 )N always has
a complementary filter (see Problem 3.12).
Example 3.5
Consider the filter H0 (z) = (1 + z −1 )4 = 1 + 4z −1 + 6z −2 + 4z −3 + z −4 . It can be verified
that its two polyphase components are coprime, and thus, there is a complementary filter.
We will find a solution to the equation
det(H p (z)) = H00 (z) · H11 (z) − H01 (z) · H10 (z) = z −1 , (3.2.73)
with H00 (z) = 1 + 6z −1 + z −2 and H01 (z) = 4 + 4z −1 . The right side of (3.2.73) was chosen
so that there is a linear phase solution. For example,
\[
H_{10}(z) \;=\; \frac{1}{16}(1 + z^{-1}), \qquad H_{11}(z) \;=\; \frac{1}{4},
\]
is a solution to (3.2.73), that is, H_1(z) = (1 + 4z^{-1} + z^{-2})/16. This of course leads to the
same P (z) as in Examples 3.3 and 3.4.
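The determinant condition (3.2.73) is easy to check in code (a sketch using polynomial convolution for products in powers of z^{-1}):

```python
import numpy as np

# Example 3.5: check the polyphase determinant (3.2.73) for
# H0(z) = (1 + z^-1)^4 and the linear phase complement H1(z).
h00 = np.array([1.0, 6.0, 1.0])        # even polyphase of H0
h01 = np.array([4.0, 4.0])             # odd polyphase of H0
h10 = np.array([1.0, 1.0]) / 16        # H10(z) = (1 + z^-1)/16
h11 = np.array([0.25])                 # H11(z) = 1/4

det = np.convolve(h00, h11) - np.convolve(h01, h10)
assert np.allclose(det, [0.0, 1.0, 0.0])   # det Hp(z) = z^-1
```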
IIR filter banks have been used, for example, in image coding. The main advantage of such filter banks is good frequency selectivity
and low computational complexity, just like in regular IIR filtering. However, this
advantage comes with a cost. Recall that in orthogonal filter banks, the synthesis
filter impulse response is the time-reversed version of the analysis filter. Now if
the analysis uses causal filters (with impulse response going from 0 to +∞), then
the synthesis has anticausal filters. This is a drawback from the point of view of
implementation, since in general anticausal IIR filters cannot be implemented unless
their impulse responses are truncated. However, a case where anticausal IIR filters
can be implemented appears when the signal to be filtered is of finite length, a case
encountered in image processing [234, 269]. IIR filter banks have been less popular
because of this drawback, but their attractive features justify a brief treatment as
given below. For more details, the reader is referred to [133].
First, return to the lattice factorization for FIR orthogonal filter banks (see
(3.2.60)). If one substitutes an allpass section8 for the delay z −1 in (3.2.60), the
factorization is still paraunitary. For example, instead of the diagonal matrix used
in (3.2.60), take a diagonal matrix D(z) = diag(F_0(z), F_1(z)) such that
\[
D(z)\, D(z^{-1}) \;=\;
\begin{pmatrix} F_0(z) & 0 \\ 0 & F_1(z) \end{pmatrix}
\begin{pmatrix} F_0(z^{-1}) & 0 \\ 0 & F_1(z^{-1}) \end{pmatrix}
\;=\; I,
\]
where we have assumed that the coefficients are real, and have used two allpass
sections (instead of 1 and z −1 ). What is even more interesting is that such a
factorization is complete [84].
Alternatively, recall that one of the ways to design orthogonal filter banks is to
find an autocorrelation function P (z) which is valid, that is, which satisfies
P (z) + P (−z) = 2, (3.2.74)
and then factor it into P (z) = H0 (z)H0 (z −1 ). This approach is used in [133] to
construct all possible orthogonal filter banks with rational filters. The method goes
as follows:
First, one chooses an arbitrary polynomial R(z) and forms P(z) as
\[
P(z) \;=\; \frac{2\, R(z)\, R(z^{-1})}{R(z)\, R(z^{-1}) + R(-z)\, R(-z^{-1})}.
\qquad (3.2.75)
\]
It is easy to see that this P (z) satisfies (3.2.74). Since both the numerator and the
denominator are autocorrelations (the latter being the sum of two autocorrelations),
P (z) is as well. It can be shown that any valid autocorrelation can be written as
in (3.2.75) [133]. Then factor P (z) as H(z)H(z −1 ) and form the filter
H0 (z) = AH0 (z) H(z),
⁸ Remember that a filter H(e^{j\omega}) is allpass if |H(e^{j\omega})| = c, c > 0, for all ω. Here we choose c = 1.
where AH1 (z) is again an arbitrary allpass. The synthesis filters are then
The above construction covers the whole spectrum of possible solutions. For exam-
ple, if R(z)R(z −1 ) is in itself a valid function, then
R(z)R(z −1 ) + R(−z)R(−z −1 ) = 2,
and by choosing AH0 , AH1 to be pure delays, the solutions obtained by the above
construction are FIR.
Choosing, for example, $R(z) = (1+z^{-1})^N$ in (3.2.75) gives
$$P(z) \;=\; \frac{2\,(1+z^{-1})^N (1+z)^N}{(z^{-1}+2+z)^N + (-z^{-1}+2-z)^N} \;=\; H(z)H(z^{-1}). \qquad (3.2.78)$$
These filters are the IIR counterparts of the Daubechies’ filters given in Example 3.2. These
are, in fact, the N th order half-band digital Butterworth filters [211] (see also Example 2.2).
That these particular filters satisfy the conditions for orthogonality was also pointed out
in [269]. The Butterworth filters are known to be the maximally flat IIR filters of a given
order.
Choose $N = 5$; then $P(z)$ equals
$$P(z) \;=\; \frac{(1+z)^5 (1+z^{-1})^5}{10z^4 + 120z^2 + 252 + 120z^{-2} + 10z^{-4}}.$$
In this case, we can obtain a closed form spectral factorization of P (z), which leads to
$$H_0(z) \;=\; \frac{1 + 5z^{-1} + 10z^{-2} + 10z^{-3} + 5z^{-4} + z^{-5}}{\sqrt{2}\,(1 + 10z^{-2} + 5z^{-4})}, \qquad (3.2.79)$$
$$H_1(z) \;=\; z^{-1}\, \frac{1 - 5z + 10z^2 - 10z^3 + 5z^4 - z^5}{\sqrt{2}\,(1 + 10z^2 + 5z^4)}. \qquad (3.2.80)$$
For the purposes of implementation, it is necessary to factor $H_i(z)$ into stable causal (poles
inside the unit circle) and anticausal (poles outside the unit circle) parts. For comparison
with earlier designs, where length-8 FIR filters were designed, we show in Figure 3.5(d) the
magnitude responses of $H_0(e^{j\omega})$ and $H_1(e^{j\omega})$ for $N = 4$. The form of $P(z)$ is then
$$P(z) \;=\; \frac{z^{-4}(1+z)^4 (1+z^{-1})^4}{1 + 28z^{-2} + 70z^{-4} + 28z^{-6} + z^{-8}}.$$
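As a sanity check on the half-band construction, the following sketch (our illustration with numpy, not from the text) evaluates the $N = 5$ Butterworth autocorrelation and confirms the validity condition $P(z) + P(-z) = 2$ on the unit circle:

```python
import numpy as np

def P(z, N=5):
    # P(z) = 2 (1+z^-1)^N (1+z)^N / ((z^-1 + 2 + z)^N + (-z^-1 + 2 - z)^N)
    num = 2 * (1 + 1/z)**N * (1 + z)**N
    den = (1/z + 2 + z)**N + (-1/z + 2 - z)**N
    return num / den

for w in np.linspace(0.1, 3.0, 7):
    z = np.exp(1j*w)
    # half-band (valid autocorrelation) condition from (3.2.74)
    assert np.isclose(P(z) + P(-z), 2)
```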
148 CHAPTER 3
As we pointed out in Proposition 3.12, there are no real FIR orthogonal sym-
metric/antisymmetric filter banks. However, if we allow IIR filters instead, then
solutions do exist. There are two cases, depending on whether the center of symmetry/antisymmetry is at a half integer (such as in an even-length FIR linear phase filter)
or at an integer (such as in the odd-length FIR case). We will only consider the
former case. For discussion of the latter case as well as further details, see [133].
It can be shown that the polyphase matrix for an orthogonal, half-integer sym-
metric/antisymmetric filter bank is necessarily of the form
$$H_p(z) \;=\; \begin{pmatrix} A(z) & z^{-l} A(z^{-1}) \\ -z^{l-n} A(z) & z^{-n} A(z^{-1}) \end{pmatrix},$$
where $A(z)$ is an allpass filter. An example is
$$A(z) \;=\; \frac{1 + 6z^{-1} + (15/7)z^{-2}}{(15/7) + 6z^{-1} + z^{-2}}. \qquad (3.2.82)$$
This particular solution will prove useful in the construction of wavelets (see Sec-
tion 4.6.2). Again, for the purposes of implementation, one has to implement stable
causal and anticausal parts separately.
Remarks The main advantage of IIR filters is their good frequency selectivity and
low computational complexity. The price one pays, however, is the fact that the
filters become noncausal. For the sake of discussion, assume a finite-length signal,
and a causal analysis filter, which will be followed by an anticausal synthesis filter.
The output will be infinite even though the input is of finite length. One can take
care of this problem in two ways. Either one stores the state of the filters after
the end of the input signal and uses this as an initial state for the synthesis filters
[269], or one takes advantage of the fact that the outputs of the analysis filter bank
decay rapidly after the input is zero, and stores only a finite extension of these
signals. While the former technique is exact, the latter is usually a good enough
approximation. This short discussion indicates that the implementation of IIR filter
banks is less straightforward than that of their FIR counterparts, and explains their
lesser popularity.
[Figure 3.7 Octave-band filter bank with $J$ stages: (a) analysis, where each stage splits its input with $H_0$ and $H_1$, each followed by downsampling by 2, and the lowpass branch feeds the next stage; (b) synthesis, where each stage upsamples by 2, filters with $G_0$ and $G_1$, and sums, combining the channels $W_1, W_2, \ldots, W_J$ with the final lowpass channel $V_J$ to produce $\hat{x}$.]
Example 3.7
Consider what happens if the filters gi [n] from Figure 3.7(a)-(b) are Haar filters defined in
z-transform domain as
$$G_0(z) \;=\; \frac{1}{\sqrt{2}}(1 + z^{-1}), \qquad G_1(z) \;=\; \frac{1}{\sqrt{2}}(1 - z^{-1}).$$
Take, for example, J = 3, that is, we will use three two-channel filter banks. Then, using
the multirate identity which says that G(z) followed by upsampling by 2 is equivalent to
upsampling by 2 followed by G(z 2 ) (see Section 2.5.3), we can transform this filter bank
into a four-channel one as given in Figure 3.8. The equivalent filters are
$$G_1^{(1)}(z) \;=\; G_1(z) \;=\; \frac{1}{\sqrt{2}}(1 - z^{-1}),$$
$$G_1^{(2)}(z) \;=\; G_0(z)\, G_1(z^2) \;=\; \frac{1}{2}(1 + z^{-1} - z^{-2} - z^{-3}),$$
$$G_1^{(3)}(z) \;=\; G_0(z)\, G_0(z^2)\, G_1(z^4) \;=\; \frac{1}{2\sqrt{2}}(1 + z^{-1} + z^{-2} + z^{-3} - z^{-4} - z^{-5} - z^{-6} - z^{-7}),$$
$$G_0^{(3)}(z) \;=\; G_0(z)\, G_0(z^2)\, G_0(z^4) \;=\; \frac{1}{2\sqrt{2}}(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} + z^{-6} + z^{-7}),$$
$^9$ This is also sometimes called a discrete-time wavelet transform in the literature.
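The equivalent filters of Example 3.7 are easy to reproduce numerically: filtering followed by upsampling by 2 corresponds to convolving with an upsampled impulse response. A minimal sketch (the helper `up` is our notation, not the book's):

```python
import numpy as np

def up(g, m):
    # G(z) -> G(z^m): insert m-1 zeros between samples
    out = np.zeros(m * (len(g) - 1) + 1)
    out[::m] = g
    return out

g0 = np.array([1.0, 1.0]) / np.sqrt(2)    # G0(z) = (1 + z^-1)/sqrt(2)
g1 = np.array([1.0, -1.0]) / np.sqrt(2)   # G1(z) = (1 - z^-1)/sqrt(2)

g1_2 = np.convolve(g0, up(g1, 2))                          # G0(z) G1(z^2)
g1_3 = np.convolve(np.convolve(g0, up(g0, 2)), up(g1, 4))  # G0(z) G0(z^2) G1(z^4)
g0_3 = np.convolve(np.convolve(g0, up(g0, 2)), up(g0, 4))  # G0(z) G0(z^2) G0(z^4)

assert np.allclose(g1_2, np.array([1, 1, -1, -1]) / 2)
assert np.allclose(g1_3, np.r_[np.ones(4), -np.ones(4)] / (2 * np.sqrt(2)))
assert np.allclose(g0_3, np.ones(8) / (2 * np.sqrt(2)))
```

The three assertions recover exactly the impulse responses listed above.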
3.3. TREE-STRUCTURED FILTER BANKS 151
[Figure 3.8 shows four synthesis branches with upsampling factors 2, 4, 8, and 8 and impulse responses $\frac{1}{\sqrt{2}}(1,-1)$, $\frac{1}{2}(1,1,-1,-1)$, $\frac{1}{2\sqrt{2}}(1,1,1,1,-1,-1,-1,-1)$, and $\frac{1}{2\sqrt{2}}(1,1,1,1,1,1,1,1)$, applied to $y_0[n], \ldots, y_3[n]$ and summed to give $x[n]$.]
Figure 3.8 Octave-band synthesis filter bank with Haar filters and three stages.
It is obtained by transforming the filter bank from Figure 3.7(b) using the mul-
tirate identity for filtering followed by upsampling.
stages of lowpass filters g0 [n] each preceded by upsampling by 2. It can be defined recursively
as (we give it in z-domain for simplicity)
$$G_0^{(3)}(z) \;=\; G_0^{(2)}(z)\, G_0(z^{2^2}) \;=\; \prod_{k=0}^{2} G_0(z^{2^k}).$$
Note that this implies that $G_0^{(1)}(z) = G_0(z)$. On the other hand, we denote by $g_1^{(i)}[n]$ the
equivalent filter corresponding to highpass filtering followed by $(i-1)$ stages of lowpass
filtering, each again preceded by upsampling by 2. It can be defined recursively in the same manner (see (3.3.4) below).
Since this is an orthonormal system, the time-domain matrices representing analysis and
synthesis are just transposes of each other. Thus the analysis matrix $T_a$ representing the
actions of the filters $h_1^{(1)}[n]$, $h_1^{(2)}[n]$, $h_1^{(3)}[n]$, $h_0^{(3)}[n]$ contains as lines the impulse responses
of $g_1^{(1)}[n]$, $g_1^{(2)}[n]$, $g_1^{(3)}[n]$, and $g_0^{(3)}[n]$, or of $h_i^{(j)}[-n]$, since analysis and synthesis filters are
linked by time reversal. The matrix $T_a$ is block-diagonal,
$$T_a \;=\; \begin{pmatrix} \ddots & & & \\ & A_0 & & \\ & & A_0 & \\ & & & \ddots \end{pmatrix}, \qquad (3.3.1)$$
Now that we have seen how it works in a simple case, we take more general
filters gi [n], and a number of stages J. We concentrate on the orthonormal case
(the biorthogonal one would follow similarly). In an orthonormal octave-band filter
bank with J stages, the equivalent filters (basis functions) are given by (again we
give them in z-domain for simplicity)
$$G_0^{(J)}(z) \;=\; G_0^{(J-1)}(z)\, G_0(z^{2^{J-1}}) \;=\; \prod_{K=0}^{J-1} G_0(z^{2^K}), \qquad (3.3.3)$$
$$G_1^{(j)}(z) \;=\; G_0^{(j-1)}(z)\, G_1(z^{2^{j-1}}) \;=\; G_1(z^{2^{j-1}}) \prod_{K=0}^{j-2} G_0(z^{2^K}), \qquad j = 1, \ldots, J. \qquad (3.3.4)$$
In the time domain, each of the outputs in Figure 3.7(a) can be described as
$$H_1 H_0^{j-1}\, x, \qquad j = 1, \ldots, J,$$
except for the last, which is obtained by
$$H_0^J\, x.$$
Here, the time-domain matrices H 0 , H 1 are as defined in Section 3.2.1, that is,
each line is an even shift of the impulse response of gi [n], or equivalently, of hi [−n].
Since each stage in the analysis bank is orthonormal and invertible, the overall
scheme is as well. Thus, we get a unitary analysis matrix $T_a$ by interleaving the
rows of $H_1$, $H_1 H_0$, $\ldots$, $H_1 H_0^{J-1}$, $H_0^J$, as was done in (3.3.1–3.3.2). A formal
proof of this statement will be given in Section 3.3.2 under orthogonality of basis
functions.
Example 3.8
Let us go back to the Haar case and three stages. We can form matrices $H_1$, $H_1 H_0$,
$H_1 H_0^2$, $H_0^3$ as
$$H_1 \;=\; \frac{1}{\sqrt{2}} \begin{pmatrix} \ddots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & -1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & -1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \qquad (3.3.5)$$
$$H_0 \;=\; \frac{1}{\sqrt{2}} \begin{pmatrix} \ddots & \vdots & \vdots & \vdots & \vdots & \\ \cdots & 1 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 1 & \cdots \\ & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \qquad (3.3.6)$$
$$H_1 H_0 \;=\; \frac{1}{2} \begin{pmatrix} \ddots & & & & & & & & & \\ \cdots & 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & \cdots \\ & & & & & & & & & \ddots \end{pmatrix}, \qquad (3.3.7)$$
$$H_1 H_0^2 \;=\; \frac{1}{2\sqrt{2}} \begin{pmatrix} \ddots & & & & & & & & & & & \\ \cdots & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & \cdots \\ & & & & & & & & & & & \ddots \end{pmatrix}, \qquad (3.3.8)$$
$$H_0^3 \;=\; \frac{1}{2\sqrt{2}} \begin{pmatrix} \ddots & & & & & & & & & & & \\ \cdots & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & \cdots \\ & & & & & & & & & & & \ddots \end{pmatrix}. \qquad (3.3.9)$$
Now, it is easy to see that by interleaving (3.3.5–3.3.9) we obtain the matrix $T_a$ as in (3.3.1–3.3.2). To check that it is unitary, it is enough to check that $A_0$ is unitary (which it is: just compute the product $A_0 A_0^T$).
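The check that $A_0 A_0^T = I$ can be carried out explicitly. A small sketch (the periodization over one block of eight samples is our construction, assuming the Haar rows above):

```python
import numpy as np

s2 = np.sqrt(2)
rows = []
for k in range(4):                       # rows of H1: shifts by 2 of (1,-1)/sqrt(2)
    r = np.zeros(8)
    r[2*k], r[2*k+1] = 1/s2, -1/s2
    rows.append(r)
for k in range(2):                       # rows of H1 H0: shifts by 4 of (1,1,-1,-1)/2
    r = np.zeros(8)
    r[4*k:4*k+4] = np.array([1, 1, -1, -1]) / 2
    rows.append(r)
rows.append(np.r_[np.ones(4), -np.ones(4)] / (2*s2))   # row of H1 H0^2
rows.append(np.ones(8) / (2*s2))                       # row of H0^3

A0 = np.array(rows)
assert np.allclose(A0 @ A0.T, np.eye(8))   # A0 is unitary
```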
Until now, we have concentrated on the orthonormal case. If one relaxes
the orthonormality constraint, one obtains a biorthogonal tree-structured filter
bank. Now, $h_i[n]$ and $g_i[n]$ are not related by simple time reversal, but are impulse
responses of a biorthogonal perfect reconstruction filter bank. We therefore have
both equivalent synthesis filters $g_1^{(j)}[n-2^j k]$, $g_0^{(J)}[n-2^J k]$ as given in (3.3.3–3.3.4)
and analysis filters $h_1^{(j)}[n-2^j k]$, $h_0^{(J)}[n-2^J k]$, which are defined similarly. Therefore,
if the individual two-channel filter banks are biorthogonal (perfect reconstruction),
then the overall scheme is as well. The proof of this statement follows the proof
for the orthonormal case (see Section 3.3.2 for the discrete-time wavelet series case),
and is left as an exercise to the reader.
where
$$X^{(1)}[2k] \;=\; \langle h_0^{(1)}[2^1 k - l],\, x[l] \rangle,$$
$$X^{(1)}[2k+1] \;=\; \langle h_1^{(1)}[2^1 k - l],\, x[l] \rangle,$$
are the convolutions of the input with $h_0[n]$ and $h_1[n]$ evaluated at even indexes
$2k$. In these equations $h_i^{(1)}[n] = h_i[n]$ and $g_i^{(1)}[n] = g_i[n]$. In an octave-band
filter bank or discrete-time wavelet series, the lowpass channel is further split by
lowpass/highpass filtering and downsampling. Then, the first term on the right side
of (3.3.10) remains unchanged, while the second can be expressed as
$$\sum_{k \in Z} X^{(1)}[2k]\, h_0^{(1)}[2^1 k - n] \;=\; \sum_{k \in Z} X^{(2)}[2k+1]\, g_1^{(2)}[n - 2^2 k] + \sum_{k \in Z} X^{(2)}[2k]\, g_0^{(2)}[n - 2^2 k], \qquad (3.3.11)$$
where
$$X^{(2)}[2k] \;=\; \langle h_0^{(2)}[2^2 k - l],\, x[l] \rangle,$$
$$X^{(2)}[2k+1] \;=\; \langle h_1^{(2)}[2^2 k - l],\, x[l] \rangle,$$
that is, we applied (3.3.10) once more. In the above, the basis functions $g^{(i)}[n]$ are as
defined in (3.3.3) and (3.3.4). In other words, $g_0^{(2)}[n]$ is the time-domain version of
$$G_0^{(2)}(z) \;=\; G_0(z)\, G_0(z^2),$$
while $g_1^{(2)}[n]$ is the time-domain version of
$$G_1^{(2)}(z) \;=\; G_0(z)\, G_1(z^2).$$
[Figure: impulse responses of the equivalent filters $g_1^{(1)}$, $g_1^{(2)}$, $g_1^{(3)}$, $g_1^{(4)}$, and $g_0^{(4)}$, shown for $n = 0, \ldots, 16$.]
Repeating the process in (3.3.12) $J$ times, one obtains the discrete-time wavelet
series over $J$ octaves, plus the final octave containing the lowpass version. Thus,
(3.3.12) becomes
$$x[n] \;=\; \sum_{j=1}^{J} \sum_{k \in Z} X^{(j)}[2k+1]\, g_1^{(j)}[n - 2^j k] \;+\; \sum_{k \in Z} X^{(J)}[2k]\, g_0^{(J)}[n - 2^J k], \qquad (3.3.13)$$
where
$$X^{(j)}[2k+1] \;=\; \langle h_1^{(j)}[2^j k - l],\, x[l] \rangle, \qquad j = 1, \ldots, J, \qquad (3.3.14)$$
$$X^{(J)}[2k] \;=\; \langle h_0^{(J)}[2^J k - l],\, x[l] \rangle.$$
In (3.3.13) the sequence $g_1^{(j)}[n]$ is the time-domain version of (3.3.4), while $g_0^{(J)}[n]$
is the time-domain version of (3.3.3), and $h_i^{(j)}[n] = g_i^{(j)}[-n]$. Because any input
sequence can be decomposed as in (3.3.13), the family of functions $\{g_1^{(j)}[2^j k - n],\, g_0^{(J)}[2^J k - n]\}$, $j = 1, \ldots, J$, and $k, n \in Z$, is an orthonormal basis for $l_2(Z)$.
Note the special sampling used in the discrete-time wavelet series. Each sub-
sequent channel is downsampled by 2 with respect to the previous one and has a
Linearity Since the discrete-time wavelet series involves inner products or convo-
lutions (which are linear operators) it is obviously linear.
Shift Recall that multirate systems are not shift-invariant in general, and two-
channel filter banks downsampled by 2 are shift-invariant with respect to even
shifts only. Therefore, it is intuitive that a $J$-octave discrete-time wavelet series
will be invariant under shifts by multiples of $2^J$. A visual interpretation follows
from the fact that the dyadic grid in Figure 3.9, when moved by $k2^J$, will overlap
with itself, whereas it will not if the shift is not an integer multiple of $2^J$.
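This shift behavior can be observed numerically on a single ($J = 1$) Haar stage, where invariance holds for even shifts only. A sketch, assuming periodic (circular) extension of a length-16 signal:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(16)

def haar_stage(x):
    # one two-channel Haar stage with downsampling by 2 (periodic extension)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

lo, hi = haar_stage(x)
lo2, hi2 = haar_stage(np.roll(x, 2))   # even shift: coefficients simply shift
assert np.allclose(lo2, np.roll(lo, 1)) and np.allclose(hi2, np.roll(hi, 1))

lo1, hi1 = haar_stage(np.roll(x, 1))   # odd shift: coefficients change entirely
assert not np.allclose(hi1, np.roll(hi, 1))
```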
P ROPOSITION 3.14
In a discrete-time wavelet series expansion over $J$ octaves, if $x[l] \longleftrightarrow X^{(j)}[2k+1]$,
then
$$x[l - m2^J] \;\longleftrightarrow\; X^{(j)}[2(k - m2^{J-j}) + 1].$$
P ROOF
If $y[l] = x[l - m2^J]$, then its transform is, following (3.3.14),
$$Y^{(j)}[2k+1] \;=\; \langle h_1^{(j)}[2^j k - l],\, x[l - m2^J] \rangle \;=\; \langle h_1^{(j)}[2^j k - l' - m2^J],\, x[l'] \rangle \;=\; X^{(j)}[2(k - m2^{J-j}) + 1].$$
Very similarly, one proves for the lowpass channel that, when $x[l]$ produces $X^{(J)}[2k]$,
then $x[l - m2^J]$ leads to $X^{(J)}[2(k - m)]$.
Orthogonality We have mentioned before that $g_0^{(J)}[n]$ and $g_1^{(j)}[n]$, $j = 1, \ldots, J$, with
appropriate shifts, form an orthonormal family of functions (see [274]). This stems
from the fact that we have used two-channel orthogonal filter banks, for which we
know that
$$\langle g_i[n - 2k],\, g_j[n - 2l] \rangle \;=\; \delta[i - j]\, \delta[k - l].$$
P ROPOSITION 3.15
In a discrete-time wavelet series expansion, the following orthogonality rela-
tions hold:
$$\langle g_0^{(J)}[n - 2^J k],\, g_0^{(J)}[n - 2^J l] \rangle \;=\; \delta[k - l], \qquad (3.3.15)$$
$$\langle g_1^{(j)}[n - 2^j k],\, g_1^{(i)}[n - 2^i l] \rangle \;=\; \delta[i - j]\, \delta[k - l], \qquad (3.3.16)$$
$$\langle g_0^{(J)}[n - 2^J k],\, g_1^{(j)}[n - 2^j l] \rangle \;=\; 0. \qquad (3.3.17)$$
P ROOF
We will here prove only (3.3.15), while (3.3.16) and (3.3.17) are left as an exercise to the
reader (see Problem 3.15). We prove (3.3.15) by induction.
It will be convenient to work with the z-transform of the autocorrelation of the filter
$G_0^{(j)}(z)$, which we call $P^{(j)}(z)$ and which equals
$$P^{(j)}(z) \;=\; G_0^{(j)}(z)\, G_0^{(j)}(z^{-1}).$$
Recall that because of the orthogonality of $g_0[n]$ with respect to even shifts, we have that $P^{(1)}(z) + P^{(1)}(-z) = 2$,
or, equivalently, that the polyphase decomposition of $P^{(1)}(z)$ is of the form
$$P^{(1)}(z) \;=\; 1 + z P_1^{(1)}(z^2).$$
This is the initial step for our induction. Now, assume that $g_0^{(j)}[n]$ is orthogonal to its
translates by $2^j$. Therefore, the polyphase decomposition of its autocorrelation can be
written as
$$P^{(j)}(z) \;=\; 1 + \sum_{i=1}^{2^j - 1} z^i P_i^{(j)}(z^{2^j}).$$
Now, because of the recursion (3.3.3), the autocorrelation of $G_0^{(j+1)}(z)$ equals
$$P^{(j+1)}(z) \;=\; P^{(j)}(z)\, P^{(1)}(z^{2^j}).$$
We need to verify that the 0th polyphase component of $P^{(j+1)}(z)$ is equal to 1, or that the
coefficients of powers of $z$ that are multiples of $2^{j+1}$ are 0. Out of the four products
that appear when multiplying out the above right-hand side, only the product involving the
polyphase components needs to be considered,
$$\sum_{i=1}^{2^j - 1} z^i P_i^{(j)}(z^{2^j}) \cdot z^{2^j} P_1^{(1)}(z^{2^{j+1}}).$$
The powers of $z$ appearing in the above product are of the form $l = i + k2^j + 2^j + m2^{j+1}$,
where $i = 1, \ldots, 2^j - 1$ and $k, m \in Z$. Thus, $l$ cannot be a multiple of $2^{j+1}$, and we have
shown that
$$P^{(j+1)}(z) \;=\; 1 + \sum_{i=1}^{2^{j+1} - 1} z^i P_i^{(j+1)}(z^{2^{j+1}}),$$
which completes the induction.
Since the expansion is orthonormal, conservation of energy holds:
$$\sum_{n} |x[n]|^2 \;=\; \sum_{k \in Z} \Big( |X^{(J)}[2k]|^2 + \sum_{j=1}^{J} |X^{(j)}[2k+1]|^2 \Big).$$
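The orthogonality relations (3.3.15)–(3.3.17) can be spot-checked for the Haar equivalent filters of Example 3.7, embedded periodically in a length-16 space. A sketch (the embedding helper is our construction):

```python
import numpy as np

def embed(g, shift, L=16):
    # place filter g at position `shift` in a length-L periodic space
    v = np.zeros(L)
    for i, c in enumerate(g):
        v[(i + shift) % L] += c
    return v

s2 = np.sqrt(2)
g1_1 = np.array([1, -1]) / s2                      # g1^(1)
g1_2 = np.array([1, 1, -1, -1]) / 2                # g1^(2)
g0_3 = np.ones(8) / (2 * s2)                       # g0^(3)

# (3.3.16): different scales are orthogonal for all shifts
for k in range(8):
    for l in range(4):
        assert abs(embed(g1_1, 2*k) @ embed(g1_2, 4*l)) < 1e-12
# (3.3.17): the depth-J lowpass is orthogonal to the highpass scales
for k in range(2):
    for l in range(4):
        assert abs(embed(g0_3, 8*k) @ embed(g1_2, 4*l)) < 1e-12
```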
$$V_0 \;=\; l_2(Z). \qquad (3.3.18)$$
$$V_J \subset \cdots \subset V_2 \subset V_1 \subset V_0. \qquad (3.3.19)$$
$$\bigcup_{j=0}^{J} V_j \;=\; V_0 \;=\; l_2(Z).$$
with $V_{j+1} \perp W_{j+1}$, where $\oplus$ denotes the direct sum (see Section 2.2.2). Assume
that there exists a sequence $g_0[n] \in V_0$ such that
$$\{g_0[n - 2k]\}_{k \in Z}$$
is a basis for $V_1$. Then, it can be shown that there exists a sequence $g_1[n] \in V_0$ such
that
$$\{g_1[n - 2k]\}_{k \in Z}$$
is a basis for $W_1$. Such a sequence is given by
$$V_0 \;=\; W_1 \oplus W_2 \oplus \cdots \oplus W_J \oplus V_J, \qquad (3.3.22)$$
[Figure: nested spectra of $V_0 \supset V_1 \supset V_2 \supset \cdots$ and the complementary bands $V_J, W_J, \ldots, W_2, W_1$ covering $[0, \pi]$, with band edges at $\pi/2^J$, $\pi/2^{J-1}$, $\ldots$, $\pi/4$, $\pi/2$, $\pi$.]
Figure 3.10 Ideal division of the spectrum by the discrete-time wavelet series
using sinc filters. Note that the spectra are symmetric around zero. Division
into $V_i$ spaces (note how $V_i \subset V_{i-1}$), and resulting $W_i$ spaces. (Actually, $V_j$
and $W_j$ are of height $2^{j/2}$, so they have unit norm.)
Because we deal with ideal filters, there is an obvious frequency interpretation. How-
ever, one has to be careful with the boundaries between intervals. With our definition of
$g_0[n]$ and $g_1[n]$, $\cos((\pi/2)n)$ belongs to $V_1$ while $\sin((\pi/2)n)$ belongs to $W_1$.
Denote the equivalent filters by $g_i^{(j)}[n]$, $i = 0, \ldots, 2^j - 1$. In other words, $g_i^{(j)}$ is
the $i$th equivalent filter going through one of the possible paths of length $j$. The
ordering is somewhat arbitrary, and we will choose the one corresponding to a full
tree with a lowpass in the lower branch of each fork, and start numbering from the
bottom.
Example 3.10
Let us find all equivalent filters in Figure 3.11, or the filters corresponding to depth-1 and
depth-2 trees. Since we will be interested in the basis functions, we consider the synthesis
filter banks. For simplicity, we do it in z-domain.
$$G_0^{(1)}(z) = G_0(z), \qquad G_1^{(1)}(z) = G_1(z),$$
$$G_0^{(2)}(z) = G_0(z)\, G_0(z^2), \qquad G_1^{(2)}(z) = G_0(z)\, G_1(z^2), \qquad (3.3.23)$$
$$G_2^{(2)}(z) = G_1(z)\, G_0(z^2), \qquad G_3^{(2)}(z) = G_1(z)\, G_1(z^2). \qquad (3.3.24)$$
Note that with the ordering chosen in (3.3.23–3.3.24), increasing index does not always cor-
respond to increasing frequency. It can be verified that for ideal filters, $G_2^{(2)}(e^{j\omega})$ covers
the range $[3\pi/4, \pi]$, while $G_3^{(2)}(e^{j\omega})$ covers the range $[\pi/2, 3\pi/4]$ (see Problem 3.16). Be-
side the identity basis, which corresponds to the no-split situation, we have four possible
orthonormal bases, corresponding to the four trees in Figure 3.11. Thus, we have a family
$W = \{W_0, W_1, W_2, W_3, W_4\}$, where $W_4$ is simply $\{\delta[n-k]\}_{k \in Z}$.
$$W_0 \;=\; \{g_0^{(2)}[n - 2^2 k],\; g_1^{(2)}[n - 2^2 k],\; g_2^{(2)}[n - 2^2 k],\; g_3^{(2)}[n - 2^2 k]\}_{k \in Z},$$
This small example should have given the intuition behind orthonormal bases
generated from tree-structured filter banks. In the general case, with filter banks of
depth J, it can be shown that, counting the no-split tree, the number of orthonormal
bases satisfies
$$M_J \;=\; M_{J-1}^2 + 1. \qquad (3.3.25)$$
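The recursion (3.3.25) grows extremely fast. A two-line sketch, assuming the initialization $M_0 = 1$ (a single leaf, which cannot be split further):

```python
# M_J = M_{J-1}^2 + 1, M_0 = 1: number of orthonormal bases of a depth-J tree
M, counts = 1, []
for J in range(1, 5):
    M = M * M + 1
    counts.append(M)
print(counts)  # [2, 5, 26, 677]
```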
Among this myriad of bases, there is the STFT-like basis, given by
$$W_0 \;=\; \{g_0^{(J)}[n - 2^J k], \ldots, g_{2^J - 1}^{(J)}[n - 2^J k]\}_{k \in Z}, \qquad (3.3.26)$$
3.4. MULTICHANNEL FILTER BANKS 163
It can be shown that the sets of basis functions in (3.3.26) and (3.3.27), as well as
in all other bases generated by the filter bank tree, are orthonormal, for example
along the lines of the proof in the discrete-time wavelet series case. However, that
approach would be quite cumbersome, so a more immediate proof is sketched here. Note that
we have a perfect reconstruction system by construction, and that the synthesis
and the analysis filters are related by time reversal. That is, the inverse operator
of the analysis filter bank (whatever its particular structure) is its transpose, or
equivalently, the overall filter bank is orthonormal. Therefore, the impulse responses
of all equivalent filters and their appropriate shifts form an orthonormal basis for
l2 (Z).
It is interesting to consider the time-frequency analysis performed by various
filter banks. This is shown schematically in Figure 3.12 for three particular cases
of binary trees. Note the different trade-offs in time and frequency resolutions.
Figure 3.13 shows a dynamic time-frequency analysis, where the time and fre-
quency resolutions are modified as time evolves. This is achieved by modifying the
frequency split on the fly [132], and can be used for signal compression as discussed
in Section 7.3.4.
Figure 3.12 Time-frequency analysis achieved by different binary subband
trees. The trees are on bottom, the time-frequency tilings on top. (a) Full tree
or STFT. (b) Octave-band tree or wavelet series. (c) Arbitrary tree or one
possible wavelet packet.
analyze such filter banks in a manner similar to Section 3.2. Therefore, the channel
signals, after filtering and sampling, can be expressed as
$$\begin{pmatrix} \vdots \\ y_0[0] \\ \vdots \\ y_{N-1}[0] \\ y_0[1] \\ \vdots \\ y_{N-1}[1] \\ \vdots \end{pmatrix} \;=\; \begin{pmatrix} \ddots & \vdots & \vdots & \\ \cdots & A_0 & 0 & \cdots \\ \cdots & 0 & A_0 & \cdots \\ & \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} \vdots \\ x[0] \\ x[1] \\ \vdots \end{pmatrix}, \qquad (3.4.1)$$
[Figure 3.14: $N$-channel analysis/synthesis filter bank. The analysis filters $H_0, H_1, \ldots, H_{N-1}$ are each followed by downsampling by $N$; the synthesis side upsamples each channel $y_i$ by $N$, filters with $G_0, G_1, \ldots, G_{N-1}$, and sums to produce $\hat{x}$.]
that is, we obtained the orthonormality relations for this case. Denoting by
$\varphi_{kN+i}[n] = g_i[n - kN]$, we have that the set of basis functions $\{\varphi_{kN+i}[n]\} =
\{g_0[n - kN], g_1[n - kN], \ldots, g_{N-1}[n - kN]\}$, with $i = 0, \ldots, N-1$ and $k \in Z$, is
an orthonormal basis for $l_2(Z)$.
Generalizations What we have seen in these two simple cases is how to obtain
$N$-channel filter banks with filters of length $N$ (block transforms) and filters of
length $2N$ (lapped orthogonal transforms). It is obvious that by allowing longer
filters, or more blocks $A_i$ in (3.4.4), we can obtain general $N$-channel filter banks.
Defining the synthesis matrix as in (3.2.7), we obtain the basis functions of the dual
basis
$$\tilde{\varphi}_{Nk+i}[n] \;=\; g_i[n - Nk],$$
and they satisfy the following biorthogonality relations:
$$T_s\, T_a \;=\; I.$$
As was done in Section 3.2, we can define single operators for each branch. If the
operator H i represents filtering by hi followed by downsampling by N , its matrix
representation is
$$H_i \;=\; \begin{pmatrix} \ddots & \vdots & & \vdots & \vdots & \\ \cdots & h_i[L-1] & \cdots & h_i[L-N] & h_i[L-N-1] & \cdots \\ \cdots & 0 & \cdots & 0 & h_i[L-1] & \cdots \\ & \vdots & & \vdots & \vdots & \ddots \end{pmatrix}.$$
$$\sum_{i=0}^{N-1} G_i^T H_i \;=\; I.$$
We leave the details and proofs of the above relationships as an exercise (Problem
3.21), since they are simple extensions of the two-channel case seen in Section 3.2.
$$Y(z) \;=\; \frac{1}{N} \sum_{i=0}^{N-1} X(W_N^i z), \qquad W_N = e^{-j2\pi/N}, \quad j = \sqrt{-1},$$
because of the orthogonality of the roots of unity. Then, the output of the system
in Figure 3.14 becomes, in a similar fashion to (3.2.14),
$$\hat{X}(z) \;=\; \frac{1}{N}\, g^T(z)\, H_m(z)\, x_m(z),$$
the first one. To obtain perfect reconstruction, this only nonzero element has to be
equal to a scaled pure delay.
As in the two-channel case, it can be shown that the perfect reconstruction
condition is equivalent to the system being biorthogonal, as given earlier. The
proof is left as an exercise for the reader (Problem 3.21). For completeness, let us
define Gm (z) as the matrix with the ith row equal to
$$X(z) \;=\; \sum_{j=0}^{N-1} z^{-j} X_j(z^N), \qquad \text{where} \quad X_j(z) \;=\; \sum_{n=-\infty}^{\infty} x[nN+j]\, z^{-n}.$$
The polyphase components of the synthesis filter $g_i$ are defined similarly, that is,
$$G_i(z) \;=\; \sum_{j=0}^{N-1} z^{-j} G_{ij}(z^N), \qquad \text{where} \quad G_{ij}(z) \;=\; \sum_{n=-\infty}^{\infty} g_i[nN+j]\, z^{-n}.$$
The analysis filters use the reverse ordering of the phases:
$$H_i(z) \;=\; \sum_{j=0}^{N-1} z^{j} H_{ij}(z^N), \qquad (3.4.7)$$
where
$$H_{ij}(z) \;=\; \sum_{n=-\infty}^{\infty} h_i[nN-j]\, z^{-n}. \qquad (3.4.8)$$
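The polyphase split is easy to verify numerically: evaluating both sides of $G_i(z) = \sum_j z^{-j} G_{ij}(z^N)$ at an arbitrary point must agree. A sketch with $N = 3$ and an invented length-9 filter (the data is for illustration only):

```python
import numpy as np

N = 3
g = np.arange(1.0, 10.0)            # an example filter g_i[n], n = 0..8

# polyphase components G_ij: G_ij collects the samples g[nN + j]
G = [g[j::N] for j in range(N)]

def evalz(c, z):
    # evaluate sum_n c[n] z^-n
    return sum(cn * z**(-n) for n, cn in enumerate(c))

z = 0.7 * np.exp(1j * 0.9)          # arbitrary test point
lhs = evalz(g, z)
rhs = sum(z**(-j) * evalz(G[j], z**N) for j in range(N))
assert np.isclose(lhs, rhs)         # G_i(z) = sum_j z^-j G_ij(z^N)
```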
Putting it all together, the output of the analysis/synthesis filter bank in Figure 3.14
can be written as
$$\hat{X}(z) \;=\; \begin{pmatrix} 1 & z^{-1} & z^{-2} & \cdots & z^{-N+1} \end{pmatrix} \cdot G_p(z^N) \cdot H_p(z^N) \cdot x_p(z^N).$$
Similarly to the two-channel case, we can define the transfer function matrix $T_p(z) = G_p(z)\, H_p(z)$. Then, the same results hold as in the two-channel case. Here, we just
state them (the proofs are $N$-channel counterparts of the two-channel ones).
(c) Given a critically sampled FIR analysis filter bank, perfect reconstruction
with FIR filters is possible if and only if det(H p (z)) is a pure delay.
Note that the modulation and polyphase representations are related via the Fourier
matrix. For example, one can verify that
$$x_p(z^N) \;=\; \frac{1}{N} \begin{pmatrix} 1 & & & \\ & z & & \\ & & \ddots & \\ & & & z^{N-1} \end{pmatrix} F\; x_m(z), \qquad (3.4.9)$$
where $F_{kl} = W_N^{kl} = e^{-j(2\pi/N)kl}$. Similar relationships hold between $H_m(z)$, $G_m(z)$
and $H_p(z)$, $G_p(z)$, respectively (see Problem 3.22). The important point to note
is that modulation and polyphase matrices are related by unitary operations (such
as $F$ and delays, as in (3.4.9)).
Orthogonal Multichannel FIR Filter Banks Let us now consider the particular
but important case when the filter bank is unitary or orthogonal. This is an ex-
tension of the discussion in Section 3.2.3 to the N-channel case. The idea is to
implement an orthogonal transform using an N-channel filter bank, or in other
words, we want the following set:
$$\{g_0[n - Nk], \ldots, g_{N-1}[n - Nk]\}, \qquad k \in Z,$$
to be an orthonormal basis for $l_2(Z)$. Then
$$\langle g_i[n - Nk],\, g_j[n - Nl] \rangle \;=\; \delta[i - j]\, \delta[l - k]. \qquad (3.4.10)$$
Since in the orthogonal case analysis and synthesis filters are identical up to a time
reversal, (3.4.10) holds for $h_i[Nk - l]$ as well. By using (2.5.19), (3.4.10) can be
expressed in the z-domain as
$$\sum_{k=0}^{N-1} G_i(W_N^k z)\, G_j(W_N^{-k} z^{-1}) \;=\; N\, \delta[i - j], \qquad (3.4.11)$$
or
$$G_{m*}^T(z^{-1})\, G_m(z) \;=\; N I,$$
where the subscript ∗ stands for conjugation of the coefficients but not of z (this is
necessary since Gm (z) has complex coefficients). Thus, as in the two-channel case,
having an orthogonal transform is equivalent to having a paraunitary modulation
matrix. Unlike the two-channel case, however, not all of the filters are obtained
from a single prototype filter.
Since modulation and polyphase matrices are related, it is easy to check that
having a paraunitary modulation matrix is equivalent to having a paraunitary
polyphase matrix, that is
$$G_{m*}^T(z^{-1})\, G_m(z) = N I \;\Longleftrightarrow\; G_p^T(z^{-1})\, G_p(z) = I. \qquad (3.4.12)$$
$$G_i\, G_j^T \;=\; \delta[i - j]\, I, \qquad i, j = 0, \ldots, N-1,$$
or
$$T_a^T\, T_a \;=\; I.$$
The above relations lead to a direct extension of Theorem 3.8, where the particular
case N = 2 was considered.
Thus, according to (3.4.12), designing an orthogonal filter bank with N channels
reduces to finding N × N paraunitary matrices. Just as in the two-channel case,
where we saw a lattice realization of orthogonal filter banks (see (3.2.60)), N ×
N paraunitary matrices can be parametrized in terms of cascades of elementary
matrices (2×2 rotations and delays). Such parametrizations have been investigated
by Vaidyanathan, and we refer to his book [308] for a thorough treatment. An
overview can be found in Appendix 3.A.2. As an example, we will see how to
construct three-channel paraunitary filter banks.
Example 3.11
We use the factorization given in Appendix 3.A.2, (3.A.8). Thus, we can express the 3 × 3
polyphase matrix as
$$G_p(z) \;=\; U_0 \prod_{i=1}^{K-1} \left[ \begin{pmatrix} z^{-1} & & \\ & 1 & \\ & & 1 \end{pmatrix} U_i \right],$$
where
$$U_0 \;=\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_{00} & -\sin\alpha_{00} \\ 0 & \sin\alpha_{00} & \cos\alpha_{00} \end{pmatrix} \begin{pmatrix} \cos\alpha_{01} & 0 & -\sin\alpha_{01} \\ 0 & 1 & 0 \\ \sin\alpha_{01} & 0 & \cos\alpha_{01} \end{pmatrix} \times \begin{pmatrix} \cos\alpha_{02} & -\sin\alpha_{02} & 0 \\ \sin\alpha_{02} & \cos\alpha_{02} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
The degrees of freedom are given by the angles $\alpha_{ij}$. To obtain the three analysis filters, we
upsample the polyphase matrix, as in the two-channel case. To design actual filters, one could minimize an objective function such as the one given in [306],
where the sum of all the stopbands was minimized.
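Paraunitarity of such a cascade can be confirmed numerically. The sketch below builds $G_p(z)$ from the rotation factorization above with arbitrarily chosen angles (the values are ours) and checks $G_p^T(z^{-1})\, G_p(z) = I$ on the unit circle:

```python
import numpy as np

def rot(a, i, j):
    # 3x3 Givens rotation in the (i, j) plane
    R = np.eye(3)
    R[i, i] = R[j, j] = np.cos(a)
    R[i, j], R[j, i] = -np.sin(a), np.sin(a)
    return R

def Gp(z, angles):
    # G_p(z) = U0 * prod_i [diag(z^-1, 1, 1) Ui], each Ui three rotations
    U = lambda a: rot(a[0], 1, 2) @ rot(a[1], 0, 2) @ rot(a[2], 0, 1)
    M = U(angles[0])
    for a in angles[1:]:
        M = M @ np.diag([1/z, 1, 1]) @ U(a)
    return M

angles = [(0.3, 0.7, -0.2), (1.1, -0.5, 0.4)]
for w in np.linspace(0.2, 3.0, 5):
    z = np.exp(1j * w)
    # paraunitary: G_p^T(z^-1) G_p(z) = I
    assert np.allclose(Gp(1/z, angles).T @ Gp(z, angles), np.eye(3))
```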
[Figure: noncritically sampled $N$-channel analysis bank, with filters $H_0, H_1, \ldots, H_{N-1}$ each followed by downsampling by $M$, where $N > M$.]
T HEOREM 3.17
There are no finite-support bases with filters as in (3.4.13) (except trivial ones
with only N nonzero coefficients).
P ROOF
The proof consists in analyzing the polyphase matrix $H_p(z)$. Write the prototype filter
$H_{pr}(z)$ in terms of its polyphase components (see (3.4.7–3.4.8))
$$H_{pr}(z) \;=\; \sum_{j=0}^{N-1} z^j H_{pr_j}(z^N),$$
so that $H_p(z)$ is the product of $F$, with $F_{kl} = W_N^{kl} = e^{-j(2\pi/N)kl}$, and a diagonal matrix of the polyphase components. For FIR perfect reconstruction, the determinant of $H_p(z)$
has to be a delay (by Theorem 3.16). Now,
$$\det(H_p(z)) \;=\; c \prod_{j=0}^{N-1} H_{pr_j}(z),$$
where $c$ is a complex number equal to $\det(F)$. Therefore, for perfect FIR reconstruction,
each $H_{pr_j}(z)$ has to be of the form $\alpha_j\, z^{-m_j}$, that is, the prototype filter has exactly $N$ nonzero
coefficients. For an orthogonal solution, the $\alpha_j$'s have to be unit-norm constants.
What happens if we relax the FIR requirement? For example, one can choose
the following prototype:
$$H_{pr}(z) \;=\; \sum_{i=0}^{N-1} P_i(z^N)\, z^i, \qquad (3.4.16)$$
where the $P_i(z)$ are allpass filters. The factorization (3.4.15) still holds, with $H_{pr_i}(z) = P_i(z)$, and since $P_i(z^{-1})\, P_i(z) = 1$, $H_p(z)$ is paraunitary. While this gives an
orthogonal modulated filter bank, it is IIR (either analysis or synthesis will be
noncausal), and the quality of the filter in (3.4.16) can be poor.
Cosine Modulated Filter Banks The problems linked to complex modulated fil-
ter banks can be solved by using appropriate cosine modulation. Such cosine-
modulated filter banks are very important in practice, for example in audio com-
pression (see Section 7.2.2). Since they are often of length $L = 2N$ (where $N$ is the
downsampling rate), they are sometimes referred to as modulated LOTs, or MLTs.
A popular version was proposed in [229] and is thus called the Princen-Bradley filter
bank. We will study one class of cosine modulated filter banks in some depth, and
refer to [188, 308] for a more general and detailed treatment. The cosine modulated
filter banks we consider here are a particular case of pseudoquadrature mirror filter
banks (PQMF) when the filter length is restricted to twice the number of channels
L = 2N . Pseudo QMF filters have been proposed as an extension to N channels
of the classical two-channel QMF filters. Pseudo QMF analysis/synthesis systems
achieve in general only cancellation of the main aliasing term (aliasing from neigh-
boring channels). However, when the filter length is restricted to L = 2N , they
can achieve perfect reconstruction. Due to the modulated structure and just as in
the STFT case, there are fast computational algorithms, making such filter banks
attractive for implementations.
A family of PQMF filter banks that achieves cancellation of the main aliasing
[Figure 3.16: (a) impulse responses of the first four modulated filters ($k = 0, 1, 2, 3$); (b) magnitude responses of the modulated filters, in dB.]
Example 3.12
Consider the case N = 8. The center frequency of the modulated filter hk [n] is (2k+1)2π/32,
and since this is a cosine modulation and the filters are real, there is a mirror lobe at
(32 − 2k − 1)2π/32. For the filters h0 [n] and h7 [n], these two lobes overlap to form a single
lowpass and highpass, respectively, while $h_1[n], \ldots, h_6[n]$ are bandpass filters. A possible
symmetric window of length 16 satisfying (3.4.24) is given in Table 3.4, while the impulse
responses of the first four filters as well as the magnitude responses of all the modulated
filters are given in Figure 3.16.
Note that orthogonal cosine modulated filter banks have recently been generalized
to lengths $L = KN$, where $K$ can be larger than 2. For more details,
refer to [159, 188, 235, 308].
and perfect reconstruction is easily achievable. For example, in the FIR case, if
$H_0(z)$ and $H_1(z)$ have no zeros in common (that is, the polynomials in $z^{-1}$ are
coprime), then one can use Euclid's algorithm [32] to find $G_0(z)$ and $G_1(z)$ such
that
$$G_0(z)\, H_0(z) + G_1(z)\, H_1(z) \;=\; 1$$
is satisfied, leading to $\hat{X}(z) = X(z)$ in (3.5.1). Note how coprimeness of $H_0(z)$ and
H1 (z), used in Euclid’s algorithm, is also a very natural requirement in terms of
signal processing. A common zero would prohibit FIR reconstruction, or even IIR
reconstruction (if the common zero is on the unit circle). Another case appears
when we have two filters G0 (z) and G1 (z) which have unit norm and satisfy
Writing this in time domain (see Example 5.2), we realize that the set {gi [n − k]},
i = 0, 1, and k ∈ Z, forms a tight frame for l2 (Z) with a redundancy factor R = 2.
The fact that {gi [n − k]} form a tight frame simply means that they can uniquely
represent any sequence from l2 (Z) (see also Section 5.3). However, the basis vectors
are not linearly independent and thus they do not form an orthonormal basis. The
redundancy factor indicates the oversampling rate; we can indeed check that it is
two in this case, that is, there are twice as many basis functions as actually needed
to represent sequences from $l_2(Z)$. This is easily seen if we remember that until
now we needed only the even shifts of gi [n] as basis functions, while now we use the
odd shifts as well. Also, the expansion formula in a tight frame is similar to that in
the orthogonal case, except for the redundancy (which means the functions in the
expansion are not linearly independent). There is an energy conservation relation,
or Parseval’s formula, which says that the energy of the expansion coefficients equals
R times the energy of the original. In our case, calling yi [n] the output of the filter
hi [n], we can verify (Problem 3.26) that
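The factor-of-two energy relation is easy to observe with the Haar pair, which is power complementary. A sketch (nondownsampled analysis realized as circular convolution; the implementation details are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(32)

h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # power-complementary pair (Haar)
h1 = np.array([1.0, -1.0]) / np.sqrt(2)

def cconv(h, x):
    # periodic convolution with no downsampling (the redundant expansion)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = sum(h[k] * x[(n - k) % len(x)] for k in range(len(h)))
    return y

y0, y1 = cconv(h0, x), cconv(h1, x)
# tight frame with redundancy R = 2: ||y0||^2 + ||y1||^2 = 2 ||x||^2
assert np.isclose(y0 @ y0 + y1 @ y1, 2 * (x @ x))
```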
To design such a tight frame for $l_2(Z)$ based on filter banks, that is, to find solutions
to (3.5.2), one can find a unit-norm$^{12}$ filter $G_0(z)$ which satisfies
that is, G0 (z) and G1 (z) are power complementary. Note that (3.5.2) is less restric-
tive than the usual orthogonal solutions we have seen in Section 3.2.3. For example,
odd-length filters are possible.
Of course, one can iterate such nondownsampled two-channel filter banks, and
get more general solutions. In particular, by adding two-channel nondownsampled
filter banks with filters $H_0(z^2)$, $H_1(z^2)$ to the lowpass analysis channel and iter-
ating (raising $z$ to the appropriate power), one can devise a discrete-time wavelet
12 Note that the unit norm requirement is not necessary for constructing a tight frame.
3.5. PYRAMIDS AND OVERCOMPLETE EXPANSIONS 181
[Figure 3.17: Pyramid scheme. The original signal (in V0) is filtered by H̃0 and downsampled by 2, yielding the coarse version (in V1); upsampling by 2 and filtering by H0 gives a prediction, and the difference with the original is the difference signal (in W1).]
of the same scale as the original. Also, a successive approximation flavor is easily
seen: One could start with the coarse version at level J, and by adding difference
signals, obtain versions at levels J − 1, . . . , 1, 0, (that is, the original).
An advantage of the pyramid scheme in image coding is that nonlinear inter-
polation and decimation operators can be used. A disadvantage, however, as we
have already mentioned, is that the scheme is oversampled, although the overhead
in number of samples decreases as the dimensionality increases. In n dimensions,
oversampling s as a function of the number of levels L in the pyramid is given by
s = Σ_{i=0}^{L−1} (1/2^n)^i < 2^n / (2^n − 1), (3.5.4)
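A few lines suffice to tabulate (3.5.4); a small sketch (the function name is ours):

```python
def pyramid_oversampling(n_dims: int, n_levels: int) -> float:
    """Total number of samples of an L-level pyramid relative to the
    original signal: s = sum_{i=0}^{L-1} (1/2^n)^i."""
    ratio = 1.0 / 2 ** n_dims
    return sum(ratio ** i for i in range(n_levels))

# The overhead shrinks quickly with dimension: s -> 2 in one dimension,
# but s -> 4/3 in two dimensions and s -> 8/7 in three.
for n in (1, 2, 3):
    assert pyramid_oversampling(n, 10) < 2 ** n / (2 ** n - 1)
```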
d = (I − H0^T H0) x.
I − H0^T H0 = H1^T H1,
that is, d is the projection onto the space spanned by {h1 [2k − n], k ∈ Z}. Therefore,
we can filter and downsample d by 2, since
H1 H1^T H1 = H1.
In that case, the redundancy of d is removed (d is now critically sampled) and the
pyramid is equivalent to an orthogonal subband coding system.
The signal d can be reconstructed by upsampling by 2 and filtering with h1 [n].
Then we have
H1^T (H1 H1^T H1) x = H1^T H1 x = d,
and this, added to x̄ = H0^T H0 x, is indeed equal to x. In the notation of the
multiresolution scheme the prediction x̄ is the projection onto the space V1 and d
is the projection onto W1 . This is indicated in Figure 3.17. We have thus shown
that pyramidal schemes can be critically sampled as well, that is, in Figure 3.17 the
difference signal can be followed by a filter h1 [n] and a downsampler by 2 without
any loss of information.
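The equivalence just shown can be illustrated numerically. A minimal sketch with the Haar pair (an illustrative orthogonal choice): the difference signal d is filtered by h1 and downsampled by 2, and the two critically sampled channels still reconstruct x exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
s = np.sqrt(2)

# Orthogonal Haar pair h0 = [1, 1]/sqrt(2), h1 = [1, -1]/sqrt(2).
coarse = (x[0::2] + x[1::2]) / s     # H0 x: filter by h0, keep even shifts
xbar = np.repeat(coarse, 2) / s      # H0^T coarse: prediction (projection on V1)
d = x - xbar                         # difference signal (projection on W1)

# Critically sample the difference: filter by h1 and downsample by 2.
d1 = (d[0::2] - d[1::2]) / s         # H1 d

# Reconstruction from the two critically sampled channels.
xhat = xbar.copy()
xhat[0::2] += d1 / s
xhat[1::2] -= d1 / s
assert np.allclose(xhat, x)
```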
Note that we assumed an orthogonal filter and no quantization of the coarse
version. The benefit of the oversampled pyramid comes from the fact that arbitrary
filters (including nonlinear ones) can be used, and that quantization of the coarse
version does not influence perfect reconstruction (see Section 7.3.2).
This scheme is very popular in computer vision, not so much because perfect
reconstruction is desired but because it is a computationally efficient way to obtain
multiple resolutions of an image. As a lowpass filter, an approximation to a Gaussian,
bell-shaped filter is often used; because the difference signal resembles the
original filtered by the Laplace operator, such a scheme is usually called a Laplacian
pyramid.
[Figure: Oversampled filter bank computing a running convolution: the input x passes through analysis filters H0, . . . , HN−1 with sampling rate changes, the channel signals are multiplied by C0, . . . , CN−1, and the outputs of the synthesis filters G0, . . . , GN−1 are summed to form x̂.]
Generalizations The above two schemes are examples from a general class of
oversampled filter banks which compute running convolution. For example, the
pointwise multiplication in the above schemes can be replaced by a true convolu-
tion and will result in a longer overall convolution if adequately chosen. Another
possibility is to use analysis and synthesis filters based on fast convolution algo-
rithms other than Fourier ones. For more details, see [276, 317] and Section 6.5.1.
are obtained in this way, leading to fairly constrained designs (nonseparable filters of
size N1 ×N2 would offer N1 ·N2 free design variables versus N1 +N2 in the separable
case). Then, only rectangular divisions of the spectrum are possible, though one
might need divisions that would better capture the signal’s energy concentration
(for example, close to circular).
Choosing nonseparable solutions, while solving some of these problems, comes
at a price: the design is more difficult, and the complexity is substantially higher.
The first step toward using multidimensional techniques on multidimensional
signals is to use the same kind of sampling as before (that is, in the case of an im-
age, sample first along the horizontal and then along the vertical dimension), but use
nonseparable filters. A second step consists in using nonseparable sampling as well
as nonseparable filters. This calls for the development of a new theory that starts
by pointing out the major difference between one- and multidimensional cases —
sampling. Sampling in multiple dimensions is represented by lattices. An excellent
presentation of lattice sampling can be found in the tutorial by Dubois [86] (Ap-
pendix 3.B gives a brief overview). Filter banks using nonseparable downsampling
were studied in [11, 314]. The generalization of one-dimensional analysis methods
to multidimensional filter banks using lattice downsampling was done in [155, 325].
The topic has been quite active recently (see [19, 47, 48, 160, 257, 264, 288]).
In this section, we will give an overview of the field of multidimensional filter
banks. We will concentrate mostly on two cases: the separable case with down-
sampling by 2 in two dimensions, and the quincunx case, that is, the simplest
multidimensional nonseparable case with overall sampling density of 2. Both of
these cases are of considerable practical interest, since these are the ones mostly
used in image processing applications.
Figure 3.19 Separable filter bank in two dimensions, with separable downsam-
pling by 2. (a) Cascade of horizontal and vertical decompositions. (b) Division
of the frequency spectrum into the LL, LH, HL, and HH bands.
Figure 3.20 Two often used lattices. (a) Separable sampling by 2 in two
dimensions. (b) Quincunx sampling.
nonseparable filters. In other words, one could have a direct four-channel implementation
of Figure 3.19 where the four filters could be H0, H1, H2, H3. While before,
Hi (z1, z2) = Hi1 (z1) Hi2 (z2), where the Hij (z) are one-dimensional filters, Hi (z1, z2) is now a true
two-dimensional filter. This solution, while more general, is more complex to design and
implement. It is possible to obtain an orthogonal linear phase FIR solution [155, 156], which
cannot be achieved using separable filters (see Example 3.15 below).
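The separable four-band decomposition of Figure 3.19 can be sketched in a few lines (Haar filters are an illustrative choice; even-length input assumed):

```python
import numpy as np

def haar_split(a, axis):
    """Two-channel Haar analysis along one axis (even length assumed)."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))

lo, hi = haar_split(x, axis=1)       # horizontal split
ll, lh = haar_split(lo, axis=0)      # vertical split of the lowpass channel
hl, hh = haar_split(hi, axis=0)      # vertical split of the highpass channel

# Orthogonality: the four subbands conserve energy (Parseval).
assert np.isclose(np.sum(x**2),
                  sum(np.sum(b**2) for b in (ll, lh, hl, hh)))
```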
xm (z1, z2) = ( X(z1, z2)   X(−z1, z2)   X(z1, −z2)   X(−z1, −z2) ).

Y(z1, z2) = (1/2) ( G0 (z1, z2)   G1 (z1, z2) )
            ( H0 (z1, z2)   H0 (−z1, −z2) ; H1 (z1, z2)   H1 (−z1, −z2) )
            ( X(z1, z2) ; X(−z1, −z2) ).
Similarly to the one-dimensional case, it can be verified that the orthogonality of the system
is achieved when the lowpass filter satisfies
H0 (z1, z2) H0 (z1^{-1}, z2^{-1}) + H0 (−z1, −z2) H0 (−z1^{-1}, −z2^{-1}) = 2, (3.6.4)
that is, the lowpass filter is orthogonal to its shifts on the quincunx lattice. Then, a possible
highpass filter is given by
H1 (z1, z2) = −z1^{-1} H0 (−z1^{-1}, −z2^{-1}). (3.6.5)
The synthesis filters are the same (within shift reversal, or Gi (z1, z2) = Hi (z1^{-1}, z2^{-1})). In
polyphase domain, define the two polyphase components of the filters as
Hi0 (z1, z2) = Σ_{(n1,n2)∈Z^2} hi [n1 + n2, n1 − n2] z1^{-n1} z2^{-n2},
Hi1 (z1, z2) = Σ_{(n1,n2)∈Z^2} hi [n1 + n2 + 1, n1 − n2] z1^{-n1} z2^{-n2},
with
Hi (z1, z2) = Hi0 (z1 z2, z1 z2^{-1}) + z1^{-1} Hi1 (z1 z2, z1 z2^{-1}).
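The quincunx polyphase relation H(z1, z2) = H0(z1 z2, z1 z2^{-1}) + z1^{-1} H1(z1 z2, z1 z2^{-1}) can be verified numerically. A sketch (the 4 × 4 filter support is an arbitrary illustrative choice) evaluating both sides on the unit bicircle:

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.standard_normal((4, 4))    # h[m1, m2], support 0 <= m1, m2 <= 3

def H(z1, z2):
    # H(z1, z2) = sum h[m1, m2] z1^{-m1} z2^{-m2}
    return sum(h[m1, m2] * z1**(-m1) * z2**(-m2)
               for m1 in range(4) for m2 in range(4))

def H0(z1, z2):
    # Even coset: m1 = n1 + n2, m2 = n1 - n2 (m1 + m2 even).
    return sum(h[n1 + n2, n1 - n2] * z1**(-n1) * z2**(-n2)
               for n1 in range(-4, 5) for n2 in range(-4, 5)
               if 0 <= n1 + n2 < 4 and 0 <= n1 - n2 < 4)

def H1(z1, z2):
    # Odd coset: m1 = n1 + n2 + 1, m2 = n1 - n2 (m1 + m2 odd).
    return sum(h[n1 + n2 + 1, n1 - n2] * z1**(-n1) * z2**(-n2)
               for n1 in range(-5, 5) for n2 in range(-5, 5)
               if 0 <= n1 + n2 + 1 < 4 and 0 <= n1 - n2 < 4)

for w1, w2 in rng.uniform(0, 2 * np.pi, (5, 2)):
    z1, z2 = np.exp(1j * w1), np.exp(1j * w2)
    lhs = H(z1, z2)
    rhs = H0(z1 * z2, z1 / z2) + z1**(-1) * H1(z1 * z2, z1 / z2)
    assert abs(lhs - rhs) < 1e-9
```

The change of variables maps the even coset (n1 + n2 even) to H0 and the odd coset to H1, so the identity is exact.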
The results on alias cancellation and perfect reconstruction are very similar to
their one-dimensional counterparts. For example, perfect reconstruction with FIR
filters is achieved if and only if the determinant of the analysis polyphase matrix is
a monomial, that is, det Hp (z1, z2) = c z1^{-k1} z2^{-k2}.
Cascade Structures When synthesizing filter banks, one of the most obvious
approaches is to try to find cascade structures that would generate filters of the
desired form. This is because with cascade structures (a) the complexity is usually low,
(b) higher-order filters are easily derived from lower-order ones, and (c) the coefficients
can be quantized without affecting the desired form. However, unlike in
one dimension, there are very few results on completeness of cascade structures in
multiple dimensions.
While cascades of orthogonal building blocks (that is, orthogonal matrices and
diagonal delay matrices) obviously will yield orthogonal filter banks, producing
linear phase solutions needs more care. For example, one can make use of the
linear phase testing condition given in [155] or [163] to obtain possible cascades.
As one of the possible approaches consider the generalization of the linear phase
cascade structure proposed in [155, 156, 321]. Suppose that a linear phase system
has been already designed and a higher-order one is needed. Choosing
H′p (z) = Ri D(z) Hp (z),
In the above, D is the matrix of delays containing ( 1   z1^{-1}   z2^{-1}   (z1 z2)^{-1} ) along the
diagonal, and Ri and S0 are scalar persymmetric matrices, that is, they satisfy
Ri = J Ri J . (3.6.6)
Equation (3.6.6) along with the requirement that the Ri be unitary, allows one to design fil-
ters being both linear phase and orthogonal. Recall that in the two-channel one-dimensional
case these two requirements are mutually exclusive, thus one cannot design separable filters
satisfying both properties in this four-channel two-dimensional case. This shows how using
a true multidimensional solution offers greater freedom in design. To obtain both linear
phase and orthogonality, one has to make sure that, on top of being persymmetric, matrices
Ri have to be unitary as well. These two requirements lead to
Ri = (1/2) ( I  0 ; 0  J ) ( I  I ; I  −I ) ( R2i  0 ; 0  R2i+1 ) ( I  I ; I  −I ) ( I  0 ; 0  J ),
This cascade is a two-dimensional counterpart of the one given in [275, 321], and will be
shown to be useful in producing regular wavelets being both linear phase and orthonormal
[165] (see Chapter 4).
For the filters to be orthogonal the matrices R2i, R2i+1 have to be unitary. To be linear
phase, the matrices have to be symmetric. In the latter case the filters obtained will have opposite
symmetry. Consider, for example, the orthogonal case. The smallest lowpass filter obtained
from the above cascade would be
h0 [n1, n2] = ⎛ −a1        −a0 a1              ⎞
              ⎜ −a2        −a0 a2    −a0    1  ⎟ ,      (3.6.7)
              ⎝ a0 a1 a2   −a1 a2              ⎠
where ai are free variables, and h0 [n1 , n2 ] is denormalized for simplicity. The highpass filter
is obtained by modulation and time reversal (see (3.6.5)). This filter, with some additional
constraints, will be shown to be the smallest regular two-dimensional filter (the counterpart
of the Daubechies’ D2 filter [71]). Note that this cascade has its generalization in more than
two dimensions (its one-dimensional counterpart is the lattice structure given in (3.2.60)).
H(ω) = Σ_{n=0}^{L} a[n] cos(nω),
where a[0] = h[0] and a[n] = 2h[n], n ≠ 0. Using Tchebycheff polynomials, one can
replace cos(nω) by Tn [cos(ω)], where Tn [.] is the nth Tchebycheff polynomial, and
thus H(ω) can be written as a polynomial in cos(ω)
H(ω) = Σ_{n=0}^{L} a[n] Tn [cos(ω)].
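The Tchebycheff substitution cos(nω) = Tn[cos(ω)] is easy to check with NumPy's Chebyshev module; a small sketch:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# T_n evaluated at cos(w) equals cos(n w) for every n.
w = np.linspace(0.0, np.pi, 7)
for n in range(6):
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                  # the nth Tchebycheff polynomial T_n
    assert np.allclose(C.chebval(np.cos(w), coeffs), np.cos(n * w))
```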
problem.
The dual problem is to start from some components and to synthesize a signal
from which the components can be recovered. This has some important appli-
cations, in particular in telecommunications. For example, several users share a
common channel to transmit information. Two obvious ways to solve the problem
are to either multiplex in time (each user receives a time slot out of a period) or
multiplex in frequency (each user gets a subchannel). In general, the problem can
be seen as one of designing (orthogonal) functions that are assigned to the different
users within a time window so that each user can use “his” function for signal-
ing (for example, by having it on or off). Since the users share the channel, the
functions are added together, but because of orthogonality,13 each user can mon-
itor “his” function at the receiving end. The next time period looks exactly the
same. Therefore, the problem is to design an orthogonal set of functions over a
window, possibly meeting some boundary constraints as well. Obviously, time- and
frequency-division multiplexing are just two particular cases.
Because of the fact that the system is invariant to shifts by a multiple of the
time window, it is also clear that, in discrete time, this is a multirate filter bank
problem. Below, we describe briefly the analysis of such systems, which is very
similar to its dual problem, as well as some applications.
[Figure: Transmultiplexer. (a) Each input x0, . . . , xN−1 is upsampled, filtered by Gi, and summed into the channel signal y; at the receiver, filtering by Hi and downsampling yields x̂i. (b) Polyphase view: a delay chain z^{-1}, . . . , z^{-N+1} feeds the synthesis polyphase matrix Gp, followed by the analysis polyphase matrix Hp and an advance chain z, . . . , z^{N−1}.]
PROPOSITION 3.18
In a transmultiplexer with polyphase matrices H p (z) and Gp (z), the following
holds:
The above result holds for any M and N . One can show that M ≥ N is a necessary
condition for crosstalk cancellation and perfect reconstruction. In the critical sam-
pling case, or M = N , there is a simple duality result between transmultiplexers
and analysis/synthesis systems seen earlier.
PROPOSITION 3.19
In the critically sampled case (number of channels equal to sampling rate
change), a perfect reconstruction subband coding system is equivalent to a
perfect reconstruction transmultiplexer.
PROOF
Since Gp (z)H p (z) = I and they are square, it follows that H p (z)Gp (z) = I as well.
Therefore, the design of perfect subband coding systems and of perfect transmul-
tiplexers is equivalent, at least in theory. A problem in the transmultiplexer case
is that the channel over which y is transmitted can be far from ideal. In order to
highlight the potential problem, consider the following simple case: Multiplex two
signals X0 (z) and X1 (z) by upsampling by 2, delaying the second one by one sample and
adding them. This gives a channel signal
Y(z) = X0 (z^2) + z^{-1} X1 (z^2).
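In time domain this multiplexing is simple interleaving; a sketch (ideal channel assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
x0 = rng.standard_normal(16)
x1 = rng.standard_normal(16)

# Y(z) = X0(z^2) + z^{-1} X1(z^2): upsample both by 2,
# delay the second branch by one sample, and add.
u0 = np.zeros(32); u0[0::2] = x0
u1 = np.zeros(32); u1[0::2] = x1
y = u0 + np.concatenate(([0.0], u1[:-1]))

# With an ideal channel, recovery is plain de-interleaving
# (downsampling y and its advanced version by 2).
assert np.allclose(y[0::2], x0)
assert np.allclose(y[1::2], x1)
```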
behavior. Since a filter bank computes a form of frequency analysis, subband adap-
tive filtering is a version of frequency-domain adaptive filtering. See [263] for an
excellent overview on the topic.
We will briefly discuss a simple example. Assume that a filter with z-transform
F (z) is to be implemented in the subbands of a two-channel perfect reconstruction
filter bank with critical sampling. Then, it can be shown that the channel transfer
function between the analysis and synthesis filter banks, C(z), is not diagonal in
general [112]. That is, one has to estimate four components, two direct channel
components, and two crossterms. These components can be relatively short (es-
pecially the crossterms) and run at half the sampling rate, and thus, the scheme
can be computationally attractive. Yet, the crossterms turn out to be difficult to
estimate accurately (they correspond to aliasing terms). Therefore, it is more in-
teresting to implement an oversampled system, that is, decompose into N channels
and downsample by M < N . Then, the matrix C(z) can be well approximated by
a diagonal matrix, making the estimation of the components easier. We refer to
[112, 263], and to references therein for more details and discussions of applications
such as acoustic echo cancellation.
where the superscript ∗ stands for hermitian conjugation (note that H^∗(e^{jω}) =
H_∗^T(e^{-jω})). For the scalar case (single input/single output), lossless transfer
functions are allpass filters given by [211]
F(z) = a(z) / ( z^{-k} a_∗(z^{-1}) ), (3.A.1)
where k = deg(a(z)) (possibly, there is a multiplicative delay and scaling factor
equal to c z^{-k}). Thus, to any zero at z = a corresponds a pole at z = 1/a∗, that
is, at a mirror location with respect to the unit circle. This guarantees a perfect
transmission at all frequencies (in amplitude) and only phase distortion. It is easy
to verify that (3.A.1) is lossless (assuming all poles inside the unit circle) since
F_∗(z^{-1}) F(z) = [ a_∗(z^{-1}) / ( z^{k} a(z) ) ] · [ a(z) / ( z^{-k} a_∗(z^{-1}) ) ] = 1.
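A quick numerical check of the allpass property, with the illustrative choice a(z) = 1 + 2z^{-1} (zero at z = −2, mirrored pole at z = −1/2, inside the unit circle):

```python
import numpy as np

# F(z) = a(z) / (z^{-k} a_*(z^{-1})) with a(z) = 1 + 2 z^{-1}, k = 1:
# numerator 1 + 2 z^{-1}, denominator z^{-1} + 2 (pole at z = -1/2).
w = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
z = np.exp(1j * w)
F = (1 + 2 / z) / (1 / z + 2)

# Unit magnitude at all frequencies: only phase distortion remains.
assert np.allclose(np.abs(F), 1.0)
```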
Obviously, nontrivial scalar allpass functions are IIR, and are thus not linear phase.
Interestingly, matrix allpass functions exist that are FIR, and linear phase behavior
is possible. Trivial examples of matrix allpass functions are unitary matrices, as
well as diagonal matrices of delays.
Suppose now that L0 and L1 are not coprime, and call their common factor P(z), that is,
L0 (z) = P(z)L0′(z), L1 (z) = P(z)L1′(z). Substituting this into (3.A.3) gives a left side
which goes to 0 at all zeros of P(z), contradicting the fact that the right side is identically 1.
Consider (3.A.4). Since L0 and L1, as well as L2 and L3, are coprime, we have that
L3 (z) = C1 z^{-K} L̃0 (z) and L2 (z) = C2 z^{-K′} L̃1 (z), where K and K′ are large enough integers
to make L3 and L2 causal. Take now (3.A.5). This implies that K = K′ and C1 = −C2.
Finally, (3.A.3) or (3.A.6) imply that C1 = ±1.
A very important point is that the above structure is complete, that is, all orthog-
onal systems with filters of length 2K can be generated in this fashion. The lattice
factorization was given in Figure 3.6.
Figure 3.22 Factorization of a lossless matrix using Givens rotations (after
[306]). (a) General lossless transfer matrix H(z) of size N × N. (b) Constrained
orthogonal matrix for U1, . . . , UK−1, where each cross represents a rotation as
in (3.A.7).
where U1, . . . , UK−1 are special orthogonal matrices as given in Figure 3.22(b) (each
cross is a rotation as in (3.A.7)). U0 is a general orthogonal matrix as given in
Figure 2.13 with n = N, and D(z) are delay matrices of the form
Such a general, real, lossless, FIR, N-input N-output system is shown in Fig-
ure 3.22(a). Figure 3.22(b) indicates the form of the matrices U1, . . . , UK−1. Note
that U0 is characterized by N(N − 1)/2 rotations [202] while the other orthogonal matrices
where a is the diagonal matrix of symmetries (+1 for a symmetric filter and −1
for an antisymmetric filter), L is the filter length and J is an antidiagonal matrix.
Note that there exist linear phase systems which cannot be described by (3.A.9)
but many useful solutions do satisfy it. The cascade is given by
Hp (z) = S P W [ Π_{i=K−1}^{1} ( U2i  0 ; 0  U2i+1 ) W D(z) ] W ( U0  0 ; 0  U1 ) W P,
where
S = (1/√2) ( S0  0 ; 0  S1 ) ( I  J ; I  −J )
is a unitary matrix. S0, S1 are unitary matrices of size N/2,
P = ( I  0 ; 0  J ),   W = (1/√2) ( I  I ; I  −I ),   D(z) = ( I  0 ; 0  z^{-1} I ),
and U i are all size-(N/2) unitary matrices. Note that all subblocks in the above
matrices are of size N/2. In the same paper [275], the authors develop a cascade
structure for filter banks with an odd number of channels as well.
is unitary. This gives another way to parametrize lossless transfer function matrices.
In particular, H(z) will be FIR if A is lower triangular with a zero diagonal, and
thus, it is sufficient to find orthogonal matrices with an upper right triangular corner
of size K − 1 with only zeros to find all lossless transfer matrices of a given size and
degree [85].
points on the sampling lattice are kept while all the others are discarded. The
time- and Fourier-domain expressions for the output of a downsampler are given
by [86, 325]
y[n] = x[Dn],
Y(ω) = (1/N) Σ_{k∈U_c^t} X( (D^t)^{-1} (ω − 2πk) ),
Its Voronoi cell is a square and the corresponding critically sampled filter bank will have
N = det(D) = 4 channels. This is the case most often used in practice in image coding,
since it represents separable one-dimensional treatment of an image. Looking at it this way
(in terms of lattices), however, will give us the additional freedom to design nonseparable
filters even if sampling is separable. The expression for upsampling in this case is
Since its determinant equals 2, the corresponding critically sampled filter bank will have two
channels. The Voronoi cell for this lattice is a diamond (tilted square). Since the reciprocal
lattice for this case is again quincunx, its Voronoi cell will have the same diamond shape.
This fact has been used in some image and video coding schemes [12, 320] since, if restricted
to this region, (a) the spectra of the signal and its repeated copies that appear due
to sampling will not overlap and (b) since the human eye is less sensitive
to resolution along diagonals, it is more appropriate for the lowpass filter to have diagonal
cutoff. Note that the two vectors belonging to the unit cell are
n0 = ( 0, 0 )^t,   n1 = ( 1, 0 )^t,
while their z-domain counterparts are 1 and z1^{-1} and are the same for the unit cell of the
transposed lattice. Shifting the origin of the quincunx lattice to points determined by the
unit cell vectors yields the two cosets for this lattice. Obviously, their union gives back the
original lattice. Write now the expression for the output of an upsampler in Fourier domain
Y (ω1 , ω2 ) = X(ω1 + ω2 , ω1 − ω2 ).
Y(ω1, ω2) = (1/2) ( X(ω1, ω2) + X(ω1 + π, ω2 + π) ).
It is easy to see that all the samples at locations where (n1 + n2 ) is even are kept, while
where (n1 + n2 ) is odd, they are put to zero.
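This down/up operation on the quincunx lattice is easy to verify with a 2-D DFT; a sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 8
x = rng.standard_normal((N, N))

# Down- then upsampling on the quincunx lattice: keep (n1 + n2) even.
n1, n2 = np.indices((N, N))
y = np.where((n1 + n2) % 2 == 0, x, 0.0)

# Equivalent modulation form: y = (x + (-1)^{n1+n2} x) / 2, so the DFT
# of y is the average of X and its copy shifted by (pi, pi).
X = np.fft.fft2(x)
Y = np.fft.fft2(y)
assert np.allclose(Y, 0.5 * (X + np.roll(X, (N // 2, N // 2), axis=(0, 1))))
```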
PROBLEMS
3.1 Orthogonality and completeness of the sinc basis (Section 3.1.3):
3.3 Show that Proposition 3.3 does not hold in the nonorthogonal case, that is, there exist
nonorthogonal time-invariant expansions with frequency selectivity.
3.5 Based on the fact that in an orthogonal FIR filter bank, the autocorrelation of the lowpass
filter satisfies P (z) + P (−z) = 2, show that the length of the filter has to be even.
3.6 For A(z) = (1 + z)^3 (1 + z^{-1})^3, verify that B(z) = (1/256)(3z^2 − 18z + 38 − 18z^{-1} + 3z^{-2}) is the
solution such that P(z) = A(z) B(z) is valid. If you have access to adequate software (for
example, Matlab), do the spectral factorization (obviously, only B(z) needs to be factored).
Give the filters of this orthogonal filter bank.
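The validity of P(z) = A(z)B(z), that is, the halfband condition P(z) + P(−z) = 2, can be confirmed with a short computation (an illustrative check, not part of the problem statement):

```python
import numpy as np

# A(z) = (1+z)^3 (1+z^{-1})^3 = z^{-3}(1+z)^6: binomial coefficients,
# carrying powers z^3 .. z^-3.
A = np.array([1., 6., 15., 20., 15., 6., 1.])
# B(z) = (3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2}) / 256, powers z^2 .. z^-2.
B = np.array([3., -18., 38., -18., 3.]) / 256

P = np.convolve(A, B)          # powers z^5 .. z^-5, center index 5
# Halfband condition P(z) + P(-z) = 2: every even power of z must
# vanish except z^0, whose coefficient must be 1.
assert np.allclose(P[1::2], [0., 0., 1., 0., 0.])
```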
3.8 Prove the three statements on the structure of linear phase solutions given in Proposition
3.11. Hint: Use P (z) = H0 (z) G0 (z) = z −k H0 (z) H1 (−z), and determine when it is valid.
3.9 Show that, when the filters H0 (z) and H1 (z) are of the same length and linear phase, the
linear phase testing condition given by (3.2.69), holds. Hint: Find out the form of the
polyphase components of each linear phase filter.
3.10 In Proposition 3.12, it was shown that there are no real symmetric/antisymmetric orthogonal
FIR filter banks.
(a) Show that if the filters can be complex valued, then solutions exist.
(b) For length-6 filters, find the solution with a maximum numbers of zeros at ω = π.
Hint: Refactor the P (z) that leads to the D3 filter into complex-valued symmet-
ric/antisymmetric filters.
3.11 Spectral factorization method for two-channel filter banks: Consider the factorization of P (z)
in order to obtain orthogonal or biorthogonal filter banks.
(a) Take
P(z) = −(1/4)z^3 + (1/2)z + 1 + (1/2)z^{-1} − (1/4)z^{-3}.
Build an orthogonal filter bank based on this P (z). If the function is not positive on
the unit circle, apply an adequate correction (see Smith-Barnwell method in Section
3.2.3).
3.12 Using Proposition 3.13, prove that the filter H0 (z) = (1 + z^{-1})^N always has a complementary
filter.
3.13 Prove that in the orthogonal lattice structure, the sum of angles has to be equal to π/4
or 5π/4 in order to have one zero at ω = π in H0 (ejω ). Hint: There are several ways to
prove this, but an intuitive one is to consider the sequence x[n] = (−1)n at the input, or,
to consider z-transforms at z = ejω = −1. See also Example 3.3.
3.14 Interpolation followed by decimation: Given an input x[n], consider upsampling by 2, fol-
lowed by interpolation with a filter having z-transform H(z) for magnification of the signal.
Then, to recover the original signal size, apply filtering by a decimation filter G(z) followed
by downsampling by 2, in order to obtain a reconstruction x̂[n].
(a) What does the product filter P(z) = H(z) · G(z) have to satisfy in order for x̂[n] to
be a perfect replica of x[n] (possibly with a shift)?
(b) Given an interpolation filter H(z), what condition does it have to satisfy so that one
can find a decimation filter G(z) in order to achieve perfect reconstruction. Hint:
This is similar to the complementary filter problem in Section 3.2.3.
(c) For the following two filters,
H′(z) = 1 + z^{-1} + z^{-2} + z^{-3},   H″(z) = 1 + z^{-1} + z^{-2} + z^{-3} + z^{-4},
give filters G′(z) and G″(z) so that perfect reconstruction is achieved (if possible, give the
shortest such filters; if not, say why).
3.15 Prove the orthogonality relations (3.3.16) and (3.3.17) for an octave-band filter bank, using
similar arguments as in the proof of (3.3.15).
3.16 Consider tree-structured orthogonal filter banks as discussed in Example 3.10, and in par-
ticular the full tree of depth 2.
(2)
(a) Assume ideal sinc filters, and give the frequency response magnitude of Gi0 (ejω ), i =
0, . . . , 3. Note that this is not the natural ordering one would expect.
(2)
(b) Now take the Haar filters, and give gi [n], i = 0, . . . , 3. These are the discrete-time
Walsh-Hadamard functions of length 4.
(c) Given that {g0 [n], g1 [n]} is an orthogonal pair, prove orthogonality for any of the
equivalent filters with respect to shifts by 4.
3.17 In the general case of a full-grown binary tree of depth J, define the equivalent filters such
that their indexes increase as the center frequency increases. In Example 3.10, it would
(2) (2)
mean interchanging G3 with G2 (see (3.3.23)).
3.18 Show that in a filter bank with linear phase filters, the iterated filters are also linear phase.
In particular, consider the case where h0 [n] and h1 [n] are of even length, symmetric and
antisymmetric respectively. Consider a four-channel bank, with Ha (z) = H0 (z)H0 (z 2 ),
Hb (z) = H0 (z)H1 (z 2 ), Hc (z) = H1 (z)H0 (z 2 ), and Hd (z) = H1 (z)H1 (z 2 ). What are the
lengths and symmetries of these four filters?
3.19 Consider a general perfect reconstruction filter bank (not necessary orthogonal). Build a
tree-structured filter bank. Give and prove the biorthogonality relations for the equivalent
impulse responses of the analysis and synthesis filters. For simplicity, consider a full tree of
depth 2 rather than an arbitrary tree. Hint: The method is similar to the orthogonal case,
except that now analysis and synthesis filters are involved.
3.20 Prove that the number of wavelet packet bases generated from a depth-J binary tree is
equal to (3.3.25).
3.21 Prove that the perfect reconstruction condition given in terms of the modulation matrix for
the N -channel case, is equivalent to the system being biorthogonal. Hint: Mimic the proof
for the two-channel case given in Section 3.2.1.
3.22 Give the relationship between Gp (z) and Gm (z), which is similar to (3.4.9), as well as
between H p (z) and H m (z) and this in the general N -channel case.
3.23 Consider a modulated filter bank with filters H0 (z) = H(z), H1 (z) = H(W3 z), and H2 (z) =
H(W3^2 z). The modulation matrix Hm (z) is circulant. (Note that W3 = e^{-j2π/3}.)
(a) Prove that (3.4.5–3.4.6) hold for the cosine modulated filter bank with filters given in
(3.4.18) and hpr [n] = 1, n = 0, . . . , 2N − 1.
(b) Prove that in this case (3.4.23) holds as well.
Hint: Show that left and right tails are symmetric/antisymmetric, and thus the tails are
orthogonal.
3.25 Orthogonal pyramid: Consider a pyramid decomposition as discussed in Section 3.5.2 and
shown in Figure 3.17. Now assume that h[n] is an “orthogonal” filter, that is, ⟨h[n], h[n −
2l]⟩ = δl. Perfect reconstruction is achieved by upsampling the coarse version, filtering it
by h̃, and adding it to the difference signal.
(a) Analyze the above system in time domain and in z-transform domain, and show
perfect reconstruction.
(b) Take h[n] = (1/√2)[1, 1]. Show that y1 [n] can be filtered by (1/√2)[1, −1] and
downsampled by 2 while still allowing perfect reconstruction.
(c) Show that (b) is equivalent to a two-channel perfect reconstruction filter bank with
filters h0 [n] = (1/√2)[1, 1] and h1 [n] = (1/√2)[1, −1].
(d) Show that (b) and (c) are true for general orthogonal lowpass filters, that is, y1 [n] can
be filtered by g[n] = (−1)n h[−n + L − 1] and downsampled by 2, and reconstruction
is still perfect using an appropriate filter bank.
3.26 Verify Parseval’s formula (3.5.3) in the tight frame case given in Section 3.5.1.
3.27 Consider a two-dimensional two-channel filter bank with quincunx downsampling. Assume
that H0 (z1, z2) and H1 (z1, z2) satisfy (3.6.4–3.6.5). Show that their impulse responses with
shifts on a quincunx lattice form an orthonormal basis for l2 (Z^2).
3.28 Linear phase diamond-shaped quincunx filters: We want to construct a perfect reconstruc-
tion linear phase filter bank for quincunx sampling and the matrix
D = ( 1  1 ; 1  −1 ).
To that end, we start with the following filters h0 [n1 , n2 ] and h1 [n1 , n2 ]:
h0 [n1, n2] = ⎛      b      ⎞
              ⎜  1   a   1  ⎟ ,
              ⎝      b      ⎠

h1 [n1, n2] = ⎛                      1                      ⎞
              ⎜        b + c/a       a       b + c/a        ⎟
              ⎜  bc/a      c         d         c      bc/a  ⎟ .
              ⎜        b + c/a       a       b + c/a        ⎟
              ⎝                      1                      ⎠
(a) Using the sampling matrix above, identify the polyphase components and verify that
perfect FIR reconstruction is possible (the determinant of the polyphase matrix has
to be a monomial).
(b) Instead of only having top-bottom, left-right symmetry, impose circular symmetry on
the filters. What are b, c? If a = −4, d = −28, what type of filters do we obtain
(lowpass/highpass)?
4
wavelet series. We also discuss local Fourier series and the construction of local
cosine bases, which are “good” modulated bases [61]. Note that in this chapter we
construct bases for L2 (R); however, these bases have much stronger characteristics
as they are actually unconditional bases for Lp spaces, 1 < p < ∞ [73].
The development of wavelet orthonormal bases has been quite explosive in the
last decade. While the initial work focused on the continuous wavelet transform
(see Chapter 5), the discovery of orthonormal bases by Daubechies [71], Meyer
[194], Battle [21, 22], Lemarié [175], Stromberg [283], and others, led to a wealth
of subsequent work.
Compactly supported wavelets, following Daubechies’ construction, are based
on discrete-time filter banks, and thus many filter banks studied in Chapter 3
can lead to wavelets. We list below, without attempting to be exhaustive, a few
such constructions. Cohen, Daubechies and Feauveau [58] and Vetterli and Herley
[318, 319] considered biorthogonal wavelet bases. Bases with more than one
wavelet were studied by Zou and Tewfik [343, 344], Steffen, Heller, Gopinath and
Burrus [277], and Soman, Vaidyanathan and Nguyen [275], among others. Mul-
tidimensional, nonseparable wavelets following from filter banks were constructed
by Cohen and Daubechies [57] and Kovačević and Vetterli [163]. Recursive filter
banks leading to wavelets with exponential decay were derived by Herley and Vet-
terli [133, 130]. Rioul studied regularity of iterated filter banks [239], complexity of
wavelet decomposition algorithms [245], and design of “good” wavelet filters [246].
More constructions relating filter banks and wavelets can be found, for example, in
the work of Akansu and Haddad [3, 4], Blu [33], Cohen [55], Evangelista [96, 95],
Gopinath [115], Herley [130], Lawton [170, 171], Rioul [240, 242, 243, 244] and
Soman and Vaidyanathan [274].
The study of the regularity of the iterated filter that leads to wavelets was done
by Daubechies and Lagarias [74, 75], Cohen [55], and Rioul [239] and is related to
work on recursive subdivision schemes which was done independently of wavelets
(see [45, 80, 87, 92]). The regularity condition and approximation property occur-
ring in wavelets are related to the Strang-Fix condition first derived in the context
of finite-element methods [282].
Direct wavelet constructions followed the work of Meyer [194], Battle [21, 22]
and Lemarié [175]. They rely on the multiresolution framework established by
Mallat [181, 179, 180] and Meyer [194]. In particular, the case of wavelets related
to splines was studied by Chui [52, 49, 50] and by Aldroubi and Unser [7, 296, 297].
The extension of the wavelet construction for rational rather than integer dilation
factors was done by Auscher [16] and Blu [33]. Approximation properties of wavelet
expansions have been studied by Donoho [83], and DeVore and Lucier [82]. These
results have interesting consequences for compression.
The computation of the wavelet series coefficients using filter banks was studied
by Mallat [181, 179] and Shensa [261], among others. Wavelet sampling theorems
are given by Aldroubi and Unser [6], Walter [328] and Xia and Zhang [340]. Local
cosine bases were derived by Coifman and Meyer [61] (see also [17]). The wave-
let framework has also proven useful in the context of analysis and synthesis of
stochastic processes, see for example [20, 178, 338, 339].
The material in this chapter is covered in more depth in Daubechies’ book [73]
to which we refer for more details. Our presentation is less formal and based mostly
on signal processing concepts.
The outline of the chapter is as follows: First, we discuss series expansions in
general and the need for structured series expansion with good time and frequency
localization. In particular, the local Fourier series is contrasted with the Haar
expansion and a proof that the Haar system is an orthonormal basis for L2 (R) is
given. In Section 4.2, we introduce multiresolution analysis and show how a wavelet
basis can be constructed. As an example, the sinc (or Littlewood-Paley) wavelet is
derived. Section 4.3 gives wavelet bases constructions in the Fourier domain, using
the Meyer and Battle-Lemarié wavelets as important examples. Section 4.4 gives
the construction of wavelets based on iterated filter banks. The regularity (condi-
tions under which filter banks generate wavelet bases) of the discrete-time filters is
studied. In particular, Daubechies' family of compactly supported wavelets is
given. Section 4.5 discusses some of the properties of orthonormal wavelet series
expansions as well as the computation of the expansion coefficients. Variations on
the theme of wavelets from filter banks are explored in Section 4.6, where biorthog-
onal bases, wavelets based on IIR filter banks and wavelets with integer dilation
factors greater than 2 are given. Section 4.7 discusses multidimensional wavelets
obtained from multidimensional filter banks. Finally, Section 4.8 gives an interest-
ing alternative to local Fourier series in the form of local cosine bases which have
better time-frequency behavior than their Fourier counterparts.
In the last chapter orthonormal bases were built for discrete-time sequences, that
is, sets of orthogonal sequences {ϕk [n]}k∈Z were found such that any signal x[n] ∈
l2 (Z) could be written as
x[n] = Σ_{k=−∞}^{∞} ⟨ϕ_k[m], x[m]⟩ ϕ_k[n],
212 CHAPTER 4
where
⟨ϕ_k[m], x[m]⟩ = Σ_{m=−∞}^{∞} ϕ_k^*[m] x[m].
Similarly, a function f(t) ∈ L2(R) can be expanded with respect to an orthonormal set of functions {ϕ_k(t)}_{k∈Z} as f(t) = Σ_k ⟨ϕ_k(u), f(u)⟩ ϕ_k(t), where
⟨ϕ_k(u), f(u)⟩ = ∫_{−∞}^{∞} ϕ_k^*(u) f(u) du.
In other words, f (t) can be written as the sum of its orthogonal projections onto
the basis vectors ϕ_k(t). Besides meeting orthonormality constraints, the set {ϕ_k(t)} also has to be complete: its span has to cover the space of functions to be represented.
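In finite dimensions the same expansion can be checked mechanically. The following sketch (ours, not from the book; it assumes Python with NumPy) builds an arbitrary orthonormal basis of R^8 and verifies that the sum of projections reconstructs the signal:

```python
import numpy as np

# Build an arbitrary orthonormal basis of R^8 by orthogonalizing a random matrix.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # columns: basis vectors phi_k

x = rng.standard_normal(8)      # signal to expand
coeffs = Q.T @ x                # <phi_k, x> for each k
x_rec = Q @ coeffs              # sum_k <phi_k, x> phi_k

print(np.max(np.abs(x - x_rec)))   # essentially zero (machine precision)
```

Orthonormality gives Q^T Q = I, which is exactly the combination of orthogonality and completeness used above.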
We start by briefly reviewing two standard series expansions that were studied
in Section 2.4. The better-known series expansion is certainly the Fourier series.
A periodic function, f (t + nT ) = f (t), can be written as a linear combination of
sines and cosines or complex exponentials, as
f(t) = Σ_{k=−∞}^{∞} F[k] e^{j2πkt/T}, (4.1.1)
that is, the Fourier transform of one period evaluated at integer multiples of ω_0 = 2π/T. It is easy to see that the set of functions {e^{j2πkt/T}, k ∈ Z, t ∈ [−T/2, T/2]} is an orthogonal set, that is, ⟨e^{j2πkt/T}, e^{j2πlt/T}⟩ = T δ[k − l].
The other standard series expansion is that of bandlimited signals (see also
Section 2.4.5). Provided that |X(ω)| = 0 for |ω| ≥ ωs /2 = π/T , then sampling
x(t) by multiplying with Dirac impulses at integer multiples of T leads to the
function xs (t) given by
x_s(t) = Σ_{n=−∞}^{∞} x(nT) δ(t − nT).
n=−∞
The Fourier transform of xs (t) is periodic with period ωs and is given by (see Section
2.4.5)
X_s(ω) = (1/T) Σ_{k=−∞}^{∞} X(ω − kω_s). (4.1.3)
From (4.1.3) it follows that the Fourier transforms of x(t) and xs (t) coincide over the
interval (−ωs /2, ωs /2) (up to a scale factor), that is, X(ω) = T Xs (ω), |ω| < ωs /2.
Thus, to reconstruct the original signal X(ω), we have to window the sampled signal
spectrum Xs (ω), or X(ω) = G(ω)Xs (ω), where G(ω) is the window function
G(ω) = { T, |ω| < ω_s/2; 0, otherwise. }
The inverse Fourier transform of G(ω), g(t) = sin(πt/T)/(πt/T) = sinc_T(t),
is called the sinc function.¹ In time domain, we convolve the sampled function x_s(t)
with the window function g(t) to recover x(t):
x(t) = x_s(t) ∗ g(t) = Σ_{n=−∞}^{∞} x(nT) sinc_T(t − nT). (4.1.5)
This is usually referred to as the sampling theorem (see Section 2.4.5). Note that the interpolation functions {sinc_T(t − nT)}_{n∈Z} form an orthogonal set, that is, ⟨sinc_T(t − nT), sinc_T(t − mT)⟩ = T δ[n − m]. Then, since x(t) is bandlimited, the process of sampling at times nT can be written as
x(nT) = (1/T) ⟨sinc_T(u − nT), x(u)⟩,
¹The standard definition from the digital signal processing literature is used here, even if it would make sense to divide the sinc by √T to make it of unit norm.
or convolving x(t) with sincT (−t) and sampling the resulting function at times nT .
Thus, (4.1.5) is an expansion of a signal into an orthogonal basis:
x(t) = (1/T) Σ_{n=−∞}^{∞} ⟨sinc_T(u − nT), x(u)⟩ sinc_T(t − nT). (4.1.6)
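As a numerical sanity check of (4.1.5)-(4.1.6) (a sketch of ours, assuming NumPy; the infinite interpolation sum is truncated), one can sample a bandlimited signal and reconstruct it from the shifted sinc functions:

```python
import numpy as np

T = 0.5                                    # sampling period, so omega_s/2 = pi/T
t = np.linspace(-10, 10, 2001)

def x_sig(u):
    # bandlimited test signal: frequencies 0.4 Hz and 0.7 Hz, both below 1/(2T) = 1 Hz
    return np.cos(2 * np.pi * 0.4 * u) + 0.5 * np.sin(2 * np.pi * 0.7 * u)

n = np.arange(-400, 401)                   # truncation of the infinite sum
samples = x_sig(n * T)

# sinc_T(t) = sin(pi t/T)/(pi t/T); note np.sinc(u) = sin(pi u)/(pi u)
x_rec = np.zeros_like(t)
for k in range(len(n)):
    x_rec += samples[k] * np.sinc((t - n[k] * T) / T)

err = np.max(np.abs(x_sig(t) - x_rec))
print(err)                                 # small; limited only by truncating the sum
```

The residual error comes entirely from cutting off the slowly decaying sinc tails, not from the sampling itself.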
(b) Desirable localization properties in both time and frequency, that is, appro-
priate decay in both domains.
(c) Invariance under certain elementary operations (for example, shifts in time).
However, some of the above requirements conflict with each other and ultimately,
the application at hand will greatly influence the choice of the basis.
In addition, it is often desirable to look at a signal at different resolutions, that
is, both globally and locally. This feature is missing in classical Fourier analysis.
Such a multiresolution approach is not only important in many applications (ranging
from signal compression to image understanding), but is also a powerful theoretical
framework for the construction and analysis of wavelet bases as alternatives to
Fourier bases.
In order to satisfy some of the above requirements, let us first review how one
can modify Fourier analysis so that local signal behavior in time can be seen even
where
ϕ_{m,n}(u) = { (1/√T) e^{j2πn(u−mT)/T}, u ∈ [mT − T/2, mT + T/2); 0, otherwise. }
The 1/√T factor makes the basis functions of unit norm. The expansion x̂(t)
is equal to x(t) almost everywhere (except at t = (m + 1/2)T ) and thus, the L2
norm of the difference x(t)− x̂(t) is equal to zero. We call this transform a piecewise
Fourier series.
Consider what has been achieved. The expansion in (4.1.7) is valid for arbitrary
functions. Then, instead of an integral expansion as in the Fourier transform, we
have a double-sum expansion, and the set of basis functions is orthonormal and
complete. Time locality is now achieved and there is some frequency localization
(not very good, however, because the basis functions are rectangular windowed
sinusoids and therefore discontinuous; their Fourier transforms decay only as 1/ω).
In terms of time-frequency resolution, we have the rectangular tiling of the time-
frequency plane that is typical of the short-time Fourier transform (as was shown
in Figure 2.12(b)).
However, there is a price to be paid. The size of the interval T (that is, the
location of the boundaries) is arbitrary and leads to problems. The reconstruction
x̂(t) has singular points even if x(t) is continuous and the transform of x(t) can have
infinitely many “high frequency” components even if x(t) is a simple sinusoid (for
example, if its period Ts is such that Ts /T is irrational). Therefore, the expansion
will converge slowly to the function. In other words, if one wants to approximate
the signal with a truncated series, the quality of the approximation will depend on
the choice of T . In particular, the convergence at points of discontinuity (created
by periodization) is poor due to the Gibbs phenomenon [218]. Finally, a shift of
the signal can lead to completely different transform coefficients and the transform
is thus time-variant.
In short, we have gained the flexibility of a double-indexed transform indicating
time and frequency, but we have lost time invariance and convergence is sometimes
poor. Note that some of these problems are inherent to local Fourier bases and can
be solved with local cosine bases discussed in Section 4.8.
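The slow convergence can be observed numerically. In the following sketch (ours, not the book's; NumPy assumed), the Fourier series coefficients of one T-window of a sinusoid with Ts/T = √2 are computed; because the periodization creates a jump at the window boundary, the coefficients decay only like 1/k:

```python
import numpy as np

T = 1.0
Ts = np.sqrt(2.0)            # Ts/T irrational: x(t) is not T-periodic
t = np.linspace(-T / 2, T / 2, 20001)
dt = t[1] - t[0]
x = np.sin(2 * np.pi * t / Ts)

ks = np.arange(1, 200)
F = np.array([np.sum(x * np.exp(-2j * np.pi * k * t / T)) * dt / T for k in ks])

# the periodized signal jumps at the window boundary, so |F[k]| ~ C/k:
# k * |F[k]| stays bounded away from zero for large k (Gibbs-type behavior)
peak = np.max(np.abs(F[100:]) * ks[100:])
print(peak)
```

A single sinusoid thus produces infinitely many significant "high frequency" coefficients, exactly the effect described above.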
The Haar wavelet is defined as
ψ(t) = { 1, 0 ≤ t < 1/2; −1, 1/2 ≤ t < 1; 0, otherwise, } (4.1.8)
and the whole set of basis functions is obtained by dilation and translation as
ψ_{m,n}(t) = 2^{−m/2} ψ(2^{−m}t − n), m, n ∈ Z. (4.1.9)
We call m the scale factor, since ψm,n (t) is of length 2m , while n is called the shift
factor, and the shift is scale dependent (ψm,n (t) is shifted by 2m n). The normal-
ization factor 2−m/2 makes ψm,n (t) of unit norm. The Haar wavelet is shown in
Figure 4.1(c) (part (a) shows the scaling function which will be introduced shortly).
A few of the basis functions are shown in Figure 4.2(a). It is easy to see that the set
Figure 4.1 The Haar scaling function and wavelet. (a) Scaling function. (b) Its Fourier transform magnitude. (c) Wavelet. (d) Its Fourier transform magnitude.
is orthonormal. At a given scale, ψ_{m,n}(t) and ψ_{m,n′}(t), n ≠ n′, have no common support. Across scales, even if there is common support, the larger basis function is constant over the support of the shorter one. Therefore, the inner product amounts to the average of the shorter one, which is zero (see Figure 4.2(b)). Therefore,
⟨ψ_{m,n}(t), ψ_{m′,n′}(t)⟩ = δ[m − m′] δ[n − n′].
The advantage of these basis functions is that they are well localized in time (the
support is finite). Actually, as m → −∞, they are arbitrarily sharp in time, since
the length goes to zero. That is, a discontinuity (for example, a step in a function)
will be localized with arbitrary precision. However, the frequency localization is not
very good since the Fourier transform of (4.1.8) decays only as 1/ω when ω → ∞.
The basis functions are not smooth, since they are not even continuous.
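A direct numerical check of these orthonormality statements (our sketch, assuming NumPy; Riemann-sum inner products on a fine grid):

```python
import numpy as np

def haar(t):
    # mother Haar wavelet (4.1.8): +1 on [0, 1/2), -1 on [1/2, 1)
    return np.where((t >= 0) & (t < 0.5), 1.0, 0.0) \
         - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)

def psi(m, n, t):
    # psi_{m,n}(t) = 2^{-m/2} psi(2^{-m} t - n)
    return 2.0 ** (-m / 2) * haar(2.0 ** (-m) * t - n)

t = np.linspace(-8.0, 8.0, 320001)
dt = t[1] - t[0]
ip = lambda f, g: np.sum(f * g) * dt      # Riemann-sum inner product

print(ip(psi(0, 0, t), psi(0, 0, t)))     # ~ 1   (unit norm)
print(ip(psi(0, 0, t), psi(0, 1, t)))     # ~ 0   (disjoint supports)
print(ip(psi(1, 0, t), psi(0, 0, t)))     # ~ 0   (constant over the shorter support)
```

The third inner product illustrates the across-scale argument: the longer wavelet is constant where the shorter one lives, so the integral reduces to the (zero) average of the shorter one.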
Figure 4.2 (a) Some Haar basis functions at two successive scales. (b) Across scales, the longer basis function is constant over the support of the shorter one.
THEOREM 4.1
The set of functions {ψm,n (t)}m,n∈Z , with ψ(t) and ψm,n (t) as in (4.1.8–4.1.9),
is an orthonormal basis for L2 (R).
Figure 4.3 Decomposition of a piecewise constant function into averages and differences: (a) f^{(0)}. (b) f^{(1)}. (d) f^{(2)}. (e) d^{(2)}.
PROOF
The idea is to consider functions which are constant on intervals [n2−m0 , (n + 1)2−m0 ) and
which have finite support on [−2m1 , 2m1 ), as shown in Figure 4.3(a). By choosing m0 and
m1 large enough, one can approximate any L2 (R) function arbitrarily well. Call such a
piecewise constant function f (−m0 ) (t). Introduce a unit norm indicator function for the
interval [n2^{−m0}, (n + 1)2^{−m0}):
ϕ_{−m0,n}(t) = { 2^{m0/2}, n2^{−m0} ≤ t < (n + 1)2^{−m0}; 0, otherwise. } (4.1.10)
This is called the scaling function in the Haar case. Obviously, f (−m0 ) (t) can be written as
a linear combination of indicator functions from (4.1.10)
f^{(−m0)}(t) = Σ_{n=−N}^{N−1} f_n^{(−m0)} ϕ_{−m0,n}(t), (4.1.11)
where N = 2^{m0+m1} and f_n^{(−m0)} = 2^{−m0/2} f^{(−m0)}(n · 2^{−m0}). Now comes the key step:
Examine two intervals [2n · 2−m0 , (2n + 1)2−m0 ) and [(2n + 1) · 2−m0 , (2n + 2)2−m0 ). The
function over these two intervals is from (4.1.11)
f_{2n}^{(−m0)} ϕ_{−m0,2n}(t) + f_{2n+1}^{(−m0)} ϕ_{−m0,2n+1}(t). (4.1.12)
However, the same function can be expressed as the average over the two intervals plus the difference needed to obtain (4.1.12). The average is given by
((f_{2n}^{(−m0)} + f_{2n+1}^{(−m0)})/2) · √2 · ϕ_{−m0+1,n}(t),
while the difference is
((f_{2n}^{(−m0)} − f_{2n+1}^{(−m0)})/2) · √2 · ψ_{−m0+1,n}(t).
Collecting the averages over all n gives a coarser approximation f^{(−m0+1)}(t), while the differences form a linear combination of the wavelets ψ_{−m0+1,n}(t).
This decomposition in local “average” and “difference” is shown in Figures 4.3(b) and (c)
respectively. In order to obtain f (−m0 +2) (t) plus some linear combination of ψ−m0 +2,n (t),
one can iterate the averaging process on the function f (−m0 +1) (t) exactly as above (see
Figures 4.3(d),(e)). Repeating the process until the average is over intervals of length 2^{m1} leads to
f^{(−m0)}(t) = f^{(m1)}(t) + Σ_{m=−m0+1}^{m1} Σ_{n=−2^{m1−m}}^{2^{m1−m}−1} d_n^{(m)} ψ_{m,n}(t). (4.1.13)
The function f^{(m1)}(t) is equal to the average of f^{(−m0)}(t) over the intervals [−2^{m1}, 0) and [0, 2^{m1}), respectively (see Figure 4.3(f)). Consider the right half, which equals f_0^{(m1)} from 0 to 2^{m1}. It has L2 norm equal to |f_0^{(m1)}| 2^{m1/2}. This function can further be decomposed as the average over the interval [0, 2^{m1+1}) plus a Haar function. The new average function has norm |f_0^{(m1)}| 2^{m1/2}/√2 = |f_0^{(m1)}| 2^{(m1−1)/2} (since there is no contribution from [2^{m1}, 2^{m1+1})). Iterating this M times shows that the norm of the average function decreases as |f_0^{(m1)}| 2^{m1/2}/2^{M/2} = |f_0^{(m1)}| 2^{(m1−M)/2}. The same argument holds for the left side as well and therefore, f^{(−m0)}(t) can be approximated from (4.1.13) as
f^{(−m0)}(t) = Σ_{m=−m0+1}^{m1+M} Σ_{n=−2^{m1−m}}^{2^{m1−m}−1} d_n^{(m)} ψ_{m,n}(t) + ε_M,
where ‖ε_M‖ = (|f_{−1}^{(m1)}| + |f_0^{(m1)}|) · 2^{(m1−M)/2}. The approximation error ε_M can thus be made arbitrarily small, since |f_n^{(m1)}|, n = −1, 0, are bounded and M can be made arbitrarily large. This, together with the fact that m0 and m1 can be arbitrarily large, completes the proof that any L2(R) function can be represented as a linear combination of Haar wavelets.
The key in the above proof was the decomposition into a coarse approximation
(the average) and a detail (the difference). Since the norm of the coarse version
goes to zero as the scale goes to infinity, any L2 (R) function can be represented
as a succession of multiresolution details. This is the crux of the multiresolution
analysis presented in Section 4.2 and will prove to be a general framework, of which
the Haar case is a simple but enlightening example.
Let us point out a few features of the Haar case above. First, we can define
spaces Vm of piecewise constant functions over intervals of length 2m . Obviously,
Vm is included in Vm−1 , and an orthogonal basis for Vm is given by ϕm and its shifts
by multiples of 2m . Now, call Wm the orthogonal complement of Vm in Vm−1 . An
orthogonal basis for Wm is given by ψm and its shifts by multiples of 2m . The proof
above relied on decomposing V−m0 into V−m0 +1 and W−m0 +1 , and then iterating
the decomposition again on V−m0 +1 and so on. It is important to note that once
we had a signal in V−m0 , the rest of the decomposition involved only discrete-time
computations (average and difference operations on previous coefficients). This is
a fundamental and attractive feature of wavelet series expansions which holds in
general, as we shall see.
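These discrete-time average and difference operations can be sketched in a few lines (ours, not the book's; NumPy assumed). Each stage is an orthonormal transform of the coefficients, so the l2 norm is conserved:

```python
import numpy as np

def haar_step(c):
    # one stage: normalized local averages (toward the coarser V space) and
    # differences (the detail W space); an orthonormal transform of c
    avg = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    det = (c[0::2] - c[1::2]) / np.sqrt(2.0)
    return avg, det

rng = np.random.default_rng(1)
c = rng.standard_normal(16)               # fine-scale coefficients of some f in V_{-m0}

a = c
details = []
while len(a) > 1:                         # iterate on the averages only
    a, d = haar_step(a)
    details.append(d)

energy = a[0] ** 2 + sum(np.sum(d ** 2) for d in details)
print(energy, np.sum(c ** 2))             # equal: the transform preserves the norm
```

This is precisely the "only discrete-time computations" observation: once the signal is in V_{−m0}, the whole multiresolution decomposition runs on coefficient sequences.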
4.1.4 Discussion
As previously mentioned, the Haar case (seen above) and the sinc case (in Section
4.2.3) are two extreme cases, and the purpose of this chapter is to construct “in-
termediate” solutions with additional desirable properties. For example, Figure 4.4
shows a wavelet constructed first by Daubechies [71] which has finite (compact)
support (its length is L = 3, that is, less local than the Haar wavelet which has
length 1) but is continuous and has better frequency resolution than the Haar wave-
let. While not achieving a frequency resolution comparable to the sinc wavelet, its
time resolution is much improved since it has finite length. This is only one of many
possible wavelet constructions, some of which will be shown in more detail later.
We have shown that it is possible to construct series expansions of general
functions. The resulting tiling of the time-frequency plane is different from that
of a local Fourier series. It has the property that high frequencies are analyzed
with short basis functions, while low frequencies correspond to long basis functions.
While this trade-off is intuitive for many “natural” functions or signals, it is not the
only one; therefore, alternative tilings will also be explored. One elegant property
of wavelet type bases is the self-similarity of the basis functions, which are all
obtained from a single prototype “mother” wavelet using scaling and translation.
This is unlike local Fourier analysis, where modulation is used instead of scaling.
The basis functions and the associated tiling for the local Fourier analysis (short-
time Fourier transform) were seen in Figures 2.12 (a) and (b). Compare these to the
wavelet-type tiling and the corresponding basis functions given in Figures 2.12(c)
Figure 4.4 A compactly supported Daubechies wavelet and its scaling function: time-domain plots in (a) and (c), Fourier transform magnitudes in (b) and (d).
and (d) where scaling has replaced modulation. One can see that a dyadic tiling
has been obtained.
Note that this multiresolution approach, pioneered by Mallat [180] and Meyer
[194], is not only a set of tools for deriving wavelet bases, but also a mathematical
framework which is very useful in conceptualizing problems linked to wavelet and
subband decompositions of signals. We will also see that multiresolution analysis
leads to particular orthonormal bases, with basis functions being self-similar at
different scales. We will further show that a multiresolution analysis leads to the two-scale equation property, and that certain discrete-time sequences play a special role in that they are equivalent to the filters in an orthogonal filter bank.
such that
{ϕ(t − n) | n ∈ Z} (4.2.6)
is an orthonormal basis for V0.
Remarks
(a) If we denote by ProjVm [f (t)], the orthogonal projection of f (t) onto Vm , then
(4.2.2) states that limm→−∞ ProjVm [f (t)] = f (t).
(b) The multiresolution notion comes into play only with (4.2.4), since all the
spaces are just scaled versions of the central space V0 [73].
(c) As seen earlier for the Haar case, the function ϕ(t) in (4.2.6) is called the
scaling function.
(d) Using the Poisson formula, the orthonormality of the family {ϕ(t − n)}n∈Z
as given in (4.2.6) is equivalent to the following in the Fourier domain (see
(2.4.31)):
Σ_{k=−∞}^{∞} |Φ(ω + 2kπ)|² = 1. (4.2.7)
(e) Using (4.2.4–4.2.6), one obtains that {2m/2 ϕ(2m t − n) | n ∈ Z} is a basis for
V−m .
(f) The orthogonality of ϕ(t) is not necessary, since a nonorthogonal basis (with
the shift property) can always be orthogonalized [180] (see also Section 4.3.2).
we obtain
Φ(ω) = ∫ ϕ(t) e^{−jωt} dt = √2 Σ_{n=−∞}^{∞} g0[n] ∫ ϕ(2t − n) e^{−jωt} dt
     = √2 Σ_{n=−∞}^{∞} g0[n] (1/2) ∫ ϕ(t) e^{−jωt/2} e^{−jωn/2} dt
     = (1/√2) Σ_{n=−∞}^{∞} g0[n] e^{−j(ω/2)n} ∫ ϕ(t) e^{−j(ω/2)t} dt
     = (1/√2) G0(e^{jω/2}) Φ(ω/2), (4.2.9)
where
G0(e^{jω}) = Σ_{n∈Z} g0[n] e^{−jωn}.
Note that (4.2.10) was already given in (3.2.54) (again a hint that there is a strong connection between discrete and continuous time). Equation (4.2.10) can be proven by using (4.2.7) for 2ω:
Σ_{k=−∞}^{∞} |Φ(2ω + 2kπ)|² = 1. (4.2.11)
Substituting (4.2.9) into (4.2.11) and splitting the sum into even and odd k yields
1 = (1/2) |G0(e^{jω})|² Σ_k |Φ(ω + 2kπ)|² + (1/2) |G0(e^{j(ω+π)})|² Σ_k |Φ(ω + (2k + 1)π)|²
  = (1/2) (|G0(e^{jω})|² + |G0(e^{j(ω+π)})|²),
which completes the proof of (4.2.10). With a few restrictions on the Fourier transform Φ(ω) (bounded, continuous in ω = 0, and Φ(0) ≠ 0), it can be shown that G0(e^{jω}) satisfies
|G0(1)| = √2, G0(−1) = 0
(see Problem 4.3). Note that the above restrictions on Φ(ω) are always satisfied in
practice.
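For the Haar filter g0[n] = {1/√2, 1/√2}, both conditions, as well as the power-complementarity relation (4.2.10), can be verified directly (a small check of ours, assuming NumPy):

```python
import numpy as np

g0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar lowpass filter

def G0(w):
    # G0(e^{jw}) = sum_n g0[n] e^{-jwn}
    n = np.arange(len(g0))
    return np.sum(g0 * np.exp(-1j * w * n))

print(abs(G0(0.0)))                        # sqrt(2):  |G0(1)| = sqrt(2)
print(abs(G0(np.pi)))                      # 0:        G0(-1)  = 0

w = np.linspace(0.0, np.pi, 101)
power = np.array([abs(G0(u)) ** 2 + abs(G0(u + np.pi)) ** 2 for u in w])
print(power.min(), power.max())            # both 2: |G0|^2 is power complementary
```

The same three checks apply verbatim to any candidate orthogonal lowpass filter.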
Vm−1 = Vm ⊕ Wm .
Also, due to the scaling property of the Vm spaces (4.2.4), there exists a scaling property for the Wm spaces as well:
f(t) ∈ W0 ⟺ f(2^{−m}t) ∈ Wm. (4.2.13)
Our aim here is to explicitly construct² a wavelet ψ(t) ∈ W0, such that {ψ(t − n)}, n ∈ Z, is
an orthonormal basis for W0 . If we have such a wavelet ψ(t), then by the scaling property
(4.2.13), ψm,n (t), n ∈ Z will be an orthonormal basis for Wm . On the other hand, (4.2.12)
together with upward/downward completeness properties (4.2.2–4.2.3), imply that {ψm,n },
m, n ∈ Z is an orthonormal basis for L2 (R), proving the theorem. Thus, we start by
constructing the wavelet ψ(t), such that ψ ∈ W0 ⊂ V−1. Since ψ ∈ V−1,
ψ(t) = √2 Σ_{n∈Z} g1[n] ϕ(2t − n). (4.2.14)
In the Fourier domain, following (4.2.9),
Ψ(ω) = (1/√2) G1(e^{jω/2}) Φ(ω/2), (4.2.15)
where G1 (ejω ) is a 2π-periodic function from L2 ([0, 2π]). The fact that ψ(t) belongs to W0 ,
which is orthogonal to V0 , implies that
⟨ψ(t), ϕ(t − k)⟩ = 0 for all k ∈ Z, or equivalently,
∫_0^{2π} e^{jωk} Σ_l Ψ(ω + 2πl) Φ*(ω + 2πl) dω = 0
for all k, and thus
Σ_l Ψ(ω + 2πl) Φ*(ω + 2πl) = 0. (4.2.16)
Now substitute (4.2.9) and (4.2.15) into (4.2.16) and split the sum over l into two sums over even and odd l:
(1/2) Σ_l G1(e^{j(ω/2+2lπ)}) Φ(ω/2 + 2lπ) G0*(e^{j(ω/2+2lπ)}) Φ*(ω/2 + 2lπ)
+ (1/2) Σ_l G1(e^{j(ω/2+(2l+1)π)}) Φ(ω/2 + (2l + 1)π) G0*(e^{j(ω/2+(2l+1)π)}) Φ*(ω/2 + (2l + 1)π) = 0.
However, since G0 and G1 are both 2π-periodic, substituting Ω for ω/2 gives
G1(e^{jΩ}) G0*(e^{jΩ}) Σ_l |Φ(Ω + 2lπ)|² + G1(e^{j(Ω+π)}) G0*(e^{j(Ω+π)}) Σ_l |Φ(Ω + (2l + 1)π)|² = 0.
²Note that the wavelet we construct is not unique.
Using now (4.2.7), the sums involving Φ(ω) become equal to 1, and thus
G1(e^{jω}) G0*(e^{jω}) + G1(e^{j(ω+π)}) G0*(e^{j(ω+π)}) = 0. (4.2.17)
Note how (4.2.17) is the same as (3.2.48) in Chapter 3 (on the unit circle). Again, this displays the connection between discrete and continuous time. Since G0*(e^{jω}) and G0*(e^{j(ω+π)}) cannot go to zero at the same time (see (4.2.10)), it means that
G1(e^{jω}) = λ(e^{jω}) G0*(e^{j(ω+π)}),
where λ(e^{jω}) is 2π-periodic and satisfies
λ(e^{jω}) + λ(e^{j(ω+π)}) = 0.
A possible choice is λ(e^{jω}) = −e^{−jω}, leading to
G1(e^{jω}) = −e^{−jω} G0*(e^{j(ω+π)}), (4.2.18)
and thus
Ψ(ω) = −(1/√2) e^{−jω/2} G0*(e^{j(ω/2+π)}) Φ(ω/2), (4.2.19)
or, in time domain,
ψ(t) = √2 Σ_{n∈Z} (−1)^n g0[−n + 1] ϕ(2t − n).
To prove that this wavelet, together with its integer shifts, indeed generates an orthonormal
basis for W0, one would have to prove the orthogonality of the basis functions ψ_{0,n}(t) as well as completeness; that is, that any f(t) ∈ W0 can be written as f(t) = Σ_n α_n ψ_{0,n}(t). This part is omitted here and can be found in [73], pp. 134-135.
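The quadrature relation g1[n] = (−1)^n g0[−n + 1] can be tested numerically on any orthogonal lowpass filter. The sketch below (ours; NumPy assumed) uses the standard length-4 Daubechies filter, quoted from the literature, and shifts g1 by an even amount to make it causal (even shifts do not affect the orthogonality relations):

```python
import numpy as np

# Daubechies length-4 lowpass filter: an orthogonal filter,
# i.e. sum_n g0[n] g0[n - 2k] = delta[k]
s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

# highpass via g1[n] = (-1)^n g0[-n+1], then shifted by 2 to be causal:
# g1[n] = (-1)^n g0[3 - n]
g1 = np.array([(-1) ** n * g0[len(g0) - 1 - n] for n in range(len(g0))])

def ip_shift(a, b, k):
    # <a[n], b[n - 2k]>, with zero padding outside the supports
    s = 0.0
    for n in range(-8, 16):
        an = a[n] if 0 <= n < len(a) else 0.0
        bn = b[n - 2 * k] if 0 <= n - 2 * k < len(b) else 0.0
        s += an * bn
    return s

print([ip_shift(g0, g0, k) for k in (-1, 0, 1)])   # ~ delta[k]
print([ip_shift(g0, g1, k) for k in (-1, 0, 1)])   # ~ all zero
```

The two printed lines are exactly the discrete-time orthogonality relations that the continuous-time construction above inherits.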
f^{(m)} ∈ Vm ⇔ f^{(m)} = Σ_{n=−∞}^{∞} f_n^{(m)} ϕ_{m,n}(t).
4.2. MULTIRESOLUTION CONCEPT AND ANALYSIS 229
The process of taking the average over two successive intervals creates a function f (m+1) ∈
Vm+1 (since it is a function which is constant over intervals [n2m+1 , (n + 1)2m+1 )). Also, it
is clear that
Vm+1 ⊂ Vm .
The averaging operation is actually an orthogonal projection of f (m) ∈ Vm onto Vm+1 , since
the difference d(m+1) = f (m) − f (m+1) is orthogonal to Vm+1 (the inner product of d(m+1)
with any function from Vm+1 is equal to zero). In other words, d(m+1) belongs to a space
Wm+1 which is orthogonal to Vm+1 . The space Wm+1 is spanned by translates of ψm+1,n (t)
d^{(m+1)} ∈ W_{m+1} ⇔ d^{(m+1)} = Σ_{n=−∞}^{∞} d_n^{(m+1)} ψ_{m+1,n}(t).
This difference function is again the orthogonal projection of f (m) onto Wm+1 . We have
seen that any function f (m) can be written as an “average” plus a “difference” function
Vm = Vm+1 ⊕ Wm+1
Repeating the process (decomposing Vm+1 into Vm+2 ⊕ Wm+2 and so on), the following is
obtained:
Vm = Wm+1 ⊕ Wm+2 ⊕ Wm+3 ⊕ · · ·
Since piecewise constant functions are dense in L2 (R), as the step size goes to zero (4.2.2) is
satisfied as well as (4.2.12), and thus the Haar wavelets form a basis for L2 (R). Now, let us
see how we can construct the Haar wavelet using the technique from the previous section.
As we said before, the basis for V0 is {ϕ(t − n)}n∈Z with
1 0 ≤ t < 1,
ϕ(t) =
0 otherwise.
G1(e^{jω}) = −e^{−jω} G0(e^{j(ω+π)}) = −e^{−jω} (1 + e^{j(ω+π)})/√2 = (1 − e^{−jω})/√2,
ϕ(t) = sin(πt)/(πt),
which is thus the scaling function for the sinc case and the space V0 of functions bandlimited
to [−π, π]. Using (4.2.9) one gets that
g0[n] = (1/√2) · sin(πn/2)/(πn/2), (4.2.21)
that is,
G0(e^{jω}) = { √2, −π/2 ≤ ω ≤ π/2; 0, otherwise, }
or, G0 (ejω ) is an ideal lowpass filter. Then G1 (ejω ) becomes (use (4.2.18))
G1(e^{jω}) = { −√2 e^{−jω}, ω ∈ [−π, −π/2] ∪ [π/2, π]; 0, otherwise, }
³In the mathematical literature, this is often referred to as the Littlewood-Paley wavelet [73].
Figure 4.5 Division of the frequency axis into the spaces Vm and Wm in the sinc case.
which is an ideal highpass filter with a phase shift. The sequence g1 [n] is then
g1 [n] = (−1)n g0 [−n + 1], (4.2.22)
whereupon
ψ(t) = √2 Σ_n (−1)^{−n+1} g0[n] ϕ(2t + n − 1).
Alternatively, we can construct the wavelet directly by taking the inverse Fourier transform
of the indicator function of the intervals [−2π, −π] ∪ [π, 2π]:
ψ(t) = (1/2π) ∫_{−2π}^{−π} e^{jωt} dω + (1/2π) ∫_{π}^{2π} e^{jωt} dω = 2 sin(2πt)/(2πt) − sin(πt)/(πt) = (sin(πt/2)/(πt/2)) cos(3πt/2). (4.2.23)
This function is orthogonal to its translates by integers, or ⟨ψ(t), ψ(t − n)⟩ = δ[n], as can be verified using Parseval's formula (2.4.11). To be coherent with our definition of W0 (which excludes cos(πt)), we need to shift ψ(t) by 1/2, and thus {ψ(t − n − 1/2)}, n ∈ Z, is an orthogonal basis for W0. The wavelet basis is now given by
{ψ_{m,n}(t) = 2^{−m/2} ψ(2^{−m}t − n − 1/2)}, m, n ∈ Z,
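The closed form (4.2.23) and the orthogonality to integer translates can be checked numerically (our sketch, assuming NumPy; the integrals are truncated, and the slow 1/t decay of the sinc wavelet limits the accuracy):

```python
import numpy as np

def psi_sinc(t):
    # psi(t) = (sin(pi t/2)/(pi t/2)) cos(3 pi t/2), the inverse Fourier
    # transform of the indicator of [-2pi,-pi] U [pi,2pi] (4.2.23);
    # np.sinc(u) = sin(pi u)/(pi u)
    return np.sinc(t / 2.0) * np.cos(3 * np.pi * t / 2.0)

t = np.linspace(-400.0, 400.0, 800001)
dt = t[1] - t[0]
base = psi_sinc(t)

def ip(n):
    # Riemann-sum approximation of <psi(t), psi(t - n)>
    return np.sum(base * psi_sinc(t - n)) * dt

print(ip(0), ip(1), ip(2))   # ~ 1, ~ 0, ~ 0
```

The slow decay that limits this numerical check is the same poor time localization discussed for the sinc case in the text.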
To conclude this section, we summarize the expressions for the scaling function and
the wavelet as well as their Fourier transforms in Haar and sinc cases in Table 4.1.
The underlying discrete-time filters were given in Table 3.1.
Figure 4.6 Scaling function and the wavelet in the sinc case. (a) Scaling function ϕ(t). (b) Fourier transform magnitude |Φ(ω)|. (c) Wavelet ψ(t). (d) Fourier transform magnitude |Ψ(ω)|.
What we have seen until now, is the conceptual framework for building orthonormal
bases with the specific structure of multiresolution analysis, as well as two particular
cases of such bases: Haar and sinc. We will now concentrate on ways of building
such bases in the Fourier domain. Two constructions are indicated, both of which
rely on the multiresolution framework derived in the previous section. First, Meyer’s
wavelet is derived, showing step by step how it verifies the multiresolution axioms.
Then, wavelets for spline spaces are constructed. In this case, one starts with
the well-known spaces of piecewise polynomials and shows how to construct an
orthonormal wavelet basis.
4.3. CONSTRUCTION OF WAVELETS USING FOURIER TECHNIQUES 233
Table 4.1 Scaling functions and wavelets, with their Fourier transforms, in the Haar and sinc cases.
Haar case:
  ϕ(t) = 1 for 0 ≤ t < 1, and 0 otherwise.
  ψ(t) = 1 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and 0 otherwise.
  Φ(ω) = e^{−jω/2} sin(ω/2)/(ω/2).
  Ψ(ω) = j e^{−jω/2} sin²(ω/4)/(ω/4).
Sinc case:
  ϕ(t) = sin(πt)/(πt).
  ψ(t) = (sin(π(t/2 − 1/4))/(π(t/2 − 1/4))) cos(3π(t/2 − 1/4)).
  Φ(ω) = 1 for |ω| < π, and 0 otherwise.
  Ψ(ω) = −e^{−jω/2} for π ≤ |ω| < 2π, and 0 otherwise.
Figure 4.7 Meyer's wavelet construction. (a) The function θ(x). (b) The scaling function Φ(ω), built from θ(2 + 3ω/(2π)) for ω ≤ 0 and θ(2 − 3ω/(2π)) for ω ≥ 0.
The idea behind Meyer's wavelet is to soften the ideal (sinc) case. Recall that
the sinc scaling function and the wavelet are as given in Figure 4.6. The idea of
the proof is to construct a scaling function ϕ(t) that satisfies the orthogonality and
scaling requirements of the multiresolution analysis and then construct the wavelet
using the standard method. In order to soften the sinc scaling function, we find a
smooth function (in frequency) that satisfies (4.2.7).
We are going to show the construction step by step, leading first to the scaling
function and then to the associated wavelet.
(a) Start with a nonnegative function θ(x) that is differentiable (maybe several times), with θ(x) = 0 for x ≤ 0, θ(x) = 1 for x ≥ 1, and θ(x) + θ(1 − x) = 1 (see Figure 4.7(a)).
Figure 4.8 Pictorial proof that {ϕ(t − n)}_{n∈Z} form an orthonormal family in L2(R).
(b) Construct the scaling function Φ(ω) such that (see Figure 4.7(b))
Φ(ω) = { [θ(2 + 3ω/(2π))]^{1/2}, ω ≤ 0; [θ(2 − 3ω/(2π))]^{1/2}, ω ≥ 0. }
(c) {ϕ(t − n)}_{n∈Z} is an orthonormal family from L2(R). To that end, we use the Poisson formula and instead show that (see (4.2.7))
Σ_{k∈Z} |Φ(ω + 2kπ)|² = 1. (4.3.3)
From Figure 4.8 it is clear that for ω ∈ [−2π/3 − 2nπ, 2π/3 − 2nπ],
Σ_k |Φ(ω + 2kπ)|² = |Φ(ω + 2nπ)|² = 1.
The only thing left is to show that (4.3.3) holds in the overlapping regions. Thus, take for example ω ∈ [2π/3, 4π/3]:
Φ(ω)² + Φ(ω − 2π)² = θ(2 − 3ω/(2π)) + θ(2 + 3(ω − 2π)/(2π))
= θ(2 − 3ω/(2π)) + θ(−1 + 3ω/(2π))
= θ(2 − 3ω/(2π)) + θ(1 − (2 − 3ω/(2π)))
= 1.
Now we are ready to show that the Vm's form a multiresolution analysis. Until now, by definition, we have taken care of (4.2.4-4.2.6); it remains to show (4.2.1-4.2.3).
Figure 4.9 The two-scale relation in the Meyer case: Φ(ω), the periodized versions √2 Φ(2ω + 4kπ) making up G0(e^{jω}), and the product (G0(e^{jω})/√2) Φ(ω) = Φ(2ω).
find a 2π-periodic function G0(e^{jω}) from L2([0, 2π]) such that Φ(2ω) = (1/√2) G0(e^{jω}) Φ(ω) (see (4.2.9)). Then choose
G0(e^{jω}) = √2 Σ_{k∈Z} Φ(2ω + 4kπ). (4.3.4)
To show completeness, it is enough to verify that ⟨f, ϕ_{m,n}⟩ = 0 for all m, n ∈ Z implies f = 0. In the Fourier domain,
⟨f, ϕ_{m,n}⟩ = 0 ⟺ Σ_{k∈Z} F(2^m(ω + 2kπ)) Φ*(ω + 2kπ) = 0.
On ω ∈ [−2π/3, 2π/3] only the k = 0 term contributes, so
F(2^m ω) Φ(ω) = 0,
and since Φ(ω) ≠ 0 on that interval, for any m,
F(2^m ω) = 0, ω ∈ [−2π/3, 2π/3],
and thus
F(ω) = 0, ω ∈ R,
or f = 0.
(g) Show (4.2.3): If f ∈ ∩_{m∈Z} Vm then F ∈ ∩_{m∈Z} F{Vm}, where F{Vm} is the Fourier transform of Vm with the basis {2^{m/2} e^{−jkω2^{−m}} Φ(2^{−m}ω)}. Since Φ(2^{−m}ω) has its support in the interval
I = [−(4π/3) 2^m, (4π/3) 2^m],
and these intervals shrink toward {0} as m → −∞, we have, in other words,
F(ω) ∈ ∩_{m∈Z} F{Vm} = {0},
or f(t) = 0.
Figure 4.10 Construction of the Meyer wavelet: |Φ(ω/2)|, then |G0(e^{jω})|/√2 together with the shifted versions Φ(ω + 2π) and Φ(ω − 2π), and the resulting |Ψ(ω)|.
(h) Finally, one just has to find the corresponding wavelet using (4.2.19):
Ψ(ω) = −(1/√2) e^{−jω/2} G0*(e^{j(ω/2+π)}) Φ(ω/2),
and Ψ(ω) is an even function of ω (except for the phase factor e^{−jω/2}). Note that (see Problem 4.4)
Σ_{k∈Z} |Ψ(2^k ω)|² = 1. (4.3.6)
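The whole construction is easy to test numerically once a concrete θ(x) is chosen. The sketch below (ours; NumPy assumed) uses the smoothstep 3x² − 2x³, one admissible choice satisfying θ(x) + θ(1 − x) = 1, and verifies the partition of unity required by (4.2.7)/(4.3.3):

```python
import numpy as np

def theta(x):
    # one admissible choice of theta: the smoothstep 3x^2 - 2x^3 on [0, 1],
    # extended by 0 below and 1 above; it is differentiable and satisfies
    # theta(x) + theta(1 - x) = 1, as required by the construction
    x = np.clip(x, 0.0, 1.0)
    return 3 * x ** 2 - 2 * x ** 3

def Phi(w):
    # Meyer-type scaling function: Phi(w) = sqrt(theta(2 - 3|w|/(2 pi)))
    return np.sqrt(theta(2.0 - 3.0 * np.abs(w) / (2.0 * np.pi)))

w = np.linspace(-np.pi, np.pi, 2001)
s = sum(Phi(w + 2.0 * np.pi * k) ** 2 for k in range(-3, 4))
print(s.min(), s.max())      # both ~ 1: the shifts of phi are orthonormal
```

At any ω at most two shifted copies of Φ overlap, and their squares sum to 1 exactly because of the complementarity of θ; this is a direct numerical replay of step (c).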
Figure 4.11 Meyer's scaling function and the wavelet. (a) Scaling function ϕ(t). (b) Fourier transform magnitude |Φ(ω)|. (c) Wavelet ψ(t). (d) Fourier transform magnitude |Ψ(ω)|.
sequence g0[n] which has similarly fast decay. However, G0(e^{jω}) is not a rational function of e^{jω} and thus, the filter g0[n] cannot be implemented efficiently. Meyer's wavelet is therefore mostly of theoretical interest.
intervals [k2i , (k + 1)2i ) are obviously also piecewise polynomial over subintervals
[k2j , (k + 1)2j ], j < i. Second, there exist simple bases for such spaces, namely the
B-splines. Call:
V_i^{(l)} = { functions which are piecewise polynomial of degree l over intervals [k2^i, (k + 1)2^i) and having l − 1 continuous derivatives at k2^i, k ∈ Z }.
For example, V_{−1}^{(1)} is the space of all functions which are linear over half-integer intervals and continuous at the interval boundaries. Consider first the spaces with unit intervals, that is, V_0^{(l)}. Then, bases for these spaces are given by the B-splines [76, 255]. These are obtained by convolution of box functions (indicator functions of the unit interval) with themselves. For example, the hat function, which is a box function convolved with itself, is a (nonorthogonal) basis for piecewise linear functions over unit intervals, that is, V_0^{(1)}.
The idea of the wavelet construction is to start with these nonorthogonal bases for the V_0^{(l)}'s and apply a suitable orthogonalization procedure in order to get an orthogonal scaling function. Then, the wavelet follows from the usual construction. Below, we follow the approach and notation of Unser and Aldroubi [6, 298, 299, 296].
Note that the relation between splines and digital filters has also been exploited in
[118].
Call I(t) the indicator function of the interval [−1/2, 1/2] and I^{(k)}(t) the k-fold convolution of I(t) with itself, that is, I^{(k)}(t) = I(t) ∗ I^{(k−1)}(t), I^{(0)}(t) = I(t). Denote by β^{(N)}(t) the B-spline of order N, where
β^{(N)}(t) = I^{(N)}(t) for N odd, and β^{(N)}(t) = I^{(N)}(t − 1/2) for N even. (4.3.9)
The shift by 1/2 in (4.3.9) is necessary so that the nodes of the spline are at integer
intervals. The first few examples, namely N = 0 (constant spline), N = 1 (linear
spline), and N = 2 (quadratic spline) are shown in Figure 4.12.
Figure 4.12 B-splines obtained by convolutions of box functions: (a) constant spline (N = 0), (b) linear spline (N = 1), (c) quadratic spline (N = 2).
Orthogonalization Procedure While the B-spline β^{(N)}(t) and its integer translates form a basis for V_0^{(N)}, it is not an orthogonal basis (except for N = 0). Therefore, we have to apply an orthogonalization procedure. Recall that a function f(t) that is orthogonal to its integer translates satisfies (see (4.2.7))
⟨f(t), f(t − n)⟩ = δ[n] ⟺ Σ_{k∈Z} |F(ω + 2kπ)|² = 1.
In this case⁴, B^{(2N+1)}(ω) is the discrete-time Fourier transform of the discrete-time B-spline b^{(2N+1)}[n], which is the sampled version of the continuous-time B-spline [299]:
b^{(2N+1)}[n] = β^{(2N+1)}(t)|_{t=n}. (4.3.12)
Because {β^{(N)}(t - n)} is a basis for V_0^{(N)}, one can show that there exist two positive
constants A and C such that [71]

0 < A ≤ Σ_{k∈Z} |B^{(N)}(ω + 2kπ)|² = B^{(2N+1)}(ω) ≤ C.  (4.3.13)

Now define

Φ(ω) = B^{(N)}(ω) / [B^{(2N+1)}(ω)]^{1/2}.  (4.3.14)

Because of (4.3.13), Φ(ω) is well defined. Obviously,

Σ_{k∈Z} |Φ(ω + 2kπ)|² = (1 / B^{(2N+1)}(ω)) Σ_{k∈Z} |B^{(N)}(ω + 2kπ)|² = 1,
and thus the set {ϕ(t - n)} is orthogonal. That it is a basis for V_0^{(N)} follows
from the fact that (from (4.3.14)) β^{(N)}(t) can be written as a linear combination of
ϕ(t - n) and therefore, since any f(t) ∈ V_0^{(N)} can be written in terms of β^{(N)}(t - n),
it can be expressed in terms of ϕ(t - n) as well.
Now, both β^{(N)}(t) and ϕ(t) satisfy a two-scale equation because they belong to
V_0^{(N)} and thus to V_{-1}^{(N)}; therefore, they can be expressed in terms of β^{(N)}(2t - n) and
ϕ(2t - n), respectively. In the Fourier domain we have

B^{(N)}(ω) = M(ω/2) B^{(N)}(ω/2),  (4.3.15)

Φ(ω) = (1/√2) G₀(e^{jω/2}) Φ(ω/2),  (4.3.16)
where we used (4.2.9) for Φ(ω). Using (4.3.14) and (4.3.15), we find that

G₀(e^{jω}) = √2 M(ω) [B^{(2N+1)}(ω) / B^{(2N+1)}(2ω)]^{1/2};

that is, the normalized filter D(ω) = G₀(e^{jω})/√2
corresponds to an orthogonal scaling function for V₀ and the rest of the procedure
follows as above.
Orthonormal Wavelets for Spline Spaces We will apply the method just de-
scribed to construct wavelets for spaces of piecewise polynomial functions intro-
duced at the beginning of this section. This construction was done by Battle [21, 22]
and Lemarié [175], and the resulting wavelets are often called Battle-Lemarié wavelets.
Earlier work by Stromberg [283, 284] also derived orthogonal wavelets for
piecewise polynomial spaces. We will start with a simple example of the linear
spline, given by

β^{(1)}(t) = { 1 - |t|,  |t| ≤ 1,
            { 0,        otherwise.

It satisfies the following two-scale equation:

β^{(1)}(t) = (1/2) β^{(1)}(2t + 1) + β^{(1)}(2t) + (1/2) β^{(1)}(2t - 1).  (4.3.21)
The Fourier transform, from (4.3.7), is

B^{(1)}(ω) = (sin(ω/2) / (ω/2))².  (4.3.22)
4.3. CONSTRUCTION OF WAVELETS USING FOURIER TECHNIQUES 243
In order to find B^{(2N+1)}(ω) (see (4.3.11)), we note that its inverse Fourier transform
is equal to

b^{(2N+1)}[n] = (1/2π) ∫₀^{2π} e^{jnω} Σ_{k∈Z} |B^{(N)}(ω + 2πk)|² dω
             = (1/2π) ∫_{-∞}^{∞} e^{jnω} |B^{(N)}(ω)|² dω
             = ∫_{-∞}^{∞} β^{(N)}(t) β^{(N)}(t - n) dt,  (4.3.23)

by Parseval's formula (2.4.11). In the linear spline case, we find b^{(3)}[0] = 2/3 and
b^{(3)}[1] = b^{(3)}[-1] = 1/6, or

B^{(3)}(ω) = 2/3 + (1/6) e^{jω} + (1/6) e^{-jω} = 2/3 + (1/3) cos ω = 1 - (2/3) sin²(ω/2),
which is the discrete-time cubic spline [299]. From (4.3.14) and (4.3.22), one gets

Φ(ω) = sin²(ω/2) / [(ω/2)² (1 - (2/3) sin²(ω/2))^{1/2}],

which is an orthonormal scaling function for the linear spline space V_0^{(1)}.
Observing the inverse Fourier transform of the 2π-periodic function
(1 - (2/3) sin²(ω/2))^{-1/2}, which corresponds to a sequence {αₙ}, indicates that ϕ(t)
can be written as a linear combination of {β^{(1)}(t - n)}:

ϕ(t) = Σ_{n∈Z} αₙ β^{(1)}(t - n).
This function is thus piecewise linear, as can be verified in Figure 4.13(a). Taking
the Fourier transform of the two-scale equation (4.3.21) leads to

B^{(1)}(ω) = [(1/4) e^{-jω/2} + 1/2 + (1/4) e^{jω/2}] B^{(1)}(ω/2) = (1/2) [1 + cos(ω/2)] B^{(1)}(ω/2),
Figure 4.13 Linear spline basis. (a) Scaling function ϕ(t). (b) Fourier trans-
form magnitude |Φ(ω)|. (c) Wavelet ψ(t). (d) Fourier transform magnitude
|Ψ(ω)|.
where the definition of Q(ω), which is 4π-periodic, follows from (4.3.24). Taking
the inverse Fourier transform of (4.3.25) leads to

ψ(t) = Σ_{n∈Z} q[n] β^{(1)}(2t - n),

with the sequence {q[n]} being the inverse Fourier transform of Q(ω). Therefore,
ψ(t) is piecewise linear over half-integer intervals, as can be seen in Figure 4.13(c).
In this simple example, the multiresolution approximation is particularly clear.
As said at the outset, V_0^{(1)} is the space of functions piecewise linear over integer
intervals, and likewise, V_{-1}^{(1)} has the same property but over half-integer intervals.
Therefore, W_0^{(1)} (which is the orthogonal complement of V_0^{(1)} in V_{-1}^{(1)}) contains
the difference between a function in V_{-1}^{(1)} and its approximation in V_0^{(1)}. Such a
difference is obviously piecewise linear over half-integer intervals.
With the above construction, we have obtained orthonormal bases for V_0^{(1)} and
W_0^{(1)} as the sets of functions {ϕ(t - n)} and {ψ(t - n)}, respectively. What was given
up, however, is the compact support that β^{(N)}(t) has. But it can be shown that the
scaling function and the wavelet have exponential decay. The argument begins with
the fact that ϕ(t) is a linear combination of the functions β^{(N)}(t - n). Because β^{(N)}(t)
has compact support, a finite number of functions from the set {β^{(N)}(t - n)}_{n∈Z}
contribute to ϕ(t) for a given t (for example, two in the linear spline case). That
is, |ϕ(t)| is of the same order as |Σ_{l=0}^{L-1} α_{k+l}|, where k = ⌊t⌋. Now, {α_k} is the
impulse response of a stable filter (noncausal in general) because it has no poles
on the unit circle (this follows from (4.3.13)). Therefore, the sequence α_k decays
exponentially and so does ϕ(t). The same argument holds for ψ(t) as well. For a
formal proof of this result, see [73]. While the compact support of β (N ) (t) has been
lost, the fast decay indicates that ϕ(t) and ψ(t) are concentrated around the origin,
as is clear from Figures 4.13(a) and (c). The above discussion on orthogonalization
was limited to the very simple linear spline case. However, it is clear that it works
for the general B-spline case since it is based on the orthogonalization (4.3.14). For
example, the quadratic spline, with Fourier transform

B^{(2)}(ω) = e^{-jω/2} (sin(ω/2) / (ω/2))³,  (4.3.26)

can be orthogonalized in exactly the same manner.
Note that instead of taking a square root of B (2N +1) (ω) in the orthogonaliza-
tion of B (N ) (ω) (see (4.3.14)), one can use spectral factorization which leads to
wavelets based on IIR filters [133, 296] (see also Section 4.6.2 and Problem 4.8).
Alternatively, it is possible to give up intrascale orthogonality (but keep interscale
orthogonality). See [299] for such a construction where a possible scaling function
is a B-spline. One advantage of keeping a scaling function that is a spline is that,
as the order increases, its localization in time and frequency rapidly approaches the
optimum since it tends to a Gaussian [297].
An interesting limiting result occurs in the case of orthogonal wavelets for B-
spline space. As the order of splines goes to infinity, the scaling function tends to
the ideal lowpass or sinc function [7, 175]. In our B-spline construction with N = 0
and N → ∞, we thus recover the Haar and sinc cases discussed in Section 4.2.3 as
extreme cases of a multiresolution analysis.
Figure 4.14 Filter bank iterated on the lowpass channel; each stage consists
of upsampling by two followed by filtering with G₀ (lowpass branch) or G₁
(highpass branch).
As seen earlier, the Haar and sinc cases are two particular examples which are duals
of each other, or two extreme cases. Both are useful to explain the iterated filter
bank construction. The Haar case is most obvious in time domain, while the sinc
case is immediate in frequency domain.
Haar Case  Consider the discrete-time Haar filters (see also Section 4.1.3). The
lowpass filter is the average of two neighboring samples, while the highpass filter is their
difference. The corresponding orthogonal filter bank has filters g₀[n] = [1/√2, 1/√2]
and g₁[n] = [1/√2, -1/√2], which are the basis functions of the discrete-time Haar
expansion. Now consider what happens if we iterate the filter bank on the lowpass
channel, as shown in Figure 4.14. In order to derive an equivalent filter bank, we
recall the following result from multirate signal processing (Section 2.5.3): filtering
by g₀[n] followed by upsampling by two is equivalent to upsampling by two, followed
by filtering by g₀′[n], where g₀′[n] is the upsampled version of g₀[n].
Using this equivalence, we can transform the filter-bank tree into one equivalent
to the one depicted in Figure 3.8 where we assumed three stages and Haar filters. It
is easy to verify that this corresponds to an orthogonal filter bank (it is the cascade
of orthogonal filter banks). This is a size-8 discrete Haar transform on successive
blocks of 8 samples. Iterating the lowpass channel in Figure 4.14 i times leads
to the equivalent last two filters

g₀^{(i)}[n] = { 2^{-i/2},  n = 0, ..., 2^i - 1,
             { 0,         otherwise,

g₁^{(i)}[n] = { 2^{-i/2},   n = 0, ..., 2^{i-1} - 1,
             { -2^{-i/2},  n = 2^{i-1}, ..., 2^i - 1,
             { 0,          otherwise.

With these filters, associate the piecewise constant functions

ϕ^{(i)}(t) = 2^{i/2} g₀^{(i)}[n],  n/2^i ≤ t < (n + 1)/2^i,  (4.4.1)

ψ^{(i)}(t) = 2^{i/2} g₁^{(i)}[n],  n/2^i ≤ t < (n + 1)/2^i.
These functions are piecewise constant and, because the interval shrinks at
the same rate as the lengths of g₀^{(i)}[n] and g₁^{(i)}[n] grow, their supports remain
bounded.
For example, ϕ^{(3)}(t) and ψ^{(3)}(t) (the functions associated with the two bottom
filters of Figure 3.8) are simply the indicator function of the interval [0, 1] and
the difference between the indicator functions of [0, 1/2] and [1/2, 1], respectively. Of
course, in this particular example, it is clear that ϕ(i) (t) and ψ (i) (t) are all identical,
regardless of i. What is also worth noting is that ϕ(i) (t) and ψ (i) (t) are orthogonal
to each other and to their translates. Note that

ϕ^{(i)}(t) = 2^{1/2} (g₀[0] ϕ^{(i-1)}(2t) + g₀[1] ϕ^{(i-1)}(2t - 1)),

that is, the iterated functions satisfy a two-scale relation.
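The iterated Haar filters can be generated mechanically. The sketch below implements the recursion G₀^{(i)}(z) = G₀(z) G₀^{(i-1)}(z²) (upsample by two, then convolve) and confirms that the i-fold iteration yields 2^i constant taps of height 2^{-i/2}.

```python
import numpy as np

def upsample2(x):
    """Insert a zero between consecutive samples (x lands at even indices)."""
    y = np.zeros(2 * len(x) - 1)
    y[::2] = x
    return y

def iterated_filter(g, i):
    """Equivalent filter g^(i) with G^(i)(z) = G(z) G(z^2) ... G(z^(2^(i-1)))."""
    out = np.array(g, dtype=float)
    for _ in range(i - 1):
        out = np.convolve(np.array(g, dtype=float), upsample2(out))
    return out

g0 = [1 / np.sqrt(2)] * 2                 # Haar lowpass filter
g3 = iterated_filter(g0, 3)
print(len(g3), round(g3[0], 4))           # 8 taps, each 2^(-3/2) ~ 0.3536
```

The eight equal taps are exactly the bottom lowpass filter of the three-stage cascade in Figure 3.8.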
Sinc Case Recall the sinc case (see Example 4.2). Take an orthogonal filter bank
where the lowpass and highpass filters are ideal half-band filters. The impulse
response of the lowpass filter is

g₀[n] = (1/√2) · sin(πn/2) / (πn/2)  (4.4.2)
(see also (4.2.21)), which is orthogonal to its even translates and of norm 1. Its 2π-
periodic Fourier transform is equal to √2 for |ω| ≤ π/2, and to 0 for π/2 < |ω| < π. A
perfect half-band highpass can be obtained by modulating g₀[n] with (-1)ⁿ, since
4.4. WAVELETS DERIVED FROM ITERATED FILTER BANKS AND REGULARITY 249
this shifts the passband by π. For completeness, a shift by one is required as well.
Thus (see (4.2.22))

g₁[n] = (-1)ⁿ g₀[-n + 1].

Its 2π-periodic Fourier transform is

G₁(e^{jω}) = { -√2 e^{-jω},  π/2 ≤ |ω| ≤ π,  (4.4.3)
             { 0,            0 ≤ |ω| < π/2.
Now consider the iterated filter bank as in Figure 4.14 with ideal filters. Upsampling
the filter impulse response by two (to pass it across the upsampler) leads to a filter
g₀′[n] with discrete-time Fourier transform (see Section 2.5.3)

G₀′(e^{jω}) = G₀(e^{j2ω}),

which is π-periodic. It is easy to check that G₀(e^{j2ω}) G₀(e^{jω}) is a quarter-band filter.
Similarly, with G₁′(e^{jω}) = G₁(e^{j2ω}), it is clear that G₁(e^{j2ω}) G₀(e^{jω}) is a bandpass
filter with a passband from π/4 to π/2. Figure 4.15 shows the amplitude frequency
responses of the equivalent filters for a three-step division.
Let us emulate the Haar construction with g₀^{(i)}[n] and g₁^{(i)}[n], which are the
lowpass and bandpass equivalent filters for the cascade of i banks. In Figures 4.15(c)
and (d), we thus have the frequency responses of g₁^{(3)}[n] and g₀^{(3)}[n], respectively.
Then, we define ϕ^{(i)}(t) as in (4.4.1). The procedure for obtaining ϕ^{(i)}(t) can be
described by the following two steps:

(a) Associate with g₀^{(i)}[n] a sequence of weighted Dirac pulses spaced 2^{-i} apart.

(b) Convolve this pulse sequence with an indicator function of the interval [0, 2^{-i}],
of height 2^{i/2} (so that it is of norm 1).
In the Fourier domain, these two steps give

Φ^{(i)}(ω) = 2^{-i/2} G₀^{(i)}(e^{jω/2^i}) e^{-jω/2^{i+1}} · sin(ω/2^{i+1}) / (ω/2^{i+1}).

Now,

G₀^{(i)}(e^{jω}) = G₀(e^{jω}) G₀(e^{j2ω}) ⋯ G₀(e^{j2^{i-1}ω}).  (4.4.4)

We introduce the shorthand

M₀(ω) = (1/√2) G₀(e^{jω}).  (4.4.5)
Figure 4.15 Magnitude frequency responses of the equivalent filters in the
three-step sinc case: (a) |G₁(e^{jω})|. (c) |G₀(e^{jω}) G₀(e^{j2ω}) G₁(e^{j4ω})|.
(d) |G₀(e^{jω}) G₀(e^{j2ω}) G₀(e^{j4ω})|.
The important part in (4.4.6) is the product inside the square brackets (the rest
is just a phase factor and the interpolation function). In particular, as i becomes
large, the second part tends toward 1 for any finite ω. Thus, let us consider the
product involving M₀(ω) in (4.4.6). Because of the definitions of M₀(ω) in (4.4.5)
and of G₀(e^{jω}) following (4.4.2), we get

M₀(ω/2^k) = { 1,  (2l - 1/2) 2^k π ≤ ω ≤ (2l + 1/2) 2^k π,  l ∈ Z,
            { 0,  otherwise.
In the limit, the wavelet is

lim_{i→∞} ψ^{(i)}(t) = 2 · sin(2π(t - 1/2)) / (2π(t - 1/2)) - sin(π(t - 1/2)) / (π(t - 1/2)).
This is of course the sinc wavelet we had introduced in Section 4.2 (see (4.2.23)).
What we have just seen seems a cumbersome way to rederive a known result.
However, it is an instance of a general construction and some properties can be
readily seen. For example, assuming that the infinite product converges, the scaling
function satisfies (from (4.4.6))

Φ(ω) = lim_{i→∞} Φ^{(i)}(ω) = ∏_{k=1}^{∞} M₀(ω/2^k) = M₀(ω/2) Φ(ω/2).
That is, the two-scale equation property is implicit in the construction of the iter-
ated function. The key in this construction is the behavior of the infinite product of
the M0 (ω/2k )’s. This leads to the fundamental regularity property of the discrete-
time filters involved, which will be studied below. But first, we formalize the iterated
filter bank construction.
"
i−1 k
(i)
G0 (z) = G0 z 2 , (4.4.9)
k=0
i−1
"
i−2 k
(i)
G1 (z) = G1 (z 2 ) G0 z 2 , i = 1, 2, . . .
k=0
⁵ This is more restrictive than necessary, but makes the treatment easier.
These filters are preceded by upsampling by 2^i (note that G₀^{(0)}(z) = G₁^{(0)}(z) = 1).
Then, associate the discrete-time iterated filters g₀^{(i)}[n], g₁^{(i)}[n] with the continuous-
time functions ϕ^{(i)}(t), ψ^{(i)}(t) as follows:

ϕ^{(i)}(t) = 2^{i/2} g₀^{(i)}[n],  n/2^i ≤ t < (n + 1)/2^i,  (4.4.10)

ψ^{(i)}(t) = 2^{i/2} g₁^{(i)}[n],  n/2^i ≤ t < (n + 1)/2^i.  (4.4.11)
Note that the elementary interval has length 1/2^i. This rescaling is necessary
because if the length of the filter g₀[n] is L, then the length of the iterated filter
g₀^{(i)}[n] is

L^{(i)} = (2^i - 1)(L - 1) + 1,

which becomes infinite as i → ∞. Thus, the normalization ensures that the
associated continuous-time function ϕ^{(i)}(t) stays compactly supported (as i → ∞,
ϕ^{(i)}(t) will remain within the interval [0, L - 1]). The factor 2^{i/2} which multi-
plies g₀^{(i)}[n] and g₁^{(i)}[n] is necessary to preserve the L₂ norm between the discrete-
and continuous-time cases. If ‖g₀^{(i)}[n]‖ = 1, then ‖ϕ^{(i)}(t)‖ = 1 as well, since each
piecewise constant block of ϕ^{(i)}(t) has norm |g₀^{(i)}[n]|.
In Figure 4.16 we show the graphical function for the first four iterations of a
length-4 filter. This indicates the piecewise constant approximation and the halving
of the interval. √ √
In the Fourier domain, using M₀(ω) = G₀(e^{jω})/√2 and M₁(ω) = G₁(e^{jω})/√2, we
can write (4.4.10) and (4.4.11) as (from (4.4.6))

Φ^{(i)}(ω) = [∏_{k=1}^{i} M₀(ω/2^k)] Θ^{(i)}(ω),

where

Θ^{(i)}(ω) = e^{-jω/2^{i+1}} · sin(ω/2^{i+1}) / (ω/2^{i+1}),

as well as (from (4.4.7))

Ψ^{(i)}(ω) = M₁(ω/2) [∏_{k=2}^{i} M₀(ω/2^k)] Θ^{(i)}(ω).
A fundamental question is: To what, if anything, do the functions ϕ^{(i)}(t) and ψ^{(i)}(t)
converge as i → ∞? We will proceed by assuming convergence to piecewise smooth
functions in L₂(R):

ϕ(t) = lim_{i→∞} ϕ^{(i)}(t),  (4.4.12)

ψ(t) = lim_{i→∞} ψ^{(i)}(t).  (4.4.13)
Figure 4.16 Graphical functions for the first four iterations of a length-4
filter: (a) ϕ^{(1)}(t). (b) ϕ^{(2)}(t). (c) ϕ^{(3)}(t). (d) ϕ^{(4)}(t).
Two-Scale Equation Property  Let us show that the scaling function ϕ(t) satisfies
a two-scale equation, as required by (4.2.8). Following (4.4.9), one can write the
equivalent filter after i steps in terms of the equivalent filter after (i - 1) steps as

g₀^{(i)}[n] = Σ_k g₀[k] g₀^{(i-1)}[n - 2^{i-1}k].  (4.4.16)

From the definition (4.4.10),

ϕ^{(i)}(t) = 2^{i/2} g₀^{(i)}[n],  (4.4.17)

ϕ^{(i-1)}(2t - k) = 2^{(i-1)/2} g₀^{(i-1)}[n - 2^{i-1}k],  (4.4.18)

both for n/2^i ≤ t < (n + 1)/2^i. Substituting (4.4.17) and (4.4.18) into (4.4.16)
yields

ϕ^{(i)}(t) = √2 Σ_k g₀[k] ϕ^{(i-1)}(2t - k).  (4.4.19)
By assumption, the iterated function ϕ^{(i)}(t) converges to the scaling function ϕ(t).
Hence, take limits on both sides of (4.4.19) to obtain

ϕ(t) = √2 Σ_k g₀[k] ϕ(2t - k),  (4.4.20)

that is, the limit of the discrete-time iterated filter (4.4.12) satisfies a two-scale
equation. Similarly,

ψ(t) = √2 Σ_k g₁[k] ϕ(2t - k).
These relations also follow directly from the Fourier-domain expressions for Φ(ω) and
Ψ(ω), since, for example, from (4.4.14) we get

Φ(ω) = ∏_{k=1}^{∞} M₀(ω/2^k) = M₀(ω/2) ∏_{k=2}^{∞} M₀(ω/2^k)
     = M₀(ω/2) Φ(ω/2) = (1/√2) G₀(e^{jω/2}) Φ(ω/2).
(a) ⟨g₀[k], g₁[k + 2n]⟩ = 0 and ⟨g₀[k], g₀[k + 2n]⟩ = ⟨g₁[k], g₁[k + 2n]⟩ = δ[n]; that
is, the filters g₀ and g₁ are orthogonal to each other and to their even translates, as
given in Section 3.2.3.

(b) G₀(z)|_{z=1} = √2 and G₀(z)|_{z=-1} = 0; that is, the lowpass filter has a zero at
the aliasing frequency π (see also the next section).
(e) The scaling function and the wavelet are given by (4.4.12) and (4.4.13).
In the Haar case, it was shown that the scaling function and the wavelet were
orthogonal to each other. Using appropriate shifts and scales, it was shown that
the wavelets formed an orthonormal set. Here, we demonstrate these relations in
the general case, starting from discrete-time iterated filters. The proof is given only
for the first fact, the others would follow similarly.
PROPOSITION 4.4  Orthogonality Relations for the Scaling Function and Wavelet

(a) The scaling function is orthogonal to its appropriate translates at a given
scale:

⟨ϕ(2^m t - n), ϕ(2^m t - n′)⟩ = 2^{-m} δ[n - n′].

(c) The scaling function is orthogonal to the wavelet and its integer shifts:

⟨ϕ(t - n), ψ(t - n′)⟩ = 0.

(d) Wavelets are orthogonal across scales and with respect to shifts:

⟨ψ(2^m t - n), ψ(2^{m′} t - n′)⟩ = 2^{-m} δ[m - m′] δ[n - n′].
PROOF
To prove the first fact, we use induction on ϕ^{(i)} and then take the limit (which exists by
assumption). For clarity, this fact will be proven only for scale 0 (scale m would follow
similarly). The first step, ⟨ϕ^{(0)}(t), ϕ^{(0)}(t - l)⟩ = δ[l], is obvious since, by definition, ϕ^{(0)}(t)
is just the indicator function of the interval [0, 1). For the inductive step, write

⟨ϕ^{(i+1)}(t), ϕ^{(i+1)}(t - l)⟩ = ⟨√2 Σ_k g₀[k] ϕ^{(i)}(2t - k), √2 Σ_m g₀[m] ϕ^{(i)}(2t - 2l - m)⟩
  = 2 Σ_k Σ_m g₀[k] g₀[m] ⟨ϕ^{(i)}(2t - k), ϕ^{(i)}(2t - 2l - m)⟩
  = Σ_m g₀[m] g₀[2l + m] = δ[l],

where the orthogonality relations between discrete-time filters, given at the beginning of
this subsection, were used. Taking the limits of both sides of the previous equation, the first
fact is obtained. The proofs of the other facts follow similarly.
is an orthonormal set. The only remaining task is to show that the members of the
set S constitute an orthonormal basis for L₂(R), as stated in the following theorem,
which asserts that for any f ∈ L₂(R),

Σ_{m,n∈Z} |⟨ψ_{m,n}, f⟩|² = ‖f‖².
Since the proof is rather technical and does not have an immediate intuitive in-
terpretation, an outline is given in Appendix 4.A. For more details, the reader is
referred to [71, 73]. Note that the statement of the theorem is nothing else but
Parseval's equality as given by (d) in Theorem 2.4.
4.4.3 Regularity
We have seen that the conditions under which (4.4.12–4.4.13) exist are critical. We
will loosely say that they exist and lead to piecewise smooth functions if the filter
g0 [n] is regular. In other words, a regular filter leads, through iteration, to a scaling
function with some degree of smoothness or regularity.
Given a filter G₀(z) and an iterated filter bank scheme, the limit function ϕ(t)
depends on the behavior of the product

∏_{k=1}^{i} M₀(ω/2^k)  (4.4.21)

for large i, where M₀(ω) = G₀(e^{jω})/G₀(1), so that M₀(0) = 1. This normalization
is necessary since otherwise either the product blows up at ω = 0 (if M0 (0) > 1) or
goes to zero (if M0 (0) < 1) which would mean that ϕ(t) is not a lowpass function.
Key questions are: Does the product converge (and in what sense)? If it con-
verges, what are the properties of the limit function (continuity, differentiability,
etc.)? It can be shown that if |M0 (ω)| ≤ 1 and M0 (0) = 1, then we have pointwise
convergence of the infinite product to a limit function Φ(ω) (see Problem 4.12). In
particular, if M0 (ω) corresponds to the normalized lowpass filter in an orthonor-
mal filter bank, then this condition is automatically satisfied. However, pointwise
convergence is not sufficient. To build orthonormal bases we need L2 convergence.
This can be obtained by imposing some additional constraints on M₀(ω). Finally,
beyond mere L₂ convergence, we would like to have a limit Φ(ω) corresponding to
a smooth function ϕ(t). This can be achieved with further constraints on M₀(ω).
Note that we will concentrate on the regularity of the lowpass filter, which leads
to the scaling function ϕ(t) in iterated filter bank schemes. The regularity of the
wavelet ψ(t) is equal to that of the scaling function when the filters are of finite
length since ψ(t) is a finite linear combination of ϕ(2t − n).
First, it is instructive to reconsider a few examples. In the case of the perfect
half-band lowpass filter, the limit function associated with the iterated filter con-
verged to sin(πt)/πt in time. Note that this limit function is infinitely differentiable.
In the Haar case, the lowpass filter, after normalization, gives

M₀(ω) = (1 + e^{-jω}) / 2,

which converges to the box function, that is, to a function with two points of
discontinuity. In other words, the product in (4.4.21) converges to

∏_{k=1}^{∞} M₀(ω/2^k) = ∏_{k=1}^{∞} (1 + e^{-jω/2^k}) / 2 = e^{-jω/2} · sin(ω/2) / (ω/2).  (4.4.22)
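The infinite product (4.4.22) can be verified by truncating it at a finite depth; in the sketch below the depth K = 40 is an arbitrary choice, large enough that the truncation error is negligible.

```python
import numpy as np

w = 1.3                                    # an arbitrary test frequency
K = 40                                     # truncation depth of the product
prod = np.prod([(1 + np.exp(-1j * w / 2**k)) / 2 for k in range(1, K + 1)])
target = np.exp(-1j * w / 2) * np.sin(w / 2) / (w / 2)
print(abs(prod - target) < 1e-9)           # True
```

Each factor (1 + e^{-jθ})/2 = e^{-jθ/2} cos(θ/2), so the truncated product differs from the limit only by terms of order ω/2^K.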
For an alternative proof of this formula, see Problem 4.11. Now consider a filter
with impulse response [1/2, 1, 1/2], that is, the Haar lowpass filter convolved with itself.
The corresponding M₀(ω) is

M₀(ω) = (1 + 2e^{-jω} + e^{-j2ω}) / 4 = ((1 + e^{-jω}) / 2)².  (4.4.23)
The product (4.4.21) can thus be split into two parts; each of which converges to
the Fourier transform of the box function. Therefore, the limit function ϕ(t) is
the convolution of two boxes, or, the hat function. This is a continuous function
and is differentiable except at the points t = 0, 1 and 2. It is easy to see that
4.4. WAVELETS DERIVED FROM ITERATED FILTER BANKS AND REGULARITY 259
if we have the N th power instead of the square in (4.4.23), the limit function
will be the (N − 1)-time convolution of the box with itself. This function is (N − 1)-
times differentiable (except at integers where it is once less differentiable). These are
the well-known B-spline functions [76, 255] (see also Section 4.3.2). An important
fact to note is that each additional factor (1 + e^{-jω})/2 leads to one more degree of
regularity. That is, zeros at ω = π in the discrete-time filter play an important role.
However, zeros at ω = π are not sufficient to ensure regularity. We can see this in
the following counterexample [71]: take M₀(ω) = (1 + e^{-j3ω})/2, which does have a
zero at ω = π.
"
∞ ω sin(3ω/2)
Φ(ω) = M0 = e−j3ω/2 , (4.4.24)
2k 3ω/2
k=1
which is the Fourier transform of 1/3 times the indicator function of the interval [0, 3]. This
function is clearly not orthogonal to its integer translates, even though every finite iteration
of the graphical function is. That is, (4.2.21) is not satisfied by the limit. Also, while every
finite iteration is of norm 1, the limit is not. Therefore, we have failure of L2 convergence
of the infinite product.
Looking at the time-domain graphical function (see Figure 4.17), it is easy to check
that ϕ(i) (t) takes only the values 0 or 1, and therefore, there is no pointwise convergence
on the interval [0, 3]. Note that ϕ^{(i)}(t) is not of bounded variation as i → ∞. Thus, even
though ϕ^{(i)}(t) and Φ^{(i)}(ω) are valid Fourier transform pairs for any finite i, their limits are
not, since ϕ(t) does not exist while Φ(ω) is given by (4.4.24). This simple example indicates
that the convergence problem is nontrivial.
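This failure is visible in a few lines of code: iterating the stretched Haar filter with taps [1, 0, 0, 1]/√2 (whose infinite product gives (4.4.24)), the graphical function 2^{i/2} g₀^{(i)}[n] keeps taking only the values 0 and 1, with an ever denser set of jumps, so there is no pointwise limit.

```python
import numpy as np

def upsample2(x):
    y = np.zeros(2 * len(x) - 1)
    y[::2] = x
    return y

def iterated_filter(g, i):
    out = np.array(g, dtype=float)
    for _ in range(i - 1):
        out = np.convolve(np.array(g, dtype=float), upsample2(out))
    return out

g = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)    # stretched Haar filter
for i in (2, 4, 6):
    phi = 2 ** (i / 2) * iterated_filter(g, i)     # graphical function values
    vals = sorted(set(np.round(phi, 8).tolist()))
    print(i, vals)                                 # always [0.0, 1.0]
```

Note that each finite iterate still has unit L₂ norm, exactly as the text describes, even though the sequence of functions does not converge.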
A sufficient condition for convergence requires that M₀(ω) have no zeros on
[-π/2, π/2]; it is easy to verify that the above example does not meet it, since M₀(π/3) = 0.
Another sufficient condition by Daubechies also allows one to impose regularity.
This will be discussed in Proposition 4.7. Necessary and sufficient conditions for
L2 convergence are more involved, and were derived by Cohen [55] and Lawton
[169, 170] (see [73] for a discussion of these conditions).
The next example considers the orthogonal filter family that was derived in
Section 3.2.3. It shows that very different behavior can be obtained within a family.
Figure 4.17 Graphical functions for the counterexample (4.4.24): (a) ϕ^{(1)}(t).
(b) ϕ^{(2)}(t). (c) ϕ^{(i)}(t). Each takes only the values 0 and 1 on the interval [0, 3].
g0 [n] = [cos α0 cos α1 , cos α1 sin α0 , − sin α0 sin α1 , cos α0 sin α1 ]. (4.4.25)
The above example should give an intuition for the notion of regularity. The Haar
filter, leading to a discontinuous function, is less regular than the Daubechies filter.
In the literature, regularity is somewhat loosely defined (continuity in [194], conti-
nuity and differentiability in [181]). As hinted in the spline example, zeros at the
Figure 4.18 Iterated orthogonal lowpass filters of length 4 with one zero at
ω = π (or α1 = π/4−α0 ). For α0 = π/3, there are two zeros at π and this leads
to a regular iterated filter of length 4. This corresponds to the Daubechies’
scaling function. The sixth iteration is shown.
aliasing frequency ω = π (or z = −1) play a key role for the regularity of the filter.
First, let us show that a zero at ω = π is necessary for the limit function to exist.
There are several proofs of this result (for example in [92]) and we follow Rioul’s
derivation [239].
Given a lowpass filter G₀(z) and its iteration G₀^{(i)}(z) (see (4.4.9)), consider the
associated graphical function ϕ^{(i)}(t) (see (4.4.10)).
PROOF
For the limit of ϕ^{(i)}(t) to exist it is necessary that, as i increases, the even- and odd-indexed
samples of g₀^{(i)}[n] tend to the same limit sequence. This limit sequence has an associated
limit function ϕ(2t). Use the fact that (see (4.4.4))

G₀^{(i)}(z) = G₀(z) G₀^{(i-1)}(z²),

where the subscripts e and o below stand for the even- and odd-indexed samples of g₀[n],
respectively. We can write the even- and odd-indexed samples of g₀^{(i)}[n] in the z-transform
domain as

G_e^{(i)}(z) = G_e(z) G₀^{(i-1)}(z),  (4.4.26)

G_o^{(i)}(z) = G_o(z) G₀^{(i-1)}(z).  (4.4.27)
Figure 4.19 Eighth iteration of the filter which fails to converge because of
the absence of an exact zero at ω = π. The filter is a Smith and Barnwell filter
of length 8 [271] (see Table 3.2).
When considering the associated continuous function ϕ^{(i)}(t) and its limit as i goes to infinity,
the left sides of the above two equations tend to ϕ(2t). For the right sides, note that k is
bounded while n is not. Because the intervals for the interpolation diminish as 1/2^i, the
shift by k vanishes as i goes to infinity and g₀^{(i-1)}[n - k] leads also to ϕ(2t). That is, (4.4.26)
and (4.4.27) become equal and

Σ_k g₀[2k] ϕ(2t) = Σ_k g₀[2k + 1] ϕ(2t),

so that Σ_k g₀[2k] = Σ_k g₀[2k + 1], which is equivalent to G₀(e^{jω})|_{ω=π} = 0.
"
i "
i−1
(i−k)
M0 (π2 ) = M0 (π) M0 (2π2(i−k−1) ) = M0 (π),
k=1 k=1
4.4. WAVELETS DERIVED FROM ITERATED FILTER BANKS AND REGULARITY 263
since M₀(ω) is 2π-periodic and M₀(0) = 1. That is, unless M₀(π) is exactly
zero, there is a nonzero Fourier component at an arbitrarily high frequency. This
indicates that g₀^{(i)}[2n] and g₀^{(i)}[2n + 1] will never be the same. This results in
highest-frequency "wiggles" in the iterated impulse response. As an example, we show, in
Figure 4.19, the iteration of a filter which is popular in subband coding [271], but
which does not have an exact zero at ω = π. The resulting iterated function has
small wiggles and will not converge. Note that most filters designed for subband
coding have high (but maybe not infinite) attenuation at ω = π, thus the problem
is usually minor.
PROOF
It is sufficient to show that, for large enough ω, the decay of Φ(ω) is faster than C(1 + |ω|)^{-1};
this indicates that ϕ(t) will be continuous. Rewrite (4.4.29) as follows:

∏_{k=1}^{∞} M₀(ω/2^k) = [∏_{k=1}^{∞} (1 + e^{jω/2^k}) / 2]^N · ∏_{k=1}^{∞} R(ω/2^k).

In the above, the first product on the right side is a smoothing part and equals, within a
phase factor,

(sin(ω/2) / (ω/2))^N,
which leads to a decay of the order of C′(1 + |ω|)^{-N}. But then, there is the effect of the
remainder R(ω). Recall that |R(0)| = 1. Now, |R(ω)| can be bounded above by 1 + c|ω| for
some c, and thus |R(ω)| ≤ e^{c|ω|}. Consider now ∏_{k=1}^{∞} R(ω/2^k), for |ω| < 1. In particular,

sup_{|ω|<1} ∏_{k=1}^{∞} |R(ω/2^k)| ≤ ∏_{k=1}^{∞} e^{c|ω|/2^k} = e^{c|ω|(1/2 + 1/4 + ⋯)} ≤ e^c.

Thus, for |ω| < 1, we have an upper bound. For any ω with |ω| > 1, there exists J ≥ 1 such
that 2^{J-1} ≤ |ω| < 2^J. Therefore, split the infinite product into two parts:

∏_{k=1}^{∞} |R(ω/2^k)| = ∏_{k=1}^{J} |R(ω/2^k)| · ∏_{k=1}^{∞} |R(ω/(2^J 2^k))|.

Since |ω| < 2^J, we can bound the second product by e^c. The first product is smaller than
or equal to B^J, where B = sup_ω |R(ω)|. Thus

∏_{k=1}^{∞} |R(ω/2^k)| ≤ B^J e^c.

Now, B < 2^{N-1} by assumption; thus,

∏_{k=1}^{∞} M₀(ω/2^k) < C″ (1 + |ω|)^{-1-ε}

for some ε > 0.
Figure 4.20 Critical frequencies used in Cohen's fixed point method, shown
for |M₀(ω/2)|, |M₀(ω/4)|, and |M₀(ω/8)| (the shape of the Fourier transform
is only for the sake of example).
When evaluating the product (4.4.21), certain critical frequencies will align.
These are fixed points of the mapping ω → 2ω modulo 2π. For example, ω = ±2π/3
266 CHAPTER 4
is a critical frequency. This can be seen in Figure 4.20 where we show M0 (ω/2),
M0 (ω/4) and M0 (ω/8). It is clear from this figure that the absolute value of the
product of M0 (ω/2), M0 (ω/4) and M0 (ω/8) evaluated at ω = 16π/3 is equal to
|M₀(2π/3)|³. In general,

∏_{k=1}^{i} |M₀(ω/2^k)| |_{ω = 2^i · 2π/3} = |M₀(2π/3)|^i.
From this, it is clear that if |M0 (2π/3)| is larger than 1/2, the decay of the Fourier
transform will not be of the order of 1/ω and continuity would be disproved. Be-
cause it involves only certain values of the Fourier transform, the fixed-point method
can be used to test large filters quite easily. For a thorough discussion of the fixed-
point method, we refer to [55, 57].
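The fixed-point quantity |M₀(2π/3)| is trivial to evaluate numerically. The sketch below does so for the Haar filter and for the length-4 Daubechies filter with taps (1+√3, 3+√3, 3-√3, 1-√3)/(4√2); the overall scale factor cancels in M₀(ω) = G₀(e^{jω})/G₀(1), so unnormalized taps can be used.

```python
import numpy as np

def M0(g, w):
    """Normalized filter response M0(w) = G0(e^{jw}) / G0(1)."""
    g = np.asarray(g, dtype=float)
    n = np.arange(len(g))
    return np.sum(g * np.exp(-1j * w * n)) / np.sum(g)

s3 = np.sqrt(3)
haar = [1.0, 1.0]
daub4 = [1 + s3, 3 + s3, 3 - s3, 1 - s3]

w_c = 2 * np.pi / 3                        # critical (fixed-point) frequency
print(round(abs(M0(haar, w_c)), 4))        # 0.5    (borderline)
print(round(abs(M0(daub4, w_c)), 4))       # 0.3953 (< 1/2, test passed)
```

For the Daubechies filter the value stays strictly below 1/2, so the fixed-point argument does not rule out continuity of the limit.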
Another possible method for studying regularity uses L × L matrices corre-
sponding to a length-L filter downsampled by 2 (that is, the rows contain the
filter coefficients but are shifted by 2). By considering a subset of eigenvalues of
these matrices, it is possible to estimate the regularity of the scaling function using
Littlewood-Paley theory (which divides the Fourier domain into dyadic blocks and
uses norms on these dyadic blocks to characterize, for example, continuity). These
methods are quite sophisticated and we refer to [57, 73] for details.
Finally, Rioul [239, 242] derived direct regularity estimates on the iterated filters
which not only give sharp estimates but are quite intuitive. The idea is to consider
the iterated filters g₀^{(i)}[n] and the maximum difference between successive coefficients.
For continuity, it is clear that this difference has to go to zero. The normalization is
now different because we consider the discrete-time sequences directly. Normalizing
G₀(z) such that G₀(1) = 2 and requiring again the necessary condition G₀(-1) = 0,
we have

lim_{i→∞} max_n |g₀^{(i)}[n + 1] - g₀^{(i)}[n]| = 0,

where g₀^{(i)}[n] is the usual iterated sequence. For the limit function ϕ(t) to be
continuous, Rioul shows that the convergence has to be uniform in n and that the
following bound has to be satisfied for a positive α:

max_n |g₀^{(i)}[n + 1] - g₀^{(i)}[n]| ≤ C 2^{-iα}.
where N ≥ 1. Note that R(1) = 1 and that |M₀(e^{jω})|² can be written as

|M₀(e^{jω})|² = (cos²(ω/2))^N |R(e^{jω})|².  (4.4.31)

Since |R(e^{jω})|² = R(e^{jω}) · R*(e^{jω}) = R(e^{jω}) R(e^{-jω}), it can be expressed as a
polynomial in cos ω, or in sin²(ω/2) = (1 - cos ω)/2. Using the shorthands y =
cos²(ω/2) and P(1 - y) = |R(e^{jω})|², we can write (4.4.30) using (4.4.31) as

y^N P(1 - y) + (1 - y)^N P(y) = 1,  (4.4.32)

where

P(y) ≥ 0 for y ∈ [0, 1].  (4.4.33)

Suppose that we have a polynomial P(y) satisfying (4.4.32) and (4.4.33) and, more-
over,

sup_ω |R(e^{jω})| = sup_{y∈[0,1]} |P(y)|^{1/2} < 2^{N-1}.
Then, there exists an orthonormal basis associated with G0 (ejω ), since the iterated
filter will converge to a continuous scaling function (following Proposition 4.7) from
which a wavelet basis can be obtained (Theorem 4.5).
Thus, the problem becomes to find P(y) satisfying (4.4.32) and (4.4.33), followed
by extracting R(e^{jω}) as the “root” of P. Daubechies shows [71, 73] that any
polynomial P solving (4.4.32) is of the form

P(y) = Σ_{j=0}^{N−1} (N−1+j choose j) y^j + y^N Q(y).   (4.4.34)
268 CHAPTER 4
Example 4.5
Let us illustrate the construction for the case N = 2. Using (4.4.34) with N = 2 and Q = 0,

P(y) = 1 + 2y.

From (4.4.32),

|R(e^{jω})|² = P(1 − y) = 3 − 2 cos²(ω/2) = 2 − cos ω = 2 − (1/2) e^{jω} − (1/2) e^{−jω},

which can be factored as

|R(e^{jω})|² = (1/(4 − 2√3)) [e^{jω} − (2 − √3)] [e^{−jω} − (2 − √3)].

Taking the spectral factor with the root inside the unit circle,

R(e^{jω}) = (1/(√3 − 1)) [e^{jω} − (2 − √3)] = (1/2) [(1 + √3) e^{jω} + 1 − √3].
This filter is the 4-tap Daubechies' filter (within a phase shift to make it causal and a scale
factor of 1/√2). That is, by computing the iterated filters and the associated continuous-
time functions (see (4.4.12)–(4.4.13)), one obtains the D2 wavelet and scaling function
shown in Figure 4.4. The regularity (continuity) of this filter was discussed after Proposi-
tion 4.7.
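The computation in Example 4.5 can be verified numerically. The following sketch (ours, not from the text) checks |R(e^{jω})|² = 2 − cos ω for the causal spectral factor and assembles the 4-tap lowpass filter, normalized so that G0(1) = √2:

```python
import numpy as np

s3 = np.sqrt(3.0)
# Causal spectral factor: R(z) = ((1 + s3) + (1 - s3) z^-1) / 2
R = np.array([1 + s3, 1 - s3]) / 2.0

# Check |R(e^{jw})|^2 = P(1 - y) = 2 - cos(w)
w = np.linspace(0.0, np.pi, 9)
Rw = R[0] + R[1] * np.exp(-1j * w)
print(np.allclose(np.abs(Rw)**2, 2 - np.cos(w)))    # True

# Full lowpass: two zeros at w = pi times R, scaled so that G0(1) = sqrt(2)
g0 = np.sqrt(2.0) * np.convolve(np.array([1.0, 2.0, 1.0]) / 4.0, R)
print(g0)    # the 4-tap Daubechies' filter (causal version)

# Orthonormality with respect to even shifts (conditions of Section 3.2.3)
print(np.isclose(np.dot(g0, g0), 1.0), np.isclose(np.dot(g0[2:], g0[:2]), 0.0))
```

The resulting coefficients are (1 + √3, 3 + √3, 3 − √3, 1 − √3)/(4√2), the causal form mentioned above.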
4.4. WAVELETS DERIVED FROM ITERATED FILTER BANKS AND REGULARITY 269
Figure 4.21 Daubechies' iterated graphical functions for N = 3, . . . , 6 (the eighth
iteration is plotted; they converge to their corresponding scaling functions).
Their regions of support are from 0 to 2N − 1, and thus only for N = 3, 4 are they
plotted in their entirety. For N = 5, 6, after t = 7.0 their amplitude is
negligible. Recall that the case N = 2 is given in Figure 4.4. (a) N = 3. (b)
N = 4. (c) N = 5. (d) N = 6.
Figure 4.21 gives the iterated graphical functions for N = 3, . . . , 6 (the eighth
iteration is plotted and they converge to their corresponding scaling functions).
Recall that the case N = 2 is given in Figure 4.4. Table 4.2 gives the R(z) functions
for N = 2, . . . , 6, which can be factored into maximally regular filters. The lowpass
filters obtained by a minimum phase factorization are given in Table 4.3. Table 4.4
gives the regularity of the first few Daubechies’ filters.
This concludes our discussion of iterated filter bank constructions leading to
wavelet bases. Other variations are possible by looking at other filter banks such
as biorthogonal filter banks or IIR filter banks. Assuming regularity, they lead to
Table 4.2 Coefficients of R(z) for the first few Daubechies' filters.

N    Coefficients of R(z)
2    2^{−1} [−1, 4, −1]
3    2^{−3} [3, −18, 38, −18, 3]
4    2^{−4} [−5, 40, −131, 208, −131, 40, −5]
5    2^{−7} [35, −350, 1520, −3650, 5018, −3650, 1520, −350, 35]
6    2^{−8} [−63, 756, −4067, 12768, −25374, 32216, −25374, 12768, −4067, 756, −63]
Table 4.3 First few maximally flat Daubechies' filters. N is the number of zeros at
ω = π and equals L/2, where L is the length of the filter. The lowpass filter g0[n]
is given, and the highpass filter can be obtained as g1[n] = (−1)^n g0[−n + 2N − 1].
These filters are obtained from a minimum phase factorization of the P(z)
corresponding to Table 4.2.
biorthogonal wavelet bases with compact support and wavelets with exponential
decay (see Section 4.6 for more details).
Table 4.4 Regularity estimates α(N) for the first few Daubechies' filters.

N    α(N)
2    0.500
3    0.915
4    1.275
5    1.596
6    1.888
where

F[m, n] = ⟨ψ_{m,n}(t), f(t)⟩ = ∫_{−∞}^{∞} ψ_{m,n}(t) f(t) dt.   (4.5.2)
Linearity If h(t) = a f(t) + b g(t), then H[m, n] = a F[m, n] + b G[m, n]; that is, the
wavelet series operator is linear. The proof follows from the linearity of the inner
product.
272 CHAPTER 4
Shift Recall that the Fourier transform has the following shift property: If a signal
and its Fourier transform pair are denoted by f (t) and F (ω) respectively, then the
signal f (t − τ ) will have e−jωτ F (ω) as its Fourier transform (see Section 2.4.2).
Consider now what happens in the wavelet series case. Suppose that the function
and its transform coefficient are denoted by f (t) and F [m, n] respectively. If we
shift the signal by τ, that is, consider f(t − τ), then

F′[m, n] = ∫_{−∞}^{∞} ψ_{m,n}(t) f(t − τ) dt = ∫_{−∞}^{∞} 2^{−m/2} ψ(2^{−m}t − n + 2^{−m}τ) f(t) dt.
For the above to be a coefficient of the original transform F[m, n], one must have

2^{−m} τ ∈ Z,

or τ = 2^m k, k ∈ Z. Therefore, the wavelet series expansion possesses the following
shift property: If a signal and its transform coefficients are denoted by f(t) and
F[m, n], then the signal f(t − τ), τ = 2^m k, k ∈ Z, will have F[m′, n − 2^{m−m′} k],
m′ ≤ m, as its transform coefficients, that is,

f(t − 2^m k) ←→ F[m′, n − 2^{m−m′} k], k ∈ Z, m′ ≤ m.
If the signal contains scales up to m = M₂ only, that is,

f(t) = Σ_{n∈Z} Σ_{m=−∞}^{M₂} F[m, n] ψ_{m,n}(t),

then this signal will possess the weak shift property with respect to shifts by
2^{M₂} k, k ∈ Z.
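The shift property is easy to observe numerically with the Haar wavelet, whose coefficients at one scale reduce to sums and differences of pairs of samples. In this small sketch (ours, not from the text; the circular boundary handling is a simplifying assumption), shifting the signal by τ = 2 = 2^1 · 1 simply shifts the scale-1 coefficients by one:

```python
import numpy as np

def haar_level(x):
    """One level of the orthonormal Haar wavelet transform (circular)."""
    c = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return c, d

x = np.random.default_rng(1).standard_normal(8)
c, d = haar_level(x)
cs, ds = haar_level(np.roll(x, 2))         # shift by tau = 2^m k with m = 1, k = 1
print(np.allclose(ds, np.roll(d, 1)), np.allclose(cs, np.roll(c, 1)))   # True True
```

A shift by an odd τ, in contrast, produces coefficients that are not a permutation of the original ones.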
Scaling Recall the scaling property of the Fourier transform: If a signal and its
Fourier transform pair are denoted by f (t) and F (ω), then the scaled version of the
signal f (at) will have (1/|a|) · F (ω/a) as its transform (see Section 2.4.2).
The wavelet series expansion F′[m, n] of f′(t) = f(at), a > 0, is

F′[m, n] = ∫_{−∞}^{∞} ψ_{m,n}(t) f(at) dt = (1/a) ∫_{−∞}^{∞} 2^{−m/2} ψ((2^{−m} t)/a − n) f(t) dt.
4.5. WAVELET SERIES AND ITS PROPERTIES 273
Figure 4.22 Dyadic sampling of the time-frequency plane in the wavelet series
expansion. The dots indicate the centers of the wavelets ψ_{m,n}(t).
Scaling by a factor which is not a power of two requires reinterpolation. That is,
either one reinterpolates the signal and then takes the wavelet expansion, or some
interpolation of the wavelet series coefficients is made. The former method is more
immediate.
Parseval’s Identity Parseval’s identity, as seen for the Fourier-type expansions
(see Section 2.4), holds for the wavelet series as well. That is, the orthonormal
family {ψ_{m,n}} satisfies (see Theorem 4.5)

Σ_{m,n∈Z} |⟨ψ_{m,n}, f⟩|² = ‖f‖², f ∈ L²(R).
Figure 4.23 (a) Region of coefficients F[m, n] which will be influenced by the
value of the function at t₀. (b) Region of influence of the Fourier component
F(ω₀).
Localization One of the reasons why wavelets are so popular is their ability to
achieve good time and frequency localization. We will discuss this next.
Time Localization Suppose that one is interested in the signal around t = t0 . Then
a valid question is: Which values F [m, n] will carry some information about the
signal f (t) at t0 , that is, which region of the (m, n) grid will give information about
f (t0 )?
Suppose a wavelet ψ(t) is compactly supported on the interval [−n₁, n₂]. Then,
ψ_{m,0}(t) is supported on [−n₁2^m, n₂2^m] and ψ_{m,n}(t) is supported on [(−n₁ + n)2^m,
(n₂ + n)2^m]. Therefore, at scale m, the value of the signal at t₀ influences the
wavelet coefficients with index n satisfying

2^{−m} t₀ − n₂ ≤ n ≤ 2^{−m} t₀ + n₁.

This is shown in Figure 4.23(a). Conversely, only the portion of the signal inside
the support of ψ_{m₀,n₀} influences the coefficient F[m₀, n₀].
4.5. WAVELET SERIES AND ITS PROPERTIES 275
Frequency Localization In a similar fashion, consider a frequency ω₀ and ask which
coefficients F[m, n] carry information about it. Using Parseval's relation, the wavelet
coefficient can be written in the Fourier domain as

F[m, n] = ∫_{−∞}^{∞} ψ_{m,n}(t) f(t) dt = (1/2π) 2^{m/2} ∫_{−∞}^{∞} F(ω) Ψ*(2^m ω) e^{j2^m nω} dω.
Now, suppose that the wavelet ψ(t) vanishes in the Fourier domain outside the region
[ω_min, ω_max]. At scale m, the support of Ψ_{m,n}(ω) is [ω_min/2^m, ω_max/2^m].
Therefore, a frequency component at ω₀ influences the wavelet series at scale m if

ω_min/2^m ≤ ω₀ ≤ ω_max/2^m

is satisfied, or if the following range of scales is influenced:

log₂(ω_min/ω₀) ≤ m ≤ log₂(ω_max/ω₀).
This is shown in Figure 4.23(b). Conversely, given a scale m0 , all frequencies of the
signal between ωmin /2m0 and ωmax /2m0 will influence the expansion at that scale.
Characterization of Local Regularity The wavelet series has an advantage over
the Fourier case in that one can characterize local regular-
ity.
ity. Remember that the Fourier transform gives a global characterization only. The
wavelet transform and the wavelet series, because of the fact that high frequency
basis functions become arbitrarily sharp in time, allow one to look at the regular-
ity at a particular location independent of the regularity elsewhere. This property
will be discussed in more detail for the continuous-time wavelet transform in Chap-
ter 5. The basic properties of regularity characterization carry over to the wavelet
series case since it is a sampled version of the continuous wavelet transform, and
since the sampling grid becomes arbitrarily dense at high frequencies (we consider
“well-behaved” functions only, that is, of bounded variation).
In a dual manner, we can make statements about the decay of the wavelet series
coefficients depending on the regularity of the analyzed signal. This gives a way to
quantify the approximation property of the wavelet series expansion for a signal of
a given regularity. Again, the approximation property is local (since regularity is
local).
Note that in all these discussions, one assumes that the wavelet is more regular
than the signal (otherwise, the wavelet’s regularity interferes). Also, because of the
sampling involved in the wavelet series, one might have to go to very fine scales in
order to get good estimates. Therefore, it is easier to use the continuous wavelet
transform or a highly oversampled discrete-time wavelet transform (see Chapter 5
and [73]).
Two-Scale Equation Property The scaling function can be built from itself (see
Figure 4.24). Recall the definition of a multiresolution analysis. The scaling func-
tion ϕ(t) belongs to V₀. However, since V₀ ⊂ V₋₁, ϕ(t) belongs to V₋₁ as well.
We know that {ϕ(t − n)} is an orthonormal basis for V₀ and thus, {√2 ϕ(2t − n)} is an
orthonormal basis for V₋₁. This means that any function from V₀, including ϕ(t),
can be expressed as a linear combination of the basis functions from V₋₁, that is,
ϕ(2t − n). This leads to the following two-scale equation:

ϕ(t) = √2 Σ_n g₀[n] ϕ(2t − n).   (4.5.3)
On the other hand, using the same argument for the wavelet ψ(t) ∈ W₀ ⊂ V₋₁, one
can see that

ψ(t) = √2 Σ_n g₁[n] ϕ(2t − n).   (4.5.4)
These two relations can be expressed in the Fourier domain as

Φ(ω) = (1/√2) Σ_n g₀[n] e^{−jn(ω/2)} Φ(ω/2) = M₀(ω/2) Φ(ω/2),   (4.5.5)

Ψ(ω) = (1/√2) Σ_n g₁[n] e^{−jn(ω/2)} Φ(ω/2) = M₁(ω/2) Φ(ω/2).   (4.5.6)
As an illustration, consider the two-scale equation in the case of the Daubechies’
scaling function. Figure 4.24 shows how the D2 scaling function is built using four
scaled and shifted versions of itself.
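The construction in Figure 4.24 suggests the cascade (graphical) algorithm: iterate the two-scale equation starting from the indicator function. The sketch below (ours, not from the text) approximates the D2 scaling function this way and checks that its integral is 1 and that its integer translates remain orthonormal at every iteration:

```python
import numpy as np

def cascade(g0, J):
    """Piecewise-constant approximation of phi on a grid of spacing 2**-J,
    via the iterated filter g^(J); assumes sum(g0) = sqrt(2)."""
    g0 = np.asarray(g0, dtype=float)
    g = g0
    for _ in range(J - 1):
        up = np.zeros(2 * len(g) - 1)
        up[::2] = g                      # upsample by 2, then convolve with g0
        g = np.convolve(g0, up)
    return 2.0**(J / 2.0) * g            # graphical-function normalization

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

J = 8
phi = cascade(g0, J)
h = 2.0**(-J)                            # grid spacing
print(np.sum(phi) * h)                   # ~1.0 : integral of phi
print(np.sum(phi * phi) * h)             # ~1.0 : <phi(t), phi(t)>
print(np.sum(phi[2**J:] * phi[:-2**J]) * h)   # ~0 : <phi(t), phi(t - 1)>
```

Because both factors are piecewise constant on the same dyadic grid, these sums compute the inner products exactly, and the orthonormality of the translates holds to machine precision at every iteration.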
The functions M0 (ω) and M1 (ω) in (4.5.5) and (4.5.6) are 2π-periodic functions
and correspond to scaled versions of filters g0 [n], g1 [n] (see (4.4.5) and (4.4.8)) which
can be used to build filter banks (see Section 4.5.2 below).
The two-scale equation can also be used as a starting point in constructing a
multiresolution analysis. In other words, instead of starting from an axiomatic
definition of a multiresolution analysis, choose ϕ(t) such that (4.5.3) holds, with
Σ_n |g₀[n]|² < ∞ and 0 < A ≤ Σ_n |Φ(ω + 2πn)|² ≤ B < ∞. Then define V_m to be
the closed subspace spanned by {2^{−m/2} ϕ(2^{−m}t − n)}, n ∈ Z. All the other axioms
follow (an orthogonalization step is involved if ϕ(t) is not orthogonal to its integer
translates). For more details, refer to [73].
Moment Properties of Wavelets Recall that the lowpass filter g₀[n], in an iter-
ated filter bank scheme, has at least one zero at ω = π and thus, g₁[n] has at least
one zero at ω = 0. Since Φ(0) = 1 (from the normalization of M₀(ω)), it follows
that Ψ(ω) has at least one zero at ω = 0. Therefore,

∫_{−∞}^{∞} ψ(t) dt = Ψ(0) = 0.
More generally, if G₀(e^{jω}) has an Nth-order zero at ω = π, then Ψ(ω) has an
Nth-order zero at ω = 0, that is, the first N moments of the wavelet are zero.
Besides wavelets constructed from iterated filter banks, we have seen Meyer's and
Battle-Lemarié wavelets.
Meyer’s wavelet, which is not based on the iteration of a rational function, has
by construction an infinite “zero” at the origin, that is, an infinite number of zero
moments. The Battle-Lemarié wavelet, on the other hand, is based on the N th-
order B-spline function. The orthogonal filter G0 (ejω ) has an (N + 1)th-order zero
at π (see (4.3.18)) and the wavelet thus has N + 1 zero moments.
The importance of zero moments comes from the following fact. Assume a
length-L wavelet with N zero moments, and assume that the function f(t) to be
represented by the wavelet series expansion is a polynomial of degree N − 1 on an
interval [t₀, t₁]. Then, for sufficiently small scales (such that 2^m L < (t₁ − t₀)/2), the
wavelet expansion coefficients will automatically vanish in the region corresponding
to [t₀, t₁], since the inner product with each term of the polynomial is zero.
Another view is to consider the Taylor expansion of a function around a point t₀,

f(t₀ + ε) = f(t₀) + f′(t₀) ε / 1! + f″(t₀) ε² / 2! + . . . .
The wavelet expansion around t0 now depends only on the terms of degree N and
higher of the Taylor expansion since the terms 0 through N − 1 are zeroed out
because of the N zero moments of the wavelet. If the function is smooth, the
high-order terms of the Taylor expansion are very small. Because the wavelet series
coefficients now depend only on Taylor coefficients of order N and larger, they will
be very small as well.
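A discrete counterpart of this is easy to check: a highpass filter with N zeros at ω = 0 annihilates sampled polynomials of degree up to N − 1. A small sketch (ours, not from the text) with the 4-tap Daubechies' filters (N = 2):

```python
import numpy as np

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
# g1[n] = (-1)^n g0[2N - 1 - n], here with 2N - 1 = 3
g1 = np.array([(-1)**n * g0[3 - n] for n in range(4)])

n = np.arange(4)
for p in range(3):
    print(p, np.sum(g1 * n**p))
# degrees p = 0 and p = 1 are annihilated (two zero moments); p = 2 is not

# Filtering a sampled straight line gives zero in the interior:
line = np.arange(20, dtype=float)
y = np.convolve(g1, line)
print(np.allclose(y[3:-3], 0.0))          # True
```

This is why smooth signal segments produce negligible wavelet coefficients at fine scales: locally, such segments are well approximated by low-degree polynomials.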
These approximation features of wavelets with zero moments are important in
approximation of smooth functions and operators and also in signal compression
(see Chapter 7).
Table 4.5 Zero moments, regularity, and decay of various wavelets. α(N)
is a linearly increasing function of N which approaches 0.2075·N for large
N. The Battle-Lemarié wavelet of order N is based on a B-spline of order
N − 1. The Daubechies' wavelet of order N corresponds to a length-2N
maximally flat orthogonal filter.

Wavelet          # of zero moments   regularity r   decay or support in time   decay or support in frequency
Haar             1                   0              [0, 1]                     1/ω
Sinc             ∞                   ∞              1/t                        [π, 2π]
Meyer            ∞                   ∞              1/poly.                    [2π/3, 8π/3]
Battle-Lemarié   N                   N              exponential                1/ω^N
Daubechies       N                   α(N)           [0, 2N − 1]                1/ω^{α(N)}
The regularity of all the wavelets discussed so far is indicated in Table 4.5. Reg-
ularity r means that the rth derivative exists almost everywhere. The localization
or decay in time and frequency of all these wavelets is also indicated in the table.
Filter Banks Obtained from Wavelets Consider again (4.5.3) and (4.5.4). An
interesting fact is that using the coefficients g0 [n] and g1 [n] for the synthesis lowpass
and highpass filters respectively, one obtains a perfect reconstruction orthonormal
filter bank (as defined in Section 3.2.3). To check the orthonormality conditions
for these filters use the orthonormality conditions of the scaling function and the
wavelet. Thus, start from
⟨ϕ(t + l), ϕ(t + k)⟩ = δ[k − l],

or

⟨ϕ(t + l), ϕ(t + k)⟩ = ⟨√2 Σ_n g₀[n] ϕ(2t + 2l − n), √2 Σ_m g₀[m] ϕ(2t + 2k − m)⟩
 = 2 ⟨Σ_{n′} g₀[n′ + 2l] ϕ(2t − n′), Σ_{m′} g₀[m′ + 2k] ϕ(2t − m′)⟩
 = Σ_n g₀[n + 2l] g₀[n + 2k] = δ[l − k],

where the last equality uses ⟨ϕ(2t − n′), ϕ(2t − m′)⟩ = (1/2) δ[n′ − m′],
that is, the lowpass is orthogonal to its even translates. In a similar fashion, one can
show that the lowpass filter is orthogonal to the highpass and its even translates.
The highpass filter is orthogonal to its even translates as well. That is, {gi [n − 2k]},
i = 0, 1, is an orthonormal set, and it can be used to build an orthogonal filter bank
(see Section 3.2.3).
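These orthogonality relations can be verified numerically for a concrete filter pair; the sketch below (ours, not from the text) uses the 4-tap Daubechies' filters:

```python
import numpy as np

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
g1 = np.array([(-1)**n * g0[3 - n] for n in range(4)])

def even_shift(a, b, k):
    """sum_n a[n] b[n + 2k], with zero padding outside the filters' support."""
    return sum(a[i] * b[i + 2 * k] for i in range(len(a)) if 0 <= i + 2 * k < len(b))

for k in (-1, 0, 1):
    print(k, even_shift(g0, g0, k), even_shift(g1, g1, k), even_shift(g0, g1, k))
# k = 0 gives (1, 1, 0); k = +/-1 gives (0, 0, 0): {g_i[n - 2k]} is an orthonormal set
```

Any pair of filters passing these checks yields a perfect reconstruction orthonormal filter bank in the sense of Section 3.2.3.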
We also assume that the axioms of multiresolution analysis hold. In searching for
projections of f (t) onto V1 and W1 , we use the fact that ϕ(t) and ψ(t) satisfy
two-scale equations. Consider first the projection onto V₁, that is,

f^{(1)}[n] = ⟨(1/√2) ϕ(t/2 − n), f(t)⟩.   (4.5.8)
Because ϕ(t) = √2 Σ_k g₀[k] ϕ(2t − k),

(1/√2) ϕ(t/2 − n) = Σ_k g₀[k] ϕ(t − 2n − k).   (4.5.9)
Because of the orthogonality of ϕ(t) with respect to its integer translates, the inner
product in the above equation is equal to δ[l − 2n − k]. Therefore, only the term
with l = 2n + k is kept from the second summation. With a change of variable, we
can write (4.5.10) as

f^{(1)}[n] = Σ_k g₀[k − 2n] f^{(0)}[k],
that is, the coefficients of the projection onto V₁ are obtained by filtering f^{(0)} with
g̃₀ and downsampling by 2. To calculate the projection onto W₁, we use the fact
that ψ(t) = √2 Σ_k g₁[k] ϕ(2t − k). Calling d^{(1)}[n] the coefficients of the projection
onto W₁, or

d^{(1)}[n] = ⟨(1/√2) ψ(t/2 − n), f(t)⟩,

and using the two-scale equation for ψ(t) as well as the expansion for f(t) given in
(4.5.7), we find, similarly to (4.5.9)–(4.5.11),

d^{(1)}[n] = Σ_k g₁[k] Σ_l f^{(0)}[l] ⟨ϕ(t − 2n − k), ϕ(t − l)⟩
 = Σ_k g₁[k] Σ_l f^{(0)}[l] δ[l − 2n − k]
 = Σ_l g₁[l − 2n] f^{(0)}[l] = Σ_l g̃₁[2n − l] f^{(0)}[l],
where g̃1 [n] = g1 [−n]. That is, the coefficients of the projection onto W1 are
obtained by filtering f (0) with g̃1 and downsampling by 2, exactly as we obtained the
projection onto V1 using g̃0 . Of course, projections onto V2 and W2 can be obtained
similarly by filtering f^{(1)} and downsampling by 2. Therefore, the projection
onto W_m, m = 1, 2, 3, . . ., is obtained by m − 1 filterings with g̃₀[n], each followed
by downsampling by 2, and a final filtering with g̃₁[n] and downsampling. This
purely discrete-time algorithm to implement the wavelet series expansion is depicted
in Figure 4.25.
A key question is how to obtain an orthogonal projection fˆ(t) onto V₀ from an
arbitrary signal f(t). Because {ϕ(t − n)} is an orthonormal basis for V₀, fˆ(t) equals

fˆ(t) = Σ_n ⟨ϕ(t − n), f(t)⟩ ϕ(t − n),

and fˆ(t) − f(t) is orthogonal to ϕ(t − n), n ∈ Z. Thus, given an initial signal
f(t), we have to compute the set of inner products f^{(0)}[n] = ⟨ϕ(t − n), f(t)⟩.
This, unlike the further decomposition, which involves only discrete-time processing,
requires continuous-time processing. However, if V₀ corresponds to a sufficiently fine
resolution compared to the resolution of the input signal f(t), then sampling f(t)
will be sufficient. This follows because ϕ(t) is a lowpass function with an integral
equal to 1. If f(t) is smooth and ϕ(t) is sufficiently short-lived, then we have

⟨ϕ(t − n), f(t)⟩ ≈ f(n).
Of course, if V₀ is not fine enough, one can start with V₋ₘ for m sufficiently large,
so that

⟨2^{m/2} ϕ(2^m t − n), f(t)⟩ ≈ 2^{−m/2} f(2^{−m} n).
Figure 4.25 Mallat's algorithm: starting from ⟨ϕ_{0,n}, f⟩, the coefficients
⟨ψ_{m,n}, f⟩, m = 1, 2, 3, and ⟨ϕ_{3,n}, f⟩ are computed by repeated filtering
with g̃₀ and g̃₁, each followed by downsampling by 2.
If f (t) has some regularity (for example, it is continuous), there will be a resolution
at which sampling is a good enough approximation of the inner products needed
to begin Mallat’s algorithm. Generalizations of Mallat’s algorithm, which include
more general initial approximation problems, are derived in [261] and [296].
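One stage of this discrete-time algorithm, together with its inverse, can be sketched as follows (ours, not from the text; circular extension is a simplifying assumption to keep the example short):

```python
import numpy as np

def analyze(x, g):
    """y[n] = sum_k g[k] x[2n + k]: filter by g~[n] = g[-n], then downsample by 2."""
    N = len(x)
    return np.array([sum(g[k] * x[(2 * n + k) % N] for k in range(len(g)))
                     for n in range(N // 2)])

def synthesize(c, d, g0, g1):
    """x[m] = sum_n c[n] g0[m - 2n] + d[n] g1[m - 2n] (circular)."""
    N = 2 * len(c)
    x = np.zeros(N)
    for n in range(len(c)):
        for k in range(len(g0)):
            x[(2 * n + k) % N] += c[n] * g0[k] + d[n] * g1[k]
    return x

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
g1 = np.array([(-1)**n * g0[3 - n] for n in range(4)])

f0 = np.random.default_rng(0).standard_normal(16)    # <phi(t - n), f>
f1 = analyze(f0, g0)                                 # projection onto V1
d1 = analyze(f0, g1)                                 # projection onto W1
print(np.allclose(synthesize(f1, d1, g0, g1), f0))   # True: perfect reconstruction
```

Further stages are obtained by applying `analyze` to f1 again, exactly as in Figure 4.25.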
Biorthogonal families of wavelets {ψ_{m,n}} and {ψ̃_{m,n}} are such that the following
biorthogonality relation is satisfied:

⟨ψ_{m,n}, ψ̃_{m′,n′}⟩ = δ[m − m′] δ[n − n′].   (4.6.1)

If in addition the family is complete in a given space such as L²(R), then any
function of the space can be written as

f(t) = Σ_m Σ_n ⟨ψ_{m,n}, f⟩ ψ̃_{m,n}(t)   (4.6.2)
     = Σ_m Σ_n ⟨ψ̃_{m,n}, f⟩ ψ_{m,n}(t),   (4.6.3)
since ψ and ψ̃ play dual roles. There are various ways to find such biorthogonal
families. For example, one could construct a biorthogonal spline basis by simply
not orthogonalizing the Battle-Lemarié wavelet.
Another approach consists in starting with a biorthogonal filter bank and using
the iterated filter bank method just as in the orthogonal case. Now, both the
analysis and the synthesis filters (which are not just time-reversed versions of each
other) have to be iterated. For example, one can use finite-length linear phase filters
and obtain wavelets with symmetries and compact support (which is impossible in
the orthogonal case).
In a biorthogonal filter bank with analysis/synthesis filters H₀(z), H₁(z), G₀(z),
and G₁(z), perfect reconstruction with FIR filters means that (see (3.2.21))

G₀(z) H₀(z) + G₁(z) H₁(z) = 2,   (4.6.4)

and the highpass filters are related to the lowpass ones, within odd shifts, by

H₁(z) = z G₀(−z),   G₁(z) = z^{−1} H₀(−z),   (4.6.5–4.6.6)

following (3.2.18), where det(Hm(z)) = 2z^{2k+1} (we assume noncausal analysis filters
in this discussion). Now, given a polynomial P(z) satisfying P(z) + P(−z) = 2,
we can factor it into P(z) = G₀(z)H₀(z) and use {H₀(z), G₀(z)} as the analy-
sis/synthesis lowpass filters of a biorthogonal perfect reconstruction filter bank (the
highpass filters follow from (4.6.5)–(4.6.6)).
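The factorization approach can be illustrated numerically. The sketch below (ours, not from the text) uses a linear phase pair of spline filters, normalized so that each sums to √2, of the kind that reappears later in this section, and checks the condition P(z) + P(−z) = 2 for their product:

```python
import numpy as np

# Linear phase spline lowpass pair (hat function and a length-5 filter):
g0 = np.array([1, 2, 1]) / (2 * np.sqrt(2))          # synthesis lowpass
h0 = np.array([-1, 2, 6, 2, -1]) / (4 * np.sqrt(2))  # analysis lowpass

p = np.convolve(g0, h0)                  # P(z) = G0(z) H0(z), zero phase if centered
c = len(p) // 2                          # index of the z^0 (center) tap

# Evaluate P(z) + P(-z) on the unit circle:
z = np.exp(1j * np.linspace(0.1, 3.0, 7))
P  = sum(p[k] * z**(c - k) for k in range(len(p)))
Pm = sum(p[k] * (-z)**(c - k) for k in range(len(p)))
print(np.allclose(P + Pm, 2.0))          # True: a valid factorization of P(z)
```

The centered product here is (−z³ + 9z + 16 + 9z^{−1} − z^{−3})/16: all even powers except the constant vanish, which is exactly the condition P(z) + P(−z) = 2.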
We can iterate such a biorthogonal filter bank on the lowpass channel and find
equivalent iterated filter impulse responses. Note that now, analysis and synthesis
impulse responses are not simply time-reversed versions of each other (as in the
orthogonal case), but are typically very different (since they depend on H0 (z) and
284 CHAPTER 4
(i)
"
i−1
H0 (z) = H0 (z 2k ),
k=0
(i)
"
i−1
G0 (z) = G0 (z 2k ).
k=0
For the associated limit functions to converge, it is necessary that both H₀(z) and
G₀(z) have a zero at z = −1 (see Proposition 4.6). Therefore, following (4.6.4), we
have that

G₀(1) H₀(1) = (Σ_n g₀[n]) (Σ_n h₀[n]) = 2.

That is, we can “normalize” the filters such that

Σ_n g₀[n] = Σ_n h₀[n] = √2.
Define

M̃₀(ω) = H₀(e^{jω}) / √2,   M₀(ω) = G₀(e^{jω}) / √2,

and the associated limit functions

Φ̃(ω) = Π_{k=1}^{∞} M̃₀(ω/2^k),

Φ(ω) = Π_{k=1}^{∞} M₀(ω/2^k),

where the former is the scaling function at analysis (within time reversal) and the
latter is the scaling function at synthesis. These two scaling functions can be very
different, as shown in Example 4.6.
Example 4.6
Consider a biorthogonal filter bank with length-4 linear phase filters. This is a one-parameter
family with analysis and synthesis lowpass filters given by (α ≠ ±1):

H₀(z) = (1/(√2(α + 1))) (1 + αz + αz² + z³),

G₀(z) = (1/(√2(α − 1))) (−1 + αz^{−1} + αz^{−2} − z^{−3}).
4.6. GENERALIZATIONS IN ONE DIMENSION 285
In Figure 4.26 we show the iteration of the filter H0 (z) for a range of values α. Looking at
the iterated filter for α and −α, one can see that there is no solution having both a regular
analysis and a regular synthesis filter. For example, for α = 3, the analysis filter converges
to a quadratic spline function, while the iterated synthesis filter exhibits fractal behavior
and no regularity.
The wavelets follow from the highpass filters: with

M̃₁(ω) = H₁(e^{jω}) / √2,   M₁(ω) = G₁(e^{jω}) / √2,   (4.6.7)

the analysis and synthesis wavelets are

Ψ̃(ω) = M̃₁(ω/2) Π_{k=2}^{∞} M̃₀(ω/2^k),   (4.6.8)

Ψ(ω) = M₁(ω/2) Π_{k=2}^{∞} M₀(ω/2^k).
Note that the regularity of the wavelet is the same as that of the scaling func-
tion (we assume FIR filters). Except that we define scaling functions and wavelets
as well as their duals, the construction is analogous to the orthogonal case. The
biorthogonality relation (4.6.1) can be derived similarly to the orthogonal case (see
Proposition 4.4), but using properties of the underlying biorthogonal filter bank
Figure 4.27 Biorthogonal wavelet bases. The scaling function ϕ(t) is the hat
function or linear spline (shown in Figure 4.12(b)). (a) Biorthogonal scaling
function ϕ̃(t) based on a length-5 filter. (b) Biorthogonal scaling function ϕ̃(t)
based on a length-9 filter. (c) Wavelet ψ(t) which is piecewise linear. (d) Dual
wavelet ψ̃(t).
instead [58, 319]. As can be seen in the previous example, a difficult task in design-
ing biorthogonal wavelets is to guarantee simultaneous regularity of the basis and
its dual.⁸ To illustrate this point further, consider the case when one of the two
wavelet bases is piecewise linear.

⁸ Regularity of both the wavelet and its dual is not necessary. Actually, they can be very different
and still form a valid biorthogonal expansion.
Suppose the synthesis scaling function is the hat function, and let us find an analysis
lowpass filter H₀(z) such that: (i) the perfect reconstruction condition (4.6.4) is satisfied,
(ii) H₀(−1) = 0, and (iii) ϕ̃(t) has some regularity. First, choose

H₀(z) = (1/(4√2)) (−z² + 2z + 6 + 2z^{−1} − z^{−2}) = (1/(4√2)) (1 + z)(1 + z^{−1})(−z + 4 − z^{−1}),

which satisfies (i) and (ii) above. As for regularity, the iterated filter H₀^{(i)}(z), leading to
an approximation of ϕ̃(t), is shown in Figure 4.27(a). As can be seen, the dual scaling function
is very “spiky”. Instead, we can take a higher-order analysis lowpass filter, in particular
one having more zeros at z = −1. For example, using

H₀(z) = (1/(64√2)) (1 + z)² (1 + z^{−1})² (3z² − 18z + 38 − 18z^{−1} + 3z^{−2})
leads to a smoother dual scaling function ϕ̃(t), as shown in Figure 4.27(b). The wavelet ψ(t)
and its dual ψ̃(t) are shown in Figure 4.27(c) and (d). Note that both of these examples
are simply a refactorization of the autocorrelation of the Daubechies' filters for N = 2 and
3, respectively (see Table 4.2).
Given the vastly different behavior of the wavelet and its dual, a natural question
that comes to mind is which of the two decomposition formulas, (4.6.2) or (4.6.3),
should be used. If all wavelet coefficients are used, and we are not worried about
the speed of convergence of the wavelet series, then it does not matter. However, if
approximations are to be used (as in image compression), then the two formulas can
exhibit different behavior. First, zero moments of the analyzing wavelet will tend
to reduce the number of significant wavelet coefficients (see Section 4.5.2) and thus,
one should use the wavelet with many zeros at ω = 0 for the analysis. Since ψ̃(ω)
involves H1 (ejω ) (see (4.6.7–4.6.8)) and H1 (z) is related to G0 (−z), zeros at the
origin for ψ̃(ω) correspond to zeros at ω = π for G0 (ejω ). Thus many zeros at z = −1
in G0 (z) will give the same number of zero moments for ψ̃(ω) and contribute to a
more compact representation of smooth signals. Second, the reconstructed signal
is a linear combination of the synthesis wavelet and its scaled and shifted versions. If
not all coefficients are used in the reconstruction, a subset of wavelets should give
a “close” approximation to the signal, and in general, smooth wavelets will give a
better approximation (for example, in a perceptual sense for image compression).
Again, smooth wavelets at the synthesis are obtained by having many zeros at
z = −1 in G0 (z). In practice, it turns out that (4.6.2) and (4.6.3) indeed lead to
a different behavior (for example in image compression) and usually, the schemes
having smooth synthesis scaling function and wavelet are preferred [14].
This concludes our brief overview of biorthogonal wavelet constructions based
on filter banks. For more material on this topic, please refer to [58] (which proves
completeness of the biorthogonal basis under certain conditions on the filters), [289]
(which discusses general properties of biorthogonal wavelet bases) and [130, 319]
(which explores further properties of biorthogonal filter banks useful for designing
biorthogonal wavelets).
The Daubechies' and Butterworth maximally flat filters are two extreme cases of
solving for a minimum-degree autocorrelation R(z) such that

R(z) + R(−z) = 2

is satisfied. In the Daubechies' solution, R(z) has zeros only, while in the Butter-
worth case, R(z) is all-pole. For N ≥ 4, there are intermediate solutions where
R(z) has both poles and zeros and these are described in [130, 133]. The regularity
of the associated wavelets is very close to the Butterworth case and thus, better
than the corresponding Daubechies’ wavelets.
The freedom gained by going from FIR to IIR filters allows the construction of
orthogonal wavelets with symmetries or linear phase; a case excluded in the FIR
or wavelet with compact support case (except for the Haar wavelet). Orthogonal
IIR filter banks having linear phase filters were briefly discussed in Section 3.2.5.
In particular, the example derived in (3.2.81–3.2.82) is relevant for wavelet con-
structions. Take synthesis filters G0 (z) = A(z 2 ) + z −1 A(z −2 ) and G1 (z) = G0 (−z)
(similar to (3.2.81)) and A(z) as the allpass given in (3.2.82). Then

G₀(z) = (1/√2) · (1 + z^{−1})(49 − 20z^{−1} + 198z^{−2} − 20z^{−3} + 49z^{−4}) / [(15 + 42z^{−2} + 7z^{−4})(7 + 42z^{−2} + 15z^{−4})]
has linear phase and five zeros at z = −1. It leads, through iteration, to a smooth,
differentiable scaling function and wavelet with exponential decay (but obviously
noncausal).
Then, we can design different “wavelet” bases based on iterated low and highpass
filters. Let us take a simple example. Consider the following four filters, corre-
sponding to a four-channel filter bank derived from a binary tree:
F₀(z) = G₀(z) G₀(z²),   F₁(z) = G₀(z) G₁(z²),   (4.6.9)

F₂(z) = G₁(z) G₀(z²),   F₃(z) = G₁(z) G₁(z²).   (4.6.10)
This corresponds to an orthogonal filter bank, as we saw in Section 3.3. Call
the impulse responses fᵢ[n]. Then, the following ϕ(t) is a scaling function (with
scale change by 4):

ϕ(t) = 2 Σ_k f₀[k] ϕ(4t − k).
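The orthonormality of the resulting set can be checked numerically; the sketch below (ours, not from the text) builds the four equivalent filters of (4.6.9)–(4.6.10) from the Daubechies' 4-tap pair and verifies that {fᵢ[n − 4k]} is orthonormal:

```python
import numpy as np

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
g1 = np.array([(-1)**n * g0[3 - n] for n in range(4)])

def up2(g):
    u = np.zeros(2 * len(g) - 1)
    u[::2] = g
    return u

# F0 = G0(z)G0(z^2), F1 = G0(z)G1(z^2), F2 = G1(z)G0(z^2), F3 = G1(z)G1(z^2)
f = [np.convolve(a, up2(b)) for a in (g0, g1) for b in (g0, g1)]

def shift4(a, b, k):
    """sum_n a[n] b[n + 4k], zero outside the filters' support."""
    return sum(a[i] * b[i + 4 * k] for i in range(len(a)) if 0 <= i + 4 * k < len(b))

ok = all(abs(shift4(f[i], f[j], k) - (i == j and k == 0)) < 1e-12
         for i in range(4) for j in range(4) for k in (-2, -1, 0, 1, 2))
print(ok)    # True: {f_i[n - 4k]} is an orthonormal set
```

The same check, with shifts by N instead of 4, applies to any tree-structured orthonormal filter bank.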
Figure 4.28 Scaling function ϕ(t) and wavelet ψ(t) based on a half-band digital
Butterworth filter with five zeros at ω = π. (a) Scaling function ϕ(t). (b)
Fourier transform magnitude Φ(ω). (c) Wavelet ψ(t). (d) Fourier transform
magnitude Ψ(ω).
Note that ϕ(t) is just the usual scaling function from the iterated two-channel bank,
but now written with respect to a scale change by 4 (which involves the filter f0 [k]).
The following three functions are “wavelets”:

ψᵢ(t) = 2 Σ_k fᵢ[k] ϕ(4t − k), i ∈ {1, 2, 3}.
The set {ϕ(t − k), ψ₁(t − l), ψ₂(t − m), ψ₃(t − n)} is orthonormal, and {2^j ψᵢ(4^j t − l)},
i ∈ {1, 2, 3}, l, j ∈ Z, is an orthonormal basis for L²(R), following similar arguments
as in the classic “single” wavelet case (we have simply expanded two successive
wavelet spaces into three spaces spanned by ψᵢ(t), i ∈ {1, 2, 3}). Of course, this is a
simple variation on the normal wavelet case (note that ψ₁(t) is the usual wavelet).
With these methods and the concept of wavelet packets previously discussed in
Section 3.3.4, it can be seen how to obtain continuous-time wavelet packets. That
is, given any binary tree built with two-channel filter banks, we can associate a set
of “wavelets” with the highpass and bandpass channels. These functions, together
with appropriate scales and shifts, form orthonormal wavelet packet bases for
L²(R).
The case for general filter banks is very similar [129, 277]. Assume we have a
size-N filter bank with a regular lowpass filter. This filter has to be regular with
respect to downsampling by N (rather than 2), which amounts (in a similar fashion
to Proposition 4.7) to having a sufficient number of zeros at the N th roots of unity
(the aliasing frequencies, see discussion below). The lowpass filter will lead to a
scaling function satisfying
ϕ(t) = N 1/2 g0 [k] ϕ(N t − k).
k
The N − 1 functions
ψi (t) = N 1/2 gi [k]ϕ(N t − k), i = 1, . . . , N − 1,
k
(i)
"
i−1
k
G0 (z) = G0 (z N ), (4.6.11)
k=0
A necessary condition for convergence of the graphical function is that (see Prob-
lem 4.15)

G₀(e^{j(ω+2πk/N)}) = 0, k = 1, . . . , N − 1,   (4.6.14)

that is, G₀(z) has at least one zero at each of the aliasing frequencies ω = 2πk/N,
k = 1, . . . , N − 1. Then, using (4.6.14) in (4.6.13), we see that

G₀(1) = √N.
Let K denote the number of zeros of G₀(z) at each aliasing frequency, write the
remainder after factoring out these zeros as R(ω), and let

B = sup_{ω∈[0,2π]} |R(ω)|.

Then

B < N^{K−1}   (4.6.15)

ensures that the limit of ϕ^{(i)}(t) as i → ∞ is continuous (see Problem 4.16).
The design of lowpass filters with a maximum number of zeros at aliasing fre-
quencies (the equivalent of the Daubechies’ filters, but for integer downsampling
larger than 2) is given in [277]. An interesting feature of multichannel wavelet
schemes is that now, orthogonality and compact support are possible simultane-
ously. This follows from the fact that there exist unitary FIR filter banks having
linear phase filters for more than two channels [321]. A detailed exploration of
such filter banks and their use for the design of orthonormal wavelet bases with
symmetries (for example, a four-band filter bank leading to one symmetric scaling
function as well as one symmetric and two antisymmetric wavelets) is done in [275].
The problem with scale changes by N > 2 is that the resolution steps are even
larger between a scale and the next coarser scale than for the typical “octave-band”
4.7. MULTIDIMENSIONAL WAVELETS 293
wavelet analysis. A finer resolution change could be obtained for rational scale
changes between 1 and 2. In discrete time such finer steps can be achieved with
filter banks having rational sampling rates [166]. The situation is more complicated
in continuous time. In particular, the iterated filter bank method does not lead to
wavelets in the same sense as for the integer-band case. Yet, orthonormal bases
can be constructed which have a similar behavior to wavelets [33]. A direct wavelet
construction with rational dilation factors is possible [16] but the coefficients of the
resulting two-scale equation do not correspond to either FIR or IIR filters.
D Z^n ⊂ Z^n,

|λ_i| > 1,    ∀ i.
The first condition requires D to have integer entries, while the second one states
that all the eigenvalues of D must be strictly greater than 1 in order to ensure
dilation in each dimension. For example, in the quincunx case, the matrix D_Q
from (3.B.2)

D_Q = [1 1; 1 −1],    (4.7.1)

as well as

D_{Q1} = [1 −1; 1 1],

are both valid matrices, while

D_{Q2} = [2 1; 0 1],
is not, since it dilates only one dimension. Matrix DQ from (4.7.1) is a so-called
“symmetry” dilation matrix, used in [163], while DQ1 is termed a “rotation” matrix
used in [57]. As will be seen shortly, although both of these matrices represent
the same lattice, they are fundamentally different when it comes to constructing
wavelets.
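The two conditions above are easy to check numerically. A minimal sketch (the helper name is ours):

```python
import numpy as np

def is_valid_dilation(D):
    # A dilation matrix must have integer entries and all eigenvalues
    # strictly greater than 1 in magnitude.
    integer_entries = np.allclose(D, np.round(D))
    expanding = np.all(np.abs(np.linalg.eigvals(D)) > 1)
    return bool(integer_entries and expanding)

DQ  = np.array([[1,  1], [1, -1]])   # "symmetry" matrix, eigenvalues +-sqrt(2)
DQ1 = np.array([[1, -1], [1,  1]])   # "rotation" matrix, eigenvalues 1 +- j
DQ2 = np.array([[2,  1], [0,  1]])   # eigenvalues 2 and 1: dilates one dimension only

print(is_valid_dilation(DQ), is_valid_dilation(DQ1), is_valid_dilation(DQ2))
# Note also that DQ @ DQ = 2I, so every second iteration of quincunx
# sampling with DQ is separable.
```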
For the case obtained as a tensor product, the dilation matrix is diagonal.
Specifically, in two dimensions, it is the matrix DS from (3.B.1)
2 0
Ds = . (4.7.2)
0 2
The number of wavelets equals

| det(D)| − 1 = N − 1,
where N represents the downsampling rate of the underlying filter bank. Thus, in
the quincunx case, we have one “mother” wavelet, while in the 2 × 2 separable case
(4.7.2), there are three “mother” wavelets ψ1 , ψ2 , ψ3 .
The two-scale equation is obtained as in the one-dimensional case. For example,
using D_Q (we will drop the subscript when there is no risk of confusion),

ϕ(t) = √2 Σ_{n∈Z²} g_0[n] ϕ(Dt − n),

or, written out explicitly,

ϕ(t_1, t_2) = √2 Σ_{n_1,n_2∈Z} g_0[n_1, n_2] ϕ(t_1 + t_2 − n_1, t_1 − t_2 − n_2).

We have assumed that Σ_n g_0[n] = √2.
Note that these regions are not in general rectangular; specifically, in this case,
they are squares in even, and diamonds (tilted squares) in odd, iterations. Note that
one of the advantages of using the matrix D_Q rather than D_{Q1} is that it leads
to separable sampling (diagonal matrix) in every other iteration, since D_Q² = 2I.
The reason why this feature is useful is that one can use certain one-dimensional
results in a separable manner in even iterations. We are again interested in the
limiting behavior of this “graphical” function. Let us first assume that the limit of
ϕ(i) (t1 , t2 ) exists and is in L2 (R2 ) (we will come back later to the conditions under
which it exists). Hence, we define the scaling function as

ϕ(t_1, t_2) = lim_{i→∞} ϕ^{(i)}(t_1, t_2).
Once the scaling function exists, the wavelet can be obtained from the two-dimensional
counterpart of (4.2.14). Again, the coefficients used in the two-scale equation and
296 CHAPTER 4
the quincunx version of (4.2.14) are the impulse response coefficients of the low-
pass and highpass filters, respectively. To prove that the wavelet obtained in such
a fashion actually produces an orthonormal basis for L2 (R2 ), one has to demon-
strate various facts. The proofs of the following statements are analogous to the
one-dimensional case (see Proposition 4.4), that is, they rely on the orthogonality
of the underlying filter banks and the two-scale equation property [163]:
(c) ⟨ϕ(t), ψ(t − k)⟩ = 0, that is, the scaling function is orthogonal to the wavelet and its
integer translates.
PROOF
Following (4.7.3), one can express the equivalent filter after i steps in terms of the equivalent
filter after (i − 1) steps as

g_0^{(i)}[n] = Σ_k g_0[k] g_0^{(i−1)}[n − D^{i−1} k] = Σ_k g_0^{(i−1)}[k] g_0[n − Dk],
and thus

g_0^{(i)}[Dn] = Σ_k g_0[Dk] g_0^{(i−1)}[n − k].

Using (4.7.4), express g_0^{(i−1)} and g_0^{(i)} in terms of ϕ^{(i−1)} and ϕ^{(i)} and then take the limits
(which we are allowed to do by assumption)

ϕ(Dt) = √2 Σ_k g_0[Dk] ϕ(Dt).    (4.7.6)

Doing now the same for g_0^{(i)}[Dn + k_1], one obtains

ϕ(Dt) = √2 Σ_n g_0[Dn + k_1] ϕ(Dt).    (4.7.7)
Now, a single zero at the aliasing frequencies is in general not sufficient to ensure
regularity. Higher-order zeros have led to regular scaling functions and wavelets, but
the precise relationship is a topic of current research.
Figure 4.29 (a) The twin-dragon scaling function. The function is 1 in the
white area and 0 otherwise. (b) The twin-dragon wavelet. The function is 1
in the white area, −1 in the black area, and 0 otherwise.
filter. Thus, the corresponding subband schemes would consist of two-tap filters.
The algorithm, when it converges, can be interpreted as the iteration of a lowpass
filter with only two nonzero taps (each equal to one and being in a different coset)
which converges to the characteristic function of some compact set, just as the
one-dimensional Haar filter converged to the indicator function of the unit interval.
A very interesting scaling function is obtained when using the "rotation" matrix
D_{Q1} and points {(0, 0), (1, 0)}, that is, the lowpass filter with g_0[0, 0] =
g_0[1, 0] = 1, and 0 otherwise. Iterating this filter leads to the "twin-dragon" case
[190], as given in Figure 4.29. Note that ϕ(t) = 1 over the white region and 0
otherwise. The wavelet is 1 and −1 in the white/black regions respectively, and 0
otherwise. Note also how the wavelet is formed by two “scaled” scaling functions, as
required by the two-dimensional counterpart of (4.2.9), and how this fractal shape
tiles the space.
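The set in Figure 4.29 can be sketched numerically using the digit expansion t = Σ_k D^{−k} d_k with digits d_k ∈ {(0, 0), (1, 0)} (a standard way to sample such self-similar tiles; the implementation details are ours):

```python
import numpy as np

# Twin-dragon point cloud: iterate the contractions p -> D^{-1}(p + d)
# with D the "rotation" matrix DQ1 and the two digits of the filter.
D = np.array([[1.0, -1.0], [1.0, 1.0]])
Dinv = np.linalg.inv(D)
digits = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]

pts = np.zeros((1, 2))
for _ in range(12):                  # 2^12 points after 12 digit levels
    pts = np.vstack([(pts + d) @ Dinv.T for d in digits])
# 'pts' approximates the fractal support of the twin-dragon scaling
# function; a scatter plot of it reproduces the shape in Figure 4.29.
```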
Figure 4.30 The sixth iteration of the smallest regular two-dimensional filter.
iteration of this solution. As can be seen from the plot, the function looks contin-
uous, but not differentiable at some points. As a simple check of continuity, the
largest first-order differences of the iterated filter can be computed (in this case,
these differences decrease with an almost constant rate — a good indicator that
the function is continuous [163]). Recently, a method for checking the continuity
was developed [324]. Using this method, it was confirmed that this solution indeed
leads to a continuous scaling function and consequently a continuous wavelet.
This method, however, fails for larger size filters, since imposing a zero of a par-
ticular order means solving a large system of nonlinear equations (in the orthogonal
case). Note, however, that numerical approaches are always possible [162].
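A one-dimensional sketch of the first-difference check described above, using the four-tap Daubechies filter from Chapter 4 (the function names and the particular diagnostics are our choices):

```python
import numpy as np

def iterated_graphical(g0, i):
    # phi^(i)[n] = 2^(i/2) g0^(i)[n], the piecewise-constant approximation
    # of the scaling function on a grid of step 2^-i.
    g = g0.copy()
    for _ in range(i - 1):
        up = np.zeros(2 * len(g) - 1)
        up[::2] = g
        g = np.convolve(g0, up)
    return 2 ** (i / 2) * g

# Daubechies filter for N = 2 (four taps).
g0 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

# Largest first-order differences across iterations: for a continuous
# limit they should shrink at a roughly constant geometric rate.
diffs = [np.max(np.abs(np.diff(iterated_graphical(g0, i)))) for i in range(2, 9)]
ratios = [d2 / d1 for d1, d2 in zip(diffs, diffs[1:])]
print(ratios)   # roughly constant and below 1
```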
(a) Perfect reconstruction is preserved (in order to have a valid subband coding
system).
(b) Zeros at aliasing frequencies are preserved (necessary but not sufficient for
regularity).
We have already discussed how to obtain perfect reconstruction in Section 3.6. Here,
we will concern ourselves only with properties that might be of interest for designing
wavelets. If we used the method of separable polyphase components, an advantage
is that the zeros at aliasing frequencies carry over into multiple dimensions. As we
pointed out in Section 3.6, the disadvantage is that only IIR solutions are possible,
and thus we cannot obtain wavelets with compact support. In the McClellan case,
however, wavelets with compact support are possible, but not orthonormal ones.
For more details on these issues, see [163].
Figure 4.31 Relationship among windows for the local cosine bases. (a) Rect-
angular window. All windows are the same. (b) Smooth window satisfying the
power complementary condition. All windows are the same. (c) General case.
We will start with a simple case which, when refined, will lead to what Meyer
calls "Malvar's wavelets" [193]. Note that, besides this construction, there exist
other orthonormal bases with similar properties [61]. Thus, consider the following
set of basis functions:
ϕ_{j,k}(t) = √(2/L_j) w_j(t) cos[(π/L_j)(k + 1/2)(t − a_j)],    (4.8.1)
and the basis functions are as in (4.8.1) (see Figure 4.31(b)). Note that here, on
top of cosines overlapping, we have to deal with the windowing of the cosines. To
prove orthogonality, again we will have to demonstrate it only for ϕj,k and ϕj+1,m ,
as well as for ϕj,k and ϕj,m .
By using the same change of variable as in (4.8.3), we obtain that

⟨ϕ_{j,k}, ϕ_{j+1,m}⟩ = ±(2/L) ∫_{−L/2}^{L/2} w(t + L/2) w(t − L/2) sin[(π/L)(k + 1/2)t] cos[(π/L)(m + 1/2)t] dt.

For ⟨ϕ_{j,k}, ϕ_{j,m}⟩, in the same centered coordinates, one has to examine

(2/L) ∫_{−L}^{L} w²(t) cos[(π/L)(k + 1/2)(t + L/2)] cos[(π/L)(m + 1/2)(t + L/2)] dt.

Divide the above integral into three parts: from −L to −L/2, from −L/2 to L/2,
and from L/2 to L. Let us concentrate on the last one. With the change of variable
x = L − t, it becomes

(2/L) ∫_{L/2}^{L} w²(t) cos[(π/L)(k + 1/2)(t + L/2)] cos[(π/L)(m + 1/2)(t + L/2)] dt
  = (2/L) ∫_{0}^{L/2} w²(L − x) cos[(π/L)(k + 1/2)((3/2)L − x)] cos[(π/L)(m + 1/2)((3/2)L − x)] dx.

However, since cos[(π/L)(k + 1/2)((3/2)L − x)] = −cos[(π/L)(k + 1/2)(x + L/2)], we
can merge this integral with the second one, from 0 to L/2. Using the same argument
for the one from −L to −L/2, we finally obtain

(2/L) ∫_{−L/2}^{L/2} (w²(t) + w²(L − t)) cos[(π/L)(k + 1/2)(t + L/2)] cos[(π/L)(m + 1/2)(t + L/2)] dt = δ[k − m],

where the term in parentheses equals 1.
We now see why it was important for the window to satisfy the power complemen-
tary condition given in (4.8.4), exactly as in the discrete-time case. Therefore, we
have progressed from a rectangular window to a smooth window.
This last condition ensures that the “tails” of the adjacent windows are power
complementary. An example of such a window is w_j(t) = sin[(π/2)θ((t − a_j + η_j)/(2η_j))]
for |t − a_j| ≤ η_j, and w_j(t) = cos[(π/2)θ((t − a_{j+1} + η_{j+1})/(2η_{j+1}))] for
|t − a_{j+1}| ≤ η_{j+1}. Here, θ(t) is the function we used for constructing the Meyer
wavelet, given in (4.3.1), Section 4.3.1. With these conditions, the set of functions
as in (4.8.1) forms an orthonormal basis for L2 (R). It helps to visualize the above
conditions on the windows as in Figure 4.31(c). Therefore, in this most general
case, the window can go anywhere from length 2L to length L (being a constant
window in this latter case of height 1) and is arbitrary as long as it satisfies the
above three conditions.
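The construction can be checked numerically. The following sketch uses the polynomial ramp θ(x) = 3x² − 2x³ (our choice; any θ with θ(x) + θ(1 − x) = 1 works) and equal windows with a_j = j, L_j = 1:

```python
import numpy as np

def theta(x):
    # Smooth ramp with theta(x) + theta(1 - x) = 1.
    x = np.clip(x, 0.0, 1.0)
    return 3 * x**2 - 2 * x**3

def window(t, aj, aj1, eta):
    # Rises over [aj - eta, aj + eta], equals 1 in between,
    # and falls over [aj1 - eta, aj1 + eta].
    rise = np.sin(0.5 * np.pi * theta((t - aj + eta) / (2 * eta)))
    fall = np.cos(0.5 * np.pi * theta((t - aj1 + eta) / (2 * eta)))
    return rise * fall

L, eta = 1.0, 0.3
t = np.arange(-1.0, 3.0, 1e-4)

def phi(j, k):
    # Basis functions of (4.8.1) with a_j = j and L_j = L.
    w = window(t, j * L, (j + 1) * L, eta)
    return np.sqrt(2 / L) * w * np.cos(np.pi / L * (k + 0.5) * (t - j * L))

inner = lambda f, g: np.sum(f * g) * 1e-4   # simple quadrature

# Tails of adjacent windows are power complementary:
w0 = window(t, 0.0, 1.0, eta)
w1 = window(t, 1.0, 2.0, eta)
overlap = (t > 1.0 - eta) & (t < 1.0 + eta)
print((w0**2 + w1**2)[overlap].min(), (w0**2 + w1**2)[overlap].max())
```

Computing `inner(phi(j, k), phi(j', m))` for a few index pairs reproduces, up to quadrature error, the orthonormality proved in the text (Problem 4.22).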
Let us see what has been achieved. The time-domain functions are local and
smooth and their Fourier transforms have arbitrary polynomial decay (depending
on the smoothness or differentiability of the window). Thus, the time-bandwidth
product is now finite (unlike in the piecewise Fourier series case), and we have a
local modulated basis with good time-frequency localization.
PROOF
As mentioned previously, what follows is a brief outline of the proof; for more details, refer
to [71].
4.A. PROOF OF THEOREM 4.5 305
(f) Finally, the right side of the previous equation can be shown to be

lim_{N→∞} (1/2π) ∫ |Φ(2^{−N}ω)|² |F(ω)|² dω = ‖f‖²,

and

Σ_{j,k} |⟨ψ_{j,k}, f⟩|² = ‖f‖²,
which completes the proof of the theorem.
PROBLEMS
4.1 Consider the wavelet series expansion of continuous-time signals f (t) and assume ψ(t) is
the Haar wavelet.
(a) Give the expansion coefficients for f (t) = 1, t ∈ [0, 1], and 0 otherwise (that is, the
scaling function ϕ(t)).
(b) Verify that Σ_m Σ_n |⟨ψ_{m,n}, f⟩|² = 1 (Parseval's identity for the wavelet series expansion).
(c) Consider f′(t) = f(t − 2^{−i}), where i is a positive integer. Give the range of scales
over which expansion coefficients are different from zero.
(d) Same as above, but now f′(t) = f(t − 1/√2).
4.2 Consider a multiresolution analysis and the two-scale equation for ϕ(t) given in (4.2.8).
Assume that {ϕ(t − n)} is an orthonormal basis for V0 . Prove that
4.3 In a multiresolution analysis with a scaling function ϕ(t) satisfying orthonormality to its
integer shifts, consider the two-scale equation (4.2.8). Assume further 0 < |Φ(0)| < ∞ and
that Φ(ω) is continuous in ω = 0.
(a) Show that Σ_n g_0[n] = √2.

(b) Show that Σ_n g_0[2n] = Σ_n g_0[2n + 1].
4.4 Consider the Meyer wavelet derived in Section 4.3.1 and given by equation (4.3.5). Prove
(4.3.6). Hint: in every interval [2^k π/3, 2^{k+1} π/3] there are only two "tails" present.
(a) Derive the scaling function and wavelet in this case (in Fourier domain).
(b) Discuss the decay in time of the scaling function and wavelet, and compare it to the
case when θ(x) given in (4.3.2) is used.
(c) Plot (numerically) the scaling function and wavelet.
(b) Given that β^{(2N+1)}(t) = β^{(N)}(t) ∗ β^{(N)}(t), prove that

b^{(2N+1)}[n] = ∫_{−∞}^{∞} β^{(N)}(t) β^{(N)}(t − n) dt.
4.7 Battle-Lemarié wavelets: Calculate the Battle-Lemarié wavelet for the quadratic spline case
(see (4.3.26–4.3.27)).
and derive Φ(ω), ϕ(t) (use the fact that 1/R(e^{jω}) is a recursive filter and find the set {α_n})
and G_0(e^{jω}). Indicate also Ψ(ω) in this case.
4.9 Prove that if g(t), the nonorthogonal basis for V0 , has compact support, then D(ω) in (4.3.20)
is a trigonometric polynomial and has a stable (possibly noncausal) spectral factorization.
4.10 Orthogonality relations of Daubechies’ wavelets: Prove Relations (b) and (c) in Proposi-
tion 4.4, namely:
(a) ⟨ψ(t − n), ψ(t − n′)⟩ = δ[n − n′] (where we skipped the scaling factor for simplicity),

(b) ⟨ϕ(t − n), ψ(t − n′)⟩ = 0.
p_k = Π_{i=0}^{k} a^{b^i},    |b| < 1,

p = lim_{k→∞} p_k = a^{1/(1−b)}.
(b) In Section 4.4.1, we derived the Haar scaling function as the limit of a graphical
function, showing that it was equal to the indicator function of the unit interval.
Starting from the Haar lowpass filter G_0(z) = (1 + z^{−1})/√2 and its normalized version
M_0(ω) = G_0(e^{jω})/√2, show that from (4.4.14),

Φ(ω) = Π_{k=1}^{∞} M_0(ω/2^k) = e^{−jω/2} sin(ω/2)/(ω/2).

Show also that

Ψ(ω) = j e^{−jω/2} sin²(ω/4)/(ω/4).
where M0 (ω) is 2π-periodic and satisfies M0 (0) = 1 as well as |M0 (ω)| ≤ 1, ω ∈ [−π, π].
(a) Show that the partial products Φ^{(i)}(ω) converge pointwise to a limit Φ(ω).
(b) Show that if M_0(ω) = (1/√2) G_0(e^{jω}), where G_0(e^{jω}) is the lowpass filter in an orthogo-
nal filter bank, then |M_0(ω)| ≤ 1 is automatically satisfied and M_0(0) = 1 implies
M_0(π) = 0.
4.13 Maximally flat Daubechies’ filters: A proof of the closed form formula for the autocorrelation
of the Daubechies’ filter (4.4.34) can be derived as follows (assume Q = 0). Rewrite (4.4.32)
as
P(y) = (1/(1 − y)^N) [1 − y^N P(1 − y)].
Use Taylor series expansion of the first term and the fact that deg[P (y)] < N (which can
be shown using Euclid’s algorithm) to prove (4.4.34).
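Assuming (4.4.34) is the usual binomial form P(y) = Σ_{k=0}^{N−1} (N−1+k choose k) y^k, the identity behind this problem is easy to verify numerically:

```python
from math import comb

def P(y, N):
    # Closed-form autocorrelation polynomial:
    # P(y) = sum_{k=0}^{N-1} binom(N-1+k, k) y^k.
    return sum(comb(N - 1 + k, k) * y**k for k in range(N))

# Check the defining identity (1-y)^N P(y) + y^N P(1-y) = 1.
N = 4
for y in (0.1, 0.37, 0.9):
    print((1 - y)**N * P(y, N) + y**N * P(1 - y, N))   # 1.0 up to rounding
```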
4.14 Given the Daubechies’ filters in Table 4.2 or 4.3, verify that they satisfy the regularity bound
given in Proposition 4.7. Do they meet higher regularity as well? (you might have to use
alternate factorizations or cascades).
4.15 In an N -channel filter bank, show that at least one zero at all aliasing frequencies 2πk/N ,
k = 1, . . . , N − 1, is necessary for the iterated graphical function to converge. Hint: See the
proof of Proposition 4.6.
4.16 Consider a filter G0 (z) whose impulse response is orthonormal with respect to shifts by N .
Assume G_0(z) has K zeros at each of the aliasing frequencies ω = 2πk/N, k = 1, . . . , N −
1. Consider the iteration of G0 (z) with respect to sampling rate change by N and the
associated graphical function (see (4.6.11–4.6.12)). Prove that the condition given in (4.6.15)
is sufficient to ensure a continuous limit function ϕ(t) = limi→∞ ϕ(i) (t). Hint: The proof is
similar to that of Proposition 4.7.
4.17 Successive interpolation [131]: Given an input signal x[n], we would like to compute an
interpolation by applying upsampling by 2 followed by filtering, and this i times. Assume
that the interpolation filter G(z) is symmetric and has zero phase, or G(z) = g_0 + g_1 z +
g_{−1} z^{−1} + g_2 z² + g_{−2} z^{−2} + · · · .
(a) After one step, we would like y (1) [2n] = x[n], while y (1) [2n + 1] is interpolated. What
conditions does that impose on G(z)?
(b) Show that if condition (a) is fulfilled, then after i iterations, we have y (i) [2i n] = x[n]
while other values are interpolated.
(c) Assume G(z) = (1/2)z + 1 + (1/2)z^{−1}. Given some input signal, sketch the output signal
y (i) [n] for some small i.
(d) Assume we associate a continuous-time function y (i) (t) with y (i) [n]:
What can you say about the limit function y (i) (t) as i goes to infinity and G(z) is as
in example (c)? Is the limit function continuous? differentiable?
(e) Consider G(z) to be the autocorrelation of the Daubechies’ filters for N = 2 . . . 6,
that is, the P (z) given in Table 4.2. Does this satisfy condition (a)? For N =
2 . . . 6, consider the limit function y (i) (t) as i goes to infinity and try to establish the
“regularity” of these limit functions (are they continuous, differentiable, etc.?).
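Parts (a)–(c) can be sketched numerically for the filter of part (c) (the helper name and the test input are our choices):

```python
import numpy as np

def interpolate_once(x, g):
    # Upsample by 2 and filter with the zero-phase kernel g
    # (here G(z) = z/2 + 1 + z^{-1}/2, the linear-interpolation filter).
    up = np.zeros(2 * len(x) - 1)
    up[::2] = x
    return np.convolve(up, g)[1:-1]   # trim to keep zero phase

x = np.array([0.0, 1.0, 0.0])
g = np.array([0.5, 1.0, 0.5])
y = x
for _ in range(3):
    y = interpolate_once(y, g)

print(y[::8])   # recovers x: condition (a) survives every iteration
print(y[:9])    # linear ramp 0, 0.125, ..., 1: the limit is the hat function
```

The limit function here is continuous but not differentiable at the original sample points, which previews the answer to part (d).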
4.18 Recursive subdivision schemes: Assume that a function f (t) satisfies a two-scale equation
f(t) = Σ_n c_n f(2t − n). We can recursively compute f(t) at dyadic rationals with the
following procedure. Start with f (0) (t) = 1, −1/2 ≤ t ≤ 1/2, 0 otherwise. In particular,
f (0) (0) = 1 and f (0) (1) = f (0) (−1) = 0. Then, recursively compute
f^{(i)}(t) = Σ_n c_n f^{(i−1)}(2t − n).
In particular, at step i, one can compute the values f (i) (t) at t = 2−i n, n ∈ Z. This will
successively “refine” f (i) (t) to approach the limit f (t), assuming it exists.
(a) Consider this successive refinement for c0 = 1 and c1 = c−1 = 1/2. What is the limit
f (i) (t) as i → ∞?
(b) A similar refinement scheme can be applied to a discrete-time sequence s[n]. Create
a function g^{(0)}(t) = s[n] at t = n. Then, define

g^{(i)}(n/2^{i−1}) = g^{(i−1)}(n/2^{i−1}),

g^{(i)}((2n + 1)/2^i) = (1/2) g^{(i−1)}(n/2^{i−1}) + (1/2) g^{(i−1)}((n + 1)/2^{i−1}).
To what function g(t) does this converge in the limit of i → ∞? This scheme is
sometimes called bilinear interpolation; explain why.
(c) A more elaborate successive refinement scheme is based on the two-scale equation
f(x) = f(2x) + (9/16)[f(2x + 1) + f(2x − 1)] − (1/16)[f(2x + 3) + f(2x − 3)].
Answer parts (a) and (b) for this scheme. (Note: the limit f (x) has no simple closed
form expression).
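For part (c), a quick numerical sketch shows the refinement is interpolating with cubic precision (this is the classical four-point scheme; the test polynomial below is our choice):

```python
# Midpoint rule implied by the two-scale relation above: the new value at
# an odd dyadic point is 9/16 (nearest two samples) - 1/16 (next two out).
def midpoint(fm1, f0, f1, f2):
    return (9.0 / 16.0) * (f0 + f1) - (1.0 / 16.0) * (fm1 + f2)

# The scheme reproduces cubic polynomials exactly:
p = lambda t: t**3 - 2.0 * t + 1.0
samples = [p(n) for n in range(-1, 3)]    # p(-1), p(0), p(1), p(2)
print(midpoint(*samples), p(0.5))         # both equal p(1/2) = 0.125
```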
4.19 Interpolation filters and functions: A filter with impulse response g[n] is called an interpo-
lation filter with respect to upsampling by 2 if g[2n] = δ[n]. A continuous-time function
f (t) is said to have the interpolation property if f (n) = δ[n]. Examples of such functions
are the sinc and the hat function.
(a) Show that if g[n] is an interpolation filter and the graphical function ϕ(i) (t) associ-
ated with the iterated filter g (i) [n] converges pointwise, then the limit ϕ(t) has the
interpolation property.
(b) Show that if g[n] is a finite-length orthogonal lowpass filter, then the only solution
leading to an interpolation filter is the Haar lowpass filter (or variations thereof).
(c) Show that if ϕ(t) has the interpolation property and satisfies a two-scale equation
ϕ(t) = Σ_n c_n ϕ(2t − n),
4.20 Assume a continuous scaling function ϕ(t) with decay O(1/t^{1+ε}), ε > 0, satisfying the
two-scale equation

ϕ(t) = Σ_n c_n ϕ(2t − n).

Show that Σ_n c_{2n} = Σ_n c_{2n+1} = 1 implies that

f(t) = Σ_n ϕ(t − n) = constant ≠ 0.
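A numerical sketch of this statement for the hat function (for which c_0 = 1 and c_{±1} = 1/2, so both coset sums equal 1):

```python
import numpy as np

# Hat function: continuous, compactly supported, satisfies the stated
# coefficient condition, and its integer shifts sum to a constant.
hat = lambda t: np.maximum(0.0, 1.0 - np.abs(t))

t = np.linspace(0.0, 1.0, 1001)
s = sum(hat(t - n) for n in range(-3, 5))   # shifts covering [0, 1]
print(s.min(), s.max())                     # constant 1 on the interval
```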
4.21 Assume a continuous and differentiable function ϕ(t) satisfying a two-scale equation

ϕ(t) = Σ_n c_n ϕ(2t − n),

where Σ_n c_{2n} = Σ_n c_{2n+1} = 1. Show that ϕ′(t) satisfies a two-scale equation, and show
this graphically in the case of the hat function (which is differentiable almost everywhere).
4.22 Prove the orthogonality relations for the set of basis functions (4.8.1) in the most general
setting, that is, when the windows wj (t) satisfy conditions (a)–(c) given at the end of Section
4.8.
5
312 CHAPTER 5
A similar situation exists in the short-time Fourier transform case (see Sec-
tion 2.6.3). There, the function is represented in terms of shifts and modulates of
a basic window function w(t). As for the wavelet transform, the span of the shift
and frequency parameters leads to a redundant representation, which we denote by
STFT_f(ω, τ), where ω and τ stand for frequency and shift, respectively.
Because of the high redundancy in both CWT_f(a, b) and STFT_f(ω, τ), it is
possible to discretize the transform parameters and still be able to achieve recon-
struction. In the STFT case, a rectangular grid over the (ω, τ ) plane can be used, of
the form (m · ω0 , n · τ0 ), m, n ∈ Z and with ω0 and τ0 sufficiently small (ω0 τ0 < 2π).
In the wavelet transform case, a hyperbolic grid is used instead (with a dyadic
grid as a special case when scales are powers of 2). That is, the (a, b) plane is
discretized into (±a_0^m, n · a_0^m b_0). In this manner, large basis functions (when a_0^m
is large) are shifted in large steps, while small basis functions are shifted in small
steps. In order for the sampling of the (a, b) plane to be sufficiently fine, a0 has to
be chosen sufficiently close to 1, and b0 close to 0.
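The two sampling patterns can be sketched as follows (the helper names are ours; the dyadic case a_0 = 2, b_0 = 1 is the special case mentioned above):

```python
# Sampling grids for the two transforms: rectangular for the STFT,
# hyperbolic (here dyadic: a0 = 2, b0 = 1) for the wavelet transform.
def wavelet_grid(m_range, n_range, a0=2.0, b0=1.0):
    return [(a0**m, n * a0**m * b0) for m in m_range for n in n_range]

def stft_grid(m_range, n_range, w0=0.5, tau0=1.0):
    # uniform spacing in both frequency and shift
    return [(m * w0, n * tau0) for m in m_range for n in n_range]

grid = wavelet_grid(range(0, 4), range(-2, 3))
# Shift steps grow with the scale: large basis functions move in large
# steps, small ones in small steps.
steps = {a: sorted({b for (aa, b) in grid if aa == a})
         for a in {2.0**m for m in range(4)}}
print(steps[8.0])   # [-16.0, -8.0, 0.0, 8.0, 16.0]
```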
These discretized versions of the continuous transforms are examples of frames,
which can be seen as overcomplete series expansions (a brief review of frames is
given in Section 5.3.2). Reconstruction formulas are possible, but depend on the
sampling density. In general, they require different synthesis functions than analysis
functions, except in a special case, called a tight frame. Then, the frame behaves
just as an orthonormal basis, except that the set of functions used to expand the
signal is redundant and thus the functions are not independent.
An interesting question is the following: Can one discretize the parameters in the
discussed continuous transforms such that the corresponding set of functions is an
orthonormal basis? From Chapter 4, we know that this can be done for the wavelet
case, with a0 = 2, b0 = 1, and an appropriate wavelet (which is a constrained
function). For the STFT, the answer is less obvious and will be investigated in
this chapter. However, as a rule, we can already hint at the fact that when the
sampling is highly redundant (or, the set of functions is highly overcomplete), we
have great freedom in choosing the prototype function. At the other extreme,
when the sampling becomes critical, that is, little or no redundancy exists between
various functions used in the expansion, then possible prototype functions become
very constrained.
Historically, the first instance of a signal representation based on a localized
Fourier transform is the Gabor transform [102], where complex sinusoids are win-
dowed with a Gaussian window. It is also called a short-time Fourier transform and
has been used extensively in speech processing [8, 226]. A continuous wavelet trans-
form was first proposed by Morlet [119, 125], using a modulated Gaussian as the
5.1. CONTINUOUS WAVELET TRANSFORM 313
wavelet (called the Morlet wavelet). Morlet also proposed the inversion formula.1
The discretization of the continuous transforms is related to the theory of frames,
which has been studied in nonharmonic Fourier analysis [89]. Frames of wavelets
and short-time Fourier transforms have been studied by Daubechies [72] and an ex-
cellent treatment can be found in her book [73] as well, to which we refer for more
details. A text that discusses both the continuous wavelet and short-time Fourier
transforms is [108]. Several papers discuss these topics as well [10, 60, 99, 293].
Further discussions and possible applications of the continuous wavelet trans-
form can be found in the work of Mallat and coworkers [182, 183, 184] for singularity
detection, and in [36, 78, 253, 266] for multiscale signal analysis. Representations
involving both scale and modulation are discussed in [185, 291]. Additional material
can also be found in edited volumes on wavelets [51, 65, 251].
The outline of the chapter is as follows: The case of continuous transform
variables is discussed in the first two sections. In Section 5.1 various properties
of the continuous wavelet transform are derived. In particular, the “zooming”
property, which allows one to characterize signals locally, is described. Comparisons
are made with the STFT, which is presented in Section 5.2. Frames of wavelets and
of the STFT are treated in Section 5.3. Tight frames are discussed, as well as the
interplay of redundancy and freedom in the choice of the prototype basis function.
where a, b ∈ R (a ≠ 0), and the normalization ensures that ‖ψ_{a,b}(t)‖ = ‖ψ(t)‖ (for
now, we assume that a can be both positive and negative). In the following, we
will assume that the wavelet satisfies the admissibility condition
C_ψ = ∫_{−∞}^{∞} |Ψ(ω)|² / |ω| dω < ∞,    (5.1.2)
where Ψ(ω) is the Fourier transform of ψ(t). In practice, Ψ(ω) will always have
sufficient decay so that the admissibility condition reduces to the requirement that
Ψ(0) = ∫ ψ(t) dt = 0.

¹ Morlet proposed the inversion formula based on intuition and numerical evidence. The story
goes that when he showed it to a mathematician for verification, he was told: "This formula, being
so simple, would be known if it were correct..."
Because the Fourier transform is zero at the origin and the spectrum decays at high
frequencies, the wavelet has a bandpass behavior. We now normalize the wavelet
so that it has unit energy, or
‖ψ(t)‖² = ∫_{−∞}^{∞} |ψ(t)|² dt = (1/2π) ∫_{−∞}^{∞} |Ψ(ω)|² dω = 1.
As a result, ‖ψ_{a,b}(t)‖² = ‖ψ(t)‖² = 1 (see (5.1.1)). The continuous wavelet transform
of a function f(t) ∈ L²(R) is then defined as

CWT_f(a, b) = ∫_{−∞}^{∞} ψ*_{a,b}(t) f(t) dt = ⟨ψ_{a,b}(t), f(t)⟩.    (5.1.3)
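A numerical sketch of (5.1.3) by simple quadrature; the Haar wavelet and the Gaussian test signal are our choices, and the final check anticipates the shift property derived in Section 5.1.2:

```python
import numpy as np

t = np.arange(-10.0, 10.0, 0.01)
dt = 0.01

def psi(u):
    # Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), zero mean (admissible).
    return (np.where((u >= 0) & (u < 0.5), 1.0, 0.0)
            - np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0))

def cwt(f_vals, a, b):
    # CWT_f(a, b) = (1/sqrt(a)) integral psi((t - b)/a) f(t) dt
    return np.sum(psi((t - b) / a) / np.sqrt(a) * f_vals) * dt

f = np.exp(-t**2)                  # test signal
f_shift = np.exp(-(t - 0.6)**2)    # the same signal delayed by b' = 0.6

# Shift property: CWT_{f'}(a, b) = CWT_f(a, b - b').
print(cwt(f_shift, 2.0, 1.0), cwt(f, 2.0, 0.4))
```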
The function f (t) can be recovered from its transform by the following reconstruc-
tion formula, also called resolution of the identity:
PROPOSITION 5.1
Given the continuous wavelet transform CWT_f(a, b) of a function f(t) ∈
L²(R) (see (5.1.3)), the function can be recovered by:
f(t) = (1/C_ψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} CWT_f(a, b) ψ_{a,b}(t) (da db)/a²,    (5.1.4)
where reconstruction is in the L2 sense (that is, the L2 norm of the recon-
struction error is zero). This states that any f (t) from L2 (R) can be written
as a superposition of shifted and dilated wavelets.
PROOF
In order to simplify the proof, we will assume that ψ(t) ∈ L¹, f(t) ∈ L¹ ∩ L², as well as
F(ω) ∈ L¹ (or f(t) is continuous) [108]. First, let us rewrite CWT_f(a, b) in terms of the
Fourier transforms of the wavelet and signal. Note that the Fourier transform of ψ_{a,b}(t) is
Ψ_{a,b}(ω) = √a e^{−jbω} Ψ(aω).
According to Parseval’s formula (2.4.11) given in Section 2.4.2, we get from (5.1.3)
CWT_f(a, b) = ∫_{−∞}^{∞} ψ*_{a,b}(t) f(t) dt = (1/2π) ∫_{−∞}^{∞} Ψ*_{a,b}(ω) F(ω) dω
            = (√a/2π) ∫_{−∞}^{∞} Ψ*(aω) F(ω) e^{jbω} dω.    (5.1.5)
Note that the last integral is proportional to the inverse Fourier transform of Ψ∗ (aω)F (ω)
as a function of b. Let us now compute the integral over b in (5.1.4), which we call J(a),
J(a) = ∫_{−∞}^{∞} CWT_f(a, b) ψ_{a,b}(t) db.
The second integral in the above equation equals (with the substitution b′ = (t − b)/a)

∫_{−∞}^{∞} ψ_{a,b}(t) e^{jbω} db = (1/√a) ∫_{−∞}^{∞} ψ((t − b)/a) e^{jbω} db
    = √a e^{jωt} ∫_{−∞}^{∞} ψ(b′) e^{−jωab′} db′ = √a e^{jωt} Ψ(aω).    (5.1.7)
Because of the restrictions we imposed on f (t) and ψ(t), we can change the order of inte-
gration. We evaluate (using the change of variable a′ = aω)

∫_{−∞}^{∞} |Ψ(aω)|²/|a| da = ∫_{−∞}^{∞} |Ψ(a′)|²/|a′| da′ = C_ψ,    (5.1.9)
that is, this integral is independent of ω, which is the key property that makes it all work.
It follows that (5.1.8) becomes (this is actually the right side of (5.1.4) multiplied by Cψ )
(1/2π) ∫_{−∞}^{∞} F(ω) e^{jωt} C_ψ dω = C_ψ · f(t),
and thus, the inversion formula (5.1.4) is verified almost everywhere. It also becomes clear
why the admissibility condition (5.1.2) is required (see (5.1.9)).
If we relax the conditions on f (t) and ψ(t), and require only that they belong to
L2 (R), then the inversion formula still holds but the proof requires some finer arguments
[73, 108].
For example, (5.1.10) is satisfied if the wavelet is real and admissible in the usual
sense given by (5.1.2).
A generalization of the analysis/synthesis formulas involves two different wave-
lets; ψ1 (t) for analysis and ψ2 (t) for synthesis, respectively. If the two wavelets
satisfy
∫_{−∞}^{∞} (|Ψ_1(ω)| |Ψ_2(ω)| / |ω|) dω < ∞,
5.1.2 Properties
The continuous wavelet transform possesses a number of properties which we will
derive. Some are closely related to Fourier transform properties (for example, en-
ergy conservation) while others are specific to the CWT (such as the reproducing
kernel). Some of these properties are discussed in [124]. In the proofs we will
assume that ψ(t) is real.
Linearity The linearity of the CWT follows immediately from the linearity of the
inner product.
Figure 5.1 Shift property of the continuous wavelet transform. A shift of the
function leads to a shift of its wavelet transform. The shading in the (a, b)
plane indicates the region of influence.
Shift Property If f(t) has a continuous wavelet transform given by CWT_f(a, b),
then f′(t) = f(t − b′) leads to the following transform:

CWT_{f′}(a, b) = CWT_f(a, b − b′).

This follows since

CWT_{f′}(a, b) = (1/√|a|) ∫_{−∞}^{∞} ψ((t − b)/a) f(t − b′) dt
             = (1/√|a|) ∫_{−∞}^{∞} ψ((t′ + b′ − b)/a) f(t′) dt′ = CWT_f(a, b − b′).
This shift invariance of the continuous transform is to be contrasted with the shift
variance of the discrete-time wavelet series seen in Chapter 4. Figure 5.1 shows the
shift property pictorially.
Figure 5.2 Scaling property of the continuous wavelet transform. (a) An ele-
mentary square with upper left corner (a_0, b_0) and width ε in the CWT of f.
(b) The corresponding square with corner (a_0/s, b_0/s) and width ε/s in the
CWT of the scaled function.
The scaling property is shown in Figure 5.2(a). We chose f′(t) such that it has
the same energy as f(t). Note that an elementary square in the CWT of f, with
the upper left corner (a_0, b_0) and width ε, corresponds to an elementary square
in the CWT of f′ with the corner point (a_0/s, b_0/s) and width ε/s, as shown in
Figure 5.2(b). That is, assuming a scaling factor greater than 1, energy contained
in a given region of the CWT of f is spread by a factor of s in both dimensions in
the CWT of f′. Therefore, we have an intuitive explanation for the measure
(da db)/a² used in the reconstruction formula (5.1.4), which weights elementary
squares so that they contribute equal energy.
PROOF
From (5.1.5) we can write (all integrals run from −∞ to ∞)

∫∫ |CWT_f(a, b)|² (da db)/a² = ∫∫ |(√a/2π) ∫ Ψ*(aω) F(ω) e^{jbω} dω|² db (da/a²).

Calling now P(ω) = Ψ*(aω) F(ω), we obtain that the above integral equals

∫∫ |CWT_f(a, b)|² (da db)/a² = ∫ [ ∫ |(1/2π) ∫ P(ω) e^{jbω} dω|² db ] da/|a|
    = ∫ [ ∫ |p(b)|² db ] da/|a|
    = ∫ [ (1/2π) ∫ |P(ω)|² dω ] da/|a|,    (5.1.13)

where we have again used Parseval's formula (2.4.12). Thus, (5.1.13) becomes

(1/2π) ∫∫ |Ψ*(aω)|² |F(ω)|² dω da/|a| = (1/2π) ∫ |F(ω)|² [ ∫ |Ψ(aω)|²/|a| da ] dω.    (5.1.14)

The second integral is equal to C_ψ (see (5.1.9)). Applying Parseval's formula again, (5.1.14),
and consequently (5.1.13), become

(1/C_ψ) ∫∫ |CWT_f(a, b)|² (da db)/a² = (1/C_ψ) · C_ψ · (1/2π) ∫ |F(ω)|² dω = ∫ |f(t)|² dt.
Again, the importance of the admissibility condition (5.1.2) is evident. Also, the
measure (da db)/a2 used in the transform domain is consistent with our discussion
of the scaling property. Scaling by s while conserving the energy will spread the
wavelet transform by s in both the dimensions a and b, and thus a renormalization
by 1/a2 is necessary.
A generalization of this energy conservation formula involves the inner product
of two functions in time and in wavelet domains. Then, (5.1.12) becomes [73]
∫_{−∞}^{∞} f*(t) g(t) dt = (1/C_ψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} CWT*_f(a, b) CWT_g(a, b) (da db)/a²,    (5.1.15)
that is, the usual inner product of the time-domain functions equals, up to a mul-
tiplicative constant, the inner product of their wavelet transform, but with the
measure (da db)/a2 .
Time Localization Consider a Dirac pulse at time t_0, δ(t − t_0), and a wavelet ψ(t).
The continuous wavelet transform of the Dirac is

CWT_δ(a, b) = (1/√a) ∫ ψ((t − b)/a) δ(t − t_0) dt = (1/√a) ψ((t_0 − b)/a).
For a given scale factor a0 , that is, a horizontal line in the wavelet domain, the
transform is equal to the scaled (and normalized) wavelet reversed in time and
centered at the location of the Dirac. Figure 5.3(a) shows this localization for the
Figure 5.3 Time localization property, shown for the case of a zero-phase Haar
wavelet. (a) Behavior for f(t) = δ(t − t_0). The cone of influence has a width of
a_0/2 on each side of t_0 and the height is a_0^{−1/2}. (b) Behavior for f(t) = u(t − t_0),
that is, the unit-step function. The cone of influence is as in part (a), but the
height is −(1/2) a_0^{1/2}.
compactly supported Haar wavelet (with zero phase). It is clear that the transform "zooms in" on the Dirac, with very good localization at very small scales. Figure 5.3(b) shows the case of a step function, which has a similar
localization but a different magnitude behavior. Another example is given in Fig-
ure 5.4 where the transform of a simple synthetic signal with different singularities
is shown.
Frequency Localization For the sake of discussion, we will consider the sinc wavelet, that is, a perfect bandpass filter. Its magnitude spectrum is 1 for |ω| between π and 2π. Consider a complex sinusoid of unit magnitude and at frequency ω0. The highest-frequency wavelet that will pass the sinusoid through has a scale factor a_min = π/ω0 (and a gain of √(π/ω0)), while the lowest-frequency wavelet passing the sinusoid is for a_max = 2π/ω0 (and a gain of √(2π/ω0)). Figure 5.5(a) shows the various octave-band filters, and Figure 5.5(b) shows the continuous wavelet transform of a sinusoid using a sinc wavelet.
The frequency resolution using an octave-band filter is limited, especially at
high frequencies. An improvement is obtained by going to narrower bandpass filters
(third of an octave, for example).
Figure 5.4 Continuous wavelet transform of a simple signal using the Haar wavelet. (a) Signal containing four singularities. (b) Continuous wavelet transform, with small scales toward the front. Note the different behavior at the different singularities and the good time localization at small scales.
[Figure 5.5: (a) the scaled sinc-wavelet spectra √a_min Ψ(a_min ω) and √a_max Ψ(a_max ω), together with the sinusoid δ(ω − ω0); (b) the continuous wavelet transform of the sinusoid in the (b, a) plane, supported between the scales a = π/ω0 and a = 2π/ω0.]
transform is able to indicate local regularity within a window, but not more locally.
The wavelet transform, because of the zooming property, will isolate the disconti-
nuity from the rest of the function and the behavior of the wavelet transform in the
neighborhood of the discontinuity will characterize it.
Consider the wavelet transform of a Dirac impulse in Figure 5.3(a) and of a
step function in Figure 5.3(b). In the former case, the absolute value of the wavelet
transform behaves as |a|−1/2 when approaching the Dirac. In the latter case, it is
easy to verify that the wavelet transform, using a Haar wavelet (with zero phase), is equal to a hat function (a triangle) of height −(1/2)·a0^{1/2} and width from t0 − a0/2 to t0 + a0/2. Along the line a = a0, the CWT in Figure 5.3(a) is simply the derivative of the CWT in Figure 5.3(b). This follows from the fact that the CWT can be written
as a convolution of the signal with a scaled and time-reversed wavelet. From the
differentiation property of the convolution and from the fact that the Dirac is the
derivative of the step function (in the sense of distributions), the result follows. In
Figure 5.4, we saw the different behavior of the continuous wavelet transform for
different singularities, as scale becomes small. A more thorough discussion of the
characterization of local regularity can be found in [73, 183] (see also Problem 5.1).
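The triangle height claimed above can be checked numerically. The following sketch (not from the book; the integration grid and scale values are arbitrary choices) computes the Haar CWT of a unit step at t0 = 0 and compares the peak magnitude at b = t0 with (1/2)·a^{1/2}:

```python
import numpy as np

def haar(t):
    # zero-phase Haar wavelet: +1 on [-1/2, 0), -1 on [0, 1/2)
    return np.where((t >= -0.5) & (t < 0), 1.0,
                    np.where((t >= 0) & (t < 0.5), -1.0, 0.0))

def cwt(f, t, a, b):
    # CWT_f(a, b) = (1/sqrt(a)) * integral of f(t) psi((t - b)/a) dt
    dt = t[1] - t[0]
    return np.sum(f * haar((t - b) / a)) * dt / np.sqrt(a)

t = np.linspace(-4, 4, 80001)
step = (t >= 0).astype(float)          # u(t - t0) with t0 = 0

peaks = {a: abs(cwt(step, t, a, 0.0)) for a in (0.5, 1.0, 2.0)}
for a, peak in peaks.items():
    # height of the hat function at b = t0 is (1/2) a^{1/2}
    assert abs(peak - 0.5 * np.sqrt(a)) < 1e-3
```

The peak magnitude grows as √a for the step, while the same computation with a Dirac would decay as a^{−1/2}, which is exactly the different scaling behavior of the two singularities.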
PROPOSITION 5.3
If a function F (a, b) belongs to H, that is, it is the wavelet transform of a
function f (t), then F (a, b) satisfies
F(a0, b0) = (1/Cψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(a0, b0, a, b) F(a, b) (da db)/a²,   (5.1.16)

where

K(a0, b0, a, b) = ⟨ψ_{a0,b0}, ψ_{a,b}⟩,
is the reproducing kernel.
PROOF
To prove (5.1.16), note that K(a0 , b0 , a, b) is the complex conjugate of the wavelet transform
of ψa0 ,b0 at (a, b),
K(a0, b0, a, b) = CWT*_{ψ_{a0,b0}}(a, b),   (5.1.17)
since ⟨ψ_{a0,b0}, ψ_{a,b}⟩ = ⟨ψ_{a,b}, ψ_{a0,b0}⟩*. Since F(a, b) = CWT_f(a, b) by assumption and using
(5.1.17), the right side of (5.1.16) can be written as
(1/Cψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(a0, b0, a, b) F(a, b) (da db)/a²

= (1/Cψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} CWT*_{ψ_{a0,b0}}(a, b) · CWT_f(a, b) (da db)/a²

= ⟨ψ_{a0,b0}, f⟩ = CWT_f(a0, b0) = F(a0, b0),

where the second equality uses (5.1.15).
[Figure 5.6: the reproducing kernel of the Haar wavelet, plotted as a function of scale and shift.]

Figure 5.7 Morlet wavelet. (a) Time domain (real and imaginary parts are the continuous and dotted graphs, respectively). (b) Magnitude spectrum.
An example of a reproducing kernel, that is, the wavelet transform of itself (the
wavelet is real), is shown in Figure 5.6 for the Haar wavelet. Note that because of
the orthogonality of the wavelet with respect to the dyadic grid, the reproducing
kernel is zero at the dyadic grid points.
ψ(t) = (1/√(2π)) e^{jω0 t} e^{−t²/2},   (5.1.18)

Ψ(ω) = e^{−(ω−ω0)²/2}.
The factor 1/√(2π) in (5.1.18) ensures that ‖ψ(t)‖ = 1. The center frequency ω0 is
usually chosen such that the second maximum of Re{ψ(t)}, t > 0, is half the first
one (at t = 0). This leads to
ω0 = π √(2/ln 2) ≈ 5.336.
It should be noted that this wavelet is not admissible since Ψ(ω)|_{ω=0} ≠ 0, but its
value at zero frequency is negligible (∼ 7·10−7 ), so it does not present any problem in
practice. The Morlet wavelet can be corrected so that Ψ(0) = 0, but the correction
term is very small. Figure 5.7 shows the Morlet wavelet in time and frequency.
The latter graph shows that the Morlet wavelet is roughly an octave-band filter.
Displays of signal analyses using the continuous-time wavelet transform are often
called scalograms, in contrast to spectrograms which are based on the short-time
Fourier transform.
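Both numbers quoted above are easy to reproduce; a quick numerical check (a sketch, not from the book):

```python
import numpy as np

# Center frequency chosen so the second maximum of Re{psi(t)} is half the first
omega0 = np.pi * np.sqrt(2.0 / np.log(2.0))

# Value of the Morlet spectrum at omega = 0: the admissibility "defect"
Psi_at_zero = np.exp(-omega0**2 / 2)

print(round(omega0, 3))   # 5.336
print(Psi_at_zero)        # about 7e-7, negligible in practice
```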
5.2.1 Properties
In the short-time Fourier transform (STFT) case, the functions used in the expan-
sion are obtained by shifts and modulates of a basic window function w(t)
PROOF
First, using Parseval’s formula, let us write the STFT in Fourier domain as
STFT_f(Ω, τ) = ∫_{−∞}^{∞} g*_{Ω,τ}(t) f(t) dt = (1/2π) ∫_{−∞}^{∞} G*_{Ω,τ}(ω) F(ω) dω,   (5.2.3)
where
G_{Ω,τ}(ω) = e^{−j(ω−Ω)τ} W(ω − Ω)   (5.2.4)
and W (ω) is the Fourier transform of w(t). Using (5.2.4) in (5.2.3), we obtain
STFT_f(Ω, τ) = (1/2π) e^{−jΩτ} ∫_{−∞}^{∞} W*(ω − Ω) F(ω) e^{jωτ} dω,   (5.2.5)
where we used Parseval’s relation. Interchanging the order of integration (it can be shown
that W ∗ (ω − Ω)F (ω) is in L2 (R)), (5.2.5) becomes
(1/2π) ∫_{−∞}^{∞} |F(ω)|² [(1/2π) ∫_{−∞}^{∞} |W(ω − Ω)|² dΩ] dω = (1/2π) ∫_{−∞}^{∞} |F(ω)|² dω = ‖f(t)‖²,

since ∫ |W(ω − Ω)|² dΩ = 2π‖w‖² = 2π for a unit-norm window.
5.2.2 Examples
Since the STFT is a local Fourier transform, any classic window that is used in
Fourier analysis of signals is a suitable window function. A rectangular window
will have poor frequency localization, so smoother windows are preferred. For
example, a triangular window has a spectrum decaying in 1/ω 2 and is already a
better choice. Smoother windows have been designed for data analysis, such as the
Hanning window [211]:
w(t) = [1 + cos(2πt/T)]/2 for t ∈ [−T/2, T/2], and w(t) = 0 otherwise.
where α controls the width, or spread, in time and β is a normalization factor. Its
Fourier transform W (ω) is given by
W(ω) = β √(π/α) e^{−ω²/(4α)}.
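This transform pair is easy to verify by comparing the closed form with a direct numerical integration of the Fourier integral (a sketch, not from the book; the values of α and β are arbitrary test choices):

```python
import numpy as np

alpha, beta = 0.8, 1.3
t = np.linspace(-20.0, 20.0, 400001)
dt = t[1] - t[0]
w = beta * np.exp(-alpha * t**2)      # Gaussian window

errs = []
for omega in (0.0, 1.0, 2.5):
    # W(omega) by direct numerical integration of the Fourier integral
    W_num = np.sum(w * np.exp(-1j * omega * t)) * dt
    # closed form: beta * sqrt(pi/alpha) * exp(-omega^2 / (4 alpha))
    W_closed = beta * np.sqrt(np.pi / alpha) * np.exp(-omega**2 / (4 * alpha))
    errs.append(abs(W_num - W_closed))

assert max(errs) < 1e-6
```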
Modulates of a Gaussian window (see (5.2.1)) are often called Gabor functions. An
attractive feature of the Gaussian window is that it achieves the best joint time
and frequency localization since it meets the lower bound set by the uncertainty
principle (see Section 2.6.2).
It is interesting to see that Gabor functions and the Morlet wavelet (see (5.1.18)) are related, since they are both modulated Gaussian windows. That is, given a certain α in (5.2.6) and a certain ω0 in (5.1.18), we have that ψ_{a,0}(t), using the Morlet wavelet, is (we assume zero time shift for simplicity)

ψ_{a,0}(t) = (1/√(2πa)) e^{jω0 t/a} e^{−t²/(2a²)},
while gω,0 (t), using the Gabor window, is
g_{ω,0}(t) = β e^{jωt} e^{−αt²},
that is, they are equal if a = 1/√(2α) and ω = ω0 √(2α). Therefore, there is a frequency and a scale at which the Gabor and wavelet transforms coincide. At others,
the analysis is different since the wavelet transform uses variable-size windows, as
opposed to the fixed-size window of the local Fourier analysis.
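The correspondence between the two modulated Gaussians can be verified numerically; in the sketch below (not from the book), α is an arbitrary test value and β is chosen to match the wavelet's amplitude:

```python
import numpy as np

omega0 = np.pi * np.sqrt(2.0 / np.log(2.0))  # Morlet center frequency
alpha = 0.7                                   # arbitrary Gaussian width parameter
a = 1.0 / np.sqrt(2.0 * alpha)                # matching scale
omega = omega0 * np.sqrt(2.0 * alpha)         # matching modulation frequency
beta = 1.0 / np.sqrt(2.0 * np.pi * a)         # amplitude matched to the wavelet

t = np.linspace(-5.0, 5.0, 2001)
psi_a0 = np.exp(1j * omega0 * t / a) * np.exp(-t**2 / (2 * a**2)) / np.sqrt(2 * np.pi * a)
g_om0 = beta * np.exp(1j * omega * t) * np.exp(-alpha * t**2)

match = np.allclose(psi_a0, g_om0)
assert match
```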
This points to a key design question in the STFT, namely the choice of the
window size. Once the window size is chosen, all frequencies will be analyzed
with the same time and frequency resolutions, unlike what happens in the wavelet
transform. In particular, events cannot be resolved if they appear close to each
other (within the window spread).
As far as regularity of functions is concerned, one can use Fourier techniques
which will indicate regularity estimates within a window. However, it will not be
possible to distinguish different behaviors within a window spread. An alternative
is to use STFT’s with multiple window sizes (see [291] for such a generalized STFT).
short-time Fourier transform bases will not be achievable with basis functions being
well localized in time and frequency). On the other hand, wavelet frames are less
restricted and this is one of the reasons behind the excitement that wavelets have
generated over the past few years.
A fair amount of the material in this section follows Daubechies’s book [73]. For
more details and a more rigorous mathematical presentation, the reader is referred
to [73], as well as to [26, 72] for more advanced material.
Δt(ψ_{a0^m,0}(t)) = a0^m Δt(ψ(t)).

Then, it is obvious that for ψ_{a,b}(t) to "cover" the whole axis at a scale a = a0^m, the shift has to be b = n b0 a0^m. Therefore, we choose the following discretization:

a = a0^m,  b = n b0 a0^m,  m, n ∈ Z,

with a0 > 1, b0 > 0.
[Figure: sampling grids of the discretized wavelet transform in the (shift n, scale m) plane, shown for scales m = −2, …, 2 in (a) and m = 0, 1, 2 in (b); the shift spacing grows with the scale a0^m.]
It is also intuitively clear that when a0 is close to one, and b0 is close to zero,
reconstruction should be possible by using the resolution of the identity (since the
double sum will become a close approximation to the double integral used in the
resolution of the identity). Also, as we said earlier, we know that for some choices of
a0 and b0 (such as the dyadic case and orthonormal bases in general), reconstruction
is possible as well. What we want to explore are the cases in between.
Let us now see what is necessary in order to have a stable reconstruction. Intuitively, the operator that maps a function f(t) into coefficients ⟨ψ_{m,n}, f⟩ has to be bounded. That is, if f(t) ∈ L²(R), then Σ_{m,n} |⟨ψ_{m,n}, f⟩|² has to be finite. Also, no f(t) with ‖f‖ > 0 should be mapped to 0. These two conditions lead to frame
bounds which guarantee stable reconstruction. Consider the first condition. For
any wavelet with some decay in time and frequency, having zero mean, and any
choice for a0 > 1, b0 > 0, it can be shown that
Σ_{m,n} |⟨ψ_{m,n}, f⟩|² ≤ B ‖f‖²   (5.3.1)
(this just states that the sequence (⟨ψ_{m,n}, f⟩)_{m,n} is in l²(Z²), that is, the sequence is square-summable [73]). On the other hand, the requirement for stable reconstruction means that if Σ_{m,n} |⟨ψ_{m,n}, f⟩|² is small, ‖f‖² should be small as well (that is, Σ_{m,n} |⟨ψ_{m,n}, f⟩|² should be "close" to ‖f‖²). This further means that there should exist α < ∞ such that Σ_{m,n} |⟨ψ_{m,n}, f⟩|² < 1 implies ‖f‖² ≤ α. Take now an arbitrary f and define f̃ = (Σ_{m,n} |⟨ψ_{m,n}, f⟩|²)^{−1/2} f. Then it is obvious that Σ_{m,n} |⟨ψ_{m,n}, f̃⟩|² ≤ 1 and consequently ‖f̃‖² ≤ α. This is equivalent to

A ‖f‖² ≤ Σ_{m,n} |⟨ψ_{m,n}, f⟩|²,   (5.3.2)

with A = 1/α; that is, (5.3.2) is equivalent to the stability requirement. Putting (5.3.1) and (5.3.2)
together tells us that a numerically stable reconstruction of f from its transform
(wavelet) coefficients is possible only if
A ‖f‖² ≤ Σ_{m,n} |⟨ψ_{m,n}, f⟩|² ≤ B ‖f‖².
If this condition is satisfied, then the family (ψm,n )m,n∈Z constitutes a frame. When
A = B = 1 and ‖ψ_{m,n}‖ = 1 for all m, n, the family of wavelets is an orthonormal basis (what we will call a tight frame with a frame bound equal to 1). These notions
will be defined in Section 5.3.2.
Until now, we have seen how the continuous-time wavelet transform can be
discretized and what the conditions on that discretized version are so that a nu-
merically stable reconstruction from (⟨ψ_{m,n}, f⟩)_{m,n} is possible. What about the
short-time Fourier transform? As we have seen in Section 5.2, the basis functions
are given by (5.2.1). As before, we would like to be able to reconstruct the function
from the samples taken on a discrete grid. In the same manner as for the wavelet
transform, it is possible to discretize the short-time Fourier transform as follows:
In g_{ω,τ}(t) = e^{jωt} w(t − τ), choose ω = mω0 and τ = nt0, with ω0, t0 > 0 fixed and m, n ∈ Z, so that

g_{m,n}(t) = e^{jmω0 t} w(t − nt0).   (5.3.3)
Again, we would like to know whether it is possible to reconstruct a given function
f from its transform coefficients (⟨g_{m,n}, f⟩)_{m,n} in a numerically stable way and
again, the answer is positive provided that gm,n constitute a frame. Then, the
reconstruction formula becomes
Σ_{m,n} ⟨g_{m,n}, f⟩ g̃_{m,n} = f = Σ_{m,n} ⟨g̃_{m,n}, f⟩ g_{m,n},
Since for a tight frame Σ_{j∈J} |⟨γj, f⟩|² = A ‖f‖², or Σ_{j∈J} ⟨f, γj⟩⟨γj, g⟩ = A ⟨f, g⟩, we can say that (at least in the weak sense [73])

f = (1/A) Σ_{j∈J} ⟨γj, f⟩ γj.   (5.3.5)
This gives us an easy way to recover f from its transform coefficients ⟨γj, f⟩ if the
frame is tight. Note that (5.3.5) with A = 1 gives the usual reconstruction formula
for an orthonormal basis.
A frame, however, (even a tight frame) is not an orthonormal basis; it is a set
of nonindependent vectors, as is shown in the following examples.
Example 5.1
Consider R² and the redundant set of vectors φ0 = [1, 0]^T, φ1 = [−1/2, √3/2]^T and φ2 = [−1/2, −√3/2]^T (this overcomplete set was briefly discussed in Example 1.1). With M the 2 × 3 matrix having the φi's as its columns, one can verify that

M M^T = (3/2) I,

and therefore

x = (2/3) Σ_{i=0}^{2} ⟨φi, x⟩ φi.   (5.3.6)

Note that ‖φi‖ = 1, and thus 3/2 is the redundancy factor. Also, in (5.3.6), the dual set is identical to the vectors of the expansion. However, this set is not unique, because the φi's are linearly dependent. Since Σ_{i=0}^{2} φi = 0 (and thus Σ_{i=0}^{2} ⟨φi, x⟩ = 0), we can choose

φ̃i = φi + [α, β]^T.

The particular choice of α = β = 0 leads to φ̃i = φi.⁴ See Problem 5.5 for a more general version of this example.
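A small numerical illustration of this example (a sketch, not from the book), checking the tight-frame reconstruction and the non-uniqueness of the dual set:

```python
import numpy as np

phis = [np.array([1.0, 0.0]),
        np.array([-0.5,  np.sqrt(3) / 2]),
        np.array([-0.5, -np.sqrt(3) / 2])]

M = np.column_stack(phis)                       # 2x3 matrix with the phi_i as columns
tight = np.allclose(M @ M.T, 1.5 * np.eye(2))   # M M^T = (3/2) I: tight frame, redundancy 3/2

x = np.array([0.3, -1.2])
recon = (2.0 / 3.0) * sum(np.dot(p, x) * p for p in phis)   # reconstruction (5.3.6)

# Non-uniqueness of the dual: adding any fixed vector v to each dual vector
# still reconstructs, because the inner products <phi_i, x> sum to zero.
v = np.array([0.7, 0.4])
recon2 = (2.0 / 3.0) * sum(np.dot(p, x) * (p + v) for p in phis)

assert tight
assert np.allclose(recon, x) and np.allclose(recon2, x)
```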
Example 5.2
Consider a two-channel filter bank, as given in Chapter 3, but this time with no downsam-
pling (see Section 3.5.1). Obviously, the output is simply
Suppose now that the two filters G0 (z) and G1 (z) are of unit norm and satisfy
G0 (z)G0 (z −1 ) + G1 (z)G1 (z −1 ) = 2.
Write this in time domain using the impulse responses g0 [n] and g1 [n] and their translates.
The output of the filter h0[n] = g0[−n] at time k equals ⟨g0[n − k], x[n]⟩ and thus contributes ⟨g0[n − k], x[n]⟩ · g0[m − k] to the output at time m. A similar relation holds for g1[n − k].
Therefore, using these relations and (5.3.7), we can write
x̂[m] = Σ_{k=−∞}^{∞} Σ_{i=0}^{1} ⟨gi[n − k], x[n]⟩ gi[m − k] = 2 · x[m].
k=−∞ i=0
That is, the set {gi[n − k]}, i = 0, 1, k ∈ Z, forms a tight frame for l²(Z) with a redundancy factor R = 2. The redundancy factor indicates the oversampling rate, which is indeed a factor of two in our two-channel, nondownsampled case. The vectors gi[n − k], k ∈ Z, are not independent; indeed, there are twice as many as would be needed to uniquely represent the vectors in l²(Z). This redundancy, however, allows for more freedom in the design of gi[k − n]. Moreover, the representation is now shift-invariant, unlike in the critically sampled case.
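With Haar filters, the factor of two and the tight-frame property can be checked directly (a sketch, not from the book; any unit-norm orthogonal filter pair would do):

```python
import numpy as np

g0 = np.array([1.0,  1.0]) / np.sqrt(2)   # lowpass
g1 = np.array([1.0, -1.0]) / np.sqrt(2)   # highpass; G0(z)G0(1/z) + G1(z)G1(1/z) = 2

x = np.random.default_rng(0).standard_normal(32)

# Analysis with time-reversed filters h_i[n] = g_i[-n], then synthesis with g_i,
# and no downsampling in between
y0 = np.convolve(x, g0[::-1])
y1 = np.convolve(x, g1[::-1])
xhat = np.convolve(y0, g0) + np.convolve(y1, g1)

# Without downsampling the output is 2x (up to the one-sample delay of the convolutions)
ok = np.allclose(xhat[1:-1], 2 * x)
assert ok
```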
What about reconstructing with frames that are not tight? Let us define the frame operator Γ from L²(R) to l²(J) as

(Γf)_j = ⟨γj, f⟩,  j ∈ J.
Since (γj)_{j∈J} constitute a frame, we know from (5.3.4) that ‖Γf‖² ≤ B ‖f‖², that is, Γ is bounded, which means that it is possible to find its adjoint operator Γ*. Note first that the adjoint operator is a mapping from l²(J) to L²(R). Then, ⟨f, Γ*c⟩ is an inner product over L²(R), while ⟨Γf, c⟩ is an inner product over l²(J). The adjoint operator can be computed from the following relation (see (2.A.2))

⟨f, Γ*c⟩ = ⟨Γf, c⟩ = Σ_{j∈J} ⟨γj, f⟩* cj.   (5.3.9)
Comparing the left side of (5.3.9) with the right side of (5.3.10), we find the adjoint
operator as

Γ*c = Σ_{j∈J} cj γj.   (5.3.11)
Using this adjoint operator, we can express condition (5.3.4) as (I is the identity
operator)
A · I ≤ Γ∗ Γ ≤ B · I, (5.3.13)
from where it follows that Γ∗ Γ is invertible (see Lemma 3.2.2 in [73]). Applying
this inverse (Γ∗ Γ)−1 to the family of vectors γj , leads to another family γ̃j which
also constitutes a frame. The vectors γ̃j are given by
where we have used (5.3.14), (5.3.8) and (5.3.11). Therefore, one can write
γj , f γ̃j = f = γ̃j , f γj . (5.3.15)
j∈J j∈J
The above relation shows how to obtain a reconstruction formula for f from γj , f ,
where the only thing one has to compute is γ̃j = (Γ*Γ)^{−1} γj, given by

γ̃j = (2/(A + B)) Σ_{k=0}^{∞} (I − (2/(A + B)) Γ*Γ)^k γj.   (5.3.16)
We now sketch a proof of this relation (see [73] for a rigorous development).
PROOF
If frame bounds A and B are close, that is, if
∇ = B/A − 1 ≪ 1,
then (5.3.13) implies that Γ∗ Γ is close to ((A + B)/2)I, or (Γ∗ Γ)−1 is close to (2/(A + B))I.
This further means that the function f can be written as follows:
f = (2/(A + B)) Σ_{j∈J} ⟨γj, f⟩ γj + R f,

where

R = I − (2/(A + B)) Γ*Γ,   (5.3.17)
and as a result,
‖R‖ ≤ (B − A)/(B + A) = ∇/(2 + ∇) < 1.   (5.3.18)
From (5.3.17) and using (5.3.18), (Γ∗ Γ)−1 can be written as (see also (2.A.1))
(Γ*Γ)^{−1} = (2/(A + B)) (I − R)^{−1} = (2/(A + B)) Σ_{k=0}^{∞} R^k,
implying that
γ̃j = (Γ*Γ)^{−1} γj = (2/(A + B)) Σ_{k=0}^{∞} R^k γj = (2/(A + B)) Σ_{k=0}^{∞} (I − (2/(A + B)) Γ*Γ)^k γj.   (5.3.19)
Note that if B/A is close to one, that is, if ∇ is small, then R is close to zero and convergence in (5.3.19) is fast. If the frame is tight, that is, A = B, and moreover, if it is an orthonormal basis, that is, A = 1, then R = 0 and γ̃j = γj.
We have seen, for example, in the wavelet transform case, that to have a numerically stable reconstruction, we require that (ψ_{m,n}) constitute a frame. If (ψ_{m,n}) do constitute a frame, we found an algorithm to reconstruct f from ⟨f, ψ_{m,n}⟩, given by (5.3.15) with γ̃j as in (5.3.16). For this algorithm to work, we have to obtain estimates of the frame bounds.
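In finite dimensions the whole procedure can be simulated. The sketch below (not from the book) builds a small frame for R³, reads the frame bounds off the eigenvalues of Γ*Γ, runs the truncated series (5.3.16), and checks the reconstruction formula (5.3.15):

```python
import numpy as np

rng = np.random.default_rng(1)
# Frame for R^3: rows of G are the frame vectors; including the identity keeps A >= 1
G = np.vstack([np.eye(3), rng.standard_normal((4, 3))])

S = G.T @ G                              # Gamma* Gamma, the frame operator
eigs = np.linalg.eigvalsh(S)
A, B = eigs[0], eigs[-1]                 # frame bounds = extreme eigenvalues

c = 2.0 / (A + B)
duals = np.zeros_like(G)
for j in range(G.shape[0]):              # truncated Neumann series (5.3.16)
    term = G[j].copy()
    acc = np.zeros(3)
    for _ in range(1000):
        acc += c * term
        term = term - c * (S @ term)     # apply (I - c * Gamma* Gamma)
    duals[j] = acc

exact = G @ np.linalg.inv(S)             # gamma_tilde_j = (Gamma* Gamma)^{-1} gamma_j
assert np.allclose(duals, exact)

f = rng.standard_normal(3)
f_rec = duals.T @ (G @ f)                # sum_j <gamma_j, f> gamma_tilde_j
assert np.allclose(f_rec, f)
```

The closer B/A is to one, the fewer terms of the series are needed, which is exactly the point of the "snug" frames discussed next.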
Figure 5.9 The Mexican-hat function ψ(t) = (2/√3) π^{−1/4} (1 − t²) e^{−t²/2}. The rotated ψ(t) gives rise to a Mexican hat, thus the name for the function.
words, if ψ(t) is at all a "reasonable" function (it has some decay in time and frequency, and ∫ψ(t) dt = 0), then there exists a whole arsenal of a0 and b0 such that {ψ_{m,n}} constitute a frame. This can be formalized, and we refer to [73] for more details (Proposition 3.3.2, in particular). In [73], explicit estimates for frame bounds A, B, as well as possible choices for ψ, a0, b0, are given.
Example 5.3
As an example to the previous discussion, consider the so-called Mexican-hat function
ψ(t) = (2/√3) π^{−1/4} (1 − t²) e^{−t²/2},
given in Figure 5.9. Table 5.1 gives a few values for frame bounds A, B with a0 = 2 and
varying b0 . Note, for example, how for certain values of b0 , the frame is almost tight — a
so-called “snug” frame. The advantage of working with such a frame is that we can use just
the 0th-order term in the reconstruction formula (5.3.16) and still get a good approximation
of f . Another interesting point is that when the frame is almost tight, the frame bounds
(which are close) are inversely proportional to b0 . Since the frame bounds in this case
measure redundancy of the frame, when b0 is halved (twice as many points on the grid),
the frame bounds should double (redundancy increases by two since we have twice as many
functions). Note also how for the value of b0 = 1.50, the ratio B/A increases suddenly.
Actually, for larger values of b0 , the set {ψm,n } is not even a frame any more, since A is not
strictly positive anymore.
Table 5.1 Frame bounds for the Mexican-hat wavelet with a0 = 2 and varying b0.

b0       A        B        B/A
0.25     13.091   14.183    1.083
0.50      6.546    7.092    1.083
0.75      4.364    4.728    1.083
1.00      3.223    3.596    1.116
1.25      2.001    3.454    1.726
1.50      0.325    4.221   12.986
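Two properties of the Mexican-hat function used above, zero mean (needed for admissibility) and unit norm, are quick to check numerically (a sketch, not from the book):

```python
import numpy as np

t = np.linspace(-10.0, 10.0, 200001)
dt = t[1] - t[0]
psi = (2.0 / np.sqrt(3.0)) * np.pi**(-0.25) * (1 - t**2) * np.exp(-t**2 / 2)

mean = np.sum(psi) * dt          # integral of psi: should vanish
norm2 = np.sum(psi**2) * dt      # ||psi||^2: should equal 1

assert abs(mean) < 1e-6
assert abs(norm2 - 1.0) < 1e-6
```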
implies that ψ_{m,n} will be centered around t = a0^m n b0 in time and around ±a0^{−m} ω0 in frequency. This means that the inner product ⟨ψ_{m,n}, f⟩ represents the "information content" of f near t = a0^m n b0 and near ω± = ±a0^{−m} ω0. If the function f is localized (most of its energy lies within |t| ≤ T and Ω0 ≤ |ω| ≤ Ω1), then only the coefficients ⟨ψ_{m,n}, f⟩ for which (t, ω) = (a0^m n b0, ±a0^{−m} ω0) lies within (or very close to) [−T, T] × ([−Ω1, −Ω0] ∪ [Ω0, Ω1]) will be necessary for f to be reconstructed up to a good approximation. This approximation property is detailed in [73] (Theorem 3.5.1, in particular).
Let us now shift our attention to the short-time Fourier transform frames. As
mentioned before, we need to be able to say something about the frame bounds in
order to compute the dual frame. Then, in a similar fashion to Proposition 5.7,
one can obtain a very interesting result, which states that if gm,n (t) (as in (5.3.3))
constitute a frame for L²(R) with frame bounds A and B, then

A ≤ (2π/(ω0 t0)) ‖g‖² ≤ B.   (5.3.22)
Note how in this case, any tight frame will have a frame bound A = (2π)/(ω0 t0) (with ‖g‖ = 1). In particular, an orthonormal basis will require the following to be
true:
ω0 t0 = 2π.
Beware, however, that ω0 t0 = 2π does not imply an orthonormal basis; it just states that we have "critically" sampled our short-time Fourier transform.⁵ Note that in (5.3.22) g does not appear (except through ‖g‖, which can always be normalized to 1), as opposed to (5.3.20) and (5.3.21). This is similar to the absence of an admissibility condition for the continuous-time short-time Fourier transform (see Section 5.2).
On the other hand, we see that ω0 , t0 cannot be arbitrarily chosen. In fact, there
⁵In signal processing terms, this corresponds to the Nyquist rate.
[Figure 5.10: the (t0, ω0) plane. On the curve ω0 t0 = 2π, frames are possible, but with bad time-frequency localization; for ω0 t0 > 2π, no frames exist.]
are no short-time Fourier transform frames for ω0 t0 > 2π. Even more is true: In
order to have good time-frequency localization, we require that ω0 t0 < 2π. The
last remaining case, that of critical sampling, ω0 t0 = 2π, is very interesting. Unlike
for the wavelet frames, it turns out that no critically sampled short-time Fourier
transform frames are possible with good time and frequency localization. Actually,
the following theorem states just that.
THEOREM 5.8 (Balian-Low)

If the g_{m,n}(t) = e^{j2πmt} w(t − n), m, n ∈ Z, constitute a frame for L²(R), then either ∫ t² |w(t)|² dt = ∞ or ∫ ω² |W(ω)|² dω = ∞.
For a proof, see [73]. Note that in the statement of the theorem, t0 = 1, ω0 =
2π/t0 = 2π. Thus, in this case (ω0 t0 = 2π), we will necessarily have bad localiza-
tion either in time or in frequency (or possibly both). This theorem has profound
consequences, since it also implies that no good short-time Fourier transform or-
thonormal bases (good meaning with good time and frequency localization) are
achievable (since orthonormal bases are necessarily critically sampled). This is
similar to the discrete-time result we have seen in Chapter 3, Theorem 3.17. The
previous discussion is pictorially represented in Figure 5.10 (after [73]).
A few more remarks about the short-time Fourier transform: First, as in the
wavelet case, it is possible to obtain estimates of the frame bounds A, B. Unlike
the wavelet case, however, the dual frame is always generated by a single function
w̃. To see that, first introduce the shift operator Tw(t) = w(t − t0) and the modulation operator Ew(t) = e^{jω0 t} w(t).
Table 5.2 Frame bounds for the Gaussian window with ω0 = t0 = (2πλ)^{1/2}.

λ        A       B       B/A
0.250    3.899   4.101    1.052
0.375    2.500   2.833    1.133
0.500    1.575   2.425    1.539
0.750    0.582   2.089    3.592
0.950    0.092   2.021   22.004
One can easily check that both T and E commute with Γ∗ Γ and thus with (Γ∗ Γ)−1
as well [225]. Then, the dual frame can be found from (5.3.14)
To conclude this section, we will consider an example from [73], the Gaussian
window, where it can be shown how, as oversampling approaches critical sampling,
the dual frame starts to “misbehave.”
Also, since g̃m,n (t) are generated from a single function w̃(t) (see (5.3.23)), we will fix
m = n = 0 and find only w̃(t) from g0,0 (t) = w(t). Then we use (5.3.16) and write
w̃(t) = (2/(A + B)) Σ_{k=0}^{∞} (I − (2/(A + B)) Γ*Γ)^k w(t).   (5.3.24)
We will use the frame bounds already computed in [73]. Table 5.2 shows these frame bounds
for λ = 0.25, 0.375, 0.5, 0.75, 0.95, or corresponding t0 ≈ 1.25, 1.53, 1.77, 2.17, 2.44. Each
of these was taken from Table 3.3 in [73] (we took the nearest computed value). Our first
step is to evaluate Γ∗ Γw. From (5.3.12) we know that
Γ*Γw = Σ_m Σ_n ⟨g_{m,n}, w⟩ g_{m,n}.
Due to the fast decay of functions, one computes only 10 terms on both sides (yielding a
total of 21 terms in the summation for m and as many for n). Note that for computational
purposes, one has to separate the computations of the real and the imaginary parts. The
iteration is obtained as follows: We start by setting w̃(t) = w0 (t) = w(t). Then for each i,
we compute
w_i(t) = w_{i−1}(t) − (2/(A + B)) Γ*Γ w_{i−1}(t),
w̃(t) = w̃(t) + w_i(t).
Since the functions decay fast, only 20 iterations were needed in (5.3.24). Figure 5.11 shows
plots of w̃ with λ = 0.25, 0.375, 0.5, 0.75, 0.95, 1. Note how w̃ becomes less and less smooth
as λ increases (oversampling decreases). Even so, for all λ < 1, these dual frames have good
time-frequency localization. On the other hand, for λ = 1, w̃ is not even square-integrable
any more and becomes one of the pathological Bastiaans functions [18]. Since in this case
A = 0, the dual frame function w̃ has to be computed differently. It is given by [225]

w̃_B(t) = π^{7/4} K0^{−3/2} e^{t²/2} Σ_{n > |t/√(2π)| − 0.5} (−1)^n e^{−π(n+0.5)²},

with K0 ≈ 1.854075.
5.3.4 Remarks
This section dealt with overcomplete expansions called frames. Obtained by dis-
cretizing the continuous-time wavelet transform as well as the short-time Fourier
transform, they are used to obtain a numerically stable reconstruction of a function
f from a sequence of its transform coefficients. We have seen that the conditions
on wavelet frames are fairly relaxed, while the short-time Fourier transform frames
suffer from a serious drawback given in the Balian-Low theorem: When critical
sampling is used, it will not be possible to obtain frames with good time and fre-
quency resolutions. As a result, orthonormal short-time Fourier transform bases
are not achievable with basis functions being well localized in time and frequency.
Figure 5.11 The dual frame functions w̃ for ω0 = t0 = (2πλ)^{1/2} and (a) λ = 0.25, (b) λ = 0.375, (c) λ = 0.5, (d) λ = 0.75, (e) λ = 0.95, (f) λ = 1.0. Note how w̃ starts to "misbehave" as λ increases (oversampling decreases). In fact, for λ = 1, w̃ is not even square-integrable any more (after [73]).
P ROBLEMS
5.1 Characterization of local regularity: In Section 5.1.2, we have seen how the continuous wave-
let transform can characterize the local regularity of a function. Take the Haar wavelet for
simplicity.
CW Tf (a, b) a3/2 ,
(a) Give the expression and the graph of its autocorrelation function a(t),
a(t) = ∫ ψ(τ) ψ(τ − t) dτ.
(b) Is a(t) continuous? Differentiable? What is the decay of the Fourier transform A(ω) as ω → ±∞?
(a) Choose {H0 (z), H1 (z), G0 (z), G1 (z)} as in an orthogonal two-channel filter bank.
What is y[n] as a function of x[n]? Note: G0 (z) = H0 (z −1 ) and G1 (z) = H1 (z −1 ),
and assume FIR filters.
(b) Given the "energy" of x[n], or ‖x‖², what can you say about ‖x0‖² + ‖x1‖²? Give either an exact expression, or bounds.
(c) Assume H0 (z) and G0 (z) are given, how can you find H1 (z), G1 (z) such that y[n] =
x[n]? Calculate the example where
H0 (z) = G0 (z −1 ) = 1 + 2z −1 + z −2 .
Is the solution (H1 (z), G1 (z)) unique? If not, what are the degrees of freedom? Note:
In general, y[n] = x[n − k] would be sufficient, but we concentrate on the zero-delay
case.
5.5 Consider Example 5.1, and choose N vectors φi (N odd) for an expansion of R², where φi is given by

φi = [cos(2πi/N), sin(2πi/N)]^T,  i = 0 . . . N − 1.

Show that the set {φi} constitutes a tight frame for R², and give the redundancy factor.
5.6 Show that the set {sinc(t − i/N )}, i ∈ Z and N ∈ N , where
sin(πt)
sinc(t) = ,
πt
forms a tight frame for the space of bandlimited signals (whose Fourier transforms are zero outside (−π, π)). Give the frame bounds and redundancy factor.
5.7 Consider a real m × n matrix M with m > n, rank(M) = n, and bounded entries.

(a) Show, given any x ∈ Rⁿ, that there exist real constants A and B such that

A ‖x‖² ≤ ‖M x‖² ≤ B ‖x‖².
(b) Show that M T M is always invertible, and that a possible left inverse of M is given
by
M̃ = (M^T M)^{−1} M^T.
The theme of this chapter is “divide and conquer.” It is the algorithmic counter-
part of the multiresolution approximations seen for signal expansions in Chapters
3 and 4. The idea is simple: To solve a large-size problem, find smaller-size sub-
problems that are easy to solve and combine them efficiently to get the complete
solution. Then, apply the division again to the subproblems and stop only when
the subproblems are trivial.
What we just said in words is the key to the fast Fourier transform (FFT) algorithm, discussed in Section 6.1. Other computational tasks, such as fast convolution algorithms, have similar solutions.
The reason we are concerned with computational complexity is that the number
of arithmetic operations is often what makes the difference between an impractical
and a useful algorithm. While considerations other than just the raw numbers
of multiplications and additions play an important role as well (such as memory
accesses or communication costs), arithmetic or computational complexity is well
studied for signal processing algorithms, and we will stay with this point of view in
what follows. We will always assume discrete-time data and be mostly concerned
with exact rather than approximate algorithms (that is, algorithms that compute
the exact result in exact arithmetic).
First, we will review classic digital signal processing algorithms, such as fast
convolutions and fast Fourier transforms. Next, we discuss algorithms for multirate
signal processing, since these are central for filter banks and discrete-time wavelet
series or transforms. Then, algorithms for wavelet series computations are consid-
ered, including methods for the efficient evaluation of iterated filters. Even if the
continuous wavelet transform cannot be evaluated exactly on a digital computer,
approximations are possible, and we study their complexity. We conclude with some
special topics, including FFT-based overlap-add/save fast convolution algorithms
seen as filter banks.
reduces to the product of their transforms. If the sequences are of finite length,
convolution becomes a polynomial product in transform domain. Taking the z-
transform of (6.1.1) and replacing z −1 by x, we obtain
where
C(αi ) = A(αi ) · B(αi ), i = 0, . . . , M + N.
where the first equality holds because the degree of P(x) is larger than that of C(x), and thus the reduction modulo P(x) does not affect C(x). Factorizing P(x) into its coprime factors, P(x) = ∏ᵢ Pᵢ(x), one can separately evaluate
(where Ai (x) and Bi (x) are the residues with respect to Pi (x)) and reconstruct
C(x) from its residues. Note that the Cook-Toom algorithm is a particular case of this algorithm when P(x) equals ∏(x − αi). The power of the algorithm is that if
P (x) is well chosen and factorized over the rationals, then the Pi (x)’s can be simple
and the reduction operations as well as the reconstruction does not involve much
computational complexity. A classic example is to choose P (x) to be of the form
xL − 1 and to factor over the rationals. The factors, called cyclotomic polynomials
[32], have coefficients {1, 0, −1} up to relatively large L’s. Note that if A(x) and
B(x) are of degree L − 1 or less and we compute
then we obtain the circular, or cyclic, convolution of the sequences a[n] and b[n]:

c[n] = Σ_{k=0}^{L−1} a[k] b[(n − k) mod L],
where W_L = e^{−j2π/L}. For any polynomial Q(x), it can be verified that
Figure 6.1 Generic fast convolution algorithms. The product C(x) = A(x) · B(x) is evaluated modulo P(x). Particular cases are the Cook-Toom algorithm with P(x) = ∏(x − αi) and Fourier-domain computation with P(x) = ∏(x − W_L^i), where W_L is the Lth root of unity.
Therefore, reducing A(x) and B(x) modulo the various factors of xL − 1 amounts
to computing
Ai (x) = A(WLi ),
Bi (x) = B(WLi ), i = 0, . . . , L − 1,
which, according to (2.4.43), is simply taking the length-L discrete Fourier trans-
form of the sequences a[n] and b[n]. Then the residue products are simply the
pointwise multiplications C_i = A_i · B_i, i = 0, . . . , L − 1.
The reconstruction is simply the inverse Fourier transform. Of course, this is the
convolution theorem of the Fourier transform, but it is seen as a particular case of
either Lagrange interpolation or of the Chinese Remainder theorem.
In conclusion, we have seen three convolution algorithms and they all had the
generic structure shown in Figure 6.1. First, there is a reduction of the two poly-
nomials involved, then there is a product in the residue domain (which is only a
pointwise multiplication if the reduction is modulo first degree polynomials as in
the Fourier case) and finally, a reconstruction step concludes the algorithm.
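The generic three-step structure (reduction, residue products, reconstruction) specializes, in the Fourier case, to DFT, pointwise multiplication, and inverse DFT. This can be checked numerically; the sketch below (arbitrary length L = 8, NumPy's FFT standing in for the reduction and reconstruction steps) compares it against a direct evaluation of the circular convolution sum:

```python
import numpy as np

L = 8
a, b = np.random.randn(L), np.random.randn(L)

# direct circular convolution: c[n] = sum_k a[k] b[(n-k) mod L]
c_direct = np.array([sum(a[k] * b[(n - k) % L] for k in range(L))
                     for n in range(L)])

# reduction modulo the factors (x - W_L^i) of x^L - 1 is the DFT,
# the residue products are pointwise, and the CRT reconstruction
# is the inverse DFT
c_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

assert np.allclose(c_direct, c_fft)
```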
352 CHAPTER 6
X[k] = Σ_{n=0}^{N−1} x[n] · W_N^{nk}, W_N = e^{−j2π/N}. (6.1.5)
In matrix notation, circular convolution of a by b can thus be written as
c = F^{−1} · Λ · F · a, where F is the Fourier matrix and Λ is the diagonal matrix
of the Fourier coefficients of b.
The Cooley-Tukey FFT Algorithm Assume that the length of the Fourier trans-
form is a composite number, N = N1 · N2 . Perform the following change of variable
in (6.1.5):
n = N2 · n1 + n2, n_i = 0, . . . , N_i − 1,
k = k1 + N1 · k2, k_i = 0, . . . , N_i − 1. (6.1.6)

Then (6.1.5) becomes

X[k1 + N1 k2] = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} x[N2 n1 + n2] W_{N1 N2}^{(N2 n1 + n2)(k1 + N1 k2)}. (6.1.7)
Since W_{N1 N2}^{N1 N2 n1 k2} = 1, the exponent in (6.1.7) splits, and we recognize:

X[k1 + N1 k2] = Σ_{n2=0}^{N2−1} W_N^{n2 k1} W_{N2}^{n2 k2} ( Σ_{n1=0}^{N1−1} x[N2 n1 + n2] W_{N1}^{n1 k1} ), (6.1.8)

that is, N2 DFT's of size N1, N multiplications by the twiddle factors W_N^{n2 k1},
and N1 DFT's of size N2.
If N1 and N2 are themselves composite, one can iterate the algorithm. In particular,
if N = 2^l and choosing N1 = 2, N2 = N/2, (6.1.8) becomes
X[2k2] = Σ_{n2=0}^{N/2−1} W_{N/2}^{n2 k2} · (x[n2] + x[n2 + N/2]),

X[2k2 + 1] = Σ_{n2=0}^{N/2−1} W_{N/2}^{n2 k2} · W_N^{n2} · (x[n2] − x[n2 + N/2]).
Thus, at the cost of N/2 complex multiplications (by W_N^{n2}) we have reduced the
complexity of a size-N DFT to two size-(N/2) DFT’s. Iterating log2 N − 1 times
leads to trivial size-2 DFT’s and thus, the complexity is of order N log2 N . Such
an algorithm is called a radix-2 FFT and is very popular due to its simplicity and
good performance.
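One radix-2 decimation step is easy to verify numerically; the following sketch (variable names are ours) checks the two half-length DFT's above against a full-length FFT:

```python
import numpy as np

N = 8
x = np.random.randn(N)
W = np.exp(-2j * np.pi / N)
n2 = np.arange(N // 2)

# even-indexed outputs: size-N/2 DFT of x[n] + x[n + N/2]
even = np.fft.fft(x[:N//2] + x[N//2:])
# odd-indexed outputs: size-N/2 DFT of (x[n] - x[n + N/2]),
# twiddled by W_N^{n} first
odd = np.fft.fft((x[:N//2] - x[N//2:]) * W ** n2)

X = np.fft.fft(x)
assert np.allclose(even, X[0::2])
assert np.allclose(odd, X[1::2])
```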
The Good-Thomas or Prime Factor FFT Algorithm When performing the index
mapping in the Cooley-Tukey FFT (see (6.1.6)), we did not require anything except
that N had to be composite. If the factors N1 and N2 are coprime, a more powerful
mapping based on the Chinese Remainder Theorem can be used [32]. The major
difference is that such a mapping avoids the N/2 complex multiplications present in
the “middle” of the Cooley-Tukey FFT, thus mapping a length-(N1 N2) DFT (N1
and N2 being coprime) into a true two-dimensional DFT of size N1 × N2.
Rader’s FFT When the length of a Fourier transform is a prime number p, then
there exists a permutation of the input and output such that the problem becomes
a circular convolution of size p − 1 (and some auxiliary additions for the frequency
zero which is treated separately). While the details are somewhat involved, Rader’s
method shows that prime-length Fourier transforms can be solved as convolutions
and efficient algorithms will be in the generic form we saw in Section 6.1.1 (see the
example in (6.1.4)). That is, the Fourier transform matrix F can be written as
F = CM D, (6.1.9)
where C and D are rectangular matrices of output and input additions, and M is
a diagonal matrix contributing of the order of 2N multiplications.
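Rader's reduction can be sketched for a small prime; below, p = 5 with primitive root g = 2 (our choice), and the length-(p − 1) circular convolution is computed with FFT's for convenience:

```python
import numpy as np

p, g = 5, 2                       # prime length and a primitive root mod p
ginv = pow(g, p - 2, p)           # g^{-1} mod p
x = np.random.randn(p)
W = np.exp(-2j * np.pi / p)

# permuted input a[l] = x[g^{-l} mod p] and kernel h[j] = W^{g^j mod p}
a = np.array([x[pow(ginv, l, p)] for l in range(p - 1)])
h = np.array([W ** pow(g, j, p) for j in range(p - 1)])

# length-(p-1) circular convolution (computed here via the FFT)
c = np.fft.ifft(np.fft.fft(a) * np.fft.fft(h))

# frequency zero is treated separately; the others need the extra x[0]
X = np.empty(p, dtype=complex)
X[0] = x.sum()
for m in range(p - 1):
    X[pow(g, m, p)] = x[0] + c[m]

assert np.allclose(X, np.fft.fft(x))
```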
The Winograd FFT Algorithm We saw that the Good-Thomas FFT mapped a
size-(N1 N2 ) Fourier transform into a two-dimensional Fourier transform. Using
Kronecker products [32] (see (2.3.2)), we can thus write
F N1 ·N2 = F N1 ⊗ F N2 . (6.1.10)
F N1 ⊗ F N2 = (C 1 · M 1 · D1 ) ⊗ (C 2 · M 2 · D2 )
= (C 1 ⊗ C 2 ) · (M 1 ⊗ M 2 ) · (D1 ⊗ D2 ).
Since the size of M 1 ⊗M 2 is of the order of (2N1 )·(2N2 ), we see that the complexity
is roughly 4N multiplications. In general, instead of the N log N behavior of the
Cooley-Tukey FFT, the Winograd FFT has a C(N ) · N behavior, where C(N ) is
slowly growing with N . For example, for N = 1008 = 7 · 9 · 16, the Winograd
FFT uses 3548 multiplications, while for N = 1024 = 210 , the split-radix FFT
[90] uses 7172 multiplications. Despite the computational advantage, the complex
structure of the Winograd FFT has led to mixed success in implementations and
the Cooley-Tukey FFT is still the most popular fast implementation of Fourier
transforms.
X[k] = Σ_{n=0}^{N−1} x[n] cos( 2π(2n + 1)k / 4N ). (6.1.11)
The permutation

x′[n] = x[2n],
x′[N − n − 1] = x[2n + 1], n = 0, . . . , N/2 − 1, (6.1.12)

transforms (6.1.11) into

X[k] = Σ_{n=0}^{N−1} x′[n] cos( 2π(4n + 1)k / 4N ).
This can be related to the DFT of x′[n], denoted by X′[k], in the following manner:

X[k] = cos(2πk/4N) Re[X′[k]] + sin(2πk/4N) Im[X′[k]].

Evaluating X[k] and X[N − k − 1] at the same time, it is easy to see that they
follow from X′[k] with a rotation by 2πk/4N [322]. Therefore, the length-N DCT
on a real vector has been mapped into a permutation (6.1.12), a Fourier transform
of length-N and a set of N/2 rotations. Since the Fourier transform on a real vector
takes half the complexity of a general Fourier transform [209], this is a very efficient
way to compute DCT’s. While there exist “direct” algorithms, it turns out that
mapping it into a Fourier transform problem is just as efficient and much easier.
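The mapping is easy to state in code. The sketch below (function names are ours; the unnormalized DCT definition of (6.1.11) is used) computes the DCT through the permutation (6.1.12), one length-N FFT, and one rotation per coefficient:

```python
import numpy as np

def dct_via_fft(x):
    # X[k] = sum_n x[n] cos(2*pi*(2n+1)*k/(4N)), computed by
    # permutation + one length-N FFT + one rotation per coefficient
    N = len(x)
    xp = np.empty(N)
    xp[:N // 2] = x[0::2]            # x'[n] = x[2n]
    xp[N // 2:] = x[1::2][::-1]      # x'[N-1-n] = x[2n+1]
    Xp = np.fft.fft(xp)
    k = np.arange(N)
    return np.real(np.exp(-1j * np.pi * k / (2 * N)) * Xp)

def dct_direct(x):
    N = len(x)
    n = np.arange(N)
    k = np.arange(N)[:, None]
    return (x * np.cos(2 * np.pi * (2 * n + 1) * k / (4 * N))).sum(axis=1)

x = np.random.randn(16)
assert np.allclose(dct_via_fft(x), dct_direct(x))
```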
Figure 6.2 Efficient filtering followed by downsampling by two: the input B(x)
is split into its polyphase components B0(x) and B1(x), which are filtered by the
half-length polyphase filters A0(x) and A1(x) and added to form the downsampled
output C0(x).
This is equivalent to filtering the two independent signals B0 (x) and B1 (x) by the
half-length filters A1 (x) and A0 (x) (see Figure 6.2). Because of the independence,
the complexity of the two polynomial products in (6.1.13) adds up. Assuming A(x)
and B(x) are of odd degree 2M − 1 and 2N − 1, then we have to evaluate two
products between polynomials of degree M − 1 and N − 1, which takes at least
2(M + N − 1) multiplications. This is almost as much as the lower bound for
the full polynomial product (which is 2(M + N ) − 1 multiplications). If an FFT-
based convolution is used, we get some improvement. Assuming that an FFT takes
C · L · log2 L operations,1 it takes 2 · C · L · log2 L + L operations to perform a
length-L circular convolution (the transform of the filter is precomputed). Assume
a length-N input and a length-N filter and use a length-2N FFT. Direct convo-
lution therefore takes 4 · C · N · (log2 N + 1) + 2N operations. The computation
of (6.1.13) requires two FFT’s of size N (for B0 (x) and B1 (x)), 2N operations for
the frequency-domain convolution, and a size-N inverse FFT to recuperate C0 (x),
that is, a total of 3 · C · N · log2 N + 2N . This is a saving of roughly 25% over the
nondownsampled convolution.
¹ C is a small constant which depends on the particular length and FFT algorithm. For example,
the split-radix FFT of a real signal of length N = 2^n requires 2^{n−1}(n − 3) + 2 real multiplications
and 2^{n−1}(3n − 5) + 4 real additions [90].
where B(x) is the input and A(x) the interpolation filter. Writing A(x) =
A0 (x2 ) + x · A1 (x2 ), the efficient way to compute (6.1.14) is
that is, two polynomial products where each of the terms is approximately of half
size, since B(x2 ) · A0 (x2 ) can be computed as B(x) · A0 (x) and then upsampled
(similarly for B(x2 ) · A1 (x2 )). That this problem seems very similar to filtering
and downsampling is no surprise, since they are duals of each other. If one writes
the matrix that represents convolution by a[n] and downsampling by two, then its
transpose represents upsampling by two followed by interpolation with ã[n] (where
ã[n] is the time-reversed version of a[n]). This is shown in a simple three-tap filter
example below
⎛ · · ·  a[0]   0     0     0     0   · · · ⎞T     ⎛  ⋮     ⋮     ⋮  ⎞
⎜ · · ·  a[2]  a[1]  a[0]   0     0   · · · ⎟   =  ⎜ a[0]  a[2]   0  ⎟
⎝ · · ·   0     0    a[2]  a[1]  a[0] · · · ⎠      ⎜  0    a[1]   0  ⎟
                                                   ⎜  0    a[0]  a[2] ⎟
                                                   ⎜  0     0    a[1] ⎟
                                                   ⎝  0     0    a[0] ⎠
The block diagram of an efficient implementation of upsampling and interpolation
is thus simply the transpose of the diagram in Figure 6.2. Both systems have the
same complexity, since they require the implementation of two half-length filters
(A0 (x) and A1 (x)) in the downsampled domain.
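The transpose relationship can be checked numerically. In the sketch below, a hypothetical 3-tap filter a is used to build the convolution-and-downsampling matrix T, and T^T is compared against upsampling by two followed by interpolation with the time-reversed filter:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # a hypothetical 3-tap filter
N, M = 8, 5                      # input length, number of downsampled outputs

# T implements convolution by a[n] followed by downsampling by two:
# (T x)[i] = (a * x)[2i], that is, T[i, n] = a[2i - n]
T = np.zeros((M, N))
for i in range(M):
    for n in range(N):
        if 0 <= 2 * i - n < len(a):
            T[i, n] = a[2 * i - n]

x = np.random.randn(N)
assert np.allclose(T @ x, np.convolve(a, x)[::2])

# the transpose: upsampling by two, then interpolation with the
# time-reversed filter a~[n] = a[-n]
c = np.random.randn(M)
u = np.zeros(2 * M - 1)
u[::2] = c                                           # upsample by two
interp = np.convolve(u, a[::-1])[len(a) - 1 : len(a) - 1 + N]
assert np.allclose(T.T @ c, interp)
```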
Of course, upsampling by an arbitrary factor K followed by interpolation can
be implemented by K small filters followed by upsampling, shifts, and summation.
This property has been used to design very sharp filters with low complexity in
[236]. While the complexity remains bounded, the delay does not. If the first block
contributes a delay D, the second will produce a delay 2D and the ith block a delay
2^{i−1} D. That is, the total delay of K cascaded blocks becomes

D · Σ_{i=1}^{K} 2^{i−1} = (2^K − 1) · D.
This large delay is a serious drawback, especially for real-time applications such as
speech coding.
Efficient Filtering Using Multirate Signal Processing One very useful applica-
tion of multirate techniques to discrete-time signal processing has been the efficient
computation of narrow-band filters. There are two basic ideas behind the method.
First, the output of a lowpass filter can be downsampled, and thus, not all outputs
have to be computed. Second, a very long narrow-band filter can be factorized into
a cascade of several shorter ones and each of these can be downsampled as well.
We will show the technique on a simple example, and refer to [67] for an in-depth
treatment.
Example 6.2
Assume we desire a lowpass filter with a cutoff frequency π/12. Because of this cutoff
frequency, we can downsample the output, say by 8. Instead of a direct implementation, we
build a cascade of three filters with a cutoff frequency π/3, each downsampled by two. We
Figure 6.4 Cascade of third-band lowpass filters. (a) Magnitude responses
|H(e^{jω})|, |H(e^{j2ω})| and |H(e^{j4ω})|. (b) Magnitude response of the equivalent
filter |H_equiv(e^{jω})|, a lowpass filter with cutoff frequency π/12.
call such a filter a third-band filter. Using the interchange of downsampling and filtering
property, we get an equivalent filter with the z-transform

H_equiv(z) = H(z) · H(z²) · H(z⁴),

where H(z) is the z-transform of the third-band lowpass filter.
H(ejω ), H(ej2ω ), and H(ej4ω ) are shown in Figure 6.4(a) and their product, Hequiv (z), is
shown in Figure 6.4(b), showing that a π/12 lowpass filter is realized. Note that its length
is approximately equal to L + 2L + 4L = 7L, where L is the length of the filter with the
cutoff frequency π/3.
If the filtered signal is needed at the full sampling rate, one can use upsampling
and interpolation filtering and the same trick can be applied to that filter as well.
Because of the cascade of shorter filters, and the fact that each stage is downsam-
pled, it is clear that substantial savings in computational complexity are obtained.
How this technique can be used to derive arbitrary sharp filters while keeping the
complexity bounded is shown in [236].
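The interchange property behind Example 6.2 can be verified directly. The sketch below uses a hypothetical windowed-sinc third-band filter and compares the cascade of three filter-and-downsample stages against the equivalent filter H(z)H(z²)H(z⁴) followed by downsampling by eight:

```python
import numpy as np

def up(h, k):
    # insert k-1 zeros between samples: h(z) -> h(z^k)
    out = np.zeros(k * (len(h) - 1) + 1)
    out[::k] = h
    return out

def lowpass(L, wc):
    # hypothetical windowed-sinc lowpass with cutoff wc (rad/sample)
    n = np.arange(L) - (L - 1) / 2
    return np.hamming(L) * (wc / np.pi) * np.sinc(wc * n / np.pi)

h = lowpass(24, np.pi / 3)          # third-band filter
x = np.random.randn(256)

# cascade: filter with cutoff pi/3, downsample by two, three times
y = x
for _ in range(3):
    y = np.convolve(h, y)[::2]

# equivalent: single filter H(z) H(z^2) H(z^4), then downsample by eight
heq = np.convolve(np.convolve(h, up(h, 2)), up(h, 4))
yeq = np.convolve(heq, x)[::8]
assert np.allclose(y, yeq[:len(y)])
```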
(c) 2 · C · L · log2 L operations for the inverse FFT’s to get Y0 (z) and Y1 (z),
where we assumed that the transforms of the polyphase filters were precomputed.
That is, the Fourier-domain evaluation requires
4 · C · L · log2 L + 4N operations,
which is of the same order as Fourier-domain computation of a length-L filter
convolved with a length-L signal.
In [245], a precise analysis is made involving FFT’s with optimized lengths so
as to minimize the operation count. Using the split-radix FFT algorithm [90], the
number of operations (multiplications plus additions) per sample becomes, for large L,

4 log2 L + O(log log L). (6.2.2)
6.2. COMPLEXITY OF DISCRETE BASES COMPUTATION 361
Classic QMF Solution The classic QMF solution given in (3.2.34)-(3.2.35) (see
Figure 6.5(a)), besides using even-length linear phase filters, forces the highpass
filter to be equal to the lowpass, modulated by (−1)n . The polyphase matrix is
therefore:
H_p(z) = ⎡ H0(z)   H1(z) ⎤ = ⎡ 1   1 ⎤ · ⎡ H0(z)    0   ⎤ ,
         ⎣ H0(z)  −H1(z) ⎦   ⎣ 1  −1 ⎦   ⎣   0    H1(z) ⎦
where H0 and H1 are the polyphase components of the prototype filter H(z). The
factorized form on the right indicates that the complexity is halved, and an obvious
Figure 6.5 Classic QMF filter bank. (a) Initial filter bank. (b) Efficient
implementation using polyphase components and a butterfly.
implementation is shown in Figure 6.5(b). Recall that this scheme only approxi-
mates perfect reconstruction when using FIR filters.
Orthogonal Filter Banks As seen in Section 3.2.4, orthogonal filter banks have
strong structural properties. In particular, because the highpass is the time-reversed
version of the lowpass filter modulated by (−1)n , the polyphase matrix has the
following form:
H_p(z) = ⎡  H00(z)   H01(z) ⎤ , (6.2.3)
         ⎣ −H̃01(z)  H̃00(z) ⎦
where H˜00 (z) and H˜01 (z) are time-reversed versions of H00 (z) and H01 (z), and
H00 (z) and H01 (z) are the two polyphase components of the lowpass filter. If
H00 (z) and H01 (z) were of degree zero, it is clear that the matrix in (6.2.3) would
be a rotation matrix, which can be implemented with three multiplications. It turns
out that for arbitrary degree polyphase components, terms can still be gathered into
rotations, saving 25% of multiplications (at the cost of 25% more additions) [104].
This rotation property is more obvious in the lattice structure form of orthogonal
filter banks [310]. We recall that the two-channel lattice factorizes the paraunitary
polyphase matrix into the following form (see (3.2.60)):
H_p(z) = ⎡ H00(z)  H01(z) ⎤ = U0 · ∏_{i=1}^{N−1} ⎡ 1     0    ⎤ · U_i ,
         ⎣ H10(z)  H11(z) ⎦                      ⎣ 0  z^{−1} ⎦
where filters are of length L = 2N and the matrices U i are 2 × 2 rotations. Such
rotations can be written as (where we use the shorthand ai and bi for cos(αi ) and
sin(αi ) respectively) [32]
⎡  a_i  b_i ⎤   ⎡ 1 0 1 ⎤   ⎡ a_i + b_i     0        0   ⎤   ⎡ 1   0 ⎤
⎣ −b_i  a_i ⎦ = ⎣ 0 1 1 ⎦ · ⎢     0     a_i − b_i    0   ⎥ · ⎢ 0   1 ⎥ . (6.2.4)
                            ⎣     0         0      −b_i  ⎦   ⎣ 1  −1 ⎦
Thus, only three multiplications are needed, or 3N for the whole lattice. Since the
lattice works in the downsampled domain, the complexity is 3N/2 multiplications
or, since N = L/2, 3L/4 multiplications/input sample and a similar number of
additions. A further trick consists in denormalizing the diagonal matrix in (6.2.4)
(taking out bi for example) and gathering all scale factors at the end of the lattice.
Then, the complexity becomes (L/2)+1 multiplications/input sample. The number
of additions remains unchanged.
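The factorization (6.2.4) is easily verified numerically:

```python
import numpy as np

alpha = 0.3                          # an arbitrary rotation angle
a, b = np.cos(alpha), np.sin(alpha)

R = np.array([[a, b], [-b, a]])      # the 2x2 rotation

# three-multiplication factorization of (6.2.4)
left  = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
diag  = np.diag([a + b, a - b, -b])  # the only three multiplications
right = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])

assert np.allclose(R, left @ diag @ right)
```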
Table 6.1 summarizes the complexity of various filter banks. Except for the last
entry, time-domain computation is assumed. Note that in the frequency-domain
computation, savings due to symmetries become minor.
This holds because the initial block is followed by two blocks at half rate (which
contributes 2 · C0 /2), four blocks at quarter rate and so on. Thus, while the number
of leaves grows exponentially with K, the complexity only grows linearly with K.
Let us discuss alternatives for the computation of the full tree structure in the
simplest, two-stage case, shown in Figure 6.6(a). It can be transformed into the
four-channel filter bank shown in Figure 6.6(b) by passing the second stage of fil-
ters across the first stage of downsampling. While the structure is simpler, the
length of the filters involved is now of the order of 3L if Hi (z) is of degree L − 1.
Thus, unless the filters are implemented in factorized form, this is more complex
than the initial structure. However, the regular structure might be preferred in
hardware implementations.
Let us consider a Fourier-domain implementation. A simple trick consists of
implementing the first stage with FFT’s of length N and the second stage with
FFT’s of length N/2. Then, one can perform the downsampling in Fourier domain
and then, the forward FFT of the second stage cancels the inverse FFT of the first
stage. The downsampling in Fourier domain requires N/2 additions, since if X[k] is
a length-N Fourier transform, the length-N/2 Fourier transform of its downsampled
version is

Y[k] = (1/2) (X[k] + X[k + N/2]).
Figure 6.6(c) shows the algorithm schematically, where, for simplicity, the filters
rather than the polyphase components are shown. The polyphase implementation
requires to separate even and odd samples in time domain. The even samples are
obtained from the Fourier transform X[k] as
y[2n] = Σ_{k=0}^{N−1} X[k] W_N^{−2nk}
      = Σ_{k=0}^{N/2−1} (X[k] + X[k + N/2]) W_{N/2}^{−nk}, (6.2.5)

y[2n + 1] = Σ_{k=0}^{N−1} X[k] W_N^{−(2n+1)k}
          = Σ_{k=0}^{N/2−1} W_N^{−k} (X[k] − X[k + N/2]) W_{N/2}^{−nk}. (6.2.6)
If the next stage uses a forward FFT of size N/2 on y[2n] and y[2n + 1], the inverse
FFT’s in (6.2.5) and (6.2.6) are cancelled and only the phase shift in (6.2.6) remains.
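Both the Fourier-domain downsampling identity and the even/odd split can be checked numerically; note that the odd samples require the extra phase shift W_N^{−k}:

```python
import numpy as np

N = 16
x = np.random.randn(N)
X = np.fft.fft(x)
k = np.arange(N // 2)

# length-N/2 DFT of the downsampled signal x[2n]
Y = 0.5 * (X[:N//2] + X[N//2:])
assert np.allclose(Y, np.fft.fft(x[0::2]))

# length-N/2 DFT of the odd samples x[2n+1]: the extra factor
# W_N^{-k} = exp(+j 2 pi k / N) is the phase shift that remains
Z = 0.5 * np.exp(2j * np.pi * k / N) * (X[:N//2] - X[N//2:])
assert np.allclose(Z, np.fft.fft(x[1::2]))
```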
Figure 6.6 Two-stage full-tree filter bank. (a) Initial system. (b) Parallelized
system with equivalent filters H_i(z)H_j(z²) followed by downsampling by four.
(c) Fourier-domain computation with implicit cancellation of forward and inverse
transforms between stages. FS stands for Fourier-domain downsampling. Note
that in the first stage the H_i[k] are obtained as outputs of a size-N FFT, while in
the second stage, they are outputs of a size-N/2 FFT.
Octave-Band Trees and Discrete-Time Wavelet Series In this case, we can use
the property of iterated multirate systems which leads to a complexity independent
of the number of stages as seen in (6.1.15). For example, assuming a Fourier-domain
implementation of an elementary two-channel bank which uses about (4 log 2 L) op-
erations/input sample as in (6.2.2), a K-stage discrete-time wavelet series expansion
requires of the order of
8 · log2 L · (1 − 1/2^K) operations

for long filters implemented in Fourier domain, and

4 · L · (1 − 1/2^K) operations (6.2.7)
for short filters implemented in time domain. As mentioned earlier, filters of length
8 or more are more efficiently implemented with Fourier-domain techniques.
Of course, the merging trick of inverse and forward FFT’s between stages can
be used here as well. A careful analysis made in [245] shows that merging of two
stages pays off for filter lengths of 16 or more. Merging of more stages is marginally
interesting for large filters since it involves very large FFT’s, which is probably
impractical. Again, fast running convolution methods are best for medium size
filters (L = 6, . . . , 12) [245]. Finally, all savings due to special structures, such as
orthogonality or linear phase, carry over to tree structures as well.
The study of hardware implementations of discrete-time wavelet transforms is
an important topic as well. In particular, the fact that different stages run at
different sampling rates makes the problem nontrivial. For a detailed study and
various solutions to this problem, see [219].
Figure 6.7 Fast implementation of a size-3 modulated filter bank: delays and
downsampling by three feed the polyphase filters Hpr_i(z) of the prototype,
followed by a size-3 Fourier transform.
where Hpri (z) is the ith polyphase component of the filter Hpr (z) and F 3 is the size-
3 discrete Fourier transform matrix. The implementation is shown in Figure 6.7.
This fast implementation of modulated filter banks using polyphase filters of the
prototype filter followed by a fast Fourier transform is central in several applications
such as transmultiplexers. This fast algorithm goes back to the early 70’s [25]. The
complexity is now substantially reduced. The polyphase filters require N -times less
complexity than a full filter bank, and the FFT adds an order N log2 N operations
per N input samples. The complexity is of the order of
(2 · L/N + 2 · log2 N) operations/input sample, (6.2.11)
that is, a substantial reduction over a single, length-L filtering operation. Further
reductions are possible by implementing the polyphase filters in frequency domain
(reducing the term of order L to log2 L) and merging FFT’s into a multidimensional
one [210]. Another important and efficient filter bank is based on cosine modulation.
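The polyphase-plus-FFT structure can be sketched as follows. The channel convention h_k[n] = h_p[n] e^{j2πkn/N} and the Hamming-window prototype are our assumptions for illustration, not a prescription from the text:

```python
import numpy as np

Nch = 4                        # number of channels = downsampling factor
hp = np.hamming(16)            # hypothetical prototype, length multiple of Nch
x = np.random.randn(64)
Mlen = len(x) // Nch

# direct implementation: modulate the prototype, filter, downsample by Nch
k = np.arange(Nch)[:, None]
n = np.arange(len(hp))[None, :]
hk = hp * np.exp(2j * np.pi * k * n / Nch)     # h_k[n] = hp[n] e^{j2pi kn/N}
direct = np.array([np.convolve(h, x)[::Nch] for h in hk])

# polyphase + FFT implementation
pad = np.concatenate((np.zeros(Nch - 1), x))   # gives access to x[mN - i]
V = []
for i in range(Nch):
    pi = hp[i::Nch]                            # ith polyphase component
    ui = pad[Nch - 1 - i::Nch][:Mlen]          # u_i[m] = x[mN - i]
    V.append(np.convolve(pi, ui))
# y_k[m] = sum_i e^{j2pi ki/N} v_i[m], i.e. a size-Nch inverse DFT per m
fast = Nch * np.fft.ifft(np.array(V), axis=0)

assert np.allclose(direct[:, :Mlen], fast[:, :Mlen])
```

Each polyphase filter runs at 1/Nch of the input rate, and the modulation is absorbed into one FFT per output vector, which is where the savings of (6.2.11) come from.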
It is sometimes referred to as lapped orthogonal transforms (LOT’s) [188] or local
cosine bases [63]. Several possible LOT’s have been proposed in the literature
and are of the general form described in (3.4.17–3.4.18) in Section 3.4.3. Using
Fully Separable Case When both filters and downsampling are separable, then
the system is the direct product of one-dimensional systems. The implementation
is done separately over each dimension. For example, consider a two-dimensional
system filtering an N × N image into four subbands using the filters {H0 (z1 )H0 (z2 ),
H0 (z1 )H1 (z2 ), H1 (z1 )H0 (z2 ), H1 (z1 )H1 (z2 )} each of size L×L followed by separable
downsampling by two in each dimension. This requires N decompositions in one
dimension (one for each row), followed by N decompositions in the other, or a total
of 2N 2 · L multiplications and a similar number of additions. This is a saving of the
order of L/2 with respect to the nonseparable case. Note that if the decomposition
is iterated on the lowpass only (that is, a separable transform), the complexity is
only
C_tot = C + C/4 + C/16 + · · · < (4/3) C,
where C is the complexity of the first stage.
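Separability is easy to verify: filtering the rows and then the columns with a one-dimensional filter h equals two-dimensional convolution with the outer-product filter H(z1)H(z2). A minimal sketch (the 3-tap lowpass is a hypothetical example):

```python
import numpy as np

def conv2d(x, h):
    # direct full two-dimensional convolution
    out = np.zeros((x.shape[0] + h.shape[0] - 1,
                    x.shape[1] + h.shape[1] - 1))
    for i in range(h.shape[0]):
        for j in range(h.shape[1]):
            out[i:i + x.shape[0], j:j + x.shape[1]] += h[i, j] * x
    return out

h = np.array([0.5, 1.0, 0.5])                  # hypothetical 1-D lowpass
x = np.random.randn(6, 6)

# separable implementation: filter the rows, then the columns
rows = np.array([np.convolve(h, r) for r in x])
sep  = np.array([np.convolve(h, c) for c in rows.T]).T

assert np.allclose(sep, conv2d(x, np.outer(h, h)))
```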
2 L (1 − 1/2K ) + Lp
For simplicity, we will omit the subscript “0” and will simply call the lowpass filter
G. The length of G^(i)(z) is equal to

L^(i) = (2^i − 1)(L − 1) + 1.
The first two relations will lead to recursive algorithms, while the last one produces
a doubling algorithm and can be used when iterates which are powers of two are
desired. Computing (6.3.2) as
G^(i)(z) = [ G0(z²) + z^{−1} G1(z²) ] · G^{(i−1)}(z²),

where G0 and G1 are the two polyphase components of the filter G, leads to two
products between polynomials of size L/2 and (2^{i−1} − 1)(L − 1) + 1. Calling
O[G^(i)(z)] the number of multiplications for finding G^(i)(z), we get the recursion
O[G^(i)(z)] = L · L^{(i−1)} + O[G^{(i−1)}(z)]. Again, because G^{(i−1)}(z) takes half as much
complexity as G^(i)(z), we get an order of complexity

O[G^(i)(z)] ≈ 2 · L · L^{(i−1)} ≈ 2^i · L², (6.3.5)
multiplications, and about three times as many additions. This compares favorably
to time-domain evaluation (6.3.5). As usual, this is interesting for medium to large
L’s. It turns out that the doubling formula (6.3.4), which looks attractive at first
sight, does not lead to a more efficient algorithm than the ones we just outlined.
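The recursion and the doubling formula can be sketched as follows (the 4-tap lowpass is a hypothetical example):

```python
import numpy as np

def up(h, k):
    # insert k-1 zeros between samples: G(z) -> G(z^k)
    out = np.zeros(k * (len(h) - 1) + 1)
    out[::k] = h
    return out

def g_iter(g, i):
    # G^(i)(z) = G(z) G^(i-1)(z^2), with G^(1) = G
    gi = g
    for _ in range(i - 1):
        gi = np.convolve(g, up(gi, 2))
    return gi

g = np.array([1.0, 3.0, 3.0, 1.0]) / 8        # hypothetical lowpass
g3 = g_iter(g, 3)
# length of the iterate: L^(i) = (2^i - 1)(L - 1) + 1
assert len(g3) == (2**3 - 1) * (len(g) - 1) + 1

# doubling formula: G^(2i)(z) = G^(i)(z) G^(i)(z^{2^i}), here i = 2
g2, g4 = g_iter(g, 2), g_iter(g, 4)
assert np.allclose(g4, np.convolve(g2, up(g2, 4)))
```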
The savings obtained by the above simple algorithms are especially useful in
multiple dimensions, where the iterates are with respect to lattices. Because mul-
tidimensional wavelets are difficult to design, iterating the filter might be part of
the design procedure and thus, reducing the complexity of computing the iterates
can be important.
Then, the channel signals yi [n] are obtained by Fourier transform from the xi [n]’s
y[n] = F · x[n],
where y[n] = (y0 [n] . . . yN −1 [n] )T , x[n] = (x0 [n] . . . xN −1 [n] )T , and F is the size
N × N Fourier matrix. The complexity per output vector y[n] is L multipli-
cations and about L − N additions (from (6.4.1)) plus a size-N Fourier trans-
form, or, (N/2) log 2 N multiplications and three times as many additions. Since
y[n] has a rate M times smaller than the input, we get the following multi-
plicative complexity per input sample (where K = N/M is the oversampling ra-
tio):
(1/M) (L + N log2 N) = K · (L/N + log2 N),
that is, K times more than in the critically sampled case given in (6.2.11). The
additive complexity is similar (except for a factor of 3 in front of the log2 N ).
Because M < N , the polyphase matrix is nonsquare of size N × M and does not
have a structure as simple as the one given in (6.2.10). However, if N is a multiple
of M , some structural simplifications can be made.
F_i(z) = H1(z^{2^{i−1}}) · ∏_{l=0}^{i−2} H0(z^{2^l}).
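The equivalent filter F_i(z) = H1(z^{2^{i−1}}) ∏_{l} H0(z^{2^l}) can be checked on a two-octave example. With hypothetical Haar-like filters, the octave-band tree output equals the downsampled output of the undecimated equivalent filter F_2(z) = H1(z²)H0(z):

```python
import numpy as np

def up(h, k):
    out = np.zeros(k * (len(h) - 1) + 1)
    out[::k] = h
    return out

h0 = np.array([1.0, 1.0]) / 2      # hypothetical lowpass
h1 = np.array([1.0, -1.0]) / 2     # hypothetical highpass
x = np.random.randn(64)

# tree computation with downsampling (second-octave bandpass output)
low1 = np.convolve(h0, x)[::2]
band2_tree = np.convolve(h1, low1)[::2]

# equivalent (undecimated) filter F_2(z) = H1(z^2) H0(z)
f2 = np.convolve(up(h1, 2), h0)
band2_trous = np.convolve(f2, x)[::4]

assert np.allclose(band2_tree, band2_trous)
```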
6.4. COMPLEXITY OF OVERCOMPLETE EXPANSIONS 373
Figure 6.8 Sampling of the time-scale plane. (a) Sampling in the orthogonal
discrete-time wavelet series. (b) Oversampled time-scale plane in the “algorithme
à trous”. (c) Multiple voices/octave. The case of three voices/octave is shown.
An efficient computational structure simply computes the signals along the tree and
takes advantage of the fact that the filter impulse responses are upsampled, that is,
nonzero coefficients are separated by 2^k zeros. This led to the name “algorithme
à trous” (algorithm with holes) given in [136]. It is immediately obvious that
the complexity of a direct implementation is now 2L multiplications and 2(L − 1)
additions/octave and input sample, since each octave requires filtering by highpass
and lowpass filters which have L nonzero coefficients. Thus, to compute J octaves,
Figure 6.9 Oversampled discrete-time wavelet series. (a) Critically sampled
case. (b) Oversampled case obtained from (a) by deriving the equivalent filters
and skipping the downsampling. This approximates the continuous-time wavelet
transform.
4 · L · J operations/input sample

are required, that is, a linear increase with the number of octaves. The operations can be moved
to Fourier domain to reduce the order L to an order log2 L and octaves can be
merged, just as in the critically sampled case. A careful analysis of the result-
ing complexity is made in [245], showing gains with Fourier methods for filters of
medium length (L ≥ 9).
for m = 0, one can use the standard octave by octave algorithm, involving the
wavelet ψ(t). To get the scales for m = 1, . . . , M − 1, one can use the slightly
stretched versions
ψ^(m)(t) = 2^{−m/2M} ψ(2^{−m/M} t), m = 1, . . . , M − 1.
The tiling of the time-scale plane is shown in Figure 6.8(c) for the case of three
voices/octave (compare this with Figure 6.8(a)). Note that lower voices are over-
sampled, but the whole scheme is redundant in the first place since one voice would
be sufficient. The complexity is M times that of a regular discrete-time wavelet
series, if the various voices are computed in an independent manner.
The parameters of each of the separate discrete-time wavelet series have to be
computed (following Shensa’s algorithm), since the discrete-time filters will not
be “scales” of each other, but different approximations. Thus, one has to find
the appropriate highpass and lowpass filters for each of the m-voice wavelets. An
alternative is to use the scaling property of the wavelet transform. Since
⟨x(t), ϕ(at)⟩ = (1/a) ⟨x(t/a), ϕ(t)⟩,
a
we can start a discrete-time wavelet series algorithm with M signals which are scales
of each other: x_m(t) = 2^{m/2M} x(2^{m/M} t), m = 0, . . . , M − 1. Again, the complexity
is M times higher than a single discrete-time wavelet series. The problem is to find
the initial sequence which corresponds to the projection of the xm (t) onto V0 . One
way to do this is given in [300].
Finally, one can combine the multivoice with the “à trous” algorithm to compute
a dense grid over scales as well as time. The complexity then grows linearly with
the number of octaves and the number of voices, as
4 · L · J · M operations/input sample,
where J and M are the number of octaves and voices respectively. This is an
obvious algorithm, and there might exist more efficient ways yet to be found.
This concludes our discussion of algorithms for oversampled expansions, which
closely followed their counterparts for the critically sampled case.
Figure 6.10 Overlap-add/save running convolution implemented with modulated
filter banks: a size-N analysis bank (pruned to length M) with downsampling by
M, channel multiplications by the constants C0, . . . , C_{N−1}, followed by
upsampling and a size-N synthesis bank.
H(z) = z^{M−1} + z^{M−2} + · · · + z + 1,

H_i(z) = z^{−M+1} · H(W_N^i z),

C_i = (1/N) Σ_{l=0}^{L−1} W_N^{il} c[l].
l=0
6.5. SPECIAL TOPICS 377
G(z) = 1 + z^{−1} + z^{−2} + · · · + z^{−N+1},

G_i(z) = G(W_N^i z).
The algorithm is sketched in Figure 6.10. The proof that it does compute a running
convolution is simply by identification of the various steps with the usual overlap-
add algorithm. Note that the system produces a delay of M − 1 samples, since all
filters are causal.
Figure 6.11 Fast running convolution algorithm with channel filters. The
input-output relationship equals H_tot(z) = z^{−1}(H0(z²) + z^{−1}H1(z²)).
sion can be obtained. Thus, these are unlike the previous algorithms in this chapter
which reduced computations while being exact in exact arithmetic. The idea is that
matrices can be compressed just like images! In applications such as iterative so-
lution of large linear systems, the recurrent operation is a very large matrix-vector
product which has complexity N 2 . If the matrix is the discrete version of an op-
erator which is smooth (except at some singularities), the wavelet transform² can
be used to “compress” the matrix by concentrating most of the energy into well-
localized bands. If coefficients smaller than a certain threshold are set to zero, the
transformed matrix becomes sparse. Of course, we now deal with an approximated
matrix, but the error can be bounded. Beylkin, Coifman and Rokhlin [30] show
that for a large class of operators, the number of coefficients after thresholding is
of order N .
We will concentrate on the simplest version of such an algorithm. Call W the
matrix which computes the orthogonal wavelet transform of a length-N vector. Its
inverse is simply its transpose. If we desire the matrix vector product y = M · x,
we can compute:
y = W^T · (W · M · W^T) · W · x. (6.5.2)
Recall that W · x has a complexity of order L · N , where L is the filter length and
N the size of the vector. The complexity of W · M · W T is of order L · N 2 , and
thus, (6.5.2) is not efficient if only one product is evaluated. However, if we are in
the case of an iterative algorithm, we can compute M′ = W · M · W^T once (at a
cost of LN²) and then use M′ in the sequel. If M′, after thresholding, has order-N
nonzero entries, then the subsequent iterations, which are of the form

y = W^T · M′ · W · x,

are indeed of order N rather than N². It turns out that the computation of M′
itself can be reduced to an order-N problem [30]. An interpretation of M′ is of
interest. Premultiplying M by W is equivalent to taking a wavelet transform of
the columns of M, while postmultiplying M by W^T amounts to taking a wavelet
transform of its rows. That is, M′ is the two-dimensional wavelet transform of M,
where M is considered as an image. Now, if M is smooth, one expects M′ to have
energy concentrated in some well-defined and small regions.
zero moments of the wavelets play an important role in concentrating the energy,
as they do in image compression. This short discussion only gave a glimpse of these
powerful methods, and we refer the interested reader to [30] and the references
therein for more details.
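A toy version of this scheme can be sketched as follows. The recursively built Haar transform matrix and the smooth kernel with a diagonal singularity are our choices for illustration; the point is that thresholding the transformed matrix leaves it sparse while the matrix-vector product stays accurate:

```python
import numpy as np

def haar_matrix(n):
    # orthogonal Haar wavelet transform matrix, n a power of two
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0]) / np.sqrt(2)                 # averages
    bot = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2)   # differences
    return np.vstack((top, bot))

np.random.seed(0)
N = 64
W = haar_matrix(N)
assert np.allclose(W @ W.T, np.eye(N))          # W is orthogonal

# a kernel that is smooth away from its diagonal singularity
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
M = 1.0 / (1.0 + np.abs(i - j))

Mt = W @ M @ W.T                                # 2-D wavelet transform of M
Ms = np.where(np.abs(Mt) > 1e-3, Mt, 0.0)       # threshold small coefficients
assert (Ms != 0).sum() < N * N                  # the matrix got sparser

x = np.random.randn(N)
y_exact = M @ x
y_approx = W.T @ (Ms @ (W @ x))                 # sparse iteration of (6.5.2)
rel_err = (np.linalg.norm(y_exact - y_approx)
           / np.linalg.norm(y_exact))
assert rel_err < 0.05
```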
² Since this will be a matrix operation of finite dimension, we call it a wavelet transform rather
than a discrete-time wavelet series.
PROBLEMS
6.1 Toeplitz matrix-vector products: Given a Toeplitz matrix T of size N × N , and a vector x
of size N, show that the product T x can be computed in order N log2 N operations.
The method consists in extending T into a circulant matrix C. What is the minimum size
of C, and how does it change if T is symmetric?
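For readers who want to check a solution, here is one possible sketch of the suggested circulant-embedding method (function and variable names are ours):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    # y = T x for the Toeplitz T with first column c and first row r
    # (r[0] == c[0]), via embedding into a circulant of size 2N - 1
    # and using the FFT
    n = len(x)
    col = np.concatenate((c, r[1:][::-1]))       # first circulant column
    y = np.fft.ifft(np.fft.fft(col)
                    * np.fft.fft(np.concatenate((x, np.zeros(n - 1)))))
    return y[:n].real

# check against a dense Toeplitz product
n = 8
c, r = np.random.randn(n), np.random.randn(n)
r[0] = c[0]
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.randn(n)
assert np.allclose(toeplitz_matvec(c, r, x), T @ x)
```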
F_{BNM} = F_N ⊗ I_M,

where F_N is the size-N Fourier matrix, I_M is the size-M identity matrix and ⊗ is the
Kronecker product (2.3.2).
6.4 Complexity of MUSICAM filter bank: The filter bank used in MUSICAM (see also Sec-
tion 7.2.3) is based on modulation of a single prototype of length 512 to 32 bandpass filters.
For the sake of this problem, we assume a complex modulation by W_32^{nk}, that is,

h_k[n] = h_p[n] W_32^{nk}, W_32 = e^{−j2π/32},
and thus, the filter bank can be implemented using polyphase filters and an FFT (see
Section 6.2.3). In a real MUSICAM system, the modulation is with cosines and the imple-
mentation involves polyphase filters and a fast DCT, thus it is very similar to the complex
case we analyze here. Assuming an input sampling rate of 44.1 kHz, give the number of
operations per second required to compute the filter bank.
6.6 Overlap-add/save filter banks: Consider a size-4 modulated filter bank downsampled by 2
and implementing an overlap-add or overlap-save running convolution (see Figure 6.10 for an example).
6.7 Consider a 3-channel analysis/synthesis filter bank downsampled by 2, with filtering of the
channels (see Figure 3.18). The filters are given by
Verify that the overall system is shift-invariant and performs a convolution with a filter
having the z-transform F(z) = (F_0(z²) + z^{−1} F_1(z²)) z^{−1}.
7

Signal Compression and Subband Coding
The compression of signals, which is one of the main applications of digital signal
processing, uses signal expansions as a major component. Some of these expansions
were discussed in previous chapters, most notably discrete-time expansions via filter
banks. When the channels of a filter bank are used for coding, the resulting scheme
is known as subband coding. The reasons for expanding a signal and processing it
in transform domain are numerous. While source coding can be performed on the
original signal directly, it is usually more efficient to find an appropriate transform.
By efficient we mean that for a given complexity of the encoder, better compression
is achieved.
The first useful property of transforms, or “generalized” transforms such as sub-
band coding, is their decorrelation property. That is, in the transform domain, the
transform coefficients are not correlated, which is equivalent to diagonalizing the
autocovariance matrix of the signal, as will be seen in Section 7.1. This diagonal-
ization property is similar to the convolution property (or the diagonalization of
circulant matrices) of the Fourier transform as we discussed in Section 2.4.8. How-
ever, the only transform that achieves exact diagonalization, the Karhunen-Loève
transform, is usually impractical. Many other transforms come close to exact di-
agonalization and are therefore popular, such as the discrete cosine transform, or,
appropriately designed subband or wavelet transforms. The second advantage of
transforms is that the new domain is often more appropriate for quantization using
perceptual criteria. That is, the transform domain can be used to distribute errors
in a way that is less objectionable to the human user. For example, in speech
and audio coding, the frequency bands used in subband coding might mimic opera-
tions performed in the inner ear and thus one can exploit the reduced sensitivity or
even masking between bands. The third advantage of transform coding is that the
previous features come at a low computational price. The transform decomposition
itself is computed using fast algorithms as discussed in Chapter 6, quantization in
the transform domain is often simple scalar quantization, and entropy coding is
done on a sample-by-sample basis.
Together, these advantages produced successful compression schemes for speech,
audio, images and video, some of which are now industry standards (32 Kbits/sec
subband coding for high-quality speech [192], AC [34, 290], PAC [147], and MUSI-
CAM for audio [77, 279], JPEG for images [148, 327], MPEG for video [173, 201]).
It is important to note that the signal expansions on which we have focused so far
are only one of the three major components of such compression schemes. The other
two are quantization and entropy coding. This three-part view of compression will
be developed in detail in Section 7.1, together with the strong interaction that exists
among them. That is, in a compression context, there is no need for designing the
“ultimate” basis function system unless adequate quantization and entropy coding
are matched to it. This interplay, while fairly obvious, is often insufficiently stressed
in the literature. Note that this section is a review and can be skipped by readers
familiar with basic signal compression.
Section 7.2 concentrates on one-dimensional signal compression, that is, speech
and audio coding. Subband methods originated from speech compression research,
and for good reasons: Dividing the signal into frequency bands imitates the human
auditory system well enough to be the basis for a series of successful coders.
Section 7.3 discusses image compression, where transform and subband/
wavelet methods hold a preeminent position. It turns out that representing images
at multiple resolutions is a desirable feature in many systems using image compres-
sion such as image databases, and thus, subband or wavelet methods are a popular
choice. We also discuss some new schemes which contain wavelet decompositions
as a key ingredient.
Section 7.4 adds one more dimension and discusses video compression. While
straight linear transforms have been used, they are outperformed by methods using
a combination of motion based modeling and transforms. Again, a multiresolution
feature is often desired and will be discussed.
Section 7.5 discusses joint source-channel coding using multiresolution source
decompositions and matched channel coding. It turns out that several upcoming
applications, such as digital broadcasting and transmission over highly varying
channels such as wireless channels or channels corresponding to packet-switched
networks, can benefit from such joint source-channel coding.
7.1 Compression Systems Based on Linear Transforms
Figure 7.1 Compression system based on a linear transform. (a) The input x is transformed (T) into coefficients y, quantized (Q) into ŷ, and entropy coded (E) into the bit stream c. (b) Expanded view: the transform maps the input samples x_0, ..., x_{N−1} into coefficients y_0, ..., y_{N−1}, each quantized by Q_i into ŷ_i and entropy coded by E_i into c_i.
In this section, we will deal with compression systems, as given in Figure 7.1(a).
The linear transformation (T) is the first step in the process which includes quan-
tization (Q) and entropy coding (E). Quantization introduces nonlinearities in the
system and results in loss of information, while entropy coding is a reversible pro-
cess. A system as given in Figure 7.1 is termed an open-loop system, since there
is no feedback from the output to the input. On the other hand, a closed-loop
system, such as the DPCM (see Figure 7.5), includes the quantization in the loop.
We mostly concentrate on open-loop systems, because of their close connection
to signal expansions. Following Figure 7.1, we start by discussing various linear
transforms with an emphasis on the optimal Karhunen-Loève transform, followed
by quantization, and end up briefly describing entropy coding methods. We try to
emphasize the interplay among these three parts, as well as indicate the importance
of perceptual criteria in designing the overall system. Our discussion is based on
the excellent text by Gersho and Gray [109], to which we refer for more details.
This chapter uses results from statistical signal processing, which are reviewed in
Appendix 7.A.
Let us here define the measures of quality we will be using. First, the mean
square error (MSE), or distortion, equals

    D = (1/N) ∑_{i=0}^{N−1} E(|x_i − x̂_i|²),   (7.1.1)
where xi are the input values and x̂i are the reconstructed values. For a zero-mean
input, the signal-to-noise ratio (SNR) is given by

    SNR = 10 log₁₀ (σ²/D),   (7.1.2)
where D is as given in (7.1.1) and σ² is the input variance. The peak signal-to-noise
ratio (SNR_p) is defined as [138]

    SNR_p = 10 log₁₀ (M²/D),   (7.1.3)

where M is the maximum peak-to-peak value of the signal (typically 255 for 8-bit
images). Distortion measures based on squared error have shortcomings when
assessing the quality of a coded signal such as an image. An improved distortion
measure is a perceptually weighted mean square error. Even better are distortion
models which include masking. These distortion metrics are signal specific, and
some of them will be discussed in conjunction with practical compression schemes
in later sections.
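As a small illustration (a NumPy sketch with signal statistics of our own choosing), the three measures (7.1.1)-(7.1.3) can be computed directly:

```python
import numpy as np

def mse(x, x_hat):
    """Distortion D of (7.1.1): mean squared error between input and reconstruction."""
    return np.mean(np.abs(x - x_hat) ** 2)

def snr_db(x, x_hat):
    """SNR of (7.1.2) for a zero-mean input, in dB."""
    return 10 * np.log10(np.var(x) / mse(x, x_hat))

def psnr_db(x, x_hat, peak=255.0):
    """Peak SNR of (7.1.3); peak is the maximum peak-to-peak value (255 for 8-bit images)."""
    return 10 * np.log10(peak ** 2 / mse(x, x_hat))

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
x_hat = x + 0.1 * rng.standard_normal(10_000)  # reconstruction with additive error
print(snr_db(x, x_hat))  # about 20 dB, since var(x)/var(error) = 1/0.01
```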
and can abbreviate it simply as x. From now on, we will assume that the process
is zero-mean and thus its autocorrelation and autocovariance are the same, that is,
K[n, m] = R[n, m]. The autocovariance matrix of the input vector x is
K x = E(x · xT ).
Again, since the process is wide-sense stationary and zero-mean, K[n, m] = K[n −
m] = R[n − m] (see Appendix 7.A). Therefore, the matrix K x has the following
form:

    K_x = [ R[0]      R[1]      . . .   R[N−1]
            R[1]      R[0]      . . .   R[N−2]
              ⋮          ⋮        ⋱        ⋮
            R[N−1]    R[N−2]    . . .   R[0]    ].
This matrix is Toeplitz, symmetric (see Section 2.3.5), and nonnegative definite
since all of its eigenvalues are greater than or equal to zero (this holds in general for
autocorrelation matrices). Consider now the transformed vector y,
y = T x, (7.1.4)
λ0 ≥ λ1 ≥ · · · ≥ λN −1 ≥ 0, (7.1.6)
where the last inequality holds because K x is nonnegative definite. Moreover, since
K x is symmetric, there is a complete set of orthonormal eigenvectors (see Section
2.3.2). Take T as
    T = [v_0 v_1 . . . v_{N−1}]^T,   (7.1.7)

then, from (7.1.5),

    K_y = T · K_x · T^T = T · T^T · Λ = Λ,   (7.1.8)
where the last equality follows from the fact that T is a unitary transform, that is, the MSE
is conserved between transform and original domains. Keeping only the first k coefficients
means that ŷi = yi for i = 0, . . . , k − 1 and ŷi = 0, for i = k, . . . , N − 1. Then the MSE
equals

    D_k = (1/N) ∑_{i=0}^{N−1} E((y_i − ŷ_i)²) = (1/N) ∑_{i=k}^{N−1} E(y_i²) = (1/N) ∑_{i=k}^{N−1} λ_i,
and this is smaller than or equal to the distortion incurred by discarding any other set of
N − k coefficients, because of the ordering in (7.1.6). Recall here that the assumption of zero mean still holds.
Another way to say this is that the first k coefficients contain most of the energy
of the transformed signal. This is the “energy packing” property of the Karhunen-
Loève transform. Actually, among all unitary transforms, the KLT is the one that
packs most energy into the first k coefficients.
There are two major problems with the KLT, however. First, the KLT is signal
dependent, since it depends on the autocovariance matrix. Second, it is computa-
tionally complex, since no structure can be assumed for T , and no fast algorithm
can be used. This leads to order N² operations for applying the transform.
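A small numerical sketch (our own construction, using the AR(1) covariance discussed with the DCT) of how the KLT is obtained from the autocovariance matrix, and of its diagonalization and energy-packing behavior:

```python
import numpy as np

# Autocovariance matrix of a first-order Gauss-Markov process, unit variance.
N, rho = 8, 0.9
K_x = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT: the rows of T are the orthonormal eigenvectors of K_x,
# ordered by decreasing eigenvalue as in (7.1.6)-(7.1.7).
lam, V = np.linalg.eigh(K_x)        # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]
lam, T = lam[order], V[:, order].T

# K_y = T K_x T^T is diagonal (7.1.8): the transform coefficients are decorrelated.
K_y = T @ K_x @ T.T
assert np.allclose(K_y, np.diag(lam), atol=1e-10)

# Energy packing: the first few eigenvalues carry most of the energy.
print(np.cumsum(lam) / lam.sum())   # the first coefficients dominate
```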
    y_k = √(2/N) ∑_{n=0}^{N−1} x_n cos( 2π(2n+1)k / 4N ),   k = 1, . . . , N − 1.   (7.1.11)
The DCT was developed [2] as an approximation for the KLT of a first-order Gauss-
Markov process with a large positive correlation coefficient ρ (ρ → 1). In this case,
K_x is of the following form (assuming unit variance and zero mean):

    K_x = [ 1     ρ     ρ²    ρ³    · · ·
            ρ     1     ρ     ρ²    · · ·
            ρ²    ρ     1     ρ     · · ·
            ⋮     ⋮     ⋮     ⋮     ⋱    ].
For large ρ’s, the DCT approximately diagonalizes K x . Actually, the DCT (as well
as some other transforms) is asymptotically equivalent to the KLT of an arbitrary
wide-sense stationary process when the block size N tends to infinity [294]. It
should be noted that even if the assumptions do not hold exactly (images are not
first-order Gauss-Markov), the DCT has proven to be a robust approximation to
the KLT, and is used in several standards for speech, image and video compression
as we shall see.
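To make the approximation concrete, one can measure how much of the covariance energy the DCT leaves off the diagonal for an AR(1) process (a sketch with our own choice of N and ρ; the matrix below is the standard orthonormal DCT-II, which matches (7.1.11) up to normalization of the k = 0 row):

```python
import numpy as np

N, rho = 8, 0.95
n, k = np.meshgrid(np.arange(N), np.arange(N))

# Orthonormal DCT-II matrix: row k samples cos(pi*(2n+1)*k/(2N)), suitably scaled.
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0] /= np.sqrt(2)                      # DC row gets 1/sqrt(N) instead
assert np.allclose(C @ C.T, np.eye(N))  # C is unitary

K_x = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
K_y = C @ K_x @ C.T                     # covariance of the DCT coefficients

off_diag = K_y - np.diag(np.diag(K_y))
ratio = np.sum(off_diag ** 2) / np.sum(K_y ** 2)
print(f"off-diagonal energy fraction: {ratio:.4f}")  # small: DCT nearly diagonalizes K_x
```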
The DCT also has shortcomings. One must block the input stream in order to
perform the transform and this blocking is quite arbitrary. The block boundaries
often create not only loss of compression (correlation across the boundaries is not
removed) but also annoying blocking effects. This is one of the reasons for using
lapped transforms and subband or wavelet coding schemes. However, the goal of
these generalized transforms is the same, namely, to create decorrelated outputs
from a correlated input stream, and then to quantize the outputs separately.
Figure 7.2 Uniform quantizer with step size Δ = 1: staircase input-output characteristic y = Q(x), with output levels −3, ..., 3 and overload regions beyond x = ±5/2.
Note that nonorthogonal systems (such as linear phase biorthogonal filter banks)
are usually designed to almost satisfy (7.1.12). If they do not, there is a risk that
small errors in the transform domain are magnified after reconstruction. The key
problem now is to design the set of quantizers so as to minimize E(‖y − ŷ‖²).
7.1.2 Quantization
While we deal with discrete-time signals in this chapter, the sample values are real
numbers, that is, continuously distributed in amplitude. In order to achieve com-
pression, we need to map the real value of samples into a discrete set, or discrete
alphabet. This process of mapping the real line into a countable discrete alphabet
is called quantization. In practical situations, the sample values are mapped into
a finite alphabet. An excellent treatment of quantization can be found in [109].
In its simplest form, each sample is individually quantized, which is called scalar
quantization. A more powerful method consists in quantizing several samples at
once, which is referred to as vector quantization. Also, one can quantize the differ-
ence between a signal and a suitable prediction of it, and this is called predictive
quantization. We would like to stress here that the results on optimal quantization
for a given signal are well-known, and can be found in [109, 143].
that the number of intervals is finite. Thus, there are two unbounded intervals
which correspond to what is called “overload” regions of the quantizer, that is, for
x < −5/2 and x > 5/2. Given that the number of intervals is N, there are N
output symbols. Thus, R = ⌈log₂ N⌉ bits are needed to represent the output of
the quantizer, and this is called the rate. The operation of selecting the interval is
sometimes called coding, while assigning the output value y_i for the interval I_i is
called decoding. Thus, we have a two-step process:

    x ∈ (x_{i−1}, x_i]  −(coder)→  i  −(decoder)→  y_i.
The performance of a quantizer is measured as the distance between the input and
the output; typically, the squared error (x − x̂)² is used.
Given an input distribution, the worst-case or, more often, the average distortion is measured.
Thus, the MSE is

    D = E(|x − x̂|²) = ∑_i ∫_{x_{i−1}}^{x_i} (x − y_i)² f_X(x) dx,   (7.1.13)
where f_X(x) is the probability density function (pdf) of x. For example, assume a
uniform input pdf and a bounded input with N intervals; then uniform quantization
with intervals of width Δ and y_i = (x_i + x_{i−1})/2 leads to an MSE equal to

    D = Δ²/12.   (7.1.14)
The derivation of (7.1.14) is left as an exercise (see Problem 7.1). The error due to
quantization is called quantization noise,

    e[n] = x̂[n] − x[n],

if x and x̂ are the input and the output of the quantizer, respectively. While e[n]
is a deterministic function of x[n], it is often modeled as a noise process which is
uncorrelated to the input, white and with a uniform sample distribution. This is
called an additive noise model, since x̂[n] = x[n] + e[n]. While this is clearly an
approximation, it is a fair one in the case of high-resolution uniform quantization
(when Δ is much smaller than the standard deviation σ of the input signal and N
is large).
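A quick numerical check of the Δ²/12 rule and of the additive-noise model (a sketch; the Gaussian input and the value of Δ are our own choices, placed in the high-resolution regime Δ ≪ σ):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)     # zero-mean input, sigma = 1
delta = 0.05                         # high-resolution regime: delta << sigma

x_hat = delta * np.round(x / delta)  # uniform (midtread) quantizer
e = x_hat - x                        # quantization noise, so x_hat = x + e

print(np.var(e), delta**2 / 12)      # the two are close
print(np.corrcoef(x, e)[0, 1])       # nearly 0: noise ~ uncorrelated with the input
```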
Uniform quantization, while not optimal for nonuniform input pdf’s, is very
simple and thus often used in practice. One design parameter, besides the quanti-
zation step Δ, is the number of intervals, or the boundaries which correspond to the
overload region. Usually, they are chosen as a multiple of the standard deviation σ
of the input pdf (typically, 4 σ away from the mean). Given constant boundaries
a and b, then Δ = (b − a)/N . Thus, Δ decreases as 1/N = 1/2R where R is the
number of bits of the quantizer. The distortion D is of the form (following (7.1.14))
    D = Δ²/12 = (b − a)²/(12N²) = σ² · 2^{−2R} = C · 2^{−2R},   (7.1.15)

since σ² = (b − a)²/12 for a uniform input pdf. In general, C is a function of σ² and
depends on the distribution. This means that the SN R goes up by 6 dB for every
additional bit in the quantizer. To see that, add a bit to R, R′ = R + 1. Then

    SNR′ = 10 log₁₀ ( σ² / (C · 2^{−2R′}) ) = SNR + 10 log₁₀ 4 ≈ SNR + 6 dB.
When the pdf is not uniform, optimal quantization will not be uniform either. An
optimal MSE quantizer is one that minimizes D in (7.1.13) for a given number
of output symbols N . For a quantizer to be MSE optimal, it has to satisfy the
following two necessary conditions [109]:
(a) Nearest neighbor condition For a given set of output levels, the optimal parti-
tion cells are such that an input is assigned to the nearest output level. For
MSE minimization, this leads to the midpoint decision level between every two
adjacent output levels.
(b) Centroid condition Given a partition of the input, the optimal decoding lev-
els with respect to the MSE are the centroids of the intervals, that is, yi =
E(x | x ∈ Ii ).
Note that such a quantizer is not necessarily optimal for compression since it
does not take into account entropy coding.² The two conditions are sketched in
Figure 7.3. Both conditions are intuitive, and can be used to verify optimality of a
quantizer or actually design an optimal one. This is done in the Lloyd algorithm,
which iteratively improves a codebook for a given pdf and a number of codewords
N (the pdf can be given analytically or through measurements). Starting with some
initial codebook {y_i^{(0)}}, it alternates between
²A suitable modification, called entropy-constrained quantization, takes entropy into account in the design of the quantizer.
Figure 7.3 Optimality conditions for scalar quantizers: (a) nearest neighbor condition: the decision level x_i lies midway between the output levels y_i and y_{i+1}; (b) centroid condition: y_i is the centroid of the pdf f_X(x) restricted to the interval I_i = (x_{i−1}, x_i].
(a) Given {y_i^{(n)}}, find the partition {x_i^{(n)}}, based on the nearest neighbor condition.

(b) Given {x_i^{(n)}}, find the next codebook {y_i^{(n+1)}}, satisfying the centroid condition.
and stops when the distortion D^{(n)} is only marginally improved. The resulting quantizer is called
a Lloyd-Max quantizer.
The above discussion assumed quantization of a continuous variable into a dis-
crete set. Often, a discrete input set of size M has to be quantized into a set of
size N < M . A “discrete” version of the Lloyd algorithm, which uses the same
necessary conditions (nearest neighbor and centroid), can then be used.
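A compact sketch of the discrete Lloyd algorithm on a set of training samples (a toy setup of our own: a Gaussian training set, a four-level codebook, and a fixed iteration count; a real design would use a better initialization and a distortion-based stopping rule):

```python
import numpy as np

def lloyd(samples, codebook, iters=50):
    """Discrete Lloyd algorithm: alternate nearest neighbor and centroid steps."""
    for _ in range(iters):
        # (a) Nearest neighbor: assign each sample to its closest output level.
        idx = np.argmin(np.abs(samples[:, None] - codebook[None, :]), axis=1)
        # (b) Centroid: each output level becomes the mean of its cell.
        for i in range(len(codebook)):
            cell = samples[idx == i]
            if cell.size:
                codebook[i] = cell.mean()
    return np.sort(codebook)

rng = np.random.default_rng(0)
samples = rng.standard_normal(50_000)
init = np.linspace(-3, 3, 4)            # initial codebook {y_i^(0)}
codebook = lloyd(samples, init.copy())
print(codebook)  # close to the 4-level Lloyd-Max quantizer for a Gaussian (about +/-0.45, +/-1.51)
```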
While the above method yields quantizers with minimum distortion for a given
codebook size, entropy coding was not considered. We will see that if entropy
coding is used after quantization, a uniform quantizer can actually be attractive.
Vector Quantization While vector quantization (VQ) [109, 120] is much more
than just a generalization of scalar quantization to multiple dimensions, we will
only look at it in this restricted way in our brief treatment. Figure 7.4(a) shows a
regular vector quantizer for a two-dimensional variable. Note that the partition of
the square is into convex³ regions and the separation into regions is performed using
straight lines (in N dimensions, these would be hyperplanes of dimension N − 1).
There are several advantages of vector quantizers over scalar quantizers. For the
sake of discussion, we consider a two-dimensional case, but it obviously generalizes
to N dimensions.
³Convex means that if two points x and y belong to one region, then all the points on the straight line connecting x and y will belong to the same region as well.
Figure 7.4 (a) Regular vector quantizer in two dimensions (x_0, x_1). (b) Two-dimensional pdf which is uniform on the shaded regions: scalar quantization of each variable cannot exploit the dependency between x_0 and x_1, while vector quantization can.
(a) Packing gain Even if two variables are independent, there is gain in quantizing
them together. The reason is that there exist better partitions of the space
than the rectangular partition obtained when we separately scalar quantize
each variable. For example, in two dimensions, it is well-known that hexagonal
tiling achieves a smaller MSE than the square tiling for the quantization of
uniformly distributed random variables, given a certain density. The packing
gain increases with dimensionality.
(b) Removal of linear and nonlinear dependencies While linear dependencies could
be removed using a linear transformation, VQ also removes nonlinear depen-
dencies. To see this, let us consider the classic example shown in Figure 7.4(b).
The two-dimensional probability density function equals 2 in shaded areas and
0 otherwise. Because the marginal distributions are uniform, scalar quantiza-
tion of each variable is uniform. Vector quantization “understands” the de-
pendency, and only allocates partitions where necessary. Thus, instead of 4.0
bits, or, 2.0 bits/sample for the scalar quantization, we obtain 3.0 bits, or, 1.5
bits/sample for the vector quantization, reducing the bit rate by 25% while
keeping the same distortion (see Figure 7.4(b)).
(c) Fractional bit rate At low bit rates, choosing between 1.0 bits/sample or 2.0
bits/sample is a rather crude choice. By quantizing several samples together
and allocating an integer number of bits to the group, fractional bit rates can
be obtained.

Figure 7.5 Predictive quantization (DPCM): (a) open-loop prediction; (b) closed-loop prediction, with the quantizer inside the loop and a predictor P(z) in both the encoder and the decoder.
For a vector quantizer to be MSE optimal, it has to satisfy the same two con-
ditions we have seen for scalar quantizers, namely:
A codebook satisfying these two necessary conditions is locally optimal (small per-
turbations will not decrease D) but is usually not globally optimal. The design
of VQ codebooks is thus a sophisticated technique, where a good initial guess is
crucial and is followed by an iterative procedure. To escape local minima,
stochastic relaxation is used. For details, we refer to [109].
A drawback of VQ is its complexity, which limits the size of vectors that can
be used. One solution is to structure the codebook so as to simplify the search of
the best matching vector, given the input. This is achieved with tree-structured
VQ. Another approach is to use linear transforms (including subband or wavelet
transforms) and apply VQ to the relevant transform coefficients. Finally, lattice VQ
uses multidimensional lattices as a partition, allowing large vectors with reasonable
complexity, since lattice VQ is the equivalent of uniform quantization in multiple
dimensions.
    G = σ_x² / σ_d².
Note that when the quantization is coarse, this can be quite different from the
open-loop prediction gain, which is the equivalent relation but with the prediction
as in Figure 7.5(a). For practical reasons, the predictor P(z) in the closed-loop
case is usually chosen as in the open-loop case; that is, we use the predictor
coefficients that are optimal for the true past L samples of the signal.
A further improvement involves adaptive prediction, and can be used both in
the open-loop and in the closed-loop cases. The predictor is updated every K
samples based on the local signal characteristics and sent to the decoder as side
information.
Linear predictive quantization is used successfully in speech and image com-
pression (both in the open-loop and closed-loop forms). In video, a special form of
adaptive DPCM, over time, involves motion-based prediction called motion com-
pensation, which is discussed in Section 7.4.2.
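A minimal closed-loop DPCM sketch (our own first-order example: the predictor is a single coefficient a, and a uniform quantizer stands in for Q; the decoder mirrors the encoder's loop, so the reconstruction error is just the residual quantization error):

```python
import numpy as np

def dpcm_encode(x, a=0.9, delta=0.1):
    """Closed-loop DPCM: predict from the *reconstructed* past, quantize the residual."""
    d_hat, pred = np.empty_like(x), 0.0
    for n in range(len(x)):
        d = x[n] - pred                          # prediction error
        d_hat[n] = delta * np.round(d / delta)   # quantized residual (uniform Q)
        pred = a * (pred + d_hat[n])             # predictor runs on reconstructed samples
    return d_hat

def dpcm_decode(d_hat, a=0.9):
    x_hat, pred = np.empty_like(d_hat), 0.0
    for n in range(len(d_hat)):
        x_hat[n] = pred + d_hat[n]
        pred = a * x_hat[n]                      # same predictor as in the encoder loop
    return x_hat

# On an AR(1) signal, the residual variance is much smaller than the signal variance.
rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)
x = np.zeros(10_000)
for n in range(1, 10_000):
    x[n] = 0.9 * x[n - 1] + w[n]
d_hat = dpcm_encode(x)
x_hat = dpcm_decode(d_hat)
print(np.var(x) / np.var(d_hat))                 # closed-loop prediction gain, > 1
print(np.max(np.abs(x - x_hat)))                 # bounded by delta/2, no error accumulation
```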
Bit Allocation Looking back at the transform coding diagram in Figure 7.1, the
obvious question is: How do we choose the quantizers for the various transform
coefficients? This is a classical resource allocation problem, where one tries to
maximize (or minimize) a cost function which describes the quality of approximation
under the constraint of finite resources, that is, a given number of bits that can be
used to code the signal. Let us first recall an important fact: The total squared
error between the input and the output is the sum of individual errors because the
transform is unitary. To see that, call x and x̂ the input and reconstructed input,
respectively. Then y and ŷ will be the input and the output of the quantizer. That
is,
y = T x, x̂ = T T ŷ,
where the last equation holds since the transform T is unitary, that is, T^T T =
T T^T = I. Then the total distortion is conserved, E(‖x − x̂‖²) = E(‖y − ŷ‖²), and
decomposes into a sum of the D_i, where D_i is the expected squared error of the ith coefficient. Then, the bit allocation
problem is to minimize
    D = ∑_{i=0}^{N−1} D_i,   (7.1.16)

under the constraint

    ∑_{i=0}^{N−1} R_i ≤ R,   (7.1.17)
Figure 7.6 (a) Rate-distortion function of a source (convex curve) together with operational rate-distortion points (crosses) achieved by a practical system. (b) Rate-distortion curves of two coefficients, with distortions D_0, D_1 obtained at rates R_0, R_1.
where R is the total budget and Ri the number of bits allocated to the ith coefficient.
A dual situation appears when a maximum allowable distortion is given and the
rate has to be minimized. Before considering specific allocation procedures, we will
discuss some aspects of optimal solutions.
The fundamental trade-off in quantization is between rate (number of bits used)
and distortion (approximation error) and is formalized as rate-distortion theory
[28, 121]. A rate-distortion function for a given source specified by a statistical
model precisely indicates the possible trade-off. While rate-distortion bounds are
usually not closely met in practice, implementable systems have a similar behavior.
Figure 7.6(a) shows a possible rate-distortion function as well as points reached by
a practical system (called an operational rate-distortion curve). Note that the true
rate-distortion function is convex, while the operational one is not necessarily.
For example, for high-resolution scalar quantization, the distortion D_i is related
to the rate R_i as (see (7.1.15))

    D_i(R_i) = C · σ_i² · 2^{−2R_i}.   (7.1.18)

All we assume is that both rate and distortion are additive. This is, for example, the case
in transform coding if the coefficients are independent. How shall we allocate bits
to each variable so as to minimize distortion? It is important to note that in a
rate-distortion problem, we have to consider both rate and distortion in order to
be optimal. Since the two dimensions are not related (one is bits and the other is
MSE), we use a new cost function L combining the two through a positive Lagrange
multiplier λ:
    L = D + λ · R,    L_i = D_i + λ · R_i,   i = 0, 1,
    ∂D_i(R_i)/∂R_i = C′ · σ_i² · 2^{−2R_i},

with C′ = −2 ln 2 · C. The constant-slope solution, that is, ∂D_i(R_i)/∂R_i = −λ,
forces the rates to be of the following form:

    R_i = α + log₂ σ_i.
    ∑_{i=0}^{N−1} R_i = N · α + ∑_{i=0}^{N−1} log₂ σ_i = R,
we find
    α = R/N − (1/N) ∑_{i=0}^{N−1} log₂ σ_i,
and
    R_i = R/N + log₂ σ_i − (1/N) ∑_{j=0}^{N−1} log₂ σ_j = R̄ + log₂ (σ_i/ρ),   (7.1.19)
where R̄ = R/N is the mean rate and ρ is the geometric mean of the standard deviations,

    ρ = ( ∏_{i=0}^{N−1} σ_i )^{1/N}.
The result of this allocation procedure is intuitive, since the number of quantization
levels allocated to the ith quantizer,

    2^{R_i} = (2^{R̄}/ρ) · σ_i,

is proportional to the standard deviation of that coefficient.
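The closed-form allocation (7.1.19) is easy to compute; a sketch (the variances are our own choices, and no rounding or non-negativity constraint is applied, which a practical allocator would add):

```python
import numpy as np

def allocate_bits(variances, R_total):
    """Optimal high-resolution bit allocation (7.1.19): R_i = Rbar + log2(sigma_i / rho)."""
    sigma = np.sqrt(np.asarray(variances, dtype=float))
    rho = np.exp(np.mean(np.log(sigma)))        # geometric mean of the sigma_i
    R_bar = R_total / len(sigma)
    return R_bar + np.log2(sigma / rho)

R = allocate_bits([16.0, 4.0, 1.0, 0.25], R_total=8)
print(R)                       # -> [3.5 2.5 1.5 0.5]: high-variance coefficients get more bits
assert np.isclose(R.sum(), 8)  # the budget constraint (7.1.17) is met with equality
```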
That is, the next bit is allocated to where it is most needed. Since Di can be given
in analytical form or measured on a training set, this algorithm is easily applicable.
More sophisticated algorithms, optimal or near optimal, are based on Lagrange
methods applied to arbitrary rate-distortion curves [262].
Coding Gain Now that we have discussed quantization and bit allocation, we
can return to our study of transform coding and see what advantage is obtained by
doing quantization in the transform domain (see Figure 7.1).
First, recall that the Karhunen-Loève transform leads to uncorrelated variables
with variance λi (see (7.1.8)). Assume that the input to the transform is zero-mean
Gaussian with variance σx2 , and that fine quantization is used. This leads us to
Proposition 7.2.
PROOF
After the KLT with optimal scalar quantization and bit allocation, the total distortion for
all N channels is (following (7.1.20)),
    D_KLT = N · C · 2^{−2R̄} · ρ² = N · C · 2^{−2R̄} ( ∏_{i=0}^{N−1} λ_i )^{1/N},   (7.1.21)
where C = √3 π/2 (see (7.1.18)). Since the determinant of a matrix is equal to the product
of its eigenvalues, the last term is equal to (det(K_x))^{1/N}, where K_x is the autocovariance
matrix (assuming zero mean, K_x = R_x). To prove the optimality of the KLT, we need
the following inequality for the determinant of an autocorrelation matrix of N zero-mean
variables with variances σ_i² [109]:

    det(R_x) ≤ ∏_{i=0}^{N−1} σ_i²,   (7.1.22)
with equality if and only if Rx is diagonal. It turns out that the more correlated the
variables are, the smaller the determinant.
Consider now an arbitrary orthogonal transform, with transform variables having
variance σ_i². The distortion is

    D_T = N · C · 2^{−2R̄} ( ∏_{i=0}^{N−1} σ_i² )^{1/N}.
Because of (7.1.22) and the fact that the determinant is conserved by unitary transforms,
this is greater than or equal to N · C · 2^{−2R̄} · (det(R_x))^{1/N}.
Since the KLT diagonalizes the autocovariance matrix, equality is reached by the KLT, following
(7.1.21). This proves that if the input to the transform is Gaussian and the quantization is
fine, the KLT is optimal among all unitary transforms.
What is the gain we just obtained? If the samples are directly quantized, the
distortion will be

    D_PCM = N · C · 2^{−2R̄} · σ_x²,   (7.1.23)
(where PCM stands for pulse code modulation, that is, sample-by-sample quanti-
zation) and the coding gain due to optimal transform coding is
    D_PCM / D_KLT = σ_x² / ( ∏_{i=0}^{N−1} σ_i² )^{1/N} = ( (1/N) ∑_{i=0}^{N−1} σ_i² ) / ( ∏_{i=0}^{N−1} σ_i² )^{1/N},   (7.1.24)
where we used the fact that N · σ_x² = ∑_{i=0}^{N−1} σ_i². Recalling that the variances σ_i² are
the eigenvalues of Rx , it follows that the coding gain is the ratio of the arithmetic
and geometric means of the eigenvalues of the autocorrelation matrix (under the
zero-mean assumption). The lower bound on the gain is 1, which is attained only
if all eigenvalues are identical.
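As a numerical sketch (an AR(1) covariance with parameters of our own choosing), the coding gain (7.1.24) is exactly the ratio of the arithmetic and geometric means of the eigenvalues of the autocovariance matrix:

```python
import numpy as np

N, rho = 8, 0.9
K_x = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
lam = np.linalg.eigvalsh(K_x)           # eigenvalues = KLT coefficient variances

arithmetic = lam.mean()                 # equals sigma_x^2 = 1 here (unit diagonal)
geometric = np.exp(np.mean(np.log(lam)))
gain = arithmetic / geometric
print(10 * np.log10(gain), "dB")        # strictly positive for this correlated input
```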
Subband coding, being a generalization of transform coding, has a similar be-
havior. If the input is Gaussian, the channel signals are Gaussian as well. If the
filters are ideal bandpass filters, the channels will be decorrelated. In any case, the
distortion is

    D_SBC = N · C · 2^{−2R̄} · ρ²,
where ρ² is the geometric mean of the subband variances. Using (7.1.23) for direct
quantization we get, similarly to (7.1.24), the subband coding gain as
    D_PCM / D_SBC = ( (1/N) ∑_{i=0}^{N−1} σ_i² ) / ( ∏_{i=0}^{N−1} σ_i² )^{1/N},
where the σi2 ’s are the subband variances. That is, if the spectrum is far from
being flat, there will be a large coding gain in subband methods. This is to be
expected, since it becomes possible to match the spectral characteristics of the
signal very closely, unlike in a sample-domain quantization. It is worthwhile to note
that when the number of channels grows to infinity, both transform and subband
coding achieve the theoretical performance of predictive coding with infinitely long
predictor [143].
The obvious question is of course how do transform and subband coding com-
pare? The ratio of D_KLT and D_SBC is

    D_KLT / D_SBC = ρ²_KLT / ρ²_SBC,
that is, the one with the smaller geometric mean wins. Qualitatively, the one with
the larger spread in variances will achieve better coding gain. The exact comparison
thus requires measurements of variances in specific transforms (such as the DCT)
versus filter banks (of finite length rather than ideal ones).
While the above considerations use some idealized assumptions, the concept
holds true in general: The wider the variations between the component signals
(transform coefficients or subbands), the higher the potential for coding gain. More
about the above can be found in [5, 220, 273, 292, 295].
English language while reserving long codes to less frequent ones. The parameters
in searching for the mapping M are the probabilities of occurrence of the symbols
ai , p(ai ). If the quantized variable is stationary, these probabilities are fixed, and a
fixed mapping such as Huffman coding can be used. If the probabilities evolve over
time, more sophisticated adaptive methods such as adaptive arithmetic coding can
be used. Such mappings will transform fixed-length codewords into variable-length
ones, creating a variable-length bit stream. If a constant bit rate channel is used,
buffering has to smooth out variations so as to accommodate the fixed-rate channel.
Huffman Coding Given an alphabet {ai } of size M and its associated probabil-
ities of occurrence p(ai ), the goal is to find a mapping bi = F (ai ) such that the
average length E(l(b_i)) is minimized:

    E(l(b_i)) = ∑_{i=0}^{M−1} p(a_i) l(b_i).   (7.1.25)
The lower bound on the average length is given by the entropy of the source,

    H_a = − ∑_{i=0}^{M−1} p(a_i) log₂(p(a_i)).   (7.1.26)
Huffman’s construction elegantly meets the prefix condition while coming quite
close to the entropy lower bound. The design is guided by the following property
of optimum binary prefix codes: The two least probable symbols have codewords
of equal length which differ only in the last symbol.
The design of the Huffman code is best looked at as growing a binary tree
from the leaves up to the root. The codeword will be the sequence of zeros and
ones encountered when going from the root to the leaf corresponding to the desired
symbol. Start with a list of the probabilities of the symbols. Then, take the two
least probable symbols and make them two nodes with branches (labeled “0” and
“1”) to a common node which represents a new symbol. The new symbol has a
probability which is the sum of the two probabilities of the merged symbols. The
new list of symbols is now shorter by one. Iterate until only one symbol is left. The
codewords can now be read off along the branches of the binary tree. Note that
at every step we have used the property of optimum binary prefix codes: the
codewords of the two least probable symbols have equal length and share a common
prefix.
7.1. COMPRESSION SYSTEMS BASED ON LINEAR TRANSFORMS 405
a_i   p(a_i)   b_i
0     0.40     0
1     0.20     100
2     0.15     101
3     0.10     110
4     0.10     1110
5     0.05     1111

Figure 7.7 Huffman code derived from a binary tree, corresponding to the symbol
probabilities given in Table 7.1. The construction merges, in order: 4 with 5
(probability 0.15), then 3 with (4+5) (probability 0.25), then 1 with 2
(probability 0.35), and finally (1+2) with (3+(4+5)) (probability 0.60), which
joins symbol 0 at the root.
Note that the average length E(l(b_i)) given in (7.1.25) reaches the theoretical
lower bound given by the entropy (7.1.26) only if the symbol probabilities are
negative powers of two. This is a limitation of Huffman coding, which can be
surmounted by using arithmetic coding. It is
more complicated to implement and, in its simplest form, it also requires a priori
knowledge of symbol probabilities. If the source matches the probabilities used to
design the arithmetic coder, then the rate approaches the entropy arbitrarily closely
for long sequences. See [24] and [109] for more details.
Adaptive Entropy Coding While the above approaches come close to the entropy
of a known stationary source, they fail if the source is not well-known or changes
significantly over time. A possible solution is to estimate the probabilities on the
fly (by counting occurrences of the symbols at both the encoder and decoder) and
modify the Huffman code accordingly. While this seems complicated at first sight, it
turns out that only minor modifications are necessary, since only a single probability
is affected by an entering symbol [105, 109].
Arithmetic coding can be modified as well, in order to estimate probabilities
on the fly. This adaptive version is known as a Q-coder [221]. Finally, Ziv-Lempel
coding [342] is an elegant lossless coding technique which uses no a priori proba-
bilities. It builds up a dictionary of encountered subsequences in such a way that
the decoder can build the same dictionary. Then, the encoder sends only the index
to an encountered entry. The dictionary size is fixed and the index uses a fixed
number of bits. Thus, Ziv-Lempel coding maps variable-size input sequences
into fixed-size codewords, a dual of the Huffman code. The only limitation of the
Ziv-Lempel code is its fixed-size dictionary, which leads to loss in performance when
very long sequences are encoded. No new entries can be created once the dictionary
is full and the remainder of the sequence has to be coded with the current entries.
Modifications of the basic algorithm allow for dictionary updates. Since there
are many variations on this theme, we refer to [24] for a thorough discussion.
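A minimal sketch of such a dictionary scheme (an LZ78-style variant; names and details ours, not the exact algorithm of [342]) shows how the decoder rebuilds the dictionary from the transmitted pairs alone:

```python
def lz78_encode(s):
    """Emit (dictionary index, next symbol) pairs; index 0 is the empty string.
    Both sides grow the dictionary identically, so no table is transmitted."""
    dictionary = {"": 0}
    out, w = [], ""
    for ch in s:
        if w + ch in dictionary:
            w += ch                      # extend the current match
        else:
            out.append((dictionary[w], ch))
            dictionary[w + ch] = len(dictionary)
            w = ""
    if w:                                # leftover match at the end of input
        out.append((dictionary[w[:-1]], w[-1]))
    return out

def lz78_decode(pairs):
    dictionary = [""]
    out = []
    for idx, ch in pairs:
        entry = dictionary[idx] + ch
        out.append(entry)
        dictionary.append(entry)         # decoder grows the same dictionary
    return "".join(out)
```

Note that this sketch grows the dictionary without bound; the fixed-size variant discussed above would stop adding entries once the dictionary is full.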
7.1.4 Discussion
So far we have separately considered the three building blocks of a transform coder
as depicted in Figure 7.1. Some interaction between the transform and the quan-
tization was discussed when proving the optimality of the KLT. Including entropy
coding after quantization can change the way quantization should be done. In
the high-rate, memoryless case, uniform quantization followed by entropy coding
turns out to be better than using nonuniform quantization and fixed codewords
[109]. However, this leads to variable-rate schemes and thus requires buffering
when fixed-rate channels are used. This is done with a finite-size buffer, which has
a nonzero probability of overflow. Therefore, a buffer control algorithm is needed.
This usually means moving to coarser quantization when the buffer is close to over-
flow and finer quantization in the underflow case. Obviously, in the overflow control
case, there is a loss in performance in such variable-rate schemes. The size of the
buffer is limited for cost reasons, but also because of the delay it produces in a
real-time transmission case.
Our discussion has focused on MSE-based coding, but we indicated that it
extends readily to weighted MSE. Such weights are usually based on perceptual
criteria [141, 142], and will be discussed later. We note that certain “tricks” such
as the dead zone quantizers used in image compression (uniform quantizers with a
zone around zero larger than the step size that maps to the origin) are heuristics
derived from experiments that are not optimal in the sense discussed so far, but
which produce visually more pleasing images.
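As an illustration, such a dead zone quantizer can be sketched as follows (parameter names ours):

```python
import math

def deadzone_quantize(x, step, dz):
    """Uniform quantizer with a central dead zone of width dz (dz > step):
    inputs in [-dz/2, dz/2] map to zero, which zeroes out the many small
    high-frequency coefficients; outside the dead zone the bins are uniform
    with the given step, reconstructed at the bin midpoints."""
    if abs(x) <= dz / 2:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    k = math.floor((abs(x) - dz / 2) / step)   # bin index past the dead zone
    return sign * (dz / 2 + (k + 0.5) * step)

print(deadzone_quantize(0.9, 1.0, 2.0))   # inside the dead zone -> 0.0
print(deadzone_quantize(1.2, 1.0, 2.0))   # first bin outside -> 1.5
```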
(b) Masking properties of dominant sounds over weaker ones within a critical band
and over nearby bands, as given by a spreading function.
7.2. SPEECH AND AUDIO COMPRESSION 409
Figure 7.8 Critical bands of the auditory system: bandpass filters' magnitude
response on a logarithmic frequency axis.
Figure 7.9 Generic perceptual coder for high-quality audio compression: the
input feeds a filter bank whose outputs are quantized and entropy coded, while
a parallel spectral analysis drives the masking threshold calculation that
controls the quantizers (after [146]).
Perceptual Coders A perceptual coder for transparent coding of audio will at-
tempt to keep quantization noise just below the level where it would become no-
ticeable. Quantization noise within a critical band has to be controlled and an easy
way to do that is to use a subband or transform coder. Also, permissible quanti-
zation noise levels have to be calculated and this is based on some form of spectral
analysis of the input. Therefore, a generic perceptual coder for audio is as depicted
410 CHAPTER 7
Band number   Lower edge (Hz)   Center (Hz)   Upper edge (Hz)   BW (Hz)
1 0 50 100 100
2 100 150 200 100
3 200 250 300 100
4 300 350 400 100
5 400 450 510 110
6 510 570 630 120
7 630 700 770 140
8 770 840 920 150
9 920 1000 1080 160
10 1080 1170 1270 190
11 1270 1370 1480 210
12 1480 1600 1720 240
13 1720 1850 2000 280
14 2000 2150 2320 320
15 2320 2500 2700 380
16 2700 2900 3150 450
17 3150 3400 3700 550
18 3700 4000 4400 700
19 4400 4800 5300 900
20 5300 5800 6400 1100
21 6400 7000 7700 1300
22 7700 8500 9500 1800
23 9500 10500 12000 2500
24 12000 13500 15500 3500
25 15500 19500
in Figure 7.9. Note that one can use the analysis filter bank as a spectrum analyzer
or calculate a separate spectrum estimation. Usually, the two are integrated for
computational reasons.
A filter bank implementing critical bands exactly is computationally infeasible.
Instead, one approximates the roughly logarithmic behavior with an initial
octave-band filter bank, using short-time Fourier-like banks within the octaves
to obtain finer analysis at reasonable computational cost.
sible example is shown in Figure 7.10, where LOT stands for lapped orthogonal
Figure 7.10 Filter bank example for the analysis part in a perceptual coder
for audio. (a) Architecture: a cascade of 2-channel filter banks splits the
0-24 kHz band into 12-24, 6-12, 3-6 and 0-3 kHz bands, each refined by an
8-channel LOT (16-channel for the 0-3 kHz band). (b) Frequency resolution.
transforms and also refers to cosine-modulated filter banks5 (Section 3.4.3). Re-
cently, Princen has proposed to use nonuniform modulated filter banks [227]. They
are near perfect reconstruction and since they are a straightforward extension of
the cosine-modulated filter banks, they are computationally efficient. High-quality
audio coding usually does not have to meet delay constraints and thus the delay
due to the filter bank is not a problem. Typically, very long filters are used in order
to get excellent band discrimination, and to avoid aliasing as much as possible since
aliasing is perceptually very disturbing in audio.
The next step consists of estimating the masking thresholds within the bands.
Typically, a fast Fourier transform is performed in parallel with the filter bank.
Based on the signal energy and spectral flatness within a critical band, the max-
imum tolerable quantization noise level can be estimated. Typically, single tones
can be identified, their associated masking function derived, and thus, the allow-
able quantization steps follow. Bands which have amplitudes below this maximum
step can be disregarded altogether. For a detailed description of the perceptual
threshold calculations, refer to [145].

5 Note that this filter bank is known under many names, such as LOT, MLT, MDCT,
TDAC, Princen & Bradley filter bank, cosine-modulated filter bank [188, 229, 228].

Figure 7.11 Magnitude response of the 32-channel filter bank used in MUSICAM.

Note that this quantization procedure is quite
different from an MSE-based approach as discussed in Section 7.1.2, where only the
variances within bands mattered. Sometimes, the perceptual and MSE approaches
are combined. A first pass allocates an initial number of bits so as to satisfy the
minimum perceptual requirements, while a second pass distributes remaining bits
according to the usual MSE criteria.
The quantization and bit allocation is recalculated for every new segment of the
input signal, and sent as side information to the decoder. Because entropy coding
is used on the quantized subband samples, the bit stream has to be buffered if fixed
rate transmission is intended. Note that not all systems use entropy coding (for
example, MUSICAM does not).
7.2.3 Examples
Various applications such as digital audio broadcasting (DAB) require CD-quality
audio (44.1 kHz sampling and 16 bits/sample). This led to the development of
medium-compression, high-quality standards for audio coding.
PAC Coder An interesting coder for high-quality compression of audio is the PAC
(Perceptual Audio Coder) coder [147]. In its stereo version, it has been proposed
for digital audio broadcasting as well as for a nonbackward compatible MPEG-II
audio compression system.
The coder has the basic blocks that are typical of many perceptual coders,
given in Figure 7.9. The signal goes through a filter bank and a perceptual model.
Then the outputs of the filter bank and the perceptual model are fed into PCM
quantization, Huffman coding and rate control.
The filter bank is based on the cosine modulated banks presented in Sec-
tion 3.4.3, with window switching. The psychoacoustic analysis provides a noise
threshold for L (Left), R (Right), S (Sum) and D (Difference) channels, where
block transforms such as the lapped orthogonal transform. This leads naturally to
a description of the current image compression standard based on the DCT, called
JPEG [148, 327], indicating some of the constraints of a “real-world” compression
system.
We continue by discussing pyramid coding, which is a very simple but flexi-
ble image coding method. A detailed treatment of subband/wavelet image coding
follows. Several important issues pertaining to the choice of the filters, the decom-
position structure, quantization and compression are discussed and some examples
are given.
Following these standard coding algorithms, we describe some more recent and
sometimes exploratory compression schemes which use multiresolution as an in-
gredient. These include image compression methods based on wavelet maximums
[184], and a method using adaptive wavelet packets [15, 233]. We also discuss
some recent work on a successive approximation method for image coding using
subband/wavelet trees [259], quantization error analysis in a subband system [331],
joint design of quantization and filtering for subband coding [161], and nonorthog-
onal subband coding [200].
Note that in all experiments, we use the standard image Barbara, with 512 × 512
pixels and 8-bit gray-scale values (see Figure 7.13). For comparison purposes, we
will use the peak signal-to-noise ratio (SNR_p) given by (7.1.3).
Block Transforms Recall that unitary block transforms of size N ×N are defined
by N orthonormal basis vectors, that is, the transform matrix T has these basis
vectors as its rows (see Section 3.4.1 and (7.1.4)). For two-dimensional signals, one
usually takes a separable transform which corresponds to the Kronecker product of
Figure 7.13 Standard image used for the image compression experiments,
called Barbara. The size is 512 × 512 pixels and 8 bits/pixel.
T with itself,

    T_{2D} = T ⊗ T.
In other words, this separable transform can be evaluated by taking one-dimensional
transforms along the rows and columns of a block B of an image. This can be
written as:
    B_T = T B T^T,
where the first product corresponds to transforming the columns, while the second
product computes the transform on rows of the image block. Many transforms have
been proposed for the coding of images. Besides the DCT given in (7.1.10–7.1.11),
the sine, slant, Hadamard and Haar transform are common candidates, the last
two mainly because of their low computational complexity (only additions and sub-
tractions are involved). All of the transforms have fast, O(N log N ) algorithms, as
opposed to the optimal KLT which has O(N 2 ) complexity and is signal dependent.
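The separable evaluation B_T = T B T^T can be sketched directly (pure-Python helpers; names ours):

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix; the rows are the N basis vectors."""
    return [[math.cos(math.pi * (2 * j + 1) * k / (2 * n)) *
             (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             for j in range(n)] for k in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def separable_transform(T, B):
    """B_T = T B T^T: a 1-D transform along the columns (T B), followed by
    a 1-D transform along the rows (multiplication by T^T)."""
    return matmul(matmul(T, B), transpose(T))
```

For a constant 8 × 8 block, only the DC coefficient B_T[0][0] is nonzero, illustrating the averaging filter mentioned below.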
The performance of the DCT in image compression is sufficiently close to that of
the KLT as well as superior to other transforms so that it has become the standard
transform. Figure 7.14 shows the 8 × 8 DCT transform of the original image. Note
the two representations shown. In part (a), we display the transform of each block
of the image, while part (b) has gathered all coefficients of the same frequency into
a block. This latter representation is simply a subband interpretation of the DCT;
for example, the lower left corner is the output of a filter which takes the average
of 8 × 8 blocks. The similarity of this representation with subband-decomposed
images is obvious. Note that for quantization and entropy coding purposes, the
representation (a) is preferred.
7.3. IMAGE COMPRESSION 417
Figure 7.14 8 × 8 DCT transform of the original image. On the left is the
usual block-by-block representation and on the right is the reordering of the
coefficients so that same frequencies appear together (subband interpretation
of DCT). The lowest frequency is in the lower left corner.
The quantization in the DCT domain is usually scalar and uniform. The lowest
two-dimensional frequency component, called the DC coefficient, is treated with
particular care. According to (7.1.10), it corresponds to the local average of the
block. Mismatches between blocks often lead to the feared blocking effect, that
is, the boundaries between the blocks become visible, a visually annoying artifact.
Because the DC coefficient has the highest energy, a fine scalar quantization leads
to a large entropy. Also, as can be seen in Figure 7.14(b), there is still high correla-
tion among DC coefficients (it resembles the original image). Therefore, predictive
quantization, such as the DPCM, of the DC coefficients is often used to increase
compression without increasing distortion.
The choice of the quantization steps for the various coefficients of the DCT is
a classic bit-allocation problem, since distortion and rate are additive. However,
perceptual factors are very important and careful experiments lead to quantization
matrices which take into account the visibility of errors (besides the variance and en-
tropy of the coefficients). While this has the flavor of a weighted MSE bit-allocation
method, it relies heavily on experimental results. An example quantization matrix,
showing the quantizer step sizes used for various DCT coefficients in JPEG, is given
in Table 7.3 [148]. What is particularly important is the relative size of the steps,
because within a certain range one can scale this quantization matrix, that is, mul-
tiply all step sizes by a scale factor greater or smaller than one in order to reduce
or increase the bit rate, respectively. This scale factor is very useful for adaptive
quantization, where the bit allocation is made between blocks which have various
Table 7.3 Example quantization matrix: quantizer step sizes for the DCT
coefficients in JPEG (luminance) [148].

16  11  10  16   24   40   51   61
12  12  14  19   26   58   60   55
14  13  16  24   40   57   69   56
14  17  22  29   51   87   80   62
18  22  37  56   68  109  103   77
24  35  55  64   81  104  113   92
49  64  78  87  103  121  120  101
72  92  95  98  112  100  103   99
energy levels. Then, one can think of this scale factor as a “super” quantizer step
and the goal is to choose the sequence of scale factors that will minimize the total
distortion given a certain budget. Each block has its rate-distortion function and
thus, the scale factors can be chosen according to the constant-slope rule described
in Section 7.1.2. Sometimes, scale factors are fixed for a number of blocks (called
macro-block) in order to reduce the overhead.
Of course, bit allocation is done by taking entropy coding into account, which
we describe next. As in subband coding, higher frequency coefficients have lower
energy and thus have high probability to be zero after quantization. In particular,
the conditional probability of a high-frequency coefficient to be zero, given that its
predecessors are zero, is close to one. Therefore, there will be runs of zeros, in par-
ticular up to the terminal coefficient. To take better advantage of this phenomenon
in a two-dimensional transform, an ordering of the coefficients called zig-zag scan-
ning is used (see Figure 7.15(a)). Very often, a long stretch of zeros terminates
the sequence (see Figure 7.15(b)) and then an “end of block” (EOB) can be sent
instead. The nonzero values and the run lengths are entropy coded (typically using
Huffman or arithmetic codes).
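The zig-zag order itself is easy to generate: coefficients are visited by antidiagonal, alternating direction (a sketch; the convention matches the usual JPEG scan):

```python
def zigzag_order(n=8):
    """Zig-zag scan order for an n x n block, as (row, col) pairs from the
    DC coefficient to the highest frequency. Coefficients are sorted by
    antidiagonal; odd antidiagonals are walked downward (row increasing),
    even ones upward (column increasing)."""
    def key(rc):
        d = rc[0] + rc[1]
        return (d, rc[0] if d % 2 else rc[1])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

print(zigzag_order(4)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```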
Note that DCT coding is used not only on images, but also in video cod-
ing. While the same principles are used, specific quantization and entropy coding
schemes have to be developed, as will be seen in Section 7.4.2.
The coding of color images is performed on a component-by-component basis,
Figure 7.15 (a) Zig-zag scanning of the 8 × 8 DCT coefficients, from the DC
coefficient through AC(7,7). (b) The 63 AC coefficients often terminate in a
run of zeros, which is replaced by an end-of-block (EOB) symbol.
that is, after transformation into an appropriate color space such as the luminance
and two chrominance components. The components are coded individually with a
lesser weighting of the errors in the chrominance components.
(d) Lossless encoding: this mode actually does not use the DCT, but predictive
encoding based on a causal neighborhood of three samples.
We will only discuss the sequential encoding mode in its simplest version which
is called the baseline JPEG coder. It uses a size 8 × 8 DCT, which was found to
be a good compromise between coding efficiency (large blocks) and avoidance of
blocking effects (small blocks). This holds true for the typical imagery and bit
rates for which JPEG is designed, such as the 512 × 512 Barbara image compressed
to 0.5 bits/pixel. Note that other types of imagery might use other DCT sizes.
The input is assumed to be 8 bits (typical for regular images) or 12 bits (typical
for medical images). Color components are treated separately. After the DCT transform, the
quantization uses a carefully designed set of uniform quantizers. Their step sizes
are stored in a quantization table, where each entry is an integer belonging to the
set {1, . . . , 255}. An example was shown in Table 7.3. Quantization is performed
by rounding the DCT coefficient divided by the step size to the nearest integer. At
the decoder, this rounded value is simply multiplied by the step size. Note that the
quantization tables are based on visual experiments, but since they can be specified
by the user, they are not part of the standard.
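The rounding rule can be sketched in a few lines (helper names ours):

```python
def quantize_block(block, qtable):
    """Encoder: round each DCT coefficient divided by its step size."""
    return [[int(round(b / q)) for b, q in zip(brow, qrow)]
            for brow, qrow in zip(block, qtable)]

def dequantize_block(indices, qtable):
    """Decoder: multiply each rounded value back by its step size."""
    return [[i * q for i, q in zip(irow, qrow)]
            for irow, qrow in zip(indices, qtable)]
```

With the DC step of 16 from Table 7.3, a coefficient of 100.0 is sent as round(100/16) = 6 and reconstructed as 96, so the error is at most half a step size.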
Zig-zag scanning follows quantization and finally entropy coding is performed.
First, the DC coefficient (the average of 64 samples) is differentially encoded, that
is, Δ_l = DC_l − DC_{l−1} is entropy coded. This removes some of the correlation
left between DC coefficients of adjacent blocks. Then, the sequence of remaining
DCT coefficients is entropy coded. Because of the high probability of stretches
of consecutive zeros, run-length coding is used. A symbol pair (L, A) specifies the
length of the run (0 to 15) and the amplitude range (number of bits, 0, . . . , 10) of the
following nonzero value. Then follows the nonzero value (which has the previously
specified number of bits). For example, (15, 7) would mean that we have 15 zeros
followed by a number requiring seven bits.
Runs longer than 15 samples simply use a value A equal to zero, signifying con-
tinuation of the run, and the pair (0, 0) stands for end of block (no more nonzero
values in this block). Finally, the pairs (L, A) are Huffman coded with a table spec-
ified by the user (default tables are suggested, but can be replaced). The nonzero
values following a run of zeros are now so-called variable-length integers specified
by the preceding value A. These are not Huffman coded because of insufficient gain
in view of the complexity.
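These rules can be sketched as follows (a simplified model of the symbol stream, not the exact bit-level JPEG syntax):

```python
def run_size_symbols(ac):
    """Map the zig-zag-ordered AC coefficients of one block to a list of
    (run, size, value) triples: `run` zeros precede a nonzero `value` coded
    on `size` bits. (15, 0, None) continues a run longer than 15 samples,
    and (0, 0, None) is the end-of-block (EOB) symbol."""
    symbols, run = [], 0
    for c in ac:
        if c == 0:
            run += 1
            continue
        while run > 15:                 # long zero run: continuation symbol
            symbols.append((15, 0, None))
            run -= 16
        symbols.append((run, abs(c).bit_length(), c))
        run = 0
    if run > 0:                         # trailing zeros collapse into EOB
        symbols.append((0, 0, None))
    return symbols
```

For example, the coefficients [5, 0, 0, -1, 0, 0, ...] become (0, 3, 5), (2, 1, -1), EOB.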
The decoder now operates as follows: Based on the Huffman coding table,
it entropy decodes the incoming bit stream, and using the quantization table, it
“dequantizes” the transform domain values. Finally, an inverse DCT is applied to
reconstruct the image.
Figure 7.16 schematically shows a JPEG encoder. An example of the Barbara
image coded with the baseline JPEG algorithm is shown in Figure 7.17 at the rate
of 0.5 bits/pixel and SNR_p = 28.26 dB.
Figure 7.16 Transform coding following the JPEG standard. The encoder is
shown: the image is divided into 8 × 8 blocks which go through the DCT, the
quantizer and the entropy encoder. The decoder performs entropy decoding,
inverse quantization and an inverse DCT (after [327]).
Figure 7.18 Pyramid encoder and decoder with quantization: the coarse version
x_c is quantized by Q_c and interpolated to form the prediction x_p; the
difference x_d = x − x_p is quantized by Q_d.
case, it will simply be the maximum error of the quantizer Qd (typically half the
largest quantization interval). The property holds also for multilevel pyramids if
one uses quantization error feedback [303]. As can be seen from Figure 7.19, the
trick is to use only quantized coarse versions in the prediction of a finer version.
Thus, the same prediction can be obtained in the decoder as well and the source
of quantization noise can be limited to the last quantizer Qd0 . Note that quantizer
error feedback requires the reconstruction of x̂c1 in the encoder, and is thus more
complex than an encoder without feedback and adds encoding delay.
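A one-dimensional sketch makes the point concrete (a crude decimator and sample-and-hold interpolator stand in for the D and I operators of the figures; names ours):

```python
def down(x):                 # stands in for the operator D
    return x[::2]

def up(x):                   # stands in for the operator I (sample-and-hold)
    return [v for v in x for _ in range(2)]

def uniform_q(step):
    """Uniform quantizer with the given step size."""
    return lambda x: [step * round(v / step) for v in x]

def pyramid_encode(x, qc, qd):
    """One pyramid level with quantization error feedback: the prediction is
    built from the *quantized* coarse version, exactly as the decoder will
    rebuild it, so the only reconstruction error comes from qd."""
    xc_hat = qc(down(x))                            # quantized coarse (sent)
    xd = [a - b for a, b in zip(x, up(xc_hat))]     # prediction residual
    return xc_hat, qd(xd)                           # quantized difference (sent)

def pyramid_decode(xc_hat, xd_hat):
    return [a + b for a, b in zip(up(xc_hat), xd_hat)]

x = [0.3 * i for i in range(16)]
xc_hat, xd_hat = pyramid_encode(x, uniform_q(1.0), uniform_q(0.25))
x_hat = pyramid_decode(xc_hat, xd_hat)
# reconstruction error bounded by half the fine step, regardless of qc
assert max(abs(a - b) for a, b in zip(x, x_hat)) <= 0.125 + 1e-12
```

Making the coarse quantizer much coarser changes the rate split between the two channels, but not the reconstruction error bound, which is the property claimed above.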
Figure 7.19 Two-level pyramid encoder with quantization error feedback: each
prediction is derived from the quantized coarse version (such as x̂_c1), so the
reconstruction error is confined to the last quantizer Q_d0.
The total number of samples in a two-dimensional pyramid built on an N × N
image is bounded by

    N^2 (1 + 1/4 + 1/4^2 + ···) ≤ (4/3) N^2,
as was given in (3.5.4). This oversampling of up to 33% has often been considered
as a drawback of pyramid coding (in one dimension, the overhead is 100% and thus
a real problem). However, it does not prohibit efficient coding a priori and the
other attractive features such as the control of quantization noise, quality of coarse
pictures, and robustness counterbalance the oversampling problem.
Bit Allocation The problem of allocating bits to the various quantizers is tricky in
pyramid coders, especially when quantization noise feedback is present. The reason
is that the independence assumption used in the optimal bit allocation algorithm
derived in Section 7.1.2 does not hold. Consider Figure 7.18 and assume a choice
of quantizers for Qc and Qd . Because the choice for Qc influences the prediction
xp and thus the variable to be quantized xd , there is no independence between the
choices for Qc and Qd . For example, increasing the step size of Qc not only increases
the distortion of x̂c , but also of x̂d (since its variance will probably increase). Thus,
in the worst case, one might have to search all possible pairs of quantizers for xc
and xd and find the best performing pair given a certain bit budget. It is clear that
this search grows exponentially as the number of levels increases, since we have K^l
possible l-tuples of quantizers, where K is the number of quantizers at every level
and l is the number of levels. Even if quantization error feedback is not used, there
is a complication because the total error squared is not the sum of the errors ec and
ed squared (see (7.1.16)), since the pyramid decomposition is not unitary (unless
an ideal lowpass filter is assumed). A discussion of dependent quantization and its
application to pyramid coding can be found in [232].
Figure 7.20 Sampling lattices and their spectra: (a) separable sampling by two
in each dimension, (b) quincunx sampling, (c) hexagonal sampling, together with
possible ideal lowpass filters.
where the sampling density is reduced by a factor of four for the separable sampling,
two for the quincunx sampling (see also Appendix 3.B) and by a factor of four
for the hexagonal sampling. The repeated spectrums in Fourier domain due to
downsampling appear on the dual lattice, which is given by the transposed inverse
of the lattice matrix. Also shown in Figure 7.20 are possible ideal lowpass filters that
6
Recall from Appendix 3.B, that a given sampling lattice may have infinitely many matrix
representations.
will avoid aliasing when downsampling to these sublattices. If, as we said, images
have circularly symmetric power spectrums that decrease with higher frequencies,
then the quincunx lowpass filter will retain more of the original signal’s energy than
a separable lowpass filter (which would be one-dimensional since the downsampling
is by two). Using the same argument, the hexagonal lowpass filter is then better
than the corresponding lowpass filter in a separable system with downsampling by
two in each dimension. Thus, these nonseparable systems, while being more difficult
to design and more complex to implement, represent a better match to usual image
spectrums.
Furthermore, the simple quincunx case has the following perceptual advantage:
The human visual system is more accurate in horizontal and vertical high frequen-
cies than along diagonals. The lowpass filter in Figure 7.20(b) conserves horizontal
and vertical frequencies, while it cuts off diagonals to half of their original range.
This is a good match to the human eye and often, the highpass channel (which is
complementary to the lowpass channel) can be disregarded altogether. That is, a
compression by a factor of two can be achieved with no visible degradation. Such
preprocessing has been used in intraframe coding of HDTV [12]. The above quin-
cunx scheme is often iterated on the lowpass channel, leading to a frequency decom-
position as shown in Figure 7.21. This actually corresponds to a two-dimensional
nonseparable wavelet decomposition [163] and has been used for image compression
[14].
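The quincunx sublattice itself is simply the "checkerboard" of samples whose coordinate sum is even, half the density of the original grid (a sketch; names ours):

```python
def quincunx_downsample(img):
    """Keep the samples on the quincunx sublattice (n1 + n2 even) and zero
    the rest; exactly half of the samples survive. (A real coder would
    re-index the kept samples rather than store the zeros.)"""
    return [[v if (r + c) % 2 == 0 else 0
             for c, v in enumerate(row)]
            for r, row in enumerate(img)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(quincunx_downsample(img))  # checkerboard pattern of kept samples
```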
The hexagonal system, besides having a fairly good approximation to a circu-
larly symmetric lowpass, has three directional channels which can be used to detect
directional edges [264]. However, the goal of an isotropic analysis is only approx-
imated, since the horizontal and vertical directions are not treated in the same
manner (see Figure 7.20(c)). Therefore, it is not clear if the added complexity of a
nonseparable four-channel system based on the hexagonal sublattice is justified for
coding purposes.
Choice of Filters Unlike in audio compression, the filters for image subband cod-
ing do not need high out-of-band rejection. Instead, a number of other constraints
have to be satisfied.
Linear phase In regular image filtering, the need for linear phase is well-known since
without linear phase, the phase distortion around edges is very visible. Therefore,
the use of linear phase filters in subband coding has been often advocated [14].
Recall from Section 3.2.4, that in two-band FIR systems, linear phase and orthog-
onality are mutually exclusive and this carries over to four-band separable systems
which are most often used in practice.
However, the case for linear phase is not as obvious as it seems at first sight.
For example, in the absence of quantization, the phase of the filters has no bearing
since the system has perfect reconstruction. This argument carries over for fine
quantization as well. In the case of coarse quantization, the situation is more
complex. One scenario is to consider the highpass channel as being set to zero.
Look at the two impulse responses of this system. Nonlinear phase systems lead to
nonsymmetric responses, but so do some of the linear phase systems. Only if the
filters meet additional constraints do the two impulse responses remain symmetric.
Note also, that for computational purposes, linear phase is more convenient because
of the symmetry of the filters.
Note that orthogonal FIR filters of sufficient length can be made almost linear
phase by appropriate factorization of their autocorrelation function. Also, there
are nonseparable orthogonal filters with linear phase. Finally, by resorting to
IIR filters, one can have both linear phase and orthogonality, and such noncausal
IIR filters can be used in image processing without problems since we are dealing
with finite-length input signals.
and the total bit rate is the sum of all the subbands' bit rates. Therefore, optimal
bit-allocation algorithms which assume additivity of bit rate and distortion can be
used (see Section 7.1.2). In the nonorthogonal case, (7.3.1) does not hold, and thus,
these bit allocation algorithms cannot be used directly. It should be noted that well
designed linear phase FIR filter banks (that is, with good out-of-band rejection) are
often close to being orthogonal and thus satisfy (7.3.1) approximately.
Filter size Good out-of-band rejection or high regularity require long filters. Be-
sides their computational complexity, long filters are usually avoided because they
tend to spread coding errors. For example, sharp edges introduce distortions be-
cause high-frequency channels are coarsely quantized. If the filters are long (and
usually their impulse response has several sign changes), this causes an annoying
artifact known as ringing around edges. Therefore, filters used in audio subband
compression, such as length-32 filters, are too long for image compression. Instead,
shorter “smooth” filters are preferred. Sometimes both their impulse and their step
response are considered from a perceptual point of view [167]. The step response
is important since edges in images will generate step responses at least in some
of the channels. Highly oscillating step responses will require more bits to code,
and coarse quantization will produce oscillations which are related to the step re-
sponse. As can already be seen from this short discussion, there is an intertwining
between the choice of filters and the type of quantization that follows. However,
it is clear that the frequency-domain criteria used in audio (sharp cut-off, strong
out-of-band rejection) have little meaning in the image compression context, where
time-domain arguments, such as ringing, are more important.
Regularity An orthogonal filter with a certain number of zeros at the aliasing fre-
quency (π in the two-channel case) is called regular if its iteration tends to a con-
tinuous function (see Section 4.4). The importance of this property for coding is
potentially twofold when the decomposition is iterated. First, the presence of many
zeroes at the aliasing frequency can improve the coding gain and second, compres-
sion artifacts might be less objectionable. To investigate the first effect, Rioul [243]
compared the compression gain for filters of varying regularity used in a wavelet
coder, or octave-band subband coder, with four stages. The experiment included
bit allocation, quantization, and entropy coding and is thus quite realistic. The
results are quite interesting: Some regularity is desired (the performance with no
regularity is poor) and higher regularity improves compression further (but not
substantially).
As for the compression artifacts, the following argument shows that the filters
should be regular when an octave-band decomposition is used: Assume a single
quantization error in the lowpass channel. This will add an error to the recon-
structed signal which depends only on the equivalent (iterated) lowpass filter. If
the iterated filter is smooth, this will be less noticeable than if it is a highly irregular
function (even though both contribute the same MSE). Note also that the lowest
band is upsampled 2^(i-1) times (where i is the number of iterations) and thus, the
iterated filter's impulse response is shifted by large steps, making irregular patterns
in the impulse response more visible.
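The iteration of Section 4.4 can be sketched directly. The two filters below, the Haar lowpass and a deliberately spread-out orthogonal filter, are illustrative choices; the first iterates toward a box function, the second toward a highly irregular, spiky limit.

```python
import numpy as np

def iterate_lowpass(g, levels):
    """Iterated filter of Section 4.4: repeatedly upsample the current
    iterate by 2 and convolve with g. For a regular filter the (suitably
    normalized) result tends to a continuous limit function."""
    f = np.asarray(g, dtype=float)
    for _ in range(levels):
        up = np.zeros(2 * len(f) - 1)
        up[::2] = f
        f = np.convolve(up, g)
    return f

# Haar lowpass: the iterates stay flat (they converge to a box function).
haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
f_haar = iterate_lowpass(haar, 6)

# A spread-out orthogonal filter with the same norm and DC gain, whose
# iterates are spiky (mostly zeros): a highly irregular limit.
spread = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
f_spread = iterate_lowpass(spread, 6)
```

A single quantization error in the lowest band adds a copy of the iterated impulse response to the reconstruction, so the flat Haar iterate is far less objectionable visually than the spiky one, even at equal MSE.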
In the case of biorthogonal systems such as linear phase FIR filter banks, one is
430 CHAPTER 7
often faced with the case where either the analysis or the synthesis is regular, but
not both. In that case, it is preferable to use the regular filter at the synthesis, by
the same argument as above. Visually, an irregular analysis is less noticeable than
an irregular synthesis, as can be verified experimentally.
When the decomposition is not iterated, regularity is of little concern. A typical
example is the lapped orthogonal transform, that is, a multi-channel filter bank
which is applied only once.
Frequency selectivity What is probably the major criterion in audio subband filter
design is of much less concern in image compression. Aliasing, which is a major
problem in audio, is much less disturbing in images [331]. The desire for short filters
limits the frequency selectivity as well. One advantage of frequency selectivity is
that perceptual weighting of errors is easier, since errors will be confined to the
band where they occur.
In conclusion, subband image coding requires relatively short and smooth filters,
with some regularity if the decomposition is iterated.
Quantization of the Subbands There are basically two ways to approach quan-
tization of a subband-decomposed image: Either the subbands are quantized inde-
pendently of each other, or dependencies are taken into account.
(a) The lowest band, being a lowpass and downsampled version of the original,
has a behavior much like the original image. That is, traditional quantization
methods used for images can be applied here as well, such as DPCM [337] or
even transform coding [174, 285].
(b) The highest bands have negligible energy and can usually be discarded with
no noticeable loss in visual quality.
(c) Except along edges, little correlation remains within higher bands. Because of
the directional filtering, the edges are confined to certain directions in a given
subband. Also, the probability density function of the pixel values peaks
at zero and falls off very rapidly. While it is often modeled as a Laplacian
distribution, it actually falls off more rapidly and is more adequately fitted
by a generalized Gaussian pdf with faster decay than the Laplacian pdf [329].
7.3. IMAGE COMPRESSION 431
Besides the lowband compression, which uses known image coding methods, the
bulk of the compression is obtained by appropriate quantization of the high bands.
The following quantizers are typically used:
(a) Lloyd quantizers fitted to the distribution of the particular band to be quan-
tized. Tables of such Lloyd quantizers for generalized Gaussian pdf’s and
decay values of interest for image subbands can be found in [329].
(b) Uniform quantizers with a so-called dead zone which maps a region around the
origin to zero (typically of twice the step size used elsewhere). Such dead zone
quantizers have proven useful because they increase compression substantially
with little loss of visual quality, since they tend to eliminate what is essentially
noise in the subbands [111].
Because entropy coding is used after quantization, uniform quantizers are nearly
optimal [285]. Thus, since uniform quantizers are much easier to implement than
Lloyd quantizers, the former are usually chosen, unless the variable rate associated
with entropy codes has to be avoided. Note that vector quantization could be used
in the subbands, but its complexity is usually not worthwhile since there is little
dependence between pixels anyway.
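A minimal sketch of such a dead-zone quantizer (one common variant, with reconstruction at the bin midpoints; the exact bin placement used in [111] may differ):

```python
import numpy as np

def deadzone_quantize(x, step):
    """Uniform quantizer with a dead zone of twice the step size:
    values with |x| < step (a zone of width 2*step around the origin)
    map to zero; elsewhere, bins of width `step` are reconstructed at
    their midpoints."""
    x = np.asarray(x, dtype=float)
    k = np.floor(np.abs(x) / step)          # 0 inside the dead zone
    return np.where(k == 0, 0.0, np.sign(x) * (k + 0.5) * step)

# Small demonstration on Laplacian-like data: the dead zone sets most
# low-amplitude, noise-like samples to zero, which helps the entropy coder.
rng = np.random.default_rng(0)
band = rng.laplace(scale=1.0, size=10_000)
q = deadzone_quantize(band, 2.0)
zero_fraction = np.mean(q == 0.0)
```

With a peaked, Laplacian-like subband histogram, the large fraction of zeroed samples is what run-length and entropy coding later exploit.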
An important consideration is the relative perceptual importance of various
subbands. This leads to a weighting of the MSE in various subbands. This weighting
function can be derived through perceptual experiments by finding the level of “just
noticeable noise” in various bands [252]. As expected, high bands tolerate more
noise because the human visual system becomes less sensitive at high frequencies.
Note that more sophisticated models would include masking as well.
Figure 7.22 Vector quantization across bands in a subband decomposition.
(a) Uniform decomposition. (b) Octave-band, or wavelet, decomposition. Note
that the number of samples in the various bands corresponds to a fixed region
of the input signal.
Taking a vector across the bands is easiest when equal-size subbands are used. In the case of an octave-band de-
composition, the vector should use pixels at each level that correspond to the same
region of the original signal. That is, the number of pixels should be inversely pro-
portional to scale. The comparison of vector quantization for equally-spaced bands
and octave-spaced bands is shown in Figure 7.22 for the one-dimensional case for
simplicity.
Bit Allocation For bit allocation between the bands, one can directly use the
procedures developed in Section 7.1.2, at least if the filters are orthogonal. Then,
the total distortion is the sum of the subbands distortions, and the total rate is the
sum of rates for the various bands. In the nonorthogonal case, the distortion is not
additive, but can be approximated as such.
The typical allocation problem is the following: For each channel i, one has a
choice from a set of quantizers {qi,j }. Choosing a given quantizer qi,j will produce a
distortion di,j and a rate ri,j for channel i (one can use weighted distortion as well).
The problem is to find which combination of quantizers in the various channels will
produce the minimum squared error while satisfying the budget constraint. The
optimal solution is found using the constant-slope solution as described in Section
7.1.2.

Variances in the 16 bands of the uniform decomposition:

            LL        LH        HH        HL
   HL    0.58959   0.86237   1.77899   0.88081
   HH    2.87483   6.71625   8.56729   3.25402
   LH    23.5474   33.4055   60.9195   14.8490
   LL    2711.45   56.0058   52.5202   13.9685

The pairs (di,j , ri,j ), that is, the operational rate-distortion curves, can be
measured over a set of representative images and then used as a fixed allocation.
The problem is that, when applied to a particular image, the budget might not
be met. On the other hand, given an image to be coded, one can measure the
operational rate-distortion curves and use the constant-slope allocation procedure.
This will guarantee an optimal solution, but is computationally expensive. Finally,
one can use allocations based on probability density functions, in which case it is
often sufficient to measure the variance of a particular channel in order to find its
allocation (see (7.1.19) for example). Note that the rates used in the allocation
procedure are after entropy coding.
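The constant-slope procedure of Section 7.1.2 can be sketched as follows; the operational rate-distortion points are hypothetical and serve only to illustrate the bisection on the slope.

```python
def allocate(channels, budget, iters=60):
    """Constant-slope (Lagrangian) bit allocation. `channels` is a list,
    one entry per subband, of (distortion, rate) pairs measured on that
    band's operational rate-distortion curve. For a given slope lam, each
    channel independently picks the pair minimizing d + lam * r; the
    slope is then bisected until the total rate meets the budget."""
    def pick(lam):
        choice = [min(ch, key=lambda dr: dr[0] + lam * dr[1]) for ch in channels]
        return choice, sum(r for _, r in choice), sum(d for d, _ in choice)

    lo, hi = 0.0, 1e9          # hi large enough to force the min-rate choice
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, rate, _ = pick(mid)
        if rate > budget:
            lo = mid           # rate too high: penalize rate more
        else:
            hi = mid
    return pick(hi)

# Hypothetical operational R-D points (distortion, rate) for two bands.
ch1 = [(100.0, 0), (40.0, 1), (15.0, 2), (5.0, 3)]
ch2 = [(50.0, 0), (20.0, 1), (8.0, 2), (3.0, 3)]
choice, rate, dist = allocate([ch1, ch2], budget=3)
```

At the final slope each band operates at a point of equal slope on its own rate-distortion curve, which is exactly the optimality condition of the constant-slope solution.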
Examples Two typical coding examples will be described in some detail. The first
is a uniform separable decomposition. The second is an octave-band or constant
relative bandwidth decomposition (often called a wavelet decomposition).
Figure 7.24 Uniform subband decomposition of the Barbara image. The or-
dering of the subbands is given in Figure 7.23.
The subbands are quantized using uniform quantization with a dead zone of twice the step size used elsewhere. Using
a set of step sizes, one can derive rate-distortion curves by measuring the entropy
of the resulting quantized channels. A true operational rate-distortion curve would
have to include run-length coding and actual entropy coding. Based on these rate-
distortion curves, one can perform an optimal constant-slope bit allocation, that
is, one can choose the optimal quantizer step sizes for the various bands. The step
sizes for a budget of 0.5 bits/pixel are listed in Table 7.5.

Table 7.5 Step sizes for the uniform quantizers in the 16 bands of the uniform
decomposition, for a budget of 0.5 bits/pixel.

            LL        LH        HH        HL
   HL     9.348     8.246     8.657    22.318
   HH     8.400    10.161     8.887    13.243
   LH     6.552     7.171    10.805    16.512
   LL     QF-89     8.673    11.209    15.846

Figure 7.25 Octave-band (wavelet) decomposition: labeling of the bands
(LLL,LLL denotes the coarsest band).

A set of Huffman codes and run-length codes is designed for each subband channel. Note that the special
symbol “start of run” (SR) is entropy coded as any other nonzero pixel. Altogether,
one obtains the final rate of 0.497 bits/pixel (the difference in rate comes from the
fact that bit allocation was based on entropy measures). Then, the coded image
has an SNRp of 30.38 dB. Figure 7.27 (top row) shows the compressed Barbara image
and a detail at the same rate.
Table 7.6 Variances in the different bands of an octave-band decomposition
(defined as in Figure 7.25).

   Band       Variance
   LLL,LLL    2559.8
   LLH,LLL      60.7
   LLL,LLH      43.8
   LLH,LLH      21.2
   LH,LL        55.4
   LL,LH        24.5
   LH,LH        33.7
   H,L         141.4
   L,H          15.2
   H,H          16.2

Table 7.7 Step sizes for the uniform quantizers in the octave subband or wavelet
decomposition of Figure 7.25, for a target rate of 0.5 bits/pixel.

   Band       Step size
   LLL,LLL     5.21
   LLH,LLL     3.69
   LLL,LLH     4.42
   LLH,LLH     4.08
   LH,LL       8.42
   LL,LH       9.22
   LH,LH       7.45
   H,L        17.23
   L,H        22.05
   H,H        21.57
The quantizers are similar to the ones in a uniform decomposition. Because the lowest band (LLL,LLL) is
small enough (64 × 64 pixels), we use scalar quantization on it as on all other bands.
Again, uniform quantizers with double-sized dead zone are used and rate-distortion
curves are derived for bit-allocation purposes. The resulting step sizes for the target
bit rate of 0.5 bits/pixel are given in Table 7.7.
The development of entropy coding (including run-length coding for higher
bands) is similar to the uniform-decomposition case discussed earlier. The final
rate is 0.499 bits/pixel, with an SNRp of 29.21 dB. The coded image and a detail are
Figure 7.27 Compression results on the Barbara image. Top left: Subband coding
in 16 uniform bands at 0.4969 bits/pixel and SNRp = 30.38 dB. Top right:
Detail of top left. Bottom left: Octave-band or wavelet compression at 0.4990
bits/pixel and SNRp = 29.21 dB. Bottom right: Detail of bottom left.
shown in Figure 7.27 (bottom row). Note that there is little difference between the
uniform and the octave-band decomposition results.
We would like to emphasize that the above examples are “textbook examples”
for illustration purposes. For example, no statistics over large sets of images were
taken and thus, the entropy coders might perform poorly for a substantially different
image. The aim was more to demonstrate the ingredients used in a subband/wavelet
image coder.
State-of-the-art coders, which can be found in the current literature, substantially
improve upon the results shown here. Major differences with respect to the simple
coders we discussed so far are the following:
(a) Vector quantization can be used in the subbands, such as lattice vector quan-
tization [13].
(b) Adaptive entropy coding is used to achieve immunity to changes in image statis-
tics.
(c) Adaptive quantization in the subbands can take care of busy versus nonbusy
regions.
(d) Perceptual tuning using band sensitivity, background luminance level and mask-
ing of noise due to high activity can improve the visual quality [252].
The last point, perceptual models for subband compression, is where most gain
can be obtained.
With these various fine tunings, good image quality for a compressed version
of a 512 × 512 original image such as Barbara can be obtained in the range of 0.25
to 0.5 bits/pixel. Note that the complexity level is still of the same order as the
coders we presented and is comparable in order of magnitude to a DCT coder such
as JPEG.
Such zero trees are shown in Figure 7.28. Because the tree grows as powers of four, a zero
tree allows us to disregard many insignificant symbols at once. Note also that a
zero tree gathers coefficients that correspond to the same spatial location in the
original image.
Zero trees have been combined with bit plane coding in an elegant and efficient
compression algorithm due to Shapiro [260, 259]. It incorporates nicely many of
the key ideas presented in this section and demonstrates the effectiveness of wavelet
based coding. The resulting algorithm is called embedded zero-tree wavelet (EZW)
algorithm. Embedded means that the encoder can stop encoding at any desired
target rate. Similarly, the decoder can stop decoding at any point resulting in the
image that would have been produced at the rate of the truncated bit stream. This
compression method produces excellent results without requiring a priori knowledge
of the image source, without prestored tables of codebooks, and without training.
The EZW algorithm uses the discrete-time wavelet transform decomposition
where at each level i the lowest band is split into four more bands: LLi+1 , LHi+1 ,
HLi+1 , and HHi+1 . In simulations in [260], six levels are used with length-9 sym-
metric filters given in [1].
The second important ingredient is that the absence of significance across scales
is predicted by exploiting self-similarity inherent in images. A coefficient x is called
insignificant with respect to a given threshold T , if |x| < T . The assumption is that
if x is insignificant, then all of its descendents of the same orientation in the same
spatial location at all finer scales are insignificant as well. We call a coefficient at
a coarse scale a parent. All coefficients at the next finer scale at the same spatial
location and of similar orientation are children. All coefficients at all finer scales
at the same spatial location and of similar orientation are descendents. Although
there exist counterexamples to the above assumption, it holds true most of the
time. Then, one can make use of it and code such a parent as a zero-tree root
(ZTR), thereby avoiding the coding of all its descendants. When the assumption does
not hold, that is, the parent is insignificant but a significant descendant exists
somewhere down the tree, such a parent is coded as an isolated zero (IZ). To code the
coefficients, Shapiro uses four symbols: ZTR, IZ, POS for a positive significant
coefficient, and NEG for a negative significant one. In the highest bands which
do not have any children, IZ and ZTR are merged into a zero symbol (Z). The
order in which the coefficients are scanned is of importance as well. It is performed
so that no child is scanned before its parent. Thus, one scans bands LLN, HLN,
LHN, HHN, and moves on to scale (N − 1), scanning HLN−1, LHN−1, HHN−1,
until reaching the finest scale with HL1, LH1, HH1. This scanning pattern orders the
coefficients in order of importance, allowing for embedding.
The next step is successive approximation quantization. It entails keeping at
all times two lists: the dominant list and the subordinate list. The dominant list
contains the coordinates of those coefficients that have not yet been found to be
significant. The subordinate list contains the magnitudes of those coefficients that
have been found to be significant. The process is as follows: We decide on the initial
threshold T0 , (for example, it could be half of the positive range of the coefficients)
and start with the dominant pass where we evaluate each coefficient in the scanning
order described above to be one of the four symbols ZTR, IZ, POS and NEG.
Then we cut the threshold in half obtaining T1 and add another bit of precision
to the magnitudes on the list of coefficients known to be significant, that is, the
subordinate list. More precisely, we assign the symbol 1 or 0 depending on whether
the refinement leaves the reconstruction of a coefficient in the upper or the lower half
of the previous bin. We reorder the coefficients in decreasing order and go on to
the dominant pass again with the threshold T1 . Note that now those coefficients
that have been found to be significant during a previous pass are set to zero so that
they do not preclude the possibility of finding a zero tree. The process then alternates
between these two passes until some stopping condition is met, such as that the
bit budget is exhausted. Finally, the symbols are losslessly encoded using adaptive
arithmetic coding.
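The dominant pass just described can be sketched as follows. The parent-child rule, scan and symbol set follow the description above; this is a simplified reading of [260], not Shapiro's exact implementation. On Shapiro's 8 × 8 three-level example (also used below), with T0 = 32, the four coarsest coefficients come out as POS, NEG, IZ and ZTR.

```python
import numpy as np

def children(i, j, n):
    """Parent-child relation in an n x n dyadic decomposition: (i, j) has
    children (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1); the coarsest
    LL coefficient is parent of the other three coarsest coefficients."""
    if i == 0 and j == 0:
        return [(0, 1), (1, 0), (1, 1)]
    if 2 * i >= n or 2 * j >= n:          # finest scale: no children
        return []
    return [(2*i, 2*j), (2*i, 2*j+1), (2*i+1, 2*j), (2*i+1, 2*j+1)]

def significant_descendant(c, i, j, T):
    return any(abs(c[a, b]) >= T or significant_descendant(c, a, b, T)
               for (a, b) in children(i, j, c.shape[0]))

def scan_order(n):
    """Band-by-band scan, coarse to fine, so that no child precedes its
    parent: LL, then HL, LH, HH at each scale, raster within a band."""
    coords, size = [(0, 0)], 1
    while size < n:
        for (r0, c0) in [(0, size), (size, 0), (size, size)]:  # HL, LH, HH
            coords += [(r0 + i, c0 + j) for i in range(size) for j in range(size)]
        size *= 2
    return coords

def dominant_pass(c, T):
    """One dominant pass: emit POS/NEG for significant coefficients,
    ZTR for zero-tree roots (whose descendants are then skipped),
    IZ for isolated zeros, and Z in the childless finest bands."""
    n, skip, out = c.shape[0], set(), []
    for (i, j) in scan_order(n):
        if (i, j) in skip:
            continue
        if abs(c[i, j]) >= T:
            out.append('POS' if c[i, j] >= 0 else 'NEG')
        elif significant_descendant(c, i, j, T):
            out.append('IZ')
        elif not children(i, j, n):
            out.append('Z')
        else:
            out.append('ZTR')
            stack = children(i, j, n)
            while stack:                   # mark the whole tree as coded
                node = stack.pop()
                skip.add(node)
                stack.extend(children(*node, n))
    return out

coeffs = np.array([
    [ 63, -34,  49,  10,   7,  13, -12,   7],
    [-31,  23,  14, -13,   3,   4,   6,  -1],
    [ 15,  14,   3, -12,   5,  -7,   3,   9],
    [  9,  -7, -14,   8,   4,  -2,   3,   2],
    [ -5,   9,  -1,  47,   4,   6,  -2,   2],
    [  3,   0,  -3,   2,   3,  -2,   0,   4],
    [  2,  -3,   6,  -4,   3,   6,   3,   6],
    [  5,  11,   5,   6,   0,   3,  -4,   4]])

symbols = dominant_pass(coeffs, 32)
```

Note how −31 becomes an isolated zero because its descendant 47 is significant, while 23 roots a zero tree, so the entire HH2/HH1 subtree costs a single symbol.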
The following 8 × 8 array of coefficients from a three-level decomposition (coarsest
band in the upper left corner) is used in the example:

    63  -34   49   10    7   13  -12    7
   -31   23   14  -13    3    4    6   -1
    15   14    3  -12    5   -7    3    9
     9   -7  -14    8    4   -2    3    2
    -5    9   -1   47    4    6   -2    2
     3    0   -3    2    3   -2    0    4
     2   -3    6   -4    3    6    3    6
     5   11    5    6    0    3   -4    4
comprising bands HH2 and HH3 . We continue the process in the scanning order, except
that we skip all those coefficients for which we have previously established that they belong
to a zero tree. The result of this procedure is given in Table 7.9.
After we have scanned all available coefficients, we are ready to go on to the first
subordinate pass. We commence by halving the threshold, obtaining T1 = 16, and with
it the quantization intervals; the resulting intervals are now [32, 48) and [48, 64). The first
significant value, 63, obtains a 1, and is reconstructed to 56. The second one, −34, gets
a 0 and is reconstructed to −40, 49 gets a 1 and is reconstructed to 56, and finally, 47
gets a 0 and is reconstructed to 40. We then order these values in the decreasing order of
reconstructed values, that is, (63, 49, 34, 47). If we want to continue the process, we start
the second dominant pass with the threshold of 16. We first set all significant values from
the previous pass to zero, in order to be able to identify zero trees. In this pass, we establish
that −31 in LH3 is N EG and 23 in HH3 is P OS. All the other coefficients are then found
to be either zero tree roots or zeros. We add to the list of significant coefficients 31 and 23
and halve the quantization intervals to obtain [16, 24), [24, 32), [32, 40), [40, 48), [48, 56),
and [56, 64). At the end of this pass, the revised list is (63, 49, 47, 34, 31, 23), while the
reconstructed list is (60, 52, 44, 36, 28, 20). This process continues until, for example, the bit
budget is met.
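The reconstruction values of this example can be verified with a small sketch of the successive approximation: a coefficient found significant at threshold T lies in [T, 2T), and each subordinate pass halves the uncertainty interval, reconstructing at its center.

```python
def refine(mag, T, passes):
    """Magnitude of a coefficient found significant at threshold T, after
    `passes` subordinate refinements: binary search on [T, 2T), with the
    reconstruction at the center of the remaining interval."""
    lo, hi = T, 2 * T
    for _ in range(passes):
        mid = (lo + hi) // 2
        if mag >= mid:       # symbol 1: upper half of the previous bin
            lo = mid
        else:                # symbol 0: lower half
            hi = mid
    return (lo + hi) // 2

# First subordinate pass (T0 = 32, one refinement): 63 -> 56, 34 -> 40,
# 49 -> 56, 47 -> 40, as in the text.
first = [refine(m, 32, 1) for m in (63, 34, 49, 47)]

# After the second subordinate pass the intervals have width 8; the
# coefficients found at T1 = 16 (31 and 23) have had one refinement each.
second = [refine(m, 32, 2) for m in (63, 49, 47, 34)] + \
         [refine(m, 16, 1) for m in (31, 23)]
```

The computed lists match the reconstructed values (56, 40, 56, 40) and (60, 52, 44, 36, 28, 20) of the example.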
If the root offers a better rate-distortion trade-off than its children combined, we can prune the children and keep the root; otherwise, we keep the children. The
comparison is made at constant-slope points (of slope λ) on the respective rate-
distortion curves. Going up the tree in this fashion will result in an optimal binary
tree for the image to be compressed. Note that in order to apply the Lagrange
method, we assumed independence of the nodes, an assumption that might be
violated (especially for deep trees).
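The pruning can be sketched on a binary tree of (distortion, rate) points; the numbers are hypothetical, and, as noted above, independence of the nodes is assumed.

```python
def prune(node, lam):
    """Bottom-up Lagrangian pruning. A node is (d, r, left, right), where
    d and r are the distortion and rate of coding that portion of the
    signal at the node itself, and left/right (None for leaves) describe
    the split. Returns (cost, pruned_node) at slope lam."""
    d, r, left, right = node
    own = d + lam * r
    if left is None:                      # leaf: nothing to prune
        return own, node
    lc, lt = prune(left, lam)
    rc, rt = prune(right, lam)
    if own <= lc + rc:                    # root beats its subtrees: prune
        return own, (d, r, None, None)
    return lc + rc, (d, r, lt, rt)

# Hypothetical tree: the root codes the whole signal with distortion 5 at
# rate 2; splitting gives two halves, each with distortion 1 at rate 2.
tree = (5.0, 2.0, (1.0, 2.0, None, None), (1.0, 2.0, None, None))

low_slope = prune(tree, 1.0)    # rate is cheap: keep the split
high_slope = prune(tree, 3.0)   # rate is expensive: prune to the root
```

The same data yields a different best tree depending on the slope λ, which is why the comparison has to be made at constant-slope points of the respective rate-distortion curves.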
An extension of this idea consists of considering not only frequency divisions
(obtained by a subband decomposition) but also splitting of the signal in time,
so that different wavelet packets can be used for different portions of the time-
domain signal (see also Figure 3.13). This is particularly useful if the signal is
Figure 7.29 Simultaneous space and frequency splitting of the Barbara image
using the double-tree algorithm. Black lines correspond to spatial segmenta-
tions, while white lines correspond to frequency splits.
Methods Based on Wavelet Maximums Since edges are critical to image percep-
tion [168], there is a strong motivation to find a compression scheme that contains
edges as critical information. This is done in Mallat and Zhong’s algorithm [184]
which is based on wavelet maximums representations. The idea is to decompose
the image using a redundant representation which approximates the continuous
wavelet transform at scales which are powers of two. This can be done using non-
downsampled octave-band filter banks. Because there is no downsampling, the
decomposition is shift-invariant. If the highpass filter is designed as an edge de-
tector (such as the derivative of a Gaussian), then we will have edges represented
at all scales by some local maximums or minimums. Because the representation is
redundant, keeping only these maximums/minimums still allows good reconstruction.
where σq^2 , σx^2 , σy^2 are the variances of the quantization error, the input and output
signals, respectively. Consider now a so-called “gain plus additive noise” linear
model for this quantizer. Its input/output relationship is given by
y = αx + r
where x, y are the input/output of the quantizer, r is the additive noise term, and
α is the gain factor (α ≤ 1). The main advantage of this model is that, by choosing

    α = 1 − σq^2 /σx^2 ,                              (7.3.3)
the additive noise will not be correlated with the signal and (7.3.2) will hold. In
other words, to fit the model to our given quantizer, (7.3.3) must be satisfied. Note
also, that the additive noise term is not correlated with the output signal.
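The effect of (7.3.3) can be checked numerically. The sketch below uses a one-bit Lloyd-Max quantizer for a unit-variance Gaussian input, an illustrative quantizer for which the centroid condition holds, and verifies that the model noise r is essentially uncorrelated with the input while the raw quantization error is not.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)          # unit-variance Gaussian input

# One-bit Lloyd-Max quantizer: levels at the conditional means
# +/- sqrt(2/pi) of the two half-axes.
level = np.sqrt(2.0 / np.pi)
y = np.where(x >= 0.0, level, -level)

q = x - y                                 # quantization error
alpha = 1.0 - q.var() / x.var()           # gain factor of (7.3.3)
r = y - alpha * x                         # noise term of y = alpha*x + r

# The raw error q is strongly correlated with the input, but with alpha
# chosen as in (7.3.3) the model noise r is (nearly) uncorrelated with x;
# q is also orthogonal to the output y (centroid condition).
c_qx = np.corrcoef(q, x)[0, 1]
c_rx = np.corrcoef(r, x)[0, 1]
c_qy = np.corrcoef(q, y)[0, 1]
```

For this quantizer, α comes out close to its theoretical value 2/π, so the "gain plus additive noise" decomposition is an exact fit up to sampling error.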
The authors in [331] then incorporate this model into a QMF system (where the
filters are designed to cancel aliasing, as given in (3.2.34–3.2.35)). That is, each of
the two channel signals is quantized, modeled with a gain factor αi and an additive
noise ri . Consequently, the error at the output of the system can be written as the
sum of the error terms

    E(z) = EQ(z) + ES(z) + EA(z) + ER(z),

where

    EQ(z) = (1/2) [H^2(z) − H^2(−z) − 2] X(z),
    ES(z) = (1/2) [(α0 − 1) H^2(z) − (α1 − 1) H^2(−z)] X(z),
    EA(z) = (1/2) (α0 − α1) H(z) H(−z) X(−z),
    ER(z) = H(z) R0(z^2) − H(−z) R1(z^2).
Note that here, z^2 in Ri(z^2) appears since the noise component passes through the
upsampler. This breakdown into different types of errors allows one to investigate
their influence and severity. Here, EQ denotes the QMF (lack of perfect reconstruc-
tion) error, ES is the signal error (term with X(z)), EA is the aliasing error (term
with X(−z)), and ER is the random error. Note that only the random error ER
is uncorrelated with the signal. The QMF error is insignificant and can be disre-
garded. Aliasing errors become negligible if filters of length 12 or more are used.
Finally, the signal error determines the sharpness while the random error is most
visible in flat areas of the image.
to the signal. The potential benefit of this approach is that one has to deal only
with a random, noise-like error at the output, which can then be alleviated with
an appropriate noise removal technique. Note, however, that the random error has
been boosted by dividing the terms by αi ≤ 1. For more details, see [161].
ŷ = Q(y).
x̂ = Gŷ,
where ŷopt belongs to the set of all possible quantized images. Due to this con-
straint, the problem becomes a discrete optimization problem and is solved using a
numerical relaxation algorithm. Experiments on images show significant visual as
well as MSE improvement. For more details, refer to [200].
[173, 201] (the Moving Picture Experts Group of the International Organization
for Standardization). While the video compression problem is quite different from
straight image coding, mainly because of the presence of motion, techniques suc-
cessful with images are often part of video coding algorithms as well. That is, signal
expansion methods are an integral part of most video coding algorithms and are
used in conjunction with motion based techniques.
This section will discuss both signal expansion and motion based methods used
for moving images. We start by describing the key problems in video compression,
one of which is compatibility between standards of various resolutions and has a
natural answer in multiresolution coding techniques. Standard motion compensated
video compression is described next, as well as the use of transforms for coding
the prediction error signal. Then, pyramid coding of video, which attempts to
get the best of subband and motion based techniques, is discussed. Subband or
wavelet decomposition techniques in three dimensions are presented, indicating both
their usefulness and their shortcomings. Finally, the emerging MPEG standard is
discussed.
By intraframe coding we denote video coding techniques where each frame
is coded separately. Interframe coding, on the other hand, takes the time
dimension and the correlation between frames into account.
Figure 7.30 Moving objects in a video sequence. One object is still (zero
motion), while the other has a purely translational motion.
The Perceptual Point of View Just as in coding of speech or images, the ultimate
judge of quality is the human observer. Therefore, spatio-temporal models of the
human visual system (HVS) are important. These turn out to be more complex
than for static images, especially because of spatio-temporal masking phenomena
related to motion. If one considers sensitivity to spatio-temporal gratings (sinusoids
with an offset and various frequencies in all three dimensions), then the eye has
a lowpass/bandpass characteristic [207]. The sensitivity is maximum at medium
spatial and temporal frequencies, falls off slightly at low frequencies, and falls off
rapidly toward high frequencies (note that the sensitivity function is not separable
in space and time). Finally, sinusoids separated by more than an octave in spatial
frequency are treated in an independent manner.
Masking does occur, but it is a very local effect and cannot be well modeled in
the frequency domain. This masking is both spatial (reduced sensitivity at sharp
transitions) and temporal (reduced sensitivity at scene changes). The perception of
motion is a complex phenomenon and psychophysical results are only starting to be
applicable to coding. One effect is clear and intuitive, however: The perception of a
moving object depends on whether or not it is tracked by the eye. While in the latter case
the object could be blurred without noticeable effect, in the former the object will
be perceived as accurately as if it were still. Since it cannot be predicted whether the viewer
will or will not follow the object, one cannot increase compression of moving objects
by blurring them. This somewhat naive approach has sometimes been suggested
in conjunction with three-dimensional frequency-domain coding methods, but does
not work, since more often than not, the interest of the viewer is in the moving
object.
Figure 7.31 Scanning of a video sequence in the (vertical, time)-plane:
(a) progressive scanning; (b) interlaced scanning, with alternating even and
odd fields; (c) FCO sampling.
An even better compromise would be obtained with the face-centered orthorhom-
bic (FCO) lattice [164], which is the true generalization of the two-dimensional
quincunx lattice to three dimensions (see Figure 7.31(c)). Then, only frequencies
which are high in all three dimensions simultaneously are lost, and these are not well
perceived anyway. However, for technological reasons, FCO is less attractive than
interlaced scanning. Of course, in the various sampling schemes discussed above,
one can always construct counterexamples that lose resolution, in particular when
tracked by the human observer (for example, objects with high-frequency patterns
moving in a worst-case direction). However, these counterexamples are unlikely in
real-world imagery, particularly for interlaced and even more for FCO scanning.
Figure 7.32 Sampling lattices in the (vertical, time)-plane for two scanning
formats, (a) and (b), with fields spaced 1/60th of a second apart.
Figure 7.33 Motion-compensated hybrid coding: the prediction error is trans-
formed (DCT), quantized (Q) and entropy coded; the prediction is obtained by
motion compensation, using the motion vectors found by motion estimation and
the locally decoded signal (Q−1, IDCT).
The scheme involves the following steps:

(a) Coding the low resolution version.

(b) Predicting the higher resolution based on the coded low resolution.
(c) Taking the difference between the predicted and the true higher resolution,
resulting in the prediction error.
While these steps could be done in the three dimensions at once, it is preferable
to separate the spatial and temporal dimensions. First, the spatial dimension is
interpolated using filtering and then the temporal dimension is interpolated using
motion-based interpolation. This is shown in Figure 7.34(b). Following each inter-
polation step, the prediction error is computed and coded and this coded value is
added to the prediction before going to the next step. Because at each step, we
use coded versions for our prediction, we have a pyramid scheme with quantization
noise feedback, as was described in Figure 7.19. Therefore, there is only one source
of error, namely the compression of the last prediction error.
The oversampling inherent in pyramid coding is not a problem in the three-
dimensional case, since, following (3.5.4), we have a total number of samples which
increases only as
    (1 + 1/8 + 1/8^2 + · · · ) N < (8/7) N,
or at most 14%, since every coarser level has only 1/8th the number of samples of
its predecessor.
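The oversampling figure can be checked directly:

```python
# Each coarser level of the spatio-temporal pyramid has 1/8 the samples
# of its predecessor, so the total relative to the input size is the
# geometric series 1 + 1/8 + 1/64 + ..., bounded by 8/7.
total = sum((1.0 / 8.0) ** k for k in range(20))
overhead_percent = 100.0 * (8.0 / 7.0 - 1.0)   # the "at most 14%" bound
```

Compare this with the factor-of-two oversampling of a two-dimensional pyramid (bounded by 4/3 in 2-D, and by 2 in 1-D), which is why the overhead is not a problem in the three-dimensional case.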
7.4. VIDEO COMPRESSION 455
Figure 7.34 Spatio-temporal pyramid video coding. (a) Three layers of the
pyramid, corresponding to three resolutions. (b) Prediction of the higher res-
olution. The spatial resolution is interpolated first (using linear filtering) and
then the temporal resolution is increased using motion interpolation.
The key technique in the spatio-temporal pyramid scheme is the motion interpo-
lation step, which predicts a frame from its two neighbors based on motion vectors.
Assume the standard rigid-object and pure translational motion model [207]. If we
denote the intensity of a pixel at location r = (x, y) and time t by I(r, t), we are
looking for a mapping d(r, t) such that we can write

    I(r, t) = I(r − d(r, t), t − 1).

The goal is to find the function d(r, t), that is, estimate the motion. This is
a standard estimation procedure, where some simplifying assumptions are made
(such as constant motion over a neighborhood). Typically, for a small block b in
the current frame, one searches over a set of possible motion vectors such that the
sum of squared differences,

    Σr∈b |I(r, t) − Î(r, t)|^2 ,                      (7.4.1)

is minimized, where

    Î(r, t) = I(r − db , t − 1),                      (7.4.2)
corresponds to a block in the previous frame displaced by db (the motion for the
block under consideration in the current frame). It is best to actually perform a
symmetric search by considering the past (as in (7.4.2)), the future ((7.4.2) with
sign reversals for db ), and the average of the two,

    Î(r, t) = 1/2 [I(r − db , t − 1) + I(r + db , t + 1)],

and then to choose the best match. Choosing past or future for the interpolation
is especially important for covering and uncovering of background due to moving
objects, as well as in case of abrupt changes (scene changes).
Interestingly, a very successful technique to perform motion estimation (that
is, finding the displacement db that minimizes (7.4.1)) is based on multiresolution
or successive approximation. Instead of solving (7.4.1) directly, one solves a coarse
version of the same problem, refines the solution (by interpolating the motion vector
field), and uses this new field as a starting point for a new, finer search. This is not
only computationally less complex, but also more robust in general [31, 302]. It is
actually a regularization of the motion estimation problem.
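Block matching with a coarse-to-fine search can be sketched as follows, on a synthetic frame pair with a known, purely translational displacement; the block size, search ranges and the 2 × 2 averaging used for the coarse level are illustrative choices.

```python
import numpy as np

def ssd_match(cur, ref, top, left, bsize, center, radius):
    """Full search around `center`: find the displacement d minimizing
    the sum of squared differences (7.4.1) between the block of `cur`
    at (top, left) and the block of `ref` displaced by d (7.4.2)."""
    block = cur[top:top + bsize, left:left + bsize]
    best, best_d = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            r0, c0 = top - dy, left - dx
            if r0 < 0 or c0 < 0 or r0 + bsize > ref.shape[0] or c0 + bsize > ref.shape[1]:
                continue
            d = np.sum((block - ref[r0:r0 + bsize, c0:c0 + bsize]) ** 2)
            if d < best_d:
                best_d, best = d, (dy, dx)
    return best

def coarse(img):
    """2x2 averaging: a crude lowpass-and-downsample for the coarse level."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

# Synthetic pair of frames: the current frame is the previous one
# translated by (4, 6), i.e., I(r, t) = I(r - d, t - 1).
rng = np.random.default_rng(3)
big = rng.random((48, 48))
cur = big[8:40, 8:40]
prev = big[12:44, 14:46]

# Coarse full search, then refinement of the doubled coarse vector.
d_c = ssd_match(coarse(cur), coarse(prev), 4, 4, 4, (0, 0), 4)
d_f = ssd_match(cur, prev, 8, 8, 8, (2 * d_c[0], 2 * d_c[1]), 1)
```

The coarse search covers a large displacement range cheaply, and only a small refinement window is needed at full resolution, which is the computational (and regularizing) advantage of the multiresolution approach.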
As an illustration of this video coding scheme, a few representative pictures are
shown. First, Figure 7.35 shows the successive refinement of the motion vector field,
which starts with a sparse field on a coarse version and refines it to a fine field on the
full-resolution image. In Figure 7.36, we show the resulting spatial and temporal
prediction error signals. As can be seen, the spatial prediction error has higher
energy than the temporal one, which shows that temporal interpolation based on
motion is quite successful (actually, this sequence has high frequency spatial details,
which cannot be well predicted from the coarse resolution).
A point to note is that the first subresolution sequence (which is downsampled by
2 in each dimension) is of good visual quality and could be used for a compatible
coding scheme. This coding scheme was implemented for high quality coding of
HDTV with a compatible subchannel and it performed well at medium compression
(of the order of 10-15 to 1) with essentially no visible degradation [301, 303].
Quincunx sampling for scanning format conversions We have outlined previously the
existence of different scanning standards (such as interlaced and progressive) as well
as the desire for compatibility. A simple technique to deal with these problems is
to use perfect reconstruction filter banks to go back and forth between progressive
7.4. VIDEO COMPRESSION 457
and interlaced scanning, as shown in Figure 7.37 [320]. This is achieved by quin-
cunx downsampling the channels in the (vertical, time)-plane. Properly designed
filter pairs (either orthogonal or biorthogonal solutions) lead to a lowpass channel
that is a usable interlaced sequence, while the original sequence can be perfectly
recovered when using both the lowpass and highpass channels in the reconstruction.
This is a compatible solution in the following sense: A low-quality receiver would
only decode the lowpass channel and thus show an interlaced sequence, while a
high-quality receiver would synthesize a full resolution progressive sequence based
on both the lowpass and the highpass channels.
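The compatible splitting can be illustrated with the simplest possible filter pair, the "lazy" polyphase split, which merely separates the two quincunx cosets of the (vertical, time)-plane. A designed lowpass/highpass pair as in [320] would be used in practice, but the coset geometry and the perfect reconstruction property are already visible here; the array layout prog[t, v, h] is our own convention.

```python
import numpy as np

def quincunx_split(prog):
    """Split a progressive sequence prog[t, v, h] into its two quincunx cosets
    in the (vertical, time)-plane.  Each coset keeps only every other line,
    alternating with time -- that is, an interlaced sequence.  This is the
    'lazy' (polyphase) filter bank; a designed filter pair would put a usable
    lowpass interlaced sequence in one channel instead."""
    t, v = np.meshgrid(np.arange(prog.shape[0]), np.arange(prog.shape[1]),
                       indexing='ij')
    mask = ((t + v) % 2 == 0)[:, :, None]          # quincunx coset in (time, vertical)
    return prog * mask, prog * (1 - mask)

def quincunx_merge(c0, c1):
    """Perfect reconstruction: the two cosets are disjoint, so adding them
    back recovers the progressive sequence exactly."""
    return c0 + c1
```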
458 CHAPTER 7
Figure 7.37 Conversion between progressive and interlaced scanning using a perfect reconstruction filter bank with quincunx downsampling (DQ): the progressive sequence is split into two interlaced sequences (lowpass and highpass channels), from which the progressive sequence can be resynthesized.
If one starts with an interlaced sequence, one can obtain a progressive sequence
by quincunx downsampling. Thus, an interlaced sequence can be broken into low-
pass and highpass progressive sequences, again allowing perfect reconstruction when
perfect reconstruction filter banks are used. This is a very simple, linear technique
to produce a deinterlaced sequence (the lowpass signal) as well as a helper signal
(the highpass signal) from which to reconstruct the original signal. While more
powerful, motion based techniques can produce better results, the above technique
is attractive because of its low complexity and the fact that no motion model needs
to be assumed.
FCO sampling for video representation We mentioned previously that using the FCO
lattice (depicted in Figure 7.31(c)) might produce visually more pleasing sequences
if a data reduction by two is needed. This is due in part to the fact that an ideal
lowpass in the FCO case would retain more of the energy of the original signal than
the corresponding quincunx lowpass filter. Actually, assuming that the original
signal has a spherically uniform spectrum, and that the ideal lowpass filters are
Voronoi regions both in the quincunx and the FCO cases, the quincunx lowpass
would retain 84.3% of the original spectrum, while the FCO lowpass would retain
95.5% of the original spectrum [164].
To evaluate the gain of processing a video signal with a true three-dimensional
scheme when a data rate reduction of two is needed, we can use a two-channel
perfect reconstruction filter bank [164]. The sampling matrix is
        ⎛  1   0   1 ⎞
D_FCO = ⎜ −1  −1   1 ⎟ ,
        ⎝  0  −1   0 ⎠
and the perfect reconstruction filter pair is a generalization of the above diamond-
shaped quincunx filters to three dimensions. To compare the low bands obtained
in this manner, they are interpolated back to the original lattice, since we cannot
observe the FCO output directly. Upon observing the result, the conclusion is that
FCO produces visually more pleasing sequences. For more detail, see [164].
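A quick numerical check of this sampling matrix (a sketch; the helper name is ours): |det D_FCO| = 2 confirms the data reduction by two, and the lattice generated by D_FCO turns out to be the three-dimensional "checkerboard", that is, the points of Z³ with even coordinate sum.

```python
import numpy as np

D_FCO = np.array([[1,  0, 1],
                  [-1, -1, 1],
                  [0, -1, 0]])

# |det D| gives the sampling density reduction: here a factor of two.
assert round(abs(np.linalg.det(D_FCO))) == 2

def on_fco_lattice(n):
    """A point n is on the lattice iff D_FCO k = n has an integer solution k."""
    k = np.linalg.solve(D_FCO, np.asarray(n, dtype=float))
    return bool(np.allclose(k, np.round(k)))
```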
Figure 7.38 Separable three-dimensional subband decomposition of a video signal: two-channel lowpass/highpass (LP/HP) splits along the temporal, vertical, and horizontal directions lead to eight subbands, shown in (a), with the corresponding division of the three-dimensional spectrum in (b).
Figure 7.38(a) shows the decomposition tree, and the resulting division of the spectrum is given in part (b) [153]. In general, most of the energy will be
contained in the band that has gone through lowpass filtering in all three directions; thus, iterating the decomposition on this band is most natural. This is actually
a three-dimensional discrete-time wavelet decomposition and is used in [153, 224].
Such three-dimensional decompositions work best for isotropic data, such as tomo-
graphic images used in medical imaging or multispectral images used in satellite
imagery. In that case, the same filters can be used in each dimension, together with
the same compression strategy (at least as a first approximation).
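One level of such a separable three-dimensional decomposition with orthonormal Haar filters can be sketched as follows (our own minimal implementation; subband keys such as 'LLH' mean lowpass along time and vertical, highpass along horizontal):

```python
import numpy as np

def haar_split(x, axis):
    """One-level orthonormal Haar split along one axis (even length assumed)."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def haar3d(x):
    """One level of the separable 3-D decomposition: split along time, then
    vertically, then horizontally, giving 8 subbands keyed by 'L'/'H' flags."""
    bands = {'': x}
    for axis in (0, 1, 2):
        new = {}
        for key, b in bands.items():
            lo, hi = haar_split(b, axis)
            new[key + 'L'], new[key + 'H'] = lo, hi
        bands = new
    return bands
```

Since the filters are orthonormal, the decomposition conserves energy, and iterating `haar3d` on the 'LLL' band gives the three-dimensional discrete-time wavelet decomposition discussed above.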
As we said, in video sequences, time should be treated differently from the
spatial dimensions. Typically, only very short filters are used along time (such as
Haar filters given in (3.1.2) and (3.1.17)) since long filters will smear motion in the
lowpass channel and create artificial high frequencies in the highpass channel. If
one looks at the output of a three-dimensional subband decomposition, one can
note that the lowpass version is similar to the original and the only other channel
with substantial energy is the one containing a highpass filter over time followed by
lowpass filters in the two spatial dimensions. This channel contains energy every
time there is substantial motion and can be used as a motion indicator.
While motion-compensated methods can outperform subband decompositions along the time axis, there have recently been some promising results [223, 286]. Also, temporal subband decomposition is a simple, low-complexity method and can easily be used in a joint source-channel
coding environment because of the natural ordering in importance of the subbands
[323]. Subband representation is also very convenient for hierarchical decomposition
Figure 7.39 Motion compensation and subband coding: (a) motion estimation (ME) performed on the input sequence, with a motion-compensation loop (MCL) in each of the subbands SB0, . . . , SBN−1; (b) subband decomposition (SB) of the prediction error inside a motion-compensated (MC) prediction loop.
and coding [35] and has been used for compression of HDTV [336].
Motion and Subband Coding Intuitively, instead of lowpass and highpass filtering along the time axis, one should filter along the direction of motion.
Then, motion itself would not create artificial high frequencies as it does in straight
three-dimensional subband coding. This view, although conceptually appealing, is
difficult to translate into practice, except in very limited cases (such as panning,
which corresponds to a single translational motion). In general, there are different
motion trajectories as well as covering and uncovering of background by moving
objects. Thus, subband decomposition along motion trajectories is not a practical
approach (see [167] for further discussions on this topic).
Instead, one has to go back to more traditional motion-compensation techniques
and see how they fit into a subband coding framework or, conversely, how subband
coding can be used within a motion-compensated coder [110]. Consider inclusion of
motion compensation into a subband decomposition. That is, instead of processing
the time axis using Haar filters, we use a motion-compensation loop in each of the
four spatial bands. One advantage is that the four channels are now treated in an
independent fashion. While this scheme should perform better than the straight
three-dimensional decomposition, it also has a number of drawbacks. First, motion
compensation requires motion estimation. If it is done in the subbands, it is less
accurate than the motion estimates obtained from the original sequence. Also,
motion estimation in the high frequency subbands will be difficult. Thus, motion
estimation should probably be done on the original sequence and the estimates
then used in each band after proper rescaling (see Figure 7.39(a)). One of the
attractive features of the original scheme, namely that motion processing is done
in parallel and at a lower resolution, is thus partly lost, since motion estimation
is now shared. Moreover, it is hard to perform motion compensation in the high
frequency subbands, since they mostly consist of edge information and thus slight
motion errors lead to large prediction errors.
As can be seen from the above discussion, motion compensation in the subbands
is not easy. An intuitive explanation is the following: motion, that is, translation of
objects, is a sequence-domain phenomenon. Going to a subband domain is similar to
going into frequency domain, but there, translation is a complex phenomenon, with
different phase factors at different frequencies. This shows that motion estimation
and compensation is more difficult in the subband domain than in the original
sequence domain.
Consider the alternative of using subband decomposition within a motion-compensated coder, as shown in Figure 7.39(b). The subband decomposition is used to decompose the prediction error signal spatially and simply replaces the DCT which is usually present in such a hybrid motion-compensated DCT coder. This approach
was discussed in Section 7.4.2, where we indicated its feasibility, but also some of
its possible shortcomings.
Comparison of Subband and Pyramid Coding for Video Because both sub-
band and pyramid coding of video are three-dimensional multiresolution decom-
positions, it is natural to compare them. A slight disadvantage of pyramid over
subband coding is the oversampling; however, it is small in this three-dimensional
case. Also, the encoding delay is larger in pyramid coding than in subband coding.
On all other counts, pyramid coding turns out to be advantageous when compared to subband coding.
I B1 B2 P1 B3 B4 P2 B5 B6 I
Figure 7.40 A group of pictures (GOP) in the MPEG video coding standard.
I, P, and B stand for intra, predicted and bidirectionally interpolated frames,
respectively. There are nine frames in this GOP, with two B-frames between consecutive anchor (I or P) frames. The arrows show the dependencies between frames.
7.5 Joint Source-Channel Coding
The source coding methods we have discussed so far are used in order to transport
information (such as a video sequence) over a channel with limited capacity (such
as a telephone line which can carry up to 20 Kbits/sec). In many situations, source
coding can be performed separately from channel coding, which is known as the
separation principle of source and channel coding. For example, in a point-to-point
transmission using a known, time-invariant channel such as a telephone line, one
can design the best possible channel coding method to approach channel capacity,
that is, achieve a rate R in bits/sec such that R ≤ C where C is the channel capacity
[258]. Then, the task of the source compression method is to reduce the bit rate so
as to match the rate of the channel.
However, there exist other situations where a separation principle cannot be
used. In particular, when the channel is time-varying and there is a delay con-
straint, or when multiple channels are present as in broadcast or multicast, it can
7.5. JOINT SOURCE-CHANNEL CODING 465
be advantageous to jointly design the source and channel coding so that, for exam-
ple, several transmission rates are possible.
The development of such methods is beyond the scope of this book. As an
example, the case of multiple channels falls into a well studied branch of informa-
tion theory called multiuser information theory [66]. Instead, we will show sev-
eral examples indicating how multiresolution source coding fits naturally into joint
source-channel coding methods. In all these examples, the transmission, or channel
coding, uses a principle we call multiresolution transmission and can be seen as the
dual of multiresolution source coding.
Multiresolution transmission is based on the idea that a transmission system
can operate at different rates, depending on the channel conditions, or that certain
bits will be better protected than others in case of adverse channel conditions. Such
a behavior of the transmission system can be achieved using different techniques,
depending on the transmission media. For example, unequal error protection codes
can be used, thus making certain bits more robust than others in the case of a
noisy channel. The combination of such a transmission scheme with a multires-
olution source coder is very natural. The multiresolution source coder segments
the information into a part which reconstructs a coarse, first approximation of the
signal (such as the lowpass channel in a subband coder) as well as a part which
gives the additional detail signal (typically, the higher frequencies). The coarse
approximation is now sent using the highly protected bits and has a high prob-
ability of arriving successfully, while the detail information will only arrive if the
channel condition is good. The scheme generalizes to more levels of quality in an
obvious manner. This intuitive matching of successive approximation of the source
to different transmission rates, depending on the quality of the channel, is called
multiresolution joint source-channel coding.
Figure 7.41 Digital broadcast. (a) Joint capacity region for two classes of users
with channel capacities C1 and C2 , respectively, and C1 > C2 . Any point on
or below the curves is achievable, but superposition outperforms multiplexing.
(b) Example of a signal constellation (showing amplitudes of cosine and sine
carriers in a digital communication system) using superposition of information.
As can be seen, there are four clouds at four points each. When the channel is
good, 16 points can be distinguished, (or four bits of information), while under
adverse conditions, only the clouds are seen (or two bits of information).
See Figure 7.41(a) for a graphical description of the joint capacity region and Figure 7.41(b) for a typical constellation used in digital transmission, where information for the users with better channels is superimposed over information which can be received by both classes
of users. Now, keeping our multiresolution paradigm in mind, it is clear that we
can send coarse signal information to both classes of users, while superposing detail
information that can be taken by the users with the good channel. In [231], a
digital broadcast system for HDTV was designed using these principles, including
multiresolution video coding [301] and multiresolution transmission with graceful
degradation (using constellations similar to the one in Figure 7.41(b)).
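The clouds-of-points constellation can be sketched as a superposition of a coarse QPSK-like choice of cloud center and a small QPSK-like offset within the cloud. The bit-to-symbol mapping and the spacings d_big and d_small below are illustrative assumptions, not the constellation of [231]:

```python
def superposed_symbol(coarse, detail, d_big=2.0, d_small=0.5):
    """Two coarse bits pick one of four 'cloud' centers (a QPSK base);
    two detail bits pick the point within the cloud."""
    def qpsk(bits, d):
        return d * ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1))
    return qpsk(coarse, d_big) + qpsk(detail, d_small)

def decode(r, d_big=2.0):
    """A good channel resolves all 16 points (four bits); a bad channel still
    resolves the four clouds, i.e., the two coarse bits."""
    coarse = (int(r.real > 0), int(r.imag > 0))
    center = d_big * ((2 * coarse[0] - 1) + 1j * (2 * coarse[1] - 1))
    res = r - center
    return coarse, (int(res.real > 0), int(res.imag > 0))
```

Under mild noise all four bits survive; under strong noise only the cloud, and hence the coarse approximation of the source, is recovered.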
The principles just described can be used for transmission over unknown time-
varying channels. Instead of transmitting assuming the worst case channel, one can
superpose information decodable on a better channel, in case the channel is actually
7.A. STATISTICAL SIGNAL PROCESSING 467
better than worst case. On average, this will be better than simply assuming worst
case all the time. As an example, consider a wireless channel without feedback.
Because of the changing location of the user, the channel can vary greatly, and
the worst case channel can be very poor. Superposition allows delivery of different
levels of quality, depending on how good the reception actually is. When there is
feedback (as in two-way wireless communication), then one can use a channel coding
optimized for the current channel (see [114]). The source coder then has to adapt to
the current transmission rate, which again is easy to achieve using multiresolution
source coding. A study of wireless video transmission using a two resolution video
source coder can be found in [157].
Appendix 7.A Statistical Signal Processing

The probability distribution PX(A) indicates the probability that the random variable X takes on a value in A, where A is a subset of the real line. The cumulative distribution function (cdf) FX is defined as

F_X(α) = P_X(X ≤ α),    α ∈ R.

The probability density function (pdf) is related to the cdf (assuming that FX is differentiable) as

f_X(α) = dF_X(α)/dα,    α ∈ R,

and thus

F_X(α) = ∫_{−∞}^{α} f_X(x) dx,    α ∈ R.
For a vector random variable X = (X0, X1, . . . , Xk−1), the joint pdf is obtained from the joint cdf as

f_X(α) = ∂^k F_X(α_0, α_1, . . . , α_{k−1}) / (∂α_0 ∂α_1 · · · ∂α_{k−1}).

The random variables are independent if the joint pdf factors into the product of the marginal pdf's,

f_{X_0 X_1 ··· X_{k−1}}(x_0, x_1, . . . , x_{k−1}) = f_{X_0}(x_0) · f_{X_1}(x_1) · · · f_{X_{k−1}}(x_{k−1}). (7.A.1)
In particular, if each random variable has the same distribution, then we have an
independent and identically distributed (iid) random vector.
Intuitively, a discrete-time random process is the infinite-dimensional general-
ization of a vector random variable. Therefore, any finite subset of random variables
from a random process is a vector random variable.
An important class of vector random variables is the Gaussian vector random variable
of dimension k. To define its pdf, we need a length-k vector m and a positive definite matrix
Λ of size k × k. Then, the k-dimensional Gaussian pdf is given by
f(x) = (2π)^{−k/2} (det Λ)^{−1/2} e^{−(x−m)^T Λ^{−1} (x−m)/2},    x ∈ R^k. (7.A.2)
Note how, for k = 1 and Λ = σ², this reduces to the usual Gaussian (normal) distribution

f(x) = (1/√(2πσ²)) e^{−(x−m)²/2σ²},    x ∈ R.
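A direct evaluation of (7.A.2) can be sketched as follows (the function name is our own):

```python
import numpy as np

def gaussian_pdf(x, m, L):
    """k-dimensional Gaussian pdf (7.A.2) with mean vector m, covariance L."""
    x, m = np.asarray(x, float), np.asarray(m, float)
    k = x.size
    d = x - m
    quad = d @ np.linalg.solve(L, d)               # (x-m)^T L^{-1} (x-m)
    return (2 * np.pi) ** (-k / 2) / np.sqrt(np.linalg.det(L)) * np.exp(-quad / 2)
```

For k = 1 this reduces to the scalar Gaussian, and for a diagonal Λ it factors into a product of scalar Gaussians, illustrating (7.A.1) for independent components.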
From (7.A.1) we see that independent variables are uncorrelated (but uncorrelatedness is not sufficient for independence). Sometimes, the "centralized" correlation, or covariance, is used, namely

cov(X, Y) = E((X − E(X))(Y − E(Y))) = E(XY) − E(X)E(Y),
from which it follows that two random variables are uncorrelated if and only if their
covariance is zero. The variance of X, denoted by σ_X², equals cov(X, X), that is,

σ_X² = E((X − E(X))²),
and its square root σX is called the standard deviation of X. Higher-order moments
are obtained from E(X^k), k > 2. The above functions can be extended to random processes. The autocorrelation function of a process {Xn, n ∈ Z} is defined by

R_X[n, m] = E(X_n X_m),    n, m ∈ Z,

and the covariance function by

K_X[n, m] = cov(X_n, X_m) = R_X[n, m] − E(X_n)E(X_m),    n, m ∈ Z.
An important class of processes are stationary random processes, for which the probabilistic behavior is constant over time. In particular, the expected value E(X_n) is then independent of n, and by the same token, all other moments are independent of n. Also, correlation and covariance depend only on the difference (n − m), or

R_X[n, m] = R_X[n − m],    K_X[n, m] = K_X[n − m]. (7.A.5)

Consider now filtering a process {X_n} with a causal filter with impulse response h[n],

Y_n = Σ_{k=0}^{∞} h[k] X_{n−k},

whose expected value is

E(Y_n) = Σ_{k=0}^{∞} h[k] m_{n−k}, (7.A.7)

where ml is the expected value of Xl. Note that if the input is wide-sense stationary, that is, E(Xn) = E(X) for all n, then the output has a constant expected value equal to E(X) Σ_{k=0}^{∞} h[k]. It can be shown that the covariance function of the output depends also only on the difference n − m (as in (7.A.5)) and thus, filtering by a linear time-invariant system conserves wide-sense stationarity (see Problem 7.9).
When considering filtered wide-sense stationary processes, it is useful to intro-
duce the power spectral density function (psdf), which is the discrete-time Fourier
transform of the autocorrelation function
S_X(e^{jω}) = Σ_{n=−∞}^{∞} R_X[n] e^{−jωn}.
Then, it can be shown that the psdf of the output process after filtering with h[n]
equals
S_Y(e^{jω}) = |H(e^{jω})|² S_X(e^{jω}), (7.A.8)
where H(e^{jω}) is the discrete-time Fourier transform of h[n]. Note that when the input is uncorrelated, that is, R_X[n] = E(X²) δ[n], then the output autocorrelation is simply the autocorrelation of the filter, or R_Y[n] = E(X²) ⟨h[k], h[k + n]⟩, as can be seen from (7.A.8). If we define the crosscorrelation function

R_XY[m] = E(X[n] Y[n + m]),

then its Fourier transform satisfies S_XY(e^{jω}) = H(e^{jω}) S_X(e^{jω}). Again, when the input is uncorrelated, this can be used to measure H(e^{jω}).
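This relation is easy to check numerically: for a white input of unit variance, the output autocorrelation is the deterministic autocorrelation of the filter, and its DFT must match |H|² on the same frequency grid. The helper below is our own illustration:

```python
import numpy as np

def psd_relation_check(h, N=64):
    """Return (DFT of the filter's deterministic autocorrelation, |H|^2) on an
    N-point grid; by (7.A.8) with unit-variance white input they coincide."""
    r_h = np.correlate(h, h, mode='full')          # <h[k], h[k+n]>, n = -(L-1)..L-1
    r_circ = np.zeros(N)
    for n, v in enumerate(r_h, start=-(len(h) - 1)):
        r_circ[n % N] = v                          # place r_h[n] circularly
    return np.fft.fft(r_circ), np.abs(np.fft.fft(h, N)) ** 2
```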
An important application of filtering is in linear estimation. The simplest linear
estimation problem is when we have two random variables X and Y , both with zero
mean. We wish to find an estimate X̂ of the form X̂ = αY from the observation
Y, such that the mean square error (MSE) E((X − X̂)²) is minimized. It is easy
to verify that
α = E(XY) / E(Y²),
minimizes the expected squared error. One distinctive feature of the MSE esti-
mate is that the estimation error (X − X̂) is orthogonal (in expected value) to the
observation Y, that is,

E((X − X̂) Y) = 0.
This is known as the orthogonality principle: The best linear estimate in the MSE
sense is the orthogonal projection of X onto the span of Y . It follows that the
minimum MSE is
E((X − X̂)²) = E(X²) − α² E(Y²),
because of orthogonality of (X − X̂) and Y . This geometric view follows from
the interpretation of E(XY) as an inner product and thus E(X²) is the squared length of the vector X. Similarly, orthogonality of X and Y is seen as E(XY) = 0.
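With sample averages in place of expectations, the same construction gives exact orthogonality of the error to the observation. A small sketch, with our own function name:

```python
import numpy as np

def best_linear_estimate(x, y):
    """Sample version of alpha = E(XY)/E(Y^2).  By construction the error
    x - alpha*y is exactly orthogonal (in sample average) to the observation."""
    alpha = np.dot(x, y) / np.dot(y, y)
    return alpha, x - alpha * y
```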
Based on this powerful geometric point of view, let us tackle a more general linear
estimation problem. Assume two zero-mean jointly wide-sense stationary processes
{X[n]} and {Y [n]}. We want to estimate X[n] from Y [n] using a filter with the
impulse response h[n], that is
X̂[n] = Σ_k h[k] Y[n − k], (7.A.10)

where the sum runs over an index set K of samples allowed for the estimation. Requiring the error to be orthogonal to each observation sample used yields

Σ_k h[k] R_Y[m − k] = R_XY[m],    m ∈ K. (7.A.11)

In particular, when there is no restriction on the set of samples {Y[n]} used for the estimation, that is, K = Z, then we can take the Fourier transform of (7.A.11) to find
H(e^{jω}) = S_XY(e^{jω}) / S_Y(e^{jω}),
which is the optimal linear estimator. Note that this is in general a noncausal
filter. Finding a causal solution (K = (−∞, n]) is more involved [122], but the
orthogonality principle is preserved.
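As an illustration, consider the common special case Y[n] = X[n] + W[n] with signal and noise independent and zero-mean, so that S_XY = S_X and S_Y = S_X + S_W; the spectra below are made-up examples, not taken from the text.

```python
import numpy as np

def wiener_filter(S_x, S_w):
    """Noncausal Wiener filter for Y[n] = X[n] + W[n] with independent signal
    and noise: S_XY = S_x and S_Y = S_x + S_w, so H = S_x / (S_x + S_w)."""
    return S_x / (S_x + S_w)

# Example spectra on an N-point frequency grid (assumed shapes, for illustration).
N = 32
w = 2 * np.pi * np.arange(N) / N
S_x = 1.0 / (1.1 - np.cos(w))      # a lowpass-looking signal spectrum
S_w = np.full(N, 0.5)              # white noise
H = wiener_filter(S_x, S_w)
```

The filter passes frequencies where the signal dominates and attenuates those where the noise dominates, as expected.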
This concludes our brief overview of statistical signal processing. One more
topic, namely the discrete-time Karhunen-Loève transform, is discussed in the main
text, in Section 7.1, since it lays the foundation for transform-based signal compres-
sion.
PROBLEMS 473
P ROBLEMS
7.1 For a uniform input pdf, as well as uniform quantization, prove that the distortion between
the input and the output of the quantizer is given by (7.1.14), that is
D = Δ² / 12,
where Δ is the quantizer step size Δ = (b − a)/N , a, b are the boundaries of the input, and
N is the number of intervals.
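As a numerical sanity check of this result (not a proof), one can approximate the distortion of a midpoint-reconstruction uniform quantizer on a dense grid of inputs; the function name and grid density are our choices.

```python
import numpy as np

def uniform_quantizer_mse(a, b, N, samples=1_000_000):
    """Mean squared error of an N-level uniform quantizer with midpoint
    reconstruction, for a uniform input over [a, b], approximated by a
    dense grid of input values."""
    delta = (b - a) / N
    x = np.linspace(a, b - 1e-9, samples)
    q = a + (np.floor((x - a) / delta) + 0.5) * delta   # midpoint of each bin
    return np.mean((x - q) ** 2)
```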
7.2 Coding gain as a function of number of channels: Consider the coding gain of an ideal filter
bank with N channels (see Section 7.1.2).
(a) Construct a simple example where the coding gain for a 2-channel system is bigger
than the coding gain for a 3-channel system. Hint: Construct a piecewise constant
power spectrum for which the 2-channel system is better matched than the 3-channel
system.
(b) For the example constructed above, show that a 4-channel system outperforms both
the 2- and 3-channel systems.
7.3 Consider the coding gain (see Section 7.1.2) in an ideal subband coding system with N
channels (the filters used are ideal bandpass filters). Start with the case N = 2 before
looking at the general case.
(a) Assume that the power spectrum of the input signal |X(e^{jω})|² is given by

|X(e^{jω})|² = 1 − |ω|/π,    |ω| ≤ π.
Give the coding gain as a function of N .
(b) Same as above, but with
7.4 Huffman and run-length coding: A stream of symbols has the property that stretches of zeros are likely. Thus, one can code the length of the stretch of zeros, after a special "start of run" (SR) symbol.
(c) As an example, take a typical sequence, including stretches of zeros, and encode it,
then decode it, with your Huffman code (small example). Can you decode your bit
stream?
(d) Give the average compression of this run-length and Huffman coding scheme.
7.5 Consider a pyramid coding scheme as discussed in Section 7.3.2. Assume a one-dimensional
signal and an ideal lowpass filter both for coarse-to-fine and fine-to-coarse resolution change.
7.6 Consider the embedded zero tree wavelet (EZW) transform algorithm discussed in Sec-
tion 7.3.4, and study a one-dimensional version.
(a) Assume a one-dimensional octave-band filter bank and define a zero tree for this case.
Compare to the two-dimensional case. Discuss if the dominant and subordinate passes
of the EZW algorithm have to be modified, and if so, how.
(b) One can define a zero tree for arbitrary subband decomposition trees (or wavelet
packets). In which case is the zero tree most powerful?
(c) In the case of a full tree subband decomposition in two dimensions (for example, of
depth 3, leading to 64 channels), compare the zero tree structure with zig-zag scanning
used in DCT.
7.7 Quincunx downsampling for scanning format conversion (see Section 7.4.3):

(a) Verify that the filters given in (7.4.3) form a perfect reconstruction filter bank for quincunx downsampling and give the reconstruction filters as well.
(b) Show that cascading the quincunx decomposition twice on a progressive sequence (on
the vertical-time dimension) yields again a progressive sequence, with an intermediate
interlaced sequence. Use the downsampling matrix
    ⎛  1  1 ⎞
D = ⎝ −1  1 ⎠ .
7.8 Consider a two-channel filter bank for three-dimensional signals (progressive video sequences)
using FCO downsampling (see Section 7.4.4).
(a) Show that this corresponds to an orthogonal Haar decomposition for FCO downsampling.
(b) Give the output of a two-channel analysis/synthesis system with FCO downsampling
as a function of the input, the aliased version, and the filters.
7.9 Filtering of wide-sense stationary processes:

(a) In Appendix 7.A, we saw that the mean of {y[n]} is independent of n (see below Equation (7.A.7)). Show that the covariance function of {y[n]}, KY[n, m] = cov(y[n], y[m]), is a function of (n − m) only, and given by
K_Y[k] = Σ_{n=0}^{∞} Σ_{m=0}^{∞} h[n] h[m] K_X[k − (n − m)].
(b) Show that the crosscovariance between the input and the output is given by

K_XY[m] = Σ_{k=0}^{∞} h[k] K_X[m − k].
(c) Consider now one-sided wide-sense stationary processes, which can be thought of as
wide-sense stationary processes that are “turned on” at time 0. Consider filtering of
such processes by causal FIR and IIR filters, respectively. What can be said about
E(Y [n]) n ≥ 0 in these cases?
Projects: The following problems are computer-based projects with an experimental flavor.
Access to adequate data (images, video) is helpful.
7.10 Coding gain and R(d) optimal filters for subband coding: Consider a two-band perfect re-
construction subband coder with orthogonal filters in lattice structure. As an input, use a
first-order Markov process with high correlation (ρ = 0.9). For small filter lengths (L = 4, 6
or so), optimize the lattice coefficients so as to maximize coding gain or minimize first-order
entropy after uniform scalar quantization. Find what filter is optimal, and try for fine and
coarse quantization steps.
Use optimal bit allocation between the two channels, if possible. The same idea can be
extended to Lloyd-Max quantization, and to logarithmic trees. This project requires some
experience with coding algorithms. For relevant literature, see [79, 109, 244, 295].
476 BIBLIOGRAPHY
7.11 Pyramids using nonlinear operators: One of the attractive features of pyramid coding schemes
over critically sampled coding schemes is that nonlinear operators can be used. The goal of
the project is to investigate the use of median filters (or some other nonlinear operators) in
a pyramidal scheme.
The results could be theoretical or experimental. The project requires image processing
background. For relevant literature, see [41, 138, 303, 323].
7.12 Motion compensation of motion vectors: In video coding, motion compensation is used to
predict a new frame from reconstructed previous frames. Usually, a sparse set of motion
vectors is used (such as one per 8 × 8 block), and thus, sending motion vectors contributes
little to the bit rate overhead. An alternative scheme could use a dense motion vector field
in order to reduce the prediction error. In order to reduce the overhead, predict the motion
vector field, since it is usually not changing radically in time within a video scene. Thus,
the aim of the project is to treat the motion vector field as a sequence (of vectors), and find
a meta-motion vector field to predict the actual motion vector field (for example, per block
of 2×2 motion vectors).
This project requires image/video processing background. For more literature on motion
estimation, see [138, 207].
7.13 Adaptive Karhunen-Loève transform: The Karhunen-Loève transform is optimal for energy
packing of stationary processes, and under certain conditions, for transform coding and
quantization of such processes. However, if the process is nonstationary, compression might
be improved by using an adaptive transform. An interesting solution is an overhead free
transform which is derived from the coded version of the signal, based on some estimate of
local correlations.
The goal of the project is to explore such an adaptive transform on some synthetic nonsta-
tionary signals, as well as on real signals (such as speech).
This project requires good signal processing background. For more literature, see [143].
7.14 Three-dimensional wavelet coding: In medical imaging and remote sensing, one often en-
counters three-dimensional data. For example, multispectral satellite imagery consists of
many spectral band images. Develop a simple three-dimensional coding algorithm based on
the Haar filters, and iteration on the lowpass channel. This is the three-dimensional equiv-
alent of the octave-band subband coding of images discussed in Section 7.3.3. Apply your
algorithm to real imagery if available, or generate synthetic data with a lowpass nature.
Bibliography
[43] P. M. Cassereau. A new class of optimal unitary transforms for image processing.
Master’s thesis, Massachusetts Institute of Technology, May 1985.
[44] P. M. Cassereau, D. H. Staelin, and G. de Jager. Encoding of images based on a
lapped orthogonal transform. IEEE Trans. Commun., 37:189–193, February 1989.
[45] A. S. Cavaretta, W. Dahmen, and C. Micchelli. Stationary subdivision. Mem. Amer.
Math. Soc., 93:1–186, 1991.
[46] D. C. Champeney. A Handbook of Fourier Theorems. Cambridge University Press,
Cambridge, UK, 1987.
[47] T. Chen and P. P. Vaidyanathan. Multidimensional multirate filters and filter banks
derived from one-dimensional filters. IEEE Trans. Signal Proc., 41(5):1749–1765,
May 1993.
[48] T. Chen and P. P. Vaidyanathan. Recent developments in multidimensional multirate
systems. IEEE Trans. on CSVT, 3(2):116–137, April 1993.
[49] C. K. Chui. An Introduction to Wavelets. Academic Press, New York, 1992.
[50] C. K. Chui. On cardinal spline wavelets. In Ruskai et al., editor, Wavelets and Their
Applications, pages 419–438. Jones and Bartlett, MA, 1992.
[51] C. K. Chui, editor. Wavelets: A Tutorial in Theory and Applications. Academic
Press, New York, 1992.
[52] C. K. Chui and J. Z. Wang. A cardinal spline approach to wavelets. Proc. Amer.
Math. Soc., 113:785–793, 1991.
[53] T. A. C. M. Claasen and W. F. G. Mecklenbräuker. The Wigner distribution - a tool
for time-frequency signal analysis, Part I, II, and III. Philips Journal of Research,
35(3, 4/5, 6):217–250, 276–300, 372–389, 1980.
[54] R. J. Clarke. Transform Coding of Images. Academic Press, London, 1985.
[55] A. Cohen. Ondelettes, Analyses Multiresolutions et Traitement Numérique du Signal.
PhD thesis, Université Paris IX Dauphine, Paris, France, 1990.
[56] A. Cohen. Biorthogonal wavelets. In C. K. Chui, editor, Wavelets: A Tutorial in
Theory and Applications. Academic Press, New York, 1992.
[57] A. Cohen and I. Daubechies. Nonseparable bidimensional wavelet bases. Rev. Mat.
Iberoamericana, 9(1):51–137, 1993.
[58] A. Cohen, I. Daubechies, and J.-C. Feauveau. Biorthogonal bases of compactly
supported wavelets. Commun. on Pure and Appl. Math., 45:485–560, 1992.
[59] L. Cohen. Time-frequency distributions: A review. Proc. IEEE, 77(7):941–981, July
1989.
[60] L. Cohen. The scale representation. IEEE Trans. on Signal Proc., Special Issue on
Wavelets and Signal Processing, 41(12):3275–3292, December 1993.
[61] R. R. Coifman and Y. Meyer. Remarques sur l’analyse de Fourier à fenêtre. C.R.
Acad. Sci., pages 259–261, 1991.
[109] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer
Academic Publishers, Boston, MA, 1992.
[110] H. Gharavi. Subband coding of video signals. In J. W. Woods, editor, Subband Image
Coding. Kluwer Academic Publishers, Boston, MA, 1990.
[111] H. Gharavi and A. Tabatabai. Subband coding of monochrome and color images.
IEEE Trans. Circ. and Syst., 35(2):207–214, February 1988.
[112] A. Gilloire and M. Vetterli. Adaptive filtering in subbands with critical sampling:
analysis, experiments, and application to acoustic echo cancellation. IEEE Trans.
Signal Proc., 40(8):1862–1875, August 1992.
[113] I. Gohberg and S. Goldberg. Basic Operator Theory. Birkhäuser, Boston, MA, 1981.
[114] A. J. Goldsmith and P. P. Varaiya. Capacity of time-varying channels with estimation
and feedback. To appear, IEEE Trans. on Inform. Theory.
[115] R. Gopinath. Wavelet and Filter Banks — New Results and Applications. PhD
thesis, Rice University, 1992.
[116] R. A. Gopinath and C. S. Burrus. Wavelet-based lowpass/bandpass interpolation.
In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., pages 385–388, San
Francisco, CA, March 1992.
[117] R. A. Gopinath and C. S. Burrus. Wavelet transforms and filter banks. In C. K. Chui,
editor, Wavelets: A Tutorial in Theory and Applications, pages 603–654. Academic
Press, New York, 1992.
[118] A. Goshtasby, F. Cheng, and B. Barsky. B-spline curves and surfaces viewed as
digital filters. Computer Vision, Graphics, and Image Processing, 52(2):264–275,
November 1990.
[119] P. Goupillaud, A. Grossman, and J. Morlet. Cycle-octave and related transforms in
seismic signal analysis. Geoexploration, 23:85–102, 1984/85. Elsevier Science Pub.
[120] R. M. Gray. Vector quantization. IEEE ASSP Magazine, 1:4–29, April 1984.
[121] R. M. Gray. Source Coding Theory. Kluwer Academic Publishers, Boston, MA, 1990.
[122] R. M. Gray and L. D. Davisson. Random Processes: A Mathematical Approach for
Engineers. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[123] K. Gröchenig and W. R. Madych. Multiresolution analysis, Haar bases and self-
similar tilings of Rn. IEEE Trans. on Inform. Theory, Special Issue on Wavelet
Transforms and Multiresolution Signal Analysis, 38(2):556–568, March 1992.
[124] A. Grossmann, R. Kronland-Martinet, and J. Morlet. Reading and under-
standing continuous wavelet transforms. In J. M. Combes, A. Grossmann, and
Ph. Tchamitchian, editors, Wavelets, Time-Frequency Methods and Phase Space.
Springer-Verlag, Berlin, 1989.
[125] A. Grossmann and J. Morlet. Decomposition of Hardy functions into square inte-
grable wavelets of constant shape. SIAM Journ. of Math. Anal., 15(4):723–736, July
1984.
[126] A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Math. Annal., 69:331–
371, 1910.
[127] P. Haskell and D. Messerschmitt. Open network architecture for continuous-media
services: the medley gateway. Technical report, Dept. of EECS, January 1994.
[128] C. Heil and D. Walnut. Continuous and discrete wavelet transforms. SIAM Rev.,
31:628–666, 1989.
[129] P. N. Heller and H. W. Resnikoff. Regular M-band wavelets and applications. In Proc.
IEEE Int. Conf. Acoust., Speech, and Signal Proc., pages III: 229–232, Minneapolis,
MN, April 1993.
[130] C. Herley. Wavelets and Filter Banks. PhD thesis, Columbia University, 1993.
[131] C. Herley. Exact interpolation and iterative subdivision schemes. IEEE Trans. Signal
Proc., 1995.
[132] C. Herley, J. Kovačević, K. Ramchandran, and M. Vetterli. Tilings of the time-
frequency plane: Construction of arbitrary orthogonal bases and fast tiling algo-
rithms. IEEE Trans. on Signal Proc., Special Issue on Wavelets and Signal Process-
ing, 41(12):3341–3359, December 1993.
[133] C. Herley and M. Vetterli. Wavelets and recursive filter banks. IEEE Trans. Signal
Proc., 41(8):2536–2556, August 1993.
[134] O. Herrmann. On the approximation problem in nonrecursive digital filter design.
IEEE Trans. Circuit Theory, 18:411–413, 1971.
[135] F. Hlawatsch and F. Boudreaux-Bartels. Linear and quadratic time-frequency signal
representations. IEEE SP Mag., 9(2):21–67, April 1992.
[136] M. Holschneider, R. Kronland-Martinet, J. Morlet, and Ph. Tchamitchian. A real-
time algorithm for signal analysis with the help of the wavelet transform. In Wavelets,
Time-Frequency Methods and Phase Space, pages 289–297. Springer-Verlag, Berlin,
1989.
[137] M. Holschneider and P. Tchamitchian. Pointwise analysis of Riemann's “non-
differentiable” function. Inventiones Mathematicae, 105:157–175, 1991.
[138] A. K. Jain. Fundamentals of Digital Image Processing. Prentice-Hall, Englewood
Cliffs, NJ, 1989.
[139] A. J. E. M. Janssen. Note on a linear system occurring in perfect reconstruction.
Signal Proc., 18(1):109–114, 1989.
[140] B. Jawerth and W. Sweldens. An overview of wavelet based multiresolution analyses.
SIAM Review, 36(3):377–412, September 1994.
[141] N. S. Jayant. Signal compression: technology targets and research directions. IEEE
Journ. on Sel. Areas in Commun., 10(5):796–818, June 1992.
[142] N. S. Jayant, J. D. Johnston, and R. J. Safranek. Signal compression based on models
of human perception. Proc. IEEE, 81(10):1385–1422, October 1993.
[230] K. Ramchandran. Joint Optimization Techniques for Image and Video Coding and
Applications to Digital Broadcast. PhD thesis, Columbia University, June 1993.
[231] K. Ramchandran, A. Ortega, K. M. Uz, and M. Vetterli. Multiresolution broadcast
for digital HDTV using joint source-channel coding. IEEE JSAC, 11(1):6–23, January
1993.
[232] K. Ramchandran, A. Ortega, and M. Vetterli. Bit allocation for dependent quanti-
zation with applications to multiresolution and MPEG video coders. IEEE Trans.
Image Proc., 3(5):533–545, September 1994.
[233] K. Ramchandran and M. Vetterli. Best wavelet packet bases in a rate-distortion
sense. IEEE Trans. Image Proc., 2(2):160–175, April 1993.
[234] T. A. Ramstad. IIR filter bank for subband coding of images. In Proc. IEEE Int.
Symp. Circ. and Syst., pages 827–830, Helsinki, Finland, 1988.
[235] T. A. Ramstad. Cosine modulated analysis-synthesis filter bank with critical sam-
pling and perfect reconstruction. In Proc. IEEE Int. Conf. Acoust. Speech and Signal
Processing, pages 1789–1792, Toronto, Canada, May 1991.
[236] T. A. Ramstad and T. Saramäki. Efficient multirate realization for narrow transition-
band FIR filters. In Proc. IEEE Int. Symp. Circ. and Syst., pages 2019–2022,
Helsinki, Finland, 1988.
[237] N. Ricker. The form and laws of propagation of seismic wavelets. Geophysics, 18:10–
40, 1953.
[238] O. Rioul. Les Ondelettes. Mémoires d’Option, Dept. de Math. de l’Ecole Polytech-
nique, 1987.
[239] O. Rioul. Simple regularity criteria for subdivision schemes. SIAM J. Math Anal.,
23:1544–1576, November 1992.
[240] O. Rioul. A discrete-time multiresolution theory. IEEE Trans. Signal Proc.,
41(8):2591–2606, August 1993.
[241] O. Rioul. Note on frequency localization and regularity. CNET memorandum, 1993.
[242] O. Rioul. On the choice of wavelet filters for still image compression. In Proc. IEEE
Int. Conf. Acoust., Speech, and Signal Proc., pages V: 550–553, Minneapolis, MN,
April 1993.
[243] O. Rioul. Ondelettes Régulières: Application à la Compression d’Images Fixes. PhD
thesis, ENST, Paris, March 1993.
[244] O. Rioul. Regular wavelets: A discrete-time approach. IEEE Trans. on Signal Proc.,
Special Issue on Wavelets and Signal Processing, 41(12):3572–3578, December 1993.
[245] O. Rioul and P. Duhamel. Fast algorithms for discrete and continuous wavelet trans-
forms. IEEE Trans. on Inform. Theory, Special Issue on Wavelet Transforms and
Multiresolution Signal Analysis, 38(2):569–586, March 1992.
[246] O. Rioul and P. Duhamel. A Remez exchange algorithm for orthonormal wavelets.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing,
41(8):550–560, August 1994.
[247] O. Rioul and M. Vetterli. Wavelets and signal processing. IEEE SP Mag., 8(4):14–38,
October 1991.
[248] E. A. Robinson. Random Wavelets and Cybernetic Systems. Griffin and Co., London,
1962.
[249] A. Rosenfeld, editor. Multiresolution Techniques in Computer Vision. Springer-
Verlag, New York, 1984.
[250] H. L. Royden. Real Analysis. MacMillan, New York, 1968.
[251] M. B. Ruskai, G. Beylkin, R. Coifman, I. Daubechies, S. Mallat, Y. Meyer, and
L. Raphael, editors. Wavelets and their Applications. Jones and Bartlett, Boston,
1992.
[252] R. J. Safranek and J. D. Johnston. A perceptually tuned sub-band image coder with
image dependent quantization and post-quantization data compression. Proc. IEEE
Int. Conf. Acoust., Speech, and Signal Proc., M(11.2):1945–1948, 1989.
[253] N. Saito and G. Beylkin. Multiresolution representation using the auto-correlation
functions of compactly supported wavelets. IEEE Trans. on Signal Proc., Special
Issue on Wavelets and Signal Processing, 41(12):3584–3590, December 1993.
[254] B. Scharf. Critical bands. In Foundations in Modern Auditory Theory, pages 150–202.
Academic, New York, 1970.
[255] I. J. Schoenberg. Contribution to the problem of approximation of equidistant data
by analytic functions. Quart. Appl. Math., 4:112–141, 1946.
[256] T. Senoo and B. Girod. Vector quantization for entropy coding of image subbands.
IEEE Trans. on Image Proc., 1(4):526–532, October 1992.
[257] I. Shah and A. Kalker. Theory and design of multidimensional QMF sub-band
filters from 1-D filters and polynomials using transforms. Proceedings of the IEE,
140(1):67–71, February 1993.
[258] C. E. Shannon. Communications in the presence of noise. Proc. of the IRE, 37:10–21,
January 1949.
[259] J. M. Shapiro. An embedded wavelet hierarchical image coder. In Proc. IEEE Int.
Conf. Acoust., Speech, and Signal Proc., pages 657–660, San Francisco, March 1992.
[260] J. M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE
Trans. on Signal Proc., Special Issue on Wavelets and Signal Processing, 41(12):3445–
3462, December 1993.
[261] M. J. Shensa. The discrete wavelet transform: Wedding the à trous and Mallat
algorithms. IEEE Trans. Signal Proc., 40(10):2464–2482, October 1992.
[262] Y. Shoham and A. Gersho. Efficient bit allocation for an arbitrary set of quantizers.
IEEE Trans. Acoust., Speech, and Signal Proc., 36(9):1445–1453, September 1988.
[263] J. J. Shynk. Frequency-domain and multirate adaptive filtering. IEEE Signal Pro-
cessing Magazine, 9:14–37, January 1992.
[279] G. Stoll and F. Dehery. High quality audio bit rate reduction family for different
applications. Proc. IEEE Int. Conf. Commun., pages 937–941, April 1990.
[280] G. Strang. Linear Algebra and Its Applications, Third Edition. Harcourt Brace
Jovanovich, San Diego, CA, 1988.
[281] G. Strang. Wavelets and dilation equations: a brief introduction. SIAM Review,
31:614–627, 1989.
[282] G. Strang and G. J. Fix. An Analysis of the Finite Element Method. Prentice-Hall,
Englewood-Cliffs, NJ, 1973.
[283] J.-O. Stromberg. A modified Franklin system and higher order spline systems on Rn
as unconditional bases for Hardy spaces. In W. Beckner et al., editors, Proc. of Conf.
in honour of A. Zygmund, pages 475–493. Wadsworth Mathematics series, 1982.
[284] J.-O. Stromberg. A modified Franklin system as the first orthonormal system of
wavelets. In Y. Meyer, editor, Wavelets and Applications, pages 434–442. Masson,
Paris, 1991.
[285] N. Tanabe and N. Farvardin. Subband image coding using entropy-coded quantiza-
tion over noisy channels. IEEE Journ. on Sel. Areas in Commun., 10(5):926–942,
June 1992.
[286] D. Taubman and A. Zakhor. Multi-rate 3-D subband coding of video. IEEE
Trans. Image Processing, Special issue on Image Sequence Compression, 3(5):572–
588, September 1994.
[287] D. Taubman and A. Zakhor. Orientation adaptive subband coding of images. IEEE
Trans. Image Processing, 3(4):421–437, July 1994.
[288] D. B. H. Tay and N. G. Kingsbury. Flexible design of multidimensional perfect
reconstruction FIR 2-band filters using transformations of variables. IEEE Trans.
Image Proc., 2(4):466–480, October 1993.
[289] P. Tchamitchian. Biorthogonalité et théorie des opérateurs. Revista Mathemática
Iberoamericana, 3(2):163–189, 1987.
[290] C. C. Todd, G. A. Davidson, M. F. Davis, L. D. Fielder, B. D. Link, and S. Vernon.
AC-3: Flexible perceptual coding for audio transmission and storage. In Convention
of the AES, Amsterdam, February 1994.
[291] B. Torrésani. Wavelets associated with representations of the affine Weyl-Heisenberg
group. J. Math. Physics, 32:1273, 1991.
[292] M. K. Tsatsanis and G. B. Giannakis. Principal component filter banks for optimal
wavelet analysis. In Proc. 6th Signal Processing Workshop on Statistical Signal and
Array Processing, pages 193–196, Victoria, B.C., Canada, 1992.
[293] F. B. Tuteur. Wavelet transformations in signal detection. In J. M. Combes, A. Gross-
mann, and Ph. Tchamitchian, editors, Wavelets, Time-Frequency Methods and Phase
Space. Springer-Verlag, Berlin, 1989.
[294] M. Unser. On the approximation of the discrete Karhunen-Loeve transform for
stationary processes. Signal Proc., 5(3):229–240, May 1983.
[295] M. Unser. On the optimality of ideal filters for pyramid and wavelet signal ap-
proximation. IEEE Trans. on Signal Proc., Special Issue on Wavelets and Signal
Processing, 41(12):3591–3595, December 1993.
[296] M. Unser and A. Aldroubi. Polynomial splines and wavelets: a signal processing
perspective. In C. K. Chui, editor, Wavelets: a Tutorial in Theory and Applications,
pages 91–122. Academic Press, San Diego, CA, 1992.
[297] M. Unser, A. Aldroubi, and M. Eden. On the asymptotic convergence of B-spline
wavelets to Gabor functions. IEEE Trans. on Inform. Theory, Special Issue on
Wavelet Transforms and Multiresolution Signal Analysis, 38(2):864–871, March 1992.
[298] M. Unser, A. Aldroubi, and M. Eden. B-spline signal processing, part I and II. IEEE
Trans. Signal Proc., 41(2):821–833 and 834–848, February 1993.
[299] M. Unser, A. Aldroubi, and M. Eden. A family of polynomial spline wavelet trans-
forms. Signal Proc., 30(2):141–162, January 1993.
[300] M. Unser, A. Aldroubi, and M. Eden. Enlargement or reduction of digital images
with minimum loss of information. IEEE Trans. Image Proc., pages 247–258, March
1995.
[301] K. M. Uz. Multiresolution Systems for Video Coding. PhD thesis, Columbia Univer-
sity, New York, May 1992.
[302] K. M. Uz, M. Vetterli, and D. LeGall. A multiresolution approach to motion estima-
tion and interpolation with application to coding of digital HDTV. In Proc. IEEE
Int. Symp. Circ. and Syst., pages 1298–1301, New Orleans, May 1990.
[303] K. M. Uz, M. Vetterli, and D. LeGall. Interpolative multiresolution coding of ad-
vanced television with compatible subchannels. IEEE Trans. on CAS for Video
Technology, Special Issue on Signal Processing for Advanced Television, 1(1):86–99,
March 1991.
[304] P. P. Vaidyanathan. The discrete time bounded-real lemma in digital filtering.
IEEE Trans. Circ. and Syst., 32(9):918–924, September 1985.
[305] P. P. Vaidyanathan. Quadrature mirror filter banks, M-band extensions and perfect
reconstruction techniques. IEEE ASSP Mag., 4(3):4–20, July 1987.
[306] P. P. Vaidyanathan. Theory and design of M-channel maximally decimated quadra-
ture mirror filters with arbitrary M, having the perfect reconstruction property. IEEE
Trans. Acoust., Speech, and Signal Proc., 35(4):476–492, April 1987.
[307] P. P. Vaidyanathan. Multirate digital filters, filter banks, polyphase networks, and
applications: a tutorial. Proc. IEEE, 78(1):56–93, January 1990.
[308] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice-Hall, Englewood
Cliffs, NJ, 1993.
[309] P. P. Vaidyanathan and Z. Doǧanata. The role of lossless systems in modern digital
signal processing: A tutorial. IEEE Trans. Educ., 32(3):181–197, August 1989.
[310] P. P. Vaidyanathan and P.-Q. Hoang. Lattice structures for optimal design and ro-
bust implementation of two-channel perfect reconstruction filter banks. IEEE Trans.
Acoust., Speech, and Signal Proc., 36(1):81–94, January 1988.
[311] P. P. Vaidyanathan and S. K. Mitra. Polyphase networks, block digital filtering,
LPTV systems, and alias-free QMF banks: a unified approach based on pseudo-
circulants. IEEE Trans. Acoust., Speech, and Signal Proc., 36:381–391, March 1988.
[312] P. P. Vaidyanathan, T. Q. Nguyen, Z. Doǧanata, and T. Saramäki. Improved tech-
nique for design of perfect reconstruction FIR QMF banks with lossless polyphase
matrices. IEEE Trans. Acoust., Speech, and Signal Proc., 37(7):1042–1056, July
1989.
[313] P. P. Vaidyanathan, P. Regalia, and S. K. Mitra. Design of doubly complementary
IIR digital filters using a single complex allpass filter, with multirate applications.
IEEE Trans. on Circuits and Systems, 34:378–389, April 1987.
[314] M. Vetterli. Multidimensional subband coding: Some theory and algorithms. Signal
Proc., 6(2):97–112, April 1984.
[315] M. Vetterli. Filter banks allowing perfect reconstruction. Signal Proc., 10(3):219–244,
April 1986.
[316] M. Vetterli. A theory of multirate filter banks. IEEE Trans. Acoust., Speech, and
Signal Proc., 35(3):356–372, March 1987.
[317] M. Vetterli. Running FIR and IIR filtering using multirate filter banks. IEEE Trans.
Acoust., Speech, and Signal Proc., 36:730–738, May 1988.
[318] M. Vetterli and C. Herley. Wavelets and filter banks: Relationships and new results.
In Proc. ICASSP’90, pages 1723–1726, Albuquerque, NM, April 1990.
[319] M. Vetterli and C. Herley. Wavelets and filter banks: Theory and design. IEEE
Trans. Signal Proc., 40(9):2207–2232, September 1992.
[320] M. Vetterli, J. Kovačević, and D. J. LeGall. Perfect reconstruction filter banks for
HDTV representation and coding. Image Communication, 2(3):349–364, October
1990.
[321] M. Vetterli and D. J. LeGall. Perfect reconstruction FIR filter banks: Some properties
and factorizations. IEEE Trans. Acoust., Speech, and Signal Proc., 37(7):1057–1071,
July 1989.
[322] M. Vetterli and H. J. Nussbaumer. Simple FFT and DCT algorithms with reduced
number of operations. Signal Proc., 6(4):267–278, August 1984.
[323] M. Vetterli and K. M. Uz. Multiresolution coding techniques for digital video: a
review. Special Issue on Multidimensional Processing of Video Signals, Multidimen-
sional Systems and Signal Processing, 3:161–187, 1992.
[324] L. F. Villemoes. Regularity of Two-Scale Difference Equation and Wavelets. PhD
thesis, Mathematical Institute, Technical University of Denmark, 1992.
[325] E. Viscito and J. P. Allebach. The analysis and design of multidimensional FIR
perfect reconstruction filter banks for arbitrary sampling lattices. IEEE Trans. Circ.
and Syst., 38(1):29–42, January 1991.
[326] J. S. Walker. Fourier Analysis. Oxford University Press, New York, 1988.
[327] G. K. Wallace. The JPEG still picture compression standard. Communications of
the ACM, 34(4):30–44, April 1991.
[328] G. G. Walter. A sampling theorem for wavelet subspaces. IEEE Trans. on Inform.
Theory, Special Issue on Wavelet Transforms and Multiresolution Signal Analysis,
38(2):881–883, March 1992.
[329] P. H. Westerink. Subband Coding of Images. PhD thesis, Delft University of Tech-
nology, Delft, The Netherlands, 1989.
[330] P. H. Westerink, J. Biemond, and D. E. Boekee. Subband coding of color images.
In J. W. Woods, editor, Subband Image Coding, pages 193–228. Kluwer Academic
Publishers, Inc., Boston, MA, 1991.
[331] P. H. Westerink, J. Biemond, and D. E. Boekee. Scalar quantization error analysis
for image subband coding using QMF’s. Signal Proc., 40(2):421–428, February 1992.
[332] P. H. Westerink, J. Biemond, D. E. Boekee, and J. W. Woods. Subband coding of
images using vector quantization. IEEE Trans. Commun., 36(6):713–719, June 1988.
[333] M. V. Wickerhauser. Acoustic signal compression with wavelet packets. In C. K.
Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 679–700. Aca-
demic Press, New York, 1992.
[334] S. Winograd. Arithmetic Complexity of Computations, volume 33. SIAM, Philadel-
phia, 1980.
[335] J. W. Woods, editor. Subband Image Coding. Kluwer Academic Publishers, Boston,
MA, 1991.
[336] J. W. Woods and T. Naveen. A filter based bit allocation scheme for subband
compression of HDTV. IEEE Trans. Image Proc., 1:436–440, July 1992.
[337] J. W. Woods and S. D. O’Neil. Sub-band coding of images. IEEE Trans. Acoust.,
Speech, and Signal Proc., 34(5):1278–1288, May 1986.
[338] G. W. Wornell. A Karhunen-Loeve-like expansion of 1/f processes via wavelets.
IEEE Trans. Inform. Theory, 36:859–861, July 1990.
[339] G. W. Wornell and A. V. Oppenheim. Wavelet-based representations for a class of
self-similar signals with application to fractal modulation. IEEE Trans. on Inform.
Theory, Special Issue on Wavelet Transforms and Multiresolution Signal Analysis,
38(2):785–800, March 1992.
[340] X. Xia and Z. Zhang. On sampling theorem, wavelets, and wavelet transforms. IEEE
Trans. on Signal Proc., Special Issue on Wavelets and Signal Processing, 41(12):3524–
3535, December 1993.
500 INDEX
wavelet, 224
wavelet packets, 158, 287
wavelet series, 267
wavelet transform, 80
wavelet transform, 313
admissibility condition, 313
characterization of regularity, 320
conservation of energy, 318
discretization of, 329
frequency localization, 320
properties, 316
reproducing kernel, 323
resolution of the identity, 314
scalograms, 325
time localization, 319
wavelets
“twin dragon”, 296
based on Butterworth filters, 286
based on multichannel filter banks, 287
Battle-Lemarié, 240
biorthogonal, 280
construction of, 224
Daubechies’, 219, 264
Haar, 214, 226, 245
Malvar’s, 299
Meyer’s, 231
Morlet’s, 324
mother wavelet, 313
multidimensional, 291
sinc, 228, 246
spline, 236
Stromberg’s, 240
with exponential decay, 286
Wigner-Ville distribution, 81
Winograd short convolution algorithms, 350