Classical Optics and Its Applications

Second Edition
Covering a broad range of fundamental topics in classical optics and electromagnetism,
this book is ideal for graduate-level courses in optics, providing
supplementary reading materials for teachers and students alike. Industrial scientists
and engineers developing modern optical systems will also find it an invaluable
resource. Now in color, this second edition contains 13 new chapters, covering
optical pulse compression, the Hanbury Brown–Twiss experiment, the Sagnac
effect, Doppler shift and stellar aberration, and optics of semiconductor diode lasers.
The first half of the book deals primarily with the basic concepts of optics,
while the second half describes how these concepts can be used in a variety of
technological applications. Each chapter is concerned with a single topic,
developing an understanding through the use of diagrams, examples, numerical
simulations, and logical arguments. The mathematical content is kept to a
minimum to provide the reader with insightful discussions of optical phenomena.
Masud Mansuripur received a Bachelor of Science degree in Electrical
Engineering from Arya-Mehr University of Technology in Tehran, Iran (1977), a
Master of Science in Electrical Engineering from Stanford University (1978), a
Master of Science in Mathematics from Stanford University (1980), and a Ph.D. in
Electrical Engineering from Stanford University (1981). He has been Professor of
Optical Sciences at the University of Arizona since 1988. His areas of research
include optical data storage, optical signal processing, magneto-optical properties
of thin magnetic films, radiation pressure, interaction of light with sub-wavelength
structures, and the optical and thermal characterization of thin films and stacks. A
Fellow of the Optical Society of America, he has published more than 250 papers
in various technical journals, holds eight patents, has given numerous invited talks
at international scientific conferences, and has been a contributing editor of Optics
& Photonics News, the magazine of the Optical Society of America. Professor
Mansuripur’s published books include Introduction to Information Theory (1987)
and The Physical Principles of Magneto-optical Recording (1995).
http://ebooks.cambridge.org/ebook.jsf?bid=CBO9780511803796
MASUD MANSURIPUR
College of Optical Sciences
University of Arizona, Tucson
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521881692
© M. Mansuripur 2009
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.
First edition published 2002
Second edition published 2009
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
ISBN 978-0-521-88169-2 hardback
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to
in this publication, and does not guarantee that any content on such
websites is, or will remain, accurate or appropriate.
To Annegret, Kaveh, and Tobias
Contents
Preface to the second English edition
Preface to the first edition
Introduction
1 Abbe’s sine condition
2 Fourier optics
3 Effect of polarization on diffraction in systems of high numerical aperture
4 Gaussian beam optics
5 Coherent and incoherent imaging
6 First-order temporal coherence in classical optics
7 The van Cittert–Zernike theorem
8 Partial polarization, Stokes parameters, and the Poincaré sphere
9 Second-order coherence and the Hanbury Brown–Twiss experiment
10 What in the world are surface plasmons?
11 Surface plasmon polaritons on metallic surfaces
12 The Faraday effect
13 The magneto-optical Kerr effect
14 The Sagnac interferometer
15 Fabry–Pérot etalons in polarized light
16 The Ewald–Oseen extinction theorem
17 Reciprocity in classical linear optics
18 Optical pulse compression
19 The uncertainty principle in classical optics
20 Omni-directional dielectric mirrors
21 Linear optical vortices
22 Geometric-optical rays, Poynting’s vector, and the field momenta
23 Doppler shift, stellar aberration, and convection of light by moving media
24 Diffraction gratings
25 Diffractive optical elements
26 The Talbot effect
27 Some quirks of total internal reflection
28 Evanescent coupling
29 Internal and external conical refraction
30 Transmission of light through small elliptical apertures
31 The method of Fox and Li
32 The beam propagation method
33 Launching light into a fiber
34 The optics of semiconductor diode lasers
35 Michelson’s stellar interferometer
36 Bracewell’s interferometric telescope
37 Scanning optical microscopy
38 Zernike’s method of phase contrast
39 Polarization microscopy
40 Nomarski’s differential interference contrast microscope
41 The van Leeuwenhoek microscope
42 Projection photolithography
43 Interaction of light with subwavelength structures
44 The Ronchi test
45 The Shack–Hartmann wavefront sensor
46 Ellipsometry
47 Holography and holographic interferometry
48 Self-focusing in nonlinear optical media
49 Spatial optical solitons
50 Laser heating of multilayer stacks
Index
Preface to the second English edition
Following the publication of the first edition of this book, I wrote (or co-wrote)
nine additional columns for Optics & Photonics News (OPN), which appeared
between April 2001 and April 2007. Some of these columns were included in the
Japanese enlarged edition of the book, published in 2006; all nine columns are
now included in this second English edition. Throughout the years, I also wrote
four columns which were not submitted to OPN, because they ended up being
somewhat lengthy and perhaps too mathematical for the general readership of the
OPN; these appear here for the first time as Chapters 9, 14, 18, and 25.
The selection of topics and the exposition style of the thirteen new chapters of the
present edition follow the same principles and guidelines as did the original thirty-
seven chapters of the first edition. In each case a topic is chosen either for its intrinsic
value as a foundational contribution to the science of optics (e.g., the Sagnac effect,
second-order coherence, the Doppler shift), or because of its technological
significance (e.g., optical pulse compression, semiconductor diode lasers, diffractive
optical elements). To a large extent, the fifty chapters of the present book are
independent of each other and can be read in any desired sequence. Occasionally,
when the information in one chapter could benefit the understanding of the material
in another, cross references are provided. The presentation style is pedagogic and
informal, with mathematics used sparingly unless it is deemed essential and
unavoidable. Computer simulations are used extensively throughout the book as an
aid to explaining the concepts and to provide concrete examples of the physical
phenomena under consideration. As was the case in the first edition, the software
packages DIFFRACT and MULTILAYER, both products of MM Research,
Inc., Tucson, Arizona, were used for the numerical simulations reported in the new
Chapters 19, 20, 23, 25, 33, 34, 43, and 49. The computer simulations of Chapters 11,
30, and 43 were carried out by Armis Zakharian using his software package
Sim3D_Max, a product of Nonlinear Control Strategies, Inc., Tucson, Arizona.
The basis of Sim3D_Max is the Finite Difference Time Domain (FDTD) technique
for solving Maxwell’s equations as described, for example, in Computational
Electrodynamics by A. Taflove and S. C. Hagness, Artech House (2000).
Professor Emeritus Jumpei Tsujiuchi of the Tokyo Institute of Technology, Japan,
has painstakingly translated all of my OPN columns for the Japanese magazine O
Plus E; these articles appeared in print between 2001 and 2005. Subsequently,
Professor Tsujiuchi arranged for the collection of the translated articles to be
published in book form by the New Technology Communications Co., Ltd. The
Japanese enlarged edition of Classical Optics and its Applications has been in print
since 2006. I am grateful to Professor Tsujiuchi for his dedication and his untiring
efforts to bring these articles to the attention of the technical community in Japan.
For guidance and illuminating discussions, I am indebted to Armis Zakharian of
the Corning Corp., Jeffrey Wilde of the Capella Photonics, Inc., Seiji Yonezawa of
the Comets, Inc. (Japan), Sjoerd Stallinga of the Philips Research Laboratories
(Netherlands), and Kenji Konno of Minolta Corp. (Japan). I am also grateful to the
following colleagues from the College of Optical Sciences, University of Arizona,
for sharing their insights with me: Ewan Wright, Jerome Moloney, Brian Anderson,
Mahmoud Fallahi, Jason Jones, Jose Sasian, Nasser Peyghambarian, James Wyant,
Dennis Howe, Pavel Polynkin, and Pierre Meystre. Special thanks are due to Ewan
Wright, Armis Zakharian, and Jerome Moloney for granting permission to publish
our joint articles in this volume. (The co-authors of the corresponding chapters are
acknowledged in footnotes to each chapter.)
While working on several chapters of this book, my research has been supported
by the United States Air Force Office of Scientific Research (AFOSR) through
contracts F49620–03–1–0194, FA9550–04–1–0213, FA9550–04–1–0355 awarded
by the Joint Technology Office; I thank Dr Arje Nachman of the AFOSR for his
support of our research program over the past several years. I also would like to
thank the editor at the Cambridge University Press, Dr Simon Capelin, and his
professional staff for their superb handling of the publication of the English
editions of this book. Last but not least, I must mention with deep gratitude the
loving care and support of my wife, Annegret, during the years that this book has
been in preparation. As with previous editions, it is to her and to our children,
Kaveh and Tobias, that the second English edition of the book is dedicated.
Preface to the first edition
I started writing the Engineering column of Optics & Photonics News (OPN) in
early 1997. Since then nearly forty articles have appeared, covering a broad range
of topics in classical optical physics and engineering. My original goal was to
introduce students and practising engineers to some of the most fascinating topics
in classical optics. This I planned to achieve with minimal usage of the
mathematical language that pervades the literature of the field. I had met many
bright students and practitioners who either did not know or did not fully appreciate
some of the major concepts of classical optics such as the Talbot effect, Abbe’s
sine condition, the Goos–Hänchen effect, Hamilton’s internal and external conical
refraction, Zernike’s method of phase contrast, Michelson’s stellar interferometer,
and so on. My columns were going to have little mathematics but an abundance of
pictures and pedagogical arguments, to bring forth the essence of the physics
involved in each phenomenon. In the process, I hoped, the readers would
appreciate the beauty of the subject and, if they found it interesting, would dig
deeper by searching the cited literature.
Unique tools available to me for this purpose were the computer programs
DIFFRACT™, MULTILAYER™, and TEMPROFILE™, which I have developed
in the course of my research over the past 20 years. The first of these programs
simulates the propagation of light through optical systems consisting of discrete
elements such as lasers, lenses, mirrors, prisms, phase/amplitude masks, gratings,
polarizers, wave-plates, multilayer stacks, birefringent crystals, diffraction gratings,
and optically active materials. The other two programs simulate the optical and
thermal behavior of multilayer stacks. I have used these programs to generate graphs
and pictures to explain the various phenomena in ways that would promote a better
understanding.
The articles have been successful beyond my wildest dreams. While I had hoped
that a few readers would find something useful in this series, I have received notes,
e-mails, and verbal comments from distinguished scholars around the world who
have found the columns stimulating and helpful. Some teachers informed me that
they use the articles for their classroom teaching, and I have heard of several
readers who collect the articles for future reference. All in all, I have been
pleasantly surprised by the positive reaction of the OPN readers to these columns.
Optics & Photonics News is not an archival journal and, therefore, will not be
widely available to future students. Thus I believe that collecting the articles here
in one book, which provides for ease of cross-referencing, will be useful.
Moreover, the book contains additional explanations of topics that were
originally curtailed for lack of space in OPN; it includes corrections to errors
discovered afterwards and incorporates some comments and criticisms made by
OPN readers as well as my answers to these criticisms.
This book covers a broad range of topics: classical diffraction theory, the optics of
crystals, the peculiarities of polarized light, thin-film multilayer stacks and coatings,
geometrical optics and ray-tracing, various forms of optical microscopy, interferometry,
coherence, holography, nonlinear optics, etc. It could serve as a companion
to the principal text used in a number of academic courses in physics, engineering,
and optics; it should be useful for university teachers as a guide to selecting topics for
a graduate-level course; it should be useful also for self-study by graduate students. It
could be used fruitfully by engineers who develop optical systems such as laser
printers, scanners, cameras, displays, image-processing equipment, lasers and
laser-based systems, telescopes, optical storage and communication systems,
spectrometers, etc. I believe anyone working in the field of optics could benefit from this
book, by being exposed to some of the major concepts and ideas (developed over the
last three centuries) that shape our modern understanding of optics.
Some of the original OPN columns were written jointly with colleagues and
students; these are identified in the footnotes and the corresponding co-authors
acknowledged. I thank Ewan Wright and Rongguang Liang of the Optical Sciences
Center, Lifeng Li of Tsinghua University, Mahmoud Fallahi of Nortel Co., and
Wei-Hung Yeh of Maxoptix Co. for their collaboration as well as for giving
permission to publish our joint papers in this collection. I also would like to
acknowledge the late Peter Franken, Pierre Meystre, Yung-Chieh Hsieh, Dennis
Howe, Glenn Sincerbox, Harrison Barrett, Roland Shack, José Sasian, Michael
Descour, Arvind Marathay, Ray Chiao, James Wyant, Marc Levenson, Ronald
Gerber, James Burge, Ferry Zijp and Dror Sarid, who shared their valuable insights
with me and/or criticized the drafts of several articles prior to publication. Needless
to say, I am solely responsible for any remaining errors and inaccuracies. For their
help with graphics and word processing, I am grateful to our administrative
assistants Patricia Gransie, Nonie Veccia, Marylou Myers, and Amanda Palma.
Last but not least, I am grateful to my wife, Annegret, who has tolerated me
with love and patience over the past four years while this book was being written.
It is to her and to our children, Kaveh and Tobias, that this book is dedicated.
Introduction
The common threads that run through this book are the classical phenomena of
diffraction, interference, and polarization. Although the reader is expected to be
generally familiar with these electromagnetic phenomena, the book does cover
some of the principles of classical optics in the early chapters. The basic ideas of
diffraction and Fourier optics are introduced in chapters 1 through 4; this
introduction is followed by a detailed discussion of spatial and temporal coherence
and of partial polarization in chapters 5 through 8. These concepts are then used
throughout the book to explain phenomena that are either of technological import
or significant in their own right as natural occurrences that deserve attention.
Each chapter is concerned with a single topic (e.g., surface plasmons, diffraction
gratings, evanescent coupling, photolithography) and attempts to develop
an understanding of this subject through the use of pictures, examples, numerical
simulations, and logical argument. The reader already familiar with a particular
topic is likely to learn more about its applications, to appreciate better the physics
behind some of the formulas he or she may have previously encountered, and
perhaps even learn a thing or two about the nuances of the subject. For the reader
who is new to the field, our presentation aims to provide an introduction, an
intuitive feel for the physical and/or technological issues involved, and,
hopefully, motivation for digging deeper by consulting the cited references. For the
most part, this book avoids repeating what is already in the open literature,
aiming instead to expose concepts and ideas, ask critical questions, and provide
answers by appealing to the reader’s intuition rather than to his or her
mathematical skills.
Some of the chapters address fundamental problems that historically have
been crucial to our modern understanding of optics; conical refraction, the
Talbot effect, the principle of holography, and the Ewald–Oseen extinction
theorem are representatives of this class. Other chapters introduce devices and
phenomena of great scientific and technological importance; Fabry–Pérot
etalons, the magneto-optical Faraday and Kerr effects, and the phenomenon of
total internal reflection fall into this second category. Many of the remaining
chapters single out a tool or an instrument that not only is of immense
technological value but also has its unique principles of operation, worthy of detailed
understanding; examples include various microscopes and telescopes, lithographic
systems, ellipsometers, and so on. Occasionally a theoretical concept or a
numerical method is found that has a wide range of applications; we have devoted a
few chapters to these topics, such as the method of Fox and Li, the beam
propagation method, and the concept of reciprocity in classical optics.
The majority of the computer simulations reported in this book were
performed with the software packages DIFFRACT, MULTILAYER, and
TEMPROFILE, which I have written in the course of the past twenty years and
which are now commercially available. These programs in turn are based on
theoretical methods and numerical algorithms that are fully documented in
several of my publications [1–6]. In a few chapters, I have collaborated with
Professor Lifeng Li (now at the Tsinghua University in China). Here, we have
used Professor Li’s program DELTA, also commercially available, for
calculations pertaining to diffraction gratings. The theoretical foundations of DELTA
are described in Professor Li’s publications [7].
Throughout the book, black-and-white pictures will be used to display the
various properties of an optical beam; these include cross-sectional distributions
of intensity, phase, polarization, and the Poynting vector. A unified scheme for
the gray-scale encoding of real-valued functions of two variables is used in all the
chapters, and it is helpful to review these methods at the outset. In the convention
adopted the beam always propagates along the Z-axis, and its cross-sectional
plane is XY. The Cartesian XYZ coordinate system is right-handed, the polar
angles are measured from the positive Z-axis, and the azimuthal angles are
measured from the positive X-axis towards the positive Y-axis. In general, the
beam has three components of polarization along the X-, Y-, and Z-axes of the
coordinate system, that is, its electric field E has components $E_x(x, y)$, $E_y(x, y)$,
and $E_z(x, y)$ at any given cross-sectional plane, say, at $z = z_0$. Since the E-field
components are complex-valued, their complete specification requires two
distributions for each component, namely, amplitude and phase. The following
paragraphs describe in some detail the encoding scheme used for displaying
different cross-sectional properties of the beam and also provide a few examples.
Plots of intensity distribution

The electric-field intensity is the square of the field amplitude at any
given location in space. Thus, for example, the intensity distribution in the
cross-sectional XY-plane for the E-field component along the X-axis is denoted
by $I_x(x, y) = |E_x(x, y)|^2$. Figure 0.1 shows plots of intensity distribution for
the three components of polarization of a Laguerre–Gaussian beam propagating
along the Z-axis. The black pixels represent locations where the intensity is at
its minimum (zero in the present case), the white pixels correspond to the
locations of maximum intensity within the corresponding frame, and the gray
pixels linearly interpolate between these minimum and maximum values. In the
case of Figure 0.1, the beam was taken to be linearly polarized at 45° to the
X-axis, leading to identical distributions for the X- and Y-components of
polarization.
The much weaker Z-component is computed to ensure that the Maxwell
equations will be satisfied for the assumed distributions of the X- and
Y-polarization components. In general, one may assume arbitrary distributions for
$E_x$ and $E_y$ within a given cross-sectional XY-plane. To determine $E_z$ in a
self-consistent manner, one must break up the $E_x$ and $E_y$ distributions into their
plane-wave constituents and proceed to determine $E_z$ for each plane wave that
propagates along the unit vector $\sigma = (\sigma_x, \sigma_y, \sigma_z)$ by requiring the inner product
of E and $\sigma$ to vanish (i.e., $E_x\sigma_x + E_y\sigma_y + E_z\sigma_z = 0$). One must then superimpose
the Z-components of all the plane waves thus obtained to arrive at the total
distribution of $E_z$. In Figure 0.1 the peak intensities in the three frames are in the
ratios $I_x : I_y : I_z = 1.0 : 1.0 : 1.47 \times 10^{-7}$.
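
The transversality requirement described above can be illustrated with a minimal sketch. It assumes that the plane-wave constituents are already known (in practice they would come from a two-dimensional Fourier transform of the transverse field), that only propagating waves with a positive Z direction cosine are present, and that coordinates are measured in units of the wavelength. The function names and the tuple layout are hypothetical choices made for this illustration; they are not taken from DIFFRACT or from any other program mentioned in the text.

```python
import cmath
import math


def ez_for_plane_wave(ex, ey, sx, sy):
    """Return Ez such that the inner product of E and the unit vector vanishes.

    ex, ey : complex amplitudes of the X- and Y-components of one plane wave.
    sx, sy : X and Y direction cosines of its propagation direction.
    """
    sz_squared = 1.0 - sx * sx - sy * sy
    if sz_squared <= 0.0:
        # Assumption of this sketch: evanescent and grazing waves are excluded.
        raise ValueError("only propagating waves with sz > 0 are handled")
    sz = math.sqrt(sz_squared)
    # Ex*sx + Ey*sy + Ez*sz = 0, solved for Ez.
    return -(ex * sx + ey * sy) / sz


def total_ez(plane_waves, x, y):
    """Superpose the Z-components of all plane waves at the point (x, y).

    plane_waves : iterable of (ex, ey, sx, sy) tuples (hypothetical layout).
    x, y        : coordinates in the plane z = 0, in units of the wavelength.
    """
    total = 0j
    for ex, ey, sx, sy in plane_waves:
        phase = cmath.exp(2j * math.pi * (sx * x + sy * y))
        total += ez_for_plane_wave(ex, ey, sx, sy) * phase
    return total
```

For a single wave with ex = ey = 1 travelling along (0.6, 0, 0.8), the sketch gives Ez = −0.75, so that the inner product 0.6 − 0.75 × 0.8 vanishes as required. A wave travelling exactly along Z has no Z-component at all, which is consistent with the Z-component of a nearly collimated beam being very weak.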
Logarithmic plots of intensity distribution

In order to emphasize the weaker regions of an intensity distribution, we will
show on numerous occasions the distribution of the logarithm of the intensity.
First, the intensity distribution is normalized by its peak value, then the base-10
logarithm of the normalized intensity is computed and all values below some
cutoff point are truncated. For instance, if the cutoff is set to $\alpha$ then all values
of the normalized intensity below $10^{-\alpha}$ are reset to $10^{-\alpha}$; the range of the
logarithm of normalized intensity thus becomes $(-\alpha, 0)$. The continuum of gray
levels from black to bright-white is then mapped linearly onto this interval and
used to display plots of normalized intensity on the logarithmic scale. When it is
deemed useful or necessary, the corresponding value of $\alpha$ will be indicated in a
figure’s caption.
Figure 0.2 shows two plots of the same intensity distribution at the focal plane
of a comatic lens. In (a) the distribution is linearly mapped onto the gray-scale,
whereas in (b) the logarithm of intensity with a cutoff at $\alpha = 4$ is displayed. The
latter is similar to what would be obtained by over-exposing a photographic plate
placed at the focal plane of the lens.
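
The logarithmic encoding just described (normalize, take the base-10 logarithm, truncate at the cutoff, map linearly onto gray levels) may be sketched as follows. The function name, the use of nested lists for the image, and the choice of 256 gray levels are assumptions made for illustration only; the cutoff exponent is called `alpha` in the code.

```python
import math


def log_gray_levels(intensity, alpha=4.0, levels=256):
    """Map a 2-D list of non-negative intensities to integer gray levels.

    alpha  : cutoff; normalized values below 10**(-alpha) are reset to it.
    levels : number of gray levels (0 = black, levels - 1 = bright white).
    """
    peak = max(max(row) for row in intensity)
    if peak <= 0.0:
        raise ValueError("intensity distribution has no positive peak")
    floor = 10.0 ** (-alpha)
    image = []
    for row in intensity:
        gray_row = []
        for value in row:
            normalized = max(value / peak, floor)   # truncate below the cutoff
            log_value = math.log10(normalized)      # lies between -alpha and 0
            fraction = (log_value + alpha) / alpha  # 0 at cutoff, 1 at the peak
            gray_row.append(round(fraction * (levels - 1)))
        image.append(gray_row)
    return image
```

With `alpha = 4`, a pixel at the peak maps to 255 and any pixel at or below one ten-thousandth of the peak maps to 0, so that four decades of intensity share the available gray levels. This is why faint outer structure of a focused spot, nearly invisible on a linear scale, stands out in the logarithmic plot.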
Figure 0.1 Plots of intensity distribution in the cross-sectional plane of a
Laguerre–Gaussian beam for the three components of the E-field (frames (a), (b), (c);
axes $x/\lambda$ and $y/\lambda$, each spanning $-10^4$ to $+10^4$). In each frame
the black pixels represent locations of zero intensity, while the white pixels
represent locations of maximum intensity in the corresponding frame. The beam
is assumed to propagate along the Z-axis, linearly polarized at 45° to the X-axis.
(a) Intensity of the component of polarization along the X-axis, $I_x(x, y) = |E_x(x, y)|^2$,
(b) $I_y(x, y) = |E_y(x, y)|^2$, (c) $I_z(x, y) = |E_z(x, y)|^2$. The peak intensities in (a), (b), (c)
are in the ratios $1.0 : 1.0 : 1.47 \times 10^{-7}$, respectively.
Figure 0.2 (a) Intensity distribution in the focal plane of a 0.5NA lens having
$1.5\lambda$ of third-order coma (Seidel aberration); axes $x/\lambda$ and $y/\lambda$, each spanning
$-5$ to $+5$. The uniformly distributed incident beam is assumed to be circularly
polarized. In the focal plane, the X-, Y-, and Z-components of the electric field
vector are added together to yield the total E-field intensity. (b) Same as (a) but
on a logarithmic scale with $\alpha = 4$ (see text).

Plots of phase distribution

In several chapters we will show plots of phase distribution in a beam’s
cross-sectional plane. The phase, a modulo-$2\pi$ entity, will always be limited to a range
less than or equal to 360°. We typically divide the range of phase values for a
given distribution into equal sub-intervals, assigning black to the minimum value,
bright-white to the maximum value, and various gray levels to the values in
between. A sharp discontinuity (from black to white or vice versa) appearing in
these phase plots would be of no physical significance, since it merely indicates a
360° phase jump.
Figure 0.3 is a cross-sectional plot of the phase distribution for the
Laguerre–Gaussian beam whose intensity distribution was given in Figure 0.1. The three
frames of Figure 0.3 correspond to the components of polarization along the X-,
Y-, and Z-axes. The black pixels represent the minimum phase, −180°, and the
white pixels correspond to the maximum phase, +180°; the gray pixels cover the
continuous range of values in between.
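
As an illustration of this phase encoding, the following minimal sketch maps the phase of one complex field sample onto a gray level, assuming the full range from −180° (black) to +180° (bright-white) and 256 gray levels. The function name and the number of levels are hypothetical.

```python
import cmath
import math


def phase_gray_level(field_value, levels=256):
    """Gray level encoding the phase of a complex field sample.

    A phase of -180 degrees maps to black (0) and +180 degrees to
    bright white (levels - 1), with linear interpolation in between.
    """
    phase_deg = math.degrees(cmath.phase(field_value))  # within [-180, +180]
    fraction = (phase_deg + 180.0) / 360.0
    return round(fraction * (levels - 1))
```

Two samples lying just above and just below the negative real axis, for example `complex(-1, 1e-9)` and `complex(-1, -1e-9)`, are encoded as 255 and 0 even though the fields are practically identical. This is the black-to-white discontinuity of no physical significance mentioned in the text.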
Ellipse of polarization
Consider a collimated beam of light propagating along the Z-axis. In general,
the state of polarization of the beam at any given point is elliptical, as shown in
Figure 0.4. So long as the electric-field vector E may be assumed to be confined to
the XY-plane, it may be resolved into two orthogonal components, along the X- and
Y-axes say. If $E_x$ and $E_y$ happen to be in phase, the polarization will be linear along
some direction specified by the angle $\rho$. If, on the other hand, the phase difference
between $E_x$ and $E_y$ is 90° then the polarization will be elliptical, the major and
minor axes of the ellipse lying along the X- and Y-axes. In general, the phase
difference between $E_x$ and $E_y$ is somewhere between 0° and 360°, giving rise to an
ellipse whose major axis has an angle $\rho$ with the X-axis and whose ellipticity is
given by the angle $\eta$. When the polarization is linear, $\eta = 0^\circ$; for light that is right
circularly polarized (RCP), $\eta = +45^\circ$, whereas for light that is left circularly
polarized (LCP), $\eta = -45^\circ$. In general, $-90^\circ < \rho \le 90^\circ$ and $-45^\circ \le \eta \le 45^\circ$.
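
One common way to obtain the two angles of the ellipse from the complex field components is through the Stokes parameters; the sketch below follows that route, which is a standard technique and not necessarily the one used by the author’s software. The rotation angle and the ellipticity are named `rho` and `eta` in the code. The sign of `eta` depends on the time-dependence and handedness conventions, which are not spelled out in this passage; the sketch arbitrarily takes `eta` positive when the phase of the Y-component leads that of the X-component, and this choice should be treated as an assumption.

```python
import math


def polarization_ellipse(ex, ey):
    """Return (rho, eta) in degrees for complex field components ex, ey.

    rho : orientation of the major axis relative to X, in (-90, 90].
    eta : ellipticity angle in [-45, 45]; 0 for linear, +/-45 for circular.
    """
    ex, ey = complex(ex), complex(ey)
    s0 = abs(ex) ** 2 + abs(ey) ** 2
    if s0 == 0.0:
        raise ValueError("zero field: the ellipse of polarization is undefined")
    s1 = abs(ex) ** 2 - abs(ey) ** 2
    s2 = 2.0 * (ex * ey.conjugate()).real
    s3 = 2.0 * (ex.conjugate() * ey).imag  # sign convention is an assumption
    rho = 0.5 * math.degrees(math.atan2(s2, s1))
    ratio = max(-1.0, min(1.0, s3 / s0))   # guard against rounding errors
    eta = 0.5 * math.degrees(math.asin(ratio))
    return rho, eta
```

Equal in-phase components, `polarization_ellipse(1, 1)`, give linear polarization at 45° with zero ellipticity, while `polarization_ellipse(2, 1j)` gives an ellipse aligned with the axes whose ellipticity is arctan(1/2), about 26.57°. For circular polarization both `s1` and `s2` vanish, so `rho` is not meaningfully defined and the value returned by `atan2` is merely a placeholder.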
Figure 0.5 shows cross-sectional plots of intensity and polarization state for a
beam with a highly non-uniform state of polarization. Frame (a) is the logarithmic
intensity pattern in the XY-plane. The polarization rotation angle $\rho(x, y)$ is depicted
in (b), while the ellipticity $\eta(x, y)$ is shown in (c). The gray-scale in Figure 0.5(b) is
a linear map of the values of $\rho$ from −90° (black) to +90° (white). Similarly, the
plot of $\eta$ in Figure 0.5(c) is linearly encoded in gray-scale, with black representing
−45° and white representing +45°.
In the plot of $\rho$ depicted in Figure 0.5(b), there are random-looking jumps
between black and bright-white pixels. This is due to the ambiguity of the
polarization rotation angle when either the E-field intensity is zero or the ellipticity
$\eta$ is ±45°. In these regions, a small numerical error could readily cause a discrete
jump between $\rho_{\min} = -90^\circ$ and $\rho_{\max} = +90^\circ$.

Figure 0.3 Plots of phase distribution in the cross-sectional plane of the
Laguerre–Gaussian beam depicted in Figure 0.1 (axes $x/\lambda$ and $y/\lambda$, each spanning
$-10^4$ to $+10^4$). Frames (a), (b), and (c) correspond, respectively, to the
components of the E-field along the X-, Y-, and Z-coordinate axes. In each frame
the black pixels represent a phase of −180° and the white pixels correspond to a
phase of +180°; the gray pixels linearly interpolate between these two extreme values.

Figure 0.4 The ellipse of polarization is uniquely specified by $E_x$ and $E_y$, the
complex-valued electric field components along the X- and Y-axes. The major
axis of the ellipse makes an angle $\rho$ with the X-direction, and the angle $\eta$ facing
the minor axis represents the polarization ellipticity.

Figure 0.5 Distributions of intensity and polarization in the cross-section of a
beam having a non-uniform polarization state (axes $x/\lambda$ and $y/\lambda$, each spanning
$-150$ to $+150$). (a) Logarithmic plot of intensity distribution having cutoff at
$\alpha = 4$. (b) Polarization rotation angle $\rho$; the gray-scale is linearly mapped onto
$\rho$, from black at $\rho_{\min} = -90^\circ$ to bright-white at $\rho_{\max} = +90^\circ$.
(c) Polarization ellipticity $\eta$; the gray-scale is linearly mapped onto
$\eta$, from black at $\eta_{\min} = -45^\circ$ to bright-white at $\eta_{\max} = +45^\circ$.
References for the Introduction

1 M. Mansuripur, The Physical Principles of Magneto-optical Recording, Cambridge
University Press, UK, 1995.
2 M. Mansuripur, Distribution of light at and near the focus of high numerical aperture
objectives, J. Opt. Soc. Am. A 3, 2086 (1986).
3 M. Mansuripur, Certain computational aspects of vector diffraction problems,
J. Opt. Soc. Am. A 6, 786 (1989). See also the erratum in J. Opt. Soc. Am. A 10, 382–383
(1993).
4 M. Mansuripur, Analysis of multilayer thin film structures containing
magneto-optic and anisotropic media at oblique incidence using 2 × 2 matrices, J. Appl.
Phys. 67, 6466–6475 (1990).
5 M. Mansuripur, G. A. N. Connell, and J. W. Goodman, Laser-induced local heating
of multilayers, Appl. Opt. 21, 1106 (1982).
6 M. Mansuripur and G. A. N. Connell, Laser induced local heating of moving
multilayer media, Appl. Opt. 22, 666 (1983).
7 Lifeng Li, Multilayer-coated diffraction gratings: differential method of Chandezon
et al. revisited, J. Opt. Soc. Am. A 11, 2816–2828 (1994).
1 Abbe’s sine condition
Ernst Abbe (1840–1905), professor of physics and mathematics and director
of the astronomical observatory at Jena, was also the research director of the
Zeiss optical works. In 1868 he invented the apochromatic lens, thus
eliminating the primary and secondary color distortion in microscopes. Abbe
developed a clear theoretical understanding of limits to resolution and
magnification in optical image-forming systems and discovered the sine condition
for a lens to form a sharp image without the defects of coma and spherical
aberration. (Jena Review, 1965, Zeiss Archive, Courtesy AIP Emilio Segrè
Visual Archives.)
Ernst Abbe (1840–1905), professor of physics and mathematics at the University of
Jena, Germany, and major partner in the Carl Zeiss company, made important
contributions to the theory and practice of optical microscopy [1]. His compound
microscope was a superb optical design based on a theoretical understanding
of diffraction and minimization of the effects of aberrations [2]. Abbe enunciated
his famous sine condition regarding the axial point in the object plane of a
centered image-forming system such as a microscope or a telescope. When this
condition is satisfied, “aberration-free” imaging of the object points located in
the vicinity of the optical axis is assured [1–6]. This chapter provides a
heuristic description of the sine condition, which, in the words of Conrady, is
“one of the most remarkable and labor-saving theorems in the whole realm of
applied optics” [7].
As the chapter follows a rather unconventional approach towards explain-
ing the sine condition, it is worthwhile to highlight its main features at the
outset. An introduction of the necessary geometric-optical concepts provides
the basis for defining the sine condition. This is followed by establishing, for
an axial object point, a one-to-one mapping between the principal planes of
the imaging system. The wavefront entering the system at the first principal
plane (p.p.) is thus related to that emerging from the second p.p.
To describe the imaging of near-axis regions, we switch to a wave-optical
viewpoint. Assuming that the axial object point is shifted to a nearby off-
axis location, we derive the spatial phase modulation imparted to the emergent
wavefront in consequence of this small shift. By then it should be apparent that
aberration-free imaging of the off-axis point requires this spatial phase modulation
to be linear in a certain coordinate system and that Abbe’s sine condition is both
necessary and sufficient to guarantee this linearity.
A lens that violates the sine condition

To appreciate the significance of Abbe’s sine condition consider the plano-convex
lens shown in Figure 1.1. A collimated beam of light propagating along the optical
axis Z enters the flat facet of this lens and, upon exiting the second, hyperboloidal,
surface, converges toward the focal point. The conic constant of the second surface
is chosen to bring the beam to a perfect (i.e., diffraction-limited) focus at the rear
focal plane of the lens. The logarithmic plot of intensity distribution at the focal
plane (see Figure 1.2(a)) reveals the focused spot to be the well-known Airy pattern
for this 0.75NA lens.
If the incident beam is tilted by a small amount, the focus shifts to an off-axis
location but, more importantly, it acquires a significant amount of coma (see
Figure 1.2(b)). Thus it is clear that a lens that works well for an axial object point
is not necessarily suitable for the imaging of near-axis regions. The sine condition
is intended to alleviate this problem. For comparison with a case to be described
later, Figure 1.2(c) shows the phase distribution of the oblique beam at the
front facet of the plano-convex lens; similarly Figure 1.2(d) shows the phase
distribution of the emergent beam (minus the curvature) at the second p.p. Note
that the clear aperture at the second p.p. is reduced in size and that the emergent
phase pattern is “compressed” toward the optical axis in a nonlinear fashion. As
we shall see below, the emergent phase pattern is quite different for a lens that
does satisfy the sine condition.
Figure 1.1 A plano-convex lens brings a collimated beam to perfect focus on
an axial point. The lens is designed for λ = 633 nm; it has a 4 mm diameter clear
aperture, a focal length of f = 1.1133 mm, and a numerical aperture of NA = 0.75.
The refractive index of the lens glass is n = 2.5, its thickness at the center is 1 mm,
and its hyperboloidal surface has radius of curvature Rc = 1.67 mm and conic
constant k = −n² = −6.25. The second principal plane of this lens is tangent to
its curved surface at the apex. Both surfaces of the lens are assumed to be
antireflection coated.
Geometric-optical concepts
The sine condition applies to a centered optical system designed for “aberration-
free” imaging of a small patch within the object plane to a corresponding patch
within the image plane (see Figure 1.3). The imaging system is intended for a
given pair of conjugate planes, so that the distance z0 between the object and the
first p.p. of the system is fixed, as is the distance z1 between the image and
the second p.p. The lens formula 1/z0 + 1/z1 = 1/f, where f is the focal length of the
system, applies here.5
Throughout this chapter, attention is confined to systems where both the object
and image are in air; extension of the results to situations where the object space
and image space have differing refractive indices (e.g., immersion-oil micro-
scopy) is straightforward but is not discussed.4,5
In the present context, “aberration-free” imaging means that a cone of light
emanating from any point (x0, y0) in the small patch within the object plane, when
captured by the optical system, is turned into a convergent cone that, to a first
approximation in the relevant parameters, comes to focus at (x1, y1) in the image
plane (see Figure 1.4).4,5 The point (x1, y1) is conjugate to (i.e., the Gaussian
image of) the point (x0, y0). Since the system is circularly symmetric around the
optical axis, the axial point at the center of the object plane is imaged to
the axial point at the center of the image plane. Denoting the distance between
(x0, y0) and the origin of the object plane by d0 and, similarly, the distance
between (x1, y1) and the origin of the image plane by d1, the transverse
magnification m of the system is d1/d0. It is not difficult to show that m is also
equal to z1/z0 (see Figure 1.3).
Figure 1.2 (a) Logarithmic plot of intensity distribution at the focal plane of
the plano-convex lens of Figure 1.1 for a circularly polarized, collimated beam
traveling along the optical axis. (b) Same as (a) but for an obliquely incident
beam traveling at 0.076° relative to the optical axis. (c) Distribution of phase for
the oblique beam entering the lens at its flat surface. The gray-scale covers the
interval from −180° (black) to +180° (white). (d) Distribution of phase for the
oblique beam emerging from the lens at its second p.p.
[Axes: (a), (b) y/λ from −7.5 to 7.5; (c), (d) y/λ from −3250 to 3250.]
Principal planes
The concept of the principal planes is rooted in paraxial ray-tracing (i.e.,
Gaussian optics), where the angles between the rays and the optical axis are so
small that the sine and the tangent of each angle can be approximated by the
value of the angle itself, sin θ ≈ tan θ ≈ θ. In the neighborhood of the optical axis,
therefore, the entire system may be represented by a 2 × 2 matrix, and the
principal planes are uniquely determined from this so-called ABCD matrix of the
system.5
Figure 1.3 A small planar object in the vicinity of the optical axis in the X0Y0-plane
is imaged onto a small region of the X1Y1-plane. The principal planes of the imaging
system are also shown. The object and image planes are assumed to be in air, so that the
refractive indices of both the object space and the image space may be set to unity.
Figure 1.4 The cone of light emanating from an off-axis object point (x0, y0) is
captured by the imaging system and brought to focus at the corresponding image
point (x1, y1). Note that beyond the paraxial regime the rays entering the first p.p. at
a given height do not necessarily emerge from the second p.p. at the same height.
The principal planes are conjugate planes with unit transverse magnification. A
ray entering the first p.p. at a certain height h will emerge from the second p.p. at
the same height, as shown in Figure 1.5(a). Thus h ≈ z0θ0 ≈ z1θ1, where θ0 and θ1
are the angles of the incident and the emergent rays with the optical axis. Note
that, within the framework of the paraxial approximation, the system’s entrance
aperture at the first p.p. is identical in size and shape to the exit aperture located at
the second p.p. (The term aperture as used here should not be confused with
pupil, which has a more specific meaning in geometrical optics. The entrance and
exit pupils also define the boundaries of the cones of light that enter and exit the
system, but the pupils are not necessarily located at the principal planes.)
Beyond the paraxial regime, the principal planes cease to be conjugate planes.
Depending on its direction, a ray entering the first p.p. at a given height h might
emerge from different locations on the second p.p. One might confine attention to a
specific set of rays, such as those emanating from the axial point in the object plane,
in order to fix the directions of rays that enter the system. Yet there is no guarantee
that the height h of a ray on entering the first p.p. will remain the same when it
emerges from the second p.p. Of course one can impose this as a requirement on the
system, but many other possibilities exist that are equally plausible, as long as they
conform to the constraints of the paraxial regime. Abbe’s sine condition is one such
requirement placed on the heights of the entering and emerging rays.
The sine condition

Let us define two spherical surfaces, one in the object space, centered on the axial
object point and tangent to the first p.p., and the other in the image space, centered
on the axial image point and tangent to the second p.p. (see Figure 1.5(b)).
Instead of assigning heights to the rays in the principal planes, the heights are
assigned at the points where the rays cross these spherical surfaces. Thus, upon
entering the system, h = z0 sin θ0. (If the height were assigned at the principal
plane, the above expression would be written with tangent instead of sine.)
Abbe’s sine condition requires that all rays emanating from the axial object point
within the incident cone must emerge in the image space, where they form a
converging cone toward the axial image point, at the same height at which they
entered the system.4
As long as the rays are close to the optical axis (where the spheres are tangent
to the principal planes), the tangent and sine of a given angle are nearly the same.
Thus Abbe’s sine condition is consistent with the fact that, in the paraxial regime,
the principal planes are unit-magnification conjugate planes. For the rays beyond
the paraxial region, sin θ deviates from tan θ and the height of a given ray at the
entrance sphere is no longer the same as its height at the first p.p. (Similarly,
the height of an emergent ray at the exit sphere differs from its height at the
second p.p.) In a sense, therefore, the sine condition requires the bending of the
principal planes into spheres to preserve the paraxial property that a ray entering
the system at a given height emerges from the system at the same height.
Figure 1.5 (a) In the paraxial regime the height h of a ray is measured from the
optical axis in the principal planes. (b) In systems that operate beyond the
paraxial regime one may define the ray height at the point where the ray crosses
a reference sphere. When a system satisfies Abbe’s sine condition the height of a
ray thus defined remains the same upon entering and exiting the system.
Whereas in the paraxial regime the angular magnification θ1/θ0 = 1/m, where
m is the transverse magnification of the system, it is the ratio (sin θ1)/(sin θ0) that
equals 1/m in a system satisfying the sine condition. This turns out to be of crucial
significance for the image-forming system, as will be shown below. To emphasize
the point, note that in the system of Figure 1.5(a), where the entering and
emerging ray heights are equal at the principal planes, the ratio (tan θ1)/(tan θ0)
equals 1/m, whereas in the system of Figure 1.5(b), which satisfies Abbe’s sine
condition, the relevant ratio is (sin θ1)/(sin θ0).
Aplanatic system
A system that yields an aberration-free image of the axial object point and satisfies
Abbe’s sine condition is said to be “aplanatic”.4,5 Many imaging systems in use
today satisfy these conditions to a good approximation, if not exactly. Note that the
clear-aperture diameter of an aplanatic system as seen on the first p.p. is no longer
equal to that on the second p.p. If NA0 is the numerical aperture of the largest
cone of light emanating from the axial object point and captured by the system,
the aperture radius on the entrance sphere is z0NA0 whereas that on the first p.p. is
z0 tan[sin⁻¹(NA0)]. Similarly, in the image space the aperture radius on the exit
sphere is z1NA1 while that on the second p.p. is z1 tan[sin⁻¹(NA1)]. Abbe’s sine
condition guarantees that z0NA0 = z1NA1 but, unless the imaging system has unit
magnification, the aperture radii at the two principal planes are not equal.
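The distinction between the two aperture radii is easy to check numerically. The following is a minimal sketch, not part of the original analysis; the function name aperture_radii is introduced here purely for illustration. It evaluates both radii for a cone of given numerical aperture, using the parameters of the infinite-conjugate aplanat considered later in this chapter (f = 4000λ, NA1 = 0.75), with all lengths expressed in units of λ.

```python
import math

def aperture_radii(z, na):
    """Return (radius on the reference sphere, radius on the principal plane)
    for a cone of numerical aperture `na` whose apex lies a distance `z`
    from the principal plane. Hypothetical helper, for illustration only."""
    theta = math.asin(na)              # half-angle of the cone
    return z * na, z * math.tan(theta)

# Image-space cone of an infinite-conjugate aplanat (lengths in wavelengths).
f, na1 = 4000.0, 0.75
r_sphere, r_plane = aperture_radii(f, na1)
print(f"radius on exit sphere : {r_sphere:.1f}")
print(f"radius on second p.p. : {r_plane:.1f}")
```

For these values the radius on the exit sphere is 3000λ, whereas the radius on the second p.p. is about 4536λ; these are the entrance and exit aperture radii quoted in the caption of Figure 1.7.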
What is surprising about the sine condition is that a requirement imposed
solely on the cones of light corresponding to the on-axis points affects the quality
of imaging for nearby off-axis points: once the sine condition has been satisfied,
all near-axis points within the object plane will be imaged, essentially free of
aberration, to their conjugates in the image plane. Without the sine condition,
however, images of the near-axis points would be degraded by aberrations, most
prominently by coma. It is this surprising property of the sine condition that we
shall elucidate further.
The wave-optical viewpoint

Having secured a one-to-one mapping between the distribution of light entering
the first p.p. and that exiting the second p.p., for an axial object point, we now
switch to the viewpoint of wave optics and consider the perturbation of the
wavefront in response to a slight off-axis shift of the axial object point.
In the diffraction analysis of lenses conducted within the paraxial approxima-
tion, it is customary to assign to the second p.p. the same complex-amplitude
distribution that exists on the first p.p. This distribution is then augmented by
aberrations of the lens, if any, to account for deviations of the emergent wavefront
from perfect sphericity.8 Thus if A1(x,y) represents the complex-amplitude
distribution at the first p.p., the distribution at the second p.p. will be written
\[ A_2(x, y) = A_1(x, y)\, \exp[i(2\pi/\lambda) W(x, y)]\, \exp\!\left[-i(2\pi/\lambda)\sqrt{x^2 + y^2 + f^2}\,\right]. \qquad (1.1) \]
Here λ is the wavelength of the light, W(x, y) represents wavefront aberrations,
and the second exponential factor corresponds to a perfect spherical wavefront
converging toward the focal point in the image space.
For wide-aperture systems, Eq. (1.1) must be modified to account for deviations
from the paraxial regime. For example, if the ray emerging from (x, y) in
the second p.p. enters the first p.p. at (x′, y′), then A1(x, y) in Eq. (1.1) must be
replaced by A1(x′, y′), and the Jacobian of the transformation between the two
principal planes must be properly taken into account, to preserve the optical
energy throughput of the system.
Strictly speaking, since in non-paraxial regions the principal planes are no longer
conjugate planes, it follows that a one-to-one mapping between these planes is
meaningless. In practice, however, the field of view of the lens is so small that a
cone of light emanating from any point within the field of view is essentially the
same as the axial cone in Figure 1.5(b), but endowed with some form of phase/
amplitude modulation. Thus the correspondence between a pair of points such as
(x′, y′) on the first p.p. and (x, y) on the second p.p., established for the axial cone,
remains approximately valid for all object points. Any phase/amplitude perturbation
affecting the beam at (x′, y′) can then be transferred directly to the beam at
(x, y), and the resulting distribution within the second p.p. may be used as the initial
distribution for further propagation through the image space.
(Many authors prefer to use the amplitude distribution over the spherical exit
surface in Figure 1.5(b) as the initial distribution, without ever referring to the
principal planes. If one is interested in diffraction analysis using a plane
wave spectrum, however, one should start with initial conditions that are
defined on a flat surface, in which case the second p.p. provides a natural frame
of reference.)
Wavefront perturbation due to off-axis shift of the object point

The distribution of the complex amplitude at the first p.p. due to a cone of light
emanating from the off-axis point (x0, y0) may be determined by reference to
Figure 1.6. The distance from (x′, y′) to the off-axis point differs from that to the
on-axis point by
\[ \Delta l \approx x_0 S'_x + y_0 S'_y . \qquad (1.2) \]
Figure 1.6 The ray leaving the off-axis point (x0, y0) and arriving at (x′, y′) will
travel a slightly different distance than the ray from the axial point (0, 0) that
travels along S′ toward the same location. When (x0, y0) is sufficiently close to the
optical axis, the path-length difference between these two rays can be approximated
by the projection on S′ of the line joining (x0, y0) to the point at the origin.
The same argument applies to the conjugate rays in the image space.
To a first approximation, therefore, upon arrival at the first p.p. the cone of light
that originates at (x0, y0) will be the same as that which originated from the axial
point, albeit with a modulation by the following phase factor:
\[ \exp[i(2\pi/\lambda)\Delta l] \approx \exp[i(2\pi/\lambda)(x_0 S'_x + y_0 S'_y)]. \qquad (1.3) \]
Note that the phase in Eq. (1.3) is linear in (S′x, S′y) but not in (x′, y′). The same
phase factor will appear on the beam at (x, y) on the second p.p. Now, and this is
the crux of the matter, if the sine condition is satisfied then this phase factor can
be replaced by exp[i(2π/λ)(x1Sx + y1Sy)], because the angular magnification
between (S′x, S′y) and (Sx, Sy) is exactly the inverse of the transverse magnification
m between (x0, y0) and (x1, y1). The distribution at the second p.p. now corresponds
to a spherical wavefront, converging toward (x1, y1) and having no
aberrations whatsoever. This is the essence of the sine condition, which cannot
be over-emphasized; it is the reason why there is “aberration-free” imaging of
near-axial points.
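The replacement of the phase factor can be verified in one line. The sketch below assumes that the sine condition may be written component-wise for the transverse parts of the unit vectors S′ and S (that is, each transverse component is scaled by 1/m), and that the Gaussian image point is (x1, y1) = (m x0, m y0); any image inversion is taken to be absorbed in the sign of m.

```latex
S_x = \frac{S'_x}{m}, \quad S_y = \frac{S'_y}{m}, \qquad
x_1 = m\,x_0, \quad y_1 = m\,y_0
\;\;\Longrightarrow\;\;
x_1 S_x + y_1 S_y
  = (m x_0)\frac{S'_x}{m} + (m y_0)\frac{S'_y}{m}
  = x_0 S'_x + y_0 S'_y .
```

Under these assumptions the factor m cancels identically, so the phase imparted in the object space is exactly the linear phase (in Sx, Sy) of a spherical wave converging on (x1, y1).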
A wide-aperture aplanat
As an example, consider an ideal infinite-conjugate aplanatic lens having
z0 = ∞, NA0 = 0, z1 = f = 4000λ, and NA1 = 0.75. The phase pattern of an obliquely
incident plane wave at the first p.p. of this lens is shown in Figure 1.7(a). The
beam has a linear phase over the entire entrance aperture, as expected of a plane
wave at oblique incidence. Upon emerging from the second p.p. the phase
pattern of the beam is that of Figure 1.7(b). In compliance with the sine con-
dition the exit aperture is seen to be larger than the entrance aperture,
and the phase pattern has undergone some sort of nonlinear “stretching”.
(The emergent phase pattern in Figure 1.7(b), however, is nonlinear because
it is displayed in the x, y coordinates; in the coordinates Sx, Sy it would be
perfectly linear.)
The emergent beam comes to focus at the focal plane of the lens, creating
the off-axis Airy pattern shown in Figure 1.7(c). For comparison, the on-axis
focused spot of the same lens is also shown in the figure. As expected, the
off-axis spot is free from aberrations, and the two spots are essentially
identical.
It is not difficult to design an aplanat with the characteristics of the lens in the
above example; a specific design is shown in Figure 1.8. The various parameters
of this meniscus, which consists of two conic surfaces, are listed in the figure
caption.
Offense against the sine condition

Let us now examine the special case of a lens in which the ray heights have been
made equal at the principal planes. Here (x, y) = (x′, y′), and the difference
between the actual and the ideal (i.e., aberration-free) emergent wavefronts will be
\[ W(x, y) = (x_0 S'_x + y_0 S'_y) - (x_1 S_x + y_1 S_y). \qquad (1.4) \]
Note that Sx and Sy are proportional to sin θ, but in the present case it is tan θ that
is magnified by 1/m. A Taylor series expansion yields
\[ \tan\theta = \frac{\sin\theta}{\sqrt{1 - \sin^2\theta}} = \sin\theta + \tfrac{1}{2}\sin^3\theta + \tfrac{3}{8}\sin^5\theta + \cdots . \qquad (1.5) \]
To a first approximation, therefore, the difference between sin θ and tan θ
is proportional to sin³ θ. This difference, when inserted in Eq. (1.4), produces
primary coma. Thus when the rays that enter at a given height on the first p.p.
emerge at the same height on the second p.p., perfect imaging of the axial point
results in comatic imaging of the near-axis points.
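The quality of the cubic approximation to tan θ − sin θ can be gauged with a few lines of code. The sketch below is illustrative only (the function name tan_from_sin_series and the sample values of sin θ are our own choices); it compares the exact difference with the leading term ½ sin³ θ of Eq. (1.5).

```python
import math

def tan_from_sin_series(s, terms=3):
    """Approximate tan(theta) from s = sin(theta) using the first `terms`
    terms of the series s + s**3/2 + 3*s**5/8 + ... (illustrative helper)."""
    coeffs = [1.0, 0.5, 0.375]
    return sum(c * s ** (2 * k + 1) for k, c in enumerate(coeffs[:terms]))

for s in (0.05, 0.2, 0.5, 0.75):
    theta = math.asin(s)
    exact_gap = math.tan(theta) - s      # tan(theta) - sin(theta)
    cubic_term = 0.5 * s ** 3            # leading term of the difference
    print(f"sin = {s:4.2f}   tan - sin = {exact_gap:.6e}   "
          f"(1/2) sin^3 = {cubic_term:.6e}")
```

For small sin θ the two columns agree closely, which is the regime of primary coma; at large numerical apertures the higher-order terms of the series become significant, corresponding to higher-order coma.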
Similar arguments may be advanced for systems that violate the sine con-
dition in ways other than described above. In general, offense against the sine
condition results in primary and higher-order coma in near-axis regions of the
image plane.
Figure 1.7 (a) Distribution of phase at the first p.p. of an infinite-conjugate
lens having NA1 = 0.75 and f = 4000λ. The entrance aperture radius is 3000λ,
and the incident beam propagates at θ = 0.076° relative to the optical axis. The
gray-scale covers the interval from −180° (black) to +180° (white). (b) Distribution
of phase at the second p.p. Since the lens satisfies Abbe’s sine condition
the exit aperture radius is 4536λ. (c) Logarithmic plot of intensity distribution at
the focal plane showing the axial focused spot (center) and the off-axis spot
corresponding to an oblique incidence angle of θ = 0.076°. The spots are nearly
identical; both are substantially free from aberrations.
[Axes: (a), (b) y/λ from −4700 to 4700; (c) x/λ and y/λ from −7.5 to 7.5.]
Figure 1.8 An aplanatic meniscus lens brings collimated beams to diffraction-limited
focus within its focal plane in the vicinity of the optical axis. This 4 mm
diameter lens has f = 2.6733 mm and NA = 0.75. The refractive index of the lens
glass is n = 2.49486, its thickness at the center is 1 mm, and its conic surfaces have
the following radii of curvature and conic constants: first surface, Rc = 2.26875 mm,
k = 0.20945; second surface, Rc = 3.87493 mm, k = 0.08173. The second principal
plane is 0.2894 mm to the right of the first surface’s vertex.
The image of a diffraction grating

An appealing argument in favor of the sine condition involves the image of a
diffraction grating.9 Consider a small grating of period P placed perpendicular to
the optical axis in the object plane of the system of Figure 1.3; the illumination is
coherent, collimated, and monochromatic with wavelength λ. The nth diffraction
order leaves the grating at the Bragg angle θn relative to the optical axis, where
sin θn = nλ/P. In the image plane the grating period is mP, where m is the transverse
magnification of the system. Therefore, to obtain a distortion-free image it
is necessary that all the sin θn be magnified by 1/m; in other words, the sine
condition must be satisfied.
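The grating argument can be made concrete with a short numerical sketch. The wavelength, grating period, and magnification below are arbitrary illustrative values, and the function name diffraction_sines is our own; the code simply lists sin θn for the object grating (period P) and for its image (period mP), and confirms that the two sets differ by the factor 1/m.

```python
def diffraction_sines(wavelength, period, max_order):
    """sin(theta_n) = n * wavelength / period for the propagating orders
    n = 0 .. max_order (illustrative helper; orders with sine > 1 are dropped)."""
    sines = []
    for n in range(max_order + 1):
        s = n * wavelength / period
        if s <= 1.0:
            sines.append(s)
    return sines

wavelength = 0.633   # micrometres (assumed value)
period = 2.0         # object grating period P, micrometres (assumed value)
m = 4.0              # transverse magnification (assumed value)

object_sines = diffraction_sines(wavelength, period, 3)
image_sines = diffraction_sines(wavelength, m * period, 3)  # image period is m*P

for n, (s0, s1) in enumerate(zip(object_sines, image_sines)):
    print(f"order {n}: sin(theta) object = {s0:.5f}, image = {s1:.5f}, "
          f"m * image = {m * s1:.5f}")
```

A system that scaled tan θn rather than sin θn by 1/m would place the higher orders at the wrong angles in the image space, so that the reconstructed image would no longer be a faithful grating of period mP.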
References for Chapter 1

1 E. Abbe, Jenaisch. Ges. Med. Naturw. (1879); also Carl. Repert. Phys. 16, 303 (1880).
2 C. Hockin, J. Roy. Micro. Soc. (2) 4, 337 (1884).
3 A. B. Porter, Phil. Mag. (6) 11, 154 (1906).
4 M. Born and E. Wolf, Principles of Optics, sixth edition, Pergamon Press, Oxford,
1980.
5 M. V. Klein, Optics, Wiley, New York, 1970.
6 J. M. Stone, Radiation and Optics, McGraw-Hill, New York, 1963.
7 A. E. Conrady, Applied Optics and Optical Design, Dover, New York, 1957.
8 J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, New York, 1968.
9 Douglas Goodman, private communication.

2
Fourier optics
The classical theory of diffraction originated in the work of the French physicist
Augustin Jean Fresnel, in the first quarter of the nineteenth century. Fresnel’s
ideas were subsequently expanded and elaborated by, among others, William
Rowan Hamilton, Gustav Kirchhoff, George Biddell Airy, John William Strutt
(Lord Rayleigh), Ernst Abbe, and Arnold Sommerfeld, leading to a complete
understanding of light in its wave aspects.1
The Fourier-transform operation occurs naturally in any formulation of the
theory of diffraction, giving rise to a body of literature that has come to be known
as Fourier optics.2 The prominence of Fourier transforms in physical optics is
rooted in the fact that any spatial distribution of the complex amplitude of light
can be considered a superposition of plane waves.3 (Plane waves, of course, are
eigenfunctions of Maxwell’s equations for the propagation of electromagnetic
fields through homogeneous media.1,4)
Many students of Fourier optics are intimidated by the approximations
involved in deriving its basic formulas, but it turns out that the majority of these
approximations are in fact unnecessary: by starting from a plane-wave expansion
of the light amplitude distribution, rather than the traditional Huygens’
principle,1,2,4 one can readily arrive at the fundamental results of the classical
theory either directly or after applying the stationary-phase approximation.1,3
(For a detailed discussion of the stationary-phase method see the appendix to this
chapter.)
The goal of the present chapter is to show how decomposition into, and
subsequent superposition of, plane waves can lead straightforwardly to the near-
field (Fresnel) and far-field (Fraunhofer) formulas, to elucidation of the Fourier
transforming properties of a lens, and to the essence of Abbe’s theory of image
formation. Along the way, several numerical examples will demonstrate the
utility of the derived formulas.

Augustin Jean Fresnel (1788–1827). His work in optics received scant public
recognition during his lifetime, but Fresnel maintained that not even acclaim
from distinguished colleagues could compare with the pleasure of discovering a
theoretical truth or confirming a calculation experimentally. (Photo: Smithsonian
Institution, courtesy of AIP Emilio Segré Visual Archives.)
Jean Baptiste Joseph Fourier (1768–1830), began to work on the theory of heat
around 1804 and by 1807 had completed a memoir, On the Propagation of Heat in
Solid Bodies, in which periodic functions were expressed as the sum of an infinite
series of sines and cosines. Lagrange and Laplace objected to Fourier’s expansion
on the grounds that it lacked generality and rigor. Fourier’s treatise, The Analytical
Theory of Heat, was not published until 1822. (Photo: Deutsches Museum, courtesy
of AIP Emilio Segré Visual Archives.)
Siméon Denis Poisson (1781–1840). In 1818, during the judging of Fresnel’s

paper on diffraction at the Paris Academy, Poisson argued that the consequence of
Fresnel’s theory was the absurdity that the center of the shadow of an opaque disk
should be illuminated. This unexpected effect was subsequently observed. (Photo:
courtesy of AIP Emilio Segré Visual Archives.)
Joseph von Fraunhofer (1787–1826) German physicist who first studied the
dark lines in the spectrum of the Sun. The first to use diffraction gratings, his work
set the stage for the further development of spectroscopy. (Photo: Bavarian
Academy of Sciences, courtesy of AIP Emilio Segré Visual Archives.)
Sir George Biddell Airy (1801–1892), became Lucasian Professor of Math-

ematics at Cambridge only three years after graduating from Trinity College in
1823. He was Astronomer Royal from 1835 to 1881. Airy contributed to the
understanding of the rainbow by studying the effects of diffraction from raindrops.
(Photo: courtesy of AIP Emilio Segré Visual Archives, E. Scott Barr Collection.)
Electromagnetic plane waves

A plane-wave solution of Maxwell’s equations in a homogeneous environment
can be expressed as

\[ a(x, y, z) = A_0 \exp[i(2\pi/\lambda)(x\sigma_x + y\sigma_y + z\sigma_z)]. \qquad (2.1a) \]
Here λ is the wavelength of the light, A0 is a complex vector representing the
magnitude and state of polarization of the E-field at the origin of the coordinate
system, and σ = (σx, σy, σz) is a unit vector specifying the direction of propagation.
In general, σz is related to σx and σy by
\[ \sigma_z = (1 - \sigma_x^2 - \sigma_y^2)^{1/2}. \qquad (2.1b) \]
On the one hand, σz will be real-valued if σx² + σy² ≤ 1, in which case the plane
wave is said to be homogeneous or propagating. On the other hand, if σx² + σy² > 1
then σz becomes imaginary and the plane wave is called inhomogeneous or
evanescent.
In scalar diffraction theory, the state of polarization of the light is ignored and
A0 is treated as a complex constant. Furthermore, if the x, y, z coordinates are
normalized by the wavelength λ, then this parameter disappears from all subsequent
equations. Throughout this chapter, therefore, all lengths will be assumed
to be normalized by λ; a propagation distance of 1000, for example, should be
understood as a distance of 1000λ.
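The distinction between propagating and evanescent plane waves can be illustrated with a brief sketch based on Eq. (2.1b). The function name sigma_z and the sample values are our own; the choice of the positive-imaginary root for evanescent waves (so that the field decays rather than grows with z) is an assumption consistent with, but not spelled out in, the text. Lengths are in units of λ.

```python
import numpy as np

def sigma_z(sigma_x, sigma_y):
    """z-component of the direction vector from Eq. (2.1b): real for
    propagating waves, positive-imaginary for evanescent ones."""
    s2 = np.asarray(sigma_x, dtype=float) ** 2 + np.asarray(sigma_y, dtype=float) ** 2
    return np.sqrt((1.0 - s2).astype(complex))   # principal complex square root

# A propagating wave (sigma_x = 0.6) and an evanescent wave (sigma_x = 1.2).
for sx in (0.6, 1.2):
    sz = sigma_z(sx, 0.0)
    # Magnitude of the factor exp(i 2 pi z sigma_z) after one wavelength (z = 1).
    magnitude = abs(np.exp(2j * np.pi * 1.0 * sz))
    print(f"sigma_x = {sx}: sigma_z = {sz:.4f}, |propagation factor| = {magnitude:.4e}")
```

The propagating wave acquires only a phase, so the magnitude of its propagation factor is unity, whereas the evanescent wave is attenuated substantially within a single wavelength.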
Expansion into plane waves

Consider the complex-amplitude distribution a(x, y, z = 0) in the XY-plane at
z = 0. The Fourier transform of a(x, y, z = 0) is defined as
\[ A(\sigma_x, \sigma_y) = \iint_{-\infty}^{\infty} a(x, y, z = 0) \exp[-i 2\pi (x\sigma_x + y\sigma_y)]\, dx\, dy. \qquad (2.2a) \]
Gustav Robert Kirchhoff (1824–1887), Professor of physics at Heidelberg,

Breslau and Berlin. His discovery that a gas absorbs the same wavelengths
that it emits when heated explained the numerous dark lines (Fraunhofer
lines) in the Sun’s spectrum, marking the beginning of a new era in
astronomy. Kirchhoff placed Fresnel’s ideas on a firm theoretical basis,
formulating what is now referred to as the Fresnel–Kirchhoff diffraction
theory. (Photo: courtesy of AIP Emilio Segré Visual Archives, W. F. Meggers
Collection.)
The inverse Fourier transform may therefore be written

\[ a(x, y, z = 0) = \iint_{-\infty}^{\infty} A(\sigma_x, \sigma_y) \exp[i 2\pi (x\sigma_x + y\sigma_y)]\, d\sigma_x\, d\sigma_y. \qquad (2.2b) \]
Because Maxwell’s equations are linear, any superposition of plane waves within
homogeneous linear media is also a solution of Maxwell’s equations. In general,
the superposition of plane waves in Eq. (2.2b) contains both propagating and
evanescent waves. At a distance z = z0 from the origin, the complex-amplitude
distribution of the light is thus given by
\[ a(x, y, z = z_0) = \iint_{-\infty}^{\infty} A(\sigma_x, \sigma_y) \exp[i 2\pi (x\sigma_x + y\sigma_y + z_0\sigma_z)]\, d\sigma_x\, d\sigma_y. \qquad (2.3) \]
Equation (2.3) is the fundamental formula of the classical theory of diffraction. It

provides the following simple recipe for computing the distribution of the field at
the plane z = z0 given the initial distribution at z = 0:
(i) compute the Fourier transform A(σx, σy) of the initial distribution;
(ii) multiply A(σx, σy) by the phase factor, which may be written as
exp(i2πz0σz) = exp[i2πz0(1 − σx² − σy²)^(1/2)];
(iii) compute the inverse Fourier transform of the resulting function.
The above recipe is applicable to many practical problems, without the need to
introduce any approximations or simplifications. Some consequences of Eq. (2.3)
are explored in the following examples.
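The three-step recipe above translates almost directly into code. The following is a minimal sketch using NumPy's FFT routines; it is not the program used to generate the figures of this chapter. The function name propagate, the square N × N grid, and the sample spacing dx are assumptions made for illustration, and all lengths are in units of λ.

```python
import numpy as np

def propagate(a0, dx, z0):
    """Propagate the sampled field a0 (square N x N array, sample spacing dx)
    through a distance z0 by the plane-wave recipe of Eq. (2.3).
    Hypothetical helper; lengths are in units of the wavelength."""
    n = a0.shape[0]
    # With lengths normalized by the wavelength, the spatial frequencies of
    # the discrete transform are the direction cosines sigma_x, sigma_y.
    s = np.fft.fftfreq(n, d=dx)
    sx, sy = np.meshgrid(s, s, indexing="ij")
    # sigma_z is imaginary for evanescent components, which therefore decay.
    sz = np.sqrt((1.0 - sx**2 - sy**2).astype(complex))
    spectrum = np.fft.fft2(a0)                               # step (i)
    spectrum = spectrum * np.exp(2j * np.pi * z0 * sz)       # step (ii)
    return np.fft.ifft2(spectrum)                            # step (iii)

# Sanity check: a uniform plane wave traveling along Z acquires only the
# phase exp(i 2 pi z0); for z0 = 10.25 this factor equals i.
plane = np.ones((64, 64), dtype=complex)
out = propagate(plane, dx=1.0, z0=10.25)
print(np.allclose(out, 1j))
```

Two practical caveats apply to any such discrete implementation: the FFT treats the sampled field as periodic, so the mesh must be wide enough to contain the diffracted beam, and the sampling in the Fourier domain must be fine enough to resolve the rapidly varying phase factor of step (ii) when z0 is large.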
Diffraction from a circular aperture

Figure 2.1 shows the computed intensity patterns at various distances from a
circular aperture of radius r0 = 3000 illuminated by a uniform plane wave. From
(a) to (f) the assumed distances from the aperture are z0 = 0, 0.5 × 10⁶, 0.75 × 10⁶,
1.0 × 10⁶, 2.25 × 10⁶, and 9.0 × 10⁶. (These distances correspond to the Fresnel
numbers1 N = r0²/z0 = ∞, 18, 12, 9, 4, and 1, respectively.) The computations
were carried out by discretizing the initial distribution on a 512 × 512 mesh and
then applying the fast Fourier transform (FFT) algorithm. On a modern personal
computer the time needed for these calculations is less than a second.
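The Fresnel numbers quoted above follow from simple arithmetic, reproduced in the sketch below. The on-axis intensity formula I = 4 sin²(πN/2) used in the second function is the standard paraxial (Fresnel) result for a unit-amplitude plane wave behind a circular aperture; it is not derived in this chapter and is brought in here as an assumption, to indicate why the center of the pattern is dark for even N and bright for odd N. The function names are our own.

```python
import math

def fresnel_number(r0, z0):
    """Fresnel number N = r0**2 / z0, with lengths in units of the wavelength."""
    return math.inf if z0 == 0 else r0 ** 2 / z0

def on_axis_intensity(n):
    """Paraxial on-axis intensity behind a circular aperture, 4 sin^2(pi N / 2),
    for unit incident intensity (standard Fresnel result, assumed here)."""
    return 4.0 * math.sin(math.pi * n / 2.0) ** 2

r0 = 3000.0
for z0 in (0.5e6, 0.75e6, 1.0e6, 2.25e6, 9.0e6):
    n = fresnel_number(r0, z0)
    print(f"z0 = {z0:9.3e}   N = {n:4.1f}   on-axis intensity = {on_axis_intensity(n):.3f}")
```

Within the stated approximation the on-axis intensity vanishes for N = 18, 12, and 4 and reaches four times the incident intensity for N = 9 and 1.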
Diffraction-free beams
Figure 2.1 Computed intensity patterns at various distances from a circular
aperture of radius r0 = 3000, illuminated by a uniform plane wave. The
assumed distances from the aperture are (a) z0 = 0, (b) z0 = 0.5 × 10⁶,
(c) z0 = 0.75 × 10⁶, (d) z0 = 10⁶, (e) z0 = 2.25 × 10⁶, (f) z0 = 9.0 × 10⁶. Note that
the center of the diffracted beam is dark in (b), (c) and (e), while it is bright in
(d) and (f).
[Axes: each panel spans x from −4000 to 4000.]
If the propagation phase factor in Eq. (2.3) happens to be a constant then it can
be taken out of the integral, in which case, aside from a multiplicative phase
factor, the distribution at z = z0 becomes equal to that at z = 0. This occurs if the
Fourier transform A(σx, σy) of the initial distribution happens to be non-zero
only over a circle of fixed radius in the Fourier plane, that is, if A(σx, σy) = 0
everywhere except where σx² + σy² = ρ0². Under these circumstances, Eq. (2.3)
yields
\[ a(x, y, z = z_0) = \exp\!\left[i 2\pi z_0 (1 - \rho_0^2)^{1/2}\right] a(x, y, z = 0). \qquad (2.4) \]
According to Eq. (2.4), any initial distribution that is confined to a circle of radius
q0 in the Fourier domain will not diffract while propagating along the Z-axis.5 A
particularly simple case occurs when A(rx, ry) ¼ d(qq0), where d(·) is Dirac’s
delta function and q ¼ (r2x þ r2y)1/2. The inverse transform of this delta function is
a zeroth-order Bessel function of the first kind, namely, a(x, y, z ¼ 0) ¼ J0(2pq0r),
where r ¼ (x2 þ y2)1/2.
Needless to say, any azimuthal variation of the amplitude and/or phase of the
above delta function around the circle of radius q0 in the Fourier domain yields
another non-diffracting beam. Moreover, if the radius q0 is less than unity then
the non-diffracting beam will be a propagating beam, whereas q0 > 1 corresponds
to an exponentially attenuating, non-diffracting, evanescent beam.
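The statement can be checked numerically by summing a finite number of plane waves whose transverse frequencies lie on a circle in the Fourier plane. The sketch below is only an illustration under stated assumptions: the function name `ring_field`, the use of equally spaced azimuths, and the normalization by the number of waves are choices made here, and the plane waves are written as $\exp[i2\pi(\sigma_x x + \sigma_y y + \sigma_z z)]$ with lengths in units of the wavelength.

```python
import numpy as np

def ring_field(x, y, z, rho0, amps):
    """Sum of plane waves whose (sigma_x, sigma_y) lie on a circle of radius rho0.

    amps holds one complex amplitude per plane wave, equally spaced in azimuth.
    rho0 < 1 is assumed, so every wave on the ring is a propagating wave.
    """
    phis = 2.0 * np.pi * np.arange(len(amps)) / len(amps)
    sz = np.sqrt(1.0 - rho0**2)                    # same sigma_z for every wave on the ring
    total = np.zeros(np.shape(x), dtype=complex)
    for amp, phi in zip(amps, phis):
        sx, sy = rho0 * np.cos(phi), rho0 * np.sin(phi)
        total += amp * np.exp(2j * np.pi * (sx * x + sy * y + sz * z))
    return total / len(amps)
```

Because every wave on the ring shares the same $\sigma_z$, the field at any $z$ differs from the field at $z = 0$ only by the common factor $\exp[i2\pi z(1 - \rho_0^2)^{1/2}]$, whatever the azimuthal amplitudes in `amps`. With uniform amplitudes the sum approaches $J_0(2\pi\rho_0 r)$ as the number of waves grows.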
Poisson’s bright spot

A bright spot appearing at the center of the geometrical shadow of an opaque disk
was first predicted by S. D. Poisson in an attempt to refute Fresnel’s theory of
diffraction. Fresnel’s theory was vindicated, however, when François Arago
confirmed the existence of the bright spot in an experiment [1,4,6].
The diagram in Figure 2.2 shows a collimated beam blocked at the center by a
disk of radius $r_0$. The complex-amplitude distribution immediately after the disk
is denoted by $a(x, y, z = 0)$. The volume under the Fourier transform $A(\sigma_x, \sigma_y)$ of
this distribution over the $\sigma_x\sigma_y$-plane is zero, because the central value $a(0, 0, 0)$
of the initial distribution is zero. However, the volume under the Fourier transform
of the beam’s cross-section at $z = z_0$ is not zero, because $A(\sigma_x, \sigma_y)$ is multiplied by
the phase-factor $\exp[i2\pi z_0(1 - \rho^2)^{1/2}]$, which changes the phase of the Fourier
transform as a function of the radius $\rho$ in the $\sigma_x\sigma_y$-plane. A non-zero volume in the
Fourier domain implies that the central value of the distribution $a(x = 0, y = 0, z = z_0)$
is also non-zero, that is, the center of the distribution at $z = z_0$ is no
longer dark. For a disk of radius $r_0 = 2500$, Figure 2.3 shows the computed
intensity distributions (a) immediately after the disk, (b) at $z_0 = 2.0 \times 10^6$, and (c)
at $z_0 = 4.0 \times 10^6$.
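The "volume" argument can be written out in one line. Since a function's value at the origin equals the integral of its Fourier transform over the whole Fourier plane, and since propagation multiplies the transform by the phase factor quoted above, the on-axis amplitude is

$$a(0, 0, z_0) = \iint A(\sigma_x, \sigma_y)\, \exp\left[i2\pi z_0 (1 - \rho^2)^{1/2}\right] d\sigma_x\, d\sigma_y, \qquad \rho^2 = \sigma_x^2 + \sigma_y^2 .$$

At $z_0 = 0$ the exponential is unity and the integral reduces to $\iint A\, d\sigma_x\, d\sigma_y = a(0, 0, 0) = 0$. For $z_0 > 0$ the contributions from different radii $\rho$ are re-phased relative to one another, so the cancellation that held at $z_0 = 0$ is in general no longer complete.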
Poisson’s bright spot may be considered as the focus of a collimated beam
produced by an opaque disk. The disk, therefore, behaves as a lens, albeit a
dark one [6]; an illuminated object placed before the disk forms an image
through the dark lens, as shown in Figure 2.4. In this particular example,
the object, shown in Figure 2.4(a), is a circular aperture partially covered by
Figure 2.2 A collimated beam illuminates an opaque circular disk of radius $r_0$.
At a distance $z_0$ from the disk the intensity distribution in the XY-plane contains a
bright spot at the center of the geometrical shadow of the disk.

Figure 2.3 Computed intensity patterns at various distances from an opaque
circular disk of radius $r_0 = 2500$, illuminated by a collimated Gaussian beam
having a $1/e$ (amplitude) radius of 5000 (horizontal axis: $x$ from $-5000$ to $5000$). The distances from the disk are
(a) $z_0 = 0$, (b) $z_0 = 2.0 \times 10^6$, (c) $z_0 = 4.0 \times 10^6$.
four small obstacles. The object is back-illuminated incoherently, by an
extended quasi-monochromatic source, through a 0.005NA condenser lens. A
dark lens of radius $r_0 = 2500$ at a distance of $10^6$ from the object produces the
real image shown in Figure 2.4(b) at a distance of $2.0 \times 10^6$ behind the dark
lens.
We mention in passing that the incoherence of the illumination is essential
for the success of this imaging process; interference effects totally obscure the
image when the object is coherently illuminated.
Figure 2.4 Incoherent imaging by means of a dark lens (horizontal axes: $x$ from $-1200$ to $1200$ in (a), $-2400$ to $2400$ in (b)). The object in (a) is
illuminated by an extended quasi-monochromatic source through a 0.005NA
condenser of focal length $f = 6.0 \times 10^5$. The source consists of 529 mutually
incoherent point sources, imaged by the condenser at a distance of $\Delta z = 10^5$
before the object. The dark lens is an opaque circular disk of radius $r_0 = 2500$,
placed a distance of $\Delta z = 10^6$ from the object. The image in (b) is computed at a
distance of $z_0 = 2.0 \times 10^6$ behind the dark lens.
Distribution of light in the far field

As the value of $z_0$ increases, Eq. (2.3) becomes exceedingly difficult to compute,
because the rapid oscillations of the exponential phase factor require dense sampling
of the functions in the $\sigma_x\sigma_y$-plane. In this regime, however, the stationary-phase
approximation [1] becomes applicable.
For a fixed value of $(x, y, z_0)$, the exponent under the integral in Eq. (2.3) may
be considered to be a function of $(\sigma_x, \sigma_y)$. This function has a single stationary
point at $(\sigma_{x0}, \sigma_{y0}) = (x, y)/(x^2 + y^2 + z_0^2)^{1/2}$. At all other points in the $\sigma_x\sigma_y$-plane the
complex exponential oscillates so rapidly that the local integral effectively
vanishes; only at the stationary point does the integral yield a non-zero value. At
this point the exponent can be replaced by the first few terms in its Taylor-series
expansion around the stationary point, namely,
$$x\sigma_x + y\sigma_y + z_0\sigma_z \approx (x^2 + y^2 + z_0^2)^{1/2} \Big\{ 1 - \tfrac{1}{2}(1 + x^2/z_0^2)(\sigma_x - \sigma_{x0})^2 - (xy/z_0^2)(\sigma_x - \sigma_{x0})(\sigma_y - \sigma_{y0}) - \tfrac{1}{2}(1 + y^2/z_0^2)(\sigma_y - \sigma_{y0})^2 \Big\}. \tag{2.5}$$
The integral in Eq. (2.3) is then readily computed, without further approximations,
yielding
$$a(x, y, z = z_0) \approx -\frac{i}{(x^2 + y^2 + z_0^2)^{1/2}} \exp\left[i2\pi(x^2 + y^2 + z_0^2)^{1/2}\right] \times \frac{A(\sigma_{x0}, \sigma_{y0})}{\left[1 + (x/z_0)^2 + (y/z_0)^2\right]^{1/2}}. \tag{2.6}$$
Figure 2.5 A phase/amplitude object is illuminated by a plane wave propagating
along the Z-axis. The diffracted beam is a superposition of plane waves of
differing amplitudes, propagating along directions indicated by the unit vectors
$\boldsymbol{\sigma}$. The far-field pattern appears at a sufficiently large distance $z_0$ from the object.
Whether the field is observed on the XY-plane at $z = z_0$ or on the spherical
surface of radius $z_0$ centered on the object, the $x$, $y$ coordinates of a given point
are the usual coordinates measured along the X- and Y-axes. In either case, the
far-field amplitude is proportional to the complex amplitude of the plane wave
whose propagation direction $\boldsymbol{\sigma}_0$ is directly aimed at the observation point.
This is the so-called Fraunhofer (or far-field) distribution arising from the initial
distribution $a(x, y, z = 0)$. The far field is expressed in terms of the Fourier transform
$A(\sigma_x, \sigma_y)$ of the initial distribution evaluated at $(\sigma_{x0}, \sigma_{y0}) = (x, y)/(x^2 + y^2 + z_0^2)^{1/2}$.
Note how the obliquity factor $\cos\theta = 1/[1 + (x/z_0)^2 + (y/z_0)^2]^{1/2}$ enters the above
equation (see Figure 2.5).
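For reference, the location of the stationary point follows from a short calculation. Writing the exponent as $g(\sigma_x, \sigma_y) = x\sigma_x + y\sigma_y + z_0\sigma_z$ with $\sigma_z = (1 - \sigma_x^2 - \sigma_y^2)^{1/2}$, and setting both partial derivatives to zero, gives

$$\frac{\partial g}{\partial \sigma_x} = x - z_0\frac{\sigma_x}{\sigma_z} = 0, \qquad \frac{\partial g}{\partial \sigma_y} = y - z_0\frac{\sigma_y}{\sigma_z} = 0 \;\;\Longrightarrow\;\; \frac{\sigma_{x0}}{x} = \frac{\sigma_{y0}}{y} = \frac{\sigma_{z0}}{z_0} .$$

Combined with $\sigma_{x0}^2 + \sigma_{y0}^2 + \sigma_{z0}^2 = 1$ this yields $(\sigma_{x0}, \sigma_{y0}, \sigma_{z0}) = (x, y, z_0)/R$, where $R = (x^2 + y^2 + z_0^2)^{1/2}$ is shorthand introduced here. The exponent at the stationary point is then $g = R$, and $\sigma_{z0} = z_0/R$ is the obliquity factor $\cos\theta$.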
If the far field is observed on a spherical surface of radius $z_0$ centered on the
object (see Figure 2.5) then the curvature phase factor becomes a constant and
$(\sigma_{x0}, \sigma_{y0})$ reduces to $(x/z_0, y/z_0)$, yielding the following simple formula for the
far-field pattern on a spherical surface of radius $z_0$:
$$a(x, y, z) \approx -(i/z_0) \exp(i2\pi z_0)\, A(x/z_0, y/z_0) \cos\theta. \tag{2.7}$$
The conservation of optical power passing through any cross-section of the beam
may be verified by integrating the squared modulus of the functions appearing in
Eqs. (2.6) and (2.7) over their respective domains.
Far field of an annular aperture

To demonstrate the utility of Eq. (2.6), we use as the initial distribution the
narrow ring of light transmitted through an annular aperture (of width 100 and
average radius 1000), shown in Figure 2.6(a). After propagating a distance of $10^6$,
the far-field pattern of Figure 2.6(b) is obtained. (To enhance the weak rings of
this distribution, a gray-scale plot of the logarithm of intensity is displayed.) The
far field is essentially a Bessel beam with a curvature phase factor. To eliminate
the curvature, we use a 0.0075NA lens of focal length $f = 10^6$ to collimate the
beam in the far field of the annular aperture. The emerging truncated and collimated
Bessel beam at the exit pupil of the lens is shown in Figure 2.6(c). This
beam is not completely diffraction-free because it has a finite diameter. For
instance, after it has propagated a distance of $10^6$ from the exit pupil of the
collimating lens one observes the intensity pattern of Figure 2.6(d). The intensity
distribution after propagating another distance of $2 \times 10^6$ is shown in Figure 2.6(e).
Finally, Figure 2.6(f) shows the intensity distribution observed at a distance of
$5 \times 10^6$. Note how the decay of this truncated Bessel beam starts from the outer
rings and moves toward the center as the beam propagates.
Figure 2.6 Logarithmic plots of intensity distribution at various cross-sections
of a beam (horizontal axis: $x$ from $-1500$ to $1500$ in (a), from $-8000$ to $8000$ in (b) to (f)). (a) A transparent ring (radius 1000, width 100), illuminated with a
collimated uniform beam propagating along the Z-axis. (b) Far-field pattern of the
ring in the XY-plane at $z_0 = 10^6$. (c) The beam in (b) after collimation by a
0.0075NA lens of focal length $f = 10^6$. (d) The collimated beam in (c) after
propagating in free space a distance of $10^6$. (e) The beam in (d) after propagating a
distance of $2.0 \times 10^6$. (f) The beam in (e) after propagating a distance of $5.0 \times 10^6$.
The Airy pattern at the focal plane of a lens

Consider the infinite-conjugate aplanatic lens of focal length $f$ shown in Figure 2.7.
(For a discussion of aplanatism see Chapter 1, Abbe’s sine condition.) To determine
the light amplitude distribution around the focal point F, we need the distribution in
the second principal plane, which is given by
$$a(x, y, z = 0) = a_0(x_1, y_1) \cos^{3/2}\theta\, \exp\left[-i2\pi(x^2 + y^2 + f^2)^{1/2}\right]. \tag{2.8}$$
The amplitude distribution at the entrance pupil (assumed to coincide with the 1st
principal plane) is denoted by $a_0(x_1, y_1)$. The coordinates at the 1st and 2nd
principal planes are related as follows: $(x_1, y_1) = (fx, fy)/(x^2 + y^2 + f^2)^{1/2}$. The
corresponding infinitesimal areas in the two principal planes are in the ratio
$\cos^3\theta$, where $\cos\theta = f/(x^2 + y^2 + f^2)^{1/2}$; the amplitude in Eq. (2.8) is therefore
scaled by $\cos^{3/2}\theta$ to conserve optical power between the entrance and exit pupils.
The exponential phase factor in Eq. (2.8) is the curvature imparted by a perfect
lens to the emergent beam.
To determine, in accordance with Eq. (2.2a), the Fourier transform of the initial
distribution given by Eq. (2.8), we invoke the stationary-phase approximation [1]. The
Figure 2.7 A collimated beam of light enters an infinity-corrected, aplanatic
lens of focal length $f$ and numerical aperture NA (aperture radius $r_a = f\,\mathrm{NA}$). The entrance and exit pupils are
at the first and second principal planes. A ray entering at a height $(x_1, y_1)$ on the
first principal plane appears at the same height on the spherical surface centered
at the rear focal point F and tangent to the second principal plane. In the absence
of aberrations, all emergent rays converge to the focal point F. The distribution
in the XY-plane at $z = z_0$ is given by Eq. (2.11).
exponent of the integrand under the Fourier integral may be expanded in a Taylor
series around its stationary point,
$$(x_0, y_0) = -(f\sigma_x, f\sigma_y)/(1 - \sigma_x^2 - \sigma_y^2)^{1/2},$$
yielding
$$x\sigma_x + y\sigma_y + (x^2 + y^2 + f^2)^{1/2} \approx (1 - \sigma_x^2 - \sigma_y^2)^{1/2} \Big\{ f + \frac{1}{f}\Big[ \tfrac{1}{2}(1 - \sigma_x^2)(x - x_0)^2 - \sigma_x\sigma_y (x - x_0)(y - y_0) + \tfrac{1}{2}(1 - \sigma_y^2)(y - y_0)^2 \Big] \Big\}. \tag{2.9}$$
Without any other approximations, the Fourier transform of the initial distribution
is found to be
$$A(\sigma_x, \sigma_y) \approx -i f\, a_0(-f\sigma_x, -f\sigma_y) \exp\left[-i2\pi f (1 - \sigma_x^2 - \sigma_y^2)^{1/2}\right] / (1 - \sigma_x^2 - \sigma_y^2)^{1/4}. \tag{2.10}$$
When the above function is substituted in Eq. (2.3) we obtain
$$a(x_2, y_2, z = z_0) \approx -i f \iint \left\{ a_0(-f\sigma_x, -f\sigma_y) / (1 - \sigma_x^2 - \sigma_y^2)^{1/4} \right\} \exp\left[i2\pi (z_0 - f)(1 - \sigma_x^2 - \sigma_y^2)^{1/2}\right] \exp\left[i2\pi(x_2\sigma_x + y_2\sigma_y)\right] d\sigma_x\, d\sigma_y. \tag{2.11}$$
For a given distribution $a_0(x_1, y_1)$ at the entrance pupil, Eq. (2.11) gives
the distribution at and near the focal plane of the aplanatic lens of Figure 2.7. If
the final distribution is sought in the focal plane (i.e., $z_0 = f$) and if the factor
$\cos^{1/2}\theta = (1 - \sigma_x^2 - \sigma_y^2)^{1/4}$ is ignored (i.e., the paraxial approximation), then the
focal-plane distribution becomes simply the Fourier transform of the entrance-pupil
distribution. For an aberration-free lens having a circular aperture of
radius $r_a = f\,\mathrm{NA}$, and for a uniform incident beam, the focal-plane distribution is
thus proportional to $J_1(2\pi\,\mathrm{NA}\,r)/r$, where $J_1(\cdot)$ is the first-order Bessel function of
the first kind and $r = (x_2^2 + y_2^2)^{1/2}$. This is known as the Airy pattern, a plot of
which appears in Figure 2.8.
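As a quick numerical check of the Airy profile, the sketch below evaluates $J_1$ from its standard integral representation and locates the first zero of the pattern by bisection. The function names, the normalization to unity at $r = 0$, and the bracketing interval are choices made for this illustration; $r$ is in units of the wavelength.

```python
import numpy as np

def bessel_j1(x, n_samples=2048):
    """J1(x) via J1(x) = (1/2pi) * integral over [0, 2pi) of cos(tau - x sin(tau))."""
    tau = 2.0 * np.pi * np.arange(n_samples) / n_samples
    return float(np.mean(np.cos(tau - x * np.sin(tau))))

def airy_amplitude(r, na):
    """Focal-plane amplitude J1(2 pi NA r)/(pi NA r), normalized to 1 at r = 0."""
    if r == 0.0:
        return 1.0
    return bessel_j1(2.0 * np.pi * na * r) / (np.pi * na * r)

def first_zero_radius(na, lo=0.3, hi=0.9, iterations=60):
    """Bisection for the first zero; lo and hi bracket the zero in units of NA*r."""
    a, b = lo / na, hi / na
    for _ in range(iterations):
        mid = 0.5 * (a + b)
        if airy_amplitude(a, na) * airy_amplitude(mid, na) <= 0.0:
            b = mid
        else:
            a = mid
    return 0.5 * (a + b)
```

For any NA, `first_zero_radius(na) * na` should come out close to 0.61, the radius of the first dark ring in units of wavelength/NA.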
Fourier-transforming property of a lens

An infinity-corrected lens produces in its focal plane the Fourier transform $A_0(\sigma_x, \sigma_y)$
of a complex-amplitude distribution $a_0(x_1, y_1)$ placed before the lens. This behavior
Figure 2.8 Plot of the Airy function $J_1(2\pi r)/(\pi r)$ versus the radial distance $r$
from the focal point. The first zero of the Airy function is at $r \approx 0.61$. The inset
($x$ from $-5$ to $5$) shows a logarithmic plot of the intensity distribution at the focal plane of a
0.5NA diffraction-limited lens. This Airy pattern, being the result of a scalar
calculation, shows circular symmetry. In practice, both unpolarized and circularly
polarized incident beams produce circularly symmetric Airy patterns.
However, for linearly polarized light the Airy pattern tends to be slightly
elongated along the direction of the incident E-field.
is readily understood if one recognizes that the input distribution is a superposition
of plane waves, each propagating in a different direction. The lens
captures these plane waves and brings them to focus within its focal plane. The
amplitude of each focused spot is thus proportional to the corresponding plane-wave
amplitude. The finite aperture of the lens spreads each focused spot into an
Airy function, giving rise to a focal plane distribution that is the convolution
between the object’s Fourier transform and the lens’s Airy pattern.
To study in some detail the properties of an aplanatic, infinite-conjugate lens,
consider Figure 2.9. Here a plane wave propagating at angle $\theta$ relative to the
Z-axis enters the lens at its first principal plane. At the entrance pupil (which is
assumed to coincide with the 1st principal plane) the ray heights are the same as
those at the exit pupil, which is a spherical cap of radius $f$ centered at the focal
point F. A ray entering at height $x_1$ has phase $2\pi x_1\sigma_x$, which it retains as it
emerges from the exit pupil. The ray then acquires an additional phase in
Figure 2.9 Fourier transform lens having focal length $f$ and aperture radius
$r_a = f\,\mathrm{NA}$. The incident plane wave makes an angle $\theta$ with the Z-axis in the XZ-plane,
that is, $(\sigma_x, \sigma_y) = (\sin\theta, 0)$. The beam emerging from the lens converges to
the point $(x_2, y_2) = (f \sin\theta, 0)$ within the focal plane. The height of a ray entering
the lens at the first principal plane is the same as that of the emergent ray
measured on a spherical surface of radius $f$ centered at the rear focal point F.
propagating from the exit pupil to the focus at $x_2 = f\sigma_x$. The total phase at this
focus (relative to that at F) is thus given by
$$\phi(x_1, \sigma_x) = 2\pi\left\{ x_1\sigma_x + \left[(x_1 - f\sigma_x)^2 + (f^2 - x_1^2)\right]^{1/2} - f \right\} = 2\pi f\left\{ (x_1/f)\sigma_x + \left[1 + \sigma_x^2 - 2(x_1/f)\sigma_x\right]^{1/2} - 1 \right\}. \tag{2.12}$$
For small values of both $x_1/f$ and $\sigma_x$, the above expression may be approximated as
$\phi(x_1, \sigma_x) \approx \pi f\sigma_x^2$, which is independent of $x_1$. The various rays of the plane wave,
having thus acquired the common phase factor $\exp(i\pi f\sigma_x^2)$, converge to a common
focus in the vicinity of the optical axis. Further away from the axis, of course,
higher-order terms will cause aberration. Unless the lens is properly designed to
correct these aberrations, the acceptable values of NA and $\sigma_x$ will indeed be very
small. For example, Figure 2.10 shows plots of $\phi(x_1, \sigma_x) - \pi f\sigma_x^2$ versus $x_1/f$ for
several values of $\sigma_x$, for a lens having NA = 0.05 and $f = 25\,000$. Note that to keep
the maximum phase deviation at the edge of the pupil below 90° one must restrict
the aperture radius to $r_a \le 0.05f$ and the values of $\sigma_x$ to the range within $\pm 0.055$.
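The numbers quoted for this example can be reproduced directly from Eq. (2.12). The sketch below is only an illustration: the function names are hypothetical, and the pupil is sampled at 1001 ray heights between $-r_a$ and $+r_a$ with $r_a = f\,\mathrm{NA}$.

```python
import numpy as np

def phase_deviation_deg(u, sigma_x, f):
    """phi(x1, sigma_x) - pi*f*sigma_x**2 in degrees, with u = x1/f (Eq. 2.12)."""
    phi = 2.0 * np.pi * f * (
        u * sigma_x + np.sqrt(1.0 + sigma_x**2 - 2.0 * u * sigma_x) - 1.0
    )
    return np.degrees(phi - np.pi * f * sigma_x**2)

def worst_deviation_deg(sigma_x, f=25000.0, na=0.05, n=1001):
    """Largest absolute phase deviation over the pupil, |x1| <= f*NA."""
    u = np.linspace(-na, na, n)
    return float(np.max(np.abs(phase_deviation_deg(u, sigma_x, f))))
```

Evaluating `worst_deviation_deg` at $\sigma_x = 0.055$ and $0.06$ shows the deviation crossing the 90° mark between these two values, with the largest error occurring at the edge of the pupil opposite to the direction of tilt.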
We conclude that, under appropriate conditions, a plane wave entering the
lens at $\sigma_x = \sin\theta$ comes to diffraction-limited focus at $x_2 = f\sigma_x$, with a phase
Figure 2.10 Plots of $\phi(x_1, \sigma_x) - \pi f\sigma_x^2$ (in degrees) versus $x_1/f$ for several values of $\sigma_x = \sin\theta$
from 0.01 to 0.06 in the system of Figure 2.9. The function $\phi$ is given by
Eq. (2.12), and the specific values of the lens parameters used in the calculations
are NA = 0.05, $f = 25\,000$.
$\pi f\sigma_x^2 = \pi x_2^2/f$. Because of the finite aperture of the lens, the focused spot will
be not a geometric point but an Airy pattern of diameter $\sim 1/\mathrm{NA}$. Therefore, for an
object $a_0(x_1, y_1)$ at the entrance pupil the focal-plane distribution is related to the
Fourier transform $A_0(\sigma_x, \sigma_y)$ of the object as follows:
$$a(x_2, y_2) \approx \exp\left[i\pi(x_2^2 + y_2^2)/f\right] A_0(x_2/f, y_2/f) \ast \mathrm{Airy}(x_2, y_2), \tag{2.13}$$
where $\ast$ denotes convolution.
Needless to say, the range of $(x_2, y_2)$ in Eq. (2.13) is limited to the region
for which the lens is properly designed to focus the incident plane waves into
diffraction-limited spots. In the absence of aberrations, the angular resolution of
such a lens is solely dependent on the lens-aperture radius $r_a$ and is given by
$\Delta\sigma_x = \Delta\sigma_y \approx 0.61/r_a$. (Like all other spatial dimensions in this chapter, $r_a$ is
assumed to be normalized by the wavelength $\lambda$ of the light.)
Similar considerations apply when the object is placed a distance $z_1$ before the
first principal plane. In this case each plane wave leaving the object must travel a
different distance to reach the entrance pupil. By the time it reaches the entrance
pupil, a plane wave traveling along the direction $(\sigma_x, \sigma_y, \sigma_z)$ will have acquired
a phase $2\pi z_1\sigma_z$, which may be approximated, apart from a constant, as $-\pi z_1(\sigma_x^2 + \sigma_y^2)$. Under these
Figure 2.11 (a), (b) Intensity and phase distributions in the XY-plane for an
object and (c), (d) for its Fourier transform. The object is in the front focal plane
of a 0.05NA lens having $f = 10^5$, illuminated with a plane wave propagating
along the Z-axis; the Fourier transform is observed in the rear focal plane. The
intensity distribution in the Fourier-transform plane, (c), is displayed on a
logarithmic scale to enhance its weak regions. The phase plots in (b) and (d) are
encoded in gray-scale (black represents $-180°$, white represents $+180°$).
circumstances, Eq. (2.13) remains valid provided the exponent of the first term on
the right-hand side is multiplied by $(1 - z_1/f)$. In the special case where $z_1 = f$, the
quadratic phase factor in Eq. (2.13) disappears altogether, leaving a simple
Fourier-transform relation between the distributions in the front and rear focal
planes. As an example, Figure 2.11 shows a phase/amplitude object placed in the
front focal plane of a 0.05NA lens (see frames (a) and (b)), and the corresponding
Fourier transform as observed in the rear focal plane (frames (c) and (d)).5
Abbe’s theory of image-formation

Figure 2.12 is a diagram of the basic image-forming system. Both the entrance and
exit pupils are assumed to be at the principal planes of the lens; in compliance with
Figure 2.12 Diagram of a simple imaging system. The object and image
distances from the respective principal planes are $z_1$ and $z_2$. The height of a ray
entering the lens is measured on a spherical surface of radius $z_1$ centered at the
axial object point. Similarly, the height of a ray exiting the system is measured
on the spherical surface of radius $z_2$ centered on the axial image point. For any
given ray, the entering and exiting heights are equal. Only one plane wave
(leaving the object at an angle $\theta$) is shown. The various rays of this plane wave
converge to a focus in the image space, then continue to propagate to the image
plane.
Abbe’s sine condition, the pupils are spherical caps centered at the axial object and
image points. The distance between the object and the first principal plane is $z_1$ and
that between the second principal plane and the image is $z_2$. The lateral magnification
of the system, therefore, is $M = -z_2/z_1$ (the image is inverted).
A plane wave leaving the object at an angle $\theta$ relative to the Z-axis emerges
from the exit pupil, each one of its rays having the same height and the same
optical phase as at the entrance pupil. Confining attention to the two-dimensional
XZ-plane, and denoting the direction cosine of a ray in the object space
by $\sigma_{x1} = \sin\theta$, the ray height $x$ at the entrance pupil is found from simple
geometry to be
$$x = x_1(1 - \sigma_{x1}^2) + \sigma_{x1}\left[z_1^2 - x_1^2(1 - \sigma_{x1}^2)\right]^{1/2}. \tag{2.14}$$
A ray leaving the object at $x_1$ intersects the image at $x_2 = Mx_1$. Obviously, the
ray fan reaching the image plane in Figure 2.12 is not a plane wave. However, it
will be seen that this bundle of rays has a phase distribution that can be expressed
as the sum of a linear term $\phi_1$ and a nearly quadratic term $\phi_2$. The linear term is
identical with that of the plane wave leaving the object, namely,
$$\phi_1(x_2, \sigma_{x2}) = 2\pi x_2\sigma_{x2} = 2\pi x_1\sigma_{x1}. \tag{2.15}$$
Since $x_2$ is a version of $x_1$ magnified by a factor of $M$, $\sigma_{x2}$ must be a version of $\sigma_{x1}$
demagnified by a factor of $1/M$, so the above equality is exactly satisfied. Note in
Figure 2.12 that although $\theta$ is the same for all the rays that leave the object within
a given plane wave, the corresponding angle $\theta'$ in the image plane varies from ray
to ray. Therefore $\sigma_{x2}$, which is defined here as $\sigma_{x1}/M$, equals $\sin\theta'$ only for the
ray that goes through the center of the image at $x_2 = 0$.
The quadratic phase $\phi_2$ is acquired while covering the path from $x_1$ at the
object plane to $x_2$ in the image plane. A ray leaving the object at $x_1$ enters the lens
at a height $x$ given by Eq. (2.14), emerges from the exit pupil at the same height
and with the same optical phase as at the entrance pupil, and then proceeds to $x_2$
in the image plane. The phase acquired in going from $x_1$ to $x_2$ relative to that at
the image center may thus be written
$$\phi_2(x_1, \sigma_{x1}) = 2\pi\left\{ (x - x_1)/\sigma_{x1} + \left[(x - x_2)^2 + (z_2^2 - x^2)\right]^{1/2} - (z_1 + z_2) \right\}. \tag{2.16}$$
Noting that $x_2 = Mx_1$, $\phi_2$ may also be considered a function of $x_2$ and $\sigma_{x1}$.
Equation (2.16) yields a nearly quadratic phase factor in $x_2$, which may be plotted
for different values of $\sigma_{x1} = \sin\theta$. Figure 2.13 shows a set of such plots within a
field of view $|x_2| < 250$ for a system in which $z_1 = 10^4$ and $z_2 = 10^5$. Different curves
correspond to different values of $\theta$. Note that if the slight differences between these
curves are ignored (and the maximum difference in $\phi_2$ is only about 10° in the
present example), then the quadratic phase factor $\phi_2$ is essentially independent of $\theta$.
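Equations (2.14) and (2.16) are easy to evaluate numerically. The sketch below is a minimal illustration with hypothetical function names. It assumes an inverted image, $M = -z_2/z_1$, which is the sign for which the terms linear in $x_2$ cancel and $\phi_2$ becomes nearly quadratic. The object-to-pupil path length $(x - x_1)/\sigma_{x1}$ is computed in an algebraically equivalent form that remains finite at $\sigma_{x1} = 0$.

```python
import numpy as np

def phi2_deg(x2, sigma_x1, z1=1.0e4, z2=1.0e5):
    """Path phase of Eq. (2.16) in degrees, as a function of image height x2."""
    m = -z2 / z1                       # lateral magnification (inverted image), assumed
    x1 = x2 / m
    cos2 = 1.0 - sigma_x1**2
    root = np.sqrt(z1**2 - x1**2 * cos2)
    x = x1 * cos2 + sigma_x1 * root    # Eq. (2.14): ray height at the pupils
    # Object-to-pupil path; equals (x - x1)/sigma_x1 but is well defined at sigma_x1 = 0
    t = root - x1 * sigma_x1
    path = t + np.sqrt((x - x2)**2 + (z2**2 - x**2)) - (z1 + z2)
    return np.degrees(2.0 * np.pi * path)
```

With the parameters of Figure 2.13, comparing `phi2_deg(250.0, 0.0)` with `phi2_deg(250.0, 1.0)` gives the spread between the extreme curves at the edge of the field of view.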
When $\phi_2$ as a function of $x_2$ is expanded in a Taylor series, the lowest-order
term is found to be
$$\phi_2 \approx (\pi/z_2)\left[1 + (z_1/z_2) - (z_1/z_2)\sigma_{x1}^2\right] x_2^2. \tag{2.17}$$
If $z_2 \gg 1$, this quadratic phase can be ignored, yielding a plane-wave output for a
plane-wave input. However, when $\phi_2$ is too large to be ignored its dependence on
$\sigma_{x1}$ may be insignificant. This happens when the magnification $z_2/z_1$ is either very
large or very small. The case $z_2/z_1 \gg 1$ is obvious when one considers the coefficient
of $\sigma_{x1}^2$ in Eq. (2.17). In the case of large demagnification, $z_2/z_1 \ll 1$, the range of $\sigma_{x1}$
is limited to $|\sigma_{x1}| < z_2/z_1$, rendering $\phi_2$ essentially independent of $\sigma_{x1}$ once again.
The quadratic phase factor $\exp(i\phi_2)$, being more or less independent of $\theta$, can
thus be factored out. This means that those plane waves that leave the object and
manage to get through the lens to the image plane have the requisite uniform
amplitude and linear phase expected of a plane wave. These plane waves, when
Figure 2.13 Plots of the function $\phi_2(x_2, \sigma_{x1})$ (in degrees) versus $x_2$ for several values of
$\sigma_{x1} = \sin\theta$ equal to (top to bottom) 0.00, 0.25, 0.50, 0.75, 1.00. (See Figure 2.12
and Eq. (2.16); $x_2$ is related to $x_1$ through $x_2 = Mx_1$.) The assumed system
parameters are $z_1 = 10^4$, $z_2 = 10^5$. The field of view in the image plane is confined
to the region $|x_2| < 250$.
superimposed upon each other, produce in the image plane a magnified (or
demagnified) image of the object. Thus the differences between object and image
are: (i) the image is multiplied by a nearly quadratic phase factor, $\exp(i\phi_2)$; (ii)
the plane waves having a large angle $\theta$ miss the lens and, therefore, do not
contribute to the image. This truncation by a circular aperture in the Fourier
domain is equivalent to convolution with an Airy function in the image plane.
The amplitude distribution in the image plane is thus given by
$$a_{\mathrm{image}}(x_2, y_2) = \exp(i\phi_2)\left[ a_{\mathrm{object}}(x_2/M, y_2/M) \ast \mathrm{Airy}(x_2, y_2) \right]. \tag{2.18}$$
Figure 2.14 shows two examples of coherent imaging through a diffraction-limited
lens. The object’s intensity and phase are shown in Figures 2.14(a), (b).
This object has several fine features which, being smaller than a wavelength, are
below the resolution of any optical imaging system. A coherent and uniform
beam propagating along the Z-axis illuminates the object. The entrance pupil of
the imaging lens, located at $z_1 = 10^4$, is in the far field of the object. Figures 2.14(c), (d)
show the intensity and phase patterns at the image plane of a 10×, 0.6NA
lens. Similarly, Figures 2.14(e), (f) show the intensity and phase distributions in
Figure 2.14 Distributions of intensity (left column) and phase (right column) at
the object and image planes of a coherent imaging system. The phase plots are
encoded in gray scale: black represents $-180°$, white represents $+180°$. (a), (b)
Distributions in the plane of the object. (c), (d) Image obtained with a 10×, 0.6NA
objective lens. (e), (f) Image obtained with a 10×, 0.95NA objective lens.
the image plane of a 10×, 0.95NA lens. The higher-NA lens, capturing more of
the high-frequency Fourier components of the object, yields a superior image.
Both lenses, however, fail to reproduce the very fine features of the object.
Appendix to Chapter 2: The stationary-phase approximation

Consider the two-dimensional integral
$$I = \iint f(x, y) \exp[i\eta g(x, y)]\, dx\, dy, \tag{A2.1}$$
where, in general, $f(x, y)$ is a complex function, $g(x, y)$ is a real function, $\eta$ is a large
real number, and the domain of integration is a subset of the XY-plane. In the
neighborhood of an arbitrary point $(x_0, y_0)$, within the domain of integration, small
variations in $g(x, y)$ will be amplified by $\eta$; this will result in rapid oscillations of
the phase factor $\exp[i\eta g(x, y)]$. Assuming that $f(x, y)$ in the neighborhood of $(x_0, y_0)$
is a slowly varying function, the oscillations result in a negligible contribution from
this neighborhood to the integral. The main contributions to the integral then come
from the regions in which $g(x, y)$ is nearly constant. These regions are in the
vicinity of stationary points $(x_0, y_0)$, which are defined by the following relation:
$$\partial g(x, y)/\partial x = \partial g(x, y)/\partial y = 0. \tag{A2.2}$$
Around each stationary point one may expand $g(x, y)$ in a Taylor series up to the
second-order term to obtain
$$g(x, y) \approx g(x_0, y_0) + \tfrac{1}{2} g_{xx}(x_0, y_0)(x - x_0)^2 + g_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} g_{yy}(x_0, y_0)(y - y_0)^2. \tag{A2.3}$$
Replacing the expression for $g(x, y)$ in Eq. (A2.1) with that in Eq. (A2.3), and
taking $f(x, y)$ outside the integral, yields
$$I \approx \sum f(x_0, y_0) \exp[i\eta g(x_0, y_0)] \iint_{-\infty}^{\infty} \exp\left\{ i(\eta/2)\left[ g_{xx}(x - x_0)^2 + 2g_{xy}(x - x_0)(y - y_0) + g_{yy}(y - y_0)^2 \right] \right\} dx\, dy, \tag{A2.4}$$
where the summation is over all stationary points $(x_0, y_0)$. Notice that the domain
of integration is now extended to the entire plane, since the contribution to the
integral from regions outside the immediate neighborhood of the stationary points
is, in any event, negligible. The double integral in Eq. (A2.4) can be readily
carried out, yielding
$$I \approx (2\pi i/\eta) \sum \nu\, |g_{xx} g_{yy} - g_{xy}^2|^{-1/2} \exp[i\eta g(x_0, y_0)]\, f(x_0, y_0), \tag{A2.5}$$
where the summation is again over all stationary points $(x_0, y_0)$ and the coefficient
$\nu$ is given by
$$\nu = \begin{cases} -i & \text{if } g_{xx} g_{yy} < g_{xy}^2, \\ \pm 1 & \text{if } g_{xx} g_{yy} > g_{xy}^2 \text{ and } g_{xx} \gtrless 0. \end{cases}$$
Equation (A2.5) is the final result of this appendix. If the numerical value of
$g_{xx} g_{yy} - g_{xy}^2$ happens to be exactly zero at a particular stationary point or if a
stationary point occurs on the boundary of the domain of integration in Eq. (A2.1)
then Eq. (A2.5) no longer applies. In our analysis of diffraction problems,
however, these special cases will not be encountered.
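A brute-force quadrature offers a simple sanity check on Eq. (A2.5). The sketch below is an illustration under stated assumptions: the Gaussian envelope, the two test functions $g$, the value $\eta = 50$ used in the comparison, and all function names are choices made here, and $\eta$ is taken to be positive. Each test function has a single stationary point at the origin, one with $g_{xx}g_{yy} > g_{xy}^2$ and one with $g_{xx}g_{yy} < g_{xy}^2$.

```python
import numpy as np

def stationary_phase_estimate(f0, g0, gxx, gxy, gyy, eta):
    """Contribution of one stationary point according to Eq. (A2.5), for eta > 0."""
    det = gxx * gyy - gxy**2           # must be non-zero for Eq. (A2.5) to apply
    if det < 0:
        nu = -1j
    else:
        nu = 1.0 if gxx > 0 else -1.0
    return (2j * np.pi / eta) * nu * abs(det) ** -0.5 * np.exp(1j * eta * g0) * f0

def brute_force(f, g, eta, half_width=4.0, n=1001):
    """Direct rectangle-rule quadrature of Eq. (A2.1) on a uniform grid."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    xx, yy = np.meshgrid(x, x, indexing="ij")
    return np.sum(f(xx, yy) * np.exp(1j * eta * g(xx, yy))) * dx * dx

def envelope(x, y):
    """Slowly varying amplitude f(x, y); a Gaussian is assumed for the test."""
    return np.exp(-(x**2 + y**2))

def bowl(x, y):
    """g with g_xx = g_yy = 1, g_xy = 0 (a minimum at the origin)."""
    return 0.5 * (x**2 + y**2)

def saddle(x, y):
    """g with g_xx = g_yy = 0, g_xy = 1 (a saddle point at the origin)."""
    return x * y
```

For example, `brute_force(envelope, bowl, 50.0)` can be compared with `stationary_phase_estimate(1.0, 0.0, 1.0, 0.0, 1.0, 50.0)`; the two should agree to within a few percent, and the agreement improves as $\eta$ increases.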

1980.
2 J. W. Goodman, Introduction to Fourier Optics, second edition, McGraw-Hill, New
York, 1996.
3 L. Mandel and E. Wolf, Optical Coherence and Quantum Optics, Cambridge
5 J. Durnin, J. J. Miceli, and J. H. Eberly, Diffraction-free beams, Phys. Rev. Lett. 58,
1499–1501 (1987).
6 F. A. Jenkins and H. E. White, Fundamentals of Optics, fourth edition, McGraw-Hill,
New York, 1976.
3
Effect of polarization on diffraction in systems
of high numerical aperture
The classical theory of diffraction, according to which the distribution of light

at the focal plane of a lens is the Fourier transform of the distribution at its
entrance pupil, is applicable to lenses of moderate numerical aperture (NA).
The incident beam, of course, must be monochromatic and coherent, but its
polarization state is irrelevant since the classical theory is a scalar theory (see
Chapter 2, “Fourier optics”). If the incident beam happens to be a plane wave
and the lens is free from aberrations then the focused spot will have the well-
known Airy pattern. When the incident beam is Gaussian the focused spot
will also be Gaussian, since this particular profile is preserved under Fourier
transformation. In general, arbitrary distributions of the incident beam, with
or without aberrations and defocus, can be transformed numerically, using the
fast Fourier transform (FFT) algorithm, to yield the distribution in the vicinity
of the focus.
There are two basic reasons for the applicability of the classical scalar
theory to systems of moderate NA. The first is that bending of the rays by the
focusing element(s) is fairly small, causing the electromagnetic field vectors
(E and B) before and after the lens to have more or less the same orientations.
A scalar amplitude assigned to each point on the emergent wavefront from a
system having low to moderate values of NA is sufficient to describe its
electromagnetic state, whereas in the high-NA regime one can no longer
ignore the vectorial nature of light. The second reason for the success of the
classical scalar theory (within its proper limits) is that a certain integral – that
which represents the decomposition of a convergent wavefront into its plane-
wave constituents – submits to evaluation by the method of stationary-phase
approximation. The remaining integral – that which represents the superposition
of plane waves arriving at the focal plane – is then calculated with the aid
of Fourier transformation. When the stationary-phase technique fails, so does
the classical scalar theory, as is evidenced, for instance, in systems of very
low numerical aperture: The well-known focal-shift phenomenon is but one

manifestation of the failure of the stationary-phase approximation in very-low-NA
systems.1
In the stationary-phase approximation the plane-wave spectrum of the con-
vergent beam at the exit pupil coincides with the light amplitude distribution at
that pupil, thus enabling each geometric-optical ray to represent one plane wave
of the spectrum, namely, that which propagates in the direction of the ray.2 This
correspondence between rays and plane waves, which is an important feature of
many diffraction problems, is therefore understood to be a direct consequence
of the stationary-phase approximation. Now, let $\theta$ be the angle between a
converging ray in the image space and the optical axis at the focal point. Since
the projection of the wave vector k onto the exit pupil has length $k \sin\theta$,
whereas the intersection of the ray with the pupil occurs at a radius $r = f \tan\theta$,
then in order to convert from light amplitude distribution to the corresponding
plane-wave spectrum one must compress the distribution function at the exit
pupil. Aside from a trivial scaling of the aperture’s radius by the focal length $f$,
the radial compression must assign to $r = \sin\theta$ the value of the function at
$r = \tan\theta$; this must be followed by proper normalization to preserve the integrated
intensity. The compressed distribution is therefore confined to a disk of
radius $\mathrm{NA} = \sin\theta_{\max}$, where $\theta_{\max}$ is the angle subtended by the rim of the exit
pupil at the focal point. This scaling, compression, and normalization procedure
is not merely justified on heuristic grounds but, as discussed in the preceding
chapter, is a rigorous consequence of the stationary-phase approximation itself.
For lenses of low to moderate numerical aperture (say, NA < 0.2) the difference
between sin h and tan h is negligible, and the effects of compression can be ignored.
At the exit pupil, the plane-wave spectrum of these lenses is usually the same as the
incident distribution at the entrance pupil, modified only by the presence of aber-
rations. For lenses of high numerical aperture, however, it is necessary to obtain the
exit-pupil distribution (from the knowledge of lens characteristics and the entrance-
pupil distribution) before proceeding to the compression operation. Noteworthy in
this respect is the aplanatic lens, which, by virtue of satisfying Abbe’s sine condition,
guarantees that the compressed exit-pupil distribution is identical with the entrance-
pupil distribution.
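To get a feel for when the compression matters, the following minimal sketch
compares the pupil radius (proportional to tan θ) with the plane-wave radius
(sin θ) for the rim ray. The function names are ours, chosen for illustration
only; the ratio is simply 1/cos θ, which is roughly 2% above unity at NA = 0.2
and nearly a factor of four at NA = 0.966.

```python
import math


def compression_ratio(na: float) -> float:
    """tan(theta_max) / sin(theta_max) for the rim ray, where sin(theta_max) = na."""
    theta_max = math.asin(na)
    return math.tan(theta_max) / math.sin(theta_max)  # same as 1 / cos(theta_max)


def spectrum_radius(r_over_f: float) -> float:
    """Map a pupil radius r/f = tan(theta) to the plane-wave radius sin(theta)."""
    return math.sin(math.atan(r_over_f))


if __name__ == "__main__":
    for na in (0.1, 0.2, 0.5, 0.966):
        print(f"NA = {na:5.3f}  tan/sin at the rim = {compression_ratio(na):.4f}")
```

The normalization step that preserves the integrated intensity is not included
in this sketch.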
Bending of polarization vector

To account for polarization effects at high numerical aperture, one usually
ignores transmission losses at the various surfaces of a lens, assuming that a ray
goes through the system unattenuated but with its polarization vector bent in
accordance with the known laws of refraction.2,3,4,5 (The assumption of losslessness
is not necessary here, but it simplifies the problem by enabling the polarization
state of individual rays at the exit pupil to be determined solely on the basis of
their coordinates, without requiring detailed knowledge of the lens structure.) For
a linearly polarized incident beam, Figure 3.1 shows the bending of the E-vector
at two azimuthal positions. The ray at the top of the lens contributes both an
X- and a Z-component to the distribution in the image space, whereas the ray in
the YZ-plane contributes only an X-component. By the same token, rays
intermediate between those shown here will contribute to the polarization along all
three axes.

Figure 3.1 Focusing of linearly polarized light by a high-NA lens, shown in
perspective, causes bending of the polarization vectors. The amount and
direction of bending depend on the coordinates of the ray.

We present a simple treatment of polarization-related phenomena within the
framework of the classical theory of diffraction. This will not be a rigorous
treatment based on Maxwell’s equations; rather, it will be rooted in reasonable
physical arguments based on the bending of rays (or plane waves) by prisms. Our
approach to vector diffraction is in keeping with the spirit of diffraction theory; it
is not exact as far as Maxwell’s equations are concerned but incorporates intuitive
ideas about the propagation of electromagnetic waves.
Figure 3.2 Lossless refraction of a polarized plane wave by a prism. The
original direction of propagation is $\sigma_0 = (0, 0, 1)$ and the corresponding
polarization vector is $E_0$. After refraction, the beam assumes a new direction
$\sigma_1 = (\sigma_x, \sigma_y, \sigma_z)$, and its new polarization state becomes $E_1$. The same geometry
would apply for diffraction of the beam by a grating.

With reference to Figure 3.2, consider a plane wave propagating along the unit
vector $\sigma_0 = (0, 0, 1)$, i.e., along the Z-axis, having linear polarization in the
X-direction. Let a prism be placed in the path of this beam, with orientation such
that the emerging beam would propagate in a direction specified by the unit
vector $\sigma_1 = (\sigma_x, \sigma_y, \sigma_z)$. Now, the incident polarization vector $E_0 = (1, 0, 0)$ may
be decomposed into two components: one, the so-called p-polarization, is in the
plane of $\sigma_0$ and $\sigma_1$; the other, known as the s-polarization, is perpendicular to this
plane. As the latter component (perpendicular to the $\sigma_0\sigma_1$-plane) emerges from
the prism, it will have suffered no deviation in direction. The p-component,
however, will have been reoriented such that it remains perpendicular to the
emergent direction. If it is further assumed that no losses, due to surface
reflections or otherwise, occur in this refraction process, one can use simple
geometry to determine the emerging polarization direction. A similar calculation
can be performed for an incident plane wave linearly polarized along the Y-axis.
Details of these calculations are left to the reader, but the final results are listed in
Table 3.1. Notice that the reorientation of the polarization vector described in
Table 3.1, while a consequence of the refraction of the direction of propagation,
is independent of the particular mechanism responsible for refraction. Given
an initial direction $\sigma_0$ and a direction for the emerging beam $\sigma_1$, one can use
Table 3.1 to identify the emergent components of polarization for an arbitrary
state of incident polarization.
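The ray-bending rule lends itself to a short numerical check. The sketch below
assumes the lossless rule in the form
$E_1 = (1 - \sigma_x^2/(1+\sigma_z),\ -\sigma_x\sigma_y/(1+\sigma_z),\ -\sigma_x)$ for X-polarized input and
$E_1 = (-\sigma_x\sigma_y/(1+\sigma_z),\ 1 - \sigma_y^2/(1+\sigma_z),\ -\sigma_y)$ for Y-polarized input, and
verifies that the bent vector stays perpendicular to the emergent direction and
keeps its length. The function names are ours and purely illustrative.

```python
import math


def bend_polarization(e0, sigma1):
    """Bend a transverse polarization e0 = (ex, ey, 0), defined for propagation
    along (0, 0, 1), into the state carried along the unit vector sigma1.

    Assumes lossless refraction and sigma_z > -1."""
    ex, ey = e0[0], e0[1]
    sx, sy, sz = sigma1
    d = 1.0 + sz
    from_x = (1.0 - sx * sx / d, -sx * sy / d, -sx)   # image of X-polarized light
    from_y = (-sx * sy / d, 1.0 - sy * sy / d, -sy)   # image of Y-polarized light
    return tuple(ex * a + ey * b for a, b in zip(from_x, from_y))


def dot(u, v):
    """Plain (unconjugated) dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    theta, phi = math.radians(75.0), math.radians(30.0)   # hypothetical ray
    sigma1 = (math.sin(theta) * math.cos(phi),
              math.sin(theta) * math.sin(phi),
              math.cos(theta))
    e1 = bend_polarization((1.0, 0.0, 0.0), sigma1)
    print("E1          =", e1)
    print("E1 . sigma1 =", dot(e1, sigma1))   # transversality check
    print("|E1|^2      =", dot(e1, e1))       # losslessness check
```

For a ray in the YZ-plane ($\sigma_x = 0$) the X-polarized input emerges unchanged,
in agreement with the discussion of Figure 3.1.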
In the stationary-phase approximation each ray is associated with a single
plane wave, the three polarization components of which may be treated inde-
pendently of each other. Therefore, for each of the components Ex, Ey, Ez of the
emergent beam, a single superposition integral (i.e., Fourier transform) yields the
sought-after distribution in the focal plane.
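As a rough illustration of this component-by-component superposition, the
sketch below sums bent plane waves over a pupil grid and returns the three
field components at one focal-plane point. It is not the computation used for
the figures of this chapter: the direct double sum (instead of a Fourier
transform), the grid size, the unit weight given to every plane wave inside
the disk of radius NA, and the function name are all assumptions made for
brevity.

```python
import cmath
import math


def focal_field(x, y, na=0.966, n=41):
    """Return (Ex, Ey, Ez) at focal-plane point (x, y), in units of the wavelength.

    Assumptions of this sketch: X-polarized input, unit weight for every plane
    wave inside the disk of radius na, and a plain n-by-n sum (no FFT)."""
    ex = ey = ez = 0j
    for i in range(n):
        sx = na * (2 * i - (n - 1)) / (n - 1)      # symmetric samples in [-na, na]
        for j in range(n):
            sy = na * (2 * j - (n - 1)) / (n - 1)
            s2 = sx * sx + sy * sy
            if s2 > na * na:
                continue                            # ray falls outside the aperture
            sz = math.sqrt(1.0 - s2)
            d = 1.0 + sz
            wave = cmath.exp(2j * math.pi * (sx * x + sy * y))
            ex += (1.0 - sx * sx / d) * wave        # X-component of the bent vector
            ey += (-sx * sy / d) * wave             # Y-component
            ez += (-sx) * wave                      # Z-component
    return ex, ey, ez


if __name__ == "__main__":
    for point in ((0.0, 0.0), (0.5, 0.0), (0.5, 0.5)):
        ex, ey, ez = focal_field(*point)
        print(point, abs(ex) ** 2, abs(ey) ** 2, abs(ez) ** 2)
```

By symmetry the Y- and Z-components vanish on the optical axis, which is a
convenient sanity check on any implementation of this kind.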
Example
The technique described in the preceding section is quite general and can be
applied to arbitrary incident distributions having arbitrary polarization states,
while taking into account various lens aberrations (including substantial amounts
of defocus). Computed results for an aberration-free, aplanatic lens having
$\mathrm{NA} = \sin 75^\circ = 0.966$ and $f = 3000\lambda$ are shown in Figure 3.3. The assumed geometry
in these calculations is that depicted in Figure 3.1, where the incident beam is a
uniform plane wave with linear polarization along the X-axis. Frames (a)–(c) in
Figure 3.3 are intensity plots for the X-, Y-, and Z-components of polarization in
the focal plane; their peak intensities are in the ratio 1.00 : 0.0081 : 0.192. The
corresponding gray-scale plots appear in Figure 3.4; frames (a)–(c) show the
intensity distributions and frames (d)–(f) display their logarithmic counterparts.
The observed four-fold symmetry of the Y-component and the two-fold symmetry
of the Z-component are consistent with one’s expectations based on ray-bending
arguments.

Table 3.1. Polarization $E_1$ of a refracted beam when the incident polarization
$E_0$ is along the X- or Y-axis. The refraction (from $\sigma_0$ to $\sigma_1$) is lossless

Incident polarization            Emergent polarization
with $\sigma_0 = (0, 0, 1)$      with $\sigma_1 = (\sigma_x, \sigma_y, \sigma_z)$

$E_0 = (1, 0, 0)$                $E_1 = (1 - \sigma_x^2/(1+\sigma_z),\ -\sigma_x\sigma_y/(1+\sigma_z),\ -\sigma_x)$
$E_0 = (0, 1, 0)$                $E_1 = (-\sigma_x\sigma_y/(1+\sigma_z),\ 1 - \sigma_y^2/(1+\sigma_z),\ -\sigma_y)$

Figure 3.3 Intensity profiles of the three components of polarization at the
focal plane of an aplanatic lens ($\mathrm{NA} = 0.966$, $f = 3000\lambda$), illuminated with a
linearly polarized plane wave: (a) $|E_x|^2$, (b) $|E_y|^2$, (c) $|E_z|^2$. For best viewing,
the vertical scale is chosen differently in the three cases: the peak intensities in
(a), (b), (c), corresponding to the X-, Y-, Z-components of polarization, are in
the ratios 1.00 : 0.0081 : 0.192.

Figure 3.4 Gray-scale plots of intensity distribution at the focal plane of an
aplanatic lens ($\mathrm{NA} = 0.966$, $f = 3000\lambda$), illuminated with a linearly polarized
plane wave. Frames (a)–(c) show the intensity plots, while frames (d)–(f) display
the logarithm of intensity. In each column the top frame represents the
X-component of polarization, the middle frame corresponds to the Y-component,
and the bottom frame to the Z-component.

The contour plot in Figure 3.5 of the total E-field energy density,
$|E_x|^2 + |E_y|^2 + |E_z|^2$, shows an elliptical profile, the ellipse having its major axis in the
direction of the incident polarization. (Richards and Wolf 5 obtained the same
result using a somewhat different formulation of the diffraction problem.) This
result indicates a slight improvement in the resolution of a microscope or
telescope that uses linearly polarized light, as long as the feature that needs to be
resolved is oriented along the minor axis of the ellipse, namely, in the direction
perpendicular to that of the incident polarization.
Figure 3.5 Contour plot representing the sum of the three intensity profiles
shown in Figure 3.3, i.e., the total E-field energy density distribution in the focal
plane of the aplanatic lens.
The computations reported here required no more than three seconds on a
modern Pentium-based personal computer using a 512 × 512 square mesh.

1 V. N. Mahajan, Axial irradiance and optimum focusing of laser beams, Appl. Opt. 22,
3042–3053 (1983).
2 J. J. Stamnes, Waves in Focal Regions, Adam Hilger, Bristol, 1986.
3 M. Mansuripur, Certain computational aspects of vector diffraction problems, J. Opt.
Soc. Am. A 6, 786–805 (1989).
4 H. H. Hopkins, The Airy disk formula for systems of high relative aperture, Proc.
Phys. Soc. London 55, 116–128 (1943).
5 B. Richards and E. Wolf, Electromagnetic diffraction in optical systems: structure of
the image field in an aplanatic system, Proc. Roy. Soc. Ser. A 253, 358–379 (1959).
4
Gaussian beam optics
A Gaussian beam is perhaps the simplest possible waveform that shows many of
the effects of diffraction. Using Gaussian beams one can study diffraction in the
near field and the far field, examine beam divergence upon propagation, inves-
tigate diffraction-limited focusing through a lens, observe the Gouy phase shift,
and analyze many other interesting properties of electromagnetic waves.
Although Gaussian beams have been thoroughly analyzed in the literature,1,2 it
is worthwhile to examine them in the Fourier domain from a less well-known
perspective. The need for the paraxial approximation (inherent in all treatments of
Gaussian beams) becomes particularly clear when employing the Fourier method
of analysis. There is also the issue of separability of the x- and y- dependences of
the Gaussian beam profile (assuming propagation along the Z-axis), which is often
assumed but not properly explained in the literature. It turns out that separability is
neither necessary nor desirable and that the two-dimensional analysis of a non-
separable beam is quite straightforward. It must be emphasized that separability is
not always achievable by rotating the coordinate axes. When the real and imaginary
parts of the Gaussian exponent require different rotations to become separable, the
x- and y-dependences remain entangled, thus necessitating a two-dimensional
analysis.
Cross-sectional amplitude profile

For a generalized Gaussian beam propagating along the Z-axis, the complex
amplitude distribution in the cross-sectional XY-plane is given by

\[ \hat{a}(x, y, z = 0) = \hat{a}_0 \exp[-\pi(\alpha x^2 + 2\beta xy + \gamma y^2)]. \tag{4.1a} \]

Here the complex constant $\hat{a}_0$ is the amplitude at the origin of the coordinate
system and the coefficients $\alpha = \alpha_1 + i\alpha_2$, $\beta = \beta_1 + i\beta_2$, $\gamma = \gamma_1 + i\gamma_2$ are fixed
complex numbers. The only constraints on these parameters are $\alpha_1 \ge 0$, $\gamma_1 \ge 0$,
and $\alpha_1\gamma_1 \ge \beta_1^2$, lest the amplitude diverge to infinity. The power content of the
beam (i.e., the integrated intensity over the XY-plane) is readily found to be

\[ P = \tfrac{1}{2} |\hat{a}_0|^2 / \sqrt{\alpha_1\gamma_1 - \beta_1^2}. \tag{4.1b} \]

The real parts of the $\alpha$, $\beta$, $\gamma$ parameters determine the profile of the beam’s
magnitude in the XY-plane at $z = 0$, while their imaginary parts determine the
beam’s phase profile. The contours of constant magnitude are ellipses oriented at
$\theta_1$ relative to X, where $\tan 2\theta_1 = 2\beta_1/(\alpha_1 - \gamma_1)$; the major and minor diameters of
these ellipses are proportional to
$[(\alpha_1 + \gamma_1) \mp \sqrt{(\alpha_1 - \gamma_1)^2 + 4\beta_1^2}]^{-1/2}$. The
phase contours are ellipses or hyperbolas whose axes are oriented at $\theta_2$ relative to
X, where $\tan 2\theta_2 = 2\beta_2/(\alpha_2 - \gamma_2)$. In general $\theta_1 \ne \theta_2$ and therefore coordinate
rotations cannot separate the x- and y-dependences of the Gaussian beam profile.
When $\alpha_2\gamma_2 > \beta_2^2$ the contours of constant phase are ellipses; otherwise, they are
hyperbolas. Figure 4.1 shows two examples of amplitude and phase distributions
for Gaussian beams having different sets of the $\alpha$, $\beta$, $\gamma$ parameters.
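The geometric statements above are easy to evaluate numerically. The sketch
below takes the three complex beam parameters (called alpha, beta, gamma in
the code, our naming) and reports the orientation of the magnitude and phase
contours, the two diameters up to a common constant of proportionality, the
power of Eq. (4.1b), and whether the phase contours are ellipses or hyperbolas.
The parameter values in the demonstration are illustrative only.

```python
import math


def profile_summary(alpha, beta, gamma, a0=1.0):
    """Summarize the z = 0 cross-section of a generalized Gaussian beam."""
    a1, b1, c1 = alpha.real, beta.real, gamma.real
    a2, b2, c2 = alpha.imag, beta.imag, gamma.imag
    det1 = a1 * c1 - b1 * b1
    if a1 <= 0 or c1 <= 0 or det1 <= 0:
        # equality would be allowed by the text but gives infinite power
        raise ValueError("need a1 > 0, c1 > 0, a1*c1 > b1**2 for finite power")
    root = math.hypot(a1 - c1, 2.0 * b1)
    return {
        # atan2 selects the axis of fastest decay, i.e. the minor axis;
        # the major axis is perpendicular to it
        "theta1_deg": math.degrees(0.5 * math.atan2(2.0 * b1, a1 - c1)),
        "theta2_deg": math.degrees(0.5 * math.atan2(2.0 * b2, a2 - c2)),
        "major": (a1 + c1 - root) ** -0.5,
        "minor": (a1 + c1 + root) ** -0.5,
        "power": 0.5 * abs(a0) ** 2 / math.sqrt(det1),
        "phase_contours": "ellipses" if a2 * c2 > b2 * b2 else "hyperbolas",
    }


if __name__ == "__main__":
    # illustrative parameters, not taken from any figure
    print(profile_summary(0.02 + 0.01j, 0.005 + 0.002j, 0.01 - 0.01j))
```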
Propagation in free space

The Fourier transform of the Gaussian profile in Eq. (4.1a) is given by

\[ \hat{A}(\sigma_x, \sigma_y) = \mathcal{F}\{\hat{a}(x, y, z = 0)\}
   = \hat{a}_0 \sqrt{ac - b^2}\, \exp[-\pi(a\sigma_x^2 + 2b\sigma_x\sigma_y + c\sigma_y^2)]. \tag{4.2a} \]

Here $a = \gamma/(\alpha\gamma - \beta^2)$, $b = -\beta/(\alpha\gamma - \beta^2)$, and $c = \alpha/(\alpha\gamma - \beta^2)$. In matrix notation,

\[ \begin{pmatrix} a & b \\ b & c \end{pmatrix}
   = \begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix}^{-1}. \tag{4.2b} \]

When a beam travels a distance $z_0$ in free space, its Fourier transform is multiplied
by the transfer function of propagation, $\exp(i2\pi z_0 \sqrt{1 - \sigma_x^2 - \sigma_y^2})$ (see chapter 2,
“Fourier optics”). This is true irrespective of whether $z_0$ is positive or negative; in
other words, both forward and backward propagation can be treated by the same
formalism. The reason that the wavelength $\lambda$ of the light does not appear in these
equations is that all spatial coordinates are assumed to be normalized by $\lambda$, that is,
x, y, $z_0$ are dimensionless quantities.
Figure 4.1 Distributions of intensity (left) and phase (right) in the
cross-sections of two Gaussian beams having different $\alpha$, $\beta$, $\gamma$ parameters. The phase
plots are encoded in gray-scale, black representing $-180^\circ$ and white representing
$+180^\circ$. (a), (b) $\alpha = 0.009 - 0.023i$, $\beta = 0.006 - 0.002i$, $\gamma = 0.012 - 0.016i$; (c),
(d) $\alpha = 0.011 - 0.023i$, $\beta = 0.01 - 0.003i$, $\gamma = 0.016 + 0.012i$.

Invoking the standard paraxial approximation, the above transfer function is
replaced by $\exp(i2\pi z_0) \exp[-i\pi z_0(\sigma_x^2 + \sigma_y^2)]$. Multiplying this transfer function
into $\hat{A}(\sigma_x, \sigma_y)$ of Eq. (4.2a) converts $\hat{a}_0$ to $\hat{a}_0 \exp(i2\pi z_0)$, $a$ to $a + iz_0$, and $c$ to
$c + iz_0$, while keeping $b$ unchanged. The beam’s Fourier transform thus retains its
Gaussian form and, consequently, the profile of the beam at $z = z_0$ remains
Gaussian, albeit with different $\alpha$, $\beta$, $\gamma$ parameters and with a different value for $\hat{a}_0$.
It is readily verified that the new parameters of the beam at $z = z_0$ are given by

\[ \begin{pmatrix} \alpha' & \beta' \\ \beta' & \gamma' \end{pmatrix}
   = \begin{pmatrix} a + iz_0 & b \\ b & c + iz_0 \end{pmatrix}^{-1}, \tag{4.3a} \]

\[ \hat{a}_0' = \hat{a}_0 \exp(i2\pi z_0) \sqrt{(\alpha'\gamma' - \beta'^2)/(\alpha\gamma - \beta^2)}
   = \hat{a}_0 \exp(i2\pi z_0) / \sqrt{1 - (\alpha\gamma - \beta^2) z_0^2 + i(\alpha + \gamma) z_0}. \tag{4.3b} \]
Thus the beam remains Gaussian as it propagates along Z, but its magnitude and
phase profiles change continuously. Figure 4.2 shows computed cross-sectional
profiles for a beam at several locations along the Z-axis. Note how the elliptical
cross-section of the intensity profile rotates with increasing $z_0$ and also how the
phase contours change from hyperbolas to ellipses and vice versa.

Figure 4.2 Distributions of intensity (left) and phase (right) in the
cross-sectional planes of a Gaussian beam propagating along the Z-axis. The parameters
of the beam at $z = 0$ are $\alpha = 0.01 + 0.05i$, $\beta = 0.005 - 0.04i$, $\gamma = 0.02 - 0.12i$.
The phase plots are encoded in gray-scale, black representing $-180^\circ$ and white
representing $+180^\circ$. From top to bottom, the propagation distances along Z are 0,
$5\lambda$, $25\lambda$, and $1000\lambda$. In the bottom right-hand frame the far-field curvature phase
factor (corresponding to $\alpha_2 = \gamma_2 = -0.001$) has been subtracted.
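The propagation rule (invert the parameter matrix, add $iz_0$ to its diagonal,
invert again, and rescale the on-axis amplitude by the square-root factor) can
be sketched in a few lines. The function names and the test parameters are
ours; distances are in units of the wavelength, and the principal branch of
the complex square root is used, which is adequate for a sketch but may need
unwrapping when the Gouy phase is tracked over long distances.

```python
import cmath


def invert_sym(a, b, c):
    """Inverse of [[a, b], [b, c]], returned as the triple (a', b', c')."""
    det = a * c - b * b
    return c / det, -b / det, a / det


def propagate(alpha, beta, gamma, z0, a0=1.0 + 0j):
    """Advance the beam parameters by z0 wavelengths (paraxial, free space)."""
    a, b, c = invert_sym(alpha, beta, gamma)              # Fourier-domain parameters
    alpha_p, beta_p, gamma_p = invert_sym(a + 1j * z0, b, c + 1j * z0)
    denom = 1.0 - (alpha * gamma - beta * beta) * z0 * z0 + 1j * (alpha + gamma) * z0
    a0_p = a0 * cmath.exp(2j * cmath.pi * z0) / cmath.sqrt(denom)
    return alpha_p, beta_p, gamma_p, a0_p


def power(alpha, beta, gamma, a0):
    """Integrated intensity of the beam over its cross-section."""
    det1 = alpha.real * gamma.real - beta.real ** 2
    return 0.5 * abs(a0) ** 2 / det1 ** 0.5


if __name__ == "__main__":
    start = (0.01 + 0.05j, 0.005 - 0.04j, 0.02 - 0.12j)   # illustrative values
    for z in (0.0, 5.0, 25.0, 1000.0):
        al, be, ga, amp = propagate(*start, z)
        print(z, al, be, ga, power(al, be, ga, amp))
```

Because the paraxial transfer function is a pure phase factor, the printed
power should be the same at every distance; this is a useful check on the
algebra.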
The beam waist

The waist is the cross-section of the beam at which the phase is uniform, i.e., it
is independent of x and y. In general, a Gaussian beam does not have to have a
waist but if a waist exists then the $\alpha$, $\beta$, $\gamma$ parameters in that cross-section will be
real. A question arises as to when an arbitrary Gaussian beam (for which the
$\alpha$, $\beta$, $\gamma$ parameters at a given cross-section are complex) can be said to have
a waist. In other words, does a value of $z_0$ (positive or negative) exist at which
$\alpha'$, $\beta'$, $\gamma'$ are real? According to Eq. (4.3a), this requirement is met if $b$ is real and
the imaginary parts of $a$ and $c$ are identical, so that $iz_0$ will end up canceling
their imaginary parts. This is equivalent to requiring both $b$ and $a - c$ to be
real-valued.
Considering the relationship between $\alpha$, $\beta$, $\gamma$ and $a$, $b$, $c$ in Eq. (4.2b), it is not
difficult to show that the necessary and sufficient condition for an arbitrary
Gaussian beam to have a waist is that, in the complex plane, the three vectors $\beta$,
$\alpha - \gamma$, and $\alpha\gamma - \beta^2$ must be parallel (or antiparallel) to each other. In other words,
these three complex numbers must lie along a straight line that goes through
the origin of the complex plane. This requirement, of course, is in addition to the
other Gaussian beam requirements, namely, $\alpha_1 \ge 0$, $\gamma_1 \ge 0$, $\alpha_1\gamma_1 \ge \beta_1^2$. When the $\alpha$,
$\beta$, $\gamma$ parameters satisfy all the above constraints, the beam will have a waist at a
specific location along the Z-axis. The waist is unique, because there is only one
value of $z_0$ that can cancel the imaginary parts of both $a$ and $c$ in Eq. (4.3a).
When a waist exists, there is symmetry between the locations before and after
the waist. Let the waist be at $z = 0$. Then the $\alpha$, $\beta$, $\gamma$ parameters at this location
will be real, which means that the corresponding $a$, $b$, $c$ are real as well. Now, any
value of $z_0$ will make $a$ and $c$ complex, while $-z_0$ will yield the conjugates of the
same $a$ and $c$. Therefore, the $\alpha$, $\beta$, $\gamma$ parameters on opposite sides of the waist will
be complex conjugates of each other. This means that the intensity profiles on
opposite sides are identical, while the phase profiles differ by a minus sign. The
beam is always convergent before, and divergent after, the waist.
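The waist test can be turned into a small routine: invert the parameter
matrix, check that the off-diagonal element is real and that the two diagonal
elements have equal imaginary parts, and return the distance that cancels
those imaginary parts. This is a sketch with our own function names and an
arbitrary numerical tolerance, not a prescribed algorithm.

```python
def invert_sym(a, b, c):
    """Inverse of [[a, b], [b, c]], returned as the triple (a', b', c')."""
    det = a * c - b * b
    return c / det, -b / det, a / det


def waist_distance(alpha, beta, gamma, tol=1e-9):
    """Return the distance z0 to the waist (negative if it lies behind the
    current plane), or None when the beam has no waist."""
    a, b, c = invert_sym(complex(alpha), complex(beta), complex(gamma))
    scale = max(abs(a), abs(b), abs(c))
    if abs(b.imag) > tol * scale or abs(a.imag - c.imag) > tol * scale:
        return None
    return -a.imag          # a + i*z0 and c + i*z0 become real at this z0


def propagate(alpha, beta, gamma, z0):
    """Beam parameters after a distance z0 of free space (paraxial)."""
    a, b, c = invert_sym(complex(alpha), complex(beta), complex(gamma))
    return invert_sym(a + 1j * z0, b, c + 1j * z0)


if __name__ == "__main__":
    moved = propagate(0.02, 0.005, 0.01, 40.0)   # start from a waist, go 40 wavelengths
    print(waist_distance(*moved))                 # the waist lies 40 wavelengths behind
    print(waist_distance(0.01 + 0.05j, 0.005 - 0.04j, 0.02 - 0.12j))
```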
The Gouy phase shift

Aside from the usual linear phase factor $\exp(i2\pi z_0)$, Eq. (4.3b) contains an
additional phase whose value depends non-linearly on $z_0$. This is the phase associated
with the square-root factor on the right-hand side of the equation. Consider the
special case when $\alpha$, $\beta$, $\gamma$ are all real-valued, i.e., when the beam waist is at $z = 0$.
As $z_0$ increases from zero and acquires positive values, the real part under the
square root, $1 - (\alpha\gamma - \beta^2) z_0^2$, decreases while the imaginary part, $(\alpha + \gamma) z_0$, increases.
Thus, the phase of $\hat{a}_0'$ associated with the square root, namely,

\[ \psi = -\tfrac{1}{2} \tan^{-1}\{(\alpha + \gamma) z_0 / [1 - (\alpha\gamma - \beta^2) z_0^2]\}, \tag{4.4} \]

approaches $-90^\circ$ for sufficiently large $z_0$. Similarly, when $z_0$ goes from 0 to
negative values, $\psi$ moves toward $+90^\circ$. It is thus seen that, in crossing the waist,
the beam undergoes a $180^\circ$ phase shift. This phase shift, which is particularly
rapid near the focus of a lens, was first observed experimentally by the French
physicist L. Georges Gouy in 1890.2,3,4
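A minimal numerical sketch of the Gouy phase for a beam whose waist is at
z = 0 follows. It evaluates minus one-half of the angle of the complex number
$1 - (\alpha\gamma - \beta^2) z_0^2 + i(\alpha + \gamma) z_0$, using a two-argument arctangent so that
the angle remains continuous when the real part changes sign. The function name
and sample values are ours.

```python
import math


def gouy_phase_deg(alpha, beta, gamma, z0):
    """Gouy phase in degrees for real waist parameters alpha, beta, gamma."""
    real = 1.0 - (alpha * gamma - beta * beta) * z0 * z0
    imag = (alpha + gamma) * z0
    # atan2 keeps the angle continuous when the real part goes negative
    return -0.5 * math.degrees(math.atan2(imag, real))


if __name__ == "__main__":
    for z0 in (-1e6, -100.0, 0.0, 100.0, 1e6):
        print(z0, gouy_phase_deg(0.01, 0.0, 0.01, z0))
```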
To demonstrate an observable effect of the Gouy phase, consider the experiment
depicted in Figure 4.3. Here an aberration-free lens is split into two identical halves,
and the upper half-lens is translated forward by $\Delta z = 300\lambda$. A collimated uniform
beam of light is directed at the split lens, and the distribution of intensity in the region
between the two foci, F1 and F2, is monitored. Figure 4.4 shows computed intensity
patterns in a vertical plane half-way between F1 and F2 when (a) the upper half-lens
is blocked, (b) the lower half-lens is blocked, and (c) the light is allowed to go
through both half-lenses. For locations near the Z-axis, where the optical path lengths
are nearly identical, the light amplitudes contributed by the two half-lenses are
expected to be in phase, resulting in constructive interference. However, as
Figure 4.4(c) clearly demonstrates, the vicinity of the optical axis is dark. This
destructive interference is caused by the nearly $180^\circ$ Gouy phase shift between
the “before-focus” and “after-focus” beams arriving from the two half-lenses.
Figure 4.5 shows several cross-sectional plots of intensity distribution in the region
between F1 and F2, starting at F1 and moving in steps of $37.5\lambda$ to F2. The intensity at
and near the Z-axis is seen to diminish as the mid-plane between the foci is
approached from either side.

Figure 4.3 A split lens brings a collimated uniform beam of wavelength $\lambda$ to
two different foci, F1 and F2, along the Z-axis. The region of interest is between
the two focal planes and above the Z-axis. At first glance the rays going through
each half-lens are expected to have the same phase when they arrive in the
vicinity of the Z-axis. However, because of the Gouy effect, the beam going
through the upper lens and arriving at an observation point before focus will be
phase-shifted by about $180^\circ$ relative to the beam going through the lower lens
and arriving at the same observation point after focus. In our numerical example,
the lens (before splitting) has $\mathrm{NA} = 0.1$ and focal length $= 30000\lambda$, and the
separation between the half-lenses is $\Delta z = 300\lambda$.

Figure 4.4 Computed intensity patterns in the XY-plane at the mid-point
between the two foci in the system of Figure 4.3. (a) Upper half-lens blocked; (b)
lower half-lens blocked; (c) both half-lenses transmitting the incident beam, and
the two emergent beams interfering at the observation plane. Note that the central
region of the distribution in (c), corresponding to points near the Z-axis, is dark.

Figure 4.5 Plots of intensity distribution in the XY-plane at various locations
along the Z-axis in the system of Figure 4.3. From (a) to (i) the observation plane
moves in steps of $37.5\lambda$ from the first focus, at F1, to the second focus, at F2.

The Rayleigh range

For the generalized Gaussian beam of Eq. (4.1a), there is only one way to define
the Rayleigh range,2 and that is in terms of the Gouy phase shift. To admit a
Rayleigh range the beam must have a waist, which we assume to be at $z = 0$, so
the $\alpha$, $\beta$, $\gamma$ parameters at this location are real-valued. With reference to Eq. (4.4),
the Rayleigh range is the distance $z_0$ at which the Gouy phase $\psi$ is $-45^\circ$, i.e.,
$z_0 = 1/\sqrt{\alpha\gamma - \beta^2}$. In the special case when $\alpha = \gamma$ and $\beta = 0$ (i.e., when the beam
is circularly symmetric) the Rayleigh range $z_0$ is $1/\alpha$. In this case the beam
diameter at the Rayleigh range is a factor of $\sqrt{2}$ larger than that at the waist; also
the beam curvature can be shown to attain its maximum value at the Rayleigh
range.
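The two properties quoted for the circularly symmetric case can be checked
with a few lines of arithmetic. In the sketch below the beam parameter at a
distance $z_0$ from the waist is taken to be $1/(1/\alpha + iz_0)$, the beam radius is
taken to scale as the inverse square root of its real part, and the phase
curvature is measured by its imaginary part. The function names and the value
0.01 are illustrative.

```python
import math


def rayleigh_range(alpha, beta, gamma):
    """Distance at which the Gouy phase magnitude reaches 45 degrees
    (real waist parameters)."""
    return 1.0 / math.sqrt(alpha * gamma - beta * beta)


def circular_alpha(alpha0, z0):
    """Complex beam parameter at distance z0 from a circular waist alpha0."""
    return 1.0 / (1.0 / alpha0 + 1j * z0)


if __name__ == "__main__":
    alpha0 = 0.01                                   # illustrative waist parameter
    z_r = rayleigh_range(alpha0, 0.0, alpha0)
    a_r = circular_alpha(alpha0, z_r)
    print("Rayleigh range      :", z_r)
    print("diameter ratio      :", math.sqrt(alpha0 / a_r.real))
    for z in (0.5 * z_r, 0.9 * z_r, z_r, 1.1 * z_r, 2.0 * z_r):
        print("curvature at", z, ":", abs(circular_alpha(alpha0, z).imag))
```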
Effect of lens on Gaussian beam

In the paraxial approximation a lens imparts a quadratic phase factor to the
incident beam. If the lens happens to be astigmatic, the phase factors along
the X- and Y-axes will have different curvatures, and if the astigmatic lens
happens to be rotated within the XY-plane then the quadratic phase factor
will have an xy term as well. We assume that the lens aperture is large enough
to transmit the beam without significant truncation and, therefore, to affect
its amplitude profile only negligibly. All in all, the effect of a lens on a Gaussian
beam is to multiply its complex amplitude by the following transmission
function:

\[ t(x, y) = \exp[-i\pi(px^2 + 2qxy + ry^2)]. \tag{4.5} \]

Here p, q, r are real-valued constants related to the principal radii of curvature of
the lens. Thus when the Gaussian beam of Eq. (4.1a) passes through the lens
described by Eq. (4.5), $\alpha_2$ will be augmented by p, $\beta_2$ by q, and $\gamma_2$ by r. The beam
can then be propagated in the free-space region beyond the lens using the
aforementioned analytical tools.
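As an illustration of combining the lens rule with free-space propagation, the
sketch below places the waist of a circular beam at a rotationally symmetric
thin lens and locates the new waist. It assumes, beyond what is stated above,
that a lens of focal length f (in units of the wavelength) corresponds to
p = r = 1/f and q = 0 under the sign convention in which the lens adds p, q, r
to the imaginary parts of the beam parameters. Function names are ours.

```python
def apply_lens(alpha, beta, gamma, p, q, r):
    """Add the quadratic lens phase to the imaginary parts of the beam parameters."""
    return alpha + 1j * p, beta + 1j * q, gamma + 1j * r


def waist_after_lens(alpha0, focal_length):
    """Distance from the lens to the new waist for a circular beam whose waist
    (real alpha0) sits at the lens. Assumes p = r = 1/focal_length and q = 0."""
    p = 1.0 / focal_length
    alpha, _, _ = apply_lens(alpha0, 0.0, alpha0, p, 0.0, p)
    a = 1.0 / alpha            # Fourier-domain parameter of the circular beam
    return -a.imag             # the z0 that makes a + i*z0 real


if __name__ == "__main__":
    # Rayleigh range 1/alpha0 = 1000 wavelengths, focal length 1000 wavelengths
    print(waist_after_lens(0.001, 1000.0))
    # nearly collimated input: the waist forms close to the focal plane
    print(waist_after_lens(1e-6, 1000.0))
```

Under these assumptions the result agrees with the familiar expression
$f/[1 + (f/z_R)^2]$ for an input waist located at the lens, where $z_R = 1/\alpha_0$.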
Higher-order Gaussian beams

We confine the discussion of higher-order beams to the one-dimensional case
only, as the extension to two dimensions is straightforward. Consider the
Gaussian function $\exp(-\pi\alpha x^2)$, where $\alpha$ is a complex constant. The nth derivative
of this function with respect to x may be used to define an initial amplitude
distribution as follows:

\[ \hat{a}(x, z = 0) = \hat{a}_0 H_n(\sqrt{\pi\alpha}\, x) \exp(-\pi\alpha x^2)
   = \hat{a}_0 (-1)^n (\pi\alpha)^{-n/2} \frac{d^n}{dx^n} [\exp(-\pi\alpha x^2)]. \tag{4.6a} \]

Here the nth-order Hermite polynomial $H_n(x)$ is defined as

\[ H_n(x) \exp(-x^2) = (-1)^n \frac{d^n}{dx^n} [\exp(-x^2)]. \tag{4.6b} \]

The Fourier transform of the distribution in Eq. (4.6a) is readily evaluated using
the differentiation theorem,5

\[ \mathcal{F}\left\{ \frac{d^n}{dx^n} [\exp(-\pi\alpha x^2)] \right\}
   = (i2\pi\sigma)^n \mathcal{F}\{\exp(-\pi\alpha x^2)\}
   = (i2\pi\sigma)^n \alpha^{-1/2} \exp(-\pi\sigma^2/\alpha). \tag{4.7} \]

To account for propagation by a distance $z_0$ along the Z-axis, the Fourier transform
of the initial distribution in Eq. (4.6a) is multiplied by the transfer function of
free-space propagation, which, in the paraxial approximation, is $\exp(i2\pi z_0) \exp(-i\pi z_0 \sigma^2)$.
This means that the coefficient $1/\alpha$ in the exponent of the Gaussian function on the
right-hand side of Eq. (4.7) is augmented by $iz_0$, yielding

\[ \frac{1}{\alpha'} = \frac{1}{\alpha} + iz_0. \tag{4.8} \]

The light amplitude distribution at $z = z_0$ is then obtained by an inverse Fourier
transform, yielding

\[ \hat{a}(x, z = z_0) = \hat{a}_0 \exp(i2\pi z_0) (\alpha'/\alpha)^{(n+1)/2} H_n(\sqrt{\pi\alpha'}\, x) \exp(-\pi\alpha' x^2). \tag{4.9} \]

Since in general $\alpha'$ is complex, the above eigenfunctions of propagation in free
space contain Hermite polynomials with a complex argument. Siegman2,6 refers to
these as the “elegant” solutions of the wave equation in free space. The elegant
solutions are substantially different from the so-called “standard” solutions, whose
argument of the Hermite polynomial is real.
Assuming that the Hermite–Gaussian beam of Eq. (4.9) has its waist at $z = 0$,
the parameter $\alpha$ of the beam will be real-valued. The normalization factor
$(\alpha'/\alpha)^{(n+1)/2} = (1 + i\alpha z_0)^{-(n+1)/2}$ thus contributes its phase angle,
$-\tfrac{1}{2}(n + 1) \tan^{-1}(\alpha z_0)$, to the Gouy phase. Note that the complex-argument Hermite
polynomial $H_n(\sqrt{\pi\alpha'}\, x)$ also has a z-dependent phase, which contributes to the overall
phase pattern in the beam’s cross-section.
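A compact sketch of the one-dimensional “elegant” beam follows. It evaluates
the Hermite polynomial with the standard three-term recurrence (which works
for complex arguments as well), updates the beam parameter through
$1/\alpha' = 1/\alpha + iz_0$, and forms the normalization factor $(\alpha'/\alpha)^{(n+1)/2}$. The
function names and sample values are ours, and the principal branch of the
complex power is used, so the phase of the normalization factor is returned
modulo 360 degrees.

```python
import cmath
import math


def hermite(n, u):
    """Physicists' Hermite polynomial H_n(u) via H_{k+1} = 2u H_k - 2k H_{k-1}."""
    h_prev, h = 1.0 + 0j, 2.0 * u
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * u * h - 2.0 * k * h_prev
    return h


def normalization(alpha, z0, n):
    """The factor (alpha'/alpha)**((n+1)/2), with 1/alpha' = 1/alpha + i*z0."""
    alpha_p = 1.0 / (1.0 / alpha + 1j * z0)
    return (alpha_p / alpha) ** ((n + 1) / 2.0)


def elegant_beam(x, z0, alpha, n, a0=1.0):
    """Complex amplitude of the nth-order elegant beam at (x, z0)."""
    alpha_p = 1.0 / (1.0 / alpha + 1j * z0)
    u = cmath.sqrt(cmath.pi * alpha_p) * x
    return (a0 * cmath.exp(2j * cmath.pi * z0) * normalization(alpha, z0, n)
            * hermite(n, u) * cmath.exp(-cmath.pi * alpha_p * x * x))


if __name__ == "__main__":
    alpha, z0 = 0.01, 100.0          # waist at z = 0, observed one Rayleigh range away
    for n in range(4):
        phase = math.degrees(cmath.phase(normalization(alpha, z0, n)))
        print(n, phase, abs(elegant_beam(5.0, z0, alpha, n)))
```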

1 H. Kogelnik and T. Li, Laser beams and resonators, Appl. Opt. 5, 1150–1167 (1966).
2 A. E. Siegman, Lasers, University Science Books, California (1986).
3 L. G. Gouy, Compt. Rend. Acad. Sci. Paris 110, 1251 (1890).
4 M. Born and E. Wolf, Principles of Optics, sixth edition, Pergamon Press, New York,
1980.
5 R. N. Bracewell, The Fourier Transform and its Applications, McGraw-Hill,
New York, 1978.
6 A. E. Siegman, Hermite-gaussian functions of complex argument as optical-beam
eigenfunctions, J. Opt. Soc. Am. 63, 1093–1094 (1973).
5
Coherent and incoherent imaging
The basic elements of an imaging system are shown in Figure 5.1. The light
from a source, either coherent (e.g., a laser) or incoherent (e.g., an incandescent
lamp or an arc lamp), is collected by the illumination optics (e.g., a condenser
lens) and projected onto the object. An image is then formed by an objective
lens upon a screen, a photographic plate, a CCD camera, the retina of an eye,
etc. Assuming that the objective lens is free from aberrations, the resolution
and the contrast of the image are determined not only by the numerical aperture
of the objective lens but also by the properties of the light source and the
illumination optics.
The source and the illumination optics

Three types of illumination will be considered. For collimated and coherent
illumination we assume a monochromatic laser beam brought to focus at the
plane of the object with a condenser lens having a very small numerical aperture
(NA). Figure 5.2(a) is the logarithmic intensity distribution at the object plane,
produced by a 0.03NA condenser. This distribution has the shape of an Airy
pattern, with a central lobe diameter of $1.22\lambda/\mathrm{NA} \approx 41\lambda$, where $\lambda$ is the
wavelength of the light source. Since the objects of interest will be small compared
to the Airy disk diameter, and since they will be placed near the center of the
Airy disk, this illumination qualifies as coherent, fairly uniform, and nearly
collimated.
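The quoted lobe size is a one-line calculation; the sketch below (function name
ours) evaluates the central-lobe diameter in units of the wavelength for the
two condenser apertures mentioned in this chapter.

```python
def airy_lobe_diameter(na: float) -> float:
    """Central-lobe diameter of the Airy pattern, 1.22/NA, in wavelengths."""
    return 1.22 / na


if __name__ == "__main__":
    for na in (0.03, 0.25):
        print(f"NA = {na}: central lobe diameter = {airy_lobe_diameter(na):.2f} wavelengths")
```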
The second type of illumination is also produced by a coherent monochromatic
laser beam, but with a high-NA condenser. This time we place the focal point of
the condenser somewhat before the object in order to produce within the object
plane a bright spot large enough to cover the field of view of the objective lens.
Figure 5.2(b) is the logarithmic intensity distribution at the object plane,
produced by a coherent beam brought to focus by a 0.25NA condenser at a distance
of $50\lambda$ before the object. The beam incident on the object is, therefore, divergent
and, although it covers the area of interest, its intensity distribution is not very
uniform. This nonuniformity may be better appreciated by considering the
corresponding plot of intensity distribution in Figure 5.2(c). (Note the different
scales of Figures 5.2(b), (c).)

Figure 5.1 Schematic diagram of a simple imaging system. The light source is
projected by the illumination optics onto an object, allowing the objective lens to
form an image of this object at the image plane.

Figure 5.2 Computed intensity patterns at the plane of the object
corresponding to various types of illumination. (a) Logarithmic plot ($\alpha = 4$) of the
intensity distribution obtained from a coherent source with a 0.03NA condenser
lens. (b) Logarithmic plot ($\alpha = 4$) of the intensity distribution obtained from a
coherent source with a 0.25NA condenser lens. The beam is focused to a plane
located just $50\lambda$ before the plane of the object. (c) Same as (b) but showing the
intensity distribution rather than its logarithm. (d) Intensity distribution
corresponding to an incoherent light source consisting of 37 independent point sources
obtained with a 0.25NA condenser lens. Again the source is imaged to a plane
located $50\lambda$ before the plane of the object.

The third type of illumination to be examined is incoherent illumination. We
emphasize at the outset that our concern here is solely with spatial incoherence
and, as such, we will assume that the source is quasi-monochromatic. (Departure
from monochromaticity is a requirement for any source that is to exhibit
spatial incoherence; the bandwidth of the source can nonetheless be narrow
enough to give its light a long coherence time, making it in effect a temporally
coherent source.) To simulate an incoherent source we assumed that the quasi-
monochromatic light emerging from a fiber bundle consisting of 37 fibers is
imaged with a 0.25NA condenser lens to a plane located a distance of $50\lambda$ before
the plane of the object in Figure 5.1. Each fiber within the bundle acts as a
coherent point source whose projected intensity distribution at the object plane
will be the same as that shown in Figure 5.2(c). When these fibers are properly
arranged in space and their intensity distributions added together, we obtain the
intensity pattern displayed in Figure 5.2(d). This is a fairly uniform distribution
over its central region, which is where the objects of interest will be placed.
Although the source could have been imaged directly onto the object plane in
this case, the $50\lambda$ defocus helps to create a more uniform illumination. With this
type of illumination, in order to compute the intensity distribution at the image
plane, we treat the 37 fibers as independent point sources – each a coherent
point source in its own right. We then compute the image obtained with each
source independently, and add the intensities of the resulting 37 images together
to obtain the final image.
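The difference between the two ways of combining sources can be made concrete
with a toy example. In the sketch below each source contributes a list of
complex field samples at the image plane (hypothetical numbers, not the
37-fiber simulation); for a coherent combination the amplitudes are added
before squaring, whereas for the incoherent source described above the
intensities of the individual images are added.

```python
def coherent_image(fields):
    """Add complex amplitudes point by point, then take the squared magnitude."""
    return [abs(sum(samples)) ** 2 for samples in zip(*fields)]


def incoherent_image(fields):
    """Take the squared magnitude of each source's image, then add intensities."""
    return [sum(abs(s) ** 2 for s in samples) for samples in zip(*fields)]


if __name__ == "__main__":
    # two hypothetical point sources sampled at two image points
    fields = [[1.0 + 0j, 1j], [1.0 + 0j, -1j]]
    print(coherent_image(fields))     # interference: bright point and dark point
    print(incoherent_image(fields))   # no interference: equal intensities
```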
The imaging optics

The objective lens used in the simulations described below is free from aberra-
tions and, therefore, its performance is diffraction-limited. The objective is a
finite-conjugate lens with a numerical aperture of 0.25 (on the side of the object),
a focal length of 5000k, and a magnification of 10.
Two types of object will be used in these simulations. The first is an amplitude
grating with a period of 3λ and a 50% duty cycle, shown in Figure 5.3(a).
According to the classical optics textbooks [1,2,3], the spatial frequency of this
grating is higher than the cutoff frequency of the modulation transfer function
(MTF) of the objective lens for coherent illumination, f_c = NA/λ = (4λ)⁻¹,
but less than that for incoherent illumination, f_c = 2NA/λ = (2λ)⁻¹. We will
examine the images of this grating under both coherent and incoherent
illumination and draw certain conclusions about the classical treatment of this
problem.

Figure 5.3 (a) Amplitude grating with a period of 3λ and a 50% duty
cycle, used as the object in some of the simulations. (b) Pattern of marks with
different sizes and separations on a uniform background. In some cases these
marks will be black on a transparent background, in other cases they will
be transparent marks on a black background, in yet other cases they will be
phase objects with 100% transmissivity, imparting a 180° phase shift to the
incident beam. (Both panels span –12.5 to 12.5 in x/λ and y/λ.)
The second type of object with which we will be concerned is a mask
imprinted with seven marks of various sizes and shapes, shown in Figure 5.3(b).
The largest mark is 10λ long, and the smallest mark is 3λ wide. These marks are
large enough to yield a reasonably clear image with both coherent and incoherent
illumination. In one case the marks will be assumed to be bright objects on a dark
background, in another case they will be dark objects on a bright background, in
yet a third case they will be 180° phase objects having the same amplitude
transmissivity as the background.
Resolution of the imaging system

Let the grating of Figure 5.3(a) be the object in the system of Figure 5.1. If the
collimated beam of Figure 5.2(a) is used to illuminate this object then no image
will be formed, because all diffracted orders (except the zeroth order) will miss
the entrance pupil of the objective lens; the situation is depicted schematically in
Figure 5.4. Denoting the period of the grating by P, the deviation angle θ of the
first diffracted order will be given by sin θ = λ/P. For P = 3λ this gives
sin θ = 1/3, which exceeds the objective’s NA of 0.25. This is the origin of the
well-known assertion that the MTF cutoff frequency of a coherent imaging system is
f_c = NA/λ.
Figure 5.4 A collimated coherent beam (wavelength λ) illuminates a grating
of period P at normal incidence. The first diffracted orders will miss the
objective lens if the lens’s numerical aperture NA is less than λ/P. (Some
provision must be made for the expansion of the beam diameter at long
propagation distances.)

If, however, the coherent illuminating beam is not collimated but is in the form
of a cone of light, as in the case of the distribution shown in Figure 5.2(c), then an
“image” of the grating will be formed. Figure 5.5 shows computed plots of
intensity distribution (a) at the exit pupil of the objective lens, where the overlap
between the zeroth-order and the first-order beams is clearly visible, and (b) at
the image plane, where an “image” of the grating is seen superimposed on a
nonuniform pattern of illumination. The reason that a coherent cone of light
produces an image of the grating whereas a collimated beam fails to do so may be
understood by studying Figure 5.6: the diffracted first-order cones are captured
by the objective lens as long as the lens’s NA is greater than λ/(2P). The MTF
cutoff frequency for this type of illumination, therefore, is f_c = 2NA/λ.

Figure 5.5 Computed intensity distribution (a) at the exit pupil of the
objective lens and (b) at the image plane, corresponding to coherent illumination
with the divergent beam of Figure 5.2(c). The object is the grating of
Figure 5.3(a). (Horizontal axes: x/λ from –1500 to 1500 in (a) and from –125 to
125 in (b).)
Figure 5.6 A cone of coherent light (wavelength λ), coming from a
condenser lens and illuminating a grating of period P, creates several diffracted
cones. If the apex angle of the incident cone is sufficiently large,
the first-order beams will be captured by the objective lens as long as
NA > λ/(2P).

Figure 5.7 Computed intensity distribution (a) at the exit pupil of the objective
lens and (b) at the image plane, corresponding to the incoherent illumination
depicted in Figure 5.2(d). The object is the grating of Figure 5.3(a). (Horizontal
axes: x/λ from –1500 to 1500 in (a) and from –125 to 125 in (b).)
The case of incoherent illumination is now easy to understand. Since the
beam in Figure 5.2(d) is a superposition of 37 divergent cones similar to that of
Figure 5.2(c), the grating’s image will have the same resolution as that obtained
with a single cone of light, but it will have a more uniform contrast because it is
an average over a large number of point sources. Figure 5.7 shows the computed
patterns of intensity (a) at the exit pupil of the objective lens and (b) at the
image plane obtained with incoherent illumination. The MTF cutoff for this
type of illumination, f_c = 2NA/λ, is the same as that for coherent illumination
with a cone of light, which is twice as large as the cutoff frequency for
collimated coherent illumination.
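The three cutoff frequencies discussed in this section can be compared with a few lines of arithmetic. The sketch below is illustrative only: the function names are ours, spatial frequencies are expressed in units of 1/λ, and the factor of two for cone illumination assumes that the illuminating cone and the objective share the same NA (0.25 here), as in the simulations above.

```python
def cutoff_frequency(na, illumination):
    """Return the cutoff spatial frequency in units of 1/wavelength.

    'collimated' -> NA/lambda (coherent plane-wave illumination)
    'cone'       -> 2NA/lambda (coherent cone whose NA matches the objective)
    'incoherent' -> 2NA/lambda
    """
    if illumination == "collimated":
        return na
    if illumination in ("cone", "incoherent"):
        return 2.0 * na
    raise ValueError(f"unknown illumination type: {illumination}")


def grating_is_resolved(period_in_wavelengths, na, illumination):
    """True if the grating's fundamental frequency lies below the cutoff."""
    grating_frequency = 1.0 / period_in_wavelengths
    return grating_frequency < cutoff_frequency(na, illumination)


NA = 0.25      # objective numerical aperture used in the chapter
PERIOD = 3.0   # grating period in units of the wavelength

for mode in ("collimated", "cone", "incoherent"):
    verdict = "resolved" if grating_is_resolved(PERIOD, NA, mode) else "not resolved"
    print(f"{mode:>10}: cutoff = {cutoff_frequency(NA, mode):.2f}/lambda, grating {verdict}")
```

The grating frequency, 1/(3λ), lies between the two cutoffs, 1/(4λ) and 1/(2λ), which is why only the collimated case fails to form an image.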
Images of non-periodic objects

The mask containing marks of different sizes shown in Figure 5.3(b) provides a
good test object for comparing images obtained under coherent and incoherent
illumination. Consider the case of transparent marks on a dark background
(an amplitude object), imaged with a collimated coherent illumination (see
Figure 5.8), and also with incoherent illumination (see Figure 5.9). The reso-
lution of the former is obviously inferior to that of the latter, and the spurious
fringes appearing in the coherent image are responsible for at least some of the
image-quality degradation. (As an aside, note that the exit-pupil distribution in
the case of coherent illumination displays much more structure than that
obtained with incoherent light.)
Figure 5.8 Coherent imaging of the seven transparent marks on a black
background shown in Figure 5.3(b). The incident distribution is the collimated
coherent beam of Figure 5.2(a). (a) Logarithmic plot (α = 4) of intensity
distribution at the exit pupil of the objective lens. (b) Logarithmic plot (α = 4) of
intensity distribution at the image plane. (c) Magnified view of the central region
of the image shown in (b); in this case α = 3. (d) Same as (c) but showing the
distribution of intensity rather than its logarithm. (Horizontal axes, x/λ:
±1500 in (a), ±450 in (b), ±125 in (c) and (d).)

Figure 5.9 Incoherent imaging of the seven transparent marks on a
black background; the incident distribution is that of Figure 5.2(d). (a) Intensity
distribution at the exit pupil of the objective lens. (b) Intensity distribution at the
image plane. (c) Same as (b) but on a logarithmic scale (α = 3). (Vertical axes,
y/λ: ±1500 in (a), ±125 in (b) and (c).)

Using the same object as in Figure 5.3(b) but assuming that the marks
are black features on a transparent background (i.e., reversing the contrast) we
obtain the distributions of Figure 5.10 in the case of collimated coherent
illumination and those of Figure 5.11 in the case of incoherent illumination.
Note the similarity between the exit-pupil distributions in Figures 5.8(a) and
5.10(a), indicating that Babinet’s principle is at work here [2,3]. Also note in
Figure 5.10(b) that, in addition to the marks, the rings of the Airy pattern of
the illuminating beam are also captured in the image. The logarithmic
intensity distributions in Figures 5.10(b), (c) show gray spots in the middle
of dark marks, a feature that is less prominent in the incoherent image of
Figure 5.11(c).
Figure 5.10 Coherent imaging of the seven black marks on a transparent
background shown in Figure 5.3(b). The incident distribution is the collimated
coherent beam of Figure 5.2(a). (a) Logarithmic plot (α = 4) of intensity
distribution at the exit pupil of the objective lens. (b) Logarithmic plot (α = 4) of
intensity distribution at the image plane, showing the images of the marks as
well as the rings of the Airy pattern. (c) Magnified view of the central region of
the image shown in (b); in this case α = 3. (d) Same as (c) but showing the
distribution of intensity rather than its logarithm. (Horizontal axes, x/λ:
±1500 in (a), ±450 in (b), ±125 in (c) and (d).)
Finally we assume that the marks on the mask of Figure 5.3(b) represent
transparent phase objects that impart a phase shift of 180° (relative to the
background) to the incident beam. Figure 5.12 shows the computed intensity
distributions at the objective’s exit pupil and at the image plane, for the case of
illumination by the collimated coherent beam of Figure 5.2(a). Figure 5.13
shows the corresponding distributions for incoherent illumination. Note how
diffraction from mark boundaries can create an “image” of the marks in a case
where no explicit phase-contrast mechanism is present [3]. In the two simulations
depicted in Figures 5.10 and 5.12, the amplitude transmission functions of the
respective objects differ only by an additive constant term. Therefore, the image
in Figure 5.10(b), for instance, may be derived from that in Figure 5.12(b) by
the addition of the image of the incident beam, it being understood that the
quantities being added are the complex amplitudes, not the intensities.
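The relation between the two objects can be written out explicitly. As a sketch in our own notation (not the book’s), let m(x, y) be an indicator function equal to 1 inside the marks and 0 elsewhere; the black-mark and 180° phase-mark masks then have the amplitude transmission functions

```latex
\begin{aligned}
t_{\text{black}}(x,y) &= 1 - m(x,y), \\
t_{\text{phase}}(x,y) &= 1 - 2\,m(x,y), \\
t_{\text{black}}(x,y) &= \tfrac{1}{2}\,t_{\text{phase}}(x,y) + \tfrac{1}{2}.
\end{aligned}
```

Since coherent imaging is linear in the complex amplitude, the image amplitude for the black marks is, up to the overall factor of 1/2, the sum of the image amplitude for the phase marks and that of the unobstructed incident beam (the constant term).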
Figure 5.11 Incoherent imaging of the seven black marks on a transparent
background; the incident distribution is that of Figure 5.2(d). (a) Intensity
distribution at the exit pupil of the objective lens. (b) Intensity distribution at the
image plane. (c) Same as (b) but on a logarithmic scale (α = 1.7). (Vertical axes,
y/λ: ±1500 in (a), ±125 in (b) and (c).)

Figure 5.12 Same as Figure 5.10 but for a phase object. The assumed object in
this case is the mask of Figure 5.3(b), which has uniform transmissivity
everywhere; its marks impart a relative phase shift of 180° to the incident beam.

Figure 5.13 Same as Figure 5.11 but for a phase object. The assumed object in
this case is the mask of Figure 5.3(b), which has uniform transmissivity
everywhere; its marks impart a relative phase shift of 180° to the incident beam.
(For the logarithmic plot in (c) α = 1.4.)

2 M. Born and E. Wolf, Principles of Optics, 6th edition, Pergamon Press, Oxford,
1980.
6
First-order temporal coherence in classical optics†
A truly monochromatic beam of light, if it ever existed, would be perfectly

coherent. Suppose that such a beam is split into two parts and each part propa-
gated over an arbitrary distance. When the parts are finally brought together and
mixed, no matter how different the two path lengths may have been, the resulting
waveform will exhibit constructive and destructive interference in the form of
bright and dark fringes. The coherence length of a monochromatic beam is
therefore infinite, in the sense that the path-length difference can be as large as
desired without hampering one’s ability to create interference patterns.
Real sources of light, of course, are never monochromatic. White light
restricted to the visible range of wavelengths from 400 nm to 700 nm, for
example, has a coherence length of only a couple of micrometers. A green filter
passing sunlight at λ_0 = 550 nm with a 10 nm bandwidth produces a beam with a
coherence length of about 50 μm. The red line of cadmium (λ_0 = 643.8 nm) has a
nearly Gaussian spectrum with a 0.0013 nm width at half peak intensity, leading
to a coherence length of nearly 30 cm [1]. This is similar to the coherence length of
a short, inexpensive HeNe laser (λ_0 = 632.8 nm) with a few longitudinal modes
and a typical bandwidth of Δf ≈ 1 GHz. A stabilized HeNe laser operating in a
single longitudinal mode (Δf ≈ 100 kHz) has a coherence length of several
kilometers. It is important therefore to understand the role of spectral bandwidth
in enhancing or diminishing the performance of an optical system that, by design
or by coincidence, involves interference.
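The narrow-line figures quoted above can be checked with the usual order-of-magnitude estimates l_c ≈ c/Δf ≈ λ_0²/Δλ. The sketch below is only a rough check under that assumption; the function names are ours, and the exact prefactor depends on the line shape and on how the width is defined, so agreement to within a factor of order unity is all that should be expected.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def coherence_length_from_linewidth(center_wavelength_m, linewidth_m):
    """Order-of-magnitude estimate l_c ~ lambda0**2 / delta_lambda, in meters."""
    return center_wavelength_m ** 2 / linewidth_m


def coherence_length_from_bandwidth(bandwidth_hz):
    """Order-of-magnitude estimate l_c ~ c / delta_f, in meters."""
    return C_M_PER_S / bandwidth_hz


cadmium_m = coherence_length_from_linewidth(643.8e-9, 0.0013e-9)
hene_multimode_m = coherence_length_from_bandwidth(1e9)
hene_stabilized_m = coherence_length_from_bandwidth(100e3)

print(f"cadmium red line     : {cadmium_m * 100:.0f} cm")
print(f"multimode HeNe       : {hene_multimode_m * 100:.0f} cm")
print(f"single-mode HeNe     : {hene_stabilized_m / 1000:.1f} km")
```

With these estimates the cadmium line and the 1 GHz HeNe laser both come out at roughly 30 cm, and the 100 kHz laser at about 3 km, in line with the values quoted in the text.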
The subject of temporal coherence has been covered extensively in modern
and classical textbooks [1,2,3,4,5], and it is not our intention here to repeat what is
already well known. Instead, we present an alternative viewpoint that draws on
the similarities between a waveform extended over a long span of time and a
compact wave packet that exists for a relatively short period. We will show that,
as far as first-order temporal coherence is concerned, the wave packet can be
substituted for the extended waveform in analyzing the results of interference
experiments. While describing the properties of wave packets, we also mention
some interesting observations concerning their reflection from, and transmission
through, multilayer stacks.

† This chapter is coauthored with Ewan M. Wright, Professor of Optical Sciences at the University of Arizona.
Time dependence, frequency spectrum, and phase

Consider a superposition of plane waves, propagating in free space along the
Z-axis and covering a range of (temporal) frequencies at and around f = f_0.
The discrete frequencies f_n comprising the spectrum of this waveform are
assumed to have a fixed spacing Δf as follows:

$$f_n = f_0 + n\,\Delta f = (N_0 + n)\,\Delta f. \tag{6.1}$$

The amplitude of the waveform is

$$a(z,t) = \sum_n A_n\,(\Delta f)^{1/2} \cos\!\left[2\pi f_n (z/c - t) + \phi_n\right], \tag{6.2}$$

where A_n and φ_n are the amplitude and phase of the spectral component whose
frequency is f_n, and c is the speed of light in vacuum. The constant multiplier
(Δf)^(1/2) is for normalization purposes only, its significance becoming clear as the
discussion proceeds. We set the central frequency f_0 = 4.74 × 10¹⁴ Hz
(corresponding to λ_0 = 632.8 nm) and Δf = 4.74 × 10¹² Hz, which leads to N_0 = 100. We
adopt a Gaussian shape for the distribution of the amplitudes A_n, as shown in
Figure 6.1(a), and let the values of n in Eq. (6.2) range from –15 to +14, for a
total of 30 discrete wavelengths in the spectrum. To a large extent these choices
are arbitrary, but the points that we seek to clarify by way of examples based on
these choices are quite general in nature.
Throughout this chapter the same amplitude coefficients {A_n} are assumed for
all realizations of the waveform a(z, t), but the phase angles {φ_n}, although fixed
for any particular waveform, differ for different realizations. The statistical
properties of a(z, t) are thus uniquely determined by the joint probability distribution
over {φ_n}. Furthermore, we consider stationary processes for which the ensemble
average over different phase-angle realizations coincides with the time average
derived from a single realization. This restriction of randomness to spectral phase
simplifies the discussion without affecting the validity of the final results.
Since the spectrum in Figure 6.1(a) is a discrete function of frequency, the
corresponding amplitude a(z, t) considered either as a function of time at a
fixed point z, or as a function of z at a given instant of time t, will be periodic.
With z fixed, for example, the period of the function in the time domain will be
T = 1/Δf = 211 femtoseconds. A plot of a(z = 0, t) over a full period T is shown in
Figure 6.1(b), and a close-up of the wave packet appears in Figure 6.1(c). (This is
reminiscent of the pulse train emerging from a mode-locked laser.) The width of
the packet in Figure 6.1(b) is 20 fs, which is of the same order of magnitude as
the inverse of the spectral width (29Δf = 1.37 × 10¹⁴ Hz). To increase the period T
without changing the overall shape of the wave packet one must increase the
rate of sampling of the spectrum of Figure 6.1(a), by selecting additional
frequencies in between those that are already chosen. In this way, both the
spectrum and the wave packet retain their shapes but Δf becomes smaller while T
becomes larger. In the limit Δf → 0 the separation T between adjacent wave
packets approaches infinity.

Figure 6.1 (a) A truncated Gaussian function sampled at regular intervals
represents the frequency spectrum of a waveform. (b) The waveform as a
function of time obtained by Fourier-transforming the spectrum in (a), assuming
that the phase is a linear function of frequency. Since the spectrum is sampled at
Δf = 4.74 × 10¹² Hz, the waveform is repeated with a period of 211 fs; only one
period of the wave packet is shown. (c) Close-up of the wave packet. (Axes:
A(f) versus frequency in units of 10¹⁴ Hz in (a); a(t) versus time in fs in (b)
and (c).)
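The periodicity of the sampled-spectrum waveform is easy to reproduce numerically. The following is a minimal sketch of Eq. (6.2) at z = 0, not the authors’ code: the Gaussian width `SIGMA_THZ` and the packet center `T0_PS` are assumed values (the text does not give them), frequencies are in THz and times in ps, and a linear spectral phase φ_n = 2π f_n t_0 is used to produce a compact packet centered at t_0.

```python
import numpy as np

DF_THZ = 4.74                       # frequency spacing, from the text
N0 = 100                            # f_0 = N0 * DF_THZ = 474 THz
n = np.arange(-15, 15)              # n = -15 ... +14 (30 components)
f_n = (N0 + n) * DF_THZ             # Eq. (6.1)

SIGMA_THZ = 13.0                    # ASSUMED Gaussian width of the spectrum
A_n = np.exp(-0.5 * ((f_n - N0 * DF_THZ) / SIGMA_THZ) ** 2)


def waveform(t_ps, phases):
    """Evaluate Eq. (6.2) at z = 0 for an array of times (ps)."""
    t = np.atleast_1d(np.asarray(t_ps, dtype=float))
    arg = -2.0 * np.pi * np.outer(f_n, t) + phases[:, None]
    return np.sqrt(DF_THZ) * (A_n[:, None] * np.cos(arg)).sum(axis=0)


T_PS = 1.0 / DF_THZ                 # period of the waveform
T0_PS = 0.020                       # ASSUMED packet center (20 fs)
linear_phase = 2.0 * np.pi * f_n * T0_PS

t = np.arange(4096) * T_PS / 4096   # one full period, uniformly sampled
a = waveform(t, linear_phase)
t_peak_fs = 1e3 * t[np.argmax(np.abs(a))]

print(f"period T = {T_PS * 1e3:.0f} fs")
print(f"packet peak at t = {t_peak_fs:.1f} fs")
print("repeats after T:", np.allclose(waveform(t + T_PS, linear_phase), a, atol=1e-8))
```

Halving `DF_THZ` while sampling the same Gaussian envelope on the finer grid would double the period without changing the packet shape, which is the limiting process described above.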
Where the first-order coherence of a given waveform is concerned, the phase
distribution over its spectral range is irrelevant, even though the shape of the
waveform as a function of time is significantly affected by this phase distribution.
For example, in Figure 6.1(b) the phase φ_n is assumed to be a linear function of
frequency, whereas if φ_n is picked randomly for each f_n then an extended function
such as that in Figure 6.2 is obtained. (The latter might, for example, be the output
of a multi-longitudinal-mode laser.) There are many possible choices for {φ_n} and
each choice yields a more or less extended function of time. Only on rare occasions
do we find a compact wave packet similar to that in Figure 6.1(b). However, all
functions obtained by different choices of {φ_n} are identical in their first-order
coherence attributes. In other words, the compact packet of Figure 6.1(b) has the
same degree of first-order coherence as the extended waveform of Figure 6.2.
The time-averaged intensity of the waveform at an arbitrary point z = z_0 is
readily computed from Eq. (6.2) as follows:

$$\langle I(z = z_0)\rangle = \frac{1}{T}\int_0^T a^2(z = z_0, t)\,dt = \frac{1}{2}\sum_n A_n^2\,\Delta f. \tag{6.3}$$

Note that the right-hand side of Eq. (6.3), being the area under the square of the
spectral distribution of Figure 6.1(a), remains constant as the sampling rate
increases. Thus reducing Δf in order to increase the period T does not affect the
average intensity of the waveform.
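Equation (6.3) can be verified numerically for any choice of spectral phases. The sketch below is illustrative, with an assumed Gaussian width (`SIGMA_THZ`) and our own function names; it averages a²(t) over one period for a zero-phase packet and for a randomly phased waveform and compares both with the right-hand side of Eq. (6.3).

```python
import numpy as np

DF_THZ = 4.74
N0 = 100
n = np.arange(-15, 15)
f_n = (N0 + n) * DF_THZ
SIGMA_THZ = 13.0                    # ASSUMED Gaussian width of the spectrum
A_n = np.exp(-0.5 * ((f_n - N0 * DF_THZ) / SIGMA_THZ) ** 2)


def mean_intensity_numeric(phases, samples=4096):
    """Time average of a(z=0, t)**2 over one period T = 1/DF_THZ."""
    t = np.arange(samples) / (samples * DF_THZ)
    arg = -2.0 * np.pi * np.outer(f_n, t) + phases[:, None]
    a = np.sqrt(DF_THZ) * (A_n[:, None] * np.cos(arg)).sum(axis=0)
    return np.mean(a ** 2)


predicted = 0.5 * np.sum(A_n ** 2) * DF_THZ   # right-hand side of Eq. (6.3)

rng = np.random.default_rng(0)
random_phases = rng.uniform(0.0, 2.0 * np.pi, size=f_n.size)
zero_phases = np.zeros(f_n.size)

print("Eq. (6.3) prediction :", predicted)
print("compact packet       :", mean_intensity_numeric(zero_phases))
print("random-phase waveform:", mean_intensity_numeric(random_phases))
```

The three numbers agree because the cross terms between different frequencies average to zero over a full period, leaving only the phase-independent terms A_n²Δf/2.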
Figure 6.2 Waveform obtained by Fourier-transforming the frequency
spectrum of Figure 6.1(a) after assigning it a randomly selected phase at each
frequency. (Axes: a(t) versus time in fs, over one period from 0 to about 200 fs.)
The Mach–Zehnder interferometer

Temporal coherence is usually measured with a Michelson interferometer. For
our present purposes, however, we will consider a slightly modified version of the
Mach–Zehnder interferometer, shown in Figure 6.3. The collimated beam of light
entering the device is split equally between its two arms at the first beam splitter
(BS). The two beams are reflected by the mirrors at the end of each arm, then
recombined at the second BS. If, on the one hand, the two beams happen to be
perfectly in phase when they arrive at the second BS, they interfere constructively
in channel 1 (see Figure 6.3) and deliver their combined total optical energy to
photodetector 1; photodetector 2 in this case receives no light at all. If, on the
other hand, the two beams are relatively phase-shifted by Δφ = 180°, they appear
collectively at detector 2, leaving detector 1 in the dark. For intermediate values
of Δφ the energy of the beams is split between the two detectors, the splitting
ratio being 50/50 when Δφ = ±90°.
Now suppose the relative phase between the two beams can be varied
continuously by adjusting the length of one of the interferometer’s arms. Then S_1,
the output of detector 1, reaches its maximum when the two arms become identical
in length. As the length of the adjustable arm then increases by a quarter of a
wavelength, Δφ becomes 180° and S_1 reaches a minimum. As long as the two beams
remain coherent (or partially coherent) this behavior is periodically repeated, the
output of each detector oscillating between a maximum and a minimum. Once the
arm lengths differ by more than the coherence length of the beam, the oscillations
die down and both channels receive equal amounts of light, irrespective of the path-
length difference between the arms. For the wave packet of Figure 6.1, we show in
Figure 6.4 the computed output of detector 1 as a function of Δz = ½cτ, where τ is
the time delay between the two arms of the interferometer.
The time-averaged detector outputs may be written

$$
\begin{aligned}
S_{1,2}(\tau) &= \frac{1}{T}\int_0^T \left\{\tfrac{1}{2}\left[a(t) \pm a(t-\tau)\right]\right\}^2 dt \\
&= \frac{1}{2T}\int_0^T a^2(t)\,dt \;\pm\; \frac{1}{2T}\int_0^T a(t)\,a(t-\tau)\,dt \\
&= \frac{1}{2}\left[\langle I\rangle \pm \frac{1}{2}\sum_n A_n^2\,\Delta f\,\cos(2\pi f_n \tau)\right].
\end{aligned}
\tag{6.4}
$$

The first term on the right-hand side of this equation is a constant, independent
of τ, while the second term is the autocorrelation function of the waveform a(t)
and coincides with the first-order field coherence function in the case of a
stationary process. The Fourier series coefficients of this autocorrelation
function are {A_n²} and are independent of {φ_n}. It is thus clear that the signals
S_1(τ) and S_2(τ), and hence the first-order temporal coherence of the waveform,
depend only on the magnitude – and not the phase – of the spectral distribution,
as was asserted earlier.

Figure 6.3 The Mach–Zehnder interferometer is used in analyzing the temporal
coherence of a collimated beam of light. The incoming beam is split
equally between the two arms of the device at the first BS. The two arms are
identical except that the end-reflector is fixed in one arm and movable in the
other. After traveling along these separate arms the beams are recombined at
the second BS. When the optical path lengths of the two arms are identical, the
beams interfere constructively in channel 1 and deliver their entire energy to
photodetector 1. Deviations from path-length equality can send to channel 2
either the entire beam or a fraction of it. The movable reflector is used to adjust
the optical path-length difference between the arms.

Figure 6.4 The signal S_1 of detector 1 as a function of the extension Δz of the
movable end-reflector of the interferometer. The assumed incoming beam is the
packet shown in Figure 6.1. (Axes: S_1 from 0 to 1 versus Δz from –3 to 3 μm.)
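The phase independence of Eq. (6.4) can be checked by computing S_1(τ) in two ways: from the spectral sum, and by directly time-averaging the recombined field for a randomly phased waveform. This is a sketch under stated assumptions: the Gaussian width `SIGMA_THZ` is our choice, the function names are hypothetical, and units are THz, ps, and μm.

```python
import numpy as np

DF_THZ = 4.74
N0 = 100
C_UM_PER_PS = 299.792458            # speed of light in micrometers per ps

n = np.arange(-15, 15)
f_n = (N0 + n) * DF_THZ
SIGMA_THZ = 13.0                    # ASSUMED Gaussian width of the spectrum
A_n = np.exp(-0.5 * ((f_n - N0 * DF_THZ) / SIGMA_THZ) ** 2)
T_PS = 1.0 / DF_THZ
MEAN_INTENSITY = 0.5 * np.sum(A_n ** 2) * DF_THZ   # Eq. (6.3)


def field(t_ps, phases):
    """Eq. (6.2) at z = 0 for an array of times (ps)."""
    arg = -2.0 * np.pi * np.outer(f_n, t_ps) + phases[:, None]
    return np.sqrt(DF_THZ) * (A_n[:, None] * np.cos(arg)).sum(axis=0)


def s1_spectral(tau_ps):
    """Detector-1 signal from the last line of Eq. (6.4), plus sign."""
    corr = 0.5 * np.sum(A_n ** 2 * DF_THZ * np.cos(2.0 * np.pi * f_n * tau_ps))
    return 0.5 * (MEAN_INTENSITY + corr)


def s1_time_average(tau_ps, phases, samples=4096):
    """Detector-1 signal from the first line of Eq. (6.4), plus sign."""
    t = np.arange(samples) * T_PS / samples
    combined = 0.5 * (field(t, phases) + field(t - tau_ps, phases))
    return np.mean(combined ** 2)


def extension_um(tau_ps):
    """Mirror extension corresponding to a delay tau: dz = c * tau / 2."""
    return 0.5 * C_UM_PER_PS * tau_ps


rng = np.random.default_rng(1)
random_phases = rng.uniform(0.0, 2.0 * np.pi, size=f_n.size)
tau = 0.003                         # 3 fs delay, chosen arbitrarily
print(s1_spectral(tau), s1_time_average(tau, random_phases))
print(f"tau = T/4 corresponds to dz = {extension_um(T_PS / 4):.2f} um")
```

For delays much longer than the packet width the cosine sum averages out and S_1 settles near half the mean intensity, which is the flat region of a curve such as Figure 6.4.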
Coherence length
Figure 6.5 shows the waveforms arriving in channels 1 and 2 when the wave
packet of Figure 6.1(b) is sent through the interferometer, with its movable arm
extended by Δz = cT/8 = 7.91 μm. The time delay between the packets traveling
in the two arms is therefore τ = T/4. Since this delay is longer than the duration of
each packet, the two packets upon arriving at the second BS do not overlap and,
therefore, appear separately in both channels. Obviously no interference takes
place in this case and each channel receives an equal share from each packet,
each with one-half of the original amplitude.

Figure 6.5 Waveforms arriving at (a) channel 1 and (b) channel 2 of the
Mach–Zehnder interferometer. The assumed incoming beam is the packet
of Figure 6.1, and the movable arm of the interferometer has been extended by
Δz = cT/8 = 7.91 μm. Because the delay is longer than the width of the packet
no interference takes place. The two packets act independently and appear in
both channels, albeit at half the original magnitude of the incoming wave. Note
that the first packet in channel 2, having been transmitted through both beam-
splitters, is flipped relative to the second packet, which has been reflected at
both beam-splitters. In contrast, each packet arriving in channel 1 has been
reflected at one and transmitted at the other beam-splitter. As a result, there is
no relative phase shift between the two packets in channel 1. (Axes: a(t) versus
time in fs, from 0 to 200 fs in both panels.)
In the above example, where the delay τ between the two arms of the
interferometer is T/4, one can divide the frequency content of the wave packet into
four categories. The first category consists of the frequencies f = 85Δf, 89Δf,
93Δf, . . . , 113Δf. All these terms are phase-shifted by 90° and, when combined
at the second BS, are equally split between channels 1 and 2. The output of
channel 1 for these frequency components is shown in Figure 6.6(a). The second
category consists of the frequencies f = 86Δf, 90Δf, 94Δf, . . . , 114Δf, which
are phase-shifted by 180° and, therefore, appear exclusively in channel 2. The
third category, consisting of the frequencies f = 87Δf, 91Δf, 95Δf, . . . , 111Δf,
is phase-shifted by –90° and is, once again, equally split between the two
channels; the output of channel 1 for these components is shown in Figure 6.6(b).
The fourth and last category consists of frequencies f = 88Δf, 92Δf, 96Δf, . . . ,
112Δf, which are not phase-shifted at all and appear in their entirety in channel 1;
these are shown in Figure 6.6(c). Now if the three sets of signals in Figure 6.6 are
added together the twin packet of Figure 6.5(a) will be obtained.
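The four categories follow from the phase delay 2π f τ acquired by each component: with f = kΔf and τ = T/4 = 1/(4Δf), the delay is k × 90°, so the harmonic number k modulo 4 decides the group. The short sketch below (our own helper names) reproduces the grouping; the channel-1 power fraction cos²(Δφ/2) used here follows from Eq. (6.4) applied to a single frequency.

```python
import math

N0 = 100
harmonics = range(N0 - 15, N0 + 15)   # k = 85 ... 114, i.e. n = -15 ... +14


def phase_shift_degrees(k):
    """Phase delay 2*pi*f*tau for f = k*df and tau = T/4, wrapped to (-180, 180]."""
    deg = (90 * k) % 360
    return deg - 360 if deg > 180 else deg


def channel1_fraction(phase_deg):
    """Fraction of a component's power reaching channel 1: cos^2(phase/2)."""
    return math.cos(math.radians(phase_deg) / 2.0) ** 2


groups = {}
for k in harmonics:
    groups.setdefault(phase_shift_degrees(k), []).append(k)

for phase in (90, 180, -90, 0):
    ks = groups[phase]
    print(f"{phase:>4} deg: k = {ks[0]}, {ks[1]}, ..., {ks[-1]}  "
          f"(channel-1 fraction {channel1_fraction(phase):.2f})")
```

The 180° group contributes nothing to channel 1, which is why only three waveforms are needed to rebuild the channel-1 signal.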
It is clear that the behavior of individual frequency components (or groups of
such components that acquire the same phase shift) is independent of all the
other components; this is simply a statement of the principle of superposition
for the linear system under consideration. Furthermore, the fraction of each
component appearing in a given channel is only a function of the phase delay
acquired by that component between arms 1 and 2, independent of the original
phase of that component. Remembering that the various frequency terms are
orthogonal to each other, the behavior of the overall waveform within the
interferometer must be independent of the initial phase of its individual com-
ponents. Thus we see that the analysis of the packet of Figure 6.1(b) applies
equally to the extended waveform of Figure 6.2. These different-looking
functions share the same spectrum but have differing phase distributions
over their common range of frequencies. In particular, the coherence length is
equal to the width of the wave packet obtained by setting all φ_n equal to zero.
The width of the packet, of course, is roughly equal to the inverse of its spectral
bandwidth.
In addition to the phase angles φ_n initially present in, and those acquired
during propagation of, a given wave packet, the field may accumulate further
phase shifts due to dispersive elements (such as mirrors and prisms) in its path.
These phase shifts manifest themselves as delays or distortions of the packet. It
is of some interest, therefore, to study reflection and transmission delays caused
by dispersive elements in order to evaluate their impact on interferometric
measurements.
Figure 6.6 The spectrum of the wave packet in Figure 6.1(a) can be considered
as the superposition of four groups of frequencies. One of these groups appears
exclusively in channel 2. The other three groups appear in channel 1 either fully or
partially. The waveforms shown here are those that would have appeared in
channel 1 had the other groups been absent. When these three waveforms are
added together they reconstruct the pair of wave packets shown in Figure 6.5(a).
(Axes: a(t) from –4 to 4 versus time in fs, from 0 to 200 fs, in panels (a), (b),
and (c).)
Delay upon reflection

As an example consider a 12-layer dielectric stack, Figure 6.7, consisting of
alternating layers of quartz and strontium titanate. At the central wavelength of
λ_0 = 632.8 nm the refractive indices of these materials are 1.46 and 2.39,
respectively [6]. (The indices vary somewhat within the wavelength range of interest,
and the corresponding dispersion is taken into account in the following
calculations.) The thickness of the quartz layer is 108 nm and that of SrTiO3 is 66 nm,
each being a quarter-wave thick at λ_0. The stack is grown on a substrate whose
central region has been subsequently removed. The hole thus created in the
substrate is of no consequence for our analysis of reflection, but it simplifies the
discussion in the following section concerning transmission through the stack.

Figure 6.7 Schematic diagram of a quarter-wave stack consisting of six pairs
of SiO2/SrTiO3 layers (quartz, 108 nm; strontium titanate, 66 nm); the entire
stack is 1044 nm thick. To simplify the analysis
of the transmitted beam, the central region of the substrate is assumed to have
been etched away. In calculating the reflection and transmission coefficients of
the stack the wavelength dependence of the refractive indices of both types of
layer has been taken into consideration.
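The quoted layer thicknesses follow from the quarter-wave condition d = λ_0/(4n). A quick check, using only the numbers given above (the function name is ours):

```python
WAVELENGTH_NM = 632.8     # central wavelength
N_QUARTZ = 1.46           # refractive index of quartz at the central wavelength
N_SRTIO3 = 2.39           # refractive index of strontium titanate
PAIRS = 6                 # six quartz/SrTiO3 pairs, twelve layers in all


def quarter_wave_thickness_nm(wavelength_nm, index):
    """Physical thickness of a layer that is a quarter-wave thick."""
    return wavelength_nm / (4.0 * index)


d_quartz = quarter_wave_thickness_nm(WAVELENGTH_NM, N_QUARTZ)
d_srtio3 = quarter_wave_thickness_nm(WAVELENGTH_NM, N_SRTIO3)
total_nm = PAIRS * (round(d_quartz) + round(d_srtio3))

print(f"quartz: {d_quartz:.1f} nm, SrTiO3: {d_srtio3:.1f} nm, stack: {total_nm} nm")
```

Rounded to the nearest nanometer these give 108 nm and 66 nm per layer and 1044 nm for the whole stack.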
Figure 6.8 shows computed plots of amplitude and phase for the reflection and
transmission coefficients of the stack in the frequency range covered by the wave
packet of Figure 6.1. Note that, within the bandwidth of interest, the phase φ_r of
the reflection coefficient is essentially a linear function of frequency with a slope
of 1.5° per THz. This slope represents a 4.2 fs delay for the packet upon reflection
from the stack. It might therefore be argued that, upon arrival at the surface, the
packet spends 4.2 fs in exploring the stack before bouncing back. Roughly
speaking, the delay may be associated with a penetration depth of 625 nm for
this stack 1044 nm thick. (For an aluminum mirror the corresponding slope is
found to be 0.03° per THz, leading to a reflection delay of 0.083 fs and an
estimated penetration depth of only 12.5 nm.)
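The conversion from phase slope to delay is Δt = (1/2π) dφ/df, that is, the slope in degrees per THz divided by 360° gives the delay in picoseconds. The sketch below reproduces the quoted numbers; treating the penetration depth as half the distance light travels in vacuum during the delay (a round trip) is our reading of the estimate, adopted because it matches the figures in the text.

```python
C_NM_PER_FS = 299.792458   # speed of light in vacuum, nm per fs


def group_delay_fs(slope_deg_per_thz):
    """Delay from a linear phase slope: (slope / 360 deg) ps, returned in fs."""
    return slope_deg_per_thz / 360.0 * 1000.0


def penetration_depth_nm(delay_fs):
    """ASSUMED round-trip estimate: depth = c * delay / 2."""
    return 0.5 * C_NM_PER_FS * delay_fs


for label, slope in (("dielectric stack", 1.5), ("aluminum mirror", 0.03)):
    delay = group_delay_fs(slope)
    print(f"{label}: delay = {delay:.3f} fs, depth = {penetration_depth_nm(delay):.1f} nm")
```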
Delay upon transmission

For the wave packet transmitted through the stack of Figure 6.7 the slope of the
phase φ_t(f) in Figure 6.8(b) is about 0.95° per THz, which amounts to a delay of
Δt = 2.6 fs. Note, however, that the total thickness of the stack is 1044 nm,
requiring 3.5 fs for the light to cover this distance at its vacuum speed c. It
appears therefore that in passing through the stack the packet has exceeded the
speed of light [7,8,9,10]. Since the special theory of relativity appears to have been
violated, we take a closer look at the transmitted beam.

Figure 6.8 Computed amplitude and phase of the reflection and transmission
coefficients r and t of the multilayer stack of Figure 6.7. The depicted range of
frequencies covers the entire bandwidth of the wave packet shown in Figure 6.1.
(Panel (a): amplitudes |r| and |t|; panel (b): phases φ_r and φ_t in degrees; both
versus frequency from 3.75 to 5.75 in units of 10¹⁴ Hz.)
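The apparent paradox rests on two numbers that can be compared directly: the delay implied by the transmission phase slope and the time light needs to cross the same thickness of vacuum. A minimal check, using the values quoted above and our own function names:

```python
C_NM_PER_FS = 299.792458   # speed of light in vacuum, nm per fs


def group_delay_fs(slope_deg_per_thz):
    """Delay from a linear phase slope: (slope / 360 deg) ps, returned in fs."""
    return slope_deg_per_thz / 360.0 * 1000.0


def vacuum_transit_time_fs(thickness_nm):
    """Time for light to cover the given distance at its vacuum speed."""
    return thickness_nm / C_NM_PER_FS


transmission_delay = group_delay_fs(0.95)       # slope of the transmission phase
vacuum_time = vacuum_transit_time_fs(1044.0)    # total stack thickness

print(f"transmission delay : {transmission_delay:.2f} fs")
print(f"vacuum transit time: {vacuum_time:.2f} fs")
print("peak arrives early :", transmission_delay < vacuum_time)
```

The comparison concerns the peak of the packet only; it says nothing about where the packet begins, which is the point taken up next.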
Note in Figure 6.8(a) that the transmitted amplitude |t| is not constant over the
range of frequencies of the wave packet but rises at both ends. This means that
the actual transmitted spectrum is somewhat broadened (see Figure 6.9(a)).
Taking into account the actual amplitude and phase of the transmission
coefficient, we find the transmitted packet to be that of Figure 6.9(b). The peak of
this packet is in fact delayed by about 2.6 fs, implying its faster-than-light
propagation, but the entire packet is also compressed, which means that its
starting point is about 5 fs behind that of the incoming packet (compare Figure
6.9(b) with Figure 6.1(c)). This delay of the starting point ensures that special
relativity is not violated. Had we ignored the broadening of the spectrum and
only included the phase shift φ_t(f) in our transmission calculations, we would
have obtained the packet of Figure 6.9(c), which is only delayed relative to the
incoming packet by 2.6 fs, in obvious violation of special relativity. Spectral
broadening caused by the transmission curve of the stack thus results in a
compression that ultimately delays the emergence of the packet, and in so doing
reaffirms the impossibility of communication beyond the speed of light.

Figure 6.9 The wave packet transmitted through the stack of Figure 6.7 has a
broadened spectrum as shown in (a). This spectral broadening, together with the
linear phase shift φ_t(f) depicted in Figure 6.8(b), results in the compressed and
delayed packet shown in (b). Had the spectral broadening been ignored and only
the phase shift φ_t(f) taken into account, the transmitted packet would have
resembled that in (c). (Axes: A(f) versus frequency in units of 10¹⁴ Hz in (a);
a(t) versus time from 5 to 35 fs in (b) and (c).)

Figure 6.10 The signal S_1 of detector 1 versus the extension Δz of the movable
end-reflector of the interferometer. The stack of Figure 6.7 is installed in the
fixed arm while the adjustable arm is extended to compensate for the
transmission delay through the stack. The incoming beam is assumed to be the wave
packet of Figure 6.1(b). (Axes: S_1 from 0.6 to 1.0 versus Δz from –3 to 3 μm.)
It is interesting to note that a measurement of the transmission delay by the
interferometer of Figure 6.3 also leads to an apparent violation of special
relativity. Such a measurement ostensibly determines the delay by measuring
the peak of S1 (the output of detector 1) when the multilayer stack is inserted in
the fixed arm of the device and the movable arm is extended to maximize S1.
The corresponding signal (see Figure 6.10) is obtained by cross-correlating
the wave packets of Figures 6.1(c) and 6.9(b). The peak of this curve occurs at
Δz = 0.4 μm, which is in agreement with the 2.6 fs delay calculated earlier. One
must bear in mind, of course, that the interferometer measures the average delay
of the packet upon transmission through the stack, and not the delay of its
starting point.
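As a quick arithmetic check of the numbers quoted above, the following sketch converts the 2.6 fs peak delay into an arm extension. It assumes (this is not stated in the present section) that moving the end-reflector by Δz lengthens the optical path of the adjustable arm by 2Δz while the beam crosses the stack only once; the function name is illustrative.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum (m/s)

def arm_extension_for_delay(delay_s: float) -> float:
    """Reflector displacement (m) that compensates a delay.

    ASSUMPTION: a displacement dz changes the optical path by 2*dz.
    """
    return C_VACUUM * delay_s / 2.0

dz_um = arm_extension_for_delay(2.6e-15) * 1e6  # convert m to micrometers
print(f"dz = {dz_um:.2f} um")
```

Under that assumption the result rounds to the 0.4 μm read off Figure 6.10.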

1980.
4 R. Loudon, The Quantum Theory of Light, second edition, Clarendon Press, Oxford,
1992.
5 P. Meystre and M. Sargent III, Elements of Quantum Optics, Springer-Verlag,
Berlin, 1990.
6 The refractive indices of strontium titanate in the wavelength range
(500 nm, 800 nm) are taken from W. J. Tropf, M. E. Thomas, and T. J. Harris,
Handbook of Optics, Vol. II, Michael Bass, editor, chapter 33, p. 33.72. Those of
quartz and aluminum are taken from the Handbook of Chemistry and Physics, 67th
edition, R. C. Weast, editor.
7 C. G. B. Garrett and D. E. McCumber, Phys. Rev. A 1, 305, 1970.
8 S. Chu and S. Wong, Phys. Rev. Lett. 49, 1293, 1982.
9 R. Y. Chiao, P. G. Kwiat, and A. M. Steinberg, Faster than light? Scientific
American, 52–60, August 1993.
10 R. Y. Chiao and A. M. Steinberg, Tunneling times and superluminality, in Progress
in Optics, ed. by E. Wolf, Vol. 37, 347–406, Elsevier, Amsterdam, 1997.
7
The van Cittert–Zernike theorem
The beam of light emanating from a quasi-monochromatic point source (or a

sufficiently distant extended source) is said to be spatially coherent: the reason is
that, at any two points on a given cross-section of the beam, the oscillating
electromagnetic fields maintain their relative phase at all times. If an opaque
screen with two pinholes is placed at such a cross-section, Young’s interference
fringes will form, and the observed fringe contrast will be 100% (at and around
the center of the fringe pattern). This is the sense in which the fields at two points
are said to be spatially coherent relative to each other. If the relative phase of the
fields at the two points varies randomly with time, the pair of point sources will
fail to produce Young’s fringes and, therefore, the fields are considered to be
incoherent. In practice there is a continuum of possibilities between the afore-
mentioned extremes, and the resulting fringe contrast may fall anywhere between
zero and 100%. The fields at the two points are then said to be partially coherent
with respect to one another, and the properly defined fringe contrast in Young’s
experiment is used as the measure of their degree of coherence.
Optical systems involving partially coherent illumination are explored in
several other chapters of this book; see, for example, “Coherent and incoherent
imaging” (Chapter 5), “Michelson’s stellar interferometer” (Chapter 35),
“Zernike’s method of phase contrast” (Chapter 38), and “Polarization micro-
scopy” (Chapter 39). Chapter 6 described first-order temporal coherence using a
simple analytical method. A similar approach will be employed here to study
first-order spatial coherence.
Coherence theory has been treated extensively in modern and classical text-
books, and it is not our intention here to repeat what is already well known.1,2,3,4
Our goal is to present a simple derivation of the van Cittert–Zernike theorem without
invoking the theories of probability and stochastic processes. This is possible
because most sources of practical interest are ergodic, meaning that time-averaging
over a typical waveform emanated by the source yields statistical information about
the source’s inherently random radiation processes. We shall make exclusive use of
time-averaging to derive the degree of coherence of a pair of points within the field of
an extended, quasi-monochromatic, incoherent source.
Time dependence, frequency spectrum, and phase

Consider a point source P, radiating into free space with a range of temporal
frequencies at and around f = f0. The discrete frequencies fn are assumed to have a
fixed spacing Δf as follows:
\[ f_n = f_0 + n\,\Delta f = (N_0 + n)\,\Delta f. \tag{7.1} \]
At a given point in space, the (scalar) amplitude of the radiated waveform may be
written
\[ a(t) = \sum_n A_n (\Delta f)^{1/2} \cos(2\pi f_n t + \phi_n), \tag{7.2} \]
where An and φn are the amplitude and phase of the component whose frequency
is fn. The significance of the constant multiplier (Δf)^(1/2), which is there for
normalization purposes only, becomes clear shortly. We set the central frequency
f0 = 5.454 × 10^14 Hz (corresponding to yellow light of wavelength λ = 550 nm)
and choose Δf = 5.454 × 10^12 Hz, which leads to N0 = 100. We adopt a Gaussian
shape for the distribution of the amplitudes An, as shown in Figure 7.1(a), and let
the value of n in Eq. (7.2) range from −4 to +4, for a total of nine discrete
wavelengths in the spectrum. To a large extent these choices are arbitrary but, as
before, the points that we seek to clarify by way of examples based on these
choices are quite general in nature.
Since in the Fourier domain the spectrum in Figure 7.1(a) is a discrete function
of frequency, the corresponding amplitude a(t) must be periodic, with a period of
T = 1/Δf ≈ 183 fs. A plot of a(t) over a full period T is shown in Figure 7.1(b),
where the values of φn at each frequency are chosen randomly and independently
of each other. To increase the period T without changing the overall shape of the
spectrum one must increase the rate of spectral sampling in Figure 7.1(a), by
selecting additional frequencies in between those that are already chosen. In this
way the spectrum retains its shape, but Δf becomes smaller while T becomes
larger. In the limit Δf → 0 the period T of the waveform approaches infinity.
As far as the first-order coherence of a given waveform is concerned, the specific
phase distribution over its spectral range is irrelevant, even though the shape of the
function a(t) may be significantly affected by this phase distribution. For example,
in Figure 7.1(b) the value of φn at each frequency is chosen randomly, whereas if
φn were chosen as a linear function of frequency then a waveform such as that of
[Figure 7.1: (a) amplitude versus frequency (10^14 Hz); (b), (c) amplitude versus time (fs).]
Figure 7.1 (a) A truncated Gaussian function sampled at regular intervals
represents the frequency spectrum of a waveform. (b) The waveform obtained by
Fourier-transforming the spectrum in (a), after assigning it a randomly selected
phase at each frequency. Since the spectrum is sampled at Δf = 5.454 × 10^12 Hz,
its Fourier transform is repeated with a period of 183 fs. Only one period of the
waveform is shown. (c) The waveform as a function of time derived by Fourier-
transforming the spectrum in (a), assuming that its phase is a linear function of
frequency.
Figure 7.1(c) would have been obtained. There are many possible choices for {φn},
and each one yields a more or less extended function of time such as that of Figure
7.1(b). Only on rare occasions does one find a compact wave packet similar to that
of Figure 7.1(c). However, the compact packet has the same first-order coherence
properties as the extended waveform.
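The construction described above can be sketched numerically. The following is a minimal illustration of Eq. (7.2) for the nine-line spectrum, once with random phases and once with phases that are linear in n. The Gaussian width SIGMA_N is an assumption (the text does not give it), and the function names are illustrative only.

```python
import math
import random

F0 = 5.454e14            # central frequency f0 (Hz), from the text
DF = 5.454e12            # line spacing Δf (Hz), from the text; N0 = F0/DF = 100
PERIOD = 1.0 / DF        # T = 1/Δf
N_VALUES = range(-4, 5)  # n = -4 ... +4
SIGMA_N = 2.0            # ASSUMED Gaussian width in units of Δf (not in the text)

AMPS = [math.exp(-0.5 * (n / SIGMA_N) ** 2) for n in N_VALUES]

def waveform(t, phases):
    """a(t) of Eq. (7.2) for the nine-line spectrum."""
    return sum(a * math.sqrt(DF) * math.cos(2 * math.pi * (F0 + n * DF) * t + p)
               for n, a, p in zip(N_VALUES, AMPS, phases))

rng = random.Random(0)
random_phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in N_VALUES]
# A phase linear in n, phi_n = -pi*n, lines all components up at t = T/2.
linear_phases = [-math.pi * n for n in N_VALUES]

def peak(phases, samples=4000):
    """Largest |a(t)| found on a uniform grid covering one period."""
    return max(abs(waveform(k * PERIOD / samples, phases)) for k in range(samples))

print("period T (fs):", PERIOD * 1e15)
print("peak |a|, random phases:", peak(random_phases))
print("peak |a|, linear phases:", peak(linear_phases))
```

With linear phases every component reaches its maximum at the same instant, so the peak equals (Δf)^(1/2) times the sum of the An, which is the compact packet; random phases spread the same energy over the whole period.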
Intensity
The average intensity of the waveform a(t) in Eq. (7.2) is readily computed as
follows:
\[ \langle I \rangle = \frac{1}{T}\int_0^T a^2(t)\,dt = \frac{1}{2}\sum_n A_n^2\,\Delta f. \tag{7.3} \]
Note that the right-hand side of Eq. (7.3), being the area under the square of the
spectral distribution function of Figure 7.1(a), remains constant as the sampling
rate increases. Thus reducing Δf in order to increase the period T does not affect
the average intensity.
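Both statements can be checked numerically. The sketch below compares the time average of a²(t) with the right-hand side of Eq. (7.3), and then halves Δf under the same envelope. The Gaussian width and the truncation span are assumptions chosen for illustration; only f0 and Δf come from the text.

```python
import math
import random

F0 = 5.454e14       # central frequency (Hz), from the text
DF = 5.454e12       # coarse sampling interval Δf (Hz), from the text
WIDTH = 1.0e13      # ASSUMED Gaussian width of the spectrum (Hz)
HALF_SPAN = 2.2e13  # ASSUMED truncation half-width of the spectrum (Hz)

def spectrum(df):
    """(f_n, A_n) pairs: the Gaussian envelope sampled every df within the span."""
    n_max = int(HALF_SPAN / df)
    return [(F0 + n * df, math.exp(-0.5 * (n * df / WIDTH) ** 2))
            for n in range(-n_max, n_max + 1)]

def intensity_from_spectrum(df):
    """Right-hand side of Eq. (7.3)."""
    return 0.5 * df * sum(a * a for _, a in spectrum(df))

def intensity_from_time_average(df, samples=4096, seed=1):
    """Left-hand side of Eq. (7.3): mean of a(t)^2 over one period, random phases."""
    rng = random.Random(seed)
    lines = [(f, a, rng.uniform(0.0, 2.0 * math.pi)) for f, a in spectrum(df)]
    period = 1.0 / df
    total = 0.0
    for k in range(samples):
        t = k * period / samples
        a_t = math.sqrt(df) * sum(a * math.cos(2 * math.pi * f * t + p)
                                  for f, a, p in lines)
        total += a_t * a_t
    return total / samples

for df in (DF, DF / 2):
    print(df, intensity_from_spectrum(df), intensity_from_time_average(df))
```

The two columns agree for each sampling interval, and halving Δf changes the result only through the small contribution of the truncated edges of the envelope.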
Although the average intensity does not depend on {φn}, the fluctuations in
intensity are most definitely affected by this phase distribution. A thermal source
(such as an incandescent lamp) tends to “assign” the values of φn randomly and
independently of each other, thus resulting in significant fluctuations in I(t). This
behavior may be observed by examining the typical waveform in Figure 7.1(b).
In fact, it can be shown that, for a thermal source, ⟨[I(t) − ⟨I⟩]^2⟩ = ⟨I⟩^2. However,
it is possible to assign the phase angles in such a way as to minimize the intensity
fluctuations. In a well-stabilized single-mode laser, for instance, the locking of
the phase angles renders the root-mean-square fluctuations of intensity negligible,
that is, ⟨I^2(t)⟩ = ⟨I⟩^2. These considerations, however, pertain to higher-order
statistics and, as far as first-order coherence is concerned, one could as well
ignore the specific phase distribution.
The cross-correlation function

Consider two points P and P′ on an extended source. The oscillations at these
points are independent of each other. Thus the sets of values {φn} and {φ′n}
assigned to aP(t) and aP′(t) may be considered independent. The cross-correlation
function between the amplitudes at P and P′ is given by
\[ C_{PP'}(\tau) = \frac{1}{T}\int_0^T a_P(t)\,a_{P'}(t-\tau)\,dt = \frac{1}{2}\sum_n A_n^2\,\Delta f\,\cos(2\pi f_n\tau + \phi_n - \phi'_n). \tag{7.4} \]
Since φn and φ′n are randomly selected with a uniform distribution over [0, 2π],
it follows that their difference φn − φ′n (taken modulo 2π) is also a random
variable with the same distribution. The function CPP′(τ) thus resembles a(t) of
Eq. (7.2) (with random phase angles), depicted in Figure 7.1(b). There is, however,
a major difference between these functions: whereas Δf in Eq. (7.2) appears with a
power 1/2, the corresponding power of Δf in Eq. (7.4) is unity. This means that the
average of C^2PP′(τ) is inversely proportional to T, namely,
\[ \frac{1}{T}\int_0^T C_{PP'}^2(\tau)\,d\tau = \frac{1}{8T}\sum_n A_n^4\,\Delta f. \tag{7.5} \]
Thus when T → ∞ the magnitude of CPP′(τ) for essentially all τ goes to zero,
whereas in the same limit the average intensity ⟨I⟩ given by Eq. (7.3) remains
non-zero. If the fields from P and P′ are brought together in an attempt to create
interference fringes, their combined intensity will be the sum of their individual
intensities plus the cross-correlation term CPP′(τ). Since CPP′(τ) → 0 for
sufficiently long T, the intensity of the sum will be the sum of individual intensities
and, therefore, no fringes will be observed.
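The 1/T scaling of Eq. (7.5) can be illustrated with a short calculation. The sketch below evaluates the cross-correlation directly from Eq. (7.4) with random phase differences, averages its square over one period, and compares the result with Eq. (7.5) for two sampling intervals. As before, the Gaussian width and truncation span are assumptions, and all function names are illustrative.

```python
import math
import random

F0, DF = 5.454e14, 5.454e12        # from the text (Hz)
WIDTH, HALF_SPAN = 1.0e13, 2.2e13  # ASSUMED envelope width and truncation (Hz)

def lines(df, seed=2):
    """(f_n, A_n, phi_n - phi'_n) with the phase difference drawn uniformly."""
    rng = random.Random(seed)
    n_max = int(HALF_SPAN / df)
    return [(F0 + n * df, math.exp(-0.5 * (n * df / WIDTH) ** 2),
             rng.uniform(0.0, 2.0 * math.pi))
            for n in range(-n_max, n_max + 1)]

def cross_correlation(tau, spectrum, df):
    """Eq. (7.4) evaluated at delay tau."""
    return 0.5 * df * sum(a * a * math.cos(2 * math.pi * f * tau + dphi)
                          for f, a, dphi in spectrum)

def mean_square_numeric(df, samples=4096):
    """Average of C^2 over one period T = 1/df, by uniform sampling."""
    spectrum, period = lines(df), 1.0 / df
    return sum(cross_correlation(k * period / samples, spectrum, df) ** 2
               for k in range(samples)) / samples

def mean_square_formula(df):
    """Right-hand side of Eq. (7.5), using 1/T = df."""
    return (df / 8.0) * df * sum(a ** 4 for _, a, _ in lines(df))

for df in (DF, DF / 4):
    print(df, mean_square_numeric(df), mean_square_formula(df))
```

Reducing Δf by a factor of four (that is, quadrupling T) lowers the mean square of the cross-correlation by very nearly the same factor, while the average intensity of Eq. (7.3) stays put.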
Interpretation
One may think of the radiation emanating from the two point sources P and P′ in
terms of two finite-duration wave packets (see Figure 7.1(c)). However, since the
wave packets do not have a random relative phase it is impossible to get their
cross-correlation to vanish. Nonetheless, we can assume that the packets are
separated in time by an interval much longer than their individual widths and also
much longer than any time delay that might occur in a system under consider-
ation. In other words, as far as first-order coherence is concerned, an extended
incoherent source emitting continuous radiation from its various points is
equivalent to an identical source that emits relatively short bursts of light sepa-
rated by long intervals. In this model of an incoherent, quasi-monochromatic,
extended source each point emits only one pulse, no two points emit overlapping
pulses, and all pulses from the various source locations have the same duration
and shape.
As an example, consider an imaging system where a quasi-monochromatic
spatially incoherent light source illuminates a sample, of which an image is
formed on a photographic plate. One may imagine the individual points of the
source as being independent coherent point sources, each creating a coherent
image of the sample on the photographic plate. Because different points
radiate at different times, there will be no interference among the various
images. The photographic plate duly records the intensity pattern produced by
each point source, automatically adding these images together as they arrive
sequentially. The final image is thus the sum of the intensity distributions of
all the coherent images produced by the various point sources.
Double pinhole interference

Figure 7.2 shows a quasi-monochromatic point source P, illuminating a screen
located at z = 0. The screen is pierced with two small pinholes, which are
separated by a distance d along the X-axis; their interference fringes are
observed on the ξη-plane at z = z0. We assume that P is far enough away to
yield equal intensities at the pinholes. Also the path-length difference Δℓ from
P to the pinholes must be short compared to the coherence length of the
source, in order to give the pinhole radiations a high degree of temporal
coherence. The relative phase between the fields at the pinholes, however, is
not negligible and is given by Δφ = 2πΔℓ/λ. This phase difference causes a
translation of the fringe pattern along the ξ-axis. In the neighborhood of the
[Figure 7.2: schematic showing the point source P, the double-pinhole screen (separation d along X), and the observation screen (ξ-axis) at distance z0 along Z.]
Figure 7.2 A pair of pinholes in the XY-plane at z = 0 is illuminated by a
relatively distant point source at P. The resulting interference fringes are
observed at the ξη-plane located at z = z0.
origin at the observation plane the intensity distribution is given by
\[ I(\xi) = \alpha I(P)\{1 + \cos[(2\pi/\lambda)(d/z_0)\xi + \Delta\phi]\}. \tag{7.6a} \]
Here α is an inconsequential proportionality constant and I(P) is the intensity of
the point source at P. Note that the fringe periodicity is independent of the
location of the source P; it is determined solely by the values of d, z0, and λ. The
shift of the fringe pattern along the ξ-axis, however, is a function of Δφ, which
does depend on the location of the source.
For future reference we rewrite Eq. (7.6a) using complex notation as
follows:
\[ I(\xi) = \alpha I(P) + \alpha\,\mathrm{Re}\{ I(P)\exp(i\Delta\phi)\exp[i2\pi d\xi/(\lambda z_0)] \}. \tag{7.6b} \]
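As a small worked illustration of Eq. (7.6a), the sketch below evaluates the fringe period λz0/d and the lateral shift produced by a given Δφ. Lengths are expressed in units of the wavelength; the sample values d = 1250λ and z0 = 10^6 λ are borrowed from the example later in this chapter, and the function names are illustrative.

```python
import math

def fringe_intensity(xi, d, z0, delta_phi, wavelength=1.0, alpha=1.0, i_source=1.0):
    """Eq. (7.6a); all lengths in the same units as `wavelength`."""
    phase = (2.0 * math.pi / wavelength) * (d / z0) * xi + delta_phi
    return alpha * i_source * (1.0 + math.cos(phase))

def fringe_period(d, z0, wavelength=1.0):
    """Spacing of adjacent bright fringes along the xi-axis."""
    return wavelength * z0 / d

def fringe_shift(d, z0, delta_phi, wavelength=1.0):
    """Position of the bright fringe that sits at xi = 0 when delta_phi = 0."""
    return -delta_phi * wavelength * z0 / (2.0 * math.pi * d)

d, z0 = 1250.0, 1.0e6
print("period:", fringe_period(d, z0))
print("shift for delta_phi = pi/2:", fringe_shift(d, z0, math.pi / 2))
```

The period depends only on d, z0, and λ, whereas the shift is proportional to Δφ: a phase difference of π/2 moves the pattern by a quarter of a period.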
The van Cittert–Zernike theorem

This theorem, which was first discovered by van Cittert5 and later in a simpler
form by Zernike,6 relates the intensity distribution of an extended, quasi-
monochromatic, planar source to the degree of spatial coherence observed on a
parallel plane located at a relatively large distance from the source. Figure 7.3
shows a spatially incoherent, quasi-monochromatic source of wavelength λ in
[Figure 7.3: schematic showing the source plane with a source point (x, y, −zs), the pinhole plane with points (x1, y1, 0) and (x2, y2, 0) at distance zs from the source, and the observation plane (ξ, η axes) at distance z0.]
Figure 7.3 An extended, quasi-monochromatic, spatially incoherent, planar
source is placed in the X′Y′-plane at z = −zs. The light from each point (x, y, −zs)
of this source reaches the points (x1, y1, 0) and (x2, y2, 0) on the XY-plane at z = 0.
A pair of pinholes placed at the latter locations produces a fringe pattern in the
ξη observation plane at z = z0. The superposition of all intensity distributions
thus produced by the various points of the source yields the final intensity pattern
at the observation plane.
the X′Y′-plane at z = −zs. The distance zs between the source and the XY-plane
at z = 0, on which we seek to determine the degree of coherence, is large
enough that all the simplifying assumptions invoked in the previous sections
still apply. We wish to determine the first-order coherence properties of the
light that reaches the XY-plane at z = 0. We select two points (x1, y1) and (x2, y2)
on this plane and assume that two pinholes are placed at these points. The light
reaching the pinholes from a point source at (x, y, −zs) will have nearly the
same amplitude but different phase. The phase difference at the pinholes is
given by
\[ \Delta\phi = 2\pi\Delta\ell/\lambda \approx \frac{2\pi}{\lambda z_s}\left\{ \tfrac{1}{2}(x_1^2 + y_1^2) - \tfrac{1}{2}(x_2^2 + y_2^2) - [(x_1 - x_2)x + (y_1 - y_2)y] \right\}. \tag{7.7} \]
Consider what happens when all the point sources are active. They all act
independently, each creating its own fringe pattern at the observation screen.
All fringes thus produced will have the same period but different strengths and
are shifted by different amounts along the ξ-axis. Because the point sources are
completely incoherent, their overlapping fringe patterns must simply be added
together. In other words, the final intensity distribution is the sum of Eq. (7.6)
over all points P. We assume α to be the same for all the point sources. The
fringe period λz0/d is also the same. Therefore, the sum of Eq. (7.6b) over all
point sources may be written as follows:
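\[ I(\xi) = \alpha\int_{\mathrm{source}} I(x,y)\,dx\,dy + \alpha\,\mathrm{Re}\Big\{ \int_{\mathrm{source}} I(x,y)\exp[i\Delta\phi(x,y)]\,dx\,dy\;\exp[i2\pi d\xi/(\lambda z_0)] \Big\}. \tag{7.8} \]
To simplify the notation we define the following parameters:
\[ I_0 = \alpha\int_{\mathrm{source}} I(x,y)\,dx\,dy, \tag{7.9a} \]
\[ \hat{I}(x,y) = I(x,y)\Big/\int_{\mathrm{source}} I(x,y)\,dx\,dy, \tag{7.9b} \]
\[ \gamma(x_1,y_1,x_2,y_2) = \int_{\mathrm{source}} \hat{I}(x,y)\exp[i\Delta\phi(x,y)]\,dx\,dy. \tag{7.9c} \]
Equation (7.8) may then be rewritten as
\[ I(\xi) = I_0\big(1 + \mathrm{Re}\{\gamma(x_1,y_1,x_2,y_2)\exp[i2\pi d\xi/(\lambda z_0)]\}\big). \tag{7.10} \]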
A comparison of Eqs. (7.10) and (7.6) reveals that the fringe contrast produced
by the pinholes at (x1, y1) and (x2, y2) is equal to |γ| and that the phase of γ
determines the shift of these fringes from the center. The function γ is thus
described as the complex degree of spatial coherence between (x1, y1) and
(x2, y2). Substituting expression (7.7) for Δφ in Eq. (7.9c) yields
\[ \gamma(x_1,y_1,x_2,y_2) = \exp\{ i\pi[(x_1^2 + y_1^2) - (x_2^2 + y_2^2)]/(\lambda z_s) \} \times \int_{\mathrm{source}} \hat{I}(x,y)\exp\{ -i2\pi[(x_1 - x_2)x + (y_1 - y_2)y]/(\lambda z_s) \}\,dx\,dy. \tag{7.11} \]
Equation (7.11) is a compact statement of the van Cittert–Zernike theorem:

aside from a phase factor, the complex degree of spatial coherence is the
Fourier transform of the (normalized) intensity distribution at the incoherent
source.
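For a source made of discrete, mutually incoherent points, the integral in Eq. (7.9c) becomes a weighted sum. The following sketch evaluates that sum with the phase of Eq. (7.7); the list-of-tuples representation of the source and the function name are assumptions made for illustration.

```python
import cmath
import math

def degree_of_coherence(points, p1, p2, zs, wavelength=1.0):
    """Discrete form of Eq. (7.9c).

    points: [(x, y, intensity), ...] in the source plane
    p1, p2: (x1, y1) and (x2, y2), the pinhole positions
    """
    (x1, y1), (x2, y2) = p1, p2
    total = sum(w for _, _, w in points)  # normalization of Eq. (7.9b)
    gamma = 0j
    for x, y, w in points:
        # Phase difference of Eq. (7.7) for this source point.
        dphi = (2.0 * math.pi / (wavelength * zs)) * (
            0.5 * (x1 * x1 + y1 * y1) - 0.5 * (x2 * x2 + y2 * y2)
            - ((x1 - x2) * x + (y1 - y2) * y))
        gamma += (w / total) * cmath.exp(1j * dphi)
    return gamma

# Two equal point sources at x = +1000 and x = -1000 (units of wavelength),
# pinholes at x = +1250 and x = -1250, source distance 1e7 wavelengths.
two_points = [(1000.0, 0.0, 1.0), (-1000.0, 0.0, 1.0)]
print(abs(degree_of_coherence(two_points, (1250.0, 0.0), (-1250.0, 0.0), 1.0e7)))
```

A single point source always gives |γ| = 1, whereas the two-point source in this usage example produces fringe patterns displaced by half a period, which cancel.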
Example
Consider the uniform, quasi-monochromatic, incoherent source depicted in
Figure 7.4(a). The source’s central wavelength is λ, and its linear dimensions
are 3250λ on each side. A square array of 13 × 13 independent point sources on
a rectangular mesh (with spacing 250λ) is used to simulate this source. A pair of
pinholes in an otherwise opaque screen is located at zs = 10^7 λ from the source.
The square pinholes shown in Figure 7.4(b) are each of side-length 350λ
and separated by a distance d along the X-axis. The light from the source,
having gone through the pinholes, arrives at the observation plane located at
z0 = 10^6 λ.
Figure 7.5 shows the computed fringe patterns at the observation plane for four
different values of d. Note that with increasing d the fringe period decreases. The
fringe contrast also declines at first, going to zero when d = 3333λ. Subsequently,
however, the contrast increases as d continues to increase. Whereas in frames (a)
and (b) the central fringe is bright, in frame (d) corresponding to d = 4000λ the
central fringe becomes dark. This is equivalent to a half-period shift of the pattern
upon crossing the point of zero contrast.
Figure 7.6 shows cross-sections of the fringe patterns of Figure 7.5. The
contrast calculated from these plots can be shown to be in good agreement with
the values predicted by the van Cittert–Zernike theorem.
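The prediction of Eq. (7.11) for this geometry can be tabulated with a few lines of code. For a uniform square source of width W and pinholes placed symmetrically at x = ±d/2, the Fourier integral reduces to sin(πu)/(πu) with u = Wd/(λzs). The sketch below leaves W as a parameter, since the 13 × 13 mesh can be assigned either its nominal width of 3250λ or the 3000λ span between its outermost points; which of the two is appropriate is an assumption left to the reader, and the function names are illustrative.

```python
import math

def sinc(u):
    """sin(pi*u)/(pi*u), with the limiting value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def predicted_gamma(d, source_width, zs, wavelength=1.0):
    """Eq. (7.11) for a uniform square source and pinholes at x = +d/2, -d/2."""
    return sinc(source_width * d / (wavelength * zs))

ZS = 1.0e7  # source-to-pinhole distance in wavelengths, from the text
for width in (3250.0, 3000.0):
    for d in (1250.0, 2500.0, 3333.0, 4000.0):
        g = predicted_gamma(d, width, ZS)
        fringe = "bright" if g > 0 else "dark"
        print(f"W = {width:.0f}, d = {d:.0f}: contrast {abs(g):.3f}, "
              f"central fringe {fringe}")
```

A negative value of γ corresponds to a dark central fringe, which is the half-period shift described above.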
[Figure 7.4: panels (a) and (b); horizontal axes x/λ from −3000 to 3000.]
Figure 7.4 (a) Intensity distribution over the surface area of a uniform,
quasi-monochromatic, incoherent source. The linear dimensions of the source
are 3250λ along each side, where λ is the wavelength of its radiation. (b) A
pair of square pinholes each measuring 350λ along each side. The
center-to-center spacing d between the pinholes is an adjustable parameter of the
simulations.
[Figure 7.5: panels (a)–(d); horizontal axes x/λ from −1500 to 1500.]
Figure 7.5 Computed intensity distributions in the vicinity of the optical axis
at the observation plane of Figure 7.3 for the source and pinholes of Figure 7.4.
The distance between the source and the plane of the pinholes is zs = 10^7 λ, while
the distance between the pinholes and the observation screen is z0 = 10^6 λ. Each
frame corresponds to a different spacing d between the pinholes: (a) d = 1250λ;
(b) d = 2500λ; (c) d = 3333λ; (d) d = 4000λ.
[Figure 7.6: panels (a) d = 1250λ, (b) d = 2500λ, (c) d = 3333λ, (d) d = 4000λ; normalized intensity (0 to 1) versus x/λ.]
Figure 7.6 Cross-sectional view of the intensity distributions of Figure 7.5.
(Note: the scale of the horizontal axis is different for each plot.) The fringe
contrast, which is 0.75 in (a), drops to 0.3 in (b), and is effectively zero in (c).
As the pinhole separation increases further, the contrast climbs once again to
0.15 in (d). Whereas the central fringe in (a) and (b) is bright, it is dark in (d).

1980.
4 R. Loudon, The Quantum Theory of Light, second edition, Clarendon Press, Oxford,
1992.
5 P. H. van Cittert, Physica 1, 201 (1934).
6 F. Zernike, Physica 5, 785 (1938).
8
Partial polarization, Stokes parameters,
and the Poincaré sphere
A strictly monochromatic plane wave is fully polarized; to obtain partial

polarization one must consider a superposition of two or more plane waves of
differing wavelengths. A collimated beam of light is considered to be fully
polarized if a quarter-wave plate followed by an ideal polarizer can be used to
extinguish the beam. Failure at extinction reveals the beam as either fully or
partially unpolarized.
In the classical literature it is customary to analyze the degree of polarization
of a beam of light in terms of the cross-correlation function between two orth-
ogonal components of the beam’s E-field.1,2,3 It is somewhat easier, however, to
carry out the same calculations in the frequency domain and so to derive the
relevant parameters as integrals over the frequency spectrum of the beam. One
advantage of the latter approach is that it applies to beams of arbitrary bandwidth,
thus removing from the results the restriction to quasi-monochromaticity.
Another advantage is that it avoids the use of mutual coherence, which, at times,
tends to confuse discussion of the subject. In the following sections we show how
a frequency-domain analysis leads to a compact expression for the degree of
polarization of a polychromatic beam of light in terms of its Stokes parameters.
Orthogonal polarization components

Consider a polychromatic beam of light propagating along the Z-axis and
possessing polarization components in both the X- and the Y-direction:
\[ E_x(z,t) = \sum_n A_n (\Delta f)^{1/2} \cos[2\pi f_n (t - z/c) + \phi_n], \tag{8.1a} \]
\[ E_y(z,t) = \sum_n B_n (\Delta f)^{1/2} \cos[2\pi f_n (t - z/c) + \psi_n]. \tag{8.1b} \]
George Gabriel Stokes Jules Henri Poincaré

Sir George Gabriel Stokes (1819–1903). Irish-born mathematical physicist; he
spent most of his adult life at Cambridge, where he held the Lucasian chair for
over half a century. He was an intimate friend of Lord Kelvin and James Clerk
Maxwell. Stokes’ most important researches were concerned with hydrodynamics,
optics, and geodesy. In optics he was mainly responsible for the explanation of
fluorescence, and made significant contributions to the theory of diffraction. He
was generous in sharing his ideas with colleagues and students and readily gave
credit to others when there were any priority disputes. A few days after his death,
The Times of London wrote in an obituary that “Sir G. Stokes was remarkable . . .
for his freedom from all personal ambitions and petty jealousies.” (Photo: courtesy
of AIP Emilio Segré Visual Archives, E. Scott Barr Collection.)
Jules Henri Poincaré (1854–1912), received his doctorate in mathematics

from the University of Paris in 1879, and was appointed, in 1886, to the chair
of mathematical physics at the Sorbonne and to a chair at the Ecole Poly-
technique. Having made significant contributions to many aspects of math-
ematics, physics, and philosophy, Poincaré is often described as one of the
great geniuses of all time and as the last universalist in mathematics. In applied
mathematics he studied optics, electricity, telegraphy, capillarity, elasticity,
thermodynamics, potential theory, the theory of relativity, and cosmology. His
studies of the three-body problem in celestial mechanics mark the beginning of
modern chaos theory. He is acknowledged as a co-discoverer, with Albert
Einstein and Hendrik Lorentz, of the special theory of relativity. (Photo: Percy
Bridgman Collection, courtesy of AIP Emilio Segré Visual Archives.)
This beam’s spectrum consists of individual frequencies fn = (N0 + n)Δf within a
finite bandwidth around the central frequency f0 = N0Δf, as depicted in Figure 8.1.
The term associated with fn along the X-axis has amplitude An and phase φn; the
corresponding term along the Y-axis has amplitude Bn and phase ψn; c is the
[Figure 8.1: amplitude (0 to 1) versus frequency (5.0 to 5.6, in units of 10^14 Hz).]
Figure 8.1 Distribution of the amplitudes of a polychromatic beam in the
frequency range (5.4054–5.5146) × 10^14 Hz. The arrowheads represent the
sampled values of the amplitude spectrum at intervals of Δf = 3.64 × 10^11 Hz.
The central frequency f0 = 1500Δf = 5.46 × 10^14 Hz corresponds to the wavelength
λ = 549.5 nm; the upper and lower bounds of the spectrum are at λ = 544 nm and
555 nm, respectively. The sampled amplitudes are representative of An and/or Bn, in
Eqs. (8.1). Associated with each sample is a corresponding phase φn or ψn (not
shown).
speed of light in vacuum, and the constant multiplier (Δf)^(1/2) is for normalization
purposes only, its significance becoming clear as the discussion proceeds.
As described by Eqs. (8.1), the contribution to the beam of each frequency
term fn is a fully polarized plane wave. For this plane wave, which is elliptically
polarized in general, one can determine the ellipticity and the orientation of the
ellipse of polarization in terms of An, Bn, and φn − ψn. However, the
superposition of different frequency terms, each having a different state of polarization,
results in partially polarized light.
Ideal phase-retarder and polarizer; transmitted power

To determine the degree of polarization of the beam described by Eqs. (8.1), we
assume a perfect retarder and a perfect polarizer placed in the path of the beam,
orthogonal to the propagation direction, as in Figure 8.2. The variable retarder
induces a phase delay χ between Ex and Ey, the value of χ being adjustable in the
range ±180°. It is imperative for the following analysis that χ be independent of
optical frequency within the bandwidth of interest; in other words, for the beam
described by Eqs. (8.1) the phase delay χ must be the same for all fn contained in
[Figure 8.2: schematic showing the incident beam, the retarder, and the polarizer, with the X-axis and the polarizer angle θ indicated.]
Figure 8.2 A polychromatic beam of light propagating along the Z-axis is sent
through a variable retarder and a polarizer. The retarder’s fast and slow axes are
fixed along the X- and Y- directions, but its phase shift χ as well as the polarizer’s
orientation angle θ may be adjusted to minimize the amount of light that is
transmitted through the system. χ must be the same for all the wavelengths
contained in the incident beam.
the spectrum. (A retarder based on total internal reflection provides a good
approximation to an ideal retarder; this is the same principle of operation as used
in a Fresnel rhomb.1) Such a retarder can modify the shape of the cross-correlation
function between Ex and Ey, but, contrary to what has been asserted in the
literature, it cannot destroy the mutual coherence between these components of
polarization. The confusion is perhaps rooted in the fact that a time delay between Ex
and Ey can destroy the mutual coherence, whereas a frequency-independent phase
shift χ leaves the mutual coherence essentially intact.
The beam is subsequently passed through the polarizer, which may be rotated
around the Z-axis until the transmitted optical power is minimized. The retardation
χ is then adjusted and the orientation θ of the polarizer is changed accordingly until
the transmitted power reaches its absolute minimum. The optimum retardation χ0
together with the optimum orientation θ0 of the polarizer thus obtained determine
the state of polarization of that fraction of the beam which is fully polarized. The
minimum transmitted power is a measure of the unpolarized content of the beam.
In the system of Figure 8.2 the amplitude of the light emerging from the
polarizer is
\[ E(z=0,t) = \sum_n [A_n \cos(2\pi f_n t + \phi_n)\cos\theta + B_n \cos(2\pi f_n t + \psi_n + \chi)\sin\theta]\,(\Delta f)^{1/2}. \tag{8.2} \]
Because all frequencies fn in Eq. (8.2) are integer multiples of Δf, namely,
fn = (N0 + n)Δf, the transmitted amplitude E(z = 0, t) is a periodic function of
time, with period T = 1/Δf. The time-averaged transmitted intensity as a
function of χ and θ is thus given by
\[ I(\chi,\theta) = \frac{1}{T}\int_0^T E^2(z=0,t)\,dt = \frac{1}{2}\sum_n [A_n^2\cos^2\theta + B_n^2\sin^2\theta + A_n B_n \sin(2\theta)\cos(\phi_n - \psi_n - \chi)]\,\Delta f. \tag{8.3} \]
The presence of Δf in the above expression allows a smooth transition from the
discrete sum to a continuous integral in the limit Δf → 0; this, of course, is the
same limit in which T → ∞.
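The step from Eq. (8.2) to Eq. (8.3) can be verified numerically. The sketch below averages E² over one period for a small, randomly generated spectrum and compares the result with the closed form. The values N0 = 20, seven spectral lines, and Δf = 1 (arbitrary units) are assumptions chosen to keep the example fast; they are not the parameters of Figure 8.1.

```python
import math
import random

DF = 1.0           # ASSUMED frequency spacing (arbitrary units), so T = 1
N0 = 20            # ASSUMED central harmonic number
N_VALUES = range(-3, 4)

rng = random.Random(5)
# One (A_n, B_n, phi_n, psi_n) tuple per spectral line, chosen at random.
LINES = [(rng.random(), rng.random(),
          rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.0, 2.0 * math.pi))
         for _ in N_VALUES]

def intensity_formula(chi, theta):
    """Right-hand side of Eq. (8.3)."""
    return 0.5 * DF * sum(
        a * a * math.cos(theta) ** 2 + b * b * math.sin(theta) ** 2
        + a * b * math.sin(2.0 * theta) * math.cos(phi - psi - chi)
        for a, b, phi, psi in LINES)

def intensity_time_average(chi, theta, samples=2048):
    """Mean of E(z=0, t)^2 from Eq. (8.2), sampled uniformly over one period."""
    period = 1.0 / DF
    total = 0.0
    for k in range(samples):
        t = k * period / samples
        e = math.sqrt(DF) * sum(
            a * math.cos(2.0 * math.pi * (N0 + n) * DF * t + phi) * math.cos(theta)
            + b * math.cos(2.0 * math.pi * (N0 + n) * DF * t + psi + chi) * math.sin(theta)
            for n, (a, b, phi, psi) in zip(N_VALUES, LINES))
        total += e * e
    return total / samples

print(intensity_formula(0.4, 1.1), intensity_time_average(0.4, 1.1))
```

The two numbers agree to rounding error for any choice of χ and θ, because cross terms between different frequencies average to zero over a full period.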
Stokes parameters
To streamline the calculation of the values of χ and θ that minimize I(χ, θ), we
follow Sir George Gabriel Stokes (1819–1903) in defining the four parameters
that now bear his name:4
\[ S_0 = \tfrac{1}{2}\sum_n (A_n^2 + B_n^2)\,\Delta f, \tag{8.4a} \]
\[ S_1 = \tfrac{1}{2}\sum_n (A_n^2 - B_n^2)\,\Delta f, \tag{8.4b} \]
\[ S_2 = \sum_n A_n B_n \cos(\phi_n - \psi_n)\,\Delta f, \tag{8.4c} \]
\[ S_3 = \sum_n A_n B_n \sin(\phi_n - \psi_n)\,\Delta f. \tag{8.4d} \]
To minimize the transmitted intensity in Eq. (8.3) we first set the derivative of
I(χ, θ) with respect to χ equal to zero. This yields χ0, independently of the
value of θ, as follows:
\[ \chi_0 = \arctan(S_3/S_2). \tag{8.5a} \]
Substituting χ0 for χ in Eq. (8.3) and differentiating with respect to θ, we find the
optimum θ0 as
\[ \theta_0 = \tfrac{1}{2}\arctan[(S_2/S_1)\cos\chi_0 + (S_3/S_1)\sin\chi_0]. \tag{8.5b} \]
The transmitted intensity thus turns out to have a minimum at (χ0, θ0) and a
maximum at (χ0, θ0 + 90°), or vice versa. These values are given by
\[ I_{\min} = \tfrac{1}{2}S_0 - \tfrac{1}{2}(S_1^2 + S_2^2 + S_3^2)^{1/2}, \tag{8.6a} \]
\[ I_{\max} = \tfrac{1}{2}S_0 + \tfrac{1}{2}(S_1^2 + S_2^2 + S_3^2)^{1/2}. \tag{8.6b} \]
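The minimization just described can be sketched numerically. Expanding the cosine in Eq. (8.3) and using Eqs. (8.4) regroups the intensity as I = ½[S0 + S1 cos 2θ + (S2 cos χ + S3 sin χ) sin 2θ]; this regrouping, the use of atan2 to pick a branch of the arctangents, and the randomly generated spectrum are all choices made here for illustration.

```python
import math
import random

def stokes_parameters(lines, df=1.0):
    """Eqs. (8.4); lines = [(A_n, B_n, phi_n, psi_n), ...]."""
    s0 = 0.5 * df * sum(a * a + b * b for a, b, _, _ in lines)
    s1 = 0.5 * df * sum(a * a - b * b for a, b, _, _ in lines)
    s2 = df * sum(a * b * math.cos(phi - psi) for a, b, phi, psi in lines)
    s3 = df * sum(a * b * math.sin(phi - psi) for a, b, phi, psi in lines)
    return s0, s1, s2, s3

def transmitted_intensity(s, chi, theta):
    """Eq. (8.3) regrouped in terms of the Stokes parameters."""
    s0, s1, s2, s3 = s
    return 0.5 * (s0 + s1 * math.cos(2.0 * theta)
                  + (s2 * math.cos(chi) + s3 * math.sin(chi)) * math.sin(2.0 * theta))

def optimum_settings(s):
    """Eqs. (8.5a) and (8.5b), with atan2 selecting one branch (a choice made here)."""
    _, s1, s2, s3 = s
    chi0 = math.atan2(s3, s2)
    theta0 = 0.5 * math.atan2(s2 * math.cos(chi0) + s3 * math.sin(chi0), s1)
    return chi0, theta0

rng = random.Random(3)
lines = [(rng.random(), rng.random(),
          rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.0, 2.0 * math.pi))
         for _ in range(9)]
s = stokes_parameters(lines)
chi0, theta0 = optimum_settings(s)
extremes = sorted([transmitted_intensity(s, chi0, theta0),
                   transmitted_intensity(s, chi0, theta0 + math.pi / 2.0)])
root = math.sqrt(s[1] ** 2 + s[2] ** 2 + s[3] ** 2)
print("I_min, I_max from the optimum settings:", extremes)
print("I_min, I_max from Eqs. (8.6):", 0.5 * (s[0] - root), 0.5 * (s[0] + root))
```

With the atan2 branch used here the setting (χ0, θ0) happens to give the maximum and (χ0, θ0 + 90°) the minimum, which is the "vice versa" case mentioned in the text.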
Degree of polarization
The minimum transmitted intensity Imin in Eq. (8.6a), being that part of the beam
which cannot be extinguished with a retarder and a polarizer, represents the
depolarized content of the beam. This, of course, is only half the total amount of
depolarized light, because the same amount must also be contained in Imax. The total
amount of depolarized light, therefore, is 2Imin, while the remaining part, Imax − Imin,
is fully polarized. The degree of polarization P of the beam may thus be defined as
\[ P = (I_{\max} - I_{\min})/(I_{\max} + I_{\min}) = [(S_1/S_0)^2 + (S_2/S_0)^2 + (S_3/S_0)^2]^{1/2}. \tag{8.7} \]
Using the Schwarz inequality,5 it is not difficult to show that S1^2 + S2^2 + S3^2 ≤ S0^2;
consequently, 0 ≤ P ≤ 1. (See Note 1 at the end of the chapter.)
One may question the generality of the above result because, in deriving it, the fast
and slow axes of the wave-plate were fixed along the X- and Y- axes. In other words,
one wonders if the result would have been different had the axes of the wave-plate
been allowed to rotate around the Z-axis. The result can be shown to be quite general,
however, because P of Eq. (8.7) remains invariant under a rotation of the XY-plane
around Z. The value of S0, being the total power of the beam, obviously remains the
same for arbitrary orientations of the coordinate system. Moreover, with some
elementary algebra, the quantity S1^2 + S2^2 + S3^2 may also be shown to be invariant
under coordinate rotation. (See Note 2 at the end of the chapter.)
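The invariance of P under a rotation of the coordinate axes can be checked with a short numerical experiment. The sketch below represents each spectral line by the complex phasors An exp(iφn) and Bn exp(iψn), which is an equivalent bookkeeping device adopted here rather than the notation of the text, rotates the axes about Z, and recomputes the degree of polarization of Eq. (8.7).

```python
import cmath
import math
import random

def stokes_from_phasors(ex, ey, df=1.0):
    """Eqs. (8.4) with ex[n] = A_n exp(i phi_n) and ey[n] = B_n exp(i psi_n)."""
    s0 = 0.5 * df * sum(abs(a) ** 2 + abs(b) ** 2 for a, b in zip(ex, ey))
    s1 = 0.5 * df * sum(abs(a) ** 2 - abs(b) ** 2 for a, b in zip(ex, ey))
    s2 = df * sum((a * b.conjugate()).real for a, b in zip(ex, ey))
    s3 = df * sum((a * b.conjugate()).imag for a, b in zip(ex, ey))
    return s0, s1, s2, s3

def rotate_axes(ex, ey, angle):
    """Field components in a coordinate system rotated by `angle` about Z."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(ex, ey)],
            [-s * a + c * b for a, b in zip(ex, ey)])

def degree_of_polarization(stokes):
    """Eq. (8.7)."""
    s0, s1, s2, s3 = stokes
    return math.sqrt(s1 * s1 + s2 * s2 + s3 * s3) / s0

rng = random.Random(4)
ex = [cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * math.pi)) for _ in range(9)]
ey = [cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * math.pi)) for _ in range(9)]

p_original = degree_of_polarization(stokes_from_phasors(ex, ey))
p_rotated = degree_of_polarization(stokes_from_phasors(*rotate_axes(ex, ey, 0.7)))
print(p_original, p_rotated)
```

The rotation mixes S1 and S2 while leaving S0 and S3 unchanged, so the two printed values coincide; a beam with a single spectral line gives P = 1.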
In retrospect the variable retarder of Figure 8.2 could have been replaced by an
achromatic quarter-wave plate (e.g., a Fresnel rhomb) in a rotary mount. The axes
of the quarter-wave plate could then be made to coincide with the axes of the
ellipse of polarization in order to linearize that part of the beam which is fully
polarized. This is precisely what the variable retarder accomplishes in that it
adjusts the retardation $\chi$ while maintaining a fixed orientation in the XY-plane.
The Poincaré sphere

In general, the fraction of the beam that is fully polarized has elliptical polarization,
with ellipticity $\eta$ and orientation angle $\rho$ (this is the angle between the
major axis of the ellipse and the X-axis). These parameters may be readily
expressed in terms of the Stokes parameters:
$$\sin(2\eta) = S_3/(S_1^2 + S_2^2 + S_3^2)^{1/2}, \tag{8.8a}$$
$$\tan(2\rho) = S_2/S_1. \tag{8.8b}$$
Figure 8.3 The Poincaré sphere is the locus of all points S with coordinates
(x, y, z) = ($S_1$, $S_2$, $S_3$). The radius of the sphere is $PS_0$, and the latitude and
longitude of S specify the ellipticity $\eta$ and orientation angle $\rho$ of the polarized
component of the beam.
Using the above relations, the French mathematical physicist Henri Poincaré
(1854–1912) represented the state of polarization as a point S on the surface of a
sphere, as shown in Figure 8.3. In this representation the three Cartesian
coordinates of S are S1, S2, and S3. Thus, according to Eq. (8.7), the radius of the
Poincaré sphere is PS0, the power of that fraction of the beam which is fully
polarized. The latitude of S is twice the ellipticity $\eta$ of the polarized component,
in accordance with Eq. (8.8a), while the longitude of S represents twice the
orientation angle $\rho$ of the major axis of the ellipse of polarization, as prescribed
by Eq. (8.8b).
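The mapping from Stokes parameters to a point on the Poincaré sphere can be sketched as follows. The function name is hypothetical, and the use of `atan2` is an assumption made here to resolve the quadrant that $\tan(2\rho) = S_2/S_1$ alone leaves open; the text itself only states Eqs. (8.8a) and (8.8b).

```python
import math

def poincare_angles(s1, s2, s3):
    """Ellipticity eta and orientation rho (degrees), plus the sphere radius P*S0."""
    radius = math.sqrt(s1**2 + s2**2 + s3**2)   # P*S0, radius of the Poincare sphere
    eta = 0.5 * math.asin(s3 / radius)          # Eq. (8.8a): latitude is 2*eta
    rho = 0.5 * math.atan2(s2, s1)              # Eq. (8.8b): longitude is 2*rho (quadrant via atan2)
    return math.degrees(eta), math.degrees(rho), radius

# Linear polarization at 45 degrees to the X-axis: S1 = 0, S2 > 0, S3 = 0.
eta_lin, rho_lin, _ = poincare_angles(0.0, 1.0, 0.0)
# Circular polarization: only S3 is non-zero, so the point sits at a pole.
eta_circ, rho_circ, _ = poincare_angles(0.0, 0.0, 1.0)
print(eta_lin, rho_lin, eta_circ, rho_circ)
```

Linear states lie on the equator ($\eta = 0$) and circular states at the poles ($\eta = \pm 45^\circ$), consistent with the latitude being $2\eta$.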
Unpolarized light
A completely unpolarized beam of light cannot be altered by the wave-plate and
polarizer of Figure 8.2. No matter what the phase shift $\chi$ of the retarder and the
orientation $\theta$ of the polarizer may be, the output power will be one-half the input
power. For this light $S_0$ will be the total power of the beam, but $S_1 = S_2 = S_3 = 0$.
Since $S_1 = 0$, the relation $\sum A_n^2\,\Delta f = \sum B_n^2\,\Delta f$ implies that the power along the
X-axis equals that along the Y-axis. For natural light, where the polarization
components along the X- and Y- axes are independent of each other, the
relative phase angles $\phi_n - \psi_n$ are uniformly distributed over $(0, 2\pi)$ and tend to
be a random function of $n$. Hence, in the limit $\Delta f \to 0$, the Stokes parameters $S_2$
and $S_3$ approach zero as well. However, there exist other combinations of $\phi_n$
and $\psi_n$ that yield totally unpolarized light. For example, a superposition of
two equal-magnitude beams of frequencies $f_1$ and $f_2$, where one beam is right-
and the other left-circularly polarized, can be readily shown to be fully
unpolarized.
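The two-frequency example can be checked numerically. The sketch below assumes the conventional discrete-spectrum Stokes definitions, $S_0 = \sum (A_n^2 + B_n^2)\Delta f$, $S_1 = \sum (A_n^2 - B_n^2)\Delta f$, $S_2 = 2\sum A_n B_n \cos(\phi_n - \psi_n)\Delta f$ and $S_3 = 2\sum A_n B_n \sin(\phi_n - \psi_n)\Delta f$; these are written out here as an assumption because the defining equations are not reproduced in this section, and a different overall normalization would not change the conclusion. Which sign of the phase difference is called "right" circular is likewise a convention.

```python
import math

def stokes(components, df=1.0):
    """components: list of (A_n, phi_n, B_n, psi_n), one tuple per frequency sample."""
    s0 = sum((a * a + b * b) * df for a, _, b, _ in components)
    s1 = sum((a * a - b * b) * df for a, _, b, _ in components)
    s2 = sum(2 * a * b * math.cos(phi - psi) * df for a, phi, b, psi in components)
    s3 = sum(2 * a * b * math.sin(phi - psi) * df for a, phi, b, psi in components)
    return s0, s1, s2, s3

# Frequency f1: circular polarization with phi - psi = +90 degrees.
# Frequency f2: the opposite handedness, phi - psi = -90 degrees.
beams = [(1.0, math.pi / 2, 1.0, 0.0), (1.0, -math.pi / 2, 1.0, 0.0)]
s0, s1, s2, s3 = stokes(beams)
print(s0, s1, s2, s3)   # S1, S2, S3 vanish to rounding error while S0 does not
```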
Partial depolarization by a glass slab upon reflection or transmission
Figure 8.4 shows a glass slab 100 μm thick and of refractive index n = 1.5, upon
which a linearly polarized beam is incident at an oblique angle $\gamma = 75^\circ$. The
incident beam has equal amounts of p- and s-polarization with equal phase,
giving its linear polarization a 45° angle relative to both p- and s-directions. The
spectral content of the beam is that depicted in Figure 8.1. Upon reflection from
the slab the computed amplitudes of the p- and s-components of the beam as
functions of $\lambda$ are depicted in Figure 8.5(a). Multiple reflections at the two facets
of the slab interfere with each other to produce the fine structure seen in the
spectra of Figure 8.5(a). The phase angles of the reflected p- and s- components
are shown in Figure 8.5(b), and the resulting polarization rotation angle $\rho$ and
ellipticity $\eta$ appear in Figure 8.5(c). The knowledge of these quantities allows
one to compute the Stokes parameters from Eqs. (8.4), yielding $S_1/S_0 = 0.495$,
$S_2/S_0 = 0.844$, $S_3/S_0 = 0.89 \times 10^{-6}$. Thus the degree of polarization of the
reflected beam is P = 0.978, the wave-plate’s required phase shift $\chi_0$ is very
small, 0.00006°, and the polarizer’s angle for minimum transmission must be
set to $\theta_0 = 29.8^\circ$. It is seen that the polarized content of the reflected beam is
essentially linear ($\eta = 0.000026^\circ$) and is oriented at $\rho = 60.2^\circ$ relative to the
p-direction.

Figure 8.4 A polychromatic plane wave is incident on a glass slab 100 μm
thick at $\gamma = 75^\circ$. The index of refraction of the glass, n = 1.5, is independent of
the wavelength. The incident beam is linearly polarized at 45° to the plane of
incidence, that is, it has equal amounts of p- and s-polarization. The reflected and
transmitted beams are slightly depolarized.

Figure 8.5 A polychromatic plane wave, having the spectrum of Figure 8.1
and a linear polarization at 45° to the plane of incidence, is reflected from a
glass slab at $\gamma = 75^\circ$ (see Figure 8.4). Shown as functions of $\lambda$ (546 nm to 554 nm):
(a) the reflected amplitudes $|r_p|$ (broken line) and $|r_s|$ (solid line); (b) the phase
angles of $r_p$ (broken line) and $r_s$ (solid line); (c) the reflected polarization state,
defined by the rotation angle $\rho$ and ellipticity $\eta$. For the reflected beam the computed
degree of polarization is P = 0.978, the polarized component is essentially
linear ($\eta = 0.000026^\circ$), and the polarization vector makes an angle $\rho = 60.2^\circ$
with the p-direction.
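The fine structure caused by multiple reflections can be reproduced, at least in magnitude, with the standard Airy summation for a plane-parallel slab. The sketch below is an illustration under stated assumptions, not necessarily the method used to produce Figure 8.5: the slab is taken to be surrounded by air, the Fresnel coefficients follow one common sign convention (so only the magnitudes should be compared with the figure), and the 546 nm to 554 nm window is read off the figure's horizontal axis.

```python
import cmath
import math

def slab_reflection(wavelength_um, n=1.5, d_um=100.0, gamma_deg=75.0):
    """Amplitude reflection coefficients (r_p, r_s) of a slab in air, all reflections included."""
    g = math.radians(gamma_deg)
    cos_i = math.cos(g)
    cos_t = math.sqrt(1.0 - (math.sin(g) / n) ** 2)       # Snell's law inside the glass
    # Single-interface Fresnel coefficients (sign convention is an assumption)
    r_s = (cos_i - n * cos_t) / (cos_i + n * cos_t)
    r_p = (n * cos_i - cos_t) / (n * cos_i + cos_t)
    # Round-trip phase factor accumulated inside the slab
    phase = cmath.exp(2j * (2 * math.pi * n * d_um * cos_t / wavelength_um))

    def airy(r):
        # Sum of all multiply reflected beams; the back facet reflects with coefficient -r
        return r * (1 - phase) / (1 - r * r * phase)

    return airy(r_p), airy(r_s)

wavelengths = [0.546 + 1e-5 * k for k in range(801)]       # 546-554 nm in 0.01 nm steps
mags = [tuple(abs(r) for r in slab_reflection(lam)) for lam in wavelengths]
max_rp = max(m[0] for m in mags)
max_rs = max(m[1] for m in mags)
print(f"max |r_p| = {max_rp:.3f}, max |r_s| = {max_rs:.3f}")
```

The oscillation of both magnitudes between zero and their maxima, with a period of roughly $\lambda^2/(2 n d \cos\gamma')$ (where $\gamma'$ is the refraction angle inside the glass), is what makes the reflected polarization state wavelength dependent and hence partially depolarized.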
Similar results may be obtained for the beam transmitted through the slab.
The corresponding amplitudes and phases are shown in Figure 8.6, and the Stokes
parameters are found to be $S_1/S_0 = 0.306$, $S_2/S_0 = 0.907$, $S_3/S_0 = 0.7 \times 10^{-6}$.
Thus the degree of polarization is P = 0.957, the wave-plate’s required phase
shift is $\chi_0 = 0.000047^\circ$, and the polarizer’s angle for minimum transmission is
$\theta_0 = 54.3^\circ$. Therefore, the polarized content of the transmitted beam, oriented at
$\rho = 35.7^\circ$ relative to the p-direction, is essentially linear ($\eta = 0.000022^\circ$).

Figure 8.6 The counterpart of Figure 8.5 for the case of transmission through the
glass slab 100 μm thick: (a) transmitted amplitudes $|t_p|$ and $|t_s|$; (b) phase angles
of $t_p$ and $t_s$; (c) polarization rotation angle and ellipticity, all versus $\lambda$. The
computed degree of polarization of the transmitted beam is P = 0.957, the polarized
component is essentially linear ($\eta = 0.000022^\circ$), and the polarization vector makes
an angle $\rho = 35.7^\circ$ with the p-direction.
Partial depolarization upon transmission through a birefringent slab
Figure 8.7 shows the characteristics of a polychromatic beam of light upon
transmission through a birefringent slab of calcite. The thickness of the slab
is 85 μm and its ordinary and extraordinary refractive indices are $n_o = 1.6613$
and $n_e = 1.488$. The normally incident beam, which is linearly polarized at 45°
to the crystal axes, has the spectrum of Figure 8.1. The computed Stokes
parameters of the transmitted beam are: $S_1/S_0 = 0.023$, $S_2/S_0 = 0.259$,
$S_3/S_0 = 0.902$. Thus the degree of polarization of the transmitted beam is P = 0.939,
the wave-plate’s required phase shift is $\chi_0 = 74^\circ$, and the polarizer’s angle
of minimum transmission is $\theta_0 = 44.3^\circ$. The polarized fraction of the transmitted
beam is, therefore, elliptical with $\eta = 37^\circ$ and $\rho = 47.6^\circ$, relative to the
p-direction.

Figure 8.7 The amplitude, phase, and polarization state of a polychromatic
beam, having the spectrum of Figure 8.1, upon transmission through a calcite
slab 85 μm thick ($n_o = 1.6613$, $n_e = 1.488$). The normally incident plane wave is
linearly polarized at 45° to the crystal axes. (a) Transmitted amplitudes of the
p-component (broken line) and s-component (solid line) versus $\lambda$. (b) Phase angles of
the p-component (broken line) and s-component (solid line) versus $\lambda$. (c) Polarization
rotation angle $\rho$ and ellipticity $\eta$ of the transmitted beam versus $\lambda$.
The computed degree of polarization upon transmission is P = 0.939, and
the rotation angle and ellipticity of the polarized fraction of the beam are
$\rho = 47.6^\circ$, $\eta = 37^\circ$.
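A rough feel for why the calcite slab depolarizes the beam comes from the wavelength dependence of its retardation. The sketch below evaluates the phase difference $2\pi(n_o - n_e)d/\lambda$ between the two crystal-axis components at the ends of the 546 nm to 554 nm window shown on the horizontal axes of Figure 8.7; treating that window as the extent of the spectrum, and neglecting dispersion of the indices and multiple reflections, are assumptions made here for illustration.

```python
N_O, N_E = 1.6613, 1.488      # ordinary and extraordinary indices quoted in the text
THICKNESS_UM = 85.0           # slab thickness in micrometers

def retardation_deg(wavelength_nm):
    """Phase difference between the two crystal-axis components, in degrees."""
    waves = (N_O - N_E) * THICKNESS_UM * 1000.0 / wavelength_nm   # retardation in wavelengths
    return 360.0 * waves

spread = retardation_deg(546.0) - retardation_deg(554.0)
print(f"retardation at 550 nm: {retardation_deg(550.0) / 360.0:.2f} waves")
print(f"retardation spread over 546-554 nm: {spread:.0f} degrees")
```

The retardation amounts to several tens of wavelengths and changes by a sizeable fraction of a full cycle across the window, so different spectral components emerge with different ellipticities; this is the origin of the reduced degree of polarization.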
Note 1
The Schwarz inequality,5 which concerns the integral of the product of two
complex functions of the real variable x, is written as follows:
$$\left|\int f(x)\,g^*(x)\,dx\right|^2 \le \int |f(x)|^2\,dx \int |g(x)|^2\,dx.$$
Defining the complex vectors A and B as
$$A = [A_1\exp(i\phi_1),\; A_2\exp(i\phi_2),\; \ldots,\; A_N\exp(i\phi_N)],$$
$$B = [B_1\exp(i\psi_1),\; B_2\exp(i\psi_2),\; \ldots,\; B_N\exp(i\psi_N)],$$
we find $S_2^2 + S_3^2 = |S_2 + iS_3|^2 = |AB^{*T}|^2 \le \|A\|^2\|B\|^2 = S_0^2 - S_1^2$, establishing the
desired inequality.
Note 2
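The 2 × 2 matrix
$$M = \begin{pmatrix} A \\ B \end{pmatrix}\begin{pmatrix} A^{*T} & B^{*T} \end{pmatrix}$$
has the following properties:
$$\tfrac{1}{2}\,\mathrm{Trace}\,M = S_0,$$
$$\tfrac{1}{4}\,\mathrm{Trace}^2 M - \mathrm{Det}\,M = S_1^2 + S_2^2 + S_3^2.$$
Upon rotating the XY-plane through an angle $\zeta$, the vector $\begin{pmatrix} A \\ B \end{pmatrix}$ will be multiplied
on the left by the rotation matrix
$$\begin{pmatrix} \cos\zeta & \sin\zeta \\ -\sin\zeta & \cos\zeta \end{pmatrix}.$$
However, under this unitary transformation both the trace and the determinant of
M remain unchanged. Therefore, the beam’s total power $S_0$ and the power of its
polarized component $(S_1^2 + S_2^2 + S_3^2)^{1/2}$ are rotation invariant.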
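The invariance argument of Note 2 is easy to verify numerically. The sketch below builds the 2 × 2 matrix M from two random complex vectors (the random data, the seed, the vector length and the rotation angle are all arbitrary choices made for illustration) and compares the two invariants before and after a rotation of the XY-plane.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, illustration only
N = 64                                    # hypothetical number of frequency samples
A = rng.normal(size=N) + 1j * rng.normal(size=N)
B = rng.normal(size=N) + 1j * rng.normal(size=N)

def invariants(A, B):
    """Return (S0, S1^2 + S2^2 + S3^2) from the trace and determinant of M."""
    V = np.vstack([A, B])                 # 2 x N matrix whose rows are A and B
    M = V @ V.conj().T                    # the 2 x 2 Hermitian matrix of Note 2
    tr, det = np.trace(M).real, np.linalg.det(M).real
    return 0.5 * tr, 0.25 * tr**2 - det

zeta = np.radians(37.0)                   # arbitrary rotation angle
R = np.array([[np.cos(zeta), np.sin(zeta)],
              [-np.sin(zeta), np.cos(zeta)]])
A_rot, B_rot = R @ np.vstack([A, B])      # rotated field components
before, after = invariants(A, B), invariants(A_rot, B_rot)
print(before, after)
```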

1 M. Born and E. Wolf, Principles of Optics, sixth edition, Pergamon Press, 1983.
4 G. G. Stokes, Trans. Camb. Phil. Soc. 9, 399 (1852). Reprinted in his Mathematical
and Physical Papers, Vol. III, p. 233, Cambridge University Press, 1901.
5 Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill,
New York, 1984.
9
Second-order coherence and the Hanbury
Brown–Twiss experiment
Introduction
The degree of first-order temporal coherence, a function denoted by $g^{(1)}(\tau)$,
provides information about the coherence length and the power spectral density
of a light source. However, without additional information, $g^{(1)}(\tau)$ has no
bearing on intensity fluctuations and higher-order statistics of the emitted light.
A quasi-monochromatic laser beam and the beam of light from an incandescent
light bulb, provided that the latter is properly filtered to match the spectral line-
shape of the former, will have identical degrees of first-order coherence. Any
interferometric experiment involving the splitting and superposition of ampli-
tudes would yield identical results for the laser beam and the (properly filtered)
thermal light. Therefore, on the basis of such experiments alone, there is no way
to distinguish the two light sources. It turns out, however, that the intensity
fluctuations of laser light are fundamentally different from those of thermal
light. The two sources can, therefore, be distinguished based on their second-
order coherence properties.1,2,3
An ideal photodetector produces an electrical signal proportional to the “cycle-
averaged intensity” of the E-field (or B-field) of the light beam at the location of
the detector. Assuming that the electrical bandwidth of the detector (including all
associated circuitry) is greater than the bandwidth of the incident light wave by at
least a factor of 2, the output of the detector should accurately represent the
intensity fluctuations of the light beam as a function of time. At a given point in
space, the light beam’s degree of second-order coherence, $g^{(2)}(\tau)$, may be defined
in terms of the autocorrelation function of the output electric signal from such a
detector. In the next section we derive a general expression for $g^{(2)}(\tau)$, discuss the
fundamental difference between laser light and thermal light as manifested by
this degree of second-order coherence, and obtain a relationship between $g^{(2)}(\tau)$
and $g^{(1)}(\tau)$ for the case of chaotic (e.g., thermal) light.
Historically, the first experiments involving the correlations of intensity
fluctuations of light beams were conducted by Robert Hanbury Brown and
Richard Q. Twiss in the 1950s.4,5,6,7,8 The primary goal of these experiments
was to determine the diameters of astronomical objects using the correlations of
their optical (i.e., visible light) emissions as picked up by a pair of Earth-based
photo-multiplier tubes with an adjustable separation. In the third section we
discuss the case of two distant point sources having a small angular separation,
monitored by a pair of ideal photo-detectors. This simple example, which
contains the essence of the Hanbury Brown–Twiss experiment, shows that the
angular separation between distant point sources can, in principle, be extracted
from the cross-correlation of the output signals obtained from two photo-
detectors with a variable separation distance.
Intensity fluctuations and the degree of second-order coherence

At a fixed point in space, we consider the narrow-band function $a(t)$, which describes
the light amplitude as a function of time, having a discrete spectrum consisting of
$2M + 1$ frequencies that range from $(N_0 - M)\Delta f$ to $(N_0 + M)\Delta f$, as follows:
$$a(t) = \sum_{n=N_0-M}^{N_0+M} A_n\sqrt{\Delta f}\,\cos(2\pi n\Delta f\,t + \phi_n). \tag{9.1}$$
The frequency component $n\Delta f$ has amplitude $A_n\sqrt{\Delta f}$ and phase $\phi_n$, with $\sqrt{\Delta f}$
being introduced here to allow a smooth transition to the continuum limit later on.
The discrete nature of the spectrum makes the field amplitude $a(t)$ periodic, with
period $T = 1/\Delta f$. As we add more frequency components to the spectrum to fill
the gaps between adjacent frequencies, $\Delta f$ goes to zero, $T$ goes to infinity (as does
the number of discrete frequencies, $2M + 1$), and the spectral density approaches
a continuous limit denoted by the complex function $\hat{A}(f) = A(f)\exp[i\phi(f)]$.
The instantaneous intensity of the field given by Eq. (9.1) may be written as
follows:
$$a^2(t) = \tfrac{1}{2}\sum_{n}\sum_{n'} A_n A_{n'}\,\Delta f\,\{\cos[2\pi(n - n')\Delta f\,t + \phi_n - \phi_{n'}] + \cos[2\pi(n + n')\Delta f\,t + \phi_n + \phi_{n'}]\}. \tag{9.2}$$
The second term in the above expression has high frequencies, $(n + n')\Delta f$, which can
be removed by a low-pass filter. (Such low-pass filtering is inherent to all commonly
available photo-detectors.) The low-frequency terms survive the filtering; upon
rearranging the double sum in Eq. (9.2), the filtered function $a^2(t)$, known as the
cycle-averaged intensity $I(t)$, becomes:
$$I(t) = \tfrac{1}{2}\sum_{n} A_n^2\,\Delta f + \sum_{m=1}^{2M}\;\sum_{n=N_0-M}^{N_0+M-m} A_n A_{n+m}\,\Delta f\,\cos(2\pi m\Delta f\,t + \phi_{n+m} - \phi_n). \tag{9.3}$$
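The above equation may be written in a compact form:
$$I(t) = I_0 + \sum_{m=1}^{2M} |\hat{I}_m|\cos(2\pi m\Delta f\,t + \psi_m), \tag{9.4a}$$
where
$$I_0 = \tfrac{1}{2}\sum_{n} A_n^2\,\Delta f, \tag{9.4b}$$
$$\hat{I}_m = |\hat{I}_m|\exp(i\psi_m) = \sum_{n=N_0-M}^{N_0+M-m} A_n A_{n+m}\exp[i(\phi_{n+m} - \phi_n)]\,\Delta f = \sum_{n=N_0-M}^{N_0+M-m} \hat{A}_n^{*}\hat{A}_{n+m}\,\Delta f. \tag{9.4c}$$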
Equation (9.4a) is the Fourier series representation of the time-dependent (filtered)
intensity $I(t)$, consisting of the constant term $I_0$, which is half the area under
the power spectral density function $|\hat{A}(f)|^2$, and oscillatory terms having
frequency $m\Delta f$ and (complex) magnitude $\hat{I}_m$, obtained from the autocorrelation of
the (complex) spectrum $\hat{A}(f)$. In the limit $\Delta f \to 0$, we have
$$I_0 = \tfrac{1}{2}\int_0^{\infty} |\hat{A}(f)|^2\,df, \tag{9.5a}$$
$$\hat{I}(f) = |\hat{I}(f)|\exp[i\psi(f)] = \int_0^{\infty} \hat{A}(f')\,\hat{A}^{*}(f' - f)\,df'. \tag{9.5b}$$
Next, we exploit the fact that $I(t)$ given by Eq. (9.4) is a periodic function of time
with period $T = 1/\Delta f$, and define the autocorrelation of the cycle-averaged
intensity distribution by averaging over one period $T$, as follows:
$$\langle I(t)I(t+\tau)\rangle = \frac{1}{T}\int_0^{T} I(t)I(t+\tau)\,dt = I_0^2 + \tfrac{1}{2}\sum_{m=1}^{2M} |\hat{I}_m|^2\cos(2\pi m\Delta f\,\tau). \tag{9.6}$$
Normalization of the intensity autocorrelation function in Eq. (9.6) yields the
classical degree of second-order coherence as
$$g^{(2)}(\tau) = \frac{\langle I(t)I(t+\tau)\rangle}{\langle I(t)\rangle\langle I(t+\tau)\rangle} = 1 + \tfrac{1}{2}\sum_{m=1}^{2M} (|\hat{I}_m|/I_0)^2\cos(2\pi m\Delta f\,\tau). \tag{9.7}$$
Note that, in deriving the above expression for $g^{(2)}(\tau)$, no assumptions were
made about the statistical properties of either $a(t)$ or its Fourier transform
$A(f)\exp[i\phi(f)]$. In particular, there is no need for $a(t)$ to be stationary, although
ergodicity will be helpful, as the time-averages obtained over one particular
waveform will then be representative of the entire ensemble of such waveforms.
The classical degree of second-order coherence given by Eq. (9.7) has three
fundamental properties.
(i) $g^{(2)}(\tau)$ is an even function of $\tau$.
(ii) $g^{(2)}(\tau) \le g^{(2)}(0)$, that is, the maximum of $g^{(2)}(\tau)$ occurs at $\tau = 0$. This is a
consequence of the fact that all the cosines in Eq. (9.7) reach their peak values
simultaneously at $\tau = 0$.
(iii) The value of the function at $\tau = 0$ is greater than or equal to unity, namely, $g^{(2)}(0) \ge 1$,
because, inevitably, $\sum (|\hat{I}_m|/I_0)^2 \ge 0$.
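The chain of Eqs. (9.4b), (9.4c) and (9.7) translates almost line by line into code. The following Python sketch is one possible implementation; the function name and argument layout are hypothetical, and the explicit loop over m favors clarity over speed. It is exercised on the simplest non-trivial spectrum, two equal-amplitude frequencies, for which $g^{(2)}(\tau) = 1 + \tfrac{1}{2}\cos(2\pi\Delta f\,\tau)$.

```python
import numpy as np

def second_order_coherence(A_hat, tau, df=1.0):
    """g2(tau) from sampled complex spectral amplitudes, via Eqs. (9.4b), (9.4c), (9.7)."""
    A_hat = np.asarray(A_hat, dtype=complex)
    n_samples = A_hat.size                                 # 2M + 1 frequencies
    I0 = 0.5 * np.sum(np.abs(A_hat) ** 2) * df             # Eq. (9.4b)
    m = np.arange(1, n_samples)                            # m = 1 ... 2M
    # Eq. (9.4c): I_m = sum_n conj(A_n) * A_{n+m} * df
    I_m = np.array([np.sum(np.conj(A_hat[:-k]) * A_hat[k:]) for k in m]) * df
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    cosines = np.cos(2 * np.pi * np.outer(tau, m) * df)    # shape (len(tau), 2M)
    return 1.0 + 0.5 * cosines @ (np.abs(I_m) / I0) ** 2   # Eq. (9.7)

# Two equal-amplitude spectral lines (a simple beat note), tau in units of 1/df.
taus = np.array([-0.25, 0.0, 0.25, 0.5])
g2 = second_order_coherence([1.0, 1.0], taus)
print(g2)
```

The output illustrates the three properties listed above: the values at $\tau = \pm 0.25$ coincide, the maximum sits at $\tau = 0$, and that maximum is not less than unity.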
We now study two special cases in some detail.
Case 1: Chaotic light

The main feature of chaotic light is that the spectral phase $\phi_n$ associated with
the frequency $f_n = n\Delta f$ in Eq. (9.1) is not a well-behaved function of $n$. For any
particular sample $a(t)$ taken from the ensemble of all possible waveforms, the
spectral amplitude $A_n$ may have some simple frequency dependence, e.g.,
Gaussian or Lorentzian; however, $\phi_n$ will vary randomly from one frequency $f_n$
to the next. This randomness of individual phase values does not influence the
value of $I_0$, given by Eq. (9.4b), which is independent of $\{\phi_n\}$. It will, however,
have a significant effect on the autocorrelation function that yields the values of
$\hat{I}_m$ in Eq. (9.4c).
The numerical examples depicted in Figures 9.1 and 9.2 show, respectively,
the cases of Gaussian and Lorentzian spectral density functions. We began these
calculations by assuming a fixed profile for the spectral density $|\hat{A}(f)|$. We then
chose a small sampling interval $\Delta f$, assigned random phases $\phi_n$ to each sample,
and proceeded to compute the autocorrelation of $\hat{A}(f)$, from which the values of $I_0$
and $\hat{I}_m$ were determined; see Eqs. (9.4b) and (9.4c). Finally, we constructed the
degree of second-order coherence $g^{(2)}(\tau)$ in accordance with Eq. (9.7). Although
the numerical results in each case depended on the specific choice of the random
phase values, we found the computed $g^{(2)}(\tau)$ changed only in small and
insignificant ways with each choice of $\{\phi_n\}$, provided that the chosen $\Delta f$ was small
enough to properly sample the line-shape.

Figure 9.1 (a) Gaussian amplitude spectrum $|\hat{A}(f)|$ centered at $f = f_0$,
having FWHM equal to $w = 1000\Delta f$; the sampling interval is conveniently
chosen as $\Delta f = 1.0$ in arbitrary units. (b) Logarithmic plot of $|\hat{I}_m|/I_0$ computed
from the Gaussian frequency spectrum depicted in (a) with randomly
assigned phase values to each frequency $f_n$; $M = 2500$, $I_0 = 376.35$. (c) Plot
of $g^{(2)}(\tau)$ calculated from Eq. (9.7) with $\tau$ in units of $1/\Delta f$. (d) Close-up
of $g^{(2)}(\tau)$ showing its Gaussian central part having FWHM $= 4\ln 2/(\pi w) \approx 0.88/w$.
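The numerical procedure just described can be sketched in a few lines. The fragment below uses the parameters quoted in the caption of Figure 9.1 ($w = 1000\Delta f$, $M = 2500$, $\Delta f = 1$) and assumes the Gaussian amplitude profile $|\hat{A}(f)| = \exp[-4\ln 2\,(f - f_0)^2/w^2]$, which has the stated FWHM. The random seed and the number of phase realizations are arbitrary choices, and the plain loop over m is a direct evaluation of Eq. (9.4c), not necessarily the way the published figures were computed.

```python
import numpy as np

M, w, df = 2500, 1000.0, 1.0                      # values quoted in the caption of Figure 9.1
n = np.arange(-M, M + 1)                          # frequency index measured from f0
amplitude = np.exp(-4 * np.log(2) * (n * df / w) ** 2)   # Gaussian |A(f)| with FWHM w
I0 = 0.5 * np.sum(amplitude ** 2) * df            # Eq. (9.4b); independent of the phases

def g2_at_zero(rng):
    """g2(0) of Eq. (9.7) for one random draw of the spectral phases."""
    A_hat = amplitude * np.exp(1j * rng.uniform(0, 2 * np.pi, n.size))
    I_m = np.array([np.sum(np.conj(A_hat[:-k]) * A_hat[k:])
                    for k in range(1, n.size)]) * df       # Eq. (9.4c)
    return 1.0 + 0.5 * np.sum(np.abs(I_m) ** 2) / I0 ** 2

rng = np.random.default_rng(1)                    # arbitrary seed (assumption)
samples = [g2_at_zero(rng) for _ in range(8)]     # eight independent phase realizations
print(f"I0 = {I0:.2f}, mean g2(0) = {np.mean(samples):.2f}")
```

The value of $I_0$ does not depend on the random phases and should reproduce the figure quoted in the caption, whereas $g^{(2)}(0)$ scatters from one realization to the next around a value close to 2.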
Figure 9.2 (a) Lorentzian amplitude spectrum $|\hat{A}(f)|$ centered at $f = f_0$,
having FWHM equal to $w = 250\Delta f$; the sampling interval is conveniently
chosen as $\Delta f = 1.0$ in arbitrary units. (b) Logarithmic plot of $|\hat{I}_m|/I_0$ computed
from the Lorentzian frequency spectrum depicted in (a) with randomly
assigned phase values to each frequency $f_n$; $M = 5000$, $I_0 = 112.32$. (c) Plot
of $g^{(2)}(\tau)$ calculated from Eq. (9.7) with $\tau$ in units of $1/\Delta f$. (d) Close-up
of $g^{(2)}(\tau)$ showing its inverse Lorentzian central part having
FWHM $= \sqrt{3}\,\ln 2/(\pi w) \approx 0.38/w$.

Note that $g^{(2)}(\tau) - 1$ has a Gaussian form in Figure 9.1(d) and an inverse
Lorentzian form in Figure 9.2(d). This, of course, is not a coincidence and a
general relationship can be shown to exist between the line-shape of chaotic light
and the functional form of $g^{(2)}(\tau)$. Recalling that the degree of first-order
coherence is defined as
$$g^{(1)}(\tau) = \frac{\sum_n A_n^2\exp(i2\pi n\Delta f\,\tau)\,\Delta f}{\sum_n A_n^2\,\Delta f} \;\to\; \frac{\int_0^{\infty} |\hat{A}(f)|^2\exp(i2\pi f\tau)\,df}{\int_0^{\infty} |\hat{A}(f)|^2\,df}, \tag{9.8}$$
it is not difficult to show that, for chaotic light,
$$g^{(2)}(\tau) \approx 1 + |g^{(1)}(\tau)|^2. \tag{9.9}$$
To see this, note that $\hat{I}_m$ in Eq. (9.4c) is the sum of $(2M + 1 - m)$ complex numbers.
Therefore, $|\hat{I}_m|^2$ contains the sum of the squared moduli of these numbers plus many
cross terms. The cross terms, however, all have random phases and, when a large
number of such terms are added together, they tend to cancel out. What remains,
therefore, is mainly the sum of the squared moduli, namely,
$$|\hat{I}_m|^2 \approx \sum_{n=N_0-M}^{N_0+M-m} A_n^2 A_{n+m}^2\,(\Delta f)^2 \;\to\; \left[\int_0^{\infty} |\hat{A}(f')|^2\,|\hat{A}(f' - f)|^2\,df'\right]\Delta f. \tag{9.10}$$
Now, starting with the definition of $g^{(1)}(\tau)$ in Eq. (9.8), a straightforward
calculation yields
$$|g^{(1)}(\tau)|^2 = \frac{\displaystyle\sum_{n=N_0-M}^{N_0+M} A_n^4\,\Delta f\,\Delta f + 2\sum_{m=1}^{2M}\;\sum_{n=N_0-M}^{N_0+M-m} A_n^2 A_{n+m}^2\,\Delta f\,\cos(2\pi m\Delta f\,\tau)\,\Delta f}{\left(\sum_n A_n^2\,\Delta f\right)^2}. \tag{9.11}$$
In the limit when $\Delta f \to 0$ the first term in the numerator of Eq. (9.11) approaches
zero. As for the second term, the coefficient of $\cos(2\pi m\Delta f\,\tau)$ is the same as $|\hat{I}_m|^2$
given by Eq. (9.10). Also, with reference to Eq. (9.4b), the denominator is equal
to $4I_0^2$. It is thus clear that, in the case of chaotic light, Eq. (9.9) is a direct
consequence of Eq. (9.7).
Note that, for chaotic light, $g^{(2)}(0) = 2$, irrespective of the shape of the spectral
density function (i.e., the line-shape), simply because $g^{(1)}(0) = 1$ according to
Eq. (9.8). Also, if the linewidth approaches zero, $g^{(2)}(\tau)$ becomes very broad; in
the limit of zero linewidth, therefore, $g^{(2)}(\tau) = 2$ for all values of $\tau$. The chaotic
fluctuations of intensity are, therefore, intrinsic to this type of light and cannot be
removed by spectral filtering, no matter how narrow the filter’s linewidth may be.
This is the fundamental difference between the coherent light from a laser and the
chaotic light from a thermal source; whereas the classical degree of second-order
coherence for thermal light is equal to 2, that for monochromatic laser light (i.e.,
single longitudinal mode, narrowband) is always equal to unity, as shown in the
following subsection.
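As a worked example of Eq. (9.9), consider a Gaussian line-shape. The derivation below assumes the amplitude profile $|\hat{A}(f)| = \exp[-4\ln 2\,(f - f_0)^2/w^2]$ (FWHM equal to $w$) and a carrier frequency $f_0 \gg w$, so that the lower limit of the integrals in Eq. (9.8) may be extended to $-\infty$; under these assumptions it recovers the width $4\ln 2/(\pi w)$ of the central part of $g^{(2)}(\tau)$.

```latex
\begin{align*}
|\hat{A}(f)|^2 &= \exp\!\left[-\frac{8\ln 2\,(f - f_0)^2}{w^2}\right], \\
g^{(1)}(\tau) &= \exp(i 2\pi f_0 \tau)\,\exp\!\left[-\frac{\pi^2 w^2 \tau^2}{8\ln 2}\right], \\
g^{(2)}(\tau) - 1 &\approx |g^{(1)}(\tau)|^2 = \exp\!\left[-\frac{\pi^2 w^2 \tau^2}{4\ln 2}\right], \\
|g^{(1)}(\tau_{1/2})|^2 = \tfrac{1}{2}
  &\;\Longrightarrow\; \tau_{1/2} = \frac{2\ln 2}{\pi w},
  \qquad \mathrm{FWHM} = 2\tau_{1/2} = \frac{4\ln 2}{\pi w} \approx \frac{0.88}{w}.
\end{align*}
```

The narrower the line, the broader the bump in $g^{(2)}(\tau)$, while its height above unity at $\tau = 0$ stays equal to 1.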
Case 2: Coherent laser light

Under ideal circumstances, the amplitude of the light generated by a well-
stabilized, single-longitudinal-mode laser operating well above threshold exhibits
the following time dependence:
$$a(t) = a_0\cos[2\pi f_0 t + \chi(t)]. \tag{9.12}$$
Here $a_0$ is a constant amplitude, and $\chi(t)$ is a slowly varying function of time,
representing the familiar phenomenon of phase drift in well-stabilized lasers.
Note that the cycle-averaged intensity of this waveform, $\tfrac{1}{2}a_0^2$, is independent of the
time $t$. The waveform, nonetheless, has a finite bandwidth which may be obtained
by Fourier transforming its amplitude $a(t)$.
For $a_0 = 1.0$ and the particular choice of $\chi(t)$ shown in Figure 9.3(a), Figure 9.3(b)
shows a typical slice of $a(t)$ over a short time interval. Clearly, the slow variations
of $\chi(t)$ guarantee a stable amplitude for the waveform over its entire duration.

Figure 9.3 (a) Phase profile $\chi(t)$ over the time interval $t = 0$ to $t = T = 1.0$ in
units of $1/\Delta f$. Note that the range of variation of $\chi$ is $[-\pi : \pi]$. (b) Plot of the
function $a(t) = a_0\cos[2\pi f_0 t + \chi(t)]$ over the brief time interval [0.500, 0.525],
with $a_0 = 1.0$, $f_0 = 500\Delta f$ and $\chi(t)$ as shown in (a). (c), (d) Amplitude and phase
profiles of $\hat{A}(f)$, the Fourier transform of $a(t)$, obtained numerically over the
entire time interval [0, 1]. The function $\hat{A}(f)$ is truncated, with the values
covering the range $f_0 \pm 20\Delta f$ retained. (e) Plot of $|\hat{I}_m|/I_0$, computed from the truncated
$\hat{A}(f)$ using Eqs. (9.4b, 9.4c). (f) Computed plot of $g^{(2)}(\tau)$ obtained with the values
of $|\hat{I}_m|/I_0$ inserted into Eq. (9.7). Aside from minor fluctuations, caused by the
truncation of $\hat{A}(f)$, the degree of second-order coherence is equal to 1.0 for all
values of $\tau$.
Figures 9.3(c,d) display the computed amplitude and phase of $\hat{A}(f)$, the
Fourier transform of $a(t)$, obtained numerically over the time interval $[0, T]$; here
$T = 1/\Delta f$ is the inverse of the frequency domain sampling interval, which is
conveniently chosen as $\Delta f = 1.0$ in arbitrary units. As usual, the spectrum is
sampled at discrete frequencies, $f_n = f_0 + n\Delta f = (N_0 + n)\Delta f$, then truncated by
limiting its frequency content to the range $-M \le n \le M$.
In Figure 9.3, $f_0 = 500\Delta f$, and the truncated $\hat{A}(f)$ is confined to the frequency range
$f_0 \pm 20\Delta f$. The values of $|\hat{I}_m|/I_0$, computed from the truncated $\hat{A}(f)$ in accordance with
Eqs. (9.4b) and (9.4c), are shown in Figure 9.3(e), while a plot of $g^{(2)}(\tau)$, obtained
from Eq. (9.7) using these values of $|\hat{I}_m|/I_0$, appears in Figure 9.3(f). Aside from
minor fluctuations, caused by the truncation of $\hat{A}(f)$, note that the degree of
second-order coherence, $g^{(2)}(\tau)$, is essentially equal to 1.0 for all values of $\tau$.
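The same calculation can be reproduced in outline with a fast Fourier transform. In the sketch below the carrier index $f_0 = 500\Delta f$ and the truncation range $f_0 \pm 20\Delta f$ follow Figure 9.3, but the phase profile is a hypothetical smooth drift chosen for illustration (it is not the profile plotted in the figure), and the number of time samples is an arbitrary choice.

```python
import numpy as np

f0, M, a0 = 500, 20, 1.0                       # carrier index and truncation as in Figure 9.3 (df = 1)
N = 16384                                      # time samples over one period T = 1/df (assumption)
t = np.arange(N) / N
chi = 0.9 * np.pi * np.sin(2 * np.pi * t)      # hypothetical slow phase drift within [-pi, pi]
a = a0 * np.cos(2 * np.pi * f0 * t + chi)      # Eq. (9.12)

spectrum = np.fft.rfft(a) / N                  # Fourier coefficient at each frequency n*df, n >= 0
A_hat = 2 * spectrum[f0 - M : f0 + M + 1]      # A_n exp(i phi_n) of each cosine term, truncated

I0 = 0.5 * np.sum(np.abs(A_hat) ** 2)          # Eq. (9.4b)
I_m = np.array([np.sum(np.conj(A_hat[:-m]) * A_hat[m:])
                for m in range(1, 2 * M + 1)]) # Eq. (9.4c)
g2_zero = 1.0 + 0.5 * np.sum(np.abs(I_m) ** 2) / I0 ** 2   # Eq. (9.7) at tau = 0
print(f"I0 = {I0:.4f}, g2(0) = {g2_zero:.6f}")
```

For this smooth drift the truncated spectrum retains essentially all of the power, so $I_0$ comes out as $\tfrac{1}{2}a_0^2$ and the coefficients $\hat{I}_m$ nearly vanish, leaving $g^{(2)}(0)$ at unity to within numerical error.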
The Hanbury Brown–Twiss experiment

Although the original work of Hanbury Brown and Twiss was aimed at measuring
the diameter of Sirius and other astronomical objects, the essence of their idea can
be readily explained in terms of the procedure needed to measure the angular
distance between the constituents of a binary star. Figure 9.4 shows two
independent point sources $S_a$ and $S_b$, whose angular separation, when observed from
the (far away) plane of the detectors $D_1$ and $D_2$, is $\theta$. For simplicity of analysis,
the detectors are positioned such that they are equi-distant from $S_a$, but receive the
light from $S_b$ with a delay of $\tau_b = \theta L/c$, where $L$ is the distance between $D_1$ and
$D_2$, and $c$ is the speed of light in vacuum. Assuming the radiation from both
sources is narrowband and centered at the same frequency $f_0 = N_0\Delta f$, one may
write the light amplitudes $a_1(t)$ and $a_2(t)$ arriving at the two detectors as follows:
$$a_1(t) = \sum_{n=N_0-M}^{N_0+M} \{A_n\sqrt{\Delta f}\,\cos(2\pi n\Delta f\,t + \phi_n) + B_n\sqrt{\Delta f}\,\cos(2\pi n\Delta f\,t + \chi_n)\}, \tag{9.13a}$$
$$a_2(t) = \sum_{n=N_0-M}^{N_0+M} \{A_n\sqrt{\Delta f}\,\cos(2\pi n\Delta f\,t + \phi_n) + B_n\sqrt{\Delta f}\,\cos[2\pi n\Delta f\,(t + \tau_b) + \chi_n]\}. \tag{9.13b}$$

Figure 9.4 The light from two independent point sources, $S_a$, $S_b$, is detected by
the photo-detectors, $D_1$, $D_2$, located far away from the sources. The radiation from
both sources is narrowband and centered at the same frequency $f_0$. The ideal,
point-like detectors are separated from each other by an adjustable distance $L$ in the same
direction as $S_a$ is separated from $S_b$. Seen from the detectors’ plane, the angular
distance between $S_a$ and $S_b$ is $\theta$. Each detector produces an electrical signal
proportional to the cycle-averaged intensity of the corresponding incident light.
Here the frequency component $f_n = n\Delta f$ arriving from $S_a$ has the complex amplitude
$\hat{A}_n = A_n\sqrt{\Delta f}\,\exp(i\phi_n)$, while that from $S_b$ has the amplitude $\hat{B}_n = B_n\sqrt{\Delta f}\,\exp(i\chi_n)$.
The source $S_a$, being equi-distant from the two detectors, makes equal
contributions to $a_1(t)$ and $a_2(t)$, whereas the contributions of $S_b$ are shifted in time by
the relative delay $\tau_b$.
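Following the same steps that led from Eq. (9.1) to Eq. (9.4), we now
determine the filtered (i.e., cycle-averaged) intensities $I_1(t)$ and $I_2(t)$ observed at the
detectors of Figure 9.4. We find
$$I_1(t) = I_{10} + \sum_{m=1}^{2M} |\hat{I}_{1m}|\cos(2\pi m\Delta f\,t + \psi_{1m}), \tag{9.14a}$$
$$I_2(t) = I_{20} + \sum_{m=1}^{2M} |\hat{I}_{2m}|\cos(2\pi m\Delta f\,t + \psi_{2m}). \tag{9.14b}$$
In the above equations
$$I_{10} = \tfrac{1}{2}\sum_{n=N_0-M}^{N_0+M} [A_n^2 + B_n^2 + 2A_nB_n\cos(\chi_n - \phi_n)]\,\Delta f, \tag{9.14c}$$
$$\hat{I}_{1m} = |\hat{I}_{1m}|\exp(i\psi_{1m}) = \sum_{n=N_0-M}^{N_0+M-m} (\hat{A}_n^{*} + \hat{B}_n^{*})(\hat{A}_{n+m} + \hat{B}_{n+m})\,\Delta f, \tag{9.14d}$$
$$I_{20} = \tfrac{1}{2}\sum_{n=N_0-M}^{N_0+M} [A_n^2 + B_n^2 + 2A_nB_n\cos(2\pi n\Delta f\,\tau_b + \chi_n - \phi_n)]\,\Delta f, \tag{9.14e}$$
$$\hat{I}_{2m} = |\hat{I}_{2m}|\exp(i\psi_{2m}) = \sum_{n=N_0-M}^{N_0+M-m} [\hat{A}_n^{*} + \hat{B}_n^{*}\exp(-i2\pi n\Delta f\,\tau_b)]\{\hat{A}_{n+m} + \hat{B}_{n+m}\exp[i2\pi(n+m)\Delta f\,\tau_b]\}\,\Delta f. \tag{9.14f}$$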
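The normalized cross-correlation function between $I_1(t)$ and $I_2(t)$ is thus
given by
$$g_{12}^{(2)}(\tau) = \frac{\langle I_1(t)I_2(t+\tau)\rangle}{\langle I_1(t)\rangle\langle I_2(t)\rangle} = 1 + \tfrac{1}{2}\sum_{m=1}^{2M} \mathrm{Re}\left[(\hat{I}_{1m}^{*}/I_{10})(\hat{I}_{2m}/I_{20})\exp(i2\pi m\Delta f\,\tau)\right]. \tag{9.15}$$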
So far we have not made any assumptions about the nature of $S_a$ and $S_b$, beyond
the fact that they are distant point sources with narrowband spectra centered at the
same frequency $f_0 = N_0\Delta f$. Equations (9.13)–(9.15) are, therefore, valid for any
type of light, so long as the sampled spectral amplitude and phase profiles,
$\{(A_n, \phi_n)\}$ of $S_a$ and $\{(B_n, \chi_n)\}$ of $S_b$, are dense enough to provide proper representations
of the spectral density functions $\hat{A}(f)$ and $\hat{B}(f)$. If the light beams emerging from
the independent sources $S_a$ and $S_b$ happen to be chaotic (e.g., thermal), then the
phase angles $\{\phi_n\}$ and $\{\chi_n\}$ will be random and uncorrelated, thus leading to
$$I_{10} \approx I_{20} \approx \tfrac{1}{2}\sum_{n=N_0-M}^{N_0+M} (A_n^2 + B_n^2)\,\Delta f \;\to\; \tfrac{1}{2}\int_0^{\infty} \left[|\hat{A}(f)|^2 + |\hat{B}(f)|^2\right] df, \tag{9.16}$$
$$\hat{I}_{1m}^{*}\hat{I}_{2m} \approx \sum_{n=N_0-M}^{N_0+M-m} \{A_n^2A_{n+m}^2 + B_n^2B_{n+m}^2\exp(i2\pi m\Delta f\,\tau_b) + A_n^2B_{n+m}^2\exp[i2\pi(n+m)\Delta f\,\tau_b] + B_n^2A_{n+m}^2\exp(-i2\pi n\Delta f\,\tau_b)\}\,(\Delta f)^2. \tag{9.17}$$
Next we relate the various terms appearing in Eq. (9.17) to the first-order degrees
of coherence of $S_a$ and $S_b$, defined as follows:
$$g_a^{(1)}(\tau) = \frac{\sum A_n^2\exp(i2\pi n\Delta f\,\tau)\,\Delta f}{\sum A_n^2\,\Delta f}, \tag{9.18a}$$
$$g_b^{(1)}(\tau) = \frac{\sum B_n^2\exp(i2\pi n\Delta f\,\tau)\,\Delta f}{\sum B_n^2\,\Delta f}. \tag{9.18b}$$
Straightforward calculations similar to those that led to Eq. (9.11) may now be
used to determine the expanded forms of $|g_a^{(1)}(\tau)|^2$, $|g_b^{(1)}(\tau + \tau_b)|^2$, and
$g_a^{(1)}(\tau)\,g_b^{*(1)}(\tau + \tau_b)$, which turn out to contain the various terms that appear in Eq. (9.17).
One must then substitute for $(\hat{I}_{1m}^{*}/I_{10})(\hat{I}_{2m}/I_{20})$ in Eq. (9.15) from Eqs. (9.16) and
(9.17), and proceed to replace the resulting expressions with their equivalents in
terms of the aforementioned degrees of first-order coherence. This would lead,
without further approximations, to the following expression for the cross-
correlation function between $I_1(t)$ and $I_2(t)$ of Figure 9.4, in the special circumstance
that $S_a$ and $S_b$ are both chaotic and independent:
$$g_{12}^{(2)}(\tau) \approx 1 + \left|\frac{\left[\int_0^{\infty} |\hat{A}(f)|^2\,df\right] g_a^{(1)}(\tau) + \left[\int_0^{\infty} |\hat{B}(f)|^2\,df\right] g_b^{(1)}(\tau + \tau_b)}{\int_0^{\infty} |\hat{A}(f)|^2\,df + \int_0^{\infty} |\hat{B}(f)|^2\,df}\right|^2. \tag{9.19}$$
As an example, consider the case of two independent, but otherwise identical,
chaotic point sources with angular separation $\theta = 1.0$ μrad. Both sources have the
same Gaussian line-shape, centered at $f_0 = 10^{12}\Delta f$, with a FWHM line width
$w = 10^{3}\Delta f$. Figure 9.5 shows computed plots of $g_{12}^{(2)}(\tau)$ obtained for several values
of $L$ in accordance with Eq. (9.15). In the case of $L = 0$, depicted in Figure 9.5(a),
$\tau_b = 0$, and the cross-correlation function reduces to that of a single, equivalent
light source. This may be understood by setting $\tau_b = 0$ in Eq. (9.19) and verifying
its reduction to Eq. (9.9). In other words, since $g_a^{(1)}(\tau)$ and $g_b^{(1)}(\tau)$ are, respectively,
the Fourier transforms of the normalized power spectral density functions
$|\hat{A}(f)|^2$ and $|\hat{B}(f)|^2$, the weighted sum on the right-hand side of Eq. (9.19) is
equal to the degree of first-order coherence of a new source whose power spectral
density function is $|\hat{A}(f)|^2 + |\hat{B}(f)|^2$.
As L increases, the relative delay sb begins to affect the cross-correlation
function. This is seen in Figures 9.5(b)–(d), which correspond, respectively,
to L ¼ 75 m, 150 m, and 300 m. Once again, these results can be explained
(2)
with reference to Eq. (9.19), which expresses g12 (s) in terms of a super-
position of the (complex) degrees of first-order coherence, ga(1)(s) and
gb(1)(sþsb). Note that, according to Eq. (9.8), the envelope of g(1)(s), the
Fourier transform of the line-shape, is modulated by exp(i2p f0s), where f0 is
the central frequency of the light source. Thus, when sb is much less than the
width of the envelope, gb(1)(sþs b) essentially equals gb(1)(s)exp(i2p f0sb). At
L ¼ 150 m, for instance, 2p f0sb ¼ p, causing the terms containing ga(1) and gb(1)
in Eq. (9.19) to cancel out; see Figure 9.5(c). At L ¼ 300 m, however, the
phase shift is 2p, thus restoring the full height of the cross-correlation
function; see Figure 9.5(d).
If L is made extremely large, say L = 5 × 10^12 m, then ga^(1)(τ) and gb^(1)(τ + τb) no
longer overlap, leading to the cross-correlation function depicted in Figure 9.5(e).
The fact that, in the present example, the coefficients of ga^(1)(τ) and gb^(1)(τ + τb) in
Eq. (9.19) are both equal to ½ suffices to explain the reduction of the peak value
of g12^(2)(τ) from 2.0 in Figure 9.5(a) to 1.25 in Figure 9.5(e).
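The peak heights quoted in this example can be checked with a few lines of arithmetic. The following is only a sketch under stated assumptions: the relative delay is taken to be τb = Lθ/c (a small-angle guess at what Eq. (9.15) provides), the frequency unit is set to Δf = 1 Hz, the speed of light is rounded to 3 × 10^8 m/s, and the envelope of g^(1)(τ) is the Fourier transform of a Gaussian power spectrum of FWHM w. The function names are illustrative and do not come from the text.

```python
import cmath
import math

C = 3.0e8          # speed of light in m/s (rounded; an assumption)
DF = 1.0           # frequency unit, Delta f, in Hz (assumed, so that f0 = 1e12 Hz)
F0 = 1.0e12 * DF   # center frequency of the line
W = 1.0e3 * DF     # FWHM of the Gaussian power spectrum
THETA = 1.0e-6     # angular separation of the two sources, in radians


def envelope(tau):
    """Magnitude of g^(1)(tau) for a Gaussian power spectrum of FWHM W."""
    return math.exp(-(math.pi * W * tau) ** 2 / (4.0 * math.log(2.0)))


def g12_at_zero_delay(L):
    """Eq. (9.19) evaluated at tau = 0 with both weights equal to 1/2."""
    tau_b = L * THETA / C                    # assumed small-angle relative delay
    g_a = 1.0                                # g_a^(1)(0)
    g_b = envelope(tau_b) * cmath.exp(2j * math.pi * F0 * tau_b)
    return 1.0 + abs(0.5 * g_a + 0.5 * g_b) ** 2


for L in (0.0, 75.0, 150.0, 300.0, 5.0e12):
    print(f"L = {L:10.3g} m  ->  g12(0) = {g12_at_zero_delay(L):.3f}")
```

With these assumptions the value at zero delay is 2.0 for L = 0, drops to 1.0 at L = 150 m where the two terms cancel, returns to 2.0 at L = 300 m, and settles at 1.25 once the two envelopes no longer overlap.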
[Figure 9.5: five rows of plots, panels (a)–(e). In each row the left-hand plot shows
g12^(2)(τ) versus τ (in units of 1/Δf) over the range –0.1 to 0.1, and the right-hand
plot is a close-up of the central region. Panel labels: (a) L = 0, (b) L = 75 m,
(c) L = 150 m, (d) L = 300 m, (e) L = 5 × 10^12 m; θ = 1.0 μrad throughout. The
close-up in panel (a) marks a width of 4 ln 2/(πw).]

Figure 9.5 Computed plots of g12^(2)(τ) for a pair of chaotic point sources, Sa and
Sb, having angular separation θ = 1.0 μrad. Both sources have a Gaussian line-
shape centered at f0 = 10^12 Δf, with FWHM line width w = 10^3 Δf; in these cal-
culations M = 2500. The distance between the detectors is (a) L = 0, (b) L = 75 m,
(c) L = 150 m, (d) L = 300 m, and (e) L = 5 × 10^12 m. In each case the close-up of
the central part of g12^(2)(τ) is shown on the right-hand side.
Concluding remarks
Practical photodetectors may have a narrower bandwidth than is required for
producing an ideal cycle-averaged intensity I(t) in a given application. In other
words, the low-pass filtering mentioned in going from Eq. (9.2) to Eq. (9.3) could
influence the low-frequency terms that survive the filtering. The transfer function
of the detector (including all electronic circuitry) must, therefore, be included in
Eq. (9.3) and all subsequent equations. For practical determinations of intensity
fluctuations, of course, the effects of electronic filtering must be taken into
account. However, as far as the fundamental principles discussed in this chapter
are concerned, the consequences of such filtering are irrelevant, and the detection
circuit’s transfer function may safely be ignored.
Another practical concern revolves around the question of noise in photo-
detection. The output signal from a photodetector is, in general, accompanied by
several types of noise, such as shot noise, thermal noise, and the noise associated
with the photo-multiplication process. Accurate measurement of intensity cor-
relations and fluctuations requires a careful analysis of all relevant sources of
noise, elimination or minimization of undesirable signals, and collection of a
sufficient number of photons to ensure the adequacy of the available signal-to-
noise ratio. In this context it must also be mentioned that, when measuring the
intensity autocorrelation ⟨I(t) I(t + τ)⟩ at a fixed point in space, it is advantageous
to use a 50/50 beam-splitter in conjunction with two identical photodetectors, as
shown in Figure 9.6. Whereas the noise or other spurious signals from a single
detector could exhibit temporal correlations, a pair of well-isolated detectors is
unlikely to suffer from such complications. The splitting of the beam, of course,
will halve the signal strength at each detector, but, according to the classical
optical theory, it should not otherwise disturb the intensity fluctuations.

[Figure 9.6: schematic. Light from the source is divided by a 50/50 beam-splitter
between detectors D1 and D2, whose output signals are I1(t) and I2(t); a delay is
applied before the two signals are combined.]

Figure 9.6 The degree of second-order coherence g^(2)(τ) of a beam of light
may be determined by two identical photodetectors D1 and D2, placed sym-
metrically with respect to the output ports of a 50/50 beam-splitter. According to
the classical optical theory, the intensity fluctuations at the two detectors are
identical with those of the light arriving at the splitter. The use of two detectors
(instead of one) is thus dictated by the need to mitigate the temporal correlations
of the noise (or other spurious signals).
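The benefit of the two-detector arrangement of Figure 9.6 can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the intensity of the chaotic source is drawn from a negative-exponential distribution (for which g^(2)(0) = 2), each detector adds independent zero-mean Gaussian noise of a hypothetical strength, and only the zero-delay correlation is estimated.

```python
import random

random.seed(1)          # fixed seed so that the run is repeatable
N = 200_000             # number of intensity samples
NOISE_RMS = 0.5         # hypothetical detector noise, in units of the mean intensity

sum_1 = sum_2 = sum_11 = sum_12 = 0.0
for _ in range(N):
    intensity = random.expovariate(1.0)      # chaotic light: exponential statistics
    n1 = random.gauss(0.0, NOISE_RMS)        # noise of detector D1
    n2 = random.gauss(0.0, NOISE_RMS)        # independent noise of detector D2
    s1 = 0.5 * intensity + n1                # 50/50 split in the classical picture
    s2 = 0.5 * intensity + n2
    sum_1 += s1
    sum_2 += s2
    sum_11 += s1 * s1
    sum_12 += s1 * s2

mean_1, mean_2 = sum_1 / N, sum_2 / N
g2_single = (sum_11 / N) / (mean_1 * mean_1)   # one detector: biased by its own noise
g2_pair = (sum_12 / N) / (mean_1 * mean_2)     # two detectors: independent noise averages out
print(f"single detector: {g2_single:.3f}   detector pair: {g2_pair:.3f}")
```

In this toy model the single-detector estimate is inflated by the noise variance, whereas the cross-correlation of the two detectors stays close to the value 2 expected for chaotic light.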
A fundamental issue raised in the wake of the Hanbury Brown–Twiss experi-
ment concerned the quantum nature of light and its role in determining the
measured intensity fluctuations and correlations of the various types of radiation. In
particular, it was pointed out that a single photon leaving the source in Figure 9.6,
could be picked up by either D1 or D2, but not by both, whereas the classical theory
allowed the beam-splitter to divide the photon’s energy between the two receivers.
Attempts to answer this and many related questions eventually ushered in the
modern era of quantum optics.2,3 The results obtained in the present chapter for
classical sources of light have been found to retain their validity under a quantum
mechanical treatment.3 In the meantime, however, several types of non-classical
light have been discovered whose proper treatment requires the full machinery of
the quantum theory of radiation and detection. A striking example of quantum-
optical phenomena is anti-bunching, where the degree of second-order coherence
g^(2)(0) for certain non-classical sources is known to be below unity.3,4 In fact, the
entire range of values between 0.0 and 1.0 is accessible to g^(2)(0) in quantum optics.
This, of course, is an impossibility in the classical theory, where Eq. (9.7) dictates
that g^(2)(0) ≥ 1.
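The classical bound follows in one line if the degree of second-order coherence is taken in its usual normalized form, g^(2)(τ) = ⟨I(t) I(t + τ)⟩/⟨I(t)⟩². This form is assumed here, since Eq. (9.7) is not reproduced in this section. The excess of g^(2)(0) over unity is then the normalized variance of the intensity, which cannot be negative:

```latex
\[
g^{(2)}(0) - 1
  = \frac{\langle I^2(t) \rangle - \langle I(t) \rangle^2}{\langle I(t) \rangle^2}
  = \frac{\big\langle \left[ I(t) - \langle I(t) \rangle \right]^2 \big\rangle}{\langle I(t) \rangle^2}
  \;\ge\; 0 .
\]
```

Equality holds only for a beam whose classical intensity does not fluctuate at all.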

1 J. W. Goodman, Statistical Optics, Wiley, New York, 1985.
University Press, London, 1995.
3 R. Loudon, The Quantum Theory of Light, third edition, Clarendon Press, Oxford,
2000.
4 R. Hanbury Brown and R. Q. Twiss, Correlation between photons in two coherent
beams of light, Nature 177, 27–29 (1956).
5 R. Hanbury Brown and R. Q. Twiss, The question of correlation between photons in
coherent light rays, Nature 178, 1447–1448 (1956).
6 R. Hanbury Brown and R. Q. Twiss, Interferometry of the intensity fluctuations in
light. I. Basic theory: the correlation between photons in coherent beams of radiation,
Proc. Roy. Soc. London A, 242, 300–324 (1957).
7 R. Hanbury Brown and R. Q. Twiss, Interferometry of the intensity fluctuations in
light. II. An experimental test of the theory for partially coherent light, Proc. Roy. Soc.
London A, 243, 291–319 (1958).
8 R. Hanbury Brown, The Intensity Interferometer, Taylor and Francis, London, 1974.
10
What in the world are surface plasmons?†
Despite its scary name, a surface plasmon is simply an inhomogeneous plane-
wave solution to Maxwell’s equations. Typically, a medium with a large but
negative dielectric constant ε is a good host for surface plasmons. Because in an
isotropic medium having refractive index n and absorption coefficient κ we have
ε = (n + iκ)², whenever κ ≫ n the above criterion, large but negative ε, is
approximately satisfied; as a result, most common metals such as aluminum,
gold, and silver can exhibit resonant absorption by surface plasmon excitation. In
order to excite, within a metal, a plane wave that has a large enough amplitude to
carry away a significant fraction of the incident optical energy, one must create a
situation whereby the metal is “forced” to accept such a wave; otherwise, as
normally occurs, the wave within the metal ends up having a small amplitude,
causing nearly all of the incident energy to be reflected, diffracted, or scattered
from the metallic surface, depending upon the condition of that surface.
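The statement that κ ≫ n yields a large, negative ε can be made explicit by expanding the square. The numerical line below uses the aluminum values quoted later in this chapter (n + iκ = 1.38 + 7.6i at 633 nm).

```latex
\[
\varepsilon = (n + i\kappa)^2 = (n^2 - \kappa^2) + 2 i n \kappa
\;\approx\; -\kappa^2 \quad \text{when } \kappa \gg n ,
\]
\[
(1.38 + 7.6\,i)^2 = (1.90 - 57.76) + 2(1.38)(7.6)\,i \approx -55.86 + 20.98\,i .
\]
```

The real part is dominated by –κ², while the imaginary part, 2nκ, remains comparatively small as long as n is small.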
In this chapter several practical situations in which surface plasmons play
a role will be presented. We begin by describing the results of an experiment
that can be readily set up in any optics laboratory, and we give an explanation
of the observed phenomenon by scrutinizing the well-known Fresnel’s reflec-
tion formula at a metal-to-air interface. We then describe other, slightly more
complicated, situations involving the excitation of surface plasmons, in an
attempt to convey to the reader the generality of the phenomenon and its various
manifestations.
Surface plasmons in a thin metallic film

Perhaps the simplest arrangement in which one may observe surface plasmons
is that shown schematically in Figure 10.1. A thin metal film, coated on the flat
face of a glass hemisphere, is illuminated at oblique incidence through the
hemisphere. In this example the glass will be assumed to have refractive index
1.5, and the metal film will be assumed to be aluminum, although most common
metals coated on just about any type of glass will exhibit a similar behavior. A
plane monochromatic beam of red HeNe light is directed at the glass–metal
interface, and its reflection is monitored as a function of the angle of incidence θ.

[Figure 10.1: schematic showing the incident beam, the reflected beam, the glass
hemisphere, the angle of incidence θ, and the thin metal film.]

Figure 10.1 Schematic diagram showing a monochromatic plane wave inci-
dent on a thin metal film through a hemispherical glass substrate. When the film
is sufficiently thin, at a specific incidence angle θ a surface plasmon is excited
within the metal layer, causing a substantial fraction of the incident beam’s
energy to be absorbed and converted to heat within the metal layer.

† The coauthor of this chapter is Lifeng Li, now at the Tsinghua University in China.
Figure 10.2(a) shows computed plots of the reflection coefficients |rp| and |rs|
versus θ for the case of a very thin (d = 5 nm) aluminum film. At the critical angle
of total internal reflection (TIR) for a glass–air interface, θcrit = 41.8°, the reflection
coefficients show a sudden rise, but |rp| drops sharply above θcrit, attaining a
minimum at θ ≈ 45°. This sharp reduction in the reflectivity of p-polarized light is
due to the excitation of a surface plasmon in the aluminum layer.
Figure 10.2(b) shows plots of the magnitude of the Poynting vector S through
the thickness of the aluminum layer for both p- and s-polarized light at the
incidence angle θ = 45°. Note that for the s-light, approximately 30% of the
incident optical power enters the film, which then proceeds to be absorbed (rather
uniformly) through the film thickness. With the p-light, however, the fraction of
the incident power absorbed by the film is much higher (close to 90%). Evidently,
at this particular angle of incidence the p-light has been able to excite a very
strong wave in the metal layer.
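The absorbed fractions quoted here can be estimated with the standard single-film (Airy) summation applied to the glass / aluminum / air stack. The sketch below is not the program used for the figures; it assumes a lossless glass of index 1.5, the aluminum index 1.38 + 7.6i quoted later in this chapter for λ0 = 633 nm, and interface coefficients written in the form of Eqs. (10.1) and (10.2). Above the critical angle nothing is transmitted into the air, so the absorbed fraction is taken to be 1 − |r|². All function names are illustrative.

```python
import cmath
import math

WAVELENGTH_NM = 633.0
EPS_GLASS = 1.5 ** 2
EPS_AL = (1.38 + 7.6j) ** 2      # aluminum at 633 nm, value quoted in this chapter
EPS_AIR = 1.0


def kz(eps, kx):
    """Normal component of k (in units of k0); root with non-negative imaginary part."""
    root = cmath.sqrt(eps - kx * kx)
    return -root if root.imag < 0 else root


def r_interface(eps_i, eps_j, kz_i, kz_j, pol):
    """Single-interface Fresnel coefficient, in the form of Eqs. (10.1)-(10.2)."""
    if pol == "p":
        return (eps_i * kz_j - eps_j * kz_i) / (eps_i * kz_j + eps_j * kz_i)
    return (kz_i - kz_j) / (kz_i + kz_j)


def film_reflection(theta_deg, d_nm, pol):
    """Amplitude reflection coefficient of glass / Al film / air (Airy formula)."""
    kx = math.sqrt(EPS_GLASS) * math.sin(math.radians(theta_deg))
    k1, k2, k3 = kz(EPS_GLASS, kx), kz(EPS_AL, kx), kz(EPS_AIR, kx)
    r12 = r_interface(EPS_GLASS, EPS_AL, k1, k2, pol)
    r23 = r_interface(EPS_AL, EPS_AIR, k2, k3, pol)
    phase = cmath.exp(2j * (2.0 * math.pi / WAVELENGTH_NM) * d_nm * k2)
    return (r12 + r23 * phase) / (1.0 + r12 * r23 * phase)


theta_crit = math.degrees(math.asin(1.0 / math.sqrt(EPS_GLASS)))
print(f"critical angle = {theta_crit:.2f} deg")
for theta, d in ((45.0, 5.0), (42.95, 10.0), (42.41, 20.0)):
    a_p = 1.0 - abs(film_reflection(theta, d, "p")) ** 2
    a_s = 1.0 - abs(film_reflection(theta, d, "s")) ** 2
    print(f"theta = {theta:5.2f} deg, d = {d:4.1f} nm: absorbed p = {a_p:.2f}, s = {a_s:.2f}")
```

For d = 5 nm at θ = 45° this estimate gives roughly 90% absorption for p-polarized light and roughly 30% for s-polarized light. The other two lines of output can be compared with the values quoted for the 10 nm and 20 nm films.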
Similar calculations can be done for other thicknesses of the aluminum layer;
the results shown in Figure 10.3 correspond to a film thickness d = 10 nm. The
minimum reflectivity now occurs at θ = 42.95°, and the percentage of p-light
absorbed by the film has climbed to over 98%. If we continue to increase the film
thickness, however, the effect begins to decrease (and eventually to disappear), as
demonstrated by the plots of Figure 10.4, which correspond to d = 20 nm. In fact,
aside from the weak, plasmon-related feature in the vicinity of θcrit, the plots of
|rp| and |rs| in Figure 10.4(a) already resemble those for a very thick aluminum
film (i.e., one for which d ≫ skin depth). It is thus obvious that the lower
interface, between aluminum and air, is responsible for the excitation of surface
plasmons: increasing the film thickness prevents the electromagnetic field from
reaching the aluminum–air interface, thus suppressing the excitation of the
plasma wave. Also note in Figure 10.4(b) that the slope of Ss is greatest near the
glass–aluminum interface, and the flux of optical energy contained in the s-
polarized beam decays exponentially as it moves away from this interface
towards the aluminum–air interface. In contrast, the slope of Sp is greatest at the
aluminum–air interface, indicating that most of the energy is deposited at that
site. This is yet another indication that the aluminum–air interface is responsible
for the excitation of surface plasmons in the system of Figure 10.1.

[Figure 10.2: (a) amplitude reflection coefficients |rp| and |rs| versus θ (degrees),
from 0 to 90; (b) magnitude of the Poynting vector, Sp and Ss, versus depth z (nm),
from 0 to 5, at θ = 45°.]

Figure 10.2 (a) Computed plots of amplitude reflection coefficients for the p-
and s-components of polarization versus the angle of incidence θ, for the
monochromatic plane wave (λ = 633 nm) incident at the interface between glass
and a thin aluminum layer (d = 5 nm) shown in Figure 10.1. The dip in |rp| at
θ ≈ 45° is caused by the excitation of a surface plasmon in the aluminum film.
(b) Plots of the magnitude of the Poynting vector S against the depth z within the
aluminum layer, at θ = 45°. Note that approximately 90% of the incident power
of the p-polarized light enters the aluminum film and is absorbed fairly uni-
formly within the film’s thickness. In contrast, only 30% of the s-polarized light
is absorbed by the film.
[Figure 10.3: (a) |rp| and |rs| versus θ (degrees), from 0 to 90; (b) Sp and Ss
versus depth z (nm), from 0 to 10, at θ = 42.95°.]

Figure 10.3 Same as Figure 10.2, except for the thickness of the aluminum film,
which is now 10 nm. The resonant absorption in this case occurs at θ = 42.95°, and
the fraction of p-polarized light absorbed by the aluminum layer is over 98%.
A simple explanation based on Fresnel’s reflection coefficients

The Fresnel reflection coefficients at the interface between air and metal provide
a good starting point for an explanation of the nature of surface plasmons and the
conditions under which they occur. Consider the case of a polished metal surface
of dielectric constant ε, upon which a monochromatic plane wave of wavelength
λ0 is incident from air, at the oblique incidence angle of θ. The k-vector of the
incident beam (in air) has magnitude k0 = 2π/λ0, and its projections parallel and
perpendicular to the air–metal interface are denoted by k∥ and k⊥. The complex
Fresnel reflection coefficients rp and rs for p- and s-polarized light are written

\[
r_p = \frac{\sqrt{\varepsilon - (k_\parallel/k_0)^2} - \varepsilon k_\perp/k_0}{\sqrt{\varepsilon - (k_\parallel/k_0)^2} + \varepsilon k_\perp/k_0} , \tag{10.1}
\]
\[
r_s = \frac{k_\perp/k_0 - \sqrt{\varepsilon - (k_\parallel/k_0)^2}}{k_\perp/k_0 + \sqrt{\varepsilon - (k_\parallel/k_0)^2}} . \tag{10.2}
\]
[Figure 10.4: (a) |rp| and |rs| versus θ (degrees), from 0 to 90; (b) Sp and Ss
versus depth z (nm), from 0 to 20, at θ = 42.41°.]

Figure 10.4 Same as Figures 10.2 and 10.3, except for the thickness of the
aluminum film, which is now 20 nm. The resonant absorption in this case occurs
at θ = 42.41°, and the fraction of p-polarized light absorbed within the aluminum
layer is just over 60%.
The denominator of the expression for rp in Eq. (10.1) goes to zero at
k∥/k0 = √[ε/(1 + ε)], indicating that rp has a pole at this point. No such pole,
however, exists for rs. In the case of aluminum, n + iκ = 1.38 + 7.6i at
λ0 = 633 nm, yielding ε = –55.86 + 20.98i; this results in a value 1.008 + 0.003i
for the pole of rp. Under ordinary circumstances, when the metal surface is
illuminated in air at an oblique angle θ we have k∥/k0 = sin θ, which is less than
unity and, therefore, far from the pole. However, if evanescent waves are
somehow created at an air–aluminum interface, then k∥/k0 can exceed unity and,
in the neighborhood of the pole, the reflectivity rp at that interface will approach
infinity. This means that an evanescent p-polarized plane wave of very small
amplitude impinging at the metal surface can excite a very strong plane wave
within the metal. This plane wave, of course, is the surface plasmon, which is
capable of absorbing a good fraction of the energy from the incident beam and
converting it to heat within the metallic medium.
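The numbers in this paragraph are easy to reproduce. The short sketch below computes ε from the quoted index of aluminum, evaluates the pole, and confirms that the denominator of Eq. (10.1) vanishes there. The one assumption it makes explicit is the choice of square-root branch: both roots are taken with a non-negative imaginary part, corresponding to waves that decay away from the interface.

```python
import cmath

n_al = 1.38 + 7.6j                      # n + i*kappa for aluminum at 633 nm (from the text)
eps = n_al ** 2                         # dielectric constant
pole = cmath.sqrt(eps / (1.0 + eps))    # k_par/k0 at which the denominator vanishes


def root_up(z):
    """Square root with non-negative imaginary part (decaying wave)."""
    r = cmath.sqrt(z)
    return -r if r.imag < 0 else r


k_perp = root_up(1.0 - pole ** 2)                        # k_perp/k0 in air
denominator = root_up(eps - pole ** 2) + eps * k_perp    # denominator of Eq. (10.1)

print(f"eps  = {eps:.2f}")
print(f"pole = {pole:.4f}")
print(f"|denominator of r_p at the pole| = {abs(denominator):.2e}")
```

The real part of the pole exceeds unity only slightly, which is why a plane wave arriving from air at any real angle of incidence can never reach it.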
In light of the above arguments it is not difficult to see that, in the system of
Figure 10.1, the creation of evanescent waves with k∥/k0 ≥ 1 at the aluminum–air
interface is responsible for the sharp decline in rp at angles slightly greater than
the critical TIR angle. Since the expression for rs in Eq. (10.2) does not admit a
pole, no such behavior could be expected from the s-polarized light.
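A rough estimate of where the dip should appear follows from matching the tangential wave-vector of the beam inside the glass to the real part of the pole. This is only an estimate, since the pole quoted above belongs to a single air–aluminum interface and the thin film shifts it. With glass of refractive index 1.5:

```latex
\[
n_{\text{glass}} \sin\theta \approx \operatorname{Re}\!\left(k_\parallel/k_0\right)_{\text{pole}} = 1.008
\quad\Longrightarrow\quad
\theta \approx \arcsin\!\left(\frac{1.008}{1.5}\right) \approx 42.2^\circ .
\]
```

This lies just above the critical angle of 41.8° and close to the resonance angle of 42.41° quoted for the 20 nm film. The thinner films, which perturb the single-interface picture more strongly, resonate at somewhat larger angles.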
Attenuated total internal reflection (ATIR)

Another setup in which the excitation of surface plasmons is readily observed is
shown in Figure 10.5. (Some results from this type of experiment are also
described in Chapter 27, “Some quirks of total internal reflection”.) Here the
presence of an air gap between the glass hemisphere and the metal plate guar-
antees the creation of evanescent waves whereas in the setup of Figure 10.1 the
metal layer had to be sufficiently thin to provide access for the electromagnetic
waves to the metal–air interface.
For the system of Figure 10.5, computed reflection coefficients versus the gap-
width are plotted in Figure 10.6 for several angles of incidence in the vicinity of
θcrit. Because the variations of rs were imperceptible within the chosen range of
incidence angles, 41° to 43°, it was deemed pointless to label the various coin-
ciding rs curves. In contrast, rp was very sensitive to changes in θ, and the various
rp curves in Figure 10.6 are clearly labeled to indicate this dependence. A dip in
the plots of rp versus the gap-width begins to appear at angles of incidence just
below θcrit; the dip becomes more pronounced with an increasing θ until the
minimum reflectivity actually reaches zero at θ = 41.95°. The dip then decreases
with further increases in θ and, by the time θ reaches 43°, it has practically
disappeared.
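The behavior at θ = 41.95° can be explored with the same single-layer (Airy) formula used for thin films, with the air gap now playing the role of the layer. The sketch below rests on assumptions that this section does not state explicitly: the metal plate is taken to be aluminum with n + iκ = 1.38 + 7.6i, the glass index is 1.5, and the wavelength is 633 nm. Function names are illustrative.

```python
import cmath
import math

WAVELENGTH_NM = 633.0
EPS_GLASS = 1.5 ** 2
EPS_AIR = 1.0
EPS_METAL = (1.38 + 7.6j) ** 2   # assumption: the metal plate is aluminum


def kz(eps, kx):
    """Normal component of k (in units of k0); root with non-negative imaginary part."""
    root = cmath.sqrt(eps - kx * kx)
    return -root if root.imag < 0 else root


def rp_interface(eps_i, eps_j, kz_i, kz_j):
    """p-polarized Fresnel coefficient, in the form of Eq. (10.1)."""
    return (eps_i * kz_j - eps_j * kz_i) / (eps_i * kz_j + eps_j * kz_i)


def rp_gap(theta_deg, gap_nm):
    """Reflection coefficient of glass / air gap / metal for p-polarized light."""
    kx = math.sqrt(EPS_GLASS) * math.sin(math.radians(theta_deg))
    k1, k2, k3 = kz(EPS_GLASS, kx), kz(EPS_AIR, kx), kz(EPS_METAL, kx)
    r12 = rp_interface(EPS_GLASS, EPS_AIR, k1, k2)
    r23 = rp_interface(EPS_AIR, EPS_METAL, k2, k3)
    phase = cmath.exp(2j * (2.0 * math.pi / WAVELENGTH_NM) * gap_nm * k2)
    return (r12 + r23 * phase) / (1.0 + r12 * r23 * phase)


gaps = range(0, 1501, 5)                                    # gap-widths in nm
best_gap = min(gaps, key=lambda g: abs(rp_gap(41.95, g)))
print(f"theta = 41.95 deg: minimum |rp| = {abs(rp_gap(41.95, best_gap)):.3f} "
      f"at a gap of {best_gap} nm")
```

Scanning the other angles of Figure 10.6 with `rp_gap` shows how both the depth of the dip and the gap-width at which it occurs change with θ.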
As before we observe in the plots of Figure 10.6 the salient features of
absorption by surface plasmon excitation, namely, a p-polarized incident beam,
the existence of evanescent waves at an air–metal interface, and an angle of
incidence in the vicinity of the critical TIR angle for the glass–air interface, that
is, when k∥/k0 ≈ 1.

[Figure 10.5: schematic showing the incident beam, the reflected beam, the glass
hemisphere, the angle of incidence θ, the air gap, and the metal plate.]

Figure 10.5 Schematic diagram showing a monochromatic plane wave at
oblique incidence on the flat surface of a glass hemisphere. When the air gap
separating the hemisphere and the polished metal surface is sufficiently thin,
and at a specific angle of incidence θ, a substantial fraction of the incident
beam will be coupled into a surface plasmon and thus absorbed by the metal
plate.

[Figure 10.6: |rp| and |rs| versus air gap (nm), from 0 to 1500. (a) |rp| curves
labeled θ = 41.0, 41.2, 41.4, 41.6, 41.8, 41.9, and 41.95 deg; (b) |rp| curves
labeled θ = 42.0, 42.2, 42.4, 42.6, 42.8, and 43.0 deg.]

Figure 10.6 Computed plots of amplitude reflection coefficients versus the
width of the air gap for the experiment depicted in Figure 10.5. Each curve
represents a specific incidence angle θ in the range (41°, 43°). The reflection
coefficient |rs| for the s-polarized light does not change very much in this narrow
range of incident angles; thus all the |rs| curves coincide at this scale. The dip in
|rp| begins to appear at angles of incidence just below the critical TIR angle
(θcrit = 41.81°); it becomes a maximum at θ = 41.95° and then decreases again to
insignificance at θ = 43°. Note also that the gap-width at which |rp| reaches a
minimum varies with the angle of incidence. (a) θ-values from 41.0° to 41.95°;
(b) θ-values from 42.0° to 43.0°.
Excitation of surface plasmons in metallized diffraction gratings

A third experiment in which surface plasmons may be observed involves
the reflection of light from metallized diffraction gratings. Here, as shown in
Figure 10.7, the incident beam excites one or more propagating diffracted
orders but also creates non-propagating evanescent waves near the surface.
Whenever one of these evanescent waves happens to have the k∥ of a surface
plasmon, the conditions for its excitation are ripe, and a good fraction of the
optical energy will be coupled into the grating medium. As might be guessed
from the cases discussed in the preceding examples, the range of parameters
over which surface plasmon excitation can be expected is very narrow and,
therefore, the angle of incidence at which surface plasmons are excited must be
sharply defined.

[Figure 10.7: schematic showing the incident beam, the diffracted orders, and the
metal grating.]

Figure 10.7 A monochromatic plane wave incident on a metal grating creates
multiple diffracted orders within the reflected beam. At certain angles of inci-
dence, when the polarization of the beam happens to have a component per-
pendicular to the grooves, surface plasmons are excited within the grating. These
plasmons, which can convert a good fraction of the incident optical power into
heat, cause a sudden and substantial drop in the diffraction efficiencies of the
various orders.

[Figure 10.8: schematic showing the incident beam with its polarization components
Es and Ep, the objective lens, and the metal grating.]

Figure 10.8 A good method of observing surface plasmons in practice
involves the reflection of a focused beam of light from the grooved surface of a
metal grating. Because the focused cone contains rays within a wide range of
angles of incidence, upon reflection from the grating dark bands will appear in
the exit pupil of the lens corresponding to those rays that have succeeded in
exciting the surface plasmons. A camera, set up to photograph the reflected beam
at the exit pupil of the lens, will not only reveal the narrow bands corresponding
to surface plasmons but also show the superposition of the various diffracted
orders captured by the lens. To observe the surface plasmon bands, one must
allow the incident beam to have a component of polarization perpendicular to the
grooves of the grating. In this figure, Es is in the direction that will excite the
plasmons.
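To get a feeling for how sharply defined these angles are, one can combine the grating equation with the flat-surface pole of rp. The sketch below is a rough estimate built on several assumptions: the plane of incidence is perpendicular to the grooves, the grooves are shallow enough for the flat-surface pole to remain a fair approximation, the m-th order has k∥/k0 = sin θ + mλ/p, and the grating parameters are those quoted in the caption of Figure 10.9 (gold with (n, k) = (0.13, 3.16), λ = 633 nm, period 1.6 μm, 0.8NA objective).

```python
import cmath
import math

WAVELENGTH_UM = 0.633
PERIOD_UM = 1.6
NA = 0.8                                   # largest |sin(theta)| delivered by the lens
eps_gold = (0.13 + 3.16j) ** 2             # dielectric constant of gold at 633 nm
k_sp = cmath.sqrt(eps_gold / (1.0 + eps_gold)).real   # real part of the pole of r_p


def coupling_angles():
    """Angles (deg) at which diffracted order m has k_par/k0 = +k_sp or -k_sp."""
    hits = []
    for m in range(-6, 7):
        for sign in (+1, -1):
            sin_theta = sign * k_sp - m * WAVELENGTH_UM / PERIOD_UM
            if abs(sin_theta) <= NA:       # ray must lie inside the focused cone
                hits.append((m, sign, math.degrees(math.asin(sin_theta))))
    return hits


print(f"Re(pole) for gold = {k_sp:.4f}")
for m, sign, angle in coupling_angles():
    print(f"order m = {m:+d}, plasmon direction {sign:+d}: theta = {angle:6.2f} deg")
```

Each listed angle marks a ray of the focused cone whose m-th evanescent order matches the plasmon wave-vector. For rays outside the assumed plane of incidence the matching condition traces out curves rather than isolated angles.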
If one directs a focused beam onto a metal grating, as shown in Figure 10.8,
then a wide angular spectrum will be present in the beam, and some of the
rays will be strongly absorbed. A photograph of the reflected beam at the exit
pupil of the lens will show one or more dark lines corresponding to the absorption
of surface plasmons within the grating. Figure 10.9 shows a typical set of results
obtained in an experiment of this type. When the polarization is parallel to the
grooves, as is the case in Figure 10.9(a), there are no surface plasmon bands.
However, with the polarization vector perpendicular to the grooves, surface
plasmons are clearly excited, as shown in Figure 10.9(b). Results of theoretical
calculations confirming these results are shown in Figures 10.10 and 10.11. In these
calculations, Maxwell’s equations were solved for about 10 000 plane waves
impinging on the metal grating at various angles. These results were then combined
to represent the focused cone of light created by a 0.8NA objective lens.

[Figure 10.9: two photographs, panels (a) and (b).]

Figure 10.9 Photographs showing the intensity distribution at the exit pupil
of a 0.8NA microscope objective lens, through which a collimated beam
of laser light (λ = 633 nm) is focused on a gold-coated diffraction grating;
(n, k)gold = (0.13, 3.16). The grooves of the grating are oriented along the Y-axis,
the grating period is 1.6 μm, and the grooves, which have a trapezoidal cross-
section, are 0.5 μm wide at the top and 70 nm deep. The direction of the linear
polarization of the incident beam is parallel to the grooves in (a) and perpen-
dicular to the grooves in (b). (From Ronald E. Gerber, Ph.D. dissertation, Optical
Sciences Center, University of Arizona, Tucson.)
In the case of Figure 10.10, where the incident polarization vector was parallel to
the grooves, no plasmons were observed. We did the calculations for three different
positions of the focused spot over the grooves, however, to show the so-called
baseball pattern that results from superposition of the various diffracted orders.
Frames (a), (b), and (c) correspond respectively to a beam focused on one groove
edge, on the middle of a groove, and on an opposite groove edge. The phase
differences between various diffracted orders create constructive and destructive
interference among these various orders in their regions of mutual overlap, thus
giving rise to black and white areas. When the polarization is perpendicular to the
grooves, the pattern in Figure 10.11 is obtained. For this computation the position
of the focused spot on the grating was on a groove edge similar to that shown in
Figure 10.10(c). The dark bands of Figure 10.11, predicted by this theoretical
calculation to arise from surface plasmon excitation, agree quite well with the
experimental results of Figure 10.9(b).

[Figure 10.10: computed intensity patterns, frames (a), (b), and (c).]

Figure 10.10 Computed plots of intensity distribution at the exit pupil of a 0.8NA
objective lens through which a uniform plane wave is focused on a diffraction
grating. The grooves are oriented at 45° relative to the X-axis. The parameters of the
grating are the same as those used in the experiment (see the caption to Figure 10.9).
The various diffraction orders are clearly visible in these so-called “baseball”
patterns. The incident linear polarization is parallel to the grooves, thus explaining
the absence of plasmon-related dark bands in these pictures. The center of the
focused spot is (a) on a groove edge, (b) in the middle of a groove, and (c) on the
opposite groove edge.
Figure 10.11 Logarithmic plot of computed intensity distribution at the exit
pupil of a 0.8NA objective lens. The simulation parameters are the same as those
used to obtain Figure 10.10(c), with the exception of the direction of incident
polarization, which is perpendicular to the grooves. The grooves are oriented at
45° to the X-axis, and the focused spot is centered on the edge of a groove. The
absorption bands caused by the excitation of surface plasmons are identical to
those observed experimentally in Figure 10.9(b).

1 R. E. Gerber, Lifeng Li, and M. Mansuripur, Effects of surface plasmon excitations
on the irradiance pattern of the return beam in optical disk data storage, Appl. Opt. 34,
4929–4936 (1995).
2 R. W. Wood, On a remarkable case of uneven distribution of light in a diffraction
grating spectrum, Phil. Mag. 4, 396–402 (1902).
4 R. H. Ritchie, Plasma losses by fast electrons in thin films, Phys. Rev. 106, 874–881
(1957).
5 For the computations leading to Figures 10.10 and 10.11, reflection coefficients
of the grating were first computed by a vector diffraction program developed by
Lifeng Li. These coefficients were subsequently imported to DIFFRACT, where
they were combined to represent the effects of a focused beam.
6 J. C. Quail, J. G. Rako, and H. J. Simpson, Long-range surface plasmon modes in
silver and aluminum, Opt. Lett. 8, 377 (1983).
7 D. Sarid, Long-range surface-plasma waves on very thin metal films, Phys. Rev. Lett.
47, 1927 (1981).
8 A. D. Boardman, ed., Electromagnetic Surface Modes, Wiley, New York, 1982.
9 A. E. Craig, A. Olson, and D. Sarid, Experimental observation of the long-range
surface-plasmon polariton, Opt. Lett. 8, 380 (1983).
11
Surface plasmon polaritons on metallic surfaces†
Recent advances in nano-fabrication have enabled a host of nano-photonic

experiments involving subwavelength metallic structures.1,2,3,4,5 This flurry of
activity has, in turn, reawakened interest in surface plasmon polaritons (SPPs)
and inspired theoretical research in this area. Although the fundamental proper-
ties of SPPs have been known for nearly five decades,6,7 there remain certain
subtle issues that could benefit from further critical analysis. The goal of the
present chapter is to use numerical simulations to verify the detailed structure of
long-range SPPs. We present field distribution profiles and energy flow patterns
aimed at promoting a physical understanding of SPP generation and propagation
in ways that mathematical equations alone cannot convey. Thus, beginning with
Maxwell’s equations in the next section, we determine the electromagnetic eigen-
modes confined to flat metallo-dielectric interfaces. The behavior of these modes
will then be examined through computer simulations that show the excitation of
SPPs in certain practical settings. Our numerical computations are based on the
Finite Difference Time Domain (FDTD) method.8
General Formulation
With reference to Figure 11.1, in a homogeneous medium of dielectric constant ε
the propagation vector is k = k0(σy ŷ + σz ẑ), where k0 = 2π/λ0 and σy² + σz² = ε.
In general, σz = ±√(ε – σy²), with both plus and minus signs admissible. In each
of the semi-infinite cladding media, however, only one value of σz is allowed,
corresponding to the solution that approaches zero when z → ±∞. This is why
σz1 of the upper cladding in Figure 11.1 is chosen to have a plus sign, whereas
that of the lower cladding has a minus sign. (σz1, σz2 have positive imaginary
parts.)
†
This chapter is co-authored with Armis R. Zakharian, now with Corning Corp., and Jerome V. Moloney of
the University of Arizona.
[Figure 11.1: schematic in the yz-plane showing a slab of thickness w and dielectric
constant ε2 between two media of dielectric constant ε1, with the field components
Hx, Ey, and Ez. The propagation vectors are labeled k1 = k0(σy ŷ + σz1 ẑ) in the
upper cladding, k2 = k0(σy ŷ ± σz2 ẑ) in the slab, and k3 = k0(σy ŷ – σz1 ẑ) in the
lower cladding.]

Figure 11.1 Slab of thickness w and dielectric constant ε2, sandwiched
between two homogeneous, semi-infinite media of dielectric constant ε1. An
electromagnetic mode of the structure consists of two (generally inhomo-
geneous) plane-waves within the slab and a single (inhomogeneous) plane-
wave in each of the surrounding media. Continuity of the fields at z = ±½w
requires that ky = k0σy be the same for all these plane-waves. Although the
polarization state of the mode can, in general, be either TE or TM, only
TM modes are considered in this chapter. The magnetic field, therefore, has a
single component Hx along the x-axis, while the electric field has two com-
ponents (Ey, Ez) in the yz-plane. Throughout the chapter λ0 = 650 nm and the
metallic medium is silver, having ε = –19.6224 + 0.443i (corresponding
to n + ik = 0.05 + 4.43i).
The E- and H-fields of each plane-wave are related through the Maxwell
equation ∇ × H = ∂D/∂t (where D = ε0εE) as follows:

\[ H_x(y, z, t) = H_0 \exp\{ i [ k_0 (\sigma_y y \pm \sigma_z z) - \omega t ] \} , \tag{11.1a} \]
\[ E_y(y, z, t) = \mp (Z_0 \sigma_z / \varepsilon) \, H_x(y, z, t) , \tag{11.1b} \]
\[ E_z(y, z, t) = (Z_0 \sigma_y / \varepsilon) \, H_x(y, z, t) . \tag{11.1c} \]

Here H0 is the (complex) amplitude of the magnetic field, Z0 = √(μ0/ε0) ≈ 377 Ω
is the impedance of free space, and ω = k0c = k0/√(μ0ε0) is the tem-
poral frequency of the light wave. The time-dependence factor exp(–iωt) will be
omitted in the following discussion. We confine our attention to symmetric
structures where both cladding media have the same dielectric constant ε1. In
general, the modal fields are either odd or even with respect to the y-axis,
allowing one to express the H-field of a given mode as follows (± signs for even
and odd modes, respectively):

\[
H_x(y, z) =
\begin{cases}
H_1 \exp(i k_0 \sigma_y y) \exp[ i k_0 \sigma_{z1} (z - \tfrac{1}{2} w) ] , & z \ge +\tfrac{1}{2} w , \\[4pt]
H_2 \exp(i k_0 \sigma_y y) \, [ \exp(i k_0 \sigma_{z2} z) \pm \exp(-i k_0 \sigma_{z2} z) ] , & |z| \le \tfrac{1}{2} w , \\[4pt]
\pm H_1 \exp(i k_0 \sigma_y y) \exp[ -i k_0 \sigma_{z1} (z + \tfrac{1}{2} w) ] , & z \le -\tfrac{1}{2} w .
\end{cases}
\tag{11.2}
\]
The corresponding E-field for each mode can be found from Eqs. (11.1).
Continuity of Hx and Ey at the z = ±½w boundaries yields

\[ H_2 [ \exp(i k_0 \sigma_{z2} w/2) \pm \exp(-i k_0 \sigma_{z2} w/2) ] = H_1 , \tag{11.3a} \]
\[ Z_0 H_2 (\sigma_{z2}/\varepsilon_2) [ \exp(i k_0 \sigma_{z2} w/2) \mp \exp(-i k_0 \sigma_{z2} w/2) ] = Z_0 H_1 (\sigma_{z1}/\varepsilon_1) . \tag{11.3b} \]

Substituting for H1 from Eq. (11.3a) into Eq. (11.3b), rearranging the terms, and
expressing σz1 and σz2 in terms of σy, we find:

\[
\frac{\varepsilon_1 \sqrt{\varepsilon_2 - \sigma_y^2} - \varepsilon_2 \sqrt{\varepsilon_1 - \sigma_y^2}}{\varepsilon_1 \sqrt{\varepsilon_2 - \sigma_y^2} + \varepsilon_2 \sqrt{\varepsilon_1 - \sigma_y^2}} \,
\exp\!\left( i k_0 \sqrt{\varepsilon_2 - \sigma_y^2} \; w \right) = \pm 1 . \tag{11.4}
\]
This transcendental equation in σy = σy^(r) + iσy^(i) is the characteristic equation of
the waveguide depicted in Figure 11.1. Each solution σy of Eq. (11.4) cor-
responds to a particular mode of the waveguide; when the plus (minus) sign is
used on the right-hand side of Eq. (11.4), the solution represents an even (odd)
mode. Since we are presently interested in modes that propagate from left to right
in Figure 11.1, the imaginary part of σy must be non-negative (i.e., σy^(i) ≥ 0),
otherwise the mode will grow exponentially as y → ∞. Also, when computing the
complex square roots in Eq. (11.4), one must always choose the root which has a
positive imaginary part.
Note that the coefficient multiplying the complex exponential on the left-hand
side of Eq. (11.4) is the Fresnel reflection coefficient $r_p$ for a p-polarized (TM)
plane-wave at the interface between media of dielectric constants $\varepsilon_1$ and $\varepsilon_2$. The
Fresnel coefficient has a singularity (pole) at $\sigma_y = \sqrt{\varepsilon_1 \varepsilon_2/(\varepsilon_1 + \varepsilon_2)}$, where its
denominator vanishes. The function on the left-hand side of Eq. (11.4) thus varies
rapidly in the vicinity of this pole, where some of the solutions of the equation are
to be found. In particular, when $w \to \infty$, the complex exponential approaches
zero and the pole itself becomes a solution. This can be seen most readily with
reference to Eqs. (11.3); by allowing $\exp(+i k_0 \sigma_{z2} w/2) \to 0$ and substituting
for $H_1$ from Eq. (11.3a) into Eq. (11.3b), we find $\sigma_{z2}/\varepsilon_2 = -\sigma_{z1}/\varepsilon_1$, namely,
$\varepsilon_1 \sqrt{\varepsilon_2 - \sigma_y^2} + \varepsilon_2 \sqrt{\varepsilon_1 - \sigma_y^2} = 0$.
Table 11.1. First few solutions of Eq. (11.4) for a 50 nm-thick
silver slab (λ0 = 650 nm, ε1 = 1.0, ε2 = −19.6224 + 0.443i)

σy(+) (even modes)          σy(−) (odd modes)
 1.017  + i 0.22 × 10⁻³      1.041  + i 1.39 × 10⁻³
 0.171  + i 7.868            0.204  + i 13.738
−0.1145 + i 7.860           −0.1722 + i 13.729
 0.211  + i 20.0012          0.2135 + i 26.3795
−0.1892 + i 19.992          −0.1965 + i 26.3698
Metallic slab in the free space

Consider the case of ε1 = 1.0, ε2 = −19.6224 + 0.443i (silver at λ0 = 650 nm).
Fixing the slab’s thickness at w = 50 nm and searching the complex plane
for solutions of Eq. (11.4) yields the first few values of $\sigma_y^{(\pm)} = \sigma_y^{(r)} + i\sigma_y^{(i)}$ listed in
Table 11.1; the ± superscripts identify the even and odd modes, respectively. (Only
solutions having non-negative values of $\sigma_y^{(i)}$ are considered so that, as $y \to +\infty$, the
corresponding modes will decay to zero.) Although we will be concerned mainly
with the top two (fundamental) solutions in Table 11.1, there exist an infinite
number of solutions with large values of $\sigma_y^{(i)}$. The latter are generally needed to
match the boundary conditions upon launching an SPP; otherwise, due to their
rapid decay along the y-axis, modes with large $\sigma_y^{(i)}$ do not appear to have any
practical significance.
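The search for the fundamental solutions can be sketched numerically. The following is a minimal illustration, not the method used to produce Table 11.1: it multiplies Eq. (11.4) through by its denominator (to remove the pole), picks square roots with non-negative imaginary part, and applies Newton's iteration with a finite-difference derivative. The function names, the starting guess (the pole of the Fresnel coefficient), and the tolerances are assumptions made for this sketch.

```python
import cmath
import math

def sqrt_up(z):
    """Complex square root on the branch with non-negative imaginary part."""
    r = cmath.sqrt(z)
    return -r if r.imag < 0 else r

def char_residual(sigma_y, w_nm, sign, lam0_nm=650.0,
                  eps1=1.0, eps2=-19.6224 + 0.443j):
    """Eq. (11.4) multiplied by its denominator; zero at a mode.

    sign = +1 selects the even mode, sign = -1 the odd mode.
    """
    k0 = 2.0 * math.pi / lam0_nm                 # rad/nm
    sz1 = sqrt_up(eps1 - sigma_y**2)             # sigma_z1
    sz2 = sqrt_up(eps2 - sigma_y**2)             # sigma_z2
    phase = cmath.exp(1j * k0 * sz2 * w_nm)
    return (eps1 * sz2 - eps2 * sz1) * phase - sign * (eps1 * sz2 + eps2 * sz1)

def solve_mode(w_nm, sign, guess, tol=1e-13, max_iter=100):
    """Newton iteration with a central-difference derivative (hypothetical helper)."""
    s = complex(guess)
    h = 1e-7
    for _ in range(max_iter):
        f = char_residual(s, w_nm, sign)
        df = (char_residual(s + h, w_nm, sign)
              - char_residual(s - h, w_nm, sign)) / (2.0 * h)
        step = f / df
        s -= step
        if abs(step) < tol:
            break
    return s

eps1, eps2 = 1.0, -19.6224 + 0.443j
sigma_pole = cmath.sqrt(eps1 * eps2 / (eps1 + eps2))   # pole of r_p
even = solve_mode(50.0, +1, sigma_pole)
odd = solve_mode(50.0, -1, sigma_pole)
print("even mode:", even)
print("odd mode: ", odd)
```

Started from the pole, the iteration should settle on the two fundamental solutions in the first row of Table 11.1. This starting guess is only suitable for moderate thicknesses; for very thin slabs the even solution lies close to the branch point at $\sigma_y = \sqrt{\varepsilon_1}$ and a more careful search would be needed, as would a different set of starting points for the higher-order (strongly damped) solutions.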
As the slab thickness w increases, the fundamental solutions (first row of
Table 11.1) approach each other, reaching the common value of
$\sigma_{spp} = \sqrt{\varepsilon_1 \varepsilon_2/(\varepsilon_1 + \varepsilon_2)} = 1.0265 + i\,0.6217 \times 10^{-3}$. In contrast, reducing the
slab thickness causes the fundamental solutions to move apart (and also further
away from $\sigma_{spp}$). As $w \to 0$, the even solution approaches $\sigma_y = \sqrt{\varepsilon_1}$, while the odd
solution acquires a large $\sigma_y^{(r)}$ and a fairly large $\sigma_y^{(i)}$. Table 11.2 lists the fundamental
solutions of Eq. (11.4) for a range of values of w.
Prism-coupling
To excite SPPs on the flat surface of a metal slab, one may use the prism-coupling
scheme of Figure 11.2, commonly referred to as the Kretschmann or Otto configuration
depending on whether the metal is thin or thick.6 The incident beam
arrives at the bottom of the prism (refractive index = n0) at an angle θ slightly
greater than the critical angle θc of total internal reflection. Since $k_y = k_0 n_0 \sin\theta$,
the waves coupled to the metal slab will have $k_y > k_0$, a basic requirement for SPP
Table 11.2. Fundamental modes of silver slabs of differing thickness
(λ0 = 650 nm, ε1 = 1.0, ε2 = −19.6224 + 0.443i)

w (nm)   σy(+) (even mode)           σy(−) (odd mode)
5        1.0003 + i 0.7 × 10⁻⁶       2.3422 + i 0.0432
10       1.0012 + i 3.7 × 10⁻⁶       1.4657 + i 0.01745
50       1.017  + i 0.22 × 10⁻³      1.041  + i 1.39 × 10⁻³
65       1.021  + i 0.35 × 10⁻³      1.033  + i 1.01 × 10⁻³
200      1.026  + i 0.62 × 10⁻³      1.026  + i 0.6235 × 10⁻³
∞        1.026  + i 0.6219 × 10⁻³    1.026  + i 0.6219 × 10⁻³
[Figure 11.2: diagram of a glass prism (index n0) above a gap and a metal slab of thickness w; labels show the incidence angle θ, the field components Hx, Ey, Ez, the y- and z-axes, and media 1 and 2.]
Figure 11.2 A Gaussian beam of wavelength λ0 is focused at the bottom of a
glass prism of refractive index n0. The angle of incidence θ is slightly greater
than the critical angle θc of total internal reflection. (In this two-dimensional
system, the beam is focused by a cylindrical lens; its shape, therefore, does not
vary along the x-axis.) The beam is linearly polarized, with H-field along x and
E-field components (Ey, Ez) confined to the plane of incidence. A small gap
separates a metallic slab of thickness w and dielectric constant ε2 from the
prism; the medium immediately above and below the slab has dielectric constant
ε1. The inset shows a half-prism that can be used to eliminate the back-coupling
of the surface plasmon(s), excited on the metallic surface(s), into the prism.
excitation. The incident beam, being mildly focused, has a k-space spectrum that
spans a few degrees around θc. Most of these k-vectors are reflected at the prism’s
base; however, a narrow range of incidence angles evanescently couples to the
metal surface and proceeds to excite the plasmons.
Figure 11.3 shows computed plots of the Fresnel reflection coefficient
$r_p = |r_p| \exp(i\phi_p)$ for a p-polarized plane-wave versus the incidence angle θ at
the bottom of the prism (n0 = 1.5, λ0 = 650 nm). In Figure 11.3(a), corresponding
to the case of a thick silver slab separated by a 1078 nm air gap, there is
a single resonant absorption at θ = 43.15°. Due to the narrow range of k-vectors
[Figure 11.3: panels (a) and (b), each plotting |rp| and φp (degrees) versus θ (degrees) over the range 40° to 50°.]
Figure 11.3 Plots of the Fresnel reflection coefficient for p-polarized (TM)
light, $r_p = |r_p| \exp(i\phi_p)$, versus the incidence angle θ at the bottom of the prism of
Figure 11.2; λ0 = 650 nm, n0 = 1.5. (a) The case of a thick silver slab separated
from the prism by a 1078 nm air gap. (b) The case of a 65 nm-thick silver slab
separated by a 950 nm air gap. In each case the gap is optimized to enhance the
strength of the excited plasmon(s).
that cross the gap, we expect the footprint of the beam on the metal surface to be
much wider than the diameter of the focused spot at the prism’s base. The rapid
variation of the phase φp in the vicinity of the resonance implies that the
footprint on the metal surface will not be centered under the incident spot but,
rather, it will be shifted to the right. Figure 11.3(b), corresponding to a 65 nm-
thick silver slab separated from the prism by a 950 nm air gap, exhibits two
resonant absorptions, representing the odd and even modes of the metallic slab.
The first resonance at θ1 = 42.86°, having the smaller value of ky, excites the
even mode, while the second resonance, at θ2 = 43.52°, excites the odd mode.
The coupling of a focused beam of light through a glass prism to a thick (semi-
infinite) silver slab is depicted in Figures 11.4 and 11.5. At the base of the prism, the
Gaussian beam’s full-width-at-half-maximum-amplitude (FWHM) is 4.0 µm, the
central ray’s incidence angle is θ = 43°, and the air gap is 1.078 µm. The expected SPP
wavelength, $\lambda_0/\mathrm{Re}[\sigma_{spp}] = 633.22$ nm, is consistent with $\lambda_0/(n_0 \sin\theta) = 633.6$ nm
estimated from Figure 11.3(a) at the minimum of $r_p(\theta)$. From Figure 11.4(a), the
profile of Hx(y) sampled at Δz = 10 nm below the metal surface has a period of 634 nm
(peak of the function’s Fourier spectrum), in excellent agreement with the theory.
The Poynting vector plots of Figure 11.5 show how a fraction of the evanescent
field’s energy reaches the metal surface; of that fraction, a certain portion
immediately returns to the prism, while the remainder turns around and propagates
along the metal surface in the y-direction.
[Figure 11.4: four color maps in the yz-plane (axes y [µm], z [µm]): (a) Hx × 10⁻³, (b) |Hx| × 10⁻³, (c) |Ey|, (d) |Ez|.]
Figure 11.4 Electromagnetic fields in the gap region between the prism and
the semi-infinite metal surface. (a) Instantaneous Hx. (b-d) Magnitudes of Hx, Ey,
Ez. The evanescent field at the bottom of the prism is visible in the upper left-
hand corner of each frame. The SPP is launched at the lower left-hand side. Due
to back-coupling to the prism, the SPP’s decay rate along the y-axis is nearly
twice the expected rate.
The best fit to Re[Hx] of Figure 11.4(a) is $\exp(-0.01367\,y) \sin[9.9165\,(y + 0.245)]$, with y in µm.
While $k_0\,\mathrm{Re}[\sigma_{spp}] = 9.9226$ is quite close to the observed value of 9.9165, the
decay rate of 0.01367 is substantially greater than the SPP extinction rate of $k_0\,\mathrm{Im}[\sigma_{spp}]
= 0.006$; this is caused by the SPP’s back-coupling to the prism. We truncated the
simulated prism by removing the glass that lies directly above the excited SPP (see
[Figure 11.5: three color maps in the yz-plane (axes y [µm], z [µm]): (a) Sy × 10⁻⁵, (b) Sz × 10⁻⁵, (c) |S| × 10⁻⁵.]
Figure 11.5 (a, b) Components Sy and Sz of the Poynting vector S in the gap
region between the prism and the semi-infinite metal. (c) Close-up of |S|;
superimposed arrows show the direction of S.
[Figure 11.6: three color maps in the yz-plane (axes y [µm], z [µm]): (a) |Hx| × 10⁻³, (b) Sy × 10⁻⁵, (c) Sz × 10⁻⁵.]
Figure 11.6 (a-c) Plots of Hx, Sy, Sz in the gap region between the prism and a
semi-infinite metallic medium. To eliminate the back-coupling of the SPP to the
prism, the part of the prism that lies above the launched SPP has been removed
(see the inset in Figure 11.2); the prism thus extends from −40 µm to 0 along the
y-axis. The SPP’s decay rate along y now agrees with the theoretical prediction.
the inset in Figure 11.2); the truncated prism thus occupied only the interval (−40
µm, 0) along the y-axis. The simulation results for the truncated prism shown in
Figure 11.6 exhibit a period of 634 nm in the y ≥ 0 region (obtained from
the waveform’s Fourier spectrum). The best fit to Re[Hx], namely, the function
$\exp(-0.006\,y) \sin[9.9165\,(y + 0.056)]$, now yields the expected decay rate as well.
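The theoretical numbers quoted in this comparison follow directly from the single-interface formula for $\sigma_{spp}$. The short sketch below recomputes them from the stated material constants; the variable names are chosen here for illustration, and lengths are expressed in micrometers so that the propagation constant and decay rate come out in µm⁻¹.

```python
import cmath
import math

lam0_um = 0.650                        # vacuum wavelength (from the text)
eps1, eps2 = 1.0, -19.6224 + 0.443j    # air and silver at 650 nm (from the text)
n0 = 1.5                               # prism index
theta_min_deg = 43.15                  # reflection minimum read from Figure 11.3(a)

sigma_spp = cmath.sqrt(eps1 * eps2 / (eps1 + eps2))
k0 = 2.0 * math.pi / lam0_um           # vacuum wavenumber, rad/um

lam_spp_nm = 1e3 * lam0_um / sigma_spp.real
lam_prism_nm = 1e3 * lam0_um / (n0 * math.sin(math.radians(theta_min_deg)))
beta = k0 * sigma_spp.real             # propagation constant along y, rad/um
decay = k0 * sigma_spp.imag            # amplitude decay rate along y, 1/um
theta_c_deg = math.degrees(math.asin(1.0 / n0))   # critical angle of the prism base

print(f"sigma_spp          = {sigma_spp:.6f}")
print(f"SPP wavelength     = {lam_spp_nm:.2f} nm (prism estimate {lam_prism_nm:.1f} nm)")
print(f"k0 Re[sigma_spp]   = {beta:.4f} per um")
print(f"k0 Im[sigma_spp]   = {decay:.4f} per um")
print(f"critical angle     = {theta_c_deg:.2f} deg")
```

The fitted spatial frequency of 9.9165 and the fitted decay rate of 0.006 for the truncated prism can then be compared directly against `beta` and `decay`.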
Interference between odd and even modes

Figures 11.7 and 11.8 show the results of FDTD simulations pertaining to a
Gaussian beam (FWHM at prism’s base = 8.0 µm, λ0 = 650 nm, θ = 43.55°),
coupled through a truncated prism (n0 = 1.5, y = −80 µm to 0, air gap = 950 nm)
to a 65 nm-thick silver slab. The Fourier transform of Re[Hx(y)], sampled at
[Figure 11.7: four color maps in the yz-plane (axes y [µm], z [µm]): (a) Hx × 10⁻³, (b) |Hx| × 10⁻³, (c) |Ey|, (d) |Ez|.]
Figure 11.7 Electromagnetic field profiles on both sides of a 65 nm-thick silver
slab illuminated through a truncated prism; the slab is centered at z = 0.6 µm,
while the prism’s base at z = 1.55 µm extends from −80 µm to 0 along the y-axis.
(a) Profile of instantaneous Hx. (b–d) Magnitudes of Hx, Ey, Ez. The evanescent
field just below the prism appears in the upper left-hand corner of each frame.
Both odd and even modes of the slab are excited, their interference causing the
peaks and valleys of the field distributions.
Δz = 10 nm below the slab, yields $\lambda_{y1} = 636.9$ nm, $\lambda_{y2} = 629.4$ nm, in excellent
agreement with $\lambda_0/(n_0 \sin\theta_{1,2}) = 637.1$ nm, 629.3 nm obtained from the minima of
$r_p(\theta)$ of Figure 11.3(b). Computed values of $\sigma_y$ for a 65 nm-thick slab in Table 11.2
yield $\lambda_y^{(\pm)} = \lambda_0/\mathrm{Re}[\sigma_y^{(\pm)}] = 636.6$ nm, 629.2 nm, and $k_0\,\mathrm{Im}[\sigma_y^{(\pm)}] = 0.0034$, 0.0098 (per µm),
once again in agreement with the simulated profile of Re[Hx(y)] shown in
Figure 11.7(a).
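The numbers in this comparison can be reproduced in a few lines. The sketch below uses the resonance angles of Figure 11.3(b) and the 65 nm row of Table 11.2; the dictionary layout and variable names are illustrative assumptions. As an extra step that is not stated in the text, it also estimates the spatial beat period $2\pi/|k_{y2} - k_{y1}|$ of the two co-propagating modes, which is the length scale one would expect for the interference peaks and valleys; because it depends on a small difference of rounded values, that estimate is only approximate.

```python
import math

lam0_nm = 650.0
n0 = 1.5
k0_per_um = 2.0 * math.pi / (lam0_nm * 1e-3)    # vacuum wavenumber, rad/um

# Resonance angles from Figure 11.3(b) and sigma_y from Table 11.2 (w = 65 nm).
theta_deg = {"even": 42.86, "odd": 43.52}
sigma_y = {"even": 1.021 + 0.35e-3j, "odd": 1.033 + 1.01e-3j}

lam_prism = {}   # period inferred from the reflection minima, nm
lam_mode = {}    # period inferred from the characteristic equation, nm
decay = {}       # amplitude decay rate along y, 1/um
for mode in ("even", "odd"):
    lam_prism[mode] = lam0_nm / (n0 * math.sin(math.radians(theta_deg[mode])))
    lam_mode[mode] = lam0_nm / sigma_y[mode].real
    decay[mode] = k0_per_um * sigma_y[mode].imag
    print(f"{mode:4s}: {lam_prism[mode]:.1f} nm (prism), "
          f"{lam_mode[mode]:.1f} nm (mode), decay {decay[mode]:.4f} per um")

# Beat period of the two modes (an estimate, sensitive to rounding in the table).
beat_um = (lam0_nm * 1e-3) / abs(sigma_y["odd"].real - sigma_y["even"].real)
print(f"beat period ~ {beat_um:.0f} um")
```

With the rounded table entries the beat period comes out at roughly 50 µm, i.e. comparable to the lateral extent of the simulated region.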
[Figure 11.8: four color maps in the yz-plane (axes y [µm], z [µm]): (a) Sy × 10⁻⁵, (b) Sz × 10⁻⁵, (c, d) |S| × 10⁻⁵.]
Figure 11.8 (a, b) Poynting vector components Sy, Sz around the 65 nm-thick
silver slab illuminated through a truncated prism. (c, d) Close-ups of |S|,
showing the flow of energy in the early and late parts of the propagating SPP.
Profiles of the Poynting vector S in the gap region between the silver slab
and the prism and also in the region immediately below the slab are shown in
Figure 11.8. The Sz plot shows a fraction of the evanescent field’s energy
reaching the metal slab, of which a certain proportion immediately returns to the
prism, while the remainder turns around and straddles the slab along the y-axis.
In general, the odd mode, being lossier than the even mode, has a shorter
propagation distance along the y-axis. The physics behind the loss mechanism may
be understood as follows. With the even mode, the field component Ez has the same
sign above and below the slab; therefore, at a given point along y, the electrical
charges at the top and bottom surfaces have opposite signs. Inside the metallic slab,
the field component Ez, reduced by a factor of ε2/ε1 relative to the Ez immediately
outside, helps move the charges back and forth between the top and bottom surfaces.
The slab being thin, the transport distance is short; hence the charge velocity
and the corresponding electrical current are small. In contrast, the charges of the odd
mode have the same sign on opposite sides of the slab. Consequently, positive and
negative charges must move laterally (in the ±y-directions) during each period of
oscillation. The travel distance is now on the order of the SPP wavelength, which is
typically greater than the slab thickness. Therefore, the current densities of the odd
mode are relatively large, leading to correspondingly large losses.
Polarization dependence of SPP

A mathematical analysis similar to the one that led to Eq. (11.4) reveals that
TE-polarized electromagnetic waves cannot support SPPs at metal–dielectric
interfaces. The following argument proves the same point by appealing to the
underlying physics of surface plasmons. For the sake of simplicity, consider a
thick metal plate in vacuum, as shown in Figure 11.9. An SPP consists of two
inhomogeneous plane-waves (one in the free space, the other in the metal), both
having the same $\sigma_y$ in the propagation direction (phase velocity $V_p = c/\sigma_y$). The
diagram in Figure 11.9(a) represents a true SPP, with the E-fields originating on
positive (surface) charges and terminating on negative ones. If the continuity of
$H_\parallel$ at the surface is assumed, then a negative $\varepsilon_{metal}$ ensures the continuity of $D_\perp$,
and $E_\parallel$ can be made continuous by the proper choice of $\sigma_y$, namely, $\sigma_y = \sigma_{spp}$. In
contrast, the diagram in Figure 11.9(b) represents a physical impossibility;
absence of magnetic charges in nature means that the H-field must be divergence-
free everywhere and, in particular, at the metal–vacuum interface; however, since
$H_\parallel$ will now have opposite directions above and below the surface, it cannot
satisfy the requisite boundary condition. This is the reason why SPPs must, of
necessity, be TM-polarized.
Concluding remarks
In this chapter we analyzed the surface modes of thin and thick metallic slabs.
Maxwell’s equations admit many solutions for electromagnetic fields that can be
considered localized at and around metallic surfaces (or, in general, confined to
the vicinity of metallo-dielectric interfaces). However, only a handful of such
solutions extend far enough beyond their point of origination to be considered
useful for practical applications. The odd and even waves that propagate along
the surfaces of metallic slabs are examples of such long-range surface plasmon
[Figure 11.9: panels (a) and (b) sketch the E- and H-field lines above and inside a metal surface for a wave with phase velocity Vp = c/σy.]
Figure 11.9 (a) The SPP’s E-fields originate on positive charges and terminate
on negative ones. Continuity of $H_\parallel$ and a negative $\varepsilon_{metal}$ ensure the continuity of
$D_\perp$, while $E_\parallel$ becomes continuous when $\sigma_y = \sigma_{spp}$. (b) A physical impossibility,
since the divergence-free nature of the H-field requires $H_\parallel$ to have opposite
directions above and below the surface, thus prohibiting the continuity of $H_\parallel$ at
the boundary.
polaritons. The remaining solutions – properly classified as short-range or lossy

modes – should not be ignored, however, as they participate in the matching of
the boundary conditions wherever a long-range SPP is launched, or whenever an
existing SPP crosses the boundary from one environment into another.
Our FDTD simulations have verified the validity of the simple theoretical
analysis, but they also have provided a physical picture of field distributions and
energy flow patterns in realistic systems that are generally inaccessible to exact
mathematical analysis. We have seen, for example, that an SPP excited through a
glass prism possesses the expected spatial frequency, but that its decay rate is
substantially greater than the theoretical value (due to back-coupling and sub-
sequent leakage through the prism). We also argued that the even mode of a thin
metallic slab is less lossy (hence longer range) than the odd mode, primarily
because the electrical currents that sustain the even mode flow in the thickness
direction, whereas those of the odd mode flow laterally, in the plane of the slab.
Much insight can be gained from a detailed analysis of the electromagnetic field
profiles in their intimate and intricate relationship with the behavior of the
conduction electrons of the metallic medium.

1 T. W. Ebbesen, H. J. Lezec, H. F. Ghaemi, T. Thio, and P. A. Wolff, Extraordinary
optical transmission through subwavelength hole arrays, Nature 391, 667–669 (1998).
2 R. D. Averitt, S. L. Westcott, and N. J. Halas, Ultrafast electron dynamics in gold
nanoshells, Phys. Rev. B 58, R10203–R10206 (1998).
3 J. J. Mock, S. J. Oldenburg, D. R. Smith, D. A. Schultz, and S. Schultz, Composite
plasmon resonant nanowires, Nano Letters 2, 465–469 (2002).
4 H. F. Ghaemi, T. Thio, and D. E. Grupp, Surface plasmons enhance optical
transmission through subwavelength holes, Phys. Rev. B 58, 6779–6782 (1998).
5 G. Gay, O. Alloschery, B. V. de Lesegno, C. O’Dwyer, J. Weiner, and H. J. Lezec,
The optical response of nano-structured surfaces and the composite diffracted
evanescent wave model, Nature Phys. 2, 262–267 (2006).
6 H. Raether, Surface Plasmons on Smooth and Rough Surfaces and on Gratings,
Springer-Verlag, Berlin, 1986.
7 J. J. Burke, G. I. Stegeman, and T. Tamir, Surface-polariton-like waves guided by
thin, lossy metal films, Phys. Rev. B 33, 5186–5201 (1986).
8 A. Taflove and S. C. Hagness, Computational Electrodynamics: The Finite-Difference
Time-Domain Method, second edition, Artech House, 2000.
12
The Faraday effect
Michael Faraday (1791–1867) (Photo: National Portrait Gallery, London,
courtesy of AIP Emilio Segrè Visual Archives.)
Michael Faraday (1791–1867) was born in a village near London into the family of
a blacksmith. His family was too poor to keep him at school and, at the age of 13,
he took a job as an errand boy in a bookshop. A year later he was apprenticed as a
bookbinder for a term of seven years. Faraday was not only binding the books but
was also reading many of them, which excited in him a burning interest in science.
When his term of apprenticeship in the bookshop was coming to an end, he
applied for the job of assistant to Sir Humphry Davy, the celebrated chemist, whose
lectures Faraday was attending during his apprenticeship. When Davy asked the
advice of one of the governors of the Royal Institution of Great Britain about the
employment of a young bookbinder, the man said: “Let him wash bottles! If he is
any good he will accept the work; if he refuses, he is not good for anything.”
Faraday accepted, and remained with the Royal Institution for the next fifty years,
first as Davy’s assistant, then as his collaborator, and finally, after Davy’s death, as
his successor. It has been said that Faraday was Davy’s greatest discovery.
In 1823 Faraday liquefied chlorine and in 1825 he discovered the substance
known as benzene. He also did significant work in electrochemistry, discovering
the laws of electrolysis. However, his greatest work was with electricity. In 1821
Faraday built two devices to produce what he called electromagnetic rotation,
that is, a continuous circular motion from the circular magnetic force around a
wire. Ten years later, in 1831, he began his great series of experiments in which
he discovered electromagnetic induction. These experiments form the basis of
modern electromagnetic technology.
Apart from numerous publications in scientific magazines, the most remarkable
document pertaining to his studies is his Diary, which he kept continuously
from the year 1820 to the year 1862. (This was published in 1932 by the Royal
Institution in seven volumes containing a total of 3236 pages, with a few thousand
marginal drawings.) Queen Victoria rewarded Faraday’s lifetime of achievement
by granting him the use of a house at Hampton Court and a knighthood. Faraday
accepted the cottage but gracefully rejected the knighthood.1
On 13 September 1845, Faraday discovered the magneto-optical effect that
bears his name. This day’s entry in his Diary reads: “Today worked with lines
of magnetic force, passing them across different bodies (transparent in different
directions) and at the same time passing a polarized ray of light through them
and afterwards examining the ray by a Nichol’s Eyepiece or other means.” After
describing several negative results in which the ray of light was passed through
air and several other substances, Faraday wrote in the same day’s entry: “A
piece of heavy glass which was 2 inches by 1.8 inches, and 0.5 of an inch thick,
being silico borate of lead, and polished on the two shortest edges, was
experimented with. It gave no effects when the same magnetic poles or the
contrary poles were on opposite sides (as respects the course of the polarized
ray) – nor when the same poles were on the same side, either with a constant or
intermitting current – BUT, when contrary magnetic poles were on the same
side, there was an effect produced on the polarized ray, and thus magnetic force
and light were proved to have relation to each other. This fact will most likely
prove exceedingly fertile and of great value in the investigation of both con-
ditions of natural force.”
Electromagnetic basis of the Faraday effect

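Magneto-optical (MO) effects are best described in terms of the dielectric tensor ε
of the medium in which the interaction between the light and the applied magnetic
field (or the internal magnetization of the medium) takes place:2
$$
\varepsilon = \begin{pmatrix}
\varepsilon_{xx} & \varepsilon_{xy} & \varepsilon_{xz} \\
\varepsilon_{yx} & \varepsilon_{yy} & \varepsilon_{yz} \\
\varepsilon_{zx} & \varepsilon_{zy} & \varepsilon_{zz}
\end{pmatrix}.
$$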
In an isotropic material (such as ordinary glass) the three diagonal elements are
identical and, in the presence of a magnetic field along the Z-axis, there is a non-zero
off-diagonal element ε′, which couples the x- and y-components of the
optical E-field, that is,
$$
\varepsilon = \begin{pmatrix}
\varepsilon & \varepsilon' & 0 \\
-\varepsilon' & \varepsilon & 0 \\
0 & 0 & \varepsilon
\end{pmatrix}.
$$
In general, ε and ε′ are wavelength dependent, but over a narrow range of
wavelengths they might be treatable as constants. In a transparent material,
where there is no optical absorption, ε is real and ε′ is imaginary. However, in
the most general case of an absorbing MO material both ε and ε′ may be
complex numbers. For diamagnetic and paramagnetic media ε′ is proportional
to the applied magnetic field H, while for ferromagnetic and ferrimagnetic
materials spin–orbit coupling is the dominant source of the MO interaction,
making ε′ proportional to the magnetization M of the medium.2 Since B = H
+ 4πM (in CGS units), we consider the B-field inside the medium as the source
of the MO effects.
Now we discuss the basis of the MO effect. When a polarized beam of light
propagates in a medium along the direction of the magnetic field B, the right and
left circularly polarized (RCP and LCP) components of the beam experience
different refractive indices, $n_\pm = (\varepsilon \pm i\varepsilon')^{1/2}$. For fused silica glass at a wavelength
λ = 550 nm, for example, ε ≈ 2.25, and ε′ ≈ 10⁻⁷ i per kOe of applied magnetic
field. (Note that both $n_+$ and $n_-$ in this case are real-valued and, therefore, there is
no absorption.) For linearly polarized light passing through a length L of the
material under the influence of a B-field, the two circular-polarization components
suffer a relative phase shift $\Delta\phi = 2\pi L (n_+ - n_-)/\lambda$.3,4 As shown in Figure 12.1,
a change in the relative phase of the RCP and LCP components is equivalent to a
rotation of the plane of polarization by the Faraday angle $\theta_F = \tfrac{1}{2}\Delta\phi$. In the above
example, θF ≈ 0.22° at λ = 550 nm for a slab 1 cm thick immersed in a 1 kOe
magnetic field. The figure of 0.22°/(cm·kOe) is known as the Verdet constant of
fused silica at the specified wavelength.3
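As a check on the fused-silica figure, the chain $n_\pm \to \Delta\phi \to \theta_F$ can be evaluated numerically. The function below is a small sketch written for this purpose (its name and argument order are assumptions); it takes the diagonal and off-diagonal tensor elements, the path length, and the wavelength, the last two in the same length unit. The sign of the result depends on the handedness and time-dependence conventions, so only the magnitude is compared with the quoted value.

```python
import cmath
import math

def faraday_rotation_deg(eps, eps_prime, length, wavelength):
    """Faraday angle theta_F = (1/2) * 2*pi*L*(n_plus - n_minus)/lambda, in degrees.

    Uses n_pm = sqrt(eps +/- i*eps_prime); only the real parts of the
    indices contribute to the phase difference.
    """
    n_plus = cmath.sqrt(eps + 1j * eps_prime)
    n_minus = cmath.sqrt(eps - 1j * eps_prime)
    delta_phi = 2.0 * math.pi * length * (n_plus - n_minus).real / wavelength
    return math.degrees(0.5 * delta_phi)

# Fused silica at 550 nm in a 1 kOe field, 1 cm path (values from the text).
theta_silica = faraday_rotation_deg(2.25, 1e-7j, 1e-2, 550e-9)
print(f"|theta_F| = {abs(theta_silica):.3f} degrees per cm per kOe")
```

Since ε′ is proportional to the applied field for such media, the same function scales linearly with field strength as long as $|\varepsilon'| \ll \varepsilon$.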
Certain magnetic materials (e.g., magnetic garnets) are transparent enough to
transmit a good fraction of the light while producing a fairly large Faraday
rotation. These materials can be magnetized in a given direction and sustain their
magnetization when the external field is removed. Therefore, the Faraday effect in
these media may be observed in the absence of an external magnetic field. At
λ = 550 nm, for instance, a typical crystal of bismuth-substituted rare-earth
iron garnet may have ε ≈ 5.5 + 0.025i and ε′ ≈ 0.002 − 0.01i. The complex
refractive indices for RCP and LCP light are thus $(n + ik)_+ \approx 2.347 + 0.006i$ and
[Figure 12.1: an incident linearly polarized field E decomposed into RCP and LCP components, and the emergent components of amplitudes a₊E and a₋E.]
Figure 12.1 A linearly polarized beam of light may be considered as the
superposition of equal amounts of right and left circularly polarized beams. In
going through a perpendicularly magnetized slab of material at normal incidence,
the two components of circular polarization experience different (complex)
refractive indices and, therefore, each emerges from the medium with a
different phase and amplitude. The amplitudes of the emergent beams may be
denoted by $a_+$ and $a_-$, and their phase difference by $\Delta\phi$. The superposition of
the emergent circular polarization states yields elliptical polarization. The angle
of rotation of the major axis of the ellipse from the horizontal direction (which is
the direction of the incident linear polarization) is given by $\theta = \tfrac{1}{2}\Delta\phi$, and the
ellipticity η is given by $\tan\eta = (a_+ - a_-)/(a_+ + a_-)$.
$(n + ik)_- \approx 2.343 + 0.005i$, yielding a Faraday rotation angle θF ≈ 1.3° for a
micron-thick slab of this crystal. The absorption coefficient of the material is
$\alpha = 4\pi k/\lambda$, where k is the imaginary part of the complex refractive index. For the
above garnet, therefore, α ≈ 0.12 per micron, which is equivalent to 1 dB loss of
light for every 2 µm of crystal thickness. In other words, this garnet delivers 2.6°
of polarization rotation per dB of loss. These crystals can be grown in a range of
thicknesses from a fraction of a micron to about 100 microns. Thicker crystals are
useful at longer wavelengths, where the losses are small, but the Faraday rotation
generally decreases with increasing wavelength as well.
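The garnet figures of merit can be recomputed from the quoted tensor elements. The sketch below is illustrative only: it averages the two extinction coefficients to obtain a single absorption coefficient (an assumption made here, since the text does not say which k is used) and converts the intensity loss to decibels. With unrounded indices the rotation comes out near 1.4° per micron; rounding the indices to three decimals, as in the text, gives the quoted 1.3°.

```python
import cmath
import math

lam_um = 0.550                       # wavelength, micrometers
eps = 5.5 + 0.025j                   # diagonal element (from the text)
eps_prime = 0.002 - 0.01j            # off-diagonal element (from the text)

n_plus = cmath.sqrt(eps + 1j * eps_prime)    # complex index (n + ik)_+
n_minus = cmath.sqrt(eps - 1j * eps_prime)   # complex index (n + ik)_-

# Faraday rotation for a 1 um thick slab: theta_F = pi * L * (n_+ - n_-) / lambda.
theta_F_deg = math.degrees(math.pi * 1.0 * (n_plus.real - n_minus.real) / lam_um)

# Intensity absorption coefficient, using the mean extinction coefficient (assumption).
k_mean = 0.5 * (n_plus.imag + n_minus.imag)
alpha_per_um = 4.0 * math.pi * k_mean / lam_um
loss_db_per_um = 10.0 * math.log10(math.e) * alpha_per_um
rotation_per_db = theta_F_deg / loss_db_per_um

print(f"n_plus  = {n_plus:.4f}")
print(f"n_minus = {n_minus:.4f}")
print(f"theta_F = {theta_F_deg:.2f} deg per um")
print(f"alpha   = {alpha_per_um:.3f} per um ({loss_db_per_um:.2f} dB per um)")
print(f"figure of merit = {rotation_per_db:.2f} deg per dB")
```

The ratio of rotation to loss is the quantity that matters when choosing a crystal thickness, since both scale linearly with thickness in this single-pass picture.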
Faraday rotation in a transparent slab

For the sake of simplicity we will ignore the effects of absorption in the Faraday
medium and consider a transparent slab of magnetic material having a real ε and
a purely imaginary ε′. Thus we consider a slab 20 µm thick having ε = 5.5,
ε′ = 0.01i. The material is magnetized perpendicularly to the plane of its surface,
and a linearly polarized beam of light (with its E-field along the X-axis) is sent at
normal incidence through the slab, as in Figure 12.2(a).5,6 Real sources of light,
of course, are never perfectly monochromatic and, therefore, we assume a finite
spectral bandwidth for the light source, covering the range λ = 545–555 nm.
Figure 12.3 shows computed plots of the transmitted amplitudes, $|t_x|$ and $|t_y|$, as
well as the polarization rotation and ellipticity angles, θF and ηF, versus λ. Because
[Figure 12.2: panels (a) and (b) show a beam passing through a Faraday medium magnetized along Z; (a) normal incidence with Ex incident and Ex, Ey transmitted; (b) oblique incidence with Ep incident and Ep, Es transmitted.]
Figure 12.2 Faraday effect in the polar geometry. (a) In going normally
through a slab of magnetic material, a linearly polarized beam of light with its
E-field along the X-axis acquires a component of polarization along Y. The lines
of B-field shown within the medium represent either an externally applied magnetic
field or the intrinsic magnetization of the medium. (b) The effect is also
observed at oblique incidence. Shown here is a p-polarized incident beam, which
acquires an s-component upon transmission through the magnetic medium. (If the
incident beam is s-polarized, the magneto-optically induced polarization is then in
the p-direction.) In general, upon reversing the B-field from the +Z to the −Z
direction the magneto-optically induced component of polarization changes sign.
of multiple reflections at the front and rear facets of the slab these functions vary
periodically with λ. (The same interference phenomena are responsible for the non-zero
values of ηF, which would otherwise be absent in a transparent medium.) The
net Faraday rotation angle is the average value of θF over the relevant range of
wavelengths, but one should also recognize that the wavelength dependence of the
direction of emergent polarization produces a certain amount of depolarization in
the emergent beam. The Faraday rotation combined with the spectral bandwidth of
the light source thus causes partial depolarization as a direct consequence of
interference among the multiple reflections.
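For orientation, the single-pass rotation of this slab, i.e. the value one would expect if the facet reflections were absent, can be estimated from $\theta_F = \pi L (n_+ - n_-)/\lambda$ with $n_\pm = (\varepsilon \pm i\varepsilon')^{1/2}$. The sketch below is a rough baseline under that no-reflection assumption and is not a substitute for the full multiple-reflection calculation behind Figure 12.3; the function name is hypothetical.

```python
import math

def single_pass_rotation_deg(eps, eps_prime_imag, thickness_um, wavelength_um):
    """Single-pass Faraday angle for real eps and eps' = i * eps_prime_imag.

    With a purely imaginary eps', eps +/- i*eps' = eps -/+ eps_prime_imag is real,
    so both circular-polarization indices are real.
    """
    n_plus = math.sqrt(eps - eps_prime_imag)
    n_minus = math.sqrt(eps + eps_prime_imag)
    delta_phi = 2.0 * math.pi * thickness_um * (n_plus - n_minus) / wavelength_um
    return math.degrees(0.5 * delta_phi)

# Slab of the text: 20 um thick, eps = 5.5, eps' = 0.01i, source spanning 545-555 nm.
rotation = {lam_nm: single_pass_rotation_deg(5.5, 0.01, 20.0, lam_nm * 1e-3)
            for lam_nm in (545, 550, 555)}
for lam_nm, angle in rotation.items():
    print(f"lambda = {lam_nm} nm: |theta_F| = {abs(angle):.2f} deg")
```

The single-pass value changes by only about half a degree across the source bandwidth, so the much larger wavelength dependence discussed above is attributable to interference rather than to dispersion of the single-pass rotation in this model.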
Oblique incidence
Figure 12.4 shows the transmitted amplitudes and polarization angles versus
the angle of incidence θ in the case of the slab 20 µm thick magnetized along the
Z-axis (ε = 5.5, ε′ = 0.01i) when, as shown in Figure 12.2(b), a p-polarized plane
wave at the single wavelength of λ = 550 nm is incident on the slab. The
[Figure 12.3: (a) amplitudes |tx| and |ty| versus λ (nm); (b) rotation θF and ellipticity ηF (degrees) versus λ (nm).]
Figure 12.3 A plane wave, linearly polarized along the X-axis, is normally
incident on a slab 20 µm thick, as shown in Figure 12.2(a). The slab (ε = 5.5,
ε′ = 0.01i) is magnetized along the Z-axis. (a) Plots of $|t_x|$ and $|t_y|$, the transmitted
polarization components along the X- and Y-axes, as functions of λ. (b) Plots of
polarization rotation angle θF and ellipticity ηF, versus λ.
oscillations in the transmitted amplitudes and polarization angles are caused by

interference among the beams multiply reflected from the facets of the slab.
Aside from these interference oscillations, however, note that the Faraday effect
does not show any signs of abatement with increasing angle of incidence. The
reason is that even though the direction of propagation of the beam increasingly
deviates from the direction of the B-field, the propagation distance simultan-
eously increases, keeping the net interaction between the magnetic material and
the beam of light at a constant level.
Figure 12.5 shows results for the case of oblique incidence, at θ = 85°, on the
same slab as above in the range λ = 545–555 nm. As in the case of normal
incidence depicted in Figure 12.3, we note a significant variation of the Faraday
angles and the amplitudes within this narrow range of wavelengths. Although the
[Figure 12.4: (a) amplitudes |tpp| and |tsp| versus θ (degrees); (b) rotation θF and ellipticity ηF (degrees) versus θ (degrees), for θ from 0° to 90°.]
Figure 12.4 A p-polarized plane wave (λ = 550 nm) is incident at oblique
angle θ on a slab 20 µm thick, as shown in Figure 12.2(b). The slab (ε = 5.5,
ε′ = 0.01i) is magnetized along the Z-axis. (a) Plots of $|t_{pp}|$ and $|t_{sp}|$, the transmitted
polarization components along the p- and s-directions, as functions of θ.
(b) Plots of θF and ηF versus θ.
beam inside the slab travels at 25° relative to the direction of magnetization of
the material, the maximum Faraday effect as exemplified by $|t_{sp}|$ is the same as at
normal incidence, because the propagation distance is correspondingly adjusted.
The wavelength-averaged Faraday rotation may be lower at larger angles of
incidence, but this is just a consequence of interference; it is not caused by any
reduction in the intrinsic optical activity of the slab. If, for instance, the facets are
antireflection coated, or if the beam enters and exits through index-matched
spherical surfaces, then multiple reflections would be eliminated and the Faraday
rotation becomes independent of the incidence angle.
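The geometric argument can be checked numerically. The following Python sketch
rests on three assumptions that are not spelled out in the text: the ray inside the
slab obeys Snell's law with n = √ε (the small ε′ is neglected), the magneto-optical
coupling per unit length scales with the cosine of the angle between the ray and the
Z-axis, and multiple reflections are ignored. The function name is illustrative only.

```python
import math

def internal_geometry(theta_deg, eps=5.5, thickness_um=20.0):
    """Return (internal angle in degrees, path length in um, path along Z in um)."""
    n = math.sqrt(eps)                                   # refractive index, eps' neglected
    theta_in = math.asin(math.sin(math.radians(theta_deg)) / n)  # Snell's law
    path = thickness_um / math.cos(theta_in)             # distance travelled inside the slab
    projected = path * math.cos(theta_in)                # component of the path along B (Z-axis)
    return math.degrees(theta_in), path, projected

for theta in (0, 30, 60, 85):
    angle, path, projected = internal_geometry(theta)
    print(f"theta = {theta:2d} deg: internal angle = {angle:5.2f} deg, "
          f"path = {path:5.2f} um, path along B = {projected:5.2f} um")
```

Under these assumptions the internal angle at 85° incidence comes out close to the
25° quoted above, and the longer path exactly offsets the reduced projection onto
the B-field at every angle.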
The above discussions were confined to the case of a p-polarized incident
beam, but the conclusions remain valid for s-polarized light as well. For
example, Figure 12.6 is the counterpart of Figure 12.4, showing the transmitted
Figure 12.5 A p-polarized plane wave is incident at θ = 85° on the slab
described in Figures 12.2–12.4. (a) Plots of |tpp| and |tsp|, the transmitted
p- and s-components of polarization, as functions of λ. (b) Plots of θF and ηF
versus λ.
amplitudes and polarization angles versus the angle of incidence for an s-polarized
incident beam. Note that the magneto-optically generated component of
polarization tps in Figure 12.6 is identical to tsp in Figure 12.4. This is an
important and completely general result, indicating that the amount of light
converted from one polarization state to another is independent of the incident
polarization state.
Faraday medium in a Fabry–Pérot resonator

Because the Faraday effect is amplified when the beam propagates back and forth
within a magnetized medium, it is interesting to observe the enhancement of the
Faraday effect in a Fabry–Pérot resonator. Figure 12.7 shows a system that may
be used to monitor such enhancement over a range of angles of incidence.
Figure 12.6 Same as Figure 12.4, except that here the incident beam is s-polarized.
Figure 12.7 A Faraday medium in a Fabry–Pérot resonator is placed in a
convergent cone of light. The incident plane wave is linearly polarized along the
X-axis, and the 0.8NA focusing lens is free from aberrations. The Faraday medium,
20 μm thick, and with ε = 5.5, ε′ = 0.01i, is uniformly magnetized along the Z-axis.
The mirrors coated on the front and back facets of the Faraday slab each consist of
10 alternating layers of high-index (n = 2.0) and low-index (n = 1.5)
quarter-wave-thick dielectrics. The collimating lens is identical to the focusing
objective, and the emergent beam is observed at the exit pupil of the collimator.
Figure 12.8 Intensity and polarization patterns in the exit pupil of the collimating
lens of Figure 12.7; the horizontal axis of each panel spans x/λ = −3200 to 3200.
(a) The intensity distribution of the emergent X-polarized component. The bright
rings indicate the regions where the conditions of resonance are met and the light
passes through the resonator. (b) The intensity distribution of the emergent
Y-polarized component. The bright rings coincide with those in (a), indicating that
the conditions of resonance for the incident polarization are the same as those for
the magneto-optically induced polarization. (c) Polarization rotation angle θF of the
emergent beam encoded in gray-scale. The range of values of θF is −23° (black) to
+63° (white). (d) The polarization ellipticity ηF of the emergent beam encoded in
gray-scale. The range of values of ηF is −32° (black) to +42° (white).
The first objective lens (NA = 0.8) focuses a linearly polarized beam of light onto
the Fabry–Pérot resonator, and the second, identical, lens collimates the transmitted
beam, thus allowing observation at the exit pupil. For a slab of transparent
magnetic material 20 μm thick sandwiched between a pair of dielectric mirrors,
Figure 12.8 shows the computed patterns of intensity and polarization angle at the
exit pupil of the collimator. This figure indicates that the rings of maximum
transmission also correspond to locations of maximum polarization rotation. The
maximum and minimum rotation angles in Figure 12.8(c) are +63° and −23°,
respectively, well in excess of the rotations obtained from the bare slab. Also note
in Figures 12.8(c), (d) the asymmetrical nature of the polarization angles in the
first and third quadrants, on the one hand, and in the second and fourth quadrants
on the other hand.
Figure 12.9 (a) Longitudinal Faraday effect is observed when the direction of
the B-field within the slab of material is parallel both to the surface of the slab
and to the plane of incidence. The rotation of polarization in this case occurs
only at oblique incidence, where, upon transmission, a p-polarized beam
acquires an s-component and vice versa. If the direction of B is reversed, the
magneto-optically induced component of polarization will change sign. (b) The
transverse effect occurs when the B-field lies in the plane of the sample
perpendicular to the plane of incidence. The MO interaction in this case occurs only
when the incident beam is p-polarized. Even then there is no polarization
rotation; the only effect is that a change in the magnitude of the B-field causes a
slight change in the magnitude of the transmitted p-light. The transverse effect is
small and is not bipolar, meaning that reversing the direction of B does not affect
the emergent beam.
Longitudinal and transverse geometries

When the direction of the B-field is in the plane of the slab as well as in
the plane of incidence, as in Figure 12.9(a), one observes the longitudinal
Faraday effect. In this case ε′ occupies the position of εyz in the dielectric tensor.
The transverse effect occurs when the B-field, while in the plane of the sample,
is perpendicular to the incidence plane, as in Figure 12.9(b). In this case
ε′ occupies the position of εxz.
In the longitudinal case at normal incidence no polarization rotation
occurs, but the effect begins to show with increasing angle of incidence. For a
Figure 12.10 The longitudinal Faraday effect arising when a p-polarized plane
wave (λ = 550 nm) is incident at oblique angle θ on a slab 20 μm thick. The slab
(ε = 5.5, ε′ = 0.01i) is magnetized along the X-axis, as depicted in Figure 12.9(a).
(a) The transmitted amplitudes |tpp| and |tsp| versus θ. (b) The polarization
rotation angle θF and the ellipticity ηF versus θ.
p-polarized plane wave (λ = 550 nm) obliquely incident on a slab of magnetic
material 20 μm thick (ε = 5.5, ε′ = 0.01i), Figure 12.10 shows the computed
amplitudes of the transmitted p- and s-polarized light as well as the angles of
rotation and ellipticity versus the incidence angle θ. One could readily
compute similar results for an s-polarized incident beam as well. In both
cases the MO effect is bipolar, meaning that a reversal of the direction of the
B-field reverses the signs of θF and ηF. Moreover, as in the polar case discussed
earlier, the magneto-optically generated component of polarization
Figure 12.11 The transverse Faraday effect arising when a p-polarized
plane wave (λ = 550 nm) is incident at oblique angle θ on a slab 20 μm thick.
The slab (ε = 5.5) is magnetized along the Y-axis, as shown in Figure 12.9(b).
In the absence of the B-field, ε′ = 0, and the transmission of the slab for a
p-polarized incident beam is denoted by Tp(0). When a strong B-field is
introduced (corresponding to ε′ = 0.1i in this case), the transmission changes
to Tp. Shown here is the transmission differential ΔTp = Tp − Tp(0) as a
function of θ.
turns out to be the same for both directions of incident polarization; that is,
tsp = tps.
The transverse effect is very different from both the polar and the longitudinal
effects. With s-polarized incident light, where the optical E-field is parallel to
the direction of the B-field in the slab, there is no MO effect whatsoever,
but for the p-polarized light the medium exhibits an effective refractive index
n = [ε + (ε′²/ε)]^(1/2). Thus in the transverse case neither s- nor p-polarized beams
undergo polarization rotation, but the magnitude of the transmitted p-light shows a
weak dependence on magnetization, that is, Tp = |tp|² becomes a function of the
strength of the B-field. The transverse effect is not bipolar, so that changing the
direction of the B-field from +Y to −Y does not alter the magnitude of Tp. For a
slab of transparent material 20 μm thick and with a fairly large MO coefficient
(ε = 5.5, ε′ = 0.1i), Figure 12.11 shows computed plots of Tp(0) (i.e., transmission in
the absence of a B-field, when ε′ = 0) and ΔTp = Tp − Tp(0) versus the angle of
incidence θ. Note, in particular, that ΔTp ≈ 0 around the Brewster angle θB = 66.9°,
where a vanishing surface reflectivity results in minimal interference effects.
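As a quick numerical illustration of the transverse case, the sketch below evaluates
the effective index n = [ε + (ε′²/ε)]^(1/2) for the two directions of the B-field and
the Brewster angle of the unmagnetized slab. It assumes that reversing B simply
changes the sign of ε′ and that the Brewster angle is arctan √ε for incidence from
air; the function and variable names are illustrative.

```python
import cmath
import math

def effective_index(eps, eps_prime):
    """Effective index seen by p-polarized light in the transverse geometry."""
    return cmath.sqrt(eps + eps_prime ** 2 / eps)

eps = 5.5
n_zero = effective_index(eps, 0.0)      # no B-field (eps' = 0)
n_plus = effective_index(eps, 0.1j)     # B along +Y
n_minus = effective_index(eps, -0.1j)   # B along -Y (sign of eps' reversed)

brewster_deg = math.degrees(math.atan(math.sqrt(eps)))  # Brewster angle, incidence from air

print(f"n (no field) = {n_zero.real:.5f}")
print(f"n (B along +Y) = {n_plus.real:.5f}")
print(f"n (B along -Y) = {n_minus.real:.5f}")
print(f"Brewster angle = {brewster_deg:.1f} deg")
```

Because ε′ enters only through its square, the two field directions give the same
index, which is one way to see why the transverse Faraday effect is not bipolar.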

1 Adapted from George Gamow, The Great Physicists from Galileo to Einstein, Dover
Publications, New York, 1961. Some of the historical anecdotes have been compiled
from information available on the worldwide web; see, for example, www.phy.uct.ac.za,
www.iee.org.uk, www.woodrow.org.
2 P. S. Pershan, Magneto-optical effects, J. Appl. Phys. 38, 1482–1490 (1967).
New York, 1976.
4 R. W. Wood, Physical Optics, third edition, Optical Society of America, Washington
DC, 1988.
5 D. O. Smith, Magneto-optical scattering from multilayer magnetic and dielectric
films, Opt. Acta 12, 13 (1965).
13
The magneto-optical Kerr effect
The Scottish physicist John Kerr (1824–1907) discovered the magneto-optical
effect named after him in 1877. When linearly polarized light is reflected from
the polished surface of a magnetized medium its polarization vector rotates and
becomes somewhat elliptical. The direction of rotation and the sense of ellipticity
are reversed when the direction of magnetization M of the sample is reversed,
thus providing a powerful tool for optically monitoring the state of magnetization
of the sample under investigation.1,2,3
The physical mechanism of the Kerr effect is identical to that of the Faraday
effect and, in fact, the same theoretical model can be used to describe both
phenomena, one in reflection, the other in transmission (see Chapter 12, “The
Faraday effect”).
The Kerr effect can be analyzed under quite general conditions, with the
direction of magnetization of the sample oriented arbitrarily relative to the
plane of incidence of the light beam. However, the three geometries shown in
Figure 13.1 are of particular importance and will be analyzed separately in the
present chapter. When the magnetization M is perpendicular to the sample’s
surface, the observed phenomenon is referred to as the polar Kerr effect. When
M is parallel to the surface and in the plane of incidence, the Kerr effect is
longitudinal. Finally, when M is parallel to the surface but perpendicular to the
plane of incidence, the observed phenomenon is known as the transverse Kerr
effect.4,5
Electromagnetic basis of the Kerr effect

For convenience, we repeat in this short section the relevant text from
Chapter 12. Magneto-optical (MO) effects are best described in terms of the
dielectric tensor ε of the medium in which the interaction between the light
Figure 13.1 The MO Kerr effect is polar, longitudinal, or transverse,
depending on the orientation of the magnetic moment M relative to the sample’s
surface and to the plane of incidence. The incident beam is p- or s-polarized
according to whether its E-field is in the plane of incidence (Ep) or perpendicular
to it (Es).
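and the applied magnetic field (or the internal magnetization of the medium)
takes place:1
\[
\varepsilon = \begin{pmatrix}
\varepsilon_{xx} & \varepsilon_{xy} & \varepsilon_{xz} \\
\varepsilon_{yx} & \varepsilon_{yy} & \varepsilon_{yz} \\
\varepsilon_{zx} & \varepsilon_{zy} & \varepsilon_{zz}
\end{pmatrix}.
\]
In an isotropic material the three diagonal elements are identical and, in the
presence of a magnetic field along the Z-axis, there is a non-zero off-diagonal
element ε′, which couples the x- and y-components of the optical E-field:
\[
\varepsilon = \begin{pmatrix}
\varepsilon & \varepsilon' & 0 \\
-\varepsilon' & \varepsilon & 0 \\
0 & 0 & \varepsilon
\end{pmatrix}.
\]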
In general, ε and ε′ are wavelength-dependent, but over a narrow range of
wavelengths they may be treated as constants. In a transparent material, where
there is no optical absorption, ε is real and ε′ is imaginary. However, in the most
general case of an absorbing MO material both ε and ε′ would be complex
numbers. For diamagnetic and paramagnetic media ε′ is proportional to the
applied magnetic field H, while for ferromagnetic and ferrimagnetic materials
spin–orbit coupling is the dominant source of the MO interaction, making ε′
proportional to the magnetization M of the medium.1 Since B = H + 4πM (in
CGS units), in general the B-field inside the medium may be considered the
source of the MO effects.
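The role of the off-diagonal element can be made concrete with a short check.
The sketch below assumes the antisymmetric form of the tensor for a field along Z,
with ε′ in the xy position and −ε′ in the yx position, and applies its x-y block to
the two circular field vectors (1, ±i). Which of these vectors is called right or left
circular depends on the sign convention in use, so the code labels them only by
sign; the function name is illustrative.

```python
def circular_mode_permittivity(eps, eps_prime, sign):
    """Apply the x-y block [[eps, eps'], [-eps', eps]] to the field (1, sign*i)."""
    ex, ey = 1.0, sign * 1j
    dx = eps * ex + eps_prime * ey        # x-component of the resulting vector
    dy = -eps_prime * ex + eps * ey       # y-component of the resulting vector
    ratio = dx / ex
    # (1, sign*i) is an eigenvector only if both components scale by the same factor.
    if abs(dy - ratio * ey) > 1e-12:
        raise ValueError("field is not an eigenvector of the tensor block")
    return ratio

eps, eps_prime = 5.5, 0.01j   # illustrative transparent medium: eps real, eps' imaginary
for sign in (+1, -1):
    eff = circular_mode_permittivity(eps, eps_prime, sign)
    n = eff ** 0.5
    label = "+" if sign > 0 else "-"
    print(f"field (1, {label}i): effective permittivity = {eff.real:.2f}, n = {n.real:.5f}")
```

Each circular field is reproduced up to a scalar factor ε ± iε′, so the two circular
polarizations propagate along the field direction with slightly different refractive
indices.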
When a polarized beam of light propagates in a medium along the direction of
the magnetic field B, the right and left circularly polarized (RCP and LCP)
components of the beam experience different refractive indices n± = (ε ± iε′)^(1/2).
Since the Fresnel reflection coefficients depend on the refractive index, the two
circular polarizations are reflected with different reflectivities, r+ and r−, say.
When r+ and r− happen to have a phase difference, the reflected beam exhibits a
polarization rotation, and if the magnitudes |r+| and |r−| differ from each other,
then there will be some degree of ellipticity. When the medium is transparent, n±
are real and, therefore, there is no phase difference between r+ and r−, although
their magnitudes will be different. In this case the reflected light exhibits
polarization ellipticity only. However, in the general case of reflection from the
surface of an absorbing medium (both ε and ε′ complex), the reflected light
exhibits elliptical polarization, with the major axis of the ellipse rotated relative
to the direction of incident polarization.
For concreteness, we will confine our attention throughout this chapter
to a metallic magnetic material having ε = −8 + 27i and ε′ = 0.6 + 0.2i at
the red HeNe wavelength, λ0 = 633 nm. This is typical of the TbFeCo amorphous
alloys used in magneto-optical disks for data storage. The discussion, however,
will be kept quite general in nature, and the conclusions drawn from specific
examples should be applicable to a wide variety of magnetic materials.
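To give a feel for the magnitudes involved, the following sketch evaluates the two
circular-mode reflection coefficients at normal incidence from air. It takes
ε = −8 + 27i and ε′ = 0.6 + 0.2i (the signs are an assumption of this sketch), uses the
Fresnel form r = (1 − n)/(1 + n), and reports only the magnitudes of the rotation and
ellipticity, since their signs depend on conventions not fixed here. The function
name is illustrative.

```python
import cmath
import math

def polar_kerr_normal_incidence(eps, eps_prime):
    """Circular-mode reflection at normal incidence from a polar-magnetized surface."""
    n_plus = cmath.sqrt(eps + 1j * eps_prime)
    n_minus = cmath.sqrt(eps - 1j * eps_prime)
    r_plus = (1 - n_plus) / (1 + n_plus)       # Fresnel coefficient, incidence from air
    r_minus = (1 - n_minus) / (1 + n_minus)
    r_avg = 0.5 * (r_plus + r_minus)           # ordinary, polarization-preserving amplitude
    r_cross = 0.5 * abs(r_plus - r_minus)      # magnitude of the cross-polarized amplitude
    rotation = 0.5 * abs(cmath.phase(r_plus / r_minus))            # radians
    ratio = (abs(r_plus) - abs(r_minus)) / (abs(r_plus) + abs(r_minus))
    ellipticity = abs(math.atan(ratio))                            # radians
    return r_avg, r_cross, math.degrees(rotation), math.degrees(ellipticity)

# Assumed material constants (signs are an assumption of this sketch).
r_avg, r_cross, rotation_deg, ellipticity_deg = polar_kerr_normal_incidence(
    -8 + 27j, 0.6 + 0.2j)

print(f"|r| = {abs(r_avg):.3f}")
print(f"|cross-polarized amplitude| = {r_cross:.5f}")
print(f"|rotation| = {rotation_deg:.3f} deg, |ellipticity| = {ellipticity_deg:.3f} deg")
```

With these constants the ordinary reflection amplitude is roughly 0.8 while the
cross-polarized amplitude is a few parts in a thousand, and the rotation and
ellipticity are both a fraction of a degree.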
The polar effect

Figure 13.2(a) shows computed plots of the various reflection coefficients versus
the angle of incidence θ for the case of a perpendicularly magnetized sample. The
conventional reflection coefficients for p- and s-light, rpp and rss, show the behavior
expected for a metallic surface. We denote by rps the cross-polarization factor from
incident p to reflected s, and by rsp that from incident s to reflected p. These
coefficients represent the ability of the magnetic medium to convert, upon reflection,
p-polarized light into s, and vice versa. It can be shown quite generally that
rps = rsp at all angles of incidence. Thus the power of the magnetic medium to
“rotate” the polarization is independent of whether the incident beam is p- or
s-polarized. However, the polarization rotation and ellipticity angles, ρ and η,
which depend on rpp and rss as well as rps, exhibit differing behaviors for p- and
s-light (see Figures 13.2(b), (c)). Note also in Figure 13.2(a) that rps remains more
or less constant up to fairly large angles of incidence.
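The behavior of rpp and rss expected for a metallic surface can be reproduced with
the ordinary Fresnel formulas. The sketch below switches the MO term off (ε′ = 0),
takes ε = −8 + 27i as an assumed value, and considers incidence from air; sign
conventions for rpp vary between texts, so only magnitudes are printed. The
function name is illustrative.

```python
import cmath
import math

def fresnel_coefficients(eps, theta_deg):
    """Return (r_pp, r_ss) for incidence from air on a medium of permittivity eps."""
    theta = math.radians(theta_deg)
    cos_t = math.cos(theta)
    kz = cmath.sqrt(eps - math.sin(theta) ** 2)   # normalized z-component of k in the medium
    r_ss = (cos_t - kz) / (cos_t + kz)
    r_pp = (eps * cos_t - kz) / (eps * cos_t + kz)
    return r_pp, r_ss

eps_metal = -8 + 27j   # assumed permittivity of the magnetic metal, MO term ignored
for theta in (0, 15, 30, 45, 60, 75, 85):
    r_pp, r_ss = fresnel_coefficients(eps_metal, theta)
    print(f"theta = {theta:2d} deg: |rpp| = {abs(r_pp):.3f}, |rss| = {abs(r_ss):.3f}")
```

The two magnitudes coincide at normal incidence; as θ grows, |rss| rises toward
unity while |rpp| first falls, which is the pattern seen in Figure 13.2(a).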
The longitudinal effect

Plots of the various reflection coefficients versus the angle of incidence θ for the
longitudinal geometry appear in Figure 13.3(a). As in the polar case, it turns out
that rsp = rps for all values of θ. At normal incidence the interaction between the
incident E-field and the magnetization of the medium cannot produce any
polarization rotation; therefore, rsp = 0 at θ = 0. As θ increases, however, the MO
Figure 13.2 A linearly polarized plane wave is reflected from the polished
surface of a magnetic material having perpendicular magnetization (the polar
case); εxx = −8 + 27i, εxy = 0.6 + 0.2i. (a) Plots of |rpp|, |rss|, and |rsp| = |rps|
versus the angle of incidence θ; the |rsp| curve is scaled by a factor of 150.
(b) The polarization rotation angle ρ and the ellipticity η versus θ for a
p-polarized incident beam. (c) Same as (b) for an s-polarized beam.
signal gains strength, peaking at θ = 65°. Again, ρ and η depend on whether the
incident polarization is p or s (see Figures 13.3(b), (c)), but the effective MO
signal, rsp, is independent of the incident polarization. The longitudinal MO
signal is typically weaker than its polar counterpart by almost one order of
magnitude.
Figure 13.3 Same as Figure 13.2 but here for the longitudinal Kerr effect; in (a)
the |rsp| curve is scaled by a factor of 1000. Again rsp = rps at all angles of
incidence. The MO effect is zero at normal incidence, reaching its peak at a fairly
large angle. Note that |rps|, ρ, η are about an order of magnitude smaller than their
counterparts for the polar geometry case. Both polar and longitudinal effects are
bipolar, in the sense that a reversal in the direction of M results in a π phase shift
of rps, leading to a reversal in the signs of both ρ and η.
The transverse effect

The behavior of the reflected light in this case differs fundamentally from that
in the other two cases. First, there is no interaction whatsoever between
the magnetic moment of the sample and s-polarized light. Here the optical E-field
is parallel to M and, therefore, does not “see” the magnetization of the sample.
When the incident beam is p-polarized, the interaction is confined to the plane of
incidence, creating an extra E-field component within the same plane. Unlike the
polar and longitudinal effects, no E-fields are generated perpendicular to the
plane of incidence. Therefore, there are no polarization rotations in the transverse
geometry. What is interesting, however, is that the reflectivity of the sample,
Rp = |rpp|², depends on the magnitude and direction of the magnetic moment M.
In Figure 13.4(a) the reflectivity in the absence of M is denoted by Rp(0) (i.e., ε′
is set to zero). With M pointing along +Y the reflectivity changes slightly, becoming
Rp(+); the difference is shown as the solid curve at the bottom of Figure 13.4(a).
Similarly, when M is reversed to point along −Y, the corresponding change in
Rp is given by the broken curve. The change in Rp is thus seen to depend on the
direction of M. This behavior is rather curious and, at first sight, appears to
violate the principles of symmetry, although a careful analysis shows it to be
correct.1 It is noteworthy that this bipolar nature of Rp critically depends on the
magnetic medium being absorptive; for transparent magnetic media (where ε
Figure 13.4 Variation of the reflectivity Rp with the magnitude and/or direction
of M in the transverse geometry. The incident beam is p-polarized in all
cases; there are no transverse effects for s-polarized light. (a) The dependence of
Rp on the angle of incidence θ; the superscript zero indicates that M = 0. When
the medium is fully magnetized in the ±Y direction, the reflectivity is denoted by
Rp(±). (b) The variation of Rp with M at various angles of incidence. At θ = 0° the
dependence on M is quadratic, while at θ = 20°, 40°, 60° it is nearly linear.
(The off-diagonal element ε′ of the dielectric tensor is assumed to be directly
proportional to M.) The reflectivity differences are plotted magnified 100 times.
is purely real and ε′ purely imaginary), the dependence of Rp on M is quadratic,
showing no change with the reversal of the direction of magnetization.
Figure 13.4(b) shows the variations in the reflectivity difference Rp(M) − Rp(0)
with the magnitude of M, as M varies continuously from a maximum value along
+Y to zero and then reverses direction and reaches a maximum in the opposite
direction. At normal incidence the dependence on M is quadratic, but at larger
angles (20°, 40°, 60°) it is almost (but not quite) linear. Like the longitudinal
effect, the transverse effect in this case is about an order of magnitude weaker
than the polar effect.
Localized probe of the state of magnetization

It is sometimes desirable to probe the local state of a magnetic surface. This can
be done by focusing onto the surface a polarized laser beam through a high-NA
objective, as shown in Figure 13.5. The lens focuses the beam to a diffraction-
limited spot (diameter ~ λ0), providing access to the sample’s magnetization
within a tiny region. The focused beam, of course, contains many rays arriving at
the sample from different directions, making the analysis of the resulting Kerr
signal somewhat tedious.
To begin with, even in the absence of a magnetic moment M the reflected
polarization state is complicated. Figure 13.6 shows the various distributions at the
exit pupil of the objective when M is set to zero. The intensity of the x-component
Figure 13.5 A linearly polarized beam of light having its E-field parallel to
the X-axis is focused onto the flat surface of a magnetic medium through a
diffraction-limited microscope objective lens (NA = 0.95, f = 3158λ). The power
of the incident beam – its integrated intensity – is set to unity. The reflected
light’s distribution at the exit pupil has a small but important contribution from
the magnetization M of the sample.
Figure 13.6 Various distributions at the exit pupil of the objective of Figure 13.5,
when M is set to zero (i.e., no Kerr effect); the horizontal axis of each panel spans
x/λ = −3200 to 3200. (a) Distribution of intensity for the reflected Ex; the total
power = 0.62. (b) Distribution of phase for Ex; min = 0°, max = 55°. (c) Distribution
of intensity for the reflected Ey; the total power = 0.011. (d) Distribution of phase
for Ey; min = −36°, max = 150°. (e) The polarization rotation angle ρ; ρmin = −20.5°,
ρmax = 20.5°. (f) The polarization ellipticity η; ηmin = −25.1°, ηmax = 25.1°.
of the reflected light, Ix = |Ex|², depicted in Figure 13.6(a), shows slight variations
across the aperture, in agreement with the rpp and rss curves of Figure 13.2(a).
Similar variations are seen in the corresponding phase plot of Figure 13.6(b). In
addition to Ex, the reflected light also contains a y-component, Ey, whose intensity
and phase plots appear in Figures 13.6(c), (d). While the total power (i.e., the
integrated intensity) of Ex is 62% of the incident power, that of Ey is only 1.1%.
The reflected Ey in adjacent quadrants of the aperture exhibits a phase shift of π,
indicating a sign reversal from one quadrant to the next. The presence of Ey in the
reflected beam gives rise to the patterns of polarization rotation and ellipticity
depicted in Figures 13.6(e), (f); note the fairly large values of ρ and η in the four
corners of the aperture (ρmin, ρmax = ∓20.5°; ηmin, ηmax = ∓25.1°).
To determine the contribution to the reflected E-field by the sample’s
magnetization, we compute the complex reflected amplitudes for M up and
M down, then subtract one distribution from the other. In the process the
x-component of polarization disappears, indicating that Ex is indifferent to the
reversal of M. However, the residual y-component shows the distribution depicted
in Figure 13.7. The total power of Ey contributed by the MO interaction in this case
is 0.0042% of the incident power. Both the phase and intensity of this residual Ey
are fairly uniform, with the intensity showing a mild decline towards the edge of
the aperture, consistent with the behavior of rsp in Figure 13.2(a). (Note that, even
at NA = 0.95, the largest angle of incidence on the sample is less than 72°.)
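The largest angle of incidence follows directly from the numerical aperture. The
short sketch below assumes the usual definition NA = n sin θmax with the sample in
air (n = 1); the function name is illustrative.

```python
import math

def max_incidence_angle_deg(numerical_aperture, ambient_index=1.0):
    """Half-angle of the focused cone for an objective of the given NA."""
    return math.degrees(math.asin(numerical_aperture / ambient_index))

for na in (0.75, 0.8, 0.95):
    angle = max_incidence_angle_deg(na)
    print(f"NA = {na:.2f}: largest angle of incidence = {angle:.1f} deg")
```

For NA = 0.95 the cone half-angle falls just short of 72°, so the steepest rays stay
within the range over which rsp varies only mildly in the polar geometry.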
A similar calculation for the longitudinal case yields the plots in Figure 13.8.
Here the complex amplitude distributions are computed for M along +X and −X,
then subtracted from each other. Unlike the polar Kerr signal in Figure 13.7, both
the reflected Ex and the reflected Ey in the longitudinal geometry contain some
MO contribution. The total power of the MO contribution to Ex is 0.0000065%,
which is rather small and concentrated in the four corners of the aperture. Note
that the top half of the aperture containing the Ex signal has a π phase shift
relative to the bottom half. In contrast, the Ey contribution to the MO signal (see
Figures 13.8(c), (d)) contains 0.000054% of the incident power, equally divided
between the right and left halves of the aperture with a π phase shift.
Finally, if the magnetization of the sample in Figure 13.5 is aligned with the
Y-axis (perpendicular to the plane of the figure) then the MO contributions to the
reflected beam will be those shown in Figure 13.9. As before, we obtain these
distributions by computing the complex amplitudes at the exit pupil with M along
+Y and −Y and then subtracting one from the other. The MO contribution to Ex,
having 0.00026% of the incident power, is fairly strong. The contribution to Ey
contains 0.000054% of the incident power, exactly as in the longitudinal case
Figure 13.7 Contribution of the magnetic moment M of the sample in
Figure 13.5 to the Ey distribution at the objective’s exit pupil; M is assumed to
be perpendicular to the sample’s surface. (a), (b) Intensity and phase patterns
of Ey; the total power = 0.42 × 10⁻⁴.
Figure 13.8 Contribution of the magnetic moment of the sample to the E-field
distribution at the exit pupil of the objective of Figure 13.5; M is assumed to be
aligned with the X-axis. (a), (b) Intensity and phase patterns of Ex; total power
≈ 0.65 × 10⁻⁷. The top and bottom halves of the aperture have a relative phase of π.
(c), (d) Intensity and phase patterns of Ey; the total power = 0.54 × 10⁻⁶. Note
the π phase difference between the right and left halves of the aperture.
depicted in Figures 13.8(c), (d). Note that, with the exception of a 90° rotation of
coordinates, the distributions in Figures 13.9(c), (d) are identical to those in
Figures 13.8(c), (d).
Signal detection
The MO contribution to the reflected polarization state can be converted to an
electronic signal with the aid of polarization-sensitive optics and photodetectors.
For instance, to detect the polar Kerr signal shown in Figure 13.7, one can employ
the differential scheme shown in Figure 13.10. Here the reflected beam is directed
toward a quarter-wave plate, which helps to eliminate the phase shift between Ex
and Ey. The quarter-wave plate is followed by a Wollaston prism, which mixes the
MO component of polarization contained in Ey with the reflected x-component of
polarization, Ex. The two mixed beams emerging from the Wollaston are detected
by a pair of photodetectors whose difference signal ΔS conveys information about
the sample’s magnetic state. A computed plot of the normalized differential signal
versus the orientation angle ψ of M is given in Figure 13.11. As M moves away
Figure 13.9 Same as Figure 13.8, but for the transverse geometry, where M is
switched between +Y and −Y directions. Ex, depicted in (a), (b), has total power
≈ 0.26 × 10⁻⁵. Ey, depicted in (c), (d), has total power ≈ 0.54 × 10⁻⁶.
Figure 13.10 A differential detection scheme is used to probe the direction of
M via the state of polarization of the reflected beam. To attain high spatial
resolution, the laser beam is focused on the sample surface. The reflected beam
goes through a quarter-wave plate, whose fast and slow axes are at 45° to the
direction of incident polarization. The Wollaston prism divides the beam
between two photodetectors, and the difference ΔS between the outputs of these
detectors is monitored. To maximize the swing of ΔS one must adjust the
orientation of the Wollaston around the optical axis.
Figure 13.11 The normalized differential signal, plotted as 100ΔS/(S1 + S2), as a
function of the orientation angle ψ of M (see Figure 13.10). The detection module
has been adjusted for maximum swing of ΔS. This signal is bipolar, in the sense
that it switches sign when M is reversed. ΔS is zero at ψ = 90° (i.e., M in the
plane of the sample).
from its initial orientation at ψ = 0° toward the plane of the sample at ψ = 90°,
and continues downward until ψ = 180°, ΔS follows these changes continuously.
(We mention in passing that, as M rotates, the sum signal S1 + S2 undergoes slight
variations, but, for all practical purposes, it remains a constant.)
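The essence of the differential scheme can be sketched for a single plane wave. The
model below is a deliberate simplification and not the focused-beam calculation
behind Figure 13.11: an idealized phase compensator stands in for the quarter-wave
plate, the polarizing splitter is taken to be oriented at 45° to Ex, the polar MO
amplitude is assumed to vary as cos ψ, and the amplitudes 0.8 and 0.0033 with their
phases are illustrative values rather than results from the text.

```python
import cmath
import math

def differential_signal(ex, ey, compensator_phase):
    """Normalized difference (S1 - S2)/(S1 + S2) behind a 45-degree polarizing splitter."""
    ey_shifted = ey * cmath.exp(1j * compensator_phase)   # idealized phase compensator
    e1 = (ex + ey_shifted) / math.sqrt(2)                 # field reaching detector 1
    e2 = (ex - ey_shifted) / math.sqrt(2)                 # field reaching detector 2
    s1, s2 = abs(e1) ** 2, abs(e2) ** 2
    return (s1 - s2) / (s1 + s2)

# Illustrative (assumed) amplitudes: ordinary reflection and polar MO component.
r_ordinary = 0.8 * cmath.exp(1j * 2.8)
r_mo = 0.0033 * cmath.exp(1j * 0.9)
best_phase = cmath.phase(r_ordinary) - cmath.phase(r_mo)  # cancels the Ex-Ey phase difference

def signal_versus_orientation(psi_deg):
    """Differential signal when M makes an angle psi with the surface normal."""
    ey = r_mo * math.cos(math.radians(psi_deg))  # assumed: polar component of M ~ cos(psi)
    return differential_signal(r_ordinary, ey, best_phase)

for psi in (0, 45, 90, 135, 180):
    print(f"psi = {psi:3d} deg: 100 * dS / (S1 + S2) = {100 * signal_versus_orientation(psi):+.3f}")
```

In this simplified picture the signal is proportional to the product of the ordinary
and MO amplitudes, vanishes when M lies in the plane of the sample, and changes
sign when M is reversed.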
Similar systems may be designed to extract the longitudinal and transverse
MO signals depicted in Figures 13.8 and 13.9. However, because in these cases
the E-field contributions have different signs in opposite halves of the aperture,
any viable detection scheme must extract the signals from these half-apertures
separately, before combining them with the proper sign at the end.
Enhancing the Kerr signal

To enhance the MO signal one should force the magnetic sample to absorb a
greater fraction of the incident beam. As an example of how this can be done,
consider the system of Figure 13.12, which consists of a magnetic sample placed
under a high-reflectivity dielectric mirror. The majority of the rays in the focused
beam are reflected from the mirror without ever reaching the magnetic sample.
However, when the direction of the ray is such that the cavity between the mirror
and the sample becomes resonant, the ray is strongly absorbed by the magnetic
sample. This strong absorption produces in the reflected beam a rather large
polarization component perpendicular to the incident E-field, which can then be
detected at the exit pupil of the objective.
Figure 13.12 A collimated linearly polarized laser beam (λ = 633 nm) is
focused through a 0.75NA, f = 4000λ objective onto a high-reflectivity dielectric
mirror sitting on top of a magnetic sample (ε = −8 + 27i, ε′ = 0.6 + 0.2i). The
mirror, consisting of seven pairs of high- and low-index quarter-wave layers
(nH = 2, nL = 1.5), is deposited on a glass substrate 10 μm thick and of index
n = 1.5. The substrate is in direct contact with the magnetic surface. The
magnetization M is uniform and perpendicular to the plane of the sample’s surface,
and the beam entering the lens is polarized along the X-axis. Upon reflection
from the sample, the light collected by the objective is photographed at the exit
pupil. Most rays within the focused beam are reflected at the mirror, without ever
reaching the magnetic sample. At certain angles of incidence, however, where
the cavity becomes resonant, the light passes through to the magnetic medium
and is absorbed by it. It is only for these resonant rays that the MO effect is
observed at the exit pupil.
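How reflective such a mirror is on its own can be estimated with the standard
characteristic-matrix method for thin films. The sketch below treats normal
incidence at the design wavelength only, and assumes that the high-index layer
faces the incident medium (air) and that the glass substrate acts as the exit medium;
neither the layer order nor the function name comes from the text.

```python
def quarter_wave_stack_reflectivity(n_incident, n_high, n_low, pairs, n_exit):
    """Normal-incidence reflectivity of (HL)^pairs quarter-wave layers at the design wavelength."""
    # A quarter-wave layer of index n has characteristic matrix [[0, i/n], [i*n, 0]].
    m = [[1, 0], [0, 1]]
    for _ in range(pairs):
        for n in (n_high, n_low):          # high-index layer first (assumed order)
            layer = [[0, 1j / n], [1j * n, 0]]
            m = [[m[0][0] * layer[0][0] + m[0][1] * layer[1][0],
                  m[0][0] * layer[0][1] + m[0][1] * layer[1][1]],
                 [m[1][0] * layer[0][0] + m[1][1] * layer[1][0],
                  m[1][0] * layer[0][1] + m[1][1] * layer[1][1]]]
    b = m[0][0] + m[0][1] * n_exit         # normalized electric field at the front surface
    c = m[1][0] + m[1][1] * n_exit         # normalized magnetic field at the front surface
    r = (n_incident * b - c) / (n_incident * b + c)
    return abs(r) ** 2

reflectivity = quarter_wave_stack_reflectivity(1.0, 2.0, 1.5, 7, 1.5)
print(f"Seven-pair mirror on glass: R = {reflectivity:.4f}")
```

Under these assumptions the seven-pair stack reflects roughly 95% of the light at
normal incidence, consistent with the statement that most rays never reach the
magnetic sample.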
Figure 13.13 shows the computed distributions at the exit pupil of a 0.75NA
lens. The intensity plot for Ex in Figure 13.13(a) shows absorption bands in the
angular spectrum of the incident beam. The reflected Ey in Figure 13.13(b) is
strong in certain regions of the aperture, but these contributions mostly come
from spurious light reflected from the mirror, not from the magnetic sample.
To determine the MO signal at the exit pupil, we once again compute the
reflected complex amplitudes with M up and M down, then subtract the corres-
ponding distributions. Figure 13.13(c) is the result of this calculation, showing
the intensity of the residual Ey contributed by the MO interaction. The peak value
of this MO signal is nearly twice that shown in Figure 13.7(a).
Figure 13.13 Various distributions at the exit pupil of the objective of
Figure 13.12; the horizontal axis is $x/\lambda$, ranging from –3200 to 3200. (a) Intensity
of the reflected Ex; the total power = 87% of the incident power. (b) Intensity of
Ey; the total power = 0.3% of the incident power. Most of this Ey, which is
primarily produced by oblique reflections from the dielectric mirror, serves only
to obscure the MO-generated component of polarization. (c) The true MO signal
obtained by subtracting the distributions produced with M up and M down; the
total power is 0.0007% of the incident power.
Quadrilayer stack
A practical method of enhancing the MO Kerr effect involves the incorporation
of a thin magnetic film in a quadrilayer stack structure. Figure 13.14 shows one
such stack, consisting of an aluminum reflector, a dielectric underlayer, a thin
magnetic film, and a dielectric overlayer. By optimizing the thicknesses of these
layers it is possible to improve the MO signal substantially. In the following
example, we will fix the thicknesses of three of the layers and optimize the
thickness of the remaining one. This results in a significant gain in the performance
of the stack. (It is possible to achieve further improvement by optimizing the other
layers as well.)
Figure 13.15 shows plots of the reflectivity $R = |r_{pp}|^2 = |r_{ss}|^2$, the MO Kerr
signal $|r_{sp}|$, and the polarization rotation and ellipticity, all versus the thickness $t_2$
of the dielectric underlayer. (Since the dependence on $t_2$ is periodic, only one
period, ranging from zero to $\lambda_0/(2n)$, is shown.) Note that $|r_{sp}|$ peaks when $R$ is at
a minimum, and vice versa. The maximum value of $|r_{sp}|$ in this example is about
three times greater than that of the bare magnetic sample shown in Figure 13.2.
Figure 13.14 A quadrilayer MO stack consists of an aluminum reflector, an
intermediate dielectric layer (thickness $t_2$), a thin magneto-optic film, and an
overcoating dielectric layer (thickness $t_1$). The thicknesses of the various layers
may be adjusted to maximize the MO signal $|r_{sp}|$ obtained upon reflection.
Figure 13.15 The dependence of reflected signals from a quadrilayer MO stack on
the thickness of the dielectric underlayer ($\lambda_0 = 633$ nm, normal incidence). The
aluminum layer ($n + ik = 1.4 + 7.6i$) is 50 nm thick, the MO film is 20 nm thick, and the
overcoat layer ($n = 2$) is 80 nm thick ($t_1 = \lambda_0/(4n)$). The underlayer’s index of
refraction is $n = 2$, but its thickness $t_2$ is adjustable. (a) Computed plots of the
reflectivity $R$ and the MO signal $|r_{sp}|$ versus $t_2$ (0 to 150 nm). (b) The Kerr rotation
angle $\rho$ and the ellipticity $\eta$ versus $t_2$. The maximum of $\rho$ occurs when $\eta$ is
nearly zero, and vice versa.

1 P. S. Pershan, Magneto-optical effects, J. Appl. Phys. 38, 1482–1490 (1967).
New York, 1976.
3 R. W. Wood, Physical Optics, third edition, Optical Society of America, Washington
DC, 1988.
4 D. O. Smith, Magneto-optical scattering from multilayer magnetic and dielectric
films, Optica Acta 12, 13 (1965).
14
The Sagnac interferometer
The Sagnac effect pertains to the relative phase shift between two beams of light
that travel on an identical path in opposite directions within a rotating frame.1,2,3,4
Modern fiber-optic gyroscopes (Sagnac interferometers) used for navigation are
based on this effect, allowing highly accurate measurements of rotation rates
down to about $10^{-4}$–$10^{-5}$ degrees per hour. Georges Sagnac (1869–1926) was
the first to perform a ring interferometry experiment in 1913 aimed at observing
the correlation of angular velocity and optical phase-shift. (An experiment con-
ducted in 1911 by Francis Harress, attempting to measure the Fresnel drag of
light propagating through rotating glass, was later recognized as actually con-
stituting a Sagnac experiment; Harress had ascribed the observed “unexpected
bias” to some other factor.) An ambitious ring interferometry experiment was set
up by Albert Michelson and Henry Gale in 1925 to determine whether the Earth’s
rotation has an effect on the propagation of light in its vicinity. The Michelson–
Gale interferometer with a 1.9 km perimeter was large enough to detect the
rotation of the Earth, confirming its known value of angular velocity (obtained
from astronomical observations). The Michelson–Gale ring interferometer was
not calibrated by comparison with an outside reference, an impossible task given
that the setup was fixed to the Earth.
Figure 14.1 shows the general design of a triangular Sagnac interferometer
consisting of a light source, a beam-splitter S, mirrors M1, M2, and an observation
plane, mounted on a base that rotates at a constant angular velocity $\boldsymbol{\Omega}$
around a fixed axis. The rotation axis, not necessarily perpendicular to the plane
of the interferometer, crosses that plane at C. The source and the observation
plane are mounted on the same rotating base as M1, M2, and S, although, strictly
speaking, this is not necessary (that is, either the source or the observation plane
or both may be stationary while the rest of the system rotates; this would
require synchronizing the light pulses with the rotating base, but would not
modify the behavior of the system in any significant way).
Figure 14.1 Diagram of a Sagnac interferometer consisting of the beam-splitter S
and the mirrors M1 and M2. The instruments are mounted on a base that
rotates around a fixed axis at a constant angular velocity $\boldsymbol{\Omega}$. The rotation axis,
which crosses the plane of the interferometer at C, is not necessarily perpendicular
to that plane. In this configuration, the source and the observation plane
are mounted on the same (rotating) platform as M1, M2, and S.
Between the source and the observation plane, the clockwise-propagating beam
undergoes four reflections, whereas the beam that travels counterclockwise suffers two
reflections (at M1, M2) and two transmissions (both at S). Given the 90° phase
difference between the reflection and transmission coefficients of any beam-splitter,
the counter-propagating beams arrive at the observation plane with a
relative phase of 180°, thus cancelling each other out and resulting in a dark (or
null) fringe. Any phase difference imparted to the two beams as a result of the
rotation of the system will therefore change the strength of the signal picked up
by a photodetector at the observation plane.
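The dark fringe, and its sensitivity to a rotation-induced phase, can be illustrated with a minimal sketch. It assumes an ideal lossless 50/50 beam-splitter (reflection coefficient $i/\sqrt{2}$, transmission coefficient $1/\sqrt{2}$) and identical mirror reflectivities for the two directions; the function name `output_intensity` and its argument `sagnac_phase` are illustrative and do not come from the text.

```python
import cmath
import math

def output_intensity(sagnac_phase: float) -> float:
    """Normalized intensity at the observation plane of an idealized Sagnac loop."""
    r = 1j / math.sqrt(2)   # assumed splitter reflection coefficient (90 degrees ahead of t)
    t = 1 / math.sqrt(2)    # assumed splitter transmission coefficient
    clockwise = r * r                               # two reflections at S
    counter = t * t * cmath.exp(1j * sagnac_phase)  # two transmissions at S, plus relative phase
    return abs(clockwise + counter) ** 2

for phase in (0.0, math.pi / 2, math.pi):
    print(f"relative phase {phase:.3f} rad -> intensity {output_intensity(phase):.3f}")
```

Under these assumptions the detected intensity varies as $\sin^2(\Delta\varphi/2)$, so the stationary interferometer sits on the null and any rotation-induced phase raises the signal.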
There exist references in the literature to a “fringe shift” at the observation
plane resulting from the rotation of the Sagnac interferometer.3 Such fringe shifts
are observed only when a small misalignment of the system causes the separation
of the counter-propagating beams upon arrival at the beam-splitter S. As a specific
example based on typical numerical values, Figure 14.2 shows a Sagnac inter-
ferometer incorporating a lens that focuses the incoming laser beam onto the front
and back sides of the beam-splitter after the beam has completed its round-trips in
the two opposite directions (red: clockwise, blue: counter-clockwise). A slight
tilt of one of the mirrors (say, M2) separates the two focused spots at S. The
beams emerging from these two spots travel in parallel and interfere at the
observation plane; the inset in Figure 14.2 shows the (computed) fringes that
result from this interference. Any phase difference of the counter-propagating
beams produced in a rotating system will cause a lateral shift of the fringes within
the observation plane.
Figure 14.2 A lens (NA = 0.005, $f = 3.0$ m) focuses the incoming laser beam
($\lambda_0 = 0.633$ μm, 1/e amplitude diameter = 1.0 cm) onto the beam-splitter after
the completion of a single round-trip in either direction (red: clockwise, blue:
counter-clockwise). The 0.9 m-long arms of the interferometer form an equilateral
triangle. The observation plane is 1.0 m away from the beam-splitter S,
which consists of an 8.0 nm-thick silver film on a glass substrate ($R_s = 50.5\%$,
$T_s = 46.8\%$). The mirrors M1 and M2 are 0.5 μm-thick silver films on glass
substrates ($R_M = 97\%$). Tilting one of the mirrors (say, M2) by $\Delta\theta = 0.05°$
separates the focused spots at the beam-splitter by 1.57 mm (in the direction
parallel to the observation plane); each focused spot is 0.24 mm in diameter.
The beams emerging from the focused spots travel in parallel and interfere at the
observation plane; the inset shows computed fringes over a 6 × 6 mm² area.
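As a rough consistency check on the numbers quoted for Figure 14.2, the sketch below estimates the spot separation from the fact that a mirror tilt deflects the reflected beam by twice the tilt angle, acting over the 0.9 m arm between M2 and S. The fringe-period estimate is an added assumption (the two focused spots are treated as a pair of mutually coherent point sources, as in Young's experiment) and is not stated in the text.

```python
import math

arm_length = 0.9                 # m, arm of the equilateral triangle (Figure 14.2)
tilt = math.radians(0.05)        # rad, tilt of mirror M2
wavelength = 0.633e-6            # m
observation_distance = 1.0       # m, from the beam-splitter to the observation plane

# A mirror tilt deflects the reflected beam by twice the tilt angle.
separation = arm_length * math.tan(2 * tilt)

# Assumption: two coherent point-like sources, so the period is wavelength * distance / separation.
fringe_period = wavelength * observation_distance / separation

print(f"spot separation ~ {separation * 1e3:.2f} mm")
print(f"fringe period   ~ {fringe_period * 1e3:.2f} mm")
```

The estimated separation agrees with the 1.57 mm quoted in the caption; the fringe period is only as good as the two-point-source assumption.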
The three main features of a Sagnac interferometer are as follows:

1. The observed relative phase between counter-propagating beams around the Sagnac
loop is proportional to $\mathbf{A}\cdot\boldsymbol{\Omega}$, where $\mathbf{A}$ is the loop area (a vector normal to the
plane of the loop) and $\boldsymbol{\Omega}$ is the loop’s angular velocity, irrespective of the shape of
the loop, or the location and orientation of the rotation axis.
2. Doppler shifts produced by reflections from the moving beam-splitter and mirrors do
not give rise to different optical frequencies for the counter-propagating beams when
these beams arrive at the observation plane, even though the frequencies of the two
beams could be different elsewhere along the path.
3. The refractive indices of the media traversed by the counter-propagating beams do not
affect the relative phase of these beams, provided that such media co-rotate with the
rest of the Sagnac interferometer.
The objective of the present chapter is to explain the physical basis of the
above features of the Sagnac interferometer without resort to the principles of
general relativity. Our goal is to provide an explanation based on geometry, the
theory of special relativity, and the classical theory of optical wave propagation,
while maintaining some level of generality.
Fundamental formula of the Sagnac interferometer

Figure 14.3 shows that, in its clockwise path around the Sagnac loop, the light
beam propagates the distances $r_1$, $r_2$, $r_3$ along the unit vectors $\boldsymbol{\sigma}_1$, $\boldsymbol{\sigma}_2$, $\boldsymbol{\sigma}_3$. A total
of four reflections (at S, M1, M2, and again at S) bring the beam from the source
to the observation plane. With respect to the rotation center C, which is in the
$r_1r_2r_3$ plane, the centers of M1, M2, S are located at $\mathbf{R}_1$, $\mathbf{R}_2$, $\mathbf{R}_3$, respectively.
The area $A$ of the triangle $r_1r_2r_3$ is equal to the area $A_1$ of $r_1R_1R_3$ plus the
area $A_2$ of $R_3R_2r_3$ minus the area $A_3$ of $R_1R_2r_2$. The monochromatic light
source (frequency $= f_0$, wavelength $\lambda_0 = c/f_0$, wave-number $k_0 = 2\pi/\lambda_0 = 2\pi f_0/c$)
launches the incident beam along the unit vector $\boldsymbol{\sigma}_0$; the emergent beam reaches
the observation plane along the unit vector $\boldsymbol{\sigma}_4$. Note that $\boldsymbol{\sigma}_1 \times \mathbf{R}_1$, a vector
perpendicular to the plane of $r_1R_1R_3$ with a magnitude equal to the perpendicular
distance from C to $r_1$, is exactly equal to $\boldsymbol{\sigma}_1 \times \mathbf{R}_3$. Similarly,
$\boldsymbol{\sigma}_2 \times \mathbf{R}_1 = \boldsymbol{\sigma}_2 \times \mathbf{R}_2$ and $\boldsymbol{\sigma}_3 \times \mathbf{R}_2 = \boldsymbol{\sigma}_3 \times \mathbf{R}_3$. It is thus clear that
$r_1\boldsymbol{\sigma}_1 \times \mathbf{R}_1 = r_1\boldsymbol{\sigma}_1 \times \mathbf{R}_3 = 2\mathbf{A}_1$; similar identities
may be readily established for the areas $A_2$ and $A_3$ of the triangles $R_3R_2r_3$ and
$R_1R_2r_2$ as well.
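Consider a plane-wave that leaves the beam-splitter S and propagates toward
the mirror M1 along $\boldsymbol{\sigma}_1$; the complex amplitude of this wave may be expressed as
$a_0 \exp[i k_0(\boldsymbol{\sigma}_1 \cdot \mathbf{r} - ct)]$. The destination of the beam is the center of M1 located at
$\mathbf{r}_1 = r_1\boldsymbol{\sigma}_1$ relative to the center of S, and the travel time is $\Delta t = r_1/c$. However, the
rotation of the interferometer causes the center of M1 to shift to a new location
$\mathbf{r}_1' = \mathbf{r}_1 - \mathbf{R}_1 \times \boldsymbol{\Omega}\,\Delta t$.
Upon arriving at M1 the additional phase of the beam due to this rotation will be
$$
\begin{aligned}
\Delta\varphi_1 &= k_0\,\boldsymbol{\sigma}_1 \cdot (\mathbf{r}_1' - \mathbf{r}_1) = -k_0\,\boldsymbol{\sigma}_1 \cdot (\mathbf{R}_1 \times \boldsymbol{\Omega})\,\Delta t = -k_0 (\boldsymbol{\sigma}_1 \times \mathbf{R}_1) \cdot \boldsymbol{\Omega}\, r_1/c \\
&= -k_0 (\mathbf{r}_1 \times \mathbf{R}_1) \cdot \boldsymbol{\Omega}/c = -2k_0\, \mathbf{A}_1 \cdot \boldsymbol{\Omega}/c.
\end{aligned}
\tag{14.1}
$$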
In the above derivation, we have used the vector identity
$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$,
which is readily proven by considering the volume of a parallelepiped constructed
around $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$. $A_1$ is the area of the triangle with base $r_1$ and vertex C (i.e., the
point at which the rotation axis crosses the interferometer’s plane). With reference
to Figure 14.3, the sign of the cross-product giving rise to $\mathbf{A}_1$ is positive in
the present example.
Figure 14.3 In the clockwise path around the Sagnac loop, the light beam
propagates the distances $r_1$, $r_2$, $r_3$ along the unit vectors $\boldsymbol{\sigma}_1$, $\boldsymbol{\sigma}_2$, $\boldsymbol{\sigma}_3$. Four
reflections (at S, M1, M2, and again at S) bring the beam from the source to the
observation plane. With respect to the rotation center C, the centers of M1, M2, S
are at $\mathbf{R}_1$, $\mathbf{R}_2$, $\mathbf{R}_3$, respectively. The monochromatic light source (frequency $= f_0$)
launches the incident beam along $\boldsymbol{\sigma}_0$; the emergent beam (frequency $= f_4$)
reaches the observation plane along $\boldsymbol{\sigma}_4$. Viewed from an inertial frame outside
the rotating system, $f_4$ could differ from $f_0$; from the perspective of a co-rotating
observer, however, the two frequencies are identical.
When the contributions to the acquired phase of the three
legs of the interferometer are added up, the positive and negative areas combine
to yield the net area $\mathbf{A}$ of the triangle enclosed by the light beam circulating
around the loop. The total phase shift must be doubled when the counter-clockwise-propagating
beam is taken into account as well. The net phase between
the two beams at the observation plane is
$$
\Delta\varphi = 4k_0\, \mathbf{A} \cdot \boldsymbol{\Omega}/c.
\tag{14.2}
$$
Equation (14.2) is the fundamental equation of the Sagnac interferometer.
Although the present analysis has been limited to a triangular loop, it is obvious
that the procedure can be readily extended to loops of arbitrary shape, without
modifying the fundamental formula.
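The claim that the leg-by-leg cross products add up to the enclosed area, regardless of where the rotation axis crosses the plane, can be checked numerically. The sketch below assumes a planar loop with the rotation axis normal to it; the helper names `loop_area` and `sagnac_phase`, the vertex coordinates, and the two trial positions of C are illustrative choices, not values from the text.

```python
import math

C_LIGHT = 299_792_458.0  # m/s

def loop_area(vertices, center):
    """Half the sum, over the legs, of the z-component of (r_i sigma_i) x R_i."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        leg = (x1 - x0, y1 - y0)                  # r_i * sigma_i
        R = (x0 - center[0], y0 - center[1])      # from C to the start of the leg
        total += leg[0] * R[1] - leg[1] * R[0]    # z-component of leg x R, i.e. 2*A_i
    return total / 2

def sagnac_phase(area, omega, wavelength):
    """Eq. (14.2) with the area vector parallel to the rotation axis."""
    return 4 * (2 * math.pi / wavelength) * area * omega / C_LIGHT

side = 0.9  # m; S, M1, M2 listed in clockwise order
vertices = [(0.0, 0.0), (side / 2, side * math.sqrt(3) / 2), (side, 0.0)]
for center in [(0.3, 0.2), (5.0, -3.0)]:
    area = loop_area(vertices, center)
    print(center, area, sagnac_phase(area, 1.0, 0.633e-6))
```

Both trial positions of C give the same area, which is the geometric content of the statement that the signed areas $A_1$, $A_2$, $A_3$ combine into $A$.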
Note that, in arriving at Eq. (14.2), we ignored the Doppler shifts of the beams
produced by the moving reflectors. The reason is that the analysis has been per-
formed from the perspective of a co-rotating observer, i.e., one who resides on the
rotating platform. For this observer, the source, the splitter, and the mirrors are
stationary and, therefore, the light’s frequency everywhere is the same. The
analysis of the next section, conducted from the viewpoint of an observer residing
in an inertial frame outside the rotating platform, provides an alternative derivation
of Eq. (14.2) using the Doppler shifts produced by the moving reflectors.
Figure 14.4 shows that the counter-propagating beams in each arm of the
Sagnac loop interfere with each other, setting up a standing-wave pattern
(period $= \tfrac{1}{2}\lambda_0$) that is stationary within the rotating platform. The effect of the
rotation on the standing wave is a uniform, longitudinal shift of the fringes by
$\mathbf{A} \cdot \boldsymbol{\Omega}/c$. The reason for the fringe shift is that, at any given point around the
loop, one beam arrives with its phase advanced, while the other (counter-propagating)
beam arrives with its phase retarded. The combined phase-shift of
$\Delta\varphi = 2k_0\, \mathbf{A} \cdot \boldsymbol{\Omega}/c$ is the same as the accumulated phase in a single round trip for
either beam; the proof follows the same line of argument as that employed in
conjunction with Eq. (14.1).
Doppler shift caused by moving reflectors

To analyze the Doppler shift upon reflection from the moving mirrors M1, M2,
and the beam-splitter S, we consider the system depicted in Figure 14.5. In the $xyz$
frame of reference, the mirror M moves with constant velocity $v$ along the $z$-axis.
From the perspective of an observer in the $x'y'z'$ frame in which the mirror is
stationary, the incident and reflected plane-waves have the same frequency $f'$
(wavelength $\lambda' = c/f'$), and the propagation directions are uniquely identified with
unit vectors $\boldsymbol{\sigma}_1'$ and $\boldsymbol{\sigma}_2'$. In the $xyz$ frame, however, the incident plane-wave has
frequency $f_1$, wavelength $\lambda_1 = c/f_1$, and propagation direction $\boldsymbol{\sigma}_1$, whereas the
reflected wave’s parameters are $f_2$, $\lambda_2$, and $\boldsymbol{\sigma}_2$. The Lorentz transformation of the
spatio-temporal coordinates from $(x', y', z', t')$ to $(x, y, z, t)$ yields the relationship
between the incident and reflected waves in the $xyz$ frame.
In the co-moving $x'y'z'$ frame, the complex amplitudes of the incident and
reflected beams may be written as follows ($\boldsymbol{\sigma}'$ denotes $\boldsymbol{\sigma}_1'$ for the incident beam
and $\boldsymbol{\sigma}_2'$ for the reflected beam):
$$
a(x', y', z', t') = a_0' \exp[i(2\pi f'/c)(\sigma_x' x' + \sigma_y' y' + \sigma_z' z' - ct')].
\tag{14.3}
$$
Substituting $x' = x$, $y' = y$, $z' = (z - vt)/\sqrt{1 - v^2/c^2}$, $t' = (t - vz/c^2)/\sqrt{1 - v^2/c^2}$ in
accordance with the Lorentz transformation yields
$$
a(x, y, z, t) = a_0 \exp[i(2\pi f/c)(\sigma_x x + \sigma_y y + \sigma_z z - ct)],
\tag{14.4a}
$$
where
$$
f = f'\,[1 + (v/c)\sigma_z']/\sqrt{1 - v^2/c^2},
\tag{14.4b}
$$
$$
\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)
= \left(\sigma_x'\sqrt{1 - v^2/c^2},\ \sigma_y'\sqrt{1 - v^2/c^2},\ \sigma_z' + v/c\right)/[1 + (v/c)\sigma_z'].
\tag{14.4c}
$$
Figure 14.4 The counter-propagating beams of the Sagnac loop interfere with
each other, setting up a standing wave pattern that remains stationary in the
rotating frame. The wavelength everywhere within the rotating platform is $\lambda_0$,
and the standing-wave fringes have a period of $\tfrac{1}{2}\lambda_0$. At any given point around
the loop, the clockwise beam is delayed and the counter-clockwise beam is
advanced, yielding a combined phase-shift of $\Delta\varphi = 2k_0\, \mathbf{A} \cdot \boldsymbol{\Omega}/c$. The phase shift
results in a longitudinal translation of the fringe pattern by $\mathbf{A} \cdot \boldsymbol{\Omega}/c$.
Figure 14.5 In the $xyz$ frame, the mirror M moves with constant velocity $v$
along the $z$-axis. From the perspective of an observer in the $x'y'z'$ frame, the
mirror is stationary, and the incident and reflected plane-waves have the same
frequency $f'$ and propagation directions identified with unit vectors $\boldsymbol{\sigma}_1'$ and $\boldsymbol{\sigma}_2'$. In
the $xyz$ frame, the incident wave has frequency $f_1$ and propagation direction $\boldsymbol{\sigma}_1$,
while the reflected wave’s parameters are $f_2$ and $\boldsymbol{\sigma}_2$.
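It is not difficult to verify that $|\boldsymbol{\sigma}| = 1$. From Eq. (14.4c) one can compute $\sigma_z'$ as a
function of $\sigma_z$, then substitute into Eq. (14.4b) to find:
$$
f = f' \sqrt{1 - v^2/c^2}\,/\,(1 - \boldsymbol{\sigma} \cdot \mathbf{v}/c).
\tag{14.5}
$$
Here $\boldsymbol{\sigma} \cdot \mathbf{v}/c$ is an alternative expression for $(v/c)\sigma_z$. Consequently, in the $xyz$ frame,
where the incident beam has frequency $f_1$ and propagation direction $\boldsymbol{\sigma}_1$, while the
reflected beam has frequency $f_2$ and propagation direction $\boldsymbol{\sigma}_2$, we have
$$
f_2/f_1 = (1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c)/(1 - \boldsymbol{\sigma}_2 \cdot \mathbf{v}/c).
\tag{14.6}
$$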
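The algebra leading from Eqs. (14.4b) and (14.4c) to Eqs. (14.5) and (14.6) can be spot-checked numerically. In the sketch below the function `to_lab_frame`, the helper `unit`, the value of $v/c$, and the two rest-frame directions are all illustrative assumptions; any pair of unit vectors corresponds to some orientation of the mirror.

```python
import math

def to_lab_frame(f_prime, sigma_prime, beta):
    """Eqs. (14.4b) and (14.4c): frequency and direction in the xyz frame (beta = v/c along z)."""
    g = math.sqrt(1 - beta**2)
    d = 1 + beta * sigma_prime[2]
    f = f_prime * d / g
    sigma = (sigma_prime[0] * g / d, sigma_prime[1] * g / d, (sigma_prime[2] + beta) / d)
    return f, sigma

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

beta = 1e-3                                # assumed mirror speed in units of c
f_prime = 4.74e14                          # Hz, frequency in the mirror's rest frame
sigma1_prime = unit((0.3, 0.1, -0.8))      # incident direction in the rest frame
sigma2_prime = unit((0.5, -0.2, 0.6))      # reflected direction in the rest frame

f1, sigma1 = to_lab_frame(f_prime, sigma1_prime, beta)
f2, sigma2 = to_lab_frame(f_prime, sigma2_prime, beta)

eq_14_5 = f_prime * math.sqrt(1 - beta**2) / (1 - beta * sigma1[2])
eq_14_6 = (1 - beta * sigma1[2]) / (1 - beta * sigma2[2])
print(f1 / eq_14_5, (f2 / f1) / eq_14_6)   # both ratios should equal 1 to rounding error
```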
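Returning now to the system depicted in Figure 14.3, where, for the beam-splitter
and the mirrors, $\boldsymbol{\sigma} \cdot \mathbf{v} = -\boldsymbol{\sigma} \cdot (\mathbf{R} \times \boldsymbol{\Omega}) = -(\boldsymbol{\sigma} \times \mathbf{R}) \cdot \boldsymbol{\Omega}$, we note that the
magnitude of $\boldsymbol{\sigma} \times \mathbf{R}$ is simply the perpendicular distance from C to a straight line
aligned with $\boldsymbol{\sigma}$. The vector $\boldsymbol{\sigma} \times \mathbf{R}$ is perpendicular to the plane of the interferometer,
pointing either up or down depending on the direction of $\boldsymbol{\sigma}$. Thus we may write, for
the clockwise path in Figure 14.3,
$$
f_4/f_0 = (f_4/f_3)(f_3/f_2)(f_2/f_1)(f_1/f_0)
= [1 + (\boldsymbol{\sigma}_0 \times \mathbf{R}_3) \cdot \boldsymbol{\Omega}/c]\,/\,[1 + (\boldsymbol{\sigma}_4 \times \mathbf{R}_3) \cdot \boldsymbol{\Omega}/c].
\tag{14.7}
$$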
In the final analysis, therefore, the ratio of the emergent frequency $f_4$ to the
source frequency $f_0$ is a function of the perpendicular distance from C to the
incidence vector $\boldsymbol{\sigma}_0$ as well as that from C to the emergent vector $\boldsymbol{\sigma}_4$. We
emphasize that $f_4$ and $f_0$ appear to be different only to a stationary observer
outside the rotating system; a comparison of Eq. (14.7) with Eq. (14.5) clearly
indicates that, from the perspective of a co-rotating observer, the frequency at the
observation plane is the same as the source frequency.
In the counter-clockwise direction the beam is transmitted twice through the
splitter S; neither passage introduces any Doppler shifts, as the propagation
direction remains unaltered before and after transmission through a parallel-
plate slab (in other words, $\boldsymbol{\sigma}_1 = \boldsymbol{\sigma}_2$ in conjunction with Eq. (14.6) immediately
implies that $f_1 = f_2$). The Doppler shifts produced by reflection from the moving
mirrors M1 and M2, however, need to be taken into account. A similar analysis
as the one that led to Eq. (14.7) then reveals that, for the counter-clockwise
path, the ratio of the emergent frequency f4 to the source frequency f0 is the
same as that for the clockwise path. Therefore, at the observation plane,
the emerging clockwise and counter-clockwise beams will have the same
frequency f4.
We now derive Sagnac’s fundamental equation, Eq. (14.2), from the per-
spective of an observer outside the rotating platform, an observer residing in an
inertial frame in which the rotation axis is stationary. Our alternative derivation
relies on the Doppler-shifted frequencies in the three arms of the loop depicted
in Figure 14.3. When the inertial observer considers the clockwise path at a
frozen instant in time, the Doppler-shifted frequencies f1, f2, f3 in the three arms
of the loop yield the accumulated phase as follows:
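$$
\begin{aligned}
\Delta\varphi_1 + \Delta\varphi_2 + \Delta\varphi_3 &= 2\pi (f_1 r_1 + f_2 r_2 + f_3 r_3)/c \\
&= (2\pi f_0/c)\,[(f_1/f_0) r_1 + (f_2/f_0) r_2 + (f_3/f_0) r_3] \\
&= (2\pi f_0/c)\,[1 + (\boldsymbol{\sigma}_0 \times \mathbf{R}_3) \cdot \boldsymbol{\Omega}/c] \\
&\quad \times \left\{ \frac{r_1}{1 + (\boldsymbol{\sigma}_1 \times \mathbf{R}_3) \cdot \boldsymbol{\Omega}/c}
+ \frac{r_2}{1 + (\boldsymbol{\sigma}_2 \times \mathbf{R}_1) \cdot \boldsymbol{\Omega}/c}
+ \frac{r_3}{1 + (\boldsymbol{\sigma}_3 \times \mathbf{R}_2) \cdot \boldsymbol{\Omega}/c} \right\} \\
&\approx (2\pi f_0'/c)\,\{ (r_1 + r_2 + r_3) - [(\mathbf{r}_1 \times \mathbf{R}_3) + (\mathbf{r}_2 \times \mathbf{R}_1)
+ (\mathbf{r}_3 \times \mathbf{R}_2)] \cdot \boldsymbol{\Omega}/c \} \\
&= k_0' (r_1 + r_2 + r_3) - 2k_0'\, \mathbf{A} \cdot \boldsymbol{\Omega}/c.
\end{aligned}
\tag{14.8}
$$
In the above derivation, where $\mathbf{r}_i = r_i\boldsymbol{\sigma}_i$, we have used Eq. (14.5) to relate $f_0$ to $f_0'$, the
frequency of the source within the rotating system, while ignoring terms of the order $(v/c)^2$ and
higher. A similar treatment of the counter-clockwise path leads to the same result
as in Eq. (14.8), except for the minus sign between the two terms being replaced
by a plus sign. Thus the net phase-shift $\Delta\varphi$ between the counter-propagating
beams is, once again, given by Eq. (14.2). Note that $k_0'$ in Eq. (14.8) is the same as
$k_0$ in Eq. (14.2), both symbols representing the light source’s wave-number as
measured on the rotating platform.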
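The Doppler-based route to the clockwise phase of Eq. (14.8) can be followed numerically by applying Eq. (14.6) at each reflector and summing $2\pi f_i r_i/c$ over the three arms. The sketch below makes several assumptions that are not in the text: the loop is planar with the rotation axis normal to it, the source beam travels along the S-to-M2 line ($\boldsymbol{\sigma}_0 = -\boldsymbol{\sigma}_3$), the source frequency is converted with the first-order form of Eq. (14.5), and the geometry, rotation rate, and function names are illustrative.

```python
import math

C_LIGHT = 299_792_458.0  # m/s

def leg(p, q):
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    return length, (dx / length, dy / length)

def sigma_dot_v(sigma, R, omega):
    # v = Omega x R with Omega along +z, i.e. v = omega * (-R_y, R_x)
    return omega * (-sigma[0] * R[1] + sigma[1] * R[0])

def reflect(f_in, s_in, s_out, R, omega):
    """Eq. (14.6) for a reflector whose center sits at R relative to C."""
    num = 1 - sigma_dot_v(s_in, R, omega) / C_LIGHT
    den = 1 - sigma_dot_v(s_out, R, omega) / C_LIGHT
    return f_in * num / den

def clockwise_phase(S, M1, M2, center, omega, f0_rot):
    R3, R1, R2 = [(p[0] - center[0], p[1] - center[1]) for p in (S, M1, M2)]
    (r1, s1), (r2, s2), (r3, s3) = leg(S, M1), leg(M1, M2), leg(M2, S)
    s0 = (-s3[0], -s3[1])                                      # assumed source direction
    f0 = f0_rot / (1 - sigma_dot_v(s0, R3, omega) / C_LIGHT)   # Eq. (14.5), first order in v/c
    f1 = reflect(f0, s0, s1, R3, omega)
    f2 = reflect(f1, s1, s2, R1, omega)
    f3 = reflect(f2, s2, s3, R2, omega)
    return 2 * math.pi * (f1 * r1 + f2 * r2 + f3 * r3) / C_LIGHT

side = 0.9
S, M1, M2 = (0.0, 0.0), (side / 2, side * math.sqrt(3) / 2), (side, 0.0)   # clockwise order
center, omega, f0_rot = (0.3, 0.2), 100.0, C_LIGHT / 0.633e-6
k0 = 2 * math.pi * f0_rot / C_LIGHT
area = math.sqrt(3) / 4 * side**2
doppler_term = clockwise_phase(S, M1, M2, center, omega, f0_rot) - k0 * 3 * side
print(doppler_term, -2 * k0 * area * omega / C_LIGHT)
```

The two printed numbers should agree up to terms of order $(v/c)^2$, which is the content of the last line of Eq. (14.8).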
Finally, we must show that the counter-propagating beams in each arm of the
Sagnac interferometer produce running fringes that travel with the velocity of the
arm itself; this would corroborate our earlier assertion that the fringes co-rotate
with the platform. For the sake of concreteness, we denote by $f_2^{+}$ and $f_2^{-}$,
respectively, the clockwise and counter-clockwise frequencies in the arm located
between the mirrors M1 and M2. (A similar analysis applies to the other arms as
well.) In analogy with Eq. (14.7), we write
$$
f_2^{\pm}/f_0 = [1 + (\boldsymbol{\sigma}_0 \times \mathbf{R}_3) \cdot \boldsymbol{\Omega}/c]\,/\,[1 \pm (\boldsymbol{\sigma}_2 \times \mathbf{R}_1) \cdot \boldsymbol{\Omega}/c].
\tag{14.9}
$$
Again, relating $f_0$ to $f_0'$ through Eq. (14.5) and ignoring terms of the order $(v/c)^2$
and higher yields
$$
f_2^{\pm} \approx [1 \mp (\boldsymbol{\sigma}_2 \times \mathbf{R}_1) \cdot \boldsymbol{\Omega}/c]\, f_0'.
\tag{14.10}
$$
Since $\boldsymbol{\sigma}_2 \times \mathbf{R}_1$ is a vector perpendicular to the loop, having the magnitude of
the vertical distance from C to $r_2$, one may replace $\mathbf{R}_1$ in Eq. (14.10) with
any vector $\mathbf{R}$ from C to an arbitrary point along the straight line that connects
the centers of M1 and M2. The scalar product $(\boldsymbol{\sigma}_2 \times \mathbf{R}_1) \cdot \boldsymbol{\Omega}$ is thus equal to
$-v_2 = -\boldsymbol{\sigma}_2 \cdot (\boldsymbol{\Omega} \times \mathbf{R})$, where $v_2$ is the linear velocity (along $\boldsymbol{\sigma}_2$) of each and every
point that resides on the path from the center of M1 to the center of M2.
We have
$$
f_2^{\pm} \approx [1 \pm (v_2/c)]\, f_0'.
\tag{14.11}
$$
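In the region between M1 and M2 the counter-propagating beam amplitudes
are $a_0 \cos(\omega_2^{+} t - k_2^{+} x + \varphi_2^{+})$ and $a_0 \cos(\omega_2^{-} t + k_2^{-} x + \varphi_2^{-})$; here $\omega_2^{\pm} = 2\pi f_2^{\pm}$,
$k_2^{\pm} = \omega_2^{\pm}/c$, $\varphi_2^{+} - \varphi_2^{-}$ is the relative phase, and $x$ is the distance from the center of
M1 along the unit vector $\boldsymbol{\sigma}_2$. The total intensity $I(x, t)$ in this region may thus be
written
$$
I(x, t) = a_0^2 \left\langle [\cos(\omega_2^{+} t - k_2^{+} x + \varphi_2^{+}) + \cos(\omega_2^{-} t + k_2^{-} x + \varphi_2^{-})]^2 \right\rangle,
\tag{14.12}
$$
where the angle brackets represent time-averaging to eliminate rapid oscillations.
We have
$$
\begin{aligned}
I(x, t) &\approx a_0^2 \{1 + \cos[(\omega_2^{+} - \omega_2^{-}) t - (k_2^{+} + k_2^{-}) x + (\varphi_2^{+} - \varphi_2^{-})]\} \\
&\approx a_0^2 \{1 + \cos[4\pi (f_0'/c)(x - v_2 t) - (\varphi_2^{+} - \varphi_2^{-})]\}.
\end{aligned}
\tag{14.13}
$$
The running fringes thus have a period of $\tfrac{1}{2}\lambda_0$ and travel with velocity $v_2$ in the $\boldsymbol{\sigma}_2$
direction.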
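The fringe velocity and period implied by Eqs. (14.11) and (14.13) follow from the beat frequency and the summed wave-numbers of the two beams. The sketch below is a minimal numerical illustration; the function name `running_fringes` and the arm velocity of 25 m/s are arbitrary assumptions.

```python
import math

C_LIGHT = 299_792_458.0  # m/s

def running_fringes(f0_rot, v2):
    """Velocity and period of the fringes formed by the two beams of Eq. (14.11)."""
    f_plus = (1 + v2 / C_LIGHT) * f0_rot     # clockwise beam
    f_minus = (1 - v2 / C_LIGHT) * f0_rot    # counter-clockwise beam
    omega_beat = 2 * math.pi * (f_plus - f_minus)
    k_sum = 2 * math.pi * (f_plus + f_minus) / C_LIGHT
    # The argument of the cosine in Eq. (14.13) is constant along x = (omega_beat / k_sum) * t.
    return omega_beat / k_sum, 2 * math.pi / k_sum

velocity, period = running_fringes(C_LIGHT / 0.633e-6, 25.0)
print(velocity, period)
```

The returned velocity reproduces the assumed arm velocity and the period is half the source wavelength, as stated in the text.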
The effect of a co-rotating dielectric medium

With reference to Figure 14.6, a transparent dielectric slab of thickness $L$ moves
with constant velocity $v$ along the $z$-axis in the $xyz$ frame. From the perspective of
an observer in the $x'y'z'$ frame, which also moves with velocity $v$ along $z$, the slab
is stationary, and the incident, intermediate, and transmitted (or emergent) plane-waves
all have the same frequency $f'$ as well as identical propagation directions
$\boldsymbol{\sigma}' = \boldsymbol{\sigma}_1' = \boldsymbol{\sigma}_2'$. (The entrance and exit facets of the slab are perpendicular to the
common propagation direction.) We denote by $n'$ the refractive index of the
(stationary) slab at $f = f'$. In the $xyz$ frame, the incident and transmitted plane-waves
have frequency $f_1$, wavelength $\lambda_1 = c/f_1$, and propagation direction $\boldsymbol{\sigma}_1$.
(The beam parameters inside the dielectric, $f_2$, $\lambda_2 = c/(n_2 f_2)$, and $\boldsymbol{\sigma}_2$ will not enter
the following analysis.)
Figure 14.6 In the $xyz$ frame, a transparent dielectric slab of thickness $L$ moves
with constant velocity $v$ along the $z$-axis. From the perspective of an observer in
the $x'y'z'$ frame, the slab is stationary, and the incident, intermediate, and
transmitted plane-waves all have the same frequency $f'$ and (identical) propagation
directions $\boldsymbol{\sigma}'$. In the $xyz$ frame, both the incident and transmitted waves
have frequency $f_1$ and propagation direction $\boldsymbol{\sigma}_1$, while the beam parameters
inside the dielectric are $f_2$ and $\boldsymbol{\sigma}_2$.
The Lorentz transformation of the spatio-temporal coordinates from $(x', y', z', t')$
to $(x, y, z, t)$ yields the relationship between the various plane-waves in the $xyz$ and
$x'y'z'$ systems. In particular, one can readily show that
$$
f' = (1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c)\, f_1/\sqrt{1 - v^2/c^2} \approx (1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c)\, f_1,
\tag{14.14a}
$$
$$
\boldsymbol{\sigma}' = \left(\sigma_{x1}\sqrt{1 - v^2/c^2},\ \sigma_{y1}\sqrt{1 - v^2/c^2},\ \sigma_{z1} - v/c\right)/(1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c)
\approx (\boldsymbol{\sigma}_1 - \mathbf{v}/c)/(1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c).
\tag{14.14b}
$$
Let the entrance facet of the slab at $t = t' = 0$ be centered at the (coincident)
origin of the two coordinate systems, namely, $(x_0, y_0, z_0) = (x_0', y_0', z_0') = (0, 0, 0)$.
The reference phase of the plane-wave entering the slab is thus zero in both
coordinate systems. In the absence of any movement, i.e., when $v = 0$, the frequency
$f_1$ and the propagation direction $\boldsymbol{\sigma}_1$ of the beam are the same everywhere,
inside and outside the slab. Moreover, the propagation direction $\boldsymbol{\sigma}_1$ may be
assumed to be perpendicular to the entrance and exit facets, so that at the center
$\mathbf{r}_1 = (x_1, y_1, z_1)$ of the exit facet, $\boldsymbol{\sigma}_1 \cdot \mathbf{r}_1 = L$. Denoting by $n_1$ the refractive index of
the stationary slab at $f = f_1$, one can write the phase $\varphi_{1s}$ of the emergent beam at
the center of the exit facet (relative to the phase at the center of the entrance facet)
as follows:
$$
\varphi_{1s} = 2\pi n_1 f_1 L/c.
\tag{14.15}
$$
With the slab traveling at a constant velocity $v$, the frequency $f'$ and the
propagation direction $\boldsymbol{\sigma}'$ inside the slab (within the co-moving $x'y'z'$ frame)
determine the phase at the center of the exit facet located at $\mathbf{r}_1' = L\boldsymbol{\sigma}_1$ (relative to
that at the center of the entrance facet) as $\varphi_1' = 2\pi n' f' L\, \boldsymbol{\sigma}_1 \cdot \boldsymbol{\sigma}'/c$. The emergent
beam may thus be written as
$$
a(\mathbf{r}', t') = a_0' \exp(i\varphi') \exp[i(2\pi f'/c)(\boldsymbol{\sigma}' \cdot \mathbf{r}' - ct')],
\tag{14.16a}
$$
where
$$
\varphi' = 2\pi (n' - 1) f' L\, \boldsymbol{\sigma}_1 \cdot \boldsymbol{\sigma}'/c \approx (2\pi f_1 L/c)(n' - 1)(1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c).
\tag{14.16b}
$$
We denote the (group) index of refraction at $f = f'$ by $n_g$, corresponding to the group
velocity of a light pulse inside the stationary slab within the moving $x'y'z'$ frame,
namely,
$$
\begin{aligned}
n_g(f') = n(f') + (dn/df) f' &\approx n' + (n_1 - n') f'/(f_1 - f') \\
&\approx (2n' - n_1) + (n_1 - n')/(\boldsymbol{\sigma}_1 \cdot \mathbf{v}/c) \\
&\approx n' + (n_1 - n')/(\boldsymbol{\sigma}_1 \cdot \mathbf{v}/c).
\end{aligned}
\tag{14.17}
$$
In the last line of Eq. (14.17), the near-equality of $n'$ and $n_1$ has been used to
substitute $n'$ for $(2n' - n_1)$; however, the same approximation cannot be applied to
the second term, because of the division by the small quantity $\boldsymbol{\sigma}_1 \cdot \mathbf{v}/c$.
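When a short pulse launched at the entrance facet at $t' = 0$ reaches the slab’s
exit facet at $t' = n_g L/c$, the center of the exit facet, located at $\mathbf{r}_1' = L\boldsymbol{\sigma}_1$ in the $x'y'z'$
system, will have reached the point $\mathbf{r}_1$ in the $xyz$ frame. Using the Lorentz
transformation, we find
$$
\mathbf{r}_1 = \left[L\sigma_{x1},\ L\sigma_{y1},\ (L\sigma_{z1} + v n_g L/c)/\sqrt{1 - v^2/c^2}\right] \approx L(\boldsymbol{\sigma}_1 + n_g \mathbf{v}/c).
\tag{14.18}
$$
Therefore, in the $xyz$ system, the phase of the emergent beam at the center of the
exit facet will be
$$
\begin{aligned}
\varphi_1 = \varphi' + (2\pi f_1/c)\, \boldsymbol{\sigma}_1 \cdot \mathbf{r}_1 &\approx (2\pi f_1 L/c)\,[(n' - 1)(1 - \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c) + (1 + n_g\, \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c)] \\
&\approx (2\pi f_1 L/c)(n_1 + \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c).
\end{aligned}
\tag{14.19}
$$
The difference between the above phase and that obtained in Eq. (14.15) for a
stationary slab is
$$
\varphi_1 - \varphi_{1s} \approx (2\pi f_1 L/c)\, \boldsymbol{\sigma}_1 \cdot \mathbf{v}/c.
\tag{14.20}
$$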
Note that the above expression for the phase difference between a rotating
Sagnac interferometer (with co-rotating slab) and a stationary one, both computed
at the exit facet of the slab (relative to the entrance facet), is independent of the
slab’s refractive index. The expression for $\varphi_1 - \varphi_{1s}$ in Eq. (14.20) is the same as
that which would have been obtained had the beam traveled from the entrance to
the exit facet in the free space (rather than in a dielectric of refractive index $n$).
We have thus proven that the presence of co-moving dielectric media within one
or more arms of the device will not alter the fundamental formula, Eq. (14.2), of
the Sagnac interferometer.
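The cancellation of the refractive index in Eq. (14.20) can be checked numerically by following Eqs. (14.14a) through (14.19) for two different materials. The sketch below is restricted to the one-dimensional case in which the beam travels along the direction of motion ($\boldsymbol{\sigma}_1$ parallel to $\mathbf{v}$), and it assumes a linear dispersion model $n(f) = n_1 + (dn/df)(f - f_1)$; the function name `excess_phase`, the dispersion values, the slab thickness, and $v/c$ are all illustrative assumptions.

```python
import math

C_LIGHT = 299_792_458.0  # m/s

def excess_phase(n_ref, dn_df, f1, L, beta):
    """phi_1 - phi_1s for a slab moving along its own normal (sigma_1 parallel to v)."""
    index = lambda f: n_ref + dn_df * (f - f1)        # assumed linear dispersion, n(f1) = n_ref
    g = math.sqrt(1 - beta**2)
    f_prime = (1 - beta) * f1 / g                     # Eq. (14.14a)
    n_prime, n1 = index(f_prime), index(f1)
    n_group = n_prime + dn_df * f_prime               # Eq. (14.17), first expression
    phi_prime = 2 * math.pi * (n_prime - 1) * f_prime * L / C_LIGHT   # Eq. (14.16b)
    z_exit = (L + beta * n_group * L) / g             # Eq. (14.18), z-component
    phi_1 = phi_prime + 2 * math.pi * f1 * z_exit / C_LIGHT           # Eq. (14.19)
    phi_1s = 2 * math.pi * n1 * f1 * L / C_LIGHT                      # Eq. (14.15)
    return phi_1 - phi_1s

f1, L, beta = C_LIGHT / 0.633e-6, 0.01, 1e-6
free_space = 2 * math.pi * f1 * L * beta / C_LIGHT    # right-hand side of Eq. (14.20)
for n_ref, dn_df in ((1.5, 1e-16), (2.0, 3e-16)):
    print(n_ref, excess_phase(n_ref, dn_df, f1, L, beta), free_space)
```

For both trial materials the excess phase matches the free-space value to within terms of order $(v/c)^2$, in line with the conclusion drawn from Eq. (14.20).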
The laser gyroscope

Figure 14.7 shows a diagram of a ring laser gyroscope. The co-rotating gain medium
is now an integral part of the Sagnac loop, and the beam-splitter S is oriented in such
a way as to act as a semi-transparent mirror for the ring laser. The beam traveling
clockwise around the loop emerges from the splitter S along the direction $\boldsymbol{\sigma}_0$, and is
subsequently directed toward the photodetector array. Similarly, the counter-clockwise-traveling
beam emerges along $\boldsymbol{\sigma}_4$, and is directed toward the detector.
When the loop is stationary (i.e., $\Omega = 0$), the clockwise and counterclockwise laser
modes have identical frequencies $f_0$. Denoting by $\ell$ the perimeter of the loop, and
assuming that the entire loop is uniformly filled with a gain medium of refractive
index $n_0$, the wavelength $\lambda = c/(n_0 f_0)$ must fit within the loop, that is,
$\ell/\lambda = n_0 f_0 \ell/c = m$, where $m$ is an integer. Any rotation of the loop will introduce a phase
shift in the clockwise path, causing the lasing frequency of the clockwise mode to drift from
$f_0$ to $f'^{+}$. A nearly equal but opposite phase shift in the counter-clockwise path
will cause the lasing frequency of the counter-clockwise mode to drift from $f_0$ to
$f'^{-}$. Note that $f'^{\pm}$ are the frequencies measured by the co-rotating observer.
Assuming the frequency shifts are not large enough to modify the mode
number $m$, one may write:
$$
\begin{aligned}
2\pi m &= 2\pi n_0 f_0 \ell/c \\
&= (2\pi n^{+} f'^{+} \ell/c) - (4\pi f'^{+}/c)\, \mathbf{A} \cdot \boldsymbol{\Omega}/c \\
&= (2\pi n^{-} f'^{-} \ell/c) + (4\pi f'^{-}/c)\, \mathbf{A} \cdot \boldsymbol{\Omega}/c.
\end{aligned}
\tag{14.21}
$$
Figure 14.7 An active Sagnac gyroscope is a ring laser whose gain medium is
placed within one or more arms of a Sagnac interferometer. The beam-splitter S
is now re-oriented to allow the counter-propagating beams to circulate around
the loop. Fractions of the clockwise and counter-clockwise modes emerge from
the cavity along $\boldsymbol{\sigma}_0$ and $\boldsymbol{\sigma}_4$, respectively. When the platform is stationary, the two
modes have the same frequency $f_0$; the mode frequencies, however, drift in
opposite directions when $\Omega \neq 0$. The two modes are brought together on
an observation plane to form an interference pattern with running fringes. A
photodetector array picks up the beat frequency $\Delta f$ between the modes, which is
proportional to the platform’s angular velocity $\Omega$. The detector array is also
capable of detecting the sign of $\Omega$ by monitoring the direction of travel of the
fringes.
For sufficiently small frequency shifts (i.e., $\Delta f = f'^{+} - f'^{-} \ll f_0$), the refractive
indices $n^{\pm} = n(f'^{\pm})$ will be nearly the same as $n_0$, and the average frequency
$f = \tfrac{1}{2}(f'^{+} + f'^{-})$ will be essentially equal to $f_0$. Therefore,
$$
\Delta f/f \approx (f'^{+} - f'^{-})/f_0 \approx 4\mathbf{A} \cdot \boldsymbol{\Omega}/(n_0 \ell c).
\tag{14.22}
$$
When the two beams emerging from the laser cavity are brought together on a
photodetector array, as in Figure 14.7, they produce a pattern of running fringes.
The beat frequency $\Delta f$ then yields the magnitude of $\Omega$, while its sign is
determined by the direction of fringe travel.
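To get a feel for the magnitudes involved in Eq. (14.22), the sketch below evaluates the beat frequency for a hypothetical ring: an equilateral triangle with 0.9 m sides, operating at 633 nm, with its plane normal to the rotation axis, and rotating at the Earth's rate. These dimensions, the function name `beat_frequency`, and the choice $n_0 = 1$ are assumptions made for illustration only.

```python
import math

C_LIGHT = 299_792_458.0  # m/s

def beat_frequency(area, perimeter, omega, wavelength, n0=1.0):
    """Eq. (14.22): beat frequency between the two modes (area vector parallel to Omega)."""
    f0 = C_LIGHT / wavelength            # wavelength is the vacuum wavelength c/f0
    return f0 * 4 * area * omega / (n0 * perimeter * C_LIGHT)

side = 0.9                               # m, hypothetical equilateral ring
area = math.sqrt(3) / 4 * side**2
earth_rate = 7.292e-5                    # rad/s, Earth's rotation rate
beat = beat_frequency(area, 3 * side, earth_rate, 0.633e-6)
print(f"beat frequency ~ {beat:.1f} Hz")
```

Because the beat frequency scales with the ratio of area to perimeter, enlarging the ring improves the sensitivity linearly with its size.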

1 G. Sagnac, Comptes Rendus de l’Académie des Sciences (Paris) 157, 708–710,
1410–1413 (1913).
2 H. Ives, JOSA 28, 296–299 (1938).
3 E. J. Post, Sagnac effect, Rev. Mod. Phys. 39, 475–493 (1967).
4 H. Lefèvre, The Fiber-Optic Gyroscope, Artech House, Boston, 1993.
15
Fabry–Pérot etalons in polarized light
The principles of operation of Fabry–Pérot interferometers are well known, and

their application in spectroscopy has established their status as one of the most
sensitive instruments ever invented.1,2 However, the behavior of a Fabry–Pérot
device in polarized light, especially when birefringence and optical activity are
present within the mirrors or in the cavity, is less well known. We devote this
chapter to a description of some of these phenomena, in the hope of clarifying their
physical origins and perhaps suggesting some new applications.
The dielectric mirror

A multilayer stack mirror is shown schematically in Figure 15.1. The substrate is a
transparent slab of glass, and the layers are made of high- and low-index dielectric
materials.3 In the examples used in this chapter the low-index layers will have
$(n, k) = (1.5, 0)$ and thickness $d = 105.5$ nm, and the high-index layers will have
$(n, k) = (2, 0)$ and $d = 79.125$ nm. (At the operating wavelength, 633 nm, both these
layers will be one quarter-wave thick.) Figure 15.2 shows computed plots of amplitude
and phase for the reflection coefficients of a 10-layer mirror. At normal incidence
($\theta = 0$) both p- and s-components of polarization have an amplitude reflectivity
$|\rho| = 0.844$. The mirror, therefore, reflects about 71% of the incident optical power and
transmits the remaining 29%. Ignoring any loss of light at the uncoated facet of the
substrate, the amplitude transmission coefficient (outside the substrate) turns out to
be $|\tau| = 0.536$. At larger angles of incidence both the amplitude and phase for p-
and s-light begin to deviate from their normal-incidence values and from each
other, but we are not concerned with these variations here. What is important is to
note that, at small angles of incidence (say up to 30°), the reflectivity remains high.
Let us also mention in passing that, when light shines on a dielectric mirror through
its substrate, the reflection and transmission coefficients $\rho'$ and $\tau'$ generally retain
the same amplitudes as above, but their phases will differ from those of $\rho$ and $\tau$.
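The normal-incidence reflectivity quoted above can be checked with the standard characteristic-matrix method of thin-film optics. The sketch below is one possible implementation, not the program used to generate the figures; the function names are hypothetical, the layer ordering (low-index layer facing the air, high-index layer on the substrate) follows the description of Figure 15.1, and only magnitudes are compared because the sign convention for phase may differ from the book's.

```python
import cmath
import math


def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one homogeneous layer at normal incidence."""
    delta = 2 * math.pi * n * d / wavelength  # phase thickness of the layer
    c, s = cmath.cos(delta), cmath.sin(delta)
    return ((c, 1j * s / n), (1j * n * s, c))


def multiply(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def stack_reflection(n_inc, layers, n_sub, wavelength):
    """Amplitude reflection coefficient; layers are listed from the incidence side."""
    m = ((1, 0), (0, 1))
    for n, d in layers:
        m = multiply(m, layer_matrix(n, d, wavelength))
    b = m[0][0] + m[0][1] * n_sub
    c = m[1][0] + m[1][1] * n_sub
    return (n_inc * b - c) / (n_inc * b + c)


wavelength = 633.0                       # nm
low, high = (1.5, 105.5), (2.0, 79.125)  # (index, thickness in nm)
layers = [low, high] * 5                 # 10 layers, top layer low-index
rho = stack_reflection(1.0, layers, 1.5, wavelength)
tau_mag = math.sqrt(1 - abs(rho) ** 2)   # lossless mirror: |rho|^2 + |tau|^2 = 1
print(f"|rho| = {abs(rho):.3f}, |tau| = {tau_mag:.3f}")
```

With these parameters the magnitudes agree with the values $|\rho| = 0.844$ and $|\tau| = 0.536$ stated in the text.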
[Diagram: layers 1 through n stacked on the substrate along the Z-axis; a plane wave
incident at angle Θ, with field components Ep and Es.]
Figure 15.1 A multilayer dielectric mirror and a plane wave at oblique incidence.
In the examples used in this chapter, the substrate refractive index n is 1.5, the odd-
numbered layers have an index of 2 and are 79.125 nm thick, and the even-numbered
layers have an index of 1.5 and are 105.5 nm thick. At the design wavelength,
$\lambda = 633$ nm, these layer thicknesses correspond to one-quarter of the wavelength.
[Plot: (a) amplitude (0 to 1) and (b) phase (0° to −180°) of the reflection coefficient
for p- and s-polarized light, versus angle of incidence θ from 0° to 90°.]
Figure 15.2 Computed plots of (a) amplitude and (b) phase of the reflection
coefficients of a dielectric mirror for p- and s-polarized beams versus the angle
of incidence. The assumed mirror is as shown in Figure 15.1, with a total of 10
layers; the medium of incidence is air.
The Fabry–Pérot etalon

Figure 15.3 shows the schematic of a Fabry–Pérot etalon. Two dielectric mirrors,
separated by an air gap, are placed face-to-face and parallel to each other. A plane
monochromatic beam is shown at oblique incidence $\theta$ on one of the mirrors.
[Diagram: two substrates with quarter-wave stacks facing each other across an air gap;
plane wave incident at angle Θ to the Z-axis.]
Figure 15.3 A Fabry–Pérot etalon consists of two face-to-face dielectric
mirrors separated by an air-gap. Also shown is a plane wave at an oblique
incidence angle $\theta$.
(In practice the uncoated facets of both substrates are given a slight wedge to
eliminate spurious reflections.) For the system of Figure 15.3 the computed plots
of reflection amplitude and phase versus $\theta$ are shown in Figure 15.4, for mirrors
with 10 dielectric layers each, and an air gap 8.229 μm wide, which is exactly
$13\lambda$. Note that, within the 0° to 30° range of angles of incidence depicted, the p-
and s-reflectivities are nearly the same. Sharp drops in the etalon's reflectivity
occur at $\theta$ = 0°, 15.37°, 21.83°, and 26.85°; at these angles $(d/\lambda)\cos\theta$ = 13,
12.53, 12.07, and 11.6, respectively. In other words, when the effective gap-width
is an integer multiple of a half-wavelength, the etalon becomes transparent to the
incident light. To be sure, there are slight deviations from exact half-wavelength
multiplicity here, which have to do with the $\theta$-dependence of the phase of the
individual mirror reflectivities (see Figure 15.2(b)), but, for our purposes, these
differences are small and may be ignored.
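The half-wavelength condition can be made concrete with a few lines of arithmetic. The sketch below compares the angles quoted in the text with the ideal angles at which $(d/\lambda)\cos\theta$ is exactly a multiple of one half; the ideal angles neglect the angle-dependent mirror phase, which is an assumption of this illustration and the reason the two sets differ slightly.

```python
import math

gap_in_wavelengths = 13.0                 # d / lambda for the 8.229 um gap
reported = [0.0, 15.37, 21.83, 26.85]     # reflectivity minima quoted in the text (deg)
half_wave_orders = [26, 25, 24, 23]       # 2 * (d / lambda) * cos(theta) at resonance

ideal_angles = []
effective_gaps = []
for order, theta in zip(half_wave_orders, reported):
    # Ideal resonance angle if the mirror phase did not vary with theta.
    ideal = math.degrees(math.acos(order / 2 / gap_in_wavelengths))
    # Effective gap-width (in wavelengths) at the angle reported in the text.
    effective = gap_in_wavelengths * math.cos(math.radians(theta))
    ideal_angles.append(ideal)
    effective_gaps.append(effective)
    print(f"order {order}: reported {theta:5.2f} deg, ideal {ideal:5.2f} deg, "
          f"(d/lambda)cos(theta) = {effective:.2f}")
```

The ideal angles come out somewhat larger than the reported ones (for instance about 15.9° against 15.37°), consistent with the small departures from exact half-wavelength multiplicity noted above.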
Next, we study the setup of Figure 15.5, which is designed to send a focused
beam of light onto an etalon and to analyze the resulting reflection. The setup
includes a path for a reference beam, so that Twyman–Green interferometry may
be used to reveal the reflected phase pattern. It also includes a polarizer before the
observation plane to allow selection of the polarization direction of interest.
Figure 15.6 shows computed plots of intensity at the observation plane obtained
under various conditions. Frames (a) and (b) are obtained when the reference
beam is blocked, whereas frames (c) and (d) are interferograms obtained in the
presence of the reference beam. The circular area in each frame represents the
aperture of the objective lens (NA = 0.5).
[Plot: (a) amplitudes $|r_p|$, $|r_s|$ (0 to 1) and (b) phases $\phi_p$, $\phi_s$ (−180° to 180°) of the
reflection coefficient, versus angle of incidence θ from 0° to 30°.]
Figure 15.4 Computed plots of (a) amplitude and (b) phase of the reflection
coefficients of a Fabry–Pérot etalon for p- and s-polarized beams versus the angle
of incidence. The assumed etalon is that shown in Figure 15.3 with 10-layer
mirrors and an 8.229 μm gap.
In (a) the polarizer is taken to transmit the same direction of polarization as that
of the incident beam. The dark rings correspond to the angles of incidence at which
the reflectivity plots of Figure 15.4(a) exhibit their minima. In frame (b) the
polarizer is turned by 90° so that only a small fraction of the light (about $3 \times 10^{-4}$
of the original incident power) passes through to the observation plane. The four
corners of this distribution correspond to the four corners of the focused cone of
light, which have a mix of p- and s-polarization. Here, the rays incident on the
etalon are subject to slightly different reflectivities in their p- and s-components
(see Figure 15.4(a)), which gives rise to a small rotation of polarization from its
original direction. It is this polarization rotation in the four corners of the lens that
is responsible for the four corners of the intensity distribution in Figure 15.6(b).
Frames (c) and (d) of Figure 15.6 are obtained by unblocking the reference arm
in the system of Figure 15.5, thus allowing the interference pattern (between the
beam reflected from the etalon and that reflected from the reference mirror) to
impinge on the observation plane. The case for the parallel component of
polarization depicted in (c) shows the phase of the pattern to be more or less
uniform over the entire aperture; in particular, it shows that there are no phase
jumps between adjacent rings. The case for the perpendicular component of
[Diagram: linearly polarized incident beam, cube beam-splitter, microscope objective lens
(NA = 0.5), Fabry–Pérot etalon; reference arm with neutral-density filter, quarter-wave
plate, and reference mirror; polarizer and observation plane in the return path.]
Figure 15.5 Schematic diagram of a system used for observing Fabry–Pérot
fringes. A linearly polarized beam of light is focused on an etalon, which reflects the
beam and sends it back through the system. The recollimated beam may then be
viewed after passing through a polarizer. By setting the transmission axis of the
polarizer perpendicular to the direction of incident polarization, one may observe
regions of the beam which have suffered a small (but measurable) polarization
rotation. A Twyman–Green interferometer is also incorporated into the system for
observing the reflected phase pattern. The neutral-density filter is needed to adjust
the amplitude of the reference beam in order to obtain high-contrast interferograms.
A 45° rotation of the quarter-wave plate around the optical axis causes a 90° rotation
of the reference beam's polarization; this is needed when the polarizer's
transmission axis is set perpendicular to the direction of incident polarization.
polarization depicted in (d) shows a 180° phase shift between adjacent corners.
This is caused by the fact that, in adjacent corners of the lens, the polarization
vector rotates in opposite directions.
Mirror birefringence
Next we consider the effects of birefringence in the mirrors of the Fabry–Pérot
etalon.4,5 For this analysis we assume that the mirrors have 20 layers each
($|\rho| = 0.9905$, $|\tau| = 0.1375$) and that the uppermost layer of both mirrors is
slightly birefringent. We will suppose that the uppermost layer has a nominal
index of 1.5, except along the Y-axis (see Figure 15.1) where the index is 1.505.
We also assume a normally incident plane wave and an adjustable gap-width. For
[Image: four frames, (a) through (d), of the intensity distribution over the lens aperture.]
Figure 15.6 Computed plots of intensity distribution at the observation plane of
the system of Figure 15.5. Frame (a) is obtained when the reference beam is blocked
and the polarizer is set to transmit the direction of incident polarization. In the case
of frame (b), the reference beam is still blocked, but the polarizer is rotated by 90°.
The logarithm of the intensity distribution is plotted here in order to enhance the
weak regions of the pattern; this is similar to over-exposing a photographic film
placed at the observation plane. Frames (c) and (d) show the corresponding
interference patterns obtained when the reference beam is unblocked. In the case of
frame (d) the polarizer is rotated by 90° and the quarter-wave plate by 45°, a strong
neutral-density filter is used to attenuate the reference beam substantially, and again
the logarithm of the intensity distribution is plotted to enhance weak regions.
this etalon the computed transmission coefficients and the polarization state of the
transmitted light versus the gap-width are plotted in Figure 15.7. Two different
peaks are observed in transmission, one for the p-polarized, the other for the
s-polarized incident beam. (The E-field of the p-light is parallel to X, while that of
the s-light is parallel to Y.) The peak separation arises because the mirrors give a
slightly different phase upon reflection to the two components of polarization.
The gap-width, therefore, must be adjusted to compensate for this phase
difference. If the incident beam is linearly polarized at 45° (i.e., halfway between p
and s), the transmitted beam will show the rotation angle $\psi$ and the ellipticity $\varepsilon$
depicted in Figure 15.7(b). The maximum ellipticity is close to 40°, which shows
that the transmitted light at this point is nearly circularly polarized. A very small
amount of birefringence in the mirrors can, therefore, have substantial effects on
the polarization state of the transmitted (or reflected) beam.
[Plot: (a) amplitude transmission coefficients $|t_p|$ and $|t_s|$ (0 to 1) and (b) polarization
rotation and ellipticity (−40° to 70°), both versus gap width from 8210 nm to 8250 nm.]
Figure 15.7 (a) Computed amplitude transmission coefficients and (b) the state
of transmitted polarization plotted versus the gap-width for a normally incident
beam on the Fabry–Pérot etalon of Figure 15.3. The mirrors are assumed to have
20 layers each and, for both mirrors, the uppermost layer is assumed to be
birefringent. With reference to Figure 15.1, the refractive indices of the top layer
along the X-, Y-, and Z-axes are 1.500, 1.505, and 1.500, respectively. The
normally incident beam is linearly polarized at 45° to the X-axis. In (b) the
polarization rotation angle, $\psi$, is also referred to the X-axis. By definition,
the polarization ellipticity $\varepsilon$ is the arctangent of the ratio of the minor axis of the
ellipse of polarization to its major axis. Thus $\varepsilon = 0$ corresponds to linear
polarization, whereas $\varepsilon$ = 45° represents circular polarization.
In practice, if the mirrors are known to have the same amount of birefringence,
these problems can be avoided by rotating one of the mirrors by 90° relative to
the other. Also, it might be of some interest to note that birefringence of the top
layer poses the most serious problem for Fabry–Pérot etalons. In our
calculations, the effects diminished as we moved the birefringent layer down the stack
(closer to the substrate). By the time the birefringence is moved to layer 14 of
both mirrors, its effects are totally negligible.
Enhancement of Faraday rotation

Figure 15.8 shows a Fabry–Pérot etalon with a Faraday rotator inserted in the gap
between its mirrors. The Faraday rotator may be a slice of a transparent magnetic
[Diagram: two substrates with quarter-wave stacks enclosing a Faraday rotator; linearly
polarized beam incident along Z with field Ep, transmitted beam with fields Ep and Es.]
Figure 15.8 Schematic diagram showing a Fabry–Pérot etalon with a Faraday
rotator placed in the gap. The normally incident plane wave is linearly polarized
along the p-direction, but the optical activity of the Faraday rotator produces a
transmitted component of polarization along the s-direction. The mirrors are
taken to have 21 layers each and the Faraday medium to be 2.11 μm thick and to
have refractive indices $(n, k) = (1.5 \pm 0.333 \times 10^{-4}, 0)$ for the states of right and
left circular polarization.
crystal (such as an iron garnet) or a transparent piece of glass in which an
externally applied magnetic field has induced polarization rotation. In our
simulations this medium was 2.11 μm thick, with refractive indices
$(n, k) = (1.5 \pm 0.333 \times 10^{-4}, 0)$ for the states of right and left circular polarization (RCP and
LCP). This small amount of optical activity would produce only 0.04° of
polarization rotation in a single pass of the beam through the medium. However, as we
shall see shortly, the etalon enhances the rotation because, in effect, it circulates
the beam through the Faraday medium.
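The single-pass figure can be verified directly. The sketch below assumes the usual relation for circular birefringence, in which the plane of polarization rotates by half the phase difference accumulated between the RCP and LCP components; the variable names are illustrative.

```python
import math

wavelength = 633.0        # nm
thickness = 2110.0        # nm (2.11 um Faraday medium)
delta_n = 2 * 0.333e-4    # index difference between RCP and LCP states

# Phase difference between the two circular components after one pass.
phase_difference = 2 * math.pi * delta_n * thickness / wavelength
# The polarization rotates by half of that phase difference.
single_pass_rotation_deg = math.degrees(phase_difference / 2)
print(f"single-pass rotation = {single_pass_rotation_deg:.3f} deg")
```

The result, about 0.04°, matches the value quoted in the text and sets the baseline against which the cavity enhancement is judged.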
Figure 15.9 shows the transmitted amplitudes and the polarization state of a
linearly polarized beam after going through the etalon of Figure 15.8. Since
tuning of the cavity may be accomplished by varying the incident wavelength $\lambda$,
we have plotted the data versus $\lambda$ in the vicinity of resonance (633 nm). Note that
almost all the incident beam is transmitted through the etalon and that its
polarization rotation at resonance is close to 11°. (Even more rotation may be
obtained if higher-reflectivity mirrors are used.)
Absorption within the Faraday medium reduces the quality factor Q of the cavity
and, therefore, hampers its ability to enhance the Faraday rotation. If the same
medium as above is assumed to have an absorption coefficient $k = 10^{-4}$, the
characteristics of the etalon shown in Figure 15.9 will change to those in Figure 15.10.
[Plot: (a) amplitude transmission coefficients $|t_p|$ and $|t_s|$ (0 to 1) and (b) polarization
rotation and ellipticity (−6° to 10°), both versus $\lambda$ from 631 nm to 635 nm.]
Figure 15.9 (a) Computed amplitude transmission coefficients and (b) the state
of transmitted polarization plotted versus $\lambda$ for a plane wave normally incident
on the etalon of Figure 15.8. The assumed direction of incident polarization is p.
In (a) the transmission coefficient $t_s$ is defined as the ratio of the transmitted
s-component to the incident p-component. In (b) the polarization rotation angle
$\psi$ is relative to the direction of incident polarization.
Note that the transmitted power has dropped by more than 60% and that the peak
rotation angle is reduced by about 4°. The plot in Figure 15.10(c) of the magnitude of
the Poynting vector along the propagation path shows constant values in the
multilayer mirrors but a rapid decline within the Faraday medium. Of the roughly 85% of
the optical power that enters the etalon, 46% gets absorbed in the Faraday rotator and
only 39% eventually passes out of the device. The small plateaux in the Poynting
vector plot of Figure 15.10(c) are caused by the standing-wave pattern of the E-field
within the Faraday medium: the absorption rate goes through minima and maxima
following the E-field intensity variations. In our example, where the Faraday
medium is $5\lambda$ thick, there are exactly 10 such plateaux.
A simple analysis
We now present a simple derivation of the basic properties of the Fabry–Pérot
interferometer. Let us assume that the two mirrors are identical, with reflection
coefficients $\rho$ and transmission coefficients $\tau$. If the light is incident from the
[Plot: (a) amplitude transmission coefficients $|t_p|$ and $|t_s|$ and (b) polarization rotation and
ellipticity (−6° to 10°), both versus $\lambda$ from 631 nm to 635 nm; (c) normal component of
the Poynting vector (0 to 1) versus Z from 0 to 6000 nm.]
Figure 15.10 (a) and (b) are the same as in Figure 15.9, except for the presence of a small absorption coefficient ($k = 10^{-4}$) in the
Faraday medium. The plot in (c) shows the magnitude of the Poynting vector as a function of position along the beam's
propagation path. The flat parts of the curve indicate that optical energy passes unattenuated through the dielectric mirrors. The
steep, staircase-like drop in the curve is caused by absorption within the Faraday medium.
substrate side of the mirror, these coefficients will be denoted by $\rho'$ and $\tau'$,
respectively. For dielectric mirrors, in general, we have $|\rho| = |\rho'|$ and $|\tau| = |\tau'|$,
and there are simple relations among the corresponding phase factors, but these
are not needed here. Also, since there is no absorption in the mirrors, we have
$|\rho|^2 + |\tau|^2 = 1$.
Consider the case of a unit-amplitude beam normally incident on a Fabry–Pérot
etalon, such as that shown in Figure 15.3 but with $\theta = 0$. Denote the gap-width by D,
and let there be two counter-propagating beams in the cavity, one with amplitude A
traveling to the right and the other with amplitude B traveling to the left. Since at the
second mirror there are no incoming beams from the outside, and since the beam
with amplitude A is reflected with coefficient $\rho$ from this mirror, we must have
$B = \rho A \exp(i2\pi D/\lambda)$. At the first mirror, the beam with amplitude B is reflected once
again, and its amplitude becomes $\rho^2 A \exp(i4\pi D/\lambda)$. Since the incident beam has unit
amplitude, its contribution to the field just inside the cavity is $\tau'$. Therefore
\[
A = \rho^2 A \exp(i4\pi D/\lambda) + \tau', \tag{15.1}
\]
which yields
\[
A = \frac{\tau'}{1 - \rho^2 \exp(i4\pi D/\lambda)}. \tag{15.2}
\]
Resonance occurs when the phase of $\rho^2$ (if any) plus the phase acquired in a round
trip through the cavity, $4\pi D/\lambda$, becomes a multiple of $2\pi$, at which point the
denominator in Eq. (15.2) will be at a minimum and the field amplitude A within
the cavity at a maximum.
The light transmitted through the device will have amplitude
\[
t = \tau A \exp(i2\pi D/\lambda) \tag{15.3}
\]
and that reflected from the device will have amplitude
\[
r = \rho' + \tau \rho A \exp(i4\pi D/\lambda). \tag{15.4}
\]
The same equations may be used at oblique incidence, provided that the gap-
width D is multiplied by $\cos\theta$ and that $\rho$, $\tau$, $\rho'$, and $\tau'$ represent the corresponding
quantities at the particular angle of incidence. When the medium of the cavity
happens to be absorptive, the same type of analysis may still be used to arrive at
the relevant formulas.
We now demonstrate the application of the preceding equations to some of the
cases discussed earlier. In the case of the 10-layer stack, the mirror coefficients
were $\rho = -\rho' = 0.844$ and $\tau = \tau' = 0.536$. From Eqs. (15.2)–(15.4) we find that,
at resonance, A = 1.863, t = 1 and r = 0. In the case of the 20-layer stack,
$\rho = -\rho' = 0.9905$ and $\tau = \tau' = 0.1375$, yielding at resonance A = 7.27, t = 1 and
r = 0. For the component of polarization that sees the higher refractive index of the
uppermost layer, $\rho$ acquires a phase angle of 0.9°, which is canceled when the gap-
width is reduced by 1.6 nm. This is exactly the peak shift observed in Figure 15.7(a).
The 21-layer stacks used with the Faraday rotator were symmetric, in the sense
that their substrate and their medium of incidence had the same refractive index,
n = 1.5. For these mirrors $\rho = \rho' = 0.9964$ and $\tau = \tau' = 0.0843i$, yielding at
resonance A = 11.73i, t = 1 and r = 0. Since the refractive indices of the Faraday
medium for RCP and LCP light deviated from the nominal value by ±0.0022%, a
similar change in wavelength was needed to re-establish the conditions of
resonance for each state of circular polarization. At $\lambda$ = 633 nm, however, both RCP
and LCP components of the incident beam were slightly off resonance. This,
according to Eq. (15.2), caused a large phase shift between the values of A for
RCP and LCP light, which translated into a large phase difference between the
transmitted RCP and LCP components, in accordance with Eq. (15.3). The
resulting Faraday rotation angle of the transmitted beam was a manifestation of
this phase difference. Similar arguments can be advanced to explain the
consequences of the absorption observed in Figure 15.10.
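The numbers quoted for the 10-layer and 21-layer mirrors follow directly from Eqs. (15.2)–(15.4), as the short sketch below shows. Two assumptions are made here for illustration: the round-trip phase is set to zero (resonance with real $\rho^2$), and for the 10-layer mirror the substrate-side coefficient is taken as $\rho' = -\rho$, which is what Eq. (15.4) requires for r to vanish when $\tau$ is real. The function name is hypothetical.

```python
import cmath


def etalon_amplitudes(rho, tau, rho_prime, tau_prime, round_trip_phase):
    """Cavity, transmitted, and reflected amplitudes from Eqs. (15.2)-(15.4)."""
    one_way = cmath.exp(1j * round_trip_phase / 2)   # exp(i 2 pi D / lambda)
    round_trip = cmath.exp(1j * round_trip_phase)    # exp(i 4 pi D / lambda)
    a = tau_prime / (1 - rho**2 * round_trip)        # Eq. (15.2)
    t = tau * a * one_way                            # Eq. (15.3)
    r = rho_prime + tau * rho * a * round_trip       # Eq. (15.4)
    return a, t, r


# 10-layer mirrors (rho' = -rho is an assumption of this sketch).
a10, t10, r10 = etalon_amplitudes(0.844, 0.536, -0.844, 0.536, 0.0)
# 21-layer symmetric mirrors with purely imaginary tau.
a21, t21, r21 = etalon_amplitudes(0.9964, 0.0843j, 0.9964, 0.0843j, 0.0)

# Gap change that cancels a 0.9 degree phase on rho (1.8 degrees on rho^2):
# 4 pi dD / lambda = 1.8 degrees, i.e. dD = (1.8 / 720) * lambda.
gap_shift_nm = (2 * 0.9 / 720.0) * 633.0

print(f"10-layer: A = {a10.real:.3f}, |t| = {abs(t10):.3f}, |r| = {abs(r10):.3f}")
print(f"21-layer: A = {a21.imag:.2f}i, |t| = {abs(t21):.3f}, |r| = {abs(r21):.3f}")
print(f"gap shift = {gap_shift_nm:.2f} nm")
```

Because the quoted mirror coefficients are rounded, the computed $|t|$ and $|r|$ differ from 1 and 0 by about one percent or less, and the gap shift comes out close to the 1.6 nm mentioned above.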
Note
In general, it is possible to eliminate from a stack non-absorbing layers whose
thicknesses are multiples of $\lambda/2$. Now, if the gap happens to be an integer multiple
of $\lambda/2$ its elimination will bring the top dielectric layers of the two mirrors into
contact. These layers, each being a quarter-wave thick, will combine into a half-
wave layer that can be subsequently eliminated, paving the way for the
elimination of all the remaining layers in similar fashion. At the end, the two substrates
will come into direct contact, and the incident light will be fully transmitted, as is
expected of a well-tuned etalon.
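The layer-elimination argument can be confirmed numerically by treating the whole etalon (mirror, gap, mirror) as a single multilayer stack between two substrates. The following sketch uses the characteristic-matrix method with hypothetical helper names; light is assumed to enter from within the first substrate at normal incidence, and the detuned gap of 13.25 wavelengths is an arbitrary choice made only for contrast.

```python
import cmath
import math


def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one homogeneous layer at normal incidence."""
    delta = 2 * math.pi * n * d / wavelength
    c, s = cmath.cos(delta), cmath.sin(delta)
    return ((c, 1j * s / n), (1j * n * s, c))


def multiply(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def stack_reflection(n_inc, layers, n_sub, wavelength):
    """Amplitude reflection coefficient; layers are listed from the incidence side."""
    m = ((1, 0), (0, 1))
    for n, d in layers:
        m = multiply(m, layer_matrix(n, d, wavelength))
    b = m[0][0] + m[0][1] * n_sub
    c = m[1][0] + m[1][1] * n_sub
    return (n_inc * b - c) / (n_inc * b + c)


wavelength = 633.0
low, high = (1.5, 105.5), (2.0, 79.125)
mirror = [high, low] * 5   # from the substrate outward: layer 1 (high) to layer 10 (low)


def etalon_reflection(gap_in_wavelengths):
    gap = (1.0, gap_in_wavelengths * wavelength)
    layers = mirror + [gap] + mirror[::-1]
    return stack_reflection(1.5, layers, 1.5, wavelength)


r_tuned = etalon_reflection(13.0)      # gap is a multiple of lambda/2
r_detuned = etalon_reflection(13.25)   # gap detuned by a quarter wave
print(f"|r| tuned = {abs(r_tuned):.2e}, |r| detuned = {abs(r_detuned):.3f}")
```

For the tuned gap the reflection vanishes to within rounding error, exactly as the elimination argument predicts, while the detuned gap reflects nearly all of the light.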

1 C. Fabry and A. Pérot, Ann. Chim. Phys. (7) 16, p. 115 (1899).
2 R. W. Wood, Physical Optics, third edition, Optical Society of America,
Washington, 1988.
3 H. A. Macleod, Thin Film Optical Filters, second edition, Macmillan, New York,
1986.
4 S. C. Johnston and S. F. Jacobs, Some problems caused by birefringence in dielectric
mirrors, Appl. Opt. 25, 1878 (1986).
5 C. Wood, S. C. Bennett, J. L. Roberts, D. Cho, and C. E. Wieman, Birefringence,
mirrors, and parity violation, Opt. & Phot. News 7, 54 (1996).
16
The Ewald–Oseen extinction theorem
When a beam of light enters a material medium, it sets in motion the resident
electrons, whether these electrons are free or bound. The electronic oscillations
in turn give rise to electromagnetic radiation which, in the case of linear media,
possesses the frequency of the exciting beam. Because Maxwell’s equations are
linear, one expects the total field at any point in space to be the sum of the
original (exciting) field and the radiation produced by all the oscillating elec-
trons. However, in practice the original beam appears to be absent within the
medium, as though it had been replaced by a different beam, one having a
shorter wavelength and propagating in a different direction. The Ewald–Oseen
theorem1,2 resolves this paradox by showing how the oscillating electrons
conspire to produce a field that exactly cancels out the original beam every-
where inside the medium. The net field is indeed the sum of the incident beam
and the radiated field of the oscillating electrons, but the latter field completely
masks the former.3,4
Although the proof of the Ewald–Oseen theorem is fairly straightforward, it
involves complicated integrations over dipolar fields in three-dimensional space,
making it a brute-force drill in calculus and devoid of physical insight.5,6 It is
possible, however, to prove the theorem using plane waves interacting with thin
slabs of material, while invoking no physics beyond Fresnel’s reflection coeffi-
cients. (These coefficients, which date back to 1823, predate Maxwell’s equa-
tions.) The thin slabs represent sheets of electric dipoles, and the use of Fresnel’s
coefficients allows one to derive exact expressions for the electromagnetic field
radiated by these dipolar sheets. The integrations involved in this approach are
one-dimensional, and the underlying procedures are intuitively appealing to
practitioners of optics. The goal of the present chapter is to outline a general
proof of the Ewald–Oseen theorem using arguments that are based primarily on
thin-film optics.
Dielectric slab
Consider the transparent slab of dielectric material of thickness d and refractive
index n, shown in Figure 16.1. A normally incident plane wave of vacuum
wavelength $\lambda_0$ produces overall a reflected beam of amplitude r and a transmitted
beam of amplitude t. Both r and t are complex numbers in general, having a
magnitude and a phase angle. Using Fresnel's coefficients at each facet of the slab
and accounting for multiple reflections, it is fairly straightforward to obtain
expressions for r and t. The reflection and transmission coefficients at the front
facet of the slab are5,7
\[
\rho = (1 - n)/(1 + n), \tag{16.1}
\]
\[
\tau = 2/(1 + n). \tag{16.2}
\]
At the rear facet the corresponding entities are
\[
\rho' = (n - 1)/(n + 1), \tag{16.3}
\]
\[
\tau' = 2n/(n + 1). \tag{16.4}
\]
A single path of the beam through the slab causes a phase shift $\psi$, where
\[
\psi = 2\pi n d/\lambda_0 . \tag{16.5}
\]
Adding up all partial reflections at the front facet yields an expression for the
reflection coefficient r of the slab. Similarly, adding all partial transmissions at
[Diagram: slab of thickness d and index n; unit-amplitude incident beam, first reflection
$\rho$, and successive partial beams $\tau\tau'\exp(i\psi)$, $\tau\tau'\rho'\exp(i2\psi)$, $\tau\tau'\rho'^2\exp(i3\psi)$, ...
leaving the rear and front facets alternately.]
Figure 16.1 A transparent slab of homogeneous material of thickness d and
refractive index n, on which is normally incident a monochromatic plane wave
of wavelength $\lambda_0$. The beam suffers multiple reflections at the two facets of the
slab. By adding the various reflected and transmitted amplitudes one obtains the
expressions for the total r and t given in Eqs. (16.6) and (16.7).
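the rear facet yields the transmission coefficient t. Thus
\[
r = \rho + \tau\tau'\rho' \exp(i2\psi) \sum_{m=0}^{\infty} [\rho' \exp(i\psi)]^{2m}
  = \rho + \frac{\tau\tau'\rho' \exp(i2\psi)}{1 - \rho'^2 \exp(i2\psi)}, \tag{16.6}
\]
\[
t = \tau\tau' \exp(i\psi) \sum_{m=0}^{\infty} [\rho' \exp(i\psi)]^{2m}
  = \frac{\tau\tau' \exp(i\psi)}{1 - \rho'^2 \exp(i2\psi)}. \tag{16.7}
\]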
Rather than try to simplify these complicated functions of n, d and $\lambda_0$, we give
numerical results in Figure 16.2 for the specific case of n = 2 and $\lambda_0$ = 633 nm. The
magnitudes of r and t are shown in Figure 16.2(a), and their phase angles in Figure
16.2(b), both as functions of the thickness d of the slab. For any given value of d it is
possible to represent r and t as complex vectors (see Figure 16.3). Since the phase
difference between r and t is always 90°, these complex vectors are orthogonal to each
other. Also, the conservation of energy requires that $|r|^2 + |t|^2 = 1$. These observations
lead to the conclusion that the hypotenuse of the triangle in Figure 16.3 must have unit
length, that is $|t - r| = 1$, which is also confirmed numerically in Figure 16.2(c).
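These properties are easy to check by evaluating Eqs. (16.6) and (16.7) directly. The sketch below does so for n = 2 and $\lambda_0$ = 633 nm; the function name and the sample thickness of 50 nm are arbitrary choices made for illustration.

```python
import cmath
import math


def slab_coefficients(n, d, wavelength):
    """Reflection and transmission coefficients of a slab, Eqs. (16.1)-(16.7)."""
    rho = (1 - n) / (1 + n)                    # Eq. (16.1)
    tau = 2 / (1 + n)                          # Eq. (16.2)
    rho_p = (n - 1) / (n + 1)                  # Eq. (16.3)
    tau_p = 2 * n / (n + 1)                    # Eq. (16.4)
    psi = 2 * math.pi * n * d / wavelength     # Eq. (16.5)
    denom = 1 - rho_p**2 * cmath.exp(2j * psi)
    r = rho + tau * tau_p * rho_p * cmath.exp(2j * psi) / denom   # Eq. (16.6)
    t = tau * tau_p * cmath.exp(1j * psi) / denom                 # Eq. (16.7)
    return r, t


r, t = slab_coefficients(2.0, 50.0, 633.0)
energy = abs(r) ** 2 + abs(t) ** 2       # should equal 1 for a lossless slab
overlap = (r * t.conjugate()).real       # zero when r and t are orthogonal
hypotenuse = abs(t - r)                  # should equal 1
print(f"|r|^2 + |t|^2 = {energy:.6f}, Re(r t*) = {overlap:.2e}, |t - r| = {hypotenuse:.6f}")
```

Repeating the calculation for other thicknesses gives the same three results, in line with the right-angled triangle of Figure 16.3.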
Within the slab the incident beam sets the atomic dipoles in motion. These
dipoles in turn radiate plane waves in both the forward and the backward
directions, as shown in Figure 16.4. When the slab is sufficiently thin, symmetry
requires forward- and backward-radiated waves to be identical, that is, they must
both have the same amplitude r. In the forward direction, however, the incident
beam continues to propagate unaltered, except for a phase-shift caused by
propagation in free-space through a distance d. Thus we must have
\[
t = r + \exp(i2\pi d/\lambda_0). \tag{16.8}
\]
It was pointed out earlier in conjunction with the diagram of Figure 16.3 that $t - r$
has unit amplitude, which is in agreement with Eq. (16.8). It is by no means
obvious, however, that the phase of $t - r$ must approach $2\pi d/\lambda_0$ as $d \to 0$. Figure
16.2(c) shows computed plots of the phase of $t - r$ normalized by $2\pi d/\lambda_0$. It is
seen that in the limit $d \to 0$ the normalized phase approaches unity as well. This
confirms that the slab radiates equally in the forward and backward directions,
and that the incident beam, having set the dipolar oscillations in motion,
continues to propagate undisturbed in free space.
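The limiting behavior of the phase of $t - r$ can be reproduced with a few evaluations of Eqs. (16.6) and (16.7) at decreasing thickness. The following is a minimal sketch for n = 2 and $\lambda_0$ = 633 nm; the function name and the particular thicknesses sampled are illustrative choices.

```python
import cmath
import math


def slab_coefficients(n, d, wavelength):
    """Reflection and transmission coefficients of a slab, Eqs. (16.1)-(16.7)."""
    rho = (1 - n) / (1 + n)
    tau = 2 / (1 + n)
    rho_p = (n - 1) / (n + 1)
    tau_p = 2 * n / (n + 1)
    psi = 2 * math.pi * n * d / wavelength
    denom = 1 - rho_p**2 * cmath.exp(2j * psi)
    r = rho + tau * tau_p * rho_p * cmath.exp(2j * psi) / denom
    t = tau * tau_p * cmath.exp(1j * psi) / denom
    return r, t


wavelength = 633.0
normalized_phase = {}
for d in (50.0, 10.0, 1.0, 0.1):            # slab thickness in nm
    r, t = slab_coefficients(2.0, d, wavelength)
    free_space_phase = 2 * math.pi * d / wavelength
    # Phase of (t - r) divided by the free-space phase 2 pi d / lambda0.
    normalized_phase[d] = cmath.phase(t - r) / free_space_phase
    print(f"d = {d:5.1f} nm: normalized phase = {normalized_phase[d]:.4f}")
```

As the thickness shrinks, the normalized phase settles toward unity, which is the content of Eq. (16.8).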
Radiation from a uniform sheet of oscillating dipoles

In the limit of small d Eq. (16.6) reduces to the following simple form:
\[
r \approx i[\pi(n^2 - 1)d/\lambda_0] \exp[i\pi(n^2 + 1)d/\lambda_0], \qquad d/\lambda_0 \ll 1. \tag{16.9}
\]

[Plot: (a) amplitudes |r| and |t| (0 to 1); (b) phases $\phi(r)$ and $\phi(t)$ (0° to 270°);
(c) $|t - r|$ and the normalized phase $\phi(t - r)/(2\pi d/\lambda_0)$ (0 to 2); all versus
thickness d from 0 to 150 nm.]
Figure 16.2 Computed plots of r and t for a slab of thickness d and refractive
index n = 2, when a plane wave with $\lambda_0$ = 633 nm is normally incident on the
slab. The horizontal axis covers one cycle of variations in r and t, corresponding
to a half-wave thickness of the slab.
In this limit the radiated field is slightly more than 90° ahead of the incident field,
while its amplitude is proportional to $d/\lambda_0$ and also proportional to $n^2 - 1$, the
latter being the coefficient of polarizability of the dielectric material. Note that
the small phase angle of r over and above its 90° phase, i.e., the exponential
[Diagram: slab of thickness d and index n with incident, reflected (r), and transmitted (t)
beams; complex-plane plot (real and imaginary axes) showing r, −r, t, and t − r.]
Figure 16.3 A dielectric slab of thickness d and refractive index n, reflecting the
unit-amplitude incident beam with coefficient r while transmitting it with
coefficient t. The complex-plane diagram on the right shows the relative orientations of
r, t and their difference $t - r$. For a non-absorbing slab (i.e., one with a real-valued
index n) r and t are orthogonal to each other, and $t - r$ has unit magnitude.
[Diagram: thin slab of thickness d and index n containing an oscillating dipole; unit-
amplitude incident beam emerging as $\exp(i2\pi d/\lambda_0)$, radiated beams r on both sides,
transmitted beam t.]
Figure 16.4 Bound electrons within a very thin dielectric slab, when set in
motion by a normally incident plane wave of unit amplitude, radiate with
equal strength in both the forward and backward directions. The magnitude of
the radiated field is the reflection coefficient r of the slab. The incident beam
continues to propagate undisturbed as in free-space, acquiring a phase shift of
$2\pi d/\lambda_0$ upon crossing the slab. The sum of the incident beam and the forward-
propagating part of the radiated beam constitutes the transmitted beam.
factor in Eq. (16.9), is essential for the conservation of energy among the
incident, reflected, and transmitted beams (see Figure 16.4).
Equation (16.9) is in fact the exact solution of Maxwell's equations for the
radiation field of a sheet of dipole oscillators. Although derived here as an aid in
proving the extinction theorem, it is an important result in its own right. Note, for
example, that the amplitude of the radiated field is proportional to $1/\lambda_0$ even
though the field of an individual dipole radiator is known to be proportional to
$1/\lambda_0^2$. The coherent addition of amplitudes over the sheet of dipoles has thus
modified the wavelength dependence of the radiated field.3
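To see how well the thin-slab expression of Eq. (16.9) tracks the full reflection coefficient of Eq. (16.6), one can compare the two numerically. The sketch below does this for n = 2 and $\lambda_0$ = 633 nm at two thicknesses chosen for illustration; the function names are hypothetical.

```python
import cmath
import math


def exact_reflection(n, d, wavelength):
    """Slab reflection coefficient from Eq. (16.6)."""
    rho = (1 - n) / (1 + n)
    tau = 2 / (1 + n)
    rho_p = (n - 1) / (n + 1)
    tau_p = 2 * n / (n + 1)
    psi = 2 * math.pi * n * d / wavelength
    denom = 1 - rho_p**2 * cmath.exp(2j * psi)
    return rho + tau * tau_p * rho_p * cmath.exp(2j * psi) / denom


def thin_slab_reflection(n, d, wavelength):
    """Small-thickness approximation of Eq. (16.9)."""
    amplitude = math.pi * (n**2 - 1) * d / wavelength
    phase = math.pi * (n**2 + 1) * d / wavelength
    return 1j * amplitude * cmath.exp(1j * phase)


relative_error = {}
for d in (1.0, 20.0):                       # slab thickness in nm
    exact = exact_reflection(2.0, d, 633.0)
    approx = thin_slab_reflection(2.0, d, 633.0)
    relative_error[d] = abs(exact - approx) / abs(exact)
    print(f"d = {d:4.1f} nm: relative error = {relative_error[d]:.2e}")
```

At a thickness of 1 nm the two expressions agree to better than one part in a thousand, and the discrepancy grows as the slab thickens, as expected of an approximation valid for $d/\lambda_0 \ll 1$.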
[Diagram: semi-infinite medium divided into thin slabs of thickness Δz along the Z-axis;
unit-amplitude incident beam from the left; observation plane at $z_0$.]
Figure 16.5 A semi-infinite medium of refractive index n is illuminated by a
unit-amplitude plane wave at normal incidence. The medium may be considered
as a contiguous sequence of thin slabs, each radiating with equal strength in both
the forward and backward directions. Adding coherently the backward-radiated
fields yields the reflection coefficient at the front facet of the medium. Similarly,
the internal field at $z = z_0$ is obtained by coherent addition of the incident beam,
the forward-propagating radiations from the left side of $z_0$, and the backward-
propagating radiations from the right side of $z_0$.
The extinction theorem

Having derived Eq. (16.9) for the field radiated by a sheet of dipoles, we are now
in a position to outline the proof of the extinction theorem. Consider a semi-
infinite, homogeneous medium of refractive index n, bordering with free space at
z = 0, as shown in Figure 16.5. A unit-magnitude plane wave of wavelength λ0 is
directed at this medium at normal incidence from the left side. To determine the
reflected amplitude r at the interface, divide the medium into thin slabs of
thickness Δz, then add up (coherently) the reflected fields from each of these
slabs. Similarly, the field at an arbitrary plane z = z0 inside the medium may be
computed by adding to the incident beam the contributions of the slabs located to
the left of z0 as well as those to the right of z0. The simplest way to proceed is by
assuming that the field inside the medium has the expected form, τ exp(i2πnz/λ0),
then showing self-consistency. These calculations involve simple one-dimensional
integrals, and are in fact so straightforward that there is no need to carry them out
here. The interested reader may take a few minutes to evaluate the integrals and
convince himself or herself of the validity of the theorem.
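The one-dimensional integrals mentioned above can also be checked numerically. The sketch below is an illustration, not the author's calculation. It assumes that a slab of thickness Δz radiates, in both directions, a plane wave of amplitude i(π/λ0)(n² − 1)Δz times the local field (the thin-slab limit of the standard Fabry-Perot reflection coefficient, used here as a stand-in for Eq. (16.9)), and it gives the medium a small absorption so that the integrals over the half-space converge. The function name and all parameter values are hypothetical.

```python
import numpy as np

def extinction_check(n=2.0 + 0.05j, lam0=1.0, z0=0.7, z_max=120.0, dz=5e-4):
    """Sum the fields radiated by the thin slabs of a semi-infinite medium (z >= 0)."""
    k0 = 2 * np.pi / lam0
    tau = 2.0 / (n + 1.0)                     # trial amplitude of the internal field
    sheet = 1j * (k0 / 2) * (n**2 - 1)        # ASSUMED radiated amplitude per unit thickness
    z = np.arange(0.0, z_max + dz, dz)
    internal = tau * np.exp(1j * k0 * n * z)  # assumed form of the field inside the medium

    def integrate(y):
        # plain trapezoidal rule on the uniform grid
        return dz * (np.sum(y) - 0.5 * (y[0] + y[-1]))

    # Backward radiation of all slabs, collected at z = 0: the reflection coefficient.
    r = integrate(sheet * internal * np.exp(1j * k0 * z))
    # At z0: incident beam + forward radiation from z < z0 + backward radiation from z > z0.
    total = np.exp(1j * k0 * z0) + integrate(
        sheet * internal * np.exp(1j * k0 * np.abs(z - z0)))
    expected = tau * np.exp(1j * k0 * n * z0)
    return r, total, expected

n = 2.0 + 0.05j
r, total, expected = extinction_check(n=n)
print("r from slab sum       :", r)
print("Fresnel (1 - n)/(1 + n):", (1 - n) / (1 + n))
print("field at z0 from sum  :", total)
print("tau * exp(i k0 n z0)  :", expected)
```

Under these assumptions the summed backward radiation reproduces the Fresnel reflection coefficient, and the summed field at z0 contains no term traveling at the free-space velocity: the incident beam is cancelled and only τ exp(i2πnz0/λ0) survives, which is the self-consistency the text refers to.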
Slab of absorbing material

When the material of the slab is absorbing, similar arguments to those above may
be advanced to prove the Ewald–Oseen theorem, although the expressions for the
reflection and transmission coefficients become more complicated. Numerically,
however, it is still possible to describe the situation with great accuracy.
[Figure 16.6 plots, horizontal axis: thickness d (nm), 0 to 100. (a) Amplitude: |r|, |t|. (b) Phase (degrees): φ(r), φ(t). (c) Normalized (t − r): |t − r| and φ(t − r)/(2πd/λ0).]
Figure 16.6 Computed plots of r and t for a slab of thickness d and complex
refractive index (n, k) = (2, 7), when a plane wave with λ0 = 633 nm is normally
incident on the slab. The horizontal axis covers the penetration depth of the material.
Figure 16.6(a) shows computed plots of r and t for a metal slab having
complex index n + ik = 2 + i7. (Compare these plots with the corresponding plots
for the dielectric slab in Figure 16.2.) It is seen that the reflectance drops sharply
while the transmittance increases as the film thickness is reduced below about
20 nm. The phase plots in Figure 16.6(b) are quite different from those of the
dielectric slab, indicating a phase difference greater than 90° between r and t. A
complex-plane diagram for this type of material is given in Figure 16.7. The
angle between r and t being greater than 90° implies that |t − r|² > |t|² + |r|²,
while the conservation of energy requires that |t|² + |r|² < 1 in the case of
Figure 16.7 A complex-plane diagram showing the reflection coefficient r,
transmission coefficient t, and their difference t − r for a thin slab of an
absorbing material.
Figure 16.8 An s-polarized plane wave is obliquely incident at an angle θ on a
dielectric slab of thickness d and index n. The electric dipoles of the slab
oscillate in a direction perpendicular to the plane of the diagram, radiating
identical fields in the forward and backward directions.
absorbing media. The fact that |t − r| can approach unity is borne out by the
numerical results depicted in Figure 16.6(c). In the limit d → 0, not only does the
magnitude of t − r become unity but also its phase approaches 2πd/λ0. Therefore,
in the limit of small d, the transmitted beam may be expressed as the sum of the
reflected beam and the phase-shifted incident beam, the phase shift being due to
free-space propagation over the distance d. This is all that one needs in order to
prove the extinction theorem for absorbing media.
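The thin-film behavior described above can be reproduced with a few lines of code. The sketch below is an illustration rather than the program used for Figure 16.6: it assumes the standard Airy (multiple-reflection) formulas for a free-standing slab at normal incidence, the exp(+i2πnz/λ0) sign convention with complex index n + ik, and a transmission coefficient referred to the rear facet. The function name `slab_rt` is hypothetical.

```python
import numpy as np

def slab_rt(n, d, lam0):
    """Normal-incidence r and t of a free-standing slab (Airy summation)."""
    k0 = 2 * np.pi / lam0
    r12 = (1 - n) / (1 + n)            # Fresnel coefficient, free space -> medium
    phase = np.exp(1j * k0 * n * d)    # single-pass propagation factor (decays if Im n > 0)
    denom = 1 - r12**2 * phase**2
    r = r12 * (1 - phase**2) / denom
    t = (1 - r12**2) * phase / denom   # phase referred to the rear facet
    return r, t

lam0, n = 633.0, 2 + 7j                # nm; (n, k) = (2, 7) as in Figure 16.6

for d in (20.0, 5.0, 1.0, 0.1):
    r, t = slab_rt(n, d, lam0)
    angle = np.degrees(abs(np.angle(r / t)))           # angle between r and t
    diff = t - r
    norm_phase = np.angle(diff) / (2 * np.pi * d / lam0)
    print(f"d = {d:5.1f} nm  |r|^2 + |t|^2 = {abs(r)**2 + abs(t)**2:.4f}  "
          f"angle(r, t) = {angle:6.1f} deg  |t - r| = {abs(diff):.5f}  "
          f"phase(t - r)/(2 pi d/lam0) = {norm_phase:.4f}")
```

For the thinnest slabs the printed values should show |r|² + |t|² below unity, an angle between r and t above 90°, and both |t − r| and the normalized phase of t − r close to 1, in line with Figure 16.6(c).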
Oblique incidence on a dielectric slab

Figure 16.8 shows an s-polarized plane wave at oblique incidence on a dielectric
slab of thickness d and index n. The oscillating dipoles are parallel to the
s-direction of polarization and radiate with equal magnitude in the forward and
backward directions. The computed plots of r_s and t_s versus d for the specific case
of λ0 = 633 nm, n = 2, and θ = 50° are shown in Figure 16.9. The angle of
propagation inside the medium is obtained from Snell’s law as θ′ = 22.52°, and
the half-wave thickness of the slab is given by λ0/(2n cos θ′) = 171.3 nm. These
curves are again very similar to those of Figure 16.2, showing a 90° phase
[Figure 16.9 plots, horizontal axis: thickness d (nm), 0 to 175. (a) Amplitude: |t_s|, |r_s|. (b) Phase (degrees): φ(r_s), φ(t_s). (c) Normalized (t_s − r_s): |t_s − r_s| and φ(t_s − r_s)/(2πd cos θ/λ0).]
Figure 16.9 Computed plots of r and t for a slab of thickness d and index
n = 2, when an s-polarized plane wave with λ0 = 633 nm illuminates the slab at
θ = 50°. The horizontal axis covers one cycle of variations of r and t,
corresponding to a half-wave thickness of the slab at this particular angle of incidence.
Figure 16.10 A p-polarized plane wave is obliquely incident at an angle θ on a
dielectric slab of thickness d and index n. The oscillating dipoles make an angle
θ″ with the surface of the slab, radiating with different amplitudes in the forward
and backward directions.
difference between r_s and t_s, unit magnitude for t_s − r_s, and a phase for t_s − r_s that
approaches 2π(d/λ0)cos θ as d → 0. The Ewald–Oseen theorem for the case of
s-polarized light at oblique incidence can therefore be proven along the same
lines as described earlier for normal incidence.
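The numbers quoted for the s-polarized case are easy to check. The following sketch assumes the standard Airy formulas for a lossless slab with the s-polarized Fresnel coefficient at the front facet; the function name `slab_rt_s` and the sample thicknesses are hypothetical choices.

```python
import numpy as np

def slab_rt_s(n, d, lam0, theta_deg):
    """r_s and t_s of a free-standing dielectric slab at oblique incidence."""
    theta = np.radians(theta_deg)
    theta_in = np.arcsin(np.sin(theta) / n)              # Snell's law
    a, b = np.cos(theta), n * np.cos(theta_in)
    r12 = (a - b) / (a + b)                               # s-polarized Fresnel coefficient
    phase = np.exp(1j * 2 * np.pi * n * d * np.cos(theta_in) / lam0)
    denom = 1 - r12**2 * phase**2
    return r12 * (1 - phase**2) / denom, (1 - r12**2) * phase / denom

lam0, n, theta_deg = 633.0, 2.0, 50.0
theta_in_deg = np.degrees(np.arcsin(np.sin(np.radians(theta_deg)) / n))
half_wave = lam0 / (2 * n * np.cos(np.radians(theta_in_deg)))
print(f"theta' = {theta_in_deg:.2f} deg, half-wave thickness = {half_wave:.1f} nm")

free_space = 2 * np.pi * np.cos(np.radians(theta_deg)) / lam0   # phase per nm of thickness
for d in (1.0, 40.0, 85.0, 170.0):
    r, t = slab_rt_s(n, d, lam0, theta_deg)
    dphi = np.degrees(np.angle(r / t))                    # phase of r relative to t
    norm_phase = np.angle(t - r) / (free_space * d)
    print(f"d = {d:6.1f} nm  phase(r) - phase(t) = {dphi:7.2f} deg  "
          f"|t - r| = {abs(t - r):.6f}  normalized phase of (t - r) = {norm_phase:.4f}")
```

In this lossless model the 90° phase difference and the unit magnitude of t_s − r_s hold at every thickness, while the normalized phase tends to 1 only as d → 0.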
The case of p-polarized light, depicted in Figure 16.10, is somewhat different,
however. Here the directionality of the dipole oscillations within the slab breaks
the symmetry between the forward- and backward-radiated beams. The angle θ″
between the direction of oscillation of the dipoles and the plane of the slab may
be determined by considering multiple reflections within the slab. For very thin
slabs, it is possible to show that
$$\tan\theta'' = (1/n^2)\tan\theta. \tag{16.10}$$
Note that at Brewster’s angle, where tan θ = n, we have tan θ″ = 1/n, that is,
θ″ = θ′, where θ′ is the propagation angle within the medium as given by Snell’s
law. At angles below the Brewster angle θ″ < θ′, while above the Brewster
angle θ″ > θ′.
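As a quick numerical illustration of Eq. (16.10), the sketch below compares the dipole angle θ″ with the Snell angle θ′ for n = 2 at a few angles of incidence. The helper name `angles` and the sample angles are hypothetical.

```python
import numpy as np

n = 2.0
brewster = np.degrees(np.arctan(n))          # Brewster's angle, tan(theta) = n

def angles(theta_deg, n):
    """Return (theta', theta'') in degrees for incidence angle theta."""
    theta = np.radians(theta_deg)
    theta_in = np.degrees(np.arcsin(np.sin(theta) / n))       # Snell's law
    theta_dip = np.degrees(np.arctan(np.tan(theta) / n**2))   # Eq. (16.10)
    return theta_in, theta_dip

for theta_deg in (30.0, 50.0, brewster, 75.0):
    theta_in, theta_dip = angles(theta_deg, n)
    print(f"theta = {theta_deg:6.2f} deg  theta' = {theta_in:6.2f} deg  "
          f"theta'' = {theta_dip:6.2f} deg")
```

The two angles coincide at Brewster's angle, with θ″ smaller than θ′ below it and larger above it.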
For the case of p-polarized light of wavelength λ0 = 633 nm incident at θ = 50°
on a slab of index n = 2, plots of r and t versus the slab thickness d are shown in
Figure 16.11. Although the magnitude of t_p − r_p can still be shown to be unity, its
phase does not approach 2π(d/λ0)cos θ as d → 0. This is a manifestation of the
breakdown of symmetry between the forward and backward radiations. If the
magnitudes of the beams radiated in the two directions are taken into account,
however, the preceding arguments can be restored. One may readily observe from
[Figure 16.11 plots, horizontal axis: thickness d (nm), 0 to 175. (a) Amplitude: |t_p|, |r_p|. (b) Phase (degrees): φ(r_p), φ(t_p). (c) Normalized (t_p − W r_p): |t_p − W r_p| and φ(t_p − W r_p)/(2πd cos θ/λ0).]
Figure 16.11 Computed plots of r and t for a slab of thickness d and index
n = 2, when a p-polarized plane wave with λ0 = 633 nm illuminates the slab at
θ = 50°. The horizontal axis covers one cycle of variations in r and t,
corresponding to a half-wave thickness of the slab at this particular angle of incidence.
Figure 16.10 that the ratio of the forward- and backward-propagating magnitudes
must be given by
$$W(\theta) = \cos(\theta - \theta'')/\cos(\theta + \theta''). \tag{16.11}$$
Therefore, for p-polarized light at oblique incidence, it is t − Wr that approaches
exp(i2πd cos θ/λ0) as d → 0. This is seen to be verified in Figure 16.11(c).
[Figure 16.12 plot, λ0 = 633 nm, d = 10 nm, n = 2.00. Curves: r_p/[t_p − exp(i2πd cos θ/λ0)] and 1/W; horizontal axis: angle of incidence (degrees), 0 to 90; vertical axis: −1.0 to 1.0.]
Figure 16.12 Computed ratio of the amplitudes of backward-propagating
radiation and forward-propagating radiation for a dielectric slab 10 nm thick and
with n = 2. A p-polarized plane wave with λ0 = 633 nm is assumed to be
obliquely incident on the slab at an angle θ.
As a further test of Eq. (16.11), we show in Figure 16.12 the computed plot
versus θ of r_p/[t_p − exp(i2πd cos θ/λ0)] for a slab with d = 10 nm and n = 2,
illuminated by a plane wave with λ0 = 633 nm. This curve overlaps the plot of the
function 1/W(θ) exactly. Taking into account the ratio W(θ) between the forward
and backward radiated beams, one can prove the Ewald–Oseen theorem as before.
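The comparison of Figure 16.12 can be sketched numerically as follows. This is an illustration under stated assumptions, not the author's code: it uses the Airy formulas for a lossless slab and a sign convention for the p-polarized Fresnel coefficient in which r_p equals r_s at normal incidence (with the opposite convention the ratio changes sign). The function names are hypothetical.

```python
import numpy as np

def slab_rt_p(n, d, lam0, theta_deg):
    """r_p and t_p of a free-standing dielectric slab (Airy summation)."""
    theta = np.radians(theta_deg)
    theta_in = np.arcsin(np.sin(theta) / n)              # Snell's law
    a, b = np.cos(theta_in), n * np.cos(theta)
    r12 = (a - b) / (a + b)    # ASSUMED sign convention: r_p = r_s at normal incidence
    phase = np.exp(1j * 2 * np.pi * n * d * np.cos(theta_in) / lam0)
    denom = 1 - r12**2 * phase**2
    return r12 * (1 - phase**2) / denom, (1 - r12**2) * phase / denom

def W(theta_deg, n):
    """Forward-to-backward ratio of Eq. (16.11), with theta'' taken from Eq. (16.10)."""
    theta = np.radians(theta_deg)
    theta_dip = np.arctan(np.tan(theta) / n**2)
    return np.cos(theta - theta_dip) / np.cos(theta + theta_dip)

def ratio(n, d, lam0, theta_deg):
    """r_p / [t_p - exp(i 2 pi d cos(theta) / lam0)]."""
    r, t = slab_rt_p(n, d, lam0, theta_deg)
    free = np.exp(1j * 2 * np.pi * d * np.cos(np.radians(theta_deg)) / lam0)
    return r / (t - free)

lam0, n = 633.0, 2.0
for theta_deg in (0.0, 30.0, 50.0, 75.0, 85.0):
    thin = ratio(n, 0.1, lam0, theta_deg)      # nearly the d -> 0 limit
    fig = ratio(n, 10.0, lam0, theta_deg)      # thickness used in Figure 16.12
    print(f"theta = {theta_deg:5.1f} deg  1/W = {1 / W(theta_deg, n):+.4f}  "
          f"ratio(d = 0.1 nm) = {thin:+.4f}  ratio(d = 10 nm) = {fig:+.4f}")
```

In this sketch the ratio converges to 1/W(θ) as d → 0. At d = 10 nm a small imaginary part may remain at oblique incidence, so the agreement at that thickness is best read as a comparison of real parts.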
Appendix
This chapter, when originally published in Optics & Photonics News,
prompted the following criticism and reply.
“Editor:
While we are pleased that Masud Mansuripur has called attention in OPN to the
rather basic Ewald–Oseen extinction theorem, we wish to take issue with certain
parts of his article.1
“Mansuripur states that the goal of his article is ‘to outline a general proof of
the Ewald–Oseen theorem using arguments that are based primarily on thin-
film optics.’ We wish to note first that the proof he outlines, based on the
field produced by a uniform sheet of dipole oscillators and the assumed form
exp[2πinz/λ0] for the field inside the medium, is essentially the same approach
used by Fearn, James, and Milonni.2 Their proof is more general in that Fresnel
coefficients (for normal incidence) are derived rather than assumed. Indeed, the
derivation of the Fresnel coefficients assumes the extinction of the incident field
inside the dielectric medium: Mansuripur’s starting point implicitly assumes the
very theorem he is trying to prove! In this connection we note that it was not
claimed by Fearn et al. that they provided a ‘general proof ’ of the extinction
theorem. A general proof, valid for media bounded by surfaces of arbitrary shape,
is given by Born and Wolf.3
“Mansuripur cites References 2 and 3 in support of his opinion that the proof
of the extinction theorem is ‘devoid of physical insight’. While it is true that the
proofs given in these references involve ‘complicated integration over dipolar
fields in three-dimensional space,’ we do not think it is fair to say it [the proof] is
devoid of physical insight. In Reference 3, page 101, the significance of the
theorem is described in the following manner that could hardly be more phys-
ical: ‘The incident wave may . . . be regarded as extinguished at any point
within the medium by interference with the dipole field and replaced by another
wave with a different velocity (and generally also a different direction) of
propagation.’
“Finally we note that various features of the extinction theorem have been
interpreted differently by various authors: some of these differences have been
discussed by Fearn et al.2 It would be unfortunate if readers of Mansuripur’s
article were left with the impression that the theorem can somehow be based
‘primarily on thin-film optics.’
1 M. Mansuripur, The Ewald–Oseen extinction theorem, Opt. & Phot. News 9 (8),
50–55 (1998).
2 H. Fearn et al., Microscopic approach to reflection, transmission, and the Ewald–
Oseen extinction theorem, Am. J. Phys. 64, 986–995 (1996).
3 M. Born and E. Wolf, Principles of Optics, sixth edition, Cambridge University Press,
Cambridge UK 1985, section 2.4.2.
Daniel James and Peter W. Milonni, Los Alamos National Laboratory,

Los Alamos NM
Heidi Fearn, California State University at Fullerton, Fullerton CA
Emil Wolf, University of Rochester, Rochester NY”
The author replied:

“It is puzzling that Fearn et al. consider starting from Fresnel’s reflection coef-
ficients a shortcoming of my method of proof. The Fresnel coefficients can be
derived directly from Maxwell’s equations without invoking the extinction

theorem, they are available in many textbooks (including Born and Wolf, sixth
edition, pp. 38–41), and their derivation from first principles does not in any way
add to the value of a paper. I used Fresnel’s coefficients to derive the radiation
field for a sheet of dipoles (Equation 9 of my article), as this is a simple, accurate,
and intuitive way of calculating the field, and also because its underlying
principle is familiar to many practitioners of optics. Alternatively, one could
derive the radiation field by integrating over individual dipoles within the sheet,
as is done, for example, in The Feynman Lectures on Physics (my reference 3).
After this step that establishes the radiation field from a dipolar sheet, the method
of proof that I proposed (based on demonstrating self-consistency) is similar to
that of Fearn et al.
“Although Fresnel’s coefficients are derived from Maxwell’s equations,
nowhere in the standard derivation is it assumed that the incident beam is still
present within the medium (albeit masked by the dipole radiations). Had the
Ewald–Oseen theorem been somehow implicit in the standard derivation of
Fresnel’s coefficients, there would have been no need for the paper of Fearn et al.
in the first place.
“I strongly disagree with the suggestion that the use of Fresnel’s coefficients
somehow renders my proof of the Ewald–Oseen theorem circular. I also dispute
the assertion made by Fearn et al. that ‘it would be unfortunate if readers . . .
were left with the impression that the theorem can somehow be based primarily
on thin film optics.’ Emphatically, the proof of the theorem can be based on thin
film optics (this is exactly what I showed in the article), and it is far from
‘unfortunate’ indeed when a valid proof happens to be based on a simple physical
picture.
“I erred in stating that I was going to ‘outline a general proof of the . . .
theorem.’ Mine was a general proof for the one-dimensional case, where the
beam enters from free space through a plane boundary into an isotropic, homo-
geneous medium. My proof is more general than the proof of Fearn et al., in that
it covers both transparent and absorbing media, and also in that it considers the
case of oblique incidence with p and s polarized light. The method described in
Born and Wolf is obviously more general than both, because it applies to arbitrary
boundaries. None of the above methods, however, is sufficiently general to
embrace inhomogeneous, anisotropic, and optically active media, for which the
theorem is presumably valid as well.
“Finally, my expressed opinion regarding the proof of the extinction theorem
being ‘devoid of physical insight’ was meant as a commentary on the nature of
the method, not as a reflection on the authors of the cited references. Ultimately,
of course, such judgments are subjective and are best left to the readers.”

1 P. P. Ewald, On the foundations of crystal optics, Air Force Cambridge Research
Laboratories Report AFCRL-70-0580, Cambridge MA (1970). This is a translation
by L. M. Hollingsworth of Ewald’s 1912 dissertation at the University of Munich.
2 C. W. Oseen, Über die Wechselwirkung Zwischen Zwei elektrischen Dipolen der
Polarisationsebene in Kristallen und Flüssigkeiten, Ann. Phys. 48, 1–56 (1915).
3 R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics,
chapters 30 and 31, Addison-Wesley, Reading, Massachusetts, 1963.
4 V. Weisskopf, How light interacts with matter, in Lasers and Light, Readings from
Scientific American, W. H. Freeman, San Francisco, 1969.
1980.
6 H. Fearn, D. F. V. James, and P. W. Milonni, Microscopic approach to reflection,
transmission, and the Ewald–Oseen extinction theorem, Am. J. Phys. 64, 986–995
(1996).
1986.
17
Reciprocity in classical linear optics
An informal survey of some colleagues and students revealed that the notion of
reciprocity in optics is not widely appreciated. One colleague even justified the
prevailing ignorance by drawing a parallel between reciprocity in optics and
complementarity in quantum mechanics: “Both are true statements which have
little, if any, practical value in their respective domains.” This chapter is an
attempt at explaining the concept of reciprocity, clarifying some associated
misconceptions, and pointing out its practical applications.
Non-reciprocity of Faraday rotators

No one disputes that a Faraday rotator is a non-reciprocal element. The usual
argument goes as follows. Let a linearly polarized beam of light be fully trans-
mitted through a polarizing beam-splitter (PBS) before being directed through a
45° Faraday rotator, as shown in Figure 17.1. If the beam is reflected back (by an
ordinary mirror, for example), it retraces its path through the rotator and emerges
with its polarization vector rotated by a full 90°. At the PBS, therefore, the
returning beam will be deflected away from its original path. (This, in fact, is a
well-known method of isolating laser diodes from spurious reflections within a
given system.) Since the reflected light does not return on its original path, and
since the PBS is believed to be reciprocal, the argument is taken as proof of the
non-reciprocity of the Faraday rotator.
Although it is true that Faraday rotators are non-reciprocal, there is a flaw in
the above argument, which will become clear upon inspection of the system of
Figure 17.2. In this system, which is similar to that of Figure 17.1, the Faraday
rotator is replaced by a quarter-wave plate (QWP). The fast and slow axes of the
plate are oriented at 45° to the direction of incident polarization, so that the light
emerging from the plate in the forward path is circularly polarized. (The system
of Figure 17.2 is used in some optical disk drives with the optical disk acting as a
Figure 17.1 A Faraday rotator as used in an optical isolator. The incident
p-polarized beam, having undergone two consecutive 45° rotations in its forward
and backward paths through the rotator, becomes s-polarized, enabling the PBS
to divert it away from its original direction.
Figure 17.2 The quarter-wave plate as used in this system helps to separate the
reflected beam from the incident beam. The key contribution is made by the
(conventional) mirror, which converts the incident RCP beam into LCP upon
reflection.
mirror, the purpose being to separate the reflected beam from the incident beam
efficiently, as well as to isolate the laser diode.) Although the system of Figure
17.2 behaves very much like that of Figure 17.1, no one claims that a QWP is
non-reciprocal. This seeming paradox can be resolved after a careful examination
of the concept of reciprocity, to which we now turn.
Is a polarizer reciprocal?
Consider the simple linear polarizer shown in Figure 17.3. A collimated beam of
light entering from the left-hand side emerges from the polarizer linearly polarized
along the transmission axis. The polarization state of the incident beam may be
decomposed into two linear components, one parallel and the other perpendicular to
Figure 17.3 An ideal polarizer has a well-defined transmission axis (shown by
the vertical double-ended arrow). The component of the incident beam that is
polarized along the transmission axis goes through, while the component
perpendicular to this axis is fully absorbed within the polarizer.
Figure 17.4 A simple plano-convex lens behaves differently depending on
whether the light is incident from its plane side or from its convex side. The
particular plano-convex lens used in the simulations had refractive index n = 1.5
at λ0 = 633 nm, thickness = 5 mm, radius of curvature = 10 mm, and clear-
aperture diameter = 10 mm. The 5 mm diameter incident beam was collimated
and uniform. When the beam enters at the convex facet, the best focus (i.e., the
circle of least confusion) appears at a distance of 16.49 mm in front of the plane
facet. With the lens flipped and the beam entering the plane facet, the best focus
occurs at 19.29 mm in front of the lens.
the transmission axis. Assuming an ideal polarizer, the entire parallel component is
transmitted while the entire perpendicular component is absorbed within the
polarizer. If the direction of propagation of the transmitted beam is reversed, it will
pass through the polarizer without any change. Since the original state of polarization
of the incident beam is not recovered, the polarizer is a non-reciprocal element.
One might argue that in one sense the polarizer is reciprocal because, irre-
spective of whether the incident beam illuminates it from the left or from the right
side, it behaves the same way. However, this turns out to be a poor way to define
reciprocity, because it cannot be generalized to cover other optical elements. For
example, consider the simple plano-convex lens shown in Figure 17.4. As will be
shown below, lenses in general are reciprocal elements. However, a collimated
beam of light shining on the convex surface of this lens comes to focus with
less spherical aberration than a beam shining on its flat surface (see Figures 17.5
and 17.6). Therefore, if reciprocity required the identity of behavior from both
sides of an element, one would end up with the undesirable result that a plano-
convex lens, for example, is non-reciprocal. To avoid this outcome we return to
our earlier definition that the beam transmitted through a reciprocal element, when
“properly” reversed, must recreate the incident beam in the reverse direction. It is
in this sense that the polarizer of Figure 17.3 is non-reciprocal.
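The different behavior of the plano-convex lens in its two orientations can be illustrated with an exact meridional ray trace. The sketch below is a geometric-optics illustration only (the simulations behind Figures 17.5 and 17.6 are diffraction calculations); it uses the lens parameters of Figure 17.4 and assumes that focal distances are measured from the exit surface. The function names are hypothetical.

```python
import numpy as np

n, R, t = 1.5, 10.0, 5.0      # index, radius of curvature (mm), center thickness (mm)

def focus_convex_first(h):
    """Axial crossing point, measured from the plane facet, of a ray that
    enters the convex facet at height h parallel to the axis."""
    alpha = np.arcsin(h / R)                   # angle of incidence on the sphere
    alpha_in = np.arcsin(np.sin(alpha) / n)    # Snell's law at the convex facet
    slope = alpha - alpha_in                   # inclination of the ray inside the glass
    sag = R - np.sqrt(R**2 - h**2)             # axial offset of the entry point from the vertex
    y = h - (t - sag) * np.tan(slope)          # ray height on reaching the plane facet
    slope_out = np.arcsin(n * np.sin(slope))   # Snell's law at the plane facet
    return y / np.tan(slope_out)

def focus_plane_first(h):
    """Axial crossing point, measured from the vertex of the convex facet, of a
    ray that enters the plane facet at height h parallel to the axis."""
    alpha = np.arcsin(h / R)                   # angle of incidence inside the glass
    alpha_out = np.arcsin(n * np.sin(alpha))   # Snell's law at the convex facet
    sag = R - np.sqrt(R**2 - h**2)
    return h / np.tan(alpha_out - alpha) - sag

h_marginal, h_paraxial = 2.5, 1e-4             # edge of the 5 mm beam; near-axis ray
for name, focus in (("convex side first", focus_convex_first),
                    ("plane side first ", focus_plane_first)):
    paraxial, marginal = focus(h_paraxial), focus(h_marginal)
    print(f"{name}: paraxial focus {paraxial:6.2f} mm, marginal focus {marginal:6.2f} mm, "
          f"longitudinal spherical aberration {paraxial - marginal:5.2f} mm")
```

In this sketch the spread between the marginal and paraxial foci is roughly four times smaller when the beam enters the convex side, and the best-focus distances quoted in Figure 17.4 fall between the marginal and paraxial foci in each orientation.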
Are lenses reciprocal in the above sense?

Consider an aberration-free lens that brings a collimated beam of light to focus as
in Figure 17.7(a). A flat mirror placed in the focal plane of the lens reflects the
[Figure 17.5 panels: (a), (b) horizontal axis x from −2.75 mm to 2.75 mm; (c) x from −12 μm to 12 μm; (d) x from −24 μm to 24 μm.]
Figure 17.5 Plots of intensity and phase corresponding to the plano-convex lens
of Figure 17.4, illuminated by a collimated and uniform beam from the convex
side. (a) Intensity distribution immediately after the beam leaves the plane facet of
the lens. (b) Residual phase distribution immediately after the beam leaves the
plane facet. The curvature of the beam has been removed from the phase
distribution, leaving only the residual spherical aberration balanced by a small amount
of defocus. The r.m.s. value of these residual aberrations over the entire aperture
is 0.17λ0. (c) Intensity distribution in the plane of best focus, i.e., at the circle of
least confusion. (d) Same as (c) but on a logarithmic scale and over a larger area.
[Figure 17.6 panels: (a), (b) horizontal axis x from −2.75 mm to 2.75 mm; (c) x from −12 μm to 12 μm; (d) x from −24 μm to 24 μm.]
Figure 17.6 Same as Figure 17.5 for the case when the beam enters from the
plane side of the lens. (a) Emergent intensity distribution at a plane tangent to the
convex facet of the lens at its vertex. Note the larger diameter of the emergent
beam compared with Figure 17.5(a). (b) Residual phase of the emergent beam
within the tangent plane to the convex surface. The r.m.s. wavefront aberration
over the entire aperture is 0.68λ0. (c) Distribution of intensity in the plane of best
focus. (d) Same as (c) but on a logarithmic scale and over a larger area.
beam back towards the lens. Upon re-emerging from the lens the beam, now
collimated once again, propagates in the reverse direction of the original incident
beam. Is this sufficient proof that the lens is reciprocal? The answer is no, for the
following reasons. What if the lens has aberrations? What if the incident beam is
only illuminating one half of the lens’s aperture, as in Figure 17.7(b)? What if the
mirror is displaced from the focal plane of the lens, as in Figure 17.7(c)? In all
these examples (and many more that can be conceived) the returning beam does
not retrace the path of the incident beam. Does this mean that the lens is non-
reciprocal? Again the answer is no. The culprit in all these examples is the mirror,
which does not “properly” reverse the path of the beam.
What we need in place of the conventional mirror is a phase-conjugate mirror1
(PCM) to reverse the wavefront properly. Suppose a PCM is placed perpendicular
to the Z-axis at z = z0. If the complex-amplitude distribution incident on the PCM is
denoted by A(x, y, z0), then the reflected wavefront at the plane of the mirror will be
Figure 17.7 (a) A collimated and uniform beam, focused by an aberration-free
lens and reflected by a plane mirror placed at the focal plane of the lens, retraces
its path through the lens. (b) A collimated beam entering the upper half of the lens
aperture and reflected from the mirror surface does not return on itself, but
emerges from the lower half of the lens aperture. (c) When the mirror is displaced
from the focal plane, the returning beam no longer retraces the incidence path.
A*(x, y, z0), which propagates along the negative Z-axis and completely retraces the
incidence path. Replacing the ordinary mirror with a PCM in Figure 17.7 ensures
that the beam is properly reversed in each case, and proves beyond any doubt that
lenses are reciprocal.
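The role of the PCM can be illustrated with a one-dimensional scalar model. The sketch below is an assumption-laden illustration, not the author's simulation: the lens is a thin phase mask that includes a hypothetical aberration term, propagation uses the angular-spectrum method on a periodic grid sampled at half a wavelength (so that every sampled plane wave is propagating), and the beam fills only the upper half of the aperture, as in Figure 17.7(b). All names and parameter values are illustrative.

```python
import numpy as np

lam = 1.0                        # wavelength; all lengths are in units of lam
N, dx = 4096, 0.5                # dx = lam/2 keeps every sampled plane wave propagating
x = (np.arange(N) - N // 2) * dx
fx = np.fft.fftfreq(N, d=dx)
kz = 2 * np.pi * np.sqrt(np.maximum(0.0, 1 / lam**2 - fx**2))

def propagate(field, z):
    """Angular-spectrum propagation over a distance z."""
    return np.fft.ifft(np.fft.fft(field) * np.exp(1j * kz * z))

f = 2000.0                                               # focal length
lens = np.exp(-1j * np.pi * x**2 / (lam * f))            # thin-lens phase
lens = lens * np.exp(1j * np.pi * (x / 200.0)**4)        # hypothetical aberration term

beam = ((x > 0) & (x < 200)).astype(complex)             # collimated beam, upper half only

def round_trip(mirror_distance, conjugate):
    """Lens -> mirror -> lens; the mirror either conjugates (PCM) or does not."""
    at_mirror = propagate(lens * beam, mirror_distance)
    reflected = np.conj(at_mirror) if conjugate else at_mirror
    return lens * propagate(reflected, mirror_distance)

def mismatch(returned):
    """Relative deviation from the reversed incident beam, conj(beam)."""
    return np.linalg.norm(returned - np.conj(beam)) / np.linalg.norm(beam)

print("ordinary mirror at focus :", mismatch(round_trip(f, conjugate=False)))
print("PCM at focus             :", mismatch(round_trip(f, conjugate=True)))
print("PCM displaced from focus :", mismatch(round_trip(1.2 * f, conjugate=True)))
```

With the ordinary mirror the returning beam bears little resemblance to the reversed incident beam, whereas with the PCM the mismatch is at the level of rounding error, regardless of the aberration or the mirror position.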
The quarter-wave plate

Returning now to the system of Figure 17.2, we re-examine the question of
reciprocity of the QWP. Once again the mirror is recognized as the culprit: upon
reflection from an ordinary mirror, a right circularly polarized (RCP) beam
becomes left circularly polarized (LCP) and vice versa. The result is that the
QWP in Figure 17.2 rotates the polarization of the beam by 90° in double pass,
forcing it to change its propagation direction at the PBS. If the mirror is replaced
by a PCM, the sense of circular polarization does not change upon reflection, and
the beam emerges from the QWP with the same linear polarization as it had when
it first entered the plate. The returning beam thus retraces its path, proving the
reciprocity of the QWP.
The question arises as to what happens in the system of Figure 17.1 if the
mirror is replaced by a PCM? Since the beam incident on the mirror is linearly
polarized, it remains linear whether it is reflected from an ordinary mirror or from
a PCM. Therefore, the path of the reflected light in Figure 17.1 does not change as
a result of changing the mirror, confirming our earlier conclusion that the Faraday
rotator is non-reciprocal.
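These double-pass arguments can be sketched with Jones matrices. The conventions below are assumptions made for illustration: all vectors are written in a fixed laboratory (x, y) frame for both directions of travel, an ordinary mirror at normal incidence is taken as the identity (an overall sign is ignored), a PCM returns the complex conjugate of the incident Jones vector, a wave plate acts in the return direction through the transpose of its forward matrix, and a Faraday rotator keeps the same laboratory-frame sense of rotation in both directions.

```python
import numpy as np

def rotation(angle_deg):
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Forward Jones matrices in the fixed laboratory frame (assumed convention).
qwp = rotation(45) @ np.diag([1, 1j]) @ rotation(-45)    # quarter-wave plate, axes at 45 deg
faraday = rotation(45)                                    # 45-degree Faraday rotator

def double_pass(forward, backward, beam, conjugate):
    """Element -> mirror -> element; 'conjugate' selects a PCM instead of a mirror."""
    at_mirror = forward @ beam
    reflected = np.conj(at_mirror) if conjugate else at_mirror
    return backward @ reflected

p_in = np.array([1.0, 0.0])      # p-polarized beam transmitted by the PBS

cases = {
    "QWP + ordinary mirror    ": (qwp, qwp.T, False),
    "QWP + PCM                ": (qwp, qwp.T, True),
    "Faraday + ordinary mirror": (faraday, faraday, False),
    "Faraday + PCM            ": (faraday, faraday, True),
}
for name, (fwd, bwd, conj) in cases.items():
    out = double_pass(fwd, bwd, p_in, conj)
    print(name, "-> |E_p|, |E_s| =", np.round(np.abs(out), 6))
```

Under these conventions the quarter-wave plate returns the beam to its original p-polarization only when the mirror is a PCM, while the Faraday rotator returns an s-polarized beam with either mirror.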
Reciprocity of conventional mirrors

In general, lossy elements are non-reciprocal by the above definition of reci-
procity. The beam going through (or reflecting from) a lossy device becomes
attenuated. Reversing the beam by a PCM reverses the propagation direction, but
does not recover the losses incurred. A second path through the lossy element
attenuates the beam even further. Thus the returning beam is twice attenuated,
which means that it differs from the original incident beam, if not in its direction
of propagation or phase or polarization state, at least in its amplitude. With the
strict definition of reciprocity, which requires the beam to be fully recovered in
the reverse path, attenuation is sufficient grounds for declaring lossy elements
non-reciprocal.
A conventional mirror, such as a polished metallic surface, is lossy and
therefore non-reciprocal. But consider a total internal reflection (TIR) device
such as that shown in Figure 17.8. Here there are no losses and the only effect of
the mirror on the incident beam is a change in its state of polarization.
Suppose that the p and s components of the incident beam have complex
amplitudes a_p exp(iφ_p) and a_s exp(iφ_s), respectively. Upon reflection from the TIR mirror
these components retain their amplitudes but acquire different phases; the first one
becomes a_p exp[i(φ_p + ψ_p)], say, and the second becomes a_s exp[i(φ_s + ψ_s)]. If the
direction of propagation of the beam is reversed by means of a conventional mirror,
the phase angles ψ_p and ψ_s do not disappear from the returning beam; rather, they
become twice as large. However, by now we have learned that a conventional
mirror is not the proper device for reversing the beam. Instead, one must use a
PCM to phase-conjugate the beam and launch it on its way back. When placed in
the system of Figure 17.8, the PCM will return the two components of polarization
Figure 17.8 A collimated, uniform, and polarized beam of light is reflected
from the rear facet of a TIR prism. If the beam is returned via a conventional
mirror, the emergent beam will not, in general, have the same state of
polarization as the incident beam. Use of a PCM, however, ensures not only
that the beam retraces its path but also that it will have the same state of
polarization at any point along the path.
as a_p exp[−i(φ_p + ψ_p)] and a_s exp[−i(φ_s + ψ_s)]. The second reflection from the
TIR mirror eliminates the acquired phases ψ_p and ψ_s and returns the conjugate of
the original incident beam, which is exactly what is needed. A TIR mirror,
therefore, is a reciprocal element.
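The bookkeeping of phases in the TIR argument can be verified with a short sketch. The phase shifts and the incident beam below are hypothetical values chosen only for illustration; the conventional mirror is idealized as adding no phase of its own.

```python
import numpy as np

psi_p, psi_s = np.radians(37.0), np.radians(-12.0)    # hypothetical TIR phase shifts
tir = np.diag([np.exp(1j * psi_p), np.exp(1j * psi_s)])

a_p, phi_p = 0.8, np.radians(20.0)                    # hypothetical incident beam
a_s, phi_s = 0.6, np.radians(75.0)
beam = np.array([a_p * np.exp(1j * phi_p), a_s * np.exp(1j * phi_s)])

via_mirror = tir @ (tir @ beam)         # conventional mirror: the TIR phases add up
via_pcm = tir @ np.conj(tir @ beam)     # PCM: conjugation, then the second TIR reflection

print("acquired p, s phases with mirror (deg):",
      np.round(np.degrees(np.angle(via_mirror / beam)), 3))
print("PCM return equals conj(beam):", np.allclose(via_pcm, np.conj(beam)))
```

With the conventional mirror the returning components carry 2ψ_p and 2ψ_s, while with the PCM the acquired phases cancel and the conjugate of the incident beam is returned.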
A regular beam-splitter
There are many different ways of constructing a beam-splitter. For simplicity’s sake,
let us consider the specific beam-splitter shown in Figure 17.9. This flat piece of glass
of thickness d and refractive index n has no coating layers and is used at a 45° angle
of incidence. If the reflected and transmitted beams are returned by conventional
mirrors, as shown in the figure, then, in general, a certain fraction of the light returns
along the incidence path and the remainder leaves the beam-splitter along a fourth
direction. However, if the conventional mirrors in Figure 17.9 are replaced by PCMs,
the entire beam will retrace its original path.
To see this we must first examine certain properties of the glass slab that forms
the beam-splitter. Figure 17.10 shows computed plots of the reflection and trans-
mission coefficients versus the thickness d of the slab. The assumed refractive
index is n = 2, the angle of incidence is fixed at θ = 45°, and the incident beam is
a coherent and monochromatic beam from a red HeNe laser (λ0 = 633 nm).
Only the range of thicknesses corresponding to one half-wavelength is shown in
Figure 17.9 A parallel plate made of a glass slab of thickness d and refractive
index n used as a beam-splitter. The collimated and uniform incident beam is
partially reflected and partially transmitted at the slab. If conventional mirrors are
used to return the reflected and transmitted beams back to the beam-splitter, in
general a fraction of the beam will go back towards the source but the remainder will
leave the beam-splitter in a fourth direction. However, if the mirrors are replaced by
phase-conjugate mirrors, the entire beam will return along the incidence path.
Figure 17.10, since the reflection and transmission coefficients are periodic with
this period. The half-wave thickness of the slab is d = λ0/(2n cos θ′) = 169.2 nm.
Here θ′ = 20.7°, obtained from Snell’s law, is the angle between the propagation
direction within the slab and the slab’s surface normal. The reflection and trans-
mission coefficients for both p- and s-polarized light are shown in the figure.
Note in Figure 17.10 that, at any given thickness, |r|² + |t|² = 1 and φ_r − φ_t = 90°.
In fact, it may be shown that these two properties of the slab are quite general and
hold not only for all thicknesses but also for all values of the refractive index n,
angle of incidence h, and wavelength k0. The first identity is a trivial statement of
the principle of conservation of energy. The second, relating the phase angles of the
reflected and transmitted beams, is more subtle, but its violation also results in non-
conservation of energy, as we shall see shortly.
When the transmitted beam returns to the slab via a PCM it will have an
amplitude t*. Upon transmission (in the reverse direction) its amplitude becomes
tt*; it will then combine with the reversed reflected beam whose amplitude at this
point is rr*. The total returning amplitude is therefore rr* + tt* = |r|² + |t|² = 1.
The remainder of the beam, leaving the beam-splitter in the fourth direction, will
have a total amplitude rt* + r*t = 2|rt| cos(φ_r − φ_t), which is exactly zero because
the phase difference between r and t is 90°. Thus the beams reversed by the two
PCMs combine at the beam-splitter to yield the reverse propagating beam along
the original path, leaving no other light to go in the fourth direction.
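The two identities used in this argument can be checked numerically for the slab of Figure 17.9. The sketch below assumes the Airy formulas for a lossless parallel plate, with the transmission coefficient referred to the rear facet; the sign convention chosen for the p-polarized Fresnel coefficient does not affect the result. The function name and the sample thicknesses are hypothetical.

```python
import numpy as np

def slab_rt(n, d, lam0, theta_deg, pol):
    """r and t of a lossless parallel plate for 's' or 'p' polarization."""
    theta = np.radians(theta_deg)
    theta_in = np.arcsin(np.sin(theta) / n)              # Snell's law
    if pol == "s":
        a, b = np.cos(theta), n * np.cos(theta_in)
    else:
        a, b = np.cos(theta_in), n * np.cos(theta)
    r12 = (a - b) / (a + b)                               # front-facet Fresnel coefficient
    phase = np.exp(1j * 2 * np.pi * n * d * np.cos(theta_in) / lam0)
    denom = 1 - r12**2 * phase**2
    return r12 * (1 - phase**2) / denom, (1 - r12**2) * phase / denom

lam0, n, theta_deg = 633.0, 2.0, 45.0
theta_in_deg = np.degrees(np.arcsin(np.sin(np.radians(theta_deg)) / n))
half_wave = lam0 / (2 * n * np.cos(np.radians(theta_in_deg)))
print(f"theta' = {theta_in_deg:.1f} deg, half-wave thickness = {half_wave:.1f} nm")

for pol in ("p", "s"):
    for d in (30.0, 85.0, 140.0):
        r, t = slab_rt(n, d, lam0, theta_deg, pol)
        returning = r * np.conj(r) + t * np.conj(t)       # back along the incidence path
        fourth = r * np.conj(t) + np.conj(r) * t          # into the fourth direction
        dphi = np.degrees(np.angle(r / t))
        print(f"{pol}-pol, d = {d:5.1f} nm: phase(r) - phase(t) = {dphi:7.2f} deg, "
              f"returning = {returning.real:.6f}, fourth = {fourth.real:+.2e}")
```

With PCMs in both arms, the returning amplitude is unity and the amplitude in the fourth direction vanishes at every thickness and for both polarizations in this model.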
[Figure 17.10 plots, horizontal axis: thickness (nm), 0 to 175. (a) Amplitude: |t_p|, |r_p|. (b) Phase (degrees): φ_rp, φ_tp. (c) Amplitude: |t_s|, |r_s|. (d) Phase (degrees): φ_rs, φ_ts.]
Figure 17.10 Computed plots of reflection and transmission coefficients
versus slab thickness for the parallel-plate beam-splitter of Figure 17.9. The
assumed refractive index of the glass material is n = 2, and the angle of
incidence is fixed at θ = 45°. The incident beam is a coherent and monochromatic
beam from a red HeNe laser (λ0 = 633 nm), and it is assumed to be linearly
polarized either along the p- or the s-direction. The phase angles are evaluated
at the front facet of the slab for the reflection coefficients and at the rear facet for
the transmission coefficients. The reference phase angle is that of the incident
beam at the front facet.
Although the above proof for reciprocity of the glass slab was given for plane
waves, one can show its validity in the general case of a finite-size incident beam as
well. To appreciate the effects of finite size, consider the plots of intensity distribution
in Figure 17.11, computed for a HeNe beam of diameter $2000\lambda_0$ upon
reflection from and transmission through a slab 500 µm thick of $n = 2$ glass. Near
the edges of the beam the various reflected (or transmitted) orders do not overlap
and, consequently, give rise to varying degrees of brightness in these regions.
Instead of considering the edges separately, however, the appropriate proof of
reciprocity for a finite-size beam involves the consideration of such beams as a
superposition of a large number of plane waves traveling in different directions
(i.e., angular spectrum decomposition). Since the reciprocity applies to each such
plane wave, it must, of necessity, also apply to their linear superposition.
[Figure 17.11: four intensity-distribution panels (a) to (d); horizontal axes $x/\lambda_0$ from −1500 to 1500.]
Figure 17.11 Plots of intensity distribution upon reflection or transmission of
a collimated uniform beam from the beam-splitter of Figure 17.9. (Due to the
limited range of the gray-scale, certain weak parts of the distributions are not
visible.) The incident beam diameter is $2000\lambda_0$, where $\lambda_0 = 633$ nm. The beam-splitter,
oriented at $45^\circ$ to the propagation direction of the incident beam, has
$n = 2$ and $d = 500$ µm. (a) Logarithmic plot of the reflected intensity distribution
for p-polarized incident beam. Since reflection from each surface is weak,
only the first- and second-order reflected beams are observed. (b) Transmitted
intensity distribution for p-polarized incident beam. (c) Logarithmic plot of the
reflected intensity distribution for s-polarized incident beam. Since reflection
from each surface is strong, the effect of the third-order reflection can also be
seen in this figure. (d) Transmitted intensity distribution for s-polarized inci-
dent beam.
Reciprocity and Maxwell’s equations

The principle of reciprocity in classical linear optics is rooted in the fact that
electromagnetic waves obey Maxwell’s equations and that these equations admit
reciprocal solutions. Consider a distribution of electromagnetic waves in a region
of space occupied by matter represented by the dielectric tensor $\varepsilon(x, y, z)$. Assume
that the fields oscillate harmonically at a given frequency $\omega$, and that the time-dependence
factor $\exp(-i\omega t)$ has been eliminated from Maxwell's equations.$^2$
Suppose now that the propagation direction is reversed everywhere, so that any
plane-wave component of the field that was propagating along a given k-vector is
now propagating along the negative direction of that same k-vector. If we replace
the E-fields by $E^*$ and the H-fields by $H^*$ everywhere, Maxwell's equations
remain satisfied so long as the dielectric tensor of the material environment obeys
the relation $\varepsilon = \varepsilon^*$ at all points of space. This latter relation holds, for example, if
the medium is isotropic and lossless (i.e., $\varepsilon$ is a real-valued scalar), or if the
medium is birefringent but non-absorptive (i.e., $\varepsilon$ is a real-valued symmetric
matrix), or if the medium has optical activity of the type observed in sugar
crystals. If, however, the medium is absorptive, or if it has magneto-optical
activity such as that exhibited by a Faraday rotator, then $\varepsilon \neq \varepsilon^*$, in which case the
reverse-propagating beam(s) violate Maxwell’s equations and, consequently,
reciprocity breaks down.
Multilayer dielectric stack

The power of the reciprocity principle may be demonstrated by the following
analysis of a multilayer dielectric stack. Adopting the approach pioneered by Sir
George Gabriel Stokes (1819–1903)3 we prove that any stack consisting of an
arbitrary number of dielectric (i.e., non-absorbing) layers exhibits symmetric
behavior between its front facet and rear facet reflectivity (or transmissivity). To
prove this statement consider returning both the reflected beam and the trans-
mitted beam back to the stack via two PCMs, as shown in Figure 17.12. Denoting
the front facet reflection and transmission coefficients by $r$ and $t$, and the corresponding
rear facet coefficients by $r'$ and $t'$, we must have, by reciprocity, the
following identities:
$$ rr^* + t't^* = 1, \tag{17.1} $$
$$ tr^* + r't^* = 0. \tag{17.2} $$
Equation (17.1), in conjunction with the principle of conservation of energy, yields
$t = t'$, proving that the complex transmission coefficient is the same from both
facets of the stack. From Eq. (17.2) one obtains $r' = -tr^*/t^*$, which proves that the
amplitude of the reflection coefficient is the same from the two facets, that is,
$|r| = |r'|$. As for the phase angles we have:
$$ \tfrac{1}{2}(\phi_r + \phi'_r) = \phi_t \pm 90^\circ. \tag{17.3} $$
Figure 17.12 Multilayer stack consisting of an arbitrary number of dielectric
layers. A unit-amplitude beam is partially reflected and partially transmitted at
the top facet of the stack. If the reflected and transmitted beams are returned to
the stack via phase-conjugate mirrors (PCMs), the principle of reciprocity
requires that the beam must retrace its path. Thus the total amplitude along the
reverse incidence direction must be unity and the total amplitude emerging from
the bottom facet of the stack must be zero.

These relations are readily verified for the specific quadrilayer stack whose
performance characteristics are depicted in Figure 17.13. Needless to say, the
symmetry of reflection and transmission from the two facets of a multilayer
stack applies quite generally unless one or more layers are absorptive or
magneto-optically active. In fact, the media of incidence and emergence on the
two sides of the stack do not have to be identical either. Using the method of
proof outlined above, one can readily show that the behavior of dielectric stacks
remains symmetrical even when the media above and below the stack have
arbitrary refractive indices $n_1$ and $n_2$, provided that proper account is made of
the difference in beam cross-section and the dependence of power on the
refractive index.
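The symmetry relations for a lossless stack can be checked with a short characteristic-matrix calculation. This is a minimal sketch under stated assumptions, not the program behind Figure 17.13: it treats s-polarized light only, assumes free space on both sides, uses the $\exp(-i\omega t)$ convention, and takes a 633 nm wavelength and a 30° angle of incidence, neither of which is given in the figure caption. The helper name `stack_rt` is hypothetical.

```python
import numpy as np

def stack_rt(layers, wavelength, theta_deg):
    """s-polarized Fresnel coefficients (r, t) of a stack surrounded by free space.

    layers: list of (thickness, refractive_index), top layer first, with the
    thickness in the same unit as the wavelength.
    """
    s = np.sin(np.radians(theta_deg))
    q0 = np.sqrt(1.0 - s**2 + 0j)                 # n*cos(theta) in free space
    M = np.eye(2, dtype=complex)
    for d, n in layers:
        q = np.sqrt(complex(n) ** 2 - s**2)       # n_j*cos(theta_j) inside layer j
        delta = 2 * np.pi * q * d / wavelength
        M = M @ np.array([[np.cos(delta), -1j * np.sin(delta) / q],
                          [-1j * q * np.sin(delta), np.cos(delta)]])
    denom = q0 * M[0, 0] + q0**2 * M[0, 1] + M[1, 0] + q0 * M[1, 1]
    r = (q0 * M[0, 0] + q0**2 * M[0, 1] - M[1, 0] - q0 * M[1, 1]) / denom
    t = 2 * q0 / denom
    return r, t

layers = [(140, 2.2), (200, 1.8), (80, 2.0), (100, 1.5)]   # nm; values of Figure 17.13
wavelength = 633.0                                         # nm (assumed)
theta = 30.0                                               # degrees (assumed)

r, t = stack_rt(layers, wavelength, theta)                 # incidence from the top
r_rev, t_rev = stack_rt(layers[::-1], wavelength, theta)   # incidence from the bottom

print("t  =", t)
print("t' =", t_rev)
print("|r|, |r'| =", abs(r), abs(r_rev))
print("r' + t r*/t* =", r_rev + t * np.conj(r) / np.conj(t))
```

The last printed quantity tests $r' = -tr^*/t^*$ directly, which is equivalent to the phase relation of Eq. (17.3).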
Another interesting property of multilayer stacks arises when one or more of
the layers happen to be absorptive. Since reciprocity no longer applies to this
case, it should come as no surprise that the reflectivities of the two sides of the
stack are, in general, different. What is surprising is that, even in the presence of
absorption, the transmissivity continues to be the same from both sides. This
property can be proven using standard methods of thin-film-stack calculation4
and has been verified numerically in several situations. A simple proof for the
symmetric behavior of the transmissivity under quite general conditions is given
in the following appendix.
[Figure 17.13: three panels versus angle of incidence (degrees, 0 to 90). (a) Amplitudes $|r_p|$, $|r_s|$, $|t_p|$, $|t_s|$; (b) phases $\phi_{tp} = \phi'_{tp}$, $\phi_{rp}$, $\phi'_{rp}$ in degrees; (c) phases $\phi_{ts} = \phi'_{ts}$, $\phi_{rs}$, $\phi'_{rs}$ in degrees.]
Figure 17.13 Computed plots of reflection and transmission coefficients
versus the angle of incidence for a quadrilayer dielectric stack surrounded by
free space. The layer thickness $d$ and refractive index $n$ for consecutive layers
starting at the top of the stack are as follows: 140 nm, 2.2; 200 nm, 1.8; 80 nm,
2.0; 100 nm, 1.5. The magnitudes of the various reflection and transmission
coefficients shown in (a) are the same whether the beam is incident from
the top side or from the bottom side of the stack. The phase angles of the
transmission coefficients, $\phi_{tp}$ and $\phi_{ts}$, are also the same for top and bottom
incidence. The phase angles of the reflection coefficients, however, depend on
the side of the stack at which the beam is directed. In (b) and (c) $\phi_{rp}$ and $\phi_{rs}$ are
the phase angles for p- and s-reflectivities when the beam is incident from the
top of the stack. The corresponding primed quantities refer to incidence from
the bottom.
Appendix
We prove that the Fresnel transmission coefficient t for a multilayer stack con-
sisting of metal and dielectric layers does not depend on whether the light is
incident from the top or the bottom of the stack. For stacks consisting solely of
dielectric layers this property has been proved in the present chapter, using
reciprocity. Reciprocity, however, breaks down in the presence of absorptive
layers, and one needs to resort to an alternative method of proof, such as that
outlined below.
A general stack consists of an arbitrary number of layers, each having thickness
$d_j$ and complex refractive index $(n + ik)_j$, the subscript $j$ referring to the layer
number. For an incident plane wave of wavelength $\lambda$, arriving at the top of the
stack at angle $\theta$, the Fresnel reflection and transmission coefficients of the stack
are denoted by $r$ and $t$, respectively. Similarly, when the beam is incident from
the bottom side on the stack (again at angle $\theta$), the Fresnel coefficients are
denoted $r'$ and $t'$. Our goal is to demonstrate the equality of $t$ and $t'$, even though,
in general, $r$ and $r'$ may differ from each other.
Consider the hypothetical situation shown in Figure A17.1, where the stack is
split along an interfacial plane into two smaller stacks separated by an air gap $d$.
The upper stack, identified as stack 1, has reflection and transmission coefficients
from top and bottom denoted by $r_1, t_1, r_1', t_1'$. Similarly, the corresponding parameters
of the lower stack, stack 2, are $r_2, t_2, r_2', t_2'$. The transmissivity $t$ of the
entire stack (in the presence of the air gap) can be obtained by adding an infinite
number of terms corresponding to the beams bouncing back and forth in the gap,
namely,
$$ t = t_1 t_2 e^{i\phi} + t_1 r_2 r_1' t_2 e^{i3\phi} + t_1 r_2^2 r_1'^2 t_2 e^{i5\phi} + \cdots = \frac{t_1 t_2 e^{i\phi}}{1 - r_1' r_2 e^{i2\phi}}. \tag{A17.1} $$
Here $\phi = 2\pi d\cos\theta/\lambda$ is the phase delay due to one passage of the beam through
the gap. In the limit of a vanishing gap (i.e., $d \to 0$) we find a simple expression
for $t$ in terms of the parameters of stacks 1 and 2:
$$ t = t_1 t_2/(1 - r_1' r_2). \tag{A17.2} $$
In similar fashion, the reverse-direction transmissivity $t'$ of the stack (bottom
illumination) is found to be
$$ t' = t_1' t_2'/(1 - r_1' r_2). \tag{A17.3} $$
Figure A17.1 A multilayer stack consisting of metal and dielectric layers is
split into two sub-stacks along an arbitrary interfacial plane. The upper stack has
Fresnel reflection and transmission coefficients $r_1, t_1$ when the beam of light is
incident from the top. The corresponding coefficients when the light is incident
from the bottom are $r_1', t_1'$. Similarly, the lower stack has reflection and transmission
coefficients $r_2, t_2, r_2', t_2'$. The width of the air gap separating the two sub-stacks
is $d$. The overall transmission coefficient $t$ of the entire stack can be
obtained by adding the contributions of the infinite number of beams that bounce
back and forth in the air-gap region.

The argument for the equality of $t$ and $t'$ flows readily from Eqs. (A17.2)
and (A17.3), using proof by induction as follows. It is clear that if the individual
sub-stacks are such that $t_1 = t_1'$ and $t_2 = t_2'$, then $t = t'$ is guaranteed. For each sub-stack
the reduction to a pair of smaller stacks can be repeated until each sub-stack is a
single layer, in which case $t_1 = t_1'$ and $t_2 = t_2'$ obviously hold. The proof is thus
complete.
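The composition formulas (A17.2) and (A17.3), and the resulting equality $t = t'$ in the presence of absorption, can be tested numerically. The sketch below is illustrative only: it uses a characteristic-matrix calculation for s-polarized light in free space with the $\exp(-i\omega t)$ convention, and the three-layer stack, including the complex index 0.2 + 3.4i of the absorbing layer, is a hypothetical example rather than data from the text.

```python
import numpy as np

def stack_rt(layers, wavelength, theta_deg):
    """s-polarized (r, t) of a stack in free space; layers = [(thickness, index), ...]."""
    s = np.sin(np.radians(theta_deg))
    q0 = np.sqrt(1.0 - s**2 + 0j)
    M = np.eye(2, dtype=complex)
    for d, n in layers:
        q = np.sqrt(complex(n) ** 2 - s**2)       # complex for absorbing layers
        delta = 2 * np.pi * q * d / wavelength
        M = M @ np.array([[np.cos(delta), -1j * np.sin(delta) / q],
                          [-1j * q * np.sin(delta), np.cos(delta)]])
    denom = q0 * M[0, 0] + q0**2 * M[0, 1] + M[1, 0] + q0 * M[1, 1]
    r = (q0 * M[0, 0] + q0**2 * M[0, 1] - M[1, 0] - q0 * M[1, 1]) / denom
    return r, 2 * q0 / denom

wavelength, theta = 633.0, 30.0                       # nm, degrees (assumed)
layers = [(20, 0.2 + 3.4j), (100, 1.5), (80, 2.0)]    # hypothetical metal/dielectric stack
stack1, stack2 = layers[:1], layers[1:]               # split below the first layer

r1, t1 = stack_rt(stack1, wavelength, theta)
r1p, t1p = stack_rt(stack1[::-1], wavelength, theta)  # stack 1 seen from below
r2, t2 = stack_rt(stack2, wavelength, theta)
r2p, t2p = stack_rt(stack2[::-1], wavelength, theta)  # stack 2 seen from below

r, t = stack_rt(layers, wavelength, theta)            # whole stack, top incidence
r_rev, t_rev = stack_rt(layers[::-1], wavelength, theta)

t_A17_2 = t1 * t2 / (1 - r1p * r2)                    # Eq. (A17.2)
t_A17_3 = t1p * t2p / (1 - r1p * r2)                  # Eq. (A17.3)

print("t, t' =", t, t_rev)
print("|r|, |r'| =", abs(r), abs(r_rev))
```

Comparing the two printed reflection magnitudes shows whether the two sides reflect differently for this particular stack, while the transmission coefficients agree to rounding error.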

1 A. Yariv and D. M. Pepper, Amplified reflection, phase conjugation, and oscillation
in degenerate four-wave mixing, Opt. Lett. 1, 16–18 (1977).
1980.
3 E. Hecht, Optics, third edition, Addison-Wesley, Reading, Massachusetts, 1998.
1986.
18
Optical pulse compression
A variety of methods exist for temporally compressing (shortening) optical
pulses. These methods typically start with pulses in the picosecond or femto-
second range, and end up with pulses that can be as short as a few optical cycles.
The optical bandwidth of the initial pulse is usually increased using a nonlinear
interaction such as self-phase modulation; this leads to a chirped pulse, which
sometimes ends up being longer than the original pulse. A well-known technique
for generating sub-100 fs pulses is nonlinear compression in a fiber, where the
fiber’s nonlinearity is used to broaden the optical spectrum. Thereafter, the pulse
duration is reduced using linear dispersive compression, which removes the chirp
by flattening the spectral phase. This is accomplished by sending the pulse
through an optical element with a suitable amount of dispersion, such as a prism
pair, an optical fiber, a grating compressor, or a chirped mirror.
In the 1960s, Gires and Tournois1 and Giordmaine et al.2 independently proposed
the shortening of optical pulses using compression techniques analogous to those
used at microwave frequencies. Fisher et al.3 suggested that femtosecond optical
pulses could be obtained by first passing a short pulse through an optical Kerr liquid
in order to impress a frequency sweep or “chirp” on the pulse’s carrier. Pulse
compression was then to be achieved by compensating the frequency sweep in the
pulse frequency spectrum using a dispersive delay line. In 1982, Shank et al.4
reported the generation and measurement of an optical pulse of only 30 fs duration at
a wavelength of 619 nm, corresponding to 14 optical cycles. In their experiment,
90 fs optical pulses, obtained from a mode-locked, colliding-pulse ring dye laser
followed by a dye amplifier, were focused onto a 15 cm long, single-mode,
polarization-preserving optical fiber. For a few nanojoules of pulse energy coupled into
the fiber, the optical spectrum was observed to broaden significantly. (With
increasing input energy the spectrum continued to broaden, covering nearly the
entire visible range.) The optical energy coupled into the fiber was adjusted to
produce a factor of 3 increase in the frequency spectrum bandwidth; the
spectral half-width thus broadened from 6 nm to 20 nm. The light emerging
from the fiber was subsequently recollimated with a lens and sent through a
grating compressor (i.e., two gratings set 6.4 cm apart, each having 600
lines/mm; angle of incidence on the gratings $30^\circ$), to yield clean, 30 fs
pulses (compression ratio $R_c = 3.0$). Shank et al. used the second-harmonic
autocorrelation method to determine the duration and profile of their com-
pressed pulses.
Compression techniques have evolved over the years,5,6,7,8,9 and many
impressive applications of the femtosecond pulse technology have been reported.
In this chapter we present an elementary theory of optical pulse compression,
describe the role of optical nonlinearity in self-phase modulation (which produces
chirp and thereby broadens the Fourier spectrum), and analyze methods of chirp
cancellation using dispersive (linear) optical instruments.
Pulse propagation in an isotropic, homogeneous, and dispersive medium
Consider a periodic train of light pulses propagating along the z-axis in a
Cartesian coordinate system. The amplitude of this pulse train is denoted by a(t, z),
a function of the coordinate $z$ and time $t$. At the origin of the coordinates, $z = 0$, the
Fourier spectrum of the pulse is given by
$$ A(f) = \sum_{m=-M}^{M} A_m\,\delta(f - f_0 - m\Delta f). \tag{18.1} $$
Here $A_m = |A_m|\exp(i\phi_m)$ is the complex amplitude of the spectral component at
$f = f_0 + m\Delta f$, where $f_0$ is the central frequency of the spectrum. At $z = 0$, the pulse
amplitude will be
$$ a(t, z = 0) = \sum_{m=-M}^{M} |A_m|\cos[2\pi(f_0 + m\Delta f)t - \phi_m]. \tag{18.2} $$
Here the time-dependence factor is assumed to be $\exp(-i2\pi ft)$. In the limit $\Delta f \to 0$,
the above sum is replaced by an integral, and one recovers a single pulse of
continuous spectrum $A(f)$. Figure 18.1 shows plots of $A(f)$ and $a(t, z = 0)$ for
the specific values of $f_0 = 3.75\times10^{14}$ Hz (corresponding to $\lambda_0 = c/f_0 = 0.8$ µm),
$\Delta f = 0.01f_0$, and $M = 10$. The amplitude function $a(t, z = 0)$ in Figure 18.1 is
periodic, with a period $T = 1/\Delta f \approx 267$ fs; a single period of the function is shown
in Figure 18.1(b), and a close-up of the pulse appears in Figure 18.1(c).
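The sum in Eq. (18.2) is straightforward to evaluate directly. The following sketch uses the values of $f_0$, $\Delta f$, and $M$ quoted above; the Gaussian weights $|A_m|$ (with an arbitrarily chosen width) and the flat spectral phase are assumptions made for illustration, since the text does not list the actual amplitudes used for Figure 18.1.

```python
import numpy as np

f0 = 3.75e14            # central frequency (Hz), from the text
df = 0.01 * f0          # spacing of the spectral components (Hz), from the text
M = 10                  # components run from m = -M to m = +M

m = np.arange(-M, M + 1)
A_abs = np.exp(-(m / 4.0) ** 2)    # hypothetical Gaussian weights |A_m|
phi = np.zeros(m.size)             # flat spectral phase (assumption)

def pulse_train(t):
    """Evaluate Eq. (18.2) at the times t (in seconds)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    return np.sum(A_abs * np.cos(2 * np.pi * (f0 + m * df) * t - phi), axis=1)

T = 1.0 / df                       # repetition period, about 267 fs
t = np.linspace(0.0, T, 2001)
a = pulse_train(t)

print(f"period T = {T * 1e15:.1f} fs")
print("a(0) equals the sum of the weights:", np.isclose(a[0], A_abs.sum()))
```

Because $f_0/\Delta f$ is an integer here, every component completes a whole number of cycles in one period $T$, so the carrier as well as the envelope repeats exactly.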
[Figure 18.1: (a) spectral amplitude $A(f)$ versus frequency ($10^{14}$ Hz); (b) $a(t, z = 0)$ versus time (fs) over one period; (c) close-up of the pulse in (b).]
Figure 18.1 Amplitude $A(f)$ of the Fourier transform of a Gaussian pulse train
described by Eq. (18.1), having $f_0 = 3.75\times10^{14}$ Hz, $\Delta f = 0.01f_0$, and $M = 10$. The
corresponding amplitude profile $a(t, z = 0)$ is a periodic function of time, with
period $T = 1/\Delta f \approx 267$ fs. A single period of the pulse train is shown in (b), and a
close-up appears in (c).
At any point $z = z_0$ along the propagation path, each Fourier component in
Eq. (18.1) will be multiplied by $\exp(i2\pi nfz_0/c)$, where $f = f_0 + m\Delta f$ is the specific
frequency for that component of the spectrum, and $n$ is the effective refractive
index of the medium; $c$ is the speed of light in vacuum. In free space, $n = 1.0$, and
the pulse amplitude at $z = z_0$ simply becomes $a(t, z = z_0) = a(t - z_0/c, z = 0)$. The pulse
then propagates at the speed of light, $c$, without any change of shape whatsoever.
In a (homogeneous and isotropic) medium of refractive index $n$, however, the
dependence of $n$ on the frequency $f$ complicates the propagation process. For a
sufficiently narrow spectrum, one may approximate the dependence of $n$ on $f$ by
the first few terms of the Taylor series of $n(f)$, namely,
$$ n(f) \approx n_0 + n_1(f - f_0) + n_2(f - f_0)^2. \tag{18.3} $$
The propagation factor at $f = f_0 + m\Delta f$, up to and including second-order terms in
$(f - f_0)$, may then be written as follows:
$$ \exp(i2\pi nfz_0/c) \approx \exp(-i2\pi n_1 f_0^2 z_0/c)\,\exp[i2\pi(n_1 + n_2 f_0)(f - f_0)^2 z_0/c]\,\exp[i2\pi(n_0 + n_1 f_0)fz_0/c]. \tag{18.4} $$
In the above equation, the first term on the right-hand side is a constant phase-factor
(independent of $f$), which can, for purposes of the present analysis, be ignored. The
second term is a quadratic phase-factor in $(f - f_0) = m\Delta f$, which may be combined
with the phase $\phi_m$ of $A_m$ in Eq. (18.2); this term is ultimately responsible for the
broadening and chirp induced on the pulse by the effects of dispersion. The last
term is a linear phase-factor that translates the (dispersed) pulse from $t = 0$ at $z = 0$
to $t = (n_0 + n_1 f_0)z_0/c$ at $z = z_0$. The group velocity $V_g$ is thus found to be
$$ V_g = c/(n_0 + n_1 f_0). \tag{18.5} $$
When $n_1 = 0$, the refractive index is, to first order, independent of the
optical frequency $f$, and the group velocity $V_g$ would be equal to the phase velocity
$V_{ph} = c/n_0$. In general, the refractive index of a transparent optical material is an
increasing function of the frequency $f$, hence $n_1 > 0$ and $V_g < V_{ph}$. For a typical
material such as fused silica, where $n_0 = 1.46$ and $n_1 = 4.2\times10^{-17}$ s at
$f_0 = 5.4546\times10^{14}$ Hz (corresponding to $\lambda_0 = 0.55$ µm), $V_g \approx 0.985V_{ph}$. (For fused silica in the
wavelength range $\lambda = 0.3$ µm to 1.6 µm, plots of $n_0$, $n_1$, $n_2$ versus the optical
frequency $f$ are shown in Figure 18.2.)
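The fused-silica figure quoted above follows directly from Eq. (18.5). A minimal sketch of the arithmetic, with illustrative function names:

```python
C = 2.99792458e8   # speed of light in vacuum (m/s)

def group_velocity(n0, n1, f0):
    """Eq. (18.5): V_g = c / (n0 + n1*f0)."""
    return C / (n0 + n1 * f0)

def phase_velocity(n0):
    """V_ph = c / n0."""
    return C / n0

# Fused silica at lambda0 = 0.55 um, values from the text.
n0, n1, f0 = 1.46, 4.2e-17, 5.4546e14
ratio = group_velocity(n0, n1, f0) / phase_velocity(n0)
print(f"Vg/Vph = {ratio:.3f}")
```

The ratio reduces to $n_0/(n_0 + n_1 f_0)$, so it depends only on the dimensionless product $n_1 f_0 \approx 0.023$.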
Note that the above arguments have been presented in the context of propa-
gation in a homogeneous medium, where n(f ) is a characteristic of the material
environment. For a beam confined to a waveguide, however, the index n(f ) is an
effective index that depends not only on the material properties of the core and
the cladding, but also on the structure of the waveguide. Equation (18.5) will still
be applicable in this case, but the coefficients n0 and n1 must be obtained for the
effective index neff of the waveguide, for the particular mode under consideration.
(See the Appendix for a discussion of guided modes and the effective index of a
simple slab waveguide.)
[Figure 18.2: curves of $n_0$, $10^{16}n_1$, and $10^{30}n_2$ versus frequency ($10^{14}$ Hz, 2 to 10).]
Figure 18.2 Plots of $n_0$, $n_1$, and $n_2$ versus the optical frequency $f$ for fused
silica in the wavelength range $\lambda = 0.3$ µm to 1.6 µm. The refractive index $n_0(f)$ is
measured and fitted to the Sellmeier equation, then the derivatives of the
equation are obtained analytically to yield the plots of $n_1$ and $n_2$.

[Figure 18.3: $a(t, z = 4\ \mathrm{mm})$ versus time (fs, 0 to 250).]
Figure 18.3 The pulse depicted in Figure 18.1 after propagating a distance of
4.0 mm in fused silica ($n_0 = 1.4534$, $n_1 = 3.69\times10^{-17}$ s, $n_2 = 0.6\times10^{-33}$ s$^2$ at
$\lambda_0 = c/f_0 = 0.8$ µm).

Figure 18.3 shows the pulse of Figure 18.1 after propagating a distance of
4.0 mm in fused silica ($n_0 = 1.4534$, $n_1 = 3.69\times10^{-17}$ s, $n_2 = 0.6\times10^{-33}$ s$^2$
at $\lambda_0 = c/f_0 = 0.8$ µm). Clearly it does not take much propagation for a short pulse
of the given wavelength in the given material to become significantly broadened.
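The broadening can be reproduced qualitatively by attaching the quadratic phase of Eq. (18.4) to each spectral component and comparing the envelope width before and after propagation. This is a sketch under assumptions, not a reconstruction of Figure 18.3: the Gaussian spectral weights are hypothetical, the constant and linear phase terms are dropped (they only delay the pulse), and the width is measured as an rms duration of the intensity envelope.

```python
import numpy as np

C = 2.99792458e8                     # speed of light in vacuum (m/s)
f0 = 3.75e14
df = 0.01 * f0
M = 10
n1, n2 = 3.69e-17, 0.6e-33           # fused silica near 0.8 um, as quoted in the text
z0 = 4.0e-3                          # propagation distance (m)

m = np.arange(-M, M + 1)
A_abs = np.exp(-(m / 4.0) ** 2)      # hypothetical Gaussian weights |A_m|

def envelope(t, z):
    """Magnitude of the complex envelope after a distance z (quadratic phase only)."""
    quad = 2 * np.pi * (n1 + n2 * f0) * (m * df) ** 2 * z / C
    phases = quad - 2 * np.pi * m * df * t[:, None]
    return np.abs(np.sum(A_abs * np.exp(1j * phases), axis=1))

def rms_width(t, env):
    """Root-mean-square duration of the intensity envelope env**2."""
    w = env ** 2
    t_mean = np.sum(t * w) / np.sum(w)
    return np.sqrt(np.sum((t - t_mean) ** 2 * w) / np.sum(w))

T = 1.0 / df
t = np.linspace(-T / 2, T / 2, 4001)
w_in = rms_width(t, envelope(t, 0.0))
w_out = rms_width(t, envelope(t, z0))
print(f"rms width: {w_in * 1e15:.1f} fs at z = 0, {w_out * 1e15:.1f} fs at z = 4 mm")
```

A wider assumed spectrum (a shorter input pulse) picks up more quadratic phase at its edges and therefore broadens more over the same distance.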
Group velocity dispersion

The group velocity defined by Eq. (18.5) may itself be treated as a function of
frequency, namely, $V_g(f) = c/(n + n'f)$, where $n'$ is the derivative of $n$ with respect
to $f$. The variations of $V_g$ in the vicinity of a given frequency $f_0$ may then be analyzed
in terms of the derivative of $V_g$ with respect to $f$, evaluated at $f = f_0$, namely,
$$ V_g'(f_0) = -2c(n_1 + n_2 f_0)/(n_0 + n_1 f_0)^2. \tag{18.6} $$
(Here we have used the fact that $n'' = 2n_2$.) The so-called group velocity dispersion
(GVD) defined by Eq. (18.6) is clearly proportional to the coefficient $(n_1 + n_2 f_0)$
appearing in the quadratic phase factor in Eq. (18.4). In particular, the sign of $(n_1 + n_2 f_0)$
determines whether $V_g$ is an increasing or decreasing function of frequency.
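For reference, the derivative quoted in Eq. (18.6) can be obtained in two lines. The steps below use only the Taylor coefficients of Eq. (18.3), for which $n(f_0) = n_0$, $n'(f_0) = n_1$, and $n''(f_0) = 2n_2$.

```latex
\begin{aligned}
V_g(f)    &= \frac{c}{n(f) + f\,n'(f)}, \\
V_g'(f)   &= -\,c\,\frac{2n'(f) + f\,n''(f)}{\bigl[n(f) + f\,n'(f)\bigr]^{2}}, \\
V_g'(f_0) &= -\,\frac{2c\,(n_1 + n_2 f_0)}{(n_0 + n_1 f_0)^{2}}.
\end{aligned}
```

The overall minus sign means that a positive $(n_1 + n_2 f_0)$ makes $V_g$ a decreasing function of frequency.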
Quadratic phase-factor, chirp, and pulse broadening

The spectral amplitude $A(f)$ of a single light pulse may be a Gaussian function of
frequency $f$, namely,
$$ A(f) = A_0\exp[-\pi\alpha(f - f_0)^2], \tag{18.7} $$
where $A_0$ and $\alpha$ are two complex constants. Whereas $A_0 = |A_0|\exp(i\phi_0)$ may be
chosen arbitrarily, the parameter $\alpha = \alpha_1 - i\alpha_2$ is required to have a positive real
part, that is, $\alpha_1 > 0$. The units of $A_0$ are volt·second/meter, while $\alpha$ has units of
second$^2$. The Fourier transform of $A(f)$ is given by
$$ \begin{aligned} a(t) &= \mathrm{Re}\{A_0\,\alpha^{-1/2}\exp(-\pi t^2/\alpha)\exp(-i2\pi f_0 t)\} \\ &= |A_0|(\alpha_1^2 + \alpha_2^2)^{-1/4}\exp\{-\pi[\alpha_1/(\alpha_1^2 + \alpha_2^2)]t^2\} \\ &\quad\times \cos\{2\pi f_0 t + \pi[\alpha_2/(\alpha_1^2 + \alpha_2^2)]t^2 - \tfrac{1}{2}\tan^{-1}(\alpha_2/\alpha_1) - \phi_0\}. \end{aligned} \tag{18.8} $$
Note that the field amplitude $a(t)$ has units of volt/meter, namely, those of
the electric field in the MKSA system of units. The pulse envelope is a Gaussian
function whose width is proportional to $\sqrt{(\alpha_1^2 + \alpha_2^2)/\alpha_1}$. (To obtain the pulse's
full width at half-maximum intensity (FWHM), multiply this parameter by
$\sqrt{2\ln 2/\pi} \approx 0.664$.) Thus the quadratic phase-factor, having a coefficient $\pi\alpha_2$
in the exponent of the spectral function $A(f)$, causes a broadening of the
pulse. For example, the coefficient of the quadratic phase-factor in Eq. (18.4),
$\alpha_2 = 2(n_1 + n_2 f_0)z_0/c$, indicates a growing pulse-width with the propagation distance
$z_0$. The time-dependent chirp frequency in Eq. (18.8), $f = f_0 + [\alpha_2/(\alpha_1^2 + \alpha_2^2)]t$,
varies continuously along the pulse within a range centered at $f_0$ (noting that the
pulse of Eq. (18.8) is centered at $t = 0$). If $(n_1 + n_2 f_0)$ happens to be positive, then
the chirp frequency will increase with time (up-chirp). Since the GVD, given
by Eq. (18.6), is negative in this case, the leading edge of the pulse, having a
frequency that is lower than $f_0$, travels faster than the trailing edge, which has a
higher frequency. On the other hand, if $(n_1 + n_2 f_0)$ happens to be negative, the chirp
frequency will decrease with time (down-chirp). However, since the GVD is
positive in this case, the leading edge, once again, will travel faster than the trailing
edge. Either way, the pulse is seen to broaden as a result of propagation in the
dispersive medium, which is the same conclusion arrived at earlier, when we
argued that the width of the Gaussian pulse of Eq. (18.8) is an increasing function
of $|\alpha_2|$. The minimum width occurs at $z_0 = 0$, where $\alpha_2 = 0$; here the pulse is said to
be transform-limited, meaning that its width, $\sqrt{\alpha_1}$, cannot be reduced any further,
owing to the finite width of its Fourier transform $A(f)$.
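The width formula lends itself to a quick numerical illustration. In the sketch below the value $\alpha_1 = 5\times10^{-28}$ s$^2$ is an assumed example (it is the value used later in the chapter for the fiber calculation), the fused-silica coefficients are those quoted for $\lambda_0 = 0.8$ µm, and the function names are illustrative.

```python
import numpy as np

C = 2.99792458e8   # speed of light in vacuum (m/s)

def alpha2_from_dispersion(n1, n2, f0, z0):
    """Quadratic spectral-phase coefficient: alpha2 = 2(n1 + n2*f0)*z0/c."""
    return 2.0 * (n1 + n2 * f0) * z0 / C

def fwhm_intensity(alpha1, alpha2):
    """Intensity FWHM of the Gaussian pulse of Eq. (18.8), in seconds."""
    width = np.sqrt((alpha1**2 + alpha2**2) / alpha1)
    return np.sqrt(2.0 * np.log(2.0) / np.pi) * width

alpha1 = 5.0e-28                                                  # s^2 (assumed example)
a2 = alpha2_from_dispersion(3.69e-17, 0.6e-33, 3.75e14, 4.0e-3)   # 4 mm of fused silica

fwhm_in = fwhm_intensity(alpha1, 0.0)     # transform-limited pulse
fwhm_out = fwhm_intensity(alpha1, a2)     # after dispersion
print(f"FWHM: {fwhm_in * 1e15:.1f} fs -> {fwhm_out * 1e15:.1f} fs")
```

Since only $\alpha_2^2$ enters the width, up-chirp and down-chirp of equal magnitude broaden the pulse by the same amount.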
Propagation in nonlinear media and pulse compression

Consider a chirp-free Gaussian pulse as described by Eq. (18.8) with $\alpha$ set equal
to $\alpha_1$ (i.e., $\alpha_2 = 0$). When this pulse is launched into a nonlinear medium having a
(linear) refractive index $n_0$, its period-averaged intensity at the origin of the
coordinates, $z = 0$, will be
$$ I(t) = (n_0/Z_0)\langle a^2(t)\rangle = \tfrac{1}{2}n_0 Z_0^{-1}\alpha_1^{-1}|A_0|^2\exp(-2\pi t^2/\alpha_1) \approx I_{peak}(1 - 2\pi t^2/\alpha_1), \tag{18.9} $$
where $I_{peak} = n_0|A_0|^2/(2Z_0\alpha_1)$ is the pulse's peak intensity (i.e., optical power per unit
cross-sectional area). Here $Z_0 \approx 377\ \Omega$ is the free-space impedance in the MKSA
system of units. The pulse's Gaussian intensity profile is approximated in Eq. (18.9)
by the quadratic function $(1 - 2\pi t^2/\alpha_1)$, which provides an accurate description at
and around the center of the pulse, but grossly underestimates the intensity distribution
further away, in the wings. The only justification for this approximate
treatment is that it simplifies the following analysis; more realistic calculations,
therefore, must properly account for the actual pulse's intensity profile.
If the nonlinear refractive index of the medium happens to be proportional to
$I(t)$, namely, $n(I) = n_0 + \gamma I$, which is characteristic of media with the so-called
Kerr nonlinearity, then, assuming dispersionless propagation and ignoring time-independent
terms, the phase modulation imparted to the pulse after propagating
a distance $z_0$ will be
$$ \Delta\phi(t) \approx 2\pi\gamma I(t)z_0/\lambda_0 \approx -[4\pi^2 f_0\gamma I_{peak}z_0/(\alpha_1 c)]\,t^2 = -\pi t^2/\alpha_3, \tag{18.10} $$
where $\alpha_3 = \alpha_1 c/(4\pi f_0\gamma I_{peak}z_0)$ is a real-valued constant. For typical values of
$\gamma = 2.5\times10^{-20}$ m$^2$/W and $n_0 = 1.4534$ in a silica glass fiber at $f_0 = 3.75\times10^{14}$ Hz,
setting $\alpha_1 = 5\times10^{-28}$ s$^2$, $z_0 = 1.0$ m, and $|A_0| = 2.0\times10^{-6}$ V·s/m (corresponding to
$I_{peak} \approx 15$ W/µm$^2$ inside the fiber), we find $\alpha_3 = 8.5\times10^{-29}$ s$^2$. Figure 18.4 shows
a plot of the original pulse depicted in Figure 18.1(c) after acquiring the nonlinear
phase shift given by Eq. (18.10) with the above value of $\alpha_3$. The quadratic nature
of the phase-shift in Eq. (18.10) is responsible for the chirped behavior of the
oscillations in Figure 18.4, where the frequency is seen to be a linearly increasing
function of time (the so-called up-chirp).

[Figure 18.4: $a(t, z = 1\ \mathrm{m})$ versus time (fs, 0 to 70).]
Figure 18.4 The original pulse of Figure 18.1(c) after acquiring a nonlinear
phase shift, $\Delta\phi(t) = -\pi t^2/\alpha_3$, with $\alpha_3 = 8.5\times10^{-29}$ s$^2$. Other relevant parameters
are $|A_0| = 2.0\times10^{-6}$ V·s/m, $\alpha_1 = 5.0\times10^{-28}$ s$^2$, $f_0 = 3.75\times10^{14}$ Hz. The chirp
frequency is seen to be a linearly increasing function of time.
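The numbers in this example can be reproduced from the expressions for $I_{peak}$ and $\alpha_3$. The sketch below is a plain arithmetic check with illustrative function names; it evaluates $\alpha_3$ twice, once with the unrounded peak intensity and once with the rounded value of 15 W/µm$^2$, which is the one that reproduces the quoted $8.5\times10^{-29}$ s$^2$.

```python
import numpy as np

C = 2.99792458e8    # speed of light in vacuum (m/s)
Z0 = 376.73         # free-space impedance (ohm)

def peak_intensity(n0, A0_abs, alpha1):
    """I_peak = n0*|A0|^2 / (2*Z0*alpha1), in W/m^2."""
    return n0 * A0_abs**2 / (2.0 * Z0 * alpha1)

def alpha3(alpha1, f0, gamma, I_peak, z0):
    """alpha3 = alpha1*c / (4*pi*f0*gamma*I_peak*z0), in s^2."""
    return alpha1 * C / (4.0 * np.pi * f0 * gamma * I_peak * z0)

n0, A0_abs, alpha1 = 1.4534, 2.0e-6, 5.0e-28
f0, gamma, z0 = 3.75e14, 2.5e-20, 1.0

I_peak = peak_intensity(n0, A0_abs, alpha1)          # W/m^2; 1 W/um^2 = 1e12 W/m^2
a3_unrounded = alpha3(alpha1, f0, gamma, I_peak, z0)
a3_rounded = alpha3(alpha1, f0, gamma, 15e12, z0)    # using I_peak = 15 W/um^2

print(f"I_peak = {I_peak / 1e12:.1f} W/um^2")
print(f"alpha3 = {a3_unrounded:.2e} s^2 (unrounded I_peak), {a3_rounded:.2e} s^2 (15 W/um^2)")
```

With the $\exp(-i2\pi ft)$ convention, the phase $-\pi t^2/\alpha_3$ corresponds to an instantaneous frequency $f_0 + t/\alpha_3$, which is the up-chirp described above.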
When the quadratic phase-factor $\exp(-i\pi t^2/\alpha_3)$ is imposed on a transform-limited
Gaussian pulse, it does not change the Gaussian nature of the pulse, but
modifies the pulse parameter from $\alpha = \alpha_1$ to $\beta$. Defining $1/\beta = (1/\alpha_1) + i/\alpha_3$, the
Fourier transform of the chirped pulse becomes
$$ A(f) = A_0\sqrt{\beta/\alpha_1}\,\exp[-\pi\beta(f - f_0)^2]. \tag{18.11a} $$
Writing $\beta = \beta_1 - i\beta_2$, we find
$$ \beta_1 = \alpha_1/[1 + (\alpha_1/\alpha_3)^2], \tag{18.11b} $$
$$ \beta_2 = \alpha_3/[1 + (\alpha_3/\alpha_1)^2]. \tag{18.11c} $$
The first consequence of imposing a chirp in the time domain, therefore, is a
broadening of the spectral function $A(f)$ in the frequency domain; this is because $\beta_1$
is always less than $\alpha_1$. The second consequence of a time-domain chirp is the
imposition of the quadratic phase-factor $\exp[i\pi\beta_2(f - f_0)^2]$ on the spectrum of the
pulse. Should this spectral phase somehow be eliminated, the new pulse would
become chirp-free. More important, however, is the fact that, by virtue of its
broader spectrum, the resulting chirp-free pulse will be a compressed version of the
original pulse.
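Equations (18.11b) and (18.11c) are simply the real and imaginary parts of $\beta = 1/[(1/\alpha_1) + i/\alpha_3]$, which can be confirmed numerically. The sketch below uses the silica-fiber values of $\alpha_1$ and $\alpha_3$ from the preceding example; the function name is illustrative.

```python
def chirped_beta(alpha1, alpha3):
    """Return (beta1, beta2) of Eqs. (18.11b) and (18.11c)."""
    beta1 = alpha1 / (1.0 + (alpha1 / alpha3) ** 2)
    beta2 = alpha3 / (1.0 + (alpha3 / alpha1) ** 2)
    return beta1, beta2

alpha1, alpha3 = 5.0e-28, 8.5e-29          # s^2, from the fiber example
beta1, beta2 = chirped_beta(alpha1, alpha3)

beta = complex(beta1, -beta2)              # beta = beta1 - i*beta2
lhs = 1.0 / beta
rhs = 1.0 / alpha1 + 1j / alpha3           # defining relation for beta

broadening = (alpha1 / beta1) ** 0.5       # spectral broadening factor
print(f"beta1 = {beta1:.3e} s^2, beta2 = {beta2:.3e} s^2")
print(f"spectral broadening factor = {broadening:.2f}")
```

For these values $\alpha_1/\alpha_3 \approx 5.9$, so the spectrum is roughly six times wider than that of the original pulse.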
Note that the spectral broadening factor $\sqrt{\alpha_1/\beta_1}$ depends only on the ratio
$\alpha_1/\alpha_3 = 4\pi\gamma I_{peak}z_0/\lambda_0$, which is independent of the original pulse duration. With
a reasonable value of the nonlinear coefficient $\gamma$, and with sufficient intensity $I_{peak}$
and/or propagation distance $z_0$, it should be possible to broaden the spectra of pulses
having durations of the order of picoseconds and perhaps even those that reach
into the nanosecond regime. To substantially broaden the spectral width requires that
$\alpha_1/\alpha_3$ be much greater than unity, in which case $\beta_2$ will become nearly equal to $\alpha_3$.
However, for a given compression ratio, $\alpha_3$ is proportional to $\alpha_1$, which is the
square of the original pulse width. For pulses in the picosecond regime and shorter,
the value of $\beta_2$ will be small enough that passing the chirped (and spectrally
broadened) pulse through a simple dispersive element (e.g., a pair of prisms or
gratings) will eliminate the quadratic phase-factor, thus yielding a compressed pulse.
However, for longer pulses (say, in the nanosecond range) the quadratic phase
coefficient $\beta_2$ will be so large as to render ineffective these simple methods of
chirp-cancellation. Under such circumstances, one should resort to resonant
linear devices such as Fabry-Pérot etalons and tuned spectral filters to accomplish
chirp cancellation.
Eliminating the quadratic phase-factor

One way of removing the quadratic phase-factor $\exp[i\pi\beta_2(f - f_0)^2]$, imposed
on the spectral function $A(f)$ by the effects of nonlinear propagation, is to
simply send the chirped pulse through a linear dispersive medium such as a
transparent glass slab or a length of fiber. A comparison of Eq. (18.11) with
Eq. (18.4) reveals the required propagation distance in the dispersive element
to be $z_0 = -\tfrac{1}{2}c\beta_2/(n_1 + n_2 f_0)$, where, obviously, $\beta_2$ and $(n_1 + n_2 f_0)$ must have opposite
signs.
Example: For an infra-red laser pulse ($\lambda_0 = 1.5$ µm, $f_0 = 2.0\times10^{14}$ Hz),
let $\alpha_1 = 10^{-25}$ s$^2$ and $\alpha_3 = 10^{-26}$ s$^2$. From Eq. (18.11c), we find $\beta_2 = 0.99\alpha_3$. The
GVD of fused silica, which is negative in the visible and near-infrared, changes sign
at $\lambda \approx 1.3$ µm. At $\lambda = 1.5$ µm, we find $n_1 = 9.0\times10^{-17}$ s, $n_2 = -5.5\times10^{-31}$ s$^2$,
yielding $(n_1 + n_2 f_0) = -2.0\times10^{-17}$ s. Passing the pulse through a fairly thick plate of
fused silica (thickness = 74.25 mm) can, therefore, cancel the chirp and produce a
compressed pulse. The compression ratio is $R_c = \sqrt{\alpha_1/\beta_1} = \sqrt{1 + (\alpha_1/\alpha_3)^2} \approx 10$.
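The figures in this example follow from Eqs. (18.11b), (18.11c), and the expression for $z_0$. The sketch below repeats the arithmetic with the quoted coefficients; the small difference from 74.25 mm comes from using the exact value of $c$ rather than $3\times10^8$ m/s.

```python
C = 2.99792458e8                 # speed of light in vacuum (m/s)

alpha1, alpha3 = 1.0e-25, 1.0e-26   # s^2, from the example
f0 = 2.0e14                         # Hz (lambda0 = 1.5 um)
n1, n2 = 9.0e-17, -5.5e-31          # fused silica at 1.5 um, from the example

beta1 = alpha1 / (1.0 + (alpha1 / alpha3) ** 2)   # Eq. (18.11b)
beta2 = alpha3 / (1.0 + (alpha3 / alpha1) ** 2)   # Eq. (18.11c)

gvd_coeff = n1 + n2 * f0                # (n1 + n2*f0), negative at this wavelength
z0 = -0.5 * C * beta2 / gvd_coeff       # plate thickness that cancels the chirp (m)
Rc = (alpha1 / beta1) ** 0.5            # compression ratio

print(f"beta2/alpha3 = {beta2 / alpha3:.3f}")
print(f"n1 + n2*f0   = {gvd_coeff:.2e} s")
print(f"z0 = {z0 * 1e3:.1f} mm, Rc = {Rc:.2f}")
```

The sign requirement is visible here: $\beta_2 > 0$ for the up-chirped pulse, so a positive thickness $z_0$ results only because $(n_1 + n_2 f_0)$ is negative beyond 1.3 µm.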
Another method of chirp-cancellation utilizes dispersive optical elements such
as gratings and prisms. Figure 18.5 shows a pair of identical diffraction gratings of
period $p$ and separation $d$. Let the pulse be incident at an angle $\theta$ on the first
grating. The (complex) amplitude of a plane-wave of wavelength $\lambda$ illuminating the
grating surface will be $\exp[i(2\pi/\lambda)x\sin\theta]$. The grating surface will modulate this
wavefront by a phase-factor of period $p/m$, where $m$, an integer, is the diffraction
order. Thus the reflected wavefront immediately after the first grating surface will be
$\exp[i(2\pi/\lambda)(\sin\theta + m\lambda/p)x]$. This wavefront propagates a distance $z = d$ along the
grating's surface normal, acquiring along the way a phase $\phi$, where
$$ \phi(\lambda) = (2\pi d/\lambda)\sqrt{1 - (\sin\theta + m\lambda/p)^2}. \tag{18.12} $$
Reflection from the second grating does not modify the acquired phase factor given
in Eq. (18.12), but merely cancels the modulation of the wavefront, $\exp(i2\pi mx/p)$,
which was added at the first grating. The beam thus returns to propagating in its
original direction at an angle $\theta$ to the grating normal, but retains the phase $\phi(\lambda)$
which it acquired while propagating between the two gratings.

Figure 18.5 A pair of identical gratings of period $p$ and separation $d$ is commonly
used as a linear dispersive device for chirp cancellation. The incident
pulse arrives at the first grating at an angle $\theta$ relative to the grating's surface
normal. The $m$th diffracted order leaves the first grating at an angle $\theta_m$, and is
subsequently diffracted from the second grating. The emergent beam is parallel
to, but laterally displaced from, the direction of incidence.
We mention in passing that, in addition to the above phase, one must take into
account the phase-shift imparted to each wavelength upon reflection from the two
gratings. The phase shifts of the two gratings, which will be the same if the gratings
are identical, must be added to $\phi(\lambda)$, and their wavelength dependence must be
fully accounted for when computing the quadratic phase-factor imposed on the
emergent beam.
The Taylor series expansion of $\phi(\lambda)$ of Eq. (18.12) around the center frequency
$f_0$ yields

$$\phi(f) = \Phi_0 + \Phi_1 (f - f_0) + \Phi_2 (f - f_0)^2 + \cdots, \tag{18.13a}$$

where, denoting by $\theta_{m0}$ the $m$th order diffraction angle corresponding to the
central wavelength $\lambda_0$, namely, $\sin\theta_{m0} = \sin\theta + m\lambda_0/p$, we have

$$\Phi_0 = 2\pi (d/\lambda_0)\cos\theta_{m0}, \tag{18.13b}$$

$$\Phi_1 = 2\pi (d/c)(1 - \sin\theta\,\sin\theta_{m0})/\cos\theta_{m0}, \tag{18.13c}$$

$$\Phi_2 = -\pi d m^2 \lambda_0^3/(c^2 p^2 \cos^3\theta_{m0}). \tag{18.13d}$$

Figure 18.6 Plot of the function $\phi(f) - \Phi_0 - \Phi_1(f - f_0)$ in the vicinity of
$f_0 = 3.75 \times 10^{14}$ Hz. $\phi(f)$ is given by Eq. (18.12), while its first two Taylor series
coefficients, $\Phi_0$ and $\Phi_1$, are given by Eqs. (18.13b) and (18.13c). The diffraction
order under consideration is $m = -1$, the assumed grating period is $p = 1.0\,\mu$m,
the incidence angle is $\theta = 60^\circ$, and the separation between the gratings is
$d = 10$ mm.

As a typical example, Figure 18.6 shows a plot of the phase function $\phi(f)$ of
Eq. (18.12), with the constant and linear terms of Eq. (18.13a) removed. The
horizontal axis is centered at $f_0 = 3.75 \times 10^{14}$ Hz ($\lambda_0 = 0.8\,\mu$m), the grating period
is $p = 1.0\,\mu$m, the assumed incidence angle is $\theta = 60^\circ$, the separation between the
gratings is $d = 10$ mm, and the diffraction order under consideration is $m = -1$. The
numerical value of $\Phi_2$, $-1.8 \times 10^{-25}$ s$^2$, provides a good match to the actual curvature
of the function plotted in Figure 18.6. This quadratic phase-factor is a linear function of
the separation $d$ of the two gratings, and also a strong function of the grating period $p$.
Note that the quadratic phase coefficient $\Phi_2$ of Eq. (18.13d) is always negative.
Losses due to diffraction orders other than the $m$th order used, as well as polarization
dependence of the diffraction efficiency from gratings, can be a disadvantage. Since
the various frequencies are shifted laterally upon emerging from the second grating,
to the extent that this lateral shift cannot be ignored, one must either employ a
second, identical pair of gratings, or return the beam through the same pair, in order
to compensate for this lateral spectral shift. In the end, chirp-compensation with a
grating pair works well for femtosecond and even few-picosecond-long pulses,
but increasing the pulse duration to the sub-nanosecond regime and beyond imposes
unrealistic demands on the grating period $p$ and grating separation $d$, which renders
impractical this method of chirp-compensation for long pulses.

Figure 18.7 Diagram of a Gires–Tournois resonator. The (collimated) incident
beam is fully reflected because both mirrors are lossless and, moreover, the rear
mirror is 100% reflective. The spectral shape of the pulse is preserved by virtue
of the fact that the device’s reflection coefficient $r$ has a magnitude of unity at all
frequencies. The phase $\phi(f)$ of $r$ depends on the reflectivity $\rho$ of the front mirror,
on the separation $d$ between the mirrors, and on the phase angles $\psi_1$ and $\psi_2$ of
the individual mirror reflectivities.

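The grating-pair coefficients of Eqs. (18.13b)–(18.13d) are easy to evaluate numerically. The following sketch compares the quadratic term $\Phi_2(f - f_0)^2$ with the exact phase of Eq. (18.12) after the constant and linear terms are removed. The diffraction order is taken as $m = -1$, which is an assumption on our part (for $\theta = 60^\circ$ and $\lambda_0/p = 0.8$ the order $m = +1$ would be evanescent); $c$ is rounded to $3.0 \times 10^8$ m/s, and the function names and the frequency offset are arbitrary choices.

```python
import math

C = 3.0e8  # speed of light (m/s), rounded

def grating_phase(f, d, p, theta, m, c=C):
    """Eq. (18.12) written as a function of frequency f (lambda = c/f)."""
    lam = c / f
    s = math.sin(theta) + m * lam / p
    return (2 * math.pi * d / lam) * math.sqrt(1.0 - s * s)

def taylor_coefficients(f0, d, p, theta, m, c=C):
    """Phi0, Phi1, Phi2 of Eqs. (18.13b)-(18.13d)."""
    lam0 = c / f0
    sin_m0 = math.sin(theta) + m * lam0 / p
    cos_m0 = math.sqrt(1.0 - sin_m0 ** 2)
    phi0 = 2 * math.pi * (d / lam0) * cos_m0
    phi1 = 2 * math.pi * (d / c) * (1 - math.sin(theta) * sin_m0) / cos_m0
    phi2 = -math.pi * d * m ** 2 * lam0 ** 3 / (c ** 2 * p ** 2 * cos_m0 ** 3)
    return phi0, phi1, phi2

f0, d, p = 3.75e14, 10e-3, 1.0e-6      # Hz, m, m
theta, m = math.radians(60.0), -1      # m = -1 assumed (m = +1 would be evanescent)

phi0, phi1, phi2 = taylor_coefficients(f0, d, p, theta, m)
df = 1.0e12                            # offset from f0 (Hz), arbitrary small value
residual = grating_phase(f0 + df, d, p, theta, m) - phi0 - phi1 * df
print(f"Phi2 = {phi2:.3e} s^2")
print(f"exact residual = {residual:.4f} rad, Phi2*df^2 = {phi2 * df ** 2:.4f} rad")
```

Doubling `d` doubles `phi2`, whereas halving `p` changes it by much more than a factor of two (through both $p^2$ and $\cos\theta_{m0}$), in line with the remarks above.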
A third method of chirp-compensation is based on resonant structures, such as
Fabry–Perot etalons. Figure 18.7 is a diagram of a special resonator (the Gires–Tournois
interferometer), which is particularly useful for low-level chirp-cancellation.
For simplicity, let us assume that the front mirror has amplitude
reflectivity $\rho_1 = \rho\exp(i\psi_1)$ and transmissivity $\tau_1$ from both sides (i.e., symmetric
mirror), and that the second mirror is 100% reflective, that is, $\rho_2 = \exp(i\psi_2)$. The
mirrors being lossless, we have $|\rho_1|^2 + |\tau_1|^2 = 1$; also, generally, the phase difference
between $\rho_1$ and $\tau_1$ is $90^\circ$; therefore, $\tau_1 = i\sqrt{1 - \rho^2}\,\exp(i\psi_1)$. Assuming the
separation between the two mirrors is $d$, the amplitude reflectivity of the GT
etalon will be

$$r = |r|\exp(i\phi) = \frac{\rho - \exp\{i[\psi_1 + \psi_2 + (4\pi d/\lambda)]\}}{1 - \rho\exp\{i[\psi_1 + \psi_2 + (4\pi d/\lambda)]\}}\,\exp(i\psi_1). \tag{18.14}$$

Clearly, $|r| = 1$ at all wavelengths, and

$$\tan(\phi - \psi_1) = \frac{(\rho^2 - 1)\sin[\psi_1 + \psi_2 + (4\pi d/\lambda)]}{2\rho - (\rho^2 + 1)\cos[\psi_1 + \psi_2 + (4\pi d/\lambda)]}. \tag{18.15}$$

The above phase can be expanded in a Taylor series around the center frequency
$f_0$, as follows:

$$\phi(f) = \Phi_0 + \Phi_1 (f - f_0) + \Phi_2 (f - f_0)^2 + \cdots. \tag{18.16a}$$

Ignoring the frequency dependence of $\rho$, $\psi_1$, and $\psi_2$, we find

$$\Phi_1 = (4\pi d/c)(1 - \rho^2)/\{1 + \rho^2 - 2\rho\cos[\psi_1 + \psi_2 + (4\pi d/\lambda_0)]\}, \tag{18.16b}$$

$$\Phi_2 = (4\pi d/c)^2\,\frac{\rho(\rho^2 - 1)\sin[\psi_1 + \psi_2 + (4\pi d/\lambda_0)]}{\{1 + \rho^2 - 2\rho\cos[\psi_1 + \psi_2 + (4\pi d/\lambda_0)]\}^2}. \tag{18.16c}$$

A typical behavior of the GT phase-function $\phi(f)$ for the special case of $\rho = 0.9$ is
shown in Figure 18.8. The dependence of $\phi$ on the total retardation
$\psi = \psi_1 + \psi_2 + (4\pi d/\lambda)$, shown in Figure 18.8(a), reveals that $\phi$ rises rapidly
from 0 to $2\pi$ in the vicinity of resonance, which occurs at $\psi = 0$. Figure 18.8(b) shows the
dependence of $\phi'(f) = d\phi/df$ on $\psi$. As can be seen from Eq. (18.16b), the maximum
value of $\phi'$, namely, $(4\pi d/c)(1 + \rho)/(1 - \rho)$, occurs on resonance, at $\psi = 0$.
Therefore, for the (chirped) incident pulse to experience, upon reflection, the full
range of the available phase of the GT resonator, the pulse’s spectral
width should be $\Delta f \approx (c/4d)(1 - \rho)/(1 + \rho)$. Assuming $\Delta f \approx 3.0 \times 10^{11}$ Hz
(corresponding to pulses in the few-picosecond range), a good choice for the
separation distance of the GT mirrors would be $d \approx 14\,\mu$m.

With reference to Figure 18.8(c), which is a plot of $\phi''(f) = d^2\phi/df^2$ versus $\psi$, it
is clear that a slight increase of the mirror separation $d$ (by only 3.8 nm in the
present example) will shift the center of the incident spectrum ($f_0 = 3.75 \times 10^{14}$ Hz)
to the vicinity of the negative peak of $\phi''(f)$, where a large negative quadratic phase
factor is available for chirp cancellation.

Figure 18.9 is a plot of $\phi(f)$ in the vicinity of the center frequency $f_0$, with the
first two terms of the Taylor series subtracted. The assumed parameters are $\rho = 0.9$,
$\psi_1 = \psi_2 = 0$, $d = 14.005\,\mu$m, and the computed Taylor series coefficients are
$\Phi_0 = 1.28$, $\Phi_1 = 7.2 \times 10^{-12}$, $\Phi_2 = -1.9 \times 10^{-23}$. It is easy to verify that the
quadratic function $\Phi_2(f - f_0)^2$ provides a fairly good match to the actual phase
depicted in Figure 18.9.
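The coefficients quoted for the GT resonator can be reproduced directly from Eqs. (18.16b) and (18.16c). The sketch below assumes $c = 3.0 \times 10^8$ m/s (so that $\lambda_0 = 0.8\,\mu$m exactly) and frequency-independent $\rho$, $\psi_1$, $\psi_2$; the function names are ours. It also evaluates the bandwidth estimate $(c/4d)(1 - \rho)/(1 + \rho)$ for $d = 14\,\mu$m.

```python
import math

C = 3.0e8  # speed of light (m/s), rounded so that lambda_0 = 0.8 um at f0

def gt_taylor(rho, d, f0, psi1=0.0, psi2=0.0, c=C):
    """Phi1 and Phi2 of Eqs. (18.16b) and (18.16c)."""
    psi = psi1 + psi2 + 4 * math.pi * d * f0 / c     # total retardation at f0
    scale = 4 * math.pi * d / c
    denom = 1 + rho ** 2 - 2 * rho * math.cos(psi)
    phi1 = scale * (1 - rho ** 2) / denom
    phi2 = scale ** 2 * rho * (rho ** 2 - 1) * math.sin(psi) / denom ** 2
    return phi1, phi2

def gt_bandwidth(rho, d, c=C):
    """Spectral width estimate (c/4d)(1 - rho)/(1 + rho)."""
    return (c / (4 * d)) * (1 - rho) / (1 + rho)

rho, f0 = 0.9, 3.75e14
phi1, phi2 = gt_taylor(rho, 14.005e-6, f0)          # detuned, as in Figure 18.9
phi1_res, phi2_res = gt_taylor(rho, 14.0e-6, f0)    # on resonance
bandwidth = gt_bandwidth(rho, 14.0e-6)

print(f"Phi1 = {phi1:.2e} s, Phi2 = {phi2:.2e} s^2")
print(f"on resonance: Phi1 = {phi1_res:.2e} s, Phi2 = {phi2_res:.1e} s^2")
print(f"bandwidth estimate = {bandwidth:.2e} Hz")
```

On resonance the slope `phi1_res` reaches its maximum, $(4\pi d/c)(1 + \rho)/(1 - \rho)$, while the quadratic coefficient essentially vanishes; a small detuning of `d` trades some slope for a sizeable negative `phi2`.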
In general, the magnitude of the quadratic phase available from a GT resonator is
rather small, thus limiting the applicability of this type of device to situations that
involve small compression ratios only. To see this, note in Eq. (18.11) that the
compression ratio $R_c = \sqrt{\alpha_1/\beta_1}$ is equal to $\sqrt{1 + (\alpha_1/\alpha_3)^2}$; also,
$\beta_2/\beta_1 = \alpha_1/\alpha_3$; therefore, $\beta_2/\beta_1 = \sqrt{R_c^2 - 1}$. Defining the bandwidth
$\Delta f$ as the FWHM of $|A(f)|^2$, the phase of $A(f)$ in Eq. (18.11) varies by about
$0.35\beta_2/\beta_1$ radians between $f_0$ and $f_0 \pm \tfrac{1}{2}\Delta f$. In practice, the limited
amount of quadratic phase available from a GT resonator means that this device can
handle only small compression ratios ($R_c \approx 2$–$3$).

Operating away from the peak of $\phi''(f)$ could help provide a slight increase
in the range of the quadratic phase at the expense of introducing third- and
higher-order phase factors into the optical spectrum. It is also possible to
design the GT mirrors such that $\rho$, $\psi_1$, and/or $\psi_2$ exhibit strong dependences
on $f$ within the relevant spectral range. Inserting a transparent dielectric slab
(or thin film layer) between the mirrors is another degree of freedom that can
be (and has been) exploited for the purpose of improving the performance
of GT compressors.⁹

Figure 18.8 Characteristics of a GT resonator having $\rho = 0.9$. (a) Phase $\phi$
plotted versus the total retardation angle $\psi = \psi_1 + \psi_2 + (4\pi d/\lambda)$. (b) Plot of
$d\phi/df$ versus $\psi$; the vertical axis is in units of $4\pi d/c$. (c) Plot of $d^2\phi/df^2$ versus $\psi$;
the vertical axis is in units of $(4\pi d/c)^2$.

Figure 18.9 Plot of the function $\phi(f) - \Phi_0 - \Phi_1(f - f_0)$ in the vicinity
of $f_0 = 3.75 \times 10^{14}$ Hz. The GT parameters are $\rho = 0.9$, $\psi_1 = \psi_2 = 0$,
$d = 14.005\,\mu$m, while the computed Taylor series coefficients are $\Phi_0 = 1.28$,
$\Phi_1 = 7.2 \times 10^{-12}$, $\Phi_2 = -1.9 \times 10^{-23}$. The quadratic function $\Phi_2(f - f_0)^2$
provides a fairly good match to the actual function depicted here.

Concluding remarks
In this chapter we have attempted to provide an explanation of the fundamental
principles of optical pulse compression. We stayed away from the advanced
topics, and steered clear of some of the technical difficulties as well as the
ingenious methods that have been developed to overcome them. In practice, one
must contend with a host of technical problems in order to reliably and efficiently
produce high-quality compressed pulses. The nonlinear medium, which imparts
the all-important phase modulation to the initial pulse, may introduce significant
dispersion of its own. This results in a distorted pulse and, often, it is the
mechanism that limits the amount of useful chirp that can be placed on the pulse.
In addition, the third- and higher-order terms introduced into the spectral phase
profile, either within the nonlinear medium or as a consequence of passage
through the chirp compensator, must be identified and corrected, perhaps by
sending the pulse through additional (high-order) compensators. Finally, the
profile of the compressed pulse must be measured to determine the degree of
compression, and to find out whether the pulse is free from distortions and other
imperfections. The interested reader may consult the vast literature of the subject
for further details.
Appendix
Slab waveguide and the effective refractive index of guided modes
Consider the slab waveguide depicted in Figure 18.10. The guiding layer has
thickness $d$ and refractive index $n_g$. The substrate and the cladding layer, having
refractive indices $n_s$ and $n_c$, respectively, may be assumed to be infinitely thick.
Within the guiding layer a pair of plane-waves propagate at an angle $\theta$ relative to
the surface normal; $\theta$ is greater than the critical angle of total internal reflection at
both interfaces, that is, $n_g\sin\theta > \max(n_s, n_c)$. The two plane-waves thus have the
following complex amplitudes:

$$E^{\pm}(x, z) = |E_0|\exp(\pm i\phi_0)\exp[i(2\pi n_g/\lambda_0)(\pm x\cos\theta + z\sin\theta)]. \tag{A18.1}$$

Figure 18.10 Slab waveguide consisting of a guiding layer of thickness $d$ and
refractive index $n_g$, sandwiched between a substrate of index $n_s$ and a cladding
layer of index $n_c$. Within the guiding layer, a pair of plane-waves propagate at an
angle $\theta$ relative to the surface normal.

Here the plus sign refers to the up-going beam, the minus sign to the down-going
beam, $\phi_0$ defines the relative phase between the two plane-waves, and $\lambda_0 = c/f_0$ is
the vacuum wavelength. At the interface with the cladding, where $x = d/2$, the
down-going beam must have the same amplitude as the up-going beam, but its
phase must be incremented by the phase of the Fresnel reflection coefficient
at this interface. The Fresnel coefficient, depending on whether the beam is s- or
p-polarized, is $r_p = \exp(i\phi_p)$ or $r_s = \exp(i\phi_s)$, where

$$\phi_p^{(\mathrm{clad})} = +2\tan^{-1}\left[(n_c^2\cos\theta)\Big/\left(n_g\sqrt{n_g^2\sin^2\theta - n_c^2}\right)\right], \tag{A18.2}$$

$$\phi_s^{(\mathrm{clad})} = -2\tan^{-1}\left[\sqrt{n_g^2\sin^2\theta - n_c^2}\Big/(n_g\cos\theta)\right]. \tag{A18.3}$$

Therefore, at the cladding interface, one must have

$$\phi_0 + (2\pi n_g/\lambda_0)\left(\tfrac{1}{2}d\cos\theta + z\sin\theta\right) + \phi_{p,s}^{(\mathrm{clad})}
= -\phi_0 + (2\pi n_g/\lambda_0)\left(-\tfrac{1}{2}d\cos\theta + z\sin\theta\right), \tag{A18.4}$$

which leads to

$$2\phi_0 + 2\pi n_g(d/\lambda_0)\cos\theta + \phi_{p,s}^{(\mathrm{clad})} = 0. \tag{A18.5}$$

A similar relation must hold at the substrate interface ($x = -d/2$), where the
down-going beam is incident and the up-going beam is reflected. Therefore,

$$-2\phi_0 + 2\pi n_g(d/\lambda_0)\cos\theta + \phi_{p,s}^{(\mathrm{sub})} = 0. \tag{A18.6}$$

Equations (A18.5) and (A18.6) can be satisfied simultaneously if and only if an
integer $m$ exists such that

$$4\pi n_g(d/\lambda_0)\cos\theta + \phi_{p,s}^{(\mathrm{clad})} + \phi_{p,s}^{(\mathrm{sub})} = 2m\pi. \tag{A18.7}$$

If the guiding layer’s thickness $d$ is sufficiently small, Eq. (A18.7) will have only
one solution (i.e., one acceptable value of $\theta$) for s-light, and perhaps another
solution for p-light. The guide is then said to be single-mode. Larger values of $d$
lead to more solutions, which correspond to higher-order modes. (Note: $\theta = 90^\circ$ is
always an acceptable solution; however, $\phi_0$ in this case turns out to be $0^\circ$ for
p-light and $90^\circ$ for s-light. Both of these solutions result in the up-going and
down-going plane-waves coming into alignment with equal but opposite amplitudes,
thereby canceling each other out. The solution corresponding to $\theta = 90^\circ$,
therefore, does not lead to a viable mode.) For a viable mode, denoting the
solution of Eq. (A18.7) by $\theta_m$, and with reference to Eq. (A18.1), the E-field
amplitude within the guiding layer will be

$$E(x, z) = E^{+}(x, z) + E^{-}(x, z)
= 2|E_0|\cos[(2\pi n_g\cos\theta_m/\lambda_0)x + \phi_0]\exp[i(2\pi n_g\sin\theta_m/\lambda_0)z]. \tag{A18.8}$$

The cross-sectional profile of the mode along the x-axis is thus determined by the
cosine function on the right-hand side of Eq. (A18.8) (and also by the evanescent
fields within the cladding and the substrate). The exponential term in Eq. (A18.8)
is the propagation phase-factor, from which one can identify an effective
refractive index $n_{\mathrm{eff}} = n_g\sin\theta_m$ for the given mode. Considering that, in general,
both $n_g$ and the solution $\theta_m$ of Eq. (A18.7) are functions of the frequency $f$, the
dispersive properties of the waveguide are seen to arise from the frequency
dependence of $n_{\mathrm{eff}}$.
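As an illustration of how Eq. (A18.7) may be solved in practice, the following sketch finds the guided s-polarized modes of a slab waveguide by bisection and reports $n_{\mathrm{eff}} = n_g\sin\theta_m$. The waveguide parameters ($n_g = 2.0$, $n_s = 1.5$, $n_c = 1.0$, $\lambda_0 = 0.633\,\mu$m, and the two thicknesses) are hypothetical values chosen only for this example, the Fresnel phases are taken in the form of Eq. (A18.3) with the negative sign, and the modes are labeled $m = 0, 1, 2, \ldots$ under that sign convention. The function names are ours.

```python
import math

def tir_phase_s(n_g, n_out, theta):
    """Phase of r_s under total internal reflection, in the form of Eq. (A18.3)."""
    root = math.sqrt(max(0.0, (n_g * math.sin(theta)) ** 2 - n_out ** 2))
    return -2.0 * math.atan2(root, n_g * math.cos(theta))

def residual(theta, m, n_g, n_s, n_c, d, lam0):
    """Left-hand side of Eq. (A18.7) minus 2*m*pi, for s-light."""
    return (4 * math.pi * n_g * (d / lam0) * math.cos(theta)
            + tir_phase_s(n_g, n_c, theta)
            + tir_phase_s(n_g, n_s, theta)
            - 2 * math.pi * m)

def guided_modes_s(n_g, n_s, n_c, d, lam0, tol=1e-13):
    """List (m, theta_m, n_eff) for guided s-modes with m = 0, 1, 2, ..."""
    theta_c = math.asin(max(n_s, n_c) / n_g)   # larger of the two critical angles
    modes, m = [], 0
    # The residual decreases monotonically with theta, so a root exists in
    # (theta_c, 90 deg) only if the residual is positive at theta_c.
    while residual(theta_c, m, n_g, n_s, n_c, d, lam0) > 0:
        lo, hi = theta_c, math.pi / 2
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if residual(mid, m, n_g, n_s, n_c, d, lam0) > 0:
                lo = mid
            else:
                hi = mid
        theta_m = 0.5 * (lo + hi)
        modes.append((m, theta_m, n_g * math.sin(theta_m)))
        m += 1
    return modes

# Hypothetical waveguide: thin and thick guiding layers at lambda_0 = 0.633 um
thin = guided_modes_s(2.0, 1.5, 1.0, 0.2e-6, 0.633e-6)
thick = guided_modes_s(2.0, 1.5, 1.0, 1.0e-6, 0.633e-6)
for label, modes in (("thin", thin), ("thick", thick)):
    for m, theta_m, n_eff in modes:
        print(f"{label}: m = {m}, theta = {math.degrees(theta_m):.2f} deg, n_eff = {n_eff:.4f}")
```

The thin layer supports a single s-mode, while the thicker layer supports several, each with $n_s < n_{\mathrm{eff}} < n_g$. Repeating the calculation over a range of $\lambda_0$ (with $n_g$ updated accordingly) would trace out the frequency dependence of $n_{\mathrm{eff}}$.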

1 F. Gires and P. Tournois, Compt. Rend. 258, 6112 (1964).
2 J. A. Giordmaine, M. A. Duguay, and J. W. Hansen, Quantum Electron. 4, 252
(1968).
3 R. A. Fisher, P. L. Kelley, and T. K. Gustafson, Sub-picosecond pulse generation
using the optical Kerr effect, Appl. Phys. Lett. 14, 140 (1969).
4 C. V. Shank, R. L. Fork, R. Yen, R. H. Stolen, and W. J. Tomlinson, Compression of
femtosecond optical pulses, Appl. Phys. Lett. 40, 761 (1982).
5 W. J. Tomlinson, R. H. Stolen, and C. V. Shank, Compression of optical pulses
chirped by self-phase modulation in fibers, J. Opt. Soc. Am. B 1, 139 (1984).
6 J. Biegert and J.-C. Diels, Compression of pulses of a few optical cycles through
harmonic generation, J. Opt. Soc. Am. B 18, 1218 (2001).
7 J. Moses and F. K. Wise, Soliton compression in quadratic media: high-energy few-
cycle pulses with a frequency-doubling crystal, Opt. Lett. 31, 1881 (2006).
8 W. Rudolph and B. Wilhelmi, Light Pulse Compression, Harwood Academic,
London, 1989.
9 J.-C. Diels and W. Rudolph, Ultrashort Laser Pulse Phenomena, Academic Press,
New York, 1996.
19
The uncertainty principle in classical optics
In the classical electromagnetic theory the wave-vector $\mathbf{k} = (2\pi/\lambda)\boldsymbol{\sigma}$ underlies
the Fourier space of propagating (or radiative) fields. The k-vector combines
into a single entity the wavelength $\lambda$ and the unit vector $\boldsymbol{\sigma}$ that signifies the
beam’s propagation direction. The Fourier transform relation between the
three-dimensional space of everyday experience and the space of the wave-vectors (the
so-called k-space) gives rise to relationships between the two domains analogous
to Heisenberg’s uncertainty relations.

Considering that in quantum theory the electromagnetic k-vector is proportional
to the photon’s momentum¹ ($\mathbf{p} = \hbar\mathbf{k}$, where $\hbar = h/2\pi$, $h$ being the Planck
constant), one should not be surprised to find relationships between dimensions of
a beam in the XYZ-space and its momentum spread in the k-space. Such relationships
impose fundamental limits on the ability of measurement systems to
determine the various properties of electromagnetic fields.

In this chapter we address two problems that have widespread applications in
optical metrology, spectroscopy, telecommunications, etc., and discuss the constraints
imposed by the uncertainty principle on these problems. The first topic of discussion
is the separation of two overlapping beams of identical wavelength having
slightly different propagation directions. This will be followed by an analysis of the
limits of separating co-propagating beams having slightly different wavelengths.
Angular separation and the limit of resolvability

Figure 19.1 shows an aperture of diameter $D$, which transmits two plane waves of
the same wavelength $\lambda$ propagating in slightly different directions. Denoting the
angular separation between the beams by $\Delta\theta$, we find that the projections of the
two k-vectors along the X-axis differ by $\Delta k_x \approx (2\pi/\lambda)\Delta\theta$. In geometrical optics,
rays propagate along straight lines and, therefore, the two beams must separate
from each other after a certain propagation distance. In wave optics, however, the
beams expand as they propagate along Z and, although their centers drift apart,
there is the distinct possibility that they will never be completely separated.
Roughly speaking, we expect the beams to remain more or less collimated
between $z = 0$ and $z = D^2/\lambda$, the Rayleigh range² for a beam of diameter $D$ and
wavelength $\lambda$. If at the Rayleigh range the distance between the beam centers is
greater than $D$, the beams should be separable; otherwise their drifting apart will
go hand in hand with their expansion, and the beams remain entangled as they
propagate beyond the Rayleigh range. The necessary condition for separability is
thus $(D^2/\lambda)\Delta\theta > D$, or equivalently,

$$D\,\Delta k_x > 2\pi. \tag{19.1}$$

Figure 19.1 Two beams of the same wavelength $\lambda$, propagating in slightly
different directions, pass through an aperture of diameter $D$. The angle between
the two k-vectors is $\Delta\theta$, giving rise to $\Delta k_x \approx (2\pi/\lambda)\Delta\theta$. The beams separate from
each other at the observation plane located a distance $z$ from the aperture,
provided the uncertainty relation $D\,\Delta k_x \geq 2\pi$ is satisfied.

The lower bound $2\pi$ on the product of $D$ and $\Delta k_x$ appearing in Ineq. (19.1) is not
exact, but depends on the definition of beam diameter $D$ and the adopted criterion
for separability, which are typically imprecise. For all practical purposes, the
number appearing on the right-hand side of Ineq. (19.1) should be on the order of
unity, say, greater than 1 but less than 10.

Invoking the quantum nature of light, if the aperture diameter $D$ is interpreted
as a measure of the uncertainty $\Delta x$ about the photon position along X, while $\Delta k_x$
is related (through the relation $\mathbf{p} = \hbar\mathbf{k}$) to the linear momentum uncertainty $\Delta p_x$
along the same axis, then Ineq. (19.1) is equivalent to Heisenberg’s uncertainty
relation $\Delta x\,\Delta p_x > h$.
Figure 19.2 Plots of intensity (left) and phase (right) at the entrance aperture
of the system of Figure 19.1. Two uniform beams, one propagating with a slight
tilt toward the upper right, another with a slight tilt toward the lower left, enter a
$D = 500\lambda$ aperture. The angular separation of the beams is $\Delta\theta = 0.23^\circ$. The
individual beams are shown in the top (a, b) and the middle (c, d) rows; their
superposition appears at the bottom (e, f).

Figure 19.2 shows the intensity and phase profiles of two plane waves as well
as those of their superposition at the aperture depicted in Figure 19.1 (diameter
$D = 500\lambda$). The phase distributions in Figures 19.2(b) and 19.2(d) indicate that
one of the beams is slightly tilted towards the upper-right corner of the XY-plane,
while the other is tilted by an equal amount towards the lower-left corner.
The angular separation between these beams is $\Delta\theta = 0.23^\circ = 0.004$ radians.
The combined beam’s intensity distribution in Figure 19.2(e) reveals the angular
separation of the two superimposed beams through a tell-tale fringe pattern.

When the composite beam (whose intensity and phase distributions are shown
in Figures 19.2(e, f)) is propagated along the Z-axis, one obtains at various
distances from the aperture the intensity patterns displayed in Figure 19.3. It is
seen in these pictures that the two constituent beams continue to overlap at first,
giving rise to interesting interference patterns. After a sufficient propagation
distance, however, the beams separate and go their own ways. The assumed value
of $D\,\Delta k_x$ in this example is $4\pi$, which satisfies Ineq. (19.1).
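The numbers of this example follow from Ineq. (19.1) in a couple of lines. The sketch below works in units of the wavelength and uses the rough Rayleigh-range argument given earlier (beam centers separated by more than $D$ at $z = D^2/\lambda$); the function name and the returned tuple are our own choices.

```python
import math

def free_space_separability(D_over_lam, dtheta):
    """Return (D*dk_x, z_R/lambda, centre spacing at z_R divided by D)."""
    D_dkx = 2 * math.pi * D_over_lam * dtheta    # D * (2*pi/lambda) * dtheta
    z_rayleigh = D_over_lam ** 2                 # z_R / lambda = (D/lambda)^2
    spacing_over_D = z_rayleigh * dtheta / D_over_lam
    return D_dkx, z_rayleigh, spacing_over_D

D_dkx, z_R, spacing = free_space_separability(500.0, 0.004)
print(f"D*dk_x = {D_dkx / math.pi:.2f} pi, z_R = {z_R:.0f} lambda, spacing/D = {spacing:.2f}")
print("Ineq. (19.1) satisfied:", D_dkx > 2 * math.pi)
```

For $D = 500\lambda$ and $\Delta\theta = 0.004$ rad the product $D\,\Delta k_x$ is $4\pi$, and at the Rayleigh range the beam centers are about two beam diameters apart.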
Separating two beams by means of a lens

In the preceding section it was demonstrated that separating two beams of a
certain angular distance $\Delta\theta$ requires a minimum beam diameter $D$ in accordance
with Ineq. (19.1). It may be asked whether a similar limitation exists on the
propagation distance $z$ before the individual beams can be resolved. Apparently
no physical law limits the required distance $z$, although practical considerations
seem to impose certain constraints. In free space, the required propagation distance
is typically less than or equal to the Rayleigh range, $D^2/\lambda$, but one can
substantially reduce this distance by employing a lens, as shown in Figure 19.4.
Here two overlapping beams of diameter $D$ and angular separation $\Delta\theta$ are
resolved after going through an aberration-free lens. In the focal plane of the lens
the center-to-center spacing of the focused spots is $f\Delta\theta$, which must be greater
than the Airy disk³ radius of $0.6\lambda/\mathrm{NA} = 1.2f\lambda/D$. Note that the resolvability
criterion is independent of $f$ and NA, requiring only that $D(\Delta\theta/\lambda) > 1.2$, which is a
statement of the uncertainty principle in the present context. The required
propagation distance $f$ in this example can be much less than that needed in the
case of free-space propagation of Figure 19.1. It must be emphasized that the
uncertainty principle does not impose any constraints on $z$, the requirement for
resolvability being only a restriction on the product of $D$ and $\Delta\theta$.
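The independence of the resolvability criterion from the focal length can be made explicit with a short calculation. The sketch below compares the spot spacing $f\Delta\theta$ with the Airy radius $1.2f\lambda/D$ for two focal lengths, in units of the wavelength; the function name and the second focal length are arbitrary choices for illustration.

```python
def focal_plane_resolvable(D_over_lam, dtheta, f_over_lam):
    """Return (spot spacing, Airy radius, resolved?) in units of the wavelength."""
    spot_spacing = f_over_lam * dtheta
    airy_radius = 1.2 * f_over_lam / D_over_lam
    return spot_spacing, airy_radius, spot_spacing > airy_radius

for f_over_lam in (250.0, 2500.0):
    spacing, airy, ok = focal_plane_resolvable(500.0, 0.004, f_over_lam)
    print(f"f = {f_over_lam:.0f} lambda: spacing = {spacing:.2f}, "
          f"Airy radius = {airy:.2f}, resolved = {ok}")
```

Changing $f$ scales the spacing and the Airy radius by the same factor, so the outcome depends only on $D(\Delta\theta/\lambda)$, which equals 2 for the beams of Figure 19.2.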
An interesting feature of separating two beams by means of a lens is the
resulting dependence of the focused spots on the state of polarization. To reduce
the required propagation distance $z$, one may use a high-NA lens, thus enhancing
the polarization effects. The shortest focal length $f$ is obtained when the NA of the
lens is close to unity, that is, $f \approx D/2$. Figure 19.5 shows computed plots of
intensity distribution at the focal plane of an NA = 0.99, $f = 250\lambda$ lens, when the
incident beam is the two-beam superposition depicted in Figure 19.2. The three
columns of Figure 19.5 represent three different polarization states. In (a) both
incident beams are linearly polarized along X, which explains the elongation of
the spots in this particular direction. In (b) the two beams are linearly polarized
at $45^\circ$ to the X- and Y-axes, i.e., the direction along which the spots are separated
from each other. The plot in (c) corresponds to the case when both beams are
circularly polarized. Frames (d)–(f) are the logarithmic versions of those in (a)–(c),
showing their detailed structure by emphasizing the weaker regions. Since the
assumed values of $D = 500\lambda$ and $\Delta\theta = 0.004$ rad satisfy the uncertainty relation
in Ineq. (19.1), the focused spots are seen to be resolved irrespective of their
polarization state.

Figure 19.3 Two overlapping plane waves depicted in Figure 19.2 propagate
along the Z-axis. The various intensity patterns in frames (a) to (o) are obtained
at $z/(10^3\lambda) = 1, 2, 3, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125$, and $150$,
respectively. Initially the beams strongly interfere with each other, but as
propagation proceeds, they separate and exhibit their individual identities.

Figure 19.4 Two identical beams of diameter $D$ and angular separation $\Delta\theta$
may be isolated after going through an aberration-free lens. In the focal plane,
the distance between the focused spots is $f\Delta\theta$, which must be greater than the
Airy disk radius of $1.2f\lambda/D$ if the individual spots are to be resolved.

Angular discrimination by means of a Fabry–Pérot etalon

Another device that can, in principle, accomplish the separation of two beams
via angular discrimination is a Fabry–Pérot etalon,³,⁴ such as that shown in
Figure 19.6. This particular etalon is tuned to transmit a plane wave of $\lambda = 633$ nm
at the incidence angle of $\theta = 45^\circ$. Figure 19.7 shows the etalon’s computed
reflection and transmission coefficients, $r_s = |r_s|\exp(i\phi_{rs})$ and $t_s = |t_s|\exp(i\phi_{ts})$,
versus $\theta$ for an s-polarized plane-wave of $\lambda = 633$ nm. It turns out that the shapes
of the transfer functions $|r_s(\theta)|$ and $|t_s(\theta)|$ are not quite suitable for complete
separation of two finite-diameter beams of differing propagation directions.

Computed plots of intensity distribution in Figure 19.8 confirm that the etalon
of Figure 19.6 can only partially separate two beams of diameter $D = 2 \times 10^4\lambda$
and angular separation $\Delta\theta = 0.115^\circ = 0.002$ rad, even though the value
of $D(\Delta\theta/\lambda) = 40$ in this case amply satisfies Ineq. (19.1). Figure 19.8(a) shows
the incident pattern of intensity distribution of the superposed beams upon
arriving at the etalon. One of these beams propagates along the direction that
makes a $45^\circ$ angle with the etalon’s surface normal, while the other deviates from
this direction by $\Delta\theta = 0.115^\circ$. The reflected intensity profile depicted in Figure
19.8(b) contains mostly the latter beam, plus a small fraction of the former. This
is due to the imperfect transfer function of the etalon, which cannot fully transmit
the angular spectrum of the $45^\circ$ beam, nor can it fully reflect the spectrum of the
$45.115^\circ$ beam. Either beam’s angular spectrum has a width of $\lambda/D \approx 0.003^\circ$,
which would readily pass through a narrow rectangular transfer function, but is
partially blocked by the sharply peaked transfer functions of the etalon (see
Figure 19.7(a)). The same arguments apply to the transmitted intensity distribution
shown in Figure 19.8(c) which, although primarily composed of the $45^\circ$
incident beam, still contains a fraction of the $45.115^\circ$ beam.

Figure 19.5 Total electric field intensity distribution ($|E|^2 = |E_x|^2 + |E_y|^2 + |E_z|^2$)
at the focal plane of a 0.99NA lens. (Rainbow colors: red = maximum,
blue = minimum). The beam at the entrance pupil is the superposition of two
$D = 500\lambda$ beams of angular separation $\Delta\theta = 0.23^\circ$, as shown in Figure 19.2.
In (a) the assumed polarization state of both incident beams is linear along the
X-axis. In (b) the two beams are linearly polarized at $45^\circ$ to the X-axis, i.e., along
the direction of separation of the spots. In (c) one of the beams is right-circularly
polarized, while the other is left-circularly polarized. Frames (d)–(f) in the
bottom row are the logarithmic versions of frames (a)–(c) in the top row. Like an
over-exposed photographic plate, a logarithmic plot reveals weak regions of an
intensity distribution.

Figure 19.6 Fabry–Pérot etalon designed for operation at $\lambda = 633$ nm, $\theta = 45^\circ$.
Dielectric mirrors each contain six pairs of high/low-index layers ($n_1 = 2.0$,
$d_1 = 84.6$ nm; $n_2 = 1.5$, $d_2 = 119.6$ nm). Both mirror substrates are glass
($n_{\mathrm{sub}} = 1.5$), and the medium separating the mirrors is air ($d_{\mathrm{air}} = 55.95\,\mu$m).
The incidence angle on the etalon is in the vicinity of $\theta = 45^\circ$; within the
substrate, however, the angle of incidence on the stack is close to $\theta' = 28.1255^\circ$
($\sin\theta = n_{\mathrm{sub}}\sin\theta'$). The etalon can separate two beams of identical $\lambda$ arriving
through an aperture of diameter $D$, but differing in propagation direction,
namely, $\theta_1 = 45^\circ$, $\theta_2 = 45^\circ + \Delta\theta$. One beam is reflected by the etalon while
the other is transmitted. Only s-polarized light is considered here, although
p-polarized beams exhibit similar behavior.
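The design values listed in the caption of Figure 19.6 can be cross-checked with Snell’s law. The sketch below assumes, as our own reading rather than something stated in the caption, that the mirror layers are quarter-wave thick at oblique incidence, $d = \lambda/(4n\cos\theta_n)$, and that the air gap satisfies the resonance condition $2d_{\mathrm{air}}\cos\theta = N\lambda$ with integer $N$; mirror phase shifts are ignored.

```python
import math

def refracted_cos(n_layer, theta_air):
    """cos of the propagation angle inside a layer of index n_layer (Snell's law)."""
    s = math.sin(theta_air) / n_layer
    return math.sqrt(1.0 - s * s)

lam = 633.0                      # nm
theta = math.radians(45.0)       # incidence angle in air

# Quarter-wave thicknesses at oblique incidence (assumed design rule)
d1 = lam / (4 * 2.0 * refracted_cos(2.0, theta))
d2 = lam / (4 * 1.5 * refracted_cos(1.5, theta))

# Angle inside the glass substrate, from sin(theta) = n_sub * sin(theta')
theta_sub = math.degrees(math.asin(math.sin(theta) / 1.5))

# Interference order of the 55.95 um air gap (mirror phases neglected)
gap_order = 2 * 55.95e3 * math.cos(theta) / lam

print(f"d1 = {d1:.1f} nm, d2 = {d2:.1f} nm, theta' = {theta_sub:.4f} deg")
print(f"air-gap order = {gap_order:.3f}")
```

Under these assumptions the computed thicknesses and substrate angle agree with the caption values, and the air gap corresponds to an interference order very close to an integer.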
To summarize the results of this and the preceding sections, there are several ways
of separating two overlapping beams of the same wavelength and differing propagation
directions. Some of these methods may be more effective than others, but none
could violate the uncertainty relation given by Ineq. (19.1). Moreover, Ineq. (19.1)
remains valid even if the beams are observed within a transparent medium of
refractive index $n > 1$. For instance, in Figure 19.1 if the region to the right of the
aperture happens to be filled with such a medium, the angular separation $\Delta\theta$ of the
beams shrinks by a factor $n$ upon entering the medium, but the length of the k-vector
increases by the same factor, thus preserving the magnitude of $\Delta k_x$. Similarly, in
Figure 19.4 if the index of the medium on the right-hand side of the lens happens to be
$n$, the focused spot diameters will be $n$ times smaller, but their center-to-center
spacing will also be reduced by the same factor, resulting once again in the preservation
of Ineq. (19.1).
Figure 19.7 Computed reflection and transmission coefficients versus the
incidence angle $\theta$ for the etalon of Figure 19.6 at $\lambda = 633$ nm for an s-polarized
plane wave. The magnitude and phase of the reflection and transmission coefficients
are defined through the relations $r_s = |r_s|\exp(i\phi_{rs})$ and $t_s = |t_s|\exp(i\phi_{ts})$.
(a) Amplitudes $|r_s|$ and $|t_s|$. (b) Phases $\phi_{rs}$ and $\phi_{ts}$, in degrees.
At $\lambda = 633$ nm the stack is tuned to fully transmit at $\theta = 45^\circ$. A small
deviation from $45^\circ$ incidence causes a sharp drop in $|t_s|$ and a corresponding
rise in $|r_s|$.
Co-propagating beams of differing wavelengths

A problem of general interest in spectroscopy is that of separating two beams of
slightly different wavelengths, $\lambda_1$ and $\lambda_2$, propagating in the same direction. In
this case the beam diameter $D$ turns out to be irrelevant, but the available
propagation distance $z$ is critical for isolating the individual beams.
Figure 19.8 Two overlapping beams of uniform amplitude and circular cross-section
($\lambda = 633$ nm, $D = 2 \times 10^4\lambda$) arrive at the etalon of Figure 19.6. One beam
travels at $\theta = 45^\circ$ relative to the etalon’s surface normal, the other at $\theta = 45.115^\circ$.
(a) Intensity distribution of the superposed beams at the entrance aperture.
(b) Reflected intensity distribution, consisting mainly of the second beam plus a
small fraction of the first. (c) Transmitted intensity distribution, consisting
mostly of the first beam plus a small fraction of the second.

Figure 19.9 The Mach–Zehnder interferometer can be used to separate two
beams of differing wavelengths, $\lambda_1$ and $\lambda_2$. The beams have identical diameters and
arrive in the same direction. The two beams are split equally at the first 50/50
splitter, travel the two arms of the device, and are recombined at the second 50/50
splitter. The lengths of the two arms of the interferometer differ by $\Delta z$. When
$\Delta z/\lambda_1 - \Delta z/\lambda_2 = 1/2$, constructive interference at the second beam-splitter for one of
the two wavelengths coincides with destructive interference for the other. The beams
are thus separated at the second splitter, one is captured by detector 1, the other by
detector 2. The $45^\circ$ mirrors (three in each arm) have a reflectivity of 90%, resulting in
an overall system transmission of 73%. The 50/50 splitters are identical, each consisting
of a glass substrate coated with a six-layer dielectric stack as follows:
(substrate, $n_{\mathrm{sub}} = 1.5$)/($d_1 = 30$ nm, $n_1 = 2.64$)/($d_2 = 140$ nm, $n_2 = 1.76$)/
($d_3 = 50$ nm, $n_3 = 2.64$)/($d_4 = 105$ nm, $n_4 = 1.76$)/($d_5 = 60$ nm, $n_5 = 2.64$)/
($d_6 = 100$ nm, $n_6 = 1.76$)/air.
Although the above stack works for both p- and s-polarized light, its splitting
ratio is much closer to 50/50 for s-light than for p-light. In our simulations the
polarization state of the incident beam was fixed at s.

A straightforward method of separating two beams of differing wavelengths is
shown in Figure 19.9. This Mach–Zehnder interferometer3 splits each input beam
into two equal halves, provides a separate path for each half, then recombines the
halves into a single beam at the output. For one of the wavelengths, say $\lambda_1$, the
path-length difference $\Delta z$ between the two arms of the device may be an integer
multiple of $\lambda_1$, in which case the corresponding half-beams interfere constructively
and emerge from one exit channel of the interferometer. For the other
wavelength, $\lambda_2$, the path-length difference may be a half-integer multiple of $\lambda_2$, in
which case interference will be destructive and the beam will emerge from a
different exit channel of the device. Therefore, the separability condition for this
interferometer is $\Delta z/\lambda_1 - \Delta z/\lambda_2 = \tfrac{1}{2}$, or
$$\Delta z\,\Delta k_z \approx 2\pi\,\Delta z\,\Delta\lambda/\lambda^2 = \pi. \tag{19.2}$$
Figure 19.10 shows computed detector signals $S_1$, $S_2$ of the system of Figure 19.9
versus the input wavelength in the vicinity of $\lambda = 633$ nm. For the particular
path-length difference chosen in this example ($\Delta z = 1.266$ mm), it is observed that, in
compliance with Eq. (19.2), a pair of beams having $\Delta\lambda = 0.158$ nm can be readily
separated from each other.
Figure 19.10 Computed detector signals $S_1$ and $S_2$ versus the input wavelength
$\lambda$ in the Mach–Zehnder interferometer of Figure 19.9. The assumed path-length
difference between the two arms of the device is $\Delta z = 1.266$ mm. In the vicinity
of $\lambda = 633$ nm the adjacent peaks of $S_1$ and $S_2$ are separated by $\Delta\lambda = 0.158$ nm, in
agreement with Eq. (19.2).
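The numbers quoted for the interferometer can be checked with a few lines of arithmetic. The sketch below is only an idealized model: it assumes perfect 50/50 splitters and lossless mirrors, so that the two outputs vary as the squared cosine and squared sine of half the phase difference, and the assignment of the cosine term to $S_1$ is our own labelling rather than something stated in the text.

```python
import math

def detector_signals(wavelength, delta_z):
    """Idealized two-beam interference at the second splitter.

    Assumption: perfect 50/50 splitters and no mirror loss, so the two
    outputs are cos^2 and sin^2 of half the phase difference.
    """
    half_phase = math.pi * delta_z / wavelength
    return math.cos(half_phase) ** 2, math.sin(half_phase) ** 2

delta_z = 1.266e-3   # arm-length difference quoted for Figure 19.10 (m)
lam = 633e-9         # nominal wavelength (m)

order = round(delta_z / lam)            # whole number of waves fitting in delta_z
lam_s1_peak = delta_z / order           # S1 maximum: integer number of waves
lam_s2_peak = delta_z / (order - 0.5)   # adjacent S2 maximum: half a wave fewer
peak_separation = lam_s2_peak - lam_s1_peak

# Eq. (19.2) rearranged: delta_lambda = lambda^2 / (2 * delta_z)
delta_lambda_eq = lam ** 2 / (2.0 * delta_z)

print(f"peak separation  : {peak_separation * 1e9:.4f} nm")
print(f"Eq. (19.2) value : {delta_lambda_eq * 1e9:.4f} nm")
```

Both routes give the same spacing to within the small-$\Delta\lambda$ approximation behind Eq. (19.2).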
An alternative form of the uncertainty relation may be obtained in this case
by invoking the quantum-mechanical relation between the magnitude $k$ of the
wave-vector and the photon energy $E = h\nu$, namely, $k = 2\pi/\lambda = 2\pi\nu/c = E/\hbar c$. For
two beams of wavelengths $\lambda$ and $\lambda + \Delta\lambda$, co-propagating in the $Z$ direction,
$\Delta k_z = \Delta E/\hbar c$. Also $\Delta z = c\Delta t$, where $c$ is the speed of light and $\Delta t$ is the time
needed for light to travel a distance $\Delta z$ in free space. The product $\Delta z\,\Delta k_z$ is thus
proportional to $\Delta E\,\Delta t$, with $\hbar$ being the proportionality constant. One may thus
reinterpret Eq. (19.2) as a statement of the time-versus-energy uncertainty. When
the observations are made in a transparent medium of refractive index $n > 1$, the
increase of the $k$-vector by a factor of $n$ dictates a corresponding decrease in $\Delta z$.
This is consistent with the reduced speed of light in the medium of index $n$, which
yields the same travel time $\Delta t$ for the shorter propagation distance $\Delta z/n$. Needless
to say, $\Delta E = h\Delta\nu$ is independent of $n$.
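As a worked restatement of this argument (our own arrangement of the relations already given, taking magnitudes throughout so that signs can be ignored), the chain of substitutions reads:

```latex
\begin{align*}
\Delta z\,\Delta k_z
  &= (c\,\Delta t)\left(\frac{\Delta E}{\hbar c}\right)
   = \frac{\Delta E\,\Delta t}{\hbar}, \\
\Delta z\,\Delta k_z = \pi
  \quad &\Longrightarrow \quad
  \Delta E\,\Delta t = \pi\hbar = \tfrac{1}{2}h
  \quad \Longrightarrow \quad
  \Delta t = \frac{1}{2\,\Delta\nu}.
\end{align*}
```

In words, under Eq. (19.2) the delay between the two arms equals half the beat period of the two optical frequencies.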
Wavelength discrimination using a Fabry–Pérot etalon

The etalon of Figure 19.6 may also be used to separate co-propagating beams of
slightly different wavelengths, say $\lambda$ and $\lambda + \Delta\lambda$. Figure 19.11 shows computed
plots of reflection and transmission coefficients versus $\lambda$ for a resonator having
an air gap $d_{\rm air} = 55.95\ \mu$m. From Eq. (19.2) at $\lambda = 633$ nm, considering that
$\Delta z = 2d_{\rm air}\cos(45^\circ) = 79.125\ \mu$m, we find $\Delta\lambda = 2.53$ nm, in agreement with the
peak-to-valley distance in the simulated results of Figure 19.11. The figure, however,
indicates the feasibility of resolving beams with a smaller $\Delta\lambda$ as well; this is
due to the high finesse of the Fabry–Pérot etalon. In other words, multiple back
and forth reflections within the etalon's cavity build up an optical field whose
amplitude is $G$ times stronger than that of the incident beam. (In the present
example, $G$ is 3.0 for s-light and 1.94 for p-light.) The effective $\Delta z$ is thus $G$ times
the effective gap width, resulting in a corresponding increase in the resolution of
the device.
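The etalon figures quoted above follow from the same relation. In the sketch below the first two results reproduce numbers given in the text; the last two are only a rough estimate that takes the closing sentence literally and multiplies $\Delta z$ by the quoted factor $G$ before applying Eq. (19.2). That last step is our assumption, not a result read from Figure 19.11.

```python
import math

def etalon_path_difference(gap, theta_deg):
    """Round-trip path difference 2 * d * cos(theta) across the air gap."""
    return 2.0 * gap * math.cos(math.radians(theta_deg))

def delta_lambda(wavelength, delta_z):
    """Wavelength spacing from Eq. (19.2): delta_z * delta_lambda / lambda^2 = 1/2."""
    return wavelength ** 2 / (2.0 * delta_z)

wavelength = 633e-9                               # m
dz = etalon_path_difference(55.95e-6, 45.0)       # m
dl_single_pass = delta_lambda(wavelength, dz)     # peak-to-valley spacing

# Assumption: effective path difference is G times larger (G values from the text).
dl_s = delta_lambda(wavelength, 3.0 * dz)         # s-polarized light
dl_p = delta_lambda(wavelength, 1.94 * dz)        # p-polarized light

print(f"delta_z = {dz * 1e6:.3f} um")
print(f"delta_lambda (single pass) = {dl_single_pass * 1e9:.2f} nm")
print(f"estimated delta_lambda, s-light = {dl_s * 1e9:.2f} nm")
print(f"estimated delta_lambda, p-light = {dl_p * 1e9:.2f} nm")
```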
Spectral analysis using a diffraction grating

Consider two co-propagating beams of wavelengths $\lambda$ and $\lambda + \Delta\lambda$, where it is
assumed for convenience that $\Delta\lambda > 0$. These beams travel along the $Z$-axis and
pass through an aperture of diameter $D$. By definition, $k_z = 2\pi/\lambda$ and, therefore,
$\Delta k_z = 2\pi\Delta\lambda/\lambda^2$. Figure 19.12 shows the above beams arriving at an incidence
angle $0 \le \theta < 90^\circ$ on a grating of period $P$. The $N$th diffracted order emerges
from the grating at an angle $\theta'$, in accordance with Bragg's law,3,4
$$\sin\theta' = \sin\theta + N\lambda/P, \tag{19.3a}$$
which yields
$$\cos\theta'\,\Delta\theta' = (N/P)\,\Delta\lambda. \tag{19.3b}$$

Figure 19.11 Computed plots of amplitude reflection and transmission
coefficients versus $\lambda$ for the Fabry–Pérot etalon depicted in Figure 19.6. The air
gap and the incidence angle are fixed at $d_{\rm air} = 55.95\ \mu$m and $\theta = 45^\circ$. The incident
beam is p-polarized in (a) and s-polarized in (b).
[Axes: amplitude, 0 to 1, versus $\lambda$ from 630 nm to 640 nm; curves $|r_p|$, $|t_p|$ in (a) and $|r_s|$, $|t_s|$ in (b).]
Now, the emergent beam diameter is $D' = D\,|\cos\theta'/\cos\theta|$. Since the lens is
expected to resolve the two wavelengths, Ineq. (19.1) requires that $|\Delta\theta'| \ge \lambda/D'$,
which leads to $|\cos\theta'\,\Delta\theta'| \ge \lambda\cos\theta/D$, which in turn leads to $|N/P|\,\Delta\lambda \ge \lambda\cos\theta/D$.
In other words,
$$D/\cos\theta \ge (\lambda/\Delta\lambda)\,|P/N|. \tag{19.4a}$$
From Eq. (19.3a) it is clear that $|N\lambda/P| \le 2$, that is, $|P/N| \ge \tfrac{1}{2}\lambda$. Inequality (19.4a)
may thus be written as follows:
$$D/\cos\theta \ge \tfrac{1}{2}\lambda^2/\Delta\lambda. \tag{19.4b}$$
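The grating relations can be exercised numerically. The following is a minimal sketch with hypothetical sample values (a 1 µm period, the −1 order, 30° incidence); the function names are ours. It applies the angular criterion directly and compares the outcome with the bound of Ineq. (19.4a).

```python
import math

def diffracted_angle(theta, wavelength, period, order):
    """Eq. (19.3a): theta' in radians, or None if the order does not propagate."""
    s = math.sin(theta) + order * wavelength / period
    if abs(s) > 1.0:
        return None
    return math.asin(s)

def is_resolvable(D, theta, wavelength, d_lambda, period, order):
    """Angular criterion |dtheta'| >= lambda / D', with D' = D |cos theta' / cos theta|."""
    theta_p = diffracted_angle(theta, wavelength, period, order)
    if theta_p is None:
        raise ValueError("this diffracted order does not propagate")
    d_theta_p = abs(order / period) * d_lambda / abs(math.cos(theta_p))  # Eq. (19.3b)
    D_prime = D * abs(math.cos(theta_p) / math.cos(theta))
    return d_theta_p >= wavelength / D_prime

def min_illuminated_length(wavelength, d_lambda, period, order):
    """Right-hand side of Ineq. (19.4a): lower bound on D / cos(theta)."""
    return (wavelength / d_lambda) * abs(period / order)

# Hypothetical sample values, chosen only for illustration.
lam, d_lam = 633e-9, 0.158e-9
period, order, theta = 1.0e-6, -1, math.radians(30.0)

L_min = min_illuminated_length(lam, d_lam, period, order)
for D in (3e-3, 4e-3):
    print(f"D = {D * 1e3:.1f} mm, illuminated length = {D / math.cos(theta) * 1e3:.2f} mm, "
          f"resolvable: {is_resolvable(D, theta, lam, d_lam, period, order)}")
print(f"bound from Ineq. (19.4a): {L_min * 1e3:.2f} mm")
```

For these values the +1 order is evanescent, which is why the −1 order is used.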

Figure 19.12 Two beams of wavelengths $\lambda$ and $\lambda + \Delta\lambda$, propagating in
the same direction $Z$, arrive at an aperture of diameter $D$. The beams propagate a
distance $\Delta z_1$ from the center of the aperture to a grating of period $P$, shining on
the grating at an angle $\theta$. One of the diffracted orders (the $N$th order) leaves the
grating at an angle $\theta'$, travels a distance $\Delta z_2$ (from the center of the grating to the
center of the lens), then enters a lens of focal length $f$ and numerical aperture
${\rm NA} \le 1$. Emerging from the grating, the two wavelengths deviate from each
other by an angle $\Delta\theta'$, thus forming separate focused spots at the focal plane of
the lens. From the entrance aperture to the focal plane, the total propagation
distance is $\Delta z = \Delta z_1 + \Delta z_2 + f$.
[Diagram labels: aperture diameter $D$, propagation axis $Z$, grating, lens, focal length $f$.]
Inequality (19.4b) places a lower bound for resolvability not on the beam
diameter $D$, but on the illuminated length of the grating, $D/\cos\theta$, in the direction
perpendicular to the grooves.
Next we examine the propagation distance from the center of the entrance
aperture to the focal plane of the lens. With reference to Figure 19.12, the
shortest possible distance from the entrance aperture to the grating center is
$\Delta z_1 = \tfrac{1}{2}D\tan\theta$. Similarly, the shortest possible distance from the grating to
the lens center (ignoring the possibility that the lens might block the incident
beam) is $\Delta z_2 = \tfrac{1}{2}D'|\tan\theta'| = \tfrac{1}{2}D\,|\sin\theta'|/\cos\theta$. The smallest feasible focal length
for the lens is $f = \tfrac{1}{2}D'$, corresponding to ${\rm NA} = 1$. Therefore, the shortest distance $\Delta z$
from the center of the entrance aperture to the focal plane of the lens is given by
$$\Delta z = \Delta z_1 + \Delta z_2 + f = \tfrac{1}{2}(D/\cos\theta)(\sin\theta + |\sin\theta'| + |\cos\theta'|). \tag{19.5a}$$
Since $\sin\theta \ge 0$, and $|\sin\theta'| + |\cos\theta'| \ge 1$ for any $\theta'$, Eq. (19.5a) yields
$$\Delta z \ge \tfrac{1}{2}D/\cos\theta. \tag{19.5b}$$
Combining Ineqs. (19.4b) and (19.5b) then yields $\Delta z \ge \tfrac{1}{4}\lambda^2/\Delta\lambda$, that is,
$$\Delta z\,\Delta k_z \ge \tfrac{1}{2}\pi. \tag{19.6}$$
Note that the initial beam diameter $D$ in this example is not restricted at all,
whereas the propagation distance $\Delta z$ is required to be greater than a certain
minimum, $\tfrac{1}{4}\lambda^2/\Delta\lambda$, to ensure resolvability of the wavelengths $\lambda$ and $\lambda + \Delta\lambda$.
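The step from Eq. (19.5a) to Ineq. (19.5b) rests on the bracketed trigonometric factor never falling below unity. A brute-force scan is one way to convince oneself of this; the sketch below (our own check, on an arbitrary one-degree grid) also evaluates the resulting minimum distance for the wavelength pair used earlier in the chapter.

```python
import math

def shortest_distance(D, theta, theta_p):
    """Eq. (19.5a): shortest aperture-to-focal-plane distance for given angles."""
    return 0.5 * (D / math.cos(theta)) * (
        math.sin(theta) + abs(math.sin(theta_p)) + abs(math.cos(theta_p)))

def worst_case_ratio(steps=90):
    """Smallest value of delta_z / (D / (2 cos theta)) over a grid of angles."""
    best = float("inf")
    for i in range(steps):                     # theta from 0 up to (not including) 90 degrees
        theta = math.radians(90.0 * i / steps)
        for j in range(-steps, steps + 1):     # theta' from -90 to +90 degrees
            theta_p = math.radians(90.0 * j / steps)
            ratio = shortest_distance(1.0, theta, theta_p) / (0.5 / math.cos(theta))
            best = min(best, ratio)
    return best

def min_propagation_distance(wavelength, d_lambda):
    """Bound obtained from Ineqs. (19.4b) and (19.5b): lambda^2 / (4 * delta_lambda)."""
    return wavelength ** 2 / (4.0 * d_lambda)

lam, d_lam = 633e-9, 0.158e-9
dz_min = min_propagation_distance(lam, d_lam)
dkz = 2.0 * math.pi * d_lam / lam ** 2

print(f"smallest ratio found on the grid: {worst_case_ratio():.6f}")
print(f"minimum delta_z: {dz_min * 1e3:.3f} mm, product delta_z * delta_kz: {dz_min * dkz:.4f}")
```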

Addison-Wesley, Reading, Massachusetts (1964).
2 A. E. Siegman, Lasers, University Science Books, California (1986).
1980.
4 M. V. Klein, Optics, Wiley, New York (1970).
20
Omni-directional dielectric mirrors
An omni-directional dielectric mirror (also known as a one-dimensional photonic
bandgap crystal)1,2 exhibits 100% reflectivity at all angles of incidence and for all
states of incident polarization.3,4 Unlike metallic mirrors, which absorb a small
fraction of the incident optical power, dielectric reflectors are lossless. These
properties make omni-directional dielectric mirrors ideal candidates for appli-
cations in which a beam of light in an unknown or unpredictable polarization
state is likely to arrive at the mirror from any direction, and in which loss of light
at the mirror, no matter how small, is deemed intolerable. A good example is
provided by the walls of an optical waveguide. Since there are numerous
reflections from the wall as a beam of light travels through, even small losses at
each encounter with a wall rapidly deplete the beam’s energy.
A typical omni-directional reflector is a periodic stack of bilayers, each bilayer
consisting of a high-index and a low-index dielectric layer. The larger the
refractive indices of the available dielectrics (and also the larger the difference
between these indices), the easier it is to design the reflector. For example, if the
two materials available for fabricating a stack of bilayers have indices $n_1 = 1.5$
and $n_2 = 2.0$, it is impossible to obtain omni-directionality for both p- and
s-polarized light. However, with $n_1 = 1.5$ and $n_2 \ge 2.3$, an omni-directional
reflector can be designed. When the available dielectrics have reasonably large
indices, it is also possible (by properly selecting the layer thicknesses) to achieve
omni-directionality over a broad range of wavelengths.
In this chapter we describe a theory of omni-directional reflectors, and outline a
method of selecting the layer thicknesses for a given pair of indices n1, n2. Although
the following discussions are confined to flat mirrors, it is clear that the exterior of a
glass cylinder (or the interior of a hollow tube) may also be coated with omni-
directional reflectors. As long as the diameters of these cylinders/tubes are not too
small (compared to the incident wavelength) the walls will appear flat locally and,
therefore, the cylinder/tube may be used as an essentially lossless waveguide.4
General properties
Consider a periodic multilayer stack such as that depicted in Figure 20.1. The
stack consists of an infinite number of identical blocks, each block having
reflection coefficients $r = |r|\exp(i\phi_r)$ from the top side and $r' = |r'|\exp(i\phi_{r'})$
from the bottom side, as well as transmission coefficient $t = |t|\exp(i\phi_t)$ from
either side. In general, $r$, $r'$ and $t$ are functions of the incidence angle $\theta$ and
the polarization state (p or s) of the incident light. From the reciprocal
properties of electromagnetic waves in non-absorbing media, it is known
that $t$ should be the same whether the incidence is from the top or from
the bottom side, that $|r| = |r'|$, and that $\tfrac{1}{2}(\phi_r + \phi_{r'}) = \phi_t \pm 90^\circ$ (see Chapter 17,
Reciprocity in classical linear optics). Also, from conservation of energy,
$|r|^2 + |t|^2 = 1$.
As shown in Figure 20.1, one can express the reflection coefficient $r_0$ from the
top of the stack in terms of the parameters $r$, $r'$, $t$ of the individual blocks by
assuming a diminishing air-gap between the top unit and the rest of the stack.
Denoting the round-trip phase delay within this (artificial) air-gap by $\delta$, and
recognizing that the reflectivity $r_0$ of the infinite stack is the same with and
without its uppermost block, we write
$$\begin{aligned}
r_0 &= \lim_{\delta \to 0}\,[\,r + r_0 t^2 \exp(i\delta) + r_0^2 r' t^2 \exp(2i\delta) + r_0^3 r'^2 t^2 \exp(3i\delta) + \cdots\,] \\
    &= [\,r - r_0(rr' - t^2)\,]/(1 - r_0 r') \\
    &= \{r - r_0 \exp[i(\phi_r + \phi_{r'})]\}/(1 - r_0 r').
\end{aligned} \tag{20.1}$$
Figure 20.1 Method of calculating the reflection coefficient $r_0$ of a periodic
stack in terms of the parameters $r$, $r'$, $t$ of the individual blocks that comprise
the stack.
The above formula is a quadratic equation in $r_0$. A perfect reflector requires that
$|r_0| = 1$; Eq. (20.1) then yields the following expression for the phase $\phi_0$ of $r_0$ in
terms of $|r|$, $\phi_r$, and $\phi_{r'}$:
$$\cos[\phi_0 - \tfrac{1}{2}(\phi_r - \phi_{r'})] = \cos[\tfrac{1}{2}(\phi_r + \phi_{r'})]\,/\,|r|. \tag{20.2}$$
Since in practice the actual value of $\phi_0$ is irrelevant, the above equation predicts
that the reflectivity $R_0 = |r_0|^2$ of the stack will be unity provided that the right-hand
side of Eq. (20.2) is confined to the interval $[-1, +1]$; in other words, the necessary
and sufficient condition for the infinite dielectric stack of Figure 20.1 to have 100%
reflectivity may be written as follows:
$$|r| > \bigl|\cos[\tfrac{1}{2}(\phi_r + \phi_{r'})]\bigr|. \tag{20.3a}$$
Using the identity $|r|^2 + |t|^2 = 1$ and the relation among $\phi_r$, $\phi_{r'}$, $\phi_t$ mentioned
earlier, Ineq. (20.3a) may be written in either of the following alternative forms:
$$|r| > |\sin\phi_t|, \tag{20.3b}$$
$$|t| < |\cos\phi_t|. \tag{20.3c}$$
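Because Eq. (20.1) is a quadratic in $r_0$, it can be solved directly. The sketch below rearranges it as $r' r_0^2 - (1 + e^{i\psi}) r_0 + r = 0$ with $\psi = \phi_r + \phi_{r'}$, and returns the root of smaller modulus; that choice is our assumption, made on the grounds that a passive stack cannot reflect more than it receives. The sample block parameters are hypothetical.

```python
import cmath
import math

def stack_reflection(r_abs, phi_r, phi_rp):
    """Solve Eq. (20.1) for r0, given |r| and the phases of r and r'."""
    r = r_abs * cmath.exp(1j * phi_r)
    rp = r_abs * cmath.exp(1j * phi_rp)
    b = 1.0 + cmath.exp(1j * (phi_r + phi_rp))
    root = cmath.sqrt(b * b - 4.0 * r * rp)
    candidates = [(b + root) / (2.0 * rp), (b - root) / (2.0 * rp)]
    return min(candidates, key=abs)          # assumption: keep the root with |r0| <= 1

def residual(r0, r_abs, phi_r, phi_rp):
    """Left side minus right side of the last line of Eq. (20.1)."""
    r = r_abs * cmath.exp(1j * phi_r)
    rp = r_abs * cmath.exp(1j * phi_rp)
    return r0 - (r - r0 * cmath.exp(1j * (phi_r + phi_rp))) / (1.0 - r0 * rp)

def satisfies_20_3a(r_abs, phi_r, phi_rp):
    """Ineq. (20.3a): |r| > |cos((phi_r + phi_r') / 2)|."""
    return r_abs > abs(math.cos(0.5 * (phi_r + phi_rp)))

# Two hypothetical unit blocks: one inside the 100%-reflection regime, one outside.
for block in ((0.6, 0.3, 2.5), (0.1, 0.2, 0.2)):
    r0 = stack_reflection(*block)
    print(f"|r| = {block[0]}: Ineq. (20.3a) holds = {satisfies_20_3a(*block)}, "
          f"|r0| = {abs(r0):.4f}")
```

When Ineq. (20.3a) holds, both roots of the quadratic have unit modulus, so the choice between them affects only the phase $\phi_0$.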
The three inequalities (20.3a), (20.3b), and (20.3c) are equivalent and may be
used interchangeably. As an example, consider a unit block consisting of a pair of
high-index, low-index layers, each a quarter-wave thick at the free-space wavelength
of $\lambda_0 = 633$ nm at normal incidence (i.e., $\theta = 0^\circ$). Let the indices and thicknesses of
the two layers be $n_1 = 2$, $t_1 = 79.0$ nm, $n_2 = 1.5$, $t_2 = 105.5$ nm. Figure 20.2 shows plots
of $|r|$ and $\cos[\tfrac{1}{2}(\phi_r + \phi_{r'})]$ in frame (a), $|t|$ and $\cos\phi_t$ in frame (b), as functions of $\theta$
for both p- and s-polarized incident plane-waves. Note that Ineqs. (20.3) are satisfied
for p-light when $0^\circ < \theta < 40^\circ$, and for s-light when $0^\circ < \theta < 52^\circ$.
The computed p- and s-reflectivities for a quarter-wave stack consisting of twenty
repetitions of the above bilayer are shown in Figure 20.3. As expected, $R_{p0} = |r_{p0}|^2 \approx 1$ in
the incidence range $0^\circ < \theta < 40^\circ$, and similarly $R_{s0} = |r_{s0}|^2 \approx 1$ in the range $0^\circ < \theta < 52^\circ$.
Figure 20.2 Plots of the various functions appearing in Ineqs. (20.3) for a
bilayer consisting of a pair of dielectric layers, each having a quarter-wave
thickness at the vacuum wavelength of $\lambda_0 = 633$ nm at normal incidence
($n_1 = 2.0$, $t_1 = 79.0$ nm, $n_2 = 1.5$, $t_2 = 105.5$ nm).
[Both frames are plotted versus $\theta$ from 0° to 90°, on a vertical scale of 0 to 1: one frame shows
$|r_p|$, $|r_s|$ and $\cos[\tfrac{1}{2}(\phi_r + \phi_{r'})]$ for p- and s-light; the other shows $|t_p|$, $|t_s|$, $\cos\phi_{tp}$ and $\cos\phi_{ts}$.]
Single dielectric layer

In principle, the unit block from which an omni-directional reflector is constructed
can be a bilayer or a multilayer, which may even contain gradient-index
layers. It should be obvious that a single homogeneous layer (say, having index $n$
and thickness $d$) will never produce a 100% reflector; therefore, Ineqs. (20.3)
cannot be satisfied for such a layer. (We note in passing that, for a single layer,
$\phi_r = \phi_{r'} = \phi_t \pm 90^\circ$.)
Figure 20.3 Computed reflectivity $R$ versus $\theta$ for p- and s-polarized light for
a quarter-wave stack consisting of twenty repetitions of the bilayer depicted in
Figure 20.2. $R_{p0}$ and $R_{s0}$ are 100% in those regions where Ineqs. (20.3) are
satisfied.
Let us examine in some detail the single dielectric layer shown in Figure 20.4.
The monochromatic plane wave of wavelength $\lambda_0$ is incident on the top surface of
the layer at an angle $\theta$; the Fresnel reflection and transmission coefficients of the
top surface are $\rho$ and $\tau$. Inside the layer the transmitted wave-vector makes an
angle $\theta'$ with the surface normal. The reflection and transmission coefficients at
the bottom surface, where the light shines from within the slab onto the glass–air
interface, are $\rho'$ and $\tau'$. Reciprocity may be invoked to show that, for both p- and
s-polarization, $\rho' = -\rho$ and $\rho^2 + \tau\tau' = 1$ at all $\theta$. The dependences of $\rho$ on $n$ and $\theta$
for p- and s-light are given by the Fresnel formulas5
$$\rho_p = \frac{\sqrt{n^2 - \sin^2\theta} - n^2\cos\theta}{\sqrt{n^2 - \sin^2\theta} + n^2\cos\theta}, \tag{20.4a}$$
$$\rho_s = \frac{\cos\theta - \sqrt{n^2 - \sin^2\theta}}{\cos\theta + \sqrt{n^2 - \sin^2\theta}}. \tag{20.4b}$$
Also, the single-path phase-shift $\Delta$ acquired through the thickness $d$ of the
slab is
$$\Delta = 2\pi(nd/\lambda_0)\cos\theta', \tag{20.4c}$$


where $\theta'$ is the angle of the refracted ray.5 The slab's reflection and transmission
coefficients, $r$ and $t$, may be obtained by summing the infinite number of rays
multiply reflected from its front and rear facets, namely,
$$r = \rho + \tau\tau'\rho'\exp(i2\Delta) + \tau\tau'\rho'^3\exp(i4\Delta) + \cdots, \tag{20.5}$$
$$t = \tau\tau'\exp(i\Delta) + \tau\tau'\rho'^2\exp(i3\Delta) + \tau\tau'\rho'^4\exp(i5\Delta) + \cdots. \tag{20.6}$$
When simplified, the above expressions yield
$$|r| = 2\,|\rho\sin\Delta|\,/\sqrt{\rho^4 - 2\rho^2\cos(2\Delta) + 1}, \tag{20.7a}$$
$$\phi_r = -\arctan\{(1 - \rho^2)/[(1 + \rho^2)\tan\Delta]\}, \tag{20.7b}$$
$$|t| = (1 - \rho^2)/\sqrt{\rho^4 - 2\rho^2\cos(2\Delta) + 1}, \tag{20.8a}$$
$$\phi_t = \arctan\{[(1 + \rho^2)\tan\Delta]/(1 - \rho^2)\}. \tag{20.8b}$$
Equations (20.7) and (20.8) readily confirm that $|r|^2 + |t|^2 = 1$, that $\phi_r = \phi_t \pm 90^\circ$,
and that a single-layer slab does not satisfy Ineq. (20.3).
Figure 20.4 Method of calculating the reflection and transmission coefficients,
$r$ and $t$, for a single-layer dielectric slab of thickness $d$ and refractive
index $n$.
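These three properties of a single slab are easy to confirm numerically. The sketch below sums the geometric series (20.5) and (20.6) in closed form, using $\rho' = -\rho$ and $\tau\tau' = 1 - \rho^2$; the slab parameters ($n = 2$, $d = 0.2\lambda_0$) are hypothetical, and the helper names are ours.

```python
import cmath
import math

def fresnel_rho(n, theta, pol):
    """Eqs. (20.4a,b): Fresnel reflection coefficient at the air/dielectric surface."""
    q = math.sqrt(n * n - math.sin(theta) ** 2)      # equals n * cos(theta')
    c = math.cos(theta)
    if pol == "p":
        return (q - n * n * c) / (q + n * n * c)
    return (c - q) / (c + q)

def slab_coefficients(n, d_over_lambda, theta, pol):
    """Closed-form sums of the series (20.5) and (20.6), with rho' = -rho."""
    rho = fresnel_rho(n, theta, pol)
    delta = 2.0 * math.pi * d_over_lambda * math.sqrt(n * n - math.sin(theta) ** 2)
    denom = 1.0 - rho * rho * cmath.exp(2j * delta)
    r = rho * (1.0 - cmath.exp(2j * delta)) / denom
    t = (1.0 - rho * rho) * cmath.exp(1j * delta) / denom
    return r, t

for pol in ("p", "s"):
    for deg in (0, 30, 60, 85):
        r, t = slab_coefficients(2.0, 0.2, math.radians(deg), pol)
        energy = abs(r) ** 2 + abs(t) ** 2            # should equal 1
        quadrature = (r / t).real                     # zero when phi_r = phi_t +/- 90 deg
        fails_20_3b = abs(r) <= abs(math.sin(cmath.phase(t)))
        print(f"{pol}-light, {deg:2d} deg: |r|^2+|t|^2 = {energy:.6f}, "
              f"Re(r/t) = {quadrature:.1e}, Ineq. (20.3b) violated: {fails_20_3b}")
```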
Double layer
Next, consider the bilayer slab depicted in Figure 20.5. The top layer has index
$n_1$, thickness $d_1$, and reflection and transmission coefficients $r_1$, $t_1$ at the incidence
angle $\theta$. The corresponding parameters of the second layer are $n_2$, $d_2$, $r_2$, $t_2$. To
determine the bilayer's overall transmission coefficient $t$, we assume a small air
gap between the two layers and proceed to sum the partial transmission coefficients.
We find, in the limit of a vanishing gap,
$$t = t_1 t_2 + t_1 t_2 r_1 r_2 + t_1 t_2 r_1^2 r_2^2 + \cdots = t_1 t_2/(1 - r_1 r_2). \tag{20.9}$$
Figure 20.5 Method of calculating the transmission coefficient $t$ of a bilayer
slab consisting of two dielectric layers, one having thickness $d_1$, index $n_1$, the
other having thickness $d_2$, index $n_2$.
It is now easy to apply criterion (20.3c) to the bilayer's transmission coefficient
given by Eq. (20.9), to determine the conditions under which an infinite stack of
bilayers becomes a 100% reflector. Both the necessary and sufficient conditions
turn out to be
$$|t_1||t_2| \le \bigl|\,|r_1||r_2| - \cos(\phi_{r_1} + \phi_{r_2})\bigr|, \tag{20.10}$$
which is actually two distinct inequalities in one, depending on whether the
absolute value on the right-hand side is that of a positive or a negative quantity.
Substituting in Eq. (20.10) for $r$, $t$, in terms of $\rho$ and $\Delta$ from Eqs. (20.7, 20.8),
we find the necessary and sufficient conditions for 100% reflectivity to be
$$G_{p,s}(n_1, n_2, \theta)\,\sin\Delta_1\sin\Delta_2 \ge \cos^2[\tfrac{1}{2}(\Delta_1 + \Delta_2)], \tag{20.11a}$$
$$G_{p,s}(n_1, n_2, \theta)\,\sin\Delta_1\sin\Delta_2 \le -\sin^2[\tfrac{1}{2}(\Delta_1 + \Delta_2)], \tag{20.11b}$$
where
$$G_{p,s}(n_1, n_2, \theta) = (\rho_1 - \rho_2)^2/[(1 - \rho_1^2)(1 - \rho_2^2)]. \tag{20.11c}$$
Inequalities (20.11a,b) are the fundamental results of this section, each
expressing the condition (both necessary and sufficient) for the attainment of
$R = 1.0$ from a periodic stack of bilayers. Whereas Ineq. (20.11a) leads to bilayer
designs in which both layer thicknesses are close to $\lambda/4$, Ineq. (20.11b) yields
structures in which one layer's thickness is $\lambda/4$ while the other's is $3\lambda/4$.
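The link between criterion (20.3c) and Ineqs. (20.11a,b) can be verified numerically. The sketch below builds the bilayer transmission coefficient from Eq. (20.9) and compares the two tests angle by angle. It also checks the identity $\mathrm{Re}(1/t) = \cos(\Delta_1 + \Delta_2) - 2G\sin\Delta_1\sin\Delta_2$, which is our own rearrangement rather than a formula quoted from the text. The layer parameters are those used later in the chapter; the helper names are ours.

```python
import cmath
import math

def fresnel_rho(n, theta, pol):
    """Eqs. (20.4a,b) for the air/dielectric surface."""
    q = math.sqrt(n * n - math.sin(theta) ** 2)
    c = math.cos(theta)
    return (q - n * n * c) / (q + n * n * c) if pol == "p" else (c - q) / (c + q)

def layer(n, d_over_lambda, theta, pol):
    """Single-slab r, t (closed form of Eqs. 20.5, 20.6), plus rho and Delta."""
    rho = fresnel_rho(n, theta, pol)
    delta = 2.0 * math.pi * d_over_lambda * math.sqrt(n * n - math.sin(theta) ** 2)
    denom = 1.0 - rho * rho * cmath.exp(2j * delta)
    r = rho * (1.0 - cmath.exp(2j * delta)) / denom
    t = (1.0 - rho * rho) * cmath.exp(1j * delta) / denom
    return r, t, rho, delta

def bilayer(n1, d1, n2, d2, theta, pol):
    """Return t from Eq. (20.9), G from Eq. (20.11c), and the two inequality tests."""
    r1, t1, rho1, D1 = layer(n1, d1, theta, pol)
    r2, t2, rho2, D2 = layer(n2, d2, theta, pol)
    t = t1 * t2 / (1.0 - r1 * r2)
    G = (rho1 - rho2) ** 2 / ((1.0 - rho1 ** 2) * (1.0 - rho2 ** 2))
    lhs = G * math.sin(D1) * math.sin(D2)
    half = 0.5 * (D1 + D2)
    ok_a = lhs >= math.cos(half) ** 2          # Ineq. (20.11a)
    ok_b = lhs <= -math.sin(half) ** 2         # Ineq. (20.11b)
    return t, G, ok_a, ok_b, D1, D2

for deg in (0, 30, 60, 85):
    t, G, ok_a, ok_b, D1, D2 = bilayer(1.5, 0.191, 2.3, 0.122, math.radians(deg), "p")
    direct = abs(t) < abs(math.cos(cmath.phase(t)))     # criterion (20.3c)
    print(f"{deg:2d} deg: (20.3c) = {direct}, (20.11a) = {ok_a}, (20.11b) = {ok_b}")
```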
Discussion
We begin by examining the behavior of $G_{p,s}(n_1, n_2, \theta)$. According to Eq. (20.11c) this
function depends only on $\rho_1$ and $\rho_2$, which, in turn, are dependent on $n_1$, $n_2$, the angle
of incidence $\theta$, and the polarization state of the beam, but not on layer thicknesses $d_1$
and $d_2$. For fixed values of $n_1$, $n_2$ the function depends only on $\theta$ and on the
polarization state. Substituting from Eqs. (20.4a) and (20.4b) into Eq. (20.11c) yields
$$G_s(n_1, n_2, \theta) = \tfrac{1}{4}\sqrt{(n_1^2 - \sin^2\theta)/(n_2^2 - \sin^2\theta)} + \tfrac{1}{4}\sqrt{(n_2^2 - \sin^2\theta)/(n_1^2 - \sin^2\theta)} - \tfrac{1}{2}, \tag{20.12a}$$
$$G_p(n_1, n_2, \theta) = \tfrac{1}{4}(n_2/n_1)^2\sqrt{(n_1^2 - \sin^2\theta)/(n_2^2 - \sin^2\theta)} + \tfrac{1}{4}(n_1/n_2)^2\sqrt{(n_2^2 - \sin^2\theta)/(n_1^2 - \sin^2\theta)} - \tfrac{1}{2}. \tag{20.12b}$$
$G_{p,s}(n_1, n_2, \theta)$ is plotted versus $\theta$ in Figure 20.6 for both p- and s-polarized plane-
waves for the specific values (a) $n_1 = 1.5$, $n_2 = 2.0$, and (b) $n_1 = 1.3$, $n_2 = 1.2$.
Although the selected values of $n_1$, $n_2$ are specific, the shapes of the functions are
quite general. The two functions for p- and s-light are always positive; they both
start, at $\theta = 0^\circ$ (normal incidence), at the same level, from there $G_s$ goes up
and $G_p$ goes down with increasing $\theta$. The behavior shown in Figure 20.6(a),
where $G_s$ increases while $G_p$ decreases monotonically, is typical of situations of
interest in this chapter, where the Brewster angle $\theta_B$ is inaccessible from outside
the multilayer. The behavior depicted in Figure 20.6(b), where $G_s$ increases
monotonically while $G_p$ first drops to zero at $\theta_B$ ($\approx 61.9^\circ$ in this example) before
rising again, is typical of situations where $\theta_B$ can be accessed from outside the
stack. Since $R_p$ at $\theta = \theta_B$ cannot be made equal to unity, the latter situation is of
no interest here.
Figure 20.6 Plots of the functions $G_{p,s}(n_1, n_2, \theta)$ versus $\theta$ for specific values of
$n_1$, $n_2$. In (a), where $n_1 = 1.50$ and $n_2 = 2.00$, the indices are large enough to satisfy
Ineq. (20.13), thus ensuring that the Brewster angle is inaccessible from outside the stack.
In (b), where $n_1 = 1.30$ and $n_2 = 1.20$, the Brewster angle is reached at $\theta = 61.9^\circ$.
One can readily show that the slope of $G_s$ versus $\theta$ is always positive. $G_p$,
on the other hand, has a negative slope at $\theta = 0^\circ$, which continues to be negative
up to where $\sin^2\theta = n_1^2 n_2^2/(n_1^2 + n_2^2)$. At this point $G_p$ achieves its minimum value
of zero, then rises until grazing incidence at $\theta = 90^\circ$. The angle $\theta$ at which $G_p$ is a
minimum corresponds to the Brewster angle $\theta_B$ between two media of indices $n_1$
and $n_2$. When $\theta_B$ is accessible from the incidence medium (air in this case), it will
be impossible to achieve 100% reflectivity at this particular angle. Therefore, we
impose the following constraint on the indices of the bilayer:
$$(1/n_1)^2 + (1/n_2)^2 < 1. \tag{20.13}$$
In this way, $G_{p,s}(n_1, n_2, \theta)$ will always exhibit the typical behavior depicted in Figure
20.6(a), namely, both functions start at the same level, $\tfrac{1}{4}(n_1/n_2) + \tfrac{1}{4}(n_2/n_1) - \tfrac{1}{2}$,
when $\theta = 0^\circ$; from there $G_s$ increases and $G_p$ decreases, both monotonically, with an
increasing $\theta$.
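The behavior of $G_s$ and $G_p$ described here can be reproduced directly from Eqs. (20.12). The sketch below (function names are ours) evaluates both functions and locates the external angle at which $G_p$ vanishes, returning `None` when Ineq. (20.13) places that angle out of reach.

```python
import math

def G_s(n1, n2, theta):
    """Eq. (20.12a)."""
    s2 = math.sin(theta) ** 2
    x = math.sqrt((n1 * n1 - s2) / (n2 * n2 - s2))
    return 0.25 * x + 0.25 / x - 0.5

def G_p(n1, n2, theta):
    """Eq. (20.12b)."""
    s2 = math.sin(theta) ** 2
    x = (n2 / n1) ** 2 * math.sqrt((n1 * n1 - s2) / (n2 * n2 - s2))
    return 0.25 * x + 0.25 / x - 0.5

def brewster_angle_deg(n1, n2):
    """External angle where G_p = 0, or None if Ineq. (20.13) makes it inaccessible."""
    s2 = (n1 * n2) ** 2 / (n1 * n1 + n2 * n2)    # sin^2 of the angle measured in air
    if s2 > 1.0:
        return None
    return math.degrees(math.asin(math.sqrt(s2)))

for n1, n2 in ((1.5, 2.0), (1.3, 1.2)):
    level = 0.25 * (n1 / n2) + 0.25 * (n2 / n1) - 0.5
    print(f"n1 = {n1}, n2 = {n2}: G(0) = {G_s(n1, n2, 0.0):.5f} "
          f"(closed form {level:.5f}), Brewster angle = {brewster_angle_deg(n1, n2)}")
```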
Inequality (20.11a) can be satisfied over the entire range of $\theta$ for both p- and
s-light if $\Delta_1$ and $\Delta_2$ are maintained around $\pi/2$ throughout the range $\theta = [0^\circ, 90^\circ]$.
Likewise, Ineq. (20.11b) can be satisfied if $\Delta_1$ is kept around $\pi/2$ while $\Delta_2$ is kept
around $3\pi/2$ (or vice versa). When $n_1$ and $n_2$ are far apart, $G_s$ and $G_p$ will be fairly
large, and choosing $d_1$ and $d_2$ to satisfy the requisite inequalities for all $\theta$ will not
be difficult. When $n_1$ and $n_2$ are close together, however, it is easier to maintain
$\Delta_1$ and $\Delta_2$ both around $\pi/2$ (if at all possible), rather than to keep one of them
around $3\pi/2$. This is simply because the variations with $\theta$ will be greater for that
$\Delta$ which stays near $3\pi/2$. We limit the following discussion to stacks that satisfy
Ineq. (20.11a), but emphasize that a similar class of reflectors based on Ineq.
(20.11b) is feasible as well.
Selecting layer thicknesses

The parameters $d_1$, $d_2$ should be chosen to satisfy Ineq. (20.11a) for all $\theta$ from 0°
to 90°. Since $G_s \ge G_p$, one should try to achieve omni-directional reflectivity for
p-light only; the s-reflectivity will automatically follow suit.
The phase $\Delta$ acquired in a single path through a layer of thickness $d$ and
index $n$ at incidence angle $\theta$ is given in Eq. (20.4c). $\Delta$ can be made equal to $\pi/2$
at some arbitrary angle of incidence, say, $\theta = \theta_0$, by choosing the layer thickness
$d$ such that $nd\cos\theta'_0 = d\sqrt{n^2 - \sin^2\theta_0} = \lambda_0/4$. Since $\theta$ can be anywhere
between 0° and 90°, one should choose $\theta_0$ in such a way as to make $\Delta$ vary
symmetrically around $\pi/2$. This is achieved when
$$d/\lambda_0 = 0.5/(n + \sqrt{n^2 - 1}). \tag{20.14}$$
The maximum deviation of $\Delta$ from $\pi/2$ is then given by
$$\bigl|\Delta - \tfrac{1}{2}\pi\bigr|_{\max} = \tfrac{1}{2}\pi\,\bigl(1 - \sqrt{1 - n^{-2}}\bigr)/\bigl(1 + \sqrt{1 - n^{-2}}\bigr). \tag{20.15}$$
The expression on the right-hand side of Eq. (20.15) is a monotonically
decreasing function of $n$, going from $\pi/2$ to zero as $n$ goes from 1 to $\infty$.
Therefore, the range of variation of $\Delta$ is smaller for larger values of $n$.
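Equations (20.14) and (20.15) lend themselves to a quick numerical check. The sketch below (function names are ours) computes the symmetric thickness for a few indices and confirms that the single-pass phase lies equally far above $\pi/2$ at normal incidence and below it at grazing incidence.

```python
import math

def symmetric_thickness(n):
    """Eq. (20.14): layer thickness d / lambda0 that centers Delta on pi/2."""
    return 0.5 / (n + math.sqrt(n * n - 1.0))

def single_pass_phase(n, d_over_lambda, theta):
    """Eq. (20.4c), written with n cos(theta') = sqrt(n^2 - sin^2 theta)."""
    return 2.0 * math.pi * d_over_lambda * math.sqrt(n * n - math.sin(theta) ** 2)

def max_phase_deviation(n):
    """Eq. (20.15): largest departure of Delta from pi/2 over 0 to 90 degrees."""
    s = math.sqrt(1.0 - 1.0 / (n * n))
    return 0.5 * math.pi * (1.0 - s) / (1.0 + s)

for n in (1.5, 2.0, 2.3):
    d = symmetric_thickness(n)
    at_normal = single_pass_phase(n, d, 0.0) - math.pi / 2.0
    at_grazing = math.pi / 2.0 - single_pass_phase(n, d, math.pi / 2.0)
    print(f"n = {n}: d/lambda0 = {d:.3f}, deviation at 0 deg = {at_normal:.4f}, "
          f"at 90 deg = {at_grazing:.4f}, Eq. (20.15) = {max_phase_deviation(n):.4f}")
```

The first two thicknesses reproduce the values $0.191\lambda_0$ and $0.134\lambda_0$ quoted in the text for $n = 1.5$ and $n = 2.0$.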
If $d$ is chosen in accordance with Eq. (20.14), variations of $\Delta$ with $\theta$ will
be symmetric around $\Delta = \pi/2$; as $\theta$ goes from 0° to 90°, $\Delta$, which is larger than $\pi/2$ in
the beginning, drops to $\pi/2$, then decreases further until, at grazing incidence, it lies
as far below $\pi/2$ as it was above $\pi/2$ at normal incidence. In this way, $\sin\Delta$ remains
close to unity, swinging from just below $+1$ to $+1$, and then back to its initial value, as the
incidence goes from normal to grazing. Similarly, $\cos^2\Delta$ varies symmetrically,
moving, between normal and grazing incidence, from a small positive
value to zero, then back again to its initial (positive) value. The choice of $d$ according
to Eq. (20.14) thus minimizes the swing of $\sin\Delta$ around its desired value of $+1$, while
simultaneously minimizing the swing of $\cos^2\Delta$ around its desired value of zero.
In the case of a bilayer, one must select values for $d_1$ and $d_2$; both can be chosen
to satisfy Eq. (20.14) with the corresponding value of $n$. In this way, Ineq. (20.11a)
is likely to be satisfied, because $\sin\Delta_1\sin\Delta_2$ will remain around unity, with a
minimum swing throughout the range of $\theta$, and, similarly, $\cos^2[(\Delta_1 + \Delta_2)/2]$
remains around zero. Figure 20.7 shows plots of these functions for a bilayer
consisting of materials with $n_1 = 1.5$, $n_2 = 2.0$. Layer thicknesses obtained from
Eq. (20.14) are $d_1 = 0.191\lambda_0$ and $d_2 = 0.134\lambda_0$. With these choices, Figure 20.7
shows that $\sin\Delta_1\sin\Delta_2$ remains above roughly 0.97 while $\cos^2[(\Delta_1 + \Delta_2)/2]$ remains below
0.03 throughout the entire range of $\theta$.
Unfortunately, the function $G_p(n_1, n_2, \theta)$ shown in Figure 20.6(a) is extremely
small, and the aforementioned choice of layer thicknesses is not able to yield a 100%
reflector through the entire range of incidence. Because $G_p$ is relatively large near
normal incidence but decreases with an increasing $\theta$, it may be helpful to make the
curves of $\sin\Delta_1\sin\Delta_2$ and $\cos^2[(\Delta_1 + \Delta_2)/2]$ slightly asymmetric. This is done by
choosing a somewhat larger thickness for layer 2, as shown in Figure 20.7, where the
dashed curve corresponds to $d_2 = 0.147\lambda_0$ and the dotted curve to $d_2 = 0.161\lambda_0$. ($d_2$ is
a better choice for this purpose than $d_1$, because the corresponding layer has a larger $n$
and, therefore, its variation with $\theta$ is smaller, thus causing a smaller variation in the
functions depicted in Figure 20.7.) Note in Figure 20.7 that, as $d_2$ increases, at large $\theta$,
the function $\cos^2[\tfrac{1}{2}(\Delta_1 + \Delta_2)]$ is depressed, and the function $\sin\Delta_1\sin\Delta_2$ moves
toward unity (at least initially); both of these trends are helpful in satisfying
Ineq. (20.11a). Unfortunately, the other side of the curves (i.e., the side around
$\theta = 0^\circ$) moves in the wrong direction, making it harder to satisfy the inequality at and
around normal incidence. All in all, it turns out that it is impossible to design an
omni-directional reflector with bilayers having $n_1 = 1.5$ and $n_2 = 2.0$.
Figure 20.7 (a) Plots of $\sin\Delta_1\sin\Delta_2$ versus $\theta$ for a bilayer slab consisting
of materials with $n_1 = 1.5$ and $n_2 = 2.0$. The first layer's thickness is fixed at
$d_1 = 0.191\lambda_0$, but the second layer's assumes one of three different values
($d_2/\lambda_0 = 0.134$, 0.147, 0.161). (b) Same as (a) for the function $\cos^2[\tfrac{1}{2}(\Delta_1 + \Delta_2)]$.
Figure 20.8 shows the best that one can achieve with $n_1 = 1.5$, $n_2 = 2.0$, and
layer thicknesses $d_1 = 0.191\lambda_0$, $d_2 = 0.134\lambda_0$ (chosen to satisfy Eq. (20.14)).
Figure 20.8(a) shows plots of $G_{p,s}(n_1, n_2, \theta)\sin\Delta_1\sin\Delta_2 - \cos^2[\tfrac{1}{2}(\Delta_1 + \Delta_2)]$
versus $\theta$ for both p- and s-polarized light; both functions must stay above zero to
satisfy Ineq. (20.11a). Figure 20.8(b) shows computed plots of reflectivity, $R_{p0}$
and $R_{s0}$, versus $\theta$ for a twenty-period stack of this bilayer. It is seen that 100%
reflectivity is achieved in exactly those regions where the functions depicted in
Figure 20.8(a) are positive-valued.
Figure 20.8 (a) Plots of $G_{p,s}(n_1, n_2, \theta)\sin\Delta_1\sin\Delta_2 - \cos^2[\tfrac{1}{2}(\Delta_1 + \Delta_2)]$ versus
$\theta$ for a bilayer having $n_1 = 1.5$, $d_1 = 0.191\lambda_0$ and $n_2 = 2.0$, $d_2 = 0.134\lambda_0$.
At small angles of incidence both p- and s-light violate Ineq. (20.11a), while
at large angles only p-light is inadequate. (b) Computed plots of p- and
s-reflectivity, $R_{p0}$, $R_{s0}$, versus $\theta$ for a twenty-period stack of the above
bilayer. The regions of 100% reflectivity coincide with those that satisfy
Ineq. (20.11a).
Figure 20.9 Same as Figure 20.8 for a twenty-period stack consisting of layers
with $n_1 = 1.5$, $d_1 = 0.191\lambda_0$ and $n_2 = 2.3$, $d_2 = 0.122\lambda_0$. It is seen in (a) that Ineq.
(20.11a) holds for both polarization states throughout the entire range of incidence.
In (b) the reflectances are 100% everywhere. The slight drops in $R_{p0}$
and $R_{s0}$ are due to the fact that the assumed stack consists only of a finite number
of bilayers; the reflectivity will rise rapidly if the total number of bilayers
comprising the stack is increased. Note that the smaller the functions depicted in
(a) become, the harder it is to obtain 100% reflectivity from a finite stack.
Designing an omni-directional reflector

To achieve 100% reflectivity over the entire range of incidence requires materials
with larger indices (or a larger index difference) than those examined above. For
example, by keeping $n_1$ at 1.5 but raising $n_2$ to 2.3 it is possible to satisfy Ineq.
(20.11a) with $d_1 = 0.191\lambda_0$ and $d_2 = 0.122\lambda_0$, as shown in Figure 20.9. The plots
in Figure 20.9(a) confirm that Ineq. (20.11a) for both p- and s-light is satisfied
over the entire range of incidence. Figure 20.9(b) shows plots of reflectivity
versus $\theta$ for a twenty-period stack; both $R_{p0}$ and $R_{s0}$ are seen to be greater than
99.5% between normal and grazing incidence. To increase the reflectivity beyond
99.5% one should either increase the number of bilayers comprising the stack, or
use materials which have a larger index difference.
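The design test of Ineq. (20.11a) takes only a few lines to evaluate. The sketch below (function names and the one-degree grid are our choices) computes the margin plotted in Figures 20.8(a) and 20.9(a), namely the left-hand side minus the right-hand side of the inequality, and reports whether it stays non-negative at every sampled angle for both polarizations. It covers only the infinite-stack criterion; it does not compute the reflectivity of a finite stack.

```python
import math

def G(n1, n2, theta, pol):
    """Eqs. (20.12a,b)."""
    s2 = math.sin(theta) ** 2
    x = math.sqrt((n1 * n1 - s2) / (n2 * n2 - s2))
    if pol == "p":
        x *= (n2 / n1) ** 2
    return 0.25 * x + 0.25 / x - 0.5

def margin(n1, d1, n2, d2, theta, pol):
    """G sin(D1) sin(D2) - cos^2((D1 + D2)/2); thicknesses in units of lambda0."""
    s2 = math.sin(theta) ** 2
    D1 = 2.0 * math.pi * d1 * math.sqrt(n1 * n1 - s2)
    D2 = 2.0 * math.pi * d2 * math.sqrt(n2 * n2 - s2)
    return G(n1, n2, theta, pol) * math.sin(D1) * math.sin(D2) - math.cos(0.5 * (D1 + D2)) ** 2

def is_omnidirectional(n1, d1, n2, d2, steps=90):
    """True if Ineq. (20.11a) holds on the whole grid for both p- and s-light."""
    angles = [math.radians(90.0 * k / steps) for k in range(steps + 1)]
    return all(margin(n1, d1, n2, d2, a, pol) >= 0.0
               for a in angles for pol in ("p", "s"))

designs = {
    "n2 = 2.0, d2 = 0.134": (1.5, 0.191, 2.0, 0.134),
    "n2 = 2.3, d2 = 0.122": (1.5, 0.191, 2.3, 0.122),
}
for label, design in designs.items():
    worst = min(margin(*design, math.radians(k), "p") for k in range(91))
    print(f"{label}: omni-directional = {is_omnidirectional(*design)}, "
          f"smallest p-light margin = {worst:+.5f}")
```

For the second design the margin is smallest at normal incidence, which is consistent with the remark in the caption of Figure 20.9 that small margins make high reflectivity harder to reach with a finite number of bilayers.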
Finally, it must be mentioned that omni-directional reflection can be achieved
not just for one wavelength but over a continuous range of wavelengths. To the
extent that $n_1$ and $n_2$ remain constant over the desired range of $\lambda$, the functions $G_p$
and $G_s$ remain the same at all wavelengths of interest. Designing an omni-directional
reflector for a range of $\lambda$ then reduces to choosing thicknesses $d_1$ and
$d_2$ that satisfy Ineq. (20.11a) throughout that range. The techniques described in
this chapter can be readily extended to allow adjusting layer thicknesses for the
desired band of wavelengths.

1 E. Yablonovitch, Phys. Rev. Lett. 58, 2059 (1987).
2 J. D. Joannopoulos, R. D. Meade, and J. N. Winn, Photonic Crystals, Princeton
University Press, Princeton, N.J., 1995.
3 J. N. Winn, Y. Fink, S. Fan, and J. D. Joannopoulos, Omnidirectional reflection from
a one-dimensional photonic crystal, Optics Letters 23, 1573–1575 (1998).
4 Y. Fink, J. N. Winn, S. Fan, C. Chen, J. Michel, J. D. Joannopoulos, and
E. L. Thomas, A dielectric omnidirectional reflector, Science 282, 1679–1682 (1998).
1980.
21
Linear optical vortices†
An optical vortex is a phase singularity nested within the cross-sectional profile
of a coherent beam of light.1,2,3 Such vortices occur naturally in the electromagnetic
mode structure of certain optical cavities.4 They may also be created
artificially by computer-generated holograms designed to impart to an incident
beam of light the desired phase and amplitude characteristics of a vortex.5 In
recent years the study of vortices has become the focus of several research groups
around the world, as potential applications have emerged. Noteworthy examples
of such applications are the manipulation of small objects by optical tweezers6
and the control of atomic or molecular beams via the exchange of angular
momentum with optical vortices.7
Mathematical description
The complex-amplitude distribution of a simple vortex of order m centered at
(x0, y0) in the cross-sectional plane of a Gaussian beam may be written as
$$A(x, y, z = 0) = [(x - x_0) + i\,\mathrm{sign}(m)\,(y - y_0)]^{|m|} \exp[-(x^2 + y^2)/r_0^2]. \qquad (21.1)$$
The sign of the integer m determines whether the vorticity is clockwise or
counterclockwise, and the magnitude of m is the number of 2π phase shifts in one
cycle around the singularity. For the amplitude distribution to be single-valued it
must go to zero at the center, as is indeed the case in Eq. (21.1). The host is here a
circular Gaussian beam having 1/e radius r0 at the waist, and the beam’s
propagation direction is along the Z-axis.
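The statement that |m| counts the 2π phase cycles around the singularity can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the sampling radius, and the number of samples are illustrative assumptions, and all lengths are expressed in units of the wavelength.

```python
import cmath
import math

def vortex_amplitude(x, y, m, x0=0.0, y0=0.0, r0=10.0):
    """Complex amplitude of Eq. (21.1); lengths are in units of the wavelength."""
    sign = 1 if m >= 0 else -1
    core = complex(x - x0, sign * (y - y0)) ** abs(m)
    return core * math.exp(-(x * x + y * y) / r0 ** 2)

def winding_number(field, cx, cy, radius, samples=720):
    """Net number of 2*pi phase cycles on a counterclockwise circle around (cx, cy)."""
    total = 0.0
    prev = field(cx + radius, cy)
    for k in range(1, samples + 1):
        angle = 2.0 * math.pi * k / samples
        cur = field(cx + radius * math.cos(angle), cy + radius * math.sin(angle))
        total += cmath.phase(cur / prev)  # phase step, wrapped into (-pi, pi]
        prev = cur
    return round(total / (2.0 * math.pi))

# The sampling radius (2 wavelengths) is an arbitrary choice for illustration.
for m in (1, -1, 3):
    charge = winding_number(lambda x, y: vortex_amplitude(x, y, m), 0.0, 0.0, 2.0)
    print("m =", m, "-> phase cycles around the center:", charge)
```

The Gaussian envelope is real and positive, so it does not affect the phase count; the amplitude itself vanishes at the vortex center, as the text requires.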
Figure 21.1 shows an m = +1 vortex centered at (x0, y0) = (0, 0) within a
Gaussian beam of radius r0 = 10λ, where λ is the wavelength of the light. The
intensity distribution in Figure 21.1(a) has a hole at the center, and the phase
distribution in Figure 21.1(b) displays a continuous variation from 0 to 2π around
the vortex. If this vortex is made to interfere with a tilted plane wave, the
resulting fringe pattern would resemble that in Figure 21.1(c). The fork at the
center of the fringe pattern, created by the splitting of a single fringe, is
characteristic of all first-order vortices.

† This chapter was coauthored with Ewan M. Wright of the College of Optical Sciences, University of Arizona.

Figure 21.1 A Gaussian beam, having 1/e radius r0 = 10λ at the waist, hosts an
m = +1 vortex at its center. (a) Intensity and (b) phase distribution at the beam
waist. (c) Interferogram with a tilted plane wave. (Axis range: x/λ from −15 to 15.)
An important feature of vortices is that they maintain their identity as
they propagate through space. Figure 21.2 shows the intensity and phase
distributions of the vortex of Figure 21.1 after propagating to the Rayleigh
range of the host Gaussian beam at z = πr0²/λ = 314λ. Clearly the central hole in
the intensity pattern and the 2π phase variation around the singularity are
preserved, even though the phase is mixed with the curvature acquired during
propagation.

Figure 21.2 The beam of Figure 21.1 propagates to its Rayleigh range at
z = 314λ. (a) Intensity, (b) phase. The phase singularity is now mixed with the
wavefront curvature. (Axis range: x/λ from −65 to 65.)
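As a quick check of the quoted propagation distance, the Rayleigh range of the host beam follows directly from the waist radius of 10 wavelengths used in Figure 21.1:

$$z_R = \frac{\pi r_0^2}{\lambda} = \frac{\pi (10\lambda)^2}{\lambda} = 100\pi\lambda \approx 314\lambda.$$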
The flow of energy

Figure 21.3 shows another example of an optical vortex, this one of order
m = 3. The intensity distribution in Figure 21.3(a) is similar to that of the
first-order vortex in Figure 21.1(a). However, the phase distribution depicted
in Figure 21.3(b) shows a continuous 6π variation around the center. The
fringe pattern in Figure 21.3(c), produced by allowing the vortex to interfere
with a tilted plane wave, has the characteristic fork; here one fringe splits
into four.
Figure 21.3 Same as Figure 21.1 for an m = 3 vortex. (Axis range: x/λ from −20 to 20.)

Figure 21.4 shows the distribution of the Poynting vector S for the above
vortex. Shown from top to bottom are the x-, y-, and z-components of S encoded
in gray-scale (black corresponds to a minimum, white to a maximum). The
normalized ranges of values are: −0.04 ≤ Sx ≤ 0.04, −0.04 ≤ Sy ≤ 0.04, and
0 ≤ Sz ≤ 1. So, for example, in Figure 21.4(a) the bright regions indicate that Sx is
directed along +X, while in the dark regions Sx is directed along −X. Similarly, Sy
on the left-hand side of Figure 21.4(b) is directed along +Y, while on the right-hand
side it is directed along −Y. In Figure 21.4(c), where Sz ≥ 0, large positive
values appear bright whereas those in the vicinity of zero are dark. Overall, S
spirals around the Z-axis. The electromagnetic energy, therefore, does not flow
straightforwardly along the optical axis but twists and turns as the beam moves
forward.
Figure 21.4 From top to bottom: x-, y-, and z-components of the Poynting
vector S for the vortex of Figure 21.3. The minimum value of each function is
shown as black and its maximum value as white, the intermediate values being
covered by the gray-scale. The depicted ranges of values (in normalized units)
are: −0.04 ≤ Sx ≤ 0.04, −0.04 ≤ Sy ≤ 0.04, 0 ≤ Sz ≤ 1. The Poynting vector here
has clockwise circulation around the optical axis. (Axis range: x/λ from −20 to 20.)
When the vortex of Figure 21.3 propagates to the Rayleigh range at z = 314λ,
the intensity and phase patterns of Figure 21.5 are obtained. As before, the
central hole in the intensity pattern and the singularity of the phase pattern are
preserved, but the phase is now mixed with the curvature of the diverging
wavefront.
Figure 21.5 The beam of Figure 21.3 is propagated to its Rayleigh range at
z = 314λ. (a) Intensity, (b) phase. The phase singularity is mixed with the
wavefront curvature. (Axis range: x/λ from −65 to 65.)
Vortex pair
The complex amplitude of a beam containing multiple vortices may be written as
the product of terms similar to those appearing in Eq. (21.1), namely,
$$A(x, y, z = 0) = \left\{ \prod_{n=1}^{N} [(x - x_n) + i\,\mathrm{sign}(m_n)\,(y - y_n)]^{|m_n|} \right\} \exp[-(x^2 + y^2)/r_0^2]. \qquad (21.2)$$
This equation represents N vortices nested in a Gaussian beam of radius r0;
the nth vortex, whose order is mn, is centered at the point (xn, yn) within the
XY-plane.
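The product form of Eq. (21.2) can be explored with a short numerical sketch. The code below is an illustration under stated assumptions rather than the method used to produce the figures: the function names, the loop radii, and the sample count are hypothetical choices, and lengths are in units of the wavelength. It counts the net number of 2π phase cycles on a closed loop, which equals the sum of the orders of the enclosed vortices.

```python
import cmath
import math

def multi_vortex_amplitude(x, y, vortices, r0=10.0):
    """Eq. (21.2): product of vortex cores times the Gaussian host beam.

    `vortices` is a list of (xn, yn, mn) tuples; lengths are in wavelengths.
    """
    amplitude = complex(math.exp(-(x * x + y * y) / r0 ** 2))
    for xn, yn, mn in vortices:
        sign = 1 if mn >= 0 else -1
        amplitude *= complex(x - xn, sign * (y - yn)) ** abs(mn)
    return amplitude

def net_charge(vortices, radius, center=(0.0, 0.0), samples=2000):
    """Net 2*pi phase cycles on a counterclockwise circle (must avoid the cores)."""
    cx, cy = center
    total = 0.0
    prev = multi_vortex_amplitude(cx + radius, cy, vortices)
    for k in range(1, samples + 1):
        angle = 2.0 * math.pi * k / samples
        cur = multi_vortex_amplitude(cx + radius * math.cos(angle),
                                     cy + radius * math.sin(angle), vortices)
        total += cmath.phase(cur / prev)  # wrapped phase increment
        prev = cur
    return round(total / (2.0 * math.pi))

# Two vortices separated by 11 wavelengths, placed symmetrically about the axis.
same_helicity = [(-5.5, 0.0, 1), (5.5, 0.0, 1)]
opposite_helicity = [(-5.5, 0.0, -1), (5.5, 0.0, 1)]
print(net_charge(same_helicity, 15.0), net_charge(opposite_helicity, 15.0))
```

A loop enclosing both members of an equal-helicity pair sees two phase cycles, whereas a loop enclosing an opposite-helicity pair sees none; a small loop around a single core recovers that vortex's own order.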
Figure 21.6 shows two identical m = +1 vortices, separated by a distance
d = 11λ in a Gaussian beam with r0 = 10λ. From left to right, the distributions
represent the beam at its waist (z = 0), at the Rayleigh range (z = 314λ), and in
the far field (z = 2000λ). Note that the three cross-sections in this figure are
plotted on different scales. The beam expands along the propagation path, of
course, but the vortices maintain their relative shape and position while
undergoing a collective 90° rotation around the optical axis between the waist
and the far field.8

Figure 21.6 A pair of identical m = +1 vortices separated by d = 11λ, nested
within a Gaussian beam having r0 = 10λ at the waist. Shown are the logarithmic
plots of intensity (top row) and phase (bottom row) at several distances
from the waist. (a) At the waist of the beam. (b) At the Rayleigh range,
z = 314λ. (c) At z = 2000λ. Note that the vortex pair survives into the far field
while rotating by 90°. (Axis ranges: x/λ from −30 to 30 in (a), −40 to 40 in (b),
and −165 to 165 in (c).)
The case of two vortices of opposite helicity is shown in Figure 21.7. Here
m = −1 for one vortex and m = +1 for the other. The initial separation between
the vortices at the beam waist is d = 11λ. As the beam propagates through free
space, the vortices appear to spread out and combine with each other. Eventually,
they carve out a circular niche for themselves, but the phase discontinuity near
the beam center survives all the way to the far field.
A somewhat different behavior will be observed when two vortices of opposite
polarity are separated at the beam waist by d < r0. The corresponding intensity
and phase patterns remain more or less the same as those in Figure 21.7 (which
are representative of the case d > r0) but, at some distance z from the waist, the
phase discontinuity near the beam center disappears.4 This behavior is reminis-
cent of fluid vortices of opposite chirality, which collide and annihilate when they
happen to be within each other’s basin of attraction.
Figure 21.7 A pair of vortices of opposite helicity, m = +1 and m = −1,
separated by d = 11λ, nested within a Gaussian beam having r0 = 10λ at the
waist. Shown are the logarithmic plots of intensity (top row) and phase (bottom
row) at several distances from the waist. (a) At the waist of the beam. (b) At the
Rayleigh range, z = 314λ. (c) At z = 2000λ. Note that the phase singularity
survives into the far field. (Axis ranges: x/λ from −30 to 30 in (a), −40 to 40 in (b),
and −165 to 165 in (c).)
Relation to Gauss–Hermite (or Laguerre) polynomials

It can be shown that N vortices of the type described by Eq. (21.2) can be written
as a superposition of Gauss–Hermite (or Gauss–Laguerre) polynomials of order
N. These polynomials, which describe the eigenmodes of certain waveguides,
are also the eigenmodes of free-space propagation in the paraxial regime. The
vortices formed by the superposition of such modes propagate in free space but,
while individual modes maintain their identities, different modes accrue different
phase shifts. As a result of these differing phase shifts the pattern of vortices
might change along the optical axis, but the main topological features of their
singularities are usually preserved.4,9
Resolving adjacent vortices

An interesting question concerning vortices is whether one can densely pack
them on a given waveform (or on a given surface) and use the resulting pattern
for data communication (or for information storage). Figure 21.8 is the schematic
of a coherent-light microscope that might be used to retrieve a dense
pattern of vortices recorded on a flat surface. The Gaussian beam entering the
system is narrow enough that truncation at the objective’s aperture may be
considered negligible. Upon focusing through the 0.95NA lens, the FWHM
diameter of the focused spot becomes 1.33λ. The focused spot is modulated
by the amplitude and phase reflectivity of the sample before returning to the
objective lens. At the beam-splitter the returning beam is diverted towards the
observation plane, where the intensity pattern may be examined directly, and
the phase pattern may be obtained by interference with a reference beam
(supplied by the mirror).

Figure 21.8 Densely packed vortices imprinted upon a sample’s flat surface
may be observed through a coherent-light microscope. The incident Gaussian
beam has 1/e radius r0 = 900λ. The entrance pupil of the 0.95NA objective lens,
having a radius of 3000λ, allows the Gaussian beam through with negligible
truncation. The beam reflected from the sample picks up the amplitude and phase
patterns of the vortices and returns through the objective lens. The phase
structure may be extracted by interference with the original Gaussian beam
reflected from the reference mirror. (Diagram labels: Gaussian beam, objective,
sample, mirror, observation plane.)
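The focused-spot size quoted for this microscope can be roughly cross-checked with a scalar, paraxial Gaussian-beam estimate. The sketch below rests on assumptions that are not stated in the text: the focal length is taken as the pupil radius divided by the numerical aperture (the sine condition), the focused waist is taken from the paraxial relation w = λf/(πr0), and the function name is hypothetical. A 0.95NA lens is far from paraxial, so only approximate agreement should be expected.

```python
import math

def focused_fwhm(r0, pupil_radius, numerical_aperture):
    """Paraxial estimate of the focused-spot intensity FWHM, in wavelengths.

    r0 and pupil_radius are in units of the wavelength.
    """
    # Assumption: sine condition, pupil radius = focal length * NA.
    focal_length = pupil_radius / numerical_aperture
    # Assumption: paraxial Gaussian focusing; 1/e amplitude radius at focus.
    waist = focal_length / (math.pi * r0)
    # Intensity varies as exp(-2 r^2 / waist^2), so FWHM = waist * sqrt(2 ln 2).
    return waist * math.sqrt(2.0 * math.log(2.0))

estimate = focused_fwhm(r0=900.0, pupil_radius=3000.0, numerical_aperture=0.95)
print("estimated FWHM in wavelengths:", estimate)
```

Under these assumptions the estimate comes out near 1.3 wavelengths, close to the 1.33λ figure given in the text.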
Figure 21.9 shows the patterns of intensity and phase at the focal plane of
the objective lens immediately after the beam is reflected from the sample.
There appear here a total of seven vortices within the focused beam area, all
with the same helicity, m = +1. The pair in the middle, having a center-to-center
spacing of λ/2, is at the resolution limit of conventional optical microscopy.
When the reflected beam reaches the observation plane, the patterns
shown in Figure 21.10 are obtained. Note that both the intensity and the phase
distribution at the observation plane are magnified versions of the original
distributions at the sample, albeit with an inconsequential 90° rotation around
the optical axis.

Figure 21.9 (a) Intensity and (b) phase distribution imparted to the focused
Gaussian beam in Figure 21.8 immediately after reflection from the sample’s
surface. There is a total of seven m = +1 vortices; the distance between the
closest pair, near the center, is 0.5λ. (Axis range: x/λ from −4 to 4.)
As was the case for two vortices of the same helicity (see Figure 21.6), the
beam in the present case has also preserved the seven equal-helicity vortices all
the way to the far field. To observe the phase structure, however, it is necessary to
interfere the beam returning from the sample with a reference beam. The
resulting interference pattern is shown in Figure 21.10(c). The split fringes cor-
responding to five of the vortices are clearly distinguishable in this pattern, even
for those vortices that are far from the beam center. However, the split fringes for
the two adjacent vortices near the center are hard to recognize and, in practice,
where the signal-to-noise ratio is limited, it is unlikely that these vortices can be
resolved. It appears, therefore, that the Rayleigh criterion for resolution in image-
forming systems applies to these vortices as well, even though the image quality
here is extremely good and no information seems to have been lost between the
sample and the observation plane.
Figure 21.10 Distributions of (a) intensity and (b) phase at the observation
plane of Figure 21.8 corresponding to the seven vortices of Figure 21.9. In (a) and
(b) the reference beam is blocked. In (c) the reference beam interferes with the
beam returning from the sample, thus creating fringes. The vortices may be
identified by the forks within these fringes. (Axis range: x/λ from −3100 to 3100.)

1 J. F. Nye and M. V. Berry, Dislocations in wave trains, Proc. Roy. Soc. London A
336, 165–190 (1974).
2 N. B. Baranova et al., Wavefront dislocations: topological limitations for adaptive
systems with phase conjugation, J. Opt. Soc. Am. 73, 525–528 (1983).
3 J. M. Vaughan and D. V. Willetts, Temporal and interference fringe analysis of
TEM01* laser modes, J. Opt. Soc. Am. 73, 1018–1021 (1983).
4 G. Indebetouw, Optical vortices and their propagation, J. Mod. Opt. 40, 73–87
(1993).
5 N. R. Heckenberg et al., Laser beams with phase singularities, Opt. Quant.
Electronics 24, S951–S962 (1992).
6 K. T. Gahagan and G. A. Swartzlander, Optical vortex trapping of particles, Opt.
Lett. 21, 827–829 (1996).
7 H. He et al., Direct observation of transfer of angular momentum to absorptive
particles from a laser beam with a phase singularity, Phys. Rev. Lett. 75, 826–829
(1995).
8 D. Rozas, Z. S. Sacks and G. A. Swartzlander, Experimental observation of fluidlike
motion of optical vortices, Phys. Rev. Lett. 79, 3399–3402 (1997).
9 M. W. Beijersbergen et al., Astigmatic laser mode converters and transfer of orbital
angular momentum, Opt. Comm. 96, 123–132 (1993).
22
Geometric-optical rays, Poynting’s vector,
and the field momenta
In isotropic media the rays of geometrical optics are usually obtained from the
surfaces of constant phase (i.e., wavefronts) by drawing normals to these surfaces
at various points of interest.1 It is also possible to find the rays from the eikonal
equation, which is derived from Maxwell’s equations in the limit when the
wavelength λ of the light is vanishingly small.2 Both methods provide a fairly
accurate picture of beam-propagation and electromagnetic-energy transport in
situations where the concepts of geometrical optics and ray-tracing are applic-
able. The artifact of rays, however, breaks down near caustics and focal points
and in the vicinity of sharp boundaries, where diffraction effects and the vectorial
nature of the field can no longer be ignored.
It is possible, however, to define the rays in a rigorous manner (consistent with
Maxwell’s electromagnetic theory) such that they remain meaningful even in those
regimes where the notions of geometrical optics break down. Admittedly, in such
regimes the rays are no longer useful for ray-tracing; for instance, the light rays no
longer propagate along straight lines even in free space. However, the rays continue
to be useful as they convey information about the magnitude and direction of the
energy flow, the linear momentum of the field (which is the source of radiation
pressure), and the angular momentum of the field. Such properties of light are
currently of great practical interest, for example, in developing optical tweezers,
where focused laser beams control the movements of small objects.3,4,5,6 Similarly,
the manipulation of atoms and molecules with laser beams is presently an active
area of research that has tremendous potential for future applications.7
Computing the Poynting vector

For a coherent, monochromatic beam of light the time-averaged Poynting vector
S at any point (x, y, z) in space can represent the direction and magnitude of the
corresponding ray. Computing S is fairly straightforward and involves only a few
fast-Fourier transformations (FFTs). Outlined below is a method of calculating S
for a beam in free space, but the method can readily be generalized to material
environments as well. Throughout this chapter the adopted system of units
is MKS, c is the speed of light in vacuum, ε0 is the permittivity of free space
(ε0c² = 10⁷/4π), and h is Planck’s constant.
With the beam’s propagation direction fixed along the Z-axis, consider the
distribution of the E-field in the beam’s cross-sectional plane, XY. The only
components of E needed for calculating S are Ex and Ey. To compute the
Poynting vector, decompose the beam into its plane-wave spectrum. This requires
one FFT for Ex(x, y) and another for Ey(x, y). For each plane wave thus obtained
compute the Z-component Ez of the E-field using the requirement k · E = 0.
Here k = 2πσ/λ is the wave-vector for the plane wave propagating along the unit
vector σ. The knowledge of E and k for each plane wave leads directly to the
corresponding magnetic field B = σ × E/c. At this point all six components of E
and B for each plane wave are determined; therefore, an inverse FFT on each
such component would yield the complete E- and B-fields within the cross-sectional
XY-plane. Finally, the time-averaged Poynting vector may be obtained
from S = ½ ε0c² Real(E × B*).8
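The per-plane-wave part of this recipe can be sketched in a few lines. The code below is a minimal illustration, not the author's implementation: it treats a single plane-wave component only and omits the forward and inverse FFT steps, so it does not compute the Poynting vector of a full beam (for a beam, the fields of all components must be summed by the inverse FFT before E × B* is formed). The function names and the sample amplitudes are hypothetical.

```python
import math

C = 299_792_458.0                        # speed of light in vacuum (m/s)
EPS0 = 1.0e7 / (4.0 * math.pi * C ** 2)  # follows from eps0 * c^2 = 1e7 / (4 pi)

def cross(a, b):
    """Cross product of two 3-component tuples (entries may be complex)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_wave_poynting(ex, ey, direction):
    """Fields and time-averaged Poynting vector of one plane-wave component.

    `direction` is the real unit propagation vector; its z-component must be
    nonzero so that Ez can be solved from the transversality condition.
    """
    sx, sy, sz = direction
    ez = -(sx * ex + sy * ey) / sz                  # enforce k . E = 0
    e_field = (complex(ex), complex(ey), complex(ez))
    b_field = tuple(comp / C for comp in cross(direction, e_field))
    b_conj = tuple(comp.conjugate() for comp in b_field)
    s = tuple(0.5 * EPS0 * C ** 2 * comp.real for comp in cross(e_field, b_conj))
    return e_field, b_field, s

# Hypothetical example: a plane wave tilted away from the Z-axis.
theta, phi = 0.3, 1.1
direction = (math.sin(theta) * math.cos(phi),
             math.sin(theta) * math.sin(phi),
             math.cos(theta))
e_field, b_field, s = plane_wave_poynting(1.0 + 0.5j, -0.2j, direction)
print(s)
```

For a single plane wave the result points along the propagation direction and has magnitude ½ ε0 c |E|², which provides a convenient sanity check on the sign conventions.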
Rays of a linearly polarized Gaussian beam

As an example, consider a Gaussian beam of wavelength λ at its waist, having 1/e
(amplitude) radii Rx = 15λ along X, Ry = 10λ along Y. Assuming that the beam is
linearly polarized in the X-direction, its Poynting vector may be computed by the
aforementioned method. Figure 22.1(a) shows the distribution of intensity for the
x-component of polarization, Ix = |Ex|². By definition, this beam is linearly
polarized along X and has no Ey, but Ez, albeit very small, is not zero, as can be
seen in Figure 22.1(b). The phase of Ez (not shown) is +90° on the left-hand side
and −90° on the right-hand side of the beam’s cross-section. For this beam it
turns out that Sx = Sy = 0, and only Sz ≠ 0; a plot of Sz at the waist of the beam is
shown in Figure 22.1(c). Clearly Sz is strongest at the beam center and decays
with increasing distance from the center, behaving very much like Ix does. The
fact that S in this case is everywhere parallel to the Z-axis is consistent with one’s
intuitive expectation that, at its waist, the Gaussian beam should be collimated.
Figure 22.1 Various distributions at the waist of a Gaussian beam having 1/e
(amplitude) radii Rx = 15λ, Ry = 10λ; the beam is linearly polarized along the
X-axis. (a) Intensity of the x-component of polarization, Ix = |Ex|². (b) Intensity
of the z-component of polarization, Iz = |Ez|². In (a) and (b) the peak intensities
are in the ratio Ix : Iz = 1.0 : 0.83 × 10⁻⁴. (c) A plot of Sz, the projection of the
Poynting vector S along the optical axis. Sz(x, y) ≥ 0 is encoded in gray-scale
(black, minimum; white, maximum). The other components of S, namely, Sx and
Sy, are exactly zero at this cross-section. (Axis range: x/λ from −20 to 20.)

When the beam propagates away from the waist, it acquires curvature and the
rays exhibit behavior characteristic of a divergent beam. The Cartesian components
of the Poynting vector (Sx, Sy, Sz) all turn out to be nonzero in this case.
Figure 22.2 shows various distributions for the above Gaussian beam at a distance
of z = 800λ from the waist. Shown in the left-hand column are the intensity
profiles for the three Cartesian components of E. The peak intensities in these
figures are in the ratios Ix : Iy : Iz = 1.0 : 0.39 × 10⁻⁸ : 0.83 × 10⁻⁴. Whereas the
beam at the waist is elongated along X, at z = 800λ it is elongated along Y; this is
a natural consequence of diffractive propagation. The phase plots in Figure 22.2,
middle column, reveal the acquired curvature of the beam, as well as a π phase
difference between the adjacent quadrants of Ey and the two halves of Ez. The
general structure of the intensity and phase patterns depicted here may be readily
understood in terms of the symmetries of the Gaussian beam and the basic
properties of electromagnetic radiation.

Figure 22.2 The Gaussian beam of Figure 22.1 after propagating a distance
of z = 800λ in free space. The left-hand column shows, from top to bottom,
the distributions of intensity for the x-, y-, and z-components of polarization;
the peak intensities are in the ratios Ix : Iy : Iz = 1.0 : 0.39 × 10⁻⁸ : 0.83 × 10⁻⁴.
The middle column shows the corresponding phase plots for Ex, Ey, Ez; the
gray-scale covers the range −180° (black) to +180° (white). The third column
shows the Cartesian components of the Poynting vector, Sx, Sy, Sz, in
gray-scale (black, minimum; white, maximum). Here the normalized ranges
of values are: −0.48 ≤ Sx ≤ 0.48, −0.9 ≤ Sy ≤ 0.9, 0 ≤ Sz ≤ 100. Symmetry
with respect to the optical axis ensures that the angular momentum of the
field around this axis is zero. Note that the dimensions are not the same in
the three columns. (Axis ranges: x/λ from −40 to 40, −75 to 75, and −40 to 40
in the left, middle, and right columns.)
Shown in the right-hand column of Figure 22.2 are, from top to bottom,
the x-, y-, and z-components of S encoded in gray-scale (black corresponds
to a minimum, white to a maximum). The normalized ranges of values
are: −0.48 ≤ Sx ≤ 0.48, −0.9 ≤ Sy ≤ 0.9, 0 ≤ Sz ≤ 100. So, for example, in the top
frame the bright regions indicate that Sx is directed along +X, while in the dark
regions Sx points toward −X. Similarly, Sy in the upper half of the middle frame
points along +Y, while it is directed along −Y in the lower half. In the bottom frame
where Sz ≥ 0, the large positive values appear bright and those in the vicinity of zero
appear dark. As expected, these plots of Sx, Sy, Sz represent a divergent beam.
The case of circular polarization

Let us consider once again the Gaussian beam for which the computed Poynting
vector at the waist was shown in Figure 22.1(c). This time, however, we assume
that the polarization state of the beam is circular rather than linear. The Cartesian
components of S for this circularly polarized Gaussian beam at the waist
are shown in Figure 22.3. The normalized ranges of values are: −0.96 ≤ Sx ≤ 0.96,
−0.64 ≤ Sy ≤ 0.64, 0 ≤ Sz ≤ 100. Although Sx and Sy are nonzero at the
waist, they exhibit neither a convergent nor a divergent behavior. Indeed the
projection of S in the XY-plane, Sx x̂ + Sy ŷ, shows only a counterclockwise
circulation. (Reversing the sense of circular polarization of the beam would
reverse the circulation of S as well.) From the standpoint of geometrical
optics this behavior of the rays is totally unexpected, since the state of
polarization should affect neither the magnitude nor the direction of the rays.
Nonetheless, taking into account the full distribution of the fields (especially
the components Ez and Bz) yields for circularly polarized light a non-zero
projection of S in the XY-plane, in sharp contrast to the case of linear
polarization where Sx = Sy = 0.
Linear and angular momenta of the field

It is well known that the momentum density of the field is directly proportional to
S. (Feynman et al.8 give a beautiful exposition of the concept of field momentum
density and its relation to the Poynting vector, p = S/c².) The field’s angular
momentum is then computed by integrating r × p over the volume of interest.
Here r is the position vector and p is the field’s momentum density at location r.
Since the momentum distribution of the circularly polarized Gaussian beam
depicted in Figure 22.3 has a net circulation in the XY-plane, it follows that the
beam carries a net angular momentum around the optical axis Z. If, for instance,
such a beam is absorbed by a particle, it will exert a torque on the particle due to
the transferred angular momentum.9
Figure 22.3 From top to bottom: plots of Sx, Sy, Sz at the waist of a circularly
polarized Gaussian beam having 1/e radii (Rx, Ry) = (15λ, 10λ). The normalized
ranges of values are: −0.96 ≤ Sx ≤ 0.96, −0.64 ≤ Sy ≤ 0.64, and 0 ≤ Sz ≤ 100.
The counterclockwise circulation of S around the optical axis gives rise to the
beam’s angular momentum around this axis. (Axis range: x/λ from −20 to 20.)

If one expands the Gaussian beam by enlarging its cross-sectional area (while
maintaining its total optical power), Sx and Sy decrease faster than Sz and, in the
limit of an infinitely large beam (i.e., a plane wave), Sx and Sy vanish. Does this
mean that a circularly polarized plane wave does not carry angular momentum?
The answer is no, because while Sx and Sy diminish with the expansion of the
beam they also spread over a larger area, yielding the same final value for the
integrated r × p over the beam’s cross-section.10 This is also in agreement with
the quantum picture of light, where a circularly polarized photon of frequency ν
carries energy hν and a unit of angular momentum h/2π.
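The photon picture gives a quick order-of-magnitude estimate of the mechanical effects mentioned in this section. The sketch below assumes a fully absorbing particle that intercepts the entire beam, with every photon delivering its energy, its linear momentum, and one unit h/2π of angular momentum; the function names and the 1 mW, 633 nm example are hypothetical and chosen only for illustration.

```python
import math

PLANCK_H = 6.62607015e-34   # Planck's constant (J s)
C = 299_792_458.0           # speed of light in vacuum (m/s)

def photon_rate(power_w, wavelength_m):
    """Photons per second in a beam of the given power (photon energy h*c/wavelength)."""
    return power_w / (PLANCK_H * C / wavelength_m)

def torque_on_absorber(power_w, wavelength_m):
    """Torque (N m) if every absorbed photon delivers h / (2 pi) of angular momentum."""
    return photon_rate(power_w, wavelength_m) * PLANCK_H / (2.0 * math.pi)

def force_on_absorber(power_w):
    """Radiation force (N) on a perfect absorber, from momentum density p = S / c^2."""
    return power_w / C

# Hypothetical example: 1 mW of circularly polarized light at 633 nm.
print("torque (N m):", torque_on_absorber(1.0e-3, 633.0e-9))
print("force (N):", force_on_absorber(1.0e-3))
```

Planck's constant cancels in the torque, which reduces to the optical power divided by the angular frequency 2πν; for the example values this is of the order of 10⁻¹⁹ N·m.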
Spin versus orbital angular momentum

In Chapter 21 we discussed optical vortices and showed that their Poynting-
vector distribution over the beam’s cross-section exhibits a circulation similar to
that seen here in Figure 22.3. In the case of these vortices the state of polarization
was linear and the circulation of S arose from the particular phase structure of the
beam, whereas in the present case the phase is uniform but the polarization is
circular. These two cases have often been compared to the orbital and spin
angular momenta of bound electrons but, in reality (unless the beam is treated in
the paraxial approximation), the two contributions to angular momentum are
intermixed, making it difficult to distinguish one from the other. All one can say
in the general case is that the field has a net angular momentum, which is
obtained by integrating r × p over the beam’s cross-section.10,11
Rays at the focal plane of a lens

As a final example, we show the complex pattern of ray distribution that can be
obtained by focusing a relatively simple beam through a diffraction-limited
microscope objective lens. Consider a beam of constant amplitude and phase, but
non-uniform polarization, in which one side is linearly polarized at +45° and the
other side at −45° relative to the X-axis. The distribution of polarization angle
over the beam’s cross-section is shown in Figure 22.4(a). Let this beam be
brought to focus by an aberration-free 0.5NA lens. The distribution of total E-field
intensity (i.e., |Ex|² + |Ey|² + |Ez|²) at the focal plane is shown in Figure 22.4(b).
Note the elongation of the focused spot along the X-axis, which is a consequence
of the particular polarization pattern of the incident beam. Figure 22.5, left column,
shows the computed intensity distributions for the x-, y-, and z-components
of polarization at the focal plane. The corresponding phase patterns are shown in
the middle column. Of particular interest here are the focal-plane distributions of
Sx, Sy, Sz, shown in the right-hand column. There are two equal but opposite
vortices in this picture, which may be discerned by considering the combined
effects of Sx and Sy. A schematic diagram of the projection of S in the focal plane,
namely, Sx x̂ + Sy ŷ, is given in Figure 22.6. These, as well as more complex
momentum distributions, can now be routinely created in the laboratory and
used to trap and manipulate small objects within the confines of the focal region
of a microscope.

Figure 22.4 A coherent, monochromatic, and collimated beam having constant
amplitude and phase but nonuniform polarization enters the pupil of an
aberration-free, 0.5NA lens. (a) Distribution of the beam’s polarization angle at the
entrance pupil. Both halves are linearly polarized, the right at +45° and the left
at −45° with respect to the X-axis. (b) Logarithmic plot of total E-field intensity
at the focal plane of the lens. (Axis ranges: x/λ from −3200 to 3200 in (a) and
−5 to 5 in (b).)

Figure 22.5 Various distributions for the focused spot of Figure 22.4(b). The
left-hand column shows, from top to bottom, plots of intensity for the x-, y-,
and z-components of polarization. The peak intensities are in the ratios Ix :
Iy : Iz = 0.49 : 1.0 : 0.06. The corresponding phase plots appear in the middle
column, where the gray-scale covers the range −180° (black) to +180° (white).
The right-hand column shows plots of Sx, Sy, Sz in gray-scale (black, minimum;
white, maximum). The normalized ranges of values are: −9.5 ≤ Sx ≤ 9.5,
−22.6 ≤ Sy ≤ 12.9, 0 ≤ Sz ≤ 100. Note that the dimensions are not the same in
the three columns. (Axis ranges: x/λ from −2.5 to 2.5, −5 to 5, and −2.5 to 2.5
in the left, middle, and right columns.)

Figure 22.6 A schematic diagram showing the vortex structure of the Poynting
vector for the focused spot depicted in Figures 22.4(b) and 22.5. The arrows
represent the projection of S in the focal plane, namely, Sx x̂ + Sy ŷ.

1 M. V. Klein, Optics, Wiley, New York (1970).
1980.
3 A. Ashkin, J. M. Dziedzic, J. E. Bjorkholm, and S. Chu, Observation of a single-
beam gradient force optical trap for dielectric particles, Opt. Lett. 11, 288–290
(1986).
4 K. T. Gahagan and G. A. Swartzlander, Optical vortex trapping of particles, Opt.
Lett. 21, 827–829 (1996).
5 H. He et al., Direct observation of transfer of angular momentum to absorptive particles
from a laser beam with a phase singularity, Phys. Rev. Lett. 75, 826–829 (1995).
6 M. W. Berns, Laser scissors and tweezers, Scientific American 278, 62–67 (April 1998).
7 E. A. Cornell and C. E. Wieman, The Bose–Einstein condensate, Scientific
American 278, 40–45 (March 1998).
8 R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics,
Addison-Wesley, Reading, Massachusetts (1964). See Vol. I, section 34–9, and Vol.
II, chapter 27.
9 M. Kristensen and J. P. Woerdman, Is photon angular momentum conserved in a
dielectric medium?, Phys. Rev. Lett. 72, 2171–2174 (1994).
10 H. A. Haus and J. L. Pan, Photon spin and the paraxial wave equation, Am. J. Phys.
61, 818–821 (1993).
11 S. M. Barnett and L. Allen, Orbital angular momentum and nonparaxial light
beams, Opt. Commun. 110, 670–678 (1994).
23
Doppler shift, stellar aberration, and convection
of light by moving media
The characteristics of a beam of light emanating from a source in uniform
motion with respect to an observer differ from those measured when the source
is stationary. In general, it is irrelevant whether the source is stationary and the
observer in motion or vice versa; the observed characteristics depend only on
the relative motion. The observed frequency of the light, for example, has been
known to depend on this relative motion since the Austrian physicist Christian
Doppler (1842) showed the effect to exist both for sound waves and light
waves.1
The perceived direction of propagation of a light beam also depends on the
relative motion of its source and the observer. The English astronomer James
Bradley (1727) was the first to argue that the motion of the Earth in its orbit
around the Sun causes a periodic shift of the apparent position of fixed stars as
observed from the Earth; a telescope viewing a star must be tilted in the direction
of the Earth’s motion. Although this so-called stellar aberration could be
explained on the basis of the corpuscular theory of light accepted at the time,1
certain features of it remained poorly understood until the advent of Einstein’s
special theory of relativity in 1905.
The mid-nineteenth century measurements of the speed of light in moving
media could be made to agree with the prevailing theories at the time only if one
assumed that the moving medium partially carried the luminiferous ether, the
hypothetical medium which filled the Universe and in which the light waves
propagated. The magnitude of this ether convection depended on the velocity as
well as the refractive index of the moving medium.1 Since the refractive index is
wavelength dependent, implicit in these theories was the assumption that dif-
ferent ethers exist for different light colors, each being carried at a different rate
by the moving medium.
This ad hoc and unsatisfactory state of affairs came to an end with the advent
of the special theory of relativity. Doppler shift, stellar aberration, and the
convection of light by moving media are the various manifestations of the same
fundamental phenomenon: different relative perceptions of space and time for
observers in motion with respect to one another. In this chapter we derive general
formulas for all three phenomena by applying the Lorentz transformation to a
plane electromagnetic wave. Examples will be used to clarify the physics behind
the formulas.
Plane waves and the Lorentz transformation

A plane electromagnetic wave of frequency f propagating in free space along a
direction specified by the polar and azimuthal angles (θ, φ) within a Cartesian XYZ
coordinate system has the following form:

\[ a(x, y, z, t) = A_0 \exp\{ i2\pi (f/c) [(\sin\theta\cos\phi)\,x + (\sin\theta\sin\phi)\,y + (\cos\theta)\,z - ct] \}. \tag{23.1a} \]

Here c is the speed of light in free space, and the complex vector A0 denotes the
strength of the field at the origin of the coordinate system. Let us define the
coordinates of a point in space-time as p = (x, y, z, ict). The coefficients
appearing in the exponent of the plane wave in Eq. (23.1a) can then be grouped
together as r = [sin θ cos φ, sin θ sin φ, cos θ, i], and the equation may be written
in compact form,

\[ a(x, y, z, t) = A_0 \exp[\, i2\pi (f/c)\, r\, p^{T} ], \tag{23.1b} \]
where superscript T denotes a transposed vector. A second inertial system, X′Y′Z′,
in which the XYZ system moves with uniform velocity V along the common
Z-direction, is shown in Figure 23.1. The origins of the two systems coincide at
t = t′ = 0. The point p′ = (x′, y′, z′, ict′) in the new coordinate system is related to
p by the Lorentz transformation,1,2

\[ p^{T} = L\, p'^{\,T}, \tag{23.2a} \]

where the 4 × 4 transformation matrix L is given by

\[
L = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \dfrac{1}{\sqrt{1-(V/c)^2}} & \dfrac{i(V/c)}{\sqrt{1-(V/c)^2}} \\
0 & 0 & \dfrac{-i(V/c)}{\sqrt{1-(V/c)^2}} & \dfrac{1}{\sqrt{1-(V/c)^2}}
\end{pmatrix}. \tag{23.2b}
\]
Figure 23.1 In the Cartesian XYZ coordinate system a plane wave of frequency f
(wavelength = λ) propagates along the unit vector u. The polar and azimuthal
angles of u are denoted by θ and φ. As seen from another system, X′Y′Z′, the XYZ
system moves at a constant velocity V along the Z-axis. From the perspective of an
observer stationary in X′Y′Z′, the plane wave is Doppler shifted to a different
frequency f′, and the polar angle of its propagation direction has a different value
θ′. The azimuthal angle φ, however, remains the same in the two systems.
(Recall that c is a constant, having the same value in any frame of reference in
which it is measured.) L is a complex orthogonal matrix: its inverse is the same
as its transpose, i.e., L L^T equals the 4 × 4 identity matrix. We substitute for p^T in
Eq. (23.1b) from Eq. (23.2a), and evaluate rL as follows:

\[ rL = \frac{1 + (V/c)\cos\theta}{\sqrt{1-(V/c)^2}}\; [\sin\theta'\cos\phi,\; \sin\theta'\sin\phi,\; \cos\theta',\; i]. \tag{23.3a} \]

Here

\[ \sin\theta' = \sqrt{1-(V/c)^2}\,\sin\theta \,/\, [1 + (V/c)\cos\theta], \tag{23.3b} \]

\[ \cos\theta' = (\cos\theta + V/c) \,/\, [1 + (V/c)\cos\theta]. \tag{23.3c} \]

It is readily verified from Eqs. (23.3b) and (23.3c) that sin²θ′ + cos²θ′ = 1,
which is needed if the above definition of θ′ is to be meaningful. We conclude
that the plane wave in XYZ remains a plane wave in X′Y′Z′, albeit with a
different frequency and a different propagation direction.
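The factorization of rL in Eq. (23.3a) can be checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation; the function names are hypothetical, and the placement of the ±i(V/c) off-diagonal elements is an assumption chosen so that the product rL reproduces Eqs. (23.3b) and (23.3c).

```python
import math

def lorentz_matrix(beta):
    """4x4 matrix L of Eq. (23.2b), acting on (x, y, z, ict); beta = V/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, gamma, 1j * beta * gamma],    # assumed sign placement
        [0.0, 0.0, -1j * beta * gamma, gamma],
    ]

def transform_plane_wave(theta, phi, beta):
    """Return (f'/f, theta') by forming r L and factoring it as in Eq. (23.3a)."""
    r = [math.sin(theta) * math.cos(phi),
         math.sin(theta) * math.sin(phi),
         math.cos(theta),
         1j]
    L = lorentz_matrix(beta)
    rL = [sum(r[k] * L[k][j] for k in range(4)) for j in range(4)]
    ratio = (rL[3] / 1j).real                 # common factor, equal to f'/f
    sin_tp = math.hypot(rL[0].real, rL[1].real) / ratio
    cos_tp = rL[2].real / ratio
    return ratio, math.atan2(sin_tp, cos_tp)

# Angles in radians; the values below are arbitrary illustrations.
ratio, theta_p = transform_plane_wave(math.radians(60.0), 0.3, 0.5)
print(ratio, math.degrees(theta_p))
```

Because the first two components of r are untouched by L, the azimuthal angle is unchanged, which is why the sketch only returns the frequency ratio and the new polar angle.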
Doppler shift
Replacing r p^T in Eq. (23.1b) with r L p′^T and using Eq. (23.3a), it becomes
clear that the optical frequency f′ of the plane wave as measured in the X′Y′Z′
system is given by

\[ f' = f\,\frac{1 + (V/c)\cos\theta}{\sqrt{1-(V/c)^2}}. \tag{23.4} \]

This is the relativistic formula for the Doppler shift,2 valid for all propagation
directions θ and all speeds V. When θ = 0°, the propagation direction and the
motion of the observer are antiparallel. In this case f′ is greater than f (i.e.,
blue-shifted) according to the following formula:

\[ f' = f \sqrt{[1 + (V/c)]\,/\,[1 - (V/c)]}. \tag{23.5a} \]

When θ = 180°, the propagation direction and the motion of the observer are
parallel, in which case f′ is less than f (i.e., red-shifted) as follows:

\[ f' = f \sqrt{[1 - (V/c)]\,/\,[1 + (V/c)]}. \tag{23.5b} \]

When θ = 90°, the observer is moving at right angles to the propagation direction.
The classical analysis does not yield any Doppler shift in this case,1 but the
relativistic formula yields

\[ f' = f \,/\, \sqrt{1-(V/c)^2}. \tag{23.5c} \]

It is possible to have a direction of propagation with no Doppler shift at all, i.e.,
f′ = f. From Eq. (23.4) this direction is found to be:

\[ \cos\theta = \left[\sqrt{1-(V/c)^2} - 1\right] / (V/c). \tag{23.6} \]

Substitution into Eqs. (23.3b) and (23.3c) reveals that, when the above condition
is satisfied, θ′ = 180° – θ.
Based on Eq. (23.4), Figure 23.2(a) shows plots of f′/f versus V/c for several values
of θ, while Figure 23.2(b) shows plots of f′/f versus θ for different values of V/c. At a
given velocity V, the beam is blue-shifted when θ = 0°, i.e., the observer moves
opposite to the direction of propagation, and red-shifted when θ = 180°, i.e., the
observer moves along the propagation direction. If θ is varied continuously from 0° to
180°, the frequency changes from blue-shifted to red-shifted, becoming equal to f
somewhere after θ = 90°. As V increases, the crossing point occurs at larger angles θ.
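The behavior described above can be reproduced with a few lines of Python. This is only an illustrative sketch of Eqs. (23.4) and (23.6); the function names and the sample velocities are assumptions made for the example.

```python
import math

def doppler_ratio(beta, theta_deg):
    """f'/f from Eq. (23.4); beta = V/c, theta in degrees."""
    theta = math.radians(theta_deg)
    return (1.0 + beta * math.cos(theta)) / math.sqrt(1.0 - beta**2)

def zero_shift_angle(beta):
    """Polar angle (degrees) at which f' = f, from Eq. (23.6); needs 0 < beta < 1."""
    return math.degrees(math.acos((math.sqrt(1.0 - beta**2) - 1.0) / beta))

for beta in (0.1, 0.5, 0.9):
    print(beta,
          doppler_ratio(beta, 0.0),      # blue shift, Eq. (23.5a)
          doppler_ratio(beta, 90.0),     # transverse shift, Eq. (23.5c)
          doppler_ratio(beta, 180.0),    # red shift, Eq. (23.5b)
          zero_shift_angle(beta))        # crossing point, beyond 90 degrees
```

The last column moves to larger angles as V/c grows, in line with the trend noted for Figure 23.2(b).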
Figure 23.2 A plane wave of frequency f and propagation direction (θ, φ) in
the XYZ coordinates is observed from the X′Y′Z′ system of Figure 23.1. The
Doppler-shifted frequency f′ seen by the observer is a function of V and θ, but
does not depend on φ. (a) Plots of f′/f versus V/c for several values of θ (0° to
180° in steps of 30°). (b) Plots of f′/f versus θ for different values of V/c (0.1,
0.3, 0.5, 0.7, 0.9).

Stellar aberration
The direction of propagation of the beam perceived by the observer in the X′Y′Z′
frame has polar angle θ′, given by Eqs. (23.3b) and (23.3c), and azimuthal angle
φ. Dividing Eq. (23.3b) by Eq. (23.3c) yields2

\[ \tan\theta' = \sqrt{1-(V/c)^2}\,\sin\theta \,/\, (\cos\theta + V/c). \tag{23.7} \]
Figure 23.3(a) shows plots of θ′ as a function of V/c for several values of θ.
Similarly, Figure 23.3(b) shows plots of θ′ versus θ for different values of V/c. It
is clear that, for a given θ, the apparent direction of propagation depends on the
relative velocity between the observer and the light source. For instance, if a
telescope is aimed at a distant star far above the plane of the Solar System, then in
a reference frame where the star is stationary, θ = 90°. However, for an Earth-bound
observer, cos θ′ = V/c, where V ≈ 31 km/s is the speed of the Earth in its
orbit around the Sun. As the Earth travels in its orbit, the direction of V changes
and so does the apparent location of the star. In a six-month period, cos θ′
changes by 2V/c, causing a shift of Δθ′ ≈ 0.012° ≈ 43″ in the star’s apparent
location. (In contrast, the size of the parallax for the nearest star as measured from
the same location on the Earth over a six-month period is less than 2″.)
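The size of the six-month shift can be checked directly from Eq. (23.7). The sketch below is illustrative only: the function name is hypothetical, the orbital speed is the value quoted in the text, and the reversal of the Earth's velocity after six months is modeled simply by changing the sign of V/c.

```python
import math

C_KM_PER_S = 299_792.458   # speed of light in km/s

def apparent_polar_angle(theta_deg, beta):
    """theta' in degrees from Eq. (23.7); beta = V/c may be negative."""
    theta = math.radians(theta_deg)
    numerator = math.sqrt(1.0 - beta**2) * math.sin(theta)
    denominator = math.cos(theta) + beta
    return math.degrees(math.atan2(numerator, denominator))

beta = 31.0 / C_KM_PER_S   # orbital speed quoted in the text, in units of c

# Star at theta = 90 degrees in its own rest frame; velocity reversed six months later.
shift_deg = apparent_polar_angle(90.0, -beta) - apparent_polar_angle(90.0, +beta)
shift_arcsec = 3600.0 * shift_deg
print(shift_deg, shift_arcsec)
```

Using atan2 rather than a plain arctangent keeps θ′ in the range 0° to 180° even when cos θ + V/c is negative.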
Figure 23.3 A plane wave of frequency f and propagation direction (θ, φ) in
the XYZ coordinates is observed from the X′Y′Z′ system of Figure 23.1. The polar
angle θ′ of the beam seen by the observer is a function of V and θ, but does
not depend on φ. (a) Plots of θ′ versus V/c for several values of θ (30° to 150°
in steps of 30°). (b) Plots of θ′ versus θ for different values of V/c (–0.9 to 0.9).
Diffraction of light from a grating in uniform motion

Figure 23.4(a) shows a uniform beam of frequency f (and wavelength λ = c/f)
focused onto a diffraction grating through an objective lens of numerical aperture
NA. The grating’s period P is sufficiently small to allow only the 0th, –1st, and
+1st orders of diffraction to appear upon reflection. The three cones of light thus
reflected from the grating are captured by the lens in the return path. The partial
overlap of the diffracted cones at the lens’s exit pupil (resulting in interference
among them) gives rise to the so-called baseball pattern of intensity distribution.
When the grating moves at a constant velocity V along the Z-axis, the contrast
within the baseball pattern shows a periodic oscillation. This is caused by the
variation of the relative phase between the ±1st and the 0th diffracted orders, the
phase being dependent on the position of the focused spot relative to the grooves
on the grating.
Figure 23.4 (a) A monochromatic plane wave of wavelength λ is focused,
through an objective lens of numerical aperture NA, onto a diffraction grating. The
period P of the grating is small enough to support only the 0th and ±1st diffraction
orders upon reflection. The three cones of light returning from the grating partially
overlap in the exit pupil, giving rise to a “baseball pattern.” When the grating
moves at constant velocity V in the focal plane, the contrast within the baseball
pattern shows a periodic oscillation. (b) Trapezoidal profile of a grating having
period P = 1.2λ and groove depth = 0.15λ. (c) Logarithmic plot of intensity
distribution at the focal plane of a uniformly illuminated 0.6 NA objective.

As a specific example, Figure 23.4(b) shows the surface profile of a grating
with a trapezoidal cross-section having period P = 1.2λ and groove depth
= 0.15λ. Figure 23.4(c), a logarithmic plot of the intensity distribution at the focal
plane of a 0.6NA objective, shows the diameter of the central bright spot – the
Airy disk – to be 1.22λ/NA ≈ 2λ. Figure 23.5 shows computed patterns of
reflected intensity at the exit pupil of the objective for several positions of the
focused spot over the grating. From (a) to (i), the groove center’s distance from
the center of the focused spot is 0, 0.2λ, 0.4λ, 0.5λ, 0.6λ, 0.8λ, λ, 1.1λ, and 1.2λ,
respectively. In these simulations the grating is assumed to be stationary in its
various positions relative to the lens.
Figure 23.5 Patterns of intensity distribution observed at the exit pupil of the
objective of Figure 23.4. From (a) to (i) the distance between the groove center and
the center of the focused spot is 0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.1, and 1.2 (in units of λ).

An alternative (and physically more accurate) explanation of the baseball
patterns of Figure 23.5 may be based on the Doppler shift between the 0th order
and the ±1st-order diffracted light cones depicted in Figure 23.4. From the
viewpoint of an observer in the grating’s rest frame, the incident cone of light
moves with velocity V along the Z-axis. This cone is a superposition of a
multitude of plane waves of differing directions and frequencies. With reference
to Figure 23.6, consider a plane wave of frequency f and propagation direction
(θ, φ) in the XYZ coordinate system in which the lens is stationary. This plane wave,
when seen from the grating’s rest frame, has frequency f′ given by Eq. (23.4) and
propagation direction (θ′, φ) given by Eq. (23.7). For the 0th-order reflected plane
wave, the frequency remains f′ but the propagation direction becomes (θ′, –φ).
Viewed from the rest frame of the lens, this reflected 0th-order beam has frequency
f and propagation direction (θ, –φ). Thus the 0th-order reflected cone – which is
simply a superposition of various reflected 0th-order plane waves – seen by the lens
is ignorant of the velocity V of the grating.
As for the +1st-order beam, in the grating’s rest frame the diffracted plane
wave has frequency f′ and propagation direction (θ′+1, φ′+1), where, in accordance
with Bragg’s law,

\[ \cos\theta'_{+1} = \cos\theta' + (\lambda'/P), \tag{23.8a} \]

\[ \sin\theta'_{+1}\cos\phi'_{+1} = \sin\theta'\cos\phi. \tag{23.8b} \]

Figure 23.6 In the reference frame of the lens of Figure 23.4, a plane wave of
frequency f incident at an angle θ on a moving grating gives rise to a 0th order
diffracted beam of the same frequency and polar angle. The +1st order diffracted
beam, however, will have frequency f+1 and polar angle θ+1. In the
grating’s rest frame, the incident beam has frequency f′ and polar angle θ′. All
diffracted orders have the same frequency f′. The polar angle of the 0th order
beam is θ′, while that of the +1st order beam is θ′+1.
Back in the rest frame of the lens, the diffracted +1st order plane wave appears to
have a new frequency f+1, where

\[ \Delta f = f_{+1} - f = -\frac{V}{P\sqrt{1-(V/c)^2}} \tag{23.9} \]

is independent of the incident beam’s propagation direction (θ, φ). (The negative
sign follows from the conventions of Eqs. (23.4) and (23.8a), in which V is the
velocity of the lens relative to the grating.) The period P of
the grating thus appears to have been foreshortened by the Lorentz contraction factor,
and the Doppler shift Δf is proportional to the velocity V and inversely proportional
to the (contracted) grating period. Since Δf is independent of the direction of the
incident plane wave, the entire +1st-order cone will be Doppler shifted by the same
amount. This Doppler shift causes a beating at the exit pupil between the 0th-order
and the +1st-order cones in their area of overlap. The beat period, 1/|Δf|, is
independent of the groove profile as well as the NA of the lens. The same arguments apply
to the –1st-order light cone, except that the Doppler shift in this case is –Δf.
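The claim that the shift of the +1st order does not depend on the incidence angle can be tested by carrying a plane wave through the three steps explicitly: lens frame to grating frame, diffraction in the grating frame, and back to the lens frame. The sketch below is an illustration under stated assumptions: the function name is hypothetical, V is taken as the velocity of the lens frame relative to the grating (the convention of Figure 23.1), and the return trip uses the Doppler formula with V replaced by –V. With these conventions the computed shift comes out negative for the +1st order; its magnitude is V/[P√(1 − (V/c)²)].

```python
import math

C = 299_792_458.0   # speed of light in m/s

def first_order_shift(f, theta_deg, beta, period):
    """Return f_{+1} - f for a plane wave of frequency f and polar angle theta."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    cos_t = math.cos(math.radians(theta_deg))
    # Lens frame -> grating frame: Eqs. (23.4) and (23.3c).
    f_prime = gamma * f * (1.0 + beta * cos_t)
    cos_tp = (cos_t + beta) / (1.0 + beta * cos_t)
    # Diffraction in the grating frame: Eq. (23.8a) with lambda' = c / f'.
    cos_tp1 = cos_tp + (C / f_prime) / period
    # Grating frame -> lens frame: Doppler formula with V -> -V (assumed convention).
    f_plus1 = gamma * f_prime * (1.0 - beta * cos_tp1)
    return f_plus1 - f

wavelength = 0.633e-6            # illustrative wavelength in meters
f = C / wavelength
period = 1.2 * wavelength        # grating period of Figure 23.4(b)
beta = 0.1                       # deliberately large, to avoid round-off in f_{+1} - f
for theta_deg in (100.0, 110.0, 120.0):
    print(theta_deg, first_order_shift(f, theta_deg, beta, period))
```

The three printed values agree to within round-off, which is the direction independence asserted in the text.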
We mention in passing that, for the plane wave incident at (θ, φ) in the rest
frame of the lens, the propagation directions of the ±1st-order reflected plane
waves are (θ±1, φ±1), where

\[ \cos\theta_{\pm 1} = \frac{\cos\theta \pm \lambda \big/ \left[P\sqrt{1-(V/c)^2}\right]}{1 \mp (V/c)\,\lambda \big/ \left[P\sqrt{1-(V/c)^2}\right]} \tag{23.10} \]

and φ±1 = φ′±1 (see Eq. (23.8b)). Aside from the Lorentz contraction of the
grating period P, there is a relativistic correction to Bragg’s law of diffraction
from a moving grating. This correction term, which appears in the denominator
on the right-hand side of Eq. (23.10), is of the first order in V/c.
Rayleigh range of a moving Gaussian beam

Figure 23.7 shows a Gaussian beam of wavelength λ propagating along the
Z-axis in the XYZ coordinate system, in which the source of the beam is at rest.
The beam diameter W0 at the waist increases by a factor of √2 at a distance
L0 = W0²/λ, the Rayleigh range of the beam.3 To an observer at rest in the X′Y′Z′
frame, the source of the Gaussian beam moves at constant velocity V along the
Z-axis. Since the beam diameter at any cross-section is a measurable quantity
along Y′, from the perspective of the observer the beam diameters at the waist and
at the Rayleigh range remain W0 and √2 W0, respectively. However, the distance
L0′ between these two points appears to have shrunk by Lorentz contraction, that
is, L0′ = L0 √(1 − (V/c)²). At the same time, the observer perceives the wavelength
λ of the beam to have shifted in accordance with the Doppler formula to a lower
or higher value, depending on whether the motion of the light source is towards or
away from the observer. The Gaussian beam formula then yields L0′ = W0²/λ′, so
that, in the observer’s rest frame, the Rayleigh range L0′ could be greater or less
than L0 depending on whether the observer moves towards or away from the light
source. The two conclusions thus reached are contradictory, leaving one to
wonder which prediction, if any, will be borne out by experiment.
Figure 23.7 A Gaussian beam of wavelength λ propagates along the Z-axis. The
beam diameter is W0 at the waist and √2 W0 at the Rayleigh range, which is a
distance L0 from the waist. To an observer moving with constant velocity
along the Z-axis, the beam diameters at the waist and at the Rayleigh range
remain W0 and √2 W0, but the distance between them appears to have
shrunk by the Lorentz contraction factor.
Upon closer inspection, the Gaussian beam will be recognized as a superposition
of many plane waves of differing propagation directions. In the rest frame of the
beam’s source, all these plane waves have the same wavelength λ, but viewed from
a moving frame the wavelengths differ for different propagation directions. (The
propagation directions of the various plane waves are not the same in the two
coordinate systems, thus resulting in a wider or narrower spatial frequency spectrum
in X′Y′Z′ depending on the direction of motion of the observer.) The formula
relating the Rayleigh range to the beam waist and to the wavelength has been
derived with the implicit assumption that the beam is a superposition of plane
waves of identical frequency.3 This is true in the rest frame of the beam’s source,
but decidedly false in the moving frame. Therefore, our second method of determining
L0′ must be wrong, leaving the Lorentz-contracted L0 as the correct answer.
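The direction dependence of the wavelength in the moving frame follows from Eq. (23.4), since λ′/λ = f/f′. The short sketch below illustrates the point; the function name, the value V/c = 0.5, and the 10° off-axis component are assumptions chosen purely for illustration.

```python
import math

def wavelength_ratio(beta, theta_deg):
    """lambda'/lambda = f/f' for a plane-wave component at polar angle theta."""
    cos_t = math.cos(math.radians(theta_deg))
    return math.sqrt(1.0 - beta**2) / (1.0 + beta * cos_t)

beta = 0.5                                  # assumed relative speed, V/c
on_axis = wavelength_ratio(beta, 0.0)       # component along the beam axis
off_axis = wavelength_ratio(beta, 10.0)     # component 10 degrees off axis (assumed)
spread = off_axis / on_axis - 1.0           # fractional wavelength difference
print(on_axis, off_axis, spread)
```

A nonzero spread means that no single wavelength λ′ characterizes the beam in the moving frame, which is why the formula L0 = W0²/λ cannot be carried over with λ simply replaced by λ′.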
Convection of light by moving media

Consider a plane wave of frequency f and propagation direction (θ, φ), propagating
in a stationary medium of refractive index n. By definition, the speed of
light in this medium is c/n. The expression for the field distribution throughout
space and time is similar to that given by Eq. (23.1a), namely,

\[ a(x, y, z, t) = A_0 \exp\{ i2\pi (f/c) [\, n(\sin\theta\cos\phi)\,x + n(\sin\theta\sin\phi)\,y + n(\cos\theta)\,z - ct \,] \}. \tag{23.11} \]

From the perspective of an observer whose frame of reference X′Y′Z′ is in
uniform motion relative to the stationary medium (see Figure 23.1), the expression
for the field is obtained by substituting in Eq. (23.11) the Lorentz transformation
of Eq. (23.2). This yields the same expression as in Eq. (23.11) with φ
remaining the same but f, n, and θ changing as follows:

\[ f' = f\,\frac{1 + n(V/c)\cos\theta}{\sqrt{1-(V/c)^2}}, \tag{23.12a} \]

\[ n' = \left\{ 1 + \frac{(n^2 - 1)\,[1 - (V/c)^2]}{[1 + n(V/c)\cos\theta]^2} \right\}^{1/2}, \tag{23.12b} \]

\[ \tan\theta' = n\sqrt{1-(V/c)^2}\,\sin\theta \,/\, [\, n\cos\theta + (V/c) \,]. \tag{23.12c} \]
It is clear that the Doppler-shifted frequency f′ and the polar angle θ′ are not only
functions of θ and V/c, as before, but they also depend on the refractive index n of
the propagation medium. Similarly, the apparent index n′ of the moving medium
depends on n, V/c, and θ in accordance with Eq. (23.12b). For water of refractive
index n = 1.33, Figure 23.8(a) shows plots of n′ versus V/c for several values of θ,
while Figure 23.8(b) shows plots of n′ versus θ for different values of V/c.

Figure 23.8 The refractive index n′ of water (n = 1.33), moving at
constant velocity V along the Z-axis, depends on V and on the propagation
direction θ. In the water’s rest frame, the assumed propagation direction of a
plane wave of wavelength λ has polar and azimuthal angles (θ, φ). (a) Plots of
n′ versus V/c for several values of θ. (b) Plots of n′ versus θ for different
values of V/c.
In the special case when the beam moves in the same direction as the medium,
θ = 0° and Eq. (23.12b) simplifies to

\[ n' = \frac{n + (V/c)}{|1 + n(V/c)|}. \tag{23.13} \]

When V ≪ c, Eq. (23.13) yields Fresnel’s formula1 for the drag of light by a
moving medium,

\[ c/n' \approx (c/n) + (1 - n^{-2})\,V. \tag{23.14} \]

Thus the speed of light in a moving medium increases by a fraction of V if the
light and the medium move in the same direction, whereas the apparent speed of
light decreases when the light moves opposite the direction of motion of the
medium.
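How well Fresnel's approximation tracks the exact expression can be seen numerically. The sketch below evaluates Eq. (23.12b) and compares the resulting speed c/n′ with Eq. (23.14) for water; the function names and the sample values of V/c are assumptions made for illustration.

```python
import math

def moving_index(n, beta, theta_deg=0.0):
    """Apparent index n' of Eq. (23.12b); beta = V/c, theta in degrees."""
    cos_t = math.cos(math.radians(theta_deg))
    return math.sqrt(1.0 + (n**2 - 1.0) * (1.0 - beta**2)
                     / (1.0 + n * beta * cos_t)**2)

def fresnel_speed(n, beta):
    """Fresnel's approximation, Eq. (23.14), for the light speed in units of c."""
    return 1.0 / n + (1.0 - 1.0 / n**2) * beta

n = 1.33   # water, as in Figure 23.8
for beta in (1e-6, 1e-4, 1e-2, 0.3):
    exact_speed = 1.0 / moving_index(n, beta)      # c/n' in units of c, theta = 0
    print(beta, exact_speed, fresnel_speed(n, beta))
```

The two columns agree closely at small V/c and separate as V/c grows, as expected of a first-order approximation. A negative beta models light moving opposite to the medium.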

1 Max Born, Einstein’s Theory of Relativity, Dover, New York, 1965.
2 J. D. Jackson, Classical Electrodynamics, Wiley, New York, 1962.
3 A. E. Siegman, Lasers, University Science Books, Sausalito, CA, 1986.
24
Diffraction gratings†
John William Strutt, Lord Rayleigh (1842–1919), graduated from Trinity

College Cambridge in 1864. From 1879 to 1884 he was the Cavendish professor
of experimental physics at Cambridge, succeeding James Clerk Maxwell. His
theory of scattering (1871) provided the first correct explanation of the blue
color of the sky. Rayleigh’s discovery of the inert gas argon (1895) earned him
the 1904 Nobel Prize for Physics. The Rayleigh–Sommerfeld theory of dif-
fraction is one of the pillars of the classical theory. (Photo: courtesy of AIP
Emilio Segré Visual Archives, Physics Today Collection.)
Diffraction gratings have been used in spectroscopy and other studies of
electromagnetic phenomena for nearly two centuries.1,2,3,4 Josef Fraunhofer
(1787–1826), the discoverer of the dark lines in the solar spectrum, built the
first gratings in 1819 by winding fine wires around two parallel screws.5 Henry
Rowland made significant contributions to the fabrication of precise, large-area,
high-frequency ruled gratings in the 1880s.6 Robert Wood, who succeeded
Rowland in the chair of experimental physics at Johns Hopkins University in
1901, used these ruled gratings extensively in his researches and discovered,
among other things, the “anomalous” behavior of metallic gratings, which he
first published in 1902.7 John William Strutt (Lord Rayleigh) developed
a theoretical model of these gratings around 1907 and was successful in
explaining certain features of Wood’s anomalies.8 However, it is only during the
past thirty years or so that a thorough understanding of nearly all aspects of the
behavior of diffraction gratings has been achieved through the consistent
application of Maxwell’s equations with the help of advanced analytical and
numerical techniques.2,9,10

† This chapter’s coauthors are Lifeng Li and Wei-Hung Yeh.
Modern gratings having a few thousand lines per millimeter with near-
perfect periodicity are fabricated over fairly large areas (grating diameters
of around one meter or so are possible). The groove shapes can be con-
trolled to be sinusoidal, rectangular, triangular, or trapezoidal, and one can
obtain shallow or deep grooves (relative to the groove width) by current
manufacturing techniques. These gratings can be made on various metal,
plastic, and glass substrates and, when necessary, they can be coated with
thin-film metal and/or dielectric stacks. The primary applications of diffrac-
tion gratings are still in spectroscopy, where they are used for analyzing
the frequency content of electromagnetic radiation (visible light, ultraviolet,
X-rays, infrared, microwave), but they are also used as wavelength selectors
in tunable lasers, beam-sampling mirrors in high-power lasers, band-pass
filters, pulse compressors, and polarization-sensitive optics, among other
applications.
The goal of the present chapter is to describe some of the basic properties
of gratings and to point out through several examples the complex behavior
of these devices. These examples are by no means comprehensive, but they
should make it amply clear that there is no simple way to predict a grating’s
diffraction efficiency. Although the number and the propagation direction of
diffracted orders can be readily obtained from simple principles, the compu-
tation of diffraction efficiencies requires the complete solution of Maxwell’s
equations in conjunction with the appropriate boundary conditions. The results
of these calculations are often non-intuitive and depend strongly on a number
of factors such as the period of the grating, the geometry of the grooves, the
(complex) refractive index of the material(s) comprising the grating, and the
wavelength as well as the polarization state and the propagation direction of
the incident beam of light. Fortunately, powerful computer programs now
exist that take all the relevant factors into account and provide a reliable
solution to the electromagnetic equations that govern the behavior of diffraction
gratings.
Grating theories
The simplest theory of gratings treats them as corrugated structures that
modulate the amplitude and/or phase of the incident beam in proportion to the
local reflectivity and the height or depth of the surface relief features. The
modulated reflected (or transmitted) wavefront is then decomposed into its
Fourier spectrum to yield the various diffracted orders. Known as the scalar
theory of gratings, this elementary treatment yields the correct number and
direction of propagation for the diffracted orders, but it does not provide an
accurate estimate of the amplitude, phase, and polarization state of each order.
Rayleigh made a substantial contribution to the understanding of gratings by
representing the diffracted field as the superposition of a number of homo-
geneous (i.e., propagating) and inhomogeneous (i.e., evanescent) plane
waves.8 He then determined the complex amplitudes of the various plane
waves by imposing the electromagnetic boundary conditions at the grating
surface.
Although Rayleigh’s method was far superior to the scalar theory – it could
account for some of the observed anomalies and, in fact, provided exact solu-
tions to the electromagnetic field equations in certain cases of practical interest
– it failed to provide a comprehensive solution that would be applicable under
general conditions. A satisfactory analysis of the diffraction from gratings
requires a numerically stable solution to Maxwell’s equations constrained by
the relevant boundary conditions. Several such methods have been discovered
and elaborated over the past 30 years by a number of researchers from around
the world.2,9,10,11 The results presented in this chapter are based on the dif-
ferential method of Chandezon, which uses the so-called coordinate trans-
formation technique.11
Diffraction orders
Figure 24.1 shows the cross-section of a metallized grating with a trapezoidal
groove geometry. The grating period is denoted by p, the groove depth by d, and
the duty cycle, which is the ratio of the land width to the grating period, by c. In
this symmetric grating both side walls make the same angle α with the horizontal
plane. The metal layer, specified by its complex refractive index (n, k), is assumed
to be thick enough to render the grating opaque.

Figure 24.1 Cross-section of a metallized grating (labels in the figure: land,
groove, metal, d, 100 nm, α, substrate, period p). Throughout this chapter
the side-wall angle α = 60° and the duty cycle c, which is the ratio of the
land width to the grating period, is 60%. At λ0 = 0.633 μm the substrate’s
refractive index n = 1.5 and the metal layer’s complex index is given by
(n, k) = (2, 7).

Referring to Figure 24.2, the plane of the grating is XY, and its surface normal
is the Z-axis. The plane of incidence is XZ, θ being the angle of incidence. When
the incident E-field is in the plane of incidence, the beam is p-polarized, and when
the E-field is along the Y-axis it is s-polarized. In an alternative nomenclature,
in Figure 24.2(a), the polarization is transverse electric (TE) when the incident
E-field is parallel to the grooves and transverse magnetic (TM) when it is
perpendicular to the grooves. Although the grating may be mounted with its
grooves in an arbitrary direction within the XY-plane, we shall consider only
two situations. In the first case, depicted in Figure 24.2(a) and referred to as
“classical mount”, the grooves are perpendicular to the plane of incidence. In
this case all diffracted orders remain in the XZ-plane, their propagation vectors
k given by

\[ \mathbf{k}^{(m)} = (2\pi/\lambda_0)\,(r_x^{(m)}\,\hat{x} + r_z^{(m)}\,\hat{z}) = (2\pi/\lambda_0)\,\{[\sin\theta + (m\lambda_0/p)]\,\hat{x} + r_z^{(m)}\,\hat{z}\}. \tag{24.1} \]

Here λ0 is the vacuum wavelength of the light, the integer m specifies the
diffraction order, the unit vector r = (rx, ry, rz) is along the propagation direction,
and the medium of incidence is implicitly assumed to be air. With ry = 0, it is
necessary that rx² + rz² = 1, from which rz can be determined once rx is known.
To keep rz real, rx(m) = sin θ + mλ0/p must be in the range (–1, +1), a constraint
that determines the number of propagating orders.
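The constraint on rx translates directly into a count of propagating orders. The following sketch enumerates them for the classical mount; the function name is hypothetical, the medium of incidence is air as in the text, and the grating period is expressed in units of the vacuum wavelength.

```python
import math

def classical_orders(theta_deg, period_over_wavelength):
    """Map each propagating order m to its diffraction angle in degrees.

    An order propagates when rx = sin(theta) + m * lambda0 / p lies strictly
    inside (-1, +1), as in Eq. (24.1).
    """
    s = math.sin(math.radians(theta_deg))
    m_limit = int(2 * period_over_wavelength) + 1   # generous search bound
    orders = {}
    for m in range(-m_limit, m_limit + 1):
        rx = s + m / period_over_wavelength
        if abs(rx) < 1.0:
            orders[m] = math.degrees(math.asin(rx))
    return orders

# Grating period p = 4 * lambda0, at normal incidence and at 40 degrees.
print(sorted(classical_orders(0.0, 4.0)))
print(sorted(classical_orders(40.0, 4.0)))
```

At oblique incidence the set of orders becomes asymmetric about m = 0, because sin θ shifts the whole ladder of rx values.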
Figure 24.2 A monochromatic beam of light breaks up into multiple
diffraction orders upon reflection from a grating. The incidence angle θ is
measured from the Z-axis. When the incident E-field is in the XZ-plane of
incidence, the beam is p-polarized, and when E is perpendicular to XZ, the
beam is s-polarized. In (a) the plane of incidence is perpendicular to the
direction Y of the grooves. In this so-called classical mount all diffracted
orders remain within the XZ-plane. In (b), where the grooves are parallel to
the plane of incidence (conical mount), diffracted orders appear on both sides
of the XZ-plane.

In the second case, depicted in Figure 24.2(b) and referred to as “conical
mount”, the grooves are parallel to the plane of incidence. Here all diffracted
orders (other than the zeroth) are outside the XZ-plane and their propagation
vectors are given by

\[ \mathbf{k}^{(m)} = (2\pi/\lambda_0)\,(r_x^{(m)}\,\hat{x} + r_y^{(m)}\,\hat{y} + r_z^{(m)}\,\hat{z}) = (2\pi/\lambda_0)\,[(\sin\theta)\,\hat{x} + (m\lambda_0/p)\,\hat{y} + r_z^{(m)}\,\hat{z}]. \tag{24.2} \]

Again, the integer m is the diffraction order, the implicitly assumed medium of
incidence is air, and the constraint rx² + ry² + rz² = 1 specifies rz once rx and ry
are identified. The inequality rx² + ry² = sin²θ + (mλ0/p)² ≤ 1 determines the
number of propagating orders. This mounting is called conical because the
various diffracted orders reside on the surface of a cone. Technically speaking,
the mount is conical whenever the grooves deviate from the normal to the plane
of incidence.2 In this chapter, however, whenever the mount is said to be conical,
the grooves will be strictly parallel to the plane of incidence.
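A short sketch makes the cone visible: every propagating order in the conical mount shares the same x-component of its unit propagation vector, so all orders make the same angle with the X-axis (the groove direction). The function name is hypothetical, the positive root is taken for rz as an assumed sign convention, and grazing orders with rz = 0 are excluded for simplicity.

```python
import math

def conical_orders(theta_deg, period_over_wavelength):
    """Map each propagating order m to its unit vector (rx, ry, rz), Eq. (24.2)."""
    rx = math.sin(math.radians(theta_deg))
    m_limit = int(period_over_wavelength) + 1       # generous search bound
    orders = {}
    for m in range(-m_limit, m_limit + 1):
        ry = m / period_over_wavelength
        rz_squared = 1.0 - rx**2 - ry**2
        if rz_squared > 0.0:                        # strictly propagating orders only
            orders[m] = (rx, ry, math.sqrt(rz_squared))
    return orders

# Grating period p = 4 * lambda0 at 30 degrees incidence.
for m, (rx, ry, rz) in sorted(conical_orders(30.0, 4.0).items()):
    print(m, round(rx, 4), round(ry, 4), round(rz, 4))
```

Unlike the classical mount, the set of orders here is always symmetric about m = 0, since only m² enters the propagation condition.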
Location of diffracted beams

A simple experimental setup for observing the beams diffracted from a grating
appears in Figure 24.3. The coherent beam of a red HeNe laser (λ0 = 0.633 μm)
is focused at oblique incidence θ onto the grating through a long-focal-length
lens (NA = 0.065). The diffraction-limited spot diameter is 1.22λ0/NA ≈ 12 μm,
which, if the grating period p is sufficiently small, will cover several land-groove
pairs. The various diffracted beams are then collected and collimated by a
microscope objective lens (NA = 0.8, f = 2000λ0). In the following examples the
system is arranged in such a way that the zeroth-order beam always appears at
the center of the collimating lens. This lens being aplanatic, if we denote the
angle between a diffracted beam and the zeroth order by v, then the diffracted
beam’s distance from the center of the exit pupil will be f sin v rather than f tan v.
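The numbers quoted for this setup are easy to reproduce. The sketch below computes the focused spot diameter and the exit-pupil positions of the diffracted beams under the sine rule of an aplanatic lens; the function and variable names are hypothetical, and the example assumes normal incidence on a grating of period p = 4λ0, for which the sine of the angle between order m and the zeroth order is m/4.

```python
import math

WAVELENGTH_UM = 0.633      # HeNe wavelength in micrometers
FOCUSING_NA = 0.065        # numerical aperture of the focusing lens
COLLIMATOR_NA = 0.8        # numerical aperture of the collimating lens
FOCAL_LENGTH = 2000.0      # collimator focal length, in units of lambda0

spot_diameter_um = 1.22 * WAVELENGTH_UM / FOCUSING_NA

def pupil_position(angle_deg):
    """Distance from the pupil center (units of lambda0) for an aplanatic lens."""
    return FOCAL_LENGTH * math.sin(math.radians(angle_deg))

# Largest angle accepted by the collimator, and the corresponding pupil radius.
max_angle_deg = math.degrees(math.asin(COLLIMATOR_NA))
pupil_radius = pupil_position(max_angle_deg)

# Normal incidence, p = 4 * lambda0 (assumed example): positions of orders -3..+3.
order_positions = {m: FOCAL_LENGTH * (m / 4.0) for m in range(-3, 4)}
print(spot_diameter_um, pupil_radius, order_positions)
```

With f tan v instead of f sin v, the outer orders would land farther from the center, so the sine rule matters when reading order positions off the pupil plots.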
Figure 24.4 shows computed patterns of intensity distribution at the exit pupil
of the collimating lens for the grating of Figure 24.1 in the case of classical mount
(λ0 = 0.633 μm, p = 4λ0, d = λ0/8; θ = 0° in Figures 24.4(a), (b), θ = 40° in
Figures 24.4(c), (d)).12 The incident beam is p-polarized or TM (i.e., E-field parallel
to the XZ-plane), but the beams appearing in the exit pupil have both the p- and
s-components of polarization. In Figure 24.4 the intensity patterns on the left
represent the component of polarization that stays within the XZ-plane, while
those on the right correspond to the component along Y. In all cases E⊥ is much
weaker than E∥, the ratio of the peak intensities |E⊥|² : |E∥|² being 0.65 × 10⁻⁵ in
the case of normal incidence and 0.009 in the case θ = 40°.
Figure 24.5 is similar to Figure 24.4, except that the grating is rotated by 90° in
the XY-plane to bring the grooves parallel to the plane of incidence (i.e., conical
Figure 24.3 A monochromatic beam of light is focused by a low-NA lens
onto a grating. Compared with the grating period, the focused spot is large,
covering several land-groove pairs at the grating’s surface. The diffracted
orders, collected and collimated by a high-NA aplanatic lens, may be observed
at the exit pupil.
Figure 24.4 Computed plots of intensity distribution at the exit pupil of the
collimating lens in Figure 24.3, when the beam is diffracted from the grating of
Figure 24.1 (λ0 = 0.633 μm, p = 4λ0, d = λ0/8). The grooves are perpendicular to
the plane of incidence, as in Figure 24.2(a), and the incident beam is p-polarized.
The frames on the left correspond to the component of polarization parallel
to the XZ-plane (E∥), while those on the right correspond to the component along
the Y-axis (E⊥). In (a) and (b) the incidence is normal, whereas in (c) and (d)
θ = 40°. The ratio of the peak intensity in (b) to that in (a) is 0.65 × 10⁻⁵. Similarly,
the peak intensity ratio of (d) to (c) is 0.009. These results are based on full
vector-diffraction calculations. (Horizontal axis of each frame: x/λ0, from −1600 to 1600.)
mount). In Figures 24.5(a), (b) the incidence is normal, whereas in (c), (d) it is
oblique at θ = 30°. In both cases the incident beam is p-polarized, but the
diffracted beams contain a certain amount of s-polarization as well.12 At the exit
pupil of the lens, the ratio of the peak intensities perpendicular and parallel to the
XZ-plane is fairly small, |E⊥|² : |E∥|² being 0.97 × 10⁻⁴ at normal incidence and
0.025 at θ = 30°.
In both the above cases if the scalar theory of diffraction is used (instead of the
full vector theory), the picture that emerges will show the diffracted orders in
their correct locations but the amplitude, phase, and polarization state of the
various orders will be substantially incorrect.
Figure 24.5 Computed plots of intensity distribution at the exit pupil of the
collimating lens of Figure 24.3, when the beam is diffracted from the grating of Figure
24.1 (λ0 = 0.633 μm, p = 4λ0, d = λ0/8). The grooves are parallel to the plane of
incidence, as in Figure 24.2(b), and the incident beam is p-polarized. The frames on
the left correspond to the component of polarization parallel to the XZ-plane (E∥),
while those on the right correspond to the component along the Y-axis (E⊥). In (a)
and (b) the incidence is normal, whereas in (c) and (d) θ = 30°. The ratio of the peak
intensity in (b) to that in (a) is 0.97 × 10⁻⁴. Similarly, the peak intensity ratio of (d) to
(c) is 0.025. These results are based on full vector-diffraction calculations.
(Horizontal axis of each frame: x/λ0, from −1600 to 1600.)
Diffraction efficiency
We denote by E the amplitude of the incident beam at angle θ and by E(m) the
amplitude of the mth-order reflected (or transmitted) beam emerging at θ(m). It is
further assumed that the incidence medium is air and, in the case of a
transmission grating, that the transparent medium into which the diffracted orders
emerge has refractive index n0. For the mth-order reflected (transmitted) beam the
diffraction efficiency ρ(m) (τ(m)) can be written as

$$\rho^{(m)} = \frac{|E^{(m)}|^2 \cos\theta^{(m)}}{|E|^2 \cos\theta}, \tag{24.3a}$$

$$\tau^{(m)} = \frac{n_0\,|E^{(m)}|^2 \cos\theta^{(m)}}{|E|^2 \cos\theta}. \tag{24.3b}$$

Here the squared amplitude is the beam’s intensity, and the cosine factor keeps
track of the change in the beam’s cross-sectional area upon diffraction.
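As a minimal sketch of Eqs. (24.3a) and (24.3b), the function below turns a pair of field amplitudes into an efficiency. The amplitudes and angles in the example are made-up illustrative values, not results from the chapter's simulations.

```python
import math

def diffraction_efficiency(e_in, e_out, theta_deg, theta_m_deg, n0=1.0):
    """Efficiency of a single order, following Eqs. (24.3a)/(24.3b).

    e_in, e_out : (complex) amplitudes of the incident and diffracted beams
    theta_deg   : angle of incidence
    theta_m_deg : angle at which the order emerges
    n0          : index of the exit medium (1.0 for a reflected order in air)
    """
    cos_in = math.cos(math.radians(theta_deg))
    cos_out = math.cos(math.radians(theta_m_deg))
    return n0 * abs(e_out) ** 2 * cos_out / (abs(e_in) ** 2 * cos_in)

# Hypothetical numbers: half the incident amplitude, emerging at 60 degrees.
rho = diffraction_efficiency(1.0, 0.5, theta_deg=0.0, theta_m_deg=60.0)
# Same amplitude transmitted straight through into glass of index 1.5.
tau = diffraction_efficiency(1.0, 0.5, 0.0, 0.0, n0=1.5)
```

The first case gives an efficiency of 0.125 rather than 0.25: the beam emerging at 60° has half the cross-sectional area of the incident beam, which is what the cosine ratio accounts for.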
Figure 24.6 shows computed plots of diffraction efficiency versus θ for the
zeroth- and ±first-order beams for the grating of Figure 24.1 (λ0 = 0.633 μm,
p = 3λ0, d = λ0/8).12 In each frame there are four curves, representing the
diffraction efficiency of the corresponding order when the incident beam is either p- or
s-polarized and when the mount is either classical (ρp, ρs) or conical (ρp′, ρs′). The
sharp peaks and valleys appearing in these plots are caused by the excitation of
surface plasmons, which, in the case of metal gratings, exist only when the incident
beam has an E-field component perpendicular to the grooves (see Chapter 9, “What
in the world are surface plasmons?”). The arrows at the bottom of each figure point
to the angles of incidence associated with the Rayleigh anomalies; these are points
at which a particular diffraction order appears or disappears. In Figure 24.6(b), for
example, ρp and ρs terminate at θ = 41.81°, which is where the +first-order beam
becomes parallel to the surface and subsequently vanishes. In the case of ρp′ and ρs′
(conical mount) the cutoff of both the ±first orders occurs at θ = 70.53°. When
the metallic grating has a large conductivity, the surface plasmon features and
Rayleigh anomalies are usually located pairwise, close to each other.
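The two cutoff angles quoted above follow from requiring a diffracted order to travel parallel to the surface. The sketch below assumes the classical-mount relation of Eq. (24.1) and, for the conical mount, that the grating adds mλ0/p to the direction cosine perpendicular to the plane of incidence; the function name is illustrative.

```python
import math

def rayleigh_angles(period_in_waves, conical=False, max_order=10):
    """Incidence angles (degrees, 0..90) at which order m becomes grazing.

    Classical mount: sin(theta) + m/p = +1 or -1   (p in units of lambda0).
    Conical mount:   sin(theta)**2 + (m/p)**2 = 1.
    Returns {m: [angles]} for the nonzero orders that have such an angle.
    """
    result = {}
    for m in range(-max_order, max_order + 1):
        if m == 0:
            continue
        g = m / period_in_waves
        if conical:
            candidates = [math.sqrt(1 - g * g)] if abs(g) < 1 else []
        else:
            candidates = [1 - g, -1 - g]
        angles = [math.degrees(math.asin(s)) for s in candidates if 0 <= s < 1]
        if angles:
            result[m] = [round(a, 2) for a in angles]
    return result

classical = rayleigh_angles(3.0)               # p = 3*lambda0
conical = rayleigh_angles(3.0, conical=True)
```

For p = 3λ0 this gives 41.81° for the +first order in the classical mount and 70.53° for both first orders in the conical mount.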
Dependence of diffraction efficiency on the grating period

The efficiency curves become somewhat erratic as the period p of the grating
decreases, but they approach a limiting behavior with increasing p. Figure 24.7
shows computed plots of the zeroth-order efficiency versus θ for the grating
of Figure 24.1 with (a) p = λ0 and (b) p = 5λ0 (in both cases λ0 = 0.633 μm,
d = λ0/8).12 These plots should be compared with those of Figure 24.6(a) for
which p = 3λ0. Notice the substantial departure of the curves in Figure 24.7(a)
from those in Figure 24.6(a). However, there are similarities between Figures
24.6(a) and 24.7(b), stemming from the fact that in both cases the grating period
is fairly large and the grooves are rather shallow.
Figure 24.8 shows plots of ρ(0) versus p at a fixed angle of incidence (θ = 30°,
λ0 = 0.633 μm, d = λ0/8). The solid (broken) arrows at the bottom (top) of the figure
indicate the locations of Rayleigh anomalies for the classical (conical) mount. It
appears that as the period increases the various zeroth-order efficiencies approach a
limiting value in the vicinity of 55%. The remainder of the incident energy in this
case is partly absorbed by the metal layer and partly distributed among other
diffracted orders. As p → ∞ the orders that carry the bulk of the reflected energy
converge towards the zeroth order line. At the same time, the overall reflectance,
which becomes more and more concentrated around the direction of specular
reflection, approaches the specular reflectivity of the flat metal layer at 30° incidence
Figure 24.6 Computed plots of diffraction efficiency versus the angle of
incidence θ for ρp, ρs (classical mount) and ρp′, ρs′ (conical mount, i.e., grooves
parallel to the incidence plane). The solid (broken) arrows indicate the locations
of Rayleigh anomalies for the classical (conical) mount. (a) zeroth-order, (b)
+first-order, and (c) −first-order diffracted beams upon reflection from the
grating of Figure 24.1 (λ0 = 0.633 μm, p = 3λ0, d = λ0/8). (Axes of each frame:
diffraction efficiency versus θ in degrees, from 0 to 90.)
(i.e., 84% for p-light, 88% for s-light). In the opposite extreme, p → 0, the reflectivity
curves once again show a limiting behavior. Although there are no other diffracted
orders in this case, the limiting value of ρ(0) is not necessarily the same as the
specular reflectance of the flat metal layer but should be calculated from an
“effective medium” theory.
Effect of the groove depth

Another factor that complicates the behavior of a grating is the dependence of its
efficiency on the groove depth d. Figure 24.9 shows plots of ρ(0) versus θ for
reflection from the grating of Figure 24.1 when the groove depth d = 0.2 μm
(λ0 = 0.633 μm, p = 3λ0). These curves are quite different from those of Figure
24.6(a), which correspond to a similar grating with shallower grooves. The lower
values of ρ(0) in the case of a deep-groove grating indicate that more light is being
channeled into other diffracted orders.
Reciprocity theorem
There exists a powerful and quite unexpected reciprocity relation between the beam
incident on a grating and any of the resulting diffracted orders. Suppose the incident
beam arrives at the grating at an angle θ and the mth diffracted order emerges at an
angle θ(m), having diffraction efficiency ρ(m) or, in the case of a transmitted order,
Figure 24.7 Computed plots of diffraction efficiency versus θ for the zeroth-order
diffracted beam upon reflection from the grating of Figure 24.1 (λ0 = 0.633 μm,
d = λ0/8). In (a) the grating period p = λ0 while in (b) p = 5λ0. The solid (broken)
arrows indicate the locations of Rayleigh anomalies for the classical (conical) mount.
(Axes of each frame: diffraction efficiency versus θ in degrees, from 0 to 90.)
τ(m). If the direction of incidence is now changed in such a way that the incident
beam is along the path of the mth-order beam (in the reverse direction, of course),
there emerges an mth diffracted order along the path of the original incident
beam (again in the reverse direction). The reciprocity theorem states that the
Figure 24.8 Computed plots of the zeroth-order efficiency versus the grating
period p for the grating of Figure 24.1 (λ0 = 0.633 μm, d = λ0/8, θ = 30°). The
solid (broken) arrows indicate the locations of Rayleigh anomalies for the
classical (conical) mount. (Axes: diffraction efficiency versus p in μm, from 0 to 6.)
Figure 24.9 Computed plots of the zeroth-order diffraction efficiency versus
the angle of incidence for the grating of Figure 24.1 (λ0 = 0.633 μm, p = 3λ0,
d = 0.2 μm). The solid (broken) arrows indicate the locations of Rayleigh
anomalies for the classical (conical) mount. (Axes: diffraction efficiency versus θ
in degrees, from 0 to 90.)
efficiency of this particular diffracted order will be exactly equal to ρ(m) (or τ(m)).
This theorem can be rigorously proved under general conditions.2 In Figure 24.6
the −first-order efficiency curves in the classical mount, i.e., ρs(−1) and ρp(−1),
show several manifestations of the reciprocity theorem. A few more
consequences of reciprocity will be pointed out in the examples that follow.
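The pairs of incidence angles that reciprocity ties together can be tabulated with a short sketch. It assumes the classical-mount relation sin θ(m) = sin θ + mλ0/p and a sign convention, inferred from Eqs. (24.1) and (24.8), in which the specular beam has θ(0) = θ and a beam retracing the incidence path has angle −θ. The function names are illustrative.

```python
import math

def diffracted_angle(theta_deg, m, period_in_waves):
    """theta_m from sin(theta_m) = sin(theta) + m/p; None if evanescent."""
    s = math.sin(math.radians(theta_deg)) + m / period_in_waves
    return math.degrees(math.asin(s)) if abs(s) < 1 else None

def reciprocal_incidence(theta_deg, m, period_in_waves):
    """Incidence angle of a beam sent backwards along the mth order.

    Under the assumed sign convention the reversed beam arrives at -theta_m.
    Reciprocity says the mth-order efficiency is the same at both angles.
    """
    theta_m = diffracted_angle(theta_deg, m, period_in_waves)
    return None if theta_m is None else -theta_m

theta = 25.0
partner = reciprocal_incidence(theta, -1, 3.0)      # p = 3*lambda0
back = reciprocal_incidence(partner, -1, 3.0)       # returns to 25 degrees
littrow = math.degrees(math.asin(1 / 6))            # fixed point of the map
```

Applying the map twice returns the original angle, and the Littrow angle maps onto itself, which is why efficiency curves plotted against sin θ are mirror-symmetric about the Littrow point.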
Resolving power
Consider a grating of period p having a total of N grooves. The width of the mth-
order diffracted beam that covers the entire grating is Np cos θ(m). If this beam is
brought to diffraction-limited focus by a lens of focal length f, the focused spot
diameter D will be1

$$D \approx \frac{\lambda_0 f}{N p \cos\theta^{(m)}}. \tag{24.4}$$

Spectroscopists are interested in the focused spots formed by two nearby
wavelengths, λ0 and λ0 + Δλ. According to Eq. (24.1) the diffraction angle θ(m) in
the classical mount is given by sin θ(m) = sin θ + mλ0/p, in which case for a small
change of wavelength Δλ we have

$$\cos\theta^{(m)}\,\Delta\theta^{(m)} \approx (m/p)\,\Delta\lambda. \tag{24.5}$$

Therefore, in the focal plane of the lens, a shift of the wavelength from λ0 to
λ0 + Δλ causes a shift of the focused spot by the following amount:

$$f\,\Delta\theta^{(m)} \approx \frac{m f\,\Delta\lambda}{p \cos\theta^{(m)}}. \tag{24.6}$$

The two wavelengths are just resolved when the above shift equals the spot
diameter D in Eq. (24.4), that is, when f Δθ(m) ≈ D. This leads to the following
expression for the resolving power:

$$\lambda_0/\Delta\lambda \approx mN. \tag{24.7}$$

It is thus seen that the resolving power of a grating is directly proportional to N,
its total number of illuminated grooves, and to m, the order of diffraction. The
resolving power is completely independent of such seemingly relevant factors as
the groove period, the groove geometry, and the incidence angle.
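A small numerical sketch of Eq. (24.7) follows. The sodium doublet (two lines near 589 nm, roughly 0.6 nm apart) is used purely as a familiar illustration and does not appear in the chapter; taking the absolute value of m is a convenience assumed here so that negative orders can be passed in.

```python
import math

def resolving_power(m, n_grooves):
    """lambda0 / delta_lambda ~ |m| * N, as in Eq. (24.7)."""
    return abs(m) * n_grooves

def min_resolvable_shift(wavelength, m, n_grooves):
    """Smallest wavelength separation that is just resolved."""
    return wavelength / resolving_power(m, n_grooves)

def grooves_needed(wavelength, delta_lambda, m):
    """Illuminated grooves required to resolve delta_lambda in order m."""
    return math.ceil(wavelength / (abs(m) * delta_lambda))

# Illustrative case: resolve lines 0.6 nm apart near 589.3 nm in first order.
n_needed = grooves_needed(589.3, 0.6, m=1)
# A HeNe beam covering 1000 grooves, first order (wavelength in nm).
shift = min_resolvable_shift(633.0, 1, 1000)
```

About a thousand illuminated grooves suffice in first order; in second order half as many would do, regardless of the groove period.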
Littrow mount and blazed gratings

To build compact spectrometers, it is desirable that one of the diffracted orders
should return along (or almost along) the direction of incidence. In the so-called
Littrow mount, the nth-order beam, where n is negative, returns along the direction
of incidence. For instance, in the −first-order Littrow mount, we find from Eq. (24.1)

$$2\sin\theta = \lambda_0/p. \tag{24.8}$$

Under this condition, if p < 1.5λ0, then the only possible diffracted orders are the
zeroth and the −first. Furthermore, if the efficiency for the zeroth order can be
reduced to zero, all the available power that is not absorbed by the grating will
return along the −first reflected order, thus maximizing the sensitivity of the
spectrometer. Gratings that direct all or most of the incident optical power into a
single diffracted order are known as blazed gratings. Although in the early days
ruled gratings having a triangular groove profile satisfied the blaze condition, a
triangular cross-section is no longer a prerequisite to the blazing property.
Gratings with triangular cross-section and a 90° apex angle are now more
appropriately referred to as “echelette” gratings.
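The Littrow angle and the claim about which orders survive when p < 1.5λ0 can be checked numerically. The sketch assumes the classical-mount relation sin θ(m) = sin θ + mλ0/p with the retro-reflected beam at θ(m) = −θ; the function names are illustrative.

```python
import math

def littrow_angle(order, period_in_waves):
    """Incidence angle (degrees) at which a negative order retraces the beam.

    Setting theta_m = -theta in sin(theta_m) = sin(theta) + m/p gives
    2*sin(theta) = -m/p, with p in units of lambda0.
    """
    s = -order / (2.0 * period_in_waves)
    if abs(s) > 1:
        raise ValueError("no Littrow angle for this order and period")
    return math.degrees(math.asin(s))

def propagating_orders(theta_deg, period_in_waves):
    """Orders m for which |sin(theta) + m/p| < 1."""
    s0 = math.sin(math.radians(theta_deg))
    m_max = int(2 * period_in_waves) + 1
    return [m for m in range(-m_max, m_max + 1)
            if abs(s0 + m / period_in_waves) < 1]

theta_l = littrow_angle(-1, 1.2)            # p = 1.2*lambda0
orders = propagating_orders(theta_l, 1.2)   # only the zeroth and -first remain
wide = propagating_orders(littrow_angle(-1, 2.0), 2.0)
```

With p = 1.2λ0 only the orders −1 and 0 propagate; with p = 2λ0 the +first and −second orders appear as well.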
Figure 24.10 shows a metallic prism with an inclination angle α. When a plane
wave is normally incident on the inclined facet of this prism, the specularly
Figure 24.10 A normally incident beam of light is specularly reflected from the
inclined facet of a metallic prism (inclination angle α). For a given integer m,
imagine cutting the prism along the broken and dotted lines, which are parallel to
the direction of incidence and have lengths that are multiples of mλ0/2. The various
sections are then rearranged to form the echelette grating shown in the lower part of
the figure. If the grating is similarly illuminated at θ = α, the diffracted order that
retraces the incidence path in the reverse direction will be quite strong, which is why
this kind of grating has come to be known as a blazed grating. (Labels in the diagram:
incident beam at θ = α; cut lengths mλ0/2, 2mλ0/2, 3mλ0/2, 4mλ0/2; groove depth
d = ½mλ0 cos α; period p = mλ0/(2 sin α).)
reflected light returns along the direction of incidence. Let the lengths of the
equidistant lines drawn on the prism parallel to the direction of incidence be
integer multiples of mλ0/2, where m is an arbitrary (but fixed) integer. If the
metal prism is cut along these lines and its segments rearranged, one obtains an
echelette grating with period p = mλ0/(2 sin α), as shown in the lower part of the
figure. With an incidence angle θ = α on this grating, Littrow’s condition for
the negative mth diffracted order will be satisfied. In the geometric-optical
approximation, this grating should be equivalent to the original prism, because
the various reflected rays from its individual facets suffer phase delays in
multiples of 2π only, making the grating’s reflected wavefront indistinguishable from
that of the prism. In reality, however, the electromagnetic field “feels” the groove
structure, and the actual diffraction efficiency of the beam returning along the
direction of incidence will not always be the same as the specular reflectance of
the polished metal prism, although they are usually close.
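The construction can be checked with a few lines of arithmetic. The sketch assumes only the period formula quoted above and the fact that, at incidence angle α, the round-trip path difference between neighbouring grooves is 2p sin α; the function names are illustrative.

```python
import math

def echelette_period(m, alpha_deg, wavelength=1.0):
    """Period p = m*lambda0 / (2*sin(alpha)) of the rearranged prism."""
    return m * wavelength / (2.0 * math.sin(math.radians(alpha_deg)))

def round_trip_step(period, alpha_deg, wavelength=1.0):
    """Path difference between adjacent grooves at theta = alpha, in wavelengths.

    An integer value means the facets reflect in phase (delays that are
    multiples of 2*pi), so the retro-reflected order is the blazed one.
    """
    return 2.0 * period * math.sin(math.radians(alpha_deg)) / wavelength

p = echelette_period(2, 30.0)         # in units of lambda0
step = round_trip_step(p, 30.0)       # equals m = 2
```

For α = 30° and m = 2 the period comes out as 2λ0, which matches the echelette grating used in the efficiency plots that follow.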
Figure 24.11 shows computed efficiency curves in the classical mount for the
echelette grating of Figure 24.10 having α = 30°, p = 2λ0, and (n, k) = (2, 7) at
λ0 = 0.633 μm.12 The horizontal axis depicts sin θ, the incidence angle θ being
positive (negative) when incidence is from the side of the large (small) facet of
the triangular grooves. The arrows at the top of each frame indicate the locations
of Rayleigh anomalies, in the neighborhood of which resonance features and
slope discontinuities are seen to occur. The zeroth-order efficiency curves for p-
and s-polarized light are shown in Figure 24.11(a). Despite the asymmetrical
groove geometry, the plots of ρp(0) and ρs(0) are perfectly symmetric around θ = 0°,
which is a manifestation of the reciprocity theorem mentioned earlier. The +first-
order efficiency curves in Figure 24.11(b) show the same kind of symmetry
around θ = −14.48° (i.e., sin θ = −0.25), which is the angle of incidence for the
+first-order Littrow mount. Similarly, the −first-order curves in Figure 24.11(c)
show the reciprocity theorem at work around θ = 14.48°, the angle of incidence
for the −first-order Littrow mount. The Rayleigh anomalies at θ = ±30° (i.e.,
sin θ = ±0.5) mark the disappearance of the ±first-order beams beyond these
angles, as may be seen clearly in Figures 24.11(b) and 24.11(c).
The ±second-order efficiency curves are shown in Figure 24.11(d). These
curves peak at, and are symmetrical around, θ = ±30°, where the Littrow
condition for the second-order beams is satisfied. Reciprocity between the incident
beam and the second-order reflected beams is evident in the symmetrical values
of efficiency around θ = ±30°. Note that for the p-polarized beam incident
at θ = 30°, the −second-order efficiency reaches 80% while that of all
other orders essentially vanishes; the remaining 20% of the incident power
must therefore have been absorbed by the grating. A similar consideration applies to both
ρp(+2) and ρs(+2) at θ = −30°. The ±third-order beams exist only at large angles
Figure 24.11 Computed plots of diffraction efficiency versus sin θ, where θ is
the angle of incidence on the echelette grating of Figure 24.10 (λ0 = 0.633 μm,
α = 30°, p = 2λ0, (n, k) = (2, 7)). When θ > 0, incidence is from the large-facet
side of the triangular grooves while when θ < 0 incidence is from the small-facet
side. The displayed efficiencies are for p- and s-polarized incident light in the
classical mount. (a) Zeroth order, (b) +first order, (c) −first order, (d) ±second
order, (e) ±third order. The arrows at the top of each frame indicate the
locations of Rayleigh anomalies. (Axes of each frame: diffraction efficiency versus
sin θ, from −1.0 to 1.0.)
of incidence, as may be inferred from Figure 24.11(e). Again note the symmetry
of these curves (due to reciprocity) around sin θ = ±0.75; these values of θ
correspond to the Littrow mount in the third order.
For the sake of completeness we present in Figure 24.12 computed efficiency
curves in the case of conical mount for the same echelette grating as discussed
above.12 Here the grooves are parallel to the plane of incidence, and symmetry
with respect to θ = 0° obviates the need for displaying the results for negative
values of θ. In this conical mount only the zeroth and ±first diffracted orders are
allowed; even then, the first-order beams disappear beyond θ = 60°. Note that,
because of the asymmetrical groove shape, the +first-order efficiency curves are
quite different from those of the −first order. Also note that, beyond θ = 60°,
where the zeroth-order beam is the only beam reflected from the grating, the
relatively small values of ρp′(0) and ρs′(0) indicate substantial absorption within the
grating medium.
Transmission grating
Consider a grooved glass plate such as that depicted in Figure 24.13(a). When a
plane wave is incident at θ on this grating, the directions of the reflected orders
may be found from Eqs. (24.1) and (24.2), but the transmitted orders inside the
glass plate obey different equations. In the classical mount the transmitted orders
emerge at angles θ(m), where

$$n_0 \sin\theta^{(m)} = \sin\theta + m\lambda_0/p. \tag{24.9}$$

Here n0 is the refractive index of the substrate. The number of diffracted orders in
the substrate could, therefore, be greater than the number reflected into the air.
Figure 24.12 Computed plots of diffraction efficiency versus the angle of
incidence on the echelette grating of Figure 24.10 (λ0 = 0.633 μm, α = 30°,
p = 2λ0, (n, k) = (2, 7)). The displayed efficiencies are for p- and s-polarized
incident light in the conical mount. (a) zeroth order, (b) +first order, (c) −first
order. The arrows at the bottom of each frame indicate the locations of Rayleigh
anomalies. (Axes of each frame: diffraction efficiency versus θ in degrees, from 0 to 90.)
However, when the transmitted orders attempt to exit the bottom of the substrate,
those incident at an angle higher than the critical angle for total internal reflection
will be fully reflected. The beams that do exit the substrate will emerge at
angles greater than θ(m), in accordance with Snell’s law; the coefficient n0 on the
left-hand side of Eq. (24.9) is effectively canceled. Consequently, the beams
emerging from the bottom of the substrate have exactly the same number and
(aside from being mirror images) the same directions as those reflected from the
top of the grating. Nonetheless, the transmitted diffracted orders may be
observed in their native form by using a hemispherical substrate, as shown in
Figure 24.13(b).
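The bookkeeping of which orders exist inside the substrate and which survive total internal reflection at a flat bottom can be sketched as follows, assuming Eq. (24.9) for the classical mount. The function name and the example parameters (θ = 20°, p = 4λ0, n0 = 1.5) are illustrative choices.

```python
import math

def transmitted_orders(theta_deg, period_in_waves, n0):
    """Return (orders inside the substrate, orders leaving a flat bottom).

    Inside:  |sin(theta) + m/p| < n0, so Eq. (24.9) has a real solution.
    Exiting: |sin(theta) + m/p| < 1, i.e. n0*sin(theta_m) is below the
             threshold for total internal reflection at the flat bottom.
    """
    s0 = math.sin(math.radians(theta_deg))
    m_max = int((n0 + 1) * period_in_waves) + 1
    inside, exiting = [], []
    for m in range(-m_max, m_max + 1):
        s = s0 + m / period_in_waves
        if abs(s) < n0:
            inside.append(m)
            if abs(s) < 1.0:
                exiting.append(m)
    return inside, exiting

inside, exiting = transmitted_orders(20.0, 4.0, 1.5)
```

In this example twelve orders (m = −7 to +4) propagate in the glass but only eight (m = −5 to +2) leave through a flat bottom, the same set that is reflected into the air from the top; a hemispherical substrate would let all twelve out.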
In the case of conical mount similar arguments apply, so that the mth-order
beam inside the substrate will have a propagation direction given by the unit
vector σ(m), where

$$\boldsymbol{\sigma}^{(m)} = \sigma_x^{(m)}\hat{\mathbf{x}} + \sigma_y^{(m)}\hat{\mathbf{y}} + \sigma_z^{(m)}\hat{\mathbf{z}} = (1/n_0)\left[(\sin\theta)\,\hat{\mathbf{x}} + (m\lambda_0/p)\,\hat{\mathbf{y}}\right] + \sigma_z^{(m)}\hat{\mathbf{z}}. \tag{24.10}$$

Again, σz is determined from the relation σx² + σy² + σz² = 1. As above, when this
beam emerges into air from the bottom of a flat substrate, Snell’s law multiplies
σx and σy by the refractive index n0, ensuring that the emergent beams (aside from
being mirror images) have the same propagation directions as the corresponding
beams reflected from the top of the grating.
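Equation (24.10) translates directly into a few lines of code. The sketch below returns only the magnitude of the z-component, leaving its sign to whatever axis convention is in use (an assumption made here for simplicity); the function name is illustrative.

```python
import math

def conical_direction(theta_deg, m, period_in_waves, n0=1.0):
    """Direction cosines (sx, sy, |sz|) of the mth order in the conical mount.

    Grooves are parallel to the plane of incidence (XZ), so the grating
    adds m*lambda0/p to the y-component only, as in Eq. (24.10).
    Returns None when the order is evanescent (no real sz).
    """
    sx = math.sin(math.radians(theta_deg)) / n0
    sy = (m / period_in_waves) / n0
    sz_squared = 1.0 - sx * sx - sy * sy
    if sz_squared <= 0.0:
        return None
    return (sx, sy, math.sqrt(sz_squared))

inside = conical_direction(30.0, 1, 4.0, n0=1.5)   # within the glass
in_air = conical_direction(30.0, 1, 4.0)           # after a flat exit face
cut_off = conical_direction(80.0, 1, 1.0)          # evanescent, returns None
```

Multiplying the x- and y-components found inside the glass by n0 reproduces the in-air values, which is the statement about Snell's law made above.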
Figure 24.13 A simple transmission grating may be obtained by ruling or
etching a glass substrate, or by a holographic method. The substrate’s refractive
index being greater than unity, the diffraction angles inside the substrate are
smaller than those observed upon reflection from the same grating into the air.
(a) When the substrate bottom is flat, Snell’s law of refraction reorients the
beams as they emerge into the air, making the diffraction angles equal to those
observed in reflection. However, one or more diffracted orders may be missing,
owing to total internal reflection at the substrate bottom. (b) If the grating is
made on the flat surface of a glass hemisphere, the transmitted orders emerge
into the air undisturbed. (Both diagrams show the grating on a glass substrate with
transmitted orders −3 to +1.)
Figure 24.14 shows the location of the transmitted diffracted orders from a
glass grating.12 The assumed grating in this case is similar to that of Figure 24.1,
except that the metal layer is absent. The observation system is also similar to that
in Figure 24.3, except for the position of the collimating lens, which is moved to
the opposite side of the grating to collect the transmitted orders. The incident
beam, arriving at θ = 30° in the conical mount, is p-polarized. The pictures on
Figure 24.14 Computed plots of intensity distribution at the exit pupil of the
collimating lens of Figure 24.3, when the system is rearranged to allow
observation of transmitted orders from the grating of Figure 24.1, from which
the metal layer has been removed (λ0 = 0.633 μm, p = 4λ0, d = λ0/8). In this
case of conical mount at 30° incidence the grooves are parallel to the plane of
incidence, as in Figure 24.2(b), and the incident beam is p-polarized.
The pictures on the left correspond to the component of polarization in the XZ-
plane, while those on the right represent the polarization component along the
Y-axis. In (a) and (b) the substrate bottom is flat, as in Figure 24.13(a), whereas
in (c) and (d) it is hemispherical, as in Figure 24.13(b). The ratio of the peak
intensity in (b) to that in (a) is 0.21 × 10⁻⁴. Similarly, the peak-intensity ratio of
(d) to (c) is 0.89 × 10⁻⁴. These results are based on full vector-diffraction
calculations. (Horizontal axis of each frame: x/λ0, from −1600 to 1600.)
the left-hand side of Figure 24.14 represent the component of polarization in
the XZ-plane (E∥), while those on the right correspond to polarization along the
Y-axis (E⊥). The top row shows the intensity distribution at the exit pupil of the
collimating lens when the substrate bottom is flat; the bottom row corresponds
to the case of a hemispherical substrate. As expected, in the latter case there
are more diffracted orders, the orders are more closely spaced, and the
individual beam diameters are smaller. For the flat substrate the peak-intensity
ratio |E⊥|² : |E∥|² = 0.21 × 10⁻⁴, while for the hemispherical substrate
|E⊥|² : |E∥|² = 0.89 × 10⁻⁴.
Dielectric-coated grating
Figure 24.15 is a diagram of a dielectric-coated transmission grating on a
hemispherical glass substrate. In the example that follows it is assumed
that λ0 = 0.633 μm, the grating period p = λ0, the groove depth d = λ0/8, the
side-wall inclination angle α = 60°, and the duty cycle c = 60%. The coatings
are conformal to the grating surface, both dielectric layers are 100 nm
thick, and their refractive indices are 2.1 and 1.5, as indicated. Because there
are no metallic layers in this case there will be no surface plasmon
excitations, but there is the possibility of guided-mode coupling to the dielectric
waveguide formed by the coating layers. The hemispherical substrate allows
all transmitted orders to exit and be measured in air. The bottom of the
hemisphere is antireflection coated, to avoid losses as the beams exit the
substrate.
Figure 24.16 shows computed plots of diffraction efficiency versus θ for
the grating of Figure 24.15.12 The case of conical mount does not show
interesting phenomena, as evidenced by the featureless plots of ρ′ and τ′ for
the various orders. This is not surprising, considering that no guided modes
can be launched in the dielectric layers in this case. However, for the classical
mount ρp, ρs, τp and τs show peaks and valleys that are indicative of resonant
Figure 24.15 Cross-section of a dielectric-coated diffraction grating. The side-
wall angle α = 60°, and the duty cycle c, which is the ratio of the land width to
the grating period, is 60%. Both coating layers are 100 nm thick and (at
λ0 = 0.633 μm) their refractive indices are n1 = 2.1 and n2 = 1.5. For the substrate,
which is also transparent, n0 = 1.5. (The diagram shows the incident beam and the
transmitted orders −2, −1, 0 and +1.)
Figure 24.16 Computed diffraction efficiencies versus θ for the dielectric-
coated grating of Figure 24.15 (λ0 = 0.633 μm, p = λ0, d = λ0/8). Reflected
beams: (a) zeroth order, (b) −first order. Transmitted beams: (c) zeroth order, (d)
+first order, (e) −first order, (f) −second order (classical mount only). The
arrows at the top or the bottom of each frame indicate the locations of Rayleigh
anomalies in the classical mount. (Axes of each frame: diffraction efficiency versus θ
in degrees, from 0 to 90.)
behavior. Figure 24.16(b) shows plots of ρp and ρs for the −first-order
reflected beam, which carries as much as 8% of the incident beam into this particular
direction at several angles of incidence. Reciprocity between the incident beam and
the −first-order reflected beam is evident in Figure 24.16(b), in the symmetrical
values of efficiency before and after θ = 30°. Note that, unlike surface plasmon
excitations in metals, which occur in p-polarization only, the waveguide modes of
dielectric layers can be excited by both p- and s-polarized light. For the classical
mount, Figure 24.16(d) shows that the +first-order transmitted beam is cut off
beyond θ = 30°. In its place the −second-order transmitted beam shown in Figure
24.16(f) appears and shows fairly high efficiency for p-polarized light in a narrow
range of angles around θ = 33°.
It is impossible to describe in a brief survey the entire range of physical
phenomena that occur in diffraction gratings and their potential applications. We
hope, however, to have brought to the reader’s attention the richness and com-
plexity of the physics of gratings, and to have encouraged further exploration of
this fascinating subject.

1980.
2 R. Petit, editor, Electromagnetic Theory of Gratings, Vol. 22 of Topics in Current
Physics, Springer Verlag, Berlin, 1980.
3 M. C. Hutley, Diffraction Gratings, Academic Press, New York, 1982.
4 E. G. Loewen and E. Popov, Diffraction Gratings and Applications, Marcel Dekker,
New York, 1997.
5 J. Fraunhofer, Ann. d. Physik 74, 337 (1823), reprinted in his collected works, 117
(Munich, 1888).
6 H. A. Rowland, Phil. Mag. (5), 13, 469 (1882).
7 R. W. Wood, On a remarkable case of uneven distribution of light in a diffraction
grating spectrum, Phil. Mag. 4, 396–402 (1902).
8 J. W. S. Rayleigh, Proc. Roy. Soc. London A 79, 399 (1907).
9 D. Maystre, Rigorous vector theories of diffraction gratings, in Progress in Optics,
Vol. 21, 1–67, ed. E. Wolf, Elsevier, Amsterdam, 1984.
10 D. Maystre, ed., Selected Papers on Diffraction Gratings, SPIE Milestone Series,
Vol. MS 83, SPIE, Bellingham, 1993.
12 The simulations in this chapter were performed by DELTA, a program developed
by Lifeng Li for grating calculations, and by DIFFRACT™, a product of MM
Research Inc., Tucson, Arizona.
Chapter 25
Diffractive optical elements
Diffractive optical elements (DOEs), which are relatively new additions to the
toolbox of optical engineering, can function as lenses, gratings, prisms, aspherics,
and many other types of optical element. Typically formed in a film of only a few
microns thickness, a DOE may be fabricated on an arbitrarily-shaped substrate.
Flexible functionality, wide range of available optical aperture, light weight, and
low manufacturing cost are among the advantages of DOEs. They can be fab-
ricated in a broad range of materials such as aluminum, silicon, silica, and
plastics, thus providing flexibility in selecting the base material for specific
applications. The effects of temperature change, thermal gradients, shock, and
stress in thin film optical devices, however, can cause deformation of the sub-
strate and ultimately alter the behavior of a DOE.1,2,3,4,5,6
DOEs are wavelength sensitive; for instance, the focal length and aberration char-
acteristics of a diffractive lens can vary substantially if the wavelength of the incident
light is changed. DOEs can duplicate most of the functions provided by conventional
glass optics provided that the optical system operates over a narrow spectral bandwidth,
or the operation of the system requires chromatic dispersion. To date, DOEs have found
widespread application in beam-combiners, head-mounted displays, beam-shaping
optics, laser collimators, spectral filters, compact spectrometers, diode laser couplers,
projection displays, compact disk (CD) and digital versatile disk (DVD) players, laser
resonators, computer interconnects, solar concentrators, laser material processing, and
wavelength division multiplexers/demultiplexers.
Optimal design of advanced optical systems requires a thorough understanding
of the interaction between the light beam and the various elements located between
the light source and the detectors. In this chapter we use a combination of polari-
zation ray-tracing and quasi-vector diffraction modeling to analyze the behavior of
a laser beam as it propagates through various diffractive optical elements.
Transmissive diffractive optical element

Figure 25.1(a) shows a geometric-optical ray (vacuum wavelength = λ0) arriving
through a medium of refractive index n1 at the surface of a substrate (refractive
index = n2) coated with a variable thickness layer; the angle and the azimuth of
incidence are θ1, φ1, those of the transmitted ray are θ2, φ2. The incident wavefront at the
front facet of the substrate may be written as A(x, y) = A0 exp[i(2πn1/λ0)(xσx + yσy)],
where σx = sin θ1 cos φ1 and σy = sin θ1 sin φ1.
The coating layer has thickness t(x, y) and refractive index n. To avoid certain
complications in the following analysis we shall assume that n is very large and
t(x, y) very small, so that only the product (n − n1)t(x, y), known as the optical
path difference (OPD), has a finite value. The characteristic function of the coating
layer is thus the dimensionless function F(x, y) = (n − n1)t(x, y)/λc, where λc is
some fixed “construction wavelength.” The characteristic function is generally
specified by a polynomial such as

$$F(x, y) = \sum_{m=0}^{N} \sum_{n=0}^{N-m} a_{mn}\, x^{m} y^{n}. \tag{25.1a}$$

F(x, y) must be greater than or equal to zero across the surface since n − n1, t(x, y), and
λc are all non-negative. For later reference, the gradient of F(x, y) is written below:
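$$\nabla F(x, y) = (\partial F/\partial x,\ \partial F/\partial y) = \left( \sum_{m=1}^{N} \sum_{n=0}^{N-m} m\, a_{mn}\, x^{m-1} y^{n},\ \ \sum_{n=1}^{N} \sum_{m=0}^{N-n} n\, a_{mn}\, x^{m} y^{n-1} \right). \tag{25.1b}$$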
Figure 25.1 (a) A ray of light (vacuum wavelength = λ0) is incident at an oblique
angle (θ1, φ1) from a medium of refractive index n1 onto a substrate of index n2. The
substrate is coated with a layer of index n and variable thickness t(x, y), where n is
assumed to be large and t(x, y) very small, so that only the optical path difference,
OPD = (n − n1)t(x, y), has a finite value. (b) The variable-thickness layer is converted
to a DOE by reducing the coating layer’s thickness wherever the OPD
contains an integer multiple of the construction wavelength λc. The characteristic
function of the DOE is thus the fractional part f(x, y) of the characteristic function of
the coating layer in (a), defined as F(x, y) = (n − n1)t(x, y)/λc.
A diffractive optical element (DOE) is constructed from the above coating layer by
reducing the layer’s thickness wherever F(x, y) is greater than unity.
By removing from t(x, y) all integer multiples of λc/(n − n1), one obtains a coating
such as that in Figure 25.1(b), for which the integer part of F(x, y), if any, has been
eliminated in all locations. The characteristic function f(x, y) of the DOE, with
values confined to the interval [0, 1], is simply the fractional part of F(x, y).
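The construction just described can be sketched numerically. The following is a minimal illustration, not the author's code: the function names, the dictionary layout of the coefficients, and the sample profile F = 3(x² + y²) are assumptions chosen for the example.

```python
import numpy as np

def characteristic_F(coeffs, x, y):
    """Evaluate F(x, y) = sum of a_mn * x**m * y**n, as in Eq. (25.1a).

    coeffs maps (m, n) -> a_mn; only nonzero terms need to be listed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    F = np.zeros(np.broadcast(x, y).shape)
    for (m, n), a_mn in coeffs.items():
        F = F + a_mn * x**m * y**n
    return F

def doe_characteristic(F):
    """Fractional part f(x, y) of F(x, y); floor() confines f to [0, 1)."""
    return F - np.floor(F)

def doe_thickness(f, lam_c, n, n1):
    """Layer thickness after truncation: t = f * lam_c / (n - n1)."""
    return f * lam_c / (n - n1)

# Hypothetical profile F = 3(x^2 + y^2), x and y in mm, sampled along y = 0.
coeffs = {(2, 0): 3.0, (0, 2): 3.0}
x = np.linspace(-1.0, 1.0, 5)
F = characteristic_F(coeffs, x, 0.0)
f = doe_characteristic(F)
print(F)  # values of F before truncation
print(f)  # values of f after removing the integer part
```

Using `floor` places f = 0 (rather than f = 1) exactly on the truncation contours; either convention describes the same physical groove edge.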
As shown in Figure 25.2, the coating layer’s F(x, y) is truncated at contours
where the function acquires integer values, so the local period (Δx, Δy) of the
DOE at a point such as (x0, y0) is the shortest line segment through (x0, y0) that
satisfies the equation

$$\nabla F(x, y) \cdot (\Delta x\, \hat{x} + \Delta y\, \hat{y}) = (\partial F/\partial x)\Delta x + (\partial F/\partial y)\Delta y = 1. \tag{25.2}$$

In Eq. (25.2) x̂ and ŷ are unit vectors along the coordinate axes. Noting that
|∇F|² = (∂F/∂x)² + (∂F/∂y)², we find (Δx, Δy) = ∇F/|∇F|². This is the local
period of the grating at (x0, y0), which is directed along ∇F and has magnitude
1/|∇F|. In the linear approximation, a single period of the grating
begins at (x, y) = (x0, y0) − f(x0, y0)∇F/|∇F|², where f(x, y) = 0, and ends at
(x, y) = (x0, y0) + [1 − f(x0, y0)]∇F/|∇F|², where f(x, y) = 1.
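As a quick numerical check of the local-period formula, the sketch below evaluates the period vector and the linearized groove endpoints for a hypothetical profile F = 3(x² + y²); the helper names and the test point are assumptions made for illustration only.

```python
import numpy as np

def local_period(grad_F):
    """Local grating period vector (dx, dy) = grad F / |grad F|^2."""
    g = np.asarray(grad_F, dtype=float)
    return g / np.dot(g, g)

def groove_endpoints(point, F_value, grad_F):
    """Start and end of the groove through `point`, in the linear approximation."""
    p0 = np.asarray(point, dtype=float)
    f0 = F_value - np.floor(F_value)      # fractional part f(x0, y0)
    period = local_period(grad_F)
    return p0 - f0 * period, p0 + (1.0 - f0) * period

# Hypothetical profile F = 3(x^2 + y^2) (x, y in mm), evaluated at (2.05, 0).
x0, y0 = 2.05, 0.0
F0 = 3.0 * (x0**2 + y0**2)
grad = (6.0 * x0, 6.0 * y0)
period = local_period(grad)
start, end = groove_endpoints((x0, y0), F0, grad)
print(period, start, end)
```

For this profile the exact contours F = 12 and F = 13 lie at x = 2 and x = √(13/3), so comparing them with `start` and `end` gives a feel for how accurate the linear approximation is when the grooves are narrow.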
Figure 25.2 Diagram of a DOE showing the slicing contours where the
function F(x, y) assumes integer values. The DOE’s characteristic function f(x, y)
is the fractional part of F(x, y). Thus, while F(x, y) is continuous across the
XY-plane, f(x, y) jumps by one unit at each contour. The space between each pair
of adjacent contours contains a single groove of the DOE, where f(x, y) varies
continuously between the values of 0 and 1. At an arbitrary location (x0, y0)
in the XY-plane, the separation between adjacent contours is given by
(Δx, Δy) = ∇F/|∇F|², which is a vector of magnitude 1/|∇F| oriented orthogonal
to the contours.
Since n is assumed to be large, inside the coating layer of Figure 25.1(a)
the ray travels along the Z-axis and acquires an extra phase Ψ(x, y) = 2π(n − n1)
t(x, y)/λ0 = 2π(λc/λ0)F(x, y). As long as λ0 = λc, the truncation of F(x, y), i.e.,
removal of its integer part, does not affect the acquired phase shift Ψ(x, y); in
other words, eliminating 2π multiples does not change the transmitted beam’s
phase profile. However, when λ0 ≠ λc, the XY-plane may be divided into segments,
defined by the contours of truncation, where the phase of the transmitted beam over
each segment differs from Ψ(x, y) by some integer multiple of 2π(λc/λ0); the DOE
thus modulates the incident phase by ψ(x, y) = 2π(λc/λ0)f(x, y). In the vicinity of an
arbitrary point (x0, y0), considering the local periodicity of the grating along the
direction ∇F, the modulating phase function exp[iψ(x, y)] may be expanded in the
following (one-dimensional) Fourier series:

$$\exp[i 2\pi(\lambda_c/\lambda_0) f(x, y)] = \sum_m C_m \exp\{i 2\pi m[(\partial F/\partial x)(x - x_0) + (\partial F/\partial y)(y - y_0)]\}, \tag{25.3a}$$

where the Fourier coefficients are given by

$$C_m = |\nabla F| \int \exp[i 2\pi(\lambda_c/\lambda_0) f(x, y)]\, \exp(-i 2\pi m |\nabla F| s)\, ds. \tag{25.3b}$$
In Eq. (25.3b), the one-dimensional integral is taken in the XY-plane along
a straight line segment drawn parallel to ∇F through (x0, y0); the range of
integration, starting at (x, y) = (x0, y0) − f(x0, y0)∇F/|∇F|² and ending at
(x, y) = (x0, y0) + [1 − f(x0, y0)]∇F/|∇F|², covers one full period of the grating;
see Figure 25.2. Expanding f(x, y) to first order in Taylor series yields

$$f(x, y) = f(x_0, y_0) + (\partial F/\partial x)(x - x_0) + (\partial F/\partial y)(y - y_0). \tag{25.4}$$

Substituting for f(x, y) in Eq. (25.3b) from Eq. (25.4) and carrying out the
integration, we find

$$C_m = \exp[i 2\pi m f(x_0, y_0)]\, \exp\{i\pi[(\lambda_c/\lambda_0) - m]\}\, \mathrm{sinc}[(\lambda_c/\lambda_0) - m], \tag{25.5}$$
where sinc(x) = sin(πx)/(πx). The mth diffracted order is thus found to
have the constant amplitude |Cm| = |sinc[(λc/λ0) − m]| across the XY-plane for any
given λ0; the corresponding diffraction efficiency is |Cm|². When λ0 happens to be
the same as the construction wavelength λc, the
first-order beam will have 100% efficiency while all other orders vanish. Also, if
λc is an integer multiple of λ0, only one order will emerge, unattenuated, from the
DOE. For all other values of λ0, the various orders m = 0, ±1, ±2, etc. will
coexist. The second term in Eq. (25.5) corresponds to a constant phase, π[(λc/λ0)
− m], which is independent of (x0, y0) and may thus be ignored in practice. The
remaining phase, 2πmf(x0, y0), varies continuously across the XY-plane with
absolutely no dependence on λ0. Since f(x0, y0) is the fractional part of F(x0, y0),
the two functions may be exchanged and the phase acquired by the mth order rays
written as 2πmF(x0, y0). In practice the lack of any discontinuous jumps in this
phase profile of the mth order beam is extremely important, since it means that
the wavefront associated with each and every diffraction order is well-behaved.
In other words, if one assembles all the mth order rays from across the DOE to
construct the mth order transmitted beam, the beam will have a continuous
wavefront.
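The closed form of Eq. (25.5) can be checked against a direct numerical evaluation of the Fourier integral of Eq. (25.3b). The sketch below is only an illustration: the function names are hypothetical, the integral is done with a simple trapezoidal rule, and the wavelength ratio 0.55/0.65 is borrowed from the parabolic-mirror example later in this chapter.

```python
import numpy as np

def c_m_closed_form(m, ratio, f0=0.0):
    """Eq. (25.5), with ratio = lambda_c / lambda_0.

    np.sinc(x) is sin(pi x)/(pi x), the same normalization used in the text.
    """
    d = ratio - m
    return np.exp(2j * np.pi * m * f0) * np.exp(1j * np.pi * d) * np.sinc(d)

def c_m_numeric(m, ratio, f0=0.0, grad=1.0, samples=20001):
    """Eq. (25.3b) over one period, with f linear in s as in Eq. (25.4)."""
    s = np.linspace(-f0 / grad, (1.0 - f0) / grad, samples)
    f = f0 + grad * s
    integrand = np.exp(2j * np.pi * ratio * f) * np.exp(-2j * np.pi * m * grad * s)
    ds = s[1] - s[0]
    # Trapezoidal rule written out explicitly.
    integral = ds * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return grad * integral

ratio = 0.55 / 0.65                      # lambda_c / lambda_0
orders = np.arange(-200, 201)
efficiency = np.abs(c_m_closed_form(orders, ratio)) ** 2
eta_first = efficiency[orders == 1][0]   # about 0.92 for this ratio
total = efficiency.sum()                 # approaches 1 as more orders are included
print(eta_first, total)
```

The efficiencies sum to unity because the DOE is treated as a pure phase object, so the energy that leaves the first order when λ0 ≠ λc reappears in the neighboring orders.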
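The transmitted wavefront around (x0, y0), the foot of the incident ray, can now
be written

$$\begin{aligned} A(x, y) &= \sum_m A'_m \exp[i(2\pi n_2/\lambda_0)(x\sigma'_{xm} + y\sigma'_{ym})] \\ &= A_0 \exp[i(2\pi n_1/\lambda_0)(x\sigma_x + y\sigma_y)]\, \exp[i\psi(x, y)] \\ &= \sum_m C_m A_0 \exp\{i(2\pi/\lambda_0)[(n_1\sigma_x + m\lambda_0\, \partial F/\partial x)x + (n_1\sigma_y + m\lambda_0\, \partial F/\partial y)y]\}. \end{aligned} \tag{25.6}$$

The (complex) amplitude and the direction of the mth order transmitted ray are
thus given by

$$A'_m = C_m A_0, \tag{25.7a}$$

$$(\sigma'_x, \sigma'_y)_m = \left(n_1\sigma_x + m\lambda_0\, \partial F/\partial x,\ \ n_1\sigma_y + m\lambda_0\, \partial F/\partial y\right)/n_2. \tag{25.7b}$$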
Note that the mismatch between the refractive indices n1, n, and n2 is not taken
into consideration in Eq. (25.7a) as far as reflection losses at the various inter-
faces are concerned. Also ignored in this analysis are the effects of incident
polarization on the transmission coefficient Cm, which would have required a
rigorous vector diffraction treatment.
For m ≠ 0, the direction of the mth order transmitted ray, (σ′x, σ′y)m, is seen from
Eq. (25.7b) to depend on the illumination wavelength λ0 in a way that gives rise
to a substantial amount of chromatic aberration; this provides the basis for correcting
the chromatic aberrations of conventional refractive lenses by incorporating
diffractive optical elements in the so-called hybrid designs. In going from
medium 1 to medium 2 of Figure 25.1, the undiffracted zeroth-order ray obeys
Snell’s law since, according to Eq. (25.7b), (n2σ′x, n2σ′y)0 = (n1σx, n1σy). For other
diffraction orders, one must add mλ0∇F to the incident beam’s (n1σx, n1σy) in
order to obtain the transmitted beam’s (n2σ′x, n2σ′y)m.
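A small sketch of Eq. (25.7b) follows; it is an illustration under stated assumptions rather than the ray-tracing code used for the figures. The function name, the sample grating (∇F = 100 mm⁻¹ along x, i.e., a 10 μm local period), and the treatment of orders whose direction cosines exceed unity as evanescent (returned as `None`) are all choices made for this example.

```python
import numpy as np

def transmitted_direction(theta1, phi1, n1, n2, lam0, grad_F, m):
    """Direction cosines of the m-th order transmitted ray, Eq. (25.7b).

    lam0 and grad_F must use consistent units (e.g., mm and 1/mm).
    Returns (sx, sy, sz) or None when the order does not propagate.
    """
    sx = np.sin(theta1) * np.cos(phi1)
    sy = np.sin(theta1) * np.sin(phi1)
    sxp = (n1 * sx + m * lam0 * grad_F[0]) / n2
    syp = (n1 * sy + m * lam0 * grad_F[1]) / n2
    s2 = sxp**2 + syp**2
    if s2 > 1.0:
        return None  # assumption: treat as evanescent, no transmitted ray
    return sxp, syp, np.sqrt(1.0 - s2)

lam0 = 0.65e-3            # vacuum wavelength in mm (0.65 micron)
grad_F = (100.0, 0.0)     # hypothetical local gradient of F, in 1/mm

# Zeroth order at 30 degrees incidence reduces to Snell's law.
zeroth = transmitted_direction(np.radians(30.0), 0.0, 1.0, 1.5, lam0, grad_F, 0)
# First order at normal incidence is deflected by lam0 * dF/dx / n2.
first = transmitted_direction(0.0, 0.0, 1.0, 1.5, lam0, grad_F, 1)
print(zeroth, first)
```

Because the added term mλ0∇F scales with λ0, rerunning the first-order case with a different `lam0` shows directly the wavelength dependence that underlies the chromatic behavior discussed above.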
Having exploited the localized ray picture to build the transmitted wavefront(s)
across the DOE surface, we now abandon the rays and concentrate instead on the
transmitted wavefronts (one for each diffracted order). When the incident
wavelength λ0 differs from the construction wavelength λc, the various orders
will be present in the mix in different amounts, with the magnitude of the mth
beam, |Cm|, being a function of m and the wavelength ratio λc/λ0. Although the
phase profile of each diffracted order is independent of the incident wavelength
λ0, this does not imply that a given diffracted order behaves identically in
response to different incident wavelengths. Remember that the mth order
phase profile is exp[i2πmF(x, y)], so, for simplicity’s sake, let us assume that
F(x, y) = ax + by, where a and b are arbitrary constants. This phase profile may
then be written as exp[i(2π/λ)(mλax + mλby)], where λ = λ0/n2 is the wavelength
within the medium of refractive index n2. This represents a plane wave having
direction cosines (σx, σy) = (mλa, mλb), whose propagation direction evidently
depends on λ0, even though its phase profile is independent of the incident
wavelength. The bottom line is that the rays and the wavefronts that emerge from
the above analysis paint a consistent picture, both leading to the same conclusions
concerning the diffraction efficiency and the chromatic aberrations associated
with each diffracted order of the transmitted beam.
Reflective diffractive optical element

The arguments of the preceding section may be extended to cover the case of an
ideal reflective DOE shown in Figure 25.3. As before, the incidence medium has
Figure 25.3 The case of a reflective DOE differs from that of a transmissive
DOE in that the transparent substrate is now replaced with a perfect reflector.
The incident rays, after traveling through the coating layer, bouncing back at the
substrate interface, and returning through the same thickness of the coating
layer, re-emerge into the incidence medium (refractive index = n1). The DOE is
constructed from the coating layer by removing from t(x, y) all integer multiples
of ½λc/(n − n1).
Figure 25.4 A surface of revolution around the z-axis is defined by its sag h(r),
which is the distance of the surface (along z) from the plane tangent to the surface
at its vertex. The curvilinear coordinate s follows the tangent to the surface in the
rz-plane. The value of s at each point is the length of the curve measured from
some point of reference, such as the vertex at (r, z) = (0, 0). Also shown is a pair of
incident and refracted rays at the surface.
refractive index n1, but the DOE’s substrate is a perfect reflector. We assume
once again that the variable-thickness layer has a large refractive index n and a
correspondingly small thickness t(x, y). The optical path difference upon transmission
through the layer and reflection at the substrate interface is thus given by
OPD = 2(n − n1)t(x, y), which yields the characteristic function F(x, y) =
2(n − n1)t(x, y)/λc, with λc being the construction wavelength. Once again, the
DOE is constructed from the above coating layer by reducing the layer’s
thickness wherever F(x, y) exceeds unity. Note that the above factor of 2 in the
expression for the OPD (representing the double pass through the
coating layer) does not affect any of the subsequent results, since the starting
point of our derivations is the function F(x, y), which already incorporates this
factor. The formal derivations for a reflective DOE parallel those of the
transmissive DOE in the preceding section, until we reach Eq. (25.6), at which
point the refractive index n2 of the medium into which the beam emerges (upon
transmission through the DOE) must be replaced with n1, reflecting the fact that
the incidence and emergence media are now the same. Therefore, for reflective
DOEs, the only equation that needs to be modified is Eq. (25.7b), which
assumes the following form:

$$(\sigma'_x, \sigma'_y)_m = \left(\sigma_x + (m\lambda_0/n_1)\, \partial F/\partial x,\ \ \sigma_y + (m\lambda_0/n_1)\, \partial F/\partial y\right). \tag{25.8}$$

All the considerations discussed in the case of transmissive DOEs apply equally
to reflective elements.
DOE on a curved surface

Curved surfaces may also be coated with DOEs, and the method of calculating
reflected/transmitted rays is essentially the same as that described in conjunction
with flat surfaces in the preceding sections. The reason is that all such calcula-
tions are based on the properties of the surface and of the incident and emergent
rays over small patches, where curved surfaces are flat locally. The only com-
plication arises from the fact that the DOE’s characteristic function is usually
defined with respect to a coordinate system whose axes do not follow the profile
of the surface. We limit the present discussion to the case of a curved surface of
Figure 25.5 Gaussian beam (λ0 = 0.66 μm, e⁻¹ radius R0 = 2.5 mm, diameter
D = 4.0 mm) is focused by a 4.0 mm diameter lens (thickness = 1.7 mm,
refractive index = 1.540 44, first surface: radius of curvature Rc = 11.4 mm,
conic constant κ = 0.733, aspheric coefficients A4 = 2.82 × 10⁻⁷, A6 = 3.75
× 10⁻⁸, A8 = 1.5 × 10⁻⁹; second surface: Rc = 98 mm). The incident beam,
linearly polarized along the x-axis, has the intensity profile shown on the left-hand
side. The glass plate (d1 = 0.61 mm), the cover slip (d2 = 0.5 mm), and the
substrate (d3 = 2.0 mm) all have the same refractive index n = 1.520 168. The
glass plate is 1.0 mm away from the lens and 14.38 mm away from the cover slip.
The destination plane is at z = 10.0 mm (measured from the first vertex of the
lens), and is tilted by θ = 6.03°, as shown. The beam is subsequently propagated
a distance of 10.468 mm along the normal to the destination plane, which brings
the beam to its plane of best focus.
revolution, such as that in Figure 25.4, where the axis of symmetry is z, and the
sag is a given function h(r) of r. The characteristic function of such a DOE is
usually defined by a radial polynomial,

$$F(r) = \sum_{n=0}^{N} a_n r^{n}. \tag{25.9}$$

Consider the local surface coordinate s shown in Figure 25.4. The value of s at
each point on the surface is the length of the curve measured from some point of
reference such as the vertex at (r, z) = (0, 0). What we need is the characteristic
function’s gradient over a short distance Δs, namely, ΔF/Δs. But

$$\Delta s = \sqrt{(\Delta r)^2 + (\Delta h)^2} = \Delta r \sqrt{1 + (dh/dr)^2}. \tag{25.10}$$
Figure 25.6 Distributions of intensity (top) and phase (bottom) at the destination
plane in the system of Figure 25.5; from left to right, x-, y-, and z-components of
polarization. Note that the emergent beam is centered at x = −3.6 mm. The peak
intensities are in the ratio of Ix : Iy : Iz = 1.0 : 0.39 × 10⁻³ : 0.13. In the residual
phase profiles φx, φy, φz, where the wavefront curvature and tilt are factored out,
the color spectrum in each plot covers the range from minimum (blue) to maximum
(red); here (min : max) is (0° : 39°) for φx, (−147° : 39°) for φy, and
(−146° : 0°) for φz.
Figure 25.7 Plots of log-intensity (top), intensity (middle), and phase (bottom)
at the plane of best focus in the system of Figure 25.5. From left to right: x-, y-,
and z-components of polarization. The peak intensities are in the ratio of Ix : Iy :
Iz = 1.0 : 0.15 × 10⁻³ : 0.115. The phase profiles’ range (blue to red) is (min :
max) = (−180° : 180°).
Therefore,

$$\partial F/\partial s = (\partial F/\partial r) \Big/ \sqrt{1 + (dh/dr)^2}. \tag{25.11}$$
Equation (25.11), in conjunction with the equations derived previously for flat
surfaces, is all that one needs in order to compute the various diffracted rays and
wavefronts associated with DOEs on curved substrates.
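The surface-gradient calculation can be sketched as follows. Since Δs = Δr√(1 + (dh/dr)²) by Eq. (25.10), the slope of F along the surface is ∂F/∂r divided by that square-root factor. The function name, the sample profile F(r) = r², and the paraboloid sag h(r) = r²/(2Rc) with Rc = 40 mm are assumptions made for this illustration.

```python
import numpy as np

def dF_ds(r, a, dh_dr):
    """Gradient of the radial polynomial F(r) = sum a_n r^n along the surface.

    a is the coefficient list [a_0, a_1, ..., a_N] of Eq. (25.9);
    dh_dr is the local slope of the sag at radius r.
    """
    a = np.asarray(a, dtype=float)
    n = np.arange(1, a.size)                  # powers that survive differentiation
    dF_dr = np.sum(n * a[1:] * r ** (n - 1))
    return dF_dr / np.sqrt(1.0 + dh_dr**2)

# Hypothetical example: F(r) = r^2 on a paraboloid of sag h(r) = r^2 / (2 Rc).
Rc = 40.0                                     # mm
r = 1.0                                       # mm
slope = r / Rc                                # dh/dr for the paraboloid
grad_on_surface = dF_ds(r, [0.0, 0.0, 1.0], slope)
grad_on_flat = dF_ds(r, [0.0, 0.0, 1.0], 0.0)
print(grad_on_surface, grad_on_flat)
```

On a shallow surface such as this one the correction is tiny, but it grows with the surface slope, which is why the local period of a DOE on a steep asphere differs noticeably from that implied by ∂F/∂r alone.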
Figure 25.8 A linearly polarized Gaussian beam enters a glass prism of refractive
index n = 1.65 whose rear facet is coated with a DOE. The incident beam’s intensity
profile is shown on the left-hand side. The emergent diffracted beam is the +first
order. The entrance and exit facets of the prism are antireflection-coated, and the
destination plane is a distance Δy = 10 mm below the prism’s exit facet.
Transmissive DOE sandwiched between a pair of flat substrates

Figure 25.5 shows an aspheric lens illuminated with a Gaussian beam (λ0 = 0.66 μm,
e⁻¹ radius R0 = 2.5 mm, diameter D = 4.0 mm). The emerging convergent beam
passes through a glass plate on its way to a flat DOE sandwiched between a substrate
and a cover slip. The DOE’s construction wavelength λc is the same as λ0 (hence the
emergent beam is the first diffracted order), and its phase profile is given by (x and y
in millimeters):

$$F(x, y) = 639.77\,x + 17.47\,x^2 - 19.76\,y^2 - 30.18\,x^3 - 0.0042\,x^2 y - 33.69\,x y^2 + 0.0021\,y^3 - 3.25\,x^4.$$

The incident rays are traced through the entire system, then back-traced to the
so-called destination plane, located at z = 10 mm from the first vertex of the
lens and tilted by θ = 6.03°, as shown. At the destination plane, the magnitude,
phase, and polarization state of the rays are used to reconstruct the
wavefront. Figure 25.6 shows the reconstructed wavefront’s intensity and
phase distribution at the destination plane. The wavefront’s curvature and tilt
are factored out; otherwise the phase variations across the cross-sectional
profiles would be too great to display. Note that the y-component is nearly four
orders of magnitude weaker than the x-component, whereas the z-component’s
power content is non-negligible. The phase profiles of Figure 25.6 are quite
Figure 25.9 Plots of intensity (top), phase (middle) and phase minus curvature
(bottom) at the destination plane of the system of Figure 25.8. From left to right:
x-, y-, and z-components of polarization. The peak intensities are in the ratio of
Ix : Iy : Iz = 10⁵ : 0.4 : 1.08. The range of the phase profiles (blue to red) is (min :
max) = (−180° : 180°).
uniform, corresponding to a small residual aberration with an r.m.s. wavefront
error of 0.003λ0.
Figure 25.7 shows plots of log-intensity, intensity, and phase in the plane
of best focus for the x-, y-, and z-components of polarization. Note that the
y-component is nearly four orders of magnitude weaker than the x-component,
whereas the z-component is fairly strong. The observed linear phase profile is due
to the 6.03° tilt of the focal plane relative to the incident beam coordinates (see
the focal plane coordinates in Figure 25.5).
Figure 25.10 A linearly polarized Gaussian beam is focused via a DOE-coated
bi-aspheric lens through a glass cover slip (d = 1.2 mm, n = 1.573 456), which is
separated from the lens by 1.0 mm. The incident beam’s intensity profile is
shown on the left-hand side. The 3 mm diameter lens has thickness = 1.8256
mm, refractive index = 1.597 075, first surface parameters: radius of curvature
Rc = 1.93 mm, conic constant κ = 0.655 844, aspheric coefficients A4 = 2.833 ×
10⁻³, A6 = 4.389 × 10⁻⁵, A8 = 1.524 × 10⁻⁴, A10 = 1.177 × 10⁻⁴; and second
surface parameters: Rc = 6.744 mm, κ = 31.754, A4 = 7.358 × 10⁻³,
A6 = 2.5077 × 10⁻³, A8 = 1.106 × 10⁻³, A10 = 3.871 × 10⁻⁴. The destination
plane is at the exit pupil of the aspheric singlet, and the beam is subsequently
propagated to the focal plane.
Reflective DOE on flat substrate

Figure 25.8 shows a flat DOE on the rear facet of a glass prism, illuminated by a
Gaussian beam (λ0 = 0.65 μm, e⁻¹ radius R0 = 2.0 mm, diameter D = 3.0 mm,
linearly polarized along x). The only emergent beam is the +first diffracted
order, as the DOE’s construction wavelength λc is the same as λ0. The DOE’s
aperture diameter is 5.0 mm, and its phase profile within its own plane is
F(x, y) = 3.0(x² + y²); here both x and y are in millimeters. Figure 25.9 shows
the reflected intensity and phase profiles at the destination plane. These plots
depict intensity (top), phase (middle), and phase-minus-curvature (bottom),
with the x-, y-, z-components of polarization shown from left to right. Note that the
y- and z-components are several orders of magnitude weaker than the x-component.
The DOE’s 45° tilt produces the astigmatism seen in the phase plots.
Transmissive DOE on an aspheric glass lens

Figure 25.10 shows a DOE-coated aspheric lens illuminated with a Gaussian beam
(λ0 = 0.78 μm, e⁻¹ radius R0 = 2.0 mm, diameter D = 3.0 mm, linearly polarized
along x). The DOE’s phase profile is F(r) = 4.2r² − 2.5r⁴ + 0.25r⁶ (r in mm),
Figure 25.11 Plots of intensity (top) and phase (bottom) at the exit pupil of
the aspheric lens in the system of Figure 25.10. From left to right: x-, y-, and
z-components of polarization. The peak intensities are in the ratio of Ix : Iy :
Iz = 1000 : 0.6 : 70. The range of the phase profiles (blue to red) is (min :
max) = (−180° : 180°).
and its construction wavelength λc is the same as λ0; hence the emergent beam is
the +first diffracted order.
The incident rays are first traced through the entire system, then back-traced to the
destination plane located at the exit pupil of the objective lens; the emergent
wavefront is subsequently reconstructed from the traced rays. Figure 25.11 shows
plots of intensity and phase at the destination plane. Shown from left to right are the
x-, y-, and z-components of polarization. The curvature of the wavefront has been
factored out, so what is displayed is the residual phase or aberrations. Note that the
y-component is nearly three orders of magnitude weaker than the x-component, but
the z-component is not so weak. The wavefront at the exit pupil is then propagated to
the focal plane and shown in Figure 25.12, where the y-component of polarization is
seen to be more than three orders of magnitude weaker than the x-component.
Reflective DOE on a parabolic mirror

Figure 25.13 shows the diagram of a DOE-coated parabolic mirror illuminated with
a Gaussian beam (λ0 = 0.65 μm, e⁻¹ radius R0 = 2.0 mm, diameter D = 3.0 mm,
Figure 25.12 Intensity distribution at the focal plane of the lens in the system
of Figure 25.10; (left) x-component, (right) y-component of polarization. The
peak intensities are in the ratio Ix : Iy = 1000 : 0.27.
Figure 25.13 A Gaussian beam is reflected from a DOE-coated parabolic
mirror. The incident beam, linearly polarized along the x-axis, has the intensity
profile shown on the left-hand side. Since the DOE’s construction wavelength λc
is 0.55 μm, various diffracted orders exist, although the most intense beam,
shown in Figure 25.14, is the +first order. The destination plane is a distance
Δz = 10.0 mm from the vertex of the paraboloid.
linearly polarized along the x-axis). The paraboloid has radius of curvature
Rc = 40 mm, conic constant κ = −1, and aperture diameter D = 3.0 mm;
the DOE’s phase profile is given by F(r) = r² − 1.25r⁴ + 0.35r⁶ + 0.1r⁸ (r in
millimeters). Since the DOE’s construction wavelength is λc = 0.55 μm, various
diffracted orders exist, although the most intense beam, shown in Figure 25.14, is
the +first order. Figure 25.14 shows the reflected intensity and phase profiles at
the destination plane, located 10.0 mm away from the mirror’s vertex; this also
happens to be 10.0 mm before the mirror’s nominal focal plane. From left to right,
Figure 25.14 Plots of intensity (top) and phase (bottom) at the destination
plane of the system of Figure 25.13. From left to right: x-, y-, and z-components
of polarization. The peak intensities are in the ratio of Ix : Iy : Iz = 1.0 : 0.33 ×
10⁻⁶ : 0.177 × 10⁻². The range of the phase profiles (blue to red) is (min :
max) = (−180° : 180°). For display purposes the curvature phase factor has been
taken out of the mesh.
these plots represent the x-, y-, and z-components of polarization. Note that the
y-component is nearly six orders of magnitude weaker than the x-component,
whereas the z-component is only 600 times weaker.

1 J. Turunen and F. Wyrowski, Diffractive Optics for Industrial and Commercial
Applications, Akademie Verlag, Berlin, 1997.
2 W. Veldkamp and T. J. McHugh, Binary optics, Scientific American, May 1992, 50.
3 W. C. Sweatt, Describing holographic optical element as lens, J. Opt. Soc. Am. 67,
803 (1977).
4 M. W. Farn, Quantitative comparison of the general Sweatt model for the grating
equation, Appl. Opt. 31, 5312 (1992).
5 L. N. Hazra, Kinoform lenses: Sweatt model and phase function, Optics
Communications 117, 31 (1995).
6 F. Wyrowski, Diffractive optical elements: iterative calculation of quantized, blazed
phase structures, J. Opt. Soc. Am. A 7, 961 (1990).
26
The Talbot effect
The Talbot effect, also referred to as self-imaging or lensless imaging, was

originally discovered in the 1830s by H. F. Talbot.1 Over the years, investigators
have come to understand different aspects of this phenomenon, and a theory of
the Talbot effect based on classical diffraction theory has emerged which is
capable of explaining the various observations.2,3,4 For a detailed description of
the Talbot effect and related phenomena, as well as a historical perspective on the
subject, the reader may consult references 3 and 4 and further references cited
therein. Since many of the standard optics textbooks do not even mention the
Talbot effect, it is worthwhile to bring to the reader’s attention the essential
features of this phenomenon.
Lensless imaging of a periodic pattern

The Talbot effect is observed when, under appropriate conditions, a beam of light
is reflected from (or transmitted through) a periodic pattern. The pattern may
have one-dimensional periodicity (as in traditional gratings), or it may exhibit
periodicity in two dimensions (e.g., a surface relief structure or a photographic
plate imprinted with identical features on a regular lattice).
In what follows we shall present the diffraction patterns obtained from a periodic
array of cross-shaped apertures in an otherwise opaque screen. Because the dif-
fraction pattern of a single aperture differs markedly from that of a periodic array of
such apertures, we begin by examining the behavior of an individual aperture under
coherent illumination. Consider the cross-shaped opening in an opaque screen
shown in Figure 26.1(a). A collimated beam of coherent light, wavelength λ,
illuminates the screen at normal incidence; the assumed length and height of
the aperture are each 20λ. Logarithmic plots of intensity distribution at distances
z = 100λ, 200λ, and 600λ beyond the screen are computed and shown in Figures 26.1
(b)–(d), respectively (note the different scales of these figures). For z > 600λ the
Figure 26.1 (a) A cross-shaped aperture in an opaque screen, illuminated by a
normally incident plane wave of wavelength λ. The length and the height of the
cross are each 20λ. Also shown are the computed plots of intensity distribution
(logarithmic) at various distances z from the aperture: (b) z = 100λ, (c) z = 200λ,
(d) z = 600λ. Note that the scale varies.
intensity distribution will have the far-field pattern of Figure 26.1(d), although its size
will scale with distance from the screen. Under no circumstances do we obtain an
intensity pattern that closely resembles the cross shape of the aperture itself.
Now consider the periodic array of cross-shaped apertures shown in Figure 26.2(a);
each aperture is identical to that in Figure 26.1(a). The center-to-center spacing
between adjacent apertures along the X- and Y-directions is p = 60λ. (For simplicity
we have assumed the periodic pattern to extend to infinity, although, for
practical purposes, a finite number of apertures in a periodic arrangement
will suffice.) When the pattern in Figure 26.2(a) is illuminated by a normally
incident, coherent beam of light, the cross shape of the apertures is abundantly
reproduced in the intensity patterns obtained at certain distances from the screen.
Figures 26.2(b)–(f) show the computed patterns of intensity distribution at distances
z = 600λ, 1200λ, 1800λ, 2700λ, and 3600λ, respectively. (Note that all
pictures in Figure 26.2 have the same scale.) When the distance from object to
image is z = p²/λ, as is the case in Figure 26.2(f), the original pattern of the apertures
Figure 26.2 (a) A periodic array of cross-shaped apertures in an opaque
screen, illuminated by a normally incident plane wave of wavelength λ. As in
Figure 26.1, the crosses are 20λ wide on each side. Also shown are the
computed plots of intensity distribution at various distances z from the aperture:
(b) z = 600λ, (c) z = 1200λ, (d) z = 1800λ, (e) z = 2700λ, (f) z = 3600λ. Note that
the scale is the same for all the various pictures.
is reproduced, albeit with a half-period shift in both the X- and the Y-direction. In
Figure 26.2(d), the distance to the image is p²/(2λ), and not only is the original
pattern replicated but also its frequency (along both X and Y) has doubled. In
Figure 26.2(c), where the distance to the image is z = p²/(3λ), the pattern is
repeated with three times the original frequency along both the X- and Y-axes.
By showing the intensity distribution at other distances from the object,
Figures 26.2(b) and 26.2(e) emphasize that perfect reproduction of the shapes in
the original pattern does not occur everywhere but only at certain special planes.
A hint as to why these periodic patterns are reproduced at certain intervals may be
gleaned from the following argument. A plane wave normally incident on a periodic
structure creates a discrete spectrum of plane waves propagating along the directions

$$\mathbf{k} = (k_x, k_y, k_z) = 2\pi\left(m/p,\ n/p,\ \sqrt{(1/\lambda)^2 - (m/p)^2 - (n/p)^2}\right). \tag{26.1}$$

The z-component of this vector may be approximated as follows:

$$k_z \approx (2\pi/\lambda)\left[1 - \tfrac{1}{2}(m\lambda/p)^2 - \tfrac{1}{2}(n\lambda/p)^2\right], \tag{26.2}$$

provided that p/λ is large enough that, for all m, n values of interest, the above
Taylor-series expansion to first order suffices. The acquired phase after a
propagation distance of z will then be

$$k_z z \approx (2\pi z/\lambda) - \pi z (m^2 + n^2)\lambda/p^2. \tag{26.3}$$

Now, since m, n are integers, if z happens to be an even-integer multiple of
p²/λ then the above phase will differ from the constant value 2πz/λ by a multiple
of 2π only. Since all plane waves emanating from the object will thus
arrive at the image plane with the same phase factor, their superposition will
recreate the original pattern.
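The phase argument above can be checked numerically. The sketch below uses the period p = 60λ of Figure 26.2 and z = 2p²/λ; the range of orders examined and the variable names are assumptions of this illustration. It also evaluates the residual between the exact k_z of Eq. (26.1) and its paraxial approximation, which indicates how far out in (m, n) the first-order expansion can be trusted.

```python
import numpy as np

lam = 1.0                 # work in units of the wavelength
p = 60.0 * lam            # grating period, as in Figure 26.2
z = 2.0 * p**2 / lam      # an even-integer multiple of p^2 / lambda

orders = np.arange(-3, 4)
m, n = np.meshgrid(orders, orders)

# Paraxial phase relative to the common term 2*pi*z/lambda, Eq. (26.3).
dphi = -np.pi * z * (m**2 + n**2) * lam / p**2
cycles = dphi / (2.0 * np.pi)            # should all be integers

# Exact k_z from Eq. (26.1) versus the approximation of Eq. (26.2).
kz_exact = 2.0 * np.pi * np.sqrt((1.0 / lam)**2 - (m / p)**2 - (n / p)**2)
kz_approx = (2.0 * np.pi / lam) * (1.0 - 0.5 * (m * lam / p)**2 - 0.5 * (n * lam / p)**2)
residual = (kz_exact - kz_approx) * z    # phase error in radians at distance z

low = (np.abs(m) <= 1) & (np.abs(n) <= 1)
print(np.abs(residual[low]).max(), np.abs(residual).max())
```

The residual grows rapidly with the order numbers, which is one way to see why the self-images are sharpest when p/λ is large and the pattern's significant harmonics are of low order.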
It turns out that z does not need to be an even-integer multiple of p²/λ for
self-imaging to occur. At odd-integer multiples of p²/λ, for instance, a replica of the
original pattern will also emerge, but with a half-period shift. Multiple images of
the pattern will appear at certain non-integer multiples of p²/λ as well. These
aspects of the Talbot effect will be further clarified below, when we present a
more rigorous analysis.
Although the mathematical argument supporting the Talbot effect depends on
periodicity of the object in the XY-plane, certain patterns that are not globally
periodic, but appear to be so locally, will also produce self-images. For example,
the concentric ring pattern shown in Figure 26.3(a), when illuminated by a nor-
mally incident coherent beam, will yield the patterns of Figures 26.3(b)–(d) at
distances $z = 18\lambda$, $27\lambda$, and $36\lambda$, respectively. The period p of the rings is $6\lambda$ and
the width of the bright rings is $2\lambda$. Clearly, the self-images break down near the
center and near the outer edge, because (local) periodicity is no longer valid in
these regions. But a near self-image at $z = p^2/\lambda$ and a frequency-doubled image at
$z = p^2/(2\lambda)$ are clearly observed. Another example is shown in Figure 26.4, where a
spiral pattern with period $p = 9\lambda$ is propagated to distances $z = p^2/(2\lambda)$, $3p^2/(4\lambda)$,
and $p^2/\lambda$. Again in Figures 26.4(b), (d) the center and the outer rings are not well
reproduced, but nearly everything else is.
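The distances quoted for the two examples follow directly from the periods. As a small worked check (a sketch; the helper name `talbot_plane` is hypothetical), with all lengths expressed in units of the wavelength:

```python
from fractions import Fraction

def talbot_plane(period, fraction):
    """Distance z = fraction * p^2 / lambda, with p and z in units of lambda."""
    return fraction * period ** 2

fractions_used = [Fraction(1, 2), Fraction(3, 4), Fraction(1)]
rings = [talbot_plane(6, f) for f in fractions_used]    # ring mask, p = 6 lambda
spiral = [talbot_plane(9, f) for f in fractions_used]   # spiral mask, p = 9 lambda

print([float(z) for z in rings])    # [18.0, 27.0, 36.0]
print([float(z) for z in spiral])   # [40.5, 60.75, 81.0]
```

The ring-mask distances of 18, 27 and 36 wavelengths are therefore 1/2, 3/4 and 1 times $p^2/\lambda$, the same fractions used for the spiral.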
The Talbot effect is much more general than the above limited exposition may
indicate. The pattern periodicities may be in one or two dimensions; the object
may modulate both the amplitude and the phase of the light beam; certain
applications rely on the use of incoherent light sources; in the case of two-
dimensional periodic patterns, the underlying lattice may be square, rectangular,
hexagonal, etc.; the incident beam may be a plane wave or a spherical wavefront
originating at a point source; applications are not limited to visible light but
extend to X-rays and microwaves, as well as to electron and atom optics. To
appreciate the variety of arrangements that lead to useful and interesting images
the reader is encouraged to consult the published literature.
Figure 26.3 (a) A mask consisting of eight concentric rings (width = $2\lambda$,
spacing = $6\lambda$) is illuminated by a normally incident plane wave of wavelength $\lambda$.
The computed intensity distributions shown here are at distances of (b) $z = 18\lambda$,
(c) $z = 27\lambda$, and (d) $z = 36\lambda$ from the mask. A bright spike appearing in the central
region of each image has been blocked off in order to improve the image contrast.
[Panels (a)–(d); axes $x/\lambda$ and $y/\lambda$, each spanning −55 to 55.]
A simple analysis
Consider the point source shown in Figure 26.5, located at $(x, y, z) = (x_0, y_0, 0)$ and
radiating a spherical wavefront into the region $z > 0$ of space. In this analysis we
assume that all spatial dimensions are normalized by the vacuum wavelength $\lambda$ of
the light; as a result, $\lambda$ will not appear explicitly in any of the following equations.
In the $z = z_0$ plane, the complex-amplitude distribution may be written
\[
\begin{aligned}
A(x, y, z = z_0) &= (1/r)\exp(i2\pi r) \\
 &= \frac{1}{\sqrt{(x - x_0)^2 + (y - y_0)^2 + z_0^2}}\,
    \exp\!\left[ i2\pi\sqrt{(x - x_0)^2 + (y - y_0)^2 + z_0^2} \right] \\
 &\approx (1/z_0)\exp(i2\pi z_0)\,\exp[i\pi(x^2 + y^2)/z_0] \\
 &\quad\times \exp[i\pi(x_0^2 + y_0^2)/z_0]\,\exp[-i2\pi(x x_0 + y y_0)/z_0].
\end{aligned}
\tag{26.4}
\]
Figure 26.4 (a) A mask consisting of a spiral aperture (width $3\lambda$, spacing $9\lambda$)
is illuminated by a normally incident plane wave of wavelength $\lambda$. The computed
intensity distributions shown here are at distances of (b) $z = 40.5\lambda$,
(c) $z = 60.75\lambda$, and (d) $z = 81\lambda$ from the aperture. As in the previous figure, a
bright spike appearing in the central region of each image has been blocked off
in order to improve the image contrast.
[Panels (a)–(d); axes $x/\lambda$ and $y/\lambda$, each spanning −60 to 60.]

Figure 26.5 A quasi-monochromatic point source located at $(x, y, z) = (x_0, y_0, 0)$
radiates a cone of light into the half-space $z > 0$.
In deriving the above approximate expression we have used, for the exponent, the
first two terms in the Taylor series expansion
\[ \sqrt{1 + x^2} = 1 + \tfrac{1}{2}x^2 + \cdots \tag{26.5} \]
Now, the first two factors on the right-hand side of Eq. (26.4) are the approximate
form of the spherical wavefront emanating from a point source at the origin of the
plane $z = 0$. The next factor is a constant phase factor that depends on the position
$(x_0, y_0)$ of the point source within the XY-plane and the last factor is a linear phase
factor in x and y.
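For reference, the expansion behind Eq. (26.4) can be written out step by step. The intermediate lines below are a reconstruction from the definitions above (all lengths in units of the wavelength), not a quotation of the original derivation; the amplitude factor $1/r$ is simply replaced by $1/z_0$.

```latex
\begin{align*}
r &= \sqrt{(x-x_0)^2+(y-y_0)^2+z_0^2}
   = z_0\sqrt{1+\frac{(x-x_0)^2+(y-y_0)^2}{z_0^2}} \\
  &\approx z_0+\frac{(x-x_0)^2+(y-y_0)^2}{2z_0} \\
  &= z_0+\frac{x^2+y^2}{2z_0}+\frac{x_0^2+y_0^2}{2z_0}-\frac{xx_0+yy_0}{z_0},
\end{align*}
so that
\[
\exp(i2\pi r)\approx \exp(i2\pi z_0)\,\exp[i\pi(x^2+y^2)/z_0]\,
\exp[i\pi(x_0^2+y_0^2)/z_0]\,\exp[-i2\pi(xx_0+yy_0)/z_0].
\]
```

The cross term of the expanded squares is what produces the linear phase factor, and it carries a minus sign.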
Next, let us assume that a periodic mask, having periods $a_x$ and $a_y$ along the
X- and Y-axes, is placed at $z = z_0$ (see Figure 26.6). In the general case, where
the mask modulates the phase and/or the amplitude of the light beam, its
complex-amplitude transmission function may be written
\[ t(x, y) = \sum_m \sum_n C_{mn} \exp[i2\pi(mx/a_x + ny/a_y)]. \tag{26.6} \]
When the incident spherical wavefront is multiplied by t(x, y), each Fourier
component of t(x, y) will create a different spherical wavefront which, according to
Eq. (26.4), appears to originate at a different point $(x_0, y_0) = (-mz_0/a_x, -nz_0/a_y)$
within the XY-plane. In addition, each such point source appears to have the
following phase factor:
\[ \exp(i\phi_{mn}) = \exp[-i\pi(x_0^2 + y_0^2)/z_0] = \exp[-i\pi z_0(m^2/a_x^2 + n^2/a_y^2)]. \tag{26.7} \]
Figure 26.6 A quasi-monochromatic point source located at the origin of the
coordinate system illuminates a periodic phase and/or amplitude mask placed
parallel to the XY-plane at $z = z_0$. The periods of the mask’s pattern are $a_x$ along
the X-axis and $a_y$ along the Y-axis.
Figure 26.7 Interaction between the periodic mask and the cone of light
shown in Figure 26.6 gives rise to an array of (virtual) point sources, each
having a certain phase and amplitude depending on the structure of the mask
and its location $z_0$ along the Z-axis. To determine the light distribution at the
observation plane one may replace the mask by this “equivalent” array of point
sources. [The mask lies at $z = z_0$ and the observation plane at $z = z_0 + z_1$.]
The net effect of the mask, therefore, is to replace the single point source with a
periodic array of point sources, as shown in Figure 26.7, where the magnitude of
each point source is $C_{mn}\exp(i\phi_{mn})$. At the observation plane, each point source
will give rise to a spherical wavefront that will obey Eq. (26.4), except that the
distance $z_0$ is replaced by $z_0 + z_1$. We thus have
\[
\begin{aligned}
A(x, y, z = z_0 + z_1) \approx{}& [1/(z_0 + z_1)]\exp[i2\pi(z_0 + z_1)] \\
 &\times \exp[i\pi(x^2 + y^2)/(z_0 + z_1)] \\
 &\times \sum_m \sum_n C_{mn} \exp[-i\pi z_0(m^2/a_x^2 + n^2/a_y^2)] \\
 &\qquad\times \exp[i\pi(m^2/a_x^2 + n^2/a_y^2)\,z_0^2/(z_0 + z_1)] \\
 &\qquad\times \exp\{i2\pi[x(mz_0/a_x) + y(nz_0/a_y)]/(z_0 + z_1)\}.
\end{aligned}
\tag{26.8}
\]
The first two factors in the above equation correspond to a spherical wavefront
with radius of curvature $z_0 + z_1$; we need not keep track of them any longer. The
last factor can be simplified if we define a magnification factor $M = (z_0 + z_1)/z_0$, in
which case it is written as
\[ \exp\{i2\pi[mx/(Ma_x) + ny/(Ma_y)]\}. \tag{26.9} \]
This is just the (m, n)th plane-wave component of the spectrum, whose periods
$a_x$, $a_y$ are magnified by a factor M. Except for this scale factor, the Fourier
basis functions have not changed in going from the plane of the mask ($z = z_0$)
to the observation plane ($z = z_0 + z_1$). The main factors in Eq. (26.8), therefore,
are the first two factors in the double sum; these can be written as
follows:
\[
\exp\!\left[ -i\pi\left( m^2/a_x^2 + n^2/a_y^2 \right) z_0 z_1/(z_0 + z_1) \right]
 = \exp\!\left[ -i\pi(z_1/M)\left( m^2/a_x^2 + n^2/a_y^2 \right) \right].
\tag{26.10}
\]
Let us now assume that $a_x^2$ and $a_y^2$ have a least common multiple in the following
sense:
\[ \mu a_x^2 = \nu a_y^2 = a^2, \tag{26.11} \]
where both $\mu$ and $\nu$ are integers. Then the phase factor in Eq. (26.10) may be
written
\[ \exp\{-i\pi[z_1/(Ma^2)](\mu m^2 + \nu n^2)\}. \tag{26.12} \]
Since $\mu m^2 + \nu n^2$ is an integer, if $z_1$ is chosen to be $2jMa^2$ with j integer, then the
phase factor in Eq. (26.12) will become unity for all values of m and n and can
therefore be ignored. Under such circumstances Eq. (26.8) will yield a magnified
image of the mask at the observation plane. This is the essence of the Talbot
effect.
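Because M itself depends on $z_1$, the condition $z_1 = 2jMa^2$ is implicit. Substituting $M = (z_0 + z_1)/z_0$ and solving gives $z_1 = 2ja^2 z_0/(z_0 - 2ja^2)$, which is equivalent to $1/z_0 + 1/z_1 = 1/(2ja^2)$. The sketch below evaluates this for illustrative values; the function name and the numbers are assumptions, and all lengths are in units of the wavelength as in the text.

```python
def talbot_plane_point_source(z0, a, j=1):
    """Solve z1 = 2*j*M*a**2 with M = (z0 + z1)/z0 for z1.

    Lengths are in units of the wavelength. Returns (z1, M).
    A self-image at finite positive z1 requires z0 > 2*j*a**2.
    """
    d = 2 * j * a ** 2            # self-imaging distance for plane-wave illumination (M = 1)
    if z0 <= d:
        raise ValueError("no self-image plane at finite positive z1")
    z1 = d * z0 / (z0 - d)
    return z1, (z0 + z1) / z0

# Illustrative numbers: source 1000 wavelengths from a mask with a = 10 wavelengths
z1, M = talbot_plane_point_source(z0=1000.0, a=10.0)
print(z1, M)   # 250.0 1.25
```

As $z_0$ grows, $z_1$ tends to $2ja^2$ and M tends to unity, which is the plane-wave limit.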
By allowing z0 to approach infinity, the above results can be readily extended to
the case of plane-wave illumination. The magnification factor M will become unity
in this case, but no other change will be necessary in the preceding equations.
Image multiplicity
The appearance of multiple images at the observation plane may be readily
explained in the special case where the periodicity is one dimensional and the
frequency of the image is twice that of the object. The explanation, nonetheless,
captures the essence of the phenomenon and can be easily extended to peri-
odicity in two dimensions and to higher multiplicities. Consider the periodic
function f (x) shown in Figure 26.8(a). Note that the period ax is much larger
than the width of the individual “features” of the function, so that there is plenty
of space to insert additional features. Let the Fourier-series representation of
this function be
\[ f(x) = \sum_m C_m \exp(i2\pi mx/a_x). \tag{26.13} \]
In the Fourier domain, the Fourier transform F(m) of f(x) is a “comb” function with
period $1/a_x$, where the delta function at position m is multiplied by the corresponding
Fourier coefficient $C_m$, as shown in Figure 26.8(b).
Now, let us assume that the odd coefficients of F(m) are multiplied by a
complex constant $\beta$. (This would happen in Eq. (26.12), for instance, if $\mu = 1$,
$\nu = 0$, and $z_1 = \tfrac{1}{2}Ma^2$, in which case $\beta = \exp(-i\pi/2) = -i$.) We can then separate the Fourier
coefficients of f(x) into even and odd terms, as shown in Figure 26.9. Both the
resulting comb functions in the Fourier domain will have twice the period of the
original comb function; therefore, their inverse transforms in the x-domain will
have twice the frequency. The second comb function in Figure 26.9 is also shifted
by a half-period, which means that its inverse transform must be multiplied
by $\exp(i2\pi x/a_x)$. The resulting comb functions in the x-domain are shown in
Figure 26.10. The net result is that when we add the two comb functions of
Figure 26.10 and convolve the resultant with the unit-period function $f_0(x)$, we
will find the function shown in Figure 26.11. Because the width of $f_0(x)$ is less
than half the period $a_x$, the new features added to the function will not overlap
with the old ones, yielding a function with an apparently increased frequency.
However, the periodicity is only in the amplitude of the function, since the phase
of each feature differs from the phase of its neighbors. In any event, this
description explains why the apparent periodicity of the pattern in Figure 26.2
increases at certain distances between the object and the image.
Figure 26.8 (a) A periodic function f(x) in one-dimensional space; the individual
“features” of the function are much narrower than its period $a_x$. (b) The
Fourier transform of f(x) consists of a sequence of delta functions located at
integer multiples of $1/a_x$ in the Fourier domain. [In panel (b) the delta function at
position m carries the coefficient $C_m$, shown for m = −5 to 5.]
Figure 26.9 In Figure 26.8(b), when the odd components of the Fourier-transformed
function F(m) are multiplied by a constant $\beta$, the function may be
resolved into two “comb” functions, $F_{\rm even}(m)$ and $F_{\rm odd}(m)$. In these new functions
the spacing between adjacent delta functions is $2/a_x$ and, in the case of
$F_{\rm odd}(m)$, the function is shifted by a half-period.
Figure 26.10 The comb function corresponding to $F_{\rm even}(m)$, when inverse-transformed
to the x-domain, will yield a comb function that has twice the
frequency of the original function f(x). Likewise, the inverse transform of the
comb function corresponding to $F_{\rm odd}(m)$ will have a spacing of $\tfrac{1}{2}a_x$ between its
adjacent delta functions but, because of the half-period shift in the
Fourier domain, every other delta function is flipped over.
Figure 26.11 When the sum of the two comb functions in Figure 26.10 is
convolved with the individual features $f_0(x)$ of f(x), the resulting function appears
to have twice the frequency of the original f(x). Note, however, that the “features”
of the new function are alternately multiplied by $\tfrac{1}{2}(1 + \beta)$ and $\tfrac{1}{2}(1 - \beta)$.
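The even/odd decomposition described in this section can be checked with a discrete Fourier series. The sketch below is an illustration under stated assumptions (a rectangular feature, 64 samples per period, and an illustrative value of $\beta$), not the computation behind the figures. It multiplies the odd-order coefficients by $\beta$ and confirms that the result equals $\tfrac{1}{2}(1+\beta)f(x) + \tfrac{1}{2}(1-\beta)f(x - a_x/2)$.

```python
import cmath
import math

N = 64                                   # samples per period (assumed)
# One narrow feature per period: f0 occupies 8 of the 64 samples
f = [1.0 if 4 <= i < 12 else 0.0 for i in range(N)]

def dft(values, sign):
    """Plain O(N^2) discrete Fourier sum with the given sign in the exponent."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * i / n)
                for i, v in enumerate(values)) for k in range(n)]

C = [c / N for c in dft(f, -1)]          # Fourier coefficients C_m
beta = -1j                               # illustrative factor applied to the odd orders
C_mod = [c * (beta if m % 2 else 1) for m, c in enumerate(C)]
g = dft(C_mod, +1)                       # modified function, back in the x-domain

# Prediction: g(x) = (1 + beta)/2 * f(x) + (1 - beta)/2 * f(x - a_x/2)
shifted = [f[(i - N // 2) % N] for i in range(N)]
predicted = [0.5 * (1 + beta) * a + 0.5 * (1 - beta) * b
             for a, b in zip(f, shifted)]
err = max(abs(u - v) for u, v in zip(g, predicted))
intensity = [abs(v) ** 2 for v in g]
print(err)                               # round-off level
print(intensity[4], intensity[36])       # equal intensities, half a period apart
```

With $|\beta| = 1$ and $\beta = \pm i$ the two sets of features have equal magnitude, so the intensity has half the original period even though the phases of neighboring features differ.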

1 H. F. Talbot, Phil. Mag. 9, 401 (1836).
2 Lord Rayleigh, Phil. Mag. 11, 196 (1881).
3 O. Bryngdahl, Image formation using self-imaging techniques, J. Opt. Soc. Am. 63,
416–419 (1973).
4 J. F. Clauser and M. W. Reinsch, New theoretical and experimental results in Fresnel
optics with applications to matter-wave and X-ray interferometry, Appl. Phys. B 54,
380–395 (1992).
27
Some quirks of total internal reflection
Readers are undoubtedly familiar with the phenomenon of total internal reflection
(TIR), which occurs when a beam of light within a high-index medium arrives with
a sufficiently great angle of incidence at an interface with a lower-index medium.
What is generally not appreciated is the complexity of phenomena that accompany
TIR. For instance, consider the simple optical setup shown in Figure 27.1, where a
uniform beam of light is brought to focus by a positive lens, being reflected,
somewhere along the way, at the rear facet of a glass prism. Assuming a refractive
index n = 1.65 for the prism material, the critical angle of incidence is readily
found to be $\theta_{\mathrm{crit}} = \sin^{-1}(1/n) = 37.3^\circ$. Let the lens have numerical aperture
NA = 0.2 (i.e., f-number = 2.5). Then the range of angles of incidence on the
prism’s rear facet will be (33.5°, 56.5°). The majority of the rays thus suffer total
internal reflection and converge, as depicted in Figure 27.1, towards a common
focus in the observation plane.
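The numbers quoted in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes that the central ray meets the rear facet at 45° and that the cone half-angle inside the prism is $\sin^{-1}(\mathrm{NA})$; both assumptions are inferred from the quoted range rather than stated in the text.

```python
import math

n = 1.65            # refractive index of the prism (from the text)
NA = 0.2            # numerical aperture of the lens (from the text)
facet_angle = 45.0  # assumed: central ray meets the rear facet at 45 degrees

theta_crit = math.degrees(math.asin(1 / n))   # critical angle for TIR
half_cone = math.degrees(math.asin(NA))       # half-angle of the focused cone
theta_min = facet_angle - half_cone
theta_max = facet_angle + half_cone

print(round(theta_crit, 1), round(theta_min, 1), round(theta_max, 1))
# 37.3 33.5 56.5
```

Only the rays between 33.5° and 37.3° fall below the critical angle, which is why the majority of the cone is totally reflected.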
Figure 27.2 shows computed plots of intensity and phase at the observation
plane, indicating that the focused spot essentially has the Airy pattern, albeit with
minor deviations from the ideal. The diameter of the first dark ring, for example,
is approximately $6\lambda$, which is close to the theoretical value of $1.22\lambda/\mathrm{NA}$ for the
Airy disk.1 The coma-like tail appearing on the right-hand side of the focused
spot is caused by those rays that strike the prism in the neighborhood of the
critical TIR angle, $\theta_{\mathrm{crit}}$, thus introducing apodization and aberration. (Apodization
is due to a reduction of the reflectivity of the prism below the critical angle, and
aberration is caused by deviations from linearity of phase as a function of angle of
incidence.) One noteworthy feature of the focused spot of Figure 27.2 is that it is
not centered on the optical axis, but is shifted to the right by about one wave-
length. This shift is known as the Goos–Hänchen effect,2, 3, 4 and its cause will
become clear in the course of the following discussion.
For the prism of Figure 27.1 the computed amplitude and phase of Fresnel’s
reflection coefficients at the glass-to-air interface are presented in Figure 27.3.1
Figure 27.1 Focusing of a uniform beam through a TIR prism. The incident beam
is linearly polarized along the X-axis, the numerical aperture of the lens is 0.2, and
the refractive index of the prism material is 1.65. The entrance and exit facets of the
prism are assumed to be spherical so that ray-bending by Snell’s law at these
surfaces is avoided, thus eliminating the corresponding spherical aberrations.
Figure 27.2 Plots of (a) logarithmic intensity distribution and (b) phase, at the
focal plane of the lens. The center of the bright spot is shifted to the right by about
one wavelength in consequence of the Goos–Hänchen effect. The light and dark
rings in the phase plot correspond to regions of 0° and 180° phase, respectively.
[Axes $z/\lambda$ and $y/\lambda$, each spanning −10 to +10.]
The curves for both p- and s-components of polarization are shown, even though
in our example we are primarily concerned with p-polarized light. Note that
beyond the critical angle the phase of the reflected p-light has a very large slope.
To the extent that this phase may be approximated by a straight line (within the
range of incidence angles of interest) it imparts a linear phase shift to the beam
upon reflection from the prism’s rear facet. This linear phase shift is nothing other
than a wavefront tilt, which causes a displacement of the focused spot; in other
words, it gives rise to the Goos–Hänchen effect. One might phrase the same
explanation in the language of Fourier-transform theory by stating that when a
function is multiplied by a linear phase factor, its Fourier transform is displaced
by an amount proportional to the slope of that phase factor.
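The behavior of the reflection phase can be explored with the standard Fresnel formulas for a glass-to-air interface. The following is a minimal sketch rather than the code used for Figure 27.3; the function name is hypothetical, and the sign of the computed phase depends on the assumed sign conventions, so only magnitudes and relative slopes should be read from it.

```python
import cmath
import math

def fresnel_internal(theta_deg, n=1.65):
    """Fresnel reflection coefficients (r_p, r_s) at a glass-to-air interface.

    theta_deg is the angle of incidence inside the glass. Beyond the critical
    angle the cosine of the refraction angle becomes imaginary and both
    coefficients acquire unit magnitude. The sign convention for r_p varies
    between textbooks; the one used here is an assumption.
    """
    theta = math.radians(theta_deg)
    cos_i = math.cos(theta)
    cos_t = cmath.sqrt(1 - (n * math.sin(theta)) ** 2)
    r_s = (n * cos_i - cos_t) / (n * cos_i + cos_t)
    r_p = (cos_i - n * cos_t) / (cos_i + n * cos_t)
    return r_p, r_s

def phase_p(theta_deg):
    """Phase (radians) of the p-polarized reflection coefficient."""
    return cmath.phase(fresnel_internal(theta_deg)[0])

theta_B = math.degrees(math.atan(1 / 1.65))       # Brewster angle, about 31.2 degrees
r_p_B, r_s_B = fresnel_internal(theta_B)
r_p_45, r_s_45 = fresnel_internal(45.0)

slope_near = abs(phase_p(38.5) - phase_p(37.5))   # phase change per degree just above critical
slope_far = abs(phase_p(56.0) - phase_p(55.0))    # phase change per degree far from critical
print(abs(r_p_B), abs(r_p_45), abs(r_s_45))
print(slope_near, slope_far)
```

The phase changes much faster per degree just above the critical angle than near the upper end of the cone, which is the feature that produces the displacement and the aberrations discussed here.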
Note that the largest slopes of the phase plots in Figure 27.3(b) occur immediately
after the critical angle; therefore, the greatest effects would be observed
when the incident beam’s angular spectrum is confined to the vicinity of $\theta_{\mathrm{crit}}$. In our
example, of course, the range of incident angles is fairly large (33.5° to 56.5°), and
deviations from linearity of the phase function show up as higher-order aberrations
(e.g., coma, astigmatism, spherical aberration, defocus). It is this deviation from
linearity that is mainly responsible for the aberration of the focused spot seen in
Figure 27.2.
Figure 27.3 Plots of amplitude and phase for the reflection coefficients of the
p- and s-components of polarization at a glass–air interface. The assumed index
of the glass is n = 1.65. The critical angle for TIR is $\theta_{\mathrm{crit}} = \sin^{-1}(1/n) = 37.3^\circ$,
and the Brewster angle is $\theta_B = \tan^{-1}(1/n) = 31.2^\circ$.
[Panel (a): $|r_p|$ and $|r_s|$ versus $\theta$ (degrees); panel (b): phases $\phi_p$ and $\phi_s$ of the
reflection coefficients (degrees) versus $\theta$ (degrees).]
A question frequently asked about TIR concerns the balance of energy among the
incident beam, the reflected beam, and the evanescent waves that exist in the medium
beyond the prism. If all the light is reflected at the glass–air interface, then how can
there be any energy in the form of electromagnetic fields in the region immediately
beyond the interface? To answer this question one must distinguish between the
steady state of the system, which prevails once the waves have established them-
selves throughout space, and the transient state, which exists in the earlier stage
immediately after the light source has been turned on. In the transient state, some of
the incident energy goes into developing the evanescent waves, which are established
early on and remain for as long as the system remains undisturbed. If one
calculated for the evanescent field the component of the Poynting vector perpendicular
to the interface, one would find that the electric and magnetic components of
this field are exactly 90° out of phase and, therefore, that the perpendicular
component of the Poynting vector is zero. In other words, no energy is carried away
from the interface by these evanescent waves. Consequently, all the incident optical
energy in the steady state is carried away by the reflected beam.
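The vanishing of the normal energy flux can be verified in the simplest case. The sketch below assumes an s-polarized evanescent wave in the air region $z > 0$, a plane of incidence XZ, and a time dependence $\exp(-i\omega t)$; the symbols $E_0$, $k_x$, $\kappa$, $\omega$ and $\mu_0$ are introduced here for illustration only.

```latex
\begin{align*}
E_y &= E_0\,e^{ik_x x}\,e^{-\kappa z}, \qquad \kappa > 0, \qquad E_x = E_z = 0,\\
H_x &= \frac{1}{i\omega\mu_0}\left(-\frac{\partial E_y}{\partial z}\right)
     = -\,\frac{i\kappa}{\omega\mu_0}\,E_y,\\
\langle S_z\rangle &= \tfrac{1}{2}\,\mathrm{Re}\!\left(E_xH_y^{*}-E_yH_x^{*}\right)
 = -\tfrac{1}{2}\,\mathrm{Re}\!\left(\frac{i\kappa}{\omega\mu_0}\,|E_y|^2\right) = 0.
\end{align*}
```

The factor $-i$ relating $H_x$ to $E_y$ is the 90° phase difference mentioned in the text; because the product $E_yH_x^{*}$ is purely imaginary, its time average carries no energy across the interface.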
Next, we consider the effect of a collimating lens (identical to the original
focusing lens), placed so as to capture the radiation emanating from the focused
spot. (In the system of Figure 27.1, this lens would be placed one focal length
above the observation plane and parallel to it.) The resulting collimated beam is
depicted in Figure 27.4, which shows computed plots of intensity and phase at the
exit pupil of the collimator. Note, in particular, the strong attenuation of the left
edge of the beam (owing to a loss of rays below $\theta_{\mathrm{crit}}$), and also the near-linearity
of the phase plot in regions far from the critical angle. As the light rays approach
$\theta_{\mathrm{crit}}$ from above, the phase pattern in Figure 27.4(b) rises rather sharply and then
flattens. This is precisely what one would expect based on the behavior of $\phi_p$ in
the interval (33.5°, 56.5°), shown in Figure 27.3(b).
Figure 27.4 (a) Plot of intensity distribution at the exit pupil of the collimating
lens. The low-contrast rings are caused by diffraction effects during propagation
and by loss of the high-spatial-frequency content of the spectrum. The rays on
the left side of the beam, having been below the critical angle for TIR, have been
partially transmitted through the prism. (b) Distribution of phase at the exit pupil
of the collimating lens. The small linear slope is responsible for the Goos–Hänchen
displacement of the focused spot. The plateau on the left-hand side is
caused by the (partially reflected) rays that fall below the critical angle. The
sharp rise immediately before reaching the plateau is due to the rapidly
decreasing phase of the reflected rays just above the critical angle.
[Panel (a): axes $z/\lambda$ and $y/\lambda$, each spanning −3100 to 3100; panel (b): phase in
degrees plotted over $z/\lambda$ and $y/\lambda$ from −2000 to 2000.]
One cannot leave the subject of TIR without at least mentioning the fascinating
phenomena associated with frustrated TIR, which occur when a second prism is
brought to the vicinity of the interface at which TIR occurs. Consider a pair of
identical glass hemispheres separated by an air gap of width D, as shown in
Figure 27.5. Displayed in Figure 27.6 are computed plots of amplitude reflection
coefficients $|r_p|$ and $|r_s|$ versus the angle of incidence $\theta$ for three different values
of D. In Figure 27.6(a), where D = 100 nm, one can see close similarities to
Figure 27.3(a), albeit with TIR completely suppressed: the Brewster angle at
$\theta_B = 31.2^\circ$ is still there, but there are no sharp transitions to 100% reflectivity. In
Figure 27.6(b), D is set to 300 nm and the curves are beginning to look more like
those in Figure 27.3(a); it appears that by increasing D one can make a rather
smooth transition to TIR. But wait! In Figure 27.6(c), where D = 400 nm, there
is a radical departure from the presumed “smooth transition”. Specifically, at
$\theta = 20.7^\circ$ both $r_p$ and $r_s$ vanish identically. What is going on here? What will
happen if the gap width D keeps increasing? These questions are not difficult to
answer but require some thought. Essentially, at a certain gap width D and at
some angle of incidence $\theta$ both $r_p$ and $r_s$ vanish. The gap width is such that, at this
angle, $D\cos\theta' = \lambda/2$ exactly, where $\theta'$ is the propagation angle inside the air gap,
given by Snell’s law as $\sin\theta' = n\sin\theta$. Now, whenever a non-absorbing layer’s thickness
becomes an integer multiple of a half-wavelength, that layer will have no effect
on the multiple-beam interferences and can, therefore, be eliminated from consideration.
Removing the air gap would bring the two hemispheres into contact,
in which case all the incident light will naturally pass from one hemisphere to the
other, leaving no reflected light whatsoever for either type of polarization.
Figure 27.5 A pair of glass hemispheres separated by an air gap may be used
to demonstrate the phenomenon of frustrated TIR. The coherent beam of light is
directed at the center of the upper hemisphere at incidence angle $\theta$. The width D
of the air gap is adjustable.
Figure 27.6 Computed amplitude reflection coefficients, $|r_p|$ and $|r_s|$, for p- and s-polarized light in the system of Figure 27.5.
The refractive index of the glass hemispheres is n = 1.65, the wavelength of the incident beam is $\lambda = 650$ nm, and the width D of
the air gap is (a) 100 nm, (b) 300 nm, (c) 400 nm. [Horizontal axes: $\theta$ from 0° to 90°.]
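The angle at which both reflection coefficients vanish can be checked with the standard single-layer (Airy) formula for a symmetric glass/air/glass stack. This is a minimal sketch under that assumption, not the method used to compute Figure 27.6; the function name is hypothetical.

```python
import cmath
import math

def gap_reflection(theta_deg, gap_nm, wavelength_nm=650.0, n=1.65):
    """Amplitude reflection coefficients (r_p, r_s) of glass / air gap / glass.

    theta_deg is the angle of incidence inside the glass. Uses the single-layer
    formula r = r12 (1 - e) / (1 - r12**2 e), with e the round-trip phase
    factor across the gap and r12 the glass-to-air Fresnel coefficient.
    """
    theta = math.radians(theta_deg)
    cos_i = math.cos(theta)
    cos_t = cmath.sqrt(1 - (n * math.sin(theta)) ** 2)   # cosine of the angle in the gap
    r_s = (n * cos_i - cos_t) / (n * cos_i + cos_t)
    r_p = (cos_i - n * cos_t) / (cos_i + n * cos_t)
    round_trip = cmath.exp(2j * (2 * math.pi / wavelength_nm) * gap_nm * cos_t)

    def total(r12):
        return r12 * (1 - round_trip) / (1 - r12 * r12 * round_trip)

    return total(r_p), total(r_s)

gap, lam, n = 400.0, 650.0, 1.65
cos_gap = lam / (2 * gap)                  # D cos(theta') = lambda / 2
theta_zero = math.degrees(math.asin(math.sqrt(1 - cos_gap ** 2) / n))
r_p, r_s = gap_reflection(theta_zero, gap)
print(round(theta_zero, 1))                # 20.7
print(abs(r_p), abs(r_s))                  # both at round-off level
```

For a gap many wavelengths wide and an angle beyond the critical angle, the round-trip factor decays to zero and the formula returns the single-interface coefficient of unit magnitude, so ordinary TIR is recovered.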
If the flat surface of the bottom hemisphere in Figure 27.5 is coated
with a metallic layer, one would observe the phenomenon of attenuated TIR.5
Figure 27.7(a) shows plots of $|r_p|$ and $|r_s|$ versus $\theta$ for the case of an aluminum-coated
surface separated from the top hemisphere by an 875 nm air gap. The
s-polarized light does not exhibit any interesting effects, but the drop in p-light
reflectivity around $\theta = 37.4^\circ$ (just 0.1° above $\theta_{\mathrm{crit}}$) is quite impressive. In fact,
when the angle of incidence $\theta$ is properly selected, it is possible to modulate the
reflectivity of the p-light from essentially 0% to 100% by adjusting the gap width,
without ever bringing the two surfaces into contact (or near contact). This has
provided the mechanism for a novel light-intensity modulator, which was
patented some years ago.6 Figure 27.7(b) shows the component of the Poynting
vector perpendicular to the gap as a function of the vertical distance from the top
surface into the gap; the assumed angle of incidence within the top hemisphere is
37.4°. The optical energy is seen to propagate unattenuated through the 875 nm
gap, before being fully absorbed within the top 30 nm of the aluminum layer.
Figure 27.7 (a) Computed amplitude reflection coefficients, $|r_p|$ and $|r_s|$, for
p- and s-polarized light in the system of Figure 27.5, when the flat surface of the
bottom hemisphere is coated with a thick layer of aluminum: (n, k) = (1.47, 7.8),
thickness = 200 nm. The top glass hemisphere is assumed to have refractive
index n = 1.65, the wavelength of the incident beam is $\lambda = 650$ nm, and the width
D of the air gap is 875 nm. This particular gap width was chosen because it
brought the minimum in $r_p$ close to zero. At other gap widths the behavior is
qualitatively the same but the minimum of reflectivity is higher. (b) Component
of the Poynting vector perpendicular to the gap, computed at $\theta = 37.4^\circ$. The
horizontal axis is the distance measured from the top of the air gap towards the
aluminized surface at the bottom. The optical energy flows unattenuated through
the air before being fully absorbed in the top 30 nm of the aluminum layer.
[Panel (a): horizontal axis $\theta$ from 0° to 90°; panel (b): normal component of the
Poynting vector versus Z from 0 to 1000 nm.]
The physics of attenuated TIR involves the excitation of surface plasmons in the
metallic layer by p-polarized evanescent waves in the air gap. Surface plasmons are
fairly easy to describe and to understand; in fact, they are just inhomogeneous
plane-wave solutions to Maxwell’s equations in absorptive media. See Chapter 10,
“What in the world are surface plasmons?”, for a more comprehensive discussion
of this subject.

1 M. Born and E. Wolf, Principles of Optics, sixth edition, Pergamon Press, New York,
1983.
2 F. Goos and H. Hänchen, Ann. Phys. Lpz. (6) 1, 333 (1947).
3 F. Goos and H. Lindberg-Hänchen, Ann. Phys. Lpz. (6) 5, 251 (1949).
4 H. K. V. Lotsch, Beam displacement at total reflection: the Goos–Hänchen effect,
Optik 32: part I, 116–137, part II, 189–204 (1970); part III, 299–319, part IV,
553–569 (1971).
5 A. Otto, Zeitschrift für Physik 216, 398 (1968).
6 G. T. Sincerbox and J. G. Gordon, Appl. Opt. 20, 1491–1494 (1981).
28
Evanescent coupling
Evanescent electromagnetic waves abound in the vicinity of luminous objects.

These waves, which consist of oscillating electric and magnetic fields in regions
of space immediately surrounding an object, do not transfer their stored energy to
other regions and, therefore, remain localized in space. Like all electromagnetic
waves, the behavior of evanescent waves is governed by Maxwell’s equations,
and their presence in the vicinity of an object helps to satisfy the requirements of
field continuity at the object’s boundaries. Evanescent fields decay exponentially
with distance away from the object’s surface, making them exceedingly difficult
to detect at distances much greater than a wavelength.1
When a beam of light shines on a diffraction grating, for example, various
diffracted orders partake of the energy of the incident beam and carry it away in
different directions. At the same time, evanescent waves are created around the
grating, which ensure the continuity of the field at the grating’s corrugated sur-
face. Similarly, a beam of light shining on an aperture or on a small particle sets
up evanescent fields around the boundaries of these objects. Perhaps the best-
known example of evanescence, however, is provided by total internal reflection
(TIR) from an internal facet of a prism (see Figure 28.1). Here the evanescent field
is formed in the free-space region behind the prism, and remains distinct and
isolated from the propagating (i.e., incident and reflected) beams; this phenomenon
was discussed briefly in Chapter 27.
Bringing an object to the vicinity of another object that has an established
evanescent field in its neighborhood could change the distribution of the elec-
tromagnetic field throughout the entire space. For example, if a material object is
placed behind the prism of Figure 28.1, close enough to sense the evanescent field
but not close enough for the two to make physical contact, photons will tunnel
through the small gap thus created, diverting a fraction of the incident beam
across the gap and into the latter object. This is the essence of evanescent
coupling, of which we present several examples in this chapter. The well-known
phenomena of frustrated TIR and attenuated TIR, which are of relevance here,
were discussed in previous chapters (see Chapter 10, “What in the world are
surface plasmons?”, and Chapter 27, “Some quirks of total internal reflection”).
Figure 28.1 A beam of light is totally internally reflected from the rear facet of
a glass prism. The electromagnetic field lurking in the free space region behind
the prism is evanescent; both its electric and magnetic components decay
exponentially with distance from the interface, and the projection of its Poynting
vector perpendicular to the interface is zero. The energy stored in the evanescent
field is deposited there at the time when the light source is first turned on. In the
steady state, energy is neither added to nor removed from the evanescent field;
all the incoming optical energy is reflected at the rear facet of the prism.
Focusing through a glass hemisphere

We begin by considering the system of Figure 28.2, in which a uniform, colli-
mated, linearly polarized beam of light (vacuum wavelength k0 ¼ 633 nm) is
brought to focus by an aberration-free 0.8NA objective lens. A glass hemisphere
of refractive index n ¼ 2, also referred to as a solid immersion lens (SIL), is
placed over the focal plane so that the focused spot rests at its flat facet.2 For
simplicity, we assume that the objective lens and the spherical surface of the SIL
are antireflection coated; thus the only reflected light originates at the flat facet of
the SIL. For rays that arrive at this flat facet at an angle below the critical TIR
angle (hc ¼ arcsin(1/n) ¼ 30 ) the reflectance is fairly small (about 11% at normal
incidence). For h > hc, however, reflectivity is 100%, so that the cone of light
covering the range of ray angles from critical to marginal is fully reflected. The
computed intensity distribution for the reflected light at the exit pupil of the
objective is shown in Figure 28.3(a). The bright ring resulting from TIR is clearly
visible in this plot. The central region of the aperture is not totally dark either, but
X
Objective
Glass Hemisphere
(SIL)
Z
Y
Figure 28.2 A collimated beam of light, uniform, monochromatic (k0 ¼ 633 nm),
and linearly polarized along X, enters an aplanatic 0.8 NA objective lens
(f ¼ 3750k0). A glass hemisphere – also known as a SIL – of refractive index
n ¼ 2 is placed so that its flat facet coincides with the objective’s focal plane.
The surfaces of the objective as well as the spherical surface of the SIL are
antireflection coated, but the flat facet of the SIL is bare. The light reflected
from this flat facet returns to the objective, is collimated by it, and appears at
the exit pupil.
to discern it requires a picture with better contrast. Figure 28.3(b), a logarithmic
plot of the same distribution as in Figure 28.3(a), shows the structure of the
central region. The two dark spots inside the ring along the horizontal axis arise
from low reflectance at and around the Brewster angle. The overall reflectivity at
the flat facet of the hemisphere (as measured at the objective’s exit pupil) is 66%.
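The single-ray numbers quoted above (the 30° critical angle, the roughly 11% reflectance at normal incidence, the Brewster-angle null for p-polarized light, and total reflection beyond θc) can be checked with a minimal sketch of the Fresnel coefficients at a bare glass-air interface. The function name `fresnel_reflectance` is ours, not the book's, and the estimate of the marginal ray angle assumes that rays cross the spherical surface of the hemisphere without refraction. The 66% figure is an integral over the whole focused cone and is not reproduced by this plane-wave sketch.

```python
import cmath
import math


def fresnel_reflectance(n1, n2, theta_deg):
    """Return (Rs, Rp) for a plane wave incident from index n1 onto index n2."""
    theta = math.radians(theta_deg)
    s = n1 * math.sin(theta)            # tangential index, conserved across the interface
    kz1 = n1 * math.cos(theta)          # normalized z-component of k in medium 1
    kz2 = cmath.sqrt(n2**2 - s**2)      # becomes purely imaginary beyond the critical angle
    rs = (kz1 - kz2) / (kz1 + kz2)
    rp = (n2**2 * kz1 - n1**2 * kz2) / (n2**2 * kz1 + n1**2 * kz2)
    return abs(rs) ** 2, abs(rp) ** 2


n_glass, n_air = 2.0, 1.0
theta_c = math.degrees(math.asin(n_air / n_glass))   # critical angle
theta_b = math.degrees(math.atan(n_air / n_glass))   # Brewster angle inside the glass
theta_max = math.degrees(math.asin(0.8))             # marginal ray (assumed unrefracted by the SIL)

print(f"critical {theta_c:.2f} deg, Brewster {theta_b:.2f} deg, marginal {theta_max:.2f} deg")
for angle in (0.0, theta_b, 29.0, 35.0, theta_max):
    Rs, Rp = fresnel_reflectance(n_glass, n_air, angle)
    print(f"theta = {angle:6.2f} deg   Rs = {Rs:.4f}   Rp = {Rp:.4f}")
```

The complex square root handles both regimes with one expression: below θc it returns a real kz2, above θc an imaginary one, in which case both coefficients have unit magnitude.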
Because Fresnel’s reflection coefficients at the flat facet differ for the p- and
s-polarized light, the reflected beam appearing at the exit pupil is no longer
linearly polarized. Figures 28.3(c), (d) show distributions of the x- and y-components
of polarization, respectively. Ex contains about two-thirds of the reflected optical
power, while Ey contains the remaining one-third. What is more, the relative phase of
Ex and Ey varies over the aperture, thus creating a non-uniform state of polarization.
The computed distribution of the polarization ellipticity η is shown in Figure 28.3(e);
here the gray-scale encodes angles from −37° (black) to +37° (white). The
distribution of the polarization rotation angle ρ is shown in Figure 28.3(f), where the
gray-scale represents angles from −90° (black) to +90° (white). Clearly the state of
polarization in the TIR region is quite complex.
Suppose now that an identical hemisphere is placed in front of the SIL and
separated from it by a narrow air gap (see Figure 28.4). Under these circum-
stances, evanescent coupling causes a good fraction of the beam to be transmitted
through to the second hemisphere. Figure 28.5 shows the computed distributions
of the reflected light at the exit pupil of the objective lens for a 100 nm air gap.
These distributions should be compared directly with those in Figure 28.3. The
overall reflectance is now 43%, of which two-thirds is again in the x-component of
polarization (Figure 28.5(c)) and one-third in the y-component (Figure 28.5(d)).

Figure 28.3 Various distributions of the reflected light at the exit pupil of the
objective lens of Figure 28.2. (a) Plot of reflected intensity corresponding to a
66% overall reflectivity at the flat facet of the hemisphere. (b) Logarithmic plot
of the reflected intensity. (c) Intensity distribution for the x-component of
polarization, Ex. (d) Intensity distribution for the y-component of polarization, Ey.
(e) The polarization ellipticity η encoded by gray-scale, covering a range from
−37° (black) to +37° (white). (f) The polarization rotation angle ρ encoded by
gray-scale, covering a range from −90° (black) to +90° (white). [Panels (a)–(f);
horizontal axis x/λ0 from −3200 to 3200.]

Where there was a bright ring of light at the exit pupil in Figure 28.3(a), now
there is a gradual brightening toward the margins in Figure 28.5(a), indicating the
gradual decrease in evanescent coupling with increasing angle of incidence. The
two dark spots in the vicinity of the Brewster angle are clearly visible in the
logarithmic plot of Figure 28.5(b). The ellipticity η shown in Figure 28.5(e) varies over
the aperture in the range ±29.5°, while the polarization rotation angle ρ has the
distribution shown in Figure 28.5(f).

Figure 28.4 A collimated beam of light, uniform, monochromatic (λ0 = 633 nm),
and linearly polarized along X, enters an aplanatic 0.8NA objective lens (f = 3750λ0).
Two glass hemispheres of refractive index n = 2, separated by an air gap, are
arranged in such a way that the focal plane of the objective coincides with the
mid-plane of the air gap. Both hemispheres are antireflection coated on their spherical
surfaces, but are left bare on their flat surfaces. The light reflected at the air gap returns
to the objective, is collimated by it, and appears at the exit pupil.

Up to this point we have considered the effects of evanescent coupling upon a
full cone of light, which contains rays both below and above the critical TIR
angle, θc. Let us now place a circular mask in the path of the incident beam in
Figure 28.4 in order to block those rays that arrive at the air gap at angles below θc.
The semi-hollow cone of light thus formed by the objective lens contains only rays
with θ > θc. Figure 28.6 is the computed plot of reflectance versus gap width for the
system of Figure 28.4 augmented with a mask that completely blocks the central
part of the beam (i.e., the region with no contribution to evanescent coupling). The
reflectance curve is seen to start at zero, when the hemispheres are in contact and
the light is fully transmitted. With a widening gap, however, the reflectance
increases rapidly and saturates at 100% before the gap width reaches even one
wavelength of the light.
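The trend just described (zero reflectance at contact, saturation well within one wavelength) can be illustrated for a single plane wave with the standard single-layer (Airy) formula applied to the glass-air-glass sandwich. This is only a sketch under stated assumptions: the function `gap_reflectance` and the 45° test angle are ours, and the curve of Figure 28.6 is an average over the entire masked cone and both polarizations, which this code does not attempt.

```python
import cmath
import math


def gap_reflectance(gap_nm, theta_deg, pol="s", n_glass=2.0, n_gap=1.0,
                    wavelength_nm=633.0):
    """Reflectance of a glass | gap | glass sandwich for one plane wave.

    Do not evaluate exactly at the critical angle, where kz2 = 0 and the
    formula reduces to 0/0 at zero gap.
    """
    theta = math.radians(theta_deg)
    s = n_glass * math.sin(theta)             # conserved tangential index
    kz1 = n_glass * math.cos(theta)
    kz2 = cmath.sqrt(n_gap**2 - s**2)         # +i*kappa inside the gap above theta_c
    if pol == "s":
        r12 = (kz1 - kz2) / (kz1 + kz2)
    else:
        r12 = (n_gap**2 * kz1 - n_glass**2 * kz2) / (n_gap**2 * kz1 + n_glass**2 * kz2)
    # Round-trip factor exp(2i*beta); it decays exponentially when kz2 is imaginary.
    round_trip = cmath.exp(2j * (2 * math.pi / wavelength_nm) * kz2 * gap_nm)
    # Airy formula with r23 = -r12 (identical media on both sides of the gap).
    r = r12 * (1 - round_trip) / (1 - r12**2 * round_trip)
    return abs(r) ** 2


for gap in (0, 50, 100, 150, 300, 600):
    Rs = gap_reflectance(gap, 45.0, "s")
    Rp = gap_reflectance(gap, 45.0, "p")
    print(f"gap = {gap:3d} nm   Rs = {Rs:.4f}   Rp = {Rp:.4f}")
```

Rays closer to the critical angle have a longer evanescent decay length in the gap, so they couple across wider gaps than the 45° ray used here.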
Evanescent coupling to a metallic film

Suppose now that the flat facet of the second hemisphere of Figure 28.4 is coated
with a metallic layer, say, a layer of aluminum 50 nm thick (n = 1.4, k = 7.6). For
a 100 nm air gap, Figure 28.7 shows the various distributions of the reflected light
at the exit pupil of the objective lens. The plot of reflected intensity in Figure 28.7(a)
shows high reflectance everywhere except in two crescent-shaped areas, which
correspond to a dip in the Fresnel reflection coefficient for p-polarized light.
The overall reflectance is 92%, of which 66% has x-polarization and 26%
has y-polarization. The remaining 8% of the light has been absorbed by the

Figure 28.5 Distributions of the reflected light at the exit pupil of the
objective lens of Figure 28.4; the gap width is fixed at 100 nm. (a) Plot of the
reflected intensity corresponding to a 43% overall reflectivity at the air gap.
(b) Logarithmic plot of the reflected intensity. (c) Intensity distribution for Ex.
(d) Intensity distribution for Ey. (e) The polarization ellipticity η encoded by
gray-scale, covering a range from −29.5° (black) to +29.5° (white). (f) The
polarization rotation angle ρ encoded by gray-scale, covering a range from −90°
(black) to +90° (white). [Panels (a)–(f); horizontal axis x/λ0 from −3200 to 3200.]

aluminum layer. (At 50 nm thickness, the aluminum film is opaque; the incident
light is partly absorbed and partly reflected from the film’s surface, practically
no light being transmitted through the film.) The absorbed light comes partly
from the central region of the incident beam, which is transported through the
gap by ordinary (i.e., propagating) waves, and partly from the remaining annular
region, which is transported by evanescent coupling. As before, the reflected

[Plot: reflectance (0 to 1.0) versus gap width (0 to 600 nm).]
Figure 28.6 Computed plot of reflectance versus gap width in the system of
Figure 28.4, when a circular mask is placed in the incident beam’s path to block
the rays that arrive at the interface between the hemispheres at or below the
critical TIR angle.

polarization state is quite complex, regions near the critical angle being RCP in
two quadrants and LCP in the other two quadrants. (The coarseness of the mesh
used in these calculations does not reveal the resonant absorption by surface-
plasmon excitation. This type of absorption occurs within a narrow range of
angles just above the critical angle for p-polarized light. Because the angular
range of resonant absorption is extremely narrow, however, its contribution to
the overall absorption within the aluminum film may be neglected.)
To compute the fraction of light absorbed by the aluminum film through
evanescent coupling, we place once again a circular mask in the central region of
the incident beam, blocking all the rays that would arrive at the gap below the
critical angle. The results of calculations in this case are shown in Figure 28.8 for
a bare aluminum film (solid curve) as well as a coated film (broken curve). With
increasing gap width the reflectance of the bare film drops slightly at first, then
rises rapidly to saturate at 100%. When the aluminum film is in contact with the
SIL (i.e., zero gap width) it absorbs about 16% of the light, but by the time the
gap widens to 150 nm the absorption is down to a mere 3%. One can improve
upon this situation by applying a dielectric coating over the aluminum layer. The
broken curve in Figure 28.8 is a plot of reflectance versus gap width for an
aluminum film 50 nm thick coated with a layer of SiO 100 nm thick. It is clear
that evanescent coupling now takes place over a wider range of gaps; in

Figure 28.7 Distributions of the reflected light at the exit pupil of the
objective lens of Figure 28.4, when the flat facet of the second hemisphere is
coated with a layer of aluminum 50 nm thick. The gap width is fixed at 100 nm.
(a) Plot of the reflected intensity, showing a 92% overall reflectance. (If the SIL
is removed, the reflectivity drops to 90%.) (b) Logarithmic plot of the reflected
intensity. (c) Intensity distribution for Ex. (d) Intensity distribution for Ey.
(e) The polarization ellipticity η ranging from −43.4° (black) to +43.4° (white).
(f) The polarization rotation angle ρ, ranging from −90° (black) to +90° (white).
[Panels (a)–(f); horizontal axis x/λ0 from −3200 to 3200.]

particular, between gap widths of 150 nm and 200 nm there is a plateau of about
10% absorption. (In the absence of the mask blocking the center of the beam, the
dielectric-coated aluminum film absorbs a total of 19% of the incident power at a
gap width of 100 nm.)
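For readers who wish to experiment with such stacks, the following is a minimal plane-wave sketch based on the standard characteristic-matrix method for thin films. It is not the focused-beam calculation behind Figure 28.8, so it should not be expected to reproduce the percentages quoted above; it treats one ray at a time. The function name `stack_rta`, the 45° angle, and the choice of p-polarization are our assumptions. The layer order (air gap, SiO, aluminum, second hemisphere) follows the description in the text, and the exp(−iωt) convention is assumed, so that absorbing media have index n + ik.

```python
import cmath
import math


def stack_rta(layers, theta_deg, pol="s", n_in=2.0, n_out=2.0, wavelength_nm=633.0):
    """Return (R, T, A) for one plane wave incident on a layer stack.

    layers: list of (refractive index, thickness in nm) from the incidence
    side; indices may be complex (n + ik). n_in is assumed real (lossless).
    """
    k0 = 2 * math.pi / wavelength_nm
    s = n_in * math.sin(math.radians(theta_deg))   # conserved tangential index

    def kz(n):                  # normalized k_z; principal root is the decaying branch
        return cmath.sqrt(n * n - s * s)

    def admittance(n):          # tilted optical admittance for s or p light
        return kz(n) if pol == "s" else n * n / kz(n)

    m11, m12, m21, m22 = 1, 0, 0, 1               # running product of layer matrices
    for n, d in layers:
        delta = k0 * kz(n) * d
        eta = admittance(n)
        c, sn = cmath.cos(delta), cmath.sin(delta)
        a11, a12, a21, a22 = c, -1j * sn / eta, -1j * eta * sn, c
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)

    eta_in, eta_out = admittance(n_in), admittance(n_out)
    b = m11 + m12 * eta_out
    c_out = m21 + m22 * eta_out
    denom = eta_in * b + c_out
    R = abs((eta_in * b - c_out) / denom) ** 2
    T = 4 * eta_in.real * eta_out.real / abs(denom) ** 2
    return R, T, 1.0 - R - T


ALUMINUM = 1.4 + 7.6j
for gap in (0.0, 50.0, 100.0, 150.0, 200.0, 300.0):
    bare = [(1.0, gap), (ALUMINUM, 50.0)]
    coated = [(1.0, gap), (2.0, 100.0), (ALUMINUM, 50.0)]
    A_bare = stack_rta(bare, 45.0, "p")[2]
    A_coated = stack_rta(coated, 45.0, "p")[2]
    print(f"gap = {gap:5.1f} nm   A(bare) = {A_bare:.3f}   A(coated) = {A_coated:.3f}")
```

Averaging the result over the angles and polarizations present in the masked cone, weighted by the pupil illumination, would be the next step toward a curve comparable to Figure 28.8.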
The above aluminum–SiO bilayer is discussed for illustration purposes only; it
does not represent the most efficient multilayering scheme for coupling the light

[Plot: reflectance (0.85 to 1.00) versus gap width (0 to 600 nm); curves labeled
“50 nm Aluminum” and “50 nm Aluminum + 100 nm SiO”.]
Figure 28.8 Computed plots of reflectivity versus gap width for two different
samples. The solid curve shows the reflectance when an aluminum layer 50 nm
thick coats the flat facet of the second hemisphere of Figure 28.4. The broken
curve corresponds to the case where a layer of SiO 100 nm thick and with n = 2
coats the 50 nm thick aluminum film. A mask blocks the central region of the
beam in both cases.

to the aluminum layer. One can do somewhat better by adding further layers
on top of the aluminum film and/or beneath the flat facet of the SIL and by
optimizing the thickness and refractive index of each such layer.
Magneto-optical disk
A major application area for evanescent coupling is the field of magneto-optical
(MO) disk data storage [2,3,4]. Here a disk, which is typically a multilayer stack of
metallic and dielectric layers on a glass or plastic substrate, is placed under a solid
immersion lens (SIL). As the disk spins, the SIL rides on an air cushion, which
separates the two by a fixed gap width. Two of the most important questions in this
area are: (1) how much of the focused optical energy is absorbed within the optical
disk? (2) how does the reflected MO signal depend on the air gap?
Before answering these questions, however, we must give a brief overview of
the physical mechanisms involved in MO recording and readout. The disk con-
sists of a thin magnetic layer sandwiched between two dielectric layers coated
atop a reflector such as an aluminum-coated substrate (see Figure 28.9). The layer
thicknesses and refractive indices shown in the figure are typical but in fact can

Figure 28.9 Quadrilayer structure of a typical MO disk used in conjunction
with the SIL. The incident beam (λ0 = 633 nm) encounters, in order: a 100 nm
dielectric layer (n = 2), a 25 nm magnetic layer, a 30 nm dielectric layer (n = 2),
a 25 nm aluminum layer ((n, k) = (1.4, 7.6)), and the substrate. Within the drive,
the disk rotates at several thousand r.p.m., and the
SIL flies over the top dielectric layer of the disk, supported on a cushion of air
several tens of nanometers thick. The local state of magnetization (up or down)
represents the state of the recorded bit (0 or 1). The size of the focused spot
directed through the SIL at the magnetic layer determines the minimum mark
size that can be recorded and read out.

vary somewhat, depending on the configuration of the drive for which the disk is
intended. The disk is used in reflection, and the multilayer stack is designed to
take advantage of optical interference in order to maximize the coupling of the
laser beam to the magnetic layer [3]. The aluminum reflector is an important
component of this optical interference device, but it also serves as a heat sink to
remove from the magnetic layer the thermal energy deposited there by the
focused laser beam. The dielectric layers protect the magnetic film from the
environment and, through their thickness and refractive index, provide the
necessary degrees of freedom for adjusting the optical characteristics of the stack.
Also, the dielectric layer between the magnetic film and the aluminum layer
controls the flow of heat between these two metallic layers.
The optical properties of the magnetic film are fully specified by its dielectric
tensor, namely,
\[
\boldsymbol{\varepsilon} =
\begin{pmatrix}
\varepsilon & \varepsilon' & 0 \\
-\varepsilon' & \varepsilon & 0 \\
0 & 0 & \varepsilon
\end{pmatrix}. \tag{28.1}
\]
For conventional MO media, typical values of the diagonal and off-diagonal
elements of this tensor at λ0 = 633 nm are ε = −8 + 27i and ε′ = 0.6 − 0.2i. The
off-diagonal elements are responsible for cross-coupling the x- and y-components
of polarization, this being the origin of optical activity in these media. If ε′ is set
to zero, MO activity vanishes, as if the magnetization of the material had
disappeared. Reversing the direction of magnetization causes a sign reversal of ε′.
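The cross-coupling role of the off-diagonal element can be made concrete with a short numerical sketch. It assumes the antisymmetric form of the tensor (ε′ above the diagonal, −ε′ below it) and the values ε = −8 + 27i, ε′ = 0.6 − 0.2i; the helper `mo_tensor` is a name introduced here for illustration. In the XY-plane the tensor is diagonalized by the two circular basis vectors (1, ±i, 0)/√2, with eigenvalues ε ± iε′, so the two circular polarizations see slightly different effective permittivities, and reversing the magnetization simply interchanges them.

```python
import numpy as np


def mo_tensor(eps, eps_off):
    """Dielectric tensor of a film magnetized along Z (assumed antisymmetric form)."""
    return np.array([[eps, eps_off, 0],
                     [-eps_off, eps, 0],
                     [0, 0, eps]], dtype=complex)


eps, eps_off = -8 + 27j, 0.6 - 0.2j
tensor = mo_tensor(eps, eps_off)

# Circular basis vectors in the XY-plane (handedness labels depend on convention).
circ_plus = np.array([1, 1j, 0]) / np.sqrt(2)
circ_minus = np.array([1, -1j, 0]) / np.sqrt(2)

print("eigenvalues:", np.linalg.eigvals(tensor))
print("eps + i eps':", eps + 1j * eps_off)
print("eps - i eps':", eps - 1j * eps_off)

# An x-polarized field acquires a y-component of the displacement D = tensor @ E.
print("D for E along x:", tensor @ np.array([1, 0, 0], dtype=complex))
```

Setting `eps_off` to zero makes the tensor a multiple of the identity, which is the numerical counterpart of the statement that MO activity vanishes.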
Suppose a plane wave, linearly polarized along the X-axis, is directed at
normal incidence onto the MO stack of Figure 28.9. Upon reflection from the
stack the beam will have two components of polarization: Ex along the original
X-axis and Ey along the Y-axis. The y-component of the reflected polarization is
created by the optical activity of the magnetic layer. If the magnetization of the
sample is reversed, Ex does not change at all, and Ey undergoes a sign change
only. If Ex and Ey happen to be in phase, the reflected polarization appears to have
been rotated from its original direction; this is referred to as MO Kerr rotation. If,
however, Ey is 90° out of phase with respect to Ex, the returning beam will have
pure Kerr ellipticity. In general, the polarization components have an arbitrary
relative phase and, therefore, the reflected beam exhibits both Kerr rotation and
ellipticity. Both Kerr angles convey information about the state of magnetization
of the disk: when the magnetization reverses, the sign of Ey is switched, in which
case both angles (i.e., rotation and ellipticity) change sign [3].
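The relation between the complex amplitudes (Ex, Ey) and the two Kerr angles can be sketched with the Stokes parameters of the reflected beam. The function `polarization_ellipse` is our own helper, and the sign convention chosen for the ellipticity (positive when Ey leads Ex by 90° under the convention used in the code) is an assumption; other conventions flip that sign. The example uses the 0.66° rotation quoted later in this chapter as a convenient magnitude.

```python
import math


def polarization_ellipse(ex, ey):
    """Return (rotation, ellipticity) in degrees from complex amplitudes Ex, Ey."""
    ex, ey = complex(ex), complex(ey)
    s0 = abs(ex) ** 2 + abs(ey) ** 2
    s1 = abs(ex) ** 2 - abs(ey) ** 2
    cross = ex.conjugate() * ey
    s2 = 2 * cross.real                      # in-phase part of Ey produces rotation
    s3 = 2 * cross.imag                      # quadrature part of Ey produces ellipticity
    rotation = 0.5 * math.atan2(s2, s1)
    ellipticity = 0.5 * math.asin(s3 / s0)
    return math.degrees(rotation), math.degrees(ellipticity)


t = math.tan(math.radians(0.66))             # |Ey/Ex| for a 0.66 degree Kerr angle
cases = {
    "Ey in phase with Ex": (1.0, t),
    "Ey in quadrature with Ex": (1.0, 1j * t),
    "general relative phase": (1.0, t * complex(math.cos(1.0), math.sin(1.0))),
    "magnetization reversed": (1.0, -t * complex(math.cos(1.0), math.sin(1.0))),
}
for label, (ex, ey) in cases.items():
    rot, ell = polarization_ellipse(ex, ey)
    print(f"{label:26s} rotation = {rot:+.4f} deg   ellipticity = {ell:+.4f} deg")
```

The last two cases differ only in the sign of Ey, and both angles change sign together, as stated above.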
In practice a disk is both recorded and read out using a focused laser beam. The
writing involves local heating of the magnetic film in the presence of an external
magnetic field. At high enough temperature the external magnetic field succeeds
in reversing the direction of local magnetization. Obviously, a small focused spot
yields a small recorded mark (i.e., a small magnetic domain). The SIL is valued in
optical recording precisely because it does produce a small focused spot [4] (see
Chapter 37, “Scanning optical microscopy”). During readout, the same focused
spot is used, albeit at a lower power to avoid heating the media. The sign of the
Kerr rotation (or ellipticity) then provides the read signal for the detection
system. Once again, the usefulness of the SIL for this application becomes apparent
when one recognizes that by producing a small focused spot the SIL helps to
resolve small recorded marks [2,4].
Differential detection
The standard method of detecting the MO signal in conventional optical disk
drives is shown in Figure 28.10 [3]. The beam reflected from the disk and collected
by the objective lens is sent through a Wollaston prism and thus divided between
two identical detectors. Typically the detection module is oriented at 45° with
respect to the original direction of polarization of the laser, so that Ex and Ey are

[Diagram labels: Wollaston prism; split-detector with outputs S1 (+) and S2 (−);
difference signal ΔS; axes X, Y, Z.]
Figure 28.10 Schematic diagram of a differential detection module consisting
of a Wollaston prism and two identical photodetectors. Since it can rotate around
the optical axis Z, the module may be oriented arbitrarily relative to the ellipse of
polarization of the incident beam. In particular, if the original linear polarization of
the laser beam is along the X-axis and the magneto-optically generated component
of polarization is along the Y-axis, then the module may be aligned in such a way
that the transmission axes of the Wollaston prism make 45° angles with X and Y.

equally split between the two detectors. Whereas both the magnitude and phase of
Ex at the two detectors are identical, Ey arrives at each detector with a different
sign. The total light amplitude arriving at the detectors is thus (Ex ± Ey)/√2, the
plus sign corresponding to one detector and the minus sign corresponding to the
other. If the phase difference between Ex and Ey is denoted φx − φy, then the net
differential signal may be written as
\[
\Delta S = S_1 - S_2
= \gamma \int \left( \tfrac{1}{2}\,|E_x + E_y|^2 - \tfrac{1}{2}\,|E_x - E_y|^2 \right) dx\,dy
= 2\gamma \int |E_x E_y| \cos(\phi_x - \phi_y)\, dx\,dy. \tag{28.2}
\]
In this equation, γ is the responsivity of the detectors (in volts per watt of optical
power) and the integrals are over the individual detector areas. Note that when the
magnetization direction at the disk is reversed the sign of Ey reverses, resulting in a
sign reversal for ΔS. Also note that any phase difference between Ex and Ey reduces
the output signal by the cosine factor in the above equation. In principle, this phase
difference may be eliminated by a properly patterned phase plate placed
immediately before the Wollaston prism. In practice, however, unless φx − φy is fairly
uniform over the aperture, it is difficult to correct the effects of this relative phase.
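The two forms of the differential signal in Eq. (28.2) can be compared numerically on a sampled aperture. In the sketch below the field samples are arbitrary random numbers (not a physical pupil distribution), the responsivity γ and the pixel area are set to one, and `detector_signals` is a helper name of our own. The code checks pointwise that ½|Ex + Ey|² − ½|Ex − Ey|² equals 2|Ex Ey| cos(φx − φy), and that reversing the sign of Ey reverses ΔS.

```python
import numpy as np


def detector_signals(ex, ey, gamma=1.0, pixel_area=1.0):
    """Signals S1, S2 behind a Wollaston prism oriented at 45 degrees to X and Y."""
    s1 = gamma * 0.5 * np.sum(np.abs(ex + ey) ** 2) * pixel_area
    s2 = gamma * 0.5 * np.sum(np.abs(ex - ey) ** 2) * pixel_area
    return s1, s2


rng = np.random.default_rng(0)                 # arbitrary test fields, for illustration only
shape = (64, 64)
ex = rng.normal(size=shape) + 1j * rng.normal(size=shape)
ey = 0.01 * (rng.normal(size=shape) + 1j * rng.normal(size=shape))

s1, s2 = detector_signals(ex, ey)
delta_s = s1 - s2

# Right-hand side of Eq. (28.2), evaluated sample by sample.
closed_form = 2 * np.sum(np.abs(ex * ey) * np.cos(np.angle(ex) - np.angle(ey)))

# Reversing the magnetization flips the sign of Ey and hence of the signal.
s1_rev, s2_rev = detector_signals(ex, -ey)
delta_s_reversed = s1_rev - s2_rev

print(delta_s, closed_form, delta_s_reversed)
```

Because ΔS is a small difference between two large numbers, a practical implementation would usually accumulate the cross term Re(Ex* Ey) directly rather than subtract S2 from S1.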
Evanescent coupling to an optical disk

Consider the typical MO disk structure of Figure 28.9 placed under the SIL of
Figure 28.2. When in contact with the SIL and at normal incidence, the disk has
reflectance 36%, Kerr rotation angle 0.66°, and Kerr ellipticity 0.05°. Focusing
the beam by the 0.8NA objective through the SIL changes these parameters only
slightly, as long as the disk and the SIL remain in contact. However, a small air
gap between the disk and the SIL can change the disk’s performance drastically.
Figure 28.11 shows computed distributions at the exit pupil of the objective lens
for a 100 nm gap width. Figure 28.11(a) is the intensity distribution for the reflected

Figure 28.11 Distributions of the reflected light at the exit pupil of the
objective lens of Figure 28.2, when the MO stack of Figure 28.9 is placed in
front of the SIL with a 100 nm air gap. (a) Intensity distribution for Ex,
containing 36% of the incident optical power. (b) Phase distribution for Ex; the
gray-scale covers the range −180° (black) to +180° (white). (c) Intensity distribution
for Ey, containing 11.5% of the incident power. (d) Phase distribution for Ey; the
phase difference between adjacent quadrants is nearly 180°. (e) The polarization
ellipticity η ranging from −45° (black) to +45° (white). (f) The polarization
rotation angle ρ ranging from −90° (black) to +90° (white). [Panels (a)–(f);
horizontal axis x/λ0 from −3200 to 3200.]

Ex, containing 36% of the incident power. The dark oval-shaped region in the
middle indicates an area of strong absorption by the disk. The phase of Ex, shown
in Figure 28.11(b), is non-uniform over the aperture, ranging in value from −180°
(black) to +180° (white). At the center the phase is about +100°, and drops
continuously along the X-axis to −150° at the edge.
Figure 28.11(c) is a plot of intensity distribution for the reflected Ey, which
contains 11.5% of the incident power. This y-component is due mainly to the
Fresnel reflection coefficients at the interface between the SIL and the
multilayer stack. The fraction of Ey created by MO activity is relatively small and,
although embedded in Figure 28.11(c), is difficult to recognize at this point.
The phase of Ey depicted in Figure 28.11(d) shows the well-known π shift
between adjacent quadrants. The polarization distribution over the exit pupil
(see Figures 28.11(e), (f)) is highly non-uniform and contains all possible states
of polarization, i.e., linear, elliptical, and circular.
To observe the contribution to Ey by MO activity, we eliminate the
magnetization of the disk by setting to zero the off-diagonal element ε′ of the tensor,
then subtract the complex-amplitude distributions at the exit pupil with and
without the magnetization. In doing so the x-component cancels out exactly,
showing that there are no magnetic contributions to the reflected Ex. However, the
y-component shows a residual distribution ΔEy. Figure 28.12 shows the intensity
and phase plots for ΔEy at the exit pupil of the objective for a 100 nm gap width.
Notice that this MO contribution to Ey has circular symmetry; moreover, it is
large in the region where absorption by the disk is strong (compare the position of
the bright ring in Figure 28.12(a) with that of the dark oval-shaped region in
Figure 28.11(a)).

Figure 28.12 Plots of intensity and phase for the MO contribution to the
reflected light, ΔEy, at the exit pupil of the objective lens of Figure 28.2. The
multilayer stack of Figure 28.9 is assumed to be in front of the SIL with a 100 nm
gap. (a) Intensity distribution, containing a fraction 0.37 × 10⁻⁴ of the incident
optical power. (b) Phase distribution, ranging from −70° (black) to +246° (white).
[Panels (a), (b); horizontal axis x/λ0 from −3200 to 3200.]

The total optical power contained in the distribution of Figure 28.12 is
0.37 × 10⁻⁴ of the incident power; the corresponding value for the case of zero
gap width is 0.44 × 10⁻⁴. Despite the fact that interference effects at the air gap
have boosted ΔEy in the central region of the aperture, a substantial reduction in
evanescent coupling has caused an overall reduction of ΔEy. Also notice the
phase non-uniformity of ΔEy in Figure 28.12(b), ranging from 246° at the center
to −70° at the edge of the lens. Variations over the aperture of the relative phase
between Ex and ΔEy (see Figures 28.11(b) and 28.12(b)) have negative
implications for the readout signal from the disk, as will be discussed shortly.
To isolate the contribution to the MO signal made by evanescent coupling,
we place a mask in the central region of the beam, blocking all the rays below
the critical angle. Figure 28.13 shows computed plots of the total reflectivity
(i.e., |Ex|² + |Ey|² integrated over the aperture), versus the gap width, and the total
contribution to Ey by the MO activity (i.e., the integrated value of |ΔEy|²) versus
the gap width. With increasing gap width the reflectance increases, leaving less
light to be coupled to the magnetic film. In consequence of this reduced coupling,
the MO content of Ey progressively decreases; by the time the gap width reaches
200 nm, there is hardly any Ey left from the evanescently coupled MO interaction.
A similar trend may be seen in the normalized differential signal, (S1 − S2)/
(S1 + S2), which is plotted versus the gap width in Figure 28.14. (See Figure 28.10

[Plot: reflectance (0 to 1.0) versus gap width (0 to 600 nm); curves labeled
|Ex|² + |Ey|² and 20000 |ΔEy|².]
Figure 28.13 Total reflectivity (solid line) and the integrated intensity of the MO
signal (broken line, scaled by 20000) at the exit pupil as functions of the gap width.
These calculations correspond to the system of Figure 28.2 in conjunction with the
quadrilayer MO stack of Figure 28.9, when a mask blocks the central region of the beam.

[Plot: (S1 − S2)/(S1 + S2), from 0 to 0.020, versus gap width (0 to 600 nm).]
Figure 28.14 A computed plot of the normalized differential signal,
(S1 − S2)/(S1 + S2), versus the gap width. This result corresponds to the system
of Figure 28.2 in conjunction with the quadrilayer MO stack of Figure 28.9
and the differential detector of Figure 28.10. It is assumed that a mask blocks
the central region of the incoming laser beam, thus eliminating all the rays
below the critical angle.

and Eq. (28.2) for the definition of S1, S2.) Again we have blocked the central
region of the incident beam in order to concentrate on the effects of evanescent
coupling. With the SIL and the disk in contact, the normalized differential signal
is close to its ideal value, which is twice the tangent of the Kerr rotation angle,
namely, 2 tan 0.66° = 0.023. As the gap widens, the differential signal drops
sharply: at 100 nm gap width, for instance, the signal is down by a factor of four.
Roughly one-half of this drop may be attributed to the reduction in ΔEy and the
corresponding rise in reflectivity (see Figure 28.13). The remaining half,
however, is due to variations over the beam’s cross-section of the relative phase
φx − φy of Ex and ΔEy.
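Both statements in this paragraph, the ideal value 2 tan 0.66° and the loss caused by a non-uniform relative phase, can be illustrated with a toy model. The sketch assumes uniform amplitudes across the aperture and a hypothetical relative phase that varies linearly over ±60°; neither assumption is taken from the computed distributions of Figures 28.11 and 28.12, so only the qualitative behavior is meaningful.

```python
import numpy as np


def normalized_differential(ex, ey):
    """(S1 - S2)/(S1 + S2) for sampled fields; the responsivity cancels in the ratio."""
    s1 = 0.5 * np.sum(np.abs(ex + ey) ** 2)
    s2 = 0.5 * np.sum(np.abs(ex - ey) ** 2)
    return (s1 - s2) / (s1 + s2)


kerr = np.radians(0.66)
ex = np.ones(1001, dtype=complex)

# Case 1: Ey in phase with Ex everywhere (pure Kerr rotation).
ey_uniform = np.tan(kerr) * ex
ideal = normalized_differential(ex, ey_uniform)

# Case 2: same amplitudes, but a hypothetical relative phase spread of +/-60 degrees.
phase = np.linspace(-np.pi / 3, np.pi / 3, ex.size)
ey_spread = np.tan(kerr) * np.exp(1j * phase)
degraded = normalized_differential(ex, ey_spread)

print(f"2 tan(0.66 deg)     = {2 * np.tan(kerr):.5f}")
print(f"uniform phase       = {ideal:.5f}")
print(f"phase spread +/-60  = {degraded:.5f}  (ratio {degraded / ideal:.3f})")
```

In this model the signal is reduced by the aperture average of cos(φx − φy), which is why a phase plate can help only when the phase difference is nearly uniform.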
It must be emphasized that the quadrilayer stack of Figure 28.9 is not
specifically optimized for operation with the system of Figure 28.2. By chan-
ging the thicknesses and the refractive indices of the various layers and/or by
introducing dielectric coatings at the bottom of the SIL, it might be possible to
improve upon the aforementioned performance figures. It is highly unlikely,
however, that one can achieve significant gains in terms of the coupling effi-
ciency and the magnitude of the MO Kerr signal over what we have already
reported.

1980.
[2] S. M. Mansfield, W. R. Studenmund, G. S. Kino, and K. Osato, High numerical
aperture lens system for optical storage, Opt. Lett. 18, 305–307 (1993).
[3] T. W. McDaniel and R. H. Victora, eds., Handbook of Magneto-optical Recording,
Noyes Publications, Westwood, New Jersey, 1997.
[4] B. D. Terris, H. J. Mamin, and D. Rugar, Near-field optical data storage using a solid
immersion lens, Appl. Phys. Lett. 65, 388–390 (1994).
29
Internal and external conical refraction
Sir William Rowan Hamilton (1805–1865). Irish mathematician and astronomer
who put forward the theory of quaternions, a landmark in the development
of algebra, and discovered the phenomenon of conical refraction. His unification
of dynamics and optics has had a lasting influence on mathematical physics,
even though the significance of this work was not fully appreciated until after the
rise of quantum mechanics. Hamilton had learned Latin, Greek, and Hebrew by
the time he was five years old and learned many more languages afterwards. In
1827, while still an undergraduate, he was appointed Professor of Astronomy at
Dublin’s Trinity College. Hamilton published his third supplement to Theory of
Systems of Rays in 1832. Near the end of this work he applied the characteristic
function to study Fresnel’s wave surface. From this he predicted conical
refraction and asked Humphrey Lloyd, a professor of physics at Trinity College,
to try to verify his prediction experimentally. Lloyd’s confirmation two months
later of conical refraction brought great fame to Hamilton. (Photo: courtesy of
AIP Emilio Segré Visual Archives.)
The phenomenon of conical refraction was predicted by Sir William Rowan
Hamilton in 1832 and its existence was confirmed experimentally two months
later by Humphrey Lloyd [1,2]. (James Clerk Maxwell was only a toddler at the
time.) The success of this experiment contributed greatly to the general
acceptance of Fresnel’s wave theory of light.
Conical refraction has been known for nearly 170 years now [1,2], and a complete
explanation based on Maxwell’s electromagnetic theory has emerged, which is
accessible through the published literature [3,4]. The complexity of the physics
involved, however, is such that it prevents us from attempting to give a simple
explanation. We shall, therefore, confine our efforts to presenting a descriptive
picture of internal and external conical refraction by way of computer simulations
based on Maxwell’s equations.
Overview
To observe internal conical refraction one must obtain a slab of biaxial
birefringent crystal, such as aragonite, that has been cut with one of its optic axes
perpendicular to the polished parallel surfaces of the slab (see Figure 29.1).
When a collimated beam of light (say, from a HeNe laser) is directed at normal
incidence towards the front facet of the slab, the beam enters the crystal and
spreads out in the form of a hollow cone of light. Upon reaching the opposite
facet, the beam emerges as two concentric hollow cylinders, propagating in the
same direction as the original, incident beam.
External conical refraction is, in a way, the above phenomenon in reverse.
Specifically, a hollow cone of light, converging towards a point on the surface of

[Diagram labels: incident beam; slab of biaxial crystal; optic axis; emergent beams.]
Figure 29.1 Internal conical refraction. A normally incident coherent beam
arriving at the front facet of a slab of biaxial birefringent crystal propagates
inside the slab in the form of a cone of light, and emerges from the rear facet as
two hollow concentric cylinders. The crystal is cut so that one of its optic axes is
perpendicular to the polished parallel surfaces of the slab.

[Diagram labels: incident beam; focusing lens; slab of birefringent crystal; opaque
mask with pinhole; collecting lens; observation plane.]
Figure 29.2 External conical refraction. A coherent monochromatic beam of
light (wavelength λ0) is focused by a lens at a biaxial birefringent crystal slab,
which is cut with its polished surfaces perpendicular to one of its optic-ray
axes. The exit facet of the crystal is painted black, except for a small aperture
in the middle that is left clear to allow rays that propagate near the optic-ray
axis to exit the crystal. The exiting rays propagate to a second lens where
they are collected and recollimated. In our simulations the incident beam is
uniform over the entrance pupil of the focusing lens, both lenses have
NA = 0.075 and f = 46667λ0, the crystal slab has thickness = 5000λ0 and
principal refractive indices nx = 1.533, ny = 1.500, nz = 1.565, and the pinhole
diameter d = 100λ0.

a biaxial crystal slab, becomes collimated along the optic-ray axis of the crystal
and continues to propagate along that axis for as long as the beam remains within
the crystal slab (see Figure 29.2). When the beam reaches the opposite facet of
the slab, it emerges as an expanding cone of light. The focused cone thus
“remains in focus” in its entire path through the crystal and diverges only after
exiting the slab.
There are certain subtle differences between internal and external conical
refraction; for instance, the optic axis of wave normals along which the beam pro-
pagates in the former case is not the same as the optic-ray axis in the latter. This and
other differences will become clear in the course of the following discussions.
Biaxial birefringent crystals and their optic axes

In general, a birefringent crystal has three different refractive indices along
the directions of its three principal axes. Assuming that the principal axes are
the X-, Y-, and Z-axes of a Cartesian coordinate system, the principal indices
may be denoted nx, ny, nz. The index ellipsoid of this crystal has semi-axis lengths
nx, ny, nz along the coordinate axes, as shown in Figure 29.3. For a plane wave
propagating along a given wave-vector k, the plane passing through the center of
the ellipsoid and perpendicular to k will, in general, have an elliptical cross-
section with the index ellipsoid. The semi-axes of this cross-sectional ellipse

[Diagram labels: Z; optic axis 1; optic axis 2.]
Figure 29.3 The index ellipsoid has semi-axes of length nx, ny, nz along the
principal axes X, Y, Z of the crystal. For a plane wave propagating in a given
direction, a plane through the center of the ellipsoid and perpendicular to the
wave normal will have an elliptical cross-section with the index ellipsoid. A
propagation direction for which the cross-sectional ellipse becomes a circle is
known as an optic axis. Similarly, the ray ellipsoid has semi-axes of length 1/nx,
1/ny, 1/nz along the principal axes. For a given ray direction, a plane through the
center of the ray ellipsoid and perpendicular to the ray will have an elliptical
cross-section with the ray ellipsoid. A propagation direction for which the
cross-sectional ellipse becomes a circle is known as an optic-ray axis. In general,
biaxial crystals have two optic axes and two optic-ray axes.

yield the refractive indices associated with the two orthogonal polarizations of the
beam. If the wave-vector k happens to be in such a direction that its corres-
ponding cross-sectional ellipse becomes a circle, then the beam will “see” a
single refractive index, irrespective of its state of polarization. The propagation
direction corresponding to this circular cross-section is known as the optic axis.
Crystals in which the three principal indices of refraction nx, ny, nz are all dif-
ferent exhibit two such optic axes and are, therefore, referred to as biaxial. A
crystal in which one index of refraction differs from the other two exhibits one
optic axis and is known as a uniaxial birefringent crystal. Conical refraction
occurs only in biaxial birefringent crystals.
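The cross-section construction described above can be sketched numerically. The following is a minimal illustration, not the method used for the simulations in this chapter: it restricts the quadratic form of the index ellipsoid to the central plane perpendicular to k and obtains the two indices from the eigenvalues of the resulting 2×2 form. The function names and the sample indices (those used in the examples of this chapter) are chosen here for illustration only.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(sum(c * c for c in a))
    return tuple(c / length for c in a)

def section_indices(k, nx, ny, nz):
    """Two refractive indices for a plane wave with wave-vector along k.

    They are the semi-axes of the ellipse cut from the index ellipsoid
    x^2/nx^2 + y^2/ny^2 + z^2/nz^2 = 1 by the central plane normal to k.
    """
    k = normalize(k)
    # Orthonormal basis (u, v) of the plane perpendicular to k.
    helper = (1.0, 0.0, 0.0) if abs(k[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(k, helper))
    v = cross(k, u)
    weights = (1 / nx**2, 1 / ny**2, 1 / nz**2)

    def quad(p, q):
        return sum(w * pi * qi for w, pi, qi in zip(weights, p, q))

    # Symmetric 2x2 form [[a, b], [b, c]]; its eigenvalues equal 1/n^2.
    a, b, c = quad(u, u), quad(u, v), quad(v, v)
    mean = 0.5 * (a + c)
    half_gap = math.hypot(0.5 * (a - c), b)
    return 1 / math.sqrt(mean + half_gap), 1 / math.sqrt(mean - half_gap)

nx, ny, nz = 1.533, 1.500, 1.565
print(section_indices((0, 0, 1), nx, ny, nz))  # propagation along Z: ny and nx
print(section_indices((1, 0, 0), nx, ny, nz))  # propagation along X: ny and nz
```

When the two returned values coincide, the cross-section is a circle and k lies along an optic axis.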
Birefringent crystals also have a ray ellipsoid with semi-axis lengths 1/nx, 1/ny,
1/nz along the principal axes. The ray ellipsoid, therefore, is different from the
index ellipsoid, whose semi-axis lengths are the refractive indices themselves.
While the index ellipsoid is relevant to the discussion of internal conical
refraction, it is the ray ellipsoid that plays the central role in the case of external
conical refraction. For a ray propagating along a given direction, the plane
passing through the center of the ray ellipsoid and perpendicular to the ray will, in
general, have an elliptical cross-section with the ellipsoid. If a ray happens to be
in such a direction that its corresponding cross-sectional ellipse becomes a circle,
then the direction of that ray defines an optic-ray axis. In general, the optic-ray
axis is different from the optic axis, which is obtained in a similar fashion from
the index ellipsoid. Biaxial birefringent crystals thus possess two optic-ray axes
in addition to their two optic axes. Assuming $n_y < n_x < n_z$, it is not difficult
to show that the optic-ray axis is in the YZ-plane and makes an angle $\theta$ with the
Z-axis, where
$$\tan\theta = \sqrt{(n_x^2 - n_y^2)/(n_z^2 - n_x^2)}.$$
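As a quick numerical check, the sketch below evaluates the optic-ray axis angle from the formula above for the indices used in the examples of this chapter. For comparison it also evaluates the optic axis angle; that second expression is not given in the text and is assumed here from the standard circular-section condition on the index ellipsoid, which multiplies the tangent by $n_z/n_y$. The function names are illustrative.

```python
import math

def ray_axis_angle_deg(nx, ny, nz):
    """Angle (degrees) between the optic-ray axis and Z, for ny < nx < nz."""
    return math.degrees(math.atan(math.sqrt((nx**2 - ny**2) / (nz**2 - nx**2))))

def optic_axis_angle_deg(nx, ny, nz):
    """Angle (degrees) between the optic axis and Z.

    Assumed standard result (not stated in the text): the tangent of the
    ray-axis angle multiplied by nz/ny.
    """
    tangent = (nz / ny) * math.sqrt((nx**2 - ny**2) / (nz**2 - nx**2))
    return math.degrees(math.atan(tangent))

nx, ny, nz = 1.533, 1.500, 1.565
theta_ray = ray_axis_angle_deg(nx, ny, nz)
theta_optic = optic_axis_angle_deg(nx, ny, nz)
print(f"optic-ray axis: {theta_ray:.2f} deg from Z")
print(f"optic axis:     {theta_optic:.2f} deg from Z")
```

The two angles differ by a little more than one degree for these indices, which illustrates that the optic-ray axes and the optic axes are distinct directions.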
Internal conical refraction

To give a specific example, let us consider a slab of crystal having three principal
refractive indices $n_x = 1.533$, $n_y = 1.500$, $n_z = 1.565$, and thickness $= 25000\lambda_0$,
where $\lambda_0$ is the vacuum wavelength of the incident beam. (For the red HeNe
wavelength of 633 nm, for example, the assumed thickness of the slab would be
about 1.6 cm.) It is not difficult to show that the optic axes of this crystal are in the
YZ-plane, located symmetrically with respect to the Z-axis at angles of 46.35°
from it. We assume the slab is cut with one of its optic axes perpendicular to its
polished surfaces.
Next, we assume that a collimated beam of coherent monochromatic light is
normally incident on this slab; the beam has a Gaussian profile and its 1/e
diameter is $150\lambda_0$. The intensity distribution for this beam is shown as a small
bright spot in Figure 29.4(a). (The coordinate system is now redefined in such a
way that the incident beam propagates along the Z-axis, and the polished surfaces
of the crystal are parallel to the XY-plane.) The incident beam, upon entering the
crystal, breaks up into a multitude of rays that propagate as a cone of light
through the crystal, and emerge from the opposite facet of the slab in the form of
two concentric hollow cylinders; the plot in Figure 29.4(b) shows the computed
intensity distribution immediately after the beam exits the crystal. The scale of
Figure 29.4(b) is the same as that of Figure 29.4(a), so one can compare the size
and position of the bright rings with those of the incident beam. Note, in par-
ticular, that the incident beam is not at the center of the emerging cylinder, but at
its bottom; see Figure 29.1. (Had the crystal been cut in such a way that its other
optic axis was perpendicular to the facets, the incident beam would have been at
the top of the cylinder.)
To obtain the full rings seen in the present example we have assumed the state
of polarization of the incident beam to be circular; states of both right and left
circular polarization (RCP and LCP) yield the same results. Alternatively, the
incident beam may be assumed to be unpolarized for the full rings to emerge.
As we shall see later, with linearly polarized light a certain part of the rings will
be missing.

Figure 29.4 (a) The incident intensity distribution at the front facet of the
crystal slab in Figure 29.1. (b) Emergent intensity distribution at the rear facet of
the slab, corresponding to the circularly polarized incident beam shown in (a).
(c) Distribution of the angle of the emergent polarization vector to the X-axis.
The gray-scale is such that a white pixel represents a +90° angle while a black
pixel corresponds to a −90° angle. The emergent polarization is linear at any
given point on the beam’s cross-section, but its direction varies from point to
point. At the top of the rings there is an apparent 180° discontinuity in the
direction of polarization. The jaggedness of the discontinuity is caused by small
numerical errors that are inevitable when computing the state of polarization in
the dark regions around the rings.
[Panels (a)-(c); horizontal axis $x/\lambda_0$ from −700 to 700, vertical axis $y/\lambda_0$ from −150 to 1250.]

Polarization and phase patterns of the refracted beam

The polarization and phase distributions emerging in internal conical refraction
are quite interesting. At any given point on the beam’s cross-section, the state of
polarization is linear, but the direction of the E-field varies as one scans the cross-
section of the beam. The gray-scale plot in Figure 29.4(c) shows the distribution
of the angle between the polarization vector and the X-axis at the exit facet of
the crystal. In this picture a black pixel represents a −90° angle, a white pixel
corresponds to +90°, and the gray pixels represent angles in between. At the
bottom of the emergent rings of light the polarization angle is 0°, that is, the
E-field is parallel to the X-axis. The angle increases continuously from 0° to +90°
as one moves from the bottom to the top on the right side of the rings. Similarly,
on the left side, the orientation angle of the E-field varies continuously from 0° at
the bottom to −90° at the top. Thus the polarization vector rotates by 180° as the
point of observation moves a full circle around the beam’s cross-section. This
seems to imply a discontinuity in the E-field distribution at the top of Figure 29.4(c).
In reality, however, this discontinuity does not occur, because the phase of
the E-field (not shown here) also undergoes a 180° change in a full circle around
the rings. Thus the polarization vector rotates by 180° and, at the same time, its
phase changes by 180° in a round trip of the circumference of the rings, so that
the E-field distribution is continuous at all locations.
We emphasize once again that the incident beam in the case shown in
Figure 29.4 is circularly polarized. Whether this beam is RCP or LCP, however,
is immaterial because the chirality of the incident beam affects neither the
intensity distribution nor the polarization state of the emergent beam. In other
words, if an observer facing the beam scans the rings in a clockwise sense, the
polarization vector also appears to rotate clockwise, whether the incident beam is
RCP or LCP. The only way to determine the state of incident polarization is by
examining the phase distribution of the emergent beam, which increases clock-
wise in one case and counterclockwise in the other. It is also interesting to note
that unpolarized light (i.e., light containing equal amounts of RCP and LCP that
have randomly varying amplitudes and phases) gives exactly the same distribu-
tions as in Figure 29.4. In the case of unpolarized light, however, the phase
distribution is meaningless, because it varies randomly with time and with
location on the beam’s cross-section.
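The interplay between polarization rotation and phase described in this section can be mimicked with a toy model. It is assumed here for illustration only and is not the calculation behind Figure 29.4: at azimuth α around the ring (measured from the bottom), the local linear polarization is taken to make an angle α/2 with X, and the emergent field is taken to be the projection of the incident Jones vector onto that direction. The function name `ring_field` is hypothetical, and which of the two circular states corresponds to RCP depends on the sign convention, so they are simply labeled "+" and "-".

```python
import cmath
import math

def ring_field(alpha, jones):
    """Toy-model field (Ex, Ey, amplitude) at azimuth alpha on the ring.

    alpha is measured from the bottom of the ring, in radians (-pi..pi).
    The local linear polarization is assumed to lie at alpha/2 from X,
    and the incident Jones vector is projected onto that direction.
    """
    ex, ey = math.cos(alpha / 2), math.sin(alpha / 2)
    amplitude = ex * jones[0] + ey * jones[1]
    return amplitude * ex, amplitude * ey, amplitude

s = 1 / math.sqrt(2)
circular = {"+": (s, 1j * s), "-": (s, -1j * s)}  # opposite handedness

for label, jones in circular.items():
    for deg in (-180, -90, 0, 90, 180):
        e_x, e_y, amp = ring_field(math.radians(deg), jones)
        intensity = abs(e_x) ** 2 + abs(e_y) ** 2
        phase = math.degrees(cmath.phase(amp))
        print(f"{label}  alpha={deg:5d}  I={intensity:.2f}  "
              f"pol. angle={deg / 2:6.1f}  phase={phase:6.1f}")
```

In this model the intensity is uniform around the ring for either handedness, the phase advances in opposite senses for the two circular states, and the field vector at α = −180° coincides with that at α = +180°, so the apparent discontinuity in polarization direction is compensated by the phase.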

Figure 29.5 Interferograms showing the intensity distribution resulting from
the superposition of the emergent beam (at the rear facet of the crystal slab) with
a uniform reference beam. The beam incident on the crystal is RCP in both
cases, but the reference beam is RCP in (a) and LCP in (b).
[Panels (a) and (b); horizontal axis $x/\lambda_0$ from −700 to 700, vertical axis $y/\lambda_0$ from −150 to 1250.]

To gain an appreciation for the phase distribution over the beam’s cross-
section, we show in Figure 29.5 two computed interferograms corresponding to
the superposition of the beam emerging from the exit facet of the crystal and a
uniform reference beam. The beam entering the crystal is assumed to be RCP in all
cases, but the reference beam is RCP in Figure 29.5(a) and LCP in Figure 29.5(b).
We notice that in Figure 29.5(a) the outer ring has interfered constructively with
the reference beam, whereas the inner ring shows destructive interference. As a
general rule, there is a 180° phase shift between the inner and outer rings at
radially adjacent locations, irrespective of the state of incident polarization. This
phase difference aside, the two rings are identical in their polarization and phase
distributions. The interferogram of Figure 29.5(b) is more complicated than that
of Figure 29.5(a); nevertheless, it can be fully explained in terms of the states of
polarization and the distribution of phase over the rings, which we have already
described.
Effect of linear incident polarization

Figure 29.6 shows the computed intensity distributions at the exit facet of the
crystal for three cases of linear incident polarization: (a) parallel to the X-axis;
(b) at 45° to X; and (c) parallel to Y. In all three cases the emergent state of
polarization (not shown) is similar to that in Figure 29.4(c). A segment from the
top of the rings is missing in Figure 29.6(a); this is the region that would have
had polarization parallel to Y had the incident beam contained the corresponding
E-field component. Similarly, the bottom of the rings is missing in
Figure 29.6(c); this region would have had polarization along the X-axis. Unlike
the distribution of polarization over the rings, which is independent of the state
of incident polarization, the phase of the rings is very much a function of the
polarization of the incident beam. When the incident beam is linearly polarized, as
in Figure 29.6, the emergent phase (not shown) will have a constant value over
the entire area of each ring. (As before, the two rings will have a 180° phase
difference.) One may verify the above statements by considering the various
linearly polarized incident beams as superpositions of RCP and LCP beams and by
analyzing the corresponding superpositions at the exit facet of the crystal.

Figure 29.6 When the incident beam is linearly polarized, the emergent rings
of light will be incomplete. This figure shows the intensity distribution at the
rear facet of the slab in the cases where the incident E-field is (a) parallel to the
X-axis, (b) at 45° to X, and (c) parallel to the Y-axis.
[Panels (a)-(c); horizontal axis $x/\lambda_0$ from −700 to 700, vertical axis $y/\lambda_0$ from −150 to 1250.]

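The suggested verification by superposition can be sketched in a few lines. The model below is an assumption made for illustration, not the computation used for Figure 29.6: for circular input of either handedness the amplitude on the ring at azimuth α (measured from the bottom) is taken to be exp(±iα/2)/√2, that is, uniform intensity with a phase that advances by 180° per circuit in one sense or the other. A linear input at angle ψ to X is then decomposed into the two circular states. The function names are hypothetical.

```python
import cmath
import math

def ring_amplitude(alpha, handedness):
    """Assumed ring amplitude for circular input; handedness is +1 or -1."""
    return cmath.exp(1j * handedness * alpha / 2) / math.sqrt(2)

def linear_amplitude(alpha, psi):
    """Ring amplitude for linear input at angle psi to X (radians).

    Uses (cos psi, sin psi) = c_plus * (1, i)/sqrt(2) + c_minus * (1, -i)/sqrt(2)
    with c_plus = exp(-i psi)/sqrt(2) and c_minus = exp(+i psi)/sqrt(2).
    """
    c_plus = cmath.exp(-1j * psi) / math.sqrt(2)
    c_minus = cmath.exp(+1j * psi) / math.sqrt(2)
    return c_plus * ring_amplitude(alpha, +1) + c_minus * ring_amplitude(alpha, -1)

for psi_deg in (0, 45, 90):
    row = []
    for alpha_deg in (-90, 0, 90, 180):
        amp = linear_amplitude(math.radians(alpha_deg), math.radians(psi_deg))
        row.append(f"{abs(amp) ** 2:.2f}")
    print(f"incident at {psi_deg:2d} deg to X: I(alpha = -90, 0, 90, 180) =", row)
```

Within this model the superposed amplitude reduces to cos(α/2 − ψ), which is real: the intensity vanishes at the top of the ring for X-polarized input and at the bottom for Y-polarized input, and the phase is constant over the illuminated part of the ring.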
External conical refraction

Consider the system of Figure 29.2, which consists of a focusing lens, a slab
of biaxial birefringent crystal, a pinhole, and a collimating lens. The incident
beam is uniform, coherent, and monochromatic with a vacuum wavelength
of $\lambda_0$. The crystal slab has refractive indices $n_x = 1.533$, $n_y = 1.500$, $n_z = 1.565$,
and its thickness is $5000\lambda_0$. (For the red HeNe wavelength of 633 nm, for
example, this slab would be 3.165 mm thick.) The optic-ray axes of this crystal
are located symmetrically in the YZ-plane at angles $\theta = 45.14^\circ$ from Z. The
slab is cut with its polished flat surfaces perpendicular to one of its optic-ray
axes. (The coordinate system is now redefined to be such that the incident
beam propagates along the Z-axis and the polished surfaces of the crystal are
parallel to the XY-plane.)
When the incident rays enter the crystal slab they will propagate, in general, in
various directions, but the rays that happen to be on a special cone, namely, the
cone of external conical refraction, propagate strictly along the optic-ray axis and
will emerge from a point opposite the point of entry into the crystal. A small
pinhole (of diameter $100\lambda_0$ in the present example) on the exit facet of the slab
allows only these axial rays to emerge. The emergent rays diverge in a hollow
cone as they propagate towards a collecting lens, where they are recollimated and
directed towards the observation plane.
Figure 29.7 shows computed plots of the intensity distribution, polarization
ellipticity, and polarization rotation angle at the observation plane, corres-
ponding to a circularly polarized incident beam. Note that the emergent rings of
light in Figure 29.7(a) are in the bottom half of the exit pupil. (Had the crystal
been cut with its other optic-ray axis perpendicular to the polished surfaces, the
rings would have appeared in the top half of the exit pupil instead.) The
ellipticity plot in Figure 29.7(b) is coded in gray-scale, black corresponding
to −45° (i.e., LCP) and white to +45° (i.e., RCP). The relevant part of the
plot, which is the region in the bottom half of the pupil where the emergent
beam’s intensity is non-vanishing, shows zero ellipticity. The emergent rings of
light, therefore, are linearly polarized. The direction of this linear polarization
varies over the rings, however, as the plot of polarization rotation angle in
Figure 29.7(c) indicates. (The gray-scale used here assigns black to −90° and
white to +90°.)

Figure 29.7 The distributions of (a) intensity, (b) polarization ellipticity, and
(c) polarization rotation angle at the observation plane. The incident beam at the
entrance pupil of the focusing lens is assumed to be circularly polarized. The
ellipticity plot in (b) is coded in gray-scale, black corresponding to −45° (i.e.,
LCP) and white to +45° (i.e., RCP). The distribution of polarization rotation
angle depicted in (c) is also coded in gray-scale, but the black pixels in this case
represent −90° rotation from the X-axis and the white pixels represent +90°
rotation. As before, the jaggedness of the transition from black to white in the
lower part of (c) is caused by small numerical errors; since the discontinuity
represented by this transition is not a physical discontinuity, this jaggedness has
no physical significance.
[Panels (a)-(c); axes $x/\lambda_0$ and $y/\lambda_0$, each from −3750 to 3750.]

According to Figure 29.7(c), over the circumference of the rings the polarization
vector rotates from −90° at the bottom (i.e., E-field antiparallel to Y-axis) to 0° at
the top (E-field parallel to X) and back to +90° at the bottom (E-field parallel to Y).
The apparent discontinuity of polarization direction at the bottom of the rings
does not signify a physical discontinuity, as before, because the phase of the
rings (not shown here) also exhibits a 180° change during one full cycle around
the rings. The overall E-field distribution turns out to be continuous after all.
Character of the emergent beam at the pinhole and the effect of incident polarization

The beam emerging from the pinhole in the system of Figure 29.2 possesses
certain interesting features. Figure 29.8 shows computed plots of (a) intensity
distribution, (b) polarization ellipticity, and (c) polarization rotation angle at the
pinhole, for a circularly polarized incident beam. The intensity plot in Figure 29.8
(a) shows a bright spot at the center of the pinhole, surrounded by a diffuse, more
or less uniform background distribution. The origin of the diffuse light may be
traced back to those incident rays that were outside the cone of external refraction
and, therefore, once inside the crystal, did not become aligned with the optic-ray
axis. The plot of polarization ellipticity in Figure 29.8(b) shows that the state of
polarization varies from RCP in the bright white rings to LCP in the dark rings,
covering the full gamut of elliptical polarization in the intervening regions. The
plot of polarization rotation angle in Figure 29.8(c) indicates that the orientation
of the ellipse of polarization is not uniform over the aperture but rotates through
180° around certain circular bands. All in all, this is a complex and fascinating
state of affairs compared to the dull, uniform polarization state of the focused
spot that first entered the crystal.
As was the case with internal conical refraction, the full cone of external
refraction appears only when the incident beam contains all possible polarization
directions. This is the case with RCP or LCP as well as with unpolarized light.
When the incident beam happens to have linear polarization, however, certain
parts of the emergent cone of light will be missing. This is shown in Figure 29.9
for an incident beam that is linearly polarized along the X-axis. The distributions
of intensity, polarization ellipticity, and polarization rotation angle at the
observation plane shown in Figure 29.9 are analogous to those in Figure 29.7,
where the incident beam is circularly polarized. The lower part of the rings in
Figure 29.9(a), however, is missing simply because the corresponding polariza-
tion, linear along Y, is not present in the incident beam. Aside from this missing
segment, other features of the emergent beam shown in Figure 29.9 are quite
similar to those in Figure 29.7.

Figure 29.8 Distributions of (a) intensity, (b) polarization ellipticity, and (c)
polarization rotation angle within the pinhole at the exit facet of the crystal slab.
The incident beam at the entrance pupil of the focusing lens is assumed to be
circularly polarized. The ellipticity plot in (b) is coded in gray-scale, black
corresponding to −45° (i.e., LCP) and white to +45° (i.e., RCP). The distribution
of polarization rotation angle depicted in (c) is also coded in gray-scale,
but the black pixels in this case represent −90° rotation from the X-axis and the
white pixels represent +90° rotation.
[Panels (a)-(c); axes $x/\lambda_0$ and $y/\lambda_0$, each from −60 to 60.]

Figure 29.9 Same as Figure 29.7 except for the state of polarization of the
incident beam, which is linear along X in the present case.
[Panels (a)-(c); axes $x/\lambda_0$ and $y/\lambda_0$, each from −3750 to 3750.]

1 W. R. Hamilton, Trans. Roy. Irish Acad. 17, 1 (1833).
2 H. Lloyd, Trans. Roy. Irish Acad. 17, 145 (1833).
3 M. Born and E. Wolf, Principles of Optics, 6th edition, chapter 14, Pergamon Press,
Oxford, 1980.
30
Transmission of light through small
elliptical apertures†
The apertures of classical optics simply block those parts of an incident wavefront
that fall outside the aperture, allowing everything else to go through intact.
Moreover, multiple apertures act upon an incident beam independently of each
other, polarization effects are usually negligible (i.e., scalar diffraction), and it is
not necessary to keep track of both the electric- and the magnetic-field components
of the beam.1
All of the above assumptions break down when apertures shrink to dimensions
comparable to or smaller than a wavelength.2,3 For example, transmission
through two small adjacent apertures cannot be treated by assuming that only one
aperture is open at a time, then adding the fields transmitted by the individual
apertures. (This is because the electric charge and current distributions in the
vicinity of one aperture are influenced by the radiation pattern of the other
aperture.) Polarization effects are extremely important for small apertures, as
exemplified by the case of a normally incident beam going through an elliptical
aperture in a thin metal film; whereas in the case of polarization (i.e., E-field)
parallel to the long axis of the ellipse there is negligible transmission, when the
incident polarization is rotated 90° to point along the ellipse’s minor axis, the
aperture transmits a substantial fraction of the incident light. Finally, to analyze
the interaction of light with small apertures, it is generally necessary to keep track
of both E and B components of the electromagnetic wave, as the modification of
one of these fields produces non-trivial changes in the other field’s distribution.4
This chapter presents the results of computer simulations based on the Finite
Difference Time Domain (FDTD)5 method for an elliptical aperture in a thin
metal film illuminated by a normally incident, monochromatic plane wave. Both
cases of incident polarization parallel and perpendicular to the long axis of the
ellipse will be considered. We begin by developing an intuitive description of the
behavior of the electromagnetic fields in each case, then present simulation
results that exhibit patterns similar to those expected from this qualitative
analysis. The simulations reveal, in quantitative detail, the amplitude and phase
behavior of the E- and B-fields in and around the aperture.

† The co-authors of this chapter are Armis R. Zakharian, now at Corning Corp., and Jerome V. Moloney of the
University of Arizona.

Maxwell’s equations
In developing an intuitive understanding of the electromagnetic field distribu-
tion around an aperture, we rely heavily on Maxwell’s divergence equations,
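$\nabla \cdot \mathbf{D} = \rho$ and $\nabla \cdot \mathbf{B} = 0$, where $\mathbf{D} = \varepsilon_0 \varepsilon \mathbf{E}$, $\mathbf{B} = \mu_0 \mathbf{H}$, and $\rho$ is the electric charge
density.1,4 ($\varepsilon_0$ and $\mu_0$ are the permittivity and permeability of free-space, while $\varepsilon$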
is the relative permittivity of the local environment.) The divergence-free
nature of the magnetic field simply means that the B-field lines cannot be
interrupted; they can go around in loops or they can form unbroken infinite
lines, but they cannot originate, nor can they terminate, at specific points in
space. A similar argument applies to D-field lines, except in locations where
electric charges exist. When charges are present, lines of D originate on positive
charges and terminate on negative charges; everywhere else the D-lines can
twist and turn in space, but they cannot start or stop.
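The other two of Maxwell’s equations, $\nabla \times \mathbf{H} = \mathbf{J} + \partial \mathbf{D}/\partial t$ and $\nabla \times \mathbf{E} = -\partial \mathbf{B}/\partial t$,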
are necessary not only for generating the E and B fields from electrical currents
(J is the local current density), but also to sustain these fields in source-free
regions of space.4 When highly conducting media (e.g., metallic bodies) are
present in a system, surface currents Is develop that support the magnetic field
H immediately outside the conducting surfaces. Aside from these electrical
currents that act as sources of the H-field, time variations of the E-field are
needed at each point of space to maintain the local B-field. In a similar vein,
aside from electric charges that act as sources and sinks for the D-field, time
variations of B are necessary to maintain the local E-field. The lines of the
current density J remain divergence-free, except in those locations where they
deposit electrical charges, that is, $\nabla \cdot \mathbf{J} = -\partial \rho/\partial t$.4,6
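The charge-continuity relation quoted above follows in one step from equations already listed. As a brief sketch, take the divergence of the Ampère-Maxwell law and use $\nabla \cdot \mathbf{D} = \rho$, together with the vector identity that the divergence of a curl vanishes:

```latex
\begin{aligned}
\nabla \cdot (\nabla \times \mathbf{H})
  &= \nabla \cdot \mathbf{J} + \frac{\partial}{\partial t}\,(\nabla \cdot \mathbf{D}), \\
0 &= \nabla \cdot \mathbf{J} + \frac{\partial \rho}{\partial t}.
\end{aligned}
```

Wherever the charge density is constant in time, the lines of J are therefore divergence-free, as stated.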
Inside an electrical conductor $\mathbf{J} = \sigma \mathbf{E}$, where $\sigma$ is the conductivity of the
material. Good conductors (e.g. metals) have large conductivities, which means
that the E-field must all but vanish from the interior of such bodies. When the fields
are oscillatory, any magnetic fields inside a good conductor will produce, by virtue
of the Faraday law, $\nabla \times \mathbf{E} = -\partial \mathbf{B}/\partial t$, a local electric field. Since E-fields are not
allowed inside a conductor, time-varying magnetic fields, being intimately asso-
ciated with the electric fields, must also be absent. The interior of good conductors
thus remains free of charges, currents, and time-varying electromagnetic fields.
Charges and currents, however, can and do develop on the conductor’s surface,
where they give rise to E- and B-fields in the vicinity of the surface outside the
conductor.
The fifth equation of classical electrodynamics, the Lorentz law of force,
$\mathbf{F} = q(\mathbf{E} + \mathbf{V} \times \mathbf{B})$, expresses the force F experienced by a particle of charge q
and velocity V.4 This equation is occasionally useful in developing a qualitative
picture of the current distribution in the vicinity of small apertures. For
example, within the skin depth of a conductor, the directions of E and B would
indicate the sense in which local surface currents are affected by the Lorentz
force acting on the charge carriers. Typically, the E-field is the dominant factor
in this regard, as evidenced by the constitutive relation $\mathbf{J} = \sigma \mathbf{E}$. Any transverse
deflections of the current by the B-field are generally neglected, unless the Hall
conductivity of the medium is explicitly included in the constitutive relations.
Radiation by an oscillating dipole

With reference to Figure 30.1(a), a static electric dipole p creates, in its sur-
rounding environment, electric-field lines that emerge from the positive pole and
disappear into the negative pole. A periodically oscillating electric dipole
emanates E-field lines that reverse direction at half-period intervals. The constant
speed of light in all directions in space then dictates that these E-field reversals
occur on spherical shells separated by a half-wavelength ($\lambda/2$) from their adjacent
shells. The zero-divergence requirement imposed on the E-lines by the first
Maxwell equation thus requires the existence of the closed lines of field depicted
in Figure 30.1(b). The curl of the E-field gives rise to B-field lines that encircle
the dipole in closed loops, sustaining the E-field oscillations while simultan-
eously being generated by them. In the space between adjacent spherical shells
separated by $\lambda/2$, the E-lines are not parallel to these shell surfaces, but bend
inward or outward as shown to maintain the divergence-free condition of the
E-field.6
A static magnetic dipole m, shown in Figure 30.1(c), is a closed loop of
electrical current whose B-field pattern is similar to the E-field of an electric
dipole. Figure 30.1(d) shows an oscillating magnetic dipole, which behaves in
much the same way as an electric dipole does, albeit with a role reversal for E
and B.6 These examples indicate that by direct appeal to Maxwell’s equations,
especially the divergence laws, it is possible to obtain an intuitive picture of
the electromagnetic field distribution. In the discussions that follow, we will
use the dipole radiation patterns sketched in Figures 30.1(b) and 30.1(d) to
elucidate the nature of transmission through subwavelength apertures in a thin
metal film.

Figure 30.1 (a) E-field lines of a static electric dipole p emerge from the
positive pole and disappear into the negative pole. (b) An oscillating electric
dipole emanates E-lines that reverse direction on spherical shells separated by $\lambda/2$.
The curl of the E-field creates B-field lines that surround the dipole in closed
circular loops. (c) A static magnetic dipole m is a closed loop of electrical current
whose B-field pattern is similar to the E-field of an electric dipole. (d) An
oscillating magnetic dipole behaves similarly to an electric dipole, albeit with the
roles of E and B reversed.

Plane wave reflection from a (highly conducting) flat mirror

Figure 30.2 shows the case of a normally incident plane wave on a perfect
conductor (yellow slab at the bottom). The incident beam induces a surface
current Is in the conductor, which creates equal-amplitude plane waves propagating
in the ±Z-directions.4,6 In the half-space below the conductor, the induced
and incident plane-waves cancel each other out. In the half-space above the
conductor, interference between the incident and reflected beams creates standing-wave
fringes of the electric-field E and the magnetic field B. The B-field is
strongest at the surface of the conductor, reversing sign at intervals of $\Delta z = \lambda/2$,
where its adjacent peaks are located. The peaks of the E-field, also located at $\lambda/2$
intervals, are staggered relative to the B-field peaks, thus coinciding with planes of
vanishing magnetic field.

Figure 30.2 Normally incident plane-wave on a perfect conductor (yellow
slab) induces a surface current Is, which radiates two equal-amplitude plane
waves in the ±Z-directions. In the lower half-space the induced beam cancels out
the incident beam. In the upper half-space, the incident and reflected beams
interfere, creating standing-wave fringes of both E- and B-fields.

At the upper surface of the conductor, where the E-field is zero, the B-field
is sustained by the surface current Is. (Although Is is shown antiparallel to
the standing-wave’s E-field at $\Delta z = \lambda/4$, in reality Is is 90° behind this E-field,
reaching maximum when the E-field directly above the surface is going through
zero on its way to the peak.) In the half-space above the conductor, in the absence
of any electrical charges and currents, the E-field is sustained by the time-variations
of the B-field ($\nabla \times \mathbf{E} = -\partial \mathbf{B}/\partial t$), and vice versa ($\nabla \times \mathbf{H} = \partial \mathbf{D}/\partial t$).
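The standing-wave pattern above a perfect mirror can be checked with a short sketch that superposes the incident and reflected plane waves. The conventions are assumptions made here for illustration: the mirror surface is at z = 0, the time factor is exp(−iωt), E points along X and B along Y, and the reflected E-field has a reflection coefficient of −1 so that the tangential E-field vanishes at the surface. The function name `standing_wave` is hypothetical.

```python
import cmath
import math

def standing_wave(z_over_lambda, e0=1.0):
    """Complex amplitudes (E_x, c*B_y) at height z above a perfect mirror.

    The incident wave travels toward -z and the reflected wave toward +z.
    """
    phase = 2 * math.pi * z_over_lambda
    e_inc = e0 * cmath.exp(-1j * phase)
    e_ref = -e0 * cmath.exp(+1j * phase)   # E reflection coefficient of -1
    # For a plane wave c*B = s x E, with s the unit propagation vector.
    cb_inc = -e_inc                        # s = -z:  (-z) x x = -y
    cb_ref = +e_ref                        # s = +z:  (+z) x x = +y
    return e_inc + e_ref, cb_inc + cb_ref

for z in (0.0, 0.25, 0.5, 0.75):
    e, cb = standing_wave(z)
    print(f"z = {z:4.2f} lambda: |E| = {abs(e):.3f}, |cB| = {abs(cb):.3f}, "
          f"Re(cB) = {cb.real:+.3f}")
```

The total E-field is proportional to −2i sin(2πz/λ) and the total B-field to −2 cos(2πz/λ): the B-field peaks at the surface and reverses sign every λ/2, the E-field peaks at λ/4 and 3λ/4 where B vanishes, and the factor of i between the two expresses their 90° offset in time.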
In an imperfect conductor, where conductivity is large but finite, the E- and
B-fields penetrate slightly beneath the surface, producing a Lorentz force on the
moving charges that comprise the surface current. While the E-field provides the
current’s driving force, the magnetic component of the Lorentz force attempts to
drive the surface current further down into the conductor (radiation pressure). In
general, the surface current Is need not be in-phase with the penetrating E-field,
since, at optical frequencies, the electrical conductivity $\sigma$ is a complex number.
Elliptical aperture illuminated with plane-wave polarized along the long axis

The presence of a small (subwavelength-sized) elliptical aperture in the system of
Figure 30.2 distorts the surface current Is in the vicinity of the aperture by diverting
the current’s path to avoid the hole, as shown in Figure 30.3. The B-lines within the
fringe immediately above the mirror surface reorient in such a way as to remain
perpendicular to the lines of Is, thus bending toward the center of the aperture. The
B-lines directly above the aperture, lacking support from an underlying surface
current, drop into the hole on the left side and re-emerge on the right side. (The
B-lines, of course, cannot break up because $\nabla \cdot \mathbf{B} = 0$ everywhere; they can only
bend locally and change direction, but must remain continuous at all times.)
The lines of surface current Is that begin and end on the ellipse’s sharp corners
deposit electric charges around these corners, which charges then act as sources
and sinks for the E-lines in their neighborhood. Elsewhere, lack of any significant
amount of charge means that the E-lines cannot break up, but rather they must
twist and turn continuously as they adjust to the new environment created by the
presence of the hole. The E-field in and around the aperture must be distributed in
a way that would support the B-field (through the curl equations), but, because a
parallel E-field cannot exist on conducting surfaces, it must also stay away from
the interior walls of the hole. Figure 30.3 shows a possible way for the E-lines
just above the aperture to dodge the side-walls and concentrate near the center, as
they drop into the hole from above. The bundle of E-lines in the middle of the
hole (parallel to the ellipse’s long-axis) then acts as a source of circulating
magnetic fields that wrap around the long axis ($\nabla \times \mathbf{H} = \partial \mathbf{D}/\partial t$), thus supporting
the B-field above, below, and inside the aperture.

Figure 30.3 A small elliptical aperture in the system of Figure 30.2, with its
major axis parallel to the surface current Is, distorts the current distribution by
diverting its path to avoid the hole. The B-lines immediately above the surface
bend toward and into the aperture, without breaking up. The E-field in and
around the aperture gets redistributed in a way that supports the B-field while
staying away from the long side-walls of the hole. The surface currents in the
vicinity of the aperture deposit opposite charges around the sharp corners of the
ellipse, causing the E-lines to break up at these corners.

Figure 30.4(a) shows that, in the central XZ cross-section of the aperture, the
B-lines above the aperture, without breaking up, thin down and sag toward and
into the hole. Magnetic energy thus leaves the mid-section of the strong B-fringe
above the hole and leaks into the hole and beyond. The behavior of the E-field in
the central YZ-plane is depicted in Figure 30.4(b). Here the strong fringe, which is
not immediately above the aperture but a distance of $\Delta z = \lambda/4$ away, is squeezed
laterally toward the hole’s center, while, at the same time, leaking some of its
energy into the aperture. Some of the E-lines originate or terminate on the
charges deposited by the surface current Is on the sharp corners of the ellipse.
(The dashed lines in Figure 30.4(b) represent the bending of the E-field out of the
YZ-plane toward charges that reside on the side-walls near these sharp corners.)
Note that the charge polarity is such that the E-lines above have the same dir-
ection as those inside and below the aperture. It is important to recognize that
the surface current Is lags 90° behind the E-field of the first fringe. Thus, when
the E-field directly above the aperture reaches its maximum along the negative
Y-axis, Is, which has been traveling in the positive Y-direction until that moment,
has stopped and is beginning to reverse direction. This explains why the charges
reach their maximum strength when the E-field immediately above the aperture is
at a peak, and also clarifies the reasoning behind the polarity chosen for the
charges in Figure 30.4(b).
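The timing argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes unit amplitudes and a period of 1 in arbitrary time units (the names e_y, i_s, and q are ours), takes the surface current to be a copy of the E-field delayed by a quarter period, and obtains the charge at one end of the aperture as the running time-integral of the current.

```python
import numpy as np

# Assumed, illustrative values: unit amplitudes and a period of 1 (arbitrary units).
period = 1.0
omega = 2.0 * np.pi / period
t = np.linspace(-0.5 * period, 0.5 * period, 2001)  # one period centered on t = 0

# E-field directly above the aperture; most negative (along -Y) at t = 0.
e_y = -np.cos(omega * t)

# Surface current lagging the E-field by 90 degrees, i.e., delayed by a quarter
# period. It points along +Y just before t = 0 and reverses direction afterwards.
i_s = -np.cos(omega * (t - 0.25 * period))

# Charge delivered to the +Y end of the aperture: running integral of the current
# (trapezoidal rule), with the arbitrary integration constant removed.
dt = t[1] - t[0]
q = np.concatenate(([0.0], np.cumsum(0.5 * (i_s[1:] + i_s[:-1]) * dt)))
q -= q.mean()

idx_e_peak = int(np.argmin(e_y))  # E at its maximum along the negative Y-axis
idx_q_peak = int(np.argmax(q))    # accumulated charge at its maximum

print("E peaks along -Y at t =", t[idx_e_peak])
print("charge peaks at t      =", t[idx_q_peak])
print("current at that moment =", i_s[idx_q_peak])
```

Under these assumptions the charge peaks at the instant the current passes through zero, which is also the instant at which the E-field reaches its maximum along the negative Y-axis.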
Aside from the incident beam, which is fixed at the outset, all other radiation in
the system of Figure 30.3 is generated by the surface currents Is (and the charges
deposited by Is around the sharp corners of the aperture). The same is true of the
system of Figure 30.2, with its uniform current confined to the upper surface of
the conductor. Any differences between the radiation fields in the systems of
Figure 30.2 and Figure 30.3 must therefore arise from the difference between
the two surface current distributions. Subtracting the (uniform) surface current
of Figure 30.2 from that of Figure 30.3 yields the distribution sketched in
Figure 30.4 (a) The B-field above the aperture of Figure 30.3, without breaking
up, thins down and sags into the hole. (b) The E-field, whose strong fringe is not
immediately above the aperture but a distance of Δz = λ/4 away, is squeezed
toward the center of the hole, while, at the same time, leaking some of its energy
into the aperture. The E-lines can originate or terminate on the charges deposited
by the surface current Is on the sharp corners of the ellipse. (Dashed lines represent
the bending of some of the E-field out of the YZ-plane toward charges that reside
on the side-walls near these sharp corners.) Note that the charge polarity is such
that the E-lines above have the same direction as those inside and below the
aperture.
Figure 30.5(a). Far from the aperture, of course, the perturbation caused by the
aperture is small and the two surface currents must cancel out. In the vicinity of
the aperture we find two loops of current circulating in opposite directions, as well
as positive and negative charges in those regions where the divergence of the local
current is non-zero. As shown in Figure 30.5(b), these circulating currents are
equivalent to a pair of oppositely oriented magnetic dipoles +m and –m (i.e., a
magnetic quadrupole, assuming their separation is much less than a wavelength);
the charges localized on the aperture’s sharp corners give rise to an oscillating
electric dipole p. Thus, adding the dipoles p and m to the system of Figure 30.2
should transform it over to the system of Figure 30.3.
Figure 30.5(c) shows that, in the vicinity of the aperture, the combined radi-
ation pattern of the electric dipole and the magnetic quadrupole consists of a
Figure 30.5 (a) Surface current distribution obtained when the (uniform)
surface current of Figure 30.2 is subtracted from that of Figure 30.3. Charges
appear in regions where the current’s divergence is non-zero. (b) The net
effect of the aperture on the uniform surface current of Figure 30.2 is the
addition of an electric dipole p and two loops of current that circulate in
opposite directions; each current loop is a magnetic dipole m. (c) In the
vicinity of the aperture, the combined radiation pattern of the electric and
magnetic dipoles consists of a circulating B-field around the major axis of the
ellipse and an E-field pattern that tends to stay away from the long side-walls
of the aperture.
circulating B-field around the major axis of the ellipse, and an E-field distribution
that tends to stay away from the long side-walls of the aperture. These fields,
when added to the E- and B-fringes of Figure 30.2, produce the field profiles of
Figures 30.3 and 30.4. The circulating magnetic field around the ellipse’s major
axis in Figure 30.5(c) is responsible for the bending of the B-lines toward and into
the hole, as sketched in Figures 30.3 and 30.4(a). Similarly, superposition of the
E-field pattern of Figure 30.5(c) with the uniform E-fringe that exists above an
apertureless mirror gives rise to the E-field pattern of Figure 30.3 in the XY-plane
immediately above the aperture.
In practice, the metallic film has a finite thickness, and the combined radiation
by the dipole p and quadrupole m of Figure 30.5(b) must vanish within the
body of the film. To this end, the magnetic dipoles may have to tilt sideways, one
to the right and the other to the left, so that everywhere inside the metal film their
E- and B-fields will be canceled by the corresponding fields of the electric dipole.
Physically, the sideways tilt of the m dipoles is a consequence of the induced
surface currents on the interior side-walls of the aperture, which currents also
help to support the B-field adjacent to these side-walls; see Figure 30.4(a).
All in all, the primary source of radiation through the aperture of Figure 30.3
seems to be the m quadrupole depicted in Figure 30.5(b); the induced dipole p
in this system is relatively weak and plays a secondary role, namely, canceling
the quadrupole’s radiation inside the metal film. In general, quadrupolar sources
are weak radiators, thus accounting for the weakness of transmission through an
elliptical aperture illuminated by a plane wave whose polarization direction
coincides with the major axis of the ellipse.
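The weakness of a closely spaced pair of opposite dipoles can be illustrated with a simple far-field estimate. The sketch below is our own idealization, not the simulation used in this chapter: it treats the two dipoles as equal and opposite point sources a distance d apart, ignores their individual radiation patterns and the presence of the metal film, and evaluates the amplitude of the pair relative to a single source, |1 − exp(i k d cos θ)|, where θ is the angle between the observation direction and the line joining the sources. The separations used are hypothetical.

```python
import math


def antiphase_pair_factor(separation, wavelength, theta_deg):
    """Far-field amplitude of two equal, opposite point sources relative to
    one source alone: |1 - exp(i k d cos(theta))| = 2 |sin(k d cos(theta) / 2)|."""
    k = 2.0 * math.pi / wavelength
    phase = k * separation * math.cos(math.radians(theta_deg))
    return 2.0 * abs(math.sin(0.5 * phase))


wavelength_nm = 1000.0
# Hypothetical separations, chosen only to show the trend.
for d_nm in (50.0, 100.0, 200.0):
    factor = antiphase_pair_factor(d_nm, wavelength_nm, 0.0)
    print(f"d = {d_nm:5.0f} nm: relative amplitude along the pair axis = {factor:.3f}")
```

For separations much smaller than the wavelength the factor is approximately k d |cos θ|, so the radiated amplitude falls in proportion to the separation, and it vanishes in the plane that bisects the pair (θ = 90°).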
Figure 30.6 shows computed plots of Ex, Ey, Ez in the XY-plane located 20 nm
above the surface of the conductor in the system of Figure 30.3. The simulated
conductor is a 124 nm-thick film of silver (n + ik = 0.226 + i6.99 at λ = 1.0 μm)
having an 800 nm-long, 100 nm-wide elliptical aperture.7 The magnitude of each
field component is plotted in the top row of Figure 30.6, and the corresponding
phase profile appears below it. For our purposes, the main utility of the phase
distribution is to indicate the relative orientation of the various field components.
For instance, if the phase of Ey at a given location happens to be φ0, and the
phase of Ex at that location turns out to be equal (or nearly equal) to φ0, we will
know that Ex x̂ + Ey ŷ oscillates back and forth between the first and third
quadrants of the XY-plane. However, if the phase of Ex hovers around φ0 ± 180°,
then Ex x̂ + Ey ŷ oscillates between the second and fourth quadrants.
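The rule just described for reading the phase plots can be written as a small helper. This is only an illustrative sketch; the function name and the tolerance tol_deg (how close to 0° or 180° the phase difference must be) are our own assumptions, not quantities defined in the text.

```python
def transverse_orientation(phase_ex_deg, phase_ey_deg, tol_deg=30.0):
    """Classify how the in-plane field (Ex, Ey) oscillates, given the phases
    of its two components in degrees. tol_deg is a hypothetical tolerance."""
    # Wrap the phase difference into the interval [-180, 180).
    diff = (phase_ex_deg - phase_ey_deg + 180.0) % 360.0 - 180.0
    if abs(diff) <= tol_deg:
        return "first and third quadrants"
    if abs(abs(diff) - 180.0) <= tol_deg:
        return "second and fourth quadrants"
    return "elliptical (no fixed line of oscillation)"


print(transverse_orientation(20.0, 20.0))    # equal phases
print(transverse_orientation(200.0, 20.0))   # phases differ by 180 degrees
print(transverse_orientation(110.0, 20.0))   # phases differ by 90 degrees
```

When the phase difference is neither near 0° nor near 180°, the tip of the in-plane field vector traces an ellipse rather than a line, which the helper reports as a separate case.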
The E-field distribution of Figure 30.6 is consistent with the qualitative
behavior sketched in Figures 30.3, 30.4(b), and 30.5(c). The Ex component bends
the central field lines toward the middle of the aperture, and pushes the peripheral
lines farther away, thus ensuring that the long side-walls repel the parallel E-field.
[Figure 30.6 panels: |Ex|, |Ey|, |Ez| (top row); φ(Ex), φ(Ey), φ(Ez) (bottom row). Axes: x [nm], y [nm].]
Figure 30.6 Computed plots of Ex, Ey, Ez in an XY-plane located a short distance (Δz = 20 nm) above the surface of the
conductor in the system of Figure 30.3. Top row: amplitude, bottom row: phase. The silver film is 124 nm thick, the aperture is
800 nm long and 100 nm wide, and the radiation wavelength is λ = 1 μm.
The Ey component is strengthened near the center of the aperture because the field
lines are pushed upward and squeezed laterally toward the center. Finally, the Ez
component confirms the presence of charges of opposite sign at and around the sharp
corners of the aperture (∇ · D = ρ). These pictures are consistent with the presence
of a weak electric dipole and a magnetic quadrupole in and around the elliptical
aperture.
Computed amplitude and phase plots of Ey, Ez in the central YZ-plane of the
aperture are shown in Figure 30.7. The bands of Ey above the aperture are the
standing-wave fringes created by the interference between the incident and
reflected beams. The weak nature of transmission through the aperture is evident
in the very small perturbation of the fringes, as they sag ever so slightly to fill the
top of the aperture. The profile of Ez, once again, confirms the accumulation of
electric charges around the sharp corners of the hole. Moreover, it shows that the
[Figure 30.7 panels: |Ey|, |Ez| (top row); φ(Ey), φ(Ez) (bottom row). Axes: y [nm], z [nm].]
Figure 30.7 Computed amplitude and phase plots of Ey, Ez in the central
YZ-plane in the system of Figure 30.3. The silver film’s cross-section is indicated
with dashed lines. The standing wave fringes are only slightly perturbed by the
aperture.
charges on the top facet of the metal film, while much stronger than those on the
bottom facet, have the same sign as the charges on the bottom; in other words, the
top and bottom charges are both positive at one end of the ellipse, and both
negative at the opposite end.
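The standing-wave fringes referred to above can be sketched for the idealized case of a perfectly conducting, apertureless mirror at z = 0. This is a simplifying assumption on our part (the simulated silver film is a real metal with a hole in it); the incident amplitude of 1.0 V/m, the wavelength of 1.0 μm, and Z0 ≈ 377 Ω are the values used in this chapter, while the function name is hypothetical.

```python
import math

Z0 = 377.0              # free-space impedance in ohms (approximate)
WAVELENGTH_NM = 1000.0  # vacuum wavelength
E_INC = 1.0             # incident E-field amplitude in V/m


def standing_wave(z_nm):
    """Return (|E|, |H|) at height z above an ideal mirror located at z = 0,
    for a normally incident plane wave and its reflection."""
    kz = 2.0 * math.pi * z_nm / WAVELENGTH_NM
    e_mag = 2.0 * E_INC * abs(math.sin(kz))         # E vanishes at the surface
    h_mag = 2.0 * (E_INC / Z0) * abs(math.cos(kz))  # H is maximum at the surface
    return e_mag, h_mag


for z_nm in (0.0, 125.0, 250.0, 500.0):
    e_mag, h_mag = standing_wave(z_nm)
    print(f"z = {z_nm:5.0f} nm: |E| = {e_mag:.3f} V/m, |H| = {h_mag:.2e} A/m")
```

In this idealization the first E-fringe peaks a quarter wavelength (250 nm) above the surface with twice the incident amplitude, whereas the first H-fringe sits directly on the surface.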
Figure 30.8 shows plots of Hx, Hy, Hz in the XY-plane 20 nm above the surface
of the conductor, while amplitude and phase plots of Hx and Hz in the central
XZ-plane appear in Figure 30.9. As expected from the preceding discussion of
Figures 30.3 and 30.4, the magnetic fringe nearest the surface is seen to leak into
the aperture by bending the H-lines near the corners of the ellipse toward the
center and down into the hole.
Computed plots of Ex, Ey, Ez in the XY-plane 20 nm below the conductor are
shown in Figure 30.10, and the corresponding H-field distributions appear in
Figure 30.11. While the profiles of these fields confirm the behavior expected
from our earlier qualitative analysis, their small magnitudes testify to the weak
nature of radiation by the m quadrupole (and the accompanying p dipole)
induced by the incident beam in the vicinity of the aperture of Figure 30.3.
[Figure 30.8 panels: |Hx|, |Hy|, |Hz| (top row); φ(Hx), φ(Hy), φ(Hz) (bottom row). Vertical axis: y [nm].]
Figure 30.8 Computed plots of Hx, Hy, Hz in the XY-plane 20 nm above the
conductor surface in the system of Figure 30.3. Top row: amplitude; bottom row:
phase.
[Figure 30.9 panels: |Hx|, |Hz| (top row); φ(Hx), φ(Hz) (bottom row). Axes: x [nm], z [nm].]
Figure 30.9 Amplitude and phase plots of Hx, Hz in the central XZ-plane in the
system of Figure 30.3. The silver film’s cross-section is indicated with dashed lines.
Figure 30.12 shows distributions of the magnitude |S| of the Poynting vector in
various cross-sections of the system of Figure 30.3. The superimposed arrows on
each plot show the projection of S in the corresponding plane.7 For instance, in
the XZ cross-section depicted in Figure 30.12(a), the arrows represent Sx x̂ + Sz ẑ,
whereas in the YZ cross-section of Figure 30.12(b) the arrows correspond to
the projection Sy ŷ + Sz ẑ of the Poynting vector on the YZ-plane. The plots in
Figure 30.12(c) and (d) show the distributions of |S| in the XY-planes
immediately above and below the aperture. In the absence of an aperture, S is essentially
zero everywhere, as the reflected beam cancels out the incident beam’s energy
flux. When the aperture is present, however, the fields are redistributed in such a
way as to draw the incident optical energy toward the aperture. In the present
case, the energy flows in from the periphery, fails to find a way through the
aperture, bounces back and returns toward the source in the region directly above
the aperture. In the process, several vortices are formed, where the incoming
energy makes a sharp turnaround and heads back toward the source.
[Figure 30.10 panels: |Ex|, |Ey|, |Ez| (top row); φ(Ex), φ(Ey), φ(Ez) (bottom row). Vertical axis: y [nm].]
Figure 30.10 Computed plots of Ex, Ey, Ez in the XY-plane 20 nm below the
bottom facet of the conductor in the system of Figure 30.3. Top row: amplitude;
bottom row: phase.
In Figure 30.12(d) the Poynting vector S = ½ Re(E × H*) at the bottom of the hole
has a magnitude |S| ≈ 2.5 × 10⁻⁶ W/m², consistent with the transmitted E-field
of 0.06 V/m and H-field of 2.3 × 10⁻⁴ A/m, considering the large phase difference
of Δφ ≈ 70° between the E- and H-fields near the bottom of the aperture; see
Figures 30.10 and 30.11. Since the incident plane-wave is assumed to have Ey = 1.0
V/m, Hx = Ey/Z0 = 2.65 × 10⁻³ A/m (free-space impedance Z0 ≈ 377 Ω), which
correspond to an incident optical power density of 1.32 × 10⁻³ W/m², the power
transmission efficiency η at the center of the elliptical aperture of Figure 30.3 is
seen to be just under 0.2%. We will see in the next section that when the incident
polarization is rotated 90° (to point along the minor axis of the ellipse), the
transmission efficiency through the aperture increases to η ≈ 93%, a nearly 500-fold
improvement.
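The arithmetic behind these numbers can be reproduced in a few lines. The sketch below assumes mutually orthogonal E- and H-fields, so that the time-averaged flux reduces to ½|E||H|cos Δφ; the function name is ours, and the inputs are the rounded values quoted above, so the results agree with the quoted figures only to within that rounding.

```python
import math

Z0 = 377.0  # free-space impedance in ohms, as quoted in the text


def time_averaged_flux(e_amp, h_amp, phase_diff_deg):
    """Magnitude of S = (1/2) Re(E x H*) for orthogonal E and H with the
    given amplitudes (V/m, A/m) and relative phase (degrees)."""
    return 0.5 * e_amp * h_amp * math.cos(math.radians(phase_diff_deg))


# Incident plane wave: Ey = 1.0 V/m, Hx = Ey / Z0, fields in phase.
s_incident = time_averaged_flux(1.0, 1.0 / Z0, 0.0)

# Transmitted fields at the bottom of the hole (long-axis polarization).
s_transmitted = time_averaged_flux(0.06, 2.3e-4, 70.0)

efficiency = s_transmitted / s_incident

print(f"incident    |S| = {s_incident:.3e} W/m^2")
print(f"transmitted |S| = {s_transmitted:.3e} W/m^2")
print(f"efficiency      = {100.0 * efficiency:.2f} %")
```

With these rounded inputs the transmitted flux comes out slightly below the quoted 2.5 × 10⁻⁶ W/m², and the efficiency remains just under 0.2%.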
Elliptical aperture illuminated with plane-wave polarized along the short axis
When the incident E-field is parallel to the minor axis of an elliptical aperture, the
surface currents Is deposit charges at and around the long side-walls of the
[Figure 30.11 panels: |Hx|, |Hy|, |Hz| (top row); φ(Hx), φ(Hy), φ(Hz) (bottom row). Vertical axis: y [nm].]
Figure 30.11 Computed plots of Hx, Hy, Hz in the XY-plane 20 nm below the
bottom facet of the conductor in the system of Figure 30.3. Top row: amplitude;
bottom row: phase.
aperture, as shown in Figure 30.13. These oscillating charges radiate as an
electric dipole flanked by a pair of magnetic dipoles, creating circulating magnetic
fields around the ellipse’s minor axis that push the incident B-lines upward
and sideways. In the area surrounding the hole, the E-field produced by these
dipoles bends the Is lines toward the mid-section of the aperture, as shown in
Figure 30.13, and as required for self-consistency.
Aside from the incident beam, all the radiation in the system of Figure 30.13 is
produced by the surface currents Is and the charges created by these currents.
Subtracting the (uniform) surface current in the system of Figure 30.2 from that in
Figure 30.13 thus yields the current distribution of Figure 30.14(a), which is
responsible for the difference between the radiation patterns in the two systems.
When added to the uniform current of Figure 30.2, the currents of Figure 30.14(a)
produce the Is pattern shown in Figure 30.13. The current loops of Figure 30.14(a)
are equivalent to a pair of oppositely oriented magnetic dipoles, +m and –m,
while the charges deposited on the long sides of the aperture constitute an electric
dipole p; see Figure 30.14(b). Figure 30.14(c) shows that, in the XY-plane
immediately above the aperture, the E-field is dominated by the electric dipole p.
[Figure 30.12 panels: |S| in (a) the XZ-plane, (b) the YZ-plane, and (c, d) XY-planes. Axes: x, y, z in nm.]
Figure 30.12 Profiles of the magnitude |S| of the Poynting vector in various
cross-sections of the system of Figure 30.3. The superimposed arrows show
the projection of S in the corresponding plane. (a) Central XZ-plane.
(b) Central YZ-plane. (c, d) XY-planes located 20 nm above and below the
aperture.
The contribution of the magnetic dipoles is to enhance the E-field at the center of
the aperture, while weakening it in the outer regions.
Figure 30.14(d) shows that in the XY-plane directly above the aperture, the
B-field profile is shaped by competition between the electric dipole p and the
magnetic dipoles m. The electric dipole dominates near the center but, further
Figure 30.13 When the incident E-field is parallel to the minor axis of an
elliptical aperture, the surface currents Is deposit charges at and around the long
side-walls of the aperture. These oscillating charges radiate as an electric dipole
flanked by a pair of magnetic dipoles, creating circulating magnetic fields around
the ellipse’s minor axis that push the incident B-lines upward and sideways.
away, the magnetic dipoles dictate the B-field’s behavior. The dotted B-lines
near the sharp corners of the ellipse in Figure 30.14(d) show the field leaving the
XY-plane to enter/exit the hole vertically (i.e. in the Z-direction). Although not
shown in this figure, B-lines that enter the hole from above close the loop by
circling beneath the metal film and returning through the hole to reconnect with
the B-lines above the film; see Figure 30.15.
The surface charges and currents of Figure 30.14(a) create magnetic fields in
the free-space regions inside the hole as well as those above and below the metal
surface. The B-field of the electric dipole p combines with that of the magnetic
dipoles m to produce closed loops in the vicinity of the aperture, as shown in
Figure 30.15. The solid B-lines in this figure bulge above and below the metal
surface, while the dashed lines hug the conductor’s top and bottom surfaces. (The
B-field cannot penetrate into the conductor, but, as it emerges from the hole, it
bends above and below the surface in such a way as to bring the field lines close
to the metallic surface.) In all cases, the lines of B must form closed loops to
guarantee the divergence-free nature of the field. Since neither E- nor B-fields can
exist within the conductor, the fields radiated by the electric dipole p must cancel
out those of the magnetic dipoles m everywhere inside the metallic medium.
The radiation emanating from these dipoles, however, permeates the interior of
the hole as well as the free-space regions on both sides of the conductor. To get in
and out of the hole, the B-lines of Figure 30.15 appear to descend through one of
the current loops that constitutes a magnetic dipole in Figure 30.14(b), then return
through the other loop. Note the change of direction of the magnetic field at the
upper surface of the elliptical aperture: the direction of B just above the hole is
Figure 30.14 (a) Surface currents and the accompanying charge distribution
produced by the elliptical aperture of Figure 30.13. When added to the
uniform current of Figure 30.2, these currents produce the Is pattern shown in
Figure 30.13. (b) The current loops in (a) are equivalent to a pair of magnetic
dipoles, m, while the charges deposited on opposite sides of the aperture
constitute an electric dipole p. (c) In the XY-plane immediately above the
aperture, the E-field is dominated by the electric dipole p. (d) In the XY-plane
immediately above the aperture, the B-field profile is shaped by competition
between the electric dipole p and the magnetic dipoles m. Dotted B-lines
near the sharp corners of the ellipse show the B-field leaving the XY-plane to
enter/exit the hole.
opposite to that beneath the hole’s upper surface. This 180° phase shift, dictated
by the presence of the (uniform) Is on the top surface of the elliptical aperture in
Figure 30.14(a), will disappear when the fringes of Figure 30.2 are added to the
fields produced by p, m, and –m to yield the total field in and around the aperture.
The induced electric charges on the surfaces surrounding the aperture produce
an oscillating E-field in the short gap between the long side-walls as well as in the
regions immediately above and below the aperture. The time rate of change of
Figure 30.15 With reference to Figure 30.14(b), the B-field of the electric
dipole p combines with that of the magnetic dipoles m to produce closed loops
in and around the aperture. The solid B-lines bulge above and below the metal
film, while the dashed B-lines hug the conductor’s top and bottom surfaces.
this field, ∂D/∂t, which is equivalent to an electric current density J across the
gap, creates circulating magnetic fields around the short axis of the ellipse.4
These B-fields by themselves, however, are not sufficient to explain the field
profile depicted in Figure 30.15, and must be augmented by the fields produced
by the circulating currents around the ellipse’s sharp corners (i.e., the m
dipoles) to yield a complete picture. Moreover, inside the metallic medium, the
E- and B-fields of the p dipole cannot vanish without the compensating contri-
butions of the m dipoles.
Figure 30.16 shows cross-sections of the system of Figure 30.13 in YZ- and
XZ-planes. Since Is lags 90° behind the incident E-field immediately above the
aperture, the accumulating charges on and around the side-walls produce
electric fields opposite in direction to the incident E-field. The E-lines may now
start on positive charges and end on negative charges (∇ · D = ρ), as shown in
Figure 30.16(a). This change of direction of the E-field causes a 180° phase
shift in Ey from above to below the aperture. The E-fringe just above the
aperture thus becomes weaker, sharing some of its energy with the E-field
inside and below the aperture.
The XZ cross-section of the system of Figure 30.13 depicted in Figure 30.16(b)
shows how the oscillating electric dipole p and magnetic dipoles m push the
B-fringe above the aperture upward and sideways to make room for circulating
B-fields that surround the short axis of the elliptical aperture. The resulting
redistribution of the magnetic energy of the B-fringe above the hole thus makes it
Figure 30.16 Cross-sections of the system of Figure 30.13 in YZ- and XZ-planes.
(a) The charges accumulating on the aperture’s side-walls produce an E-field
opposite in direction to the incident field. The lines of E may now start on
positive charges and end on negative charges. (b) The dipoles p, m, and –m of
Figure 30.14(b) push the B-fringe above the aperture upward and sideways to make
room for circulating B-fields that surround the short axis of the elliptical aperture.
possible for some of the energy stored in this fringe to leak into the hole as well as
the space below the hole. (The B-field distribution inside the aperture and in the
region below the metal film is the same as that in Figure 30.15, since the added
fringes contribute only to the half-space above the conductor.) The divergence-free
nature of the B-lines requires their continuity, which is evident in Figure 30.16(b),
in contrast to the E-lines of Figure 30.16(a), which break up whenever they meet
electrical charges.
Figure 30.17 shows computed plots of Ex, Ey, Ez in the XY-plane 20 nm above
the surface of the conductor in the system of Figure 30.13 (top row: magnitude;
bottom row: phase). As before, the 124 nm-thick silver film used in these
simulations has n + ik = 0.226 + i6.99 at λ = 1.0 μm, and the ellipse’s diameters
along its major and minor axes are 800 nm and 100 nm, respectively.7 The strong
[Figure 30.17 panels: |Ex|, |Ey|, |Ez| (top row); φ(Ex), φ(Ey), φ(Ez) (bottom row). Vertical axis: y [nm].]
Figure 30.17 Computed plots of Ex, Ey, Ez in the XY-plane 20 nm above the
conductor’s surface in the system of Figure 30.13. Top row: amplitude; bottom
row: phase. The silver film is 124 nm thick, the aperture is 800 nm long and
100 nm wide, and the radiation wavelength is λ = 1 μm. The aperture boundaries
are indicated with dashed lines.
z-component of E indicates the presence of significant amounts of electric charge
on the conducting surfaces in the vicinity of the hole; the sign-reversal of Ez from
one side of the hole to the other shows that the charges on the two sides have
opposite signs.
Figure 30.18, left panel, shows the amplitude and phase of Ey in the central
XZ-plane, while the right panel shows Ey, Ez in the central YZ-plane. Inside and
below the aperture Ey is seen to be strong, and to have reversed direction relative
to the E-field immediately above the aperture; its energy appears to have been
extracted from the E-fringe directly above the hole. The distribution of Ez shows,
once again, the presence of electric charges on the top and bottom surfaces of the
conductor; these charges have the same sign on the top and bottom surfaces on
either side of the hole, but their sign is reversed in going from the left-side to the
right-side.
Computed plots of Hx, Hy, Hz in the XY-plane 20 nm above the conductor’s
surface appear in Figure 30.19. Figure 30.20, left panel, depicts the amplitude and
[Figure 30.18 panels: (left) |Ey| and φ(Ey) versus x [nm] and z [nm]; (right) |Ey|, |Ez| and φ(Ey), φ(Ez) versus y [nm] and z [nm].]
Figure 30.18 (Left) amplitude and phase of Ey in the central XZ-plane; (right)
amplitude and phase of Ey, Ez in the central YZ-plane in the system of
Figure 30.13. The silver film’s cross-section is indicated with dashed lines. The
fringes in the two panels are differently colored because the color scale for Ey in
the YZ-plane has been greatly expanded by two (barely visible) hot spots on the
sidewalls near the bottom of the hole.
phase of Hx, Hz in the central XZ cross-section, while the right panel shows the
distribution of Hx in the central YZ-plane. The magnetic field’s behavior in these
pictures is in accord with the qualitative behavior sketched in Figure 30.16(b).
Note, in particular, that the profile of Hz in Figure 30.20 resembles the
z-component of the circulating B-field in Figure 30.16(b). Note also the draining
of magnetic energy out of the B-fringe above the hole, and its redistribution not
only in the form of magnetic fields inside and below the aperture, but also in the
enhanced values of Hx directly above the conductor’s surface.
Plots of Ex, Ey, Ez in the XY-plane 20 nm below the bottom surface of the
conductor are shown in Figure 30.21, and the corresponding magnetic-field plots
appear in Figure 30.22. These pictures are in full agreement with the qualitative
diagrams of Figures 30.14–30.16.
Figure 30.23 shows distributions of the magnitude |S| of the Poynting vector in
various cross-sections of the system of Figure 30.13. The superimposed arrows
on each plot show the projection of S in the corresponding plane.7 For instance, in
[Figure 30.19 panels: |Hx|, |Hy|, |Hz| (top row); φ(Hx), φ(Hy), φ(Hz) (bottom row). Vertical axis: y [nm].]
Figure 30.19 Computed plots of Hx, Hy, Hz in the XY-plane 20 nm above the
surface of the conductor in the system of Figure 30.13. Top row: amplitude;
bottom row: phase. The aperture boundaries are indicated with dashed lines.
the XZ cross-section depicted in (a) the arrows represent Sx x̂ + Sz ẑ, whereas in
the YZ cross-section of (b) the arrows correspond to the projection of the Poynting
vector on the YZ-plane, namely, Sy ŷ + Sz ẑ. The plots in Figures 30.23(c) and (d)
show the distributions of |S| in the XY-planes 20 nm above and below the
aperture. In the absence of an aperture, S is essentially zero everywhere, as the
reflected beam cancels out the incident beam’s energy flux. When the aperture is
present, however, the fields are redistributed in such a way as to draw the incident
optical energy toward the aperture. The energy flows in from the region directly
above as well as from the periphery of the hole in every direction. In addition to
the straight-ahead energy, some of the peripheral energy also goes through the
hole, thus enhancing the overall transmission. Further away from the aperture,
especially in the YZ-plane (which contains the ellipse’s short axis), the peripheral
incoming energy turns away from the aperture and returns to the source.
The magnitude of the Poynting vector in the center at the bottom of the hole is
|S| ≈ 1.23 × 10⁻³ W/m², which is consistent with the transmitted E- and H-fields
of 1.6 V/m and 2.14 × 10⁻³ A/m, with a phase difference Δφ = φE – φH ≈ 45°
(see Figures 30.21 and 30.22). The transmission efficiency of the optical power
[Figure 30.20 panels: (left) |Hx|, |Hz| and φ(Hx), φ(Hz) versus x [nm] and z [nm]; (right) |Hx| and φ(Hx) versus y [nm] and z [nm].]
Figure 30.20 (Left) amplitude and phase of Hx, Hz in the central XZ-plane;
(right) amplitude and phase of Hx in the central YZ-plane in the system of
Figure 30.13. The silver film’s cross-section is indicated with dashed lines.
density at the center of this aperture is thus η ≈ 93%, which is nearly 500 times
greater than that obtained when the incident polarization was parallel to the ellipse’s
long axis. (η is the ratio of |S| at the aperture’s center just below the conductor
to the incident plane-wave’s optical power density, |Sinc| ≈ 1.32 × 10⁻³ W/m².)
Several factors appear to have contributed to this strong performance (compared to
the case of parallel polarization), among them, more electrical charges and stronger
surface currents (especially on the bottom facet of the conductor), and a greater
separation between the m magnetic dipoles, which tend to cancel each other out
when they are close together.
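The same consistency check can be carried out in vector form, which also gives the direction of energy flow. The sketch below is illustrative only: it assumes that the transmitted field at the center of the hole is essentially Ey accompanied by Hx, assigns the entire 45° phase difference to Ey, and uses the rounded amplitudes quoted above; the function name is ours.

```python
import numpy as np

Z0 = 377.0  # free-space impedance in ohms (approximate)


def poynting(e_vec, h_vec):
    """Time-averaged Poynting vector S = (1/2) Re(E x H*) for complex
    field phasors given as (x, y, z) components."""
    return 0.5 * np.real(np.cross(e_vec, np.conj(h_vec)))


# Incident plane wave traveling along -Z: Ey = 1.0 V/m, Hx = Ey / Z0.
e_inc = np.array([0.0, 1.0, 0.0], dtype=complex)
h_inc = np.array([1.0 / Z0, 0.0, 0.0], dtype=complex)

# Transmitted fields at the center, just below the film (short-axis polarization).
# Assumption: only Ey and Hx matter there; Ey leads Hx by about 45 degrees.
e_out = np.array([0.0, 1.6 * np.exp(1j * np.deg2rad(45.0)), 0.0])
h_out = np.array([2.14e-3, 0.0, 0.0], dtype=complex)

s_inc = poynting(e_inc, h_inc)
s_out = poynting(e_out, h_out)
eta = s_out[2] / s_inc[2]

print("incident    S =", s_inc, "W/m^2")
print("transmitted S =", s_out, "W/m^2")
print(f"efficiency    = {100.0 * eta:.1f} %")
```

Both vectors point along –Z, i.e., downward through the hole. With these rounded inputs the efficiency evaluates to about 91%, close to the quoted 93%; the difference is within what the approximate 45° phase difference allows.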
Concluding remarks
We have analyzed the transmission of light through a small elliptical aperture in a
thin silver film at λ = 1.0 μm. Both cases of incident polarization parallel and
perpendicular to the major axis of the ellipse were considered. The transmission
efficiency η was found to be low for parallel polarization and high for
perpendicular polarization.
[Figure 30.21 panels: |Ex|, |Ey|, |Ez| (top row); φ(Ex), φ(Ey), φ(Ez) (bottom row). Vertical axis: y [nm].]
Figure 30.21 Computed plots of Ex, Ey, Ez in the XY-plane 20 nm below the
bottom surface of the conductor in the system of Figure 30.13. Top row: amplitude;
bottom row: phase.
The hallmark of the low-transmission case was a weak excitation of electric
and magnetic dipoles on the upper surface of the metal film, which produced
even weaker excitations on the lower surface. Although not described here, we
have observed similar behavior for a circular aperture (diameter = 100 nm,
silver film thickness = 124 nm, η = 0.06% at the center of the aperture 20 nm
below the conductor), and also for an infinitely long, 100 nm-wide slit
(η = 0.14% at the center of the slit 36 nm below the bottom facet; incident
polarization parallel to the slit). For the elliptical hole under low-transmission
conditions, η drops rapidly with an increasing film thickness h, from 0.2% at
h = 124 nm, to 0.008% at h = 186 nm, and to below 0.001% at h = 248 nm. It
appears that the elliptical hole, when considered as a waveguide,8,9 does not
support any guided mode whose E-field is predominantly aligned with the
ellipse’s long axis.
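As a rough illustration of how quickly η falls with film thickness in the low-transmission
case, the sketch below fits a single exponential between the two quoted values, η = 0.2%
at h = 124 nm and η = 0.008% at h = 186 nm. The exponential form and the function name
are assumptions made for illustration only; they are not taken from the simulations.

```python
import math

def decay_length_nm(h1_nm, eta1, h2_nm, eta2):
    """1/e decay length L, ASSUMING eta(h) is proportional to exp(-h / L)."""
    return (h2_nm - h1_nm) / math.log(eta1 / eta2)

# Values quoted in the text for the low-transmission (parallel) polarization.
L = decay_length_nm(124.0, 0.2e-2, 186.0, 0.008e-2)

# Extrapolate to h = 248 nm under the same assumption.
eta_248 = 0.008e-2 * math.exp(-(248.0 - 186.0) / L)
print(f"decay length ~ {L:.1f} nm; extrapolated eta(248 nm) ~ {100 * eta_248:.5f} %")
```

Under this assumption the extrapolated value at h = 248 nm falls below 0.001%, which is
consistent with the figure quoted above.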
The high-transmission ellipse revealed the excitation of fairly strong
electric and magnetic dipoles on the upper surface of the metal film, which
induced even stronger dipoles on the film’s lower facet. In this case η remains
high for thicker films as well (η = 93% for h = 124 nm, 86% for h = 186 nm,
[Figure 30.22 panels. Top row: |Hx|, |Hy|, |Hz| (scales in units of 10⁻³); bottom row:
phases φ(Hx), φ(Hy), φ(Hz), on a scale of –180° to 180°. Axes: x and y, from –600 nm
to 600 nm.]
Figure 30.22 Computed plots of Hx, Hy, Hz in the XY-plane 20 nm below the
bottom row: phase.
and 136% for h = 248 nm), indicating propagation through the hole (along
the Z-axis) of a guided mode whose E-field is largely parallel to the ellipse’s
short axis. We also found that an infinitely long, 100 nm-wide slit exhibits
strong transmission for an incident polarization aligned with the narrow
dimension of the slit (η ≈ 69% at the center of the slit, 36 nm below a 124 nm-thick
silver film).
It thus appears that achieving a large g requires an aperture that can excite
strong oscillator(s) on the upper surface of the film, which would then induce
strong oscillations on the lower facet, thereby creating the conditions for the
passage of a substantial amount of electro-magnetic energy through the sub-
wavelength opening in the metal film. The ability of a hole (or slit) to support a
guided mode that can be excited by the incident polarization appears to be critical
for achieving large transmission, especially for thicker films. Recent reports of
various aperture designs that have significant throughputs (compared with simple
circular or square-shaped apertures)3,10 indicate that the aforementioned prin-
ciples, far from being specific to elliptical holes in thin metal films, have a broad
range of application.
[Figure 30.23 panels (a)–(d): maps of |S|. Axes: (a) x and z; (b) y and z; (c), (d) x and y;
all from –600 nm to 600 nm.]
Figure 30.23 Profiles of the magnitude |S| of the Poynting vector in various
cross-sections of the system of Figure 30.13. The superimposed arrows
show the projection of S in the corresponding plane. (a) Central XZ-plane.
(b) Central YZ-plane. (c, d) XY-planes located 20 nm above and below the
aperture.

References to Chapter 30
1 M. Born and E. Wolf, Principles of Optics, seventh edition, Cambridge University
Press, UK (1999).
2 H. A. Bethe, Theory of diffraction by small holes, Physical Review 66, 163 (1944).
3 T. Thio, K. M. Pellerin, R. A. Linke, H. J. Lezec, and T. W. Ebbesen, Enhanced
light transmission through a single subwavelength aperture, Optics Letters 26,
1972–1974 (2001).
4 J. D. Jackson, Classical Electrodynamics, second edition, Wiley, New York, 1975.
5 A. Taflove and S. C. Hagness, Computational Electrodynamics, Artech House,
Norwood, MA (2000).
6 Jin Au Kong, Electromagnetic Wave Theory, EMW Publishing, Cambridge, MA,
2000.
7 The computer simulations reported in this chapter were performed by
Sim3D_Max, a product of MM Research, Inc., Tucson, AZ.
8 J. A. Porto, F. J. Garcia-Vidal, and J. B. Pendry, Transmission resonances on
metallic gratings with very narrow slits, Phys. Rev. Lett. 83, No.14, 2845–48 (1999).
9 Q. Cao and P. Lalanne, Negative role of surface plasmons in the transmission of
metallic gratings with very narrow slits, Phys. Rev. Lett. 88, No.5, 57403 (2002).
10 X. Shi, L. Hesselink, and R. L. Thornton, Ultrahigh light transmission through a
C-shaped nanoaperture, Optics Letters 28, No. 15, 1320–22 (2003).
31
The method of Fox and Li
The electromagnetic fields within a waveguide or a resonator cannot have arbitrary
distributions. The requirements of satisfying Maxwell’s equations as well as the
boundary conditions specific to the waveguide (or the resonator) confine the
distribution to certain shapes and forms. The electromagnetic field distributions that
can be sustained within a device are known as its stable modes of oscillation.1,2
When the device and its geometry are simple, the stable modes can be
determined analytically. For complex systems and complicated geometries,
however, numerical methods must be used to solve Maxwell’s equations in the
presence of the relevant boundary conditions. The method of Fox and Li is an
elegant numerical technique that can be applied to certain waveguides and res-
onators in order to obtain the operating mode of the device. Instead of solving
Maxwell’s equations explicitly, the method of Fox and Li uses the Fresnel–
Kirchhoff diffraction integral to mimic the physical process of wavefront
propagation within the device, thus arriving at its stable mode of operation after
several iterations.3,4
To illustrate the method of Fox and Li we focus our attention on the confocal
resonator shown in Figure 31.1(a). Let us assume that the two mirrors are
aberration-free parabolas with an effective numerical aperture NA = 0.01 and focal
length f = 62 500λ0 (λ0 is the vacuum wavelength of the light confined within the
cavity). The clear aperture of each mirror will therefore have a diameter of
1250λ0. (For the HeNe wavelength of λ0 = 0.633 μm, for example, this resonator
will be a filament 8 cm long and 0.8 mm wide.) The resonator of Figure 31.1(a)
may be modeled as the periodic-lens waveguide depicted in Figure 31.1(b). The
beam starts at the focal plane of the first lens, becomes collimated, reaches the
second lens, is focused by the second lens, and the process repeats itself over and
over again. The essence of the method of Fox and Li for the computation of the
stable mode within this cavity may now be described as follows. An initial
distribution is propagated through the periodic-lens waveguide until a steady-state
Figure 31.1 (a) Schematic diagram of a confocal optical resonator consisting
of two parabolic mirrors. The mirrors are identical, each having numerical
aperture NA and focal length f. (b) A periodic-lens waveguide that can be used to
simulate the behavior of the resonator.
distribution is reached. In the steady state the shape of the complex-amplitude
distribution within the cavity will no longer change with successive iterations, but
its power content will decline at a constant rate due to losses in the cavity. These
ideas may best be explained by several examples.
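The iteration just described can be sketched in a few lines. The example below is a
deliberately simplified, one-dimensional (strip-mirror) version and is not the
two-dimensional computation reported in this chapter. It assumes a confocal geometry, for
which the Fresnel–Kirchhoff integral for one mirror-to-mirror transit reduces to a
truncated Fourier transform, and it uses an illustrative Fresnel number of 0.5. The
function name, grid size, and Fresnel number are all assumptions.

```python
import cmath
import math

def fox_li_1d(fresnel_number=0.5, samples=64, iterations=60):
    """Iterate a 1D confocal strip resonator; return (field, gammas)."""
    d_xi = 2.0 / samples
    # Midpoint grid on the normalized mirror aperture, xi = x / a in [-1, 1].
    xi = [-1.0 + (j + 0.5) * d_xi for j in range(samples)]
    # Discretized diffraction kernel for one transit (confocal case, constant
    # phase factors dropped): sqrt(N) * exp(-i 2 pi N xi1 xi2) * d_xi.
    scale = math.sqrt(fresnel_number) * d_xi
    kernel = [[scale * cmath.exp(-2j * math.pi * fresnel_number * a * b)
               for b in xi] for a in xi]

    field = [1.0 + 0.0j] * samples            # uniform initial distribution
    gammas = []
    for _ in range(iterations):
        p_before = sum(abs(u) ** 2 for u in field)
        # Propagate to the next mirror; the finite grid truncates the beam.
        field = [sum(k * u for k, u in zip(row, field)) for row in kernel]
        p_after = sum(abs(u) ** 2 for u in field)
        gammas.append(p_after / p_before)     # power ratio for this iteration
        norm = math.sqrt(p_after)
        field = [u / norm for u in field]     # renormalize; shape is what matters
    return field, gammas

field, gammas = fox_li_1d()
print("power ratio, first iteration:", round(gammas[0], 4))
print("power ratio, last iteration: ", round(gammas[-1], 4))
```

In this sketch the power ratio settles to a constant value below unity once the shape of
the distribution stops changing, which is the steady-state behavior described above.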
The lowest-order mode

The mode of the cavity that is easiest to obtain by the method of Fox and Li is the
lowest-order mode. Typically, just about any arbitrary initial distribution that one
picks will converge to the lowest-order mode. From a practical standpoint this is very
useful, because the lowest-order mode is also the mode in which the resonator
operates, under most practical conditions. In Figure 31.2(a) we show a uniform
initial distribution within a fairly large circular aperture. After going through about
80 iterations, this distribution settles into the mode known as the 0,0 mode of the
cavity and shown in Figures 31.2(b)–(d). The 0,0 mode is essentially Gaussian in
character, although, as the logarithmic plot of intensity in Figure 31.2(c) and the
phase plot of Figure 31.2(d) show, it has an oscillating tail. The oscillation is caused
by the finite apertures of the mirrors, which truncate the ideal, Gaussian mode.
[Figure 31.2 panels (a)–(d); axes x/λ0 and y/λ0, each from –300 to 300.]
Figure 31.2 Computing the 0,0 mode of the resonator shown in Figure 31.1
using the method of Fox and Li. (a) The assumed initial distribution, having
uniform amplitude and constant phase across a wide circular aperture. (b)
Computed intensity distribution at the mid-plane of the cavity, obtained after 80
iterations. (c) Same as (b) but showing the logarithm of intensity on a scale of 1
(white) to 10⁻⁵ (black). (d) Distribution of the phase in the mid-plane of the
cavity corresponding to the steady-state intensity distribution shown in (b) and
(c). In this picture a white pixel represents a +180° phase angle, a black pixel
represents a –180° phase angle, and the gray pixels represent the continuum of
values in between.
For the above simulation a plot of the power attenuation coefficient γ versus
iteration number is shown in Figure 31.3. γ is the ratio of the optical power
contained in the beam after a given iteration to the same quantity before the
iteration. It thus represents, for the particular mode under consideration, the
fraction of the power that survives one round trip of the beam; the fractional
loss of the cavity is 1 − γ. The steady-state
value of γ is also related to the eigenvalue of the mode under consideration; the
mode itself is an eigenfunction of the cavity. In the present example, where the
steady-state value of γ is 0.97, the losses for the lowest-order mode are indeed
very small.
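To make the meaning of the attenuation coefficient concrete, the short sketch below
converts the quoted steady-state value of 0.97 into a loss per iteration and into the
power remaining after a number of iterations. It assumes that the mode shape has already
stabilized, so that the power falls by the same factor on every iteration; the function
names are illustrative.

```python
import math

def power_after(gamma, n_iterations, p0=1.0):
    """Power left after n iterations, assuming a constant ratio gamma per iteration."""
    return p0 * gamma ** n_iterations

def iterations_to_fraction(gamma, fraction):
    """Number of iterations for the power to fall to the given fraction."""
    return math.log(fraction) / math.log(gamma)

gamma_00 = 0.97  # steady-state value quoted in the text for the 0,0 mode
print(f"loss per iteration: {100 * (1.0 - gamma_00):.1f} %")
print(f"power after 80 iterations: {power_after(gamma_00, 80):.3f}")
print(f"iterations to reach 1/e: {iterations_to_fraction(gamma_00, math.exp(-1)):.1f}")
```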
[Figure 31.3 plot: attenuation coefficient (0.6 to 1.0) versus number of iterations (0 to 80).]
Figure 31.3 Evolution of the power attenuation coefficient γ during the
simulation that led to the 0,0 mode shown in Figure 31.2. The computation stabilizes
after about 80 iterations, and the steady-state value of γ is close to 0.97.
Higher-order modes
Although the method of Fox and Li is ideally suited for computation of the
lowest-order mode of the cavity, under special circumstances (and sometimes
with the aid of special tricks) it is possible to compute some of the higher-order
modes as well. As an example, consider the initial distribution shown in
Figure 31.4(a), which consists of four identical lobes, each having the same
uniform intensity distribution. Although not shown, it is also assumed that the
phase is 0° for the pair of lobes along one diagonal and 180° for the opposite pair.
The stage is thus set for excitation of the so-called 1,1 mode of the cavity.
Figures 31.4(b)–(d) show the computed 1,1 mode obtained from the initial
distribution of Figure 31.4(a) after 64 iterations. The plot of attenuation
coefficient γ versus iteration number shown in Figure 31.5 reveals that the steady
state is reached after only about 40 iterations and that the final value of γ is 0.87. That
this value of γ is less than that for the 0,0 mode is consistent with the observation
that the 1,1 mode is more spread out and, therefore, must suffer higher truncation
losses at the apertures of the mirrors. For comparison with the steady-state dis-
tribution, two of the intermediate distributions obtained in this simulation are
shown in Figure 31.6. In this figure the intensity plots appear on the left-hand side
and the corresponding log(intensity) plots appear on the right-hand side. The
patterns in Figures 31.6(a), (b) are obtained after 6 and 17 iterations, respectively.
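An initial distribution of the kind used for the 1,1 mode can be built as follows. This
is a minimal sketch: the grid size, lobe radius, and lobe positions are assumptions chosen
for illustration, and the 0° and 180° phases are represented simply by the real amplitudes
+1 and –1.

```python
def four_lobe_seed(n=64, lobe_radius=0.2, offset=0.4):
    """n x n grid on [-1, 1]^2 holding four uniform lobes of alternating sign."""
    step = 2.0 / n
    coords = [-1.0 + (i + 0.5) * step for i in range(n)]
    grid = [[0.0 for _ in range(n)] for _ in range(n)]
    for iy, y in enumerate(coords):
        for ix, x in enumerate(coords):
            # Lobe centers sit at (+-offset, +-offset); pick the one in this quadrant.
            cx = offset if x > 0 else -offset
            cy = offset if y > 0 else -offset
            if (x - cx) ** 2 + (y - cy) ** 2 <= lobe_radius ** 2:
                # Phase 0 on one diagonal (x*y > 0), 180 degrees on the other.
                grid[iy][ix] = 1.0 if x * y > 0 else -1.0
    return grid

seed = four_lobe_seed()
lit = sum(1 for row in seed for v in row if v != 0.0)
print("illuminated pixels:", lit, "net amplitude:", sum(sum(row) for row in seed))
```

Because adjacent lobes have opposite signs, the net amplitude of such a seed is zero, so
it has no overlap with a mode that is symmetric about both axes, such as the 0,0 mode.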
[Figure 31.4 panels (a)–(d); axes x/λ0 and y/λ0, each from –300 to 300.]
Figure 31.4 Computation of the 1,1 mode of the confocal resonator in
Figure 31.1 begins with the initial distribution shown in (a). Here the four lobes
of the initial pattern have uniform and equal intensities, but the phase of each
lobe (not shown) differs from that of its adjacent lobes by 180°. The steady-state
distribution in the mid-plane of the cavity is obtained after 64 iterations. (b) Plot
of the intensity distribution in the steady state. (c) The same as (b) but showing
the logarithm of intensity on a scale of 1 (white) to 10⁻⁴ (black). (d) The
distribution of phase in the steady state. (For a description of the gray-scale see
the caption to Figure 31.2(d).)
Another example of a high-order mode is shown in Figure 31.7. Here the
starting distribution of Figure 31.7(a) has eight lobes, each having the same
uniform amplitude; the phase of the adjacent lobes alternates between 0° and
180°. After 16 iterations the distribution of Figures 31.7(b)–(d) is obtained.
Although this is very close to one of the high-order modes of the cavity, the
simulation does not converge at this point, but continues to evolve towards the 0,0
mode. Figure 31.8, which is the corresponding plot of attenuation coefficient γ
versus iteration number, clearly demonstrates the situation. Although after about 20
iterations the simulation appears to be stabilizing, small numerical errors disturb
the system and push it away from the high-order mode. We confirmed that the
[Figure 31.5 plot: attenuation coefficient (0.3 to 1.0) versus number of iterations (0 to 60).]
Figure 31.5 Evolution of the power attenuation coefficient γ during the
simulation that led to the 1,1 mode shown in Figure 31.4. The computation
stabilizes after about 40 iterations, and the final value of γ is approximately 0.87.
steady-state distribution in this case was the same as the 0,0 mode of Figure 31.2;
also notice that the steady-state value of γ in Figure 31.8 is 0.97, in agreement with
our previous estimate of γ for the lowest-order mode.
To get an idea as to how the pattern in Figure 31.7 reconfigures itself to resemble
that of the 0,0 mode, we show in Figure 31.9 an intermediate state obtained from the
initial state of Figure 31.7(a) after 33 iterations. Notice that some of the lobes have
moved towards the center and have begun to merge, giving rise to a central bright spot
which, thanks to its lower losses, will eventually overtake the higher-order mode.
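The takeover by the lower-loss mode can be illustrated with a two-mode toy model. The
sketch below assumes that the powers of the two modes simply add and that each mode is
attenuated by its own constant factor per iteration (0.73 for the high-order mode and 0.97
for the 0,0 mode, the values read from Figure 31.8). The initial power fraction in the 0,0
mode, here 10⁻⁴, stands in for the small numerical errors and is purely an assumption.

```python
def effective_gamma(n, gamma_high=0.73, gamma_low=0.97, seed=1e-4):
    """Overall power ratio at iteration n for two independently decaying modes.

    gamma_high: per-iteration power ratio of the high-order mode
    gamma_low:  per-iteration power ratio of the 0,0 mode
    seed:       ASSUMED initial power fraction in the 0,0 mode (numerical noise)
    """
    p_high = (1.0 - seed) * gamma_high ** n
    p_low = seed * gamma_low ** n
    # Power after the next iteration divided by the power before it.
    return (p_high * gamma_high + p_low * gamma_low) / (p_high + p_low)

for n in (0, 10, 20, 30, 40, 60):
    print(n, round(effective_gamma(n), 3))
```

With these assumed numbers the overall ratio stays near 0.73 at first and then rises
towards 0.97; the two modes carry equal power after roughly 32 iterations.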
Effect of misalignments and aberrations

One of the great advantages of the method of Fox and Li is that in the presence of
misalignments and other imperfections, when analytical methods become
intractable, this numerical scheme continues to be effective in calculating the
stable mode of the resonator. As an example consider the same resonant cavity as
that of Figure 31.1(a), but now suppose that one of the mirrors has two waves of
primary coma. The results of computer simulations pertaining to this case are
shown in Figures 31.10 and 31.11. Note that the stable mode in this case is a
somewhat elongated version of the 0,0 mode, exhibiting a comatic tail. Note also
that the steady-state attenuation coefficient γ is slightly reduced from its value in
the unaberrated case.
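In a simulation of this kind an aberration enters as an extra phase factor applied to the
beam at the aberrated mirror. The sketch below shows one way to write that factor for
primary coma, W(ρ, θ) = W131 ρ³ cos θ with W131 equal to two waves. The orientation of
the coma along the x-axis, the interpretation of "two waves" as the coefficient W131 of
the wavefront error, and the function name are assumptions.

```python
import cmath
import math

def coma_phase_factor(x, y, waves=2.0):
    """Complex factor exp(i 2 pi W) with W = waves * rho^3 * cos(theta).

    x, y are pupil coordinates normalized so that rho = 1 at the mirror's rim.
    The coma is oriented along x (an assumption; the text does not specify it).
    """
    rho_sq = x * x + y * y
    if rho_sq > 1.0:
        return 0.0 + 0.0j              # outside the clear aperture
    w = waves * rho_sq * x             # rho^3 * cos(theta) equals rho^2 * x
    return cmath.exp(2j * math.pi * w)

print(coma_phase_factor(0.0, 0.5))     # no coma phase along the y-axis
print(coma_phase_factor(0.5, 0.0))     # phase of 2 * 0.125 waves along x
```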
[Figure 31.6 panels (a1), (a2), (b1), (b2); axes x/λ0 and y/λ0, each from –300 to 300.]
Figure 31.6 Intermediate patterns of intensity distribution in the cavity’s
mid-plane during the computation of the 1,1 mode shown in Figure 31.4. For each
intensity plot on the left-hand side the corresponding logarithmic plot is shown
on the right-hand side. The scale of the logarithmic plots is from 1 (white) to
10⁻² (black). (a) After six iterations; (b) after 17 iterations.
[Figure 31.7 panels (a)–(d); axes x/λ0 and y/λ0, each from –300 to 300.]
Figure 31.7 Computed results for a high-order mode of the confocal resonator
shown in Figure 31.1. The assumed initial distribution in the cavity’s mid-plane
has eight lobes of uniform amplitude, as shown in (a), but its phase distribution
(not shown) alternates between 0° and 180° from lobe to adjacent lobe.
(b) Intensity distribution in the cavity’s mid-plane after 16 iterations. (c) Same
as (b), but showing the logarithm of intensity on a scale of 1 (white)
to 10⁻³ (black). (d) Distribution of phase in the mid-plane of the cavity after
16 iterations, corresponding to the intensity patterns in (b) and (c). (For a
description of the gray-scale see the caption to Figure 31.2(d).)
[Figure 31.8 plot: attenuation coefficient (0.5 to 1.0) versus number of iterations (0 to 60).]
Figure 31.8 Evolution of the power attenuation coefficient γ during the
simulation that started with the distribution of Figure 31.7(a) and went through
the state shown in Figures 31.7(b)–(d). At first the simulation appears to stabilize
with γ around 0.73, but instability sets in after about 25 iterations, forcing the
system towards the 0,0 mode and a value of γ ≈ 0.97.
[Figure 31.9 panels (a)–(c); vertical axis y/λ0 from –300 to 300.]
Figure 31.9 Distributions of (a) intensity, (b) log (intensity), and (c) phase at
the cavity’s mid-plane after a total of 33 iterations, starting in the initial state of
Figure 31.7(a). This is a snap-shot from an intermediate state in the simulation
whose other results are depicted in Figures 31.7 and 31.8. Note that four of the
lobes have moved towards the center and started to merge into a bright central
spot. This is the spot that will eventually become the dominant 0,0 mode.
[Figure 31.10 panels (a)–(c); axes x/λ0 and y/λ0, each from –300 to 300.]
Figure 31.10 Computing the lowest-order mode of the confocal resonator of
Figure 31.1 when one of the mirrors has two waves of primary coma. The
assumed initial distribution has uniform amplitude and constant phase across a
wide, circular aperture, as shown in Figure 31.2(a). (a) Computed intensity
distribution at the mid-plane of the cavity, obtained after 64 iterations. (b) Same
as (a) but showing the logarithm of intensity on a scale of 1 (white) to 10⁻⁵
(black). (c) Distribution of phase in the mid-plane of the cavity corresponding to
the steady-state intensity distribution shown in (a) and (b). (For a description of
the gray-scale see the caption to Figure 31.2(d).)
[Figure 31.11 plot: attenuation coefficient (0.65 to 1.00) versus number of iterations (0 to 60).]
Figure 31.11 Evolution of the power attenuation coefficient γ during the
simulation that led to the stable mode shown in Figure 31.10. The computation
stabilizes after about 30 iterations, and the steady-state value of γ is close to 0.95.
References to Chapter 31
1 A. E. Siegman, An Introduction to Lasers and Masers, McGraw-Hill, New York
(1971).
2 H. Kogelnik and T. Li, Laser beams and resonators, Proc. IEEE 54, 1312 (1966).
3 A. G. Fox and T. Li, Resonant modes in a maser interferometer, Bell Syst. Tech. J.
40, 453 (1961).
4 A. G. Fox and T. Li, Modes in a maser interferometer with curved and tilted mirrors,
Proc. IEEE 51, 80 (1963).
32
The beam propagation method†
The beam propagation method (BPM) is a simple numerical algorithm for
simulating the propagation of a coherent beam of light through a dielectric
waveguide (or other structure).1 Figure 32.1 shows the split-step technique used in the
BPM, in which the diffraction of the beam and the phase-shifting action of the guide
are separated from each other in repeated sequential steps, of separation Δz.
One starts a BPM simulation by defining an initial cross-sectional beam profile in the
XY-plane. The beam is then propagated (using classical diffraction formulas) a short
distance Δz along the Z-axis before being sent through a phase/amplitude mask. The
properties of the mask are derived from the cross-sectional profile of the waveguide
(or other structure) in which the beam resides. The above steps of diffraction
followed by transmission through a mask are repeated until the beam reaches its
destination or until one or more excited modes of the guide become stabilized.2,3
Instead of propagating continuously along the length of the guide, the beam in
BPM travels for a short distance in a homogeneous isotropic medium, which has
the average refractive index of the guide but lacks the guide’s features (e.g., core,
cladding, etc.). After this diffraction step, a phase/amplitude mask is introduced
in the beam path. To account for the refractive index profile of the guide, the
mask must phase-shift certain regions of the beam relative to others. The mask
must also adjust the beam’s amplitude distribution to simulate the effects of
regions that absorb or amplify, when the guide happens to contain such regions.
A good approximation to the real physical situation is obtained in the limit
when Δz → 0 and the phase/amplitude modulation imparted by the mask is scaled
in proportion to Δz. In practice, the BPM works quite well without the need to
make Δz excessively small. The various examples presented in this chapter
should make the capabilities of the BPM abundantly clear.
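The split-step procedure described above can be sketched in one transverse dimension as
follows. This is a minimal illustration and not the code used for the simulations in this
chapter: it uses a small radix-2 FFT written out in plain Python, an angular-spectrum
diffraction step, and lengths measured in units of the wavelength λ inside the medium.
The function names, the grid (256 points spaced λ/4 apart), the Gaussian input beam, and
the linear absorber taper are all assumptions; the core width of 5λ, the step Δz = 2.5λ,
and the 7.5° core phase per step are borrowed from the fiber example later in the chapter.

```python
import cmath
import math

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def bpm_1d(field, mask, dx, dz, steps):
    """Alternate a diffraction step of length dz with multiplication by the mask."""
    n = len(field)
    transfer = []
    for k in range(n):
        fx = (k if k < n // 2 else k - n) / (n * dx)      # spatial frequency
        kz = 2.0 * math.pi * cmath.sqrt(1.0 - fx * fx)    # imaginary if evanescent
        transfer.append(cmath.exp(1j * kz * dz))
    powers = []
    for _ in range(steps):
        spectrum = [s * h for s, h in zip(fft(field), transfer)]
        field = [v / n for v in fft(spectrum, inverse=True)]   # diffraction
        field = [v * m for v, m in zip(field, mask)]           # phase/amplitude mask
        powers.append(sum(abs(v) ** 2 for v in field) * dx)
    return field, powers

n, dx, dz = 256, 0.25, 2.5
x = [(j - n // 2) * dx for j in range(n)]
core = cmath.exp(1j * math.radians(7.5))       # 3 degrees per lambda times 2.5 lambda

def transmission(r):
    """ASSUMED linear taper: 1 for r <= 10, falling to 0 at r = 15 and beyond."""
    return max(0.0, min(1.0, (15.0 - r) / 5.0))

mask = [(core if abs(v) <= 2.5 else 1.0) * transmission(abs(v)) for v in x]
field0 = [cmath.exp(-(v / 3.0) ** 2) for v in x]           # assumed Gaussian input
field, powers = bpm_1d(field0, mask, dx, dz, steps=200)
print("power after first and last step:", round(powers[0], 4), round(powers[-1], 4))
```

Since neither the diffraction step nor the mask can add energy, the power recorded after
each step never increases; whatever is not guided by the core is removed by the absorber.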
† The coauthors of this chapter are Ewan M. Wright and Mahmoud Fallahi of the College of
Optical Sciences, University of Arizona.
[Figure 32.1 diagram: alternating Diffraction and Mask stages along the propagation
direction, with spacing Δz.]
Figure 32.1 The split-step technique used in the BPM. Instead of continuously
propagating the beam in an inhomogeneous environment, the method alternates
between diffracting the beam a short distance through a homogeneous medium
and then modulating its phase/amplitude through a mask. The mask imparts to
the incident beam the cumulative effect of phase shifts and amplitude attenua-
tions (or amplifications) during each propagation step.
Single-mode step-index fiber

Figure 32.2 shows a basic setup for injecting a laser beam into an optical fiber.
The Gaussian beam of the laser diode is captured and truncated by the lens, then
focused onto the entrance facet of the fiber. The focused beam typically loses
about 4% of its power to reflection at the cleaved facet, but the remaining 96%
enters the fiber. A fraction of this optical energy couples into the fiber’s propa-
gating mode and travels along the axis in the Z-direction; the rest radiates away
from the axis and disappears in the region beyond the cladding.
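The 4% figure follows from the Fresnel reflectance at normal incidence. The sketch below
evaluates it for an air-to-glass interface, taking the refractive index n = 1.5 assumed
for the fiber in this chapter. Treating the focused beam as normally incident is an
approximation, and the function name is illustrative.

```python
def normal_incidence_reflectance(n1, n2):
    """Fresnel power reflectance at normal incidence between indices n1 and n2."""
    return ((n1 - n2) / (n1 + n2)) ** 2

r = normal_incidence_reflectance(1.0, 1.5)     # air to glass
print(f"reflected: {100 * r:.1f} %, transmitted: {100 * (1 - r):.1f} %")
```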
For a 0.2NA diffraction-limited lens, Figure 32.3(a) shows the computed
intensity profile of the focused spot in the XY-plane immediately after entering
the fiber. The average refractive index n of the silica glass fiber is assumed to be
1.5, and so the wavelength λ of the light in this medium is λ0/n, where λ0 is the
vacuum wavelength. The light amplitude distribution of Figure 32.3(a) serves in
this example as the initial distribution for the BPM.
To simulate a single-mode step-index fiber, we use the phase mask of Figure 32.3(b)
and choose Δz = 2.5λ. The assumed core and cladding diameters are 5λ and 30λ,
respectively, and the index difference of Δn = ncore − nclad = 0.0125 results in a 3° phase
shift per distance λ of propagation. The mask depicted in Figure 32.3(b) is therefore
required to advance the phase by 7.5° in its core region (relative to the cladding) during
each BPM step. The light amplitude distribution inside the fiber reaches a steady
state after a few hundred iterations; the intensity profile shown in Figure 32.3(c) is
obtained at z = 1500λ. The mesh used in this simulation had 512 × 512 pixels, and
the entire computation on a modern personal computer took less than one hour.
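The phase values quoted above can be checked with a few lines of arithmetic. Over one
in-medium wavelength λ = λ0/n, an index difference Δn adds a phase of 360° × Δn/n. The
sketch below uses n = 1.5 as assumed earlier for the fiber; the function name is
illustrative.

```python
def phase_shift_deg_per_wavelength(delta_n, n_avg):
    """Extra phase (degrees) accumulated in the core per in-medium wavelength."""
    return 360.0 * delta_n / n_avg

per_lambda = phase_shift_deg_per_wavelength(0.0125, 1.5)   # about 3 degrees
per_step = per_lambda * 2.5        # mask phase for one step of 2.5 wavelengths
steps_to_1500 = 1500 / 2.5         # number of BPM steps needed to reach z = 1500 wavelengths
print(per_lambda, per_step, steps_to_1500)
```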
[Figure 32.2 diagram: laser diode, lens, and fiber along the Z-axis; coordinate axes X, Y, Z.]
Figure 32.2 The emergent beam from a semiconductor laser diode is captured
by a lens and focused onto the cleaved facet of a fiber. The numerical aperture of
the focused cone of light is NA = sin θ, where θ is the half-angle of the cone of
light arriving at the fiber. A small fraction of the incident beam is typically lost
by reflection from the facet, while the remaining light penetrates the fiber,
entering the core and cladding. Depending on the modal structure of the fiber and
the cross-sectional profile of the injected beam, a certain fraction of the input
optical power is coupled into the guided mode, which will then propagate along
the fiber’s axis. The remaining (uncoupled) light radiates away from the core and
is lost in the surrounding regions.
The fraction of the beam that radiates away from the core during a BPM
simulation should not be allowed to reach the mesh boundary. The reason is that
the periodic boundary condition imposed on the mesh by the fast Fourier trans-
form (FFT) algorithm used in diffraction calculations tends to return the radiation
modes into the computational region, via aliasing. In the present simulation we
solved this problem by our choice of the mask, which, in addition to a core and
cladding, contains a strongly absorbing region beyond the cladding (transmission
coefficient zero for r > 15λ). The transition from cladding to absorber is tapered
to minimize back-reflections into the core and cladding.
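A tapered absorber of this kind can be expressed as a radial amplitude-transmission
profile that multiplies the mask. In the sketch below the transmission is 1 inside the
cladding, 0 for r > 15λ, and rolls off smoothly just inside that radius. The
raised-cosine shape and the 3λ taper width are assumptions, since the text does not give
the taper's form.

```python
import math

def absorber_transmission(r, r_clad=15.0, taper=3.0):
    """Amplitude transmission per BPM step versus radius r (units of lambda).

    Returns 1 well inside the cladding and 0 for r >= r_clad, with an ASSUMED
    raised-cosine roll-off of width `taper` just inside r_clad.
    """
    if r >= r_clad:
        return 0.0
    if r <= r_clad - taper:
        return 1.0
    return 0.5 * (1.0 + math.cos(math.pi * (r - (r_clad - taper)) / taper))

for r in (0.0, 12.0, 13.5, 14.5, 15.0, 20.0):
    print(r, round(absorber_transmission(r), 3))
```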
Figure 32.4(a) is a cross-sectional plot of the stabilized light amplitude dis-
tribution in the fiber; the vertical broken lines mark the core–cladding boundary.
This guided mode is essentially trapped in the core but has evanescent tails in the
cladding. The significant penetration of the evanescent waves into the cladding is
a direct consequence of the small index-contrast Δn chosen for this particular
example.2,3
Figure 32.4(b) shows the power content of the beam versus z throughout the
simulation. (The power at any given point along the Z-axis is obtained by
integrating the beam’s intensity over the entire cross-sectional area in the
XY-plane.) The power of the beam, which is set to unity at the entrance pupil
of the lens (see Figure 32.2), drops to 0.96 immediately after the beam enters
the fiber. It then drops rapidly until z ≈ 1000λ while the beam adjusts itself to
[Figure 32.3 panels (a)–(c); horizontal axis x/λ from –20 to 20.]
Figure 32.3 (a) Logarithmic plot of intensity distribution immediately after
the front facet of a silica glass fiber, produced by a laser beam of wavelength λ0
focused through a 0.2NA lens. (b) Phase mask used in the BPM simulation of
a single-mode step-index fiber. The core and cladding diameters are 5λ and
30λ, respectively, and the phase shift imparted by the core relative to that
imparted by the cladding is 3° per λ; here λ is the wavelength inside the fiber.
The region beyond the cladding is absorptive. (c) Logarithmic plot of intensity
distribution in the cross-section of the fiber at a distance z = 1500λ from the
fiber’s front facet.
the fiber, shedding some of its energy by radiating into the absorption region.
The slight decline of optical power for z > 1000λ is partly due to the slow decay
of the radiative modes. Nonetheless, the curve in Figure 32.4(b) continues to
exhibit a small negative slope even after a long propagation distance. This
[Figure 32.4 plots: (a) normalized amplitude (0 to 1) versus x/λ (–15 to 15); (b) power
(0.80 to 1.00) versus propagation distance z/λ (0 to 5000).]
Figure 32.4 (a) Computed amplitude of the stable mode along the X-axis
for the single-mode fiber of Figures 32.2 and 32.3 (the vertical lines denote
the boundary between core and cladding). The mode stabilizes at about
z = 1500λ and does not change afterwards. (b) Computed optical power
along the length of the fiber when the incident power at the front facet is set
to unity.
behavior is indicative of the presence of a small loss factor despite the fact that
the simulated guide is lossless. So long as the diffraction step is treated non-
paraxially, this small loss factor, which is a consequence of the discrete
approximation to an inherently continuous problem, will remain an unavoidable
feature of the BPM.3
Fiber with a complex core structure

Figure 32.5 shows the cross-section of a special fiber with a core that contains 19
low-index filaments symmetrically arranged around its axis. Such structures,
known as photonic crystals or photonic bandgap structures, currently command
worldwide attention because of their unique optical properties.
For our BPM simulation of the fiber depicted in Figure 32.5 we chose core
and cladding diameters of 50λ and 100λ, respectively, and an index contrast of
Δn = 0.0125, while placing a tapered absorber outside the cladding region. The
core filaments each had diameter 4λ and the same index of refraction as the
cladding. A uniform beam, having a circular cross-section of diameter 40λ and
unit optical power, was used as the initial distribution. Figure 32.6 shows the
various cross-sectional patterns of intensity obtained along a propagation path
5000λ long. It is clear that several modes of the fiber have been excited and that
interference among these modes gives rise to the observed patterns. Note also that
the light tends to avoid the low-index filaments at all times. The beam’s power
content, plotted versus z in Figure 32.7, is seen to decline slowly with the
[Figure 32.5: phase mask; horizontal axis x/λ from –60 to 60.]
Figure 32.5 Phase mask used in simulating a special fiber containing 19
low-index filaments within its core. The core and cladding radii are 25λ and 50λ,
respectively, and the region beyond the cladding is absorptive. The filaments
each have a diameter of 4λ and the same refractive index as the cladding. All
transition regions are tapered. The phase shift imparted to the beam by the
high-index regions of the core (relative to the low-index cladding and the
filaments) is 3° per λ.
propagation distance. The power stabilizes when the radiative modes leave the
core and cladding to disappear into the surrounding absorber; however, a small
negative slope similar to the one mentioned in connection with Figure 32.4(b)
remains even after stabilization.
Y-branch beam-splitter
Figure 32.8 is a diagram of a Y-branch channel waveguide. This structure is
typically embedded in a lower-index medium that plays the role of cladding.2 The
beam, injected on the left-hand side, establishes in the initial section a guided
mode that propagates along the Z-axis; in Figure 32.8 the length of this initial
section of the guide is z1. The waveguide then slowly opens up over a distance z2,
and the beam follows this expansion adiabatically (i.e., without significant loss of
power and without exciting higher-order modes). Once the guide has been suf-
ficiently broadened, it splits into two channels that slowly recede from each other
over a distance z3 until they are optically isolated. Afterwards the two channels
may remain parallel for a distance z4. Thus a beam injected into the initial section
of the guide will split into two beams, each of which may be extracted from a separate
channel.
A set of phase masks for simulating a symmetric Y-splitter is shown in
Figure 32.9(a). From top to bottom these masks represent the initial section of
[Figure 32.6 panels (a)–(i); horizontal axis x/λ from –30 to 30.]
Figure 32.6 Computed intensity profiles in the core region of the fiber of
Figure 32.5 obtained in a BPM simulation. The assumed split steps are of length
Δz = 2.5λ. The initial distribution is a uniform circularly symmetric beam of
diameter 40λ, which enters the fiber at z = 0. The distributions in (a) to (i)
correspond to propagation distances z/λ = 500, 1250, 1750, 2500, 3250, 3750,
4000, 4500, 5000.
the guide (5λ × 5λ square), the end of the expanded region in which the width of
the mask increases to 10λ, and the length along the split section where the
center-to-center separation of the two channels slowly increases from zero to
45λ (branching angle = 1.15°). The assumed lengths of the various sections are
z1 = z2 = 1000λ and z3 = z4 = 2000λ (see Figure 32.8). Each mask imparts a 15°
phase shift to the incident beam after a propagation step of Δz = 5λ; this
corresponds to a 3° phase shift per λ. For the initial distribution at z = 0 we chose a
uniform beam having a circular cross-section of diameter 14λ. The output of the
device at z = 6000λ is shown in Figure 32.9(b). The intensity profile in the broad
section of the guide at z = 2000λ (just before branching) is shown in Figure 32.9(c),
while a plot of the phase distribution at the same location appears in Figure 32.9(d).
[Figure 32.7 plot: power (0.90 to 1.00) versus propagation distance z/λ (0 to 5000).]
Figure 32.7 Total power of the beam versus propagation distance along the
length of the fiber for the BPM simulation depicted in Figure 32.6. The incident
power at z = 0 is set to unity.
[Figure 32.8 diagram: incident beam entering a Y-branch guide with successive sections
of length z1, z2, z3, z4.]
Figure 32.8 Core region of a Y-branch channel waveguide, which is
typically embedded in a lower-index cladding. For adiabatic operation the
initial section of the guide (length z1) is slowly broadened over a distance z2
before splitting into two branches. The branches then move apart at a small
angle (typically less than 1°) over a distance z3 until they are optically
isolated from each other. Afterwards the two channels remain parallel
for a distance z4.
The phase plot indicates the propagation of the radiative modes away from the
core and into the absorbing region beyond the cladding.
Figure 32.10 shows the computed amplitude distributions at several cross-
sections of the Y-splitter of Figure 32.8. Evident in this picture is the evolution of
the guided mode from a narrow beam in the initial section of the guide to a wider
beam in the broadened section, and onwards to a pair of well-confined beams in
the divided channel. The power content of the entire beam is plotted versus z in
Figure 32.11, indicating the losses in various sections of the guide and confirming
the stabilization of power in the output channels once they are sufficiently
separated from each other.
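The mask parameters quoted for the Y-splitter (a 15° phase shift per step of Δz = 5λ) can be translated into an index contrast between core and cladding. The following is a minimal sketch, assuming that λ is the wavelength inside the cladding and that each mask simply accounts for the extra optical path accumulated in the core over one step, so that the phase shift is 2π[(n_core − n_clad)/n_clad](Δz/λ). The function name is illustrative and does not appear in the text.

```python
def relative_index_step(phase_per_step_deg, step_in_wavelengths):
    """Fractional index contrast (n_core - n_clad)/n_clad implied by a phase mask.

    Assumes the mask phase equals the extra optical path of the core over one
    step, with the step measured in cladding wavelengths.
    """
    phase_per_wavelength_deg = phase_per_step_deg / step_in_wavelengths
    return phase_per_wavelength_deg / 360.0


# Y-splitter masks: 15 degrees per step of 5 wavelengths (3 degrees per wavelength).
y_splitter = relative_index_step(15.0, 5.0)
# Masks quoted at 7.5 degrees per step of 2.5 wavelengths give the same contrast.
finer_step = relative_index_step(7.5, 2.5)

print(f"contrast for 15 deg per 5 wavelengths:    {y_splitter:.5f}")
print(f"contrast for 7.5 deg per 2.5 wavelengths: {finer_step:.5f}")
```

Under these assumptions a 3° phase shift per λ corresponds to a fractional index step of a little under 1%, that is, a weakly guiding structure.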
Figure 32.9 (a) A set of phase masks used to simulate the Y-splitter of
Figure 32.8. From top to bottom: at the start of the guide; at the end of the
expanded region; at three locations in the split section. (b) The computed
intensity pattern at the end of the guide, z = 6000λ, showing two output beams
confined to their respective channels. (c), (d) Intensity and phase distributions in
the broad section of the guide, just before branching. In the phase plot the
remnants of the incident beam, which are not coupled into a guided mode, are
seen to be radiating away from the core. (Horizontal axes: x/λ from −30 to 30
in (a) and (b), and from −12 to 12 in (c) and (d).)
Directional coupler
Figure 32.12 shows a channel waveguide known as a directional coupler, which
has applications such as switching in optical communication systems.4 A beam of
light injected into channel 1 propagates along that channel until it reaches a point
where channel 2 is close enough to sense the evanescent tail of the guided mode.
At this point the beam leaks into channel 2 and, after a certain distance, moves
entirely into the second channel. If the parallel section of the guide is long
enough, the back and forth coupling between the two channels may be repeated
many times. In this region of strong coupling, the lowest-order modes of the
guide are the even and odd modes depicted in Figure 32.12. Because these modes
travel at different speeds, their relative phase φ1 − φ2 varies with distance along
Figure 32.10 Plots of amplitude distribution at various cross-sections of the
Y-splitter depicted in Figures 32.8 and 32.9. (a) The initial distribution at z = 0 is
uniform, having a circular cross-section of diameter 14λ. (b) At z = 1000λ (solid
line), the end of the single-mode input channel, the beam is confined to the core
region. At z = 2000λ (broken line), just before branching, the broadened beam is
seen to fit into the wider channel. (c) Emerging from the guide at z = 6000λ are
two identical beams.
the guide. The beam resides entirely in channel 1 or channel 2 when φ1 − φ2 is 0°
or 180°. When φ1 − φ2 = 90° the two channels contain equal amounts of light,
albeit with a 90° relative phase. Eventually the two channels recede from each
other, and the beam stays in the guide in which it was residing just before the
separation.
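The link between the relative phase of the even and odd modes and the division of power between the channels can be sketched with a two-mode model. This is a minimal illustration, assuming that the even and odd modes are excited with equal amplitude and that each is an equal-weight sum or difference of the two isolated channel modes. The function name and this idealization are assumptions made here; they are not the BPM computation used for the figures.

```python
import cmath
import math


def channel_amplitudes(relative_phase_deg):
    """Complex amplitudes in channels 1 and 2 for a given even/odd phase lag.

    Even mode ~ (ch1 + ch2), odd mode ~ (ch1 - ch2), launched with equal weight
    so that all the light starts in channel 1 at zero relative phase.
    """
    lag = cmath.exp(1j * math.radians(relative_phase_deg))
    a1 = 0.5 * (1 + lag)  # even and odd contributions add in channel 1
    a2 = 0.5 * (1 - lag)  # even and odd contributions subtract in channel 2
    return a1, a2


for phase in (0.0, 90.0, 180.0):
    a1, a2 = channel_amplitudes(phase)
    print(f"{phase:6.1f} deg: P1 = {abs(a1) ** 2:.3f}, P2 = {abs(a2) ** 2:.3f}")
```

In this model the power fractions are cos²[(φ1 − φ2)/2] and sin²[(φ1 − φ2)/2], and at a relative phase of 90° the two channel amplitudes have equal magnitude and differ in phase by 90°, in line with the description above.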
Figure 32.11 Power content of the beam versus z/λ in the BPM simulation of
the Y-splitter depicted in Figures 32.9 and 32.10. The arrows indicate the
beginning of the split section, the end of the split section, and the location where
the split channels stop receding from each other and become parallel.
Figure 32.12 A directional coupler allows the coupling of light between
adjacent waveguides. Each guide’s cross-section in this example is a 5λ × 5λ
square, and both channels are embedded in a cladding of lower refractive index.
Between z = 0 and z = 750λ the separation between the guides is fixed at 9λ; it
then decreases continuously to 3λ by z = 2250λ, and remains fixed at 3λ until
z = 10⁴λ. In the coupling region, where the guides are close together, the lowest-
order modes of the two channels (taken together as a single waveguide) are the
displayed even and odd modes.
Figure 32.13 (a) Phase masks used in the BPM simulation of the directional
coupler of Figure 32.12. Each mask, which consists of a pair of 5λ × 5λ
square apertures, imparts a phase shift of 7.5° to the beam at the end of
each propagation step of Δz = 2.5λ. (b) Logarithmic plot of the intensity
distribution created by a 0.2NA lens at the entrance to channel 1. The
wavelength λ is that inside the cladding material, and the effect of losses
incurred upon reflection from the front facet of the waveguide is included in
this picture.
Figure 32.13(a) displays the phase masks used in our BPM simulation of the
directional coupler shown in Figure 32.12. The initial intensity distribution
produced by a 0.2NA lens at the entrance to channel 1 is shown in Figure 32.13(b).
Figure 32.14 shows several intensity plots at various cross-sections of the guide,
demonstrating the transfer of light between the two channels. Figure 32.15 is a
plot of the phase distribution at a location along the guide where the two channels
carry equal amounts of optical power; the plot indicates the existence of a 90°
phase difference between the two channels. Figure 32.16 shows the amplitude
distributions at several cross-sections of the guide. Figure 32.17 shows the power
content of each channel versus z, indicating several oscillations of the power
between the two channels.
Figure 32.14 Computed plots of intensity distribution at various cross-sections
along the directional coupler depicted in Figures 32.12 and 32.13. From top to
bottom: z/λ = 750, 2250, 2500, 2750, 3250, 3750, 4000, 4250.
Multimode interference device

Figure 32.18 is a diagram of a multimode interference (MMI) device used as a
three-way power splitter. This device consists of an input channel, a wide
(multimode) section, and three output channels. The single-mode input guide
carries the incident beam to the multimode region, where the beam suddenly
expands, exciting the various modes of the broad waveguide. The ensuing
interference among these modes creates periodic patterns of intensity and phase
Figure 32.15 The computed phase distribution at z = 4000λ in the BPM
simulation of the directional coupler depicted in Figure 32.14, showing a 90°
phase difference between the two channels at this location.
Figure 32.16 Plots of the light amplitude distribution at various cross-sections
of the directional coupler depicted in Figures 32.12–32.15. At z = 750λ (solid
line) a single guided mode is established in channel 1. At z = 2250λ (broken and
double dotted line) the two channels have come close to each other, and some of
the light has already leaked into channel 2. At z = 2500λ (dotted line) the power
contents of the two channels are nearly equal. At z = 3250λ (broken line) the
beam has all but moved into channel 2.
Figure 32.17 Power content versus z for each channel in the directional
coupler of Figure 32.12. The incident focused laser beam loses about 4% of
its power upon entering the front facet of the guide, and another 14% while
establishing itself in channel 1. When the two channels slowly approach
each other, the total power in the guide does not change appreciably, but it
begins to couple out of channel 1 and into channel 2. By the time the
separation of the channels has reached 3λ, a fraction of the beam already
resides in channel 2. The oscillation of optical power between the
two channels continues as long as they remain close to each other. The
arrows at the top of the figure mark the locations of the intensity plots of
Figure 32.14.
at specific locations along the Z-axis. This behavior is reminiscent of the
Talbot effect, and in fact its explanation rests on the same principles (see
Chapter 24, “The Talbot effect”). At a particular distance L from the port of
entry, the beam breaks up into several bright spots of equal intensity. If access
channels are placed at this location they carry away the resulting isolated
beams.5,6
Figure 32.19 shows the phase masks used in the BPM simulation of the MMI
device depicted in Figure 32.18. At the top of the figure is the cross-section of
the 5λ × 5λ input channel (length 750λ), in the middle is the 45λ × 5λ multimode
section of the guide (length 3000λ), and at the bottom are the cross-sections of
Figure 32.18 In a multimode interference (MMI) device the beam carried by a
single-mode channel suddenly expands into a broad, multimode section of length
L, width W, and thickness D. The many modes of the broad waveguide thus
excited propagate at different speeds along the Z-axis, their interference giving
rise to complex patterns of intensity distribution confined within the guide’s
cross-section in the XY-plane. Access guides placed at the end of the multimode
section carry away the concentrated optical energy localized in isolated bright
spots at z = L.
Figure 32.19 Phase masks used in the BPM simulation of the 1 × 3 splitter
depicted in Figure 32.18. Each mask imparts a phase shift of 7.5° to the beam at
the end of each propagation step of Δz = 2.5λ.
the three 5λ × 5λ output channels (length 1250λ). Computed intensity profiles at
several cross-sections of this device are shown in Figure 32.20. Depending on
the distance from the port of entry, the guide’s width W, and the wavelength λ,
interference among the excited modes can give rise to a number of different
intensity patterns. In the present example, the chosen parameters of the multimode
section (L = 3000λ, W = 45λ) result in a three-way splitting of the input
optical power. The computed intensity pattern at the end of the output channels
appears in the bottom frame of Figure 32.20.
Figure 32.20 Computed plots of intensity distribution in the MMI device
of Figures 32.18 and 32.19, showing (from top to bottom) the single-mode beam
in the input channel just before entering the multimode section at z = 0, the
distribution of light in the multimode region at z/λ = 250, 750, 1125, 1775, 2250,
and at the end of the multimode section at z/λ = 3000. The bottom frame shows
the intensity distribution emerging from the three output channels. (The initial
distribution entering the input channel at z = −750λ was uniform and had a
circular cross-section of diameter 10λ.)

1 M. D. Feit and J. A. Fleck, Computation of mode properties in optical fiber
waveguides by the propagating beam method, Applied Optics 19, 1154 (1980);
Analysis of rib waveguides and couplers by the propagating beam method, J. Opt.
Soc. Am. A 7, 73–79 (1990).
2 T. Tamir, ed., Guided-wave Optoelectronics, 2nd edition, Springer-Verlag, Berlin, 1990.
3 D. Marcuse, Theory of Dielectric Optical Waveguides, 2nd edition, Academic Press,
New York, 1991.
4 C. R. Pollock, Fundamentals of Optoelectronics, R. D. Irwin, Chicago, 1995.
5 O. Bryngdahl, Image formation using self-imaging techniques, J. Opt. Soc. Am. 63,
416–419 (1973).
6 R. Ulrich, Image formation by phase coincidences in optical waveguides, Optics
Communications 13, 259–264 (1975).
33
Launching light into a fiber
A typical single-mode silica glass fiber has a mode profile that is well approximated
by a Gaussian beam. At λ = 1.55 μm, this Gaussian mode has a (1/e²
intensity) diameter of 10 μm. One method of launching light into a fiber calls
for placing the polished end of the fiber in contact with (or close proximity to) the
polished end of another, signal-carrying fiber that has a matching mode profile.
Alternatively, a coherent beam of light may be focused directly onto the polished
end of the fiber. If the focused spot is well aligned with the fiber’s core and has
the same amplitude and phase distribution as the fiber’s mode profile, then the
launched mode will carry the entire incident optical power into the fiber. In
general, however, the focused spot is neither perfectly matched to the fiber’s
mode, nor is it completely aligned with the core. Under these circumstances, only
a certain fraction of the incident optical power will be launched into the fiber. The
numerical value of this fraction, commonly referred to as the coupling efficiency,
will be denoted by η throughout this chapter.
It is well known that the strength of the launched mode may be computed by
evaluating the overlap integral between the mode profile and the (complex) light
amplitude distribution that arrives at the polished facet of the fiber.1,2,3 The
problem of computing the coupling efficiency η is thus reduced to determining
the light amplitude distribution immediately in front of the fiber. In what follows,
we will evaluate the performance and tolerances of three different lenses
designed for coupling a collimated beam of light into a single-mode fiber.
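As an illustration of the overlap integral, the following minimal sketch evaluates η for the simplest possible case: a perfectly aligned, flat-phase Gaussian spot falling on a Gaussian fiber mode. The normalization used here, η = |∫E1E2 r dr|² / (∫E1² r dr · ∫E2² r dr), the function names, and the choice of spot sizes are assumptions made for the example. A general (complex, misaligned) field would require a two-dimensional integral with one of the fields conjugated.

```python
import math


def gaussian_field(r, w):
    """Field amplitude of a Gaussian whose 1/e^2 intensity radius is w."""
    return math.exp(-(r / w) ** 2)


def coupling_efficiency(w_spot, w_mode, r_max=None, n=20000):
    """Overlap-integral efficiency for two aligned, flat-phase Gaussians.

    eta = (int E1 E2 r dr)^2 / (int E1^2 r dr * int E2^2 r dr),
    evaluated with the midpoint rule on 0 <= r <= r_max.
    """
    if r_max is None:
        r_max = 8.0 * max(w_spot, w_mode)
    dr = r_max / n
    cross = p_spot = p_mode = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        e1 = gaussian_field(r, w_spot)
        e2 = gaussian_field(r, w_mode)
        cross += e1 * e2 * r * dr
        p_spot += e1 * e1 * r * dr
        p_mode += e2 * e2 * r * dr
    return cross ** 2 / (p_spot * p_mode)


w_mode = 5.0  # micrometres: 1/e^2 radius of the 10 um diameter fiber mode
for w_spot in (5.0, 7.5, 10.0):
    closed_form = (2 * w_spot * w_mode / (w_spot ** 2 + w_mode ** 2)) ** 2
    eta = coupling_efficiency(w_spot, w_mode)
    print(f"w_spot = {w_spot:4.1f} um: eta = {eta:.4f} (closed form {closed_form:.4f})")
```

For two aligned Gaussians the integral has the closed form η = [2w1w2/(w1² + w2²)]², which the numerical result reproduces; η reaches unity only when the spot radius equals the mode radius.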
Radial GRIN lens

The first lens to be studied is a radial gradient-index (GRIN) lens, shown
schematically in Figure 33.1. This lens has plane surfaces on its front and
rear sides, which may be antireflection coated to reduce ordinary reflection losses
at both facets. The lens diameter is 3.0 mm, its length is L = 7.89 mm, and its
Figure 33.1 Radial gradient-index (GRIN) lens designed to focus a collimated
beam of light into a single-mode fiber attached to its rear facet. In our simulations
the lens has diameter 3.0 mm and length L = 7.89 mm. The single-mode silica
glass fiber has a Gaussian mode profile with 1/e² (intensity) diameter of 10 μm.
refractive index profile is n(r) = n0[1 − q(r/rmax)²], where n0 = 1.5901, q = 0.04455,
and rmax = 1.5 mm. The lens is permanently affixed to a single-mode fiber whose
guided mode diameter (at the 1/e² intensity point) is 10 μm.
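The quoted lens length can be checked against the index profile. The sketch below is a paraxial estimate only, assuming that rays inside the lens obey r″ = −g²r with g² = 2q/rmax², so that a collimated beam comes to focus after a quarter of the ray period, at a distance π/(2g). The variable names are illustrative and do not appear in the text.

```python
import math

n0 = 1.5901      # on-axis refractive index
q = 0.04455      # index-profile coefficient
r_max = 1.5      # mm, lens radius
length = 7.89    # mm, lens length quoted in the text

# Paraxial rays in n(r) = n0*[1 - q*(r/r_max)^2] obey r'' = -g^2 r, g^2 = 2q/r_max^2.
g = math.sqrt(2.0 * q) / r_max             # gradient constant, 1/mm
quarter_pitch = math.pi / (2.0 * g)        # mm; a collimated beam focuses here

print(f"gradient constant g  = {g:.4f} per mm")
print(f"quarter-pitch length = {quarter_pitch:.3f} mm (lens length {length} mm)")
```

Within this approximation the quarter-pitch length agrees with L = 7.89 mm, consistent with the focal plane of the lens lying at its exit facet.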
A collimated Gaussian beam, having radius R0 (at the 1/e² intensity point)
and some wavefront distortion, is incident on the front facet of the lens.
Figure 33.2, top row, shows cross-sectional plots of intensity, log intensity, and
phase for this λ = 1.55 μm beam arriving at the entrance facet of the GRIN lens.
The intensity profile has R0 = 500λ, full-width-at-half-maximum-intensity
diameter DFWHM = 1.1774R0 = 0.912 mm, and full-aperture diameter D = 2.325
mm. The Poynting vector distribution (representing geometric-optical rays) in the
cross-sectional plane of the beam is computed, and its x-, y-, z-components are
shown in Figures 33.2(d)–(f).
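The factor 1.1774 relating DFWHM to R0 follows directly from the Gaussian intensity profile. A short check, assuming the profile I(r) = exp(−2r²/R0²) implied by the 1/e² definition of R0 (the variable names are illustrative):

```python
import math

wavelength_mm = 1.55e-3                 # lambda = 1.55 um
r0_mm = 500 * wavelength_mm             # 1/e^2 intensity radius R0 = 500 lambda

# I(r) = exp(-2 r^2 / R0^2) falls to one half at r = R0*sqrt(ln(2)/2).
fwhm_factor = 2.0 * math.sqrt(math.log(2.0) / 2.0)
d_fwhm_mm = fwhm_factor * r0_mm

print(f"D_FWHM / R0 = {fwhm_factor:.4f}")
print(f"D_FWHM      = {d_fwhm_mm:.3f} mm")
```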
Method of computation
With reference to Figure 33.3, we describe a method of computing the (complex)
light amplitude distribution at the focal plane of the lens. From the incident beam
profile one derives a large number of rays (i.e., Poynting vectors) for subsequent
tracing through the system. Ray-tracing begins at the entrance facet of the GRIN
lens, and continues through the focal plane to the destination plane, which is in
the far field of the focused spot. Note that, after traversing the GRIN lens, the rays
emerge into a homogeneous medium of refractive index n = 1.5; this region is
Figure 33.2 Cross-sectional plots of (a) intensity, (b) log intensity, (c) phase of
a λ = 1.55 μm beam arriving at the entrance facet of the GRIN lens of Figure 33.1.
The intensity distribution is Gaussian, having DFWHM = 589λ = 0.912 mm, and
full-aperture diameter D = 1500λ = 2.325 mm. The Poynting vector distribution
S(x, y), representing geometric-optical rays, is readily computed from the
beam profile. Frames (d)–(f) show the x-, y-, z-components of the Poynting
vector, namely, Sx(x, y), Sy(x, y), Sz(x, y). In (d) the values of Sx range from
0.18 to 0.39. Similarly, Sy in (e) ranges from 0.22 to 0.32, and Sz in (f) ranges
from 0 to 100 (black = minimum, white = maximum). (Axes: x/λ and y/λ, each
from −800 to 800.)
Figure 33.3 Method of computing the light-amplitude distribution at the focal
plane of the GRIN lens. Ray-tracing begins at the entrance facet, and continues
through the focal plane to the destination plane, which is in the far field of the
focused spot. At the destination plane the traced rays are used to construct the
emergent wavefront, which is subsequently back-propagated to the focal plane at
the exit facet of the GRIN lens.
intended to simulate the medium of the fiber (ignoring the slight difference
between the core and cladding indices). At the destination plane the traced rays
are used to construct the wavefront of the emerging (divergent) beam. This
wavefront is then propagated backwards, to the focal plane of the GRIN
lens (located at its exit facet), where the focused spot’s diffraction pattern is
computed. The reason for tracing the rays all the way to the destination plane (in
the far field of the focused spot) and then back-propagating to the focal plane is
that geometric-optical ray-tracing does not yield valid results when the rays
terminate in focal (or caustic) regions.
Figure 33.4 shows the results of two different computations for the incident
beam depicted in Figure 33.2. Shown are the intensity and phase distributions at
the focal plane of the GRIN lens of Figure 33.3. The incident wavefront is
initially converted to a set of geometric-optical rays, using the association
between a ray and the local Poynting vector of the electromagnetic field. In
Figure 33.4(a, b) the incident rays are traced directly to the focal plane, and the
emergent wavefront has been reconstructed from the traced rays. In Figure 33.4
(c, d) the rays are traced from the entrance facet to the destination plane (see
Figure 33.4 Using two different methods, the intensity (left) and phase (right)
distributions at the focal plane of the GRIN lens of Figure 33.3 have been
computed for the incident beam shown in Figure 33.2. In (a) and (b) the incident
rays are traced directly to the focal plane, and the emergent wavefront is constructed
from traced rays. In (c) and (d) the rays are traced from the entrance
facet of the lens to the destination plane, where the emergent wavefront is
constructed and subsequently back-propagated to the focal plane (i.e., rear facet)
of the GRIN lens. (Horizontal axes: x from −36 to 36 μm.)
Figure 33.3), at which point the emergent wavefront is constructed. This wave-
front is subsequently back-propagated to the rear facet of the lens using the far
field (Fraunhofer) diffraction formula. Since the incident beam in this particular
example is highly aberrated, the two methods of calculation yield similar results.
As a general rule, however, the incident rays should not be traced to the vicinity
of the focal plane, where, due to significant diffraction, geometric-optical
methods are inadmissible.
Effect of beam tilt and wavefront curvature

Figure 33.5 shows cross-sectional plots of intensity, log intensity, and phase
for a λ = 1.55 μm Gaussian beam arriving at the entrance pupil of the GRIN
lens of Figure 33.3. The incident beam’s FWHM and full-aperture diameters
are DFWHM = 0.639 mm and D = 2.17 mm, respectively. The phase plot in
Figure 33.5(c) contains 2λ of linear distortion (corresponding to 0.164° of tilt),
and 3λ of Seidel curvature (corresponding to a radius of curvature Rc ≈ 127
mm). After tracing the incident rays to the destination plane (located 2.0 mm
away from the exit facet of the GRIN lens, within a homogeneous medium
having n = 1.5) we obtain the plots of intensity, log intensity, and phase displayed
in Figure 33.6. The emerging beam at the destination plane is divergent,
and its curvature phase-factor (Rc = 2.046 mm) has been subtracted from
the phase plot in Figure 33.6(c). For the full aperture of the incident beam
(D = 2.17 mm), the emergent beam diameter of 0.86 mm at the destination
plane represents a divergence cone angle θ = 24.3°, yielding an effective
numerical aperture NA = n sin(θ/2) = 0.32.
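The conversions quoted in this paragraph can be reproduced with elementary geometry. The following sketch assumes that the stated number of waves is the wavefront departure at the edge of the full aperture (radius D/2), that the curved wavefront is a shallow sphere with sag r²/(2Rc), and that the divergence cone is measured from the focal point at the exit facet to the destination plane 2.0 mm away. The variable names are illustrative.

```python
import math

wavelength_mm = 1.55e-3   # lambda = 1.55 um
aperture_mm = 2.17        # full-aperture diameter D of the incident beam
edge_mm = aperture_mm / 2.0

# Tilt: 2 waves of linear departure at the aperture edge.
tilt_deg = math.degrees(math.atan(2 * wavelength_mm / edge_mm))

# Curvature: 3 waves of sag at the edge, sag = r^2 / (2 Rc) for a shallow sphere.
rc_mm = edge_mm ** 2 / (2 * 3 * wavelength_mm)

# Divergence: 0.86 mm beam diameter, 2.0 mm beyond the focus, in a medium n = 1.5.
half_angle = math.atan((0.86 / 2.0) / 2.0)
cone_deg = 2.0 * math.degrees(half_angle)
numerical_aperture = 1.5 * math.sin(half_angle)

print(f"tilt = {tilt_deg:.3f} deg, Rc = {rc_mm:.0f} mm")
print(f"cone angle = {cone_deg:.1f} deg, NA = {numerical_aperture:.2f}")
```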
Figure 33.5 Plots of (a) intensity, (b) log intensity, and (c) phase of a
λ = 1.55 μm Gaussian beam arriving at the entrance facet of the GRIN lens.
The beam has FWHM diameter DFWHM = 0.64 mm, full aperture diameter
D = 2.17 mm, 2λ of linear distortion (i.e., 0.16° of tilt), and 3λ of Seidel
curvature (i.e., Rc ≈ 127 mm). (Horizontal axes: x from −1.16 to 1.16 mm.)
Figure 33.6 Plots of (a) intensity, (b) log intensity, and (c) phase of the
emergent beam at the destination plane, located 2.0 mm beyond the exit facet of
the GRIN lens of Figure 33.3. Since the beam is highly divergent at this point, its
curvature phase-factor (Rc = 2.046 mm) has been subtracted from the phase plot.
Figure 33.7 Plots of (a) intensity, (b) log intensity, and (c) phase of the focused
spot at the rear facet of the GRIN lens of Figure 33.3. To compute these distributions,
the beam displayed in Figure 33.6 has been back-propagated a distance of
2.0 mm (i.e., from the destination plane to the rear facet of the GRIN lens).
(Horizontal axes: x from −25.8 to 25.8 μm.)
When the light amplitude distribution of Figure 33.6 is back-propagated (from
the destination plane to the rear facet of the GRIN lens), one obtains the focused
spot distribution shown in Figure 33.7. These cross-sectional plots show intensity,
log intensity, and phase of the focused beam at the rear facet of the GRIN lens.
The 9.2 μm shift of the beam center away from the center of coordinates is a
consequence of the 0.164° tilt of the incident beam. Also, the 3λ curvature of
the incident beam is seen to have resulted in a substantial enlargement of the
focused spot.
Effect of beam size and astigmatism

Figure 33.8 shows plots of intensity (left column), log intensity (middle col-
umn), and phase (right column) at the rear facet of the GRIN lens of Figure 33.3
Figure 33.8 Plots of intensity (left), log intensity (middle), and phase (right)
at the rear facet of the GRIN lens of Figure 33.3. Top row: incident beam
diameter DFWHM = 1.37 mm, D = 3.0 mm, no aberrations other than the
spherical aberration and 105 μm of defocus introduced by the lens itself.
Middle row: incident beam diameter DFWHM = 0.365 mm, D = 2.17 mm, no
aberrations. Bottom row: same as the middle row, except for the presence of
4λ of Seidel astigmatism (i.e., cylinder) on the incident wavefront.
(Horizontal axes: x from −25.8 to 25.8 μm.)
under three different conditions. In the first row of Figure 33.8 the assumed
incident Gaussian beam is fairly large, having DFWHM = 1.37 mm, full aperture
D = 3.0 mm, and no wavefront aberrations. The focused spot, however, is
affected by the spherical aberration of the lens and by nearly −105 μm of
defocus, both of which are consequences of the wide aperture of the incident
beam. (The GRIN’s parabolic index profile is not optimum for diffraction-
limited focusing at large aperture, nor is the selected length of the lens
appropriate for wide-aperture applications.) The large NA of the lens is
responsible for the poor coupling efficiency into the fiber obtained in this
case (η ≈ 27%).
The second row of Figure 33.8 shows profiles of the focused spot for a smaller
incident beam, having DFWHM = 0.365 mm, D = 2.17 mm, and no aberrations. This
focused spot is well matched to the fiber’s mode profile, yielding a large coupling
efficiency (η ≈ 99%). Finally, the third row of Figure 33.8 shows the focused
spot profile computed for the same incident beam as above (DFWHM = 0.365 mm,
D = 2.17 mm) to which 4λ of Seidel astigmatism (i.e., wavefront cylinder) has been
added. Astigmatism reduces the computed coupling efficiency to η ≈ 69%.
Tolerance for beam decenter, tilt, and defocus

We computed the coupling efficiency (into a single-mode fiber) of the GRIN lens
of Figure 33.1 for an incident Gaussian beam as a function of the beam diameter
DFWHM. From the resulting plot the optimum beam size that yielded the largest
possible η was identified. Subsequently, we studied tolerances of the lens (for the
optimum beam size) by computing η as a function of the incident beam decenter,
tilt, and wavefront curvature (i.e., defocus). These results demonstrate the sensitivity
of the lens-fiber combination to alignment errors.
Figure 33.9 shows the various performance curves of the GRIN lens of
Figure 33.1. Shown in Figure 33.9(a) is a plot of η versus DFWHM; the optimum
beam diameter is 365 μm. The remaining frames in Figure 33.9 are computed
at this optimum beam size. Figure 33.9(b) shows the sensitivity of η to beam
decenter. Note that a decenter of about 250 μm is sufficient to reduce η by about
50%. The plot of η versus beam tilt in Figure 33.9(c) shows that a 0.14° tilt can
reduce η more than tenfold. Finally, Figure 33.9(d) shows that a few waves of
Seidel curvature (i.e., defocus) can substantially reduce the efficiency of
coupling into the fiber.
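The optimum beam size can be compared with a simple Gaussian-beam estimate. The sketch below is not the ray-tracing and diffraction calculation described in this chapter; it assumes an ideal, aberration-free quarter-pitch lens with paraxial gradient constant g = √(2q)/rmax, an unapertured incident Gaussian beam with a flat wavefront, and the standard result that such a lens maps an input waist w_in to an output waist w_out = λ/(π n0 g w_in). The variable names are illustrative.

```python
import math

wavelength_mm = 1.55e-3          # vacuum wavelength
n0, q, r_max = 1.5901, 0.04455, 1.5
mode_radius_mm = 5.0e-3          # 1/e^2 radius of the 10 um diameter fiber mode

g = math.sqrt(2.0 * q) / r_max   # paraxial gradient constant, 1/mm

# A quarter-pitch GRIN lens maps an input waist w_in (flat phase) to an output
# waist w_out = lambda / (pi * n0 * g * w_in); invert for the matching w_in.
w_in_mm = wavelength_mm / (math.pi * n0 * g * mode_radius_mm)
d_fwhm_um = 1e3 * w_in_mm * 2.0 * math.sqrt(math.log(2.0) / 2.0)

print(f"matching 1/e^2 radius = {1e3 * w_in_mm:.0f} um")
print(f"matching D_FWHM       = {d_fwhm_um:.0f} um")
```

Under these assumptions the matching beam diameter comes out within a few micrometres of the 365 μm optimum found from the full computation.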
Plano-aspheric lens
Another design for a lens that launches a collimated beam into a single-mode
fiber is the plano-aspheric aplanat depicted in Figure 33.10. This lens is designed
to bring a λ = 1.55 μm beam to diffraction-limited focus at its rear facet (a plane
facet to which the fiber is attached). The lens has diameter 3.0 mm, length
L = 5.8826 mm, and refractive index n = 1.673286. The asphere parameters are:
Rc = 2.367 mm, K = 0.66723, A4 = 2.91125 × 10⁻³, A6 = 2.52286 × 10⁻⁴, and
A8 = 2.93078 × 10⁻⁵. Figure 33.11(a) shows the dependence of η on incident
beam diameter. Clearly, maximum efficiency is achieved with DFWHM ≈ 411 μm.
Figure 33.11(b) shows the dependence of η on beam decenter, when the incident
beam diameter is fixed at its optimum value of 411 μm. Similarly, sensitivity
to tilt for the optimum beam size is shown in Figure 33.11(c), and the
Figure 33.9 Characteristics of the GRIN lens of Figure 33.3, computed for a
λ = 1.55 μm incident Gaussian beam. (a) Dependence of the coupling efficiency
η on the FWHM diameter of the incident beam; optimum diameter is 365 μm.
(b) Dependence of η on incident beam decenter relative to the optic axis. (c)
Variation of η with incident beam tilt. (d) Effect on η of Seidel curvature (i.e.,
defocus); the horizontal axis depicts the departure of the wavefront at the edge of
the beam, where the assumed beam’s full-aperture diameter is D = 2.17 mm. In
(b), (c), and (d) the incident beam has DFWHM = 365 μm. (Horizontal axes:
FWHM beam diameter in μm, decenter in μm, tilt angle in degrees, and Seidel
curvature in units of λ; vertical axes: coupling efficiency η.)
dependence of η on Seidel curvature is shown in Figure 33.11(d). A comparison
of Figure 33.9 with Figure 33.11 shows that the GRIN lens is nearly as good as
the plano-aspheric lens, at least as far as the particular alignment tolerances
studied here are concerned.
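The quoted length of the plano-aspheric lens can be checked against its vertex radius and refractive index. This is a paraxial sketch only, assuming refraction at a single surface from air into glass, for which a collimated beam focuses inside the glass at a distance nRc/(n − 1) from the vertex. The aspheric terms, which govern the aberrations rather than the paraxial focus, are ignored, and the variable names are illustrative.

```python
n = 1.673286      # refractive index of the lens
rc_mm = 2.367     # vertex radius of curvature of the aspheric surface
length_mm = 5.8826

# Paraxial refraction at a single surface (air to glass): a collimated beam
# focuses inside the glass at n*Rc/(n - 1) from the vertex.
focus_in_glass_mm = n * rc_mm / (n - 1.0)

print(f"paraxial focus = {focus_in_glass_mm:.4f} mm from the vertex")
print(f"lens length    = {length_mm:.4f} mm")
```

The paraxial focus coincides with the lens length to within rounding, consistent with the focus being located at the plane rear facet to which the fiber is attached.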
Figure 33.10 Plano-aspheric lens, having diameter 3.0 mm, length
L = 5.88 mm, and refractive index n = 1.673286. The single-mode fiber is
attached to the rear facet of the lens.
Figure 33.11 Characteristics of the plano-aspheric lens of Figure 33.10,
computed for a λ = 1.55 μm incident Gaussian beam. (a) Dependence of η on
the FWHM diameter of the incident beam; optimum diameter is 411 μm.
(b) Dependence of η on incident beam decenter relative to the optic axis.
(c) Variation of η with incident beam tilt. (d) Effect on η of Seidel curvature
(i.e., defocus); the horizontal axis depicts the departure of the wavefront at the
edge of the beam, where the assumed beam’s full-aperture diameter is
D = 2.17 mm. In (b), (c), and (d) the incident beam has DFWHM = 411 μm.
Plano-convex lens made of Gradium™ glass

Lenses made from Gradium glass have a refractive index gradient along their
optic axis. Although this type of index gradient does not by itself produce
focusing power, it has the ability to correct aberrations and field curvature
introduced by curved surfaces.
Figure 33.12 (a) Plano-convex lens made of Gradium glass is used to focus
a collimated beam of light into a single-mode fiber. The front facet of the lens is
spherical, having Rc = 3.715 mm, and the focal point is 3.535 mm beyond the
rear (plane) facet of the lens. (b) The lens is fabricated by polishing into
spherical shape one end of a cylindrical rod cut from a slab of Gradium glass.
(c) Index profile of G14SF Gradium glass at λ = 1.55 μm. The refractive index
is highest at the front vertex, decreasing nonlinearly with z as one moves toward
the plane facet of the lens. When the same lens is made of homogeneous glass of
refractive index n = 1.7, the focal point shifts by about 30 μm to the right, and
the spherical aberration increases slightly. (Panel (c) plots the refractive index
n(z), on a scale from 1.60 to 1.80, against z from 0 to 6 mm.)
Figure 33.13 Characteristics of the plano-convex Gradium lens of Figure
33.12, computed for a λ = 1.55 μm incident Gaussian beam. (a) Dependence of η
on the FWHM diameter of the incident beam; optimum diameter is 593 μm.
(b) Dependence of η on incident beam decenter relative to the optic axis.
(c) Variation of η with incident beam tilt. (d) Effect on η of Seidel curvature (i.e.,
defocus); the horizontal axis depicts the departure of the wavefront at the edge of
the beam, where the assumed beam’s full-aperture diameter is D = 2.17 mm. In
(b), (c), and (d) the incident beam has DFWHM = 593 μm.
A plano-convex lens made of Gradium glass for focusing a collimated
beam into a fiber is shown in Figure 33.12. The various lens parameters are:
Rc = 3.715 mm, diameter = 2.6 mm, L = 2.9 mm, Gradium glass = G14SF, zmax = 5.8
mm, and Δz = 2.85 mm. The focal point is a distance of 3.535 mm beyond the
plane facet of the lens. For this lens, the dependence of coupling efficiency η on
beam size as well as its sensitivity to misalignment are shown in Figure 33.13. The
optimum η is obtained for a beam diameter DFWHM ≈ 593 μm. For this beam size
the amount of decenter that results in 50% reduction of η is 420 μm, and the beam
tilt that causes a 50% drop in η is 0.045°. The dependence of η on wavefront
curvature may be seen in Figure 33.13(d).
We also computed the various performance curves for a plano-convex lens
similar to that depicted in Figure 33.12, but made of homogeneous glass (n = 1.7)
instead of the Gradium material. The focal point of this homogeneous plano-
convex lens was found to be 3.565 mm beyond the plane facet of the lens, namely,
30 μm greater than that of the Gradium lens. Once again, the optimum beam
size was found to be DFWHM ≈ 593 μm; the other characteristics of the lens were
also very similar to those of the Gradium lens. Apparently, the use of
Gradium glass in this particular application does not result in any substantial
improvements.

34
The optics of semiconductor diode lasers†
Robert N. Hall, born in New Haven, Connecticut in 1919, joined General
Electric’s Research and Development Center after graduating from the California
Institute of Technology. In 1962, having realized that a semiconductor junction
could support population inversion, Hall built the first semiconductor injection
laser. This device, based on a specially designed p-n junction, operated when an
electric current injected the electrons directly into the junction, thus allowing for
highly efficient generation of coherent light from a compact source. Today, diode
lasers based on Hall’s original idea are used, among other places, in CD and DVD
players, laser printers, and fiber-optic communication systems.1
In this chapter we describe the basic features of the beam of light emitted by a
diode laser, and discuss methods to analyze and manipulate this beam. Collimation
and beam-shaping with a pair of cylindrical lenses will be shown to be a
simple and flexible method that may be applied not only to diode lasers but also
to beams emerging from optical fibers.
Characteristics of diode lasers

A semiconductor diode laser, shown schematically in Figure 34.1, consists of a
gain layer (only a few tens of nanometers thick) surrounded by guiding layers for
confining the laser mode. The guiding layers’ index of refraction is somewhat
greater than that of the surrounding regions (substrate and cladding), thus
permitting confinement by total internal reflection. The electrical current is injected
through the positive electrode, a metallic stripe several microns wide, and
collected at the base-plate on the opposite side of the junction (ground electrode).
The population inversion and optical gain are strongest beneath the positive
electrode, tapering off laterally with increasing distance from the electrode’s
center line along Z. In gain-guided lasers, this tapering off of the gain is
responsible for lateral beam confinement. (By contrast, in index-guided lasers the
regions adjacent to the guiding stripe are selectively etched away, then replaced
by a lower-index cladding material.) In general, the gain layer is highly
absorptive in regions that are not directly underneath the electrode and that,
therefore, experience weak pumping or no pumping at all. The guiding layers are
essentially transparent, except for losses due to scattering at impurities and at the
interfaces. The substrate and the cladding are also highly transparent.

† This chapter is co-authored with Ewan M. Wright, Professor of Optical Sciences at the
University of Arizona.

[Figure 34.1 labels: Positive Electrode, Cladding, Guiding Layers, Gain Medium, Substrate,
Ground Electrode]
Figure 34.1 A semiconductor diode laser consisting of an active layer surrounded
by guiding layers for confinement of the laser mode. The electrical
current, injected through the positive electrode, is collected on the opposite side
of the junction by the ground electrode.
Figure 34.2 shows plots of intensity and phase at the front facet of a
single-transverse-mode diode laser (λ0 = 980 nm). The assumed beam divergence angles
(full-width-at-half-maximum intensity or FWHM) are θ∥ = 7° in the plane of the
junction and θ⊥ = 35° perpendicular to the junction. In the top row of the figure,
(a), (b), where the assumed beam has no astigmatism, the phase distribution at the
laser’s front facet is uniform. In the middle row, (c), (d), the astigmatic distance
(defined as an equivalent distance in free space between horizontal and vertical
beam waists) is Δz = 10 μm, resulting in a slightly wider beam along the X-axis,
and a divergent phase front whose peak-to-valley variation (i.e., from the edge to
the center of the beam) is 120°. In the bottom row, (e), (f), the assumed
astigmatism is Δz = 25 μm. Again the beam is broader (in the horizontal
direction) than the one without astigmatism, and the phase distribution exhibits a
peak-to-valley variation of 190°.
[Figure 34.2: panels (a)–(f); horizontal axes x from −10 to +10 μm]
Figure 34.2 Plots of the logarithm of intensity (left) and phase (right) at the
front facet of a λ0 = 980 nm diode laser having θ∥ = 7°, θ⊥ = 35°. The range
of variations of intensity between the maximum (red) and minimum (blue) is
I_max : I_min = 10^4. In (a), (b) the beam has no astigmatism. In (c), (d) the
astigmatic distance (in free space) between the horizontal and vertical beam
waists is Δz = 10 μm, resulting in a wider beam along X, and a divergent
phase pattern whose variation from the center (blue) to the edge (red) is
120°. In (e), (f), where the assumed astigmatism is Δz = 25 μm, the beam
is further broadened and the phase distribution exhibits a peak-to-valley
variation of 190°.
The elliptical cross-section of the beam emerging at the front facet of the laser
is responsible (through diffraction) for θ∥ being much smaller than θ⊥. The cause
of astigmatism is the non-uniform gain profile (along the X-axis) within the active
region of the laser. As the gain is strongest near the cavity’s central axis, the
beam, while propagating in the cavity along Z, experiences a “gain focusing”
effect toward this axis – a direct consequence of stronger amplification on-axis
than in the wings.2 Consequently, a divergent phase profile automatically evolves
for countering this tendency of the beam to collapse to the center. We will have
more to say about this property in the following section.
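The link between spot size and divergence can be illustrated with a rough estimate. The sketch below assumes an ideal paraxial Gaussian beam (an assumption that is only approximate at 35°, and is not the model used for the figures) and converts each FWHM divergence angle into a 1/e² intensity waist radius at the facet; the function name is hypothetical.

```python
import math

def waist_from_fwhm_divergence(wavelength_um, fwhm_deg):
    """Waist radius w0 (um, 1/e^2 intensity) of a paraxial Gaussian beam whose
    far-field intensity has the given full-width-at-half-maximum angle."""
    theta_fwhm = math.radians(fwhm_deg)
    # Far field: I(theta) ~ exp(-2 theta^2 / theta0^2) with theta0 = lambda / (pi w0),
    # so the full width at half maximum is theta0 * sqrt(2 ln 2).
    theta0 = theta_fwhm / math.sqrt(2.0 * math.log(2.0))
    return wavelength_um / (math.pi * theta0)

# Values quoted in the text: free-space wavelength 0.98 um, 7 deg x 35 deg FWHM.
w0_parallel = waist_from_fwhm_divergence(0.98, 7.0)   # along X, in the junction plane
w0_perp = waist_from_fwhm_divergence(0.98, 35.0)      # along Y, across the junction
print(f"w0 (parallel) = {w0_parallel:.2f} um, w0 (perpendicular) = {w0_perp:.2f} um")
```

Under these assumptions the spot is about five times wider along X than along Y, which is the inverse of the ratio of the divergence angles.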
Another interesting property of a diode laser beam is its polarization state,
which is typically linear, having the E-field parallel to the plane of the junction.
This property may be traced back to the fact that, for light polarized parallel to the
junction (i.e., E∥) the gain is somewhat greater than that for perpendicularly
polarized light (hereinafter E⊥). The guided mode associated with E⊥ is slightly
broader in the Y-direction than the mode associated with E∥. Since a broad mode
has less overlap with the gain layer than a more compact mode, it stays behind
while the compact mode surpasses the threshold and begins to lase. Moreover,
confinement of electrons and holes to a thin (quantum well) active layer makes
it easier for E∥ (relative to E⊥) to stimulate the excited electrons and holes into
surrendering their photons and returning to the ground state. In practice a
combination of both effects is responsible for promoting the selection of E∥
polarization over E⊥.
Origin of diode laser astigmatism

The non-uniformity of the gain profile along X has a focusing effect on the guided
mode that is countered automatically by a divergent phase front imposed on the
beam as it propagates along the Z-axis of the cavity. An easy demonstration is
provided by studying the propagation of a beam of light through a gain medium
using the Beam Propagation Method (BPM); see Chapter 32. Figure 34.3(a)
shows the distribution of gain (red) and loss (blue) in the cross-sectional plane of
a typical diode laser. The gain is significant only in the middle section of the gain
layer, dropping off in a Gaussian fashion along the X-axis and becoming a loss in
the regions remote from the central axis, Z. During propagation along the cavity’s
Z-axis, the phase profile imparted to the beam’s cross-section is similar to that
shown in Figure 34.3(b). Here the high-index guiding layers (orange) advance the
phase relative to the lower-index substrate and cladding (blue). The gain medium
(red) with its slightly higher index advances the phase even more than the guiding
layers do, but the gain layer is thin, and its contribution to mode confinement
along the Y-axis is fairly insignificant.
[Figure 34.3: panels (a), (b), (c); horizontal axes x/λ from −20 to 20, vertical axes y/λ
from −1.5 to +1.5]
Figure 34.3 Profiles of two amplitude–phase masks used in BPM simulations
of a diode laser beam. (a), (b) represent the amplitude and phase profiles for
Mask 1, (a), (c) the corresponding profiles for Mask 2. The gain medium is a
0.25λ-thick layer sandwiched between two 0.5λ-thick guiding layers.
(a) Transmitted intensity distribution for a uniform incident beam. The
amplitude gain at the center is 1.025, tapering off along X to a (lossy) value of 0.95.
Outside the active layer, the 0.995 amplitude transmissivity represents a weak
background loss. (b) The phase of Mask 1 is 6.12° within the active layer, 5.4° in
the guiding layers, and 0° in the substrate and cladding. (c) The phase of Mask 2
(having the same amplitude profile as in (a)) is 4.5° at the center of the active
layer, 6.12° within the active layer far from the axis, 5.4° in the guiding layers,
and 0° in the substrate and cladding.

Together, the cross-sectional intensity and phase distributions shown in
Figures 34.3(a), (b) define the profile of an amplitude–phase mask that can be
used in a BPM simulation of a diode laser. This particular mask is placed at
intervals of Δz = 0.1λ between propagation steps in a medium of refractive index
n0 = 3.3. (The wavelength λ in this environment is λ0/n0, where λ0 is the
free-space wavelength of the laser beam.) For a uniform beam incident on the mask,
the transmitted intensity and phase distributions appear in Figures 34.3(a), (b),
respectively. The assumed gain medium is a 0.25λ-thick layer sandwiched
between two 0.5λ-thick layers of slightly lower refractive index, which constitute
the guiding slab. The amplitude gain at the center of the active layer is 1.025 (per
0.1λ of propagation), tapering off in a Gaussian fashion along the X-axis while
remaining uniform in the Y-direction. The background loss of the medium outside
the active layer is small (mask amplitude transmission = 0.995), but within the
gain layer and far from the central axis, the light amplitude is attenuated by a
factor of 0.95 (per 0.1λ of propagation). In Figure 34.3(b), the phase of the mask
is 6.12° within the active layer, 5.4° in the two adjacent (guiding) layers, and 0°
in the substrate and cladding regions. This means, for example, that if the index
of the substrate and cladding materials is n0 = 3.3, then the guiding layers have
index n1 = 3.45 and the active medium has index n2 = 3.47.
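The correspondence between mask phase and refractive index can be checked numerically. The relation used below, phase = 360° × (step in wavelengths) × (n − n0), is not stated in the text; it is inferred here because it reproduces the quoted index values, and should be read as an assumption. The function name is hypothetical.

```python
N0 = 3.3             # substrate and cladding index quoted in the text
STEP_IN_WAVES = 0.1  # spacing between masks, in wavelengths

def index_from_mask_phase(phase_deg, n0=N0, step=STEP_IN_WAVES):
    """Index implied by a mask phase, assuming phase = 360 deg * step * (n - n0)."""
    return n0 + phase_deg / (360.0 * step)

n_guiding = index_from_mask_phase(5.4)    # guiding layers
n_active = index_from_mask_phase(6.12)    # active layer, Mask 1
n_pumped = index_from_mask_phase(4.5)     # center of the active layer, Mask 2
print(n_guiding, n_active, n_pumped)
```

With these inputs the three phases map to indices of about 3.45, 3.47, and 3.425.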
In practice, pumping the gain medium causes a decline in its local index, so
that a more realistic phase mask would be similar to that shown in Figure 34.3(c),
where the phase at the center of the active region has dropped to 4.5°
(corresponding to n2 = 3.425). From this minimum, the phase increases in a Gaussian
fashion along the X-axis, reaching the value of 6.12° in the highly absorptive
regions of the active layer. This results in index anti-guiding along the X-axis
(due to index inhomogeneity within the active layer). The Gaussian phase profile
inside the active layer imposes on the laser beam a divergence (along X) above
and beyond that imposed by the gain profile alone.3,4 The rest of the mask is
identical to that in Figure 34.3(b). In what follows, we will first show results
of BPM simulations obtained with the amplitude–phase Mask 1, depicted in
Figures 34.3(a), (b), confirming that the gain profile alone can give rise to
astigmatism. We then show results of simulations obtained with the amplitude–
phase Mask 2, shown in Figures 34.3(a), (c), which reveal that the index
“anti-guiding” of the gain medium (caused by population inversion) can further
enhance the induced astigmatism.
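The idea of alternating a mask with short propagation steps can be sketched in a few lines. What follows is a heavily simplified, one-dimensional stand-in (X only), not the simulation behind the figures: it ignores the slab guiding along Y and the mask phase, uses an FFT angular-spectrum step with periodic boundaries, and picks a Gaussian gain width (`gain_width`) that is purely hypothetical. Lengths are in units of the wavelength inside the medium.

```python
import numpy as np

def bpm_gain_only_1d(n_steps=600, dz=0.1, half_width=20.0, n_pts=512,
                     gain_width=5.0):
    """Alternate an amplitude mask with propagation steps of length dz."""
    x = np.linspace(-half_width, half_width, n_pts, endpoint=False)
    fx = np.fft.fftfreq(n_pts, d=x[1] - x[0])          # cycles per wavelength
    # Angular-spectrum transfer function; components with |fx| > 1 are evanescent
    # and decay, because the complex square root is then positive imaginary.
    kz = 2.0 * np.pi * np.sqrt((1.0 - fx**2).astype(complex))
    transfer = np.exp(1j * kz * dz)
    # Amplitude mask: gain 1.025 on axis, falling to a lossy 0.95 in the wings.
    mask = 0.95 + (1.025 - 0.95) * np.exp(-(x / gain_width) ** 2)
    field = np.ones(n_pts, dtype=complex)               # uniform incident beam
    for _ in range(n_steps):
        field = np.fft.ifft(np.fft.fft(field * mask) * transfer)
        field /= np.abs(field).max()                    # rescale to avoid growth
    return x, field

x, field = bpm_gain_only_1d()
phase_deg = np.degrees(np.unwrap(np.angle(field)))      # phase profile along X
```

Plotting `np.abs(field)**2` and `phase_deg` against `x` is one way to look for the curvature of the phase front discussed above; no particular numerical outcome is claimed for this simplified model.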
Figure 34.4, top row, shows plots of (a) intensity, (b) logarithm of intensity, and
(c) phase after 600 steps of BPM using Mask 1. Since each step corresponds to a
propagation distance of 0.1λ inside a medium of refractive index n0 = 3.3, the total
propagation distance in this simulation is 18 μm. The light is seen to be well
confined to the guiding layers, with only a small fraction leaking (i.e., evanescent)
into the substrate and cladding. The light that escapes the guiding layers is
eventually lost by scattering or diffraction out of the system. In Figure 34.4(c), the
peak-to-valley phase variation along X is 160°, corresponding to a divergent beam
with a few microns of astigmatism. The bottom row in Figure 34.4 is similar to the
top row, except that it is obtained with Mask 2. The beam is somewhat broader
along the X-axis when compared to that obtained without an index anti-guiding of
the gain medium. Also, the peak-to-valley phase variation in Figure 34.4(f) is 175°
(along X), corresponding to a divergent beam with somewhat more astigmatism
than the one depicted in Figure 34.4(c).
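The quoted propagation distance follows from simple arithmetic. The check below assumes that the free-space wavelength is the 980 nm used in the earlier examples of this chapter; the variable names are illustrative.

```python
lambda0_nm = 980.0     # assumed free-space wavelength
n0 = 3.3               # index of the propagation medium
steps = 600            # number of BPM steps
step_in_waves = 0.1    # each step is 0.1 wavelength inside the medium

wavelength_in_medium_nm = lambda0_nm / n0
total_um = steps * step_in_waves * wavelength_in_medium_nm / 1000.0
print(f"total propagation distance = {total_um:.2f} um")
```

The result, close to 17.8 μm, rounds to the 18 μm stated in the text.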
Figure 34.4(g) shows plots of the power content of the beam versus propagation
distance z, as obtained in the above simulations. At first, the power
decreases as the initial beam adjusts itself to the guiding structure, shedding
excess light that does not match the guided mode profile. The gain medium then
takes over and raises the power content exponentially, as the confined mode
propagates along the optical axis.
Shearing interferometry
The beam of a single-transverse-mode diode laser may be captured and collimated
by an aberration-free lens, then analyzed using a shear-plate interferometer,
as shown in Figure 34.5. The shear plate creates two identical copies of
the collimated beam shifted relative to each other along the X- and/or Y-axes.
Superposition of these two copies of the same beam at the observation plane
creates an interferogram that reveals the phase structure of the (collimated) beam.
Any phase non-uniformities at the exit pupil of the collimator show up as
intensity variations (i.e., fringes) in the interferogram.

[Figure 34.4: panels (a)–(f) with horizontal axes x/λ from −8 to +8; panel (g) plots
optical power (0.5 to 3.5) against propagation distance (0 to 18 μm) for Mask 1 and Mask 2]
Figure 34.4 Plots of (a) intensity, (b) logarithm of intensity, and (c) phase after
600 BPM steps. The amplitude–phase Mask 1 used in these simulations is
depicted in Figures 34.3(a), (b). The light is seen to be confined to the guiding
layers, with a weak evanescent tail leaking into the substrate and cladding. In (c)
the peak-to-valley phase variation along X is 160°. The bottom row (d)–(f) is
similar to the top row (a)–(c), except that it is obtained with Mask 2 depicted in
Figures 34.3(a), (c). In (f) the peak-to-valley phase variation along X is 175°. (g)
Power content of the beam versus propagation distance in the BPM simulations.
For a 7° × 35° beam emerging from a λ0 = 980 nm diode laser, Figure 34.6 (left
column) shows plots of intensity (top) and phase in a plane located 10 mm past
the exit pupil of a 0.6 NA collimator lens. The lens is at a distance of f = 4.9 mm
from the mid-point between the two waists of the laser beam, and the displayed
phase patterns correspond to Δz = 10, 20, 30 μm of astigmatism. For a fixed shear
of Δx = 0.7 mm (horizontal) and Δy = 2.0 mm (vertical), the right column in
Figure 34.6 shows the observed interference patterns in the viewing window
of the shear plate; from top to bottom, the assumed astigmatism of the laser is
Δz = 0, 10, 20, 30 μm.
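The way a shear plate turns phase structure into fringes can be illustrated with a toy model. The sketch below assumes two unit-amplitude copies of the beam (the real beam has a Gaussian-like envelope) and a purely hypothetical quadratic phase along X; only the shear values are taken from the text.

```python
import numpy as np

def shear_interferogram(phase_fn, shear_x, shear_y, half_width=3.1, n_pts=201):
    """Intensity of two unit-amplitude copies of a beam, one displaced by
    (shear_x, shear_y); phase_fn(x, y) returns the wavefront phase in radians."""
    axis = np.linspace(-half_width, half_width, n_pts)   # mm
    x, y = np.meshgrid(axis, axis)
    delta = phase_fn(x, y) - phase_fn(x - shear_x, y - shear_y)
    return 2.0 + 2.0 * np.cos(delta)    # |1 + exp(i*delta)|^2

# Flat wavefront: the two copies are in phase everywhere, so no fringes appear.
flat = shear_interferogram(lambda x, y: np.zeros_like(x), 0.7, 2.0)
# Hypothetical residual curvature along X (1.5 rad/mm^2): the phase difference
# is linear in x, giving straight fringes that do not depend on y.
curved = shear_interferogram(lambda x, y: 1.5 * x**2, 0.7, 2.0)
```

In this idealized picture a perfectly collimated beam gives a uniform interferogram, while residual curvature in one direction gives straight fringes whose spacing shrinks as the curvature or the shear grows.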
[Figure 34.5 labels: Diode Laser, Collimator, Shear Plate, Interferogram, Y]
Figure 34.5 A single-transverse-mode beam from a diode laser is captured and
collimated by an aberration-free lens, then analyzed with a shear-plate
interferometer. Any phase variations in the cross-section of the beam show up as
fringes in the shearing interferogram.
Beam collimation using a cylindrical lens pair

A diode laser’s beam may be collimated by a pair of cylindrical lenses, as shown
in Figure 34.7. In this scheme the first lens has the responsibility of collimating
the beam along its fast divergence axis, while the second lens arrests the
expansion of the beam along its slow axis. When a divergence angle is large, a
gradient index (GRIN) cylindrical lens provides more collimation power as well
as better correction for residual aberrations.
As a concrete example, consider a single-transverse-mode beam having
λ0 = 980 nm, θ∥ = 7°, θ⊥ = 35°, and astigmatism Δz = 0. The first lens is a
5 mm-long cylindrical rod of radius r0 = 1.5 mm, made of GRIN material
having n(r) = 1.59[1 − 0.04455(r/r0)²]; the clear aperture diameters of the lens
are D_x = 5.0 mm, D_y = 1.2 mm. The distance between the front facet of the
laser and the first surface of this lens is 0.348 mm. The second lens, a
plano-cylinder made of homogeneous glass of index n = 1.65, is separated by
0.397 mm from the first lens; its thickness (along the optical axis) is 3.2 mm, it
is 5 mm long, has a 3 mm radius of curvature, and its clear aperture diameter is
1.5 mm. All lens surfaces are anti-reflection coated.
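As a small illustration of the GRIN profile quoted above, the following sketch evaluates n(r) on the axis and at the edge of the rod. The function name and default arguments are illustrative; the coefficients are those given in the text.

```python
def grin_index(r_mm, r0_mm=1.5, n_axis=1.59, coeff=0.04455):
    """Radial GRIN profile n(r) = n_axis * [1 - coeff * (r / r0)^2]."""
    return n_axis * (1.0 - coeff * (r_mm / r0_mm) ** 2)

n_center = grin_index(0.0)   # on the cylinder axis
n_edge = grin_index(1.5)     # at the rod surface, r = r0
print(f"n(0) = {n_center:.4f}, n(r0) = {n_edge:.4f}")
```

The index thus falls from 1.59 on the axis to roughly 1.519 at the surface, a drop of about 4.5%.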
Figure 34.8 shows computed plots of intensity and phase at an observation
plane located 0.3 mm beyond the plano-cylindrical lens of Figure 34.7. The
fraction of the optical power captured by the lens pair is nearly 0.8, and the r.m.s.
wavefront aberration at the observation plane is 0.19λ0. The same lens pair
(with a slight adjustment of the separation between its two elements) may be used
for collimation in the presence of astigmatism on the laser beam, without any
degradation of the wavefront quality.

[Figure 34.6: horizontal axes x from −3.1 to +3.1 mm]
Figure 34.6 Left column: plots of intensity (top) and phase in a plane
located 10 mm beyond the exit pupil of the collimator of Figure 34.5. The
0.6 NA lens is one focal length (f = 4.9 mm) away from the mid-point
between the two waists of the laser beam (λ0 = 980 nm, θ∥ = 7°, θ⊥ = 35°).
Right column: intensity patterns at the viewing window of the shear plate
(Δx = 0.7 mm, Δy = 2.0 mm). From top to bottom, the assumed astigmatism
of the laser is 0, 10, 20, 30 μm.

[Figure 34.7 labels: Diode Laser, Cylindrical Lens (radial GRIN), Plano-cylindrical Lens
(homogeneous glass)]
Figure 34.7 Collimation of a diode laser beam by a pair of cylindrical lenses.
The first lens collimates the beam along the fast axis, while the second lens
arrests the expansion of the beam along the slow axis.

[Figure 34.8: panels (a), (b); horizontal axes x from −750 to +750 μm]
Figure 34.8 Plots of (a) intensity, (b) phase of a laser beam (λ0 = 980 nm,
θ∥ = 7°, θ⊥ = 35°, astigmatism Δz = 0) upon emerging from the lens pair of
Figure 34.7. For the particular lenses chosen in this simulation, the optical power
throughput is 80%, the r.m.s. wavefront aberration is 0.19λ, and the
peak-to-valley phase variation across the aperture is 280°.
By allowing the slow axis of the beam to propagate further before being
collimated, the cylindrical lens pair enables one to adjust the degree of ellipticity
of the beam’s cross-section. Of course, the requisite physical parameters of the
second lens depend on the desired minor-to-major-axis ratio of the collimated
beam, but, in principle, any degree of ellipticity can be achieved. Thus, the
cylindrical lens pair not only collimates the divergent beam of a diode laser, but it
also allows shaping (in particular, circularization) of the beam’s cross-section.
Anamorphic magnification and beam compression

Figure 34.9 shows an aberration-free lens collimating a diode laser’s beam,
followed by a pair of anamorphic prisms that expand the beam along the X-axis.
This collimated and anamorphically magnified beam is subsequently focused by
an aberration-free lens identical with the one used initially for beam collimation.
Because the laser beam’s divergence angles parallel and perpendicular to the
plane of the junction are widely different (i.e., θ∥ ≪ θ⊥), the collimated beam’s
diameter along X is typically much less than that along Y. Expanding the beam
along X until it fills the entrance pupil of the focusing lens enables one to obtain a
focused spot substantially smaller (in one dimension) than the bright spot
appearing at the front facet of the laser.
Figure 34.10 shows computed plots of intensity and phase at several
cross-sections of the system of Figure 34.9. The assumed parameters of the laser are
λ0 = 980 nm, θ∥ = 7°, θ⊥ = 35°, astigmatism Δz = 0. Both the collimator and the
focusing lens have NA = 0.6, f = 4.9 mm, and the prism pair’s magnification factor
M = 5.5 (along X) is sufficient to circularize the beam’s cross-section. The top
row in Figure 34.10 shows the beam at the front facet of the laser. The second
row shows that before entering the prisms the beam has an elliptical
cross-section with an aspect ratio of 5.5. Emerging from the prism pair (third row)
the beam is circularized. The bottom row shows the focused spot at the focal
plane of the focusing lens; this compressed image of the bright elliptical spot at
the front facet of the laser has circular symmetry and a much reduced diameter
along the X-axis.

[Figure 34.9 labels: Diode Laser, Collimator, Anamorphic Prism Pair, Focusing Lens, X, Z]
Figure 34.9 A diode laser’s beam is collimated, then shaped by a prism pair
that expands the beam’s diameter along the X-axis. The collimated and
anamorphically magnified beam is subsequently focused by an aberration-free lens
identical to the one used for collimation.

[Figure 34.10: panels (a)–(h); horizontal axes x from −26 to +26 μm in (a), (b), from
−3.1 to +3.1 mm in (c)–(f), and from −8 to +8 μm in (g), (h)]
Figure 34.10 Distributions of the logarithm of intensity (left) and phase (right)
at several cross-sections of the system of Figure 34.9. The lenses have NA = 0.6,
f = 4.9 mm. The prisms are made of n = 1.72 glass, and have an apex angle of
69°. (a), (b) Front facet of the laser. (c), (d) Exit pupil of the collimator, just
before entering the prism pair. (e), (f) Emerging from the second prism. (g), (h)
Focal plane of the focusing lens.
[Figure 34.11: panels (a)–(h); horizontal axes x from −26 to +26 μm in (a), (b), from
−3.1 to +3.1 mm in (c)–(f), and from −8 to +8 μm in (g), (h)]
Figure 34.11 Same as Figure 34.10 but with the diode laser shifted 20 μm to
the left along the X-axis. Comparing (d) with (f), note that the tilt angle of the
collimated beam is reduced substantially after going through the prism pair.
Thus the image of the laser beam, in addition to being circularized, moves closer
to the optical axis at x = 0, as shown in (g), (h).
Figure 34.11 is similar to Figure 34.10, except for the position of the diode
laser along the X-axis, which is shifted to x = 20 μm. Collimation and
anamorphic magnification work as before, but the beam emerging from the
collimator is tilted by about 0.23° away from the Z-axis. The prism pair magnifies
the beam along X by 5.5, but it also reduces the tilt of the beam by the same
factor. The net result is that the image of the bright elliptical spot at the front
facet of the laser, in addition to being compressed in size, is brought closer to
the optical axis, to x = +3.6 μm. This is an important result which may be
applied, for instance, to compressing incoherent laser beams. A typical
high-power, multi-transverse-mode diode laser may have the same divergence
angles as above (i.e., 7° × 35°), but its bright area at the front facet is much
larger, say 1 × 50 μm². The radiation profile of such lasers may be considered
(approximately) to consist of a number of mutually incoherent filaments, each
similar to the beam of a coherent (i.e., single-transverse-mode) diode laser.
If, therefore, the central filament of an incoherent laser is identified with the
coherent beam depicted in Figure 34.10, then the marginal filament will be
represented by Figure 34.11. The system of Figure 34.9 can thus collimate
the incoherent laser’s various filaments (simultaneously and independently
of each other), perform anamorphic magnification on each and every one of
them, then create (at the focal plane of the last lens) a string of closely packed
focused spots along the X-axis. In doing so the system of Figure 34.9 creates a
compressed image of the elongated bright spot at the front facet of the
incoherent laser. (In the present example the image size will be 1 × 10 μm².) The
achievable compression is roughly equal to the ratio θ⊥/θ∥ of the divergence
angles.
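The numbers in this example can be reproduced with elementary geometry. The sketch below assumes ideal thin lenses and treats the prism pair simply as a device that divides the beam tilt along X by M; variable names are illustrative.

```python
import math

f_mm = 4.9         # focal length of the collimator and of the focusing lens
shift_um = 20.0    # lateral displacement of the laser along X
M = 5.5            # anamorphic magnification of the prism pair

# Tilt of the collimated beam produced by the off-axis source.
tilt_deg = math.degrees(math.atan(shift_um * 1e-3 / f_mm))
# The prism pair expands the beam by M and reduces its tilt by the same factor.
tilt_after_deg = tilt_deg / M
# Position of the focused spot behind the second lens.
image_offset_um = f_mm * 1e3 * math.tan(math.radians(tilt_after_deg))
# Length of the compressed image of a 50 um-long emitting stripe.
image_length_um = 50.0 / M

print(f"tilt = {tilt_deg:.3f} deg, image offset = {image_offset_um:.2f} um, "
      f"image length = {image_length_um:.1f} um")
```

This gives a tilt of about 0.23°, an image offset of about 3.6 μm, and an image length of about 9 μm, in line with the roughly 10 μm quoted above.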
Cylindrical lenses for collimation and beam-shaping in fiber optics systems
A pair of cylindrical lenses (similar to those shown in Figure 34.7) can be used
in similar fashion to capture the beam emerging from an optical fiber. For
example, a pair of GRIN cylinders can collimate and anamorphically magnify
the beam emerging from a single-mode silica glass fiber. Unlike the beam from a
diode laser, the emergent beam of a fiber is usually unpolarized, making cylindrical
lenses superior to prism pairs in applications that demand anamorphic magnification
with a low level of polarization-dependent loss.
Consider a cylindrical and a plano-cylindrical lens oriented at right
angles to each other, as in Figure 34.7. The GRIN profile of both lenses is
n(r) = 1.59[1 − 0.04455(r/r0)²], r being the radial distance from the cylinder
axis, and r0 the cylinder radius. The 1/e (amplitude) diameter of the
single-mode Gaussian beam emerging from the fiber is 10 μm (λ0 = 1.544 μm). The
distance from the fiber facet to the first lens is 0.501 mm, the radius of the
first lens is 2 mm, separation between the lenses is 3.525 mm, and the radius
of the second lens is 6.7 mm, with its center of curvature located on the plane
facet of the lens. The cylinder lengths should be large enough to accommodate
the beam, but are otherwise arbitrary. All surfaces are anti-reflection
coated.
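To get a feeling for the beam size reaching the first lens, one can apply the standard Gaussian-beam formula for free-space propagation. The sketch below assumes an ideal Gaussian mode with its waist at the fiber facet and interprets the 10 μm 1/e amplitude diameter as twice the waist radius; the function name is hypothetical.

```python
import math

def gaussian_radius(z_um, w0_um, wavelength_um):
    """1/e amplitude radius of a Gaussian beam a distance z from its waist."""
    z_r = math.pi * w0_um**2 / wavelength_um     # Rayleigh range
    return w0_um * math.sqrt(1.0 + (z_um / z_r) ** 2)

w0_um = 5.0               # half of the 10 um diameter quoted in the text
wavelength_um = 1.544
rayleigh_um = math.pi * w0_um**2 / wavelength_um
w_at_first_lens_um = gaussian_radius(501.0, w0_um, wavelength_um)

print(f"Rayleigh range = {rayleigh_um:.1f} um, "
      f"radius at first lens = {w_at_first_lens_um:.1f} um")
```

The Rayleigh range is only about 51 μm, so after 0.501 mm the beam radius has grown roughly tenfold, to about 50 μm, which is still small compared with the 2 mm radius of the first lens.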
[Figure 34.12: panels (a), (b), (c); horizontal axis x from −1.65 to +1.65 mm]
Figure 34.12 Plots of (a) intensity, (b) logarithm of intensity, (c) phase at
the exit pupil of a cylindrical lens pair used as collimator and anamorphic
magnifier for the beam emerging from a single-mode fiber (λ0 = 1.544 μm).
A 0.775 × 3.1 mm² elliptical aperture is placed at the exit pupil to clip the
edges of the beam. The overall transmission of the system is 95.5%, and the
ratio of the FWHM beam diameters is 4.1. The peak-to-valley phase
variation in (c) is 56°.
Figure 34.12 shows computed plots of intensity, logarithm of intensity, and
phase at the exit pupil of the lens pair. The first lens arrests the spread of the beam
in the vertical direction, while the beam continues to spread in the horizontal
direction (and become elongated) until its capture by the second lens. The overall
transmission of the system is 95.5%, and the ratio of FWHM beam diameters at
the exit pupil is 4.1. (The anamorphic magnification factor can be further
increased if the first lens is made proportionately smaller or the second lens
larger.) The phase plot shows a small amount of astigmatism, with a
peak-to-valley variation of 56° (r.m.s. wavefront aberrations = 0.035). The aberrations
may be reduced if the plano-cylindrical lens is replaced by a full cylinder.
Alternatively, a different index profile of the GRIN rods or a smaller aperture stop
could help reduce these aberrations.
When used in conjunction with a one-dimensional array of fibers, one of the
lenses can be shared among the various fibers. Similarly, in a 2-D square array of
fibers, the fibers in each row can share the first lens, while the fibers in each
column can share the second lens. Collimation of a 10 × 10 fiber array will thus
require only 10 cylinders of each type.

1 Adapted from <web.mit.edu/invent/www/inventors>.
2 L. W. Casperson and A. Yariv, Gain and dispersion focusing in a high gain laser,
Applied Optics 11, 462–466 (1972).
3 F. R. Nash, Mode guidance parallel to the junction plane of double-heterostructure
GaAs lasers, J. Appl. Phys. 44, 4696–4707 (1973).
4 D. D. Cook and F. R. Nash, Gain-induced guiding and astigmatic output beam of
GaAs lasers, J. Appl. Phys. 46, 1660–1672 (1975).
35
Michelson’s stellar interferometer
The essential idea behind the stellar interferometer is that of a double-slit
interferometer, such as that shown in Figure 35.1. This type of instrument dates back to
1868 when Fizeau1 proposed using it to measure the diameters of the fixed stars.
Some modern textbooks2 describe the stellar interferometer in the language of
coherence theory, which tends to obscure its fundamental simplicity. This chapter
attempts to present the original concept in its simplest form while providing a
historical perspective.
The double-slit interferometer

With reference to Figure 35.1, let us assume that a quasi-monochromatic point
source of wavelength λ is placed at the origin of the XY-plane, which is the focal
point of the collimator lens. The beam emerges from the lens collimated along the
optical axis, effectively placing the point source at infinity. A double-slit mask
blocks most of the light, allowing only the rays within two narrow slits to pass
through to the focusing lens. The slits have a separation d along the X-axis, their
widths being inconsequential as long as a sufficient amount of light gets through
and a reasonable number of fringes appear at the observation plane. The focusing
lens of focal length f brings together the rays that emerge from the two slits,
causing them to interfere and produce a fringe pattern. The simple geometrical
construction in Figure 35.2 shows that at the observation plane the fringe period p
may be written as3,4
\[ p \approx f\lambda/d. \tag{35.1} \]
[Figure 35.1 labels: Point source, Collimator, Double-slit mask, Focusing lens,
Observation plane, Optical axis, slit separation d, focal length f, axes X and Y]
Figure 35.1 Schematic diagram of a double-slit interferometer. The collimator
lens has NA = 0.8, f = 6000λ. The mask has two slits each of width 500λ, the
centers of which are separated by d = 6500λ. The focusing lens has NA = 0.008
and f = 6 × 10^5 λ.

[Figure 35.2 labels: slits S1 and S2, slit separation d, focal length f, fringe period p,
axis Z]
Figure 35.2 Geometrical construction showing the relation between the fringe
period p, the distance d between the slits, the focal length f of the focusing lens,
and the wavelength λ of the light.

Figure 35.3 shows computed plots of intensity distribution in the system of
Figure 35.1 for a point source centered on the optical axis; Figure 35.3(a) is the
distribution immediately after the slits, while Figures 35.3(b) and 35.3(c)
present the intensity pattern within the observation plane. For the assumed system
parameters (f = 6 × 10^5 λ, d = 6500λ) the fringe period found from Eq. (35.1) is
p ≈ 92.3λ, in agreement with the simulated results.
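As a quick check of the fringe period, Eq. (35.1) can be evaluated with all lengths expressed in units of the wavelength. The function name below is illustrative.

```python
def fringe_period(focal_length, wavelength, slit_separation):
    """Fringe period p = f * lambda / d at the focal plane of the focusing lens."""
    return focal_length * wavelength / slit_separation

# System parameters of Figure 35.1, in units of the wavelength (lambda = 1).
p_in_waves = fringe_period(6e5, 1.0, 6500.0)
print(f"p = {p_in_waves:.1f} wavelengths")
```

The result is about 92.3 wavelengths, so the ±800λ window of Figure 35.3(c) spans roughly 17 fringe periods.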
Next, assume that a second point source is placed in the focal plane of the
collimator lens, slightly displaced from the first one located at the origin. Assume
further that the two sources, although both quasi-monochromatic with
wavelength λ, are completely uncorrelated and independent, so that their radiation may
be considered to be spatially incoherent. The collimated beam arriving at the
plane of the slits from this second source will make a small angle ψ with the
optical axis. As shown in Figure 35.4, this angle causes the phase of the light
arriving at the two slits to differ by Δ where3,4
\[ \Delta \approx 2\pi\psi d/\lambda. \tag{35.2} \]
A 2π phase difference at the slits corresponds to a fringe translation by one
period at the observation plane. Thus the two sets of fringes arising from the
two point sources will be shifted relative to each other by an amount that will
depend on ψ. Since the point sources are uncorrelated, it is their corresponding
intensity distributions at the observation plane that will be added together. This
results in a ψ-dependence of the fringe visibility V, a quantity defined by
Michelson as
\[ V = (I_{\max} - I_{\min})/(I_{\max} + I_{\min}), \tag{35.3} \]
where I_min and I_max are the minimum and maximum values of intensity within
one fringe period.

[Figure 35.3: panels (a), (b), (c); horizontal axes x/λ from −5000 to 5000 in (a) and from
−800 to 800 in (b), (c); vertical axis in (c): normalized intensity, 0.0 to 1.0]
Figure 35.3 Results of a computer simulation involving a single point source,
centered on the optical axis of the system of Figure 35.1. (a) Intensity
distribution immediately after the mask. (b) The fringe pattern at the focal plane of the
focusing lens. (c) Cross-section of the fringe pattern along the X-axis.

[Figure 35.4 labels: slits S1 and S2, slit separation d, phase shift between the slits]
Figure 35.4 A collimated beam arriving at the double-slit mask at an angle ψ
relative to the optical axis will exhibit a phase difference of Δ between the two slits.
As an example, let us assume that two point sources of equal strength, separated
by 0.5λ along the X-axis, are placed in the focal plane of the collimator lens of
Figure 35.1. The angular separation between the two sources as viewed from the
plane of the slits, therefore, is ψ = 17.2 seconds of arc.⁵ In the absence of the
double-slit mask, the images of the two point sources will be unresolvable at the observation
plane; see Figure 35.5(a). With the mask in place, however, the plots of intensity
distribution in Figures 35.5(b), (c) indicate that the fringe visibility will be
substantially altered from that for a single point source; the latter can be deduced from
Figure 35.3(c) to be 100%. Close inspection of the fringes, therefore, enables one to
infer the presence of a second source of light in the system. As a practical matter, one
may adjust the distance d between the two slits until the phase shift Δ of Eq. (35.2)
becomes equal to π, at which point the two fringe systems will be shifted by half a
period. Under these conditions the maxima of one set of fringes will overlap the
minima of the other, resulting in a complete “washing out” of the interference
pattern. Equation (35.2) can then be used to determine the angular separation ψ
between the two point sources from the knowledge of the slit separation d that
resulted in minimum fringe visibility. (Note that changing d will have the
undesirable effect of changing the fringe period p according to Eq. (35.1), but, as long as the
fringes remain visible to the observer, this change should be inconsequential.)
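The numbers in this example can be reproduced with a short calculation. The sketch below is not taken from the simulations; it assumes ideal two-beam fringes of unit visibility for each source, in which case adding the two shifted intensity patterns gives V = |cos(Δ/2)|. The function names are illustrative.

```python
import math

ARCSEC = math.pi / (180 * 3600)  # one second of arc, in radians

def phase_shift(psi, d_over_lambda):
    """Eq. (35.2): phase difference between the two slits, in radians."""
    return 2 * math.pi * psi * d_over_lambda

def two_source_visibility(psi, d_over_lambda):
    """Visibility of the summed fringes of two equal, incoherent point sources.

    Assumes each source alone produces fringes of 100% visibility.
    """
    return abs(math.cos(phase_shift(psi, d_over_lambda) / 2))

psi = 17.2 * ARCSEC                    # angular separation quoted in the text
v = two_source_visibility(psi, 6500)   # slit separation d = 6500 wavelengths
d_null = 1 / (2 * psi)                 # d/lambda at which the phase shift equals pi
print(f"V = {v:.3f}; fringes wash out at d = {d_null:.0f} wavelengths")
```

Under these assumptions the visibility at d = 6500λ drops to roughly 0.13, and the fringes wash out completely when the slit separation is reduced to about 6000λ.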
Dependence of fringe visibility on d

Of course one may not know a priori whether the source is an extended object
(such as a large star) or consists of a number of distinct point sources (e.g., a
double star) and, in the latter case, whether the point sources are of equal
Figure 35.5 Results of a simulation of the system of Figure 35.1 involving two
independent point sources, one centered on the optical axis (i.e., at the origin), the
other shifted by 0.5λ along the X-axis. (a) Logarithmic plot of the intensity
distribution at the observation plane in the absence of the double-slit mask; x/λ
spans −300 to 300. (b) Fringe pattern at the observation plane with the double-slit
mask present; x/λ spans −800 to 800. (c) Cross-section of the fringe pattern along
the X-axis (normalized intensity versus x/λ).
intensity. It turns out that a measurement of fringe visibility V as a function of the

separation d between the slits can provide ample information about the intensity
distribution of the source. A pair of equal-intensity stars, for instance, will make
the visibility versus d a periodic function, whereas the more-or-less uniform disk
of a giant star will give rise to an oscillating V(d) whose magnitude declines with
increasing d. Calculations show that in the former case the first zero of V(d)
Figure 35.6 Results of a simulation of the system of Figure 35.1 involving three
independent point sources, one centered on the optical axis, the others shifted
by ±0.25λ along the X-axis. (a) Logarithmic plot of the intensity distribution at
the observation plane in the absence of the double-slit mask; x/λ spans −300 to 300.
(b) Fringe pattern at the observation plane with the double-slit mask present; x/λ
spans −800 to 800. (c) Cross-section of the fringe pattern along the X-axis
(normalized intensity versus x/λ).
appears at d = 0.5λ/ψ, whereas in the latter the first zero occurs at d = 1.22λ/ψ;
here ψ is the angle subtended by the diameter of the giant star’s disk.²,³,⁴
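Both zeros can be located numerically. The sketch below assumes the standard visibility functions for these two sources, which are not derived in this chapter: |cos(πψd/λ)| for a pair of equal point sources separated by ψ, and |2J₁(x)/x| with x = πψd/λ for a uniform disk of angular diameter ψ. The Bessel function is evaluated from its integral representation so that only the standard library is needed; all function names are illustrative.

```python
import math

def bessel_j1(x, steps=2000):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):             # midpoint rule
        t = (i + 0.5) * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def visibility_double_star(u):
    """u = psi * d / lambda, with psi the angular separation of the pair."""
    return abs(math.cos(math.pi * u))

def visibility_uniform_disk(u):
    """u = psi * d / lambda, with psi the angular diameter of the disk."""
    x = math.pi * u
    return 1.0 if x == 0 else abs(2 * bessel_j1(x) / x)

def first_zero(signed_fn, lo, hi, tol=1e-9):
    """Bisection for the sign change of signed_fn between lo and hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if signed_fn(lo) * signed_fn(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

u_pair = first_zero(lambda u: math.cos(math.pi * u), 0.1, 0.9)
u_disk = first_zero(lambda u: bessel_j1(math.pi * u), 0.5, 1.5)
print(f"double star:  first zero at d = {u_pair:.3f} lambda/psi")
print(f"uniform disk: first zero at d = {u_disk:.3f} lambda/psi")
```

The double-star visibility is periodic in d, whereas the disk visibility oscillates with a decaying envelope, in line with the qualitative description above.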
In any event, it is clear that a measurement of fringe visibility for several
different separations of the slits will provide much information about the distri-
bution of intensity at the source. As another example, we show in Figure 35.6 the
case of three point sources of equal intensity placed at x = −0.25λ, 0, and +0.25λ
in the system of Figure 35.1. Again the image obtained at the observation plane
without the double-slit mask does not resolve the sources of light, but the fringe
visibility obtained as a function of d carries enough information to allow one to
make a fairly accurate statement about the distribution of intensity at the source.
A historical perspective
R. W. Wood⁶ sums up the origins of the interferometer: “This method was
proposed by Fizeau¹ in 1868 for measuring the diameters of the fixed stars. In
1874 Stefan made an attempt to carry out Fizeau’s plan, placing two slits in front
of the objective of the Marseilles telescope, the largest available at the time. The
fringes remained visible even when the slits were separated by the full diameter
of the objective. In 1890 Michelson measured the diameters of the four moons of
Jupiter, using the 36 inch telescope of the Lick observatory.⁷ The method can also
be used for determining the distance between the components of a double star.
“In 1920 Michelson took up the problem of the determination of stellar
diameters.⁸ Even the great 100 inch telescope of the Mount Wilson Observatory is not
large enough to allow of a sufficient separation of the slits; consequently Michelson
designed a ‘periscopic’ arrangement of four mirrors, the two outer ones, twenty feet
apart, reflecting the light to two inner ones which in turn reflected the beams down
upon the mirror of the 100 inch telescope. The mirrors were mounted on a metal
beam attached to the top of the telescope tube. The instrument was constructed in
collaboration with F. G. Pease of the Mount Wilson Observatory.”
A schematic diagram of the stellar interferometer constructed by Michelson (and
mentioned by R. W. Wood in the preceding paragraph) is shown in Figure 35.7. In
this instrument the distance ℓ between mirrors M1 and M2 was varied to effect a
change of fringe visibility; one must therefore substitute ℓ for d in Eq. (35.2) in
order to make it applicable to the new instrument. The fringe period p, however, is
still determined by the distance d between the slits, and Eq. (35.1) applies to
Michelson’s interferometer without any modifications. Thus there is the further
advantage that the fringe spacing remains constant as the separation of the movable
mirrors is varied. The interferometer was mounted on the 100 inch reflecting
telescope of the Mount Wilson Observatory in California, which was used because
of its mechanical strength. The apertures S1 and S2 were 114 cm apart, giving a
fringe spacing of about 20 μm in the focal plane. The maximum separation of the
outer mirrors was 6.1 m, so that the smallest measurable angular diameter (with
λ = 550 nm) was about 0.02 seconds of arc.³
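The quoted limit follows from the uniform-disk criterion stated earlier, namely that the first zero of visibility occurs at a mirror separation of 1.22λ/ψ. The following sketch simply inverts that relation for the largest available separation; the function name is illustrative.

```python
import math

ARCSEC = math.pi / (180 * 3600)  # one second of arc, in radians

def smallest_disk_diameter(mirror_separation, wavelength):
    """Angular diameter (radians) whose first visibility zero, at
    1.22 * wavelength / psi, falls at the given mirror separation."""
    return 1.22 * wavelength / mirror_separation

psi_min = smallest_disk_diameter(mirror_separation=6.1, wavelength=550e-9)
print(f"smallest measurable diameter = {psi_min / ARCSEC:.3f} arcsec")
```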
Again quoting R. W. Wood:⁶ “The bright star Betelgeuse was the first
investigated. This star shows evidence of its diameter with the 100 inch telescope
Albert Abraham Michelson (1852–1931) was born in what was then Germany
(now Poland) and emigrated with his family to the United States in 1855. He
became professor of physics at the Case School of Applied Science (Cleveland,
Ohio), then at Clark University (Worcester, Massachusetts), and then at the
University of Chicago. In 1907 he became the first American to receive a Nobel
prize; the prize citation reads: “For his optical precision instruments and the
spectroscopic and metrological investigations carried out with their aid.”
(Photo: courtesy of AIP Emilio Segrè Visual Archives.)
if a canvas cover is placed over the instrument, provided with two holes 7 inches
in diameter and 94 inches apart, the diffraction disk of the star being crossed with
faint interference bands. If either hole is covered the bands disappear. If the
telescope is pointed at Rigel, however, the bands are clear and strong, showing
that its angular diameter is smaller than that of Betelgeuse. With the twenty-foot
interferometer the bands disappeared entirely in the case of Betelgeuse when the
mirrors were separated by a distance of 120 inches, while Rigel showed very
distinct bands. The angular diameter of Betelgeuse was computed as 0.047 sec-
onds of arc. From the known distance of the star [determined by triangulation], its
Figure 35.7 Michelson’s stellar interferometer. The apertures S1 and S2 are
fixed, and the light reaches them after reflection at mirrors M1, M2, M3 and M4.
The inner mirrors M3 and M4 are fixed, while the outer mirrors M1 and M2 can be
moved symmetrically in the direction joining S1 and S2. If the optical paths
M1M3S1 and M2M4S2 are maintained equal, the optical path difference for light
from a distant point source is the same at S1 and S2 as at M1 and M2, so that the
outer mirrors play the part of the movable apertures in the Fizeau method. A
plane-parallel glass plate C1, which can be inclined in any direction, is used to
maintain the geometrical pencils from S1 and S2 in coincidence in the focal
plane. A second plane-parallel glass plate C2, of variable thickness, is used to
compensate inequalities of the optical paths M1M3S1 and M2M4S2. (Adapted
from Born and Wolf.³)
actual diameter was calculated as 250 million miles [i.e., 300 times the diameter
of the sun] or greater than the earth’s orbit about the sun [180 million miles
across]. Its diameter has been found to vary, however, for at times the mirrors
must be separated by a distance of 14 feet before the fringes disappear. Antares
was found to be still larger, having a diameter of 400 million miles. The
minimum angular diameter measurable with the 20 foot instrument is 0.024
seconds of arc.”
The majority of stars are either too distant or too small for the Michelson
interferometer to measure their diameter. For example, at the distance of the
nearest star (Alpha Centauri) the sun’s disk would subtend an angle of only 0.007
seconds of arc, and to observe the first disappearance of the fringes a mirror
separation of 20 m would be necessary. The construction of such a large inter-
ferometer would be a difficult undertaking because of the requirement of rigid
mechanical connection between the collecting mirrors and the eyepiece.³ In recent
years, the method of Hanbury Brown and Twiss as well as extensions of Michelson’s
method to radio astronomy have been used for measurements of some of the smaller
astronomical objects.²,³,⁴
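The 20 m figure can be recovered from the same 1.22λ/ψ criterion. The sketch below assumes the wavelength of 550 nm used earlier in the chapter, which is not restated for this example; the function name is illustrative.

```python
import math

ARCSEC = math.pi / (180 * 3600)  # one second of arc, in radians

def separation_for_first_zero(angular_diameter, wavelength):
    """Mirror separation at which the fringes of a uniform disk first vanish."""
    return 1.22 * wavelength / angular_diameter

# Sun-like disk seen from the distance of Alpha Centauri (0.007 arcsec).
needed = separation_for_first_zero(0.007 * ARCSEC, wavelength=550e-9)
print(f"required mirror separation = {needed:.1f} m")
```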

1 H. Fizeau, C. R. Acad. Sci. Paris 66, 934 (1868).
2 For example, L. Mandel and E. Wolf, Optical Coherence and Quantum Optics,
Cambridge University Press, London, 1995.
1980.
5 One degree is 60 minutes, and one minute is 60 seconds of arc. One second of arc is
the angle subtended by a small coin at a distance of about 3.5 km.
6 R. W. Wood, Physical Optics, third edition, reprinted by the Optical Society of
America, 1988.
7 A. A. Michelson, Phil. Mag. 30, 1 (1890); A. A. Michelson, Nature (London), 45,
160 (1891).
8 A. A. Michelson, Astrophys. J. 51, 257 (1920); A. A. Michelson and F. G. Pease,
Astrophys. J. 53, 249 (1921).
36
Bracewell’s interferometric telescope
There are countless suns and countless earths all rotating around
their suns in exactly the same way as the seven planets of our system.
We see only the suns because they are the largest bodies and are
luminous, but their planets remain invisible to us because they are
smaller and non-luminous. The countless worlds in the universe are no
worse and no less inhabited than our Earth.
Giordano Bruno (1584) in De L’Infinito Universo E Mondi
In 1978 Ronald Bracewell of Stanford University proposed the use of a nulling

interferometer to cancel the image of a bright star in order to observe the relatively
faint planets which might be in orbit around the star.¹ This idea, which has been
expounded and further extended by others,²,³,⁴ is presently the most promising
method of detecting terrestrial planets (i.e., small, rocky planets similar to Venus,
Earth, and Mars) orbiting in habitable zones around our neighboring stars. Because
atmospheric turbulence distorts the stellar wavefronts and limits the resolution of
ground-based observations, an interferometric telescope capable of detecting
planets in other solar systems must, of necessity, be stationed in space. The
National Aeronautics and Space Administration (NASA) is currently working on a
program called the Terrestrial Planet Finder (TPF), and has tentatively scheduled
the launch of a nulling interferometer into orbit in about the year 2010.⁵
Nulling interferometer
Figure 36.1 is a diagram of a basic Bracewell telescope intended for operation in the
infrared range of wavelengths λ ≈ 7–20 μm. The reason for working in the infrared is
that the expected brightness of the star in this region is only 10⁶ times that of the
planet, which is much better than the 10⁹ brightness ratio in the visible. Moreover,
several signature absorption lines corresponding to ozone, water vapor, methane,
Figure 36.1 (adapted from reference 4). Diagram of a nulling interferometric
telescope. The primary mirrors have diameter Dp and baseline B. Unmatched
reflections are made at nearly normal incidence to minimize polarization
differences. The folded beams are combined at the beam-splitter (BS), which is designed
for equal transmission and reflection in the desired range of wavelengths. The beam
from the left-hand side, after crossing the axis of the telescope, is folded down at M1
and transmitted by BS before coming to a focus. The beam from the right-hand side
is folded downward at M2 before reaching the axis of the telescope, and then passes
through a compensator plate CP and is reflected back up at M3 to equalize the path
lengths before being reflected from the underside of BS. An achromatic 180° phase
difference is realized by balancing a slight difference in the air path with the path
difference between BS and CP, fine-tuned by a slight rotation of CP.
and carbon dioxide reside in this band, which can be exploited in the spectroscopic
analysis of these planets to determine whether they harbor life as we know it.⁵
In the following discussion we confine our attention to a single infrared
wavelength of λ = 10 μm, even though the interferometric telescope can operate
over a fairly broad range of wavelengths. We assume the primaries each have an
aperture diameter Dp = 1 m and focal length fp = 2.5 m. (The angular resolution
of the individual mirrors is thus λ/Dp = 10⁻⁵ radians.) The assumed baseline
(i.e., center-to-center separation of primary mirrors) is B = 5 m.† With the
† These parameters, chosen for the sole purpose of demonstrating the basic concepts, are not representative of
the planned systems. A typical design under consideration by the TPF program, for example, has four primary
mirrors, each 2.5 m in diameter and separated by 100 m baselines. It is envisioned that these free-flying
mirrors would collect and forward the beam of light to a local combiner and controller unit (also free-flying).
The planned system will be capable of executing nulling interferometry over the broad band of λ = 7–20 μm.
The adjustment of mirror positions and their distances from each other as well as from the combiner would
allow the configuration to be optimized on the spot in accordance with the characteristics of the particular
solar system under consideration.⁵
telescope pointing at a star, an angular separation of ϑ = 10⁻⁶ radians between
the star and its planet results in a relative phase of 2πBϑ/λ = 180° between
the light rays arriving at the two mirrors from the planet. (1 μrad ≈ 0.2 arcsec
is the separation between the Sun and the Earth observed from 16 light
years away.) The separation of the planet from its parent star in this case is an
order of magnitude below the resolution of the individual mirrors, yet the
assumed nulling interferometer is capable of detecting the planet in the vicinity
of the star.
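The figures in this paragraph are easy to verify. The sketch below uses only the parameters stated above, plus standard values for the astronomical unit and the light year; variable names are illustrative and the small-angle approximation is assumed throughout.

```python
import math

ARCSEC = math.pi / (180 * 3600)   # one second of arc, in radians
AU = 1.495978707e11               # astronomical unit, metres
LIGHT_YEAR = 9.4607e15            # metres

wavelength = 10e-6                # lambda = 10 um
baseline = 5.0                    # B = 5 m
mirror_diameter = 1.0             # Dp = 1 m
separation = 1e-6                 # star-planet angle, radians

resolution = wavelength / mirror_diameter
phase_deg = math.degrees(2 * math.pi * baseline * separation / wavelength)
sun_earth = AU / (16 * LIGHT_YEAR)   # Sun-Earth angle seen from 16 light years

print(f"lambda/Dp      = {resolution:.1e} rad")
print(f"relative phase = {phase_deg:.0f} degrees")
print(f"1 urad         = {1e-6 / ARCSEC:.2f} arcsec")
print(f"Sun-Earth from 16 ly = {sun_earth * 1e6:.2f} urad")
```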
The secondary mirrors in the system of Figure 36.1, placed at z = 25 cm before
the primary focus, bring the reflected beam to a final focus at z′ = 5 m in front of
the secondary. These negative mirrors, designed for a 20:1 conjugate ratio, have
aperture diameter Ds = 10 cm, focal length fs = 26.32 cm, and magnification
Ms = 20. The focused cone of light emerging from each secondary is an f/50
beam (i.e., numerical aperture NA′ = 0.01), giving rise to an Airy disk diameter of
1.22λ/NA′ = 1.22 mm at the image plane.
The light from the planet, entering the primary at the oblique angle of
ϑ = 10⁻⁶ radians, emerges from the secondary at ϑ′ = 10ϑ. (The secondary is
ten times closer than the primary to the virtual image of the sky at the primary’s
focal plane.) The final image of the planet, therefore, is shifted by Δr = z′ϑ′
= 50 μm from the image of the star at the center of the image plane. This
separation, being more than an order of magnitude below the Airy disk
diameter of 1220 μm, is clearly insufficient to resolve the planet’s image from
the parent star’s, confirming once again the inadequacy of the individual mirrors
for the task.
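The secondary-mirror parameters and the image shift quoted in the last two paragraphs can be reproduced from first-order optics. The sketch below assumes the thin-mirror equation with a virtual object (taken here as a negative object distance, so that the focal length of the convex secondary comes out negative) and the paraxial relation NA′ ≈ 1/(2 × f-number); variable names are illustrative.

```python
wavelength = 10e-6      # lambda = 10 um
f_primary = 2.5         # fp, metres
d_primary = 1.0         # Dp, metres
z_object = 0.25         # secondary sits 25 cm before the primary focus
z_image = 5.0           # final focus lies 5 m in front of the secondary

magnification = z_image / z_object                      # Ms
# Mirror equation 1/f = 1/s + 1/s' with virtual object distance s = -z_object.
f_secondary = 1.0 / (1.0 / z_image - 1.0 / z_object)
f_number = (f_primary / d_primary) * magnification      # f/2.5 primary, magnified
na_image = 1.0 / (2.0 * f_number)                       # paraxial approximation
airy_diameter = 1.22 * wavelength / na_image

theta = 1e-6                                  # planet's off-axis angle, radians
theta_out = theta * f_primary / z_object      # secondary is 10x closer to the focus
shift = z_image * theta_out                   # displacement of the planet's image

print(f"Ms = {magnification:.0f}, fs = {f_secondary * 100:.2f} cm, f/{f_number:.0f}")
print(f"Airy diameter = {airy_diameter * 1e3:.2f} mm, shift = {shift * 1e6:.0f} um")
```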
The case against a conventional telescope

Even a conventional (filled-aperture) space telescope 25 m in diameter will
fail to detect the planet in the preceding example. The problem in this case is
not resolution but photon noise. The image of the star, being about 10⁶
times brighter than that of the planet, floods the detectors and obscures the
planet’s signal. The nulling interferometer, however, yields an acceptable
signal-to-noise ratio at the detector output by canceling the light of the star
arriving from the two mirrors while, at the same time, enhancing the image of
the planet by constructive interference. Not only does the nulling interfer-
ometer eliminate the complete Airy pattern of the star (i.e., the central disk as
well as the rings), it does so without requiring any significant displacement of
the planet’s Airy pattern. What is important for the nulling interferometer is
not how much the two Airy disks in the image plane are separated from each
other but how much the wavefront arriving from the planet at one mirror is
delayed relative to the time of arrival of the same wavefront at the other. This
delay or phase shift, being a function of the baseline B, is independent of the
mirror diameter Dp.
Destructive and constructive interference

As a specific example consider the system of Figure 36.1 with the afore-
mentioned parameters. The beam-splitter (BS) is an important component of
this system; to simulate its behavior we used a six-layer stack on a 1 mm
substrate, as shown in Figure 36.2. The alternate layers are high- and low-
index dielectrics, their thicknesses chosen to yield a 50/50 beam-splitter at the
operating wavelength of λ = 10 μm. The top of the substrate is anti-reflection
coated with a low-index layer to minimize undesirable reflections. This stack
design, although adequate for demonstration purposes, is not suitable for
broad-band applications requiring cancellation of the star light with high
accuracy. Such applications require alternative designs or more complex
multilayer stacks.
Figure 36.3 shows computed images of a single star obtained with the above
telescope. The Airy pattern in Figure 36.3(a) is obtained when the light from one
arm of the telescope is blocked. When both channels are open and properly
balanced, destructive interference at the beam-splitter yields the null image in
Figure 36.3(b); here the peak intensity is only 1.4 × 10⁻⁴ that in Figure 36.3(a).
The weak residual image of the star in Figure 36.3(b) is due to a slight imbalance
of the two channels brought about by the beam-splitter’s minute departure from
Figure 36.2 A simple design for the beam-splitter (BS) in the system of
Figure 36.1, having a 50/50 reflection to transmission ratio at λ = 10 μm. The
1 mm-thick substrate has refractive index n = 2, and is antireflection coated (AR)
on the top surface with a t = 1.785 μm layer having n = 1.4. Deposited on the
substrate bottom is a six-layer stack. Numbered in increasing order starting at the
substrate interface, these layers have the following parameters: layers 1, 3, 5,
t = 1.7 μm, n = 1.5; layers 2, 4, t = 1.25 μm, n = 2.0; layer 6, t = 0.475 μm, n = 2.0.
Figure 36.3 Logarithmic intensity distributions in the image plane corresponding
to a single star with no planets (x/λ spans −300 to 300 in both panels): (a) when
the light from either arm of the
interferometer is blocked; (b) with both channels open and the path lengths
properly balanced to allow interferometric cancellation of the star’s image. The
ratio of the peak intensity in (b) to that in (a) is 1.4 × 10⁻⁴. The non-zero values
of the residual intensity in (b) are due to imperfect balance between the two
channels.
the ideal 50/50 ratio. Although this four-orders-of-magnitude reduction of

intensity in the null image is sufficient for the present discussion, it is totally
unacceptable for the observation of actual terrestrial planets. Because the radia-
tion levels of these planets are expected to be at least a million times weaker than
their parent star’s, it is imperative to design the telescope components, with much
higher accuracy, for a maximum rejection of the star light.
Consider next a planet only 100 times weaker than its star, with an angular
separation of 25 μrad. With one channel of the telescope blocked the image in
Figure 36.4(a) is obtained, whereas in a balanced interferometer one obtains the
image in Figure 36.4(b). It is obvious in this example that the single-channel output,
corresponding to a conventional telescope’s image, shows a faintly visible planet
next to a bright star, whereas in the interferometric image the light from the star is all
but eliminated. Note that for clarity of presentation we have chosen a relatively
bright planet with a large separation from its parent star (25 μrad ≈ 5 arcsec). Both
these assumptions are much too optimistic and, in practice, one must substantially
improve the sensitivity of the assumed telescope in order to detect terrestrial planets
in our neighboring solar systems.⁵
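The geometry of this example can be checked against the relations given later in the chapter as Eqs. (36.1) and (36.4). The sketch below assumes, for illustration only, that the planet lies along the direction of the baseline; the variable names are not from the text.

```python
import math

wavelength, baseline = 10e-6, 5.0        # lambda = 10 um, B = 5 m
f_primary, m_secondary = 2.5, 20.0       # fp = 2.5 m, Ms = 20
separation = 25e-6                       # star-planet angle, radians

# Eq. (36.1): radial position of the planet's image.
image_offset = m_secondary * f_primary * math.tan(separation)

# Eqs. (36.2) and (36.4), with the planet assumed to lie along the baseline.
phase = 2 * math.pi * (baseline / wavelength) * math.sin(separation)
transmission = math.sin(phase / 2) ** 2

separation_arcsec = math.degrees(separation) * 3600
print(f"separation   = {separation_arcsec:.2f} arcsec")
print(f"image offset = {image_offset * 1e3:.2f} mm")
print(f"transmission = {transmission:.3f}")
```

Under this assumption the planet sits on a bright fringe (the relative phase is an odd multiple of π), and its image lies about one Airy-disk diameter away from the position of the nulled star.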
The fringe pattern and the spinning telescope

Figure 36.5 shows a Bracewell telescope oriented with its baseline in the
XY-plane at an angle θ from X, pointing at a star along the Z-axis. Each point in
Figure 36.4 Images of star and planet, when the assumed brightness of the
planet is 1% of the star’s and their angular separation is 5.16 arcsec (x/λ spans
−300 to 300 in both panels). (a) When
either beam is blocked the intensity distribution is essentially that of the star;
the planet is barely visible. (b) With both channels open and the phase
difference between them adjusted to 180°, the bright star is canceled out and the
planet becomes visible. The center-to-center spacing between the images of the
star and the planet is 1.25 mm, and the ratio of the peak intensity in (b) to that
in (a) is 0.038.
Figure 36.5 The Bracewell telescope, with its baseline in the XY-plane
and oriented at an angle θ from X, targeting a star along the Z-axis. The
points in the vicinity of the star are identified by their polar and azimuthal
coordinates ϑ, ψ. The wavefront, arriving from an oblique direction, reaches
one mirror later than the other, producing a path-length difference of
B sin ϑ cos(ψ − θ).
the star’s neighborhood may be identified either by its angular coordinates ϑ, ψ or
by the Cartesian coordinates x, y of its image in the telescope. The image location
is related to the polar coordinates through the equation
$$(x, y) = M_s f_p \tan\vartheta\,(\cos\psi, \sin\psi). \qquad (36.1)$$
Here fp is the focal length of the primary mirror and Ms is the magnification of the
secondary. For the light arriving at the two primaries from the direction (ϑ, ψ) in
the sky the relative phase is
$$\Delta\Phi = 2\pi(B/\lambda)\sin\vartheta\cos(\psi - \theta). \qquad (36.2)$$
The two arms of the telescope are adjusted in such a way that when ΔΦ = 0 the
two beams interfere destructively whereas a 180° phase shift results in
constructive interference. The corresponding light amplitude at the image plane is
thus given by
$$A = \tfrac{1}{2}A_0[1 - \exp(i\Delta\Phi)], \qquad (36.3)$$
and the resulting intensity may be written
$$|A|^2 = |A_0|^2\sin^2(\tfrac{1}{2}\Delta\Phi) = |A_0|^2\sin^2[\pi(B/\lambda)\sin\vartheta\cos(\psi - \theta)]. \qquad (36.4)$$
Figure 36.6 is a gray-scale plot of |A/A0|² in the image plane of the telescope
(black and white represent 0 and 1, respectively). Each point (x, y) in this plane
corresponds to a point (ϑ, ψ) in the sky in accordance with Eq. (36.1); it is also
assumed that the primary mirrors are separated by B = 5 m along the θ = 45° line.
The field of view is centered at (x, y) = (ϑ, ψ) = (0, 0), which is the target star’s
location. Only a circle of radius 100λ within the field of view – corresponding to
0 ≤ ϑ ≤ 20 μrad – is shown in Figure 36.6, but the same pattern could extend over
a much larger patch of sky around the targeted star.
Figure 36.6 Gray-scale plot of |A/A0|² (Eq. 36.4) in the image plane of the
telescope (black and white represent 0 and 1, respectively; x/λ spans −100 to 100).
The primary mirrors are separated by B = 5 m along the θ = 45° line; λ = 10 μm.
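The fringe map of Eq. (36.4) and the image mapping of Eq. (36.1) are simple to evaluate. The sketch below uses the parameters of this chapter as default arguments; the function names are illustrative, and the angle names follow the roles described in the text (polar and azimuthal sky coordinates, and the orientation of the baseline).

```python
import math

def null_transmission(polar, azimuth, baseline_angle,
                      baseline=5.0, wavelength=10e-6):
    """|A/A0|^2 of Eq. (36.4) for a sky direction (polar, azimuth), in radians."""
    phase = ((2 * math.pi * baseline / wavelength) * math.sin(polar)
             * math.cos(azimuth - baseline_angle))       # Eq. (36.2)
    return math.sin(phase / 2) ** 2

def image_point(polar, azimuth, f_primary=2.5, m_secondary=20.0):
    """Eq. (36.1): image-plane coordinates (metres) of a sky direction."""
    r = m_secondary * f_primary * math.tan(polar)
    return r * math.cos(azimuth), r * math.sin(azimuth)

baseline_angle = math.radians(45)
on_star = null_transmission(0.0, 0.0, baseline_angle)
first_bright = null_transmission(1e-6, baseline_angle, baseline_angle)
x_edge, _ = image_point(20e-6, 0.0)

print(f"transmission at the star        = {on_star:.3f}")
print(f"transmission 1 urad along B     = {first_bright:.3f}")
print(f"20 urad maps to a radius of {x_edge / 10e-6:.0f} wavelengths")
```

Along the baseline direction the fringes repeat every λ/B = 2 μrad on the sky, so a field of radius 20 μrad contains ten fringe periods on either side of the central null.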
Any planet (or other source of radiation) located in the bright fringes of
Figure 36.6 will produce a bright Airy pattern in the image plane at that location.
However, planets located in the dark fringes disappear from the image because
destructive interference cancels them out. If the telescope is rotated around the
Z-axis while maintaining a tight fix on the target star, θ will change continuously
and the pattern of Figure 36.6 will rotate around its center. The image of a planet
within the field of view, however, will remain fixed while the fringes rotate. The
planet’s image thus waxes and wanes as the bright and dark fringes cross it one
after the other. The number of times that the planet’s image appears and
disappears in a single revolution of the telescope depends on the polar coordinate ϑ of
the planet; specifically, the frequency of the planet’s signal at the detector output
increases in proportion to its separation from its parent star. In this way it is
possible to modulate the signal of a given planet and, by integration over time, to
reduce noise components residing outside the specific frequency of the planet’s
signal.¹
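The proportionality between modulation frequency and separation can be illustrated by rotating the baseline through one full turn in Eq. (36.4) and counting how often the planet's signal rises through half of its maximum. This is only a sketch: the planet's azimuth (0.3 rad) and the three separations are arbitrary choices, and the function names are illustrative.

```python
import math

def planet_signal(rotation, polar, azimuth, baseline=5.0, wavelength=10e-6):
    """Eq. (36.4) for a planet at (polar, azimuth) when the baseline is
    oriented at the angle `rotation` (all angles in radians)."""
    phase = ((2 * math.pi * baseline / wavelength) * math.sin(polar)
             * math.cos(azimuth - rotation))
    return math.sin(phase / 2) ** 2

def peaks_per_revolution(polar, azimuth=0.3, samples=20000):
    """Count upward crossings of the half-maximum level in one revolution."""
    values = [planet_signal(2 * math.pi * i / samples, polar, azimuth)
              for i in range(samples)]
    # values[i - 1] wraps around to the last sample when i == 0
    return sum(1 for i in range(samples) if values[i - 1] < 0.5 <= values[i])

for polar in (2e-6, 4e-6, 8e-6):
    count = peaks_per_revolution(polar)
    print(f"separation {polar * 1e6:.0f} urad: {count} bright fringes per revolution")
```

For these separations the count doubles each time the separation doubles, which is the behavior described above.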
Interplanetary dust and zodiacal light

In addition to the Sun and the nine planets and their moons, our solar system is
home to countless rocks, pebbles, and dust particles floating in interplanetary
space. The light of the Sun scattered from these dust particles (the so-called
zodiacal light) will enter a space-based telescope and create a background noise.
A similar diffuse radiation from the targeted solar system (exo-zodiacal light)
will also be imaged as a fairly uniform distribution across the telescope’s field
of view.⁴,⁵
It is true, of course, that the ideal image of a broad, uniform source of light in
the Bracewell telescope should resemble the striped pattern of Figure 36.6.
However, the Airy disk produced by the finite aperture of each mirror is typically
many times larger than the fringe spacing in Figure 36.6, and, therefore, the
image of the zodiacal light will be the convolution between the Airy pattern of
Figure 36.3(a) and the stripes of Figure 36.6. The zodiacal emissions, therefore,
appear as a fairly uniform distribution in the image plane of the telescope. The
shot noise from this captured background radiation is mainly responsible for
the unavoidable noise in the photodetector output, its elimination requiring the
spinning of the telescope followed by integration of the signal over time, as
discussed in the preceding section.
Effect of star’s finite diameter

In the presence of pointing errors or when the star has a finite angular diameter,
the star light leaks out of the null and swamps the planet’s image. Figure 36.7(a)
shows the computed image of a star having an angular diameter of 0.05 arcsec,
obtained in the nulling interferometer of the previous examples. To simulate this
finite-size star we assumed 25 equally bright point sources spread over the surface
of the star and superimposed the intensities of their Airy patterns at the image
plane. The peak intensity of this image is about 230 times stronger than that in
Figure 36.3(b), which was obtained under identical conditions except for neglect
of the star’s diameter. It is clear that the interferometer’s null must be made
broader if such effects of the finite diameter (as well as any pointing errors) are to
be avoided. A proposed solution to this problem involves the use of several
telescopes instead of just two, as in the original Bracewell concept. With the
beams from four or more telescopes combined in a nulling interferometer, it is
possible to broaden the central null of the fringe pattern.²,³
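A rough estimate of the leakage can be obtained by averaging the fringe transmission of Eq. (36.4) over the stellar disk. The sketch below is not the 25-source simulation described above: it assumes a uniformly bright disk sampled on a square grid, an otherwise perfect null, and the small-angle approximation. Following Eq. (36.3), the single-arm amplitude is taken to be A0/2, so the leak relative to the single-arm image is four times the mean of sin²(½ΔΦ).

```python
import math

ARCSEC = math.pi / (180 * 3600)  # one second of arc, in radians

def mean_leak(star_diameter, baseline=5.0, wavelength=10e-6, grid=201):
    """Average of sin^2(pi * B * x / lambda) over a uniform disk, where x is
    the component of the angular offset along the baseline."""
    radius = star_diameter / 2
    total, count = 0.0, 0
    for i in range(grid):
        x = radius * (2 * i / (grid - 1) - 1)
        for j in range(grid):
            y = radius * (2 * j / (grid - 1) - 1)
            if x * x + y * y <= radius * radius:     # keep points on the disk
                total += math.sin(math.pi * baseline * x / wavelength) ** 2
                count += 1
    return total / count

leak = mean_leak(0.05 * ARCSEC)
relative_to_single_arm = 4 * leak    # single-arm intensity is |A0|^2 / 4
print(f"mean leak = {leak:.4f}, relative to single arm = {relative_to_single_arm:.3f}")
```

The estimate comes out at a few percent of the single-arm intensity, the same order as the value implied by the text (about 230 × 1.4 × 10⁻⁴ ≈ 0.03).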
Achromatic path-length equalization

The compensator plate CP in the system of Figure 36.1 is used to balance the path
lengths of the two interferometer arms over a range of wavelengths. In the particular
case studied in reference 4, CP was 42 μm thicker than BS, and achromaticity was
achieved for λ = 10–14 μm.
Figure 36.7 (a) Null image of a star of finite diameter (0.05 arcsec). The peak
intensity is about 230 times greater than that in Figure 36.3(b). (b) Image of the
finite-diameter star and its planet. This image should be compared with
Figure 36.4(b), which was obtained under identical conditions except for neglect of the angular
diameter of the star. The peak intensities of the planet and star in the present image
are nearly the same. (x/λ spans −300 to 300 in both panels.)
Let d1 and d2 be the optical path lengths of the two channels in air, and denote
by t1 and t2 the thicknesses of CP and BS, respectively. These plates are made of
the same material, whose refractive index within the wavelength range of interest
may be approximated by n(λ) ≈ a + bλ (a and b are material constants). Also, one
must take into account the 90° phase shift introduced by the (symmetric)
beam-splitter between the reflected and transmitted beams. The overall optical phase
difference between the two channels is thus given by
$$\begin{aligned}
\Delta\Phi &= \tfrac{1}{2}\pi + 2\pi[d_1 + t_1 n(\lambda) - d_2 - t_2 n(\lambda)]/\lambda \\
&\approx \tfrac{1}{2}\pi + 2\pi\{(t_1 - t_2)b + [d_1 - d_2 + (t_1 - t_2)a]/\lambda\}. \qquad (36.5)
\end{aligned}$$
In this equation the first bracketed term can be chosen to yield a 90° phase shift by
selecting the plate thicknesses such that (t1 − t2)b = 1/4. The second bracketed term is
dependent on λ and must therefore be set to zero. Since t1 − t2 is already fixed,
elimination of the second term requires an adjustment of d1 − d2, the path-length
difference in air. In practice these adjustments are made iteratively by changing
d1 − d2 while rotating CP by small amounts until the desired null is achieved.
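The two conditions can be verified numerically. In the sketch below only the 42 μm thickness difference is taken from the text; the index constant a, the plate thickness t2 and the air path d2 are hypothetical values chosen for illustration, and b is then fixed by the condition (t1 − t2)b = 1/4. Lengths are in micrometres.

```python
import math

def phase_difference(wavelength, d1, d2, t1, t2, a, b):
    """First line of Eq. (36.5) with the linear index model n = a + b * wavelength."""
    n = a + b * wavelength
    return 0.5 * math.pi + 2 * math.pi * (d1 + t1 * n - d2 - t2 * n) / wavelength

a = 2.0                        # hypothetical material constant
t2 = 1000.0                    # hypothetical BS thickness, um
t1 = t2 + 42.0                 # CP is 42 um thicker than BS (from the text)
b = 1 / (4 * (t1 - t2))        # enforces (t1 - t2) * b = 1/4
d2 = 1.0e6                     # hypothetical air path, um
d1 = d2 - (t1 - t2) * a        # cancels the wavelength-dependent term

phases_deg = [math.degrees(phase_difference(w, d1, d2, t1, t2, a, b))
              for w in (10.0, 12.0, 14.0)]
for w, p in zip((10.0, 12.0, 14.0), phases_deg):
    print(f"lambda = {w:.0f} um: phase difference = {p:.3f} degrees")
```

With both conditions satisfied the phase difference stays at 180° across the band, as far as the linear approximation to n(λ) holds.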

1 R. N. Bracewell, Detecting nonsolar planets by spinning infrared interferometer,
Nature 274, 780–781 (1978).
2 J. R. P. Angel and N. J. Woolf, Searching for life on other planets, Scientific
American 274, 60–66 (April 1996).
3 N. Woolf and J. R. Angel, Astronomical searches for Earth-like planets and signs of
life, Ann. Rev. Astron. Astrophys. 36, 507–537 (1998).
4 P. M. Hinz et al., Imaging circumstellar environments with a nulling interferometer,
Nature 395, 251–253 (1998).
5 J. R. Angel et al., TPF: Terrestrial Planet Finder, JPL publication 99-3, May 1999.
For more information visit the worldwide web at http://tpf.jpl.nasa.gov.
37
Scanning optical microscopy†
The diffraction-limited focusing of a laser beam to either explore or modify a

surface is the basis of several important technologies. Examples include scanning
optical microscopy, optical disk data storage, and laser printing. The size of the
focused spot and the corresponding depth of focus are important factors in
determining the performance characteristics of these systems. In this chapter we
examine methods of forming the focused spot, and clarify the relation between
spot size and depth of focus.
Principle of operation
The essential features of a scanning optical microscope are shown in Figure 37.1. A
laser beam is sent through an objective lens to form a focused spot on the sample.
Ideally, the objective is corrected for all aberrations, yielding a diffraction-limited
focused spot. The light reflected from the sample returns through the objective and
is redirected by the beam-splitter to a detection module. The detection module may
be designed to monitor the power, the phase, or the polarization state of the
returning beam. The electrical signal S(x, y) produced by the detector is thus
representative of the small area of the sample illuminated by the focused spot at and
around the point (x, y). The sample is moved to different locations by the XY stage
on which it is mounted; the signal S(x, y), plotted against the sample’s position,
yields an image of the sample’s surface over the desired area.
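The scanning procedure described above can be illustrated with a short numerical sketch. It assumes, purely for illustration, a Gaussian spot profile and a signal equal to the spot-weighted average of the sample's reflectivity. The function names, the pixel grid, the Gaussian model, and the striped test sample are all assumptions of this sketch and are not the method used for the chapter's simulations.

```python
import math

def gaussian_spot(half_width, sigma):
    """Normalized Gaussian weights on a square grid (illustrative spot model)."""
    size = 2 * half_width + 1
    w = [[math.exp(-((i - half_width) ** 2 + (j - half_width) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]

def scan(reflectivity, spot):
    """Return S(x, y): the spot-weighted reflectivity at each stage position."""
    ny, nx = len(reflectivity), len(reflectivity[0])
    h = len(spot) // 2
    image = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            s = 0.0
            for dy in range(-h, h + 1):
                for dx in range(-h, h + 1):
                    yy = min(max(y + dy, 0), ny - 1)  # clamp at the sample edge
                    xx = min(max(x + dx, 0), nx - 1)
                    s += spot[dy + h][dx + h] * reflectivity[yy][xx]
            image[y][x] = s
    return image

# Hypothetical sample: a bright stripe three pixels wide on a darker background.
sample = [[1.0 if 6 <= x <= 8 else 0.2 for x in range(15)] for _ in range(9)]
image = scan(sample, gaussian_spot(half_width=3, sigma=1.5))
print([round(v, 2) for v in image[4]])
```

The printed row is a blurred version of the stripe: the signal peaks where the spot is centered on the stripe and falls to the background value far from it, which is the sense in which S(x, y) represents the area illuminated by the spot.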
[Figure 37.1 labels: laser, collimator, beam-splitter, objective, sample, XY scanner, detector module, output S(x, y); axes X, Y, Z.]
Figure 37.1 Schematic diagram of a scanning optical microscope. The objective
lens focuses the laser beam at the point (x, y) on the sample. The XY scanner on
which the sample is mounted moves the sample in small steps along both the X- and
Y-directions, covering the area of interest. At each point the reflected light is picked
up by the detector module and converted to a signal S(x, y). A plot of S(x, y)
constitutes the image of the scanned area.

† The coauthors of this chapter are Lifeng Li and Wei-Hung Yeh.

Spot size at best focus

The most important component of any optical microscope is its objective lens. The
quality of the focused spot produced by the objective determines the resolution of the
images obtained, so it is important to have a very small, aberration-free spot at the
focal plane of the objective. Figure 37.2, a schematic drawing of an objective lens,
defines some of its important characteristics. The converging cone of light has a
half-angle θ. The numerical aperture NA of the lens is defined in terms of this half-angle
and the refractive index n of the medium in which the sample is immersed:
$$\mathrm{NA} = n \sin\theta. \qquad (37.1)$$
When the sample is in air (n = 1) the numerical aperture is less than unity.
However, if the sample is embedded in a liquid or solid of refractive index n > 1,
the numerical aperture can be as large as n.
The diameter D of an aberration-free focused spot is given by diffraction
theory as¹
$$D \sim \lambda_0/\mathrm{NA}, \qquad (37.2)$$
where λ₀ is the vacuum wavelength of the laser beam. The above equation gives only
a rough estimate of the spot diameter, the exact value depending on how the diameter
is defined [e.g., the diameter of the first dark ring of the Airy disk, the full width at
half maximum (FWHM) of the intensity distribution, etc.], on the distribution of
light at the entrance pupil of the lens (e.g., uniform, truncated Gaussian, etc.), and on
the state of polarization of the laser beam. The proportionality constant between D
and λ₀/NA is typically between 0.5 and 1.5, depending on the circumstances.
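As a rough numerical illustration of Eqs. (37.1) and (37.2), the sketch below evaluates the numerical aperture and the range of spot diameters implied by the quoted proportionality constants. The function names are hypothetical; the values 0.615 and 633 nm are those of the lens of Figure 37.2, and the bounds 0.5 and 1.5 are simply the typical range stated in the text.

```python
import math

def numerical_aperture(n, half_angle_deg):
    """NA = n sin(theta), Eq. (37.1)."""
    return n * math.sin(math.radians(half_angle_deg))

def spot_diameter_range(wavelength0_um, na, c_min=0.5, c_max=1.5):
    """Bounds on D = c * lambda0 / NA for the typical range of the constant c."""
    estimate = wavelength0_um / na  # Eq. (37.2) with the constant set to 1
    return c_min * estimate, c_max * estimate

theta_deg = math.degrees(math.asin(0.615))  # half-angle of a 0.615NA lens in air
na = numerical_aperture(1.0, theta_deg)
d_low, d_high = spot_diameter_range(0.633, na)
print(f"theta = {theta_deg:.1f} deg, NA = {na:.3f}")
print(f"D between {d_low:.2f} and {d_high:.2f} um")
```

The lower bound, roughly half of λ₀/NA, is the relevant one when the diameter is defined as the FWHM of the intensity distribution.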
Figure 37.3 shows plots of intensity distribution at the focal plane of the 0.615NA
objective shown in Figure 37.2. The incident beam is assumed to be uniform and
linearly polarized along the X-axis. The bending of the rays by the lens produces
E-field components along the Y- and Z-axes as well; the distributions of these
components, which carry only a small fraction of the total optical energy, are shown
in Figure 37.3(b), (c). The logarithmic scale of these plots enhances the rings of light
around the central bright spot; in fact these rings are typically weak and do not
contribute much to the scanning signal. The central bright spot in Figure 37.3(a)
is the most important contributor to the signal, but for accurate measurements
the effects of the entire focused spot should be taken into consideration.

[Figure 37.2 labels: axes X, Y, Z; cone half-angle θ; depth of focus 2δ.]
Figure 37.2 A polarized beam of light is brought to diffraction-limited focus by a
microscope objective lens. Since scanning microscopy is typically done with a
monochromatic laser beam, chromatic aberrations of the lens are of no concern.
Bending of the polarization vector, however, is significant and must be taken into
consideration. The half-angle θ of the focused cone is used to define the NA-value of
the lens. The depth of focus is within ±δ of the focal plane. For a high-NA singlet,
such as the plano-convex lens shown here, diffraction-limited performance over a
flat field can be achieved only with an aspheric surface. This particular lens, designed
for operation at λ₀ = 633 nm, has the following set of parameters: n = 1.806092,
Rc = 0.9846 mm, K = 1.00938, A4 = 6.16672 × 10⁻², A6 = 1.42948 × 10⁻²,
A8 = 2.14376 × 10⁻², A10 = 8.12147 × 10⁻³, aperture radius = 1 mm, thickness
= 1.142 mm. The lens NA-value is 0.615 and its focal length is 1.2315 mm.

[Figure 37.3: three panels, each with horizontal axis x from −2 to 2 μm.]
Figure 37.3 Logarithmic plots of intensity distribution at the focal plane of the
0.615NA objective shown in Figure 37.2. The incident beam is uniform and has
linear polarization along the X-axis. From left to right: X-, Y-, and Z-components
of polarization at best focus. The integrated intensities of the three components
are in the ratios 1 : 0.002 : 0.113.
Depth of focus
Another important characteristic of the focused spot is its depth of focus.
Typically, for high-NA objectives the range over which the spot size can be
considered to be small is quite limited. As shown in Figure 37.2, if the sample moves
by ±δ along the Z-axis, deviations from perfect focus may be tolerable; for larger
movements, the quality of the scanning signal suffers. The order of magnitude of
the depth of focus is given by the theory of diffraction as δ/λ ~ (D/λ)², which is an
expression for the Rayleigh range² of the beam in a medium in which the
wavelength is λ. This expression may be written as
$$\delta \sim D^2/\lambda. \qquad (37.3)$$
The proportionality constant between δ and D²/λ depends on the performance
criteria of the system and may be anywhere in the range 0.1 to 1. For the 0.615NA
lens of Figure 37.2, plots of total intensity distribution (i.e., the X-, Y-, and
Z-components of polarization combined) at several distances from focus are
shown in Figure 37.4. At best focus a small elongation of the spot along the
X-axis may be observed. This is characteristic of the focused spots obtained with
linearly polarized light at high NA: the spot is always elongated along the
direction of incident polarization. The FWHM of the spot at this point is 0.57 μm
along X and 0.51 μm along Y. Equations (37.2) and (37.3) predict δ ~ λ₀/NA²
= 1.67 μm, in agreement with the distributions of Figure 37.4. The spot
diameter is substantially enlarged if the depth of focus is exceeded.
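For reference, the estimate quoted above follows by substituting Eq. (37.2) into Eq. (37.3). The steps below are a sketch that sets all proportionality constants to unity and writes the wavelength in the medium as λ = λ₀/n; the numerical values are those of the 0.615NA lens in air.

```latex
\delta \sim \frac{D^2}{\lambda}
       \sim \frac{(\lambda_0/\mathrm{NA})^2}{\lambda_0/n}
       = \frac{n\,\lambda_0}{\mathrm{NA}^2},
\qquad \text{so that in air } (n = 1): \quad
\delta \sim \frac{0.633\ \mu\mathrm{m}}{(0.615)^2} \approx 1.67\ \mu\mathrm{m}.
```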
Oil immersion objective

To obtain improved resolution one may use an oil-immersion objective. As shown
in Figure 37.5, the front element of this type of lens is in contact with a fluid having
a specific refractive index n. (The front element is typically an aplanatic sphere; for
a discussion of aplanatism see Chapter 1, “Abbe’s sine condition”.) The front
element of the lens, the fluid, and the cover plate protecting the sample (if any)
should all have the same or nearly the same refractive index. Thus, upon emerging
from the objective the rays go directly to the sample’s surface without further
bending. Under such conditions the wavelength of the light within the immersion
oil is reduced by a factor n, in consequence of which the effective NA of the lens
increases by the same factor. Equations (37.1)–(37.3) apply to this case as well,
showing that for a given cone angle θ both the spot size D and the depth of focus δ
shrink by a factor n, compared with an objective designed for operation in air.

[Figure 37.4: horizontal axis x from −2 to 2 μm.]
Figure 37.4 Logarithmic plots of total intensity distribution at and near the
focus of the 0.615NA objective shown in Figure 37.2. From top to bottom
Δz = 2 μm, 1.5 μm, 1 μm, 0.5 μm, and 0. Because of the symmetry between the
two sides of focus, the distributions for ±Δz are the same. At best focus the
spot’s FWHM is 0.57 μm along X and 0.51 μm along Y.

[Figure 37.5 labels: objective, index-matching fluid, sample.]
Figure 37.5 An oil-immersion objective focuses the beam onto the sample
through an index-matched fluid of refractive index n. The fluid is in contact with
both the sample and the front element of the lens. The rays that emerge from the
objective do not bend on their way to the sample, thus forming a high-NA cone
of light. For a given half-angle θ of the cone, the NA of an oil-immersion
objective is superior to that of an air-incidence objective by a factor n.

[Figure 37.6: horizontal axis x from −2 to 2 μm.]
Figure 37.6 Logarithmic plots of intensity distribution at and near the focus of an
oil-immersion objective. The objective consists of the 0.615NA lens of Figure 37.2
in conjunction with a hemispherical glass cap. Both the cap and the immersion oil
have index n = 2, resulting in an overall NA of 1.23. Top: Δz = 1 μm away from
the focal plane. Bottom: the position of best focus; FWHM = 0.28 μm along
X, 0.25 μm along Y.

For an oil-immersion lens having sin θ = 0.615 and n = 2, Figure 37.6 shows
plots of the total intensity distribution at 1 μm defocus (top) and at best focus
(bottom). Compared to Figure 37.4, which corresponds to the same value of θ in
air, it is apparent that both the spot diameter and the depth of focus have
decreased by a factor n = 2.
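The scaling with n can be checked with a few lines of arithmetic. The sketch below applies Eqs. (37.1)–(37.3) with all proportionality constants set to unity, so the absolute numbers are order-of-magnitude estimates only; the function name is hypothetical, and the wavelength of 633 nm is assumed to be that of the lens of Figure 37.2.

```python
def spot_and_depth(wavelength0_um, n, sin_theta):
    """Order-of-magnitude D and delta from Eqs. (37.1)-(37.3), constants set to 1."""
    na = n * sin_theta                 # Eq. (37.1)
    d = wavelength0_um / na            # Eq. (37.2)
    wavelength = wavelength0_um / n    # wavelength inside the immersion medium
    delta = d ** 2 / wavelength        # Eq. (37.3)
    return na, d, delta

air = spot_and_depth(0.633, 1.0, 0.615)
oil = spot_and_depth(0.633, 2.0, 0.615)
for label, (na, d, delta) in (("air", air), ("oil", oil)):
    print(f"{label}: NA = {na:.2f}, D ~ {d:.2f} um, delta ~ {delta:.2f} um")
print(f"ratios air/oil: D {air[1] / oil[1]:.1f}, delta {air[2] / oil[2]:.1f}")
```

Both ratios come out equal to n = 2, consistent with the comparison between Figures 37.4 and 37.6.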
Line scans across a grating

Figure 37.7 shows the cross-section of a diffraction grating. The grating is
coated with a thick layer of gold, whose optical constants n and k are 0.14 and 3.37,
respectively; the grating has a groove depth of 170 nm and a period of 1.5 μm, of
which 0.5 μm is the groove width, 0.66 μm is the land width, and the remaining
0.34 μm is taken up by the two side walls, pitched at 45°. For the purpose of imaging
this grating, the assumed detector module in the system of Figure 37.1 is a split
detector, oriented with its splitting line parallel to the grooves. As will be described
below, the outputs S₁, S₂ of the split detector may be combined in different ways
to yield the scanning signal.
Figure 37.8 shows plots of a single line-scan of the grating in the direction
perpendicular to the grooves; the scalar theory of diffraction has been used to
compute these plots. The dashed curves correspond to a 0.6NA air-incidence
objective, while the solid curves represent a 1.2NA oil-immersion objective. The
scans in Figure 37.8(a), obtained by adding S₁ and S₂, represent the total optical
power returning from the sample. The monitored signal in Figure 37.8(b) is the
so-called push–pull signal, (S₁ − S₂)/(S₁ + S₂), which is sensitive to the position
of the groove edges. Clearly, the oil-immersion objective with its superior
NA-value provides a better resolution in both cases.
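The two ways of combining the split-detector outputs can be written down directly. In the sketch below the function names and the detector readings are hypothetical (arbitrary units, chosen only to exercise the formulas); they are not values taken from Figure 37.8.

```python
def sum_signal(s1, s2):
    """Total optical power returned to the split detector."""
    return s1 + s2

def push_pull(s1, s2):
    """Normalized differential signal (S1 - S2)/(S1 + S2)."""
    total = s1 + s2
    if total == 0:
        raise ValueError("no light on the detector")
    return (s1 - s2) / total

# Hypothetical readings at three positions of the focused spot.
readings = {
    "land center": (0.45, 0.45),
    "groove edge": (0.40, 0.15),
    "groove center": (0.20, 0.20),
}
for position, (s1, s2) in readings.items():
    print(position, round(sum_signal(s1, s2), 2), round(push_pull(s1, s2), 3))
```

Because the push–pull signal is normalized by the sum, it vanishes whenever the two halves of the detector are equally illuminated, whatever the total reflected power.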
The origin of the push–pull signal used for sensing the groove edges may be
understood by considering Figure 37.9, which shows the intensity distribution in
the exit pupil of the objective lens for three cases: from top to bottom, the spot is
focused on the land center, on the groove edge, and on the groove center. The
symmetry of this so-called “baseball pattern” is such that, with the beam focused
on the land or on the groove, the split detector receives equal amounts of light on
both its halves. However, on the groove edge the diffraction orders appearing on
one side of the baseball pattern have a different phase from those appearing on
the opposite side and, therefore, the asymmetry between the two halves of the
baseball pattern yields a fairly large differential signal. Since these calculations are
based on the scalar theory of diffraction, anomalous effects due to surface plasmon
excitation and dependence on the beam’s polarization state are not observed. Such
effects will show up later in our full vector diffraction calculations.

[Figure 37.7 labels: groove depth 0.17 μm, land width 0.66 μm, groove width 0.5 μm; coating: gold.]
Figure 37.7 Cross-section of a diffraction grating used in computer simulations
(period 1.5 μm). The gold coating is thick enough to prevent the light from
penetrating through to the other side.

[Figure 37.8, two panels, horizontal axis Distance (μm) from −0.75 to 0.75. (a) Sum signal (S₁ + S₂); (b) Differential signal (S₁ − S₂)/(S₁ + S₂). Curves: oil immersion (NA = 1.2) and air-incidence (NA = 0.6); scalar theory.]
Figure 37.8 Scalar diffraction theory applied to the grating of Figure 37.7 yields
single-line scans in the direction perpendicular to the grooves. The scanned period
extends from the center of the land at −0.75 μm to the center of the adjacent land,
at +0.75 μm, with the groove center at 0. The broken line corresponds to a 0.6NA
air-incidence objective, while the solid line represents a 1.2NA oil-immersion
objective. The detector module consists of a split detector aligned with the
grooves, yielding signals S₁ and S₂. (a) Sum signal scans corresponding to the total
reflected power. (b) Differential signal scans, corresponding to the “push–pull”
method.

[Figure 37.9: horizontal axis x from −2000 to 2000 μm.]
Figure 37.9 Computed baseball patterns at the exit pupil of the 1.2NA
oil-immersion objective during the scans depicted in Figure 37.8. From top to bottom,
the focused spot is on the land center, on the groove edge, and on the groove center.
Focusing through a cover plate

At times it is necessary to observe a sample through a transparent cover plate.
Biological samples, for instance, are usually prepared between a pair of thin glass
plates, and the storage layer of compact disks is protected from dust and
fingerprints by a plastic substrate 1.2 mm thick. In either case the objective lens
must be corrected for the specific thickness and refractive index of the cover plate.
As shown in Figure 37.10, a cone of light focused through a parallel plate
becomes compressed toward the optical axis, its value of sin θ shrinking by a factor
equal to the refractive index n of the plate. At the same time, the wavelength of the
light inside the plate also shrinks by the same factor, to give λ = λ₀/n. The net effect is
that the spot diameter D does not change as a result of focusing through the cover
plate. However, Eq. (37.3) implies that the depth of focus will improve. This
would be true, of course, if one interpreted the depth of focus as the depth of the
sample interrogated by the focused beam while the sample remained at rest. But
what happens if one moves the sample in the Z-direction and determines the
distance Δz over which the image of the sample remains sharp? One finds in the
latter case that focusing through the cover plate does not improve the depth of
focus at all. In other words, the depths of focus with and without the cover plate
are exactly the same. (Keep in mind that the objective lens is corrected for each
case separately.)

[Figure 37.10 labels: objective, cover plate (or substrate), sample.]
Figure 37.10 Focusing through a transparent cover plate of refractive index n.
The cone angle shrinks by a factor of n, but the spot size and the depth of focus
are not affected.

The reason for the above apparent discrepancy is as follows. If one moves the
sample and the cover plate together by Δz along the positive Z-axis, the top of the
cover plate also moves away from the lens by the same distance. Consequently
the focused spot recedes from the sample’s surface by nΔz, which is greater than
the actual travel of the sample. (This analysis, which ignores residual spherical
aberrations, is quite straightforward and requires only the use of Snell’s law and
simple geometry. It also applies to the case where the sample and the cover plate
move along the negative Z-axis.) Thus, as long as the lens remains stationary
while the sample and the cover plate travel together along Z, the cover plate does
not increase, nor does it decrease, the depth of focus. Aside from protecting the
sample, focusing through the cover plate has no obvious advantages.
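The argument can be made explicit in the paraxial approximation. The following is a sketch only: the symbols s (depth below the top of the plate at which the cone in air is aimed), t = ns (plate thickness), θ′ (half-angle inside the plate), and δ′ (depth of focus measured inside the plate) are introduced here for illustration, and all proportionality constants in Eqs. (37.2) and (37.3) are set to unity.

```latex
\begin{aligned}
\sin\theta' &= \frac{\sin\theta}{n}, \quad \lambda = \frac{\lambda_0}{n}
  &&\Rightarrow\quad D \sim \frac{\lambda_0}{n\sin\theta'} = \frac{\lambda_0}{\sin\theta}
  \quad \text{(unchanged)},\\
\delta' &\sim \frac{D^2}{\lambda} = n\,\frac{D^2}{\lambda_0} = n\,\delta_{\text{air}}
  &&\text{(depth of focus measured inside the plate)},\\
n\,(s - \Delta z) &= t - n\,\Delta z
  &&\text{(focus depth below the top surface after a displacement } \Delta z\text{)},\\
n\,|\Delta z| &\lesssim \delta' = n\,\delta_{\text{air}}
  &&\Rightarrow\quad |\Delta z| \lesssim \delta_{\text{air}}.
\end{aligned}
```

The factor n gained inside the plate is thus cancelled by the factor n with which the focus moves relative to the sample, leaving the tolerable travel Δz equal to the depth of focus in air.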
The solid immersion lens

A transparent hemisphere of refractive index n may be placed over the sample in
such a way as to bring the cone of light to focus at the center of the hemisphere,
as shown in Figure 37.11. The use of this type of hemisphere, often referred to as
a solid immersion lens (SIL),³ improves the resolution of the system by a factor
of n. To establish smooth and seamless contact without the use of an
index-matching fluid, the bottom of the hemisphere and the top of the sample must both
be flat and free from dirt, dust and scratches. The resolution gain thus achieved is
a consequence of the fact that in going through the hemisphere the cone angle θ
remains the same while the wavelength of the light shrinks by a factor of n.
What is remarkable about the SIL is that, unlike in oil-immersion microscopy,
the depth of focus does not suffer as a result of the improved resolution. As long
as the SIL and the sample move together along Z, whether towards or away from
the objective, the bending of the light rays at the spherical surface of the SIL
(governed by Snell’s law) makes the focused spot move in the direction of the
sample, thereby helping to increase the depth of focus. The net effect is that the
depth of focus of the system remains the same whether or not the SIL is placed
on the sample. Figure 37.12 shows computed plots of intensity distribution at
the sample’s surface when the assembly of the sample and the SIL travels
by distances Δz = 2 μm, 1 μm, and 0 away from the position of best focus.
Comparing Figure 37.12 with Figure 37.4, one concludes that the use of the SIL
has reduced the spot diameter by a factor of n = 2 but has not changed the
system’s depth of focus.
[Figure 37.11 labels: objective, SIL, sample.]
Figure 37.11 Focusing through a solid immersion lens (SIL) of refractive
index n. The spot size shrinks by a factor of n, but, assuming the SIL and the
sample move together along Z, the depth of focus remains the same.

[Figure 37.12: horizontal axis x from −2 to 2 μm.]
Figure 37.12 Logarithmic plots of intensity distribution when a SIL
(radius = 0.5 mm, n = 2) is placed in front of a 0.615NA objective lens. From
top to bottom, the SIL and the sample, moving together along the Z-axis,
deviate from the position of best focus by 2 μm, 1 μm, and 0. At best focus the
spot’s FWHM is 0.28 μm along X and 0.25 μm along Y. The effective NA is
1.23, but the depth of focus is the same as it was prior to inserting the SIL.
Effect of the air gap

In applications of the SIL to microscopy, where the sample is stationary, the SIL and
the sample remain in contact, keeping the width of the air gap at Wg = 0.
However, in optical disk systems, where the disk spins under the SIL at a rapid
rate, a small air gap develops between the bottom of the SIL and the top of the
disk surface.⁴ Under such circumstances the light must jump through the gap in
order to interact with the storage layer of the disk. This is not a serious problem
for those rays that propagate along the Z-axis, or at a small inclination with
respect to it, since they are readily transmitted through the bottom of the SIL.
However, for those rays that make a large angle with the Z-axis, the Fresnel
transmission coefficients become small; in particular, when the incidence angle
exceeds the critical angle of total internal reflection the transmissivity drops to
zero.¹ Fortunately, the phenomenon of frustrated total internal reflection allows
photons to tunnel through the gap and reach the storage layer of the disk. For this
to happen efficiently, the gap width Wg must be a small fraction of λ₀. (See
Chapter 27, “Some quirks of total internal reflection”.)
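To get a feeling for why the gap must be a small fraction of λ₀, the sketch below estimates how quickly the evanescent field of a totally reflected ray decays across the gap. It is a deliberately simplified model: it uses only the standard decay constant κ = (2π/λ₀)√(n² sin²θ − 1) of the evanescent wave and ignores the Fresnel coefficients, the polarization, and reflections from the disk. The function names are hypothetical, and the values n = 2, sin θ = 0.615, and λ₀ = 633 nm are assumed for illustration.

```python
import math

def critical_angle_deg(n):
    """Critical angle of total internal reflection at a glass-air interface."""
    return math.degrees(math.asin(1.0 / n))

def evanescent_intensity_factor(gap_nm, wavelength0_nm, n, sin_theta):
    """exp(-2*kappa*Wg) for a ray beyond the critical angle; 1.0 otherwise.

    kappa = (2*pi/lambda0) * sqrt(n^2 sin^2(theta) - 1). Only the decay of the
    evanescent field across the gap is modeled, not a full Fresnel calculation.
    """
    arg = (n * sin_theta) ** 2 - 1.0
    if arg <= 0.0:
        return 1.0  # propagating ray: no evanescent decay in this simple model
    kappa = 2.0 * math.pi / wavelength0_nm * math.sqrt(arg)
    return math.exp(-2.0 * kappa * gap_nm)

n, sin_theta, wavelength0_nm = 2.0, 0.615, 633.0  # marginal ray of a 0.615NA cone
print(f"critical angle = {critical_angle_deg(n):.1f} deg")
for gap_nm in (0.0, 50.0, 200.0):
    factor = evanescent_intensity_factor(gap_nm, wavelength0_nm, n, sin_theta)
    print(f"Wg = {gap_nm:5.0f} nm -> intensity factor {factor:.2f}")
```

In this crude model the marginal rays lose about half of their intensity across a 50 nm gap and more than nine-tenths across a 200 nm gap, so that the high-angle rays responsible for the resolution gain are the first to be lost as the gap widens.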
The effects of the air gap on the signal level can be seen in Figure 37.13,
which shows results of computer simulations based on the full vector theory of
diffraction.⁵,⁶ The grating used in these simulations is that of Figure 37.7, the
assumed refractive index of the SIL is n = 2, and the incident beam is assumed to
be linearly polarized. The direction of incident polarization is parallel to the
grooves in Figures 37.13(a), (b), and perpendicular to the grooves in Figures 37.13
(c), (d). Several line-scans across a single period of the grating are shown; the plots
in Figures 37.13(a), (c) correspond to the total returned optical power while those
in Figures 37.13(b), (d) represent the differential (or push–pull) signal.⁵ The signal
amplitude is highest when Wg = 0, but it has dropped considerably at Wg = 50 nm
and even further at Wg = 200 nm. In practice a gap-width below about λ₀/10 is
usually acceptable; going beyond this value causes a sharp reduction in the
signal level.

[Figure 37.13, four panels, horizontal axis Distance (μm) from −0.75 to 0.75. (a), (c) Sum signal (S₁ + S₂); (b), (d) Differential signal (S₁ − S₂)/(S₁ + S₂). Panels (a), (b): parallel polarization; panels (c), (d): perpendicular polarization. Curves: SIL (NA = 1.2) with Gap = 0, 50 nm, and 200 nm; air-incidence (NA = 0.6).]
Figure 37.13 Computed line-scans in the direction perpendicular to the grooves
of the grating of Figure 37.7, based on vector diffraction theory. The scanned range
extends from the center of a land at −0.75 μm to the center of an adjacent land at
+0.75 μm. The detector module consists of a split detector aligned with the grooves,
yielding the signals S₁, S₂. (a), (c) Sum signal scans corresponding to the total
reflected power collected by the detector: upper solid line, gap-width Wg = 200 nm;
dotted line, Wg = 50 nm; lower solid line, Wg = 0. The small oscillations riding over
the signals are caused by numerical errors. (b), (d) Differential signal scans
corresponding to the push–pull method of detection: solid line of smaller amplitude,
Wg = 200 nm; dotted line, Wg = 50 nm; solid line of greater amplitude, Wg = 0. In
each figure the broken-and-dotted curve is obtained in the absence of the SIL with a
0.6NA objective lens, while the other three curves correspond to a SIL with n = 2
and an overall objective NA of 1.2. The linear incident polarization in (a) and (b) is
parallel to the grooves, while in (c) and (d) it is perpendicular to the grooves.
The scanning signals are sensitive to the direction of incident polarization.
In general the two polarization directions parallel and perpendicular to the
grooves are not equivalent and yield different results, as may be readily
observed in Figure 37.13. To emphasize further the significance of
polarization, Figure 37.14 shows the light intensity pattern at the exit pupil of the
objective lens for polarization directions parallel and perpendicular to the
grooves, all other things being kept equal. The two baseball patterns show clear
differences.
The super SIL

It is well known that a converging cone of light aimed at a point a distance nR
below the center of a glass sphere of radius R and refractive index n comes to
diffraction-limited focus within the sphere at a distance of R/n below the center.¹
This fact has been exploited in the design of the super SIL shown in Figure 37.15.
The effective NA of the objective thus increases by a factor n², not only because
the wavelength within the super SIL is shortened by a factor n but also because
the super SIL increases the sine of the cone angle θ by a factor n. The super SIL is
aplanatic, and placing it in front of any aplanatic objective renders the
combination aplanatic as well.
[Figure 37.14, panels (a) and (b): horizontal axis x from −2000 to 2000 μm.]
Figure 37.14 Computed baseball patterns at the exit pupil of a 0.6NA objective
lens when the grating of Figure 37.7 is placed beneath a SIL of refractive index
n = 2 (total NA = 1.2). The focused spot is at the center of the land and the
assumed gap width is 100 nm. In (a) the polarization vector is parallel to the
grooves, while in (b) it is perpendicular to the grooves.

[Figure 37.15 labels: objective, super SIL, sample, center C, axis Z.]
Figure 37.15 Focusing through a “super SIL” of refractive index n and radius
R. In the absence of this SIL the cone of light from the objective comes to focus
at a distance of nR from the center C of the sphere. The bending of the rays at the
sphere’s surface shifts the focal point to a distance of R/n from C, without
introducing any aberrations.
The full factor n² mentioned above may not be realized in practice, because the
bending of the rays within the super SIL works only up to a point, stopping when
the marginal rays become orthogonal to the Z-axis. If the objective happens to
have a large NA to begin with, the super SIL can only increase the value of its
sin θ up to 1, at which point the remaining rays will miss the super SIL. The other
improvement by a factor of n, however, is always realized in practice because the
wavelength always shrinks by this factor.
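The saturation described above can be summarized in a single expression: the effective NA is n times the smaller of n sin θ and 1. The sketch below is an idealization that follows from the two factors named in the text; the function name is hypothetical, and the loss of light carried by rays that miss the super SIL is not modeled.

```python
def super_sil_effective_na(objective_na, n):
    """Effective NA behind a super SIL: n * min(n * sin(theta), 1)."""
    sin_inside = min(n * objective_na, 1.0)  # sine of the cone angle cannot exceed 1
    return n * sin_inside

for objective_na in (0.4, 0.615):
    print(objective_na, super_sil_effective_na(objective_na, 2.0))
```

For n = 2 a 0.4NA objective realizes the full factor n² and reaches an effective NA of 1.6, whereas a 0.615NA objective saturates at an effective NA of n = 2.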
To see the effect of the super SIL on the focused spot, computed light intensity
distributions at and near the focus of a 0.4NA objective are shown in Figure 37.16.
The FWHM spot diameter at best focus is 0.84 μm along X and 0.80 μm along Y,
while the depth of focus according to Eqs. (37.2) and (37.3) is around 4 μm.
When a super SIL of index n = 2 is placed in front of this objective the plots of
Figure 37.17 are obtained. Clearly the spot size has shrunk by a factor of about n²,
but the depth of focus is nearly the same as it was before the super SIL was
introduced. Once again, it is observed that focusing through the super SIL does not
reduce the depth of focus, as long as the sample and the super SIL move together
along the Z-axis. (This statement ignores the effects of a small amount of spherical
aberration introduced by the departure of the super SIL from its ideal location.)

[Figure 37.16: horizontal axis x from −2 to 2 μm.]
Figure 37.16 Logarithmic plots of intensity distribution at and near the focus
of a 0.4NA objective lens operating at λ₀ = 633 nm. Top: Δz = 5 μm defocus.
Bottom: best focus (Δz = 0). The spot’s FWHM at best focus is 0.84 μm along X
and 0.80 μm along Y.
A catadioptric SIL
A design that combines the objective and the SIL into one catadioptric element is
shown in Figure 37.18.⁷ (A catadioptric element is one that involves both the
reflection and the refraction of light.) A collimated beam enters the concave facet
of the lens, is reflected first at a flat internal mirror, then at an aspheric internal
mirror, and is finally brought to focus at the bottom of a plateau that is in contact
(or near contact) with the surface of the sample under investigation. This
particular lens, which is also aplanatic, has a reasonably large field of view, with
NA = 1.1. Because the central portion of the incident beam is not used, the lens
effectively has an annular aperture, which makes the central spot even smaller
than the Airy disk but at the expense of increasing the brightness of the rings.
Figure 37.19 shows plots of intensity distribution at the focus of the lens for the
three components of polarization. The FWHM of the total intensity distribution is
0.3 μm along X and 0.26 μm along Y. Depth of focus is not a very useful concept
for this particular element because the incident beam is collimated.

[Figure 37.17: horizontal axis x from −2 to 2 μm.]
Figure 37.17 Logarithmic plots of intensity distribution obtained when a super
SIL (R = 0.5 mm, n = 2) is placed in front of the 0.4NA objective depicted in
Figure 37.16. From top to bottom, the super SIL and the sample, moving together
along the Z-axis, deviate from the position of best focus by Δz = 5 μm, 2.5 μm, and
0. At best focus the FWHM of the spot is 0.24 μm along X and 0.20 μm along Y.
The effective NA is 1.6, and the spot size has shrunk accordingly, but the depth of
focus is essentially the same as it was without the super SIL.

[Figure 37.18 labels: flat mirror, aspheric mirror, glass (n = 1.813).]
Figure 37.18 A catadioptric optical element molded from glass of refractive
index n = 1.813. The spherical entrance facet has a radius of curvature
Rc = 0.7 mm and an aperture radius of 0.41 mm. Once inside the glass, the light
rays are reflected first from the flat aluminized surface and then from the
aspheric surface (also aluminized), before arriving at the flat exit surface of the
plateau. The aspheric parameters are as follows: Rc = 2.5308 mm, K = 1.7076,
A1 = 0.01233, A2 = 0.209 × 10⁻³, A3 = 0.4476 × 10⁻⁴, A4 = 0.8797 × 10⁻⁵,
aperture radius = 1.8 mm. The vertex of the aspheric surface is at z = 0, that of the
spherical surface is at z = 0.3 mm, the flat mirror is at z = 1.5 mm, and the exit
facet is at z = 1.8 mm.

[Figure 37.19: three panels, each with horizontal axis x from −2 to 2 μm.]
Figure 37.19 Logarithmic plots of the intensity distribution at the focal plane of
the catadioptric lens of Figure 37.18. The incident beam is collimated and linearly
polarized along the X-axis. From left to right are shown the X-, Y-, and
Z-components of polarization. The integrated intensities of the three components
are in the ratios 1 : 0.003 : 0.128. The effective NA-value of the lens is 1.1, but its
annular shape of aperture gives rise to a spot size slightly less than that of the Airy
disk. The enhanced rings are also caused by the annular shape of the aperture.

1 M. Born and E. Wolf, Principles of Optics, 6th edition, Pergamon Press, Oxford, 1980.
2 A. E. Siegman, An Introduction to Lasers and Masers, McGraw-Hill, New York,
1971.
3 S. M. Mansfield, W. R. Studenmund, G. S. Kino, and K. Osato, Opt. Lett. 18, 305–307
(1993).
4 B. D. Terris, H. J. Mamin, and D. Rugar, Appl. Phys. Lett. 65, 388–390 (1994).
5 For the computations that led to Figures 37.13 and 37.14, the reflection coefficients
of the grating were first calculated using DELTA, a vector diffraction code developed
by Lifeng Li. These coefficients were subsequently imported to DIFFRACT where
they were combined to represent the effects of a focused beam.
6 … et al. revisited, J. Opt. Soc. Am. A 11, 2816–2828 (1994).
7 C. W. Lee et al., Feasibility study on near field optical memory using a catadioptric
optical system, Optical Data Storage Conference, Aspen, Colorado, May 1998.
38
Zernike’s method of phase contrast
Frederik (Fritz) Zernike (1888–1966)
Zernike invented the phase-contrast microscope in 1935, and was awarded
the 1953 Nobel prize in physics for this achievement.¹ In an ordinary optical
microscope, an object that imparts a phase modulation to the incident light will
produce only a faint image. This faint image may be attributed to the diffraction
of a small amount of the light out of the entrance pupil of the objective lens. To
improve this image, Zernike in effect extracted a reference beam from the light
collected by the objective lens and produced an interferogram of the object at the
image plane of the microscope, thus converting phase information into amplitude
(or intensity) modulation.
The principles of operation of the phase-contrast microscope are by now fully
understood.¹⁻⁶ Both spatially coherent and spatially incoherent light may be
used in this type of microscopy. For best results, a quasi-monochromatic light
source with a reasonable coherence time must be employed. Our goal in the present
chapter is to give a simple explanation of the main ideas behind the method and to
provide a pictorial survey of this important branch of modern optical microscopy.
The phase-contrast microscope

The diagram in Figure 38.1 shows the main elements of a phase-contrast
microscope. The light source may be a coherent source (e.g., a laser) or an inco-
herent one (e.g., a tungsten lamp or an arc lamp); monochromaticity may be
achieved by means of a colored glass filter. The condenser lens projects the source
onto the object, whose image is formed by the objective lens. Although the system
depicted in Figure 38.1 appears as transmissive, it could just as well represent the
unfolded view of a reflective system. In the latter case, the condenser and the
objective lens are physically the same element, and a means of separating the
incident path from the reflected path (such as a beam-splitter) must be provided.
The main difference between an ordinary microscope and a phase contrast
microscope is the presence, in the latter, of a spatial filter (or mask) within the
rear focal plane of the objective lens (see Figure 38.1). To appreciate the action of
this filter, we note that the light emerging from the object over its XY-plane
has a complex-amplitude distribution that may be assumed to be proportional to
exp[iφ(x, y)]. Here φ(x, y) is the phase distribution imparted to a uniform incident
beam upon transmission through (or reflection from) a sample that may have
surface-relief structure and non-uniform thickness, or perhaps even an
inhomogeneous refractive index profile.
Assuming that φ(x, y) is sufficiently small, we may use a Taylor series expansion
to arrive at the following approximation:
$$\exp[i\phi(x, y)] \approx 1 + i\phi(x, y). \qquad (38.1)$$
In the Fourier domain, the first term in the above expansion, being the constant
or d.c. term, appears at the center of the plane of spatial frequencies. In
the rear focal plane of the objective lens, therefore, the d.c. term appears as a
bright spot centered at and around the optical axis. Zernike realized that
by placing a 90° phase shift on this d.c. term (i.e., multiplying it by i), he
could bring it in phase with the second term in Eq. (38.1). In this way he
enabled beams corresponding to the two terms in the above expansion to
interfere with each other when they overlapped within the image plane of the
system. The primary function of the spatial filter, therefore, is to delay, by one
quarter of a wavelength, the central region of the beam within the rear focal
plane of the objective lens.

Figure 38.1 Schematic diagram of a simple phase-contrast imaging system. The
light source is projected by the condenser lens onto a phase object, allowing the
objective lens to form an image of this object at the image plane. The main
component is the mask in the Fourier plane, which imparts a uniform phase shift (and
possibly some amplitude attenuation) to the undiffracted component of the beam.
[Diagram labels: source, condenser, object, microscope objective, spatial filter, image.]
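As a quick numerical check of this argument, the following Python sketch compares the image intensity of a weak phase object with and without a 90° shift on the d.c. term. It is only an illustration under simplifying assumptions: the filter acts on the d.c. term alone, that term is taken to be exactly 1 as in Eq. (38.1), and the function name and sample phase values are hypothetical.

```python
import numpy as np

def intensity_with_dc_shift(phi, dc_factor):
    """Intensity after the d.c. term (taken as 1) is multiplied by dc_factor.

    The object wave exp(i*phi) is split into the d.c. term 1 and the
    diffracted remainder exp(i*phi) - 1; only the d.c. term is filtered.
    """
    diffracted = np.exp(1j * phi) - 1.0
    return np.abs(dc_factor * 1.0 + diffracted) ** 2

phi = np.array([0.0, 0.05, 0.1])                  # small phase values, in radians

unfiltered = intensity_with_dc_shift(phi, 1.0)    # ordinary microscope
shifted = intensity_with_dc_shift(phi, 1j)        # d.c. term multiplied by i

print(unfiltered)          # uniform: the phase modulation is invisible
print(shifted)             # close to 1 + 2*phi
print(1.0 + 2.0 * phi)     # first-order prediction based on Eq. (38.1)
```

Whether a positive phase appears brighter or darker than the background depends on the sign convention adopted for the phase, which this sketch does not attempt to match to the figures.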
The source and the illumination optics

Two types of illumination will be considered. To provide collimated coherent
illumination we assume that a monochromatic laser beam is brought to focus
on the object by a condenser lens of a very small numerical aperture (NA).
Figure 38.2(a) is the logarithmic intensity distribution at the object plane
produced by a 0.03NA condenser. This distribution has the shape of an Airy
pattern, with a central lobe diameter of 1.22λ/NA ≈ 41λ, where λ is the
wavelength of the light source. Since the objects of interest will be small
compared to the Airy disk diameter, and since they will be placed near the
center of the Airy disk, this illumination qualifies as coherent, fairly uniform,
and nearly collimated.
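The quoted lobe diameter is easy to reproduce. The short Python snippet below (the function name is illustrative) evaluates 1.22λ/NA in units of the wavelength for the 0.03NA condenser.

```python
def airy_disk_diameter(numerical_aperture):
    """Central-lobe diameter of the Airy pattern, in units of the wavelength."""
    return 1.22 / numerical_aperture

diameter = airy_disk_diameter(0.03)
print(f"{diameter:.1f} wavelengths")   # about 41 wavelengths
```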
The second type of illumination to be considered is incoherent. Our concern, of
course, is solely with spatial incoherence. However, to ensure that the phase is
meaningfully defined throughout the system and that the coherence time is long
enough for interference to occur, we assume a quasi-monochromatic source with
a sufficiently narrow bandwidth. With this type of illumination, the source can be
modeled as a collection of independent point sources extending over the lumi-
nous area of the lamp. We compute the image obtained with each such point
source independently, and add up the intensities of the resulting images to obtain
the final image.
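The distinction between adding amplitudes and adding intensities can be illustrated with a minimal Python sketch. It assumes two hypothetical source points, each producing a unit-amplitude tilted plane wave at the object; the direction sines and the sampling grid are arbitrary choices, not values used in the simulations of this chapter.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)           # position, in wavelengths
direction_sines = [-0.05, 0.05]              # two hypothetical source points

# Unit-amplitude plane wave produced at the object by each source point.
fields = [np.exp(2j * np.pi * s * x) for s in direction_sines]

coherent = np.abs(sum(fields)) ** 2                  # add amplitudes first
incoherent = sum(np.abs(f) ** 2 for f in fields)     # add intensities

print(coherent.min(), coherent.max())       # interference fringes
print(incoherent.min(), incoherent.max())   # uniform illumination
```

Mutually coherent sources produce fringes, whereas independent sources yield a uniform sum of intensities, which is the procedure applied to the images of the individual point sources.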
The imaging optics

The objective lens used in the simulations described below is free from aberra-
tions and, therefore, its performance is diffraction-limited. The objective is a
finite-conjugate lens with a numerical aperture of 0.25 (on the side of the object),
a focal length of 5000λ, and a magnification of 10.
The object used throughout this chapter is a transparent piece of flat glass
or plastic, embossed with seven marks of various sizes and shapes, as shown
in Figure 38.2(b). The largest mark is 10λ long and the smallest mark is
3λ wide. These marks are large enough to yield a reasonably clear image with
both coherent and incoherent illumination, in conjunction with an appropriate
phase-contrast filter. All the marks impart to the incident beam a phase shift
of 36° relative to the background (corresponding to an optical path-length
difference of λ/10).

Figure 38.2 Computed distributions at various cross-sections of the system of
Figure 38.1 (without the phase-contrast mask) for the case of coherent
illumination. (a) Logarithmic plot (α = 4) of the intensity distribution at the object
plane, obtained when a collimated, coherent source is brought to focus by a
0.03NA condenser lens. (b) Pattern of phase objects (marks) with different
sizes and separations, on a uniform background. The transmissivity is 100%
over the entire area of this object, but the marks impart to the incident beam
a 36° phase shift (i.e., one-tenth of a wavelength) relative to the background.
(c) Logarithmic plot (α = 6) of the intensity distribution at the exit pupil of the
objective lens. (d) Distribution of intensity in the image plane of the system in
the absence of a phase-contrast filter; the outlines of the marks are barely
visible in this image.
[Horizontal axes, x/λ: (a) −45 to 45; (b) −12.5 to 12.5; (c) −1500 to 1500; (d) −125 to 125.]
For coherent illumination of the object by the beam depicted in Figure 38.2(a),
the logarithmic plot of intensity distribution at the Fourier plane is shown in
Figure 38.2(c). The bright central spot in this figure is the d.c. term mentioned
earlier. Note that the cutoff point of this logarithmic plot is at α = 6 and, therefore,
the light diffracted by the object and spread throughout the aperture of the
objective lens is quite weak. In the absence of any phase-contrast mechanism the
computed image of the object is as shown in Figure 38.2(d). This obviously is a
very poor image, one in which the boundaries of the marks are barely perceptible.
We will see below how the action of the phase-contrast filter dramatically
improves the quality of this image.
Contrast enhancement with coherent illumination

With a disk-shaped spatial filter (diameter = 550λ) placed in the Fourier
transform plane of the object, the image shown in Figure 38.3 will be obtained. The
filter in this case is a simple 90° phase-shifter, affecting the bright, central region
of the beam shown in Figure 38.2(c). The images of the marks are now clearly
visible, but the contrast is not remarkable.
To study the effect of amplitude filtering on image quality, we replace the
phase-shifting filter with one that simply blocks the central region of the beam
within the Fourier plane. The resulting image is shown in Figure 38.4. Note that,
by eliminating the d.c. component of the phase modulation function φ(x, y), use
of this filter has emphasized the boundaries of the marks.
The best choice for the phase-contrast filter is generally a phase/amplitude
mask that shifts the d.c. component of the beam by 90°, while attenuating
its amplitude to bring it in line with the magnitude of φ(x, y). Figure 38.5 shows
the image obtained with a filter that cuts the amplitude in half while shifting the
phase by 90°. The resulting contrast enhancement is quite impressive.

Figure 38.3 Image of the phase object of Figure 38.2(b), obtained with
the coherent illumination of Figure 38.2(a) when a phase-contrast mask is
placed in the Fourier plane. The mask is a small disk of radius 275λ,
imparting a +90° phase shift to the central region of the beam. (a) Intensity
distribution in the image plane. (b) Same as (a) but on a logarithmic scale
(α = 1.65). [Horizontal axes, x/λ: −125 to 125.]

Figure 38.4 Image of the phase object of Figure 38.2(b), obtained with the
coherent illumination of Figure 38.2(a), when an amplitude mask is placed in the
Fourier plane. The mask, a small disk of radius 275λ, blocks the central region of
the beam. (a) Intensity distribution in the image plane. (b) Same as (a) but on a
logarithmic scale (α = 3). [Horizontal axes, x/λ: −125 to 125.]

Figure 38.5 Image of the phase object of Figure 38.2(b) obtained with the
coherent illumination of Figure 38.2(a) when a phase/amplitude mask is placed
in the Fourier plane. The mask, a small disk of radius 275λ, imparts a +90°
phase shift to the central region of the beam while attenuating its amplitude by
50%. (a) Intensity distribution in the image plane. (b) Same as (a) but on a
logarithmic scale (α = 1.65). [Horizontal axes, x/λ: −125 to 125.]
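The trend described above (no contrast without a filter, moderate contrast with a pure phase shift, higher contrast when the d.c. term is also attenuated) can be reproduced with a very small one-dimensional Fourier-optics model. The Python sketch below is not the simulation used for the figures: it assumes plane-wave illumination, an unlimited aperture, a single 36° mark, and a filter that acts on the zero-frequency sample only. All names and array sizes are illustrative.

```python
import numpy as np

def phase_contrast_image(phi, dc_factor):
    """Image intensity of a pure phase object after the zero-frequency
    (undiffracted) Fourier component is multiplied by dc_factor."""
    field = np.exp(1j * phi)              # unit-amplitude object wave
    spectrum = np.fft.fft(field)          # Fourier (rear focal) plane
    spectrum[0] *= dc_factor              # act on the d.c. term only
    return np.abs(np.fft.ifft(spectrum)) ** 2

def contrast(intensity):
    """Michelson contrast, (max - min) / (max + min)."""
    return (intensity.max() - intensity.min()) / (intensity.max() + intensity.min())

phi = np.zeros(1024)
phi[480:544] = np.deg2rad(36.0)           # one mark imparting a 36 degree shift

images = {
    "no filter": phase_contrast_image(phi, 1.0),
    "+90 deg": phase_contrast_image(phi, 1j),
    "+90 deg, 50% amplitude": phase_contrast_image(phi, 0.5j),
}
for name, image in images.items():
    print(f"{name:24s} contrast = {contrast(image):.3f}")
```

Because the model filter touches only the zero-frequency sample, it cannot reproduce effects tied to the finite size of the mask, such as the edge emphasis seen with the blocking filter of Figure 38.4.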
Finally we consider the effect of changing the phase shift from +90° to −90°.
This is shown in Figure 38.6, where the images of the marks are now brighter
than their background. A similar situation will arise, of course, if instead of
reversing the sign of the phase at the filter we reverse the phase of the marks at
the object. In practice, most phase objects contain a number of positive as well as
negative features, and their images will appear to be darker than the background
in some regions and brighter in other regions.

Figure 38.6 Same as Figure 38.5 except for the phase shift of the mask, which
is −90° in the present case. (a) Intensity distribution in the image plane.
(b) Same as (a) but on a logarithmic scale (α = 2). [Horizontal axes, x/λ: −125 to 125.]
Contrast enhancement with incoherent illumination

To obtain better resolution in optical microscopy one must illuminate the
object with a cone of light (as opposed to a cylindrical collimated beam). This
point was discussed in Chapter 5, “Coherent and incoherent imaging”. The best
results are typically achieved when the numerical apertures of the illumination
cone and the objective lens are identical because, under such circumstances,
twice as many spatial frequencies of the object are captured by the objective
lens. It turns out that the cone of light does not have to be solid in order
to achieve high resolution; the same benefits also derive from a hollow cone
of light.
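The factor of two follows from a standard argument: an illuminating plane wave with direction sine s shifts the object's spectrum by s/λ, so object frequencies up to (NA_objective + NA_illumination)/λ can still enter the objective. A minimal Python sketch of this bookkeeping, with an illustrative function name, is given below.

```python
def max_spatial_frequency(na_objective, na_illumination):
    """Highest object spatial frequency that can reach the image,
    expressed in cycles per wavelength."""
    return na_objective + na_illumination

collimated = max_spatial_frequency(0.25, 0.0)    # collimated illumination
matched = max_spatial_frequency(0.25, 0.25)      # matched illumination cone

print(collimated, matched, matched / collimated)
```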
In Figure 38.7(a) we show the annular source of incoherent light that is used in
our calculations to illuminate the condenser lens. In this annulus there are 36
independent “point sources”, which provide a good approximation to a homo-
geneous ring of incoherent light. A 0.25NA condenser lens projects the annulus to
a bright spot at its focal plane, as shown in Figure 38.7(b). This spot is large
enough to cover the phase object depicted in Figure 38.2(b).
The logarithmic plot of intensity distribution at the Fourier plane, Figure 38.7(c),
shows a bright annulus as well as a fairly uniform disk of diffracted light within the
exit pupil of the objective lens. Evidently, the phase-contrast filter must also be in
the form of an annular ring, covering the circumference of the objective’s exit pupil
and capable of delivering a 90° phase shift as well as a reasonable attenuation
factor to the incident beam.

Figure 38.7 Imaging of the phase object of Figure 38.2(b), obtained with an
incoherent, annular illuminator. (a) The simulated homogeneous, annular
light source consists of 36 independent, quasi-monochromatic point sources.
These point sources are arranged uniformly around the circumference of
the entrance pupil of the 0.25NA condenser lens. (b) Computed intensity
distribution at the focal plane of the condenser, which is also the location of
the object. (c) Distribution of the logarithm of intensity (α = 6) at the exit
pupil of the 0.25NA objective lens. The annular phase mask placed at this
pupil has a width of 300λ; it imparts a +90° phase shift and a 50% (amplitude)
attenuation to the beam at the outer periphery of the exit pupil.
(d) Computed intensity distribution at the image plane of the system.
[Horizontal axes, x/λ: (a) −1500 to 1500; (b) −25 to 25; (c) −1500 to 1500; (d) −125 to 125.]
The resulting image shown in Figure 38.7(d) is obviously of high quality, both
in terms of resolution and contrast.
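The role of the annular mask can be illustrated with a one-dimensional analogue, in which the ring of point sources reduces to two oppositely tilted plane waves and the annular mask reduces to two filtered samples of the spectrum. The Python sketch below is a toy model under these assumptions (unlimited aperture, a single 36° mark, arbitrary tilt); it is not the two-dimensional calculation behind Figure 38.7, and all names are illustrative.

```python
import numpy as np

N = 1024
x = np.arange(N)
phi = np.zeros(N)
phi[480:544] = np.deg2rad(36.0)            # one mark with a 36 degree phase shift

tilt_bins = [-100, 100]                    # 1D stand-in for the annular source (assumed)

# Mask: +90 degree shift and 50% amplitude where the undiffracted beams land.
mask = np.ones(N, dtype=complex)
for m in tilt_bins:
    mask[m % N] = 0.5j

def incoherent_image(pupil_mask):
    """Sum of the image intensities produced by the individual source points."""
    image = np.zeros(N)
    for m in tilt_bins:
        illumination = np.exp(2j * np.pi * m * x / N)     # tilted plane wave
        spectrum = np.fft.fft(illumination * np.exp(1j * phi))
        image += np.abs(np.fft.ifft(spectrum * pupil_mask)) ** 2
    return image

def contrast(intensity):
    return (intensity.max() - intensity.min()) / (intensity.max() + intensity.min())

plain = incoherent_image(np.ones(N, dtype=complex))
filtered = incoherent_image(mask)
print(contrast(plain), contrast(filtered))
```

Each tilted wave places its undiffracted component at a different point of the pupil, which is why the mask must follow the shape of the source rather than sit on the optical axis.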

1 F. Zernike, Z. Tech. Phys. 16, 454 (1935); Phys. Z. 36, 848 (1935); Physica 9, 686,
974 (1942).
2 M. Françon, Le contraste de phase en optique et en microscopie, Revue d’Optique,
Paris (1950).
3 A. H. Bennett, H. Jupnik, H. Osterberg, and O. W. Richards, Phase Microscopy,
Wiley, New York, 1952.
4 F. D. Kahn, Proc. Phys. Soc. B 68, 1073 (1955).
1980.
39
Polarization microscopy
The state of polarization of a given beam of light is modified upon reflection from
(or transmission through) an object. The resulting change in polarization state
conveys information about the structure and certain physical properties of the
illuminated region. Polarization microscopy is a variant of conventional optical
microscopy that enables one to monitor these changes over a small area of a
specimen. Such observations then allow the user to identify and analyze the
specimen’s structural and other physical features.1,2
Traditionally, observations with a polarization microscope have been categorized
as “orthoscopic” or “conoscopic.” Orthoscopic observations involve direct
imaging of the sample itself, thus allowing one to view the indentations, striations,
variations of optical activity and birefringence, etc., over the sample’s surface.
Conoscopic observations, however, involve illuminating a crystalline surface with
a cone of light and then imaging the exit pupil of the objective lens. This mode of
observation is used in characterizing the crystal’s ellipsoid of birefringence and
identifying its optical axes.
The polarization microscope

Figure 39.1 is a simplified diagram of a polarization microscope. The light source
is typically an extended white light source, such as a halogen lamp or an arc lamp.
The collected and collimated beam from the source is linearly polarized as a
result of passage through a polarizer. In metallurgical microscopes, such as the
one shown here, the objective lens is used both for illuminating the sample and
for collecting the reflected light. Typically the source is imaged onto the entrance
pupil of the objective lens, which provides for maximum light-collection effi-
ciency while producing a highly defocused image of the source at the sample.1,3
Any non-uniformities of the source are thereby averaged, to yield a more uniform
light intensity distribution at the sample’s surface.
Figure 39.1 Diagram of a conventional polarization microscope. The spatially
incoherent light source is linearly polarized and imaged onto the entrance pupil
of the objective lens. The reflected light returns through the objective and, after
passage through the analyzer, arrives at the image plane. The analyzer is in a
rotatable mount, and its transmission axis is adjusted to yield maximum image
contrast. If the analyzer is replaced with a Wollaston prism, two images will
appear, side by side, on the camera’s CCD plate. The computer downloads both
images simultaneously and subtracts one from the other in order to produce a
differential image.
[Diagram labels: light source, lens, linear polarizer, objective, sample, analyzer, Wollaston prism, CCD camera.]
Although the source is spatially incoherent, the projected beam at the sample’s
surface is, in general, partially coherent. As for the degree of temporal coherence
of the light source, it does not play a role in polarization microscopy and is,
therefore, ignored throughout this chapter. All one needs to assume is that the
light source is quasi-monochromatic, with a bandwidth that is sufficiently narrow
to allow one to restrict attention to a single wavelength. The bandwidth must be
wide enough, however, to render the source spatially incoherent. (An extended
but purely monochromatic source is, of necessity, spatially coherent because the
radiated fields from any two locations on the source maintain their relative phase
at all times.)
Throughout this chapter we assume a quasi-monochromatic source of wavelength
λ0, consisting of a fixed number of independent and mutually incoherent
point sources arranged on a tightly packed square lattice. The contribution of
each such point source to the final image is computed independently of those of
all the other point sources. The sum of the intensity distributions thus produced at
the image plane by the individual point sources constitutes the image of the
object. This method of computing the image takes full account of the partial
spatial coherence of the illuminating beam without ever having to introduce the
corresponding correlation functions explicitly.
The light reflected from the sample is collected by the microscope’s objective
lens, then passed through another linear polarizer (usually referred to as the
analyzer), and finally brought to focus at the image plane. This image plane
coincides with the front focal plane of the eyepiece (not shown) or the plane of
the detectors within a TV camera. Modern optical microscopes are usually
equipped with a charge-coupled device (CCD) camera, which picks up the image
and displays it on a computer monitor. The possibility of digital image processing
afforded by this electronic acquisition allows new methods of microscopy, such
as the differential method to be described shortly.
The analyzer is rotated about the optical axis until its transmission axis is
crossed (or nearly crossed) with that of the polarizer. The image contrast is
primarily determined by the action of the object on the state of polarization of the
incident beam. In regions where the sample does not affect the polarization, the
reflected light is blocked by the analyzer, making the corresponding regions of
the image dark. However, in those regions that rotate the polarization vector, a
fraction of the light goes through the analyzer, the transmitted optical power
being proportional to the degree of rotation of the polarization as well as to the
actual reflectivity of the sample at the given spot. The resulting image thus
provides a map and a measure of the ability of the sample to rotate the direction
of incident polarization at its various locations. This has been the basis of
orthoscopic polarization microscopy for many years. The conoscopic approach,
which involves the imaging of the exit pupil of the objective lens, will be dis-
cussed towards the end of this chapter.
The four-corners problem

A limitation of polarization microscopy is rooted in the fact that the beam’s state
of polarization is affected by ordinary reflections and refractions at the various
surfaces throughout the optical path.1,4,5 This usually results in polarization
rotation and/or ellipticity in the four corner areas of the objective’s exit pupil, as
shown in Figure 39.2. The four-corners problem allows transmission of spurious
light through the analyzer, thereby reducing the contrast of the image. When the
problem is caused by reflections and refractions at the various surfaces of the
objective (or condenser) lens, a viable solution is to use a specialty objective that
incorporates a half-wave plate in the midst of its optical train.1,6 The half-wave
plate rotates the polarization direction by 90°, allowing the four-corner rotations
before and after the plate to cancel each other out. This solution was offered by
objective-lens manufacturers in the early days, before the advent of powerful
antireflection coatings. Nowadays the various surfaces of the objective and the
condenser are antireflection coated, and the four-corners problem caused by these
surfaces is negligible.

Figure 39.2 [...] objective when a single monochromatic point source is used to
illuminate the sample. The intensity plots in (a) and (b) correspond, respectively,
to the components of polarization parallel and perpendicular to the polarizer’s
transmission axis. The polarization rotation angle ρ is depicted in (c) and the
polarization ellipticity η is shown in (d). The gray-scale of the latter plots depicts
positive values of ρ and η as bright and negative values as dark.
[Horizontal axes, x/λ0: −3200 to 3200.]
The problem still remains, however, that Fresnel’s reflection coefficients at the
sample’s surface differ for p- and s-polarized rays, causing a polarization rotation
problem that is aggravated with increasing angle of incidence. Moreover, if the
sample is observed through a birefringent substrate, the resulting polarization
variations over the beam’s cross-section give rise to spurious light transmission
through the analyzer, which, once again, reduces the image contrast.5 These
problems can no longer be solved by the incorporation of a half-wave plate within
the objective lens, because they are sample dependent. The differential method of
microscopy described below solves the four-corners problem by splitting the
spurious light between two images of the sample and then eliminating it by
subtracting one image from the other.
Differential method†
A simple modification of the conventional microscope of Figure 39.1 involves
replacing the analyzer with a Wollaston prism. The Wollaston splits the image of
the sample into two and transmits both images, side by side, to the camera. With the
transmission axes of the Wollaston fixed at 45° relative to the polarizer’s axis, the
unrotated light is split equally between the two images. When there is polarization
rotation, however, one image receives more light than the other, the sense of rotation
of the polarization determining which image gets the larger share. The two images
are then subtracted from each other (within the computer) to produce a single
differential image of the sample. The differential image is superior in many respects
to the conventional image, as will be seen in the examples that follow. The main
advantage of differential polarization microscopy is that it does not suffer from the
four-corners problem. Another advantage is that a map of reflectivity variations
across the sample can be readily constructed by adding the two images together;
normalizing the differential image by the sum image then provides a pure map of
polarization rotation at the sample.
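The arithmetic of the differential method can be sketched with Malus's law. In the Python snippet below, the two Wollaston channels are modeled as ideal analyzers at +45° and −45° to the polarizer, and the spurious light is represented, as a simplifying assumption, by an additive term that divides equally between the two images. The function and parameter names are illustrative.

```python
import numpy as np

def wollaston_images(reflectivity, rotation_deg, spurious=0.0):
    """Intensities of the two images behind a Wollaston prism set at 45 degrees.

    `spurious` is an assumed unwanted contribution that divides equally
    between the two images.
    """
    theta = np.deg2rad(rotation_deg)
    image_1 = reflectivity * np.cos(np.pi / 4 - theta) ** 2 + spurious / 2
    image_2 = reflectivity * np.cos(np.pi / 4 + theta) ** 2 + spurious / 2
    return image_1, image_2

rotation = np.array([+0.5, -0.5])          # two oppositely rotating regions, degrees

# With spurious light present, the difference still depends only on the rotation.
i1, i2 = wollaston_images(0.62, rotation, spurious=0.1)
difference = i1 - i2                       # equals reflectivity * sin(2 * rotation)

# Without spurious light, dividing by the sum also removes the reflectivity.
c1, c2 = wollaston_images(0.62, rotation)
normalized = (c1 - c2) / (c1 + c2)         # equals sin(2 * rotation)

print(difference, normalized)
```

In this idealized model the equally shared term cancels on subtraction, while the sign of the difference follows the sense of the polarization rotation.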
The sample
In general, the polarization image of a sample is mixed with its other images, say,
those produced by reflectivity variations or optical phase variations across the
sample. To avoid such complications, we consider a smooth sample having
uniform amplitude and phase reflectivity everywhere, but one that rotates the
polarization of the incident beam as a result of optical activity. A perpendicularly
magnetized thin-film sample provides a good example in this case. By changing
the direction of magnetization (from up to down) in different locations, one can
create a pattern of magnetic domains such as that shown in Figure 39.3. Here the
smallest domain (shown at the center) is one wavelength in diameter. The black
and white regions are magnetized in opposite directions and rotate the incident
(linear) polarization by +0.5° and −0.5°, respectively.

Figure 39.3 Pattern of magnetic domains on a perpendicularly magnetized
sample. The magnetic material rotates the polarization of a linearly polarized
beam at normal incidence by 0.5°. The domains are chosen to represent a wide
range of sizes and shapes; the smallest domain appearing in the center is one
wavelength (λ0) in diameter. [Horizontal axis, x/λ0: −6 to 6.]

† To the author’s best knowledge the concept of differential polarization microscopy
has not been described previously in the technical and patent literature and may
therefore be novel.
The material of the sample used in the following examples is assumed to have
complex index of refraction (n, k) = (3.35, 4.03), which gives it a reflectivity of
62% at normal incidence. At oblique incidence the Fresnel reflection coefficients
for p- and s-polarized light differ from each other, thus inducing some rotation
and ellipticity into the reflected polarization state. For instance, at a 53° angle of
incidence, the linear polarization of a ray originally directed at 45° with respect to
the p-direction rotates by 7.4° and acquires 8.7° of ellipticity. This change of the
polarization state upon reflection is caused solely by the Fresnel coefficients of
the sample, independently of its optical activity.
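These numbers can be checked against the standard Fresnel formulas. The Python sketch below assumes incidence from air, writes the index as n + ik, and flips the sign of the p-coefficient so that reflection at normal incidence leaves the polarization direction unchanged; sign conventions for the reflected p-direction vary between texts, so only the magnitudes of the rotation and ellipticity should be compared with the values quoted above. Function names are illustrative.

```python
import numpy as np

def fresnel_rp_rs(n_complex, theta_deg):
    """Amplitude reflection coefficients for incidence from air."""
    theta = np.deg2rad(theta_deg)
    cos_i = np.cos(theta)
    root = np.sqrt(n_complex**2 - np.sin(theta) ** 2)    # n * cos(theta_t)
    r_s = (cos_i - root) / (cos_i + root)
    r_p = (n_complex**2 * cos_i - root) / (n_complex**2 * cos_i + root)
    return r_p, r_s

def rotation_and_ellipticity(n_complex, theta_deg, azimuth_deg=45.0):
    """Rotation and ellipticity (degrees) of the reflected polarization ellipse."""
    r_p, r_s = fresnel_rp_rs(n_complex, theta_deg)
    azimuth = np.deg2rad(azimuth_deg)
    e_p = -r_p * np.cos(azimuth)      # assumed sign convention (see lead-in)
    e_s = r_s * np.sin(azimuth)
    s0 = abs(e_p) ** 2 + abs(e_s) ** 2
    s1 = abs(e_p) ** 2 - abs(e_s) ** 2
    s2 = 2.0 * (e_p * np.conj(e_s)).real
    s3 = 2.0 * (e_p * np.conj(e_s)).imag
    orientation = 0.5 * np.arctan2(s2, s1)       # major-axis angle from p
    ellipticity = 0.5 * np.arcsin(s3 / s0)
    return np.rad2deg(orientation) - azimuth_deg, np.rad2deg(ellipticity)

n_sample = 3.35 + 4.03j
r_p0, r_s0 = fresnel_rp_rs(n_sample, 0.0)
reflectivity = abs(r_s0) ** 2                    # normal-incidence reflectivity
rotation, ellipticity = rotation_and_ellipticity(n_sample, 53.0)

print(f"R = {reflectivity:.3f}")
print(f"|rotation| = {abs(rotation):.1f} deg, |ellipticity| = {abs(ellipticity):.1f} deg")
```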
Low-resolution imaging
Figure 39.4 shows computed images, both conventional and differential, of the
magnetic marks of Figure 39.3 obtained with a 50×, 0.4NA objective. In these
calculations the source was defocused by a distance of 35λ0 below the object
plane, and the images from a total of 361 point sources were superimposed to
simulate the (spatially incoherent) light source. For the conventional image
shown in Figure 39.4(a) the analyzer axis was set 0.5° away from the crossed
position, nearly the optimum setting for achieving maximum contrast in this case.
(The contrast may be reversed by rotating the analyzer to the opposite side of the
crossed position.) The resolution of these images is not great, as evidenced by the
near-disappearance of the small mark in the center. The contrast, however, is
quite good, and there is little difference between the conventional and differential
methods of imaging. The reason is that at 0.4NA the half-angle of the focused cone
of light is only 23.6°, which is not large enough to cause a significant four-corners
problem.

Figure 39.4 Images of the sample of Figure 39.3 in a polarization microscope
having a 50×, 0.4NA objective lens. (a) Conventional image obtained with the
analyzer set 0.5° away from extinction. (b) Differential image obtained with
the Wollaston prism. [Horizontal axes, x/λ0: −300 to 300.]
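The half-angle follows directly from the definition NA = n sin θ. The Python snippet below (with an illustrative function name and an assumed air-spaced objective) evaluates it for the two numerical apertures used in this chapter.

```python
import math

def cone_half_angle_deg(numerical_aperture, medium_index=1.0):
    """Half-angle of the focused cone, from NA = n * sin(theta)."""
    return math.degrees(math.asin(numerical_aperture / medium_index))

for na in (0.4, 0.8):
    print(na, round(cone_half_angle_deg(na), 1))
```

At 0.8NA the marginal rays arrive at roughly 53°, close to the angle of incidence used in the Fresnel-coefficient example for this sample.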
High-resolution imaging
Obtaining images with high resolution requires a high-NA objective lens. Figure 39.5
shows both conventional (a), (b) and differential (c), (d) images of the sample of
Figure 39.3 obtained with a 50×, 0.8NA objective. The images on the left show dark
domains on a bright background, while the reverse-contrast counterpart of each
image is shown to its right. In these calculations the source was defocused by a
distance of 10λ0 below the object plane, and the images from a total of 361 point
sources were superimposed to simulate the (spatially incoherent) light source.
Inspection of Figure 39.5 reveals that the resolution has improved over that of
Figure 39.4. The contrast, however, is quite poor for the conventional images in
Figures 39.5(a), (b), even though the analyzer has been set optimally at ±1.5° from
the crossed position. This poor contrast is a manifestation of the four-corners
problem. In comparison, the differential images of Figures 39.5(c), (d) show
excellent contrast, which is not surprising considering that the four-corners
contributions to individual images (before subtraction) are identical and can therefore
be removed by subtraction.

Figure 39.5 Images of the sample of Figure 39.3 in a polarization microscope
having a 50×, 0.8NA objective lens. (a) Conventional image obtained with the
analyzer set +1.5° away from extinction. (b) Same as (a) but now the analyzer
is set −1.5° from extinction to reverse the contrast. (c) Differential image.
(d) Same as (c) but with the order of subtraction reversed.
[Horizontal axes, x/λ0: −300 to 300.]
To gain a better appreciation of the four-corners problem, consider the
intensity distribution at the plane of the sample, Figure 39.6, corresponding to a
single point source defocused by 10λ0. Although the incident beam entering the
objective lens is linearly polarized along the X-axis, the defocused spot, in
consequence of the bending of the rays by the lens, contains all three components
of polarization, along the X-, Y-, and Z-axes; these are shown respectively from
top to bottom in Figure 39.6. The peak intensities of the three components in
Figure 39.6 are in the ratios Ix : Iy : Iz = 1 : 0.007 : 0.185. Upon reflection from the
sample the distributions remain qualitatively the same, but the peak-intensity
ratios change to 1 : 0.017 : 0.142. Thus the relative content of the Y-component
increases upon reflection while that of the Z-component decreases. When this
distribution returns to the objective lens, it gives rise to patterns of intensity and
polarization similar to those shown in Figure 39.2. At the exit pupil the values of
the polarization rotation angle ρ range from −7.0° to +8.1°, while the
polarization ellipticity η ranges from −8.8° to +8.6°. The slight asymmetry between
positive and negative values is caused by the presence of magnetization in the
sample. In the absence of magneto-optical activity, ρ and η vary between ±7.4°
and ±8.7°, respectively.

Figure 39.6 Distribution of incident intensity at the plane of the sample
corresponding to a single point source defocused by 10λ0 through a 0.8NA objective.
The incident beam entering the lens is linearly polarized along the X-axis. Top to
bottom: intensity distributions corresponding to polarization components along the
X-, Y-, and Z-axes. [Horizontal axis, x/λ0: −12 to 12.]
Substrate birefringence
Sometimes it is necessary to observe a sample through an intervening medium,
such as a coating layer or a substrate. If this medium happens to be birefringent,
it creates a four-corners problem of its own.5 As a typical example, assume that the
sample of Figure 39.3 is coated with a birefringent layer 500 nm thick whose
principal refractive indices along the coordinate axes are (nx, ny, nz) = (1.5, 1.6, 1.7).
For this sample, conventional microscopy yields the image shown in Figure 39.7(a),
while differential microscopy produces the normal and reverse-contrast images
of Figures 39.7(b), (c). Clearly, in the presence of birefringence differential
polarization microscopy is far superior to the conventional method. For this
sample, the reflected polarization pattern at the exit pupil for a single illuminating
point source (see Figure 39.2) exhibits ρ-values ranging from −20.4° to +22.0°,
and η-values ranging from −23.3° to +23.0°. In the absence of magnetic activity
ρ and η would vary between ±21.3° and ±23.2°, respectively.

Figure 39.7 Images of the sample of Figure 39.3, coated with a birefringent
layer and placed in a microscope having a 50×, 0.8NA objective. (a) Conventional
image, obtained with the analyzer set optimally at 5° away from extinction.
(b) Differential image. (c) Same as (b) but with the order of subtraction reversed.
[Horizontal axes, x/λ0: −300 to 300.]
Conoscopic observations
The system depicted in Figure 39.8 captures the essence of conoscopic
polarization microscopy. Here a coherent, monochromatic beam of light is linearly
polarized and sent through an objective lens to be focused on a birefringent
crystal. The reflected light is re-collimated by the objective and observed after
going through a crossed analyzer. For the specific example described below, the
objective’s NA-value is 0.375 and its focal length f is 20 000λ0. The sample is in
the XY-plane, the Z-axis being perpendicular to its surface. The crystal slab’s
thickness is 430λ0, its principal refractive indices are (nx, ny, nz) = (1.686, 1.682,
1.531), and its ellipsoid of birefringence is rotated around the Z-axis by 13°.
The computed intensity distribution at the observation plane of Figure 39.8
is shown in Figure 39.9(a), and the corresponding logarithmic plot appears in
Figure 39.9(b). Within the focused cone there are two rays that propagate along
the two optical axes of the crystal; these rays return without any change in their
state of polarization and are therefore blocked by the analyzer. There are also
groups of rays whose polarization vectors undergo rotation by integer multiples
of 180° in double passage through the slab. These rays are also blocked by
the analyzer, giving rise to the various dark regions in the intensity patterns of
Figure 39.9. A systematic analysis of the exit-pupil distribution can, therefore,
provide detailed information about the sample’s ellipsoid of birefringence.

Figure 39.8 Schematic diagram of a simplified conoscopic microscope. The
double passage of the focused beam through the birefringent crystal causes varying
degrees of polarization rotation over the beam’s cross-section. The crossed analyzer
converts these rotations into an intensity pattern.
[Diagram labels: polarizer, beam-splitter, lens, birefringent crystal, aluminum mirror, analyzer, observation plane.]

Figure 39.9 (a) Intensity and (b) logarithmic intensity distributions at the
observation plane in the system of Figure 39.8 with a biaxially birefringent crystal.
[Horizontal axes, x/λ0: −8000 to 8000.]
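As a plausibility check on the statement that both optical axes fall within the focused cone, the directions of the axes can be estimated from the principal indices alone. The Python sketch below uses the textbook condition that the central section of the index ellipsoid perpendicular to an optic axis is a circle of radius equal to the intermediate index. It further assumes, as a simplification, that the external angle follows from Snell's law with that intermediate index and that the difference between ray and wave-normal directions can be ignored. The function name is illustrative.

```python
import numpy as np

def optic_axis_angle_deg(n_small, n_mid, n_large):
    """Angle between an optic axis and the principal axis of largest index.

    Obtained by requiring the central section of the index ellipsoid,
    perpendicular to the propagation direction, to be a circle of radius n_mid.
    """
    tan_v = (n_large / n_small) * np.sqrt(
        (n_mid**2 - n_small**2) / (n_large**2 - n_mid**2))
    return np.rad2deg(np.arctan(tan_v))

n_x, n_y, n_z = 1.686, 1.682, 1.531          # principal indices quoted in the text

angle_from_x = optic_axis_angle_deg(n_z, n_y, n_x)
angle_from_z = 90.0 - angle_from_x           # measured from the surface normal (Z)
sin_outside = n_y * np.sin(np.deg2rad(angle_from_z))   # Snell's law, assumed index n_y

print(f"optic axes at about {angle_from_z:.1f} deg from Z inside the crystal")
print(f"external direction sine {sin_outside:.3f} versus NA = 0.375")
```

Under these assumptions the two axes lie symmetrically about the surface normal, and their external direction sine is smaller than the objective's numerical aperture, consistent with two rays of the focused cone traveling along the optical axes.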

1 S. Inoué and R. Oldenbourg, Microscopes, in Handbook of Optics, Vol. II, second
edition, McGraw-Hill, New York, 1995.
2 J. R. Benford and H. E. Rosenberger, Microscopes, in Applied Optics and Optical
Engineering, Vol. IV, ed. R. Kingslake, Academic Press, New York, 1967.
1980.
4 H. Kubota and S. Inoué, Diffraction images in the polarizing microscope, J. Opt. Soc.
Am. 49, 191–198 (1959).
5 Y. C. Hsieh and M. Mansuripur, Image contrast in polarization microscopy of
magneto-optical disk data-storage media through birefringent plastic substrates,
Applied Optics 36, 4839–4852 (1997).
6 J. R. Benford, Microscope objectives, in Applied Optics and Optical Engineering,
Vol. III, ed. R. Kingslake, Academic Press, New York, 1965.
40
Nomarski’s differential interference
contrast microscope
George Nomarski invented the method of differential interference contrast for the
microscopic observation of phase objects in 1953.1,2,3 The features on a phase
object typically modulate the phase of an incident beam without significantly
affecting the beam’s amplitude. Examples include unstained biological samples
having differing refractive indices from their surroundings, and reflective (as
well as transmissive) surfaces containing digs, scratches, bumps, pits, or other
surface-relief features that are smooth enough to reflect specularly the incident
rays of light. A conventional microscope image of a phase object is usually faint,
showing at best the effects of diffraction near the corners and sharp edges but
revealing little information about the detailed structure of the sample.4
Nomarski’s method creates two slightly shifted, overlapping images of the same
surface. The two images, being temporally coherent with respect to one another,
optically interfere, producing contrast variations that contain useful information
about the phase gradients across the sample’s surface. In particular, a feature that has
a slope in the direction of the imposed shear appears with a specific level of brightness
that is distinct from other, differently sloping regions of the same sample.4,5,6
The Nomarski microscope uses a Wollaston prism in the illumination path to
produce two orthogonally polarized, slightly shifted bright spots at the sample’s
surface. Upon reflection from (or transmission through) the sample, the two
beams are collected by the objective lens, then sent through the same (or, in the
case of a transmission microscope, a similar) Wollaston prism, which recombines
the two beams by sliding them back over each other. The two beams subsequently
arrive coincidentally in the image plane of the microscope, but the two images of
the sample which they carry will be relatively displaced. A linear analyzer, placed
after the Wollaston prism in the reflected (transmitted) path, brings the polar-
ization vectors of the two images into alignment, enabling the two to interfere
with each other. A sheared interferogram of the sample’s surface is thus formed at
the image plane of the microscope.
Wollaston prism
Because Nomarski’s method of microscopy is fundamentally dependent on the
action of the Wollaston prism, a brief description of this polarizing beam-splitter is
in order. The Wollaston prism, depicted in Figure 40.1, consists of two cemented
wedges from the same uniaxial birefringent crystal (e.g., quartz or calcite). The
individual wedges are precisely cut and polished, then aligned with their optic axes
orthogonal to each other.4 In Figure 40.1 the optic axis of the upper wedge is
horizontal within the plane of the page, while that of the lower wedge is perpendicular
to the plane. The crystal’s ordinary and extraordinary refractive indices,
n_o and n_e, interact with the E-field components perpendicular and parallel to the
optic axis, respectively.
The incident beam, in general, has both s- and p-components of polarization.
In going through the upper half of the Wollaston, the p-component interacts
with n_e and the s-component with n_o, but the propagation direction remains the
same for both the p- and s-beams. In the lower half the roles of n_o and n_e are
exchanged, with the result that the p-component is deflected to one side and the
s-component to the other (one beam enters a denser, the other a rarer medium).
The angular separation of the beams is further enhanced by Snell’s law when
they exit the prism. Emerging from the Wollaston, therefore, are two beams,
propagating in different directions and having mutually orthogonal directions of
polarization.
Figure 40.1 The Wollaston prism consists of two cemented wedges (wedge angle α) of the
same uniaxial birefringent crystal, aligned with their optic axes in different
directions. The incident beam, with its p- and s-components of polarization, is
split at the interface between the wedges. Emerging from the Wollaston are two
orthogonally polarized beams that propagate in different directions.
Figure 40.2 shows a thin bundle of rays arriving at a Wollaston prism and
splitting into two orthogonally polarized beams. The p- and s-beams go through a
microscope objective and illuminate the sample in two small, slightly displaced
patches that cover the objective’s field of view. Upon reflection from the sample
the beams return through the objective and come together again as they emerge
from the Wollaston. Note that, in a round trip through this system, the optical
path lengths of the p- and s-beams will be the same only if the Wollaston is
centered on the Z-axis. In particular, if the Wollaston is translated along the
X-axis then, during a round trip, one beam sees a longer optical path than the
other. The relative phase of the p- and s-beams, referred to as the bias phase B,
can therefore be adjusted by sliding the Wollaston along the X-axis. Note that, for
a given lateral position of the Wollaston, the bias phase B is constant for all the
ray bundles that go through the system: it is independent of their initial distance
from the Z-axis.
Figure 40.2 A bundle of rays entering a Wollaston prism is split into p- and
s-polarized beams. The beams go through a microscope objective and illuminate
the sample in two small, slightly displaced patches that cover the objective’s field
of view. Upon reflection from the sample, the beams return through the objective
and come together as they exit the Wollaston prism. The bias phase B between the
two beams may be adjusted by sliding the Wollaston in the horizontal (X) direction.

Assuming α = 0.84° for the wedge angles and n_o = 1.54467, n_e = 1.55379 for
the ordinary and extraordinary refractive indices of the crystal (quartz), the
angular separation of the two beams emerging from the Wollaston (in the forward
path) will be 0.0153°. For an objective lens having f = 3750λ, where λ is the
wavelength of the quasi-monochromatic light source, this angular separation
results in one λ of displacement between the two spots that illuminate the sample.
Moreover, for every lateral shift by 100λ of the Wollaston, there occurs a bias
phase B = 19.26° between the p- and s-beams in a double pass through the
system. So, for example, if the lateral shift is 1870λ then one beam will be
retarded by a full 2π relative to the other.
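The numbers quoted in this paragraph can be checked with a few lines of arithmetic. The following sketch is not part of the original analysis: it assumes the small-angle estimate 2(n_e − n_o) tan α for the angular separation of the emergent beams, and an optical path difference of 2(n_e − n_o) x tan α per pass when the prism is shifted laterally by x. The function names are illustrative only.

```python
import math

# Values quoted in the text for a quartz Wollaston prism.
ALPHA_DEG = 0.84             # wedge angle, degrees
N_O, N_E = 1.54467, 1.55379  # ordinary and extraordinary indices
FOCAL_LENGTH = 3750.0        # objective focal length, in wavelengths


def split_angle_rad(alpha_deg, n_o, n_e):
    """Assumed small-angle estimate of the beam separation, in radians."""
    return 2.0 * (n_e - n_o) * math.tan(math.radians(alpha_deg))


def bias_phase_deg(shift, alpha_deg, n_o, n_e):
    """Assumed double-pass bias phase (degrees) for a lateral shift given
    in wavelengths: path difference 2 (n_e - n_o) x tan(alpha) per pass."""
    opd_per_pass = 2.0 * (n_e - n_o) * shift * math.tan(math.radians(alpha_deg))
    return 360.0 * 2.0 * opd_per_pass


separation_deg = math.degrees(split_angle_rad(ALPHA_DEG, N_O, N_E))
shear = FOCAL_LENGTH * split_angle_rad(ALPHA_DEG, N_O, N_E)  # in wavelengths
bias_per_100 = bias_phase_deg(100.0, ALPHA_DEG, N_O, N_E)
shift_for_full_wave = 100.0 * 360.0 / bias_per_100

print(f"angular separation : {separation_deg:.4f} deg")
print(f"shear at the sample: {shear:.3f} wavelengths")
print(f"bias per 100 waves : {bias_per_100:.2f} deg")
print(f"shift for 2*pi     : {shift_for_full_wave:.0f} wavelengths")
```

Under these assumptions the estimates agree with the quoted values to within rounding, which suggests that the small-angle picture is adequate for a wedge angle below one degree.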
Differential interference contrast microscope

Figure 40.3 is a diagram of an epi-illumination Nomarski differential interference
contrast microscope. For the computer simulations reported in this chapter the
spatially incoherent light source is assumed to be quasi-monochromatic (wavelength
λ), consisting of 529 point sources arranged in a square array. These point
sources are projected onto the mid-plane of the Wollaston prism, which sits at the
entrance pupil of the objective lens. The entrance pupil being at the back focal
plane of the objective, uniform illumination at the sample’s surface is achieved
(Köhler illumination). The illumination is called “critical” if the source is imaged
directly onto the sample. In practice Köhler illumination is preferred over critical
illumination because of its superior uniformity, but coherence-related properties
of the system (such as resolution) are not affected by this choice of illumination.
In this chapter, for reasons having to do with nuances of the computer simulation,
we have chosen to illuminate the sample with a somewhat defocused image of
the source.

Figure 40.3 Schematic diagram of an epi-illumination Nomarski microscope, showing
the light source, lens, polarizer at +45°, beam-splitter, Wollaston prism, objective,
sample, analyzer at −45°, lens, and observation plane.
The spatially incoherent light source is quasi-monochromatic (wavelength λ),
the polarizer renders the illuminating beam linearly polarized, and the Wollaston
prism, with axes at 45° to the direction of incident polarization, creates two
slightly displaced, orthogonally polarized patches of light at the sample. The
light reflected from the sample returns through the objective and the Wollaston,
arriving at the crossed analyzer with its two components of polarization relatively
phase-shifted. The light that gets through the analyzer forms an image of
the sample at the observation plane.
The polarizer renders the illuminating beam linearly polarized, and the Wollaston
prism, whose axes are at 45° relative to the transmission axis of the polarizer, creates
two orthogonally polarized, slightly displaced patches of light at the sample. The
light reflected from the sample returns through the objective and the Wollaston but,
as it arrives at the crossed analyzer, its two components of polarization are no longer
in phase. The phase difference between the p- and s-beams at this point is B + Δ,
where B is the constant bias phase produced by the Wollaston’s displacement from
the center and Δ is the imparted phase retardation at the sample’s surface. The
amount of light that gets through the analyzer depends on the above phase shift, with
more light going through as the phase shift increases from 0° to 180°. Each bright
point within the light source illuminates the entire field of view of the objective and
creates an image at the observation plane. The various point sources thus create
overlapping images, which add up in intensity by virtue of the (spatial) incoherence
of the light source.
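The dependence of the transmitted light on the phase shift can be illustrated with a short Jones-vector calculation. This is a minimal sketch under idealized assumptions that the text does not spell out: the p- and s-beams have equal amplitudes, the sample and optics are lossless, and the polarizer and analyzer are ideal and set at +45° and −45°. The function name is illustrative.

```python
import cmath
import math


def analyzer_intensity(phase_deg):
    """Fraction of the light passed by the crossed analyzer.

    phase_deg is the total phase difference (bias plus sample retardation,
    in degrees) between the p- and s-components at the analyzer.
    """
    phase = math.radians(phase_deg)
    # After the polarizer at +45 degrees the p- and s-amplitudes are equal.
    e_p = 1.0 / math.sqrt(2.0)
    e_s = cmath.exp(1j * phase) / math.sqrt(2.0)
    # The analyzer at -45 degrees projects the field onto (1, -1)/sqrt(2).
    transmitted = (e_p - e_s) / math.sqrt(2.0)
    return abs(transmitted) ** 2


for phase_deg in (0, 45, 90, 135, 180):
    fraction = analyzer_intensity(phase_deg)
    print(f"phase = {phase_deg:3d} deg -> I/I0 = {fraction:.3f}")
```

In this idealized model the transmitted fraction reduces to sin²[(B + Δ)/2], which is zero for equal phases and rises monotonically to unity at a phase difference of 180°.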
Examples
Figure 40.4(a) shows the distribution of phase on a uniformly reflecting surface
having several sphero-cylindrical pits with varying depths. The nose feature has a
depth of 0.5λ, and the mouth, eyes, and eyebrows are respectively 0.25λ, 0.375λ,
and 0.75λ deep. The computed image of this phase object in a conventional
optical microscope (i.e., like that in Figure 40.3 but without the polarizer, analyzer,
and Wollaston) is shown in Figure 40.4(b). Note that diffraction of light
from the edges of the various features of the face creates dark borders in the
corresponding image regions, but this conventional image lacks information
about the slope and depth distribution within those features.

Figure 40.4 (a) The distribution of phase at an object’s surface and (b) the
distribution of intensity in the image of the same object, as observed in a conventional
optical microscope. In (a) the various features of the “face” have the
same reflectance but different depth, resulting in phase modulation of the incident
light. The nose, mouth, eyes, and the eyebrows are respectively 0.5λ, 0.25λ,
0.375λ, and 0.75λ deep. The image in (b) is formed by a 0.8NA, 50× objective.
The simulated light source consisted of 529 spatially incoherent point sources,
each defocused by 10λ above the sample’s surface. The observed contrast is
purely due to diffraction effects, as the phase object does not give rise to any
contrast in geometric-optical terms. The horizontal axis is x/λ, spanning −11 to 11
in (a) and −550 to 550 in (b).

Figure 40.5 Nomarski images of the phase object in Figure 40.4(a), when the
Wollaston produces one λ of shear along the X-axis. The microscope is that
shown in Figure 40.3, having a 50×, 0.8NA objective, and the Wollaston’s
horizontal position is adjusted for B = 0°. (a) Intensity distribution in the image
plane; (b) logarithmic plot of the intensity distribution. The horizontal axis in both
panels is x/λ, spanning −550 to 550.

The computed Nomarski image of the phase object of Figure 40.4(a), obtained
with one λ of shear along the X-axis, is shown in Figure 40.5. The intensity
distribution in the image plane is shown in Figure 40.5(a), while a logarithmic
plot of intensity (resembling an over-exposed photographic plate) is shown in
Figure 40.5(b). In these calculations the assumed bias phase is B = 0°; this results in
identical image brightness for regions with equal but opposite slopes, and also yields
a completely dark image background. Since the assumed shear in Figure 40.5 is
along the X-direction, vertical features (such as the nose) are clearly visible in the
Nomarski image, while horizontal features (such as the mouth) are hidden. The
reverse is true when the shear is along the Y-axis, as in Figure 40.6, where horizontal
features become visible while vertical features disappear.
Figure 40.7 shows the Nomarski image of the object in Figure 40.4(a), but
with a bias phase B = 90°. The background of the image is now bright,
because the analyzer no longer blocks the light reflected from flat regions of
the sample. Moreover there is an asymmetry between regions with positive
and negative slope, as can be seen by comparing the right and left sides of the
nose feature.
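The effect of the bias phase on the symmetry between opposite slopes can be seen in a toy calculation. The sketch below assumes the idealized crossed-analyzer response sin²[(B + Δ)/2] for equal-amplitude p- and s-beams, which is an assumption of this example rather than a formula stated in the text, and it uses an arbitrary retardation of ±30° to stand for two equal but opposite slopes.

```python
import math


def dic_intensity(bias_deg, delta_deg):
    """Assumed idealized response: sin^2 of half the total phase shift."""
    return math.sin(math.radians(bias_deg + delta_deg) / 2.0) ** 2


for bias_deg in (0.0, 90.0):
    background = dic_intensity(bias_deg, 0.0)  # flat regions of the sample
    rising = dic_intensity(bias_deg, +30.0)    # one slope direction
    falling = dic_intensity(bias_deg, -30.0)   # the opposite slope
    print(f"B = {bias_deg:4.0f} deg: background {background:.3f}, "
          f"slopes {rising:.3f} and {falling:.3f}")
```

With zero bias the two slopes are equally bright against a dark background; with a 90° bias the background sits at half the maximum intensity and the two slopes fall on opposite sides of it.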
Another example of a phase object is shown in Figure 40.8(a). Here a ridge
having height λ runs along the 45° direction in the XY-plane. The two edges of
the ridge have differing slopes, the lower edge being 4λ wide while the upper
edge is 2λ wide. In the middle of the ridge there is a pit of depth λ in the shape
of a football stadium. The conventional image of this sample is shown in
Figure 40.8(b). Again diffraction from the various edges renders certain
features visible in the image, but specific information about the slopes is
lacking. In contrast, two Nomarski images of the same object obtained with
one λ of horizontal shear are shown in Figure 40.9. The bias phase B = 0° in
Figure 40.9(a), whereas B = 90° in Figure 40.9(b). Different slopes produce
different intensity levels in these images. Also note that the symmetry present
in Figure 40.9(a) between equal but opposite slopes is broken in Figure 40.9(b),
where B ≠ 0°.

Figure 40.6 Same as Figure 40.5, except for the direction of shear, which is
along the Y-axis.

Figure 40.7 Nomarski image of the phase object in Figure 40.4(a), when the
Wollaston produces one λ of shear along the X-axis. The microscope is that
shown in Figure 40.3, having a 50×, 0.8NA objective, and the Wollaston’s
horizontal position is adjusted for B = 90°. The horizontal axis is x/λ, spanning
−550 to 550.

Figure 40.8 (a) Phase object and (b) its conventional microscope image. The
object consists of a ridge with a height of λ, running at 45° to the X- and Y-axes,
and a pit in the middle of the ridge whose depth is also λ. The ridge’s side-walls
have different slopes: the lower wall is 4λ wide, while the upper wall is
2λ wide. The flat-bottomed pit has the shape of a football stadium. The image
in (b) is formed through a 50×, 0.8NA microscope objective. The simulated
light source consisted of 529 spatially incoherent point sources, each defocused
by 10λ above the sample’s surface. The observed image contrast is purely due
to diffraction effects, as the phase object does not give rise to any contrast in
geometric-optical terms. The horizontal axis is x/λ, spanning −11 to 11 in (a)
and −550 to 550 in (b).
Practical considerations
The back focal plane of high-NA objectives is usually inaccessible from outside
the lens, so the Wollaston prism cannot be directly inserted at the entrance pupil.
By choosing a somewhat different orientation for the optic axes of the crystal
wedges, Nomarski modified the Wollaston prism in such a way that the p- and
s-beams appeared to be separating from each other in a plane external to the
prism.3 In this way the light source could be imaged onto the entrance pupil of the
objective through the Nomarski-modified Wollaston prism, allowing both Köhler
illumination and the separation and recombination of the p- and s-beams at the
entrance pupil.
Figure 40.9 Nomarski images of the phase object of Figure 40.8(a), when
the Wollaston produces one λ of shear along the X-axis. The microscope is
that shown in Figure 40.3, having a 50×, 0.8NA objective. The Wollaston’s
horizontal position is adjusted to yield a bias phase B between the p- and
s-polarized beams. (a) B = 0°, (b) B = 90°. The horizontal axis in both panels is
x/λ, spanning −550 to 550.
Another practical consideration involves the use of broadband light sources.

The sources used in practice are not always monochromatic and, in fact, may
have a fairly broad spectrum. The analysis offered in this chapter applies to
multi-color sources as well, provided that the individual wavelengths are
treated independently and their corresponding images are eventually super-
imposed. In any given region of the sample, interference causes certain colors
to fade while strengthening others. The color or hue observed through a
broadband Nomarski microscope at a given location is thus a qualitative
measure of the slope of the sample at that location. For quantitative meas-
urements, however, it is best to use quasi-monochromatic light in conjunction
with some form of phase-shifting interferometry.7,8,9 This may be achieved,
for instance, by sliding the Wollaston prism along the shear direction while
monitoring (with a CCD camera) the variations in intensity at specific loca-
tions of the image.
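One common way of turning such intensity readings into a phase value is the four-step algorithm, sketched below. This is an illustrative assumption, not the procedure of the cited references: it supposes that the intensity at a pixel follows I0 sin²[(B + Δ)/2], that the Wollaston is stepped so that B takes the values 0°, 90°, 180°, and 270°, and that I0 stays constant during the four exposures. The function names are hypothetical.

```python
import math


def dic_intensity(bias_deg, delta_deg, i0=1.0):
    """Assumed idealized pixel intensity for bias B and sample phase Delta."""
    return i0 * math.sin(math.radians(bias_deg + delta_deg) / 2.0) ** 2


def recover_delta_deg(i_0, i_90, i_180, i_270):
    """Four-step estimate of Delta (degrees) from readings at the four biases.

    With I = (I0/2) [1 - cos(B + Delta)], the differences are
    i_90 - i_270 = I0 sin(Delta) and i_180 - i_0 = I0 cos(Delta).
    """
    return math.degrees(math.atan2(i_90 - i_270, i_180 - i_0))


# Simulated readings for a pixel with Delta = 37 degrees and unknown I0.
readings = [dic_intensity(bias, 37.0, i0=2.5) for bias in (0.0, 90.0, 180.0, 270.0)]
print(f"recovered Delta = {recover_delta_deg(*readings):.2f} deg")
```

Because only differences of readings enter the arctangent, the unknown brightness I0 cancels, which is the main attraction of this kind of scheme for quantitative work.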

1 G. Nomarski, Dispositif interférentiel à polarisation pour l’étude des objets
transparents ou opaques appartenant à la classe des objets de phase, French patent
No. 1059 124, 1953.
2 G. Nomarski, Microinterféromètre différentiel à ondes polarisées, J. Phys. Radium
16, 9S–11S (1955).
3 R. D. Allen, G. B. David, and G. Nomarski, The Zeiss–Nomarski differential
interference equipment for transmitted light microscopy, Z. Wiss. Mikroskopie 69 (4),
193–221 (1969).
5 S. Inoué and R. Oldenbourg, Microscopes, chapter 17 in Handbook of Optics, Vol. II,
McGraw-Hill, New York, 1995.
6 M. Pluta, Advanced Light Microscopy, Vol. 2: Specialized Methods, Elsevier,
Amsterdam; Polish Scientific Publishers, Warszawa, 1989.
7 D. L. Lessor, J. S. Hartman, and R. L. Gordon, Quantitative surface topography
determination by Nomarski reflection microscopy. I. Theory, J. Opt. Soc. Am. 69,
357–366 (1979).
8 J. S. Hartman, R. L. Gordon, and D. L. Lessor, Quantitative surface topography
determination by Nomarski reflection microscopy. II. Microscope modification, calibr-
ation, and planar sample experiments, Applied Optics 19, 2998–3009 (1980).
9 W. Shimada, T. Sato, and T. Yatagai, Optical Surface Microtopography using
phase-shifting Nomarski microscope, SPIE 1332, Optical Testing and Metrology,
525–529 (1990).
41
The van Leeuwenhoek microscope
Antoni van Leeuwenhoek (1632–1723), a fabric merchant from Delft, the
Netherlands, used tiny glass spheres to study various microscopic objects at high
magnification with surprisingly good resolution. A contemporary of Sir Isaac
Newton, Christiaan Huygens, and Robert Hooke, he is said to have made over
400 microscopes and bequeathed 26 of them to the Royal Society of London. (A
handful of these microscopes are extant in various European museums.) Using his
single-lens microscope, van Leeuwenhoek observed what he called animalcules –
or micro-organisms, to use the modern terminology – and made the first drawing
of a bacterium in 1683. He kept detailed records of what he saw and wrote about
his findings to the Royal Society of London and the Paris Academy of Science.
His contributions have made him the father of scientific microscopy.1,2,3
Van Leeuwenhoek was an amateur in science and lacked formal training. He
seems to have been inspired to take up microscopy by Robert Hooke’s illustrated
book, Micrographia, which depicted Hooke’s own observations with the micro-
scope. In basic design, van Leeuwenhoek’s instruments were simply powerful
magnifying glasses, not compound microscopes of the type used today. An entire
instrument was only 3–4 inches (8–10 cm) long, and had to be held up close to the
eye; its use required good lighting and great patience.4 Van Leeuwenhoek devised
tiny, double-convex lenses to be mounted between brass plates. Through them, he
was able to peer at objects mounted on pinheads, magnifying them up to 300 times,
a power that far exceeded that of early compound microscopes.
Compound microscopes had been invented around 1595. Several of van
Leeuwenhoek’s contemporaries, notably Robert Hooke in England and Jan
Swammerdam in the Netherlands, had built compound microscopes and were
making important discoveries with them. However, because of various technical
difficulties, early compound microscopes were not practical for magnifications
beyond 20× or 30×. Van Leeuwenhoek’s skill at grinding lenses, together with his
naturally acute eyesight and great care in adjusting the lighting, enabled him to
build microscopes with clearer and brighter images than any of his contemporaries
could achieve.
Van Leeuwenhoek used his invention to confirm the discovery of capillary
systems, to describe the life cycle of ants, and to observe plant and muscle tissue,
protozoa and bacteria, and the spermatozoa of insects and humans. In 1673,
van Leeuwenhoek began writing letters to the newly formed Royal Society of
London, describing his findings – his first letter contained some observations on the
stings of bees. For the next 50 years he corresponded with the Royal Society; his
letters, written in Dutch, were translated into English or Latin and printed in the
Philosophical Transactions of the Royal Society, and often reprinted separately. His
experiments with microscope design and function made him an international
authority on microscopy, and in 1680 he was made a Fellow of the Royal Society.
It is suspected that van Leeuwenhoek produced his lenses by chipping away
the excess glass from the thickened droplet that forms on the bottom of a blown-
glass bulb. These lenses probably had a thickness of 1 mm and a radius of
curvature of 0.75 mm. They had superior magnification and resolution when
compared to other microscopes of the time. The Utrecht museum has one of van
Leeuwenhoek’s microscopes in its collection. This amazing instrument has a
magnification of about 275× with a resolution approaching one micron (in spite
of a scratch on the lens).5
Towards the end of his life van Leeuwenhoek wrote: “ . . . my work, which
I’ve done for a long time, was not pursued in order to gain the praise I now enjoy,
but chiefly from a craving after knowledge, which I notice resides in me more
than in most other men. And therewithal, whenever I found out anything
remarkable, I have thought it my duty to put down my discovery on paper, so that
all ingenious people might be informed thereof.”
Elementary optics of glass spheres

Figure 41.1 shows a ray of light parallel to the optic axis at height h, going
through a glass sphere of radius R and refractive index n. The angle of incidence
on the sphere is denoted by θ, and the refracted ray inside the glass makes an
angle θ′ with the surface normal. According to Snell’s law, sin θ = n sin θ′, and
from simple geometry the distance CA from the center C of the sphere to the
point A where the emergent ray crosses the optic axis is
\[ CA = \frac{R\sin\theta}{\sin(2\theta - 2\theta')}. \tag{41.1} \]
When the ray height h is much smaller than the radius R of the sphere, the angles
θ and θ′ will be small, in which case the small-angle approximation yields
\[ CA \approx \frac{nR}{2(n-1)}. \tag{41.2} \]
Figure 41.1 A ray of height h traveling parallel to the optic axis is refracted by a
glass sphere of radius R and refractive index n. Upon emerging from the sphere,
the ray crosses the optic axis at point A. When h becomes very small, the point A
approaches the paraxial rear focus F′ of the lens. In (a) n < 2.0 and the emergent
ray crosses the axis outside the sphere, whereas in (b), where n > 2.0, only the
backward extension of the ray crosses the axis. (When n = 2.0, the paraxial rays
come to focus on the rear facet of the sphere.) Both panels mark the angles θ and θ′,
the ray height h, the center C, and the crossing point A.
Thus, for example, if n = 1.5 then the paraxial focus of the lens is at a distance
CA = 1.5R from the lens center, or if n = 2 then the paraxial focus coincides with
the rear vertex of the sphere, that is, CA = R. Depending on the values of n and h,
the proper path of the ray may be that shown in Figure 41.1(a) or (b), but
equations (41.1) and (41.2) apply to both cases. The paraxial focus, of course, is
relevant only for rays with a small height h; when h increases beyond the paraxial
regime, the point A moves closer to the center C, giving rise (for a beam of wide
cross-section) to spherical aberrations.
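The inward drift of the point A with increasing ray height is easy to tabulate. The sketch below evaluates CA = R sin θ / sin(2θ − 2θ′), with sin θ = h/R and sin θ = n sin θ′, and compares it with the paraxial value nR/[2(n − 1)]. The function names and the sample ray heights are illustrative choices, not taken from the text.

```python
import math


def axis_crossing(h, radius=1.0, n=1.5):
    """Distance CA at which a ray of height h, incident parallel to the
    optic axis, crosses the axis after passing through the sphere."""
    theta = math.asin(h / radius)                   # angle of incidence
    theta_prime = math.asin(math.sin(theta) / n)    # Snell's law
    return radius * math.sin(theta) / math.sin(2.0 * theta - 2.0 * theta_prime)


def paraxial_focus(radius=1.0, n=1.5):
    """Small-angle limit of axis_crossing."""
    return n * radius / (2.0 * (n - 1.0))


print(f"paraxial focus: CA = {paraxial_focus():.4f} R")
for h in (0.01, 0.1, 0.3, 0.5, 0.7):
    print(f"h = {h:4.2f} R -> CA = {axis_crossing(h):.4f} R")
```

For n = 1.5 the crossing distance stays close to 1.5R for small h and shrinks steadily as h grows, which is the longitudinal spherical aberration described above.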
Confining our attention to a glass sphere having R = 1 mm and n = 1.5 –
typical of what van Leeuwenhoek used for his microscopes – we suppose that a
point source of light is placed at the front (paraxial) focus F of the lens, as in
Figure 41.2. A ray that leaves the source at an angle φ relative to the optic axis
will emerge parallel to the axis only in the paraxial regime, i.e., when φ is small.
For larger values of φ the emergent ray crosses the optic axis at the point A, where
\[ CA = \frac{R\sin\theta}{\sin(2\theta - 2\theta' - \phi)}. \tag{41.3} \]
Here φ and θ are related through sin φ = R sin θ/FC. Thus a point source located
at the front focus F and radiating into a reasonably large cone will produce a real
image on the opposite side at some finite distance from C. To be sure, this image
has a certain amount of spherical aberration and, to obtain a good image, one
must limit the angular range of the cone of light accepted by the lens. This may be
achieved by closing down the aperture stop, which may be located either on
the object side or the image side of the lens. In Figure 41.2 the stop is in the
image space and may thus be referred to as the exit pupil of the lens.
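The geometry of a ray leaving the front focus can be explored numerically. The sketch below assumes the relation CA = R sin θ / sin(2θ − 2θ′ − φ), with sin θ = n sin θ′ and sin φ = R sin θ/FC, where φ is the angle between the departing ray and the optic axis. It traces a single geometric ray, parametrized by sin θ, and so gives the crossing point of that ray only; it makes no attempt to locate the diffraction-based plane of best focus, which also depends on the position and size of the stop. The function name is illustrative.

```python
import math


def image_crossing(sin_theta, radius=1.0, n=1.5, source_dist=1.5):
    """Distance CA (same units as radius) at which a ray from an axial point
    source, located source_dist in front of the center C, crosses the axis
    behind the sphere. sin_theta is the sine of the angle of incidence."""
    theta = math.asin(sin_theta)
    theta_prime = math.asin(sin_theta / n)                # Snell's law
    phi = math.asin(radius * sin_theta / source_dist)     # launch angle
    return radius * sin_theta / math.sin(2.0 * theta - 2.0 * theta_prime - phi)


# R = 1 mm, n = 1.5, source at the paraxial front focus (FC = 1.5 mm).
for sin_theta in (0.2, 0.3, 0.4, 0.55):
    print(f"sin(theta) = {sin_theta:4.2f} -> CA = {image_crossing(sin_theta):8.2f} mm")
```

As sin θ decreases the crossing point recedes rapidly from the lens, in keeping with the paraxial rays emerging parallel to the axis; steeper rays cross much closer to the sphere, which is why stopping down the aperture improves the image.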
Figure 41.3 shows computed distributions pertaining to the system of Figure 41.2.
The point source is located at the paraxial focus of the lens (R = 1 mm, n = 1.5,
CF = 1.5 mm), and the assumed radius of aperture Ra = 0.55 mm. Figure 41.3(a)
shows that the emergent intensity at the exit pupil is somewhat brighter near the
rim compared with that at the center of the aperture. Figure 41.3(b), a plot of phase
distribution at the exit pupil (minus the curvature), shows a significant amount of
spherical aberration. (The curvature of the emergent beam has been removed from
the phase plot; only the residual aberrations are shown.) The emergent beam
comes to best focus at a distance CA = 27.36 mm behind the lens. Figure 41.3(c),
a logarithmic plot of intensity distribution in the plane of best focus, also shows
the substantial rings of light caused by spherical aberration. These clearly indicate
that the image quality of a wide-aperture system would be poor.4

Figure 41.2 A glass sphere of radius R = 1 mm and refractive index n = 1.5.
The aperture stop, of radius Ra, is also the exit pupil of the lens in this case.
A monochromatic point source (λ = 0.5 μm) placed at the paraxial front focus F is
approximately imaged to the point A, at a finite distance from the lens center.
The diagram marks the points F, C, and A, the angles φ, θ, and θ′, and the Y- and Z-axes.

Figure 41.3 Various distributions in the system of Figure 41.2 when
Ra = 0.55 mm. (a) Emerging intensity distribution at the exit pupil. (b) Distribution
of residual phase at the exit pupil when the curvature of the emergent
beam is taken out (r.m.s. aberrations = 0.96λ). The gray-scale encodes values of
phase from −180° (black) to +180° (white). (c) Logarithmic plot of intensity in
the plane of best focus, located a distance of 27.36 mm from the lens center. The
logarithmic scale emphasizes the weak rings. The horizontal axis is x/λ, spanning
−1200 to 1200 in (a) and (b), and −400 to 400 in (c).
When the aperture is further closed down to Ra = 0.4 mm the distributions of
Figure 41.4 are obtained. The intensity distribution at the exit pupil is now
fairly uniform, and the phase plot shows convergent behavior towards the point
of best focus at CA = 59.3 mm behind the lens. (Notice that in Figure 41.4(b),
unlike Figure 41.3(b), the curvature has not been subtracted from the phase
plot.) The best-focused spot is shown in Figure 41.4(c). In addition to a relatively
small spherical aberration, this system also has a fairly large field of
view, as may be inferred from the plots of Figure 41.5. Here a number of
identical point sources are placed in the front focal plane of the lens, and their
corresponding images are computed in the plane of best focus, at CA = 59.3 mm.
All imaged points show spherical aberration similar to that of the central spot, but
there is very little coma and astigmatism, owing to the fact that the system is
essentially monocentric.
Glass sphere as a magnifier

Up to this point we have studied the properties of real images formed by point
sources placed in the (paraxial) focal plane of a spherical lens. Now we will
consider the spherical lens as a magnifying glass, placing the object somewhat
closer to the lens than its front focus and examining the properties of the virtual
image thus formed.
Figure 41.4 Various distributions in the system of Figure 41.2 when
Ra = 0.4 mm. (a) Intensity distribution at the exit pupil. (b) Total phase distribution
at the exit pupil; the r.m.s. value of residual aberrations (with the
curvature taken out) is 0.22λ. The gray-scale encodes values of phase from −180°
(black) to +180° (white). (c) Intensity distribution in the plane of best focus,
located a distance of 59.3 mm from the lens center. The horizontal axis is x/λ,
spanning −850 to 850 in (a) and (b), and −400 to 400 in (c).

Figure 41.5 Five point sources placed in the front focal plane of the spherical
lens shown in Figure 41.2. The exit-pupil radius Ra = 0.4 mm, and the best image
(with 45× magnification) appears in a plane 59.3 mm away from the lens
center. (a) Intensity distribution in the object plane. (b) Intensity distribution in
the image plane. All imaged points show spherical aberration, but there is very
little coma or astigmatism. The horizontal axis is x/λ, spanning −45 to 45 in (a)
and −2000 to 2000 in (b).
The diagram of Figure 41.6 is a representation of a van Leeuwenhoek microscope
with a spherical glass lens having R = 1 mm, n = 1.5. To achieve high-resolution
imaging with this system the aperture is closed down to Ra = 0.25 mm,
and the object is displaced from the paraxial focus F by 20 μm towards the lens.
The observer’s eye is placed very close to the lens, so that the pupil of the eye
essentially coincides with the exit pupil of the lens.
The object used in the following calculations is shown in Figure 41.7. This is
a transmissive object with several micron-sized features that impart phase and
amplitude modulation to the incident beam. With this object we demonstrate
both coherent and incoherent imaging through the system of Figure 41.6. The
illumination in both cases is monochromatic at a wavelength λ = 0.5 μm, although white
light or other broadband sources can also be used to illuminate the object. The
simplicity of this single-lens microscope keeps chromatic aberrations to a minimum.1

Figure 41.6 The simulated van Leeuwenhoek microscope. The lens radius
R = 1 mm, its refractive index n = 1.5, the object is 20 μm to the right of the
paraxial focus F (i.e., 0.48 mm away from the lens), and the exit-pupil radius
Ra = 0.25 mm. The virtual image, formed 316 mm to the left of the lens center,
can be comfortably viewed when the eye is placed at or near the exit pupil.
The diagram marks the object, the focus F, the lens center C, the exit pupil, the
observer, and the virtual image.

Figure 41.7 Distributions of (a) intensity and (b) phase immediately in front of the
object. The object is trans-illuminated with a uniform, coherent, and monochromatic
plane wave (λ = 0.5 μm). The smallest feature in the lower right-hand side is 1 μm in
diameter. The phase values in (b) range from −144° (black) to +108° (white). The
horizontal axis in both panels is x/λ, spanning −21 to 21.
In the case of coherent imaging, the incident beam is collimated, uniform,
and propagates along the Z-axis. The computed distributions of intensity and
phase at the exit pupil of the lens are shown in Figure 41.8. The intensity plot in
[Figure 41.8: panels (a) and (b); horizontal axis x/λ from –550 to 550.]
Figure 41.8 Distributions of (a) intensity and (b) phase at the exit pupil of the
microscope of Figure 41.6 with the coherently illuminated object of Figure 41.7.
The intensity is shown on a logarithmic scale to emphasize its weak regions. The
phase ranges from −180° (black) to +180° (white).
[Figure 41.9: panels (a) and (b); horizontal axis x/λ from –4500 to 4500.]
Figure 41.9 Distributions of intensity in the virtual image seen through the
microscope of Figure 41.6 with the object of Figure 41.7. The image in (a) is
computed for a coherent, monochromatic beam of light normally incident on
the object. The incoherent image in (b) is obtained by illuminating the object
with 225 point sources through a 0.15NA condenser lens. These virtual images
have a magnification of 200× and appear at a distance of 316 mm behind the
lens center.
Figure 41.8(a) is drawn on a logarithmic scale to emphasize the spatial-frequency
content of the image-carrying beam. It is found numerically that the best
focus of this system is at a distance CA = 316 mm from the lens center; the
computed coherent image at this distance, having a magnification close to 200, is
shown in Figure 41.9(a).
To compute the incoherent image, we illuminate the object with 225
monochromatic point sources (λ = 0.5 μm, NA = 0.15) and superimpose the
resulting intensity distributions in the image plane obtained for individual point
sources. Figure 41.9(b) is the computed incoherent image of the object shown in
Figure 41.7 through the system of Figure 41.6. The magnification is about 200,
and the image exhibits a fairly accurate reproduction of the various features
present in the object, except perhaps the spot 1 μm in diameter on the lower
right-hand side. Thus the microscope depicted in Figure 41.6, having numerical
aperture NA ≈ 0.16, is nearly diffraction-limited, over at least a 20 μm field of
view, with a resolution of 2 μm at λ = 0.5 μm. (In reality, the field of view of
the microscope is several times greater than that demonstrated in this particular
example.)
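As a rough paraxial cross-check of the numbers quoted for the microscope of Figure 41.6, the following sketch applies the textbook ball-lens formulas. These formulas are an assumption here, since the chapter does not state them: the effective focal length nR/[2(n − 1)] is measured from the lens center, the stop is taken to sit at the lens center, and 0.61λ/NA is used only as a Rayleigh-type order-of-magnitude estimate. The function name and its arguments are illustrative.

```python
import math

def ball_lens_paraxial(radius_mm, index, aperture_radius_mm, wavelength_um):
    """Paraxial estimates for a ball lens (thick-lens formulas, assumed)."""
    efl_mm = index * radius_mm / (2.0 * (index - 1.0))    # focal length from lens center
    bfl_mm = efl_mm - radius_mm                            # focus distance from lens surface
    na = math.sin(math.atan(aperture_radius_mm / efl_mm))  # stop assumed at lens center
    resolution_um = 0.61 * wavelength_um / na              # Rayleigh-type estimate
    return efl_mm, bfl_mm, na, resolution_um

efl, bfl, na, res = ball_lens_paraxial(1.0, 1.5, 0.25, 0.5)
print(f"focal length from center = {efl:.2f} mm, focus to surface = {bfl:.2f} mm")
print(f"object to surface = {bfl - 0.020:.2f} mm, NA ~ {na:.2f}, resolution ~ {res:.1f} um")
```

The estimate reproduces the 0.48 mm object distance and an aperture close to the quoted NA. It says nothing about the 316 mm image distance, which the text obtains numerically in the presence of spherical aberration.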
Method of computation
The results presented in this chapter were obtained by a combination of ray-
tracing and diffraction calculations. The light emanating from the object was
propagated to the vicinity of the lens using far-field (Fraunhofer) diffraction
formulas. The complex-amplitude distribution at this point was converted into
a set of geometric-optical rays, using the local Poynting vector to represent
the ray. The rays were traced from the entrance pupil to the exit pupil of the
lens using standard methods of ray-tracing. At the exit pupil the ray magnitude
and phase information was converted into a complex wavefront, and
the wavefront was propagated to the image plane using near-field (Fresnel)
diffraction formulas.
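The final step of the procedure, near-field propagation of a sampled wavefront, can be illustrated with a minimal one-dimensional sketch. It uses the transfer-function form of the Fresnel approximation evaluated with FFTs, which is one common discretization and not necessarily the implementation behind the results of this chapter. The Gaussian test field, the sampling interval, and the propagation distance are hypothetical values chosen only so that the example runs.

```python
import numpy as np

def fresnel_propagate(field, dx, wavelength, distance):
    """Propagate a sampled 1-D complex field using the Fresnel transfer function."""
    freqs = np.fft.fftfreq(field.size, d=dx)  # spatial frequencies, cycles per unit length
    # The constant phase exp(i*2*pi*distance/wavelength) is dropped; it does not
    # affect the intensity.
    transfer = np.exp(-1j * np.pi * wavelength * distance * freqs**2)
    return np.fft.ifft(np.fft.fft(field) * transfer)

# Hypothetical example: a Gaussian wavefront sampled every 0.5 um (lengths in mm).
wavelength = 0.5e-3
dx = 0.5e-3
x = (np.arange(4096) - 2048) * dx
pupil_field = np.exp(-(x / 0.1) ** 2).astype(complex)
image_field = fresnel_propagate(pupil_field, dx, wavelength, 10.0)
print(np.sum(np.abs(pupil_field) ** 2), np.sum(np.abs(image_field) ** 2))
```

Because the transfer function has unit modulus, the total power of the sampled field is preserved, which is a convenient sanity check on any implementation of this step.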
Other applications of glass spheres

Glass balls have found application in other areas as well. A simple method of
coupling the light from a diode laser (or a light-emitting diode) into an optical
fiber uses a spherical glass ball between the source and the fiber’s entrance facet.
This may not be the most efficient coupling mechanism, but it is simple,
inexpensive, and easy to implement in conjunction with multimode fibers. Tiny glass
beads are often mixed with ordinary paint for use on the streets, on automobile
license plates, etc., to enhance retro-reflectivity. My colleague Stephen Jacobs
of the University of Arizona has made a fused silica ball six inches in diameter,
through which one can look toward the sun and observe beautiful optical
phenomena.6 Looking through this glass sphere, one cannot help but remember
that Nature has employed spherical droplets of water to create the magnificent
rainbow.7,8

1 B. J. Ford, The earliest views, Scientific American, 50–53, April 1998.
2 B. J. Ford, Leeuwenhoek Legacy, Bristol, Biopress; London, Farrand Press; 1991.
3 L. Yount, Antoni van Leeuwenhoek: First to See Microscopic Life, Enslow
Publishers, 1996.
4 J. A. Mahaffey, Making Leeuwenhoek proud: building simple microscopes, Opt. &
Phot. News 10, 62–63, March 1999.
5 These historical anecdotes have been compiled from information available on the
worldwide web. See, for example, encarta.msn.com, www.hcs.ohio-state.edu, www.
letsfindout.com, www.feic.com, www.ucmp.berkeley.edu, www.utmem.edu.
6 S. F. Jacobs and S. C. Johnston, Unusual optical effects of a solid glass sphere, Opt.
& Phot. News 8, 44–45, October 1997.
7 H. M. Nussenzveig, The theory of the rainbow, Scientific American, 116–127, April
1977.
8 C. B. Boyer, The Rainbow, From Myth to Mathematics, Sagamore Press, Thomas
Yoseloff, New York, 1959.
42
Projection photolithography†
Photolithography is the technology of reproducing patterns using light.
Developed originally for reproducing engravings and photographs and later used
to make printing plates, photolithography was found ideal in the 1960s for
mass-producing integrated circuits.1 Projection exposure tools, which are now used
routinely in the semiconductor industry, have continually improved over the past
several decades in order to satisfy the insatiable demand for reduced feature size,
increased chip size, improved reliability and production yield, and lower overall
cost. High-numerical-aperture lenses, short-wavelength light sources, and complex
photoresist chemistry have been developed to achieve fabrication of fine
patterns over fairly large areas. Research and development efforts in recent years
have been directed at improving the resolution and depth of focus of the
photolithographic process by using phase-shifting masks (PSMs) in place of the
conventional binary intensity masks (BIMs). In this chapter we describe briefly
the principles of projection photolithography and explore the range of
possibilities opened up by the introduction of PSMs.
Basic principles
Figure 42.1 is a diagram of a typical projection system used in optical
lithography. A quasi-monochromatic, spatially incoherent light source (wavelength λ)
is used to illuminate the mask. Steps are usually taken to homogenize the source,
thus ensuring a highly uniform intensity distribution at the plane of the mask. The
condenser stop may be controlled to adjust the degree of coherence of the
illuminating beam; this control of partial coherence is especially important when
PSMs are used to improve the performance of optical lithography beyond what is
achievable with the traditional BIMs.
† The coauthor of this chapter is Rongguang Liang.
[Figure 42.1 diagram labels: light source, homogenizer, condenser stop, condenser lens, mask and stage, projection lens, half-angle θ, wafer and stage.]
Figure 42.1 Essential elements of a photolithographic “stepper” used for
exposing semiconductor wafers. The condenser stop controls the degree of
coherence of the illumination. The numerical aperture NA0 of the projection
lens is defined as sin θ, where θ is the half-angle of the cone subtended by the
clear aperture of the projection lens at the wafer. The uniformly illuminated
mask is imaged onto the wafer with a magnification M that is typically
around 1/5.
The light transmitted through the mask is collected by the projection lens, which
images the mask onto the wafer, typically with a magnification M = 1/5. Thus, if
the numerical aperture of the projection lens is defined as NA0 = sin θ, its angular
aperture on the mask side will be sin θ0 = M NA0. If the condenser’s numerical
aperture NAc happens to be much less than sin θ0 then the illumination is coherent,
while if NAc ≥ sin θ0 then the illumination is essentially incoherent. In practice the
ratio σ = NAc/(M NA0) is used as a measure of the incoherence of illumination.
For example, if M = 1/5 and NA0 = 0.6, then NAc = 0.084 yields σ = 0.7, while
NAc = 0.06 yields σ = 0.5. For a given projection lens, therefore, the incoherence of
illumination is proportional to the condenser’s stop diameter.1,2,3
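The two numerical examples above follow directly from the definition of the coherence factor as the ratio NAc/(M NA0), written σ in the code. The short check below uses a helper whose name is purely illustrative.

```python
def coherence_factor(na_condenser, magnification, na_projection):
    """Ratio NAc / (M * NA0) used in the text as a measure of incoherence."""
    return na_condenser / (magnification * na_projection)

for na_c in (0.084, 0.06):
    sigma = coherence_factor(na_c, 1 / 5, 0.6)
    print(f"NAc = {na_c}: sigma = {sigma:.2f}")
```

Since M and NA0 are fixed by the projection lens, the ratio scales linearly with NAc, which is the proportionality to the condenser stop diameter noted in the text.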
Over the past decade, photolithographic systems have evolved through several
generations. The wavelength of the light source has steadily decreased from 365 nm
(i-line of mercury) to 257 nm (high-pressure mercury arc lamp) to 248 nm (KrF
laser), and is presently at 193 nm (ArF excimer laser). The numerical aperture
NA0 of the projection lens, having increased from its value of 0.16 in the early
days to 0.6 in present-day systems, is likely to increase still further. The
illumination systems have also improved, taking advantage of off-axis illumination
and related configurations.1,2 Other improvements have occurred in the area of
photoresists and the control of their exposure and development processes and
also in the control of the flatness of the wafer, which reduces the need for a large
depth of focus, etc.
These topics are beyond the scope of the present chapter, and we refer the
interested reader to the published literature for further information.1,2,3,4,5,6 In the
remainder of this chapter we present computed images of various masks obtained
in a typical projection system (NA0 = 0.6, M = 1/5) and compare the resulting
image contrasts and resolutions.
PSM versus BIM

Traditional “binary intensity” masks (BIMs) consist of opaque chromium lines on
transparent glass substrates; these masks modulate the intensity of the incident
light without affecting its phase. Modern masks have begun to take advantage of
optical phase by changing the thickness of the transparent regions of the mask,
either by depositing additional transparent material where needed or by removing
a thin layer from the substrate at specific locations, thereby selectively adjusting
the transmitted optical phase.1,2
The basic idea of an optical phase-shifting mask for lithography originated in
the early 1980s with M. D. Levenson6 in the US and, independently and almost
simultaneously, with M. Shibuya7 in Japan. Figure 42.2 shows several different
mask designs that exploit optical phase to improve the resolution of the
photolithographic process. In addition to improved resolution, these PSMs also increase
the effective depth of focus and provide a wider process window (i.e., a wider
range of acceptable focuses and exposures).1
Alternating-aperture phase-shifting mask

Consider the simple mask consisting of three bright lines on a dark background
shown in Figure 42.3. Each bright line is 3λ wide, and the separation between
adjacent lines is also 3λ. (Note that these are the mask dimensions; at the wafer
the features are demagnified by a factor 1/M = 5.) We assume two different
designs for the mask. In the first, the mask is a conventional BIM, the same phase
being imparted to the light transmitted through each aperture. In the second, the
[Figure 42.2: panels (a)–(f), each showing a mask structure above a plot of the transmitted amplitude (+, 0, –).]
Figure 42.2 Several mask structures and, below each structure, the corresponding
E-field patterns immediately after transmission through the mask. (a)
Conventional transmission mask. (b) Alternating-aperture phase mask with
etched substrate. (c) A chromeless phase-edge mask produces dark lines in the
image solely through destructive interference at the phase transitions. (d) A
shifter–shutter mask is similar to (c) except that each dark line is produced by a
pair of adjacent phase-edges. (e) A rim-shifter mask contains chrome lines
bracketed by 180° phase-edges. (f) An attenuated phase-shift mask; here the
shaded regions represent partially transmissive material with a 180° phase shift.
(Adapted from reference 1.)
mask is a PSM in which the upper and lower bright lines are phase-shifted by
180° relative to the central bright line. In Figures 42.4(a), (b) we compare the
intensity patterns of the images obtained at the wafer for these two types of mask.
The assumed projection system is that of Figure 42.1, with NA0 = 0.6, M = 1/5,
and σ = 0.7. Clearly the PSM is better at resolving the dark spaces between
adjacent bright lines. For direct comparison, a cross-section through these two
intensity distributions is shown in Figure 42.4(c). Increasing the coherence of the
illumination by closing down the aperture of the condenser to σ = 0.5 improves
the image contrast of the PSM but degrades that of the BIM image, as can be
readily observed in Figures 42.4(d)–(f).
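The trend described above can be reproduced qualitatively with a simplified one-dimensional model. The sketch below is not the calculation behind Figure 42.4; it makes several assumptions that should be kept in mind: the mask is idealized as an infinite grating of equal lines and spaces (0.6λ wide at the wafer) instead of three lines, the source is a uniform line of mutually incoherent points instead of a disk, and the image is formed by summing the intensities produced by the individual source points (Abbe's method). Lengths are in units of the wavelength, and all names are illustrative.

```python
import numpy as np

def grating_contrast(sigma, alternating, na=0.6, line=0.6, n_src=201):
    """Image contrast of an equal line/space grating under partially coherent light."""
    samples = 60                                  # grid points per line width
    dx = line / samples
    idx = np.arange(40 * 2 * samples)             # 40 line/space pitches, periodic
    is_open = (idx % (2 * samples)) < samples     # transparent half of each pitch
    pitch_no = idx // (2 * samples)
    # BIM: every aperture has amplitude +1. PSM: alternate apertures are -1 (180 deg).
    sign = np.where(pitch_no % 2 == 1, -1.0, 1.0) if alternating else 1.0
    spectrum = np.fft.fft(is_open * sign)
    freqs = np.fft.fftfreq(idx.size, d=dx)
    intensity = np.zeros(idx.size)
    for s in np.linspace(-sigma * na, sigma * na, n_src):
        pupil = np.abs(freqs + s) <= na           # orders passed for this source tilt
        intensity += np.abs(np.fft.ifft(spectrum * pupil)) ** 2
    intensity /= n_src
    return (intensity.max() - intensity.min()) / (intensity.max() + intensity.min())

for sigma in (0.7, 0.5):
    print(f"sigma = {sigma}: BIM contrast = {grating_contrast(sigma, False):.2f}, "
          f"PSM contrast = {grating_contrast(sigma, True):.2f}")
```

In this idealized model the PSM image has the higher contrast at both settings, and closing the condenser stop raises the PSM contrast while lowering that of the BIM, in line with the behavior seen in Figure 42.4.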
[Figure 42.3: mask pattern; horizontal axis x/λ from –10.5 to 10.5.]
Figure 42.3 A simple mask containing three transparent apertures on an opaque
background. The apertures as well as the spaces between apertures are 3λ wide.
When the apertures impart a uniform phase to the transmitted beam, the mask
is a BIM. When the upper and lower apertures impose on the transmitted beam
a 180° phase shift relative to the middle aperture, the mask is an
alternating-aperture PSM.
Isolated bright line

As our second example we consider the case of an isolated bright line.
Figures 42.5(a), (b) show respectively a BIM and a PSM for a line of width 4λ.
(Again, this is the dimension at the mask; the projected line at the wafer is only
0.8λ wide.) The PSM of Figure 42.5(b) contains two 0.8λ-wide side-riggers,
each imparting a 180° phase shift to the incident beam relative to the central
bright line.4
Figures 42.6(a), (b) show the computed intensity patterns at the wafer for
the two masks, and Figure 42.6(c) shows cross-sections of both patterns (the
assumed coherence factor σ is 0.7). The side-riggers produce small bumps
in the intensity pattern of the PSM, but these are usually below the resist
threshold and are not printed. In Figure 42.6 it can be seen that the computed
image of the bright line using the PSM is about 10% narrower than that
obtained with the BIM. This modest reduction in the printed line-width can be
slightly improved upon if the side-riggers’ location and width are properly
optimized and also if the condenser stop is further closed down to increase the
coherence of illumination (σ-values as low as 0.3 have been suggested in the
literature4,5).
Contact hole
Figure 42.7(a) shows a simple 4λ × 4λ square aperture on a dark background.
This feature has uniform phase across the aperture and, therefore, represents
the BIM for a contact hole. A corresponding PSM for the same hole is shown
in Figure 42.7(b). Here four side-rigger lines of width 0.5λ and 180° phase
[Figure 42.4: panels (a), (b), (d), (e) are intensity images with horizontal axis x/λ from –2.1 to 2.1; panels (c) and (f) plot normalized intensity (0 to 1) versus y/λ from –2.1 to 2.1 for the BIM and the PSM.]
Figure 42.4 Computed plots of intensity distribution at the wafer for the
mask of Figure 42.3 placed in the system of Figure 42.1 (NA0 = 0.6, M = 1/5).
(a) Image of the BIM obtained with σ = 0.7. (b) Image of the PSM obtained
with σ = 0.7. (c) Cross-sections of the intensity patterns for the BIM (broken
line) and the PSM (solid line). (d)–(f) Same as the patterns in the left-hand
column, but for σ = 0.5.
shift (relative to the central aperture) are placed around the hole.4 The
computed intensity patterns of the images of these masks at the wafer appear in
Figures 42.8(a),(b), respectively. The side-rigger features are too small to be
printed, but their destructive interference with the central aperture results in a
smaller projected hole, as revealed in the cross-sectional intensity profiles at
the wafer shown in Figure 42.8(c). As before, the printed feature size can be
further optimized by adjusting the dimensions of the side-riggers as well as by
closing the condenser stop to reduce the value of σ.
[Figure 42.5: panels (a) and (b); horizontal axis x/λ from –13 to 13.]
Figure 42.5 Masks designed for creating an isolated bright line at the wafer.
(a) BIM containing a 4λ-wide line on an opaque background. (b) PSM featuring
the same 4λ-wide line flanked by a pair of 0.8λ-wide side-riggers. Each
side-rigger imparts to the incident beam a 180° phase shift relative to the central line.
The separation between the central line and each side-rigger is 2λ.
More complicated patterns

Figure 42.9(a) shows a mask with five transparent apertures. The widths of line
(bright) and space (dark) on this mask are both equal to 4.8λ. If the mask is used
without any phase shifts, the intensity pattern of Figure 42.9(b) will be obtained
at the wafer. Placing 180° phase-shifters on alternate bright apertures results in
the image intensity distribution shown in Figure 42.9(c). Two different
cross-sections of these patterns are also given in Figures 42.9(d), (e). In this case of
relatively large features, there are apparently no significant differences between a
BIM and a PSM.
With shrinking feature size, however, the advantages of the PSM become
apparent. Figure 42.10 is the counterpart of Figure 42.9 for the case where the
line- and space-widths (at the mask) are both reduced to 3λ. The BIM is now seen
to yield a fairly low-contrast image at the wafer, while the PSM provides better
resolution and sharper contrast. Reducing the feature size still further to 2.4λ
(at the mask) results in the patterns of Figure 42.11. Here the PSM still performs
reasonably well, while the image quality of the BIM has been substantially
degraded.
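A simple way to see why the three feature sizes behave so differently is to compare the lowest nonzero diffraction order of each mask with the pass-band of the projection lens. The sketch below treats the five-aperture masks as periodic gratings, which is an idealization, and relies on two standard results that the chapter does not state explicitly: an on-axis illuminating plane wave transmits spatial frequencies up to NA0/λ, and the most oblique source point extends this limit to (1 + σ)NA0/λ. Frequencies are in units of 1/λ at the wafer; the function name is illustrative.

```python
def lowest_order_frequency(mask_width, alternating, magnification=1 / 5):
    """Lowest nonzero diffraction-order frequency at the wafer, in units of 1/wavelength."""
    width = mask_width * magnification                 # line (= space) width at the wafer
    # An alternating-phase mask repeats in amplitude only after two line/space pitches.
    period = 4 * width if alternating else 2 * width
    return 1.0 / period

NA0, SIGMA = 0.6, 0.7
for w in (4.8, 3.0, 2.4):
    f_bim = lowest_order_frequency(w, alternating=False)
    f_psm = lowest_order_frequency(w, alternating=True)
    print(f"mask feature {w} wavelengths: BIM order at {f_bim:.3f}, PSM order at {f_psm:.3f}; "
          f"on-axis limit {NA0:.2f}, oblique limit {(1 + SIGMA) * NA0:.2f}")
```

Under these assumptions the first BIM order lies inside the on-axis limit for the 4.8λ features, passes only for oblique source points at 3λ, and falls outside even the oblique limit at 2.4λ. The first PSM order stays within the on-axis limit in all three cases.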
Phase-shifters on a transparent background

As a final example, consider the fully transparent (i.e., chromeless) PSM shown
in Figure 42.12(a). Each of the three rectangular features on this mask is 4λ wide
and is phase-shifted by 180° relative to the background. Also, the spaces
separating adjacent rectangular features are each 4λ wide. The computed intensity
distribution at the wafer in a system having NA0 = 0.6, M = 1/5, σ = 0.7 is shown
[Figure 42.6: panels (a) and (b) are intensity images with horizontal axis x/λ from –2.6 to 2.6; panel (c) plots normalized intensity (0 to 1) versus y/λ from –2.6 to 2.6 for the PSM and the BIM.]
Figure 42.6 Computed intensity patterns at the wafer for the masks of Figure 42.5
in the system of Figure 42.1 (NA0 = 0.6, M = 1/5, σ = 0.7). (a) Using the BIM;
(b) using the PSM; (c) the cross-sections of the intensity patterns in the images of the
BIM (broken line) and the PSM (solid line).
in Figure 42.12(b), and a cross-sectional view is provided in Figure 42.12(c).
Depending on the intended application, this image may or may not be acceptable.
For instance, suppose the long edges of the rectangular features of the mask are
meant to produce dark lines at the wafer. This they do quite well, as is evident
from the presence of four horizontal dark lines in Figure 42.12(b). However, if
the ends of these dark lines are required to be disconnected from each other,
then the PSM has failed in providing the necessary isolation. The problem is
rooted in the sharp 0°–180° phase-edge occurring at the short end of each
rectangular feature. This problem can be remedied in principle by softening the
phase transition at these short ends by providing a gradual transition from 180°
to 120° to 60° and eventually to 0°. Such phase stair-steps, however, are usually
[Figure 42.7: panels (a) and (b); horizontal axis x/λ from –5 to 5.]
Figure 42.7 Mask patterns for creating a contact hole. (a) BIM containing a
4λ × 4λ square aperture on an opaque background. (b) PSM featuring the same
4λ × 4λ aperture surrounded by 0.5λ-wide side-riggers. Each side-rigger imparts
to the incident beam a 180° phase shift relative to the central aperture.
[Figure 42.8: panels (a) and (b) are intensity images with horizontal axis x/λ from –1.8 to 1.8; panel (c) plots normalized intensity (0 to 1) versus x/λ from –1.8 to 1.8 for the BIM and the PSM.]
Figure 42.8 Computed intensity patterns at the wafer for the masks of Figure 42.7
in the system of Figure 42.1 (NA0 = 0.6, M = 1/5, σ = 0.7). (a) Using the BIM;
(b) using the PSM; (c) the cross-sections of the intensity patterns in the images of the
BIM (broken line) and the PSM (solid line).
[Figure 42.9: panel (a) is the mask, horizontal axis x/λ from –27.5 to 27.5; panels (b) and (c) are intensity images, x/λ from –5.5 to 5.5; panels (d) and (e) plot normalized intensity (0 to 1) versus x/λ from –5.5 to 5.5 for the BIM and the PSM.]
Figure 42.9 (a) Mask pattern containing five transparent apertures on an
opaque background. The lines and spaces are all 4.8λ wide. When used as a
BIM, all apertures impart the same uniform phase to the incident beam.
When used as a PSM, the apertures are alternately phase-shifted by 0° and 180°.
The assumed projection-system parameters are NA0 = 0.6, M = 1/5, σ = 0.7.
(b) Computed intensity pattern in the image of the BIM. (c) Computed intensity
pattern in the image of the PSM; the arrows mark the cross-sections displayed in
(d) and (e). (d) Cross-sectional plots of intensity distributions in the images of
the BIM (broken line) and the PSM (solid line). (e) A different cross-section of
the two images.
impractical because they are costly and, moreover, they produce masks that are
difficult to inspect and to repair. In today’s practice, such unwanted dark lines
are erased by a second exposure through a different mask.
Concluding remarks
Incorporating the advantages of optical phase in the design, manufacture, and
testing of photomasks is still very much a research topic; many potential benefits
of the PSM remain to be realized. The type of PSM in common use today is the
attenuated PSM depicted in Figure 42.2(f), where the traditional opaque chrome
is replaced by a material that transmits 8% with a 180° phase shift. This is
useful for printing bright spaces and contact holes, and has essentially replaced
the shifter–shutter type of mask (see Figure 42.2(d)). Also, the more recent
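[Figure 42.10: panel (a) is the mask, horizontal axis x/λ from –15 to 15; panels (b) and (c) are intensity images, x/λ from –3 to 3; panels (d) and (e) plot normalized intensity (0 to 1) versus x/λ from –3 to 3 for the BIM and the PSM.]
Figure 42.10 Same as Figure 42.9 but for smaller mask features. The lines and
spaces on the mask are now 3λ wide.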
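[Figure 42.11: panel (a) is the mask, horizontal axis x/λ from –13 to 13; panels (b) and (c) are intensity images, x/λ from –2.6 to 2.6; panels (d) and (e) plot normalized intensity (0 to 1) versus x/λ from –2.6 to 2.6 for the BIM and the PSM.]
Figure 42.11 Same as Figure 42.9 but for very small mask features. The lines
and spaces on the mask are now 2.4λ wide.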
[Figure 42.12: panel (a) is the mask, horizontal axis x/λ from –15 to 15; panel (b) is the intensity image, x/λ from –3 to 3; panel (c) plots normalized intensity (up to 1) versus y/λ from –3 to 3.]
Figure 42.12 (a) Transparent PSM containing three rectangular regions of
width 4λ, each imparting a 180° phase shift to the incident beam. Like the
background, the spaces between adjacent apertures (also 4λ wide) are fully
transparent and impart a 0° phase to the beam. (b) Computed intensity distribution
in the image plane of the system of Figure 42.1 having NA0 = 0.6, M = 1/5,
σ = 0.7. (c) Central cross-section of the intensity pattern of the image seen in (b).
high-transmission tri-tone PSM, where the phase-shifted material transmits 18%
and there is a separately patterned opaque layer, has superseded the rim-shifters
(see Figure 42.2(e)).8

1 M. D. Levenson, Wavefront engineering for photolithography, Physics Today, 28–36,
July 1993.
2 M. D. Levenson, Extending the lifetime of optical lithography technologies with
wavefront engineering, Jpn. J. Appl. Phys. 33, 6765–6773 (1994).
3 M. D. Levenson, Wavefront engineering from 500 nm to 100 nm CD, in Emerging
Lithographic Technologies, SPIE 3048, 2–13 (1997).
4 T. Terasawa, N. Hasegawa, T. Kurosaki, and T. Tanaka, 0.3-micron optical
lithography using a phase-shifting mask, SPIE 1088, 25–33 (1989).
5 N. Hasegawa, T. Terasawa, T. Tanaka, and T. Kurosaki, Submicron optical
lithography using phase-shifting mask, Electro-chem. Ind. Phys. Chem. 58, 330–335
(1990).
6 M. D. Levenson, N. S. Viswanathan and R. A. Simpson, Improving resolution in
photolithography with a phase-shifting mask, IEEE Trans. Electron Devices ED-29,
1828–1836 (1982).
7 M. Shibuya, Projection master for transmitted illumination, Japanese Patent Gazette
# Showa 62-50811, application dated 9/30/80, issued 10/27/87.
8 M. D. Levenson, private communication.
43
Interaction of light with subwavelength structures†
When a light field interacts with structures that have complex geometric features
comparable in size to the wavelength of the light, it is not permissible to invoke the
assumptions of the classical diffraction theory, which simplify the problem and allow
for approximate solutions. For such cases, direct numerical solutions of the governing
equations are sought through approximating the continuous time and space
derivatives by the appropriate difference operators. The Finite Difference Time Domain
(FDTD) method discretizes Maxwell’s equations by using a central difference
operator in both the time and space variables.1 The E- and B-fields are then
represented by their discrete values on the spatial grid, and are advanced in time in steps of
Δt. The numerical solution thus obtained to Maxwell’s equations (in conjunction
with the relevant constitutive relations) provides a highly reliable representation of
the electromagnetic field distribution in the space-time region under consideration.
This chapter presents examples of application of the FDTD method to problems
involving the interaction between a focused beam of light and certain
subwavelength structures of practical interest. A few general remarks concerning
the nature of the FDTD method appear in the next section. This is followed by a
description of the simulated system and two examples in which comparison is
possible between the FDTD method and an alternative method of calculation. We
then present simulation results for the case of a focused beam interacting with
small pits and apertures in a thin film supported by a transparent substrate.
The FDTD method

The spatial unit cell used in three-dimensional FDTD simulations is shown in
Figure 43.1. The components of the vector fields E and B are located at different
† The co-authors of this chapter are Armis R. Zakharian, now with Corning Corp., and Jerome V. Moloney of
the University of Arizona.
[Figure 43.1 diagram: unit cell with edge lengths Δx, Δy, Δz, showing the staggered locations of Ex, Ey, Ez and Bx, By, Bz.]
Figure 43.1 The unit cell of the FDTD mesh has dimensions Δx × Δy × Δz. The
various components of the E and B fields are assigned to different locations on
the unit cell. The staggered field components are shifted by a half-pixel in
various directions.
positions with respect to the cell center, so that every component of the electric
field is surrounded by four circulating components of the magnetic field, and vice
versa. Such a staggered mesh is motivated by the integral form of Maxwell’s curl
equations. The contour integrals of E (B) along the edges of the cell in Faraday’s
law (Ampere’s law) circulate around the corresponding magnetic (electric) field
component at the center of the cell face.
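The staggered arrangement is easiest to see in one dimension, where only two field components survive. The following minimal sketch assumes vacuum, normalized units (c = 1, cell size 1), and periodic boundaries in place of an absorbing layer; it stores Ez at integer grid points and By half a cell to the right, and advances them alternately in time (leapfrog). The grid size, pulse width, and Courant number are hypothetical choices for illustration.

```python
import numpy as np

def run_fdtd_1d(n_cells=600, n_steps=400, courant=0.5):
    """1-D leapfrog update of Ez (integer nodes) and By (half-integer nodes)."""
    nodes = np.arange(n_cells)
    ez = np.exp(-((nodes - n_cells / 2) / 20.0) ** 2)   # initial Gaussian pulse in Ez
    by = np.zeros(n_cells)                               # by[i] is located at i + 1/2
    for _ in range(n_steps):
        # Faraday's law: dBy/dt = dEz/dx, differenced across the two neighboring Ez nodes
        by += courant * (np.roll(ez, -1) - ez)
        # Ampere's law (vacuum, c = 1): dEz/dt = dBy/dx, using the two neighboring By nodes
        ez += courant * (by - np.roll(by, 1))
    return ez, by

ez, by = run_fdtd_1d()
print("peak Ez:", ez.max(), "at cell", int(np.argmax(ez)))
```

With these settings the initial pulse splits into two counter-propagating halves, each of roughly half the original amplitude, which travel 200 cells in 400 time steps. Each update touches only the nearest neighbors on the grid.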
In 3-D simulations at least six field components must be stored and updated
at each grid point, which leads to considerable memory and CPU requirements
for FDTD simulations. Fortunately, the time update of any field component
involves only nearby fields located one or two cells away on the grid. This kind
of locality in the physical space translates into computer memory access
locality and allows for efficient implementation of the FDTD algorithm on
many types of shared and distributed memory parallel platforms. Low-reflection
absorbing boundary conditions that terminate the computational domain by a
Perfectly Matched Layer (PML) allow the simulation of physical problems with
open boundaries.2
Since the FDTD algorithm solves Maxwell’s equations in the time domain,
calculation for a broad range of frequencies is possible in a single simulation using
a time-pulsed excitation. Other advantages include the possibility of modeling
dispersive and nonlinear materials. An important property of the FDTD method is
that it introduces no additional dissipation into the physical problem due to
numerical discretization, and hence energy is conserved. However, the finite
difference method contributes to a dispersion error. In the commonly used
second-order-accurate implementation of FDTD, this error diminishes with cell size h as
O(h²). In practice, therefore, to keep the numerical dispersion errors under control,
a grid with about 30 points per wavelength is desired. The rather large number of
[Figure 43.2: panels (a) and (b) show the 3-D grids; axes x [μm] and y [μm] from –2 to 2, z [μm] from –0.4 to 0.4.]
Figure 43.2 3-D computational domain for simulating the interaction between
a focused beam of light and various marks (i.e., bumps or pits) on the surface of
a multilayer data storage medium. (a) Non-uniform conformal grid; the grid-line
density is higher near the center, where the focused beam and the multilayer
stack are located. (b) Nested rectangular cells forming a non-conformal
hierarchical grid.
points and iterations thus required for accurate results may render solution
impractical for a problem with large spatial and/or temporal domain.
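The O(h²) behavior of the dispersion error can be checked against the standard numerical dispersion relation of the one-dimensional second-order scheme, sin(ωΔt/2) = S sin(kΔx/2), where S = cΔt/Δx is the Courant number. This relation is quoted here as a well-known textbook result and is not derived in the chapter; three-dimensional grids obey a more involved, direction-dependent relation. The function name and the choice S = 0.5 are illustrative.

```python
import math

def phase_velocity_error(points_per_wavelength, courant=0.5):
    """Relative phase-velocity error, 1 - v_phase/c, of the 1-D second-order scheme."""
    k_dx = 2.0 * math.pi / points_per_wavelength                  # k * dx
    omega_dt = 2.0 * math.asin(courant * math.sin(k_dx / 2.0))    # omega * dt
    return 1.0 - omega_dt / (courant * k_dx)

for n in (10, 15, 30, 60):
    print(f"{n} points per wavelength: error = {phase_velocity_error(n):.2e}")
```

Doubling the number of points per wavelength reduces the error by about a factor of four, as expected for a second-order method. Since a phase error accumulates with propagation distance, this is one way to see why a fairly fine grid is needed when the domain spans many wavelengths.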
In many cases it is desirable to retain the efficiency of the FDTD scheme on the
rectangular grids, but achieve higher resolution only in those regions of the
computational domain where it is needed. Non-uniform grids allow one to vary
the cell size in each coordinate direction, keeping the grid structured and conformal
as in Figure 43.2(a). A more efficient approach is to employ a collection of nested
rectangular cells that form a non-conformal hierarchical grid, as in Figure 43.2(b).
Each successive nested level has a higher resolution, e.g., by a factor of two, than
the previous level, allowing smaller cell sizes to be “focused” in the regions of
interest (e.g., sub-wavelength features, photonic crystal microcavity, etc.). Inside
each rectangular region the standard FDTD algorithm is applied, while at the
boundaries between the grids an update scheme and interpolation must be
employed to keep the method stable and accurate. In FDTD the time step Δt is
proportional to the cell size, and hence the smallest time step is required on the
grids with the highest resolution. Each grid can be updated with its own time-step,
the grids with cell size 2h doing half as many iterations as grids with cell size h.
The simulated system

The FDTD algorithm is quite powerful and can be applied to a wide variety of
problems in electromagnetics. For demonstration purposes in this chapter, however,
we confine our attention to a simple system involving the interaction between a
focused beam of light and small (subwavelength) structures located in the focal
region. Figure 43.3 shows a coherent, monochromatic beam of light (free-space
wavelength = λ0), brought to focus by an aberration-free objective lens (numerical
aperture NA = 0.6, focal length f = 5000λ0). The incident beam at the entrance pupil
is linearly polarized along the X-axis, and the total optical power (i.e., integrated
intensity) at the entrance pupil of the lens is set to unity. The sample typically
consists of a thin film (or thin-film stack) coated on a transparent substrate; the
various samples used in our simulations are depicted in Figures 43.3(b)–(f). Detailed
descriptions of these samples will be given in the context of the relevant simulations
in the following sections. The focused spot may illuminate the sample directly, as in
Figures 43.3(b), (d), and (f), or through a glass hemisphere (i.e., solid immersion
lens) placed in contact with the sample, as in Figure 43.3(c). When the hemispherical
lens is present, the thin film(s) may be coated directly on its flat facet, in which case
the hemisphere acts as the sample substrate as well.
Figure 43.4 shows computed plots of intensity (top) and phase (bottom) at the
focal plane of the lens depicted in Figure 43.3(a). From left to right, these
distributions represent the E-field components along the X-, Y-, and Z-axes. At the focal
plane the peak intensities are in the ratio of |Ex|² : |Ey|² : |Ez|² = 1000 : 0.4 : 45. The
various rings of the focused spot are phase-shifted by 180° relative to their adjacent
neighbors, and the Z-component of the field is 90° out-of-phase relative to Ex and
Ey. (In the remainder of this chapter we will omit the plots of Ez distribution, since
Ez can always be computed from a knowledge of the Ex and Ey distributions.)
The intensity and phase distributions in this chapter are plotted in an interval
xmin ≤ x ≤ xmax and ymin ≤ y ≤ ymax of the XY-plane by assigning the color red
to the maximum value of the function, blue to the minimum value, and the
continuum of the white light spectrum to the values in between. The phase
plots cover the range from −180° (blue) to +180° (red). The intensity
distributions are first normalized by the peak value of the corresponding function,
say, Ix-peak = max(|Ex|²) within the displayed interval. The base 10 logarithm
(a) Objective
Y
Substrate
X Z
Incident
beam Thin film(s)
(b) (c) (d) (e) (f)
Glass
Metal film hemisphere Metal film Metal film Metal film
Figure 43.3 (a) Diagram of the simulated system, in which an aberration-free

objective lens brings a coherent, monochromatic beam of light to focus. The
bottom row shows the various samples used in the simulations. In (b), (d), and
(e) a 50 nm-thick metal film is coated over a transparent substrate. In (c) the
bilayer at the bottom of the glass hemispherical lens consists of quarter-wave-
thick dielectric films. The sphero-cylindrical pits in (d) and (e) are 500 nm long,
300 nm wide, and 100 nm deep. In (f) the aperture in the 20 nm-thick metal film
is circular in one simulation and bowtie-shaped in another. The circular hole’s
diameter is 400 nm, while the bowtie aperture is 400 nm long, 300 nm wide on
each side, and 60 nm wide at the neck.
of the normalized function is then evaluated, and all pixel values below a
certain level, say, a, are set equal to a. Displayed plots of log_intensity_a
thus cover the range from 10 aIpeak (blue) to Ipeak (red).
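The display convention for intensity plots can be sketched in a few lines. This is only an illustration of the normalize, take-the-logarithm, and clip steps described in the text; the function name `log_intensity` and the toy input values are our own assumptions, not part of the simulation software used for the figures.

```python
import math

def log_intensity(intensity, alpha):
    """Return log10(I / I_peak) for every pixel, clipped from below at -alpha."""
    peak = max(max(row) for row in intensity)
    if peak <= 0.0:
        raise ValueError("intensity must have a positive peak")
    out = []
    for row in intensity:
        out_row = []
        for value in row:
            ratio = value / peak
            # log10(0) is undefined, so zero pixels go straight to the floor.
            level = math.log10(ratio) if ratio > 0.0 else -alpha
            out_row.append(max(level, -alpha))
        out.append(out_row)
    return out

# Toy 2 x 3 intensity map in arbitrary units (hypothetical values).
sample = [[1000.0, 45.0, 0.4], [10.0, 0.0, 1.0e-6]]
clipped = log_intensity(sample, alpha=4)
```

With alpha = 4 the brightest pixel maps to 0 (red) and every pixel weaker than 10^(−4) of the peak maps to −4 (blue), which is the range covered by a log_intensity_4 plot.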
When the beam is focused through a hemispherical lens of refractive index n,
as in Figure 43.3(c), the same distribution as in Figure 43.4 is found at the focal
plane, but the spatial coordinates must shrink by a factor of n to account for
the reduced wavelength (λ = λ0/n) within the medium of the hemispherical lens.
At the bottom of the hemisphere, therefore, the focused spot diameter is reduced
by a factor of n compared to that shown in Figure 43.4.
[Figure 43.4 panels (a)–(c), top, and (d)–(f), bottom; horizontal axes: x/λ0 from −4.0 to +4.0.]
Figure 43.4 Plots of log_intensity_4 (top) and phase (bottom) at the focal plane
of the lens of Figure 43.3(a). Left to right: E-field components along the X-, Y-, and
Z-axes. At the entrance pupil the incident beam (wavelength = λ0) is Gaussian with
1/e (amplitude) radius of 4000λ0, truncated at the lens aperture (radius = 3000λ0).
The incident beam is linearly polarized along the X-axis, and its total optical power
captured by the lens is unity, that is, Px = 1.0, Py = 0.0. (The power content of the Ez
component at the focal plane is 8.3% of total power.)
In some cases the beam must be focused onto the object of interest through a
parallel plate cover glass or through the sample’s substrate, as is the case, for
instance, in Figure 43.3(e). Under such circumstances, to obtain a focused spot
free from spherical aberration, the objective lens must be designed for the specific
thickness and refractive index of the substrate. Unlike focusing through a glass
hemisphere, however, the focused spot inside a cover plate (or flat substrate, as
the case may be) has exactly the same dimensions as that obtained by focusing in
air through an objective of the same NA. The reason is that, in passing from the
air to the substrate through a flat interface, the effect of the reduced wavelength
on the focused beam is exactly canceled out by the reduced angle of the focused
cone (Snell’s law). The spot that illuminates the concave pit of Figure 43.3(e)
through the sample’s substrate, therefore, has exactly the same size as that which
directly illuminates the convex pit of Figure 43.3(d).
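The contrast between the hemisphere and the flat substrate can be checked with a short calculation. The sketch below assumes that the spot size scales as the wavelength in the medium divided by the sine of the cone half-angle in that medium; the function names and the value NA = 0.6 are hypothetical choices made for illustration.

```python
import math

def spot_scale_hemisphere(n):
    """Spot-size factor relative to air when focusing through a hemisphere.

    Rays aimed at the center of curvature cross the spherical surface at
    normal incidence, so the cone half-angle is unchanged while the
    wavelength drops to lambda0 / n.
    """
    return 1.0 / n

def spot_scale_flat_plate(na_air, n):
    """Spot-size factor relative to air when focusing through a flat plate."""
    sin_inside = na_air / n        # Snell's law: sin(theta) = n sin(theta')
    wavelength_factor = 1.0 / n    # lambda = lambda0 / n inside the plate
    # Spot size is taken to scale as wavelength / sin(half-angle).
    return wavelength_factor * (na_air / sin_inside)

na_air, n_glass = 0.6, 1.5         # hypothetical objective NA and glass index
hemisphere_factor = spot_scale_hemisphere(n_glass)
flat_plate_factor = spot_scale_flat_plate(na_air, n_glass)
```

For the flat plate the two factors of n cancel for any NA and any index, whereas the hemisphere retains the full 1/n reduction.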
Reflection from a metallic mirror

The mirror depicted in Figure 43.3(b) is a 50-nm-thick metal film of complex
refractive index n + ik = 2.0 + i7.0, coated over a transparent substrate of index
n = 1.5. The large absorption coefficient k of the metal film ensures that the
light does not reach the substrate; most of the incident light is therefore
reflected, while a small fraction is absorbed in the metal. For the incident beam
depicted in Figure 43.4 at λ0 = 650 nm, Figure 43.5 shows computed plots of
reflected intensity (top) and phase (bottom) obtained with the FDTD method.
(The FDTD mesh size was Lx = Ly = 12λ0, and the mirror’s front facet was a
distance z = 180 nm beyond the focal plane of the lens.) The integrated intensity
of the reflected light over the XY-plane for the X- and Y-components of
polarization may be defined as follows:

$$P_x = \iint |E_x|^2 \, dx \, dy, \qquad P_y = \iint |E_y|^2 \, dx \, dy.$$
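On a sampled field the double integral reduces to a sum over pixels. The following sketch uses a plain Riemann sum on a uniform grid; the Gaussian test field, whose amplitude is chosen so that its power is exactly unity, and all variable names are our own illustrative assumptions.

```python
import math

def integrated_intensity(field, dx, dy):
    """Riemann-sum estimate of the double integral of |E|^2 dx dy."""
    return sum(abs(e) ** 2 for row in field for e in row) * dx * dy

# Test field: E = A exp(-(x^2 + y^2) / w^2), for which the exact integral of
# |E|^2 over the plane is A^2 * pi * w^2 / 2. Choosing A as below makes it 1.
w = 1.0
amp = math.sqrt(2.0 / (math.pi * w * w))
half_width, n = 4.0 * w, 201
step = 2.0 * half_width / (n - 1)
axis = [-half_width + i * step for i in range(n)]
ex = [[amp * math.exp(-(x * x + y * y) / (w * w)) for x in axis] for y in axis]
px = integrated_intensity(ex, step, step)
```

The same function applies unchanged to complex-valued samples of Ex or Ey, since only the squared modulus enters the sum.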
Using the FDTD method, we found Px = 0.85, Py = 0.0016 for the mirror of
Figure 43.3(b) illuminated with the focused spot of Figure 43.4. To verify the
accuracy of the FDTD method, we simulated the same system using an alternative
method based on the superposition of plane-wave solutions to Maxwell’s equations
with matching boundary conditions at the various interfaces. The intensity and phase
distributions thus obtained were visually indistinguishable from those shown in
Figure 43.5, and the corresponding integrated intensities were found to be Px = 0.86,
Py = 0.0018. The slight differences between the two methods of computation reflect
the cumulative effect of numerical errors inherent to the FDTD algorithm.
[Figure 43.5 panels (a)–(d); horizontal axes: x from −2.6 μm to +2.6 μm.]
Figure 43.5 Plots of reflected log_intensity_4 (top) and phase (bottom) from
the metallic mirror depicted in Figure 43.3(b) at λ0 = 650 nm. The panels on the
left-hand side correspond to Ex, while those on the right-hand side represent the
Ey component of the reflected field. The front facet of the mirror is located a
distance z = 180 nm beyond the focal plane.
Similar simulations were performed for the sample of Figure 43.3(b) illuminated
through a glass hemisphere of index n = 1.5. (The FDTD mesh size in this case was
Lx = Ly = 8λ0, but the mirror’s front facet remained at z = 180 nm beyond the focal
plane of the lens.) The computed values of integrated intensity were Px = 0.78,
Py = 0.0019. The corresponding quantities obtained with the alternative (and more
accurate) method of plane-wave superposition were Px = 0.80, Py = 0.0022. Once
again, comparison against a benchmark has shown the effect of small but
cumulative numerical errors on the results of FDTD calculations.
[Figure 43.6 panels (a)–(d); horizontal axes: x from −1.067 μm to +1.067 μm.]
Figure 43.6 Plots of reflected log_intensity_3 (top) and phase (bottom) from the
dielectric bilayer depicted in Figure 43.3(c) at λ0 = 400 nm. The panels on the
left-hand side correspond to Ex, while those on the right-hand side represent the Ey
component of the reflected field. The front facet of the stack is at z1 = 230 nm
beyond the focal plane.
Although the alternative method employed in the above examples is faster and
more accurate than FDTD, it has the disadvantage of being restricted to
geometries such as those in Figures 43.3(b) and 43.3(c), where the sample consists of
one or more homogeneous layers with flat surfaces/interfaces. As soon as
inhomogeneities or non-uniformities are introduced, the computation method based
on plane-wave superposition fails, and the FDTD method becomes an attractive
(though costly) candidate for numerical solution of Maxwell’s equations.
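For a single plane wave at normal incidence, matching boundary conditions at the two flat interfaces of a homogeneous film leads to the familiar Airy formulas. The sketch below evaluates only this one term; the method described in the text superposes many such plane waves at oblique incidence and in both polarization states, which is not attempted here. The function name and argument layout are our own assumptions.

```python
import cmath
import math

def film_reflectance(n_in, n_film, n_sub, d, wavelength):
    """Normal-incidence R and T of one film between two semi-infinite media.

    Uses the exp(+i k z) convention, so an absorbing film has Im(n_film) > 0.
    n_in and n_sub are assumed real; d and wavelength share the same unit.
    """
    r12 = (n_in - n_film) / (n_in + n_film)
    r23 = (n_film - n_sub) / (n_film + n_sub)
    t12 = 2.0 * n_in / (n_in + n_film)
    t23 = 2.0 * n_film / (n_film + n_sub)
    beta = 2.0 * math.pi * n_film * d / wavelength  # complex phase thickness
    phase = cmath.exp(1j * beta)
    denom = 1.0 + r12 * r23 * phase ** 2
    r = (r12 + r23 * phase ** 2) / denom
    t = t12 * t23 * phase / denom
    return abs(r) ** 2, (n_sub / n_in) * abs(t) ** 2

metal = 2.0 + 7.0j
r_air, t_air = film_reflectance(1.0, metal, 1.5, 50.0, 650.0)      # incident from air
r_glass, t_glass = film_reflectance(1.5, metal, 1.5, 50.0, 650.0)  # incident from glass
r_thin, t_thin = film_reflectance(1.0, metal, 1.5, 20.0, 650.0)    # 20 nm-thick film
```

At normal incidence the 50 nm film gives a reflectance of roughly 0.86 from air and 0.80 from glass. These happen to lie close to the integrated Px values quoted above, although the focused beam contains a range of incidence angles, so the agreement should be read as a plausibility check rather than a derivation.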
Reflection and transmission at a dielectric bilayer

The sample depicted in Figure 43.3(c) consists of two quarter-wave-thick dielectric
layers coated at the bottom of a glass hemisphere of index n = 1.5. The layer
directly in contact with the hemisphere has n = 2.0, d = 50 nm, while the other
layer has n = 1.5, d = 67 nm. Since the layers are homogeneous and the interfaces
are flat, the method of computation based on plane-wave superposition may be
used once again to check the accuracy of the FDTD simulations.
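The quoted layer thicknesses follow from the quarter-wave condition d = λ0/(4n) at the simulation wavelength of 400 nm used for this sample. A minimal check, with a helper name of our own choosing:

```python
def quarter_wave_thickness(wavelength_nm, n):
    """Thickness (nm) of a layer one quarter-wave thick at normal incidence."""
    return wavelength_nm / (4.0 * n)

d_high = quarter_wave_thickness(400.0, 2.0)  # layer in contact with the hemisphere
d_low = quarter_wave_thickness(400.0, 1.5)   # second layer, quoted as 67 nm
```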
[Figure 43.7 panels (a)–(d); horizontal axes: x from −1.067 μm to +1.067 μm.]
Figure 43.7 Same as Figure 43.6 but for the transmitted beam. The distance
from the rear of the stack to the plane where the transmitted beam is observed is
z2 = 30 nm.
In our FDTD calculations of the bilayer stack of Figure 43.3(c) the incident
focused beam had λ0 = 400 nm, the mesh size was Lx = Ly = 8.08λ0, and the
distance from the focal plane to the top of the stack was z1 = 230 nm, while that
from the bottom of the stack to the plane in which the transmitted beam is
observed was z2 = 30 nm. Figures 43.6 and 43.7 show computed plots of intensity
and phase for the reflected and transmitted fields, respectively. The corresponding
distributions obtained with the alternative method of plane-wave superposition
were visually indistinguishable from those in Figures 43.6 and 43.7. The integrated
values of reflected intensity are Px = 0.022 (0.019 with the alternative method) and
Py = 0.0026 (both methods). The corresponding quantities for the transmitted beam
are Px = 0.97 (1.01 with the alternative method) and Py = 0.01 (both methods).
Once again the FDTD method is seen to be adequate for these types of calculation,
provided that a few percentage point deviation from the exact solution (caused by
discretization and numerical errors) is deemed acceptable.
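The size of the deviations between the two methods can be tabulated directly from the integrated intensities quoted so far. The numbers below are copied from the text; the layout of the table and the helper name are our own.

```python
# (label, FDTD value, plane-wave value), all taken from the text above.
results = [
    ("mirror in air, Px", 0.85, 0.86),
    ("mirror in air, Py", 0.0016, 0.0018),
    ("mirror under hemisphere, Px", 0.78, 0.80),
    ("mirror under hemisphere, Py", 0.0019, 0.0022),
    ("bilayer reflected, Px", 0.022, 0.019),
    ("bilayer transmitted, Px", 0.97, 1.01),
]

def deviations(rows):
    """Return (label, absolute difference, difference relative to plane-wave value)."""
    return [(label, fdtd - ref, (fdtd - ref) / ref) for label, fdtd, ref in rows]

for label, diff, rel in deviations(results):
    print(f"{label:30s} diff = {diff:+.4f}  relative = {rel:+.1%}")
```

Because the total incident power is unity, the absolute differences can be read as fractions of the incident power; none of them exceeds 0.04. The relative differences are larger for the weak components, where a small absolute error is divided by a small reference value.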
[Figure 43.8 panels (a)–(d); horizontal axes: x from −1.73 μm to +1.73 μm.]
Figure 43.8 Plots of reflected log_intensity_3 (top) and phase (bottom) from
the convex pit in the sample depicted in Figure 43.3(d) at λ0 = 650 nm. The
panels on the left-hand side correspond to Ex, while those on the right-hand side
represent the Ey component of the reflected field. The pit center is 250 nm to the
left of the focused spot center.
Reflection from convex and concave pits

The substrate shown in cross-section in Figure 43.3(d) is embossed with a
sphero-cylindrical pit having a length of 500 nm along X, width of 300 nm along Y, and
depth of 100 nm along Z (the profile of the pit in the XY-plane can also be seen in
Figure 43.2). The substrate’s index is n = 1.5, and the metal film’s thickness and
complex index are d = 50 nm, n + ik = 2.0 + i7.0. In our FDTD simulations the
incident wavelength was λ0 = 650 nm, the mesh size was Lx = Ly = 12λ0, and the
front facet of the metal film was at z = 280 nm beyond the focal plane. Figure 43.8
shows computed plots of reflected intensity and phase from a pit whose center
has been displaced by Δx = 250 nm from the center of the focused spot. The
integrated values of reflected intensity are Px = 0.82, Py = 0.0025.
The pit in the above example is similar to those embossed on the plastic
substrate of a compact disk (CD) or a digital versatile disk (DVD). However, the
focused laser beam in a CD or DVD player does not shine directly onto the pit;
rather, the beam arrives through the plastic disk substrate as in Figure 43.3(e).
[Figure 43.9 panels (a)–(d); horizontal axes: x from −1.73 μm to +1.73 μm.]
Figure 43.9 Same as Figure 43.8 but for the sample of Figure 43.3(e). The
objective is now corrected for the thickness and refractive index of the substrate,
so the beam focused on this concave pit continues to be the diffraction-limited
spot shown in Figure 43.4.
[Figure 43.10 panels (a)–(d); horizontal axes: x from −1.3 μm to +1.3 μm.]
Figure 43.10 Plots of transmitted log_intensity_3 (top) and phase (bottom) at
λ0 = 650 nm through the thin metal film depicted in Figure 43.3(f). The film
contains a 400 nm-diameter circular aperture at its center. The panels on the
left-hand side correspond to Ex, while those on the right-hand side represent the Ey
component of the field. The observation plane is 20 nm past the interface
between the film and the substrate.
We simulated this case at λ0 = 650 nm with an FDTD mesh of dimensions
Lx = Ly = 8λ0; the front facet of the metal film was at z = 280 nm beyond the focal
plane. Figure 43.9 shows computed plots of reflected intensity and phase from
the pit of Figure 43.3(e) when the pit center is displaced by Δx = 250 nm from
the center of the focused spot. The computed values of integrated intensity in this
case are Px = 0.77, Py = 0.022. A comparison of |Ex|² distributions in Figures 43.8
and 43.9 reveals that, whereas the convex pit of Figure 43.3(d) tends to concentrate
the incoming rays toward the pit center, the concave pit of Figure 43.3(e) disperses
these rays away from the center.
Transmission through small apertures

Figure 43.3(f) shows a 20 nm-thick metal film (n + ik = 2.0 + i7.0) with an
air-filled hole at the center, coated on a glass substrate of index n = 1.5. The
hole is either a 400 nm-diameter circular aperture, or a bowtie-shaped aperture
400 nm long and 300 nm wide. In our FDTD simulations of these apertures the
wavelength was λ0 = 650 nm, the mesh size was Lx = Ly = 12λ0, the front facet
of the metal film was at z1 = 77 nm beyond the focal plane, and the distance
from the rear facet of the film to the plane in which the transmitted beam is
observed was z2 = 20 nm.
[Figure 43.11 panels (a)–(d); horizontal axes: x from −1.3 μm to +1.3 μm.]
Figure 43.11 Same as Figure 43.10 but with evanescent field components
filtered out.
Figure 43.10 shows computed plots of the transmitted intensity and phase for
the sample of Figure 43.3(f) containing a circular aperture. Note that, despite its
large absorption coefficient, the metal film is not thick enough to completely
block the incident beam. Thus, in addition to the light that passes through the
aperture, a weak ring of light is also transmitted through the film. The integrated
values of transmitted intensity are Px = 0.35, Py = 0.0086. Since the focused
cone of light consists of p- as well as s-polarized rays, the difference in
sample reflectivity for these differently polarized rays at oblique incidence is
partly responsible for the elongated shape of the transmitted intensity profile in
Figure 43.10(a). The proximity of the observation plane to the aperture ensures
that the transmitted field contains a mixture of propagating as well as evanescent
plane waves. If these evanescent components are filtered out, then the remaining
field will propagate undiminished to the far field. The filtered field in the
same observation plane (i.e., at z2 = 20 nm beyond the interface between the
metal film and the substrate) is shown in Figure 43.11. The integrated intensities
of the X- and Y-components of polarization in these calculations are found to
be Px = 0.31, Py = 0.002.
[Figure 43.12 panels (a)–(d); horizontal axes: x from −1.3 μm to +1.3 μm.]
Figure 43.12 Plots of transmitted log_intensity_3 (top) and phase (bottom)
through the 20 nm-thick metal film depicted in Figure 43.3(f). The bowtie
aperture at the center of the film is 400 nm long along X and 300 nm wide along
Y; the rectangular neck of the bowtie is 60 nm wide. The panels on the left-hand
side correspond to Ex, while those on the right-hand side represent the Ey
component of the transmitted field.
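Filtering out the evanescent components amounts to decomposing the field into plane waves and discarding every spatial frequency that exceeds the inverse of the wavelength in the medium. The text does not say how this step was implemented, so the following is only a one-dimensional sketch under our own assumptions: a naive discrete Fourier transform, a sharp cutoff at n/λ0 with n the index of the medium behind the film, and a toy slit-like input field. All names are hypothetical.

```python
import cmath
import math

def dft(samples, sign):
    """Naive discrete Fourier transform; sign=-1 forward, sign=+1 inverse (unscaled)."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                for m in range(n)) for k in range(n)]

def remove_evanescent(field, dx, wavelength, n_medium=1.0):
    """Zero every spatial frequency above n_medium / wavelength, then transform back."""
    n = len(field)
    spectrum = dft(field, -1)
    cutoff = n_medium / wavelength               # cycles per unit length
    for k in range(n):
        signed_k = k if k <= n // 2 else k - n   # map index to signed frequency
        if abs(signed_k / (n * dx)) > cutoff:
            spectrum[k] = 0.0
    return [value / n for value in dft(spectrum, +1)]

def power(field, dx):
    """Integrated intensity of a sampled one-dimensional field."""
    return sum(abs(v) ** 2 for v in field) * dx

# Toy input: unit field across a 60 nm-wide slit, sampled every 10 nm at 650 nm.
dx, wavelength = 10.0, 650.0
field = [1.0 if 61 <= m <= 66 else 0.0 for m in range(128)]
filtered = remove_evanescent(field, dx, wavelength, n_medium=1.5)
```

By Parseval's theorem the filtered field can never carry more power than the original, and a field that is already smooth on the scale of the wavelength passes through unchanged. A feature much narrower than the wavelength, such as the slit above, loses most of its power in the process.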
For the bowtie aperture in the thin-film sample of Figure 43.3(f), computed
plots of transmitted intensity and phase are shown in Figure 43.12. The
computed integrated intensities in this case are Px = 0.194, Py = 0.04. When the
evanescent content of the transmitted field is filtered out, the distributions
shown in Figure 43.13 are obtained. (The integrated intensity values now drop
to Px = 0.093, Py = 0.005.) Note that the bowtie shape of the aperture is no
longer discernible in the filtered transmitted beam, ostensibly because the fine
features of this aperture contribute primarily to the evanescent field.
[Figure 43.13 panels (a)–(d); horizontal axes: x from −1.3 μm to +1.3 μm.]
Figure 43.13 Same as Figure 43.12 but with evanescent field components
filtered out.
When the bowtie aperture was rotated 90° in the plane of the metallic film (to
make the incident E-field perpendicular to the line that connects the sharp ends of
the triangles), the computed integrated intensities dropped to Px = 0.1, Py = 0.012
before filtering and Px = 0.047, Py = 0.0035 after filtering. The transmission
efficiency of the bowtie aperture is thus seen to drop by nearly a factor of 2.0
when the incident polarization goes from being parallel to the line that connects
the sharp ends of the triangles to being perpendicular to it.

1 K. S. Yee, IEEE Trans. Antennas and Prop. 14, 302–307 (1966).
2 A. Taflove and S. C. Hagness, Computational Electrodynamics, Artech House,
Norwood, MA (2000).
44
The Ronchi test
In the 1920s Vasco Ronchi developed the well-known method of testing optical
systems now named after him.1,2 The essential features of the Ronchi test may be
described by reference to Figure 44.1. A lens (or more generally, an optical
system consisting of a number of lenses and mirrors) is placed in the position
of the “object under test”. The lens is then illuminated with a beam of light,
which, for the purposes of the present chapter, will be assumed to be coherent and
quasi-monochromatic. These restrictions on the beam may be substantially
relaxed in practice.3
The lens brings the incident beam to a focus in the vicinity of a diffraction
grating, which is placed perpendicular to the optical axis, i.e., the Z-axis. The
grating, also referred to as a Ronchi ruling, may be as simple as a low-frequency
wire grid or as sophisticated as a modern short-pitched, phase/amplitude grating.
The position of the grating should be adjustable in the vicinity of focus, so that it
may be shifted back and forth along the optical axis. The grating breaks up the
incident beam into multiple diffracted orders, which will subsequently propagate
along Z and reach the lens labeled “pupil relay” in Figure 44.1.
The pupil relay may simply be the lens of the eye, which projects the exit pupil
of the object under test onto the retina of the observer. Alternatively, it may be a
conventional lens that creates a real image of the exit pupil on a screen or on a
CCD camera.
The diffracted orders from the grating will be collected by the relay lens and,
within their overlap areas, will create interference fringes characteristic of the
aberrations of the optical system under consideration. By analyzing these fringes,
one can determine the type and, with some effort, the magnitude of the
aberrations present at the exit pupil of the system.
The above description of the Ronchi test relies on its modern interpretation;
this is based on our current understanding of physical optics and the theory of
diffraction gratings. Historically, however, the gratings used in the early days
were quite coarse, and the results obtained with them required no more than a
simple geometric-optical theory for their interpretation. Typically, one would
place the eye at the focus of the lens and hold a grating (e.g., a wire grid) in front
of the eye, moving the grating in and out until a clear pattern became visible. At
this point the beam would be illuminating several of the wires simultaneously. By
looking through the grating and observing the shadows that the wires cast on the
exit pupil, one could determine the type of aberration present in the system. The
coarseness of the grating, of course, caused several of the diffracted orders (as we
understand them today) to overlap each other, thus resulting in reduced contrast
and smearing of the patterns near the boundaries. These problems were
eventually overcome when finer gratings became available and the diffraction theory of
the Ronchi test was better understood.
[Figure 44.1 diagram. Labels, left to right: object under test, grating (Ronchi ruling), pupil relay, observation plane.]
Figure 44.1 A beam of coherent, quasi-monochromatic light is brought to
focus by an optical system that is undergoing tests to determine its aberrations. A
diffraction grating, placed perpendicular to the optical axis in the vicinity of
focus, breaks up the incident beam into several diffraction orders. The diffracted
orders propagate, independently of each other, and are collected by a pupil relay
lens, which forms an image of the exit pupil of the object under test at the
observation plane.
Choosing an appropriate grating

For best results the pitch of the grating should be chosen such that, as shown in
Figure 44.2, no more than two diffraction orders will overlap at any given point.
To determine the appropriate grating period P, one needs to know the wavelength
λ0 of the beam used for testing, and the numerical aperture NA of the focused
cone of light. (By definition, NA = sin θ, where θ is the half-angle subtended by
the exit pupil of the lens at its focal point. If the lens under test is being used at
full aperture, NA will also be equal to 0.5 divided by the lens’s f-number.) To
avoid multiple overlaps among diffracted orders, the angle between adjacent
orders must exceed the focused cone’s half-angle. Now, it is well known in the
theory of diffraction gratings that, at normal incidence, sin θn = nλ0/P where n, an
integer, is the order of diffraction and θn is the corresponding deviation angle
from the surface normal. Therefore, we arrive at the conclusion that P should be
less than or equal to λ0/NA. For example, assume that the lens under test has a
numerical aperture NA = 0.5. Then, if the grating period is chosen to be 2λ0,
each diffracted order will deviate from the zero-order beam by 30°, making the
+first-order beam just touch the −first-order beam in the far field.
[Figure 44.2 diagram: overlapping diffracted orders labeled −2, −1, 0, 1, 2.]
Figure 44.2 Several diffracted orders in the far field of the grating of
Figure 44.1. When the grating’s period is chosen properly, each diffracted order
(i.e., emergent cone of light) will overlap only with its nearest neighbors. Except
for a lateral shift in position, the various orders are identical, carrying the
amplitude and phase distribution of the beam as it appears at the exit pupil of the
object under test.
Figure 44.3 Distribution of intensity at the observation plane of Figure 44.1 in
the absence of aberrations. The pupil relay lens is chosen to have the same
numerical aperture as the object under test, thereby limiting the collected light to
the zeroth-order beam and to those portions of the first-order beams that
overlap the zeroth-order beam.
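The grating-period condition and the worked example with NA = 0.5 can be reproduced numerically. The sketch below uses the grating equation sin θn = nλ0/P at normal incidence and measures lengths in units of λ0; the function names are our own.

```python
import math

def max_grating_period(wavelength, na):
    """Largest period for which the +1 and -1 orders do not overlap each other."""
    return wavelength / na

def diffraction_angle_deg(order, wavelength, period):
    """Deviation angle of a diffracted order at normal incidence, in degrees."""
    s = order * wavelength / period
    if abs(s) > 1.0:
        raise ValueError("this order is evanescent")
    return math.degrees(math.asin(s))

def na_from_f_number(f_number):
    """NA of a lens used at full aperture, taken as 0.5 / f-number."""
    return 0.5 / f_number

period = max_grating_period(1.0, 0.5)            # in units of lambda0
theta1 = diffraction_angle_deg(1, 1.0, period)   # first-order deviation angle
```

With NA = 0.5 the limiting period is 2λ0 and the first-order beams deviate by 30°, which equals the half-angle of the focused cone, so the +first and −first orders just touch. Any shorter period opens a gap between them.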
Figure 44.3 shows the computed intensity distribution at the observation plane
of an aberration-free system in which the relay lens has the same numerical
aperture as the lens under test (NA = 0.5). This equality of the numerical
apertures means that only the zeroth-order diffracted beam will be fully transmitted to
the observation plane. Of the first-order beams, only those portions that overlap
the zero order will reach the observation plane. The period of the grating in this
example has been a little less than λ0/NA, leaving a small gap between +first
order and −first order. The absence of aberrations means that the phase
distribution over the cross-sections of the various diffracted orders is uniform and,
therefore, no interference fringes are to be expected.
Ronchigrams for primary or Seidel aberrations

Figure 44.4 shows the computed patterns of intensity distribution at the
observation plane of Figure 44.1 corresponding to different types of primary
(Seidel) aberrations of the lens. For these calculations we fixed the distance between
the lens under test and the relay lens and then placed the grating at the paraxial focus
of the converging wavefront. The pattern in Figure 44.4(a) was obtained when we
assumed the presence of three waves of curvature (or defocus) at the exit pupil of the
lens. Different amounts of defocus would create essentially the same pattern, albeit
with a different number of fringes. In Figure 44.4(b) we observe the fringes arising
from the presence of three waves of third-order spherical aberration in the test
system. The shapes of these fringes depend not only on the magnitude of the
aberration but also on the position of the grating relative to the focal plane. (We will
have more to say about this point later.) Figure 44.4(c) shows the fringes that would
arise when three waves of primary astigmatism are present. When the orientation of
the astigmatism changes, the fringes will remain straight lines but their orientation
within the observation plane will change accordingly.
The last three frames in Figure 44.4 represent the effects of third-order coma. A
change in orientation of this aberration causes the interference pattern to change
drastically. Figures 44.4(d)–(f) correspond to three waves of coma oriented at 0°,
45°, and 90°, respectively.
[Figure 44.4 panels (a)–(f).]
Figure 44.4 Computed plots of intensity distribution at the observation plane
of Figure 44.1. The lens under test is assumed to have three waves of primary
(Seidel) aberrations; the grating is at the nominal focal plane of the lens. (a)
Defocus, (b) spherical aberration, (c) astigmatism oriented at 45°, (d) coma at 0°,
(e) coma at 45°, (f) coma at 90°.
Sliding the grating along the optical axis

A change in the position of the grating relative to the focal plane influences
the observed fringe pattern. We limit our discussion to the case of spherical
aberration, although similar analyses could be performed for other aberrations.
Assuming three waves of spherical aberration as before, we obtain the
patterns displayed in Figure 44.5 as we slide the grating along the optical axis in
the system of Figure 44.1. Once again, we have taken the lens under test to
have NA = 0.5 and f = 6000λ0. The paraxial focus of the lens under test
coincides with the front focal point of the relay lens, and the grating is shifted
by different amounts Δz relative to this common focus. Frames (a)–(f) in
Figure 44.5 correspond to different values of Δz, starting at Δz = −10λ0 in (a)
and moving forward to Δz = +25λ0 in (f). In the process, as the grating moves
through paraxial focus and towards marginal focus, we observe a rich variety
of patterns that aid us in determining the nature and the magnitude of the
aberration.
To be sure, the Ronchi test is not the only scheme used during the fabrication
and evaluation of optical systems; several other tests exist and their relative
merits have been expounded in the literature.3 It is useful here to examine some
of these alternative methods and to compare the resulting patterns (interferograms
or otherwise) with those obtained with the Ronchi test.
[Figure 44.5 panels (a)–(f).]
Figure 44.5 Computed plots of intensity distribution at the observation plane of
Figure 44.1, showing the patterns obtained by sliding the grating along the optical
axis. The lens under test (NA = 0.5, f = 6000λ0) is assumed to have three waves of
primary spherical aberration, and its paraxial focus is coincident with the focal
point of the relay lens. The grating is moved along the optical axis by an amount
Δz relative to the (common) focal plane; positive distances are towards
the marginal focus. (a) Δz = −10λ0, (b) Δz = 0, (c) Δz = 10λ0, (d) Δz = 15λ0,
(e) Δz = 20λ0, (f) Δz = 25λ0.
Testing by interfering with a reference plane wave

Figure 44.6 shows the schematic diagram of a Mach–Zehnder interferometer,
which is one among many that can be used to evaluate the aberrated wavefronts
directly. In this system a coherent monochromatic beam of light is sent through
the lens under test, is collected and recollimated by a well-corrected lens, and is
made to interfere with a reference beam that has been split off the incident
wavefront. The flat mirror shown in the lower left side of the interferometer is
mounted on a tip–tilt stage that allows the introduction of a small amount of tilt in
the reference beam. Figure 44.7 shows the computed patterns of intensity
distribution at the observation plane of the Mach–Zehnder interferometer
corresponding to three waves of primary coma. In obtaining the various frames of
Figure 44.7 we have fixed all the system parameters and only varied the tilt of the
reference beam. Note that the characteristic fringes of coma in Figure 44.7 are quite
different from those of coma in the Ronchi test, shown in Figures 44.4(d)–(f).
Incidentally, the patterns of Figure 44.7 show similarities with the Ronchigrams of
spherical aberration displayed in Figure 44.5. This is not a coincidence; it is rooted
in the algebraic forms of the aberration function for third-order coma (ρ³ cos φ) and
spherical aberration (ρ⁴) and also in the fact that a Ronchigram, being a kind of
shearing interferogram (albeit with a large shear), is related to the derivative of the
wavefront aberration function.
[Figure 44.6 diagram. Labels: object under test, two beam-splitters, two mirrors, pupil relay, observation plane.]
Figure 44.6 Schematic diagram of a Mach–Zehnder interferometer that might
be set up for a direct measurement of wavefront aberrations. The pupil relay lens
(itself free from aberrations) forms at the observation plane an image of the exit
pupil of the lens under test. A fraction of the incident beam is diverted from its
original path and sent to the observation plane by means of the various mirrors
and beam-splitters. The observed fringes are characteristic of the aberrations
present at the exit pupil of the lens under test. A small tilt of the mirror shown at
the lower left side of the figure would introduce a linear phase shift on the
reference beam. This tilt is generally useful in producing signature fringe
patterns at the observation plane.
[Figure 44.7 panels (a)–(f).]
Figure 44.7 Computed plots of intensity distribution at the observation plane of
Figure 44.6. The lens under test (NA = 0.5, f = 6000λ0) is assumed to have three
waves of primary coma, and its nominal focus is coincident with the focal point of
the relay lens. The tilt angle ψ of the reference beam increases progressively from
(a) to (f): (a) ψ = −0.1°, (b) ψ = 0°, (c) ψ = 0.05°, (d) ψ = 0.07°, (e) ψ = 0.1°,
(f) ψ = 0.18°.
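The algebraic connection mentioned above can be made explicit in the limit of small shear. In the sketch below W denotes the wavefront aberration function, (x, y) = (ρ cos φ, ρ sin φ) are normalized pupil coordinates, and s is a shear along X; these symbols and the small-shear approximation are our own assumptions, and since the Ronchi test operates with a large shear the result should be read only as a qualitative guide.

```latex
% Small-shear limit of a lateral-shearing interferogram along x.
\begin{align*}
W(x + s/2,\, y) - W(x - s/2,\, y) &\approx s\, \frac{\partial W}{\partial x}, \\
\frac{\partial}{\partial x}\, \rho^4
  = \frac{\partial}{\partial x}\, (x^2 + y^2)^2
  &= 4x\,(x^2 + y^2) = 4\rho^3 \cos\phi, \\
\frac{\partial}{\partial x}\, \rho^2
  = \frac{\partial}{\partial x}\, (x^2 + y^2)
  &= 2x = 2\rho \cos\phi .
\end{align*}
```

The second line shows that shearing a wavefront with spherical aberration produces a phase difference of the same form as third-order coma. The third line shows that a defocus term, such as that introduced by sliding the grating along the axis, differentiates into a term linear in x, that is, a tilt. This suggests why sliding the grating in Figure 44.5 plays a role similar to tilting the reference beam in Figure 44.7.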
Knife-edge and wire tests

A schematic diagram of the knife-edge method of testing optical systems is
shown in Figure 44.8. A geometric-optical interpretation of this test suffices for
most practical purposes: the knife-edge blocks different groups of rays in its
various positions along the optical axis, allowing the remaining rays to reach the
observation plane.3 Another method of testing, known as the wire test, is quite
similar to the knife-edge method, being obtained from it by substituting for the
knife-edge a length of fine wire.3
Since the grating in the Ronchi test may be thought of as a series of parallel
knife-edges or, more aptly, a series of parallel wires, it should not come as a
surprise that similarities exist between Ronchigrams and the patterns observed in
these other tests. In fact, early attempts at explaining the results of Ronchi’s method
were based on geometrical optics, and considered the grating as a set of parallel
wires whose shadows produced the observed patterns.4 We will not delve into these
matters, but simply draw the reader’s attention to Figures 44.9 and 44.10, where we
show several computed patterns of intensity distribution for the knife-edge and
wire tests, respectively.
[Figure 44.8 diagram. Labels, left to right: object under test, knife-edge, pupil relay, observation plane.]
Figure 44.8 In the knife-edge test a certain region in the vicinity of focus is
blocked by a knife-edge; the nature and the magnitude of the aberrations are then
inferred from the resulting patterns of intensity distribution at the observation
plane. (The knife-edge may be moved both along and perpendicular to the
optical axis.) The wire test is similar to the knife-edge test except that a fine wire
is used instead, to block certain groups of rays.
[Figure 44.9 panels (a)–(d).]
Figure 44.9 Computed plots of intensity distribution at the observation plane
of Figure 44.8 corresponding to the knife-edge test carried out with a laser beam.
The lens under test (NA = 0.5, f = 6000λ0) and the pupil relay lens (NA = 0.5) are
assumed to be fixed in their respective positions, while the knife-edge moves
along the optical axis. (The tip of the knife remains on the axis at all times.) The
lens under test is assumed to have three waves of primary spherical aberration. In
frames (a) to (d) the distance of the knife-edge from paraxial focus is Δz = −15λ0,
0, +15λ0, and +20λ0, respectively. (Positive distances are in the direction of the
marginal focus.)
Figure 44.10 Computed plots of intensity distribution at the observation plane of
Figure 44.8 corresponding to the wire test with an extended, quasi-monochromatic
light source. The lens under test (NA = 0.5, f = 6000λ0) has three waves of primary
spherical aberration. The assumed wire diameter is 15λ0, which is comparable to the
size of the image of the extended light source, as measured in the vicinity of focus.
In (a) the wire is centered on axis and is 25λ0 away from paraxial focus (in the
direction of the marginal focus). In (b) the wire is again centered on the axis, but is
20λ0 away from paraxial focus. In (c) the wire has been shifted 0.5λ0 off-axis while
its distance from paraxial focus remains at 20λ0.
The results of the simulated knife-edge test depicted in Figure 44.9 assume a
laser as the light source. Consequently, frames (a) and (b) of Figure 44.9 exhibit
several dark lines which, with a less coherent light source, would have been
absent. The results of the simulated wire test shown in Figure 44.10 assume an
extended light source, since the small amount of spherical aberration present in
the system under consideration would render the test useless with a wire which,
fine as it may be, would still be wider than the focused spot produced by a laser
beam. Note the similarities between the patterns of Figures 44.9 and 44.10 on the
one hand, and those of Figures 44.5(d)–(f) on the other.
Extensions of the Ronchi test

Several modifications and extensions of the Ronchi test have appeared over the
years, and have helped to solve specific problems in testing of optical systems.3
As an example we mention the double-frequency grating lateral-shear
interferometer invented by James Wyant in the early 1970s.5 The grating in this device
has two slightly different frequencies, which give rise to two +first-order beams
as well as two −first-order beams; the beams in each pair are slightly shifted
relative to each other. Moreover, the (average) pitch of the grating is such that
there is no overlap between the zeroth, +first, and −first orders. Consequently,
interference occurs between the two +first-order beams (and, likewise, between
the two −first-order beams). One can thus obtain an arbitrarily small lateral shear
of the wavefront under test and use the results to achieve accurate quantitative
measurements.
A two-dimensional version of the double-frequency grating has also been
employed to generate lateral wavefront shear simultaneously along the X- and
Y-axes. (Remember that beam propagation is along Z and, therefore, X and Y are
orthogonal axes in the plane of the grating.) In the absence of a two-dimensional
grating, one must rotate a one-dimensional grating by 90° to obtain wavefront
shear first along the X- and then along the Y-axis.

1 V. Ronchi, Le Frange di Combinazioni Nello Studio delle Superficie e dei Sistemi
Ottici, Riv. Ottica Mecc. Precis. 2, 9 (1923).
2 V. Ronchi, Due Nuovi Metodi per lo Studio delle Superficie e dei Sistemi Ottici, Ann.
Sc. Norm. Super. Pisa 15 (1923).
3 D. Malacara, ed., Optical Shop Testing, second edition, Wiley, New York, 1992.
4 G. Toraldo di Francia, Geometrical and interferential aspects of the Ronchi test, in
Optical Image Evaluation, National Bureau of Standards Circular 526, issued April
29, 1954.
5 J. C. Wyant, Double frequency grating lateral shear interferometer, Appl. Opt. 12,
2057 (1973).
45
The Shack–Hartmann wavefront sensor
Roland Shack invented the device now known as the Shack–Hartmann wavefront
sensor in the early 1970s.1,2 This sensor, which in recent years has been com-
mercialized, measures the phase distribution over the cross-section of a given
beam of light without relying on interference and, therefore, does not require a
reference beam.
The standard method of wavefront analysis is interferometry, where one brings
together on an observation plane the beam under investigation (hereinafter the
test beam) and a reference beam in order to form tell-tale fringes.3 The trouble
with interferometry is that it requires a reference beam, which is not always
readily available. Moreover, the coherence length of the light used in these
measurements must be long compared with the path-length difference between
the reference and test beams. Thus, when the available light source happens to be
broad-band, it becomes difficult (though by no means impossible) to produce
high-contrast fringes. The Shack–Hartmann instrument solves these problems by
eliminating altogether the need for the reference beam.
Wavefront analysis by interferometry

Before embarking on a discussion of the Shack–Hartmann wavefront sensor, it
will be instructive to describe the operation of a conventional interferometer.
Consider, for instance, the system of Figure 45.1, where a spherical mirror is
under investigation. While grinding and polishing the glass blank, the optician
frequently performs this type of test to determine departures of the surface from
the desired figure. A point source reflected from a 50/50 beam-splitter is used to
illuminate the test mirror. Before arriving at the mirror, however, the beam is
partially reflected from the spherical surface of a plano-convex lens attached to
the front facet of the beam-splitter cube (i.e., the spherical cap). The center of
curvature of this spherical cap is at C, which is also the virtual image of the point
[Figure 45.1 diagram labels: monochromatic light source, condenser, pinhole, spherical cap, beam-splitter cube, test mirror, observation plane, points C and C′.]
Figure 45.1 The Shack cube is used here to measure the surface quality of a
spherical mirror. The cube is a 50/50 beam-splitter capped by an index-matched
plano-convex lens. The light from the point source is partially reflected from this
spherical cap, producing a reference beam that comes to focus at C. The beam
that passes through the cap illuminates the test mirror, then returns and crosses
the cube and is focused at C′. The interference pattern between the test and
reference beams is viewed at the observation plane. The cube’s axis is slightly
displaced from the axis of the mirror in order to separate C from C′, which is
needed for producing straight-line fringes.
source in the beam-splitter’s half-silvered mirror. The light reflected from the
spherical cap (and focused at C) forms the reference beam. (Incidentally, this
interferometer was also invented by Roland Shack in the 1970s, and is now
known as the Shack cube.4)
Note in Figure 45.1 that the pinhole is placed directly on the face of the beam-
splitter to eliminate possible aberrations of the beam upon entering and exiting
the cube. The reflectivity of the spherical cap is about 4%, which is similar to that
of the uncoated test mirror. The equal-strength test and reference beams thus
produce a high-contrast fringe pattern. Figure 45.2(a) shows a typical phase
distribution over the cross-section of a test beam reflected from a mirror having
several waves of aberration. The computed interference pattern between this and
an equal-strength reference beam is shown in Figure 45.2(b). Needless to say, the
fringe contrast is excellent and the observed fringes may be related directly to the
wavefront aberrations. In general, the coherence length of the light source must
be long enough to ensure that, at the observation plane, the test and reference
Figure 45.2 (a) A typical phase distribution in the cross-section of a
monochromatic test beam (wavelength λ); the horizontal axis is x/λ, from −3100 to
+3100. The gray-scale covers the range from −180° (black) to +180° (white).
(b) The interference pattern obtained by adding the test beam in (a) to a
reference plane wave of equal magnitude.
beams remain mutually coherent. For testing small mirrors having a short focal
length (say, less than 10 cm) a single radiation line of an arc lamp may suffice,
but for larger mirrors a long-coherence-length laser is usually necessary.
In practice, the center of curvature of the test mirror is slightly displaced from C,
as shown in Figure 45.1, so that the rays bouncing off the mirror and arriving at the
exit facet of the cube would converge not to C, but to a nearby point C′. This small
lateral displacement of the test beam relative to the reference beam produces
straight-line fringes in the interferogram at the observation plane. Such fringes are
very sensitive to small aberrations of the mirror, and their deviation from linearity
can be related easily to minute surface errors. If the errors are large, however, there is
no need for straight-line fringes, and the center of the test mirror can coincide with C.
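The effect of the small lateral displacement can be illustrated numerically. The following is a minimal sketch, not the simulation used for the figures in this chapter: it adds a unit-amplitude test beam to a unit-amplitude reference beam and assumes that the displacement of C′ from C acts as a linear phase ramp (tilt) across the aperture. The function name, the grid size, and the illustrative aberration (three waves of quadratic phase error) are assumptions introduced here.

```python
import numpy as np

def interferogram(test_phase, tilt_waves=0.0):
    """Intensity of a unit-amplitude test beam plus a unit-amplitude reference.

    test_phase : 2-D array of test-beam phase in radians.
    tilt_waves : assumed linear tilt of the reference, in waves across the
                 aperture (models the lateral offset between C and C').
    """
    ny, nx = test_phase.shape
    x = np.linspace(-0.5, 0.5, nx)                 # normalized pupil coordinate
    ref_phase = 2 * np.pi * tilt_waves * x[np.newaxis, :]
    field = np.exp(1j * test_phase) + np.exp(1j * ref_phase)
    return np.abs(field) ** 2                      # ranges from 0 to 4

# Hypothetical aberration: three waves of quadratic phase error per unit radius squared.
n = 256
y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
aberration = 2 * np.pi * 3.0 * (x**2 + y**2)

closed_fringes = interferogram(aberration)                   # C' coincides with C
carrier_fringes = interferogram(aberration, tilt_waves=20)   # C' displaced from C
flat = interferogram(np.zeros((n, n)))                       # perfect mirror, no tilt
```

With no tilt the aberration appears as closed fringes; with a large tilt the pattern becomes a set of carrier fringes that would be straight for a perfect mirror and whose bending tracks the aberration.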
The combination of the cube beam-splitter and the spherical cap may be
considered a thick lens. This lens projects a real image of the test mirror in the
space behind the cube. The best place to observe the fringes, therefore, is at the
location of this image, where the fringes are localized on the mirror, and the
observer can readily identify areas that need further grinding and polishing.
Another advantage is that scratches and dust particles on the mirror come to focus
at its image, thus eliminating spurious fringes of the scattered light that downgrade
the quality of the interferograms obtained at other locations in the image space.
(The spherical cap, of course, must be kept clean at all times to prevent dust
particles that have collected there from producing their own spurious fringes.)
The test mirror depicted in Figure 45.1 does not have to be spherical, but may
be a mild paraboloid or hyperboloid whose center of curvature is, as before,
placed at or near the point C. The departure of the mirror’s figure from sphericity
imparts a certain amount of spherical aberration to the test beam, which may be
calculated in advance. The optician then looks for aberrations above and beyond
this expected amount of spherical aberration in order to determine the necessary
corrections.
Large telescope mirrors may also be tested with a Shack cube, but they require
the use of an additional lens system known as a null-corrector.3 A telescope’s
primary mirror is generally a large paraboloid or hyperboloid designed for
operation at “infinite conjugate”, that is, it brings the collimated beam of a distant
star to focus within the mirror’s focal plane. Testing such a large mirror with a
collimated beam is impractical, however, and its actual departure from sphericity
is too severe to be simply subtracted from the interferograms obtained in the
system of Figure 45.1. In such situations, a null-corrector is designed to cancel
the spherical aberrations of a test beam originating from a point source located at
the mirror’s center of curvature. When a properly calibrated null-corrector is
inserted between the Shack cube and the test mirror, the observed interferogram
registers only the departure of the mirror from its desired figure.
The Shack–Hartmann wavefront sensor

Figure 45.3 shows a schematic of the Shack–Hartmann wavefront sensor. This
device is in many ways superior to conventional interferometers: since it does not
require a reference beam, it is simpler to align and to operate and, because it does
not rely on interference, it may be used with a white light source. At the heart of
the instrument is a lenslet array and a charge-coupled device (CCD) camera. The
lenslets are identical and have a fairly large f-number, and the CCD detector,
whose number of pixels is much greater than the number of lenslets, is placed at
the focal plane of the lenslet array.
Upon entering the system, the (aberrated) test beam is collimated and expanded
or reduced, as necessary, to match the dimensions of the lenslet array. The array
[Figure 45.3 diagram labels: test beam, beam expander, lenslet array, CCD array.]
Figure 45.3 The basic elements of a Shack–Hartmann wavefront sensor. Upon

entering the system, the (aberrated) test beam is processed to yield a collimated
beam with a diameter that matches the dimensions of the lenslet array. Each
lenslet captures a fraction of the beam and brings it to focus within its focal
plane, where a CCD camera monitors the intensity distribution. The CCD has
many more pixels than there are lenslets, so the location of each focused spot is
determined simply by identifying the illuminated pixel within the relevant sub-
array of pixels. The complete wavefront may be reconstructed from knowledge
of the positions of individual focused spots.
typically consists of n × n identical lenslets, where n is around 100. Each lenslet
thus acts on a small patch of the wavefront and brings it to focus within its focal
plane. Located at this focal plane is the CCD array with m × m pixels, where m is
typically around 1000. Consequently, each focused spot is assigned an exclusive
sub-array of the CCD containing (m/n) × (m/n) pixels. If the patch of the wavefront
captured by a given lenslet happens to have uniform phase, it forms a bright spot at
the focal point of the lenslet, that is, at the center of the corresponding CCD
sub-array. However, if the phase distribution at the lenslet’s pupil happens to be
non-uniform, the focused spot appears at a different location in the focal plane (but still
within the associated CCD sub-array).
Figure 45.4 shows the computed distribution of intensity at the focal plane of a
6 × 6 lenslet array illuminated by the test beam of Figure 45.2(a). It may be
verified that the center of each focused spot is shifted away from its respective
frame’s center by an amount proportional to the slope of the corresponding
segment of the incident wavefront.
In practice, individual lenslets are very small compared to the test beam’s
diameter. Consequently, the incident phase distribution at the pupil of any given
lenslet may be approximated by a linear function of the pupil coordinates. It is
thus clear that the aberrations of each focused spot are negligible; all one needs to
know is the shift of the spot from the focal point of its associated lenslet, which is
readily obtained by examining the CCD’s output. So long as the focused spots
remain within their allotted sub-array of detectors, the system can compute
the local slope of the wavefront at the entrance pupil of each and every lenslet.
Figure 45.4 Intensity distribution at the focal plane of a 6 × 6 lenslet array, when
the incident beam is assumed to have the phase distribution of Figure 45.2(a).
Each square lenslet is 1000λ × 1000λ in size, and has focal length f = 25 000λ. The
horizontal axis is x/λ, from −3000 to +3000. The
logarithmic plot of intensity shown here reveals the fine detail of the distribution at
the CCD array. In practice, the fine detail is rather faint and only the center of each
spot is detected by the CCD.
The local slopes are then patched together to reconstruct the complete phase
distribution over the cross-section of the test beam.
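The conversion from spot positions to local slopes can be sketched as follows. This is an illustrative sketch under stated assumptions, not the instrument's actual processing: the spot position is estimated by an intensity centroid (the text simply identifies the illuminated pixel), the small-angle relation slope ≈ shift / f is used, and the function name `spot_slopes`, the 10 µm pixel pitch, and the toy frame are hypothetical. The 150 mm focal length is that of the first lenslet array described in the historical notes.

```python
import numpy as np

def spot_slopes(ccd, n_lenslets, pixel_pitch, focal_length):
    """Local wavefront slopes (radians) from a square Shack-Hartmann frame."""
    sub = ccd.shape[0] // n_lenslets               # pixels per sub-array side
    offsets = np.arange(sub) - (sub - 1) / 2.0     # pixel offsets from sub-array center
    slope_x = np.zeros((n_lenslets, n_lenslets))
    slope_y = np.zeros((n_lenslets, n_lenslets))
    for row in range(n_lenslets):
        for col in range(n_lenslets):
            tile = ccd[row * sub:(row + 1) * sub, col * sub:(col + 1) * sub]
            power = tile.sum()
            if power <= 0:
                continue                           # no light behind this lenslet
            # Centroid of the spot (an assumed estimator, see lead-in).
            dx = (tile.sum(axis=0) * offsets).sum() / power
            dy = (tile.sum(axis=1) * offsets).sum() / power
            # Small-angle relation: spot shift = focal length x local slope.
            slope_x[row, col] = dx * pixel_pitch / focal_length
            slope_y[row, col] = dy * pixel_pitch / focal_length
    return slope_x, slope_y

# Toy frame: 2 x 2 lenslets, 5 x 5 pixels each, every spot shifted one pixel in x.
frame = np.zeros((10, 10))
for r0 in (0, 5):
    for c0 in (0, 5):
        frame[r0 + 2, c0 + 3] = 1.0
sx, sy = spot_slopes(frame, n_lenslets=2, pixel_pitch=10e-6, focal_length=0.15)
```

The largest slope that can be measured this way is set by the requirement that each spot stay inside its own sub-array, that is, roughly half the lenslet width divided by the focal length.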
Historical notes
The predecessor to the Shack–Hartmann sensor was Hartmann’s screen test,
which used an array of holes in place of the lenslets.3,5,6,7 Shack realized the
advantages of using a lenslet array and set out to fabricate one, since no such
array with the characteristics he desired was available at the time. He made a
mold by using a cutting tool to carve parallel grooves in a piece of flat glass, as
shown in Figure 45.5. Two such pieces of grooved glass, oriented at right angles
to each other, were clamped to an acrylic sheet and heated in an oven to mold
Figure 45.5 A piece of flat glass on which identical grooves were carved
served as a mold for the early lenslet arrays. Two such pieces were prepared
and placed face-to-face across a plastic sheet at right angles to each other.
The assembly was then heated in an oven to transfer the pattern of the mold
to the plastic sheet. The 1 mm-wide grooves had a depth of only a few
micrometers.
convex ribs on each side of the acrylic sheet, thus forming an array of crossed
cylindrical lenses. The first such array had 50 × 50 lenslets, each with an area of
1 × 1 mm² and a focal length of 150 mm.
Before the advent of CCD detectors in the 1980s, wavefront analysis was done
by examining a photographic plate exposed to the array of focused spots. The
plate was also exposed (simultaneously and through the same array of lenslets) to
a parallel, aberration-free reference beam. The spots formed by this reference
beam marked the center of each frame, thus providing reference points for
measuring the displacement of the spots formed by the test beam. The tedious
task of exposing and developing the photographic plate, followed by painstak-
ingly measuring the positions of individual spots, was rewarding nonetheless; it
allowed astronomers to measure the aberrations of their telescopes in the field
using unfiltered star light. Even atmospheric turbulence did not pose a serious
problem for this method, since its effects were simply averaged over during the
relatively long exposure time of the photographic plate.

1 R. V. Shack and B. C. Platt, Production and use of a lenticular Hartmann screen
(abstract only), J. Opt. Soc. Am. 61, 656 (1971).
2 R. Riekher, Fernrohre und ihre Meister, Verlag Technik, Berlin, 1990.
3 D. Malacara, Optical Shop Testing, second edition, Wiley, New York, 1992.
4 R. V. Shack and G. W. Hopkins, The Shack interferometer, SPIE 126, Clever Optics,
139–142 (1977).
5 J. Hartmann, Bemerkungen über den Bau und die Justirung von Spektrographen,
Z. Instrum. 20, 47 (1900).
6 J. Hartmann, Objektivuntersuchungen, Z. Instrum. 24, 33 (1904).
7 R. Kingslake, The absolute Hartmann test, Trans. Opt. Soc. 29, 133 (1927–1928).
46
Ellipsometry
The goal of ellipsometry is to determine the optical and structural constants

of thin films and flat surfaces from the measurements of the ellipse of
polarization in reflected or transmitted light.1,2,3,4,5 In the absence of birefringence
and optical activity a flat surface, a single-layer film, or a thin-film stack may
be characterized by the complex reflection coefficients rp = |rp| exp(iφrp) and
rs = |rs| exp(iφrs) for p- and s-polarized incident beams, as well as by the
corresponding transmission coefficients tp = |tp| exp(iφtp) and ts = |ts| exp(iφts).6
Strictly speaking, an ellipsometer is a device that measures the complex ratios
rp/rs and/or tp/ts. The amplitude ratios are usually deduced from the angles ψr
and ψt, which are defined by tan ψr = |rp|/|rs| and tan ψt = |tp|/|ts|. In practice,
measuring the individual reflectivities Rp = |rp|², Rs = |rs|² or transmissivities
Tp = |tp|², Ts = |ts|² does not require much additional effort. Measuring the
individual phases, of course, is difficult, but the relative phase angles φrp − φrs and
φtp − φts can be readily obtained by ellipsometric methods. The values of Rp, Rs,
φrp − φrs, ψr, Tp, Ts, φtp − φts, and ψt may be measured as functions of the angle of
incidence, θ, or as functions of the wavelength of the light, λ, or both.
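The definitions above can be collected into a single complex ratio for each beam. The following restatement uses only the quantities already defined in this paragraph (amplitude-ratio angles ψ and phases φ); it introduces no new assumptions.

```latex
\frac{r_p}{r_s} = \frac{|r_p|}{|r_s|}\, e^{i(\phi_{rp}-\phi_{rs})}
                = \tan\psi_r \, e^{i(\phi_{rp}-\phi_{rs})},
\qquad
\frac{t_p}{t_s} = \frac{|t_p|}{|t_s|}\, e^{i(\phi_{tp}-\phi_{ts})}
                = \tan\psi_t \, e^{i(\phi_{tp}-\phi_{ts})}.
```

Since Rp and Rs are the squared magnitudes of rp and rs, it also follows that tan ψr = (Rp/Rs)^{1/2}, which is why ψr carries no information beyond the two reflectivities when these are measured separately.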
The results of ellipsometric measurements are fed to a computer program that
searches the space of unknown parameters to find agreement between the
measured data points and theoretical calculations.5 The unknown parameters of
the sample usually include thickness, refractive index, and absorption coefficient
of one or more layers. In general, the larger is the collected data set, the more
accurate will be the estimates of the unknown parameters or the greater will be
the number of unknowns that can be estimated. The relationship between the
measurables and the unknowns is usually nonlinear, and there is no a priori
guarantee that the various measurements on a given sample are independent of
each other, nor that a given set of measurements is sufficient for determining the
unknowns. Powerful numerical algorithms exist that search the space of unknown
parameters and find estimates that closely reproduce the measured data.
The nulling ellipsometer

Figure 46.1 is the diagram of a conventional nulling ellipsometer.4,5 The
quasi-monochromatic light of wavelength λ enters a rotatable polarizer, whose
transmission axis may be oriented at an arbitrary angle ρp relative to the X-axis. The
polarizer’s output is thus a collimated, linearly polarized beam of light with an
adjustable E-field orientation. This beam goes through a quarter-wave plate
(QWP) whose fast and slow axes are fixed at 45° to the X-axis. (The QWP
imparts a relative 90° phase shift to the E-field components along its axes.) The
beam emerging from the QWP has equal amplitudes along X and Y, that is,
|Ex| = |Ey|. The phase difference between these E-field components is adjustable
in accordance with the following relation: φx − φy = 2(ρp − 45°).
Reflection from the sample imparts a phase difference φrp − φrs to the p- and
s-components of the beam, which may be cancelled out by properly selecting the
[Figure 46.1 diagram labels: light source, polarizer (ρp), quarter-wave plate (45°), lens, sample (incidence angle θ), lens, analyzer (ρa), detector; axes X, Y, Z.]
Figure 46.1 In a nulling ellipsometer the collimated beam of light emerging
from the source (wavelength λ) is linearly polarized along the direction ρp by a
rotatable polarizer. The quarter-wave plate’s axes are typically at 45° to the
XZ-plane of incidence. Thus the beam incident on the sample has equal amounts
of p- and s-polarization, the relative phase between these two components
depending on ρp. Reflection from the sample induces a phase shift φrp − φrs
between the p- and s-components, which may be cancelled out by adjusting the
polarizer’s orientation. Subsequently, the analyzer in the detection arm is rotated
to extinguish the light transmitted to the detector. In the null condition, the value
of ρp yields the sample’s phase shift φrp − φrs while the analyzer angle ρa yields
the ellipsometric parameter ψr, which is related to the amplitude ratio |rp|/|rs| of
the reflection coefficients.
polarizer angle ρp. At this point the reflected beam is linearly polarized, its E-field
components along X and Y being proportional to |rp| and |rs|, respectively. In the
reflected path the analyzer, whose transmission axis is also adjustable, is rotated
through an angle ρa = tan⁻¹(|rp|/|rs|) = ψr to block the light that would otherwise
reach the detector. Thus by measuring the values of ρa and ρp that null the detector’s
signal, one obtains the amplitude ratio |rp|/|rs| and the relative phase φrp − φrs of the
sample’s reflection coefficients.
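The arithmetic that turns a pair of null settings into ellipsometric parameters is short enough to write out. The sketch below is a hedged illustration: it applies the two relations stated in this section (relative phase equal to twice the polarizer angle measured from 45°, and analyzer angle equal to the amplitude-ratio angle) and ignores the sign conventions, which depend on how the axes and angles are defined. The function name and argument names are hypothetical; the sample angles are the ideal-system null quoted later in this chapter.

```python
import math

def from_null_angles(polarizer_deg, analyzer_deg):
    """Convert null settings (degrees) to ellipsometric parameters.

    Relations taken from the text: relative phase = 2 * (polarizer - 45 deg),
    amplitude-ratio angle = analyzer angle, so |rp|/|rs| = tan(analyzer angle).
    """
    relative_phase_deg = 2.0 * (polarizer_deg - 45.0)
    psi_r_deg = analyzer_deg
    amplitude_ratio = math.tan(math.radians(analyzer_deg))
    return relative_phase_deg, psi_r_deg, amplitude_ratio

# Null of the ideal system discussed later in the chapter: 47 deg and 32.2 deg.
phase, psi, ratio = from_null_angles(47.0, 32.2)
reflectivity_ratio = ratio ** 2        # Rp/Rs implied by the analyzer angle
```

A one-degree error in locating the polarizer null therefore translates into a two-degree error in the relative phase, which is one reason the null must be located carefully.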
Measuring the sample reflectivities Rp, Rs using a nulling ellipsometer is
straightforward; all one needs to do is monitor the detector signal S at ρa = 0° and 90°.
Calibration requires removing the sample and aligning the arms of the ellipsometer
with each other (i.e., θ = 90°), in which case the light from the source goes through
the entire system and yields a detector signal corresponding to a 100% sample
reflectivity. Optical power fluctuations could be countered by splitting off a small
fraction of the beam at the source and monitoring its variations with an auxiliary
detector. The signal from the auxiliary detector is subsequently used to normalize the
reflectivity signals.
Needless to say, the same types of measurement as discussed above, when
performed on the transmitted beam, yield the values of Tp, Ts, φtp − φts, and ψt.
Thin film on transparent substrate

Figure 46.2 shows a sample consisting of a thin absorbing layer on a glass substrate.
To allow the transmitted beam to exit the substrate without a change in its state of
[Figure 46.2 diagram labels: incident beam (Ep, Es), incidence angle θ, reflected beam, film thickness d, substrate, transmitted beam.]
Figure 46.2 A 25 nm-thick film of complex refractive index n + ik = 4.5 + 1.75i
is deposited on a hemispherical glass substrate (n0 = 1.5). The probe beam has
λ = 633 nm and is incident at θ = 60°. To avoid complications arising from
reflections or losses at the substrate bottom, the hemispherical surface is
antireflection coated.
polarization, and also to eliminate spurious reflections, an antireflection-coated
hemispherical substrate is assumed. The film, which is 25 nm thick, has a complex
index of refraction n + ik = 4.5 + 1.75i, and the substrate’s refractive index is
n0 = 1.5. Computed values of the sample’s reflection and transmission
characteristics at λ = 633 nm and θ = 60° are: Rp = 29.63%, Rs = 74.83%, φrp − φrs = 3.95°,
ψr = 32.18°, Tp = 24.13%, Ts = 6.96%, φtp − φts = 1.50°, ψt = 61.76°.
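Values of this kind can be computed with the standard single-layer (Airy) formulas built from the Fresnel coefficients of the two interfaces. The sketch below is one such calculation and is not claimed to be the program used for the book's figures. Its assumptions: the incidence medium is air (index 1), the substrate is treated as semi-infinite and lossless (consistent with the antireflection-coated hemisphere), time dependence is exp(−iωt) so that absorption corresponds to a positive imaginary index, rp is defined so that it equals rs at normal incidence, and the transmissivities include the usual power factor for the change of medium and beam cross-section. The function and dictionary key names are hypothetical.

```python
import numpy as np

def film_on_substrate(n_film, d_nm, n_sub, wavelength_nm, theta_deg, n_inc=1.0):
    """Reflection and transmission of one film between two lossless media."""
    s = n_inc * np.sin(np.radians(theta_deg))       # n*sin(theta), conserved (Snell)
    n = [complex(n_inc), complex(n_film), complex(n_sub)]
    q = [np.sqrt(nj**2 - s**2) for nj in n]         # q_j = n_j * cos(theta_j)
    c = [qj / nj for qj, nj in zip(q, n)]           # cos(theta_j)
    beta = 2 * np.pi * d_nm / wavelength_nm * q[1]  # complex phase thickness of film

    # Interface Fresnel coefficients (reflection, transmission) from medium i to j.
    # Assumed sign convention: r_p equals r_s at normal incidence.
    fresnel = {
        "s": (lambda i, j: (q[i] - q[j]) / (q[i] + q[j]),
              lambda i, j: 2 * q[i] / (q[i] + q[j])),
        "p": (lambda i, j: (n[i] * c[j] - n[j] * c[i]) / (n[i] * c[j] + n[j] * c[i]),
              lambda i, j: 2 * n[i] * c[i] / (n[j] * c[i] + n[i] * c[j])),
    }

    amp = {}
    for pol, (r, t) in fresnel.items():
        round_trip = np.exp(2j * beta)
        denom = 1 + r(0, 1) * r(1, 2) * round_trip
        amp["r" + pol] = (r(0, 1) + r(1, 2) * round_trip) / denom
        amp["t" + pol] = t(0, 1) * t(1, 2) * np.exp(1j * beta) / denom

    flux = (q[2] / q[0]).real      # power factor, valid for lossless outer media
    return {
        "Rp": abs(amp["rp"]) ** 2,
        "Rs": abs(amp["rs"]) ** 2,
        "Tp": flux * abs(amp["tp"]) ** 2,
        "Ts": flux * abs(amp["ts"]) ** 2,
        "psi_r": np.degrees(np.arctan(abs(amp["rp"]) / abs(amp["rs"]))),
        "psi_t": np.degrees(np.arctan(abs(amp["tp"]) / abs(amp["ts"]))),
        "dphi_r": np.degrees(np.angle(amp["rp"] / amp["rs"])),
        "dphi_t": np.degrees(np.angle(amp["tp"] / amp["ts"])),
    }

# Sample of Figure 46.2: 25 nm film, n + ik = 4.5 + 1.75i, substrate index 1.5.
result = film_on_substrate(4.5 + 1.75j, 25.0, 1.5, 633.0, 60.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these conventions the sketch should come out close to the values quoted above; a different sign convention for rp would shift the reflected relative phase by 180°. Repeating the call while stepping one of n, k, or d gives sensitivity curves of the kind discussed next.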
We now examine the sensitivity of ellipsometric measurements to variations in
the sample parameters. For example, if the refractive index n of the film is varied in
the range 4.0 to 5.0, the various characteristics of the sample vary as in Figure 46.3.
(The variations shown here are relative to the nominal sample characteristics
evaluated at n = 4.5.) It is seen that Rp, Rs, ψr, ψt are more sensitive to changes of
n than Tp, Ts, φrp − φrs, and φtp − φts. Similarly, Figure 46.4 shows variations of
the sample characteristics with changes in k. Here Tp, φrp − φrs, and φtp − φts are
seen to be more sensitive to k than Rp, Rs, Ts, ψr, ψt. Figure 46.5 shows variations
of the sample characteristics with a changing film thickness d in the range from
20 nm to 30 nm. In this case ψt and, to some extent, ψr are insensitive to d, but
the remaining characteristics are quite sensitive.
When all the components of the system are assumed to be perfect, the
ellipsometer is sensitive enough to determine accurately the unknown sample
parameters. In practice, however, no measurement system is perfect: the polarizer
and the analyzer have a finite extinction ratio, allowing a small fraction of
[Figure 46.3 plots: (a) R and T variation (%), curves ΔRp, ΔRs, ΔTp, ΔTs; (b) angle variation (degrees), curves Δψr, Δψt, Δ(φrp − φrs), Δ(φtp − φts); horizontal axis n from 4.0 to 5.0.]
Figure 46.3 Variations of the reflection and transmission characteristics of the
sample of Figure 46.2 at λ = 633 nm, θ = 60°, when the film’s refractive index n
is varied from 4.0 to 5.0. The changes are relative to the nominal values obtained
with n = 4.5.
[Figure 46.4 plots: (a) R and T variation, curves ΔRp, ΔRs, ΔTp, ΔTs; (b) angle variation (degrees), curves Δψr, Δψt, Δ(φrp − φrs), Δ(φtp − φts); horizontal axis k from 1.5 to 2.0.]
Figure 46.4 Variations of the reflection and transmission characteristics of
the sample of Figure 46.2 at λ = 633 nm, θ = 60°, when the film’s absorption
coefficient k is varied from 1.5 to 2.0. The changes are relative to the nominal
values obtained with k = 1.75.
[Figure 46.5 plots: (a) R and T variation, curves ΔRp, ΔRs, ΔTp, ΔTs; (b) angle variation (degrees), curves Δψr, Δψt, Δ(φrp − φrs), Δ(φtp − φts); horizontal axis film thickness from 20 nm to 30 nm.]
Figure 46.5 Variations of the reflection and transmission characteristics of the
sample of Figure 46.2 at λ = 633 nm, θ = 60°, when the film thickness d is varied
from 20 nm to 30 nm. The changes are relative to the nominal values obtained
with d = 25 nm.
the undesirable E-field component to pass through; the quarter-wave plate’s

retardation deviates from 90°, and the beam that illuminates the sample is not an
ideal plane wave but has a finite diameter. Moreover, when the beam is focused
on the sample to provide a reasonable spatial resolution, the focused cone of light
contains a range of incidence angles, resulting in measured values that are
averages over these angles. One consequence of such system imperfections is
that, in the “null condition,” a minimum amount of light would still reach the
detector. Another consequence is the limited accuracy with which the various
reflection and transmission characteristics of the sample are measured.
Performance of the nulling ellipsometer

For the system depicted in Figure 46.1 we show in Figure 46.6 computed plots of
the detector signal S versus the angle ρa of the analyzer for several values of the
polarizer angle ρp. The assumed focusing and collimating lenses are identical,
having NA = 0.025, which corresponds to a 3° focused cone at the sample.
[Figure 46.6 plots: panels (a) and (b), detector signal (0.0 to 1.1) versus analyzer angle ρa (0° to 180°); the curves are labeled by polarizer angle ρp, with values between 2° and 54°.]
Figure 46.6 The detector signal S versus the orientation angle ρa of the
analyzer in the nulling ellipsometer of Figure 46.1 with the sample of Figure
46.2. Different curves correspond to different values of the polarizer angle
ρp. The total optical power of the unpolarized (or circularly polarized) beam
emerging from the source is unity, the detector’s conversion factor is 4,
the incidence angle is θ = 60°, and the focusing and collimating lenses have
NA = 0.025. In (a) the assumed system is perfect. In (b) there are departures from
ideal behavior, namely, the polarizer and analyzer have a 1:100 extinction ratio,
the angle of incidence deviates by 1°, and the quarter-wave plate’s retardation
is 87° while its axes are 1° away from the ideal 45° orientation.
In Figure 46.6(a) the assumed system is perfect, while in Figure 46.6(b) errors are
incorporated into the various components, namely, the assumed polarizer and
analyzer have a 1:100 extinction ratio, the angle of incidence on the sample is
θ = 61°, and the QWP’s retardation is 87° while its axes are 1° away from the
ideal 45° orientation.
The null in Figure 46.6(a) is achieved with ρp = 47° and ρa = 32.2°, yielding
φrp − φrs = 4° and ψr = 32.2°, as expected. Also the detector signals at ρa = 0° and
90° are 0.296 and 0.748, which correspond to the correct values of Rp and Rs. In
practice, even in this ideal case with perfect components the exact location of the
null may not be easy to determine. This produces a certain degree of inaccuracy,
depending on the available signal-to-noise ratio at the detector. In the case of
Figure 46.6(b), where the assumed components have substantial errors, the
minimum signal occurs at ρp = 54° and ρa = 30°, yielding φrp − φrs = 18° and
ψr = 30°. The reflectivities in this case (obtained at ρp = 9°, and ρa = 0° and 90°) are
Rp = 0.308, Rs = 0.727. If we consider the sensitivity curves in Figures 46.3–46.5,
such huge errors are clearly unacceptable.
A more realistic situation might correspond to small system errors; suppose,
for instance, that the polarizer and the analyzer have extinction ratios of 1:1000,
the angle of incidence on the sample has a 0.25° error (θ = 60.25°), and
the QWP’s retardation is 90.5° while its axes are misaligned by only 0.25°.
In this case the minimum signal occurs at ρp = 49° and ρa = 31.8°, yielding
φrp − φrs = 8° and ψr = 31.8°. The reflectivities (obtained at ρp = 4°, and ρa = 0°
and 90°) are Rp = 0.291 and Rs = 0.757. It is thus clear that the nulling
ellipsometer requires a high degree of accuracy in its components in order to achieve a
reasonable level of confidence in its estimates of sample parameters.
Ellipsometry with a variable retarder

Figure 46.7 shows a different kind of ellipsometer, consisting of a fixed polarizer,
a variable retarder (e.g., a liquid crystal cell or a photoelastic modulator), and a
fixed differential detection module. None of these components needs to be rotated
or otherwise adjusted during measurements. The variable retarder provides a
range of polarization states at the sample. For instance, the incident beam is
p-polarized when the retardation Δ is 0°, circularly polarized when Δ = 90°,
and s-polarized when Δ = 180°. The detection module consists of a Wollaston
prism with transmission axes fixed at 45° to the plane of incidence, followed by
a pair of identical photodetectors.
When the relative phase Δ imparted by the retarder to the incident beam is
continuously varied from 0° to 360°, the sum signal S1 + S2 oscillates between a
maximum and a minimum value; these correspond to Rp and Rs, although not
[Figure 46.7 diagram labels: light source, polarizer, variable retarder (45°), lens, sample (incidence angle θ), lens, Wollaston prism, photodetectors (signals S1 and S2); axes X, Y, Z.]
Figure 46.7 Diagram of an ellipsometer based on a variable retarder and a
differential detection module. The beam emerging from the polarizer is
collimated and linearly polarized along the X-axis. The variable retarder’s axes are
fixed at 45° to the XZ-plane of incidence, while its phase is varied continuously
from 0° to 360°. The light beam is focused on the sample through a low-NA lens,
and the reflected beam is recollimated by an identical lens in the reflection path.
The reflected beam is monitored by a differential detector consisting of a
Wollaston prism (oriented at 45° to the plane of incidence) and two identical
photodetectors. The sum of the detector signals S1 + S2 contains information
about the sample reflectivities Rp and Rs, while their normalized difference
(S1 − S2)/(S1 + S2) yields the relative phase φrp − φrs.
necessarily in that order. At the same time, the normalized difference signal
(S1 − S2)/(S1 + S2) exhibits a peak-to-valley variation equal to 2 sin(φrp − φrs). The system of
Figure 46.7 does not provide an independent measure of the other ellipsometric
parameter, ψr. However, since Rp and Rs are directly measurable, ψr is redundant.
In operating the system of Figure 46.7 it is not necessary to know the time-
dependence of the retardation Δ, nor in fact does one need to know the specific
value of Δ at any point during the measurement. The maximum and minimum
values of the sum signal and of the normalized difference signal contain all the
necessary information. Unlike the nulling ellipsometer, this system does not
require any adjustment of angles around a broad minimum; therefore, there is
much less uncertainty about the measured data points.
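
The claim that only the extreme values of the two signals matter can be illustrated with a small simulation. The sketch below is an idealized plane-wave Jones model, not the simulation used for the figures in this chapter: it ignores the finite NA of the lenses, and the sample reflection coefficients are hypothetical values chosen for illustration.

```python
import cmath
import math

def detector_signals(delta_deg, r_p, r_s):
    """Sum and normalized difference signals for an ideal system.

    r_p and r_s are the sample's complex amplitude reflection coefficients.
    """
    delta = math.radians(delta_deg)
    # Field incident on the sample (ideal retarder at 45 degrees, p-polarized input).
    e_p = 0.5 * (1 + cmath.exp(1j * delta))
    e_s = 0.5 * (1 - cmath.exp(1j * delta))
    # Reflected field components.
    a_p, a_s = r_p * e_p, r_s * e_s
    # Wollaston prism at 45 degrees: the two channels see (a_p +/- a_s)/sqrt(2).
    s1 = abs(a_p + a_s) ** 2 / 2
    s2 = abs(a_p - a_s) ** 2 / 2
    return s1 + s2, (s1 - s2) / (s1 + s2)

# Hypothetical sample: R_p = 0.30, R_s = 0.75, phase difference of 4 degrees.
r_p = math.sqrt(0.30) * cmath.exp(1j * math.radians(4.0))
r_s = math.sqrt(0.75)

# Sweep the retardation; the order of the samples is never used below.
sweep = [detector_signals(0.1 * k, r_p, r_s) for k in range(3600)]
sums = [s for s, _ in sweep]
diffs = [d for _, d in sweep]

r_low, r_high = min(sums), max(sums)
phase_deg = math.degrees(math.asin((max(diffs) - min(diffs)) / 2))
print(round(r_low, 3), round(r_high, 3), round(phase_deg, 2))
```

Only `min` and `max` are applied to the recorded signals, so the recovered reflectivities and phase difference do not depend on how the retardation varied with time.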
For the ideal system depicted in Figure 46.7, Figure 46.8(a) shows computed
plots of the sum signal and the normalized difference signal versus the retardation
Δ. The maximum and minimum values of the sum signal are 0.748 and 0.296,
corresponding to Rs and Rp. The normalized difference signal has a peak-to-valley
variation of 0.1375, yielding φrp − φrs = 3.94°.

[Figure 46.8, panels (a) and (b): sum signal S1 + S2 and normalized difference signal (S1 − S2)/(S1 + S2) plotted versus retardation (degrees), from 0 to 360.]
Figure 46.8 Computed plots of the sum and (normalized) difference signals in
the system of Figure 46.7 for the sample shown in Figure 46.2. The horizontal
axis depicts the relative phase imparted to the beam by the variable retarder. The
beam emerging from the polarizer has unit optical power, the detectors’ conversion
factor is unity, the incidence angle is θ = 60°, and the focusing and
collimating lenses have NA = 0.025. (a) The assumed system is perfect. (b) Two
instances (solid lines, broken lines) of imperfect system behavior.

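
The numbers quoted for the ideal system can be reproduced with a few lines of arithmetic. This is a minimal sketch; the expression for the second ellipsometric parameter assumes the conventional definition tan ψ = |rp|/|rs|, which is not restated in this section.

```python
import math

sum_max, sum_min = 0.748, 0.296   # extremes of S1 + S2 quoted in the text
peak_to_valley = 0.1375           # of (S1 - S2)/(S1 + S2), quoted in the text

# The peak-to-valley variation equals 2 sin(phase difference).
phase_deg = math.degrees(math.asin(peak_to_valley / 2))

# Assumed conventional definition: tan(psi) = |r_p|/|r_s| = sqrt(R_p/R_s),
# with R_p the smaller of the two reflectivities for this sample.
psi_deg = math.degrees(math.atan(math.sqrt(sum_min / sum_max)))

print(f"phase difference = {phase_deg:.2f} deg")   # 3.94 deg, as in the text
print(f"psi = {psi_deg:.1f} deg")
```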
In Figure 46.8(b) we have assumed some imperfection in the system components.
Two cases are examined, one leading to the solid curves and the other to the broken
curves. In the former case the polarizer’s extinction ratio is 1:1000, the retarder axes
are misaligned by 1°, the Wollaston prism has a 1:100 leak ratio between its two
channels, and the angle of incidence θ is in error by 0.5°. The computed sum
and difference signals yield Rp = 0.290, Rs = 0.750, and φrp − φrs = 4.23°. In the case of
the broken curves in Figure 46.8(b) the assumed imperfections are larger. Here the
polarizer’s extinction ratio is 1:100, the retarder’s orientation angle is 43°, the angle
of incidence is θ = 60.5°, and the Wollaston prism leaks 2% of the wrong polarization
into each channel. From the computed sum and difference signals the values
Rp = 0.290, Rs = 0.749, and φrp − φrs = 4.6° are obtained. Evidently, the system
of Figure 46.7 is quite tolerant of imperfections and misalignments; therefore, it is
suitable for accurate determination of the sample parameters.

1 A. Rothen, The ellipsometer, an apparatus to measure thicknesses of thin surface
films, Review of Scientific Instruments 16, 26–30 (1945).
2 A. B. Winterbottom, Optical methods of studying films on reflecting bases depending
on polarization and interference phenomena, Trans. Faraday Society 42, 487–495
(1946).
3 R. H. Muller, Definitions and conventions in ellipsometry, Surface Science 16, 14–33
(1969).
4 R. M. A. Azzam and N. M. Bashara, Ellipsometry and Polarized Light, North-Holland,
Amsterdam, 1977.
5 R. M. A. Azzam, Ellipsometry, chapter 27 in Handbook of Optics, Vol. 2, McGraw-Hill,
New York, 1995.
6 O. S. Heavens, Optical Properties of Thin Solid Films, Butterworths, London, 1955.
47
Holography and holographic interferometry
Dennis Gabor (1900–1979). His life-long love of physics started at the age
of 15. Fascinated by Abbe’s theory of the microscope and by Lippmann’s
method of color photography, he and his brother built up a home laboratory
and began experimenting with X-rays and radioactivity. Gabor entered the
Technische Hochschule Berlin and acquired a diploma in 1924 and an
electrical engineering doctorate in 1927. His thesis work involved the
development of high-speed cathode ray oscillographs, in the course of which
he built the first iron-shrouded magnetic electron lens. In 1927 he joined
Siemens & Halske AG, where he invented a high-pressure quartz mercury
lamp, since used in millions of street lamps. With the rise of Hitler in 1933,
Gabor left for England and obtained employment with the British firm
Thomson–Houston. At Thomson–Houston he developed a system of stereo-
scopic cinematography, and in his last year there carried out basic experi-
ments in holography. In 1949 he joined the Imperial College of Science and
Technology (London) and remained there as Professor of Applied Electron
Physics until his retirement in 1967. (Photo: courtesy of AIP Emilio Segrè
Visual Archives, W. F. Meggers Collection.)
Holography dates from 1947, when the Hungarian-born British scientist Dennis
Gabor (1900–1979) developed the theory of holography while working to
improve electron microscopy.1,2 Gabor coined the term “hologram” from the
Greek words holos, meaning whole, and gramma, meaning message. The 1971
Nobel prize in physics was awarded to Gabor for his invention of holography.
Further progress in the field was prevented during the following decade
because the light sources available at the time were not truly coherent. This
barrier was overcome in 1960, with the invention of the laser. In 1962 Emmett
Leith and Juris Upatnieks of the University of Michigan recognized, from their
work in side-looking radar, that holography could be used as a three-dimensional
visual medium. They improved upon Gabor’s original idea by using a laser and
an off-axis technique.3 The result was the first laser transmission hologram of
three-dimensional objects. The basic off-axis technique of Leith and Upatnieks
is still the staple of holographic methodology. These transmission holograms
produce images with clarity and realistic depth, but require laser light to view the
holographic image.
The Russian physicist Yuri Denisyuk combined holography with Lippmann’s
method of color photography. In 1962 Denisyuk’s approach produced a white-light
reflection hologram, which could be viewed in the light from an ordinary light bulb.
In 1968 Stephen Benton, then at Polaroid Corporation, invented white-light
transmission holography.4,5,6 This type of hologram can be viewed in ordinary white
light and is commonly known as the rainbow hologram. These holograms, which
are “printed” by direct stamping of the interference pattern onto plastic, can be
mass produced rather inexpensively.7
Basic principles
A setup for recording a simple transmission hologram is shown in Figure 47.1.
The coherent beam of the laser, after being expanded to cover the area of interest,
is split into an object beam and a reference beam. The object beam passes through
(or reflects from) the object before arriving at the photographic plate; the reference
beam is directed toward the photographic plate at an oblique angle θ. At the
XY-plane of the plate the complex-amplitude distribution of the object beam is
AO(x, y). The reference beam’s amplitude, AR(x, y), is proportional to
exp[i(2π/λ)(xSx + ySy)], where Sx and Sy are the direction cosines of the beam. The two
beams interfere at the plate, upon which their interference fringes are recorded.
When the plate is properly processed and developed, its amplitude transmissivity
τ(x, y) becomes proportional to the incident intensity pattern, that is,8,9,10
$$\tau(x, y) = I(x, y) = |A_O(x, y) + A_R(x, y)|^2. \tag{47.1}$$

[Figure 47.1 schematic: laser, beam expander, beam-splitter, object and object beam along the Z-axis, mirror, reference beam, photographic plate.]
Figure 47.1 The basic optical system used for recording a simple hologram.
The laser beam is expanded to accommodate the size of the object. The beam-
splitter separates a fraction of the light to be used as a reference beam and sends
it along a path that reaches the photographic plate at an oblique angle. The rest of
the beam continues along the Z-axis, interacts with the object, and arrives at the
photographic plate while carrying the phase/amplitude information about
the object. The two beams interfere and the plate records the resulting fringes of
the interference pattern. The film is subsequently developed into a positive (or
negative) transparency and becomes a permanent record of the object wave.
To reconstruct the object wave, the developed plate is returned to its original
position and illuminated with the reference beam, as shown in Figure 47.2. The
transmitted beam’s complex amplitude may thus be written
$$\begin{aligned}
A(x, y) &= \tau(x, y)\,A_R(x, y) \\
&= \{|A_O(x, y)|^2 + |A_R(x, y)|^2\}\,A_R(x, y) + |A_R(x, y)|^2 A_O(x, y) + A_R^2(x, y)\,A_O^*(x, y).
\end{aligned} \tag{47.2}$$
Note in the above equation that |AR(x, y)| is a constant, independent of x and y, and
that AR²(x, y) is a plane wave with direction cosines 2Sx and 2Sy. (When θ is small,
the propagation direction of this plane wave makes an angle 2θ with the Z-axis.)
Thus in addition to the reference beam AR(x, y), which is modulated by the squared
modulus of the object wave, the wavefront emerging from the hologram contains
the original object wave AO(x, y), as well as its complex conjugate A*O(x, y). The
reconstructed object wave travels in its original direction (i.e., along the Z-axis, in the
case of Figure 47.2), but the conjugate wave rides on a plane wave whose deviation
angle from the Z-axis is nearly twice that of the original reference beam.
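
The expansion in Eq. (47.2) and the "nearly twice" deviation angle can be verified numerically. The sketch below is illustrative only: the object wave `obj` is an arbitrary made-up function, the reference beam is assumed to lie in the XZ-plane (Sy = 0), and the 8° reference angle is borrowed from the numerical example later in this chapter.

```python
import cmath
import math

wavelength = 1.0                      # lengths are in units of the wavelength
theta = math.radians(8.0)             # reference-beam angle
s_x = math.sin(theta)                 # direction cosine along X; S_y = 0 assumed

def reference(x):
    """Unit-amplitude reference plane wave at the plate."""
    return cmath.exp(1j * 2 * math.pi / wavelength * x * s_x)

def obj(x):
    """Arbitrary smooth complex object wave, purely illustrative."""
    return 0.7 * cmath.exp(1j * 0.3 * math.sin(0.05 * x)) * math.exp(-(x / 80.0) ** 2)

max_err = 0.0
for k in range(-200, 201):
    x = 0.5 * k
    a_o, a_r = obj(x), reference(x)
    tau = abs(a_o + a_r) ** 2                 # Eq. (47.1)
    direct = tau * a_r                        # first line of Eq. (47.2)
    expanded = ((abs(a_o) ** 2 + abs(a_r) ** 2) * a_r
                + abs(a_r) ** 2 * a_o
                + a_r ** 2 * a_o.conjugate())
    max_err = max(max_err, abs(direct - expanded))

# The carrier A_R^2 has direction cosine 2*S_x, so its angle is asin(2 sin(theta)).
conjugate_angle_deg = math.degrees(math.asin(2 * s_x))
print(max_err, round(conjugate_angle_deg, 2))
```

The two sides of Eq. (47.2) agree to rounding error, and for θ = 8° the conjugate wave's carrier travels at slightly more than 16° to the Z-axis.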
[Figure 47.2 schematic: laser, beam expander, beam-splitter, mirror, reconstruction beam, hologram, and the reconstructed wavefront along the Z-axis.]
Figure 47.2 To reconstruct the recorded wavefront one places the hologram in
front of the same reference beam as used for recording. Upon transmission
through the hologram several reconstructed waves emerge. If the hologram is in
the same position as it was during recording, the virtual image of the object will
be carried by the component of the emergent beam traveling along the Z-axis.
However, if the hologram is flipped then a real image of the object emerges
along the Z-axis. (The flipping is such that the reconstruction beam becomes the
conjugate of the original reference beam with respect to the hologram.)
Behind the hologram, the reconstructed object wave yields the virtual image of
the recorded object; this image may be viewed through the lens of an eye or
photographed through the lens of a camera. The conjugate wave yields a real
image of the object, which can be visually inspected or photographed by placing
a photographic plate directly in its path. The transmitted portion of the recon-
struction beam itself does not carry any useful information and is generally
ignored.
Hologram of a simple phase-amplitude object

As an example, consider the phase–amplitude object shown in Figure 47.3. The
featureless areas of the face are transparent to the incident light, but the eyes,
nose, and mouth alter both the amplitude and the phase of the beam. The eyes are
partially transmissive depressions with a 50% transmittance and a maximum
phase depth of 5π at the center. The nose and the mouth are also 50% transmissive,
but they are raised above the surface of the face and their corresponding
phase depth at the center is −5π. Figure 47.3(a) shows the pattern of transmitted
intensity for a uniform incident beam. Figure 47.3(b), an interferogram between
the beam transmitted through the face and a collinear plane wave, shows the
fringes caused by the phase modulation imparted to the beam by the various
features of the face.
[Figure 47.3, panels (a) and (b); horizontal axis x/λ from −105 to 105.]
Figure 47.3 This face is a partially transmissive phase/amplitude object. The
intensity pattern shown in (a) is obtained when the face is illuminated by a
coherent, collimated, and uniform laser beam (i.e., a plane wave). The amplitude
transmission coefficient of the facial features (eyes, nose, mouth) is 0.7. The
interferogram in (b) is obtained when the transmitted beam is made to interfere
with a plane wave. The features of the face modulate the phase of the transmitted
beam in a continuous fashion by an amount that rises to 5π at the center of the
eyes and falls to −5π at the center of the nose and the mouth.

When a plane wave (wavelength λ) is transmitted through the face at z = 0 and
propagated to a photographic plate at z = 3500λ, one obtains the intensity and phase
distributions shown in Figures 47.4(a), (b), respectively. Figure 47.4(c) is the
interference pattern formed with a reference plane wave traveling at an oblique angle
θ = 8°. The photographic plate is exposed to this interference pattern and subsequently
developed into a positive transparency, that is, one in which the amplitude
transmissivity is proportional to the incident intensity distribution during exposure.
This transparency is a coherent-light hologram of the face.
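
To get a feel for the scale of the recorded fringes, one can estimate the carrier fringe period at the plate. The sketch below idealizes the object wave as a plane wave traveling along the Z-axis, which is an assumption made only for this estimate; the actual object wave modulates the amplitude and position of these carrier fringes. The function name is hypothetical.

```python
import math

def carrier_fringe_period(theta_deg, wavelength=1.0):
    """Fringe period at the plate for an on-axis plane wave interfering
    with a plane wave tilted by theta_deg (same units as wavelength)."""
    return wavelength / math.sin(math.radians(theta_deg))

period = carrier_fringe_period(8.0)
print(f"fringe period = {period:.2f} wavelengths")
```

For θ = 8° the period comes out slightly above seven wavelengths, which indicates the spatial resolution the recording medium must provide.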
Note in Figure 47.4(c) that the chosen diameter of the reference beam is
not large enough to cover the regions of the object wave far away from the
Z-axis. This is simply due to the limited computer memory available for
these calculations and is not a limitation in holography. Whereas in practice
the reference beam is usually large enough to record all significant spatial
frequencies of the object onto the hologram, in the present calculations the
small diameter of the reference beam limits the range of admissible spatial
frequencies, resulting in the loss of fine detail in the reconstructed images of
the original object.
When the developed hologram is placed in the system of Figure 47.2 and
illuminated with the reconstruction beam, the original object wave and its
complex conjugate appear among the transmitted waves, in accordance with
Eq. (47.2). Figure 47.4(d) shows the transmitted intensity pattern immediately
behind the hologram. At this point the overlapping components of the emergent
beam are all mixed together and, therefore, difficult to identify separately. Since
these components are traveling in different directions, propagation over a short
distance is all that is required to disentangle them from each other.

[Figure 47.4, panels (a) to (d); horizontal axis x/λ from −210 to 210.]
Figure 47.4 A plane wave traveling along the Z-axis and transmitted through
the face at z = 0 arrives at the photographic plate at z = 3500λ. (a) Distribution of
the logarithm of intensity of the object wave at the plate. (b) Object wave’s
phase distribution at the plate. (c) Interference pattern (logarithm of intensity)
between the object wave and a reference plane wave traveling at θ = 8° relative
to the Z-axis. (d) Distribution of the logarithm of intensity immediately after the
hologram, when the exposed plate is developed into a positive transparency and
placed in front of the reconstruction beam.

Holographic images of the recorded object

When the above hologram is placed in the same position as during recording and
illuminated with the same reference beam (now called the reconstruction beam)
one obtains, at z = 3500λ behind the hologram, the reconstructed intensity and
phase patterns of Figures 47.5(a), (b). The central region of this figure contains
the reconstructed object wave, AO(x, y), carrying the virtual image of the face.
The transmitted fraction of the reconstruction beam – modulated by the squared
modulus of the object wave – appears to the right and above the central region.
[Figure 47.5, panels (a) to (d); horizontal axis x/λ from −1000 to 1000 in (a), (b) and from −105 to 105 in (c), (d).]
Figure 47.5 The reconstructed wavefront at z = 3500λ behind the hologram.
The incident beam is the same as the reference beam used in creating the
hologram. (a), (b) Distributions of the logarithm of intensity and the phase over
the entire reconstructed field. The central region of this field carries a virtual
image of the face. (c), (d) Distributions of the logarithm of intensity and the
phase in the image plane of a unit-magnification lens that captures the central
portion of the field and creates a real image of the face from the reconstructed
object wave.
The real image of the face – produced by the conjugate wave A*O(x, y) – is shifted
further off-axis, and appears in the upper right corner of Figure 47.5(a).
Holographic reconstruction produces not only the amplitude of the original
object but also its phase pattern, as is evident from Figure 47.5(b). Unlike regular
photography, which maintains a record of the intensity profile but loses all trace
of phase, the holographic process preserves both the amplitude and the phase
information, and faithfully reproduces the entire object wave upon reconstruction.
A comparison of the central regions of Figures 47.5(a), (b) with the original object
wave of Figures 47.4(a), (b) might be worthwhile here, although one should note
that the reconstructed wave in Figures 47.5(a), (b) is captured at an effective distance
of 7000λ from the original object, whereas the patterns of Figures 47.4(a), (b)
correspond to a propagation distance of only 3500λ.
To observe the virtual image, one should place an imaging lens in the central
region of the field and produce a real image from the reconstructed object
wave. (Alternatively, one could propagate the reconstructed object wave
backwards in space by 7000λ to reproduce the object wave at its point of
origination.) A one-to-one imaging lens (NA = 0.04, f = 3500λ) placed in the
central region of Figures 47.5(a), (b) will create an inverted real image of the
face at z = 7000λ behind the lens. The resulting intensity and phase patterns
are shown in Figures 47.5(c), (d). The loss of resolution due to the small size of
the hologram is visible at the edges of the various facial features, from which the
high-spatial-frequency content of the original face is obviously missing (compare
with Figure 47.3(b)).
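
The one-to-one imaging geometry quoted above is consistent with the thin-lens equation. The sketch below is a paraxial check only, assuming an ideal thin lens placed 7000λ from the virtual object; the function name is hypothetical.

```python
def image_distance(object_distance, focal_length):
    """Thin-lens equation 1/s + 1/s' = 1/f, solved for s'."""
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)

f = 3500.0            # focal length in wavelengths (from the text)
s = 7000.0            # effective distance of the virtual object from the lens
s_image = image_distance(s, f)
magnification = -s_image / s   # negative sign indicates an inverted image
print(s_image, magnification)
```

With the object at twice the focal length, the image forms at twice the focal length on the other side with magnification −1, matching the inverted real image at 7000λ behind the lens.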
If the hologram is flipped during playback, the reconstruction beam, being a
plane wave in this example, becomes the conjugate of the original reference
beam, namely, A*R(x, y). (Alternatively, the reference beam may be conjugated
and brought in from the opposite side of the hologram.) Under such circum-
stances the transmitted wave along the original direction of the object wave (i.e.,
the Z-axis in the present example) becomes the conjugated object wave, A*O(x, y),
and the reconstructed object wave moves off-axis. This situation is depicted in
Figure 47.6, where, after propagating 3500λ beyond the hologram, the various
components of the transmitted beam have separated from each other. The
intensity distribution in Figure 47.6(a) reveals at the center the real image of the
face, slightly to the lower left the directly transmitted reconstruction beam, and
close to the lower left corner the beam containing the virtual image. There is also
a weaker image of the face on the right-hand side of the real image; this “second
harmonic” of the face is created by the nonlinearity of the photographic process.
Figures 47.6(c), (d) are close-ups of the intensity and phase patterns in the real
image produced by the conjugated object wave.
Holographic interferometry
Suppose the face shown in Figure 47.3 is somehow distorted at a later time or has
undergone changes in its optical properties such that the beam transmitted
through the face has acquired a certain degree of phase modulation. To render
this phase modulation visible by converting it to intensity variations, it is
necessary to interfere the beam transmitted through the face with a reference
beam. If a collinear plane wave is chosen as reference, the resulting interferogram
will resemble that in Figure 47.7(a). Here the deformation contours appear as
black and white fringes superimposed on the face. One can also see in this figure
the fringes caused by the phase structure of the facial features, namely, the eyes,
the nose, and the mouth.
[Figure 47.6, panels (a) to (d); horizontal axis x/λ from −1000 to 1000 in (a), (b) and from −105 to 105 in (c), (d).]
Figure 47.6 The reconstructed wavefront at z = 3500λ behind the hologram.
The incident beam is the conjugate of the reference beam used in creating the
hologram. (a), (b) Distributions of the logarithm of intensity and the phase over
the entire reconstructed field. (c), (d) Close-ups of the central region of the
reconstructed field, showing the logarithm of intensity and the phase distribution
of the real image of the face.

[Figure 47.7, panels (a) and (b); horizontal axis x/λ from −105 to 105.]
Figure 47.7 Two interferograms of the distorted face. In (a) the reference
beam is a plane wave, whereas in (b) the distorted face is made to interfere with
its own undistorted version.

An alternative “reference beam” is provided by the original, undistorted wave

from the face itself. If the wave transmitted through the distorted face is made to
interfere with that from the original face, the resulting fringe pattern will look like
that in Figure 47.7(b). Here the features of the face itself do not appear in the
interferogram; only the distortion fringes are visible. This is a clear advantage, of
course, because one is usually interested in the changes induced in the object, not
in the features of the object itself. The problem in most cases, however, is that the
distorted and the undistorted objects are not simultaneously available and,
therefore, creating an interferogram between the two using traditional methods of
interferometry is not a viable option.
Holographic interferometry provides a solution to this problem by allowing the
original wavefront, while still available, to be stored on a photographic plate.
Later, when the object is distorted, a second recording of its wavefront is made;
then the two wavefronts are reconstructed and allowed to interfere with each
other. Interestingly enough, these two recordings can be made on the same
photographic plate by double exposure. Moreover, the two wavefronts are
automatically superimposed during reconstruction.10 The essential idea behind
holographic interferometry may be readily grasped by reference to Eqs. (47.1)
and (47.2) above. If the distorted wavefront is denoted by A′O(x, y), it is clear
that, upon reconstructing the double-exposure hologram, the emergent object
wave will be AO(x, y) + A′O(x, y), while the emergent conjugate wave will be
A*O(x, y) + A′*O(x, y). In this way both the virtual image and the real image show
fringe patterns corresponding to contours of constant phase shift between the
original object and its distorted version.
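
The origin of these fringes can be seen in a one-point sketch. It assumes, purely for illustration, that the distortion adds only a phase shift δ to the object wave, so that the distorted wave is the original multiplied by exp(iδ); a real distortion may also change the amplitude. The function name and the sample amplitude are hypothetical.

```python
import cmath
import math

def double_exposure_intensity(a_o, phase_shift):
    """Intensity of A_O + A'_O when the distortion only adds a phase shift."""
    a_o_distorted = a_o * cmath.exp(1j * phase_shift)
    return abs(a_o + a_o_distorted) ** 2

a_o = 0.7 * cmath.exp(1j * 1.234)     # arbitrary complex amplitude at one point
for n_half_waves in range(5):
    shift = n_half_waves * math.pi
    print(n_half_waves, round(double_exposure_intensity(a_o, shift), 6))
```

The intensity equals 2|AO|²(1 + cos δ): it is brightest where the shift is a multiple of 2π and vanishes at odd multiples of π, so each fringe traces a contour of constant phase shift.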
Figures 47.8(a), (b) show the intensity and phase patterns at the photographic
plate corresponding to the distorted face. When this beam is combined with a
reference plane wave traveling at 8° to the Z-axis, the fringe pattern of Figure 47.8(c)
is obtained. This fringe pattern is recorded on the same film that had previously
recorded the hologram of the original face. When the resulting double-exposure
hologram is developed into a positive transparency and placed in front of the
reconstruction beam, the intensity distribution of Figure 47.8(d) appears immedi-
ately behind the hologram.
Assuming that the reconstruction beam is the conjugate of the reference beam
used in recording both holograms, the emergent beam along the Z-axis will be the
conjugate of the combined object waves, namely, A*O(x, y) + A′*O(x, y). The
intensity and phase patterns in Figures 47.9(a), (b) are obtained after propagating
the emergent beam a distance of 3500λ beyond the hologram. The fringe pattern
caused by the distorted face is clearly visible in this holographic interferogram.
In an ideal situation, where the hologram is large enough to capture all significant
spatial frequencies of both object waves, the features of the original
object will be invisible in the interferogram. However, in these calculations, the
hologram is of necessity small and, therefore, the features are not completely
absent from the final image. In any event, if the reference beam is large enough to
capture the high-spatial-frequency content of the object waves, the interferogram
of Figure 47.9(a) will approach the ideal one shown in Figure 47.7(b).

[Figure 47.8, panels (a) to (d); horizontal axis x/λ from −210 to 210.]
Figure 47.8 A plane wave traveling along the Z-axis and transmitted through
the distorted face at z = 0 arrives at the photographic plate at z = 3500λ. In this
double-exposure experiment a hologram of the undistorted face has already been
recorded on the plate. (a) Logarithmic plot of the object wave’s intensity distribution
at the plate. (b) The object wave’s phase distribution at the plate. (c) Pattern
of interference between the object wave and a reference plane wave traveling
at θ = 8° relative to the Z-axis. (d) Distribution of the logarithm of intensity
immediately after the hologram, when the twice-exposed film is developed into a
positive transparency and placed in front of the reconstruction beam.

Real-time interferometry using a holographic image

If a hologram of an object in a given state is made, the reconstructed image can be
made to interfere in real time with the “live” images of the same object in
different states. Hence deformations that are dynamic in nature can be observed
directly. This also provides a natural and very sensitive method of aligning the
hologram to the original position after it has been removed for processing.

[Figure 47.9, panels (a) and (b); horizontal axis x/λ from −105 to 105.]
Figure 47.9 The reconstructed wavefront at z = 3500λ behind the double-exposure
hologram, showing the interference pattern between the real images of the
distorted and undistorted face. The incident beam is the conjugate of the original
reference beam used in both exposures, and the component of the reconstructed
wave traveling along the Z-axis carries the real images. (a) Logarithmic plot of
intensity and (b) plot of phase distribution over the area of the real image.

1 D. Gabor, A new microscopic principle, Nature 161, 777–778 (1948).
2 D. Gabor, Microscopy by reconstructed wavefronts, Proc. Roy. Soc. London A 197,
454–487 (1949).
3 E. N. Leith and J. Upatnieks, Reconstructed wavefronts and communication theory,
J. Opt. Soc. Am. 52, 1123–1130 (1962).
4 S. A. Benton, Hologram reconstruction with extended incoherent sources, J. Opt.
Soc. Am. 59, 1454A (1969).
5 S. A. Benton, The mathematical optics of white light transmission holograms,
in Proceedings of the First International Symposium on Display Holography, ed.
T. H. Jeong, Lake Forest College, July 1982.
6 S. A. Benton, Survey of holographic stereograms, in Processing and Display of
Three-Dimensional Data, SPIE 367, 15–19 (1983).
7 The introductory section is adapted from Holophile, Inc.’s website at www.holophile.com.
9 P. Hariharan, Optical Holography, Cambridge University Press, UK, 1984.
10 C. M. Vest, Holographic Interferometry, Wiley, New York, 1979.
48
Self-focusing in nonlinear optical media†
Self-focusing and self-trapping in nonlinear optical media were discovered

soon after the invention of the laser in the early 1960s.1,2,3,4,5 These phenomena
provided an explanation for the appearance of hot spots and associated optical
damage in media irradiated by high-power laser pulses. The very high intensities
achievable with the laser made it possible to observe these and other nonlinear
effects, which depend upon the change in refractive index of the medium in
response to the local electric field intensity.
The physics of optical nonlinearity

In a medium exhibiting third-order nonlinearity, the index of refraction n depends
on the local E-field intensity I(x, y, z) as follows:2
$$n(x, y, z) = n_0 + n_2 I(x, y, z). \tag{48.1}$$
Here n0 is the medium’s background index of refraction (observed at low optical

intensities) and n2 is the nonlinear coefficient of the material. Whereas n0 is a
dimensionless quantity, the nonlinear coefficient n2 has inverse intensity units, i.e.
units of area/power. Several physical mechanisms can cause the refractive index of
a given medium to depend on the E-field intensity; notable among them are the
anharmonic motion of electrons in crystals, electrostriction, and the molecular
orientation known as the Kerr effect.2 Electrostriction is caused by the volume
force of an inhomogeneous electric field within a dielectric medium. The volume
force draws the material into the high-field region, increasing its local density and,
consequently, its refractive index. Optical glasses such as fused silica exhibit both
electronic and electrostrictive nonlinearities, their n2-values being in the range
5 × 10⁻¹⁶ to 5 × 10⁻¹⁵ cm²/W. The Kerr effect is observed in materials whose
molecules possess anisotropic polarizability and so tend to be aligned by the
E-field, thus causing a change in the local refractive index. The liquid carbon
disulfide (CS2), which has a fairly large n2-value, 2.6 × 10⁻¹⁴ cm²/W, is a good
example of this class of materials.

† The coauthor of this chapter is Ewan M. Wright of the College of Optical Sciences, University of Arizona.

When n2 is positive, the index of refraction in regions of high intensity tends to
be larger than that in regions where the E-field is weak. Consequently, for an
initially collimated and localized beam profile (such as a Gaussian), the wave-
front propagating through the medium develops a phase pattern that resembles
the curvature of a converging beam. While diffraction effects tend to broaden the
cross-section of the beam, wavefront curvature – caused by nonlinearity –
attempts to pull the beam towards regions of higher intensity. As long as the
nonlinear effect is weak, diffraction predominates; however, as one increases the
beam’s power a point is reached where the tendency of the beam to become
focused balances the effects of diffraction. The beam can then propagate over
long distances without any noticeable expansion or contraction. Physically, the
field has built an effective waveguide for itself, which enables it to propagate
without spreading. This phenomenon, known as self-trapping, occurs at
the critical input power Pcr = 0.146λ²/(n0n2). Typical values of Pcr are 33 kW for
CS2 at λ = 1 μm, and 0.2–2 MW for common optical glasses in the visible and
near-infrared range. Self-trapping is inherently unstable and is readily destroyed
by slight perturbations of the wavefront; nonetheless, it is possible to arrange
well-controlled experiments to demonstrate the phenomenon.
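
The quoted orders of magnitude follow directly from the formula for the critical power. In the sketch below the n2 values are those given in the text, whereas the background indices n0 (1.6 for CS2, 1.45 for glass) are approximate values assumed here for illustration, so the results should only be expected to land near the quoted figures.

```python
def critical_power_watts(wavelength_cm, n0, n2_cm2_per_w):
    """Self-trapping power P_cr = 0.146 * lambda^2 / (n0 * n2), in watts."""
    return 0.146 * wavelength_cm ** 2 / (n0 * n2_cm2_per_w)

wavelength_cm = 1.0e-4                 # 1 micrometer

# CS2: n2 from the text; n0 = 1.6 is an assumed, approximate background index.
p_cs2 = critical_power_watts(wavelength_cm, 1.6, 2.6e-14)

# Optical glass: n2 range from the text; n0 = 1.45 is an assumed value.
p_glass_high = critical_power_watts(wavelength_cm, 1.45, 5e-16)
p_glass_low = critical_power_watts(wavelength_cm, 1.45, 5e-15)

print(f"CS2: {p_cs2 / 1e3:.0f} kW")
print(f"glass: {p_glass_low / 1e6:.2f} to {p_glass_high / 1e6:.2f} MW")
```

With these assumed indices the CS2 value comes out in the mid-30 kW range, close to the 33 kW quoted, and the glass values span roughly 0.2 to 2 MW at λ = 1 μm.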
If the laser power is further increased beyond the threshold of self-trapping, the
phenomenon known as self-focusing collapse is observed. In this case, not only
does the nonlinear effect counter the natural tendency of the beam to diverge but
also it forces the beam to collapse under its own weight and come to a sharp focus
(a singularity, in the approximate paraxial theory) within a finite distance.3
Further increases in laser power break up the beam into multiple filaments, each
of which carries enough power to exhibit self-focusing in its own right.
Our goal in this chapter is to demonstrate some interesting examples of self-
focusing in nonlinear media, both to elucidate the fundamental physics and to
highlight the key effects produced by self-focusing in bulk media.
Gaussian beam profile

Figure 48.1(a) shows the distribution of intensity in the cross-section of a Gaussian
beam of wavelength λ and having a 1/e radius 1000λ. The beam is linearly
polarized along the X-axis and propagates along the Z-axis. The beam’s waist is at
z = 0, so the phase distribution in this plane is uniform over the beam’s cross-section.
The full-width at half-maximum (FWHM) intensity of the beam at the
waist equals 1177λ, and its peak intensity Imax = 0.64I0. Here I0 is an arbitrary scale
factor used to normalize all intensity profiles throughout this chapter.

[Figure 48.1, panels (a) and (b); axes x/λ and y/λ from −1100 to 1100.]
Figure 48.1 Plots of intensity distribution for (a) the X-component and (b) the
Z-component of polarization. These plots represent the cross-section of a Gaussian
beam having a 1/e radius 1000λ.

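
The quoted FWHM can be checked against the 1/e radius. The sketch assumes that the 1/e radius refers to the field amplitude, so that the intensity varies as exp(−2r²/r0²); this reading is an inference, adopted because it reproduces the 1177λ figure.

```python
import math

def gaussian_fwhm(radius_1_over_e):
    """FWHM of the intensity profile exp(-2 r^2 / r0^2).

    Assumes r0 is the radius at which the field amplitude drops to 1/e.
    """
    return radius_1_over_e * math.sqrt(2.0 * math.log(2.0))

print(round(gaussian_fwhm(1000.0)))   # in units of the wavelength -> 1177
```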
The beam cannot satisfy Maxwell’s equations unless it has a component of
polarization Ez along the Z-axis; the computed intensity profile |Ez|² for this
Z-component is shown in Figure 48.1(b). For a beam whose cross-section is
substantially larger than a wavelength, the power content of Ez is typically much
less than that of Ex. For example, in the present case the fraction of the total
optical power carried by the Z-component is only 0.25 × 10⁻⁷. We will see below
that Ez gains in strength as the beam converges towards focus.
Self-focusing by transmission through a thin slab

Consider the transmission of the Gaussian beam depicted in Figure 48.1 through a
thin slab of transparent material. (By thin we mean that the medium thickness is
much less than the Rayleigh range of the incident beam, so that diffractive effects
in the medium may be neglected.) Let the thickness d and the nonlinear coefficient
n2 of the slab be chosen to yield a phase shift Δφ = 2πn2Imaxd/λ = 10π at the
beam center, where the intensity is at its peak. Upon transmission through the
slab the beam acquires the intensity and phase distributions shown in Figure 48.2.
The distribution of |Ex|² in Figure 48.2(a) is the same as that in Figure 48.1(a), but
the intensity profile of Ez in Figure 48.2(b) is somewhat different from that in
Figure 48.1(b). The fractional power of the Z-component is now 111 × 10⁻⁷,
which, small as it may be, is substantially greater than the corresponding value
before entering the nonlinear medium. This behavior may be understood by
observing that the emergent beam has acquired a fairly large curvature and,
consequently, its polarization vector has bent further toward the Z-axis.
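The two fractional powers quoted for the Z-component can be estimated numerically. The following is a minimal sketch, not the method used for the figures: it assumes the paraxial relation Ez ≈ –(kx/k)Ex for each plane-wave component (which follows from the divergence condition when kz ≈ k), a 1/e amplitude radius of 1000λ, and a Kerr phase proportional to the Gaussian intensity with a peak value of 10π. The grid size and window are arbitrary choices.

```python
import numpy as np

def z_power_fraction(ex, dx, wavelength=1.0):
    """Fraction of power in Ez for a paraxial, X-polarized field on a square grid.

    Assumes Ez = -(kx / k) * Ex for every plane-wave component (div E = 0, kz ~ k).
    """
    k = 2.0 * np.pi / wavelength
    n = ex.shape[1]
    kx = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)   # angular spatial frequency along x
    ex_hat = np.fft.fft2(ex)
    ez_hat = -(kx[np.newaxis, :] / k) * ex_hat
    # Parseval's theorem lets us form the power ratio in the Fourier domain.
    return np.sum(np.abs(ez_hat) ** 2) / np.sum(np.abs(ex_hat) ** 2)

w = 1000.0                                   # 1/e amplitude radius, in wavelengths
n_pts, half_width = 512, 4000.0
x = np.linspace(-half_width, half_width, n_pts, endpoint=False)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)                     # X varies along axis 1
r2 = X**2 + Y**2

ex_in = np.sqrt(0.64) * np.exp(-r2 / w**2)             # waist: uniform phase
kerr_phase = 10.0 * np.pi * np.exp(-2.0 * r2 / w**2)   # 10*pi at the beam center
ex_out = ex_in * np.exp(1j * kerr_phase)

frac_in = z_power_fraction(ex_in, dx)
frac_out = z_power_fraction(ex_out, dx)
print(f"before slab: {frac_in / 1e-7:.2f}e-7, after slab: {frac_out / 1e-7:.0f}e-7")
```

Within these assumptions the two estimates come out close to the values 0.25 × 10⁻⁷ and 111 × 10⁻⁷ quoted in the text, which supports the geometrical explanation given above.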
Figure 48.2 The beam of Figure 48.1 goes through a thin slab of a nonlinear
material, creating a change in the index of refraction in proportion to its
intensity. At the center, where the beam is brightest, the self-induced phase
shift is 10π. The intensity of the beam upon emerging from the slab is shown in
(a), (b), and its phase distribution in (c), (d). The plots on the left-hand side
correspond to the X-component of polarization, while those on the right-hand
side represent the Z-component. The horizontal axis (x/λ) spans –1100 to 1100.
Figure 48.2(c) shows the phase profile of the emergent wavefront for the
X-component of polarization. The gray-scale ranges from –π (black) to π (white),
and the number of rings indicates a total phase shift of 10π from the center to the
rim. The phase profile for the Z-component of polarization in Figure 48.2(d)
shows, in addition to the curvature, a π phase shift between the right and left
halves of the beam. Again this is a simple geometrical consequence of the
bending of the rays toward the optical axis.
The above example clearly demonstrates that a nonlinear medium can impart a
curvature phase factor to a beam during transmission. When the curvature is
negative the beam becomes divergent and expands upon further propagation.
Conversely, a positive curvature causes the beam to converge towards a focus.
This is the underlying physical mechanism of self-focusing in thick nonlinear
media, to which we now turn.
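As a rough worked estimate (not taken from the text), the curvature imparted by the thin slab can be compared with that of an ideal thin lens. The sketch assumes that w is the 1/e amplitude radius of the Gaussian beam, that Δφ is the on-axis phase shift, that the phase profile is expanded only to second order about the axis, and that a converging lens of focal length f imposes the phase –πr²/(λf).

```latex
\varphi(r) = \Delta\varphi\, e^{-2r^{2}/w^{2}}
\approx \Delta\varphi\left(1 - \frac{2r^{2}}{w^{2}}\right),
\qquad
\varphi_{\mathrm{lens}}(r) = -\frac{\pi r^{2}}{\lambda f}
\;\;\Longrightarrow\;\;
f \approx \frac{\pi w^{2}}{2\lambda\,\Delta\varphi}
= \frac{\pi\,(1000\lambda)^{2}}{2\lambda\cdot 10\pi}
= 5\times 10^{4}\,\lambda .
```

For the thin-slab example (w = 1000λ, Δφ = 10π) the equivalent focal length is thus of order 50 000λ near the axis; away from the axis the Gaussian phase departs from a parabola, so the focus is aberrated.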
Self-focusing through a thick slab

Let us now consider propagation of the Gaussian beam of Figure 48.1 through a
thick slab of a nonlinear material, where the effects of diffraction during
propagation within the medium must be retained. For simulation purposes we
divide the thick slab into 60 thin slabs (in which we place the nonlinearity), and
propagate the beam between pairs of adjacent slabs through a linear medium of
refractive index n0, which fills the gap between the slabs. We choose a separation
of 5000λ between adjacent slabs, compute the incident intensity profile at each
slab (using Fresnel’s diffraction formula), and allow the nonlinear medium of
each slab to impart to the beam a phase pattern φ(x, y) in proportion to the
incident intensity distribution I(x, y). The specific phase shift assumed is 5° at the
reference intensity of I0. The above procedure is repeated 60 times for a total
propagation distance of 300 000λ. (This numerical scheme of breaking the
propagation into alternate sections of linear propagation followed by a nonlinear
phase mask is equivalent to the split-step beam propagation method commonly
employed in optics.) For the above choice of parameters the input power is about
20 times greater than the critical power for self-trapping, Pcr, and we expect the
simulation to display self-focusing collapse.3
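The procedure just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the code used for the figures: linear propagation uses the paraxial (Fresnel) transfer function evaluated with FFTs, the exp(+ikz) sign convention is assumed, and the grid size, window width, and function name are arbitrary choices.

```python
import numpy as np

def split_step_kerr(field, dx, dz, n_steps, phase_at_i0, wavelength=1.0):
    """Alternate paraxial (Fresnel) propagation over dz with a thin Kerr phase mask."""
    n = field.shape[0]
    f = np.fft.fftfreq(n, d=dx)
    fx, fy = np.meshgrid(f, f)
    # Fresnel transfer function for one linear step (constant factor exp(ikz) dropped).
    transfer = np.exp(-1j * np.pi * wavelength * dz * (fx**2 + fy**2))
    peaks = []
    for _ in range(n_steps):
        field = np.fft.ifft2(np.fft.fft2(field) * transfer)   # diffraction to the next slab
        intensity = np.abs(field) ** 2                        # in units of I0
        peaks.append(np.max(intensity))
        field = field * np.exp(1j * phase_at_i0 * intensity)  # nonlinear phase mask
    return field, peaks

# Lengths are in units of the wavelength; intensities in units of I0.
n_pts, half_width = 256, 3000.0
x = np.linspace(-half_width, half_width, n_pts, endpoint=False)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
beam_in = np.sqrt(0.64) * np.exp(-(X**2 + Y**2) / 1000.0**2)

beam_out, peak_history = split_step_kerr(
    beam_in, dx, dz=5000.0, n_steps=60, phase_at_i0=np.deg2rad(5.0))

power_in = np.sum(np.abs(beam_in) ** 2) * dx**2
power_out = np.sum(np.abs(beam_out) ** 2) * dx**2
for step in (10, 20, 30, 40, 50, 60):
    print(step, round(float(peak_history[step - 1]), 2))
```

Both operations are pure phase multiplications (one in the spatial domain, one in the Fourier domain), so the total power is conserved while the peak intensity grows as the beam self-focuses. The printed peak values depend on the grid and should be read only as a qualitative trend.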
The results of this simulation appear in Table 48.1 and Figure 48.3. The left-hand
column in the figure shows the cross-sectional profile of |Ex|², while the right-hand
column shows the corresponding plots of |Ez|². From top to bottom, the intensity
profiles are obtained after 20, 30, 40, 50, and 60 steps in the simulation. Note that the
beam is converging towards a focus and that Ez is becoming stronger as the beam
gets smaller. The FWHM of the beam drops from 1177λ in the beginning to 196λ
after 60 iterations. The focusing, of course, is not diffraction-limited, because the
curvature imparted to the beam by the nonlinear medium does not exactly constitute
a spherical wavefront. The departure of the wavefront from perfect sphericity
saddles the beam with primary and higher-order spherical aberrations.
It is clear physically that self-focusing collapse cannot proceed indefinitely.
Some mechanisms that can arrest the collapse are saturation of the nonlinear
refractive-index change, nonlinear absorption arising from multi-photon ioniza-
tion, and optical breakdown.
Asymmetric intensity profile and self-deflection

Our next example is similar to the previous one, except that now the beam
launched into the thick nonlinear medium has an asymmetric profile.4 The
asymmetry is produced by blocking off half the incident Gaussian beam. To
maintain the total optical power, we multiply the beam’s amplitude by √2, thus
preserving the integrated intensity over the clear aperture of the beam. As before,
the distance between adjacent thin slabs is 5000λ, and the beam is propagated in
60 steps for a total distance of 300 000λ.

Table 48.1. Various properties of the beam during propagation through a
nonlinear slab. The corresponding intensity profiles are shown in Figure 48.3

Number of steps   Imax / I0   Fractional power of Z-component   FWHM (×λ)
10                0.65        0.29 × 10⁻⁷                       1160
20                0.68        0.39 × 10⁻⁷                       1103
30                0.76        0.58 × 10⁻⁷                       998
40                0.92        0.91 × 10⁻⁷                       824
50                1.30        1.52 × 10⁻⁷                       545
60                3.10        3.21 × 10⁻⁷                       196
Figure 48.4 shows, from top to bottom, the initial half-Gaussian beam as well
as the patterns of intensity distribution within the medium after 20, 40, 50, and 60
propagation steps. Both columns show the profile of |Ex|², the intensity
distribution being on the left-hand side and its logarithm on the right-hand side.
(The logarithmic plot enhances weak features of the distribution, just like an
over-exposed photograph.)
We note several new features in this example. First, the beam comes to a
focus in the narrow dimension before it collapses in the wide dimension.
Second, the center of the beam shifts to the right as it propagates. This self-
deflection is caused by the prism-like phase factor that the nonlinear medium
imparts to the beam.4 An ideal prism imparts a phase factor that is linear in the
spatial coordinate x, namely, exp(i2πσx/λ), deflecting the beam by an angle
θ = sin⁻¹σ. One can explain the observed self-deflection in Figure 48.4 by
noting the similarity between the ideal phase factor of a prism and the phase
factor exp[iφ(x, y)] imposed on the half-Gaussian beam by the nonlinear
medium.
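The relation between a linear phase factor and the deflection angle can be verified numerically. The sketch below is one-dimensional and uses a hypothetical value σ = 0.002 with a full Gaussian profile; it does not model the half-Gaussian beam or the nonlinear medium. It relies only on the fact that a plane-wave component of spatial frequency fx travels at an angle given by sin θ = λfx.

```python
import numpy as np

wavelength = 1.0
sigma = 0.002                       # hypothetical prism parameter, sin(theta) = sigma
n_pts, half_width = 4096, 8000.0
x = np.linspace(-half_width, half_width, n_pts, endpoint=False)
dx = x[1] - x[0]

beam = np.exp(-(x / 1000.0) ** 2)                                  # Gaussian amplitude
tilted = beam * np.exp(1j * 2.0 * np.pi * sigma * x / wavelength)  # ideal prism phase

# Mean direction of the plane-wave spectrum of the tilted beam.
fx = np.fft.fftfreq(n_pts, d=dx)
spectrum = np.abs(np.fft.fft(tilted)) ** 2
mean_fx = np.sum(fx * spectrum) / np.sum(spectrum)
theta = np.arcsin(wavelength * mean_fx)

print(f"deflection: {np.degrees(theta):.4f} deg, "
      f"asin(sigma): {np.degrees(np.arcsin(sigma)):.4f} deg")
```

The phase imposed by the nonlinear medium is only approximately linear in x, so this check illustrates the prism analogy rather than the quantitative deflection seen in Figure 48.4.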
Finally, note in Figure 48.4 that the beam breaks up into multiple branches
after coming to focus. In practice the intensity at the focal point may be large
enough to damage the material. Even if damage does not occur, small material
inhomogeneities can cause substantial aberrations, distorting and breaking up the
beam in unpredictable ways. The fact that computer simulations also show this
type of breakup is due to small numerical errors incurred during computation.
Usually these numerical errors are insignificant, but when the intensity begins to
build up in the vicinity of a focal point, they cause the breakup of the beam in a
random-looking fashion.
Figure 48.3 Top to bottom: plots of intensity distribution after 20, 30, 40, 50,
60 steps of propagation through a nonlinear medium. The X-component of
polarization is on the left, and the Z-component on the right. The incident beam
is the Gaussian shown in Figure 48.1. The horizontal axis (x/λ) spans –1100 to 1100.
Figure 48.4 Top to bottom: distributions of intensity (left) and logarithm of
intensity (right) after 0, 20, 40, 50, 60 propagation steps through a nonlinear
medium. The incident beam is the Gaussian of Figure 48.1, with its left half
blocked but its intensity doubled to preserve the total power. In units of the
reference intensity I0, the peak intensity Imax starts at 1.28 and increases to 1.51
after 10 steps, 1.95 after 20 steps, 3.24 after 30 steps, 6.67 after 40 steps, and
14.95 after 50 steps, then drops to 7.76 after 60 steps. The horizontal axis (x/λ)
spans –1100 to 1100.

Beam filamentation
As mentioned above, if the beam’s power is large enough the beam breaks up
into many cells, each of which contains several critical powers and comes
independently to focus. Our final example concerns a uniform beam of diameter
2000λ with a constant intensity equal to 0.32I0 across the aperture. In this
simulation we placed 40 thin slabs of a nonlinear material at intervals of 15 000λ
along the Z-axis. Each slab imparts a phase shift of 15° at the reference intensity of
I0, which is equivalent to an incident optical power of 60Pcr. Shown in Figure 48.5
are the results of simulation after 10, 20, 30, 35, and 40 steps. At first, as a result of
diffraction during propagation, the beam breaks up into multiple rings. After 30
iterations the central region of the beam comes to a focus. Afterwards, the central
spot goes out of focus, but one of the rings breaks into multiple filaments.5 Small
perturbations are necessary to break up a ring; as mentioned above these are
provided by material inhomogeneities in practice, and by small numerical errors
inherent to computer simulations in these calculations. The number of filaments
depends on the power of the beam as well as on the strength of nonlinearity of the
material.

Figure 48.5 Top to bottom: distributions of intensity (left) and logarithm of
intensity (right) after 10, 20, 30, 35, 40 propagation steps through a nonlinear
medium. The incident beam is uniform, having a circular cross-section of radius
1000λ. In units of the reference intensity I0, the peak intensity Imax starts at 0.32
and then fluctuates as follows: 1.18 after 10 steps, 0.9 after 20 steps, 7.85 after
30 steps, 3.00 after 35 steps, and 6.87 after 40 steps. The horizontal axis (x/λ)
spans –1100 to 1100.
Concluding remarks
Another mechanism that can couple the refractive index to the beam intensity
profile is absorption of the light followed by heating and thermal diffusion.
Variation of the refractive index in response to thermal expansion (or contraction)
of the material is a frequently observed source of nonlinear optical behavior.
Thermal effects usually produce negative values of n2, thus causing defocusing of
the beam. Heat diffusion further complicates the relation between n(x, y, z) and
I(x, y, z), by removing the local nature of their interdependence. In this chapter we
have confined our attention to the simple case of local nonlinearity with a positive
value for n2 and have shown examples of self-focusing and beam filamentation.
Similar studies can be carried out for thermally induced nonlinearities, provided
that heat diffusion is taken into consideration properly.

1 R. Y. Chiao, E. Garmire, and C. H. Townes, Self-trapping of optical beams, Phys.
Rev. Lett. 13, 479–482 (1964).
2 R. W. Boyd, Nonlinear Optics, chapter 4, Academic Press, Boston, 1992.
3 P. L. Kelley, Self-focusing of optical beams, Phys. Rev. Lett. 15, 1005–1008 (1965).
4 G. A. Swartzlander and A. E. Kaplan, Self-deflection of laser beams in a thin
nonlinear film, J. Opt. Soc. Am. B 5, 765–768 (1988).
5 A. J. Campillo, S. L. Shapiro, and B. R. Suydam, Periodic breakup of optical beams
due to self-focusing, Appl. Phys. Lett. 23, 628–630 (1973); also, Relationship of self-
focusing to spatial instability modes, Appl. Phys. Lett. 24, 178–180 (1974).
49
Spatial optical solitons†
The possibility of self-trapping of optical beams due to an intensity-dependent
refractive index was recognized in the early days of nonlinear optics.1 However,
it was soon realized that in a three-dimensional medium, in which light
diffracts in two transverse dimensions, self-trapping is not stable and leads
to catastrophic collapse and filamentation. Stable self-trapping was then found
to be feasible in two-dimensional media, in which the optical beam diffracts
only in one transverse direction. Subsequently, the connection between self-
trapping and soliton theory,2 and a complete analogy between spatial and
temporal solitons were established. Whereas the formation of temporal solitons
requires a balance between dispersion and nonlinear phase modulation, spatial
solitons owe their existence to the balancing of diffraction with wavefront
curvature induced by the nonlinear refractive index profile of the propagation
medium.
To observe a spatial soliton one must limit diffraction to one transverse dir-
ection, which can be achieved in a planar optical waveguide. The first experi-
ments of this type were conducted using a multimode liquid waveguide (CS2
confined between a pair of glass slides).3 Formation of spatial optical solitons in
single-mode planar glass waveguides was reported shortly afterwards.4
Kerr nonlinearity
The simplest nonlinearity capable of producing self-trapping (leading to soliton
formation in a planar waveguide) is a Kerr nonlinearity, obtained when the
refractive index of the medium has an intensity-dependent term of the form

\[ n(x, y, z) = n_0 + n_2 I(x, y, z), \]

where I = |E|² is the electric field intensity of the optical beam. Since diffraction
tends to expand the spatial dimensions of a beam, the requisite nonlinearity must
produce self-focusing, which translates into a positive coefficient n2 for the Kerr
medium. (In contrast, temporal solitons can exist in media having either negative
or positive nonlinear indices, depending on whether the dispersion of the medium
is normal or anomalous.) The only spatial solitons that could exist in media with
negative n2 are dark solitons, which are localized depressions in a cw background.
Although both bright and dark solitons (spatial as well as temporal) have
been observed experimentally, we limit the discussion in this chapter to bright
spatial solitons in planar waveguides.

† This chapter is co-authored with Ewan M. Wright, Professor of Optical Sciences at the
University of Arizona.
The beam propagation method (BPM)

We use BPM to simulate the propagation of a beam of light through an optical
waveguide exhibiting nonlinear effects. (See Chapter 32, The beam propagation
method.) The BPM consists of a series of diffractive propagation steps in an
isotropic, homogeneous medium, with each step followed by passage of the beam
through one or more phase masks. The mask(s) impart to the beam’s cross-
section a pattern of phase modulation that accounts for the cumulative effects of
propagation in the inhomogeneous medium of the waveguide. For example, a
mask can represent the phase modulation caused by the differing indices of
refraction of core and cladding or, by becoming dependent on the local intensity
distribution, a mask can mimic the phase modulation induced by the nonlinear
index of the medium. The BPM can thus simulate many systems of practical
interest provided that (i) the propagation steps are sufficiently small and (ii) the
phase masks embody realistic effects of interaction between the beam and the
waveguide during each propagation step.
Slab waveguide with and without nonlinearity

Figure 49.1 is the diagram of a slab waveguide consisting of a guiding layer
of refractive index n1 = 1.5056, sandwiched between cladding layers of index
n0 = 1.50. A Gaussian beam of elliptical cross-section (free-space
wavelength = λ0), launched into the guide from the left side, propagates along the
positive Z-axis. The thickness of the guiding layer is assumed to be 5λ, where
λ = λ0/n0 is the wavelength of the light within the glass medium. In the absence
of nonlinearity, the injected beam spreads in the lateral direction X, but its
diffraction along the Y-axis is arrested by the action of the waveguide.
Figure 49.2 shows cross-sectional plots of the beam’s intensity profile at
various locations along the Z-axis (all spatial dimensions are in units of λ). The
top frame shows the intensity profile of the injected beam upon entering the
waveguide. From top to bottom: z/λ = 0, 200, 500, 800. It is seen that the injected
beam initially expands to fill the guiding region in the Y-direction. The beam
subsequently broadens along X as it propagates in the Z-direction, but its width
along Y remains constant.

Figure 49.1 A slab waveguide confines the beam in one spatial dimension (Y),
so that nonlinearity can act in the second dimension (X) to produce
self-confinement. The incident beam has wavelength λ0 in free space. In our simulations
the cladding glass material has refractive index n0 = 1.5, the guiding layer has
index n1 = 1.5056, and the thickness of the guiding layer is 5λ, where λ = λ0/n0
is the wavelength of the guided beam within the glass medium. The core and
cladding materials have the same nonlinear (Kerr) coefficient n2.
If a Kerr nonlinearity is introduced in the above waveguide, the broadening
along X will be countered by a self-focusing phase factor imposed on the
propagating beam’s cross-section. The Kerr medium’s refractive index
responds to the light by increasing in proportion to the local intensity, namely,
n(x, y, z) = n0,1 + n2I(x, y, z), where n2 > 0. Thus the bright central region of
the beam is phase shifted more than its tail sections, which are less bright,
resulting in a lens-like phase pattern that tends to focus the beam towards the
center. If the beam’s intensity is weak, this self-focusing effect will not be
sufficient to counter diffraction broadening. However, once the optical power
density exceeds a certain critical value, the index modulation becomes strong
enough to balance the effects of diffraction, resulting in an unchanging, stable
beam profile along the propagation path.
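The balance between diffraction and self-focusing can be illustrated with a one-dimensional split-step sketch. It is a simplified model under stated assumptions, not the two-dimensional simulation behind the figures: the Y-dimension is ignored, the input is a sech-shaped amplitude of hypothetical width parameter a = 4λ, the step size of 1λ is an arbitrary choice, and the Kerr strength is set to the value Δn/n0 = 1/(ka)² that balances paraxial diffraction for this profile (k = 2π/λ being the wavenumber in the medium).

```python
import numpy as np

def propagate_1d(field, dx, dz, n_steps, kerr_strength, wavelength=1.0):
    """1-D split-step: paraxial diffraction along x, then a Kerr phase mask, per step.

    kerr_strength is the fractional index change (delta n / n0) at unit intensity.
    """
    k = 2.0 * np.pi / wavelength
    kx = 2.0 * np.pi * np.fft.fftfreq(field.size, d=dx)
    diffraction = np.exp(-1j * kx**2 * dz / (2.0 * k))
    for _ in range(n_steps):
        field = np.fft.ifft(np.fft.fft(field) * diffraction)
        field = field * np.exp(1j * k * kerr_strength * np.abs(field) ** 2 * dz)
    return field

wavelength = 1.0                     # lengths in units of the in-medium wavelength
k = 2.0 * np.pi / wavelength
a = 4.0                              # hypothetical width parameter of the sech beam
x = np.linspace(-200.0, 200.0, 2048, endpoint=False)
dx = x[1] - x[0]
sech_beam = (1.0 / np.cosh(x / a)).astype(complex)   # peak intensity normalized to 1

balanced = 1.0 / (k * a) ** 2        # delta n / n0 that balances diffraction
n_steps, dz = 800, 1.0               # total distance: 800 wavelengths
nonlinear_out = propagate_1d(sech_beam, dx, dz, n_steps, balanced)
linear_out = propagate_1d(sech_beam, dx, dz, n_steps, 0.0)

peak_nonlinear = float(np.max(np.abs(nonlinear_out) ** 2))
peak_linear = float(np.max(np.abs(linear_out) ** 2))
print("peak intensity with Kerr effect:", round(peak_nonlinear, 3))
print("peak intensity without it:      ", round(peak_linear, 3))
```

With the nonlinearity switched on the profile remains essentially unchanged over 800λ, whereas the same beam in the linear medium spreads and its peak intensity drops substantially. For n0 = 1.5 the balancing index change in this sketch is about 0.0024, the same order as the values quoted for the simulations in this chapter.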
Figure 49.3 shows computed cross-sectional profiles of intensity along the
Z-axis for the same waveguide and the same injected beam as depicted in
Figure 49.2. The difference is that in the present case the nonlinear index n2 is no
longer zero, but chosen to yield a stable, non-diffracting guided beam. (The peak
intensity reached in this simulation raises the refractive index, locally and
instantaneously, by Δn = 0.0022.) The confined beam is a spatial soliton whose
properties can be readily derived from Maxwell’s equations in conjunction with
the nonlinear index of the medium. Self-trapping in this one-dimensional case
(along the X-axis) is highly stable, and slight inhomogeneities of the guiding
medium or variations of the input optical power do not destabilize the trapped
beam. (Note that the second transverse dimension Y is essentially taken out of the
equations by the action of the slab waveguide.) In contrast, two-dimensional
self-focusing in a Kerr medium (i.e., in the absence of the slab waveguide) would be
highly unstable, resulting in catastrophic collapse and subsequent filamentation of
the beam.5,6 (See Chapter 48, Self-focusing in nonlinear optical media.)

Figure 49.2 Confinement and propagation of an elliptically shaped Gaussian
beam through the slab waveguide depicted in Figure 49.1. The top frame shows
the intensity profile of the injected beam upon entering the waveguide. From top
to bottom: z/λ = 0, 200, 500, 800. All spatial dimensions are in units of λ, the
wavelength of the guided beam in the glass medium. The beam propagation
method (BPM) has been used to obtain these images of the guided beam at
various cross sections. The propagation step-size was Δz = 2.5λ, and the 5λ-wide
guiding layer was simulated by a rectangular aperture which advanced the phase
of the incident beam by Δφ = 5° in each step. The horizontal axis (x/λ) spans
–25 to 25.
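The size of the phase masks follows from simple arithmetic. The sketch below assumes that the phase advance per step is 2πΔnΔz/λ, with both the step Δz and the wavelength λ referred to the glass medium; this convention is an assumption, chosen because it reproduces the value of about 5° per step quoted in the caption of Figure 49.2 for the guiding layer.

```python
import math

def phase_step_degrees(delta_n, dz_in_wavelengths):
    """Phase advance (degrees) produced by an index excess delta_n over one step."""
    return math.degrees(2.0 * math.pi * delta_n * dz_in_wavelengths)

core_mask = phase_step_degrees(1.5056 - 1.50, 2.5)   # guiding layer versus cladding
kerr_peak = phase_step_degrees(0.0022, 2.5)          # peak nonlinear index change
print(f"core mask: {core_mask:.2f} deg, peak Kerr mask: {kerr_peak:.2f} deg")
```

Under the same convention the peak nonlinear index change of 0.0022 corresponds to roughly 2° per step, so both masks are small enough for the step-by-step approximation of the BPM to be reasonable.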
Figure 49.3 Same as Figure 49.2, but with nonlinearity added to the waveguide,
n = n0,1 + n2I. (Note that the horizontal scale differs from that in
Figure 49.2.) The injected Gaussian beam initially expands in the Y-direction to
fill the guiding layer, but in the X-direction self-focusing combats the natural
tendency of the beam to expand by diffraction. From top to bottom, z/λ = 0, 200,
500, 800. The net result is a stable, non-diffracting beam that propagates along
the Z-axis, confined in the Y-direction by the waveguide, and in the X-direction
by the self-focusing action of the nonlinear medium. The horizontal axis (x/λ)
spans –15 to 15.

Figure 49.4 shows the total optical power P (as a fraction of the input power
P0) plotted versus z/λ in the linear and nonlinear waveguides whose behaviors are
depicted in Figures 49.2 and 49.3, respectively. The power P is computed at each
step of the simulation by integrating the guided beam’s intensity in the
cross-sectional plane of the waveguide. The initial steep drop in P/P0 is caused by
radiation into the cladding, at a time when the injected beam is still adjusting to
the waveguide; the guided mode is seen to stabilize after a fairly short
propagation distance. P(z) in the nonlinear guide behaves more or less the same as it
does in the linear guide, except for the steady-state value of the guided optical
power, which is somewhat greater in the presence of nonlinearity.

Figure 49.4 Total optical power P (as a fraction of the input power P0) plotted
versus z/λ in the linear and nonlinear waveguides whose behaviors are depicted
in Figures 49.2 and 49.3, respectively. (Vertical axis: normalized optical power
P/P0, from 0.5 to 1.0; horizontal axis: z/λ, from 0 to 800; curves labeled
“Nonlinear waveguide” and “Linear waveguide”.)
Adjacent pair of out-of-phase solitons

When two solitons propagate side by side in the same slab waveguide, they
interact and affect each other’s behavior. Figure 49.5 shows the case of two
identical solitons, launched side by side with a relative phase of 180°. The left
column in Figure 49.5 displays several cross-sectional intensity profiles
throughout the guide, while the right column shows the corresponding phase
distributions (light gray = 0°, dark gray = 180°). From top to bottom, the frames
represent propagation distances z/λ = 0, 200, 500, 800, 1100, and 1600. At first,
the beams expand to fill the guide in the Y-direction. Self-focusing of each beam
follows, with the result that two identical solitons (aside from their 180° phase
shift) form in the same neighborhood. The tails of the two solitons overlap,
however, and this overlap is strong enough to cause their mutual repulsion via the
induced nonlinear phase. Note that the 180° phase difference between the left and
right halves of the guide is preserved throughout propagation.

Figure 49.5 Two Gaussian beams, separated along the X-axis and having a
relative phase of 180°, are simultaneously launched into the slab waveguide of
Figure 49.1. The beams initially expand in the Y-direction to fill the width of the
guide; self-focusing then confines each Gaussian beam in the X-direction, and
the interaction between the two pushes them apart. (The peak intensity reached
in this simulation raises the refractive index, locally and instantaneously, by
Δn = 0.003.) The left column displays cross-sectional intensity profiles, while the
right column shows the corresponding phase distributions (light gray = 0°, dark
gray = 180°). From top to bottom, the frames represent propagation distances
z/λ = 0, 200, 500, 800, 1100, and 1600. The horizontal axis (x/λ) spans –25 to 25.

Figure 49.6 is a plot of the total optical power versus propagation distance for
the pair of out-of-phase solitons depicted in Figure 49.5. The steep initial drop is
caused by radiation into the cladding, during the time when the injected beams
are still adjusting to the waveguide. Once the solitons are established, however,
their power content remains essentially constant.

Figure 49.6 Total optical power P (as a fraction of the input power P0) plotted
versus z/λ for the pair of out-of-phase solitons depicted in Figure 49.5. (Vertical
axis: P/P0, from 0.5 to 1.0; horizontal axis: z/λ, from 0 to 1600.)
Adjacent pair of in-phase solitons

Figure 49.7 shows the case of two Gaussian beams launched into the slab
waveguide of Figure 49.1 simultaneously and with identical phase. The two
solitons thus formed within the nonlinear guide begin to attract each other. In
Figure 49.7, from top to bottom, the propagation distance from the input port is
z/λ = 0, 100, 400, 550, 700, 850, 1100 (left column), and z/λ = 1250, 1550, 2150,
2300, 2500, 2650, 3050 (right column). At first it appears that the two solitons
fuse together, but soon they separate and move apart, only to return and collide
once again. The two solitons thus engage in a periodic dance. In a truly
one-dimensional system the dance would continue forever, but in this
quasi-one-dimensional case, it appears that the solitons get somewhat closer
together after each oscillation period.

Figure 49.7 Two identical Gaussian beams, separated along the X-axis by 20λ
and having a constant, uniform phase in their cross-sectional plane, are
simultaneously launched into the slab waveguide of Figure 49.1. The various frames display
the patterns of intensity distribution in the waveguide’s cross section along the
Z-axis. From top to bottom, the propagation distance from the input port is z/λ = 0,
100, 400, 550, 700, 850, 1100 (left column), and z/λ = 1250, 1550, 2150, 2300,
2500, 2650, 3050 (right column). The peak intensity reached in this simulation
raises the local refractive index by Δn = 0.01. Initially, the beams expand in the
Y-direction and fill the width of the guide, while they self-focus in the X-direction.
Thus confined, the two beams move toward each other and collide, appearing for a
brief period to have fused together. Following collision, the two solitons re-appear
and move apart, but their mutual attraction brings them back together again. The
horizontal axis (x/λ) spans –20 to 20.

Figure 49.8 is a plot of the total optical power versus propagation distance for
the in-phase soliton pair depicted in Figure 49.7. It is seen that, once the solitons
are established, their power content remains constant despite repeated collisions.
It is a well-known property of solitons that, upon collision, they pass through
each other unscathed. The above behavior of the in-phase soliton pair is a clear
confirmation of this property, even in a non-ideal (i.e., quasi-one-dimensional)
situation.

Figure 49.8 Total optical power P (as a fraction of the input power P0) plotted
versus z/λ for the in-phase soliton pair whose behavior is depicted in Figure 49.7.
(Vertical axis: P/P0, from 0.5 to 1.0; horizontal axis: z/λ, from 0 to 3200.)
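The repulsion of out-of-phase solitons and the attraction of in-phase solitons can be reproduced qualitatively in a purely one-dimensional model. The sketch below is a simplification under stated assumptions, not the simulation behind Figures 49.5 and 49.7: the Y-dimension is ignored, each input beam is a sech-shaped amplitude with a hypothetical width parameter a = 4λ, the beams start at x = ±10λ, and the Kerr strength Δn/n0 = 1/(ka)² is the value that balances diffraction for an isolated beam. The intensity-weighted mean of |x| serves as a simple proxy for half the separation.

```python
import numpy as np

def mean_abs_position(field, x):
    """Intensity-weighted mean of |x|: a simple proxy for half the pair separation."""
    intensity = np.abs(field) ** 2
    return np.sum(np.abs(x) * intensity) / np.sum(intensity)

def track_pair(relative_phase, n_steps=1600, dz=1.0, a=4.0, offset=10.0, wavelength=1.0):
    """Launch sech beams at x = -offset and +offset; return mean |x| after every step."""
    k = 2.0 * np.pi / wavelength
    x = np.linspace(-100.0, 100.0, 1024, endpoint=False)
    dx = x[1] - x[0]
    kx = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)
    diffraction = np.exp(-1j * kx**2 * dz / (2.0 * k))
    kerr_strength = 1.0 / (k * a) ** 2   # balances diffraction for one isolated sech beam
    field = (1.0 / np.cosh((x + offset) / a)
             + np.exp(1j * relative_phase) / np.cosh((x - offset) / a))
    history = [mean_abs_position(field, x)]
    for _ in range(n_steps):
        field = np.fft.ifft(np.fft.fft(field) * diffraction)                      # diffraction
        field = field * np.exp(1j * k * kerr_strength * np.abs(field) ** 2 * dz)  # Kerr mask
        history.append(mean_abs_position(field, x))
    return np.array(history)

out_of_phase = track_pair(np.pi)     # relative phase of 180 degrees
in_phase = track_pair(0.0)           # identical phase
print("out of phase: start", round(float(out_of_phase[0]), 1),
      "end", round(float(out_of_phase[-1]), 1))
print("in phase:     start", round(float(in_phase[0]), 1),
      "minimum", round(float(in_phase.min()), 1))
```

In this model the out-of-phase pair drifts apart monotonically, while the in-phase pair is drawn together and merges into a single peak before separating again. The distances at which these events occur depend on the assumed widths and separation and need not match those in the figures.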
Bouncing soliton
Consider the rectangular channel waveguide depicted in Figure 49.9. The guiding
channel has length = 40λ, width = 5λ, and refractive index n1 = 1.5056, while the
index of the cladding glass is n0 = 1.50. The core and cladding materials are
assumed to have the same nonlinear (Kerr) coefficient n2. An elliptically shaped
Gaussian beam is launched with a slight sideways tilt into this waveguide. (The tilt
is simulated by imposing on the beam a linearly varying phase along the X-axis.)
As before, the injected beam initially expands to fill the channel in the Y-direction,
while simultaneously contracting along X to form a soliton (see Figure 49.10).
However, the sideways tilt of the injected beam propels the soliton towards the
right-hand side.

Figure 49.9 Rectangular channel waveguide having length = 40λ and
width = 5λ. The cladding glass has index of refraction n0 = 1.50, whereas the
index of the guiding channel is n1 = 1.5056. The core and cladding materials
have the same nonlinear (Kerr) coefficient n2.
In Figure 49.10, from top to bottom, the displayed cross-sectional intensity
patterns correspond to propagation distances z/λ = 0, 100, 200, 300, 500, 650, 800
(left column), and z/λ = 1050, 1100, 1350, 1550, 1800, 2100, 2400 (right
column). When the soliton encounters the channel wall on the right-hand side, it is
squeezed against the wall, then bounces back. Subsequently, it moves towards the
left wall, gets squeezed, and bounces back again. This pattern of behavior is
repeated indefinitely as the beam propagates along the Z-axis. (The peak intensity
reached in this simulation raises the local refractive index by Δn = 0.0022.)
Thus spatial solitons exhibit a particle-like behavior, retaining their identity
even after interactions with each other or with the channel walls. This property
underlies their potential utility as information-carrying bits in all-optical
switching applications.7,8
Figure 49.11 is a plot of the total optical power along the Z-axis for the
bouncing soliton depicted in Figure 49.10. Note in particular that no loss of
power occurs when the soliton encounters the side walls of the channel. This is
what one would expect based on the principle of total internal reflection.
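A rough ray-optics estimate indicates how small the sideways tilt must be for total internal reflection at the channel walls. The sketch uses only the linear indices quoted above and ignores the Kerr contribution to the index as well as the finite width of the soliton, so it should be read as an order-of-magnitude guide rather than a property of the simulation.

```python
import math

n_core, n_clad = 1.5056, 1.50
# Largest grazing angle (measured from the channel wall) for total internal reflection.
max_grazing_angle = math.degrees(math.acos(n_clad / n_core))
print(f"{max_grazing_angle:.2f} degrees")
```

On this estimate, a ray inside the channel is totally reflected as long as it meets the wall at a grazing angle below roughly 5°, which is consistent with the "slight sideways tilt" used in the simulation.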
Figure 49.10 Elliptically shaped Gaussian beam launched sideways into the
channel waveguide of Figure 49.9 forms a bouncing soliton. From top to bottom,
the cross-sectional intensity patterns correspond to propagation distances z/λ = 0,
100, 200, 300, 500, 650, 800 (left column), and z/λ = 1050, 1100, 1350, 1550,
1800, 2100, 2400 (right column). The injected beam (left column, top) has a
Gaussian amplitude profile, similar to that in Figure 49.3, but it is also modulated
by a linear phase along the X-axis, which gives its motion a slight tilt toward the
right-hand side. As before, the soliton forms and propagates along the Z-axis, but
it slowly drifts to the right. Upon encountering a channel wall, the soliton is
squeezed against the wall, then bounces back. The horizontal axis (x/λ) spans
–25 to 25.

Figure 49.11 Total optical power P (as a fraction of the input power P0)
plotted versus z/λ for the bouncing soliton whose behavior is depicted in
Figure 49.10. (Vertical axis: P/P0, from 0.5 to 1.0; horizontal axis: z/λ, from 0
to 2400.)

Concluding remarks
The simulations reported in this chapter are quite stable and yield similar
solitonic behaviors under diverse conditions. For example, in all cases considered,
the optical nonlinearity was placed uniformly in the entire waveguide, that is, the
guide and the cladding layers had the same coefficient of nonlinearity (n2). In
general, this is not necessary and one can simulate situations where, for instance,
nonlinearity is present in the guiding layer only, without causing any significant
modification of the results.

1 R. Y. Chiao, E. Garmire and C. H. Townes, Phys. Rev. Lett. 13, 479 (1964).
2 V. E. Zakharov and A. B. Shabat, Sov. Phys. JETP 34, 62 (1972).
3 S. Maneuf, R. Desailly, C. Froehly, Stable self-trapping of laser beams: Observation
in a nonlinear planar waveguide, Optics Communications 65, 193–198 (1988).
4 J. S. Aitchison, Y. Silberberg, A. M. Weiner, et al., Spatial optical solitons in planar
glass waveguides, J. Opt. Soc. Am. B 8, 1290–1297 (1991).
5 G. I. Stegeman, The growing family of spatial solitons, Optica Applicata 26, 239–248
(1996).
6 Special issue of Optical and Quantum Electronics on Spatial Solitons, Vol. 30, No. 10,
published by Chapman & Hall, London, October 1998.
7 A. Aceves, P. Varatharajah, A. C. Newell, et al., Particle aspects of collimated light
channel propagation at nonlinear interfaces and in waveguides, J. Opt. Soc. Am. B 7,
963–974 (1990).
8 A. C. Newell and J. V. Moloney, Nonlinear Optics, Addison-Wesley, Redwood City,
California (1992).
50
Laser heating of multilayer stacks
Laser beams can deliver controlled doses of optical energy to specific locations on
an object, thereby creating hot spots that can melt, anneal, ablate, or otherwise
modify the local properties of a given substance. Applications include laser cutting,
micro-machining, selective annealing, surface texturing, biological tissue treat-
ment, laser surgery, and optical recording. There are also situations, as in the case
of laser mirrors, where the temperature rise is an unavoidable consequence of the
system’s operating conditions. In all the above cases the processes of light
absorption and heat diffusion must be fully analyzed in order to optimize the
performance of the system and/or to avoid catastrophic failure.
The physics of laser heating involves the absorption of optical energy and
its conversion to heat by the sample, followed by diffusion and redistribution
of this thermal energy through the volume of the material. When the sample is
inhomogeneous (as when it consists of several layers having different optical
and thermal properties) the absorption and diffusion processes become quite
complex, giving rise to interesting temperature profiles throughout the body
of the sample. This chapter describes some of the phenomena that occur in
thin-film stacks subjected to localized irradiation. We confine our attention to
examples from the field of optical data storage, but the selected examples
have many features in common with problems in other areas, and it is hoped
that the reader will find this analysis useful in understanding a variety of similar
situations.
Magneto-optical disk
The cross-section of a quadrilayer magneto-optical (MO) disk, optimized for
operation at λ = 400 nm, is shown in Figure 50.1. (GaN-based semiconductor
diode lasers operating at these blue and violet wavelengths are becoming
commercially available, and optical disk systems are expected to take advantage of
this development by switching to blue or violet lasers within the next two to three
years.) The quadrilayer of Figure 50.1 is deposited on a plastic substrate and
consists of a thin magnetic film sandwiched between two transparent dielectric
layers, capped by a thin layer of an aluminum alloy.1,2 The optical and thermal
constants of the various layers of this stack are listed in Table 50.1.
[Figure 50.1 diagram, top to bottom: Electromagnet; 30 nm Aluminum alloy;
30 nm Dielectric (SiN); 10 nm Magnetic film (TbFeCo); 45 nm Dielectric (SiN);
Substrate (Polycarbonate); Focused laser beam. The radial coordinate is r.]
Figure 50.1 Quadrilayer stack of a magneto-optical disk. The electromagnet
applies a magnetic field Bz(t) in the Z-direction, which is also the easy axis of
magnetization of the magnetic layer. The laser beam, focused on the magnetic
film through the substrate, acts as a heat source during recording.
The focused laser beam arrives at the magnetic layer from the substrate side.
This quadrilayer is designed to have a reflectivity of 9%, and has a fairly large
polar MO Kerr signal (polarization ellipticity ηK and Kerr rotation angle θK of
magnitudes 1.55° and 0.24°, respectively, with signs that depend on whether the
storage layer is magnetized up or down). Aside from contributing to the optical
properties of the stack, the aluminum layer acts as a heat sink, and the upper
dielectric layer is thin enough to provide good thermal coupling between the two
metallic layers.1,2
Table 50.1. Optical and thermal constants of the various materials used in the
calculations

Material                    Refractive index   Dielectric tensor     Specific heat   Thermal conductivity
                            n + ik             ε, ε′                 C               K
                            (λ = 0.4 µm)       (λ = 0.4 µm)          (J/cm³ °C)      (J/cm s °C)
Polycarbonate (substrate)   1.6                —                     1.4             0.0025
Aluminum alloy              0.50 + 4.85i       —                     2.4             0.75
Tb21Fe72Co7 (amorphous      2.33 + 3.45i       ε = −6.46 + 16.11i    2.9             0.10
  ferrimagnet)                                 ε′ = 0.185 − 0.233i
SiN (dielectric)            2.2                —                     2.5             0.030
Ge2Sb2Te5 (amorphous)       2.9 + 2.5i         —                     1.3             0.002
Ge2Sb2Te5 (polycrystal)     2.0 + 3.6i         —                     1.3             0.005
ZnS–SiO2 (dielectric)       2.2                —                     2               0.006
Figure 50.2 shows the intensity profile of the focused spot at the storage layer
of the disk. The assumed objective lens that brings the laser light to focus in this
case is free from all aberrations, is corrected for the thickness of the substrate,
and has NA = 0.8, f = 1.5 mm. The collimated Gaussian beam entering the lens
has 1/e (amplitude) radius r0 = 1.2 mm, which is the same as the radius of the
objective’s entrance pupil. The distribution of Figure 50.2(a) is displayed on a
logarithmic scale to enhance the diffraction rings caused by truncation of the
beam at the objective’s aperture. The radial profile of the spot, depicted on a
linear scale in Figure 50.2(b), reveals that the rings are quite weak, however,
and thus incapable of producing much heat at the periphery of the central
bright spot.
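As a rough consistency check on the lens parameters quoted above, the following sketch (plain Python; the variable names are illustrative and do not come from the text) computes the entrance-pupil radius f·NA implied by the sine condition and the first-null diameter 1.22λ/NA of the Airy pattern for a uniformly filled aperture. The beam considered here is a Gaussian truncated at its 1/e amplitude radius rather than a uniform beam, so the Airy figure is only an estimate of the actual spot size.

```python
# Focusing geometry for the objective described in the text.
wavelength_um = 0.4        # vacuum wavelength, micrometers
numerical_aperture = 0.8
focal_length_mm = 1.5

# For an aberration-free lens obeying the sine condition,
# the entrance-pupil radius is f * NA.
pupil_radius_mm = focal_length_mm * numerical_aperture

# First-null diameter of the Airy pattern for a uniformly filled aperture.
airy_diameter_um = 1.22 * wavelength_um / numerical_aperture

print(f"entrance-pupil radius = {pupil_radius_mm:.2f} mm")
print(f"Airy disk diameter    = {airy_diameter_um:.2f} um")
```

The pupil radius comes out equal to the stated beam radius r0, and the Airy diameter is close to the value quoted in the caption of Figure 50.2.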
Figure 50.3 is a plot of the magnitude of the Poynting vector, S, along the
Z-axis for a plane wave normally incident on the quadrilayer stack of Figure 50.1
through the substrate.2 The horizontal axis depicts the distance from the top of the
stack. Thus S is seen to be constant in the two dielectric layers (30 < z < 60 nm
and 70 < z < 115 nm), indicating no optical absorption in these regions. Most of
the absorption takes place in the magnetic film (60 < z < 70 nm); a very small
fraction of the incident energy goes to the aluminum layer (0 < z < 30 nm). The
optical energy thus deposited in the magnetic film raises the local temperature
immediately, but soon thermal diffusion takes over and carries the heat to other
regions of the stack.
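The energy budget described above can be estimated with the standard characteristic-matrix method for a plane wave at normal incidence. The sketch below is a simplified stand-in for the calculation behind Figure 50.3, not the author's program. It takes the refractive indices from Table 50.1 and makes three assumptions: the small magneto-optical (off-diagonal) term ε′ is ignored, the substrate is treated as a semi-infinite medium of incidence, and the medium beyond the aluminum layer is air. The function and variable names are hypothetical. The layers are listed in the order the light meets them, which is the reverse of the z-axis of Figure 50.3.

```python
import cmath
import math

WAVELENGTH_NM = 400.0

# (name, complex refractive index n + ik, thickness in nm), listed in the
# order the light meets them: substrate side first, aluminum last.
LAYERS = [
    ("SiN (lower)", 2.2 + 0.0j, 45.0),
    ("TbFeCo", 2.33 + 3.45j, 10.0),
    ("SiN (upper)", 2.2 + 0.0j, 30.0),
    ("Al alloy", 0.50 + 4.85j, 30.0),
]
N_INCIDENT = 1.6   # polycarbonate substrate, assumed lossless and semi-infinite
N_EXIT = 1.0       # air beyond the aluminum layer (assumption)


def stack_energy_budget(layers, n_in, n_out, wavelength_nm):
    """Return (R, T, absorbed-per-layer) for a plane wave at normal incidence."""
    # Start from a unit transmitted field in the exit medium and step the
    # tangential fields (E, H) back toward the incidence side.
    e_field, h_field = 1.0 + 0.0j, complex(n_out)
    flux = [(e_field * h_field.conjugate()).real]   # net flux at each interface
    for _, n, d in reversed(layers):
        delta = 2.0 * math.pi * n * d / wavelength_nm
        c, s = cmath.cos(delta), cmath.sin(delta)
        e_field, h_field = (c * e_field - 1j * s / n * h_field,
                            -1j * n * s * e_field + c * h_field)
        flux.append((e_field * h_field.conjugate()).real)
    # Split the field at the first interface into incident and reflected waves.
    incident = 0.5 * (e_field + h_field / n_in)
    reflected = 0.5 * (e_field - h_field / n_in)
    incident_flux = n_in * abs(incident) ** 2
    flux = [f / incident_flux for f in reversed(flux)]   # incidence side first
    reflectance = abs(reflected / incident) ** 2
    absorbed = {name: flux[i] - flux[i + 1]
                for i, (name, _, _) in enumerate(layers)}
    return reflectance, flux[-1], absorbed


R, T, A = stack_energy_budget(LAYERS, N_INCIDENT, N_EXIT, WAVELENGTH_NM)
print(f"reflected   : {100 * R:5.1f} %")
for layer_name, fraction in A.items():
    print(f"{layer_name:12s}: {100 * fraction:5.1f} %")
print(f"transmitted : {100 * T:5.1f} %")
```

Under these assumptions the reflected and absorbed fractions should come out close to the 9%, 88%, and 3% quoted in the caption of Figure 50.3, with the two dielectric layers absorbing nothing.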
[Figure 50.2 plots: (a) intensity map, x from −1 to +1 µm; (b) normalized
intensity versus radius, 0 to 0.6 µm. Plot labels: λ = 0.4 µm, r0 = 1.2 mm,
NA = 0.8, f = 1.5 mm.]
Figure 50.2 Distribution of total E-field intensity, |Ex|² + |Ey|² + |Ez|², at the
focal plane of a 0.8NA objective. The incident Gaussian beam (λ = 0.4 µm) is
truncated by the lens aperture at its 1/e (amplitude) radius. For simplicity’s
sake, the beam is assumed to be circularly polarized, so that it would yield a
circularly symmetric spot at the focal plane. (a) Logarithmic plot of intensity,
showing an Airy disk diameter ≈ 0.6 µm and FWHM ≈ 0.3 µm. (b) Radial
intensity profile.
[Figure 50.3 plot: S (0 to 1.0) versus z (nm), 0 to 100 nm.]
Figure 50.3 The magnitude S of the Poynting vector along the Z-axis, plotted
through the thickness of the quadrilayer of Figure 50.1. The incident beam
(λ = 0.4 µm) is assumed to have unit power. Upon entering the stack S = 0.91,
which indicates that 9% of the incident optical energy is reflected at the substrate
interface. Approximately 3% of the energy goes to the aluminum layer and the
remaining 88% is absorbed by the magnetic film.
Heat diffusion in the stationary stack

To describe the temperature distribution within the stack, we need to specify the
time dependence of the incident laser power, P(t). Figure 50.4 shows three such
functions used in the examples throughout this chapter. The first function, P1(t),
is a 1 mW trapezoidal pulse with 55 ns duration and 5 ns rise and fall times. We
examine the effect of this pulse on the quadrilayer of Figure 50.1, when the stack
is stationary.
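A trapezoidal pulse such as P1(t) is simple to write down explicitly. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the quoted duration is the full base width of the trapezoid, rise and fall included, so that P1(t) rises from 0 to 5 ns, stays flat until 50 ns, and falls to zero at 55 ns.

```python
def trapezoid_pulse(t_ns, peak_mw, duration_ns, rise_ns, fall_ns, start_ns=0.0):
    """Laser power (mW) at time t_ns for a single trapezoidal pulse.

    Assumption: duration_ns is the full base width, rise and fall included.
    """
    t = t_ns - start_ns
    if t <= 0.0 or t >= duration_ns:
        return 0.0
    if t < rise_ns:
        return peak_mw * t / rise_ns                  # rising edge
    if t > duration_ns - fall_ns:
        return peak_mw * (duration_ns - t) / fall_ns  # falling edge
    return peak_mw                                    # flat top


def pulse_energy_pj(peak_mw, duration_ns, rise_ns, fall_ns):
    """Area under the trapezoid; 1 mW x 1 ns = 1 pJ."""
    return peak_mw * (duration_ns - 0.5 * (rise_ns + fall_ns))


print(trapezoid_pulse(2.5, 1.0, 55.0, 5.0, 5.0))   # halfway up the rising edge
print(pulse_energy_pj(1.0, 55.0, 5.0, 5.0))        # energy of P1 in pJ
```

With these assumptions P1(t) delivers 50 pJ to the disk, of which only the non-reflected fraction is converted to heat in the stack.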
Figure 50.5 shows the profiles of temperature versus z at the beam center,
r = 0; the various curves correspond to different instants of time. Early on, at
t = 10 ns, the magnetic film is at a relatively high temperature, the aluminum
layer has uniform temperature through its thickness, and there is a large thermal
gradient between the magnetic film and the aluminum layer. It is through
this temperature gradient that heat is transferred from the magnetic layer to the
aluminum heat sink.3 A gradient has been established also in the lower dielectric
layer between the magnetic film and the substrate. The temperature in the
substrate is seen to decay exponentially with z.
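The qualitative behavior of these profiles can be anticipated from an order-of-magnitude estimate of how far heat spreads in each material during a given time. The sketch below uses the constants of Table 50.1 to compute the thermal diffusivity D = K/C and the characteristic length √(Dt). The choice of √(Dt) rather than, say, √(4Dt) is a convention adopted here for illustration, and the function name is hypothetical.

```python
import math

# (specific heat C in J/cm^3/°C, thermal conductivity K in J/cm/s/°C), Table 50.1
MATERIALS = {
    "polycarbonate": (1.4, 0.0025),
    "aluminum alloy": (2.4, 0.75),
    "TbFeCo": (2.9, 0.10),
    "SiN": (2.5, 0.030),
}


def diffusion_length_nm(specific_heat, conductivity, time_ns):
    """Characteristic length sqrt(D t), with diffusivity D = K / C in cm^2/s."""
    diffusivity = conductivity / specific_heat
    length_cm = math.sqrt(diffusivity * time_ns * 1e-9)
    return length_cm * 1e7   # 1 cm = 1e7 nm


for name, (c, k) in MATERIALS.items():
    print(f"{name:15s} D = {k / c:.4f} cm^2/s, "
          f"sqrt(D t) at 10 ns = {diffusion_length_nm(c, k, 10.0):7.1f} nm")
```

At t = 10 ns the estimate for the aluminum alloy is several hundred nanometers, far more than the 30 nm thickness of the layer, which is consistent with its uniform temperature. For the polycarbonate substrate the estimate is only a few tens of nanometers, so the heat remains close to the interface.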
[Figure 50.4 plots: P1 (mW), P2 (mW), and P3 (mW) versus time, 0 to 75 ns.]
Figure 50.4 The functions representing laser power versus time that are
used in the various examples: P1(t) is a 55 ns trapezoidal pulse with 5 ns rise
and fall times; P2(t) is a sequence of five identical pulses, each with a 5 ns
duration, 1 ns rise and fall times, and a center-to-center spacing of 10 ns;
P3(t) is a fairly complex pattern of three-level pulses, used in phase-change
recording.
At later times during the heating cycle (i.e., t = 30 ns and 50 ns) the patterns are
similar to that at t = 10 ns, but the temperatures are higher. Once the laser is turned
off the temperatures drop abruptly. At t = 55 ns, the magnetic film is already
cooling down, and the heat is moving to the substrate. The hottest spot at this point
is somewhere in the substrate, close to the interface with the lower dielectric layer.
At t = 60 ns, the cooling has progressed further, and the heat is rapidly spreading
through the substrate. By t = 100 ns, the temperature everywhere is essentially back
to the ambient temperature. Although only the z-profiles are shown here, it should
be remembered that the heat diffuses radially as well; not only does the heat move
to the substrate, but also it spreads radially throughout the entire stack.3
[Figure 50.5 plot: temperature above ambient (°C), 0 to 150, versus Z (nm),
0 to 400 nm; curves labeled 10, 30, 50, 55, 60, and 100 ns.]
Figure 50.5 Computed temperature profiles along the Z-axis at the beam
center, r = 0, for the stack of Figure 50.1 illuminated by the focused beam of
Figure 50.2 and the pulse P1(t) of Figure 50.4. The profiles at t = 10, 30, and
50 ns represent the heating period; cooling-period profiles are shown for t = 55,
60, and 100 ns. By virtue of its strong absorption of the incident optical energy,
the magnetic film is the hottest region during heating. The high thermal
conductivity of the aluminum layer gives it a fairly uniform temperature through
its thickness. As soon as the laser is turned off, the temperatures drop rapidly,
and the peak temperature shifts to the substrate.
Next we consider the profiles of temperature versus time in the magnetic layer.
Figure 50.6 shows several profiles at different distances r from the beam’s center,
starting at r = 0 and increasing in steps of Δr = 50 nm to r = 1 µm. At r = 0, the
temperature reaches its highest value at the end of the pulse, then decays quickly
and, in the span of a few nanoseconds after the laser turn-off, goes down by
almost an order of magnitude. At larger radii, the temperature is slow to rise, and
it also peaks somewhat after the laser is turned off. The reason for this behavior is
that the focused spot at these radii is rather weak, and the heat does not arrive
there directly from the laser, but by radial diffusion from the central region,
which is under intense illumination.
[Figure 50.6 plot: temperature above ambient (°C), 0 to 150, versus time,
0 to 100 ns; curves from r = 0 (top) to r = 1 µm (bottom).]
Figure 50.6 Computed profiles of temperature versus time in the magnetic layer
under the same conditions as in Figure 50.5. Different curves correspond to
different radial distances from the beam center (in steps of Δr = 50 nm), the largest
temperature occurring in the center at r = 0 and the lowest temperatures belonging
to r = 1 µm. As soon as the laser is turned down at t = 50 ns temperatures near the
beam center drop sharply, but at larger radii, because of radial heat diffusion from
the center, T continues to rise for a while after the laser is turned off.
Recording by magnetic field modulation

The electromagnet (EM) above the quadrilayer stack of Figure 50.1 provides a
switched magnetic field Bz(t) between ±Bmax, thus helping to set the direction of
magnetization of the hot spot within the magnetic layer.4,5 (Bmax is typically of
the order of a hundred oersteds.) To ensure proper alignment of the focused spot
with the EM’s pole piece, the EM is rigidly attached across the disk to the optical
head. (The optical head is the assembly of the laser and other optical, mechanical,
and electronic elements that guide the laser beam to the disk and back to the
detectors.) The disk moves at a constant velocity V in the space between the EM
and the optical head. Typically, it spins at a fixed angular velocity, say, 6000 rpm,
which, at a radius of 40 mm on a 3.5 inch diameter disk, corresponds to a linear
track velocity V = 25 m/s. Since the information track is usually a continuous
spiral from the inner to the outer radius of the disk, the combined optical and
magnetic head assembly must follow this track by slow, continuous, travel along
the disk’s radial direction.1,2
In a currently popular recording scheme, the laser is pulsed at a fixed rate to
produce a sequence of identical hot spots in the magnetic layer.6 The binary
information to be recorded on the disk is fed to the EM, which switches the
magnetization of the hot spot between the “up” and “down” stable states. The
switching rate must be rapid enough to provide a high data-transfer rate into
the recording medium. This requires a compact EM capable of flying very close
to the magnetic layer, lest its inductance becomes too large. If the recorded marks
are to be 0.25 µm long in the direction of the track, the laser must be pulsed at
10 ns intervals, in which case Bz(t) must switch between ±Bmax with rise and fall
times of only a few nanoseconds. Such fast magnetic heads are currently at the
forefront of conventional magnetic recording technology (i.e., hard-disk drives),
but they require further development in order to be suitable for future generations
of MO drives.
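The track velocity and pulse interval quoted in this section follow from elementary kinematics. The short sketch below (with illustrative variable names) reproduces them from the spin rate, the track radius, and the desired mark length.

```python
import math

rpm = 6000.0
track_radius_m = 0.040       # 40 mm
mark_length_m = 0.25e-6      # 0.25 um

revolutions_per_second = rpm / 60.0
track_velocity = 2.0 * math.pi * track_radius_m * revolutions_per_second  # m/s

# Time taken by the disk to travel one mark length.
pulse_interval_ns = mark_length_m / track_velocity * 1e9

print(f"V = {track_velocity:.1f} m/s")
print(f"pulse interval = {pulse_interval_ns:.1f} ns")
```

The results agree with the rounded values V = 25 m/s and 10 ns used in the text.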
Consider the quadrilayer disk of Figure 50.1 moving at V = 25 m/s under
the focused beam of Figure 50.2, modulated with the pulse sequence P2(t) of
Figure 50.4. These five pulses are each 5 ns wide, have 1 ns rise and fall times, and
reach a peak power of 3 mW. The assumed ambient temperature is 25 °C. Figure 50.7
shows several isotherms at the critical temperature Tcrit = 175 °C in the magnetic
film during the period 0 ≤ t ≤ 50 ns. (The maximum temperature of the magnetic
film during the same period is Tmax = 300 °C.) To a good approximation, the
magnetic dipoles of the storage layer align with the field of the EM in those regions
where T ≥ Tcrit, but the EM is unable to reorient these dipoles where T < Tcrit.4,5,6
[Figure 50.7 plot: isotherms at 175 °C in the XY-plane, X from 0 to 1200 nm,
Y from −200 to +200 nm; Tmax = 300 °C.]
Figure 50.7 Computed isotherms in the magnetic layer of Figure 50.1, when
the multilayer is subjected to the focused spot of Figure 50.2 and the pulse
sequence P2(t) of Figure 50.4. The ambient temperature is 25 °C, and the disk
moves at V = 25 m/s along the X-axis. (In the reference frame of the disk, the
focused spot moves from left to right.) The maximum temperature during this
period is Tmax = 300 °C, reached at t = 44 ns. All depicted isotherms are at
T = 175 °C, plotted at Δt = 1 ns intervals whenever T ≥ 175 °C. The solid
(broken) curves represent the heating (cooling) phase of each pulse. Because of
the lateral heat diffusion, each pulse produces slightly larger isotherms than the
preceding one, but by the end of the fifth pulse this process reaches a steady
state. During the 10 ns period of each pulse the disk moves by Δx = 0.25 µm,
which is the minimum mark-length that can be recorded in this example.
The isotherms in Figure 50.7 are plotted at Δt = 1 ns intervals whenever T ≥ Tcrit
in the MO film. When the temperature is on the rise the isotherms are shown as
solid lines, but as broken lines when the temperature is declining. By the end of the
fifth pulse the temperature profiles reach the steady state (i.e., there are no
significant variations from one set of isotherms to the next). In between the adjacent
pulses the temperatures everywhere drop below Tcrit, so that for about 4 ns before
and after each pulse the entire magnetic film is at T < Tcrit. This cooling period is
crucial for thermomagnetic recording by magnetic-field modulation, because as
long as Bz(t) is saturated at ±Bmax, the magnetization state of the disk is well
defined (either up or down). But when the field is in transition, there is a short time
interval during which Bz(t) is weak and, therefore, the magnetization orientation is
uncertain. This problem is overcome by keeping the temperature below Tcrit during
the transition period, in which case no changes occur in the disk’s magnetic state
and, consequently, the recorded domains acquire sharp boundaries.6 For these
reasons it is imperative to have a quadrilayer design, such as that of Figure 50.1,
that cools down below Tcrit in between adjacent laser pulses.
Phase-change optical recording

The general structure of a phase-change (PC) disk is similar to that of an MO
disk. Figure 50.8 shows the cross-section of a quadrilayer PC stack optimized for
operation at λ = 400 nm. The optical and thermal constants of this stack are listed
in Table 50.1. The Ge2Sb2Te5 material can be switched between amorphous and
(poly)crystalline states by the laser beam: melting at Tmelt ≈ 625 °C followed by
rapid quenching results in an amorphous mark, whereas annealing for a reasonable
length of time above the glass transition temperature (Tglass ≈ 150 °C) returns
the material to the crystalline state.7,8,9 The stack shown in Figure 50.8 has
reflectivities Rc = 30% and Ra = 8% for the crystalline and amorphous phases of
the PC film. Note also that the thermal constants for the two phases are somewhat
different. In the following analysis we ignore these differences by assuming the PC
layer to be crystalline at all times. Also to simplify the calculations further, we
ignore the heats of melting and crystallization. These are reasonable approximations,
but the final results may need slight corrections if more accuracy is desired.
[Figure 50.8 diagram, top to bottom: 50 nm Aluminum alloy; 25 nm Dielectric
(ZnS–SiO2); 20 nm Phase-change (GeSbTe); 50 nm Dielectric (ZnS–SiO2);
Substrate (Polycarbonate).]
Figure 50.8 A quadrilayer stack designed for through-substrate phase-change
recording at λ = 400 nm. The reflectance of the stack Ra ≈ 8% when the Ge2Sb2Te5
film is amorphous, and Rc ≈ 30% when the film is crystalline. The absorbed optical
power in the aluminum layer in either case is only about 1%. Thus, for all practical
purposes, the fraction of the incident power that is not reflected is entirely absorbed
in the PC layer.
The laser pulse sequence applied to this sample is P3(t), shown in Figure 50.4.
Here the laser operates at three different power levels. At the highest level the
pulse is strong enough to melt the PC film. In the low-power regime, occurring
immediately after the melting pulse, the temperatures drop rapidly, causing the
quenching of the molten material into an amorphous state. The intermediate
power level is for annealing the pre-existing amorphous marks, which is required
when overwriting a previously written track. (Such tracks contain both amorphous
and crystalline regions, and it is necessary that all amorphous regions that are not
being melted be annealed into the crystalline state.)
Figure 50.9(a) shows the computed isotherms in the PC layer at T = Tmelt for a
disk speed V = 25 m/s and an ambient temperature of 25 °C. The solid and broken
isotherms as before represent the heating and cooling cycles, respectively. The
maximum temperature reached in this sample is Tmax = 1153 °C at t = 59 ns. The
isotherms are plotted at intervals Δt = 0.5 ns whenever T ≥ Tmelt in the PC film.
The first two molten regions are well separated from each other and from other
molten pools; these will eventually quench to form two small amorphous marks.
The cooling in these regions is rapid, and the temperatures return to the vicinity
of Tglass in about 5 ns.
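To get a feeling for how fast this quenching is, one can estimate the average cooling rate implied by the numbers above. The sketch below takes the melting point as the starting temperature, which is an assumption made here for simplicity; regions that were heated above the melting point cool even faster over the same interval.

```python
# Average cooling rate from the melting point to the glass transition
# temperature, using the values quoted in this section.
t_melt_c = 625.0         # melting temperature, °C
t_glass_c = 150.0        # glass transition temperature, °C
cooling_time_s = 5e-9    # "about 5 ns"

cooling_rate = (t_melt_c - t_glass_c) / cooling_time_s   # °C per second

print(f"average cooling rate = {cooling_rate:.1e} °C/s")
```

The estimate is of the order of 10^11 °C/s, which gives a sense of the quench rates involved in forming an amorphous mark.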
The third and fourth molten pools, however, have some degree of overlap.
(In practice two or more short overlapping marks such as these are used to create
a long mark.) The heat generated by the fourth pulse flows backward and affects
the amorphous region being formed in the wake of the third pulse. In general,
heat diffusion from the tail end of any long mark can anneal the leading edge as
well as the mid-sections of the same mark, causing partial crystallization. This
problem may be better appreciated by examining the T = 325 °C isotherms of the
same system, shown in Figure 50.9(b). Here the annealing pulses (i.e., the
medium-power levels of P3(t)) appear behind the first two marks well after they
have cooled. By then the annealed region is far enough from the previously
molten pools that there is no danger of recrystallization. In contrast, the two large
isotherms in Figure 50.9(b) corresponding to the third and fourth melting pulses
partially overlap, causing the formation of a small, undesirable crystallite in the
middle of the long amorphous mark. These are some of the issues with which
the designers of optical disk drives must grapple, in order to create robust and
reliable data storage systems.
[Figure 50.9 plots: (a) isotherms at 625 °C, Tmax = 1153 °C; (b) isotherms at
325 °C. In both panels X runs from 0 to 1800 nm and Y from −300 to +300 nm;
V = 25 m/s.]
Figure 50.9 Computed isotherms in the GeSbTe film of Figure 50.8, subjected
to the focused beam of Figure 50.2 and the pulse sequence P3(t) of Figure 50.4.
The ambient temperature is 25 °C, and the quadrilayer disk moves at V = 25 m/s
along the X-axis. The peak temperature Tmax = 1153 °C in the film is reached at
t = 59 ns. The solid (broken) isotherms correspond to the heating (cooling) phase
of each pulse. (a) Isotherms at T = 625 °C, the melting point of the PC material.
(b) Isotherms at T = 325 °C, the presumed (elevated) annealing temperature
given the short annealing time.

1 T. W. McDaniel and R. H. Victora, eds., Handbook of Magneto-optical Recording,
Noyes Publications, Westwood, New Jersey, 1997.
3 H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, Oxford University
Press, UK, 1954.
4 D. Chen, G. N. Otto, and F. M. Schmit, MnBi films for magneto-optic recording,
IEEE Trans. Magnet. MAG-9, 66–83 (1973).
5 Y. Mimura, N. Imamura, and T. Kobayashi, Magnetic properties and Curie point
writing in amorphous metallic films, IEEE Trans. Magnet. MAG-12, 779–781
(1976).
6 S. Yonezawa and M. Takahashi, Thermodynamic simulation of magnetic field
modulation methods for pulsed laser irradiation in magneto-optical disks, Appl. Opt.
33, 2333–2337 (1994).
7 S. R. Ovshinsky, Reversible electrical switching phenomena in disordered structures,
Phys. Rev. Lett. 21, 1450–1453 (1968).
8 J. Feinleib, J. deNeufville, S. C. Moss, and S. R. Ovshinsky, Rapid reversible light-
induced crystallization of amorphous semiconductors, Appl. Phys. Lett. 18, 254–257
(1971).
9 T. Ohta, M. Takenaga, N. Akahira, and T. Yamashita, Thermal change of optical
properties in some sub-oxide thin films, J. Appl. Phys. 53, 8497–8500 (1982).
Index
Abbe’s sine condition 9, 10, 14, 16, 20, 33, 39, 46, annular phase mask 552
528 anti-bunching 127
Abbe’s theory of image formation 23, 38 anti-guiding 494
aberrated wavefront 619 antireflection coated 11, 158, 346, 361, 388–391, 476,
aberration 9, 16, 20, 33, 45, 64, 160, 227, 310, 313, 518, 557, 634–635
351, 355, 362, 379, 381, 452, 482, 503, 525, 578, aperture
617, 658, 680 annular aperture 32, 542
aberration-free 10, 11, 16, 18, 19, 34, 48, 57, 227, aperture stop 503, 579
229, 261, 307, 388, 447, 494, 496, 499, 526, 602, circular aperture 26–28, 34, 41, 443, 448–449, 457,
616, 630 610–611
primary spherical aberration 618, 621, 622 clear aperture 10, 16, 447, 496, 587, 659
spherical aberration 227, 380, 381, 482, 486, 534, spiral aperture 372
541, 578, 604, 617–623, 627, 658 aplanat 18–19, 483
absorbing media 216, 222 aplanatic 16, 18, 21, 33–35, 46–51, 328, 389, 391,
absorption coefficient 128, 155, 204, 206, 605, 611, 528, 539, 541
632 aplanatic sphere 528
air gap 133, 144, 198, 238, 270, 275, 280, 383, 389, aplanatic system 16
395, 536 aplanatism 33, 528
Airy disk 62, 261, 316, 379, 517, 522, 526, 542, 547, apodization 379
681 apparent position of fixed star 310
Airy function 35, 41 aragonite 405
Airy pattern 10, 19, 33–35, 37, 45, 62, 69, 70, 379, arc lamp 62, 546, 554, 588, 626
517, 518, 522, 547 aspheric mirror 543
aluminum mirror 83, 564 aspherics 351
ambient temperature 684, 686, 688, 689 aspheric surface 485, 527, 543
amorphous 168, 680, 687, 688, 689 astigmatic 59, 490–491
amorphous mark 688, 689 astigmatic distance 490, 491
amplitude mask 373, 459, 549 astigmatism 363, 381, 481–483, 490–499, 503,
amplitude spectrum 102, 117, 118 580–581, 617
amplitude transmission function 70, 373 atomic dipole 211
analyzer 555–570, 633–638 atom optics 370
anamorphic magnification 499, 501–503 attenuated total internal reflection (TIR) 133, 385–388
anamorphic magnification factor 503 attenuation coefficient 449–458
anamorphic prisms 499 autocorrelation 78, 113, 115–116, 126, 241
angular discrimination 263 autocorrelation function 78, 113, 116
angular momentum 289, 301–309 average intensity 77, 91–92
angular resolution 37, 516
angular separation 114, 121, 124, 258–265, 508, Babinet’s principle 69
517–520, 567–569 backward propagated 478
angular spectrum 135, 178, 233, 264, 381 backward propagation 53
angular spectrum decomposition 233 baseball pattern 136–137, 315–316, 531, 533,
anisotropic polarizability 655 539–540
annular light source 552 baseline 516–520
beam decenter 483–487 classical mount 326–328, 332, 336, 338–341,
beam propagation method (BPM) 459, 492, 658, 665, 346–349
667 classical source of light 127
split-step BPM 459–460, 658 coefficient of nonlinearity 676
beam-splitter (BS) 79, 126, 182–187, 194, 201, 224, coherence
231–234, 268, 297, 464, 516, 525, 546, 564, 567, coherence factor 590
619, 624, 644 coherence length 74, 78, 80–81, 93, 113, 624–626
beam tilt 480–488 coherence theory 88, 505
beam waist 55–56, 290, 295, 320, 490 coherence time 64, 545, 547
beat frequency 195 first-order coherence 77, 89, 91–92, 95, 113, 117,
bending of polarization vector 46 124
Bessel beam 32 degree of coherence 88–89, 95, 586, 587
best focus 226–228, 358–362, 525–530, 535–542, mutual coherence 100, 103
580–583 partial coherence 586
bias phase 568–574 partial spatial coherence 556
biaxial birefringent crystal 405–408, 413 temporal coherence 74–79, 88, 93, 113, 555
binary intensity mask (BIM) 586, 588 temporally coherent 64, 566
binary star 121 coherent addition 213–214
birefringence 197, 201–203, 554, 563–565, 632 coherent illumination 62, 64, 66–69, 88, 367, 547–550
birefringent 110, 201–203, 235, 405–408, 413, 557, coherent and incoherent imaging 62, 88, 551, 582
563–565, 567 coherent image 68, 92, 93, 583
birefringent slab 110 coherent imaging 41, 42, 65, 68, 70, 582
birefringent substrate 557 coherent imaging system 42, 65
boundary condition 142, 149, 324–325, 447, 461, 600, coherent monochromatic light 408
606 coherent point source 64, 92
Bracewell, Ronald 61, 515, 524 coherent source 63, 64, 546, 548
Bracewell telescope 515, 519–522 colliding pulse ring laser 240
Bradley, James 310 collimated beam 5, 10, 21, 28, 65, 78, 100, 136, 201,
Bragg’s law 270, 317, 318 225, 307, 367, 382, 405, 476, 494, 506, 541, 551,
bright space 595 627, 633
Brewster’s angle 164, 218, 281–283, 381, 383, 389–390 collimator 160, 161, 351, 383, 495–503, 505–508, 526
collimated coherent illumination 67–69, 547
Calcite 110, 567 coma 5, 9–10, 16, 19, 379–381, 452, 457, 580–581,
catadioptric solid immersion lens (SIL) 541, 543 617–620
caustic 301, 479 third-order coma 5, 617, 619
cavity 177–178, 195, 197, 204, 207, 270, 447–457, primary coma 19, 452, 457, 619, 620
491–492, 602 comatic tail 452
central fringe 96, 98 comb function 375–377
channel waveguide 464–467, 673–675 compact disk 351, 534, 609
chaotic light 113, 116–119 complex amplitude distribution 16, 17, 25–28, 34, 52,
chaotic point source 124, 125 174, 228, 289, 371, 400, 448, 546, 584, 643
characteristic equation 141 complex degree of spatial coherence 96
charge coupled device 556, 627 compound microscope 9, 576
chirp cancellation 241, 248, 249, 252 compression 46, 86, 240–257, 499, 502
chirp-compensation 251 compression ratio 241, 248, 252, 253
chirped mirror 240 concave pit 604, 609, 610
chirped pulse 240, 247, 248 concentric ring pattern 370
chromatic aberration 355, 356, 527, 582 condenser 29, 30, 62–67, 546–557, 583–591, 625
chromatic dispersion 351 condenser stop 586, 587, 590, 591
chromeless 589, 592 conduction electrons 150
circle of least confusion 226–227 cone of light 11–18, 65–67, 136, 160, 200, 316, 372,
circular aperture 26–28, 34, 41, 443, 448, 449, 457, 388, 405, 415, 461, 517, 526, 530–540, 551, 560,
610, 611 579, 611, 637
circularization 499 confocal resonator 447, 451, 454, 457
circularly polarized 5, 12, 35, 107, 154, 167, conical mount 326, 327, 331–335, 340–346
202, 224, 229, 263, 305, 409, 413–416, conical refraction 404–408, 410, 413, 415
637, 681 conjugate plane 11, 14, 17
circular polarization 154–155, 168, 203–204, 208, conjugate wave 644, 645, 658, 651
230, 305, 410 conjugated object wave 649
cladding 139, 243, 255–257, 459–470, 478, 489–495, conoscopic 554, 556, 564
665–668, 671–676 conoscopic polarization microscopy 564
conservation of energy 211, 213, 215, 232, 235, 275 differential method of Chandezon 8, 138, 325, 350,
constitutive relation(s) 420, 599 544
construction wavelength 352, 354, 357, 361, 363, 364, differential polarization microscopy 558, 564
365 differential signal 175, 177, 398, 401, 402, 533
contact hole 590, 594, 595 differentiation theorem 60
contrast 62, 88, 96, 201, 315, 371, 389, 549–561, 564, DIFFRACT ix, xi, 2, 138, 350, 544
566, 571, 573, 588, 592, 615, 624, 625 diffracted order 65, 66, 134–136, 249, 270, 272, 315,
contrast enhancement 549–551 318, 324–338, 341–345, 356, 361–365, 387,
convection of light 310, 311, 320 614–616
convex pit 604, 608, 610 diffracted ray 360
convolution 35, 41, 522 diffraction 1, 9, 16, 23, 28, 44–52, 70, 336, 344, 355,
core 243, 459–468, 476, 478, 665, 666, 673, 674 382, 445, 447, 459, 461, 526, 545, 566, 570
co-rotating dielectric 191 classical diffraction 367, 459, 599
coupling efficiency 402, 476–488 classical theory of diffraction 23, 26, 45, 47
cover plate 528, 533–535, 604 diffraction-free beam 36, 44
critical angle 129, 142, 143, 255, 343, 379–383, 393, diffraction effect 301, 382, 571, 573, 655
401, 402, 537 diffraction efficiency 250, 324, 330–349, 354, 356
critical illumination 570 diffraction-limited 10, 35, 64, 172, 307, 460, 527,
critical TIR angle 133, 134, 379, 388, 391, 393 547, 584, 658
cross-correlation function 91, 100, 103, 123, 124 diffraction-limited focus 10, 36, 52, 336, 483, 525,
crossed analyzer 564, 569, 570 527, 539
crystalline 554, 687, 688 diffraction-limited spot 37, 328, 609
current loop 426, 433, 435, 436 diffraction order 21, 137, 249, 250, 316, 325–327,
curvature 10, 21, 31–33, 55, 59, 226, 250, 291–294, 331, 355, 531, 615
302, 358–366, 374, 480–488, 496, 502, 543, diffraction theory 25, 47, 367, 526, 532, 537, 599,
577–581, 617, 624–627, 655–658, 664 615
curvature phase factor 31, 32, 55, 366, 480, 481, 557 diffraction rings 680
cutoff frequency 64–67 scalar diffraction theory 25, 532
cycle-averaged intensity 113, 115, 119, 121, 126 vector diffraction 8, 47, 51, 138, 330, 345, 355,
cylindrical lens pair 496, 498, 499, 503 533, 537, 544
diffractive lens 351
dark soliton 665 diffractive optical element (DOE) 351, 352, 353, 355,
defocus 45, 48, 64, 227, 381, 482–487, 529, 541, 554, 356, 366
559–562, 570–573, 617, 663 diffractive propagation 303, 665
degree of first-order coherence 77, 117, 124 diffuse radiation 522
degree of second-order coherence 113, 114, 116, 119, diffusion 678, 684
127 heat diffusion 663, 678, 682, 685, 686, 688
DELTA 350, 544 lateral heat diffusion 686
delta function 27, 375–377 radial diffusion 684, 685
depolarization 107, 110, 156 thermal diffusion 663, 680
depolarized 105, 107 diode laser 351, 489–502, 584, 678
depth of focus 525–536, 541, 542, 586, 588 dipolar oscillation 211
detection module 177, 397, 398, 525, 638, 639 dipole radiation pattern 420
detector 78, 86, 113, 121–127, 176, 194, 268, 351, Dirac’s delta function 27
397, 402, 517, 522, 525, 526, 531, 532, 537, 556, directional coupler 467, 469–473
627–640, 685 dispersion 83, 240, 243–246, 254, 351, 600, 664, 665
diagonal element 154, 167 dispersive 81, 240, 241, 246–249, 257, 600
diamagnetic 154, 167 dispersive element 81, 248
dielectric constant 128, 131, 139–143 dispersive optical element 248
dielectric mirror 161, 177–179, 197–199, 206, 207, divergence-free 150, 419, 410, 435
274 divergence laws 420
dielectric slab 191, 192, 213, 215, 216, 218, 220, 254, Doppler shift 184–190, 310–320
279 double exposure 651, 652
dielectric stack 82, 235–237, 268, 276, 324 double-slit mask 505, 508–511
dielectric tensor 153, 162, 166, 171, 234, 235, 396 double star 508, 511
differential detection 176, 398, 638, 639 down-chirp 246
differential detection module 398, 638, 639 duty cycle 64, 65, 325, 326, 346
differential detector 402, 639
differential image 555, 558, 561 Earth’s rotation 182
differential interference contrast 566, 569 effective index 243
differential interference contrast microscope 566, 569 effective medium theory 333
E-field energy density 50, 51 far field 23, 30–32, 41, 52, 55, 294–298, 368,
eigenfunction 23, 60, 449 477–480, 584, 612, 616
eigenfunction of propagation in free space 60 far field (Fraunhofer) diffraction formula 480, 584
eigenvalue 449 far field pattern 31, 32, 368
electric charge 418, 419, 423, 429, 436, 439 fast axis 498
electric current 437, 489 fast Fourier transform (FFT) 26, 45, 302, 461
electric dipole 209, 216, 420, 421, 425–427, 429, femtosecond range 240
433–437 ferrimagnetic 154, 167
electric field intensity 2, 6, 205, 264, 307, 654, 665, ferromagnetic 154, 167
681 fiber 64, 182, 240, 241, 246, 248, 460–466, 475–478,
electromagnet 679, 685 482, 483, 485–489, 502, 503, 584
electromagnetic energy 292, 301 fiber bundle 64
electromagnetic field 23, 45, 88, 130, 145, 147, 149, fiber-optic gyroscope 182, 196
150, 209, 258, 325, 338, 381, 387, 388, 419, 447, field momenta 301
479, 599 field momentum density 305
electromagnetic radiation 209, 304, 324 field of view 17, 40, 41, 62, 522, 541, 568, 570, 580,
electromagnetic waves 47, 52, 133, 149, 234, 275, 584
387 filament 447, 463, 464, 502, 655, 662, 663
elegant solution of wave equation 60 filamentation 660, 663, 664, 667
ellipse of polarization 5, 7, 102, 105, 106, 203, 398, Finite Difference Time Domain (FDTD) 139, 151,
415, 632 418, 599
ellipsoid of birefringence 554, 564, 565 first-order beam 66, 67, 331, 338, 341, 616, 623
ellipsometry 632, 638 first order field coherence function 78
elliptical aperture 418–442, 503 Fizeau 505, 511, 513
ellipticity 5–7, 102–110, 155–170, 173, 179, 180, flow of heat 396
202–206, 389–399, 413–416, 498, 499, 556, 557, focal-shift phenomenon 46
559, 562, 679 f-number 379, 615, 627
emergent wavefront 10, 16, 19, 45, 364, 478, 479, focused cone 135, 136, 200, 406, 461, 517, 527, 560,
480, 657 564, 604, 611, 615, 637
energy flow pattern 139, 150 focused laser beam 301, 396, 397, 473, 609, 679
ensemble 75, 116 focused spot 10, 19, 20, 35, 37, 45, 136–138, 144,
ensemble average 75 183, 184, 261, 263, 265, 272, 297, 308, 309,
ergodic 88 315–317, 328, 336, 379, 380–382, 388, 396, 397,
ergodicity 116 415, 460, 476–483, 499–502, 525, 526, 528,
evanescent 25–27, 132, 143, 145, 147, 148, 255, 257, 533–535, 540, 561, 580, 602–605, 608–610, 623,
325, 381, 387, 388, 461, 467, 494, 495, 611–613 628, 630, 680, 684–686
evanescent beam 27 forward propagation 53
evanescent coupling 1, 387–393, 395, 398, 401, 402 four-corners problem 556–558, 560, 561, 563
evanescent wave 26, 132–134, 381, 382, 386, 387, Fourier coefficients 354
461 Fourier component 42, 242, 373
even mode 144–150, 467, 469 Fourier domain 27, 28, 41, 52, 89, 375–377, 546
Ewald-Oseen theorem 209, 214, 218, 220, 222 Fourier plane 27, 546, 548–551
extended incoherent source 92 Fourier optics 1, 23, 44, 45, 53, 73, 653
extended source 88, 91, 92 Fourier series 78, 115, 354, 375
extended waveform 75, 77, 81, 91 Fourier spectrum 144, 146, 241, 325
extended white light source 554 Fourier transform 23, 25–28, 31–38, 45, 48, 53, 60,
external conical refraction 404–407, 413 76, 90, 96, 116, 119–121, 124, 147, 242,
extinction rate 145 245–247, 258, 302, 375, 376, 380, 461, 549
extinction ratio 635, 637, 638, 640 Fourier transform lens 36
extinction theorem 1, 209, 213, 214, 216, 220–222 Fourier transform plane 38, 549
Fraunhofer (far field) distribution 31
Fabry–Pérot etalon 1, 197–204, 207, 248, 251, 263, free-space impedance 140, 246, 432
265, 270, 271 frequency domain 100, 121, 247
Fabry–Pérot interferometer 197, 205 frequency spectrum 75–77, 89, 90, 100, 117, 118,
Fabry–Pérot resonator 159–161 240, 241, 320
Faraday angle 154, 157 frequency sweep 240
Faraday effect 152–159, 162–164, 166 Fresnel’s coefficient(s) 141, 209, 210, 221, 222, 238,
longitudinal Faraday effect 162, 163 256, 559
Faraday medium 155, 156, 159, 160, 204–206, 208 Fresnel drag 182
Faraday rotation angle 155, 156, 208 Fresnel’s formula for the drag of light 321
Faraday rotator 203–205, 208, 224, 225, 230, 235 Fresnel-Kirchhoff diffraction integral 447
Fresnel number 26 grating period 21, 136, 250, 251, 318, 325–328,
Fresnel’s reflection coefficient 131, 141, 143, 167, 331, 334–346, 615, 616
209, 221, 256, 379, 389, 391, 400, 557, 559 metal grating 135, 136, 331
Fresnel’s reflection formula 128 metallic grating 324, 331, 446
Fresnel rhomb 103, 105 metallized grating 325, 326
Fresnel transmission coefficient 238, 537 ruled grating 323, 324, 337
fringes 68, 74, 88, 92–95, 183, 187, 190, 191, 195, transmission grating 330, 341, 344, 346
201, 298, 422, 427, 429, 436, 438, 440, 495, 505, two-dimensional grating 623
506, 508, 511, 513, 522, 614, 616, 617, 619, grating compressor 240, 241
624–627, 643–645, 649, 651 grazing incidence 283, 284, 288
fringe contrast 88, 96, 98, 625 groove 135–138, 272, 315–318, 324–333, 336–341,
fringe pattern 88, 93–96, 188, 261, 290, 291, 345, 346, 353, 531–533, 537–540, 629, 630
505–510, 519, 523, 618, 619, 625, 651 groove depth 315, 316, 325, 333, 346, 531
fringe periodicity 94 groove edge 136, 137, 531, 533
fringe shift 183, 187 group velocity 193, 243–245
fringe visibility 507–511 group velocity dispersion (GVD) 244, 245, 246, 248
frustrated total internal reflection (FTIR) 383, 388, guided mode 243, 255, 346, 443, 444, 461, 464, 466,
537 467, 472, 477, 491, 492, 494, 668
fused silica 154, 243, 244, 248, 584, 654 guiding layer 255–257, 489–495, 665–668, 676
gain-guided laser 490 half-wave plate 557, 558

gain layer 489–492 half-wave layer 208
gain medium 194, 195, 490–494 half-wave thickness 212, 217, 219, 232
Gale, Henry 182 Hall conductivity 420
Gauss–Hermite polynomials 296 Hall, Robert N. 489
Gauss–Laguerre polynomials 296 halogen lamp 554
Gaussian beam 3–6, 29, 52–60, 143–146, 289–298, Hamilton, Sir William Rowan 23, 404, 405
302–306, 319, 320, 358, 361–365, 460, 476–487, Hanbury Brown, Robert 114, 121, 127, 513
502, 655–659, 665–675, 680, 681 Harress, Francis 182
Gaussian optics 12 Hanbury Brown-Twiss experiment 113, 114, 127
generalized Gaussian beam 52, 58 heat sink 396, 679, 682
geometrical optics 14, 258, 301, 305, 620 heat source 679
geometric-optical ray 46, 301, 352, 477–479, 584 Heisenberg’s uncertainty relations 258
geometric-optical theory 615 helicity 295–298
hemispherical glass cap 530
giant star 509, 510 hemispherical glass substrate 129, 346, 634
Gires–Tournois (GT) resonator 251 hemispherical 129, 343–346, 530, 602, 603, 634, 635
Hermite–Gaussian beam 60
HeNe laser 74, 231, 233, 328, 405
glass ball 584 Hermite polynomial 60
glass cylinder 274 higher-order Gaussian beam 59
glass hemisphere 129, 133, 344, 383–391, higher-order mode 256, 450, 452, 464
602–607 high-resolution imaging 560
glass prism 143, 144, 150, 361, 363, 379, 388 hollow cone of light 391, 405, 551
glass sphere 539, 576–580, 584 hologram 289, 643–653
Goos–Hänchen effect 379, 380, 382 double exposure hologram 651, 653
Gouy phase 52, 56–60 rainbow hologram 643
gradient-index (GRIN) 277, 476, 477 holography 1, 642, 643, 646, 653
Gradium glass 486–488 Huygens principle 23
grating 1, 21, 48, 64–67, 134, 151, 240, 241, 248, hybrid design 355
249, 270, 272, 315, 316, 318, 323, 325, 327, 328,
333, 336, 341, 350, 351, 354, 387, 531, 540, illumination optics 62, 63, 547
615–618, 623 image contrast 371, 372, 555–558, 565, 573, 588, 589
amplitude grating 64, 65, 614 image-forming system 9, 16, 38, 298
blazed grating 336, 337 image plane 11–13, 16, 19, 21, 39–42, 63–71, 370,
dielectric-coated grating 346, 347 517–523, 545–552, 555, 556, 566, 571, 581, 584,
diffraction grating 1, 2, 21, 24, 48, 134–138, 248, 597, 648
270, 315, 316, 323, 324, 346, 350, 387, 531, 544, image quality 68, 298, 549, 580, 592
614, 615 imaging system 10, 11, 13, 16, 39, 41, 42, 62, 63, 65,
double-frequency grating 623 92, 546
echelette grating 337–341 immersion-oil microscopy 11
incandescent lamp 62, 91 longitudinal Kerr effect 168–172
incidence medium 283, 330, 356 polar Kerr effect 166, 168, 172
incoherent illumination 64–70, 548, 551 polar Kerr signal 174, 175
incoherent image 69, 583, 584 magneto-optical (MO) Kerr effect 2, 166, 167, 178,
incoherent light source 63, 92, 370, 555, 559, 560, 179, 397
569, 570, 586 transverse Kerr effect 166
index ellipsoid 406–408 Kerr ellipticity 397, 399
index-guided laser 490 Kerr nonlinearity 246, 664, 666
index-matched fluid 530, 535 Kerr rotation angle 180, 399, 402, 679
infinite conjugate 18, 20, 33, 35, 627 MO Kerr rotation 397
information storage 297 Kerr signal 172, 174, 175, 177
infrared 248, 324, 515, 516, 524, 655 MO Kerr signal 179, 402, 679
inhomogeneous plane wave 140, 149, 386 Kerr liquid 240
injection laser 489 Köhler illumination 570, 573
in-phase soliton pair 673 knife-edge method 620
instantaneous intensity 114 knife-edge test 621, 622
integrated intensity 46, 53, 172, 173, 401, 602, 605, Kretschmann configuration 142
606, 610, 612, 659
intensity autocorrelation 116, 126 land 325–328, 535, 532, 533, 537, 540
intensity fluctuation 91, 113, 114, 126, 127 land-groove 328
interfere 78, 79, 108, 183–188, 262, 268, 290, 291, laser 62, 74, 76, 113, 172, 195, 231, 248, 328, 402,
298, 299, 411, 422, 505, 521, 547, 566, 619, 461, 489, 491, 501, 588, 626, 655, 679, 684, 688
643–646, 649–652 laser beam 62, 113, 172, 176, 183, 301, 351, 396, 460,
interference 1, 29, 74, 75, 80, 92, 146, 147, 156, 158, 491, 497, 525, 547, 609, 621, 644, 678, 685, 688
164, 183, 221, 268, 297, 315, 385, 396, 401, laser diode 224, 225, 460, 461
422, 429, 463, 471, 474, 512, 547, 566, 574, laser gyroscope 194
623–627, 652 laser heating 678
constructive interference 58, 268, 517, 518, 521 lateral wavefront shear 623
destructive interference 58, 74, 136, 268, 411, 518, launching light 476
522, 589, 591 lens 3, 11, 19, 50, 318, 502, 558, 580, 602, 617, 626,
double pinhole interference 93 633, 645, 680
interference fringe 88, 92, 93, 299, 614, 616, 643 aplanatic lens 18, 33, 34, 46–51, 328
interference pattern 74, 195, 200, 202, 261, 298, collimating lens 32, 160, 161, 328–330, 344, 345,
495, 508, 617, 625, 626, 643–647, 653 382, 413, 637, 640
interferogram 199, 201, 290, 411, 494–496, 545, 566, collimator lens 495, 505, 506, 508
619, 620, 626, 627, 645–652 condenser lens 29, 62–64, 67, 546–548, 551, 552,
holographic interferogram 651 557, 583, 587
sheared interferogram 566 cylindrical lens 143, 489, 496, 498, 499, 502, 503, 630
interferometer 78–81, 86, 182–186, 189, 190, focusing lens 160, 328, 382, 406, 413, 414, 416,
194–197, 201, 205, 251, 268, 269, 494, 496, 499, 500, 505–507
505, 506, 511–519, 523, 619, 623–627 dark lens 28–30
double-slit interferometer 505, 506 diffraction-limited lens 35, 460
nulling interferometer 515, 517, 523, 524 finite-conjugate lens 64, 547
interferometric telescope 515, 516 high-NA lens 42, 47, 261
interferometry 127, 182, 199, 494, 516, 574, 624, 642, microscope objective lens 136, 172, 201, 307, 308, 527
651–653 plano-convex lens 10–12, 226, 227, 486–488, 527,
holographic interferometry 642, 649, 651, 653 624, 625
phase-shift interferometry 574 objective lens 42, 62, 66, 135, 161, 199, 201, 297,
real-time interferometry 652 315, 328, 364, 388, 397, 525, 534, 540, 545, 551,
stellar interferometry 88, 505, 511, 513 554, 560, 564, 566, 569, 602, 680
internal conical refraction 405, 407, 408, 410, 415 oil-immersion lens 529
inverse Fourier transform 26, 60 split lens 57
iron garnet 154, 204 thick lens 626
isolated bright line 590, 592 lensless imaging 367
isolator 225 left circularly polarized (LCP) 5, 107, 154, 167, 230, 264
isotherm 686–689 lenslet 627–630
iteration 447–458, 460, 601, 602, 658, 662 lenslet array 627–630
light emitting diode 584
k-space 143, 258 light source 62, 63, 113, 124, 155, 182, 186, 188, 190, 314,
k-vector 131, 143, 144, 234, 235, 258, 259, 265, 270 319, 351, 370, 381, 388, 545–547, 552, 554, 555,
Kerr effect 2, 166, 167, 170, 173, 178, 654 560, 569, 586, 587, 622, 624, 625, 633, 639, 643
linearly polarized 3, 35, 47, 49, 107, 143, 154, 161, Michelson 78, 88, 182, 505, 507, 511–514
166, 178, 201, 224, 230, 261, 302, 307, 358, 363, Michelson-Gale interferometer 182
380, 388, 397, 410, 415, 527, 538, 543, 554, 559, Michelson interferometer 78, 511, 513
569, 602, 633, 639, 655 Michelson’s stellar interferometer 88, 505, 513
linear momentum 259, 301 microscope 2, 9, 50, 297, 309, 525, 526, 545, 546,
linear phase shift 85, 247, 380, 619 554, 560, 564, 566, 569, 574, 576, 581, 584, 642
linear system 81 microscope objective 136, 172, 201, 307, 328, 527,
line-shape 117, 119, 124 546, 565, 568, 573
liquid crystal cell 638 misalignment 183, 452, 488, 640
liquid waveguide 664 mode 74, 76, 77, 91, 119, 139, 140–150, 194, 195, 240,
Littrow mount 336–340 255–257, 289, 296, 346, 443, 447–458, 459–475,
localized irradiation 678 476, 477, 483, 489–496, 502, 584, 664, 668
local nonlinearity 663 mode-locked laser 76, 240
long-range SSP 138, 139, 149 modulation transfer function (MTF) 64–67
Lorentz contraction 318, 319 monochromatic 21, 29, 74, 92, 97, 129, 135, 231, 327,
Lorentz force 420, 422 408, 546, 557, 574, 614, 625, 633
Lorentz transformation 187, 192, 193, 311, 320 monochromatic point source 88, 93, 372, 373, 505,
Lorentzian 116–118 552, 557, 579, 584
lowest-order mode 448, 449, 450, 452, 457, 467 MTF cutoff frequency 65, 66
low-pass filtering 114, 126 MULTILAYER 2
luminiferous ether 310 multilayer dielectric mirror 198
multilayer stack 75, 84, 86, 197, 236, 238, 239, 275,
Mach–Zehnder interferometer 78–80, 268, 269, 619 395, 396, 400, 518, 601, 678
magnetic charge 149 multimode 471, 473–475, 584, 664
magnetic dipole 420–443, 686 multimode interference (MMI) device 471–475
magnetic domain 397, 558, 559 multimode fiber 584
magnetic energy 424, 437, 440 multi-photon ionization 658
magnetic field 140, 153–156, 167, 204, 302, 387, 397, multiple reflection 108, 156, 158, 210, 218
418–440, 600, 679, 685, 687 multi-transverse-mode 502
magnetic-field modulation 687
magnetic film 178, 396, 397, 401, 679–687 nano-photonic 139
magnetic layer 395–397, 679–686 narrowband spectra 123
magnetic medium 156, 168, 171, 172, 178 natural light 107
magnetic moment 167, 170–175 near field 52, 403, 544, 584
magnetic recording 686 neutral density filter 201, 202
magnetization 153–158, 164, 166–178, 396–400, 558, non-classical light 127
562, 679, 685–687 nonlinear absorption 658
magneto-optical (MO) activity 235, 562 nonlinear coefficient 248, 654, 656
MO contribution 174, 175, 400 nonlinear medium 254, 656–662, 668
magneto-optical (MO) disk 168, 395, 396, 398, 565, nonlinear refractive index 246, 658, 664
678, 679, 687, 690 nonuniform grid 601
magneto-optical film 687 nonuniform polarization 307
magneto-optically induced polarization 156, 161 non-reciprocal 224–227, 230
MO signal 169, 174, 177–180, 395, 397, 401 normalized difference signal 639, 640
magnification 9, 12, 14, 16, 18, 20, 21, 39, 40, 64, Nomarski, George 566, 574
374, 375, 499, 501–503, 517, 521, 547, 576, 577, Nomarski microscope 566, 569, 574, 575
581, 583, 584, 587, 648 broadband Nomarski microscope 574
magnification factor 374, 375, 499, 503 null-corrector 627
magnifying glass 576, 580 nulling ellipsometer 633, 634, 637–639
marginal focus 618, 621, 622 nulling telescope 516
mask 65, 68, 70, 72, 209, 371, 373, 374, 391, 401, numerical algorithm 2, 459, 632
406, 459, 460, 465, 474, 492, 495, 506, 546, 550, numerical aperture 11, 16, 33, 45, 46, 62, 66, 272,
586, 588, 590, 595, 665 315, 379, 447, 461, 480, 517, 526, 547, 551,
material inhomogeneity 659, 663 584–588, 602, 615
Maxwell’s equations 3, 23, 25, 26, 47, 128, 136, 139, numerical error 6, 409, 414, 451, 537, 606, 608, 659, 663
149, 209, 213, 222, 234, 235, 301, 324, 325, 386, numerical method 2, 447
387, 405, 419, 420, 447, 599, 600, 606, 607,
656, 666 object beam 643, 644
metallo-dielectric interface 139, 149 objective lens 42, 62, 66, 136, 161, 201, 297, 316,
metal slab 142, 143, 149, 215 364, 388, 397, 525, 534, 546, 551, 555, 566, 569,
method of Fox and Li 2, 447–452 602, 680
object wave 644–652 phase-amplitude object 31, 38, 645, 646
oblique incidence 19, 20, 129, 131, 133, 156, 157, phase-change 683, 687
162, 198, 199, 207, 216, 218, 219, 222, 328, phase-change (PC) disk 687
559, 611 phase-conjugate mirror (PCM) 228–232, 235, 236
odd mode 141, 142, 144, 148–150, 467, 469 phase-contrast filter 548, 549, 551
off-diagonal element 154, 167, 171, 397, 400 phase-contrast mask 548, 549
offense against the sine condition 19 phase-contrast mechanism 70, 549
oil-immersion microscopy 535 phase-contrast microscope 545, 546
oil-immersion objective 528–532 phase discontinuity 295
omni-directional dielectric mirror 274 phase-edge 589, 593
omni-directional reflector 274, 277, 285, 288 phase factor 18, 26, 30, 38, 41, 59, 96, 207, 243, 248,
optical activity 158, 197, 204, 235, 397, 554, 558, 252, 257, 366, 373, 380, 480, 657, 659, 666
559, 562, 632 phase gradient 566
optical axis 10–14, 21, 36, 46, 58, 97, 176, 201, 292, phase mask 460, 462, 464, 467, 470, 473, 474,
295, 303, 379, 398, 494, 501, 534, 547, 556, 614, 492–495, 552, 589, 558, 665
618, 657 phase object 65, 70–73, 546, 548–552, 566, 570–574
optical data storage 403, 544, 678 phase-shifter 549, 592
optical disk 138, 168, 224, 395, 397, 398, 525, 536, phase-shifting mask (PSM) 486, 488, 498
678, 679, 689 phase singularity 289, 291, 294, 296, 300, 309
optical disk drive 224, 397, 689 phase velocity 149, 243
optical filter 208, 223, 239 photodetector 78, 79, 113, 114, 126, 175, 176, 183,
optical head 685 194, 195, 398, 522, 638, 639
optical path difference (OPD) 352, 357, 513 photodetector array 194, 195
optical path-length 58, 79, 524, 548, 568 photoelastic modulator 638
optical path-length difference 79, 548 photographic plate 3, 62, 92, 264, 367, 571, 630,
optical power density 442, 666 643–652
optical recording 397, 678, 687 photolithography 1, 586, 597, 598
optical tweezers 289, 301 photomask 595
optical vortex 289, 291, 300, 309 photo-multiplication 126
optic axis of wave normals 406 photo-multiplier tube 114
optic-ray axis 406–408, 413, 415 photonic bandgap crystal 274
orthoscopic 554, 556 photonic bandgap structure 463
oscillating dipole 211, 213, 216, 218, 420 photonic crystal 288, 463
Otto configuration 142 photon momentum 258
out-of-phase soliton pair 671 photon noise 517
picosecond range 240, 248, 252
paraboloid 365, 627 pinhole 88, 93–98, 406, 413, 415, 416, 625
parallax 314 planar glass waveguide 664, 676
parallel plate 232, 233, 534, 604 Planck’s constant 258, 302
paramagnetic 154, 167 plane of best focus 227, 228, 358, 360, 362, 580, 581
paraxial approximation 14, 16, 34, 52, 53, 59, 60, 307 plane monochromatic beam 129, 198
paraxial focus 578, 579, 581, 582, 617, 618, 621, 622 plane wave 3, 17, 23, 25, 31, 35, 40, 46, 75, 100, 107,
paraxial ray 12, 578 131, 140, 159, 169, 180, 192, 211, 214, 220, 233,
paraxial ray-tracing 12 238, 255, 263, 277, 290, 302, 311, 315, 325, 337,
paraxial regime 13–17, 296, 578, 579 341, 356, 368, 372, 397, 407, 418, 421, 423, 427,
path-length difference 18, 74, 79, 93, 268, 269, 520, 432, 442, 582, 606, 608, 619, 626, 637, 644, 647,
524, 548, 624 650, 680
parabolic mirror 364, 365, 448 fully-polarized plane wave 102
partially coherent 78, 88, 127, 555 inhomogeneous plane wave 140, 149, 386
partial depolarization 107, 110, 156 monochromatic plane wave 100, 129–131, 133,
partially coherent illumination 88 135, 210, 277, 316, 418, 582
partial polarization 1, 100 plane-wave spectrum 46, 302
penetration depth 83, 215 polychromatic plane wave 107, 108
perfectly matched layer (PML) 600 spectrum of plane waves 369
period-averaged intensity 246 superposition of plane waves 7, 23, 26, 31, 35, 45,
periodic boundary condition 461 320
periodic mask 373, 374 uniform plane wave 26, 27, 49, 137
periodic stack 274, 275, 281 plano-aspheric lens 483–485
phase-amplitude information 644 plano-convex lens 10–12, 226, 227, 486–488, 527,
phase-amplitude mask 373, 459, 549, 550 624, 625
phase-amplitude modulation 17, 459 plano-cylindrical lens 496, 498, 502, 503
plasmon excitation 128, 133, 135, 137, 138, 346, 349, pulse compression 240, 241, 246, 254, 257
393, 533 pulse train 76, 241, 242
plastic substrate 395, 534, 565, 609, 679 entrance pupil 33–41, 45, 46, 65, 264, 297, 307, 406,
p–n junction 489 414, 416, 461, 480, 499, 526, 545, 552,
Poincaré, Henri 101, 106 554, 555, 570, 573, 584, 602, 604, 628,
Poincaré sphere 100, 105, 106 680
point source 30, 63, 64, 89, 93, 114, 121, 125, 370, exit pupil 14, 32, 36, 40, 47, 66, 71, 135, 160,
505, 510, 523, 547, 552, 559, 562, 570, 579, 583, 172–179, 315, 328, 345, 363, 382, 388, 400, 413,
624–627 495, 500, 531, 540, 551, 562, 565, 579, 584,
Poisson’s bright spot 28 614–619
polarizability 212, 655 push–pull method 532, 537
polarization 1, 5, 25, 45, 48, 88, 100, 108, 130, 135,
140, 149, 154, 159, 164, 173, 179, 197, 202, 208, quadratic phase factor 38, 40, 41, 59, 243–252
217, 224, 230, 250, 263, 281, 302, 307, 329, 345, quadrilayer 178, 180, 235, 237, 396, 401, 402,
360, 385, 390, 397, 400, 407, 410, 415, 418, 427, 678–689
442, 495, 502, 516, 525, 538, 554, 557, 564, 570, quadrilayer stack 178, 235, 402, 679, 680, 685, 687
605, 632, 640, 656, 660, 679 quadrupole 425, 427, 429, 430
polarization ellipticity (see also ellipticity) 7, 161, quality factor 204
168, 173, 203, 389, 390, 392, 394, 399, 413–416, quantum nature of light 127, 259
557, 562, 679 quantum optics 127
polarization microscope 554, 555, 560, 561 quantum well 492
polarization rotation 155, 161, 164, 168, 171, 173, quarter-wave plate 100, 105, 175, 176, 201, 202, 224,
179, 200, 203–206, 556–558, 564 225, 229, 633, 637
polarization rotation angle 6, 7, 108, 110, 157, 161, quarter-wave stack 83, 204, 276, 278
163, 169, 173, 203, 205, 389, 390–394, 399, quartz 82, 83, 87, 567, 568, 642
413–416, 557, 562 quasi-monochromatic 29, 64, 88–96, 100, 113, 372,
polarization state 5, 7, 45, 48, 108, 140, 155, 159, 172, 505, 545, 547, 552, 555, 569, 574, 586, 615, 622
202, 225, 230, 261, 264, 268, 274, 281, 305, 324, quasi-monochromatic light source 545, 569, 622
361, 393, 410, 491, 525, 533, 554, 559, 638 quasi-monochromatic point source 88, 93, 372, 373,
polarization vector 46, 48, 108, 136, 166, 201, 505, 552
224, 409, 410, 415, 527, 540, 556, 564, 566,
656 radiated field 209, 212–214, 555
polarizer 100, 103, 108, 199, 200, 225, 554, 555, 569, radiation pressure 301, 423
633, 638, 640 radiative mode 462, 464, 466
polarizing beam splitter (PBS) 224, 225, 230, 567 random phase 92, 116, 118
polychromatic 100–110 rare-earth iron garnet 154
polychromatic beam 100, 102, 103, 110 ray-bending 50, 380
population inversion 489, 494 ray ellipsoid 407, 408
power attenuation coefficient 449–458 ray distribution 307
power content 53, 361, 448, 461–473, 494, 495, 604, ray-tracing 12, 301, 351, 477–479, 584
656, 671, 673 Rayleigh, Lord 23, 323–325, 378
power spectral density 113, 115, 124 Rayleigh anomaly 331–347
Poynting vector 2, 129, 144, 148, 205, 291, 301–309, Rayleigh criterion 298
381, 388, 431, 441, 445, 477, 584, 680, 682 Rayleigh range 58, 59, 259, 261, 291–296, 319,
primary astigmatism 617 320, 528, 656
primary coma 19, 452, 457, 619, 620 readout signal 401
primary mirror 516, 521, 627 real image 29, 579, 580, 614, 626, 645–653
principal axes 406, 407 reciprocity 2, 224–238, 275, 277, 333–340, 348
principle of conservation of energy 232, 235 reconstructed wavefront 361, 645–653
principle of superposition 81 reconstruction beam 645–652
principal refractive index 406, 408, 563, 564 reference beam 199–202, 297–299, 411, 545, 619,
principal plane(s) 10–21, 33–39 620, 624–627, 630, 643–653
prism 47, 81, 142, 150, 175, 231, 240, 248, 337, 351, reference mirror 200, 201, 297
361, 379, 387, 397, 499, 555, 560, 566, 569, 574, reflecting telescope 511
638, 659 reflection coefficient 83, 129, 141, 167, 197, 205, 216,
prism-coupling 142 221, 233, 251, 256, 275, 379, 385, 391, 400,
prism pair 240, 499, 500–502 557, 632
propagating mode 460 reflection loss 355, 476
propagating order 326 reflective diffractive optical element (DOE) 356
propagation through nonlinear medium 660–662 refraction 46, 48, 107, 180, 344, 415, 541, 556, 557
pulse broadening 245 refractive index profile 459, 546, 664
residual phase 227, 228, 359, 364, 580 single-mode fiber 463, 476, 477, 483, 485, 486,
resist threshold 590 503
resolution 9, 37, 41, 50, 62, 65, 67, 68, 176, 270, 297, single-transverse-mode 494, 496, 502
298, 515–517, 525, 528, 531, 535, 551, 552, 559, Sirius 121
560, 570, 576, 577, 581, 584, 586, 588, 592, 601, skin depth 130, 420
602, 637, 649 slab waveguide 243, 255, 665–672
resolution of imaging system 41, 65 slow axis 496, 498
resolvability 258, 261, 272, 273 Snell’s law 217, 218, 232, 343, 344, 355, 380, 534,
resolving power 336 535, 567, 577, 604
resonance 144, 161, 204, 207, 208, 252, 338 solid immersion lens (SIL) 388, 395, 535, 602
resonant absorption 128, 131, 132, 144, 393 soliton theory 664
resonant behavior 346 spatial incoherence 64, 547, 570
resonant cavity 452 spatial filter 546–549
resonator 159–161, 251–253, 270, 351, 447–458 spatial frequency 64, 150, 320, 382, 649, 652
rest frame 316–321 spatial frequency-content 382, 649, 652
retardation 103, 105, 252, 570, 637–640 spatially coherent 88, 545, 555
retarder 102–106, 638–640 spatially incoherent 92, 94, 506, 545, 555, 559, 560,
reverse-contrast 560, 564 569–573, 586
right circularly polarized (RCP) 5, 154, 167, 204, 225, spatially incoherent point source 571, 573
229, 264, 393, 410–416 spatial optical soliton 664, 676
ring interferometer 182 spatial resolution 176, 637
ring laser 194, 195 special theory of relativity 84, 101, 310
r.m.s. wavefront aberration 228, 496, 498, 503 spectral bandwidth 74, 81, 151, 156, 351
Ronchigram 617–620 spectral broadening 85, 86, 248
Ronchi ruling 614, 615 spectral filtering 119
Ronchi test 614–623 spectral width 76, 252
rotating frame 182, 188 specular reflection 331
running fringes 190, 191, 195 spherical aberration
spherical cap 35, 39, 624–627
Sagnac, George 182, 189 spherical wavefront 17, 18, 370–374, 658
Sagnac interferometer 182–186, 190, 194, 195 split detector 176, 398, 531, 532, 537
Sagnac loop 184–188, 194 split-step beam propagation method 658
scalar theory 45, 325, 329, 531, 533 split-step technique 459, 460
scanning optical microscope 525, 526 splitter 78, 80, 126, 182, 187, 195, 224, 231, 268, 297,
Schwartz inequality 105, 111 464, 471, 516, 525, 546, 564, 569, 619, 624, 625,
secondary mirror 517 644, 645
second-harmonic 241 s-polarization 47, 107, 200, 277, 286, 329, 633
second-order coherence 113, 114, 116, 119, 127 spot size 525–543
Seidel aberration(s) 5, 617 square-shaped aperture 444, 470, 590, 594
Seidel astigmatism 482, 483 stable self-trapping 664, 676
Seidel curvature 480–487 standing-wave 187, 188, 205, 422, 429
self-focusing 654–663, 665–670 state of polarization 5, 25, 102, 106, 176, 226,
self-focusing collapse 655, 658 230, 261, 305, 389, 407–417, 526, 554, 556,
self-imaging 367, 378, 475 564, 634
self-induced phase shift 657 stationary 75, 78, 116
self-phase modulation 240, 241, 257 stationary-phase approximation 23, 33, 42, 45, 46, 48
self-trapping 654, 655, 658, 663, 664, 666, stationary point 30, 34, 43, 44
676 stationary process 75, 78
semiconductor junction 489 stellar aberration 310, 313
semiconductor laser diode 461, 489, 490, 678 Stokes, Sir George Gabriel 101, 104, 112, 235
Shack cube 625, 627 Stokes’ parameters 100, 104–110
Shack–Hartmann wavefront sensor 624, 627, 628 storage layer 534, 537, 679, 680, 686
shearing interferometry 494 straight-line fringes 625, 626
shear plate 494–497 subwavelength 139, 420, 423, 599, 602
shifter-shutter mask 589, 595 subwavelength structure 599, 602
short-range or lossy mode 149 successive iteration 448
shot noise 126, 522 sum signal 177, 532, 537, 538, 638, 639
side-rigger 590–594 superposition 23, 31, 48, 67, 75, 81, 94, 107, 124, 136,
signal-to-noise ratio 298, 517, 638 155, 233, 261, 296, 317, 320, 325, 370, 411, 413,
silica glass fiber 246, 460, 462, 476, 477, 502 427, 494, 606–608
single-mode beam 475 superposition integral 48
super solid immersion lens (super SIL) 539–542 transverse magnetic (TM) 140, 141, 144, 149,
surface charge 149, 435 326, 328
surface current 419–442 transverse magnification 12, 14, 16, 18, 21
surface plasmon 1, 128–138, 139, 143, 149, 331, 346, triangulation 512
349, 386, 388, 533 truncated Bessel beam 32
surface plasmon excitation 128, 133, 135, 137, 138, truncated Gaussian 76, 90, 526
346, 349, 533 truncated prism 146–148
surface plasmon polariton (SPP) 138, 139, 149 tungsten lamp 546
surface relief feature 325 Twiss, Richard Q. 114, 121, 127, 513
surface relief structure 367, 546 Twyman–Green interferometry 199
switching 467, 674, 679, 686, 690
uncertainty principle 258, 261
Talbot effect 1, 367, 370, 375, 473 uniaxial birefringent crystal 407, 567
telescope 2, 9, 50, 310, 314, 511–523, 627, 630 unpolarized 35, 100, 103, 106, 107, 410, 415,
temperature distribution 682 502, 637
temperature gradient 682 up-chirp 245, 247
temporal soliton 664, 665
TEMPROFILE 2 van Cittert–Zernike theorem 88, 94, 96
test beam 624–630 van Leeuwenhoek, Antoni 576, 577, 579, 585
thermal conductivity 680, 684 van Leeuwenhoek microscope 576, 581, 582
thermal diffusion 663, 680 variable retarder 102, 103, 105, 638–640
thermal source 91, 119 Verdet constant 154
thermomagnetic recording 687 virtual image 517, 580, 582, 583, 624, 645–651
thin-film optics 209, 221 vortex structure 309
thin-film stack 236, 602, 632, 678
thin magnetic film 178, 679 waist 55–60, 289–296, 302–306, 319, 320, 490–497,
thin magnetic layer 395 655, 656
third-order nonlinearity 654 wavefront curvature 291, 294, 359, 480, 483, 488,
time average 75, 77, 78, 104, 116, 301, 302 655, 664
time-averaged intensity 77 wavefront cylinder 483
total E-field intensity 307, 681 wavefront tilt 380
total intensity distribution 528, 529, 542 waveguide 141, 243, 255, 274, 296, 346, 443,
total internal reflection (TIR) 2, 103, 129, 133, 447, 459, 464, 469, 474, 655, 664–677
142, 143, 230, 255, 343, 344, 379, 387, 388, waveguide mode 141, 349
489, 537, 674 wavelength discrimination 270
TIR mirror 230, 231 wave optics 16, 258
TIR prism 231, 380 wave packet 74–86, 91, 92
transcendental equation 141 wave-plate 105, 106, 108, 110
transfer function of propagation 53, 60 wide-aperture system 17, 580
transform-limited 246 white light 74, 554, 582, 602, 627, 643, 653
transmission axis 201, 225, 226, 555–557, 570, wire test 620–623
633, 634 Wollaston 175, 397, 555, 560, 566–568
transmission coefficient 83, 183, 197, 202, 205, Wollaston prism 175, 397, 555, 558, 560, 566–574,
210, 214, 216, 231–239, 263, 266, 270, 638–640
275, 277, 279, 280, 355, 461, 537, 632, Wood, R.W. 138, 165, 181, 208, 323, 350, 511,
646 514
transmission efficiency 432, 441, 442, 613 Wood’s anomaly 324
transmission function 59, 70, 373
transmissive DOE 356, 357, 358, 361, 363 Y-branch beam splitter 464, 466
transmitted order 233, 333, 341–346 Young’s interference fringes 88
transparent hemisphere 535
transverse effect 162, 164, 170–172 Zernike, Frederick 88, 94, 99, 545, 547, 552
transverse electric (TE) 140, 149, 326 zodiacal light 522
transverse Faraday effect 164 zeroth-order 66, 328–341, 616

CLASSICAL OPTICS AND ITS APPLICATIONS

Covering a broad range of fundamental topics in classical optics and electro-

M asud M ansuripur received a Bachelor of Science degree in Electrical

This publication is in copyright. Subject to statutory exception

First edition published 2002

Printed in the United Kingdom at the University Press, Cambridge

ISBN 978-0-521-88169-2 hardback

Cambridge University Press has no responsibility for the persistence or
