Microsound
Curtis Roads

Contents
Introduction
Acknowledgments
Overview
Granular Synthesis
Transformation of Microsound
Microsound in Composition
Conclusion
References
Appendixes
Name Index
Subject Index
Introduction
Beneath the level of the note lies the realm of microsound, of sound particles. Microsonic particles remained invisible for centuries. Recent technological advances let us probe and explore the beauties of this formerly unseen world. Microsonic techniques dissolve the rigid bricks of music architecture, the notes, into a more fluid and supple medium. Sounds may coalesce, evaporate, or mutate into other sounds.
The sensations of point, pulse (regular series of points), line (tone), and surface (texture) appear as the density of particles increases. Sparse emissions leave
rhythmic traces. When the particles line up in rapid succession, they induce the
illusion of tone continuity that we call pitch. As the particles meander, they
flow into streams and rivulets. Dense agglomerations of particles form swirling
sound clouds whose shapes evolve over time.
In the 1940s, the Nobel prize-winning physicist Dennis Gabor proposed that
any sound could be decomposed into acoustical quanta bounded by discrete
units of time and frequency. This quantum representation formed the famous
Gabor matrix. Like a sonogram, the vertical dimension of the Gabor matrix
indicated the location of the frequency energy, while the horizontal dimension
indicated the time region in which this energy occurred. In a related project,
Gabor built a machine to granulate sound into particles. This machine could
alter the duration of a sound without shifting its pitch.
In these two projects, the matrix and the granulator, Gabor accounted for
both important domains of sound representation. The matrix was the original
windowed frequency-domain representation. ``Windowed'' means segmented in
time, and ``frequency-domain'' refers to spectrum. The granulation machine, on
the other hand, operated on a time-domain representation, which is familiar to
anyone who has seen waveforms in a sound editor. This book explores microsound from both perspectives: the windowed frequency-domain and the micro time-domain. Both concern microacoustic phenomena lasting less than one-tenth of a second.
This book is the fruit of a lengthy period of activity involving synthesis
experiments, programming, and composition dating back to the early 1970s.
I started writing the text in 1995, after completing my textbook The Computer
Music Tutorial (The MIT Press 1996). Beginning with a few strands, it eventually grew into a lattice of composition theory, historical accounts, technical
overviews, acoustical experiments, descriptions of musical works, and aesthetic
reflections. Why such a broad approach? Because at this stage of development,
the musical, technical, and aesthetic problems interweave. We are inventing
particles at the same time that we are learning how to compose with them. In
numerous ``assessment'' sections I have tried to summarize the results, which in
some cases are merely preliminary. More experimentation is surely needed.
Microsound records this first round of experimentation, and thus serves as a diary of research. Certain details, such as the specific software and hardware
that I used, will no doubt become obsolete rapidly. Even so, I decided to leave
them in for the historical record.
The experimentation and documentation could go on indefinitely. One could imagine, for example, a kind of synthesis ``cookbook'' after the excellent example of Jean-Claude Risset (1969). His text provided detailed recipes for making specific sounds from a variety of synthesis techniques. This would be a worthy
project, and I would encourage others in this direction. As for myself, it is time
to compose.
Acknowledgments
This book derives from a doctoral thesis written for the Université de Paris VIII
(Roads 1999). It would never have started without strong encouragement from
Professor Horacio Vaggione. I am deeply indebted to him for his patient advocacy, as well as for his inspired writings and pieces.
The congenial atmosphere in the Département Musique at the Université de Paris VIII was ideal for the gestation of this work. I would also like to extend my sincere appreciation to Jean-Claude Risset and Daniel Arfib. Despite
much pressure on their time, these pioneers and experts kindly agreed to
serve on the doctoral committee. Their commentaries on my text resulted in
major improvements.
I owe a debt of thanks to my colleague Gérard Pape at the Centre de Création Musicale Iannis Xenakis (CCMIX) for his support of my research,
teaching, and composition. I must also convey appreciation to Iannis Xenakis
for his brilliant example and for his support of our work in Paris. My first contact with him, at his short course in Formalized Music in 1972, started me
on this path.
I completed this book while teaching in the Center for Research in Electronic
Art Technology (CREATE) in the Department of Music and in the Media
Arts and Technology Program at the University of California, Santa Barbara.
I greatly appreciate the friendship and support of Professor JoAnn Kuchera-Morin, Director of CREATE, during this productive period. I would also like
to extend my thanks to the rest of the CREATE team, including Stephen T.
Pope for his collaboration on pulsar synthesis in 1997. It was a great pleasure to
work with Alberto de Campo, who served as CREATE's Research Director in 1999-2000. Together we developed the PulsarGenerator software and the
Creatovox synthesizer. I consider these engaging musical instruments to be
among the main accomplishments of this research.
Overview
The final sections present techniques of spatialization with sound particles, and
convolution with microsounds.
Chapter 6 explores a variety of sound transformations based on windowed
spectrum analysis. After a theoretical section, it presents the main tools of windowed spectrum transformation, including the phase vocoder, the tracking
phase vocoder, the wavelet transform, and Gabor analysis.
Chapter 7 turns from technology to compositional applications. It begins
with a description of the first studies realized with granular synthesis on a digital computer. It then looks at particle techniques in my recent compositions, as
well as those by Barry Truax, Horacio Vaggione, and other composers.
Chapter 8, on the aesthetics of composing with microsound, is the most
philosophical part of the book. It highlights both specific and general aesthetic
issues raised by microsound in composition.
Chapter 9 concludes with a commentary on the future of microsound in
music.
Chapter 1
makes the computer an ideal testbed for the representation of musical structure
on multiple time scales.
This chapter examines the time scales of music. Our main focus is the micro
time scale and its interactions with other time scales. By including extreme time scales, the infinite and the infinitesimal, we situate musical time within the broadest possible context.
6. Micro. Sound particles on a time scale that extends down to the threshold of auditory perception (measured in thousandths of a second, or milliseconds).
7. Sample. The atomic level of digital audio systems: individual binary samples or numerical amplitude values, one following another at a fixed time interval. The period between samples is measured in millionths of a second (microseconds; a worked example follows this list).
8. Subsample. Fluctuations on a time scale too brief to be properly recorded or perceived, measured in billionths of a second (nanoseconds) or less.
9. Infinitesimal. The ideal time span of mathematical durations such as the infinitely brief delta functions.
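As a worked example for item 7 (the 44.1 kHz rate is an illustrative choice, not part of the list above):

$$T_s = \frac{1}{f_s} = \frac{1}{44{,}100\ \text{Hz}} \approx 22.7\ \mu\text{s}$$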
Figure 1.1 portrays the nine time scales of the time domain. Notice in the middle of the diagram, in the frequency column, a line indicating ``Conscious time, the present (~600 ms).'' This line marks off Winckel's (1967) estimate of the ``thickness of the present.'' The thickness extends to the line at the right indicating the physical NOW. This temporal interval constitutes an estimate of the accumulated lag time of the perceptual and cognitive mechanisms associated with hearing. Here is but one example of a disparity between chronos, physical time, and tempus, perceived time (Kupper 2000).
The rest of this chapter explains the characteristics of each time scale in turn.
We will, of course, pay particular attention to the micro time scale.
Figure 1.1 The time domain, segmented into periods, time delay effects, frequencies,
and perception and action. Note that time intervals are not drawn to scale.
Digital audio systems, such as compact disc players, operate at a fixed sampling frequency. This makes it easy to distinguish the exact boundary separating the sample time scale from the subsample time scale. This boundary is the Nyquist frequency, or the sampling frequency divided by two. The effect of crossing this boundary is not always perceptible. In noisy sounds, aliased frequencies from the subsample time domain may mix unobtrusively with high frequencies in the sample time domain.
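A minimal numerical sketch of this folding, assuming an ideal sampler (the rate and frequency are illustrative, not from the text):

```python
fs = 44100.0        # sampling frequency in Hz
nyquist = fs / 2    # boundary between the sample and subsample time scales

def aliased_frequency(f, fs):
    """Return the frequency at which an ideal sampler renders a sinusoid of frequency f."""
    f = f % fs                             # sampling cannot distinguish f from f mod fs
    return f if f <= fs / 2 else fs - f    # frequencies above Nyquist fold back down

# A 30 kHz fluctuation, here on the subsample time scale, emerges at 14.1 kHz,
# where it can mix unobtrusively with genuine high frequencies in a noisy sound.
print(aliased_frequency(30000.0, fs))
```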
The border between certain other time scales is context-dependent. Between
the sample and micro time scales, for example, is a region of transient events
too brief to evoke a sense of pitch but rich in timbral content. Between the
micro and the object time scales is a stratum of brief events such as short staccato notes. Another zone of ambiguity is the border between the sound object and meso levels, exemplified by an evolving texture. A texture might contain a statistical distribution of micro events that are perceived as a unitary yet time-varying sound.
Time scales interlink. A given level encapsulates events on lower levels and is
itself subsumed within higher time scales. Hence to operate on one level is to affect other levels. The interaction between time scales is not, however, a simple relation. Linear changes on a given time scale do not guarantee a perceptible effect on neighboring time scales.
Figure 1.2 Zones of intensities and frequencies. Only the zone marked a is audible to the ear. This zone constitutes a tiny portion of the range of sound phenomena (Gabor 1952).
$$\sum_{i=1}^{\infty} u_i = u_1 + u_2 + u_3 + \cdots$$

This equation sums a set of numbers $u_i$, where the index $i$ goes from 1 to $\infty$. What if each number $u_i$ corresponded to a tick of a clock? This series would then define an infinite duration. This ideal is not so far removed from music as it may seem. The idea of infinite duration is implicit in the theory of Fourier analysis, which links the notion of frequency to sine waves of infinite duration. As chapter 6 shows, Fourier analysis has proven to be a useful tool in the analysis and transformation of musical sound.
a single composition may change several times within a century. The entire history of music transpires within the supratemporal scale, starting from the earliest known musical instrument, a Neanderthal flute dating back some 45,000 years (Whitehouse 1999).
Composition is itself a supratemporal activity. Its results last only a fraction
of the time required for its creation. A composer may spend a year to complete
a ten-minute piece. Even if the composer does not work every hour of every
day, the ratio of 52,560 minutes passed for every 1 minute composed is still significant. What happens in this time? Certain composers design a complex
strategy as prelude to the realization of a piece. The electronic music composer
may spend considerable time in creating the sound materials of the work. Either
of these tasks may entail the development of software. Virtually all composers
spend time experimenting, playing with material in different combinations.
Some of these experiments may result in fragments that are edited or discarded,
to be replaced with new fragments. Thus it is inevitable that composers invest
time pursuing dead ends, composing fragments that no one else will hear. This
backtracking is not necessarily time wasted; it is part of an important feedback loop in which the composer refines the work. Finally, we should mention documentation. While only a few composers document their labor, these documents
may be valuable to those seeking a deeper understanding of a work and the
compositional process that created it. Compare all this with the efficiency of the
real-time improviser!
Some music spans beyond the lifetime of the individual who composed it,
through published notation, recordings, and pedagogy. Yet the temporal reach
of music is limited. Many compositions are performed only once. Scores, tapes,
and discs disappear into storage, to be discarded sooner or later. Music-making
presumably has always been part of the experience of Homo sapiens, who it is
speculated came into being some 200,000 years ago. Few traces remain of
anything musical older than a dozen centuries. Modern electronic instruments
and recording media, too, are ephemeral. Will human musical vibrations somehow outlast the species that created them? Perhaps the last trace of human
existence will be radio waves beamed into space, traveling vast distances before
they dissolve into noise.
The upper boundary of time, as the concept is currently understood, is the
age of the physical universe. Some scientists estimate it to be approximately fifteen billion years (Lederman and Schramm 1995). Cosmologists continue to debate how long the universe may expand. The latest scientific theories continue to twist the notion of time itself (see, for example, Kaku 1995; Arkani-Hamed et al. 2000).
This is not to say that the use of preconceived forms has died away. The
practice of top-down planning remains common in contemporary composition.
Many composers predetermine the macrostructure of their pieces according to
a more-or-less formal scheme before a single sound is composed.
By contrast, a strict bottom-up approach conceives of form as the result of a
process of internal development provoked by interactions on lower levels of
musical structure. This approach was articulated by Edgard Varèse (1971), who said, ``Form is a result, the result of a process.'' In this view, macrostructure
articulates processes of attraction and repulsion (for example, in the rhythmic
and harmonic domains) unfolding on lower levels of structure.
Manuals on traditional composition offer myriad ways to project low-level
structures into macrostructure:
Smaller forms may be expanded by means of external repetitions, sequences, extensions,
liquidations and broadening of connectives. The number of parts may be increased by supplying codettas, episodes, etc. In such situations, derivatives of the basic motive are formulated into new thematic units. (Schoenberg 1967)
A trend toward shaping music through the global attributes of a sound mass
began in the 1950s. One type of sound mass is a cluster of sustained frequencies
that fuse into a solid block. In a certain style of sound mass composition,
musical development unfolds as individual lines are added to or removed from
this cluster. György Ligeti's Volumina for organ (1962) is a masterpiece of this style, and the composer has explored this approach in a number of other pieces, including Atmosphères (1961) and Lux Aeterna (1966).
Particles make possible another type of sound mass: statistical clouds of
microevents (Xenakis 1960). Wishart (1994) ascribed two properties to cloud textures. As with sequences, their field is the set of elements used in the texture, which may be constant or evolving. Their second property is density, which stipulates the number of events within a given time period, from sparse scatterings to dense scintillations.
Cloud textures suggest a different approach to musical organization. In contrast to the combinatorial sequences of traditional meso structure, clouds encourage a process of statistical evolution. Within this evolution the composer can impose specific morphologies. Cloud evolutions can take place in the domain of amplitude (crescendi/decrescendi), internal tempo (accelerando/rallentando), density (increasing/decreasing), harmonicity (pitch/chord/cluster/noise, etc.), and spectrum (high/mid/low, etc.).
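The statistical cloud lends itself directly to algorithmic realization. The sketch below is a hypothetical illustration rather than any composer's actual procedure: it scatters grain onsets at a stipulated density and draws each grain's frequency from a field that evolves over the cloud's duration.

```python
import random

def cloud(duration_s, density, field_start=(100.0, 4000.0), field_end=(800.0, 1000.0)):
    """Return (onset, frequency) pairs for a statistical cloud of microevents.

    density is the mean number of events per second; the frequency field
    interpolates linearly from field_start to field_end over the cloud.
    """
    events = []
    t = 0.0
    while True:
        t += random.expovariate(density)     # Poisson-distributed onset times
        if t >= duration_s:
            break
        pos = t / duration_s                 # 0 at the start of the cloud, 1 at the end
        lo = field_start[0] + pos * (field_end[0] - field_start[0])
        hi = field_start[1] + pos * (field_end[1] - field_start[1])
        events.append((t, random.uniform(lo, hi)))
    return events

# A five-second cloud, 50 grains per second, whose field narrows toward 800-1000 Hz.
for onset, freq in cloud(5.0, 50.0)[:5]:
    print(f"{onset:.3f} s  {freq:.0f} Hz")
```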
Xenakis's tape compositions Concret PH (1958), Bohor I (1962), and Persepolis (1971) feature dense, monolithic clouds, as do many of his works for
traditional instruments. Stockhausen (1957) used statistical form-criteria as one
component of his early composition technique. Since the 1960s, particle
textures have appeared in numerous electroacoustic compositions, such as the
remarkable De natura sonorum (1975) of Bernard Parmegiani.
Varèse spoke of the interpenetration of sound masses. The diaphanous nature of cloud structures makes this possible. A crossfade between two clouds
results in a smooth mutation. Mesostructural processes such as disintegration
and coalescence can be realized through manipulations of particle density (see
chapter 6). Density determines the transparency of the material. An increase in
density lifts a cloud into the foreground, while a decrease causes evaporation,
dissolving a continuous sound band into a pointillist rhythm or vaporous background texture.
Cloud Taxonomy
To describe sound clouds precisely, we might refer to the taxonomy of cloud
shapes in the atmosphere:
Cumulus
Stratocumulus
Stratus
Nimbostratus
Cirrus
In another realm, among the stars, outer space is filled with swirling clouds of cosmic raw material called nebulae.

The cosmos, like the sky on a turbulent summer day, is filled with clouds of different sizes, shapes, structures, and distances. Some are swelling cumulus, others light, wispy cirrus, all of them constantly changing, colliding, forming, and evaporating. (Kaler 1997)
vocalist. The concept of sound object extends this to allow any sound, from any source. The term sound object comes from Pierre Schaeffer, the pioneer of musique concrète. To him, the pure objet sonore was a sound whose origin a listener could not identify (Schaeffer 1959, 1977, p. 95). We take a broader view here. Any sound within stipulated temporal limits is a sound object. Xenakis (1989) referred to this as the ``ministructural'' time scale.
The Sensation of Tone
The sensation of tone, a sustained or continuous event of definite or indefinite pitch, occurs on the sound object time scale. The low-frequency boundary for the sensation of a continuous sound, as opposed to a fluttering succession of brief microsounds, has been estimated at anywhere from 8 Hz (Savart) to about 30 Hz. (As reference, the deepest sound in a typical orchestra is the open E of the contrabass at 41.25 Hz.) Helmholtz, the nineteenth-century German acoustician, investigated this lower boundary.
In the first place it is necessary that the strength of the vibrations of the air for very low tones should be extremely greater than for high tones. The increase in strength . . . is of
especial consequence in the deepest tones. . . . To discover the limit of the deepest tones it is
necessary not only to produce very violent agitations in the air but to give these a simple
pendular motion. (Helmholtz 1885)
We can subdivide a sound object not only by its properties but also by its
temporal states. These states are composable using synthesis tools that operate
on the microtime scale. The micro states of a sound can also be decomposed
and rearranged with tools such as time granulators and analysis-resynthesis
software.
Sound Object Morphology
In music, as in other fields, the organization is conditioned by the material. (Schaeffer 1977, p. 680)
The desire to understand the enormous range of possible sound objects led Pierre Schaeffer to attempt to classify them, beginning in the early 1950s (Schaeffer and Moles 1952). Book V of his Traité des objets musicaux (1977), entitled Morphologie et typologie des objets sonores, introduces the useful notion of sound object morphology: the comparison of the shape and evolution of sound objects. Schaeffer borrowed the term morphology from the sciences, where it refers to the study of form and structure (of organisms in biology, of word-elements in linguistics, of rocks in geology, etc.). Schaeffer diagrammed sound shape in three dimensions: the harmonic (spectrum), dynamic (amplitude), and melodic (pitch). He observed that the elements making up a complex sound can be perceived as either merged to form a sound compound, or remaining separate to form a sound mixture. His typology, or classification of sound objects into different groups, was based on acoustic morphological studies.
The idea of sound morphology remains central to the theory of electroacoustic music (Bayle 1993), in which the musical spotlight is often shone on
the sound object level. In traditional composition, transitions function on the
mesostructural level through the interplay of notes. In electroacoustic music,
the morphology of an individual sound may play a structural role, and transitions can occur within an individual sound object. This ubiquity of mutation
means that every sonic event is itself a potential transformation.
One cannot speak of a single time frame, or a time constant for the auditory system (Gordon 1996). Our hearing mechanisms involve many different agents, each of which operates on its own time scale (see figure 1.1). The brain integrates signals sent by various hearing agents into a coherent auditory picture. Ear-brain mechanisms process high and low frequencies differently. Keeping high frequencies constant, while inducing phase shifts in lower frequencies, causes listeners to hear a different timbre.
Determining the temporal limits of perception has long engaged psychoacousticians (Doughty and Garner 1947; Buser and Imbert 1992; Meyer-Eppler 1959; Winckel 1967; Whitfield 1978). The pioneer of sound quanta, Dennis Gabor, suggested that at least two mechanisms are at work in microevent detection: one that isolates events, and another that ascertains their pitch. Human beings need time to process audio signals. Our hearing mechanisms impose minimum time thresholds in order to establish a firm sense of the identity and properties of a microevent.
In their important book Audition (1992), Buser and Imbert summarize a large
number of experiments with transitory audio phenomena. The general result
from these experiments is that below 200 ms, many aspects of auditory perception change character, and different modes of hearing come into play. The next sections discuss microtemporal perception.
Microtemporal Intensity Perception
In the zone of low amplitude, short sounds must be greater in intensity than
longer sounds to be perceptible. This increase is about 20 dB for tone pips
of 1 ms over those of 100 ms duration. (A tone pip is a sinusoidal burst with
a quasi-rectangular envelope.) In general, subjective loudness diminishes with
shrinking durations below 200 ms.
Microtemporal Fusion and Fission
In dense portions of the Milky Way, stellar images appear to overlap, giving the effect of a near-continuous sheet of light . . . The effect is a grand illusion. In reality . . . the nighttime sky is remarkably empty. Of the volume of space only 1 part in 10²¹ [one part in a sextillion] is filled with stars. (Kaler 1997)
Circuitry can measure time and recognize pulse patterns at tempi in the range
of a gigahertz. Human hearing is more limited. If one impulse follows less than 200 ms after another, the onset of the first impulse will tend to mask the second,
Frequency (Hz) | Minimum duration (ms)
100 | 45
500 | 26
1000 | 14
5000 | 18
Doughty and Garner (1947) divided the mechanism of pitch perception into
two regions. Above about 1 kHz, they estimated, a tone must last at least 10 ms
to be heard as pitched. Below 1 kHz, at least two to three cycles of the tone are
needed.
Microtemporal Auditory Acuity
We feel impelled to ascribe a temporal arrangement to our experiences. If β is later than α and γ is later than β, then γ is also later than α. At first sight it appears obvious to assume that a temporal arrangement of events exists which agrees with the temporal arrangement of experiences. This was done unconsciously until skeptical doubts made themselves felt. For example, the order of experiences in time obtained by acoustical means can differ from the temporal order gained visually . . . (Einstein 1952)
Green (1971) suggested that temporal auditory acuity (the ability of the ear to
detect discrete events and to discern their order) extends down to durations as
short as 1 ms. Listeners hear microevents that are less than about 2 ms in duration as a click, but we can still change the waveform and frequency of these
events to vary the timbre of the click. Even shorter events (in the range of
microseconds) can be distinguished on the basis of amplitude, timbre, and spatial position.
Microtemporal Preattentive Perception
When a person glimpses the face of a famous actor, sniffs a favorite food, or hears the voice of a friend, recognition is instant. Within a fraction of a second after the eyes, nose, ears, tongue or skin is stimulated, one knows the object is familiar and whether it is desirable or dangerous. How does such recognition, which psychologists call preattentive perception, happen so accurately and quickly, even when the stimuli are complex and the context in which they arise varies? (Freeman 1991)
Figure 1.4 Viewing the micro time scale via zooming. The top picture is the waveform
of a sonic gesture constructed from sound particles. It lasts 13.05 seconds. The middle
image is a result of zooming in to a part of the top waveform (indicated by the dotted
lines) lasting 1.5 seconds. The bottom image is a microtemporal portrait of a 10 millisecond fragment at the beginning of the top waveform (indicated by the dotted lines).
struction? In certain sounds, such as the taps of a slow drum roll, the individual
particles are directly perceivable. In other sounds, we can prove the existence of
a granular layer through logical argument.
Consider the whole number 5. This quantity may be seen as a sum of subquantities, for example 1+1+1+1+1, or 2+3, or 4+1, and so on. If we take away one of the subquantities, the sum no longer is 5. Similarly, a continuous tone may be considered as a sum of subquantities, as a sequence of overlapping grains. The grains may be of arbitrary sizes. If we remove any grain, the signal is no longer the same. So clearly the grains exist, and we need all of them in order to constitute a complex signal. This argument can be extended to explain the decomposition of a sound into any one of an infinite collection of orthogonal functions, such as wavelets with different basis functions, Walsh functions, Gabor grains, and so on.
This logic, though, becomes tenuous if it is used to posit the preexistence (in
an ideal Platonic realm) of all possible decompositions within a whole. For example, do the slices of a cake preexist, waiting to be articulated? The philosophy of mathematics is littered with such questions (Castonguay 1972, 1973). Fortunately it is not our task here to try to assay their significance.
Heterogeneity in Sound Particles
The concept of heterogeneity or diversity of sound materials, which we have
already discussed in the context of the sound object time scale, also applies to
other time scales. Many techniques that we use to generate sound particles assign to each particle a unique identity, a precise frequency, waveform, duration,
amplitude morphology, and spatial position, which then distinguishes it from
every other particle. Just as certain sound objects may function as singularities,
so may certain sound particles.
Figure 1.5 Sample points in a digital waveform. Here are 191 points spanning a 4.22 ms
time interval. The sampling rate is 44.1 kHz.
The atom of the sample time scale is the unit impulse, the discrete-time counterpart of the continuous-time Dirac delta function. All samples should be considered as time-and-amplitude-transposed (delayed and scaled) instances of
the unit impulse.
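In standard discrete-time notation (a restatement of this idea, not a formula from the text), any signal x[n] is a sum of scaled and delayed unit impulses:

$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k], \qquad \delta[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$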
The interval of one sample period borders near the edge of human audio perception. With a good audio system one can detect the presence of an individual high-amplitude sample inserted into a silent stream of zero-valued samples. Like a single pixel on a computer screen, an individual sample offers little. Its amplitude and spatial position can be discerned, but it transmits no sense of timbre and pitch. Only when chained into sequences of hundreds do samples float up to the threshold of timbral significance. And still longer sequences of thousands of samples are required to represent pitched tones.
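A short sketch that renders both cases for audition (the file names, amplitudes, and durations are arbitrary choices; assumes NumPy and Python's standard wave module):

```python
import wave
import numpy as np

fs = 44100  # sampling rate in Hz

def write_wav(name, signal):
    """Write a mono 16-bit WAV file at fs Hz."""
    with wave.open(name, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes((signal * 32767).astype(np.int16).tobytes())

# One high-amplitude sample in half a second of silence: a bare click,
# with a discernible amplitude but no timbre or pitch.
impulse = np.zeros(fs // 2)
impulse[fs // 4] = 0.9
write_wav("single_sample.wav", impulse)

# Thousands of samples in sequence: enough to carry a pitched tone (440 Hz).
t = np.arange(fs // 2) / fs
write_wav("tone.wav", 0.5 * np.sin(2 * np.pi * 440.0 * t))
```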
Sound Composition with Individual Sample Points
Users of digital audio systems rarely attempt to deal with individual sample
points, which, indeed, only a few programs for sound composition manipulate
directly. Two of these are G. M. Koenig's Sound Synthesis Program (SSP) and
Herbert Brün's Sawdust program, both developed in the late 1970s. Koenig and Brün emerged from the Cologne school of serial composition, in which the interplay between macro- and microtime was a central aesthetic theme (Stockhausen 1957; Koenig 1959; Maconie 1989). Brün wrote:
For some time now it has become possible to use a combination of analog and digital
computers and converters for the analysis and synthesis of sound. As such a system will
store or transmit information at the rate of 40,000 samples per second, even the most
complex waveforms in the audio-frequency range can be scanned and registered or be
recorded on audio tape. This . . . allows, at last, the composition of timbre, instead of with timbre. In a sense, one may call it a continuation of much which has been done in the electronic music studio, only on a different scale. The composer has the possibility of extending his compositional control down to elements of sound lasting only 1/20,000 of a second. (Brün 1970)
Laser-induced phononic sound focuses the beams from two lasers with a small wavelength difference onto a crystal surface. The difference in wavelength causes interference, or beating. The crystal surface shrinks and expands as this oscillation of intensity causes periodic heating. This generates a wave that propagates through the medium. The frequency of this sound is typically in the gigahertz range, with a wavelength of the order of 1 micron. Because of the small dimensions of the heated spot on the surface, the wave in the crystal has the shape of a directional beam. These sound beams can be used as probes, for example, to determine the internal features of semiconductor crystals, and to detect faults in their structure.

One of the most important properties of laser-induced phononic sound is that it can be made coherent (the wave trains are phase-aligned), as well as monochromatic and directional. This makes possible such applications as acoustic holography (the visualization of acoustic phenomena by laser light). Today the study of phononic vibrations is an active field, finding applications in surface acoustic wave (SAW) filters, waveguides, and condensed matter physics.
At the Physical Limits: The Planck Time Interval
Sound objects can be subdivided into grains, and grains into samples. How far can this subdivision of time continue? Hawking and Penrose (1996) have suggested that time in the physical universe is not infinitely divisible. Specifically, that no signal fluctuation can be faster than the quantum changes of state in subatomic particles, which occur at close to the Planck scale. The Planck scale stands at the extreme limit of the known physical world, where current concepts of space, time, and matter break down, where the four forces unify. It is the exceedingly small distance, related to an infinitesimal time span and extremely high energy, that emerges when the fundamental constants for gravitational attraction, the velocity of light, and quantum mechanics join (Hawking and Penrose 1996).
How much time does it take light to cross the Planck scale? Light takes about 3.3 nanoseconds (3.3 × 10⁻⁹ seconds) to traverse 1 meter. The Planck time interval is the time it takes light to traverse the Planck scale. Up until recently, the Planck scale was thought to be about 10⁻³⁵ meter. An important new theory puts the figure at a much larger 10⁻¹⁹ meter (Arkani-Hamed et al. 2000). Here, the Planck time interval is 3.3 × 10⁻²⁸ seconds, a tiny time interval. One could call the Planck time interval a kind of ``sampling rate of the universe,'' since no signal fluctuation can occur in less than the Planck interval.
Figure 1.6 Comparison of a pulse and the Dirac delta function. (a) A narrow pulse of height 1/b and width b, centered on t = 0. (b) The Dirac delta function.
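In the usual textbook formalization (standard material, not specific to this figure), the delta function is the limit of such unit-area pulses as the width b shrinks to zero:

$$\delta(t) = \lim_{b \to 0} p_b(t), \qquad p_b(t) = \begin{cases} 1/b, & |t| \leq b/2 \\ 0, & \text{otherwise} \end{cases}, \qquad \int_{-\infty}^{\infty} p_b(t)\,dt = 1$$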
Summary
Particle physics seeks to find a simple and orderly pattern to the behavior of matter on the atomic and subatomic level. To this end, large particle accelerators are built, acting like giant microscopes that zoom down through the atom . . . Astronomers build equally complex devices, telescopes and observatories. These gather data from distant clusters of galaxies, all the way out to the rim of the cosmos . . . We are seeing here a convergence between particle physics and cosmology. The instruments, and even the stated objectives, are different, but the languages draw closer. The laws of nature that control and order the microscopic world, and those that determined the creation and evolution of the universe, . . . are beginning to look identical. (Lederman and Schramm 1995)
Projecting time horizontally, and amplitude vertically, the concept of nil duration corresponds to a zero-dimensional point on the time-amplitude plane. This point zero is mute: no flux of energy can occur in the absence of a time window. In that ideal world experienced only by the gods of mathematics, the delta function δ(t) breaks the monotony with an instantaneous impulse that is born and dies within the most infinitesimal window beyond point zero.
Our mundane digital domain is a discrete approximation to the ideal realm of infinitesimal time. In the digital domain, the smallest event has a duration equivalent to the period of the sampling frequency. This sound atom, the sample period, is the grid that quantizes all time values in an audio signal. Any
curve inscribed on the amplitude-versus-time plane must synchronize to this
grid. Individual samples remain subsymbolic. Like the woven threads of canvas
holding paint in place, their presence is a necessity, even if we can see them only
in the aggregate.
As the window of time expands, there is a possibility for chaotic fluctuation, periodic repetition, echoes, tone, noise, and measured silence. Each additional
instant of time accrues new possibilities.
Microsonic particles can be likened to molecules built from atomic samples.
To view this level of detail, we rely on the tools of sound analysis and display.
Under this scrutiny, remarkable patterns emerge and we gain new insight into
sound structure. These images show the hidden morphologies of elementary sound molecules (figure 1.7).
Molecular materials alter the terrain of composition. Pliant globules can be
molded into arbitrary object morphologies. The presence of mutating sound objects suggests a fluid approach to compositional mesostructure, spawning
rivulets, streams, and clouds as well as discrete events. The package for all these
Figure 1.7 Image of a grain in the time-domain (top) and its frequency-domain counterpart (bottom).
musical structures, the macroform, can be tailored with high flexibility and precision in a sound mixing program.
It is necessary to see music over a broad range of time scales, from the infinitesimal to the supra scale (Christensen 1996). Not all musicians are prepared to view musical time from such a comprehensive perspective, however, and it may well take decades for this perspective to filter into our general musical vocabulary.
Chapter 2
The evolution of sound synthesis has always been interwoven with the engines of acoustic emission, be they mechanoacoustic, electromechanical, electrooptical, analog electronic, or digital. The current state of music technology has been arrived at through decades of laboratory experimentation. If we are to benefit from this legacy, we must revisit the past and recover as much knowledge as we can.

Table 2.1 lists electric and electronic music instruments developed in the period 1899-1950. The first column names each instrument. The second column shows the date of their first public demonstration (rather than the date of their conception). Before 1950, almost all instruments were designed for live performance. After 1950, the technology of recording changed the nature of electronic music, ushering in the era of the tape-based electronic music studio.
Electronic instruments invented before 1950 represented a wave-oriented approach to synthesis, as opposed to a particle-oriented approach. Gabor's experiments in the late 1940s signaled the beginning of a new era in synthesis.
This chapter explores the ancient philosophical debate between waves and
particles. It then presents the modern history of microsound synthesis, continuing through the era of analog electronics. Chapter 7 continues this story by
recounting the history of early experiments in microsound synthesis by digital
computer.
Table 2.1 Electric and electronic music instruments, 1899-1950

Instrument | Date of demonstration | Inventor | Notes
Singing Arc | 1899 | W. Duddell
Choralcello Electric Organ | 1903 | Farrington, C. Donahue, and A. Hoffman | Early electric keyboard instrument
Telharmonium | 1906 | T. Cahill | Electromagnetic instrument
Audio oscillator and Audion Piano | 1915 | L. De Forest
Synthetic Tone Musical Instrument | 1918 | S. Cabot
Thereminovox | 1920 | L. Theremin
Electrophon | 1921 | J. Mager
Staccatone | 1923 | H. Gernsback
Sphaerophon | 1926 | J. Mager
Electronic Harmonium | 1926 | L. Theremin and ?. Rzhevkin
Pianorad | 1926 | H. Gernsback
Violen | c. 1926 | W. Gurov and ?. Volynken
Light Siren | c. 1926 | Kovalenko
Illuminovox | 1926 | L. Theremin
SuperPiano | 1927 | E. Spielmann
Electric guitar prototype | 1927 | Les Paul
Electronic Violin | 1927 | E. Zitzmann-Zirini
? | 1928 | J. Bethenod
Ondes Martenot | 1928 | M. Martenot
Dynaphone | 1928 | R. Bertrand
Hellertion | 1929 | B. Helberger and P. Lertes
Crea-tone | 1930 | S. Cooper
Givelet-Coupleaux organ | 1930 | J. Givelet and E. Coupleaux
Trautonium | 1930 | F. Trautwein
Magnetoelectric organ | 1930 | R. H. Ranger
Westinghouse organ | 1930 | R. Hitchcock
Ondium Pechadre | 1930 | ?
Hardy-Goldwaithe organ | 1930 | A. Hardy and S. Brown
Neo-Bechstein piano | 1931 | W. Nernst
Radiopiano | 1931 | Hiller
Trillion-tone Organ | 1931 | A. Lesti and F. Sammis
Radiotone | 1931 | Boreau
Rangertone Organ | 1931 | R. Ranger
Emicon | 1932 | N. Langer and Hahnagyi
Gnome | 1932 | I. Eremeef
Miessner Electronic Piano | 1932 | B. F. Miessner
Rhythmicon | 1932 | H. Cowell, L. Theremin, B. Miessner
Mellertion | 1933 | ?
Electronde | 1933 | L. or M. Taubman
Cellulophone | 1933 | P. Toulon
Elektroakustische Orgel | 1934 | O. Vierling and Kock
La Croix Sonore | 1934 | N. Oboukhov
Ethonium | 1934 | G. Blake
Keyboard Theremin | 1934 | L. Theremin
Loar Vivatone | 1934 | L. Loar
Polytone | 1934 | A. Lesti and F. Sammis
Syntronic Organ | 1934 | I. Eremeef and L. Stokowski | Electro-optical tone generators
Everett Orgatron | 1934 | F. Hoschke and B. Miessner
Partiturphon | 1935 | J. Mager
Hammond electric organ | 1935 | L. Hammond and B. Miessner
Photona | 1935 | I. Eremeef
Variophone | 1935 | Y. Sholpo
Electrone | 1935 | Compton Organ Company
Foerster Electrochord | 1936 | O. Vierling
Sonothèque | 1936 | L. Lavalee
Kraft-durch-Freude Grosstonorgel | 1936 | O. Vierling and staff of Heinrich-Hertz-Institut, Berlin
? | 1936 | E. Welte
? | 1936 | J. Dopyera
? | 1936 | L. Fender
? | 1936 | F. Sammis
? | 1937 | H. Bode and C. Warnke
Oscillion | 1937 | W. Swann and W. Danforth
Krakauer Electone | 1938 | B. F. Miessner
Melodium | 1938 | H. Bode
Robb Wave organ | c. 1938 | M. Robb
Sonor | c. 1939 | ?. Ananyev
Kaleidaphon | 1939 | J. Mager
Allen organ | 1939 | Jerome Markowitz
Neo Bechstein piano | 1939 | O. Vierling and W. Nernst
Amplified piano | 1939 | B. Miessner
Novachord | 1939 | Hammond Company
Parallel Bandpass Vocoder | 1939 | H. Dudley, Bell Laboratories
Dynatone | 1939 | B. Miessner, A. Amsley
Voder speech synthesizer | 1939 | H. Dudley
Violena | 1940 | W. Gurov
Emiriton | 1940 | A. Ivanov and A. Rimsky-Korsakov
Ekvodin | 1940 | A. Volodin, Russia
V-8 | c. 1940 | A. Volodin, Russia
Solovox | 1940 | L. Hammond
Univox | c. 1940 | Univox Company | Neon-tube oscillators
Multimonika | 1940 | Hohner GmbH
Ondioline | 1941 | Georges Jenny
Melotone | c. 1944 | Compton Organ Company
Hanert Electrical Orchestra | 1945 | J. Hanert
Joergensen Clavioline | 1947 | M. Constant Martin
Rhodes Pre-Piano | 1947 | H. Rhodes
Wurlitzer electronic organ | 1947 | Wurlitzer Company
Conn Organ | 1947 | Conn Organ Company
Electronic Sackbut | 1948 | Hugh LeCaine
Free Music Machine | 1948 | B. Cross and P. Grainger
Mixturtrautonium | 1949 | O. Sala
Heliophon | 1949 | B. Helberger
Mastersonic organ | 1949 | J. Goodell and E. Swedien
Connsonata | 1949 | Conn Organ Company
Melochord | 1947-9 | H. Bode
Bel Organ | c. 1947 | Bendix Electronics
Elektronium Pi | 1950 | Hohner GmbH
Radareed organ | 1950 | G. Gubbins
Dereux organ | c. 1950 | Société Dereux
ing certain wavelike properties to light beams. Newton was careful not to
speculate further, however, and the corpuscular or particle theory of light held
sway for a century (de Broglie 1945; Elmore and Heald 1969).
A competing wave theory began to emerge shortly afterward with the experiments in reflection and refraction of Christian Huygens, who also performed experiments on the wave nature of acoustical signals. The early nineteenth-century experiments of Thomas Young reinforced the wave view. Young observed that a monochromatic beam of light passing through two pinholes would set up an interference pattern resembling ``waves of water,'' with their characteristic patterns of reinforcement and cancellation at points of intersection, depending on their phase. Experiments by Augustin Fresnel and others seemed to confirm this point of view. The theory of electromagnetic energy
proposed by the Scottish physicist James Clerk Maxwell (1831-1879) described light as a wave variation in the electromagnetic field surrounding a charged particle. The oscillations of the particle caused the variations in this field.
Physicists resolved the optical wave-versus-particle controversy in the first two decades of the twentieth century. This entailed a unified view of matter and electromagnetic energy as manifestations of the same phenomena, but with different masses. The wave properties of polarization and interference, demonstrated by light, are also exhibited by the atomic constituents of matter, such as
electrons. Conversely, light, in its interaction with matter, behaves as though
composed of many individual units (called photons), which exhibit properties
usually associated with particles, such as energy and momentum.
Acoustical Wave versus Particle Debate
What Atomes make Change
Tis severall Figur'd Atomes that make Change,
When severall Bodies meet as they do range.
For if they sympathise, and do agree,
They joyne together, as one Body bee.
But if they joyne like to a Rabble-rout,
Without all order running in and out;
Then disproportionable things they make,
Because they did not their right places take.
(Margaret Cavendish 1653)
The idea that a continuous tone could be decomposed into smaller quantities of
time emerges from ancient atomistic philosophies. The statement that all matter
is composed of indivisible particles called atoms can be traced to the ancient
51
city of Abdera, on the seacoast of Thrace. Here, in the latter part of the fifth century BC, Leucippus and Democritus taught that all matter consists only of atoms and empty space. These Greek philosophers are the joint founders of atomic theory. In their opinion, atoms were imperceptible, individual particles differing only in shape and position. The combination of these particles causes
the world we experience. They speculated that any substance, when divided into
smaller and smaller pieces, would eventually reach a point where it could no
longer be divided. This was the atom.
Another atomist, Epicurus (341-270 BC), founded a school in Athens in 306 BC and taught his doctrines to a devoted body of followers. Later, the Roman Lucretius (c. 55 BC) wrote De Rerum Natura (On the Nature of the Universe) delineating the Epicurean philosophy. In Book II of this text, Lucretius characterized the universe as a fortuitous aggregation of atoms moving in the void. He insisted that the soul is not a distinct, immaterial entity but a chance combination of atoms that does not survive the body. He further postulated that earthly phenomena are the result of purely natural causes. In his view, the world is not directed by divine agency; therefore fear of the supernatural is without reasonable foundation. Lucretius did not deny the existence of gods, but he saw them as having no impact upon the affairs of mortals (Cohen 1984, p. 177).
The atomistic philosophy was comprehensive: both matter and energy (such
as sound) were composed of tiny particles.
Roughness in the voice comes from roughness in its primary particles, and likewise smoothness is begotten of their smoothness. (Lucretius c. 55 BC, Book IV, verse 524)
At the dawn of early modern science in the seventeenth century, the French natural philosophers Pierre Gassendi (1592-1655) and René Descartes (1596-1650) revived atomism. Descartes' theory of matter was based on particles and their motion. Gassendi (1658) based his system on atoms and the void. The particles within these two systems have various shapes, weights, or other qualities that distinguish them. From 1625 until his death, Gassendi occupied himself with the promulgation of the philosophy of Epicurus.
During the same period, the science of acoustics began to take shape in
western Europe. A confluence of intellectual energy, emanating from Descartes, Galileo, Beekman, Mersenne, Gassendi, Boyle, and others, gradually forced a paradigm shift away from the Aristotelian worldview toward a more experimental perspective. It is remarkable how connected was this shift in scientific thinking to the analysis of musical sound (Coelho 1992). Problems in musical
(c. 25 BC), and Boethius (480-524). The wave interpretation was also consistent with Aristotle's (384-322 BC) statement to the effect that air motion is
generated by a source, ``thrusting forward in like manner the adjoining air, so
that the sound travels unaltered in quality as far as the disturbance of the air
manages to reach.''
By the mid-1600s, evidence had begun to accumulate in favor of the wave hypothesis. Robert Boyle's classic experiment in 1660 on the sound radiation of a ticking watch in a partially evacuated glass vessel gave proof that the medium of air was necessary for the production or transmission of audible sound.
Experiments showed the relation between the frequency of air motion and
the frequency of a vibrating string (Pierce 1994). Galileo Galilei's book Mathematical Discourses Concerning Two New Sciences, published in 1638, contained the clearest statement given until then of frequency equivalence, and, on the basis of accumulated experimental evidence, René Descartes rejected Beekman's corpuscular theory of sound (Cohen 1984, p. 166).
Marin Mersenne's description in his Harmonie Universelle (1636) of the first absolute determination of the frequency of an audible tone (at 84 Hz) implies
that he had already demonstrated that the absolute-frequency ratio of two
vibrating strings, radiating a musical tone and its octave, is as 1 : 2. The perceived harmony (consonance) of two such notes could be explained if the ratio
of the air oscillation frequencies is also 1 : 2, which is consistent with the wave
theory of sound.
Thus, a continuous tone could be decomposed into small time intervals, but
these intervals would correspond to the periods of a waveform, rather than to the rate of flow of sonic particles.
The analogy with water waves was strengthened by the belief that air motion associated with musical sounds is oscillatory and by the observation that
sound travels with a finite speed. Another matter of common knowledge was that sound bends around corners, suggesting diffraction, also observed in water waves (figure 2.1). Sound diffraction occurs because variations in air pressure cannot go abruptly to zero after passing the edge of an object. They bend, instead, into a shadow zone in which part of the propagating wave changes direction and loses energy. This is the diffracted signal. The degree of diffraction depends on the wavelength (short wavelengths diffract less), again confirming the wave view.
While the atomic theory of matter became the accepted viewpoint in the nineteenth century, the wave theory of sound took precedence. New particle-based acoustic theories were regarded as oddities (Gardner 1957).
Figure 2.1 Zones of audition with respect to a sound ray and a corner. Listeners in zone A hear the direct sound and also the sound reflected on the wall. Those in zone B hear a combination of direct, reflected, and diffracted sound. In zone C they hear a combination of direct and diffracted sound. Listeners in zone D hear only diffracted sound (after Pierce 1994).
In this theory, Pound recognized the rhythmic potential of infrasonic frequencies. The composer Henry Cowell also describes this relationship:
Rhythm and tone, which have been thought to be entirely separate musical fundamentals . . . are definitely related through overtone ratios. (Cowell 1930)
a stick, the taps for the second melody would recur with double the rapidity of those of the first. If now the taps were to be increased greatly in rapidity without changing the relative speed, it will be seen that when the taps for the first melody reach sixteen to the second, those for the second melody will be thirty-two to the second. In other words, the vibrations from the taps of one melody will give the musical tone C, while those of the other will give the tone C one octave higher. Time has been translated, as it were, into musical tone. Or, as has been shown above, a parallel can be drawn between the ratio of rhythmical beats and the ratio of musical tones by virtue of the common mathematical basis of both musical time and musical tone. The two times, in this view, might be said to be ``in harmony,'' the simplest possible. . . . There is, of course, nothing radical in what is thus far suggested. It is only the interpretation that is new; but when we extend this principle more widely we begin to open up new fields of rhythmical expression in music. (Cowell 1930)
fixed window fitted with a viewing lens. Depending on the speed of rotation, the image appeared to move in fast or slow motion.
After the invention of celluloid film for photography, the ubiquitous Thomas Alva Edison created the first commercial system for motion pictures in 1891. This consisted of the Kinetograph camera and the Kinetoscope viewing system. Cinema came into being with the projection of motion pictures onto a large screen, introduced by the Lumière brothers in 1895.
In 1889 George Eastman demonstrated a system which synchronized moving
pictures with a phonograph, but the ``talking picture'' with optical soundtrack
did not appear until 1927. An optical sound track, however, is not divided into
frames. It appears as a continuous band running horizontally alongside the
succession of vertical image frames.
In music, automated mechanical instruments had long quantized time into
steps lasting as little as a brief note. But it was impossible for these machines to
operate with precision on the time scale of microsound. Electronics technology
was needed for this, and the modern era of microsound did not dawn until the
acoustic theory and experiments of Dennis Gabor in the 1940s.
The Gabor Matrix
Inherent in the concept of a continuum between rhythm and pitch is the notion
that tones can be considered as a succession of discrete units of acoustic energy.
This leads to the notion of a granular or quantum approach to sound, first proposed by the British physicist Dennis Gabor in a trio of brilliant papers. These papers combined theoretical insights from quantum physics with practical experiments (1946, 1947, 1952). In Gabor's conception, any sound can be
decomposed into a family of functions obtained by time and frequency shifts of
a single Gaussian particle. Another way of saying this is that any sound can
be decomposed into an appropriate combination of thousands of elementary
grains. It is important to emphasize the analytical orientation of Gabor's
theory. He was interested in a general, invertible method for the analysis of
waveforms. As he wrote in 1952:
The orthodox method [of analysis] starts with the assumption that the signal s is a function s(t) of time t. This is a very misleading start. If we take it literally, it means that we
have a rule of constructing an exact value of s(t) to any instant of time t. Actually we are
never in a position to do this. . . . If there is a bandwidth W at our disposal, we cannot mark
time any more exactly than by a time-width of the order 1/W; hence we cannot talk
physically of time elements smaller than 1/W. (Gabor 1952, p. 6)
Gabor took exception to the notion that hearing was well represented by Fourier analysis of infinite signals, a notion derived from Helmholtz (1885). As he wrote:

Fourier analysis is a timeless description in terms of exactly periodic waves of infinite duration. On the other hand it is our most elementary experience that sound has a time pattern as well as a frequency pattern. . . . A mathematical description is wanted which ab ovo takes account of this duality. (Gabor 1947, p. 591)
$$s(t) = e^{-\alpha^2 (t - t_0)^2} \, e^{2\pi j f_0 t} \qquad (1)$$

where

$$\Delta t = \pi^{1/2}/\alpha \qquad \text{and} \qquad \Delta f = \alpha/\pi^{1/2}$$
The first part of equation 1 defines the Gaussian envelope, while the second part defines the complex sinusoidal function (frequency plus initial phase) within each quantum.
The geometry of the acoustic quantum Δt × Δf depends on the parameter α, where the greater the value of α, the greater the time resolution, at the expense of the frequency resolution. (For example, if α = 1.0, then Δt = 1.77245 and Δf = 0.56419. Setting the time scale to milliseconds, this corresponds to a time window of 1.77245 ms and a frequency window of 564.19 Hz. For α = 2.0, Δt would be 0.88 ms and Δf would be 1128.38 Hz.) The extreme limiting cases of the Gabor series expansion are a time series (where Δt is the delta function δ), and the Fourier series (where Δt → ∞).
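A concrete sketch of a single quantum, using the real part of equation 1 (the sampling rate, α, f₀, and t₀ are illustrative values; α is taken per millisecond, matching the example above):

```python
import numpy as np

fs = 44100.0    # sampling rate, Hz
alpha = 1.0     # time-frequency trade-off parameter of equation 1, in 1/ms
t0 = 0.005      # center of the grain, seconds
f0 = 500.0      # frequency of the sinusoid inside the grain, Hz

t = np.arange(int(0.010 * fs)) / fs   # a 10 ms time axis
# Gaussian envelope times a (real) sinusoid: the two factors of equation 1.
envelope = np.exp(-alpha**2 * ((t - t0) * 1000.0) ** 2)
grain = envelope * np.cos(2 * np.pi * f0 * t)

dt = np.sqrt(np.pi) / alpha   # effective duration in ms (1.77245 for alpha = 1)
df = alpha / np.sqrt(np.pi)   # effective bandwidth in kHz (0.56419 for alpha = 1)
print(dt, df, dt * df)        # the time-frequency product is always exactly 1
```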
Gabor proposed that a quantum of sound was a concept of significance to the theory of hearing, since human hearing is not continuous and infinite in resolution. Hearing is governed by quanta of difference thresholds in frequency, time, and amplitude (see also Whitfield 1978). Within a short time window (between 10 and 21 ms), he reasoned, the ear can register only one distinct sensation, that is, only one event at a specific frequency and amplitude.
Gabor gave an iterative approximation method to calculate the matrix. By 1966 Helstrom showed how Gabor's analysis/resynthesis approximation could be recast into an exact identity by turning the elementary signals into orthogonal functions. Bacry, Grossman, and Zak (1975) and Bastiaans (1980, 1985) verified this hypothesis. They developed analytic methods for calculating the matrix and resynthesizing the signal.
A similar time-frequency lattice of functions was also proposed in 1932 in a different context by the mathematician John von Neumann. It subsequently became known as the von Neumann lattice and lived a parallel life among quantum physicists (Feichtinger and Strohmer 1998).
resampled version. The local frequency content of the original signal, and in particular of the pitch, is preserved in the resampled version.

To effect a change in pitch without changing the duration of a sound, one need only change the playback rate of the original and use the time-scale modification just described to adjust its duration. For example, to shift the pitch up an octave, play back the original at double speed and use time-granulation to double the duration of the resampled version. This restores the duration to its original length. Chapter 5 looks at sound granulation using digital technology.
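A minimal sketch of this two-step procedure, granulation for time-scale modification plus a playback-rate change for pitch (an illustration in the spirit of the text, not Gabor's machine; the grain size, Hanning window, and crude resampling are all assumptions):

```python
import numpy as np

def granular_stretch(x, fs, stretch, grain_ms=50.0):
    """Time-stretch x by overlap-adding windowed grains read at a slower rate."""
    n = int(grain_ms / 1000.0 * fs)            # grain length in samples
    win = np.hanning(n)
    in_hop = max(1, int(n // 2 / stretch))     # step through the input...
    out_hop = n // 2                           # ...slower than through the output
    y = np.zeros(int(len(x) * stretch) + n)
    read, write = 0, 0
    while read + n < len(x):
        y[write:write + n] += x[read:read + n] * win
        read += in_hop
        write += out_hop
    return y

def pitch_shift(x, fs, ratio):
    """Shift pitch by a playback-rate change, then granulate to restore duration."""
    idx = np.arange(0, len(x) - 1, ratio)          # ratio = 2.0: double speed, octave up
    resampled = x[idx.astype(int)]                 # crude nearest-sample resampling
    return granular_stretch(resampled, fs, ratio)  # stretch back to the original length
```

For example, `pitch_shift(x, 44100, 2.0)` plays the sound an octave higher at double speed, then uses time-granulation to double the duration of the resampled version, exactly the sequence described above.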
Meyer-Eppler
The acoustician Werner Meyer-Eppler was one of the founders of the Westdeutscher Rundfunk (WDR) studio for electronic music in Cologne (Morawska-Bungler 1988). He was well aware of the significance of Gabor's research. In an historic lecture entitled Das Klangfarbenproblem in der elektronischen Musik (``The problem of timbre in electronic music'') delivered in August 1950 at the Internationale Ferienkurse für Neue Musik in Darmstadt, Meyer-Eppler described the Gabor matrix for analyzing sounds into acoustic quanta (Ungeheuer 1992). He also presented examples of Oskar Fischinger's animated films with their optical images of waveforms as the ``scores of the future.'' In his later lecture Metamorphose der Klangelemente, presented in 1955 at, among other places, the studio of Hermann Scherchen in Gravesano, Switzerland, Meyer-Eppler described the Gabor matrix as a kind of score that could be composed with a ``Mosaiktechnik.'' In his textbook, Meyer-Eppler (1959) described the Gabor matrix in the context of measuring the information content of audio signals. He defined the ``maximum structure content'' of a signal as a physical measurement
K = 2WT
where W is the bandwidth in Hertz and T is the signal duration. Thus for a signal with a full bandwidth of 20 kHz and a duration of 10 seconds, the maximum structure content is 2 × 20,000 × 10 = 400,000, which is, by the sampling theorem, the number of samples needed to record it. He recognized that aural perception was limited in its time resolution, and estimated that the lower boundary on perception of parameter differences was of the order of 15 ms, about 1/66th of a second.
Wiener
The MIT mathematician Norbert Wiener (the founder of cybernetics) was well aware of Gabor's theory of acoustic quanta, just as Gabor was well aware of Wiener's work. In 1951, Gabor was invited to present his acoustical quantum theory in a series of lectures at MIT (Gabor 1952).
Like Gabor, Wiener rejected the view (expounded by Leibniz in the eighteenth century) that time, space, and matter are infinitely subdivisible or continuous. He supported Planck's quantum theory principle of discontinuity in light and in matter. Wiener noted that Newton's model of deterministic physics was being replaced by Gibbsian statistical mechanics, a ``qualified indeterminism.'' And like Gabor, he was skeptical of Fourier analysis as the best representation for music.
The frequency and timing of a note interact in a complicated manner. To start and stop a note involves an alteration of its frequency content which may be small but very real. A note lasting only a finite time is to be analyzed as a band of simple harmonic motions, no one of which can be taken as the only simple harmonic motion present. The considerations are not only theoretically important but correspond to a real limitation of what a musician can do. You can't play a jig in the lowest register of an organ. If you take a note oscillating at sixteen cycles per second and continue it only for one twentieth of a second, what you get is a single push of air without any noticeable periodic character. Just as in quantum theory, there is in music a difference of behavior between those things belonging to small intervals of time and what we accept on the normal scale of every day. (Wiener 1964a, 1964b)
Going further, Wiener stressed the importance of recognizing the time scale of a
model of measurement:
The laws of physics are like music notation: things that are real and important provided that we do not take them too seriously and push the time scale down below a certain level. (Wiener 1964a, 1964b)
Synchronized with the advancing time interval Δt, the screens are snapshots of sound bounded by frequency and amplitude grids, each screen subdivided
[Figure: a transition-probability matrix linking three screens A, B, and C; each column sums to 1.]

      A      B      C
A    0.1    0.33   0.8
B    0.9    0.33   0.1
C    0      0.34   0.1
If one were to implement Xenakis's screens, one would want to modify the theory to allow Δt to be less than the grain duration. This measure would allow grain attacks and decays to overlap, thereby smoothing over the perception of the frame rate. Similar problems of frame rate and overlap are well known in windowed analysis-resynthesis techniques such as the short-time Fourier transform (STFT). Frame-based representations are fragile, since any transformation of the frames that perturbs the perfect summation criteria at the boundaries of each frame leads to audible distortions (see chapter 6).
We face the necessity for a synchronous frame rate in any real-time implementation of granular synthesis. Ideally, however, this frame rate should operate at a speed as close as possible to the audio sampling rate.
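The fragility of the perfect-summation condition is easy to demonstrate numerically. In this sketch (my illustration in Python with NumPy, not from the text), periodic Hann windows overlap-added at a hop of exactly half the window length sum to a constant; perturbing the hop by a few samples produces the ripple that is heard as distortion.

import numpy as np

def overlap_add(win, hop, frames=60):
    """Sum copies of 'win' spaced 'hop' samples apart; return the middle."""
    n = len(win)
    out = np.zeros(frames * hop + n)
    for k in range(frames):
        out[k * hop : k * hop + n] += win
    return out[n:-n]   # discard the fade-in/fade-out edges

N = 512
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann
print(np.ptp(overlap_add(win, N // 2)))      # ~0: frames sum to a constant
print(np.ptp(overlap_add(win, N // 2 + 7)))  # > 0: audible ripple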
Analog Impulse Generators
The most important sound particle of the 1950s, apart from those identified in Xenakis's experiments, was the analog impulse. An impulse is a discrete amplitude-time fluctuation, producing a sound that we hear as a click. Although the impulse is ideally a narrow rectangular shape, in practice it may be band-limited or have a ramped attack and decay. An impulse generator emits a succession of impulses at a specified frequency. Impulse generators serve many functions in a laboratory, such as providing a source for testing the impulse response (IR) of a circuit or system. The IR is an important system measurement (see chapter 5).
The common analog circuit for impulse and square wave generation is the multivibrator (figure 2.3). Multivibrators can be built using many electronic technologies: vacuum tubes, transistors, operational amplifiers, or logic gates. Although sometimes referred to as an oscillator, a multivibrator is actually an automatic switch that moves rapidly from one condition to another, producing a voltage impulse which can be positive, negative, or a combination of the two. The multivibrator circuit has the advantage that it is easily tuned to a specific frequency and duty cycle by adjusting a few circuit elements, either resistance or capacitance values (Douglas 1957).
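In software the same behavior, a pulse train with adjustable frequency and duty cycle, takes only a few lines. The sketch below is my own digital analogy (in Python with NumPy), not a simulation of the circuit, and like a raw analog impulse it is not band-limited.

import numpy as np

def pulse_train(freq, duty=0.5, dur=1.0, sr=44100):
    """Rectangular pulse train at 'freq' Hz; 'duty' is the fraction of each
    period spent at the high level.  Naive (not band-limited), so it will
    alias at audio frequencies."""
    t = np.arange(int(dur * sr)) / sr
    phase = (t * freq) % 1.0              # normalized phase in [0, 1)
    return np.where(phase < duty, 1.0, -1.0)

square = pulse_train(100.0, duty=0.5)     # symmetrical: odd harmonics only
impulses = pulse_train(100.0, duty=0.02)  # narrow impulses: dense spectrum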
The multivibrator was used in electronic music instruments as early as 1928, in René Bertrand's Dynaphone (Rhea 1972). Musicians appropriated laboratory impulse generators in the electronic music studios of the 1950s. Karlheinz Stockhausen and Gottfried Michael Koenig worked extensively with impulse generators at the Cologne studio. As Koenig observed:
Figure 2.3 A multivibrator circuit, after Douglas (1957). Suppose that when switching on, a small positive voltage appears at the grid of V1. This increases the anode current of V1, and in so doing decreases the anode potential of V1; the change is communicated to the grid of V2. As the grid voltage of V2 falls, so does the anode current of V2, causing a rise in the anode potential of V2, which is communicated to the grid of V1, making it more positive. The process continues until V2 reaches the cutoff voltage of the vacuum tube. The circuit stays in this condition while the negative charge on the grid of V2 leaks away at a rate depending on the time constant of C1 and R1. As soon as the grid potential of V2 reaches a point where anode current can flow again, the anode potential of V2 falls since the current is increasing, which drives the grid of V1 negative. The whole process is continued in the opposite direction until V1 is cut off, and so on continuously. If C1 = C2 and R1 = R2 the waveform is symmetrical (square) and has only odd harmonics.
[The pure impulse] has no duration, like sinus and noise, but represents a brief energy
impetus, comparable to a leaping spark. Consequently it has neither pitch nor timbre. But
it encounters an object and sets it vibrating; as pitch, noise, or timbre of the object which
has been impelled. (Koenig 1959)
Figure 2.4 Synthesis patches used in the creation of Kontakte by Stockhausen. The components include impulse generators (IG), preamplifiers (P), analog tape recorders, bandpass filters (f), and plate reverberators (R). Feedback loops appear as arrows pointing backwards. (a) Simple impulse generation and recording. (b) Impulse generation with preamplification, filtering, and tape feedback. (c) Impulse generation with preamplification and filtered feedback. (d) Impulse generation with preamplification and multiband filtering. (e) Impulse generation with preamplification, multiband filtering, and tape feedback. (f) A four-stage process involving (f1) impulse generation, preamplification, filtering, and recording. (f2) Reverberation with feedback and recording. (f3) Tape feedback and recording. (f4) Reverberation, filtering, preamplification, and recording.
(Boulez 1955)
Ernst Krenek, one of the first composers to own an electronic music synthesizer, seemed to anticipate the notion of sound quanta when he mused:
The next step might be the splitting of the atom (that is, the sine tone). (Krenek 1955)
the time interval spanning one cycle of a waveform, Stockhausen uses the term ``phase'' (Phasen), referring not to a ``fundamental period'' but to a ``fundamental phase.'' He substitutes the term ``formant'' for ``harmonic,'' so a harmonic spectrum built up of a fundamental and integer-multiple frequencies is called a ``formant spectrum.'' He applies the term ``field'' (Feld) to denote an uncertainty region (or band) around a time interval or a central frequency. As long as one understands these substitutions of terms, however, one can follow Stockhausen's arguments. In the representation of his article below, I replace Stockhausen's neologisms with standard acoustical terminology. Page numbers refer to the English translation.
The most important insight of ``. . . . . How time passes . . . . .'' is a unified view of the relationship between the various time scales of musical structure. Stockhausen begins by noting the generality of the concept of period, an interval between two cycles. Period appears in both rhythm (from 6 sec to 1/16th of a sec) and pitch (from about 1/16th sec to about 1/3200th sec). The key here is that pitch and rhythm can be considered as one and the same phenomenon, differing only in their respective time scales. Taking this argument deeper into the microtemporal domain, the tone color or steady-state spectrum of a note can also be seen as a manifestation of microrhythm over a fundamental frequency. This point of view can also be applied in the macrotemporal domain. Thus, an entire composition can be viewed as one time spectrum of a fundamental duration. (As noted earlier, this idea was proposed by Ezra Pound in the 1920s, and by Henry Cowell in 1930.)
The bulk of Stockhausen's text applies this viewpoint to a problem spawned by serial composition theory: that of creating a scale of twelve durations corresponding to the chromatic scale of pitches in the twelve-tone system. The problem is exacerbated by Stockhausen's desire to notate the result for performance on traditional instruments. Later, after the composer has developed a method for generating some of the most arcane rhythmic notation ever devised (see, for example, the scores of Zeitmasse or the Klavierstücke), he turns to the difficulties of indeterminate notation. Let us now look in more detail at these arguments.
Stockhausen begins by observing a contradiction in twelve-tone composition theory, which rigorously organizes pitch but not, in any systematic way, rhythm. Since pitch and rhythm can both be considered as dimensions of time, Stockhausen proposes that they should both be organized using twelve-element scales. Constructing a scale of durations that makes sense logically and makes sense perceptually, however, is not simple. Stockhausen presents several strategies.
A rhythmic formula may be written in different ways, some harder than others for performers to realize. Stockhausen tries to allow for such imprecisions in his theory by assigning an ``uncertainty band'' (time-field) to the different notations. These time-fields could be derived by recording expert instrumentalists playing different figures while measuring the precision of their interpretation. (Obviously it would be impractical if every rhythmic formula in a composition had to be notated in multiple ways and tested in this manner.) Stockhausen then proposes that one could serialize the degree of inaccuracy of performance (!).
The degree of notational complexity and the exactness of performance are inversely related. If metronome markings change from measure to measure, the uncertainty factor increases. Multiple simultaneous tempi and broad uncertainty bands lead to general confusion. At this point, Stockhausen switches the direction of his argument to the organization of statistical groups. All parameters of time can be turned into ranges, for example, leaving it to the performers to select from within a specified range. Stockhausen points out that John Cage, in his proportional notation, was not interested in proportional relationships, depending, as they do, on memories of the past. In Cage's compositions, temporal events are not intentionally linked to the past; one is always in the present.
Stockhausen prefers a system in which determinacy and indeterminacy stand at opposite poles of a continuum. He seeks a way to notate structural indeterminacy in a determinate way. This involves ``time smearing'' the music by interpolating grace notes and articulations (staccato, legato, etc.) that ``fade'' into or out of a central note. Indeterminate notation can also be extended to meso- and macrostructure.
The structure of a piece is presented not as a sequence of development in time but as a directionless time-field . . . The groups are irregularly distributed on paper and the general instructions are: Play any group, selected at random. . . . (p. 36)
The important principle in this gamut between determinacy and indeterminacy is the interplay between the rational counting of time and the ``agitation of time'' by an instrumentalist. The score is no longer the reference for time.
Instead of mechanically quantifying durations that conflict with the regularity of metronomic time, [the performer] now measures sensory quanta; he feels, discovers the time of the sounds; he lets them take their time. (pp. 37–8)
The domain of pitch can also be notated aleatorically, and the gamut between pitch and noise can be turned into a compositional parameter.
77
To fully realize the pitch-noise continuum, he argues, a new keyboard instrument could be built in which a certain key-pressure produces a constant repetition of waveform periods (a continuous pitched tone), but a stronger pressure causes aleatoric modulation leading into noise. This ``ideal instrument'' would be able to move from duration to pitch, from tone to noise, and also be able to alter the timbre and amplitude of the oscillations. Several instruments playing together would be able to realize all of the riches of Stockhausen's temporal theory. After lamenting how long one might have to wait for such an instrument, Stockhausen finishes his article by asserting:
It does not seem very fruitful to founder on a contradiction between, on the one hand, a material that has become useless (instruments that have become useless) and, on the other, our compositional conception.
When it was published, ``. . . . . How time passes . . . . .'' presented a new viewpoint on musical time. This viewpoint is more familiar to us now, yet we can
still appreciate the depth of Stockhausen's insight. Few articles on music from
the 1950s ring with such resonance today.
The quest for a scale of durations for serial composition is no longer a compelling musical problem. Even in the heyday of serial composition, Stockhausen's solution never entered the repertory of common practice. The most
prominent American exponent of serial techniques, Milton Babbitt (1962), explicitly rejected the idea of constructing a duration scale from multiples of an
elementary unit. For Babbitt, the temporal order of the pitches in the row was
more important than the actual durations of the notes. Thus he reduced the
problem of serial organization of time to the organization of the instants at
which notes started, which he called the time point set. (See also Morris 1987.)
Stockhausen's arguments managed to resolve temporarily one of the many
contradictions and inconsistencies of serial theory. At the same time, they left
unresolved a host of major problems involving perception, notation, traditional
instrumental timbre, and higher-level organization. To untangle and examine
these issues in detail would require another book. Even if these problems could
somehow be magically resolved, it would not automatically ``validate'' compositions made with these techniques, for music will always be more than a game
of logic.
Today it is easier than ever before to compose on all time scales. Yet we must continue to respect the differences between them. In his essay on aesthetic questions in electronic music, the musicologist Carl Dahlhaus (1970) criticized the use of identical methodologies for macro and micro composition. As he wisely pointed out, serial methods that are already barely decipherable on the level of notes and phrases disappear into invisibility when applied on the microlevel of tone construction. (See the discussion in chapter 8.)
Henry Cowell's (1930) ideas concerning the relationship between musical
time scales precede those of Stockhausen by almost thirty years. Cowell pointed
out that rhythms, when sped up, become tones. He introduced the concept of
undertones at fractional intervals beneath a fundamental tone, leading to the
notion of a rhythmic undertone. In order to represent divisions of time not
focused on the movement in space of long sustained tones produced by commercial synthesizers. None of these works, however, continues or extends the
theoretical principles underlying Kontakte.
Other Assessments of Stockhausen's Temporal Theory
Christopher Koenigsberg (1991) presented a balanced analysis of Stockhausen's temporal theory and of its critics. His review prompted me to revisit these contemporaneous critiques. The most heated seem to have been provoked by Stockhausen's nonstandard terminology. The American acoustician John Backus wrote:
In physics, a quantum is an indivisible unit; there are no quanta in acoustical phenomena. (Backus 1962, p. 18; see also Backus 1969, p. 280)
Backus had probably not read Gabor's papers. And while certain of his criticisms concerning terminology are valid, they tend to focus on the details of a much broader range of musical ideas. The same can be said of Adriaan Fokker's (1962) comments. G. M. Koenig's (1962) response to Fokker's critique attempts to re-explicate Stockhausen's theories. But Koenig manages to be even more confusing than the original because of his insistence on defending its nonstandard terminology. This leads him to some arcane arguments, such as the convoluted attempt to explain the word ``phase'' (Koenig 1962, pp. 82–5). (See also Davies 1964, 1965.)
Milton Babbitt's (1962) proposal on time-point sets, which we have already
mentioned, was published in the same issue of Perspectives of New Music as
the Backus attack on Stockhausen's article. This might have been intentional,
since Babbitt's theory was an alternative proposal for a set-theory approach to
rhythmic structure. The theory of time-point sets, however, left open the question of which durations to use, and was not concerned with creating a unique
scale of equal-tempered durations. Babbitt broached the possibility of microrhythms, but never followed up on this concept.
continuous wave emission (always oscillating), with one hand controlling the amplitude of the emitted tone. In the hands of a virtuoso, such as Clara Rockmore or Lydia Kavina, these instruments produce expressive tones, although the duration and density of these tones never approach the microsonic threshold.
Further examples of a timeless wave model include the sine and pulse generators of the pioneering electronic music studios. These devices were designed for precise, repeatable, and unchanging output, not for real-time performance. A typical generator might have a vernier dial for frequency, but with the sweepable frequency range broken into steps. This meant that a continuous sweep, say from 1 Hz to 10 kHz, was not possible. The amplitude and waveform controls typically offered several switchable settings.
Analog magnetic tape offered a breakthrough for microsound processing. In discussing the musique concrète of the early 1950s, M. Chion wrote:
Soon the tape recorder, which was used in the musique concrète, would replace the turntable. It allowed editing, which was difficult with the vinyl disc. The possibility of assembling tight mosaics of sound fragments with magnetic tape definitively launched electroacoustic music. The first pieces for tape were ``micro-edited,'' using as their basis sounds that were reduced to the dust of temporal atoms (Pierre Henry's Vocalises, Boulez's Études, Stockhausen's Étude concrète). In this ``analytic'' period, one sought the atomic fission of sound, and magnetic tape (running at the time at 76 cm/s) was seen as having a tangible duration that could be cut up ad infinitum, up to one hundred parts per second, allowing the realization of abstract rhythmic speculations that human performers could never play, as in the Timbres-Durées (1953) of Olivier Messiaen. (Chion 1982)
Only by the mid-1970s, through the introduction of digital technology, was it feasible to experiment with microsound in the manner predicted by Koenig. (See chapter 3.) Digital editing on any time scale did not become possible until the late 1980s.
The novel sound of analog circuitry was used brilliantly in early works by Stockhausen, Koenig, and Xenakis to create a new musical world based on microsonic fluctuations. The acoustical signature of analog generators and filters remains a useful resource for twenty-first century composers. At the same time, one must recognize the constraints imposed by analog techniques, which can be traced to a finite class of waveforms and the difficulty of controlling their evolution on a micro level. New analog synthesizers have been introduced into the marketplace, but they are no more than variations on a well-established theme. There is little room for further evolution in analog technology.
Summary
The notion that apparently continuous phenomena can be subdivided into
particles can be traced to the atomistic philosophers of Greek antiquity.
Debates between proponents of the ``wave'' and ``particle'' views in optics and
acoustics have occupied scientists for centuries. These debates were central to
the formation of early modern science.
The contemporary scientific view of microsound dates back to Dennis Gabor, who applied the concept of an acoustic quantum (already introduced by Einstein) to the threshold of human hearing. With Meyer-Eppler as intermediary, the pioneering composers Xenakis, Stockhausen, and Koenig injected this radical notion into music. Xenakis's theory of granular synthesis has proven to be an especially inspiring paradigm. It has directly influenced me and many other composers who have employed granular techniques in their works. Over decades, a microsonic perspective has gradually emerged from the margins of musical thought to take its present place as a valuable fountain of compositional ideas.
Granular Synthesis
Digital sound synthesis techniques inhabit a virtual world more pure and precise than the physical world, and purity and precision have an undeniable
charm in music. In the right hands, an unadorned sine wave can be a lush and
evocative sonority. A measured pulsation can invite emotional catharsis. Synthesis, however, should be able to render expressive turbulence, intermittency,
and singularity; the overuse of precision and purity can lead to sterile music.
Sonic grains, and techniques used to scatter the grains in evocative patterns, can
achieve these results.
This chapter is devoted entirely to granular synthesis (GS). I present its
theory, the history of its implementations, a report on experiments, and an
assessment of its strengths and weaknesses. A thorough understanding of
the principles of granular synthesis is fundamental to understanding the other
techniques presented in this book. This chapter focuses on synthesis with
synthetic waveforms. Since granulation transforms an existing sound, I present the granulation of sampled sound in chapter 5 with other particle-based
transformations.
Figure 3.1 Portrait of a grain in the time domain. The duration of the grain is typically
between 1 and 100 ms.
Anatomy of a Grain
A grain of sound lasts a short time, approaching the minimum perceivable event time for duration, frequency, and amplitude discrimination (Whitfield 1978; Meyer-Eppler 1959; Winckel 1967). Individual grains with a duration less than about 2 ms (corresponding to fundamental frequencies > 500 Hz) sound like clicks. However, one can still change the waveform and frequency of grains and so vary the tone color of the click. When hundreds of short-duration grains fill a cloud texture, minor variations in grain duration cause strong effects in the spectrum of the cloud mass. Hence even very short grains can be useful musically.
Short grains withhold the impression of pitch. At 5 ms it is vague, becoming
clearer by 25 ms. The longer the grain, the more surely the ear can hear its
pitch.
An amplitude envelope shapes each grain. In Gabor's original conception, the envelope is a bell-shaped curve generated by the Gaussian method (figure 3.2a).
p(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}
A variation on the pure Gaussian curve is a quasi-Gaussian envelope (Roads 1978a, 1985), also known as a cosine taper or Tukey window (Harris 1978). This envelope can be imagined as a cosine lobe convolved with a rectangle (figure 3.2b). It transitions smoothly at the extrema of the envelope while maximizing the effective amplitude. This quality persuaded me to use it in my earliest experiments with granular synthesis.
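A quasi-Gaussian envelope of this kind is straightforward to construct; the following sketch (mine, in Python with NumPy; the 'taper' parameter is an assumption) joins a rising and a falling cosine lobe with a flat segment at maximum amplitude.

import numpy as np

def quasi_gaussian(n, taper=0.25):
    """Cosine-tapered (Tukey) grain envelope of n samples; 'taper' is the
    fraction of n occupied by each of the cosine attack and decay lobes."""
    k = int(n * taper)
    rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(k) / k))  # 0 -> 1
    env = np.ones(n)            # flat sustain at maximum amplitude
    env[:k] = rise
    env[n - k:] = rise[::-1]
    return env

env = quasi_gaussian(441)       # a 10 ms envelope at 44.1 kHz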
In the early days of real-time granular synthesis, it was necessary to use simple line-segment envelopes to save memory space and computation time (Truax 1987, 1988). Gabor (1946) also suggested line-segment envelopes for practical reasons (figure 3.2c and d). Keller and Rolfe (1998) have analyzed the spectral artefacts introduced by a line-segment trapezoidal window. Specifically, the frequency response is similar to that of a Gaussian window, with the addition of comb-shaped spectral effects. Null points in the spectrum are proportional to the position of the corners of the window.
Figure 3.2e portrays another type of envelope, the band-limited pulse or sinc function. The sidelobes (ripples) of this envelope impose a strong modulation effect. The percussive, exponentially decaying envelope or expodec grain
Figure 3.2 Grain envelopes. (a) Gaussian. (b) Quasi-Gaussian. (c) Three-stage line segment. (d) Triangular. (e) Sinc function. (f) Expodec. (g) Rexpodec.
Figure 3.3 The simplest grain generator, featuring a Gaussian grain envelope and a
sinusoidal grain waveform. The grains can be scattered to a position in N channels of
output.
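Rendered in code, the generator of figure 3.3 is only a few lines. This sketch (my own, in Python with NumPy) uses two output channels with equal-power panning as a stand-in for scattering to N channels; all parameter names are assumptions.

import numpy as np

def grain(f0=440.0, dur_ms=25.0, amp=1.0, pan=0.5, sr=44100):
    """One grain: Gaussian envelope times a sine wave, panned into 2 channels.
    pan = 0 is hard left, 1 is hard right."""
    n = int(sr * dur_ms / 1000.0)
    t = np.arange(n) / sr
    env = np.exp(-0.5 * ((np.arange(n) - n / 2.0) / (n / 6.0)) ** 2)
    mono = amp * env * np.sin(2 * np.pi * f0 * t)
    theta = pan * np.pi / 2.0                      # equal-power pan law
    return np.vstack([mono * np.cos(theta), mono * np.sin(theta)])

g = grain(660.0, dur_ms=10.0, pan=0.2)   # a 10 ms grain, left of center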
Figure 3.4 Influence of grain density on pitch. The waveforms in (a) through (e) last 59 ms. (a) 50 grains/sec. (b) 100 grains/sec. (c) 200 grains/sec. (d) 400 grains/sec. (e) 500 grains/sec. (f) Plot of a granular stream sweeping from the infrasonic frequency of 10 grains/sec to the audio frequency of 500 grains/sec over thirty seconds.
When the grain density increases to 400 grains per second (figure 3.4d), the perceived pitch doubles to 400 Hz. This is due to the increasing frequency of wavefronts (as in the well-known Doppler shift effect). Notice that the amplitude of the tone diminishes after beginning, however, because the density period c is less than the waveform period a. Only the first few samples of the product of the sine wavetable and the grain envelope are being repeated, resulting in a low-amplitude signal.
Finally, at a density of 500 grains per second (figure 3.4e), the signal has almost no amplitude. It is reading only the first few samples of the sinusoid, which are near zero.
Figure 3.4f shows the amplitude profile of a granular stream that sweeps from 10 grains per second to 500 grains per second over thirty seconds. Notice the diminution of amplitude due to the effect shown in figure 3.4e.
Besides pitch changes, other anomalies, such as phase cancellation, can occur when the grain density and envelope duration are at odds with the frequency of the grain waveform.
Even the impression of synchronicity can be undermined. If we widen the frequency limits of a dense synchronous stream slightly, the result quickly turns into a noiseband. The fact that the grain emissions are regular and the frequency changes at regular intervals (for example, every 1 ms) does not alter the general impression of noise. The effect is similar to that produced by asynchronous granular synthesis, described next.
Asynchronous Granular Synthesis
Asynchronous granular synthesis (AGS) abandons the concept of linear streams of grains. Instead, it scatters the grains over a specified duration within regions inscribed on the time-frequency plane. These regions are clouds, the units with which the composer works. The scattering of the grains is irregular in time, being controlled by a stochastic or chaotic algorithm. The composer may specify a cloud with the following parameters (a generator sketch follows the list):
1. Start-time and duration of the cloud
2. Grain duration, which may vary over the duration of the cloud
3. Density of grains per second, with a maximum density depending upon the implementation; density can vary over the duration of the cloud
4. Frequency band of the cloud, specified by two curves forming high and low frequency boundaries within which grains are scattered; alternatively, the frequency of the grains in a cloud can be restricted to a specific set of pitches
5. Amplitude envelope of the cloud
6. Waveform(s) within the grains
7. Spatial dispersion of the cloud, where the number of output channels is implementation-specific
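As a minimal sketch of such a generator (mine, in Python with NumPy; it covers items 1 through 6 above with fixed rather than time-varying values, and omits the spatial dispersion of item 7 for brevity):

import numpy as np

rng = np.random.default_rng(1)

def cloud(dur=3.0, grain_ms=(10.0, 40.0), density=200.0,
          band=(300.0, 1200.0), amp=0.5, sr=44100):
    """Asynchronous cloud: sine grains scattered stochastically in time,
    with durations and frequencies drawn uniformly from the given ranges."""
    out = np.zeros(int(dur * sr))
    for _ in range(int(density * dur)):
        n = int(sr * rng.uniform(*grain_ms) / 1000.0)
        f = rng.uniform(*band)                    # frequency within the band
        onset = rng.integers(0, len(out) - n)     # stochastic onset time
        t = np.arange(n) / sr
        out[onset:onset + n] += amp * np.hanning(n) * np.sin(2 * np.pi * f * t)
    peak = np.abs(out).max()
    return out / peak if peak > 1.0 else out

texture = cloud(density=500.0, band=(200.0, 4000.0))  # a dense noisy band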
The grain duration (2) can be a constant (in milliseconds), or a variable that
changes over the course of a cloud. (It can also be correlated to other parameters, as in the grainlet synthesis described in chapter 4.) Grain duration can also
In contrast, chaotic functions vacillate between stable and unstable states, between intermittent transients and full turbulence (Di Scipio 1990, 1997b; Gogins 1991, 1995; Miranda 1998). The challenge is to set up a musically compelling mapping between chaotic behavior and the synthesis parameters.
Streams and Clouds of Granulated Samples
The granulation of sampled sounds is a powerful means of sound transformation. To granulate means to segment (or window) a sound signal into grains, to possibly modify them in some way, and then to reassemble the grains in a new time order and microrhythm. This might take the form of a continuous stream or of a statistical cloud of sampled grains.
The exact manner in which granulation occurs will vary from implementation to implementation. Chapter 5 includes a major section on granulation, so here we shall limit the discussion to noting that granulation can be controlled by any of the global control structures described above.
Spectra of Granular Streams
When the intervals between successive grains are equal, the overall envelope of a stream of grains forms a periodic function. Since the envelope is periodic, the signal generated by SGS can be analyzed as a case of amplitude modulation or AM. AM occurs when the shape of one signal (the modulator) determines the amplitude of another signal (the carrier). From a signal processing standpoint, we observe that for each sinusoidal component in the carrier, the periodic envelope function contributes a series of sidebands to the final spectrum. (Sidebands are additional frequency components above and below the frequency of the carrier.) The sidebands are separated from the carrier by a distance corresponding to the inverse of the period of the envelope function. For grains lasting 20 ms, therefore, the sidebands in the output spectrum will be spaced at 50 Hz intervals. The shape of the grain envelope determines the precise number and amplitude weighting of these sidebands.
The result of modulation by a periodic envelope is that of a formant surrounding the carrier frequency. That is, instead of a single line in the spectrum (a single frequency), the spectrum looks like a sloping peak (a group of frequencies around the carrier). In the case of a bell-shaped Gaussian envelope, the spectrum is similarly bell-shaped. In other words, the Gaussian is an eigenfunction of the Fourier transform: its spectrum has the same shape as its time envelope.
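This sideband spacing can be checked numerically. The sketch below (my illustration, in Python with NumPy) builds a synchronous stream of contiguous 20 ms Gaussian-enveloped grains on a 1 kHz carrier and locates the spectral lines, which fall at 1000 ± k · 50 Hz.

import numpy as np

sr = 44100
grain_n = int(0.020 * sr)                 # one grain every 20 ms
idx = np.arange(grain_n)
env = np.exp(-0.5 * ((idx - grain_n / 2.0) / (grain_n / 8.0)) ** 2)

n_grains = 100                            # 2 seconds of signal
t = np.arange(n_grains * grain_n) / sr
stream = np.sin(2 * np.pi * 1000.0 * t) * np.tile(env, n_grains)

mag = np.abs(np.fft.rfft(stream))
freqs = np.fft.rfftfreq(len(stream), 1.0 / sr)
lines = freqs[mag > 0.05 * mag.max()]
print(np.unique(np.round(lines)))   # 900, 950, 1000, 1050, 1100, ... Hz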
When the delay interval between the grains is irregular, perfect grain synchronization disappears. The randomization of the onset time of each grain leads to a controllable thickening of the sound spectrum, a ``blurring'' of the formant structure (Truax 1988).
In its simplest form, the variable-delay method is similar to amplitude modulation using low-frequency colored noise as a modulator. In itself, this is not particularly new or interesting. The granular representation, however, lets us move far beyond simple noise-modulated AM. We can simultaneously vary several other parameters on a grain-by-grain basis, such as grain waveform, amplitude, duration, and spatial location. On a global level, we can also dynamically vary the density of grains per second, creating a variety of scintillation effects.
Parameters of Granular Synthesis
Research into sound synthesis is governed by aesthetic goals as much as by scientific curiosity. Some of the most interesting synthesis techniques have resulted from applied practice, rather than from formal theory. Sound design requires taste and skill, and at the experimentation stage musical intuition is the primary guide.
Grain Envelope Shape Effects
Of Loose Atomes
In every Braine loose Atomes there do lye,
Those which are Sharpe, from them do Fancies flye.
Those that are long, and Aiery, nimble be.
But Atomes Round, and Square, are dull, and sleepie.
(Margaret Cavendish 1653)
Under special circumstances, all of this is quite true. But if we loosen any one
of a number of constraints, time reversibility does not hold. For it to hold at the
micro scale, the grain envelope must be symmetrical. This, then, excludes
asymmetric techniques such as FOF grains (Rodet 1980), trainlets (chapter 4),
expodec, or rexpodec grains. The grain waveform must not alter in time, so
excluding techniques such as the time-varying FM grains (Jones and Parks
1988), long glissons (chapter 4), or grains whose waveform derives from a time-
2. Time-varying duration

Duration        Frequency of modulation    Perceived effect
200 μsec        5 kHz                      Noisy particulate disintegration
500 μsec–1 ms   2 kHz–1 kHz                Loss of pitch
10 ms           100 Hz                     Fluttering, gurgling
50–100 ms       20–10 Hz                   Stable pitch formation
200 ms          5 Hz                       Aperiodic tremolo, jittering
Figure 3.5 Comparison of grain spectra produced by a 7 ms grain duration (top) versus
a 29 ms grain duration (bottom). Notice the narrowing of the spectrum as the duration
lengthens.
The laws of micro-acoustics tell us that the shorter the duration of a signal, the greater its bandwidth. Thus the width of the frequency band B caused by the sidebands is inversely proportional to the duration of the grain D (figure 3.5).
A dramatic effect occurs when the grain duration is lowered to below the period of the grain waveform. This results in a signal that is entirely unipolar in energy, which is a byproduct of the ratio of the grain duration to the fundamental frequency period Pf of the grain waveform, or D/Pf. The effect is caused by an incomplete scan of the wavetable, where the waveform starts in either the positive or the negative quadrant. It occurs whenever D/Pf is less than 1.0. In the specific case of a 1 ms grain with a fundamental frequency of 500 Hz, the ratio is 0.001/0.002 = 1/2.
To completely represent one period of a given frequency, the grain duration must be at least equal to the frequency period. If we took this criterion as a standard, grains could last no less than 50 ms (corresponding to the period of 20 Hz) for low frequency signal energy to be captured completely. As it happens, however, much shorter grains can represent low frequency signals, but this short grain duration introduces modulation products. Our experiments show that grains shorter than 5 ms tend to generate particulated clouds in which a sense of center-pitch is still present but is diffused by noise as the frequency descends.
on the sampling rate). Above these limits, waveforms other than sine cause foldover. For this reason, higher sampling rates are better for digital synthesis.
The grain waveform can also be extracted from a sampled sound. In this case, a single extracted waveform is fed to an oscillator, which reads the waveform repetitively at different frequencies. In Cloud Generator, for example, the extracted waveform constitutes the first 2048 samples (46 ms) of a selected sound file (see the appendix). This differs from granulation, which extracts many different segments of a long sample file. See chapter 5.
Frequency Band Effects
Frequency band parameters limit the fundamental frequencies of grain waveforms. Within the upper and lower boundaries of the band, the grain generator scatters grains. This scattering can be aligned to a frequency scale or to random frequencies. When the frequency distribution is random and the band is greater than a small interval, the result is a complex texture, where pitch is ambiguous or unidentifiable. The combined AM effects of the grain envelope and grain density strongly influence pitch and spectrum.
To generate a harmonic texture, we can constrain the choice of fundamental frequency to a particular set of pitches within a scale. We distinguish two classes of frequency specifications:
Cumulus The frequencies of the grains scatter uniformly within the upper and lower bounds of a single band specified by the composer.
Stratus
Figure 3.6 Frequency band specifications. (a) The band centers on a single frequency. (b) The center frequency changes over time, creating a glissando effect. (c) Stratus cloud with several frequencies. (d) Cumulus cloud where the grains scatter randomly between the upper and lower boundaries. (e) The shape of the cumulus band changes over time. (f) Time-varying curves shape the bandlimits of the cumulus cloud.
1. Wide bands (e.g., an octave or more) and high densities generate massive clouds of sound.
As we have seen in the section on grain duration effects, another way to modify the bandwidth of a cloud is by changing the grain duration parameter.
Granular Spatial Effects
Granular synthesis calls for multichannel output, with an individual spatial location for each grain. If the cloud is monaural, with every grain in the same spatial position, it is spatially flat. In contrast, when each grain scatters to a unique location, the cloud manifests a vivid three-dimensional spatial morphology, evident even in a stereophonic configuration.
From a psychoacoustical point of view, the listener's perception of the spatial position of a grain or series of grains is determined by both the physical properties of the signal and the localization blur introduced by the human auditory system (Blauert 1997). Localization blur means that a point source sound produces an auditory image that spreads out in space. For Gaussian tonebursts, the horizontal localization blur is in the range of 0.8° to 3.3°, depending on the frequency of the signals (Boerger 1965). The localization blur in the median plane (starting in front, then going up above the head and down behind) is greater, on the order of 4° for white noise, and becomes far greater (i.e., less accurate) for purer tones. (See Boerger 1965 for a study of the spatial properties of Gaussian grains.)
Taking localization blur into account, one can specify the spatial distribution of the grains in one of two ways: as an envelope that pans across N channels, or as a random dispersion of grains among N channels. Random dispersion is especially effective in the articulation of long grains at low densities.
Chapter 5 presents more on the spatial effects made possible through particle scattering and other techniques.
Granular Clouds as Sound Objects
A cloud of grains may come and go within a short time span, for example, less than 500 ms. In this case, a cloud of grains forms a tiny sound object. The inner structure of the cloud determines its timbral evolution. I have conducted numerous experiments in which up to fifty grains were generated within a time span of 20 to 500 ms. This is an effective way to construct singular events that cannot be created by other means.
Cloud Mixtures
A granular composition is a flow of multiple overlapping clouds. To create such textures, the most flexible strategy is first to generate each individual cloud, and then to mix the clouds to precisely order and balance their flow in time. To create a polychrome cloud texture, for example, several monochrome clouds, each with a different grain waveform, are superimposed in a mixing program.
It is easy to granulate a sound file and take the results ``as is.'' A more sophisticated strategy is to take the granulation as a starting point. For example, one can create a compound cloud (one with an interesting internal evolution) by carefully mixing several granulated sound files.
Mixing is also effective in creating rhythmic structures. When the density of a synchronous cloud is below about 20 Hz, it creates a regular metric pulse. To create a polyrhythmic cloud, one can generate several clouds at different densities, amplitudes, and in different frequency regions to stratify the layers.
This description intrigued me, but there were no sounds to hear. Granular
synthesis remained a theoretical topic at the workshop. Maestro Xenakis took
us to the campus computing center to show us experiments in stochastic waveform generation (also described in his book), but he never realized granular
synthesis on a computer.
Later that year, I enrolled as a student in music composition at California Institute of the Arts. During this period, I also studied mathematics and computer programming with Leonard Cottrell. For the next two years, I wrote many programs for the Data General Nova 1200, a minicomputer at the Institute. These included software for stochastic processes and algorithmic composition based on Xenakis's formulas (Roads 1992a). I spent much time testing the formulas, which fostered in me a deeper understanding of probability theory. The Nova 1200 was limited, however. It lacked memory and had no digital audio converters. Its only peripheral was a teletype with a paper tape punch for storing and reading programs. Digital sound synthesis was out of the question.
In March 1974, I transferred to the University of California, San Diego
(UCSD), having learned of its computer sound synthesis facilities. Bruce
Leibig, a researcher at UCSD, had recently installed the Music V program
(Mathews 1969) on a mainframe computer housed in the UCSD Computer
Center. The dual-processor Burroughs B6700 was an advanced machine for its
day, with a 48-bit wordlength, virtual memory, digital tape storage, and support for parallel processing. A single language, Extended Algol, provided access
to all levels of the system, from the operating system to the hardware. This is
not to say that music synthesis was easy; because of the state of input and output technology, the process was laborious.
The Burroughs machine could not produce sound directly. It could, however, write a digital tape that could be converted to sound on another computer, in this case a Digital Equipment Corporation (DEC) PDP-11/20, housed on campus at the Center for Music Experiment (CME). Bruce Leibig wrote the PAL-11 assembly language code that performed the digital-to-analog conversion. This important programming work laid the foundation for my research. I enrolled in an Algol programming course offered by the computer science department. There were no courses in computer sound synthesis, but with help from Bruce Leibig, I learned the Music V language. We programmed on punched paper cards, as there were no interactive terminals.
Owing to storage limitations, my sound synthesis experiments were limited to
a maximum of one minute of monaural sound at a sampling rate of 20 kHz. It
took several days to produce a minute of sound, because of the large number of
steps involved. The UCSD Computer Center scheduled sound calculations for
the overnight shift. So I would submit a box of punched cards to a computer
operator and return the next day to collect a large digital tape reel containing
the previous evening's data. In order to convert this data into sound, I had first to transfer it from the tape to a disk cartridge. This transfer involved setting up an appointment at the Scripps Institute of Oceanography. Surrounded by the pungent atmosphere of the squid tanks of the Neurology Computing Laboratory, I transferred the contents of the tape. Then I would take the disk cartridge to CME and mount it on the DEC minicomputer. This small computer, with a total of 28 kbytes of magnetic-core RAM, had a single-channel 12-bit digital-to-analog converter (DAC) designed and built by Robert Gross. The digital-to-analog converter truncated the four low-order bits of the 16-bit samples.
After realizing a number of short etudes with Music V, in December 1974 I tested the first implementation of asynchronous granular synthesis. For this experiment, called Klang-1, I typed each grain specification (frequency, amplitude, duration) onto a separate punched card. A stack of about eight hundred punched cards corresponded to the instrument and score for thirty seconds of granular sound. Following this laborious experience, I wrote a program in Algol to generate grain specifications from compact, high-level descriptions of clouds. Using this program, I realized an eight-minute study in granular synthesis called Prototype. Chapter 7 describes these studies in detail. (See also Roads 1975, 1978a, 1985c, 1987.)
In 1980, I was offered a position as a Research Associate at the Experimental Music Studio at the Massachusetts Institute of Technology. The computing environment centered on a Digital Equipment Corporation PDP-11/50 minicomputer (16-bit word length) running the UNIX operating system. There I implemented two forms of granular synthesis in the C programming language. These programs generated data that could be read by the Music 11 sound synthesis language. The Csound language (Boulanger 2000; Dodge and Jerse 1997; Vercoe 1993) is a superset of Music 11. The initial tests ran at a 40 kHz sampling rate, and used 1024-word function tables for the waveforms and envelopes. The 1980 implementation generated a textual score or note-list for a sinusoidal granular synthesis oscillator. The second, 1981, implementation at MIT granulated sampled sound files using the soundin unit generator of Music 11. I implemented gestures such as percussion rolls by granulating a single stroke on a snare drum or cymbal. Due to the limitations of the Music 11 language, however, this version was constrained to a maximum density of thirty-two simultaneous grains.
An important transition in technology took place in the 1980s with the introduction of personal computers. By 1988, inexpensive computers (less than
$5000 for a complete system including audio converters) had become powerful enough to support stereo 16-bit, 44.1 kHz audio synthesis. In 1988, I programmed new implementations of granular synthesis and granulation of sampled sound files for the Apple Macintosh II computer in my home studio (Roads 1992c, d). I called these C programs Synthulate and Granulate, respectively. For playback, I used the Studer Dyaxis, a digital audio workstation with good 16-bit converters attached to the Macintosh II. My synthesis programs worked with a version of the Music 4C language, which I modified to handle the large amounts of data associated with granular synthesis. Music 4C (Gerrard 1989) was a C-language variant of the venerable Music IVBF language developed in the 1960s (Mathews and Miller 1965; Howe 1975). I revised the synthesis programs in 1991 while I was at the Kunitachi College of Music in Tokyo. After moving to Paris in 1992, I modified the grain generator to work with instruments that I wrote for the Csound synthesis language (Boulanger 2000). The revised programs ran on a somewhat faster Macintosh Quadra 700 (25 MHz), but it still took several minutes to calculate a few hundred grains of sound.
Working at Les Ateliers UPIC in 1995, John Alexander and I developed the
Cloud Generator program (Roads and Alexander 1995). Cloud Generator is
a stand-alone synthesis and granulation program for MacOS computers. The
Appendix documents this program. Our implementation of Cloud Generator
merged the C code from several of my previous programs (Synthulate, Granulate, etc.) into a single interactive application. Since then, Cloud Generator has
served as a teaching aid in the basics of granular synthesis. It has also been used
in compositions by many musicians around the world. It provides a variety of
options for synthesis and sound processing. I have used it extensively for research purposes, and in composition.
Although Synthulate and its cousins have no graphical interface, they are
extensible. For this reason, I have continued to use them when I needed to try
an experiment that could not be realized in Cloud Generator. In early 1999, I
revised and recompiled Synthulate and its cousins for the Metrowerks C compiler on the Apple Power Macintosh computer.
Between 1996 and 2000, my CREATE colleagues and I also implemented a
variety of particle synthesis and sound processing programs using versions 1
and 2 of the SuperCollider language (McCartney 1996, 1998). SuperCollider
provides an integrated environment for synthesis and audio signal processing,
with gestural, graphical envelope, or algorithmic control. SuperCollider is my
synthesis environment of choice at the present time.
In the early 1990s, the Marseilles team of Daniel Arfib and Nathalie Delprat created the program Sound Mutations for time-frequency analysis of sound. After analyzing a sound, the program modified and resynthesized it using granular techniques. It could also perform transformations including time-stretching, transposition, and filtering (Arfib and Delprat 1992, 1993).
James McCartney included a granular instrument in his Synth-O-Matic program for MacOS (McCartney 1990, 1994). Users could draw envelopes on the
screen of the computer to control synthesis parameters.
Mara Helmuth realized two different implementations of granular synthesis techniques. StochGran was a graphical interface to a Cmix instrument (Helmuth 1991). StochGran was originally developed for NeXT computers, and later ported to the Silicon Graphics Incorporated IRIX operating system. Helmuth also developed Max patches for granular sampling in real time on the IRCAM Signal Processing Workstation (Helmuth 1993).
A group at the University of York implemented granular synthesis with
graphical control (Orton, Hunt, and Kirk 1991). A novel feature was the use of
cellular automata to modify the output by mapping the automata to the
tendency masks produced by the drawing program. Csound carried out the
synthesis.
In 1992 and 1993, I presented several lectures at IRCAM on granular synthesis and convolution techniques. After I left the institute, a number of people who had attended these lectures launched granular synthesis and convolution research of their own as extensions of other long-standing projects, namely Chant synthesis and Max on the IRCAM Musical Workstation. The Granular Synthesis Toolkit (GIST) consisted of a set of external objects for the Max programming language, including a sinusoidal FOF grain generator, and a FOG object for granulation (Eckel, Rocha-Iturbide, and Becker 1995; Rocha 1999). (See the description of FOF synthesis in chapter 4, and the description of granulation in chapter 5.) Also at IRCAM, Cort Lippe (1993) developed another Max application for granulation of sound files and live sound.
Recent versions of the Csound synthesis language (Boulanger 2000) provide four unit generators for granular synthesis: fof, fof2, grain, and granule.
Another unit generator, fog, was implemented in versions of Csound from
the universities of Bath and Montreal. The fof generator reads a synthetic
waveform function table and is oriented toward generating formant tones. The
fof2 generator adds control over the initial phase increment in the waveform
function table. This means that one can use a recorded sound and perform
Michael Norris (1997) provided four granulation processes in his SoundMagicFX package, which works with the SoundMaker program for MacOS. Entitled Brassage Time Stretch, Chunk Munger, Granular Synthesis, and Sample Hose, these flexible procedures allow multiple-file input, time-varying parameters, and additional signal processing to be applied to sound files, resulting in a wide range of granular textures.
Eduardo Miranda developed a Windows application called ChaosSynth for granular synthesis using cellular automata (CA) control functions (Miranda 1998). Depending on how the CA are configured, they calculate the details of the grains. A difficulty posed by this approach is the conceptual rift between the CA controls (number of cell values, resistances of the potential divider, capacitance of the electrical capacitor, dimension of the grid, etc.) and the acoustical results (Correa, Miranda, and Wright 2000).
In 1999, Arboretum Systems offered a scattering granulator effect in its popular Hyperprism effects processing software. The user controls grain size, randomization, speed, as well as density and spread.
Can a standard MIDI synthesizer realize granular synthesis? Yes, in a limited form. The New York-based composer Earl Howard has done so on a Kurzweil K2500 sampling synthesizer. The K2500 lets one create short samples, which can repeat by internal signals as fast as 999 bpm, or about every 10 ms. Howard created granular textures by layering several streams operating at different rates, with each stream having a random delay. Another MIDI-based approach to granular synthesis is found in Clarence Barlow's spectastics (spectral stochastics) technique. This generates up to two hundred notes per second to approximate the spectrum of a vocal utterance (Barlow 1997).
Even with all these implementations, there is still a need for an instrument
optimized with controllers for the virtuoso performance of granular textures.
Apropos of this, see the description of the Creatovox project in chapter 5.
Summary
As regards electric instruments for producing sound, the enmity of the few musicians who know them is manifest. They judge them superficially, consider them ugly, of small practical value, unnecessary. . . . [Meanwhile, the inventors] undiscerningly want the new electric instruments to imitate the instruments now in use as faithfully as possible and to serve the music that we already have. What is needed is an understanding of the . . . possibilities of the new instruments. We must clearly evaluate the increase they bring to our
own capacity for expression . . . The new instruments will produce an unforeseen music, as
unlooked for as the instruments themselves. (Chavez 1936)
Granular synthesis is a proven method of musical sound synthesis, and is featured in important compositions (see chapter 7). Implementations of granular techniques are widespread. Most focus on the granulation of sampled sound files. Pure granular synthesis using synthetic waveforms is available only in a few packages.
At low densities, synchronous GS serves as a generator of metrical rhythms
and precise accelerandi/rallentandi. A high-density cloud set to a single frequency produces a stream of overlapping grains. This forms sweet pitched tones
with strong formants, whose position and strength depend greatly on the grain
envelope and duration.
Asynchronous GS sprays thousands of sonic grains into cloudlike formations across the audio spectrum. At high densities the result is a scintillating sound complex that varies over time. In musical contexts, these types of sounds can act as a foil to the smoother, more sterile sounds emitted by digital oscillators. Granulation of sampled sound, a popular technique, produces a wide range of extraordinary variations, explored in chapter 5. The destiny of granular synthesis is linked both to graphics and to real-time performance.
A paint program offers a fluid interface for granular synthesis. The MetaSynth program (Wenger and Spiegel 1999), for example, provides a spray brush with a variable grain size. A further extension would be a multicolored spray jet for sonic particles, where the color palette corresponds to a collection of waveform samples. (In MetaSynth, the color of the grains indicates their spatial location.)
Analysis/resynthesis systems, such as the phase vocoder, have an internal granular representation that is usually hidden from the user. As predicted (in Roads 1996), the interfaces of analysis/resynthesis systems, which resemble sonograms, have merged with interactive graphics techniques. This merger, sonographic synthesis, is a direct and intuitive approach to sound sculpture. (See chapters 4 and 6 for more on sonographic synthesis and transformation.) One can scan a sound image (sonogram), touch it up, paint a new image, or erase it, with the algorithmic brushes of computer graphics.
My colleagues and I continue to refine our instrument for real-time virtuoso performance of granular synthesis (Roads 1992–1997). The Creatovox research project at the University of California, Santa Barbara has resulted in a prototype of a granular synthesis instrument, playable on a standard musical keyboard and other controllers. (See the description in chapter 5.)
Granular synthesis offers unique opportunities to the composer and suggests new ways of organizing musical structure, as clouds of evolving sound spectra. Indeed, granular representation seems ideal for representing statistical processes of timbral evolution. Time-varying combinations of clouds lead to such dramatic effects as evaporation, coalescence, and mutations created by crossfading overlapping clouds. A striking similarity exists between these processes and those created in computer graphics by particle synthesis (Reeves 1983), often used to create images of fire, water, clouds, fog, and grasslike textures, analogous to some of the audio effects possible with asynchronous granular synthesis.
Glisson Synthesis
Magnetization Patterns of Glisson Clouds
Implementation of Glisson Synthesis
Experiments with Glisson Synthesis
Assessment of Glisson Synthesis
Grainlet Synthesis
Parameter Linkage in Grainlet Synthesis
Frequency-Duration Experiments
Amplitude-Duration Experiments
Space-Duration Experiments
Frequency-Space Experiments
Amplitude-Space Experiments
Assessment of Grainlet Synthesis
Trainlet Synthesis
Impulse Generation
Theory and Practice of Trainlets
Assessment of Trainlet Cloud Synthesis
Pulsar Synthesis
Basic Pulsar Synthesis
Pulsaret-Width Modulation
Synthesis across Time Scales
Spectra of Basic Pulsar Synthesis
Glisson Synthesis
Glisson synthesis is an experimental technique of particle synthesis. It derives
from the technique of granular synthesis, presented in the previous chapter. I
implemented glisson synthesis after revisiting Iannis Xenakis's original paper
on the theory of granular synthesis (Xenakis 1960). In this article, Xenakis
described each grain as a vector within a three-dimensional space bounded by
time, frequency, and amplitude. Since the grain is a vector, not a point, it can
vary in frequency, creating a short glissando. Such a signal is called a chirp or
chirplet in digital signal processing (Mann and Haykin 1991). Jones and Parks
implemented frequency-modulated grains with a variable chirp rate in 1988.
My implementation of glisson synthesis dates to 1998.
In glisson synthesis, each particle or glisson has an independent frequency
trajectoryan ascending or descending glissando. As in classic granular synthesis, glisson synthesis scatters particles within cloud regions inscribed on the
time-frequency plane. These clouds may be synchronous (metric) or asynchronous (ametric). Certain parameters of glisson synthesis are the same as for
granular synthesis: start time and duration of the cloud, particle duration,
density of particles per second, frequency band of the cloud, amplitude envelope of the cloud, waveform(s) within the particles, and spatial dispersion of the
cloud. (See the description in the previous chapter.)
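By way of illustration, here is a minimal sketch in Python with NumPy of a single glisson: a sinusoidal grain whose instantaneous frequency glides linearly from a start value to an end value under a Gaussian envelope. The function name and default values are illustrative assumptions, not drawn from the SuperCollider implementations described below.

import numpy as np

def glisson(sr=44100, dur=0.05, f0=400.0, f1=1200.0, amp=0.5):
    # One glisson: a grain whose frequency glides linearly from f0 to f1.
    n = int(sr * dur)
    t = np.arange(n) / sr
    # Integrating the linear frequency ramp gives the instantaneous phase.
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) * t**2 / dur)
    # A Gaussian envelope confines the energy to the middle of the grain.
    env = np.exp(-0.5 * ((t - dur / 2) / (dur / 6)) ** 2)
    return amp * env * np.sin(phase)

# An upward glisson; swapping f0 and f1 yields a downward one.
grain = glisson(f0=400.0, f1=1200.0)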
Magnetization Patterns of Glisson Clouds
The magnetization pattern, a combination of several parameters, determines the frequency direction of the glissons within a cloud. First, the glissandi may be deep (wide frequency range) or shallow (small frequency range) (figure 4.1a, b). Second, they may be unidirectional (uniformly up or down) or bidirectional (randomly up or down) (figure 4.1c, d, e). Third, they may be diverging (starting from a common center frequency and diverging to other frequencies), or converging (starting from divergent frequencies that converge to a common center frequency). The center frequency can itself change over time.
Implementations of Glisson Synthesis
Stephen Pope and I developed the first implementation of glisson synthesis in February 1998. The software was coded in the SuperCollider 1 synthesis language (McCartney 1996, Pope 1997). Later, I modified the glisson program and carried out systematic tests. In the summer of 1999, Alberto de Campo and I reimplemented glisson synthesis in the SuperCollider 2 language (McCartney 1998).
Experiments with Glisson Synthesis
Short glissons (< 10 ms) with a large frequency variation (> 100 Hz) resemble the classic chirp signals of digital signal processing, sweeping over a wide frequency range in a short period of time. An individual glisson of this type with a starting frequency around 400 Hz sounds like a tap on a wood block. When the starting frequency is around 1500 Hz, the glissons sound more like the tapping of claves. As the density of glissons increases and the deviation randomizes in direction, the texture tends quickly toward colored noise. Medium-length (25–100 ms) glissons ``tweet'' (figure 4.2a), so that a series of them sounds like birdsong.
Figure 4.1 Magnetization patterns in glisson synthesis. The vertical axis is frequency and the horizontal axis is time. (a) Shallow (small frequency deviation) bidirectional. (b) Deep (large frequency deviation) bidirectional. (c) Upwards unidirectional. (d) Downwards unidirectional. (e) Diverging from center frequency. (f) Converging to center frequency.
Long glissons (> 200 ms) result in dramatic cascades of sound (figure 4.2b). At certain densities, they are reminiscent of the massed glissandi textures heard in such orchestral compositions as Xenakis's Metastasis (1954). A striking effect occurs when the glissandi diverge from or converge upon a common central frequency. By constraining the glissandi to octaves, for example, it is possible to generate sounds similar to the Shepard tones (Risset 1989a, 1997), which seem to spiral endlessly upward or downward.
Assessment of Glisson Synthesis
Glisson synthesis is a variant of granular synthesis. Its effects segregate into two categories. At low particle densities, we can perceive each glissando as a sepa-
Figure 4.2 Glissons. (a) Sonogram of a single 25-ms glisson. Notice the gray artefacts of the analysis, reflecting the time-frequency uncertainty at the beginning and end of the particle. (b) Glisson cloud generated by a real-time performance. The glisson durations increase over the 16-second duration of the cloud.
rate event in a micro-melismatic chain. When the glissons are short in duration (< 50 ms), their internal frequency variation makes it difficult to determine their pitch. Under certain conditions, such as higher particle densities with greater particle overlap, glisson synthesis produces second-order effects that we perceive on the time scale of sound objects. In this case, the results tend toward a mass of colored noise, where the bandwidth of the noise is proportional to the frequency variation of the glissandi. Several factors can contribute to the sensation of a noise mass, the most important being density, wide frequency variations, and short glisson durations.
Grainlet Synthesis
Grainlet synthesis combines the idea of granular synthesis with that of wavelet
synthesis. (See chapter 6.) In granular synthesis, the duration of a grain is
unrelated to the frequency of its component waveform. In contrast, the wavelet
representation scales the duration of each particle according to its frequency.
Short wavelets represent high frequencies, and long wavelets represent low
frequencies. Grainlet synthesis generalizes this linkage between synthesis
parameters. The fundamental notion of grainlet synthesis is that any parameter of synthesis can be made dependent on (or linked to) any other parameter.
One is not, for example, limited to an interdependence between frequency and
duration.
I implemented grainlet synthesis in 1996 as an experiment in parameter
linkage within the context of granular cloud synthesis (described in the previous chapter). Grainlet synthesis imposes no constraints on the choice of waveform, particle envelope, or any other parameter, except those that we introduce
through parameter linkage.
Parameter Linkage in Grainlet Synthesis
Parameter linkage is the connecting of one parameter with a dependent parameter. As parameter A increases, for example, so does its dependent parameter B. One can also stipulate inverse linkages, so that an increase in A results in
a decrease in B.
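Such a linkage is easy to express in code. The following Python helper is an illustrative sketch (the function and the ranges are our own assumptions): it maps a value of parameter A onto a range for a dependent parameter B, with an optional inverse mode.

def linked(a_value, a_range, b_range, inverse=False):
    # Map parameter A onto dependent parameter B; invert if requested.
    a_lo, a_hi = a_range
    b_lo, b_hi = b_range
    frac = (a_value - a_lo) / (a_hi - a_lo)
    if inverse:
        frac = 1.0 - frac
    return b_lo + frac * (b_hi - b_lo)

# Wavelet-like inverse linkage: higher-frequency grainlets are shorter.
dur_ms = linked(2000.0, (20.0, 8000.0), (1.0, 100.0), inverse=True)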
Parameter linkages can be drawn as patch diagrams connecting one parameter to another (figure 4.3). An arrow indicates a direct influence, and a gray dot indicates an inverse influence.
Figure 4.3 Parameter linkage in grainlet synthesis. Each circle represents a parameter of grainlet synthesis. An arrow from one parameter to another indicates a dependency. Here parameter 7 is dependent on parameter 2. If parameter 7 is spatial depth and parameter 2 is grainlet start time, then later grainlets have more reverberation. Notice that parameter 4 is inversely dependent on parameter 8, as indicated by the gray dot. If parameter 4 were grainlet duration and parameter 8 were grainlet frequency, then higher-frequency grainlets would be shorter in duration (as in wavelet resynthesis).
Figure 4.4 Collections of grainlets. (a) These grainlets are scaled in duration according to their frequency. (b) Superposition of short high-frequency grainlets over a long low-frequency grainlet.
- Grainlet waveform
- Grainlet position in the stereo field
- Grainlet spatial depth (amount of reverberation)
Frequency-Duration Experiments
The first experiments with grainlet synthesis simulated the relationship between grain duration and grain frequency found in wavelet representation (figure 4.4a and b). I later generalized this to allow any frequency to serve as a point of attraction around which certain durations (either very long or very short) could gravitate (figure 4.5).
Figure 4.5 Inverse sonogram plotted on a logarithmic frequency scale, showing a frequency point of attraction around the grainlet spectrum. The grainlets whose frequencies
are close to the point of attraction (700 Hz) are long in duration, creating a continuous
band centered at this point.
Amplitude-Duration Experiments
These experiments linked grain duration with the amplitude of the grains. In the case of a direct link, longer grains were louder. In an inverse relationship, shorter grains had higher amplitudes.
Space-Duration Experiments
These experiments positioned grains in space according to their duration. Grains of a stipulated duration always appeared to emanate from a specific location, which might be any point in the stereo field. Grains whose duration was not stipulated scattered randomly in space.
Frequency-Space Experiments
These experiments positioned grains in space according to their frequency. Grains of a stipulated frequency always appeared to emanate from a specific location, which might be any point in the stereo field. Other grains, whose frequency was not stipulated, scattered randomly in space.
Amplitude-Space Experiments
These experiments assigned grains a spatial location according to their amplitude. Grains of a stipulated amplitude always appeared to emanate from a specific location, which might be any point in the stereo field. Other grains, whose amplitude was not stipulated, scattered randomly in space.
Assessment of Grainlet Synthesis
Grainlet synthesis is an experimental technique for realizing linkages among the parameters of microsonic synthesis. It appears to be a good technique for forcing high-level organizations to emerge from microstructure. Specifically, the clouds generated by grainlet synthesis stratify, due to the internal constraints imposed by the parameter linkages. This stratification is seen in textures such as a dense cloud of brief, high-frequency grains punctuated by low and long grains. Other clouds stratify by spatial divisions. Many parameter linkages are easy to discern, conveniently serving as articulators in music composition.
Trainlet Synthesis
A trainlet is an acoustic particle consisting of a brief series or train of impulses. Like other particles, trainlets usually last between 1 and 100 ms. To create time-varying tones and textures, an algorithm is needed that can spawn thousands of trainlets from a few high-level specifications. The main parameters of trainlet synthesis are the density of the trainlets, their attack time, pulse period, harmonic structure, and spectral energy profile. Before explaining the theory of trainlets, let us summarize the basics of impulse generation.
Impulse Generation
An impulse is an almost instantaneous burst of energy followed by an immediate decline in energy. In its ideal form, an impulse is infinitely narrow in the time dimension, creating a single vertical line in its time-domain profile. In practice, however, impulses always last a finite time; this is their pulse width. Electronic impulses in the real world vary greatly, exhibiting all manner of attack shapes, decay shapes, and transition times. These variations only make them more interesting from a musical point of view.
130
Chapter 4
Table 4.1 Technical specifications of the Hewlett-Packard HP8005B pulse generator
Repetition rate
Attack and decay transition times
Overshoot, preshoot, and ringing
Pulse width
Width jitter
Pulse delay
Delay jitter
Period jitter
Figure 4.6 Bandlimited pulses. (a) Sum of eight harmonics. (b) Sum of thirty-two harmonics. Notice the narrowing of the pulse.
One can stipulate the lowest harmonic, the total number of harmonics starting from the lowest, and the chroma of the harmonic series (see below).
Chroma is a spectral brightness factor. Figure 4.7 shows the relationship between chroma and spectra. Chroma determines the relative strengths of the harmonic series. If the lowest harmonic partial has a strength coefficient of A, then the nth partial above the lowest has a coefficient of A × chroma^n, an exponential curve. The chroma may be positive, zero, or negative, and is not restricted to integers. If chroma = 1, the harmonics are of equal strength. If chroma < 1, the higher harmonics are attenuated, as though the signal had been sent through a lowpass filter. As the value of chroma tends toward 0, they attenuate more rapidly. If chroma > 1, the highest harmonic has the greatest amplitude (as though a highpass filter had processed it), while each lower harmonic stepping down from the highest has a progressively lower amplitude. As the chroma value increases, the signal becomes brighter in timbre.
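As an illustration of this formula, the Python sketch below sums a set of harmonics whose amplitudes follow the exponential chroma curve. It is not the trainlet generator itself; the function name and defaults are assumptions (and the harmonics should be kept below the Nyquist frequency).

import numpy as np

def chroma_tone(sr=44100, dur=0.05, f0=100.0, lowest=1, n_harm=32, chroma=0.8):
    # Sum n_harm harmonics starting at `lowest`; the nth harmonic above
    # the lowest has amplitude chroma**n, an exponential curve.
    t = np.arange(int(sr * dur)) / sr
    sig = np.zeros_like(t)
    for n in range(n_harm):
        sig += (chroma ** n) * np.sin(2 * np.pi * (lowest + n) * f0 * t)
    # Normalize, since chroma > 1 makes the raw sum grow explosively.
    return sig / np.max(np.abs(sig))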
Figure 4.7 Relationship between chroma and spectra. We see time-domain waveforms and sonogram spectra (on a logarithmic frequency scale) of trainlets with increasing chroma. All trainlets have a pulse frequency of 100 Hz, with thirty-two harmonics, and the lowest harmonic is always 1. The chroma values are indicated at the bottom of the figure. In the last example, chroma = 20, the algorithm explodes.
Figure 4.8 Time-domain view of scattering of trainlets in two channels. The trainlets
are at various frequencies and have various durations. The amplitude of each trainlet
(the sum of both channels) is constant.
Trainlet clouds are musical units on the sound object time scale. As in granular synthesis, synchronous clouds spawn a regular series of trainlets. A linear
accelerando or rallentando can also be realized in this mode. Asynchronous
clouds spawn trainlets at random intervals according to the stipulated density.
Figure 4.8 provides a time-domain view of a trainlet cloud, while figure 4.9 shows the sonogram of two trainlet clouds, one with long trainlets, the other with short trainlets. Notice their characteristic spectral pattern.
Table 4.2 enumerates the rest of the cloud parameters. The values in parentheses are typical ranges.
Figure 4.9 Sonograms of two trainlet clouds: one with long trainlets, the other with short trainlets.
Table 4.2 Trainlet cloud parameters
1. Cloud start time, in seconds
2. Cloud duration, in seconds
3. Random duration flag. If set, interpret the initial and final trainlet durations as maximum and minimum trainlet durations, respectively. The actual durations are generated randomly between these limits.
4. Trainlet durations at start of cloud (0.001–0.8 sec)
5. Trainlet durations at end of cloud
6. Density of trainlets per second at start of cloud (1–300)
7. Density of trainlets per second at end of cloud
8. Upper frequency bandlimit of the cloud at start (20 Hz–20 kHz)
9. Lower frequency bandlimit of the cloud at start
10. Upper frequency bandlimit of the cloud at end
11. Lower frequency bandlimit of the cloud at end
12. Amplitude at start of cloud (1–96 dB)
13. Amplitude at end of cloud
14. Number of harmonics per trainlet at start of cloud (1–64)
15. Number of harmonics per trainlet at end of cloud
16. Lowest sounding harmonic at start of cloud
17. Lowest sounding harmonic at end of cloud
18. Chroma at start of cloud. If chroma < 1, the effect is lowpass. If chroma = 1, all harmonics are equal in strength. If chroma > 1, the effect is highpass.
19. Chroma at end of cloud
20. Initial waveform, usually sine
21. Final waveform, usually sine
22. Spatial position of trainlets in the stereo field at the start of the cloud, either fixed (0 = left, 0.5 = middle, 1 = right) or random
23. Spatial position of trainlets in the stereo field at the end of the cloud
24. Initial attack time (5–50 ms)
25. Final attack time
Note: Initial and final values refer to the beginning of the cloud and the end of the cloud, respectively.
Pulsar Synthesis
Pulsar synthesis (PS), named after the spinning neutron stars that emit periodic
signals in the range of 0.25 Hz to 642 Hz, is a powerful method of digital sound
synthesis with links to past analog techniques. Coincidentally, this range of
frequencies, between rhythm and tone, is of central importance in pulsar
synthesis.
PS melds established principles within a new paradigm. In its basic form, it generates electronic pulses and pitched tones similar to those produced by analog instruments such as the Ondioline (Jenny 1958; Fourier 1994) and the Hohner Elektronium (1950), which were designed around the principle of filtered pulse trains. Pioneering electronic music composers including Stockhausen (1955, 1957, 1961, 1963) and Koenig (1959, 1962) used filtered impulse generation as a staple in their studio craft. Pulsar synthesis is a digital technique, however, and so it accrues the advantages of precise programmable control, waveform flexibility, graphical interface, and extensibility. In its advanced form, pulsar synthesis generates a world of rhythmically structured crossbred sampled sounds.
This section first presents the basic theory of pulsars and pulsar graphs. We then move on to the more advanced technique of using pulsars to transform sampled sounds through cross-synthesis, presenting musical applications of pulsar synthesis in compositions by the author. At the end of this section, we describe the features of a new interactive program called PulsarGenerator (Roads 2001).
Basic Pulsar Synthesis
Basic pulsar synthesis generates a family of classic electronic music timbres akin to those produced by an impulse generator connected to a bandpass filter. Unlike this classic technique, however, there is no filter in the basic PS circuit.
A single pulsar is a particle of sound. It consists of an arbitrary pulsaret waveform w with a period d followed by a silent time interval s (figure 4.10a). The total duration of a pulsar is p = d + s, where p is the pulsar period, d is the duty cycle, and s is the silence. Repetitions of the pulsar signal form a pulsar train. Let us define the frequency corresponding to the repetition period as fp = 1/p and the frequency corresponding to the duty cycle as fd = 1/d. The typical range of fp is between 1 Hz and 5 kHz; the typical range of fd is from 80 Hz to 10 kHz.
In PS, both fp and fd are continuously variable quantities. They are controlled by separate envelope curves that span a train of pulsars. The train is the unit of musical organization on the time scale of notes and phrases, and can last anywhere from a few hundred milliseconds to a minute or more.
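Under these definitions, a basic pulsar train is straightforward to render. The Python sketch below uses fixed values of fp and fd rather than envelope control, and assumes fd >= fp; the names are ours, not those of any published implementation.

import numpy as np

def pulsar_train(sr=44100, dur=1.0, fp=100.0, fd=500.0):
    # One pulsar period holds a single-cycle sinusoidal pulsaret of
    # duration d = 1/fd, followed by silence s = p - d, where p = 1/fp.
    p_len = int(sr / fp)
    d_len = int(sr / fd)
    t = np.arange(d_len) / sr
    pulsaret = np.sin(2 * np.pi * fd * t) * np.hanning(d_len)
    out = np.zeros(int(sr * dur))
    for start in range(0, len(out) - p_len, p_len):
        out[start:start + d_len] = pulsaret
    return out

# 100 Hz fundamental with a formant peak near 500 Hz.
train = pulsar_train(fp=100.0, fd=500.0)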
Notice in figure 4.10b that the duty ratio or d : s ratio varies while p remains constant. In effect, one can simultaneously manipulate both fundamental frequency (the rate of pulsar emission) and what we could call a formant frequency (corresponding to the duty cycle), each according to separate envelopes. A
Figure 4.10 Anatomy of a pulsar. (a) A pulsar consists of a brief burst of energy called a pulsaret w of a duration d followed by a silent interval s. The waveform of the pulsaret, here shown as a band-limited pulse, is arbitrary. It could also be a sine wave or a period of a sampled sound. The total duration p = d + s, where p is the fundamental period of the pulsar. (b) Evolution of a pulsar train, time-domain view. Over time, the pulsar period p remains constant while the pulsaret period d shrinks. The ellipses indicate a gradual transition period containing many pulsars between the three shown.
Figure 4.11 Typical pulsaret waveforms. In practice, any waveform can be used.
(a) Sine. (b) Multicycle sine. (c) Band-limited pulse. (d) Decaying multicycle sinusoid.
(e) Cosmic pulsar waveform emitted by the neutron star Vela X-1.
Figure 4.12 Typical pulsaret envelopes. (a) Rectangular. (b) Gaussian. (c) Linear decay. (d) Exponential decay. The term b determines the steepness of the exponential curve. (e) Linear attack, with duty cycle d. (f) Exponential attack. The term x determines the steepness of the exponential curve. (g) FOF envelope. (h) Bipolar modulator.
generalization is that v can also be any shape. As we show later, the envelope v strongly affects the spectrum of the pulsar train.
Figure 4.12 shows some typical pulsaret envelopes. A rectangular envelope (figure 4.12a) produces a broad spectrum with strong peaks and nulls for any pulsaret. Figure 4.12g depicts a well-known configuration for formant synthesis, an envelope with a sharp attack followed by an exponential decay. This corresponds to the FOF and Vosim techniques described later in this chapter. Such a configuration can be seen as a special case of pulsar synthesis. As figure 4.12h shows, the envelope can also be a bipolar ring modulator.
Keeping p and w constant and varying d on a continuous basis creates the effect of a resonant filter swept across a tone. There is, of course, no filter in this circuit. Rather, the frequency corresponding to the duty cycle d appears in the spectrum as a formant peak. By sweeping the frequency of this peak over time, we obtain the sonic equivalent of a time-varying bandpass filter applied to a basic impulse train.
Pulsaret-Width Modulation
Pulse-width modulation (PWM) is a well-known analog synthesis effect occurring when the duty cycle of a rectangular pulse varies while the fundamental frequency remains constant (figure 4.13a). This produces an edgy ``sawing'' quality as the upper odd harmonics increase and decrease over the course of the modulation. At the extremes of PWM, the signal is silent. For example, when d = 0, PWM results in a signal of zero amplitude (figure 4.13b). When d = p, PWM produces a signal of a constant amplitude of 1 (figure 4.13c).
Pulsaret-width modulation (PulWM) extends and improves this model. First, the pulsaret waveform can be any arbitrary waveform. Second, it allows the duty cycle frequency to pass through and below the fundamental frequency. Here fd may be less than or equal to fp. Notice in figure 4.13 how the duty cycle of the sinusoid increases from (d) to (e). In (f), p = d. Finally, in (g), p < d. That is, the duty cycle is longer than the fundamental period. Only the first quadrant of the sine wave repeats. The fundamental period cuts off the duty cycle of the pulsaret in mid-waveform. In our implementation, we apply a user-controlled crossfade time around this cutoff point, which we call the edge factor. When there is no crossfade, the edge factor is high.
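A sketch of this behavior in Python: fp stays fixed while fd sweeps downward, so that the duty cycle eventually exceeds the fundamental period. The truncation here is a hard cutoff, with no edge-factor crossfade, and all names and values are illustrative.

import numpy as np

def pulwm_train(sr=44100, dur=2.0, fp=100.0, fd_start=800.0, fd_end=50.0):
    # fp is fixed; fd sweeps linearly, so the duty cycle d = 1/fd grows
    # past the fundamental period p = 1/fp late in the sweep.
    out = np.zeros(int(sr * dur))
    p_len = int(sr / fp)
    for start in range(0, len(out) - p_len, p_len):
        fd = fd_start + (start / len(out)) * (fd_end - fd_start)
        d_len = int(sr / fd)
        seg = min(d_len, p_len)   # truncate the pulsaret at the period
        t = np.arange(seg) / sr
        out[start:start + seg] = np.sin(2 * np.pi * fd * t)
    return out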
We have also tested an alternative approach to pulsaret-width modulation, designed by Alberto de Campo, which produces a different sound. In this method, overlapped pulsaret-width modulation or OPulWM, the fundamental frequency corresponds to the rate of pulsar emission, independent of the pulsaret duty cycle. That is, the duty cycle of an individual pulsar always completes, even when it crosses below the fundamental frequency. Whenever the fundamental period expires, our algorithm spawns a new pulsar. Thus, when d > p, several pulsars overlap with others whose duty cycle has not yet completed. As d increases, the generator spawns more and more overlapping pulsars. For practical reasons, then, we stipulate an arbitrary overlap limit. OPulWM results in a great deal of phase cancellation and so tends to be a more subtle effect than regular PulWM.
Synthesis across Time Scales
PS operates within and between musical time scales. It generates a stream of
microsonic particles at a variable rate across the continuum spanning infrasonic pulsations and audio frequencies. When the distance between successive
Figure 4.13 Pulsaret-width modulation. (a) Classical PWM with a rectangular pulse shape. The ellipses indicate a gradual transition between the pulses. (b) PWM when the duty cycle d = 0 results in a signal of zero amplitude. (c) PWM when the duty cycle d = p (the fundamental period) results in a signal with a constant amplitude of 1. (d) Pulsar train with a sinusoidal pulsaret. (e) Same period as (d), but the duty cycle is increasing. (f) The duty cycle and the period are equal, resulting in a sinusoid. (g) The duty cycle is greater than the fundamental period, which cuts off the final part of the sine waveform.
impulses is less than about one twentieth of a second, the human hearing mechanism causes them to fuse into a continuous tone. This is the forward masking effect (Buser and Imbert 1992). As Helmholtz (1885) observed, in the range between 20 and 35 Hz, it is difficult to distinguish the precise pitch of a sustained tone; reliable pitch perception takes hold at about 40 Hz, depending on the waveform. So listeners hear pitch in a periodic sustained tone for p between approximately 25 ms (corresponding to fp = 40 Hz) and 0.2 ms (corresponding to fp = 5 kHz).
As the rate of pulsar emission slows down and crosses through the threshold of the infrasonic frequencies (fp < 20 Hz), the sensation of continuous tone evaporates, and we can perceive each pulsar separately. When the fundamental period falls between 62.5 ms (corresponding to the time span of a thirty-second note at quarter note = 60 MM) and 8 sec (corresponding to the time span of two tied whole notes at quarter note = 60 MM), we hear rhythm. The fundamental frequency envelope becomes a graph of rhythm. This takes the form of a function of time that a user draws on the screen of a computer. Such a pulsar graph can serve as an alternative form of notation for one dimension of rhythmic structure, namely the onset time of events. The correspondence between the musical units of rhythmic structure (note values, tuplets, rests, etc.) and emission frequencies can be made clear by plotting note values on the vertical or frequency scale. For example, assuming a tempo of 60 MM, a frequency of 5 Hz corresponds to a quintuplet figure. Note that the two-dimensional pulsar graph does not indicate the duration of the events. This could be represented by adding a third dimension to the plot.
To interpret the rhythm generated by a function inscribed on a pulse graph, one has to calculate the duration of the grain emission curve at a given fixed frequency rate. For example, a grain emission at 4 Hz lasting 0.75 seconds emits three grains. When grain emission switches from one value to the next, the pulsar corresponding to the new duration plays immediately, followed by a silence equal to the period of grain emission. Figure 4.14 plots a rhythm that alternates between fixed-rate pulses, accelerandi, and silence.
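Such a graph can also be interpreted programmatically. The following Python sketch (our own illustration) converts a piecewise-constant emission-rate curve into pulsar onset times, with each segment beginning with an immediate pulsar, as described above.

def pulsar_onsets(segments):
    # segments: list of (rate_hz, duration_sec); rate 0 denotes a rest.
    onsets, t0 = [], 0.0
    for rate_hz, dur_s in segments:
        if rate_hz > 0:
            period = 1.0 / rate_hz
            count = int(dur_s * rate_hz)   # 4 Hz for 0.75 s -> 3 pulsars
            onsets += [t0 + i * period for i in range(count)]
        t0 += dur_s
    return onsets

# Fixed-rate pulses, a rest, then a two-step accelerando.
print(pulsar_onsets([(4, 0.75), (0, 0.5), (5, 0.4), (8, 0.25)]))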
Spectra of Basic Pulsar Synthesis
Many time-varying parameters interact to produce the pulsar timbre, including
the pulsaret, the pulsaret envelope, the fundamental frequency, multiple formant frequencies, and the burst masking ratio. The spectrum of a single pulsar
stream is the convolution product of w and v, biased in frequency by fd and fp .
Figure 4.14 Pulsar rhythms. (Top) Pulse graph of rhythm showing rate of pulsar emission (vertical scale) plotted against time (horizontal scale). The left-hand scale measures traditional note values, while the right-hand scale measures frequencies. (Bottom) Time-domain image of generated pulsar train corresponding to the plot above.
Figure 4.15 Eect of the pulsaret envelope on the spectrum. The top row presents frequency-versus-time sonograms of an individual pulsar with a sinusoidal pulsaret, a fundamental frequency of 12 Hz, and a formant frequency of 500 Hz. The sonograms use
1024-point fast Fourier transform plots with a Von Hann window. They are plotted on a
linear frequency scale. From left to right, we see the sonogram produced by a rectangular envelope, an expodec envelope, and a Gaussian envelope. The lower row plots
the spectra of these pulsars on a dB scale.
Since w and v can be arbitrary waveforms, and fd and fp can vary continuously, the range of spectra produced by PS is quite large.
When the formant frequency is set at a specific frequency, energy spreads in that region of the spectrum. Precisely how the energy spreads depends on w and v. The pulsaret waveform w can be considered a template of spectrum shape that repeats at the stipulated fundamental frequency fp and is time-scaled by the duty cycle or formant frequency fd. If, for example, the ratio of the amplitudes of the first five harmonics of w is 5 : 4 : 3 : 2 : 1, this ratio prevails, independent of the specific values of p and d, when fp ≤ fd.
The pulsaret envelope's contribution to the spectrum is significant. Figure 4.15 shows the spectra of individual pulsars where the waveform w is a sinusoid, and the pulsaret envelope v varies among three basic shapes. In the case of figure 4.15a, v is rectangular. Consequently, the formant spectrum takes the form of a broad sinc (sin(x)/x) function in the frequency domain. The spectrum shows strong peaks at factors of 1.5 fd, 2.5 fd, etc., and nulls at harmonics of fd. This is characteristic of the sinc function. An exponential decay or expodec envelope (such as in figure 4.12d) tends to smooth the peaks and valleys in
the spectrum (figure 4.15b). The bell-shaped Gaussian envelope compresses the spectral energy, centering it around the formant frequency (figure 4.15c).
Thus by modifying the pulsaret envelope, one can alter the profile of the pulsar spectrum. The appendix presents a mathematical analysis of the spectra of simple pulsaret envelopes.
Advanced Pulsar Synthesis
Advanced pulsar synthesis builds upon basic pulsar synthesis by adding several
features that take it beyond the realm of vintage electronic sonorities. Three
methods are of particular importance:
1. Multiple pulsar generators sharing a common fundamental frequency but
with individual formant and spatial trajectories
2. Pulse-masking to shape the rhythm of the pulsar train
3. Convolution of pulsar trains with sampled sounds
Figure 4.16 outlines the schema of advanced pulsar synthesis. The following sections explain the different parts of this schema.
Multiple Pulsar Generators
A pulsar generator has seven parameters:
- Pulsar train duration
- Pulsar train fundamental frequency envelope fp
- Pulsaret formant frequency envelope fd
- Pulsaret waveform w
- Pulsaret envelope v
- Pulsar train amplitude envelope a
- Pulsar train spatial path s
The individual pulsar train is the simplest case. To synthesize a complex sound with several resonance peaks, we can add several pulsar trains with the same fundamental frequency but with different time-varying formant frequencies fd. One envelope controls their common fundamental frequency, while two or more separate envelopes control their formant trajectories fd1, fd2, etc.
A unique feature of pulsar synthesis is that each formant can follow its own
spatial path. This leads to complex spatial interplay within a single tone or
rhythmic phrase.
Pulsar Masking, Subharmonics, and Long Tonepulses
A pulsar generator emits a metronomic sequence of pulsars, where the rate of emission can vary over time according to the fundamental frequency envelope function fp. Pulsar masking breaks up the stream by introducing intermittencies (regular or irregular) into the metronomic sequence. It deletes individual pulsarets, leaving an interval of silence in their place. This takes three forms: burst, channel, and stochastic masking.
Burst masking (figure 4.17a) models the burst generators of the classic electronic music studios. It produces a regular pattern of pulsarets that are interrupted at regular intervals. The on/off pattern can be stipulated as the burst ratio b : r, where b is the burst length in pulsaret periods and r is the rest length in pulsaret periods. For example, a b : r ratio of 4 : 2 produces an alternating sequence of four pulsarets and two silent periods: 111100111100111100111100, etc. If the fundamental frequency is infrasonic, the effect is rhythmic.
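In code, the burst pattern reduces to a modulo test. This Python fragment (illustrative only) reproduces the 4 : 2 example:

def burst_mask(b, r, n):
    # 1 = sounding pulsaret, 0 = masked (silent) period.
    return [1 if (i % (b + r)) < b else 0 for i in range(n)]

# burst_mask(4, 2, 12) -> [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]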
When the fundamental is in the audio frequency range, burst masking imposes an amplitude modulation effect on the timbre (figure 4.18), dividing the fundamental frequency by a subharmonic factor b + r. With the PulsarGenerator program (described later), we can alter the burst ratio in real time, producing a gamut of subharmonic permutations.
When b + r is large, the subharmonic crosses through the threshold separating tone and rhythm. The result is a series of alternating long tonepulses (at the fundamental pitch) and silent intervals.
Channel masking (figure 4.17b) selectively masks pulsars in two channels, creating a dialog within a phrase by articulating each channel in turn. Figure 4.17b shows two channels only, but we can generalize this scheme to N channels.
Figure 4.16 Schema of pulsar synthesis. A pulsar generator with separate envelope
controls for fundamental frequency, formant frequency, amplitude, stochastic masking,
and spatial position. In advanced pulsar synthesis, several generators may be linked with
separate formant and spatial envelopes. A pulsar stream may be convolved with a sampled sound.
Figure 4.17 Pulsar masking turns a regular train into an irregular train. Pulsars are
illustrated as quarter notes, and masked pulsars are indicated as quarter rests. (a) Burst
masking. The burst ratio here is 3 : 3. (b) Channel masking. (c) Stochastic masking
according to a probability table. When the probability is 1, there is no masking. When
the probability is 0, there are no pulsars. In the middle, the pulsar train is intermittent.
Notice the thinning out of the texture as the probability curve dips in the center.
Figure 4.18 Sonogram depicting the effect of burst masking in the audio frequency range. The pulsaret is one cycle of a sinusoid, and the pulsaret envelope is rectangular. The b : r ratio is 2 : 1. The fundamental frequency is 100 Hz and the formant frequency is 400 Hz. Notice the subharmonics at 133 Hz and 266 Hz caused by the extended periodicity of the pulse masking interval (400 Hz/3).
Figure 4.19 Effect of convolution with a pulsar train. (a) Infrasonic pulsar train with a variable fundamental and formant frequency. (b) Sampled sound, the Italian word qui (pronounced ``kwee''). (c) Convolution of (a) and (b).
is a collection of percussion samples. If one seeks a smoother and more continuous texture, the constraints can be relaxed. Samples with long durations superimpose multiple copies of the sampled object, creating a rippling sound stream. Samples with slow attacks blur the onset of each sample copy, smearing the stream into a continuum. Thus by controlling the attack shape of the sample, one can affect the sonic texture.
Implementations of Pulsar Synthesis
My original implementation of PS dates to 1991, using James McCartney's Synth-O-Matic, a programmable sound synthesis environment for Apple Macintosh computers (McCartney 1990, 1994). In 1996, Mr. McCartney replaced Synth-O-Matic with SuperCollider 1, an object-oriented programming language with a Power Macintosh runtime system (McCartney 1996). Using SuperCollider 1, Stephen T. Pope and I created a new implementation of basic PS in 1997.
With the improved SuperCollider 2 (McCartney 1998), Alberto de Campo and I developed a new realization of pulsar synthesis, presented in a 1999 summer course at the Center for New Music and Audio Technologies, University of California, Berkeley. Further refinement of this prototype has led to the PulsarGenerator application, distributed by CREATE. Figure 4.20 shows the graphical interface of PulsarGenerator, version 1. Notice the control envelopes for the synthesis variables. Users can design these envelopes in advance of synthesis, or manipulate them in real time as the instrument plays. We have implemented a scheme for saving and loading these envelopes in groups called settings. The program lets one crossfade at a variable rate between settings, which takes performance with PulsarGenerator to another level of musical complexity.
In wave-oriented synthesis techniques, an algorithm loops through a wavetable and varies the signal according to relatively slowly updated control functions. Thus the efficiency of synthesis corresponds to the number of simultaneous unit generators (oscillators, filters, etc.). In contrast, particle synthesis is more demanding, since the synthesis algorithm must also handle the task of scheduling possibly thousands of events per second, each of which may be unique. The efficiency of pulsar synthesis is therefore related to the rate of particle emission. At infrasonic rates (< 20 pulsars per second), the PulsarGenerator application uses less than 3.6% of the processor on a single-processor Apple G4 running at a 500 MHz clock speed. At high audio rates (such as a three-formant instrument emitting six thousand pulsars per second, corresponding to a fundamental frequency of 2 kHz), the application requires approximately 45% of the processor. It is a testimony to SuperCollider 2 that the entire implementation, including the graphical interface, needed fewer than fifteen hundred lines of code and comments. Our code builds the interface, defines the synthesis algorithm, schedules the pulsars, and handles file input and output. McCartney's SCPlay, an efficient real-time sound engine, calculates the samples.
Composing with Pulsars
To interact with PulsarGenerator in real time is to experiment directly with
sonic ideas. While experimenting, a composer can save settings and plan how
these will be used within a composition. The PulsarGenerator program can also
record the sounds produced in a real-time session. The composer can then edit
the session or convolve and mix it with other material.
A final stage of pulsar composition is to merge multiple trains to form a composite texture. This is a question of montage, and is best handled by editing and mixing software designed for this purpose. Each layer of the texture may have its own rhythmic pattern, formant frequency envelope, choice of convolved objects, and spatial path. Working on a variety of time scales, a composer can apply signal processing transformations such as mixing with other sounds, filtering, modulations, and reverberation to individual pulsars, pulsar trains, and pulsar textures.
Musical Applications of Pulsar Synthesis
I developed pulsar synthesis while realizing Clang-Tint (Roads 1993b), an electronic music composition commissioned by the Japanese Ministry of Culture (Bunka-cho) and the Kunitachi College of Music, Tokyo. The second movement of this work, entitled Organic, focuses on expressive phrasing. It combines bursts of insect noise and animal and bird calls with electronic pulse-tones. The electronic sound palette utilizes pulsar synthesis in many forms: pulsating blips, elongated formant tones, and clouds of asynchronous pulsars. For the latter, I first generated multiple infrasonic pulsar trains, each one beating at a different frequency in the range of 6 to 18 Hz. I then mixed these together to obtain the asynchronous pulsar cloud.
The raw material of my electronic music composition Half-life, composed in 1998 and 1999, is a one-minute pulsar train that is wildly varied. Most of the sounds in the rest of the work are derived from this source. Half-life extends the pulsar material through processes of granulation, microfiltration, granular pitch-shifting, recirculating feedback echo, individual pulsar amplitude shaping, and selective reverberation. Tenth vortex (2000) and Eleventh vortex (2001) continue in this direction.
Assessment of Pulsar Synthesis
Music unfolds on multiple time scales, from high-level macrostructure down to a myriad of individual sound objects or notes. Below this level is another hierarchy of time scales. Here are the microsonic particles such as the classical rectangular impulses, grains, wavelets, and pulsars (Roads 1999). Musicians proved the effectiveness of analog impulse generation decades ago. In com-
parison, digital pulsar synthesis offers a flexible choice of waveforms and envelopes, increased precision, and graphical programmable control.
Unlike wave-oriented synthesis techniques, the notion of rhythm is built into techniques based on particles. Rhythm, pitch, and timbre are all interrelated but can be separately controlled. Pulsar synthesis offers a seamless link between the time scales of individual particle rhythms, periodic pitches, and the meso or phrase level of composition. A novel feature of this technique is the generation of multiple independent formant trajectories, each following its own spatial path.
As we have shown, basic pulsar technique can create a broad family of musical structures: singular impulses, rhythmic sequences, continuous tones, time-varying phrases, and beating textures. Pulsar microevents produce rhythmic sequences or, when the density of events is sufficiently high, sustained tones, allowing composition to pass directly from microstructure to mesostructure.
tablet, mounted vertically like a painter's easel. The first composition realized with the UPIC was Xenakis's Mycenae-Alpha (1980). A major breakthrough for the system was the development of a real-time version, based on a 64-oscillator synthesis engine (Raczinski and Marino 1988). By 1991, engineers had coupled this engine to a personal computer running the Windows operating system, permitting a sophisticated graphical interface (Marino, Raczinski, and Serra 1990; Raczinski, Marino, and Serra 1991). The program now runs standalone, with no additional hardware.
At the level of sound microstructure, waveforms and event envelopes can be drawn directly onto the tablet and displayed onscreen. At a higher level of organization, composers can draw the sonographical frequency/time structure of a score page. Lines, called arcs, appear on the display screen as one draws with the mouse. Individual arcs can then be moved, stretched or shrunk, cut, copied, or pasted. The arcs on the page can also represent sampled sounds.
In the 1991 version of the UPIC, a page can have sixty-four simultaneous arcs, out of as many as four thousand arcs per page. Most importantly, the duration of each page can last from 6 ms to more than two hours. This temporal flexibility lets the user zoom in to the micro time scale. When a page lasts only a second, say, any arcs written onto it will be microsounds. These micro-arcs can also be cut, copied, and pasted, as well as stretched or compressed in time and frequency. Moreover, the rate at which, and direction in which, the score is read can be controlled in real time with a mouse. This allows discontinuous jumps from one region of the score to another, for example. The sequence of control motions made while playing a score can be recorded, so that the same performance can later be replayed or edited.
The UPIC system is an especially pliable musical tool since it integrates
many levels of composition within a common user interface. Graphic functions
created onscreen can function equally as envelopes, waveforms, pitch-time scores,
tempo curves, or performance trajectories. This uniform treatment of composition data at every level should be extended to more computer music systems.
Synthesis of Microsound in Phonogramme
Vincent Lesbros developed Phonogramme in 1993 at the Université de Paris VIII (Lesbros 1995, 1996). Phonogramme offers an approach to graphical synthesis with some similarities to the UPIC, but with a number of extensions. First, the program can generate sound directly from a MacOS computer, or it can generate MIDI data to be sent to a bank of synthesizers. Second, the pro-
Figure 4.21 Phonogramme scores. Both scores are just over 4.6 seconds in length.
(a) Fast horizontal gestures leave behind a stream of micro-arcs. Notice the four harmonics superimposed over the original low-frequency gestures by the harmonic pencil.
(b) Slow hand movements create continuous tones.
Figure 4.22 Sonographical synthesis of particles in MetaSynth. (a) The particles were
drawn by hand with a spray brush tool. This particle score can be played back with
many dierent waveforms, including sampled sounds. (b) Time-domain view of the
waveform.
each line and its mapping into sound can obtain precise results. It is just as worthwhile to treat the medium as a sketch pad, where initial drawings are later refined into a finished design.
Unlike traditional notation, which requires serious study for a long period of
time, a child can learn the relationship between drawn gestures and sound in
minutes. This initial simplicity hides a deeper complexity, however. As with any
technique, the best results demand a long period of study and experimentation.
As mentioned earlier, a single arc is not a description of a complete sound
object. An arc is only one component of a complex time-varying sound. Such
sounds require many arcs. We can see this complex nature in the sonograms of
relatively simple instrumental tones, such as a cello. Noisier timbres, such as a cymbal or gong, display enormous complexity. They seem to be composed of globules of energy that sometimes connect and at other times break apart.
As such representations become more commonplace, a long process of codification, from complex sonographic patterns into abstract iconic notation, seems inevitable. The Acousmographe system, developed at the Groupe de Recherches Musicales (Paris), points in this direction (Desantos 1997).
human speech, whereas WF was developed to emulate the formants of traditional musical instruments. For more detailed descriptions, see the references
and Roads (1996).
FOF Synthesis
Formant wave-function synthesis (fonction d'onde formantique, or FOF) generates a stream of grains, each separated by a quantum of time corresponding to the period of the fundamental frequency. So a single note produced by this technique contains hundreds of FOF grains. The definitive FOF grain is a sine wave with either a steep or smooth attack and a quasi-exponential decay.
local envelope of a FOF grain is dened as follows. For the attack portion of
the FOF grain, 0 a t a tex, the envelope is:
envt 1=2 1 cospt =tex expattent
For the decay portion, t b tex, the envelope is:
envt expattent
p is the initial phase of the FOF signal, tex is the attack time of the local envelope, and atten is the decay time (D'Allessandro and Rodet 1989). The eect
is that of a damped sinusoidal burst, each FOF grain lasting just a few milliseconds. The convolution of the brief FOF envelope with the sinusoid contributes audible sidebands around the sine wave, creating a formant spectrum. The
spectrum of the damped sine generator is equivalent to the frequency response
curve of one of the bandpass lters and the result of summing several FOF
generators is a spectrum with several formant peaks.
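A minimal Python sketch of a single FOF grain built from these two envelope expressions (parameter names and default values are assumptions; the initial phase is omitted for brevity):

import numpy as np

def fof_grain(sr=44100, dur=0.02, fc=500.0, tex=0.003, atten=400.0):
    # Rising-cosine attack of length tex, exponential decay at rate atten,
    # applied to a sinusoid at the formant center frequency fc.
    t = np.arange(int(sr * dur)) / sr
    env = np.exp(-atten * t)
    attack = t <= tex
    env[attack] *= 0.5 * (1.0 - np.cos(np.pi * t[attack] / tex))
    return env * np.sin(2 * np.pi * fc * t)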
Each FOF generator is controlled by many parameters. Among these are the formant parameters p1 through p4:

p1 is the center frequency of the formant.
p2 is the formant bandwidth, defined as the width between the points that are 6 dB below the peak of the formant.
p3 is the peak amplitude of the formant.
p4 is the width of the formant skirt. The formant skirt is the lower part of the formant peak, about 40 dB below the peak, akin to the foothills of a mountain. The skirt parameter is independent of the formant bandwidth, which specifies the breadth at the peak of the mountain.
The inherent connection between time-domain and frequency-domain operations is exemplified in the way FOF parameters are specified. Two of the main formant parameters are specified in the time domain as properties of the envelope of the FOF grain. First, the duration of the FOF attack controls parameter p4, the width of the formant skirt; that is, as the duration of the attack lengthens, the skirtwidth narrows. Second, the duration of the FOF decay determines p2, the formant bandwidth. Hence a long decay translates into a sharp resonance peak, while a short decay widens the bandwidth of the signal.
The basic sound production model embedded in FOF synthesis is the voice. However, users can tune many parameters to move beyond vocal synthesis toward synthetic effects and emulations of instruments (Bennett and Rodet 1989).
Typical applications of FOF synthesis configure several FOF generators in parallel. Some implementations are very complicated, with over 60 parameters to be specified for each sound event. The CHANT program, developed in the 1980s, was proposed as a response to this complexity, providing a collection of rules for controlling multiple FOF streams in parallel (Rodet, Potard, and Barriere 1984).
Vosim
Like FOF synthesis, the Vosim technique generates a series of short-duration particles in order to produce a formant effect. Vosim synthesis was developed by Werner Kaegi and Stan Tempelaars at the Institute of Sonology in Utrecht during the early 1970s (Kaegi 1973, 1974; Tempelaars 1976, 1977, 1996). Vosim generates a series of tonebursts, producing a strong formant component. Like FOF, it was originally designed for vowel sounds, and later extended to model vocal fricatives, consonants such as [sh], and quasi-instrumental tones (Kaegi and Tempelaars 1978).
The Vosim waveform approximates the signal generated by the human voice in the form of a series of pulsetrains, where each pulse is the square of a sine function. The parameter A sets the amplitude of the highest pulse. Each of the pulsetrains contains N sin² pulses in series, decreasing in amplitude by a decay factor b. The width (duration) of each pulse T determines the position of the formant spectrum. A variable-length delay M follows each pulsetrain, which contributes to the pulsetrain's overall period, and thus determines the fundamental frequency. The period is N × T + M, so for seven pulses of 200 μsec duration and a delay equal to 1600 μsec, the total period is 3 ms and the fundamental frequency is 333.33 Hz. The formant centers at 5000 Hz.

Table 4.3 VOSIM parameters

T: Pulsewidth
dT: Increment or decrement of T
M: Delay following a series of pulses
dM: Increment or decrement of M
D: Maximum deviation of M
A: Amplitude of the first pulse
dA: Increment or decrement of A
b: Attenuation constant for the series of pulses
N: Number of pulses per period
S: Type of modulation (sine or random)
NM: Modulation rate
NP: Number of periods
Two strong percepts emerge from the typical Vosim signal: a fundamental corresponding to the repetition frequency of the entire signal, and a formant peak in the spectrum corresponding to the pulsewidth of the sin² pulses. A Vosim oscillator produces one formant. In order to create a sound with several formants, it is necessary to mix the outputs of several Vosim oscillators.
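The following Python sketch renders one Vosim period according to this description, omitting the modulation and increment parameters; the function name and defaults are assumptions.

import numpy as np

def vosim_period(sr=44100, T=0.0002, M=0.0016, N=7, A=1.0, b=0.8):
    # N sin^2 pulses of width T, each attenuated by factor b, followed by
    # a delay M. Fundamental = 1/(N*T + M); formant near 1/T.
    t = np.arange(int(sr * T)) / sr
    pulse = np.sin(np.pi * t / T) ** 2
    parts = [A * (b ** k) * pulse for k in range(N)]
    parts.append(np.zeros(int(sr * M)))
    return np.concatenate(parts)

# Repeating the period yields a 333.33 Hz tone with a formant near 5 kHz.
signal = np.tile(vosim_period(), 100)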
Table 4.3 lists the set of parameters that control the Vosim oscillator. T, M, N, A, and b are the primary parameters. By modulating the delay period M, one can produce vibrato, frequency modulation, and noise sounds. Kaegi and Tempelaars introduced three additional variables: S, D, and NM, corresponding respectively to the type of modulation (sine or random), the maximum frequency deviation, and the modulation rate. They also wanted to be able to provide for ``transitional'' sounds, which led to the introduction of the variables NP, dT, dM, and dA. The latter three are the positive or negative increments of T, M, and A, respectively, applied over the number of periods NP.
By changing the value of the pulsewidth T, the formant changes in time. This is formant shifting, a different effect from the progressive spectral enrichment which occurs in, for example, frequency modulation synthesis. The Vosim signal is not bandlimited, but spectral components are greater than 60 dB down at six times the fundamental frequency (Tempelaars 1976).
WF synthesis requires an amplitude compensation scheme, because low frequencies contain few pulses and much zero-amplitude deadtime or silence, while high frequencies contain many pulses and almost no deadtime. A quasilinear scaling function adjusts the amplitude as an inverse function of frequency. That is, low tones are emphasized and high tones are attenuated for equal balance throughout the frequency range.
Assessment of Particle-Based Formant Synthesis Techniques
Particle-based formant synthesis models a class of natural mechanisms which resonate when excited, and are quickly damped by physical forces. A typical example would be a stroke on a woodblock, a sound which cuts off almost immediately. The result is a grain-like ``pop.'' Another example is the glottal pulse, which the vocal tract filters. Continuous tones string together a series of such particles.
FOF synthesis has been available within the widely distributed Csound language for some time (Boulanger 2000). The Common Lisp Music language (Schottstaedt 2000) includes a wavetrain object able to realize both FOF and Vosim synthesis. The Window Function synthesis technique was experimental, and has not been pursued since its original realization.
Vocal-like tones can be simulated by mimicking the fast impulses that continuously excite resonance in the vocal tract. Realistic simulation with a particle technique, however, requires an enormous investment of time. In the 1980s, the FOF technique was used to synthesize the ``Queen of the Night'' aria from Mozart's Magic Flute. The realization of this 30-second fragment took months of effort.
In the simulation of vocal and instrumental tone, the particle representation should be invisible. If we divorce these techniques from their original uses, we can see that particle-based formant synthesis remains a rich resource for synthetic timbres, both at the infrasonic frequency level, where it produces a wide variety of rhythmic pops, and at the audio frequency level, where it generates expressive resonant tones.
Figure 4.23 Transient wave writing in a sound editor. Notice the hand-drawn transients
interspersed with narrow computer-generated impulses.
Transient drawing uses a sound editor which provides a waveform pencil that can be used to manually inscribe the waveform of a transient. These transients can be interspersed with waveforms produced by other means (figure 4.23).
Transformational transient drawing also uses a sound editor, after beginning with an existing low-level audio signal, such as background noise or the tail of a reverberant envelope. A brief extract of this nonstationary signal is selected and rescaled to a much higher amplitude, creating a transient particle (figure 4.24). This particle can be reshaped using the implements of the editor, such as narrowband filtering, envelope reshaping, phase inversion, and spatialisation. The following frequency bands are especially important in filtering the transient particles.
1. Direct current (DC) cut: removal of frequencies below 20 Hz
2. Deep bass cut or boost: 80 Hz
3. Mid-bass boominess cut: 200 Hz
4. Low-mid boost: 500–700 Hz resonances
5. Mid harshness cut: 1–2 kHz
Figure 4.25 Particle cloning synthesis. (a) Solo particle, lasting 35 ms, extracted from
an acoustic sample of a drum sound. (b) A 200 ms sound object formed by cloning (a) 50
times, pitch-shifting it up two octaves, creating another channel slightly delayed, and
applying an exponentially decaying amplitude curve.
3. Clone the particle and repeat it over a specified duration to form a tone pip. The number of particles cloned corresponds to the total duration divided by the duration of the particle. The resulting pitch depends on the period of the particle.
4. Shape the amplitude envelope of the resulting tone pip.
5. Pitch-shift the tone pip to the desired pitch, with or without time correction.
6. Apply one or more bandpass filters. The important bands to control are the low (50–150 Hz), low-mid (150–300 Hz, narrowband), mid (500–900 Hz), mid-high (3–4 kHz, wide bandwidth), and high (9–12 kHz, wide bandwidth) ranges.
In stage 3, we used a Replicate function in a sound editor to automate the process. Replicate fills an arbitrary duration with the contents of the clipboard. Thus it is possible to copy a particle, select a 3-second region, and fill the region with a sequence of cloned particles. Obviously, the frequency of the tone pip depends on the period of the particle, so a 10-ms particle produces a tone pip at a pitch of 100 Hz. By selecting some of the silent samples around a particle, one can shift the fundamental frequency downward. The resulting tone can then be transposed to any desired pitch. If the period of the particle is greater than about 50 ms, the texture is no longer continuous, but flutters.
Stages 4, 5, and 6 foster heterogeneity among tones cloned from the same particle. Each tone can have a unique duration, amplitude envelope, pitch, and spectrum weighting.
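Stages 3 and 4 reduce to a copy loop and an envelope. The following C sketch is a minimal illustration of the idea, not the sound editor procedure described above; the function name is hypothetical.

    #include <math.h>
    #include <stddef.h>

    /* Clone a particle of p_len samples to fill total_len samples (a tone
       pip), imposing an exponential decay. The pitch of the result is
       sample_rate / p_len; at 44,100 Hz, a 441-sample (10 ms) particle
       yields a 100 Hz tone pip. */
    void clone_particle(const double *p, size_t p_len, double *out,
                        size_t total_len, double decay_s, double sample_rate)
    {
        for (size_t i = 0; i < total_len; i++) {
            double env = exp(-(double)i / (decay_s * sample_rate));
            out[i] = env * p[i % p_len];   /* cycle through the particle */
        }
    }

Including some of the silent samples around the particle in p lengthens the period and lowers the resulting pitch, as noted above.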
Assessment of Particle Cloning Synthesis
The construction of tones by particle cloning elevates the time scale of a particle from the micro level to the sound object level, fostering the emergence of singularities on the sound object time scale. As we have implemented it, it is a manual technique, carried out with a sound editor. I developed the particle cloning method in the course of composing Half-life (1998–99), in which the initial particles derive from pulsar synthesis or transient drawing. These particles stand out prominently in part 1 of the piece, in the melodic section at 2:04–2:11.
Figure 4.26 Perry Cook's model of a maraca, coded using the Synthesis Toolkit in the C language. Notice the declaration at the top indicating the number of beans in the shaker. A statement in a score file (not shown) triggers the maraca model.
Only more recently has attention turned to synthesis using physical models of the particulated sounds of certain percussion instruments and environmental microsounds. Perry Cook's Physically Informed Stochastic Event Modeling (PhISEM) exemplifies this approach. PhISEM is a suite of programs that simulates the sounds of shaken and scraped percussion such as maracas (figure 4.26), sekere, cabasa, bamboo windchime, tambourine, sleighbells, and guiro (Cook 1996, 1997). Cook also developed a model to simulate the sound of water drops based on the same principles, and suggested that this technique could also synthesize the sound of feet crunching on gravel, or of ice cubes in a shaken glass. (See also Keller and Truax (1998) and the discussion of physical models for granular synthesis in chapter 3.)
The common thread among instruments modeled by PhISEM is that sound
results from discrete microevents. At the core of PhISEM are particle models.
Basic Newtonian equations governing the motion and collision of point masses produce the sounds. For shaken percussion instruments, the algorithm assumes multiple individual sound sources; a maraca, for example, contains many beans. It calculates the probability of bean collisions, which is very high just after a shake and decays rapidly. When a bean collision occurs, it is simulated by a burst of exponentially decaying noise. All collision noises pass through a sharply tuned bandpass filter, which simulates the resonance of the gourd.
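The core of this algorithm can be compressed into a few lines. The following C sketch is a simplified paraphrase of the PhISEM idea, not Cook's Synthesis Toolkit code: a shake raises a decaying collision probability; each collision adds to an exponentially decaying noise burst; and a two-pole resonator stands in for the gourd. All numeric constants are illustrative guesses.

    #include <stdlib.h>
    #include <math.h>

    #define PI 3.14159265358979

    static double energy = 0.0;        /* shake energy: drives collisions */
    static double noise_env = 0.0;     /* envelope of the noise bursts    */
    static double s1 = 0.0, s2 = 0.0;  /* resonator state                 */

    void shake(void) { energy = 1.0; } /* call once per shake gesture */

    /* Compute one output sample of the shaker model. */
    double shaker_tick(double sample_rate, int n_beans,
                       double res_freq, double res_pole)
    {
        energy *= 0.9995;                             /* system decay          */
        double p = energy * n_beans / 1024.0;         /* collision probability */
        if ((double)rand() / RAND_MAX < p)
            noise_env += 0.02;                        /* a bean collided       */
        noise_env *= 0.995;                           /* sound decay           */
        double excite = noise_env * (2.0 * rand() / RAND_MAX - 1.0);

        /* sharply tuned two-pole bandpass: the gourd resonance */
        double a1 = -2.0 * res_pole * cos(2.0 * PI * res_freq / sample_rate);
        double a2 = res_pole * res_pole;
        double y  = excite - a1 * s1 - a2 * s2;
        s2 = s1; s1 = y;
        return y * (1.0 - res_pole);                  /* rough gain scaling */
    }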
Assessment of Physical Models of Particles
Many percussion instruments create their distinctive timbre through the accumulation of microsounds. In the case of shaken instruments, this can be
modeled as a stochastic process. Physical models of such instruments produce
granular sounds, either sparse or dense.
My experiences with this technique were based on the MacOS synthesis program Syd (Bumgardner 1997), which realized Cook's maracas model. It allowed control of the resonance frequency, the resonance pole, the probability of bean collisions, the system decay, and the sound decay. Under certain conditions, these sounds can evoke acoustic instruments recorded in an anechoic environment. Extending the range of the parameter settings creates sounds that are not particularly realistic, but still interesting.
The physical modeling of particles could go much further. As chapter 3 points out, a vast scientific literature devoted to models of granular processes has yet to be harnessed for sound synthesis.
Table 4.4 Summary of the sound particles studied in this book: grains, glissons, pulsars, trainlets, wavelets, grainlets, micro-arcs, FOF grains, Vosim grains, window function pulses, transient drawing, and particle cloning. For each particle type, the table lists its envelope type (Gaussian, sinusoidal, Blackman-Harris, hand-drawn, or arbitrary), its waveform (sine, sin-squared pulses, impulses, Blackman-Harris pulses, or arbitrary waveforms, including sampled), and its characteristics.
Summary
All forms of music composition, from the freely improvised to the formally organized, are constrained by their sound materials. The urge to expand the field of sound comes from a desire to enrich compositional possibilities, and much can be gained from the harvest of synthetic waveforms produced by particle synthesis. In chapter 3 and in this chapter, we have looked at a variety of sound particles. Chapter 6 describes additional particles derived from windowed spectrum analysis. Table 4.4 summarizes the variety of sound particles studied in this book.
Artists are frequently recalled to the belief that the splendors of nature surpass anything that human beings can create. The natural world, however, did not inhibit the first painters; on the contrary, it inspired them. Similarly, for composers, the omnipresence of natural sound does nothing to quash the need to create a virtual sound world. With sound particles, we cultivate a new strain of culture within the natural order.
Transformation of Microsound
Micromontage
Certain transformations clearly articulate the granular texture of sound. Micromontage extracts particles from sound files and rearranges them.
Figure 5.1 Micromontage shown as a display of one hundred thirty-six sound files organized in a graphical mixing program. The files appear in twelve tracks stacked vertically, while time moves from left to right. Notice that the total duration of this phrase is 3.7 seconds.
Start time   Duration   Soundfile   Amplitude   Location
0.137        0.136       8          0.742       0.985
0.281        0.164      10          0.733       0.899
0.346        0.132      12          0.729       0.721
0.628        0.121       1          0.711       0.178
0.748        0.174       3          0.693       0.555
0.847        0.062       6          0.687       0.159
0.974        0.154       8          0.686       0.031
...
Seven events occur in the interval of one second. Each sound file has a number label (from 1 to 12 in this case) shown in the third column. The location parameter indicates spatial location in the stereo field, with 1 corresponding to left, and 0 corresponding to right. The Granulate program generates the score for the micromontage according to high-level instructions stipulated by the composer. The composer specifies the parameters of a cloud, such as which sound files to granulate, the density of particles, their amplitude, shape, and so on. Each cloud may contain hundreds of particles.
The effect of automated micromontage is much the same as granulation. One difference between them is that micromontage is based on a script, a text that is read by a Music N synthesis program, meaning that the user can edit the script before the montage is rendered into sound.
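The logic of such a score generator is easy to sketch. The following C miniature is hypothetical, not the Granulate program itself: it scatters events across a cloud duration and prints one line per particle in the format of the score excerpt above (start time, duration, soundfile number, amplitude, location).

    #include <stdio.h>
    #include <stdlib.h>

    static double frand(void) { return (double)rand() / RAND_MAX; }

    /* Emit a cloud of micromontage events: 'density' events per second
       over cloud_dur seconds. Events print unsorted; a real score
       generator would sort them by start time. */
    void generate_cloud(double cloud_dur, double density,
                        int n_files, double max_dur)
    {
        int n_events = (int)(cloud_dur * density);
        for (int i = 0; i < n_events; i++) {
            double start = frand() * cloud_dur;        /* onset in seconds    */
            double dur   = 0.02 + frand() * max_dur;   /* particle duration   */
            int    sf    = 1 + rand() % n_files;       /* soundfile label     */
            double amp   = 0.5 + 0.5 * frand();        /* amplitude 0.5..1    */
            double loc   = frand();                    /* 1 = left, 0 = right */
            printf("%.3f %.3f %d %.3f %.3f\n", start, dur, sf, amp, loc);
        }
    }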
Composition with Micromontage
Micromontage has been a specialty of the composer Horacio Vaggione for some time, in such works as Octuor (1982) and Thema (1985, Wergo 2026-2).
For more about this composition, see chapter 7, which also discusses the
work of the Princeton-based composer Paul Lansky, another pioneer of
micromontage.
Assessment of Micromontage
Micromontage is an open-ended approach, still with many unexploited aesthetic possibilities. Granulation techniques, presented next, have absorbed
Figure 5.2 120-ms excerpt from H. Vaggione's Thema (1985). (a) Time-domain view shows microtemporal variations divided into five sections A–E. See the text for an explanation. The amplitude of the signal has been compressed in order to highlight low-level microvariations. (b) Frequency-domain view plotted on a linear frequency scale.
many of the techniques of micromontage. Perhaps the best way to draw a distinction between granulation and micromontage is to observe that granulation
is inevitably an automatic process, whereas a sound-artist can realize micromontage by working directly, point by point. It therefore demands unusual
patience.
Granulation
The automatic granulation of sampled sounds is a powerful technique for sound transformation. To granulate means to segment a sound signal into tiny grains, which can then be reassembled in an arbitrary succession. For example, we can extract a single large grain from
a snare drum and clone a periodic sequence of hundreds of grains to create a single-stroke roll. To avoid the repetitious quality of commercial drum machines and samplers, a variation on this method is to select grains from several strokes of a roll and to extract each grain from a different set of samples in the selected sound file.
One can liken granulation to scattering sound particles with a precision spray jet of sampled sound waveforms. Grains sampled from different instruments can be mingled, and grains taken from several sound files can create interwoven fabrics of sound. When the sound files consist of different notes of a scale, the result is a ringing harmonic cloud, where the layers stratify at particular pitches. When two harmonic clouds overlap, the sound is a statistical evolution from the first cloud to the second.
It is fascinating to experiment with extracting grains from several different sources, leading to hybrid textures, such as grains from a cello or voice mixed with grains from a cymbal. By controlling the distribution of grains from different sources, we can create clouds which evolve from one texture to another.
In any granulation method, grain duration is an important parameter. As chapter 3 points out, the timbre of a grain correlates strongly with its duration. When the duration of the grain is very short (<40 ms), sampled sound files lose their intrinsic identifiable qualities. For sound files consisting of spoken text or other identifiable material that is to be preserved, longer duration grains (>40 ms) work better.
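At bottom, granulation is a windowed copy operation. A minimal C sketch under simple assumptions (a raised-cosine grain envelope; hypothetical function name) extracts one grain from a source and mixes it into an output stream; calling it many times with varied offsets, onsets, and durations builds the textures described above.

    #include <math.h>
    #include <stddef.h>

    #define PI 3.14159265358979

    /* Copy a grain of g_len samples from src (at src_pos) into out (at
       out_pos), shaped by a raised-cosine envelope. Overlapping calls
       sum additively, producing a granular texture. */
    void add_grain(const double *src, size_t src_pos,
                   double *out, size_t out_pos, size_t g_len, double amp)
    {
        if (g_len < 2) return;
        for (size_t i = 0; i < g_len; i++) {
            double w = 0.5 - 0.5 * cos(2.0 * PI * i / (g_len - 1));
            out[out_pos + i] += amp * w * src[src_pos + i];
        }
    }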
Implementations of Granulation from Sound Files
Chapters 3 and 7 describe implementations of granular synthesis with synthetic waveforms. Granulation, on the other hand, implies the existence of analog-to-digital converters and software for sound recording and editing.
In 1981, I was working at the MIT Experimental Music Studio. At the time, we referred to analog-to-digital conversion by the now quaint term ``digitizing.'' The software for digitization and editing was primitive, written by students in short-term research projects. In this fragile environment, I managed to carry out a number of experiments in sound file granulation using a program that I wrote in the C language. This program read a script that I prepared, which described the parameters of a granulated sound cloud. It generated a score consisting of hundreds of note statements in the Music 11 language. I then ran Music 11 with a simple instrument based on the soundin unit generator.
Selective Granulation
An increasing number of signal processing operations analyze and separate different components of a sound signal. One set of components may be retained, while the other is discarded. Alternatively, the components can be treated in different ways and then recombined. The listener recognizes the identity of the original sound, but with an interesting variation in its components.
What are some of the ways of separating a sound into components? One is to send the sound through a bank of filters, where each filter tunes to a different center frequency. This results in a number of output signals that differ in their frequency band. Another technique is to set an amplitude threshold and separate the low-level and high-level parts of a signal. A third way is to separate the chaotic excitation and the stable resonance parts of a signal. Sounds can be separated by spatial position, by duration, by attack shape, and so on. Generally, there are no limits on the number of ways in which a given sound can be divided.
Selective granulation means granulating one of these separated components: for example, only those sounds that fall above or below a given amplitude threshold. This principle is the foundation of dynamics processing on a micro time scale (described later in this chapter). The granulated sounds can then be combined with the original to create a new hybrid texture.
Granulation in Real Time
Granulation in real time takes two forms. The first is the granulation of an incoming sound source, such as the signal coming from a microphone. The second is the granulation of stored sound files with real-time control. The granulator can be thought of as a delay line. As the sound passes through the delay line, the granulator sets and moves various pointers which identify memory locations in the delay line and extract grains. Several effects are possible:
1. Looping through the incoming samples repeatedly causes the incoming sound to be time-stretched. The playback or looping rate can vary by changing the speed at which the granulator reads through the samples: from normal speed to a slowed-down rate in which a single grain repeats over and over. Here a brief grain telescopes into a slowly evolving sound object that lasts hundreds of times longer than the original.
2. Overlaying many copies of a grain with different phase delays increases its perceived volume and creates a kind of chorus effect.
3. Varying the size and shape of the granulation window introduces amplitude modulation with its noticeable spectral products, distorting the input sound in a controllable way.
Barry Truax was the pioneer of real-time granulation and is its most inveterate exponent (Truax 1986, 1987, 1988, 1990a, 1990b, 1992, 1994a, 1994b, 1995, 1996a, 1996b; Keller and Truax 1998; see also chapters 3 and 7). Numerous programs now offer granulation in real time, as chapter 3 describes.
Granulation of incoming signals is limited by the forward direction of the arrow of time. As Truax stated:

Granulation may be many things, but it is not omniscient. (Truax 1994a)

In other words, the read pointer in a real-time system can never peek into the future. This eliminates the possibility of time-shrinking in real time. In contrast, the other form of granulation, reading from stored sound files, has the advantage that it can select grains from any time-point in the sound, making it possible to time shrink and time scramble. Regardless of whether the source is coming in ``live'' or from a stored sound file, real-time control is especially effective in creating an unending collection of variants of a given sound.
Assessment of Granulation
Granulating and powdering are, strictly speaking, nothing other than mechanical . . . operations, the object of which is to separate the molecules of a material and to reduce them to very fine particles. But . . . they cannot reach the level of the internal structure of the material. . . . Thus every molecule, after granulation, still resembles the original material. This contrasts with the true chemical operations, such as dissolution, which change intimately the structure of the material. (Antoine Lavoisier 1789, quoted in Vaggione 1996b)
Figure 5.3 The author playing the Creatovox synthesizer in CREATE's Studio Varese,
January 2000. The physical hardware consists of a MIDI keyboard controller, MIDI
joystick, and footpedals. A Power Macintosh computer receives the MIDI data. The
Creatovox software interprets the MIDI data and synthesizes the sound, with octophonic output through a Digidesign Pro Tools 24 converter.
Small grains and large variances create noise bands. Large grains and small variances, when combined with overlapping grains, produce a multiple-voice chorus effect. When the grain durations go below about 50 ms, the chorus turns into a ghostly whisper, as the pitch variations become microscopic and the signal dissolves into noise.
Transformation of Microsound
size of the jump derives from an estimate of the periodicity (pitch) of the incoming signal. When the harmonizer decides to splice, a smoothing fade-out
envelope ramps the amplitude of the presplice signal to zero and a corresponding fade-in envelope ramps to postsplice the signal to full amplitude.
Renements can be added to this basic scheme to improve the audio quality.
One is a noise gate connected to the input to the system to ensure that the pitch
shifting does not try to shift any ambient noise associated with the input signal.
The sound quality of a simple harmonizer depends on the nature of the input
signal and on the ratio of pitch change that it is asked to perform. Small pitch
changes tend to generate less audible side eects. Some commercial devices
produce undesirable side eects (such as buzzing at the frequency of the splicing) when used on material such as vocal sounds.
Granular Time Stretching and Shrinking in Cloud Generator
The Cloud Generator program (Roads and Alexander 1995; see the appendix) can realize a variety of granular time stretching and shrinking effects, some unique to it. Setting the Selection parameter to ``Granulate'' causes the program to open a file dialog, requesting a stereo file to process. To time-stretch the file by a factor of two, one sets the cloud duration to be twice the input file's duration. To time-shrink a sound, one sets the cloud duration to be shorter than that of the selected input sound.
The Selection Order parameter applies only to granulated clouds. It determines the order in which grains will be selected from the input sound file; a sketch of the three policies follows this list:

• Random: the program selects input grains from random points in the input sound file.
• Statistical evolution: the program selects input grains in a more-or-less left-to-right order; that is, at the beginning of the cloud there is a high probability that grains will be selected from the beginning of the input file, and at the end of the cloud there is a high probability that grains will be selected from the end of the input file.
• Deterministic progression: the program selects input grains in a strictly left-to-right order.
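The three selection orders differ only in how the read position into the input file is chosen for each grain. The following C sketch of the three policies is hypothetical (the actual Cloud Generator source is not reproduced here, and the 10% jitter figure is an illustrative assumption):

    #include <stdlib.h>

    enum order { RANDOM, STATISTICAL, DETERMINISTIC };

    /* Choose the read offset into the input file for grain i of n_grains. */
    size_t select_offset(enum order o, int i, int n_grains, size_t in_len)
    {
        double progress = (n_grains > 1) ? (double)i / (n_grains - 1) : 0.0;
        double r = (double)rand() / RAND_MAX;
        double pos;
        switch (o) {
        case RANDOM:            /* any point in the input file */
            pos = r;
            break;
        case STATISTICAL:       /* left-to-right with random deviation */
            pos = progress + 0.2 * (r - 0.5);
            if (pos < 0.0) pos = 0.0;
            if (pos > 1.0) pos = 1.0;
            break;
        default:                /* DETERMINISTIC: strict progression */
            pos = progress;
            break;
        }
        return (size_t)(pos * (in_len - 1));
    }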
By experimenting with the density and grain duration parameters, one obtains a variety of granular time distortions.
The basic difference between Gibson's and Lent's method is that in step 5 the input signal can be resampled to change its timbre (either up or down in frequency), and then window-replicated (step 6) with the window length being the same as the input signal. This produces a tone with an altered timbre but with the same pitch as the input.
It is also possible to shift the pitch while preserving the formant structure of the original. If the desired pitch is above the pitch of the input signal by X semitones, then the algorithm reduces the length of the Hanning window (step 6) by a factor of 2^(X/12). Alternatively, if the desired pitch is below the input signal by X semitones, then the algorithm increases the length of the Hanning window by a factor of 2^(X/12). For example, a shift up a fifth (X = 7) shrinks the window by a factor of 2^(7/12), or about 1.498.
Q = fcenter / ( fhigh − flow )

where fcenter is the filter's center frequency, fhigh is the upper cutoff frequency, and flow is the lower cutoff frequency. Notice that when the center frequency is constant, adjusting the Q is the same as adjusting the bandwidth. A constant Q filter, on the other hand, adjusts the bandwidth according to the center frequency, keeping the ratio the same. For example, suppose that we set the Q to be a constant value of 2. When the center frequency is 250 Hz, the bandwidth is 125 Hz. When the center frequency is 2500 Hz, the bandwidth is 1250 Hz. Constant Q filters have the advantage that they sculpt the same musical interval regardless of their center frequency.
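Since Q = fcenter / (fhigh − flow), a constant-Q design derives the band edges from the center frequency at run time. The small C helper below is hypothetical but faithful to the definition above; it places the edges arithmetically about the center, a simplification of the geometric placement some filter designs use.

    /* Given a center frequency and a constant Q, compute the bandwidth
       and band edges of the corresponding bandpass filter. */
    typedef struct { double bw, f_low, f_high; } band;

    band constant_q_band(double f_center, double q)
    {
        band b;
        b.bw     = f_center / q;           /* Q = 2, 250 Hz -> 125 Hz wide */
        b.f_low  = f_center - 0.5 * b.bw;
        b.f_high = f_center + 0.5 * b.bw;
        return b;
    }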
GranQ has eleven parameters that greatly affect the output, and which can be adjusted in real time as the sound is being processed:
1. Pitch: shifts the input to this pitch
2. Pitch Variation: amount of random deviation in the pitch
3. Pitch Quantization: rounds the pitch to a multiple of this value, causing the pitch variation to jump to only a few pitches
4. Time Rate: rate of scanning through the input sound file
5. Time Dispersion: amount of random deviation in the time rate
6. Time Quantization: rounds the time to a multiple of this value, causing the granulation to scan only a few time points in the file
7. Grain Duration
8. Grain Overlap or Density
9. Amplitude
10. Filter Q: constant over a range of 0.1 to 20
11. Filter Range: upper and lower limits, from 20 Hz to 20 kHz
When the filter Q and density are high, the granular stream has a liquid quality. One can disintegrate any sound into a torrent of filtered particles.
GranQ is one of my staple treatments for sound. I applied the program in creating the second movement of my composition Half-life (1999), the source material for which consisted of a series of sound files generated by pulsar synthesis (see chapter 4). I processed these files with GranQ at variable densities and grain sizes, creating the cascading clouds heard in the second movement, Granules. (See description, chapter 7.)
Depending on how the parameters are adjusted, this articulates the microstructure of a given sound.
Parameters of Spectral Dynamic Processing
The Spectral Dynamics effect in SoundHack offers threshold detection for each band. This means that one frequency band can have the dynamics process active, while other bands are inactive. One can select whether to affect sounds above or below a specified amplitude threshold. One can set the threshold level to one value for all bands, or to a different value for each band by reading in and analyzing a sound file. The spectrum of this sound file can set the thresholds for each band.
Other parameters let users set the amount of gain or reduction for the bands that are past a specified threshold. For compression and expansion, it allows one to set the gain ratio. When affecting sounds below the threshold, the compressor and expander hold the highest level steady and affect lower levels (this is also known as downward expansion or compression). When affecting sounds above the threshold, the compressor and expander hold the lower threshold level steady and compress or expand upwards.
The attack and decay parameters are important for the manipulation of microsound structure. These set the time window for each band to open or close. When this value is a large time constant, the algorithm ignores transients. If the time constant is a small duration, then the effect tends to modulate the sound file on the time scale of the window within the affected bands.
Figure 5.4 Formation of wavesets. The circles indicate zero crossings. (a) Sine wave
plus second harmonic in a 1 : 0.5 mix. (b) Sine wave plus second harmonic in a 1 : 0.7
mix. (c) Sine wave plus second harmonic in a 1 : 1 mix. (d) Sine wave plus second harmonic in a 0.5 : 1 mix.
Waveset Formation
What types of signals have multiple zero-crossings? Beginning with a sinusoid,
one can add an arbitrary number of partials without crossing zero more than
once within the wave period. The classic sawtooth and square waves are examples of single-cycle waveforms with an arbitrary number of partials, limited
only by the sampling frequency of the synthesis system.
In any sound with a strong fundamental frequency, waveset manipulations are equivalent to operations on individual cycles. Thus waveset time-stretching produces no artifacts when the signal has a strong and steady fundamental period. But as figure 5.4 shows, multiple wavesets form when the ratio of the amplitude of the fundamental to any of the upper partials dips below 1 : 0.5. In speech, multiple wavesets appear in sounds such as whispering, where the fundamental drops out.
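Since wavesets are delimited by zero-crossings, finding them is a single pass over the samples. Here is a minimal C sketch, assuming a waveset spans from one upward zero-crossing to the next, so that a signal with a strong fundamental yields one waveset per pseudo-cycle; the function name is hypothetical.

    #include <stddef.h>

    /* Record waveset boundaries: indices where the signal crosses zero
       going upward. Consecutive boundaries delimit one waveset (a
       pseudo-cycle). Returns the number of boundaries found. */
    size_t find_wavesets(const double *x, size_t len,
                         size_t *bounds, size_t max_bounds)
    {
        size_t n = 0;
        for (size_t i = 1; i < len && n < max_bounds; i++)
            if (x[i - 1] < 0.0 && x[i] >= 0.0)   /* upward zero-crossing */
                bounds[n++] = i;
        return n;
    }

Waveset transformations such as omission, reversal, or substitution then operate on the sample spans between successive boundaries.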
Experiences with Waveset and Wavecycle Distortions
Table 5.1 summarizes Wishart's catalog of distortions based on wavesets and wavecycles.
Table 5.1 Waveset and wavecycle transformations in the Composer's Desktop Project software

Waveset transformations
Waveset transposition
Waveset reversal
Waveset shaking
Waveset inversion
Waveset omission
Waveset shuffling
Waveset distortion
Waveset substitution
Waveset harmonic distortion
Waveset averaging
Waveset enveloping
Waveset transfer
Waveset interleaving, method 1
Waveset interleaving, method 2
Waveset time-stretching
Waveset time-shrinking
Waveset normalizing
Table 5.1 (continued)
Wavecycle transformations
Distort average       Average the waveshape over N wavecycles
Distort cyclecnt      Count wavecycles in soundfile
Distort delete        Time-compress file by deleting wavecycles
Distort divide        Distortion by dividing wavecycle frequency
Distort envel         Impose envelope over each group of N wavecycles
Distort filter        Time-compress sound by filtering out wavecycles
Distort fractal       Superimpose miniature copies of source wavecycles onto themselves
Distort harmonic      Harmonic distortion by superimposing ``harmonics'' onto wavecycles
Distort interact      Time-domain interaction of sounds
Distort interpolate   Time-stretch file by repeating wavecycles and interpolating between them
Distort multiply      Distortion by multiplying wavecycle frequency
Distort omit          Omit A out of every B wavecycles, replacing them by silence
Distort pitch         Pitch warp wavecycles of sound
Distort repeat        Time-stretch file by repeating wavecycles
Distort replace       Strongest wavecycle in each group replaces others
Distort reform        Modify shape of wavecycles
Distort reverse       Cycle-reversal distortion, wavecycles reversed in groups
Distort shuffle       Distortion by shuffling wavecycles
Distort telescope     Time-compress by telescoping N wavecycles to 1
A compact disc supplied with his book contains sound examples of waveset distortions. The examples of waveset inversion applied to a speaking voice sound similar to band-shifted radio distortion. Waveset omission creates noise-infused textures, with temporal gaps as the proportion of omitted wavesets increases. In waveset substitution, the timbre changes according to the substituted waveform. For example, when sine waves substitute for wavesets, the result is, not surprisingly, more sinusoidal in quality. Waveset time-stretching, in which each waveset repeats N times, has a distinctly artificial quality.
The more extreme waveset distortions destroy the identity of the source signal, turning it into a chain of arbitrary waveform fragments. As the composer observes, the results are often unpredictable in their detail:

In general, the effects [of waveset time-stretching] produced will not be entirely predictable, but they will be tied to the morphology (time-varying characteristics) of the original sound. (pp. 40–1)

Although [waveset averaging] appears to be similar to the process of spectral blurring, it is in fact quite irrational, averaging the waveset length and the wave shape in perceptually unpredictable ways. (p. 42)
Convolution of Microsounds
Increased processor speeds make it possible to realize previously exotic and
computationally intensive techniques on personal computers. Convolution is
one such technique. A fundamental operation in signal processing, convolution
``marries'' two signals (Rabiner and Gold 1975). Convolution is also implicit in signal processing operations such as filtering, modulation, excitation/resonance modeling, cross-filtering, spatialization, and reverberation. By implementing these operations as convolutions, we can take them in new and interesting directions. This section reviews the theory and presents the results of systematic experimentation with this technique. Throughout, it offers practical guidelines for effective musical use of convolution, and later presents the results of new applications such as transcription of performed rhythms and convolutions with sonic particles. Parts of this text derive from Roads (1992b, 1993a, 1996, and 1997).
Status of Convolution
The theory of convolution may remain unfamiliar to most musicians, but to signal processing engineers it is fundamental: the foundation stone of linear system theory. Signal processing textbooks often present it tersely, reducing it to a handful of generalized mathematical cliches. Since these texts are not aimed at a musically inclined reader, the audio significance of convolution is barely touched upon. Hence engineers are not always aware of the range of convolution effects in the audio domain (an exception is Dolson and Boulanger 1985).
Listeners are familiar with the effects of convolution, even if they are unaware of its theory. Convolution may disguise itself under more familiar terms such as filtering, modulation, and reverberation. Newer software tools running on personal computers unbundle convolution, offering it as an explicit operation, and allowing any two sampled files to be convolved (MathWorks 1995; Erbe 1995; Pranger 1999). Such tools provide a stable basis for musical exploration of convolution, and prompt a need for more universal understanding of its powers. We begin this task here. Those already familiar with the theory of convolution may want to skip to the section ``Musical Significance of Convolution.''
Impulse Response and Cross-Synthesis
The definition of a filter is very broad (Rabiner et al. 1972). Virtually any system that accepts an input signal and emits an output is a filter, and this certainly applies to convolution. A good way to examine the effect of a filter is to see how it reacts to test signals. One of the most important test signals in signal processing is the unit impulse, an instantaneous burst of energy at maximum amplitude. In a digital system, the briefest possible signal lasts one sample period. Since short-duration signals have broad bandwidths, this signal contains energy at all frequencies that can be represented at the given sampling frequency. The output signal generated by a filter that is fed a unit impulse is the impulse response (IR) of the filter. The IR corresponds to the system's amplitude-versus-frequency response (often abbreviated to frequency response). The IR and the frequency response contain the same information, the filter's response to the unit impulse, but plotted in different domains. That is, the IR is a time-domain representation, and the frequency response is a frequency-domain representation.
Convolution serves as the bridge between the time-domain and the frequency-domain. Any filter convolves its impulse response with the input signal to produce a filtered output signal. The implications of convolution in audio engineering are vast. One can start from the measured IR of any audio-frequency system (microphone, loudspeaker, room, distortion, delay effect, equalizer, modulator, etc.), and through convolution, impose the characteristics of this system on any audio signal.
This much is understood in the engineering community. By generalizing the notion of impulse response, however, one arrives at quite another set of possibilities. Let us consider any sequence of samples as the impulse response of a hypothetical system. Now we arrive at a new and musically potent application of convolution: cross-synthesis by convolution of two arbitrary sound signals. In musical signal processing, the term cross-synthesis describes a number of different techniques that in some way combine the properties of two sounds into a single sound. This may involve shaping the spectrum, time, or spatial pattern of one sound by the other.
What then precisely is convolution? The next section presents an intuitive review of the theory.
Review of Convolution Theory
To understand convolution, let us examine the simplest case: convolution of a signal a with a unit impulse, which we call unit[n]. A unit impulse is a digital sequence defined over n time points. At time n = 0, unit[n] = 1; for all other values of n, unit[n] = 0. The convolution of a[n] with unit[n] can be denoted as follows:

output[n] = a[n] * unit[n] = a[n]

Here the sign ``*'' signifies convolution. This results in a set of values for output that are the same as the original signal a[n]. Thus, convolution with the
unit impulse is said to be an identity operation with respect to convolution, because convolving any function with unit[n] leaves that function unchanged.
Two other simple cases of convolution tell us enough to predict what will happen at the sample level with any convolution. If we scale the amplitude of unit[n] by a constant c, we can write the operation as follows:

output[n] = a[n] * (c × unit[n])

The result is simply:

output[n] = c × a[n]
In other words, we obtain the identity of a, scaled by the constant c.
In the third case, we convolve signal a with a unit impulse that has been time-shifted by t samples. Now the impulse appears at sample n = t instead of at n = 0. This can be expressed as follows:

output[n] = a[n] * unit[n − t]

The result of which is:

output[n] = a[n − t]
That is, output is identical to a except that it is time-shifted by t samples.
Putting together these three cases, we can view any sampled function as a sequence of scaled and delayed unit impulse functions. They explain the effect of convolution with any IR. For example, the convolution of any signal a with another signal b that contains two impulses spaced widely apart results in a repetition or echo of a starting at the second impulse in b. When the impulses in b move closer together, the scaled repetitions of a start to overlap.
Thus, to convolve an input sequence a[n] with an arbitrary function b[n], we place a copy of b[n] at each point of a[n], scaled by the value of a[n] at that point. The convolution of a and b is the sum of these scaled and delayed functions. Clearly convolution is not the same as simple multiplication of two signals. The multiplication of one signal a by another signal b means that each sample of a is multiplied by the corresponding sample in b. Thus:

output[1] = a[1] × b[1]
output[2] = a[2] × b[2]
etc.
The formal definition of convolution, by contrast, is:

output[n] = Σ (from m = 0 to N − 1) a[m] × b[n − m]

where N is the length of the sequence a in samples and n ranges over the entire length of b. In effect, each sample of a[n] serves as a weighting function for a delayed copy of b[n]; these weighted and delayed copies all add together. The conventional way to calculate this equation is to evaluate the sum for each value of n. This is direct convolution. At the midpoint of the convolution many copies are summed, so the result of this method of convolution is usually rescaled (i.e., normalized) afterward.
Convolution lengthens inputs. The length of the output sequence generated by direct convolution is:

length(output) = length(a) + length(b) − 1

In the typical case of an audio filter (lowpass, highpass, bandpass, bandreject), a is an IR that is very short compared to the length of the b signal. For a broad smooth lowpass or highpass filter, for example, the IR lasts less than a millisecond.
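In code, direct convolution is a pair of nested loops that literally place a scaled copy of one signal at every sample of the other. A minimal C sketch follows; the normalization step described above is left to the caller.

    #include <stddef.h>

    /* Direct convolution: out must hold a_len + b_len - 1 samples.
       Each sample of a scales and delays a copy of b; the copies sum
       into out. */
    void convolve(const double *a, size_t a_len,
                  const double *b, size_t b_len, double *out)
    {
        for (size_t n = 0; n < a_len + b_len - 1; n++)
            out[n] = 0.0;
        for (size_t m = 0; m < a_len; m++)
            for (size_t k = 0; k < b_len; k++)
                out[m + k] += a[m] * b[k];
    }

The cost grows as the product of the two lengths, which is why the sectioned methods mentioned below matter for long impulse responses.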
The Law of Convolution
A fundamental law of signal processing is that the convolution of two waveforms is equivalent to the multiplication of their spectra. The inverse also holds,
that is, the multiplication of two waveforms is equal to the convolution of their
spectra. Another way of stating this is as follows:
Convolution in the time domain is equal to multiplication in the frequency domain, and vice
versa.
The law of convolution has profound implications. In particular, the convolution of two audio signals is equivalent to filtering the spectrum of one sound by the spectrum of another sound. Conversely, multiplying two audio signals (i.e., performing amplitude modulation or ring modulation) is equal to convolving their spectra. Convolution of spectra means that each point in the discrete frequency spectrum of input a is convolved with every point in the spectrum of b. Convolution does not distinguish whether its input sequences represent samples or spectra. To the convolution algorithm they are both just discrete sequences.
Another implication of the law of convolution is that every time we reshape
the envelope of a sound, we also convolve the spectrum of the envelope with the
spectrum of the reshaped sound. In other words, every time-domain transformation results in a corresponding frequency-domain transformation, and vice versa.
Relationship of Convolution to Filtering
Convolution is directly related to filtering. The equation of a general finite impulse response (FIR) filter is as follows:

y[n] = a × x[n] + b × x[n − 1] + . . . + i × x[n − j]

We can think of the coefficients a, b, . . . i as elements in an array h[i], where each element in h[i] is multiplied by the corresponding element in array x[j]. With this in mind, the general equation of an FIR filter presented earlier can be restated as a convolution:

y[n] = Σ (from m = 0 to N − 1) h[m] × x[n − m]
where N is the length of the sequence h in samples, and n ranges over the entire length of x. Notice that the coefficients h play the role of the impulse response in the convolution equation. And, indeed, the impulse response of an FIR filter can be taken directly from the value of its coefficients. Thus, any FIR filter can be expressed as a convolution, and vice versa.
Since an infinite impulse response (IIR) filter also convolves, it is reasonable to ask whether there is also a direct relation between its coefficients and its impulse response. In a word, the answer is no. There exist, however, mathematical techniques that design an IIR filter to approximate a given impulse response. See Rabiner and Gold (1975).
Direct convolution and sectioned convolution generate equivalent results. Rabiner and Gold (1975) and Kunt (1981) present techniques for sectioned convolution and real-time implementations. Gardner (1995) describes a novel technique that combines direct and sectioned convolution to eliminate processing delays.
Musical Significance of Convolution
A veritable catalog of sonic transformations emerges out of convolution: cross-filters, spatialization, modulation, models of excitation and resonance, and time-domain effects. Indeed, some of the most dramatic effects induced by convolution involve temporal transformations: attack smoothing, multiple echoes, room simulation, time smearing, and reverberation. The type of effect achieved depends entirely on the nature of the input signals. Pure convolution has no control parameters.
The following sections spotlight each type of transformation. A mark (▶) in front of an indented section indicates a practical guideline.
Cross-Filtering
One can implement any filter by convolving an input signal with the impulse response of the desired filter. In the usual type of FIR audio filter, the IR is typically less than a few dozen samples in length. The impulse response of a bandpass filter is precisely a grain with a sinusoidal waveform. The longer the grain, the stronger the effect of the filter.
By generalizing the notion of impulse response to include signals of any length, we enter into the domain of cross-filtering: mapping the time-varying spectrum envelope of one sound onto another.
▶ If both signals are long in duration and one of the input signals has a smooth attack, the main effect of convolution is a spectrum alteration.
Let us call two sources a and b, and their corresponding analyzed spectra
spectrum_a and spectrum_b. If we multiply each point in spectrum_a with each
corresponding point in spectrum_b and then resynthesize the resulting spectrum, we obtain a time-domain waveform that is the convolution of a with b.
▶ If both sources are long in duration, each has a strong pitch, and one or both of the sources has a smooth attack, the result will contain both pitches and the intersection of their spectra.
For example, the convolution of two saxophone tones, each with a smooth attack, mixes their pitches, sounding as though both tones are being played simultaneously. Unlike simple mixing, however, the filtering effect in convolution accentuates metallic resonances that are common to both tones. Convolution is particularly sensitive to the attack of its inputs.
▶ If either source has a smooth attack, the output will have a smooth attack.
Listening to the results of cross-filtering, one sometimes wishes to increase the presence of one signal at the expense of the other. Unfortunately, there is no straightforward way to adjust the ``balance'' of the two sources or to lessen the convolution effect.
Spatiotemporal Effects
Spatiotemporal effects constitute an important class of transformations induced by convolution. These include such staples as echo, time-smearing, and reverberation.
Any unit impulse in one of the inputs to the convolution results in a copy of the other signal. So if we convolve any brief sound with an IR consisting of two unit impulses spaced one second apart, the result is a clear echo of the first sound.
▶ To create a multiple echo effect, convolve any sound with a series of impulses spaced at the desired delay times. For a decaying echo, lower the amplitude of each successive impulse.
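This guideline translates directly into the construction of an impulse response. The following C sketch is illustrative (the function name is hypothetical); the result would be passed to a convolution routine such as the one sketched earlier.

    #include <stddef.h>

    /* Build an IR of n_echoes impulses spaced delay_s seconds apart,
       each scaled by 'decay' relative to the previous, for a decaying
       multiple-echo effect. Returns the IR length, or 0 if max_len is
       too small. */
    size_t make_echo_ir(double *ir, size_t max_len, int n_echoes,
                        double delay_s, double decay, double sample_rate)
    {
        size_t spacing = (size_t)(delay_s * sample_rate);
        size_t len = spacing * (size_t)(n_echoes - 1) + 1;
        if (len > max_len) return 0;
        for (size_t i = 0; i < len; i++) ir[i] = 0.0;
        double amp = 1.0;
        for (int e = 0; e < n_echoes; e++) {
            ir[(size_t)e * spacing] = amp;   /* one impulse per echo */
            amp *= decay;                    /* each echo softer     */
        }
        return len;
    }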
Time-smearing occurs when the pulses in the IR are spaced close together,
causing the convolved copies of the input sound to overlap. If, for example, the
IR consists of a series of twenty impulses spaced 10 ms apart, and the input
sound is 500 ms in duration, then multiple copies of the input sound overlap,
blurring the attack and every other temporal landmark.
The IR of a room contains many impulses, corresponding to reflections off the various surfaces of the room: its echo pattern. When such an IR is convolved with an arbitrary sound, the result is as if that sound had been played in that room, because it has been mapped into the room's echo pattern.
▶ If we convolve sound a with the IR of an acoustic space, and then mix this convolution with a, the result sounds as if a is within the acoustic space.
We hear reverberation in large churches, concert halls, and other spaces with high ceilings and reflective surfaces. Sounds emitted in these spaces are reinforced by thousands of closely spaced echoes bouncing off the ceiling, walls, and floors. Many of these echoes arrive at our ears after reflecting off several surfaces, so we hear them after the original sound has reached our ears; the myriad echoes fuse into a lingering acoustical ``halo.''
From the point of view of convolution, a reverberator is nothing more than a particular type of filter with a long IR. Thus we can sample the IR of a reverberant space and then convolve that IR with an input signal. When the convolved sound is mixed with the original sound, the result sounds as if the input signal has been played in the reverberant space.
Importance of Mixing
For realistic spatial effects, it is essential to blend the output of the convolution with the original signal. In the parlance of reverberation, the convolved output is the wet (i.e., processed) signal, and the original signal is the dry or unprocessed signal.
▶ It is typical to mix the wet signal down 15 dB or more with respect to the level of the dry signal.
Noise Reverberation
When the peaks in the IR are longer than one sample, the repetitions are time-smeared. The combination of time-smearing and echo explains why an exponentially decaying noise signal, which contains thousands of sharp peaks in its attack, results in reverberation effects when convolved with acoustically dry signals.
▶ If the amplitude envelope of a noise signal has a sharp attack and a fast exponential decay, the result of convolution resembles a natural reverberation envelope.
▶ To color this reverberation, one can filter the noise before or after convolving it.
▶ If the noise has a slow logarithmic decay figure, the second sound appears to be suspended in time before the decay.
▶ If the noise signal has an exponentially increasing envelope, the second sound gives the impression of being played in reverse.
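The first of these guidelines is a one-loop construction. Here is a minimal C sketch, assuming an exponential decay parameterized by a decay time in seconds; the name and constants are illustrative.

    #include <stdlib.h>
    #include <math.h>

    /* Fill ir[] with exponentially decaying noise: a sharp attack and a
       fast exponential decay. Convolving a dry signal with this IR
       yields a natural-sounding reverberation envelope. */
    void make_noise_ir(double *ir, size_t len, double decay_s,
                       double sample_rate)
    {
        for (size_t i = 0; i < len; i++) {
            double noise = 2.0 * rand() / RAND_MAX - 1.0;
            double env   = exp(-(double)i / (decay_s * sample_rate));
            ir[i] = noise * env;
        }
    }

Filtering ir[] before use colors the resulting reverberation, as the second guideline suggests.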
Modulation as Convolution
Amplitude and ring modulation (AM and RM) both call for multiplication of time-domain waveforms. The law of convolution states that multiplication of two waveforms convolves their spectra. Hence, convolution accounts for the sidebands that result. Imagine that instead of impulses in the time-domain, convolution is working on line spectra in the frequency-domain. The same rules apply, with the important difference that the arithmetic is that of complex numbers. The FFT, for example, generates a complex number for each spectrum component. Here, the main point is that this representation is symmetric about 0 Hz, with a replica of each component (halved in amplitude) in the negative frequency domain. This negative spectrum is rarely plotted, since it has significance only inside the FFT. But it helps explain the positive and negative sidebands generated by AM and RM.
Excitation/Resonance Modeling
Many vocal and instrumental sounds can be simulated by a two-part model: an excitation signal that is filtered by a resonance. The excitation is a nonlinear switching action, like the pluck of a string, the buzz of a reed, or a jet of air into a tube. The resonance is the filtering response of the body of an instrument. Convolution lets us explore a virtual world in which one sound excites the resonances of another.
Through a careful choice of input signals, convolution can simulate improbable or impossible performance situations, as if one instrument were somehow playing another. In some cases (e.g., a chain of bells striking a gong), the interaction could be realized in the physical world; others (e.g., a harpsichord playing a gong) can only be realized in the virtual reality of convolution.
▶ To achieve a plausible simulation, the excitation must be a brief, impulse-like signal (typically percussive), with a sharp attack (or multiple sharp attacks). The resonance can be any sound.
Rhythm Input
We have seen that a series of impulses convolved with a brief sound maps that
sound into the time pattern of the impulses. A new application of convolution is
the precise input of performed rhythms. To enter a performed rhythm, one need
only tap with drumsticks on a hard surface, and then convolve those taps with
other sounds.
▶ The convolution of a tapped rhythmic pattern with any sound having a sharp attack causes each tap to be replaced by a copy of the input sound.
This is a direct method of mapping performed rhythms to arbitrary sounds. Since convolution aligns the sounds to the rhythm with a time resolution of the sampling rate, this approach is much more precise than a MIDI percussion controller with its temporal resolution of several milliseconds. One can also layer convolutions using different patterns and input sounds. After prepositioning each tap in stereo space, convolution automatically distributes them spatially.
Convolution and Pulsar Synthesis
The slower the flow of time colors, the greater the clarity with which they can represent themselves . . . as rhythms. (Koenig 1962)
thundercloud, it fills up all space. Likewise, the greater its value at a point, the greater the probability of finding the electron there. Similarly, wave functions can be associated with large objects, like people. As I sit in my chair in Princeton, I know that I have a Schroedinger probability wave function. If I could somehow see my own wave function, it would resemble a cloud very much in the shape of my body. However, some of the cloud would spread out all over space, out to Mars and even beyond the solar system, although it would be vanishingly small there. This means that there is a very large likelihood that I am, in fact, sitting here in my chair and not on the planet Mars. Although part of my wave function has spread even beyond the Milky Way galaxy, there is only an infinitesimal chance that I am sitting in another galaxy. (Kaku 1995)
All our experiments indicate that quarks and bosons interact as points with no spatial dimensions, and so are fundamental, like the leptons. If the fundamental particles are really dimensionless points with mass, flavor, color, charge, and other quantum properties, occupying no volume, then the nature of matter appears quite bizarre. The four interactions [strong force, weak force, gravitational force, electromagnetic force] give matter shape. Matter itself is empty. (Lederman and Schramm 1995)
from its own virtual space. This leads to spatial counterpoint: the choreography of sounds in an interplay between fixed and moving positions and between foreground and background elements. An electronic work that does not take advantage of these possibilities may suffer from a ``spatial sameness'' (Vaggione 1996a). We hear this in, for example, compositions where the voices remain fixed in space and submerged in a constant global reverberation.
Scattering Particles in Virtual Spaces
To spatialize microsound means to assign an independent spatial position to every sonic particle. Spatial position is a function of the particle's amplitude in two or more channels, as well as the amount of reverberation in which it is immersed. Here we concentrate on the question of position in a pluriphonic (multichannel) environment. The next section deals with reverberation.
Two sound particles may share a spatial position, but it is also possible for each particle in a complex sound object to occupy a unique location. This situation creates a vivid ``three-dimensional'' sound picture, an effect that is enhanced by loudspeakers with good imaging characteristics.
When the particle density is relatively sparse, it is possible to position each particle manually in a sound editor or mixing program, through the manipulation of the amplitudes and panning curves of the individual tracks. But when densities are relatively high, we design automatic scattering algorithms to assign a position to each of thousands of particles in virtual space. These algorithms obey high-level tendencies stipulated by the composer on a larger time scale. To cite an example, the Cloud Generator program (described in the appendix) offers four scattering options in a stereo field:

1. Stationary spatial position for all grains in a cloud
2. Panoramic motion from one position to another over the duration of a cloud
3. Panoramic motion from a fixed position to a random position, or vice versa
4. Random spatial position for each grain in the cloud
As pluriphonic sound diffusion becomes more commonplace, we will see new spatial scattering algorithms. With an eight-channel source, for example, an alternative approach to spatialization is to control the density of particles per channel. Pluriphonic sound systems that surround the audience suggest circular and elliptical trajectories. The immersive projection spaces associated with virtual reality demand spherical scattering algorithms.
Per-Grain Reverberation
We can deepen the spatial image by adding highly selective reverberation to
the spatial algorithm. That is, the spatialization algorithm sets the depth of
reverberation of each grain individually. This can be accomplished in several
ways.
The amount of reverberation applied to a signal, and so its apparent depth, can be controlled by the amplitude of the signal sent to a global reverberator. Thus an efficient way to individuate the grains is to send each grain to at least two outputs. One of the outputs is unreverberated or dry, while the other passes through a global reverberator that is common to all grains. This is the wet signal. The spatialization algorithm can derive the wet/dry ratio for each grain according to a probability function. In a multichannel system, one can generalize this to N global reverberators, where N is the number of output channels. Alberto de Campo tested such a design at CREATE, Santa Barbara, in the octophonic Varese Studio in the spring of 2000.
A more elaborate design is to create a bank of distinct reverberators. Some have very short decay times, while others have long decay times. Some are dark in color, others brighter, and so on. Probability functions determine to which reverberator a grain is sent.
Per-grain reverberation is most striking at low densities. At high densities, the individual reverberations fuse into a continuous background reverberation, not much different from global reverberation.
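The simpler, single-reverberator scheme described above amounts to two output buses and a random send level per grain. A hedged C sketch follows (names hypothetical; the wet bus would subsequently pass through the shared global reverberator):

    #include <stdlib.h>
    #include <stddef.h>

    /* Route one grain to a dry bus and a wet bus. The wet/dry ratio is
       drawn per grain from a uniform probability function, so each
       grain sits at its own apparent depth. */
    void route_grain(const double *grain, size_t len, size_t onset,
                     double *dry_bus, double *wet_bus)
    {
        double wet = (double)rand() / RAND_MAX;   /* per-grain send level */
        for (size_t i = 0; i < len; i++) {
            dry_bus[onset + i] += (1.0 - wet) * grain[i];
            wet_bus[onset + i] += wet * grain[i];
        }
    }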
Spatialization by Modulation with Particles
A simple way to spatialize a source sound is by modulation with particles. We begin by generating a pattern of synthetic particles distributed among two or more channels; we then extract the amplitude envelope of each channel of particles and impose it on another source signal. Figure 5.5 shows the operation applied to one channel. Here it converts a speech sound to a granular texture. Granulation achieves a similar effect.
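In signal terms, the operation is an envelope follower and a multiply. A minimal C sketch, assuming a rectifying one-pole follower (the smoothing coefficient is an illustrative parameter):

    #include <math.h>
    #include <stddef.h>

    /* Impose the amplitude envelope of a granular signal onto a source:
       rectify grains[], smooth it with a one-pole follower, and
       multiply the result into src[]. All buffers hold len samples. */
    void granular_modulate(const double *grains, const double *src,
                           double *out, size_t len, double smooth)
    {
        double env = 0.0;
        for (size_t i = 0; i < len; i++) {
            env += smooth * (fabs(grains[i]) - env);  /* follow envelope */
            out[i] = src[i] * env;                    /* modulate source */
        }
    }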
Convolutions with Clouds of Sonic Particles
The rolling of thunder has been attributed to echoes among the clouds; and if it is to be considered that a cloud is a collection of particles of water . . . and therefore each capable of reflecting sound, there is no reason why very [loud] sounds should not be reverberated . . . from a cloud. (Sir John Herschel, quoted in Tyndall 1875)
Figure 5.5 Granular modulation. (a) Granular sound. (b) Extraction of its amplitude
envelope. (c) Speech sound. (d) Speech sound modulated by the granular envelope in (b).
In the case of convolutions with an asynchronous cloud of grains, the particles can be thought of as the IR of an unusual virtual environment (Roads
1992b). What is the shape of this environment? I imagine that it resembles a
large balloon with many long nipples. Each nipple resonates at a particular
grain frequency.
For a brief source, convolution with a sparse cloud of short grains contributes a statistical distribution of echoes. The higher the grain density of the cloud, the more the echoes fuse into an irregular quasi-reverberation effect, often undulating with odd peaks and valleys of intensity, and weird echoes of the original source (figure 5.6). The virtual reflection contributed by each grain splatters the input sound in time. That is, it injects multiple delays spaced at irregular time intervals. If each grain were a single-sample pulse, then the echoes would be faithful copies of the original input. Since each grain may contain hundreds of samples, however, each echo is locally filtered and time-smeared.
Time-smearing effects fall into two basic categories, depending partly on the attack of the input sound. If the source begins with a sharp attack, each grain generates an echo of that attack. If the cloud of grains is not continuous, these echoes are spaced irregularly in time. If the source has a smooth attack, however, the time-splattering itself is smoothed out into a kind of strange colored reverberation. The ``color'' of the reverberation and the echoes is determined by the pitch and spectrum of the grains, which are a factor of the frequency, duration, envelope, and waveform of each grain (figure 5.7). See chapter 3 for more details on grain parameters.
For low-density synchronous clouds (<10 particles/second), convolutions result in metrical rhythms resembling tape echo, owing to the repetition of the source. Brief particles produce clear echoes, while long particles accentuate the bandpass filtering effect. At high densities the echoes fuse into buzzing, ringing, or rippling sonorities. The identity of the source may be obfuscated or obliterated.
Effect of Cloud Amplitude Envelope
The amplitude envelope of a cloud plays an important role in sound transformation. If the amplitude envelope of a dense cloud of particles has an exponential decay, then the effect is similar to granular reverberation. If the amplitude envelope of the cloud decreases linearly or logarithmically, the reverberation sustains unnaturally. (Natural reverberation dies out quickly.)
Figure 5.6 Reverberation by granular convolution. (a) Speech input: ``Moi, Alpha
Soixante.'' (b) Granular impulse response, consisting of one thousand 9-ms sinusoidal
grains centered at 14,000 Hz, with a bandwidth of 5000 Hz. (c) Convolution of (a) and
(b). (d) Mixture of (a) and (c) in a proportion of 5 : 1, creating reverberation around the
speech.
Figure 5.8 Spatialization via convolution with sound particles. These sonograms are
the results of convolutions of a vocal utterance with two dense clouds of particles. The
sonograms used a 2048-point FFT with a Kaiser-Bessel window. Frequency is plotted
logarithmically from 40 Hz to 11.025 kHz. (a) The particle envelope is expodec (sharp
attack, exponential decay). (b) The particle envelope has a Gaussian attack and decay.
Notice the turgid undulations caused by time-smearing due to the smooth attack.
As composers discovered new techniques, their spatial aesthetic became more refined. Simultaneously, the technology of recording, editing, and mixing of sound became more sophisticated. This made it possible to associate specific localization patterns with different tracks or phrases.
The digital audio workstation, introduced in the late 1980s, extended the
time scale of spatial transformations down to the level of individual sound
objects.
Through new particle scattering algorithms, micromodulation, per-grain reverberation, and convolution, we have now extended spatialization down to the level of microsound. When we project these microspatial effects in a physical space over widely separated loudspeakers, these tiny virtual displacements appear far larger, and the sounds dance.
Summary
Our principal metaphor for musical composition must change from one of architecture to one of chemistry. We may imagine a new personality combing the beach of sonic possibilities, not someone who selects, rejects, classifies and measures the acceptable, but a chemist who can take any pebble, and, by numerical sorcery, separate its constituents, and merge the constituents from two quite different pebbles . . . (Wishart 1994)
In the rst half of the twentieth century, Russolo (1916), Cahill (1897, 1914,
1917, 1919), Cage (1937), and Varese (1971) extended the borders of music,
allowing previously excluded sounds into the territory of composition. Mechanical noise instruments, electrodynamic tone wheels, and electronic circuits
produced these sounds. Magnetic tape recording, introduced in the 1950s, made
possible another shift in musical practice. Composers could store sounds on
tape, which opened up all the possibilities of montage.
Previously transformed sounds are another rich source for the composer.
Recursive processing, in which a transformed sound is again transformed, often
provides interesting musical evolutions. This principle of recursive variation
applies on multiple time scales. The conventional musical practice of variations
involves permutations and combinations of discrete notes: repetition, changing
the meter, changing the order of the notes, adding or omitting intervals, filling
intervals with ancillary notes, inverting the harmony, substituting chords,
transposing a melody, and so on. In contrast, sound transformation leads to
morphological changes in sound color and spatial position as well as pitch and duration.

Windowed Analysis and Transformation
. . . Yet, classical signal processing has devoted most of its efforts to the design of time-invariant and space-invariant operators, that modify stationary signal properties. This has led to the indisputable hegemony of the Fourier transform, but leaves aside many information-processing applications. The world of transients is considerably larger and more complex than the garden of stationary signals. The search for an ideal Fourier-like basis that would simplify most signal processing is therefore a hopeless quest. Instead, a multitude of different transforms and bases have proliferated.
Stéphane Mallat (1998)
The stream of samples that forms a digital audio signal is merely one representation of a microsound. To convert this signal from the time-domain to the frequency-domain requires a stage of analysis. The analysis seeks evidence of periodicities of a specific waveform. In the case of Fourier analysis, for example, the waveform basis is sinusoidal. Once analysis transforms the signal into a frequency-domain representation, a large family of sonic transformations becomes possible. In the frequency-domain, the signal exists as a combination of periodic functions.
What interests us here are transformations that issue from the analysis of
brief windows of sound. This chapter examines the short-time Fourier transform, the phase vocoder, the vector oscillator transform, wavelet transforms,
and the Gabor transform. Some of the explanatory material appeared in my
1996 book The Computer Music Tutorial. It appears here revised and updated.
The reports on experiments in sound transformation are new.
Analysis in music traditionally has referred to the study of form, phrasing, and note relationships within a score. Digital audio technology lets us take analysis to the level of sonic microstructure: inside the note. The first step in analysis is
associated with a thick noisy texture, for example, is much more voluminous
than the analysis data for a simple sinusoidal melody. Another factor in the
data explosion is the internal representation used by the analysis program,
including the word length of the numerical data.
For many reasons, there is great interest in reducing the storage requirements
of sound data. Many companies compete in the arena of digital audio encoding
schemes, which fall, broadly, into two areas: lossless packing and lossy data
reduction.
Lossless packing does not involve spectrum analysis. It makes use of redundancies in the numerical sample values to reformat them in a more memory-efficient form. Thus it reduces storage while preserving the full integrity of the audio data. See Craven and Gerzon (1996), or Meridian (1998) for details.
Lossy data reduction does involve windowed spectrum analysis. It dissects
sounds into a data-reduced form according to a resynthesis model, while discarding large amounts of ``nonessential'' data. In effect, it reduces sounds to a
set of control functions. It presumes the existence of a resynthesis system that
can properly interpret these control functions to reconstitute an approximation
of the sound.
Lossy data reduction schemes are built into consumer audio products such as
the Mini-Disc system, the DVD surround sound format, MP3 (MPEG 1, Layer
3) audio, and other popular Internet audio file formats. MP3, for example, offers a variable bit rate (VBR) method (Kientzle 1998). According to the theory of VBR, ``simple'' sound demands a low bit rate, while ``complex'' sound demands a higher bit rate. VBR encoding uses windowed spectrum analysis and other techniques to estimate the ``complexity'' of the signal. In essence, an MP3 audio bitstream specifies the frequency content of a sound and how that content varies over time. It splits the input signal into thirty-two subbands, each of which contains eighteen frequency bands, for a total of 576 frequency bands. (See Brandenburg and Bosi 1997 for details.) An MP3 player resynthesizes the audio signal from its data-reduced form. Many points of compromise are exploited by MP3 encoders. For example, in the interest of speed, MP3 decoders may use integer arithmetic, which sacrifices audio accuracy. The encoding of stereo information is often crude. MP3's ``joint-stereo'' mode plays the same track through both channels but with the intensity differences of the original tracks.
Data reduction discards information. The losses may be insignificant when the original audio program is already bandlimited, compressed in amplitude, spatially flat, distorted, and designed to be played back over a mediocre audio
system, as is the case with much popular music. But such losses are evident in musical material that exploits the full range of a fine audio system. It is not difficult to generate signals that reveal the weaknesses of the commercial coding models. The degradation is more pronounced when these signals are ``copied'' (or to be more precise, recoded) or further manipulated.
For creative purposes, we prefer data reductions that leave the analysis data
in editable form. The literature of computer music includes a large body of
research work on data reduction, including pioneering studies by Risset (1966),
Freedman (1967), Beauchamp (1969, 1975), and Grey (1975). Techniques that
have been used in computer music include line-segment approximation, principal components analysis, spectral interpolation synthesis, spectral modeling
synthesis, and genetic algorithms.
In the twentieth century, mathematicians refined Fourier's method. Engineers designed analog filter banks to perform simple types of spectrum analysis. Following the development of stored-program computers in the 1940s, programmers created the first digital implementations of the Fourier transform (FT), but these consumed enormous amounts of computer time, a scarce commodity in that era. Finally, in the mid-1960s, a set of algorithms known as the fast Fourier transform or FFT, described by James Cooley at IBM and John Tukey at Princeton University and Bell Telephone Laboratories, greatly reduced the voluminous calculations required for Fourier analysis (Cooley and Tukey 1965).
Fourier Series
Fourier showed that a periodic function x(t) of period T can be represented by the infinite summation series:

x(t) = C_0 + \sum_{n=1}^{\infty} C_n \cos(n\omega_0 t + \phi_n)

That is, the function x(t) is a sum of harmonically related sinusoidal functions with the frequencies \omega_n = n\omega_0, where \omega_0 = 2\pi/T. C_0 is the offset or DC component; it shifts the waveform up or down. The first sinusoidal component C_1 is the fundamental; it has the same period as T. The numerical variables C_n and \phi_n give the magnitude and phase of each component.
A Fourier series summation is a formula for reconstructing or synthesizing a periodic signal. But it does not tell us how to set the coefficients C_n and \phi_n for an arbitrary input sound. For this, we need the analysis method called the Fourier transform.
Fourier Transform
This section takes advantage of the complex exponential representation of a sine wave at a given phase. This representation is based on the identity:

e^{j 2\pi f t} = \cos(2\pi f t) + j \sin(2\pi f t)

So, a cosine at a given frequency and phase can also be represented as a complex number, or a complex exponential function. (See Roads 1996, appendix A.)
The Fourier transform of a time-domain signal x(t) can be written:

X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt

This says that the FT at any particular frequency f is the integral of the multiplication of the input signal x(t) by the pure sinusoid e^{-j 2\pi f t}. Intuitively, we could surmise that this integral will be larger when the input signal is high in amplitude and rich in partials. X(f) represents the magnitude of the Fourier transform of the time-domain signal x(t). By magnitude we mean the absolute value of the amplitude of the frequencies in the spectrum. The capital letter X denotes a Fourier transform, and the f within parentheses indicates that we are now referring to a frequency-domain signal, as opposed to the time-domain signal x(t). Each value of X(f) is a complex number.
The magnitude is not a complete picture of the Fourier transform. It tells us just the amount of each complex frequency that must be combined to synthesize x(t). It does not indicate the phase of each of these components. One can also plot the phase spectrum, as it is called, but this is less often shown.
The magnitude of the Fourier transform X(f) is symmetric around 0 Hz. Thus the Fourier representation combines equal amounts of positive and negative frequencies. This is the case for any real-valued input signal. This dual-sided spectrum has no physical significance. (Note that the inverse Fourier transform takes a complex input signal, a spectrum, and generates a real-valued waveform as its output.)
The Discrete Fourier Transform
The one kind of signal that has a discrete frequency-domain representation (i.e., isolated spectral lines) is a periodic signal. A periodic signal repeats at every interval T. Such a signal has a Fourier transform containing components at a fundamental frequency 1/T and its harmonics, and is zero everywhere else.
The spectrum that results is the convolution of the spectra of the input and the
window signals. We see the implications of this later.
Operation of the STFT
Adopting Dolson's (1986) notation, the equation for a DFT of an input signal x(m) multiplied by a time-shifted window h(n - m) is as follows:

X(n, k) = \sum_{m=-\infty}^{\infty} \{ x(m)\, h(n - m) \}\, e^{-j(2\pi/N)km}
Thus the output X(n, k) is the Fourier transform of the windowed input at each discrete time n for each discrete frequency band or bin k. The equation says that m can go from minus to plus infinity; this is a way of saying ``for an arbitrary-length input signal.'' For a specific short-time window, the bounds of m are set to the appropriate length. Here, k is the index for the frequency bins, and N is the number of points in the spectrum. The following relation sets the frequency corresponding to each bin k:

f_k = (k/N) f_s

where f_s is the sampling rate. So for a sampling rate of 44.1 kHz, an analysis window length N of 1024 samples, and a frequency bin k = 1, f_k is 43 Hz.
The windowed DFT representation is particularly attractive because the fast Fourier transform or FFT can calculate it efficiently.
A discrete STFT formulation indicating the hop size or time advance H of each window is:

X(l, k) = \sum_{m=0}^{M-1} h(m)\, x(m + lH)\, e^{-j(2\pi/N)km}
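In code, the hop-size formulation amounts to sliding a window along the input and taking one FFT per hop. A minimal numpy sketch (using the real-input FFT, so only the usable bins up to the Nyquist frequency are returned):

```python
import numpy as np

def stft(x, window, hop):
    # one DFT per hop-size advance; bin k of each frame corresponds
    # to the frequency f_k = (k / N) * fs
    N = len(window)
    frames = []
    for start in range(0, len(x) - N + 1, hop):
        frames.append(np.fft.rfft(x[start:start + N] * window))
    return np.array(frames)   # shape: (number of frames, N // 2 + 1)

# spectrum = stft(x, np.hanning(1024), hop=128)   # eight-times overlap
```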
Figure 6.1 Magnitude and phase spectra. (a) Input waveform. (b) Windowed segment. (c) Magnitude spectrum plotted over the range 0 to −80 dB. (d) Phase spectrum plotted over the range −π to π. (After Serra 1989.)
spectrum, and the starting phase (between −π and π) in the case of the phase spectrum. The magnitude spectrum is relatively easy to read, the phase spectrum less so. When normalized to the range of −π to π it is called the wrapped phase representation. For many signals, it appears to the eye like a random function. An unwrapped phase projection may be more meaningful visually. (See Roads 1996, appendix A.)
To summarize, applying the STFT to a stream of input samples results in a
series of frames that make up a time-varying spectrum.
Justifications for Windowing
Theory says that we can analyze a segment of any length and exactly resynthesize the segment from the analysis data. For example, we can analyze in one
constant that can shift the entire signal above or below the center point of zero amplitude.
Audio signals are bandlimited to half the sampling rate (25 kHz in this case) and so we are concerned with only half of the analysis bins. The effective frequency resolution of an STFT is thus N/2 bins spread equally across the audio bandwidth, starting at 0 Hz and ending at the Nyquist frequency. In our example, the number of usable audio frequency bins is five hundred, spaced 50 Hz apart.
Time-Frequency Uncertainty
The knowledge of the position of the particle is complementary to the knowledge of its
velocity or momentum. If we know the one with high accuracy we cannot know the other
with high accuracy. (Heisenberg 1958)
Figure 6.2 shows the effects of time-frequency (TF) uncertainty at the juncture of an abrupt transition between two pure tones. Figure 6.2a portrays the actual spectrum of the signal fed into the analyzer. Figure 6.2b is the measured short-time Fourier transform of this signal. Notice the band-thickening and blurring, which are classic symptoms of TF uncertainty.
Time-Frequency Tradeoffs
The FFT divides the audible frequency space into N/2 frequency bins, where N is the length in samples of the analysis window. Hence there is a tradeoff between the number of frequency bins and the length of the analysis window. For example, if N is five hundred and twelve samples, then the number of frequencies that can be analyzed is limited to two hundred and fifty-six. Assuming a sampling rate of 44.1 kHz, we obtain two hundred and fifty-six bins equally spaced over the bandwidth 0 Hz to the Nyquist frequency 22.05 kHz. Increasing the sampling rate only widens the measurable bandwidth; it does not increase the frequency resolution of the analysis.
If we want high time accuracy (say 1 ms or about forty-four samples), we must be satisfied with only 44/2 or twenty-two frequency bins. Dividing the audio bandwidth from 0 to 22.05 kHz by twenty-two frequency bins, we obtain 22,050/22 or about 1000 Hz of frequency resolution. That is, if we want to know exactly when events occur on the scale of 1 ms, then our frequency resolution is limited to the gross scale of 1000-Hz-wide frequency bands. By sacrificing more time resolution, and widening the analysis interval to 30 ms, one can spot frequencies within a 33 Hz bandwidth. For high resolution in frequency (1 Hz), one must stretch the time interval to 1 second (44,100 samples)!
Because of this limitation in windowed STFT analysis, researchers are examining hybrids of time-domain and frequency-domain analysis, multiresolution analysis, or non-Fourier methods to try to resolve both dimensions at high resolution.
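The arithmetic of this tradeoff is easy to verify; this fragment (assuming a 44.1 kHz sampling rate) reproduces the three cases described above:

```python
fs = 44100
for n in (44, 1323, 44100):   # ~1 ms, ~30 ms, and 1 s analysis windows
    print(f"{n:6d} samples: {1000 * n / fs:7.1f} ms window, "
          f"{fs / n:7.1f} Hz per bin")
```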
Frequencies in between Analysis Bins
The STFT knows only about a discrete set of frequencies spaced at equal intervals across the audio bandwidth. The spacing of these frequencies depends on the window size. This size corresponds to the ``fundamental period'' of the analysis. Such a model works well for sounds that are harmonic or quasi-harmonic where the harmonics align closely with the bins of the analysis. What
happens to frequencies that fall in between the equally spaced analysis bins of
the STFT? This is the case for inharmonic sounds such as gongs or noisy
sounds such as snare drums.
Let us call the frequency to be analyzed f. When f coincides with the center of an analysis channel, all its energy is concentrated in that channel, and so it is accurately measured. When f is close to but not precisely coincident with the center, energy leaks into all other analysis channels, but with a concentration remaining close to f. The leakage spilling into all frequency bins from components in between bins is a well-known source of unreliability in the spectrum estimates produced by the STFT. When more than one component is in between bins, beating effects (periodic cancellation and reinforcement) may occur in both the frequency and amplitude traces. The result is that the analysis shows fluctuating energy in frequency components that are not physically present in the input signal.
Significance of Clutter
If the signal is resynthesized directly from the analysis data, the extra frequency components and beating effects pose no problem. These effects are benign artifacts of the STFT analysis that are resolved in resynthesis. Beating effects are merely the way that the STFT represents a time-varying spectrum in the frequency-domain. In the resynthesis, some components add constructively and some add destructively (canceling each other out), so that the resynthesized result is a close approximation of the original.
Beating and other anomalies are harmless when the signal is directly resynthesized, but they obscure attempts to inspect the spectrum visually, or to transform it. For this reason, the artifacts of analysis are called clutter. Dolson (1983) and Strawn (1985) assayed the significance of clutter in analysis of musical instrument tones. Cross-term clutter is common in higher-order analysis, which can extract detailed phase and modulation laws embedded in the spectrum analysis (Masri et al. 1997a, 1997b).
the partial bins have to be found by converting the relative phase change between two STFT outputs to actual frequency changes. The term ``phase'' in phase vocoder refers to the fact that the temporal development of a sound is contained in its phase information, while the amplitudes denote that a specific frequency component is present in a sound. The phase contains the structural information (Sprenger 1999). The phase relationships between the different bins reconstruct time-limited events when the time-domain representation is resynthesized. The phase difference of each bin between two successive analysis frames determines that bin's frequency deviation from its mid frequency. This provides information about the bin's true frequency, and makes possible a resynthesis on a different time basis.
Phase Vocoder Parameters
The quality of a given PV analysis depends on the parameter settings chosen by the user. These settings must be adjusted according to the nature of the sounds being analyzed and the type of results that are expected. The main parameters of the PV are:
1. Window size (also called frame size): the number of input samples to be analyzed at a time.
2. FFT size: the actual number of samples fed to the FFT algorithm; usually the nearest power of two that is double the window size, where the unit of FFT size is referred to by points, as in a ``1024-point FFT.''
3. Window type: selection of a window shape from among standard types.
4. Hop size or overlap factor: the time advance from one window onset to the next.
Next we discuss each parameter in turn. Later we give rules of thumb for setting these parameters.
Window Size
The window size (in samples) determines one aspect of the tradeoff in TF resolution. The larger the window is, the greater the number of frequency bins, but the lower the time resolution, and vice versa. If we are trying to analyze sounds in the lower octaves with great frequency accuracy, we cannot avoid a large window size. Since the FFT computes the average spectrum content within a given window, the precise onset time of any spectrum changes within the span of the window is lost when the spectrum is plotted or transformed. (If the signal is simply resynthesized, the temporal information is restored.) For high-frequency sounds, small windows are adequate, and they are also more accurate in time resolution.
FFT Size and Hop Size
The FFT size is typically the nearest power of two that is double the window size. For example, a window size of 512 samples would mandate an FFT size of 1024. The other 512 samples in the FFT are set to zero, a process called zero-padding.
The hop size is the number of samples that the analyzer jumps along the input waveform each time it takes a new spectrum measurement. The shorter the hop size, the more successive windows overlap. This improves the resolution of the analysis, but requires more computation. Some PVs specify hop size as an overlap factor that describes how many analysis windows cover each other. An overlap of four, for example, means that one window follows another after 25% of the window length. Regardless of how it is specified, the hop size is usually a fraction of the window size. A certain amount of overlap (e.g., eight times) is necessary to ensure an accurate resynthesis. More overlap may improve accuracy when the analysis data is going to be transformed, but the computational cost is proportionally greater.
Window Type
A spectrum analyzer measures not just the input signal but the product of the input signal and the window envelope. The law of convolution, introduced in chapter 5, states that multiplication in the time-domain is equivalent to convolution in the frequency-domain. Thus the analyzed spectrum is the convolution of the spectra of the input and the window signals. In effect, the window modulates the input signal, and this introduces sidebands or clutter into the analyzed spectrum.
A smooth bell-shaped window minimizes the clutter. Most PVs let the user select a window from a family of standard window types, including Hamming, Hanning (or Hann; see Marple 1987), truncated Gaussian, Blackman-Harris, and Kaiser (Harris 1978; Nuttall 1981). All are bell-shaped, and all work reasonably well for general musical analysis-resynthesis. Each one is slightly
different, however, so it may be worth trying different windows when the results are critical. The one window to avoid is the rectangular or Dirichlet, which introduces a great deal of clutter or extraneous frequency components into the analyzed spectrum.
Typical PV Parameter Settings
No parameter settings of the PV are ideal for all sounds. Within a certain range, however, a variety of traditional instrumental sounds can be analyzed and resynthesized with reasonable fidelity. Here are some rules of thumb for PV parameter settings that may serve as a starting point for more tuned analyses (a code sketch follows the list):
1. Window size: large enough to capture four periods of the lowest frequency of interest. This is particularly important if the sound is time-stretched; too small a window size means that individual pitch bursts are moved apart, changing the pitch, although formants are preserved.
2. FFT size: double the window size, in samples.
3. Window type: any standard type except Dirichlet.
4. Hop size: the time advance of the analysis window. If the analysis data is going to be time-distorted, the recommended hop size is an eighth of the frame size, in samples (i.e., eight times overlap). The minimum technical criterion is that all windows add to a constant, that is, all data is equally weighted. This typically implies an overlap at the −3 dB point of the particular window type chosen, from which the hop size can be derived.
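These heuristics reduce to a few lines of arithmetic. The following sketch (Python; the function and its defaults are my packaging of the rules above, not part of any particular PV) derives a starting set of parameters from the lowest frequency of interest:

```python
def pv_settings(lowest_freq, fs, overlap=8):
    # window large enough to capture four periods of the lowest frequency
    window = int(4 * fs / lowest_freq)
    # FFT size: nearest power of two at least double the window size
    fft_size = 1 << (2 * window - 1).bit_length()
    # hop size: an eighth of the frame size (eight-times overlap)
    hop = window // overlap
    return window, fft_size, hop

# pv_settings(55.0, 44100) -> (3207, 8192, 400)
```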
Any given setting of the window size results in an analysis biased toward harmonics of the period defined by that window size. Frequency components that fall outside the frequency bins associated with a given window size will be estimated incorrectly. Some analyzers try to estimate the pitch of the signal in order to determine the optimal window size. This is called pitch-synchronous analysis (Mathews, Miller, and David 1961). Pitch-synchronous analysis works well if the sound to be analyzed has a basically harmonic structure.
Resynthesis Techniques
Resynthesis constructs a time-domain signal from the analysis data. If the
analysis data has not been altered, then the resynthesis should be a close simulacrum of the original signal. If the analysis data has been altered, the resyn-
The phase vocoder has emerged from the laboratory to become a popular
tool. It is packaged in a variety of widely distributed musical software. The
compositional interest of the PV lies in transforming the analysis data before
resynthesis, producing variations of the original sound. What the composer
seeks in the output is not a clone of the input, but a musical transformation that
maintains a sense of the source's identity.
The weaknesses of the STFT and the PV as representations for sound are well known. The uncertainty principle pointed out by Gabor is embedded deeply within the STFT. Time-frequency information is smeared. Overlapping windows mean that it is impossible to modify a single time-frequency atom without affecting adjacent atoms. Such a change will most likely lead to a discontinuity in the resynthesized signal. Many transformations sound ``blurry'' or ``sinusoidal'' in quality, a common artifact of Fourier techniques in general. The tracking phase vocoder, described later, is a more secure transformation tool, but it has its own imperfections.
On the positive side, the PV is powerful. Good implementations of the PV offer the possibility of modifying pitch, time, and timbre independently.
or closer together (when shrinking) in the resynthesis. For the smoothest transpositions, the PV should multiply the phase values by the same constant used in the time base changing (Arfib 1991).
Pitch-shifting alters the pitch without changing the time base. Pitch-transposition is a matter of scaling the frequencies of the resynthesis components. For
speech signals in particular, however, a constant scale factor changes not only
the pitch but also the formant frequencies. For upward shifts of an octave or
more, this reduces the speech's intelligibility. Thus Dolson (1986) suggested a
correction to the frequency scaling that reimposes the original spectral envelope
on the transposed frequency spectrum. If the original spectrum had a formant
at 2 kHz, for example, then so will the transposed version.
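One plausible reading of this correction, sketched in numpy (the envelope representation and names here are my assumptions, not Dolson's code): scale the partial frequencies, then rescale each magnitude so that the transposed spectrum follows the original spectral envelope rather than a shifted copy of it.

```python
import numpy as np

def transpose_with_formants(freqs, mags, factor, env_f, env_a):
    # env_f, env_a: the original spectral envelope, sampled at
    # increasing frequencies env_f with amplitudes env_a
    new_freqs = freqs * factor
    old_env = np.interp(freqs, env_f, env_a)       # envelope at old positions
    new_env = np.interp(new_freqs, env_f, env_a)   # envelope at new positions
    new_mags = mags * new_env / np.maximum(old_env, 1e-12)
    return new_freqs, new_mags
```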
Frequency-Domain Filtering
Spectrum filters operate in the frequency domain by rescaling the amplitudes of selected frequency bins. Some of their controls are similar to traditional time-domain filters, such as center frequency, bandwidth, and gain or boost in dB. Other controls apply to the windowed analysis, such as the window size, FFT size, and overlap factor. These controls affect the efficiency and quality of the analysis-resynthesis process. For example, longer windows and FFTs generally result in more pronounced filtering effects.
Differences between time-domain filters and spectral filters show up when the bandwidths are narrow and the filter Q is high. The spectral filter breaks down a broadband signal into individual sinusoidal components. A tell-tale ``breebles'' artefact may be heard, as individual components pop in and out. Breebles are characteristic of manipulations on windowed Fourier analyses in general, and appear in a number of other PV transformations.
Another approach to frequency-domain filtering provides a graphic interface,
in which the user sees a sonogram display of the sound. The software provides a
palette of drawing tools that let users erase or highlight selected regions of the
sonogram image. The sound is then resynthesized on the basis of the altered
sonogram image. (See the later section on sonographic transformations.)
Stable and Transient Extraction
This is a class of transformations that sorts audio waveforms on a micro time
scale into two categories: stable and transient. Spoken vowels, for example, are
relatively stable frequencies compared to the transient frequencies in conso-
nants. Once these frequencies are separated, the signals can be further manipulated individually.
In Erbe's (1995) implementation of stable and transient extraction, the user specifies:
1. Number of bands in the analyzer
2. Number of frames to analyze at a time
3. Frequency threshold for the transient part of the signal
4. Frequency threshold of the stable part of the signal
For example, the transient part could be specified as changing more than 30 Hz per frame, while the stable part is specified as changing less than 5 Hz per frame. Erbe's extraction algorithm takes the average of the change in instantaneous frequency over several FFT frames. If this average is greater than the stipulated value for transient information, the amplitude and phase from the source is assigned to the transient spectrum. Similarly, if the average change is less than the stipulated value for stable information, the amplitude and phase from the source is assigned to the stable spectrum. Note that if the transient and stable thresholds are not identical, this leaves behind a part of the spectrum that is between the two, neither stable nor transient.
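A loose sketch of this classification logic, assuming numpy and a phase vocoder analysis laid out as (frames x bins) arrays of instantaneous frequency and magnitude; the array layout and parameter names are mine, not Erbe's:

```python
import numpy as np

def split_stable_transient(freqs, mags, stable_hz=5.0, transient_hz=30.0, span=4):
    stable = np.zeros_like(mags)
    transient = np.zeros_like(mags)
    for i in range(span, freqs.shape[0]):
        # average per-frame change in instantaneous frequency over `span` frames
        change = np.abs(freqs[i] - freqs[i - span]) / span
        stable[i] = np.where(change < stable_hz, mags[i], 0.0)
        transient[i] = np.where(change > transient_hz, mags[i], 0.0)
        # bins whose change falls between the thresholds join neither spectrum
    return stable, transient
```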
Another approach to stable and transient extraction is via spectral tracing (Wishart 1994; Norris 1997). Spectral tracing analyzes a sound and retains only the loudest or softest N% of the partials in the spectrum. To extract the transient part of a spoken voice, one retains only the softest 1% of the analyzed spectra. The sound quality of this 1%, after the result is high-pass filtered, is like noisy whispering.
Dynamic Range Manipulations
Spectrum analysis makes it possible to manipulate the dynamic range of
selected frequency bands. The reader is referred to the discussion of dynamics
processing on a micro time scale in chapter 5.
Cross-Synthesis: Vocoding, Spectral Mutation, and Analysis-Based Formant
Filtering
Cross-synthesis extracts characteristics from the spectrum of one signal and uses
them to modify the spectrum of another signal. This can take a variety of forms,
including, among others, vocoding, spectral mutation, and formant filtering.
Sonographic Transformations
A sonogram, sonograph, or spectrogram is a well-known spectrum display technique. It represents a sound signal as a two-dimensional display of time versus
``frequency amplitude.'' That is, the vertical dimension depicts frequency
(higher frequencies are higher up in the diagram) and shades of gray (or color)
indicate the amplitude within a given frequency band.
The first sonogram was Backhaus's (1932) system. In the 1950s, the Kay Sonograph was a standard device for printed sonograms, which combined a bank of narrow bandpass analog filters with a recording system that printed dark traces on a roll of paper. The bars grew thicker in proportion to the energy output from each filter.
Today, the STFT is at the core of sonogram analysis. The sonogram can be applied on various time scales (figure 6.3). A sonogram of the meso time level (figure 6.3a) portrays general features such as the onset of notes or phonemes, their pitch, formant peaks, and major transitions. See Cogan (1984) for an example of using sonograms in the analysis of musical mesostructure. A sonogram of a single note (figure 6.3b) reveals its pitch and spectrum. A sonogram of a single particle (figure 6.3c) is a coarse indicator of spectrum, since frequency resolution is poor on a micro time scale.
Table 6.1 Typical spectrum operations in the Composer's Desktop Project software
1. Blur analysis data
Blur avrg: Average spectral energy over N adjacent channels
Blur blur: Blur spectral data over time
Blur chorus: Add random variation to amplitude or frequency in analysis channels
Blur drunk: Modify sound by a drunken walk (a probabilistic process) along analysis windows
Blur noise: Add noise to spectrum
Blur scatter: Thin the spectrum in a random fashion
Blur shuffle: Shuffle analysis windows according to a specific scheme
Blur spread: Spread spectral peaks
Blur suppress: Suppress the most prominent channel data
Blur weave: Weave among the analysis windows in a specified pattern
2. Combine analysis data from two or more files
Combine cross: Replace spectral amplitudes of first file with those of second
Combine diff: Find (and retain) the difference between two spectra
Combine leaf: Interleave (groups of) windows of several spectra
Combine make: Generate an analysis file from data in a formant data file and a pitch data file
Combine max: Retain loudest channel components per window amongst several spectra
Combine mean: Generate the mean of two spectra
Combine sum: Add one spectrum to another
3. Focus on features of analysis data
Focus accu: Sustain each spectral band, until louder data appears in that band
Focus exag: Exaggerate the spectral contour
Focus focus: Focus spectral energy onto the peaks in the spectrum
Focus fold: Octave-transpose spectral components into a specified frequency range
Focus freeze: Freeze the spectral characteristics in a sound, at given times, for specified durations
Focus step: Step-frame through a sound by freezing the spectrum at regular time intervals
4. Formant operations
Formants get: Extract evolving formant envelope from an analysis file
Formants getsee: Get formant data from an analysis file and write as a file for viewing
Formants put: Impose formants in a formant data file on the spectrum in an analysis file
Formants see: Convert formant data in binary formant data file to a file for viewing
Formants vocode: Impose spectral envelope of one sound onto another sound
Table 6.1 (continued)
10. Repitch pitch-related data
Repitch approx: Make an approximate copy of a pitch file
Repitch combine: Generate transposition data from two sets of pitch data, or transpose pitch data with transposition data, or combine two sets of transposition data to form new transposition data, producing a binary pitch data file output
Repitch combineb: Generate transposition data from two sets of pitch data, or transpose pitch data with transposition data, or combine two sets of transposition data to form new transposition data, producing a time value breakpoint file output
Repitch cut: Cut out and keep a segment of a binary pitch data file
Repitch exag: Exaggerate pitch contour
Repitch getpitch: Extract pitch from spectrum to a pitch data file
Repitch impose: Transpose spectrum (spectral envelope also moves)
Repitch imposef: Transpose spectrum, but retain original spectral envelope
Repitch invert: Invert pitch contour of a pitch data file
Repitch quantize: Quantize pitches in a pitch data file
Repitch randomize: Randomize pitch line
Repitch smooth: Smooth pitch contour in a pitch data file
Repitch transpose: Transpose pitches in a pitch data file by a constant number of semitones
Repitch vibrato: Add vibrato to pitch in a pitch data file
11. Spectrum operations
Spec bare: Zero the data in channels that do not contain harmonics
Spec clean: Remove noise from phase vocoder analysis file
Spec cut: Cut a section out of an analysis file, between starttime and endtime (seconds)
Spec gain: Amplify or attenuate the spectrum
Spec grab: Grab a single analysis window at the time point specified
Spec limit: Eliminate channel data below a threshold amplitude
Spec magnify: Magnify (in duration) a single analysis window at time time to duration dur
12. Retrieve spectrum information
Specinfo channel: Returns the phase vocoder channel number corresponding to a specified frequency
Specinfo frequency: Returns the center frequency of a specified phase vocoder channel
Specinfo level: Convert the (varying) level of an analysis file to a file, for viewing
Specinfo octvu: Text display of time-varying amplitude of spectrum, within octave bands
Specinfo peak: Locate time-varying energy center of spectrum (text display)
Specinfo print: Print data in an analysis file as text to a file
Specinfo report: Text report on location of frequency peaks in the evolving spectrum
Specinfo windowcnt: Returns the number of analysis windows in the input file
Sonogram Parameters
The parameters of the modern sonogram are the same as those of the STFT, except for the display parameters. Adjustments to these parameters make a great difference in the output image:
1. Range of amplitudes and the type of scale used, whether linear or logarithmic.
2. Range of frequencies and the type of scale used, whether linear or logarithmic.
3. Window size (number of samples to analyze) and the size of the FFT; the resolution of time and frequency depend on these parameters.
4. Time advance of the analysis window (hop size) in samples, or window overlap factor. This determines the time distance between successive columns in the output display.
5. Number of frequency channels to display, which determines the number of rows in the graphical output and is related to the range and scale of the frequency domain; this cannot exceed the resolution imposed by the window size.
6. Window type: see the previous discussion in the section on the phase vocoder.
The window parameters (3) have the most dramatic effect on the display. A short window results in a vertically oriented display, indicating the precise onset time of events but blurring the frequency reading. A medium length window resolves both time and frequency features fairly well, indicating the presence of
Figure 6.3 Sonograms on three time scales. (a) Meso time scale. 13-second sonogram of
a J. S. Bach suite transcribed for recorder, played in a reverberant environment. The
frequency scale is linear to 4 kHz. (b) Sound object time scale. 300-ms tone played on an
Ondioline, a vacuum tube electronic music instrument (Jenny 1958). The frequency scale
is linear to 5 kHz. (c) Micro time scale. 50-ms view on a pulsar particle. The frequency
scale is linear to 12 kHz.
Table 6.2 Sonographic transformations in MetaSynth
Sonographical brushes (the sizes of the brushes are variable)
Pen: Hard-edged rectangular brush
Air brush: Round-edge brush with translucent edges
Filter brush: This brush acts as a multiplier, brightening or darkening depending on the selected color
Harmonics brush: Paints a fundamental and up to five harmonics
Attack brush: Paints a sharp left edge and then a soft decay
Smoothing brush: Smooths the pixels over which it passes
Spray brush: Sprays a cloud of grains
Decay brush: Extends existing pixels to the right to elongate their decay
Note brush: Leaves a trail of discrete notes, aligned to the brush grid interval
Line brush: Paints a harmonic line across the width of the canvas
Smear brush: Smears existing pixels
Smear brighter brush: Smears existing pixels with a brighter gradient than the smear brush
Clone brush: Captures pixels under the brush when the mouse button is pressed and then paints with the captured pixels
Sonographical transformations on the time-frequency grid
Cut, copy, paste: Cut, copy, or paste a selected part of the sonogram image
X-Y scaling: Time stretch or compress, frequency stretch or compress
Shift up or down: Transpose all frequencies
Rotate: Change the direction of all frequency trajectories
Contrast and luminance: Removes low-level frequencies, or amplifies noise
Octave transpose: Shift the image (and sound) down by an octave
Smooth: Blur the image so that the sound becomes more sinusoidal
Invert: White becomes black, etc.; a sound becomes a silence in a noise field
Max pict: Paste the PICT clipboard into the selected area, treating the clipboard's black pixels as transparent. Where there are coincident pixels, the brightest one is kept. (Note: PICT is a MacOS bitmap graphics format.)
Min pict: Paste the PICT clipboard into the selected area. Where there are coincident pixels, the one of lowest amplitude is kept. If either image has black, the result is black.
Add pict: Combine the luminosities of two images
Subtract pict: Subtract the contents of the PICT clipboard from the selected area
Multiply pict: Multiply the selected region by the contents of the PICT clipboard
Merge pict: Merge the PICT clipboard with the selected region using a 50% blend
Crossfade pict: Crossfade the PICT clipboard with the selected region left to right
Expand: Vertically expand the pixel spacing
Fade in out pict: Fade the PICT clipboard in then out while also fading the selected region out then in
Furthermore, the tools provided with sonographic interfaces are not always precise, making it difficult to achieve predictable results. To edit on a micro level, we may want to zoom in to the sonogram image. When we do so, however, the image pixellates into large blocks. This simply reflects the hard fact that a sonogram is intrinsically limited in time-frequency resolution.
done with an equally spaced bank of filters (the traditional STFT implementation). Another benefit is that the tracking process creates frequency and amplitude envelopes for these components, making them more robust under transformation than overlap-add frames. A disadvantage is that the quality of the analysis depends more heavily on proper parameter settings than in the regular STFT. It may take multiple attempts to tune the analysis parameters for a given sound.
Operation of the TPV
A TPV carries out the following steps:
1. Compute the STFT using the frame size, window type, FFT size, and hop size specified by the user.
2. Derive the squared magnitude spectrum in dB.
3. Find the bin numbers of the peaks in the spectrum (a simple version is sketched after this list).
4. Calculate the magnitude and phase of each frequency peak.
5. Assign each peak to a frequency track by matching the peaks of the previous frame with those of the current frame (see the description of peak tracking later).
6. Apply any desired modifications to the analysis parameters.
7. If additive resynthesis is requested, generate a sine wave for each frequency track and sum all sine wave components to create an output signal; the instantaneous amplitude, phase, and frequency of each sinusoidal component is calculated by interpolating values from frame to frame (or use the alternative resynthesis methods described earlier).
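Step 3, finding the peaks, is the simplest to illustrate. A minimal sketch in plain Python; the minimum-height control echoes the one described under peak tracking below:

```python
def find_peaks(mag_db, min_height):
    # bins that are local maxima above a minimum peak height
    return [k for k in range(1, len(mag_db) - 1)
            if mag_db[k] > min_height
            and mag_db[k] >= mag_db[k - 1] and mag_db[k] > mag_db[k + 1]]
```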
Peak Tracking
The tracking phase vocoder follows the most prominent frequency trajectories in the spectrum. Like other aspects of sound analysis, the precise method of peak tracking should vary depending on the sound. The tracking algorithm works best when it is tuned to the type of sound being analyzed: speech, harmonic spectrum, smooth inharmonic spectrum, noisy, etc. This section briefly explains more about the tracking process as a guide to setting the analysis parameters.
The first stage in peak tracking is peak identification. A simple control that sets the minimum peak height focuses the identification process on the most
significant landmarks in the spectrum. The rest of the algorithm tries to apply a set of frequency guides which advance in time. The guides are only hypotheses; later the algorithm will decide which guides are confirmed frequency tracks. The algorithm continues the guides by finding the peak closest in frequency to its current value. The alternatives are as follows:
- If it finds a match, the guide continues.
- If a guide cannot be continued during a frame it is considered to be ``sleeping.''
- If the guide does not wake up after a certain number of frames (which may be specified by the user), then the tracker deletes it. It may be possible to switch on guide hysteresis, which continues tracking a guide that falls slightly below a specified amplitude range. Guide hysteresis alleviates the audible problem of ``switching'' guides that repeatedly fade slightly, are cut to zero by the peak tracker, and fade in again (Walker and Fitz 1992). With hysteresis the guide is synthesized at its actual value instead of at zero amplitude.
- If there is a near match between guides, the closest wins and the ``loser'' looks for another peak within the maximum peak deviation, a frequency band specified by the user.
- If there are peaks not accounted for by current guides, then a new guide begins.
Windowing may compromise the accuracy of the tracking, particularly in
rapidly moving waveforms such as attack transients. Processing sounds with a
sharp attack in time-reversed order helps the tracking algorithm (Serra 1989).
This gives the partial trackers a chance to lock onto their stable frequency trajectories before meeting the chaos of the attack, which results in less distortion.
The data can be reversed back to its original order before resynthesis.
Accuracy of Resynthesis
In contrast to the myth of ``perfect reconstruction'' which pervades the mathematical theory of signal processing, the actual quality of all analysis-resynthesis methods is limited by the resolution of the input signal and the numerical precision of the analysis procedures. Distortions are introduced by numerical roundoff, windowing, peak-tracking, undersampling of envelope functions, and other aspects of the analysis. Compact disc quality audio (16-bit samples, 44.1 kHz sampling rate) poses a priori limits on the frequency response and dynamic range of the analysis. The fast Fourier transform on a 16-bit signal has a limited dynamic range, and the TPV reduces this further by discarding spectral information below a certain threshold. Any modification that changes the frequency characteristics will likely result in aliasing. Any reduction in amplitude caused by enveloping reduces the bit resolution. The resynthesized result may have lost a large portion of its dynamic range, and artefacts such as amplitude gating, distortion, aliasing, and graininess are not uncommon.
In a well-implemented nontracking phase vocoder, when the analysis parameters are properly adjusted by a skilled engineer and no modifications are made to the analysis data, the error is perceptually negligible. The TPV, on the other hand, discards information that does not contribute to a track. If the TPV parameters are not adjusted correctly, the sifting of low-level energy may discard significant portions of the sound, particularly noisy, transient energy. This can be demonstrated by subtracting an analysis of the resynthesized signal from an analysis of the original signal to yield a residual (Serra 1989). One can consider this residual to be analysis/resynthesis error. It is common to refer to the resynthesized, quasi-harmonic portion as the ``clean'' part of the signal and the error or noise component as the ``dirty'' part of the signal. For many sounds (i.e., those with fast transients such as cymbal crashes), the errors are quite audible. That is, the clean signal sounds unnaturally sanitized or sinusoidal, and the dirty signal, when heard separately, contains the missing grit.
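Serra's residual is defined by subtraction of analyses; the simplest time-domain analogue, assuming a phase-aligned resynthesis, is a sketch like this (numpy):

```python
import numpy as np

def tpv_residual(original, resynthesized):
    # the "dirty" part: whatever the sinusoidal tracks failed to capture;
    # assumes the resynthesis is phase-aligned with the original
    n = min(len(original), len(resynthesized))
    return original[:n] - resynthesized[:n]
```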
TPV Computation and Storage
TPV analysis consumes large quantities of computer power even though the inner core is implemented using the efficient FFT algorithm. It can also generate a large amount of analysis data, often ten times greater than the size of the sample data being analyzed. The size of the analysis depends on many factors, however. Low-level narrowband sounds require fewer tracks. Different settings of the analysis parameters can greatly affect the analysis file size.
Sound Transformation with the TPV
The TPV's representation of sound (a bank of hundreds of oscillators driven by amplitude and frequency envelopes) is a robust one that lends itself to many evocative transformations.
Figure 6.4 This figure shows the profusion of tracks associated with even a single sound, in this case the French word payons (we pay).
Morphing clearly differs from simply mixing two sources. In practice, some morphs are more convincing than others. Two tones with a common pitch tend to morph more naturally than two tones with different pitches, for example.
TPV Pitch-Time Changing
By altering the position or extent of the tracks, one can shift the pitch or alter the duration of the sound (Portnoff 1978). For example, to stretch the duration, the TPV interpolates new points between existing points in the amplitude and frequency arrays. To shrink the duration by a factor of n, the TPV uses every nth value in reading the amplitude and frequency arrays. In effect, this shifts the sampling rate. To shift the pitch of a sound but not change its duration, one multiplies the frequency values assigned to each of the frequency functions by the desired factor. For example, to shift a sound up a major second, the TPV multiplies each frequency component by 2^{2/12}, or about 1.1225 (an increase of roughly 12.25%).
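The stretching and shrinking of tracks amounts to resampling each envelope array. A minimal numpy sketch (the function name is mine):

```python
import numpy as np

def stretch_envelope(env, factor):
    # factor > 1 interpolates new points between existing frame values;
    # factor < 1 reads every nth value, shrinking the duration
    n = max(1, int(len(env) * factor))
    return np.interp(np.linspace(0, len(env) - 1, n), np.arange(len(env)), env)
```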
Disintegration and Coalescence in the Gabor Matrix
The QuickMQ program (Berkely 1994) reads TPV analysis files generated by the Lemur Pro program. It lets users alter this data through a number of algorithms. The Granny algorithm is especially interesting. Granny pokes holes in the Gabor matrix of a sound, producing a kind of granular disintegration or decomposition. The parameters of granular decomposition are as follows:
1. Maximum percentage of zeroing [0, 100]
2. Entropy rate, the rate of decomposition [1, 10]
3. Distribution [random, loud tracks first, soft tracks first]
4. Rest in peace (RIP) switch [on, off]
The maximum percentage of zeroing determines how many zero magnitudes can be written to a track once it is chosen for decomposition. The percentage refers to the total number of frames in the analysis file. If the file has three hundred frames, for example, and the user enters a maximum percentage of zeroing of 50%, then any track selected for decomposition may be zeroed with between zero and one hundred and fifty frames of zero magnitude. The entropy rate determines how quickly maximum decomposition settings occur in the file. Lower values result in more immediate decomposition. The distribution setting lets the user specify whether louder or softer tracks should be decomposed first,
Figure 6.5 Vector oscillator transform. (a) First 600 ms of the waveform of the spoken Italian phrase Nient'altro. We see only ``Nient'alt.'' (b) VOT of this phrase with wavetables extracted at 20 ms intervals. This turns it from samples into a series of thirty wavetables read by an oscillator at a stipulated fundamental frequency. (c) The same set of wavetables stretched in time by a factor of 2.5. In effect, the oscillator takes longer to crossfade between waveforms. (d) A click on a button converts the waveform view into a frequency-domain view, with sixty-four harmonics per wavetable.
Figure 6.6 Comparison of Fourier and wavelet time-frequency grids. (a) Uniform
Gabor matrix/STFT grid. (b) Nonuniform wavelet grid.
(Strang 1989; Evangelista 1991; Kussmaul 1991; Vetterli and Herley 1992).
Today wavelet theory is one of the most extensively researched subjects in all of
signal processing.
Operation of Wavelet Analysis
A casual reference to ``measurable functions'' and L²(R) can be enough to make an aspiring pilgrim weary. (Kaiser 1994)
Wavelet theory is buried in mathematics. Part of the difficulty for the nonmathematician is that its foundations are expressed in terms of linear algebra, mappings between vector spaces, and the theory of generalized functions. A large portion of the literature is devoted to projecting the extent of wavelet theory and its abstract relationships to other branches of mathematics. These abstractions are far removed from applications in musical signal processing. Only the most basic tenets of wavelet theory can be explained using simple mathematical concepts.
The Grossmann-Morlet wavelet equation (Meyer et al. 1987; Kronland-Martinet and Grossmann 1991), centered on point b in time, can be defined as:

\Psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right), \quad a > 0,\ b \in \mathbb{R}

The variable a is the scale factor. The wavelet function \Psi_{a,b}(t) oscillates at the frequency 1/a. When the scale a is very small, the normalizing factor 1/\sqrt{a} grows large, while the time interval contracts around b, since the argument (t - b)/a magnifies small departures of t from b.
The wavelet transform is:

S(a, b) = \int \psi^{*}_{a,b}(t)\, s(t)\, dt = \frac{1}{\sqrt{a}} \int \psi^{*}\!\left(\frac{t - b}{a}\right) s(t)\, dt
where \psi^{*} represents the complex conjugate. In effect, the WT multiplies the input signal s(t) by a grid of analyzing wavelets, bounded by frequency on one axis and by time scale factor on the other. This multiplication process is equivalent to convolving with a bandpass filter's impulse response. Dilation of this impulse response corresponds to an inverse frequency scaling. Thus, the duration of each wavelet corresponds to the center frequency of a filter. The longer the wavelet, the lower is its center frequency. The output of the WT is a grid of wavelet coefficients.
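As a rough illustration, assuming numpy: a real Morlet-style wavelet is dilated by each scale factor a and convolved with the signal, one output row per scale. The specific wavelet shape here is my assumption; in practice analytic (complex) wavelets are used so that modulus and phase can be displayed.

```python
import numpy as np

def wavelet(n, scale, fs):
    # psi((t - b) / a) / sqrt(a): a Gaussian-windowed cosine dilated by
    # `scale`; longer wavelets probe lower center frequencies (~1/a)
    t = (np.arange(n) - n / 2) / fs
    u = t / scale
    return np.exp(-u ** 2 / 2) * np.cos(2 * np.pi * u) / np.sqrt(scale)

def cwt(signal, fs, scales, n=2048):
    # one row per scale: convolve the signal with each dilated wavelet
    return np.array([np.convolve(signal, wavelet(n, a, fs), mode="same")
                     for a in scales])
```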
Wavelet Display
A byproduct of research in wavelet analysis is the evocative display method developed by scientists affiliated with the Centre National de la Recherche Scientifique (CNRS) in Marseilles. This visualization tool can be thought of as a traditional spectrum plot projected in time and flipped on its side. In the modulus display, time projects horizontally, and frequency projects vertically with the low frequencies on the bottom and the high frequencies on the top. The difference between the sonogram plot and this wavelet plot is their pattern of time localization. Short wavelets detect brief transients, which are localized in time, sitting at the apex of a triangle. Kronland-Martinet et al. (1987) show the wavelet plot of a delta function; its wavelet display clearly projects a triangle on the frequency-versus-time plane, pointing to the locale of the impulse. Long wavelets detect low frequencies; they sit at the base of the triangle, spread out (blurred) over time.
The triangle of the delta function is the wavelet's domain of influence in time. The domain of influence for frequencies is a constant horizontal band, as in the spectrogram. The darker the band, the stronger the magnitude within that frequency range. Using a log scale for the dilation axis allows a greater range of scales to be observed, important in audio applications where frequencies can vary over several orders of magnitude.
A voice is a set of transform coefficients with fixed dilation parameters. Thus a voice in some ways corresponds to a frequency band in an equalizer. If the frequency grid is aligned to a musical interval, the modulus projects a strong dark indicator when the input signal contains that interval.
A plot of the phase spectrum is sometimes referred to as the scalagram.
The scalagram yields interesting details about the phase transitions within a
given signal, such as the onset of transients and the nature of the modulations
within the signal (Kronland-Martinet, et al. 1987; Kronland-Martinet, et al.
1997).
Transformation of Sounds Using Wavelets
Once a sound signal has been analyzed with the WT, one can alter the analysis
data in order to transform the original signal. This section describes the various
transformations.
period, and ts a comb lter to the segment with peaks aligned on the harmonics of the fundamental. The comb lter sifts out the energy in the harmonic
spectrum. The algorithm then performs a wavelet analysis on this ``clean'' harmonic signal. When the inverse WT is subtracted from the original signal, the
residual or ``dirty'' part of the signal remains. The dirty part includes the attack
transient and the details that give the sound its identity and character. Once the
clean and dirty part are separated, one can perform a kind of cross-synthesis by
grafting the dirty part of one sound into the clean part of another. This type of
separation is similar in conceptthough not in implementationto the technique used in the spectral modeling synthesis of Serra (1989).
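The comb-filtering step at the heart of this clean/dirty split can be sketched in a few lines of Python; this is only an illustration of the idea (a wraparound comb via np.roll, with hypothetical inputs), not the wavelet-based algorithm itself.

```python
import numpy as np

def clean_dirty_split(x, period, taps=8):
    """Split x into a periodic ('clean') part, obtained by a comb that
    averages successive pitch periods, and a residual ('dirty') part.
    period: pitch period in samples. np.roll wraps around the ends,
    a simplification acceptable for a sketch."""
    clean = np.zeros_like(x)
    for k in range(taps):
        clean += np.roll(x, k * period)
    clean /= taps
    return clean, x - clean

# Cross-synthesis: graft the dirty part of one sound onto the clean part
# of another (x1, x2: equal-length arrays with known pitch periods).
# clean1, _ = clean_dirty_split(x1, period1)
# _, dirty2 = clean_dirty_split(x2, period2)
# hybrid = clean1 + dirty2
```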
Evangelista and Cavaliere (1998) extended the comb wavelet technique to handle the case of inharmonic partials, warping the frequencies of the analysis to adapt the pitch-synchronous comb wavelet filters to unequally spaced partials. With this technique they were able to resolve separately the hammer noise
and the resonant components of a piano tone. Frequency warping also allows
for transformations such as detuning the microintervals within a complex
sound.
Other Wavelet Transformations
Other transformations include altering the geometry of the frequency grid, such
as multiplying by or adding a scaling factor to all the frequencies in resynthesis.
Cheng (1996, 1997) described an application of wavelets to make a spectral ``exciter'' effect in which high frequencies are boosted and an additional octave of high frequency information is extrapolated from existing lower frequencies. According to the author, this algorithm worked best on broadband transient percussion sounds, but was not well adapted to speech, where the added frequencies were perceived as artificial.
Experiences with Wavelets
Wavelet-based software for audio applications is rare at present. For the
MacOS platform there is WaveLab, a library for the Matlab environment
developed at the Department of Statistics at Stanford University and the
National Aeronautics and Space Administration. WaveLab was designed for
wavelet analysis, wavelet-packet analysis, cosine-packet analysis, and matching
pursuit. It also provides a library of data files, including artificial signals as well as images and a few brief sampled sounds. As is, however, the WaveLab package is not well suited for musical experimentation. A great deal of additional programming would be required for serious experimentation in musical sound transformation.
Soniqworx Artist by Prosoniq is an audio editor with signal processing effects for MacOS. The effects, written by Stephan Sprenger, include wavelet-based transformations. One of these, wavelet signal reduction, discards all but a stipulated percentage of the analysis wavelets, leaving only the strongest components. It is conceptually analogous to the spectral tracing effect described in the section on the phase vocoder. Instead of sinusoidal basis functions, however, it uses wavelet basis functions. Owing to the change in basis function from a sinusoid to the jagged Daubechies second wavelet, the sonic effect is quite different from spectral tracing (figure 6.7).
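The operation is easy to sketch with the PyWavelets library (assumed here; this is not Soniqworx's code). The 'db2' basis is the second Daubechies wavelet mentioned above, and keeping only the strongest 10% of coefficients mirrors the 90% discard of figure 6.7(b).

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_reduce(x, keep=0.10, wavelet='db2'):
    """Wavelet signal reduction: zero all but the strongest fraction
    of wavelet coefficients, then resynthesize."""
    coeffs = pywt.wavedec(x, wavelet)
    flat = np.concatenate(coeffs)
    threshold = np.quantile(np.abs(flat), 1.0 - keep)
    reduced = [c * (np.abs(c) >= threshold) for c in coeffs]
    return pywt.waverec(reduced, wavelet)
```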
Synthetic Wavelets
One could synthesize a stream of wavelets without an analysis of an existing
sound. Malvar wavelets (Meyer 1994), for example, bear a strong resemblance
to the FOF particles and to the grains with a quasi-Gaussian envelope. My
grainlet technique (described in chapter 4) produces grains whose duration is a
function of frequency, like the usual wavelet families. Wickerhauser (1994)
proposed that a single wavelet packet generator could replace a large number of
oscillators. Through experimentation, a musician could determine combinations of wavelet packets that produce especially interesting sounds. It could also
be possible to reproduce the sounds of traditional instruments by decomposing
an instrumental sound into wavelet packet coecients. Reproducing the note
would then require reloading those coecients into a wavelet packet generator
and playing back the result. Transient characteristics such as attack and decay
could be controlled separately (for example, with envelope generators), or by
using longer wavelet packets and encoding those properties into each note.
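A stream of synthetic constant-Q grains, with duration tied inversely to frequency as in a wavelet family, can be sketched as follows; the Gaussian envelope width and the eight-cycle grain length are illustrative choices, not values prescribed by any of the systems above.

```python
import numpy as np

def wavelet_grain(freq, sr, cycles=8):
    """A grain whose duration scales inversely with frequency (constant-Q):
    'cycles' oscillation periods under a Gaussian envelope."""
    dur = cycles / freq
    t = np.arange(int(dur * sr)) / sr
    env = np.exp(-0.5 * ((t - dur / 2) / (dur / 6)) ** 2)
    return env * np.sin(2 * np.pi * freq * t)

def grain_stream(freqs, onsets, sr=44100, length=2.0):
    """Mix grains at the given onset times (in seconds) into one buffer."""
    out = np.zeros(int(length * sr))
    for f, onset in zip(freqs, onsets):
        g = wavelet_grain(f, sr)
        i = int(onset * sr)
        out[i:i + len(g)] += g[:len(out) - i]  # grains past the end are truncated
    return out
```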
Assessment of the Wavelet Transform
Almost every author seems to have a favorite basis function, depending on their area of
interest and background. (Navarro et al. 1995)
Figure 6.7 Wavelet signal reduction compared with spectral tracing. (a) Original waveform of spoken voice saying ``pulse code modulation.'' (b) Resynthesized waveform after 90% of the weaker wavelets have been discarded. (c) The fluctuation marked in (b), blown up. It has the shape of the second Daubechies wavelet, which was the basis function of the analysis. (d) The same region of the sound file covered by (c) but resynthesized after spectral tracing, which discarded all but the strongest 1% of sinusoidal components.
At present, the vast majority of wavelet applications in audio are not concerned with artistic transformations of sound. The principal applications are utilitarian:

- Audio data reduction, for transmission in broadcast and network media. Wannamaker and Vrscay (1997) report compression ratios of 3:1 with ``generally satisfactory reconstruction.'' The results do not preserve high fidelity. Barnwell and Richardson (1995) criticize wavelet techniques for audio compression, particularly the dyadic wavelet grids based on octave-band filter banks. (See also Erne 1998.)

- Denoising: removal of broadband noise from an audio recording, with ``faint birdy noise'' remaining as an epiphenomenon of the denoising process (Ramarapu and Maher 1997).
The wavelet paradigm has generated great interest since its introduction in
the 1980s. In the larger context of the past several decades of signal processing,
however, wavelets and multiresolution filter banks have offered a limited range of musical applications. Fourier and Gabor techniques remain dominant, and powerful applications such as fast convolution have stayed within the Fourier camp. Masri et al. (1997a) suggest that the logarithmic frequency scale of the wavelet transform is an advantage in detecting pitch, but that the linear frequency scale of the Fourier and Gabor transform is an advantage in analyzing linearly spaced harmonics. Nevertheless, it seems clear that the full scope of
wavelet techniques has not yet been exploited in the musical domain.
Gabor Analysis
Expansion into elementary signals is a process in which Fourier analysis and time description are two special cases. (Gabor 1946)
Figure 6.8 Schema of the Gabor transform. The GT multiplies a segment of a source signal by a complex Gaboret, translated in frequency and time according to the resolution of the Gabor matrix. The energy occupies cells in the modulus of the GT, and the phase information appears in the phasogram. To resynthesize the source, each cell of the matrix is convolved with the reproducing kernel, the dual of the complex Gaboret.

The analyzing gaborets are modulated and translated copies of a complex-conjugated version of a window function. The complex spectrogram is determined by its values on the points of the Gabor lattice. In recent years, mathematicians have pushed the theory of the GT beyond the Gaussian gaborets to account for larger classes of particles: general subgroups on the TF plane (Torresani 1995).
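One cell of the Gabor matrix amounts to an inner product of the signal with a Gaussian-windowed complex exponential. The sketch below, in Python with NumPy, is a minimal illustration; the window width sigma and the test values are assumptions.

```python
import numpy as np

def gabor_coefficient(s, sr, t0, f0, sigma=0.005):
    """One cell of the Gabor matrix: inner product of the signal with a
    Gaussian-windowed complex exponential centered at time t0 (s) and
    frequency f0 (Hz). sigma (s) sets the window width."""
    t = np.arange(len(s)) / sr
    gaboret = np.exp(-0.5 * ((t - t0) / sigma) ** 2) * np.exp(-2j * np.pi * f0 * (t - t0))
    return np.sum(s * gaboret) / sr

# Modulus (energy) and phase (phasogram entry) at one lattice point:
sr = 44100
t = np.arange(sr // 10) / sr
s = np.sin(2 * np.pi * 1000 * t)
c = gabor_coefficient(s, sr, t0=0.05, f0=1000.0)
print(abs(c), np.angle(c))
```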
Properties of the Gaussian Window Used in the GT
Harris (1978) compared the properties of twenty-three different window shapes, including the Gaussian window used in the GT. According to Harris, two properties of a window are paramount in spectrum analysis. One is its highest side lobe level or HSSL. To maximize frequency detectability, this should be low relative to the central lobe. The other is its worst case processing loss. The WCPL indicates the reduction in output signal-to-noise ratio as a result of windowing and worst case frequency location. It should, therefore, be low, or the quality of the analysis will be compromised.
The HSSL of the rectangular Dirichlet window is only −13 dB, a poor figure that results in noise-cluttered spectral analysis. The WCPL of the Dirichlet window is 3.92 dB. In comparison, the Gaussian window is considerably more selective, with a highest side lobe level of −42 to −69 dB, depending on the peakedness of the envelope. Its WCPL ranges from 3.14 to 3.73 dB.

Other windows outperform the Gaussian window in these properties. A narrow Kaiser-Bessel window has an HSSL of −82 dB, while a wide Hanning window has a WCPL of 3.01 dB. These cases are somewhat opposite to each other, however, and the Gaussian window remains a good compromise between the two.
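These figures are easy to verify numerically. The following sketch (NumPy and SciPy assumed; the Gaussian width is an illustrative choice) measures the highest side lobe of a window from its zero-padded spectrum.

```python
import numpy as np
from scipy.signal.windows import gaussian

def highest_side_lobe_db(w, zero_pad=64):
    """Highest side lobe level in dB, relative to the main lobe."""
    spectrum = np.abs(np.fft.rfft(w, len(w) * zero_pad))
    spectrum /= spectrum.max()
    i = 1
    while i < len(spectrum) - 1 and spectrum[i] <= spectrum[i - 1]:
        i += 1  # walk down the main lobe to its first null
    return 20 * np.log10(spectrum[i:].max())

N = 512
print(highest_side_lobe_db(np.ones(N)))         # rectangular: about -13 dB
print(highest_side_lobe_db(gaussian(N, N / 7))) # Gaussian: far lower side lobes
```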
Musical Applications of the Gabor Transform
Daniel Arfib, working at the Laboratoire de Mécanique et d'Acoustique (Centre National de la Recherche Scientifique, Marseilles), was the first to apply a digital Gabor transform to the analysis and resynthesis of musical sounds (Arfib 1990, 1991; Risset 1992). He was soon joined by his colleague Nathalie Delprat. One of their first results was a robust time compression and expansion operation in which pitch remains constant. They also applied the GT to other musical applications including frequency transposition, phase manipulations, cross-synthesis of speech with other sounds, and modifying the vibrato of a sung vocal tone while keeping other characteristics intact. They also separated the noisy, inharmonic parts of a sound from the harmonic parts (Arfib 1990, 1991; Arfib and Delprat 1992, 1993, 1998).
Kronland-Martinet et al. (1997) combined the Gabor transform with the
innovative display techniques developed for the wavelet transform in Marseilles. Their plots show the phase structure of signals, which is normally hidden
from view. According to the authors, this phase information can be used to
make frequency estimations more accurate. They also applied the GT to the
estimation of the amplitude and frequency modulation laws of musical instrument tones. (See also Delprat et al. 1990.) This work may lead to a more complete analysis of musical tones, one that quantifies the energy at each point in the time-frequency plane, but that also accounts for its internal modulations.
This separation opens up the possibility of modifying the energy distribution
independent of the modulation, or vice versa, which is impossible with standard
Fourier techniques.
Leigh Smith (1996) proposed a Gabor wavelet representation of performed rhythms (pulsations between 0.1 and 100 Hz in frequency) as a model for the perception of rhythm.

Summary

Where the telescope ends, the microscope begins. Which has the grander view? (Victor Hugo 1862)
Microsound in Composition

By allowing noise to make inroads into musical sound, Varèse accelerated a trend toward the shattering of traditional musical language. Together with the inadequacy of neoserialism, this resulted in a fragmentation of musical language and a consequent proliferation of composition techniques, very far from a possible theoretical unification. On the other hand, the increase in methods of acoustic analysis created a situation analogous to the physics of the microcosmos: an imagined unity of sonic phenomena. Here began an opposition to the continuous wave theory of sound, a granular atomism that was capable of representing any chaotic state of sound material and was also effective on the plane of synthesis. (Orcalli 1993)
BEG    DUR    MU     BETA   DEN    DELTA
1.0    90     99     100    87     30
where BEG is the begin time of a cloud, DUR is the cloud duration, MU is the initial center frequency of a cloud, BETA is the initial bandwidth of a cloud, DEN is the grain density, DELTA is the initial amplitude of a cloud, and MUSL, BESL, and DESL represent the time-varying slopes of the MU, BETA, and DELTA parameters over the course of the cloud, respectively. I then prepared
a graphic score, which plotted each cloud over time. The graphic score also
shows variations in reverberation and spatial panning, which I added later.
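In modern terms, such a cloud specification might be expressed as a small data structure; the following Python sketch is hypothetical (the field names follow the score parameters above, and the linear-slope evolution is an assumption).

```python
from dataclasses import dataclass

@dataclass
class Cloud:
    """One granular cloud, after the score parameters above.
    The *sl fields are per-second slopes applied over the duration."""
    beg: float     # begin time (s)
    dur: float     # duration (s)
    mu: float      # initial center frequency (Hz)
    beta: float    # initial bandwidth (Hz)
    den: float     # grain density (grains/s)
    delta: float   # initial amplitude
    musl: float = 0.0
    besl: float = 0.0
    desl: float = 0.0

    def at(self, t):
        """Center frequency, bandwidth, amplitude at time t into the cloud,
        assuming linear evolution."""
        return (self.mu + self.musl * t,
                self.beta + self.besl * t,
                self.delta + self.desl * t)
```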
I divided the sample calculations into eight one-minute sections. The entire
synthesis process involved sixty-three steps of compilation, calculation, data
transfer, and digital-to-analog conversion over a period of weeks.

Figure 7.1 The first four digraphs of Prototype. The arrows indicate parameter linkages (see the text).

After assembling the eight sound fragments on analog tape, I took this tape to the Village Recorder, a sound studio in West Los Angeles, for the final mixdown on a Quad-Eight mixing console. There I added electronic plate reverberation (EMT reverberator with a 2.5 second reverberation time) and spatial panning according to the graphic score.
I played this etude to a number of friends and colleagues as a demonstration
of granular synthesis, and once in a concert in May 1975 at UCSD. Around the
same time, David Cloud, host of the new music program Zymurgy, broadcast
Prototype on the radio station KPFK-FM in Los Angeles. I can hardly imagine
the effect this strange experiment must have had on the unsuspecting listener.
Figure 7.2 Spectrum display of granular clouds in nscor by the author, from 5:06 to
5:40. The frequency scale is linear.
Composed in the winter of 1980-81, Field is another polychromatic composition, in which granular synthesis serves as one instrument in a larger orchestra. Field, like nscor, formed from the ground up, starting with the assembly of
sound objects, building larger structures as the composition continued. Again, I
realized the montage with analog equipment. A major difference between nscor and Field was the number of tracks that I had to work with. I now had the use
of a 24-track recorder at a studio in Boston (thanks to a grant from the MIT
Council for the Arts), making it much easier to create an effective montage.
The crossfading flow of musical shapes in Field reflects this. The composition
premiered on New Year's Eve 1981, as part of Boston's First Night Festival.
In 1985, Field appeared on a compact disc sponsored by Sony and produced
by the MIT Media Laboratory as part of its inauguration. This CD was reissued
in 1999.
Granular synthesis appears only once in Field, in the form of a granular explosion at 3:38-3:43. This explosion is the climax of the piece, dividing its two
major sections.
Clang-Tint
Clang-Tint (1994) is not built exclusively from microsounds, but it is a work in
which microsounds are integrated with other sounds. Its point of origin traces
back to a sunny late afternoon, 9 December 1990, following a visit to an exhibition of photographic works by the Starn brothers at the Akron Museum of
Art (Ohio, USA). These works combine prints and large transparencies with
wood, tape, metal pipes, and other media to create three-dimensional sculptures. I was intrigued by the mixture of ``sampled'' imagery in conjunction with
idiosyncratic methods of construction. Inspired by the exhibition, I imagined a
structure for a new composition. Shortly thereafter I received a commission
from the Kunitachi Ongaku Daigaku (Kunitachi College of Music, Tokyo) and
the Bunka-cho (Japanese Ministry for Cultural Affairs). The title Clang-Tint
refers to the notion of ``sound-color.'' The composition starts from sampled
sounds, cut and framed in myriad ways, spatialized throughout, and mixed with
unusual synthetic sounds.
Parts of the work were composed during residencies in Tokyo (1991 and
1994), as well as at my studio in Paris and at Les Ateliers UPIC. At the
Kunitachi school, I was fortunate to have access to the Gakkigaku Shirokan
(musical instrument museum). There I recorded forty-five instruments, some two millennia old, some as modern as the Ondes Martenot, a vacuum-tube
electronic instrument. These recording sessions resulted in a database of ten
hours of sound.
I conceived the composition in four sections: Purity, Filth, Organic, and
Robotic. Each section takes its identity from the sound materials and organizational principles used in it. For example, Purity explores a simple musical
world of sinusoidal waves and harmonies derived from a microtonal scale
created for this composition.
The second movement, entitled Organic, focuses on expressive phrasing. It
combines bursts of insect, animal, and bird calls with pulsar synthesis. Pulsars
appear throughout the composition in different forms: pulsating blips, elongated formant tones, and clouds of asynchronous pulsars. For the latter, I first generated multiple infrasonic pulsar trains, each one beating at a different frequency in the range of 6 to 18 Hz. I then mixed these together to obtain the
asynchronous pulsar cloud.
``Dirty'' sounds make up the sound material of Filth: a morass of crude
waveforms, raw transients, irregular globs and grains, industrial noises, and
distorted tones. Most of the synthetic particles appear in its first section. After an intense climax, the final section consists of soft sounds in layers. Running
throughout most of the section is a recording of burning embers of wood. I filtered and ring-modulated these natural particles to narrow their bandwidth to a specific range in the middle part of the spectrum, then extensively edited
Figure 7.3 Concert hall of the Australian National Conservatory, Melbourne, prior to the premiere performance of Half-life, May 1998. View from the stage. A loudspeaker is suspended at the center.
Half-life received its world premiere in May 1998 at the Next Wave Festival
in Melbourne, Australia, with sound projection over twenty-eight loudspeakers
(figure 7.3). The revised version premiered at the SuperCollider Night School at
the Center for New Music and Audio Technologies (CNMAT), University of
California, Berkeley in July 1999.
Tenth vortex and Eleventh vortex
On the night of 20 October 2000, I performed eleven ``sound cloud vortices.''
These consisted of real-time granulations of a single sound le: a train of
electronic impulses emitted by the PulsarGenerator program. Granulation
expanded the time base of the original by a factor of up to six, while also filtering each grain and scattering it to a random point in space. The swirling
combination of thousands of individual grains makes up the vortex.
I chose the tenth and eleventh performances for further treatment. I cut the
Tenth vortex into four pieces, then nine more pieces, tuning and tweaking on a
micro time scale. The work proceeded rapidly. I linked the parts into the final version on Christmas Eve 2000. Eleventh vortex (2001) called for more nonlinearity in the macrostructure. I divided it into over eighty fragments, which
resulted in a much more complicated compositional puzzle, and a more idiosyncratic structure, alternating between coalescence and disintegration. Both
vortices premiered at Engine 27 in New York City in February 2001.
Microsound Techniques in Works by Various Composers
This section, which presents examples of works by various composers, cannot
hope to be exhaustive. The goal is to indicate how microsonic techniques are
being used by others.
Barry Truax
Several years after my article on the implementation of digital granular synthesis (Roads 1978a), the technique began to attract the interest of other composers. The Canadian Barry Truax developed the first of several implementations in 1986. His implementations are notable for their emphasis on real-time operation. Real-time synthesis is inevitably stream-oriented, and the musical aesthetic explored by Mr. Truax reflects this orientation. Since the mid-1980s, he
has applied granular synthesis as a central technique in his oeuvre. His primary
emphasis is on the real-time granulation of sampled sounds, where he introduced many innovations. He has documented these in numerous articles (Truax
1986, 1987, 1988, 1990a, 1990b, 1991, 1992, 1994a, 1994b, 1995, 1996a, 1996b).
Truax was the first composer to explore the gamut of effects between synchronic and asynchronic granular synthesis, which he employed effectively in a series of compositions. In Riverrun (1986, Wergo WER 201750), from 5:40
to 6:50, he generated a series of synchronic ``steps'' consisting of overlapping
grains that simultaneously drift apart and lengthen, forming a melodic line.
Wings of Nike (1987, Cambridge Street Records CSR CD-9401, also Perspectives of New Music CD PNM 28) was the first granular piece to use a sampled sound as its source. The entire work is derived from two 170-ms phonemes, magnified by the composer into a twelve-minute opus (Truax 1990b). Mixed
from an eight-track master, the work explores evolving streams of synchronic and asynchronic grains which become increasingly dense and merge into
What makes Schall unique is its brilliant use of switching between different
time scales: from the microscopic up to the note-object level and down again
into the microscopic. Of course, the shorter the notes, the more broadband the
texture, as in the noisy section between 2:10 and 2:28, or the final thirty seconds
of the work. Thus the interplay is not just between durations, but also between
pitch and noise.
Of the composition process, which involved interactive sound editing and
mixing software, the composer says:
Considering the handcrafted side, this is the way I worked on Schall (along with algorithmic generation and manipulation of sound materials): making a frame of 7 minutes and 30 seconds and filling it by ``replacing'' silence with objects, progressively enriching the texture by adding here and there different instances (copies as well as transformations of
diverse order) of the same basic material. (Vaggione 1999)
In this editing program, time runs from left to right across the screen. By pasting a single particle multiple times, it became
a sound entity of a higher temporal order. Each paste operation was like a
stroke of a brush in a painting, adding a touch more color over the blank space
of the canvas. In this case, the collection of microsounds in the library can be
thought of as a palette. Since the program allowed the user to zoom in or out in
time, the composer could paste and edit on different time scales. The program offered multiple simultaneous tracks on which to paste, permitting a rich interplay of microevents.
With Nodal (1997), Vaggione has taken the materials used in Schall several
steps further, while also opening up his sound palette to a range of percussion
instruments. The identity of these instruments is not always clear, however,
since they articulate in tiny particles. The composition lasts 13:06, and divides
into three parts, with part I lasting until 5:46, and part II spanning 5:49 to 9:20.
The strong opening gesture establishes immediately the potential force of the
granular texture, and sets up a dramatic tension. Although the continuously
granulating texture that follows is often quiet in amplitude, one senses that the
floodgates could burst at any moment. This effect is enhanced by ``creaking''
sounds that give the impression of reins being strained.
Part II begins with a warm fluttering texture that turns into a chaotic noise.
While the ear tracks this low-frequency rumbling, at 6:18 a distinct mid-high
crotale ``roll'' with a sharp resonance at 1600 Hz sweeps across. The overall
texture becomes unpredictably turgid and chaotic, until at 7:11 the composer
introduces an element of stasis: a rapidly repeating piano tone during which the granulation background briefly lets up. This leads to a section of almost tactile
noise, soft like a wet snowstorm. At 8:46 another wood tapping appears. Part II
ends on an incongruous major chord from what sounds like a toy piano. Part
III introduces a resonant tom-tom-like tone. The background texture is high
in frequency, sounding like rain on a thin roof. The density of the texture
gradually builds, as new bursts and resonances sweep into view. The texture
ebbs at 11:35, letting up until 12:09. The penultimate texture (a low-frequency rumbling that also concludes Agon) is a long 39-second fade-out. This texture continues (at a low amplitude) for several seconds after the final gesture of the piece, a concluding three-event percussive tag ending.
The electroacoustic work Agon (1998) further elaborates the processes and
materials heard in Nodal. It opens with a continuously ``grinding'' band in the
range between 6 kHz and 16 kHz. The rate of the grinding modulation is in the
range of 10 Hz to 20 Hz. The continuity of the high-frequency band is broken
by various colored explosions. It is as if different percussive sounds are being
Figure 7.4 A time-domain view of the final gesture in Horacio Vaggione's Agon, a triple-stroke ``tom-click-hiss.''
Figure 7.5 The amplitude envelope of the first 61.8 seconds of Agon by Horacio Vaggione. The line marked T indicates the amplitude threshold between the foreground peaks and the background granulations.
As Michelangelo specified shape by chipping at his block of marble with a chisel, so Pousseur specified crisp, clear, and pitched sounds by chipping at his block of white noise with an electronic chisel called a filter. (Chadabe 1987)
sound; or N elements are replaced by just one of their number, extended in length to span the entire set of N. The ``fireworks'' transformation is produced by this procedure, etc. Another process that is used, although less radically, is Sound Shredding, a technique related to Brassage. In this process a fixed length of sound is cut into arbitrary length segments which are reordered at random, and this process is then repeated many times. The progressive application of this process is heard in the voices-to-``water'' transformation from 13 m 20 s to 14 m 40 s. (Wishart 1996)
Feuillages (1992) was composed by Philippe Schoeller for fourteen instruments and eight channels of electronic sound. The electronic part of this work
uses an ``acoustic pixel'' technique developed by the composer and his technical
assistant, R. G. Arroyo. The acoustic pixels (whose waveforms may be sampled or synthesized) were generated in multiple streams (chemins stratifiés), which often overlap. The team then filtered the streams to create a ``liquid'' quality.

The Neapolitan composer Giancarlo Sica's En Sueño (Dreaming), realized in 1996, strives for a continuous transition from the comprehensible to the incomprehensible. He achieves this through asynchronous granulation of the Spanish word sueño sung by a tenor and repeated like an evocation. The granulation techniques were mixed with an additive synthesis texture that combined
thirty-two sine waves according to time-varying envelopes. The composition
was realized using the Csound language.
The Paris-based composer Gérard Pape has employed granular techniques in
combination with convolution in Makbenach (1997) for saxophone and tape.
In Makbenach, I worked with samples of various extended techniques for the saxophone,
developed and played by the saxophonist Daniel Kientzy. These were chained together to
make ``timbre paths.'' These timbre paths were composed as an alternative to isolated ``sound effects.'' That is, the paths involve a chaining together of isolated extended techniques to emphasize an overall timbral transformation, from simplicity to complexity,
purity to noise richness, harmonicity to inharmonicity, etc. (Pape 1998)
Pape used two programs to transform the timbre paths. First, he produced a
series of grains (using Cloud Generator) that followed a particular trajectory.
He used these as impulse responses to be convolved (using SoundHack) with
saxophone samples. The saxophone was transformed according to the path
of the grains. He also used Cloud Generator to create a series of evolving
granulations of the saxophone samples, establishing a new timbre path for
the saxophone, with the choppy rhythms of the grains.
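The underlying operation, convolving a sparse grain trajectory, treated as an impulse response, with a sampled sound, can be sketched as follows. This is not the Cloud Generator/SoundHack workflow itself; SciPy and the stand-in signals are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

sr = 44100
t = np.arange(sr) / sr
sax = np.sin(2 * np.pi * 220 * t) * np.exp(-3 * t)   # stand-in for a saxophone sample

# A sparse series of grain impulses along a random trajectory
grains = np.zeros(sr // 2)
for onset in np.cumsum(np.random.uniform(0.01, 0.05, 10)):
    i = int(onset * sr)
    if i < len(grains):
        grains[i] = np.random.uniform(0.2, 1.0)

out = fftconvolve(sax, grains)      # the sample is smeared along the grain path
out /= np.abs(out).max()            # normalize
```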
Granular and phase vocoder techniques are central to certain compositions
of the German composer Ludger Brümmer. His pieces scatter volcanic eruptions in a barely audible sea of sound. La cloche sans vallées (on Cybele CD 960.101) was realized in 1993 at the Stanford University studios in California.
The piece tends toward sparse, low-level textures, punctuated by ``explosive
interruptions'' (program notes). The Gates of H. (1993) expands a brief fragment of a folk melody sung by a female choir into an 18-minute composition.
After an explosive opening gesture, from about 1:10 to 2:07 the only sound is a
low-level reverberation which fades in and out of perception. A sharp burst
signals the start of the next section. The rest of the composition follows the
Figure 7.6 The amplitude envelope of the 19-minute, 28-second composition Cri by Ludger Brümmer. The letters indicate crescendi sections. Note that the peak between C1 and C2 reflects a momentary increase in low-mid-frequency energy (four pulsations). It is not a crescendo.
grain clouds using fifteen different envelopes simultaneously, repetitive polyrhythmic arc clusters based on sample extractions placed in infrasonic frequencies, and long sequences of superimposed sine waves that produce beating effects in the range of 40 to 120 Hz. In these works, microsonic techniques serve an aesthetic of emergent symbolism. In the composer's words:
Deeply embedded ideas slowly attain the threshold of perception and, at the right moment,
brim over into conscious thought. (Robindore 1996b)
In his 1999 doctoral dissertation, Manuel Rocha describes his three granular
synthesis compositions: Transiciones de Fase (1993), SL-9 (1994), and Moin
Mor (1995), which was realized with the Granular Synthesis Toolkit (Eckel et al. 1995). Moin Mor is based on an eighth-century Irish poem and other, contemporary, poems in Gaelic. These poems mix with recordings made during the
composer's trip to Ireland in 1993, as well as sounds recorded by Italian journalist Antonio Grimaldi from Bloody Sunday (1972) in Londonderry. Moin
Mor begins with a granulated voice reciting the poems, using only consonant
sounds. The rest of the piece consists of multiple layers of granulated textures,
including phoneme fragments, over which the composer superimposes recognizable ``scenes'' from the Irish soundscape.
The media artist Kenneth Fields realized a compelling soundtrack for an
interactive science-education CD-ROM entitled Life in the Universe (1996).
Realized at CREATE, University of California, Santa Barbara, it featured
resonant wood-like grains and granulations of the computer-synthesized voice of the physicist Stephen Hawking. According to the composer:
It was an appropriate metaphor, I thought, to use granular synthesis techniques for a
project having to do with the properties and history of the universe, particle and wave
physics, and the possibility of finding intelligent life in the universe beyond the Earth. Hawking's synthesized narration for the CD-ROM was recorded on tape and sent to me.
The piece has three parts, corresponding to three terrains we used as an organizational/
navigational strategy: the cosmological, biological, and mathematical terrains. Original
materials were derived both from sampled and synthesized (Csound) sources, then processed with Gerhard Behles's real-time granular program Stampede II (Behles, Starke, and Röbel 1998). The music for the CD-ROM, then, is a multiple-path composition, dependent on the user's navigational choices. (Fields 1998)
The developer of the Kyma system, the Illinois composer Carla Scaletti, used
it to time-stretch sounds from a telephone in an interactive installation called
Public Organ:
I used granular time stretching on the voices of installation participants who picked up a
telephone (with a high-quality microphone installed in place of the original telephone
microphone) and either left a ``voice mail message'' or were speaking with another installation participant over the net via CUSeeMe. I captured fragments of their voices and then
played them back a short time later time stretched. These time-stretched fragments were
stored on disk and became part of the sounds made by the installation in its other states,
long after the people had left. (Scaletti 1996)
Summary
[Of the Nobel prize winning physicist J. J. Thomson, the discoverer of the electron] . . . His early mathematical work was not outstandingly important. He was not skillful in the execution of experiments. His talent, one that is for both theorists and experimentalists the most important, lay in knowing at every moment what was the next problem to be attacked. (Weinberg 1983)
The synthesis and transformation of microsound has already affected the music
of our time. Viewing sound as a particulate substance opens up new approaches
to composition. Variations on the micro scale provoke changes in musical texture on higher time scales. Manipulations of particle density let coalescence and
evaporation function as event articulators, as they bring sound into and out of
being. Particle density also serves as the bridge between pulse and pitch, between rhythm and timbre. The plasticity of the sound, which is inherent in its
particle substrate, allows mutation to play an important compositional role,
since every sound object is a potential transformation.
Do we intend such operations to reduce compositional practice exclusively to
the level of microacoustical fluctuations? No. They merely add this stratum to
the rest of the known layers of composition, thereby enriching the field. When the microsonic layers interact with the higher layers, they tend to articulate each other's specificity. We see this in the opposition of a long note underneath
a sparse cloud of impulses. But the situation is more complex, since we can
make any continuous sound evaporate at will. This introduces a fresh element
of dramatic tension to the unfolding of musical structure.
In the early years of digital synthesis, the musical possibilities inherent in
microsound were untested. This chapter shows that the situation is changing
and that important compositions employ these techniques. Utilizing microsound will require less justification in the future; it has already proved itself in the most important arena, the domain of music composition.
Aesthetic Premises
The Philosophy of Organized Sound
Expansion of the Temporal Field
Illusions of Continuity and Simultaneity
A Multiscale Approach to Composition
Differences Between Time Scales
Density, Opacity, Transparency
Stationary, Stochastic, and Intermittent Textures
Composition Processes on the Microsonic Level
Heterogeneity and Uniqueness of Sound Materials
Aesthetic Oppositions
Formalism versus Intuitionism
Coherence versus Invention
Spontaneity versus Reflection
Intervals versus Morphologies
Smoothness versus Roughness
Attraction versus Repulsion in the Time-Domain
Parameter Variation versus Strategy Variation
Simplicity versus Complexity in Microsound Synthesis
Code versus Grammar
Sensation versus Communication
Summary
Aesthetics seems to thrive on controversy, even to demand it; on the conflict, typically, of new and old, of simplicity and complexity.
Edward Lippman (1992)
If styles and genres did not suffer exhaustion, there would be only one style, one genre in
each art.
Jacques Barzun (1961)
Every doctrine of aesthetics, when put into practice, demands a particular mode of expression, in fact, a technique of its own. (Stravinsky 1936)
Simultaneous with this sense of crisis, new types of music were emerging out
of new musical materials. These included Pierre Schaeffer's musique concrète,
and electronic music based on impulses, sine waves, noise generators, and
eventually, computer-generated sounds.
The aesthetic of organized sound places great emphasis on the initial stage of composition: the construction and selection of sound materials. This may involve synthesis, which often begins with microsounds, furnishing the elementary components used in the assembly of higher-level sound objects. Just as the
molecular properties of wood, thatch, mud, steel, and plastic determine the
architectural structures one can construct with them, so sonic microstructure
inevitably shapes the higher layers of musical structure.
The middle layers of musical structure, the mesostructure, arise through interaction with the material. That is, to sculpt sonic material into gestures or phrases involves mediation between the raw waveforms and the will of the composer. This mediation is not always immediately successful, which is part of the struggle of composition. If the initial result is unsatisfactory, the composer has two choices. The first is to develop new materials that will more easily fit a preconceived phrase mold. The second choice is to abandon the mold, which means following the ``inner tensions'', to use Kandinsky's phrase, of the sonic material (Kandinsky 1926). In this case, the material suggests its own
mesostructures. Later, the composer may intervene to reshape these structures
from the vantage point of another time scale.
These interrelationships between sound and structure confirm what musicians have known all along: material, organization, and transformation work
together to construct a musical code. It is in this context that a given sound
accrues meaning and beauty.
Expansion of the Temporal Field
Music theorists have long acknowledged a multiplicity of time scales in compositions. Today we can extend this awareness to the micro time scale. The call for an expanded temporal field was first issued in the 1930s by composers such
as Henry Cowell and John Cage, who said:
In the future . . . the composer (or organizer of sound) will be faced not only with the entire field of sound but also with the entire field of time. The ``frame'' or fraction of a second, following established film technique, will probably be the basic unit in the measurement of
time. No rhythm will be beyond the composer's reach. (Cage 1937)
By the 1950s, electronic devices had opened paths to the formerly inaccessible territories of microtime. In electronic music studios, one could assemble
complex sounds by splicing together fragments of magnetic tape. Composers
such as Stockhausen and Xenakis began to explore the temporal limits of
composition using these tape splicing techniques where, at a typical tape speed
of 38 cm/sec, a 1 cm fragment represented a time interval of less than 27 ms.
The analog signal generators of the 1950s let composers create for the first time sequences of impulses that could be transposed to different time scales by means of tape speed manipulations. Designed for test purposes, the analog signal generators were not meant to be varied in time but favored a timeless wave approach to sound. Their multiple rotary knobs and switches did not allow the user to switch instantly from one group of settings to another. Because of the weakness of their temporal controls, these devices imposed strict practical limits within which, with assistance and a great deal of labor, one could work. (The
creation of Stockhausen's Kontakte comes to mind.)
By the 1970s, voltage-controlled analog synthesizers had become available,
manufactured by Moog, Arp, Buchla, and other small companies. Analog synthesizers offered control through low-frequency oscillators, manual keyboards, and analog sequencers, but they could not provide for fine control at
the micro time scale. Neither was analog tape an ideal medium for organizing
microsonic compositions, owing to its inherent generation loss, linear access,
and razor-blade splicing. It was only with the dawn of computer synthesis and
Interaction between the microtemporal scale and higher time scales is especially intriguing. To cite a simple example: a gradual change in particle durations results in timbre variations on a higher time scale. Certain signals cross
from one time scale to another, such as a descending glissando that crosses the
infrasonic threshold, turning from tone to rhythm.
For some composers, part of the attraction of composing with microsound is
the way it blurs the levels of musical structure:
The task of microcompositional strategies can be described as one of letting global morphological properties of musical structure emerge from the local conditions in the sonic
matter. (Di Scipio 1994)
series developed for one time scale (e.g., pitch periods) to another time scale
(e.g., note durations). (Specifically, it does not make much sense to transpose the intervallic relations of the chromatic scale into the domain of note durations.) Little music corresponds to a geometrically pure and symmetric hierarchical model. As Vaggione has stated:
The world is not self-similar. . . . Coincidences of scale are infrequent, and when one thinks that one has found one, it is generally a kind of reduction, a willful construction. The ferns imitated by fractal geometry do not constitute real models of ferns. In a real fern there are infinitely more accidents, irregularities, and formal caprices, in a word, singularities, than the ossification furnished by the fractal model. (Vaggione 1996a)
Strictly hierarchical and symmetrical note relations are not necessarily perceived as such (Vaggione 1998). György Ligeti (1971) pointed out the difficulty of organizing all time scales according to a unitary scheme, owing to a lack of correlation with human perception. The musicologist Carl Dahlhaus (1970) adopted a similar tack in his critique of serial pitch rules applied to the organization of microstructure.
Sound phenomena on one time scale may travel by transposition to another
time scale, but the voyage is not linear. Pertinent characteristics may or may
not be maintained. In other words, the perceptual properties of a given time
scale are not necessarily invariant across dilations and contractions. A melody
loses all sense of pitch, for example, when sped up or slowed down to extremes.
This inconsistency, of course, does not prevent us from applying such transpositions. It merely means that we must recognize that each time scale abides
by its own rules. A perfect hierarchy is a weak model for composition.
Density, Opacity, Transparency
The expansion of sonic possibilities adds new terms to the vocabulary of music.
We can now shape sonic matter in terms of its particle density and opacity.
Particle density has become a prime compositional parameter. Physics defines density as the ratio of mass to volume. In music this translates to the ratio of sound to silence. Through manipulations of density, processes such as coalescence (cloud formation) and evaporation (cloud disintegration) can occur in sonic form. Opacity correlates to density. If the density of microsonic events is sufficient, the temporal dimension appears to cohere, and one perceives a continuous texture on the sound object level. Thus by controlling the density and size of sound particles we have a handle on the quality of sonic opacity. Coalescence takes place when particle density increases to the point that tone continuity takes hold. An opaque sound tends to block out other sounds that cross into its time-frequency zone.
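One convenient way to quantify this relation, borrowed from the granular synthesis literature rather than stated explicitly here, is a fill factor: density times average grain duration, i.e., the expected number of simultaneously sounding grains. The sketch below is a minimal illustration.

```python
def fill_factor(density, grain_dur):
    """Expected number of simultaneously sounding grains:
    density (grains/s) times grain duration (s). Values well above 1
    suggest an opaque, continuous texture; values below 1, a sparse,
    transparent one."""
    return density * grain_dur

print(fill_factor(density=25, grain_dur=0.02))    # 0.5  -> transparent
print(fill_factor(density=500, grain_dur=0.05))   # 25.0 -> opaque
```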
Going in the opposite direction, we can cause a sound to evaporate by reducing its particle density. A sparse sound cloud is transparent, since we can easily hear other sounds through it. A diaphanous cloud only partially obscures other sounds, perhaps only in certain spectral regions. For example, by means of sonogram filtering we can create transparent holes in the spectrum of a sound (figure 8.1), which might provide a window onto another layer of sound beneath.

Figure 8.1 A hole in a broadband sound, sculpted by a sonogram filter. We can carve time-frequency gaps on different time scales.
Stationary, Stochastic, and Intermittent Textures
Many complex musical textures resemble what statistics calls stationary processes. A stationary process exhibits no trend. The texture has a fixed mean value and fluctuates around the mean with a constant variance. A stationary
process is not necessarily static in time, but its variations remain within certain
limits, and are therefore predictable. We see these characteristics in many
sound-mass textures created with particle synthesis. Consider a dense cloud of
grains scattered over a broad zone of frequencies. It scintillates while never
evolving, and is therefore a stationary texture.
Stationary textures are fertile material for composition. One can place them
at low amplitude in the background layer, where they lend depth to the musical
landscape. Positioned in the foreground, their constant presence introduces
dramatic tension and sets up an expectation of change. The ear notices any
change as a deviation from the stationary.
Changes in texture appear as slow trends or sudden intermittencies. To impose a trend is to gradually change the texture. This may take place over time
periods associated with the meso time scale (i.e., many seconds). A trend converts a stationary texture into a weighted stochastic texture. One can introduce a
trend by opening or closing the bandwidth of a cloud, by altering its center
frequency, by filtering, or by any other perceptible time-varying operation.
Sudden changes create intermittent texture. The intermittencies break up
the stationary texture by injecting loud particles or silent micro-intervals. This latter technique, composing with silence, remains largely unexplored, but can be effective. The idea is to begin with a multichannel stationary texture and
introduce silent intervals into it, working like a sculptor, carving rhythmic and
spatial patterns by subtraction.
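The three texture classes can be stated compactly as grain-parameter streams; the sketch below (NumPy assumed, all values illustrative) generates a stationary frequency distribution, a trend, and an intermittent amplitude pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
n_grains = 2000

# Stationary: fixed mean and variance in grain frequency
stationary_freqs = rng.normal(1000, 200, n_grains)

# Trend: the mean drifts upward over the cloud (a weighted stochastic texture)
trend_freqs = rng.normal(np.linspace(1000, 4000, n_grains), 200)

# Intermittent: occasional loud particles break up an otherwise quiet texture
amps = np.full(n_grains, 0.1)
amps[rng.random(n_grains) < 0.01] = 1.0
```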
Composition Processes on the Microsonic Level
Interactive digital sound editing originated in the 1970s (Moorer 1977a, 1977b), but did not become widely available until the 1990s. It seems so taken for granted today that I am not sure musicians recognize its profound impact on the art of electronic music composition. The possibility of editing sound on
any time scale has opened up a vast range of transformations. For example,
through selective gain adjustment it is possible to magnify a tiny subaudio fluctuation into an intense microsonic event (see the description of transient
drawing, chapter 4). We can shape the event through microsurgery or filtering
on a micro time scale (chapter 5). By replicating the event on the micro time
scale, it can be transformed into a pitched sound object (see the description of
particle cloning synthesis in chapter 4). Through time-stretching it can be
magnified into a long, slowly unfolding texture (pitch-time changing, chapter 5; and sound transformation with the phase vocoder, chapter 6). Then through
granulation we can decompose it once again into small particles, ``from dust to
dust'' (granulation, chapter 5).
Such manipulations open up a catalog of new compositional processes:
- Variations (contrasts, increases, and decreases) of particle density
- Coalescence (cloud formation) and evaporation (cloud disintegration)
- Time stretching to extend a microstructure into a large-scale event
- Time shrinking large events into microsounds
- Hierarchical variations of the same event structure on multiple time scales
- Lamination of a cloud through multiple layers with microtemporal delays
- Particle spatialization (scattering particles in space)
- Granular reverberation
- Precise polymetric pulsations in space, created by superimposing multiple metrical streams
- Multiple formant streams, each with its own frequency and spatial trajectory
- Spectrum evolution via manipulation of particle envelopes
- Microsurgery on the Gabor matrix to extract the chaotic, harmonic, loudest, softest, or other selected particles within a sound and reassemble it with alterations
Such operations change the practice of composition, and mandate a rethinking
of compositional strategy and architecture. This cultural process has only just
begun.
Heterogeneity and Uniqueness of Sound Materials

In the 1950s, certain composers began to turn their attention toward the composition of sound material itself. In effect, they extended what had always been true at the phrase level down to the sound object level. Just as every phrase and macro form can be unique, each sound event can have an individual morphology. This creates a greater degree of diversity, of heterogeneity in sound material, without necessarily losing continuity to other objects. Chapter 3 showed
how we can extend the concept of heterogeneity even further, down to the level
of microsound, where each sound particle may be unique. The microstructure
of any sound can be decomposed and rearranged, turning it into a unique
sound object.
Aesthetic Oppositions
It seems inevitable that we seek to define and understand phenomena by positing their opposite. High cannot be understood without the concept of low, and
so with near and far, big and small, etc. A given aesthetic tendency can be seen
as confronting its opposite. The question is whether such a simplification can lead to a clarification. This section explores certain aesthetic oppositions raised
in composing with microsound.
Formalism versus Intuitionism
In composing with microsound, we face an ancient conflict: formalism versus
intuitionism. Formal models of process are natural to musical thinking. As we
listen, part of us drinks in the sensual experience of sound, while another part is
setting up expectations: hypotheses of musical process. To the envy of the
other arts, notation and logical planning have been part of music-making for
centuries. As Schillinger (1946) demonstrated, we can make a music generator
out of virtually any mathematical formula. Lejaren Hiller's pioneering experiments with automated composition in the 1950s proved that the computer
could model arbitrary formal procedures (Hiller and Isaacson 1959). Computer
programs sped up the time-consuming labor associated with systematic composition. This led to a surge of interest in applying mathematical procedures to
composition (Hiller 1970).
Since the start of music notation, it has been possible to manipulate musical
materials as symbols on paper, separated from the production of sound in time.
Herein lies a fundamental dichotomy. Because formal symbols can be organized abstractly, such manipulations have been closely identified with the organization of sound material. Music, however, is more than an abstract formal
discipline. It must eventually be rendered into sound and heard by human
beings. Thus it remains rooted in acoustical physics, auditory perception, and
psychology.
One cannot escape formal control when working with a computer. Every
gesture translates into an intervention with a formal system. This system is
encoded in the logic of a programming language and is executed according to
the algebra of the machine hardware. The question is at what level of musical
structure do such formalisms operate? The pianist practicing on a digital piano
is interacting with a computer music system. She is not concerned that her
performance is triggering a flurry of memory accesses and data transfers. The
familiarity of the keyboard and the sampled piano sounds makes the interaction seem direct and natural. This is a great illusion, however. With a change of
formal logic, the same equipment that produces the piano tones could just as
well synthesize granular clouds, as we saw with the Creatovox (chapter 5).
Applied at different strata of compositional organization, formal algorithms can be a powerful means of invention. An algorithm for spawning sound particles can handle enormous detail in a fraction of a second. Other algorithms can iterate through a collection of variations quickly, offering the composer a
wide range of selections from which to choose. Interactive performance systems
try to balance preprogrammed automation with spontaneous decisions.
While formal algorithms enable interaction with a machine, formalism in
composition means imposing constraints on oneself. The formalist composer
follows a logical system from beginning to end. This logic exists only in an ideal
conceptual plan. The plan must ultimately be translated into the real world of
acoustics, psychoacoustics, and emotional response. It is in this translation that
the game is often lost.
Barry Truax's granular synthesis programs GSX and GSAMX (Truax 1988) incorporated several types of controls:

- low-level grain parameters: center frequency, frequency range, average duration, duration range, delay (density)
- presets: groups of stored grain parameter settings
- ramps: patterns of change in grain parameters, stored in a ramp file
- tendency masks: graphic control shapes that are translated into ramps and presets, stored in a tendency mask file
The composer could override any of these stored parameters in performance,
intermingling planned functions with spontaneous gestures.
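The tendency-mask idea is simple to sketch: a grain parameter is drawn at random from between two breakpoint envelopes that bound its range over time. The Python below is an illustration of the concept, not Truax's code; names and values are hypothetical.

```python
import numpy as np

def tendency_mask(t, times, lows, highs, rng=None):
    """Draw a grain parameter at time t from between two breakpoint
    envelopes (the lower and upper boundaries of the mask)."""
    rng = rng or np.random.default_rng()
    lo = np.interp(t, times, lows)
    hi = np.interp(t, times, highs)
    return rng.uniform(lo, hi)

# A mask that narrows from a wide band to a single pitch over 10 seconds
times = [0.0, 10.0]
print(tendency_mask(0.0, times, [200, 440], [2000, 440]))   # anywhere in 200-2000 Hz
print(tendency_mask(10.0, times, [200, 440], [2000, 440]))  # converges on 440 Hz
```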
Our Creatovox instrument (chapter 5) takes another approach to the problem of particle synthesis in real time. Designed as a solo instrument for virtuoso
performance, it is played using a traditional keyboard, with additional joystick,
fader, and foot pedal controllers. In the Creatovox, each keystroke spawns a
cloud of grains, whose parameters can be varied with the real-time controllers.
PulsarGenerator (chapter 4) was not intended for the concert hall, although we anticipated that it would be used in this way, because it could be operated in real time. We wanted a program which would allow improvisation as a fast way
to explore the wide range of pulsar textures, but which also allowed for careful
planning through control by envelopes and stored presets.
Despite the attractions of real-time music-making, the studio environment
is the ultimate choice for the musician who seeks the maximum in creative
freedom:
- The possibility of editing allows any previous decision to be revised or retracted in the light of reflection.
- Rehearsal of all gestures permits refinement.
- In contrast to real-time improvisation, where the focus tends to be local in scope, studio decision-making can take into account the entire range of time scales.
- An arbitrary number of independent musical threads can be superimposed carefully via mixing.
- The sound structure can be monitored and manipulated on a particle-by-particle basis, which is impossible in real time.
A potential hazard in studio work is over-production. An over-elaborate
montage may result in a stilted and contrived product.
Within these flowing structures, the quality of particle density, which determines the transparency of the material, takes on prime importance. An increase in density induces fusion. It lifts a cloud into the foreground, while a decrease in density causes evaporation, dissolving a continuous sound band into a pointillist rhythm or vaporous background texture. Keeping density constant, a change in the characteristics of the particles themselves induces mutation, an open-ended transformation.
Smoothness versus Roughness
The shapes of classical geometry are lines and planes, circles and spheres, triangles and
cones. They inspired a powerful philosophy of Platonic harmony. . . . [But] clouds are not
spheres. . . . Mountains are not cones. Lightning does not travel in a straight line. The new
geometry models a universe that is rough, not rounded, scabrous, not smooth. It is the
geometry of the pitted, pocked, and broken up, the twisted, tangled, and intertwined. . . .
The pits and tangles are more than blemishes distorting the classical shapes of Euclidean
geometry. They are often the keys to the essence of the thing. (Gleick 1988)
The universal principle of attraction and repulsion governed the primal cosmological explosion of the Big Bang as well as the inner structure of atomic matter. Edgard Varèse thought that it might be possible to adapt the principle of repulsion as an organizing principle:
When new instruments will allow me to write music as I conceive it, taking the place of the
linear counterpoint, the movement of sound-masses, or shifting planes, will be clearly perceived. When these sound-masses collide, the phenomena of penetration or repulsion will
seem to occur. (Varèse 1971)
Sound waves speak directly to our senses. They can be likened to the immediate
perception of touch, if touch could penetrate to the inner ear. The experience of
Summary
Every work of art aims at showing us life and things as they are in truth, but cannot be
directly discerned by everyone through the mist of subjective and objective contingencies.
Art takes away the mist. (Schopenhauer 1819)
Art, and above all, music, has a fundamental function . . . It must aim . . . toward a total
exaltation in which the individual mingles, losing consciousness in a truth immediate, rare,
enormous, and perfect. If a work of art succeeds in this undertaking, even for a single moment, it attains its goal. (Xenakis 1992)
Conclusion
controllers, we have built particle synthesis instruments for virtuoso performance, not only onstage but also in the studio.
In computer graphics, three-dimensional animation programs incorporate
sophisticated algorithms for scattering particles. These emulate physical models
of flow, perturbation, and collision. In the domain of sound, we can also apply
physical models to regulate the flow of particles. But we should not be limited
to emulations of reality. As stated at the beginning of the book, the computer's
artistic power derives from its ability to model fantasies as well as reality.
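As an illustration of this idea, the toy sketch below (an assumption-laden example, not a method prescribed in this book) integrates a handful of point masses falling and bouncing under gravity, then maps each particle's height to grain pitch and its horizontal position to stereo pan. Every name, mapping, and constant here is an arbitrary illustrative choice.

```python
import numpy as np

def particle_flow(n_particles=8, steps=400, dt=0.01, g=9.8, damping=0.8, seed=1):
    # Simple physics: gravity pulls the particles down; they bounce on a
    # floor at y = 0, losing energy at each collision.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_particles)
    y = rng.uniform(0.5, 1.0, n_particles)
    vx = rng.uniform(-0.2, 0.2, n_particles)
    vy = np.zeros(n_particles)
    for _ in range(steps):
        vy -= g * dt
        x = x + vx * dt
        y = y + vy * dt
        hit = y < 0.0
        vy[hit] *= -damping          # reflect velocity, dissipate energy
        y[hit] = 0.0
        yield x.copy(), y.copy()

# Map the trajectories onto a grain score: height -> pitch, position -> pan.
score = []
for step, (x, y) in enumerate(particle_flow()):
    for xi, yi in zip(x, y):
        score.append({
            "onset_s": step * 0.01,
            "pitch_hz": 200.0 + 1800.0 * float(np.clip(yi, 0.0, 1.0)),
            "pan": float(np.clip((xi + 1.0) / 2.0, 0.0, 1.0)),
        })
```

As the simulated particles settle, the pitch contour of the grain stream falls and oscillates with each bounce, an audible trace of the underlying collision model.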
Creating sonic fantasies begins with recording, letting us "photograph" real
sounds and store their images on tape. The techniques of montage (cutting,
splicing, and mixing) are essentially manipulations of time structure. As a
pioneer in electronic composition once observed:
If there is one dimension of the music totality, one component which originally led composers to the electronic medium, it was and is the temporal domain. . . . Those who originally turned to electronic tape were obviously attracted to the element of control. After all,
the tape was not a source of sound. Tape is for storage. You can, however, control time as a
measurable distance of tape. Here we are talking about rhythm in every sense of the word.
Not only durational rhythm, but also the time rate of changes of register, of timbre, of
volume, and of those many musical dimensions that were unforeseen until we tried to find
out how we heard and how we could structure the temporal. (Babbitt 1988)
Discs and semiconductors have largely superseded tape. Mixing and editing software has replaced the splicing block. A fundamental capability of this software is zooming in and out across multiple time scales. These tools let us work at the limits of auditory phenomena, from microsurgery on individual sample points to the global rearrangement of sound masses. On the intermediate time scales, we can edit each sound until it has exactly the right duration, density, spectral weight, amplitude envelope, and spatial profile. The timing of each transition can be adjusted precisely.
Much work remains in rethinking existing signal processing paradigms to take the micro time domain into account. The question is: how can researchers integrate windowing, scattering, and density manipulations with other signal processing operations on multiple time scales?
Microacoustic phenomena are still not fully understood. Likewise, in science, the study of granular processes has emerged as a distinct scientific discipline, with a focus on the interaction of granular streams confronted by external forces and objects (Behringer and Herrmann 1998). At the same time, we see an emerging science of disordered systems, phase transitions, intermittencies,
and particle simulations. These too may serve as fertile analogies for musical
processes.
Guidebooks to the sonic territories, including the present one, are incomplete. Few maps exist, and shortcuts are scarce. So we base our composition strategy on a heuristic search for remarkable sound objects, mesostructures, and transformations. It is the job of each composer to discover the interrelations between operations on the micro time scale and their perceptual effects at other time scales.
Listeners are increasingly open to the beauties found in complex transient elements, particularly when they appear in lush combination textures. The acceptance of microsounds also reflects an increasing sophistication in their deployment. When a synthesis technique is first invented, it is not evident how best to compose with it; every new instrument requires practice. Much experience has been gained with microsound, and every composer who touches this resource brings new insights to the puzzle.
Gradually this strange terrain will become familiar as the storehouse of signature gestures accumulates. The understanding shared by only a small circle of composers today will grow more widespread. We need not concern ourselves with whether electronic music will evolve into a language as formal as common-practice harmony (which is not totally formalized); rather, it is our destiny to enjoy our newfound land, to invent materials and codes, and to revel in creative freedom.
References
Allen, J. B., and L. R. Rabiner. 1977. "A unified approach to short-time Fourier analysis and synthesis." Proceedings of the IEEE 65: 1558–1564.
American Technology Corporation. 1998. "HyperSonic sound." Internet: www.atcsd.com.
Apel, K. 1972. Harvard Dictionary of Music. Cambridge, Massachusetts: Harvard University Press.
Arfib, D. 1990. "In the intimacy of a sound." In S. Arnold and G. Hair, eds. Proceedings of the 1990 International Computer Music Conference. pp. 43–45.
Arfib, D. 1991. "Analysis, transformation, and resynthesis of musical sounds with the help of a time-frequency representation." In G. De Poli, A. Piccialli, and C. Roads, eds. 1991. Representations of Musical Signals. Cambridge, Massachusetts: MIT Press. pp. 87–118.
Arfib, D. 1998. "Different ways to write digital audio effects programs." In B. Garau and R. Loureiro, eds. Proceedings 98 Digital Audio Effects Workshop. Barcelona: Pompeu Fabra University. pp. 188–191.
Arfib, D., and N. Delprat. 1992. "Sound Mutations, a program to transform musical sounds." In A. Strange, ed. Proceedings of the 1992 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 442–3.
Arfib, D., and N. Delprat. 1993. "Musical transformations through modifications of time-frequency images." Computer Music Journal 17(2): 66–72.
Arfib, D., and N. Delprat. 1998. "Selective transformation using a time-frequency representation: an application to the vibrato modification." Preprint 4652 (P5S2). Presented at the 104th Convention. New York: Audio Engineering Society.
Arfib, D., J. Dudon, and P. Sanchez. 1996. "WaveLoom, logiciel d'aide à la création de disques photosoniques." Internet: www.ircam.fr/equipes/repmus/jim96/actes/arfib/waveloom.html.
Arkani-Hamed, N., S. Dimopoulos, and G. Dvali. 2000. "The universe's unseen dimensions." Scientific American 283(2): 62–69.
Arneodo, A., F. Argoul, E. Bacry, J. Elezgaray, and J.-F. Muzy. 1995. Ondelettes, multifractales, et turbulences. Paris: Diderot Éditeur.
Babbitt, M. 1962. "Twelve-tone rhythmic structure and the electronic medium." Perspectives of New Music 1(1). Reprinted in B. Boretz and E. Cone, eds. 1972. Perspectives on Contemporary Music Theory. New York: W. W. Norton. pp. 148–179.
Babbitt, M. 1988. "Composition in the electronic medium." In F. Roehmann and F. Wilson, eds. 1988. The Biology of Music-making. Saint Louis: MMB Music. pp. 208–213.
Backhaus, J. 1932. "Über die Bedeutung der Ausgleichsvorgänge in der Akustik." Zeitschrift für technische Physik 13(1): 31–46.
Backus, J. 1962. "die Reihe: a scientific evaluation." Perspectives of New Music 1(1): 160.
Backus, J. 1969. The Acoustical Foundations of Music. New York: Norton.
Bacry, A., A. Grossman, and J. Zak. 1975. "Proof of the completeness of the lattice states in the kq representation." Physical Review B12: 1118.
Barlow, C. 1997. "On the spectral analysis of speech for subsequent resynthesis by acoustic instruments." In F. Barrière and G. Bennett, eds. Analyse en Musique Électroacoustique. Bourges: Éditions Mnemosyne. pp. 276–283.
Barnwell, T., and C. Richardson. 1995. "The discrete-time wavelet transformation and audio coding." Preprint 4047 (A-2). Presented at the 99th Convention of the Audio Engineering Society. New York: Audio Engineering Society.
Barrett, N. 1997. "Structuring processes in electroacoustic music." Ph.D. thesis. London: City University.
Barrett, N. 1998. "Little Animals: compositional ideas, a brief summary." Unpublished manuscript.
Barrett, N. 1999. "Little Animals: compositional structuring processes." Computer Music Journal 23(2): 11–18.
Bartetzki, A. 1997a. "CMask: ein stochastischer Eventgenerator für Csound." Mitteilungen 26. Berlin: DegeM.
Bartetzki, A. 1997b. "Csound score generation and granular synthesis with CMask." Internet: www.kgw-tu-berlin.de/~abart/CMaskPaper/cmask-article.html.
Barzun, J. 1961. "The request for the loan of your ears." In H. Russcol. 1972. The Liberation of Sound. Englewood Cliffs: Prentice-Hall. pp. ix–xii.
Bar-Joseph, Z., D. Lischinski, M. Werman, S. Dubnov, and R. El-Yaniv. 1999. "Granular synthesis of sound textures using statistical learning." In J. Fung, ed. Proceedings of the 1999 International Computer Music Conference. pp. 178–181.
Bass, S., and T. Goeddel. 1981. "The efficient digital implementation of subtractive music synthesis." IEEE Micro 1(3): 24–37.
Bastiaans, M. 1980. "Gabor's expansion of a signal into Gaussian elementary signals." Proceedings of the IEEE 68: 538–539.
Bastiaans, M. 1985. "On the sliding-window representation of signals." IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-33(4): 868–873.
Bastiaans, M., and M. Geilen. 1996. "On the discrete Gabor transform and the discrete Zak transform." Signal Processing 49(3): 151–166.
Bayle, F. 1989. "La musique acousmatique ou l'art des sons projetés." Paris: Encyclopedia Universalis.
Bayle, F. 1993. Musique Acousmatique. Paris: Institut National de l'Audiovisuel/Groupe de Recherches Musicales et Buchet/Chastel.
Beauchamp, J. 1969. "A computer system for time-variant harmonic analysis and synthesis of musical tones." In H. von Foerster and J. Beauchamp, eds. Music by Computers. New York: Wiley. pp. 19–62.
Beauchamp, J. 1975. "Analysis and synthesis of cornet tones using nonlinear interharmonic relationships." Journal of the Audio Engineering Society 23(10): 718–795.
Beekman, I. 1604–1634. Journal tenu par Isaac Beekman de 1604 à 1634. Four volumes. C. de Waard, ed. 1953. The Hague.
Behles, G., S. Starke, and A. Röbel. 1998. "Quasi-synchronous and pitch-synchronous granular sound processing with Stampede II." Computer Music Journal 22(2): 44–51.
Behringer, R., and H. J. Herrmann, eds. 1998. Granular Matter. Volume 1. Berlin: Springer-Verlag. (journal).
Benade, A. 1990. Fundamentals of Musical Acoustics. New York: Dover Publications. Originally published 1976.
Bencina, R. 2000. AudioMulch software. Internet: www.audiomulch.com.
Bennett, G., and X. Rodet. 1989. "Synthesis of the singing voice." In M. Mathews and J. Pierce, eds. Current Directions in Computer Music Research. Cambridge, Massachusetts: MIT Press. pp. 19–44.
Berg, P. 1978. "A user's manual for SSP." Utrecht: Institute of Sonology.
Berkley, S. 1994. "QuickMQ: a software tool for the modification of time-varying spectrum analysis files." M.S. thesis. Hanover: Department of Music, Dartmouth College.
Blauert, J. 1997. Spatial Hearing. Cambridge, Massachusetts: MIT Press.
Bode, H. 1984. "History of electronic sound modification." Journal of the Audio Engineering Society 32(10): 730–739.
Boerger, G. 1965. "Die Lokalisation von Gausstönen." Doctoral dissertation. Berlin: Technische Universität Berlin.
Boulanger, R., ed. 2000. The Csound Book. Cambridge, Massachusetts: MIT Press.
Boulez, P. 1960. "Form." Darmstadt lecture, reprinted in P. Boulez. 1986. Orientations. London: Faber and Faber. pp. 90–96. Translated by M. Cooper.
Bowcott, P. 1989. "Cellular automation as a means of high level compositional control of granular synthesis." In T. Wells and D. Butler, eds. 1989. Proceedings of the 1989 International Computer Music Conference. San Francisco: Computer Music Association. pp. 55–57.
Boyer, F., and R. Kronland-Martinet. 1989. "Granular resynthesis and transformation of sounds through wavelet transform analysis." In T. Wells and T. Butler, eds. Proceedings of the 1989 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 51–54.
Brandenburg, K., and M. Bosi. 1997. "Overview of MPEG audio: current and future standards for low-bit-rate audio coding." Journal of the Audio Engineering Society 45(1/2): 4–21.
Bristow-Johnson, R. 1995. "A detailed analysis of a time-domain formant-corrected pitch-shifting algorithm." Journal of the Audio Engineering Society 43(5): 340–352.
Brün, H. 1970. "From musical ideas to computers and back." In H. Lincoln, ed. The Computer and Music. Ithaca: Cornell University Press. pp. 23–41.
Brün, H. 1983. Compositions. Program notes with long-play recording Nonsequitur 13. Champaign: Non Sequitur Records.
Budon, O. 2000. "Composing with objects, networks, and time scales: an interview with Horacio Vaggione." Computer Music Journal 24(3): 9–22.
Bumgardner, J. 1997. Syd 1.0 User Manual. Internet: www.jbum.com/jbum.
Buser, P., and M. Imbert. 1992. Audition. Cambridge, Massachusetts: MIT Press.
Butler, D. 1992. The Musician's Guide to Perception and Cognition. New York: Schirmer Books.
Cage, J. 1937. "The future of music: credo." Lecture reprinted in Cage (1973). pp. 3–6.
Cage, J. 1959. "History of experimental music in the United States." Lecture reprinted in Cage (1973). pp. 67–75.
Cage, J. 1973. Silence. Middletown: Wesleyan University Press.
Cahill, T. 1897. U.S. Patent 580,035.
Cahill, T. 1914. U.S. Patent 1,107,261.
Cahill, T. 1917. U.S. Patents 1,213,803 and 1,213,804.
Cahill, T. 1919. U.S. Patent 1,295,691.
Calvet, D., C. Vallée, R. Kronland, and T. Voinier. 2000. "Descriptif technique du Cosmophone à 24 voies." Internet: cosmophone.in2p3.fr/docs/cosmo24.pdf.
Castonguay, C. 1972. Meaning and Existence in Mathematics. New York: Springer-Verlag.
Castonguay, C. 1973. "Mathematics and ontology." In M. Bunge, ed. The Methodological Unity of Science. Dordrecht: D. Reidel. pp. 15–22.
Cavaliere, S., and A. Piccialli. 1997. "Granular synthesis of musical signals." In C. Roads, S. Pope, A. Piccialli, and G. De Poli, eds. Musical Signal Processing. Lisse: Swets & Zeitlinger.
Cavaliere, S., G. Evangelista, and A. Piccialli. 1988. "Synthesis by phase modulation and its implementation in hardware." Computer Music Journal 12(1): 29–42.
Cavendish, Margaret Lucas. 1653. Poems and Fancies. London.
Chadabe, J. 1997. Electric Sound. Upper Saddle River: Prentice-Hall.
Chaitin, G. 1998. The Limits of Mathematics. Singapore: Springer-Verlag Singapore.
Chávez, C. 1936. Toward a New Music. Reprinted 1975. New York: Da Capo Press.
Cheng, C. 1996. "Wavelet signal processing of digital audio with applications in electroacoustic music." M.A. thesis. Hanover: Department of Music, Dartmouth College.
Cheng, C. 1997. "High-frequency compensation of low sample-rate audio files: a wavelet-based spectral excitation algorithm." In T. Rikakis, ed. Proceedings of the 1997 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 458–461.
Chion, M. 1982. La musique électroacoustique. Paris: Presses Universitaires de France.
Chou Wen-Chung. 1966. "A Varèse chronology." Perspectives of New Music. Reprinted in B. Boretz and E. Cone, eds. 1971. Perspectives on American Composers. New York: Norton. pp. 55–58.
Chowning, J. 1973. "The synthesis of complex audio spectra by means of frequency modulation." Journal of the Audio Engineering Society 21(7): 526–534. Reprinted in C. Roads and J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Massachusetts: MIT Press. pp. 6–29.
Christensen, E. 1996. The Musical Timespace. Aalborg: Aalborg University Press.
Clark, M. 1993. "Audio technology in the United States to 1943 and its relationship to magnetic recording." Preprint 3481 (H21). New York: Audio Engineering Society.
Clarke, M. 1996. "Composing at the intersection of time and frequency." Organised Sound 1(2): 107–117.
Clarke, M. 1998. "Extending Contacts: the concept of unity in computer music." Perspectives of New Music 36(1): 221–239.
Clozier, C. 1998. "Composition-diffusion/interprétation en musique électroacoustique." In F. Barrière and G. Bennett, eds. Composition/Diffusion en Musique Électroacoustique. Bourges: Éditions Mnemosyne. pp. 52–101.
Cochran, W. 1973. The Dynamics of Atoms in Crystals. London: Edward Arnold.
Coelho, V., ed. 1992. Music and Science in the Age of Galileo. Dordrecht: Kluwer Academic Publishers.
Cogan, R. 1984. New Images of Musical Sound. Cambridge, Massachusetts: Harvard University Press.
Cohen, H. 1984. Quantifying Music. Dordrecht: D. Reidel Publishing Company.
Cook, P. 1996. "Physically informed sonic modeling (PhISM): percussive synthesis." In L. Ayers and A. Horner, eds. Proceedings of the 1996 International Computer Music Conference. pp. 228–231.
Cook, P. 1997. "Physically informed sonic modeling (PhISM): synthesis of percussive sounds." Computer Music Journal 21(3): 38–49.
Cooley, J., and J. Tukey. 1965. "An algorithm for the machine computation of complex Fourier series." Mathematical Computation 19: 297–301.
Cope, D. 1996. Experiments in Musical Intelligence. Madison: A-R Editions.
Correa, J., E. Miranda, and J. Wright. 2000. "Categorising complex dynamic sounds." Internet: www.nyrsound.com.
Cowell, H. 1930. New Musical Resources. New York: A. A. Knopf. Reprinted 1996. Cambridge, England: Cambridge University Press.
Craven, P. G., and M. A. Gerzon. 1996. "Lossless coding for audio discs." Journal of the Audio Engineering Society 44(9): 706–720.
Crawford, F. 1968. Waves. Berkeley Physics Course Volume 3. New York: McGraw-Hill.
Dahlhaus, C. 1970. "Aesthetische Probleme der elektronischen Musik." In F. Winckel, ed. Experimentelle Musik. Berlin: Mann-Verlag.
D'Alessandro, C., and X. Rodet. 1989. "Synthèse et analyse-synthèse par fonctions d'ondes formantiques." Journal d'Acoustique 2: 163–169.
Davies, H. 1964. "Die Reihe Reconsidered, 1." Composer 14: 20.
Davies, H. 1965. "Die Reihe Reconsidered, 2." Composer 16: 17.
de Broglie, L. 1945. Ondes corpuscules mécanique ondulatoire. Paris: Albin Michel.
de Campo, A. 1998. "Using recursive diminution for the synthesis of micro time events." Unpublished manuscript.
de Campo, A. 1999. SuperCollider Tutorial Workshop. Documentation provided with the program. Austin: AudioSynth.
Delprat, N., P. Guillemain, and R. Kronland-Martinet. 1990. "Parameter estimation for non-linear resynthesis methods with the help of a time-frequency analysis of natural sounds." In S. Arnold and G. Hair, eds. Proceedings of the 1990 International Computer Music Conference. pp. 88–90.
De Poli, G., and A. Piccialli. 1988. "Forme d'onda per la sintesi granulare sincronica." In D. Tommassini, ed. 1988. Atti di VII Colloquio di Informatica Musicale. Rome: Associazione Musica Verticale. pp. 70–75.
De Poli, G., and A. Piccialli. 1991. "Pitch-synchronous granular synthesis." In G. De Poli, A. Piccialli, and C. Roads, eds. Representations of Musical Signals. Cambridge, Massachusetts: MIT Press. pp. 187–219.
de Reydellet, J. 1999. Personal communication.
Desantos, S. 1997. "Acousmatic morphology: an interview with François Bayle." Computer Music Journal 21(3): 11–19.
Di Scipio, A. 1990. "Composition by exploration of nonlinear dynamical systems." In S. Arnold and G. Hair, eds. Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 324–327.
Di Scipio, A. 1994. "Formal processes of timbre composition challenging the dualistic paradigm of computer music." Proceedings of ISEA, from Internet: www.uiah.fi/bookshop/isea_proc/nextgen/j/19.html.
Di Scipio, A. 1995. "Da Concret PH a GENDY301, modelli compositivi nella musica elettroacustica di Xenakis." Sonus: materiali per la musica contemporanea 7(13): 61–92.
Di Scipio, A. 1997a. "The problem of 2nd-order sonorities in Xenakis' electroacoustic music." Organised Sound 2(3): 165–178.
Di Scipio, A. 1997b. "Interactive micro-time sound design." Journal of Electroacoustic Music 10: 4–8.
Di Scipio, A. 1998. "Scienza e musica dei quanti acustici: l'eredità di Gabor." Il Monocordo 3(6): 61–78.
Dodge, C., and T. Jerse. 1997. Computer Music: Synthesis, Composition, and Performance. Second edition. New York: Schirmer.
Dolson, M. 1983. "A tracking phase vocoder and its use in the analysis of ensemble sounds." Ph.D. dissertation. Pasadena: California Institute of Technology.
Dolson, M. 1985. "Recent advances in musique concrète at CARL." In B. Truax, ed. Proceedings of the 1985 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 55–60.
Dolson, M. 1986. "The phase vocoder: a tutorial." Computer Music Journal 10(4): 14–27.
Dolson, M., and R. Boulanger. 1985. "New directions in the musical use of resonators." Unpublished manuscript.
Doughty, J., and W. Garner. 1947. "Pitch characteristics of short tones I: two kinds of pitch thresholds." Journal of Experimental Psychology 37: 351–365.
Douglas, A. 1957. The Electrical Production of Music. New York: Philosophical Library.
Dudon, J., and D. Arfib. 1990. "Synthèse photosonique." Actes du 1er congrès d'Acoustique, colloque C2. Marseille. pp. 845–848.
Dutilleux, P., A. Grossmann, and R. Kronland-Martinet. 1988. "Application of the wavelet transform to the analysis, transformation, and synthesis of musical sounds." Preprint 2727 (A2). Presented at the 85th Convention. New York: Audio Engineering Society.
Eckel, G. 1990. "A signal editor for the IRCAM Musical Workstation." In S. Arnold and G. Hair, eds. Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 69–71.
Eckel, G., M. Rocha-Iturbide, and B. Becker. 1995. "The development of GiST, a granular synthesis toolkit based on an extension of the FOF generator." In R. Bidlack, ed. Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 296–302.
Eimert, H. 1955. "What is electronic music." die Reihe 1: 1–10. English edition 1958. Bryn Mawr: Theodore Presser Company.
Einstein, A. 1952. Relativity: The Special and the General Theory. Fifteenth edition of the original published in 1916. New York: Three Rivers Press.
Elmore, W., and M. Heald. 1969. Physics of Waves. New York: Dover.
Endrich, A., coordinator. 2000. Composer's Desktop Project. 12 Goodwood Way, Cepen Park South, Chippenham, Wiltshire SN14 0SY, United Kingdom. archer@trans4um.demon.co.uk.
Erbe, T. 1995. SoundHack User's Manual. Oakland: Mills College.
Erne, M. 1998. "Embedded audio compression using wavelets and improved psychoacoustic models." In B. Garau and R. Loureiro, eds. Proceedings 98 Digital Audio Effects Workshop. Barcelona: Pompeu Fabra University. pp. 147–150.
Evangelista, G. 1991. "Wavelet transforms that we can play." In G. De Poli, A. Piccialli, and C. Roads, eds. 1991. Representations of Musical Signals. Cambridge, Massachusetts: MIT Press. pp. 119–136.
Evangelista, G. 1997. "Wavelet representation of musical signals." In C. Roads et al., eds. Musical Signal Processing. Lisse: Swets & Zeitlinger. pp. 126–153.
Evangelista, G., and S. Cavaliere. 1998. "Dispersive and pitch-synchronous processing of sounds." In B. Garau and R. Loureiro, eds. Proceedings 98 Digital Audio Effects Workshop. Barcelona: Pompeu Fabra University. pp. 232–236.
Fairbanks, G., W. Everitt, and R. Jaeger. 1954. "Method for time or frequency compression-expansion of speech." Institute of Radio Engineers Transactions on Audio AV-2(1): 7–12.
Feichtinger, H., and T. Strohmer, eds. 1998. Gabor Analysis and Algorithms. Boston: Birkhäuser.
Fields, K. 1998. Personal communication.
Fitz, K., L. Haken, and B. Holloway. 1995. Lemur Pro 4.0.1 documentation. Urbana: Engineering Research Laboratory, University of Illinois.
Flanagan, J. L., and R. Golden. 1966. "Phase vocoder." Bell System Technical Journal 45: 1493–1509.
Flanagan, J. L. 1972. Speech Analysis, Synthesis, and Perception. New York: Springer-Verlag.
Fletcher, N., and T. Rossing. 1991. The Physics of Musical Instruments. New York: Springer-Verlag.
Fokker, A. 1962. "Wherefore, and Why?" die Reihe 8. English edition 1968, 68–79.
Fourier, L. 1994. "Jean-Jacques Perrey and the Ondioline." Computer Music Journal 18(4): 18–25.
Fraisse, P. 1982. "Rhythm and tempo." In D. Deutsch, ed. The Psychology of Music. Orlando: Academic.
Freed, A. 1987. "MacMix: recording, mixing, and signal processing on a personal computer." In J. Strawn, ed. 1987. Music and Digital Technology. New York: Audio Engineering Society. pp. 158–162.
Freedman, M. D. 1967. "Analysis of musical instrument tones." Journal of the Acoustical Society of America 41: 793–806.
Freeman, W. 1991. "The physiology of perception." Scientific American 264(2): 78–85.
Freeman, W. 1995. "Chaos in the central nervous system." In F. Ventriglia, ed. Neural Modeling and Neural Networks. New York: Pergamon Press. pp. 185–216.
Gabor, D. 1946. "Theory of communication." Journal of the Institute of Electrical Engineers Part III, 93: 429–457.
Gabor, D. 1947. "Acoustical quanta and the theory of hearing." Nature 159(4044): 591–594.
Gabor, D. 1952. "Lectures on communication theory." Technical Report 238, Research Laboratory of Electronics. Cambridge, Massachusetts: Massachusetts Institute of Technology.
Galante, F., and N. Sani. 2000. Musica Espansa. Lucca: Ricordi LIM.
Galilei, Galileo. 1623. "Parable of Sound." Quoted in V. Coelho. "Musical myth and Galilean science in Giovanni Serodine's Allegoria della Scienza." See V. Coelho (1992).
Gardner, M. 1957. Fads and Fallacies in the Name of Science. New York: Dover.
Gardner, W. 1995. "Efficient convolution without input-output delay." Journal of the Audio Engineering Society 43(3): 127–136.
Gassendi, P. 1658. Syntagma Philosophicum.
George, E., and M. Smith. 1992. "Analysis-by-synthesis/overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones." Journal of the Audio Engineering Society 40(6): 497–516.
Gerrard, G. 1989. "Music4C: a Macintosh version of Music IVBF in C." Melbourne: University of Melbourne.
Gerzon, M. 1973. "Periphony: with-height sound reproduction." Journal of the Audio Engineering Society 21(1): 2–10.
Gibson, B., C. Jubien, and B. Roden. 1996. "Method and apparatus for changing the timbre and/or pitch of audio signals." U.S. Patent 5,567,901.
Gleick, J. 1988. Chaos. London: Cardinal.
Goeyvaerts, K. 1955. "The sound material of electronic music." die Reihe 1: 35–7. English edition 1958. Bryn Mawr: Theodore Presser Company.
Gogins, M. 1991. "Iterated function systems music." Computer Music Journal 15(1): 40–48.
Gogins, M. 1995. "Gabor synthesis of recurrent iterated function systems." In R. Bidlack, ed. Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 349–350.
Gordon, J. 1996. "Psychoacoustics in computer music." In C. Roads. 1996. The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press. pp. 1053–1069.
Green, D. 1971. "Temporal auditory acuity." Psychological Review 78(6): 540–551.
Grey, J. 1975. "An exploration of musical timbre." Report STAN-M-2. Stanford University Department of Music.
Grossman, A., M. Holschneider, R. Kronland-Martinet, and J. Morlet. 1987. "Detection of abrupt changes in sound signals with the help of wavelet transforms." Inverse Problems: Advances in Electronics and Electron Physics. Supplement 19. San Diego: Academic Press. pp. 298–306.
Hamlin, P., with C. Roads. 1985. "Interview with Herbert Brün." In C. Roads, ed. Composers and the Computer. Madison: A-R Editions. pp. 1–15.
Harley, J. Forthcoming. The Music of Iannis Xenakis. London: Harwood Academic Publishers.
Harris, F. 1978. "On the use of windows for harmonic analysis with the discrete Fourier transform." Proceedings of the IEEE 66(1): 51–83.
Hawking, S., and R. Penrose. 1996. "The nature of space and time." Scientific American 248(7): 60–65.
Heisenberg, W. 1958. Physics and Philosophy. New York: Harper.
Helmholtz, H. 1885. On the Sensations of Tone. Translated by A. Ellis. New edition 1954. New York: Dover Publications.
Helmuth, M. 1991. "Patchmix and StochGran: two graphical interfaces." In B. Alphonce and B. Pennycook, eds. Proceedings of the 1991 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 563–565.
Helmuth, M. 1993. "Granular synthesis with Cmix and Max." In S. Ohteru, ed. Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 449–452.
Helstrom, C. 1966. "An expansion of a signal in Gaussian elementary signals." IEEE Transactions on Information Theory IT-12: 81–82.
Hiller, L., and L. Isaacson. 1959. Experimental Music. New York: McGraw-Hill.
Hiller, L. 1970. "Music composed with computers: a historical survey." In H. Lincoln, ed. The Computer and Music. Ithaca: Cornell University Press. pp. 42–96.
Hoffmann, P. 1994. Amalgam aus Kunst und Wissenschaft: Naturwissenschaftliches Denken im Werk von Iannis Xenakis. Frankfurt: Peter Lang.
Hoffmann, P. 1996. "Amalgamation of art and science: scientific thought in the work of Iannis Xenakis." Abbreviated English version of Hoffmann (1994). Unpublished manuscript.
Hoffmann, P. 1997. Personal communication.
Holden, A. 1986. Chaos. Princeton: Princeton University Press.
Howard, E. 1996. Personal communication.
Howe, H. S., Jr. 1975. Electronic Music Synthesis. New York: Norton.
Hugo, V. 1862. Les Misérables. Reprinted 1999. Paris: Gallimard.
Ives, C. 1962. In H. Boatwright, ed. Essays Before a Sonata and Other Writings. New York: W. W. Norton. p. 98.
Jenny, G. 1955–56. "Initiation à la lutherie électronique." Toute la Radio (September 1955): 289–94, (November 1955): 397–404, (December 1955): 455–9, (January 1956): 23–6, (February 1956): 67–72.
Jenny, G. 1958. "L'Ondioline: conception et réalisation." Toute la Radio.
Jones, D., and T. Parks. 1986. "Time scale modification of signals using a synchronous Gabor technique." Proceedings of the 1986 ASSP Workshop on Applications of Signal Processing to Audio and Acoustics. New York: IEEE.
Jones, D., and T. Parks. 1988. "Generation and combination of grains for music synthesis." Computer Music Journal 12(2): 27–34.
Kaegi, W. 1967. Was Ist Elektronische Musik. Zürich: Orell Füssli Verlag.
Kaegi, W. 1973. "A minimum description of the linguistic sign repertoire (part 1)." Interface 2: 141–156.
Kaegi, W. 1974. "A minimum description of the linguistic sign repertoire (part 2)." Interface 3: 137–158.
Kaegi, W., and S. Tempelaars. 1978. "VOSIM: a new sound synthesis system." Journal of the Audio Engineering Society 26(6): 418–426.
Kaiser, G. 1994. A Friendly Guide to Wavelets. Boston: Birkhäuser.
Kaku, M. 1995. Hyperspace. New York: Anchor Books.
Kaler, J. 1997. Cosmic Clouds. New York: Scientific American Library.
Kandinsky, W. 1926. Point et ligne sur plan. 1991 edition. Paris: Gallimard.
Kargon, R. H. 1966. Atomism in England from Hariot to Newton. Oxford: Clarendon Press.
Keller, D. 1999. Touch 'n Go. Enhanced compact disc with synthesis data. ES 99002. New Westminster, British Columbia: Earsay.
Keller, D., and C. Rolfe. 1998. "The corner effect." Proceedings of the XII Colloquium on Musical Informatics. Gorizia: Centro Polifunzionale di Gorizia. Also see Internet: www.thirdmonk.com.
Keller, D., and B. Truax. 1998. "Ecologically-based granular synthesis." In M. Simoni, ed. Proceedings of the 1998 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 117–120.
Kientzle, T. 1998. A Programmer's Guide to Sound. Reading, Massachusetts: Addison-Wesley.
Koenig, G. M. 1959. "Studium im Studio." die Reihe 5. English edition 1961. Bryn Mawr: Theodore Presser Company. pp. 30–39.
Koenig, G. M. 1962. "Commentary on Stockhausen's . . . how time passes . . ., on Fokker's wherefore, and why, and on present musical practice as seen by the author." die Reihe 8. English edition 1968. Bryn Mawr: Theodore Presser Company. pp. 80–98.
Koenig, R. 1899. Articles in Annalen der Physik 69: 626–660, 721–738. Cited in Miller 1916, 1935.
Koenigsberg, C. 1991. "Stockhausen's new morphology of musical time." Posted June 1996. Internet: www2.uchicago.edu/ns-acs/ckk/smmt.
Krenek, E. 1955. "A glance over the shoulders of the young." die Reihe 1: 14–16. English edition 1958. Bryn Mawr: Theodore Presser Company.
Kronland-Martinet, R. 1988. "The wavelet transform for the analysis, synthesis, and processing of speech and music sounds." Computer Music Journal 12(4): 11–20.
Kronland-Martinet, R., and A. Grossman. 1991. "Application of time-frequency and time-scale methods (wavelet transforms) to the analysis, synthesis, and transformation of sounds." In G. De Poli, A. Piccialli, and C. Roads, eds. 1991. Representations of Musical Signals. Cambridge, Massachusetts: MIT Press. pp. 45–85.
Kronland-Martinet, R., Ph. Guillemain, and S. Ystad. 1997. "Modeling of natural sounds by time-frequency and wavelet representations." Organised Sound 2(3): 179–191.
Kronland-Martinet, R., J. Morlet, and A. Grossmann. 1987. "Analysis of sound patterns through wavelet transforms." International Journal on Pattern Recognition and Artificial Intelligence 1(2): 273–302.
Kunt, M. 1981. Traitement numérique des signaux. Paris: Dunod.
Kupper, L. 2000. "Le temps audio-numérique." In C. Clozier and F. Barrière, eds. Les actes de l'académie de musique électroacoustique. Bourges: Institut International de Musique Électroacoustique de Bourges. pp. 94–115.
Kussmaul, C. 1991. "Applications of wavelets in music: the wavelet function library." M.A. thesis. Hanover: Dartmouth College.
Langmead, C. 1995. Perceptual Analysis Synthesis Tool 1.6 Manual. Hanover, New Hampshire: Dartmouth College.
Learned, R., and A. Willsky. 1993. "Wavelet packet approach to transient signal classification." Technical Report LIDS-P-2199, Laboratory for Information and Decision Systems. Cambridge: MIT.
Lederman, L., and D. Schramm. 1995. From Quarks to the Cosmos. Second edition. San Francisco: W. H. Freeman.
Lee, F. 1972. "Time compression and expansion of speech by the sampling method." Journal of the Audio Engineering Society 20(9): 738–742.
Lee, A. 1995. "Csound granular synthesis unit generator." In Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 230–1.
Leichtentritt, H. 1951. Musical Form. Englewood Cliffs: Prentice-Hall.
Lent, K. 1989. "An efficient method for pitch shifting digitally sampled sounds." Computer Music Journal 13(4): 65–71.
Lesbros, V. 1995. "Atelier incrémental pour la musique expérimentale." Thèse de doctorat en Intelligence Artificielle. Paris: Université Paris 8.
Lesbros, V. 1996. "From images to sounds: a dual representation." Computer Music Journal 20(3): 59–69.
Liandrat, J., and F. Moret-Bailly. 1990. "The wavelet transform: some applications to fluid dynamics and turbulence." European Journal of Mechanics, B/Fluids 9: 1–19.
Ligeti, G. 1971. "Fragen und Antworten von mir selbst." Melos 12: 509–516.
Lippe, C. 1993. "A musical application of real-time granular sampling using the IRCAM signal processing workstation." In S. Ohteru, ed. Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 190–193.
Lippman, E. 1992. A History of Western Music Aesthetics. Lincoln: University of Nebraska Press.
Lopez, S., F. Martí, and E. Resina. 1998. "Vocem: an application for real-time granular synthesis." In B. Garau and R. Loureiro, eds. Proceedings 98 Digital Audio Effects Workshop. Barcelona: Pompeu Fabra University. pp. 219–222.
Lorrain, D. 1980. "A panoply of stochastic 'cannons'." Computer Music Journal 4(1): 53–81. Reprinted in C. Roads. 1989. The Music Machine. Cambridge, Massachusetts: MIT Press. pp. 351–379.
Luce, D. 1963. "Physical correlates of nonpercussive instrument tones." Sc.D. dissertation. Cambridge, Massachusetts: MIT Department of Physics.
Lucretius. 55 B.C. De Rerum Natura. In J. Gaskin, ed. 1995. The Epicurean Philosophers. London: Everyman. pp. 79–304.
Maconie, R. 1989. Stockhausen on Music. London: Marion Boyars.
Maher, R., and J. Beauchamp. 1990. "An investigation of vocal vibrato for synthesis." Applied Acoustics 30: 219–245.
Malah, D. 1979. "Time-domain algorithms for harmonic bandwidth reduction and time scaling of speech signals." IEEE Transactions on Acoustics, Speech, and Signal Processing 27(4): 121–133.
Mallat, G. 1988. "Review of multifrequency channel decompositions of images and wavelet models." Technical Report No. 412. New York: New York University, Department of Computer Science.
Mallat, G. 1989. "A theory for multiresolution signal decomposition: the wavelet representation." IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7): 674–693.
Mallat, S. G. 1998. A Wavelet Tour of Signal Processing. San Diego: Academic Press.
Marino, G., J.-M. Raczinski, and M.-H. Serra. 1990. "The new UPIC system." In S. Arnold and G. Hair, eds. Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 249–252.
Marple, S. 1987. Digital Spectral Analysis. Englewood Cliffs: Prentice-Hall.
Masri, P., A. Bateman, and N. Canagarajah. 1997a. "A review of time-frequency representations, with an application to sound/music analysis-resynthesis." Organised Sound 2(3): 193–205.
Masri, P., A. Bateman, and N. Canagarajah. 1997b. "The importance of the time-frequency representation for sound/music analysis-resynthesis." Organised Sound 2(3): 207–214.
Mathews, M. 1969. The Technology of Computer Music. Cambridge, Massachusetts: MIT Press.
Mathews, M., and J. Miller. 1965. Music IV Programmer's Manual. Murray Hill: Bell Telephone Laboratories.
Mathews, M., J. Miller, and E. David, Jr. 1961. "Pitch synchronous analysis of voiced sounds." Journal of the Acoustical Society of America 33: 179–186.
MathWorks, The. 1995. Matlab Reference Guide. Natick: The MathWorks.
McAdams, S. 1982. "Spectral fusion and the creation of auditory images." In M. Clynes, ed. Music, Mind, and Brain. New York: Plenum.
McAdams, S., and A. Bregman. 1979. "Hearing musical streams." Computer Music Journal 3(4): 26–44. Reprinted in C. Roads and J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Massachusetts: MIT Press. pp. 658–698.
McAulay, R., and T. Quatieri. 1986. "Speech analysis/synthesis based on a sinusoidal representation." IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-34: 744–754.
McCartney, J. 1990. Synth-O-Matic version 0.06 software.
McCartney, J. 1994. Synth-O-Matic version 0.45 software.
McCartney, J. 1996. SuperCollider, A Real-time Sound Synthesis Programming Language. Austin: AudioSynth.
McCartney, J. 1998. "Continued evolution of the SuperCollider real time synthesis environment." In M. Simoni, ed. Proceedings of the 1998 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 133–136.
Meijer, P. 1998. "Auditory wavelets?" Internet: ourworld.compuserve.com/homepages/Peter_Meijer/wavelet.htm.
Meridian. 1998. "Meridian lossless packing: provisional data June 1998." Huntington: Boothroyd-Stuart Meridian.
Mersenne, M. 1636. Harmonie Universelle. Reprinted 1957, translated by Roger E. Chapman. The Hague: Martinus Nijhoff.
Meyer, Y. 1994. Les ondelettes: algorithmes et applications. Paris: Armand Colin éditeur.
Meyer, Y., S. Jaffard, and O. Rioul. 1987. "L'analyse par ondelettes." Pour la Science. September. pp. 28–37.
Meyer-Eppler, W. 1959. Grundlagen und Anwendungen der Informationstheorie. Berlin: Springer-Verlag.
Meyer-Eppler, W. 1960. "Zur Systematik der elektrischen Klangtransformation." Darmstädter Beiträge zur Neuen Musik III. Mainz: Schott.
Miller, D. C. 1916. The Science of Musical Sounds. New York: MacMillan.
Miller, H. 1945. The Air-conditioned Nightmare. New York: New Directions.
Miranda, E. 1998. "Chaosynth: a cellular automata-based granular synthesiser." Internet: website.lineone.net/~edandalex/chaosynt.htm.
Moles, A. 1960. Les musiques expérimentales. Zurich: Éditions du Cercle de l'Art Contemporain.
Moles, A. 1968. Information Theory and Esthetic Perception. Urbana: University of Illinois Press.
Moog, R. 1996. "Build the EM Theremin." Electronic Musician 12(2): 86–99.
Moon, F. 1987. Chaotic Vibrations. New York: Wiley-Interscience.
Moore, F. R. 1982. "The computer audio research laboratory at UCSD." Computer Music Journal 6(1): 18–29.
Moore, F. R. 1990. Elements of Computer Music. Englewood Cliffs: Prentice-Hall.
Moorer, J. A. 1977a. "Editing, mixing, and processing digitized audio waveforms." Paper presented at the 57th Convention of the Audio Engineering Society. New York: Audio Engineering Society.
Moorer, J. A. 1977b. "Signal processing aspects of computer music." Proceedings of the IEEE 65(8): 1108–1137. Reprinted in Computer Music Journal 1(1): 4–37 and in J. Strawn, ed. 1985. Digital Audio Signal Processing: An Anthology. Madison: A-R Editions.
Moorer, J. A. 1978. "The use of the phase vocoder in computer music applications." Journal of the Audio Engineering Society 26(1/2): 42–45.
Morawska-Büngler, M. 1988. Schwingende Elektronen. Cologne: P. J. Tonger.
Morris, R. 1987. Composition with Pitch-Classes. New Haven: Yale University Press.
Nahin, P. 1996. The Science of Radio. Woodbury: American Institute of Physics.
Navarro, R., A. Tabernero, and G. Cristobal. 1995. "Image representation with Gabor wavelets and its applications." In P. Hawkes, ed. Advances in Imaging and Electron Physics. Orlando: Academic Press.
Nelson, G. 1997. "Wind, Sand, and Sea Voyages: an application of granular synthesis and chaos to musical composition." Internet: www.timara.oberlin.edu/people/%7Egnelson/papers/Gola/gola.pdf.
Nelson, Jon Christopher. 1996. They Wash Their Ambassadors in Citrus and Fennel. On the compact disc Cultures Electroniques 9. LCD 278060/61. Bourges: Série Bourges/Unesco/Cime.
Neve, R. 1992. "Rupert Neve of Amek replies." Studio Sound 34(3): 21–22.
Newland, D. 1994. "Harmonic and musical wavelets." Proceedings of the Royal Society of London A 444: 605–620.
Norris, M. 1997. SoundMagic 1.0.3 Documentation. Wellington, New Zealand: Michael Norris.
Nuttall, A. 1981. "Some windows with very good sidelobe behavior." IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29(1): 84–91.
Nyquist, H. 1928. "Certain topics in telegraph transmission theory." Transactions of the American Institute of Electrical Engineers 4.
Olson, H. F. 1957. Acoustical Engineering. New York: D. Van Nostrand. Reprinted 1991. Philadelphia: Professional Audio Journals.
Oohashi, T., E. Nishina, N. Kawai, Y. Fuwamoto, and H. Imai. 1991. "High frequency sound above the audible range affects brain electric activity and sound perception." Preprint 3207 (W1). Presented at the 91st Convention of the Audio Engineering Society. New York: Audio Engineering Society.
Oohashi, T., E. Nishina, Y. Fuwamoto, and N. Kawai. 1993. "On the mechanism of hypersonic effect." In S. Ohteru, ed. Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 432–434.
Orcalli, A. 1993. Fenomenologia della Musica Sperimentale. Potenza: Sonus Edizioni Musicali.
Orton, R., A. Hunt, and R. Kirk. 1991. "Graphical control of granular synthesis using cellular automata and the Freehand program." In B. Alphonce and B. Pennycook, eds. Proceedings of the 1991 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 416–418.
Otis, A., G. Grossman, and J. Cuomo. 1968. "Four sound-processing programs for the Illiac II computer and D/A converter." Experimental Music Studios Technical Report Number 14. Urbana: University of Illinois.
Pape, G. 1998. Personal communication.
Piccialli, A., S. Cavaliere, I. Ortosecco, and P. Basile. 1992. "Modifications of natural sounds using a pitch synchronous technique." In A. Piccialli, ed. Proceedings of the International Workshop on Models and Representations of Musical Signals. Napoli: Università di Napoli Federico II.
Pierce, A. D. 1994. Acoustics: An Introduction to Its Physical Principles and Applications. Woodbury, New York: Acoustical Society of America.
Pierce, J. R. 1983. The Science of Musical Sound. New York: W. H. Freeman.
Pines, D. 1963. Elementary Excitations in Solids. New York: W. A. Benjamin.
Polansky, L., and T. Erbe. 1996. "Spectral mutation in SoundHack." Computer Music Journal 20(1): 92–101.
Pompei, F. J. 1998. "The use of airborne ultrasonics for generating audible sound beams." Preprint 4853 (I5). New York: Audio Engineering Society.
Pompei, F. J. 1999. "The use of airborne ultrasonics for generating audible sound beams." Journal of the Audio Engineering Society 47(9): 726–731.
Pope, S. T. 1997. Sound and Music Processing in SuperCollider. Santa Barbara: www.create.ucsb.edu.
Portnoff, M. 1976. "Implementation of the digital phase vocoder using the fast Fourier transform." IEEE Transactions on Acoustics, Speech, and Signal Processing 24(3): 243–248.
Portnoff, M. 1978. "Time-scale modification of speech based on short-time Fourier analysis." Sc.D. dissertation. Cambridge, Massachusetts: MIT Department of Electrical Engineering and Computer Science.
Portnoff, M. 1980. "Time-frequency representation of digital signals and systems based on short-time Fourier analysis." IEEE Transactions on Acoustics, Speech, and Signal Processing 28: 55–69.
Portnoff, M. 1981. "Time-scale modification of speech based on short-time Fourier analysis." IEEE Transactions on Acoustics, Speech, and Signal Processing 29(3): 374–390.
Pound, E. 1934. Quoted in R. Murray Schafer, ed. 1977. Ezra Pound and Music: the Complete Criticism. New York: New Directions.
Pranger, M. 1999. MarcoHack Version 1 Manual. Internet: www.koncon.nl/MarcoHack.
Quate, C. 1979. "The acoustic microscope." Scientific American. Reprinted in 1998, Science's Vision: The Mechanics of Sight. New York: Scientific American. pp. 31–39.
Rabiner, L., and B. Gold. 1975. Theory and Application of Digital Signal Processing. Englewood Cliffs: Prentice-Hall.
Rabiner, L., J. Cooley, H. Helms, L. Jackson, J. Kaiser, C. Rader, R. Schafer, K. Steiglitz, and C. Weinstein. 1972. "Terminology in digital signal processing." IEEE Transactions on Audio and Electroacoustics AU-20: 322–7.
Raczinski, J.-M., and G. Marino. 1988. "A real time synthesis unit." In C. Lischka and J. Fritsch, eds. Proceedings of the 1988 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 90–100.
Raczinski, J.-M., G. Marino, and M.-H. Serra. 1991. "New UPIC system demonstration." In B. Alphonce and B. Pennycook, eds. Proceedings of the 1991 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 567–570.
Ramarapu, P., and R. Maher. 1997. "Methods for reducing audible artefacts in a wavelet-based broad-band denoising system." Journal of the Audio Engineering Society 46(3): 178–190.
Read, O., and W. Welch. 1976. From Tin Foil to Stereo: Evolution of the Phonograph. Indianapolis: Howard Sams.
Reder, L., and J. S. Gordon. 1997. "Subliminal perception: nothing special, cognitively speaking." In J. Cohen and J. Schooler, eds. Cognitive and Neuropsychological Approaches to the Study of Consciousness. Hillsdale, New Jersey: Lawrence Erlbaum Associates. pp. 125–134.
Reeves, W. 1983. "Particle systems: a technique for modeling a class of fuzzy objects." ACM Transactions on Graphics 2(2): 359–376.
Rhea, T. 1972. "The evolution of electronic musical instruments in the United States." Ph.D. dissertation. Nashville: Peabody College.
Rhea, T. 1984. "The history of electronic musical instruments." In T. Darter, ed. 1984. The Art of Electronic Music. New York: Quill. pp. 1–63.
Ricci, A. 1997. SoundMaker software. Distributed by MicroMat Computer Systems.
Risset, J.-C. 1966. "Computer study of trumpet tones." Murray Hill: Bell Telephone Laboratories.
Risset, J.-C. 1969. An Introductory Catalog of Computer Synthesized Sounds (With Sound Examples). Murray Hill: Bell Laboratories. Reprinted 1995 with the compact disc The Historical CD of Digital Sound Synthesis, Computer Music Currents 13, Wergo 2033-2. Mainz: Wergo Schallplatten.
Risset, J.-C. 1985. "Computer music experiments: 1964 . . ." Computer Music Journal 9(1): 11–18. Reprinted in C. Roads, ed. 1989. The Music Machine. Cambridge, Massachusetts: MIT Press. pp. 67–74.
Risset, J.-C. 1989a. "Paradoxical sounds." In M. Mathews and J. Pierce, eds. 1989. Current Directions in Computer Music Research. Cambridge, Massachusetts: The MIT Press. pp. 149–158.
Risset, J.-C. 1989b. "Additive synthesis of inharmonic tones." In M. Mathews and J. Pierce, eds. 1989. Current Directions in Computer Music Research. Cambridge, Massachusetts: The MIT Press. pp. 159–163.
Risset, J.-C. 1991. "Timbre analysis by synthesis: representations, imitations, and variants for musical composition." In G. De Poli, A. Piccialli, and C. Roads, eds. 1991. Representations of Musical Signals. Cambridge, Massachusetts: MIT Press. pp. 7–43.
Risset, J.-C. 1992. "Composing sounds with computers." In J. Paynter, T. Howell, R. Orton, and P. Seymour, eds. Companion to Contemporary Musical Thought. London: Routledge. pp. 583–621.
Risset, J.-C. 1996. "Définition de la musique électroacoustique." In G. Bennett, C. Clozier, S. Hanson, C. Roads, and H. Vaggione, eds. Esthétique et Musique Électroacoustique. Bourges: Mnemosyne. pp. 82–84.
Risset, J.-C. 1997. "Problèmes d'analyse: quelques clés pour mes premières pièces." In F. Barrière and G. Bennett, eds. Analyse en Musique Électroacoustique. Bourges: Éditions Mnemosyne. pp. 169–177.
Risset, J.-C. 1998. "Examples of the musical use of digital audio effects." In B. Garau and R. Loureiro, eds. Proceedings 98 Digital Audio Effects Workshop. Barcelona: Pompeu Fabra University. pp. 254–259.
Risset, J.-C. 1999a. Personal communication.
Risset, J.-C. 1999b. Overview of research at the Laboratoire de Mécanique et d'Acoustique. Internet: omicron.cnrs-mrs.fr.
Risset, J.-C., and D. Wessel. 1982. "Exploration of timbre by analysis and synthesis." In D. Deutsch, ed. 1982. The Psychology of Music. Orlando: Academic. pp. 25–58.
Roads, C. 1975. "Computer music studies 1974–1975." Unpublished manuscript. 44 pages.
Roads, C. 1976. "A systems approach to composition." Honors thesis. La Jolla: University of California, San Diego.
Roads, C. 1978a. "Automated granular synthesis of sound." Computer Music Journal 2(2): 61–62. Revised and updated version printed as "Granular synthesis of sound" in C. Roads and J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Massachusetts: MIT Press. pp. 145–159.
Roads, C. 1978b. "Interview with Gottfried Michael Koenig." Computer Music Journal 2(3): 11–15. Reprinted in C. Roads and J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Massachusetts: MIT Press. pp. 568–580.
Roads, C., ed. 1985a. Composers and the Computer. Madison: A-R Editions.
Roads, C. 1985b. "The realization of nscor." In C. Roads, ed. 1985. Composers and the Computer. Madison: A-R Editions. pp. 140–168.
Roads, C. 1985c. "Granular synthesis of sound." In C. Roads and J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Massachusetts: MIT Press. pp. 145–159.
Roads, C. 1985d. "Grammars as representations for music." In C. Roads and J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Massachusetts: MIT Press. pp. 403–442.
Roads, C. 1987. "Experiences with computer-assisted composition." Translated as "Esperienze di composizione assistita da calcolatore." In S. Tamburini and M. Bagella, eds. I Profili del Suono. Rome: Musica Verticale and Galzeramo. pp. 173–196.
Roads, C. 1991. "Asynchronous granular synthesis." In G. De Poli, A. Piccialli, and C. Roads, eds. 1991. Representations of Musical Signals. Cambridge, Massachusetts: MIT Press. pp. 143–185.
Roads, C. 1992a. "Composition with machines." In J. Paynter, T. Howell, R. Orton, and P. Seymour, eds. Companion to Contemporary Musical Thought. London: Routledge. pp. 399–425.
Roads, C. 1992b. "Musical applications of advanced signal transformations." In A. Piccialli, ed. Proceedings of the Capri Workshop on Models and Representations of Musical Signals. Naples: University of Naples Federico II, Department of Physics.
Roads, C. 1992c. "Synthulate." Software documentation. Unpublished.
Roads, C. 1992d. "Granulateur." Software documentation. Unpublished.
Roads, C. 1992–1997. "Design of a granular synthesizer." Unpublished design documents.
Roads, C. 1993a. "Musical sound transformation by convolution." In S. Ohteru, ed. Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 102–109.
Roads, C. 1993b. "Organization of Clang-tint." In S. Ohteru, ed. Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 346–348.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press.
Roads, C. 1997. "Sound transformation by convolution." In C. Roads, S. Pope, A. Piccialli, and G. De Poli, eds. Musical Signal Processing. Lisse: Swets & Zeitlinger.
Roads, C. 1998a. "Espace musical: virtuel et physique." In F. Barrière and G. Bennett, eds. Composition/Diffusion en Musique Électroacoustique. Actes III. Académie Internationale de Musique Électroacoustique. Bourges: Éditions Mnemosyne. pp. 158–160.
Roads, C. 1998b. "The Creatovox synthesizer project." Unpublished manuscript.
Roads, C. 1999. "Synthèse et transformation des microsons." Doctoral thesis. Paris: Department of Music, University of Paris VIII.
Roads, C. 2000. "Notes on the history of sound in space." Presented at the CREATE symposium SOUND IN SPACE 2000, University of California, Santa Barbara. Revised version forthcoming in Computer Music Journal.
Roads, C. 2001. "Sound composition with pulsars." Journal of the Audio Engineering Society 49(3).
Roads, C., and J. Alexander. 1995. Cloud Generator Manual. Distributed with the program Cloud Generator. Internet: www.create.ucsb.edu.
Roads, C., J. Kuchera-Morin, and S. Pope. 1997. "The Creatophone sound spatializer project." Unpublished research proposal.
Robindoré, B. 1996a. "Eskhaté Ereuna: extending the limits of musical thought, comments on and by Iannis Xenakis." Computer Music Journal 20(4): 11–16.
Robindoré, B. 1996b. Program notes. Computer Music Journal compact disc. Volume 20, 1996.
Robindoré, B. 1998. "Interview with an intimate iconoclast." Computer Music Journal 23(3): 8–16.
Rocha, M. 1999. "Les techniques granulaires dans la synthèse sonore." Doctoral thesis. Paris: Université de Paris VIII.
Rodet, X. 1980. "Time-domain formant-wave-function synthesis." In J. G. Simon, ed. 1980. Spoken Language Generation and Understanding. Dordrecht: D. Reidel. Reprinted 1984 in Computer Music Journal 8(3): 9–14.
Rodet, X., Y. Potard, and J.-B. Barrière. 1984. "The CHANT project: from synthesis of the singing voice to synthesis in general." Computer Music Journal 8(3): 15–31. Reprinted in C. Roads, ed. 1989. The Music Machine. Cambridge, Massachusetts: MIT Press.
Russolo, L. 1916. The Art of Noises. Translated 1986 by Barclay Brown. New York: Pendragon Press.
Scaletti, C. 1996. "Description of Public Organ." Internet: www.symbolicsound.com.
Schaeffer, P. 1959. "The interplay between music and acoustics." Gravesaner Blätter 14: 61–69.
Schaeffer, P. 1977. Traité des objets musicaux. Second edition. Paris: Éditions du Seuil.
Schaeffer, P., and A. Moles. 1952. À la recherche d'une musique concrète. Paris: Éditions du Seuil.
Schaeer, P., G. Reibel, and B. Ferreyra. 1967. Trois microsillons d'exemples sonores de
G. Reibel et Beatriz Ferreyra illustrant le Traite des Objets Sonores et presentes par l'auteur. Paris: Editions du Seuil.
Schafer, R. M. 1977. The Tuning of the World. New York: Knopf.
Schafer, R., and L. Rabiner. 1973. ``Design and simulation of a speech analysis-synthesis
system based on short-time Fourier analysis.'' IEEE Transactions on Audio and Electroacoustics AU21: 165174.
Schillinger, J. 1946. The Schillinger System of Musical Composition. New York: Carl
Fischer. Reprinted 1978. New York: Da Capo Press.
Schindler, A. 1998. Eastman Csound Tutorial. Internet: www.esm.rochester.edu/onlinedocs/allan.cs.
Schnell, N. 1995. ``GRAINYgranularsynthese in Echtzeit.'' Beitrage zur Electronischen Musik 4. Fraz: Institut fur Elektronische Musik, Hochschule fur Musik und
Darstellende Kunst.
Schoenberg A. 1967. Fundamentals of Music Composition. London: Faber and Faber.
Schopenhauer, A. 1819. ``Art and the art of music.'' From The World as Will and Idea. Translated by R. Haldane and J. Kemp, reprinted in J. Randall et al. 1946. Readings in Philosophy. New York: Barnes and Noble. pp. 246–254.
Schottstaedt, W. 2000. Common Lisp Music Manual. Online documentation. Internet:
www-ccrma.stanford.edu/CCRMA/Software/clm/clm-manual/clm.html.
Schroeder, M., and B. S. Atal. 1962. ``Generalized short-time power spectra and autocorrelation functions.'' Journal of the Acoustical Society of America 34: 1679–1683.
Scott, R., and S. Gerber. 1972. ``Pitch-synchronous time-compression of speech.'' Proceedings of the IEEE Conference for Speech Communication Processing. New York: IEEE. pp. 63–65.
Serra, M.-H. 1992. ``Stochastic composition and stochastic timbre: Gendy3 by Iannis Xenakis.'' Perspectives of New Music 31.
Serra, M.-H. 1997. ``Introducing the phase vocoder.'' In C. Roads, S. Pope, A. Piccialli, and G. De Poli, eds. Musical Signal Processing. Lisse: Swets and Zeitlinger. pp. 31–90.
Serra, X. 1989. ``A system for sound analysis/transformation/synthesis based on a
deterministic plus stochastic decomposition.'' Stanford: Center for Computer Research
in Music and Acoustics, Department of Music, Stanford University.
Serra, X., and J. Smith. 1990. ``Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition.'' Computer Music Journal 14(4): 12–24.
Shensa, M. 1992. ``The discrete wavelet transform: wedding the à trous and Mallat algorithms.'' IEEE Transactions on Signal Processing 40(10): 2464–2482.
Smalley, D. 1986. ``Spectro-morphology and structuring processes.'' In S. Emmerson, ed. The Language of Electroacoustic Music. London: Macmillan. pp. 61–93.
Smalley, D. 1997. ``Spectromorphology: explaining sound shapes.'' Organised Sound 2(2): 107–126.
Smith, L. 1996. ``Modelling rhythm perception by continuous time-frequency analysis.'' In L. Ayers and A. Horner, eds. Proceedings of the 1996 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 392–395.
Solomos, M. 1997. Program notes to Xenakis: Electronic Music. Compact disc CD 003.
Albany: Electronic Music Foundation.
Sprenger, S. 1999. ``Time and pitch scaling of audio signals.'' Internet: www.dspdimension.com/html/timepitch.html.
Steiglitz, K. 1996. A Digital Signal Processing Primer. Menlo Park: Addison-Wesley.
Steinberg. 1998. ``Cubase VST/24 audio recording.'' Karlsruhe: Steinberg Soft- and
Hardware GmbH.
Stevenson, R., and R. B. Moore. 1967. Theory of Physics. Philadelphia: W. B. Saunders.
Stockham, T. 1969. ``High-speed convolution and correlation with applications to digital filtering.'' In B. Gold and C. Rader. 1969. Digital Processing of Signals. New York: McGraw-Hill. pp. 203–232.
Stockhausen, K. 1955. ``Actualia.'' die Reihe 1. English edition. Bryn Mawr: Theodore Presser Company. pp. 45–51.
Stockhausen, K. 1957. ``. . . How time passes . . .'' die Reihe 3: 10–43. English edition translated by Cornelius Cardew. 1959. Reprinted with revisions as ``. . . wie die Zeit vergeht . . .'' in K. Stockhausen. 1963. Texte zur elektronischen und instrumentalen Musik. Band 1. Cologne: DuMont Schauberg. pp. 99–139.
Stockhausen, K. 1961. ``Two lectures.'' die Reihe 5. English edition. Bryn Mawr: Theodore Presser Company. pp. 59–82.
Stockhausen, K. 1962. ``Die Einheit der musikalischen Zeit.'' [The unity of musical time.] Translated by E. Barkin as ``The concept of unity in electronic music.'' Perspectives of New Music 1(1): 39. Reprinted in B. Boretz and E. Cone, eds. 1972. Perspectives on Contemporary Music Theory. New York: Norton. pp. 129–147. German version published in Stockhausen (1963), pp. 211–221.
Stockhausen, K. 1963. Texte zur elektronischen und instrumentalen Musik. Band 1.
Cologne: DuMont Schauberg.
378
References
Strang, G. 1994. A Friendly Guide to Wavelets. Boston: Birkhäuser.
Stravinsky, I. 1936. An Autobiography. New York: W. W. Norton.
Stravinsky, I. 1947. Poetics of Music. Cambridge, Massachusetts: Harvard University
Press.
Strawn, J. 1985. ``Modelling musical transitions.'' Ph.D. dissertation. Stanford: Stanford
University Department of Music.
Stuckenschmidt, H. H. 1969. Twentieth Century Music. New York: McGraw-Hill.
Sturm, B. 1999. ``A potential G.U.T. of signal synthesis.'' Internet: www-ccrma.stanford.edu/~sturm/research.html.
Supper, M. 1997. Elektroakustische Musik und Computermusik. Hofheim: Wolke Verlag.
Sussman, G., and G. Steele. 1981. ``Constraints: a language for expressing almost-hierarchical descriptions.'' A.I. Memo 502A. Cambridge, Massachusetts: Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Reprinted in Artificial Intelligence 14: 1–39.
Tait, C. 1995. ``Audio analysis for rhythmic structure.'' Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 590–1.
Tempelaars, S. 1976. ``The VOSIM oscillator.'' Presented at the 1976 International Computer Music Conference, MIT, Cambridge, Massachusetts, 28–31 October.
Tempelaars, S. 1977. Sound Signal Processing. Translated by Ruth Koenig. Utrecht:
Institute of Sonology.
Tempelaars, S. 1996. Signal Processing, Speech, and Music. Lisse: Swets and Zeitlinger.
Todoroff, T. 1995. ``Real-time granular morphing and spatialization of sounds with gestural control within Max/FTS.'' In R. Bidlack, ed. Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 315–318.
Torresani, B. 1995. Analyse continue par ondelettes. Paris: InterEditions.
Truax, B. 1984. Acoustic Communication. Norwood, New Jersey: Ablex Publishing.
Truax, B. 1986. ``Real-time granular synthesis with the DMX-1000.'' In P. Berg, ed. 1987. Proceedings of the 1986 International Computer Music Conference. San Francisco: Computer Music Association. pp. 138–145.
Truax, B. 1987. ``Real-time granulation of sampled sound.'' In S. Tipei and J. Beauchamp, eds. 1987. Proceedings of the 1987 International Computer Music Conference. San Francisco: Computer Music Association. pp. 138–145.
Truax, B. 1988. ``Real-time granular synthesis with a digital signal processing computer.'' Computer Music Journal 12(2): 14–26.
Truax, B. 1990a. ``Time-shifting of sampled sound using a real-time granulation technique.'' In Proceedings of the 1990 International Computer Music Conference. San Francisco: Computer Music Association. pp. 104–107.
Truax, B. 1990b. ``Composing with real-time granular sound.'' Perspectives of New Music 28(2): 120–134.
Truax, B. 1992. ``Composition with time-shifted environmental sound.'' Leonardo Music Journal 2(1): 37–40.
Truax, B. 1994a. ``Discovering inner complexity: time-shifting and transposition with a real-time granulation technique.'' Computer Music Journal 18(2): 38–48.
Truax, B. 1994b. ``Granulation and time-shifting of sampled sound in real-time with a quad DSP audio computer system.'' In Proceedings of the 1994 International Computer Music Conference. San Francisco: Computer Music Association. pp. 335–337.
Truax, B. 1995. ``Sound in context: soundscape research and composition at Simon Fraser University.'' In Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 1–4.
Truax, B. 1996a. ``Time-stretching of hyper-resonated sound using a real-time granulation technique.'' In Proceedings of the 1996 International Computer Music Conference. San Francisco: International Computer Music Association. pp. 491–492.
Truax, B. 1996b. ``Soundscape, acoustic communication, and environmental sound composition.'' Contemporary Music Review 15(1): 49–65.
Tyndall, J. 1875. Sound. Akron: Werner.
Ungeheuer, E. 1992. Wie die elektronische Musik ``erfunden'' wurde . . . Mainz: B. Schott's Söhne.
Vaggione, H. 1984. ``The making of Octuor.'' Computer Music Journal 8(2): 48–54. Reprinted in C. Roads, ed. 1989. The Music Machine. Cambridge, Massachusetts: The MIT Press. pp. 149–155.
Vaggione, H. 1996a. ``Autour de l'approche électroacoustique : situations, perspectives.'' In G. Bennett, C. Clozier, S. Hanson, C. Roads, and H. Vaggione, eds. Esthétique et Musique Électroacoustique. Bourges: Éditions Mnémosyne. pp. 101–108.
Vaggione, H. 1996b. ``Articulating micro-time.'' Computer Music Journal 20(2): 33–38.
Vaggione, H. 1996c. ``Vers une approche transformationnelle en CAO.'' Actes des JIM 1996. Les cahiers du GREYC. Caen: CNRS-Université de Caen. Internet: www.ircam.fr/equipes/repmus/jim96/actes/Vaggione/VaggioneTEXT.html.
Vaggione, H. 1997. ``Singularité de la musique et analyse: l'espace d'intersection.'' In F. Barrière and C. Clozier, eds. Analyse en Musique Électroacoustique. Bourges: Éditions Mnémosyne. pp. 74–81.
Vaggione, H. 1998. ``Transformations morphologiques: quelques exemples.'' Actes des
JIM 1998.
Vaggione, H. 1999. Personal communication.
Varèse, E. 1940. ``Organized sound for the sound film.'' Quoted in Miller (1945).
Varèse, E. 1962. From a lecture given at Yale University. Reprinted in Varèse (1971).
Varèse, E. 1971. ``The liberation of sound.'' In B. Boretz and E. Cone, eds. Perspectives on American Composers. New York: Norton. pp. 26–34.
Vercoe, B. 1993. Csound: A Manual for the Audio Processing System and Supporting
Programs with Tutorials. Cambridge, Massachusetts: MIT Media Laboratory.
Vetterli, M., and C. Herley. 1992. ``Wavelets and filter banks: theory and design.'' IEEE Transactions on Signal Processing 40(9): 2207–2234.
Walker, B., and K. Fitz. 1992. Lemur Manual. Urbana: CERL Sound Group, University of Illinois.
Wannamaker, R., and E. Vrscay. 1997. ``Fractal wavelet compression of audio signals.'' Journal of the Audio Engineering Society 45(7/8): 540–553.
Weare, C. 1997. Personal communication.
Weidenaar, R. 1989. ``The Telharmonium: A History of the First Music Synthesizer, 1893–1918.'' Ph.D. dissertation. New York: New York University.
Weidenaar, R. 1995. Magic Music from the Telharmonium. Metuchen: Scarecrow Press.
Weinberg, S. 1983. The Discovery of Subatomic Particles. New York: W.H. Freeman.
Wenger, E., and E. Spiegel. 1999. MetaSynth Manual. San Francisco: U&I Software.
Whitehouse, D. 1999. ``Haunting sound from the dawn of time.'' British Broadcasting Company. 23 September 1999. Internet: news.bbc.co.uk/hi/english/sci/tech/
newsid_454000/454594.stm
Whitfield, J. 1978. ``The neural code.'' In E. Carterette and M. Friedman, eds. 1978.
Handbook of Perception, vol. IV, Hearing. Orlando: Academic.
Wickerhauser, V. 1989. ``Acoustic signal compression with wave packets.'' New Haven:
Department of Mathematics, Yale University.
Wiener, N. 1964. I Am a Mathematician. Cambridge, Massachusetts: MIT Press.
Wiener, N. 1964. ``Spatial-temporal continuity, quantum theory, and music.'' In M.
Capek, ed. 1975. The Concepts of Space and Time. Boston: D. Reidel.
Winckel, F. 1967. Music, Sound, and Sensation. New York: Dover Publications.
Winham, G. 1966. The Reference Manual for Music 4B. Princeton: Princeton University
Music Department.
Winham, G., and K. Steiglitz. 1970. ``Input generators for digital sound synthesis.''
Journal of the Acoustical Society of America 47(2, Part 2): 665–666.
Winston, P. 1984. Artificial Intelligence. Second edition. Reading, Massachusetts: Addison-Wesley.
Wishart, T. 1994. Audible Design. York: Orpheus the Pantomime.
Wishart, T. 1996. Program notes for Tongues of Fire. Internet: www.aec.at/prix/1995/
E95gnM-tongues.html.
Xenakis, I. 1960. ``Elements of stochastic music.'' Gravesaner Blätter 18: 84–105.
Xenakis, I. 1971. Formalized Music. Bloomington: Indiana University Press.
Xenakis, I. 1989. ``Concerning time.'' Perspectives of New Music 27(1): 84–92. Reprinted in French as ``Sur le temps'' in I. Xenakis, 1994, Keleutha. Paris: L'Arche.
Xenakis, I. 1992. Formalized Music. Revised edition. New York: Pendragon Press.
Yermakov, A. 1999. SoundFront documentation. Internet: www.kagi.com.
Appendix A
Density
Grain density is specified in grains per second. A synchronous cloud generates metrical rhythmic sequences when the grain density is in the range of two to twenty grains per second. When the initial density is not the same as the final density, this creates an acceleration or deceleration effect.
For asynchronous clouds, a variation between the initial and final density produces an irregular acceleration or deceleration. To create continuous noise sounds, one sets the density to above one hundred grains per second.
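The mapping from density to grain onset times is simple arithmetic: in a synchronous cloud, the interval between grain attacks is the reciprocal of the density. The following minimal sketch (in Python, written for this text; it is not code from Cloud Generator itself, and the function name is hypothetical) shows how a linear ramp between the initial and final densities yields an accelerating or decelerating sequence of onsets.

    def synchronous_onsets(cloud_dur, initial_density, final_density):
        """Grain onset times (in seconds) for a synchronous cloud."""
        onsets = []
        t = 0.0
        while t < cloud_dur:
            onsets.append(t)
            # Interpolate the density at the current position in the cloud.
            frac = t / cloud_dur
            density = initial_density + frac * (final_density - initial_density)
            t += 1.0 / density  # the inter-onset interval is the reciprocal of density
        return onsets

    # A ramp from 2 to 20 grains per second produces an accelerando:
    print(synchronous_onsets(5.0, 2.0, 20.0)[:6])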
Bandlimits
Bandlimits describe the frequencies of the four corners of a trapezoidal region inscribed
on the frequency-versus-time plane. The grains scatter within the bounds of this region.
To create a line of grains at a fixed frequency, one sets all the bandlimits to the same frequency, for example, 440 Hz. To scatter grains in a rectangular region between 80 Hz and 300 Hz, one sets both high bandlimits to 300, and both low bandlimits to 80. For a cloud that converges to 300 Hz, one would set both final bandlimits to 300. (See figure 3.6.)
Bandlimits apply only in the case of a cloud filled by synthetic grains, where the selected waveform is of the type ``Synthetic.'' They have no meaning when the cloud is a granulated sound file.
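A minimal sketch of the trapezoidal scattering described above (again in Python, with hypothetical names; the actual Cloud Generator code is not shown here): the four corner frequencies are interpolated over the duration of the cloud, and each grain draws its frequency uniformly between the current low and high edges.

    import random

    def grain_frequency(frac, hi_init, hi_final, lo_init, lo_final):
        """Frequency of a grain at position frac (0.0 = cloud start, 1.0 = end)."""
        hi = hi_init + frac * (hi_final - hi_init)
        lo = lo_init + frac * (lo_final - lo_init)
        return random.uniform(lo, hi)

    # All four corners equal: a line of grains at 440 Hz.
    print(grain_frequency(0.5, 440, 440, 440, 440))
    # A cloud that starts in the band 80-300 Hz and converges on 300 Hz.
    print(grain_frequency(1.0, 300, 300, 80, 300))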
Cloud Type
This parameter specifies the temporal distribution of the grains, either synchronous or asynchronous. Synchronous generation means that one grain follows another in series, with the distance between the grain attacks determined by the density parameter. For example, a synchronous cloud with a density of five grains per second creates quintuplets at a tempo of M.M. 60. The spacing between grain attacks is 200 ms. Asynchronous clouds scatter grains at random time points within the specified boundaries of the cloud, at the specified density. Asynchronous clouds produce aleatoric and explosive effects.
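Asynchronous generation can be sketched in the same style as the synchronous case above: rather than stepping through the cloud at the reciprocal of the density, one scatters the same average number of grains at random time points (a hypothetical illustration, not the program's own algorithm).

    import random

    def asynchronous_onsets(cloud_dur, density):
        """Scatter grain onsets at random points while honoring the mean density."""
        n_grains = int(cloud_dur * density)
        return sorted(random.uniform(0.0, cloud_dur) for _ in range(n_grains))

    # At 5 grains per second a synchronous cloud spaces attacks exactly 200 ms
    # apart; an asynchronous cloud keeps the same average count but randomizes
    # the attack points.
    print(asynchronous_onsets(2.0, 5))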
Stereo Location
The stereo location potentiometers provide an intuitive interface for the initial and final position of the sound cloud. Clicking on either the initial or final Random box scatters the grains to random positions in stereo space. This randomization articulates the granular texture of a cloud. It enhances the impression that the cloud is a three-dimensional object in space.
Grain Duration
Grain duration has a strong influence on the timbre of the resulting cloud. Grain duration is a value in seconds. Thus a value of 0.1 equals 100 ms. The initial grain duration applies at the beginning of the cloud. The final grain duration applies at the end. If the Random box is checked, the grain duration parameters switch from ``Initial'' to ``Minimum'' and from ``Final'' to ``Maximum.'' In this case, the grain duration is a random function between the specified minimum and maximum values.
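The two modes of this parameter reduce to an interpolation and a random draw, as in this sketch (hypothetical names; linear interpolation between the initial and final values is an assumption, since the text does not specify the interpolation curve):

    import random

    def grain_duration(frac, a, b, randomize=False):
        """Duration in seconds of a grain at position frac (0..1) in the cloud.

        Without randomization, a and b are the initial and final durations and
        the value is interpolated. With randomization, a and b are read as the
        minimum and maximum, and the duration is drawn uniformly between them.
        """
        if randomize:
            return random.uniform(a, b)
        return a + frac * (b - a)

    print(grain_duration(0.5, 0.1, 0.01))        # ramp from 100 ms to 10 ms -> 0.055
    print(grain_duration(0.5, 0.01, 0.1, True))  # random between 10 and 100 ms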
Selection Order
The Selection Order parameter applies only to granulated clouds. It determines the order in which grains are selected from the input sound file. Three options present themselves (see the sketch after this list):
1. Random: select input grains from random points in the input sound file.
2. Statistical evolution: select input grains in a more-or-less left-to-right order. That is, at the beginning of the cloud there is a high probability that grains derive from the beginning of the input file; at the end of the cloud there is a high probability that grains derive from the end of the input file.
3. Deterministic progression: select input grains in a strictly left-to-right order.
Option 1 results in a scrambled version of the input. Option 3 preserves the temporal order of the original (we use this mode when time-shrinking or time-stretching the input sound file). Option 2 is a blend of options 1 and 3.
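The three orders can be sketched as policies for choosing the file position of the next extracted grain (a hypothetical illustration in Python; the width of the statistical scatter is an assumed parameter, since the program's actual probability distribution is not documented here):

    import random

    def select_position(frac, file_dur, order, spread=0.1):
        """Where in the input file to extract the grain for cloud position frac (0..1)."""
        if order == "random":
            return random.uniform(0.0, file_dur)           # option 1: scrambled
        if order == "statistical":                         # option 2: noisy left-to-right
            jitter = random.uniform(-spread, spread) * file_dur
            return min(max(frac * file_dur + jitter, 0.0), file_dur)
        return frac * file_dur                             # option 3: strict left-to-right

    # Deterministic progression reads the file at the cloud's own pace, which is
    # how granular time-stretching and time-shrinking arise.
    print(select_position(0.5, 3.0, "deterministic"))      # 1.5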
Waveform
The waveform parameter has several controls. The first is a choice between Synthetic or Granulated source waveforms. The name of the chosen waveform appears in the small box below the buttons.
In a synthetic cloud, an oscillator reads a wavetable containing one period of a waveform at a frequency within the high and low bandlimit boundaries.
In a granulated cloud, the grains are extracted from a stereo input sound file of any
length. The extracted grains are scattered in time without pitch transposition. The
bandlimit parameters are not operational in a granulated cloud.
The Select button brings up a dialog box that depends on whether the cloud is synthetic or granulated. If it is synthetic, the dialog box displays five choices:
1. Sine
2. Sawtooth
3. Square
4. User-drawn: one can freely draw in the waveform editor
5. Imported from a sound file. The imported file must be exactly 2048 samples (46 ms) in length. If the waveform is extracted from a sampled sound file, this waveform will repeat at its extracted frequency only if the cloud bandlimits are set to 21 Hz. Any frequency above this transposes the waveform. (See the sketch at the end of this section.)
If the cloud is granulated, the Select button displays an Open File dialog box. The user is invited to select a stereo sound file to granulate.
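The figures in item 5 above follow from the sampling rate. Assuming a 44.1 kHz rate (not stated in this excerpt), a 2048-sample wavetable spans about 46 ms, and an oscillator reading it once per period repeats at about 21.5 Hz:

    sr = 44100.0                  # samples per second (assumed)
    table_len = 2048              # samples in the imported waveform

    period = table_len / sr       # 0.0464... s, i.e., about 46 ms
    fundamental = sr / table_len  # 21.53... Hz

    # Setting the cloud bandlimits to this fundamental reproduces the imported
    # waveform at its original pitch; any higher setting transposes it upward.
    print(round(period * 1000), "ms,", round(fundamental, 2), "Hz")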
Text/AIFF Output
This option lets one save the cloud data in the form of numerical text. This text could be read by a plotting program to make a graphical score. Clicking on AIFF creates a sound file as the output of the program's calculations. This is the normal mode.
Appendix B
This recording documents the history of synthesis with microsonic particles and presents artistic excerpts of music compositions realized with these techniques. It also serves as an audio notebook of research experiments in this domain.
Tracks 1 through 3 present the historical examples. Tracks 4 through 13 present musical excerpts. Tracks 14 through 44 present an audio notebook of research experiments
in particle synthesis. Tracks 45 through 68 present an audio notebook of experiments in
microsonic sound transformation.
Historical Examples
1. Three excerpts of Analogique A et B for string orchestra and analog granular synthesis by Iannis Xenakis. The piece was composed in 1958 and 1959. The composition
opposes part A for string orchestra, and part B for tape. (a) Tape part alone. (b) Tape
with instruments in a busy section. (c) Tape with instruments in a sparse section.
Reproduced with the permission of Iannis Xenakis and Editions Salabert, 22 rue
Chauchat, 75009 Paris, France, www.salabert.fr. [1:03]
2. Klang-1 (Curtis Roads, December 1974), first granular synthesis experiment by computer. The recording is distorted due to technical factors. [0:48]
3. Excerpt of Prototype (Curtis Roads, April 1975), first study in automated granular synthesis. [0:26]
6. Excerpt of Organic (Curtis Roads, 1994). This excerpt [1:55–2:49] opens with an asynchronous pulsar cloud. All of the beating and pulsating sounds are pulsars. [0:58]
7. Excerpt of Half-life (Curtis Roads, 1999). This excerpt [0:00–1:04] from the first movement, entitled Sonal Atoms, consists of a pulsar stream that has been granulated and then extensively edited on a microsonic time scale. [1:09]
8. Excerpt of Tenth vortex (Curtis Roads, 2000). This excerpt [0:00–1:29] consists of a constantly evolving cloud of grains produced by my constant-Q granulation program applied to a stream of synthetic pulsars. [1:28]
9. Excerpt of Eleventh vortex (Curtis Roads, 2001). The opening of this excerpt [0:00–1:30] features multiple synchronous pulsar streams, each at a separate tempo. The thick stream of particles thins into a series of discrete particles about 40 seconds into the piece. [1:05]
10. Excerpt of Sculptor (Curtis Roads, 2001). The source material of Sculptor was a
driving percussion piece by the group Tortoise, sent to me for processing in August
2000 by John McEntire. I granulated and filtered the material by means of the
constant-Q granulator, which disintegrated the beating drums into a torrent of
sound particles. Beginning with this whirling sound mass, I articulated the internal
morphologies and implied causalities within the current of particles. This meant
shaping the river of particle densities, squeezing and stretching the amplitudes of
individual particles and particle clouds, carving connected and disconnected frequency zones, and twisting the spatial flow. Over months of intermittent editing on different time scales, I was able to sculpt this material into its current form. The
composition was completed in July 2001 in Santa Barbara. [1:31]
11. Excerpt of Agon (Horacio Vaggione, 1998). Agon is one of a series of works by
Horacio Vaggione that explore the concept of composition on multiple time scales.
The original sound sources were acoustic percussion instruments, which have been
broken down into microsonic particles and reassembled by the composer in a technique of micromontage. [1:08]
12. Excerpt of Life in the Universe (Ken Fields, 1997). This excerpt [0:00–1:36] features the granulation of the synthetic voice of physicist Stephen Hawking. It appeared on a CD-ROM entitled Stephen Hawking Life in the Universe that was marketed by MetaCreations. [1:42]
13. Three excerpts from the electronic part of Paleo for double bass and CD (JoAnn Kuchera-Morin, 2000), demonstrating the use of the phase vocoder to time-scale sounds. (a) The first 45 seconds of the computer-generated portion of the work. The sound consists of processed double bass and didjeridu string Csound instruments. No time scaling. (b) From the middle section of the work, a passage time-stretched by a factor of 7.9. (c) From the middle section of the work, a passage time-shrunk by a factor of 0.5. [2:36]
Glisson Synthesis
21. Isolated glissons. (a) Two 250 ms glissons, one sweeping from 1000 Hz to 500 Hz, the other in the opposite direction. (b) Two 56 ms glissons, one sweeping from 55 Hz to 1400 Hz, the other in the opposite direction. (c) Two sets of ``bird call'' glissons. [0:11]
22. Trains of 6 ms glissons with small frequency deviations centered at 1450 Hz and 400
Hz respectively. [0:07]
23. Seven glisson clouds. (a) High-density cloud, wide frequency sweep. (b) Long glissons, high-density cloud, converging on 550 Hz. (c) Long glissons converging on 1100 Hz. (d) High-density glisson cloud sweeping from 6 to 2000 Hz. (e) Long glissons. (f) Short glissons. (g) High-frequency glissons. [1:46]
Grainlet Synthesis
Experiments in grainlet synthesis, 1996–1998. These experiments demonstrate linkages between grainlet frequency and grainlet duration. All experiments realize asynchronous (non-metric) clouds. I have added reverberation and some editing effects, primarily particle replication.
24. Sparse cloud in which high-frequency grainlets are short, while low-frequency grainlets are long. [0:10]
25. Four clouds. (a) One cycle (fundamental period) per grainlet. (b) Four cycles per grainlet. (c) Eight cycles per grainlet. (d) Sixteen cycles per grainlet. [0:33]
26. Medium-density grainlet cloud, with short high frequencies and long low frequencies. [0:23]
27. A cloud with a frequency pole at 440 Hz, making grainlets around that frequency much longer than low-frequency and high-frequency grainlets. [0:13]
28. Three high-density (90 particles per second) grainlet clouds. (a) Two cycles per
grainlet. (b) Four cycles per grainlet. (c) Twenty cycles per grainlet. [0:20]
29. Six high-density grainlet experiments. [1:06]
Trainlet Synthesis
Experiments in trainlet synthesis, 1999–2000.
30. Individual trainlets. (a) Gbuzz test demonstrating increasing ``lowest harmonic.''
(b) 40 Hz trainlet with 16 harmonics and chroma 1.0. (c) 600 Hz trainlet with
chroma 1.0. (d) 600 Hz trainlet with chroma 0.5. [0:34]
31. Sparse trainlet phrases. [1:00]
32. Dense trainlet clouds. [1:02]
Pulsar Synthesis
Experiments in pulsar synthesis, 1991–2001.
33. Cosmic pulsar from the neutron star PSR 0329+54. [0:07]
43. Transient transformation, in which a single transient event is extracted from a longer sound and then reshaped into different sounds through editing operations. Each example is separated by two seconds of silence. (a) The original source, a noisy one-second sound recorded by a hydrophone. (b) 125-ms extraction from (a). (c) The same event as (b), pitch-shifted up an octave, cloned ten times, and shaped with an exponential decay envelope. (d) The same event as (b), strongly filtered to create a colored tone. (e) The same event as (b) reversed and re-enveloped into a percussive sound. (f) The same event as (e), pitch-shifted and replicated to form two pitched tones. [0:18]
Granulation
46. Basic granulations. (a) Original spoken Italian utterance. (b) ``Robotic'' granulation transformation by layering several copies slightly delayed. (c) ``Thickening'' with increasing delay. (d) ``Scrambling'' with large grains. (e) Scrambling with medium-sized grains at medium density. (f) Scrambling with tiny grains and high density, going to long grains and low density. (g) Granulation with pitch shifting for a chorus effect. (h) ``Graverberation'' (granular reverberation) effects realized with the SampleHose by Michael Norris. [1:34]
47. Timbral enrichment by particle envelope manipulation. (a) Edgard Varèse speaking. (b) Change in formant location caused by many overlapping grains. (c) Short grains make the speech noisy. [0:44]
Convolution of Microsounds
Experiments in convolving sound particles with other sounds.
55. Granular convolution examples. (a) Synchronous cloud of grains, lasting 7 seconds. The grain durations are between 1 and 9 ms. (b) Snare drum rim shot. (c) Convolution of (a) and (b). (d) Convolution of (b) with an asynchronous grain cloud with increasing density up to 90 grains per second. (e) 12-second synchronous cloud of sinusoidal grains at a density of 7 grains per second. (f) Swirling electronic texture lasting 7 seconds. (g) Convolution of (e) and (f), which extends the duration to 19 seconds (see the note after track 57). [0:16]
56. Convolutions with a dense cloud of sonic grains to create quasi-reverberation effects. Convolution of the Italian spoken phrase ``Lezione numero undici, l'ora'' with a dense cloud of Gaussian grains. The convolution window is of the Hamming type. [1:25]
57. Pulsar convolution example. (a) Italian word ``qui.'' (b) Pulsar train. (c) Convolution of (a) and (b). [0:26]
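A note on the durations in track 55: the linear convolution of two signals lasts the sum of their lengths, less one sample, which is why convolving the 12-second cloud (e) with the 7-second texture (f) yields a 19-second result. A quick check (Python with NumPy, assuming a 44.1 kHz rate, which the text does not state):

    import numpy as np

    sr = 44100               # assumed sampling rate
    n = 12 * sr              # samples in the 12-second cloud, track 55(e)
    m = 7 * sr               # samples in the 7-second texture, track 55(f)
    print((n + m - 1) / sr)  # 18.99998... s, heard as 19 seconds

    # The same rule verified on toy signals via NumPy's linear convolution:
    assert len(np.convolve(np.ones(5), np.ones(3))) == 5 + 3 - 1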
Sonographic Transformations
62. Sonogram filtering. (a) Voice of Edgard Varèse. (b) Voice filtered by inscribing multiple regions on a sonogram and reducing them by 24 dB. [0:26]
Wavelet-Based Transformations
68. Comb wavelet separation of harmonic and inharmonic components. (a) Violin note.
(b) Harmonic part of violin tone. (c) Inharmonic part of violin tone. (d) Low piano
tone. (e) Harmonic part of piano tone. (f) Inharmonic part of piano tone. [0:19]
Name Index
Ferrari, L., 232, 312
Fields, K., 321
Fokker, A., 81
Fourier, J., 243
Fitton, W., 56
Fry, C., 190
Gabor, D., vii, 27, 54, 57–60, 62, 84, 86, 250, 295–6, 300, 302, 349
Galileo, G., 53
Gassendi, P., 51
Goeyvaerts, K., 71
Gredinger, P., 71
Hawking, S., 321
Heisenberg, W., 250
Helmholtz, H., 17
Helmuth, M., 113
Heraclitus, 14
Hiller, L., 337
Hinton, M., 112
Howard, E., 116
Huygens, C., 50
Kandinsky, W., 328
Kavina, L., 82
Kaegi, W., 69
Keller, D., 97, 115, 322
Kelvin, L., 243
Koenig, G. M., 29, 68, 71, 81–2, 131, 138
Koenigsberg, C., 81
Krenek, E., 71
Kronland-Martinet, R., 288
Kuchera-Morin, J., ix, 318
Kussmaul, C., x
La Barbara, J., 321
Lansky, P., 186, 317
Lavoisier, A., 192
Le Duc, J.-C., 231
Leibig, B., 109
Lesbros, V., 159–60
Ligeti, G., 15, 83, 332
Lippe, C., 113
Lopez, S., 114
Lucchese, L., x
Lucretius, 51
Mallat, S., 284
Martí, F., 114
Mauchly, W., 115
Maxwell, J., 50
McCartney, J., 113, 193
McLaren, N., 157
Mersenne, M., 51, 53
Meyer-Eppler, W., 24, 62, 69, 300
Miranda, E., 116
Moles, A., 63
Morlet, J., 284
Mozart, W., 11, 168, 345
Nelson, G., 112
Nelson, J., 321
Newton, I., 49, 63
Norris, M., 116
Omsky, J., x
Pape, G., ix, 319
Parmegiani, B., 15
Penderecki, K., 83
Piccialli, A., x, 112
Pope, S., ix, 122
Poullin, J., 302
Pound, E., 55–6
Pousseur, H., 316
Rayleigh, L., 173
Reihn, R., 317
Resina, E., 114
Risset, J.-C., viii, ix, 298–9
Roads, M., x
Robindore, B., x, 320
Rocha, M., 321
Rockmore, C., 82
Roget, P., 56
Rolfe, C., 115
Roy, P., x
Roy, S., 318
Russolo, L., 233, 327
Scaletti, C., 323
Schaeffer, P., 17, 20, 61, 181, 302, 328
Scherchen, H., 62
Schnell, N., 114
Schoeller, P., 319
Schoenberg, A., 328
Scott, D., 318
Serra, X., 278
Sery, D., x
Seurat, G., 171
Sica, G., x, 319
Smalley, D., 14
Smith, L., 298
Starn brothers, 308
Stockhausen, K., 11, 56, 68–83, 138, 300, 329, 331
Thomson, J. J., 52
Todoroff, T., 114
Truax, B., 97, 112, 115, 190, 311–2, 343
Vaggione, H., ix, 312–3, 332, 338
Varèse, E., 15, 54, 233, 327–8, 342
Vitruvius, 52
von Neumann, J., 59
Wagner, R., 11
Weare, C., x
Wiener, N., 63
Wishart, T., 14, 114, 205–9, 263, 317
Xenakis, I., ix, 30, 36, 63–8, 83–4, 86, 108–9, 122, 176, 300–3, 329–330, 347
Young, T., 50
Subject Index
Cantata, 318
Capybara, 114
CCMIX, ix, 383
CDP GrainMill, 114
CEMAMU, 158
Center for Music Experiment. See CME
Center for New Music and Audio
Technologies. See CNMAT
Center for Research in Electronic Art
Technology. See CREATE
Centre de Creation Musicale Iannis
Xenakis. See CCMIX
Centre Nationale de Recherche
Scientifique, Marseille. See CNRS
Channel masking, 149–51
Chant, 113
Chants de Maldoror, 317
ChaosSynth, 116
Chirplet, 121
Chroma, 132
Clang-Tint, 156, 171, 307–9
Cloud Generator, 26, 111, 190, 199–200, 224, 319, 383–388
Clouds, of sound, 14
bandwidth, 104–5
density, 105–6
fill factor, 105–6
taxonomy, 16
textures, 15
type, 386
Clutter, 253
Cmask, 115, 344
CME, 110
CNMAT, 310
CNRS, 284, 288
Coalescence of sound, 277
Code versus grammar, 345
Coherence, 338
Comb wavelets, 289
Comme étrangers et voyageurs sur la terre,
320
Communication versus sensation, 346
Complex tones, 17
Complexity versus simplicity, 343
Composer's Desktop Project, 263
Computer Audio Research Laboratory,
317–8
Concerto for Clarinet and Clarinets, 318
Concret PH, 15, 64
Constant-Q granular filtering, 203–4
Constraints, 14
Convolution and pulsar synthesis, 220–1
Convolution of microsounds, 209–221
Convolution of pulsars, 152–4
Convolution with clouds, 224–31
Cosmophone, 176
CREATE, ix, 193, 224, 232, 321
Creatophone spatializer, 232
Creatovox synthesizer, ix, 116–7, 137, 193–5, 339, 384
Cri, 320
Cross-filtering, 216–7
Csound, 110, 113, 184
Cybernephone, 231
Data packing, 241
Data reduction, 241
De natura sonorum, 15
Decay of information, 348
Dele!, 319
Density of sound, 15, 332
Deterministic plus stochastic analysis,
278
DFT, 245–6
Gabor analysis, 295–9
Gabor matrix, xi, 57–63, 65, 92, 296, 349
Gabor transform, 296
musical applications of, 298–9
Gaboret, 296
GENDYN, 30, 176
Gesang der Jünglinge, 71
GIST, 113
Glisson synthesis, 121–5, 177
GMEBaphone, 231
Grain of sound
anatomy, 88
density, 385
duration, 101–2, 386
envelope, 88–90
generator, 90–91
waveform, 103–4, 387
Grainlet synthesis, 125–9, 177
Grammar versus code, 345
GranQ, 203–4
Granula, 114
Granular spatial effects, 107
Granular Synthesis Toolkit, 113
Granular synthesis, 65–6, 86–118, 177
algorithmic models, 97–8
asynchronous, 92, 96–7
global organization, 91–8
implementations of, 108–116
parameters of, 99–107
physical models, 97–8
pitch synchronous, 92
quasi-synchronous, 93
spectra of, 98–9
synchronous, 93
Granulate, 185, 190
Granulation, 187–193, 222–4
real time, 191–2
selective, 191
Graphic synthesis, 157–63
GRM, 63–4, 163
Grossman-Morlet wavelet, 285
Groupe de Recherches Musicale. See
GRM
Gruppen, 75
Kurzweil sampler, 116
Kyma, 114, 322
Laboratoire de Mécanique et d'Acoustique, 298
La cloche sans vallées, 319
La Máquina de Cantar, 312
Le sacre du printemps, 249
Lemur Pro, 275, 277, 279
Les Ateliers UPIC, 111, 308, 383
Lexicon Varispeech, 198
Licht, 11
Little Animals, 322
Localization blur, 107
Lux Aeterna, 15
Macintosh computer, 111
MacMix, 183
MacPOD, 115
Macro time scale, 11
Macroform, 12
Magic Flute, The, 168
Makbenach, 319
MarcoHack, 279–80
Masking, of pulsars, 149–51
Massachusetts Institute of Technology.
See MIT
Max, 113
Meso time scale, 14
Metastasis, 64, 83, 123
MetaSynth, 16, 117, 160, 262, 269
Michelson-Stratton harmonic analyzer,
243
Micro arc synthesis, 158, 177
Micro time scale, 20
Micromontage, 182–7
Microsound
analog domain, 81–3
aesthetics of, 325–348
modern concept of, 55
perception of, 21
Microtemporal perception
auditory acuity, 24
fission, 22
fusion, 22
intensity perception, 22
pitch perception, 24
preattentive perception, 24
silence perception, 23
subliminal perception, 25
Mimetismo, 318
MIT, 63, 110
Council for the Arts, 307
Experimental Music Studio, 189, 306
Media Laboratory, 307
Mixed Emotions, 195
MixViews, 318
Modulation with sound particles, 224
Moin-Mor, 321
Moog synthesizer, 313, 329
Morphing, 276
Morphologies versus intervals, 340
Morphology of sound objects, 20
MP3, 242, 322
MPEG 1 layer 3, 242, 322
Multiscale approach to composition, 330–1
Multivibrator, 69
Music 4C, 111
Music 11, 110, 189
Music IVBF, 111
Music N languages, 344
Music V, 302
Mycenae-Alpha, 159
NHK Studio, 79
Nodal, 314
Notjustmoreidlechatter, 317
Nscor, 306–7
Nyquist frequency, 31
Oberlin Conservatory, 112
Objet sonore, 17
Octuor, 185
Ondes Martenot, 308, 327
Ondioline, 138, 268
Opacity of sound, 322
Oscillator bank resynthesis, 258
Outside time music, 38
Overlap-add resynthesis, 257–8
Pacific, 312
Pacific Fanfare, 312
Paleo, 318
Parameter variation versus strategy
variation, 343
Particle cloning synthesis, 171–3, 177
Particle pluriphony, 231
Particle spatialization, 221–33
Particle synthesis, 349
Particle-based formant synthesis, 163–8, 177
PAST, 279
Perisonic intensities, 7
Persepolis, 15
Phase vocoder, 239, 253–9
Philips Pavilion, 64
PhISEM, 174
Phonogène, 61
Phonogramme, 159–161
Phonons, 33
PhotoShop, 269
Photosonic synthesis, 157
Physical modeling synthesis, 26
Physical models of particles, 173–5, 177
Pitch-shifting, 200–202
Gibson's method, 201–2
Lent's method, 200–1
Malah, Jones, and Parks's method, 200
Pitch-shifting on a micro time scale, 195–6
Pitch-time changing by granulation, 197–202
Pitch-time changing by phase vocoder, 259–60
Pitch-time changing by wavelet analysis/resynthesis, 289
Pithoprakta, 83
Planck time interval, 35
PLFKLANG, 303, 344
Pluriphonic sound, 222
Poème électronique, 328
Polarons, 33
Presque rien, numéro un, 312
Pro Tools, 184, 196
Prototype, 110, 302
Sound object, 17
Sound object morphology, 20
Sound object time scale, 16
Sound Synthesis Program, 29
SoundHack, 204
SoundMagic FX, 116, 190
SoundMaker, 116
Spatialization of particles, 221–33
Spatiotemporal effects, 217–8
SpecDraw, 269
Spectastics, 116
Spectral mutation, 261–2
Spectromorphology, 14
Spontaneity versus reflection, 338
SSP, 29
Stable and transient extractor, 260–1
Stanford University, 290
Stationary processes, 33–34
STFT, 239, 246–53
Stochastic masking, 149–51
Stochastic processes, 33–34
StochGran, 113
Strategy variation versus parameter
variation, 343
Subsample time scale, 31
Subsonic intensities, 7
Sud, 318
SuperCollider, 111, 122, 154
version 2, 115, 154, 190, 344
version 3, 115
Supra time scale, 9
Syd, 175
Synth-O-Matic, 113, 154
Tape feedback loop, 69–70
Tar, 313
Telemusik, 79
Telharmonium, 2, 327
Tenth vortex, 156, 310–1
Texture-Multiple, 322
The Computer Music Tutorial, viii, 238
The Gates of H., 319
The MIT Press, x
Thema, 185, 187, 313
Thereminovox, 327
von Neumann lattice, 59
VOSIM generators, 306
VOSIM formant synthesis, 165–66
VOX-5, 317
Wavecycle distortion, 205–9
Wavelab, 290
Wavelet analysis/resynthesis, 282–95
display, 288
filtering, 289
resynthesis, 286
transformations, 288–90
Wavelets, 240
Waves versus particles, 44
Waveset distortion, 205–9
Window function synthesis, 167–8
Window types, 255–6
Windowed analysis and transformation, 58, 235–300
Windowed frequency domain, vii
Windowed spectrum analysis, 238–243
Windowing, 246–7
Wings of Nike, 311, 343
Zoetrope, 56
Zones of frequency, 6
Zones of intensity, 6