Large MIMO Systems PDF
Large MIMO systems, with tens to hundreds of antennas, are a promising emerg-
ing communication technology. This book provides a unique overview of this tech-
nology, covering the opportunities, engineering challenges, solutions, and state-
of-the-art of large MIMO test beds. There is in-depth coverage of algorithms
for large MIMO signal processing, based on metaheuristics, belief propagation,
and Monte Carlo sampling techniques, and suited for large MIMO signal de-
tection, precoding, and LDPC code designs. The book also covers the training
requirement and channel estimation approaches in large-scale point-to-point and
multiuser MIMO systems; spatial modulation is also included. Issues like pilot
contamination and base station cooperation in multicell operation are addressed.
A detailed exposition of MIMO channel models, large MIMO channel sound-
ing measurements in the past and present, and large MIMO test beds is also
presented. An ideal resource for academic researchers, next generation wireless
system designers and developers, and practitioners in wireless communications.
This is a very timely and useful book written by authors who are pioneers in
the area of large MIMO systems.
Vijay K. Bhargava
The University of British Columbia
Large MIMO will power our wireless networks before this decade is out and the
race is just starting. Chockalingam and Sundar Rajan have compiled an excellent
companion for this journey.
Arogyaswami Paulraj
Stanford University
Large MIMO Systems
A. CHOCKALINGAM AND B. SUNDAR RAJAN
Indian Institute of Science, Bangalore
University Printing House, Cambridge CB2 8BS, United Kingdom
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107026650
© Cambridge University Press 2014
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2014
Printed in the United Kingdom by TJ International Ltd. Padstow Cornwall
A catalog record for this publication is available from the British Library
Library of Congress Cataloging in Publication Data
Chockalingam, A., author.
Large MIMO systems / A. Chockalingam, Indian Institute of Science, Bangalore;
B. Sundar Rajan, Indian Institute of Science, Bangalore.
pages cm
ISBN 978-1-107-02665-0 (hardback)
1. MIMO systems. I. Rajan, B. Sundar, author. II. Title.
TK5103.4836.C49 2014
621.39'8-dc23 2013041123
ISBN 978-1-107-02665-0 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To our teachers and students
Contents
1 Introduction
1.1 Multiantenna wireless channels
1.2 MIMO system model
1.3 MIMO communication with CSIR-only
1.3.1 Slow fading channels
1.3.2 Fast fading channels
1.4 MIMO communication with CSIT and CSIR
1.5 Increasing spectral efficiency: quadrature amplitude modulation (QAM) vs MIMO
1.6 Multiuser MIMO communication
1.7 Organization of the book
References
3 MIMO encoding
3.1 Spatial multiplexing
3.2 Space-time coding
3.2.1 Space-time block codes
3.2.2 High-rate NO-STBCs
3.2.3 NO-STBCs from CDAs
4 MIMO detection
4.1 System model
4.2 Optimum detection
4.3 Linear detection
4.4 Interference cancelation
4.5 LR-aided linear detection
4.5.1 LR-aided detection
4.5.2 SA
4.6 Sphere decoding
References
processing) which do not come with smaller systems. Bringing out such large
MIMO centric opportunities, issues, and solution approaches and techniques is
one of the key objectives in this book.
A few words about what motivated us to write this book are in order. Our
teaching and research interest in space-time coding and multiuser detection in
the early- to mid-2000s brought us together to collaborate on MIMO wireless
research. Being in the same department and having offices in the same building
helped: we could discuss ideas over casual chats during coffee/tea breaks and
evening walks. Our first set of results on large MIMO systems was published in
mid-2008. Since then, we have continued our research on various signal processing
aspects in large MIMO systems, which has led to several of our subsequent
publications on large MIMO. The large MIMO idea seems to have caught on,
as we can see in the chapter on large MIMO testbeds (Chapter 12). Over these
years, we have given tutorial talks on this topic to conferences and industry. We
felt that, in the process, we had generated a critical mass of material, enough
to write a book on large MIMO systems. Also, we found that a book written
exclusively on large MIMO systems was yet to appear at the time of proposing
this book to the publisher. We thank the publisher for having accepted our
proposal for writing this book, and here we are with our intended book on large
MIMO systems.
It is heartening to see that large MIMO systems have become more popular
now compared to the days when we first started publishing on this topic in 2008.
Large MIMO systems seem to have started to flourish under several names: large-scale
MIMO, massive MIMO, hyper-MIMO, higher-order MIMO, to name a few.
It is even more heartening to realize that large MIMO technology is one of the
key technologies being considered for standardization in 5G and beyond.
We hope that this book will be of interest and use to researchers, graduate
students, and wireless system designers and implementers, and will create the
interest needed to take large MIMO research, development, and standardization
activities to the next level.
Acknowledgments
We would first like to thank our graduate students for their valuable contributions
to our large MIMO research. At a time when people started thinking that
there is not much of interest left in MIMO research, they took on the challenges of
exploring the uncharted area of MIMO systems with tens to hundreds of antennas.
Thanks to their dedicated and sustained efforts, we were able to make some
of the early contributions to the field of large MIMO systems. This book to a
large extent draws on these contributions, and we thank all our students for their
commitment, hard work, and help. Our many thanks are due to: K. Vishnu Vard-
han, Saif K. Mohammed (currently an Assistant Professor at the Indian Institute
of Technology, Delhi), Ahmed Zaki, N. Srinidhi, Suneel Madhekar, P. J. Thomas
Sojan, Pritam Som, Tanumay Datta, N. Ashok Kumar, Suresh Chandrasekaran,
Yogendra Umesh Itankar, P. M. Chandrakanth, M. Raghavendra Nath Reddy,
Harsha Eshwaraiah, T. Lakshmi Narasimhan, Kamal Agarwal Singhal, Manish
Mandloi, and Shovik Biswas.
Our research and teaching in multiuser detection and space-time coding had
a positive influence on our understanding of and contribution to large MIMO
research. We thank all the students who attended our courses on CDMA and
multiuser detection, and space-time signal processing and coding. We also thank
the students who contributed to our research in these areas. Parts of early drafts
of this book were used in the CDMA and multiuser detection course. We thank
the students for the valuable feedback on these drafts.
N. Srinidhi, Tanumay Datta, and T. Lakshmi Narasimhan were helpful in
the development of the manuscript in many ways (generating figures, proofreading,
LaTeX help, offering general feedback and comments on the structure
and contents of the book). Our special thanks are due to them. We also thank
Ms. G. Nithya, our project associate, for her help in the preparation of the
manuscript. We appreciate her technical support to our laboratory activities
and large MIMO related activities.
We thank Emanuele Viterbo and Yi Hong for their fruitful research collabo-
ration on MIMO precoding and sampling based lattice decoding. We also thank
Onkar Dabeer for his collaboration on AdaBoost for MIMO signal detection. We
are grateful to Rajesh Sundaresan and Vivek Borkar for useful discussions on
MCMC techniques.
2G Second generation
3G Third generation
3GPP Third generation partnership project
4G Fourth generation
5G Fifth generation
ADC Analog-to-digital conversion
AGC Automatic gain control
AoA Angle of arrival
AoD Angle of departure
AP Access point
APP A posteriori probability
AS Angular spread
ASIC Application specific integrated circuit
AWGN Additive white Gaussian noise
BC Broadcast channel
BCJR Bahl-Cocke-Jelinek-Raviv
BER Bit error rate
BP Belief propagation
bpcu Bits per channel use
BPSK Binary phase shift keying
BQP Binary quadratic program
BS Base station
CCDF Complementary cumulative distribution function
CDA Cyclic division algebra
CDF Cumulative distribution function
CDMA Code division multiple access
CN Check node
COMP Coordinated multipoint
COST Cooperation in science and technology
CP Cyclic prefix
CPSC Cyclic prefixed single-carrier
CRB Cramer-Rao bound
CRLB Cramer-Rao lower bound
CSI Channel state information
xviii Abbreviations
$(\cdot)^*$ Complex conjugation
$(\cdot)^H$ Hermitian transposition
$(\cdot)^T$ Transposition
$|\cdot|$ Absolute value of a complex number (or cardinality of a set)
$\|\cdot\|$ Euclidean norm of a vector
$\lfloor \cdot \rceil$ Rounding operation to the nearest integer
$\lfloor c \rfloor$ Largest integer less than c
$\odot$ Element-wise multiplication operation
$\otimes$ Kronecker product
$\mathcal{CN}(\mu, \sigma^2)$ Circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$
$n_t$ Number of transmit antennas
$n_r$ Number of receive antennas
$\mathrm{vec}(\cdot)$ Stack columns of the input matrix into one column vector
$\det(X)$ Determinant of matrix X
$\mathrm{tr}\{X\}$ Trace of matrix X
$I_n$ $n \times n$ identity matrix
$\mathbf{x}$ Vector x
$\mathbf{X}$ Matrix X
$\mathbb{C}$ Field of complex numbers
$E[\cdot]$ Expectation operation
$\mathbb{R}$ Field of real numbers
$\mathbb{R}^+$ Non-negative real numbers
$\mathbb{Z}$ Set of all integers
$\Re(\cdot)$ Real part of the complex argument
$\Im(\cdot)$ Imaginary part of the complex argument
1 Introduction
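[Figure: point-to-point MIMO link. Labels: MIMO transmitter (antennas 1, 2, ..., nt), MIMO channel, MIMO receiver (antennas 1, 2, ..., nr).]
[Figure: multiuser MIMO. Labels: BS (antennas 1, 2, 3, ...), User 1, User 2, User 3, ..., User K, uplink, downlink.]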
wireless channel is the variation of the channel strength over time and over
frequency [6],[14]. These variations are typically classified into two types: large scale
fading and small scale fading. Large scale fading is due to path loss as a function
of distance and shadowing by large objects like buildings, bridges, trees, etc.,
and is typically frequency independent. Small scale fading, on the other hand, is
due to the constructive and destructive interference of the multiple signal paths
between the transmitter and receiver. Small scale fading happens at the spatial
scale of the order of the carrier wavelength, and is frequency dependent. The
channel is then classified as frequency-selective or frequency-flat. When the signaling
bandwidth is larger than the coherence bandwidth of the channel (which
has an inverse relation with the maximum delay spread of the channel), the
channel is frequency-selective [14]. On the other hand, in frequency-flat channels,
the signaling bandwidth is much smaller than the coherence bandwidth
of the channel. Even when the channel is frequency-selective, techniques like
OFDM can convert the channel into multiple frequency-flat channels on which
the techniques designed for frequency-flat fading can be employed.
In terms of time variation, wireless channels are further classified as slowly
fading or fast fading, depending on the fade rate relative to the signaling rate. If
the fade remains constant over the signaling duration, the fading is termed slow
(or time-flat) fading, whereas if the fade varies within the signaling duration, it is
termed fast (or time-selective) fading. The carrier wavelength and velocity of the
communication terminal determine the amount of time selectivity (or Doppler
spread) in the channel [14].
Most multiantenna wireless channels with nt transmit and nr receive antennas
(Fig. 1.1) are modeled as a linear channel with an equivalent baseband channel
matrix $H_c \in \mathbb{C}^{n_r \times n_t}$. The (i, j)th entry of $H_c$ represents the channel gain from
the jth transmit antenna to the ith receive antenna. The channel gains are
also referred to as the channel state information (CSI). The availability of
knowledge of these gains at the receiver and transmitter is an important factor
which decides the performance of the communication system. CSI at the receiver
(CSIR) refers to the scenario where the receiver has knowledge of the channel
gains. Likewise, CSI at the transmitter (CSIT) refers to the scenario where the
transmitter has knowledge of the channel gains. In fast fading channels, accurate
estimation of the channel gains can become an issue, in which case non-coherent
or blind techniques can be considered. In addition, obtaining CSIT through feedback
can become ineffective in fast fading. However, in applications where the
channel is not varying fast, it is generally possible to estimate the channel gains
accurately through pilot-assisted transmission. Also, CSIT based on measured
CSI fed back from the receiver is effective in such slow fading channels.
The channel gains can be independent or correlated, depending on various
factors including the spacing between antenna elements, the amount of scattering
in the environment, pin-hole effects, etc. Mathematical models that characterize
the spatial correlation in MIMO channels are used in the performance evaluation
of MIMO systems. Spatial correlation at the transmit and/or receive side can
affect the rank structure of the MIMO channel, resulting in degraded MIMO
capacity. The structure of scattering in the propagation environment can also
affect the capacity. In addition, transmit correlation in MIMO fading can be
exploited by using non-isotropic inputs (precoding) based on knowledge of the
channel correlation matrices.
of large length, the theoretical limit for the codeword error rate of any encoding-decoding
scheme is the channel outage probability, which is defined as
$$P_{\text{outage}}(\gamma, R) = \min_{K_x \,|\, \mathrm{tr}\{K_x\} = P} p\big(C_{\text{MIMO}}(\gamma, H_c, K_x) < R\big). \tag{1.3}$$
Any practical encoding-decoding scheme would have a codeword error rate more
than the channel outage probability given in (1.3). Therefore, it is important to
design transmit schemes and corresponding receivers which can perform very
close to the channel outage probability for all values of $\gamma$ and R. For slow fading
channels, there are two important parameters, namely, diversity gain and multiplexing
gain. Diversity gain is a measure of reliability, whereas multiplexing
gain is a measure of the degrees of freedom in the MIMO channel. These two parameters
are usually related by the so-called diversity-multiplexing gain tradeoff
[15]. The maximum diversity gain achievable is nr nt and the maximum multiplexing
gain achievable is min(nr, nt). When the rate of transmission R is fixed,
the limiting value (as $\gamma \to \infty$) of the negative of the slope of $\log(P_{\text{outage}}(\gamma, R))$
with respect to $\log \gamma$ can be no more than nr nt. For a given scheme, we can therefore define
the diversity order achievable (with fixed R) as
$$d = -\lim_{\gamma \to \infty} \frac{\log(P_e(\gamma))}{\log \gamma}, \tag{1.4}$$
where $P_e(\gamma)$ is the codeword error rate of the scheme. For simple MIMO schemes
like V-BLAST [16], it can be shown that the maximum diversity order achievable
is only nr. This is because symbols transmitted from the antennas in V-BLAST
are independent, and each such symbol reaches the receiver only through nr
different paths.
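As a numerical illustration of the diversity order in (1.4), the following Python sketch estimates the high-SNR slope of the error probability for a case with a well-known closed form: BPSK with maximal ratio combining over nr iid Rayleigh fading branches. The closed-form expression is taken from standard digital communications texts rather than from this chapter, and the function names and the two SNR points (30 dB and 40 dB) are assumptions made only for this illustration.

```python
import math

def ber_bpsk_mrc_rayleigh(snr_db, n_r):
    """Closed-form BER of BPSK with n_r-branch MRC in iid Rayleigh fading.

    snr_db is the average SNR per receive branch (an assumption of this sketch).
    """
    gamma = 10.0 ** (snr_db / 10.0)
    mu = math.sqrt(gamma / (1.0 + gamma))
    total = sum(math.comb(n_r - 1 + k, k) * ((1.0 + mu) / 2.0) ** k
                for k in range(n_r))
    return ((1.0 - mu) / 2.0) ** n_r * total

def diversity_slope(n_r, snr_lo_db=30.0, snr_hi_db=40.0):
    """Finite-SNR estimate of d = -d log(Pe) / d log(gamma)."""
    p_lo = ber_bpsk_mrc_rayleigh(snr_lo_db, n_r)
    p_hi = ber_bpsk_mrc_rayleigh(snr_hi_db, n_r)
    d_log_gamma = (snr_hi_db - snr_lo_db) / 10.0   # change in log10(gamma)
    return -(math.log10(p_hi) - math.log10(p_lo)) / d_log_gamma

for n_r in (1, 2, 4):
    print(n_r, round(diversity_slope(n_r), 2))
```

Under these assumptions the estimated slope should come out close to nr, which is consistent with each symbol reaching the receiver through nr different paths.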
Space-time block coding is a well-known technique which can achieve the full
diversity gain of nr nt [5]. To achieve full diversity, symbols are coded across both
space and time. Orthogonal space-time block codes (STBCs) allow simple decoding
achieving full diversity [17],[18]. However, they make sacrifices regarding
the multiplexing rate, and are therefore not suited for systems with high target
spectral efficiencies. Subsequent to orthogonal STBCs, several high-rate and
high-diversity STBCs were proposed. One such class of STBCs is non-orthogonal
STBCs (NO-STBCs) from cyclic division algebras (CDAs) [19],[20]. STBCs from
CDAs can achieve the full diversity of nr nt without sacrificing rate.
[Figure 1.3 plot: ergodic capacity (bps/Hz) versus average received SNR (dB), with curves for nr = nt = 8 and nr = nt = 16, each with CSIR only and with CSIT and CSIR.]
Figure 1.3 Ergodic MIMO capacity for increasing nt = nr with (i) CSIR only, and (ii)
CSIT and CSIR.
allocation among these n subchannels [2]. Note that the optimal power alloca-
tion is isotropic for the case of CSIR-only.
When CSIT is available, it is possible to use the available power judiciously by
allocating more power to the subchannel with higher channel gain. At low SNRs,
the availability of CSIT in addition to CSIR has an even higher impact on the
ergodic capacity when compared to the CSIR-only scenario. This is because, at
low SNRs, capacity is known to increase almost linearly with SNR, and therefore
with CSIT the transmitter allocates all available power to the subchannel with
the highest channel gain. In contrast to this, with CSIR-only, the available power
is equally divided among the subchannels, resulting in a lower achievable capacity
when compared to the scenario with CSIT. At high SNRs, waterfilling power
allocation distributes roughly equal power to all the subchannels. Therefore,
power allocation at high SNRs is almost the same for scenarios with CSIR-only
as well as those with CSIR and CSIT. This implies that at high SNRs, both
CSIR-only and the CSIR and CSIT scenarios have roughly the same ergodic
capacity. This can be seen in Fig. 1.3, where the CSIT and CSIR ergodic
capacity is plotted, in addition to the CSIR only capacity. It can be observed
that indeed, for a given nt = nr and SNR $\gamma$, the ergodic capacity with CSIT
and CSIR is more than the ergodic capacity with CSIR only. Also, the gap
between the ergodic capacity of the two scenarios reduces with increasing SNR.
Another important fact, which is not highlighted in Fig. 1.3 is that, at low SNRs,
the ergodic capacity with CSIT and CSIR is more than $n \log_2(1 + \gamma)$, which
is the capacity of n parallel, independent SISO non-faded AWGN channels [6].
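The low-SNR and high-SNR behavior of waterfilling described above can be illustrated with a minimal Python sketch. It assumes parallel subchannels with unit noise variance; the subchannel gains, the power budgets, and the function names are hypothetical choices for illustration and are not taken from the book.

```python
import numpy as np

def waterfill(gains, total_power):
    """Waterfilling over parallel subchannels with unit noise variance.

    gains: positive subchannel power gains (e.g., squared singular values).
    Returns powers p_i = max(level - 1/gain_i, 0) that sum to total_power.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest subchannel first
    inv = 1.0 / gains[order]
    for k in range(len(gains), 0, -1):       # try the k strongest subchannels
        level = (total_power + inv[:k].sum()) / k
        if level > inv[k - 1]:               # weakest active one gets positive power
            break
    powers_sorted = np.zeros(len(gains))
    powers_sorted[:k] = level - inv[:k]
    powers = np.empty(len(gains))
    powers[order] = powers_sorted            # undo the sorting
    return powers

def rate_bits(gains, powers):
    """Sum rate in bits per channel use for a given power allocation."""
    return float(np.sum(np.log2(1.0 + np.asarray(gains) * np.asarray(powers))))

gains = [2.0, 1.0, 0.1]                      # hypothetical subchannel gains
for total_power in (0.1, 100.0):             # a low and a high power budget
    p_wf = waterfill(gains, total_power)
    p_eq = np.full(len(gains), total_power / len(gains))
    print(total_power, p_wf, rate_bits(gains, p_wf), rate_bits(gains, p_eq))
```

With the small power budget, all power goes to the strongest subchannel; with the large budget, the allocation is close to equal and the rate gap to equal power allocation becomes small relative to the rate itself.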
In slow fading channels, the codewords transmitted are subject to block fading
(i.e., the channel remains almost the same for the duration of the transmitted
codeword). As pointed out earlier, in such block fading scenarios, if the capacity
of the channel is below the rate of transmission, there will always be a codeword
error (outage) irrespective of the coding scheme used. With the availability of
CSIT, however, it is possible to theoretically achieve zero outage probability by
adapting the transmitted codewords (i.e., codeword rate and transmit power)
for a given long-term average power constraint [21]. This leads to a variable
rate transmission scheme, and also a large peak to average requirement on the
transmit radio frequency (RF) amplifiers, which are undesirable in many applications.
Therefore, in such applications, it is obvious that outages cannot be
avoided. Hence, it is important that encoding and decoding schemes are devised
to achieve high diversity and multiplexing gains. The maximum diversity gain
is nt nr and the maximum multiplexing gain is min(nt , nr ). CSIT can be used
to encode the information symbols into transmit vectors, a process commonly
called precoding. Several precoding schemes are known in the literature. Most
known precoding schemes (or precoders for short) achieve either (i) high rate or
high diversity at low complexity (e.g., linear precoders and non-linear precoders
based on Tomlinson-Harashima precoding) or (ii) both high rate and high diversity
but at high complexity (e.g., precoders based on lattice reduction techniques
and vector perturbation).
1.5 Increasing spectral efficiency: QAM vs MIMO
Table 1.1. Reliability and capacity of SISO, SIMO, and MIMO channels
The achievable link reliability (in terms of probability of error) and capacity
in bps/Hz in SISO (nt = nr = 1), SIMO (nt = 1, nr > 1), and MIMO
(nt > 1, nr > 1) channels are summarized in Table 1.1. In non-fading SISO
AWGN channels, the probability of error falls exponentially with increasing SNR,
whereas the capacity grows only logarithmically with increasing SNR. With fading,
the probability of error degrades to a linear fall with increase in SNR (i.e., it
falls as $\mathrm{SNR}^{-1}$); this is a detrimental effect of fading in SISO channels. This
performance degradation in fading can be alleviated by using more receive antennas,
which offers receive diversity. That is, in SIMO fading channels, the probability of
error falls with SNR as $\mathrm{SNR}^{-n_r}$. While this means better error performance in SIMO
fading compared to SISO fading, the capacity of SIMO fading, like that of SISO fading,
grows only logarithmically with increasing SNR. That is, in SISO and SIMO
fading channels, significant power increase is needed to increase capacity. However,
MIMO fading channels are attractive in terms of both achievable reliability
as well as capacity. The probability of error in MIMO channels falls with SNR as
$\mathrm{SNR}^{-n_t n_r}$, which approaches an exponential fall for large nt, nr. More importantly,
the MIMO channel capacity increases linearly with the minimum of the number
of transmit and receive antennas, which is much better than the logarithmic
increase in capacity with increasing SNR.
Spectral efficiency in communication systems can be increased by increasing
the size of the modulation alphabet (e.g., increasing M in M-QAM), or increasing
the number of spatial dimensions for signaling (i.e., increasing nt), or a
combination of both. To achieve a given spectral efficiency, using a small modulation
alphabet size and increasing number of antennas is more power efficient
than using a small number of antennas and increasing the modulation alphabet
size. This can be illustrated as follows. Consider the communication systems in
Fig. 1.4. The system in Fig. 1.4(a) uses one transmit antenna and 64-QAM to
achieve 6 bps/Hz spectral efficiency. The system in Fig. 1.4(b) achieves the same
spectral efficiency using six transmit antennas and binary phase shift keying
(BPSK).
[Figure 1.4 diagram: (a) Tx with one transmit antenna and 64-QAM, Rx with nr receive antennas; (b) Tx with nt = 6 transmit antennas and BPSK, Rx with nr receive antennas.]
Figure 1.4 Communication systems with 6 bps/Hz spectral efficiency: (a) SISO/SIMO
with 64-QAM. (b) MIMO with nt = 6 and BPSK. (Rx: receiver; Tx: transmitter.)
The achieved bit error rate (BER) ($p_e$) versus SNR ($\gamma$) performances
of these systems are compared in Fig. 1.5. The performance of the 64-QAM with
nt = nr = 1 is least power efficient. As mentioned earlier, in this SISO fading
case $p_e$ falls as $\gamma^{-1}$. This can be seen by noting that for a 10 dB increase in
$\gamma$, the $p_e$ falls by one order: e.g., $p_e$ improves from $2 \times 10^{-3}$ at $\gamma = 35$ dB to
$2 \times 10^{-4}$ at $\gamma = 45$ dB. By increasing the number of receive antennas from
nr = 1 to nr = 6, keeping nt = 1 and 64-QAM, the performance improves due
to receive diversity. However, the MIMO system with nt = nr = 6 and BPSK
significantly outperforms the SIMO and SISO systems with 64-QAM. At a $p_e$
of $10^{-3}$, the nt = nr = 6 MIMO system with BPSK is power efficient by more
than 6 dB compared to the nt = 1, nr = 6 SIMO system with 64-QAM. This
illustrates the better power efficiency of MIMO systems compared to SIMO and
SISO systems with the same spectral efficiency. Therefore, increasing the num-
[Figure 1.5 plot: BER versus average received SNR (dB) at 6 bps/Hz, with curves for SISO (nt = 1, nr = 1, 64-QAM), SIMO (nt = 1, nr = 6, 64-QAM), and MIMO (nt = 6, nr = 6, BPSK).]
1.6 Multiuser MIMO communication
The rate region for the Gaussian BC has been characterized, and it is known that
a subset of this region can be achieved by dirty paper coding (DPC) [22],[23].
The sum capacity, which is the maximum aggregation of all users' data rates,
grows linearly with the minimum of the number of antennas nt and the num-
ber of single-antenna users nu , provided the transmitter and receivers all know
the channel. The sum capacity was shown to be achievable by DPC [24],[25].
In DPC, the information is coded in such a way that, despite interference from
other users, each user can still receive its information perfectly as if there were no
interference at all. DPC is, however, not suited to practical implementations due
to high complexity. Hence, practical encoding and decoding schemes are required
to achieve rates in the BC rate region. A simple idea that seems to give reason-
able performance is that of prenulling the interference from users by precoding
the information vector with the inverse of the channel matrix. This precoder
(also known as the zero-forcing (ZF) precoder) does indeed prevent interference,
but the penalty to be paid is in the increase of average transmit power (particu-
larly when the channel is ill conditioned). Due to this, the ZF precoder is known
to achieve poor diversity order in fading channels. Most other low complexity
precoders also suffer from loss in performance. Vector perturbation (VP) based
techniques were proposed as a low complexity alternative to DPC [26],[27]. VP
techniques are known to achieve good performance. In VP based techniques, the
precoder matrix is still the channel inverse matrix. However, prior to ZF, the in-
formation symbol vector is perturbed by an integer vector in such a way that the
transmit power requirement is minimized. The optimal integer vector is usually
searched using sphere encoding or lattice reduction techniques. Low complexity
search techniques can further reduce complexity to suit large multiuser MIMO
systems [12].
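The transmit power penalty of channel-inversion (ZF) precoding mentioned above can be seen in a small numerical sketch. The two 2 x 2 channel matrices below are hypothetical, the symbols are assumed to be uncorrelated with unit power, and the function names are illustrative; the sketch covers only the plain ZF precoder, not the VP or DPC schemes.

```python
import numpy as np

def zf_precoder(h):
    """Right pseudo-inverse of the downlink channel h (users x BS antennas)."""
    return np.linalg.pinv(h)

def zf_transmit_power(h):
    """Average power of G s for unit-power, uncorrelated symbols: tr(G G^H)."""
    g = zf_precoder(h)
    return float(np.real(np.trace(g @ g.conj().T)))

h_good = np.eye(2)                               # well conditioned (hypothetical)
h_bad = np.array([[1.0, 1.0], [1.0, 1.01]])      # nearly rank deficient (hypothetical)

for name, h in (("well conditioned", h_good), ("ill conditioned", h_bad)):
    g = zf_precoder(h)
    interference_free = np.allclose(h @ g, np.eye(2), atol=1e-6)
    print(name, interference_free, zf_transmit_power(h))
```

In both cases the effective channel H G is the identity, so the users see no interference, but the required transmit power is several orders of magnitude larger for the ill-conditioned channel.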
When the number of antennas at the BS transmitter is made significantly
larger than the number of downlink users, simple linear precoders can achieve
very good performance because of the additional spatial dimensions available
at the transmitter. Likewise, on MAC channels with fading, providing the BS
receiver with a large number of receive antennas results in large dimension spatial
signatures towards users, which can be exploited to achieve good
detection performance using low complexity MUD algorithms at the BS receiver.
The main focus of this book is MIMO systems with a large number of antennas.
Since several earlier books on MIMO have already treated MIMO systems with
a small number of antennas in detail, we will keep our discussions on such small
systems only to an extent that is relevant for this book. Many chapters exclusively
deal with techniques and performance of large MIMO systems, in line with our
intent and motivation to write this book.
Large MIMO systems are systems which use tens to hundreds of antennas in communication
terminals. Depending on the application scenario, different MIMO
system configurations are possible. These include point-to-point MIMO and
multiuser MIMO configurations. In multiuser MIMO, point-to-multipoint (e.g.,
downlink in cellular systems), and multipoint-to-point (e.g., uplink in cellular
systems) configurations are common. In a point-to-point communication scenario
(Fig. 1.1), the number of transmit antennas nt at the transmitter and the
number of receive antennas nr at the receiver can be large. A typical application
scenario for a point-to-point large MIMO configuration is providing high-speed
wireless backhaul connectivity between BSs using multiple antennas at each BS.
Since space constraint need not be a major concern at the BSs, a large number
of antennas can be used at both the transmit and receive BSs in this application
scenario.
In multiuser MIMO (Fig. 1.2), the communication is between a BS and multi-
ple user terminals. These user terminals can be small devices like mobile/smart
phones or medium sized terminals like laptops, set-top boxes, TVs, etc. In mo-
bile applications where mobile/smart phones and personal digital assistants are
the user terminals, only a limited number of antennas can be mounted on them
because of space constraints. However, in applications involving user terminals
like TVs, set-top boxes, and laptops a larger number of antennas can be used on
the user terminal side as well. Regardless of the size of the user terminals and the
number of antennas that can be accommodated in them, use of tens to hundreds
of antennas at the BS end in multiuser MIMO is not difficult. In such cases, the
greater the number of antennas at the BS, the greater can be the spatial degrees
of freedom available to perform precoding on the downlink and detection on the
uplink.
which indicates the possibility of achieving very high data rates using large nt , nr
without increasing the bandwidth. The rate gains in multiuser MIMO with a
large number of antennas can also be substantial [1]. For example, space division
multiple access (SDMA) in a multiuser uplink becomes quite attractive when
a large number of receive antennas are used at the BS. The larger the number
of BS receive antennas, the larger can be the number of uplink users supported
in the system. Also, a large number of transmit antennas at the BS in multiuser
downlink can allow the use of simple precoding methods and flexible user
selection and scheduling.
Though the most obvious advantages of large MIMO systems are increased data
rate and diversity gain, the large dimensionality they offer can also result in a
host of other advantages that do not come with smaller systems. As the $n_r \times n_t$
channel matrix H becomes larger (i.e., both nt and nr increase, keeping their
ratio fixed), the distribution of its singular values becomes less sensitive to the
actual distribution of the entries of the channel matrix (as long as they are
independent and identically distributed (iid)). This is a result of the Marcenko-Pastur
law, which states that if the entries of an $n_r \times n_t$ matrix H are zero
mean iid with variance $1/n_r$, then the empirical distribution of the eigenvalues
of $H^H H$ converges almost surely, as $n_t, n_r \to \infty$ with $n_t/n_r \to \beta$, to the density
function [2]
$$f_\beta(x) = \left(1 - \frac{1}{\beta}\right)^+ \delta(x) + \frac{\sqrt{(x - a)^+ (b - x)^+}}{2\pi\beta x}, \tag{2.1}$$
where $(z)^+ = \max(z, 0)$, $a = (1 - \sqrt{\beta})^2$, and $b = (1 + \sqrt{\beta})^2$. In a similar way,
the empirical distribution of the eigenvalues of $H H^H$ converges to
$$\tilde{f}_\beta(x) = (1 - \beta)^+ \delta(x) + \frac{\sqrt{(x - a)^+ (b - x)^+}}{2\pi x}. \tag{2.2}$$
Equations (2.1) and (2.2) are plotted in Figs. 2.1(a) and (b), respectively, for
different values of $\beta$. Note that the mass points at 0 are not plotted.
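The convergence stated by the Marcenko-Pastur law can be checked numerically with a short Python sketch. It assumes the standard form of the limiting density for the eigenvalues of H^H H (continuous part only, with beta = nt/nr), complex Gaussian entries, and hypothetical sizes nr = 400 and nt = 200; the function names and the random seed are illustrative.

```python
import numpy as np

def mp_density(x, beta):
    """Continuous part of the limiting eigenvalue density of H^H H."""
    a = (1.0 - np.sqrt(beta)) ** 2
    b = (1.0 + np.sqrt(beta)) ** 2
    x = np.asarray(x, dtype=float)
    root = np.sqrt(np.maximum(x - a, 0.0) * np.maximum(b - x, 0.0))
    return root / (2.0 * np.pi * beta * x)

def sample_eigenvalues(n_r, n_t, rng):
    """Eigenvalues of H^H H for iid complex Gaussian entries of variance 1/n_r."""
    h = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
    h /= np.sqrt(2.0 * n_r)
    return np.linalg.eigvalsh(h.conj().T @ h)

rng = np.random.default_rng(0)
n_r, n_t = 400, 200                      # hypothetical sizes, beta = 0.5
beta = n_t / n_r
eig = sample_eigenvalues(n_r, n_t, rng)
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print("support edges:", a, b)
print("empirical min/max:", eig.min(), eig.max())

# Compare the empirical histogram with the limiting density on a few bins.
hist, edges = np.histogram(eig, bins=10, range=(a, b), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.round(hist, 2))
print(np.round(mp_density(centers, beta), 2))
```

For a single realization of this size, the eigenvalues should already fall close to the interval [a, b] and the histogram should roughly follow the limiting density.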
[Figure 2.1 plots: density versus x for $\beta$ = 0.2, 0.5, 1, and 10, in panels (a) and (b).]
Figure 2.1 Marcenko-Pastur density function for (a) $H^H H$ and (b) $H H^H$. The mass
points at zero are not shown.
An effect of the Marcenko-Pastur law is that very tall or very wide channel
matrices are very well conditioned. This can be seen from Figs. 2.1(a) and (b)
for $\beta = 0.2$ and $\beta = 10$, as the support of the non-zero eigenvalues of $H^H H$ and
$H H^H$ moves away from zero. The Marcenko-Pastur law also implies that the
channel "hardens," meaning that the eigenvalue histogram of a single realization
converges to the average asymptotic eigenvalue distribution. In this sense,
the channel becomes more and more deterministic as the number of antennas
increases. The channel hardening behavior in large dimensions can be seen pictorially
in the intensity plots of $H^H H$ for nt = nr = 8, 32, 96, 256 in Fig. 2.2,
where the entries of H are iid Gaussian entries with zero mean and unit variance.
Figure 2.2 shows that, as the size of H increases, the diagonal terms of $H^H H$
become increasingly larger in magnitude than the off-diagonal terms.
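The growing dominance of the diagonal of H^H H can be quantified with a minimal Python sketch. It assumes complex iid Gaussian entries with zero mean and unit variance, averages over a hypothetical number of 20 realizations per size, and uses an illustrative function name and seed.

```python
import numpy as np

def offdiag_to_diag_ratio(n, rng, trials=20):
    """Average of mean |off-diagonal| / mean diagonal of H^H H, H being n x n."""
    off_mask = ~np.eye(n, dtype=bool)
    ratios = []
    for _ in range(trials):
        h = (rng.standard_normal((n, n))
             + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
        gram = h.conj().T @ h
        diag_mean = np.real(np.diag(gram)).mean()
        off_mean = np.abs(gram[off_mask]).mean()
        ratios.append(off_mean / diag_mean)
    return float(np.mean(ratios))

rng = np.random.default_rng(1)
ratios = {n: offdiag_to_diag_ratio(n, rng) for n in (8, 32, 96, 256)}
for n, r in ratios.items():
    print(n, round(r, 3))
```

The ratio shrinks as the dimension grows, which is the numerical counterpart of the increasingly diagonal intensity plots.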
Figure 2.2 Intensity plots of $H^H H$ matrices for (a) 8 × 8 MIMO, (b) 32 × 32 MIMO,
(c) 96 × 96 MIMO, and (d) 256 × 256 MIMO channels.
Channel hardening results in several advantages in large dimensional signal
processing. For example, linear detectors like ZF and minimum mean square
error (MMSE) detectors need to perform matrix inversions. Inversion of large
random matrices can be done quickly, using series expansion techniques [3].
Because of channel hardening, approximate matrix inversions using series ex-
pansion and deterministic approximations from the limiting distribution become
eective in large dimensions. Also, channel hardening can allow simple detection
methods/algorithms to achieve a very good performance in large dimensions.
Such low complexity detection algorithms suited for large dimensions are treated
in Chapters 58.
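As an illustration of inversion by series expansion, the sketch below uses a truncated Neumann series. This particular series is chosen here for illustration and is not necessarily the technique of [3]; the function name, dimensions, and seed are assumptions. It relies on $\mathbf{H}^H\mathbf{H}/n_r$ being close to the identity for a tall $\mathbf{H}$.

```python
import numpy as np

def neumann_inverse(A, num_terms):
    """Approximate inv(A) by sum_{k=0}^{num_terms-1} (I - A)^k; needs ||I - A|| < 1."""
    n = A.shape[0]
    E = np.eye(n) - A
    term = np.eye(n, dtype=A.dtype)          # current power (I - A)^k
    approx = np.zeros_like(A)
    for _ in range(num_terms):
        approx = approx + term
        term = term @ E
    return approx

rng = np.random.default_rng(1)               # arbitrary fixed seed
n_r, n_t = 256, 16                           # tall channel: n_t / n_r = 1/16
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
A = H.conj().T @ H / n_r                     # close to the identity because of hardening
exact = np.linalg.inv(A)
errors = [np.linalg.norm(neumann_inverse(A, K) - exact) / np.linalg.norm(exact)
          for K in (1, 3, 10)]
print(errors)                                # relative error shrinks as terms are added
```

Each extra term costs one matrix product, and the series converges faster the closer $\mathbf{H}^H\mathbf{H}/n_r$ is to the identity.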
2.3 Technological challenges and solution approaches

Current wireless standards including IEEE 802.11n/11ac (WiFi) and 3GPP LTE-A
have adopted MIMO techniques to achieve increased spectral efficiency and
reliability. These standards can harness only some of the potential benefits of
MIMO, since they currently use only a small number of antennas (e.g., 2–8
Use higher carrier frequencies One approach to enable the placement of more
antennas in a given antenna aperture is to use higher carrier frequencies.
Since higher carrier frequencies have smaller wavelengths (e.g., $\lambda/2 = 3$ cm
at 5 GHz), more antennas can be mounted in a given antenna aperture. Also,
operation at 11 GHz, 30 GHz, and 60 GHz carrier frequencies can be attractive
in terms of the antenna placement issue, since the carrier wavelengths in these
frequencies are in millimeters.
Exploit volume Another approach is to mount antennas not only in one or two
dimensions (as in linear and planar arrays) but in a compact volume. Placing
multiple antenna elements in cubic structures (referred to as MIMO cubes)
is a promising approach. More details on MIMO cube antennas are given in
Section 11.5.3.
Compact antenna arrays Compact antenna arrays are antenna arrays with an
inter-element spacing of less than half the wavelength. The design of arrays
that are compact yet demonstrate acceptable mutual coupling and radiation
efficiency is another interesting approach to address the antenna placement
issue in large MIMO systems. The use of PIFAs as the basic elements in
compact arrays is appealing. The topic of compact antenna arrays is discussed
in Section 11.5.
Spatial modulation (SM) The number of transmit RF chains at the MIMO
transmitter can be reduced without compromising on the spectral efficiency
using SM, a relatively new modulation scheme suitable for multiantenna com-
munications [8],[9]. It reduces RF hardware complexity, size, and cost. SM is
introduced in Chapter 3.
MIMO detection The MIMO detector at the receiver, whose job is to recover
the symbols that are transmitted simultaneously from multiple transmitting
antennas, is often a bottleneck in terms of overall performance and com-
plexity. Complexities involved in optimum detectors based on the maximum
likelihood (ML) or the maximum a posteriori probability (MAP) criterion are
exponential in the number of transmit antennas, and hence are prohibitive
for large MIMO systems. Widely known detectors in the literature either
perform well but do not scale in complexity (e.g., the sphere decoder (SD)
[10] and variants) or scale well in complexity but perform poorly in large sys-
tems (e.g., linear detectors like ZF, MMSE detectors). Fortunately, the chan-
nel hardening behavior witnessed in large matrices (discussed in Section 2.2)
becomes helpful. Several low complexity detection algorithms based on local
search, metaheuristics, BP, and sampling techniques have shown promising
performance and complexity attributes suited for large MIMO systems. They
have the same complexity orders as linear detectors yet exhibit near-optimal
performance, particularly when applied for signal detection in large dimen-
sions. This new generation of MIMO detection algorithms is treated in detail
in Chapters 5–8.
LDPC codes Another interesting aspect of detection using message passing
algorithms like BP in large dimensions is that these graphical model based
algorithms can be naturally combined with turbo or LDPC decoding algorithms
(which are also graphical model based algorithms) to achieve joint processing
of detection and decoding in large MIMO systems. Such a joint detection–decoding
approach allows one to design good LDPC codes tailored for large
MIMO channels. In particular, LDPC codes matched to large MIMO channels
can be designed using the extrinsic information transfer (EXIT) behavior
of the joint detection–decoding message passing receiver. In large MIMO
channels, such specially designed LDPC codes outperform off-the-shelf LDPC
codes designed for other types of channels. LDPC code design for large MIMO
systems is covered in Section 7.8.
SM This is a relatively new modulation scheme suited for multiantenna com-
munications [8],[9]. It is a promising technique for large MIMO systems as
it allows the use of fewer transmit RF chains than the number of transmit
antennas without compromising on the spectral efficiency. This reduces
RF hardware complexity, size, and cost in large MIMO systems. A novel
aspect of this modulation scheme is that it conveys information in the in-
dices of the chosen antennas for transmission, in addition to information
conveyed through conventional modulation alphabets like QAM. Chapter 3
introduces SM as a MIMO encoding scheme. The low complexity detection
algorithms in Chapters 5–8 can be used to detect spatial modulation
signals.
Single-carrier communication Single-carrier communication techniques are
increasingly preferred over multicarrier techniques like OFDM/OFDMA be-
cause of the high peak-to-average power ratio (PAPR) in multicarrier sys-
tems. The LTE standard already employs single-carrier frequency division
multiple access (SC-FDMA) on the uplink. SC-FDMA can be beneficially
employed in multiuser MIMO downlink as well [11]. While SC-FDMA offers
PAPR and performance advantages over OFDMA, the receiver complexity
will be greater in SC-FDMA because of the need to perform equalization.
However, SC-FDMA equalization complexity in large MIMO systems can
be addressed by using the detection algorithms in Chapters 5–8 for the purpose
References
The job of MIMO encoding is to map the input symbols, say, from a modula-
tion alphabet, to symbols to be transmitted over multiple transmit antennas.
Spatial multiplexing and space-time coding are two well-known MIMO encoding
techniques [1],[2]. Spatial modulation (SM) is a more recently proposed scheme
for multiantenna communications [3]. These MIMO encoding schemes do not
require any knowledge of the CSI at the transmitter, and hence are essentially
open-loop schemes. MIMO encoding using CSI at the transmitter is referred
to as MIMO precoding, which is treated in Chapter 10. Spatial multiplexing is
an attractive architecture for achieving high rates. Space-time coding is attractive
for achieving increased reliability through transmit diversity. SM serves a
different purpose. It allows fewer transmit RF chains than the number of transmit
antennas to be used without compromising much on the rate. This reduces
the RF hardware complexity, size, and cost. In spatial multiplexing and space-
time coding, information is carried on the modulation symbols. In SM, on the
other hand, in addition to modulation symbols, the indices of the antennas on
which transmission takes place also convey information. This is why SM does
not compromise much on the rate. Among the three MIMO encoding schemes,
spatial multiplexing is the simplest, and its complexity rests more at the receiver
in detecting the transmitted symbol vector. SM, though simple conceptually,
needs additional memory to construct the encoding table at the transmitter
for selecting the antennas for transmission. Detection is more involved in SM
than in spatial multiplexing, since, at the receiver in addition to detecting the
modulation symbols, the indices of the transmitting antennas also need to be
detected. Space-time coding is the most sophisticated of the three MIMO encoding
schemes. Rich and sophisticated mathematical tools (e.g., CDA, Clifford
algebra) have been applied to design space-time codes that achieve good diversity
and rate. We will describe spatial multiplexing, space-time coding, and SM
in some detail in the following sections.
streams simultaneously, one on each transmit antenna. That is, independent data
streams are multiplexed in space. There is no coding across the transmit anten-
nas, however. A key advantage of spatial multiplexing is that it utilizes all the
available spatial degrees of freedom and achieves the full rate of nt symbols per
channel use. Another advantage is that it applies to systems with any number
of transmit antennas. A drawback, however, is that it does not achieve the maximum
spatial diversity of $n_t n_r$, where $n_r$ is the number of receive antennas. It
achieves receive diversity but not transmit diversity (because there is no coding
across transmit antennas).
The transmit signal vector in a given channel use is $\mathbf{x} = [x_1\ x_2\ \cdots\ x_{n_t}]^T$,
where the symbols $x_k$, $k = 1, \ldots, n_t$, come from a modulation alphabet. These
transmitted symbols are detected jointly at the receiver using $n_r$ receive antennas.
In a V-BLAST system with $n_t = n_r = 2$, an upper bound on the pairwise
error probability of a transmit vector $\mathbf{x}_1$ being decoded as $\mathbf{x}_2$ at the receiver can
be expressed as [1]

$$P(\mathbf{x}_1 \to \mathbf{x}_2) \le \left[\frac{1}{1 + \mathrm{SNR}\,\|\mathbf{x}_1 - \mathbf{x}_2\|^2/4}\right]^2 \le \frac{16}{\mathrm{SNR}^2\,\|\mathbf{x}_1 - \mathbf{x}_2\|^4}. \quad (3.1)$$

The exponent in the SNR factor is the diversity gain. In the above, the achieved
diversity order is only 2, whereas the maximum diversity order is $n_t n_r = 4$.
Spatial multiplexing is a commonly used MIMO encoding scheme in practical
wireless systems and standards. Also, a multiuser uplink system with K single-
antenna user terminals transmitting simultaneously on the same frequency to
a BS with N receive antennas can be viewed as a virtual spatial multiplexing
system, also referred to as SDMA [1]. SDMA users' signals are decoded at
the BS receiver based on the spatial signatures they establish at the receive
antenna array. SDMA is highly spectrally efficient since neither orthogonality
among users' transmissions (as in TDMA and FDMA) nor code signatures which
involve bandwidth expansion (as in CDMA) are needed in SDMA. All users
can transmit at the same time on the same frequency in SDMA without any
bandwidth expansion.
Several algorithms for detecting spatially multiplexed signals with varying
levels of performance and complexity are known in the literature. Some of the
well-known detection algorithms are covered in Chapter 4. These algorithms
either scale well in complexity for a large number of antennas but perform poorly,
or perform well but do not scale well in complexity. Advanced low complexity
approaches and algorithms with good performance are needed for detection in
large MIMO systems. Such algorithms suited for large MIMO signal detection
are covered in Chapters 5–8.
3.2 Space-time coding
where $\gamma_s = E_s/4N_0$, $E_s$ is the average energy per complex symbol, and the $\sigma_i$s are
the singular values of the difference matrix $\mathbf{B} = \mathbf{C}_1 - \mathbf{C}_2$. If r denotes the rank
of the matrix $\mathbf{A} = \mathbf{B}\mathbf{B}^H$ and $\lambda_1, \lambda_2, \ldots, \lambda_r$ denote the non-zero eigenvalues of
$\mathbf{A}$, then

$$P(\mathbf{C}_1 \to \mathbf{C}_2) \le \left(\prod_{i=1}^{r} \lambda_i\right)^{-n_r} (\gamma_s)^{-r n_r}. \quad (3.3)$$

Thus a diversity gain of $r n_r$ and a coding gain of $(\lambda_1 \lambda_2 \cdots \lambda_r)^{1/r}$ are achieved.
The following criteria are generally used to design space-time codes.
Rank criterion Maximize the minimum rank of the matrix $\mathbf{B}(\mathbf{C}_1, \mathbf{C}_2)$ over
all distinct pairs of codewords $\mathbf{C}_1$ and $\mathbf{C}_2$. If the minimum rank is r, then a
diversity of $r n_r$ is achieved.
Determinant criterion To achieve a better coding gain, the minimum of the
determinant of $\mathbf{A}(\mathbf{C}_1, \mathbf{C}_2)$ taken over all pairs of distinct codewords $\mathbf{C}_1$ and
$\mathbf{C}_2$ must be maximized.
Two types of space-time codes, namely, space-time trellis codes (STTCs) and
STBCs, are well known [2]. The rank and determinant criteria apply to the design
of both STTCs and STBCs. STTCs combine modulation and trellis coding to
transmit information over multiple transmit antennas, and they can be viewed
as trellis coded modulation (TCM) for MIMO channels. While STTCs need the
vector-Viterbi algorithm for decoding, some STBCs admit very simple decoding.
where x1 , x2 , . . . , xk are the information symbols and the elements of X are linear
combinations of x1 , x2 , . . . , xk and their conjugates [5]. A well-known orthogonal
STBC is the Alamouti code [6], which is a 2 × 2 code (i.e., $n_t = p = 2$). Two
symbols $x_1$ and $x_2$ and their conjugates are sent in two time slots using two
transmit antennas. The corresponding STBC matrix is given by

$$\mathbf{X} = \begin{bmatrix} x_1 & -x_2^* \\ x_2 & x_1^* \end{bmatrix}. \quad (3.5)$$

It can be seen that k = 2, and therefore the rate r = k/p = 1.
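The code (3.5) can be exercised numerically. The sketch below assumes the reading of (3.5) in which rows index transmit antennas and columns index time slots, a single receive antenna, and no noise; the function names and the equivalent-channel formulation are illustrative.

```python
import numpy as np

def alamouti_encode(x1, x2):
    """STBC matrix: row = transmit antenna, column = time slot."""
    return np.array([[x1, -np.conj(x2)],
                     [x2,  np.conj(x1)]])

def alamouti_decode(y1, y2, h1, h2):
    """Linear combining for one receive antenna; returns estimates of (x1, x2)."""
    y = np.array([y1, np.conj(y2)])                      # conjugate the second slot
    H_eq = np.array([[h1, h2],
                     [np.conj(h2), -np.conj(h1)]])       # equivalent channel, orthogonal columns
    return (H_eq.conj().T @ y) / (abs(h1) ** 2 + abs(h2) ** 2)

rng = np.random.default_rng(2)                           # arbitrary fixed seed
h1, h2 = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
x1, x2 = 1 + 1j, -1 + 1j                                 # two 4-QAM symbols
X = alamouti_encode(x1, x2)
y1 = h1 * X[0, 0] + h2 * X[1, 0]                         # received in slot 1 (noise omitted)
y2 = h1 * X[0, 1] + h2 * X[1, 1]                         # received in slot 2 (noise omitted)
x_hat = alamouti_decode(y1, y2, h1, h2)
print(x_hat)
```

Because the columns of the equivalent channel are orthogonal, each symbol is recovered by a separate scalar operation, with no joint search over symbol pairs.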
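An advantage of OSTBCs is that they admit simple decoding. This can be
explained using the decoding of the Alamouti code with one receive antenna ($n_r = 1$)
as follows. Let $h_1$ and $h_2$ denote the channel gains from transmit antennas 1 and
2, respectively, to the receive antenna. The received signals in two time slots,
denoted by $y_1$ and $y_2$, are given by

$$y_1 = h_1 x_1 + h_2 x_2 + n_1,$$
$$y_2 = -h_1 x_2^* + h_2 x_1^* + n_2.$$

Conjugating $y_2$ and stacking the two observations gives $\mathbf{y} = [y_1\ y_2^*]^T = \mathbf{H}\mathbf{x} + \mathbf{n}$,
with $\mathbf{x} = [x_1\ x_2]^T$, $\mathbf{n} = [n_1\ n_2^*]^T$, and

$$\mathbf{H} = \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix}, \quad \text{where } \mathbf{h} = \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}.$$

The columns of $\mathbf{H}$ are orthogonal, so $\mathbf{H}^H\mathbf{H} = \|\mathbf{h}\|^2 \mathbf{I}$.
Assuming that the receiver has knowledge of the channel gains, the following
receiver operation can be performed:

$$\tilde{\mathbf{y}} = \mathbf{H}^H \mathbf{y} = \|\mathbf{h}\|^2 \mathbf{x} + \tilde{\mathbf{n}}.$$

The noise vector $\tilde{\mathbf{n}} = \mathbf{H}^H\mathbf{n}$ is still white. Therefore, $x_1$ and $x_2$ can be decoded separately
rather than jointly. Since $\|\mathbf{h}\|^2$ is the amplitude of the symbols, the diversity
order is 2, which is the achieved transmit diversity (since $n_r = 1$). If $n_r$ receive
antennas are used, then the above receiver operation results in

$$\mathbf{r} = \sum_{i=1}^{n_r} \mathbf{H}_i^H \mathbf{r}_i = \sum_{i=1}^{n_r} \|\mathbf{h}_i\|^2 \mathbf{x} + \tilde{\mathbf{n}},$$

where $\mathbf{r}_i$, $\mathbf{H}_i$, and $\mathbf{h}_i$ are the received vector, the equivalent channel matrix, and the
channel gain vector for the ith receive antenna.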
where $\omega_n = e^{j2\pi/n}$, $j = \sqrt{-1}$, and $x_{u,v}$, $0 \le u, v \le n-1$, are the data symbols
from a QAM alphabet. When $\delta = e^{\sqrt{5}\,j}$ and $t = e^{j}$, the resulting STBC achieves
full transmit diversity as well as information losslessness. When $\delta = t = 1$, the
code ceases to be of full diversity, but continues to be information lossless.
The main attractive features of the above NO-STBCs from CDAs for large
MIMO systems are that (i) they achieve both the maximum rate of $n_t$ spcu and
the full transmit diversity of $n_t$, and (ii) they are valid for any number of
transmit antennas. However, because of the large dimensions involved, decoding
NO-STBCs from CDAs is challenging. Note that a 16 × 16 NO-STBC from CDAs
has 512 real dimensions. Fortunately, detection approaches and algorithms that
scale well in complexity and achieve near-optimum performance in such large
dimensions are available. One such approach based on local search techniques
is treated in Chapter 5. Other detection approaches based on PDA, BP, and
Monte-Carlo sampling (treated in Chapters 6–8) can also be used to decode
NO-STBCs from CDAs in large dimensions.

3.3 Spatial modulation (SM)
A key issue in the practical realization of large MIMO systems is the need to
have a large number of RF chains. This increases the hardware complexity, size,
and cost. SM, which was proposed for multiantenna systems, can alleviate this
issue by using fewer transmit RF chains than transmit antennas. Space shift
keying (SSK) is a special case of SM, and generalized spatial modulation (GSM)
is a generalized version of SM. SM, SSK, and GSM schemes for multiantenna
communication are described in the following subsections.
3.3.1 SM
SM employs a multiple antenna array at the transmitter but only a single trans-
mit RF chain [3],[12]. This reduces the RF hardware complexity and cost. Though
SM employs a multiple antenna array, it uses only one antenna from the array at
a time for transmission. The choice of which antenna to activate is made based
on a group of m data bits, where $m = \log_2 n_t$, and $n_t = 2^m$ is the number of
transmit antennas in the array. On the chosen antenna, a symbol from an M-ary
modulation alphabet $\mathbb{A}_M$ (e.g., M-QAM) is sent. The remaining $n_t - 1$ antennas
remain silent. Therefore, the achieved rate in SM (number of bits conveyed in
one time unit) is $\log_2 n_t + \log_2 M$ bpcu. The SM signal set is

$$\mathbb{S}_{n_t,M} = \{\mathbf{x}_{j,l} : j = 1, \ldots, n_t,\ l = 1, \ldots, M\},$$
$$\text{s.t. } \mathbf{x}_{j,l} = [0, \ldots, 0, \underbrace{x_l}_{j\text{th coordinate}}, 0, \ldots, 0]^T, \quad x_l \in \mathbb{A}_M. \quad (3.7)$$
SM signal detection
The job of signal detection at the receiver in SM involves determining the index
of the transmitting antenna and the M-ary modulation symbol sent on it. Let
$\mathbf{x} \in \mathbb{S}_{n_t,M}$ denote the $n_t \times 1$ transmitted signal vector. Then $\mathbf{x}$ will have an
M-ary modulation symbol in one of the coordinates and zeros in all the other
coordinates. Let $\mathbf{H}$ denote the $n_r \times n_t$ channel gain matrix, whose entries are
assumed to be iid complex Gaussian with zero mean and unit variance. Let $\mathbf{n}$
denote the $n_r \times 1$ noise vector at the receiver, whose entries are iid complex
Gaussian with zero mean and variance $\sigma^2$. Let $\mathbf{y}$ denote the $n_r \times 1$ received
signal vector. Assuming equally likely inputs, the ML decision rule is

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x} \in \mathbb{S}_{n_t,M}} P(\mathbf{y}\,|\,\mathbf{H}, \mathbf{x}).$$
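For small $n_t$ and M the ML rule can be evaluated by exhaustive search. The sketch below is one way to do this, assuming Gaussian noise so that maximizing the likelihood is the same as minimizing $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$; the function names, the 4-QAM alphabet, the seed, and the noise level are illustrative assumptions.

```python
import itertools
import numpy as np

def sm_signal_set(n_t, alphabet):
    """All SM vectors: one alphabet symbol on one antenna, zeros elsewhere."""
    vectors = []
    for j, s in itertools.product(range(n_t), alphabet):
        x = np.zeros(n_t, dtype=complex)
        x[j] = s
        vectors.append(x)
    return vectors

def sm_ml_detect(y, H, alphabet):
    """Exhaustive ML search over the SM signal set (minimum Euclidean distance)."""
    candidates = sm_signal_set(H.shape[1], alphabet)
    return min(candidates, key=lambda x: np.linalg.norm(y - H @ x) ** 2)

rng = np.random.default_rng(3)                           # arbitrary fixed seed
n_t, n_r = 4, 4
qam4 = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
x_true = np.zeros(n_t, dtype=complex)
x_true[2] = qam4[1]                                      # third antenna active, symbol -1 + j
noise = 0.01 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
y = H @ x_true + noise
x_hat = sm_ml_detect(y, H, qam4)
print(np.flatnonzero(x_hat), x_hat[np.flatnonzero(x_hat)])
```

The search visits $n_t M$ candidates, which here is 16, matching the rate of $\log_2 4 + \log_2 4 = 4$ bpcu.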
3.3.2 SSK
SSK is a special case of SM [16]. Like SM, SSK also uses a one-to-one mapping
between a group of m information bits and the spatial position (index) of the active
transmitting antenna, which is chosen among the available $n_t = 2^m$ transmit
antennas. But instead of sending an M-ary modulation symbol (e.g., an M-QAM
symbol) as is done in SM, in SSK a signal known to the receiver, say +1, is sent
on the chosen antenna. The remaining $n_t - 1$ transmit antennas remain silent.
By doing so, the problem of SSK signal detection at the receiver becomes one of
merely finding out which antenna is transmitting, whereas in SM, demodulation
of the M-ary modulation (e.g., QAM) symbol is needed in addition to finding
out which antenna is transmitting. So, SSK has a lower detection complexity
than SM. Also, the achieved rate in SSK is
m bpcu.
Table 3.2. Data bits to SSK signal mapping for m = 2, nt = 4. Achieved rate: m bpcu
$$\hat{j} = \arg\max_{j,\ 1 \le j \le n_t} p(\mathbf{y}\,|\,\mathbf{x}_j, \mathbf{H}) = \arg\min_{j} \|\mathbf{y} - \mathbf{h}_j\|^2. \quad (3.9)$$

The estimated antenna index $\hat{j}$ is demapped to the information bits which represent
that index. Exact BER expressions for SSK with m = 1 ($n_t = 2$) and
BER upper bounds based on union bounding for m > 1 (nt > 2) have been de-
rived in the literature [17],[18]. The performance of SSK and SM in single-carrier
communication on frequency-selective channels is studied in [19].
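Rule (3.9) amounts to finding the column of $\mathbf{H}$ closest to the received vector. A minimal sketch follows, assuming that $\mathbf{h}_j$ denotes the jth column of $\mathbf{H}$, that +1 is the known transmitted signal, and that antenna indices are labelled in natural binary (the mapping table itself is not reproduced here, so this labelling is an assumption); function names are illustrative.

```python
import numpy as np

def ssk_detect(y, H):
    """Return the 0-based index j minimizing ||y - h_j||^2 over the columns of H."""
    return int(np.argmin(np.linalg.norm(y[:, None] - H, axis=0) ** 2))

def index_to_bits(j, m):
    """Demap an antenna index to m bits (natural binary labelling, assumed)."""
    return format(j, f"0{m}b")

rng = np.random.default_rng(4)                           # arbitrary fixed seed
m = 2
n_t, n_r = 2 ** m, 4
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
j_true = 2                                               # active antenna (0-based)
noise = 0.01 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
y = H[:, j_true] * 1.0 + noise                           # +1 sent on the chosen antenna
j_hat = ssk_detect(y, H)
bits = index_to_bits(j_hat, m)
print(j_hat, bits)
```

Only $n_t$ distances are computed, compared with $n_t M$ for SM, which is the complexity saving noted above.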
3.3.3 GSM
Two limitations in SM and SSK are: (i) the number of transmit antennas is
restricted to powers of 2, and (ii) the number of transmit RF chains is restricted
to 1 because of which only one antenna can be active at a time. Both these
restrictions are relaxed in GSM. In GSM, $n_{rf}$ transmit RF chains, $1 \le n_{rf} \le n_t$,
are used, and the number of transmit antennas $n_t$ is not restricted to powers
of 2 [20],[21]. By using more than one transmit RF chain, GSM allows multiple
transmit antennas to be active simultaneously. This enables GSM to achieve
higher spectral efficiencies compared to SM and SSK.
In GSM, the transmitter has $n_t$ transmit antennas and $n_{rf}$ transmit RF chains,
$1 \le n_{rf} \le n_t$. An $n_{rf} \times n_t$ switch connects the RF chains to the transmit
antennas. $n_{rf}$ out of $n_t$ transmit antennas are chosen, and M-ary information
symbols are sent on these chosen antennas. The remaining $n_t - n_{rf}$ antennas
remain silent (i.e., they can be viewed as transmitting the value zero). Therefore,
if $\mathbb{A}_M$ denotes the M-ary alphabet used on the active antennas, the effective GSM
alphabet becomes $\mathbb{A}_0 = \mathbb{A}_M \cup \{0\}$.
Let us define an antenna activation pattern to be an $n_t$-length vector that
indicates which antennas are active (denoted by a 1 in the corresponding
antenna index) and which antennas are silent (denoted by a 0). There are
$L = \binom{n_t}{n_{rf}}$ possible antenna activation patterns, and $K = \lfloor \log_2 L \rfloor$ bits
are used to choose an activation pattern for a given channel use. Note that
not all L activation patterns are needed, and any $2^K$ patterns out of them are
adequate. Let us take any $2^K$ patterns out of L patterns and form a set called
the antenna activation pattern set, $\mathbb{S}$. We illustrate this using the following
example. Let $n_t = 4$ and $n_{rf} = 2$. Then, $L = \binom{4}{2} = 6$, $K = \lfloor \log_2 6 \rfloor = 2$, and
$2^K = 4$. The L = 6 antenna activation patterns are given by

$$\{[1, 1, 0, 0]^T, [1, 0, 1, 0]^T, [0, 1, 0, 1]^T, [0, 0, 1, 1]^T, [0, 1, 1, 0]^T, [1, 0, 0, 1]^T\}.$$

Out of these six patterns, any $2^K = 4$ patterns can be taken to form the set $\mathbb{S}$.
Accordingly, let us take the antenna activation pattern set as

$$\mathbb{S} = \{[1, 1, 0, 0]^T, [1, 0, 1, 0]^T, [0, 1, 0, 1]^T, [0, 0, 1, 1]^T\}.$$

Table 3.3 shows the mapping of data bits to GSM signals for $n_t = 4$, $n_{rf} = 2$ for
the above activation pattern set. Suppose 4-QAM is used to send information
on the active antennas. Let $\mathbf{x} \in \mathbb{A}_0^{n_t}$ denote the $n_t$-length transmit vector. Let
010011 denote the information bit sequence. GSM translates these bits to the
transmit vector $\mathbf{x}$ as follows: (i) the first two bits are used to choose the activation
pattern, (ii) the second two bits form a 4-QAM symbol, and (iii) the third two
bits form another 4-QAM symbol. Using Gray mapping, the transmit vector $\mathbf{x}$
becomes

$$\mathbf{x} = [1 + j, 0, -1 - j, 0]^T.$$

Note that both SM and spatial multiplexing (i.e., V-BLAST) turn out to be
special cases of GSM with $n_{rf} = 1$ and $n_{rf} = n_t$, respectively.
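The bit-to-vector translation in this example can be sketched in a few lines. The assignment of bit pairs to activation patterns (in the order the patterns are listed) and the particular Gray labelling of 4-QAM are assumptions made for illustration, since the mapping table is not reproduced here; the function name is hypothetical.

```python
import numpy as np

# Activation pattern set for n_t = 4, n_rf = 2; the bit labels are assumed.
PATTERNS = {"00": (1, 1, 0, 0), "01": (1, 0, 1, 0),
            "10": (0, 1, 0, 1), "11": (0, 0, 1, 1)}
# One Gray labelling of 4-QAM (neighbouring points differ in one bit); assumed.
QAM4 = {"00": 1 + 1j, "01": -1 + 1j, "11": -1 - 1j, "10": 1 - 1j}

def gsm_map(bits):
    """Map 2 pattern bits followed by 2 bits per active antenna to a GSM vector."""
    pattern = PATTERNS[bits[:2]]
    symbols = [QAM4[bits[i:i + 2]] for i in range(2, len(bits), 2)]
    x = np.zeros(len(pattern), dtype=complex)
    x[np.flatnonzero(pattern)] = symbols     # place symbols on the active antennas
    return x

x = gsm_map("010011")
print(x)
```

Under these assumed labellings, the bit sequence 010011 activates the first and third antennas and places the symbols for 00 and 11 on them.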
where $\mathbb{A}_0 = \mathbb{A}_M \cup \{0\}$ denotes the effective GSM alphabet, $\|\mathbf{x}\|_0$ denotes the zero
norm of vector $\mathbf{x}$ (i.e., number of non-zero entries in $\mathbf{x}$), and $\mathbf{t}^{\mathbf{x}}$ denotes the
antenna activation pattern vector corresponding to $\mathbf{x}$, where $t^{\mathbf{x}}_j = 1$ iff $x_j \neq 0$,
$j = 1, 2, \ldots, n_t$. Note that

$$|\mathbb{U}| = 2^R,$$

where

$$R = \left\lfloor \log_2 \binom{n_t}{n_{rf}} \right\rfloor + n_{rf} \log_2 M.$$

The activation pattern set $\mathbb{S}$ and the mapping between elements of $\mathbb{S}$ and antenna
selection bits are known at both the transmitter and the receiver. The ML
decision rule for GSM signal detection is then given by

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{U}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2. \quad (3.10)$$

For small $n_t$ and $n_{rf}$, the set $\mathbb{U}$ may be fully enumerated and ML detection as
per (3.10) can be done. But for large $n_t$ and $n_{rf}$, brute force computation of
$\hat{\mathbf{x}}$ in (3.10) becomes computationally infeasible. A low complexity near-ML
detector for GSM which separates the antenna set detection from information bits
detection is presented in [22]. A Gibbs sampling based algorithm for detection
of GSM signals with large $n_t$, $n_{rf}$ is presented in [23]. Analytical upper bounds
on the BER performance of GSM are derived in [20],[22].
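Scheme   Transmit antennas                          Transmit RF chains         Rate (bpcu)
SSK      $n_t = 2^m$, $m \in \{1, 2, \ldots\}$      1                          $m$
SM       $n_t = 2^m$, $m \in \{1, 2, \ldots\}$      1                          $m + \log_2 M$
GSM      $n_t \in \{1, 2, \ldots\}$                 $1 \le n_{rf} \le n_t$     $\lfloor \log_2 \binom{n_t}{n_{rf}} \rfloor + n_{rf} \log_2 M$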
Figure 3.1 Achievable rate R (bpcu) in GSM as a function of the number of transmit RF
chains, $n_{rf}$, for different values of $n_t$ and 4-QAM.
Table 3.5. Percentage saving in transmit RF chains and percentage increase in rate in
GSM compared to spatial multiplexing (V-BLAST) for nt = 16, 32 and BPSK,
4-QAM, 8-QAM, 16-QAM
zero and the second term is $n_t \log_2 M$. When $n_{rf} < n_t$, the second term is
$(n_t - n_{rf}) \log_2 M$ less than the V-BLAST rate, but there is a positive first term.
Therefore, R can exceed the V-BLAST rate of $n_t \log_2 M$ whenever the first term
exceeds $(n_t - n_{rf}) \log_2 M$. This explains the rate gains and RF chain savings
in GSM. These gains will diminish for large values of M, as the second term
will increasingly dominate the first term for increasing values of M. This can be
seen in Table 3.5, where the gains are large for BPSK and 4-QAM, but small for
8-QAM and 16-QAM.
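The rate comparison can be reproduced directly from the GSM rate expression $R = \lfloor \log_2 \binom{n_t}{n_{rf}} \rfloor + n_{rf} \log_2 M$. The sketch below assumes M is a power of 2; the function name and the choice of $n_t = 16$ with 4-QAM are illustrative.

```python
from math import comb, log2

def gsm_rate(n_t, n_rf, M):
    """GSM rate in bpcu: floor(log2 C(n_t, n_rf)) + n_rf * log2(M), M a power of 2."""
    pattern_bits = comb(n_t, n_rf).bit_length() - 1      # exact floor(log2(.)) for integers
    return pattern_bits + n_rf * int(log2(M))

n_t, M = 16, 4
vblast_rate = gsm_rate(n_t, n_t, M)                      # n_rf = n_t: first term is zero
best_n_rf = max(range(1, n_t + 1), key=lambda n_rf: gsm_rate(n_t, n_rf, M))
best_rate = gsm_rate(n_t, best_n_rf, M)
print(vblast_rate, best_n_rf, best_rate)                 # 32 13 35
```

For $n_t = 16$ and 4-QAM, 13 RF chains give 35 bpcu against 32 bpcu for V-BLAST with 16 RF chains, which is the kind of simultaneous rate gain and RF chain saving described above.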
References
[14] M. D. Renzo and H. Haas, "Bit error probability of SM-MIMO over generalized
fading channels," IEEE Trans. Veh. Tech., vol. 61, no. 3, pp. 1124–1144, Mar.
2012.
[15] A. Younis, R. Mesleh, H. Haas, and P. M. Grant, "Reduced complexity sphere
decoder for spatial modulation detection receivers," in IEEE GLOBECOM 2010,
Miami, FL, Dec. 2010, pp. 1–5.
[16] J. Jeganathan, A. Ghrayeb, L. Szczecinski, and A. Ceron, "Space shift keying
modulation for MIMO channels," IEEE Trans. Wireless Commun., vol. 8, no. 7,
pp. 3692–3703, Jul. 2009.
[17] M. D. Renzo and H. Haas, "A general framework for performance analysis of space
shift keying (SSK) modulation for MISO correlated Nakagami-m fading channels,"
IEEE Trans. Commun., vol. 58, no. 9, pp. 2590–2603, Sep. 2010.
[18] M. D. Renzo and H. Haas, "Space shift keying (SSK-) MIMO over correlated Rician
fading channels: performance analysis and a new method for transmit-diversity,"
IEEE Trans. Commun., vol. 59, no. 1, pp. 116–129, Jan. 2011.
[19] P. Som and A. Chockalingam, "Spatial modulation and space shift keying in single
carrier communication," in IEEE PIMRC 2012, Sydney, Sep. 2012, pp. 1962–1967.
[20] A. Younis, N. Serafimovski, R. Mesleh, and H. Haas, "Generalised spatial
modulation," in Asilomar Conf. on Signals, Systems and Computers, Nov. 2010, pp.
1498–1502.
[21] J. Fu, C. Hou, W. Xiang, L. Yan, and Y. Hou, "Generalised spatial modulation
with multiple active transmit antennas," in IEEE GLOBECOM 2010 Workshops,
Miami, FL, Dec. 2010, pp. 839–844.
[22] J. Wang, S. Jia, and J. Song, "Generalised spatial modulation system with multiple
active transmit antennas and low complexity detection scheme," IEEE Trans.
Wireless Commun., vol. 11, no. 4, pp. 1605–1615, Apr. 2012.
[23] T. Datta and A. Chockalingam, "On generalized spatial modulation," in IEEE
WCNC 2013, Shanghai, Apr. 2013, pp. 2716–2721.
4 MIMO detection
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \quad (4.1)$$

where $\mathbf{x} \in \mathbb{C}^{d_t}$, $\mathbf{H} \in \mathbb{C}^{d_r \times d_t}$, $\mathbf{y} \in \mathbb{C}^{d_r}$, $\mathbf{n} \in \mathbb{C}^{d_r}$, and $d_t$ and $d_r$ are the number
of transmit and receive dimensions, respectively. In CDMA, $d_t = d_r = K$, where
K is the number of active users, $\mathbf{x}$ is the transmit vector consisting of the
transmitted bits from each active user, $\mathbf{H}$ is the cross-correlation matrix, and $\mathbf{y}$ is the
received vector at the output of the K matched filters (matched to the signature
sequences of the active users). In MIMO with spatial multiplexing, $d_t = n_t$,
dr = nr , x is the vector transmitted from nt transmit antennas, H is the channel
gain matrix, and y is the received vector at the nr receive antennas. Because of
this structural similarity, approaches and algorithms that were investigated for
MUD in CDMA are natural candidates for MIMO detection.
The job of any detection algorithm is to obtain an estimate of the transmit vec-
tor x, given knowledge of the received vector y and the channel matrix H [3]. The
elements of x often come from a predecided modulation alphabet with discrete-
valued symbols. Certain detection algorithms naturally produce soft outputs,
e.g., BP based algorithms, where the output will be the soft values of the
likelihood of the transmitted symbols. These soft output values can be fed to channel
decoders in coded systems, which can offer improved performance compared to
feeding hard inputs to channel decoders. Several detection algorithms, on the
other hand, produce only hard outputs, e.g., search based algorithms which test
a set of discrete-valued candidate vectors and choose one among them as the
output. While these hard decision outputs can be fed to the channel decoder
input as such, in order to improve performance further, suitable methods have to
be devised to generate soft values from the detector's hard outputs for feeding
to the channel decoder.
The early breakthrough in MIMO system implementation was due to the lab-
oratory prototype of the V-BLAST system demonstrated by Bell Labs in the
late 1990s [4],[5]. This prototype system used 8 transmit antennas and 12 receive
antennas, employed spatial multiplexing, and achieved a spectral efficiency
of 25 bps/Hz. The detection algorithm employed in the V-BLAST system was
the zero forcing successive interference cancelation (ZF-SIC) algorithm, which
was then a popularly studied detection algorithm in the CDMA multiuser
detection literature [2],[6],[7]. Subsequently, it has become common to refer to the
ZF-SIC algorithm as the V-BLAST detection algorithm in the MIMO literature.
The basic idea in the V-BLAST detection algorithm is to detect the symbols
in a layered manner. In each layer, one symbol is detected. Detection in each
layer is done using ZF (a well-known linear detection method). Interference due
to the detected symbol in the first layer is estimated and subtracted from the
received signal. From the first layer's interference canceled signal, another symbol
is detected in the second layer. The interference due to the second layer's
detected symbol is estimated and subtracted. These detection and interference
cancelation steps are carried out in each layer, until all the symbols are detected.
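The layered detect-and-cancel procedure can be sketched as follows. The detection order used here (the stream whose ZF filter row has the smallest norm goes first) is an added assumption, since the description above does not specify an ordering; the function name, alphabet, seed, and noise level are also illustrative.

```python
import numpy as np

def zf_sic_detect(y, H, alphabet):
    """Layered ZF detection with successive interference cancelation."""
    alphabet = np.asarray(alphabet)
    n_t = H.shape[1]
    x_hat = np.zeros(n_t, dtype=complex)
    remaining = list(range(n_t))                 # streams not yet detected
    y_res = y.astype(complex)                    # residual received signal (copy)
    while remaining:
        W = np.linalg.pinv(H[:, remaining])      # ZF filter for the remaining columns
        k = int(np.argmin(np.sum(np.abs(W) ** 2, axis=1)))   # assumed ordering rule
        z = W[k] @ y_res                         # ZF soft estimate for this layer
        s = alphabet[np.argmin(np.abs(alphabet - z))]        # hard decision
        idx = remaining.pop(k)
        x_hat[idx] = s
        y_res = y_res - H[:, idx] * s            # subtract the detected symbol's interference
    return x_hat

rng = np.random.default_rng(5)                   # arbitrary fixed seed
n_t, n_r = 8, 12                                 # antenna counts of the prototype described above
qam4 = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
x = rng.choice(qam4, size=n_t)
noise = 0.01 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
y = H @ x + noise
x_hat = zf_sic_detect(y, H, qam4)
print(np.allclose(x_hat, x))
```

Each layer recomputes a pseudo-inverse for a channel matrix with one fewer column, which is the cost that the square-root algorithm of [8] is designed to reduce.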
Since the days of Bell Labs' V-BLAST system demonstration, MIMO detection
research has grown in two main directions. One direction is along the lines
of linear and non-linear detection approaches adopted from the CDMA MUD
literature. Well-known algorithms along these lines are the linear detectors, including
matched filter (MF), ZF, and MMSE detectors, and non-linear detectors
including multistage interference cancelers like ZF-SIC and MMSE-SIC detectors
[1]. These are suboptimum algorithms whose key advantage is their polynomial
complexity. The complexity of the ZF-SIC algorithm can be further reduced by
the square-root algorithm proposed in [8], which efficiently deduces the
pseudo-inverse of the deflated channel matrix in a given layer from the pseudo-inverse
computed in the previous layer, using array algorithm ideas in linear estimation
theory. Also, the performance of linear detection methods can be improved upon
using lattice reduction techniques [9]–[12].
The other main direction of MIMO detection research is along the lines of
sphere decoding [13]. This line of research became quite dominant because the
sphere decoding algorithm is an ML decoding algorithm whose average com-
plexity over a wide range of SNRs is polynomial in the number of dimensions
[14], which is significantly less than the exponential complexity of ML detection.
Sphere decoding is based on a bounded distance search among the lattice points
falling inside a sphere centered at the received point. A drawback, however, is
that its complexity in low and medium SNRs is still exponential in the number of
dimensions, making it unsuitable for more than 32 dimensions [13],[14]. Numer-
ous variants of sphere decoding aimed at complexity reduction while retaining
the ML decodability have appeared in the literature [15],[16]. Some later variants
of sphere decoding have compromised on the ML decodability while reducing or
fixing the complexity for implementation ease [17].
Apart from the progress made in the above two main directions, another general
trend (as happened in CDMA MUD research) has been to look for promising
algorithms and tools from optimization, heuristics, machine learning, and
artificial intelligence. For example, detectors based on semi-definite relaxation (SDR)
[18]–[20], PDA [21],[22], and BP [23],[24] have exhibited good potential to achieve
close-to-optimal performance.
Often, the twin objectives of achieving good performance and low complex-
ity do not seem to be met simultaneously. One seems to come at the expense
of the other. For example, linear methods scale well in complexity but suffer
significant performance loss compared to optimum detection. Sphere decoding,
on the other hand, achieves ML performance but does not scale well in
complexity. So, the traditional view has been that complexity needs to be traded
off to achieve good performance, and that this tradeoff will render large MIMO
systems (with tens to hundreds of antennas) impractical due to the high
complexities that may be involved for signal detection in large dimensions. However,
quite contrary to this view, the "channel hardening" [25] that happens when
the number of dimensions increases can be exploited to achieve near-optimal
performance in large MIMO systems at low complexities that are affordable
in practice.
Channel hardening (discussed in Section 2.2) is the phenomenon in which the
variance of the mutual information (capacity) grows very slowly relative to its
mean or even shrinks as the number of antennas grows [25]. In particular,
consider an $n_r \times n_t$ MIMO channel with large $n_r$ and fixed $n_t$. This, in practice, may
correspond to an uplink scenario with $n_r$ receive antennas at the BS and $n_t$
synchronized uplink users each having one transmit antenna. $n_r$ can be in the order
of hundreds and $n_t$ can be in tens. Since the loading factor $\alpha = n_t/n_r \ll 1$, the
system is over-determined. As $n_r$ increases, the diagonal entries of $\mathbf{H}^H\mathbf{H}$ become
increasingly more prominent than the off-diagonal entries. Specifically, $\mathbf{H}^H\mathbf{H}/n_r$
converges to $\mathbf{I}_{n_t}$ as $n_r \to \infty$ and the $n_t$ eigenvalues of $\mathbf{H}^H\mathbf{H}/n_r$ approach 1 [25].
Because of this, in large systems with $\alpha \ll 1$, simple linear detectors like MF and
4.1 System model
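$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{4.2}$$
where $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the $n_r \times n_t$ channel matrix whose entries are modeled as iid
complex Gaussian with zero mean and unit variance, and $\mathbf{n}$ is a complex AWGN
vector where $E\{\mathbf{n}\mathbf{n}^H\} = \sigma^2 \mathbf{I}_{n_r}$. It is assumed that $\mathbf{H}$ is known perfectly at the
receiver but is unknown at the transmitter. With the above assumptions, the
output likelihood function for the system model in (4.2) is given by
$$p(\mathbf{y}\,|\,\mathbf{H}, \mathbf{x}) = \frac{1}{(\pi\sigma^2)^{n_r}} \exp\left(-\frac{1}{\sigma^2}\,\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2\right). \tag{4.3}$$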
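where
$$\mathbf{H}_r = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \in \mathbb{R}^{2n_r \times 2n_t}, \qquad \mathbf{y}_r = [\Re(\mathbf{y})^T \;\; \Im(\mathbf{y})^T]^T \in \mathbb{R}^{2n_r},$$
$$\mathbf{x}_r = [\Re(\mathbf{x})^T \;\; \Im(\mathbf{x})^T]^T \in \mathbb{R}^{2n_t}, \qquad \mathbf{n}_r = [\Re(\mathbf{n})^T \;\; \Im(\mathbf{n})^T]^T \in \mathbb{R}^{2n_r}.$$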
Note that the elements of vector xr in the real-valued system model (4.4) come
from the underlying PAM alphabet corresponding to the QAM alphabet
employed in (4.2).
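As a quick numerical check of the real-valued decomposition, the following NumPy sketch stacks real and imaginary parts and confirms that the real-valued model reproduces the complex one. The function name `to_real_model` and the dimensions are illustrative assumptions, not taken from the text.

```python
import numpy as np

def to_real_model(H, y):
    """Stack real and imaginary parts as in the real-valued system model."""
    Hr = np.block([[H.real, -H.imag],
                   [H.imag,  H.real]])
    yr = np.concatenate([y.real, y.imag])
    return Hr, yr

rng = np.random.default_rng(0)
nr, nt = 6, 4  # illustrative sizes (assumption)
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], nt) + 1j * rng.choice([-1.0, 1.0], nt)  # 4-QAM symbols
n = 0.1 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))
y = H @ x + n

Hr, yr = to_real_model(H, y)
xr = np.concatenate([x.real, x.imag])      # entries are 2-PAM values (+1 or -1)
n_real = np.concatenate([n.real, n.imag])
print(np.allclose(yr, Hr @ xr + n_real))   # the two models describe the same observation
```

Each complex 4-QAM symbol contributes two real 2-PAM entries to `xr`, which is the PAM/QAM correspondence noted above.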
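$$\hat{\mathbf{x}}_{ML} = \underset{\mathbf{x} \in \mathbb{A}^{n_t}}{\operatorname{argmin}}\; \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2. \tag{4.5}$$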
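To make the cost of (4.5) concrete, here is a minimal brute-force sketch that enumerates every candidate vector. The function name `ml_detect`, the 4-QAM alphabet values, and the small dimensions are illustrative assumptions; the search visits $|\mathbb{A}|^{n_t}$ candidates, which is why it is only feasible for very small systems.

```python
import itertools
import numpy as np

def ml_detect(y, H, alphabet):
    """Exhaustive search for argmin ||y - Hx||^2 over alphabet^nt."""
    best_x, best_cost = None, np.inf
    for cand in itertools.product(alphabet, repeat=H.shape[1]):
        x = np.array(cand)
        cost = np.linalg.norm(y - H @ x) ** 2
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

qam4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # unnormalized 4-QAM (assumption)
rng = np.random.default_rng(0)
nr, nt = 4, 3
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
x_true = rng.choice(qam4, nt)
y = H @ x_true                               # noise-free sanity check
x_ml, cost = ml_detect(y, H, qam4)
print(np.array_equal(x_ml, x_true))
```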
MF detector
The MF detector is a simple linear detector. In detecting the symbol in a given
stream, the MF detector treats the interference from other streams as merely
noise. Defining $\mathbf{h}_i$, $i = 1, 2, \ldots, n_t$, to be the $i$th column of the channel matrix
$\mathbf{H}$, (4.2) can be written in the form
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = \sum_{i=1}^{n_t} \mathbf{h}_i x_i + \mathbf{n} = \mathbf{h}_k x_k + \sum_{i=1,\, i \neq k}^{n_t} \mathbf{h}_i x_i + \mathbf{n}, \tag{4.6}$$
where the first term in (4.6) is the component due to the $k$th stream, and the
second term is due to all streams other than the $k$th stream, i.e., the second term
is the interference term as far as the $k$th stream is concerned. In detecting the
$k$th stream symbol $x_k$, the MF detector simply ignores the second term as noise
and obtains a soft estimate of $x_k$ as
$$\hat{x}_k = \mathbf{h}_k^H \mathbf{y}, \tag{4.7}$$
ZF detector
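The ZF detector is a linear detector in which the linear transformation on the
received vector is carried out using the pseudo-inverse of the $\mathbf{H}$ matrix. Let $\mathbf{Q}$
denote the $n_t \times n_r$ matrix which is the pseudo-inverse of $\mathbf{H}$, i.e.,
$$\hat{x}_k = \mathbf{q}_k \mathbf{y} = \mathbf{q}_k \mathbf{H}\mathbf{x} + \mathbf{q}_k \mathbf{n} = x_k + \mathbf{q}_k \mathbf{n}, \tag{4.10}$$
$$\mathrm{SNR}_k = \frac{|x_k|^2}{\|\mathbf{q}_k\|^2 \sigma^2}. \tag{4.11}$$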
Note that the ZF operation, in addition to nulling the interference, has enhanced
the noise variance by a factor of $\|\mathbf{q}_k\|^2$. Because of this, at low SNRs (large $\sigma^2$), the
noise enhancement effect dominates and the ZF detector may end up performing
worse than the MF detector. At high SNRs, however, the interference nulling
effect dominates and the performance of the ZF detector is better than that of
the MF detector. The ZF solution, in vector form, can be written as
MMSE detector
The MMSE detector is a linear detector whose transformation matrix is that
matrix which minimizes the mean square error between the transmit vector and
the estimated vector (i.e., the transformed received vector). That is, the
transformation matrix $\mathbf{G}_{MMSE}$ is given by the solution to the following minimization
problem:
$$\min_{\mathbf{G}}\; E\left\{\|\mathbf{x} - \mathbf{G}\mathbf{y}\|^2\right\}. \tag{4.13}$$
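The three linear detectors described above can be compared in a few lines of NumPy. This is a minimal sketch under stated assumptions: the MMSE matrix is written in the standard closed form $(\mathbf{H}^H\mathbf{H} + (\sigma^2/E_s)\mathbf{I})^{-1}\mathbf{H}^H$, the well-known solution of (4.13) for uncorrelated symbols of average energy $E_s$, which this excerpt does not show; the function names and the 4-QAM slicer are illustrative.

```python
import numpy as np

def mf_soft(H, y):
    """Matched filter: the k-th output is h_k^H y, as in (4.7)."""
    return H.conj().T @ y

def zf_soft(H, y):
    """Zero forcing: apply the pseudo-inverse Q of H."""
    return np.linalg.pinv(H) @ y

def mmse_soft(H, y, sigma2, Es=1.0):
    """MMSE: assumed standard closed form (H^H H + sigma2/Es I)^{-1} H^H y."""
    nt = H.shape[1]
    A = H.conj().T @ H + (sigma2 / Es) * np.eye(nt)
    return np.linalg.solve(A, H.conj().T @ y)

def slice_4qam(s):
    """Map soft values to the nearest point of {+-1 +-1j}."""
    return np.sign(s.real) + 1j * np.sign(s.imag)

rng = np.random.default_rng(1)
nr, nt, sigma2 = 12, 8, 0.5
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], nt) + 1j * rng.choice([-1.0, 1.0], nt)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))
y = H @ x + n

for name, soft in [("MF", mf_soft(H, y)), ("ZF", zf_soft(H, y)),
                   ("MMSE", mmse_soft(H, y, sigma2, Es=2.0))]:
    print(name, "symbol errors:", int(np.sum(slice_4qam(soft) != x)))
```

With `sigma2 = 0` the MMSE expression reduces to the ZF solution, which is a convenient consistency check.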
4.4 Interference cancelation
V-BLAST detector
The detector used in the Bell Labs V-BLAST system was the ZF-SIC detector,
which uses ZF for symbol detection in each stage. The ZF-SIC algorithm for
V-BLAST MIMO is summarized below.
Set $\mathbf{y}^{(1)} = \mathbf{y}$, $\mathbf{H}^{(1)} = \mathbf{H}$, where the superscript denotes the stage index. $\mathbf{Q}^{(m)}$
is the pseudo-inverse of the channel matrix $\mathbf{H}^{(m)}$, and $\mathbf{q}_l^{(m)}$ is the $l$th row of
$\mathbf{Q}^{(m)}$. Set stage index $m = 1$. Let $k$ denote the index of the user detected in a
given stage.
1. Symbol detection (ZF): Detect the symbol of the $k$th data stream using the
ZF detector, i.e.,
$$\hat{x}_k = f\big(\mathbf{q}_k^{(m)} \mathbf{y}^{(m)}\big), \tag{4.16}$$
where $f(\cdot)$ is the slicing function. End the algorithm if $m = n_t$.
2. Interference estimation: Using $\hat{x}_k$, obtain an estimate of the interference vector
due to the $k$th stream, $\hat{\mathbf{a}}_k$, as
$$\hat{\mathbf{a}}_k = \mathbf{h}_k \hat{x}_k. \tag{4.17}$$
3. Interference cancelation: Subtract (4.17) from $\mathbf{y}^{(m)}$ to get the canceled output
$\mathbf{y}^{(m+1)}$ as
$$\mathbf{y}^{(m+1)} = \mathbf{y}^{(m)} - \hat{\mathbf{a}}_k = \mathbf{y}^{(m)} - \mathbf{h}_k \hat{x}_k. \tag{4.18}$$
4. Obtain $\mathbf{H}^{(m+1)}$ by setting the $k$th column of $\mathbf{H}^{(m)}$ to zero, i.e., $\mathbf{H}^{(m+1)} =$
[$\mathbf{H}^{(m)}$ with $k$th column set to zero]. Set $m \leftarrow m + 1$. Go to Step 1.
The algorithm needs to do a matrix inversion in each stage and the resulting
complexity is $O(n_t^4)$, an order more than the complexity of the ZF detector.
The ZF-SIC performance, however, is better than the ZF performance. This is
illustrated in Fig. 4.1, which shows the BER performance of MF, ZF, and ZF-
SIC detectors in a V-BLAST MIMO system with nt = 8, nr = 12, and 4-QAM.
Though ZF-SIC performs better than linear detectors, its performance is far
from optimum. Also, it does not scale well for large MIMO systems.
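The four ZF-SIC steps can be sketched as follows. The excerpt does not say how the stream index $k$ is chosen in each stage; this sketch assumes the common V-BLAST ordering that picks the remaining stream whose row of $\mathbf{Q}^{(m)}$ has the smallest norm (least noise enhancement). The function names are illustrative.

```python
import numpy as np

def slice_4qam(s):
    """Slicing function f(.) for the alphabet {+-1 +-1j}."""
    return np.sign(s.real) + 1j * np.sign(s.imag)

def zf_sic(H, y, slicer):
    """ZF-SIC following Steps 1-4; stream ordering by row norm is an assumption."""
    nt = H.shape[1]
    Hm = H.astype(complex)           # H^(m), copied so H is not modified
    ym = y.astype(complex)           # y^(m)
    x_hat = np.zeros(nt, dtype=complex)
    remaining = list(range(nt))
    while remaining:
        Q = np.linalg.pinv(Hm)                       # one pseudo-inverse per stage
        k = min(remaining, key=lambda l: np.linalg.norm(Q[l]))
        x_hat[k] = slicer(Q[k] @ ym)                 # Step 1, (4.16)
        ym = ym - H[:, k] * x_hat[k]                 # Steps 2-3, (4.17)-(4.18)
        Hm[:, k] = 0                                 # Step 4
        remaining.remove(k)
    return x_hat

rng = np.random.default_rng(2)
nr, nt = 12, 8
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], nt) + 1j * rng.choice([-1.0, 1.0], nt)
y = H @ x + 0.3 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))
print("symbol errors:", int(np.sum(zf_sic(H, y, slice_4qam) != x)))
```

The pseudo-inverse recomputed in every stage is what drives the extra order of complexity mentioned above.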
Figure 4.1 BER performance comparison between MF, ZF, and ZF-SIC detectors.
[Plot: BER versus average received SNR (dB) for $n_t = 8$, $n_r = 12$, 4-QAM; curves for ZF-SIC, ZF, and MF.]
$$\mathcal{L}(\mathbf{H}) = \sum_{i=1}^{n_t} \mathbf{h}_i\, \mathbb{C}_{\mathbb{Z}}. \tag{4.19}$$
50 MIMO detection
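Since the inverse of a unimodular matrix always exists, the relation $\mathbf{H} = \tilde{\mathbf{H}}\mathbf{T}^{-1}$
holds. The aim of the lattice reduction is to transform a given basis $\mathbf{H}$ into a new
basis $\tilde{\mathbf{H}}$ with vectors of shortest length, or, equivalently, into a basis consisting
of roughly orthogonal basis vectors. Usually, $\tilde{\mathbf{H}}$ will be much better conditioned
than $\mathbf{H}$.
Linear detection is optimal for an orthogonal channel matrix. With $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$
and defining $\mathbf{z} = \mathbf{T}^{-1}\mathbf{x}$, the received signal vector can be written as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = \mathbf{H}\mathbf{T}\mathbf{T}^{-1}\mathbf{x} + \mathbf{n} = \tilde{\mathbf{H}}\mathbf{z} + \mathbf{n}. \tag{4.21}$$
Note that $\mathbf{H}\mathbf{x}$ and $\tilde{\mathbf{H}}\mathbf{z}$ denote the same point in the lattice, but the reduced
matrix $\tilde{\mathbf{H}}$ is much better conditioned than $\mathbf{H}$. For $\mathbf{x} \in \mathbb{C}_{\mathbb{Z}}^{n_t}$, we also have $\mathbf{z} \in \mathbb{C}_{\mathbb{Z}}^{n_t}$.
So $\mathbf{x}$ and $\mathbf{z}$ come from the same set. However, for the QAM alphabet, i.e., $\mathbf{x} \in \mathbb{A}^{n_t}$,
the lattice is finite and the domain of $\mathbf{z}$ differs from $\mathbb{A}^{n_t}$.
The idea behind LR-aided linear detection is to consider the equivalent system
model in (4.21) and perform the slicing on $\mathbf{z}$ instead of $\mathbf{x}$. For LR-aided ZF, this
means that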
4.5.2 SA
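SA is an iterative method of lattice reduction [10]. The metric that SA uses to
quantify the orthogonality of the channel matrix is
$$q(\tilde{\mathbf{H}}) = \sum_{n=1}^{n_t} \|\tilde{\mathbf{h}}_n\|^2\, \|\tilde{\mathbf{h}}_n^{\#}\|^2, \tag{4.24}$$
where $\tilde{\mathbf{h}}_n^{\#}$ is the $n$th basis vector of the dual lattice $\mathcal{L}^{\#}$, i.e., $\tilde{\mathbf{H}}^{\#H}\tilde{\mathbf{H}} = \mathbf{I}$, where
$\tilde{\mathbf{H}}^{\#} = [\tilde{\mathbf{h}}_1^{\#} \cdots \tilde{\mathbf{h}}_{n_t}^{\#}]$ denotes the dual basis. $q(\tilde{\mathbf{H}})$ assumes its minimum, i.e., $q(\tilde{\mathbf{H}}) =
n_t$, if and only if the basis $\tilde{\mathbf{H}}$ is orthogonal. SA finds a (local) minimum of
$q(\tilde{\mathbf{H}}) = q(\mathbf{H}\mathbf{T})$ in an iterative manner. In view of (4.24), it can be said that the
basis and its dual are reduced simultaneously.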
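The orthogonality measure in (4.24) is simple to evaluate numerically. The sketch below computes the dual basis as the conjugate transpose of the pseudo-inverse, so that the dual basis satisfies the identity stated above; the function name `seysen_metric` and the example matrices are illustrative assumptions.

```python
import numpy as np

def seysen_metric(H):
    """q(H) = sum_n ||h_n||^2 ||h_n^#||^2 with dual basis H^# = pinv(H)^H."""
    H_dual = np.linalg.pinv(H).conj().T
    col_norms = np.sum(np.abs(H) ** 2, axis=0)
    dual_norms = np.sum(np.abs(H_dual) ** 2, axis=0)
    return float(np.sum(col_norms * dual_norms))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(A)                       # orthonormal columns
skewed = np.array([[1.0, 1.0], [0.0, 0.1]])  # nearly parallel columns

print(seysen_metric(U))       # close to n_t = 4 for an orthogonal basis
print(seysen_metric(A))       # at least n_t for any basis
print(seysen_metric(skewed))  # far above n_t = 2
```

The lower bound $q \ge n_t$ follows from the Cauchy-Schwarz inequality applied to each pair of a basis vector and its dual vector.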
Basic principle of SA
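For any matrix $\mathbf{D}$, let $\mathbf{d}_i$ denote the $i$th column and $\mathbf{d}_i^u$ denote the updated $i$th
column. Initializing $\mathbf{T} = \mathbf{I}$ and $\tilde{\mathbf{H}} = \mathbf{H}$, SA repeats the following steps until $\tilde{\mathbf{H}}$
is SA-reduced [12].
or, equivalently,
Note that $\tilde{\mathbf{h}}_k^u = \mathbf{H}\mathbf{t}_k^u$. In each iteration, $\tilde{\mathbf{H}}$ is again a valid basis for $\mathcal{L}$. In fact,
any basis for $\mathcal{L}$ can be achieved by a sequence of updates according to (4.25) and
(4.26).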
SA-reduced basis
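Consider a basis vector update (4.26) for a given index pair $(i, j)$, not necessarily
the selected index pair $(k, l)$:
$$\tilde{\mathbf{H}}_{i,j} = [\tilde{\mathbf{h}}_1 \cdots \tilde{\mathbf{h}}_{i-1}\; \tilde{\mathbf{h}}_i^u\; \tilde{\mathbf{h}}_{i+1} \cdots \tilde{\mathbf{h}}_{n_t}] \quad \text{with} \quad \tilde{\mathbf{h}}_i^u = \tilde{\mathbf{h}}_i + \lambda_{i,j}\tilde{\mathbf{h}}_j. \tag{4.27}$$
The best update value $\lambda_{i,j}$ such that $q(\tilde{\mathbf{H}}_{i,j})$ is minimized is obtained as [10]
$$\lambda_{i,j} = \left\lfloor \frac{\tilde{\mathbf{h}}_j^{\#H}\tilde{\mathbf{h}}_i^{\#}}{2\|\tilde{\mathbf{h}}_i^{\#}\|^2} - \frac{\tilde{\mathbf{h}}_j^{H}\tilde{\mathbf{h}}_i}{2\|\tilde{\mathbf{h}}_j\|^2} \right\rceil. \tag{4.28}$$
It can be shown that $q(\tilde{\mathbf{H}}_{i,j}) \ge q(\tilde{\mathbf{H}})$ if and only if $\lambda_{i,j} = 0$. We call the basis $\tilde{\mathbf{H}}$
SA-reduced if no decrease of $q(\tilde{\mathbf{H}})$ can be achieved for any $(i, j)$, i.e., $\lambda_{i,j} = 0$ for
all possible $(i, j)$. Thus, to obtain an SA-reduced basis, one simply has to repeat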
where $\Delta_{i,j} = q(\tilde{\mathbf{H}}) - q(\tilde{\mathbf{H}}_{i,j})$. That is, in each iteration $n_t^2 - n_t$ basis updates
with respect to their achieved reduction of Seysen's orthogonality measure are
calculated and the best basis update is retained. If $\Delta_{k,l} = 0$, a local minimum is
found and SA ends.
Implementation of SA
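The inputs to SA are the channel matrix $\mathbf{H}$, i.e., the original basis of $\mathcal{L}$, the
basis of the dual lattice $\mathcal{L}^{\#}$, i.e., $\mathbf{H}^{\#} = \mathbf{Q}^H$, where $\mathbf{Q} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H$ is the
pseudo-inverse of $\mathbf{H}$, and the corresponding Gram matrices $\mathbf{S} = \mathbf{H}^H\mathbf{H}$ and
$\mathbf{S}^{\#} = \mathbf{H}^{\#H}\mathbf{H}^{\#} = (\mathbf{H}^H\mathbf{H})^{-1}$. Let $s_{i,j}$ and $s_{i,j}^{\#}$ denote the $(i, j)$th elements of the matrices
$\mathbf{S}$ and $\mathbf{S}^{\#}$, respectively.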
Initialization
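Set $\tilde{\mathbf{H}} = \mathbf{H}$ and $\tilde{\mathbf{H}}^{\#} = \mathbf{H}^{\#}$, and calculate all possible update values $\lambda_{i,j}$ with their
corresponding reduction $\Delta_{i,j}$ of Seysen's orthogonality measure. The update
values are calculated as
$$\lambda_{i,j} = \lfloor v_{i,j} \rceil, \quad \text{where} \quad v_{i,j} = \frac{s_{j,i}^{\#}}{2 s_{i,i}^{\#}} - \frac{s_{j,i}}{2 s_{j,j}}. \tag{4.30}$$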
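Using this expression, $\Delta_{i,j}$ can be efficiently calculated as follows. The update
of the $i$th basis vector of $\tilde{\mathbf{H}}$ according to (4.25) corresponds to the update of the
$j$th basis vector of $\tilde{\mathbf{H}}^{\#}$ according to
$$\tilde{\mathbf{H}}_{i,j}^{\#} = [\tilde{\mathbf{h}}_1^{\#} \cdots \tilde{\mathbf{h}}_{j-1}^{\#}\; \tilde{\mathbf{h}}_j^{\#u}\; \tilde{\mathbf{h}}_{j+1}^{\#} \cdots \tilde{\mathbf{h}}_{n_t}^{\#}] \quad \text{with} \quad \tilde{\mathbf{h}}_j^{\#u} = \tilde{\mathbf{h}}_j^{\#} - \lambda_{i,j}^{*}\tilde{\mathbf{h}}_i^{\#}. \tag{4.31}$$
We then have
$$\begin{aligned}
\Delta_{i,j} &= q(\tilde{\mathbf{H}}) - q(\tilde{\mathbf{H}}_{i,j}) \\
&= \|\tilde{\mathbf{h}}_i\|^2\|\tilde{\mathbf{h}}_i^{\#}\|^2 + \|\tilde{\mathbf{h}}_j\|^2\|\tilde{\mathbf{h}}_j^{\#}\|^2 - \|\tilde{\mathbf{h}}_i^{u}\|^2\|\tilde{\mathbf{h}}_i^{\#}\|^2 - \|\tilde{\mathbf{h}}_j\|^2\|\tilde{\mathbf{h}}_j^{\#u}\|^2.
\end{aligned} \tag{4.32}$$
Substituting $\tilde{\mathbf{h}}_i^{u}$, $\tilde{\mathbf{h}}_j^{\#u}$, we obtain
$$\begin{aligned}
\Delta_{i,j} &= -2\Big(|\lambda_{i,j}|^2\|\tilde{\mathbf{h}}_j\|^2\|\tilde{\mathbf{h}}_i^{\#}\|^2 + \|\tilde{\mathbf{h}}_i^{\#}\|^2\,\Re\{\lambda_{i,j}\tilde{\mathbf{h}}_i^{H}\tilde{\mathbf{h}}_j\} - \|\tilde{\mathbf{h}}_j\|^2\,\Re\{\lambda_{i,j}^{*}\tilde{\mathbf{h}}_j^{\#H}\tilde{\mathbf{h}}_i^{\#}\}\Big) \\
&= -2\Big(|\lambda_{i,j}|^2 s_{j,j}s_{i,i}^{\#} + s_{i,i}^{\#}\,\Re\{\lambda_{i,j}s_{i,j}\} - s_{j,j}\,\Re\{\lambda_{i,j}^{*}s_{j,i}^{\#}\}\Big) \\
&= 2 s_{j,j}s_{i,i}^{\#}\left(\Re\left\{\lambda_{i,j}^{*}\left(\frac{s_{j,i}^{\#}}{s_{i,i}^{\#}} - \frac{s_{j,i}}{s_{j,j}}\right)\right\} - |\lambda_{i,j}|^2\right) \\
&= 2 s_{j,j}s_{i,i}^{\#}\left(2\,\Re\{\lambda_{i,j}^{*}v_{i,j}\} - |\lambda_{i,j}|^2\right).
\end{aligned} \tag{4.33}$$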
4.5 LR-aided linear detection 53
Iteration
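Set $\mathbf{T} = \mathbf{I}$ and repeat the following steps until $\tilde{\mathbf{H}}$ is SA-reduced, i.e., $\lambda_{i,j} = 0$ for
all $(i, j)$.
1. Select $(k, l)$ according to (4.29) and update $\mathbf{T}$ (4.25), $\tilde{\mathbf{H}}$ (4.26), and $\tilde{\mathbf{H}}^{\#}$ (4.31)
using $\lambda_{k,l}$ (4.28).
2. Compute corresponding updates of $\mathbf{S}$ and $\mathbf{S}^{\#}$.
3. Calculate new $\lambda_{i,j}$ values (4.30) and $\Delta_{i,j}$ values (4.33) for all index pairs
corresponding to the updated elements of $\mathbf{S}$ and $\mathbf{S}^{\#}$.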
Output
The output of SA is given by the unimodular transformation matrix $\mathbf{T}$, the
SA-reduced basis $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$, and the associated reduced dual basis $\tilde{\mathbf{H}}^{\#} = \tilde{\mathbf{Q}}^{H}$.
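The initialization, iteration, and output steps can be put together in a compact sketch. For brevity this version recomputes the dual basis and both Gram matrices from scratch in every iteration, which is simpler but more expensive than the incremental updates of $\mathbf{S}$ and $\mathbf{S}^{\#}$ described above. The greedy selection (largest $\Delta_{i,j}$), the tolerance, and the function names are assumptions of this sketch.

```python
import numpy as np

def seysen_metric(H):
    Hd = np.linalg.pinv(H).conj().T
    return float(np.sum(np.sum(np.abs(H) ** 2, axis=0) * np.sum(np.abs(Hd) ** 2, axis=0)))

def seysen_reduce(H, max_iter=10_000, tol=1e-9):
    """Greedy SA sketch: returns (H_red, T) with H_red = H @ T and T unimodular."""
    nt = H.shape[1]
    H_red = H.astype(complex)
    T = np.eye(nt, dtype=complex)
    for _ in range(max_iter):
        H_dual = np.linalg.pinv(H_red).conj().T
        S = H_red.conj().T @ H_red            # Gram matrix of the basis
        S_dual = H_dual.conj().T @ H_dual     # Gram matrix of the dual basis
        best = (tol, None, None, None)        # (Delta, k, l, lambda)
        for i in range(nt):
            for j in range(nt):
                if i == j:
                    continue
                v = S_dual[j, i] / (2 * S_dual[i, i].real) - S[j, i] / (2 * S[j, j].real)
                lam = np.round(v.real) + 1j * np.round(v.imag)            # (4.30)
                delta = 2 * S[j, j].real * S_dual[i, i].real * (
                    2 * (np.conj(lam) * v).real - abs(lam) ** 2)          # (4.33)
                if delta > best[0]:
                    best = (delta, i, j, lam)
        _, k, l, lam = best
        if k is None:                         # no pair reduces q: SA-reduced
            break
        H_red[:, k] += lam * H_red[:, l]      # update basis vector k
        T[:, k] += lam * T[:, l]              # same update on T keeps H_red = H T
    return H_red, T

rng = np.random.default_rng(3)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
H_red, T = seysen_reduce(H)
print(seysen_metric(H), "->", seysen_metric(H_red))
```

Because every update adds a Gaussian-integer multiple of one column to another, $\mathbf{T}$ stays unimodular and the reduced basis spans the same lattice.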
are plotted. CCDFs are plotted for the input channel matrix $\mathbf{H}$ (i.e., before
SA) and the output matrix $\tilde{\mathbf{H}}$ (i.e., after SA). These CCDFs for $q(\mathbf{H})$ and
$q(\tilde{\mathbf{H}})$ are obtained through simulation of $10^8$ channel realizations. It is seen that
the performance in terms of orthogonalization measure improves significantly
for $\tilde{\mathbf{H}}$ compared to that for $\mathbf{H}$. This improved orthogonality of $\tilde{\mathbf{H}}$ results in
significantly improved BER performance of LR-MMSE detection compared to
MMSE detection. This is illustrated in Fig. 4.3 for a V-BLAST MIMO system
with $n_t = n_r = 4$ and 4-QAM.
The complexity of SA is $O(n_t^2)$. This can be explained as follows. The
initialization of SA requires the calculation of $n_t^2 - n_t$ different values according to
(4.30) and at most (if the corresponding $\lambda_{i,j}$s are all non-zero) $n_t^2 - n_t$ different
values according to (4.32). Therefore, the initialization step has a complexity
of $O(n_t^2)$. At each iteration, the update of $\mathbf{T}$, $\tilde{\mathbf{H}}$, $\tilde{\mathbf{H}}^{\#}$ has a complexity of $O(n_t)$,
resulting in a per-iteration complexity that is linear in $n_t$. The total complexity
of SA is therefore dominated by the initialization step, which is $O(n_t^2)$. This
complexity is one order less than the complexity of the matrix inversion in the
ZF/MMSE operation (cubic in $n_t$). Therefore, the orders of total complexity of
LR-aided ZF/MMSE detection and ZF/MMSE detection are both cubic in $n_t$.
Figure 4.2 CCDF of $q(\mathbf{H})$ (before SA) and $q(\tilde{\mathbf{H}})$ (after SA).
[Plot: CCDF versus the orthogonality measure; curves for 4x4, 6x6, and 10x10 channels, each before and after SA.]
Figure 4.3 BER performance comparison between MMSE detection and LR-MMSE
detection. $n_t = n_r = 4$, 4-QAM.
[Plot: BER versus average received SNR (dB); curves for MMSE and LR-MMSE.]
4.6 Sphere decoding
Sphere decoding is a detection method that obtains the exact ML solution,
generally faster than a brute-force exhaustive search. The ML optimization problem
in (4.5) formulated using the complex-valued system model (4.2) can be written
Algorithm
Assume that $2n_t \le 2n_r$ so that there are at least as many equations as unknowns.
The lattice point $\mathbf{H}_r\mathbf{x}_r$ lies inside a sphere of radius $d$ if and only if
$$d^2 \ge \|\mathbf{y}_r - \mathbf{H}_r\mathbf{x}_r\|^2. \tag{4.36}$$
$$d^2 \ge \|\mathbf{Q}_1^H\mathbf{y}_r - \mathbf{R}\mathbf{x}_r\|^2. \tag{4.39}$$
Further, defining $\mathbf{z} = \mathbf{Q}_1^H\mathbf{y}_r$, (4.39) can be written as
$$d^2 \ge \sum_{i=1}^{2n_t}\Big(z_i - \sum_{j=1}^{2n_t} r_{i,j}x_j\Big)^2, \tag{4.40}$$
where $z_i$ is the $i$th element in $\mathbf{z}$, $x_j$ is the $j$th element in $\mathbf{x}_r$, and $r_{i,j}$ is the $(i, j)$th
element of $\mathbf{R}$. Because of the upper triangular nature of $\mathbf{R}$, the right-hand side
of the inequality in (4.40) can be expanded as
where the first term depends only on $x_{2n_t}$, the second term depends only on
$(x_{2n_t}, x_{2n_t-1})$, the third term depends only on $(x_{2n_t}, x_{2n_t-1}, x_{2n_t-2})$, and so on.
Therefore, a necessary (not sufficient) condition for $\mathbf{H}_r\mathbf{x}_r$ to lie inside the sphere
is that
where $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$ denote rounding to the nearest larger element and smaller
element, respectively, in the PAM constellation that spans the lattice. For every
$x_{2n_t}$ satisfying (4.43), defining
$$d_{2n_t-1}^2 = d^2 - (z_{2n_t} - r_{2n_t,2n_t}x_{2n_t})^2, \tag{4.44}$$
and
$$z_{2n_t-1|2n_t} = z_{2n_t-1} - r_{2n_t-1,2n_t}x_{2n_t}, \tag{4.45}$$
a stronger condition can be found by looking at the first two terms in (4.41),
which leads to $x_{2n_t-1}$ belonging to the interval
$$\left\lceil \frac{-d_{2n_t-1} + z_{2n_t-1|2n_t}}{r_{2n_t-1,2n_t-1}} \right\rceil \le x_{2n_t-1} \le \left\lfloor \frac{d_{2n_t-1} + z_{2n_t-1|2n_t}}{r_{2n_t-1,2n_t-1}} \right\rfloor. \tag{4.46}$$
go to Step 3;
7. Solution found. Save $\mathbf{x}_r$ and its distance from $\mathbf{y}_r$, $d_{2n_t}^2 - d_1^2 + (z_1 - r_{1,1}x_1)^2$;
go to Step 4.
where $1 - \epsilon$ is set to a value close to 1, e.g., $\epsilon = 0.01$. If the point is not found,
increase the probability $1 - \epsilon$, adjust the radius, and search again. For a point
$\mathbf{x}_r$ found inside the sphere, $\mathbf{H}_r\mathbf{x}_r$ need not be the closest point to $\mathbf{y}_r$. So,
whenever the algorithm finds a point $\mathbf{x}_r$ inside the sphere, set the new radius as
$d^2 = \|\mathbf{y}_r - \mathbf{H}_r\mathbf{x}_r\|^2$ and restart the algorithm. Such radius updating may be
useful in low SNRs, where the number of points inside the initial sphere can be large.
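The tree search implied by (4.40) and the radius update can be sketched as a short depth-first routine on the real-valued model. Several simplifications are assumptions of this sketch rather than the book's exact procedure: the initial squared radius defaults to infinity (so a point is always found), the radius is shrunk in place instead of restarting the search, candidates at each level are visited closest-first, and the metric is $\|\mathbf{z} - \mathbf{R}\mathbf{x}_r\|^2$, which has the same minimizer as $\|\mathbf{y}_r - \mathbf{H}_r\mathbf{x}_r\|^2$.

```python
import itertools
import numpy as np

def sphere_decode(yr, Hr, pam, d2=np.inf):
    """Depth-first search for argmin ||z - R x||^2 over x in pam^n."""
    n = Hr.shape[1]
    Q1, R = np.linalg.qr(Hr)             # reduced QR, R is upper triangular
    z = Q1.T @ yr
    x = np.zeros(n)
    best = {"x": None, "d2": d2}

    def search(level, partial):
        if level < 0:                     # all coordinates fixed: point is inside
            best["x"], best["d2"] = x.copy(), partial   # shrink the radius
            return
        resid = z[level] - R[level, level + 1:] @ x[level + 1:]
        for s in sorted(pam, key=lambda a: abs(resid - R[level, level] * a)):
            cost = partial + (resid - R[level, level] * s) ** 2
            if cost >= best["d2"]:
                break                     # remaining candidates are farther still
            x[level] = s
            search(level - 1, cost)

    search(n - 1, 0.0)
    return best["x"], best["d2"]

rng = np.random.default_rng(0)
pam4 = [-3.0, -1.0, 1.0, 3.0]
Hr = rng.standard_normal((6, 4))
x_true = rng.choice(pam4, 4)
yr = Hr @ x_true + 0.5 * rng.standard_normal(6)
x_sd, _ = sphere_decode(yr, Hr, pam4)
x_bf = min((np.array(c) for c in itertools.product(pam4, repeat=4)),
           key=lambda c: np.linalg.norm(yr - Hr @ c))
print(np.array_equal(x_sd, x_bf))         # agreement with exhaustive search
```

The search starts from the last coordinate because the last row of $\mathbf{R}$ involves only $x_{2n_t}$, mirroring the order in which the terms of (4.40) are examined.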
Figure 4.4 Sphere decoding and MMSE complexity at $10^{-2}$ BER for V-BLAST MIMO
with $n_t = n_r = 4$ and 4-QAM.
[Plot: complexity in number of real operations ($\times 10^5$) versus number of transmit antennas $n_t$; curves for the sphere decoder and MMSE, legend "$n_t = n_r$, 4-QAM".]
Figure 4.5 BER performance of sphere decoding and MMSE detection for V-BLAST
MIMO with $n_t = n_r = 4, 8, 16$ and 4-QAM.
[Plot: BER versus average received SNR (dB); MMSE and SD curves for 4x4, 8x8, and 16x16 systems.]
The choice of radius $d$ being statistical in nature, and $\mathbf{H}$ and $\mathbf{n}$ being random,
the computational complexity of sphere decoder is a random variable, whose
mean and variance can be computed [14]. Sphere decoding is very efficient in
terms of complexity at high SNRs due to the small search radius. However, it is
inefficient at low to moderate SNRs due to the increased search radius [3]. This
is illustrated in Fig. 4.4, which shows the complexity of SD in the number of real
operations at a BER target of $10^{-2}$. It is seen that at this target BER (which is
not as small as it would be at high SNRs), the complexity grows exponentially
in $n_t$. LR techniques can be used as preprocessors to sphere decoding in order
to reduce complexity [37]. Several variants of the SD have also been proposed
to reduce complexity. Still, although the SD and several of its low complexity
variants achieve ML performance (Fig. 4.5 shows the BER performance of sphere
decoder in V-BLAST MIMO with $n_t = n_r = 4, 8, 16$ and 4-QAM), their
complexity in low to moderate SNRs becomes prohibitive beyond 32 real dimensions
[13],[14], making them inadequate for large MIMO systems.
References
[13] E. Viterbo and J. Boutros, "A universal lattice code decoder for fading channels,"
IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1639–1642, Jul. 1999.
[14] B. Hassibi and H. Vikalo, "On the sphere decoding algorithm I. Expected
complexity," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2806–2818, Aug. 2005.
[15] H. Vikalo and B. Hassibi, "On the sphere-decoding algorithm II. Generalizations,
second-order statistics, and applications to communications," IEEE Trans. Signal
Process., vol. 53, no. 8, pp. 2819–2834, Aug. 2005.
[16] Y. Wang and K. Roy, "A new reduced complexity sphere decoder with true
lattice boundary awareness for multi-antenna systems," in IEEE ISCAS 2005, Kobe,
vol. 5, May 2005, pp. 4963–4966.
[17] L. G. Barbero and J. S. Thompson, "Fixing the complexity of the sphere decoder
for MIMO detection," IEEE Trans. Wireless Commun., vol. 7, no. 6, pp. 2131–2142,
Jun. 2008.
[18] P. H. Tan and L. K. Rasmussen, "The application of semidefinite programming for
detection in CDMA," IEEE J. Sel. Areas Commun., vol. 19, no. 8, pp. 1442–1449,
Aug. 2001.
[19] P. H. Tan and L. K. Rasmussen, "Multiuser detection in CDMA: a comparison of
relaxations, exact, and heuristic search methods," IEEE Trans. Wireless Commun.,
vol. 3, no. 5, pp. 1802–1809, Sep. 2004.
[20] N. D. Sidiropoulos and Z.-Q. Luo, "A semidefinite relaxation approach to MIMO
detection for high-order QAM constellations," IEEE Signal Process. Lett., vol. 13,
no. 9, pp. 525–528, Sep. 2006.
[21] J. Luo, K. R. Pattipati, P. K. Willett, and F. Hasegawa, "Near-optimal multiuser
detection in synchronous CDMA using probabilistic data association," IEEE
Commun. Lett., vol. 5, no. 9, pp. 361–363, Sep. 2001.
[22] D. Pham, K. R. Pattipati, P. K. Willett, and J. Luo, "A generalized probabilistic
data association detector for multiantenna systems," IEEE Commun. Lett., vol. 8,
no. 4, pp. 205–207, Apr. 2004.
[23] Y. Kabashima, "A CDMA multiuser detection algorithm on the basis of belief
propagation," J. Phys. A: Math. Gen., vol. 36, pp. 11 111–11 121, Oct. 2003.
[24] M. N. Kaynak, T. M. Duman, and E. M. Kurtas, "Belief propagation over
SISO/MIMO frequency selective fading channels," IEEE Trans. Wireless
Commun., vol. 6, no. 6, pp. 2001–2005, Jun. 2007.
[25] B. M. Hochwald, T. L. Marzetta, and V. Tarokh, "Multiple-antenna channel
hardening and its implications for rate feedback and scheduling," IEEE Trans. Inform.
Theory, vol. 50, no. 9, pp. 1893–1909, Sep. 2004.
[26] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, "Uplink power efficiency of
multiuser MIMO with very large antenna arrays," in Allerton Conf. on Commun.,
Contr., and Comput., Monticello, IL, Sep. 2011, pp. 1272–1279.
[27] J. Hoydis, S. ten Brink, and M. Debbah, "Massive MIMO in the UL/DL of cellular
networks: how many antennas do we need?" IEEE J. Sel. Areas in Commun.,
vol. 31, no. 2, pp. 160–171, Feb. 2013.
[28] K. V. Vardhan, S. K. Mohammed, A. Chockalingam, and B. S. Rajan, "A
low-complexity detector for large MIMO systems and multicarrier CDMA systems,"
IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 473–485, Apr. 2008.
Local search has grown from a simple heuristic idea into an important and mature
field of research in combinatorial optimization [1]. When confronted with
NP-hard problems, one can resort to (i) an enumerative method that is guaranteed
to produce an optimal solution, or (ii) an approximation algorithm that runs in
polynomial time, or (iii) some kind of heuristic technique without any guarantee
on the quality of the solution and running time. The first approach of true
optimization algorithms may become prohibitive due to the problem size or the
lack of insight into the problem structure. The second approach of
polynomial-time approximation algorithms, though characterizable by performance bounds,
may give inferior solutions. The third approach of heuristics is the preferred
choice for NP-hard problems, as it provides a robust means to obtain good
solutions to problems of large size in a reasonable time. Local search techniques
come under the third approach. Optimum signal detection in MIMO systems
involves the minimization of a certain cost over a discrete signal space, where
the exhaustive enumerative approach becomes prohibitive when the number of
signaling dimensions becomes large. Therefore, the local search approach can be
a considered choice for signal detection in MIMO systems with a large number
of antennas.
An important characteristic of a local search algorithm is its neighborhood
function/definition which guides the search to a good solution. Typically, a local
search algorithm starts with an initial solution (often generated by some other
low complexity algorithm or just generated randomly) and then continually
attempts to find better solutions by searching in the neighborhoods defined by the
neighborhood function. A basic version of local search is based on iterative
improvement, where the algorithm starts with some initial solution and searches its
neighborhood for a solution of lesser cost. If such a solution is found, it replaces
the current solution and the search continues. Otherwise, it returns the current
solution which is a locally optimal solution. Variants of this iterative algorithm
with different neighborhood definitions and stopping criteria offer a rich tradeoff
between performance and complexity. Neighborhood definitions depend on the
problem under consideration, and finding suitable and efficient neighborhood
functions/definitions that can lead to high-quality local optima can be viewed
as one of the crucial requirements in local search. Also, the quality of the local
optima reached has been found to depend a lot on the initial solution chosen.
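The iterative-improvement loop described above can be written generically, with the neighborhood supplied as a function. The sketch below uses a one-coordinate sign-flip neighborhood on vectors with entries in {+1, -1} and the ML cost $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$; all names, the neighborhood choice, and the random starting point are illustrative assumptions.

```python
import numpy as np

def local_search(x0, cost, neighbors):
    """Move to the best strictly better neighbor until none exists."""
    x, c = x0.copy(), cost(x0)
    while True:
        cands = [(cost(nb), nb) for nb in neighbors(x)]
        c_best, x_best = min(cands, key=lambda t: t[0])
        if c_best >= c:
            return x, c                   # locally optimal solution
        x, c = x_best, c_best

def one_flip_neighbors(x):
    """All vectors that differ from x in exactly one coordinate (sign flip)."""
    for p in range(len(x)):
        nb = x.copy()
        nb[p] = -nb[p]
        yield nb

rng = np.random.default_rng(4)
n = 8
H = rng.standard_normal((n, n))
x_true = rng.choice([-1.0, 1.0], n)
y = H @ x_true + 0.2 * rng.standard_normal(n)
ml_cost = lambda x: float(np.sum((y - H @ x) ** 2))

x_init = rng.choice([-1.0, 1.0], n)       # random initial solution
x_loc, c_loc = local_search(x_init, ml_cost, one_flip_neighbors)
print(ml_cost(x_init), "->", c_loc)
```

Swapping in a different `neighbors` function or initial vector changes the local optimum reached, which is the dependence on neighborhood definition and starting point noted in the paragraph above.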
Detection based on local search 63
[ 1 , 1, 1 , 1] , [ 1, 1 , 1 , 1 ], [ 1 , 1, 1 , 1 ] .
*
solution) and continue the search. Tabu search [5],[6] adopts this second
strategy. Escape strategies must be devised along with suitable stopping criteria so
that the resulting complexity is not unduly high. In addition to the escape
strategy, tabu search uses other features like avoiding already visited solutions using
memory structures (e.g., a tabu list) and allowing revisits after a certain tabu
period to enhance the efficiency of the search. Another common way to improve
local search performance is multistart search, where the local search procedure
is run several times, each time starting with a different random initial solution
and declaring the best solution among the multiple runs.
Analyzing the performance and complexity of particular local search
algorithms is generally not a simple task. It is often difficult to get non-trivial bounds
on the solution quality (in terms of the amount by which the local optimum cost
differs from the optimal cost) and running time of local search. Nevertheless, in
practice, many local search algorithms are known to converge quickly and find
high-quality solutions. The flexibility and ease of implementation of local search
algorithms have resulted in the successful handling of many complex real-world
problems.
The local search approach has been adopted in communication problems as
well. MUD in CDMA [7] is one prominent area where local search techniques
have been widely adopted [8]–[17]. The problem is to jointly detect the binary
symbols sent by multiple users to the BS. This can be considered as a problem
of optimizing a quadratic objective function (ML cost) with binary constraints
on decision variables, referred to as a binary quadratic program (BQP) in the
area of combinatorial optimization. Local search algorithms including 1-opt and
k-opt local search have been proposed for the BQP problem and were found
to be capable of quickly finding near-optimal solutions for large problem sizes
[2]. 1-opt and k-opt searches along with the multistart strategy are found to be
effective for large problems. Since k-coordinate away neighborhoods are obtained
by flipping one or more bits, algorithms using them are also referred to as
bit-flipping algorithms.
In CDMA MUD, 1-opt and k-opt local search have been widely studied and
have been shown to achieve better performance than other detectors [8]–[15]. MF,
decorrelating (ZF), and decision feedback detectors have been used to generate
the initial solution for the search leading to better solutions than the initial
solutions themselves [10],[12]. Near-single-user performance in CDMA systems
with a large number of users has been reported using LAS, a greedy 1-opt search
[15]. CDMA MUD using tabu search has been reported in [16],[17]. In [18], k-opt
local search has been used for reduced complexity turbo equalization in coded ISI
channels. With the growing popularity of MIMO systems, local search is being
adopted for MIMO detection [19]–[26].
The rest of this chapter is devoted to illustrating how local search methods
can be effectively exploited to achieve near-optimum signal detection in MIMO
systems with a large number of antennas at affordable complexities. In
particular, the focus will be on local search algorithms that have been established
5.1 LAS
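$$\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r. \tag{5.2}$$
Dropping the subscript $r$ for notational simplicity, the real-valued system model
is written as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{5.3}$$
where $f(\mathbf{x}) = \mathbf{x}^T\mathbf{H}^T\mathbf{H}\mathbf{x} - 2\mathbf{y}^T\mathbf{H}\mathbf{x}$ is the ML cost.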
One-symbol update
Assume that the $p$th coordinate in the $(k + 1)$th iteration is updated; $p$ can take
values from $1, \ldots, n_t$ for $M$-PAM and $1, \ldots, 2n_t$ for $M$-QAM. The update rule
can be written as
where $\mathbf{e}_p$ denotes the unit vector with its $p$th entry only as 1, and all other
entries as zero. Also, for any iteration $k$, $\mathbf{x}^{(k)}$ should belong to the space $\mathbb{S}$, and
therefore $\lambda_p^{(k)}$ can take only certain integer values. For example, in the case of
4-PAM or 16-QAM (both have the same signal set $\mathbb{A}_p = \{-3, -1, 1, 3\}$), $\lambda_p^{(k)}$ can
take values only from $\{-6, -4, -2, 0, 2, 4, 6\}$. Using (5.5) and (5.6), and defining
a matrix $\mathbf{G}$ as
$$\mathbf{G} = \mathbf{H}^T\mathbf{H}, \tag{5.7}$$
where sgn(.) denotes the signum function. For the ML cost function to reduce
from the $k$th to the $(k + 1)$th iteration, the cost difference should be negative.
Using this fact and that $a_p$ and $l_p^{(k)}$ are non-negative quantities, we can conclude
from (5.10) that the sign of $\lambda_p^{(k)}$ must satisfy
$$\mathrm{sgn}(\lambda_p^{(k)}) = \mathrm{sgn}(z_p^{(k)}). \tag{5.11}$$
Using (5.11) in (5.10), the ML cost difference can be rewritten as
$$\mathcal{F}(l_p^{(k)}) \triangleq \Delta C_p^{k+1} = \big(l_p^{(k)}\big)^2 a_p - 2\, l_p^{(k)}\, |z_p^{(k)}|. \tag{5.12}$$
For $\mathcal{F}(l_p^{(k)})$ to be non-positive, the necessary and sufficient condition from (5.12)
is
$$l_p^{(k)} < \frac{2|z_p^{(k)}|}{a_p}. \tag{5.13}$$
The value of $l_p^{(k)}$ which satisfies (5.13) and at the same time gives the largest
descent in the ML cost function from the $k$th to the $(k + 1)$th iteration when
symbol $p$ is updated (i.e., greedy choice) can be found. Also, $l_p^{(k)}$ is constrained
to take only certain integer values, and therefore the brute-force way to get the
optimum $l_p^{(k)}$ is to evaluate $\mathcal{F}(l_p^{(k)})$ at all possible values of $l_p^{(k)}$. This becomes
computationally expensive as the constellation size $M$ increases. However, for
the case of one-symbol update, a closed-form expression for the optimum $l_p^{(k)}$
that minimizes $\mathcal{F}(l_p^{(k)})$ can be found, which is given by
$$l_{p,opt}^{(k)} = 2\left\lfloor \frac{|z_p^{(k)}|}{2a_p} \right\rceil, \tag{5.14}$$
where $\lfloor\cdot\rceil$ denotes the rounding operation, i.e., for a real number $x$, $\lfloor x \rceil$ is the
integer closest to $x$. If the $p$th symbol in $\mathbf{x}^{(k)}$, i.e., $x_p^{(k)}$, were indeed updated,
then the new value of the symbol would be given by
$$x_p^{(k+1)} = x_p^{(k)} + l_{p,opt}^{(k)}\, \mathrm{sgn}(z_p^{(k)}). \tag{5.15}$$
However, $x_p^{(k+1)}$ can take values only in the set $\mathbb{A}_p$, and therefore the possibility
of $x_p^{(k+1)}$ being greater than $(M - 1)$ or less than $-(M - 1)$ needs to be checked. If
$x_p^{(k+1)} > (M - 1)$, then $l_{p,opt}^{(k)}$ is adjusted so that the new value of $x_p^{(k+1)}$ with the
adjusted value of $l_p^{(k)}$ using (5.15) is $(M - 1)$. Similarly, if $x_p^{(k+1)} < -(M - 1)$, then
$l_{p,opt}^{(k)}$ is adjusted so that the new value of $x_p^{(k+1)}$ is $-(M - 1)$. Let $\tilde{l}_{p,opt}^{(k)}$ be obtained
from $l_{p,opt}^{(k)}$ after these adjustments. It can be shown that if $\mathcal{F}(l_{p,opt}^{(k)})$ is
non-positive, then $\mathcal{F}(\tilde{l}_{p,opt}^{(k)})$ is also non-positive. Compute $\mathcal{F}(\tilde{l}_{p,opt}^{(k)})$, $p = 1, \ldots, 2n_t$.
Now, let
$$s = \underset{p}{\operatorname{argmin}}\; \mathcal{F}(\tilde{l}_{p,opt}^{(k)}). \tag{5.16}$$
If $\mathcal{F}(\tilde{l}_{s,opt}^{(k)}) < 0$, the update for the $(k + 1)$th iteration is
$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \tilde{l}_{s,opt}^{(k)}\, \mathrm{sgn}(z_s^{(k)})\, \mathbf{e}_s, \tag{5.17}$$
$$\mathbf{z}^{(k+1)} = \mathbf{z}^{(k)} - \tilde{l}_{s,opt}^{(k)}\, \mathrm{sgn}(z_s^{(k)})\, \mathbf{g}_s, \tag{5.18}$$
where $\mathbf{g}_s$ is the $s$th column of $\mathbf{G}$. The update in (5.18) follows from the definition
of $\mathbf{z}^{(k)}$ in (5.8). If $\mathcal{F}(\tilde{l}_{s,opt}^{(k)}) \ge 0$, then the one-symbol update search terminates.
The data vector at this point is referred to as 1-symbol update local minimum,
which is the final output in the 1-LAS algorithm. In the K-LAS algorithm, after
reaching a one-symbol update local minimum, a further decrease in the cost
function is sought by updating multiple symbols simultaneously.
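The one-symbol update search built from (5.12) and (5.14)-(5.18) can be sketched for real-valued $M$-PAM as follows. The definitions $a_p = (\mathbf{G})_{p,p}$ and $\mathbf{z}^{(k)} = \mathbf{H}^T(\mathbf{y} - \mathbf{H}\mathbf{x}^{(k)})$ are not visible in this excerpt; they are assumed here because they are consistent with the update (5.18) and the ML cost $f(\mathbf{x})$. The function name, the tolerance, and the random initial vector are also assumptions.

```python
import numpy as np

def one_las(y, H, x0, M, tol=1e-12):
    """1-LAS sketch for M-PAM: greedy one-symbol updates until no descent."""
    G = H.T @ H
    a = np.diag(G)                                  # assumed a_p = (G)_{p,p}
    x = np.asarray(x0, dtype=float).copy()
    z = H.T @ (y - H @ x)                           # assumed definition of z
    while True:
        l_opt = 2 * np.round(np.abs(z) / (2 * a))                   # (5.14)
        x_cand = np.clip(x + l_opt * np.sign(z), -(M - 1), M - 1)   # (5.15) + bounds
        step = x_cand - x                           # adjusted step with its sign
        F = step ** 2 * a - 2 * np.abs(step) * np.abs(z)            # (5.12)
        s = int(np.argmin(F))                                       # (5.16)
        if F[s] >= -tol:
            return x                                # 1-symbol update local minimum
        x[s] = x_cand[s]                                            # (5.17)
        z = z - step[s] * G[:, s]                                   # (5.18)

rng = np.random.default_rng(5)
M, n = 4, 8
pam = np.array([-3.0, -1.0, 1.0, 3.0])
H = rng.standard_normal((n, n))
x_true = rng.choice(pam, n)
y = H @ x_true + 0.3 * rng.standard_normal(n)
f = lambda x: float(x @ H.T @ H @ x - 2 * y @ H @ x)   # ML cost f(x)

x_init = rng.choice(pam, n)          # in practice a ZF or MMSE output would be used
x_las = one_las(y, H, x_init, M)
print(f(x_init), "->", f(x_las))
```

Each iteration costs $O(n_t)$ once $\mathbf{G}$ is available, because only one column of $\mathbf{G}$ is needed to refresh $\mathbf{z}$.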
Multiple-symbol updates
The motivation for trying out multiple-symbol updates is as follows. Let $\mathbb{L}_K \subseteq \mathbb{S}$
denote the set of data vectors such that for any $\mathbf{x} \in \mathbb{L}_K$, if a K-symbol update
is performed on $\mathbf{x}$ resulting in a vector $\mathbf{x}'$, then $\|\mathbf{y} - \mathbf{H}\mathbf{x}'\| \ge \|\mathbf{y} - \mathbf{H}\mathbf{x}\|$. Note
that $\mathbf{x}_{ML} \in \mathbb{L}_K$, $K = 1, 2, \ldots, 2n_t$, because any number of symbol updates on
$\mathbf{x}_{ML}$ will not decrease the cost function. Define another set $\mathbb{M}_K = \bigcap_{j=1}^{K} \mathbb{L}_j$.
Note that $\mathbf{x}_{ML} \in \mathbb{M}_K$, $K = 1, 2, \ldots, 2n_t$, and $\mathbb{M}_{2n_t} = \{\mathbf{x}_{ML}\}$, i.e., $\mathbb{M}_{2n_t}$ is
a singleton set with $\mathbf{x}_{ML}$ as the only element. It is noted that if the updates
are done optimally, then the output of the K-LAS algorithm converges to a
vector in $\mathbb{M}_K$. Also, $|\mathbb{M}_{K+1}| \le |\mathbb{M}_K|$, $K = 1, 2, \ldots, 2n_t - 1$. For any $\mathbf{x} \in \mathbb{M}_K$,
$K = 1, 2, \ldots, 2n_t$ and $\mathbf{x} \ne \mathbf{x}_{ML}$, it can be seen that $\mathbf{x}$ and $\mathbf{x}_{ML}$ will differ in
$K + 1$ or more locations. The probability that $\mathbf{x}_{ML}$ equals the transmitted vector
increases with increasing SNR, and so the separation between a non-ML vector in
$\mathbb{M}_K$ and the transmitted vector will monotonically increase
with increasing $K$. Since $\mathbf{x}_{ML} \in \mathbb{M}_K$, and $|\mathbb{M}_K|$ decreases monotonically with
increasing $K$, there are fewer non-ML data vectors to which the algorithm can
converge for increasing $K$. Therefore, the probability of the noise vector $\mathbf{n}$
inducing an error decreases with increasing $K$. This indicates that K-symbol updates
with large $K$ could approach ML performance with increasing complexity for
increasing $K$.
K-symbol update
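K-symbol updates can be done in $\binom{2n_t}{K}$ ways, among which the update that
gives the largest reduction in the ML cost is of interest. Assume that in the
$(k + 1)$th iteration, $K$ symbols at the indices $i_1, i_2, \ldots, i_K$ of $\mathbf{x}^{(k)}$ are updated. Each
$i_j$, $j = 1, 2, \ldots, K$, can take values from $1, 2, \ldots, n_t$ for $M$-PAM and $1, 2, \ldots, 2n_t$
for $M$-QAM. Further, define the set of indices, $U = \{i_1, i_2, \ldots, i_K\}$. The update
rule for the K-symbol update can then be written as
$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \sum_{j=1}^{K} \lambda_{i_j}^{(k)}\, \mathbf{e}_{i_j}. \tag{5.19}$$
For any iteration $k$, $\mathbf{x}^{(k)}$ belongs to the space $\mathbb{S}$, and therefore $\lambda_{i_j}^{(k)}$ can take
only certain integer values. In particular, $\lambda_{i_j}^{(k)} \in \mathbb{A}_{i_j}^{(k)}$, where $\mathbb{A}_{i_j}^{(k)} = \{x \,|\, (x +
x_{i_j}^{(k)}) \in \mathbb{A}_{i_j},\; x \ne 0\}$. For example, for 16-QAM, $\mathbb{A}_{i_j} = \{-3, -1, 1, 3\}$, and
if $x_{i_j}^{(k)}$ is $-1$, then $\mathbb{A}_{i_j}^{(k)} = \{-2, 2, 4\}$. Using (5.5), the cost difference function
$\Delta C_U^{k+1}(\lambda_{i_1}^{(k)}, \lambda_{i_2}^{(k)}, \ldots, \lambda_{i_K}^{(k)}) \triangleq C^{(k+1)} - C^{(k)}$ can be written as
$$\Delta C_U^{k+1}\big(\lambda_{i_1}^{(k)}, \ldots, \lambda_{i_K}^{(k)}\big) = \sum_{j=1}^{K} \big(\lambda_{i_j}^{(k)}\big)^2 (\mathbf{G})_{i_j,i_j} + 2\sum_{q=1}^{K}\sum_{p=q+1}^{K} \lambda_{i_p}^{(k)}\lambda_{i_q}^{(k)} (\mathbf{G})_{i_p,i_q} - 2\sum_{j=1}^{K} \lambda_{i_j}^{(k)} z_{i_j}^{(k)}, \tag{5.20}$$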
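$$\boldsymbol{\lambda}_U^{(k)} = \mathbf{F}_U^{-1}\mathbf{z}_U^{(k)}. \tag{5.22}$$
However, the solution given by (5.22) need not lie in $\mathbb{A}_U^{(k)}$. So, first round off the
solution as
$$\hat{\boldsymbol{\lambda}}_U^{(k)} = 2\left\lfloor 0.5\,\boldsymbol{\lambda}_U^{(k)} \right\rceil, \tag{5.23}$$
where the operation in (5.23) is done element-wise, since $\boldsymbol{\lambda}_U^{(k)}$ is a vector. Further,
let $\hat{\boldsymbol{\lambda}}_U^{(k)} = [\hat{\lambda}_{i_1}^{(k)}\; \hat{\lambda}_{i_2}^{(k)} \cdots \hat{\lambda}_{i_K}^{(k)}]^T$. It is still possible that the solution $\hat{\boldsymbol{\lambda}}_U^{(k)}$ in (5.23)
need not lie in $\mathbb{A}_U^{(k)}$. This would result in $x_{i_j}^{(k+1)} \notin \mathbb{A}_{i_j}$ for some $j$. For example,
if $\mathbb{A}_{i_j}$ is $M$-PAM, then $x_{i_j}^{(k+1)} \notin \mathbb{A}_{i_j}$ if $x_{i_j}^{(k)} + \hat{\lambda}_{i_j}^{(k)} > (M - 1)$ or $x_{i_j}^{(k)} + \hat{\lambda}_{i_j}^{(k)} <
-(M - 1)$. In such cases, the following adjustment to $\hat{\lambda}_{i_j}^{(k)}$ for $j = 1, 2, \ldots, K$
can be used:
$$\hat{\lambda}_{i_j}^{(k)} = \begin{cases} (M - 1) - x_{i_j}^{(k)}, & \text{when } \hat{\lambda}_{i_j}^{(k)} + x_{i_j}^{(k)} > (M - 1), \\ -(M - 1) - x_{i_j}^{(k)}, & \text{when } \hat{\lambda}_{i_j}^{(k)} + x_{i_j}^{(k)} < -(M - 1). \end{cases} \tag{5.24}$$
The update rules for the $\mathbf{z}$ and $\mathbf{x}$ vectors are given by
$$\mathbf{z}^{(k+1)} = \mathbf{z}^{(k)} - \sum_{j=1}^{K} \hat{\lambda}_{i_j}^{(k)}\, \mathbf{g}_{i_j}, \tag{5.26}$$
$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \sum_{j=1}^{K} \hat{\lambda}_{i_j}^{(k)}\, \mathbf{e}_{i_j}. \tag{5.27}$$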
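One K-symbol update for a given index set $U$ can be sketched as below. The definition of $\mathbf{F}_U$ is not visible in this excerpt; the sketch assumes it is the $K \times K$ submatrix of $\mathbf{G}$ indexed by $U$, which is what minimizing the quadratic cost difference in $\boldsymbol{\lambda}_U$ yields. The acceptance test (apply the update only if the cost difference is negative) and the function name are likewise assumptions.

```python
import numpy as np

def k_symbol_update(x, z, G, U, M):
    """One K-symbol update on index set U for M-PAM; returns (x_new, z_new, dC)."""
    U = np.asarray(U)
    F_U = G[np.ix_(U, U)]                      # assumed: submatrix of G on U
    lam = np.linalg.solve(F_U, z[U])           # (5.22)
    lam = 2 * np.round(0.5 * lam)              # (5.23), element-wise
    lam = np.clip(x[U] + lam, -(M - 1), M - 1) - x[U]   # (5.24)
    dC = float(lam @ F_U @ lam - 2 * lam @ z[U])        # cost difference, matrix form
    if dC >= 0:
        return x, z, 0.0                       # no descent: keep the current vector
    x_new = x.copy()
    x_new[U] += lam                            # (5.27)
    z_new = z - G[:, U] @ lam                  # (5.26)
    return x_new, z_new, dC

rng = np.random.default_rng(6)
M, n = 4, 8
pam = np.array([-3.0, -1.0, 1.0, 3.0])
H = rng.standard_normal((n, n))
y = H @ rng.choice(pam, n) + 0.3 * rng.standard_normal(n)
G = H.T @ H
f = lambda v: float(v @ G @ v - 2 * y @ H @ v)   # ML cost f(x)

x = rng.choice(pam, n)
z = H.T @ (y - H @ x)
x_new, z_new, dC = k_symbol_update(x, z, G, [1, 4], M)
print(dC, f(x_new) - f(x))                      # the two numbers should agree
```

In a full K-LAS search this routine would be tried over candidate index sets after a one-symbol local minimum is reached, keeping the set with the largest cost reduction.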
5.1.3 Complexity
The complexity of the LAS algorithm comprises three main components, namely,
(i) computation of the initial vector $\mathbf{x}^{(0)}$, (ii) computation of $\mathbf{H}^T\mathbf{H}$, and (iii)
the search operation. For $n_t = n_r$, because of the matrix inversion involved, the
complexity of computing the ZF or MMSE initial solution vector is $O(n_t^3)$, i.e.,
$O(n_t^2)$ per-symbol complexity. Likewise, $\mathbf{H}^T\mathbf{H}$ can be computed in $O(n_t^2)$
per-symbol complexity. From simulations, it has been found that the LAS search
requires an average per-symbol complexity of $O(n_t)$. So the total complexity
of the algorithm is dominated by the initial solution computation rather than
the search operation. The overall average per-symbol complexity is $O(n_t^2)$ which
scales well for large MIMO systems.
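the $j$th bit of the $i$th transmitted symbol is $+1$ and $-1$, respectively. So, if $\|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j-}\|^2 - \|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j+}\|^2$ is positive (or negative), it indicates that the $j$th bit of the $i$th transmitted symbol has a higher likelihood of being $+1$ (or $-1$). So, the quantity $\|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j-}\|^2 - \|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j+}\|^2$, appropriately normalized to avoid unbounded increase for increasing $n_t$, can be a good soft value for the $j$th bit of the $i$th symbol. With this motivation, a soft output value for the $j$th bit of the $i$th symbol can be generated as
$$\tilde{b}_{i,j} = \frac{\|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j-}\|^2 - \|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j+}\|^2}{\|\mathbf{h}_i\|^2}, \tag{5.28}$$
where the normalization by $\|\mathbf{h}_i\|^2$ is to contain unbounded increase of $\tilde{b}_{i,j}$ for increasing $n_t$. The right-hand side in the above can be efficiently computed in terms of $\mathbf{z}$ and $\mathbf{G}$ as follows. Since $\mathbf{d}_i^{j+}$ and $\mathbf{d}_i^{j-}$ differ only in the $i$th entry,
$$\mathbf{d}_i^{j-} = \mathbf{d}_i^{j+} + \lambda_{i,j}\mathbf{e}_i. \tag{5.29}$$
Since $\mathbf{d}_i^{j-}$ and $\mathbf{d}_i^{j+}$ are known, $\lambda_{i,j}$ is known from (5.29). Substituting (5.29) in (5.28),
$$\tilde{b}_{i,j}\|\mathbf{h}_i\|^2 = \|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j+} - \lambda_{i,j}\mathbf{h}_i\|^2 - \|\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j+}\|^2$$
$$= \lambda_{i,j}^2\|\mathbf{h}_i\|^2 - 2\lambda_{i,j}\mathbf{h}_i^T(\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j+}) \tag{5.30}$$
$$= -\lambda_{i,j}^2\|\mathbf{h}_i\|^2 - 2\lambda_{i,j}\mathbf{h}_i^T(\mathbf{y} - \mathbf{H}\mathbf{d}_i^{j-}). \tag{5.31}$$
If $b_{i,j} = +1$, then $\mathbf{d}_i^{j+} = \mathbf{d}$, and substituting this in (5.30) and dividing by $\|\mathbf{h}_i\|^2$,
$$\tilde{b}_{i,j} = \lambda_{i,j}^2 - 2\lambda_{i,j}\frac{z_i}{(\mathbf{G})_{i,i}}. \tag{5.32}$$
If $b_{i,j} = -1$, then $\mathbf{d}_i^{j-} = \mathbf{d}$, and substituting this in (5.31) and dividing by $\|\mathbf{h}_i\|^2$,
$$\tilde{b}_{i,j} = -\lambda_{i,j}^2 - 2\lambda_{i,j}\frac{z_i}{(\mathbf{G})_{i,i}}. \tag{5.33}$$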
It is noted that $\mathbf{z}$ and $\mathbf{G}$ are already available upon the termination of the $K$-LAS algorithm, and hence the complexity of computing $\tilde{b}_{i,j}$ in (5.32) and (5.33) is constant. Hence, the overall complexity in computing the soft values for all the bits is $O(n_t \log_2 M)$ for $M$-PAM. It is seen from (5.32) and (5.33) that the magnitude of $\tilde{b}_{i,j}$ depends upon $\lambda_{i,j}$. For large-size signal sets, the possible values of $\lambda_{i,j}$ will also be large in magnitude. Therefore $\tilde{b}_{i,j}$ has to be normalized for the channel decoder to function properly. For turbo codes, it has been observed through simulations that normalizing $\tilde{b}_{i,j}$ by $\lambda_{i,j}^2/2$ results in good performance. In [22], it was shown that this soft decision output generation method, when used in large V-BLAST MIMO systems, offers about 1 to 1.5 dB improvement in coded BER performance compared to that achieved using hard decision outputs from the $K$-LAS algorithm.
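As a rough check of the closed-form soft values, the sketch below compares them with a direct evaluation of the normalized cost difference for a small real-valued system. It assumes that $\mathbf{z} = \mathbf{H}^T(\mathbf{y} - \mathbf{H}\mathbf{d})$ for the detected vector $\mathbf{d}$, that $(\mathbf{G})_{i,i} = \|\mathbf{h}_i\|^2$, and that the two candidate vectors differ by a known step in the $i$th entry only. All function names are hypothetical.

```python
def residual_norm_sq(H, y, d):
    """Return ||y - H d||^2 for real-valued H (list of rows), y, and d."""
    return sum((y[r] - sum(H[r][c] * d[c] for c in range(len(d)))) ** 2
               for r in range(len(y)))

def soft_value_direct(H, y, d, i, lam, detected_bit):
    """Normalized cost difference between the bit = -1 and bit = +1 candidates."""
    d_plus, d_minus = list(d), list(d)
    if detected_bit == 1:
        d_minus[i] += lam       # detected vector is the +1 candidate
    else:
        d_plus[i] -= lam        # detected vector is the -1 candidate
    h_norm_sq = sum(H[r][i] ** 2 for r in range(len(y)))
    return (residual_norm_sq(H, y, d_minus)
            - residual_norm_sq(H, y, d_plus)) / h_norm_sq

def soft_value_fast(lam, z_i, G_ii, detected_bit):
    """Closed form using only the scalars z_i and (G)_{i,i}."""
    sign = 1.0 if detected_bit == 1 else -1.0
    return sign * lam * lam - 2.0 * lam * z_i / G_ii
```

The closed form needs a constant number of operations per bit once `z_i` and `G_ii` are known, whereas the direct form recomputes two full residuals.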
[Figure 5.1 plots. Panel (a): BER vs. average received SNR (dB) for ZF-1LAS with nt = nr = 10, 50, 100, 200, 400; plot annotations: increasing the number of antennas improves BER performance, and for a large number of antennas ZF-1LAS, MF-1LAS, and MMSE-1LAS perform almost the same. Panel (b): average received SNR (dB) required for a BER target of 0.001 vs. nt = nr, for ZF-SIC and ZF-1LAS, with ML and SISO AWGN performance marked for reference.]
Figure 5.1 BER performance of the 1-LAS algorithm in large V-BLAST MIMO systems with nt = nr and BPSK: (a) BER vs. average received SNR; (b) average received SNR required to achieve a target BER of 0.001 vs. number of antennas.
4-QAM are shown. The MMSE detector output is used as the initial vector. Hence the algorithms are referred to as MMSE-1LAS and MMSE-3LAS in the figure. The unfaded SISO AWGN performance for 4-QAM is also shown as a lower bound. It is noted that the performance of the MMSE detector is quite poor for nt = nr = 64, whereas the performance of MMSE-3LAS is much better. As expected, 3-LAS achieves a better performance than 1-LAS in Fig. 5.2(a); however, the improvement is not very significant. In Fig. 5.2(b), it is seen that 3-LAS approaches unfaded SISO AWGN performance for increasing values of nt = nr, and that 3-LAS, like 1-LAS, is attractive mainly in the hundreds of antennas regime (e.g., the 3-LAS performances for nt = nr = 128 and 256 are close to the unfaded SISO AWGN performance). Therefore, it is of interest to consider alternative ways to create the large dimension advantage of the LAS algorithm in the tens of antennas regime as well.
[Figure 5.2 plots. Panel (a): BER vs. average received SNR (dB) for V-BLAST, 4-QAM; curves: MMSE-1LAS and MMSE-3LAS for nt = nr = 32 and 64, and unfaded SISO AWGN. Panel (b): BER vs. average received SNR (dB); curves: MMSE-3LAS for nt = nr = 16, 32, 64, 128, 256, MMSE for nt = nr = 64, and unfaded SISO AWGN.]
Figure 5.2 BER performance of 3-LAS algorithm in large V-BLAST MIMO systems with 4-QAM: (a) 3-LAS vs. 1-LAS performance for nt = nr = 32, 64; (b) 3-LAS performance for nt = nr = 16, 32, 64, 128, 256.
76 Detection based on local search
where $\omega_n = e^{j2\pi/n}$, $j = \sqrt{-1}$, and $x_{u,v}$, $0 \le u, v \le n-1$, are the data symbols from a QAM alphabet. When $\delta = e^{\sqrt{5}\,j}$ and $t = e^{j}$, the STBC achieves full transmit diversity (under ML decoding) as well as information losslessness. When $\delta = t = 1$, the code ceases to be of full diversity, but continues to be information lossless.
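$$\mathbf{Y}_c = \mathbf{H}_c\mathbf{X}_c + \mathbf{N}_c, \tag{5.35}$$
$$\mathbf{X}_c = \sum_{i=1}^{k} x_c^{(i)}\mathbf{A}_c^{(i)}, \tag{5.36}$$
where $x_c^{(i)}$ is the $i$th complex data symbol and $\mathbf{A}_c^{(i)} \in \mathbb{C}^{n_t \times p}$ is its weight matrix. From (5.35) and (5.36), and applying the vec(.) operation,
$$\mathrm{vec}(\mathbf{Y}_c) = \sum_{i=1}^{k} x_c^{(i)}\,\mathrm{vec}(\mathbf{H}_c\mathbf{A}_c^{(i)}) + \mathrm{vec}(\mathbf{N}_c). \tag{5.37}$$
If $\mathbf{U}, \mathbf{V}, \mathbf{W}, \mathbf{D}$ are matrices such that $\mathbf{D} = \mathbf{U}\mathbf{W}\mathbf{V}$, then it is true that $\mathrm{vec}(\mathbf{D}) = (\mathbf{V}^T \otimes \mathbf{U})\,\mathrm{vec}(\mathbf{W})$, where $\otimes$ denotes the tensor product of matrices. Using this, (5.37) can be written as
$$\mathrm{vec}(\mathbf{Y}_c) = \sum_{i=1}^{k} x_c^{(i)}\,(\mathbf{I}_p \otimes \mathbf{H}_c)\,\mathrm{vec}(\mathbf{A}_c^{(i)}) + \mathrm{vec}(\mathbf{N}_c), \tag{5.38}$$
where $\mathbf{I}_p$ is the $p \times p$ identity matrix. Further, define $\mathbf{y}_c = \mathrm{vec}(\mathbf{Y}_c)$, $\hat{\mathbf{H}}_c = (\mathbf{I}_p \otimes \mathbf{H}_c)$, $\mathbf{a}_c^{(i)} = \mathrm{vec}(\mathbf{A}_c^{(i)})$, and $\mathbf{n}_c = \mathrm{vec}(\mathbf{N}_c)$. From these definitions, it is clear that $\mathbf{y}_c \in \mathbb{C}^{n_r p \times 1}$, $\hat{\mathbf{H}}_c \in \mathbb{C}^{n_r p \times n_t p}$, $\mathbf{a}_c^{(i)} \in \mathbb{C}^{n_t p \times 1}$, and $\mathbf{n}_c \in \mathbb{C}^{n_r p \times 1}$. Define a matrix $\tilde{\mathbf{H}}_c \in \mathbb{C}^{n_r p \times k}$, whose $i$th column is $\hat{\mathbf{H}}_c\mathbf{a}_c^{(i)}$, $i = 1, \ldots, k$. Let $\mathbf{x}_c \in \mathbb{C}^{k \times 1}$, whose $i$th entry is the data symbol $x_c^{(i)}$. With the above definitions, (5.38) can be written as
$$\mathbf{y}_c = \sum_{i=1}^{k} x_c^{(i)}\,(\hat{\mathbf{H}}_c\mathbf{a}_c^{(i)}) + \mathbf{n}_c = \tilde{\mathbf{H}}_c\mathbf{x}_c + \mathbf{n}_c. \tag{5.39}$$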
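Each element of $\mathbf{x}_c$ is an $M$-PAM or $M$-QAM symbol. Let $\mathbf{y}_c$, $\tilde{\mathbf{H}}_c$, $\mathbf{x}_c$, and $\mathbf{n}_c$ be decomposed into real and imaginary parts as
$$\mathbf{y}_c = \mathbf{y}_I + j\mathbf{y}_Q, \quad \mathbf{x}_c = \mathbf{x}_I + j\mathbf{x}_Q, \quad \mathbf{n}_c = \mathbf{n}_I + j\mathbf{n}_Q, \quad \tilde{\mathbf{H}}_c = \mathbf{H}_I + j\mathbf{H}_Q.$$
Further, define $\mathbf{x}_{eff} \in \mathbb{R}^{2k \times 1}$, $\mathbf{y}_{eff} \in \mathbb{R}^{2n_r p \times 1}$, $\mathbf{H}_{eff} \in \mathbb{R}^{2n_r p \times 2k}$, and $\mathbf{n}_{eff} \in \mathbb{R}^{2n_r p \times 1}$ as
$$\mathbf{x}_{eff} = [\mathbf{x}_I^T \ \mathbf{x}_Q^T]^T, \quad \mathbf{y}_{eff} = [\mathbf{y}_I^T \ \mathbf{y}_Q^T]^T, \quad \mathbf{H}_{eff} = \begin{bmatrix} \mathbf{H}_I & -\mathbf{H}_Q \\ \mathbf{H}_Q & \mathbf{H}_I \end{bmatrix}, \quad \mathbf{n}_{eff} = [\mathbf{n}_I^T \ \mathbf{n}_Q^T]^T.$$
Now, an equivalent real-valued linear vector channel model for the NO-STBC MIMO system can be written in the form
$$\mathbf{y}_{eff} = \mathbf{H}_{eff}\mathbf{x}_{eff} + \mathbf{n}_{eff}.$$
The LAS algorithm can be applied on this equivalent linear vector channel model to detect $\mathbf{x}_{eff}$, and hence the symbol vector $\mathbf{x}_c$.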
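The conversion from a complex linear model to an equivalent real-valued one can be sketched as follows. This is an illustration under a stated assumption: real parts are stacked above imaginary parts, which is one common convention and may differ from the ordering used in the original equations. The function names are hypothetical.

```python
def to_real_model(H, y):
    """Map a complex model y = H x + n to stacked real-valued H_eff and y_eff.

    H is a list of rows of complex numbers, y a list of complex numbers.
    Assumed stacking: [real part; imaginary part].
    """
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_eff = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_eff

def stack_real(x):
    """Stack the real parts of x above its imaginary parts."""
    return [v.real for v in x] + [v.imag for v in x]

def matvec(A, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]
```

With this mapping, a complex system with k symbols becomes a real system with 2k unknowns, so a real-valued search such as LAS can operate on it directly.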
Performance
The uncoded BER performance of 1-, 2-, and 3-LAS algorithms in decoding 4 × 4, 8 × 8, 16 × 16, and 32 × 32 STBCs from CDA with $\delta = t = 1$ for nt = nr = 4, 8, 16, 32 and 4-QAM is illustrated in Fig. 5.3(a). The corresponding performance of the MMSE detector and the unfaded SISO AWGN performance are also plotted for comparison. Note that the 32 × 32 STBC has 2048 real dimensions, a problem size that can well exploit the large dimension advantage of LAS. The MMSE detector performance is found not to improve with increasing STBC size (i.e., increasing nt = nr), whereas the performance of the MMSE-LAS algorithm (LAS with MMSE detector output as the initial solution) improves for increasing nt = nr. For example, decoding of 16 × 16 and 32 × 32 STBCs (with 512 and 2048 real dimensions, respectively) using LAS achieves a performance very close to that of unfaded SISO AWGN. With such large dimensions, 1-LAS itself is found to be adequate to achieve near-optimal performance without the need for 2- or 3-LAS. In terms of coded BER performance, from Fig. 5.3(b) it is observed that 1-LAS with soft output followed by turbo decoding is able to achieve a very good coded performance, within about 4 dB of the ergodic MIMO capacity.
In Fig. 5.4, it is observed that LAS, when applied to V-BLAST MIMO with nt = nr = 16 and 4-QAM (32 real dimensions), achieves a performance which is far from the performance achieved by the SD in the same system. In contrast, for the same number of antennas and modulation in 16 × 16 NO-STBC (512 real dimensions), LAS achieves better performance even compared to the SD performance in 16 × 16 V-BLAST MIMO. This is because of the availability of transmit diversity in STBC and the lack of it in V-BLAST. The complexity involved in LAS decoding of 16 × 16 NO-STBC is less than that of sphere decoding of 16 × 16 V-BLAST at low to moderate SNRs [23].
[Figure 5.3 plots. Panel (a): BER for NO-STBCs with nr = nt and 4-QAM; curves: MMSE for 4 × 4, 8 × 8, 16 × 16, and 32 × 32 STBCs, and MMSE-1LAS. Panel (b): BER vs. average received SNR (dB) for 32 × 32 NO-STBC, nt = nr = 32, 16-QAM.]
Figure 5.3 BER performance of 1-LAS algorithm in large NO-STBC MIMO systems: (a) uncoded BER performance; (b) coded BER performance.
Complexity
Two good properties of NO-STBCs from CDA are useful in achieving low orders of complexity for the computation of $\mathbf{x}_{eff}^{(0)}$ and $\mathbf{H}_{eff}^T\mathbf{H}_{eff}$. They are: (i) the weight matrices $\mathbf{A}_c^{(i)}$ are of permutation type, and (ii) the $n_t^2 \times n_t^2$ matrix formed with the $(n_t^2 \times 1)$-sized $\mathbf{a}_c^{(i)}$ vectors as columns is a scaled unitary matrix. These properties allow the computation of the MMSE/ZF initial solution in $O(n_t^3 n_r)$ complexity, i.e., in $O(n_t n_r)$ per-symbol complexity (since there are $n_t^2$ symbols in one STBC matrix). Likewise, the computation of $\mathbf{H}_{eff}^T\mathbf{H}_{eff}$ can be done in $O(n_t^3)$ per-symbol complexity.
[Figure 5.4 plot. x-axis: average received SNR (dB).]
The average per-symbol complexities of the 1-LAS and 2-LAS search operations are $O(n_t^2)$ and $O(n_t^2 \log n_t)$, respectively, which is explained as follows. The average search complexity is the complexity of one search stage times the mean number of search stages till the algorithm terminates. For 1-LAS, the number of search stages is always 1. There are multiple iterations in the search, and in each iteration all possible $\binom{2n_t^2}{1}$ one-symbol updates are considered. So, the per-iteration complexity in 1-LAS is $O(n_t^2)$, i.e., $O(1)$ complexity per symbol. Further, the mean number of iterations before the algorithm terminates in 1-LAS was found to be $O(n_t^2)$ through simulations. So, the overall per-symbol complexity of 1-LAS is $O(n_t^2)$. In 2-LAS, the complexity of the two-symbol update dominates that of the one-symbol update. Since there are $\binom{2n_t^2}{2}$ possible two-symbol updates, the complexity of one search stage is $O(n_t^4)$, i.e., $O(n_t^2)$ complexity per symbol. The mean number of stages till the algorithm terminates in 2-LAS was found to be $O(\log n_t)$ through simulations [27]. Therefore, the overall per-symbol complexity of 2-LAS is $O(n_t^2 \log n_t)$.
For the special case of information-lossless-only STBCs (i.e., STBCs with $\delta = t = 1$), the complexity involved in computing $\mathbf{x}_{eff}^{(0)}$ and $\mathbf{H}_{eff}^T\mathbf{H}_{eff}$ can be reduced further. This becomes possible due to the following property of
5.2 RS
Local search algorithms can be designed based on random selection methods for choosing the set of vectors to be tested in a local neighborhood. The RS algorithm [29] presented in this section is one such algorithm. The algorithm also keeps track of the symbol positions changed in the previous iterations to improve the search efficiency. Used along with multiple restarts, the RS algorithm achieves near-optimal performance in large MIMO systems at a complexity of $O(n_t^{1.4})$.
5.2.1 RS algorithm
Consider the V-BLAST MIMO system model defined in Section 5.1.1. Given $\mathbf{y}$ and $\mathbf{H}$, the RS algorithm starts with an initial solution vector $\mathbf{x}^{(0)}$, a fixed index set $S = \{1, 2, \ldots, 2n_t\}$, and two dynamic index sets $C$ and $D$ which are initialized to be empty. The algorithm is iterative: each iteration results in a solution vector which, in turn, is used as the input to the next iteration. The set $C$ is updated only once per iteration, whereas the set $D$ may be updated multiple times, or not at all, within each iteration. The set $C$ contains those indices (i.e., symbol positions) where a symbol change relative to the solution vector of the previous iteration led to an ML cost improvement. In other words, in iteration $t$, the set $C$ adds the index of the element in $\mathbf{x}^{(t)}$ which, when changed, improved the ML cost. The set $D$ contains the indices where a symbol change within an iteration did not result in an ML cost improvement.
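Step 1
Given an initial solution vector $\mathbf{x}^{(t=0)}$, find its neighborhood set, $\mathcal{N}(\mathbf{x}^{(t=0)})$.
Step 2
Randomly select an element $m$ from the index set $\{S \setminus (C \cup D)\}$. Choose a subset of vectors from $\mathcal{N}(\mathbf{x}^{(t)})$, denoted by $\{\mathbf{d}(j), j = 1, 2, \ldots, |\mathbb{A}| - 1\}$, such that the $\mathbf{d}(j)$s differ from $\mathbf{x}^{(t)}$ in the $m$th position, $m \in \{S \setminus (C \cup D)\}$. It is noted that $j \in \{1, \ldots, |\mathbb{A}| - 1\}$, since, for each symbol in a given position, there are $|\mathbb{A}| - 1$ possible other symbols. Let $g(\mathbf{x}^{(t)} \to \mathbf{d}(j))$ denote the difference in the ML cost between $\mathbf{x}^{(t)}$ and $\mathbf{d}(j)$, i.e.,
$$g(\mathbf{x}^{(t)} \to \mathbf{d}(j)) = f(\mathbf{x}^{(t)}) - f(\mathbf{d}(j))$$
$$= \|\mathbf{y} - \mathbf{H}\mathbf{x}^{(t)}\|^2 - \|\mathbf{y} - \mathbf{H}\mathbf{d}(j)\|^2$$
$$= \mathbf{x}^{(t)T}\mathbf{H}^T\mathbf{H}\mathbf{x}^{(t)} - \mathbf{d}(j)^T\mathbf{H}^T\mathbf{H}\mathbf{d}(j) - 2\mathbf{y}^T\mathbf{H}(\mathbf{x}^{(t)} - \mathbf{d}(j)). \tag{5.42}$$
Let $\mathbf{G} = \mathbf{H}^T\mathbf{H}$, $\mathbf{z} = \mathbf{H}^T\mathbf{y}$, and $\gamma(j) = g(\mathbf{x}^{(t)} \to \mathbf{d}(j))$. By definition, $\mathbf{d}(j)$ can be rewritten as
$$\mathbf{d}(j) = \mathbf{x}^{(t)} + \lambda_m\mathbf{e}_m, \tag{5.43}$$
where $\mathbf{e}_m$ denotes the vector with its $m$th entry as 1 and all other entries as zeros, and $\lambda_m$ belongs to a set of integers such that $\mathbf{d}(j) \in \mathbb{A}^{2n_t}$. For example, if $\mathbb{A} = \{-3, -1, +1, +3\}$, then the possible integer values that $\lambda_m$ can take are $\{-6, -4, -2, 0, 2, 4, 6\}$. Now, (5.42) can be simplified as
$$\gamma(j) = 2\lambda_m z_m - 2\lambda_m\mathbf{e}_m^T\mathbf{G}\mathbf{x}^{(t)} - \lambda_m^2 G_{m,m}, \tag{5.44}$$
where $z_m$ denotes the $m$th element of $\mathbf{z}$, and $G_{i,j}$ denotes the element in the $i$th row and $j$th column of $\mathbf{G}$.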
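The simplification of the cost difference for a single-position change can be checked numerically. The sketch below evaluates the ML cost difference directly and through the shortcut that uses only the matched-filter output and the Gram matrix, and then picks the best replacement symbol for one position. It assumes a real-valued model; the function names are hypothetical and not from the source.

```python
def ml_cost(H, y, x):
    """ML cost f(x) = ||y - H x||^2 for real-valued H (list of rows), y, x."""
    return sum((y[r] - sum(H[r][c] * x[c] for c in range(len(x)))) ** 2
               for r in range(len(y)))

def gram_and_matched(H, y):
    """Return G = H^T H and z = H^T y."""
    n = len(H[0])
    G = [[sum(H[r][a] * H[r][b] for r in range(len(H))) for b in range(n)]
         for a in range(n)]
    z = [sum(H[r][a] * y[r] for r in range(len(H))) for a in range(n)]
    return G, z

def cost_gain(G, z, x, m, lam):
    """Cost reduction from changing x[m] to x[m] + lam, using only G and z."""
    Gx_m = sum(G[m][c] * x[c] for c in range(len(x)))
    return 2 * lam * z[m] - 2 * lam * Gx_m - lam * lam * G[m][m]

def best_move(G, z, x, m, alphabet):
    """Return (largest gain, new symbol) over all other symbols for position m."""
    return max((cost_gain(G, z, x, m, a - x[m]), a)
               for a in alphabet if a != x[m])
```

A positive gain means the candidate has a lower ML cost than the current vector, so only moves with positive gain are worth accepting.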
Step 3
Compute
$$\gamma_{max} = \max_j \{\gamma(j)\}_{j=1}^{|\mathbb{A}|-1}, \tag{5.45}$$
$$max\_idx = \operatorname*{argmax}_j \{\gamma(j)\}_{j=1}^{|\mathbb{A}|-1}. \tag{5.46}$$
It is noted that the maximum number of iterations possible is $2n_t$, and the size of the neighborhood set $\mathcal{N}(\mathbf{x}^{(t)})$ decreases by $|\mathbb{A}| - 1$ in each iteration.
Multistart RS
Running the above RS algorithm with a random initial vector allows only some parts of the solution space to be explored. Exploring other parts of the solution space can yield better solution vectors. This can be achieved by running the RS algorithm several times (parameterized by $L - 1$, referred to as the number of restarts) such that each time a different part of the solution space is likely to be explored, without increasing the order of complexity. This can be realized by starting the RS algorithm with a different random initial vector each time; this works as follows:
[Figure 5.5 plots. Panel (a): BER vs. number of restarts, L − 1. Panel (b): BER vs. average received SNR for V-BLAST MIMO with 4-QAM.]
Figure 5.5 BER performance of RS algorithm in large V-BLAST MIMO systems with 4-QAM: (a) BER vs number of restarts; (b) BER vs average received SNR.
number of restarts does not yield significant BER improvement. For example, at 8 dB SNR, the chosen values for $L$ are: $L = 16$ for 8 × 8, $L = 24$ for 16 × 16, $L = 32$ for 32 × 32, and $L = 32$ for 64 × 64. From Fig. 5.5(b), it is observed that the RS algorithm achieves almost the same performance as the SD for 16 × 16 MIMO. The RS performance in 32 × 32 and 64 × 64 is also close to the unfaded SISO AWGN performance, which is a lower bound on the ML performance. Through simulations, it has been found that the complexity of the RS algorithm scales as $O(n_t^{1.4})$ [29], which is quite attractive for large MIMO systems.
The performances of the LAS and RS algorithms are close to the optimum in large MIMO systems for 4-QAM modulation. But their performance for higher-order modulation is far from the optimum performance. The issue of improving
5.3 RTS
Tabu search
The basis for the tabu search can be explained as follows. Given a function $\phi(\mathbf{x})$ to be optimized over a set $S$, a tabu search begins the same way as an ordinary local search, proceeding iteratively from one solution to another until a specified stopping criterion is satisfied. Each $\mathbf{x} \in S$ has an associated neighborhood $\mathcal{N}(\mathbf{x}) \subset S$. A tabu search differs from an ordinary local search in that it employs the strategy of modifying $\mathcal{N}(\mathbf{x})$ as the search progresses, effectively replacing it by another neighborhood $\mathcal{N}^*(\mathbf{x})$. This allows the search to avoid the local minima traps encountered in an ordinary local search. The use of special memory structures (e.g., the tabu list) serves to determine $\mathcal{N}^*(\mathbf{x})$, and hence organizes the way in which the solution space is explored. The tabu mechanism described above is one way to determine the solutions admitted to $\mathcal{N}^*(\mathbf{x})$.
In the following sections, tabu search algorithms and their variants that have been adopted for MIMO detection and have been shown to achieve near-optimal performance in large MIMO systems are discussed.
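System model
Consider a V-BLAST MIMO system with $n_t$ transmit and $n_r$ receive antennas. The transmitted symbols take values from a modulation alphabet $\mathbb{A}$ (e.g., $M$-QAM/$M$-PSK). Let $\mathbf{x} \in \mathbb{A}^{n_t}$ denote the transmitted vector. Let $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ denote the channel gain matrix, whose entries are assumed to be iid Gaussian with zero mean and unit variance. The received vector $\mathbf{y}$ is given by
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{5.47}$$
where $\mathbf{n}$ is the noise vector whose entries are modeled as iid $\mathcal{CN}(0, \sigma^2)$. The ML detection rule is given by
$$\hat{\mathbf{x}}_{ML} = \operatorname*{argmin}_{\mathbf{x} \in \mathbb{A}^{n_t}} \phi(\mathbf{x}), \tag{5.48}$$
where
$$\phi(\mathbf{x}) = \mathbf{x}^H\mathbf{H}^H\mathbf{H}\mathbf{x} - 2\Re(\mathbf{y}^H\mathbf{H}\mathbf{x}). \tag{5.49}$$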
Neighborhood definition
Symbol neighborhood. Let $M$ denote the cardinality of the modulation alphabet $\mathbb{A} = \{a_1, a_2, \ldots, a_M\}$. Define a set $\mathcal{N}(a_q)$, $q \in \{1, \ldots, M\}$, as a fixed subset of $\mathbb{A} \setminus a_q$, which is referred to as the symbol-neighborhood of $a_q$. Choose the cardinality of this set to be the same for all $a_q$, $q = 1, \ldots, M$; i.e., take $|\mathcal{N}(a_q)| = N$, $\forall q$. Note that the maximum and minimum values of $N$ are $M - 1$ and 1, respectively. Let the symbol-neighborhood definition be based on Euclidean distance, i.e., for a given symbol, those $N$ symbols which are the nearest will form its neighborhood; the nearest symbol will be the first neighbor, the next nearest symbol will be the second neighbor, and so on. An example of a symbol-neighborhood with $N = 2$ for the alphabet shown in Fig. 5.6 is $\mathcal{N}(a_1) = \{a_2, a_3\}$, $\mathcal{N}(a_2) = \{a_1, a_3\}$, $\mathcal{N}(a_3) = \{a_2, a_4\}$, $\mathcal{N}(a_4) = \{a_3, a_2\}$. Likewise, for $N = 3$, the symbol-neighborhood is $\mathcal{N}(a_1) = \{a_2, a_3, a_4\}$, $\mathcal{N}(a_2) = \{a_1, a_3, a_4\}$, $\mathcal{N}(a_3) = \{a_2, a_4, a_1\}$, $\mathcal{N}(a_4) = \{a_3, a_2, a_1\}$.
[Figure 5.6: the alphabet symbols a1, a2, a3, a4.]
{1, 1}. Let $w_v(a_q)$, $v = 1, \ldots, N$, denote the $v$th element in $\mathcal{N}(a_q)$. Call $w_v(a_q)$ the $v$th symbol-neighbor of $a_q$.
Vector neighborhood. Let $\mathbf{x}^{(m)} = [x_1^{(m)} \ x_2^{(m)} \ \cdots \ x_{n_t}^{(m)}]$ denote the data vector belonging to the solution space in the $m$th iteration, where $x_i^{(m)} \in \mathbb{A}$. The vector
$$\mathbf{z}^{(m)}(u, v) = \big[z_1^{(m)}(u, v) \ z_2^{(m)}(u, v) \ \cdots \ z_{n_t}^{(m)}(u, v)\big], \tag{5.50}$$
is referred to as the $(u, v)$th vector-neighbor or simply the $(u, v)$th neighbor of $\mathbf{x}^{(m)}$, $u = 1, \ldots, n_t$, $v = 1, \ldots, N$, if (i) $\mathbf{x}^{(m)}$ differs from $\mathbf{z}^{(m)}(u, v)$ in the $u$th coordinate only, and (ii) the $u$th element of $\mathbf{z}^{(m)}(u, v)$ is the $v$th symbol-neighbor of $x_u^{(m)}$. That is,
$$z_i^{(m)}(u, v) = \begin{cases} x_i^{(m)} & \text{for } i \neq u, \\ w_v(x_u^{(m)}) & \text{for } i = u. \end{cases} \tag{5.51}$$
There will be $n_t N$ vectors which differ from a given vector in the solution space in only one coordinate. These $n_t N$ vectors form the neighborhood of the given vector. As an example, for the symbol-neighborhood definition with $N = 2$ in Fig. 5.6, the vector-neighbors of a three-element vector $\mathbf{x} = [a_3 \ a_2 \ a_4]^T$ are shown in Fig. 5.7.
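[Figure 5.7: vector-neighbors of $\mathbf{x} = [a_3 \ a_2 \ a_4]^T$: $[a_2 \ a_2 \ a_4]^T$, $[a_4 \ a_2 \ a_4]^T$, $[a_3 \ a_1 \ a_4]^T$, $[a_3 \ a_3 \ a_4]^T$, $[a_3 \ a_2 \ a_3]^T$, $[a_3 \ a_2 \ a_2]^T$.]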
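The symbol-neighborhood and vector-neighborhood definitions can be reproduced with a short sketch. It assumes a real-valued 4-PAM alphabet {-3, -1, +1, +3} for a1 to a4 and breaks distance ties in favor of the lower-indexed symbol; both are assumptions chosen because they reproduce the example sets listed in the text. Symbols are referred to by 0-based index (0 for a1, 3 for a4), and the function names are hypothetical.

```python
def symbol_neighbors(alphabet, N):
    """For each symbol index q, list the indices of its N nearest symbols.

    Neighbors are ordered by Euclidean distance; ties go to the lower index
    (an assumed tie-breaking rule).
    """
    nbrs = {}
    for q, a in enumerate(alphabet):
        others = [p for p in range(len(alphabet)) if p != q]
        others.sort(key=lambda p: (abs(alphabet[p] - a), p))
        nbrs[q] = others[:N]
    return nbrs

def vector_neighbors(x_idx, nbrs):
    """List all vectors that differ from x_idx in exactly one coordinate,
    where the changed coordinate takes a symbol-neighbor of the original."""
    out = []
    for u, q in enumerate(x_idx):
        for p in nbrs[q]:
            z = list(x_idx)
            z[u] = p
            out.append(z)
    return out
```

For a vector with nt coordinates this yields nt·N neighbors, so a three-element vector with N = 2 has six.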
Tabu matrix
A tabu matrix $\mathbf{T}$ of size $n_t M \times N$ is the matrix whose entries, $t_{r,s}$, $r = 1, \ldots, n_t M$ and $s = 1, \ldots, N$, denote the tabu values of moves. For each coordinate of the solution vector (there are $n_t$ coordinates), there are $M$ rows in $\mathbf{T}$, where each row corresponds to one symbol in the modulation alphabet $\mathbb{A}$; the indices of the rows corresponding to the $u$th coordinate are from $(u - 1)M + 1$ to $uM$, $u \in \{1, \ldots, n_t\}$ (see Fig. 5.8). The $N$ columns of the $\mathbf{T}$ matrix correspond to the $N$ symbol-neighbors of the symbol corresponding to each row. In other words, the
Search algorithm
Let $\mathbf{g}^{(m)}$ be the vector which has the least ML cost found till the $m$th iteration of the algorithm. Let $l_{rep}$ be the average length (in number of iterations) between two successive occurrences of a solution vector (repetitions). The tabu period, $P$, a dynamic non-negative integer parameter, is defined as follows: if a move is marked as tabu in an iteration, it will remain as tabu for $P$ subsequent iterations unless the move results in a better solution. A binary flag, $lflag \in \{0, 1\}$, is used to indicate whether the algorithm has reached a local minimum in a given iteration or not; this flag is used in the evaluation of the stopping criterion of the algorithm. The algorithm starts with an initial solution vector $\mathbf{x}^{(0)}$. Set $\mathbf{g}^{(0)} = \mathbf{x}^{(0)}$, $l_{rep} = 0$, and $P = P_0$. All entries of the tabu matrix are set to zero. The following steps (1) to (3) are performed in each iteration. Consider the $m$th iteration in the algorithm, $m \ge 0$.
Step (1)
Initialize $lflag = 0$. The ML costs of the $n_t N$ neighbors of $\mathbf{x}^{(m)}$, $\phi(\mathbf{z}^{(m)}(u, v))$, $u = 1, \ldots, n_t$, $v = 1, \ldots, N$, are computed. Let
$$(u_1, v_1) = \operatorname*{argmin}_{u,v} \phi(\mathbf{z}^{(m)}(u, v)). \tag{5.52}$$
The move $(u_1, v_1)$ is accepted if any one of the following two conditions is satisfied:
and check for acceptance of the move $(u_2, v_2)$. If this also cannot be accepted, repeat the procedure for $(u_3, v_3)$, and so on. If all the $n_t N$ moves are tabu, then all the tabu matrix entries are decremented by the minimum value in the tabu matrix; this goes on till one of the moves becomes acceptable. Let $(u', v')$ be the index of the neighbor with the minimum cost for which the move is permitted. Make
$$\mathbf{x}^{(m+1)} = \mathbf{z}^{(m)}(u', v'). \tag{5.56}$$
The variables $q'$, $q''$, $v''$
The new solution vector obtained from Step (1) is checked for repetition. For the
linear vector channel model in (5.47), repetition can be checked by comparing
the ML costs of the solutions in the previous iterations. If there is a repetition,
the length of the repetition from the previous occurrence is found, the average
length, lrep , is updated, and the tabu period P is modied as P = P + 1. If the
number of iterations elapsed since the last change of the value of P exceeds lrep ,
for a xed > 0, make P = max(1, P 1). After a move (u
, v
) is accepted, if
(x(m+1) ) < (g(m) ), make
T((u
1)M + q
, v
) = T((u
1)M + q
, v
) = 0,
g(m+1) = x(m+1) ,
else
T((u
1)M + q
, v
) = T((u
1)M + q
, v
) = P + 1,
lf lag = 1, g(m+1) = g(m) .
It is noted that this Step of the algorithm implements the reactive part in the
search, by dynamically changing P .
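The reactive adjustment of the tabu period can be sketched as a small state update. This is a simplified illustration, not the full RTS step: it assumes the caller has already detected whether the new solution is a repetition and maintains the average repetition length. The names `update_tabu_period`, `iters_since_change`, and `beta` (the positive constant that scales the average repetition length) are hypothetical, and treating every adjustment, including one that leaves the period at its floor of 1, as a change that resets the counter is an assumption.

```python
def update_tabu_period(P, repeated, iters_since_change, l_rep, beta):
    """Return the new tabu period and the updated iteration counter.

    P                  : current tabu period
    repeated           : True if the new solution vector was seen before
    iters_since_change : iterations elapsed since P was last adjusted
    l_rep              : current average repetition length
    beta               : positive constant scaling l_rep
    """
    if repeated:
        # Repetition detected: lengthen the tabu period.
        return P + 1, 0
    if iters_since_change > beta * l_rep:
        # No repetition for a while: shorten the period, but not below 1.
        return max(1, P - 1), 0
    # Otherwise leave P unchanged and keep counting.
    return P, iters_since_change + 1
```

Lengthening the period on repetitions pushes the search away from cycles, while shortening it after a quiet stretch lets the search intensify around good regions again.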
[Figure 5.9: flowchart of the RTS algorithm. Box labels: compute the initial solution vector and make it the current solution; find the neighborhood of the current solution vector; find the best vector in the neighborhood; is any move non-tabu?; exclude the vector from the neighborhood; make the oldest move performed non-tabu; stopping criterion satisfied?]
Step (3)
Update the entries of the tabu matrix as
Stopping criterion
The search algorithm described above is stopped if the maximum number of iterations, max_iter, is reached. If the current solution is a local minimum ($lflag = 1$) and the total number of repetitions of solutions is greater than max_rep, the algorithm is also stopped. The solution of the algorithm is then the vector with the least ML cost which has been found before the algorithm was stopped.
This completes the description of the RTS algorithm. A flowchart of the algorithm described above is shown in Fig. 5.9. The algorithm is parameterized by the parameters listed in Table 5.1.
Table 5.1
Parameter    Function
max_iter     Maximum number of iterations allowed. Used in the stopping criterion.
max_rep      Maximum number of repetitions allowed. Used in the stopping criterion.
$P_0$        Initial value of the tabu period.
$\beta$      A positive constant. Used in adaptation of the tabu period.
[Figure 5.10 plot: BER vs. maximum number of iterations, max_iter, for V-BLAST with 4-QAM at SNR = 10 dB and MMSE initial vector. Curves: 8 × 8, 16 × 16, 32 × 32, and 64 × 64 V-BLAST, and SISO AWGN.]
Figure 5.10 Convergence behavior of the RTS algorithm in large MIMO systems.
Figure 5.11 shows the BER performance of the RTS algorithm in comparison with that of the LAS algorithm in 16 × 16, 32 × 32, and 64 × 64 V-BLAST MIMO systems with 4-QAM. It is seen that for the number of dimensions (i.e., $n_t$) considered, RTS performs better than LAS; e.g., LAS requires 128 real dimensions (i.e., 64 × 64 V-BLAST with 4-QAM) to achieve a performance within 1.8 dB of the unfaded SISO AWGN performance at $10^{-3}$ BER, whereas the performance of RTS is closer to that of SISO AWGN with just 32 real dimensions (i.e., 16 × 16 V-BLAST with 4-QAM). Also, in 64 × 64 V-BLAST MIMO, RTS achieves $10^{-3}$ BER at an SNR that is just 0.4 dB away from SISO AWGN performance. RTS is able to achieve this better performance because, while the basic neighborhood definitions are similar in both RTS and LAS, the inherent escape strategy in RTS allows it to move out of local minima and move towards better solutions. Because of the escape strategy, RTS incurs some extra complexity compared to LAS.
Complexity
The total complexity of the RTS algorithm comprises three components, namely, (i) computation of the initial solution vector $\mathbf{x}^{(0)}$, (ii) computation of $\mathbf{H}^H\mathbf{H}$, and (iii) the RTS operation. The MMSE initial solution vector can be computed in $O(n_t^2 n_r)$ complexity, i.e., in $O(n_t n_r)$ per-symbol complexity since there are $n_t$ symbols per channel use. Likewise, the computation of $\mathbf{H}^H\mathbf{H}$ can be done in $O(n_t n_r)$ per-symbol complexity. Since computations of $\mathbf{x}^{(0)}$ and $\mathbf{H}^H\mathbf{H}$ are needed in both RTS and LAS, complexity components (i) and (ii) will be the same for both these algorithms. Further, while complexity components (i) and (ii) are deterministic, component (iii), which is due to the search part alone, is random and its average complexity is obtained from simulations. Figure 5.12 shows the complexity plots for the search part alone (i.e., component (iii)) as well as the overall complexity plots of the RTS and LAS algorithms for V-BLAST MIMO with $n_t = n_r$ and 4-QAM at a BER of $10^{-2}$. Figure 5.12 shows that the RTS search part has a higher complexity than the LAS search part. This is expected, because the RTS can escape from a local minimum and look for better solutions, whereas LAS settles in the first local minimum itself. However, since the overall complexity is dominated by the computation of $\mathbf{H}^H\mathbf{H}$ and $\mathbf{x}^{(0)}$, the difference in overall complexity between RTS and LAS is not high. This low complexity attribute of the RTS algorithm is attractive for large MIMO signal detection.
[Figure 5.11 plot: BER for V-BLAST, 4-QAM, MMSE initial vector.]
Figure 5.11 BER performance of RTS and LAS algorithms in 16 × 16, 32 × 32, and 64 × 64 V-BLAST MIMO with 4-QAM.
[Figure 5.12 plot: log2(average no. of real operations) vs. log2(nt) for V-BLAST, 4-QAM, BER = 0.01. Curves: RTS (overall), RTS (search part), LAS (overall), LAS (search part), with reference curves a1·nt^3 and a2·nt^2.]
[Figure plot: BER vs. average received SNR (dB); plot annotations: 16.5 dB, 0.5 dB, 7.5 dB.]
in large MIMO systems with higher-order QAM. Layered RTS (LTS) [25] and random-restart RTS (R3TS) [26] algorithms, which use the RTS algorithm as the basic core, are treated in the next two sections.
5.3.4 LTS
The LTS algorithm adopts a strategy of detecting symbols in a layered manner, where the RTS algorithm (presented in the previous section) is applied in each layer. In each layer, RTS is used to detect a subvector of the transmitted symbol vector. The subvector size is increased from one layer to the next layer. In addition, the detected subvector in a given layer is used to form the initializing solution for the search in the next layer. The layered structure was inspired by previously suggested approaches based on successive cancelation (or decision feedback) systems, along with the use of QR decomposition for detection and detection ordering. However, unlike cancelation, the LTS approach can update the solution vector for all symbols under consideration within the specific layer.
Let $\mathbf{U}$ denote the upper triangular matrix obtained from the QR decomposition of $\mathbf{H}$. Then, the objective equivalent to (5.48) will be to find the transmitted vector $\mathbf{x}$ which minimizes $\|\mathbf{U}(\mathbf{x} - \tilde{\mathbf{x}})\|^2$, where
$$\tilde{\mathbf{x}} = \mathbf{H}^{\dagger}\mathbf{y}, \tag{5.58}$$
and $\mathbf{H}^{\dagger}$ is the Moore–Penrose pseudo-inverse of $\mathbf{H}$. Let $u_{ij}$ denote the element in the $i$th row and $j$th column of the $\mathbf{U}$ matrix, and $\tilde{x}_i$ denote the $i$th element of the vector $\tilde{\mathbf{x}}$.
The algorithm processes one layer at a time. It starts with the $n_t$th layer first. In the $k$th layer, $k = n_t, (n_t - 1), (n_t - 2), \ldots, 1$, the algorithm detects the $(n_t - k + 1)$-sized subvector $[x_k, x_{k+1}, \ldots, x_{n_t}]$. The symbols of this subvector are detected jointly because they interfere with each other due to the structure of the $\mathbf{U}$ matrix. For example, since $\mathbf{U}$ is upper triangular, there will be no interference to the symbol $x_{n_t}$ in the $n_t$th layer. In the $(n_t - 1)$th layer, there will be one interferer, $x_{n_t}$. In the $(n_t - 2)$th layer there will be two interferers, $x_{n_t - 1}$ and $x_{n_t}$, and so on in the subsequent layers. The joint detection method employed in each layer is the RTS algorithm described in the previous section. The complexity can be reduced by skipping the joint detection search in a layer if a simple cancelation of interference due to the already detected symbols in the previous layer results in a good quality output. The LTS algorithm based on the above principles is described below.
Let $\bar{\mathbf{x}}$ be the quantized version of $\tilde{\mathbf{x}}$, i.e., each element in $\tilde{\mathbf{x}}$ is rounded off to its nearest symbol in the alphabet to get $\bar{\mathbf{x}}$, so that $\bar{\mathbf{x}} \in \mathbb{A}^{n_t}$. Let $d_{min}$ be the minimum Euclidean distance between any two symbols in the alphabet $\mathbb{A}$. The steps performed in the $k$th layer, $k = n_t, (n_t - 1), \ldots, 1$, are as follows:
Step (1)
Calculate
$$r_k = \tilde{x}_k - \sum_{l=k+1}^{n_t} \frac{u_{kl}}{u_{kk}}(\hat{x}_l - \tilde{x}_l), \tag{5.59}$$
which is a cancelation operation that removes the interference due to the symbols detected in the previous layer (i.e., the $\hat{x}_l$s). Note that for $k = n_t$ (i.e., for the $n_t$th layer, which is processed first), there will be no second term on the right-hand side in (5.59).
Step (2)
Find the symbol in the alphabet $\mathbb{A}$ which is closest to $r_k$ in Euclidean distance. Let this symbol be $a_q$.
(i) If $|r_k - a_q| < \theta d_{min}$, $0 < \theta \le 0.5$, then $\hat{x}_k = a_q$ ($\hat{x}_k$ is the detected symbol corresponding to $x_k$). Make $k = k - 1$ and return to Step (1). Execution of this part of the step essentially skips the joint detection using RTS. The nearness of $r_k$ to an element in $\mathbb{A}$ to within $\theta d_{min}$, $0 < \theta \le 0.5$, is used as the criterion to decide whether to carry out or skip RTS in layer $k$. Figure 5.14 shows the BER performance and complexity (in average number of real operations per symbol) of the LTS algorithm as a function of $\theta$ for 16 × 16 V-BLAST MIMO with 16-QAM at an SNR of 19 dB. It can be seen that the BER improves as $\theta$ is decreased from 0.5 towards 0. This is because a smaller $\theta$ means an increased chance of carrying out joint detection using RTS in Step (2)(ii), which results in improved performance at the cost of increased complexity.
(ii) If $|r_k - a_q| \ge \theta d_{min}$, then set $\hat{x}_k = \bar{x}_k$. Run the RTS algorithm in Section 5.3.1, by replacing $\mathbf{x}^{(0)}$ with $\check{\mathbf{x}}^{(0)}$, $\mathbf{H}$ with $\check{\mathbf{H}}$, $\mathbf{y}$ with $\check{\mathbf{y}}$, where $\check{\mathbf{x}}^{(0)}$, $\check{\mathbf{H}}$, $\check{\mathbf{y}}$ for the $k$th layer are taken as
The output vector from the RTS algorithm is the updated $[\hat{x}_k, \hat{x}_{k+1}, \cdots, \hat{x}_{n_t}]$ subvector. Make $k = k - 1$ and return to Step (1).
After processing all the $n_t$ layers, the vector $\hat{\mathbf{x}} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{n_t}]$ is declared to be the final detected data vector. Note that in the RTS algorithm, RTS is carried out once on the full $n_t \times n_r$ system model, whereas in the LTS algorithm RTS is performed multiple times, once on each layer depending on the effectiveness of the interference cancelation performed in that layer as per (5.59). The dimension of the problem increases by 1 from one layer to the next.
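The per-layer cancelation and the decision to skip or run the joint search can be sketched as below. This is a simplified illustration for a real-valued alphabet and 0-based layer indices; it does not include the RTS search itself. The names `cancel_interference`, `nearest_symbol`, `can_skip_search`, and `threshold_factor` (the fraction of the minimum distance used in the skip test) are hypothetical.

```python
def cancel_interference(U, x_tilde, x_hat, k):
    """Cancel interference from already detected symbols in layer k (0-based).

    U       : upper triangular matrix as a list of rows
    x_tilde : unquantized pseudo-inverse solution
    x_hat   : symbols detected so far (entries l > k are used)
    """
    s = sum(U[k][l] / U[k][k] * (x_hat[l] - x_tilde[l])
            for l in range(k + 1, len(x_tilde)))
    return x_tilde[k] - s

def nearest_symbol(r, alphabet):
    """Return the alphabet symbol closest to r."""
    return min(alphabet, key=lambda a: abs(r - a))

def can_skip_search(r, alphabet, threshold_factor, d_min):
    """Return (skip, symbol): skip is True when r lies close enough to a
    symbol that the joint search in this layer can be bypassed."""
    a = nearest_symbol(r, alphabet)
    return abs(r - a) < threshold_factor * d_min, a
```

For the last layer the sum is empty, so the cancelation output equals the unquantized estimate itself, matching the remark that no interference term is present there.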
[Figure 5.14 plot: BER and complexity of LTS vs. the threshold parameter, with the horizontal axis running from 0.5 down to 0.]
Figure 5.14 BER performance and complexity of LTS algorithm for 16 × 16 V-BLAST MIMO with 16-QAM.
[Figure plot: BER vs. average received SNR (dB). Curves: LTS for 4 × 4, 8 × 8, and 32 × 32 V-BLAST with 16-QAM; ZF-SIC and MMSE-SIC for 8 × 8 V-BLAST with 16-QAM; LTS for 32 × 32 V-BLAST with 8-PSK; unfaded SISO AWGN with 16-QAM.]
[Figure plot: BER vs. average received SNR (dB) for 16-QAM and 64-QAM. Curves: TS without layering; LTS, no ordering; LTS, with ordering.]
SNR of 24 dB to achieve $10^{-3}$ BER for 16-QAM, whereas the LTS algorithm with ordering achieves the same BER at 19 dB, which amounts to an SNR gain of 5 dB. For 64-QAM, this SNR gain is even higher.
Figure 5.17 shows a complexity comparison between the LTS and RTS algorithms, where the average number of real operations as a function of $n_t = n_r$ for 16-QAM at $10^{-2}$ BER is plotted. Though the order of complexity for RTS is less, the constant is high; at $n_t = n_r = 16$ the LTS with ordering has a complexity similar to that of RTS. Also, LTS without ordering has about the same complexity as RTS for $n_t = n_r = 32$; LTS without ordering, however, achieves a better performance than that of RTS.
[Figure 5.17 plot: average number of real operations vs. nt, for nt = 4, 8, 16, 32, 64.]
5.3.5 R3TS
Using multiple random restarts is an efficient technique for achieving improved
search performance. The idea is to run the basic search algorithm multiple times,
each time with a random initial vector, and choose the best among the resulting
solution vectors. By doing so, opportunities to search different parts of the solution
space are created, leading to good solutions. A good strategy to limit the
number of restarts is essential to limit the complexity.
In R3TS, the RTS algorithm is used as the basic search algorithm. Three
parameters (MAX, $\Theta$, $p$) are defined for the purpose of limiting the number of
searches. The R3TS algorithm works as follows.
Step (1) Choose a random initial vector. Run the RTS algorithm using this
initial vector and obtain the corresponding solution vector.
Step (2) Check if MAX number of RTS searches have been done. If yes, go
to Step (5); else go to Step (3).
Step (3) If the ML cost of the solution vector from Step (1) is less than $\Theta$,
then output the solution vector from Step (1) as the final solution vector and
stop; else go to Step (4).
Step (4) Let $K$ denote the number of searches done so far. Let $L$ denote the
number of distinct solution vectors from Step (1) so far. If $L/K \le p$, go to
Step (5); else go to Step (1).
Step (5) Output the best (in terms of ML cost) among the solution vectors
obtained so far and stop.
The choice of the value of $\Theta$ is made as follows. If the solution vector is the same
as the transmitted vector, then the ML cost is $\|\mathbf{n}\|^2$, which has a (scaled)
chi-square distribution with mean $n_r\sigma^2$ and variance $n_r\sigma^4$. The value of $\Theta$ can
be taken empirically to be $n_r\sigma^2 + 2\sqrt{n_r\sigma^4}$, i.e., the value is taken to be the
mean plus twice the standard deviation of the ML cost variable corresponding to
error-free detection. The threshold comparison in Step (3) reduces the number
of searches and hence the complexity. Also, the motivation to do Step (4) is to
reduce complexity in realizations where $\|\mathbf{n}\|^2$ happens to be greater than $\Theta$. The
parameter values $p = 0.2$ and MAX = 50 are found to result in good performance.
A flowchart of the R3TS algorithm is shown in Fig. 5.18.
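The restart logic of Steps (1) to (5) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `r3ts`, `ml_cost`, `greedy_flip_search`, `base_search`, and `random_init` are hypothetical, the symbols are assumed to be real-valued ±1, and a simple greedy one-symbol-flip search stands in for the RTS algorithm of Section 5.3.1.

```python
import numpy as np

def ml_cost(y, H, x):
    """ML cost ||y - Hx||^2."""
    r = y - H @ x
    return float(np.real(np.vdot(r, r)))

def greedy_flip_search(y, H, x0):
    """Stand-in for RTS (assumption): flip one +/-1 symbol at a time while the cost drops."""
    x = np.array(x0, dtype=float)
    cost = ml_cost(y, H, x)
    improved = True
    while improved:
        improved = False
        for i in range(x.size):
            x[i] = -x[i]
            c = ml_cost(y, H, x)
            if c < cost:
                cost, improved = c, True
            else:
                x[i] = -x[i]          # undo the flip
    return x

def r3ts(y, H, sigma2, base_search, random_init, max_restarts=50, p=0.2):
    """Random-restart wrapper following Steps (1)-(5)."""
    n_r = H.shape[0]
    # Threshold: mean plus twice the standard deviation of ||n||^2.
    theta = n_r * sigma2 + 2.0 * np.sqrt(n_r * sigma2 ** 2)
    best_x, best_cost = None, np.inf
    distinct = set()
    for k in range(1, max_restarts + 1):
        x = base_search(y, H, random_init())        # Step (1)
        cost = ml_cost(y, H, x)
        distinct.add(tuple(np.asarray(x).tolist()))
        if cost < best_cost:
            best_x, best_cost = x, cost
        if k >= max_restarts:                       # Step (2)
            break
        if cost < theta:                            # Step (3)
            return x
        if len(distinct) / k <= p:                  # Step (4)
            break
    return best_x                                   # Step (5)
```

A call such as `r3ts(y, H, sigma2, greedy_flip_search, lambda: rng.choice([-1.0, 1.0], size=H.shape[1]))` would run the sketch with random ±1 initial vectors; swapping in an actual RTS routine only requires passing a different `base_search`.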
[Figure: BER versus average received SNR (dB) in 16 × 16 V-BLAST MIMO for R3TS, conventional RTS, and SD with 4-QAM, 16-QAM, and 64-QAM.]
in the simulations: max rep = 75, max iter = 300, = 0.1, P0 = 2 for
4-QAM, max rep = 250, max iter = 1000, = 0.01, P0 = 2 for 16-QAM,
and max rep = 1000, max iter = 3000, = 0.01, P0 = 2 for 64-QAM. MMSE
initial vector is used in RTS.
In Fig. 5.19, the R3TS performance is compared with the performances of
RTS and the SD. Comparison of R3TS is made with the SD only for 16 × 16
MIMO, and not for 32 × 32 and 64 × 64 MIMO, because of the prohibitively
high complexity of the SD in such large dimensions. So, for 32 × 32 and 64 × 64
MIMO, comparison is made between R3TS performance and that of the unfaded
SISO AWGN, which is a lower bound on the ML performance.
Figure 5.19 shows that the performance of the R3TS algorithm almost matches
that of SD in 16 × 16 MIMO for all the modulations considered (4-, 16-, 64-QAM).
It achieves this excellent performance (i.e., almost SD performance) at a much
lower complexity than that of SD. Also, the performance improvement achieved
by R3TS compared to RTS is quite significant for 16-QAM and 64-QAM (e.g.,
for 64-QAM in 16 × 16 MIMO, R3TS outperforms conventional RTS
by about 5 dB at $10^{-3}$ BER). Figures 5.20 and 5.21 show that R3TS performs
very well in 32 × 32 and 64 × 64 MIMO as well.
Table 5.2 presents a performance (in terms of SNR required to achieve $10^{-2}$
BER) and complexity (in terms of number of real operations at $10^{-2}$ BER)
comparison between R3TS and RTS in 32 × 32 and 64 × 64 MIMO with 16- and
64-QAM. At $10^{-2}$ BER, R3TS outperforms RTS by about 5.3 dB in 32 × 32
MIMO with 64-QAM, and by about 6.6 dB in 64 × 64 MIMO with 64-QAM, at
the cost of the additional complexity incurred by multiple restarts.
[Figure: BER versus average received SNR (dB) in 32 × 32 V-BLAST MIMO for R3TS, conventional RTS, and SISO AWGN with 4-QAM, 16-QAM, and 64-QAM.]
[Figure: BER versus average received SNR (dB) in 64 × 64 V-BLAST MIMO for R3TS, conventional RTS, and SISO AWGN with 4-QAM, 16-QAM, and 64-QAM.]
Table 5.2. Complexity and performance comparison between R3TS and RTS algorithms
in 16 × 16, 32 × 32, 64 × 64 V-BLAST MIMO with 16-QAM and 64-QAM
Figure 5.22 Neighborhood for bounding ML performance using RTS algorithm: (a)
$e_{TS} \ge n + 1$; (b) $1 \le e_{TS} \le n$; (c) $e_{TS} = 0$.
$x_{ML}$ is lower bounded by $n + 1$, i.e., $e_{TS}, e_{ML} \ge n + 1$ (Fig. 5.22(a)). So, in the
simulations, if $e_{TS} \ge n + 1$ in a given realization, then take $e_{ML}$ as $n + 1$, which is
a lower bound on the number of symbol errors in the ML vector. On the other
hand, if $x_{TS} \in \mathcal{N}_x$, which implies that $1 \le e_{TS} \le n$, then two cases
are possible: (1) $x_{TS}$ is the ML vector, and (2) $x_{TS}$ is not the ML vector. In
case (1), $e_{ML} = e_{TS}$; in case (2), $x_{ML}$ lies outside $\mathcal{N}_x$, so that
$e_{ML} \ge n + 1$ (Fig. 5.22(b)). So, in the simulations, if $1 \le e_{TS} \le n$, then take
$e_{ML} = e_{TS}$ as a lower bound.
Lastly, if $e_{TS} = 0$, then $x_{TS} = x$, which may or may
not be the ML vector (Fig. 5.22(c)); in such a realization, take $e_{ML} = 0$ as a
lower bound. In summary, in the simulations, $e_{ML}$ is taken as $\min(e_{TS}, n + 1)$,
which results in a lower bound on the ML symbol error performance. Since the
number of symbol errors is a lower bound on the number of bit errors, it is a bit
error bound as well.
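The counting rule described above reduces to a single clipping operation per realization. The short Python sketch below illustrates it; the function name `ml_symbol_error_lower_bound` is hypothetical, and the neighborhood is assumed to contain all vectors that differ from the transmitted vector in at most $n$ symbols.

```python
import numpy as np

def ml_symbol_error_lower_bound(x_true, x_ts, n):
    """Lower bound on the number of symbol errors in the ML vector.

    x_true: transmitted vector, x_ts: RTS output, n: neighborhood radius.
    If the RTS output has more than n errors, only n + 1 errors are counted.
    """
    e_ts = int(np.count_nonzero(np.asarray(x_true) != np.asarray(x_ts)))
    return min(e_ts, n + 1)
```

Averaging this count over many channel and noise realizations, and dividing by the number of transmitted symbols, would give the simulated lower-bound curve for a chosen $n$.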
Figure 5.23 ML lower bounds using the RTS algorithm for 16 × 16 V-BLAST MIMO
with 4-QAM.
[Figure: BER versus average received SNR (dB) for 16 × 16 V-BLAST MIMO, including the ML lower bound for 4-QAM.]
performance. It can be noted that, complexity-wise, like the lower bound, the
approximate ML performance is also easily obtained for large $n_t$. In Fig. 5.24,
the ML lower bound for $n = 1$, the approximate ML performance, and the
SD performance for 16 × 16 V-BLAST MIMO with 4-, 16-, and 64-QAM are
compared. The approximate ML performance is quite close to the actual ML
performance obtained by sphere decoding even at low SNRs.
Algorithms based on probabilistic data association (PDA) can be employed for low complexity MIMO detection.
PDA was originally developed for target tracking in remote sensing applications
like radar, sonar, etc. [1]–[3]. In these applications, tracking a target (potentially
a non-cooperative hostile target) or multiple targets is of interest. Signals
from such targets of interest can be weak. To detect such weak signals, the detection
threshold has to be lowered, which may lead to the detection of other
background signals (e.g., signals from other unwanted targets) and sensor noise,
yielding spurious measurements which are clutter or false-alarm originated. Target
tracking loss may result when such spurious measurements are used in a
tracking filter (e.g., a Kalman filter). The selection of which measurements to
use to update the estimate of the target state for tracking is known as data association.
Data association uncertainty arises from ambiguity as to which measurements
are appropriate for use in the tracking filter. Tracking of targets of interest
in the presence of data association uncertainty has been studied extensively.
The goal is to obtain accurate estimates of the target state and the associated
uncertainty.
Optimal estimation of the target state involves recursive computation of the
conditional probability density function (pdf) of the state, given all the information
available, namely, (i) the prior information about the initial state, (ii)
the intervening known inputs, and (iii) the set of measurements up to that
time. The conditional pdf of the state in optimal estimation has a number of
mixture terms that increases exponentially in time. So several practical suboptimal
target state estimation algorithms have been proposed. PDA is one
such algorithm. Other algorithms include the particle filter, the multiple hypothesis
tracker, etc.
A key feature in the PDA approach is that it approximates the conditional pdf
of the target state at every stage as a single Gaussian with moments matched
to the mixture. This repeated conversion of a multimodal Gaussian mixture
probability structure to a single Gaussian with matched mean and covariance is
the source of PDA's suboptimality. However, this "Gaussian forcing" has been
shown to be attractive in reducing complexity and effective in achieving good
performance in several successful real-world systems that use PDA.
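The moment-matching step behind Gaussian forcing can be illustrated with a short Python sketch. It collapses a Gaussian mixture into a single Gaussian with the same mean and covariance; the function name `collapse_gaussian_mixture` and the array layout are assumptions made for this illustration, not part of any PDA implementation described in the text.

```python
import numpy as np

def collapse_gaussian_mixture(weights, means, covs):
    """Return the mean and covariance of a Gaussian mixture.

    weights: (K,), means: (K, d), covs: (K, d, d).
    The matched covariance is the weighted average of the component covariances
    plus the weighted spread of the component means around the overall mean.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = np.asarray(means, dtype=float)
    C = np.asarray(covs, dtype=float)
    mean = w @ mu
    diff = mu - mean
    cov = np.einsum("k,kij->ij", w, C) + np.einsum("k,ki,kj->ij", w, diff, diff)
    return mean, cov
```

For two equally weighted scalar components with means −1 and +1 and variance 0.25 each, the matched Gaussian has mean 0 and variance 0.25 + 1 = 1.25: the bimodal shape is lost, which is the source of suboptimality mentioned above.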
6.1 PDA in communication problems
Because of its broad appeal in several application domains, PDA has been put
to good use in solving several communication problems as well. For example,
the PDA approach has been successfully adopted to carry out signal detection,
equalization, and channel estimation/tracking in a variety of communication
systems. PDA is a reduced complexity alternative to the a posteriori probability
(APP) detector/equalizer.
The early and prominent adoption of the PDA approach in communication
problems happened in MUD in CDMA [4]–[6]. In CDMA, the basic premise
of PDA adoption is the approximation of the inter-user interference (IUI) as
Gaussian with an appropriately modified covariance matrix, and the iterative
update of the probability associated with each user signal. While Gaussian forcing
occurs once per scan without any revisit in tracking applications, in CDMA
Gaussian forcing is done for each user and there is iteration. PDA uses a soft
IUI cancelation by increasing the covariance of the effective noise (IUI + noise)
based on the uncertainty in the other user signals. Demonstrated advantages of
PDA include: (i) near-optimal performance, and (ii) $O(K^3)$ complexity, where
$K$ is the number of users. Another advantage is that, since the PDA detector
generates APPs directly, it can be applied in coded CDMA for iterative multiuser
decoding with only minor modifications [7]. Other related works in MUD that
used the PDA framework have focused on CDMA with QAM [8], asynchronous
CDMA [9],[10], PIC schemes based on PDA [11], reducing complexity by using
soft-decision values from zero-forced output to generate the initial probability
values for PDA [12], the connection between PDA and soft interference cancelation
[13], and large system analysis [14].
The PDA approach is increasingly being adopted in MIMO signal processing.
One popular application is in MIMO detection. Symbol based PDA detection
of V-BLAST MIMO signals has been reported using a real-valued [15] as well
as a complex-valued [16],[17] formulation of the detection problem. PDA based
detection has been shown to be equivalent to MMSE based iterative soft-decision
interference cancelation (MMSE-ISDIC) [9],[18]. A PDA-aided sphere decoding
approach has been shown to offer a better performance–complexity tradeoff compared
to PDA and the SD [19]. The idea is that the dimensionality reduction
achieved through a single-stage PDA preprocessing can provide significant computational
relief to sphere decoding at a small performance cost. Equalization in
MIMO inter-symbol interference (MIMO-ISI) channels is another popular area
where PDA is increasingly employed [20]–[22]. PDA has also been used as a
frequency domain equalizer to remove inter-symbol interference (ISI) and inter-carrier
interference (ICI) in OFDM without a cyclic prefix [23]. Iterative/joint detection
and decoding (turbo equalization) [24]–[26], channel estimation/tracking
[27],[28], decoding of space-frequency block code (SFBC) signals [29], and decoding
of large dimension NO-STBC signals [30] are other receiver functions
where PDA has been successfully employed. PDA has also been used in emerging
areas like MIMO wireless relaying [31] and BS cooperation [32].
It has been observed that PDA performs increasingly well with increasing problem
size [30],[33],[34]. This is because the quality of the Gaussian approximation
used in PDA improves as the number of dimensions (e.g., number of antennas in
MIMO systems) increases. This attribute of PDA makes it well suited for signal
detection in large MIMO systems. The development of the PDA approach for
MIMO detection and its performance in large MIMO systems are discussed next.
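denote the transmitted bit vector. Defining $\mathbf{c} = [2^0\ 2^1\ \cdots\ 2^{q-1}]$, we can write $\mathbf{x}$
as $\mathbf{x} = (\mathbf{I}_{2k} \otimes \mathbf{c})\,\mathbf{b}$, so that
$$\mathbf{y} = \underbrace{\mathbf{H}(\mathbf{I}_{2k} \otimes \mathbf{c})}_{\triangleq\, \tilde{\mathbf{H}}}\,\mathbf{b} + \mathbf{n}, \tag{6.5}$$
where $\tilde{\mathbf{H}} \in \mathbb{R}^{2n_rp \times 2qk}$ is the effective channel matrix. The MAP estimate of bit
$b_i^{(j)}$ is then given by
$$\hat{b}_i^{(j)} = \arg\max_{a \in \{\pm 1\}} p\left(b_i^{(j)} = a \mid \mathbf{y}, \tilde{\mathbf{H}}\right), \tag{6.6}$$
$$\mathbf{y} = \mathbf{h}_{qi+j}\, b_i^{(j)} + \underbrace{\sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\, b_l^{(m)} + \mathbf{n}}_{\triangleq\, \tilde{\mathbf{n}}}, \tag{6.8}$$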
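Similarly, $\boldsymbol{\mu}_i^{j-}$ can be written as
$$\boldsymbol{\mu}_i^{j-} = -\mathbf{h}_{qi+j} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\left(2p_l^{m+} - 1\right) = \boldsymbol{\mu}_i^{j+} - 2\mathbf{h}_{qi+j}. \tag{6.10}$$
The $2n_rp \times 2n_rp$ covariance matrix, $\mathbf{C}_i^j$, of $\mathbf{y}$ given $b_i^{(j)}$ is
$$\mathbf{C}_i^j = E\left\{\left[\mathbf{n} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\left(b_l^{(m)} - 2p_l^{m+} + 1\right)\right]\left[\mathbf{n} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\left(b_l^{(m)} - 2p_l^{m+} + 1\right)\right]^T\right\}. \tag{6.11}$$
Assuming independence among the constituent bits, $\mathbf{C}_i^j$ in (6.11) can be simplified as
$$\mathbf{C}_i^j = \sigma^2 \mathbf{I}_{2n_rp} + \sum_{l=0}^{2k-1} \sum_{\substack{m=0 \\ m \neq q(i-l)+j}}^{q-1} \mathbf{h}_{ql+m}\mathbf{h}_{ql+m}^T\, 4p_l^{m+}\left(1 - p_l^{m+}\right). \tag{6.12}$$
Using the above mean and covariance, the distribution of $\mathbf{y}$ given $b_i^{(j)} = \pm 1$ can
be written as
$$P\left(\mathbf{y} \mid b_i^{(j)} = \pm 1\right) = \frac{e^{-\left(\mathbf{y} - \boldsymbol{\mu}_i^{j\pm}\right)^T \left(\mathbf{C}_i^j\right)^{-1} \left(\mathbf{y} - \boldsymbol{\mu}_i^{j\pm}\right)}}{(2\pi)^{n_rp}\, \left|\mathbf{C}_i^j\right|^{\frac{1}{2}}}. \tag{6.13}$$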
$\beta_i^{(j)} \ge 1$, and $-1$ otherwise. In coded systems, the $\beta_i^{(j)}$s are fed as soft inputs to
the decoder.
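Computation of $(\mathbf{C}_i^j)^{-1}$
Using (6.16) and (6.12), $\mathbf{C}_i^j$ can be written in terms of $\mathbf{D}$ as
$$\mathbf{C}_i^j = \mathbf{D} - 4p_i^{j+}\left(1 - p_i^{j+}\right) \mathbf{h}_{qi+j}\mathbf{h}_{qi+j}^T. \tag{6.18}$$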
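Because (6.18) expresses the covariance as a rank-one modification of $\mathbf{D}$, its inverse can be obtained from $\mathbf{D}^{-1}$ without a fresh matrix inversion. The Python sketch below illustrates this with the Sherman–Morrison identity; it assumes $\mathbf{D}$ is symmetric and that the modified matrix remains invertible, and the function name `inverse_after_rank_one_removal` is hypothetical.

```python
import numpy as np

def inverse_after_rank_one_removal(D_inv, h, alpha):
    """Return (D - alpha * h h^T)^{-1} given D^{-1}, for symmetric D.

    Sherman-Morrison: (D - a h h^T)^{-1} = D^{-1} + a D^{-1} h h^T D^{-1} / (1 - a h^T D^{-1} h).
    Cost is O(n^2) instead of O(n^3) for a direct inversion.
    """
    Dh = D_inv @ h
    denom = 1.0 - alpha * (h @ Dh)
    return D_inv + alpha * np.outer(Dh, Dh) / denom
```

Here `alpha` plays the role of $4p_i^{j+}(1 - p_i^{j+})$ and `h` the role of $\mathbf{h}_{qi+j}$; the quadratic cost per bit is consistent with the per-bit complexity quoted for the algorithm.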
Overall complexity
The product $\tilde{\mathbf{H}}\tilde{\mathbf{H}}^T$ of the effective channel matrix with its transpose needs to be computed at the start of the algorithm. This requires $O(qkn_r^2p^2)$
complexity. So the computation of the initial $\mathbf{D}^{-1}$ requires $O(qkn_r^2p^2) + O(n_r^3p^3)$.
Based on the complexity reduction described above, the complexity in updating
the statistics of one constituent bit is $O(n_r^2p^2)$. Hence, the complexity for
the update of all the $2qk$ constituent bits in an iteration is $O(qkn_r^2p^2)$. Since
the number of iterations is fixed, the overall complexity of the algorithm is
$O(qkn_r^2p^2) + O(n_r^3p^3)$. Note that for an $n_t = n_r$ V-BLAST MIMO system, since
$p = 1$, the per-bit complexity is $O(n_t^2)$, which is the same as that of the LAS
algorithm presented in Chapter 5. However, in the case of NO-STBC, since
$p = n_t$, PDA has a per-bit complexity of $O(n_t^4)$, which is an order higher than
that of the LAS algorithm.
6.3 Performance results
The BER performance of PDA detection in large V-BLAST MIMO systems and
large NO-STBC MIMO systems is illustrated in the following subsections. A
comparison between the performance of the PDA and LAS algorithms is also
illustrated.
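Initialization
1. $p_i^{j+} = p_i^{j-} = 0.5$, $\beta_i^{(j)} = 1$, $\forall\, i = 0, 1, \ldots, 2k-1$, $j = 0, 1, \ldots, q-1$
2. $\mathbf{u} = \mathbf{0}$, $\mathbf{D}^{-1} = \left(\tilde{\mathbf{H}}\tilde{\mathbf{H}}^T + \sigma^2 \mathbf{I}\right)^{-1}$
11. $p_{i,old}^{j+} = p_i^{j+}$, $p_{i,old}^{j-} = p_i^{j-}$
12. $\alpha_i^{(j)} = p_{i,old}^{j+} / p_{i,old}^{j-}$
13. $\beta_i^{(j)} = \gamma_i^{(j)} \alpha_i^{(j)}$
14. $p_i^{j+} = \dfrac{\beta_i^{(j)}}{1 + \beta_i^{(j)}}$, $p_i^{j-} = \dfrac{1}{1 + \beta_i^{(j)}}$
Update of $\mathbf{u}$ and $\mathbf{D}^{-1}$
15. $\mathbf{u} \leftarrow \mathbf{u} + 2\left(p_i^{j+} - p_{i,old}^{j+}\right)\mathbf{h}_{qi+j}$
16. $\eta = 4p_i^{j+}\left(1 - p_i^{j+}\right) - 4p_{i,old}^{j+}\left(1 - p_{i,old}^{j+}\right)$
17. $\mathbf{D}^{-1} \leftarrow \mathbf{D}^{-1} - \dfrac{\mathbf{D}^{-1}\mathbf{h}_{qi+j}\mathbf{h}_{qi+j}^T\mathbf{D}^{-1}}{\mathbf{h}_{qi+j}^T\mathbf{D}^{-1}\mathbf{h}_{qi+j} + \frac{1}{\eta}}$
18. end; End of for loop starting at line 5
19. if ($\kappa$ = num_iter) go to line 21
20. $\kappa = \kappa + 1$, go to line 5
21. $\hat{b}_i^{(j)} = \mathrm{sgn}\left(\log\left(\beta_i^{(j)}\right)\right)$, $\forall\, i = 0, 1, \ldots, 2k-1$, $j = 0, 1, \ldots, q-1$
22. $\hat{x}_i = \sum_{j=0}^{q-1} 2^j\, \hat{b}_i^{(j)}$, $\forall\, i = 0, 1, \ldots, 2k-1$
23. Terminate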
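To make the flow of the bit-wise PDA update concrete, the following Python sketch treats the simplest case of one ±1 bit per real dimension ($q = 1$) on a real-valued model $\mathbf{y} = \mathbf{H}\mathbf{b} + \mathbf{n}$. It is an illustration under stated assumptions, not the listing above: it recomputes the interference mean and covariance for every bit and solves a linear system directly instead of using the rank-one updates of $\mathbf{u}$ and $\mathbf{D}^{-1}$, it uses the standard real Gaussian density with noise variance `sigma2` per real dimension, and it assumes equal priors when converting the likelihood ratio into a probability. The name `pda_detect_bpsk` is hypothetical.

```python
import numpy as np

def pda_detect_bpsk(y, H, sigma2, num_iter=5):
    """Bit-wise PDA for y = H b + n with b in {+1, -1}^n_t (real-valued model)."""
    n_r, n_t = H.shape
    p = np.full(n_t, 0.5)                         # p[i] = P(b_i = +1)
    for _ in range(num_iter):
        for i in range(n_t):
            others = np.arange(n_t) != i
            H_o, p_o = H[:, others], p[others]
            # Mean and covariance of interference plus noise (Gaussian forcing).
            mean_interf = H_o @ (2.0 * p_o - 1.0)
            cov = sigma2 * np.eye(n_r) + (H_o * (4.0 * p_o * (1.0 - p_o))) @ H_o.T
            # Log-likelihood ratio of b_i = +1 versus b_i = -1.
            llr = 2.0 * H[:, i] @ np.linalg.solve(cov, y - mean_interf)
            llr = np.clip(llr, -30.0, 30.0)       # avoid overflow in exp
            p[i] = 1.0 / (1.0 + np.exp(-llr))
    return np.where(p >= 0.5, 1.0, -1.0), p
```

The function returns both the hard decisions and the probabilities `p`, the latter being the natural soft output for a decoder. Replacing the direct solve with the incremental updates of lines 15 to 17 would recover the lower per-bit complexity discussed in the text.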
Figure 6.1 BER performance of PDA detection in a V-BLAST MIMO system with
$n_t = n_r = 64$ and 4-QAM for different number of PDA iterations ($m = 1, 2, 4, 8$).
[Figure: BER versus average received SNR (dB) of PDA detection in V-BLAST MIMO with 4-QAM and $m = 5$ iterations, for $n_t = n_r = 8, 16, 32, 64, 96$, with SISO AWGN for reference. Performance improves with increasing $n_t = n_r$.]
Figure 6.3 Comparison between the BER performances of the PDA and LAS
algorithms in decoding 4 × 4, 8 × 8, 16 × 16 NO-STBCs with $n_t = n_r$ and 4-QAM.
one NO-STBC block of nt channel uses, and iid from one NO-STBC block to
another) is made.
4-QAM performance
Figure 6.3 shows the BER performance of the PDA algorithm in decoding 4 × 4,
8 × 8, and 16 × 16 NO-STBCs from CDA with $n_t = n_r$ and 4-QAM. Note
that the numbers of real dimensions of the transmitted vectors for 4 × 4, 8 × 8,
and 16 × 16 NO-STBCs with QAM are 32, 128, and 512, respectively. For the
same settings, the performance of the 1-LAS algorithm with an MMSE initial
vector is also plotted for comparison. From Fig. 6.3, it can be seen that (i) as in
V-BLAST MIMO, the BER performance in NO-STBC improves as the number
of dimensions is increased, and (ii) with 4-QAM, PDA and LAS algorithms
achieve almost similar performances. Performance within about 1 dB
of the non-faded SISO AWGN performance is achieved at $10^{-3}$ BER in decoding
16 × 16 NO-STBC from CDA having 512 real dimensions. This illustrates the
ability of the PDA to achieve good performance at low complexities in large
dimensions.
16-QAM performance
Figure 6.4 presents a BER comparison between the PDA and LAS algorithms
in decoding NO-STBCs from CDA with 16-QAM. It is seen that PDA performs
better at low SNRs than LAS. For example, with 8 × 8 and 16 × 16 NO-STBCs, at
low SNRs (e.g., < 25 dB for 16 × 16 NO-STBC), PDA performs better by about
2 dB compared to LAS at $10^{-2}$ BER. The PDA performance is still far from
optimal for 16-QAM, and there appears to be room to improve PDA performance
for higher-order QAM in large dimensions.
Figure 6.4 Comparison between the BER performances of the PDA and LAS
algorithms in decoding 4 × 4, 8 × 8, 16 × 16 NO-STBCs. $n_t = n_r$, 16-QAM.
References
[1] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. San Diego,
CA: Academic, 1988.
[2] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques and
Software. Dedham, MA: Artech House, 1993.
[3] Y. Bar-Shalom, F. Daum, and J. Huang, "The probabilistic data association filter,"
IEEE Cont. Syst. Mag., vol. 29, no. 6, pp. 82–100, Dec. 2009.
[4] R. R. Müller and J. B. Huber, "Iterated soft-decision interference cancelation for
CDMA," Broadband Wireless Communications, pp. 110–115, London: Springer,
1998.
[5] A. Lampe and J. B. Huber, "On improved multiuser detection with iterated soft
decision interference cancelation," in IEEE ICC'1999, Vancouver, Jun. 1999, pp.
172–176.
[6] J. Luo, K. R. Pattipati, P. K. Willett, and F. Hasegawa, "Near-optimal multiuser
detection in synchronous CDMA using probabilistic data association," IEEE Commun.
Lett., vol. 5, no. 9, pp. 361–363, Sep. 2001.
[7] P. H. Tan, L. K. Rasmussen, and J. Luo, "Iterative multiuser decoding based
on probabilistic data association," in IEEE ISIT'2003, Yokohama, Jun.–Jul. 2003,
p. 301.
[8] J. F. Rößler and J. B. Huber, "Iterative soft decision interference cancelation
receivers for DS-CDMA downlink employing 4-QAM and 16-QAM interference
[25] S. Liu, M. Zhao, Z. Luo, F. Li, and Y. Liu, "V-BLAST architecture employing
joint iterative GPDA detection and decoding," in IEEE ICC'2006, Istanbul, Jun.
2006, pp. 4219–4224.
[26] L. Rugini, P. Banelli, K. Fang, and G. Leus, "Enhanced turbo MMSE equalization
for MIMO-OFDM over rapidly time-varying frequency-selective channels," in
IEEE ICASSP'2009, Taipei, Apr. 2009, pp. 36–40.
[27] Z. J. Wang, Z. Han, and K. J. R. Liu, "A MIMO-OFDM channel estimation
approach using time of arrivals," IEEE Trans. Wireless Commun., vol. 4, no. 3,
pp. 1207–1213, May 2005.
[28] Y. Jia, C. Andrieu, R. J. Piechocki, and M. Sandell, "PDA multiple model approach
for joint channel tracking and symbol detection in MIMO systems," IEE
Proc. Commun., vol. 153, no. 4, pp. 501–507, Aug. 2006.
[29] B. Yang, P. Gong, S. Feng, et al., "Monte Carlo probabilistic data association
detector for SFBC-VBLAST-OFDM system," in IEEE WCNC'2007, Hong Kong,
Mar. 2007, pp. 1502–1505.
[30] S. K. Mohammed, A. Chockalingam, and B. S. Rajan, "Low-complexity near-MAP
decoding of large non-orthogonal STBCs using PDA," in IEEE ISIT'2009, Seoul,
Jun.–Jul. 2009.
[31] Z. Mei, Z. Yang, X. Li, and L. Wu, "Probabilistic data association detectors for
multi-input multi-output relaying system," IET Commun., vol. 5, no. 4, pp. 534–541,
Mar. 2011.
[32] S. Yang, T. Lv, R. G. Maunder, and L. Hanzo, "Distributed probabilistic-data-association-based
soft reception employing base station cooperation in MIMO-aided
multiuser multicell systems," IEEE Trans. Veh. Tech., vol. 60, no. 7, pp.
3532–3538, Sep. 2011.
[33] J. Fricke, M. Sandell, J. Mietzner, and P. Hoeher, "Impact of the Gaussian approximation
on the performance of the probabilistic data association MIMO decoder,"
EURASIP J. Wireless Commun. Netw., vol. 5, no. 5, pp. 796–800, Oct. 2005.
[34] S. Yang, T. Lv, R. G. Maunder, and L. Hanzo, "Unified bit-based probabilistic
data association aided MIMO detection for high-order QAM constellations," IEEE
Trans. Veh. Tech., vol. 60, no. 3, pp. 981–991, Mar. 2011.
[35] B. A. Sethuraman, B. S. Rajan, and V. Shashidhar, "Full-diversity high-rate space-time
block codes from division algebras," IEEE Trans. Inform. Theory, vol. 49,
no. 10, pp. 2596–2616, Oct. 2003.
7 Detection/decoding based on
message passing on graphical models
Probability theory and graph theory are two branches of mathematics that are
widely applicable in many different domains. Graphical models combine concepts
from both these branches to provide a structured framework that supports
representation, inference, and learning for a broad spectrum of problems [1].
Graphical models are graphs that indicate inter-dependencies between random
variables [2],[3]. Distributions that exhibit some structure can generally be represented
naturally and compactly using a graphical model, even when the explicit
representation of the joint distribution is very large. The structure often allows
the distribution to be used effectively for inference, i.e., answering certain queries
of interest using the distribution. The framework also facilitates construction of
these models by learning from data.
In this chapter we consider the use of graphical models in: (1) the representation
of distributions of interest in MIMO systems, (2) formulation of the MIMO
detection problem as an inference problem on such models (e.g., computation
of posterior probability of variables of interest), and (3) efficient algorithms for
inference (e.g., low complexity algorithms for computing the posterior probabilities).
Three basic graphical models that are widely used to represent distributions
include Bayesian belief networks [4], Markov random fields (MRFs) [5],
and factor graphs [6]. Message passing algorithms like the BP algorithm [7] are
efficient tools for inference on graphical models. In this chapter, a brief survey
of various graphical models and BP techniques is presented. Application of BP
to equalization in MIMO-ISI channels and large MIMO signal detection is also
covered.
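[Figure: an example Bayesian belief network on the variables $x_1, \ldots, x_5$.]
$$p(x_1, x_2, x_3, x_4, x_5) = p(x_1)\, p(x_2|x_1)\, p(x_3|x_1)\, p(x_4|x_2)\, p(x_5|x_2, x_3). \tag{7.1}$$
In general, the joint distribution of a Bayesian belief network with $N$ variables factors as
$$p(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} p\left(x_i \mid P(x_i)\right), \tag{7.2}$$
where $P(x_i)$ denotes the parents of the node pertaining to the variable $x_i$.
The probability of any variable $x_k$, $k \in \{1, 2, \ldots, N\}$, can then be obtained by
marginalizing the joint probability function over all the other variables, i.e.,
$$p(x_k) = \sum_{x_1} \cdots \sum_{x_{k-1}} \sum_{x_{k+1}} \cdots \sum_{x_N} p(x_1, x_2, \ldots, x_N). \tag{7.3}$$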
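The marginalization in (7.3) can be checked numerically on the five-variable factorization of (7.1). The Python sketch below is only an illustration: the variables are assumed binary, the conditional probability tables are random placeholders, and the helper names `random_cpt`, `joint`, and `marginal` are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n_parents):
    """Random table p(child | parents) for binary variables; the last axis is the child."""
    t = rng.random((2,) * n_parents + (2,))
    return t / t.sum(axis=-1, keepdims=True)

p1 = random_cpt(0)   # p(x1)
p2 = random_cpt(1)   # p(x2 | x1)
p3 = random_cpt(1)   # p(x3 | x1)
p4 = random_cpt(1)   # p(x4 | x2)
p5 = random_cpt(2)   # p(x5 | x2, x3)

def joint(x1, x2, x3, x4, x5):
    """Joint probability according to the factorization in (7.1)."""
    return p1[x1] * p2[x1, x2] * p3[x1, x3] * p4[x2, x4] * p5[x2, x3, x5]

def marginal(k):
    """p(x_k) by summing the joint over all other variables, as in (7.3); k = 1, ..., 5."""
    m = np.zeros(2)
    for x in itertools.product([0, 1], repeat=5):
        m[x[k - 1]] += joint(*x)
    return m
```

The brute-force sum visits all $2^5$ configurations; its cost grows exponentially with the number of variables, which is the motivation for the message passing algorithms discussed later in the chapter.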
where $N(x_k)$ represents the set of all nodes neighboring the node pertaining to
the variable $x_k$.
Usually, the variables in an MRF are constrained by a compatibility function,
also known as a clique potential in the literature. A clique of an MRF is a fully
connected subgraph. A maximal clique is a clique which does not remain fully
connected if any additional vertex of the MRF is included in it. Often, the term
clique is used to refer to a maximal clique. Let there be $N_C$ cliques in the MRF,
and let $\mathbf{x}_j$ be the variables in clique $j$. Let $\psi_j(\mathbf{x}_j)$ be the clique potential of clique
$j$. Then the joint distribution of the variables is proportional to the product of the clique potentials, $p(\mathbf{x}) \propto \prod_{j=1}^{N_C} \psi_j(\mathbf{x}_j)$.
[Figure 7.2: an example MRF on the variables $x_1, \ldots, x_5$.]
For example, consider the MRF shown in Fig. 7.2. There are two cliques in the
MRF, namely, $\{x_1, x_2, x_3, x_4\}$ and $\{x_3, x_4, x_5\}$. The joint probability distribution
is given by
$$p(x_1, x_2, x_3, x_4, x_5) = \underbrace{p(x_1)\, p(x_2|x_1)\, p(x_3|x_1, x_2)\, p(x_4|x_1, x_2, x_3)}_{\psi_1(x_1, x_2, x_3, x_4)}\ \underbrace{p(x_5|x_3, x_4)}_{\psi_2(x_3, x_4, x_5)}.$$
Pair-wise MRF
An MRF is called a pair-wise MRF if all the cliques in the MRF are of size 2. In
this case, the clique potentials are all functions of two variables. The clique potentials
can then be denoted as $\psi_{i,j}(x_i, x_j)$, where $x_i$, $x_j$ are variables connected
by an edge in the MRF.
Consider a pair-wise MRF in which the $x_i$s denote the underlying hidden
variables on which the observed variables $y_i$s are dependent. Let the dependence
between the hidden variable $x_i$ and the explicit variable $y_i$ be represented by a
joint compatibility function $\phi_i(x_i, y_i)$. This scenario is shown in Fig. 7.3. In such
a scenario, the joint distribution of the hidden and explicit variables is given by
$$p(\mathbf{x}, \mathbf{y}) \propto \prod_{i,j} \psi_{i,j}(x_i, x_j) \prod_{i} \phi_i(x_i, y_i). \tag{7.6}$$
Figure 7.3 An example of a pair-wise MRF with observed (explicit) variables and
hidden (implicit) variables.
$$f(\mathbf{x}) \propto \prod_{j=1}^{N_F} f_j(\mathbf{x}_j), \tag{7.7}$$
where the $f_j$s are the local functions and the $\mathbf{x}_j$s are the sets of variables on which the
local functions $f_j$s depend. Every variable node in the factor graph of such a
function denotes a variable in $\mathbf{x}$, whereas every function node represents a
local function.
As an example, again consider the joint probability distribution as given by
(7.1). Let us define the local functions as
Then, the factor graph that demonstrates the factorization of the joint distribution
in (7.1) is obtained as shown in Fig. 7.4.
Figure 7.5 A factor graph obtained from the MRF of Fig. 7.2.
7.2 BP
The generalized distributive law (GDL) is an algorithm that solves the marginalization
of a product function problem by message passing on junction trees
[8]. The sum–product algorithm is another general message passing algorithm
that achieves marginalization of a global function on a factor graph [6]. The
BP algorithm, the Viterbi algorithm, the Bahl–Cocke–Jelinek–Raviv (BCJR)
algorithm, and the turbo decoding algorithm are known to be special cases of
GDL and the sum–product algorithm [6],[9],[10].
Taking this cue, BP techniques with suitable approximations for large MIMO
detection have been devised and reported in the literature [30],[31].
The message from the function node $f$ to the variable node $x$ is defined as
$$m_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus \{x\}} f(\mathbf{x}_f) \prod_{y \in N(f) \setminus \{x\}} m_{y \to f}(y), \tag{7.9}$$
in which the outer summation marginalizes out all the variables of $f$ other than $x$. The final
belief at variable node $x$ is calculated as a product of all incoming messages at
the variable node, i.e.,
$$b(x) = \prod_{h \in N(x)} m_{h \to x}(x). \tag{7.10}$$
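The message and belief computations in (7.9) and (7.10) can be sketched on a small cycle-free factor graph. The Python example below is a minimal illustration with made-up factor tables over three binary variables; the dictionary layout and the names `func_to_var`, `var_to_func`, `belief`, and `brute_force_marginal` are assumptions for this sketch. The recursion terminates only because the graph is a tree.

```python
import itertools
import numpy as np

# Tree-structured factor graph: f_a(x0), f_b(x0, x1), f_c(x1, x2); all variables binary.
factors = {
    "a": ((0,), np.array([0.6, 0.4])),
    "b": ((0, 1), np.array([[0.9, 0.1], [0.2, 0.8]])),
    "c": ((1, 2), np.array([[0.7, 0.3], [0.4, 0.6]])),
}

def var_to_func(v, f):
    """Variable-to-function message: product of messages from all other neighboring functions."""
    msg = np.ones(2)
    for g, (scope, _) in factors.items():
        if g != f and v in scope:
            msg = msg * func_to_var(g, v)
    return msg

def func_to_var(f, v):
    """Function-to-variable message as in (7.9): sum out every variable of f except v."""
    scope, table = factors[f]
    incoming = {u: var_to_func(u, f) for u in scope if u != v}
    msg = np.zeros(2)
    for assignment in itertools.product([0, 1], repeat=len(scope)):
        term = table[assignment]
        for u, val in zip(scope, assignment):
            if u != v:
                term = term * incoming[u][val]
        msg[assignment[scope.index(v)]] += term
    return msg

def belief(v):
    """Belief as in (7.10): product of all incoming messages, normalized to sum to one."""
    b = np.ones(2)
    for f, (scope, _) in factors.items():
        if v in scope:
            b = b * func_to_var(f, v)
    return b / b.sum()

def brute_force_marginal(v, n_vars=3):
    """Reference marginal obtained by summing the full product over all configurations."""
    m = np.zeros(2)
    for x in itertools.product([0, 1], repeat=n_vars):
        val = 1.0
        for scope, table in factors.values():
            val *= table[tuple(x[u] for u in scope)]
        m[x[v]] += val
    return m / m.sum()
```

On this tree, `belief(v)` coincides with `brute_force_marginal(v)` for every variable, in line with the statement that BP is exact on cycle-free graphs. A practical implementation would cache messages rather than recompute them recursively.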
7.2.4 Loopy BP
As mentioned before, BP computes the exact beliefs about the variables on
cycle-free graphs. Moreover, in a cycle-free graph, the message passing needs
to be carried out only once, to compute the belief about a variable. However,
in a graph with cycles, one can start with an initial set of beliefs, probably a
set of a priori beliefs about the variables, and iterate the message update rule
of BP. Though there is no guarantee that this will yield the correct beliefs in
a cyclic graph, it has been observed that in many practical cases, loopy BP
does give satisfactory results [9],[32]. It is noted that graphical models of MIMO
systems are fully/densely connected (loopy graphs). Still, as we will see later
in the performance results section, running BP on the loopy graphs of MIMO
systems gives very good performance.
7.2.5 Damped BP
In systems characterized by fully/densely connected graphical models, the BP
algorithm may fail to converge, and if it does converge, the estimated marginals
may be far from exact [33],[34]. However, several methods in the literature,
including damping [32],[35],[36], can be used to improve the behavior of BP
when it does not converge (or converges too slowly). Double loop methods [37],[38]
have also been shown to improve the convergence of BP.
In the damping method, messages to be passed are computed as a weighted
average of the previous message (i.e., message in the previous iteration) and
the current message (i.e., message in the current iteration) [32]. As an example,
consider BP on a pair-wise MRF. Let $\tilde{m}_{i,j}^{(t)}(x_j)$ denote the current message in
iteration $t$, i.e.,
$$\tilde{m}_{i,j}^{(t)}(x_j) \propto \sum_{x_i} \phi_i(x_i)\, \psi_{i,j}(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{k,i}^{(t-1)}(x_i). \tag{7.14}$$
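The weighted averaging used in damping amounts to one line of arithmetic per message. The Python sketch below illustrates it; the name `damped_message`, the damping factor `alpha`, and the convention that `alpha` weights the previous message are assumptions of this illustration.

```python
import numpy as np

def damped_message(prev_msg, new_msg, alpha):
    """Weighted average of the previous and the newly computed message.

    alpha in [0, 1] is the weight on the previous message: alpha = 0 gives
    undamped BP, while alpha close to 1 changes the messages very slowly.
    The result is normalized to sum to one.
    """
    prev_msg = np.asarray(prev_msg, dtype=float)
    new_msg = np.asarray(new_msg, dtype=float)
    mixed = alpha * prev_msg + (1.0 - alpha) * new_msg
    return mixed / mixed.sum()
```

For example, mixing a previous message [0.8, 0.2] with a new message [0.4, 0.6] at `alpha = 0.5` gives [0.6, 0.4], which moves only halfway toward the new message and so suppresses oscillations between iterations.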
$$\mathbf{y}_n = \sum_{l=0}^{L-1} \mathbf{H}^{(l)} \mathbf{x}_{n-l} + \mathbf{n}_n, \tag{7.16}$$
where $\mathbf{n}_n \sim \mathcal{CN}(\mathbf{0}, N_0 \mathbf{I}_{n_r})$ is the additive complex Gaussian noise vector at time
$n$, $\mathbf{I}_{n_r}$ is the $n_r \times n_r$ identity matrix, and $N_0/2$ is the noise power spectral density
per real dimension. The channel matrix $\mathbf{H}^{(l)}$ is considered to be constant for a
fixed quasi-static interval. As a special case, we can consider the quasi-static
interval to be equal to one symbol duration.
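The received signal model in (7.16) can be simulated directly. The Python sketch below is an illustration only: the name `mimo_isi_receive` and the array layout are assumptions, the channel taps are held constant over the block, and symbols before the start of the block are taken to be zero for simplicity.

```python
import numpy as np

def mimo_isi_receive(H_taps, X, N0, rng):
    """Generate y_n = sum_l H^(l) x_{n-l} + n_n for n = 0, ..., N-1.

    H_taps: (L, n_r, n_t) channel taps, X: (N, n_t) transmitted vectors.
    Symbols before the start of the block are taken as zero (assumption).
    Noise is CN(0, N0 I), i.e. variance N0/2 per real dimension.
    """
    L, n_r, _ = H_taps.shape
    N = X.shape[0]
    Y = np.zeros((N, n_r), dtype=complex)
    for n in range(N):
        for l in range(min(L, n + 1)):
            Y[n] += H_taps[l] @ X[n - l]
    noise = np.sqrt(N0 / 2.0) * (
        rng.standard_normal((N, n_r)) + 1j * rng.standard_normal((N, n_r))
    )
    return Y + noise
```

Each received vector mixes the current transmit vector with the $L - 1$ previous ones, which is the dependence that the factor graph of the next subsection encodes.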
Graphical model
The multipath fading channel considered is essentially an ISI channel. Such a
channel can conveniently be represented as a factor graph. The variable nodes
correspond to the transmitted symbols, which essentially are the variables that
we intend to infer. Each function node in the graph corresponds to a time $k$. Since
the received signal at any time $k$ depends on the past $L$ symbols transmitted
from every transmit antenna, and since there are $n_t$ transmit antennas, every
function node is connected to $Ln_t$ variable nodes. For detection, consider a block
size of $N$ bits. It is assumed that the $L - 1$ bits transmitted before and after
a block are known, and, without loss of generality, these bits are assumed to
be $+1$. An example of such a factor graph for $(n_t, n_r, L) = (2, 4, 3)$ and a block
length of 5 is shown in Fig. 7.7.
[Figure: MIMO-ISI channel model with $n_t$ Tx antennas and $n_r$ Rx antennas; a tap delay line with taps $H_{j,i}^{(0)}, H_{j,i}^{(1)}, \ldots, H_{j,i}^{(L-1)}$ between Tx–Rx antenna pair $(i, j)$.]
Figure 7.7 Factor graph model of a $(n_t, n_r, L) = (2, 4, 3)$ MIMO system for a block
length of $N = 5$.
Let $X_p \triangleq \left\{x_{p-t}^m : m = 1, 2, \ldots, n_t,\ t = 0, 1, \ldots, L - 1\right\}$ be the set of variables
on which the vector received at time $p$ depends. Then, in Fig. 7.7, the local
function at the function node $p$ is given by
$$f(X_p) = p(\mathbf{y}_p \mid X_p), \tag{7.17}$$
which is the probability distribution of receiving $\mathbf{y}_p$ given that a variable pattern
$X_p$ was transmitted.
BP algorithm
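BP is carried out on the factor graph by passing messages between the variable
and function nodes. Let us denote the message passed by the function node $p$
to the variable node pertaining to $x^m_n$ as $R^m_{pn}(q)$. Also, let $Q^m_{np}(q)$ be the
message passed by the variable node pertaining to $x^m_n$ to the function node $p$.
The messages at function and variable nodes are computed as follows.
$$
\begin{aligned}
P(x^m_n = q \mid \mathbf{y}_p)
&= \sum_{j=1}^{2^{(n_t L - 1)}} P\big(x^m_n = q,\ \xi_p(j) \mid \mathbf{y}_p\big) \\
&= \sum_{j=1}^{2^{(n_t L - 1)}} \frac{p\big(x^m_n = q,\ \xi_p(j),\ \mathbf{y}_p\big)}{p(\mathbf{y}_p)} \\
&\propto \sum_{j=1}^{2^{(n_t L - 1)}} \underbrace{p\big(\mathbf{y}_p \mid x^m_n = q,\ \xi_p(j)\big)}_{\text{posterior probability}}\ \underbrace{P\big(x^m_n = q,\ \xi_p(j)\big)}_{\text{prior probability}} \\
&\triangleq \sum_{j=1}^{2^{(n_t L - 1)}} [A_{\text{posterior}}]\,[A_{\text{prior}}] \triangleq R^m_{pn}(q),
\end{aligned}
\tag{7.18}
$$
where $\xi_p(j)$ denotes the $j$th pattern of the remaining $n_t L - 1$ variables in $X_p$.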
In the first iteration, the prior probability part can be computed by assuming
equally likely transmit patterns. In subsequent iterations, the prior probability
part is updated to reflect the latest belief about the variable node as follows:
$$[A_{\text{prior}}] = \prod_{m=1}^{n_t}\ \prod_{l \in N(p) \setminus \{x^m_n\}} Q^m_{lp}\big(\xi_{p,j}(m, l)\big), \tag{7.19}$$
where $N(p)$ is the set of variable nodes connected to the function node $p$ and
$\xi_{p,j}(m, l)$ is the bit in the transmit pattern corresponding to the $p$th function
node, $j$th pattern index, $m$th transmit antenna, and $l$th variable node. It can be
noted that the function to variable node messages are as expected as per (7.8).
Final decision
After iterating on the graph for the required number of times, the variable node
pertaining to the symbol $x^m_n$ computes the belief about the event $x^m_n = q$ as
$$b(x^m_n = q) = \frac{\prod_{l \in N(n)} R^m_{ln}(q)}{\prod_{l \in N(n)} R^m_{ln}(+1) + \prod_{l \in N(n)} R^m_{ln}(-1)}, \tag{7.21}$$
which is the soft output of the detector and can be hard-limited to get the binary
decision output. In the case of a coded system, the soft output can be directly
fed to the decoder.
[Figure: BER versus average SNR (dB) of the BP detector for 1 to 5 BP iterations.]
[Figure: BER versus average SNR (dB) of the BP detector for $(n_t, n_r, L) = (2, 2, 2), (2, 3, 2), (2, 2, 3), (2, 4, 2), (2, 2, 4)$, with SISO AWGN and flat fading curves for reference.]
Figure 7.10 Performance comparison between the BP detector and the Viterbi
algorithm in MIMO-ISI channels with parameters $(n_t, n_r, L) = (2, 2, 2), (2, 3, 2),
(2, 4, 2)$.
channel coefficients. It is unlikely that all four edges that form a cycle have large
weights, and, therefore, a performance loss due to such a length-4 cycle occurs
with only a low probability [26]. (2) The second reason could be the sparsity
in the variable-node graph for large block lengths. Consider the connectivity
matrix, whose row and column headers are variable nodes of the corresponding
factor graph. An entry of the matrix can be either a 1 or a 0. An entry of 1 at
a location in the matrix indicates that the variables indicated by the row and
column headers are connected by an edge in the variable node graph, which is
essentially an MRF. For example, the connectivity matrix for the example of
Fig. 7.7 is given below.
x11 x21 x12 x22 x13 x23 x14 x24 x15 x25
x11 1 1 1 1 1 1 0 0 0 0
x21 1 1 1 1 1 1 0 0 0 0
x12 1 1 1 1 1 1 1 1 0 0
x22 1 1 1 1 1 1 1 1 0 0
x13 1 1 1 1 1 1 1 1 1 1
x23 1 1 1 1 1 1 1 1 1 1
x14 0 0 1 1 1 1 1 1 1 1
x24 0 0 1 1 1 1 1 1 1 1
x15 0 0 0 0 1 1 1 1 1 1
x25 0 0 0 0 1 1 1 1 1 1
As can be seen from this example, the connectivity matrix is a banded matrix,
the band being $n_t(2L - 1)$ thick. For larger and larger block lengths, this matrix
becomes more and more sparse. At the block length of 1000 considered in the
simulations, the matrix is quite sparse, and this could be another reason why the
performance of the BP based detector is very close to that of the ML detector.
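The banded structure described above can be reproduced with a few lines of code. The following is a minimal sketch, not taken from the book: the function names `connectivity_matrix` and `density` are hypothetical, and it assumes that two variables are connected exactly when their time indices differ by at most $L - 1$, which is the rule implied by the example matrix.

```python
def connectivity_matrix(n_t, L, N):
    """Return the (N*n_t) x (N*n_t) 0/1 connectivity matrix of the variable-node graph.

    Variables are ordered time-major: (time 1, antenna 1), (time 1, antenna 2), ...
    Assumption: two variables are connected when some function node depends on
    both, i.e. when their time indices differ by at most L - 1.
    """
    times = [n for n in range(N) for _ in range(n_t)]
    return [[1 if abs(a - b) <= L - 1 else 0 for b in times] for a in times]


def density(matrix):
    """Fraction of entries equal to 1."""
    size = len(matrix)
    return sum(map(sum, matrix)) / (size * size)


# Example of Fig. 7.7: (n_t, L) = (2, 3) and block length N = 5.
C = connectivity_matrix(n_t=2, L=3, N=5)
for row in C:
    print(" ".join(str(v) for v in row))

# The fraction of ones shrinks as the block length grows.
print(density(C), density(connectivity_matrix(n_t=2, L=3, N=100)))
```

Each row contains at most $n_t(2L - 1)$ ones, so the density falls roughly as $n_t(2L - 1)/(N n_t)$ when $N$ grows.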
Complexity
The complexity of the BP algorithm is dominated by the computation of function
to variable node messages. Therefore, we can ignore the complexity of computing
the variable to function node messages. Every function node is connected
to $n_t L$ variable nodes. Therefore, every function node has to compute $n_t L$ messages
at every iteration, which contributes a factor of $n_t L$ to the complexity. To
compute each of these messages, it is necessary to carry out a summation over
$2^{(n_t L - 1)}$ terms. The prior probability part $[A_{\text{prior}}]$ of (7.19) needs a product
of $n_t L - 1$ terms to be computed. Therefore, the overall computational complexity
is of the order of $n_t^2 L^2\, 2^{(n_t L - 1)}$. At the end of the required number
of iterations, the BP algorithm detects $n_t$ symbols. Therefore, the complexity
is $O\big(n_t L^2\, 2^{(n_t L - 1)}\big)$ per symbol per iteration. In view of the high complexity
(exponential in $n_t$ and $L$) of this detector, it is not attractive
for large MIMO systems, even though its performance is close to ML performance.
In the following, two BP based approaches are presented, one based
on a pair-wise MRF and another based on a factor graph with scalar Gaussian
approximation of interference, that scale very well (quadratic and linear per-symbol
complexity in $n_t$) for large MIMO systems while achieving near-optimal
performance.
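$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{7.22}$$
where $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{n_r})$ is an $n_r \times 1$ vector that represents the additive complex
Gaussian noise.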
Graphical model
The considered V-BLAST MIMO system can be conveniently represented as an
undirected graph with a node for every symbol. Since every transmit antenna
is used to transmit a separate symbol, there are nt nodes in such a graph. The
edges of the graph represent the conditional dependence between the symbols.
Since every transmitted symbol interferes with every other transmitted symbol
in V-BLAST, the graph is fully connected. A graphical model for a V-BLAST
MIMO system with nt = 8 is shown in Fig. 7.11.
Figure 7.11 Graphical model of a V-BLAST MIMO system with eight transmit
antennas. The variable node $x_i$ represents the symbol transmitted from the $i$th
antenna.
is then given by
$$\hat{x}_i = \arg\max_{a \in \{\pm 1\}}\ p(x_i = a \mid \mathbf{y}, \mathbf{H}), \tag{7.24}$$
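Defining $\mathbf{R} = (1/\sigma^2)\mathbf{H}^H\mathbf{H}$ and $\mathbf{z} = (1/\sigma^2)\mathbf{H}^H\mathbf{y}$, (7.27) can be written as
$$
\begin{aligned}
p(\mathbf{x} \mid \mathbf{y}, \mathbf{H})
&\propto \exp\Big(-\sum_{i<j} \Re\{x_i R_{ij} x_j\}\Big)\, \exp\Big(\sum_{i} \Re\{x_i z_i\}\Big)\, \exp\Big(\sum_{i} \ln p(x_i)\Big) \\
&= \prod_{i<j} \exp\big(-x_i \Re\{R_{ij}\} x_j\big)\ \prod_{i} \exp\big(x_i \Re\{z_i\} + \ln p(x_i)\big),
\end{aligned}
\tag{7.28}
$$
where $z_i$ and $R_{ij}$ are the elements of $\mathbf{z}$ and $\mathbf{R}$, respectively, and $\Re\{\cdot\}$ denotes
the real part of a complex number. Comparing (7.28) and (7.11), it is seen that
the MRF of the MIMO system has pair-wise interactions with the following
potentials:
$$\psi_{i,j}(x_i, x_j) = \exp\big(-x_i \Re\{R_{ij}\} x_j\big), \tag{7.29}$$
$$\phi_i(x_i) = \exp\big(x_i \Re\{z_i\} + \ln p(x_i)\big). \tag{7.30}$$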
where $N(i)$ denotes the set of all nodes neighboring the node $i$. Equation (7.31)
actually constitutes an iteration, as the message is defined in terms of the other
messages. So, BP essentially involves computing the outgoing messages from a
node to each of its neighbors using the local joint compatibility function and
the incoming messages and transmitting them. The algorithm terminates after
a fixed number of iterations. Damping of messages as described in Section 7.2.5
can be carried out in each iteration. The final belief about the variable $x_i$ is
computed as
$$b_i(x_i) \propto \phi_i(x_i) \prod_{j \in N(i)} m_{j,i}(x_i), \tag{7.32}$$
which is the soft output of the detector that can be hard-limited to get the
binary decision output. A complete listing of the MRF BP algorithm described
above is given in Table 7.1. In the case of a coded system, the soft output of the
algorithm can be directly fed to the decoder.
Initialization
1. $m^{(0)}_{i,j}(x_j) = 0.5$, $p(x_i = 1) = p(x_i = -1) = 0.5$, $\forall i, j = 1, \ldots, n_t$
2. $\tilde{m}^{(0)}_{i,j}(x_j) = 0.5$, $\forall i, j = 1, \ldots, n_t$
3. $\mathbf{z} = (1/\sigma^2)\, \mathbf{H}^H \mathbf{y}$; $\mathbf{R} = (1/\sigma^2)\, \mathbf{H}^H \mathbf{H}$
4. for $i = 1$ to $n_t$
5.    $\phi_i(x_i) = \exp\big(x_i \Re\{z_i\} + \ln(p(x_i))\big)$
6. end for
7. for $i = 1$ to $n_t$
8.    for $j = 1$ to $n_t$, $j \neq i$
9.       $\psi_{i,j}(x_i, x_j) = \exp\big(-x_i \Re\{R_{i,j}\} x_j\big)$
10.   end for
11. end for
Iterative update of messages
12. for $t = 1$ to num_iter
Damped message calculation
13.   for $i = 1$ to $n_t$
14.      for $j = 1$ to $n_t$, $j \neq i$
15.         $\tilde{m}^{(t)}_{i,j}(x_j) \propto \sum_{x_i} \phi_i(x_i)\, \psi_{i,j}(x_i, x_j) \prod_{k \in N(i) \setminus j} m^{(t-1)}_{k,i}(x_i)$
16.         $m^{(t)}_{i,j}(x_j) = \alpha_m\, m^{(t-1)}_{i,j}(x_j) + (1 - \alpha_m)\, \tilde{m}^{(t)}_{i,j}(x_j)$
17.      end for
18.   end for
19. end for; End of for loop starting at line 12
Belief calculation
20. for $i = 1$ to $n_t$
21.    $b_i(x_i) \propto \phi_i(x_i) \prod_{j \in N(i)} m^{(\text{num\_iter})}_{j,i}(x_i)$
22. end for
Detection of data bits
23. $\hat{x}_i = \arg\max_{x_i \in \{\pm 1\}} b_i(x_i)$, $\forall i = 1, \ldots, n_t$
24. Terminate
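The listing above can be turned into a short program. The following is a minimal sketch for BPSK, not the authors' implementation: the function name `mrf_bp_detect` and its arguments are hypothetical, messages are renormalized to sum to one in every iteration, and the potentials are rescaled by constants (which does not change the normalized beliefs) to avoid numerical overflow.

```python
import math


def mrf_bp_detect(H, y, sigma2, num_iter=5, damping=0.0):
    """Sketch of damped BP on the pair-wise MRF for BPSK symbols in {+1, -1}.

    H is a list of n_r rows of n_t channel gains, y a list of n_r received samples.
    Returns (x_hat, beliefs) with beliefs[i] = [b_i(+1), b_i(-1)].
    """
    n_r, n_t = len(H), len(H[0])
    sym = (+1, -1)
    # Real parts of z = H^H y / sigma^2 and R = H^H H / sigma^2.
    z = [sum(H[r][i].conjugate() * y[r] for r in range(n_r)).real / sigma2
         for i in range(n_t)]
    R = [[sum(H[r][i].conjugate() * H[r][j] for r in range(n_r)).real / sigma2
          for j in range(n_t)] for i in range(n_t)]
    # Self potentials with equal priors, normalized so that phi(+1) + phi(-1) = 1.
    phi = [[0.5 * (1 + s * math.tanh(z[i])) for s in sym] for i in range(n_t)]

    def psi(i, j, xi, xj):
        # exp(-x_i Re{R_ij} x_j), scaled by exp(-|R_ij|) so that it never exceeds 1.
        return math.exp(-xi * R[i][j] * xj - abs(R[i][j]))

    # m[i][j][b] is the message from node i to node j evaluated at x_j = sym[b].
    m = [[[0.5, 0.5] for _ in range(n_t)] for _ in range(n_t)]
    for _ in range(num_iter):
        new = [[[0.5, 0.5] for _ in range(n_t)] for _ in range(n_t)]
        for i in range(n_t):
            for j in range(n_t):
                if j == i:
                    continue
                raw = []
                for xj in sym:
                    total = 0.0
                    for a, xi in enumerate(sym):
                        term = phi[i][a] * psi(i, j, xi, xj)
                        for k in range(n_t):
                            if k != i and k != j:
                                term *= m[k][i][a]
                        total += term
                    raw.append(total)
                norm = raw[0] + raw[1]
                for b in range(2):
                    # Damped update of the normalized message.
                    new[i][j][b] = damping * m[i][j][b] + (1 - damping) * raw[b] / norm
        m = new
    beliefs = []
    for i in range(n_t):
        b = [phi[i][a] for a in range(2)]
        for j in range(n_t):
            if j != i:
                b = [b[a] * m[j][i][a] for a in range(2)]
        total = b[0] + b[1]
        beliefs.append([v / total for v in b])
    x_hat = [sym[0] if b[0] >= b[1] else sym[1] for b in beliefs]
    return x_hat, beliefs


# Toy 2 x 2 example (hypothetical numbers): x = [+1, -1] sent without noise.
H_demo = [[1.0, 0.1], [0.1, 1.0]]
y_demo = [0.9, -0.9]
x_demo, b_demo = mrf_bp_detect(H_demo, y_demo, sigma2=0.5, num_iter=5, damping=0.2)
print(x_demo, b_demo)
```

The inner loop over `k` recomputes the product of incoming messages for every pair, which keeps the sketch close to line 15 of the listing at the cost of extra multiplications; it is meant for small $n_t$ only.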
7.4.4 Performance
Figure 7.12 shows the BER performance of the above MRF BP detector with
no message damping ($\alpha_m = 0$) as a function of average received SNR for various
$n_t = n_r$, BPSK modulation, and five BP iterations. For comparison purposes,
the BER performance over a SISO AWGN channel as well as on a SISO
frequency-flat fading channel are plotted. From Fig. 7.12, it is seen that, as
$n_t = n_r$ is increased, the BER performance improves and gets closer to SISO
AWGN performance. When $n_t = n_r = 300$, the BER performance
of the detector at high SNRs is close to that of SISO AWGN, illustrating the
detector's near-optimality in large MIMO systems. The effect of message damping
on the performance is illustrated in Fig. 7.13. $\alpha_m = 0$ corresponds to the
case of no damping. In Fig. 7.13, damping is seen to significantly improve MRF
Figure 7.12 BER performance of MRF BP detector in V-BLAST MIMO systems for
different $n_t = n_r$, $\alpha_m = 0$, and five BP iterations.
[Figure: BER versus message damping factor $\alpha_m$ for $n_t = n_r = 16, 24$ with ZF, MMSE, MRF BP, and ML (sphere decoding) detection.]
[Figure: BER versus average received SNR (dB) for MRF BP and MMSE with $n_t = n_r = 4, 8, 16, 24, 32$, ML (sphere decoding) with $n_t = n_r = 32$, and SISO AWGN; the BER for MRF BP improves with increasing $n_t = n_r$.]
7.4.5 Complexity
In every iteration, each of the $n_t$ nodes computes the message to be sent to the
other nodes. As seen from (7.31), computing one such message involves multiplying
incoming messages from the other $n_t - 1$ nodes. This is repeated for both
$x_i = +1$ and $x_i = -1$, and after multiplying with $\phi_i(x_i)\, \psi_{i,j}(x_i, x_j)$, the results
are added. This involves a complexity of $O(n_t^2)$ per iteration. Since $n_t$ symbols
are detected at a time, the complexity is $O(n_t)$ per iteration per symbol. However,
the computations of $\mathbf{z}$ and $\mathbf{R}$ involve complexities of $O(n_t n_r)$ and $O(n_t^2 n_r)$,
respectively. These computations need be carried out only once before the first
iteration. Again, since $n_t$ symbols are detected at a time, the complexities per
symbol amount to $O(n_r)$ and $O(n_t n_r)$, respectively. Assuming the number of
iterations is much less than $n_r$, the complexity order is given by $O(n_t n_r)$ per
symbol. Damping of messages does not increase the order of complexity.
Consider the MIMO system model in (7.22). Treat each entry of the observation
vector $\mathbf{y}$ as a function node (observation node) in a factor graph, and each
transmitted symbol as a variable node. The received signal $y_i$ can be written as
$$y_i = \sum_{j=1}^{n_t} h_{ij} x_j + n_i = h_{ik} x_k + \underbrace{\sum_{j=1,\, j \neq k}^{n_t} h_{ij} x_j}_{\text{Interference}} + n_i. \tag{7.33}$$
Figure 7.15 Message passing in the FG BP algorithm with SGA: (a) messages received
at and sent by an observation node; (b) messages received at and sent by a variable
node.
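With the motivation of reducing complexity, when computing the message from
the $i$th observation node to the $k$th variable node, make the following scalar
Gaussian approximation (GA) of the interference (GAI):
$$y_i = h_{ik} x_k + \underbrace{\sum_{j=1,\, j \neq k}^{n_t} h_{ij} x_j + n_i}_{\triangleq\, z_{ik}}, \tag{7.34}$$
where $z_{ik}$ is modeled as Gaussian with mean $\mu_{z_{ik}} = \sum_{j=1,\, j \neq k}^{n_t} h_{ij} E(x_j)$ and variance
$$\sigma^2_{z_{ik}} = \sum_{j=1,\, j \neq k}^{n_t} |h_{ij}|^2\, \mathrm{Var}(x_j) + \sigma^2. \tag{7.36}$$
For BPSK signaling, the log-likelihood ratio (LLR) of the symbol $x_k \in \{+1, -1\}$
at observation node $i$, denoted by $\Lambda_i^k$, can be written as
$$\Lambda_i^k = \log \frac{p(y_i \mid \mathbf{H}, x_k = +1)}{p(y_i \mid \mathbf{H}, x_k = -1)} = \frac{4}{\sigma^2_{z_{ik}}}\, \Re\big(h_{ik}^{*}\,(y_i - \mu_{z_{ik}})\big). \tag{7.37}$$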
The LLR values computed at the observation nodes are passed to the variable
nodes (Fig. 7.15(a)). Using these LLRs, the variable nodes compute the
probabilities
$$p_i^{k+} \triangleq p_i(x_k = +1 \mid \mathbf{y}) = \frac{\exp\big(\sum_{l=1,\, l \neq i}^{n_r} \Lambda_l^k\big)}{1 + \exp\big(\sum_{l=1,\, l \neq i}^{n_r} \Lambda_l^k\big)}, \tag{7.38}$$
and pass them back to the observation nodes (Fig. 7.15(b)). This message passing
is carried out for a certain number of iterations. Messages can be damped as
described in Section 7.2.5 and then passed. Finally, $x_k$ is detected as
$$\hat{x}_k = \mathrm{sgn}\Big(\sum_{i=1}^{n_r} \Lambda_i^k\Big). \tag{7.39}$$

Initialization
1. $\Lambda_i^k = 0$, $p_i^{k+} = 0.5$, $s_k = 0$, $\mu_{z_{ik}} = \sigma^2_{z_{ik}} = s_{\mu_{z_i}} = 0$,
   $s_{\sigma^2_{z_i}} = 0$, $\forall i = 1, \ldots, n_r$, $k = 1, \ldots, n_t$
2. for $t = 1$ to num_iter
Computation of LLRs at observation nodes
3.    for $i = 1$ to $n_r$
4.       $s_{\mu_{z_i}} = \sum_{j=1}^{n_t} h_{ij}\,(2 p_i^{j+} - 1)$
5.       $s_{\sigma^2_{z_i}} = 4 \sum_{j=1}^{n_t} |h_{ij}|^2\, p_i^{j+} (1 - p_i^{j+})$
6.       for $k = 1$ to $n_t$
7.          $\mu_{z_{ik}} = s_{\mu_{z_i}} - h_{ik}\,(2 p_i^{k+} - 1)$
8.          $\sigma^2_{z_{ik}} = s_{\sigma^2_{z_i}} - 4 |h_{ik}|^2\, p_i^{k+} (1 - p_i^{k+}) + \sigma^2$
The algorithm listing is given in Table 7.2. As will be seen from the complex-
ity discussion next, approximating the interference as scalar Gaussian in (7.34)
greatly simplies the computation of messages. Because of this scalar Gaussian
approximation (SGA), the algorithm is referred to as FG BP-SGA algorithm.
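The message updates of (7.34) to (7.39) can be sketched compactly. The code below is an illustrative sketch for BPSK under stated assumptions, not the book's Table 7.2: the function name `fg_bp_sga_detect` is hypothetical, damping is applied to the probabilities $p_i^{k+}$, and the logistic function in (7.38) is evaluated through `tanh` for numerical stability.

```python
import math


def fg_bp_sga_detect(H, y, sigma2, num_iter=10, damping=0.0):
    """Sketch of factor-graph BP with scalar Gaussian approximation of interference.

    H is a list of n_r rows of n_t channel gains, y a list of n_r received samples.
    Returns (x_hat, llr_sums), where llr_sums[k] is the summed LLR of x_k.
    """
    n_r, n_t = len(H), len(H[0])
    # p[i][k]: probability that x_k = +1 as seen by observation node i.
    p = [[0.5] * n_t for _ in range(n_r)]
    llr = [[0.0] * n_t for _ in range(n_r)]
    for _ in range(num_iter):
        # Observation nodes: mean and variance of the interference, then the LLR.
        for i in range(n_r):
            s_mu = sum(H[i][j] * (2 * p[i][j] - 1) for j in range(n_t))
            s_var = 4 * sum(abs(H[i][j]) ** 2 * p[i][j] * (1 - p[i][j])
                            for j in range(n_t))
            for k in range(n_t):
                mu = s_mu - H[i][k] * (2 * p[i][k] - 1)
                var = s_var - 4 * abs(H[i][k]) ** 2 * p[i][k] * (1 - p[i][k]) + sigma2
                llr[i][k] = 4.0 / var * (H[i][k].conjugate() * (y[i] - mu)).real
        # Variable nodes: extrinsic probability sent back to each observation node.
        for k in range(n_t):
            total = sum(llr[l][k] for l in range(n_r))
            for i in range(n_r):
                extrinsic = total - llr[i][k]
                p_new = 0.5 * (1 + math.tanh(extrinsic / 2))  # exp(L) / (1 + exp(L))
                p[i][k] = damping * p[i][k] + (1 - damping) * p_new
    llr_sums = [sum(llr[i][k] for i in range(n_r)) for k in range(n_t)]
    return [1 if s >= 0 else -1 for s in llr_sums], llr_sums


# Toy 2 x 2 example (hypothetical numbers): x = [+1, -1] sent without noise.
H_demo = [[1.0, 0.1], [0.1, 1.0]]
y_demo = [0.9, -0.9]
x_demo, llr_demo = fg_bp_sga_detect(H_demo, y_demo, sigma2=0.5)
print(x_demo, llr_demo)
```

Computing the row sums `s_mu` and `s_var` once per observation node and subtracting the $k$th term is what keeps the per-iteration work proportional to $n_t n_r$.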
7.5.2 Performance
Figure 7.16 shows the BER performance of the FG BP-SGA algorithm in $n_t \times n_r$
V-BLAST MIMO with $n_t = n_r = 8, 16, 32$ and BPSK. The number of BP
iterations and the message damping factor used are 10 and 0.4, respectively.
It is seen that, like the MRF BP algorithm, the performance of the FG BP-SGA
algorithm also approaches SISO AWGN performance for increasing $n_t$.
Importantly, the FG BP-SGA algorithm achieves this very good performance
with $O(n_t)$ per-symbol complexity compared to the $O(n_t^2)$ per-symbol complexity
of MRF BP. This illustrates that the SGA is very attractive in both complexity
and performance for signal detection in large dimensions.
[Figure 7.16: BER versus average received SNR (dB) of FG BP-SGA for $n_t = n_r = 8, 16, 32$, with ML (sphere decoding) for $n_t = n_r = 32$ and SISO AWGN for reference.]
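$$\mathbf{y} = \mathbf{h}_i x_i + \underbrace{\sum_{j=0,\, j \neq i}^{n_t - 1} \mathbf{h}_j x_j + \mathbf{n}}_{\triangleq\, \tilde{\mathbf{n}}}, \tag{7.40}$$
$$\mathbf{C}_i = \sigma^2 \mathbf{I}_{n_r} + \sum_{j=0,\, j \neq i}^{n_t - 1} \mathbf{h}_j \mathbf{h}_j^T\, 4 p_j^{+} (1 - p_j^{+}), \tag{7.41}$$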
where $p_j^{+} = p(x_j = +1)$. The second term on the right-hand side of (7.41) contains
the off-diagonal elements of the covariance matrix, which is proportional
to the $\mathbf{H}\mathbf{H}^H$ matrix. It is noted that, as the size of $\mathbf{H}$ increases, the diagonal entries
of $\mathbf{H}\mathbf{H}^H$ become increasingly larger in magnitude than the off-diagonal
entries, showing the channel hardening behavior discussed in Section 2.2. This
can be observed in Fig. 7.17, where the intensity plots of $\mathbf{H}\mathbf{H}^H$ matrices for
$n_t = n_r = 8, 16, 32, 64$ are shown. It can be seen that the diagonal entries of
$\mathbf{H}\mathbf{H}^H$ are relatively more prominent than their off-diagonal counterparts in a
$64 \times 64$ MIMO channel than in an $8 \times 8$ MIMO channel. Therefore, one can
ignore the off-diagonal terms for large values of $n_t = n_r$ without much penalty.
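The growing dominance of the diagonal of $\mathbf{H}\mathbf{H}^H$ can be checked numerically. The following is a small sketch under the assumption of i.i.d. $\mathcal{CN}(0, 1)$ channel gains; the function name `offdiag_to_diag_ratio` and the choice of metric (average off-diagonal magnitude divided by average diagonal entry) are illustrative and do not come from the book.

```python
import random


def offdiag_to_diag_ratio(n, rng):
    """Average |off-diagonal| over average diagonal of H H^H for an n x n channel.

    Assumption: the entries of H are i.i.d. CN(0, 1).
    """
    s = 0.5 ** 0.5  # standard deviation of the real and imaginary parts
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]
         for _ in range(n)]
    diag = 0.0
    off = 0.0
    for a in range(n):
        for b in range(a, n):
            # (a, b) entry of H H^H: inner product of rows a and b.
            g = sum(H[a][k] * H[b][k].conjugate() for k in range(n))
            if a == b:
                diag += abs(g)
            else:
                off += abs(g)
    return (off / (n * (n - 1) / 2)) / (diag / n)


rng = random.Random(0)
ratios = {n: offdiag_to_diag_ratio(n, rng) for n in (8, 16, 32, 64)}
for n, r in ratios.items():
    print(n, round(r, 3))
```

Under this model the diagonal entries grow like $n$ while the off-diagonal magnitudes grow like $\sqrt{n}$, so the ratio is expected to shrink roughly as $1/\sqrt{n}$.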
Figure 7.17 Intensity plots of $\mathbf{H}\mathbf{H}^H$ matrices for $8 \times 8$, $16 \times 16$, $32 \times 32$, and $64 \times 64$
MIMO channels.
What the SGA in the FG BP algorithm does in (7.34) can be viewed as essentially
amounting to ignoring the off-diagonal terms in the covariance matrix as a further
approximation, in which case only variance computation is needed. This in
essence gives one order complexity advantage for the SGA in FG BP compared
to the VGA in PDA. What is more interesting is that this complexity advantage
comes without much price in terms of performance loss when the number of
antennas is large. This is illustrated in Fig. 7.18, where there is a noticeable performance
gap between VGA and SGA in $8 \times 8$ MIMO (due to strong off-diagonal
terms in $8 \times 8$), whereas in $64 \times 64$ MIMO there is almost no performance gap
between SGA and VGA (due to weak off-diagonal terms in $64 \times 64$ MIMO). This
shows that the FG BP algorithm with SGA is very attractive for large MIMO
detection.
The MRF and FG BP algorithms presented above perform well when the modulation
alphabet is BPSK. However, these algorithms do not perform quite so well
in higher-order QAM. An attempt to design an MRF based BP detection algorithm
for higher-order QAM is reported in [39]. This algorithm uses a Gaussian
tree approximation (GTA) to convert the fully-connected graph representing the
MIMO system into a tree, and to carry out BP on the resultant approximated
tree. The GTA BP algorithm uses the real-valued system model and works as
follows.

Figure 7.18 BER performance of FG BP with SGA versus PDA with VGA in $8 \times 8$
and $64 \times 64$ V-BLAST MIMO systems.
The GTA BP algorithm is based on an approximation of the exact probability
function
$$p(\mathbf{x} \mid \mathbf{y}, \mathbf{H}) \propto \exp\Big(-\frac{1}{2\sigma^2} \|\mathbf{H}\mathbf{x} - \mathbf{y}\|^2\Big), \quad \mathbf{x} = (x_1, x_2, \ldots, x_{2n_t}) \in \mathcal{A}^{2n_t}, \tag{7.42}$$
where $\mathcal{A}$ is the underlying PAM alphabet corresponding to a QAM alphabet.
The graph corresponding to (7.42) is fully connected. Applying BP on a fully-connected
graph is exponentially complex and may fail to converge.
Finding an optimal tree approximation of a fully-connected graph would be useful,
since BP has been proved to be optimal for trees. In [40], Chow and Liu
proposed a method to find an optimal tree approximation of a given distribution
that has the minimal Kullback-Leibler (KL) divergence to the true distribution.
They showed that the optimal tree can be learned efficiently via a maximum
spanning tree whose edge weights represent the mutual information between the
two variables corresponding to the edge's end points. The GTA BP algorithm
uses the approximation technique in [40] to construct an approximate tree from
the fully-connected graph corresponding to (7.42), and applies BP on the resulting
approximated tree. The listing of the GTA BP algorithm is given in
Table 7.3. The per-bit complexity of the GTA BP algorithm is $O(n_t^2)$. The BER
performance of the algorithm in $12 \times 12$ V-BLAST MIMO is shown in Fig. 7.19
for 16-QAM. It can be seen that the performance of the GTA BP algorithm is
far from that of the SD, and that further improvements are possible.
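Algorithm:
1. Compute: $\mathbf{z} = (\mathbf{H}^T\mathbf{H} + (\sigma^2/e)\mathbf{I})^{-1}\mathbf{H}^T\mathbf{y}$ and $\mathbf{C} = \sigma^2 (\mathbf{H}^T\mathbf{H} + (\sigma^2/e)\mathbf{I})^{-1}$
2. Denote:
   $f(x_i; \mathbf{z}, \mathbf{C}) = \exp\Big(-\frac{1}{2}\, \frac{(x_i - z_i)^2}{C_{ii}}\Big)$
   $f(x_i \mid x_j; \mathbf{z}, \mathbf{C}) = \exp\Big(-\frac{1}{2}\, \frac{\big((x_i - z_i) - (C_{ij}/C_{jj})(x_j - z_j)\big)^2}{C_{ii} - C_{ij}^2/C_{jj}}\Big)$
3. Compute the maximum spanning tree of the $n$-node graph where the weight
   of the edge between the nodes $i$ and $j$ is the square of the correlation coefficient,
   $\rho_{ij}^2 = C_{ij}^2/(C_{ii} C_{jj})$. Assume the tree is rooted at node $x_1$ and denote the parent
   of node $i$ by $p(i)$.
5. Downward BP message:
   Message from variable $x_i$ to its parent variable $x_{p(i)}$ is computed based
   on all the messages $x_i$ received from its children as
   $m_{i \to p(i)}(x_{p(i)}) = \sum_{x_i \in \mathcal{A}} f(x_i \mid x_{p(i)}; \mathbf{z}, \mathbf{C}) \prod_{j \mid p(j) = i} m_{j \to i}(x_i)$
   If $x_i$ is a leaf node in the tree then the message is simply
   $m_{i \to p(i)}(x_{p(i)}) = \sum_{x_i \in \mathcal{A}} f(x_i \mid x_{p(i)}; \mathbf{z}, \mathbf{C})$
6. Upward BP message:
   Message from a parent variable $x_{p(i)}$ to its child variable $x_i$ is computed based
   on the messages from its parent $x_{p(p(i))}$ and from downward messages received
   from all the siblings of $x_i$
   $m_{p(i) \to i}(x_i) = \sum_{x_{p(i)} \in \mathcal{A}} f(x_i \mid x_{p(i)}; \mathbf{z}, \mathbf{C})\, m_{p(p(i)) \to p(i)}(x_{p(i)}) \prod_{j \mid j \neq i,\, p(j) = p(i)} m_{j \to p(i)}(x_{p(i)})$
7. Belief calculation:
   For root node belief is: $b(x_i) = f(x_i; \mathbf{z}, \mathbf{C}) \prod_{j \mid p(j) = i} m_{j \to i}(x_i)$
   For other nodes belief is: $b(x_i) = m_{p(i) \to i}(x_i) \prod_{j \mid p(j) = i} m_{j \to i}(x_i)$
8. Symbol decoding:
   $\hat{x}_i = \arg\max_{a \in \mathcal{A}} b(a)$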
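Step 3 of the listing, the maximum spanning tree over squared correlation coefficients, can be sketched as follows. This is an illustrative sketch only: it uses Prim's algorithm (the book does not say which spanning tree algorithm is used), and the function name `gta_spanning_tree` and the 0-based node indices are assumptions.

```python
def gta_spanning_tree(C, root=0):
    """Maximum spanning tree with weights rho_ij^2 = C_ij^2 / (C_ii C_jj).

    C is a symmetric covariance matrix given as a list of lists.
    Returns a list parent[] with parent[root] = None (Prim's algorithm).
    """
    n = len(C)

    def weight(i, j):
        return C[i][j] ** 2 / (C[i][i] * C[j][j])

    parent = [None] * n
    in_tree = [False] * n
    in_tree[root] = True
    best = [weight(root, j) for j in range(n)]  # best weight linking j to the tree
    link = [root] * n                           # tree node achieving that weight
    for _ in range(n - 1):
        # Attach the outside node with the strongest link to the current tree.
        j = max((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[j] = True
        parent[j] = link[j]
        for v in range(n):
            if not in_tree[v] and weight(j, v) > best[v]:
                best[v] = weight(j, v)
                link[v] = j
    return parent


# Hypothetical 3 x 3 covariance: node 1 correlates strongly with 0, node 2 with 1.
C_demo = [[1.0, 0.8, 0.1],
          [0.8, 1.0, 0.5],
          [0.1, 0.5, 1.0]]
print(gta_spanning_tree(C_demo))
```

The returned parent list is the quantity $p(i)$ that steps 5 to 7 of the listing rely on.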
[Figure 7.19: BER versus average received SNR (dB) of GTA BP and SD in $12 \times 12$ V-BLAST MIMO with 16-QAM.]
Often, detection and decoding in communications receivers are carried out as two
independent functions. However, processing detection and decoding functions
jointly can lead to improved performance [41]. Also, turbo equalization that
performs detection and decoding in an iterative manner is known to give coded
performance at low complexities [42]. In [42], a receiver that performs detection
and decoding (i) independently is referred to as a Type-B receiver, (ii) iteratively
(between detection and decoding) is referred to as a Type-C receiver, and (iii)
jointly (optimal) is referred to as a Type-A receiver, as shown in Fig. 7.20.
complexities and that decoding of LDPC codes is also done using message passing
on factor graphs, joint detection-decoding schemes based on message passing on
an integrated factor graph that combines MIMO constraints as well as LDPC
code constraints can be devised to achieve improved performance. The joint
factor graph formulation would involve two sets of factor nodes, one representing
the received vector over the MIMO channel and the other representing the LDPC
check equations. The marginalization of the joint probabilities is done through
message passing from the different sets of nodes and appropriately combining
them at the variable nodes. A message passing scheme on a combined factor
graph for joint detection-decoding in large MIMO systems and its coded BER
performance are presented in the following subsections.
LDPC decoding
The LDPC decoding algorithm is a message passing algorithm, which gives the
APPs of the coded bits. The LDPC decoder graph is described by the parity
check matrix, $\mathbf{F}$, of dimension $n \times (n - k)$. If $\mathbf{b}$ is a valid codeword, then $\mathbf{b}\mathbf{F} = \mathbf{0}$.
The graph over which the messages are passed is a bipartite graph, consisting
of $n$ variable nodes corresponding to the coded bits in a block and $n - k$ check
nodes corresponding to the check equations. The message passing algorithm for
LDPC decoding can be briefly described as follows.
Steps (1) and (2) are repeated until $\mathbf{b}\mathbf{F} = \mathbf{0}$ or a certain number of iterations
are completed. At the end, decisions on the bits are made based on the
probabilities in Step (3).
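$$p(\mathbf{x} \mid S, \mathbf{y}) \propto p(\mathbf{x}, S, \mathbf{y}) = p(S \mid \mathbf{x})\, p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}), \tag{7.45}$$
where
$$p(S \mid \mathbf{x}) = \prod_{j=1}^{n-k} p(S_j \mid \mathbf{x}). \tag{7.46}$$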
A graph whose joint probability factorizes according to the above equation can
be constructed, and marginalization on this graph gives the probability of the
received symbols. The constructed graph consists of three sets of nodes, namely,
the variable nodes set, the observation nodes set, and the check nodes set. There
are $M n_r$ observation nodes corresponding to the received vectors, $M n_t = n$
variable nodes corresponding to the transmitted coded symbol vectors over $M$
channel uses, and $n - k$ check nodes corresponding to the check equations of the
LDPC code. Figures 7.21 and 7.22 illustrate the joint graph and the messages
passed over it.

Figure 7.21 Illustration of the joint graph with observation nodes $y_i$, variable nodes $x_l$,
and check nodes $c_j$.
[Figure 7.22: messages exchanged between a variable node $x_l$ and (a) the check nodes ($R_{jl}$, $Q_{lj}$), (b) the observation nodes ($P_{li}$, $\Lambda_{il}$).]
The messages passed over the graph are: (i) $P_{li}$, the message from variable
node $x_l$ to observation node $y_i$; (ii) $Q_{lj}$, the message from variable node $x_l$ to
check node $c_j$; (iii) $R_{jl}$, the message from check node $c_j$ to variable node $x_l$;
and (iv) $\Lambda_{il}$, the message from observation node $y_i$ to variable node $x_l$, where
$l = 1, \ldots, n$, $i = 1, \ldots, M n_r$, $j = 1, \ldots, n - k$, $m = 1, \ldots, M$, and $n$ is chosen
such that $n = M n_t$. It can be observed that $m = \lceil i/n_r \rceil = \lceil l/n_t \rceil$. The message
$\Lambda_{il}$ is computed as in (7.37) with only the corresponding $\mathbf{H}^{(m)}$ and $\mathbf{x}^{(m)}$. Thus,
$\Lambda_{il}$ is a function of all $P_{ri}$, where $r \in \{l' \mid \lceil l'/n_t \rceil = m\} \setminus l$.
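$$R_{jl} = \ln \frac{p(S_j \mid x_l = +1)}{p(S_j \mid x_l = -1)} = \ln \frac{1 + \prod_{r \in N_v(j) \setminus l} \big(1 - 2p(b_r = 1)\big)}{1 - \prod_{r \in N_v(j) \setminus l} \big(1 - 2p(b_r = 1)\big)}, \tag{7.47}$$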
where $N_c(l)$ is the set of check nodes connected to $x_l$, $N_v(j)$ is the set of variable
nodes connected to $c_j$, and $N_o(l)$ is the set of observation nodes connected to $x_l$,
$N_o(l) = \{i' \mid \lceil i'/n_r \rceil = m\}$. In typical LDPC decoding, the computation of $R_{jl}$
is simplified by the use of the $\tanh(\cdot)$ function. The messages $P_{li}$ and $Q_{lj}$ are
computed as given by (7.45) and only the extrinsic information from one set of
nodes is passed to the other. The LLRs or the beliefs of the symbols at the end
of an iteration are given by
$$L_l = \sum_{i \in N_o(l)} \Lambda_{il} + \sum_{j \in N_c(l)} R_{jl}. \tag{7.50}$$
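The check node message in (7.47) and its $\tanh(\cdot)$ form mentioned above can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, the input is the list of probabilities $p(b_r = 1)$ of the other variable nodes on the check, and every probability is assumed to lie strictly between 0 and 1.

```python
import math


def check_to_variable_llr(p_one):
    """R_jl from (7.47): p_one holds p(b_r = 1) for all r in N_v(j) except l."""
    prod = 1.0
    for p in p_one:
        prod *= 1.0 - 2.0 * p
    return math.log((1.0 + prod) / (1.0 - prod))


def check_to_variable_llr_tanh(p_one):
    """Same message via the tanh rule, using LLRs L_r = ln((1 - p_r) / p_r)."""
    prod = 1.0
    for p in p_one:
        llr = math.log((1.0 - p) / p)
        prod *= math.tanh(llr / 2.0)  # equals 1 - 2 p
    return 2.0 * math.atanh(prod)


p_demo = [0.1, 0.3, 0.8]
print(check_to_variable_llr(p_demo), check_to_variable_llr_tanh(p_demo))
```

Both forms agree because $\tanh(L_r/2) = 1 - 2p(b_r = 1)$ and $\ln\frac{1 + u}{1 - u} = 2\tanh^{-1}(u)$.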
[Figure: coded BER versus average SNR (dB) for individual, iterative, and joint detection/decoding; LDPC code of $n = 2640$ and rate 1/2, $n_t = n_r = 64$.]
Type-C receiver, the numbers of local iterations in detection and decoding,
respectively, are 1 and 10, with two global (turbo) iterations between detection
and decoding. For detection and decoding, the number of iterations is 10. Belief
damping with a damping factor of 0.2 is used. It can be observed that the turbo
equalizer and joint detection-decoding perform significantly better than the
individual detection and decoding scheme, and that the joint detection-decoding
performance is better than the turbo equalizer.
The total complexity of the factor graph based detection scheme is $O(n_t n_r)$
for both the computation of the messages at observation nodes and at variable
nodes [31]. The LDPC decoding algorithm requires $O(cn)$
additions for variable node messages and $O(r(n - k))$ multiplications for check
node messages, where $c$ and $r$ are the column and row weights of the parity
check matrix, respectively. The joint message passing algorithm requires the
same complexity for the computation of messages at observation nodes and check
nodes. The complexity of variable node messages computation is $O(n_t M (n_r M + c))$.
The overall complexity therefore scales well for large numbers of antennas.
systems, these specially designed LDPC codes can outperform off-the-shelf LDPC
codes designed for non-fading AWGN channels.
An LDPC code is a linear block code for which the $m \times n$ parity check matrix
of interest has a low density of ones. A regular LDPC code is one for which the
parity check matrix of interest has a fixed number ($w_c$) of ones in every column
and a fixed number ($w_r$) of ones in every row. Each code bit is involved with $w_c$
parity constraints and each parity constraint involves $w_r$ bits. Low density means
$w_c \ll m$ and $w_r \ll n$. The number of ones in the parity matrix is $w_c n = w_r m$.
So, $m \geq n - k$ means the code rate $r = k/n \geq 1 - (w_c/w_r)$, and thus $w_c < w_r$.
An irregular LDPC code is one in which the row weights and/or column weights
of the parity check matrix are not constant. Irregular LDPC codes are known to
generally perform better than regular LDPC codes [45].
A way to construct irregular LDPC codes is to optimize the degree distribution
using either density evolution [46] or EXIT charts [47]. One requirement in
LDPC code design using the EXIT chart approach is knowledge of the EXIT
characteristics of the detector of interest on a given channel. For simple detectors
and channels, the EXIT behavior can be analytically characterized in closed
form [48]. When such analytical characterization is not tractable, one can resort
to Monte Carlo simulations to obtain the EXIT curves [47].
One can design irregular LDPC codes for large MIMO systems by first characterizing
the EXIT characteristics of the FG BP-SGA detector in Section 7.5,
then obtaining the EXIT curves for the combination of the detector and decoder
using knowledge of the EXIT characteristics of the detector, and finally
constructing LDPC codes by matching the EXIT chart of the combined detector-decoder
with the EXIT chart of the LDPC check node set. That is, the ability
to generate EXIT curves for the joint message passing detector-decoder for a
large number of antennas can be exploited for the purpose of designing efficient
LDPC codes for large MIMO systems. Such an EXIT chart based LDPC code
design approach applied in a large MIMO system model in Section 7.7.1 with
QPSK modulation is described in the following subsections.
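where $\gamma$ is the average SNR at the receiver. To obtain the EXIT curve, the input
a priori LLR value $A$ is modeled as $A \sim \mathcal{N}(x\mu_A, \sigma_A^2)$ such that $\mu_A = \sigma_A^2/2$ and
$x \in \{+1, -1\}$. Then $I_{A,\text{det}}$ can be written as [48]
$$
\begin{aligned}
I_{A,\text{det}}(\sigma_A) &= I(X; A) \\
&= \frac{1}{2} \sum_{x = -1, 1} \int_{-\infty}^{+\infty} P_A(z \mid X = x)\, \log_2 \frac{2 P_A(z \mid X = x)}{P_A(z \mid X = -1) + P_A(z \mid X = 1)}\, dz \\
&= J(\sigma_A).
\end{aligned}
\tag{7.52}
$$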
$$I_{E,\text{det}}(\sigma_A) = \frac{1}{2} \sum_{x = -1, 1} \int_{-\infty}^{+\infty} P_E(z \mid X = x)\, \log_2 \frac{2 P_E(z \mid X = x)}{P_E(z \mid X = -1) + P_E(z \mid X = 1)}\, dz. \tag{7.53}$$
The above extrinsic information depends on the detector used. For the considered
system and the FG BP-SGA detector, the analytical evaluation of $I_{E,\text{det}}$ in (7.53)
is a difficult task. For simplicity, one can resort to the computation of $P_E(z \mid X = x)$
through Monte Carlo simulations. The EXIT curves can be computed by
simulating (7.53) in conjunction with the simulation of the FG BP-SGA detection
algorithm for various values of received SNRs. Figure 7.24(a) shows the EXIT
curves obtained for the FG BP-SGA detector for a $16 \times 16$ MIMO system for
different SNR values, and Fig. 7.24(b) shows the EXIT curves of the FG BP-SGA
detector for $n_t = n_r = 16, 32, 64, 128, 256$ MIMO configurations at 3 dB
SNR.
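The function $J(\sigma_A)$ in (7.52) has no closed form, but it is easy to evaluate numerically. The sketch below is one possible approach, not the book's procedure: it uses the fact that, for a Gaussian a priori LLR with mean $\sigma_A^2/2$, the integrand of (7.52) simplifies so that $J(\sigma_A) = 1 - E[\log_2(1 + e^{-A}) \mid X = +1]$, and it integrates with the trapezoidal rule over $\pm 10$ standard deviations. The function name `J` follows the text; the argument `steps` is an assumption.

```python
import math


def J(sigma_a, steps=4000):
    """Mutual information between X in {+1, -1} and A ~ N(x sigma^2/2, sigma^2)."""
    if sigma_a <= 0:
        return 0.0
    mu = sigma_a ** 2 / 2.0
    lo = mu - 10.0 * sigma_a
    hi = mu + 10.0 * sigma_a
    h = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        z = lo + n * h
        pdf = math.exp(-(z - mu) ** 2 / (2.0 * sigma_a ** 2)) / (
            sigma_a * math.sqrt(2.0 * math.pi))
        # log2(1 + exp(-z)), written so that exp() never overflows.
        if z >= 0:
            soft = math.log1p(math.exp(-z))
        else:
            soft = -z + math.log1p(math.exp(z))
        weight = 0.5 if n in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * pdf * soft / math.log(2.0)
    return 1.0 - h * total


for s in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(s, round(J(s), 4))
```

$J$ rises monotonically from 0 toward 1 as $\sigma_A$ grows, which is what lets the a priori information axis of an EXIT chart be swept by varying $\sigma_A$.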
Figure 7.24 $I_{E,\text{det}}$ versus $I_{A,\text{det}}$ EXIT curves of FG BP-SGA detector in large MIMO
systems: (a) different SNRs (2 to 6 dB); (b) different $n_t = n_r$ (16, 32, 64, 128, 256) at SNR = 3 dB.
is evaluated as [47]
where $I_{E,\text{joint}}$ is the extrinsic information and $I_{A,\text{joint}}$ is the a priori information
at the block formed by the combination of the variable nodes of the FG BP-SGA
detector and LDPC decoder. Figure 7.25 illustrates the transfer of mutual
information between the combined observation node/variable node (ON/VN)
block and the LDPC check node block of the joint detector/decoder.
[Figure 7.25: combined ON/VN block (FG BP-SGA observation nodes $\{y_1, \ldots, y_i, \ldots, y_n\}$ and joint FG BP-SGA/LDPC variable nodes $\{x_1, \ldots, x_l, \ldots, x_n\}$) exchanging mutual information ($I_{E,JV}$, $I_{A,CN}$, $I_{E,CN}$, $I_{A,\text{joint}}$) with the LDPC CN block $\{c_1, \ldots, c_j, \ldots, c_m\}$. S: set of check node equations; ON: observation nodes; VN: variable nodes.]
The EXIT curves for the combined detector-decoder for $n_t = n_r = 16, 32, 64,
128, 256$ MIMO systems at 3 dB SNR evaluated using the EXIT curves of the
FG BP-SGA detector and the LDPC variable nodes are shown in Fig. 7.26(a).
The decoding trajectory for $d_v = 3$, $d_c = 6$, $n_t = n_r = 16$ and 3 dB SNR is
plotted in Fig. 7.26(b). It is necessary that the EXIT curve of the CN set lies
below the EXIT curve of the combined variable nodes set for the decoder to
converge, which must be satisfied by the code design.
Figure 7.26 (a) EXIT curves $d_v = 3$, different $n_t = n_r$ and (b) mutual information
trajectory of joint detector-decoder for large MIMO systems at SNR = 3 dB,
$d_v = 3$, $d_c = 6$, $16 \times 16$ MIMO.
then given by
$$\bar{d}_v = E(d_v) = \sum_{i=1}^{D_v} p_v^i d_v^i = \mathbf{p}_v^T \mathbf{d}_v, \tag{7.55}$$
number of edges incident at the variable nodes is the same as the number of
edges incident at the CNs, ndv = (n k)dc for a xed dc . Thus, dv = (1 R)dc .
The probability that an edge is connected to a variable node of degree div is
pive = npiv div /ndv . For a xed dc , let D = (1/(1 R)dc ) diag(d1v , . . . , dD
v ). Hence,
v
162 Detection/decoding based on message passing on graphical models
pve = [p1ve p2ve pD
ve ] = Dpv , and pve 1 = 1. Since the EXIT curve of a mixture
v
of codes is the same as the average of the individual codes, the effective EXIT
curve of the mixture of codes with varying variable node degree is
$$I_{E,joint}^{\mathrm{eff}}(\cdot, I_{A,joint}) = \sum_{i=1}^{D_v} p_{ve}^i\, I_{E,joint}(\cdot, d_v^i, I_{A,joint}, \cdot). \tag{7.56}$$
Since a closed-form analytical expression for $I_{E,det}$, and hence for
$I_{E,joint}$, is unavailable, $I_{E,joint}$ is evaluated at $N$ different
points, using which (7.56) is written in the form
$$p_{ce}^j = \frac{(n - k)\,p_c^j d_c^j}{(n - k)\,\bar{d}_c}, \qquad \mathbf{p}_{ce} = [p_{ce}^1\ p_{ce}^2\ \cdots\ p_{ce}^{D_c}]. \tag{7.59}$$
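Let $\bar{R} = 1 - R$. So $\bar{d}_v$ becomes $\bar{d}_v = \bar{R}\bar{d}_c$.
To start the optimization, fix either $\bar{d}_v$ or $\bar{d}_c$. The effective
EXIT curve for a varying CN degree is
$$I_{A,CN}^{\mathrm{eff}}(I_{E,CN}) = \sum_{j=1}^{D_c} p_{ce}^j\, I_{A,CN}(d_c^j, I_{E,CN}). \tag{7.60}$$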
Let $\mathbf{Q}_{A,CN}$ be an $N \times D_c$ matrix whose element in the $g$th
row and $j$th column is $I_{A,CN}(d_c^j, I_{E,CN}^g)$, where $I_{E,CN}^g$ is the
$g$th $I_{E,CN}$ value, $g = 1, \ldots, N$. Write (7.60) in the form
$d_v^i$   2        8        10       16       18       24
$p_v^i$   0.3928   0.1990   0.2160   0.0505   0.1244   0.0173

$d_c^i$   8        12       24       64
$p_c^i$   0.6216   0.2072   0.0604   0.1108
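As a quick numerical check of the relations around (7.55) and (7.59), the
following sketch converts the node-perspective degree distributions listed
above into edge-perspective distributions and recovers the code rate from
$\bar{d}_v = (1 - R)\bar{d}_c$. The helper name `edge_perspective` is
introduced here purely for illustration. With these values the mean degrees
come out at approximately 8 and 16, i.e., a rate of about 1/2.

```python
import numpy as np

# Node-perspective degree distributions taken from the table above.
d_v = np.array([2, 8, 10, 16, 18, 24], dtype=float)
p_v = np.array([0.3928, 0.1990, 0.2160, 0.0505, 0.1244, 0.0173])
d_c = np.array([8, 12, 24, 64], dtype=float)
p_c = np.array([0.6216, 0.2072, 0.0604, 0.1108])


def edge_perspective(p, d):
    """Return (mean degree, edge-perspective distribution).

    Hypothetical helper: an edge is attached to a node of degree d[i]
    with probability p[i] * d[i] / mean_degree.
    """
    mean_degree = float(p @ d)
    return mean_degree, p * d / mean_degree


dv_bar, p_ve = edge_perspective(p_v, d_v)
dc_bar, p_ce = edge_perspective(p_c, d_c)

# Edge counting gives n * dv_bar = (n - k) * dc_bar, so R = 1 - dv_bar / dc_bar.
rate = 1.0 - dv_bar / dc_bar
print(dv_bar, dc_bar, rate)
```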
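$d_c^{D_c}$). Then, $\mathbf{p}_{ve} = K_v\mathbf{V}\mathbf{p}_v$ and
$\mathbf{p}_{ce} = K_c\mathbf{C}\mathbf{p}_c$, where the scalars $K_v$ and $K_c$
are given by $K_v = 1/\bar{d}_v$ (for fixed $\bar{d}_v$) and
$K_v = 1/(\bar{R}\bar{d}_c)$ (for fixed $\bar{d}_c$), and
$K_c = \bar{R}/\bar{d}_v$ (for fixed $\bar{d}_v$) and $K_c = 1/\bar{d}_c$ (for
fixed $\bar{d}_c$). $\mathbf{q}_{A,CN}$ and $\mathbf{q}_{E,joint}$ need to be
matched. Thus, the optimization problem is to find $\mathbf{p}_v$ and
$\mathbf{p}_c$ such that
$\|K_v\mathbf{Q}_{E,joint}\mathbf{V}\mathbf{p}_v - K_c\mathbf{Q}_{A,CN}\mathbf{C}\mathbf{p}_c\|$
is minimized, subject to $\|\mathbf{p}_v\|_1 = 1$, $\|\mathbf{p}_c\|_1 = 1$,
$\|\mathbf{V}\mathbf{p}_v\|_1 = 1/K_v$, $\|\mathbf{C}\mathbf{p}_c\|_1 = 1/K_c$,
$p_c^i \geq 0$, and $p_v^i \geq 0$. Let
$$\mathbf{p} = \begin{bmatrix} \mathbf{p}_v \\ \mathbf{p}_c \end{bmatrix}.$$
The optimization problem can then be reduced to
$$\hat{\mathbf{p}} = \arg\min_{\mathbf{p}} \left\| \begin{bmatrix} K_v\mathbf{Q}_{E,JV}\mathbf{V} & -K_c\mathbf{Q}_{A,CN}\mathbf{C} \end{bmatrix} \mathbf{p} \right\|, \tag{7.62}$$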
Figure 7.27 Coded BER (versus average SNR in dB) of (a) rate-1/2 LDPC codes
with n = 4000 for 16 × 16 and 64 × 64 MIMO systems with the joint
detector–decoder, (b) rate-3/4 LDPC codes with n = 2048 for 16 × 16 and
256 × 256 MIMO systems with the joint detector–decoder.
a performance improvement of about 1.8 dB over the regular code in [50] and
1.2 dB over the irregular code in [45], at a coded BER of $10^{-5}$. Likewise,
the performance improvement in 64 × 64 MIMO is 1.6 dB over the regular code in
[50] and 1 dB over the irregular code in [45]. Figure 7.27(b) shows a performance
comparison between the designed LDPC codes and the irregular LDPC code in
the WiMax standard [51] for rate-3/4 and n = 2048 in 16 × 16 and 256 × 256 MIMO.
Here again, it is observed that the designed irregular LDPC codes perform better
by about 2 dB than their regular counterparts and by 1 dB than the irregular
References
[1] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and
Techniques. Cambridge, MA: The MIT Press, 2009.
[2] B. J. Frey, Graphical Models for Machine Learning and Digital Communication.
Cambridge, MA: MIT Press, 1998.
[3] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Understanding belief propagation
and its generalizations," in Exploring Artificial Intelligence in the New
Millennium, G. Lakemeyer and B. Nebel, Eds. San Mateo, CA: Morgan Kaufmann, 2002,
ch. 8.
[4] D. Heckerman and M. P. Wellman, "Bayesian networks," Commun. ACM, vol. 38,
pp. 27–30, 1990.
[5] D. Griffeath, "Introduction to Markov random fields," in Denumerable Markov
Chains, J. G. Kemeny, J. L. Snell and A. W. Knapp, Eds., second edition. New York:
Springer-Verlag, 1976, pp. 425–485.
[6] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the
sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519,
Feb. 2001.
[7] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. San Mateo, CA: Morgan Kaufmann, 1988.
[8] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE Trans.
Inform. Theory, vol. 46, no. 2, pp. 325–343, Mar. 2000.
[9] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance
of Pearl's belief propagation algorithm," IEEE J. Sel. Areas Commun., vol. 16,
no. 2, pp. 140–152, Feb. 1998.
[10] N. Wiberg, "Codes and decoding on general graphs," Ph.D. dissertation,
Linköping University, 1996.
[11] D. J. C. MacKay and R. Neal, "Good codes based on very sparse matrices," in
5th IMA Conf. Cryptography and Coding, vol. 1025, Springer Lecture Notes in
Computer Science. Berlin: Springer, 1995, pp. 100–111.
[12] R. G. Gallager, "Low density parity check codes," IRE Trans. Inform. Theory,
vol. IT-8, no. 2, pp. 21–28, Jan. 1962.
[13] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans.
Inform. Theory, vol. IT-27, no. 5, pp. 533–547, Sep. 1981.
[14] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit
error-correcting coding: Turbo codes," in IEEE ICC'1993, Geneva, May 1993,
pp. 1064–1070.
[15] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices,"
IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, Feb. 1999.
[49] A. Ashikhmin, G. Kramer, and S. ten Brink, "Extrinsic information transfer
functions: A model and two properties," in Conf. Inform. Sci. and Sys. (CISS'2002),
Princeton, Mar. 2002, pp. 742–747.
[50] D. J. C. MacKay, "Encyclopedia of sparse graph codes," Online:
http://www.inference.phy.cam.ac.uk/mackay/codes/data.html.
[51] "Air interface for fixed and mobile broadband systems," in IEEE P802.16e Draft,
2005.
8 Detection based on MCMC
techniques
In large systems where the physical laws governing system behavior are
inherently probabilistic and complicated, traditional methods of obtaining
closed-form analytic solutions may not be adequate for the level of detailed
study needed. In such situations, one could simulate the behavior of the system
in order to estimate the desired solution. This approach of using randomized
simulations on computers, which came to be called Monte Carlo methods, is
powerful, elegant, flexible, and easy to implement. From the early days of their
application in simulating neutron diffusion in fissionable material in the late
1940s, Monte Carlo methods have found application in almost every discipline of
science and engineering. Central to the Monte Carlo approach is the generation
of a series of random numbers, often a sequence of numbers between 0 and 1
sampled from a uniform distribution, or, in several other instances, a sequence
of random numbers sampled from other standard distributions (e.g., the normal
distribution) or from more general probability distributions that arise in
physical models. More sophisticated techniques are needed for sampling from more
general distributions, one such technique being acceptance–rejection sampling.
These techniques, however, are not well suited for sampling large-dimensional
probability distributions. This situation can be alleviated through the use of
Markov chains, in which case the approach is referred to as the Markov chain
Monte Carlo (MCMC) method [1]. Typically, MCMC methods refer to a collection of
related algorithms, namely, the Metropolis–Hastings algorithm, simulated
annealing, and Gibbs sampling. Before introducing these MCMC algorithms, a brief
introduction to Monte Carlo integration and Markov chains is presented in the
next two sections. Subsequently, large MIMO detection algorithms based on MCMC
techniques are presented.
The Monte Carlo approach was originally developed to use random number
generation to compute complex integrals. The basic idea is to view integration
of a function as an expectation over a certain distribution. Suppose we are
interested in computing the complex integral
$$\int_a^b h(x)\,dx. \tag{8.1}$$
If $h(x)$ can be decomposed as a product of a function $f(x)$ and a PDF $p(x)$
defined over the interval $(a, b)$, then the integral can be expressed as an
expectation of $f(x)$ over the density $p(x)$, i.e.,
$$\int_a^b h(x)\,dx = \int_a^b f(x)p(x)\,dx = E_{p(x)}[f(x)]. \tag{8.2}$$
Often, $m = E_{p(x)}[f(x)]$ is analytically intractable, i.e., the integration
or summation required is too complicated. In such cases, a Monte Carlo estimate
of $m$ is obtained by
$$m \approx \frac{1}{N}\sum_{n=1}^{N} f(X_n). \tag{8.3}$$
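A minimal sketch of the estimate in (8.3), assuming for illustration that
$p(x)$ is uniform on (0, 1) and $f(x) = x^2$, so the exact value is 1/3. The
function name `mc_estimate` and its arguments are hypothetical and chosen only
for this example.

```python
import numpy as np


def mc_estimate(f, draw, n_samples, seed=0):
    """Monte Carlo estimate of E_p[f(X)] as in (8.3).

    `draw(rng, n)` must return n independent samples from p(x).
    """
    rng = np.random.default_rng(seed)
    samples = draw(rng, n_samples)
    return float(np.mean(f(samples)))


# Illustrative choice: p(x) uniform on (0, 1), f(x) = x^2, exact value 1/3.
estimate = mc_estimate(
    f=lambda x: x ** 2,
    draw=lambda rng, n: rng.uniform(0.0, 1.0, n),
    n_samples=200_000,
)
print(estimate)
```

The error of such an estimate shrinks roughly as $1/\sqrt{N}$, which is why
variance reduction matters when each sample is expensive.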
can be approximated by
$$\hat{I}(y) = \frac{1}{N}\sum_{n=1}^{N} f(y|X_n), \tag{8.5}$$
where the $X_n$ are draws from $p(x)$.
In situations where estimating properties of a particular distribution is of
interest, but only samples from a different distribution rather than the
distribution of interest can be generated, the importance sampling technique can
be used. Suppose the density $p(x)$ roughly approximates the density of interest
$q(x)$. Then
$$\int f(x)q(x)\,dx = \int f(x)\frac{q(x)}{p(x)}p(x)\,dx = E_{p(x)}\!\left[\frac{f(x)q(x)}{p(x)}\right]. \tag{8.6}$$
This forms the basis of the importance sampling technique with
$$\int f(x)q(x)\,dx \approx \frac{1}{N}\sum_{n=1}^{N} \frac{f(X_n)q(X_n)}{p(X_n)}, \tag{8.7}$$
where the $X_n$ are draws from the approximating density $p(x)$. Likewise,
$\int f(y|x)q(x)\,dx$ can be approximated by
$$\int f(y|x)q(x)\,dx \approx \frac{1}{N}\sum_{n=1}^{N} \frac{f(y|X_n)q(X_n)}{p(X_n)}. \tag{8.8}$$
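The weighting in (8.7) can be sketched as follows. The example is an
assumption made for illustration: the density of interest $q(x)$ is standard
normal, $f(x)$ is the indicator of the rare event $x > 3$, and the sampling
density $p(x)$ is a unit-variance normal shifted to 3 so that the event occurs
often. The helper names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_estimate(f, q_pdf, p_pdf, p_draw, n_samples, seed=0):
    """Estimate the integral of f(x) q(x) dx from draws of p(x), as in (8.7)."""
    rng = np.random.default_rng(seed)
    x = p_draw(rng, n_samples)
    weights = q_pdf(x) / p_pdf(x)  # corrects for sampling from p instead of q
    return float(np.mean(f(x) * weights))


# Tail probability P(X > 3) for X ~ N(0, 1), sampled through p = N(3, 1).
tail = importance_estimate(
    f=lambda x: (x > 3.0).astype(float),
    q_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    p_pdf=lambda x: normal_pdf(x, 3.0, 1.0),
    p_draw=lambda rng, n: rng.normal(3.0, 1.0, n),
    n_samples=100_000,
)
print(tail)
```

The exact value is about $1.35 \times 10^{-3}$; a plain Monte Carlo run with the
same number of samples from $q(x)$ would see only around a hundred hits of the
event.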
Importance sampling can be used for variance reduction in Monte Carlo methods.
Variance reduction procedures are aimed at increasing the accuracy of the
estimates that can be achieved for a given number of iterations. The idea
behind importance sampling is that certain values of the input random variables
in a simulation have more impact than others on the parameter being estimated.
The variance of the estimator can be reduced if these important values are
emphasized in the simulations. This is done by choosing a distribution which
encourages the important values. Choosing or designing a good biased
distribution is key in importance sampling. The benefit of a good distribution
can be huge savings in simulation run time. Direct use of such biased
distributions results in a biased estimator. However, the simulation outputs can
be weighted to correct for the use of the biased distribution to get an unbiased
estimate.

8.2 Markov chains
Let $X_t$ denote a random variable at time $t$, and the possible values that it
can take form a finite or countable set called the state space
$S = \{s_j : j = 1, 2, \ldots, K\}$, $K \leq \infty$. A Markov chain is a
sequence of random variables $X_0, X_1, X_2, \ldots$ with the Markov property,
namely,
$$p(X_{t+1} = s_j | X_t = s_i, X_{t-1} = s_{i_{t-1}}, \ldots) = p(X_{t+1} = s_j | X_t = s_i), \tag{8.9}$$
for all $s_j, s_i, s_{i_{t-1}}, \ldots \in S$ and $t = 0, 1, 2, \ldots$, i.e.,
given the current state, the future and past states are independent. The
probability
$$p_{ij} = p(X_{t+1} = s_j | X_t = s_i), \quad s_i, s_j \in S, \tag{8.10}$$
is called the transition probability from state $s_i$ to state $s_j$ (the
probability that the chain moves from state $s_i$ to state $s_j$ in a single
step). For every $i$, $\sum_{j=1}^{K} p_{ij} = 1$. A state $s_i$ is called an
absorbing state if $p_{ii} = 1$, i.e., once the chain enters this state, it will
not come out of it.
Let $\pi_j(t) = p(X_t = s_j)$ denote the probability that the chain is in state
$s_j$ at time $t$, and $\boldsymbol{\pi}(t)$ denote the $K$-length row vector of
these state probabilities at time $t$. The chain is started by specifying a
starting vector $\boldsymbol{\pi}(0)$, i.e., an initial distribution
$\{\pi_j(0) : j = 1, 2, \ldots, K\}$ with $p(X_0 = s_j) = \pi_j(0)$ and
$\sum_{j=1}^{K} \pi_j(0) = 1$. The probability that the chain is in state $s_j$
at time (or step) $t + 1$ is given by
$$\pi_j(t+1) = p(X_{t+1} = s_j) = \sum_i p(X_{t+1} = s_j | X_t = s_i)\,p(X_t = s_i) = \sum_i p_{ij}\,\pi_i(t). \tag{8.11}$$
Successive iterations of the above equation describe the evolution of the chain.
Defining the transition probability matrix $\mathbf{P}$ as a $K \times K$ matrix
whose $(i, j)$th element is $p_{ij}$, the evolution equation (8.11) can be
compactly written in matrix form as
$$\boldsymbol{\pi}(t+1) = \boldsymbol{\pi}(t)\mathbf{P}. \tag{8.12}$$
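The evolution in (8.12) amounts to repeated multiplication of a row vector by
$\mathbf{P}$. The sketch below uses a three-state transition matrix whose
entries are made up for illustration; `evolve` is a hypothetical helper name.

```python
import numpy as np

# Illustrative 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])


def evolve(pi0, P, steps):
    """Apply pi(t+1) = pi(t) P for the given number of steps."""
    pi = np.asarray(pi0, dtype=float)
    for _ in range(steps):
        pi = pi @ P
    return pi


pi_1 = evolve([1.0, 0.0, 0.0], P, steps=1)    # one step from state s_1
pi_50 = evolve([1.0, 0.0, 0.0], P, steps=50)  # state probabilities after 50 steps
print(pi_1, pi_50)
```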
which is the probability that the chain visits state $s_j$ in finite time if it
starts in state $s_i$. In particular, $f_{ii}$ is the probability of returning
to the starting state $s_i$ in finite time. A state $s_j$ is said to be
  transient if $f_{jj} < 1$,
  recurrent if $f_{jj} = 1$, and
  absorbing if $p_{jj} = 1$.
If state $s_j$ is recurrent, then it is said to be positive recurrent if the
mean time between revisits is finite, i.e.,
$$\sum_{n=1}^{\infty} n f_{jj}^{(n)} < \infty. \tag{8.17}$$
be aperiodic, which means that the chain is not forced into some cycle of fixed
length between certain states. It can be seen that if $\mathbf{P}$ has no
eigenvalues equal to $-1$ then the chain is aperiodic.
The limit $\lim_{n\to\infty} p_{jj}^{(n)}$ may or may not exist. For a transient
or null recurrent state $s_j$, $\lim_{n\to\infty} p_{jj}^{(n)} = 0$, i.e., the
probability of the chain being in state $s_j$ eventually goes to zero. If state
$s_j$ is positive recurrent and periodic, then $\lim_{n\to\infty} p_{jj}^{(n)}$
will not converge. If $s_j$ is positive recurrent and aperiodic, then
$\lim_{n\to\infty} p_{jj}^{(n)}$ will converge to the steady state probability
$\pi_j > 0$. A positive recurrent and aperiodic Markov chain reaches a
stationary distribution $\boldsymbol{\pi}$, where the vector of probabilities of
being in any particular given state is independent of the initial condition
$\boldsymbol{\pi}(0)$. The stationary distribution satisfies
$$\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}. \tag{8.19}$$
A sufficient condition for a unique stationary distribution is that the detailed
balance or time reversibility condition, namely, $\pi_i p_{ij} = \pi_j p_{ji}$,
$\forall i, j$, is satisfied.
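A small numerical sketch of (8.19) and the detailed balance condition, under the
assumption of an illustrative three-state chain (the matrix entries are not
from the text). The stationary vector is obtained as the left eigenvector of
$\mathbf{P}$ for eigenvalue 1; both helper names are hypothetical.

```python
import numpy as np

# Illustrative 3-state chain in which only neighbouring states communicate.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = int(np.argmin(np.abs(eigvals - 1.0)))
    v = np.real(eigvecs[:, k])
    return v / v.sum()


def satisfies_detailed_balance(pi, P, tol=1e-10):
    """Check pi_i p_ij == pi_j p_ji for all i, j."""
    flow = pi[:, None] * P  # flow[i, j] = pi_i * p_ij
    return bool(np.allclose(flow, flow.T, atol=tol))


pi = stationary_distribution(P)
reversible = satisfies_detailed_balance(pi, P)
print(pi, reversible)
```

For this particular matrix the balance equations can be solved by hand and give
$\boldsymbol{\pi} = (0.6, 0.3, 0.1)$.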
(4) If the jump increases the density (i.e., if $\alpha > 1$), then accept the
    candidate value (i.e., set $x_t = x^*$) with probability 1, and return to
    Step (2). If the jump decreases the density (i.e., if $\alpha < 1$), then
    accept the candidate value with probability $\alpha$ and reject with
    probability $1 - \alpha$, and return to Step (2).
In summary, we see that Metropolis sampling is a procedure that computes
$$\alpha = \min\left(\frac{f(x^*)}{f(x_{t-1})},\, 1\right) \tag{8.21}$$
in each step and accepts the candidate value $x^*$ with probability $\alpha$.
This procedure generates a Markov chain $(x_0, x_1, x_2, \ldots)$, as the
transition probabilities from $x_t$ to $x_{t+1}$ depend only on $x_t$ and not on
$(x_0, x_1, \ldots, x_{t-1})$. After a sufficiently long burn-in period of, say,
$k$ steps, the chain approaches its stationary distribution and the samples from
the vector $(x_{k+1}, \ldots, x_{k+N})$ are samples from $p(x)$. Hastings [4]
generalized the Metropolis algorithm by using an arbitrary transition
probability function $q(x_1, x_2) = \Pr(x_1 \to x_2)$ and taking the acceptance
probability as
$$\alpha = \min\left(\frac{f(x^*)\,q(x^*, x_{t-1})}{f(x_{t-1})\,q(x_{t-1}, x^*)},\, 1\right). \tag{8.22}$$
This is the Metropolis–Hastings algorithm. The Metropolis algorithm results
when the proposal distribution is symmetric, i.e., $q(x, y) = q(y, x)$. It can
be shown that Metropolis–Hastings sampling generates a Markov chain whose
stationary density is $p(x)$. The chain is said to be poorly mixing if the value
of $x$ remains flat for long periods in the evolution of the chain. For example,
this could correspond to the situation where several consecutive $x^*$ values
are rejected in the accept–reject test. On the other hand, if the value of $x$
varies significantly over iterations, then the chain is said to be mixing well.
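The acceptance rule in (8.22) can be sketched as below. This is an illustrative
implementation under stated assumptions, not the book's code: the target $f(x)$
is an unnormalized Gaussian centred at 2, and the proposal is a Gaussian random
walk, which is symmetric, so the run reduces to Metropolis sampling as in
(8.21). The function and argument names are hypothetical.

```python
import numpy as np


def metropolis_hastings(f, propose, q, x0, n_steps, burn_in, seed=0):
    """Sample from a density proportional to f using the rule in (8.22).

    propose(rng, x) draws a candidate given the current value x;
    q(a, b) is the (possibly unnormalized) proposal density of moving a -> b.
    """
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for t in range(burn_in + n_steps):
        x_new = propose(rng, x)
        alpha = min(f(x_new) * q(x_new, x) / (f(x) * q(x, x_new)), 1.0)
        if rng.uniform() < alpha:  # accept with probability alpha
            x = x_new
        if t >= burn_in:           # keep samples only after burn-in
            samples.append(x)
    return np.array(samples)


step = 1.0  # proposal standard deviation; tuning it changes the mixing
samples = metropolis_hastings(
    f=lambda x: np.exp(-0.5 * (x - 2.0) ** 2),
    propose=lambda rng, x: x + rng.normal(scale=step),
    q=lambda a, b: np.exp(-0.5 * ((b - a) / step) ** 2),
    x0=0.0,
    n_steps=20_000,
    burn_in=1_000,
)
print(samples.mean(), samples.std())
```

A very small `step` gives high acceptance but slow exploration, while a very
large one leads to long runs of rejections; both are forms of poor mixing.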
value. A number of standard distributions can be used for $g(y)$. Since $g(x)$
is not generally equal to $g(y)$, i.e., the proposal distribution in this case
is not generally symmetric, Metropolis–Hastings sampling has to be used. The
proposal distribution can be tuned to adjust the mixing (in particular, the
acceptance probability). For example, this is generally done by (i) adjusting
the variance/eigenvalues of the covariance matrix if a normal/multivariate
normal distribution is used, (ii) changing the range if a uniform distribution
is used, and (iii) changing the degrees of freedom if the chi-square
distribution is used.
where $T$ is called the temperature and $T(t)$ is called the cooling schedule.
Metropolis sampling becomes a special case for $T(t) = 1$, $\forall t$.
Typically, a cooling schedule with a geometric decrease in temperature, given by
$$T(t) = T_0\left(\frac{T_f}{T_0}\right)^{t/n}, \tag{8.24}$$
is used, where $T_0$ is the initial temperature and $T_f$ is the final
temperature at the $n$th iteration. If we want to cool to temperature $T_f$ by
iteration $n$ and subsequently keep the temperature constant at $T_f$, then we
can use
$$T(t) = \max\left(T_0\left(\frac{T_f}{T_0}\right)^{t/n},\, T_f\right). \tag{8.25}$$
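The two schedules (8.24) and (8.25) differ only in what happens after iteration
$n$, which a short sketch makes concrete. The function name, the `hold` flag,
and the numerical values are illustrative assumptions.

```python
def cooling_schedule(t, n, T0, Tf, hold=False):
    """Geometric cooling from T0 towards Tf, reached at iteration n.

    hold=False follows (8.24); hold=True follows (8.25) and keeps the
    temperature at Tf after iteration n instead of cooling further.
    """
    T = T0 * (Tf / T0) ** (t / n)
    return max(T, Tf) if hold else T


# Illustrative values: cool from 10 to 0.1 over 100 iterations.
T_start = cooling_schedule(0, 100, 10.0, 0.1)
T_mid = cooling_schedule(50, 100, 10.0, 0.1)
T_end = cooling_schedule(100, 100, 10.0, 0.1)
T_late_free = cooling_schedule(200, 100, 10.0, 0.1)             # keeps falling
T_late_held = cooling_schedule(200, 100, 10.0, 0.1, hold=True)  # stays at Tf
print(T_start, T_mid, T_end, T_late_free, T_late_held)
```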
$$E[f(x)]_N = \frac{1}{N}\sum_{i=B+1}^{B+N} f(x_i). \tag{8.29}$$
As $N \to \infty$, $E[f(x)]_N \to E[f(x)]$. Likewise, Monte Carlo estimates of
other moments can be computed using the Gibbs sequence. Also, approximate
marginals can be obtained directly using the Gibbs sequence. Alternatively, the
approximate marginals can be obtained from the average of the conditional
densities $p(x|y = y_i)$. Since $p(x) = \int p(x|y)p(y)\,dy = E_y[p(x|y)]$, the
marginal can be approximated by
$$p(x) \approx \frac{1}{N}\sum_{i=B+1}^{B+N} p(x|y = y_i). \tag{8.30}$$
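A minimal sketch of the estimate in (8.29), assuming for illustration a
zero-mean, unit-variance bivariate normal target with correlation $\rho$, whose
conditionals are $x|y \sim N(\rho y, 1 - \rho^2)$ and
$y|x \sim N(\rho x, 1 - \rho^2)$. The first $B$ iterations are discarded as
burn-in and $E[xy] = \rho$ is estimated from the remaining $N$ samples. The
function name is hypothetical.

```python
import numpy as np


def gibbs_bivariate_normal(rho, burn_in, n_samples, seed=0):
    """Gibbs sequence for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cond_std = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    chain = np.empty((n_samples, 2))
    for i in range(burn_in + n_samples):
        x = rng.normal(rho * y, cond_std)  # draw x from p(x | y)
        y = rng.normal(rho * x, cond_std)  # draw y from p(y | x), using new x
        if i >= burn_in:
            chain[i - burn_in] = (x, y)
    return chain


chain = gibbs_bivariate_normal(rho=0.6, burn_in=500, n_samples=50_000)
# Estimate E[f(x, y)] with f = x * y, averaging over post-burn-in samples.
mean_xy = float(np.mean(chain[:, 0] * chain[:, 1]))
print(mean_xy)
```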
where $a \in A$, the modulation alphabet. Note that the posterior density
$p(\mathbf{x}|\mathbf{y}) \propto p(\mathbf{y}|\mathbf{x})p(\mathbf{x})$, where
$p(\mathbf{x})$ is the prior distribution of $\mathbf{x}$. The computation of
(8.31) is clearly prohibitive for large dimensions, in which case one can resort
to Monte Carlo methods. Suppose we can generate samples
$\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}$ from the
distribution $p(\mathbf{x}|\mathbf{y})$. Then, we can approximate the marginal
posterior $p(x_i = a|\mathbf{y})$ by the empirical distribution based on the
corresponding component in the Monte Carlo sample, i.e.,
$x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(N)}$, and approximate the marginalization
in (8.31) as
$$p(x_i = a|\mathbf{y}) \approx \frac{1}{N}\sum_{n=1}^{N} I(x_i^{(n)} = a), \tag{8.32}$$
where $I(\cdot)$ is an indicator function. MCMC methods that use Gibbs sampling
are an effective means to sample from the posterior distribution
$p(\mathbf{x}|\mathbf{y})$. MCMC simulations are found useful in reducing the
exponential complexity in (8.31) to polynomial complexity.
MCMC methods have been applied to design receivers in a number of digital
communication applications including signal detection and decoding in AWGN
channels, ISI channels, CDMA channels, and MIMO channels [9]–[14].
In the rest of this chapter, large MIMO detection algorithms based on MCMC
techniques are presented. It will be seen that a careful choice of sampling distribu-
tion and stopping criteria is needed to simultaneously achieve both near-optimal
performance as well as scalability to large dimensions.
$$\mathbf{y}_c = \mathbf{H}_c\mathbf{x}_c + \mathbf{n}_c, \tag{8.33}$$
where $\mathbf{n}_c$ is the noise vector whose entries are modeled as iid
$\mathcal{CN}(0, \sigma^2)$. We will work with the real-valued system model
corresponding to (8.33), given by
$$\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r, \tag{8.34}$$
[Figure: Uplink multiuser MIMO system. Users 1, 2, ..., K transmit over a
distributed large MIMO channel to a BS with N (tens to hundreds of) receive
antennas.]
where $f(\mathbf{x}) = \mathbf{x}^T\mathbf{H}^T\mathbf{H}\mathbf{x} - 2\mathbf{y}^T\mathbf{H}\mathbf{x}$
is the ML cost. While the ML detector in (8.37) is exponentially complex in $K$
(which is prohibitive for large $K$), the MCMC based algorithms in the following
subsections have a per-symbol complexity that is quadratic in $K$ and they
achieve near-ML performance as well.
The ML detection problem in (8.37) can be solved by using MCMC simulations
[15]. First, consider the conventional Gibbs sampler, which is an MCMC method
used for sampling from distributions of multiple dimensions. In the context of
MIMO detection, the joint probability distribution of interest is
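$$p(x_1, \ldots, x_{2K}|\mathbf{y}, \mathbf{H}) \propto \exp\left(-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{\sigma^2}\right). \tag{8.38}$$
Assume perfect knowledge of channel gain matrix $\mathbf{H}$ at the BS receiver.
Let $t$ denote the iteration index and $i$ denote the coordinate index,
$i = 1, 2, \ldots, 2K$. Each iteration consists of $2K$ coordinate updates. In
each iteration, $2K$ updates are carried out by sampling from distributions as
follows:
$$\begin{aligned}
x_1^{(t+1)} &\sim p(x_1 | x_2^{(t)}, x_3^{(t)}, \ldots, x_{2K}^{(t)}, \mathbf{y}, \mathbf{H}), \\
x_2^{(t+1)} &\sim p(x_2 | x_1^{(t+1)}, x_3^{(t)}, \ldots, x_{2K}^{(t)}, \mathbf{y}, \mathbf{H}), \\
x_3^{(t+1)} &\sim p(x_3 | x_1^{(t+1)}, x_2^{(t+1)}, x_4^{(t)}, \ldots, x_{2K}^{(t)}, \mathbf{y}, \mathbf{H}), \\
&\;\;\vdots \\
x_{2K}^{(t+1)} &\sim p(x_{2K} | x_1^{(t+1)}, x_2^{(t+1)}, \ldots, x_{2K-1}^{(t+1)}, \mathbf{y}, \mathbf{H}).
\end{aligned} \tag{8.39}$$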
The updated symbol vector at the end of each iteration is fed back to the next
iteration for further coordinate updates. The algorithm is run for a certain num-
ber of iterations. The detected symbol vector is chosen to be that symbol vector
which has the least ML cost in all the iterations.
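The coordinate updates in (8.39) can be sketched as follows for the real-valued
model. This is an illustrative implementation under stated assumptions rather
than the authors' code: the alphabet is $\{-1, +1\}$ per real dimension (as for
4-QAM), the problem size is deliberately small, and the name `gibbs_detect` is
hypothetical. The residual $\mathbf{y} - \mathbf{H}\mathbf{x}$ is updated
incrementally so that each coordinate update costs one column operation.

```python
import numpy as np


def gibbs_detect(y, H, sigma2, alphabet, n_iters, seed=0):
    """Conventional Gibbs sampling detector for y = Hx + n (real-valued).

    Returns the visited vector with the least ML cost ||y - Hx||^2.
    """
    rng = np.random.default_rng(seed)
    alphabet = np.asarray(alphabet, dtype=float)
    dim = H.shape[1]
    x = rng.choice(alphabet, size=dim)  # random initial vector
    r = y - H @ x                       # current residual
    best_x, best_cost = x.copy(), float(r @ r)
    for _ in range(n_iters):
        for i in range(dim):
            # Residual and ML cost for every candidate value of x_i.
            cand_r = r[:, None] + np.outer(H[:, i], x[i] - alphabet)
            costs = np.sum(cand_r ** 2, axis=0)
            # Conditional pmf from (8.38); subtracting the minimum cost
            # avoids underflow and does not change the normalized pmf.
            pmf = np.exp(-(costs - costs.min()) / sigma2)
            pmf /= pmf.sum()
            k = int(rng.choice(alphabet.size, p=pmf))
            x[i] = alphabet[k]
            r = cand_r[:, k].copy()
            if costs[k] < best_cost:
                best_cost, best_x = float(costs[k]), x.copy()
    return best_x, best_cost


# Small illustrative instance: 8 real dimensions.
rng = np.random.default_rng(1)
H = rng.normal(size=(8, 8))
x_true = rng.choice([-1.0, 1.0], size=8)
sigma2 = 0.5
y = H @ x_true + rng.normal(scale=np.sqrt(sigma2 / 2), size=8)
x_hat, cost = gibbs_detect(y, H, sigma2, [-1.0, 1.0], n_iters=50)
print(x_hat, cost)
```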
A problem with the above conventional Gibbs sampling based detection is
the stalling problem, which results in BER floors at high SNRs [11]. This is
illustrated in Fig. 8.2(a) for K = N = 16, 4-QAM, random initial vector, and
256 iterations, where the BER of the conventional Gibbs sampler is degraded for
SNRs more than 8 dB. The reason for this flooring is that the algorithm becomes
trapped in some poor local solutions for a long time (i.e., for many iterations).
This can be observed in Fig. 8.2(b), which shows the evolution of the ML cost
of the state vector in the nth iteration as a function of n for 12 dB SNR. Note
that the ML cost of the state vector does not change much from iteration 4 to
iteration 256, and that this trapped ML cost is quite poor compared to the ML
cost of the SD solution. This leads to inferior performance compared to the SD.
Although the chain is guaranteed to converge to the target distribution (8.38)
asymptotically as $n \to \infty$, stalling occurs and degrades performance with
a finite number of iterations.
Figure 8.2 (a) BER performance (BER versus average received SNR in dB) and
(b) evolution of ML cost of the state vector (versus iteration index, n) in a
conventional Gibbs sampler, Gibbs sampler with $\alpha = 1.5$, and mixed-Gibbs
sampler for K = N = 16 and 4-QAM.
8.4.4 MGS
In order to break away from traps that lead to stalling, one needs to use a
noisy version of the MCMC procedure. The noisiest is the one with infinite
temperature (i.e., $\alpha = \infty$), which randomly and uniformly samples from
all the possibilities. A simple, yet effective, approach is to use MGS, which
employs a mixture of (i) Gibbs sampling with the posterior in (8.38) (i.e.,
$\alpha = 1$) and (ii) random uniform sampling (i.e., $\alpha = \infty$) [20].
The idea behind the MGS approach is that, in each coordinate update, instead of
updating the $x_i^{(t)}$ as per the update rule in (8.39) with probability 1 as
is done in conventional Gibbs sampling, they are updated as per (8.39) with
probability $(1 - q)$, and a different update rule is used with probability $q$.
The different update rule is as follows [21],[22]. Generate $|A|$ probability
values from the uniform distribution as
$$p(x_i^{(t)} = j) \sim U[0, 1], \quad \forall j \in A,$$
such that $\sum_{j=1}^{|A|} p(x_i^{(t)} = j) = 1$, and sample $x_i^{(t)}$ from
this generated probability mass function (pmf). In other words, the mixture
distribution for sampling is given by
$$p(x_1, \ldots, x_{2K}|\mathbf{y}, \mathbf{H}) \sim (1 - q)\,\psi(\alpha_1) + q\,\psi(\alpha_2), \tag{8.41}$$
where
$$\psi(\alpha) = \exp\left(-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{\alpha^2\sigma^2}\right),$$
and $q$ is the mixing ratio. Different values for $(\alpha_1, \alpha_2)$ can be
chosen. Note that with $\alpha_1 = 1$ and $\alpha_2 = \infty$, the first and
second distributions in (8.41) become the true distribution and the uniform
distribution, respectively. That is, the
Figure 8.3 Average $r_q$ (expected number of iterations to reach the global
minima for the first time, averaged over the starting points) as a function of
q for N = K = 2, 3, 4 with 4-QAM at SNR = 12 dB.
[Figure 8.4: BER versus mixing ratio, q, for K = N = 8, 16, 32, 64 in uplink
multiuser MIMO with 4-QAM at SNR = 10 dB.]
ML cost at several instances in the evolution is very good to the extent that the
ML cost of the SD solution is almost reached. This enables the sampling from
the mixture distribution in (8.41) to achieve almost SD BER performance, as seen
in Fig. 8.2(a).
The effect of parameter $q$ on the BER performance of the mixed-Gibbs sampler
is shown in Fig. 8.4 for K = N = 8, 16, 32, 64 and 4-QAM at 10 dB SNR, where
the optimum value of $q$ that minimizes the BER is observed to be $1/(2K)$. The
optimum value of $q$ is small for large values of $K$. For K = N = 64 in
Fig. 8.4, the optimum $q$ is small, i.e., $q_{opt} = \frac{1}{128} \approx 0.0078$.
The BER difference between the cases of $q = 0.0078$ and $q = 0$ for K = N = 64
is significant.
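The mixed update rule can be sketched as a single coordinate-level decision. In
this illustration, `costs` holds the ML cost $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$
for each candidate value of the coordinate being updated; with probability
$1 - q$ the index is drawn from the Gibbs pmf, and with probability $q$ from a
freshly generated random pmf. The function name and the test values are
assumptions made for the example.

```python
import numpy as np


def mixed_gibbs_index(costs, sigma2, q, rng):
    """Pick the index of the new symbol for one MGS coordinate update."""
    costs = np.asarray(costs, dtype=float)
    if rng.uniform() < q:
        # Random rule: |A| values from U[0, 1], normalized into a pmf.
        pmf = rng.uniform(size=costs.size)
    else:
        # Gibbs rule: pmf proportional to exp(-cost / sigma^2).
        pmf = np.exp(-(costs - costs.min()) / sigma2)
    pmf = pmf / pmf.sum()
    return int(rng.choice(costs.size, p=pmf))


rng = np.random.default_rng(0)
costs = [5.0, 1.0, 9.0]
# q = 0 is conventional Gibbs sampling; at low noise it keeps picking the
# least-cost candidate, which is how the sampler can become trapped.
picks_gibbs = [mixed_gibbs_index(costs, 1e-3, 0.0, rng) for _ in range(50)]
# q = 1 always uses the random rule, so every candidate gets visited.
picks_random = [mixed_gibbs_index(costs, 1e-3, 1.0, rng) for _ in range(300)]
print(set(picks_gibbs), set(picks_random))
```

In a detector, $q$ would be set to a small value such as $1/(2K)$, so that
random updates occur on average about once per iteration of $2K$ coordinate
updates.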
encountered before s iterations, the algorithm proceeds with the newly found
lower ML cost; else, the algorithm terminates. If termination does not happen
through the stalling limit as above, the algorithm terminates on completing a
maximum number of iterations, MAX-ITER.
The algorithm chooses the value of s depending on the quality of the stalled
ML cost, as follows. A large value for s is preferred if the quality of the
stalled ML cost is poor, because of the available potential for improvement from
a poor stalled solution. On the other hand, if the stalled ML cost quality is
already good, then a small value of s is preferred. The quality of a stalled
solution is determined in terms of the closeness of the stalled ML cost to a
value obtained using the statistics (mean and variance) of the ML cost for the
case when $\mathbf{x}$ is detected error-free. Note that when $\mathbf{x}$ is
detected error-free, the corresponding ML cost is nothing but $\|\mathbf{n}\|^2$,
which has a scaled chi-squared distribution with $2N$ degrees of freedom with
mean $N\sigma^2$ and variance $N\sigma^4$. Let us define the quality metric to be
the difference between the ML cost of the stalled solution and the mean of
$\|\mathbf{n}\|^2$, scaled by the standard deviation, i.e., the quality metric of
vector $\mathbf{x}$ is defined as
$$\phi(\mathbf{x}) = \frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - N\sigma^2}{\sqrt{N}\sigma^2}. \tag{8.42}$$
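The standardization in (8.42) can be checked numerically: for the error-free
vector the metric should have mean close to 0 and variance close to 1. The
sketch below assumes the real-valued model with $2N$ receive dimensions and
per-dimension noise variance $\sigma^2/2$; the function name and the simulation
parameters are illustrative.

```python
import numpy as np


def quality_metric(y, H, x, sigma2):
    """Standardized ML cost of x as in (8.42); y has 2N real entries."""
    n_rx = y.size // 2  # N, the number of (complex) receive dimensions
    cost = float(np.sum((y - H @ x) ** 2))
    return (cost - n_rx * sigma2) / (np.sqrt(n_rx) * sigma2)


rng = np.random.default_rng(0)
N, sigma2 = 16, 0.5
H = rng.normal(size=(2 * N, 2 * N))
phis = []
for _ in range(5000):
    x = rng.choice([-1.0, 1.0], size=2 * N)
    y = H @ x + rng.normal(scale=np.sqrt(sigma2 / 2), size=2 * N)
    phis.append(quality_metric(y, H, x, sigma2))  # metric of the true vector
phis = np.array(phis)
print(phis.mean(), phis.var())
```

A stalled solution whose metric is far above zero is therefore unlikely to be
the transmitted vector, which is what justifies allowing it more iterations.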
Figure 8.5 (a) BER performance (BER versus average received SNR in dB) and
(b) complexity (log2 of the average number of real operations per bit versus
number of users, K = N, at BER = 0.01) of the MGS algorithm for
K = N = 8, 16, 32, 64, 128 and 4-QAM.
grows only quadratically in $K$ (i.e., $O(K^2)$). Because of such low complexity,
the MGS algorithm scales easily for K = N = 32, 64, 128, whose simulated
BER performances are also shown in Fig. 8.5(a). Since SD simulation is
prohibitive for such large dimensions, the unfaded SISO AWGN performance is
plotted as a lower bound on ML performance for comparison. It can be seen
that the MGS detector achieves a performance which is very close to the SISO
AWGN performance for large K = N, e.g., close to within 0.5 dB at $10^{-3}$ BER
for K = N = 128. This illustrates the ability of the MGS detector to achieve
near-optimal performance in large-scale multiuser MIMO systems.
[Figure: BER versus average received SNR (dB) of MGS and the sphere decoder
for 4-QAM, 16-QAM, and 64-QAM in uplink multiuser MIMO with K = N = 16.]
Figure 8.7 Least ML cost up to the nth iteration versus n in conventional Gibbs
sampling and MGS for different initial vectors in multiuser MIMO with K = N = 16:
(a) 4-QAM, SNR = 11 dB, (b) 16-QAM, SNR = 18 dB.
sampling becomes locked in some state (with very low state transition proba-
bility) for a long time without any change in ML cost in subsequent iterations,
whereas the mixed sampling strategy is able to exit from such states quickly
and give improved ML costs in subsequent iterations. This shows that MGS is
preferred over conventional Gibbs sampling. More interestingly, comparing the
least ML costs of 4-QAM and 16-QAM (in Figs. 8.7(a) and (b), respectively), it
is seen that all the three random initializations could converge almost to the true
ML vector cost for 4-QAM within 100 iterations, whereas only initial vector 3
converges to near true ML cost for 16-QAM, while initial vectors 1 and 2 do
not. Since any random initialization works well with 4-QAM, MGS is able to
achieve near-ML performance without multiple restarts for 4-QAM. However,
it can be seen that 16-QAM performance is more sensitive to the initialization,
which explains the poor performance of MGS without restarts in higher-order
QAM. An MMSE vector can be used as an initial vector, but it is not a good
initialization for all channel realizations. This points to the possibility of achiev-
ing good initializations through multiple restarts to improve the performance of
MGS in higher-order QAM.
The closeness of the ML costs obtained so far to the error-free ML cost in terms
of its statistics may allow the algorithm to approach the ML solution. Checking
for repetitions allows the number of restarts, and hence the complexity, to be
restricted. The minimum standardized ML cost obtained so far and its number of
repetitions are used to decide the credibility of the solution. An integer threshold
(P ) is dened for the best ML cost obtained so far for the purpose of comparison
with the number of repetitions. The number of repetitions needed for termination
(P , the integer threshold) is chosen as per the following expression [20]:
$$P = \lfloor \max(0,\, c_2\,\phi(\tilde{\mathbf{x}})) \rfloor + 1, \tag{8.45}$$
where $\tilde{\mathbf{x}}$ is the solution vector with minimum ML cost so far, and $c_2$ is a constant
chosen depending on the QAM size; a larger value of c2 is chosen for larger QAM
size. Now, denoting Rmax to be the maximum number for restarts, the MGS with
multiple restarts algorithm (referred to as the MGS-MR algorithm) can be stated
as follows.
Note that the output solution vectors from the MGS and MGS-MR algorithms
are hard-decision outputs. Soft-decision values for channel decoding can be gen-
erated from these hard-decision output vectors following the method proposed
in Section 5.1.4.
Figure 8.10 BER performance (BER versus average received SNR in dB) of the
MGS-MR algorithm in uplink multiuser MIMO with 4-/16-/64-QAM: (a) K = N = 16,
compared with the sphere decoder; (b) K = N = 32, compared with unfaded SISO
AWGN performance.
Figure 8.11 (a) BER performance and (b) complexity (in number of real
operations) of the MGS-MR detector in comparison with those of linear (MF, ZF,
MMSE) detectors as a function of loading factor K/N. N = 128, 16-QAM.
presented and compared. The number of BS antennas N is fixed at 128, and the
number of uplink users K is varied from small values up to 128. From Fig. 8.11,
it is observed that the MGS-MR detector performs better than the MF, ZF,
and MMSE detectors: it is moderately better under low loading factors and
significantly better (about 1–2 orders of magnitude lower BER) under medium to
high loading factors. It is also seen that the complexity increase in MGS-MR
detection compared to ZF and MMSE detection is nominal (not orders higher).
References
In the previous chapters, large MIMO detection algorithms were presented under
the assumption of perfect knowledge of channel gains at the receiver. However, in
practice, these gains are estimated at the receiver, either blindly/semi-blindly
or through pilot transmissions (training). In FDD systems, channel gains
estimated at the receiver are fed back to the transmitter (e.g., for precoding
purposes). In TDD systems, where channel reciprocity holds, the transmitter can
estimate the channel and use it for precoding. Due to noise and the finite number
of pilot symbols used for channel estimation, the channel estimates are not
perfect, i.e., there are estimation errors. This has an influence on the achieved
capacity of the MIMO channel and the error performance of detection and precoding
algorithms. This chapter addresses the effect of imperfect CSI on MIMO capacity,
how much training is needed for MIMO channel estimation, and channel estimation
algorithms and their performance on the uplink in large-scale multiuser
TDD MIMO systems.
The capacity of MIMO channels can be degraded if the CSI is not perfect.
Gaussian input distribution, which is the capacity achieving distribution in the
perfect CSI case, is suboptimal when CSI is imperfect [1],[2]. Lower and upper
bounds on the mutual information for iid frequency-flat Rayleigh fading point-to-point
MIMO channels have been derived for the imperfect CSI case in [3]
assuming Gaussian input, where the MMSE channel estimate is assumed at
the receiver and the same channel estimate is assumed to be available at the
transmitter. Some key results on MIMO capacity with this imperfect CSI model
are summarized below [3].
First, while the Gaussian mutual information saturates with increasing SNR
with imperfect CSI, it still increases linearly with the smaller of the number of
transmit and receive antennas.
Second, in the perfect CSI case, the capacity gain in knowing the channel at
the transmitter decreases with increasing SNR. This is because the optimal input
covariance matrix approaches the identity matrix for increasing SNRs, which is
198 Channel estimation in large MIMO systems
also the optimal covariance matrix when the channel is not known at the trans-
mitter. This capacity gain trend, however, changes with imperfect CSI. The
capacity gain due to exploitation of CSIT becomes significant with increasing
estimation error and does not reduce much at high SNRs. This is because the
estimation error causes the effective SNR to saturate, and thereby eliminates
the high SNR capacity region where transmitter channel knowledge becomes
unimportant.
Third, in terms of optimal power allocation strategies to exploit CSIT, for ergodic
capacity, the optimal strategy is modified waterfilling over the spatial and
temporal domains. For outage capacity, it is spatial waterfilling and temporally
truncated channel inversion. The improvement in ergodic and outage capacities
due to spatial power allocation becomes significant with channel estimation
errors. Spatial power allocation with imperfect CSI helps even at high SNRs.
Performing temporal power adaptation in addition to spatial power allocation
enhances the outage capacity significantly but gives only negligible gains in terms
of ergodic capacity.
Other key references for the effect of training sequence based channel estimation
on the achievable rate and outage capacity are [4]–[8]. In particular, they address
the question of how much training is needed in point-to-point frequency-flat
[4]–[6] and frequency-selective [7] MIMO wireless links, and in multiuser MIMO
links [8]. More on the results in [6], [8] and their relevance to channel estimation
and performance in the context of large MIMO systems is presented in the next
section.
The effect of imperfect channel knowledge on the achievable rates is also analyzed
in [9], where a lower bound on capacity is expressed as a function of
the Cramér–Rao bound (CRB). In several works (e.g., [10], [11]), the performance
of channel estimation methods is investigated by deriving expressions
of the CRB for different pilot symbol/placement designs. The relation between
achievable rate and channel CRB established in [9] therefore enables the comparison
of achievable rates under different pilot design and placement alternatives.
The effect of MMSE and ML channel estimates on the decoding performance of
space-time codes is studied in [12].
detection in the data phase. The estimate obtained in the training phase can be
further refined using detected data in the data phase in an iterative manner.
A key question of interest in training based channel estimation in MIMO
systems in general, and in large MIMO systems in particular, is how much time
should be spent in training, for a given number of transmit antennas (nt), number
of receive antennas (nr), length of the channel coherence time (T, in number of
channel uses), and average received SNR. This question is addressed in [6]
for point-to-point MIMO wireless links, and in [8] for multiuser TDD MIMO
systems.
The optimal length of the training interval Tp is nt for all SNRs and coherence
times T if the training-phase and data-phase powers are allowed to vary, whereas
it can be more than nt if the two powers are made equal (Fig. 3 in [6]). In the
latter case, the trend is that the optimal training length for a given nt, nr
increases with increasing T and decreasing SNR, such that as the SNR tends to
zero the length increases until it reaches T/2.

Figure 9.1 Lower bound on the ergodic capacity of a 16 × 16 MIMO channel with
(i) estimated CSIR, T = 32, Tp = 16, (ii) estimated CSIR, T = 144, Tp = 16 (pilot
and data power parameters both set to 1 in the two cases), and (iii) perfect CSIR.
Horizontal axis: average SNR (dB); vertical axis: ergodic capacity (bps/Hz).
Figure 9.1 shows the lower bound on the ergodic capacity with estimated
CSI (9.1) evaluated for a 16 × 16 MIMO channel with (i) T = 144, Tp = 16
(large coherence time; a slowly fading channel) and (ii) T = 32, Tp = 16 (short
coherence time; a relatively fast fading channel). Compared to the perfect CSI
case, the capacity degradation in case (ii) is more than in case (i), i.e., for a given
nt, the larger the value of T the smaller the capacity/throughput loss will be
compared to perfect CSI capacity. This implies that large MIMO systems with
large nt benefit from large coherence times (e.g., in slow fading as witnessed in
no mobility/low mobility scenarios). Even after accounting for the throughput
loss due to training overhead, the spectral efficiencies achieved with large nt are
in the tens of bps/Hz range, which are significantly higher than the spectral
efficiencies achieved in current wireless systems.
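The dependence on T can be made concrete by looking only at the training overhead. The sketch below assumes, as in the bound of [6], that the rate is scaled by the fraction (T − Tp)/T of channel uses left for data. It ignores the additional loss caused by the estimation error itself, and the function name is purely illustrative.

```python
def data_fraction(T: int, Tp: int) -> float:
    """Fraction of a coherence interval of T channel uses that is left
    for data after Tp channel uses are spent on training."""
    if not 0 < Tp <= T:
        raise ValueError("need 0 < Tp <= T")
    return (T - Tp) / T


# The two estimated-CSIR cases discussed for Fig. 9.1 (Tp = nt = 16).
slow = data_fraction(T=144, Tp=16)   # slowly fading channel
fast = data_fraction(T=32, Tp=16)    # relatively fast fading channel

print(f"T = 144: {slow:.3f} of the channel uses carry data")
print(f"T = 32:  {fast:.3f} of the channel uses carry data")
```

With Tp fixed at nt, the overhead factor alone already accounts for a large part of the gap between the two estimated-CSIR cases.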
Figure 9.2 Variation of capacity lower bound as a function of nt for a given nr = 12,
T = 100, and SNR = 18 dB. Horizontal axis: number of transmit antennas nt (0 to 100);
vertical axis: capacity (bits/channel use).
nr = 12. It is seen that the capacity increases initially for increasing nt , but
starts diminishing beyond nt = 15 and reaches zero when nt = T = 100 (i.e.,
when the entire coherence time is used for training). Such a behavior, namely,
fewer transmit antennas being optimum, has also been captured in [14] through
simulation of practical MIMO system designs. In [14], it has been shown that
a MIMO system with nt = 12 achieves a higher spectral efficiency and better
coded BER performance than a system with nt = 16 for T = 48, nr = 16 and
training based channel estimation.
a lower bound on the net sum-rate on the downlink. It has been shown that,
given a large number of antennas at the BS (N > 16), even with short coherence
intervals (T = 10) and low SINRs (0 dB on the downlink and −10 dB on the
uplink), it is both possible as well as advantageous to learn the channel (with the
pilot length equal to the number of users) and serve several users simultaneously.
In summary, a key observation to make here in the context of large MIMO
systems is that, although the potential for capacity increase with an increasing
number of antennas is diminished by training, the spectral efficiency achieved
with the optimum number of antennas is still high (e.g., about 45 bps/Hz with
nt = 15 in Fig. 9.2 and about 20 bps/Hz with N = 16, K = 1 in Fig. 5 of
[8]). Considering that the spectral efficiencies in current systems are much less
than 10 bps/Hz, the large MIMO system approach is an attractive and viable
approach to achieving a quantum jump in the efficiency of spectrum usage in
future systems and standards.
model is given by
yc = Hc xc + nc . (9.2)
y = Hx + n, (9.3)
Frame structure
In order to detect the transmitted data vector x, knowledge of the channel matrix
H is needed. The channel matrix is estimated based on a pilot based channel
estimation scheme, where transmission is carried out in frames, with each frame
consisting of several blocks as shown in Fig. 9.3. A slow fading channel (typical
with no/low mobility users) is assumed, where the channel is assumed to be
constant over one frame duration. Each frame consists of a pilot block for the
purpose of initial channel estimation, followed by Q data blocks. The pilot block
consists of K channel uses in which a K-length pilot symbol vector comprising
pilot symbols transmitted from K users (one pilot symbol per user) is received by
N receive antennas at the BS. Each data block consists of K channel uses, where
K information symbol vectors, each of length K (one data symbol from each user)
are transmitted. Taking both pilot and data channel uses into account, the total
number of channel uses per frame is (Q+1)K. Data blocks are detected using any
of the known large MIMO detection algorithms (e.g., the MGS-MR algorithm
presented in Chapter 8) using an initial channel estimate. The detected data
blocks are then iteratively used to refine the channel estimates during the data
phase employing a Gibbs sampling based channel estimation algorithm described
below.
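As a worked example of the frame accounting above, take the parameter values used in the performance results of this section (K = 128, Q = 9). The symbol η for the fraction of channel uses carrying data is introduced here only for illustration.

```latex
\begin{align*}
\text{channel uses per frame} &= (Q+1)K = 10 \times 128 = 1280,\\
\text{pilot channel uses}     &= K = 128,\\
\eta &= \frac{QK}{(Q+1)K} = \frac{Q}{Q+1} = 0.9 .
\end{align*}
```

The pilot overhead is thus 1/(Q+1) of the frame, independent of K.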
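Figure 9.3 Frame structure for uplink multiuser MIMO system in frequency-flat fading
(PB: pilot block; DB: data block). Each frame consists of one PB followed by Q DBs.

\[ \mathbf{Y}_P = \mathbf{H}_c \mathbf{X}_P + \mathbf{N}_P, \tag{9.4} \]
where $\mathbf{N}_P$ is the $N \times K$ noise matrix at the BS. The following pilot sequence is
used:
\[ \mathbf{X}_P = p\,\mathbf{I}_K, \tag{9.5} \]
where $p$ is chosen to be $p = \sqrt{K E_s}$, and $E_s$ is the average symbol energy. Using
the scaled identity nature of $\mathbf{X}_P$, an initial channel estimate $\hat{\mathbf{H}}_c$ is obtained as
\[ \hat{\mathbf{H}}_c = \mathbf{Y}_P / p. \tag{9.6} \]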
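The pilot-phase estimate can be sketched in a few lines. The example below assumes a pilot matrix equal to p times the identity with p = sqrt(K·Es), iid complex Gaussian channel gains, and toy dimensions. The helper names (`pilot_estimate`, `cgauss`) and all numerical values are illustrative and not taken from the text.

```python
import math
import random


def pilot_estimate(Y_P, p):
    """Initial estimate H_hat = Y_P / p, valid when the pilot matrix is p * I."""
    return [[y / p for y in row] for row in Y_P]


def cgauss(rng, var):
    """One circularly symmetric complex Gaussian sample with variance var."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


rng = random.Random(1)
N, K, Es, sigma2 = 4, 3, 1.0, 0.1       # toy sizes, not the book's settings
p = math.sqrt(K * Es)                    # assumed pilot amplitude

H = [[cgauss(rng, 1.0) for _ in range(K)] for _ in range(N)]
N_P = [[cgauss(rng, sigma2) for _ in range(K)] for _ in range(N)]

# With X_P = p * I, the received pilot block is Y_P = p * H + N_P.
Y_P = [[p * H[n][k] + N_P[n][k] for k in range(K)] for n in range(N)]
H_hat = pilot_estimate(Y_P, p)

# The estimation error equals N_P / p entry by entry.
max_dev = max(abs(H_hat[n][k] - H[n][k] - N_P[n][k] / p)
              for n in range(N) for k in range(K))
print(max_dev)
```

The estimate costs one division per entry, which is why it is a natural starting point before any iterative refinement.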
symbol vectors transmitted by the users in the ith data block during the data
phase, $i = 1, 2, \ldots, Q$. The received signal matrix at the BS in the ith data block,
$\mathbf{Y}_i$ of size $N \times K$, is given by
\[ \mathbf{Y}_i = \mathbf{H}_c \mathbf{X}_i + \mathbf{N}_i, \tag{9.7} \]
where $\mathbf{N}_i$ is the $N \times K$ noise matrix at the BS during the ith data block. Detection
is performed on a vector by vector basis using the independence of data symbols
transmitted by the users. Let $\mathbf{y}_i^{(t)}$ denote the tth column of $\mathbf{Y}_i$, $t = 0, 1, \ldots, K-1$.
Denoting the tth column of $\mathbf{X}_i$ as $\mathbf{x}_i^{(t)} = [x_i^1(t), x_i^2(t), \ldots, x_i^K(t)]^T$, the system
equation (9.7) can be rewritten as
\[ \mathbf{y}_i^{(t)} = \mathbf{H}_c \mathbf{x}_i^{(t)} + \mathbf{n}_i^{(t)}, \tag{9.8} \]
where $\mathbf{n}_i^{(t)}$ is the tth column of $\mathbf{N}_i$. The initial channel estimate $\hat{\mathbf{H}}_c$ obtained
from (9.6) is used to detect the transmitted data vectors using, say, the MGS-MR
algorithm presented in Chapter 8.
From (9.4) and (9.6), it is observed that $\hat{\mathbf{H}}_c = \mathbf{H}_c + \mathbf{N}_P / p$. This knowledge
9.3 Large multiuser MIMO systems 205
Each entry of the vector $\mathbf{n}_i^{(t)} - \mathbf{N}_P \mathbf{x}_i^{(t)} / p$ has mean zero and variance $2\sigma^2$. Using
this knowledge at the receiver, the transmitted data are detected using the MGS-MR
algorithm and $\hat{\mathbf{x}}_i^{(t)}$ is obtained. Let the detected data matrix in data block
$i$ be denoted $\hat{\mathbf{X}}_i = [\hat{\mathbf{x}}_i^{(0)}, \hat{\mathbf{x}}_i^{(1)}, \ldots, \hat{\mathbf{x}}_i^{(K-1)}]$.
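This system model corresponding to the full frame is converted into a real-valued
system model. That is, (9.9) can be written in the form
\[ \mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}, \tag{9.10} \]
where
\[ \mathbf{Y} = \begin{bmatrix} \Re(\mathbf{Y}_{tot}) & -\Im(\mathbf{Y}_{tot}) \\ \Im(\mathbf{Y}_{tot}) & \Re(\mathbf{Y}_{tot}) \end{bmatrix}, \quad
   \mathbf{H} = \begin{bmatrix} \Re(\mathbf{H}_c) & -\Im(\mathbf{H}_c) \\ \Im(\mathbf{H}_c) & \Re(\mathbf{H}_c) \end{bmatrix}, \]
\[ \mathbf{X} = \begin{bmatrix} \Re(\mathbf{X}_{tot}) & -\Im(\mathbf{X}_{tot}) \\ \Im(\mathbf{X}_{tot}) & \Re(\mathbf{X}_{tot}) \end{bmatrix}, \quad
   \mathbf{N} = \begin{bmatrix} \Re(\mathbf{N}_{tot}) & -\Im(\mathbf{N}_{tot}) \\ \Im(\mathbf{N}_{tot}) & \Re(\mathbf{N}_{tot}) \end{bmatrix}. \]
\[ \mathbf{Y}^T = \mathbf{X}^T \mathbf{H}^T + \mathbf{N}^T. \tag{9.11} \]
\[ \mathbf{r} = \underbrace{(\mathbf{I}_{2N} \otimes \mathbf{X}^T)}_{\triangleq\, \mathbf{S}}\, \mathbf{g} + \mathbf{z}. \tag{9.12} \]
Now, the goal is to estimate $\mathbf{g}$ knowing $\mathbf{r}$, the estimate of $\mathbf{S}$, and the statistics
of $\mathbf{z}$ using Gibbs sampling. The estimate of $\mathbf{S}$ is obtained as
\[ \hat{\mathbf{S}} = \mathbf{I}_{2N} \otimes \hat{\mathbf{X}}^T, \]
where
\[ \hat{\mathbf{X}} = \begin{bmatrix} \Re(\hat{\mathbf{X}}_{tot}) & -\Im(\hat{\mathbf{X}}_{tot}) \\ \Im(\hat{\mathbf{X}}_{tot}) & \Re(\hat{\mathbf{X}}_{tot}) \end{bmatrix}
   \quad \text{and} \quad \hat{\mathbf{X}}_{tot} = [\mathbf{X}_P \ \hat{\mathbf{X}}_1 \ \hat{\mathbf{X}}_2 \ \cdots \ \hat{\mathbf{X}}_Q]. \]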
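Hence,
\[ p\left(g_i \mid \mathbf{r}, \hat{\mathbf{S}}, \mathbf{g}_{-i}\right) \propto \exp\left( -\frac{\|\hat{\mathbf{s}}_i\|^2}{\sigma^2} \left( g_i - \frac{\mathbf{r}^{(i)T} \hat{\mathbf{s}}_i}{\|\hat{\mathbf{s}}_i\|^2} \right)^{2} \right), \tag{9.20} \]
which is Gaussian with mean $\mu_{g_i} = \mathbf{r}^{(i)T} \hat{\mathbf{s}}_i / \|\hat{\mathbf{s}}_i\|^2$ and variance $\sigma_{g_i}^2 = \sigma^2 / (2\|\hat{\mathbf{s}}_i\|^2)$.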
Let MAX denote the number of iterations. In each iteration, for each coordinate,
the probability distribution specified by its mean and variance has to be calculated
to draw samples. Let the mean and variance in the rth iteration and the ith
coordinate be denoted as $\mu_{g_i}^{(r)}$ and $\sigma_{g_i}^{2\,(r)}$, respectively, where $r = 1, 2, \ldots, \mathrm{MAX}$
and $i = 1, 2, \ldots, 4KN$. Use $\hat{\mathbf{g}}^{(0)}$ in (9.13), which is the estimate from the pilot
phase, as the initial estimate. In the rth iteration, $\hat{\mathbf{g}}^{(r)}$ is obtained from $\hat{\mathbf{g}}^{(r-1)}$
as follows:
1. Take $\hat{\mathbf{g}}^{(r)} = \hat{\mathbf{g}}^{(r-1)}$.
2. Update the ith coordinate of $\hat{\mathbf{g}}^{(r)}$ by sampling from $\mathcal{N}\big(\mu_{g_i}^{(r)}, \sigma_{g_i}^{2\,(r)}\big)$ for all $i$.
   Let $\hat{g}_i^{(r)}$ denote the updated ith coordinate of $\hat{\mathbf{g}}^{(r)}$.
3. Compute weights $\alpha_i^{(r)} = \exp\big( -\big(\hat{g}_i^{(r)} - \mu_{g_i}^{(r)}\big)^2 / \big(2\sigma_{g_i}^{2\,(r)}\big) \big)$ for all $i$. This gives
   more weight to samples closer to the mean.
After MAX iterations, compute the final estimate of the ith coordinate, denoted
by $\hat{g}_i$, to be the following weighted sum of the estimates from previous and
current iterations:
\[ \hat{g}_i = \frac{\sum_{r=1}^{\mathrm{MAX}} \alpha_i^{(r)} \hat{g}_i^{(r)}}{\sum_{r=1}^{\mathrm{MAX}} \alpha_i^{(r)}}. \tag{9.21} \]
Finally, the updated $2N \times 2K$ channel estimate $\hat{\mathbf{H}}$ is obtained by restructuring
$\hat{\mathbf{g}} = [\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_{4KN}]^T$ as
\[ \hat{H}(p, q) = \hat{g}_n, \quad p = 1, 2, \ldots, 2N, \quad q = 1, 2, \ldots, 2K, \tag{9.22} \]
where $n = 2K(p - 1) + q$ and $\hat{H}(p, q)$ denotes the element in the pth row and qth
column of $\hat{\mathbf{H}}$. A listing of the above Gibbs sampling based channel estimation
algorithm is given in Algorithm 3.
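The coordinate-wise sampling and the weighted averaging described above can be sketched as follows. This is a minimal illustration and not the book's Algorithm 3. It assumes that r^(i) denotes the observation with the contributions of all coordinates other than i removed (using their latest values), that each real noise entry has variance σ²/2, and that coordinates are swept sequentially. The function name, argument names, and the small random ±1 test matrix are hypothetical; in the text S has the Kronecker structure built from the detected data.

```python
import math
import random


def gibbs_channel_estimate(r, S, g0, sigma2, max_iter, rng):
    """Weighted Gibbs estimate of g in r = S g + z (all quantities real).

    r: list of m observations; S: m x n matrix as a list of rows;
    g0: initial estimate; sigma2: complex noise variance, so each real
    noise entry is taken to have variance sigma2 / 2.
    """
    m, n = len(S), len(g0)
    cols = [[S[a][i] for a in range(m)] for i in range(n)]
    norms = [sum(v * v for v in c) for c in cols]
    g = list(g0)
    # Residual r - S g, kept up to date as coordinates change.
    res = [r[a] - sum(S[a][i] * g[i] for i in range(n)) for a in range(m)]
    num = [0.0] * n
    den = [0.0] * n
    for _ in range(max_iter):
        for i in range(n):
            if norms[i] == 0.0:
                continue  # coordinate not observed; keep its current value
            # r^(i) = res + g_i s_i, so the mean is s_i^T r^(i) / ||s_i||^2.
            mean = g[i] + sum(cols[i][a] * res[a] for a in range(m)) / norms[i]
            var = sigma2 / (2.0 * norms[i])
            new = rng.gauss(mean, math.sqrt(var))
            weight = math.exp(-(new - mean) ** 2 / (2.0 * var))
            for a in range(m):
                res[a] -= cols[i][a] * (new - g[i])
            g[i] = new
            num[i] += weight * new
            den[i] += weight
    # Weighted average over iterations, one value per coordinate.
    return [num[i] / den[i] if den[i] > 0.0 else g[i] for i in range(n)]


rng = random.Random(0)
g_true = [0.8, -0.5, 0.3]
S = [[rng.choice((-1.0, 1.0)) for _ in range(3)] for _ in range(40)]
sigma2 = 0.02
r = [sum(S[a][i] * g_true[i] for i in range(3))
     + rng.gauss(0.0, math.sqrt(sigma2 / 2.0)) for a in range(40)]

g_hat = gibbs_channel_estimate(r, S, [0.0, 0.0, 0.0], sigma2, max_iter=20, rng=rng)
print(g_hat)
```

Keeping the residual up to date makes each coordinate update cost one pass over a column of S, which is the design choice that keeps the per-iteration cost linear in the number of nonzero entries touched.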
The matrix H obtained thus is used for data detection using the MGS-MR algo-
rithm. This ends one iteration between channel estimation and detection. The de-
tected data matrix is fed back for channel estimation in the next iteration, whose
output is then used to detect the data matrix again. This iterative channel esti-
mation and detection procedure is carried out for a certain number of iterations.
Performance results
In Fig. 9.4(a), the mean square error (MSE) performance of the iterative chan-
nel estimation/detection scheme using Gibbs sampling based channel estimation
and MGS-MR based detection with 4-QAM for K = N = 128 and Q = 9 is
shown. In the simulations, the MGS-MR algorithm parameter values used are
the same as in Section 8.4.12. For the channel estimation algorithm, the value
of MAX used is 2. The MSEs of the initial channel estimate, and the channel
estimates after one and two iterations between channel estimation and detection
are shown. For comparison, the Cramér–Rao lower bound (CRLB) for this system
is also plotted. It can be seen that the channel estimation/detection scheme
results in good MSE performance with improved MSE for an increased number
of iterations between channel estimation and detection. For the same set of sys-
tem and algorithm parameters as in Fig. 9.4(a), the BER performance curves are
plotted in Fig. 9.4(b). For comparison, the BER performance with perfect chan-
nel knowledge is also plotted. It can be seen that with two iterations between
channel estimation and detection the channel estimation/detection scheme can
achieve a BER of $10^{-3}$ within about 1 dB of the performance with perfect channel
knowledge.
MIMO-OFDM
One popular way to deal with ISI channels is to use multicarrier techniques like
OFDM which can transform a frequency-selective channel into several narrow-
Figure 9.4 (a) MSE and (b) BER performance of iterative channel
estimation/detection using Gibbs sampling based channel estimation and MGS-MR
based detection in uplink multiuser MIMO system with K = N = 128, Q = 9, 4-QAM.
Horizontal axes: average received SNR (dB). Curves in (a): initial channel estimate,
iterative estimation/detection with 1 and 2 iterations, and CRLB.
Single-carrier communication
Instead of adopting a multicarrier approach, one can resort to a single-carrier
block transmission approach and perform equalization at the receiver. The preference
for single-carrier communication over OFDM communication is motivated
by the peak-to-average power ratio (PAPR) problem encountered in the multicarrier
approach [18]. In OFDM systems, the PAPR of the transmitted signal
is large. This results in non-linear distortion in the power amplifier. Unless
PAPR-reduction techniques are incorporated to control the non-linear distortion,
power backoff in the amplifier becomes necessary. Several PAPR-reduction
algorithms have been reported in the literature [19]–[21]. However, the resulting
PAPRs are still (at least a few dB) larger than those of single-carrier block
transmissions. Therefore, single-carrier schemes are considered to be good alternatives
to address the PAPR issue that arises in multicarrier systems [18]–[25].
Single-carrier schemes alleviate the PAPR problem by discarding the IFFT at the
transmitter (Fig. 9.5(b)). They also retain the FFT at the receiver,
which facilitates low complexity equalization in the frequency domain. As in
OFDM, a CP or ZP can be used in single-carrier schemes [25]. In the following,
system model development and the channel estimation/equalization approach
for large multiuser MIMO-CPSC (MIMO cyclic prefixed single carrier) systems
are presented. Likewise, the channel estimation/equalization approach can be
Figure 9.6 Frame structure for multiuser MIMO-CPSC system in ISI channels (PB:
pilot block; DB: data block). Each frame consists of one PB followed by Q DBs.
the receiver is taken to be the channel use in which the first pilot symbol in the
frame is sent.
where
\[ \mathbf{y}_P^j = [y_P^j(0), y_P^j(1), \ldots, y_P^j(KL-1)]^T, \quad
   \mathbf{h}^j = [(\mathbf{h}^{(j,1)})^T, \ldots, (\mathbf{h}^{(j,k)})^T, \ldots, (\mathbf{h}^{(j,K)})^T]^T, \]
From the signal observed at the jth receive antenna from time 0 to $KL - 1$
during the pilot phase, an initial estimate of the channel vector $\mathbf{h}^j$ is obtained
using the scaled identity nature of $\mathbf{B}_P$, as
\[ \hat{\mathbf{h}}^j = \mathbf{y}_P^j / b, \quad j = 1, 2, \ldots, N. \tag{9.26} \]
These initial channel estimates are used for equalization and detection of data
vectors in the data phase.
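$D_{j,k}(m)$ is the mth diagonal element of the matrix $\mathbf{D}_{j,k}$. Also, $\mathbf{b}_i = \mathbf{F}\mathbf{x}_i$,
where $\mathbf{F} = \mathbf{F}_I \otimes \mathbf{I}_K$, $\mathbf{x}_i = [a_i^1(L-1) \ldots a_i^K(L-1),\; a_i^1(L) \ldots a_i^K(L),\; \ldots,\; a_i^1(I+L$
where $\mathbf{B}_i^k = \mathrm{diag}(\mathbf{b}_i^k)$ and $\mathbf{d}_{j,k}$ is a vector consisting of the diagonal elements of
matrix $\mathbf{D}_{j,k}$, which is the I-point DFT of $\mathbf{h}^{(j,k)}$ (zero padded to length I), i.e.,
$\mathbf{d}_{j,k} = \mathbf{F}_{IL}\mathbf{h}^{(j,k)}$, where $\mathbf{F}_{IL}$ is the matrix with the first L columns of $\mathbf{F}_I$.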
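The statement that the I-point DFT of the zero-padded channel vector equals the product of the first L columns of the DFT matrix with the L channel taps can be checked numerically. The sketch below uses an unnormalized DFT matrix, which is an assumption (the identity holds for any common scaling), and toy values of I, L, and the taps that are not from the text.

```python
import cmath


def dft_matrix(I):
    """Unnormalized I-point DFT matrix as a list of rows."""
    return [[cmath.exp(-2j * cmath.pi * m * n / I) for n in range(I)]
            for m in range(I)]


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


I, L = 8, 3                                    # toy sizes
h = [0.9 + 0.1j, -0.4j, 0.2]                   # L channel taps (illustrative)
F_I = dft_matrix(I)
F_IL = [row[:L] for row in F_I]                # first L columns of F_I

d_padded = matvec(F_I, h + [0.0] * (I - L))    # DFT of zero-padded h
d_short = matvec(F_IL, h)                      # F_IL h

max_diff = max(abs(a - b) for a, b in zip(d_padded, d_short))
print(max_diff)
```

Working with the I × L submatrix keeps the unknown in the linear model at its true length L instead of I.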
Now, (9.35) can be written as
\[ \mathbf{z}_i^j = \sum_{k=1}^{K} \mathbf{B}_i^k \mathbf{F}_{IL} \mathbf{h}^{(j,k)} + \mathbf{w}_i^j. \tag{9.36} \]
Defining $\mathbf{A}_i^k = \mathbf{B}_i^k \mathbf{F}_{IL}$, (9.36) can be written as
\[ \mathbf{z}_i^j = \mathbf{A}_i \mathbf{h}^j + \mathbf{w}_i^j, \quad i = 1, \ldots, Q, \tag{9.37} \]
where $\mathbf{A}_i = [\mathbf{A}_i^1 \ \mathbf{A}_i^2 \ \cdots \ \mathbf{A}_i^K]$. Now, (9.37) can be written as
\[ \mathbf{z}^j = \mathbf{A}\mathbf{h}^j + \mathbf{w}^j, \tag{9.38} \]
where
\[ \mathbf{z}^j = \begin{bmatrix} \mathbf{z}_1^j \\ \mathbf{z}_2^j \\ \vdots \\ \mathbf{z}_Q^j \end{bmatrix}, \quad
   \mathbf{A} = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \\ \vdots \\ \mathbf{A}_Q \end{bmatrix}, \quad
   \mathbf{w}^j = \begin{bmatrix} \mathbf{w}_1^j \\ \mathbf{w}_2^j \\ \vdots \\ \mathbf{w}_Q^j \end{bmatrix}. \]
Using the signal received at antenna j from blocks 1 to Q in a frame (i.e., using $\mathbf{z}^j$)
and the matrix $\hat{\mathbf{A}}$, which is formed by replacing the information symbols $\{x_i^k\}$
in $\mathbf{A}$ by the detected information symbols $\{\hat{x}_i^k\}$, the channel coefficients $\{\mathbf{h}^j\}$
are estimated using the Gibbs sampling based estimation technique presented in
Section 9.3.2. This ends one iteration between channel estimation and detection.
The detected data matrix is fed back for channel estimation in the next iteration,
whose output is then used to detect the data matrix again. This iterative
channel estimation/equalization procedure is carried out for a certain number of
iterations.

[Figure 9.7 plot: BER versus average received SNR (dB), 0 to 18 dB. Curves: initial
channel estimate; iterative estimation/detection with 1 and 2 iterations; perfect
channel knowledge.]
Performance results
Figure 9.7 presents the BER performance of the iterative channel estimation/
equalization scheme described above in an uplink multiuser MIMO system on
frequency-selective fading with K = N = 16, L = 6, I = 64, Q = 9 and 4-QAM.
For the same settings, the BER performance with perfect channel knowledge is
also plotted. It can be seen that the BER improves as the number of iterations
between channel estimation and detection increases. It can also be seen that with
estimated channel knowledge, a performance close to that with perfect channel
knowledge is achieved.
References
[3] T. Yoo and A. Goldsmith, "Capacity and power allocation for fading MIMO channels
with channel estimation error," IEEE Trans. Inform. Theory, vol. 52, no. 5,
pp. 2203–2214, May 2006.
[4] T. L. Marzetta, "BLAST training: estimating channel characteristics for high
capacity space-time wireless," in 37th Annual Allerton Conf. on Commun. Contr.
and Comput., Monticello, IL, Sep. 1999, pp. 958–966.
[5] J. Baltersee, G. Fock, and H. Meyr, "Achievable rate of MIMO channels with
data-aided channel estimation and perfect interleaving," IEEE J. Sel. Areas Commun.,
vol. 19, no. 12, pp. 2358–2368, Dec. 2001.
[6] B. Hassibi and B. M. Hochwald, "How much training is needed in multiple-antenna
wireless links?" IEEE Trans. Inform. Theory, vol. 49, no. 4, pp. 951–963, Apr.
2003.
[7] X. Ma, L. Yang, and G. B. Giannakis, "Optimal training for MIMO frequency-selective
fading channels," IEEE Trans. Wireless Commun., vol. 4, no. 2, pp.
453–456, Mar. 2005.
[8] T. L. Marzetta, "How much training is required for multiuser MIMO?" in Proc.
40th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA,
Oct.–Nov. 2006, pp. 359–363.
[9] L. Berriche, K. Abed-Meraim, and J.-C. Belfiore, "Investigation of the channel
estimation error on MIMO system performance," in European Signal Process. Conf.,
Antalya, Sep. 2005.
[10] M. Dong and L. Tong, "Optimal design and placement of pilot symbols for channel
estimation," IEEE Trans. Signal Process., vol. 50, no. 12, pp. 3055–3069, Dec.
2002.
[11] L. Berriche, K. Abed-Meraim, and J.-C. Belfiore, "Cramer-Rao bounds for
MIMO channel estimation," in IEEE ICASSP 2004, Montreal, vol. 4, May 2004,
pp. 397–400.
[12] G. Taricco and E. Biglieri, "Space-time coding with imperfect channel estimation,"
IEEE Trans. Wireless Commun., vol. 4, no. 4, pp. 1874–1888, Apr. 2005.
[13] Y. Sung, T. E. Sung, B. M. Sadler, and L. Tong, "Training for MIMO wireless
communications," in Space-time Wireless Systems: From Array Processing to MIMO
Communications, H. Bölcskei, D. Gesbert, C. B. Papadias, and A.-J. van der Veen,
Eds. Cambridge, UK: Cambridge University Press, 2006, ch. 17.
[14] S. K. Mohammed, A. Zaki, A. Chockalingam, and B. S. Rajan, "High-rate space-time
coded large-MIMO systems: low-complexity detection and channel estimation,"
IEEE J. Sel. Topics in Signal Process., vol. 3, no. 6, pp. 958–974, Dec.
2009.
[15] B. Muquet, Z. Wang, G. B. Giannakis, M. de Courville, and P. Duhamel, "Cyclic
prefixing or zero padding for wireless multicarrier transmissions?" IEEE Trans.
Commun., vol. 50, no. 12, pp. 2136–2148, Dec. 2002.
[16] Y. Li, "Simplified channel estimation for OFDM systems with multiple transmit
antennas," IEEE Trans. Wireless Commun., vol. 1, no. 1, pp. 67–75, Jan. 2002.
[17] K. Higuchi, H. Kawai, N. Maeda, H. Taoka, and M. Sawahashi, "Experiments on
real-time 1-Gb/s packet transmission using MLD-based signal detection in MIMO-OFDM
broadband radio access," IEEE J. Sel. Areas Commun., vol. 24, no. 6, pp.
1141–1153, Jun. 2006.
System model
Consider an $n_t \times n_r$ point-to-point MIMO system ($n_r \le n_t$), where $n_t$ and
$n_r$ denote the number of transmit and receive antennas, respectively. Assume
CSI to be known perfectly at both the transmitter and the receiver. Let $\mathbf{x} =
(x_1, \ldots, x_{n_t})^T$ be the vector of symbols transmitted by the $n_t$ transmit antennas
in one channel use. Let $\mathbf{H} = \{h_{ij}\}$, $i = 1, \ldots, n_r$, $j = 1, \ldots, n_t$, be the $n_r \times n_t$
channel coefficient matrix, where $h_{ij}$ is the complex channel gain between the
jth transmit antenna and the ith receive antenna, and the $h_{ij}$'s are modeled as iid
$\mathcal{CN}(0, 1)$. The $n_r \times 1$ received vector is given by
\[ \mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{10.1} \]
220 Precoding in large MIMO systems
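\[ \mathbf{z} = \mathbf{G}\mathbf{u} + \mathbf{u}_0, \tag{10.2} \]
\[ \mathbf{x} = \mathbf{T}\mathbf{z}. \tag{10.3} \]
\[ E[\|\mathbf{x}\|^2] = P_T, \tag{10.4} \]
where $P_T$ is the total transmit power, and SNR is defined as $\rho = P_T/N_0$.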
In slow fading MIMO channels, where transmissions are subject to block fading,
diversity gain/order is a relevant performance metric. In fast fading MIMO
channels, ergodic capacity is the relevant metric. The rate and diversity order
for the precoding schemes are defined as follows. The rate R is defined as the
number of information bits transmitted in each channel use (bits per channel
use or bpcu). Since b bits are transmitted in each channel use, R = b bpcu. To
define the achieved diversity order $d_{ord}$, let $P(\mathbf{H}, \rho)$ be the word error probability
of $\mathbf{u}$ for a given channel realization $\mathbf{H}$ and SNR $\rho$. The average word error
probability, averaged over the channel fading statistics, is $P(\rho) = E_{\mathbf{H}}[P(\mathbf{H}, \rho)]$.
The diversity order is then defined as
\[ d_{ord} = \lim_{\rho \to \infty} \frac{-\log P(\rho)}{\log(\rho)}. \tag{10.5} \]
\[ \mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}, \tag{10.6} \]
10.1 Precoding in point-to-point MIMO 221
\[ \mathbf{T} = \mathbf{V}^H, \quad \mathbf{G} = \mathbf{I}_{n_s}, \quad \mathbf{u}_0 = \mathbf{0}. \tag{10.7} \]
\[ \mathbf{y} = \mathbf{H}\mathbf{T}\mathbf{u} + \mathbf{n}. \tag{10.8} \]
Let $\tilde{\mathbf{U}} \in \mathbb{C}^{n_r \times n_s}$ be the submatrix with the first $n_s$ columns of $\mathbf{U}$. The receiver
computes
\[ \mathbf{r} = \tilde{\mathbf{U}}^H \mathbf{y} = \boldsymbol{\Lambda}\mathbf{u} + \mathbf{w}, \tag{10.9} \]
\[ r_i = \lambda_i u_i + w_i, \quad i = 1, \ldots, n_s, \tag{10.10} \]
with non-negative fade coefficients $\lambda_i$, where the channel gain of the kth subchannel
is $\lambda_k$, the kth singular value of the channel matrix. The diversity order
achieved by the kth stream (i.e., the asymptotic slope of the average error probability
for the information symbol $u_k$ wrt $\rho$) depends on how the pdf of $\lambda_k$ behaves
around $\lambda_k = 0$ [2],[3]. For iid Rayleigh fading, the pdf of the kth singular value
around $\lambda_k = 0$ is [2]
\[ p(\lambda_k) = c_k \lambda_k^{(n_r-k+1)(n_t-k+1)-1} + o\left( \lambda_k^{(n_r-k+1)(n_t-k+1)-1} \right). \tag{10.11} \]
So, the diversity order of the kth stream is given by $(n_r - k + 1)(n_t - k + 1)$.
The lowest diversity order is achieved by the $n_s$th stream, i.e., the overall error
performance is dominated by the minimum singular value $\lambda_{n_s}$. When $n_s = n_r =
n_t$, the resulting diversity order of SVD precoding is only 1.
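The per-stream diversity orders are easy to tabulate. The sketch below simply evaluates the expression $(n_r - k + 1)(n_t - k + 1)$ stated above, assuming that k = 1 indexes the largest singular value. The function names are illustrative.

```python
def stream_diversity(k: int, nt: int, nr: int) -> int:
    """Diversity order of the kth SVD subchannel (k = 1 taken as the strongest)."""
    if not 1 <= k <= min(nt, nr):
        raise ValueError("k must index one of the min(nt, nr) subchannels")
    return (nr - k + 1) * (nt - k + 1)


def svd_precoding_diversity(ns: int, nt: int, nr: int) -> int:
    """Overall diversity order when the ns strongest subchannels are used."""
    return stream_diversity(ns, nt, nr)   # the weakest used stream dominates


orders = [stream_diversity(k, 16, 16) for k in range(1, 17)]
print(orders[0], orders[-1])                  # strongest and weakest stream
print(svd_precoding_diversity(8, 16, 16))     # using only half the streams
```

The steep drop from the first to the last stream is what motivates coding across pairs of subchannels.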
[Figure 10.1: original constellation with points (±a, ±a) and the rotated codewords
xA, xB, xC, xD in the (x1, x2) plane.]
Rotation coding
Consider a SISO fading channel and signaling in two channel uses. Let $\mathbf{u} =
[u_1 \ u_2]^T$, $u_i \in \{\pm a\}$, denote the information vector. The transmit vector $\mathbf{x}$ is
obtained as $\mathbf{x} = \mathbf{T}\mathbf{u}$, where the rotation matrix $\mathbf{T}$ is parameterized by a single
angle $\theta$, which is given by
\[ \mathbf{T} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad \theta \in (0, 2\pi). \tag{10.12} \]
Then the following four codewords are possible: $\mathbf{x}_A = \mathbf{T}[a \ a]^T$, $\mathbf{x}_B = \mathbf{T}[-a \ a]^T$,
$\mathbf{x}_C = \mathbf{T}[-a \ {-a}]^T$, $\mathbf{x}_D = \mathbf{T}[a \ {-a}]^T$. Figure 10.1 shows the original constellation
and the rotated constellation. The received signal in the kth channel use is
\[ y_k = h_k x_k + w_k, \quad k = 1, 2, \tag{10.13} \]
where $h_1$ and $h_2$ are iid fade coefficients in channel uses 1 and 2, respectively,
and $w_k$ is the additive noise component. For this system model, the pairwise
symbol error probability is given by [7]
\[ P(\mathbf{x}_A \to \mathbf{x}_B) \le \frac{16}{|d_1 d_2|^2}\, \rho^{-2}. \tag{10.15} \]
X-precoding
Without loss of generality, consider even $n_r$ and $n_s = n_r$. Precoding using real
rotation matrices, known as X-codes [5], is performed on pairs of subchannels.
The X-precoding scheme is shown in Fig. 10.2. A linear encoder defined by matrix
$\mathbf{X} \in \mathbb{C}^{n_r \times n_r}$ pairs the subchannels so that the precoder matrix $\mathbf{T}$ and transmit
vector $\mathbf{x}$ are given by
\[ \mathbf{T} = \mathbf{V}^H \mathbf{X}, \quad \mathbf{x} = \mathbf{V}^H \mathbf{X}\mathbf{u}. \]
\[ X_{i_k,i_k} = \cos\theta_k, \quad X_{i_k,j_k} = \sin\theta_k, \quad X_{j_k,i_k} = -\sin\theta_k, \quad X_{j_k,j_k} = \cos\theta_k. \]
The optimal pairing in terms of achieving the best diversity order is the one in
which the kth subchannel is paired with the $(n_r - k + 1)$th subchannel [4], i.e.,
the optimal pairing is given by $i_k = k$, $j_k = (n_r - k + 1)$, $k = 1, 2, \ldots, n_r/2$.
This ordering achieves diversity order $d_{ord} \ge (n_r/2 + 1)(n_t - n_r/2 + 1)$. This is
a significantly improved diversity order compared to that achieved in the case
of no pairing. For example, with $n_r = n_t = n_s$, the overall diversity order in
the scheme with pairing is $(n_r/2 + 1)^2$, whereas the diversity order is 1 for
the scheme without pairing. If only $n_s$ ($n_s$ even) out of the $n_r$ subchannels are
used for transmission, the lower bound on the overall achievable diversity order
[Figure 10.2: X-precoding scheme. The X-precoder applies X followed by V^H to form
the transmit vector x; the channel H produces y; the receiver applies U^H to obtain r.]
where $d_{min}(\theta_k) = \min_{(p,q) \in S_M} (p^2 + q^2) \cos^2\left(\theta_k - \tan^{-1}(q/p)\right)$, $S_M = \{(p, q) \ne
(0, 0) \mid 0 \le p \le (M - 1),\ 0 \le q \le (M - 1)\}$, and $\rho = P_T/N_0$. The maximization
in (10.19) can be done numerically. It can be done offline, since these angles
can be fixed a priori. The performance of X-codes with these precomputed fixed
angles is found to be good (see the BER plot for 16 × 16 MIMO with 16-QAM
in Fig. 10.4).
Decoding at the receiver is carried out as follows. The receiver computes
\[ \mathbf{r} = \mathbf{U}^H \mathbf{y} = \boldsymbol{\Lambda}\mathbf{X}\mathbf{u} + \mathbf{w} = \mathbf{M}\mathbf{u} + \mathbf{w}. \tag{10.20} \]
This is equivalent to
\[ \mathbf{r}_k = \mathbf{M}_k \mathbf{u}_k + \mathbf{w}_k, \quad k = 1, 2, \ldots, n_r/2, \tag{10.21} \]
where
\[ \mathbf{r}_k = \begin{bmatrix} r_{i_k} \\ r_{j_k} \end{bmatrix}, \quad
   \mathbf{u}_k = \begin{bmatrix} u_{i_k} \\ u_{j_k} \end{bmatrix}, \quad
   \mathbf{w}_k = \begin{bmatrix} w_{i_k} \\ w_{j_k} \end{bmatrix}, \quad
   \mathbf{M}_k = \begin{bmatrix} \lambda_{i_k} \cos\theta_k & \lambda_{i_k} \sin\theta_k \\ -\lambda_{j_k} \sin\theta_k & \lambda_{j_k} \cos\theta_k \end{bmatrix}. \]
Therefore, ML decoding of $\mathbf{u}$ reduces to independent ML decoding of the $n_r/2$
pairs. Also, ML decoding for each pair is separable into independent ML decoding
of the real and imaginary components of $\mathbf{u}_k$. Hence, the overall complexity
is that of $n_r$ two-dimensional real ML decoders (e.g., SDs). This low complexity allows
the X-precoding scheme to scale well for a large number of antennas.

Figure 10.3 Signal space of the transmit and received two-dimensional codewords.
Panels: no X-code ($d_{min}^2 = 0.0313$) and X-code ($d_{min}^2 = 0.138$), for the fade
$[\lambda_1 = 1, \lambda_2 = 0.25]$.

The reason why X-precoding achieves good performance can be explained as follows. In the
case of no coding across two subchannels, a deep fade along any one subchannel
can result in an arbitrarily small minimum distance between the received codewords,
and this would increase the word error probability. However, with rotation
using the X-code, the minimum distance between the received codewords of the
rotated constellation is larger and not vanishing even when there is a deep fade
along one of the component subchannels as illustrated in Fig. 10.3.
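The effect of rotation under a deep fade can be illustrated with a small numerical sketch. It assumes one real dimension per subchannel with symbols in {−a, +a}, a rotation of the form [cos θ, sin θ; −sin θ, cos θ], an extreme fade in which the second subchannel gain is exactly zero, and an arbitrary (not optimized) angle of 30 degrees. The function name is illustrative and the numbers are not those of Fig. 10.3.

```python
import math
from itertools import product


def received_min_distance(theta, gains, a=1.0):
    """Minimum Euclidean distance between received 2-D codewords.

    Codewords are T(theta) @ [u1, u2] with u1, u2 in {-a, +a}; each
    coordinate is then scaled by the corresponding subchannel gain.
    """
    c, s = math.cos(theta), math.sin(theta)
    points = []
    for u1, u2 in product((-a, a), repeat=2):
        x1, x2 = c * u1 + s * u2, -s * u1 + c * u2   # rotation
        points.append((gains[0] * x1, gains[1] * x2))
    return min(math.dist(p, q)
               for i, p in enumerate(points) for q in points[i + 1:])


deep_fade = (1.0, 0.0)                 # second subchannel completely faded
d_plain = received_min_distance(0.0, deep_fade)
d_rot = received_min_distance(math.radians(30.0), deep_fade)
print(d_plain, d_rot)
```

Without rotation two codewords collapse onto the same received point, whereas the rotated codewords remain distinguishable through the surviving subchannel alone.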
Y-precoding
When a pair of subchannels is well conditioned (i.e., $\lambda_1/\lambda_2$ is close to 1), performance
of X-codes is good. However, their performance degrades when the pair
of subchannels is ill conditioned (i.e., $\lambda_1/\lambda_2 \gg 1$). To improve performance in
ill-conditioned subchannels, Y-precoding [5] can be used. The idea behind Y-precoding
is as follows. In SVD precoding, the subchannel gains are the ordered
singular values of the SVD of the channel matrix. By pairing these subchannels,
one of the subchannels in a pair is stronger than the other. So, it is intuitive
that the codewords be chosen so that the minimum Euclidean distance
between the received codewords along the stronger subchannel component is
larger than that along the weaker subchannel component. By doing so, the code
design can make use of the total constrained transmit power to achieve a minimum
received codeword Euclidean distance greater than that achieved with
rotated constellations used in X-codes.

Figure 10.4 BER performance of the X-precoder, the Y-precoder, and the MMSE
precoder for a 16 × 16 MIMO system with 16-QAM (64 bps/Hz). Horizontal axis:
SNR (dB); vertical axis: BER.

Y-codes are designed based on this intuition,
and the codewords form a subset of a two-dimensional real skewed lattice.
Y-codes are parameterized with two parameters $a_k$ and $b_k$ related to power allocated
to the two subchannels so that the $\mathbf{A}_k$ matrix in Y-codes is of the form
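\[ \mathbf{A}_k = \begin{bmatrix} a_k & 2a_k \\ 2b_k & 0 \end{bmatrix}, \quad a_k, b_k \in \mathbb{R}^+, \tag{10.22} \]
and $a_k$, $b_k$ are computed so as to minimize the average error probability. The
overall structure of Y-codes using these parameters for three pairs of subchannels
for $n_t = n_r = 6$ is given by
\[ \mathbf{G} = \begin{bmatrix}
a_{1,1,1} & & & & & a_{1,1,2} \\
 & a_{2,1,1} & & & a_{2,1,2} & \\
 & & a_{3,1,1} & a_{3,1,2} & & \\
 & & a_{3,2,1} & & & \\
 & a_{2,2,1} & & & & \\
a_{1,2,1} & & & & &
\end{bmatrix}. \tag{10.23} \]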
As in X-codes, the optimum parameters for Y-codes can be computed independent
of $\mathbf{H}$, and the resulting performance is found to be good [5]. In addition, Y-codes have
the advantage of lower detection complexity than X-codes. This is because ML
detection of X-codes requires a two-dimensional search whereas ML detection of
Y-codes needs only a one-dimensional search.
nr = 16). The performance of the MMSE precoder is also plotted for comparison.
It can be seen that both X- and Y-codes perform significantly better than the
MMSE precoder. Also, Y-codes perform better than X-codes.
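[Figure: multiuser MIMO downlink. BS preprocessing with CSI maps the user data
u_1, ..., u_nu to the transmit signals x_1, ..., x_nt; user i has channel H_i and
n_ri receive antennas, i = 1, ..., nu.]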
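where $\mathbf{B} = [\mathbf{B}_1 \ \mathbf{B}_2 \ \cdots \ \mathbf{B}_{n_u}]$ is the global precoding matrix, and $\mathbf{u} = [\mathbf{u}_1^T \ \mathbf{u}_2^T \ \cdots \ \mathbf{u}_{n_u}^T]^T$
is the global data vector. The matrix of channel gains for the ith user is represented
by $\mathbf{H}_i$, and the global channel matrix is given by $\mathbf{H} = [\mathbf{H}_1^T \ \mathbf{H}_2^T \ \cdots \ \mathbf{H}_{n_u}^T]^T$.
The signal vector received at the ith user is given by
\[ \mathbf{y}_i = \mathbf{H}_i \mathbf{B}_i \mathbf{u}_i + \mathbf{H}_i \sum_{j=1,\, j \ne i}^{n_u} \mathbf{B}_j \mathbf{u}_j + \mathbf{n}_i, \tag{10.25} \]
where $\mathbf{n}_i \in \mathbb{C}^{n_{r_i}}$ is the noise vector at the ith user. The global received signal
vector can be written as
\[ \mathbf{y} = \mathbf{H}\mathbf{B}\mathbf{u} + \mathbf{n}, \tag{10.26} \]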
This method does not lead to the linear capacity growth with min(nt , nu ) that
is possible in the multiuser channel. This is because, with a power constraint,
an ill-conditioned channel matrix when inverted needs a large normalization
factor that dramatically reduces the SNR at the receivers. Generalization of
the ZF method to the case of users with more than one antenna can be done
[18]. At the receiver, if the ith user performs the receive processing with a
From (10.28) and (10.29), it can be observed that the interference to the $i$th user, $i > 1$, is less in THP than in linear precoding. Here again, joint optimization of the precoder and user receive filters can improve performance. The additional operations involved in non-linear precoding, such as interference presubtraction and optimal user ordering, improve the performance of non-linear precoders relative to linear precoders, but they also increase the complexity, which is undesirable for large multiuser systems.
System model
Consider a multiuser MISO system, where a BS communicates with $n_u$ users on the downlink (Fig. 10.6). The BS employs $n_t$ transmit antennas and each user is equipped with one receive antenna (i.e., $n_r = 1$). Let $u_c \in \mathbb{C}^{n_u}$ denote the complex information symbol vector, where the $i$th symbol in $u_c$ is meant for the $i$th user, $i = 1, \ldots, n_u$. Precoding on the information symbol vector $u_c$ is carried out to obtain the precoded symbol vector $x_c \in \mathbb{C}^{n_t}$, which is transmitted using $n_t$ transmit antennas such that the $j$th symbol of $x_c$ is transmitted on the $j$th transmit antenna, $j = 1, \ldots, n_t$.
Let $y_i$ denote the received complex signal at user $i$, and $y_c = [y_1\ y_2\ \cdots\ y_{n_u}]^T$. Let $H_c \in \mathbb{C}^{n_u \times n_t}$ denote the channel matrix such that its $(i, j)$th entry $h_{i,j}$ is the complex channel gain from the $j$th transmit antenna to the $i$th user's receive antenna. Assuming rich scattering, the entries of $H_c$ are modeled as iid $\mathcal{CN}(0, 1)$. Let $n_i$ denote the noise at the $i$th user, and $n_c = [n_1\ n_2\ \cdots\ n_{n_u}]^T$. The elements of $n_c$ are modeled as iid $\mathcal{CN}(0, \sigma^2)$. Therefore, $y_c$ can be expressed in terms of $H_c$, $x_c$, and $n_c$ as
\[ y_c = H_c x_c + n_c. \tag{10.30} \]
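[Figure 10.6: multiuser MISO downlink system model. The BS precodes the data streams for the downlink users onto $n_t$ transmit antennas, and the signals reach users 1 to $n_u$ through the channel matrix H.]
The equivalent real-valued system model is
\[ y = Hx + n, \tag{10.31} \]
where
\[ u = [u_I^T\ u_Q^T]^T \in \mathbb{R}^{2n_u}, \quad H = \begin{bmatrix} H_I & -H_Q \\ H_Q & H_I \end{bmatrix} \in \mathbb{R}^{2n_u \times 2n_t}, \]
\[ x = [x_I^T\ x_Q^T]^T \in \mathbb{R}^{2n_t}, \quad y = [y_I^T\ y_Q^T]^T \in \mathbb{R}^{2n_u}, \quad n = [n_I^T\ n_Q^T]^T \in \mathbb{R}^{2n_u}, \]
and
\[ u_c = u_I + ju_Q, \quad x_c = x_I + jx_Q, \quad y_c = y_I + jy_Q, \quad H_c = H_I + jH_Q, \quad n_c = n_I + jn_Q. \]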
With the above real-valued system model, the real part of the original complex
information symbols (i.e., symbols in uc ) will be mapped to [u1 , . . . , unu ] and
the imaginary part of these symbols will be mapped to [unu +1 , . . . , u2nu ]. For
M -PAM modulation, [unu +1 , . . . , u2nu ] will be zeros since M -PAM symbols take
only real values. In the case of M -QAM, [u1 , . . . , unu ] can be viewed to be from
an underlying M -PAM signal set, and so is [unu +1 , . . . , u2nu ].
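The complex-to-real conversion can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `to_real_model` and the small random test dimensions are illustrative choices, not part of the original text. It stacks real and imaginary parts and confirms that the real-valued model reproduces the complex one.

```python
import numpy as np

def to_real_model(Hc, xc, nc):
    """Map y_c = H_c x_c + n_c to the stacked real-valued model y = H x + n."""
    HI, HQ = Hc.real, Hc.imag
    # Block structure [[H_I, -H_Q], [H_Q, H_I]] follows from expanding
    # (H_I + jH_Q)(x_I + jx_Q) into real and imaginary parts.
    H = np.block([[HI, -HQ], [HQ, HI]])
    x = np.concatenate([xc.real, xc.imag])
    n = np.concatenate([nc.real, nc.imag])
    return H, x, n

rng = np.random.default_rng(0)
nu, nt = 3, 4  # illustrative sizes (assumption)
Hc = (rng.standard_normal((nu, nt)) + 1j * rng.standard_normal((nu, nt))) / np.sqrt(2)
xc = rng.standard_normal(nt) + 1j * rng.standard_normal(nt)
nc = 0.1 * (rng.standard_normal(nu) + 1j * rng.standard_normal(nu))

H, x, n = to_real_model(Hc, xc, nc)
y = H @ x + n          # real-valued model
yc = Hc @ xc + nc      # complex model
```

The vector `y` should equal the real part of `yc` stacked on top of its imaginary part, which is why the precoding algorithms can work entirely with real-valued quantities.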
Vector perturbation
With the above system model, let $G \in \mathbb{R}^{2n_t \times 2n_u}$ denote the precoding matrix. Therefore, the unit-norm transmitted symbol vector $x$ can be written as
\[ x = \frac{Gu}{\|Gu\|}. \tag{10.32} \]
For the ZF precoder [32] with $n_t \geq n_u$, the precoding matrix is given by
\[ G = G_{ZF} = H^T (HH^T)^{-1}, \tag{10.33} \]
and the corresponding received signal vector $y$ is given by
\[ y = \frac{u}{\|Gu\|} + n. \tag{10.34} \]
From (10.34), it is seen that $\|Gu\|$ has a scaling effect on the instantaneous received SNR at the users, and for poorly conditioned channels this results in a significant loss in SNR. It is assumed that $\|Gu\|$ is known at the receiver so that the received signal is scaled by $\|Gu\|$ prior to detection. Simulations show that using $E\{\|Gu\|\}$ instead of the instantaneous value of $\|Gu\|$ results in almost the same performance. Hence, in order to improve performance, $\|Gu\|$ needs to be minimized. One technique suggested in the literature is to perturb the information symbol vector $u$ in such a way that the perturbed vector $\tilde{u}$ is another point in the lattice, but $\|G\tilde{u}\|$ is much less than $\|Gu\|$ [33]. Specifically, $\tilde{u}$ can be defined as
\[ \tilde{u} = u + \tau p, \tag{10.35} \]
where $p \in \mathbb{Z}^{2n_u}$ is the perturbation vector and $\tau$ is a positive real number. The optimal value of $\tilde{u}$, denoted by $\tilde{u}_{opt}$, is given by
\[ \tilde{u}_{opt} = u + \tau p_{opt}, \tag{10.36} \]
where
\[ p_{opt} = \arg\min_{p \in \mathbb{Z}^{2n_u}} \|G(u + \tau p)\|^2. \tag{10.37} \]
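To make the perturbation search concrete, here is a small sketch, assuming NumPy. It builds a ZF precoder for a tiny real-valued channel and solves the integer search by brute force. Restricting each integer to {-1, 0, 1} (the `max_int` argument) is an assumption made only to keep the enumeration finite; the function names and the test channel are likewise hypothetical. The exhaustive loop also shows why the exact search scales exponentially with the number of users.

```python
import itertools
import numpy as np

def zf_precoder(H):
    """ZF precoding matrix G = H^T (H H^T)^{-1}."""
    return H.T @ np.linalg.inv(H @ H.T)

def vp_brute_force(G, u, tau, max_int=1):
    """Search integer vectors p with entries in [-max_int, max_int] (assumption)
    for the one minimizing ||G (u + tau p)||^2."""
    best_p, best_power = None, np.inf
    for cand in itertools.product(range(-max_int, max_int + 1), repeat=len(u)):
        p = np.array(cand, dtype=float)
        power = float(np.sum((G @ (u + tau * p)) ** 2))
        if power < best_power:
            best_p, best_power = p, power
    return best_p, best_power

rng = np.random.default_rng(1)
HI, HQ = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
H = np.block([[HI, -HQ], [HQ, HI]])   # real-valued channel, nt = nu = 2
G = zf_precoder(H)
u = np.array([1.0, -1.0, 1.0, 1.0])   # 4-QAM symbols in real-valued form
tau = 4.0                              # illustrative value for 4-QAM (assumption)

p_opt, power_vp = vp_brute_force(G, u, tau)
power_zf = float(np.sum((G @ u) ** 2))
```

Because the all-zero perturbation is among the candidates, `power_vp` can never exceed `power_zf`; the number of candidates grows as 3 to the power of the vector length in this restricted search.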
In (10.40), the modulo operation is defined on each entry of the vector since each user gets only one entry of the vector $y$. The value of the positive real scalar $\tau$ is fixed. The choice of the value of $\tau$ affects the overall performance. A high value is good as far as mitigating the effect of receiver noise is concerned (since the constellation replicas are placed far apart, and there is little probability that noise may push a point from one replica to another), but on the other hand a high value of $\tau$ results in a high value of $\|G(u + \tau p)\|$. It has been empirically observed that a good choice of $\tau$ is given by [33]
\[ \tau = 2|c_{max}| + d, \tag{10.41} \]
where $|c_{max}|$ is the maximum value of either the real or imaginary component of the constellation symbols, and $d$ is the spacing between the constellation symbols.
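As a quick worked example of this rule, the sketch below evaluates the perturbation scale for 4-QAM and 16-QAM and folds a perturbed value back onto the constellation. The constellation levels (odd integers with spacing 2) and the centered modulo form used in `mod_tau` are assumptions taken from common practice in the vector perturbation literature, not a reproduction of the chapter's receiver equation.

```python
import math

def vp_tau(c_max, d):
    """Perturbation scale tau = 2|c_max| + d."""
    return 2 * abs(c_max) + d

def mod_tau(y, tau):
    """Fold a real value into [-tau/2, tau/2) (assumed centered modulo)."""
    return y - tau * math.floor(y / tau + 0.5)

# 4-QAM with real/imaginary components in {-1, +1}: |c_max| = 1, d = 2.
tau_4qam = vp_tau(1, 2)
# 16-QAM with components in {-3, -1, +1, +3}: |c_max| = 3, d = 2.
tau_16qam = vp_tau(3, 2)

# A symbol +1 shifted by 3 replicas of tau is recovered by the modulo.
recovered = mod_tau(1 + 3 * tau_4qam, tau_4qam)
```

With these assumed constellations the scale is 4 for 4-QAM and 8 for 16-QAM, so adjacent constellation replicas are separated by exactly one symbol spacing.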
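[Figure: NDS precoder block diagram. The information vector $u$ passes through the NDS block to give $\tilde{u}$, which is precoded by the normalized linear precoder matrix $G$ (ZF/MMSE) to give $x$; the channel $H$ and the additive noise $n$ then produce $y$.]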
where $p^{(k)} \in \mathbb{Z}^{2n_u}$ is the perturbation vector for the $(k+1)$th iteration. To reduce computational complexity, $p^{(k)}$ is constrained to have only one non-zero entry. This is similar to the one-symbol neighborhood definition in the 1-LAS algorithm for detection in Chapter 5.
Let $F = G^T G$, where $G \in \mathbb{R}^{2n_t \times 2n_u}$ is the precoding matrix, which can be one of the linear precoders (e.g., ZF or MMSE). Let $q^{(k)}$ be the power (squared norm) of the precoded symbol vector after the $k$th iteration. Therefore, $q^{(k)}$ is given by
\[ q^{(k)} = \|G\tilde{u}^{(k)}\|^2 = \tilde{u}^{(k)T} F \tilde{u}^{(k)}. \tag{10.43} \]
In the $(k+1)$th iteration, the algorithm finds a constrained integer vector $p^{(k)}$ such that $q^{(k+1)} \leq q^{(k)}$. Let
\[ \Delta q^{(k+1)} = q^{(k+1)} - q^{(k)}. \tag{10.44} \]
Let $e_i$ denote a $2n_u$-dimensional vector of which only the $i$th entry is one, and all the other entries are zeros. Since only one non-zero entry in $p^{(k)}$ is allowed, $p^{(k)}$ can be expressed as a scaled integer multiple of some $e_i$, $i = 1, \ldots, 2n_u$. Because $\Delta q^{(k+1)}$ can be negative for more than one choice of $i$, an appropriate $i$ has to be chosen. Let $\Delta q_i^{(k+1)}$ denote the value of $\Delta q^{(k+1)}$ when $p^{(k)}$ is a scaled integer multiple of $e_i$. For each $i$, there exists a scaling integer for $e_i$, $\lambda_i^{(k)}$, which minimizes $\Delta q_i^{(k+1)}$. Let this minimum value of $\Delta q_i^{(k+1)}$ be denoted $\Delta q_{i,opt}^{(k+1)}$. Therefore $\Delta q_{i,opt}^{(k+1)}$ can be expressed as
\[ \Delta q_{i,opt}^{(k+1)} = \big(\lambda_i^{(k)}\big)^2 \tau^2 F_{i,i} + 2\tau \lambda_i^{(k)} z_i^{(k)}, \tag{10.45} \]
where $F_{i,i}$ is the $i$th diagonal entry of $F$, $z_i^{(k)}$ is the $i$th entry of the vector
\[ z^{(k)} = F\tilde{u}^{(k)}, \tag{10.46} \]
and
\[ \lambda_i^{(k)} = \arg\min_{\lambda \in \mathbb{Z}} \Delta q_i^{(k+1)}. \]
The values of $\lambda_j^{(k)}$ used in (10.49) are those after the $\lambda$-adjustment described above. From (10.46), $z^{(k+1)}$ can be written as
\[ z^{(k+1)} = z^{(k)} + \tau \lambda_j^{(k)} f_j, \]
where $f_j$ refers to the $j$th column of $F$. The algorithm terminates after the $n$th iteration if
\[ \min_i \Delta q_{i,opt}^{(n+1)} \geq 0. \tag{10.53} \]
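The iteration just described (change one coordinate of the perturbed vector at a time, pick the coordinate giving the largest power reduction, stop when no single-coordinate change helps) can be sketched as follows, assuming NumPy. This is an illustrative reading of the update rules, not the authors' implementation: the function name `nds_precode`, the iteration cap, and the random test channel are assumptions, and the adjustment step that the text refers to before (10.49) is not reproduced here.

```python
import numpy as np

def nds_precode(G, u, tau, max_iter=1000):
    """One-coordinate descent on ||G u_t||^2 over u_t = u + tau * (integer vector)."""
    F = G.T @ G
    diag = np.diag(F).copy()
    u_t = u.astype(float).copy()
    z = F @ u_t
    for _ in range(max_iter):
        # Best integer step per coordinate: nearest integer to -z_i / (tau F_ii).
        lam = np.rint(-z / (tau * diag))
        # Power change for each coordinate, as in lam^2 tau^2 F_ii + 2 tau lam z_i.
        dq = lam**2 * tau**2 * diag + 2.0 * tau * lam * z
        i = int(np.argmin(dq))
        if dq[i] >= 0:            # no single-coordinate step reduces the power
            break
        u_t[i] += tau * lam[i]
        z += tau * lam[i] * F[:, i]   # incremental update of z = F u_t
    return u_t

rng = np.random.default_rng(5)
HI, HQ = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
H = np.block([[HI, -HQ], [HQ, HI]])          # real-valued channel, nt = nu = 3
G = H.T @ np.linalg.inv(H @ H.T)             # ZF precoder
u = rng.choice([-1.0, 1.0], size=6)          # 4-QAM in real-valued form
tau = 4.0                                    # illustrative value (assumption)

u_nds = nds_precode(G, u, tau)
power_lin = float(np.sum((G @ u) ** 2))
power_nds = float(np.sum((G @ u_nds) ** 2))
```

Each iteration costs only a vector update once $F$ has been formed, which is the source of the low complexity relative to an exhaustive integer search; the price is that the result is a local, not necessarily global, minimum.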
Figure 10.8 BER performances of the NDS-ZF and NDS-MMSE precoders for nt = nu = 50, 200, nr = 1, and 4-QAM.
SE precoders, it is seen that the VP-SE precoder gives the better performance at moderate to high SNRs. However, the performance of the NDS-MMSE precoder is quite close to that of the VP-SE precoder at these SNRs. For example, for $n_t = 16$, the SNR gap between the VP-SE and NDS-MMSE performances at $10^{-3}$ BER is just about 0.4 dB. It is important to note that the NDS-MMSE precoder achieves this good performance at a much reduced complexity compared to the exponential complexity of the VP-SE precoder in solving (10.37). This low complexity advantage of the NDS-MMSE over the VP-SE precoder is illustrated in Fig. 10.10. It can be seen that the VP-SE precoder has exponential complexity in $n_t = n_u$, whereas the NDS-MMSE precoder has similar complexity to the MMSE precoder.
In the above context, it is of interest to note that a new type of cellular structure that comprises inexpensive single-antenna terminals working with BSs with 50 or 100 antennas, each driven by its own tower-top amplifier, of power no greater than a typical cell-phone power amplifier, is envisioned in [36]. Precoders like the NDS precoder can address the need for low complexity near-optimal precoding algorithms in such large multiuser MISO systems.
Figure 10.9 BER performance comparison between MMSE, NDS-MMSE, and VP-SE precoders. nt = 8, 16, nu = 8, nr = 1 and 4-QAM.
[Figure 10.10: log10(CPU run time per information symbol vector) versus log2(nu) for the MMSE precoder, the NDS-MMSE precoder, the search in NDS-MMSE, and the search in VP-SE, with reference curves $c_1 n_u^3$ and $c_2 2^{n_u}$; nt = nu, 4-QAM, SNR = 15 dB.]
is a lower bound for $C_{sum}$, i.e., $C_{sum} \geq C_{CSIR}$. Also note that $C_{CSIR}$ is the ergodic capacity of a point-to-point single-user MIMO system with $n_t$ receive antennas and $n_u$ transmit antennas with CSIR only. On the other hand, receiver cooperation between the users increases the capacity, and therefore the sum capacity $C_{sum}$ is upper bounded by the capacity of a point-to-point MIMO system with $n_t$ transmit antennas and $n_u$ receive antennas with CSIT and CSIR. We denote this upper bound as $C_{CSIT}$.
Figure 10.11 Upper and lower bounds for the ergodic sum capacity, $C_{sum}$.
In Fig. 10.11, the upper and lower bounds ($C_{CSIT}$ and $C_{CSIR}$) of the sum capacity $C_{sum}$ are plotted. It is observed that the gap between the bounds diminishes with increasing SNR, and therefore any of these bounds is a good approximation at high SNR. However, at low SNRs, there is a gap between the bounds, which diminishes as the system becomes more asymmetrical. For example, with $n_u = 8$ users and a target spectral efficiency of 1.5 bps/Hz for each user, the gap between the upper and lower bounds is 0.5, 0.8, and 1.3 dB for $n_t = 16$, 12, and 8, respectively. The performance of the NDS-MMSE precoder with turbo coding and its closeness to the upper bound on $C_{sum}$ is illustrated in Fig. 10.12.
Figure 10.12 shows the turbo coded BER performance of the NDS-MMSE and VP-SE precoders for $n_u = 8$, $n_t = 8, 12, 16$, 4-QAM, $n_r = 1$, rate-3/4 turbo code, and sum-rate $8 \times 2 \times 3/4 = 12$ bps/Hz. The minimum SNRs required to achieve a sum-rate of 12 bps/Hz obtained from the upper bound on the sum capacity curves in Fig. 10.11 are also shown. From Fig. 10.12, it can be seen that the VP-SE precoder achieves a vertical fall in coded BER at about 9.2, 7.8, and 7.2 dB away from the respective theoretical minimum SNRs required for $n_t = 8$, 12, and 16. It is further seen that the vertical fall for the NDS-MMSE precoder for $n_t = n_u = 8$ occurs at about 1.5 dB away from that of the VP-SE precoder. For the asymmetric cases of $n_t = 12$ and $n_t = 16$, the performance of the NDS-MMSE precoder is quite close to that of the VP-SE precoder.
Figure 10.12 Turbo coded BER performance of NDS-MMSE and VP-SE precoders for nu = 8, nt = 8, 12, 16, nr = 1, 4-QAM, rate-3/4 turbo code, sum-rate = 12 bps/Hz.
of all the BSs can be done jointly by minimizing the SMSE in the system. This
design of joint precoding matrix requires the CSI to be estimated at all the
BSs. So, the symbols transmitted at the antennas of each BS will be the linear
transformation of all the information symbols in the system. Precoding matrices
for all BSs with BS cooperation can be designed considering the minimization
of the MSE of all users in the system.
The coefficients that correspond to path loss and shadowing between the BS of the $l$th cell and the users in the $j$th cell are represented by a $K \times K$ diagonal matrix, given by
\[ D_{jl} = \mathrm{diag}\{\beta_{jl1}\ \ \beta_{jl2}\ \ \ldots\ \ \beta_{jlK}\}. \tag{10.59} \]
There are two phases in the communication scheme: (i) uplink training, and
(ii) downlink data transmission. The uplink training phase consists of users
transmitting pilots to the corresponding BSs, and downlink data transmission
consists of BSs transmitting data to their users.
Uplink training
In uplink training, each user employs a training sequence of length Np . The
training sequence used by the kth user in the jth cell is denoted as sjk . The
Figure 10.13 System model showing the BS in the $l$th cell and the $k$th user in the $j$th cell.
\[ Y_l = \sqrt{P_u N_p} \sum_{j=1}^{L} S_j D_{jl}^{1/2} H_{jl} + W_l, \tag{10.60} \]
\[ \hat{H}_{jl} = \sqrt{P_u N_p}\, D_{jl}^{1/2} S_j^H \Big( I + P_u N_p \sum_{i=1}^{L} S_i D_{il} S_i^H \Big)^{-1} Y_l. \tag{10.61} \]
The matrix $M_{jl} := \sqrt{P_u N_p}\, D_{jl}^{1/2} S_j^H \big( I + P_u N_p \sum_{i=1}^{L} S_i D_{il} S_i^H \big)^{-1}$ in the above equation is obtained by solving the following MMSE problem:
\[ \arg\min_{M_{jl}} E\big[ \|H_{jl} - M_{jl} Y_l\|_F^2 \big]. \tag{10.62} \]
At the end of the training phase, the $l$th BS has $\hat{H}_{jl}$, $j = 1, 2, \ldots, L$. The error in the CSI estimate is denoted $\tilde{H}_{jl} = H_{jl} - \hat{H}_{jl}$.
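The MMSE estimator above can be exercised on a toy two-cell example, which also makes pilot contamination visible. The sketch below assumes NumPy; the function name `mmse_channel_estimates`, the helper `crandn`, the orthonormal DFT pilots, and all numerical values are illustrative assumptions. When both cells reuse the same pilot matrix, the estimates of the two cells' channels at one BS differ only by the ratio of the square roots of their path-loss coefficients, so the BS cannot separate them.

```python
import numpy as np

def crandn(rng, *shape):
    """iid CN(0,1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mmse_channel_estimates(Y_l, pilots, D_l, Pu, Np):
    """MMSE estimates of H_jl at BS l for every cell j.
    pilots[j] is the Np x K pilot matrix of cell j, D_l[j] the diagonal matrix D_jl."""
    c = Pu * Np
    cov = np.eye(Y_l.shape[0], dtype=complex)
    for S_i, D_il in zip(pilots, D_l):
        cov = cov + c * (S_i @ D_il @ S_i.conj().T)
    Z = np.linalg.solve(cov, Y_l)            # (I + Pu Np sum_i S_i D_il S_i^H)^{-1} Y_l
    # np.sqrt is applied elementwise, which equals D^{1/2} for a diagonal matrix.
    return [np.sqrt(c) * np.sqrt(D_jl) @ S_j.conj().T @ Z
            for S_j, D_jl in zip(pilots, D_l)]

rng = np.random.default_rng(2)
Np, K, M, Pu = 4, 2, 3, 10.0
S = np.fft.fft(np.eye(Np))[:, :K] / np.sqrt(Np)   # orthonormal pilot columns (assumption)
pilots = [S, S]                                    # both cells reuse the same pilots
D_l = [np.eye(K), 0.25 * np.eye(K)]                # own cell and interfering cell
H = [crandn(rng, K, M), crandn(rng, K, M)]
W = crandn(rng, Np, M)
Y_l = np.sqrt(Pu * Np) * sum(S_j @ np.sqrt(D) @ H_j
                             for S_j, D, H_j in zip(pilots, D_l, H)) + W
H_hat = mmse_channel_estimates(Y_l, pilots, D_l, Pu, Np)

# Sanity case: a single cell without noise gives a scaled copy of the true channel.
c = Pu * Np
est_single = mmse_channel_estimates(np.sqrt(c) * S @ H[0], [S], [np.eye(K)], Pu, Np)[0]
```

In the contaminated case `H_hat[1]` is exactly half of `H_hat[0]`, because the only difference between the two estimators is the factor $D_{jl}^{1/2}$ in front.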
where $P_d$ is the average transmit power from each BS on the downlink, and $z_j = [z_{j1}\ z_{j2}\ \cdots\ z_{jK}]^T$ is the $K \times 1$ additive noise vector whose elements are $\mathcal{CN}(0, 1)$ random variables.
Achievable rates
The received signal at the $k$th user of the $j$th cell can be expressed as
\[ r_{jk} = \sum_{l=1}^{L} \sum_{i=1}^{K} g_{jl}^{ki} x_{li} + z_{jk} = g_{jj}^{kk} x_{jk} + \sum_{(l,i) \neq (j,k)} g_{jl}^{ki} x_{li} + z_{jk}, \tag{10.65} \]
where $g_{jl}^{ki}$ depends on the system model. In the system without BS cooperation, $g_{jl}^{ki}$ is given as the $(k, i)$th element of the matrix $\sqrt{P_d}\, D_{jl}^{1/2} H_{jl} A_l$. In the system with BS cooperation, $g_{jl}^{ki}$ is given as the $(k, (j-1)K)$th element of the matrix $\sum_{l=1}^{L} \sqrt{P_d}\, D_{jl}^{1/2} H_{jl} B_l$. The complex term $g_{jj}^{kk}$ is not known at the user. So, the received symbol expression is rewritten as
\[ r_{jk} = E\big[g_{jj}^{kk}\big] x_{jk} + \big( g_{jj}^{kk} - E\big[g_{jj}^{kk}\big] \big) x_{jk} + \sum_{(l,i) \neq (j,k)} g_{jl}^{ki} x_{li} + z_{jk} = E\big[g_{jj}^{kk}\big] x_{jk} + \tilde{z}_{jk}, \tag{10.66} \]
where $\tilde{z}_{jk}$ is the effective additive noise term. The motivation for the signal term $E[g_{jj}^{kk}] x_{jk}$ in (10.66) comes from the fact that $E[g_{jj}^{kk}]$ is known at the user as it depends only on the channel statistics, and not on the instantaneous channel. The variance of the effective additive noise is minimum with this rearrangement of terms. The variance of the effective noise term $\tilde{z}_{jk}$ is given by $\mathrm{var}\{\tilde{z}_{jk}\} = \mathrm{var}\{g_{jj}^{kk}\} + \sum_{(l,i) \neq (j,k)} E\big[|g_{jl}^{ki}|^2\big] + 1$. It can be verified that $E[x_{jk} \tilde{z}_{jk}^{*}] = 0$. So, the additive noise term is uncorrelated with the signal term. In [41], it has been proved that the worst case uncorrelated additive noise is independent Gaussian noise of the same variance. This implies that the rate achievable with independent Gaussian noise is achievable in the case of any uncorrelated additive noise whose distribution may not be Gaussian. So, the rate given by the following expression is achievable for the $k$th user of the $j$th cell:
\[ R_{jk} = \log_2 \left( 1 + \frac{\big| E\big[g_{jj}^{kk}\big] \big|^2}{\mathrm{var}\big\{g_{jj}^{kk}\big\} + \sum_{(l,i) \neq (j,k)} E\big[|g_{jl}^{ki}|^2\big] + 1} \right). \tag{10.67} \]
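In simulations, the expectations in the rate expression are typically replaced by sample averages over channel realizations. The sketch below, assuming NumPy, shows one way to do that; the function name `achievable_rate` and the array layout (one row per channel realization, one column per interfering stream) are assumptions for illustration.

```python
import numpy as np

def achievable_rate(g_desired, g_interf):
    """Sample-average estimate of log2(1 + |E g|^2 / (var g + sum E|g_int|^2 + 1)).
    g_desired: shape (N,), samples of the desired-signal gain.
    g_interf:  shape (N, n_int), samples of the interfering gains."""
    signal = np.abs(np.mean(g_desired)) ** 2
    noise = (np.var(g_desired)                                   # gain uncertainty
             + np.sum(np.mean(np.abs(g_interf) ** 2, axis=0))    # interference power
             + 1.0)                                              # unit-variance noise
    return float(np.log2(1.0 + signal / noise))

# Deterministic checks: a constant gain of 3 and no interference gives log2(1 + 9).
rate_clean = achievable_rate(np.full(10, 3.0), np.zeros((10, 0)))
# Adding one interferer with unit power raises the denominator from 1 to 2.
rate_interf = achievable_rate(np.full(10, 3.0), np.ones((10, 1)))
```

The variance of the desired gain enters the denominator because the user knows only the mean of that gain, so fluctuations around the mean act as additional noise.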
\[ \min_{A_l, \alpha_l} E_{\tilde{F}_{jl}, z_l, x_l} \Big[ \big\| \alpha_l (F_{ll} A_l x_l + z_l) - x_l \big\|^2 + \sum_{j \neq l} \big\| \alpha_l \gamma (F_{jl} A_l x_l) \big\|^2 \,\Big|\, \hat{F}_{jl} \Big] \tag{10.68} \]
subject to $\mathrm{tr}\{A_l^H A_l\} = 1$,
where $F_{jl} = \sqrt{P_d}\, D_{jl}^{1/2} H_{jl}$, $\hat{F}_{jl} = \sqrt{P_d}\, D_{jl}^{1/2} \hat{H}_{jl}$, and $\tilde{F}_{jl} = \sqrt{P_d}\, D_{jl}^{1/2} \tilde{H}_{jl}$, for all $j$, $l$. This objective function is quite intuitive. It consists of two parts: (i) the expected sum of squares of errors seen by the users in the $l$th cell, and (ii) the expected sum of squares of interference seen by the users in the other cells. The parameter $\gamma$ controls the relative weights associated with these two parts. The real scalar parameter $\alpha_l$ is important as it corresponds to scaling that can be performed at the users. The following property of the CSI estimation error is used in solving the optimization problem in (10.68): $E\big[\tilde{F}_{jl}^H \tilde{F}_{jl} \,\big|\, \hat{F}_{jl}\big] = \delta_{jl} I_M$, where
\[ \delta_{jl} = P_d\, \mathrm{tr}\Big\{ D_{jl}^{1/2} \Big( I_K - P_u N_p D_{jl}^{1/2} S_j^H \Lambda_l S_j D_{jl}^{1/2} \Big) D_{jl}^{1/2} \Big\}, \tag{10.69} \]
and $\Lambda_l = \big( I + P_u N_p \sum_{i=1}^{L} S_i D_{il} S_i^H \big)^{-1}$. It can be seen that the above matrix includes the training sequences of the users. The solution to the optimization problem in (10.68) in closed form is obtained as
\[ A_l^{opt} = \frac{1}{\alpha_l^{opt}} \Big( \hat{F}_{ll}^H \hat{F}_{ll} + \gamma^2 \sum_{j \neq l} \hat{F}_{jl}^H \hat{F}_{jl} + \eta I_M \Big)^{-1} \hat{F}_{ll}^H, \tag{10.70} \]
where $\eta = \delta_{ll} + \gamma^2 \sum_{j \neq l} \delta_{jl} + K$, and $\alpha_l^{opt}$ satisfies $\mathrm{tr}\{(A_l^{opt})^H A_l^{opt}\} = 1$. This precoding method outperforms single-cell precoding methods as the optimization in this method considers the inter-cell interference and the statistics of the CSI estimation error. The effect of pilot contamination can be further reduced and increased sum-rates can be achieved through precoding with BS cooperation.
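The closed-form multicell precoder is a regularized inverse followed by a normalization to unit trace. A minimal sketch, assuming NumPy, is given below; the function name `multicell_mmse_precoder`, the random stand-ins for the estimated effective channels, and the numerical values of the error-statistics terms and the weighting parameter are all hypothetical.

```python
import numpy as np

def multicell_mmse_precoder(F_hat, l, deltas, gamma):
    """Precoder for BS l from estimated effective channels.
    F_hat[j]: K x M estimate of the effective channel from BS l to the users of cell j.
    deltas[j]: scalar error-statistics term for cell j; gamma: interference weight."""
    K, M = F_hat[l].shape
    eta = deltas[l] + gamma**2 * sum(d for j, d in enumerate(deltas) if j != l) + K
    A = F_hat[l].conj().T @ F_hat[l] + eta * np.eye(M)
    for j, F_jl in enumerate(F_hat):
        if j != l:
            A = A + gamma**2 * (F_jl.conj().T @ F_jl)   # leakage to other cells
    W = np.linalg.solve(A, F_hat[l].conj().T)           # unnormalized M x K precoder
    alpha = float(np.sqrt(np.trace(W.conj().T @ W).real))
    return W / alpha, alpha                             # unit-trace precoder, scaling

rng = np.random.default_rng(6)
K, M = 2, 4
F_hat = [(rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
         for _ in range(2)]                             # two cells (illustrative)
A_opt, alpha_opt = multicell_mmse_precoder(F_hat, l=0, deltas=[0.1, 0.05], gamma=1.0)
trace_check = float(np.trace(A_opt.conj().T @ A_opt).real)
```

Setting the interference weight to zero removes the leakage terms and reduces the expression to a single-cell regularized precoder, which is one way to see what the multicell formulation adds.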
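where $\hat{F}_{jl}$ and $\tilde{F}_{jl}$ are as defined before. The optimization problem is defined as follows:
\[ \min_{\alpha, \{B_l\}_{l=1}^{L}} E_{\tilde{F}, z, x} \Big[ \sum_{l=1}^{L} \Big\| \alpha \Big( \sum_{j=1}^{L} F_{lj} B_j x + z_l \Big) - x_l \Big\|^2 \,\Big|\, \hat{F} \Big] \tag{10.71} \]
subject to $\mathrm{tr}\{B_l^H B_l\} = 1$, $l = 1, 2, \ldots, L$.
Here, the objective function is the expected sum of the errors seen by all the users in the system. The Lagrangian formulation is
\begin{align*}
\mathcal{L}(B_1, B_2, \ldots, B_L, \alpha, \lambda_1, \lambda_2, \ldots, \lambda_L)
&= E_{\tilde{F}, z, x} \Big[ \sum_{l=1}^{L} \Big\| \alpha \Big( \sum_{j=1}^{L} F_{lj} B_j x + z_l \Big) - x_l \Big\|^2 \,\Big|\, \hat{F} \Big] + \sum_{l=1}^{L} \lambda_l \big( \mathrm{tr}\{B_l^H B_l\} - 1 \big) \\
&= \sum_{l=1}^{L} \mathrm{tr} \Big\{ E_{\tilde{F}, z, x} \Big[ \Big( \alpha \Big( \sum_{j=1}^{L} F_{lj} B_j x + z_l \Big) - x_l \Big) \Big( \alpha \Big( \sum_{j=1}^{L} F_{lj} B_j x + z_l \Big) - x_l \Big)^H \,\Big|\, \hat{F} \Big] \Big\} \\
&\quad + \sum_{l=1}^{L} \lambda_l \big( \mathrm{tr}\{B_l^H B_l\} - 1 \big) \\
&= \sum_{l=1}^{L} \mathrm{tr} \Big\{ E_{\tilde{F}} \Big[ \alpha^2 \Big( \sum_{j=1}^{L} F_{lj} B_j \Big) \Big( \sum_{j=1}^{L} F_{lj} B_j \Big)^H - \alpha \Big( \sum_{j=1}^{L} F_{lj} B_j \Big) I_l + \alpha^2 I_K \\
&\quad - \alpha I_l^H \Big( \sum_{j=1}^{L} F_{lj} B_j \Big)^H + I_K \,\Big|\, \hat{F} \Big] \Big\} + \sum_{l=1}^{L} \lambda_l \big( \mathrm{tr}\{B_l^H B_l\} - 1 \big) \\
&= \sum_{l=1}^{L} \mathrm{tr} \Big\{ \alpha^2 \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j \Big) \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j \Big)^H + \alpha^2 \sum_{j=1}^{L} \delta_{lj} B_j^H B_j \\
&\quad - \alpha \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j \Big) I_l - \alpha I_l^H \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j \Big)^H \Big\} + \sum_{l=1}^{L} \lambda_l \big( \mathrm{tr}\{B_l^H B_l\} - 1 \big) + (\alpha^2 + 1) K L,
\end{align*}
where
\[ I_l = \big[\, 0_{K \times K}\ \ 0_{K \times K}\ \ \ldots\ \ I_K\ \ \ldots\ \ 0_{K \times K} \,\big]^T, \]
with $I_K$ in the $l$th position.
The following properties are used in the above simplifications: $E\big[\tilde{F}_{jl} \tilde{F}_{jl}^H \,\big|\, \hat{F}_{jl}\big] = \delta_{jl} I_K$ and $E[x x_l^H] = I_l$. Differentiating the Lagrangian with $B_j$, $\alpha$, and equating
\[ \mathrm{tr}\big\{ (B_j^{opt})^H B_j^{opt} \big\} = 1, \quad j = 1, 2, \ldots, L, \tag{10.73} \]
\begin{align*}
\mathrm{tr} \Big\{ \sum_{l=1}^{L} \Big[ & 2\alpha_{opt} \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j^{opt} \Big) \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j^{opt} \Big)^H + 2\alpha_{opt} \sum_{j=1}^{L} \delta_{lj} (B_j^{opt})^H B_j^{opt} \\
& - \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j^{opt} \Big) I_l - I_l^H \Big( \sum_{j=1}^{L} \hat{F}_{lj} B_j^{opt} \Big)^H \Big] \Big\} + 2\alpha_{opt} K L = 0. \tag{10.74}
\end{align*}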
10.3.4 Performance
Consider a multicell system with L = 4 cells and K = 4 users in each cell as
shown in Fig. 10.14. The average powers at the BS and at the users are taken to
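[Figure 10.14: multicell system with four cells (Cell 1 to Cell 4), each with its own BS; labels recoverable from the figure include $\beta_{22k} = 1$, $\beta_{44k} = 1$, $\beta_{24k} = 0.08$, and $\beta_{42k} = 0.08$.]
[Figure 10.15: sum rate (bits per channel use) of the multicell precoders, including multicell MMSE with BS cooperation with ideal CSI.]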
have perfect CSI. In case (ii), the CSI is still imperfect because of the estimation
error due to MMSE estimation. In Fig. 10.15, the effect of pilot contamination can be clearly seen: the gap between the sum-rate curves for the ideal CSI case and the no pilot contamination case is smaller than the gap between the sum-rate curves without and with pilot contamination. It can also be seen that
precoding with BS cooperation gives greater sum-rates than precoding without
BS cooperation.
References
[14] C. B. Peel, "On dirty-paper coding," IEEE Signal Process. Mag., vol. 20, no. 3, pp. 112–113, May 2003.
[15] S. Vishwanath, N. Jindal, and A. Goldsmith, "Duality, achievable rates and sum-rate capacity of MIMO broadcast channels," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2658–2668, Oct. 2003.
[16] P. Viswanath and D. Tse, "Sum capacity of the vector Gaussian broadcast channel and uplink–downlink duality," IEEE Trans. Inform. Theory, vol. 49, no. 8, pp. 1912–1921, Aug. 2003.
[17] M. Vu and A. Paulraj, "MIMO wireless linear precoding," IEEE Signal Process. Mag., vol. 24, no. 5, pp. 86–105, Sep. 2007.
[18] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, "Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels," IEEE Trans. Signal Process., vol. 52, no. 2, pp. 461–471, Feb. 2004.
[19] Z. Pan, K.-K. Wong, and T.-S. Ng, "Generalized multiuser orthogonal space division multiplexing," IEEE Trans. Wireless Commun., vol. 3, no. 2, pp. 1969–1973, Nov. 2004.
[20] C.-B. Chae, D. Mazzarese, and R. W. Heath Jr., "Coordinated beamforming for multiuser MIMO systems with limited feedforward," in Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, Oct.–Nov. 2006, pp. 1511–1515.
[21] L. Choi and D. Murch, "A transmit preprocessing technique for multiuser MIMO systems using a decomposition approach," IEEE Trans. Wireless Commun., vol. 3, no. 1, pp. 20–24, Jan. 2004.
[22] B. Bandemer, M. Haardt, and S. Visuri, "Linear MMSE multi-user MIMO downlink precoding for users with multiple antennas," in IEEE PIMRC'2006, Helsinki, Sep. 2006, pp. 1–5.
[23] J. Zhang, Y. Wu, S. Zhou, and J. Wang, "Joint linear transmitter and receiver design for the downlink of multiuser MIMO systems," IEEE Commun. Lett., vol. 9, no. 11, pp. 991–993, Nov. 2005.
[24] S. Shi, M. Schubert, and H. Boche, "Downlink MMSE transceiver optimization for multiuser MIMO systems: duality and sum-MSE minimization," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5436–5446, Nov. 2007.
[25] A. Mezghani, M. Joham, R. Hunger, and W. Utschick, "Transceiver design for multi-user MIMO systems," in WSA'2006, Schloss Reisensburg, Mar. 2006.
[26] A. M. Khachan, A. J. Tenenbaum, and R. S. Adve, "Linear processing for the downlink in multiuser MIMO systems with multiple data streams," in IEEE ICC'2008, Beijing, Jun. 2008, pp. 4113–4118.
[27] A. J. Tenenbaum and R. S. Adve, "Linear processing and sum throughput in the multiuser MIMO downlink," IEEE Trans. Wireless Commun., vol. 8, no. 5, pp. 2652–2661, May 2009.
[28] M. Tomlinson, "New automatic equaliser employing modulo arithmetic," Electron. Lett., vol. 7, no. 5/6, pp. 138–139, Mar. 1971.
[29] H. Harashima and H. Miyakawa, "Matched transmission technique for channels with inter-symbol interference," IEEE Trans. Commun., vol. 20, no. 8, pp. 774–780, Aug. 1972.
[30] A. Mezghani, R. Hunger, M. Joham, and W. Utschick, "Iterative THP transceiver optimization for multi-user MIMO systems based on weighted sum-MSE minimization," in IEEE SPAWC'2006, Cannes, Jul. 2005, pp. 1–5.
Channel models play a crucial role in the design and analysis of wireless communication systems. They enable the system designers to analyze the performance of wireless systems and optimize design parameters even before the systems are actually built. They are key ingredients in such design and performance evaluation exercises, which are often carried out through mathematical analysis or computer simulation or a combination of both. A good channel model that accurately captures the real channel behavior is a very valuable tool that can accelerate the development of practical wireless systems. The need for good channel models to aid the design, analysis, and development of MIMO systems in general, and large MIMO systems in particular, is immense. A lot of effort has been directed towards MIMO channel sounding campaigns and MIMO channel modeling. Measurements from these campaigns have aided the formulation of MIMO channel models in wireless standards [1]–[4]. Channel sounding campaigns with large numbers of antennas, in both outdoor and indoor settings, have also appeared, though sparsely, in the literature. Now, with the increasing interest in large MIMO system implementations, there is renewed interest and activity in large MIMO channel sounding. While these channel measurements are expected to yield accurate and realistic models for large MIMO channels, the traditional analytical MIMO channel models, which are widely known in the literature, are expected to find continued use. This chapter gives a summary of analytical MIMO channel models and MIMO channel models in current wireless standards, and details of some of the more recent large MIMO channel sounding campaigns.
The number of antenna elements and their geometrical configuration, polarization, and propagation environment influence the real-life MIMO channel characteristics, and hence the corresponding channel models. MIMO channel measurements using linear, circular, planar, and three-dimensional arrays with single or dual polarized antenna elements are common. These measurements can yield stochastic characterization of MIMO channels. They can also be used to validate analytical models. It is typical to plot the amplitude envelope of the MIMO channel coefficients obtained from measurements as normalized pdfs and see if the distributions follow well-known distributions (e.g., Rayleigh, Ricean, log-normal) or other distributions. Another key interest is to capture the spatial characteristics of the MIMO channel in the model. Widely studied MIMO channel models include the Kronecker model [5],[6], the Weichselberger model [7], the finite scatterers model [8], the maximum entropy model [9], and the virtual channel separation model [10]. Validation of various models is often carried out using data from channel sounding campaigns by extracting the model parameters from measured data, generating synthesized channels by Monte Carlo simulations, and comparing certain metrics from the synthesized channels with those extracted directly from the measurement.
The most commonly used analytical MIMO channel model is the spatially iid frequency non-selective (flat) fading channel model. In this narrowband channel model, the channel gain between any pair of transmit–receive antennas is modeled as a complex Gaussian random variable. This model relies on (i) the antenna elements in the transmitter/receiver being spatially well separated, and (ii) the presence of a large number of multipaths that are only narrowly separated in time (common in a rich-scattering environment), whose combined gain, by the central-limit theorem, can be approximated by a Gaussian random variable. In a pure multipath environment without a line-of-sight (LOS) component, the gains have zero mean and the corresponding amplitude distribution is Rayleigh. If an LOS component exists in addition to the multipath components, then the mean is non-zero and the amplitude distribution is Ricean. In general, the $n_r \times n_t$ channel matrix $H$ can be considered to be made up of a zero-mean stochastic part $H_s$ and a deterministic part $H_d$ according to
\[ H = \sqrt{\frac{1}{1+K}}\, H_s + \sqrt{\frac{K}{K+1}}\, H_d, \tag{11.1} \]
where $K$ is the Rice factor, defined as the ratio between the powers in the LOS path and the non-LOS paths. $K = 0$ corresponds to a pure multipath channel without a LOS component, and $K = \infty$ corresponds to an unfaded AWGN channel.
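The role of the Rice factor can be illustrated with a short sketch, assuming NumPy. The function name `ricean_channel`, the choice of iid unit-variance complex Gaussian entries for the stochastic part, and the all-ones deterministic part are assumptions made for illustration only.

```python
import numpy as np

def ricean_channel(Hd, K, rng):
    """Draw H = sqrt(1/(1+K)) Hs + sqrt(K/(K+1)) Hd with Hs iid CN(0,1) (assumption)."""
    nr, nt = Hd.shape
    Hs = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    return np.sqrt(1.0 / (1.0 + K)) * Hs + np.sqrt(K / (K + 1.0)) * Hd

Hd = np.ones((2, 3), dtype=complex)   # illustrative LOS component (assumption)

# Very large K: the channel is essentially the deterministic LOS part.
H_los = ricean_channel(Hd, 1e12, np.random.default_rng(3))

# K = 0: the LOS part has no influence, so two different Hd give the same draw.
H_a = ricean_channel(Hd, 0.0, np.random.default_rng(4))
H_b = ricean_channel(2 * Hd, 0.0, np.random.default_rng(4))
```

The two weights are chosen so that their squares sum to one, which keeps the average channel power fixed as the Rice factor is varied, provided both parts are normalized to the same power.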
$R_H$ contains the correlations of all the elements of the channel matrix and describes the spatial statistics. Now, realizations of vector $h$ with distribution (11.2), and hence the realizations of the channel matrix $H$, can be obtained by
\[ h = R_H^{1/2} g, \tag{11.3} \]
where $R_H^{1/2}$ is any matrix satisfying $R_H^{1/2} \big(R_H^{1/2}\big)^H = R_H$, and $g$ is an $n_t n_r \times 1$ vector with iid Gaussian entries with zero mean and unit variance. Note that the number of real-valued parameters required to fully specify $R_H$ is $n_t^2 n_r^2$, which is large for large $n_t$, $n_r$. Imposing a certain structure on the correlation matrix can reduce this requirement. Different ways to reduce this requirement lead to different correlation based models which are described below.
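Generating correlated channel realizations from a full correlation matrix can be sketched as follows, assuming NumPy. The Cholesky factor is used as one valid matrix square root; the exponential correlation matrix, the function name `correlated_channel`, and the column-stacking convention used to reshape the vector into a matrix are illustrative assumptions.

```python
import numpy as np

def correlated_channel(R_H, nr, nt, rng):
    """Draw h = R_H^{1/2} g and reshape it into an nr x nt matrix."""
    R_half = np.linalg.cholesky(R_H)       # R_half @ R_half^H = R_H
    n = nr * nt
    g = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    h = R_half @ g
    H = h.reshape((nr, nt), order="F")     # column stacking (assumed vec convention)
    return H, h, R_half

nr, nt, rho = 2, 2, 0.5
idx = np.arange(nr * nt)
# Exponential correlation rho^|i-j|: a positive definite example (assumption).
R_H = rho ** np.abs(idx[:, None] - idx[None, :]).astype(float)

H, h, R_half = correlated_channel(R_H, nr, nt, np.random.default_rng(7))
n_params_full = (nr * nt) ** 2             # real parameters in a full R_H
```

Any other factor with the same product, such as one built from the eigendecomposition, would give channels with the same statistics; the Cholesky factor is simply convenient when the correlation matrix is positive definite.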
i.e., all entries of $H$ are uncorrelated and have equal variance $\sigma^2$. A single real-valued parameter fully specifies the channel model in this case. This model corresponds to a spatially white MIMO channel, which, in practice, can occur in rich scattering environments having multipath components uniformly distributed in all directions. The model is attractive due to its simplicity. It has found extensive use in theoretical studies (e.g., information theoretic analysis of MIMO systems) and simulations (e.g., performance evaluation of MIMO systems and algorithms) [11]. The simplicity of the iid model makes it quite attractive for large MIMO system studies.
and
\[ H = R_{Tx}^{1/2}\, G\, R_{Rx}^{1/2}, \tag{11.7} \]
where, as before, $g$ is an $n_t n_r \times 1$ vector with iid Gaussian entries with zero mean and unit variance, and $G$ is an iid unit variance matrix obtained by performing an inverse vec(.) operation on $g$. The number of parameters that characterize this model is $n_t^2$ (parameters in $R_{Tx}$) plus $n_r^2$ (parameters in $R_{Rx}$), unlike $n_t^2 n_r^2$ parameters in the full correlation matrix.
A limitation of the Kronecker model is that it does not take into account the coupling between the direction of departure (DoD) at the transmitter and the direction of arrival (DoA) at the receiver, which is typical in MIMO channels. So the Kronecker model is suitable only in certain environments. In small MIMO systems (e.g., 2 × 2 and 3 × 3 indoor MIMO channels at 5.2 GHz and 5.8 GHz), because of the reduced spatial resolution involved, the Kronecker model has been shown to be a good fit to the measured channel [12],[13]. However, application of the Kronecker model to an 8 × 8 measured channel (e.g., an 8 × 8 NLOS MIMO channel at 5.2 GHz) has indicated discrepancies in the modeled capacity (the Kronecker model underestimated the capacity extracted from the channel measurements) and the joint spatial DoD–DoA spectra [14]. These discrepancies are likely to be more pronounced in large arrays with high angular resolution [14]. Similar results have been reported in outdoor-to-indoor office MIMO channel measurements in the 5.2 GHz band [15]; while the Kronecker model was found to be a good fit in a 2 × 8 LOS setup, the fit was found to be not as good in a 16 × 8 NLOS setup. These observations and results in [15] have served as input to the COST 273 MIMO channel model [16]. The Kronecker model has also been shown to underestimate throughputs in adaptive modulation in MIMO compared to throughputs obtained using measured channel matrices [17].
Ray-tracing methods have been used in [18] to study the suitability of the Kronecker model for use in the two-ring model (Fig. 11.1(a)) and the elliptical model (Fig. 11.1(b)) of scatterer distributions. The two-ring model of scatterers can be viewed as typical of outdoor environments; e.g., in cellular systems where the BS and the mobile are surrounded by different sets of scatterers and there is no LOS between the mobile and the BS. The elliptical model of scatterers can be viewed as typical of indoor environments; e.g., indoor Wi-Fi environments where the transmitter and receiver share the same scatterers [19]. It has been shown that the Kronecker model is suitable for the two-ring model (i.e., suitable for some outdoor environments) and not suitable for the elliptical model (i.e., not suitable for some indoor environments) due to the separability and non-separability of the correlation structures in the former and latter models, respectively.
Despite the limitation of ignoring the coupling between the DoD and DoA at the transmit and receive ends, the Kronecker model has been popularly used in information theoretic capacity analysis and simulation studies [5],[6],[20],[21]. Though the Kronecker model is expected to be increasingly inaccurate for increasing array size and angular resolution, it will still find use in large MIMO system studies because of its simplicity.
Figure 11.1 Models of scatterer distribution: (a) two-ring model; (b) elliptical model.
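\[ R_{Tx} = U_{Tx} \Lambda_{Tx} U_{Tx}^H, \tag{11.8} \]
\[ R_{Rx} = U_{Rx} \Lambda_{Rx} U_{Rx}^H, \tag{11.9} \]
where $U_{Tx}$ and $U_{Rx}$ are unitary matrices whose columns are the eigenvectors of $R_{Tx}$ and $R_{Rx}$, respectively, and $\Lambda_{Tx}$ and $\Lambda_{Rx}$ are diagonal matrices with the corresponding eigenvalues. The Weichselberger model is given by
\[ H = U_{Rx} \big( \tilde{\Omega} \odot G \big) U_{Tx}^T, \tag{11.10} \]
where $G$ is an iid unit variance matrix, $\odot$ denotes the element-wise product, and $\tilde{\Omega}$ is the element-wise square root of the coupling matrix $\Omega$. The Kronecker model is the special case in which the coupling matrix has the rank-one form
\[ \Omega = \lambda_{Rx} \lambda_{Tx}^T, \tag{11.11} \]
where $\lambda_{Tx}$ and $\lambda_{Rx}$ are vectors containing the eigenvalues of the transmit and receive correlation matrices, respectively.
The Weichselberger model is specified by the transmit and receive eigenmodes ($U_{Tx}$, $U_{Rx}$), and the coupling matrix $\Omega$. The real-valued parameters that characterize the Weichselberger model include: $n_r(n_r - 1)$ parameters for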
[Figure: finite scatterers model, showing the transmit array ($n_t$ elements) and the receive array ($n_r$ elements) connected by propagation paths (Paths 1 to 5) with their AoDs, AoAs, and path gains.]
scattering with split components, where there can be one AoD but two or more
AoAs; likewise, there can be multiple AoDs and a single AoA.
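Let $\mathbf{s}$ and $\mathbf{r}$ denote the transmit and received signal vectors of size $n_t \times 1$ and
$n_r \times 1$, respectively. Let $\boldsymbol{\psi}_p$ denote the $n_r \times 1$ sized response vector at the receive
antenna array due to signal with AoA $\theta_{R,p}$. Likewise, let $\boldsymbol{\phi}_p$ denote the $n_t \times 1$
sized steering vector at the transmit side for the signal with AoD $\theta_{T,p}$. The $\tau_p$s
can be ignored if the signal bandwidth is narrow compared to the coherence
bandwidth of the channel. Assuming narrowband signaling where the $\tau_p$s can be
neglected, the received signal vector is given by
$$\mathbf{r} = \sum_{p=1}^{P} \xi_p \boldsymbol{\psi}_p \boldsymbol{\phi}_p^{T} \mathbf{s} = \boldsymbol{\Psi} \boldsymbol{\Xi} \boldsymbol{\Phi}^{T} \mathbf{s}, \tag{11.12}$$
where $\boldsymbol{\Psi} = [\boldsymbol{\psi}_1 \; \boldsymbol{\psi}_2 \; \cdots \; \boldsymbol{\psi}_P]$ is an $n_r \times P$ matrix, $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1 \; \boldsymbol{\phi}_2 \; \cdots \; \boldsymbol{\phi}_P]$ is an $n_t \times P$
matrix, and $\boldsymbol{\Xi}$ is a $P \times P$ diagonal matrix with $\xi_p$ as the $p$th diagonal element.
The finite scatterers channel model is then given by
$$\mathbf{H} = \boldsymbol{\Psi} \boldsymbol{\Xi} \boldsymbol{\Phi}^{T}. \tag{11.13}$$
In this model, the steering and response vectors, $\boldsymbol{\phi}_p$s and $\boldsymbol{\psi}_p$s, incorporate the
geometry, directivity, and coupling of the antenna elements. A $\boldsymbol{\psi}_p$ vector for
uniform linear antenna array at the receiver with constant inter-element spacing
$d_r$ is given by [8]
$$\boldsymbol{\psi}_p = \left[ \exp\!\left( j 2\pi \frac{n d_r}{\lambda} \sin(\theta_{R,p}) \right), \; n = 0, \ldots, n_r - 1 \right]^{T}, \tag{11.14}$$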
which enables all the scatterers to be captured, and the Gaussian approximation
becomes more realistic.
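The finite scatterers model of (11.13) and (11.14) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the transmit steering vectors are taken to have the same uniform-linear-array form as (11.14), element spacing is expressed in wavelengths, and the path angles and gains are drawn at random. The function names and parameter values are hypothetical.

```python
import numpy as np

def ula_response(n_elem, spacing_wl, angle_rad):
    """ULA response vector in the form of (11.14); spacing_wl is d/lambda."""
    n = np.arange(n_elem)
    return np.exp(1j * 2 * np.pi * n * spacing_wl * np.sin(angle_rad))

def finite_scatterers_channel(nr, nt, aoa, aod, gains, dr_wl=0.5, dt_wl=0.5):
    """H = Psi Xi Phi^T for P paths with the given AoAs, AoDs, and complex gains."""
    Psi = np.column_stack([ula_response(nr, dr_wl, a) for a in aoa])  # nr x P
    Phi = np.column_stack([ula_response(nt, dt_wl, a) for a in aod])  # nt x P
    Xi = np.diag(gains)                                               # P x P
    return Psi @ Xi @ Phi.T

rng = np.random.default_rng(2)
P, nr, nt = 3, 8, 8                                   # hypothetical sizes
aoa = rng.uniform(-np.pi / 2, np.pi / 2, P)
aod = rng.uniform(-np.pi / 2, np.pi / 2, P)
gains = (rng.standard_normal(P) + 1j * rng.standard_normal(P)) / np.sqrt(2)
H = finite_scatterers_channel(nr, nt, aoa, aod, gains)
rank = np.linalg.matrix_rank(H, tol=1e-9)             # cannot exceed the number of paths P
```

Because H is a sum of P rank-one terms, its rank is bounded by P regardless of how many antennas are used, which is why the number of scatterers matters for large arrays.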
The placement of MIMO antenna elements and the propagation conditions
witnessed in practice often render the iid fading model inadequate. For example,
spatial correlation on the transmit and/or receive side can affect the rank structure of
the MIMO channel, resulting in degraded channel capacity. The non-LOS (NLOS)
correlated MIMO channel model for an outdoor propagation scenario shown in
Fig. 11.3 was proposed in [27]. The model explains the existence of pinhole
channels, which exhibit poor rank properties even if the spatial fading correlations
at both ends of the link are low. In other words, the realization of high MIMO
capacity in actual wireless channels is sensitive not only to the fading correlation,
but also to the structure of the scattering in the propagation environment.
The channel model in [27] considers linear arrays of antennas at the transmitter
and the receiver. The transmitter has $n_t$ omnidirectional transmit antenna
elements with inter-element spacing $d_t$. Likewise, the receiver has $n_r$ omnidirectional
receive antenna elements with inter-element spacing $d_r$. The propagation
path between the transmit and receive arrays is obstructed on both sides of the
link by a number of significant near-field scatterers (e.g., large objects) referred
to as transmit and receive scatterers, which are modeled as omnidirectional ideal
scatterers. The maximum ranges of the scatterers from the horizontal axis on
the transmit and receive sides are denoted by $D_t$ and $D_r$, respectively. When
omnidirectional antennas are used, $D_t$ and $D_r$ correspond to the transmit and
receive scattering radii, respectively. On the receive side, the signal reflected by
the scatterers onto the antennas impinges on the array with an angular spread
$\theta_r$, which is a function of the distance between the array and the scatterers.
Similarly, the angular spread on the transmit side is denoted $\theta_t$. The range
between the local scatterers on the transmit and receive sides is denoted by $R$.
It is assumed that the scatterers are located sufficiently far from the antennas
that the plane-wave assumption holds. Further, the local scattering condition is
assumed, i.e., $D_t \ll R$ and $D_r \ll R$. The number of scatterers on each side, $S$, is
considered to be large enough (typically > 10) for random fading to occur. The
complex channel gain matrix of this model is given by [27]
Figure 11.3 Propagation scenario for the analytical MIMO fading channel model in [27].
$\mathbf{h} = \mathbf{R}_{\theta_r,d_r}^{1/2} \mathbf{g}$, where $\mathbf{g} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{n_r})$. A similar definition holds for the transmit
correlation matrix $\mathbf{R}_{\theta_t,d_t}$. Accordingly, $\mathbf{G}_t$ in (11.21) is an iid Rayleigh fading
matrix of size $S \times n_t$, given by $\mathbf{G}_t = [\mathbf{g}_1 \; \mathbf{g}_2 \; \cdots \; \mathbf{g}_{n_t}]$, where $\mathbf{g}_n \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_S)$.
Similarly, $\mathbf{G}_r$ is an iid Rayleigh matrix of size $n_r \times S$. The $\mathbf{R}_{\theta_t,d_t}$ and $\mathbf{R}_{\theta_r,d_r}$ matrices
control the transmit and receive antenna correlations, respectively. Different
assumptions on the statistics of the DoDs/DoAs will yield different expressions for
these matrices [19],[27]. The $\mathbf{R}_{\theta_S,2D_r/S}$ matrix in (11.21) is defined as follows.
Each scatterer at the transmit side captures the signal from the transmit antennas
and radiates it in the form of a plane wave toward the scatterers on the receive
side. The receive side scatterers are viewed as an array of $S$ virtual antennas with
average spacing $2D_r/S$ and experience an angular spread $\theta_S = 2\tan^{-1}(D_t/R)$.
Let the $n$th transmit antenna's signal captured by the $S$ receive side scatterers
be denoted by the vector $\mathbf{y}_n = [y_{n,1} \; y_{n,2} \; \cdots \; y_{n,S}]^T$. Using the approximation
that the receive side scatterers form a uniform array of sensors,
$\mathbf{y}_n \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{\theta_S,2D_r/S})$, or equivalently $\mathbf{y}_n = \mathbf{R}_{\theta_S,2D_r/S}^{1/2} \mathbf{g}_n$ with $\mathbf{g}_n \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_S)$.
receiver, preventing the channel rank from building up. This happens when the
rank of the matrix $\mathbf{R}_{\theta_S,2D_r/S}^{1/2}$ in (11.21) drops, which can be caused by, for
example, a large transmit–receive range $R$ or small scattering radii (i.e., small $D_t$
or small $D_r$ or both). It has also been argued that the measurements of [28]
showed only rare occurrences of weak pinholes, and that although experimental
evidence of pinholes was established in controlled environments in the laboratory
in [29],[30], not many true occurrences of pinholes have been reported [31].
Nonetheless, the effect of pinholes on the performance of MIMO systems continues
to be widely studied [32]–[35].
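The pinhole effect described above can be reproduced numerically. The Python sketch below rests on several assumptions that are not stated in the surrounding text: the channel is composed as H = (1/sqrt(S)) R_r^{1/2} G_r R_S^{1/2} G_t R_t^{1/2}, which is dimensionally consistent with the matrix sizes given above, and each correlation matrix is computed by averaging over directions spread uniformly across the angular spread. The function names, the 0.15 m wavelength, and the two geometries are hypothetical.

```python
import numpy as np

def corr_matrix(n, spread_rad, spacing_wl, n_dirs=201):
    """Assumed correlation of an n-element uniform array (spacing in wavelengths),
    averaged over n_dirs directions spread uniformly over the angular spread."""
    angles = np.linspace(-spread_rad / 2, spread_rad / 2, n_dirs)
    k = np.arange(n)
    A = np.exp(-2j * np.pi * spacing_wl * np.outer(k, np.sin(angles)))  # n x n_dirs
    return (A @ A.conj().T) / n_dirs

def psd_sqrt(M):
    """Hermitian square root of a positive semidefinite matrix."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def crandn(shape, rng):
    """iid CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def scattering_channel(nr, nt, S, th_r, th_t, dr_wl, dt_wl, Dt, Dr, R, wavelength, rng):
    th_S = 2 * np.arctan(Dt / R)                          # spread seen by receive scatterers
    R_r = corr_matrix(nr, th_r, dr_wl)
    R_t = corr_matrix(nt, th_t, dt_wl)
    R_S = corr_matrix(S, th_S, (2 * Dr / S) / wavelength)  # virtual array of S scatterers
    G_r, G_t = crandn((nr, S), rng), crandn((S, nt), rng)
    return psd_sqrt(R_r) @ G_r @ psd_sqrt(R_S) @ G_t @ psd_sqrt(R_t) / np.sqrt(S)

def sv_ratio(H):
    """Ratio of the second-largest to the largest singular value."""
    s = np.linalg.svd(H, compute_uv=False)
    return s[1] / s[0]

common = dict(nr=4, nt=4, S=20, th_r=np.pi / 2, th_t=np.pi / 2,
              dr_wl=0.5, dt_wl=0.5, wavelength=0.15)
# Large scattering radii and short range: rich scattering.
rich = scattering_channel(Dt=50.0, Dr=50.0, R=500.0, rng=np.random.default_rng(3), **common)
# Small scattering radii and long range: R_S becomes nearly rank one (pinhole).
pinhole = scattering_channel(Dt=1.0, Dr=1.0, R=10_000.0, rng=np.random.default_rng(3), **common)
ratio_rich, ratio_pinhole = sv_ratio(rich), sv_ratio(pinhole)
```

In the second geometry the antenna correlation matrices are unchanged, yet the second singular value of H collapses relative to the first, which matches the observation that a pinhole can occur even when fading correlation at both link ends is low.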
Figure 11.4 Uncoded/coded BER performance of a 1-LAS detector in iid fading and
correlated fading in NO-STBC MIMO system with nt = nr = 16, 16 × 16 NO-STBC,
16-QAM, rate-3/4 turbo code, 48 bps/Hz. (Horizontal axis: average received SNR in dB.)
fading compared to iid fading. From the BER plots in Fig. 11.4, it can be
observed that at an uncoded BER of $10^{-3}$, the performance in correlated fading
degrades by about 7 dB compared to that in iid fading. Likewise, at a rate-3/4
turbo coded BER of $10^{-4}$, a performance loss of about 6 dB is observed in
correlated fading compared to that in iid fading. In terms of nearness to capacity, the
vertical fall of the coded BER for iid fading occurs at about 24 dB SNR, which
is about 13 dB away from theoretical minimum required SNR of 11.1 dB. With
correlated fading, the detector is observed to perform close to capacity within
about 18.5 dB. One way to alleviate such degradation in BER performance due
to spatial correlation is to provide more dimensions (i.e., more antennas) on the
receive side. The results presented in Fig. 11.5 illustrate this point.
Figure 11.5 BER performance of a large MIMO system with nr = nt and nr > nt in
correlated fading keeping nr dr constant and dt = dr. nr dr = 72 cm. 12 × 12
NO-STBC, nt = 12, nr = 12, 18, 16-QAM, rate-3/4 turbo code, 36 bps/Hz.
(Curves: uncoded and rate-3/4 turbo coded BER for nt = nr = 12 and for nt = 12, nr = 18,
uncoded SISO AWGN, and the minimum SNR for a capacity of 36 bps/Hz in each
configuration. Horizontal axis: average received SNR in dB.)
BER performance with [nt = nr = 12] by about 11.5 dB at $10^{-3}$ BER. With
a rate-3/4 turbo code (36 bps/Hz), at a coded BER of $10^{-4}$, there is an
improvement in performance of about 13 dB with [nt = 12, nr = 18] compared
to [nt = nr = 12]. With [nt = 12, nr = 18], the vertical fall of coded BER is
such that it is about 8 dB from the theoretical minimum SNR needed to achieve
capacity. Therefore, it is seen that the BER performance loss that occurs due to
spatial correlation can be alleviated by using more receive antennas.
At this point, it is appropriate to note that spatial correlation need not
always be harmful. For example, transmit correlation in MIMO fading can be
exploited by using non-isotropic inputs (precoding) based on knowledge of the
channel correlation matrices. While [37]–[39] have proposed correlation-exploiting
precoders for orthogonal/quasi-orthogonal small MIMO systems in correlated
Rayleigh/Ricean fading channels, such precoders for large MIMO systems
remain to be studied.
area networks (WLAN) in the 2.4 GHz and 5 GHz frequency bands for unlicensed
use. Earlier standards in the WiFi family, including IEEE 802.11b/11g/11a,
use single-antenna terminals and access points. Multiantenna techniques were
adopted in later standards including IEEE 802.11n/11ac in order to significantly
increase the data rates compared to the rates in the 802.11b/g/a standards. IEEE
802.11n supports up to 600 Mbps through spatial multiplexing of up to four data
streams simultaneously in the same frequency using up to four antennas. IEEE
802.11ac aims to support multigigabit rates (up to 3.6 Gbps) using up to eight
data streams, and allows multiuser MIMO configuration. IEEE 802.11ac is also
referred to as 5G WiFi or gigabit WiFi.
3GPP refers to a family of standards for third generation (3G) mobile radio
communication and beyond. Channel modeling efforts in 2G and 3G mobile
radio systems under the European research initiative cooperation in science
and technology (COST) have resulted in COST channel models for different
radio environments including micro-, macro-, and pico-cell scenarios [40]. These
initiatives defined channel models (e.g., COST 259, COST 273 models) that
include the directional characteristics of radio propagation, and are therefore
suitable for simulation of smart antennas and MIMO systems. In particular, the
spatial channel model (SCM) was developed by 3GPP as a common reference
model for evaluating different MIMO techniques in the 2 GHz band in outdoor
environments [41]. The system bandwidth in the SCM model is 5 MHz. The
issue of this narrow bandwidth was addressed in the WINNER channel model
[42], where the SCM model was extended for a 100 MHz system bandwidth as
well as a 5 GHz center frequency. This extended model is referred to as the
SCM-Extended (SCME) model. The WINNER model is very general and covers
many scenarios. It is a multicluster model similar to SCM. Cluster based models
are quite relevant because practical measurements show clusters, and clusters
reduce the number of parameters considerably. Many standard MIMO channel
models rely on clusters (e.g., IEEE 802.11n, 3GPP-SCM, COST 273, WINNER
II). In the WINNER model, clusters are placed to generate given azimuth power
spectra at the transmit side and at the receive side. Each cluster has 20 multipath
components. Eighteen different scenarios are parameterized by large numbers of
measurements in outdoor, indoor, outdoor-to-indoor, with and without LOS, and
high-speed scenarios. A simplified version of the SCME model has been adopted
for standardization of the 3GPP long-term evolution [43],[44].
Parameters A B C D E F
MIMO WLANs [45]. Six models (TGn models A–F) are defined. The models
assume linear antenna arrays at the transmitter and receiver with 1/2, 1, and 4
wavelength spacing between antenna elements.
To model the frequency selectivity/delay spread characteristics of the channel,
multicluster tapped delay line models with different numbers of taps are defined.
Table 11.1 shows the parameter sets that define the delay spread characteristics
for the TGn models A–F, which are meant to reflect the characteristics of
the modeled environment. Note that TGn model A is the frequency-flat fading
model. The multicluster model is based on the cluster model developed by Saleh
and Valenzuela [46]. Depending on the model, the number of clusters varies from
2 to 6, which is chosen to be consistent with several experimentally determined
results reported in the literature. Power, angular spread, AoA, and AoD values
are assigned to each tap and cluster using statistical methods that agree with
experimentally determined values reported in the literature. Cluster angular spread
has been experimentally found to be in the range 20°–40°, and the mean AoA has
been found to be random with a uniform distribution. The power, angular spread
at the transmitter and receiver, and AoA and AoD for each tap and cluster for
models A–F are tabulated in Appendix C of [46]. For a given antenna configuration,
the channel matrix H of size nr × nt for each tap can be determined with
the knowledge of each tap power, AS, and AoA/AoD. Modeling of this channel
matrix also considers transmit and receive correlation matrices, following the
approach in [47],[48], where the correlation matrix for each tap is based on the power
angular spectrum (PAS) with angular spread being the second moment of PAS.
The temporal fading characteristics of indoor wireless channels are quite different
from those of outdoor mobile channels. In indoor wireless systems, the transmitter
and receiver are often stationary with people moving in between, whereas
in outdoor mobile systems, user terminals often move at different speeds through
an environment. Therefore, the Doppler bandwidth in indoor wireless channels
can be significantly smaller than the Doppler bandwidths in outdoor mobile
channels. To model the time selectivity of the indoor wireless channel, a
bell-shaped Doppler spectrum is defined in the WiFi standard. The Doppler spectrum
assumes reflectors moving in the environment at 1.2 km/h speed. This corresponds
to about 6 Hz Doppler in the 5 GHz band and 3 Hz in the 2.4 GHz band.
11.3 Standardized channel models 267
Channel models for polarized antennas are also defined. Reference [46] refers to a
weblink [49], where a Matlab implementation of the TGn MIMO channel models
with appropriate antenna correlation properties is available for download.
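The Doppler figures quoted for the 1.2 km/h reflector speed can be checked with the standard relation f_d = v f_c / c. The short Python sketch below is only an arithmetic illustration; the function name and the choice of exactly 5.0 GHz and 2.4 GHz as carrier frequencies are assumptions.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

def max_doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c for a reflector moving at speed v."""
    speed_ms = speed_kmh / 3.6          # convert km/h to m/s
    return speed_ms * carrier_hz / C_LIGHT

fd_5ghz = max_doppler_hz(1.2, 5.0e9)    # about 5.6 Hz, quoted as roughly 6 Hz
fd_2p4ghz = max_doppler_hz(1.2, 2.4e9)  # about 2.7 Hz, quoted as roughly 3 Hz
```

Both values round to the figures given in the text, and they are one to two orders of magnitude below the Doppler shifts typical of vehicular outdoor channels.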
a parameter $\alpha$. Likewise, correlation matrices for the UE ($\mathbf{R}_{UE}$) are defined for
one antenna, two antennas, and four antennas, in terms of another parameter
$\beta$. Spatial correlation matrices ($\mathbf{R}_{spat} = \mathbf{R}_{eNB} \otimes \mathbf{R}_{UE}$) are defined to
characterize the spatial correlation between the antennas at the eNodeB and UE in
terms of the parameters $\alpha$ and $\beta$. These correlation matrices are defined for the
cases of 1 × 2, 2 × 2, 4 × 2, and 4 × 4 MIMO configurations. For cases with
more antennas at either eNodeB or UE or both, the channel spatial correlation
matrix can still be expressed as the Kronecker product of $\mathbf{R}_{eNB}$ and $\mathbf{R}_{UE}$
according to $\mathbf{R}_{spat} = \mathbf{R}_{eNB} \otimes \mathbf{R}_{UE}$. The ($\alpha$, $\beta$) values for low, medium, and high
correlation levels are defined to be (0, 0), (0.3, 0.9), and (0.9, 0.9), respectively.
The spatial correlation matrix for the low correlation case with $\alpha = \beta = 0$ for
an $n \times m$ MIMO configuration is nothing but $\mathbf{I}_{nm}$. MIMO correlation matrices
using cross-polarized antennas are also defined, in which case $\mathbf{R}_{spat}$ is given
by $\mathbf{R}_{spat} = \mathbf{P}(\mathbf{R}_{eNB} \otimes \boldsymbol{\Gamma} \otimes \mathbf{R}_{UE})\mathbf{P}^T$, where $\mathbf{R}_{eNB}$ is the correlation matrix at
the eNodeB with the same polarization, $\mathbf{R}_{UE}$ is the correlation matrix at the
UE with the same polarization, $\boldsymbol{\Gamma}$ is a polarization correlation matrix defined to
be a function of a parameter $\gamma$, and $\mathbf{P}$ is a permutation matrix. The $\mathbf{R}_{eNB}$ and $\mathbf{R}_{UE}$
matrices for a two-antenna transmitter using one pair of cross-polarized antenna
elements, a four-antenna transmitter using two pairs of cross-polarized antenna
elements, and an eight-antenna transmitter using four pairs of cross-polarized
antenna elements are defined. For the high-correlation case, the values of $\alpha$, $\beta$,
$\gamma$ are 0.9, 0.9, 0.3, respectively.
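The Kronecker construction of the spatial correlation matrix can be illustrated for the 2 × 2 case. The Python sketch below assumes the two-antenna correlation matrix has the form [[1, p], [p*, 1]], which is the form used in 3GPP TS 36.101 but is not given in the text above; the function and dictionary names are hypothetical, and the (α, β) values are those quoted above.

```python
import numpy as np

def corr2(p):
    """2 x 2 correlation matrix [[1, p], [p*, 1]] (two-antenna form, assumed)."""
    return np.array([[1.0, p], [np.conj(p), 1.0]])

# (alpha, beta) pairs for the low, medium, and high correlation levels.
LEVELS = {"low": (0.0, 0.0), "medium": (0.3, 0.9), "high": (0.9, 0.9)}

def r_spat_2x2(level):
    """R_spat = R_eNB kron R_UE for two eNodeB antennas and two UE antennas."""
    alpha, beta = LEVELS[level]
    return np.kron(corr2(alpha), corr2(beta))

R_low = r_spat_2x2("low")       # reduces to the 4 x 4 identity matrix
R_high = r_spat_2x2("high")     # strongly correlated at both link ends
```

For the low level the result is the identity matrix, in agreement with the statement that the low correlation case is simply I_nm.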
The delay profiles are defined for low, medium, and high delay spread
environments, representative of extended pedestrian A (EPA), extended vehicular A
model (EVA), and extended typical urban (ETU) models, respectively. Likewise,
the time variation of the channel is defined through low (5 Hz), medium (70 Hz),
and high (300 Hz) Doppler frequencies. Table 11.2 shows the delay profiles and
maximum Doppler frequencies, and Table 11.3 shows the power delay profiles
for the EPA, EVA, and ETU scenarios.
MIMO channel measurements using a large number of antennas have been re-
ported in the literature. With the growing interest in the practical implemen-
tation of large MIMO systems, more and more channel sounding measurements
11.4 Large MIMO channel measurement campaigns 269
using large antenna arrays are being reported. Indoor and outdoor measurements
in the 2 GHz and 5 GHz bands are common. Some key requirements of MIMO
channel measurement systems include: (i) good angular resolution to distinguish
DoA and DoD, (ii) polarization discrimination capability to determine the use-
fulness of orthogonal polarizations as parallel channels, and (iii) the capability to
record the channel continuously to investigate the multipath behavior as the user
terminal moves, which allows the use of the Doppler domain in the signal anal-
ysis. Some of the large MIMO channel sounding campaigns and measurements
reported in the literature are summarized in this subsection.
Figure 11.6 Sixteen-element antenna array in a user terminal (laptop). Urban outdoor
MIMO channel sounding in 2.11 GHz. Photo source: [53].
32 × 64 indoor measurements
Indoor MIMO channel sounding experiments in the 5.3 GHz band with 120
MHz bandwidth under LOS and NLOS scenarios (representative of WLAN
environments) using dual polarized 64-element cylindrical antenna structure and a
21-element semi-spherical antenna structure were reported in [56],[57]. Channel
measurements between one transmitter (semi-spherical mount) and two receivers
(one semi-spherical mount and one cylindrical mount) each having 32 channels
were carried out, essentially acquiring two 32 × 32 channel matrices or
equivalently one effective channel matrix of size 64 × 32. The path loss (exponents 2.2,
8.2 for Tx-Rx1 and 1.5, 9.7 for Tx-Rx2) and power delay profile (maximum delay
of about 450 ns) characteristics, and MIMO capacity results obtained from these
measurements were reported.
16 × 16 indoor measurements
Contributions to IEEE 802.11 task group TGac towards 802.11ac channel modeling
have considered indoor channel sounding measurements with 8 × 8 and 16 × 16
MIMO configurations in the 5.17 GHz band under LOS and NLOS settings [58].
The eight-element antenna was a polarized slot antenna array with λ/2
separation between slot pairs. The 16-element antenna was a linear dipole antenna
array with λ/2 separation between the elements. Results from measurements
suggest that TGn channel models can be used for 11ac if the system bandwidth
is less than 100 MHz. For bandwidths more than 100 MHz, the channel tap
spacing may have to be reduced to 1 ns instead of the 10 ns spacing in 11n, and
wider bandwidth channel measurements are needed to derive channel models for
such large bandwidths.
16 × 32 indoor measurements
In [59], wideband indoor channel sounding measurements were carried out for
the 16 × 32 MIMO configuration in the 5.8 GHz band with 100 MHz bandwidth
to investigate both the spatial as well as the temporal characteristics of the
channel. The antennas used at the transmit and receive ends were planar arrays
of monopoles. The monopoles were arranged in an 8 × 12 rectangular grid (see
Fig. 11.7). Sixteen monopoles (4 × 4 grid) were used at the transmitter, and 32
monopoles (4 × 8 grid) were used at the receiver. The monopoles were about 0.3λ
in length and spaced about λ/2 apart. Measurements were carried out at different
locations in the same building, e.g., open laboratory, room-to-room, basement,
building level crossing. At 10 dB SNR, the statistics of the obtained capacities
showed mean capacities in the range 32–51 bps/Hz. The Doppler bandwidth was
found to be within about 2 Hz, which was comparable to the expected 0.6 Hz;
the receive array was moved at a speed of about 1/30 m/s, which corresponds to
a maximum Doppler of 0.6 Hz at 5.8 GHz carrier frequency. The excess Doppler
was attributed to other changes in the environment, such as people walking by,
etc. The coherence bandwidths of the channels were found to be in the range
5–28 MHz.
is the MIMO cube [60]. The MIMO cube takes advantage of spatial and polar-
ization diversities in a compact volume. In [61], 24-port and 36-port MIMO cube
geometries have been proposed and tested (see Fig. 11.8).
(a) (b)
Figure 11.8 24 × 24 and 36 × 36 channel sounding using MIMO cubes in 2.7 GHz:
(a) 24-antenna MIMO cube (80 mm × 80 mm × 80 mm); (b) 36-antenna MIMO cube
(120 mm × 120 mm × 120 mm). Photo source: [61].
In the 24-port cube, twelve pairs of λ/4 slot antennas are distributed on each
edge of an 80 mm × 80 mm × 80 mm cube providing low mutual coupling at
2.7 GHz operating frequency. The 36-port design consists of a combination of
24 λ/2 slot antennas and 12 λ/4 slot antennas built on a 120 mm × 120 mm ×
120 mm cube with an operating frequency of 2.82 GHz. Due to inadequate test
equipment/hardware for activating and testing all the 36 ports simultaneously,
channel measurements were carried out in 4 × 4 configurations in an indoor
environment for a total of around 100 combinations of four ports from the total of
36 ports. Measurement results indicated that the mutual coupling in the MIMO
cube is low enough not to affect the performance significantly. At an SNR of 20
dB, the estimated channel capacity of the 36-port MIMO cube was 159 bps/Hz
compared to the 36 × 36 iid MIMO capacity of 197 bps/Hz.
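The iid reference capacity quoted above can be reproduced approximately by Monte Carlo simulation. The Python sketch below assumes the usual ergodic capacity expression with equal power allocation and no channel knowledge at the transmitter, C = E[log2 det(I + (SNR/nt) H H^H)]; the function name, the trial count, and the seed are arbitrary choices for illustration.

```python
import numpy as np

def iid_ergodic_capacity(nt, nr, snr_db, trials=200, seed=0):
    """Monte Carlo estimate of E[log2 det(I + (snr/nt) H H^H)] for iid CN(0, 1) H."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
        # slogdet returns the log-determinant directly, avoiding overflow of det().
        _, logdet = np.linalg.slogdet(np.eye(nr) + (snr / nt) * (H @ H.conj().T))
        total += logdet / np.log(2)
    return total / trials

c36 = iid_ergodic_capacity(36, 36, 20.0)   # close to the 197 bps/Hz iid figure quoted above
```

The estimate lands near the quoted iid value, against which the measured 159 bps/Hz of the 36-port cube corresponds to roughly 80% of the iid capacity.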
8 × 16 outdoor-to-indoor measurements
A MIMO channel measurement campaign for an outdoor-to-indoor office scenario
in the 5.2 GHz band with 120 MHz system bandwidth has been reported in
[62]. The transmit antenna was an eight-element dual polarized uniform linear
array (ULA) of patch elements with λ/2 spacing. The receive antenna was a
16-element uniform circular array (UCA) of vertically polarized monopole elements
(see Fig. 11.9). Various parameters including DoA, DoD, distributions of rms
directional spreads and delay spreads, and complex amplitudes were derived
from the measured data. The angular dispersion at the outdoor link end was
found to be small; the mean direction spread was in the range of 0.09–0.24. At
the indoor link end, the angular dispersion was much larger (mean direction
spreads in the range of 0.69–0.82). The delay spread was measured to be in
the range of 5–25 ns. The DoA spectrum was found to depend noticeably on the
DoD. Using the ergodic channel capacity as a metric, the performances of the
Kronecker, virtual channel representation, and Weichselberger models for this
outdoor-to-indoor scenario were compared. The Kronecker model was found not
to be suitable due to the breakdown of the DoA–DoD decoupling assumptions.
The Weichselberger model was found to provide a better fit to the measured
capacity for both the LOS and NLOS scenarios.
(a) (b)
Figure 11.10 128-element antenna arrays. Measurements in 2.6 GHz: (a) Cylindrical
patch array (b) Planar patch array. Photo source: [63].
NLOS conditions, and the results were presented in [64]. Parameters like the
Rice factor, the received power levels over the array, the antenna correlation,
and the eigenvalue distributions were analyzed. It has been remarked that the
propagation conditions, from a large-array point of view, are actually better
than expected. It has also been observed that the near-field effects and the
non-stationarities over the array help to decorrelate the channel for different users,
thereby providing favorable channel conditions with stable channels and low
interference for the considered single antenna users.
An outdoor measurement campaign with large-scale antenna arrays in the 2.6
GHz band with a 20 MHz bandwidth was reported in [65]. A virtual antenna
array (a rotating antenna array with 16 angular positions to emulate a cylindrical
array of 112 elements) was mounted on the top of a large building at a height
of about 20 m. Two mobile single-antenna receivers were 2 m apart on top of a
car. The measurement positions were selected to provide a good mix of different
channel conditions (LOS and NLOS) which can be considered as representative
for a residential urban area. The channel capacity and achievable sum-rates with
linear precoding were estimated using the measured data. Different metrics
including the correlation coefficient and condition number were analyzed to find
to what extent channel orthogonality between different terminals can be
established by scaling up the number of transmit antennas. The results indicated that
in spite of significant differences between the iid and the measured channels, a
large fraction of the theoretical performance gains of large antenna arrays could
be achieved in practice.
For a given antenna aperture constraint, increasing the number of antenna
elements decreases the amount of spacing and thus increases the correlation. As a
rule of thumb, the inter-element spacing should be not less than λ/2 to
successfully decorrelate the incoming waves. This spacing may not be always possible
in large MIMO systems. Compact antenna arrays are antenna arrays with
inter-element spacing less than λ/2. They play an important role in large MIMO
systems because of the limited space available for mounting antenna elements
in communication terminals and the detrimental effects of mutual coupling in
antenna arrays. Mutual coupling not only affects the MIMO channel capacity
but also affects the array's radiation efficiency. The coupling becomes more
pronounced as the antenna spacing decreases. Compact arrays designed to preserve
MIMO channel capacity are crucial in large MIMO systems. Designing antenna
arrays which are compact yet demonstrate acceptable mutual coupling and
radiation efficiency is challenging.
One approach to designing antenna arrays that preserve MIMO channel capac-
ity is to use matching networks. Conjugate matching networks and load match-
ing networks have been developed to combat coupling-induced correlation and
[Figure 11.11: PIFA geometry, showing the planar element sides L1 and L2 and the shorting plate.]
11.5.1 PIFA
PIFA was introduced in 1987 [66]. It has good attributes like a low profile, good
radiation characteristics, and a wide bandwidth. Because of these attributes,
PIFA has emerged as one of the most promising low profile antenna design
approaches. A broad range of applications use PIFA as the basic antenna
element. These include mobile phones, wireless sensors, radio-frequency identification
(RFID), ultra wideband (UWB) and MIMO systems, and wearable devices,
covering several frequency bands of interest in outdoor and indoor applications;
e.g., GSM (890–960 MHz), PCS (1850–1990 MHz), Bluetooth (2.4–2.48 GHz),
DVB-H (UHF: 470–862 MHz; L: 1452–1492 MHz), WiFi (2.4–2.485 GHz;
5.16–5.5 GHz).
The name PIFA originates from the linear inverted F antennas (IFA), which
are wire structures above a ground plane, forming an F shape. IFA is a variant
of the monopole where the top section is folded down so as to be parallel to the
ground plane. PIFA can be considered as a kind of IFA with the wire radiator
element replaced by a plate to enhance the bandwidth. A PIFA typically consists
of (i) a ground plane, (ii) a rectangular planar element of length L1 and width
L2, placed above the ground plane at a certain height H, (iii) a short-circuit
plate of width W (typically of a narrower width than that of the short side L2
of the planar element), and (iv) a feed pin placed at a distance D along the
long side of the planar element, as shown in Fig. 11.11. PIFA characteristics
are affected by a number of parameters including the dimensions of the ground
plane, length, width, height, and position of the planar element (top plate), the
positions and widths of shorting plate and feed pin/plate.
11.5 Compact antenna arrays 277
When $W = L2$, the shorting plate spans the entire short side of the planar
element. In this case, the PIFA is resonant (i.e., has the maximum radiation
efficiency) when
$$L1 = \frac{\lambda}{4}, \quad \text{if } W = L2. \tag{11.22}$$
The above relation between the resonant length and the shorting plate width
can be explained by considering how a λ/4 patch antenna radiates. A λ/4 patch
antenna needs a quarter-wavelength of space between the edge and the shorting
area. If $W = L2$, then the distance from one edge to the short is simply $L1$,
which gives (11.22).
When $W = 0$ (or $W \ll L2$), the shorting plate becomes a shorting pin. In this
case, the PIFA is resonant at
$$L1 + L2 = \frac{\lambda}{4}, \quad \text{if } W = 0. \tag{11.23}$$
Then, since it is the fringing fields along the edge that give rise to radiation in
microstrip antennas, the length from the open-circuited radiating edge (the far
edge in Fig. 11.11) to the shorting pin is on average equal to $L1 + L2$. This can
be seen by measuring the distance from any point on the far edge of the PIFA
to the shorting pin. The clockwise and counter-clockwise paths always add up to
$2(L1 + L2)$. So, on average, resonance will occur when the path length ($L1 + L2$)
for a single path is a quarter-wavelength.
In general, the resonant length of a PIFA as a function of its parameters can
be approximated as
$$L1 + L2 - W = \frac{\lambda}{4}. \tag{11.24}$$
For example, a PIFA with $L1 = 2.5$ cm, $L2 = 1.5$ cm, $W = 1$ cm, and an air
dielectric between the ground plane and planar element will resonate at 2.5 GHz.
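The numerical example follows directly from (11.24): L1 + L2 − W = 3 cm is a quarter-wavelength, so the wavelength is 12 cm and the resonant frequency is c/0.12 m. The small Python helper below is a sketch of this arithmetic for an air dielectric; the function name is hypothetical.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s (air dielectric assumed)

def pifa_resonant_freq_hz(l1_m, l2_m, w_m):
    """Resonant frequency from the approximation L1 + L2 - W = lambda / 4."""
    quarter_wavelength = l1_m + l2_m - w_m
    return C_LIGHT / (4.0 * quarter_wavelength)

f_res = pifa_resonant_freq_hz(0.025, 0.015, 0.010)   # about 2.5e9 Hz
```

Setting W = L2 or W = 0 in the same helper recovers the special cases (11.22) and (11.23), respectively.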
The width of the feed plate plays an important role in broadening the antenna
bandwidth. The length of the PIFA antenna can be further reduced by capacitive
loading, e.g., by adding a capacitance between the feed point and the open edge.
The capacitive load can be produced by adding a plate (parallel to the ground)
to produce a parallel plate capacitor. While this capacitive loading approach can
reduce the resonance length from λ/4 to less than λ/8, it comes at the cost of
some loss in radiation efficiency and bandwidth.
of the 1 dipole uniform linear array, despite the fact that the PIFA array was
subject to the same or worse propagation-induced correlation. Compact antenna
arrays with PIFA as the basic element are a promising approach in large MIMO
systems.
(a) (b)
Figure 11.12 MIMO cube antennas: (a) 12 dipoles arranged along the edges of a cube.
(b) 18-antenna MIMO cube. Photo source: [74].
References
[1] H. Ozcelik, N. Czink, and E. Bonek, "What makes a good MIMO channel model?"
in IEEE VTC2005 Spring, Stockholm, vol. 1, Sep. 2005, pp. 156–160.
[2] W. Weichselberger, "Spatial structure of multiple antenna radio channels - a
signal processing viewpoint." Ph.D dissertation, Technische Universität Wien, Vienna,
Austria, Dec. 2003.
[3] H. Ozcelik, "Indoor MIMO channel models." Ph.D dissertation, Institut für
Nachrichtentechnik, Technische Universität Wien, Vienna, Austria, Dec. 2004.
[4] P. Almers, E. Bonek, A. Burr, et al., "Survey of channel and radio propagation
models for wireless MIMO systems," EURASIP J. Wireless Commun. and
Networking, vol. 2007, article ID 19070, 19 pages, 2007.
[5] D.-S. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, "Fading correlation and its
effect on the capacity of multielement antenna systems," IEEE Trans. Commun.,
vol. 48, no. 3, pp. 502–513, Mar. 2000.
[6] J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mogensen, and F. Frederiksen,
"A stochastic MIMO radio channel model with experimental validation," IEEE J.
Sel. Areas Commun., vol. 20, no. 6, pp. 1211–1226, Jun. 2002.
[7] W. Weichselberger, M. Herdin, H. Ozcelik, and E. Bonek, "A stochastic MIMO
channel model with joint correlation of both link ends," IEEE Trans. Wireless
Commun., vol. 5, no. 1, pp. 90–100, Jan. 2006.
[8] A. G. Burr, "Capacity bounds and estimates for the finite scatterers MIMO
wireless channel," IEEE J. Sel. Areas in Commun., vol. 21, no. 5, pp. 812–818, May
2003.
[9] M. Debbah and R. R. Müller, "MIMO channel modeling and the principle of
maximum entropy," IEEE Trans. Inform. Theory, vol. 51, no. 5, pp. 1667–1690,
May 2005.
[10] A. M. Sayeed, "Deconstructing multiantenna fading channels," IEEE Trans. Signal
Process., vol. 50, no. 10, pp. 2563–2579, Oct. 2002.
[11] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge,
UK: Cambridge University, 2005.
[12] K. Yu, M. Bengtsson, B. Ottersten, et al., "Second order statistics of NLOS indoor
MIMO channels based on 5.2 GHz measurements," in IEEE GLOBECOM2001,
San Antonio, TX, Nov. 2001, pp. 156–160.
[13] R. Stridh, K. Yu, B. Ottersten, and P. Karlsson, "MIMO channel capacity and
modeling issues on a measured indoor radio channel at 5.8 GHz," IEEE Trans.
Wireless Commun., vol. 4, no. 3, pp. 895–903, May 2005.
[14] H. Ozcelik, M. Herdin, W. Weichselberger, J. Wallace, and E. Bonek, "Deficiencies
of Kronecker MIMO radio channel model," Electronics Lett., vol. 39, pp.
1209–1210, Aug. 2003.
[15] S. Wyne, A. F. Molisch, P. Almers, et al., "Outdoor-to-indoor office MIMO
measurements and analysis at 5.2 GHz," IEEE Trans. Veh. Tech., vol. 57, no. 3, pp.
1374–1386, May 2008.
[16] L. Correia, COST 273 Final Report: Towards Mobile Broadband Multimedia
Networks. Amsterdam, Netherlands: Elsevier, 2006.
[17] L. Wood and W. S. Hodgkiss, "Impact of channel models on adaptive M-QAM
modulation for MIMO systems," in IEEE WCNC2008, Las Vegas, NV, Apr. 2008,
pp. 1316–1321.
[18] H. Tong and S. A. Zekavat, "On the suitable environments of the Kronecker
product form in MIMO channel modeling," in IEEE WCNC2008, Las Vegas, NV, Apr.
2008, pp. 780–784.
[19] R. Ertel, P. Cardieri, K. Sowerby, T. Rappaport, and J. Reed, "Overview of spatial
channel models for antenna array communication systems," IEEE Pers. Commun.,
vol. 5, no. 1, pp. 10–22, Feb. 1998.
[20] D. Chizhik, F. Rashid-Farrokhi, J. Ling, and A. Lozano, "Effect of antenna
separation on the capacity of BLAST in correlated channels," IEEE Commun. Lett.,
vol. 4, no. 11, pp. 337–339, Nov. 2000.
References 281
[37] M. Vu and A. J. Paulraj, Optimal linear precoders for MIMO wireless correlated
channels with nonzero mean in spacetime coded systems, IEEE Trans. Signal
Process., vol. 54, no. 6, pp. 23182332, Jun. 2006.
[38] H. R. Bahrami and T. Le-Ngoc, Precoder design based on correlation matrices for
MIMO systems, IEEE Trans. Wireless Commun., vol. 5, no. 12, pp. 35793587,
Dec. 2006.
[39] K. T. Phan, S. A. Vorobyov, and C. Tellambura, Precoder design for space-time
coded systems with correlated Rayleigh fading channels using convex optimiza-
tion, IEEE Trans. Signal Process., vol. 57, no. 2, pp. 814819, Feb. 2009.
[40] A. F. Molisch, H. Asplund, H. Heddergott, M. Steinbauer, and T. Zwick, The
COST259 directional channel model - part i: Overview and methodology, IEEE
Trans. Wireless Commun., vol. 5, no. 12, pp. 34213433, Dec. 2006.
[41] Spatial channel model for multiple input multiple output (MIMO) simulations,
3GPP-3GPP2 Spatial Channel Model Ad-hoc Group; 3GPP TR 25.996, v6.1.0,
200309.
[42] M. Narandic, C. Schneider, R. Thoma, et al., Comparison of SCM, SCME, and
WINNER channel models, IEEE VTC2007 Spring, Dublin, pp. 413417, Apr.
2007.
[43] Spatial channel model for multiple input multiple output (MIMO) simulations
(Release 11), 3GPP Technical Specication Group Radio Access Network; 3GPP
TR 25.996, v11.0.0, 2012-09.
[44] User equipment (UE) radio transmission and reception (Release 10), 3GPP
Technical Specication Group Radio Access Network; 3GPP TS 36.101, v10.6.0,
2012-03.
[45] V. Erceg, TGn channel models, IEEE P802.11 wireless LANs: doc.: IEEE 802.11-
03/940r43, May 2004.
[46] A. A. M. Saleh and R. A. Valenzuela, A statistical model for indoor multipath
propagation, IEEE J. Sel. Areas in Commun., vol. 5, no. 2, pp. 128137, Oct.
1987.
[47] J. P. Kermoal, L. Schumacher, P. E. Mogensen, and K. I. Pedersen, Experimental
investigation of correlation properties of MIMO radio channels for indoor picocell
scenario, in IEEE VTC2000, Boston, MA, Sep. 2000, pp. 1421.
[48] L. Schumacher, K. I. Pedersen, and P. E. Mogensen, From antenna spacings
to theoretical capacities - guidelines for simulating MIMO systems, in IEEE
PIMRC2002, Cannes, Sep. 2002, pp. 587592.
[49] L. Schumacher, LAN MIMO channel Matlab program, http://www.info.fundp.
ac.be/lsc/Research.
[50] P. Kyritsi, P. W. Wolniansky, and R. A. Valenzuela, Indoor BLAST measure-
ments: capacity of multi-element antenna systems, in Multi-Access, Mobility and
Teletrac for Wireless Commun., vol. 5, Dec. 2000, pp. 4960.
[51] P. Kyritsi, D. C. Cox, R. A. Valenzuela, and P. W. Wolniansky, Correlation
analysis based on MIMO channel measurements in an indoor environment, IEEE
J. Sel. Areas Commun., vol. 21, no. 5, pp. 713720, Jun. 2003.
[52] , Eect of antenna polarization on the capacity of a multiple element system
in an indoor environment, IEEE J. Sel. Areas Commun., vol. 20, no. 6, pp. 1227
1239, Aug. 2002.
References 283
[71] A. Nemeth, L. Sziics, and L. Nagy, MIMO cube formed of slot dipoles, in IST
Mobile and Wireless Commun. Summit, Budapest, Jul. 2007, pp. 15.
[72] L. Nagy, Modied MIMO cube for enhanced channel capacity, Intl J. Antennas
and Propagation, vol. 2012, Article ID 734896, 10 pages. doi:10.1155/2012/734896.
[73] C.-Y. Chiu and R. D. Murch, Overview of multiple antenna designs for handheld
devices and base stations, in Intl Workshop on Antenna Technology (iWAT2011),
Hong Kong, Mar. 2011, pp. 7477.
[74] J. Zheng, X. Gao, and Z. Feng, A compact eighteen-port antenna cube for MIMO
systems, IEEE Trans. Antennas and Propagation., vol. 60, no. 2, pp. 445455,
Feb. 2012.
[75] X. Gao, H. Zhong, Z. Zhang, Z. Feng, and M. F. Iskander, Low-prole planar
tri-polarization antenna for WLAN communications, IEEE Trans. Antennas and
Prop. Lett., vol. 9, pp. 8386, Feb. 2010.
12 Large MIMO testbeds
From these definitions, one can see that testbeds and prototypes can play crucial
roles in the research and development phase. While prototypes must necessarily
operate in real time, a testbed can be either a real-time testbed or a non-real-time
(offline) testbed, depending on the available resources in comparison with the
real-time computation need.
The first laboratory demonstration of a MIMO wireless communication system
was reported in the late 1990s by Bell Labs [3], where a laboratory
prototype of an 8 × 12 V-BLAST MIMO system using eight transmit antennas
and twelve receive antennas was demonstrated in an indoor laboratory/office
environment. The distance between the transmitter and the receiver was about
12 m. It was a narrowband system which operated at a carrier frequency of
1.9 GHz and a bandwidth of 30 kHz. The antenna arrays consisted of λ/2 wire
dipoles mounted in various arrangements; receive dipoles were mounted on the
surface of a metallic hemisphere approximately 20 cm in diameter, and transmit
dipoles were mounted on a flat metal sheet in a rectangular array configuration
with about λ/2 inter-element spacing. The system employed spatial multiplexing,
[Figure: an access point (AP) transmitter with 16 antennas serving four user terminals (UT #1 to UT #4), each with four antennas.]
Figure 12.2 Transmit and receive chains in a 16 × 16 multiuser MIMO testbed [14]
with $n_t = 16$ and $n_u = n_r = 4$.
12.4 64 × 15 multiuser MIMO system (Argos)
The design, realization, and evaluation of a multiuser MIMO system that serves
several single-antenna user terminals through a BS with a large number (≫ 10)
of antennas was reported in [15]. The system is called Argos, after the
100-eyed giant of Greek mythology. This study reported results from an Argos
prototype system with 64 antennas at the BS, capable of serving 15 user terminals
simultaneously. Built in a modular fashion using off-the-shelf WARP boards [16],
the Argos prototype system employs TDD and multiuser beamforming (MUBF)
to send independent data streams to multiple user terminals simultaneously.
Linear precoding techniques are used for MUBF. These techniques include ZF
MUBF and conjugate MUBF. Let $\mathbf{x}$ denote the $n_u$-length vector consisting of the
data symbols meant for the $n_u$ user terminals on the downlink. The linear precoder
generates an $n_t$-length vector $\mathbf{s}$ from $\mathbf{x}$ for the $n_t$ BS antennas to transmit.
It obtains $\mathbf{s}$ by multiplying $\mathbf{x}$ by an $n_t \times n_u$ precoding matrix $\mathbf{P}$, such that
$\mathbf{s} = \mathbf{P}\mathbf{x}$. The entries of the matrix $\mathbf{P}$ are the beamforming weights, obtained using
knowledge of the CSI. Let $\mathbf{H}$ denote the $n_t \times n_u$ channel matrix between the $n_t$
BS antennas and the $n_u$ user terminals. In conjugate beamforming, the beamforming
weights are the complex conjugates of the CSI, i.e., the precoding matrix is
$\mathbf{P} = c\,\mathbf{H}^{*}$, where $c$ is a normalizing constant and $\mathbf{H}^{*}$ is the complex conjugate of $\mathbf{H}$.
In ZF MUBF, the precoding matrix is $\mathbf{P} = \mathbf{H}^{*}\left(\mathbf{H}^{T}\mathbf{H}^{*}\right)^{-1}$, which forces the
inter-user interference to zero. While conjugate MUBF has the advantage of
significantly lower complexity, ZF MUBF achieves much better performance.
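The two precoders described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the Argos implementation: the i.i.d. Rayleigh channel, the random seed, the function names, and the choice of the normalizing constant $c$ (unit Frobenius norm) are all assumptions introduced here. It checks that the ZF precoder makes the effective downlink channel $\mathbf{H}^{T}\mathbf{P}$ an identity matrix, whereas the conjugate precoder leaves residual inter-user interference.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_u = 64, 15  # BS antennas and single-antenna user terminals (Argos sizes)

# Hypothetical i.i.d. Rayleigh channel of size n_t x n_u, matching the text's
# convention; a real system would use CSI estimated from uplink pilots.
H = (rng.standard_normal((n_t, n_u))
     + 1j * rng.standard_normal((n_t, n_u))) / np.sqrt(2)


def conjugate_precoder(H):
    """P = c H^*, with c chosen (an assumption) so that ||P||_F = 1."""
    P = H.conj()
    return P / np.linalg.norm(P)  # Frobenius norm for a 2-D array


def zf_precoder(H):
    """P = H^* (H^T H^*)^{-1}, so that H^T P is the identity."""
    Hc = H.conj()
    return Hc @ np.linalg.inv(H.T @ Hc)


def interference_power(G):
    """Total power in the off-diagonal (inter-user) entries of G."""
    off_diag = G - np.diag(np.diag(G))
    return float(np.sum(np.abs(off_diag) ** 2))


P_conj = conjugate_precoder(H)
P_zf = zf_precoder(H)

# Effective channel seen by the users: y = H^T s = (H^T P) x, noise omitted.
G_conj = H.T @ P_conj
G_zf = H.T @ P_zf

print("ZF inter-user interference power:       ", interference_power(G_zf))
print("Conjugate inter-user interference power:", interference_power(G_conj))
```

The ZF precoder needs an $n_u \times n_u$ matrix inversion, while the conjugate precoder needs only a conjugation and a scaling, which is the complexity gap the text refers to.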
The Argos BS includes 16 WARP boards, each board acting as four radios
with daughter cards and four antennas. The radios operate in the 2.4 or 5 GHz
ISM bands with a 20 MHz bandwidth and 625 kHz subcarrier spacing. The
64 antennas are compactly placed on a custom rack-mount platform. The system
achieves a spectral efficiency of 85 bps/Hz using ZF MUBF. With the computationally
less demanding conjugate MUBF, the system achieves a spectral efficiency of
38 bps/Hz. Adoption of TDD allows the CSI estimated on the uplink to be used
for downlink beamforming (due to channel reciprocity). This avoids the need for
CSI feedback from the user terminals. A scalable channel estimation architecture
that computes the full CSI at the BS using only $n_u$ uplink pilots, independent of
the number of BS antennas, is employed. The computation of the beamforming
weights is carried out locally at each antenna to avoid the data-transfer over-
Figure 12.3 Ngara 32 × 14 multiuser MIMO testbed for high-speed internet access in
rural areas [18]. (UT: user terminal.)
ements in order to provide a uniform azimuthal coverage (see Fig. 12.4). The
user terminal is provided with a single antenna, typically mounted outdoors
(e.g., on a rooftop) so that a GPS signal is always available for synchronization.
In order to reduce the computational load for signal processing at the AP, ZF
detection on the uplink and ZF precoding on the downlink have been chosen.
Since a LOS component is typical in rural environments, the performance of the
ZF-precoding-based multiuser MIMO downlink when the access point is equipped
with a UCA in a LOS environment was studied in [19].
Figure 12.4 Ngara 32 × 14 multiuser MIMO testbed in the UHF band. Photo
source: [18].
FPGAs interface with the ADCs and DACs of the different channels and perform
time-domain/frequency-domain processing: front-end FPGAs 0 and 1 serve
12 channels each, and front-end FPGA 2 serves the remaining 8 channels. Three
back-end FPGAs (one for transmit and two for receive) perform information
bit processing functions. For example, LDPC decoding functions are carried out
by the two back-end receive FPGAs. Another major computational load in baseband
processing comes from the matrix inversions needed for ZF detection on
the uplink and ZF precoding on the downlink. Since there are 54 subcarriers in
the system, for 14 user terminals and 32 antennas at the access point, inversion
of 3456 matrices, each matrix of size 14 × 32, is needed. Through efficient matrix
inversion implementations, all these inversions are implemented in one Virtex-6
FPGA. One more FPGA is used to implement the MAC functions. There is a
total of nine FPGAs. The FPGAs are configured and controlled through the
USB2 interface. The USB ports of all the FPGAs are connected to a USB hub so
that the controlling PC/laptop can connect to all the FPGAs via a single USB
cable. The MAC unit interfaces with an Ethernet virtual LAN (VLAN) trunking
switch through a 10GbE Ethernet interface. The Ethernet VLAN switch, on the
other end, supports multiple 1GbE Ethernet ports, logically one for each UT.
This design provides for flexible multicast and quality of service, which can be
implemented by standard Ethernet switches located on the AP network.
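To illustrate the per-subcarrier matrix inversions mentioned above, the following NumPy sketch computes a ZF weight matrix for a stack of 14 × 32 channel matrices, one per subcarrier. It is a software illustration under stated assumptions, not the FPGA implementation used in Ngara: the random channel, the seed, and the small subcarrier count `n_sc` are hypothetical. Since a 14 × 32 matrix is not square, the "inversion" is taken here as the Moore-Penrose pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sc = 8      # illustrative subcarrier count (an assumption, kept small)
n_u = 14      # user terminals
n_ap = 32     # access point antennas

# One hypothetical n_u x n_ap channel matrix per subcarrier
# (rows: user terminals, columns: AP antennas).
H = (rng.standard_normal((n_sc, n_u, n_ap))
     + 1j * rng.standard_normal((n_sc, n_u, n_ap))) / np.sqrt(2)

# np.linalg.pinv accepts a stack of matrices and returns one
# n_ap x n_u pseudo-inverse per subcarrier.
W = np.linalg.pinv(H)

# For a full-row-rank 14 x 32 matrix the pseudo-inverse is a right inverse,
# so using W[k] as the downlink ZF precoder gives H[k] @ W[k] = I.
effective = H @ W
max_error = np.max(np.abs(effective - np.eye(n_u)))
print("ZF weight stack shape:", W.shape)
print("largest deviation from identity:", max_error)
```

The cost of this step grows linearly with the number of subcarriers, which is why the text singles it out as a major baseband load.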
Demonstration
The Ngara demonstrator was tested in the laboratory environment shown in
Fig. 12.4. The user terminals were located uniformly on the two sides of a
rectangular room, and the access point was located at the center. Thirty-two
vertically polarized folded dipole antennas with an inter-element separation of about
0.4λ form a UCA. While tests with 18 user terminals were carried out using
off-line signal processing, tests with 14 user terminals were carried out in real time,
and these demonstrated simultaneous real-time video streaming from 14 user
terminals. In the real-time tests with 14 user terminals, each user terminal was
connected to a NetBook or a video streamer via an Ethernet cable. The source of
the streamed video was a DVD-quality MPEG2 file. The total Ethernet throughput
per user terminal was approximately 25 Mbps (though the full capacity of
the user terminal was 41.18 Mbps). Each user terminal then converted the
Ethernet packets into uplink wireless packets and sent them to the access point. The
access point decoded the wireless packets and passed the corresponding Ethernet
packets to the Ethernet VLAN switch via the 10GbE Ethernet port. The
total Ethernet throughput at the access point was approximately 350 Mbps. At
the access point, 14 MacBooks were connected to the Ethernet VLAN switch to
display the videos streamed from the user terminals. The results demonstrated
very good quality video transfer.
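As a quick consistency check on the throughput figures quoted above (assuming all 14 user terminals stream at the approximate per-terminal rate), the aggregate rate at the access point and the fraction of the per-terminal capacity in use work out as

\[
14 \times 25\ \text{Mbps} = 350\ \text{Mbps}, \qquad \frac{25\ \text{Mbps}}{41.18\ \text{Mbps}} \approx 0.61 .
\]

The first figure matches the reported total of approximately 350 Mbps; the second suggests that each terminal ran at roughly 61% of its full capacity during the real-time tests.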
12.6 Summary
As can be seen, the spectral efficiencies achieved in all of the large MIMO testbeds
presented in this chapter are about an order of magnitude higher than those in
current wireless standards. Results from all these testbeds clearly demonstrate
that the high-spectral-efficiency potential promised by large MIMO systems can
indeed be realized in practice. Of course, more work is needed to make large MIMO
systems commercially viable. Towards this end, more and more testbeds and prototypes
in different configurations, in different frequency bands, and in different environments
are expected to be reported in the years to come. Improved and efficient large
MIMO signal processing algorithms and architectures will be devised and tried
out in these testbeds. These large MIMO testbed experiences will naturally become
valuable inputs for defining next generation wireless standards like 5G
and beyond.
To take large MIMO systems forward in a big way from here, investment and
focused efforts are needed in the development of low-power application-specific
integrated circuits (ASICs) for large MIMO signal processing, highly integrated
RF/mixed-signal ICs, single-chip ADCs/DACs with more and more converters
built in, and compact and conformal antenna arrays. Identifying application
scenarios that can exploit large MIMO benefits (including scenarios where user
terminals can also have a large number of antennas, e.g., TVs, laptops, note pads,
tablets, smart phones) and devising suitable large MIMO architectures, algorithms,
low-cost implementation approaches, and solutions for those scenarios will be rewarding.
References
[1] A. Burg and M. Rupp, "Demonstrators and testbeds," in Smart Antennas: State of
the Art, T. Kaiser, A. Bourdoux, H. Boche, et al., Eds., EURASIP Book Series on
Signal Processing and Communications, Vol. 3. New York, NY: Hindawi Publishing
Corporation, 2005.
[18] H. Suzuki, R. Kendall, K. Anderson, et al., "Highly spectrally efficient Ngara rural
wireless broadband access demonstrator," in Int'l Symp. on Commun. and Inform.
Tech. (ISCIT'2012), Gold Coast, Oct. 2012, pp. 914–919.
[19] H. Suzuki, D. B. Hayman, J. Pathikulangara, et al., "Design criteria of uniform
circular array for multi-user MIMO in rural areas," in IEEE WCNC'2010, Sydney,
Apr. 2010, pp. 1–6.