Signal Detection For 3GPP LTE Downlink: Algorithm and Implementation

Huan Xuan Nguyen
School of Engineering and Information Sciences
Middlesex University
The Burroughs, London, NW4 4BT, United Kingdom
Email: [email protected]
Abstract: In this paper¹, we investigate an efficient signal detection algorithm that combines lattice reduction (LR) and list decoding (LD) techniques for 3rd generation long term evolution (LTE) downlink systems. The resulting detector, called the LRLD based detector, operates within the framework of successive interference cancellation (SIC) and takes full advantage of the reliable LR detection. We then extend our study to the implementation of the LRLD based detector and provide references for a possible real silicon implementation. Simulation results show that the proposed detector provides near maximum likelihood (ML) performance with significantly reduced complexity.
Index Terms: 3GPP LTE downlink, signal detection, lattice reduction, successive interference cancellation, implementation study.
I. INTRODUCTION
The 3rd generation partnership project (3GPP) [2] is in the process of defining the long-term evolution (LTE) and LTE-Advanced for 3G radio access in order to maintain the future competitiveness of 3G technology. The main targets of this evolution are increased data rates, improved spectrum efficiency, improved coverage, and reduced latency. The LTE downlink is based on orthogonal frequency division multiple access (OFDMA), which allows multiple access on the same channel [3]. This enables simple receivers in the case of large bandwidth, frequency-selective scheduling, and adaptive modulation and coding. The LTE uplink is based on the single carrier frequency division multiple access (SC-FDMA) technique [4].
In order to fulfill the requirements on coverage, capacity, and high data rates, novel multiple input multiple output (MIMO) schemes need to be supported as part of the long-term 3G evolution. Signal detection in MIMO systems has recently drawn significant attention. If maximum likelihood (ML) detection is used, the complexity grows exponentially with the number of transmit antennas. Thus, various approaches have been devised to reduce the complexity. The successive interference cancellation (SIC) approach is employed in [5]. The relation between SIC based MIMO detection and the decision feedback equalizer (DFE) is exploited in [6]. A probabilistic data association (PDA) algorithm, which was devised for multiuser detection in [7], is applied to MIMO detection in [8]. In [9], the partial maximum a posteriori probability (MAP) principle is derived to discuss the optimality of SIC based detection. List decoding (LD) based detectors are also considered for MIMO detection to obtain soft decisions in [10] and [11]. In [12], the use of a lattice reduction (LR) based detector as a low complexity MIMO detector is first discussed. In [13], more LR based MIMO detectors are proposed. Following this trend, this paper considers signal detection in the LTE downlink, where an efficient signal detection algorithm based on the LR and LD techniques is investigated. The resulting detector (called the LRLD detector) produces a list in the LR domain, which results in a much more reliable list and is thus efficient in mitigating error propagation when SIC based detection is employed. Simulation results show that the LRLD detector provides near ML performance with significantly reduced complexity.

¹This work was partly presented at the 2010 International Conference on Digital Communications (see reference [1]).
However, the potential capacity of the MIMO channel can only be exploited if an implementable hardware architecture is available. The main issue in implementing the MIMO detector is the latency incurred by preprocessing the channel matrices [14]. There has been extensive work on the implementation of MIMO detection with minimum mean square error successive interference cancellation (MMSE-SIC) [15], vertical Bell Laboratories layered space-time (V-BLAST) [16], or maximum likelihood (ML) receivers [17]-[22]. However, while the former usually provide inferior performance, the latter require a large silicon complexity. Thus, finding a reasonable trade-off between an implementable detector architecture and near ML performance remains an important goal. We therefore extend our study to the implementation of the proposed detector and provide references for a possible real silicon implementation.
The rest of the paper is structured as follows. Section II describes the system and channel models. The signal detection algorithm is designed and discussed in Section III. Section IV studies the implementation of the proposed detector. Section V provides simulation results, and concluding remarks are given in Section VI.

Fig. 1. Block diagram of a MIMO-OFDMA LTE downlink: a) transmitter (bits, turbo encoder, modulation, spatial multiplexing into branches 1 to K, each with data mapping, IFFT and CP insertion); b) receiver (branches 1 to N, each with CP removal, FFT and data demapping, followed by MIMO detection, soft-bit generator, turbo decoder and output bits).

International Journal on Advances in Telecommunications, vol 4 no 1 & 2, year 2011, http://www.iariajournals.org/telecommunications/
2011, Copyright by authors, Published under agreement with IARIA - www.iaria.org
Notation: Bold-face upper (lower) letters denote matrices (column vectors); $(\cdot)^*$, $(\cdot)^T$ and $(\cdot)^H$ denote complex conjugation, transpose and Hermitian transpose, respectively; $\mathbf{I}$ is the identity matrix; $E[\cdot]$ denotes statistical expectation; $\mathrm{Diag}(\mathbf{x})$ denotes a diagonal matrix with vector $\mathbf{x}$ on its diagonal; $\mathcal{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$; $\delta_{n,n'}$ denotes the Kronecker delta; $J_0(\cdot)$ denotes the zero-order Bessel function of the first kind; $|\cdot|$ denotes absolute value; and $\|\cdot\|$ denotes the Frobenius norm.
II. SYSTEM AND CHANNEL MODELS
The MIMO-OFDMA LTE system is a parallel extension of single-input single-output OFDMA (SISO-OFDMA), where blocks of K data symbols are mapped onto the spatial multiplexing (SM) module, followed by the data mapping and inverse fast Fourier transform (IFFT) operations, as shown in Figure 1. Note that we do not consider MIMO encoding (e.g., space-time coding) in this work. The data mapping operation is used for subcarrier mapping (e.g., distributed or localized mapping in multiple access [4]). The reverse operations are carried out at the receiver, followed by signal detection and MIMO processing. Assume that there are K transmit antennas and N receive antennas. Let P and Q denote the number of subcarriers used in one orthogonal frequency division multiplexing (OFDM) symbol for the user of interest and the size of the IFFT, respectively. We denote

$$\mathbf{s}_{P,k} = [s_{1,k}, s_{2,k}, \ldots, s_{P,k}]^T \qquad (1)$$

as the transmitted signal vector from the kth transmit antenna. For convenience, it is assumed that $E[s_{p,k} s_{p,k}^*] = 1$ for $1 \le p \le P$, $1 \le k \le K$.
Assuming that the guard interval (i.e., cyclic prefix (CP)) is longer than the maximum channel span, the received signal vector after removing the CP and taking the fast Fourier transform (FFT) at the nth receive antenna can be written as

$$\mathbf{r}_{P,n} \triangleq [r_{1,n}, r_{2,n}, \ldots, r_{P,n}]^T \qquad (2)$$
$$= \sum_{k=1}^{K} \mathrm{Diag}(\mathbf{h}_{n,k})\,\mathbf{s}_{P,k} + \mathbf{w}_n, \qquad (3)$$

where $\mathbf{h}_{n,k} = [h_{n,k}(i_1), h_{n,k}(i_2), \ldots, h_{n,k}(i_P)]^T$ is the frequency-domain channel vector from the kth transmit antenna to the nth receive antenna and $\mathbf{w}_n$ is a zero-mean complex Gaussian vector with variance $\sigma_w^2$. Here, $i_p = \mathcal{P}(p)$, where $\mathcal{P}(\cdot)$ is the subcarrier mapping function that maps a data symbol onto one of the Q subcarriers. Clearly, $i_p$ depends on the subcarrier mapping pattern and $i_p \in \{1, 2, \ldots, Q\}$. Note that

$$h_{n,k}(i_p) = \sum_{l=1}^{L} g_{n,k}(l)\, e^{-j\frac{2\pi}{Q}(l-1)(i_p-1)},$$
where $g_{n,k}(l)$ is the lth tap of the fading channel from the kth transmit antenna to the nth receive antenna and L is the number of paths. We can rewrite the received signal for each subcarrier as follows:

$$\mathbf{r}_{p,N} = \mathbf{H}(i_p)\,\mathbf{s}_{p,K} + \mathbf{w}_p, \qquad (4)$$
where $\mathbf{r}_{p,N} = [r_{p,1}, r_{p,2}, \ldots, r_{p,N}]^T$, $p = 1, 2, \ldots, P$, is the signal vector at the $i_p$th subcarrier received through the N receive antennas, $\mathbf{s}_{p,K} = [s_{p,1}, s_{p,2}, \ldots, s_{p,K}]^T$ is the data symbol vector at the $i_p$th subcarrier transmitted through the K transmit antennas, and $\mathbf{w}_p$ is the complex Gaussian noise vector. $\mathbf{H}(i_p)$ is the frequency-domain channel matrix at the $i_p$th subcarrier, given as
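$$\mathbf{H}(i_p) = \begin{bmatrix} h_{1,1}(i_p) & h_{1,2}(i_p) & \cdots & h_{1,K}(i_p) \\ h_{2,1}(i_p) & h_{2,2}(i_p) & \cdots & h_{2,K}(i_p) \\ \vdots & \vdots & \ddots & \vdots \\ h_{N,1}(i_p) & h_{N,2}(i_p) & \cdots & h_{N,K}(i_p) \end{bmatrix}. \qquad (5)$$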
We assume that the channel is unchanged during one OFDM symbol interval and that the taps $g_{n,k}(l)$ are independent, with identical Gaussian distribution $g_{n,k}(l) \sim \mathcal{N}(0, \sigma_l^2)$ for all n and k. Here, $\sigma_l^2$ is the normalized average power of each propagation path with

$$\sum_{l=1}^{L} \sigma_l^2 = 1. \qquad (6)$$

Typical urban (TU) [23] and spatial channel model (SCM) [24] power delay profiles are used in this paper.
1) Typical Urban: We consider a time-varying channel whose channel impulse response (CIR) is modeled by L propagation paths,

$$g(\tau, t) = \sum_{l=0}^{L-1} \gamma_l(t)\, \delta(\tau - \tau_l). \qquad (7)$$
Assume that the channel is wide-sense stationary uncorrelated scattering (WSSUS) Rayleigh fading and unchanged during one OFDM symbol interval. The maximum channel impulse span is also assumed to be within the guard interval. For convenience, let $\tau_l = lT_s$ and $T_b = T + T_g$, where $T_s = T/Q$. Here, $T$, $T_b$ and $T_g$ denote the useful OFDM symbol interval, the whole OFDM symbol interval and the guard interval, respectively. Then, the channel impulse vector at each OFDM symbol time index t, denoted by $\mathbf{g}(t) = [g_0(t), g_1(t), \ldots, g_{L-1}(t)]^T$, represents the discrete CIR. The autocorrelation function of $g_l(t) = g(lT_s, tT_b)$ is expressed as

$$E\{g_l(t)\, g_{l'}^*(t')\} = \sigma_l^2\, J_0\big(2\pi f_D (t - t') T_b\big)\, \delta_{l,l'}, \qquad (8)$$
where $f_D$ is the maximum Doppler frequency and $\sigma_l^2$ is the normalized average power of each propagation tap with

$$\sum_{l=0}^{L-1} \sigma_l^2 = 1. \qquad (9)$$

A typical urban (TU) power delay profile [23] is used to model $\{\sigma_l^2\}$.
2) Spatial Channel Model: The SCM was proposed by the 3GPP for both link- and system-level simulations. The 3GPP SCM emulates the double-directional and clustering effects of small-scale fading mechanisms in a variety of environments, such as suburban macrocell, urban macrocell, and urban microcell. It considers N clusters of scatterers, where a cluster can be regarded as a resolvable path. Within a resolvable path (cluster), there are M subpaths, which are regarded as unresolvable rays. A simplified plot of the SCM is given in Figure 2, where only one cluster of scatterers is shown as an example. Here, $\theta_v$ is the angle of the mobile station (MS) velocity vector with respect to the MS broadside, $\theta_{n,m,AoD}$ is the absolute angle of departure (AoD) for the mth ($m = 1, \ldots, M$) subpath of the nth ($n = 1, \ldots, N$) path at the base station (BS) with respect to the BS broadside, and $\theta_{n,m,AoA}$ is the absolute angle of arrival (AoA) for the mth subpath of the nth path at the MS with respect to the MS broadside. Details of the generation of SCM simulation parameters can be found in [24].
III. SIGNAL DETECTION
For convenience, the indices in (4) are omitted. The N 1
received signal vector r
p,N
, now denoted by r, is given by
r = Hs +w, (10)
36
T
,
G
, ,
'
, ,
T
:
N
N
Clusicr
, ,
T
, ,
'
,
G
T
Tv
DS array lroadsidc
MS array lroadsidc
DS array
MS dircciion
of iravcl
MS array
SulaiI
Fig. 2. BS and MS angle parameters in the 3GPP SCM with one cluster of scatterers [24].
where H, s, and w are the N K channel matrix, the
K 1 transmitted signal vector, and the N 1 noise vector,
respectively. Let S denote the signal alphabet for symbols, i.e.,
s
k
S, where s
k
denotes the kth element of s, and its size
is denoted by M, i.e., M = |S|.
A. Conventional Detectors
We consider two conventional detection approaches: ML
and MMSE.
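1) ML Detection: ML detection finds the data symbol vector that maximizes the likelihood function as follows:

$$\mathbf{s}_{ml} = \arg\max_{\mathbf{s} \in \mathcal{S}^K} f(\mathbf{r}|\mathbf{s}) = \arg\min_{\mathbf{s} \in \mathcal{S}^K} \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2. \qquad (11)$$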
To identify the ML vector, an exhaustive search is required. Because the number of candidate vectors for $\mathbf{s}$ is $M^K$, the complexity grows exponentially with K.
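As a minimal sketch of the exhaustive search in (11), the following Python code enumerates all $M^K$ candidates for a small system. The unit-energy QPSK alphabet, the helper name `ml_detect` and the random 4 × 4 test channel are illustrative assumptions, not part of the paper.

```python
import itertools
import numpy as np

# Illustrative unit-energy QPSK alphabet S (M = 4); an assumption, not from the paper
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def ml_detect(r, H, alphabet=QPSK):
    """Exhaustive ML search of (11): minimise ||r - H s||^2 over s in S^K."""
    K = H.shape[1]
    best, best_metric = None, np.inf
    # M**K candidate vectors are visited, hence the exponential growth with K
    for cand in itertools.product(alphabet, repeat=K):
        s = np.array(cand)
        metric = np.linalg.norm(r - H @ s) ** 2
        if metric < best_metric:
            best, best_metric = s, metric
    return best

rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s_true = rng.choice(QPSK, size=4)
s_ml = ml_detect(H @ s_true, H)  # noiseless received vector
```

With K = 4 and QPSK the loop already visits 256 candidates; with 16-QAM it would visit 65,536, which illustrates why the reduced-complexity detectors discussed next are needed.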
If the a priori probability of $\mathbf{s}$ is available, maximum a posteriori (MAP) sequence detection can be formulated. Suppose that $\mathbf{b}$ is a bit-level symbol vector representation of $\mathbf{s}$. The elements of $\mathbf{b}$ are binary and the size of $\mathbf{b}$ is $(K \log_2 M) \times 1$. With the a priori probability of $\mathbf{b}$, the MAP vector (at the bit level) becomes

$$\mathbf{b}_{map} = \arg\max_{\mathbf{b}} \Pr(\mathbf{b}|\mathbf{r}) = \arg\max_{\mathbf{b}} f(\mathbf{r}|\mathbf{b}) \Pr(\mathbf{b}), \qquad (12)$$
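where $\Pr(\mathbf{b})$ denotes the a priori probability of $\mathbf{b}$. In addition, the a posteriori probability of each bit can be found by marginalization as

$$\Pr(b_i = +1|\mathbf{r}) = \sum_{\mathbf{b} \in \mathcal{B}_i^+} \Pr(\mathbf{b}|\mathbf{r}), \qquad \Pr(b_i = -1|\mathbf{r}) = \sum_{\mathbf{b} \in \mathcal{B}_i^-} \Pr(\mathbf{b}|\mathbf{r}), \qquad (13)$$

where $\mathcal{B}_i^{\pm} = \{[b_1\, b_2 \ldots b_{\bar K}]^T \mid b_i = \pm 1,\ b_m \in \{+1, -1\},\ \forall m \neq i\}$ and $\bar K = K \log_2 M$.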
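2) MMSE Detection: It is easy to perform (linear) MMSE detection if the constraint on the symbol vector, $s_k \in \mathcal{S},\ \forall k$, is not imposed. Using the orthogonality principle, the MMSE estimator for $\mathbf{s}$ can be found as

$$\mathbf{W}_{mmse} = \arg\min_{\mathbf{W}} E[\|\mathbf{s} - \mathbf{W}^H\mathbf{r}\|^2] = \left(E[\mathbf{r}\mathbf{r}^H]\right)^{-1} E[\mathbf{r}\mathbf{s}^H]. \qquad (14)$$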
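Since the symbols are uncorrelated with unit energy (so that $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}$) and independent of the noise, we can show that

$$E[\mathbf{r}\mathbf{r}^H] = \mathbf{H}\mathbf{H}^H + \sigma_w^2 \mathbf{I}, \qquad E[\mathbf{r}\mathbf{s}^H] = \mathbf{H}.$$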
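It follows that

$$\mathbf{W}_{mmse} = (\mathbf{H}\mathbf{H}^H + \sigma_w^2 \mathbf{I})^{-1}\mathbf{H}$$

and

$$\mathbf{s}_{mmse} = \mathbf{W}_{mmse}^H \mathbf{r} = \mathbf{H}^H(\mathbf{H}\mathbf{H}^H + \sigma_w^2\mathbf{I})^{-1}\mathbf{r}. \qquad (15)$$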
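A minimal Python sketch of (15) follows, assuming $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}$ as in the derivation. The final nearest-symbol slicer and the names `mmse_detect` and `QPSK` are illustrative additions; (15) itself only produces the unconstrained estimate.

```python
import numpy as np

def mmse_detect(r, H, sigma2_w, alphabet):
    """Linear MMSE estimate (15), assuming E[s s^H] = I, plus a symbol slicer."""
    N = H.shape[0]
    # W_mmse = (H H^H + sigma_w^2 I)^{-1} H, via a linear solve instead of inv()
    W = np.linalg.solve(H @ H.conj().T + sigma2_w * np.eye(N), H)
    s_mmse = W.conj().T @ r
    # Hard decisions: nearest alphabet point per entry (outside (15) itself)
    idx = np.argmin(np.abs(s_mmse[:, None] - alphabet[None, :]), axis=1)
    return s_mmse, alphabet[idx]

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = np.array([[1.0, 0.3j], [0.2, 1.0], [0.1, 0.5]])  # N = 3, K = 2
s_true = QPSK[[0, 3]]
s_mmse, s_hat = mmse_detect(H @ s_true, H, sigma2_w=1e-3, alphabet=QPSK)
```

Solving the linear system rather than forming the inverse explicitly is the usual numerical choice; it yields the same $\mathbf{W}_{mmse}$ as in (15).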
B. Proposed Detector
We assume that N ≥ K and consider the QR factorization of the channel matrix, $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is unitary and $\mathbf{R}$ is upper triangular. We have

$$\mathbf{x} = \mathbf{Q}^H\mathbf{r} = \mathbf{R}\mathbf{s} + \mathbf{Q}^H\mathbf{w}. \qquad (16)$$

Since the statistical properties of $\mathbf{Q}^H\mathbf{w}$ are identical to those of $\mathbf{w}$, $\mathbf{Q}^H\mathbf{w}$ will also be denoted by $\mathbf{w}$. If N = K, there are no zero rows in $\mathbf{R}$; otherwise, the last N − K rows are zero. Thus, the last N − K elements of $\mathbf{x}$ can be ignored for detection if N > K, and only the first K rows of $\mathbf{R}$ are considered. If there is no risk of confusion, we hereafter assume that the sizes of $\mathbf{x}$, $\mathbf{R}$, and $\mathbf{w}$ are K × 1, K × K, and K × 1, respectively.
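The front end in (16) can be sketched in Python with NumPy's complete QR factorization. The helper name `qr_front_end` and the random 4 × 2 test channel are assumptions for illustration; the sketch simply drops the last N − K rows, as described in the text.

```python
import numpy as np

def qr_front_end(r, H):
    """x = Q^H r as in (16), keeping only the first K rows (N >= K)."""
    N, K = H.shape
    Q, R = np.linalg.qr(H, mode="complete")  # Q: N x N unitary, R: N x K
    x = Q.conj().T @ r
    # For N > K the last N - K rows of R are zero, so x[K:] holds only noise
    return x[:K], R[:K, :]

rng = np.random.default_rng(1)
H = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
s = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)
x, R = qr_front_end(H @ s, H)  # noiseless, so x = R s exactly
```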
The complexity of the conventional LR based detector can grow significantly with the number of basis vectors. To avoid this problem, we propose an LRLD based detection algorithm, which breaks a high dimensional MIMO detection problem into multiple lower dimensional MIMO sub-detection problems.
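To perform the proposed LRLD based detection, we consider the following partition of $\mathbf{x}$:

$$\begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{R}_3 \\ \mathbf{0} & \mathbf{R}_2 \end{bmatrix} \begin{bmatrix} \mathbf{s}_1 \\ \mathbf{s}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \end{bmatrix}, \qquad (17)$$

where $\mathbf{x}_i$, $\mathbf{s}_i$, and $\mathbf{w}_i$ denote the $K_i \times 1$ ith subvectors of $\mathbf{x}$, $\mathbf{s}$, and $\mathbf{w}$, $i = 1, 2$, respectively. Note that $K_1 + K_2 = K$.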
From (17), we obtain two lower dimensional MIMO sub-detection problems to detect $\mathbf{s}_1$ and $\mathbf{s}_2$. It is straightforward to extend the partition to more than two groups. However, for the sake of simplicity, we only consider the partition into two groups as in (17).
In the proposed LRLD based detection, the sub-detection of $\mathbf{s}_2$ is carried out first using the LR based detector. Then, a list of candidate vectors of $\mathbf{s}_2$ is generated. With this list, the sub-detection of $\mathbf{s}_1$ is performed with the LR based detector, where each candidate vector in the list is used for SIC to mitigate the interference from $\mathbf{s}_2$. The algorithm steps (AS) of the proposed LRLD based detector are summarized as follows.
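AS1) The LR based detection of $\mathbf{s}_2$ is performed with the received signal $\mathbf{x}_2$, i.e.,
$$\tilde{\mathbf{c}}_2 = \mathrm{LRDet}(\mathbf{x}_2),$$
where $\mathrm{LRDet}(\cdot)$ is the function of the LR detection operation (see Appendix A for details of the LR detection), and $\tilde{\mathbf{c}}_2$ is the estimated vector of $\mathbf{s}_2$ in the corresponding LR domain. Note that there is no interference from $\mathbf{s}_1$ in detecting $\mathbf{s}_2$.
AS2) A list of candidate vectors in the lattice-reduced domain is generated by
$$\mathcal{C}_2 = \mathrm{List}(\tilde{\mathbf{c}}_2),$$
where List is a function that chooses the Q closest vectors to $\tilde{\mathbf{c}}_2$ ($1 \le Q \le M^{K_2}$) in the LR domain. The details of the list generation are discussed in Appendix B.
AS3) The list of candidates of $\mathbf{s}_2$, denoted by $\mathcal{S}_2$, is obtained by converting $\mathcal{C}_2$ back to the signal domain. For convenience, denote $\mathcal{S}_2 = \{\mathbf{s}_2^{(1)}, \mathbf{s}_2^{(2)}, \ldots, \mathbf{s}_2^{(Q)}\}$.
AS4) Once $\mathcal{S}_2$ is available, the LR based detection of $\mathbf{s}_1$ can be carried out with SIC, i.e.,
$$\tilde{\mathbf{c}}_1^{(q)} = \mathrm{LRDet}(\mathbf{x}_1 - \mathbf{R}_3\mathbf{s}_2^{(q)}),$$
where $\mathbf{s}_2^{(q)}$ is the qth decision vector of $\mathbf{s}_2$ from the list $\mathcal{S}_2$.
AS5) Let $\mathbf{s}_1^{(q)}$ denote the signal vector corresponding to $\tilde{\mathbf{c}}_1^{(q)}$ in the LR domain and $\mathbf{s}^{(q)} = [(\mathbf{s}_1^{(q)})^T\ (\mathbf{s}_2^{(q)})^T]^T$. The final decision of $\mathbf{s}$ is found as
$$\hat{\mathbf{s}} = \arg\min_{q = 1, 2, \ldots, Q} \|\mathbf{x} - \mathbf{R}\mathbf{s}^{(q)}\|^2.$$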
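Steps AS1) to AS5) can be sketched in Python as follows. This is a simplified stand-in, not the paper's implementation: Appendix A is not reproduced here, so `LRDet` is assumed to be Gaussian reduction of a 2 × 2 block followed by an LR-domain MMSE estimate with the weight structure of the implementation steps and Gaussian-integer rounding; `List` is realized by enumerating all $M^{K_2}$ candidates; and the unnormalized QPSK alphabet, the scaling `ALPHA = 1/2`, the shift `OFFSET = (1+j)/2` and all helper names are hypothetical choices.

```python
import itertools
import numpy as np

# Assumptions: unnormalised QPSK s in {+-1 +-j} (E_s = 2) and a hypothetical
# scaling/shift b = ALPHA*s + OFFSET that maps s onto {0,1} + j{0,1}.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
ALPHA, OFFSET, ES = 0.5, 0.5 + 0.5j, 2.0

def gauss_reduce(B, T=3):
    """Complex Gaussian reduction of a 2-column basis: returns B U and U."""
    B, U = B.astype(complex), np.eye(2, dtype=complex)
    for _ in range(T):
        if np.linalg.norm(B[:, 0]) > np.linalg.norm(B[:, 1]):
            B, U = B[:, ::-1].copy(), U[:, ::-1].copy()  # column swap (U <- U J)
        mu = np.vdot(B[:, 0], B[:, 1]) / np.vdot(B[:, 0], B[:, 0])
        t = np.round(mu.real) + 1j * np.round(mu.imag)
        if t == 0:
            break
        B[:, 1] -= t * B[:, 0]
        U[:, 1] -= t * U[:, 0]
    return B, U

def lr_block(x_blk, R_blk, sigma2_w):
    """LR-domain MMSE estimate of c = U^{-1} b for one block (LRDet stand-in)."""
    _, U = gauss_reduce(R_blk)
    U_inv = np.linalg.inv(U)
    n = R_blk.shape[0]
    A = ALPHA**2 * ES * R_blk @ R_blk.conj().T + ALPHA**2 * sigma2_w * np.eye(n)
    W = np.linalg.solve(A, R_blk @ U_inv.conj().T) * ALPHA**2 * ES
    c_tilde = ALPHA * W.conj().T @ x_blk + U_inv @ np.full(n, OFFSET)
    return c_tilde, U, U_inv

def slice_qpsk(z):
    """Nearest QPSK point for every entry of z."""
    return QPSK[np.argmin(np.abs(z[:, None] - QPSK[None, :]), axis=1)]

def lrld_detect(x, R, K1, sigma2_w, Q=4):
    """Simplified LRLD detection (AS1-AS5) with the partition of (17)."""
    x1, x2 = x[:K1], x[K1:]
    R1, R3, R2 = R[:K1, :K1], R[:K1, K1:], R[K1:, K1:]
    c2_tilde, _, U2_inv = lr_block(x2, R2, sigma2_w)  # AS1
    # AS2/AS3: the Q valid LR-domain points nearest to c2_tilde (full enumeration)
    cands = np.array(list(itertools.product(QPSK, repeat=len(x2))))
    c_all = (U2_inv @ (ALPHA * cands + OFFSET).T).T
    S2 = cands[np.argsort(np.linalg.norm(c_all - c2_tilde, axis=1))[:Q]]
    best, best_metric = None, np.inf
    for s2 in S2:
        # AS4: cancel s2^(q), then LR-based detection of s1 (round in LR domain)
        c1_tilde, U1, _ = lr_block(x1 - R3 @ s2, R1, sigma2_w)
        c1 = np.round(c1_tilde.real) + 1j * np.round(c1_tilde.imag)
        s = np.concatenate([slice_qpsk((U1 @ c1 - OFFSET) / ALPHA), s2])
        metric = np.linalg.norm(x - R @ s) ** 2  # AS5
        if metric < best_metric:
            best, best_metric = s, metric
    return best

rng = np.random.default_rng(7)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s_true = rng.choice(QPSK, size=4)
Q_mat, R = np.linalg.qr(H)
x = Q_mat.conj().T @ (H @ s_true)  # noiseless x = R s
s_hat = lrld_detect(x, R, K1=2, sigma2_w=1e-6)
```

The full enumeration in AS2 is only practical for small $M^{K_2}$; the implementation study replaces it with a look-up table of the LR-domain alphabet.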
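Soft-bit Generation: As we use a turbo code for channel coding, its inputs should be soft bits. The probability of the qth candidate $\mathbf{s}^{(q)}$ in the list can be found as

$$P(\mathbf{s}^{(q)}) = C_Q \exp\left(-\frac{1}{\sigma_w^2}\|\mathbf{x} - \mathbf{R}\mathbf{s}^{(q)}\|^2\right), \qquad (18)$$

where $C_Q$ is the normalization constant, given by

$$C_Q = \frac{1}{\sum_{q=1}^{Q} \exp\left(-\frac{1}{\sigma_w^2}\|\mathbf{x} - \mathbf{R}\mathbf{s}^{(q)}\|^2\right)}.$$

Note that

$$\sum_{q=1}^{Q} P(\mathbf{s}^{(q)}) = 1. \qquad (19)$$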
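Equations (18) and (19) amount to a softmax over the list metrics. The Python sketch below computes them with the usual max-subtraction trick to avoid numerical underflow; the helper name `list_probabilities` and the small test values are illustrative assumptions.

```python
import numpy as np

def list_probabilities(x, R, cands, sigma2_w):
    """P(s^(q)) from (18) for the Q candidates stored as rows of cands."""
    metrics = np.linalg.norm(x[None, :] - cands @ R.T, axis=1) ** 2
    logits = -metrics / sigma2_w
    # Subtracting the largest logit avoids underflow; the factor cancels in C_Q
    p = np.exp(logits - logits.max())
    return p / p.sum()  # explicit normalisation, so (19) holds

R = np.array([[1.0, 0.5], [0.0, 1.0]])
cands = np.array([[1 + 1j, 1 - 1j], [1 + 1j, -1 - 1j], [-1 + 1j, 1 - 1j]])
probs = list_probabilities(R @ cands[0], R, cands, sigma2_w=1.0)
```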
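Suppose that $\bar{\mathbf{b}}^{(q)}$ is a bit-level symbol vector representation of $\mathbf{s}^{(q)}$, i.e., $\mathbf{s}^{(q)} = \mathcal{M}(\bar{\mathbf{b}}^{(q)})$, where $\mathcal{M}(\cdot)$ denotes the mapping rule. The elements of $\bar{\mathbf{b}}^{(q)}$ are binary and the size of $\bar{\mathbf{b}}^{(q)}$ is $\bar K \times 1$, where $\bar K = K\log_2 M$. Correspondingly, the probability of $\bar{\mathbf{b}}^{(q)}$ can be written as

$$P(\bar{\mathbf{b}}^{(q)}) = C_Q \exp\left(-\frac{1}{\sigma_w^2}\|\mathbf{x} - \mathbf{R}\mathcal{M}(\bar{\mathbf{b}}^{(q)})\|^2\right). \qquad (20)$$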
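The soft log-likelihood ratio (LLR) of the ith bit $b_i$ ($i = 1, 2, \ldots, \bar K$) can then be obtained as

$$\lambda(b_i) = \log \frac{\sum_{\bar{\mathbf{b}}^{(q)} \in \mathcal{B}_i^+} P(\bar{\mathbf{b}}^{(q)})}{\sum_{\bar{\mathbf{b}}^{(q)} \in \mathcal{B}_i^-} P(\bar{\mathbf{b}}^{(q)})}, \qquad (21)$$

where $\mathcal{B}_i^{\pm} = \{[b_1\, b_2 \ldots b_{\bar K}]^T \mid b_i = \pm 1,\ b_m \in \{+1, -1\},\ \forall m \neq i\}$.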
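A minimal Python sketch of (21) restricted to the Q list candidates is given below. It assumes a hypothetical QPSK mapping in which the two bits of each symbol are the signs of its real and imaginary parts. Because a list may contain no candidate with a given bit value, the empty sum is floored and the LLR clipped at ±`llr_clip`; this is a common practical safeguard assumed here, not something stated in the paper.

```python
import numpy as np

def list_llrs(cands, probs, llr_clip=20.0):
    """Bit LLRs of (21) from list candidates (rows of cands) and P(s^(q)).

    Hypothetical QPSK mapping: bit 2k = sign(Re s_k), bit 2k+1 = sign(Im s_k)."""
    bits = np.empty((cands.shape[0], 2 * cands.shape[1]))
    bits[:, 0::2] = np.sign(cands.real)
    bits[:, 1::2] = np.sign(cands.imag)
    num = (probs[:, None] * (bits == +1)).sum(axis=0)  # sum over B_i^+
    den = (probs[:, None] * (bits == -1)).sum(axis=0)  # sum over B_i^-
    # A bit value missing from the list gives an empty sum; clip instead of +-inf
    floor = np.exp(-llr_clip)
    llr = np.log(np.maximum(num, floor) / np.maximum(den, floor))
    return np.clip(llr, -llr_clip, llr_clip)

cands = np.array([[1 + 1j, 1 - 1j], [1 + 1j, -1 - 1j]])
llr = list_llrs(cands, probs=np.array([0.75, 0.25]))
```

In this toy list both candidates agree on the first two bits, so their LLRs saturate at the clip value, while the third bit, on which they disagree, gets $\log(0.75/0.25) = \log 3$.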
IV. IMPLEMENTATION STUDY OF THE PROPOSED DETECTOR
In this section, we study the implementation of the proposed LRLD detector. Note that some details of the proposed detector and the definitions of certain parameters, e.g., the scaling factor α, are presented in Appendices A and B.
A. Detector Structure
For convenience, we outline the implementation steps (IS)
required for the proposed detector as follows.
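IS1) QR decomposition:
$$\mathbf{H} = \mathbf{Q}\mathbf{R}, \quad \text{where} \quad \mathbf{R} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{R}_3 \\ \mathbf{0} & \mathbf{R}_2 \end{bmatrix}.$$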
IS2) Gaussian lattice reduction:
$$\tilde{\mathbf{R}}_1 = \mathbf{R}_1\mathbf{U}_1, \qquad \tilde{\mathbf{R}}_2 = \mathbf{R}_2\mathbf{U}_2.$$
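IS3) MMSE filtering weight matrices:
$$\mathbf{W}_1 = \left(\mathbf{R}_1\mathbf{R}_1^H \alpha^2 E_s + |\alpha|^2\sigma_w^2\mathbf{I}\right)^{-1}\mathbf{R}_1\mathbf{U}_1^{-H}\alpha^2 E_s,$$
$$\mathbf{W}_2 = \left(\mathbf{R}_2\mathbf{R}_2^H \alpha^2 E_s + |\alpha|^2\sigma_w^2\mathbf{I}\right)^{-1}\mathbf{R}_2\mathbf{U}_2^{-H}\alpha^2 E_s.$$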
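IS4) Unitary transformation:
$$\mathbf{x} = \mathbf{Q}^H\mathbf{r} = \mathbf{R}\mathbf{s} + \mathbf{w},$$
or
$$\begin{bmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{bmatrix} = \begin{bmatrix}\mathbf{R}_1 & \mathbf{R}_3\\ \mathbf{0} & \mathbf{R}_2\end{bmatrix}\begin{bmatrix}\mathbf{s}_1\\ \mathbf{s}_2\end{bmatrix} + \begin{bmatrix}\mathbf{w}_1\\ \mathbf{w}_2\end{bmatrix}.$$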
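IS5) Scaling/shifting:
$$\mathbf{d}_2 = \alpha\mathbf{x}_2 + \mathbf{R}_2\mathbf{1}, \qquad \mathbf{b}_2 = \alpha\mathbf{s}_2 + \mathbf{1},$$
$$\mathbf{d}_1^{(q)} = \alpha(\mathbf{x}_1 - \mathbf{R}_3\mathbf{s}_2^{(q)}) + \mathbf{R}_1\mathbf{1}, \qquad \mathbf{b}_1 = \alpha\mathbf{s}_1 + \mathbf{1}.$$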
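IS6) LR based list detection: This step includes three stages:
i) one MMSE filtering operation to estimate $\tilde{\mathbf{c}}_2$ (i.e., the signal vector $\mathbf{s}_2$ in the LR domain):
$$\tilde{\mathbf{c}}_2 = \mathbf{W}_2^H(\mathbf{d}_2 - \mathbf{R}_2\mathbf{1}) + \mathbf{U}_2^{-1}\mathbf{1} = \alpha\mathbf{W}_2^H\mathbf{x}_2 + \mathbf{U}_2^{-1}\mathbf{1};$$
ii) sorting and storing the list of $\mathbf{c}_2$ (of length Q):
$$\mathcal{C}_2 = \{\mathbf{c}_2 : \|\mathbf{c}_2 - \tilde{\mathbf{c}}_2\| < r(Q)\};$$
iii) Q parallel MMSE filtering operations to estimate $\tilde{\mathbf{c}}_1$ for each candidate in the list of $\mathbf{c}_2$:
$$\tilde{\mathbf{c}}_1^{(q)} = \alpha\mathbf{W}_1^H(\mathbf{x}_1 - \mathbf{R}_3\mathbf{s}_2^{(q)}) + \mathbf{U}_1^{-1}\mathbf{1},$$
where $\mathbf{s}_2^{(q)} = (\mathbf{U}_2\mathbf{c}_2^{(q)} - \mathbf{1})/\alpha$ and $\mathbf{c}_2^{(q)} \in \mathcal{C}_2$.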
The implementation operations can be classified into two types: pre-processing and detection processing.
Pre-processing: This is often referred to as channel-rate processing, in which all operations need to be carried out only when there is a new channel update. Steps IS1) to IS3) belong to this type.
Detection processing: This can be referred to as symbol-rate processing. This type of processing includes all operations that are carried out each time a received signal vector arrives. In our proposed detector, the received data are processed in a first-in first-out (FIFO) manner. The FIFO buffer is used to bridge the latency incurred among the received signals. Steps IS4) to IS6) belong to this type.
Figure 3 shows a high-level structure of the proposed detector with respect to hardware implementation. We describe each major operation next. Some operations, such as the unitary transformation, scaling/shifting and final decision, are straightforward and thus omitted. Since memory is nowadays not a major issue in hardware implementation, we assume that a sufficient amount of memory is available wherever needed.
B. Pre-Processing
In our proposed detector, there are three dominant components in the pre-processing stage: QR decomposition, Gaussian lattice reduction and matrix inversion. It is always desirable to have a low latency in preprocessing the channel matrices. Thus, the choice of algorithm implemented for each of these three operations may well determine the real silicon complexity. We consider each operation in detail next.
1) QR Decomposition: As shown in [25], QR decomposition is preferred to Cholesky decomposition due to its numerical stability. In our detection algorithm, although the QR operation is required only once for each channel update, it still imposes a significant computational load, as the operation is applied to the full-size channel matrix. We therefore study different algorithms in the literature for the QR decomposition.

Fig. 3. High-level structure diagram of the implementation of the proposed LR based list detector (pre-processing: QR decomposition, QR memory, Gaussian LR, matrix inversion, MMSE filter weight memory and an LR look-up table of $\mathbf{c}_2$; detection processing: data FIFO, scaling/shifting, MMSE filtering for $\mathbf{s}_2$, list sorting and memory, Q parallel LR based list MMSE filtering branches for $\mathbf{s}_1$, and final decision).

Gram-Schmidt:
The Gram-Schmidt (GS) procedure finds the QR decomposition of a matrix $\mathbf{H}$ such that $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is unitary and $\mathbf{R}$ is upper triangular. An obvious drawback of the GS algorithm is that it requires costly square-root and division operations and that its overall computational complexity is high. A modified version of the GS is presented in [26], and its details are discussed in [27], [28]. The corresponding algorithm proceeds as follows.
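Gram-Schmidt algorithm (where $\mathbf{q}_k$ denotes the kth column of $\mathbf{Q}$):
1) initialize: $\mathbf{Q} = \mathbf{H}$, $\mathbf{R} = \mathbf{0}$
2) for k = 1 to K
3)   $[\mathbf{R}]_{k,k} = \sqrt{\mathbf{q}_k^H\mathbf{q}_k}$
4)   $\mathbf{q}_k = \mathbf{q}_k / [\mathbf{R}]_{k,k}$
5)   for i = k + 1 to K
6)     $[\mathbf{R}]_{k,i} = \mathbf{q}_k^H\mathbf{q}_i$
7)     $\mathbf{q}_i = \mathbf{q}_i - [\mathbf{R}]_{k,i}\,\mathbf{q}_k$
8)   end for
9) end for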
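The listing translates almost line by line into the Python sketch below (floating point, 0-based indices). The function name `mgs_qr` and the random test matrix are illustrative assumptions; the comments map each line to the step numbers of the listing.

```python
import numpy as np

def mgs_qr(H):
    """Modified Gram-Schmidt QR following steps 1) to 9): H = Q R (thin QR)."""
    Q = H.astype(complex)  # step 1: Q = H (astype returns a copy)
    K = H.shape[1]
    R = np.zeros((K, K), dtype=complex)  # step 1: R = 0
    for k in range(K):
        R[k, k] = np.sqrt(np.vdot(Q[:, k], Q[:, k]).real)  # step 3
        Q[:, k] /= R[k, k]  # step 4
        for i in range(k + 1, K):
            R[k, i] = np.vdot(Q[:, k], Q[:, i])  # step 6: q_k^H q_i
            Q[:, i] -= R[k, i] * Q[:, k]  # step 7
    return Q, R

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, R = mgs_qr(H)
```

The K square roots (step 3) and the divisions in step 4 are the costly operations counted for GS in Table I.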
Generally, the GS algorithm is accurate to floating-point precision. For fixed-point arithmetic, quantization and round-off errors cannot be ignored, and therefore there is a loss in accuracy (e.g., loss in the orthogonality of $\mathbf{Q}$) [27]. It was shown in [29] that the orthogonalization error $\epsilon_o$ of the fixed-point version of the GS algorithm is bounded by the product of the condition number $\kappa(\mathbf{H})$ of the matrix $\mathbf{H}$ and the machine precision $\epsilon$, as follows:

$$\epsilon_o = \|\mathbf{I} - \mathbf{Q}^H\mathbf{Q}\| \le \rho(K)\,\kappa(\mathbf{H})\,\epsilon,$$

where $\rho(K)$ is a low-degree polynomial in K depending only on the details of the computer arithmetic. This implies that for a well-conditioned matrix, a fixed-point architecture for the GS is still accurate to within a small multiple of the machine precision $\epsilon$. However, for ill-conditioned matrices, the computed $\mathbf{Q}$ can be very far from orthogonal. Thus, we can consider a numerically more favorable scheme, the Householder transformation, which is based on unitary transformations.
Householder Transformation:
Unitary transformations are used instead of the conventional methods to alleviate numerical problems such as the requirement of high number precision, which translates into a large silicon area in fixed-point very-large-scale integration (VLSI) implementations. The reason for this more favorable behavior is that unitary transformations do not alter the length of a vector and thus cannot lead to an excessive increase in dynamic range or to an enhancement of quantization noise. Two typical algorithms using unitary transformations are the Householder transformation and the Givens rotation. For illustrative purposes, we review only the Householder reflection algorithm.
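The Householder transformation algorithm recursively applies a sequence of unitary transformations $\mathbf{Q}_k^H$ to the matrix $\mathbf{H}$ as follows:

$$\mathbf{R}^{(k+1)} = \mathbf{Q}_k^H\mathbf{R}^{(k)},$$

where $\mathbf{R}^{(1)} = \mathbf{H}$. Each transformation eliminates more subdiagonal entries until finally $\mathbf{R} = \mathbf{R}^{(K)} = \mathbf{Q}_{K-1}^H \cdots \mathbf{Q}_1^H \mathbf{H}$. The unitary matrix $\mathbf{Q}^H$ is readily obtained from

$$\mathbf{Q}^H = \mathbf{Q}_{K-1}^H \cdots \mathbf{Q}_1^H.$$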
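The algorithm can be described in detail as follows.
Householder Transformation algorithm:
1) initialize: $\mathbf{Q}^{(0)} = \mathbf{I}$, $\mathbf{R}^{(1)} = \mathbf{H}$
2) for k = 1 to K − 1
3)   $\mathbf{q}_k = \mathbf{r}_k + e^{j\theta_k}\|\mathbf{r}_k\|\,\mathbf{e}_1$
4)   $\tilde{\mathbf{Q}}_k = \mathbf{I} - 2\dfrac{\mathbf{q}_k\mathbf{q}_k^H}{\|\mathbf{q}_k\|^2}$
5)   $\mathbf{P}_k = \begin{bmatrix}\mathbf{I}_{k-1} & \mathbf{0}\\ \mathbf{0} & \tilde{\mathbf{Q}}_k\end{bmatrix}$
6)   $\mathbf{R}^{(k+1)} = \mathbf{P}_k\mathbf{R}^{(k)}$
7)   $\mathbf{Q}^{(k)} = \mathbf{P}_k\mathbf{Q}^{(k-1)}$
8) end for
9) $\mathbf{Q}^H = \mathbf{Q}^{(K-1)}$
Here, $\mathbf{r}_k$ denotes the kth column of $\mathbf{R}^{(k)}$ from the kth row downward, $\theta_k$ is the phase of its first entry, and $\mathbf{e}_1 = [1, 0, \ldots, 0]^T$.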
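A Python sketch of the Householder listing follows. The phase factor $e^{j\theta_k}$ in step 3 is the standard choice for complex data: it makes the reflected column an exact multiple of $\mathbf{e}_1$ and avoids cancellation. The function name and test matrix are illustrative, and the loop also covers the tall case N > K with one extra reflection.

```python
import numpy as np

def householder_qr(H):
    """Householder QR following steps 1) to 9): returns Q^H and R = Q^H H."""
    R = H.astype(complex)
    N, K = R.shape
    QH = np.eye(N, dtype=complex)  # step 1: Q^(0) = I
    for k in range(min(K, N - 1)):  # K - 1 reflections when N = K
        r = R[k:, k]  # k-th column of R^(k) from row k downward
        norm_r = np.linalg.norm(r)
        if norm_r == 0:
            continue  # nothing to eliminate in this column
        q = r.copy()
        q[0] += np.exp(1j * np.angle(r[0])) * norm_r  # step 3 with phase factor
        Qk = np.eye(N - k) - 2 * np.outer(q, q.conj()) / np.vdot(q, q).real  # step 4
        P = np.eye(N, dtype=complex)
        P[k:, k:] = Qk  # step 5: identity block, then the reflector
        R = P @ R  # step 6
        QH = P @ QH  # step 7
    return QH, R  # step 9

rng = np.random.default_rng(3)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
QH, R = householder_qr(H)
```

Forming each $\mathbf{P}_k$ as a full matrix keeps the code close to the listing; a hardware or optimized software version would apply the rank-one update directly to the trailing submatrix.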
We compare the complexity of the two methods in Table I. The Householder reflection algorithm requires slightly fewer complex multiplications (CMs), divisions and square-root operations than the Gram-Schmidt algorithm. In addition, for fixed-point implementation, the Householder reflection algorithm is expected to be more stable.
Note that $K^2 + K(K+1)/2$ words of memory² are required to store the matrices $\mathbf{Q}$ and $\mathbf{R}$ at the output of the QR decomposition operation.
2) Lattice Reduction Using Gaussian Method: In the proposed LR based list detector, the LR is applied to the sub-channel matrices $\mathbf{R}_1$ and $\mathbf{R}_2$. For convenience, we consider these matrices of size 2 × 2 only. Thus, this two-dimensional LR can be carried out using the simple Gaussian method. We can limit the maximum number of iterations of this Gaussian lattice reduction algorithm to a small number (e.g., 2 iterations) while keeping the overall performance almost the same. For implementation purposes, we fix the maximum number of iterations to T, and the Gaussian LR algorithm is summarized as follows.

²The term "word of memory" refers to the amount of memory required to store one complex number. The number of bits in one word may vary depending on the dynamic range of the observed data. Thus, throughout this section, we use the word as a unit of memory.
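1) Input $(\mathbf{b}_1, \mathbf{b}_2, T)$
2) Set $\mathbf{J} = \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ and $\mathbf{U} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$
3) i = 0
4) do
5)   if $\|\mathbf{b}_1\| > \|\mathbf{b}_2\|$
6)     swap $\mathbf{b}_1$ and $\mathbf{b}_2$, and $\mathbf{U} = \mathbf{U}\mathbf{J}$
7)   end if
8)   if $|\langle\mathbf{b}_2, \mathbf{b}_1\rangle| / \|\mathbf{b}_1\|^2 > 1/2$
9)     $t = \left\lceil \langle\mathbf{b}_2, \mathbf{b}_1\rangle / \|\mathbf{b}_1\|^2 \right\rfloor$
10)    $\mathbf{b}_2 = \mathbf{b}_2 - t\,\mathbf{b}_1$ and $\mathbf{U} = \mathbf{U}\begin{bmatrix}1 & -t\\ 0 & 1\end{bmatrix}$
11)  end if
12)  i = i + 1
13) while $(\|\mathbf{b}_1\| > \|\mathbf{b}_2\|)$ && $(i < T)$
14) return $(\mathbf{b}_1, \mathbf{b}_2, \mathbf{U})$
Here, $\lceil\cdot\rfloor$ denotes rounding to the nearest (Gaussian) integer.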
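The Gaussian LR listing can be sketched in Python as follows, with the iteration cap T. The sketch assumes complex bases and rounds to the nearest Gaussian integer; it uses the test $t \neq 0$ after rounding in place of the explicit $1/2$ threshold, which has the same effect. The function name and example basis are illustrative.

```python
import numpy as np

def gaussian_lr(b1, b2, T=2):
    """Gaussian LR of a 2-D complex basis with at most T iterations.

    Rounding to the nearest Gaussian integer is assumed for complex bases."""
    b1 = np.asarray(b1, dtype=complex)
    b2 = np.asarray(b2, dtype=complex)
    U = np.eye(2, dtype=complex)
    J = np.array([[0, 1], [1, 0]])
    i = 0
    while True:
        if np.linalg.norm(b1) > np.linalg.norm(b2):  # steps 5-7
            b1, b2 = b2, b1
            U = U @ J
        mu = np.vdot(b1, b2) / np.vdot(b1, b1)  # <b2, b1> / ||b1||^2
        t = np.round(mu.real) + 1j * np.round(mu.imag)  # steps 8-9
        if t != 0:
            b2 = b2 - t * b1  # step 10
            U = U @ np.array([[1, -t], [0, 1]])
        i += 1
        # step 13: repeat while the basis is still unordered and i < T
        if not (np.linalg.norm(b1) > np.linalg.norm(b2) and i < T):
            break
    return b1, b2, U

b1, b2 = np.array([1.0, 0.0]), np.array([3.2, 1.0])
r1, r2, U = gaussian_lr(b1, b2)  # one size reduction with t = 3
```

The invariant $[\mathbf{b}_1'\ \mathbf{b}_2'] = [\mathbf{b}_1\ \mathbf{b}_2]\,\mathbf{U}$ with $|\det\mathbf{U}| = 1$ is what allows the detector to map LR-domain decisions back to the signal domain.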
In the worst case where the Gaussian LR algorithm runs until
the maximum iteration i = T, the number of CMs required
for the Gaussian LR is 4T. Six words of memory are required
to store data of the unimodular matrix at the output.
3) Matrix Inversion: In our proposed detector, the dominant complexity component in obtaining the MMSE filtering weight matrices is the matrix inversion operations, $(\mathbf{R}_1\mathbf{R}_1^H\alpha^2E_s + |\alpha|^2\sigma_w^2\mathbf{I})^{-1}$ and $(\mathbf{R}_2\mathbf{R}_2^H\alpha^2E_s + |\alpha|^2\sigma_w^2\mathbf{I})^{-1}$. Fortunately, the submatrices to be inverted are small, which leads to a low computational load. For example, a 2 × 2 matrix $\mathbf{R} = \begin{bmatrix} r_{1,1} & r_{1,2}\\ r_{2,1} & r_{2,2}\end{bmatrix}$ can be simply inverted using the adjoint method,

$$\mathbf{R}^{-1} = \frac{1}{r_{1,1}r_{2,2} - r_{2,1}r_{1,2}}\begin{bmatrix} r_{2,2} & -r_{1,2}\\ -r_{2,1} & r_{1,1}\end{bmatrix},$$

which requires 1 division and 6 CMs.
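The operation count can be checked against a short Python sketch of the 2 × 2 adjoint inverse: two CMs form the determinant, one division gives its reciprocal, and four CMs scale the adjoint. The function name and test matrix are illustrative.

```python
import numpy as np

def inv2x2(R):
    """2x2 inverse by the adjoint method: 2 CMs (determinant), 1 division,
    then 4 CMs to scale the adjoint by 1/det."""
    (r11, r12), (r21, r22) = R
    inv_det = 1.0 / (r11 * r22 - r21 * r12)
    return inv_det * np.array([[r22, -r12], [-r21, r11]])

R = np.array([[1 + 1j, 2.0], [0.5j, 3 - 1j]])
R_inv = inv2x2(R)
```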
In the general case of a matrix $\mathbf{H}$ of size K × K, the complexity of the inversion operation may vary depending on the implementation method. We review some typical methods:
TABLE I
COMPLEXITY COMPARISON OF THE TWO METHODS: GRAM-SCHMIDT (GS) AND HOUSEHOLDER REFLECTION (HR)

| Algorithm | Division | Square root | Complex multiplications (CMs) | CMs with K = 4 |
|---|---|---|---|---|
| GS | K | K | $2K^2 + 2\sum_{k=1}^{K} K(K-k)$ | 80 |
| HR | K − 1 | K − 1 | $2\sum_{k=1}^{K-1}(K-k+1)^2$ | 78 |
a) Adjoint Method:
$$\mathbf{H}^{-1} = \frac{\mathrm{adj}(\mathbf{H})}{\det(\mathbf{H})}.$$
Unfortunately, for matrix inversion using the adjoint method, there is no generic expression for the number of CMs, as it depends heavily on the dimension K. However, the approximate number of CMs scales with $2^K$ as [30]
$$C_m \approx a\,2^K + K^2 + K.$$
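b) LR Decomposition: Matrix $\mathbf{H}$ is decomposed into a lower-triangular matrix $\mathbf{L}$ and an upper-triangular matrix $\mathbf{R}$, $\mathbf{H} = \mathbf{L}\mathbf{R}$, so that $\mathbf{H}^{-1} = \mathbf{R}^{-1}\mathbf{L}^{-1}$. The algorithm (in Doolittle form, with a unit diagonal in $\mathbf{L}$) is as follows:
1) initialize: $\mathbf{L} = \mathbf{I}$, $\mathbf{R} = \mathbf{0}$
2) for i = 1 to K
3)   for j = i to K: $[\mathbf{R}]_{i,j} = [\mathbf{H}]_{i,j} - \sum_{k=1}^{i-1}[\mathbf{L}]_{i,k}[\mathbf{R}]_{k,j}$
4)   end for
5)   for j = i + 1 to K: $[\mathbf{L}]_{j,i} = \left([\mathbf{H}]_{j,i} - \sum_{k=1}^{i-1}[\mathbf{L}]_{j,k}[\mathbf{R}]_{k,i}\right)/[\mathbf{R}]_{i,i}$
6)   end for
7) end for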
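The Doolittle steps and the inversion $\mathbf{H}^{-1} = \mathbf{R}^{-1}\mathbf{L}^{-1}$ can be sketched in Python as below. As in the listing, no pivoting is used, so nonzero pivots are assumed; the diagonal shift in the test matrix is only there to keep the example well conditioned, and generic linear solves stand in for triangular substitution.

```python
import numpy as np

def lu_doolittle(H):
    """Doolittle factorisation H = L R with unit lower L and upper R (no pivoting)."""
    K = H.shape[0]
    L = np.eye(K, dtype=complex)
    R = np.zeros((K, K), dtype=complex)
    for i in range(K):
        for j in range(i, K):  # row i of R
            R[i, j] = H[i, j] - L[i, :i] @ R[:i, j]
        for j in range(i + 1, K):  # column i of L
            L[j, i] = (H[j, i] - L[j, :i] @ R[:i, i]) / R[i, i]
    return L, R

def inv_via_lu(H):
    """H^{-1} = R^{-1} L^{-1}; generic solves stand in for triangular substitution."""
    L, R = lu_doolittle(H)
    L_inv = np.linalg.solve(L, np.eye(H.shape[0]))
    return np.linalg.solve(R, L_inv)

rng = np.random.default_rng(4)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)) + 3 * np.eye(4)
L, R = lu_doolittle(H)
```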
The number of CMs for matrix inversion using the LR decomposition is $4(K^3 - K)/3$.
c) QR Decomposition: Matrix $\mathbf{H}$ can be inverted using the QR decomposition as $\mathbf{H}^{-1} = \mathbf{R}^{-1}\mathbf{Q}^H$. If the Gram-Schmidt algorithm is used for the QR decomposition, the total number of CMs required for matrix inversion is $(9K^3 + 10K^2 - K)/6$.
In general, a major concern with matrix inversion algorithms is the need for high number precision, which gives rise to a large silicon area in fixed-point VLSI implementations. The two main reasons for these numerical requirements are: i) the use of costly operations such as square roots and divisions, which leads to a significant increase in the dynamic range of some intermediate variables; and ii) the desire to replace repeated divisions by multiplications with the corresponding inverse in order to reduce the number of costly operations. Unfortunately, such multiplications often result in an enhancement of the quantization noise and thus require high fixed-point precision.
A VLSI architecture has therefore been proposed in [28] to deal with numerical problems in fixed-point implementation. It is based on the QR decomposition with the modified Gram-Schmidt algorithm. The results showed that for typical 4 × 4 MIMO channel matrices, the architecture was able to achieve a clock rate of 277 MHz with a latency of 18 time units and an area of 72K gates using 0.18 μm CMOS technology, which compares favorably with previously known architectures. In another direction, the architecture can be designed to reduce the number of matrix inversions, which is well suited to systems with multiple channels to be processed, such as MIMO-OFDM systems [31], [30].
C. Detection Processing
This is where all operations are carried out when a new set of received signal symbols arrives. The resources required for detection processing are in fact much smaller than those of the pre-processing stage. In addition, the pre-processing hardware can be conveniently reused for detection processing. As a result, the latency of detection processing is reasonably low. Two operations are discussed in this section: list sorting in the lattice domain and MMSE filtering to find the estimates of $\mathbf{s}_1$ and $\mathbf{s}_2$.
1) List Sorting in LR Domain: The list of candidate vectors in the LR domain is formed by

$$\mathcal{C}_2 = \{\mathbf{c}_2 : \|\mathbf{c}_2 - \tilde{\mathbf{c}}_2\| < r(Q)\}.$$

The problem is that the alphabet of the signal in the LR domain, $\mathbf{c}_2$, varies with the channel. For example, while the alphabet of $\mathbf{s}_2$ is known, that of $\mathbf{c}_2 = \mathbf{U}_2^{-1}(\alpha\mathbf{s}_2 + \mathbf{1})$ depends on $\mathbf{U}_2$. However, with the Gaussian reduction method, $\mathbf{U}_2$ always has the form

$$\mathbf{U}_2 = \begin{bmatrix}1 & t\\ 0 & 1\end{bmatrix},$$

where t is an integer. As the maximum number of iterations in the Gaussian LR algorithm is limited to T = 2 or 3, we can easily obtain a known set of values of t (and accordingly of $\mathbf{U}_2$). Thus, a look-up table can be formed for the alphabet of $\mathbf{c}_2$. This look-up table is formed in the pre-processing stage after the Gaussian LR algorithm is applied to the subchannel matrix $\mathbf{R}_2$. Memory is required to store these pre-calculated data. For example, TM words of memory are required to store the alphabet of $\mathbf{c}_2$, where M is the size of the alphabet of $\mathbf{s}_2$. In addition, 2Q words are required for storing $\mathcal{C}_2$.

Fig. 4. Block diagram of the linear filtering operation: inputs are $\mathbf{x}_2$, α, $\mathbf{W}_2$ and $\mathbf{u}_2$, while the output is $\tilde{\mathbf{c}}_2$ (data paths of n bits and m bits through multipliers and adders).
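The look-up idea can be illustrated with a Python sketch that precomputes the LR-domain alphabet $\mathbf{c}_2 = \mathbf{U}_2^{-1}(\alpha\mathbf{s}_2 + \text{offset})$ for each admissible integer t, using the form of $\mathbf{U}_2$ stated in the text. The unnormalized QPSK alphabet, `ALPHA = 1/2`, `OFFSET = (1+j)/2` and the range of t are hypothetical choices made so that the shifted symbols are Gaussian integers; the sketch shows the idea of the table, not the paper's exact memory layout.

```python
import itertools
import numpy as np

# Hypothetical parameters: unnormalised QPSK, ALPHA = 1/2, OFFSET = (1+j)/2
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
ALPHA, OFFSET = 0.5, 0.5 + 0.5j

def lr_alphabet(t):
    """All c2 = U2^{-1}(ALPHA*s2 + OFFSET) for U2 = [[1, t], [0, 1]]."""
    U2_inv = np.array([[1, -t], [0, 1]])  # inverse of [[1, t], [0, 1]]
    s2 = np.array(list(itertools.product(QPSK, repeat=2)))  # all M^2 vectors s2
    return (U2_inv @ (ALPHA * s2 + OFFSET).T).T

# Table filled once per channel update (pre-processing), indexed by t
lut = {t: lr_alphabet(t) for t in (-2, -1, 0, 1, 2)}
```

During detection processing, list sorting then only needs distances between $\tilde{\mathbf{c}}_2$ and the table entries for the current t.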
2) MMSE Filtering: This is a matrix-multiplication based operation. One MMSE filtering operation to estimate $\tilde{\mathbf{c}}_2$ is applied to the received signal vector $\mathbf{x}_2$:

$$\tilde{\mathbf{c}}_2 = \alpha\mathbf{W}_2^H\mathbf{x}_2 + \mathbf{u}_2,$$

where $\mathbf{u}_2 = \mathbf{U}_2^{-1}\mathbf{1}$. The same operation is applied Q times to the received signal vector $\mathbf{x}_1$:

$$\tilde{\mathbf{c}}_1^{(q)} = \alpha\mathbf{W}_1^H\mathbf{x}_1^{(q)} + \mathbf{u}_1, \qquad (22)$$

where $\mathbf{u}_1 = \mathbf{U}_1^{-1}\mathbf{1}$ and $\mathbf{x}_1^{(q)} = \mathbf{x}_1 - \mathbf{R}_3\mathbf{s}_2^{(q)}$. Note that the Q operations in (22) can be carried out in parallel (see Figure 3). The parallel structure allows low latency and high throughput. The most complex steps can then be processed in a single cycle, however, at the expense of a large silicon area. In addition, with a parallel structure, memories need to be implemented based on register files for sufficient access bandwidth. Thus, the trade-off between latency/throughput and silicon area needs to be considered.
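In software, the Q parallel branches of (22) map naturally onto one batched matrix product. The Python sketch below evaluates all Q filtering operations at once; `alpha` stands for the scaling factor of the scaling/shifting step, and the function name `filter_list` and the random test data are illustrative assumptions.

```python
import numpy as np

def filter_list(x1, R3, S2, W1, u1, alpha):
    """Evaluate (22) for all Q candidates at once (rows of S2 are s2^(q))."""
    X1 = x1[None, :] - S2 @ R3.T  # row q is x1^(q) = x1 - R3 s2^(q)
    return alpha * X1 @ W1.conj() + u1[None, :]  # row q is c~1^(q)

rng = np.random.default_rng(5)

def crandn(*shape):
    """Complex Gaussian test data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

x1, R3, W1, u1, S2 = crandn(2), crandn(2, 2), crandn(2, 2), crandn(2), crandn(4, 2)
C1 = filter_list(x1, R3, S2, W1, u1, alpha=0.5)  # Q = 4 branches
```

Row q of `X1 @ W1.conj()` equals $(\mathbf{W}_1^H\mathbf{x}_1^{(q)})^T$, so each row reproduces one branch of Figure 3.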
The weight matrices $\mathbf{W}_1$ and $\mathbf{W}_2$ are pre-calculated and stored in the pre-processing stage; only 8 words of memory are needed for this. A simple VLSI architecture for the MMSE filtering of $\mathbf{x}_2$ is shown in Figure 4, and the filtering of $\mathbf{x}_1^{(q)}$ can be carried out similarly. Because of their different dynamic ranges, variables can be represented with different numbers of bits (e.g., $n$ bits for $\mathbf{x}_2$ and $m$ bits for $\mathbf{W}_2$). We expect $m > n$, since the entries of $\mathbf{W}_2$ have a larger dynamic range and should therefore be represented with a sufficient number of bits for an accurate fixed-point implementation.
Memory-wise, $2Q$ words are required to store the outputs $\{\hat{\mathbf{c}}_1^{(1)}, \hat{\mathbf{c}}_1^{(2)}, \ldots, \hat{\mathbf{c}}_1^{(Q)}\}$.
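The filtering step, and the fact that the $Q$ operations of (22) are independent, can be illustrated with a short NumPy sketch. It follows the form $\hat{\mathbf{c}} = \alpha\mathbf{W}^H\mathbf{x} + \mathbf{u}$, keeping the scaling $\alpha$ of Appendix A explicit (if $\alpha$ is folded into $\mathbf{W}$, set it to 1). Stacking the $Q$ interference-cancelled inputs $\mathbf{x}_1 - \mathbf{R}_3\mathbf{s}_2^{(q)}$ as columns turns the $Q$ filters into one matrix product, which is the software analogue of the parallel hardware structure. The helper names are hypothetical.

```python
import numpy as np

def mmse_filter(x, W, u, alpha):
    """One LR-domain MMSE filtering step: c_hat = alpha * W^H x + u."""
    return alpha * (W.conj().T @ x) + u

def mmse_filter_list(x1, R3, s2_list, W1, u1, alpha):
    """All Q filtering operations of (22) at once.

    Column q of X1 is x1^(q) = x1 - R3 s2^(q); a single matrix product
    then yields every c1_hat^(q)."""
    S2 = np.column_stack(s2_list)                    # K2 x Q candidate vectors
    X1 = x1[:, None] - R3 @ S2                       # interference-cancelled inputs
    return alpha * (W1.conj().T @ X1) + u1[:, None]  # column q = c1_hat^(q)
```

Each column can be computed by its own multiply-accumulate unit, so the choice between this fully parallel form and a sequential loop over $q$ is exactly the latency/area trade-off discussed above.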
D. Fixed-Point Considerations
A critical issue in fixed-point arithmetic is the difference in the dynamic ranges of variables. The number of integer and fractional bits for each variable should be carefully determined to avoid overflows while not wasting hardware resources.
For example, the entries of the channel matrix $\mathbf{H}$ are usually assumed to be Gaussian distributed and thus have an unbounded dynamic range. Two common approaches can be employed to deal with this problem:
• A sufficiently large number of integer bits is used to represent $\mathbf{H}$, so that overflows occur only rarely. The round-off error (i.e., the accumulation of rounding errors during fixed-point arithmetic operations) should then be due purely to the loss in fractional precision. In this case, it is shown in [27] that the error variance depends only on the number of fractional bits, $f$, in the form $\sigma_e^2 = 2^{-2f}/3$.
• Automatic gain control adjusts the entries of $\mathbf{H}$ to the available number of integer bits with an appropriate scaling factor $\gamma$, so that the new channel matrix becomes $\tilde{\mathbf{H}} = \gamma\mathbf{H}$. The factor $\gamma$ can be chosen as $\gamma = 1/\max_{i,j}|[\mathbf{H}]_{i,j}|$.
Depending on the hardware resources, either approach can be applied; in practice, systems tend to compromise between the two.
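Both approaches can be illustrated with a small NumPy sketch. The word-length convention (a sign bit plus `int_bits` integer bits and `frac_bits` fractional bits, with saturation on overflow) and the helper names are assumptions for illustration, and the scaling factor is called `gamma` in the code. A Gaussian channel with a large spread overflows a format with one integer bit, while the AGC-scaled matrix fits, and the remaining error is bounded by half a least-significant bit.

```python
import numpy as np

def quantize(x, int_bits, frac_bits):
    """Round x to a signed fixed-point grid (sign bit + int_bits integer bits
    + frac_bits fractional bits), saturating and counting overflows."""
    step = 2.0 ** -frac_bits
    lo, hi = -(2.0 ** int_bits), 2.0 ** int_bits - step
    q = np.round(x / step) * step
    n_overflow = int(np.count_nonzero((q < lo) | (q > hi)))
    return np.clip(q, lo, hi), n_overflow

def quantize_complex(H, int_bits, frac_bits):
    """Quantize the real and imaginary parts separately."""
    re, n_re = quantize(H.real, int_bits, frac_bits)
    im, n_im = quantize(H.imag, int_bits, frac_bits)
    return re + 1j * im, n_re + n_im

def agc(H):
    """Automatic gain control: gamma = 1 / max |[H]_ij|, H_tilde = gamma * H."""
    gamma = 1.0 / np.max(np.abs(H))
    return gamma * H, gamma

rng = np.random.default_rng(0)
H = 10 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
_, n_raw = quantize_complex(H, int_bits=1, frac_bits=12)      # direct: overflows
H_t, gamma = agc(H)
H_q, n_agc = quantize_complex(H_t, int_bits=1, frac_bits=12)  # after AGC: none
```

After AGC every entry satisfies $|[\tilde{\mathbf{H}}]_{i,j}| \le 1$, so a single integer bit suffices and the full word length is spent on fractional precision.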
V. SIMULATION RESULTS
We run simulations for a MIMO-OFDMA LTE downlink system with the parameters given in Table II.
Figures 5 and 6 show the bit error rate (BER) performance of different detectors for the TU and SCM channels, with 4-QAM used for modulation. We compare the proposed LRLD based detector with the conventional LR based minimum mean square error (MMSE) detector that uses the Lenstra-Lenstra-Lovász (LLL) algorithm [32], and with the optimal sphere ML detector. It can be seen that the proposed detector provides near-ML performance and outperforms the conventional LR based MMSE detector. The same behaviour is observed with 16-QAM modulation in Figures 7 and 8.

TABLE II
SIMULATION PARAMETERS
Parameter                        Value
Center frequency                 3.5 GHz
Bandwidth                        10 MHz
Subcarrier spacing               15 kHz
FFT size                         1024
Number of usable subcarriers     601
Cyclic prefix (CP)               FFT size / 8
Channel model & velocity         TU 30 km/h and SCM 3 km/h
Modulation                       16-QAM, Gray mapping
Channel coding                   Turbo coding, code rate 1/2
Channel estimation               Ideal
Data mapping                     Localized subcarrier pattern

[Fig. 5 plot: BER versus $E_b/N_0$ (dB), 0 to 5 dB; panel title "OFDMA, 4QAM, Rate 1/2, TU 30 km/h"; curves: LR based MMSE (LLL), Proposed LRLD (Q = 6), Sphere ML.]
Fig. 5. BER performance comparison of different detectors with 4-QAM modulation and TU channel (receiver velocity of 30 km/h).

[Fig. 6 plot: BER versus $E_b/N_0$ (dB), 4 to 12 dB; panel title "OFDMA, 4QAM, Rate 1/2, SCM 3 km/h"; curves: LR based MMSE (LLL), Proposed LRLD (Q = 6), Sphere ML.]
Fig. 6. BER performance comparison of different detectors with 4-QAM modulation and SCM channel (receiver velocity of 3 km/h).

[Fig. 7 plot: BER versus $E_b/N_0$ (dB), 4 to 9 dB; panel title "OFDMA, 16QAM, Rate 1/2, TU 30 km/h"; curves: LR based MMSE (LLL), Proposed LRLD, Sphere ML.]
Fig. 7. BER performance comparison of different detectors with 16-QAM modulation and TU channel (receiver velocity of 30 km/h).

[Fig. 8 plot: BER versus $E_b/N_0$ (dB), 8 to 16 dB; panel title "OFDMA, 16QAM, Rate 1/2, SCM 3 km/h"; curves: LR based MMSE (LLL), Proposed LRLD (Q = 12), Sphere ML.]
Fig. 8. BER performance comparison of different detectors with 16-QAM modulation and SCM channel (receiver velocity of 3 km/h).
Complexity comparison: To examine the complexity of the different detection methods, we ran simulations. The results are shown in Figure 9, where the number of flops was estimated from the MATLAB execution time over all operations of each detector, under the same environment. The execution time is averaged over hundreds of thousands of channel realizations. Note that the Schnorr-Euchner algorithm [33] is used for the sphere ML detector, and the LLL algorithm with reduction factor $\delta = 3/4$ [32] is chosen for the LR based MMSE-SIC detector. No limit on the number of iterations is imposed on any LR algorithm. The proposed LRLD based detector clearly requires the lowest number of flops. We can also see that the number of flops of the proposed detector is slightly higher than half of that of the LR based MMSE-SIC detector using the LLL algorithm.

[Fig. 9 plot: estimated flops versus $E_b/N_0$ (dB), 4 to 12 dB, 16-QAM; curves: LR based (LLL) MMSE-SIC, LR based list detector, ML sphere detector.]
Fig. 9. Complexity comparison.
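For reference, the LLL algorithm with reduction factor $\delta = 3/4$ used by the baseline detector can be sketched as follows. This is the textbook real-valued version operating on the columns of a full-rank real basis; a complex channel would first be converted to its real-valued equivalent model (or a complex LLL variant used instead), and we do not claim this matches the implementation used in the simulations. The optional `max_iters` cap is a hypothetical parameter that mirrors the contrast between the unrestricted LLL runs and the iteration-limited Gaussian reduction of the proposed detector. Gram-Schmidt is recomputed after every update for clarity, not speed.

```python
import numpy as np

def gram_schmidt(B):
    """Gram-Schmidt vectors Bs and coefficients mu of the columns of B."""
    n = B.shape[1]
    Bs = np.zeros_like(B)
    mu = np.zeros((n, n))
    for i in range(n):
        Bs[:, i] = B[:, i]
        for j in range(i):
            mu[i, j] = (B[:, i] @ Bs[:, j]) / (Bs[:, j] @ Bs[:, j])
            Bs[:, i] -= mu[i, j] * Bs[:, j]
    return Bs, mu

def lll_reduce(B, delta=0.75, max_iters=None):
    """Real-valued LLL: returns (B_red, U) with B_red = B @ U, U unimodular."""
    B = np.array(B, dtype=float)
    n = B.shape[1]
    U = np.eye(n, dtype=int)
    Bs, mu = gram_schmidt(B)
    k, it = 1, 0
    while k < n and (max_iters is None or it < max_iters):
        it += 1
        for j in range(k - 1, -1, -1):          # size-reduce column k
            q = int(round(mu[k, j]))
            if q:
                B[:, k] -= q * B[:, j]
                U[:, k] -= q * U[:, j]
                Bs, mu = gram_schmidt(B)
        lhs = Bs[:, k] @ Bs[:, k]
        rhs = (delta - mu[k, k - 1] ** 2) * (Bs[:, k - 1] @ Bs[:, k - 1])
        if lhs >= rhs:                           # Lovasz condition holds
            k += 1
        else:                                    # swap columns k-1 and k
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            U[:, [k - 1, k]] = U[:, [k, k - 1]]
            Bs, mu = gram_schmidt(B)
            k = max(k - 1, 1)
    return B, U
```

For example, the skewed basis with columns $(1, 0)^T$ and $(5, 1)^T$ is reduced to the identity with $\mathbf{U} = [1, -5; 0, 1]$ in a single size-reduction step.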
VI. CONCLUSION
An efficient signal detector based on two techniques, namely LR and LD, has been investigated in this paper for MIMO-OFDMA LTE downlink systems. By generating the list in the LR domain, a more reliable list detection is obtained to facilitate SIC detection. As a result, the proposed detector outperforms conventional LR based detectors and provides near-ML performance with significantly reduced complexity. The implementation possibility was then studied to provide a reference for real silicon implementation.
APPENDIX A
LR BASED SIGNAL DETECTION
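We describe the LR based detection that is used in Steps AS1 and AS4. Let $\mathbb{C}_{\mathbb{Z}}$ denote the set of complex integers, or Gaussian integers, $\mathbb{C}_{\mathbb{Z}} = \mathbb{Z} + j\mathbb{Z}$, where $\mathbb{Z}$ denotes the set of integers and $j = \sqrt{-1}$. We assume that $\{\alpha s + \beta \mid s \in \mathcal{S}\} \subseteq \mathbb{C}_{\mathbb{Z}}$, where $\alpha$ and $\beta$ are the scaling and shifting coefficients, respectively. For example, for $M$-QAM with $M = 2^{2m}$, we have
$$\mathcal{S} = \{s = a + jb \mid a, b \in \{\pm A, \pm 3A, \ldots, \pm(2^m - 1)A\}\},$$
where $A = \sqrt{3E_s/(2(M-1))}$ and $E_s = \mathbb{E}[|s|^2]$ denotes the symbol energy. Thus, $\alpha = 1/(2A)$ and $\beta = ((2^m - 1)/2)(1 + j)$. Note that the pair $\alpha$ and $\beta$ is not uniquely determined.

TABLE III
SIGNALS AND PARAMETERS FOR THE LR-BASED DETECTION
Steps   $\mathbf{y}$                                     $\mathbf{A}$       $\mathbf{z}$       $\mathbf{c}$                 $K_i$
AS1)    $\mathbf{x}_2$                                   $\mathbf{R}_2$     $\mathbf{s}_2$     $\mathbf{c}_2$               $K_2$
AS4)    $\mathbf{x}_1 - \mathbf{R}_3\mathbf{s}_2^{(q)}$  $\mathbf{R}_1$     $\mathbf{s}_1$     $\mathbf{c}_1^{(q)}$         $K_1$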
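The scaling and shifting of the square $M$-QAM alphabet can be checked numerically. The sketch below builds $\mathcal{S}$ with $A = \sqrt{3E_s/(2(M-1))}$ and the choice $\alpha = 1/(2A)$, $\beta = ((2^m - 1)/2)(1 + j)$ from the text; the function names are hypothetical.

```python
import math

def qam_alphabet(M, Es=1.0):
    """Square M-QAM alphabet S with mean symbol energy Es (M = 2^(2m))."""
    L = math.isqrt(M)                                   # 2^m levels per dimension
    A = math.sqrt(3 * Es / (2 * (M - 1)))
    levels = [(2 * i - (L - 1)) * A for i in range(L)]  # +-A, +-3A, ..., +-(2^m-1)A
    return [complex(a, b) for a in levels for b in levels], A

def scale_and_shift(M, A):
    """alpha = 1/(2A), beta = ((2^m - 1)/2)(1 + j): one valid choice of the pair."""
    L = math.isqrt(M)
    return 1 / (2 * A), (L - 1) / 2 * (1 + 1j)

S, A = qam_alphabet(16)
alpha, beta = scale_and_shift(16, A)
shifted = [alpha * s + beta for s in S]   # lands on {0,...,3} + j{0,...,3}
```

Each level $(2i - (L-1))A$ with $L = 2^m$ is mapped by $\alpha$ to $i - (L-1)/2$, and the shift by $\beta$ then moves it onto the integer $i$, which is why every point of $\alpha\mathcal{S} + \beta$ is a Gaussian integer.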
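Consider MIMO detection with the following signal model:
$$\mathbf{y} = \mathbf{A}\mathbf{z} + \mathbf{v}, \qquad (23)$$
where $\mathbf{A}$ is a MIMO channel matrix, $\mathbf{z} \in \mathcal{S}^{K_i}$ is the signal vector, and $\mathbf{v}$ is zero-mean Gaussian noise with $\mathbb{E}[\mathbf{v}\mathbf{v}^H] = \sigma_w^2\mathbf{I}$. We scale and shift $\mathbf{y}$ as
$$\mathbf{d} = \alpha\mathbf{y} + \beta\mathbf{A}\mathbf{1} = \mathbf{A}(\alpha\mathbf{z} + \beta\mathbf{1}) + \alpha\mathbf{v} = \mathbf{A}\mathbf{b} + \alpha\mathbf{v}, \qquad (24)$$
where $\mathbf{1} = [1\ 1\ \ldots\ 1]^T$ and $\mathbf{b} = \alpha\mathbf{z} + \beta\mathbf{1} \in \mathbb{C}_{\mathbb{Z}}^{K_i}$. Let $\tilde{\mathbf{A}} = \mathbf{A}\mathbf{U}$, where $\mathbf{U}$ is a unimodular matrix. Using any LR algorithm, including the LLL algorithm [32], we can find a $\mathbf{U}$ that makes the column vectors of $\tilde{\mathbf{A}}$ shorter. It follows that
$$\mathbf{d} = \mathbf{A}\mathbf{U}\mathbf{U}^{-1}\mathbf{b} + \alpha\mathbf{v} = \tilde{\mathbf{A}}\mathbf{c} + \alpha\mathbf{v}, \qquad (25)$$
where $\mathbf{c} = \mathbf{U}^{-1}\mathbf{b}$. The MMSE filter to estimate $\mathbf{c}$ is given by
$$\mathbf{W}_{\text{mmse}} = \arg\min_{\mathbf{W}} \mathbb{E}\left[\|\mathbf{W}^H(\mathbf{d} - \bar{\mathbf{d}}) - (\mathbf{c} - \bar{\mathbf{c}})\|^2\right] = \left(|\alpha|^2 E_s\mathbf{A}\mathbf{A}^H + |\alpha|^2\sigma_w^2\mathbf{I}\right)^{-1}\mathbf{A}\mathbf{U}^{-H}|\alpha|^2 E_s, \qquad (26)$$
where $\bar{\mathbf{d}} = \mathbb{E}[\mathbf{d}] = \beta\mathbf{A}\mathbf{1}$, $\bar{\mathbf{c}} = \mathbb{E}[\mathbf{c}] = \beta\mathbf{U}^{-1}\mathbf{1}$, and $\mathrm{Cov}(\mathbf{c}) = |\alpha|^2\mathbf{U}^{-1}\mathbf{U}^{-H}E_s$. The estimate of $\mathbf{c}$ is given by
$$\hat{\mathbf{c}} = \bar{\mathbf{c}} + \mathbf{W}_{\text{mmse}}^H(\mathbf{d} - \bar{\mathbf{d}}).$$
Table III lists the signals and parameters of the LR based MMSE detection for each step.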
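Equations (24) to (26) translate directly into a few lines of NumPy. The sketch assumes zero-mean symbols with $\mathbb{E}[\mathbf{z}\mathbf{z}^H] = E_s\mathbf{I}$ and noise independent of the symbols. The slicing helper (rounding $\hat{\mathbf{c}}$ to the nearest Gaussian integer and mapping back through $\mathbf{U}$) is a simple quantizer for illustration and does not clip to the constellation; both function names are hypothetical.

```python
import numpy as np

def lr_mmse_estimate(y, A, U, alpha, beta, Es, sigma2):
    """LR-domain MMSE estimate c_hat = c_bar + W^H (d - d_bar), eqs. (24)-(26)."""
    ones = np.ones(A.shape[1], dtype=complex)
    U_inv = np.linalg.inv(U)
    a2 = abs(alpha) ** 2
    # Cov(d - d_bar) = |alpha|^2 (Es A A^H + sigma_w^2 I)
    cov_d = a2 * (Es * A @ A.conj().T + sigma2 * np.eye(A.shape[0]))
    # E[(d - d_bar)(c - c_bar)^H] = |alpha|^2 Es A U^{-H}
    cross = a2 * Es * A @ U_inv.conj().T
    W = np.linalg.solve(cov_d, cross)          # eq. (26)
    d = alpha * y + beta * (A @ ones)          # eq. (24)
    d_bar = beta * (A @ ones)
    c_bar = beta * (U_inv @ ones)
    return c_bar + W.conj().T @ (d - d_bar)

def lr_slice_to_symbols(c_hat, U, alpha, beta):
    """Round c_hat to Gaussian integers, then map back: z = (U c - beta*1) / alpha."""
    c = np.round(c_hat.real) + 1j * np.round(c_hat.imag)
    return (U @ c - beta) / alpha
```

At high SNR and for a square invertible $\mathbf{A}$, $\mathbf{W}_{\text{mmse}}^H \to |\alpha|^{-2}\ldots$ reduces to the zero-forcing form $\mathbf{U}^{-1}\mathbf{A}^{-1}$ (up to the $\alpha$ scaling), so $\hat{\mathbf{c}}$ approaches $\mathbf{U}^{-1}(\alpha\mathbf{z} + \beta\mathbf{1}) = \mathbf{c}$.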
APPENDIX B
LIST GENERATION IN THE LR DOMAIN
To avoid or mitigate error propagation, the use of a list of candidate vectors of $\mathbf{s}_2$ in detecting $\mathbf{s}_1$ is crucial. Using the ML metric, we can find the candidate vectors in the list, $\mathcal{S}_2$.
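Let
$$\|\mathbf{r} - \mathbf{R}_2\mathbf{s}_2^{(1)}\|^2 \le \|\mathbf{r} - \mathbf{R}_2\mathbf{s}_2^{(2)}\|^2 \le \cdots \le \|\mathbf{r} - \mathbf{R}_2\mathbf{s}_2^{(M^{K_2})}\|^2,$$
where $\mathbf{s}_2^{(q)}$ denotes the symbol vector that corresponds to the $q$th largest likelihood. Therefore, an ideal list would be
$$\mathcal{S}_2 = \{\mathbf{s}_2^{(1)}, \mathbf{s}_2^{(2)}, \ldots, \mathbf{s}_2^{(Q)}\}. \qquad (27)$$
However, this requires an exhaustive search, which results in a high computational complexity due to the computation of $\mathbf{R}_2\mathbf{s}_2$ for all $\mathbf{s}_2 \in \mathcal{S}^{K_2}$.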
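The exhaustive search behind the ideal list (27) can be sketched as follows, to make its cost explicit: one matrix-vector product $\mathbf{R}_2\mathbf{s}_2$ per candidate, i.e. $M^{K_2}$ of them. The function name is hypothetical and the symbols follow the text.

```python
import itertools
import numpy as np

def ideal_list(r, R2, alphabet, Q):
    """Exhaustive-search list (27): the Q vectors s2 in alphabet^K2 with the
    smallest ||r - R2 s2||^2, i.e. the largest likelihoods."""
    K2 = R2.shape[1]
    candidates = [np.array(s) for s in itertools.product(alphabet, repeat=K2)]
    # One matrix-vector product R2 @ s2 per candidate: M**K2 in total.
    metrics = [np.linalg.norm(r - R2 @ s) ** 2 for s in candidates]
    order = np.argsort(metrics, kind="stable")
    return [candidates[i] for i in order[:Q]]
```

For 16-QAM and $K_2 = 2$ this already means 256 metric evaluations per received vector, which motivates the LR-domain list below.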
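To avoid this high computational complexity, we can find a suboptimal list in the LR domain at low complexity. Consider (24). According to Table III, let $\mathbf{A} = \mathbf{R}_2$, $\mathbf{d} = \alpha\mathbf{x}_2 + \beta\mathbf{A}\mathbf{1}$, and $\mathbf{b} = \alpha\mathbf{s}_2 + \beta\mathbf{1}$. Then, since $\tilde{\mathbf{A}} = \mathbf{A}\mathbf{U}$, the ML metric used to construct the list is given by
$$\|\mathbf{d} - \mathbf{A}\mathbf{b}\| = \|\mathbf{d} - \tilde{\mathbf{A}}\mathbf{c}\|. \qquad (28)$$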
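It is noteworthy that the metric on the right-hand side of (28) is defined in the LR domain. Let $\hat{\mathbf{s}}_2$ be the signal vector in $\mathcal{S}^{K_2}$ corresponding to $\hat{\mathbf{c}}_2$, and assume that $\hat{\mathbf{s}}_2$ is sufficiently close to $\mathbf{s}_2^{(1)}$. Then, we have $\mathbf{d} \approx \tilde{\mathbf{A}}\hat{\mathbf{c}}_2$. From this, the ML metric (ignoring a scaling factor) for constructing the list in the LR domain becomes
$$\|\mathbf{d} - \tilde{\mathbf{A}}\mathbf{c}\| \approx \|\tilde{\mathbf{A}}\hat{\mathbf{c}}_2 - \tilde{\mathbf{A}}\mathbf{c}\| = \|\hat{\mathbf{c}}_2 - \mathbf{c}\|_{\tilde{\mathbf{A}}^H\tilde{\mathbf{A}}}, \qquad (29)$$
where $\|\mathbf{x}\|_{\mathbf{A}} = \sqrt{\mathbf{x}^H\mathbf{A}\mathbf{x}}$ is a weighted norm. The list in the LR domain becomes
$$\mathcal{C}_2 = \{\mathbf{c} \mid \|\hat{\mathbf{c}}_2 - \mathbf{c}\|_{\tilde{\mathbf{A}}^H\tilde{\mathbf{A}}} < r_A(Q)\}, \qquad (30)$$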
where $r_A(Q) > 0$ is the radius of an ellipsoid centered at $\hat{\mathbf{c}}_2$ that contains $Q$ elements in the LR domain. If the column vectors of $\tilde{\mathbf{A}}$, i.e., the basis vectors in the LR domain, are orthogonal, $\tilde{\mathbf{A}}^H\tilde{\mathbf{A}}$ becomes diagonal. Furthermore, if they have the same norm, $\tilde{\mathbf{A}}^H\tilde{\mathbf{A}} \propto \mathbf{I}$. Thus, for nearly orthogonal basis vectors of almost equal norm, the list of $\mathbf{c}_2$ can be approximated as
$$\mathcal{C}_2 \approx \{\mathbf{c} \mid \|\hat{\mathbf{c}}_2 - \mathbf{c}\| < r(Q)\}, \qquad (31)$$
where $r(Q) > 0$ is the radius of a sphere centered at $\hat{\mathbf{c}}_2$ that contains $Q$ elements. Since LR provides a set of nearly orthogonal basis vectors for the LR based detection, the column vectors of $\tilde{\mathbf{A}}$ are nearly orthogonal for a two-basis system. Let $\mathcal{S}_2$ denote the list in the original domain converted from $\mathcal{C}_2$ as in step AS3. Since no matrix-vector multiplications are required to generate $\mathcal{C}_2$ or $\mathcal{S}_2$, we can use $\mathcal{S}_2$ as the list in the proposed detector to reduce computational complexity. Note that the list generated in the LR domain is much more reliable than a list generated in the original domain (which differs from this $\mathcal{S}_2$).
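The approximate list (31) needs only distances between $\hat{\mathbf{c}}_2$ and nearby Gaussian-integer vectors, which the following sketch computes. Two choices here are assumptions rather than the paper's procedure: candidates are enumerated in a box of $\pm$`span` around the rounded centre, and when mapping back to the original domain ($\mathbf{s} = (\mathbf{U}\mathbf{c} - \beta\mathbf{1})/\alpha$) vectors whose $\mathbf{b} = \mathbf{U}\mathbf{c}$ falls outside the shifted alphabet are simply dropped; step AS3 may handle them differently. The function names are hypothetical.

```python
import itertools
import numpy as np

def lr_list(c_hat, Q, span=2):
    """Approximate list (31): the Q Gaussian-integer vectors closest to c_hat
    in plain Euclidean distance, searched in a box of +-span around the
    rounded centre (assumed search strategy)."""
    centre = np.round(c_hat.real) + 1j * np.round(c_hat.imag)
    offsets = [complex(a, b) for a in range(-span, span + 1)
               for b in range(-span, span + 1)]
    cands = [centre + np.array(o)
             for o in itertools.product(offsets, repeat=len(c_hat))]
    cands.sort(key=lambda c: float(np.linalg.norm(c_hat - c)))
    return cands[:Q]

def lr_list_to_symbols(c_list, U, alpha, beta, int_alphabet):
    """Map each c to s = (U c - beta*1) / alpha, keeping only vectors whose
    b = U c lies in the shifted alphabet (assumed handling)."""
    out = []
    for c in c_list:
        b = U @ c
        if all(complex(v) in int_alphabet for v in b):
            out.append((b - beta) / alpha)
    return out
```

Sorting uses only the LR-domain distance $\|\hat{\mathbf{c}}_2 - \mathbf{c}\|$, in line with the approximation $\tilde{\mathbf{A}}^H\tilde{\mathbf{A}} \propto \mathbf{I}$; with the integer $\mathbf{U}_2 = [1, t; 0, 1]$ the mapping back costs only additions and integer scalings.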
REFERENCES
[1] H. X. Nguyen, "An efficient signal detection algorithm for 3GPP LTE downlink," in Proc. IEEE International Conf. on Digital Telecommunications (ICDT 2010), Athens, Greece, Jun. 2010, pp. 77-81.
[2] 3rd Generation Partnership Project (3GPP) TR 25.814, "Technical specification group radio access network: Physical layer aspects for Evolved UTRA," http://www.3gpp.org/ftp/Specs/html-info/25814.htm.
[3] H. Ekstrom, A. Furuskar, J. Karlsson, M. Meyer, S. Parkvall, J. Torsner, and M. Wahlqvist, "Technical solutions for the 3G Long-Term Evolution," IEEE Commun. Mag., vol. 44, pp. 38-45, Mar. 2006.
[4] H. G. Myung, J. Lim, and D. J. Goodman, "Single carrier FDMA for uplink wireless transmission," IEEE Veh. Technol. Mag., vol. 1, pp. 30-38, Sep. 2006.
[5] G. J. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified processing for wireless communication at high spectral efficiency," IEEE J. Select. Areas Commun., no. 11, pp. 1841-1852, 1999.
[6] W. J. Choi, R. Negi, and J. Cioffi, "Combined ML and DFE decoding for the V-BLAST system," in Proc. IEEE International Conf. Communications, New Orleans, LA, 2000, pp. 1243-1248.
[7] J. Luo, K. Pattipati, P. Willett, and F. Hasegawa, "Near optimal multiuser detection in synchronous CDMA using probabilistic data association," IEEE Commun. Lett., vol. 5, pp. 361-363, Sep. 2001.
[8] D. Pham, K. R. Pattipati, P. K. Willett, and J. Luo, "A generalized probabilistic data association detector for multiple antenna systems," IEEE Commun. Lett., vol. 8, no. 4, Apr. 2004.
[9] J. Choi, "On the partial MAP detection with applications to MIMO channels," IEEE Trans. Signal Proc., vol. 53, pp. 158-167, Jan. 2005.
[10] D. J. Love, S. Hosur, A. Batra, and R. W. Heath, "Chase decoding for space-time codes," in Proc. IEEE Vehicular Technology Conf., vol. 3, Nov. 2004, pp. 1663-1667.
[11] D. W. Waters and J. R. Barry, "The Chase family of detection algorithms for multiple-input multiple-output channels," IEEE Trans. Signal Proc., vol. 56, no. 2, pp. 739-747, Feb. 2008.
[12] H. Yao and G. W. Wornell, "Lattice-reduction-aided detectors for MIMO communication systems," in Proc. IEEE Global Telecommunications Conf., Taiwan, Nov. 2002, pp. 424-428.
[13] D. Wubben, R. Bohnke, V. Kuhn, and K.-D. Kammeyer, "Near-maximum-likelihood detection of MIMO systems using MMSE-based lattice reduction," in Proc. IEEE International Conf. Communications, vol. 2, Paris, Jun. 2004, pp. 798-802.
[14] H. Bolcskei, "MIMO-OFDM wireless systems: Basics, perspectives, and challenges," IEEE Wireless Commun., vol. 13, pp. 31-37, Aug. 2006.
[15] D. Perels, S. Haene, P. Luethi, A. Burg, N. Felber, W. Fichtner, and H. Bolcskei, "ASIC implementation of a MIMO-OFDM transceiver for 192 Mbps WLANs," in Proc. ESSCIRC, Grenoble, France, 2005, pp. 215-218.
[16] Z. Guo and P. Nilsson, "A VLSI implementation of MIMO detection for future wireless communications," in Proc. 14th IEEE Int. Symp. Personal, Indoor and Mobile Radio Communication, 2003, pp. 2852-2856.
[17] G. Knagge, L. Davis, G. Woodwar, and S. R. Weller, "VLSI preprocessing techniques for MUD and MIMO sphere detection," in Proc. 6th Australian Communications Theory Workshop, Feb. 2005, pp. 221-228.
[18] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE J. Solid-State Circuits, vol. 40, pp. 1566-1577, Jul. 2005.
[19] A. Burg, M. Borgmann, M. Wenk, C. Studer, and H. Bolcskei, "Advanced receiver algorithms for MIMO wireless communications," in Proc. Design, Automation and Test in Europe (DATE 06), vol. 1, Mar. 2006.
[20] C. Studer, A. Burg, and H. Bolcskei, "Soft-output sphere decoding: Algorithms and VLSI implementation," submitted to IEEE J. Select. Areas Commun., Apr. 2007.
[21] D. Garrett, L. Davis, S. ten Brink, B. Hochwald, and G. Knagge, "Silicon complexity for maximum likelihood MIMO detection using spherical decoding," IEEE J. Solid-State Circuits, vol. 39, pp. 1544-1552, Sep. 2004.
[22] S. Chen, T. Zhang, and Y. Xin, "Relaxed K-Best MIMO signal detector design and VLSI implementation," IEEE Trans. VLSI Syst., vol. 15, pp. 328-337, Mar. 2007.
[23] R. Steele, Mobile Radio Communications, New York: IEEE Press, 1992.
[24] 3GPP, TR 25.996, "Spatial channel model for multiple input multiple output (MIMO) simulations (Rel. 6)," 2003.
[25] L. M. Davis, "Scaled and decoupled Cholesky and QR decompositions with application to spherical MIMO detection," in Proc. IEEE Wireless Communications and Networking Conf., vol. 1, Mar. 2003, pp. 326-331.
[26] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins University Press, 1996.
[27] C. K. Singh, S. H. Prasad, and P. T. Balsara, "A fixed-point implementation for QR decomposition," in Proc. 2006 IEEE Dallas/CAS Workshop Design, Applications, Integration and Software, Oct. 2006, pp. 75-78.
[28] C. K. Singh, S. H. Prasad, and P. T. Balsara, "VLSI architecture for matrix inversion using modified Gram-Schmidt based QR decomposition," in Proc. 20th IEEE Int. Conf. VLSI Design, Jan. 2007, pp. 836-841.
[29] A. Bjorck and C. Paige, "Loss and recapture of orthogonality in the modified Gram-Schmidt algorithm," SIAM J. Matrix Anal. Appl., vol. 13, no. 1, pp. 176-190, 1992.
[30] M. Borgmann and H. Bolcskei, "Interpolation-based efficient matrix inversion for MIMO-OFDM receivers," in Proc. 38th Asilomar Conf. Signals, Systems, Computers, vol. 2, Pacific Grove, CA, Nov. 2004, pp. 1941-1947.
[31] D. Cescato, M. Borgmann, H. Bolcskei, J. Hansen, and A. Burg, "Interpolation-based QR decomposition in MIMO-OFDM systems," in Proc. 6th IEEE Workshop Signal Processing Advances in Wireless Communications (SPAWC), New York, NY, Jun. 2005, pp. 945-949.
[32] A. K. Lenstra, H. W. Lenstra, Jr., and L. Lovasz, "Factoring polynomials with rational coefficients," Math. Ann., vol. 261, pp. 515-534, 1982.
[33] C. P. Schnorr and M. Euchner, "Lattice basis reduction: Improved practical algorithms and solving subset sum problems," Math. Programming, vol. 66, pp. 181-191, 1994.