
954

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 3, MARCH 2005

Using Linear Programming to Decode Binary Linear Codes
Jon Feldman, Martin J. Wainwright, Member, IEEE, and David R. Karger, Associate Member, IEEE

Abstract: A new method is given for performing approximate maximum-likelihood (ML) decoding of an arbitrary binary linear code based on observations received from any discrete memoryless symmetric channel. The decoding algorithm is based on a linear programming (LP) relaxation that is defined by a factor graph or parity-check representation of the code. The resulting "LP decoder" generalizes our previous work on turbo-like codes. A precise combinatorial characterization of when the LP decoder succeeds is provided, based on pseudocodewords associated with the factor graph. Our definition of a pseudocodeword unifies other such notions known for iterative algorithms, including "stopping sets," "irreducible closed walks," "trellis cycles," "deviation sets," and "graph covers."
The fractional distance d_frac of a code is introduced, which is a lower bound on the classical distance. It is shown that the efficient LP decoder will correct up to ceil(d_frac/2) - 1 errors and that there are codes with d_frac = Omega(n^(1-epsilon)). An efficient algorithm to compute the fractional distance is presented. Experimental evidence shows a similar performance on low-density parity-check (LDPC) codes between LP decoding and the min-sum and sum-product algorithms. Methods for tightening the LP relaxation to improve performance are also provided.

Index Terms: Belief propagation (BP), iterative decoding, low-density parity-check (LDPC) codes, linear codes, linear programming (LP), LP decoding, minimum distance, pseudocodewords.

I. INTRODUCTION

LOW-density parity-check (LDPC) codes were first discovered by Gallager in 1962 [7]. In the 1990s, they were rediscovered by a number of researchers [8], [4], [9], and have
since received a lot of attention. The error-correcting performance of these codes is unsurpassed; in fact, Chung et al. [10]
have given a family of LDPC codes that come within 0.0045 dB
of the capacity of the channel (as the block length goes to infinity). The decoders most often used for this family are based
Manuscript received May 6, 2003; revised December 8, 2004. The work of
J. Feldman was conducted while the author was at the MIT Laboratory for Computer Science and supported in part by the National Science Foundation Postdoctoral Research Fellowship DMS-0303407. The work of D. Karger was supported in part by the National Science Foundation under Contract CCR-9624239
and a David and Lucile Packard Foundation Fellowship. The material in this
paper was presented in part at the Conference on Information Sciences and Systems, Baltimore, MD, June 2003.
J. Feldman is with the Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027 USA (e-mail:
jonfeld@ieor.columbia.edu).
M. J. Wainwright is with the Department of Electrical Engineering and
Computer Science and the Department of Statistics, University of California,
Berkeley, Berkeley, CA, 94720 USA (e-mail: wainwrig@eecs.berkeley.edu).
D. R. Karger is with the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, MA 02139
USA (e-mail: karger@mit.edu).
Communicated by R. L. Urbanke, Associate Editor for Coding Techniques.
Digital Object Identifier 10.1109/TIT.2004.842696

on the belief-propagation algorithm [11], where messages are
iteratively sent across a factor graph modeling the structure of
the code. While the performance of this decoder is quite good,
analyzing its behavior is often difficult when the factor graph
contains cycles.
In this paper, we introduce a new algorithm for decoding
an arbitrary binary linear code based on the method of linear
programming (LP) relaxation. We design a polytope that contains all valid codewords, and an objective function for which
the maximum-likelihood (ML) codeword is the optimum point
with integral coordinates. We use linear programming to find the
polytope's (possibly fractional) optimum, and achieve success
when that optimum is the transmitted codeword. Experiments
on LDPC codes show that the performance of the resulting LP
decoder is better than the iterative min-sum algorithm. In addition, the LP decoder has the ML certificate property; whenever
it outputs a codeword, it is guaranteed to be the ML codeword.
None of the standard iterative methods are known to have this
desirable property.
A desirable feature of the LP decoder is its amenability to
analysis. We introduce a variety of techniques for analyzing the
performance of this algorithm. We give an exact combinatorial
characterization of the conditions for LP decoding success, even
in the presence of cycles in the factor graph. This characterization holds for any discrete memoryless symmetric channel; in
such channels, a linear cost function can be defined on the code
bits such that the lowest cost codeword is the ML codeword.
We define the set of pseudocodewords, which is a superset of the
set of codewords, and we prove that the LP decoder always finds
the lowest cost pseudocodeword. Thus, the LP decoder succeeds
if and only if the lowest cost pseudocodeword is actually the
transmitted codeword.
Next, we define the notion of the fractional distance d_frac of a factor graph, which is essentially the minimum distance between a codeword and a pseudocodeword. In analogy to the performance guarantees of exact ML decoding with respect to classical distance, we prove that the LP decoder can correct up to ceil(d_frac/2) - 1 errors in the binary-symmetric channel (BSC). We prove that the fractional distance of a linear code with check degree at least three is at least exponential in the girth of the graph associated with that code. Thus, given a graph with logarithmic girth, the fractional distance can be lower-bounded by Omega(n^(1-epsilon)), for some constant epsilon > 0, where n is the code length.
For the case of LDPC codes, we show how to compute the
fractional distance efficiently. This fractional distance is not
only useful for evaluating the performance of the code under
LP decoding, but it also serves as a lower bound on the true
distance of the code.

0018-9448/$20.00 © 2005 IEEE


A. Relation to Iterative Algorithms


The techniques used by Chung et al. [10] to analyze LDPC
codes are based on those of Richardson and Urbanke [12] and
Luby et al. [13], who give an algorithm to calculate the threshold
of a randomly constructed LDPC code. This threshold acts as a
limit on the channel noise; if the noise is below the threshold,
then reliable decoding (using belief propagation (BP)) can be
achieved as the block length goes to infinity.
The threshold analysis is based on the idea of considering an
ensemble of codes for the purposes of analysis, then averaging
the behavior of this ensemble as the block length of the code
goes to infinity. For many ensembles, it is known [3] that for any
constant epsilon > 0, the difference in error rate (under belief-propagation
decoding) between a random code and the average code in the
ensemble is less than epsilon, with all but exponentially small probability in
the block length.
Calculating the error rate of the ensemble average can become difficult when the factor graph contains cycles; because
iterative algorithms can traverse cycles repeatedly, noise in the
channel can affect the final decision in complicated, highly dependent ways. This complication is avoided by considering the
limiting case of infinite block length, such that the probability
of a message traversing a cycle converges to zero. However, for
many practical block lengths, this leads to a poor approximation
of the true error rate [3].
Therefore, it is valuable to examine the behavior of a code
ensemble at fixed block lengths, and try to analyze the effect of
cycles. Recently, Di et al. [3] took on the finite length analysis
of LDPC codes under the binary erasure channel (BEC). Key
to their results is the notion of a purely combinatorial structure
known as a stopping set. BP fails if and only if a stopping set
exists among the erased bits; therefore, the error rate of BP is
reduced to a purely combinatorial question.
For the case of the BEC, we show that the pseudocodewords
we define in this paper are exactly stopping sets. Thus, the performance of the LP decoder is equivalent to BP on the BEC. Our
notion of a pseudocodeword also unifies other known results
for particular cases of codes and channels. For tail-biting trellises, our pseudocodewords are equivalent to those introduced
by Forney et al. [5]. Also, when applied to the analysis of computation trees for min-sum decoding, pseudocodewords have a
connection to the deviation sets defined by Wiberg [4], and refined by Forney et al. [6] and Frey, Koetter, and Vardy [14].
B. Previous Results
In previous work [1], [2], we introduced the approach of decoding any turbo-like code based on similar network flow and
linear programming relaxation techniques. We gave a precise
combinatorial characterization of the conditions under which
this decoder succeeds. We used properties of this LP decoder to
design a rate-1/2 repeat-accumulate (RA) code (a certain class
of simple turbo codes), and proved an upper bound on the probability of decoding error. We also showed how to derive a more
classical iterative algorithm whose performance is identical to
that of our LP decoder.


C. Outline
We begin the paper in Section II by giving background on
factor graphs for binary linear codes, and the ML decoding
problem. We present the LP relaxation of ML decoding in
Section III. In Section IV, we discuss the basic analysis of
LP decoding. We define pseudocodewords in Section V, and
fractional distance in Section VI. In Section VII, we draw
connections between various iterative decoding algorithms and
our LP decoder, and present some experiments. In Section VIII,
we discuss various methods for tightening the LP in order to
get even better performance. We conclude and discuss future
work in Section IX.
D. Notes and Recent Developments
Preliminary forms of part of the work in this paper have appeared in the conference papers [15], [16], and in the thesis of
one of the authors [17]. Since the submission of this work, it
has been shown that the LP decoder defined here can correct a
constant fraction of error in certain LDPC codes [18], and that
a variant of the LP can achieve capacity using expander codes
[19].
Additionally, relationships between LP decoding and iterative
decoding have been further refined. Discovered independently
of this work, Koetter and Vontobel's notion of a "graph cover"
[20] is equivalent to the notion of a pseudocodeword graph
defined here. More recent work by the same authors [21], [22]
explores these notions in more detail, and gives new bounds for
error performance.
II. BACKGROUND
A linear code C with parity-check matrix H can be represented by a Tanner or factor graph G, which is defined in the following way. Let I = {1, ..., n} and J = {1, ..., m} be indices for the columns (respectively, rows) of the parity-check matrix of the code. With this notation, G is a bipartite graph with independent node sets I and J. We refer to the nodes in I as variable nodes, and the nodes in J as check nodes. All edges in G have one endpoint in I and the other in J. For each (i, j) in I x J, the edge (i, j) is included in G if and only if H_{ji} = 1.

The neighborhood of a check node j in J, denoted by N(j), is the set of nodes i in I such that check node j is incident to variable node i in G. Similarly, we let N(i) be the set of check nodes incident to a particular variable node i in I.

Imagine assigning to each variable node i in I a value y_i in {0, 1}, representing the value of a particular code bit. A parity-check node j is satisfied if the collection of bits assigned to the variable nodes in N(j) have even parity. The binary vector y = (y_1, ..., y_n) is a codeword if and only if all check nodes are satisfied. Fig. 1 shows an example of a linear code and its associated factor graph: a (7, 4, 3) Hamming code with variable nodes {1, ..., 7} and check nodes {A, B, C}. In this Hamming code, any assignment of bits under which the neighborhood of every check node has even parity represents a codeword; the all-zeros vector is one such codeword, and the remaining codewords are obtained in the same way.
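The parity-check test just described can be sketched in a few lines of Python. The matrix H below is the textbook parity-check matrix of a (7, 4, 3) Hamming code, chosen for illustration; it may differ from the particular labeling used in Fig. 1, and the function names are illustrative:

```python
# Illustrative parity-check matrix of a (7,4,3) Hamming code.
H = [
    [1, 0, 1, 0, 1, 0, 1],  # check A
    [0, 1, 1, 0, 0, 1, 1],  # check B
    [0, 0, 0, 1, 1, 1, 1],  # check C
]

def neighborhood(H, j):
    """N(j): indices of variable nodes adjacent to check node j."""
    return [i for i, h in enumerate(H[j]) if h == 1]

def is_codeword(H, y):
    """y is a codeword iff every check neighborhood has even parity."""
    return all(sum(y[i] for i in neighborhood(H, j)) % 2 == 0
               for j in range(len(H)))

print(is_codeword(H, [1, 1, 1, 0, 0, 0, 0]))  # True: all checks satisfied
print(is_codeword(H, [1, 0, 0, 0, 0, 0, 0]))  # False: check A has odd parity
```

The bipartite structure of the factor graph appears only through the `neighborhood` function: a check constrains exactly the bits it is adjacent to.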
Let d_v^max denote the maximum variable (left) degree of the factor graph; i.e., the maximum, among all nodes i in I, of the degree of i. Let d_v^min denote the minimum variable degree. Let d_c^max and d_c^min denote the maximum and minimum check (right) degree of the factor graph.

Fig. 1. A factor graph for the (7, 4, 3) Hamming code. The nodes {1, 2, 3, 4, 5, 6, 7} drawn in open circles correspond to variable nodes, whereas the nodes {A, B, C} in black squares correspond to check nodes.

A. Channel Assumptions

A binary codeword y of length n is sent over a noisy channel, and a corrupted word y~ is received. In this paper, we assume an arbitrary discrete memoryless symmetric channel. We use the notation Pr[y | y~] to denote the probability that y was the codeword sent over the channel, given that y~ was received. We assume that all information words are equally likely a priori. By Bayes' rule, this assumption implies that

    Pr[y | y~] is proportional to Pr[y~ | y]

for any code C. Moreover, the memoryless property of the channel implies that

    Pr[y~ | y] = prod_{i=1}^{n} Pr[y~_i | y_i].

Let S be the space of possible received symbols. For example, in the BSC, S = {0, 1}, and in the additive white Gaussian noise (AWGN) channel, S = R. By symmetry, the set S can be partitioned into pairs (a, a') such that

    Pr[y~_i = a | y_i = 0] = Pr[y~_i = a' | y_i = 1]    (1)

and

    Pr[y~_i = a | y_i = 1] = Pr[y~_i = a' | y_i = 0].    (2)

B. ML Decoding

Given the received word y~, the ML decoding problem is to find the codeword y that maximizes Pr[y | y~]. It is equivalent to minimizing the negative log-likelihood, which we will call our cost function. Using our assumptions on the channel, this cost function can be written as sum_i gamma_i y_i, where

    gamma_i = log( Pr[y~_i | y_i = 0] / Pr[y~_i | y_i = 1] )    (3)

is the (known) negative log-likelihood ratio (LLR) at each variable node. For example, given a BSC with crossover probability p, we set gamma_i = log((1 - p)/p) if the received bit y~_i = 0, and gamma_i = log(p/(1 - p)) if y~_i = 1. The interpretation of gamma_i is the cost of decoding y_i = 1. Note that this cost may be negative, if decoding to y_i = 1 is the better choice.

We will frequently exploit the fact that the cost vector gamma can be uniformly rescaled without affecting the solution of the ML problem. In the BSC, for example, rescaling by 1/log((1 - p)/p) allows us to assume that gamma_i = -1 if y~_i = 1, and gamma_i = +1 if y~_i = 0.

III. DECODING WITH LINEAR PROGRAMMING

In this section, we formulate the ML decoding problem for an arbitrary binary linear code, and show that it is equivalent to solving a linear program over the codeword polytope. We then define a modified linear program that represents a relaxation of the exact problem.
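Before proceeding, the BSC cost function (3) can be illustrated concretely. The sketch below computes the LLR cost vector for a received word and evaluates the cost of a candidate decoding; the function names are illustrative:

```python
import math

def llr_costs(received, p):
    """LLR cost vector (3) for a BSC with crossover probability p.

    gamma_i = log((1-p)/p) > 0 if the received bit is 0 (decoding y_i = 1
    is expensive); gamma_i = log(p/(1-p)) < 0 if the received bit is 1."""
    pos = math.log((1 - p) / p)
    return [pos if r == 0 else -pos for r in received]

def cost(gamma, y):
    """Cost sum_i gamma_i * y_i of decoding to the binary word y."""
    return sum(g * b for g, b in zip(gamma, y))

gamma = llr_costs([1, 0, 0, 0, 0, 0, 0], p=0.1)
print(cost(gamma, [0] * 7))                # 0.0: cost of the all-zeros word
print(cost(gamma, [1, 0, 0, 0, 0, 0, 0]))  # negative: agreeing with bit 1 is cheaper
```

Dividing every entry by log((1 - p)/p), as in the rescaling remark above, turns this vector into +1/-1 entries without changing which word minimizes the cost.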
A. Codeword Polytope

To motivate our LP relaxation, we first show how ML decoding can be formulated as an equivalent LP. For a given code C, we define the codeword polytope to be the convex hull of all possible codewords:

    poly(C) = { sum_{y in C} lambda_y y : lambda_y >= 0, sum_{y in C} lambda_y = 1 }.

Note that poly(C) is a polytope contained within the n-hypercube [0, 1]^n, and includes exactly those vertices of the hypercube corresponding to codewords. Every point x in poly(C) corresponds to a vector (lambda_y : y in C) of convex weights, where element x_i is defined by the summation x_i = sum_{y in C} lambda_y y_i.

The vertices of a polytope are those points that cannot be expressed as convex combinations of other points in the polytope. A key fact is that any linear program attains its optimum at a vertex of the polytope [23]. Consequently, the optimum will always be attained at a vertex of poly(C), and these vertices are in one-to-one correspondence with codewords.

We can therefore define ML decoding as the problem of minimizing sum_i gamma_i x_i subject to the constraint x in poly(C). This formulation is a linear program, since it involves minimizing a linear cost function over the polytope poly(C).
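Since the optimum of this LP sits at a vertex, and the vertices are exactly the codewords, exact ML decoding for a very small code can be emulated by scanning vertices directly. The enumeration below is a brute-force sketch (exponential in the code dimension, for illustration only), using an assumed (7, 4, 3) Hamming parity-check matrix:

```python
from itertools import product

# Illustrative parity-check matrix of a (7,4,3) Hamming code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(H):
    """Enumerate all codewords: binary vectors satisfying every parity check."""
    n = len(H[0])
    for y in product([0, 1], repeat=n):
        if all(sum(h * b for h, b in zip(row, y)) % 2 == 0 for row in H):
            yield y

def ml_decode(H, gamma):
    """ML decoding: the minimum-cost vertex of the codeword polytope."""
    return min(codewords(H), key=lambda y: sum(g * b for g, b in zip(gamma, y)))

# Rescaled BSC costs: +1 for a received 0, -1 for a received 1.
received = [1, 0, 0, 0, 0, 0, 0]           # all-zeros codeword with one flip
gamma = [-1 if r else +1 for r in received]
print(ml_decode(H, gamma))                  # (0, 0, 0, 0, 0, 0, 0)
```

The one-bit error is corrected: every codeword containing bit 1 has weight at least three, so its cost is positive, while the all-zeros codeword has cost zero.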
B. LP Relaxation
The most common practical method for solving a linear program is the simplex algorithm [23], which generally requires an
explicit representation of the constraints. In the LP formulation
of exact ML decoding we have just described, although the polytope poly(C)
can be characterized by a finite number of linear constraints,
the number of constraints is exponential in the code length n.
Even the Ellipsoid algorithm [24], which does not require such
an explicit representation, is not useful in this case, since ML
decoding is NP-hard in general [25].
Therefore, our strategy will be to formulate a relaxed polytope, one that contains all the codewords, but has a more manageable representation. More concretely, we motivate our LP
relaxation with the following observation. Each check node in a
factor graph defines a local code; i.e., the set of binary vectors
that have even weight on its neighborhood variables. The global
code corresponds to the intersection of all the local codes. In LP

terminology, each check node defines a local codeword polytope
(meaning the set of convex combinations of local codewords),
and our global relaxation polytope will be the intersection of all
of these polytopes.
We use the variables x_1, ..., x_n to denote our code bits. Naturally, we have

    0 <= x_i <= 1.    (4)

To define a local codeword polytope, we consider the set N(j) of variable nodes that are neighbors of a given check node j in J. Of interest are subsets S of N(j) that contain an even number of variable nodes; each such subset corresponds to a local codeword set, defined by setting x_i = 1 for each index i in S, x_i = 0 for each i in N(j) \ S, and setting all other x_i arbitrarily.

For each j in J and each S in the set E_j = { S subset of N(j) : |S| even }, we introduce an auxiliary LP variable w_{j,S}, which is an indicator for the local codeword set associated with S; notionally, setting w_{j,S} equal to 1 indicates that S is the set of bits in N(j) that are set to 1. Note that the variable w_{j,emptyset} is also present for each parity check, and it represents setting all variables in N(j) equal to zero.

As indicator variables, the variables w_{j,S} must satisfy the constraints

    0 <= w_{j,S} <= 1.    (5)

The variable w_{j,S} can also be seen as indicating that the codeword satisfies check j using the configuration S. Since each parity check is satisfied with one particular even-sized subset of the nodes in its neighborhood set to one, we may enforce

    sum_{S in E_j} w_{j,S} = 1    (6)

as a constraint that is satisfied by every codeword. Finally, the indicator x_i at each variable node must belong to the local codeword polytope associated with each neighboring check node j. This leads to the constraint

    x_i = sum_{S in E_j : i in S} w_{j,S}    for all i in N(j).    (7)

Let the polytope Q_j be the set of points (x, w) such that (4)-(7) hold for check node j. Let Q be the intersection of these polytopes; i.e., the set of points (x, w) such that (4)-(7) hold for all j in J. Overall, the Linear Code Linear Program (LCLP) corresponds to the problem

    minimize sum_i gamma_i x_i    s.t.    (x, w) in Q.    (8)
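Constraints (4)-(7) can be transcribed directly. The sketch below (illustrative names, assumed Hamming-code matrix) only tests feasibility of a given point (x, w), which is already enough to experiment with the codeword/integral-solution correspondence stated next:

```python
from itertools import combinations

def even_subsets(neigh):
    """E_j: all even-sized subsets of a check's neighborhood N(j)."""
    return [frozenset(c) for r in range(0, len(neigh) + 1, 2)
            for c in combinations(neigh, r)]

def in_Q(H, x, w, tol=1e-9):
    """Check constraints (4)-(7) for the point (x, w).

    x: list of n values; w: dict mapping (j, S) -> weight, S a frozenset."""
    if not all(-tol <= xi <= 1 + tol for xi in x):              # (4)
        return False
    for j, row in enumerate(H):
        neigh = [i for i, h in enumerate(row) if h == 1]
        Ej = even_subsets(neigh)
        ws = [w.get((j, S), 0.0) for S in Ej]
        if not all(-tol <= wi <= 1 + tol for wi in ws):         # (5)
            return False
        if abs(sum(ws) - 1) > tol:                              # (6)
            return False
        for i in neigh:                                         # (7)
            if abs(x[i] - sum(w.get((j, S), 0.0)
                              for S in Ej if i in S)) > tol:
                return False
    return True

# The integral point corresponding to a codeword y:
H = [[1, 0, 1, 0, 1, 0, 1], [0, 1, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]
y = [1, 1, 1, 0, 0, 0, 0]
w = {}
for j, row in enumerate(H):
    neigh = [i for i, h in enumerate(row) if h == 1]
    S = frozenset(i for i in neigh if y[i] == 1)  # the configuration used
    w[(j, S)] = 1.0
print(in_Q(H, y, w))                              # True
```

Each check of degree d contributes 2^(d-1) auxiliary variables, which is why this explicit form is only practical when check degrees are small.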

An integral point in a polytope (also referred to as an integral solution to a linear program) is a point in the polytope whose values are all integers. We begin by observing that there is a one-to-one correspondence between codewords and integral solutions to LCLP.

Proposition 1: For all integral points (x, w) in Q, the sequence (x_1, ..., x_n) represents a codeword. Furthermore, for all codewords y in C, there exists a w such that (y, w) is


Proof: Suppose (x, w) is a point in Q where all x_i in {0, 1} and all w_{j,S} in {0, 1}. Now suppose x is not a codeword, and let j be some parity check unsatisfied by setting y_i = x_i for all i. By the constraints (6), and the fact that w is integral, w_{j,S} = 1 for some S in E_j, and w_{j,S'} = 0 for all other S' in E_j where S' != S. By the constraints (7), we have x_i = 1 for all i in S, and x_i = 0 for all i in N(j) with i not in S. Since |S| is even, check j is satisfied by setting y_i = x_i, a contradiction.

For the second part of the claim, let y in C be a codeword, and set x = y. For all j in J, let S_j be the set of nodes i in N(j) where y_i = 1. Since y is a codeword, check j is satisfied by y, so |S_j| is even, and the variable w_{j,S_j} is present. Set w_{j,S_j} = 1 and w_{j,S} = 0 for all other S in E_j. All constraints are satisfied, and all variables are integral.
Overall, the decoding algorithm based on LCLP consists of the following steps. We first solve the LP in (8) to obtain (x*, w*). If x* is integral, we output it as the optimal codeword; otherwise, x* is fractional, and we output "error." From Proposition 1, we get the following.

Proposition 2: LP decoding has the ML certificate property: if the algorithm outputs a codeword, it is guaranteed to be the ML codeword.

Proof: If the algorithm outputs a codeword y, then (y, w) has cost less than or equal to all points in Q. For any codeword y', there is some w' such that (y', w') is a point in Q, by Proposition 1. Therefore, y has cost less than or equal to the cost of y'.
Given a cycle-free factor graph, it can be shown that any optimal solution to LCLP is integral [26]. Therefore, LCLP is an exact formulation of the ML decoding problem in the cycle-free case. In contrast, for a factor graph with cycles, the optimal solution to LCLP may not be integral. Take, for example, the Hamming code in Fig. 1. Suppose that we define a cost vector in which a single variable node receives a suitably chosen negative cost and every other node receives cost 1. The negative cost can be chosen so that, under this cost function, all codewords have nonnegative cost: any codeword with negative cost would have to set the negative-cost bit to 1, and therefore (to satisfy the checks incident to that bit) set at least two other bits to 1 as well, for a nonnegative total. Consider, however, a fractional solution to LCLP that sets the negative-cost bit to 1, several other bits to 1/2, and, at each check node, places weight 1/2 on each of two local configurations consistent with these values. Such a solution satisfies all of the LCLP constraints, yet its cost is strictly less than the cost of any codeword.

Note that such a solution is not a convex combination of codewords, and so is not contained in poly(C). The solution gets outside of poly(C) by exploiting the local perspective of the relaxation: one check node is satisfied by using a configuration that another check node does not use. The analysis to follow will provide further insight into the nature of such fractional (i.e., nonintegral) solutions to LCLP.

It is worthwhile noting that the local codeword constraints (7) are identical to those enforced in the Bethe free energy formulation of BP [27]. For this reason, it is not surprising that the performance of our LP decoder turns out to be closely related to that of the BP and min-sum algorithms.


C. LP Solving and Polytope Representations


The efficiency of practical LP solvers depends on how the
LP is represented. An LP is defined by a set of variables, a
cost function, and a polytope (set of linear constraints). The
ellipsoid algorithm is guaranteed to run in time polynomial in
the size of the LP representation, which is proportional to the
number of variables and constraints. The simplex algorithm,
though not guaranteed to run efficiently in the worst case, still
has a dependence on the representation size, and is usually
much more efficient than the ellipsoid algorithm. For more details on solving linear programs, we refer the reader to standard
texts [28], [23].
The polytope Q described by (4)-(7) is the most intuitive form of the relaxation. For LDPC codes, Q has size linear in n. Thus, the ellipsoid algorithm is provably efficient, and we can also reasonably expect the simplex algorithm to be even more efficient in practice. For arbitrary binary linear codes, the number of constraints in Q is exponential in the degree of each check node. So, if some check node has degree linear in n (as one would expect in random codes, for example), the polytope has a number of constraints that is exponential in n. Therefore, to solve LCLP efficiently, we need to define a smaller polytope that produces the same results.
Alternative representations of the LP are useful for analytical
purposes as well. We will see this when we discuss fractional
distance in Section VI.
1) Polytope Equivalence: All the polytopes we use in this paper have variables x_i for all i in I. They may also involve auxiliary variables, such as the w variables in the description of Q. However, if two polytopes share the same set of possible settings to the x variables, then we may use either one. We formalize this notion.

Definition 3: Let P be some polytope defined over the variables x = (x_1, ..., x_n), as well as some auxiliary variables w. We define

    proj(P) = { x : there exists some w s.t. (x, w) in P }

as the projection of P onto the x variables. Given such a P, we say that a polytope P' defined over the x variables alone is equivalent to P if P' = proj(P).

In other words, we require that the projections of P and P' onto the x variables are the same. Since the objective function of LCLP only involves the x variables, optimizing over P and P' will produce the same result.

In the remainder of this section we define two new polytopes. The first is an explicit description of proj(Q) that will be useful for defining (and computing) the fractional distance of the code, which we cover in Section VI. The second polytope is equivalent to Q, but has a small overall representation, even for high-density codes. This equivalence shows that LCLP can be solved efficiently for any binary linear code.

2) Projected Polytope: In this subsection, we derive an explicit description of the projected polytope proj(Q). The following definition of this polytope in terms of constraints on x alone was derived from the parity polytope of Jeroslow [29], [30]. We first enforce 0 <= x_i <= 1 for all i in I. Then, for every check j, we explicitly forbid every bad configuration of the neighborhood of j. Specifically, for all S subsets of N(j) with |S| odd, we require

    sum_{i in S} x_i - sum_{i in N(j)\S} x_i <= |S| - 1.    (9)

Note that the integral settings of the bits x that satisfy these constraints for some check j are exactly the local codewords for j, as before.

Let Q'_j be the set of points x that satisfy (9) for a particular check j and all S subsets of N(j) with |S| odd. We can further understand the constraints in Q'_j by rewriting (9) as follows:

    sum_{i in S} (1 - x_i) + sum_{i in N(j)\S} x_i >= 1.    (10)

In other words, the distance between (the relevant portion of) x and the incidence vector for each odd-sized set S is at least one. This constraint ensures that x is separated by at least one bit flip from all illegal configurations. In three dimensions (i.e., |N(j)| = 3), it is easy to see that these constraints are equivalent to the convex hull of the even-sized subsets of N(j), as shown in Fig. 2. In fact, the following theorem states that in general, if we enforce (9) for all checks, we get an explicit description of proj(Q).

Fig. 2. The equivalence of the polytopes in three dimensions. The polytope Q'_j is defined as the set of points inside the unit hypercube with distance at least one from all odd-weight hypercube vertices. The polytope proj(Q_j) is the convex hull of even-weight hypercube vertices.

Theorem 4: Let Q' be the intersection of the polytopes Q'_j over all j in J; i.e., Q' is exactly the set of points x that satisfy (9) for all checks j and all S subsets of N(j) where |S| is odd. Then Q' and Q are equivalent; in other words, Q' = proj(Q).

Proof: Recall that Q_j is the set of points (x, w) that satisfy the local codeword polytope for check j. Consider the projection

    proj(Q_j) = { x : there exists some w s.t. (x, w) in Q_j }.

In other words, proj(Q_j) is the convex hull of the local codeword sets defined by sets S in E_j. Note that proj(Q) is the intersection of the sets proj(Q_j), since each Q_j exactly expresses the constraints associated with check j. Recall that Q'_j is the set of points that satisfy the constraints (9) for a particular check j. Since Q' is the intersection of the sets Q'_j, it suffices to show Q'_j = proj(Q_j) for all j in J. This is shown by Jeroslow [29]. For completeness, we include a proof of this fact in Appendix I.
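Using the projected description (9), LCLP becomes an LP in the x variables alone and can be handed to an off-the-shelf solver. The sketch below assumes SciPy's `linprog` is available and uses the illustrative Hamming-code matrix from earlier; each degree-4 check contributes 8 odd-subset constraints:

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

# Illustrative parity-check matrix of a (7,4,3) Hamming code.
H = [[1, 0, 1, 0, 1, 0, 1], [0, 1, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]

def lp_decode(H, gamma):
    """LP decoding over the projected polytope: constraints (9) for all checks.

    Returns the codeword if the LP optimum is integral, otherwise 'error'
    (the ML certificate property of Proposition 2)."""
    n = len(H[0])
    A_ub, b_ub = [], []
    for row in H:
        neigh = [i for i, h in enumerate(row) if h == 1]
        for r in range(1, len(neigh) + 1, 2):       # odd-sized subsets S
            for S in combinations(neigh, r):
                a = np.zeros(n)
                a[list(S)] = 1.0                     # + x_i for i in S
                a[[i for i in neigh if i not in S]] = -1.0
                A_ub.append(a)
                b_ub.append(len(S) - 1)              # <= |S| - 1
    res = linprog(gamma, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, 1)] * n)
    x = res.x
    if np.all(np.abs(x - np.round(x)) < 1e-6):
        return [int(round(v)) for v in x]
    return "error"

gamma = [1] * 7              # rescaled BSC costs for an error-free received word
print(lp_decode(H, gamma))   # [0, 0, 0, 0, 0, 0, 0]
```

With all costs positive, the unique LP optimum is the all-zeros vertex, so the decoder outputs the all-zeros codeword together with its implicit ML certificate.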

3) High-Density Code Polytope: Recall that d_c^max is the maximum degree of any check node in the graph. As stated, LCLP has a number of variables and constraints that is exponential in d_c^max. For turbo and LDPC codes, this complexity is linear in n, since d_c^max is constant. For arbitrary binary linear codes, we give a characterization of LCLP with a number of variables and constraints polynomial in n. To derive this characterization, we give a new polytope for each local codeword polytope, based on a construction of Yannakakis [30], whose size does not have an exponential dependence on the size of the check neighborhood. We refer to this representation of the polytope as R. The details of this representation, as well as a proof that Q and R are equivalent, can be found in Appendix II.
IV. ANALYSIS OF LP DECODING
When using the LP decoding method, an error can arise in one of two ways. Either the LP optimum x* is not integral, in which case the algorithm outputs "error"; or, the LP optimum may be integral (and therefore corresponds to the ML codeword), but the ML codeword is not what was transmitted. In this latter case, the code itself has failed, so even exact ML decoding would make an error.

We use the notation Pr[err | y] to denote the probability that the LP decoder makes an error, given that y was transmitted. By Proposition 1, there is some feasible solution (y, w) to LCLP corresponding to the transmitted codeword y. We can characterize the conditions under which LP decoding will succeed as follows.

Theorem 5: Suppose the codeword y is transmitted. If all feasible solutions to LCLP other than (y, w) have cost more than the cost of (y, w), the LCLP decoder succeeds. If some solution to LCLP has cost less than the cost of (y, w), the decoder fails.

Proof: By Proposition 1, (y, w) is a feasible solution to LCLP. If all feasible solutions to LCLP other than (y, w) have cost more than the cost of (y, w), then (y, w) must be the unique optimal solution to LCLP. Therefore, the decoder will output y, which is the transmitted codeword.

If some solution (x', w') to LCLP has cost less than the cost of (y, w), then (y, w) is not an optimal solution to LCLP. Since the w variables do not affect the cost of the solution, it must be that x' != y. Therefore, the decoder either outputs "error," or it outputs a codeword other than y, which is not the transmitted codeword.

In the degenerate case where (y, w) is one of multiple optima of LCLP, the decoder may or may not succeed. We will be conservative and consider this case to be decoding failure, and so by Theorem 5

    Pr[err | y] = Pr[ some feasible (x, w) in Q with x != y has cost at most the cost of (y, w) ].    (11)
We now proceed to provide combinatorial characterizations of
decoding success and analyze the performance of LP decoding
in various settings.


A. The All-Zeros Assumption


When analyzing linear codes, it is common to assume that the codeword sent over the channel is the all-zeros vector (i.e., y = 0^n), since it tends to simplify analysis. In the context of our LP relaxation, however, the validity of this assumption is not immediately clear. In this section, we prove that one can make the all-zeros assumption when analyzing LCLP. Basically, this follows from the fact that the polytope Q is highly symmetric; from any codeword, the polytope "looks" exactly the same.
Theorem 6: The probability that the LP decoder fails is independent of the codeword that was transmitted.
Proof: See Appendix III.
From this point forward in our analysis of LP decoding, we
assume that the all-zeros codeword was the transmitted codeword. Since the all-zeros codeword has zero cost, Theorem 5,
along with our consideration of multiple LP optima as failure,
gives the following.
Corollary 7: Given that the all-zeros codeword was transmitted (which we may assume by Theorem 6), the LP decoder will fail if and only if there is some point in Q, other than the point corresponding to the all-zeros codeword, with cost less than or equal to zero.

V. PSEUDOCODEWORDS
In this section, we introduce the concept of a pseudocodeword
for LP decoding, which we will define as a scaled version of a
solution to LCLP. As a consequence, Theorem 5 will hold for
pseudocodewords in the same way that it holds for solutions to
LCLP.
The following definition of a codeword motivates the notion of a pseudocodeword. Recall that E_j is the set of even-sized subsets of the neighborhood of check node j. Let y be a vector in {0, 1}^n, and let w be a setting of nonnegative integer weights, one weight w_{j,S} for each check j and each S ∈ E_j. We say that (y, w) is a codeword if, for all edges (i, j) in the factor graph G,

   y_i = Σ_{S ∈ E_j : i ∈ S} w_{j,S}.

This corresponds exactly to the consistency constraint (7) in LCLP. It is not difficult to see that this construction guarantees that the binary vector y is always a codeword of the original code.
We obtain the definition of a pseudocodeword (h, w) by removing the restriction y ∈ {0, 1}^n, and instead allowing each coordinate to take on arbitrary nonnegative integer values. In other words, a pseudocodeword is a vector h = (h_1, ..., h_n) of nonnegative integers such that, for every parity check j, the neighborhood {h_i : i ∈ N(j)} is a sum of local codewords (incidence vectors of even-sized sets in E_j).
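The consistency condition defining a pseudocodeword can be checked mechanically. The sketch below is our own illustration (the (7, 4, 3) Hamming check neighborhoods are an assumed labeling, not necessarily that of Fig. 1): it tests whether a nonnegative integer vector h, together with weights w_{j,S} over even-sized subsets, satisfies h_i = Σ_{S ∋ i} w_{j,S} on every edge (i, j).

```python
from itertools import combinations

def even_subsets(nbhd):
    """All even-sized subsets (as frozensets) of a check's neighborhood."""
    nbhd = list(nbhd)
    return [frozenset(c)
            for k in range(0, len(nbhd) + 1, 2)
            for c in combinations(nbhd, k)]

def is_pseudocodeword(h, checks, w):
    """h: nonnegative integers, one per code bit; checks: list of check
    neighborhoods; w: dict (j, S) -> nonnegative integer weight.
    Tests h_i = sum of w[j, S] over S containing i, for every edge (i, j)."""
    for j, nbhd in enumerate(checks):
        legal = set(even_subsets(nbhd))
        # every weighted local configuration must be an even-sized subset of N(j)
        if any(S not in legal for (jj, S) in w if jj == j):
            return False
        for i in nbhd:  # consistency on edge (i, j)
            if h[i] != sum(w.get((j, S), 0) for S in legal if i in S):
                return False
    return True

# assumed check neighborhoods for a (7,4,3) Hamming code (illustrative labeling)
checks = [(0, 1, 2, 4), (0, 1, 3, 5), (0, 2, 3, 6)]
w = {(j, frozenset()): 1 for j in range(3)}  # all weight on the empty local codeword
print(is_pseudocodeword([0] * 7, checks, w))         # True: the all-zeros codeword
print(is_pseudocodeword([1] + [0] * 6, checks, w))   # False: bit 0 is uncovered
```

Any codeword, and any sum of codewords, passes this test, which is the sense in which codewords are trivially pseudocodewords.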
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 3, MARCH 2005

With this definition, any codeword is (trivially) a pseudocodeword as well; moreover, any sum of codewords is a pseudocodeword. However, in general, there exist pseudocodewords that cannot be decomposed into a sum of codewords. As an illustration, consider the Hamming code of Fig. 1; earlier, we constructed a fractional LCLP solution for this code. If we simply scale this fractional solution by a factor of two, the result is a pseudocodeword (h, w): we begin by setting h_i equal to twice the fractional value of each bit i; to satisfy the constraints of a pseudocodeword, we set each weight w_{j,S} to twice its value in the fractional solution. This pseudocodeword cannot be expressed as the sum of individual codewords.
In the following, we use the fact that all optimum points of a
linear program with rational coefficients are themselves rational
[23]. Using simple scaling arguments, and the all-zeros assumption, we can restate Corollary 7 in terms of pseudocodewords as
follows.
Theorem 8: Given that the all-zeros codeword was transmitted (which we may assume by Theorem 6), the LP decoder will fail if and only if there is some pseudocodeword h, h ≠ 0^n, where Σ_i γ_i h_i ≤ 0.
Proof: Suppose the decoder fails. Let (f*, w*) be the optimal point of the LP, the point in Q that minimizes Σ_i γ_i f_i. By Corollary 7, f* ≠ 0^n and Σ_i γ_i f*_i ≤ 0. Construct a pseudocodeword (h, w) as follows. Let M be the lowest common denominator of the entries of (f*, w*), which exists because (f*, w*) is the optimal point of the LP and all optimal points of the LP are rational. Then M f*_i is an integer for all i, and M w*_{j,S} is an integer for all checks j and sets S ∈ E_j. For all bits i, set h_i = M f*_i; for all checks j and sets S ∈ E_j, set w_{j,S} = M w*_{j,S}.
By the constraints (7) of Q, (h, w) meets the definition of a pseudocodeword. The cost of h is exactly M Σ_i γ_i f*_i. Since Σ_i γ_i f*_i ≤ 0, we have Σ_i γ_i h_i ≤ 0. Since f* ≠ 0^n and M > 0, we see that h ≠ 0^n.
To establish the converse, suppose a pseudocodeword (h, w), h ≠ 0^n, has Σ_i γ_i h_i ≤ 0. Let N = max_j Σ_{S ∈ E_j, S ≠ ∅} w_{j,S}. We construct a point (f, w') as follows. Set f_i = h_i / N for all code bits i; note that f_i ≤ 1, since h_i = Σ_{S ∋ i} w_{j,S} ≤ N for any check j incident to bit i. For all checks j, do the following:
i) set w'_{j,S} = w_{j,S} / N for all sets S ∈ E_j, S ≠ ∅;
ii) set w'_{j,∅} = 1 − Σ_{S ∈ E_j, S ≠ ∅} w'_{j,S}.
We must handle the empty set as a special case, since its weight is not determined by the consistency constraint (7); the choice in ii) makes the weights for check j sum to one, and the definition of N guarantees w'_{j,∅} ≥ 0.
By construction, and by the definition of a pseudocodeword, (f, w') meets all the constraints of the polytope Q. Since h ≠ 0^n, we have f ≠ 0^n. The cost of f is exactly (Σ_i γ_i h_i) / N. Since Σ_i γ_i h_i ≤ 0, the point f has cost less than or equal to zero. Therefore, by Corollary 7, the LP decoder fails.
This theorem will be essential in proving the equivalence to
iterative decoding in the BEC in Section VII.
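The forward direction of the proof of Theorem 8 amounts to clearing denominators. A minimal sketch using Python's exact rationals (the fractional vector is illustrative, not necessarily the paper's Hamming example):

```python
from fractions import Fraction
from math import lcm

def rescale_to_integers(values):
    """Clear denominators: multiply a rational vector by the least common
    multiple M of its denominators, yielding the integer vector h = M * f."""
    fracs = [Fraction(v) for v in values]
    M = lcm(*(f.denominator for f in fracs))
    return M, [int(f * M) for f in fracs]

# an illustrative fractional LP solution (hypothetical values)
f_star = [Fraction(1), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), 0, 0, 0]
M, h = rescale_to_integers(f_star)
print(M, h)  # 2 [2, 1, 1, 1, 0, 0, 0]
```

Because every entry is scaled by the same positive integer M, the sign of the cost Σ_i γ_i h_i matches that of Σ_i γ_i f*_i.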
A. Pseudocodeword Graphs
A codeword corresponds to a particular subgraph of the factor graph G. In particular, the vertex set of this subgraph consists of all the variable nodes i for which y_i = 1, as well as all check nodes to which these variable nodes are incident.
Any pseudocodeword (h, w) can be associated with a graph H in an analogous way. The graph H consists of the following vertices:
i) for all bits i, the graph H contains h_i copies of the variable node i, each with label i; we refer to this set of copies of variable node i as Y_i;
ii) for all checks j and sets S ∈ E_j, the graph H contains w_{j,S} copies of the check node j, each with label (j, S); we refer to this set of copies as C_{j,S}.
The edges of the graph H are connected according to membership in the sets S. More precisely, consider an edge (i, j) in G. There are h_i copies of node i in H, i.e., |Y_i| = h_i. Now consider the set C_j^i of nodes in H that are copies of check node j labeled with sets that include i; in other words, the set

   C_j^i = ∪_{S ∈ E_j : i ∈ S} C_{j,S}.

By the definition of a pseudocodeword, h_i = Σ_{S ∈ E_j : i ∈ S} w_{j,S}, and so |C_j^i| = |Y_i|. In the pseudocodeword graph H, connect the same-sized node sets Y_i and C_j^i using an arbitrary matching (one-to-one correspondence). This process is repeated for every edge (i, j) in G.
Note that every check node copy in H with label (j, S) appears in exactly |S| of the sets C_j^i, one for each i ∈ S. Therefore, the neighbor set of any node in C_{j,S} consists of exactly one copy of each variable node i ∈ S. Furthermore, every variable node copy in Y_i will be connected to exactly one copy of each check node incident to i. The cost of the pseudocodeword graph H is the sum of the costs of the variable nodes in the graph, and is equal to the cost of the pseudocodeword from which it was derived. Therefore, Theorem 8 holds for pseudocodeword graphs as well.
Fig. 3 gives the graph of the pseudocodeword example given
earlier, and Fig. 4 gives the graph of a different more complex
pseudocodeword.
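The construction of H can be carried out directly from the definition. The sketch below is our own illustration, naming variable copies by tuples (i, t) and check copies by (j, S, t); the single-check example at the end is hypothetical.

```python
def pseudocodeword_graph(h, checks, w):
    """Build the edge list of the graph H of a pseudocodeword (h, w).
    Variable copies are tuples (i, t), t < h[i]; check copies are (j, S, t),
    t < w[j, S].  For each factor-graph edge (i, j), the copies of i are
    matched one-to-one with the copies of j whose label S contains i."""
    edges = []
    for j, nbhd in enumerate(checks):
        for i in nbhd:
            # C_j^i: copies of check j labeled with a set containing bit i
            c_copies = [(j, S, t)
                        for (jj, S), mult in w.items() if jj == j and i in S
                        for t in range(mult)]
            y_copies = [(i, t) for t in range(h[i])]
            # the pseudocodeword identity h_i = sum_{S : i in S} w_{j,S}
            assert len(c_copies) == len(y_copies)
            edges.extend(zip(y_copies, c_copies))  # one arbitrary matching
    return edges

# a single parity check on three bits; h = (2,2,2) with three pair-sets of weight 1
checks = [(0, 1, 2)]
w = {(0, frozenset({0, 1})): 1,
     (0, frozenset({1, 2})): 1,
     (0, frozenset({0, 2})): 1}
edges = pseudocodeword_graph([2, 2, 2], checks, w)
print(len(edges))  # 6: each of the two copies of each bit meets one check copy
```

The `zip` realizes one arbitrary matching between the equal-sized sets Y_i and C_j^i; any bijection yields a valid pseudocodeword graph.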
This graphical characterization of a pseudocodeword is essential for proving our lower bound on the fractional distance.
Additionally, the pseudocodeword graph is helpful in making
connections with other notions of pseudocodewords in the literature. We discuss this further in Section VII.
VI. FRACTIONAL DISTANCE
A classical quantity associated with a code is its distance, which for a linear code is equal to the minimum weight of any nonzero codeword. In this section, we introduce a fractional analog of distance, and use it to prove additional results on the performance of LP decoding. Roughly speaking, the fractional distance is the minimum weight of any nonzero vertex of the projected polytope P; since all codewords are nonzero vertices of P, the fractional distance is a lower bound on the true distance. This fractional distance has connections to the minimum weight of a pseudocodeword, as defined by Wiberg [4], and also studied by Forney et al. [6].

FELDMAN et al.: USING LINEAR PROGRAMMING TO DECODE BINARY LINEAR CODES

Fig. 3. The graph of a pseudocodeword for the (7, 4, 3) Hamming code. In this particular pseudocodeword, there are two copies of node 1, and also two copies of check A.
Theorem 9: For a code with fractional distance d_frac, the LP decoder is successful if at most ⌈d_frac / 2⌉ − 1 bits are flipped by the binary symmetric channel.
Proof: Suppose the LP decoder fails; i.e., the optimal solution f to LCLP has f ≠ 0^n. We know that f must be a vertex of P. Since f ≠ 0^n, we have f ∈ V(P). This implies that Σ_i f_i ≥ d_frac, by the definition of the fractional distance.
Let E ⊆ {1, ..., n} be the set of bits flipped by the channel. Under the BSC, and the all-zeros assumption, we have γ_i = −1 if i ∈ E, and γ_i = +1 if i ∉ E. Therefore, we can write the cost of f as the following:

   Σ_i γ_i f_i = Σ_{i ∉ E} f_i − Σ_{i ∈ E} f_i.   (12)

Since at most ⌈d_frac / 2⌉ − 1 bits are flipped by the channel, and f_i ≤ 1 for all i, we have

   Σ_{i ∈ E} f_i ≤ |E| < d_frac / 2,

and so

   Σ_{i ∉ E} f_i = Σ_i f_i − Σ_{i ∈ E} f_i > d_frac − d_frac / 2 = d_frac / 2,

since Σ_i f_i ≥ d_frac. Therefore, by (12), we have Σ_i γ_i f_i > 0. However, by Theorem 5 and the fact that the decoder failed, the optimal solution f to LCLP must have cost less than or equal to zero; i.e., Σ_i γ_i f_i ≤ 0. This is a contradiction.

Fig. 4. A graph H of the pseudocodeword [0, 1, 0, 1, 0, 2, 3] of the (7, 4, 3) Hamming code. The dotted circles show the original variable nodes i of the factor graph G, which are now sets Y_i of nodes in H. The dotted squares are original check nodes j in G, and contain sets C_{j,S} (shown with dashed lines) for each S ∈ E_j.

Note again the analogy to the classical case: just as exact ML decoding has a performance guarantee in terms of classical distance, Theorem 9 establishes that the LP decoder has a performance guarantee specified by the fractional distance of the code.

A. Definitions and Basic Properties


Since there is a one-to-one correspondence between codewords and integral vertices of P, the (classical) distance of the code is equal to the minimum weight of a nonzero integral vertex in the polytope. However, the relaxed polytope may have additional nonintegral vertices. In particular, our earlier example with the Hamming code involved constructing precisely such a fractional (nonintegral) vertex.
As stated previously, any optimal solution (f, w) to LCLP must be a vertex of Q. However, the objective function for LCLP only affects the variables f; as a consequence, the point f must also be a vertex of the projection P. (In general, not all vertices of Q will be projected to vertices of P.)
Therefore, we use the projected polytope P in our definition of the fractional distance, since its vertices are exactly the settings of f that could be optimal solutions to LCLP. (Using Q would introduce false vertices that would be optimal points only if the problem included costs on the w variables.)
For a point f in P, define the weight of f to be Σ_i f_i, and let V(P) be the set of nonzero vertices of P. We define the fractional distance d_frac of a code to be the minimum weight of any vertex in V(P). Note that this fractional distance is always a lower bound on the classical distance of the code, since every nonzero codeword is contained in V(P). Moreover, the performance of LP decoding is tied to this fractional distance, as we make precise in the following.

B. Computing the Fractional Distance


In contrast to the classical distance, the fractional distance of an LDPC code can be computed efficiently. Since the fractional distance is a lower bound on the real distance, we thus have an efficient algorithm to give a nontrivial lower bound on the distance of an LDPC code.
To compute the fractional distance, we must compute the minimum-weight nonzero vertex of P. We first consider a more general problem: given the facets of a polytope P' over n variables, a specified vertex x̂ of P', and a linear function ℓ(x), find the vertex of P' other than x̂ that minimizes ℓ(x). An efficient algorithm for this problem is the following: let 𝓕 be the set of all facets of P' on which x̂ does not sit. Now for each facet F ∈ 𝓕, intersect P' with F to obtain P'_F, and then optimize ℓ over P'_F. The minimum value of ℓ obtained over all facets in 𝓕 is the minimum of ℓ over all vertices other than x̂. The running time of this algorithm is equal to the time taken by |𝓕| calls to an LP solver.
This algorithm is correct by the following argument. It is well known [23] that a vertex of a polytope of dimension n is uniquely determined by n linearly independent facets of the polytope on which the vertex sits. Using this fact, it is clear that the vertex we are looking for must sit on some facet in 𝓕; otherwise, it would be the same point as x̂. Therefore, at some point in our procedure, each potential vertex is considered. Furthermore, when we intersect P' with a facet in 𝓕 to obtain P'_F, we have that all vertices of P'_F are vertices of P' not equal to x̂. Therefore, the true minimum will be obtained over all facets in 𝓕.

Fig. 5. The average fractional distance d_frac as a function of length for a randomly generated LDPC code, with left degree 3, right degree 4, from an ensemble of Gallager [7].

For our problem, we are interested in the polytope P, and the special vertex 0^n. In order to run the above procedure, we use the small explicit representation of P given by (9) and Theorem 4. The number of facets in this representation of P has an exponential dependence on the check degree of the code. For an LDPC code, the number of facets will be linear in n, so that we can compute the exact fractional distance efficiently. For arbitrary linear codes, we can still compute the minimum-weight nonzero vertex of the polytope from Section III-C, which provides a (possibly weaker) lower bound on the fractional distance. However, this representation (given explicitly in Appendix II) introduces many auxiliary variables, and therefore may have many false vertices with low weight.
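The facet-intersection procedure can be sketched with an off-the-shelf LP solver; here scipy's `linprog` stands in for the generic LP oracle, the polytope P is represented by the odd-set inequalities (13) plus the box facets, and the (7, 4, 3) Hamming check neighborhoods are an assumed labeling.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def facet_rows(checks, n):
    """Inequality description of the projected polytope P: for every check j
    and every odd-sized S within N(j),
        sum_{i in S} x_i - sum_{i in N(j) minus S} x_i <= |S| - 1,
    plus the box facets x_i <= 1.  (x_i >= 0 is handled by the LP bounds.)"""
    rows = []
    for nbhd in checks:
        for k in range(1, len(nbhd) + 1, 2):
            for S in combinations(nbhd, k):
                a = np.zeros(n)
                a[list(S)] = 1.0
                a[[i for i in nbhd if i not in S]] = -1.0
                rows.append((a, k - 1.0))
    for i in range(n):
        a = np.zeros(n)
        a[i] = 1.0
        rows.append((a, 1.0))
    return rows

def fractional_distance(checks, n):
    """Minimum weight over nonzero vertices of P: for each facet on which the
    zero vertex does NOT sit (right-hand side > 0), pin that facet to equality
    and minimize the weight sum(x); return the best value found."""
    rows = facet_rows(checks, n)
    A = np.array([a for a, _ in rows])
    b = np.array([bb for _, bb in rows])
    best = np.inf
    for a_eq, b_eq in rows:
        if b_eq <= 0:          # facet passes through 0^n; skip it
            continue
        res = linprog(np.ones(n), A_ub=A, b_ub=b,
                      A_eq=a_eq.reshape(1, -1), b_eq=[b_eq],
                      bounds=[(0, 1)] * n)
        if res.success:
            best = min(best, res.fun)
    return best

# assumed check neighborhoods for a (7,4,3) Hamming code (illustrative labeling)
checks = [(0, 1, 2, 4), (0, 1, 3, 5), (0, 2, 3, 6)]
print(fractional_distance(checks, 7))
```

Each pinned facet costs one LP solve, so the total work is |𝓕| solver calls, as described above; for this representation the value returned is positive and no larger than the classical distance 3.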
C. Experiments
Fig. 5 gives the average fractional distance of a randomly chosen LDPC factor graph, computed using the algorithm just described. The graph has left degree 3 and right degree 4, and is randomly chosen from an ensemble of Gallager [7]. This data is insufficient to extrapolate the growth rate of the fractional distance; however, it certainly grows nontrivially with the block length. We had conjectured that this growth rate could be made linear in the block length [17]; for the case of graphs with regular degree, this conjecture has since been disproved by Koetter and Vontobel [20].
Fig. 6 gives the fractional distance of the normal realizations of the Reed-Muller codes [31].^1 These codes, well defined for lengths equal to a power of 2, have a classical distance of exactly n/2. The curve in the figure suggests that the fractional distance of these graphs grows polynomially in n. Note that for both these code families, there may be alternate realizations (factor graphs) with better fractional distance.

^1 We thank G. David Forney for suggesting the study of the normal realizations of the Reed-Muller codes.

Fig. 6. The classical versus fractional distance of the normal realizations of the Reed-Muller codes [31]. The classical distance of these codes is exactly n/2. The upper part of the fractional distance curve follows roughly a power of n.

D. The Max-Fractional Distance


In this subsection, we define another notion of fractional distance, which we call the max-fractional distance. This is simply the fractional weight, normalized by the maximum value of f. We can also show that the LP decoder corrects up to half the max-fractional distance. Furthermore, we prove in the next section that the max-fractional distance grows exponentially in the girth of G.
We define the max-fractional distance d_max of the code using polytope P as

   d_max = min_{f ∈ V(P)} (Σ_i f_i) / (max_i f_i).

Using essentially the same proof as for Theorem 9, we obtain the following.
Theorem 10: For a code with max-fractional distance d_max, the LP decoder is successful if at most ⌈d_max / 2⌉ − 1 bits are flipped by the binary-symmetric channel.
The exact relationship between d_frac and d_max is an interesting question. Clearly, d_max ≥ d_frac in general, since max_i f_i is always at most 1. In fact, we know that d_max is at most a constant multiple of d_frac whenever the check degrees are bounded; this follows from the fact that for all f ∈ V(P), there is some i for which f_i is bounded below by a constant depending only on the check degrees. (The proof of this fact comes from simple scaling arguments.) Therefore, for LDPC codes, the two quantities are the same up to a constant factor. We can compute the max-fractional distance efficiently using an algorithm similar to the one used for the fractional distance: we reduce the problem of finding the point with the minimum ratio (Σ_i f_i) / (max_i f_i) to finding the minimum-weight point in a polytope.
E. A Lower Bound Using the Girth
The following theorem asserts that the max-fractional distance is exponential in the girth of . It is analogous to an earlier result of Tanner [32], which provides a similar bound on the
classical distance of a code in terms of the girth of the associated factor graph.


Theorem 11: Let G be a factor graph in which every variable node has degree at least three, and let g be the girth of G. Then the max-fractional distance d_max grows exponentially in g.

This theorem is proved in Appendix IV, and makes heavy use of the combinatorial properties of pseudocodewords. One consequence of Theorem 11 is that the max-fractional distance is at least n^α for some constant α > 0, for any graph whose girth is logarithmic in n. Note that there are many known constructions of such graphs (e.g., [33]). Although Theorem 11 does not yield a bound on the word error rate (WER) for the BSC, it demonstrates that LP decoding can always correct Ω(n^α) errors, for any code defined by a graph with logarithmic girth.

VII. COMPARISON TO ITERATIVE DECODING


In this section, we draw several connections between LP decoding and iterative decoding for several code types and channel
models. We show that many known combinatorial characterizations of decoding success are in fact special cases of our
definition of a pseudocodeword. We discuss stopping sets in
the BEC, cycle codes, tail-biting trellises, the tree-reweighted
max-product algorithm of Wainwright et al. [26], and min-sum
decoding.
At the end of the section, we give some experimental results
comparing LP decoding with the min-sum and sum-product
(BP) algorithms.

A. Stopping Sets in the BEC

In the BEC, bits are not flipped but rather erased. Consequently, for each bit, the decoder receives either 0, 1, or an erasure. If either symbol 0 or 1 is received, then it must be correct. On the other hand, if an erasure is received, there is no information about that bit. It is well known [3] that in the BEC, the iterative BP decoder fails if and only if a stopping set exists among the erased bits. The main result of this section is that stopping sets are the special case of pseudocodewords on the BEC, and so LP decoding exhibits the same property.
We can model the BEC in LCLP with our cost function γ. As in the BSC, γ_i = +1 if the received bit is 0, and γ_i = −1 if the received bit is 1. If bit i is erased, we set γ_i = 0, since we have no information about that bit. Note that under the all-zeros assumption, all the costs are nonnegative, since no bits are flipped. Therefore, Theorem 8 implies that the LP decoder will fail only if there is a nonzero pseudocodeword with zero cost.
Let E be the set of code bits erased by the channel. A subset S ⊆ E is a stopping set if all the checks in the neighborhood N(S) of S have degree at least two with respect to S. In the following statement, we have assumed that both the iterative and the LCLP decoders fail when the answer is ambiguous. For the iterative algorithm, this ambiguity corresponds to the existence of a stopping set; for the LCLP algorithm, it corresponds to a nonzero pseudocodeword with zero cost, and hence multiple optima for the LP.
Theorem 12: Under the BEC, there is a nonzero pseudocodeword with zero cost if and only if there is a stopping set. Therefore, the performance of LP and BP decoding are equivalent for the BEC.
Proof: We first show that if there is a zero-cost pseudocodeword, then there is a stopping set. Let (h, w) be a pseudocodeword, h ≠ 0^n, where Σ_i γ_i h_i = 0. Let S = {i : h_i > 0}. Since all γ_i ≥ 0, we must have γ_i = 0 for all i ∈ S; therefore, S ⊆ E.
Suppose S is not a stopping set; then there is some check node j ∈ N(S) that has only one neighbor i ∈ S. By the definition of a pseudocodeword, we have h_i = Σ_{S' ∈ E_j : i ∈ S'} w_{j,S'}. Since h_i > 0 (by the definition of S), there must be some S' ∈ E_j with i ∈ S' such that w_{j,S'} > 0. Since S' has even cardinality, there must be at least one other code bit i' ∈ S', which is also a neighbor of check j. We have h_{i'} ≥ w_{j,S'} by the definition of a pseudocodeword, and so h_{i'} > 0, implying i' ∈ S. This contradicts the fact that j has only one neighbor in S.
Next we show that if there is a stopping set, then there is a zero-cost pseudocodeword. Let S be a stopping set. Construct a pseudocodeword (h, w) as follows. For all i ∈ S, set h_i = 2; for all i ∉ S, set h_i = 0. Since S ⊆ E, we immediately have Σ_i γ_i h_i = 0. For a check j, let S_j = N(j) ∩ S. For all j where |S_j| is even, set w_{j,S_j} = 2. By the definition of a stopping set, |S_j| ≠ 1, so if |S_j| is odd, then |S_j| ≥ 3. For all j where |S_j| is odd, choose distinct bits i*, a ∈ S_j, and set w_{j, S_j∖{i*}} = 1, w_{j, S_j∖{a}} = 1, and w_{j, {i*, a}} = 1; each of these three sets has even size, and every bit of S_j is contained in exactly two of them. Set all other w_{j,S'} = 0 that we have not set in this process. We then have, for every edge (i, j),

   h_i = Σ_{S' ∈ E_j : i ∈ S'} w_{j,S'},

and therefore (h, w) is a pseudocodeword.
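The constructive direction of the proof can be exercised in code. This sketch is our own illustration; the three-set construction in the odd case is one valid way to give every bit of S_j total weight 2 (not necessarily the paper's exact choice), and the zero-cost claim additionally requires S to lie inside the erased set, where γ_i = 0.

```python
def is_stopping_set(S, checks):
    """Every check with a neighbor in S must have at least two neighbors in S."""
    S = set(S)
    return all(len(S & set(nbhd)) != 1 for nbhd in checks)

def pseudocodeword_from_stopping_set(S, checks, n):
    """Build the pseudocodeword of Theorem 12: h_i = 2 on S, 0 elsewhere."""
    S = set(S)
    h = [2 if i in S else 0 for i in range(n)]
    w = {}
    for j, nbhd in enumerate(checks):
        Sj = sorted(S & set(nbhd))
        if len(Sj) % 2 == 0:
            w[(j, frozenset(Sj))] = 2          # covers each bit of Sj twice
        else:
            # |Sj| >= 3 by the stopping-set property; use three even-sized
            # sets so that each bit of Sj appears in exactly two of them
            istar, a = Sj[0], Sj[1]
            w[(j, frozenset(Sj) - {istar})] = 1
            w[(j, frozenset(Sj) - {a})] = 1
            w[(j, frozenset({istar, a}))] = 1
    return h, w

def consistent(h, checks, w):
    """Check h_i = sum of w[j, S] over S containing i, on every edge (i, j)."""
    return all(h[i] == sum(m for (jj, S), m in w.items() if jj == j and i in S)
               for j, nbhd in enumerate(checks) for i in nbhd)

checks = [(0, 1, 2, 4), (0, 1, 3, 5), (0, 2, 3, 6)]  # assumed Hamming labeling
S = {1, 2, 3}                  # a stopping set: every check meets it twice
assert is_stopping_set(S, checks)
h, w = pseudocodeword_from_stopping_set(S, checks, 7)
print(consistent(h, checks, w))  # True
```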

B. Cycle Codes
A cycle code is a binary linear code described by a factor graph whose variable nodes all have degree 2. In this case, pseudocodewords consist of a collection of cycle-like structures we call promenades [1]. This structure is a closed walk through the graph that is allowed to repeat nodes, and even traverse edges in different directions, as long as it makes no U-turns; i.e., it does not use the same edge twice in a row. Wiberg [4] calls these same structures irreducible closed walks. We may conclude from this connection that iterative and LP decoding have identical performance in the case of cycle codes.
We note that even though cycle codes are poor in general, they are an excellent example of when LP decoding can decode beyond the minimum distance. For cycle codes, the minimum distance is no better than logarithmic. However, we showed [1] that there are cycle codes for which LP decoding has a WER of n^{−α} for any constant α > 0, requiring only that the crossover probability of the channel is bounded by a certain function of the constant α (independent of n).


C. Tail-Biting Trellises
On tail-biting trellises, one can write down a linear program
similar to the one we explored for turbo codes [1] such that pseudocodewords in this LP correspond to those analyzed by Forney
et al. [5]. This linear program is, in fact, an instance of network
flow, and therefore is solvable by a more efficient algorithm than
a generic LP solver. (See [17] for a general treatment of LPs for
trellis-based codes, including turbo-like codes.)
In this case, pseudocodewords correspond to cycles in a directed graph (a circular trellis). All cycles in this graph have length that is a multiple of n; codewords are simple cycles of length exactly n. Forney et al. [5] show that iterative decoding will find the pseudocodeword with minimum weight-per-symbol. Using basic network flow theory, it can be shown that the weight-per-symbol of a pseudocodeword is the same as the cost of the corresponding LP solution. Thus, these two algorithms have identical performance.
We note that to obtain this connection to tail-biting trellises, it is not sufficient simply to write down a factor graph for the code and plug in the polytope Q; this would be a weaker relaxation in general. One has to define a new linear program like the one we used for turbo-like codes [1]. With this setup, the problem reduces directly to min-cost flow.
D. Tree-Reweighted Max-Product
In earlier work [2], we explored the connection between
this LP-based approach applied to turbo codes, and the
tree-reweighted max-product message-passing algorithm developed by Wainwright, Jaakkola, and Willsky [26]. Similar
to the usual max-product (min-sum) algorithm, the algorithm
is based on passing messages between nodes in the factor
graph. It differs from the usual updates in that the messages are suitably reweighted according to the structure of the factor graph.
By drawing a connection to the dual of our linear program, we
showed that whenever this algorithm converges to a codeword,
it must be the ML codeword. Note that the usual min-sum
algorithm does not have such a guarantee.
E. Min-Sum Decoding
The deviation sets defined by Wiberg [4], and further refined by Forney et al. [6], can be compared to pseudocodeword graphs. The computation tree of the iterative min-sum algorithm is a map of the computations that lead to the decoding of a single bit at the root of the tree. This bit will be decoded correctly (assuming the all-zeros word is sent) unless there is a negative-cost locally consistent minimal configuration of the tree that sets this bit to 1. Such a configuration is called a deviation set, or a pseudocodeword.
All deviation sets have a support, which is the set of nodes in the configuration that are set to 1. All such supports D are acyclic graphs of the following form. The nodes of D are nodes from the factor graph, possibly with multiple copies of a node. Furthermore, i) all the leaves of D are variable nodes; ii) each nonleaf variable node i is connected to one copy of each check node in N(i); and iii) each check node has even degree.

Fig. 7. A waterfall-region comparison between the performance of LP decoding and min-sum decoding (with 100 iterations) under the BSC using the same random rate-1/2 LDPC code with length 200, left degree 3, and right degree 6. For each trial, both decoders were tested with the same channel output. The Both Error curve represents the trials where both decoders failed.

As is clear from the definition, deviation sets are quite similar to pseudocodeword graphs; essentially the only difference is that deviation sets are acyclic. In fact, if we removed the nonleaf condition above, the two would be equivalent. In his thesis, Wiberg states:

Since the (graph) is finite, an infinite deviation cannot behave completely irregularly; it must repeat itself somehow.
It appears natural to look for repeatable, or closed
structures (in the graph), with the property that any deviation can be decomposed into such structures. [4]
Our definition of a pseudocodeword is the natural closed
structure within a deviation set. However, an arbitrary deviation
set cannot be decomposed into pseudocodewords, since it may
be irregular near the leaves. Furthermore, as Wiberg points out,
the cost of a deviation set is dominated by the cost near the
leaves, since the number of nodes grows exponentially with the
depth of the tree.
Thus, strictly speaking, min-sum decoding and LP decoding
are incomparable. However, experiments suggest that it is rare
for min-sum decoding to succeed and LP decoding to fail (see
Fig. 7). We also conclude from our experiments that the irregular
unclosed portions of the min-sum computation tree are not
worth considering; they more often hurt the decoder than help it.
F. New Iterative Algorithms and ML Certificates
From the LP Dual
In earlier work [2], we described how the iterative subgradient ascent [34] algorithm can be used to solve the LP dual for
RA codes. Thus, we have an iterative decoder whose error-correcting performance is identical to that of LP decoding in this
case. This technique may also be applied in the general setting
of LDPC codes [17]; thus, we have an iterative algorithm for any
LDPC code with all the performance guarantees of LP decoding.

Fig. 8. A comparison between the performance of LP decoding, min-sum decoding (100 iterations), and BP (100 iterations) under the BSC using the same random rate-1/4 LDPC code with length 200, left degree 3, and right degree 4.

Fig. 10. Error-correcting performance gained by adding a set of (redundant) parity checks to the factor graph. The code is a randomly selected regular LDPC code, with length 40, left degree 3, and right degree 4, from an ensemble of Gallager [7]. The First-Order Decoder is the LP decoder using the polytope Q defined on the original factor graph. The Second-Order Decoder uses the polytope Q defined on the factor graph after adding a set of redundant parity checks; the set consists of all checks that are the sum (mod 2) of two original parity checks.

VIII. TIGHTER RELAXATIONS

Fig. 9. A comparison between the performance of ML decoding, LP decoding, min-sum decoding (100 iterations), and BP (100 iterations) under the BSC using the same random rate-1/4 LDPC code with length 60, left degree 3, and right degree 4. The ML decoder is a mixed-integer programming decoder using the LP relaxation.

Furthermore, we show [17] that LP duality can be used to give


any iterative algorithm the ML certificate property; that is, we
derive conditions under which the output of a message-passing
decoder is provably the ML codeword.
G. Experimental Comparison
We have compared the performance of the LP decoder with the min-sum and sum-product decoders on the BSC. We used a randomly generated rate-1/4 LDPC code with left degree 3 and right degree 4. Fig. 8 shows an error-rate comparison in the waterfall region for a block length of 200. We see that LP decoding performs better than min-sum in this region, but not as well as sum-product.
However, when we compare all three algorithms to ML decoding, it seems that at least on random codes, all three have
similar performance. This is shown in Fig. 9. In fact, we see
that LP decoding slightly outperforms sum-product at very low
noise levels. It would be interesting to see whether this is a general phenomenon, and whether it can be explained analytically.

It is important to observe that LCLP has been defined with respect to a specific factor graph. Since a given code has many such representations, there are many possible LP-based relaxations, and some may be better than others. Of particular significance is the fact that adding redundant parity checks to the factor graph, though it does not affect the code, provides new constraints for the LP relaxation, and will in general strengthen it. For example, returning to the (7, 4, 3) Hamming code of Fig. 1, suppose we add a new check node formed from two of the original checks. This parity check is redundant for the code, since it is simply the mod-two sum of those two checks. However, the linear constraints added by this check tighten the relaxation; in fact, they render our example pseudocodeword infeasible. Whereas redundant constraints may degrade the performance of BP decoding (due to the creation of small cycles), adding new constraints can only improve LP performance.
As an example, Fig. 10 shows the performance improvement achieved by adding all second-order parity checks to a factor graph G. By second-order we mean all parity checks that are the sum of two original parity checks. A natural question is whether adding all redundant parity checks results in the codeword polytope itself. This turns out not to be the case; the dual of the Hamming code provides one counterexample.
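Generating the second-order checks used in Fig. 10 is a small computation over the rows of a parity-check matrix; the Hamming matrix below is one conventional labeling, used only for illustration.

```python
from itertools import combinations

def second_order_checks(H):
    """All parity checks that are the mod-2 sum of two rows of H; adding them
    leaves the code unchanged but tightens the LP relaxation."""
    extra = []
    for r1, r2 in combinations(H, 2):
        row = [a ^ b for a, b in zip(r1, r2)]
        if any(row):              # skip the all-zeros (vacuous) check
            extra.append(row)
    return extra

# a parity-check matrix for a (7,4,3) Hamming code (one conventional labeling)
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
print(len(second_order_checks(H)))  # 3: one new check per pair of rows
```

Each new row is appended to the factor graph before building the relaxation, which adds the corresponding local-codeword constraints to the LP.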
In addition to redundant parity checks, there are various generic ways in which an LP relaxation can be strengthened (e.g., [35], [36]). Such lifting techniques provide a nested sequence of relaxations increasing in both tightness and complexity, the last of which is equivalent to the codeword polytope (albeit with exponential complexity). Therefore, we obtain a sequence of decoders, increasing in both performance and complexity, the last of which is an (intractable) ML decoder. It would be interesting to analyze the rate of performance improvement along this sequence.

Fig. 11. The WER of the lift-and-project relaxation compared with LP decoding and ML decoding.

Fig. 11 shows the performance gained by one application of the lift-and-project [35] method on a random LDPC code. Another interesting question is how complex a decoder is needed in order to surpass the performance of BP.
Finally, the fractional distance of a code, as defined here, is
also a function of the factor graph representation of the code.
Fractional distance yields a lower bound on the true distance,
and the quality of this bound could also be improved by adding
redundant constraints, or other methods of tightening the LP.

which corresponds to a rescaled solution of the LP relaxation. Our definition of pseudocodeword unifies previous work on iterative decoding (e.g., [3], [4], [5], [6]). We also introduced the fractional distance of a code, a quantity that shares the worst-case error-correcting guarantees of the classical notion, but comes with an efficient algorithm to realize those guarantees.
There are a number of open questions and future directions
suggested by the work presented in this paper, some of which
we detail here. In an earlier version of this work [15], we had
suggested using graph expansion to improve the performance
bounds given here. This has been accomplished [18] to some
degree, and we now know that LP decoding can correct a constant fraction of errors. However, there is still work to be done in order to improve this constant to the level of the best known bounds (e.g., meeting the Zyablov bound or beyond). We also know that LP decoding can achieve the capacity of many commonly considered
channels [19]. It would be interesting to see if these methods
can be extended to achieve capacity without an exponential dependence on the gap to capacity (all known capacity-achieving
decoders also have this dependence).
Since turbo codes and turbo-like codes have much more efficient encoders than LDPC codes, it would be interesting to see
if LP decoding can be used to obtain good performance bounds
in this setting as well. In previous work on RA codes [1], we
were able to prove a bound on the error rate of LP decoding
stronger than that implied by the minimum distance. Analogous LP decoders for general turbo-like codes have also been
given [17], but it remains to provide satisfying analysis of their
performance.

A. ML Decoding Using Integer Programming


Another interesting application of LP decoding is to use the polytope Q to perform true ML decoding. An integer program (IP) is an optimization problem that allows integer constraints; that is, we may force variables to be integers. If we add the constraints f_i ∈ {0, 1} to our linear program, then we get an exact formulation of ML decoding. In general, integer programming is NP-hard, but there are various methods for solving an IP that far outperform the naive exhaustive search routines for ML decoding. Using the program CPLEX [37], we were able to perform ML decoding on LDPC codes with moderate block lengths (up to about 100) in a reasonable amount of time. Fig. 9 includes an error curve for ML decoding an LDPC code with a block length of 60. Each trial took no more than a few seconds (and often much less) on a Pentium IV (2-GHz) processor.
Drawing this curve allows us to see the gap between various
suboptimal algorithms and the optimal ML decoder. This gap
further motivates the search for tighter LP relaxations to approach ML decoding. The running time of this decoder becomes
prohibitive at longer block lengths; however, ML decoding at
small block lengths can be very useful in evaluating algorithms,
and determining whether decoding errors are the fault of the decoder or the code.
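To make the formulation concrete, the following sketch carries out exact ML decoding by minimizing the LP cost function over the integer (codeword) solutions of a small code. The (7,4) Hamming code and the BSC cost vector here are hypothetical stand-ins for illustration; the experiments described above used LDPC codes and the CPLEX solver rather than exhaustive search.

```python
import itertools
import math

# Hypothetical running example: the (7,4) Hamming code.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
N = 7

def is_codeword(f):
    """True iff H f = 0 (mod 2)."""
    return all(sum(h * x for h, x in zip(row, f)) % 2 == 0 for row in H)

CODEWORDS = [f for f in itertools.product((0, 1), repeat=N) if is_codeword(f)]

def bsc_cost(received, p=0.1):
    """Cost vector gamma_i = log(Pr[y_i | 0] / Pr[y_i | 1]) for a BSC with
    crossover probability p."""
    llr = math.log((1 - p) / p)
    return [llr if y == 0 else -llr for y in received]

def ml_decode(gamma):
    """Exact ML decoding: minimize the linear cost sum_i gamma_i f_i over the
    integer (codeword) solutions, the same objective the IP formulation
    optimizes; here solved by exhaustive search instead of an IP solver."""
    return min(CODEWORDS, key=lambda f: sum(g * x for g, x in zip(gamma, f)))
```

For instance, flipping one bit of a codeword and decoding the resulting received word recovers the codeword, since the (7,4) Hamming code corrects any single error.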
IX. DISCUSSION
We have described an LP-based decoding method, and proved
a number of results on its error-correcting performance. Central to this characterization is the notion of a pseudocodeword.

APPENDIX I
PROVING THEOREM 4
Recall that $Q_j$ is the set of points $f$ such that
$0 \le f_i \le 1$ for all $i$, and for all $S \subseteq N(j)$, $|S|$ odd,
$$\sum_{i \in S} f_i - \sum_{i \in N(j) \setminus S} f_i \le |S| - 1. \qquad (13)$$
To prove Theorem 4, it remains to show the following.

Theorem 13 ([29]): The polytope $Q_j$ equals the set $\{f : \exists w$ s.t. $(f, w)$ satisfies (4)--(7) for check $j\}$.

Proof: For all $i \notin N(j)$, the variable $f_i$ is unconstrained
in both polytopes (aside from the constraints $0 \le f_i \le 1$);
thus, we may ignore those indices, and assume that $N(j) = \{1, \ldots, n\}$.
(We henceforth use $Q$ to denote $Q_j$.)

Let $f$ be a point of the projection; i.e., $f$ extends to some $(f, w)$
satisfying (4)--(7). By the constraints (6) and (7), the projection is the
convex hull of the incidence vectors of even-sized sets $S \subseteq N(j)$.
Since all such vectors satisfy the constraints (13) for check node
$j$, $f$ must also satisfy these constraints. Therefore, $f \in Q$.

For the other direction, suppose some point $f \in Q$ is not
contained in the projection. Then some facet $F$ of the projection cuts $f$ (makes it
infeasible). Since $0 \le f_i \le 1$ for all $i$, it must be the case
that $F$ passes through the hypercube $[0,1]^n$, and so it must cut
off some vertex of the hypercube; i.e., some $x \in \{0,1\}^n$. Since
all incidence vectors of even-sized sets are feasible for $F$, the
vertex $x$ must be the incidence vector for some odd-sized set $S$.

FELDMAN et al.: USING LINEAR PROGRAMMING TO DECODE BINARY LINEAR CODES

For a particular $x \in \{0,1\}^n$, we specify the facet $F$ in terms of the
variables $f_i$, using an equation
$$\sum_{i=1}^{n} a_i f_i = b.$$
Since $x$ is infeasible for $F$, it must be that $\sum_i a_i x_i > b$.

For some $i^*$, let $x^{(i^*)}$ denote the vector $x$ with bit $i^*$ flipped;
i.e., $x^{(i^*)}_{i^*} = 1 - x_{i^*}$ and $x^{(i^*)}_i = x_i$ for all $i \ne i^*$. Since $x$ has odd
parity, we have that for all $i^*$, $x^{(i^*)}$ has even parity, so
$x^{(i^*)}$ is not cut by $F$. This implies that for all $i^*$,
$$\sum_i a_i x^{(i^*)}_i \le b < \sum_i a_i x_i.$$
Since flipping bit $i^*$ changes the left-hand side by exactly
$a_{i^*}(1 - 2x_{i^*})$, we may conclude that $a_{i^*} > 0$ whenever
$x_{i^*} = 1$, that $a_{i^*} < 0$ whenever $x_{i^*} = 0$, and in particular
that $|a_{i^*}| \ge \sum_i a_i x_i - b > 0$ for all $i^*$.

Except for the case $n = 2$, the projected polytope
is full-dimensional. For example, the set of points with exactly two (cyclically) consecutive $1$s is a full-dimensional set. (For a full proof,
see [29].) Therefore, $F$ must pass through $n$ affinely independent vertices of the projection; i.e.,
it must pass through at least $n$ even-parity binary vectors. This
claim is still true for the case $n = 2$, since both faces in this case
pass through both points.

We claim that those vectors must be the points $x^{(1)}, \ldots, x^{(n)}$.
Suppose this is not the case. Then some vertex $y$ of the projection
is on the facet $F$, and $y$ differs from $x$ in more than one place.
Suppose without loss of generality (w.l.o.g.) that $y_1 \ne x_1$ and
$y_2 \ne x_2$. Each coordinate $i$ in which $y$ differs from $x$ changes
the sum by $a_i(1 - 2x_i) = -|a_i|$, and so
$$\sum_i a_i y_i \le \sum_i a_i x_i - |a_1| - |a_2| = \sum_i a_i x^{(1)}_i - |a_2| \le b - |a_2| < b.$$
This contradicts the fact that $y$ is on $F$, where $\sum_i a_i y_i = b$.

Thus, $F$ passes through the vertices $x^{(1)}, \ldots, x^{(n)}$. It is
not hard to see that $F$ is then exactly the odd-set constraint (13) corresponding to the set $S$ for which $x$ is the incidence vector.
Since $F$ cuts $f$, and this constraint (13) is part of the definition of $Q$, we have $f \notin Q$, a
contradiction.
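As a sanity check on Theorem 13, the following sketch verifies by brute force that a 0-1 vector satisfies every odd-set inequality of the form (13), taken over a single check node whose neighborhood is the whole vector, exactly when it has even parity; the function name and setup are ours, for illustration only.

```python
import itertools

def satisfies_odd_set_constraints(f):
    """Check the odd-set inequalities (13) for one check node whose
    neighborhood is every index of f: for each odd-sized subset S,
        sum_{i in S} f_i - sum_{i not in S} f_i <= |S| - 1.
    """
    n = len(f)
    for r in range(1, n + 1, 2):                 # odd subset sizes only
        for S in itertools.combinations(range(n), r):
            inside = sum(f[i] for i in S)
            outside = sum(f) - inside            # mass outside S
            if inside - outside > r - 1:
                return False
    return True
```

Running it over all binary vectors of small lengths confirms that the constraints (13) cut off precisely the odd-weight hypercube vertices, as the proof above requires.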
APPENDIX II
HIGH-DENSITY POLYTOPE REPRESENTATION

In this appendix, we give a representation of the polytope for use in LCLP
with $O(n + \sum_j |N(j)|^2)$ variables and constraints. This polytope provides an efficient way to perform LP
decoding for any binary linear code. This polytope was derived
from the parity polytope of Yannakakis [30].

For a check node $j$, let $K_j$ be the set of even numbers between $0$ and
$|N(j)|$. Our new representation has three sets of variables.

For all $i$, we have a variable $f_i$, where $0 \le f_i \le 1$.
This variable represents the code bit $i$, as before.

For all $j$ and $k \in K_j$, we have a variable $\alpha_{j,k}$,
$0 \le \alpha_{j,k} \le 1$. This variable indicates the contribution
of weight-$k$ local codewords.

For all $i \in N(j)$, $j$, and $k \in K_j$, we have a variable $\alpha_{i,j,k}$,
$0 \le \alpha_{i,j,k} \le 1$, indicating the portion of $f_i$
locally assigned to local codewords of weight $k$.

Using these variables, we have the following constraint set.
For all checks $j$, bits $i \in N(j)$, and $k \in K_j$:
$$f_i = \sum_{k \in K_j} \alpha_{i,j,k} \qquad (14)$$
$$\sum_{k \in K_j} \alpha_{j,k} = 1 \qquad (15)$$
$$\sum_{i \in N(j)} \alpha_{i,j,k} = k\,\alpha_{j,k} \qquad (16)$$
$$\alpha_{j,k} \ge 0 \qquad (17)$$
$$\alpha_{i,j,k} \ge 0 \qquad (18)$$
$$\alpha_{i,j,k} \le \alpha_{j,k}. \qquad (19)$$

Let $Q'$ be the set of points $(f, \alpha)$ such that the above
constraints hold. This polytope has only $O(|N(j)|^2)$
variables per check node $j$, plus the $n$ variables $f_i$, for a total of
$O(n + \sum_j |N(j)|^2)$ variables. The number of constraints is at
most $O(|N(j)|^2)$ per check node. In total, this representation has at
most $O(n + \sum_j |N(j)|^2)$ variables and constraints. We must now show that
optimizing over $Q'$ is equivalent to optimizing over the original polytope. Since
the cost function only affects the $f$ variables, it suffices to
show that the two polytopes have the same projection onto the $f$
variables. Before proving this, we need the following fact.
Lemma 14: Let $y_1, \ldots, y_n$, $M$, and $k$ all be nonnegative integers,
where $y_i \le M$ for all $i$ and $\sum_i y_i = kM$. Then
$(y_1, \ldots, y_n)$ can be expressed as the sum of $M$ sets of size $k$. Specifically,
there exists a setting of the variables $S_1, \ldots, S_M$
to subsets of $\{1, \ldots, n\}$ such that $|S_m| = k$ for all $m$, and for all
$i$, $y_i = |\{m : i \in S_m\}|$.

Proof: By induction on $M$.$^2$ The base case $M = 1$ is
simple; all $y_i$ are equal to either $0$ or $1$, and so exactly $k$ of them
are equal to $1$. Set $S_1 = \{i : y_i = 1\}$.

For the induction step, assume w.l.o.g. that $y_1 \ge y_2 \ge \cdots \ge y_n$. Set
$y'_i = y_i - 1$ if $i \le k$, and $y'_i = y_i$ otherwise. The fact that
$\sum_i y_i = kM$ and $y_i \le M$ for all $i$ implies that at most $k$ of the
$y_i$ are equal to $M$ and at least $k$ of the $y_i$ are nonzero; therefore,
$0 \le y'_i \le M - 1$ for all $i$. We also have
$$\sum_i y'_i = k(M - 1).$$
Therefore, by induction, $(y'_1, \ldots, y'_n)$ can be expressed as the sum of
$M - 1$ sets $S_1, \ldots, S_{M-1}$, where each $S_m$ has size $k$. Set $S_M = \{1, \ldots, k\}$.
This setting of $S_1, \ldots, S_M$ expresses $(y_1, \ldots, y_n)$.
Proposition 15: The set $\{f : \exists \alpha$ s.t. $(f, \alpha) \in Q'\}$ is
equal to the set $\{f : \exists w$ s.t. $(f, w)$ satisfies (4)--(7)$\}$. Therefore,
optimizing over $Q'$ is equivalent to optimizing over the original polytope.

Proof: Suppose $(f, w)$ satisfies (4)--(7). For all $j$ and $k \in K_j$, set
$$\alpha_{j,k} = \sum_{S :\, |S| = k} w_{j,S}. \qquad (20)$$

$^2$We thank Ryan O'Donnell for showing us this proof.


Similarly, for all $i \in N(j)$ and $k \in K_j$, set
$$\alpha_{i,j,k} = \sum_{S \ni i :\, |S| = k} w_{j,S}. \qquad (21)$$

It is clear that the constraints (17)--(19) are satisfied by this setting. Constraint (14) is implied by (7) and (21). Constraint (15)
is implied by (6) and (20). Finally, we have, for all $j$ and $k$,
$$\sum_{i \in N(j)} \alpha_{i,j,k} = \sum_{i \in N(j)} \sum_{S \ni i :\, |S| = k} w_{j,S} = k \sum_{|S| = k} w_{j,S} = k\,\alpha_{j,k}$$
(by (21) and (20)), giving constraint (16).

Now suppose $(f, \alpha)$ is a vertex of the polytope $Q'$, and so
all variables are rational. For all $j$ and $k \in K_j$ with $\alpha_{j,k} > 0$, consider the set
$$Y_{j,k} = \{\alpha_{i,j,k} / \alpha_{j,k} : i \in N(j)\}.$$
By (19), all members of $Y_{j,k}$ are between $0$ and $1$. Let $1/M$ be a
common divisor of the numbers in $Y_{j,k}$ such that $M$ is an integer, and let
$$y_i = M \alpha_{i,j,k} / \alpha_{j,k} \quad \text{for all } i \in N(j).$$

The set $\{y_i\}$ consists of integers between $0$ and $M$. By (16), we
have that the sum of the elements in $\{y_i\}$ is equal to $kM$. So, by
Lemma 14, $\{y_i\}$ can be expressed as the sum of $M$ sets of
size $k$. Set the variables $S_1, \ldots, S_M$ according to
Lemma 14. Now, for each $k$ and each even-sized set $S$ with $|S| = k$, set
$$w_{j,S} = \frac{\alpha_{j,k}}{M}\,\bigl|\{m : S_m = S\}\bigr|,$$
using the decomposition obtained for that value of $k$.
We immediately satisfy (5). By Lemma 14 we get
$$\sum_{S \ni i :\, |S| = k} w_{j,S} = \alpha_{i,j,k} \qquad (22)$$
and
$$\sum_{|S| = k} w_{j,S} = \alpha_{j,k}. \qquad (23)$$
By (14), we have
$$f_i = \sum_{k} \alpha_{i,j,k} = \sum_{k} \sum_{S \ni i :\, |S| = k} w_{j,S} = \sum_{S \ni i} w_{j,S} \quad \text{(by (22))}$$
giving (7). By (15), we have
$$1 = \sum_k \alpha_{j,k} = \sum_k \sum_{|S| = k} w_{j,S} = \sum_S w_{j,S} \quad \text{(by (23))}$$
giving (6). Since this construction works for all vertices of $Q'$,
the projection of any point in $Q'$ onto the $f$ variables must be
in $\{f : \exists w$ s.t. $(f, w)$ satisfies (4)--(7)$\}$.

APPENDIX III
PROVING THEOREM 6

In this appendix, we show that the all-zeros assumption is
valid when analyzing LP decoders defined on factor graphs.
Specifically, we prove the following theorem.

Theorem 6: The probability that the LP decoder fails is independent of the codeword that was transmitted.

Proof: Recall that $\Pr[\mathrm{err} \mid y]$ is the probability that the
LP decoder makes an error, given that $y$ was transmitted. For an
arbitrary transmitted codeword $y$, we need to show that $\Pr[\mathrm{err} \mid y] = \Pr[\mathrm{err} \mid 0^n]$.

Define $B^y$ to be the set of received words that
would cause decoding failure, assuming $y$ was transmitted. By
Theorem 5, the failure probability is the total probability of $B^y$.
Note that in the above, the cost vector $\gamma$ is a function of the
received word. Rewriting (11), we have for all codewords $y$
$$\Pr[\mathrm{err} \mid y] = \sum_{\tilde r \in B^y} \Pr[\tilde r \mid y]. \qquad (24)$$
Applying this to the codeword $0^n$, we get
$$\Pr[\mathrm{err} \mid 0^n] = \sum_{r \in B^{0^n}} \Pr[r \mid 0^n]. \qquad (25)$$

We will show that the space of possible received vectors
can be partitioned into pairs $(\tilde r, r)$ such that
$$\Pr[\tilde r \mid y] = \Pr[r \mid 0^n]$$
and $\tilde r \in B^y$ if and only if $r \in B^{0^n}$. This, along with (24)
and (25), gives $\Pr[\mathrm{err} \mid y] = \Pr[\mathrm{err} \mid 0^n]$.

The partition is performed according to the symmetry of the
channel, as follows: fix some received vector $\tilde r$, and define $r$ by
$r_i = \tilde r_i$ if $y_i = 0$, and $r_i = \tilde r_i^*$ if $y_i = 1$, where $\tilde r_i^*$ is the
symmetric symbol to $\tilde r_i$ in the channel. Note that this operation
is its own inverse and therefore gives a valid partition of the space of received vectors into
pairs.

First, we show that $\Pr[\tilde r \mid y] = \Pr[r \mid 0^n]$. From the
channel being memoryless, we have
$$\Pr[\tilde r \mid y] = \prod_i \Pr[\tilde r_i \mid y_i] \qquad (26)$$
$$\prod_i \Pr[\tilde r_i \mid y_i] = \prod_i \Pr[r_i \mid 0] \qquad (27)$$
$$\prod_i \Pr[r_i \mid 0] = \Pr[r \mid 0^n]. \qquad (28)$$
Equations (26) and (28) follow from the definition of a memoryless channel, and
(27) follows from the definition of $r$ and the symmetry of the channel.


Now it remains to show that $\tilde r \in B^y$ if and only if
$r \in B^{0^n}$. Let $\tilde\gamma$ be the cost vector when $\tilde r$ is received, and let
$\gamma$ be the cost vector when $r$ is received, as defined in (3).
Suppose $y_i = 0$. Then $r_i = \tilde r_i$, and so $\gamma_i = \tilde\gamma_i$. Now
suppose $y_i = 1$; then $r_i = \tilde r_i^*$, and so
$$\gamma_i = \ln\!\left(\frac{\Pr[r_i \mid 0]}{\Pr[r_i \mid 1]}\right) = \ln\!\left(\frac{\Pr[\tilde r_i \mid 1]}{\Pr[\tilde r_i \mid 0]}\right) = -\tilde\gamma_i. \qquad (29)$$
Equation (29) follows from the symmetry of the channel. We
conclude that
$$\gamma_i = \tilde\gamma_i \ \text{ if } y_i = 0, \qquad \gamma_i = -\tilde\gamma_i \ \text{ if } y_i = 1. \qquad (30)$$

The following lemma shows a correspondence between the
points of the relaxation under cost function $\tilde\gamma$, and the points of the relaxation under
cost function $\gamma$.

Lemma 16: Fix some codeword $y$. For every feasible $(f, w)$, $f \ne 0^n$, there is some feasible $(f', w')$,
$f' \ne y$, such that
$$\sum_i \tilde\gamma_i f'_i = \sum_i \tilde\gamma_i y_i + \sum_i \gamma_i f_i.$$
Furthermore, for every feasible $(f', w')$, $f' \ne y$, there is some
feasible $(f, w)$, $f \ne 0^n$, for which the same equation holds.

We prove this lemma later in this section. Now suppose
$r \in B^{0^n}$, and so by the definition of $B^{0^n}$ there is some feasible $(f, w)$,
$f \ne 0^n$, where $\sum_i \gamma_i f_i \le 0$.
By Lemma 16, there is some feasible $(f', w')$, $f' \ne y$, such that
$$\sum_i \tilde\gamma_i f'_i = \sum_i \tilde\gamma_i y_i + \sum_i \gamma_i f_i \le \sum_i \tilde\gamma_i y_i.$$
Therefore, $\tilde r \in B^y$. A symmetric argument (using the other
half of the lemma) shows that if $\tilde r \in B^y$ then $r \in B^{0^n}$.

Before proving Lemma 16, we need to define the notion of a
relative solution, and prove results about its feasibility and
cost. For two sets $S$ and $S'$, let $S \oplus S'$ denote the symmetric
difference of $S$ and $S'$, i.e., $S \oplus S' = (S \cup S') \setminus (S \cap S')$. Let
$(f^y, w^y)$ be the point in LCLP corresponding to the codeword $y$
sent over the channel. For a particular feasible solution $(f, w)$
to LCLP, set $(f', w')$ to be the relative solution with respect to $y$
as follows: For all bits $i$, set $f'_i = |f_i - y_i|$. For all
checks $j$, let $S_j$ be the even-sized set for which
$w^y_{j, S_j} = 1$. For all even-sized $S \subseteq N(j)$, set $w'_{j,S} = w_{j,\, S \oplus S_j}$.

Note that for a fixed $y$, the operation of making a relative solution is its own inverse; i.e., the relative solution to $(f', w')$ is
$(f, w)$.

Lemma 17: For a feasible solution $(f, w)$ to LCLP, the relative solution $(f', w')$ with respect to $y$ is also a feasible
solution to LCLP.

Proof: First consider the bounds on the variables (see (4)
and (5)). These are satisfied by the definition of $(f', w')$, and the
fact that both $(f, w)$ and $(f^y, w^y)$ are feasible. Now consider the
distribution constraints (6). From the feasibility of $(f, w)$ and
the definition of $(f', w')$ we have, for all checks $j$,
$$\sum_S w'_{j,S} = \sum_S w_{j,\, S \oplus S_j} = \sum_S w_{j,S} = 1 \qquad (31)$$
where $S_j$ is defined as in the definition of $(f', w')$. Note that
$S \mapsto S \oplus S_j$ is a bijection on the even-sized subsets of $N(j)$,
so we get $\sum_S w'_{j,S} = 1$, satisfying the distribution
constraints.

It remains to show that $(f', w')$ satisfies the consistency constraints (7). In the following, we assume that sets $S$ are contained within the appropriate set $N(j)$, which will be clear from
context. For all edges $(i, j)$ in the factor graph, we have
$$\sum_{S \ni i} w'_{j,S} = \sum_{S \ni i} w_{j,\, S \oplus S_j}. \qquad (32)$$

Case 1: $y_i = 0$, so $i \notin S_j$. In this case, from (32) we have
$$\sum_{S \ni i} w'_{j,S} = \sum_{S' \ni i} w_{j,S'} = f_i = f'_i.$$
The middle step follows from the fact that $i \in S$ if and only if
$i \in S \oplus S_j$, as long as $i \notin S_j$.

Case 2: $y_i = 1$, so $i \in S_j$. From (32), we have
$$\sum_{S \ni i} w'_{j,S} = \sum_{S' \not\ni i} w_{j,S'} = 1 - \sum_{S' \ni i} w_{j,S'} \quad \text{(by (31))} \qquad (33)$$
The first step follows from the fact that $i \in S$ if and only if
$i \notin S \oplus S_j$, as long as $i \in S_j$. Finally, from the distribution constraints on
$(f, w)$, we have $\sum_{S' \ni i} w_{j,S'} = f_i$, and therefore, by (33),
$\sum_{S \ni i} w'_{j,S} = 1 - f_i = f'_i$.

Lemma 18: Given a feasible point $(f, w)$ and its relative solution $(f', w')$, we have
$$\sum_i \tilde\gamma_i f'_i = \sum_i \tilde\gamma_i y_i + \sum_i \gamma_i f_i.$$
Proof: From the definition of $(f', w')$, we have
$$\sum_i \tilde\gamma_i f'_i = \sum_{i : y_i = 0} \tilde\gamma_i f_i + \sum_{i : y_i = 1} \tilde\gamma_i (1 - f_i).$$


Using (30), we get
$$\sum_i \tilde\gamma_i f'_i = \sum_{i : y_i = 0} \gamma_i f_i + \sum_{i : y_i = 1} \tilde\gamma_i + \sum_{i : y_i = 1} \gamma_i f_i = \sum_{i : y_i = 1} \tilde\gamma_i + \sum_i \gamma_i f_i.$$
The lemma follows from the fact that $\sum_{i : y_i = 1} \tilde\gamma_i = \sum_i \tilde\gamma_i y_i$.

We are now ready to prove Lemma 16, which completes the
proof of Theorem 6. We restate the lemma here for reference.

Lemma 16: Fix some codeword $y$. For every feasible $(f, w)$, $f \ne 0^n$, there is some feasible $(f', w')$,
$f' \ne y$, such that
$$\sum_i \tilde\gamma_i f'_i = \sum_i \tilde\gamma_i y_i + \sum_i \gamma_i f_i.$$
Furthermore, for every feasible $(f', w')$, $f' \ne y$, there is some
feasible $(f, w)$, $f \ne 0^n$, for which the same equation holds.

Proof: Consider some feasible $(f, w)$, $f \ne 0^n$, and let $(f', w')$ be
the relative solution with respect to $y$. By Lemma 17,
$(f', w')$ is feasible. By definition, $f' \ne y$, since $f \ne 0^n$. The
equation then holds by Lemma 18.

For the second part of the lemma, consider some feasible $(f', w')$,
$f' \ne y$, and let $(f, w)$ be the relative solution with respect to $y$. By
Lemma 17, $(f, w)$ is feasible. By definition, $f \ne 0^n$, since $f' \ne y$.
Because the operation of making a relative solution is its own
inverse, the relative solution to $(f, w)$ is $(f', w')$. Therefore, by
Lemma 18, the equation again holds.
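The pairing argument of this appendix can be checked exhaustively on a toy instance: for exact nearest-codeword decoding over a BSC, with failure declared whenever the transmitted word is not the unique minimizer (the failure criterion used for the LP decoder), the word error probability comes out identical for every codeword. The small [4,2] code below is a hypothetical example, and the decoder is ML rather than LP:

```python
import itertools
from fractions import Fraction

# Hypothetical [4,2] binary linear code: all GF(2) combinations of the rows
# of a small generator matrix G.
G = [(1, 0, 1, 1), (0, 1, 1, 0)]
N = 4
CODE = sorted({tuple((a * u + b * v) % 2 for u, v in zip(*G))
               for a, b in itertools.product((0, 1), repeat=2)})

def word_error_prob(c, p):
    """Exact probability, over a BSC with crossover probability p, that the
    transmitted codeword c is NOT the unique nearest codeword to the
    received word, computed by enumerating all 2^N received words."""
    err = Fraction(0)
    for y in itertools.product((0, 1), repeat=N):
        dists = [sum(a != b for a, b in zip(cw, y)) for cw in CODE]
        best = min(dists)
        success = dists[CODE.index(c)] == best and dists.count(best) == 1
        if not success:
            d = sum(a != b for a, b in zip(c, y))
            err += p ** d * (1 - p) ** (N - d)
    return err
```

Computing `word_error_prob` for each codeword in `CODE` returns the same exact rational number, as Theorem 6 predicts for any symmetric channel.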

APPENDIX IV
PROVING THEOREM 11

Before proving Theorem 11, we will prove a few useful facts
about pseudocodewords and pseudocodeword graphs. For all
the theorems in this section, let $G$ be a factor graph with all
variable nodes having degree at least $d_v$, where $d_v \ge 3$, and all
check nodes having degree at least $2$. Let $g$ be the girth of $G$. Let $H$ be the graph of some arbitrary
nonzero pseudocodeword $h$ of $G$.

We define a promenade to be a path $P = (p_1, \ldots, p_\ell)$
in $H$ that may repeat nodes and edges, but takes no U-turns; i.e.,
$p_{t+1} \ne p_{t-1}$ for all $t$, $1 < t < \ell$. We will also use $P$ to
represent the set of nodes on the path (the particular use will
be clear from context). Note that each $p_t$ could be a variable or a
check node. These paths are similar to the notion of promenade
in [1], and to the irreducible closed walk of Wiberg [4]. A simple
path of a graph is one that does not repeat nodes.

Recall that the nodes of $H$ are copies of variable and
check nodes of the underlying factor graph $G$. For a node $p$ of $H$, let
$\bar p$ be the corresponding node in $G$; i.e., $\bar p = i$ if $p$
is a copy of the variable node $i$, and $\bar p = j$ if $p$ is a copy of the check node $j$.

Claim 19: For all promenades $P$ of length less than the girth
$g$ of $G$, $P$ is a simple path in $H$, and $\bar P = (\bar p_1, \ldots, \bar p_\ell)$ also represents a simple path
in $G$.

Proof: First note that $\bar P$ is a valid path: by construction, if there is an edge $(p, p')$
in $H$, there must be an edge $(\bar p, \bar p')$ in $G$. If $\bar P$ is simple, then $P$ must be
simple, so we only need to show that $\bar P$ is simple. This is
true since the length of $\bar P$ is less than the girth of the graph.

For the remainder of this appendix, fix a promenade $P$ of length $g/2$;
such a path is simple by Claim 19. Note that $g$ is even, since $G$ is
bipartite. Let $r = \lfloor g/4 \rfloor - 1$. For each node $p_t$ of $P$, let $T_t$ be the set of nodes in $H$
within distance $r$ of $p_t$; i.e., $T_t$ is the set of nodes
with a path in $H$ of length at most $r$ from $p_t$.

Claim 20: The subgraph induced by the node set $T_t$ is a tree.

Proof: Suppose this is not the case. Then, for some node
$v$ in $T_t$, there are at least two different paths from $p_t$ to $v$,
each with length at most $r$. This implies a cycle in $H$
of length less than $g$; a contradiction to Claim 19.

Claim 21: The node subsets $T_t$ in $H$ are all mutually disjoint.

Proof: Suppose this is not the case; then, for some $t \ne t'$,
$T_t$ and $T_{t'}$ share at least one vertex. Let $v$ be the vertex in $T_t$
closest to the root $p_t$ that also appears in $T_{t'}$. Now consider
the promenade $P'$ that goes from $p_t$ to $v$, from $v$ to $p_{t'}$, and
from $p_{t'}$ back to $p_t$ along $P$, where the
subpath from $p_t$ to $v$ is the unique such path in the tree $T_t$,
and the subpath from $v$ to $p_{t'}$ is the unique such path in the
tree $T_{t'}$. We must show that $P'$ has no U-turns. The subpaths
within the trees are simple, so we must
show only that no U-turn occurs at $v$. Since we chose $v$ to be the node closest
to $p_t$ that appears in $T_{t'}$, the predecessor of $v$ must not appear in $T_{t'}$, and so
no U-turn occurs at $v$. Since the length of $P'$ is at most $2r + g/2 < g$, $P'$ must be a simple path by Claim
19. However, it is not, since the node $p_t$ appears twice in $P'$, once
at the beginning and once at the end. This is a contradiction.

Claim 22: The number of variable nodes in $H$ is at least
$\frac{g}{2}\,(d_v - 1)^{\lfloor r/2 \rfloor}$.

Proof: Take one node set $T_t$. We will count the number
of nodes on each level of the tree induced by $T_t$. Each level $q$
consists of all the nodes at distance $q$ from $p_t$. Note that
even levels contain variable nodes, and odd levels contain check
nodes.


Consider a variable node $v$ on an even level. All variable
nodes in $H$ are incident to at least $d_v$ other nodes, by the construction of $H$. Therefore, $v$ has at least $d_v - 1$ children
in the tree on the next level. Now consider a check node on an
odd level; check nodes are each incident to at least two nodes,
so this check node has at least one child on the next level.

Thus, the tree expands by a factor of at least $d_v - 1$
from an even to an odd level. From an odd to an even level, it
may not expand, but it does not contract. The final level of the
tree is level $r$, and thus, the final even level is level
$2\lfloor r/2 \rfloor$. By the expansion properties we showed, this
level (and, therefore, the tree $T_t$) must contain at least $(d_v - 1)^{\lfloor r/2 \rfloor}$
variable nodes.

By Claim 21, each tree is disjoint from the others, so the number of variable nodes in $H$ is at least $\frac{g}{2}\,(d_v - 1)^{\lfloor r/2 \rfloor}$.

Theorem 11: Let $G$ be a factor graph with all variable nodes of degree
at least $d_v \ge 3$ and all check nodes of degree at least $2$. Let $g$ be the girth of $G$. Then the max-fractional
distance is at least $(d_v - 1)^{\lfloor r/2 \rfloor}$, where $r = \lfloor g/4 \rfloor - 1$; in particular, it is exponential in the girth of $G$.

Proof: Let $f$ be an arbitrary nonzero vertex of the relaxation polytope. Construct
a pseudocodeword $h$ from $f$ as in Lemma 8; i.e., let $M$
be an integer such that $M f_i$ is an integer for all bits $i$, and
$M w_{j,S}$ is an integer for all checks $j$ and sets $S$. Such
an integer exists because $f$ is a vertex of the polytope, and therefore
rational [23]. For all bits $i$, set $h_i = M f_i$; for all checks $j$ and
sets $S$, set $h_{j,S} = M w_{j,S}$.

Let $H$ be a graph of the pseudocodeword $h$, as defined
in Section V-A. Let $i^*$ be a bit attaining $\max_i h_i$, and grow a tree of
radius $r$ from each of the $h_{i^*}$ copies of $i^*$ in $H$. Arguing as in
Claims 20 and 21, these trees are mutually disjoint, and as in the proof of
Claim 22, each contains at least $(d_v - 1)^{\lfloor r/2 \rfloor}$
variable nodes. Since the number of variable nodes in $H$ is equal to
$\sum_i h_i$, we have
$$\sum_i h_i \ge h_{i^*}\,(d_v - 1)^{\lfloor r/2 \rfloor}. \qquad (34)$$

Recall that $h_i = M f_i$ for all $i$. Substituting into (34), we have
$$M \sum_i f_i \ge M \Bigl(\max_i f_i\Bigr)(d_v - 1)^{\lfloor r/2 \rfloor}.$$
It follows that
$$\frac{\sum_i f_i}{\max_i f_i} \ge (d_v - 1)^{\lfloor r/2 \rfloor}.$$
This argument holds for an arbitrary nonzero vertex $f$. Therefore, the
max-fractional distance, which is the minimum of $\sum_i f_i / \max_i f_i$
over all such vertices, is at least $(d_v - 1)^{\lfloor r/2 \rfloor}$.
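Since the bound of Theorem 11 is governed by the girth of the factor graph, it is useful to be able to compute the girth of a given graph; a standard BFS-from-every-vertex sketch (not part of the paper) follows:

```python
from collections import deque

def girth(adj):
    """Length of a shortest cycle in an undirected graph given as adjacency
    lists, found by BFS from every vertex; returns None if the graph is
    acyclic."""
    best = None
    for s in range(len(adj)):
        dist = {s: 0}
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    q.append(w)
                elif parent[u] != w:
                    # non-tree edge: closes a cycle through the BFS tree
                    cycle_len = dist[u] + dist[w] + 1
                    if best is None or cycle_len < best:
                        best = cycle_len
    return best
```

For a factor graph, run it on the bipartite variable/check adjacency lists; bipartite graphs always yield an even value, consistent with the observation that the girth of a bipartite graph is even.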


REFERENCES
[1] J. Feldman and D. R. Karger, "Decoding turbo-like codes via linear programming," in Proc. 43rd Annu. IEEE Symp. Foundations of Computer Science (FOCS), Vancouver, BC, Canada, Nov. 2002, pp. 251–260.
[2] J. Feldman, M. J. Wainwright, and D. R. Karger, "Linear programming-based decoding of turbo-like codes and its relation to iterative approaches," in Proc. Allerton Conf. Communications, Control and Computing, Monticello, IL, Oct. 2002.
[3] C. Di, D. Proietti, I. E. Telatar, T. J. Richardson, and R. L. Urbanke, "Finite-length analysis of low-density parity-check codes on the binary erasure channel," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1570–1579, Jun. 2002.
[4] N. Wiberg, "Codes and decoding on general graphs," Ph.D. dissertation, Linköping University, Linköping, Sweden, 1996.
[5] G. D. Forney, F. R. Kschischang, B. Marcus, and S. Tuncel, "Iterative decoding of tail-biting trellises and connections with symbolic dynamics," in Codes, Systems and Graphical Models. New York: Springer-Verlag, 2001, pp. 239–241.
[6] G. D. Forney, R. Koetter, F. R. Kschischang, and A. Reznik, "On the effective weights of pseudocodewords for codes defined on graphs with cycles," in Codes, Systems and Graphical Models. New York: Springer-Verlag, 2001, pp. 101–112.
[7] R. Gallager, "Low-density parity-check codes," IRE Trans. Inf. Theory, vol. IT-8, no. 1, pp. 21–28, Jan. 1962.
[8] D. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[9] M. Sipser and D. Spielman, "Expander codes," IEEE Trans. Inf. Theory, vol. 42, no. 6, pp. 1710–1722, Nov. 1996.
[10] S.-Y. Chung, G. D. Forney, T. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001.
[11] R. McEliece, D. MacKay, and J. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Sel. Areas Commun., vol. 16, no. 2, pp. 140–152, Feb. 1998.
[12] T. J. Richardson and R. L. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[13] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, "Improved low-density parity-check codes using irregular graphs and belief propagation," in Proc. IEEE Int. Symp. Information Theory, Cambridge, MA, Oct. 1998, p. 117.
[14] B. J. Frey, R. Koetter, and A. Vardy, "Signal-space characterization of iterative decoding," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 766–781, Feb. 2001.
[15] J. Feldman, M. J. Wainwright, and D. R. Karger, "Using linear programming to decode linear codes," presented at the 37th Annu. Conf. on Information Sciences and Systems (CISS '03), Baltimore, MD, Mar. 2003.
[16] J. Feldman, D. R. Karger, and M. J. Wainwright, "LP decoding," in Proc. 41st Annu. Allerton Conf. Communications, Control, and Computing, Monticello, IL, Oct. 2003.
[17] J. Feldman, "Decoding error-correcting codes via linear programming," Ph.D. dissertation, MIT, Cambridge, MA, 2003.
[18] J. Feldman, T. Malkin, R. A. Servedio, C. Stein, and M. J. Wainwright, "LP decoding corrects a constant fraction of errors," in Proc. IEEE Int. Symp. Information Theory, Chicago, IL, Jun./Jul. 2004, p. 68.
[19] J. Feldman and C. Stein, "LP decoding achieves capacity," in Proc. Symp. Discrete Algorithms (SODA '05), Vancouver, BC, Canada, Jan. 2005.
[20] R. Koetter and P. O. Vontobel, "Graph-covers and iterative decoding of finite length codes," in Proc. 3rd Int. Symp. Turbo Codes, Brest, France, Sep. 2003, pp. 75–82.
[21] P. Vontobel and R. Koetter, "On the relationship between linear programming decoding and max-product decoding," paper submitted to Int. Symp. Information Theory and its Applications, Parma, Italy, Oct. 2004.
[22] P. Vontobel and R. Koetter, "Lower bounds on the minimum pseudo-weight of linear codes," in Proc. IEEE Int. Symp. Information Theory, Chicago, IL, Jun./Jul. 2004, p. 70.
[23] A. Schrijver, Theory of Linear and Integer Programming. New York: Wiley, 1987.
[24] M. Grötschel, L. Lovász, and A. Schrijver, "The ellipsoid method and its consequences in combinatorial optimization," Combinatorica, vol. 1, no. 2, pp. 169–197, 1981.


[25] E. Berlekamp, R. J. McEliece, and H. van Tilborg, "On the inherent intractability of certain coding problems," IEEE Trans. Inf. Theory, vol. IT-24, no. 3, pp. 384–386, May 1978.
[26] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches," in Proc. 40th Annu. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Oct. 2002.
[27] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Understanding Belief Propagation and its Generalizations," Mitsubishi Electric Res. Labs, Tech. Rep. TR2001-22, 2002.
[28] D. Bertsimas and J. Tsitsiklis, Introduction to Linear Optimization. Belmont, MA: Athena Scientific, 1997.
[29] R. G. Jeroslow, "On defining sets of vertices of the hypercube by linear inequalities," Discr. Math., vol. 11, pp. 119–124, 1975.
[30] M. Yannakakis, "Expressing combinatorial optimization problems by linear programs," J. Comput. Syst. Sci., vol. 43, no. 3, pp. 441–466, 1991.
[31] G. D. Forney, Jr., "Codes on graphs: Normal realizations," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 529–548, Feb. 2001.
[32] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inf. Theory, vol. IT-27, no. 5, pp. 533–547, Sep. 1981.
[33] J. Rosenthal and P. O. Vontobel, "Constructions of LDPC codes using Ramanujan graphs and ideas from Margulis," in Proc. 38th Annu. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Oct. 2000, pp. 248–257.
[34] D. Bertsekas, Nonlinear Programming. Belmont, MA: Athena Scientific, 1995.
[35] L. Lovász and A. Schrijver, "Cones of matrices and set-functions and 0–1 optimization," SIAM J. Optimiz., vol. 1, no. 2, pp. 166–190, 1991.
[36] H. D. Sherali and W. P. Adams, "A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems," SIAM J. Optimiz., vol. 3, pp. 411–430, 1990.
[37] User's Manual for ILOG CPLEX, 7.1 ed., ILOG, Inc., Mountain View, CA, 2001.