I Preliminaries
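We consider the problem of reliable communication of equiprobable messages over noisy channels described by a random transformation $W^n(\boldsymbol{y}|\boldsymbol{x})$, where $\boldsymbol{x} \in \mathcal{X}^n$ and $\boldsymbol{y} \in \mathcal{Y}^n$ are the channel input and output sequences, and $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets, respectively. Each message $m \in \{1, \dots, M\}$, where $M = e^{nR}$, $R$ being the code rate, is mapped onto an $n$-length codeword $\boldsymbol{x}_m$ sent over the channel. The code is defined as $\mathcal{C}_n = \{\boldsymbol{x}_1, \dots, \boldsymbol{x}_M\}$.
We denote by $P_{e,m}(\mathcal{C}_n)$ the error probability when codeword $\boldsymbol{x}_m$ from code $\mathcal{C}_n$ is transmitted; similarly, $P_e(\mathcal{C}_n) = \frac{1}{M}\sum_{m=1}^{M} P_{e,m}(\mathcal{C}_n)$ denotes the average error probability of the code.
Let $\mathsf{C}_n = \{\boldsymbol{X}_1, \dots, \boldsymbol{X}_M\}$ be a random code, i.e., a set of $M$ random codewords generated with probability $\mathbb{P}[\mathsf{C}_n]$. We assume that codewords are generated in a pairwise-independent manner, that is, for any two indices $m \neq m'$, it holds that $\mathbb{P}[\boldsymbol{X}_m = \boldsymbol{x}, \boldsymbol{X}_{m'} = \boldsymbol{x}'] = Q^n(\boldsymbol{x})\,Q^n(\boldsymbol{x}')$, where $Q^n$ is a probability distribution defined over $\mathcal{X}^n$.
Let $P_{e,m}(\mathsf{C}_n)$ and $P_e(\mathsf{C}_n)$ be the random variables denoting the error probability of the $m$-th codeword of the random code $\mathsf{C}_n$ and the average error probability of the code, respectively.
We denote the $n$-length error exponents of these random variables by $E_{m,n}(\mathsf{C}_n) = -\frac{1}{n}\log P_{e,m}(\mathsf{C}_n)$ and $E_n(\mathsf{C}_n) = -\frac{1}{n}\log P_e(\mathsf{C}_n)$, respectively.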
For some ensembles and channels, the ensemble average of the code error probability, $\mathbb{E}[P_e(\mathsf{C}_n)]$, is known to decay exponentially in $n$ [1]. A lower bound on its error exponent is given by Gallager's multi-letter random coding exponent $E_{\mathrm{r},n}(R)$ in [2, Eq. (5.6.16)]. For the discrete memoryless channel (DMC), this bound is known to coincide with the sphere-packing upper bound on the reliability function [3, 4] in the high-rate region.
In [5, Sec. 5.7] Gallager showed that, for some channels and ensembles, there exists a code with a strictly higher error exponent than $E_{\mathrm{r},n}(R)$ at low rates. To show this, Gallager considered a pairwise-independent ensemble with $2M-1$ codewords. Using Markov's inequality together with the union and Bhattacharyya bounds, he showed that
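$$
\mathbb{P}\big[P_{e,m}(\mathsf{C}_n) \ge B\big] \le B^{-s}\,\mathbb{E}\big[P_{e,m}(\mathsf{C}_n)^{s}\big] \le B^{-s}(2M-2)\,\mathbb{E}\Big[\Big(\sum_{\boldsymbol{y}\in\mathcal{Y}^n}\sqrt{W^n(\boldsymbol{y}|\boldsymbol{X}_1)\,W^n(\boldsymbol{y}|\boldsymbol{X}_2)}\Big)^{s}\Big] \tag{1}
$$
for any $B > 0$ and $0 < s \le 1$, where $\boldsymbol{X}_1$ and $\boldsymbol{X}_2$ are independent and distributed according to $Q^n$.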
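To make (1) concrete, the following sketch evaluates the Bhattacharyya coefficient and the right-hand side of (1) for a binary symmetric channel (BSC) with crossover probability $p$ and i.i.d. uniform codewords. The BSC, the value of $p$ and all function names are illustrative assumptions, not part of the original text. On a BSC the sum over $\boldsymbol{y}$ factorizes, so the coefficient depends only on the Hamming distance between the two codewords.

```python
import itertools
import math

def bsc_likelihood(y, x, p):
    """W^n(y|x) for a memoryless BSC with crossover probability p."""
    d = sum(yi != xi for yi, xi in zip(y, x))
    return p ** d * (1 - p) ** (len(x) - d)

def bhattacharyya_bruteforce(x, x2, p):
    """Z(x, x') = sum over all y in {0,1}^n of sqrt(W^n(y|x) W^n(y|x'))."""
    return sum(math.sqrt(bsc_likelihood(y, x, p) * bsc_likelihood(y, x2, p))
               for y in itertools.product((0, 1), repeat=len(x)))

def bhattacharyya_closed_form(x, x2, p):
    """On a BSC the sum factorizes: Z(x, x') = (2 sqrt(p(1-p)))^{d_H(x, x')}."""
    d = sum(a != b for a, b in zip(x, x2))
    return (2 * math.sqrt(p * (1 - p))) ** d

def expected_z_power(s, n, p):
    """E[Z(X1, X2)^s] for independent uniform X1, X2: the Hamming distance is
    Binomial(n, 1/2), so the expectation equals ((1 + z^s) / 2)^n."""
    z = 2 * math.sqrt(p * (1 - p))
    return ((1 + z ** s) / 2) ** n

def markov_union_bound(B, s, M, n, p):
    """Right-hand side of (1) for a (2M-1)-codeword ensemble: B^{-s} (2M-2) E[Z^s]."""
    return B ** (-s) * (2 * M - 2) * expected_z_power(s, n, p)

# Example: brute-force and closed-form Bhattacharyya coefficients agree.
x, x2 = (0, 1, 1, 0), (1, 1, 0, 0)
print(bhattacharyya_bruteforce(x, x2, 0.1), bhattacharyya_closed_form(x, x2, 0.1))
```

The brute-force sum is exponential in $n$ and only serves as a cross-check of the closed form for short blocks.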
He then introduced the indicator function
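$$
\phi_m(\mathsf{C}_n) = \begin{cases} 1, & \text{if } P_{e,m}(\mathsf{C}_n) < B,\\ 0, & \text{otherwise,} \end{cases} \tag{2}
$$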
and showed that, using (1) and (2) with $B$ chosen so that the right-hand side of (1) equals $1/2$, the following inequality holds
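$$
\mathbb{E}\Big[\sum_{m=1}^{2M-1}\phi_m(\mathsf{C}_n)\Big] = \sum_{m=1}^{2M-1}\Big(1 - \mathbb{P}\big[P_{e,m}(\mathsf{C}_n) \ge B\big]\Big) \ge M - \frac{1}{2}. \tag{3}
$$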
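The expurgation step built on (3) can be illustrated with a small simulation. This is a hypothetical sketch for a BSC with i.i.d. uniform codewords, not Gallager's construction verbatim: since $P_{e,m}$ cannot be computed cheaply, each codeword is scored with the Bhattacharyya union bound $U_m = \sum_{m' \neq m} Z(\boldsymbol{x}_m, \boldsymbol{x}_{m'}) \ge P_{e,m}$ (valid under maximum-likelihood decoding), and the threshold is $B = \big(4(M-1)\,\mathbb{E}[Z^s]\big)^{1/s}$, which makes the right-hand side of (1) equal to $1/2$. The function name, the default $s = 1/2$ and the seed are assumptions.

```python
import math
import random

def expurgate_by_union_bound(n, M, p, s=0.5, seed=0):
    """Toy version of Gallager's expurgation on a BSC (hypothetical helper).

    Draws 2M-1 i.i.d. uniform binary codewords, bounds each P_{e,m} by the
    Bhattacharyya union bound U_m = sum_{m' != m} Z(x_m, x_m'), and keeps
    the M codewords with the smallest U_m (i.e., removes the worst M-1)."""
    rng = random.Random(seed)
    z = 2 * math.sqrt(p * (1 - p))
    code = [[rng.randint(0, 1) for _ in range(n)] for _ in range(2 * M - 1)]

    def Z(a, b):
        # On a BSC, Z(x, x') = z ** (Hamming distance between x and x').
        return z ** sum(ai != bi for ai, bi in zip(a, b))

    U = [sum(Z(c, d) for j, d in enumerate(code) if j != i)
         for i, c in enumerate(code)]
    # Threshold making the right-hand side of (1) equal to 1/2:
    # B = (4 (M-1) E[Z^s])^(1/s), with E[Z^s] = ((1 + z^s) / 2)^n.
    B = (4 * (M - 1) * ((1 + z ** s) / 2) ** n) ** (1 / s)
    # Under ML decoding P_{e,m} <= U_m, so U_m < B certifies phi_m = 1 in (2).
    good = sum(u < B for u in U)
    order = sorted(range(len(code)), key=lambda i: U[i])
    kept, removed = order[:M], order[M:]
    return U, B, good, kept, removed

U, B, good, kept, removed = expurgate_by_union_bound(n=12, M=8, p=0.05)
print(f"{good} of {len(U)} codewords certified below B = {B:.3e}; kept {len(kept)}")
```

Ranking by $U_m$ is a proxy for ranking by the true error probability; it keeps the sketch cheap while every certified codeword still satisfies the condition in (2).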
From (3) it follows that, since the average number of codewords with error probability smaller than $B$ in a randomly generated code with $2M-1$ codewords is at least $M - \frac{1}{2}$, and this number is an integer, there must exist a code having at least $M$ codewords, out of the $2M-1$, fulfilling this property. Thus, by removing (expurgating) the worst $M-1$ codewords, about half of the code with $2M-1$ codewords, we obtain a new code with $M$ codewords, each of which satisfies the condition in the first line of the right-hand side of (2) (removing codewords does not increase the error probability of the remaining ones under maximum-likelihood decoding). Finally, setting $s = 1/\rho$ with $\rho \ge 1$, Gallager derives a lower bound on the exponent of $P_{e,m}$ for every codeword of the expurgated code, given by
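$$
-\frac{1}{n}\log P_{e,m} \ge E_{\mathrm{ex},n}(R) \triangleq -\frac{\rho_n^*}{n}\log\mathbb{E}\Big[Z(\boldsymbol{X}_1,\boldsymbol{X}_2)^{1/\rho_n^*}\Big] - \rho_n^*\Big(R + \frac{\log 4}{n}\Big), \tag{4}
$$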
where
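$$
Z(\boldsymbol{x},\boldsymbol{x}') = \sum_{\boldsymbol{y}\in\mathcal{Y}^n}\sqrt{W^n(\boldsymbol{y}|\boldsymbol{x})\,W^n(\boldsymbol{y}|\boldsymbol{x}')} \tag{5}
$$
is the Bhattacharyya coefficient between codewords $\boldsymbol{x}$ and $\boldsymbol{x}'$, while
$$
\rho_n^* = \arg\max_{\rho \ge 1}\Big\{-\frac{\rho}{n}\log\mathbb{E}\Big[Z(\boldsymbol{X}_1,\boldsymbol{X}_2)^{1/\rho}\Big] - \rho\Big(R + \frac{\log 4}{n}\Big)\Big\} \tag{6}
$$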
is the parameter that yields the highest exponent. The preceding argument also holds for the maximal error probability, since every codeword in the expurgated code attains at least the same exponent. Observe, in addition, that since (3) relies on the standard ensemble-average argument (i.e., on averaging over the ensemble), it only shows the existence of a code with the desired property.
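As a numerical illustration of (4) and (6), the sketch below computes $E_{\mathrm{ex},n}(R)$ and $\rho_n^*$ by grid search for a BSC with i.i.d. uniform codewords. The channel, the grid and the function names are assumptions made for illustration. For an i.i.d. ensemble over a memoryless channel, $\mathbb{E}[Z^{1/\rho}]$ factorizes into $n$ identical single-letter terms, so $n$ only enters through the $\log 4 / n$ penalty; rates are in nats because natural logarithms are used.

```python
import math

def e_x_single_letter(rho, p):
    """-rho * log E[Z^{1/rho}] for one BSC use with uniform i.i.d. inputs:
    the per-letter Z equals 1 (inputs agree) or z = 2 sqrt(p(1-p)), each w.p. 1/2."""
    z = 2 * math.sqrt(p * (1 - p))
    return -rho * math.log((1 + z ** (1 / rho)) / 2)

def expurgated_exponent(R, n, p, rho_max=50.0, step=1e-2):
    """Grid-search sketch of (4) and (6): maximize
    E_x(rho) - rho * (R + log(4)/n) over rho in [1, rho_max].
    If the returned rho equals rho_max, the true maximizer may lie beyond the grid."""
    penalty = R + math.log(4) / n
    best_val, best_rho = -math.inf, 1.0
    steps = int(round((rho_max - 1.0) / step))
    for k in range(steps + 1):
        rho = 1.0 + k * step
        val = e_x_single_letter(rho, p) - rho * penalty
        if val > best_val:
            best_val, best_rho = val, rho
    return best_val, best_rho

for R in (0.005, 0.01, 0.02, 0.05):
    val, rho = expurgated_exponent(R, n=1000, p=0.1)
    print(f"R = {R:.3f} nats: E_ex,n ~ {val:.4f}, rho* ~ {rho:.2f}")
```

Lowering $R$ can only increase both the maximized value and the maximizing $\rho$, which is consistent with $\rho_n^*$ growing as the rate approaches zero.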
The exponent in (4) is the expurgated exponent. We refer to the code with $2M-1$ codewords before expurgation as a mother code. We say that a mother code is good if, once expurgated, it yields a code with asymptotically the same rate whose codewords each have an exponent at least as large as the expurgated exponent.
A refinement of the above follows from (1). Specifically, it can be shown that there exists a mother code such that removing a suitable subset of its codewords yields a code that attains the expurgated exponent [6, Lemma 1]. Although [6, Lemma 1] generalizes Gallager's method, it still only shows the existence of such a code.
II Main Result
This paper strengthens existing results on expurgation by showing that the probability of finding a code with $M$ codewords that contains a code with at least $(1-\epsilon)M$ codewords, each of which achieves the expurgated exponent, tends to $1$ with the code length. We define the sequence $\beta_n = n\delta_n/\rho_n^*$, where $\delta_n$ is such that $\delta_n > 0$ while $\lim_{n\to\infty}\delta_n = 0$, $\rho_n^*$ being a positive sequence defined in (6) that depends on the channel, the ensemble and the rate. From the definition of $\beta_n$ it can be seen that if $\rho_n^*$ either converges to a constant or grows sufficiently slowly, there exists a $\delta_n$ such that $\lim_{n\to\infty}\beta_n = \infty$. Similarly to Gallager, for a given $\delta_n$, we define the indicator function
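$$
\phi_m(\mathsf{C}_n) = \begin{cases} 1, & \text{if } E_{m,n}(\mathsf{C}_n) \ge E_{\mathrm{ex},n}(R) - \delta_n,\\ 0, & \text{otherwise,} \end{cases} \tag{7}
$$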
and the number of codewords attaining an exponent of at least $E_{\mathrm{ex},n}(R) - \delta_n$ as
$$
N(\mathsf{C}_n) = \sum_{m=1}^{M}\phi_m(\mathsf{C}_n). \tag{8}
$$
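The counting in (7) and (8) is simple enough to state directly in code. The sketch below assumes the per-codeword error probabilities $P_{e,m}$ of one code are already available (for instance from simulation); the function name is hypothetical.

```python
import math

def count_good_codewords(pe, n, e_ex, delta):
    """N(C_n) from (8): the number of codewords whose n-length exponent
    E_{m,n} = -(1/n) log P_{e,m} is at least E_ex,n(R) - delta_n (phi_m = 1 in (7))."""
    def exponent(q):
        # A codeword that is never in error has an infinite exponent.
        return math.inf if q == 0 else -math.log(q) / n
    phi = [1 if exponent(q) >= e_ex - delta else 0 for q in pe]
    return sum(phi)

n = 10
pe = [math.exp(-n * 0.30), math.exp(-n * 0.10), math.exp(-n * 0.25), 0.0]
print(count_good_codewords(pe, n, e_ex=0.30, delta=0.06))
```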
Theorem 1
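Consider a pairwise-independent code ensemble with $M$ codewords and any positive sequence $\delta_n$ with $\lim_{n\to\infty}\delta_n = 0$. If the sequence $\beta_n$, which depends on the channel and the ensemble, satisfies $\lim_{n\to\infty}\beta_n = \infty$, then for any $\epsilon \in (0,1)$ it holds that
$$
\lim_{n\to\infty}\mathbb{P}\big[N(\mathsf{C}_n) \ge (1-\epsilon)M\big] = 1. \tag{9}
$$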
In words, with high probability we find a mother code with $M$ codewords, of which at least $(1-\epsilon)M$ attain the expurgated exponent. That is, good mother codes are easy to find and contain only an arbitrarily small fraction of codewords that need to be expurgated. Theorem 1 extends Gallager's method and applies, among others, to independent and identically distributed (i.i.d.) and constant-composition codes over DMCs, as well as to channels with memory such as the finite-state channel in [2, Sec. 4.6], for which the expurgated exponent is derived in [7].
As a final remark, recent works [8, 9, 7, 10] show that, for many ensembles, most low-rate codes have an error exponent that is strictly larger than the exponent of the ensemble-average error probability, i.e., the random coding exponent. Similarly, Theorem 1 implies that, for most codes, almost every codeword has an error exponent that is strictly larger than the ensemble average of the exponent of the codebook error probability, $\mathbb{E}[E_n(\mathsf{C}_n)]$. In both cases, the smaller error exponent of the average error probability is due to a relatively small number of poorly performing elements (codes in the first case, codewords in the second).
Furthermore, as shown in [9, 10] for i.i.d. and constant-composition codes over DMCs, the error exponents of the codes in the ensemble concentrate around the typical random coding (TRC) exponent [11, 8]. Similarly, it can be shown that the error exponent $E_{m,n}(\mathsf{C}_n)$, for any $m$, concentrates around its mean, the expurgated exponent. The proof makes use of Lemma 1 in Section III and follows almost identical steps to those in [10, Theorem 1], [7, Theorem 1] and [7, Theorem 2] once $P_e(\mathsf{C}_n)$ is replaced by $P_{e,m}(\mathsf{C}_n)$; it is omitted here.
III Proof of Theorem 1
We start with the following lemma, whose proof is almost identical to that of [7, Lemma 1].
Lemma 1
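For a channel $W^n$ and a pairwise-independent $M$-codeword code ensemble with codeword distribution $Q^n$, for any $\delta > 0$ it holds that
$$
\mathbb{P}\big[E_{m,n}(\mathsf{C}_n) < E_{\mathrm{ex},n}(R) - \delta\big] \le e^{-n\delta/\rho_n^*}, \tag{10}
$$
where $E_{\mathrm{ex},n}(R)$ and $\rho_n^*$ are positive real-valued sequences.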
The proof of Lemma 1 follows from Markov’s inequality
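$$
\mathbb{P}\big[E_{m,n}(\mathsf{C}_n) < E_{\mathrm{ex},n}(R) - \delta\big] = \mathbb{P}\Big[P_{e,m}(\mathsf{C}_n) > e^{-n(E_{\mathrm{ex},n}(R)-\delta)}\Big] \le e^{\frac{n}{\rho_n^*}(E_{\mathrm{ex},n}(R)-\delta)}\,\mathbb{E}\Big[P_{e,m}(\mathsf{C}_n)^{1/\rho_n^*}\Big] \tag{11}
$$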
and applying the same steps as in [7, Theorem 1] once $P_e(\mathsf{C}_n)$ is replaced with $P_{e,m}(\mathsf{C}_n)$. The sequences $E_{\mathrm{ex},n}(R)$ and $\rho_n^*$ are those defined in (4) and (6).
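The algebra behind (10) and (11) can be checked numerically. Combining (11) with $\mathbb{E}[P_{e,m}^{1/\rho}] \le M\,\mathbb{E}[Z^{1/\rho}]$ (union and Bhattacharyya bounds, valid since $1/\rho \le 1$) and the objective in (4) evaluated at any $\rho \ge 1$, the right-hand side collapses to $e^{-n\delta/\rho}/4 \le e^{-n\delta/\rho}$. This is a sketch for a BSC with uniform i.i.d. codewords; the channel, the parameter values and the function name are assumptions.

```python
import math

def lemma1_bound(n, R, p, rho, delta):
    """Evaluate e^{n (E - delta)/rho} * M * E[Z^{1/rho}] for a BSC with uniform
    i.i.d. codewords, where E = -(rho/n) log E[Z^{1/rho}] - rho (R + log(4)/n)
    is the objective in (4) at the given rho. Computed in the log domain."""
    z = 2 * math.sqrt(p * (1 - p))
    log_ez = n * math.log((1 + z ** (1 / rho)) / 2)   # log E[Z(X1, X2)^{1/rho}]
    e_val = -(rho / n) * log_ez - rho * (R + math.log(4) / n)
    log_m = n * R                                     # log M, since M = e^{nR}
    return math.exp(n * (e_val - delta) / rho + log_m + log_ez)

n, R, p, rho, delta = 200, 0.05, 0.1, 3.0, 0.02
print(lemma1_bound(n, R, p, rho, delta), math.exp(-n * delta / rho) / 4)
```

Working in the log domain avoids underflow of $\mathbb{E}[Z^{1/\rho}]$ and overflow of $M$ for larger $n$.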
Observe that, using inequality (11) and following steps similar to those in [7], it can be shown that $E_{\mathrm{ex},n}(R)$ is a lower bound on $\mathbb{E}[E_{m,n}(\mathsf{C}_n)]$. Furthermore, using arguments similar to those in [10], it can be shown that this bound is tight at least for i.i.d. and constant-composition codes over DMCs. That is, for such ensembles and channels, $\lim_{n\to\infty}\mathbb{E}[E_{m,n}(\mathsf{C}_n)] = \lim_{n\to\infty}E_{\mathrm{ex},n}(R)$, i.e., the expurgated exponent is the typical codeword exponent.
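If the positive sequence $\rho_n^*$, defined in (6), converges or grows sufficiently slowly, then there exists a sequence $\delta_n$ such that $\delta_n > 0$ and $\lim_{n\to\infty}\delta_n = 0$, for which $\lim_{n\to\infty}\beta_n = \infty$. At rate zero, that is, when $M$ grows subexponentially in $n$, the $n$-length error exponent in (4) depends on the particular subexponential growth of $M$, while $\rho_n^*$ tends to infinity with a growth that depends on the channel and the ensemble. In this case, as discussed in the paragraph following [7, Eq. (89)], the assumption $\lim_{n\to\infty}\beta_n = \infty$ holds if the normalized variance of the Bhattacharyya coefficient grows sufficiently slowly with $n$. In any case, choosing such a $\delta_n$ and applying Lemma 1, we have that
$$
\mathbb{P}\big[E_{m,n}(\mathsf{C}_n) < E_{\mathrm{ex},n}(R) - \delta_n\big] \le e^{-\beta_n}. \tag{12}
$$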
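One concrete way to pick $\delta_n$ when $\rho_n^*/n \to 0$ is $\delta_n = \sqrt{\rho_n^*/n}$: then $\delta_n \to 0$ while $\beta_n = n\delta_n/\rho_n^* = \sqrt{n/\rho_n^*} \to \infty$. The sketch below checks this for the hypothetical growth $\rho_n^* = \log n$; both this growth and the function name are illustrative assumptions, not properties derived in the text.

```python
import math

def delta_and_beta(rho_star, n):
    """Hypothetical choice delta_n = sqrt(rho_n^*/n); then
    beta_n = n * delta_n / rho_n^* simplifies to sqrt(n / rho_n^*)."""
    delta = math.sqrt(rho_star / n)
    beta = n * delta / rho_star
    return delta, beta

# Assumed slow growth of the optimizing parameter: rho_n^* = log n.
for n in (10, 100, 1_000, 10_000, 100_000):
    delta, beta = delta_and_beta(math.log(n), n)
    print(f"n = {n:>6}: delta_n = {delta:.4f}, beta_n = {beta:.2f}")
```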
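The random variable $N(\mathsf{C}_n)$, averaged across the ensemble, satisfies
\begin{align}
\mathbb{E}\big[N(\mathsf{C}_n)\big] &= \sum_{m=1}^{M}\mathbb{E}\big[\phi_m(\mathsf{C}_n)\big] \tag{13}\\
&\ge \sum_{m=1}^{M}\big(1 - e^{-\beta_n}\big) \tag{14}\\
&= M\big(1 - e^{-\beta_n}\big), \tag{15}
\end{align}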
where (14) follows from the definition of the indicator function (7) and (12).
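We define $\bar{N}(\mathsf{C}_n) = M - N(\mathsf{C}_n)$, which is the number of codewords with exponent smaller than $E_{\mathrm{ex},n}(R) - \delta_n$. From (15) it follows that
$$
\mathbb{E}\big[\bar{N}(\mathsf{C}_n)\big] \le M e^{-\beta_n}. \tag{16}
$$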
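Then, we have that
\begin{align}
\mathbb{P}\big[\bar{N}(\mathsf{C}_n) \ge M e^{-\beta_n/2}\big] &\le \frac{\mathbb{E}\big[\bar{N}(\mathsf{C}_n)\big]}{M e^{-\beta_n/2}} \nonumber\\
&\le e^{-\beta_n/2}, \tag{17}
\end{align}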
where (17) follows from Markov’s inequality and (16).
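The Markov step (16)-(17) can be sanity-checked on a toy model in which each of the $M$ codewords is "bad" (exponent below $E_{\mathrm{ex},n}(R) - \delta_n$) independently with probability $e^{-\beta_n}$, so that $\mathbb{E}[\bar{N}] = M e^{-\beta_n}$. Independence is an illustrative assumption that the argument does not need; Markov's inequality holds for any dependence structure, and the toy model only shows how loose the bound can be. The function name, trial count and seed are hypothetical.

```python
import math
import random

def markov_tail_check(M, beta, trials=500, seed=1):
    """Toy check of (16)-(17): estimate P[Nbar >= M e^{-beta/2}] when each of
    M codewords is bad independently with probability e^{-beta}, and return it
    together with the Markov bound e^{-beta/2}."""
    rng = random.Random(seed)
    q = math.exp(-beta)
    threshold = M * math.exp(-beta / 2)
    hits = 0
    for _ in range(trials):
        nbar = sum(rng.random() < q for _ in range(M))
        if nbar >= threshold:
            hits += 1
    return hits / trials, math.exp(-beta / 2)

empirical, markov_bound = markov_tail_check(M=1000, beta=4.0)
print(f"empirical tail {empirical:.4f} <= Markov bound {markov_bound:.4f}")
```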
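This shows that the probability of finding a code with at least $M e^{-\beta_n/2}$ codewords with exponent strictly smaller than $E_{\mathrm{ex},n}(R) - \delta_n$ vanishes as $n$ grows. To prove our main result, we write the probability in (9) as
\begin{align}
\mathbb{P}\big[N(\mathsf{C}_n) \ge (1-\epsilon)M\big] &= \mathbb{P}\big[M - \bar{N}(\mathsf{C}_n) \ge (1-\epsilon)M\big] \tag{18}\\
&= 1 - \mathbb{P}\big[\bar{N}(\mathsf{C}_n) > \epsilon M\big], \tag{19}
\end{align}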
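where we used the definitions of $N(\mathsf{C}_n)$ and $\bar{N}(\mathsf{C}_n)$. Since $\beta_n$ tends to infinity, there must exist an $n_0$ such that $e^{-\beta_n/2} < \epsilon$ for $n \ge n_0$, and therefore, for $n \ge n_0$,
\begin{align}
\mathbb{P}\big[N(\mathsf{C}_n) \ge (1-\epsilon)M\big] &\ge 1 - \mathbb{P}\big[\bar{N}(\mathsf{C}_n) \ge M e^{-\beta_n/2}\big] \tag{20}\\
&\ge 1 - e^{-\beta_n/2}, \tag{21}
\end{align}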
where (21) follows from (17).
Finally, since the probability in (9) is at most one and the lower bound in (21) tends to one, letting $n \to \infty$ yields the desired result.
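The final step (18)-(21) can be traced numerically: find the first $n$ for which $e^{-\beta_n/2} < \epsilon$ and watch the lower bound $1 - e^{-\beta_n/2}$ approach one. The growth $\beta_n = \sqrt{n/\log n}$ is a hypothetical example (it corresponds to $\rho_n^* = \log n$ and $\delta_n = \sqrt{\rho_n^*/n}$), and the function names are assumptions.

```python
import math

def tail_lower_bound(beta_n, eps):
    """Lower bound (21) on P[N(C_n) >= (1-eps) M]; it is only valid once
    e^{-beta_n/2} < eps (i.e., n >= n_0), otherwise None is returned."""
    tail = math.exp(-beta_n / 2)
    return 1 - tail if tail < eps else None

def first_valid_n(beta, eps, n_max=10**6):
    """Smallest n in [2, n_max] with e^{-beta(n)/2} < eps; for an increasing
    beta this is the n_0 used in (20)-(21)."""
    for n in range(2, n_max + 1):
        if math.exp(-beta(n) / 2) < eps:
            return n
    return None

def beta(n):
    # Assumed toy growth: beta_n = sqrt(n / log n), increasing for n >= 3.
    return math.sqrt(n / math.log(n))

n0 = first_valid_n(beta, eps=0.1)
for n in (n0, 10 * n0, 100 * n0):
    print(n, tail_lower_bound(beta(n), 0.1))
```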