JOHN MATTSSON
TRITA-CSC-E 2006:111
ISRN-KTH/CSC/E--06/111--SE
ISSN-1653-5715
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.csc.kth.se
Abstract
As a response to the lack of efficient and secure stream ciphers, ECRYPT (a 4-year
Network of Excellence funded by the European Union) manages and coordinates a multi-
year effort called eSTREAM to identify new stream ciphers suitable for widespread adoption.
Polar Bear, one of the eSTREAM candidates, is a new synchronous stream cipher
proposed by Johan Håstad and Mats Näslund. In this thesis, the first known attack
is presented. It is a guess-and-determine attack with a computational complexity of
$O(2^{78.8})$ that recovers the initial state. We propose that this weakness be fixed by adding
a key-dependent pre-mixing of the dynamic permutation in conjunction with the key
schedule. Further suggested tweaks strengthen the security and improve performance
on long sequences. The updated Polar Bear specification that will be sent to eSTREAM
before June 30, 2006, is based on tweaks suggested in this thesis. We have also optimized
the source code of Polar Bear, which enables it to run almost twice as fast. We have not
found any other weaknesses in Polar Bear, and it seems resistant to all known generic
attacks.
Design of Stream Ciphers
An evaluation of the eSTREAM candidate Polar Bear
Summary
Because of the lack of secure and efficient stream ciphers, ECRYPT (a four-year
EU project) runs a subproject called eSTREAM to identify new stream ciphers
suitable for implementation. Polar Bear, one of the candidates, is a new stream cipher
created by Johan Håstad and Mats Näslund. In this thesis we present the first
known attack. It is a guess-and-determine attack that determines the initial
state with a time complexity of $O(2^{78.8})$. We propose that similar attacks be avoided
by mixing the dynamic permutation during the key schedule. Further
proposals strengthen the security and improve performance on long sequences. The
updated Polar Bear specification will build on these proposals. We have also
optimized the source code, which almost doubles the performance. We have not found any other
weaknesses in the design, and Polar Bear appears to withstand all known types of attacks.
Acknowledgments
I would like to thank the people at the Communication Security Lab at Ericsson Research
in Kista for making my time there as enjoyable as it was, especially my supervisor Mats
Näslund, who always answered the questions that I had, and Eva Gustafsson and Rolf
Blom for letting me go to the SASC workshop in Leuven, Belgium.
I also want to thank Johan Håstad, my supervisor and examiner at the Royal Institute
of Technology.
Contents
1 Introduction
1.1 Classification of Ciphers
1.2 This Master’s Project and eSTREAM
1.3 Purpose
1.4 Outline of the Thesis
1.5 Definitions and Notation
3 Basic Characteristics
3.1 Statistical Properties
3.1.1 Period
3.1.2 Golomb’s randomness postulates
3.1.3 Statistical tests
3.2 Measures of Complexity
3.2.1 Linear complexity
3.2.2 Quadratic span
3.2.3 Maximum order complexity
3.2.4 2-adic span
3.2.5 Ziv-Lempel complexity
3.3 Key, State and IV size
6 Results
6.1 Permutation Error
6.2 A Guess-and-Determine Attack
6.2.1 An improved attack
6.2.2 Analysis
6.3 Evaluation with Respect to Other Attacks
6.3.1 Time-memory trade-offs
6.3.2 Algebraic attacks
6.3.3 Correlation attacks
6.4 A new version of Polar Bear - Pig Bear
6.4.1 Key schedule
6.4.2 The output cycle
6.4.3 Security analysis
6.5 Statistical Testing
6.6 Optimization and Performance
6.6.1 Further performance tweaks
7 Conclusions
7.1 Future Work
Bibliography
Chapter 1
Introduction
The scientific study of cryptology started around WWII with a pioneering paper [47]
written by Shannon. Cryptology uses ideas from several other fields such as information
theory, computer science, number theory, and abstract algebra. Cryptologic research
has historically been done by governments and kept secret. Only in the last decades has
there been widespread open research in cryptology. But the number of researchers
involved in, and the money spent on, secret research still far exceed those in open research.
Traditionally even publicly used cryptosystems like the GSM cryptosystems A5/1 and
A5/2 have been kept secret, but this has changed recently. The international standards
DES (Data Encryption Standard) and AES (Advanced Encryption Standard) used by
governments and financial institutions are publicly known.
This classification is not absolute, and any block cipher can be used as a stream cipher
by using certain modes of operation (see Section 2.4).
The state of a stream cipher can informally be defined as the values of the set of
variables that describes the current status of the cipher. For each new state, the cipher
outputs some bits, and then jumps to the next state where the process is repeated. The
output digits $z_i$ of the cipher are called the keystream, and the ciphertext $c_i$, $i = 0, 1, \ldots$,
is a function of the keystream and the plaintext.
Stream ciphers are further classified as being either synchronous or self-synchronizing.
In a synchronous cipher, the keystream depends only on the key and the position i, but
is independent of the plaintext and the ciphertext. In a self-synchronizing cipher, the
keystream depends on the key and a fixed amount of previous ciphertext, but is independent
of the position i. Both designs have their respective advantages and disadvantages.
In a synchronous stream cipher the sender and receiver have to be synchronized. If
synchronization is lost because a ciphertext digit is lost in the transmission, the decryp-
tion fails and the cipher has to be re-synchronized. A synchronous stream cipher has no
error propagation as a modified ciphertext digit does not affect the decryption of any
other ciphertext digits.
As the decryption in a self-synchronizing stream cipher is independent of the position
and only depends on a fixed amount of ciphertext, the cipher is able to automatically re-
establish decryption when a ciphertext digit is lost. Suppose that the state depends on
the t preceding ciphertext digits. If a single ciphertext digit is lost, inserted, or modified, it
will affect the decryption of t other ciphertext digits. A self-synchronizing stream cipher
is therefore said to have limited error propagation.
Self-synchronizing stream ciphers seem to be vulnerable to chosen-ciphertext attacks
where the attacker has access to a decryption unit (see Chapter 4). The most common
way of constructing self-synchronizing stream ciphers is to use a block cipher in some
mode (see Section 2.4).
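[Figure 1.1: Schematic of a synchronous stream cipher. The key schedule (input k) and the IV schedule (input IV) determine the initial state σ0; f updates the state σi and g produces the keystream zi. Legend: k key, IV initiation vector, σi state, zi keystream.]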
The output cycle of a synchronous stream cipher can be described by the equations
$$\sigma_{i+1} = f(\sigma_i), \qquad z_i = g(\sigma_i), \qquad c_i = h(z_i, m_i),$$
where σ0 is the initial state and may be determined from the key k and the initiation
vector IV , f is the next-state function, and g is the output function. The schematic can
be seen in Figure 1.1.
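The output cycle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 32-bit state, the toy key/IV schedule `initial_state`, the linear congruential `f`, and the top-byte `g` are all hypothetical choices with no security value, and `h` is taken to be XOR so that the sketch is an additive cipher.

```python
MASK32 = 0xFFFFFFFF

def initial_state(key: int, iv: int) -> int:
    # Hypothetical key/IV schedule: mixes key and IV into a 32-bit sigma_0.
    return (key ^ (iv * 0x9E3779B1)) & MASK32

def f(state: int) -> int:
    # Toy next-state function (a linear congruential step, NOT secure).
    return (state * 1103515245 + 12345) & MASK32

def g(state: int) -> int:
    # Toy output function: the most significant byte of the state.
    return state >> 24

def h(z: int, m: int) -> int:
    # Combining function; XOR makes the cipher (binary) additive.
    return z ^ m

def crypt(key: int, iv: int, data: bytes) -> bytes:
    """Encrypts or decrypts: with h = XOR both directions are identical."""
    state = initial_state(key, iv)
    out = bytearray()
    for m in data:
        out.append(h(g(state), m))  # c_i = h(z_i, m_i) with z_i = g(sigma_i)
        state = f(state)            # sigma_{i+1} = f(sigma_i)
    return bytes(out)

message = b"synchronous stream cipher"
ciphertext = crypt(0x12345678, 42, message)
recovered = crypt(0x12345678, 42, ciphertext)
```

Because the keystream depends only on the key, the IV, and the position, encrypting an all-zero message exposes the keystream itself, which is why known plaintext is equivalent to known keystream for additive ciphers.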
A (binary) additive stream cipher is a synchronous cipher where the ciphertext is
obtained by combining the keystream and the plaintext using the exclusive or (XOR)
operation (see Figure 1.2). Almost all currently suggested stream ciphers are additive,
and these are the main topic of this thesis. In many common uses of ciphers, the
transmission is not error free. In such cases frequent reinitialization is required due to
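[Figure 1.2: Encryption and decryption with a binary additive stream cipher. On both sides a keystream generator keyed with k produces zi; encryption combines mi and zi into ci, and decryption combines ci and zi into mi. Legend: k key, zi keystream, mi plaintext, ci ciphertext.]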
Table 1.1: The eSTREAM timetable
Date              Event
November 2004     Call for Primitives
April 2005        The beginning of the first evaluation phase of eSTREAM
March 2006        The end of the first evaluation phase of eSTREAM
July 2006         The beginning of the second evaluation phase of eSTREAM
End of 2006       Second classification
September 2007    The end of the second evaluation phase of eSTREAM
January 2008      The final report of the eSTREAM
I (software) and profile II (hardware). In this thesis we present the first known attack on
Polar Bear. Our attack is a so-called guess-and-determine attack with a computational
complexity of $O(2^{79})$. This kind of attack is not practical, as it would take a billion
years on a standard PC. It is, however, of theoretical interest, as it indicates a major
weakness in the design. As Polar Bear uses a key size of 128 bits, no attack should have a
computational complexity lower than $O(2^{128})$, see Section 4.1. Recently a similar attack
with the improved complexity of $O(2^{57.4})$ has been presented by Hasanzadeh et al. [31].
We analyze why these attacks are possible, and suggest how the cipher can be tweaked
to avoid this type of attack. We have also optimized the source code of Polar Bear,
which enables it to run almost twice as fast. We have not found any other weaknesses
in the design.
1.3 Purpose
This master’s project aims to evaluate the security and performance of the stream cipher
Polar Bear. The four main goals were:
• Evaluation of Polar Bear’s security
Parts of this chapter have previously been published in [35]. The results are summarized
and discussed in Chapter 7.
Chapter 2
The two main goals when designing a practical cipher system are security and perfor-
mance. Given a fixed level of security the goal is to optimize performance. Performance
can be measured as speed, chip area, or power consumption. Other important features
are clarity of the design and flexibility in its implementation. It seems to be difficult
to design a stream cipher that is both fast and secure, as most proposed stream ciphers have
documented weaknesses.
When designing a stream cipher there is also a trade-off between the speed in software
and hardware, as it is difficult to optimize for both at the same time. The A5/1
and A5/2 stream ciphers that are used in the GSM standard use linear feedback shift
registers (LFSRs) over the finite field $F_2$ (see Section 2.2.1). Consequently they are fast
in hardware, but slow in software. Other ciphers like SNOW [19] use an LFSR over the
finite field $F_{2^{32}}$, and are very fast in software on 32-bit processors. Henceforth, we will
write $F_q$ for the unique finite field with $q = p^n$ elements (p is prime). Another cipher
designed for good performance in software is SEAL [41].
ciphertext. This does not mean that the system is secure, as the set of possible plaintexts
may be very small.
over $F_q$. The contents of $[s_{n-1}, s_{n-2}, \ldots, s_0]$ is called the state of the LFSR, and p(x) is
called the feedback or connection polynomial. The register is controlled by a clock, and
at each stepping the elements are moved to the right so that $s_i = s_{i+1}$ for $i = 0, \ldots, n-2$
and $s_0$ is outputted. The contents of $s_{n-1}$ is calculated according to
A polynomial r(x) over $F_q$ is called a primitive polynomial iff it generates all the elements
of an extension field of $F_q$. All primitive polynomials are irreducible (cannot be factored
into polynomials of lower degree). An irreducible polynomial r(x) of degree n over $F_p$,
p prime, is primitive if the smallest positive integer m such that r(x) divides $x^m - 1$ is
$m = p^n - 1$. If the feedback polynomial p(x) is a primitive polynomial of degree n, each
of the non-zero initial states produces an output sequence with the maximum possible
period $q^n - 1$. Such an LFSR is called a maximum-length LFSR. The output sequence of
a maximum-length LFSR is called an m-sequence.
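As a small illustration of the period claim, the following sketch steps a binary LFSR of length 4 and measures its period. The update rule used here, a new bit formed as the XOR of the tapped cells, is an assumption, as is the tap convention: taps (0, 1) give the recurrence $s_{t+4} = s_{t+1} + s_t$, which corresponds to $x^4 + x + 1$ or its reciprocal depending on convention (both are primitive over $F_2$), while taps (0, 1, 2, 3) correspond to $x^4 + x^3 + x^2 + x + 1$, which is irreducible but not primitive.

```python
from itertools import product

def lfsr_step(state, taps):
    """state = [s_0, ..., s_{n-1}]; returns (output bit s_0, next state)."""
    feedback = 0
    for i in taps:
        feedback ^= state[i]          # XOR of the tapped cells (arithmetic in F_2)
    return state[0], state[1:] + [feedback]

def period(initial, taps):
    """Number of steps until the register returns to its initial state."""
    state, steps = list(initial), 0
    while True:
        _, state = lfsr_step(state, taps)
        steps += 1
        if state == list(initial):
            return steps

primitive_taps = (0, 1)            # s_{t+4} = s_t + s_{t+1}
non_primitive_taps = (0, 1, 2, 3)  # s_{t+4} = s_t + s_{t+1} + s_{t+2} + s_{t+3}

nonzero_states = [list(s) for s in product((0, 1), repeat=4) if any(s)]
periods = {period(s, primitive_taps) for s in nonzero_states}
short_period = period([1, 0, 0, 0], non_primitive_taps)
```

With the primitive taps every non-zero state lies on a single cycle of length $2^4 - 1 = 15$; with the non-primitive taps the cycle through the state 1000 has length 5.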
The nonlinear order or algebraic degree of a Boolean function is the maximum order of
the terms in the algebraic normal form (2.1). A Boolean function f (x) is balanced if
Example Take for example the Boolean function used in the Geffe generator [22]. The
algebraic normal form is
$$f(x_1, x_2, x_3) = x_1 x_2 \oplus x_2 x_3 \oplus x_3.$$
The nonlinear order is 2. It is balanced but only 0th-order correlation immune because
$P(f(x) = 0 \mid x_1 = 0) = 0.75$.
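The properties claimed in this example can be checked by exhausting the truth table. The sketch below is a direct enumeration; the helper name `prob_zero_given` is ours and not part of the thesis.

```python
from itertools import product

def geffe(x1, x2, x3):
    # f(x1, x2, x3) = x1 x2 XOR x2 x3 XOR x3
    return (x1 & x2) ^ (x2 & x3) ^ x3

inputs = list(product((0, 1), repeat=3))

# Balanced: exactly half of the 8 inputs map to 1.
ones = sum(geffe(*x) for x in inputs)
balanced = ones == len(inputs) // 2

def prob_zero_given(index, value):
    """P(f(x) = 0 | x_index = value), with index 0 meaning x1."""
    rows = [x for x in inputs if x[index] == value]
    return sum(1 for x in rows if geffe(*x) == 0) / len(rows)

p_x1 = prob_zero_given(0, 0)  # conditioned on x1 = 0
p_x2 = prob_zero_given(1, 0)  # conditioned on x2 = 0
```

Fixing $x_1 = 0$ shifts the output distribution from 0.5 to 0.75, so the output leaks information about $x_1$; fixing $x_2 = 0$ leaves the output balanced.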
Siegenthaler showed that there is a trade-off between the nonlinear order k and the
correlation immunity m [48]. An m-th order correlation immune function of n variables
can have nonlinear order at most n − m.
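[Figure: shift register with cells $a_{r-1}, a_{r-2}, \ldots, a_1, a_0$, taps $q_1, q_2, \ldots, q_r$, and memory m; the sum Σ is split into a "div 2" part and a "mod 2" part.]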
1. Form the integer sum $\sigma = \sum_{k=1}^{r} q_k a_{r-k} + m$
4. Let $m = \lfloor \sigma/2 \rfloor$
If n maximum-length LFSRs with lengths $\ell_1, \ell_2, \ldots, \ell_n$ are used together with the
Boolean function f, the linear complexity (see Section 3.2.1 for a definition) of the
keystream is
where $a_0, a_1, \ldots$ are the coefficients in the algebraic normal form of f, and the expression
is evaluated over the ordinary integers instead of the finite field $F_2$. Thus, it is desirable
to use a combining function with a high nonlinear order. It is also desirable that f has
high correlation immunity to withstand correlation attacks. The trade-off between high
linear complexity and high correlation immunity can be avoided by permitting f to have
memory, as in the summation generator. Let the memory at time t be $m_t$. In each
step the integer sum of the output bits $x_1, x_2, \ldots, x_n$ and the memory $m_t$ is calculated.
The least significant bit is output and the remaining bits form the new memory $m_{t+1}$.
Unfortunately, this makes the 2-adic span of the sequence low, a fact that was used to
attack the summation generator [29]. An example of a nonlinear combination generator
is the broken Geffe generator.
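The summation generator step described above is short enough to sketch directly. The input streams here are plain bit lists standing in for LFSR outputs (an assumption made to keep the example self-contained).

```python
def summation_generator(streams):
    """Combine several bit streams with carry memory, one output bit per step."""
    memory = 0
    output = []
    for bits in zip(*streams):
        total = sum(bits) + memory   # integer sum of the input bits and m_t
        output.append(total & 1)     # least significant bit is the output
        memory = total >> 1          # remaining bits form the new memory m_{t+1}
    return output

# Two streams, least significant bit first: 11 = 1011b and 6 = 0110b.
x = [1, 1, 0, 1, 0]
y = [0, 1, 1, 0, 0]
z = summation_generator([x, y])
```

With two input streams the generator is exactly binary addition with carry, so `z` holds the bits of 11 + 6 = 17, least significant bit first.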
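[Figure: a nonlinear combination generator, in which the outputs of LFSR 1, LFSR 2, ..., LFSR n are combined by a function f, and a nonlinear filter generator, in which the cells $s_{n-1}, \ldots, s_2, s_1, s_0$ of a single LFSR are filtered by a function f.]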
So for large n, the probability is close to one. An example of a nonlinear filter generator
is the insecure knapsack generator.
$$C_i = P_i \oplus E_k(C_{i-1}), \qquad C_{-1} = IV$$
$$C_i = P_i \oplus Z_i, \qquad Z_i = E_k(Z_{i-1}), \qquad Z_{-1} = IV$$
choice is an actual counter $f(i) = i$. CTR mode allows a random access property for
decryption.
$$C_i = P_i \oplus E_k(B_i), \qquad B_i = IV \,\|\, f(i)$$
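The three recurrences in this section (the ciphertext feedback form $C_i = P_i \oplus E_k(C_{i-1})$, the output feedback form with $Z_i = E_k(Z_{i-1})$, and CTR) can be sketched as follows. The function `toy_E` is a hypothetical stand-in for $E_k$ built from a truncated hash; it is not a real block cipher and not even a permutation, which is acceptable here only because all three modes use $E_k$ in the forward direction. The block size and the 8-byte counter encoding are also assumptions.

```python
import hashlib

BLOCK = 16  # block size in bytes (assumption)

def toy_E(key: bytes, block: bytes) -> bytes:
    # Stand-in for E_k: keyed hash truncated to one block. NOT a real block cipher.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ofb(key, iv, data):
    """Z_i = E_k(Z_{i-1}), C_i = P_i xor Z_i; the same call decrypts."""
    out, z = [], iv
    for p in blocks(data):
        z = toy_E(key, z)
        out.append(xor(p, z))
    return b"".join(out)

def ctr(key, iv, data):
    """B_i = IV || f(i) with f(i) = i; the same call decrypts."""
    return b"".join(xor(p, toy_E(key, iv + i.to_bytes(8, "big")))
                    for i, p in enumerate(blocks(data)))

def cfb_encrypt(key, iv, data):
    """C_i = P_i xor E_k(C_{i-1}), C_{-1} = IV."""
    out, prev = [], iv
    for p in blocks(data):
        prev = xor(p, toy_E(key, prev))
        out.append(prev)
    return b"".join(out)

def cfb_decrypt(key, iv, data):
    out, prev = [], iv
    for c in blocks(data):
        out.append(xor(c, toy_E(key, prev)))
        prev = c
    return b"".join(out)

key, msg = b"demo key", b"block ciphers used as stream ciphers, 45 bytes"
iv16, iv8 = bytes(range(16)), bytes(range(8))
```

In CTR mode block i depends only on the IV and i, which is the random access property mentioned above; in the ciphertext feedback form the keystream depends on previous ciphertext, so that mode is self-synchronizing rather than synchronous.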
Chapter 3
Basic Characteristics
Random sequence A random bit sequence $r = \{r_i\}_{i=0}^{\infty}$ is a sequence such that every
bit ri follows a Bernoulli distribution with p = 0.5, and all the bits are independent of
each other. The goal of every stream cipher is to be computationally indistinguishable
from such a sequence.
3.1.1 Period
Having a period is clearly a statistical defect that distinguishes a sequence from a random
one. A cipher with too small a period is obviously easy to predict. The period must be
large enough to ensure that the keystream is never repeated. This is usually done by using a building
block that can be proven to have a large period, for instance a maximum-length LFSR.
test, which is able to detect a very general class of defects including the ones detectable
by the five tests above. The idea behind the test is that it should not be possible to
significantly compress a random sequence. FIPS 140-2 is a NIST standard that specifies
a set of tests and significance levels that a cipher should satisfy to qualify for a specific
security rating (1–4, from lowest to highest). All the tests above are empirical and
not theoretical in the sense that samples of sequences are generated from which certain
statistics are evaluated.
Passing a fixed set of statistical tests is a necessary, but not sufficient condition for
a stream cipher to be secure. Passing a test only makes it unlikely that the keystream
has a specific defect.
For cryptographic purposes it is important that any sequence obtained by concatenating
bits from several keystream sequences is also indistinguishable from a random
sequence. Therefore, it is important to test that the nth bit is Bernoulli distributed with
p = 0.5. This is done by generating a sequence consisting of the nth bit from a large number
of sequences (perhaps with related keys).
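A minimal sketch of this nth-bit test is given below. It collects bit n of the keystream under many related keys and applies a frequency (monobit) test, using the normal approximation $p = \mathrm{erfc}(|S| / \sqrt{2N})$ where S is the sum of the bits mapped to ±1. The generator `toy_keystream` is a hypothetical stand-in built from SHA-256, and the choice of related keys (a counter) is an assumption for the demonstration.

```python
import hashlib
import math

def toy_keystream(key: bytes, nbytes: int) -> bytes:
    # Hypothetical keystream generator used only to make the demo runnable.
    out, counter = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:nbytes]

def nth_bits(keys, n):
    """Bit n (most significant bit first) of the keystream under each key."""
    bits = []
    for key in keys:
        ks = toy_keystream(key, n // 8 + 1)
        bits.append((ks[n // 8] >> (7 - n % 8)) & 1)
    return bits

def monobit_p_value(bits):
    """Two-sided p-value for the hypothesis that the bits are fair coin flips."""
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * len(bits)))

related_keys = [i.to_bytes(16, "big") for i in range(2000)]
sample = nth_bits(related_keys, n=5)
p_value = monobit_p_value(sample)
```

A very small `p_value` (below a chosen significance level such as 0.01) would indicate that bit n is biased across keys; a perfectly balanced sample gives a p-value of 1.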
Test suites
Several test suites with batteries of statistical tests, aimed at making it easy to test PRNGs,
exist. All of the following test suites are freely available for download.
- Marsaglia’s Diehard Battery of Tests is the oldest and best-known test suite.
It contains 15 statistical tests. [33]
- NIST Statistical Test Suite contains 16 statistical tests and is developed and maintained
by the U.S. National Institute of Standards and Technology. They provide
guidance in the use and application of the tests. [2]
- DieHarder is a test suite by Robert G. Brown that aims to include tests from
Diehard, the NIST STS, and several other tests in a single package. It fixes several
problems with the Diehard package, and uses the GNU Scientific Library interface.
The test suite is still under active development. [8]
If these requirements are fulfilled, the complexity measure can be used as a statistical
test. Complexity measures are often more interesting in cryptology than other statistical
tests, as some of them give methods to recreate the sequence using building blocks com-
monly used in stream ciphers. For the linear complexity, the distribution for a random
sequence has been exactly calculated. It has been approximated for the maximum order
complexity. Some complexity measures are only of theoretical interest because there is
no efficient algorithm to calculate them.
This is also the length of the smallest LFSR that can be used to duplicate the sequence.
Berlekamp-Massey algorithm
The Berlekamp-Massey algorithm [34] is an efficient algorithm to find the linear complexity
of a binary sequence of length n. The iterative algorithm finds one or all of the
shortest LFSRs capable of generating the sequence. The running time of the algorithm
is $O(n^2)$.
Let $s = \{s_i\}_{i=0}^{\infty}$ be a binary sequence with linear complexity L. Then the Berlekamp-
Massey algorithm finds an LFSR of length L which generates s, given a subsequence of
length 2L.
The original Berlekamp-Massey algorithm works for sequences over the binary field
$F_2$, but can be generalized to an arbitrary finite field $F_q$.
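A compact sketch of the binary Berlekamp-Massey algorithm follows. It uses the common textbook formulation with a connection polynomial $c(x) = 1 + c_1 x + \ldots + c_L x^L$, meaning $s_i = c_1 s_{i-1} \oplus \ldots \oplus c_L s_{i-L}$; this convention and the variable names are our assumptions, not taken from the thesis.

```python
def berlekamp_massey(s):
    """Return (L, c): linear complexity and connection polynomial of bit list s."""
    n = len(s)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # polynomial before the last length change
    L, m = 0, -1        # current complexity, index of the last length change
    for i in range(n):
        # Discrepancy between s[i] and the bit predicted by the current LFSR.
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]      # c(x) <- c(x) + x^shift * b(x)
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L, c[:L + 1]

# 15 bits of an m-sequence satisfying s_i = s_{i-3} xor s_{i-4}.
m_sequence = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
L_full, c_full = berlekamp_massey(m_sequence)
L_prefix, c_prefix = berlekamp_massey(m_sequence[:8])  # 2L = 8 bits suffice
```

Both calls return complexity 4 with connection polynomial $1 + x^3 + x^4$, illustrating that a subsequence of length 2L is enough to recover the LFSR.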
This measure is only of theoretical interest as no efficient algorithm to determine the
quadratic span of a sequence is known, and the statistics of random sequences are unknown.
Approximate values for the mean and variance were found by [20]. For a random bit
sequence $r^n = \{r_i\}_{i=0}^{n-1}$, the expected maximum order complexity is
$$E[M(r^n)] \approx 2 \lg n.$$
3.3 Key, State and IV size
It is obvious that a cipher should never have a state smaller than the size of the key.
Otherwise it would be faster to search through the internal states than the keys. This
is however not enough. By using various time-memory trade-off (TMTO) attacks, the
attack time can be severely cut at the expense of memory. Babbage [5] suggests the
principle:
Hong et al. [27] discovered another generic TMTO attack, and propose the following
design principle:
The entropy of the key and the IV should always be at least twice the key
size.
It implies that the IV size should be at least as big as the key size. For an explanation
of the attacks behind these recommendations, see Section 4.2.
Chapter 4
The methods for attacking a stream cipher can be classified according to the information
available to the cryptanalyst, the aim of the attack, or the way the attack is done. It is
often assumed that the attacker has knowledge of the cryptographic algorithm, but not
the key. To attack an unknown cipher is a much harder problem, and it is the reason
why all military ciphers are kept secret. History has, however, shown that it is difficult
to keep the cryptographic function secret.
Jönsson [28] lists four different categories according to the information available to
the attacker.
1. Ciphertext-only The worst case for the cryptanalyst. Given only the ciphertext
the attacker tries to recover the key or plaintext. The plaintext must have redundancies
for such an attack to be successful.
2. Known-plaintext The attacker knows the ciphertext and all or part of the plaintext.
For additive stream ciphers this is equivalent to knowing all or part of the
keystream. The aim is to deduce the key or more plaintext.
3. Chosen-plaintext The attacker has access to an encryption unit and can encrypt
any chosen plaintexts. The aim is to deduce the key.
4. Chosen-ciphertext The attacker has access to a decryption unit and can decrypt any
chosen ciphertext. The aim is to deduce the key.
This can also be seen as an ordering of how realistic the different kinds of attacks are.
Because ciphertext is transmitted in public, it is easy to access. To get access to a
corresponding plaintext is more difficult. It is possible that the attacker guesses some
parts of the message, for example names or numbers that are likely to appear. In
several implementations, the first encrypted characters are known or partly known. For
example, most file types have a fixed header in the beginning of the file. If the attacker
can get someone to encrypt the messages, the chosen-plaintext attack can be realistic.
The chosen-ciphertext attack is less realistic, but a good cipher should be secure against
even this kind of attack.
For a synchronous stream cipher, categories 2–4 above are equivalent. This is not true
for a self-synchronizing stream cipher, where chosen-ciphertext attacks are often the most
effective. Other categories are chosen-IV attacks, where the attacker can choose initiation
vectors, and related-key attacks, where the attacker can choose a specific relation between
the keys.
The NESSIE report [44] classifies attacks depending on the aim of the attack. The
assumption is that the plaintext is known.
1. Key recovery A method to recover the key.
2. Prediction A method for predicting a bit or sequence of bits of the keystream with
a probability better than guessing.
The first TMTO attack on stream ciphers is due to Babbage [5], and was used by
Golić to break the A5 cipher used in the GSM standard [23]. The Babbage-Golić attack is
a birthday attack. Assume that we have a stream cipher with a key size of k bits and
a state size of s bits. Suppose we generate keystream for $2^m$ different states and store
them in a table. We then observe $2^d$ different keystreams. By the birthday paradox, we
will on average be able to break one of these keystreams when
$$m + d = s, \quad \text{with the special case } m = d = \frac{s}{2}.$$
So if the state size is smaller than twice the key size, these complexities will be lower
than an exhaustive key search.
Hong et al. [27] were the first to realize that trade-offs can work on key/IV pairs
instead of the state. The birthday attack below is taken from [11]. We have a stream
cipher with a key size of k bits and an IV size of v bits. Suppose we generate keystream
for $2^m$ different key/IV pairs and store them in a table. We then observe $2^d$ different
keystreams. By the birthday paradox, we will be able to break one of these keystreams
when
$$m + d = k + v, \quad \text{with the special case } m = d = \frac{k+v}{2}.$$
So if the IV size is smaller than the key size, these complexities will be lower than an
exhaustive key search. In this case we can also break all the other keystreams that use
the found key.
To avoid the known TMTO attacks for stream ciphers, the state size should be at
least twice the key size, and the IV size should be at least as large as the key size.
Otherwise the security level of the cipher can never be as large as the key size.
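The two design rules can be turned into a small calculation. The sketch below returns the best security level, in bits, that the generic attacks above leave intact, using the balanced cases $m = d = s/2$ and $m = d = (k+v)/2$. The parameter sets in the example are hypothetical and do not describe any particular cipher.

```python
def tmto_security_bits(key_bits, state_bits, iv_bits):
    """Upper bound on the security level given the generic TMTO attacks."""
    state_attack = state_bits / 2             # Babbage-Golic: m = d = s/2
    key_iv_attack = (key_bits + iv_bits) / 2  # Hong et al.: m = d = (k+v)/2
    return min(key_bits, state_attack, key_iv_attack)

well_sized = tmto_security_bits(key_bits=128, state_bits=256, iv_bits=128)
small_state = tmto_security_bits(key_bits=128, state_bits=128, iv_bits=128)
small_iv = tmto_security_bits(key_bits=128, state_bits=256, iv_bits=64)
```

Only the first parameter set keeps the full 128-bit level; halving the state drops the bound to 64 bits, and a 64-bit IV drops it to 96 bits.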
If a cipher fails any of the ordinary statistical tests, this can be used to distinguish the
keystream. But as the generic statistical tests, see Section 3.1.3, were designed to evalu-
ate randomness properties of PRNGs, they seldom find weaknesses in new stream cipher
proposals. As a result, they are only used for catching implementation errors. Several
new tests have been developed specifically for stream ciphers [43] [51]. They concentrate
on the correlation between key, IV, and keystream. Saarinen describes a chosen-IV
distinguishing attack that is able to distinguish 6 of the 35 eSTREAM candidates. The
attack can be summarized as
1. Choose n bits $x = (x_1, x_2, \ldots, x_n)$ in the IV as variables. The rest of the IV and
the key are given a fixed value.
2. Find the Boolean function f from x to a single keystream bit (typically, the first).
3. Check if the ANF (Algebraic Normal Form) expression of the Boolean function has
the expected number of d-degree monomials. A monomial is a product of positive
integer powers of a fixed set of variables, for example, $x_1$, $x_1 x_3$, or $x_2 x_3 x_7$.
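Steps 2 and 3 can be sketched as follows: the ANF coefficients are obtained from the truth table with the binary Möbius transform, and the monomials are then counted per degree. For a uniformly random function each monomial is present with probability 1/2, so about half of the $\binom{n}{d}$ monomials of degree d are expected. As an assumption for the demo, the Geffe function stands in for the map from the chosen IV bits to the first keystream bit; a real test would query the cipher for each IV value.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients.

    Entry `mask` of the result is the coefficient of the monomial whose
    variables are the set bits of `mask` (bit 0 is x1, bit 1 is x2, ...).
    """
    a = list(truth_table)
    n = len(a).bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for mask in range(len(a)):
            if mask & bit:
                a[mask] ^= a[mask ^ bit]
    return a

def monomials_per_degree(coeffs):
    n = len(coeffs).bit_length() - 1
    counts = [0] * (n + 1)
    for mask, c in enumerate(coeffs):
        if c:
            counts[bin(mask).count("1")] += 1
    return counts

def stand_in(x1, x2, x3):
    # Placeholder for "first keystream bit as a function of the chosen IV bits".
    return (x1 & x2) ^ (x2 & x3) ^ x3

table = [stand_in(m & 1, (m >> 1) & 1, (m >> 2) & 1) for m in range(8)]
coeffs = anf_coefficients(table)
counts = monomials_per_degree(coeffs)
```

Here `coeffs` marks exactly the monomials $x_1 x_2$, $x_2 x_3$, and $x_3$, so `counts` is `[0, 1, 2, 0]`; a cipher whose counts deviate strongly from the expected values can be distinguished from random.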
Distinguishing attacks often require large amounts of keystream. An easy way to
avoid such attacks is to state that the cipher must be rekeyed after a certain amount
of keystream. For example the authors of the cipher SNOW 2.0 state that the cipher
must be rekeyed after at most $2^{50}$ words [19]. For block ciphers in OFB or counter mode,
there exist generic distinguishing attacks [26]. As the encryption function in a block
cipher is a permutation, the ciphertext blocks in these modes will all be different. For a
random function on n-bit blocks, there is likely to be a match when the number of blocks
m equals $\sqrt{2^n}$.
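The size of this birthday effect can be computed exactly for small parameters. The sketch below evaluates the probability that m outputs of a random function on n-bit blocks contain at least one repeat; the 16-bit block size is a toy value chosen only to keep the loop short.

```python
def collision_probability(num_blocks, block_bits):
    """P(at least two of num_blocks uniformly random block_bits-bit values coincide)."""
    n_values = 2 ** block_bits
    p_distinct = 1.0
    for i in range(num_blocks):
        p_distinct *= 1 - i / n_values
    return 1 - p_distinct

at_sqrt = collision_probability(256, 16)      # m = sqrt(2^16)
at_4_sqrt = collision_probability(1024, 16)   # m = 4 * sqrt(2^16)
```

At $m = \sqrt{2^n}$ a repeat already occurs with probability close to 0.39, and it becomes almost certain a small factor above that, whereas a permutation never produces a repeat. Observing no repeats over that many blocks is what distinguishes the mode from a random source.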
2. Determine other parts of the key/state under some assumption. The assumption
is that the key/IV pair is of some subset of the total set that makes the cipher
weak.
3. By calculating keystream from the deduced values and comparing it with the known
keystream we can check if the guess is right and the assumption holds.
The attack is successful if $2^g \cdot (1/p) \cdot w < 2^k$, where g is the number of guessed bits, p
is the probability that the assumption holds, w is the work needed to determine if the
guess is right and the assumption holds, and k is the key size. The work involved in each
step w is often very small compared to the other values and ignored.
In Section 6.2, a guess-and-determine attack on Polar Bear is given.
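The success condition is easiest to evaluate in logarithmic form, $g + \log_2(1/p) + \log_2 w < k$. The sketch below does this; the numbers in the example (64 guessed bits, an assumption holding with probability $2^{-10}$) are hypothetical and are not the parameters of the attack in Section 6.2.

```python
import math

def attack_cost_log2(guessed_bits, p_assumption, work=1.0):
    """log2 of 2^g * (1/p) * w."""
    return guessed_bits + math.log2(1 / p_assumption) + math.log2(work)

def is_successful(guessed_bits, p_assumption, key_bits, work=1.0):
    """True if the attack is cheaper than exhaustive search over the key."""
    return attack_cost_log2(guessed_bits, p_assumption, work) < key_bits

cost = attack_cost_log2(guessed_bits=64, p_assumption=2 ** -10)
beats_128 = is_successful(64, 2 ** -10, key_bits=128)
beats_64 = is_successful(64, 2 ** -10, key_bits=64)
```

With these hypothetical values the cost is $2^{74}$, which counts as an attack on a 128-bit key but not on a 64-bit key.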
4.5 Correlation Attacks
Correlation attacks are probably the most important class of general attacks on stream
ciphers, and efficient correlation attacks have been found for many stream ciphers. For
a correlation attack to be applicable, the keystream $z_1, z_2, \ldots$ must be correlated with
the output sequence $a_1, a_2, \ldots$ of a much simpler internal device, such as an LFSR. The
two sequences are correlated if the probability $P(z_i = a_i) \neq 0.5$. If this is the case, it
might be possible to recover the initial state of the target LFSR.
faster than exhaustive search over the target LFSR, but requires received sequences of
large length. Instead of using exhaustive search, the attacks use certain parity check
equations that are created from the feedback polynomial. The attacks have two phases.
In the first, a set of parity check equations are found. In the second these equations are
used in a decoding algorithm to recover the transmitted codeword (the internal output
sequence).
Parity check equations can be created in the following way. Suppose that the feedback
polynomial g(x) has t non-zero coefficients (g(x) has weight t).
$$g(x) = 1 + c_1 x + c_2 x^2 + \ldots + c_\ell x^\ell$$
From this we get t different parity check equations for the digit $a_i$. And by noting that
for a polynomial over $F_2$
$$g(x)^{2^k} = 1 + c_1 x^{2^k} + c_2 x^{2^{k+1}} + \ldots + c_\ell x^{\ell 2^k}$$
we get t more equations for each squaring of the polynomial g(x). This can easily be
generalized to polynomials over any finite field by taking the polynomial to the power
of the characteristic of the field, instead of squaring. The obtained equations are valid
for all indexes i. The total number of check equations that can be obtained by squaring
the feedback polynomial is
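$$m \approx t \log\left(\frac{N}{2\ell}\right)$$
where N is the length of the sequences u, z. The m parity check equations can be written
as
$$\begin{aligned} u_i + b_1 &= 0 \\ u_i + b_2 &= 0 \\ &\;\;\vdots \\ u_i + b_m &= 0 \end{aligned}$$
$$\begin{aligned} z_i + y_1 &= L_1 \\ z_i + y_2 &= L_2 \\ &\;\;\vdots \\ z_i + y_m &= L_m \end{aligned}$$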
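The squaring step used to obtain further parity checks can be verified on a small LFSR. In the sketch below a feedback polynomial is represented by the list of its non-zero exponents ("delays"), so $g(x) = 1 + x^3 + x^4$ gives the recurrence $a_i = a_{i-3} \oplus a_{i-4}$; this representation and the example polynomial are our assumptions.

```python
def lfsr_sequence(delays, init, length):
    """Extend init with a_i = XOR of a_{i-d} for d in delays."""
    s = list(init)
    for i in range(len(s), length):
        bit = 0
        for d in delays:
            bit ^= s[i - d]
        s.append(bit)
    return s

def squared_checks(delays, rounds):
    """Delay sets for g(x), g(x)^2, g(x)^4, ...: each squaring doubles the exponents."""
    return [[d * 2 ** k for d in delays] for k in range(rounds)]

def check_holds(s, delays):
    """True if a_i + sum of a_{i-d} = 0 over F_2 for every valid index i."""
    for i in range(max(delays), len(s)):
        acc = s[i]
        for d in delays:
            acc ^= s[i - d]
        if acc:
            return False
    return True

sequence = lfsr_sequence([3, 4], [1, 0, 0, 0], 64)
checks = squared_checks([3, 4], rounds=3)
all_valid = all(check_holds(sequence, c) for c in checks)
unrelated_valid = check_holds(sequence, [1, 4])
```

Each squaring yields a check of the same weight but with a wider span, which is why the number of usable checks grows only logarithmically in the sequence length N. The unrelated relation `[1, 4]` fails, as expected.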
Two different algorithms were suggested: one in which $p^*$ is calculated for each observed
symbol and the l positions with the highest value of $p^*$ are used to find the correct initial
state, and one iterative algorithm.
The algorithm works when the weight t is small, but for LFSRs with many taps
the algorithm fails, as the number of required equations is too large. The correlation
probability p that the algorithm can handle is much lower for polynomials with many
taps. As a consequence, feedback polynomials of large weight should be used.
This algorithm has been improved in several ways. There are other methods of finding
parity equations of low weight using matrix representation of the LFSR, polynomial
division, and linear codes.
1. Find a system of equations in keystream bits zi , and the unknown initial state s.
Algebraic attacks are more time-consuming than correlation attacks, but require less
keystream. Algebraic attacks are the fastest known attacks on several ciphers and they
have been used to break Toyocrypt, E0 (used in Bluetooth), and a modified version of
SNOW.
Finding equations
For a pure combiner (one without memory), the keystream is a function of the input
variables $z_i = f(x^i_1, x^i_2, \ldots, x^i_n)$. But $x^i_j$ is typically a linear function of the initial state
s, so $x^i = L^i(s)$ where $L^i$ is a linear function in matrix form applied i times. For each
observed keystream symbol we get an equation
$$\begin{aligned} z_1 &= f(L^1(s)) \\ z_2 &= f(L^2(s)) \\ &\;\;\vdots \end{aligned}$$
This system of equations can be solved with the algorithms described below. For
combiners with memory this strategy does not work directly. The equations look like
$z_i = f(L^i(s), m_i)$, where $m_i$ denotes the m memory bits at time i. These equations have
too many unknowns, and inserting $m_i = g(m_{i-1})$ gives equations with a degree that
increases exponentially with i. Armknecht et al. [4] solved this by showing that for
a combiner with m memory bits, it is always possible to find a Boolean function H of
degree at most $\lceil s(m+1)/2 \rceil$ such that $H(f(L^i(s), z_i, \ldots, z_{i+r-1})) = 0$ and $r > m$. They
also described a systematic way of finding such a function. This requires larger amounts
of keystream, and is only practical when the number of memory bits m is small.
Equation Solving
The problem of solving systems of multivariate polynomial equations over any finite
field is NP-complete. When the number of equations m is the same as the number of
unknowns n, the best known algorithm for small fields is exhaustive search. For vastly
overdetermined systems, somewhat faster algorithms exist. Most of these algorithms are
based on linearization. The basic idea is that we linearize by replacing each monomial
with a new variable. The system is then solved as a linear system. Take for example the
system
$x + y + z = 0$
$xyz + xy + z = 0$
$y + xyz = 0$
Substituting $u = xyz$ and $v = xy$ gives the linear system
$x + y + z = 0$
$u + v + z = 0$
$y + u = 0$
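The linearization step can be sketched in a few lines of Python. This is only an illustration of the idea, not an implementation of XL or XSL; the helper names (`mono`, `linearize`, `rank_gf2`, `solutions`) are hypothetical. Every distinct monomial gets its own column, the rank of the resulting GF(2) matrix is computed, and a brute-force search over x, y, z is used as a cross-check.

```python
from itertools import product

def mono(s):
    """A monomial is the set of its variables, e.g. 'xyz' -> {x, y, z}."""
    return frozenset(s)

# Each equation is a set of monomials whose XOR is 0 over GF(2).
SYSTEM = [
    {mono("x"), mono("y"), mono("z")},      # x + y + z = 0
    {mono("xyz"), mono("xy"), mono("z")},   # xyz + xy + z = 0
    {mono("y"), mono("xyz")},               # y + xyz = 0
]

def linearize(system):
    """Give every distinct monomial its own column; rows are bit vectors."""
    columns = sorted({m for eq in system for m in eq},
                     key=lambda m: (len(m), sorted(m)))
    rows = [sum(1 << columns.index(m) for m in eq) for eq in system]
    return columns, rows

def rank_gf2(rows):
    """Rank of a GF(2) matrix whose rows are Python ints used as bit vectors."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def solutions(system, names="xyz"):
    """Brute-force all assignments that satisfy the original system."""
    found = []
    for bits in product((0, 1), repeat=len(names)):
        val = dict(zip(names, bits))
        if all(sum(all(val[v] for v in m) for m in eq) % 2 == 0
               for eq in system):
            found.append(bits)
    return found

COLUMNS, ROWS = linearize(SYSTEM)
RANK = rank_gf2(ROWS)
```

For this toy system the linearized matrix has five columns (x, y, z, xy, xyz) but only three independent rows, so the linear system alone does not determine the unknowns. This is why linearization needs vastly overdetermined systems, and why XL first multiplies the equations by monomials to obtain more of them.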
Some of the suggested algorithms include XL (eXtended Linearization) and XSL (eX-
tended Sparse Linearization). In XL, each equation is multiplied by all monomials of
some bounded degree. The expanded system is then linearized. In XSL, the monomials
are more carefully selected.
Another option is to use algorithms involving Gröbner bases. However, it is difficult
to predict the complexity of these algorithms, and therefore of attacks involving them.
In a timing attack the attacker tries to break a cipher by analyzing the execution time
for encryption or decryption. This is possible if the encryption or decryption time depends
on the input, which is usually the case for asymmetric algorithms. An example is the
common algorithm for modular exponentiation (usually used in RSA implementations),
where the execution time grows with the Hamming weight of the exponent. In 2003, Boneh
and Brumley [9] presented a practical timing attack on SSL-enabled web servers that
recovered the private key in hours. The vulnerability used was an RSA implementation
optimized with the Chinese remainder theorem.
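The dependence of the running time on the exponent can be illustrated with a textbook left-to-right square-and-multiply routine. The sketch below is not taken from any particular RSA implementation, and the function name `modexp_count` is hypothetical. It counts the operations instead of measuring time.

```python
def modexp_count(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (result, squarings, multiplications). One squaring is done per
    exponent bit, and one extra multiplication per exponent bit that is 1.
    """
    result, squarings, multiplications = 1, 0, 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        squarings += 1
        if bit == "1":
            result = (result * base) % modulus
            multiplications += 1
    return result, squarings, multiplications
```

The number of multiplications equals the Hamming weight of the exponent, so two exponents of the same length but different weight take measurably different time unless the implementation is blinded or made constant-time.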
Power analysis is similar to a timing attack, but the attacker studies the power
consumption of a cryptographic device (smart card, CPU).
Other information that has been suggested includes leaked electromagnetic radiation
and sound. Shamir and Tromer showed that it might be possible to conduct a timing
attack based only on the humming noise of the CPU [46].
To circumvent side channel attacks, blinding should be used. Blinding means that
the execution time and power consumption are made independent of the inputs. This is
often possible, but the downside is that it often leads to a less efficient cipher and higher
development costs.
Chapter 5
Polar Bear reuses parts of the block cipher Rijndael, and the dynamically changing
table from the stream cipher RC4. These ciphers are also included as benchmarks in the
eSTREAM testing framework. The two ciphers are here presented briefly.
5.1 Rijndael
The Rijndael block cipher was selected as the Advanced Encryption Standard (AES)
in October 2000. It is intended for use by the U.S. Government, but has, like DES,
become a global standard for software and hardware encryption. The difference between
the original Rijndael proposal and the AES standard is that Rijndael allows the block
length and the key length to be any multiple of 32 bits in the range 128–256 bits, whereas
the AES standard fixes the block length to 128 bits and allows key lengths of 128, 192
and 256 bits only.
Rijndael was designed to be simple, fast, and resist all known attacks, including linear
and differential attacks. It is a key-iterated block cipher, and consists of the repeated
application of a round transformation of the state. The key is enlarged to an expanded
key, and a part of the expanded key is added to the state before and after each round
transformation.
For a block length of 128 bits, two rounds of Rijndael provide ‘full diffusion’, in the
sense that every state bit depends on all state bits two rounds ago. A change in one state
bit is likely to affect half of the state bits after two rounds. For block lengths larger than
128 bits, three rounds are needed. For a detailed description of the design of Rijndael,
see [17].
5.1.1 Attacks
As of 2006, the only successful attacks against AES have been side channel attacks.
Several attacks exist on AES with reduced rounds, for example truncated differential
attacks and boomerang attacks. Attacks faster than exhaustive key search exist for
AES-128 with 7 rounds, AES-192 with 8 rounds, and AES-256 with 9 rounds. This should
be compared with the 10, 12, and 14 rounds used in full-round AES. None of these attacks
seem to extend to full-round AES, and they require an extremely large number of chosen
plaintexts.
In 2002, Courtois et al. claimed that AES could be broken with an algebraic attack
using the XSL algorithm [15]. The equation system obtained from Rijndael is overdefined,
sparse, and very structured. Courtois writes that AES seems overdesigned with
respect to linear and differential cryptanalysis, and that it is an extremely bad cipher
from the point of view of algebraic attacks. There has been much debate over this claim and
the effectiveness and reliability of the XSL algorithm. Several problems have been found in the
underlying mathematics of the XSL algorithm, and the ECRYPT AES Security Report
[12] states that “AES cannot currently be considered vulnerable to such attacks”. They
also conclude that “There are still no discernible cryptographic weaknesses in the AES.”
Another indication of the security of AES is that in 2003, NSA extended their support
for AES. As the first publicly available cipher it was approved for the security levels
SECRET and TOP-SECRET (192 or 256 bit keys).
5.2 RC4
RC4 is probably the most widely used stream cipher today. It is used in the SSL/TLS
standard for secure communication between web browsers and servers, and in the WEP
protocol used in 802.11 wireless LAN. It was designed by Ron Rivest in 1987 for RSA
Security. The algorithm was long held as a trade secret, but in 1994 the source code was
anonymously leaked to the Cypherpunks mailing list.
The inner state of RC4 consists of a dynamically changing table S, and two byte
variables. A key of variable length k (8–2048 bits) is used to permute S, which initially
contains the values 0 . . . 255 in ascending order.
j = 0
For i = 0 to 255
    j = (j + S[i] + key[i mod k]) mod 256
    Swap(S[i], S[j])
Output is then generated depending on the state, and in each step some elements in
S are permuted. The huge state size of $\log_2(256!) \approx 1684$ bits seems to rule out linear
cryptanalysis. Analysis by Robshaw shows that the period with very high probability
exceeds $10^{100}$.
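The key schedule above, together with the output step, can be written as runnable Python. The output loop is not spelled out in this text; the sketch uses the commonly published description of RC4, and the function names are our own. The key is treated as a byte string, so `len(key)` plays the role of k measured in bytes.

```python
def rc4_key_schedule(key: bytes) -> list:
    """Permute the table S (initially 0..255) under control of the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate n keystream bytes; one swap in S per output byte."""
    s = rc4_key_schedule(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def rc4_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; decryption is the same operation."""
    return bytes(a ^ b for a, b in zip(data, rc4_keystream(key, len(data))))
```

As a sanity check against a widely published test vector, `rc4_encrypt(b"Key", b"Plaintext").hex()` gives `bbf316e8d940af0ad3`.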
5.2.1 Attacks
Somewhat surprisingly for such a widely known and analyzed cipher, Mantin and Shamir
found a trivial distinguishing attack as late as 2001 [32]. The first few hundred output
bytes are non-random and leak information about the key. The first few bytes are
especially biased: the second output byte of RC4 takes the value 0 with probability
$2^{-7}$ instead of the expected $2^{-8}$.
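The bias in the second output byte is easy to observe empirically. The following Monte Carlo sketch assumes 16-byte keys drawn from Python's `random` module and a hypothetical trial count of 16000; it is a demonstration of the bias, not the distinguisher from [32].

```python
import random

def rc4_second_byte(key: bytes) -> int:
    """Run the RC4 key schedule and return the second keystream byte."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = 0
    for _ in range(2):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out = s[(s[i] + s[j]) % 256]
    return out

def second_byte_zero_rate(trials: int = 16000, seed: int = 1) -> float:
    """Fraction of random keys whose second keystream byte equals 0."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(trials):
        key = bytes(rng.getrandbits(8) for _ in range(16))
        zeros += rc4_second_byte(key) == 0
    return zeros / trials
```

For an unbiased generator the returned rate would be close to 1/256; for RC4 it should come out near 2/256.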
The reason for these weaknesses is that the table S does not have a uniform distribution
after the initial permutation. Mossel et al. [39] showed that for a table of size n,
shuffling by semi-random transpositions needs Θ(n log n) transpositions to
get the table uniformly permuted. In [38], Mironov shows that the single pass
used in RC4 is clearly insufficient.
This weakness can be used to recover the key in the WEP protocol of the IEEE
802.11b standard. By analyzing only a small amount of keystream from a few sessions,
it is possible to recover the key instantly. The reason is that in WEP, the session key
is created by concatenating the secret key and the IV. This turns out to be extremely
weak. The SSL standard is not affected as it uses a hash function to combine the secret
key with the IV.
If the first thousand bytes are discarded, such practical attacks can be avoided.
But even then RC4 is not theoretically secure, as there exists a distinguishing attack
requiring $2^{33}$ bytes of keystream [21].
5.3.1 Description
The cipher uses one 7-word (112-bit) LFSR R0 and one 9-word (144-bit) LFSR R1.
These are viewed as acting over $F_{2^{16}}$, which is represented as
$F_2[y]/(y^{16} + y^8 + y^7 + y^5 + 1)$.
Besides these registers, the internal state of the cipher also depends on a word quantity,
S, and a dynamic permutation of bytes, D8.
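Arithmetic in this field representation can be sketched as follows. Words are taken to be 16-bit integers whose bit k is the coefficient of $y^k$; this bit ordering and the function name `gf16_mul` are assumptions made for the illustration, not details taken from the specification.

```python
# y^16 + y^8 + y^7 + y^5 + 1 as a bit mask (bits 16, 8, 7, 5, 0).
POLY = 0x101A1

def gf16_add(a: int, b: int) -> int:
    """Addition in F_2[y]/(POLY) is bitwise XOR."""
    return a ^ b

def gf16_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two 16-bit field elements."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1             # multiply a by y
        if a & 0x10000:
            a ^= POLY       # reduce: y^16 = y^8 + y^7 + y^5 + 1
    return r
```

For example, multiplying $y$ (0x0002) by $y^{15}$ (0x8000) gives $y^{16}$, which reduces to $y^8 + y^7 + y^5 + 1$, that is 0x01A1.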
The cipher is primarily designed for a key length of 128 bits. The IV can be any
number of bytes up to a maximum of 31. The key schedule is (in the case of 128-bit
keys) identical to the Rijndael key schedule.
On each message to be processed, the cipher is initialized by taking the key (more
precisely, the expanded Rijndael key), interpreting the IV as a plaintext block, and
applying a (slightly modified) five round Rijndael encryption with block length 256.
The resulting cipher text block is loaded into R0 and R1 . Finally, D8 is initialized to
equal the table T8 , the Rijndael S-box, and S is set to zero (see Figure 5.1).
Output is produced 4 bytes at a time. To this end, the two LFSRs are first irregularly
clocked, determined by S. Eight bytes, selected from R0 and R1 , are run through the
permutation D8 to produce the four output bytes. Selected entries in D8 are swapped.
Finally, S and R0 are modified in preparation for the next output cycle. Entries in R1
are not modified apart from the LFSR stepping.
Figure 5.1: Key and IV schedule
• feedback $R^i_{\ell_i - 1} \leftarrow f^i$.
After stepping both R0 and R1 above, do the following steps, first for i = 0, then repeat
them for i = 1:
Figure 5.2: The output cycle
At this point, the internal state is updated, and the output is formed from the above
$(\beta^0_j, \beta^1_j)$-pairs as described next.
Output generation
Form four output bytes $b_0 || b_1 || b_2 || b_3$ where
$b_j = \beta^0_j \oplus \beta^1_j$.
If more output bytes are required, the output cycle above is repeated.
Chapter 6
Results
which consists of two transpositions. All further references to Polar Bear will be Polar
Bear with this corrected permutation.
where $\ell_i$ is the length of register $R^i$. The notation $*R^0_i$ will be used for stages in R0 after
their update.
Let the first 24 bytes of plaintext be known, and let the corresponding first twelve
16-bit blocks of keystream be $Z_0, Z_1, \ldots, Z_{11}$.
For the attack to be successful, three assumptions have to be made.
1. During the first six updates of the state, let the steppings for both LFSR R0
and R1 be 2-steppings, where the register is stepped two steps. This happens if
the fourteenth and fifteenth bits of the word quantity S are 0. Because the word
quantity S is initialized to zero, the first stepping for both registers is always a
2-stepping. The probability that the first six steppings are 2-steppings can therefore
be assumed to be $(1/2)^{10} = 2^{-10}$.
2. Let no pair of the first eight α be equal. The probability for this is
$\frac{256!}{(256 - 8)! \cdot 256^{8}} \approx 0.90$.
3. Let no pair of the following forty α be equal. The probability for this is
$\frac{256!}{(256 - 40)! \cdot 256^{40}} \approx 0.04$.
Because all the steppings for both the registers are 2-steppings, all the stages in both
LFSRs are used to generate keystream. The probability that all three of the above
assumptions hold is approximately $2^{-14.8}$.
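The three probabilities can be checked numerically. The sketch below assumes, as the text does, that the α-values behave like independent uniform bytes and that the three assumptions are independent; the function name is hypothetical.

```python
from math import log2, prod

def all_distinct_probability(n: int, size: int = 256) -> float:
    """P(n uniform draws from `size` values are pairwise distinct).

    Equals size! / ((size - n)! * size**n), written as a product to
    avoid huge factorials.
    """
    return prod((size - i) / size for i in range(n))

p_stepping = 2.0 ** -10                   # assumption 1
p_first8 = all_distinct_probability(8)    # assumption 2, about 0.90
p_next40 = all_distinct_probability(40)   # assumption 3, about 0.04
p_all = p_stepping * p_first8 * p_next40

# log2 of the combined success probability, about -14.8
log2_p_all = log2(p_all)

# Guessing 64 bits and repeating until the assumptions hold gives an
# expected work factor of about 2**(64 + 14.8) = 2**78.8.
log2_attack = 64 - log2_p_all
```

The resulting exponent agrees with the attack complexity of $O(2^{78.8})$ stated in the abstract.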
Under these assumptions, it suffices to guess the four stages $R^1_9$, $R^1_{10}$, $R^1_{11}$, and $R^1_{13}$, a
total of 64 bits, to recover the state. The state can now be recovered with the four equations
obtained from the feedback polynomials, the output function, and the nonlinear
update of R0.
$R^0_i = \theta^0 \cdot (*)R^0_{i-6} + \mu^0 \cdot (*)R^0_{i-7}$  (6.1)
$R^1_i = \theta^1 \cdot R^1_{i-4} + \mu^1 \cdot R^1_{i-9}$  (6.2)
$Z_i = \Delta(R^0_{i+7}) + \Delta(R^1_{i+9})$  (6.3)
$*R^0_i = R^0_i + R^1_{i+2} \bmod 2^{16}$  (6.4)
The operations in (6.1)–(6.3) are in the finite field $F_{2^{16}}$, whereas the + in (6.4) is addition
modulo $2^{16}$. The constants are the ones from the feedback polynomials, and the function
$\Delta(x)$ is obtained by looking up the two bytes of x in D8 and then concatenating them.
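To see why (6.3) is easy to invert once D8 is known, consider the following sketch. It assumes that Δ maps the high byte and the low byte through D8 in that order (the byte order is our assumption), and it uses a random permutation as a stand-in for D8. The names `delta`, `delta_inverse`, and `recover_r0_stage` are hypothetical.

```python
import random

def delta(x: int, d8: list) -> int:
    """Look up both bytes of the 16-bit word x in D8 and concatenate them."""
    return (d8[x >> 8] << 8) | d8[x & 0xFF]

def delta_inverse(y: int, d8: list) -> int:
    """Invert delta using the inverse permutation of D8."""
    inv = [0] * 256
    for index, value in enumerate(d8):
        inv[value] = index
    return (inv[y >> 8] << 8) | inv[y & 0xFF]

def recover_r0_stage(z: int, r1_stage: int, d8: list) -> int:
    """Solve Z = delta(R0) + delta(R1) for the R0 stage.

    Addition in a field of characteristic 2 is XOR, so
    delta(R0) = Z xor delta(R1).
    """
    return delta_inverse(z ^ delta(r1_stage, d8), d8)

# Stand-in for a known D8: any permutation of 0..255 works here.
demo_d8 = list(range(256))
random.Random(7).shuffle(demo_d8)
```

Each known keystream word therefore turns one known stage of R1 into one stage of R0 (and vice versa) at the cost of a few table lookups, which is what makes the chain of deductions in the attack so cheap.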
From $R^1_9$, $R^1_{10}$, $R^1_{11}$, $R^1_{13}$, and (6.3), we get $R^0_7$, $R^0_8$, $R^0_9$, and $R^0_{11}$. With a knowledge
of the four stages $R^0_7$, $R^0_8$, $R^1_9$, and $R^1_{10}$, we can calculate how D8 will be permuted after
the first update of the inner state. Let the result of this permutation be $D8'$. As the
next 32 α-values are all different, we can treat D8 as a constant equal to $D8'$ during
the next five updates of the state. The rest of the stages in the registers can now be
determined in the following order.
(Where $R_i$, (6.3) → $R_j$ should be read as: $R_i$ and (6.3) give $R_j$.)
$R^0_7, R^0_9, R^0_{11}, R^1_9, R^1_{11}, R^1_{13}$, (6.4) → $*R^0_7, *R^0_9, *R^0_{11}$
$*R^0_7, R^0_8, *R^0_9$, (6.1) → $R^0_{14}, R^0_{15}$
$R^0_{14}, R^0_{15}$, (6.3) → $R^1_{16}, R^1_{17}$
$R^0_{15}, R^1_{17}$, (6.4) → $*R^0_{15}$
$R^1_{11}, R^1_{16}$, (6.2) → $R^1_{20}$
$R^1_{20}$, (6.3) → $R^0_{18}$
$*R^0_{11}, R^0_{18}$, (6.1) → $R^0_{12}$
$R^0_{12}$, (6.3) → $R^1_{14}$
$R^1_9, R^1_{14}$, (6.2) → $R^1_{18}$
$R^1_{18}$, (6.3) → $R^0_{16}$
$*R^0_9, R^0_{16}$, (6.1) → $R^0_{10}$
$R^0_{10}$, (6.3) → $R^1_{12}$
$R^0_{10}, *R^0_{11}$, (6.1) → $R^0_{17}$
$R^0_{17}$, (6.3) → $R^1_{19}$
$R^0_{17}, R^1_{19}$, (6.4) → $*R^0_{17}$
$R^1_{10}, R^1_{19}$, (6.2) → $R^1_{15}$
$R^1_{15}$, (6.3), (6.4) → $R^0_{13}, *R^0_{13}$
steppings and ‘guessed’ values, they reach an overall attack complexity of $O(2^{57.4})$. The
attack assumes that R0 is clocked two steps in the first eight steppings, and that the
stepping sequence for R1 is {2, 3, 3, 3, 3, 2, 3, 2}. In this way, they only need to guess
the values of two stages. They also notice that, because of the known stepping, only $2^{31}$
of the $2^{32}$ values are possible.
6.2.2 Analysis
There are several unfortunate coincidences that make these attacks possible. It is relatively
straightforward to see that the attack resistance of Polar Bear does not meet its
key size. For instance, by guessing one (the shorter) LFSR value, it is possible to deduce
the value of the other by observing output. Hence, we have an attack with complexity
about $2^{112}$. A solution to this would be to make one, or both, of the LFSRs longer.
But the LFSRs would have to be almost double the length to withstand attacks like the
two described above. The IV schedule would also have to be expanded in some way, which
would probably decrease performance.
A second observation is that the dynamic permutation of bytes D8 is not permuted
before encryption starts, and is therefore initially known. If the table D8 were mixed
during the key/IV schedule, like in RC4, guess-and-determine attacks of this kind would
be avoided. But a good mixing is slow (the mixing in RC4 is both slow and
insufficient). A mixing during each IV schedule would severely reduce the performance
on short packets. A better solution, if possible, is to mix the D8 table only during the
key schedule.
A third observation is that it is too easy to determine more values as soon as some of the
stages are known. The reason is that the relations between the register stages and the
keystream involve few terms and are easy to invert, see Section 6.2. These relations
come from the feedback trinomials, the choice of the nonlinear updating, and the output
generation. It should be mentioned that an invertible next-state function is desirable to
ensure that the state does not lose entropy and converge to a fixpoint. We have tested
many different combinations of trinomials, and all seem to be equally vulnerable. If
polynomials with more than three taps were used, attacks like the one above would be
more difficult, but the cycles/byte performance would be further decreased.
Our last observation is that, even though our attack depends on the possibility of a
regular stepping, it is possible that the irregular stepping of Polar Bear actually lowered
the attack resistance. Our analysis shows that with a constant 2-stepping, it is impossible
to guess fewer than four stages. This implies that no guess-and-determine attack similar
to the two described above can have a computational complexity lower than $O(2^{61})$. So
as far as we know, the irregular stepping of Polar Bear actually had a negative impact
on security. It is however possible that a constant 2-stepping would enable other attacks
with an even lower complexity.
6.3 Evaluation with Respect to Other Attacks
6.3.1 Time-memory trade-offs
The huge state of 1956 bits seems to rule out any trade-off involving the state. A birthday
attack on the set of key/IV pairs is however possible when the IV is smaller than the
key. Polar Bear allows IV sizes as small as 8 bits. This makes a birthday attack with $2^{68}$
bits of memory and computational complexity $2^{68}$ possible. This is not practical due to
the huge amount of memory used, and the attack is not specific to Polar Bear. Almost
all eSTREAM candidates allow IVs smaller than the key, and are therefore vulnerable
to this kind of generic attack. To avoid such attacks, an IV size v of at least the key size
k should be used, or the security of the cipher should be stated to be (k + v)/2.
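The rule of thumb (k + v)/2 can be turned into a small helper. This is only a restatement of the bound above, with a hypothetical function name; it caps the result at k because exhaustive key search always costs at most $2^k$.

```python
def effective_security_bits(key_bits: int, iv_bits: int) -> float:
    """Security level against a birthday attack on key/IV pairs.

    The attack needs about 2**((k + v) / 2) time and memory, and
    exhaustive key search needs 2**k, so the smaller exponent decides.
    """
    return min(key_bits, (key_bits + iv_bits) / 2)
```

With a 128-bit key, an 8-bit IV gives 68 bits, whereas any IV of 128 bits or more gives the full 128 bits.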
As small IV sizes enable birthday attacks on the set of key/IV pairs, we recommend
that the allowed minimum size is raised. We also think that the consequences of such
small IVs should be clearly stated.
This will only affect the performance of the key schedule. As far as we have been
able to tell, no other change is needed.
The reason that the table is mixed three full rounds is that the complexity to get a
uniform distribution with the above algorithm is Θ(n log n), where n is the size of the
table [39], and this holds only when the expanded key is a truly random sequence. The RC4
key schedule only mixes the table one full round, and this makes the cipher vulnerable
to attacks. According to Mironov [38], this weakness does not disappear unless the table
is shuffled two or three full rounds. The AES key expansion is far from random; it
is in fact quite weak, especially when related keys are considered. Our analysis shows
that if two 128-bit keys differing in one bit are used, the resulting expanded keys will
on average differ in as few as 49 bits of the first 64 bytes, depending on where the
differing bit is. The AES key schedule is therefore insufficient for the above purpose. To
fix this, another key expansion can be used. We suggest that the key expansion from the
hash function Whirlpool [6] is used instead. We also suggest that this expanded key is
used in AES during the IV schedule. Very little of the security of AES depends on properties
of the key schedule. For that reason we do not think that the change makes Polar Bear’s
IV schedule weaker from a security perspective.
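The kind of experiment behind this observation can be sketched as follows. The code assumes the standard AES-128 key expansion as published in FIPS-197 [3], builds the S-box from its algebraic definition, and counts the differing bits in the first 64 bytes of two expanded keys. The function names and the bit-numbering convention (bit 0 is the most significant bit of the first key byte) are our own choices; this is not the code used for the analysis in this thesis.

```python
def gf8_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def build_sbox() -> list:
    """Rijndael S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):       # x^254 is the inverse of x
                inv = gf8_mul(inv, x)
        s = inv
        for shift in range(1, 5):      # XOR with four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def expand_key_128(key: bytes) -> bytes:
    """AES-128 key expansion: 16 key bytes to 176 expanded key bytes."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # RotWord
            temp = [SBOX[b] for b in temp]      # SubWord
            temp[0] ^= rcon
            rcon = gf8_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return bytes(b for w in words for b in w)

def differing_bits(key: bytes, bit: int, prefix: int = 64) -> int:
    """Flip one key bit and count differing bits in the expanded-key prefix."""
    flipped = bytearray(key)
    flipped[bit // 8] ^= 0x80 >> (bit % 8)
    a = expand_key_128(key)
    b = expand_key_128(bytes(flipped))
    return sum(bin(x ^ y).count("1") for x, y in zip(a[:prefix], b[:prefix]))
```

Averaging `differing_bits(key, bit)` over random keys for each bit position gives the kind of figure quoted above. For an ideal expansion one would expect roughly half of the 384 derived bits in the 64-byte prefix to change.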
Update
Update $R^0_3$ instead of $R^0_5$ and change the indexes of the β-values.
• Update S according to $S \leftarrow S + \beta^1_0 || \beta^1_3 \bmod 2^{16}$.
Output generation
Change the indexes of the β-values in the output generation. Form four output bytes
$b_0 || b_1 || b_2 || b_3$ where
$b_0 = \beta^0_0 \oplus \beta^1_1$
$b_1 = \beta^0_1 \oplus \beta^1_2$
$b_2 = \beta^0_2 \oplus \beta^1_3$
$b_3 = \beta^0_3 \oplus \beta^1_0$
Permutation tweak
During the permutation of D8, D8(α1 ) or D8(α3 ) might have been changed in the first
swap. Because of this, they have to be read from memory a second time, which decreases
performance. By reading the β-values in two steps instead of one, the performance
of Polar Bear can be increased without affecting security. We propose the following
permutation.
On a Pentium M, this makes Pig Bear approximately 1.7 cycles/byte faster on long
streams than Polar Bear.
RC4 attacks
The main weaknesses of RC4 are that the first output bytes are not uniformly distributed,
and that there exist weak keys that reveal information about specific key bits. No similar
weaknesses have been found for Polar Bear, and Pig Bear should be even stronger in
this respect. As the table in Pig Bear is far better shuffled than in RC4 and the output
from the LFSRs has nice statistical properties, we do not think that such weak keys
should exist.
Related keys
It is generally bad practice to use related keys, but the topic should be discussed anyhow.
If the AES key expansion had been used, related key attacks could have been an issue.
But as Pig Bear uses the strong key expansion from Whirlpool, a changed bit in the key
will change about half of the bits in the expanded key. Hence, as far as we can
see, related keys will not be a security issue.
Chosen-IV attacks
If an attacker is able to determine the D8 permutation, a guess-and-determine attack can
probably be used to attack Pig Bear. From a security perspective, the worst case is when
one key is used a large number of times, and the attacker can choose the corresponding
IVs. We now have a constant D8 that is unknown to the attacker.
But such an attack would be harder than breaking reduced-round AES, as the attacker
does not have knowledge of the AES ciphertext. The attacker only has knowledge of a
complex function of the AES ciphertext.
If patterns in the output from the LFSRs are detectable in the keystream, this could
be used to determine the permutation. A simplified version with only one LFSR can be
broken in this way, but we believe that by outputting the XOR of the β-values, such
attacks are impossible.
Table 6.1: Performance figures
CPU                       Name            Stream  40 bytes  Agility  Key Setup  IV Setup
AMD Athlon 64 1.8 GHz     AES-CTR          18.96     23.78    20.57     187.95     12.09
                          Polar Bear∗      27.63     43.66    30.07     297.81    606.64
HP 9000/785 975 MHz       AES-CTR          17.56     25.92    19.64     215.98     79.57
                          Polar Bear∗      36.57     57.91    41.12     354.60    819.02
Intel Pentium M 1.7 GHz   SNOW-2.0          4.61     29.82     5.98      63.81    801.73
                          RC4∗∗             7.52    335.37    19.52     112.41  13005.38
                          AES-CTR          21.78     28.79    24.59     217.74     43.01
                          Polar Bear∗      39.31     59.29    42.95     273.67    783.70
Intel Pentium M 1.6 GHz   Pig Bear         20.96         .        .          .         .
                          Optimized PB     22.69     45.37    26.06     281.81    906.70
                          Polar Bear∗      39.11     60.74    42.66     269.29    851.63
PowerPC G4 1.67 GHz       AES-CTR          27.06     35.55    31.67     242.69     36.10
                          Polar Bear∗      44.45     74.52    50.86     276.64   1099.51
UltraSPARC-III 750 MHz    AES-CTR          25.05     34.62    28.50     547.06    121.50
                          Polar Bear∗      46.50     87.46    49.94     344.22   1646.77
Intel Pentium 4 2.4 GHz   AES-CTR          22.77     31.81    26.69     259.43     68.11
                          Polar Bear∗      53.40     80.06    59.27     322.85    785.01
Intel Pentium 4 3.0 GHz   SNOW-2.0          5.19     39.04     7.78      90.26   1209.19
                          RC4∗∗            10.98    581.88    15.33     193.34  22663.12
                          AES-CTR          24.13     33.91    28.01     286.04     93.16
                          Optimized PB     30.91     58.22    34.71     343.57    859.00
Stream – Asymptotic encryption rate (cycles/byte). 40 bytes – Packet encryption rate (cycles/byte). Agility –
Parallel encryption rate (cycles/byte). Key and IV setup – The efficiency of the key setup and IV setup (cycles)
∗ These performance tests are made with first reference code that included the permutation error. The corrected
support for initialization vectors, the benchmarks are for RC4 with a 256 bit key.
testing framework, and all data except the ones for Intel Pentium M 1.6 GHz are taken
from the framework’s homepage [10]. The performance figures for the Pentium M 1.6
were created by us with the testing framework Live CD. The Live CD includes Ubuntu
Linux, Intel C++ Compiler 8.1, Microsoft Visual C++ Toolkit 2003, and several different
versions of GCC. The source code is compiled with all three compilers with a large number
of compiler options, and the one with the fastest stream performance is chosen.
The stream performance is probably the most important criterion, as it is here that
stream ciphers have the biggest potential advantage over block ciphers. The key setup is
probably the least critical, as the time for key setup is typically negligible compared to the work
needed to generate and exchange the key [10].
It should be noted that the data from the two Pentium 4 processors cannot be directly
compared, as the 3.0 GHz version actually performs less work per clock cycle (probably because
of memory bottlenecks).
As all optimization was done on a Pentium M, the difference in performance can be
expected to be largest on this architecture. As the values in the table above were created
with the compiler and options that gave the best performance on long streams, the other
benchmarks, such as the number of cycles for the IV schedule, are not comparable. The IV
schedule is not slower in the optimized code than in the original code. The optimized
code is actually a little bit faster, but the value changes with different compiler options.
All variants of the Polar Bear code are fastest with Intel’s compiler, SNOW 2.0 and
RC4 are fastest with GCC, whereas Microsoft Visual C++ creates the fastest code for
AES-CTR.
Name Stream
Pig Bear RF∗ 12.69
Pig Bear R 15.68
Pig Bear F∗ 18.62
Pig Bear 20.96
Optimized PB 22.69
Asymptotic encryption rate in cycles/byte on Pentium M 1.6 GHz.
∗ These figures are from a modified Pig Bear source, and can likely be optimized further.
Regular stepping
As it is not certain that the irregular stepping improved the security, one might argue
that it should be removed. Such a tweaked version (Pig Bear R) with only 2-steppings
improves the stream performance drastically (see Table 6.2). The reason is probably that
the branch prediction of the CPU works better.
Fewer swaps
Another tweak that improves the performance is fewer swaps in the table D8.
Polar Bear does twice as many swaps per output byte as RC4. If the
α-values from the LFSR R0 are not filtered through D8 (Pig Bear F), the encryption
speed for long streams is improved (see Table 6.2). The four output bytes $b_0 || b_1 || b_2 || b_3$
now become
$b_0 = \alpha^0_0 \oplus \beta^1_1$
$b_1 = \alpha^0_1 \oplus \beta^1_2$
$b_2 = \alpha^0_2 \oplus \beta^1_3$
$b_3 = \alpha^0_3 \oplus \beta^1_0$
If the tweaks are combined, the output measured in Mbit/s is improved by 65 percent
over Pig Bear. The tweaked versions of Pig Bear, R and F, are not as optimized as the
Pig Bear implementation and can probably be optimized further. The above tweaks are
not recommended on short streams unless further analysis of their security implications
is made. We do however recommend that the tweaks are used on long streams. The
cipher could switch to this simpler form when the table is thoroughly shuffled, perhaps
after 768 or 1024 output bytes. We believe that the D8 table makes the cipher secure
asymptotically.
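The 65 percent figure follows directly from the cycles/byte values in Table 6.2. A small sketch of the arithmetic, assuming the 1.6 GHz clock of the Pentium M used for the measurements (the helper name is hypothetical):

```python
def mbit_per_second(cycles_per_byte: float, clock_hz: float) -> float:
    """Convert an asymptotic encryption rate in cycles/byte to Mbit/s."""
    return clock_hz / cycles_per_byte * 8 / 1e6

CLOCK_HZ = 1.6e9
pig_bear = mbit_per_second(20.96, CLOCK_HZ)      # Pig Bear
pig_bear_rf = mbit_per_second(12.69, CLOCK_HZ)   # Pig Bear RF, both tweaks

# Relative gain in throughput; the clock rate cancels out.
gain = pig_bear_rf / pig_bear - 1
```

Since the clock rate cancels, the gain is simply 20.96/12.69 − 1, or about 65 percent.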
Chapter 7
Conclusions
The original Polar Bear specification had a major weakness as it could be attacked
using a guess-and-determine attack with a computational complexity of only $O(2^{57.4})$
(Hasanzadeh et al.). We believe that Polar Bear can be made secure by adding a key-dependent
pre-mixing of the D8 table in conjunction with the key schedule. Further
tweaks strengthen the security and improve the performance on long sequences.
We have not found any other weaknesses in Polar Bear, and it seems resistant to all
known generic attacks. Polar Bear and Pig Bear pass all of the statistical tests in the
NIST statistical test suite. Polar Bear also passes new statistical tests that focus on
correlation and are tailored for stream ciphers.
The first evaluation phase of eSTREAM ended in late March 2006. The 35 candidates
are initially classified into three categories.
• Focus Phase 2 – Ciphers of particular interest. Mainly unbroken ciphers with very
good performance.
• Phase 2 – Ciphers moved to the second phase. Mostly broken ciphers with good
performance.
• Archived – Ciphers no longer considered for the final portfolio.
The main criteria for the classification are cryptanalysis and performance, but instead
of exact limits, a committee is evaluating each cipher. Further, patented ciphers have
not been placed in the focus category. The authors are permitted to submit tweaks to
their algorithms before June 30, 2006.
Polar Bear has advanced to the second phase of eSTREAM in both the software
and hardware profile. Polar Bear is in both cases placed in the Phase 2 category. The
tweak will consist of the one suggested in this thesis. Unfortunately, Polar Bear with its
current implementation does not have any “significant performance advantage compared
to AES”. The performance tests show that Polar Bear performs worst of the remaining
ciphers in the software profile. It is a reasonable guess that Polar Bear will not survive
the second classification at the end of 2006 unless this is changed.
It might be argued that it is unfair to compare the performance of new ciphers
with the best known C implementation of AES, a cipher that has been an
international standard for five years. The number of hours devoted by a large number of people to
optimizing the performance of AES can probably be counted in the tens of thousands.
Bibliography
[3] NIST FIPS-PUB 197. Advanced Encryption Standard (AES). Technical report,
November 2001.
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[4] Frederik Armknecht and Matthias Krause. Algebraic attacks on combiners with
memory. In Advances in Cryptology - CRYPTO 2003, pages 162–175. Springer-
Verlag, 2003
[5] S. H. Babbage. Improved exhaustive search attacks on stream ciphers. In ECOS 95
(European Convention on Security and Detection), pages 161–166. IEEE Conference
publication, May 1995
[6] Paulo S.L.M. Barreto and Vincent Rijmen. The Whirlpool Hashing Function, 2003.
http://planeta.terra.com.br/informatica/paulobarreto/whirlpool.zip
[7] Norman L. Biggs. Discrete Mathematics. Oxford University Press, 2003. ISBN
0198507178
[8] Robert G. Brown. DieHarder: A Random Number Test Suite.
http://www.phy.duke.edu/~rgb/General/dieharder.php
[9] David Brumley and Dan Boneh. Remote Timing Attacks are Practical. In Proceed-
ings of the 12th USENIX Security Symposium, August 2003.
http://www.cs.cmu.edu/~dbrumley/pubs/openssltiming.pdf
[11] Christophe De Cannière, Joseph Lano, and Bart Preneel. Comments on the Rediscovery
of Time Memory Tradeoffs. eSTREAM, ECRYPT Stream Cipher Project,
Report 2005/040, 2005.
http://www.ecrypt.eu.org/stream/papersdir/040.pdf
[12] Carlos Cid and Henri Gilbert. AES Security Report. Technical Report ST-2002-
507932, ECRYPT, January 2006.
http://www.ecrypt.eu.org/documents/D.STVL.2-1.0.pdf
[13] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Cliff Stein. Intro-
duction to Algorithms. MIT Press, 2001. ISBN 0-262-53196-8
[14] Nicolas Courtois. Fast algebraic attacks on stream ciphers with linear feedback. In
Advances in Cryptology - CRYPTO 2003, pages 176–194. Springer-Verlag, 2003
[15] Nicolas Courtois and Josef Pieprzyk. Cryptanalysis of Block Ciphers with Overde-
fined Systems of Equations. Cryptology ePrint Archive, Report 2002/044, 2002.
http://eprint.iacr.org/2002/044.pdf
[16] Nicolas T. Courtois. Higher Order Correlation Attacks, XL algorithm and Crypt-
analysis of Toyocrypt. Cryptology ePrint Archive, Report 2002/087, 2002.
http://eprint.iacr.org/2002/087.pdf
[17] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES – The Advanced
Encryption Standard. Information Security and Cryptography. Springer-Verlag,
Berlin, 2002. ISBN 3-540-42580-2
[18] Whitfield Diffie and Martin E. Hellman. New Directions in Cryptography. IEEE
Transactions on Information Theory, 22(6):644–654, November 1976.
http://www.cs.jhu.edu/~rubin/courses/sp03/papers/diffie.hellman.pdf
[19] Patrik Ekdahl and Thomas Johansson. A New Version of the Stream Cipher SNOW.
In SAC ’02: Revised Papers from the 9th Annual International Workshop on Se-
lected Areas in Cryptography, pages 47–61, London, UK, 2003. Springer-Verlag.
ISBN 3-540-00622-2.
http://www.it.lth.se/cryptology/snow/snow20.pdf
[20] Diane Erdmann and Sean Murphy. An Approximate Distribution for the Maximum
Order Complexity. Des. Codes Cryptography, 10(3):325–339, 1997.
http://www.isg.rhul.ac.uk/~sean/moc.ps
[21] Scott R. Fluhrer and David A. McGrew. Statistical Analysis of the Alleged RC4
Keystream Generator. In FSE ’00: Proceedings of the 7th International Workshop
on Fast Software Encryption, pages 19–30, London, UK, 2001. Springer-Verlag.
ISBN 3-540-41728-1
[22] P. R. Geffe. How to Protect Data with Ciphers That Are Really Hard to Break.
Electronics, pages 99–101, Jan 1973
[23] Jovan Dj. Golić. Cryptanalysis of alleged A5 stream cipher. In Eurocrypt’97 LNCS
1233, pages 239–255. Springer-Verlag, 1997.
http://www.gsm-security.net/papers/Cryptanalysis_of_Alleged_A5_Stream_Cipher.pdf
[24] Johan Håstad and Mats Näslund. BMGL: Synchronous Key-stream Generator with
Provable Security, 2000.
https://www.cosic.esat.kuleuven.ac.be/nessie/workshop/submissions/bmgl4.pdf
[25] Johan Håstad and Mats Näslund. The Stream Cipher Polar Bear. eSTREAM,
ECRYPT Stream Cipher Project, Report 2005/021, 2005.
http://www.ecrypt.eu.org/stream/ciphers/polarbear/polarbear.pdf
[26] P. Hawkes and G. Rose. The applicability of distinguishing attacks against stream
ciphers. In Proceedings of the Third NESSIE Workshop, 2002.
http://eprint.iacr.org/2002/142.pdf
[27] Jin Hong and Palash Sarkar. Rediscovery of Time Memory Tradeoffs. Cryptology
ePrint Archive, Report 2005/090, 2005.
http://eprint.iacr.org/2005/090.pdf
[28] Fredrik Jönsson. Some results on fast correlation attacks. PhD thesis, Lund Uni-
versity, May 2002.
http://homes.esat.kuleuven.be/~jlano/stream/papers/jon02.ps
[29] Andrew Klapper and Mark Goresky. Feedback Shift Registers, 2-Adic Span, and
Combiners With Memory. Journal of Cryptology, 10(2):111–147, March 1997.
http://www.math.ias.edu/~goresky/pdf/2adic.jour.pdf
[30] Andrew Klapper and Jinzhong Xu. Algebraic feedback shift registers. Theor.
Comput. Sci., 226(1-2):61–92, 1999.
[31] Elham Shakour, Mahdi Hasanzadeh, and Shahram Khazaei. Improved Cryptanalysis
of Polar Bear. eSTREAM, ECRYPT Stream Cipher Project, Report 2005/084,
2005.
http://www.ecrypt.eu.org/stream/papersdir/084.pdf
[32] Itsik Mantin and Adi Shamir. A Practical Attack on Broadcast RC4. In FSE ’01:
Revised Papers from the 8th International Workshop on Fast Software Encryption,
pages 152–164, London, UK, 2002. Springer-Verlag. ISBN 3-540-43869-6.
http://www.wisdom.weizmann.ac.il/~itsik/RC4/Papers/bc_rc4.ps
[33] George Marsaglia. The Marsaglia Random Number CDROM including the Diehard
Battery of Tests of Randomness.
[34] James L. Massey. Shift-Register Synthesis and BCH Decoding. IEEE Transactions
on Information Theory, IT-15(1):122–127, January 1969.
[35] John Mattsson. A Guess-and-Determine Attack on the Stream Cipher Polar Bear.
eSTREAM, ECRYPT Stream Cipher Project, Report 2006/017, 2006.
http://www.ecrypt.eu.org/stream/papersdir/2006/017.pdf
[36] Willi Meier and Othmar Staffelbach. Fast correlation attacks on certain stream
ciphers. Journal of Cryptology, 1(3):159–176, 1989.
[37] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of
Applied Cryptography. CRC Press, 1996. ISBN 0-8493-8523-7.
http://www.cacr.math.uwaterloo.ca/hac/
[38] Ilya Mironov. (Not So) Random Shuffles of RC4. In CRYPTO ’02: Proceedings of
the 22nd Annual International Cryptology Conference on Advances in Cryptology,
pages 304–319, London, UK, 2002. Springer-Verlag. ISBN 3-540-44050-X.
http://eprint.iacr.org/2002/067.pdf
[39] Elchanan Mossel, Yuval Peres, and Alistair Sinclair. Shuffling by semi-random
transpositions, 2004.
http://arxiv.org/PS_cache/math/pdf/0404/0404438.pdf
[40] Ron Rivest, Adi Shamir, and Len Adleman. A Method for Obtaining Digital Signa-
tures and Public-Key Cryptosystems. Communications of the ACM, 21(2):120–126,
1978.
http://theory.lcs.mit.edu/~rivest/rsapaper.pdf
[44] Marcus Schafheutle and Stefan Pyka. Stream Ciphers. Technical report, NESSIE
consortium, February 2003. 103–122 pp.
https://www.cosic.esat.kuleuven.be/nessie/deliverables/D20-v2.pdf
[45] Bea Uusma Schyffert and Kari Modén. En grisbjörn. In Birgitta Westin, editor,
Bom Bom, page 31. Rabén & Sjögren, Stockholm, 2005. ISBN 91-29-65987-6.
[46] Adi Shamir and Eran Tromer. Acoustic cryptanalysis: on nosy people and noisy
machines.
http://www.wisdom.weizmann.ac.il/~tromer/acoustic/
[47] Claude Shannon. Communication Theory of Secrecy Systems. Bell System Technical
Journal, 28(4):656–715, 1949.
http://www.cs.ucla.edu/~jkong/research/security/shannon1949.pdf
[48] Thomas Siegenthaler. Correlation-immunity of nonlinear combining functions for
cryptographic applications. IEEE Transactions on Information Theory, 30(5):776–
780, 1984.
[49] Thomas Siegenthaler. Decrypting a Class of Stream Ciphers Using Ciphertext Only.
IEEE Transactions on Computers, 34(1):81–85, January 1985.
[50] Simon Singh. The Code Book: The Science of Secrecy from Ancient Egypt to
Quantum Cryptography. Anchor Books, New York, 2000. ISBN 0-385-49532-3.
[51] Meltem Sönmez Turan, Ali Doğanaksoy, and Çağdas Çalik. Statistical Analysis of
Synchronous Stream Ciphers. eSTREAM, ECRYPT Stream Cipher Project, Report
2006/012, 2006.
http://www.ecrypt.eu.org/stream/papersdir/2006/012.pdf
[52] A.C. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd
IEEE Symposium on Foundations of Computer Science, pages 80–91, 1982.
[53] Amr M. Youssef and Guang Gong. On the Quadratic Span of Binary Sequences.
Technical Report CORR 2000-20, University of Waterloo, March 2000.
http://www.cacr.math.uwaterloo.ca/techreports/2000/corr2000-20.ps