Crypto Primer
Basic Terminology
Algorithms
The following sections describe commonly used cryptographic algorithms and
methods and explain the basic underlying concepts.
Public-key cryptosystems were invented in the 1970s, with some help
from the development of complexity theory around that time. It was
observed that, by building on a problem so difficult that it would take
thousands of years to solve, a cryptosystem could be developed which
would have two keys, a private key and a public key. With the public key one
could encrypt messages, and decrypt them with the private key. Thus the
owner of the private key would be the only one who could decrypt the
messages, but anyone knowing the public key could send them in privacy.
Another idea that emerged was that of key exchange: in two-party
communication it would be useful to generate a common secret key for bulk
encryption using a secret-key cryptosystem (for example, some block
cipher).
Indeed, Whitfield Diffie and Martin Hellman used ideas from number theory
to construct a key exchange protocol that started the era of public-key
cryptosystems. Shortly after that, Ron Rivest, Adi Shamir, and Leonard
Adleman developed the first real public-key cryptosystem capable of both
encryption and digital signatures.
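To make the key exchange idea concrete, the following is a minimal sketch of
Diffie-Hellman over the integers modulo a prime. The prime and generator here
are toy values chosen only for this illustration; real deployments use much
larger, standardized parameters.

    # Toy Diffie-Hellman key exchange (illustration only; the 64-bit prime
    # below is far too small for real use and g is simply assumed suitable).
    import secrets

    p = 0xFFFFFFFFFFFFFFC5              # a 64-bit prime (toy value)
    g = 5                               # assumed base for this sketch

    a = secrets.randbelow(p - 2) + 1    # Alice's secret exponent
    b = secrets.randbelow(p - 2) + 1    # Bob's secret exponent

    A = pow(g, a, p)                    # Alice sends A = g^a mod p
    B = pow(g, b, p)                    # Bob sends   B = g^b mod p

    shared_alice = pow(B, a, p)         # Alice computes (g^b)^a mod p
    shared_bob = pow(A, b, p)           # Bob computes   (g^a)^b mod p
    assert shared_alice == shared_bob   # both now hold the same secret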
Terminology
The easy instance of factoring is the case where the given integer has
only small prime factors. For example, 759375 is easy to factor as we
can write it as 3^5 * 5^5. In cryptography we want to use only those
integers that have only large prime factors. Preferably we select an
integer with two large prime factors, as is done in the RSA
cryptosystem.
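As a quick check of the example above, a few lines of Python confirm the
factorization and show naive trial division, which finds small factors easily
but is hopeless against the large primes used in RSA:

    # Verify 759375 = 3^5 * 5^5 and factor it by naive trial division.
    n = 759375
    assert 3**5 * 5**5 == n

    def trial_division(n):
        """Return the prime factors of n found by trying small divisors."""
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(trial_division(759375))   # [3, 3, 3, 3, 3, 5, 5, 5, 5, 5]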
Currently one of the best factoring algorithms is the number field sieve
algorithm (NFS) that consists of a sieving phase and a matrix step.
The sieving phase can be distributed (and has been several times)
among a large number of participants, but the matrix step needs to be
performed on large supercomputers. The effectiveness of the NFS
algorithm becomes apparent for very large integers: it can factor any
integer of size about 10^150 in a few months' time. The NFS algorithm takes
sub-exponential time (which is still not very efficient).
The numbers 0, m, 2m, 3m, ... all cover the same point on the circle,
and therefore are said to be in the same equivalence class (we also
write "0 = m = 2m = ... (mod m)"). Each equivalence class has a least
representative in 0 .. m-1. So you can write any integer n as t + km
for some integer k, where 0 <= t < m. It is a convention to write n = t
(mod m) in this case. Here m is said to be the modulus.
It can be shown that you can add, subtract and multiply with these
classes of integers (modulo some m).
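A few lines of Python illustrate these conventions: the % operator returns the
least representative in 0 .. m-1, and addition and multiplication are
compatible with the equivalence classes.

    # Modular arithmetic with m = 7: every integer falls into one of the
    # classes 0..6, and arithmetic can be done on the representatives.
    m = 7
    n = 40
    t = n % m                      # least representative, here 5
    k = n // m                     # so that n = t + k*m
    assert n == t + k * m

    # Addition and multiplication respect the equivalence classes:
    assert (23 + 40) % m == ((23 % m) + (40 % m)) % m
    assert (23 * 40) % m == ((23 % m) * (40 % m)) % m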
The discrete logarithm problem in the finite field GF(p) is then stated
as follows:
given two positive non-zero integers a, g (both less than p), compute
n such that a = g^n (mod p). We can choose g so that a solution for n
exists for any non-zero a. To make this problem cryptographically hard,
p should be a large prime number (about 10^300) and n, in general, of
the same magnitude.
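For tiny parameters the discrete logarithm can be found by exhaustive search,
which makes it clear why p must be enormous in practice. A minimal sketch (the
prime 101 and base 2 are toy choices):

    # Brute-force discrete logarithm in GF(p) for a toy prime. With p
    # around 10^300 this loop would never terminate, which is the point.
    p, g = 101, 2          # toy prime; 2 is a generator modulo 101
    a = pow(g, 57, p)      # pretend we only see a = g^n mod p

    def dlog(a, g, p):
        x = 1
        for n in range(p - 1):
            if x == a:
                return n
            x = (x * g) % p
        return None        # a is not in the subgroup generated by g

    print(dlog(a, g, p))   # 57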
The problem of finding the shortest vector in a lattice (using the usual
Euclidean distance) is an NP-hard problem (for lattices of sufficiently
large dimension).
Practical cryptosystems
For a good survey on appropriate key lengths see Lenstra and Verheul's
Selecting Cryptographic Key Sizes (appeared in Public Key Cryptography
2000). They present a complete analysis of key sizes for almost all
cryptosystems.
Below, along with each cryptosystem, you will find the current
recommendations for key sizes where appropriate. These recommendations
are not always equal to Lenstra's and Verheul's.
The key size (the size of the modulus) should be greater than 1024
bits (i.e. it should be of magnitude 10^300) for a reasonable margin of
security. Keys of size, say, 2048 bits should give security for decades.
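To illustrate what the modulus is, here is a toy RSA key generation and
encryption with absurdly small primes. This only sketches the mechanics
(Python 3.8+ is assumed for the modular inverse); as noted above, a real
modulus must be at least 1024 bits, preferably 2048.

    # Toy RSA with tiny primes -- for illustration only, trivially factorable.
    p, q = 61, 53
    n = p * q                      # the modulus (here 3233, i.e. 12 bits)
    phi = (p - 1) * (q - 1)
    e = 17                         # public exponent, coprime to phi
    d = pow(e, -1, phi)            # private exponent: e*d = 1 (mod phi)

    msg = 65
    cipher = pow(msg, e, n)        # encrypt with the public key (n, e)
    plain = pow(cipher, d, n)      # decrypt with the private key (n, d)
    assert plain == msg
    print(n.bit_length())          # 12 -- a real modulus needs 1024+ bits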
The equivalence to factoring means only that being able to decrypt any
message encrypted by the Rabin cryptosystem enables one to factor
the modulus. Thus it is no guarantee of security in the strong sense.
The main problem with DSS is the fixed subgroup size (the order of
the generator element), which limits the security to around only 80
bits. Hardware attacks can be a concern to some implementations of
DSS. However, it is widely used and accepted as a good algorithm.
The points on elliptic curves can be added together and they form a
structure called a group (in fact an abelian group). This is just a way of
saying that you can do arithmetic with them as you can do with
integers when using just addition and subtraction.
The security of elliptic curve cryptosystems has been rather stable for
years, although significant advances have been achieved in attacks
against special instances. Nevertheless, these advances had been
conjectured by leading researchers several years earlier, and no great
surprises have yet emerged.
The fact that values used in the LUC algorithms can be represented as
a pair of values gives some additional advantage over just using
integers modulo p. The computations only involve numbers needing
half the bits that would be required in the latter case. As the LUC
group operation is easy to compute this makes LUC algorithms
competitive with RSA and DSS.
Knapsacks
Lattices
Some of the initial versions had problems, but the current version has
been proposed for some US standards.
Secret key algorithms use the same key for both encryption and decryption
(or one is easily derivable from the other). This is the more straightforward
approach to data encryption: it is mathematically less complicated than
public-key cryptography and has been used for many centuries.
Terminology
Some of the more well-known stream ciphers are RC4 and SEAL.
Several stream ciphers are based on linear-feedback shift registers
(LFSR), such as A5/1 used in GSM. These have the benefit of being
very fast (several times faster than usual block ciphers).
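The following is a minimal sketch of a single LFSR keystream generator. The
register size and tap positions are arbitrary choices for this example; real
designs such as A5/1 combine several registers nonlinearly, since a single
LFSR on its own is cryptographically weak.

    # Toy 16-bit linear-feedback shift register producing one keystream bit
    # per step. Taps chosen arbitrarily for illustration; a single LFSR like
    # this is NOT secure by itself.
    def lfsr_bits(state, taps=(0, 2, 3, 5), nbits=16):
        mask = (1 << nbits) - 1
        while True:
            out = state & 1                      # output the low bit
            fb = 0
            for t in taps:                       # feedback = XOR of tap bits
                fb ^= (state >> t) & 1
            state = ((state >> 1) | (fb << (nbits - 1))) & mask
            yield out

    gen = lfsr_bits(0xACE1)
    keystream = [next(gen) for _ in range(16)]
    print(keystream)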
It is easy to see that a one-bit error in a ciphertext cannot affect the
decrypted plaintext after n bits. This makes the cipher self-
synchronizing.
The block cipher used should have sufficiently large block size to avoid
substitution attacks, for example.
The S-box may even be the only non-linear part of the cipher. This is
the case in the block cipher DES, where the S-boxes may thus be
regarded as the single most important part of the algorithm. In fact,
many consider DES's S-boxes so good that they use them in their own
designs (for example, Serpent).
If the round function depends on, say, k bits of a key, then the Feistel
cipher requires r*k bits of round key material, where r is the number of
rounds used.
The security of the Feistel structure is not obvious, but analysis of DES
has shown that it is a good way to construct ciphers. It is essential that
a Feistel cipher have enough rounds, but simply adding more rounds does
not by itself guarantee security.
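The structure itself is easy to write down. Below is a sketch of a generic
Feistel network with a deliberately weak, made-up round function; it only
shows how encryption and decryption mirror each other and is not taken from
any real cipher.

    # Generic Feistel network. The round function F here is a toy stand-in
    # (NOT from any real cipher); only the overall structure matters.
    def F(half, round_key):
        return (half * 31 + round_key) & 0xFFFFFFFF   # weak toy round function

    def feistel_encrypt(left, right, round_keys):
        for k in round_keys:
            left, right = right, left ^ F(right, k)
        return left, right

    def feistel_decrypt(left, right, round_keys):
        for k in reversed(round_keys):
            left, right = right ^ F(left, k), left
        return left, right

    keys = [0x1234, 0x5678, 0x9ABC, 0xDEF0]           # toy round keys
    ct = feistel_encrypt(0xDEADBEEF, 0xCAFEBABE, keys)
    assert feistel_decrypt(*ct, keys) == (0xDEADBEEF, 0xCAFEBABE)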
• The operation of taking the user key and expanding it into r*k bits for
the Feistel rounds is called key scheduling. This is often a non-linear
operation, so that finding out any of the r*k bits of the round keys does
not directly provide any information about the actual user key. There
are many ciphers that have this basic structure; Lucifer, DES, and
Twofish, just to name a few.
• bitslice operations (bitwise logic operations XOR, AND, OR, NOT and
bit permutations): The idea of bitslice implementations of block ciphers
is due to Eli Biham. It is common practice in vector machines to
achieve parallel operation. However, Biham applied it on serial
machines by using large registers as available in modern computers.
The term "bitslice" is due to Matthew Kwan.
Many commonly used ciphers are block ciphers. Block ciphers transform a
fixed-size block of data (usually 64 bits) into another fixed-size block
(possibly 64 bits wide again) using a function selected by the key. If the
key, input block, and output block all have n bits, then for each key a
block cipher defines a one-to-one mapping (a permutation) on the set of
n-bit blocks.
If the same block is encrypted twice with the same key, the resulting
ciphertext blocks are also the same (this mode of encryption is called
electronic code book, or ECB). This information could be useful for an
attacker. To make identical plaintext blocks encrypt to different
ciphertext blocks, three standard modes are commonly used: cipher block
chaining (CBC), cipher feedback (CFB), and output feedback (OFB).
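The effect can be seen even with a toy "block cipher" (here just a keyed byte
permutation invented for this sketch): under ECB, equal plaintext blocks give
equal ciphertext blocks, while CBC-style chaining hides the repetition by
XORing each plaintext block with the previous ciphertext block.

    # Toy demonstration of ECB vs CBC. The "block cipher" is a fake keyed
    # permutation on single bytes -- it exists only to show the modes.
    import random

    def make_toy_cipher(key):
        rng = random.Random(key)
        table = list(range(256))
        rng.shuffle(table)                      # a keyed permutation of bytes
        return lambda block: table[block]

    E = make_toy_cipher(key=42)
    plaintext = [7, 7, 7, 7]                    # four identical "blocks"

    ecb = [E(p) for p in plaintext]             # repetition is visible
    prev, cbc = 99, []                          # 99 plays the role of the IV
    for p in plaintext:                         # CBC: XOR with previous block
        c = E(p ^ prev)
        cbc.append(c)
        prev = c

    print(ecb)   # the same value four times
    print(cbc)   # the repetition is no longer directly visible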
The one-time pad (OTP) is the only cipher that has been proven to be
unconditionally secure, i.e. unbreakable even by an attacker with unlimited
computing power. It has also been proven that any unbreakable,
unconditionally secure cipher must in principle be a one-time pad.
The practical problem is that the key is not of small, constant length
but is as long as the message itself, and no part of a key should ever be
used twice (or the cipher can be broken). So we have merely traded the
problem of exchanging secret data for the problem of exchanging secret
random keys of the same length. However, this cipher has allegedly been
in widespread use since its invention, and even more so since the security
proof by C. Shannon in 1949. Although admittedly the security of this
cipher had been conjectured earlier, it was Shannon who actually found a
formal proof for it.
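A one-time pad is simply an XOR of message and key; a minimal sketch (the key
must be truly random, as long as the message, and never reused):

    # One-time pad: XOR the message with an equally long, truly random,
    # never-reused key. Decryption is the same XOR applied again.
    import os

    message = b"ATTACK AT DAWN"
    key = os.urandom(len(message))                 # as long as the message

    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    recovered = bytes(c ^ k for c, k in zip(ciphertext, key))
    assert recovered == message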
DES
DES is a block cipher with a 64-bit block size. It uses 56-bit keys. This makes
it susceptible to exhaustive key search with modern computers and special-
purpose hardware. DES is still strong enough to keep most random hackers
and individuals out, but it is easily breakable with special hardware by
government, criminal organizations, or major corporations. DES is getting
too weak, and should not be used in new applications. As a consequence,
NIST proposed in 2004 to withdraw the DES standard.
A variant of DES, Triple-DES (also 3DES) is based on using DES three times
(normally in an encrypt-decrypt-encrypt sequence with three different,
unrelated keys). Triple-DES is arguably much stronger than (single)
DES; however, it is rather slow compared to some newer block ciphers.
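The encrypt-decrypt-encrypt composition can be sketched abstractly as follows.
The per-block functions here are trivial stand-ins invented for this
illustration, not real DES; only the way the three keys are composed matters.

    # Structure of Triple-DES (EDE): C = E_k3( D_k2( E_k1( P ) ) ).
    # The block functions below are trivial stand-ins, NOT real DES;
    # they only serve to show the composition and its inverse.
    def E(block, key):                 # stand-in "encryption" of one block
        return (block + key) % 2**64

    def D(block, key):                 # matching "decryption"
        return (block - key) % 2**64

    def triple_encrypt(p, k1, k2, k3):
        return E(D(E(p, k1), k2), k3)

    def triple_decrypt(c, k1, k2, k3):
        return D(E(D(c, k3), k2), k1)

    p = 0x0123456789ABCDEF
    c = triple_encrypt(p, 111, 222, 333)
    assert triple_decrypt(c, 111, 222, 333) == p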
Also, the design was exceptionally good for a cipher that was meant to be
used only a few years. DES proved to be a very strong cipher and it took
over a decade for any interesting cryptanalytical attacks against it to develop
(not to underestimate the pioneering efforts that led to this breakthrough).
The development of differential cryptanalysis and linear cryptanalysis opened
ways to really understand the design of block ciphers.
Although at the time of DES's introduction its design philosophy was kept
secret, this did not discourage its analysis; quite the contrary. Some
information has since been published about its design, and one of the
original designers, Don Coppersmith, has commented that they discovered
ideas similar to differential cryptanalysis already while designing DES in
1974. However, it was only a matter of time before these fundamental ideas
were rediscovered.
AES
All five finalist ciphers have a 128-bit block size and support 128-, 192-,
and 256-bit keys. The rather large key sizes were probably required partly
to give a means for constructing efficient hash functions from the ciphers.
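As a brief aside on how AES is typically invoked today, here is a hedged
sketch using the third-party Python cryptography package (an assumption about
the environment; any conforming AES implementation would do). It encrypts a
single 16-byte block with a 256-bit key in CBC mode.

    # Sketch of AES-256-CBC using the third-party 'cryptography' package
    # (pip install cryptography). Padding is omitted because the example
    # plaintext is exactly one 16-byte block.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)                       # 256-bit key
    iv = os.urandom(16)                        # 128-bit IV, fresh per message

    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(b"exactly 16 bytes") + encryptor.finalize()

    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    assert decryptor.update(ciphertext) + decryptor.finalize() == b"exactly 16 bytes"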
NIST stated that all five finalists had adequate security and that there
was nothing wrong with the other four ciphers. After all analysis and
received comments were considered, NIST considered Rijndael the
best choice for the AES. The other four finalists are mentioned below.
RC6 follows the ideas of RC5 - but with many improvements. For
example, it attempts to avoid some of the differential attacks against
RC5's data dependent rotations. However, there are some attacks that
get quite far, and it is unclear whether RC6 is well enough analyzed
yet.
The 32 rounds lead to probably the highest security margin of all AES
candidates, while the cipher is still fast enough for all purposes.
This cipher has key dependent S-boxes like Blowfish (another cipher
by Bruce Schneier).
Blowfish
Blowfish utilizes the idea of randomized S-boxes: while doing key scheduling,
it generates large pseudo-random lookup tables by doing several
encryptions. The tables depend on the user supplied key in a very complex
way. This approach has proven to be highly resistant against many
attacks such as differential and linear cryptanalysis. Unfortunately it also
means that it is not the algorithm of choice for environments where a large
memory space (something like 4096 bytes) is not available.
The only known attacks against Blowfish are based on its weak key classes.
CAST-128
CAST-128 is a DES-like Substitution-Permutation Network (SPN)
cryptosystem which appears to have good resistance to differential, linear
and related-key cryptanalysis. It has the Feistel structure and utilizes eight
fixed S-boxes. CAST-128 supports variable key lengths between 40 and 128
bits. It is described in RFC 2144.
IDEA
Rabbit (CryptiCore)
Rabbit is a recent stream cipher, based on iterating a set of coupled
nonlinear functions. It is characterized by high performance in software. A
detailed description and security analysis of Rabbit is available from the
designers' web site (www.cryptico.com).
RC4
RC4 is a stream cipher designed by Ron Rivest at RSA Data Security, Inc. It
used to be a trade secret, until someone posted source code for an algorithm
on Usenet, claiming it to be equivalent to RC4. There is very strong
evidence that the posted algorithm is indeed equivalent to RC4. The
algorithm is very fast. Its security is unknown, but breaking it does not seem
trivial either. Because of its speed, it may have uses in certain applications.
It accepts keys of arbitrary length.
RC4 is essentially a pseudo random number generator, and the output of the
generator is exclusive-ored with the data stream. For this reason, it is very
important that the same RC4 key never be used to encrypt two different data
streams.
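The algorithm that was posted (often called ARCFOUR or "alleged RC4") is short
enough to sketch in full; the version below follows that widely published
description and should be treated as illustrative.

    # Alleged RC4 (ARCFOUR) as widely published: a key-scheduling pass over
    # a 256-byte state, then a keystream generator XORed with the data.
    def rc4(key, data):
        S = list(range(256))
        j = 0
        for i in range(256):                       # key scheduling (KSA)
            j = (j + S[i] + key[i % len(key)]) % 256
            S[i], S[j] = S[j], S[i]
        i = j = 0
        out = bytearray()
        for byte in data:                          # keystream generation (PRGA)
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            out.append(byte ^ S[(S[i] + S[j]) % 256])
        return bytes(out)

    ct = rc4(b"Key", b"Plaintext")
    assert rc4(b"Key", ct) == b"Plaintext"         # the same function decrypts
    print(ct.hex())                                # bbf316e8d940af0ad3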
SEED
SEED is a block cipher developed by the Korea Information Security Agency
(KISA) in 1998. Both the block and key size of SEED are 128 bits and it has a
Feistel Network structure which is iterated 16 times. It has been designed to
resist differential and linear cryptanalysis as well as related key attacks.
SEED uses two 8x8 S-boxes and mixes the XOR operation with modular
addition. The algorithm specification is available through the SEED
homepage. SEED has been adopted as an ISO/IEC standard (ISO/IEC 18033-
3), an IETF RFC, RFC 4269 as well as an industrial association standard of
Korea (TTAS.KO-12.0004/0025).
In this section some of the famous ciphers of the past are listed, with links to
more complete information where possible.
• Fish was used by the German army in WWII to encipher high-
command communications. The traffic was produced by a stream cipher
machine, the Lorenz machine; Fish was the name given to it by British
cryptanalysts. It was important because it caused difficulties for British
analysts, who finally developed a machine called Colossus, which was
arguably the first, or one of the first, digital computers.
• Enigma was another cipher used by the Germans in World War II. The
machine used several rotors and looked like a typewriter. However,
first Polish and later British mathematicians were able to keep up with
the development of the machine. Most communication using the basic
version of Enigma was deciphered by British analysts at Bletchley Park
within a few hours of interception. The strongest Enigma variants were
used in submarine communications, but British analysts managed to
break these as well, with great implications for the Battle of the
Atlantic.
There are several good books about Enigma and Bletchley Park. In
addition, the work of the major figure of British cryptanalysis, Alan
Turing, has been explained in many articles and books. Recently his
original notes on cryptanalysis from that time have been released to the
public.
• Vigenere. This cipher uses clock arithmetic to add together the key
and the message. The difference between OTP and Vigenere is that in
Vigenere we explicitly reuse the short key several times for one
message.
Methods for attacking Vigenere ciphers include the Kasiski test, the
index of coincidence, and so on. These lead to effective attacks that
break even very short messages (relative to the key size, of course); a
toy encryption example is sketched after this list.
• Hill cipher. The Hill cipher uses matrices in clock arithmetic, and is
highly susceptible to known-plaintext attacks.
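Both of these classical ciphers reduce to a few lines of modular arithmetic.
The sketch below shows Vigenere as letter-wise addition of a repeating key and
the Hill cipher as multiplication of letter pairs by a key matrix, both modulo
26; the keys are arbitrary textbook-style examples.

    # Vigenere: add the repeating key to the message, letter by letter mod 26.
    def vigenere_encrypt(text, key):
        out = []
        for i, ch in enumerate(text):
            shift = ord(key[i % len(key)]) - ord('A')
            out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        return ''.join(out)

    print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))   # LXFOPVEFRNHR

    # Hill cipher (encryption only): multiply 2-letter blocks by a 2x2 key
    # matrix mod 26. The matrix must be invertible mod 26 to allow decryption.
    def hill_encrypt(text, K):
        out = []
        for i in range(0, len(text), 2):
            p = [ord(text[i]) - 65, ord(text[i + 1]) - 65]
            c0 = (K[0][0] * p[0] + K[0][1] * p[1]) % 26
            c1 = (K[1][0] * p[0] + K[1][1] * p[1]) % 26
            out.append(chr(c0 + 65) + chr(c1 + 65))
        return ''.join(out)

    print(hill_encrypt("HELP", [[3, 3], [2, 5]]))      # example invertible key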
MD5's ancestors, MD2 and MD4, have been broken, and there are some
concerns about the safety of MD5 as well. In 1996 a collision of the
MD5 compression function was found by Hans Dobbertin (Hans
Dobbertin, FSE'96, LNCS 1039). Although this result does not directly
compromise its security, as a precaution the use of MD5 is not
recommended in new applications.
Some machines may have special purpose hardware noise generators. Noise
from the leak current of a diode or transistor, least significant bits of audio
inputs, times between interrupts, etc. are all good sources of randomness
when processed with a suitable cryptographic hash function. It is a good
idea to acquire true environmental noise whenever possible.
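A common way to turn such noisy measurements into key material is to collect
them into a pool and run the pool through a cryptographic hash. A minimal
sketch follows; the "sources" here are simplistic placeholders, and a real
generator would gather far more entropy and keep track of how much it has.

    # Distilling environmental noise into key material with a hash function.
    # The sources below are simplistic placeholders for real noise inputs.
    import hashlib, os, time

    pool = bytearray()
    pool += os.urandom(32)                          # OS-provided randomness
    pool += time.time_ns().to_bytes(8, "big")       # timing information
    pool += str(os.getpid()).encode()               # other environmental data

    seed = hashlib.sha256(bytes(pool)).digest()     # 32 bytes of mixed output
    print(seed.hex())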
Cryptography works on many levels. On one level you have algorithms, such
as symmetric block ciphers and public-key algorithms. Building upon these
you obtain protocols, and building upon protocols you find applications (or
other protocols).
SSL is one of the two protocols for secure WWW connections (the
other is SHTTP). WWW security has become important as increasing
amounts of sensitive information, such as credit card numbers, are
being transmitted over the Internet.
SSL was originally developed by Netscape in 1994 as an open protocol
standard. The SSL Protocol version 3.0 was published as an Internet-Draft.
In 1996, SSL development became the responsibility of the Internet
Engineering Task Force (IETF), which renamed SSL to TLS (Transport
Layer Security). However, TLS 1.0 differs very little from SSL 3.0.
Extensions for TLS are described in RFC 3546.
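From an application's point of view, using TLS is usually a matter of wrapping
an ordinary socket. A minimal sketch with Python's standard ssl module (the
host name is just an example):

    # Minimal TLS client using the standard library; connects to an example
    # host, verifies its certificate, and reports the negotiated version.
    import socket, ssl

    context = ssl.create_default_context()          # sensible defaults + CA certs

    with socket.create_connection(("example.com", 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname="example.com") as tls:
            print(tls.version())                    # e.g. "TLSv1.3"
            print(tls.getpeercert()["subject"])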
These standards are developed at RSA Data Security and define safe
ways to use RSA. The PKCS documents published by RSA Laboratories
are available at their web site.
• Secure Shell
• IPSec
While all the above protocols operate on the application layer of the
internet, allowing particular programs to communicate on a secure
channel in an inherently insecure network, IPSec attempts to make the
internet secure in its essence, the internet protocol (IP). The IPSec
protocols are defined in RFC 2401.