Literature Survey: 2.1 Cryptographic Hash Function
LITERATURE SURVEY
This chapter reviews the relevant background literature and describes the
concepts of cryptographic hash functions and image enhancement. The scientific
publications included in the literature survey have been chosen in order to build a
sufficient background that will help in solving the research sub-problems. In addition,
this chapter presents general concepts and definitions used and developed in more
detail in this project.
The ideal cryptographic hash function has four main properties:
2.1.1 PROPERTIES:
Ease of computation: it is easy to compute the hash value for any given message.
Most cryptographic hash functions are designed to take a string of any length as
input and produce a fixed-length hash value.
Pre-image resistance: given a hash value h, it should be hard to find any message
m such that hash(m) = h. Functions that lack this property are vulnerable to
pre-image attacks.
Second pre-image resistance: given an input m1, it should be hard to find another
input m2 (not equal to m1) such that hash(m1) = hash(m2). This property is
sometimes referred to as weak collision resistance. Functions that lack this
property are vulnerable to second pre-image attacks.
Collision resistance: it should be hard to find any two different messages m1 and
m2 such that hash(m1) = hash(m2). Such a pair is called a hash collision, and this
property is sometimes referred to as strong collision resistance.
These properties imply that a malicious adversary cannot replace or modify the input
data without changing its digest. Thus, if two strings have the same digest, one can be
very confident that they are identical.
A function meeting these criteria may still have undesirable properties. Currently
popular cryptographic hash functions are vulnerable to length-extension attacks: given
h(m) and len(m) but not m, by choosing a suitable m' an attacker can calculate
h(m || m'), where || denotes concatenation. This property can be used to break naive
authentication schemes based on hash functions. The HMAC construction works
around these problems.
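The HMAC construction can be illustrated with Python's standard hmac and hashlib modules; the key and message below are illustrative values, not part of any real scheme:

```python
import hmac
import hashlib

key = b"secret-key"      # illustrative shared key
message = b"amount=100"  # illustrative message

# Naive scheme: h(key || message) -- vulnerable to length extension
naive_tag = hashlib.md5(key + message).hexdigest()

# HMAC wraps the hash in two keyed passes, blocking length extension
tag = hmac.new(key, message, hashlib.md5).hexdigest()

# Verification should use a constant-time comparison
received_tag = hmac.new(key, message, hashlib.md5).hexdigest()
print(hmac.compare_digest(tag, received_tag))  # True for an untampered message
```

In practice a stronger underlying hash than MD5 (e.g. SHA-256) would be chosen; MD5 is used here only to match the surrounding discussion.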
Ideally, one may wish for even stronger conditions. It should be impossible for an
adversary to find two messages with substantially similar digests; or to infer any useful
information about the data, given only its digest. Therefore, a cryptographic hash
function should behave as much as possible like a random function while still being
deterministic and efficiently computable.
Checksum algorithms, such as CRC32 and other cyclic redundancy checks, are
designed to meet much weaker requirements, and are generally unsuitable as
cryptographic hash functions. For example, a CRC was used for message integrity in
the WEP encryption standard, but an attack was readily discovered which exploited the
linearity of the checksum.
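The linearity exploited in the WEP attack can be seen directly: CRC-32 is affine over GF(2), so for an odd number of equal-length messages the CRC of their byte-wise XOR is predictable from their individual CRCs. A small illustration using Python's zlib.crc32 (the message strings are arbitrary, equal-length examples):

```python
import zlib

a = b"attack at dawn!!"
b = b"retreat at dusk!"
c = b"hold the line!!!"  # three equal-length messages

# XOR the three messages byte-wise
x = bytes(p ^ q ^ r for p, q, r in zip(a, b, c))

# Because CRC-32 is affine over GF(2), the length-dependent affine
# constant cancels for an odd number of equal-length inputs:
assert zlib.crc32(x) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
```

No comparable relation holds for a cryptographic hash function, which is exactly why checksums cannot substitute for one in integrity protection.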
2.1.2 APPLICATIONS:
MD5 digests have been widely used in the software world to provide some
assurance that a transferred file has arrived intact. For example, file servers often
provide a pre-computed MD5 checksum for the files, so that a user can compare the
checksum of the downloaded file to it. Unix-based operating systems include MD5 sum
utilities in their distribution packages, whereas Windows users use third-party
applications.
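This kind of integrity check is straightforward to perform with Python's standard hashlib module. The sketch below reads the file in chunks so large downloads need not fit in memory; the file name and content are illustrative (the checksum is the well-known MD5 test vector for the sample string):

```python
import hashlib

def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: write a small file, then compare against a published checksum
with open("demo.txt", "wb") as f:
    f.write(b"The quick brown fox jumps over the lazy dog")

published = "9e107d9d372bb6826bd81d3542a419d6"
print(md5_of_file("demo.txt") == published)  # True
```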
2.1.3 ALGORITHM:
In the following, <<<s denotes a left bit rotation by s places, where s varies for
each operation, and ⊞ denotes addition modulo 2^32.
The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit
words, denoted A, B, C and D. These are initialized to certain fixed constants. The main
algorithm then operates on each 512-bit message block in turn, each block modifying
the state. The processing of a message block consists of four similar stages, termed
rounds; each round is composed of 16 similar operations based on a non-linear function
F, modular addition, and left rotation. Fig 2.1 illustrates one operation within a round.
There are four possible functions; a different one is used in each round:
F(B, C, D) = (B ∧ C) ∨ (¬B ∧ D)
G(B, C, D) = (B ∧ D) ∨ (C ∧ ¬D)
H(B, C, D) = B ⊕ C ⊕ D
I(B, C, D) = C ⊕ (B ∨ ¬D)
where ⊕, ∧, ∨, ¬ denote the XOR, AND, OR and NOT operations respectively.
Fig 2.1 MD5 Operation
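The four round functions and the rotate-then-add step shown in Fig 2.1 can be sketched in a few lines of Python. This is a didactic fragment, not a full MD5 implementation; the message word m, constant k, and shift s would come from the algorithm's tables (RFC 1321):

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def rotl(x, s):
    """Left-rotate a 32-bit word x by s places."""
    return ((x << s) | (x >> (32 - s))) & MASK

# The four non-linear round functions of MD5 (RFC 1321)
def F(b, c, d): return (b & c) | (~b & d & MASK)
def G(b, c, d): return (b & d) | (c & ~d & MASK)
def H(b, c, d): return b ^ c ^ d
def I(b, c, d): return c ^ ((b | (~d & MASK)) & MASK)

def step(func, a, b, c, d, m, k, s):
    """One of the 64 operations: non-linear mix, two modular
    additions of constants, a left rotation, then addition of b."""
    return (b + rotl((a + func(b, c, d) + m + k) & MASK, s)) & MASK

# Example: one operation using the real first MD5 constant and shift
# a_new = step(F, A, B, C, D, m=0, k=0xd76aa478, s=7)
```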
2.2 AUTHENTICATION
There are three classic ways to prove your identity to a computer:
1. You can tell the computer something that you know; for example, a password.
This is the traditional password system. Knowledge cannot be stolen from your
mind in the way a physical token can, and it cannot be duplicated. The main
disadvantage of single static passwords is how easy they are to crack: they are
short, based on topics close to the user (birthdays, partner names, children's
names, etc.), and typically letters only. They are also vulnerable to social
engineering, i.e. people asking for your password or guessing it, and they can
be picked up by spyware.
2. You can "show" the computer something you have; for example, a digital
certificate, a card key, a smart card, a one-time pad, a challenge-response list,
and so on. This makes it easy to lend your credential for temporary uses like
valet parking, but objects can be stolen, keys can be duplicated, IDs can be
faked, and few people can recognize a valid badge on sight anyway.
3. You can let the computer measure something about you; for example, your
fingerprint, a retina scan, a voiceprint, your DNA, the cadence of your typing,
your gait, your smell, your vein patterns, and so on. These traits cannot be
forgotten or lent out, but the underlying measurements can still be captured
and replayed. As a level of security, what you are is generally considered
stronger than what you have.
Physiological biometrics are related to the shape of the body. Examples include,
but are not limited to, fingerprints, face recognition, DNA, hand and palm
geometry, iris recognition (which has largely replaced retina scanning), and
odor/scent.
Behavioral biometrics are related to the behavior of a person. Examples include,
but are not limited to, typing rhythm, gait, and voice. Some researchers [1] have
coined the term behaviometrics for this class of biometrics.
2.2.1.1 Introduction
FIG 2.2 The basic block diagram of a biometric system
The first block (sensor) is the interface between the real world and the system; it
has to acquire all the necessary data. Most of the time it is an image acquisition
system, but it can change according to the characteristics desired. The second block
performs all the necessary pre-processing: it has to remove artifacts from the sensor,
enhance the input (e.g. by removing background noise), apply some kind of
normalization, etc. In the third block the necessary features are extracted. This step is
important, as the correct features need to be extracted in an optimal way.
If enrollment is being performed, the template is simply stored somewhere (on a
card, within a database, or both). If a matching phase is being performed, the obtained
template is passed to a matcher that compares it with the stored templates, estimating
the distance between them using some algorithm (e.g. Hamming distance). The result
of this comparison is then output for the specified use or purpose (e.g. entry to a
restricted area).
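A matcher of the kind described above can be sketched in Python. Here the templates are assumed to be fixed-length bit strings (as in iris codes), compared by normalized Hamming distance; the 0.25 threshold and the toy templates are illustrative:

```python
def hamming_distance(t1: bytes, t2: bytes) -> float:
    """Fraction of differing bits between two equal-length templates."""
    assert len(t1) == len(t2)
    diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(t1, t2))
    return diff_bits / (8 * len(t1))

def match(probe: bytes, gallery: dict, threshold: float = 0.25):
    """Return the identity of the closest enrolled template,
    or None if nothing falls within the (illustrative) threshold."""
    best_id, best_d = None, 1.0
    for identity, template in gallery.items():
        d = hamming_distance(probe, template)
        if d < best_d:
            best_id, best_d = identity, d
    return best_id if best_d <= threshold else None

# Toy enrollment database and a probe differing from "alice" in one bit
gallery = {"alice": b"\xAA\xAA\xAA\xAA", "bob": b"\x55\x55\x55\x55"}
print(match(b"\xAA\xAA\xAA\xAB", gallery))  # alice
```

The choice of threshold directly produces the FAR/FRR trade-off discussed in the next section.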
2.2.1.2 Performance
false accept rate or false match rate (FAR or FMR) – the probability that the
system incorrectly matches the input pattern to a non-matching template in the
database. It measures the percent of invalid inputs which are incorrectly
accepted.
false reject rate or false non-match rate (FRR or FNMR) – the probability that
the system fails to detect a match between the input pattern and a matching
template in the database. It measures the percent of valid inputs which are
incorrectly rejected.
receiver operating characteristic or relative operating characteristic (ROC)
– The ROC plot is a visual characterization of the trade-off between the FAR and
the FRR. In general, the matching algorithm performs a decision based on a
threshold which determines how close to a template the input needs to be for it to
be considered a match. If the threshold is reduced, there will be fewer false
non-matches but more false accepts. Correspondingly, a higher threshold will reduce
the FAR but increase the FRR. A common variation is the Detection error trade-
off (DET), which is obtained using normal deviate scales on both axes. This more
linear graph illuminates the differences for higher performances (rarer errors).
equal error rate or crossover error rate (EER or CER) – the rate at which both
accept and reject errors are equal. The value of the EER can be easily obtained
from the ROC curve, by taking the point where FAR and FRR have the same
value. The EER is a quick way to compare the accuracy of devices with different
ROC curves: in general, the device with the lowest EER is the most accurate.
failure to enroll rate (FTE or FER) – the rate at which attempts to create a
template from an input are unsuccessful. This is most commonly caused by low
quality inputs.
failure to capture rate (FTC) – Within automatic systems, the probability that the
system fails to detect a biometric input when presented correctly.
template capacity – the maximum number of sets of data which can be stored in
the system.
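The FAR/FRR trade-off and the EER can be estimated numerically from two sets of match scores. A sketch in Python, where higher score means closer match and all score values are illustrative:

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR for a given acceptance threshold.
    A score >= threshold is treated as a match."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def approx_eer(genuine, impostor):
    """Sweep the threshold over all observed scores and return the
    point where FAR and FRR are closest (an EER approximation)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best

# Illustrative genuine (same-person) and impostor score samples
genuine = [0.9, 0.8, 0.85, 0.7, 0.6]
impostor = [0.3, 0.4, 0.2, 0.65, 0.5]
print(approx_eer(genuine, impostor))  # (threshold, FAR, FRR) at the crossover
```

Plotting far against frr while sweeping the threshold yields the ROC curve described above; a real evaluation would interpolate between thresholds rather than take the closest observed point.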