

CHAPTER 2

LITERATURE SURVEY
This chapter reviews the relevant background literature and describes the
concepts of cryptographic hash functions and image enhancement. The scientific
publications included in the literature survey have been chosen to build sufficient
background for solving the research sub-problems. In addition, this chapter presents
general concepts and definitions that are used and developed in more detail in
this project.

2.1 CRYPTOGRAPHIC HASH FUNCTION:

A cryptographic hash function is a deterministic procedure that takes an arbitrary
block of data and returns a fixed-size bit string, the (cryptographic) hash value, such
that an accidental or intentional change to the data will change the hash value. The data
to be encoded is often called the "message", and the hash value is sometimes called
the message digest or simply the digest.
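
As a minimal illustration of these definitions, the sketch below (in Python, assuming only the standard hashlib module; SHA-256 is used purely as an example hash function) hashes a short message twice and then hashes a slightly altered copy: the digest length stays fixed, hashing is deterministic, and even a one-character change produces a completely different digest.

import hashlib

message = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

# SHA-256 always returns a 256-bit (32-byte) digest, printed as 64 hex digits.
print(hashlib.sha256(message).hexdigest())
print(hashlib.sha256(message).hexdigest())   # identical: hashing is deterministic
print(hashlib.sha256(altered).hexdigest())   # a completely different digest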

The ideal cryptographic hash function has four main properties:

• it is easy to compute the hash value for any given message,
• it is infeasible to find a message that has a given hash,
• it is infeasible to modify a message without changing its hash,
• it is infeasible to find two different messages with the same hash.

Cryptographic hash functions have many information security applications, notably in
digital signatures, message authentication codes (MACs), and other forms of
authentication. They can also be used as ordinary hash functions, to index data in hash
tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as
checksums to detect accidental data corruption. Indeed, in information security
contexts, cryptographic hash values are sometimes called (digital) fingerprints,
checksums, or just hash values, even though all these terms stand for functions with
rather different properties and purposes.

2.1.1 PROPERTIES:

Most cryptographic hash functions are designed to take a string of any length as
input and produce a fixed-length hash value.

A cryptographic hash function must be able to withstand all known types of
cryptanalytic attack. As a minimum, it must have the following properties:

• Pre-image resistance: given a hash h, it should be hard to find any message m
such that h = hash(m). This concept is related to that of a one-way function.
Functions that lack this property are vulnerable to pre-image attacks.

• Second pre-image resistance: given an input m1, it should be hard to find another
input m2 (not equal to m1) such that hash(m1) = hash(m2). This property is
sometimes referred to as weak collision resistance. Functions that lack this
property are vulnerable to second pre-image attacks.

• Collision resistance: it should be hard to find two different messages m1 and m2
such that hash(m1) = hash(m2). Such a pair is called a (cryptographic) hash
collision, and this property is sometimes referred to as strong collision resistance.
It requires a hash value at least twice as long as that required for pre-image
resistance; otherwise collisions may be found by a birthday attack (illustrated in
the sketch after this list).
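
The birthday attack mentioned above can be demonstrated on a deliberately weakened hash. The following sketch (in Python, assuming the standard hashlib and os modules) truncates SHA-256 to 16 bits, so a collision is expected after roughly 2^8 random messages, far fewer than the 2^16 attempts a brute-force pre-image search would need; this is only an illustration of the birthday bound, not an attack on any real hash function.

import hashlib
import os

def toy_hash(data: bytes) -> bytes:
    # A deliberately weak 16-bit hash used only for this demonstration.
    return hashlib.sha256(data).digest()[:2]

seen = {}                      # maps truncated digest -> message that produced it
attempts = 0
while True:
    m = os.urandom(8)          # a random 8-byte message
    h = toy_hash(m)
    attempts += 1
    if h in seen and seen[h] != m:
        print(f"collision after {attempts} messages:")
        print(seen[h].hex(), "and", m.hex(), "share digest", h.hex())
        break
    seen[h] = m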

These properties imply that a malicious adversary cannot replace or modify the input
data without changing its digest. Thus, if two strings have the same digest, one can be
very confident that they are identical.

A function meeting these criteria may still have undesirable properties. Currently
popular cryptographic hash functions are vulnerable to length-extension attacks: given
h(m) and len(m) but not m, by choosing a suitable m' an attacker can calculate h(m ||
m'), where || denotes concatenation. This property can be used to break naive
authentication schemes based on hash functions. The HMAC construction works
around these problems.
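
A minimal sketch of the HMAC construction, using Python's standard hmac and hashlib modules, is given below; the key and message values are placeholders. Unlike a naive hash(key || message) scheme, HMAC is not vulnerable to length-extension attacks.

import hmac
import hashlib

key = b"shared-secret-key"            # placeholder key
message = b"amount=100&to=alice"      # placeholder message

# Compute an authentication tag over the message.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()
print(tag)

# Verification should use a constant-time comparison.
received_tag = tag
print(hmac.compare_digest(tag, received_tag))   # True when the tag matches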

Ideally, one may wish for even stronger conditions. It should be impossible for an
adversary to find two messages with substantially similar digests; or to infer any useful
information about the data, given only its digest. Therefore, a cryptographic hash
function should behave as much as possible like a random function while still being
deterministic and efficiently computable.

Checksum algorithms, such as CRC32 and other cyclic redundancy checks, are
designed to meet much weaker requirements, and are generally unsuitable as
cryptographic hash functions. For example, a CRC was used for message integrity in
the WEP encryption standard, but an attack was readily discovered which exploited the
linearity of the checksum.

In cryptography, MD5 (Message-Digest algorithm 5) is a widely used cryptographic
hash function with a 128-bit hash value. As an Internet standard (RFC 1321), MD5 has
been employed in a wide variety of security applications, and is also commonly used to
check the integrity of files. An MD5 hash is typically expressed as a 32-digit
hexadecimal number. MD5 was designed by Ron Rivest in 1991 to replace an earlier
hash function, MD4. In 1996, a flaw was found in the design of MD5.
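
As a small illustration (a sketch in Python using the standard hashlib module; the input string is arbitrary), an MD5 digest is always 128 bits long and is conventionally printed as 32 hexadecimal digits:

import hashlib

digest = hashlib.md5(b"hello world").hexdigest()
print(digest)        # a 32-character hexadecimal string
print(len(digest))   # 32 hex digits = 128 bits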

2.1.2 APPLICATIONS:

MD5 digests have been widely used in the software world to provide some
assurance that a transferred file has arrived intact. For example, file servers often
provide a pre-computed MD5 checksum for their files, so that a user can compare the
checksum of the downloaded file against it. Unix-based operating systems include MD5
sum utilities in their distribution packages, whereas Windows users typically rely on
third-party applications.
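
A typical check is sketched below (in Python; the file name and the published checksum are placeholders). The file is read in chunks so that large downloads do not have to fit in memory.

import hashlib

def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    # Compute the MD5 hex digest of a file, reading it in chunks.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

published_checksum = "0123456789abcdef0123456789abcdef"   # placeholder value
if md5_of_file("downloaded.iso") == published_checksum:
    print("checksum matches: file arrived intact")
else:
    print("checksum mismatch: file is corrupted or was tampered with")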

MD5 is widely used to store passwords. To mitigate the vulnerabilities
mentioned above, one can add a salt to the passwords before hashing them. Some
implementations may apply the hashing function more than once; see key
strengthening.
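
A sketch of salting combined with repeated hashing is given below. PBKDF2 from Python's standard hashlib is used here simply as a convenient way to iterate a hash over a salted password; the choice of SHA-256 and the iteration count are illustrative, not part of the scheme described above.

import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 100_000):
    # A fresh random salt is generated for every password.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest          # both values are stored with the user record

def verify_password(password: str, salt: bytes, stored: bytes,
                    iterations: int = 100_000) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, stored)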

2.1.3 ALGORITHM:

MD5 consists of 64 of these operations, grouped in four rounds of 16 operations.
F is a nonlinear function; one function is used in each round. Mi denotes a 32-bit block
of the message input, and Ki denotes a 32-bit constant, different for each operation.
<<<s denotes a left bit rotation by s places; s varies for each operation. ⊞ denotes
addition modulo 2^32.

MD5 processes a variable-length message into a fixed-length output of 128 bits.
The input message is broken up into 512-bit blocks (sixteen 32-bit little-endian
integers); the message is padded so that its length is divisible by 512. The padding
works as follows: first a single bit, 1, is appended to the end of the message. This is
followed by as many zeros as are required to bring the length of the message up to
64 bits fewer than a multiple of 512. The remaining bits are filled with a 64-bit
integer representing the length of the original message, in bits.
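
A minimal sketch of this padding step (in Python; the example message is arbitrary) is shown below: a 1 bit is appended as the byte 0x80, zero bytes are added until the length is 64 bits short of a multiple of 512 bits, and the original length in bits is appended as a 64-bit little-endian integer.

import struct

def md5_pad(message: bytes) -> bytes:
    original_bit_length = len(message) * 8
    padded = message + b"\x80"             # a single 1 bit followed by seven 0 bits
    while len(padded) % 64 != 56:          # 56 bytes = 448 bits = 512 - 64 bits
        padded += b"\x00"
    # Append the original length in bits as a 64-bit little-endian integer.
    padded += struct.pack("<Q", original_bit_length & 0xFFFFFFFFFFFFFFFF)
    return padded

print(len(md5_pad(b"abc")) % 64 == 0)      # padded length is a multiple of 512 bits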

The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit
words, denoted A, B, C and D. These are initialized to certain fixed constants. The main
algorithm then operates on each 512-bit message block in turn, each block modifying
the state. The processing of a message block consists of four similar stages, termed
rounds; each round is composed of 16 similar operations based on a non-linear function
F, modular addition, and left rotation. Fig 2.1 illustrates one operation within a round.
There are four possible functions, a different one being used in each round:

F(B, C, D) = (B ∧ C) ∨ (¬B ∧ D)
G(B, C, D) = (B ∧ D) ∨ (C ∧ ¬D)
H(B, C, D) = B ⊕ C ⊕ D
I(B, C, D) = C ⊕ (B ∨ ¬D)

where ⊕, ∧, ∨ and ¬ denote the XOR, AND, OR and NOT operations respectively.
Fig 2.1 MD5 Operation
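
To make this structure concrete, the sketch below (in Python) writes out the four round functions and one operation of the form shown in Fig 2.1. It is an outline of a single step following the description above, not a complete MD5 implementation; M_i, K_i and s are supplied by the caller.

MASK = 0xFFFFFFFF   # keep every result within 32 bits

def F(B, C, D): return ((B & C) | (~B & D)) & MASK
def G(B, C, D): return ((B & D) | (C & ~D)) & MASK
def H(B, C, D): return (B ^ C ^ D) & MASK
def I(B, C, D): return (C ^ (B | ~D)) & MASK

def rotate_left(x, s):
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK

def md5_operation(A, B, C, D, M_i, K_i, s, func):
    # One of the 64 operations: mix A with func(B, C, D), M_i and K_i,
    # rotate left by s, add B, then rotate the four state words.
    temp = (A + func(B, C, D) + M_i + K_i) & MASK
    new_B = (B + rotate_left(temp, s)) & MASK
    return D, new_B, B, C        # the new (A, B, C, D) state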

2.2 AUTHENTICATION

Authentication is the process of reliably verifying the identity of someone
(or something). It is distinct from the assertion of identity (known as identification)
and from deciding what privileges accrue to that identity (authorization). Of the three,
authentication is the most difficult from the perspective of network security. Classically,
there are three different ways that you can authenticate yourself or a computer to
another computer system:

1. You can tell the computer something that you know; for example, a password.
This is the traditional password system. The secret exists only in your mind: it
cannot be beaten or compelled out of you, it cannot be stolen (from your mind),
and it cannot be duplicated. The main disadvantage of single static passwords is
how easy they are to crack: they are short, based on topics close to the user
(birthdays, partner names, children's names, etc.), and typically consist of letters
only. They are also vulnerable to social engineering, i.e. people asking for your
password or guessing it, and they can be picked up by spyware.

2. You can “show” the computer something you have; for example, a digital
certificate, a card key, a smart card, a one-time pad, a challenge-response list,
and so on. This makes it easy to lend your credentials for temporary uses such as
valet parking, but objects can be stolen, keys can be duplicated, IDs can be faked,
and hardly anyone knows what a valid badge looks like anyway.

3. You can let the computer measure something about you; for example, your
fingerprint, a retina scan, a voiceprint, your DNA, the cadence of your typing,
your gait, your smell, your shoeprints, your vein patterns, and so on. These
characteristics cannot be faked, but they can be stolen (captured and replayed).
As a secondary level of security, what you are is considered stronger than what
you have.

2.2.1 BIOMETRIC AUTHENTICATION

Biometrics comprises methods for uniquely recognizing humans based upon
one or more intrinsic physical or behavioral traits. In computer science, in particular,
biometrics is used as a form of identity access management and access control. It is
also used to identify individuals in groups that are under surveillance.

Biometric characteristics can be divided into two main classes:

• Physiological characteristics are related to the shape of the body. Examples include,
but are not limited to, fingerprint, face recognition, DNA, hand and palm geometry,
iris recognition (which has largely replaced retina scanning), and odor/scent.
• Behavioral characteristics are related to the behavior of a person. Examples include,
but are not limited to, typing rhythm, gait, and voice. Some researchers[1] have coined
the term behaviometrics for this class of biometrics.

2.2.1.1 Introduction
Fig 2.2 The basic block diagram of a biometric system

Whether a human characteristic can be used for biometrics can be assessed in terms
of the following parameters:[2]

• Universality – each person should have the characteristic.
• Uniqueness – how well the biometric separates one individual from another.
• Permanence – how well a biometric resists aging and other variance over time.
• Collectability – ease of acquisition for measurement.
• Performance – accuracy, speed, and robustness of the technology used.
• Acceptability – degree of approval of a technology.
• Circumvention – ease with which a substitute can be used.

A biometric system can operate in the following two modes:

• Verification – a one-to-one comparison of a captured biometric with a stored
template to verify that the individual is who he claims to be. It can be done in
conjunction with a smart card, username or ID number.
• Identification – a one-to-many comparison of the captured biometric against a
biometric database in an attempt to identify an unknown individual. The identification
only succeeds if the comparison of the biometric sample to a template in the
database falls within a previously set threshold.

The first time an individual uses a biometric system is called enrollment. During
enrollment, biometric information from an individual is stored. In subsequent uses,
biometric information is detected and compared with the information stored at the time
of enrollment. Note that it is crucial that storage and retrieval within such systems
themselves be secure if the biometric system is to be robust.

The first block (sensor) is the interface between the real world and the system; it
has to acquire all the necessary data. Most of the time it is an image acquisition
system, but it can change according to the characteristics desired. The second block
performs all the necessary pre-processing: it has to remove artifacts from the sensor,
enhance the input (e.g. removing background noise), apply some kind of normalization,
etc. In the third block the necessary features are extracted. This is an important step,
as the correct features need to be extracted in an optimal way.

A vector of numbers or an image with particular properties is used to create a
template. A template is a synthesis of the relevant characteristics extracted from the
source. Elements of the biometric measurement that are not used in the comparison
algorithm are discarded in the template to reduce the file size and to protect the identity
of the enrollee.

If enrollment is being performed, the template is simply stored somewhere (on a card
or within a database or both). If a matching phase is being performed, the obtained
template is passed to a matcher that compares it with other existing templates,
estimating the distance between them using some algorithm (e.g. the Hamming
distance). The result of the comparison is then output for the specified use or purpose
(e.g. entrance to a restricted area).
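
A toy version of such a matcher is sketched below (in Python); the bit-string templates and the threshold value are illustrative, and a real system would compare feature vectors produced by the earlier stages.

def hamming_distance(a: str, b: str) -> int:
    # Number of positions at which two equal-length bit strings differ.
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def matches(candidate: str, template: str, threshold: int = 5) -> bool:
    return hamming_distance(candidate, template) <= threshold

stored_template = "1011001110001011"     # produced at enrollment (illustrative)
live_template   = "1011001010001011"     # produced from a fresh capture
print(hamming_distance(live_template, stored_template))   # 1
print(matches(live_template, stored_template))            # True: within the threshold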

2.2.1.2 Performance

The following are used as performance metrics for biometric systems:

• false accept rate or false match rate (FAR or FMR) – the probability that the
system incorrectly matches the input pattern to a non-matching template in the
database. It measures the percent of invalid inputs which are incorrectly
accepted.
• false reject rate or false non-match rate (FRR or FNMR) – the probability that
the system fails to detect a match between the input pattern and a matching
template in the database. It measures the percent of valid inputs which are
incorrectly rejected.
• receiver operating characteristic or relative operating characteristic (ROC)
– the ROC plot is a visual characterization of the trade-off between the FAR and
the FRR. In general, the matching algorithm performs a decision based on a
threshold which determines how close to a template the input needs to be for it to
be considered a match. If the threshold is reduced, there will be fewer false non-
matches but more false accepts. Correspondingly, a higher threshold will reduce
the FAR but increase the FRR. A common variation is the Detection error trade-
off (DET), which is obtained using normal deviate scales on both axes. This more
linear graph illuminates the differences for higher performances (rarer errors).
• equal error rate or crossover error rate (EER or CER) – the rate at which both
accept and reject errors are equal; it is obtained from the ROC plot by taking the
point where FAR and FRR have the same value. The EER is a quick way to
compare the accuracy of devices with different ROC curves: in general, the
device with the lowest EER is the most accurate (a computation of these rates is
sketched after this list).
• failure to enroll rate (FTE or FER) – the rate at which attempts to create a
template from an input are unsuccessful. This is most commonly caused by low-
quality inputs.
• failure to capture rate (FTC) – within automatic systems, the probability that the
system fails to detect a biometric input when presented correctly.
• template capacity – the maximum number of sets of data which can be stored in
the system.
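
These rates can be estimated from labelled comparison scores, as sketched below (in Python); the score lists are illustrative, and a higher score is assumed to mean a closer match.

# Same-person and different-person comparison scores (illustrative values).
genuine_scores  = [0.91, 0.85, 0.78, 0.95, 0.66]
impostor_scores = [0.20, 0.35, 0.52, 0.41, 0.05]

def far(threshold):   # fraction of impostor attempts incorrectly accepted
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)

def frr(threshold):   # fraction of genuine attempts incorrectly rejected
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)

# Approximate EER: sweep the threshold and take the point where FAR and FRR
# are closest to each other.
thresholds = [t / 100 for t in range(0, 101)]
eer_threshold = min(thresholds, key=lambda t: abs(far(t) - frr(t)))
print(eer_threshold, far(eer_threshold), frr(eer_threshold))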
