Bioinformatics Finalfinal


GENETIC ALGORITHM

INTRODUCTION:
Genetic algorithms became popular through the work of John Holland in the early 1970s, and
particularly through his book Adaptation in Natural and Artificial Systems (1975). His work
originated with studies of cellular automata, conducted by Holland and his students at
the University of Michigan. Holland introduced a formalized framework for predicting the
quality of the next generation, known as Holland's Schema Theorem. Research in GAs
remained largely theoretical until the mid-1980s, when the First International Conference on
Genetic Algorithms was held in Pittsburgh, Pennsylvania.

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic
inspired by the process of natural selection that belongs to the larger class of evolutionary
algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to
optimization and search problems by relying on bio-inspired operators such as mutation,
crossover and selection. Holland began developing genetic algorithms in the 1960s, based on
Darwin's theory of evolution; his student David E. Goldberg extended the approach in 1989.

Inspired by natural evolution, GAs differ from conventional search methods in several ways:
• GAs work directly on a coding of the solution, which is manipulated by the crossover and
mutation operators.
• GAs begin their search from many points, not from a single point; they maintain a
population of feasible solutions to the problem.
• GAs do not need auxiliary information such as gradients. They search via sampling.
• GAs search using stochastic operators, not deterministic rules. They use random choice to
guide a highly exploitative search.
Genetic Algorithm Process
• Encode potential solutions in a chromosome-like data structure.
• Select parents on the basis of the fitness of their solutions to produce offspring for the
next generation; the offspring carry characteristics of both parents.
• Apply the genetic operators (selection, crossover and mutation) repeatedly to preserve the
good portions of the strings. Good portions of the strings usually lead to an optimal or
near-optimal solution.
• Repeat the method for a desired number of generations. If the algorithm is well designed,
the population converges faster.
GA: Evolutionary Cycle
Binary encoding uses 0s and 1s in a chromosome. Each bit corresponds to a gene. The
values that a given gene can take are its alleles.
GA Over Generations

Chromosome Encoding
• Binary Encoding
• Real Encoding
• Permutation Encoding
• Value Encoding
• Tree Encoding

Selection Schemes
• Roulette wheel selection without scaling
• Roulette wheel selection with scaling
• Stochastic tournament selection with a tournament size of two
• Remainder stochastic sampling without replacement
• Remainder stochastic sampling with replacement
• Elitism
Crossover and Mutation Examples
Parameters for GA
Population size: problem specific. A good population size is about 20-30, although sizes of
50-100 are also reported. The best population size depends on the length of the encoded
string (chromosome): the longer the string, the larger the population should be.
Crossover probability: should generally be high, about 80%-95% (for some problems it
can be as low as 60%).
Mutation rate: should be very low. The best rates seem to be about 0.5%-1%.
Crossover and mutation type: the operators depend on the chosen encoding and on the problem.
Benefits of Genetic Algorithms
• Easy to understand and modular in structure, separate from the application.
• Supports multi-objective optimization.
• Good for "noisy" environments.
• A solution is always obtained, and solution quality improves as additional knowledge is gained.
• Inherently parallel; easily distributed.
• Easy to exploit previous or alternate solutions.
• Flexible building blocks for hybrid applications.
When to use a genetic algorithm?
• Alternative solutions are too slow or overly complicated
• An exploratory tool is needed to examine new approaches
• The problem is similar to one that has already been solved successfully using a GA
• You want to hybridize with an existing solution
• The benefits of GA technology meet key problem requirements
• A near-optimal solution will suffice
• Adequate computational power is available
• The problem does converge to an optimal solution

Adaptive GAs
Genetic algorithms with adaptive parameters (adaptive genetic algorithms, AGAs) are another
significant and promising variant of genetic algorithms. The probabilities of crossover (pc) and
mutation (pm) greatly determine the degree of solution accuracy and the convergence speed that
genetic algorithms can obtain. Instead of using fixed values of pc and pm, AGAs use the population
information in each generation and adaptively adjust pc and pm in order to maintain
population diversity as well as to sustain the convergence capacity. In an AGA, the adjustment
of pc and pm depends on the fitness values of the solutions.
It can be quite effective to combine a GA with other optimization methods. A GA tends to be quite
good at finding generally good global solutions, but quite inefficient at finding the last few
mutations needed to reach the absolute optimum. Other techniques (such as simple hill climbing)
are quite efficient at finding the absolute optimum in a limited region.
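
The fitness-dependent adjustment of pc and pm described above could be sketched as follows. This is only an illustration: the linear scaling rule, the function name `adaptive_rates` and the constants `k1` to `k4` are assumptions, not something specified in these notes. The idea is that solutions close to the best fitness get low rates (they are protected), while below-average solutions get the full rates (they are disrupted).

```python
def adaptive_rates(f_parent, f_individual, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (pc, pm) for a maximization problem.

    f_parent:     the larger fitness of the two parents chosen for crossover
    f_individual: fitness of the individual about to be mutated
    f_max, f_avg: best and average fitness of the current population
    k1..k4:       hypothetical tuning constants (assumed values)
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged to a single fitness value: use full rates.
        return k3, k4
    # Above-average solutions: rate shrinks linearly as fitness approaches f_max.
    pc = k1 * (f_max - f_parent) / spread if f_parent >= f_avg else k3
    pm = k2 * (f_max - f_individual) / spread if f_individual >= f_avg else k4
    return pc, pm
```

With these assumed constants, the best individual gets pc = pm = 0, and any below-average individual gets pc = 1.0 and pm = 0.5.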
A variation, where the population as a whole is evolved rather than its individual members, is known
as gene pool recombination.
A number of variations have been developed to attempt to improve performance of GAs on problems
with a high degree of fitness epistasis, i.e. where the fitness of a solution consists of interacting
subsets of its variables. Such algorithms aim to learn (before exploiting) these beneficial phenotypic
interactions. As such, they are aligned with the Building Block Hypothesis in adaptively reducing
disruptive recombination.

• It is a nature-inspired algorithm.
• In nature, crossover and mutation improve the fitness of a species as it moves from one
generation to the next.
• Similarly, for optimization problems, we create a population of feasible solutions in which the
fitness function is equivalent to the objective function.
• The fitter solutions are selected. From generation to generation, average fitness values
are computed, and the solution closest to the optimum is considered the best.
• This is the genetic algorithm. There are a few variations, depending on the problem and the designer.

Methodology:
The evolution usually starts from a population of randomly generated individuals, and is an iterative
process, with the population in each iteration called a generation. In each generation, the fitness of
every individual in the population is evaluated; the fitness is usually the value of the objective
function in the optimization problem being solved. The more fit individuals are stochastically selected
from the current population, and each individual's genome is modified (recombined and possibly
randomly mutated) to form a new generation. The new generation of candidate solutions is then used
in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum
number of generations has been produced, or a satisfactory fitness level has been reached for the
population.
Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a
population of solutions and then to improve it through repetitive application of the mutation, crossover,
inversion and selection operators.
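
The generational loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names, the default parameter values, the use of size-two tournament selection and the single elite copy are all assumptions made for the example, and the inversion operator is left out.

```python
import random

def _tournament(pop, fitness, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(fitness, n_bits, pop_size=20, pc=0.9, pm=0.01,
           max_generations=100, target=None, rng=None):
    """Maximize `fitness` over bit strings of length n_bits (n_bits >= 2)."""
    rng = rng or random.Random()
    # Initialization: random population of bit-string chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_generations):
        # Termination: satisfactory fitness level reached.
        if target is not None and fitness(best) >= target:
            break
        next_pop = [best[:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1 = _tournament(pop, fitness, rng)
            p2 = _tournament(pop, fitness, rng)
            child = p1[:]
            if rng.random() < pc:  # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < pm else g for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best
```

For example, `run_ga(sum, n_bits=16, target=16)` searches for the all-ones string, using the number of 1 bits as a toy objective function.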
Step 1: ENCODING
Encode a solution of the problem as a chromosome.
• Binary encoding: difficult to apply directly to many problems, but a good option for
knapsack problems.
• Value/real number encoding: for constrained optimization problems.
Each gene is a real number.
• Permutation encoding: for combinatorial optimization problems such as the travelling
salesman (TS) problem and the quadratic assignment problem.
• Tree encoding: used to find formulas, patterns and programs, and to obtain empirical
relationships. Every chromosome is a tree of objects, such as functions or
commands in a programming language.
Binary encoding: 1 0 0 1 1 1 0 1
Permutation encoding: 3 2 1 5 4 6 8 7
Real or value encoding: 2.5698 5.6322 6.8549 4.1325

Tree encoding:
• Sometimes we need to obtain an empirical relationship. For example, if there are three
variables and we do not know the relationship between them, we can use their observed values
to find the relation by tree encoding.
• Chromosomes are functions represented as trees.

• Solution space: contains the actual solutions, e.g. a tour in the TS problem.
• Coding space: where the chromosomes are defined. For problems with constraints it can contain
feasible solutions, infeasible solutions and illegal solutions, e.g. in the TS problem.
• There must be a one-to-one mapping between the coding space and the solution space.
• One chromosome should not represent more than one solution.
• One solution should not be coded in more than one way.

Critical issues with encoding
• Feasibility of a chromosome: the solution decoded from the chromosome lies in the feasible
region of the problem.
• Legality of a chromosome: the chromosome represents a solution to the problem.
• Uniqueness of mapping between chromosomes and solutions to the problem: of one-to-many,
many-to-one and one-to-one mappings, one-to-one is highly desirable, with each
chromosome representing only one solution to the problem.

Initialization:
• The population size depends on the nature of the problem, but typically contains several
hundreds or thousands of possible solutions. Often, the initial population is generated
randomly, allowing the entire range of possible solutions (the search space). Occasionally, the
solutions may be "seeded" in areas where optimal solutions are likely to be found.

• As an example, a population of 10 chromosomes is created by tossing a coin 120 times
(which corresponds to 12 bits per chromosome if each toss sets one bit), and the fitness of
each chromosome is evaluated according to the objective function.
• Averaging these values gives the average fitness of the population.
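
The coin-toss initialization can be sketched as below. The chromosome length of 12 bits is inferred from 120 tosses for 10 chromosomes and is an assumption, as are the function names; `count_ones` is only a placeholder objective, since the notes do not give the actual one.

```python
import random

def init_population(pop_size=10, n_bits=12, rng=None):
    """Create pop_size random bit strings, one 'coin toss' per bit."""
    rng = rng or random.Random()
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

def average_fitness(population, fitness):
    """Mean fitness over the whole population."""
    return sum(fitness(c) for c in population) / len(population)

def count_ones(chromosome):
    """Placeholder objective (assumed): number of 1 bits in the chromosome."""
    return sum(chromosome)

population = init_population(rng=random.Random(1))
avg = average_fitness(population, count_ones)
```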

Selection:
During each successive generation, a portion of the existing population is selected to breed a new
generation. Individual solutions are selected through a fitness-based process, where fitter solutions
(as measured by a fitness function) are typically more likely to be selected. Certain selection methods
rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a
random sample of the population, as the former process may be very time-consuming.
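
Two of the selection styles mentioned above can be sketched as follows: roulette wheel selection rates every solution, while tournament selection rates only a random sample. The function names and signatures are assumptions for illustration, and the roulette wheel version assumes non-negative fitness values.

```python
import random

def roulette_wheel(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # No usable fitness information: fall back to a uniform choice.
        return rng.choice(population)
    spin = rng.random() * total
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def tournament(population, fitnesses, size=2, rng=random):
    """Rate only a random sample of `size` individuals and return the fittest."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: fitnesses[i])]
```

Calling either function once per parent needed fills the mating pool for the next generation.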

The fitness function is defined over the genetic representation and measures the quality of the
represented solution. The fitness function is always problem dependent. For instance, in the knapsack
problem one wants to maximize the total value of objects that can be put in a knapsack of some fixed
capacity. A representation of a solution might be an array of bits, where each bit represents a different
object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not
every such representation is valid, as the size of objects may exceed the capacity of the
knapsack. The fitness of the solution is the sum of values of all objects in
the knapsack if the representation is valid, or 0 otherwise.
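
The knapsack fitness rule just described translates almost directly into code. The function name and the item values, sizes and capacity below are made up for the illustration.

```python
def knapsack_fitness(bits, values, sizes, capacity):
    """Sum of the values of the chosen objects, or 0 if they do not fit."""
    total_size = sum(s for b, s in zip(bits, sizes) if b)
    if total_size > capacity:
        return 0  # invalid representation: capacity exceeded
    return sum(v for b, v in zip(bits, values) if b)

# Hypothetical instance: three objects and a knapsack of capacity 50.
values = [60, 100, 120]
sizes = [10, 20, 30]
fits = knapsack_fitness([0, 1, 1], values, sizes, capacity=50)      # objects 2 and 3 fit
overfull = knapsack_fitness([1, 1, 1], values, sizes, capacity=50)  # total size 60 exceeds 50
```

Here `fits` is 220 and `overfull` is 0.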
In some problems, it is hard or even impossible to define the fitness expression; in these cases,
a simulation may be used to determine the fitness function value of a phenotype (e.g. computational
fluid dynamics is used to determine the air resistance of a vehicle whose shape is encoded as the
phenotype), or even interactive genetic algorithms are used.
REPRODUCTION:
Crossover Operator: This represents mating between individuals. Two individuals are selected
using the selection operator and crossover sites are chosen randomly. The genes at these
crossover sites are then exchanged, creating completely new individuals (offspring).

Mutation Operator: The key idea is to insert random genes in the offspring to maintain diversity
in the population and avoid premature convergence.
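
For bit-string chromosomes, the two operators might look like the sketch below. One-point crossover and bit-flip mutation are chosen here as the simplest variants; the function names and default rate are assumptions.

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """Swap the tails of two equal-length parents after a random cut point."""
    cut = rng.randint(1, len(p1) - 1)  # cut strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def bit_flip_mutation(chromosome, pm=0.01, rng=random):
    """Flip each gene independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]
```

For example, crossing `[0, 0, 0, 0, 0]` with `[1, 1, 1, 1, 1]` at cut point 2 gives the offspring `[0, 0, 1, 1, 1]` and `[1, 1, 0, 0, 0]`.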

Termination
This generational process is repeated until a termination condition has been reached. Common
terminating conditions are:
• A solution is found that satisfies minimum criteria
• Fixed number of generations reached
• Allocated budget (computation time/money) reached
• The highest ranking solution's fitness is reaching or has reached a plateau such that successive
iterations no longer produce better results
• Manual inspection
• Combinations of the above
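
Several of these conditions can be combined in one check, as in the sketch below. The function name, the parameter names and the default values (including the `patience` window used to detect a plateau) are assumptions for illustration.

```python
def should_stop(history, generation, max_generations=100,
                target=None, patience=20, tolerance=1e-9):
    """Decide whether to stop a maximizing GA.

    history:    best fitness recorded at each generation so far
    generation: index of the current generation
    """
    # Fixed number of generations reached.
    if generation >= max_generations:
        return True
    # A solution satisfying the minimum criteria has been found.
    if target is not None and history and history[-1] >= target:
        return True
    # Plateau: no improvement over the last `patience` generations.
    if len(history) > patience:
        if history[-1] - history[-1 - patience] <= tolerance:
            return True
    return False
```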

Chromosome representation
The simplest algorithm represents each chromosome as a bit string. Typically, numeric parameters
can be represented by integers, though it is possible to use floating point representations. The floating
point representation is natural to evolution strategies and evolutionary programming. The notion of
real-valued genetic algorithms has been offered but is really a misnomer because it does not really
represent the building block theory that was proposed by John Henry Holland in the 1970s. This
theory is not without support though, based on theoretical and experimental results (see below). The
basic algorithm performs crossover and mutation at the bit level. Other variants treat the chromosome
as a list of numbers which are indexes into an instruction table, nodes in a linked list, hashes, objects,
or any other imaginable data structure. Crossover and mutation are performed so as to respect data
element boundaries. For most data types, specific variation operators can be designed. Different
chromosomal data types seem to work better or worse for different specific problem domains.
When bit-string representations of integers are used, Gray coding is often employed. In this way,
small changes in the integer can be readily effected through mutations or crossovers. This has been
found to help prevent premature convergence at so-called Hamming walls, in which too many
simultaneous mutations (or crossover events) must occur in order to change the chromosome to a
better solution.
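
A short sketch of Gray coding shows why it avoids Hamming walls. The helper names are assumptions; the conversion itself is the standard reflected binary Gray code.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")
```

In plain binary, moving from 7 (0111) to 8 (1000) requires four simultaneous bit flips, whereas the Gray codes of 7 and 8 (0100 and 1100) differ in a single bit, so one mutation is enough.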
Other approaches involve using arrays of real-valued numbers instead of bit strings to represent
chromosomes. Results from the theory of schemata suggest that in general the smaller the alphabet,
the better the performance, but it was initially surprising to researchers that good results were
obtained from using real-valued chromosomes. This was explained as the set of real values in a finite
population of chromosomes as forming a virtual alphabet (when selection and recombination are
dominant) with a much lower cardinality than would be expected from a floating point representation.

Genetic algorithms are a sub-field of:
• Evolutionary algorithms
• Evolutionary computing
• Metaheuristics
• Stochastic optimization
• Optimization
The whole algorithm can be summarized as –

Algorithm phases:

Issues for Genetic Algorithm
• Choosing basic implementation options:
o Encoding
o Population size, mutation rate, crossover rate
o Selection and deletion policies
o Types of crossover and mutation operators
• Termination criteria
• Performance, scalability
• The solution is only as good as the evaluation function
Parameters for genetic algorithm
Empirical studies of genetic algorithms show the following:
• Crossover rate: should generally be high, about 80-95%. For some problems it can be as low as
60%.
• Mutation rate: should be very low. The best rates seem to be about 0.5% to 1%.
• Crossover and mutation type: the operators depend on the chosen encoding and on the problem.
• Population size: a very large population usually does not improve the performance of a genetic
algorithm, and it actually reduces speed. A good population size is about 20-30, although sizes of
50-100 are sometimes reported as the best.
• The best population size depends on the length of the encoded string: the longer the string,
the larger the population should be.
• Selection: basic roulette wheel selection can be used, but sometimes rank selection is better.
Elitism should definitely be used unless another method is used to save the best solution found.
Steady-state selection can also be tried.
• Encoding: depends on the problem and also on the size of the problem instance.
Why use Genetic Algorithms
• They are robust.
• They provide optimisation over a large state space.
• Unlike traditional AI, they do not break on a slight change in input or in the presence of
noise.

Examples for genetic algorithm

Example 1
Maximize f(x) = x^2 + 1 over {0, 1, ..., 31}
• Representation: binary code
• Chromosome length: 5
• Population size: 4
• 1-point crossover
• Roulette wheel selection

Considering the initial population:
String no.   Chromosome   x    f(x)
1            01101        13   170
2            11000        24   577
3            01000        8    65
4            10010        18   325

Strings 1, 2 and 4 are taken forward to the next iteration.

Mutation probability: 1/5 = 0.2.
Genes selected for mutation are flipped, i.e. 0 becomes 1 and 1 becomes 0.

Comparing the iterations, the best fitness value obtained is 785 (x = 28, chromosome 11100).
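
The arithmetic of this example can be checked with a few lines of code. The sketch decodes the four initial strings, evaluates the fitness function, and computes the roulette wheel selection probabilities; the variable names are chosen for the illustration, and the later iterations are not reproduced because their tables are not available here.

```python
def fitness(x):
    """Objective function of the example: f(x) = x^2 + 1."""
    return x * x + 1

population = ["01101", "11000", "01000", "10010"]
xs = [int(c, 2) for c in population]    # decoded values: 13, 24, 8, 18
fits = [fitness(x) for x in xs]         # fitness values: 170, 577, 65, 325
total = sum(fits)                       # 1137
# Roulette wheel selection probability of each string.
probs = [f / total for f in fits]
# Chromosome behind the reported best fitness of 785.
best_reported = format(28, "05b")       # "11100", since 28 * 28 + 1 == 785
```

String 2 receives about half of the wheel (577/1137) and string 3 under 6% (65/1137), which is consistent with string 3 being the one left out. The largest value the function can take on this range is f(31) = 962, so 785 is a good but not optimal result.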

Example 2
• A supply chain in the computer manufacturing industry.
• The demand of each customer zone is satisfied by exactly one production plant.
• Each production plant supplies exactly one customer zone.

Fitness value of a string = 100 - D
• Crossover procedure: permutation encoding
• Crossover probability = 0.8

String no.   Iteration 1   Iteration 2
1            0.98          0.47
2            0.93          0.55
3            0.45          0.88
4            0.57          0.93
Application of Genetic Algorithms
Genetic algorithms have many applications, including:
• Recurrent neural networks
• Mutation testing
• Code breaking
• Filtering and signal processing
• Learning fuzzy rule bases
