
Natural Computing

Lecture 9

Michael Herrmann
[email protected]
phone: 0131 6 517177
Informatics Forum 1.42

18/10/2011

Genetic Programming: Examples and Theory

see: http://www.genetic-programming.org, http://www.geneticprogramming.us


Example 1: Learning to Plan using GP

Aim:
To find a program that transforms any initial state into the goal stack spelling "UNIVERSAL"

NAT09 18/10/2011 J. M. Herrmann


Genetic Programming: Learning to Plan

Terminals:
CS: returns the top block of the current stack

TB: returns the highest correct block in the stack (or NIL)

NN: next needed block, i.e. the one above TB in the goal

Functions:
MS(x): move block x from the table onto the current stack.
Returns T if it does something, else NIL.

MT(x): move x to the table

DU(exp1, exp2): do exp1 until exp2 becomes TRUE

NOT(exp1): logical not (or exp1 is not executable)

EQ(exp1, exp2): test for equality
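These primitives can be sketched in Python. The data model below (a list for the stack, a set for the table) and all names are illustrative assumptions, not Koza's implementation:

```python
# Hypothetical sketch of the blocks-world primitives; the goal stack
# spells "UNIVERSAL" (data model and names are illustrative).
GOAL = list("UNIVERSAL")

class World:
    def __init__(self, stack, table):
        self.stack = list(stack)      # current stack, bottom to top
        self.table = set(table)       # blocks lying on the table

    def _matched(self):               # length of the correct prefix of the stack
        n = 0
        while n < len(self.stack) and n < len(GOAL) and self.stack[n] == GOAL[n]:
            n += 1
        return n

    def CS(self):                     # top block of the current stack (or None)
        return self.stack[-1] if self.stack else None

    def TB(self):                     # highest correct block in the stack (or None)
        n = self._matched()
        return self.stack[n - 1] if n else None

    def NN(self):                     # next needed block: the one above TB in the goal
        n = self._matched()
        return GOAL[n] if n < len(GOAL) else None

    def MS(self, x):                  # move x from the table onto the stack
        if x in self.table:
            self.table.remove(x)
            self.stack.append(x)
            return True               # T: the move did something
        return False                  # NIL

    def MT(self, x):                  # move x (only the top block) to the table
        if self.stack and self.stack[-1] == x:
            self.table.add(self.stack.pop())
            return True
        return False

def DU(do, until, limit=1000):        # do `do()` until `until()` is true (bounded)
    while not until() and limit > 0:
        do()
        limit -= 1

# the generation-5 solution (DU (MS NN) (NOT NN)) then reads:
w = World(stack="", table="UNIVERSAL")
DU(lambda: w.MS(w.NN()), lambda: w.NN() is None)
# w.stack now spells UNIVERSAL
```

The bound on DU mirrors the practical need to time-limit evolved loops; an unbounded DU could run forever on a buggy program.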

NAT09 18/10/2011 J. M. Herrmann


Learning to Plan: Results

Generation 0: (EQ (MT CS) NN), 0 fitness cases

Generation 5: (DU (MS NN) (NOT NN)), 10 fitness cases

Generation 10: (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN))),
166 fitness cases

Population size: 500

Koza shows how to amend the fitness function to obtain efficient, small
programs: a combined fitness measure rewards

correctness (number of solved fitness cases)

AND efficiency (moving as few blocks as possible)

AND a small number of tree nodes (parsimony: number of
symbols in the string)
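A combined measure of this kind might look like the following sketch; the weights w_eff and w_size are invented for illustration, the slide does not give Koza's actual values:

```python
# Illustrative combined fitness: correctness minus penalties for block
# moves (efficiency) and tree size (parsimony). Weights are assumptions.
def combined_fitness(cases_solved, cases_total, blocks_moved, tree_nodes,
                     w_eff=0.001, w_size=0.001):
    correctness = cases_solved / cases_total
    return correctness - w_eff * blocks_moved - w_size * tree_nodes
```

With this shape, two equally correct programs are ranked by how few blocks they move and how small their trees are.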



Automatically Defined Functions

Efficient code: loops, subroutines, functions, classes, or ... variables

Automatically defined iterations (ADIs), automatically defined
loops (ADLs) and automatically defined recursions (ADRs)
provide means to re-use code. (Koza)

Automatically defined stores (ADSs) provide means to
re-use the result of executing code.

Solution: function-defining branches (i.e., ADFs) and
result-producing branches (the RPB)

e.g. RPB: ADF(ADF(ADF(x))), where ADF: arg0 × arg0
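Reading the ADF as squaring its argument (arg0 × arg0; the operator glyph is garbled on the slide, so this is an assumption), the result-producing branch computes x^8 by reusing one evolved subroutine three times:

```python
def ADF(arg0):          # function-defining branch: assumed to compute arg0 * arg0
    return arg0 * arg0

def RPB(x):             # result-producing branch: reuses the ADF three times
    return ADF(ADF(ADF(x)))

# RPB(x) = ((x^2)^2)^2 = x^8
```

The point of the ADF is exactly this reuse: the eight-fold product is expressed with three calls instead of seven explicit multiplications.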


Example 2: The Santa Fe Trail
Objective: To evolve a program which eats all the food on a trail
without searching too much when there are gaps in the trail.
The sensor can see the next cell in the direction the ant is facing.

Terminals: move, (turn) left, (turn) right

Functions: if-food-ahead, progn2, progn3 (unconditional
connectives: evaluate 2 or 3 arguments in the given order)

Program with high fitness:

(if-food-ahead move
  (progn3
    left
    (progn2 (if-food-ahead move right)
            (progn2 right (progn2 left right)))
    (progn2 (if-food-ahead move left) move)))

Fitness: e.g. amount of food collected in 400 time steps
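To make the primitives concrete, here is a toy interpreter on a small torus grid; the class, the food layout and the grid size are illustrative, not the real 32×32 Santa Fe trail:

```python
# Toy sketch of the ant primitives (not the real Santa Fe trail).
class Ant:
    DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]    # E, S, W, N

    def __init__(self, food, size=8):
        self.food = set(food)
        self.size = size
        self.pos, self.d, self.eaten = (0, 0), 0, 0

    def _ahead(self):                            # cell the ant is facing
        dr, dc = Ant.DIRS[self.d]
        return ((self.pos[0] + dr) % self.size, (self.pos[1] + dc) % self.size)

    def move(self):                              # step forward, eat if food there
        self.pos = self._ahead()
        if self.pos in self.food:
            self.food.remove(self.pos)
            self.eaten += 1

    def left(self):
        self.d = (self.d - 1) % 4

    def right(self):
        self.d = (self.d + 1) % 4

    def if_food_ahead(self, then_b, else_b):     # conditional connective
        (then_b if self._ahead() in self.food else else_b)()

def progn(*branches):                            # progn2 / progn3: run in order
    return lambda: [b() for b in branches]

# The high-fitness program from the slide, transcribed into this sketch:
def program(ant):
    ant.if_food_ahead(ant.move,
        progn(ant.left,
              progn(lambda: ant.if_food_ahead(ant.move, ant.right),
                    progn(ant.right, progn(ant.left, ant.right))),
              progn(lambda: ant.if_food_ahead(ant.move, ant.left), ant.move)))

ant = Ant(food={(0, 1), (0, 2), (0, 3)})         # a short straight trail
for _ in range(10):                              # fitness would count food in 400 steps
    program(ant)
```

On this gap-free trail the ant eats all three pellets in its first three invocations; the interesting behaviour of the evolved program only shows on trails with gaps and turns.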


Genetic programming: A practical example

Photo: Selena von Eichendorf



Evolving Structures

Example: Design of electronic circuits by composing components

Non-terminals: e.g. frequency multiplier, integrator, rectifier,
resistors, wiring ...

Terminals: input and output, pulse waves, noise generator

Structure usually not tree-like: meaningful substructures
(boxes or subtrees) for crossover and structural mutations

Fitness by desired input-output relation (e.g. by wide-band
frequency response)



Initialisation
The initial population might be lost quickly, but its general
features may determine the solutions

Assume that the functions and terminals are sufficient

Structural properties of the expected solution (uniformity,
symmetry, depth, ...)

Practical: Start at the root and choose k = 0, ..., K with
probability p(k); choose a non-terminal with k > 0 arguments
or a terminal for k = 0. If k > 0, repeat until no non-terminals
are left or the maximal depth is reached (then k = 0)

Lagrange initialisation: Crossover can be shown to produce
programs with a typical distribution (the Lagrange distribution of
the second kind), which can also be used for initialisation

Seeding: Start with many copies of good candidates
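The practical recipe above amounts to the standard "grow" initialisation; a minimal sketch, with an invented primitive set and a flat terminal probability in place of p(k):

```python
import random

# Minimal "grow" initialisation: at each node pick a terminal (k = 0) or a
# non-terminal with k > 0 arguments, recursing until the depth limit.
FUNCTIONS = {"+": 2, "*": 2, "neg": 1}   # arity table (illustrative primitives)
TERMINALS = ["x", "y", "1"]

def grow(max_depth, p_terminal=0.3):
    if max_depth == 0 or random.random() < p_terminal:
        return random.choice(TERMINALS)              # k = 0: leaf
    f = random.choice(list(FUNCTIONS))               # k > 0: internal node
    return [f] + [grow(max_depth - 1, p_terminal) for _ in range(FUNCTIONS[f])]

random.seed(0)
tree = grow(4)   # a random program tree of depth at most 5
```

Varying max_depth across the population ("ramped half-and-half" in the GP literature) gives the structural diversity the slide asks for.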

Riccardo Poli, William B Langdon, Nicholas F. McPhee (2008) A Field Guide to Genetic Programming.



Genetic Programming: General Points
Sufficiency of the representation: Appropriate choice of
non-terminals

Variables: Terminals (variables) implied by the problem

Is there a bug in the code? Closure: Typed algorithms,
grammar-based encoding

Program structure: Terminals also for auxiliary variables or
pointers to (automatically defined) functions

There are no silver bullets: Expect multiple runs (each with a
population of solutions)

Local search: Terminals (numbers) can often be found by
hill-climbing

Can you trust your results? Fitness: From fitness cases using
cross-validation (e.g. for symbolic regression)

Tree-related operators: Shrink, hoist, grow (in addition to
standard mutation and crossover)



Genetic programming: Troubleshooting

Study your populations: Analyse means and variances of
fitness, depth, size, code used, run time, ... and correlations
among these

Runs can be very long: Checkpoint results (e.g. mean fitness)

Control bloat in order to obtain small, efficient programs: Size
limitations prevent unreasonable growth of programs, e.g. by
soft thresholds

Control parameters during run-time

Small changes can have big effects

Big changes can have no effect

Encourage diversity and save good candidates

Embrace approximation: No program is error-free



GP: Application Areas
Problem areas involving many variables that are interrelated in
a non-linear or unknown way (e.g. predicting electricity demand)
A good approximate solution is satisfactory:
design, control (e.g. in simulations), classification and pattern
recognition, data mining, system identification and forecasting
Discovery of the size and shape of the solution is a major part
of the problem
Areas where humans find it difficult to write programs:
parallel computers, cellular automata, multi-agent strategies,
distributed AI, FPGAs
"Black art" problems:
synthesis of topology and sizing of analog circuits, synthesis of
topology and tuning of controllers, quantum computing circuits
Areas where you simply have no idea how to program a
solution, but where the objective (fitness measure) is clear
(e.g. generation of financial trading rules)
Areas where large computerised databases are accumulating
and computerised techniques are needed to analyse the data
Genetic programming: Theory

Schema theorem (sub-tree at a particular position)

worst case (Koza 1992)
exact for one-point crossover (Poli 2000)
for many types of crossover (Poli et al., 2003)

Markov chain theory: program trees are composed of NAND gates;
two functions are equivalent if they coincide after a permutation
of inputs.

Distribution of fitness in search space (see figure):
as the length of programs increases, the proportion of
programs implementing a given function approaches a limit

Halting probability: for programs of length L it is of order
1/√L, while the expected number of instructions executed by
halting programs is of order √L.



Genetic programming: Bloat

Bloat is an increase in program size that is not accompanied by
any corresponding increase in fitness. Problem: The optimal
solution might still be a large program.

From: Genetic Programming by Riccardo Poli

Theories (none of these is universally accepted) focus on

replication accuracy theory
inactive code
nature of program search-spaces theory
crossover bias (1-step mean constant, but Lagrange variance)

Size-evolution equation (similar to the exact schema theorem)

Practical solutions: Size and depth limits, parsimony pressure
(fitness reduced by size: f - c·l(i))
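The parsimony-pressure term can be written directly; the coefficient c = 0.1 below is an illustrative choice, not a value from the slide:

```python
# Parsimony pressure as on the slide: adjusted fitness f - c * l(i),
# where l(i) is the size (node count) of program i; c is an assumed value.
def parsimony_fitness(raw_fitness, size, c=0.1):
    return raw_fitness - c * size
```

Larger c means stronger pressure towards small programs; setting c too high can suppress the growth needed to reach a genuinely large optimum, which is the problem noted above.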



Genetic programming: Bloat Control

Constant: constant target size of 150.

Sin: target size
sin((generation + 1)/50)·50 + 150.

Linear: target size 150 + generation.

Limited: no size control until the size
reaches 250, then hard-limited.

Local: adaptive target size with
c = Cov(l, f)/Var(l): allows a certain
amount of drift, avoids runaway bloat.

Average program size over 500 generations for multiple runs of the
6-MUX problem [decode a 2-bit address and return the value from the
corresponding register] with various forms of parsimony pressure.
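The "Local" scheme's coefficient c = Cov(l, f)/Var(l) is just a population statistic over sizes l and fitnesses f; a minimal sketch:

```python
# Adaptive parsimony coefficient c = Cov(l, f) / Var(l), computed from the
# current population's sizes and raw fitnesses (illustrative sketch).
def adaptive_c(sizes, fits):
    n = len(sizes)
    mean_l, mean_f = sum(sizes) / n, sum(fits) / n
    cov = sum((l - mean_l) * (f - mean_f) for l, f in zip(sizes, fits)) / n
    var = sum((l - mean_l) ** 2 for l in sizes) / n
    return cov / var
```

When larger programs tend to be fitter, c grows and the size penalty strengthens, which is how this scheme damps runaway bloat while still permitting drift.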

Riccardo Poli, William B Langdon, Nicholas F. McPhee (2008) A Field Guide to Genetic Programming.



Genetic programming: Theory

GP applied to the one-then-zeros problem: independently of
tree structure, fitness is maximal if all nodes carry an
identical symbol. Expected to bloat, but doesn't. Why?

E. Crane and N. McPhee: The Effects of Size and Depth Limits on Tree Based Genetic Programming



Genetic programming: Control parameters

Representation and fitness function

Population size (thousands or millions of individuals)

Probabilities of applying genetic operators:

reproduction (unmodified): 0.08
crossover: 0.9
mutation: 0.01
architecture-altering operations: 0.01

Limits on the size, depth and run time of the programs
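Collected as a configuration, the settings above might look like this; the max_depth value is an assumption for illustration, the slide gives no number:

```python
# Illustrative GP parameter settings following the slide; max_depth is an
# assumed value, not taken from the slide.
GP_PARAMS = {
    "population_size": 500_000,          # thousands to millions of individuals
    "p_reproduction": 0.08,              # copy unmodified
    "p_crossover": 0.90,
    "p_mutation": 0.01,
    "p_architecture_altering": 0.01,     # e.g. add or delete an ADF
    "max_depth": 17,                     # assumed size/depth limit
}

# the operator probabilities should partition the offspring-creation events
assert abs(sum(v for k, v in GP_PARAMS.items() if k.startswith("p_")) - 1.0) < 1e-9
```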



Exact Schema Theory
Following: Genetic Programming by Riccardo Poli (University of Essex)

Exact schema theoretic models of GP have become available
only recently (first proof for a simplified case: Poli 2001)

For a given schema H the selection/crossover/mutation
process can be seen as a Bernoulli trial, because a newly
created individual either samples or does not sample H

Therefore, the number of individuals sampling H at the next
generation, m(H, t + 1), is a binomially distributed stochastic variable

So, if we denote by α(H, t) the success probability of each
trial (i.e. the probability that a newly created individual
samples H), an exact schema theorem is simply
E[m(H, t + 1)] = M α(H, t), where M is the population size
and E[·] is the mathematical expectation.
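The Bernoulli-trial view can be checked numerically: with M offspring each sampling H independently with probability α, the empirical mean of m(H, t+1) approaches Mα. A sketch with invented numbers:

```python
import random

# Monte Carlo check of E[m(H, t+1)] = M * alpha: each of M offspring
# independently samples the schema H with probability alpha.
def mean_schema_count(M, alpha, trials=20000, seed=1):
    rng = random.Random(seed)
    total = sum(sum(rng.random() < alpha for _ in range(M))
                for _ in range(trials))
    return total / trials

# with M = 100 and alpha = 0.3 the empirical mean is close to 30
```

The hard part of the theory, developed next, is computing α(H, t) itself; the expectation formula is the easy step.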



Exact Schema Theory

Variable-size tree structure does not permit the same definition
of a schema as in GA

A schema is a (sub-)tree with some don't-care nodes (=)

Each schema node represents a primitive function (or a terminal)

E.g. H = (= x (+ y =))
represents the programs
{(+ x (+ y x)), (+ x (+ y y)), (* x (+ y x)), ...}
(prefix notation; = can be terminal or non-terminal)


Exact Schema Theory

Assume: Only reproduction and one-offspring crossover are
performed (no mutation)

α(H, t), the success probability, can be calculated because the
two operators are mutually exclusive:

α(H, t) = Pr[an individual in H is obtained via reproduction]
        + Pr[an offspring matching H is produced by crossover]

Reproduction is performed with probability p_r and crossover
with probability p_c (p_r + p_c = 1), so

α(H, t) = p_r Pr[an individual in H is selected for cloning]
        + p_c Pr[parents and crossover points are such that
                 the offspring matches H]

where Pr[an individual in H is selected for cloning] = p(H, t)
Exact Schema Theory

Pr[parents and crossover points are such that the offspring matches H]

  = Σ over all pairs of      Σ over all crossover points
    parent shapes k, l       i, j in shapes k, l

    Pr[choosing crossover points i and j in shapes k and l]

    × Pr[selecting parents with shapes k and l such that, if crossed
         over at points i and j, they produce an offspring in H]



Exact Schema Theory

Crossover excises a subtree rooted at the chosen crossover
point in one parent and replaces it with a subtree excised from
the chosen crossover point in the other parent.

This means that the offspring will have the right shape and
primitives to match the schema of interest if and only if, after
the excision of the chosen subtree, the first parent has shape
and primitives compatible with the schema, and the subtree to
be inserted has shape and primitives compatible with the
schema.

Assume that crossover points are selected with uniform
probability:

Pr[choosing crossover points i and j in shapes k and l]
  = 1/(number of nodes in shape k) × 1/(number of nodes in shape l)



Exact Schema Theory

Pr[selecting parents with shapes k and l such that, if crossed
over at points i and j, they produce an offspring in H]

  = Pr[selecting a root-donating parent with shape k such that its upper
       part w.r.t. crossover point i matches the upper part of H w.r.t. j]

  × Pr[selecting a subtree-donating parent with shape l such that its lower
       part w.r.t. crossover point j matches the lower part of H w.r.t. i]

These two selection probabilities can be calculated exactly, but this
requires a bit more work ... cf. R. Poli and N. F. McPhee (2003)
General schema theory for GP with subtree swapping crossover:
Parts I & II. Evolutionary Computation 11 (1&2).



Conclusions on GP

In order to be successful, GP algorithms need well-structured
problems and lots of computing power

GP has proven very successful in many applications; see the
lists of success stories in Poli's talk, in Koza's tutorial and in
"GA in the news" (many of these were actually GPs)

GP provides an interesting view on the art of programming

Exact schema theoretic models of GP have started shedding
some light on fundamental questions regarding how and
why GP works, and have also started providing useful recipes
for practitioners.

Next time: Ant Colony Optimisation (ACO)

