Lecture Notes in Computer Science 5752: Theoretical Computer Science and General Issues
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Thomas Stützle, Mauro Birattari, Holger H. Hoos (Eds.)
Engineering Stochastic
Local Search Algorithms
Volume Editors
Thomas Stützle
Mauro Birattari
Université Libre de Bruxelles
IRIDIA, CoDE
Avenue F. Roosevelt 50, CP 194/6, 1050 Brussels, Belgium
E-mail: {stuetzle,mbiro}@ulb.ac.be
Holger H. Hoos
University of British Columbia
Computer Science Department
2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada
E-mail: [email protected]
CR Subject Classification (1998): E.5, E.2, F.2, I.1.2, I.2.8, F.2.2, H.3.3
ISSN 0302-9743
ISBN-10 3-642-03750-X Springer Berlin Heidelberg New York
ISBN-13 978-3-642-03750-4 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12741809 06/3180 543210
Preface
Stochastic local search (SLS) algorithms are established tools for the solution
of computationally hard problems arising in computer science, business admin-
istration, engineering, biology, and various other disciplines. To a large extent,
their success is due to their conceptual simplicity, broad applicability and high
performance for many important problems studied in academia and encoun-
tered in real-world applications. SLS methods include a wide spectrum of tech-
niques, ranging from constructive search procedures and iterative improvement
algorithms to more complex SLS methods, such as ant colony optimization,
evolutionary computation, iterated local search, memetic algorithms, simulated
annealing, tabu search, and variable neighborhood search.
Historically, the development of effective SLS algorithms has been guided to
a large extent by experience and intuition. In recent years, it has become in-
creasingly evident that success with SLS algorithms depends not merely on the
adoption and efficient implementation of the most appropriate SLS technique
for a given problem, but also on the mastery of a more complex algorithm en-
gineering process. Challenges in SLS algorithm development arise partly from
the complexity of the problems being tackled and in part from the many de-
grees of freedom researchers and practitioners encounter when developing SLS
algorithms. Crucial aspects of SLS algorithm development comprise algorithm
design, empirical analysis techniques, problem-specific background, and
background knowledge in several key disciplines and areas, including computer
science, operations research, artificial intelligence, and statistics. Ideally, the
SLS algorithm development process is assisted by a sound methodology that
addresses the issues arising in the various phases of algorithm design, implemen-
tation, tuning, and experimental evaluation.
In 2007, we organized a first workshop intended to provide a forum for re-
searchers interested in the integration of relevant aspects of SLS research into a
more coherent methodology for engineering SLS algorithms. This event attracted
more than 50 participants and was widely considered a resounding success. It
was therefore an easy decision to organize a second event, SLS 2009, Engineering
Stochastic Local Search Algorithms — Designing, Implementing and Analyzing
Effective Heuristics. Like the inaugural SLS 2007, SLS 2009 brought together
researchers working on various aspects of SLS algorithms, ranging from more
theoretical contributions on aspects relevant for SLS algorithms to the devel-
opment of specific SLS algorithms for specific application problems. We believe
that this second event further promoted the awareness and use of principled
approaches and advanced methodology for the development of SLS algorithms
and other complex heuristic procedures.
Of the 27 manuscripts submitted, seven were accepted as full papers for these
workshop proceedings, which corresponds to an acceptance rate of about 25%.
During the workshop, each of these papers was presented in a 30-minute plenary
talk. In addition, ten articles with promising, ongoing research efforts were se-
lected for publication as short papers. The selected papers were chosen based on
the results of a rigorous peer-reviewing process, in which each manuscript was
evaluated by at least three experts. SLS 2009 also included the Doctoral Sympo-
sium on Engineering Stochastic Local Search Algorithms (SLS-DS), which was
organized by Frank Hutter and Marco Montes de Oca. All short papers and
the contributions of SLS-DS were presented in poster sessions. This format was
chosen in order to provide opportunities for extended discussion and interaction
among the participants. The workshop program was completed by three tutori-
als on important topics in SLS engineering given by well-known researchers in
the field.
We gratefully acknowledge the contributions of everyone who helped to make
SLS 2009 a successful and lively workshop. We thank Frank Hutter and Marco
Montes de Oca for the organization of the doctoral symposium, SLS-DS; every-
one at IRIDIA who helped in organizing the event; the researchers who submitted
their work; the Program Committee members and additional referees who pro-
vided valuable feedback during the paper selection process; the Université Libre
de Bruxelles (ULB) for providing the rooms for the event. Finally, we would
like to thank the Belgian National Funds for Scientific Research, and the French
community of Belgium for supporting this workshop.
Workshop Chairs
Thomas Stützle Université Libre de Bruxelles, Belgium
Mauro Birattari Université Libre de Bruxelles, Belgium
Holger H. Hoos University of British Columbia, Canada
Program Committee
Thomas Bartz-Beielstein Cologne University of Applied Sciences,
Germany
Roberto Battiti Università di Trento, Italy
Christian Blum Universitat Politècnica de Catalunya, Spain
Marco Chiarandini University of Southern Denmark, Denmark
Carlos Cotta University of Málaga, Spain
Patrick de Causmaecker Katholieke Universiteit Leuven, Kortrijk,
Belgium
Camil Demetrescu Università La Sapienza, Italy
Yves Deville Université Catholique de Louvain, Belgium
Luca Di Gaspero Università degli Studi di Udine, Italy
Karl Doerner Universität Wien, Austria
Marco Dorigo Université Libre de Bruxelles, Belgium
Carlos M. Fonseca University of Algarve, Portugal
Michel Gendreau Université de Montréal, Canada
Bruce Golden University of Maryland, USA
Walter J. Gutjahr Universität Wien, Austria
Jin-Kao Hao University of Angers, France
Richard F. Hartl Universität Wien, Austria
Geir Hasle SINTEF Applied Mathematics, Norway
Adele Howe Colorado State University, USA
David Johnson AT&T Labs Research, USA
Joshua Knowles University of Manchester, UK
Chu Min Li Université de Picardie Jules Verne, France
Arne Løkketangen Molde University College, Norway
Vittorio Maniezzo Università di Bologna, Italy
Catherine C. McGeoch Amherst College, USA
Daniel Merkle University of Southern Denmark, Denmark
Local Arrangements
Saifullah bin Hussin, Renaud Lenne, Manuel López-Ibáñez, Sabrina Oliveira, Zhi Yuan
Additional Referees
Marco A. Montes de Oca, Lin Xu
Sponsoring Institutions
National Funds for Scientific Research, Belgium
http://www.fnrs.be
French Community of Belgium (through the research project META-X)
http://www.cfwb.be
Table of Contents
Short Papers
High-Performance Local Search for Solving Real-Life Inventory Routing
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Thierry Benoist, Bertrand Estellon, Frédéric Gardi, and
Antoine Jeanjean
A Detailed Analysis of Two Metaheuristics for the Team Orienteering
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Pieter Vansteenwegen, Wouter Souffriau, and Dirk Van Oudheusden
On the Explorative Behavior of MAX–MIN Ant System . . . . . . . . . . . . . . 115
Daniela Favaretto, Elena Moretti, and Paola Pellegrini
A Study on Dominance-Based Local Search Approaches for
Multiobjective Combinatorial Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 120
Arnaud Liefooghe, Salma Mesmoudi, Jérémie Humeau,
Laetitia Jourdan, and El-Ghazali Talbi
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 1–15, 2009.
2 B. Estellon, F. Gardi, and K. Nouioua
Concerning skills, note that the skill domains are disjoint, but that the levels
within each domain are hierarchically organized. Thus, a technician of level l
in domain d is able to perform any intervention requiring a smaller skill level
l′ < l in the same domain. Consequently, the constants R(i, d, l) are cumulative,
in the sense that they specify the number of technicians needed at level at
least l in domain d. For example, for an intervention Ii which requires two
technicians of level 1 and one technician of level 3 in domain d, we have
R(i, d, 0) = 3, R(i, d, 1) = 3, R(i, d, 2) = 1, R(i, d, 3) = 1 and R(i, d, l) = 0 for all
l ≥ 4. This definition implies that the R(i, d, l) are non-increasing in the
index l: R(i, d, l) ≥ R(i, d, l′) for all l ≤ l′.
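The cumulative convention for the R(i, d, l) can be illustrated with a small sketch (the function and its data layout are ours, not part of the problem specification):

```python
def cumulative_requirements(per_level, num_levels):
    """Build the cumulative requirements for one intervention and one domain.

    per_level maps a level l to the number of technicians required at
    exactly level l.  The returned list r satisfies r[l] = number of
    technicians needed at level at least l, hence it is non-increasing.
    """
    r = [0] * (num_levels + 2)
    # Accumulate from the highest level downwards.
    for l in range(num_levels, 0, -1):
        r[l] = r[l + 1] + per_level.get(l, 0)
    r[0] = r[1]  # level 0 means "any level"
    return r

# The example from the text: two technicians of level 1, one of level 3.
r = cumulative_requirements({1: 2, 3: 1}, num_levels=4)
assert r[:5] == [3, 3, 1, 1, 0]  # R(i,d,0..3) = 3, 3, 1, 1 and 0 beyond
```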
Next comes the notion of team. Each day, the available technicians must be
grouped into teams (a team may consist of a single technician). We stress
that, for practical reasons, a team is formed for the entire day. The problem
is then to partition the technicians into teams each day and to assign each
team a set of interventions, so as to minimize an objective function depending
on the ending dates of the interventions. Two constraints bear on this assignment:
the sum of the lengths of the interventions (which are completed sequentially)
cannot exceed the length of a working day, fixed to H = 120, and the skills of
the team must cover the skills required by its set of tasks in each domain.
Finally, a solution of the problem is given as follows: for each day j, the team
Ej,e to which each technician Tt belongs (the team Ej,0 contains all the
technicians not available on that day); for each intervention Ii, the day ji and
the starting time hi of its execution, as well as the team Ej,e in charge of it.
The objective is to minimize the cost function 28t1 + 14t2 + 4t3 + f, where tk
denotes the latest ending date among the interventions of priority k and f
denotes the ending date of all interventions. The starting date di (resp. ending
date fi) of an intervention Ii is obtained as ji · H + hi (resp. ji · H + hi + D(i)),
the days being numbered from 0. Initially, this objective function was intended
to induce the minimization of the tk's in lexicographic order (t1 ≻ t2 ≻ t3 ≻ f).
However, compensations between the four terms of the objective function were
allowed during the competition, which severely impacted our approach, as will
be seen later.
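For concreteness, the cost function can be evaluated as follows (a minimal sketch; the list-of-dicts representation of a planning is our assumption, not the challenge's input format):

```python
H = 120  # length of a working day

def planning_cost(interventions):
    """Compute 28*t1 + 14*t2 + 4*t3 + f for a planned set of interventions.

    Each intervention is a dict with keys 'day' (ji, numbered from 0),
    'start' (hi), 'duration' (D(i)) and 'priority' (1 to 4).  tk is the
    latest ending date among priority-k interventions, f the latest
    ending date overall.
    """
    t = {1: 0, 2: 0, 3: 0}
    f = 0
    for it in interventions:
        end = it['day'] * H + it['start'] + it['duration']  # fi = ji*H + hi + D(i)
        if it['priority'] in t:
            t[it['priority']] = max(t[it['priority']], end)
        f = max(f, end)
    return 28 * t[1] + 14 * t[2] + 4 * t[3] + f
```

Only f accounts for priority-4 interventions, which is what allows the compensations between terms mentioned above.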
Finally, the scope of the problem may be extended in two ways. The first is to
introduce precedence relations between interventions: for each intervention Ii,
one can define a set P(i) of interventions which must be completed before Ii
starts (that is, any intervention Ii′ ∈ P(i) must satisfy the inequality
fi′ ≤ di). Note that the natural lapses of time between interventions (travel,
breaks, etc.) are considered null here. The second extension is to define a
budget B allowing a number of interventions to be subcontracted. A cost S(i) is
then given for each intervention Ii, and the sum of the costs S(i) of all
abandoned interventions must not exceed the budget B. To ensure that precedences
are respected in this case, abandoning an intervention Ii leads to recursively
abandoning every intervention Ii′ such that Ii ∈ P(i′).
High-Performance Local Search for Task Scheduling 3
2 Contributions
To the best of our knowledge, this problem had never been addressed in these
terms in the literature, from either a theoretical or an experimental point of
view. Because of its broad definition, the problem contains several NP-hard
subproblems. Given a partition of the technicians into teams on a given day,
determining whether a set of interventions can be assigned to these teams while
respecting the working duration H, the precedence constraints and the skill
constraints is NP-complete, even if all interventions have unit execution time,
the number of teams is fixed to two and the precedence graph is isomorphic to a
set of vertex-disjoint paths [2]. In the case of arbitrary precedence
constraints, the problem remains NP-complete even if all interventions have unit
execution time and the skill constraints are omitted (any intervention can be
performed by any team) [3]. Minimizing the number of days needed to plan all
interventions is NP-hard in the strong sense, even if the interventions are
performed by one single team each day (containing all available technicians),
without precedence and skill constraints; indeed, this subproblem corresponds to
a bin-packing problem [3] when the length H is given as an input. Finally,
maximizing the sum of the lengths of the abandoned interventions is equivalent
to a knapsack problem (with precedence constraints) [3].
Because of its hardness and large scale (hundreds of interventions and
technicians), such a problem is typical of the real-life discrete optimization
problems encountered in business and industry. In this paper, an effective and
efficient local-search heuristic for this problem is described. Our algorithm
was ranked 2nd in the ROADEF 2007 Challenge (among the 35 teams who submitted a
solution). The winning algorithm, due to Hurkens [4], can be viewed as a
local-search heuristic in which large neighborhoods [5] are explored by integer
linear programming (using the ILOG CPLEX 10.0 solver); the team
Cordeau–Laporte–Pasin–Ropke [6], also ranked 2nd ex aequo, likewise developed a
large neighborhood search approach, but based on destroy-and-repair moves.
Before describing our algorithm, we outline the methodology followed to design
and implement it. This methodology, already used in our winning participation
in the ROADEF 2005 Challenge [7,8,9], is a simple and clear recipe for
engineering high-performance local-search heuristics for solving discrete
optimization problems in practice. Another successful application of this
methodology, to real-life inventory routing problems, is presented in a
companion paper [10].
For more details on high-performance algorithm engineering, the reader is
referred to the papers by Moret et al. [11,12] and, as an example, to the
outstanding work of Helsgaun [13,14,15] on the traveling salesman problem.
[Figure: the levels of local-search engineering. Effectiveness comes from the
search strategy (definition of the search graph) and the moves (exploration of
the search graph); efficiency from algorithmics and implementation; robustness
and reliability underpin both.]
The moves (also called transformations) play a central role because they induce
the connectivity of the search graph, which is decisive for convergence. The
idea is then to increase the density of the search graph G (more arcs in A) by
defining many moves: more or less orthogonal, more or less large, more or less
specialized. Specialization consists in increasing the success probability of a
move (the number of red arcs visited before finding a green one) by exploiting
structural properties specific to the problem, or even to the instances (see for
example the work of Helsgaun [13,14,15] on traveling salesman problems or the
work of the authors [8,9] on car sequencing problems). Note that the idea of
systematically using a large pool of moves (i.e., of neighborhoods) lies at the
root of well-known metaheuristics like Iterated Local Search and Variable
Neighborhood Search (see [21] for more details).
It is at these levels – search strategy and moves – that fragments of
metaheuristics can be incorporated (thresholds, tabu lists). However, from our
point of view, the diversification of the search must first be attained through
the (re)definition of the search space (density) and the definition of moves
(connectivity), and not only through a meta-strategy. The main reason is that
such a diversification is guided and controlled via the surrogate objective
function, unlike in traditional metaheuristics. This is why we prefer, at least
as a starting point, to implement a basic first-improvement descent strategy
[16] with stochastic choice of moves. In this case, diversification is achieved
by accepting moves to solutions of equal cost. Note that introducing stochastic
elements into every choice made during the search has been shown to improve
diversification, in particular by naturally avoiding cycling phenomena
(stochastic, however, does not mean uniform).
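A skeleton of such a descent might look as follows (illustrative only: the move and cost interfaces are our assumptions, and real moves would be evaluated incrementally rather than by copying solutions):

```python
import random

def first_improvement_descent(solution, cost, moves, max_iters=10000, seed=0):
    """First-improvement descent with stochastic choice of moves.

    At each iteration a move is drawn at random and applied tentatively;
    the result is kept if it improves the cost or leaves it unchanged
    (accepting equal-cost moves provides the diversification).
    Each move is a function (solution, rng) -> new solution.
    """
    rng = random.Random(seed)
    current, current_cost = solution, cost(solution)
    for _ in range(max_iters):
        move = rng.choice(moves)
        candidate = move(current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost <= current_cost:  # improving or equal-cost: accept
            current, current_cost = candidate, candidate_cost
    return current, current_cost

# Toy usage: drive a vector of non-negative integers down to zero.
def decrement(sol, rng):
    i = rng.randrange(len(sol))
    new = list(sol)
    if new[i] > 0:
        new[i] -= 1
    return new

best, best_cost = first_improvement_descent([5, 3], sum, [decrement],
                                            max_iters=5000)
assert best_cost == 0
```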
Finally, algorithmics, in particular the algorithms related to the evaluation of
moves (that is, the exploration of neighborhoods), is crucial for efficiency.
Since local search is an incomplete search technique, its effectiveness is
closely linked to the number of solutions visited before the time limit:
algorithmics forms the engine of the search. Incremental algorithms, exploiting
invariants in discrete structures, can speed up the convergence of local search
by several orders of magnitude (see for example the work of Katriel et al. [22]
in the context of the Comet software [23]). Careful implementations, aware of
the locality principle governing cache memory and optimized by code profiling,
accelerate local search further (see for example the work done on SAT solvers
[24]). From experience, it is not surprising to observe a factor of 10 between
the convergence times of two local-search heuristics apparently based on the
same principles.
Linked to algorithmics, software and implementation aspects such as reliability
are no less crucial than efficiency. Because it relies on complex incremental
algorithmics and data structures, engineering local search requires larger
verification and testing efforts than traditional (business) software
engineering. Hence, the verification process of local-search software must be
systematic. The first step is to program with assertions [25], verifying
preconditions, postconditions and invariants throughout the program; in
particular, one must check at each iteration of the local search (in debugging
mode) that the current solution satisfies the constraints of the problem and
that its objective value is correct. Going one step further, the consistency of
all dynamic data structures must be checked (in debugging mode) after each
iteration of the local search by recomputing them from scratch (with naive
algorithms independent of the local-search code). Consequently, a large part of
the source code (and of the implementation time) in local-search engineering
projects must be dedicated to verification and testing: from experience, code
checkers represent 10 to 20 % of the whole source code. Reliability aspects (as
well as maintainability and portability issues) must imperatively be taken into
account when tightly costing local-search engineering projects.
Once these three levels have been completed, the resulting algorithm can be
evaluated by computing statistics on target instances: the success rate (number
of acceptances over the number of attempts) and the improvement rate (number of
improvements in the sense of f over the number of attempts) for each move, and
the number of iterations and time needed to reach the best solutions. From
experience, the quest for high performance requires many stepwise refinements,
following the 80–20 rule (the last 20 % of improvement takes 80 % of the
engineering time).
- MoveTechnician, SwapTechnicians
- MoveInterventionInterDays, SwapInterventionsInterDays
- MoveInterventionIntraDay, SwapInterventionsIntraDay
- MoveInterventionIntraTeam, SwapInterventionsIntraTeam
Finally, additional transformations have been introduced to tackle the two
possible extensions of the problem, namely adding precedences between
interventions and allowing interventions to be abandoned within the limit of a
budget.
- AbandonInterventionBudget: abandon an intervention of the planning (in
  Generic and InfeasibleDay variants);
- SwapInterventionsBudget: swap an abandoned intervention with a planned one
  (in Generic and InfeasibleDay variants);
- ReinsertInterventionBudget: reinsert an abandoned intervention into the
  planning;
- SwapInterventionsPrecedences: swap two interventions Ii, Ii′ such that
  di ≤ di′ and the number of descendants of Ii in the precedence graph is
  greater than or equal to that of Ii′ (in InterDays and IntraDay variants).
In all, a pool of 31 transformations is used. At each iteration of the
heuristic, a transformation is picked at random following a fixed distribution.
The convergence speed of the local search depends strongly on the utilization
rate of each transformation. These rates were tuned by hand after experiments
on the first 20 benchmarks provided by France Télécom. The distribution is
roughly as follows: (i) 25 % MoveInterventionInterDays in its InfeasibleDay
variant, (ii) 25 % MoveTechnician in its InfeasibleTeam variant, (iii) 15 %
SwapTechnicians in its InfeasibleTeam variant, and from 5 % down to 1 % for
each of the 28 other moves (if no budget is available, no budget-specific
transformation is used; likewise for precedences). The prominence of
transformations (i), (ii), (iii) in the distribution makes sense: (i) is in
charge of reinserting interventions that make a day infeasible into other days,
whereas (ii) and (iii) resolve the infeasibility caused by a lack of skills in
the teams. Note that, despite their low utilization rates, the 28 other moves
contribute to the diversification of the search.
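Drawing a transformation according to such a hand-tuned distribution is straightforward; a sketch (only the three dominant rates come from the text, the 28 remaining moves being collapsed here into a single bucket):

```python
import random

# Utilization rates reported in the text; the remaining 35 % is spread in the
# paper over the 28 other moves (from 5 % down to 1 % each).
RATES = [
    ('MoveInterventionInterDays/InfeasibleDay', 0.25),
    ('MoveTechnician/InfeasibleTeam', 0.25),
    ('SwapTechnicians/InfeasibleTeam', 0.15),
    ('other moves', 0.35),
]

def pick_move(rng):
    """Draw one transformation name according to the fixed distribution."""
    names = [name for name, _ in RATES]
    weights = [w for _, w in RATES]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)
draws = [pick_move(rng) for _ in range(10000)]
# The empirical frequency of each dominant move approaches its rate.
```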
to stop early in case the move is rejected; the various tests composing it are
ordered according to their time complexity and their propensity to fail. For
example, since the precedence constraints are considered inviolable, all tests
related to precedences in the evaluation of the moves MoveIntervention and
SwapInterventions are performed first. Since the evaluation process cannot be
detailed for each of the 8 core transformations, we focus on two main points:
the evaluation related to skills and the evaluation related to precedences.
Evaluation of Skills. Any move which affects the technicians or the
interventions of a team calls for an evaluation of the match between the skills
provided by the technicians and the skills required by the interventions of
this team. To perform this evaluation, each team of technicians is associated
with a skill matrix Ce giving, for each domain d and level l, the number of
technicians of level at least l in domain d. An intervention Ii assigned to
team Ee is then infeasible (with respect to skills) if a pair (d, l) exists
such that Ci(d, l) > Ce(d, l). Since the number of domains and levels is not
bounded (for example, instance B4 of the benchmarks provided by France Télécom
includes 40 domains), it is difficult to design a data structure more efficient
than this domain/level matrix for evaluating skills. Consequently, evaluating
the impact of a move on skills is time-expensive in the worst case, since it
takes O(dl) time.
Fortunately, the number of cells of this matrix that need to be scanned can be
drastically reduced in practice. For example, the scan can be restricted to the
useful domains of the skill matrix of the intervention, that is, the domains
for which at least one technician is required. Then, for each useful domain d,
the scan can be reduced to an interval of levels. Recall that our skill
matrices are built cumulatively: for each domain, the number of technicians is
non-increasing with the level. Thus, the evaluation can start at the highest
level linf such that Ci(d, linf) = Ci(d, l) for all l ≤ linf, and stop at the
lowest level lsup such that Ci(d, lsup) = 0.
Finally, a heuristic test with a lower time complexity can be performed before
the scan of the matrix, in order to stop earlier in case of a negative
evaluation. For each domain d, define Ce(d) = Σ_l Ce(d, l) and symmetrically
Ci(d) = Σ_l Ci(d, l). Then the following necessary condition holds: if a domain
d exists such that Ci(d) > Ce(d), then Ii is infeasible (note that the converse
is trivially false). Such an upstream test enables the infeasibility of the
intervention to be determined in only O(d) time. In the same way, it is
appropriate to place even earlier another test verifying whether
Ci = Σ_d Ci(d) is strictly greater than Ce = Σ_d Ce(d). In the end, the
evaluation of skills is composed of three successive tests, in O(1), O(d) and
O(dl) time respectively, each one allowing to conclude in case of failure. Of
course, all the structures involved in these tests must be maintained
incrementally during the search.
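The three-stage test can be sketched as follows (illustrative: in the actual implementation the totals are maintained incrementally rather than recomputed, and the full scan is further restricted to useful domains and level intervals as described above):

```python
def skill_feasible(Ci, Ce, num_domains, num_levels):
    """Check whether required skills Ci are covered by provided skills Ce.

    Ci[d][l] / Ce[d][l] are cumulative counts: technicians of level at
    least l in domain d.  Three successive tests, cheapest first, each
    able to reject early.
    """
    # O(1)-style test on grand totals (precomputed in the real code).
    if sum(map(sum, Ci)) > sum(map(sum, Ce)):
        return False
    # O(d) test on per-domain totals.
    for d in range(num_domains):
        if sum(Ci[d]) > sum(Ce[d]):
            return False
    # O(dl) full scan: one uncovered pair (d, l) makes Ii infeasible.
    for d in range(num_domains):
        for l in range(num_levels):
            if Ci[d][l] > Ce[d][l]:
                return False
    return True
```

Note that the first two tests can only reject, never accept: passing them says nothing, which is exactly the "necessary condition" structure of the text.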
to compute longest paths in a directed acyclic graph (DAG). To this end, a DAG
is attached to each day of the planning. Each DAG contains a source node
representing the start of the day and a destination node representing its end.
Each intervention planned on that day is associated with one node of the DAG.
These nodes are linked by two kinds of precedences: blue arcs, which induce the
order of the interventions assigned to each team of technicians within the day,
and red arcs, which represent the precedences given in the input. The length
l(i, i′) of the arc connecting the nodes of two interventions Ii ≺ Ii′ is given
by the duration D(i) of intervention Ii. In this way, the earliest starting
date of an intervention is determined by the length of a longest path from the
source node to its node in the DAG. This date, stored at each node, allows
verifying that the maximal working duration H is respected for all teams, and
computing the ending dates t1, t2, t3 and f.
Thus, any transformation MoveIntervention or SwapInterventions implies a
cascade of arc insertions/deletions in the DAGs of the impacted days, requiring
a (temporary) update of the longest paths in order to evaluate the impact of
the transformation. Since the interventions of each team are completed
sequentially, each node has only one blue predecessor and one blue successor.
The red predecessors and successors are stored as unordered lists in the data
structure of the node. These lists, implemented as arrays, support the basic
routines (find, insert, delete, clear) in O(1) time. This representation was
motivated by the sparsity of the precedence graph on benchmarks A and B (where
the number of red arcs is lower than the number n of interventions).
The temporary update of the longest paths is done by a recursive breadth-first
propagation from the inserted/suppressed node. The new longest path at a node
is computed by scanning its predecessors: if the new longest-path value differs
from the old one, the successors of the node are placed into a queue in order
to be examined recursively. This propagation also detects the creation of
cycles, in which case the transformation is rejected. When the maximum degree
of the DAG remains O(1), which is the case here, our incremental algorithm
(evaluate, commit and rollback procedures) runs in optimal time and space O(a),
where a is the number of affected nodes (that is, those with a modified longest
path). The interested reader may consult the work of Katriel et al. [22] on the
subject, which gives an incremental algorithm whose complexity becomes
advantageous when the maximum in-degree of a node is large.
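The propagation step can be sketched as follows (a simplified illustration of the idea, not the authors' code: cycle detection and the evaluate/commit/rollback machinery are omitted, so the graph is assumed to stay acyclic):

```python
from collections import deque

def repropagate(preds, succs, dur, dist, changed):
    """Breadth-first update of longest-path values (earliest starting dates).

    preds/succs map each node to its predecessor/successor lists, dur[v]
    is the duration of v's intervention and dist[v] its current earliest
    start.  'changed' lists the nodes whose incoming arcs were modified.
    """
    queue = deque(changed)
    while queue:
        v = queue.popleft()
        # Longest path to v = max over predecessors u of dist[u] + D(u).
        new = max((dist[u] + dur[u] for u in preds[v]), default=0)
        if new != dist[v]:          # value changed: re-examine successors
            dist[v] = new
            queue.extend(succs[v])
    return dist

# Chain a -> b -> c after inserting the arc a -> b.
preds = {'a': [], 'b': ['a'], 'c': ['b']}
succs = {'a': ['b'], 'b': ['c'], 'c': []}
dur = {'a': 2, 'b': 3, 'c': 1}
dist = repropagate(preds, succs, dur, {'a': 0, 'b': 0, 'c': 0}, ['b'])
assert dist == {'a': 0, 'b': 2, 'c': 5}
```

Only nodes whose longest-path value actually changes enqueue their successors, which is what bounds the work by the number a of affected nodes.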
5 Experimental Results
Windows XP operating system and an Intel Xeon 3075 chipset (CPU 2.67 GHz, L1
cache 64 KiB, L2 cache 4 MiB, 2 GB RAM). An executable binary file (compiled
for the desired computing architecture) is available from the authors on
request.
The characteristics of each instance are given in the left part of the table:
the number n of interventions, the number m of technicians, the number d of
skill domains, the number l of skill levels, the number P of (non-transitive)
precedences between interventions, and the available budget B. For each
instance, 5 runs were performed, each limited to 1200 seconds (20 minutes). In
the middle part of the table, the columns “FT”, “EGN”, “BEST”, “% gap” and
“priority” contain, respectively, the result obtained by France Télécom's
algorithm, the worst result obtained by our algorithm (over the 5 runs), the
best result obtained among all competitors (including the 5 runs of our
algorithm), the relative gap (in %) between the values of the two previous
columns, and the ordering of priorities used by the EGN algorithm (for example,
the value 3214 means that the priorities were scheduled in the order 3, 2, 1, 4).
In the right
Table 2. Results with optimal priority ordering (left) or extended time limits (right)

data  EGN    EGN∗   BEST   % gap  priority
A5    29700  28845  28845  0.0    3214
A8    20100  16979  16979  0.0    2134
B7    36900  35700  33300  7.3    2134
B9    32700  28080  28080  0.0    2134
B10   41280  34440  34440  0.0    2314
X2    8370   7440   7260   2.5    2134
X6    10440  10140  9480   7.0    2134
X7    38400  32280  32280  0.0    2134
X8    23800  23220  23220  0.0    2134

data  20 min  1 hr    3 hrs   9 hrs
X1    180240  170460  168240  158280
X5    178560  167280  165120  164760
X9    154920  146520  146040  141720
X10   152280  144360  140340  140160
part of the table, the column “attempt” (resp. “accept”, “improve”) reports the
average number of attempted (resp. accepted, strictly improving)
transformations.
Only a small gap is observed among the results of the 5 runs of our algorithm
(which is why only the worst result is given here). Note that this gap increases
with the number of planned days: gaps greater than 1 % between runs are
observed for the instances X1 (57 days), X5 (52 days), X9 (50 days), and
X10 (49 days). The relative gap between the results of our algorithm and
the best results of the Challenge shows that our algorithm is very competitive. On
average, the EGN algorithm reduces the cost of the solutions proposed
by France Télécom by 30 % (and by 41 % for benchmark B alone). On the other
hand, the gap between our solutions and the best solutions obtained among all
competitors is 7.3 % on average (with a standard deviation of 7.5 %). Over the
30 instances, our algorithm obtains the best solution for 13 of them (7 for A, 6 for
B, 3 for X) and obtains a solution whose cost is within 6 % of the cost of
the best solution for 18 instances (9 for A, 6 for B, 3 for X).
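The percentage gaps reported in these tables are plain relative differences between cost columns; as a minimal sketch (the sample values below are the EGN∗ and BEST entries of instance X2 in Table 2):

```python
def relative_gap(ours, best):
    """Relative gap (%) of our cost with respect to the best known cost."""
    return 100.0 * (ours - best) / best

# Instance X2 from Table 2: EGN* = 7440, BEST = 7260.
print(round(relative_gap(7440, 7260), 1))  # -> 2.5
```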
Besides, we are able to explain why the EGN algorithm fails to find the best
solution for the 17 remaining instances. The main reason is that the ordering
of priorities computed in the preprocessing stage is not the most appropriate.
The left part of Table 2 shows the cost obtained by our algorithm
assuming that the optimal ordering is known. This cost appears in the column
named “EGN∗” and the optimal ordering appears in the column named “priority”.
In this case, one can observe that we obtain the best solution for 6 more
instances. The second reason is again due to the multi-objective nature of the cost
function. For example, for instance A4, the EGN algorithm obtains the following
solution: t1 = 315, t2 = 540, and t3 = 660, with global cost 14040. Slightly
relaxing the ending date of interventions with priority 1 makes it possible to improve
the global cost thanks to the compensation between the first two terms of the
objective function: t1 = 324, t2 = 480, and t3 = 660, with global cost 13452
(best known solution).
However, our local-search approach is outperformed on instances X1, X5, X9, and X10
by the large neighborhood search approaches of Hurkens [4], winner of the
Challenge, and, to a lesser extent, of Cordeau et al. [6], ranked second ex æquo.
In fact, these instances mostly contain long interventions (of length 60 or
14 B. Estellon, F. Gardi, and K. Nouioua
References
1. ROADEF Challenge (2007): http://www.g-scop.fr/ChallengeROADEF2007/
2. Jansen, K., Woeginger, G., Yu, Z.: UET-scheduling with chain-type precedence
constraints. Computers and Operations Research 22(9), 915–920 (1995)
3. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of
NP-Completeness. W.H. Freeman & Co., New York (1979)
4. Hurkens, C.: Incorporating the strength of MIP modeling in schedule construc-
tion. In: ROADEF 2007, le 8ème Congrès de la Société Française de Recherche
Opérationnelle et d’Aide à la Décision, Grenoble, France (2007) (in French)
5. Ahuja, R., Ergun, Ö., Orlin, J., Punnen, A.: A survey of very large-scale neighbor-
hood search techniques. Discrete Applied Mathematics 123, 75–102 (2002)
6. Cordeau, J.F., Laporte, G., Pasin, F., Ropke, S.: ROADEF 2007 challenge:
scheduling of technicians and interventions in a telecommunications company.
In: ROADEF 2007, le 8ème Congrès de la Société Française de Recherche
Opérationnelle et d’Aide à la Décision, Grenoble, France (2007) (in French)
7. ROADEF Challenge 2005:
http://www.prism.uvsq.fr/~vdc/ROADEF/CHALLENGES/2005/
8. Estellon, B., Gardi, F., Nouioua, K.: Large neighborhood improvements for solving
car sequencing problems. RAIRO Operations Research 40(4), 355–379 (2006)
9. Estellon, B., Gardi, F., Nouioua, K.: Two local search approaches for solving real-
life car sequencing problems. European Journal of Operational Research 191(3),
928–944 (2008)
10. Benoist, T., Estellon, B., Gardi, F., Jeanjean, A.: High-performance local search
for solving inventory routing problems. In: Stützle, T., Birattari, M., Hoos, H.H.
(eds.) SLS 2009. LNCS, vol. 5752, pp. 105–109. Springer, Heidelberg (2009)
11. Moret, B.: Towards a discipline of experimental algorithmics. In: Goldwasser, M.,
Johnson, D., McGeoch, C. (eds.) Data Structures, Near Neighbor Searches, and
Methodology: 5th and 6th DIMACS Implementation Challenges. DIMACS Mono-
graphs, vol. 59, pp. 197–213. American Mathematical Society, Providence (2002)
12. Moret, B., Bader, D., Warnow, T.: High-performance algorithm engineering for
computational phylogenetics. Journal of Supercomputing 22(1), 99–111 (2002)
13. Helsgaun, K.: An effective implementation of the Lin-Kernighan traveling sales-
man heuristic. Datalogiske Skrifter (Writings on Computer Science) 81, Roskilde
University, Denmark (1998)
14. Helsgaun, K.: An effective implementation of the Lin-Kernighan traveling salesman
heuristic. European Journal of Operational Research 126(1), 106–130 (2000)
15. Helsgaun, K.: An effective implementation of k-opt moves for the Lin-Kernighan
TSP heuristic. Datalogiske Skrifter (Writings on Computer Science) 109, Roskilde
University, Denmark (2006)
16. Aarts, E., Lenstra, J. (eds.): Local Search in Combinatorial Optimization. Wiley-
Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons,
Chichester (1997)
17. Hansen, P., Mladenović, N., Pérez, J.M.: Variable neighborhood search: methods
and applications. 4OR 6(4), 319–360 (2008)
18. Løkketangen, A.: The importance of being careful. In: Stützle, T., Birattari, M.,
Hoos, H.H. (eds.) SLS 2007. LNCS, vol. 4638, pp. 1–15. Springer, Heidelberg (2007)
19. Pellegrini, P., Birattari, M.: Implementation effort and performance. In: Stützle, T.,
Birattari, M., Hoos, H.H. (eds.) SLS 2007. LNCS, vol. 4638, pp. 31–45. Springer,
Heidelberg (2007)
20. Minoux, M.: Programmation Mathématique: Théorie et Algorithmes. Éditions Tec
& Doc, Lavoisier, 2nd edn. (2008) (in French)
21. Glover, F., Kochenberger, G. (eds.): Handbook of Metaheuristics. International
Series in Operations Research and Management Science, vol. 57. Kluwer Academic
Publishers, Dordrecht (2002)
22. Katriel, I., Michel, L., Van Hentenryck, P.: Maintaining longest paths incrementally.
Constraints 10(2), 159–183 (2005)
23. Michel, L., Van Hentenryck, P.: A constraint-based architecture for local search. In:
Proceedings of OOPSLA 2002, the 2002 ACM SIGPLAN Conference on Object-
Oriented Programming Systems, Languages and Applications. SIGPLAN Notices,
vol. 37, pp. 83–100. ACM Press, New York (2002)
24. Zhang, L., Malik, S.: Cache performance of SAT solvers: a case study for efficient
implementation of algorithms. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003.
LNCS, vol. 2919, pp. 287–298. Springer, Heidelberg (2004)
25. Rosenblum, D.: Towards a method of programming with assertions. In: Proceedings
of ICSE 1992, the 14th International Conference on Software Engineering, pp. 92–
104. ACM Press, New York (1992)
26. Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C:
the Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge
(1995)
27. Fenlason, J., Stallman, R.: GNU gprof: the GNU profiler (1998),
http://www.gnu.org/software/binutils/
On the Use of Run Time Distributions
to Evaluate and Compare Stochastic
Local Search Algorithms
1 Motivation
Run time distributions or time-to-target plots display on the ordinate axis the
probability that an algorithm will find a solution at least as good as a given
target value within a given running time, shown on the abscissa axis. Time-to-
target plots were first used by Feo et al. [1]. Run time distributions have been
advocated also by Hoos and Stützle [2,3] as a way to characterize the running
times of stochastic algorithms for combinatorial optimization.
It has been observed that in many implementations of local search heuristics
for combinatorial optimization problems, such as simulated annealing, genetic
algorithms, iterated local search, tabu search, and GRASP [4,5,6,7,8,9,10,11,12],
the random variable time to target value fits an exponential (or a shifted expo-
nential) distribution. Hoos and Stützle [13,8] conjecture that this is true for all
local search methods for combinatorial optimization.
Aiex et al. [14] describe a Perl program to create time-to-target plots for measured
times that are assumed to fit a shifted exponential distribution, following
[4]. Such plots are very useful in the comparison of different algorithms for solving
a given problem and have been widely used as a tool for algorithm design
and comparison.
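The construction behind such plots is simple enough to sketch: sort the n measured times-to-target and plot the i-th smallest against an empirical cumulative probability. This is a sketch of the idea only, not the tttplots program of [14]; the plotting position (i − 1/2)/n used below is one common choice.

```python
def ttt_points(times):
    """Pair each sorted time-to-target with its empirical cumulative probability.

    The i-th smallest time (1-indexed) gets probability (i - 0.5) / n,
    a standard plotting position for empirical distributions.
    """
    ts = sorted(times)
    n = len(ts)
    return [(t, (i + 0.5) / n) for i, t in enumerate(ts)]

# Four measured times; the resulting points trace the empirical distribution.
print(ttt_points([3.2, 0.4, 1.1, 2.0]))
# -> [(0.4, 0.125), (1.1, 0.375), (2.0, 0.625), (3.2, 0.875)]
```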
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 16–30, 2009.
© Springer-Verlag Berlin Heidelberg 2009
On the Use of Run Time Distributions 17
We assume the existence of two stochastic local search algorithms A1 and A2 for
some combinatorial optimization problem. Furthermore, we assume that their
running times fit exponential (or shifted exponential) distributions. We denote
by X1 (resp. X2 ) the continuous random variable representing the time needed
by algorithm A1 (resp. A2 ) to find a solution as good as a given target value:
\[
f_{X_1}(\tau) = \begin{cases} 0, & \tau < T_1 \\ \lambda_1 e^{-\lambda_1 (\tau - T_1)}, & \tau \ge T_1 \end{cases}
\]
and
\[
f_{X_2}(\tau) = \begin{cases} 0, & \tau < T_2 \\ \lambda_2 e^{-\lambda_2 (\tau - T_2)}, & \tau \ge T_2 \end{cases}
\]
where T1 , λ1 , T2 , and λ2 are parameters. The cumulative probability distribution
and the probability density function of X1 are depicted in Figure 1.
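Given a sample of measured run times assumed to follow such a shifted exponential, the two parameters can be estimated by simple moment matching: the shift T from the smallest observation and the rate λ from the mean excess over T. This is a hedged sketch of one possible estimator, not the fitting procedure of [4,14]:

```python
def fit_shifted_exponential(times):
    """Moment-matching estimate of (T, lam) for a shifted exponential.

    T is taken as the smallest observed time; lam is chosen so that the
    sample mean equals T + 1/lam.
    """
    T = min(times)
    mean_excess = sum(t - T for t in times) / len(times)
    lam = 1.0 / mean_excess if mean_excess > 0 else float("inf")
    return T, lam

# min = 1.0, mean excess = (0 + 1 + 3 + 4) / 4 = 2.0, hence lam = 0.5.
print(fit_shifted_exponential([1.0, 2.0, 4.0, 5.0]))  # -> (1.0, 0.5)
```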
Since both algorithms stop when they find a solution at least as good as the
target, we may say that algorithm A1 performs better than A2 if the former stops
before the latter. Therefore, we must evaluate the probability that X1 takes a
value smaller than or equal to X2 , i.e. we compute P r(X1 ≤ X2 ). Conditioning
on the value of X2 and applying the total probability theorem, we obtain:
\[
\Pr(X_1 \le X_2) = \int_{-\infty}^{\infty} \Pr(X_1 \le X_2 \mid X_2 = \tau)\, f_{X_2}(\tau)\, d\tau
= \int_{T_2}^{\infty} \Pr(X_1 \le X_2 \mid X_2 = \tau)\, \lambda_2 e^{-\lambda_2 (\tau - T_2)}\, d\tau
= \int_{T_2}^{\infty} \Pr(X_1 \le \tau)\, \lambda_2 e^{-\lambda_2 (\tau - T_2)}\, d\tau.
\]
[Fig. 1. The probability density function f_{X_1}(\tau) = \lambda_1 e^{-\lambda_1 (\tau - T_1)} for \tau \ge T_1 (top) and the cumulative distribution function F_{X_1}(\tau) = 1 - e^{-\lambda_1 (\tau - T_1)} (bottom) of X_1]
\[
\Pr(X_1 \le X_2) = 1 - \lambda_2 e^{-\lambda_1 (T_2 - T_1)} \int_0^{\infty} e^{-\nu(\lambda_1 + \lambda_2)}\, d\nu
= 1 + \lambda_2 e^{-\lambda_1 (T_2 - T_1)} \left[ \frac{e^{-\nu(\lambda_1 + \lambda_2)}}{\lambda_1 + \lambda_2} \right]_{\nu = 0}^{\nu = \infty}.
\]
Finally,
\[
\Pr(X_1 \le X_2) = 1 - \frac{\lambda_2}{\lambda_1 + \lambda_2}\, e^{-\lambda_1 (T_2 - T_1)}. \qquad (3)
\]
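Equation (3) can be evaluated directly once the four fitted parameters are known. A minimal sketch; the closed form assumes T2 ≥ T1, so the symmetric case is handled by complementation (our addition, valid because the variables are continuous):

```python
import math

def prob_x1_le_x2(T1, lam1, T2, lam2):
    """Pr(X1 <= X2) for shifted exponentials, Equation (3); assumes T2 >= T1."""
    if T2 < T1:
        # Continuous variables: Pr(X1 <= X2) = 1 - Pr(X2 <= X1).
        return 1.0 - prob_x1_le_x2(T2, lam2, T1, lam1)
    return 1.0 - math.exp(-lam1 * (T2 - T1)) * lam2 / (lam1 + lam2)

# Two identical algorithms tie: the index is exactly 1/2.
print(prob_x1_le_x2(0.0, 1.0, 0.0, 1.0))  # -> 0.5
```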
Fig. 2. Run time distributions on an instance of the 2-path network design problem
with 80 nodes and 800 origin-destination pairs, with target value set at 588
[Figure: superimposed run time distributions (cumulative probability vs. time to target solution value, in seconds) of GRASP and GRASP+biPR]
Fig. 4. Run time distribution and quantile-quantile plot for GRASP with bidirectional
path-relinking on an instance of the 2-path network design problem with 80 nodes and
800 origin-destination pairs, with target set to 588
Fig. 5. Run time distribution and quantile-quantile plot for GRASP with bidirectional
path-relinking on Balas and Saltzman problem 22.1, with target set to 8
Fig. 6. Run time distribution and quantile-quantile plot for GRASP with bidirectional
path-relinking on Balas and Saltzman problem 24.1, with target value set to 7
since f_{X_1}(\tau) = f_{X_2}(\tau) = 0 for any \tau < 0. For an arbitrarily small real number \varepsilon,
the above expression can be rewritten as
\[
\Pr(X_1 \le X_2) = \sum_{i=0}^{\infty} \int_{i\varepsilon}^{(i+1)\varepsilon} \Pr(X_1 \le \tau)\, f_{X_2}(\tau)\, d\tau. \qquad (5)
\]
Since F_{X_1} is nondecreasing, bounding \Pr(X_1 \le \tau) from below by F_{X_1}(i\varepsilon) and from above by F_{X_1}((i+1)\varepsilon) on each interval yields
\[
\sum_{i=0}^{\infty} F_{X_1}(i\varepsilon) \int_{i\varepsilon}^{(i+1)\varepsilon} f_{X_2}(\tau)\, d\tau \;\le\; \Pr(X_1 \le X_2) \;\le\; \sum_{i=0}^{\infty} F_{X_1}((i+1)\varepsilon) \int_{i\varepsilon}^{(i+1)\varepsilon} f_{X_2}(\tau)\, d\tau.
\]
Let L(ε) and R(ε) be the value of the left and right hand sides of the above
expression, respectively, with Δ(ε) = R(ε) − L(ε) being the difference between
the upper and lower bounds of P r(X1 ≤ X2 ). Then,
\[
\Delta(\varepsilon) = \sum_{i=0}^{\infty} \left[ F_{X_1}((i+1)\varepsilon) - F_{X_1}(i\varepsilon) \right] \int_{i\varepsilon}^{(i+1)\varepsilon} f_{X_2}(\tau)\, d\tau.
\]
Let \delta = \max_{\tau \ge 0} \{ f_{X_1}(\tau) \}. Since |F_{X_1}((i+1)\varepsilon) - F_{X_1}(i\varepsilon)| \le \delta\varepsilon for i \ge 0,
\[
\Delta(\varepsilon) \le \delta\varepsilon \sum_{i=0}^{\infty} \int_{i\varepsilon}^{(i+1)\varepsilon} f_{X_2}(\tau)\, d\tau = \delta\varepsilon \int_0^{\infty} f_{X_2}(\tau)\, d\tau = \delta\varepsilon.
\]
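The bounds L(ε) and R(ε) can be computed numerically once both cumulative distributions are available. A sketch under two assumptions of ours: the CDFs F1 and F2 are supplied as callables, and the infinite sum is truncated at a horizon t_max beyond which the remaining probability mass is negligible:

```python
import math

def prob_bounds(F1, F2, eps, t_max):
    """Lower and upper bounds on Pr(X1 <= X2), cells of width eps up to t_max."""
    lower = upper = 0.0
    i = 0
    while i * eps < t_max:
        a, b = i * eps, (i + 1) * eps
        mass2 = F2(b) - F2(a)      # integral of f2 over the cell [a, b]
        lower += F1(a) * mass2     # L(eps) term
        upper += F1(b) * mass2     # R(eps) term
        i += 1
    return lower, upper

# Two identical unit-rate exponentials: the true probability is 0.5,
# and by the bound above the gap is at most delta * eps = 0.01.
F = lambda t: 1.0 - math.exp(-t)
lo, hi = prob_bounds(F, F, 0.01, 20.0)
```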
4 Numerical Applications
We apply the tool described in the previous section to compare pairs of stochas-
tic local search algorithms running on the same instance of three different test
problems: server replication for reliable multicast, routing and wavelength as-
signment, and 2-path network design.
Fig. 7. Run time distribution and quantile-quantile plot for DM-D5 algorithm on the
instance with m = 25 and target value set at 2,818.925
24 C.C. Ribeiro, I. Rosseti, and R. Vallejos
Fig. 8. Run time distribution and quantile-quantile plot for DM-D5 algorithm on the
instance with m = 50 and target value set at 2,299.07
Fig. 9. Superimposed run time distributions of DM-D5 and GRASP: (a) P r(X1 ≤
X2 ) = 0.614775, and (b) P r(X1 ≤ X2 ) = 0.849163
any of the instances. GRASP running times were exponential for both. The
run time distributions of DM-D5 and GRASP are superimposed in Figure 9.
Algorithm DM-D5 outperformed GRASP, since the run time distribution of the
former lies slightly to the left of that of the latter for the instance with m = 25,
and much more clearly so for m = 50. Consistently, the computations show that
P r(X1 ≤ X2 ) = 0.614775 and P r(X1 ≤ X2 ) = 0.849163 for the instances with
m = 25 and m = 50, respectively.
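Such values can also be sanity-checked without fitting any distribution: the fraction of pairs of measured times (one from each sample) in which the first algorithm is at least as fast is a direct estimator of Pr(X1 ≤ X2). This pairwise check is our addition, not the numerical procedure used in the paper:

```python
def empirical_prob(times1, times2):
    """Fraction of pairs (t1, t2) with t1 <= t2, estimating Pr(X1 <= X2)."""
    wins = sum(1 for t1 in times1 for t2 in times2 if t1 <= t2)
    return wins / (len(times1) * len(times2))

# Two wins out of four pairs: (1, 2), (1, 3) but not (4, 2), (4, 3).
print(empirical_prob([1.0, 4.0], [2.0, 3.0]))  # -> 0.5
```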
Fig. 10. Run time distribution and quantile-quantile plot for tabu search on Brazil
instance with target value set at 24
Fig. 11. Run time distribution and quantile-quantile plot for tabu search on Finland
instance with target value set at 50
not share any common link. The routing and wavelength assignment problem
is that of routing a set of lightpaths and assigning a wavelength to each of
them, minimizing the number of wavelengths needed. Noronha and Ribeiro [25]
proposed a decomposition heuristic for this problem. First, a set of routes is
precomputed for each lightpath. Next, one of them and a wavelength are assigned
to each lightpath by a tabu search heuristic solving a partition coloring problem.
We compare this decomposition strategy with the multistart greedy heuristic
of Manohar et al. [26]. Two networks are used for benchmarking. The first has
27 nodes representing the capital cities in Brazil, with 70 links connecting them.
There are 702 lightpaths to be routed. The Finland instance [27] is formed by 31
nodes and 51 links, with 930 lightpaths to be routed.
Each algorithm was run 200 times with different seeds. The target was set
at 24 for instance Brazil and at 50 for instance Finland. Algorithm A1 is the
(a) Brazil instance with target 24 (b) Finland instance with target 50
Fig. 12. Superimposed run time distributions of multistart and tabu search: (a)
P r(X1 ≤ X2 ) = 0.106766, and (b) P r(X1 ≤ X2 ) = 0.545619
Fig. 13. Run time distribution and quantile-quantile plot for GRASP with forward
path-relinking on 90-node instance with target 673
multistart heuristic, while A2 is the tabu search decomposition scheme. The mul-
tistart running times fit exponential distributions for both instances. Figures 10
and 11 display run time distributions and quantile-quantile plots for instances
Brazil and Finland, respectively. The run time distributions of the decomposition
and multistart strategies are superimposed in Figure 12. The direct comparison
of the two approaches shows that decomposition clearly outperformed the mul-
tistart strategy for instance Brazil, since P r(X1 ≤ X2 ) = 0.106766 in this case.
However, the situation changes for instance Finland. Although both algorithms
have similar performances, multistart is slightly better with respect to the mea-
sure proposed in this work, since P r(X1 ≤ X2 ) = 0.545619.
Fig. 14. Run time distribution and quantile-quantile plot for GRASP with bidirectional
path-relinking on 90-node instance with target 673
Fig. 15. Run time distribution and quantile-quantile plot for GRASP with backward
path-relinking on 90-node instance with target 673
Fig. 16. Superimposed run time distributions of pure GRASP and three versions of
GRASP with path-relinking
900 origin-destination pairs, with the target value set at 673. Run time distribu-
tions and quantile-quantile plots for the different versions of GRASP with path-
relinking are illustrated in Figures 13 to 15. The run time distributions of the
four algorithms are superimposed in Figure 16. Algorithm A2 (as well as A3 and
A4 ) performs much better than A1 , since P r(X2 ≤ X1 ) = 0.984470. Algorithm
A3 outperforms A2 , as illustrated by the fact that P r(X3 ≤ X2 ) = 0.634002. Fi-
nally, we observe that algorithms A3 and A4 behave very similarly, although A4
performs slightly better for this instance with respect to the measure proposed
in this work, since P r(X4 ≤ X3 ) = 0.536016.
5 Concluding Remarks
Run time distributions are very useful tools to characterize the running times of
stochastic algorithms for combinatorial optimization. In this work, we extended
previous tools for plotting and evaluating run time distributions.
Under the assumption that the running times of two stochastic local search algorithms
follow exponential distributions, we derived a closed-form index to compute
the probability that one of them finds a target solution value in a smaller
computation time than the other. A numerical iterative procedure was described
for the computation of such an index in the case of general run time distributions.
This new tool and the resulting probability index proved very promising
and provide an additional measure for comparing the performance
of stochastic local search algorithms or of different versions of the same algorithm.
They can also be used for tuning the parameters of a given algorithm.
Numerical applications to different algorithm paradigms, problem types, and
test instances illustrated the applicability of the tool.
In another context, they can also be used in the evaluation of parallel imple-
mentations of local search algorithms, providing a numerical indicator to evaluate
the trade-offs between computation times and the number of processors.
References
1. Feo, T., Resende, M., Smith, S.: A greedy randomized adaptive search procedure
for maximum independent set. Operations Research 42, 860–878 (1994)
2. Hoos, H., Stützle, T.: On the empirical evaluation of Las Vegas algorithms - Posi-
tion paper. Technical report, Computer Science Department, University of British
Columbia (1998)
3. Hoos, H., Stützle, T.: Evaluation of Las Vegas algorithms - Pitfalls and remedies.
In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence,
pp. 238–245 (1998)
4. Aiex, R., Resende, M., Ribeiro, C.: Probability distribution of solution time in
GRASP: An experimental investigation. Journal of Heuristics 8, 343–373 (2002)
5. Dodd, N.: Slow annealing versus multiple fast annealing runs: An empirical inves-
tigation. Parallel Computing 16, 269–272 (1990)
6. Eikelder, H.T., Verhoeven, M., Vossen, T., Aarts, E.: A probabilistic analysis of lo-
cal search. In: Osman, I., Kelly, J. (eds.) Metaheuristics: Theory and Applications,
pp. 605–618. Kluwer, Dordrecht (1996)
7. Hoos, H.: On the run-time behaviour of stochastic local search algorithms for SAT.
In: Proc. AAAI 1999, pp. 661–666. MIT Press, Cambridge (1999)
8. Hoos, H., Stützle, T.: Towards a characterisation of the behaviour of stochastic
local search algorithms for SAT. Artificial Intelligence 112, 213–232 (1999)
9. Osborne, L., Gillett, B.: A comparison of two simulated annealing algorithms ap-
plied to the directed Steiner problem on networks. ORSA Journal on Computing 3,
213–225 (1991)
10. Selman, B., Kautz, H., Cohen, B.: Noise strategies for improving local search. In:
Proceedings of the AAAI 1994, pp. 337–343. MIT Press, Cambridge (1994)
11. Taillard, E.: Robust taboo search for the quadratic assignment problem. Parallel
Computing 17, 443–455 (1991)
12. Verhoeven, M., Aarts, E.: Parallel local search. Journal of Heuristics 1, 43–66 (1995)
13. Hoos, H., Stützle, T.: Some surprising regularities in the behaviour of stochastic
local search. In: Maher, M.J., Puget, J.-F. (eds.) CP 1998. LNCS, vol. 1520, p. 470.
Springer, Heidelberg (1998)
14. Aiex, R., Resende, M., Ribeiro, C.: TTTPLOTS: A Perl program to create time-
to-target plots. Optimization Letters 1, 355–366 (2007)
15. Ribeiro, C., Rosseti, I.: Efficient parallel cooperative implementations of GRASP
heuristics. Parallel Computing 33, 21–35 (2007)
16. Li, Y., Pardalos, P., Resende, M.: A greedy randomized adaptive search proce-
dure for the quadratic assignment problem. In: Pardalos, P., Wolkowicz, H. (eds.)
Quadratic Assignment and Related Problems. DIMACS Series on Discrete Math-
ematics and Theoretical Computer Science, vol. 16, pp. 237–261. American Math-
ematical Society, Providence (1994)
17. Resende, M., Ribeiro, C.: A GRASP for graph planarization. Networks 29, 173–189
(1997)
18. Resende, M., Pitsoulis, L., Pardalos, P.: Fortran subroutines for computing approx-
imate solutions of MAX-SAT problems using GRASP. Discrete Applied Mathemat-
ics 100, 95–113 (2000)
19. Resende, M.: Computing approximate solutions of the maximum covering problem
using GRASP. Journal of Heuristics 4, 161–171 (1998)
20. Canuto, S., Resende, M., Ribeiro, C.: Local search with perturbations for the prize-
collecting Steiner tree problem in graphs. Networks 38, 50–58 (2001)
21. Resende, M., Ribeiro, C.: GRASP with path-relinking: Recent advances and ap-
plications. In: Ibaraki, T., Nonobe, K., Yagiura, M. (eds.) Metaheuristics: Progress
as Real Problem Solvers, pp. 29–63. Springer, Heidelberg (2005)
22. Aiex, R., Pardalos, P., Resende, M., Toraldo, G.: GRASP with path relinking for
three-index assignment. INFORMS Journal on Computing 17, 224–247 (2005)
23. Santos, L., Martins, S., Plastino, A.: Applications of the DM-GRASP heuristic: A
survey. International Transactions in Operational Research 15, 387–416 (2008)
24. Fonseca, E., Fuchsuber, R., Santos, L., Plastino, A., Martins, S.: Exploring the hy-
brid metaheuristic DM-GRASP for efficient server replication for reliable multicast.
In: International Conference on Metaheuristics and Nature Inspired Computing,
Hammamet (2008)
25. Noronha, T., Ribeiro, C.: Routing and wavelength assignment by partition coloring.
European Journal of Operational Research 171, 797–810 (2006)
26. Manohar, P., Manjunath, D., Shevgaonkar, R.: Routing and wavelength assignment
in optical networks from edge disjoint path algorithms. IEEE Communications
Letters 5, 211–213 (2002)
27. Hyytiä, E., Virtamo, J.: Wavelength assignment and routing in WDM networks.
In: Nordic Teletraffic Seminar 14, pp. 31–40 (1998)
28. Dahl, G., Johannessen, B.: The 2-path network problem. Networks 43, 190–199
(2004)
29. Resende, M., Ribeiro, C.: Greedy randomized adaptive search procedures. In:
Glover, F., Kochenberger, G. (eds.) Handbook of Metaheuristics, pp. 219–249.
Kluwer, Dordrecht (2003)
Estimating Bounds on Expected Plateau Size in
MAXSAT Problems
1 Introduction
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 31–45, 2009.
© Springer-Verlag Berlin Heidelberg 2009
32 A.M. Sutton, A.E. Howe, and L.D. Whitley
In a more general setting, Hoos and Stützle [7] extended the plateau concept
to general combinatorial search spaces and defined metrics for plateau character-
istics (e.g., width). They developed plateau connection graphs: directed acyclic
graphs that capture connectivity between plateau regions and associated tran-
sition probabilities.
Smyth [8] examined plateau characteristics for uniform random 3-SAT instances.
He found that solutions on lower level sets tended to cluster together
in one common large plateau, whereas solutions on better level sets belonged to
many smaller plateaus. He also studied the internal structure of plateau regions,
finding that the graphs had very low branching factors and diameter greater
than or equal to the number of variables.
Plateaus emerge in the presence of neutrality: the existence of neighboring
states with equal evaluation. Reidys and Stadler [9] studied the nature of neu-
trality and developed an additive random model on which neutrality can be
expressed as a random variable. They derived a probability mass function for
the length of neutral walks: monotonic random paths of equal-valued states,
which we will employ in this paper. Reidys and Stadler extended work originally
done on RNA landscapes [10] where neutral networks, induced subgraphs of a
landscape, are studied using the theory of random graphs.
2 Size Prediction
\[
H_x = \{\, y \in P : \exists\, NH(x, y) \,\}
\]
In practice, the magnitude of the difference between the left hand and right hand
side of the above relation will ultimately depend on our choice of x ∈ P .
Under a simplifying assumption which we will make in the following section,
we will find that the probability that a solution belongs to Hx depends only on
its distance from x. Denote as
We thus ignore the internal crossing paths and bound the percolation probability.
\[
\hat{c}(1, p) = p, \qquad \hat{c}(n, p) = p \left( 2\, \hat{c}(n-1, p) - \hat{c}(n-1, p)^2 \right). \qquad (4)
\]
Proof. This follows directly from the definition of c(n, p). Note that a vertex in
a Hamming path from y to x must lie in the subcube of order d(x, y) between
x and y. If a vertex in the subcube is on the same level set as x, it is considered
active. Since each vertex is active with probability p, a neutral Hamming path
is simply a percolating path in the subcube of order d(x, y).
Thus, we have
\[
E[|H_x|] \ge \sum_{r=0}^{n} \binom{n}{r}\, \hat{c}(r, p). \qquad (5)
\]
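Both the recurrence (4) and the bound (5) are cheap to evaluate. In the sketch below, treating the r = 0 term as 1 (the solution x itself lies on the level set) is our own convention, since the recurrence is defined only for n ≥ 1:

```python
from math import comb

def c_hat(n, p):
    """Equation (4): probability that a neutral Hamming path percolates
    a subcube of order n, each vertex active with concentration p."""
    if n == 0:
        return 1.0  # convention for the r = 0 term (see lead-in)
    c = p           # c_hat(1, p) = p
    for _ in range(n - 1):
        c = p * (2.0 * c - c * c)
    return c

def plateau_lower_bound(n, p):
    """Equation (5): lower bound on the expected plateau size E[|Hx|]."""
    return sum(comb(n, r) * c_hat(r, p) for r in range(n + 1))

# n = 2, p = 0.5: 1 + 2 * 0.5 + 1 * 0.375 = 2.375.
print(plateau_lower_bound(2, 0.5))  # -> 2.375
```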
If we know p for a particular level set, then we can bound the expected plateau
size. Note that our premise that the concentration parameter p is independent
across a given level set is a rather heavy simplifying assumption. In fact, we
would expect in practice that solutions exhibit correlations among one another.
However, this assumption makes the analysis easier.
Finally, the elimination of the g(n) term in the approximation expression
will cause the lower bound to diverge as n → ∞ since the approximation loses
accuracy for each value of n. Thus, we expect the error to have superlinear
growth with n since g(n) is proportional to subcube size.
\[
T = p \left( 1 + (n-1)\, T \right).
\]
Solving for T, we have T = p / (1 - (n-1)p), and the expected cluster size at an arbitrary vertex b is 1 + nT:
\[
\frac{1 + p}{1 - (n-1)p}. \qquad (6)
\]
Since the Bethe lattice is an infinite system, its value as an approximation of the
finite hypercube becomes poorer as p gets larger. In fact, there is a singularity
in Equation (6) when p = 1/(n-1). This corresponds to the critical point at which
an infinite cluster appears in the lattice and the expected cluster size is no longer
well-defined. Thus the Bethe lattice approximation is only valid in the subcritical
region: values of p strictly less than 1/(n-1). A useful introduction to percolation
theory can be found in [11].
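Equation (6), together with its subcritical validity check, can be sketched as:

```python
def bethe_upper_bound(n, p):
    """Equation (6): expected cluster size on a Bethe lattice of coordination
    number n; only valid in the subcritical region p < 1/(n - 1)."""
    if p >= 1.0 / (n - 1):
        raise ValueError("supercritical concentration: cluster size undefined")
    return (1.0 + p) / (1.0 - (n - 1) * p)

# n = 20, p = 0.01: (1.01) / (1 - 0.19).
print(round(bethe_upper_bound(20, 0.01), 4))  # -> 1.2469
```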
We use this result to compute the expected neutral walk length as follows.
\[
E_p[L] = \sum_{r=1}^{n} r \Pr\{L = r\}. \qquad (7)
\]
To estimate p for a level set L, we compute the empirical mean neutral walk
length L_\mu by performing a number of neutral walks from sampled points on L. If
we assume L_\mu accurately estimates E_p[L], then an estimate of the concentration
is simply the root (in p) of the monotonic function
\[
E_p[L] - L_\mu.
\]
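Because this function is monotonic in p, its root can be found by plain bisection. The sketch below assumes the walk-length expectation is supplied as a callable expected_walk_length(p) (a hypothetical name; the actual expectation comes from the Reidys–Stadler walk-length distribution, which is not reproduced here):

```python
def estimate_concentration(expected_walk_length, mean_walk_length,
                           lo=1e-9, hi=1.0 - 1e-9, tol=1e-9):
    """Bisection root of g(p) = E_p[L] - mean walk length.

    Assumes g is monotonically increasing with g(lo) < 0 < g(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_walk_length(mid) < mean_walk_length:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy stand-in expectation E_p[L] = 10 p; the root of 10 p - 2.5 is p = 0.25.
print(round(estimate_concentration(lambda p: 10.0 * p, 2.5), 6))  # -> 0.25
```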
3 Computational Experiments
We have proved that, given our assumptions, our models provide upper and
lower bounds. However, we do not know how well the models perform on actual
problems where the concentration is not known. We evaluate the accuracy of our
prediction bounds by exhaustively enumerating plateaus on a number of different
search landscapes and comparing the actual value with the predictions given by
Equations (5) and (6). To assess the accuracy and trends of the prediction we first
use synthetic landscapes on which concentration is known, and then both random
and structured MAXSAT landscapes on which we predict the concentration
using the neutral walk method. Because we need to fully enumerate the plateaus
for accuracy, we are limited to small problems in this analysis.
To test the size prediction bounds given known concentrations, we evaluate pre-
dictions for random hypercube landscapes on which we explicitly control concen-
tration. In particular, given a hypercube landscape X, we assign each solution
an objective function value of 1 with probability p and a value of 0 with prob-
ability 1 − p. We sample solutions at random on the landscape. If the solution
is of value 1, we expand its plateau using breadth-first search. We also compute
its Hamming path set. We compare the actual cardinalities with the prediction
equation and the Bethe lattice approximation for concentrations that lie in the
subcritical region.
We generate 100 random landscapes controlling for concentration from
p = 0.01 to p = 0.4. On each landscape we calculate the Hamming path set
lower bound and the Bethe upper bound using the known value of p. We sample
100 random points from the level set at value 1 and perform breadth-first search
to exhaustively enumerate the plateaus. We also perform a depth-first search
from each plateau vertex back to the root to enumerate the Hamming path set.
We compare the average plateau and Hamming path set sizes with the prediction
bounds.
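The enumeration step can be sketched as follows (a simplified stand-in for the experimental code; names and the lazy landscape representation are ours, with values memoized so only visited vertices are labeled):

```python
import random
from collections import deque

def plateau_bfs(start, n, value, max_size=10**5):
    """Enumerate the plateau containing `start` by breadth-first search over
    Hamming-distance-1 neighbors that share its objective value."""
    target = value(start)
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_size:
        x = queue.popleft()
        for b in range(n):
            y = x ^ (1 << b)          # flip bit b
            if y not in seen and value(y) == target:
                seen.add(y)
                queue.append(y)
    return seen

def make_random_landscape(n, p, seed=0):
    """Random hypercube landscape: each vertex gets value 1 with probability p,
    assigned lazily and memoized so repeated queries are consistent."""
    rng, cache = random.Random(seed), {}
    def value(x):
        if x not in cache:
            cache[x] = 1 if rng.random() < p else 0
        return cache[x]
    return value
```

The `max_size` guard reflects the practical limit noted later: large plateaus quickly become intractable to enumerate.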
We report our prediction data in the form of correlation plots. There are three
types of data points. “Plateau/HP” is actual plateau size vs. Hamming path
prediction. “HP/HP” denotes actual Hamming path set size vs. Hamming path
prediction. “Plateau/Bethe” denotes actual plateau size vs. Bethe prediction. A
perfect prediction would lie on the diagonal line included in the plots. Data for
a 20-dimensional random landscape are plotted in Figure 2. The low number of
"plateau/Bethe" points is because the higher concentrations exceed the critical
value for the Bethe lattice.
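For reference, the Bethe-lattice bound rests on the textbook mean cluster size for site percolation on a tree with coordination number z (Stauffer and Aharony [11]), which diverges at the critical concentration 1/(z − 1). A sketch, under the assumption that z is taken to be the hypercube degree n:

```python
def bethe_mean_cluster_size(p, z):
    """Mean cluster size for site percolation on a Bethe lattice with
    coordination number z; valid only in the subcritical region p < 1/(z-1)."""
    pc = 1.0 / (z - 1)
    if p >= pc:
        raise ValueError("concentration at or above the critical value 1/(z-1)")
    return (1 + p) / (1 - (z - 1) * p)
```

As p approaches 1/(z − 1) the prediction blows up, which is why points above the critical concentration are omitted from the correlation plots.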
To determine the accuracy of our concentration estimate, we run the above
experiments again and estimate p using the neutral walk method. Instead of using
40 A.M. Sutton, A.E. Howe, and L.D. Whitley
[Figure 2: log–log correlation plots of actual vs. predicted sizes for the three
data-point types (plateau/HP, plateau/Bethe, HP/HP) on the 20-dimensional
random landscape.]
Fig. 3. Actual p vs. estimated p (left). Mean squared error between actual and
estimated p vs. samples per level set (right).
the known p value for the prediction bounds, we take 10 neutral walks from each
of the 100 sampled points and predict the concentration with the resulting walk
lengths. We compare the actual p values used to generate the landscape with
the values estimated by the neutral walk method. These data are plotted on the
left in Figure 3. We find a tight correlation between the predicted and actual
concentrations. To determine how much effort needs to be expended to estimate
p, we plot the mean squared error between known concentration and estimated
concentration with respect to samples per level set on the right in Figure 3. Both
plots were generated using data from the 20 dimensional random hypercube.
The high accuracy of the p estimation with low sample size is encouraging be-
cause the time to predict the size of the plateau for a single solution (including
neutral walk sampling) is on the order of 200-5000 microseconds whereas mea-
suring the actual plateau can take several minutes or longer on the relatively
small problems we investigated.
To test how well the bounds transfer to actual problems, we perform experiments
on random and structured MAXSAT problems. On MAXSAT the objective func-
tion is the number of formula clauses satisfied. On uniform random problems,
most solutions belong to a small number of objective function values. This typ-
ically results in solutions of average value belonging to vast plateaus. Hampson
and Kibler [5] found that, due to their relatively high exit density, plateaus of
average value are easy for local search to escape, and thus local search is most
affected by plateaus of higher value. Therefore we follow the technique used by
Frank et al. [4] and Smyth [8] employing a stochastic local search algorithm
(WalkSat [13]) to sample the highest value plateaus in the search space.
Plateau measurement time depends on the number of vertices on the plateau.
Thus large plateaus quickly become intractable to enumerate as they grow with
depth and problem size. Some level sets can have a small number of extremely
large plateaus which cannot be enumerated in a reasonable amount of time.
Rather than omitting these data points (which would bias the results to make
a lower bound appear tighter than it actually is) we only report the top three
level sets for two benchmark sets.
[Figure: estimated concentration per level set (levels 86 through 91).]
Fig. 5. Predictions for MAXSAT problems. Results on random uniform sets: 20 vari-
ables and 91 clauses uf20-91, and 50 variables and 218 clauses uf50-218 appear on
the top; results on structured problem instances are plotted on the bottom.
These trends are consistent with the observations of Frank et al. [4]
and Smyth [8]. We also see a marked increase in variance as level increases, which
suggests plateau size becomes less uniform in better regions of the search space.
The random uniform problem instances show similar trends in accuracy. This
could be an artifact of the inherent statistical regularity of random problems.
To address this, we tested our predictions on structured problem instances. We
performed the above experiments on the top three levels of a set of six Ramsey
number problems from the MAXSAT 2007 problem competition. This problem
set is comprised of several different instances with differing numbers of variables
and clauses. The results are shown in Figure 5 on the bottom left. We also
performed the above experiments on the top 10 levels of a 27 variable spin glass
problem. This problem is unsatisfiable and the best solution by WalkSat was
found on level set 145 (out of 162). These results are shown on the bottom right
in Figure 5. The sparsity of the bottom plots is due to the smaller cardinality
of the structured problem sets. The Ramsey number problems tended to have
the largest concentration values: their nonzero concentration ranged from 0.09
to 0.68. The other instances had nonzero concentration values ranging from 0.01
to 0.2. The concentrations were also higher relative to the critical value of
$\frac{1}{n-1}$ on the structured instances, hence the paucity of data points from the
Bethe model on the corresponding plots.
We see similar trends in accuracy with size across the random and structured
problems. Furthermore, the trend is again similar to what we found on the
hypercube graph model reported in Figure 2.
5 Conclusion
We have introduced methods for estimating bounds on plateau size for MAXSAT
problems. These bounds may support portfolio approaches to MAXSAT by in-
dicating problem difficulty for local search or principled adaptation for handling
large plateaus.
We found that the accuracy in our estimates showed surprisingly similar
trends across both random and structured problem instances. However, one
inherent weakness with the approach is the large divergence in accuracy with
plateau size. In the case of the Bethe approximation, this is an artifact of the
instability as the critical concentration is approached; thus the bound is not
useful for larger values of p. For the Hamming path set, the bounds diverge as
the cumulative effect from ignoring the internal path term increases.
We are continuing to refine the bounds on hypercube percolation, which would
address divergence in accuracy. Furthermore, we would like to assess the influence
of the phase transition on concentration.
The second important plateau characteristic is exit density, which we have not
addressed in this paper. Future work also includes estimating plateau exit density
and relating exit density and plateau size to problem difficulty and stochastic
local search behavior.
References
1. Gent, I.P., Walsh, T.: An empirical analysis of search in GSAT. Journal of Artificial
Intelligence Research 1, 47–59 (1993)
2. Selman, B., Levesque, H., Mitchell, D.: A new method for solving hard satisfiability
problems. In: Proceedings of AAAI 1992, San Jose, CA (1992)
3. Mastrolilli, M., Gambardella, L.M.: How good are tabu search and plateau moves
in the worst case? European Journal of Operational Research 166, 63–76 (2005)
4. Frank, J., Cheeseman, P., Stutz, J.: When gravity fails: Local search topology.
Journal of Artificial Intelligence Research 7, 249–281 (1997)
5. Hampson, S., Kibler, D.: Plateaus and plateau search in boolean satisfiability prob-
lems: When to give up searching and start again. DIMACS Series in Discrete Math
and Theoretical Computer Science 26, 437–453 (1993)
6. Yokoo, M.: Why adding more constraints makes a problem easier for hill-climbing
algorithms: Analyzing landscapes of CSPs. In: Smolka, G. (ed.) CP 1997. LNCS,
vol. 1330, pp. 356–370. Springer, Heidelberg (1997)
7. Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations and Applications.
Morgan Kaufmann, San Francisco (2004)
8. Smyth, K.R.G.: Understanding stochastic local search algorithms: An empirical
analysis of the relationship between search space structure and algorithm be-
haviour. Master’s thesis, University of British Columbia (2004)
9. Reidys, C.M., Stadler, P.F.: Neutrality in fitness landscapes. Applied Mathematics
and Computation 117, 321–350 (2001)
10. Reidys, C., Stadler, P., Schuster, P.: Generic properties of combinatory maps and
neutral networks of RNA secondary structures. Bull. Math. Biol. 59, 339–397 (1997)
11. Stauffer, D., Aharony, A.: Introduction to Percolation Theory. Routledge, New
York (1991)
12. Schuster, P., Fontana, W., Stadler, P.F., Hofacker, I.L.: From sequences to shapes
and back: a case study in RNA secondary structures. In: Proceedings of the Royal
Society London B, vol. 255, pp. 279–284 (1994)
13. Selman, B., Kautz, H., Cohen, B.: Local search strategies for satisfiability testing.
In: Johnson, D.S., Trick, M.A. (eds.) DIMACS Series in Discrete Mathematics and
Theoretical Computer Science, vol. 26. AMS, Providence (1996)
14. Gent, I., Walsh, T.: Unsatisfied variables in local search. In: Hallam, J. (ed.) Hybrid
Problems, Hybrid Solutions, pp. 73–85. IOS Press, Amsterdam (1995)
15. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation 1(1), 67–82 (1997)
16. Leyton-Brown, K., Nudelman, E., Andrew, G., McFadden, J., Shoham, Y.: A port-
folio approach to algorithm selection. In: Proceedings of the International Joint
Conference on Artificial Intelligence, IJCAI 2003 (2003)
17. Xu, L., Hutter, F., Hoos, H., Leyton-Brown, K.: SATzilla: Portfolio-based algo-
rithm selection for SAT. Journal of Artificial Intelligence Research 32, 565–606
(2008)
A Theoretical Analysis of the k-Satisfiability
Search Space
1 Introduction
Local search methods for k-satisfiability (k-SAT) problems have received consid-
erable attention in the AI search community. Though these methods are incom-
plete, they are usually able to quickly solve difficult problems that lie beyond
the grasp of conventional complete solvers [1] and have been found to exhibit
superior scaling behavior on soluble problems at the phase transition [2].
The behavior of local search algorithms closely depends on the underlying
structure of the search space. A number of researchers have conducted empir-
ical investigations on certain structural features of the k-SAT problem. Hoos
and Stützle [3] introduced several metrics for measuring structure and presented
an empirical examination of the characteristics of plateaus and their influence
on the performance of local search. Clark et al. [4] studied the relationship
between problem hardness and the expected number of solutions on random
problems. Frank et al. [5] analyzed the topology of the search space and experi-
mentally probed the nature of local optima and plateaus. Yokoo [6] investigated
the dependency of search cost on search space characteristics by studying how
cost for local algorithms is related to the size of certain plateaus.
This research was sponsored by the Air Force Office of Scientific Research, Air
Force Materiel Command, USAF, under grant number FA9550-08-1-0422. The U.S.
Government is authorized to reproduce and distribute reprints for Governmental
purposes notwithstanding any copyright notation thereon.
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 46–60, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Theoretical Analysis of the k-Satisfiability Search Space 47
In this paper, we take an analytical view of the k-SAT search space by formal-
izing it as a landscape [7] which captures the relationship between the objective
function associated with the problem and a neighborhood operator. We use the
landscape formalism to analyze the search space of the k-SAT problem. We show
that the search landscape can be decomposed into k elementary components. We
prove that this decomposition provides an equation that gives the expectation of
a random variable that models the objective function value of states in a given
neighborhood. This quantity is equal to the average objective function value of
the neighbors of a given state.
Furthermore, we use the decomposition to prove bounds for two prominent
search space features: local maxima and plateaus. We show local maxima do not
exist below a certain objective function value. Plateaus are regions of the search
space consisting of states that are interconnected by a neighborhood operator
and share an objective function value. Hoos and Stützle [3] define the width of
a plateau P as the minimal length of a path between any state in P and one not in
P. For many SAT instances, empirical results suggest that plateaus of width
greater than one do not exist, or are at least very rare [3]. We prove there are
regions of the search space that cannot contain plateaus of width greater than 1
and show empirically that these regions encompass the majority of the range of
the objective function value. To our knowledge, there are no analytical results
on the existence (or non-existence) of plateaus of particular width. Our results
apply to local search on k-SAT and MAX-k-SAT where the count of unsatisfied
clauses is the state evaluation function.
$$Tf = \lambda f + \gamma \qquad (2)$$
where both λ and γ are constants [8,7], and T is the neighborhood averaging
operator, $(Tf)(x) = \frac{1}{d}\sum_{y \in N(x)} f(y)$. In other words, the objective function is
an eigenfunction of the Markov transition matrix (up to an additive constant)
corresponding to eigenvalue λ.
Several well-studied combinatorial problems along with natural neighborhood
operators have been shown to satisfy the above equation (e.g., traveling sales-
man, graph coloring, not-all-equal satisfiability). Elementary landscapes possess
a number of interesting properties. For example, Grover [8] has shown that no
arbitrarily poor local optima can exist on an elementary landscape and that a
solution with evaluation superior to the mean objective function value can be
computed in polynomial time.
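As a quick illustration of Equation (2) (our own example, not from the paper): the ONEMAX function f(x) = number of one-bits, under the one-bit-flip neighborhood, is elementary with λ = 1 − 2/n and γ = 1, which can be verified exhaustively for small n:

```python
n = 8

def onemax(x):
    # objective: number of one-bits in the n-bit string encoded by integer x
    return bin(x).count("1")

for x in range(1 << n):
    # (T f)(x): average objective value over the n one-bit-flip neighbors
    avg = sum(onemax(x ^ (1 << b)) for b in range(n)) / n
    # elementary landscape identity: T f = (1 - 2/n) f + 1
    assert abs(avg - ((1 - 2 / n) * onemax(x) + 1)) < 1e-12
```

Flipping a one-bit decreases f by one and flipping a zero-bit increases it by one, which is where the averaged identity comes from.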
Landscapes that obey Equation (2) are called elementary because they behave
as building blocks of more general combinatorial search landscapes. Provided
that the neighborhood operator satisfies symmetry and regularity conditions,
any arbitrary landscape can be represented as a linear combination of elementary
landscapes [7]. We impose in this paper the following constraints.
1. y ∈ N (x) ⇐⇒ x ∈ N (y)
2. |N (x)| = |N (y)| = d; ∀x, y ∈ X
Most “natural” operators typically satisfy these constraints. The first constraint
states all neighborhood relationships are symmetric, and the second asserts that
all states have exactly d neighbors. Under these conditions T is a real symmetric
|X| × |X| matrix and thus its |X| eigenvectors {φi } with corresponding real
eigenvalues λi form an orthonormal basis.
Thus we can represent an arbitrary function f in the eigenbasis {φ_i} as a
linear combination:
$$f = \sum_{i=0}^{|X|-1} a_i \phi_i \qquad (3)$$
Let Y be a random variable giving the objective function value of a state drawn
uniformly at random from N(x). Then
$$E[Y] = \frac{1}{d}\sum_{z \in N(x)} f(z) = (Tf)(x) = T\left[\sum_{i=0}^{|X|-1} a_i \phi_i\right](x) = \sum_{i=0}^{|X|-1} \lambda_i a_i \phi_i(x) \qquad (4)$$
2 Decomposition of k-SAT
We now show that the k-SAT problem (and its optimization variant MAX-k-
SAT) is decomposable into k elementary components. An instance of the k-SAT
problem consists of a set of n Boolean variables {v1 , . . . , vn } and a set of m
clauses {c1 , . . . , cm }. Each clause is composed of exactly k literals in disjunction.
The objective is to find a variable assignment that maximizes the number of
satisfied clauses.
In this case, a state is a complete assignment to the n variables and can be
characterized as a sequence of n bits x = (x[1], x[2], . . . , x[n]) where
$$x[b] = \begin{cases} 1 & \text{if and only if } v_b \text{ is true}\\ 0 & \text{if and only if } v_b \text{ is false}\end{cases}$$
The state space X is isomorphic to the set of all sequences x ∈ {0, 1}^n.
The objective function f : X → {0, . . . , m} simply counts the number of
clauses satisfied under the assignment given by x. The most natural neighbor-
hood is the Hamming neighborhood N where N (x) is the set of n states y that
differ from x in exactly one bit.
Since f can be taken as a function over bit strings of length n, a natural
decomposition is given by the Walsh transform. In the general case, an arbitrary
pseudo-Boolean function f : {0, 1}^n → R can be represented as a linear combination
of 2^n Walsh functions, which we will define shortly. Rana et al. [12] showed
that the k-SAT objective function can be tractably decomposed into a polynomial
number of such functions. We will use this result to obtain a decomposition
of the k-SAT objective function into elementary components.
Given two bit strings x and y of length n, we denote the inner product ⟨x, y⟩
as $\sum_{b=1}^{n} x[b]\,y[b]$. We define the ith Walsh function, i ∈ {0, . . . , 2^n − 1}, as
$$\psi_i(x) = (-1)^{\langle i, x \rangle}$$
Here, the i that appears in the inner product of the exponent is taken to be the
bit string representation of the index i, that is, the binary sequence of length n
that corresponds to the integer i.
The objective function f can now be written as
$$f(x) = \sum_{i=0}^{2^n - 1} w_i \psi_i(x) \qquad (5)$$
where each Walsh coefficient w_i is the sum of contributions from each clause,
$$w_i = \sum_{j=1}^{m} w_{i,c_j}$$
with
$$w_{i,c_j} = \begin{cases} 1 - \frac{1}{2^k} & \text{if } i = 0\\[2pt] 0 & \text{if } i \text{ involves a variable not appearing in } c_j\\[2pt] -\frac{1}{2^k}\,\psi_i(u(c_j)) & \text{otherwise} \end{cases}$$
where u(c_j) is the unique assignment to the variables of c_j that falsifies the clause.
The order of a Walsh coefficient w_i is the number of ones in the bitstring
representation of i. This can be denoted following our notation as ⟨i, i⟩. Note that
the order of any nonzero Walsh coefficient is bounded by k: the number of variables
that appear together in a clause. Rana et al. showed it is enough to specify
f(x) by computing the O(2^k m) nonzero Walsh coefficients and computing the
superposition in Equation (5). Since k is typically taken to be O(1), all nonzero
Walsh coefficients can be found in polynomial time.
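These properties can be checked exhaustively on a toy instance (the clause set below is our own illustration): computing the full Walsh transform of a small MAX-3-SAT objective confirms that every nonzero coefficient has order at most k = 3, and that the order-0 coefficient is $m(1 - 2^{-k})$:

```python
n, k = 4, 3
# a literal is (variable index, required bit value); clauses chosen arbitrarily
clauses = [((0, 1), (1, 0), (2, 1)),
           ((1, 1), (2, 1), (3, 0)),
           ((0, 0), (2, 0), (3, 1))]

def f(x):
    # number of clauses satisfied under the assignment encoded by integer x
    return sum(any(((x >> v) & 1) == s for v, s in cl) for cl in clauses)

def psi(i, x):
    # Walsh function: (-1) raised to the bitwise inner product <i, x>
    return -1 if bin(i & x).count("1") % 2 else 1

N = 1 << n
w = [sum(f(x) * psi(i, x) for x in range(N)) / N for i in range(N)]

# every nonzero Walsh coefficient has order (popcount of i) at most k = 3
assert all(abs(w[i]) < 1e-12 for i in range(N) if bin(i).count("1") > k)
```

With m = 3 clauses, w_0 equals 3(1 − 1/8) = 2.625, the mean number of satisfied clauses over all assignments.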
Lemma 1. The Walsh function ψ_i of order ⟨i, i⟩ = p is an eigenvector of the
Markov transition matrix T with eigenvalue $1 - \frac{2p}{n}$.
We define the Walsh span of order p as
$$\varphi^{(p)}(x) = \sum_{i : \langle i,i \rangle = p} w_i \psi_i(x)$$
Intuitively, ϕ^{(p)} is an element of the linear space spanned by the Walsh functions
of order p. Now we can write the objective function as a sum over Walsh spans
of each order p (recall p is bounded by k):
$$f(x) = \sum_{p=0}^{k} \varphi^{(p)}(x) \qquad (7)$$
$$T\varphi^{(p)} = T\left[\sum_{i:\langle i,i\rangle = p} w_i \psi_i\right] = \sum_{i:\langle i,i\rangle = p} w_i \left(1 - \frac{2p}{n}\right)\psi_i \quad \text{(by Lemma 1)} = \left(1 - \frac{2p}{n}\right)\left[\sum_{i:\langle i,i\rangle = p} w_i \psi_i\right] = \left(1 - \frac{2p}{n}\right)\varphi^{(p)}$$
thus ϕ^{(p)} is an eigenfunction of T corresponding to eigenvalue $1 - \frac{2p}{n}$.
We can use the decomposition from the previous section to compute the
expectation of Y:
$$E[Y] = \sum_{p=0}^{k}\left(1 - \frac{2p}{n}\right)\varphi^{(p)}(x)$$
This follows directly from the proposition along with Equations (4) and (7).
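This identity can be verified numerically on a small MAX-3-SAT instance of our own choosing: the average objective value over the n one-flip neighbors of every state matches the weighted sum of Walsh spans.

```python
n, k = 4, 3
clauses = [((0, 1), (1, 0), (2, 1)),
           ((1, 1), (2, 1), (3, 0)),
           ((0, 0), (2, 0), (3, 1))]

def f(x):
    # number of satisfied clauses under the assignment encoded by integer x
    return sum(any(((x >> v) & 1) == s for v, s in cl) for cl in clauses)

def psi(i, x):
    return -1 if bin(i & x).count("1") % 2 else 1

N = 1 << n
w = [sum(f(x) * psi(i, x) for x in range(N)) / N for i in range(N)]

def phi(p, x):
    # Walsh span of order p evaluated at state x
    return sum(w[i] * psi(i, x) for i in range(N) if bin(i).count("1") == p)

for x in range(N):
    avg_neighbors = sum(f(x ^ (1 << b)) for b in range(n)) / n   # E[Y] directly
    via_spans = sum((1 - 2 * p / n) * phi(p, x) for p in range(k + 1))
    assert abs(avg_neighbors - via_spans) < 1e-9
```

The check passes for every one of the 2^n states, since Walsh coefficients of order above k vanish and each span is scaled by its eigenvalue.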
The following two lemmas will be useful in the next section. First, we show
that the Walsh span of order zero is always a constant that is equal to the mean
objective function value over X.
Lemma 2. Let $\bar f = \frac{1}{|X|}\sum_{x \in X} f(x)$ denote the mean objective function value
over X. Then for all x ∈ X, $\varphi^{(0)}(x) = \bar f$.
Proof. Let x ∈ X. There is only one Walsh function of order zero: ψ_0(x) = 1.
We have ϕ^{(0)}(x) = w_0 ψ_0(x) = w_0. Note that for p ≠ 0 we have
$$\frac{1}{|X|}\sum_{x \in X}\varphi^{(p)}(x) = 0 \qquad (8)$$
Hence
$$w_0 = \frac{1}{|X|}\sum_{x\in X} w_0 = \frac{1}{|X|}\sum_{x\in X}\varphi^{(0)}(x) = \frac{1}{|X|}\sum_{x\in X}\varphi^{(0)}(x) + \frac{1}{|X|}\sum_{p=1}^{k}\sum_{x\in X}\varphi^{(p)}(x) \quad \text{by Eq. (8)}$$
$$= \frac{1}{|X|}\sum_{x\in X}\sum_{p=0}^{k}\varphi^{(p)}(x) = \frac{1}{|X|}\sum_{x\in X} f(x) = \bar f$$
In the next section, we will need to bound the value of ϕ^{(p)} over all states x ∈ X.
We use the absolute values of the Walsh coefficients w_i to do so.
Lemma 3. For all x ∈ X,
$$-\sum_{i:\langle i,i\rangle=p} |w_i| \;\le\; \varphi^{(p)}(x) \;\le\; \sum_{i:\langle i,i\rangle=p} |w_i|$$
Proof. This follows since ψ_i(x) = ±1 and w_i = ±|w_i|: the smallest that each term
w_i ψ_i(x) could be is −|w_i| and the largest is |w_i|.
Lemma 4. For k = 3 and all x ∈ X,
$$\sum_{p=0}^{3} p\,\varphi^{(p)}(x) = 2f(x) - 2\bar f - \varphi^{(1)}(x) + \varphi^{(3)}(x)$$
Proof. Expanding the sum,
$$\sum_{p=0}^{3} p\,\varphi^{(p)}(x) = \varphi^{(1)}(x) + 2\varphi^{(2)}(x) + 3\varphi^{(3)}(x)$$
By Lemma 2 and Equation (7), $\varphi^{(2)}(x) = f(x) - \bar f - \varphi^{(1)}(x) - \varphi^{(3)}(x)$. Substituting yields
$$\left(f(x) - \bar f\right) + \left(f(x) - \bar f\right) - \varphi^{(1)}(x) + \varphi^{(3)}(x)$$
which is the claimed identity.
A state x cannot be a local maximum if f(x) < E[Y], since then at least one
neighbor must have a strictly greater objective value. By the expectation formula,
this condition is
$$f(x) < \sum_{p=0}^{3}\varphi^{(p)}(x) - \frac{2}{n}\sum_{p=0}^{3} p\,\varphi^{(p)}(x)$$
The first term on the right-hand side is simply the decomposition of f(x) given
by Equation (7). Thus we can make the following substitution:
$$f(x) < f(x) - \frac{2}{n}\sum_{p=0}^{3} p\,\varphi^{(p)}(x)$$
By Lemma 4,
$$f(x) < f(x) - \frac{2}{n}\left(2f(x) - 2\bar f - \varphi^{(1)}(x) + \varphi^{(3)}(x)\right)$$
Simplifying, we have
$$f(x) < \bar f + \frac{1}{2}\left(\varphi^{(1)}(x) - \varphi^{(3)}(x)\right) \qquad (9)$$
Inequality (9) describes a threshold that depends on ϕ^{(1)}(x) and ϕ^{(3)}(x) such
that if f(x) is less than this threshold, x cannot be a local maximum. We now
give a threshold that holds over the entire search space.
By Lemma 3, we have for any x ∈ X,
$$\varphi^{(1)}(x) - \varphi^{(3)}(x) \;\ge\; -\left(\sum_{i:\langle i,i\rangle=1}|w_i| + \sum_{i:\langle i,i\rangle=3}|w_i|\right)$$
and letting
$$\tau = \frac{1}{2}\left(\sum_{i:\langle i,i\rangle=1}|w_i| + \sum_{i:\langle i,i\rangle=3}|w_i|\right)$$
it follows from Inequality (9) that no state x with $f(x) < \bar f - \tau$ can be a local
maximum.
In a similar manner, we can bound the function value at which plateaus of width
greater than one can appear. A plateau is a maximal set P of states such that
for all x, y ∈ P there is a path (x = x1 , x2 , . . . , xt = y) of length t ≥ 1 with
f (x) = f (xi ) for i = 1, 2, . . . , t and, if t > 1, xi+1 ∈ N (xi ). The level of a plateau
P is the evaluation f (xp ), ∀xp ∈ P .
We say the neighborhood of a state x is flat if, for all y ∈ N (x), f (y) = f (x),
that is, x has the same value as all the states in its neighborhood. We show that
flat neighborhoods cannot exist at certain levels of the objective function.
Proof. We prove the equivalent contrapositive. Let x be a state with a flat neigh-
borhood. We have
$$f(x) = E[Y] = \sum_{p=0}^{3}\left(1 - \frac{2p}{n}\right)\varphi^{(p)}(x) = \sum_{p=0}^{3}\varphi^{(p)}(x) - \frac{2}{n}\sum_{p=0}^{3} p\,\varphi^{(p)}(x) = f(x) - \frac{2}{n}\sum_{p=0}^{3} p\,\varphi^{(p)}(x) \quad \text{by Eq. (7)}$$
Hence
$$\sum_{p=0}^{3} p\,\varphi^{(p)}(x) = 0$$
By Lemma 4, this means $2f(x) - 2\bar f - \varphi^{(1)}(x) + \varphi^{(3)}(x) = 0$, that is,
$f(x) = \bar f + \frac{1}{2}(\varphi^{(1)}(x) - \varphi^{(3)}(x))$, and so by Lemma 3,
$$\bar f - \tau \le f(x) \le \bar f + \tau$$
[Figure: the range of f(x), showing the band $\bar f - \tau \le f(x) \le \bar f + \tau$ outside of
which no state has a flat neighborhood (hence no plateaus of width greater than
one), and the region below $\bar f - \tau$ containing no local maxima.]
Fig. 2. Number of improving moves vs E[Y ] at f (x) = 390 for 100 points each on 1000
instances of SATLIB benchmark set uf100-430. Line indicates linear best fit.
5 Conclusion
Studying the structural characteristics of combinatorial search spaces is impor-
tant to understanding the behavior of stochastic search algorithms. These char-
acteristics, along with how algorithms respond to them, define how poorly or
how well the algorithm performs, in some cases determining whether a problem
or problem class is easily solved or not. We have presented analytical tools for
analyzing the search space of k-SAT and MAX-k-SAT.
We have shown that the landscape formalism provides insight into certain
structural relationships. We have shown that the decomposition of the objective
function into elementary components bounds the regions of the search space in
which local maxima and plateaus of width greater than one can occur.
References
1. Gent, I.P., Walsh, T.: Towards an understanding of hill-climbing procedures for
SAT. In: Proc. of AAAI 1993, pp. 28–33. MIT Press, Cambridge (1993)
2. Parkes, A.J., Walser, J.P.: Tuning local search for satisfiability testing. In: Proc.
of AAAI 1996, pp. 356–362. MIT Press, Cambridge (1996)
3. Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations and Applications.
Morgan Kaufmann, San Francisco (2004)
4. Clark, D.A., Frank, J., Gent, I.P., MacIntyre, E., Tomov, N., Walsh, T.: Local
search and the number of solutions. In: Freuder, E.C. (ed.) CP 1996. LNCS,
vol. 1118, pp. 119–133. Springer, Heidelberg (1996)
5. Frank, J., Cheeseman, P., Stutz, J.: When gravity fails: Local search topology. J.
of Artificial Intelligence Research 7, 249–281 (1997)
6. Yokoo, M.: Why adding more constraints makes a problem easier for hill-climbing
algorithms: Analyzing landscapes of CSPs. In: Smolka, G. (ed.) CP 1997. LNCS,
vol. 1330, pp. 356–370. Springer, Heidelberg (1997)
7. Stadler, P.F.: Toward a theory of landscapes. In: Lopéz-Peña, R., Capovilla, R.,
Garcı́a-Pelayo, R., Waelbroeck, H., Zertruche, F. (eds.) Complex Systems and Bi-
nary Networks, pp. 77–163. Springer, Heidelberg (1995)
8. Grover, L.K.: Local search and the local structure of NP-complete problems.
Operations Research Letters 12, 235–243 (1992)
9. Stadler, P.F.: Landscapes and their correlation functions. J. of Mathematical
Chemistry 20, 1–45 (1996)
10. Rockmore, D., Kostelec, P., Hordijk, W., Stadler, P.F.: Fast Fourier transform
for fitness landscapes. Applied and Computational Harmonic Analysis 12, 57–76
(2002)
11. Whitley, L.D., Sutton, A.M., Howe, A.E.: Understanding elementary landscapes.
In: Proc. of GECCO, Atlanta, GA (July 2008)
12. Rana, S., Heckendorn, R.B., Whitley, L.D.: A tractable Walsh analysis of SAT and
its implications for genetic algorithms. In: Proc. of AAAI 1998, pp. 392–397 (1998)
Loopy Substructural Local Search for the
Bayesian Optimization Algorithm
Abstract. This paper presents a local search method for the Bayesian
optimization algorithm (BOA) based on the concepts of substructural
neighborhoods and loopy belief propagation. The probabilistic model of
BOA, which automatically identifies important problem substructures, is
used to define the topology of the neighborhoods explored in local search.
On the other hand, belief propagation in graphical models is employed
to find the most suitable configuration of conflicting substructures. The
results show that performing loopy substructural local search (SLS) in
BOA can dramatically reduce the number of generations necessary to
converge to optimal solutions and thus provides substantial speedups.
1 Introduction
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 61–75, 2009.
© Springer-Verlag Berlin Heidelberg 2009
62 C.F. Lima et al.
still scarce. For instance, the probabilistic models of EDAs contain useful
information about the underlying problem structure that can be exploited to speed up
the convergence of EDAs to optimal solutions.
This paper makes use of the concept of substructural neighborhoods [7,8]—
where the structure of the neighborhoods is defined by learned probabilistic
models—to perform local search in BOA. The proposed local search method
is inspired by loopy belief propagation, which is often used for obtaining the
most probable state of a Bayesian network. To guide the propagation of beliefs,
we use a surrogate fitness model that also relies on substructural information.
Experiments are performed for a boundedly difficult problem with both non-
overlapping and overlapping subproblems. The results show that incorporating
loopy substructural local search (SLS) in BOA leads to a significant reduction
in the number of generations, providing relevant speedups in terms of number
of evaluations.
The next section briefly reviews the Bayesian optimization algorithm, while
Section 3 details the notion of substructural local search in EDAs. Section 4
introduces belief propagation in graphical models and its potential for function
optimization. A new substructural local search method based on loopy belief
propagation is then presented in Section 5. Section 6 presents and discusses
empirical results. The paper ends with a brief summary and conclusions.
Bayesian networks [9] are powerful graphical models that combine probability
theory with graph theory to encode probabilistic relationships between variables
of interest. A Bayesian network is defined by a structure and corresponding
parameters. The structure is represented by a directed acyclic graph where the
Loopy Substructural Local Search for the BOA 63
nodes correspond to the variables of the data to be modeled and the edges
correspond to conditional dependencies. The parameters are represented by the
conditional probabilities for each variable given any instance of the variables
that this variable depends on. More formally, a Bayesian network encodes the
following joint probability distribution,
$$p(X) = \prod_{i=1}^{\ell} p(X_i \mid \Pi_i) \qquad (1)$$
where ℓ is the number of variables and Π_i denotes the set of parents of X_i in
the network.
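As a sketch (our own encoding of states and conditional probability tables, not BOA's internal representation), this factorization can be evaluated as:

```python
def joint_prob(x, parents, cpt):
    """p(x) = product over i of p(x_i | parents of x_i) for a Bayesian network
    over binary variables. parents[i] is the tuple of parent indices of X_i;
    cpt[i] maps (x_i, parent_values) to a conditional probability."""
    p = 1.0
    for i, Pi in enumerate(parents):
        p *= cpt[i][(x[i], tuple(x[j] for j in Pi))]
    return p
```

For a well-formed network, summing `joint_prob` over all assignments yields 1.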
Pelikan and Sastry [10] extended the Bayesian networks used in BOA to encode
a surrogate fitness model that is used to estimate the fitness of a proportion of
the population, thereby reducing the total number of function evaluations. For
each possible value xi of every variable Xi , an estimate of the marginal fitness
contribution of a subsolution with Xi = xi is stored for each instance πi of Xi ’s
parents Πi . Therefore, in the binary case, each row in the CPT is extended by
two additional entries. The fitness of an individual can then be estimated as
$$f_{est}(X_1, X_2, \ldots, X_\ell) = \bar f + \sum_{i=1}^{\ell}\left(\bar f(X_i \mid \Pi_i) - \bar f(\Pi_i)\right) \qquad (2)$$
where $\bar f$ is the average fitness of all solutions used to learn the surrogate,
$\bar f(X_i \mid \Pi_i)$ is the average fitness of solutions with X_i and Π_i, and $\bar f(\Pi_i)$ is the
average fitness of all solutions with Π_i.
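A minimal sketch of this surrogate (our own data layout; the actual BOA implementation stores these averages directly in the extended CPT rows):

```python
from collections import defaultdict

def surrogate_fitness(pop, fits, parents):
    """Build the surrogate of Eq. (2) from evaluated bitstring solutions.
    parents[i] is the tuple of parent indices of variable X_i."""
    fbar = sum(fits) / len(fits)
    cond = [defaultdict(list) for _ in parents]   # (x_i, pi_i) -> fitness list
    marg = [defaultdict(list) for _ in parents]   # pi_i -> fitness list
    for x, fx in zip(pop, fits):
        for i, Pi in enumerate(parents):
            pi = tuple(x[j] for j in Pi)
            cond[i][(x[i], pi)].append(fx)
            marg[i][pi].append(fx)
    def estimate(x):
        est = fbar
        for i, Pi in enumerate(parents):
            pi = tuple(x[j] for j in Pi)
            c, m = cond[i].get((x[i], pi)), marg[i].get(pi)
            if c and m:   # skip configurations with no stored average
                est += sum(c) / len(c) - sum(m) / len(m)
        return est
    return estimate
```

For a linear fitness function and a fully covered sample, the estimate is exact, since Equation (2) then recovers each variable's marginal contribution.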
Fitness information can also be incorporated in Bayesian networks with de-
cision trees or graphs in a similar way. In this case, the average fitness of each
instance for every variable must be stored in every leaf of the decision tree or
graph. The fitness averages in each leaf are now restricted to solutions that
satisfy the condition specified by the path from the root of the tree to the leaf.
Belief propagation (BP) [9] is a method for performing exact and approximate
inference in graphical models, which has enjoyed increasing popularity in recent
years. Although BP has been reinvented several times in different fields [12,13],
it is mainly applied to two tasks: (1) obtaining marginal probabilities for some
of the variables, or (2) finding the most probable explanation or instance for
the graphical model. These two versions are known as the sum-product and
max-product algorithms.
Fig. 1. Example of (a) a Bayesian network and its equivalent representation as (b) a
factor graph. Note that each factor corresponds to a conditional probability table;
therefore the number of variable and factor nodes is the same.
BP algorithms are typically applied to factor graphs [12], which can be seen as
a unifying representation for both Bayesian networks and Markov networks [14].
Factor graphs explicitly express the factorization structure of the corresponding
probability distribution. Consider a function g(X) whose joint probability
distribution can be factorized into several local functions, such that
$$g(x_1, x_2, \ldots, x_\ell) = \frac{1}{Z}\prod_{I \in \mathcal{F}} f_I(x_{N_I}) \qquad (3)$$
where $Z = \sum_{x}\prod_{I\in\mathcal F} f_I(x_{N_I})$ is a normalization constant, I is the factor index,
N_I is the subset of variable indices associated with factor I, and factor f_I is a
nonnegative function. Note that for a Bayesian network each factor corresponds
to a conditional probability table.
A factor graph is a bipartite graph consisting of variable nodes i ∈ V, factor
nodes I ∈ F, and an undirected edge {i, I} between i and I if and only if i ∈ NI ,
meaning that factor fI depends on xi . Factor nodes are typically represented as
squares and variable nodes as circles.
An example of a Bayesian network, along with the corresponding representation
as a factor graph, is presented in Figure 1. The factor graph represents the
following factorization:
$$g(x_1, x_2, x_3, x_4, x_5) = \frac{1}{Z}\, f_1(x_1)\, f_2(x_1, x_2)\, f_3(x_1, x_2, x_3)\, f_4(x_4, x_5)\, f_5(x_5) \qquad (4)$$
If one substitutes the factor functions by the corresponding conditional proba-
bilities, the joint probability distribution of a Bayesian network is obtained.
When BP is applied to cyclic graphs it is often referred to as loopy belief
propagation (LBP). In this situation, convergence to exact beliefs cannot be
guaranteed as it is for acyclic graphs (without loops). However, empirical
studies have shown that good approximate beliefs can be obtained for several
domains (see [13] for an extensive list).
The inference performed by BP is done by message-passing between the nodes
of the graphical model. Each node sends and receives messages from its neighbors
until a stable state is reached. The outgoing messages are functions of incoming
messages at each node. This iterative process is repeated according to some
schedule that describes the sequence of message updates in time [13].
When performing BP in factor graphs, there are two types of messages: messages
m_{I→i}, sent from factors I ∈ F to neighboring variables i ∈ N_I, and
messages m_{i→I}, sent from variables i ∈ V to neighboring factors I ∈ N_i. The
new messages m′ are given in terms of the incoming messages by the following
update rules:
$$m'_{i\to I}(x_i) = \prod_{J \in N_i \setminus I} m_{J\to i}(x_i) \qquad \forall i \in V,\ \forall I \in N_i \qquad (5)$$
$$m'_{I\to i}(x_i) = \sum_{x_{N_I \setminus i}} f_I(x_{N_I}) \prod_{j \in N_I \setminus i} m_{j\to I}(x_j) \qquad \forall I \in F,\ \forall i \in N_I \qquad (6)$$
For the max-product algorithm, the summation in Equation (6) is replaced by a
maximization:
$$m'_{I\to i}(x_i) = \max_{x_{N_I \setminus i}}\left(f_I(x_{N_I}) \prod_{j \in N_I \setminus i} m_{j\to I}(x_j)\right) \qquad \forall I \in F,\ \forall i \in N_I \qquad (7)$$
For the max-product algorithm, the most probable configuration (MPC) for
each variable Xi is obtained by assigning the value associated with the highest
probability at each max-marginal.
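To make the update rules concrete, here is a minimal synchronous max-product sweep on a tiny acyclic factor graph; the model, the factor values, and all names are invented for illustration and are not taken from the paper:

```python
import itertools
from math import prod

# Toy acyclic model (invented for illustration): binary variables x1, x2 with
# factors fA(x1) and fB(x1, x2). On a tree, max-product is exact.
domain = [0, 1]
factors = {
    "A": (("x1",), lambda a: [0.3, 0.7][a]),
    "B": (("x1", "x2"), lambda a, b: [[0.9, 0.1], [0.2, 0.8]][a][b]),
}
nbrs = {"x1": ["A", "B"], "x2": ["B"]}       # N_i: factors touching variable i

msg = {}                                      # all messages start at 1
for I, (scope, _) in factors.items():
    for i in scope:
        msg[(i, I)] = [1.0] * len(domain)
        msg[(I, i)] = [1.0] * len(domain)

for _ in range(4):                            # synchronous sweeps
    new = dict(msg)
    for I, (scope, f) in factors.items():
        for i in scope:
            # Eq. (5): variable-to-factor message
            new[(i, I)] = [prod(msg[(J, i)][x] for J in nbrs[i] if J != I)
                           for x in domain]
            # Eq. (7): factor-to-variable message, max-product version
            out = []
            for x in domain:
                others = [j for j in scope if j != i]
                best = 0.0
                for vals in itertools.product(domain, repeat=len(others)):
                    full = dict(zip(others, vals)); full[i] = x
                    v = f(*[full[j] for j in scope])
                    v *= prod(msg[(j, I)][full[j]] for j in others)
                    best = max(best, v)
                out.append(best)
            new[(I, i)] = out
    msg = new

# MPC: argmax of the max-marginal b_i(x_i) = product of incoming messages
mpc = {i: max(domain, key=lambda x: prod(msg[(I, i)][x] for I in nbrs[i]))
       for i in nbrs}
print(mpc)  # the brute-force maximum of fA*fB is at x1 = 1, x2 = 1
```

Brute-force enumeration of fA(x1)·fB(x1, x2) confirms that (1, 1), with value 0.56, is the most probable configuration.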
When applying BP algorithms, three types of parameters need to be de-
fined [15]: message scheduling, stopping criteria, and initial settings. For more
details about parameter setting in BP algorithms the reader is referred else-
where [15,13,9,12].
BP and its variants have also been used to solve satisfiability problems [16,17] and to find
the maximum weight matching in a bipartite graph [18].
Recognizing the potential of BP for Bayesian EDAs, Mendiburu et al. [19]
introduced belief propagation to the estimation of Bayesian networks algo-
rithm (EBNA) [20], which is very similar to BOA. The idea is to combine prob-
abilistic logic sampling (PLS) [21] with loopy belief propagation to sample the
offspring population. Specifically, n − 1 individuals are sampled through PLS
and the remaining individual is instantiated with the most probable configura-
tion for the current Bayesian network. The Bayesian network is mapped into an
equivalent factor graph so that the max-product algorithm can be applied to ob-
tain the new individual. Although the authors concluded that this modification
allowed an improvement in the optimization capabilities of EBNA, the results
fail to demonstrate great improvements in either solution quality or the number of
function evaluations required [19].
While the calculation of the most probable configuration of the Bayesian net-
work at each generation is expected to generate a good solution, its relative
quality is strongly dependent upon the current stage of the search. It seems clear
that high-quality solutions can only be generated by LBP when BOA starts fo-
cusing on more concrete regions of the search space. On the other hand, instead
of performing loopy belief propagation based on the conditional probabilities,
substructural fitness information can be used for the factor nodes. Although
probabilities represent likely substructures, using the associated fitness provides
more direct information when looking for solutions with high quality. That is
what is proposed in the next section.
This section describes a substructural local searcher based on loopy belief prop-
agation that can be incorporated in BOA. The resulting method, named
loopy substructural local search (loopy SLS), uses substructural fitness infor-
mation f̄(Xi, Πi) to guide the max-product algorithm in finding the MPC, which
is the solution that is expected to maximize fitness based on the contribution of
its substructures.
Regarding the parameterization of BP, the maximum number of iterations
that the algorithm is allowed to run is set to 2, while two messages are
considered equal if they differ by less than 10⁻⁶.
These are typical parameter values from the literature. The
update schedule used is the maximum residual updating [22], which calculates
all residuals (difference between updated and current messages) and updates
only the message with the largest residual. Consequently, only the residuals that
depend on the updated message need to be recalculated.
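A minimal sketch of this scheduling idea (message shapes and all names are assumed; for brevity this sketch recomputes every residual each step, whereas the scheme described above caches residuals and refreshes only those that depend on the committed message):

```python
def residual(new, cur):
    # distance between a freshly recomputed message and its current value
    return max(abs(a - b) for a, b in zip(new, cur))

def schedule_step(current, recompute, dependents, eps=1e-6):
    """current: {edge: message}, recompute: edge -> fresh message,
    dependents: edge -> edges whose residuals change after committing edge."""
    fresh = {e: recompute(e) for e in current}
    res = {e: residual(fresh[e], current[e]) for e in current}
    winner = max(res, key=res.get)
    if res[winner] < eps:            # all messages considered equal: converged
        return None
    current[winner] = fresh[winner]  # commit only the largest-residual message
    return winner, dependents[winner]

msgs = {"a": [0.5, 0.5], "b": [0.5, 0.5]}
fresh = {"a": [0.9, 0.1], "b": [0.6, 0.4]}
step = schedule_step(msgs, fresh.__getitem__, {"a": ["b"], "b": ["a"]})
print(step)  # message "a" has the larger residual, so it is committed first
```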
If the factor graph is acyclic, BP will converge towards a unique fixed point
within a finite number of iterations, while the beliefs can be shown to be exact.
However, if the factor graph contains loops, which is the typical situation when
translating a Bayesian network from BOA, the result can only be interpreted as an approximation.
(1) Map the current Bayesian network B into a factor graph F, where factor
nodes store substructural fitness information f¯(Xi , Πi ).
(2) Remove factor nodes (and corresponding edges) whose variable set is a
subset of another factor in F.
(3) Perform loopy belief propagation in F. Return the most probable configu-
ration (MPC) and the number n_t of tied positions.
(4) If n_t = 0, instantiate one individual with the values from the MPC;
Else if 2^{n_t} is at most a given limit, enumerate all 2^{n_t} possible configurations
and instantiate them in 2^{n_t} different individuals;
Else instantiate that many individuals with configurations randomly chosen
out of the 2^{n_t} possibilities.
(5) Evaluate the resulting individuals.
[Figure: panels plotting the number of generations t_c, the population size n, and the number of function evaluations against problem size ℓ ∈ {20, 40, 80, 120} for BOA, BOA w/ loopy SLS, and BOA w/ LBP (EBNA).]
Fig. 3. Results for BOA with both standard LBP and loopy SLS when solving the
trap-5 problem with two overlapping variables (o = 2)
Figure 4 details the performance of BOA with both substructural local searchers
for the trap-5 problem without overlap. Clearly, both SLS versions succeed in
reducing the number of generations required to solve the problem. Consequently,
the total number of function evaluations required is significantly reduced, providing
speedups greater than 10, that is, an order of magnitude fewer evaluations to solve
the same problem. More importantly, the speedup consistently increases with
problem size, approximately as Θ(√ℓ).
Figure 5 presents the results for the trap-5 problem with several degrees of
overlapping (o = {1, 2, 3}). By using loopy substructural local search the savings
in function evaluations are much greater than those obtained by the previous
local searcher. For the simpler SLS, when the degree of overlapping between
different subfunctions increases, the efficiency of performing local search reduces
drastically. These results are not surprising given the nature of the local searcher.
When searching for the best substructure at a given subproblem, the decision-
making does not take into account the corresponding context. Because different
subproblems are solved in a particular sequence, the best subsolution for a sub-
problem considered in isolation might not be the best choice when considering
other subproblems that overlap with the first. While this is not the case for
the overlapping trap-5 problem, because all subproblems have the same global
optimum at 11111, the local searcher can still be deceived.
Consider the following example, where two different trap-5 subproblems overlap
in two variables (X4 and X5), giving a total problem size of ℓ = 8. When
performing local search, the initial solution 00000000 has fitness f = 4 + 4 = 8,
but when considering the best substructure for the first partition, 11111000, the
corresponding total fitness decreases to f = 5 + 2 = 7. While locally the best
substructure is identified, the decrease in the overall fitness means that local search will not accept the
[Figure: two panels plotting the number of generations and the speedup η against problem size ℓ for BOA, BOA w/ SLS, and BOA w/ loopy SLS.]
Fig. 4. Results for BOA with both simple SLS and loopy SLS when solving the
non-overlapping trap-5 problem. The corresponding speedup scales approximately
as Θ(√ℓ).
Loopy Substructural Local Search for the BOA 73
[Figure: two panels plotting speedup η against problem size ℓ for overlap o = 1, 2, 3.]
Fig. 5. Speedup obtained for BOA with the simple SLS [8] and the loopy SLS when
solving the trap-5 problem with overlap of o = {1, 2, 3}
move (see [8] for further details). Even if the order of visit for the neighborhoods
is randomly shuffled each time local search is performed, there is no guarantee
that all possibilities are covered for highly overlapping problems.
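The numbers in the example above can be checked directly; a small script (with the two partitions placed at 0-based indices, an assumption of this sketch) reproduces both fitness evaluations:

```python
def trap5(bits):
    # trap-5 subfunction: global optimum at 11111, deceptive slope elsewhere
    u = sum(bits)
    return 5 if u == 5 else 4 - u

def fitness(x):
    # two trap-5 partitions over bits 0..4 and 3..7 (two shared variables)
    return trap5(x[0:5]) + trap5(x[3:8])

assert fitness([0] * 8) == 8            # 4 + 4
assert fitness([1] * 5 + [0] * 3) == 7  # 5 + 2: the locally best move hurts
```

Setting the first partition to its optimum 11111 leaves only two ones in the second partition, which the deceptive slope scores as 4 − 2 = 2, so the move lowers overall fitness from 8 to 7.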
With loopy SLS the context for each variable is now taken into account. For
1-variable overlap (o = 1), the speedup grows up to 6, behaving very similarly
to the non-overlapping case. For 2-variable overlap (o = 2), the speedup also
increases with the problem size but with a more moderate slope. Finally, for 3-
variable overlap, the speedup grows with an even more moderate slope, while for
larger problem instances the speedup seems to stagnate. Notice that for a trap
subfunction with k = 5 and o = 3, three out of five variables (60%) are shared
with each of the two neighboring subfunctions, and each subfunction overlaps
with another four on at least one variable. This translates into a considerable
amount of noise at the decision-making for each subproblem, when looking for
the best subsolution. Although the effect of overlapping variable interactions is
similar to that of exogenous noise [25], which is known to be extremely hard
for local search [7], the speedups obtained with loopy SLS for problems with
overlap are still substantial for considerable proportions of overlap. Speedups of
6, 3.75, and 2.5 were obtained for proportions of overlap of 20%, 40%, and 60%,
respectively.
References
1. Pelikan, M., Goldberg, D.E., Cantú-Paz, E.: BOA: The Bayesian Optimization
Algorithm. In: Banzhaf, W., et al. (eds.) Proceedings of the Genetic and Evolu-
tionary Computation Conference GECCO 1999, pp. 525–532. Morgan Kaufmann,
San Francisco (1999)
2. Pelikan, M.: Hierarchical Bayesian Optimization Algorithm: Toward a New Gen-
eration of Evolutionary Algorithms. Springer, Heidelberg (2005)
3. Larrañaga, P., Lozano, J.A. (eds.): Estimation of distribution algorithms: a new
tool for Evolutionary Computation. Kluwer Academic Publishers, Boston (2002)
4. Pelikan, M., Goldberg, D.E., Lobo, F.: A survey of optimization by building and
using probabilistic models. Computational Optimization and Applications 21(1),
5–20 (2002)
5. Moscato, P.: On evolution, search, optimization, genetic algorithms and martial
arts: Towards memetic algorithms. Technical Report C3P 826, Caltech Concurrent
Computation Program, California Institute of Technology, Pasadena, CA (1989)
6. Hart, W.E.: Adaptive global optimization with local search. PhD thesis, University
of California, San Diego, CA (1994)
7. Sastry, K., Goldberg, D.E.: Let’s get ready to rumble: Crossover versus mutation
head to head. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3103, pp. 126–137.
Springer, Heidelberg (2004)
8. Lima, C.F., Pelikan, M., Sastry, K., Butz, M., Goldberg, D.E., Lobo, F.G.: Substructural neighborhoods for local search in the Bayesian optimization algorithm.
In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós, J.J., Whit-
ley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 232–241. Springer,
Heidelberg (2006)
1 Introduction
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 76–91, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Running Time Analysis of ACO Systems for Shortest Path Problems 77
Rigorous runtime analyses of ACO were initiated by Gutjahr [7] and by Neumann
and Witt [8] for the optimization of simple pseudo-Boolean functions.
The latter authors presented an algorithm called 1-ANT. This algorithm
memorizes the best solution found so far. In each iteration a new solution is
constructed and the pheromones are updated in case another solution with at
least the same quality is found. In other words, every new best-so-far solution
is rewarded only once. Investigations of the 1-ANT [8,9] have shown that if the
evaporation strength ρ is set too small the algorithm stagnates on even very sim-
ple problems and the expected time until an optimum is found is exponential.
Other algorithms, variants of the MAX-MIN Ant System (MMAS) [10], reinforce
the best-so-far solution in every iteration. This avoids the problem of stagnation
and leads to efficient running times on various test problems [11,12].
Neumann, Sudholt, and Witt [13] investigated the effect of hybridizing ACO
with local search. Regarding combinatorial problems, Neumann and Witt [14]
presented an analysis for minimum spanning trees. Attiratanasunthron and Fak-
charoenphol [15] presented a running time analysis of ACO algorithms on a
shortest path problem, the single-destination shortest path problem (SDSP) on
directed acyclic graphs (DAGs). Their algorithm n-ANT is inspired both by the
1-ANT [8] and the AntNet algorithm [3]. To our knowledge, this is the first and
only rigorous running time analysis for ACO on a shortest path problem. This is
surprising as shortest path problems crucially inspired the development of ACO.
The aim of this work is to bring forward the theory of ACO for shortest path
problems. Shortest paths have already been investigated in the context of other
metaheuristics. Scharnow, Tinnefeld, and Wegener [16] presented an analysis of a
simple evolutionary algorithm, the (1+1) EA, for the single-source shortest path
problem (SSSP). The problems SDSP and SSSP are in essence identical. Their
results were later refined by Doerr, Happ, and Klein [17]. In [18] the latter authors
investigated a genetic algorithm, simply called GA, for the all-pairs shortest path
problem (APSP) and proved that the use of crossover leads to a speed-up com-
pared to mutation-based evolutionary algorithms. Finally, Horoba [19] proved
that an evolutionary multiobjective algorithm represents a fully polynomial-time
approximation scheme for an NP-hard multiobjective shortest path problem.
Table 1 gives an overview on the best known bounds in the single-objective case,
including bounds that will be proven in this paper. We remark that problem-
specific algorithms solve SDSP for graphs with n vertices and m edges in time
O(m + n log n) and APSP in time O(nm + n² log n) [20].
In Section 2 we define an ACO algorithm MMASSDSP for the SDSP that differs
from the n-ANT [15] in two essential ways. Using our modified algorithm we are
able to obtain significantly improved running time bounds (see Table 1 and
Section 3) and to generalize previous results for DAGs to graphs with cycles. A
corresponding lower bound shows that our upper bounds are asymptotically tight
if the evaporation factor ρ is not too small. In Section 4 we transfer these results
to a generalized ant system MMASAPSP for the APSP where ants with different
destinations move independently. The main result concerns a modification of
MMASAPSP where ants temporarily follow foreign pheromone traces. We prove
that, surprisingly, this simple mechanism leads to a significant speed-up.
78 C. Horoba and D. Sudholt
Table 1. Overview of the best known running time bounds on graphs with n vertices,
m edges, maximum degree Δ, maximum number of edges ℓ on any shortest path,
and ℓ∗ := max{ℓ, ln n}. The rightmost column contains the number of path length
evaluations in one iteration. The bound for MMASAPSP with interaction holds for
ρ ≤ 1/(23Δ log n); it simplifies to O(n log³ n) for optimal ρ.
2 Algorithms
We consider shortest path problems on weighted directed graphs G = (V, E, w)
where w(e) denotes the weight of edge e. The number of vertices is always de-
noted by n. We define a path of length ℓ from u to v as a sequence of vertices
(v_0, . . . , v_ℓ) where v_0 = u, v_ℓ = v, and (v_{i−1}, v_i) ∈ E for all i with 1 ≤ i ≤ ℓ.
For convenience, we also refer to the corresponding sequence of edges as a path.
Let deg(u) denote the out-degree of a vertex u and Δ(G) denote the max-
imum out-degree of any vertex u ∈ V. Let ℓ(G, v) := max_u {#edges on p |
p is a shortest path from u to v} and ℓ(G) := max_v ℓ(G, v). For undirected non-
weighted graphs ℓ(G, v) is called the eccentricity of v and ℓ(G) the diameter of G.
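These quantities can be computed directly; the helper below (a hypothetical sketch, not part of the paper) determines ℓ(G, v) for positive weights by running Dijkstra towards v and then taking the longest path, by edge count, in the DAG of tight edges:

```python
import heapq

# Hypothetical helper (not from the paper): l(G, v) is the maximum number of
# edges on any shortest path ending in v, assuming positive edge weights.
def ell(vertices, edges, w, v):
    dist = {u: float("inf") for u in vertices}   # dist[u] = weight of a
    dist[v] = 0                                  # shortest path from u to v
    rev = {u: [] for u in vertices}
    for (a, b) in edges:
        rev[b].append(a)
    heap = [(0, v)]                              # Dijkstra over reversed edges
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for a in rev[x]:
            if d + w[(a, x)] < dist[a]:
                dist[a] = d + w[(a, x)]
                heapq.heappush(heap, (dist[a], a))
    # Tight edges (w(a,b) + dist[b] = dist[a]) form a DAG since dist strictly
    # decreases along them; take the longest path in it by edge count.
    memo = {v: 0}
    def hops(u):
        if u not in memo:
            memo[u] = 1 + max(hops(b) for (a, b) in edges
                              if a == u and w[(a, b)] + dist[b] == dist[u])
        return memo[u]
    return max(hops(u) for u in vertices if dist[u] < float("inf"))

# Triangle where the direct edge (1,3) ties the two-edge path 1 -> 2 -> 3:
V, E = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]
W = {(1, 2): 1, (2, 3): 1, (1, 3): 2}
print(ell(V, E, W, 3))  # 2
```

In the example the shortest path from vertex 1 to 3 has weight 2, attained both by the direct edge and by the two-edge path, so ℓ(G, 3) = 2.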
For the single-destination shortest path problem (SDSP) we are looking for
shortest paths from every vertex to a specified destination vertex. The length
w(p) of a path p is defined as the sum of weights for all edges in p if the path
ends with the destination vertex. If the path does not reach the destination, we
define w(p) := ∞. In the following, we only consider positive weights as with
negative-length cycles one can find arbitrarily short paths and the problem of
computing a shortest simple path is NP-hard [15].
Attiratanasunthron and Fakcharoenphol [15] present the ACO algorithm
n-ANT for the SDSP. Their algorithm is inspired by the 1-ANT [8] and the
AntNet routing algorithm [3]. From every vertex u ∈ V an ant au starts heading
for the destination. The path is chosen by performing a random walk through
the graph according to pheromones on the edges. Ant au memorizes the best
path it has found from u to the destination so far. If it has found a path that
Algorithm 2. MMASSDSP
1: initialize pheromones τ and best-so-far paths p∗1 , . . . , p∗n
2: loop
3: for u = 1 to n do
4: construct a simple path p_u = (p_{u,0}, . . . , p_{u,ℓ_u}) from u to n w. r. t. τ
5: if w(pu ) ≤ w(p∗u ) then p∗u ← pu end if
6: end for
7: update pheromones τ w. r. t. p∗1 , . . . , p∗n
8: end loop
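The loop above can be sketched as follows on a toy instance (the graph, weights, and parameter values are invented; the construction procedure and the MMAS update with pheromone borders follow the standard scheme the text assumes, since their details are not reproduced in this excerpt):

```python
import random

# Toy sketch of Algorithm 2 (MMAS_SDSP) with destination vertex 3.
def construct_path(u, dest, edges, tau):
    path, seen = [u], {u}
    while path[-1] != dest:
        out = [v for v in edges[path[-1]] if v not in seen]
        if not out:
            return None                      # dead end: treated as w(p) = inf
        weights = [tau[(path[-1], v)] for v in out]
        path.append(random.choices(out, weights)[0])  # walk w.r.t. pheromones
        seen.add(path[-1])
    return path

def path_len(p, w):
    return sum(w[e] for e in zip(p, p[1:])) if p else float("inf")

def update(tau, best_paths, rho, tmin, tmax):
    rewarded = {e for p in best_paths if p for e in zip(p, p[1:])}
    for e in tau:
        t = (1 - rho) * tau[e] + (rho if e in rewarded else 0.0)
        tau[e] = min(max(t, tmin), tmax)     # clamp to pheromone borders

w = {(1, 2): 1, (1, 3): 5, (2, 3): 1}
edges = {1: [2, 3], 2: [3], 3: []}
tau = {e: 0.5 for e in w}
best = {1: None, 2: None}                    # best-so-far paths p*_u
random.seed(0)
for _ in range(200):
    for u in best:
        p = construct_path(u, 3, edges, tau)
        if path_len(p, w) <= path_len(best[u], w):
            best[u] = p
    update(tau, best.values(), rho=0.2, tmin=0.1, tmax=0.9)
print(best[1])
```

Because the lower border keeps every pheromone at τmin = 0.1 or above, the ant from vertex 1 retains a constant chance of trying edge (1, 2) even after (1, 3) has been reinforced, so the shortest path 1 → 2 → 3 of length 2 is found quickly.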
time is at least 2^{cn} with probability 1 − 2^{−Ω(n)}. Note that this also holds in case
polynomially many ants search for the destination in parallel in one iteration.
Also using edge weights as heuristic information does not help. Many ACO
algorithms use both pheromones and a heuristic function to guide the solution
construction [4]. However, from a vertex n/2 ≤ u ≤ n − 2 both outgoing edges
have the same weight and the same pheromone, with high probability, hence they
look the same for every ant. This example also shows that heuristic information
may be useless or even misleading for some problem instances.
Lemma 1. If τmin + τmax = 1 then for every vertex u with deg(u) > 1 it always holds that
1 ≤ Σ_{e=(u,·)∈E} τ(e) ≤ 1 + deg(u)·τmin.
Proof. The first inequality has already been proven in [15]. Initially the sum of
pheromones equals 1. Assume for an induction that Σ τ(e) ≥ 1. If the pheromones
are not capped by the pheromone borders, we have (1 − ρ) Σ τ(e) + ρ ≥ 1 as the
new sum. In case a pheromone drops below τmin, setting the pheromone to τmin
can only increase the sum. If at least one pheromone is capped at the upper bor-
der τmax then the sum of pheromones is at least τmin + τmax = 1 as deg(u) > 1.
For the second inequality observe that the sum of pheromones can only in-
crease due to the lower pheromone border as (1 − ρ) Σ τ(e) + ρ ≤ Σ τ(e) follows
from Σ τ(e) ≥ 1. Consider an edge e with (1 − ρ)τ(e) < τmin. Compared to this
value, the pheromone increases by at most τmin · ρ when setting the pheromone
to τmin. If currently Σ τ(e) ≤ 1 + deg(u)·τmin then the sum of the next pheromone
values is at most (1 − ρ)(1 + deg(u)·τmin) + ρ + deg(u)·τmin · ρ = 1 + deg(u)·τmin.
Hence, the second inequality follows by induction.
The lower bound also holds for every other ant leaving vertex u and every edge
e = (u, v) unless v has already been traversed by the ant. The upper bound also
holds for every other ant and every edge e = (u, ·) if it has not traversed a
successor of u before arriving at u.
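Lemma 1 can also be checked numerically; the sketch below simulates the standard MMAS update with borders for a single vertex (rewarding exactly one outgoing edge per iteration is an assumption of this toy setup) and asserts the invariant at every step:

```python
import random

# Numerical check of Lemma 1 (illustrative only, not a proof). For a vertex
# with out-degree d, the sum of outgoing pheromones should stay within
# [1, 1 + d*tau_min] under the MMAS update with borders tau_min + tau_max = 1.
random.seed(1)
d, rho, tmin = 3, 0.3, 0.1
tmax = 1 - tmin
tau = [1.0 / d] * d                          # initial sum equals 1
for _ in range(1000):
    rewarded = random.randrange(d)           # edge used by the best-so-far path
    tau = [(1 - rho) * t + (rho if i == rewarded else 0.0)
           for i, t in enumerate(tau)]
    tau = [min(max(t, tmin), tmax) for t in tau]
    assert 1 - 1e-9 <= sum(tau) <= 1 + d * tmin + 1e-9
print(round(sum(tau), 6))
```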
are enough for the vertex to become processed, hence the expected time until u
is processed is bounded by 2e/τmin + ln(τmax /τmin )/ρ.
Let v1 , . . . , vn−1 be an enumeration of the vertices in V \ {n} ordered with
respect to increasing length of the shortest path to n. As all weights are positive,
all shortest paths from vi+1 to n only use vertices from {n, v1 , . . . , vi }. If v1 , . . . , vi
have been processed then we can wait for vi+1 to become processed using the
above argumentation. The expected time until all vertices v1 , . . . , vn−1 have been
processed is bounded by n · 2e/τmin + n ln(τmax/τmin)/ρ. Choosing τmin := 1/n²
and τmax := 1 − τmin, we obtain the bound O(n³ + (n log n)/ρ). Choosing τmin :=
1/(ℓΔ) and τmax := 1 − τmin yields the bound O(nℓΔ + n log(ℓΔ)/ρ).
Observe that for MMASSDSP , once a shortest path from u has been found,
the pheromones are continuously “frozen” towards shortest paths from u in the
following F = ln(τmax /τmin )/ρ iterations. The algorithm n-ANT from [15], how-
ever, only updates pheromones in case a new best-so-far path is found. This
implies that a shortest path from u has to be found several times, in the worst
case in F different iterations, in order to freeze the pheromones in the same
way. Hence, using the best-so-far rule of MMAS algorithms leads to better per-
formance results. This adds to the comparison of the 1-ANT and MMAS on
pseudo-Boolean problems in [12].
We proceed by improving Theorem 1 in several respects. First, the bound
on the expected optimization time is improved by at least a factor of n/ℓ∗.
Second, the result not only holds for directed acyclic graphs but for all directed
graphs with positive weights and unique shortest paths. Finally, we show that
the running time bounds hold with high probability (i. e. with probability at
least 1 − n^{−c} for some c > 0). In the proof we follow ideas from [17], showing
that the random time until a short path of length ℓ = Ω(log n) is found is highly
concentrated around the expectation¹.
Proof. When estimating the probability that an ant chooses an edge on a short-
est path the lower bound from Corollary 1 always holds. In the proof of Theo-
rem 1 we have shown that for ant au the probability of finding a shortest path
from u to n, given that all successors of u on shortest paths have been processed,
is bounded below by τmin/(2e) if τmin ≤ 1/(ℓΔ). This result also holds in the
case of arbitrary directed graphs.
¹ There is a subtle difference to [17]: in their definition of ℓ the authors only consider
shortest paths with a minimum number of edges (if there are several shortest paths
between two vertices). Both definitions of ℓ are, however, equal if all shortest paths
are unique or have the same number of edges.
Fix a vertex u and the unique shortest path u = v_ℓ, v_{ℓ−1}, . . . , v_0 = n with
ℓ ≤ ℓ(G). We pessimistically estimate the expected time until u is processed. Let T_i
be the random time until v_i is optimized. Consider random variables X_1, . . . , X_T
that are independently set to 1 with probability τmin/(2e) and to 0 otherwise.
The random first point of time T_1∗ where X_t = 1 stochastically dominates the
random time until v_1 is optimized. As v_1 becomes processed after an additional
waiting time of F := ln(τmax/τmin)/ρ steps, T_1∗ + F stochastically dominates T_1.
Inductively, we have that T_ℓ∗ + ℓF stochastically dominates T_ℓ and hence the
time until u is processed.
Let T := 16eℓ∗/τmin and X := Σ_{i=1}^{T} X_i. We have E(X) = T · τmin/(2e) = 8ℓ∗.
By Chernoff bounds [21]
Prob(X < ℓ∗) ≤ Prob(X ≤ (1 − 7/8) · E(X)) ≤ e^{−8ℓ∗·(7/8)²/2} < e^{−3ℓ∗} ≤ n^{−3}.
Hence, the probability that u is not processed after T + ℓ ln(τmax/τmin)/ρ steps
is at most 1/n³. By the union bound, the probability that there is an unprocessed vertex
remaining after this time is at most 1/n². The result on the expectation follows
from the first result, which holds for arbitrary initial pheromones. If the algo-
rithm does not find all shortest paths within the first T + ℓ ln(τmax/τmin)/ρ steps,
we repeat the argumentation with another phase of this length. The expected
number of phases needed is clearly O(1).
Note that this even holds in case the lower pheromone border is hit. Consider
the ant starting at p_0 trying to create the path p_0, . . . , p_ℓ. As the probability of
taking a specific incorrect edge is at least p := 1/(2 deg(u)√ℓ), the probability that
the ant takes a correct edge on the path is at most 1 − (deg(u) − 1) · p =
1 − (deg(u) − 1)/(2 deg(u)√ℓ) ≤ 1 − 1/(4√ℓ). The probability that the path
p_0, . . . , p_ℓ is created in a specific iteration t′ ≤ t is hence bounded by
(1 − 1/(4√ℓ))^ℓ ≤ e^{−√ℓ/4}. The probability that this happens during the first t
iterations is bounded by t · e^{−√ℓ/4} ≤ 1/2 due to the definition of t. Hence with
probability at least 1/2 we have not found all shortest paths after t steps and
the lower bound t/2 = Ω(min{(log ℓ)/ρ, e^{√ℓ/4}}) follows.
Proof. Consider all paths from u to n with u ≤ n − 2. The path (u, n) has
length n. All other paths start with the edge (u, u + 1). The length of the path
only traversing edges with weight 1 is n − u. However, if the path ends with an
edge (v, n) for u < v ≤ n − 2, the path has length v − u + n > n. Hence the path
(u, n) is the unique second best path from u to n.
Call a vertex u ≤ n − 2 wrong if the best-so-far path found by ant au is
(u, n). After initialization both edges have an equal probability of being chosen
by the first ant. By Chernoff bounds, at least n/3 ants a_u with u ≤ n − 2 choose
incorrect edges with probability 1 − e^{−Ω(n)}, and then the edges remain incorrect
until a shortest path has been found. We assume that we initially have n/3 wrong
vertices. First, we show that with high probability after F := ln(τmax/τmin)/ρ
iterations we still have n/3 − O(log² n) wrong vertices. For these vertices u the
pheromones then are frozen towards the incorrect edge.
As long as a vertex u remains wrong, the pheromone on its correct edge is at
most 1/2. (It even decreases continuously towards τmin unless a shortest path
is found.) Fix the set of r := 8 log(1/ρ) wrong vertices with largest index and
let u be the vertex with the smallest index in this set. During a phase comprising
the following t := 1/ρ − 1 steps the probability of choosing the correct outgoing
edge is for each vertex bounded from above by 1 − (1/4)(1 − ρ)^t ≤ 1 − 1/(4e) using
Corollary 1. The probability that a shortest path for u is found throughout the
phase is at most t(1 − 1/(4e))^r ≤ 2^{log(1/ρ)} (1 − 1/(4e))^{8 log(1/ρ)} ≤ 1/2.
We conclude that the time until all r vertices have found shortest paths is
at least t with probability at least 1/2 and the expectation is Ω(t). We may
repeat these arguments with a new phase and another set of r vertices which
are still wrong at that time and have maximal index. Consider 3F/t = Θ(log n)
subsequent phases. Applying Chernoff bounds to random variables indicating
whether a phase has found shortest paths for the considered r vertices within t
iterations, with high probability F/t phases each need at least t iterations. Hence,
with high probability after F steps at most O(log n) · r = O(log² n) wrong vertices
have found shortest paths. It may happen that during a phase some vertices
preceding the r considered vertices find shortest paths by chance. However, the
probability that a vertex v finds a shortest path if the path still contains 3 log n +
log(1/ρ) wrong vertices is at most 2^{−3 log n − log(1/ρ)} ≤ ρ/n³. Taking the union
bound for at most n vertices and F iterations, this does not happen within F
iterations, with high probability. Hence, we correct at most 3 log n + log(1/ρ) =
O(log n) wrong vertices per phase and O(log² n) wrong vertices in total this way.
With high probability we obtain a situation where for n/3 − O(log² n) wrong
vertices pheromones are frozen towards the incorrect edge. We separately prove
lower bounds Ω(n/(ρ log(1/ρ))) and Ω(n²) for the expected remaining optimization time.
The first bound follows from applying the above arguments on phases to the
remaining Ω(n) wrong vertices, along with the fact that the probability of finding
a shortest path containing i wrong vertices has decreased to (τmin)^i ≤ 1/n^i.
Hence, with high probability at most a constant number of wrong vertices is
corrected unexpectedly per phase and the expected time to complete Ω(n/r) =
Ω(n/log(1/ρ)) phases yields the first bound.
For the second bound Ω(n²) we observe that the probability of finding
a shortest path for u in a single iteration, if the path contains at least four wrong
vertices, is at most (τmin)⁴ ≤ 1/n⁴. Hence, with high probability during Ω(n²) iterations it
does not happen that more than 4 wrong vertices are corrected in the same iter-
ation. The expected time until the wrong vertex with largest index is corrected
is 1/τmin ≥ n. If the number of wrong vertices always decreases by at most 4,
the expected time to correct Ω(n) wrong vertices is Ω(n²).
We see that ants heading for different destinations do not collaborate in our ant
system, since ants heading for a destination v consult the pheromone function
τ_v exclusively. Therefore we could also run n instances of MMASSDSP in parallel
to achieve the same result. An obvious question is whether the ants can interact
in some clever way to achieve a better result.
Interestingly, the following very simple mechanism proves useful. Consider the
ant au,v heading for vertex v. Instead of always using the pheromone function τv
to travel to v, with probability, say, 1/2 the ant decides to follow foreign phero-
mones. It first chooses an intermediate destination w uniformly at random, then
uses the pheromone function τw to travel to w, and afterwards uses the pher-
omone function τv to travel to the final destination v (see Algorithm 3). The
pheromone update for ant au,v always applies to the pheromones τv .
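A sketch of this construction (Algorithm 3 itself is not reproduced in this excerpt, so the walk routine and the data layout below, where tau[d] is the pheromone function of destination d, are assumptions):

```python
import random

# Sketch of the interaction mechanism for ant a_{u,v}: with probability 1/2
# the ant routes via a random intermediate destination w, following w's
# pheromones on the first leg and v's pheromones on the second.
def walk(u, dest, edges, tau):
    path, seen = [u], {u}
    while path[-1] != dest:
        out = [x for x in edges[path[-1]] if x not in seen]
        if not out:
            return None
        weights = [tau[dest][(path[-1], x)] for x in out]
        path.append(random.choices(out, weights)[0])
        seen.add(path[-1])
    return path

def construct(u, v, vertices, edges, tau):
    if random.random() < 1 / 2:              # follow foreign pheromones
        w = random.choice(vertices)          # random intermediate destination
        first = walk(u, w, edges, tau)       # first leg guided by tau_w
        if first is None:
            return None
        second = walk(w, v, edges, tau)      # second leg guided by tau_v
        return None if second is None else first + second[1:]
    return walk(u, v, edges, tau)            # plain MMAS_SDSP construction

vertices = [1, 2, 3]
edges = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
tau = {d: {(a, b): 1.0 for a in edges for b in edges[a]} for d in vertices}
random.seed(0)
p = construct(1, 3, vertices, edges, tau)
print(p)
```

The pheromone update for the composed path then applies to τ_v only, as stated above; note that in this sketch the concatenation of the two legs need not be a simple path.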
With this mechanism the ant au,v can profit from useful information laid
down by other ants that headed towards w, in particular if w happens to be a
vertex on a shortest path from u to v. The following theorem gives a significantly
improved bound, without restriction to graphs with unique shortest paths.
Since ρ ≤ 1/(23Δ log n) ≤ 1/(8Δ ln(2n⁴)), the above probability is at most
1/(2n⁴). Because of the union bound, all pairs (u, v) with ℓ_{u,v} = 1 are optimized
within the considered phase with probability at least 1 − f₁ where f₁ := 1/(2n²).
We know that an optimized pair (u, v) is processed within ln(τmax/τmin)/ρ
iterations.
Consider a pair (u, v) and fix a shortest path p_{u,v} from u to v with ℓ_{u,v} edges.
Let i be such that (3/2)^i < ℓ_{u,v} ≤ (3/2)^{i+1}. If all pairs (u′, v′) with ℓ_{u′,v′} ≤ (3/2)^i are
processed, the probability of optimizing (u, v) is at least 1/2 · ℓ_{u,v}/(3n) · 1/e >
(3/2)^i/(6en), since the ant decides with probability 1/2 · ℓ_{u,v}/(3n) to choose an
intermediate destination w on the middle third of p_{u,v}. Hence, the number of edges
of all shortest paths p_{u,w} (p_{w,v}) from u (w) to w (v) is at most (3/2)^i. Since
(x, w) ((x, v)) is processed for all vertices x on a shortest path from u (w) to
w (v), the ant follows a shortest path from u to v with probability at least
(1 − 1/ℓ)^{ℓ−1} ≥ 1/e.
We divide a run of the ant system into phases. The ith phase finishes with
all pairs (u, v) with (3/2)^{i−1} < ℓ_{u,v} ≤ (3/2)^i being processed. Since ℓ_{u,v} ≤ ℓ, we
have to consider α := ⌈log(ℓ)/log(3/2)⌉ phases.
Consider Phase i of length t = 6en/(3/2)^i · ln(2αn⁴). The probability of not
optimizing a pair (u, v) with (3/2)^{i−1} < ℓ_{u,v} ≤ (3/2)^i within the phase is at
most (1 − (3/2)^i/(6en))^t ≤ 1/(2αn⁴). Due to the union bound, all such pairs
(u, v) are optimized within t iterations with probability at least 1 − 1/(2αn²). We
know that an optimized pair (u, v) is processed within ln(τmax/τmin)/ρ iterations.
Using the union bound, all phases are finished within
Σ_{i=1}^{α} ( 6en ln(2αn⁴)/(3/2)^i + ln(ℓΔ)/ρ ) ≤ 6en ln(2αn⁴) Σ_{i=1}^{α} (2/3)^i + α ln(ℓΔ)/ρ
= O(n log n + log(ℓ) log(ℓΔ)/ρ)
iterations.
We remark that the choice of the probability 1/2 for choosing an intermediate
vertex is not essential; using any other constant value 0 < p < 1 would only affect
the constants in Theorem 7. If Δ, ℓ = Ω(n) and ρ = 1/(23Δ log n), the upper
bounds given in Theorem 6 and Theorem 7 simplify to O(n³) and O(n log³ n),
respectively. Hence, the ant system clearly profits from our simple interaction
mechanism and more collaboration between the ants.
5 Conclusions
ACO is motivated by the ability of real ant colonies to find shortest paths to a
food source. Building on an initial study by Attiratanasunthron and Fakchar-
oenphol [15], we have conducted a rigorous analysis of the running time of ACO
algorithms for shortest path problems. Our results (see Table 1) significantly
improve and generalize the previous results for single-destination shortest paths.
Taking the number of function evaluations as performance measure, the bound
for MMASSDSP is better than the bound for the (1+1) EA [17] if Δ = o(n) and
ρ is not too small.
For all-pairs shortest paths, first results have been obtained using MMAS_APSP
as a direct generalization of MMAS_SDSP. We have proved that, surprisingly,
letting ants temporarily follow foreign pheromone traces to random destinations
yields drastically improved results. This is also the first result in combinatorial
optimization where a slow adaptation of the pheromones is crucial, i.e., low values
of the evaporation factor ρ yield the best upper bounds. For an optimal choice
of ρ the bound of O(n^3 √(log^3 n)) function evaluations improves upon the best
known bound for the (1+1) EA [17].
References
1. Dorigo, M., Gambardella, L.M.: Ant colony system: A cooperative learning ap-
proach to the traveling salesman problem. IEEE Transactions on Evolutionary
Computation 1(1), 53–66 (1997)
2. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: An autocatalytic optimizing
process. Technical Report 91-016 Revised, Politecnico di Milano (1991)
3. Di Caro, G., Dorigo, M.: AntNet: Distributed stigmergetic control for communica-
tions networks. Journal of Artificial Intelligence Research 9, 317–365 (1998)
4. Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)
5. Gutjahr, W.J.: A generalized convergence result for the graph-based ant sys-
tem metaheuristic. Probability in the Engineering and Informational Sciences 17,
545–569 (2003)
6. Merkle, D., Middendorf, M.: Modelling the dynamics of Ant Colony Optimization
algorithms. Evolutionary Computation 10(3), 235–262 (2002)
7. Gutjahr, W.J.: First steps to the runtime complexity analysis of ant colony opti-
mization. Computers and Operations Research 35(9), 2711–2727 (2008)
8. Neumann, F., Witt, C.: Runtime analysis of a simple ant colony optimization
algorithm. In: Asano, T. (ed.) ISAAC 2006. LNCS, vol. 4288, pp. 618–627. Springer,
Heidelberg (2006)
9. Doerr, B., Neumann, F., Sudholt, D., Witt, C.: On the runtime analysis of
the 1-ANT ACO algorithm. In: Proc. of GECCO 2007, pp. 33–40. ACM Press,
New York (2007)
10. Stützle, T., Hoos, H.H.: MAX-MIN ant system. Journal of Future Generation Com-
puter Systems 16, 889–914 (2000)
11. Gutjahr, W.J., Sebastiani, G.: Runtime analysis of ant colony optimization with
best-so-far reinforcement. Methodology and Computing in Applied Probability 10,
409–433 (2008)
12. Neumann, F., Sudholt, D., Witt, C.: Analysis of different MMAS ACO algorithms
on unimodal functions and plateaus. Swarm Intelligence 3(1), 35–68 (2009)
13. Neumann, F., Sudholt, D., Witt, C.: Rigorous analyses for the combination of
ant colony optimization and local search. In: Dorigo, M., Birattari, M., Blum,
C., Clerc, M., Stützle, T., Winfield, A.F.T. (eds.) ANTS 2008. LNCS, vol. 5217,
pp. 132–143. Springer, Heidelberg (2008)
14. Neumann, F., Witt, C.: Ant colony optimization and the minimum spanning tree
problem. In: Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) LION 2007 II. LNCS,
vol. 5313, pp. 153–166. Springer, Heidelberg (2008)
Running Time Analysis of ACO Systems for Shortest Path Problems 91
1 Introduction
Optimization problems arise from virtually all areas of science and engineering,
and are often characterized by a large number of variables. The set of admissible
values for such variables is called search space, and can usually be provided with
a rich topological structure, which is determined both by the problem’s intrinsic
structure and by the solving algorithm’s characteristics.
A search space complemented with the topological structure induced by the
local search algorithm (evaluation function and neighborhood relation) is called a
search landscape, and its structure determines, by definition, the behavior of the
solving technique. Search landscape analysis is a research field aimed at providing
tools for the prediction of the search algorithm’s performance and its consequent
improvement. Relevant features in this kind of analysis are, of course, the search
space size and the number of degrees of freedom (i.e., the dimensionality).
In this work, we will focus on Stochastic Local Search (SLS) techniques [1],
where a neighborhood operator is defined in order to map a configuration into
a set of neighboring ones; the relevant topological structure is defined by the
chosen neighborhood operator. An important feature influencing the behavior
of SLS algorithms is the relative position and reachability of local optima with
respect to the neighborhood topology, and some problem instances are known
Work supported by project BIONETS (FP6-027748) funded by the FET program
of the European Commission.
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 92–104, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Techniques and Tools for Local Search Landscape Visualization and Analysis 93
heights represent the size of the corresponding clique, and the horizontal layout
tries to retain the topology of the neighborhood graph.
The lowest vertices represent cliques with the smallest size (2). At every level
the number of vertices depends on the connectivity of the instance graph; in the
instance depicted in Figure 1 it is bounded from above by the binomial coefficient
C(20, k), where k is the size of the cliques at that level and 20 is the number of
nodes in the specific instance.
The landscape depicted in Figure 1 is generated starting from the 43 maximal
cliques enumerated empirically with two state-of-the-art heuristic algorithms for
the MC problem: RLS-MC and DLS-MC. It has to be noted that a complete
enumeration should always be used when possible, because using SLS algorithms
for the empirical enumeration could lead to a bias in the representation. From
every maximal clique, a tree containing all possible solutions within the maximal
one is generated by means of a backtracking algorithm. Solution trees originated
from different maximal cliques can overlap and share consistent parts of the
search space. Therefore, during the enumeration all the solutions are added to a
hash table, and when a solution is encountered twice the corresponding sub-tree
is pruned. Once all the solutions are enumerated, they are connected with arcs if
a local search algorithm could move from one to the other by means of one of the
neighborhood operations (add, drop, or swap). Once the graph is constructed a
spring-based method is used to lay out the graph (with the further constraint
that all vertices lay on the plane corresponding to their objective value). The
nodes are treated as pointwise unit masses subject to pairwise forces of two types.
The first is an attractive spring force based on a smoother version of Hooke’s
law [12]; it accounts for the node adjacency. The force acting on node a due to
node b is
    F^H_ab = k_ab · log( ||b − a||_2 / r_ab ) · (b − a) / ||b − a||_2   if a and b are adjacent,
    F^H_ab = 0                                                          otherwise,          (1)
where a and b are the coordinate vectors of a and b in the low dimensional
representation, rab is the ideal distance and kab is the spring stiffness, which
depends on the desired layout. The second force is of Coulombian type and acts
between every pair of nodes:
    F^C_ab = ( q_a q_b / ||b − a||_2^D ) · (a − b) / ||b − a||_2 ,      (2)
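The two pairwise forces can be sketched in code as follows. This is a minimal 2D sketch of ours: the function names, default parameter values, and tuple-based coordinates are illustrative, not from the paper.

```python
import math

def hooke_force(a, b, k_ab=1.0, r_ab=1.0):
    """Attractive force on node a due to an adjacent node b, as in Eq. (1):
    magnitude k_ab * log(dist / r_ab), directed from a towards b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    dist = math.hypot(dx, dy)
    mag = k_ab * math.log(dist / r_ab)   # log-based variant of Hooke's law [12]
    return (mag * dx / dist, mag * dy / dist)

def coulomb_force(a, b, q_a=1.0, q_b=1.0, D=2.0):
    """Repulsive force on node a due to any node b, as in Eq. (2):
    magnitude q_a * q_b / dist^D, directed from b towards a."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    dist = math.hypot(dx, dy)
    mag = q_a * q_b / dist**D
    return (mag * dx / dist, mag * dy / dist)

# A spring stretched beyond r_ab pulls a towards b; charges push a away from b.
assert hooke_force((0.0, 0.0), (2.0, 0.0))[0] > 0
assert coulomb_force((0.0, 0.0), (2.0, 0.0))[0] < 0
```

At each relaxation step the forces on every node are summed and the node is moved accordingly, with the vertical coordinate pinned to the solution's objective value as described above.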
The snapshots in Figure 1 show the landscape from the side, the front, the top,
and in perspective. The almost flat area corresponding to a plateau bumped
with several local optima is quite evident, as well as the clique of size 7 and the
barrier between the points on the plateau and the maximum clique. This pro-
vides immediate information about the instance properties: because of the low
number of nodes shared by the optimum and the flat area, algorithms with long
plateau phases could be worse than algorithms with shorter plateau phases and
more frequent restart policies. The layout can also embed extra information. For
example, the vertices can be rendered with different colors depending on how
frequently they are visited by the SLS algorithm, in order to analyze the attrac-
tion basins and how they are distributed with respect to the global optimum.
Another example of information that could be easily embedded by means of a
vertex coloring is the average degree distribution of the nodes in the solutions,
which can give an immediate summary of the degree distribution and give some
hints on the performance of greedy local search algorithms.
1. The software is available for research purposes at http://graphvisualizer.org/
98 F. Mascia and M. Brunato
representation, but too many control points can lead to artificial local optima
between the solutions where control point heights are not set by any solution.
The coloring of the NURBS surfaces, as well as of the clusters in the approximated
landscapes, varies from blue to red, showing the quality of the solutions.2
4 Approximated Landscapes
While the search space analysis of small instances is interesting per se, every
solution of size m contains C(m, k) solutions of size k, and the enumeration of all
possible solutions at the lower levels (avoiding repetitions) does not scale with
the solution size. In order to handle larger instances, a number of approximated
layouts can be introduced. The first one considers clusters of subcliques as
a unique object; the second operates by subsampling the solutions obtained by
the SLS algorithm.
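The combinatorial blow-up motivating these approximations is easy to quantify; the following check is illustrative, not from the paper.

```python
import math

# A solution (clique) of size m contains C(m, k) sub-solutions of size k,
# so exhaustive enumeration of the lower levels quickly becomes infeasible.
assert math.comb(20, 10) == 184756
assert math.comb(40, 20) == 137846528820   # doubling m: ~750000x more subsets
```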
With this technique, there is no need to enumerate the exponentially large num-
ber of sub-optimal solutions: knowing the size of the clusters and the fraction of
their volume that overlaps is enough to render an approximated landscape like
the one shown in Figure 2.
The multidimensional scaling is done with the spring-based layout technique
used in Section 3, but this time the vertices to be laid out are the clusters: their
size is reflected in the charges q_a and q_b that determine their repulsive force
in (2), and their overlapping volumes are encoded in the spring elastic constants
k_ab and the zero-energy spring lengths r_ab of (1).
2. Examples of colored surfaces are available at http://graphvisualizer.org/slv
Fig. 3. Same landscape as in Figure 2, but subsampling the search space by removing
solutions with quality less than 3. This highlights the barrier between the plateau and
the optimum.
    ln n! ≈ (n + 0.5) ln n − n + ln(2π)/2
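This Stirling-type approximation for ln n! can be verified numerically; the sketch below is ours.

```python
import math

# ln(n!) vs. (n + 0.5) ln n - n + ln(2*pi)/2; the error is about 1/(12n).
for n in (10, 100, 1000):
    exact = math.lgamma(n + 1)   # lgamma(n+1) == ln(n!)
    approx = (n + 0.5) * math.log(n) - n + math.log(2 * math.pi) / 2
    assert abs(exact - approx) < 1 / (10 * n)
```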
5 Dynamic Landscapes
In this section, we show through an example how the proposed analysis of the
search landscape sheds some light on the dynamics of penalty-based SLS
algorithms, and on the changes of the evaluation function g during the search.
When DLS-MC reaches a local optimum, all the components belonging to
that solution are penalized. The aim of the penalization is to decrease the quality
of the local optimum and render it less attractive in the subsequent steps of the
local search. Nevertheless, the penalization effect is not limited to the local optimum,
but impacts all the areas of the landscape having solutions overlapping with the
penalized one. Therefore, it is of particular interest to study the behavior of the
penalization and its impact on the landscape.
The adopted technique is composed of two steps. First, the three dimensional
landscape corresponding to the objective function f is laid out by means of the
force based multidimensional scaling technique described in Section 3. Then a
landscape corresponding to the evaluation function g for each penalization step
is produced. For the continuity reasons stated in Section 1, the horizontal layout
of the objective function search landscape is retained throughout all penalization
steps, the only thing that varies is the quality of the solutions whose components
are penalized.
Figure 4 shows the penalization effect which transforms the landscape of the
Brockington-Culberson instance of Figure 1. In Figure 4 the first NURBS surface
is produced from the complete representation of the objective function. The
second landscape in figure retains the same horizontal layout, and the plateau is
partly flattened by the penalization effect, which is then partially reverted after
the penalties expire in the third landscape. The fourth landscape shows the
last penalization before DLS-MC is able to find the optimal solution. The effect
is clearer in Figure 5, in which the plateau size reduction is quite evident.
The increased number of levels in the graph after the penalization is due to
the fact that the landscape corresponds to an evaluation function g and not
an objective function. The algorithm whose steps are shown in the figure associates
integer penalties with the solution components belonging to local optima. The
evaluation function g is computed as the cardinality of the solution minus the
penalties associated with its components; therefore the landscape lies on discrete
levels, some of which have a negative quality.
The penalization strategy was effective in finding the well-hidden global optimum,
which does not share solution components with the penalized local optima.
On the contrary, in the instance of the MC problem depicted in Figure 6 the
maximum clique has size 5, and the 30 smaller cliques of size 4 share a node with
the maximum one. Therefore a penalization of a local maximum always impacts
the global one.
Figure 7 shows the results of a DLS-MC run on the instance described
above. The NURBS landscape on the top-left represents the unmodified objective
function, with the global optimum visible in the middle, slightly above the other
optima. The other three landscapes in the figure show the evaluation function after
Fig. 6. A MC instance with 155 nodes. The maximum clique has size 5, and the 30
smaller cliques of size 4 share a node with the maximum one.
Fig. 7. Four penalization steps of DLS-MC on the instance depicted in Figure 6. The
first landscape on the top-left shows the objective function with the global optimum
in the middle.
provided parameters like the repulsion force, the damping factor, the zero-energy
spring lengths, and the spring elastic constants, which have to be appropriate for
the graph structure.
The proposed techniques have been implemented in a Mac OS X application
that allows for real-time manipulation and animation. The program is free for
academic use and can be downloaded from
http://graphvisualizer.org/
References
1. Hoos, H., Stützle, T.: Stochastic Local Search: Foundations and Applications. Mor-
gan Kaufmann, San Francisco (2005)
2. Battiti, R., Protasi, M.: Reactive local search for the maximum clique problem.
Technical Report TR-95-052, ICSI, 1947 Center St.- Suite 600 - Berkeley, California
(September 1995)
3. Battiti, R., Mascia, F.: Reactive and Dynamic Local Search for the Max-Clique
Problem: Engineering effective building blocks. Computers and Operations Re-
search (2009) (in press)
4. Pullan, W., Hoos, H.H.: Dynamic Local Search for the Maximum Clique Problem.
Journal of Artificial Intelligence Research 25, 159–185 (2006)
5. Rafiei, D., Curial, S.: Effectively visualizing large networks through sampling. In:
WWW 2005 Proceedings (2005)
6. Frishman, Y., Tal, A.: Online dynamic graph drawing. IEEE Transactions on Vi-
sualization and Computer Graphics 14(4), 727–740 (2008)
7. Pohlheim, H.: Visualization of evolutionary algorithms: set of standard techniques
and multidimensional visualization. In: Proceedings of the Genetic and Evolution-
ary Computation Conference, vol. 1, pp. 533–540. Morgan Kaufmann, San Fran-
cisco (1999)
8. Anderson, D., Anderson, E., Lesh, N., Marks, J., Perlin, K., Ratajczak, D., Ryall,
K.: Human-guided simple search: combining information visualization and heuristic
search. In: Proceedings of the 1999 workshop on new paradigms in information
visualization and manipulation in conjunction with the eighth ACM international
conference on information and knowledge management, pp. 21–25. ACM, New York
(1999)
9. Köppen, M., Yoshida, K.: Visualization of Pareto-sets in evolutionary multi-
objective optimization. In: 7th International Conference on Hybrid Intelligent Sys-
tems. HIS 2007, pp. 156–161 (September 2007)
10. Halim, S., Yap, R.: Designing and Tuning SLS Through Animation and Graphics:
An Extended Walk-Through. In: Stützle, T., Birattari, M., Hoos, H.H. (eds.) SLS
2007. LNCS, vol. 4638, pp. 16–30. Springer, Heidelberg (2007)
11. Brockington, M., Culberson, J.C.: Camouflaging independent sets in quasi-random
graphs. In: Johnson, D.S., Trick, M.A. (eds.) Cliques, Coloring, and Satisfiability:
Second DIMACS Implementation Challenge, vol. 26, pp. 75–88. American Mathe-
matical Society, Providence (1996)
12. Eades, P.: A heuristic for graph drawing. Congressus Numerantium 42, 149–160
(1984)
High-Performance Local Search for Solving
Real-Life Inventory Routing Problems
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 105–109, 2009.
© Springer-Verlag Berlin Heidelberg 2009
106 T. Benoist et al.
which send orders to the vendor, specifying the desired quantity and the time
window in which the delivery must be done. Some customers can ask for both
types of resupply management: their inventory is replenished by the vendor
using monitoring and forecasting, but they keep the possibility of ordering (to
deal with an unexpected increase in their consumption, for example). The constraints
consisting in maintaining inventory levels above safety levels (no stockouts) and
in satisfying orders (no missed orders) are defined as soft, since the existence of
an admissible solution is not ensured in real-life conditions.
The transportation is performed by vehicles composed of three kinds of
heterogeneous resources: drivers, tractors, and trailers. Each resource is assigned to a
base. A vehicle corresponds to the association of one driver, one tractor and one
trailer. Some triplets of resources are not admissible (due to driving licences, for
example). The availability of each resource is defined through a set of time win-
dows. Each site (plant or customer) is accessible to a subset of resources (special
skills or certifications are required to work on certain sites). Thus, scheduling a
shift consists in defining: a base, a triplet of resources (driver, tractor, trailer),
and a set of operations, each one defined by a triplet (site, date, quantity)
corresponding to the pickups or deliveries performed along the tour. A shift must
start from the base to which the resources composing the vehicle are assigned
and must end by returning to this base. The working and driving times of
drivers are limited; as soon as a maximum duration is reached, the driver must
take a rest of a minimum duration (Department of Transportation rules). In
addition, the duration of a shift cannot exceed a maximal value depending on
the driver. The sites visited along the tour must be accessible to the resources
composing the vehicle. A resource can be used only during one of its availabil-
ity time windows. The date of pickup/delivery must be contained in one of the
opening time windows of the visited site. Finally, the inventory dynamics, which
can be modeled by flow equations, must be respected at each time step, for each
site inventory and each trailer. In particular, the sum of quantities delivered to
a customer (resp. loaded at a plant) minus (resp. plus) the sum of quantities
consumed by this customer (resp. produced by this plant) over a time step must
be lower (resp. greater) than the capacity of its storage (resp. zero). Note that
here the duration of an operation does not depend on the delivered or loaded
quantity; this duration is fixed according to the site where the operation is
performed, the resulting approximation being absorbed by the uncertainty in
the travel times.
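The per-time-step flow constraint described above can be sketched as a simple feasibility check on a customer inventory. This is a sketch of ours under illustrative names and data layout, not the paper's implementation.

```python
def inventory_feasible(initial_level, capacity, delivered, consumed):
    """Check the customer-side flow equations: after each time step, the
    inventory level must stay within [0, capacity] (below 0 = stockout)."""
    level = initial_level
    for d, c in zip(delivered, consumed):
        level = level + d - c      # flow equation for one time step
        if not 0 <= level <= capacity:
            return False
    return True

# A mid-horizon delivery keeps the level in range; no delivery causes a stockout.
assert inventory_feasible(5, 10, [0, 6, 0], [2, 2, 2])
assert not inventory_feasible(5, 10, [0, 0, 0], [2, 2, 2])
```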
In our case, reliable forecasts (for both plants and customers) are known
over a 15-day horizon. Thus, shifts are planned deterministically day after day
with a rolling horizon of 15 days. This means that each day, a distribution plan is
built for the next 15 days, but only the shifts starting on the current day are fixed.
The objective of the planning is to respect the soft constraints described above
over the long run (satisfying orders, maintaining safety levels). In practice, the
situations where these constraints cannot be met are extremely rare, because
missed orders and stockouts are unacceptable for customers (on the other hand,
safety levels must be finely tuned according to customer consumption). Then,
High-Performance Local Search for Solving Real-Life IRP 107
the second objective is to minimize over the long term a logistic ratio defined as
the sum of the costs of shifts (which is composed of different terms related to the
usage of resources) divided by the sum of the quantities delivered to customers.
In other words, this logistic ratio corresponds to the cost per unit of delivered
quantity.
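The logistic ratio is simply the total shift cost divided by the total delivered quantity; as a sketch (the function and variable names are ours):

```python
def logistic_ratio(shift_costs, delivered_quantities):
    """Cost per unit of delivered product, to be minimized over the long term."""
    return sum(shift_costs) / sum(delivered_quantities)

# Two shifts costing 100 and 50 that deliver 30 and 20 units: 3.0 per unit.
assert logistic_ratio([100.0, 50.0], [30.0, 20.0]) == 3.0
```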
Large-scale instances have to be tackled. A geographic area can contain up
to 1500 customers, 50 sources, 50 bases, 100 drivers, 100 tractors, and 100 trailers.
All dates and durations are expressed in minutes (on the whole, the short-term
planning horizon spans 21,600 minutes); the inventory dynamics for plants
and customers are computed with time steps of one hour (because forecasts are
computed with this accuracy). The execution time for computing a short-term
plan is limited to 5 minutes on standard computers.
2 Related Work
Since the seminal work of Bell et al. [1] on a real-life inventory routing problem
encountered at Air Products (a producer and distributor of industrial gases), a
vast literature has emerged on the subject. In particular, a long series of papers
was published by Savelsbergh et al. [2,3,4,5,6], motivated by a real-life problem
encountered at Praxair (another supplier of industrial gases). However, in
many companies, inventory routing is still done by hand or supported by basic
software, with rules like: serve "emergency" customers (that is, customers whose
inventory is about to run out) using as many "full deliveries" as possible (that is,
deliveries with a quantity equal to the trailer capacity or, if not possible, to the
customer tank capacity). For more references, the interested reader is referred
to the recent papers by Savelsbergh and Song [5,6], which give a good survey of
the research done on the IRP over the past 25 years.
To our knowledge, the only papers describing practical solutions for problems
similar to the one addressed here are those by Savelsbergh et al. [3,4,5,6]. The
solution approaches described in these papers are essentially the same: the short-
term planning problem is decomposed and solved in two phases. In the first
phase, it is decided which customers are visited in the next few days, and a target
amount of product to be delivered to these customers is set. In the second phase,
vehicle routes are determined taking into account vehicle capacities, customer
delivery windows, driver restrictions, etc. The first phase is solved heuristically
by integer programming techniques, whereas the second phase is solved with
specific insertion heuristics [7]. The experiments reported in the different works
on the subject [1,3,4,5,6] show savings up to 10 % over the long run (with
computation times of several minutes), compared to solutions obtained by a
greedy algorithm based on the rules of thumb commonly used in practice (like
the one cited above).
3 Contribution
To our knowledge, no pure and direct local-search algorithm has been
proposed for solving the IRP. A local-search approach is described by Lau et al. [8]
for solving an inventory routing problem with time windows, but their approach
is based on a decomposition scheme (distribution and then routing). In this
paper, an original local-search heuristic is described for solving the short-term
planning problem. We insist on the fact that no decomposition is done in our
approach: the short-term planning is optimized directly over the 15-day horizon.
This algorithm has been designed and engineered following the methodology
described by Estellon et al. [9] in a companion paper. A computational study
demonstrates that our solution is effective, efficient, and robust, providing
long-term savings exceeding 20% on average, compared to solutions computed
by expert planners or even a classical greedy algorithm.
Following the methodology exposed in [9], our local-search heuristic is designed
according to three layers. The first layer corresponds to the search strategy; here a
first-improvement descent heuristic with stochastic selection of transformations is
employed (an initial solution is computed using an urgency-based insertion
heuristic). The second layer corresponds to the pool of transformations which defines
the neighborhood; here more than one hundred transformations are defined in
total, which can be grouped into a dozen types (for operations: insertion,
deletion, ejection, move, swap; for shifts: insertion, deletion, rolling, move, swap).
Finally, the third layer, corresponding to the “engine” of the local search, consists
of three main procedures common to all transformations: evaluate (which eval-
uates the gain provided by the transformation applied to the current solution),
commit (which validates the transformation by updating the current solution and
the associated data structures), rollback (which clears all the data structures used
to evaluate the transformation). Since the duration of an operation does not
depend on the quantity loaded or delivered, the evaluation procedure is separated
into two routines: first the scheduling of shifts and then the assignment of
volumes. These routines, whose running time is critical for performance, rely on
incremental algorithms supported by special data structures that exploit
invariants of the transformations.
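The three-layer design can be sketched as follows. All names here are ours, and a toy transformation stands in for the hundred-odd real ones; this is an illustration of the evaluate/commit/rollback protocol, not the paper's code.

```python
import random

def descent(solution, transformations, max_iters, rng):
    """First-improvement descent with stochastic transformation selection.
    Each trial goes through evaluate, then commit (on improvement) or rollback."""
    for _ in range(max_iters):
        t = rng.choice(transformations)
        gain = t.evaluate(solution)   # tentative evaluation on scratch structures
        if gain > 0:
            t.commit(solution)        # validate: update solution and data structures
        else:
            t.rollback(solution)      # clear the scratch structures
    return solution

class FlipToOne:
    """Toy transformation: maximize the number of ones in a bit list."""
    def evaluate(self, sol):
        self.i = random.randrange(len(sol))
        return 1 if sol[self.i] == 0 else -1
    def commit(self, sol):
        sol[self.i] = 1
    def rollback(self, sol):
        pass   # nothing was modified during evaluate

random.seed(0)
best = descent([0] * 8, [FlipToOne()], max_iters=500, rng=random)
assert best == [1] * 8
```

In the real system the scratch structures are incremental (shift schedules, volume assignments), which is what makes tens of thousands of trials per second possible.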
The whole algorithm was implemented in the C# 2.0 programming language
(running on the Microsoft .NET 2.0 framework). The resulting program includes
nearly 30,000 lines of code, of which 6,000 lines (20%) are dedicated to checking
the validity of all incremental data structures at each iteration (only active in
debug mode). The whole project (specifications, research, implementation, tests),
carried out during the year 2008, required nearly 300 man-days. All statistics and
results presented here have been obtained on a computer equipped with the
Windows Vista operating system and a 64-bit Intel Xeon X5365 processor (3 GHz
CPU, 64 KiB L1 cache, 4 MiB L2 cache, 8 GB RAM). The local-search algorithm
attempts more than 10,000 transformations per second, even for large-scale
instances (a thousand sites and a hundred resources). Thus, our algorithm visits
nearly 10 million solutions in the search space during the 5 minutes of running time
(which is the desired time limit in operational conditions). When planning over a
15-day horizon, the memory allocated by the program does not exceed 30 MB for
medium-size instances (a hundred sites, ten resources), and 300 MB for large-scale
instances (a thousand sites, a hundred resources). Note that the running time of the
References
1. Bell, W., Dalberto, L., Fisher, M., Greenfield, A., Jaikumar, R., Kedia, P., Mack,
R., Prutzman, P.: Improving the distribution of industrial gases with an on-line
computerized routing and scheduling optimizer. Interfaces 13(6), 4–23 (1983)
2. Campbell, A., Clarke, L., Kleywegt, A., Savelsbergh, M.: The inventory routing
problem. In: Crainic, T., Laporte, G. (eds.) Fleet Management and Logistics, pp.
95–113. Kluwer Academic Publishers, Norwell (1998)
3. Campbell, A., Clarke, L., Savelsbergh, M.: Inventory routing in practice. In: Toth,
P., Viego, D. (eds.) The Vehicle Routing Problem. SIAM Monographs on Discrete
Mathematics and Applications, vol. 9, pp. 309–330. Kluwer Academic Publishers,
Philadelphia (2002)
4. Campbell, A., Savelsbergh, M.: A decomposition approach for the inventory-routing
problem. Transportation Science 38(4), 488–502 (2004)
5. Savelsbergh, M., Song, J.-H.: Inventory routing with continuous moves. Computers
and Operations Research 34(6), 1744–1763 (2007)
6. Savelsbergh, M., Song, J.-H.: An optimization algorithm for the inventory rout-
ing with continuous moves. Computers and Operations Research 35(7), 2266–2282
(2008)
7. Campbell, A., Savelsbergh, M.: Efficient insertion heuristics for vehicle routing and
scheduling problems. Transportation Science 38(3), 369–378 (2004)
8. Lau, H., Liu, Q., Ono, H.: Integrating local search and network flow to solve the
inventory routing problem. In: Proceedings of AAAI 2002, the 18th National Con-
ference on Artificial Intelligence, pp. 9–14. AAAI Press, Menlo Park (2002)
9. Estellon, B., Gardi, F., Nouioua, K.: High-performance local search for task schedul-
ing with human resource allocation. In: Stützle, T., Birattari, M., Hoos, H.H. (eds.)
SLS 2009. LNCS, vol. 5752, pp. 1–15. Springer, Heidelberg (2009)
A Detailed Analysis of Two Metaheuristics for
the Team Orienteering Problem
1 Introduction
In the Orienteering Problem (OP) a set of n locations i is given, each with a
score S_i and a service or visiting time T_i. The start location (1) and the
end location (n) are fixed. The time t_ij needed to travel from location i to j is
known for all pairs of locations. Not all locations can be visited, since the available
time is limited to a given time budget Tmax. The goal of the OP is to determine a
single route, limited by Tmax, that visits some of the locations and maximises
the total collected score. Each location can be visited at most once. The Team
Orienteering Problem (TOP) is an OP in which the goal is to determine m routes,
each limited to Tmax, that together maximise the total collected score.
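A minimal check of a single route against the time budget can be sketched as follows; the data layout and function name are ours, and locations are indexed from 0 here (the paper numbers them 1..n).

```python
def route_score(route, t, T, S, Tmax):
    """Return the collected score of a route, or None if it exceeds Tmax.
    route: sequence of location indices; t: travel-time matrix;
    T: service times; S: scores."""
    total_time = sum(T[i] for i in route)
    total_time += sum(t[i][j] for i, j in zip(route, route[1:]))
    if total_time > Tmax:
        return None
    return sum(S[i] for i in route)

# Visiting all three locations takes 1 (service) + 2 (travel) = 3 time units
# for a score of 5; with Tmax = 2 the route is infeasible.
t = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
T = [0, 1, 0]
S = [0, 5, 0]
assert route_score([0, 1, 2], t, T, S, Tmax=3) == 5
assert route_score([0, 1, 2], t, T, S, Tmax=2) is None
```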
More details about the TOP and its applications can be found in the
literature [1,2,3,4,5,6,7]. Many different algorithms have been designed for the
TOP [1,3,4,5,6,7], but until now, they were only compared based on the qual-
ity of the results and the required computational effort. A thorough analysis of
why certain algorithms perform well (or not) is missing. Little or no attention
has been given to gathering real insight into the problem in order to "optimise"
the design of (new) algorithms. This paper introduces techniques to analyse the
performance of (certain components of) metaheuristics. The techniques are applied
to the SVNS algorithm of Vansteenwegen et al. [5] and the PR algorithm of
Souffriau et al. [7].
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 110–114, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Detailed Analysis of Two Metaheuristics for the TOP 111
TOP [1,3,4,5,6]. We will only focus on the PR and SVNS algorithms described
above and analyse their performance in detail. Seven local search moves are used
by these algorithms. Five moves increase the total score of the solution: Insert,
TwoInsert, Replace, TwoReplace and Change; two moves reduce the travel
time between the selected locations: TwoOpt and Swap. All moves are described
in detail in [5] and [7]. The 158 relevant instances described in [7] are used for
this analysis.
The importance of a move is illustrated by its ”contribution per application”,
its ”isolated contribution” and its ”additional contribution”. The ”contribution
per application” (CPA) is defined as the average score increase (or travel time
decrease) over all the times the move is applied. CPA is given as a percentage of
the total score (or travel time) of the final solution. The ”isolated contribution”
(IC) is defined as the decrease in score (in percentage) when only this move is
implemented and no other moves. The ”additional contribution” (AC) is defined
as the decrease in score (in percentage) when this move is not implemented.
The AC can be considered as the ”added value” of adding this move to the
implemented set of moves. All percentages mentioned in this section are average
percentages over all 158 instances. An important move will have a high CPA
and a low IC and AC. For each instance, the best result obtained by SVNS or
PR will be used as a benchmark.
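As an illustration, the three indicators could be computed from experiment logs along the following lines; the data layout and helper names are hypothetical, not taken from the paper:

```python
def cpa(score_increases, final_score):
    """Contribution per application: average score increase per application
    of the move, as a percentage of the final solution score."""
    if not score_increases:
        return 0.0
    return 100.0 * (sum(score_increases) / len(score_increases)) / final_score

def ic(score_with_only_move, benchmark_score):
    """Isolated contribution: percentage score decrease when only this
    move (plus no others) is implemented, relative to the benchmark."""
    return 100.0 * (benchmark_score - score_with_only_move) / benchmark_score

def ac(score_without_move, benchmark_score):
    """Additional contribution: percentage score decrease when this
    move is left out of the full move set."""
    return 100.0 * (benchmark_score - score_without_move) / benchmark_score
```

The reported figures would then be averages of these values over the 158 instances.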
is the next best move, based on IC, is not a surprise. Insert adds locations to
all routes and the diversification procedures remove locations from the routes or
move locations from one route to another. This combination allows an intensive
exploration of the whole solution space. An important aspect that is still missing,
resulting in a gap of 26.5%, is a travel time reduction move. No
IC results are presented for TwoOpt or Swap, since implementing these moves
without any score increasing move is irrelevant.
The percentages in the AC row are much smaller, but nevertheless very mean-
ingful. These results confirm the previous results about the effectiveness of dif-
ferent moves. A rather surprising conclusion is that Insert and Replace are more
important to reduce the computational effort than to increase the final quality of
the results. This statement is only correct when enough alternative local search
moves are implemented that increase the score and decrease the travel time.
Another interesting analysis result for SVNS (not in the table) is that, without
the diversification procedures, the results are 7.3% worse on average.
GRASP with Path Relinking: Table 2 summarises the importance of each local
search move in the PR algorithm, in the same way as Table 1. Insert is always
applied during the initialisation and during the path relinking. This implies that
the "Isolated Contribution" of the other moves is not really isolated, but always
in combination with Insert.
Based on the CPA, the same conclusions as with SVNS can be made: Insert is
the most important score increasing move and TwoOpt is the most contributing
travel time decreasing move. The IC results for Replace (including Insert) illustrate
that the travel time decreasing moves together improve the results by
(only) 0.3%. The fact that PR considers many alternative initial solutions
(diversification strategy) probably reduces the need for travel time decreasing moves,
compared to the SVNS algorithm. The most significant result in this analysis,
however, is that it would be better to leave out Swap. The quality would remain
the same, and the computational effort would be reduced to 93.6%.
For both SVNS and PR, Insert and TwoOpt are required to obtain high quality
results with small computational times. Furthermore, Replace appears to be an
efficient move and both algorithms require a good diversification strategy.
It would be interesting to know the contribution of the local search moves
in the ant colony algorithm [4] and the tabu search and VNS algorithms [3],
in order to draw more general conclusions about useful local search moves for the TOP.
3 Parameter Settings
Next to selecting appropriate moves and the best sequence to apply them, an-
other important design decision is the parameter setting. In almost all papers,
not only about TOP algorithms, parameter settings are based on experimen-
tal results or preliminary testing. The parameters of the best performing TOP
algorithms [1,3,4,5,6,7] are all determined in this way. The "sensitivity" of the
algorithm to a particular parameter setting is almost never discussed. Sensitivity
can be defined as the extent to which small or larger changes in parameter settings
influence the quality and the computational effort of the algorithm.
SVNS: For SVNS, the only important parameters are the maximum number
of iterations without improvement (NoImproveMax = 40) and the maximum
number of locations to remove in each route (KMax = 25). In order to verify
the influence of these parameters, the test instances are also solved with different
combinations of higher and lower values: NoImproveMax = 20 and 80 and
KMax = 12 and 50. Increasing NoImproveMax increases the quality of the
results, but the calculation time also increases significantly. This clearly illustrates
the trade-off between the required calculation time and the quality of the
results. Furthermore, the algorithm is not at all sensitive to changes in KMax;
the influence on the result quality and the computational time is insignificant.
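A sensitivity check of this kind amounts to a small parameter sweep. The sketch below assumes a hypothetical `solve(instance, no_improve_max, k_max)` interface returning a (score, seconds) pair; it is an illustration, not the paper's implementation:

```python
import itertools

def sensitivity_sweep(solve, instances,
                      no_improve_max_values=(20, 40, 80),
                      k_max_values=(12, 25, 50)):
    """Solve every instance under each parameter combination and collect
    the average quality and the average run time, so the quality/effort
    trade-off can be inspected per setting."""
    results = {}
    for nim, km in itertools.product(no_improve_max_values, k_max_values):
        scores, times = zip(*(solve(inst, nim, km) for inst in instances))
        results[(nim, km)] = (sum(scores) / len(scores),
                              sum(times) / len(times))
    return results
```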
PR: The most important parameter in the PR algorithm is the maximum num-
ber of iterations. Based on this parameter slow and fast variants of the algorithm
can be constructed (more details about this parameter can be found in [7]). In
this paper only a slow variant is considered, with a maximum of 100 iterations
without improvement. Other important PR parameters are the size of the elite
pool (EliteMax = 5), the Greediness (0.5) of the initialisation method and the
SimilarityThreshold (0.9) [7]. In order to verify the influence of these
parameters, the test instances are also solved for other values of these parameters:
EliteMax = 3 and 10, Greediness = 0.3 and 0.8 and SimilarityThreshold = 0.5.
Changing the number of elite solutions has a small influence on the quality of
the results, but a significant influence on the computational effort. The greediness
value does not influence the result quality; however, a significant increase in
computational effort is required when the greediness is increased or decreased.
Further analysis should indicate whether 0.5 can be considered an optimal value and
an "ideal" mix of greediness and randomness. Decreasing the similarity threshold
significantly decreases the quality of the results, but also
the computational effort. Again, a trade-off should be made.
P. Vansteenwegen, W. Souffriau, and D. Van Oudheusden

4 Conclusions
By analysing the "contribution" of individual local search moves and different
variants of the solution algorithm, insight is gained into the implemented
metaheuristics. These insights can help to optimise the design of the algorithm
under study, of other algorithms, or of future algorithms. Furthermore, in many
papers, parameter settings are (only) based on experimental results or preliminary
testing. The "sensitivity" of the algorithm to a particular parameter
setting is almost never discussed. Nevertheless, this can give important information about
the performance of an algorithm and the appropriateness of the trade-offs that
were made during the implementation of the algorithm. In this paper, the pa-
rameter sensitivity and the importance of different moves are analysed for two
different metaheuristics implemented to solve the team orienteering problem. In
this way, important insights in these algorithms are gained.
In order to increase the statistical significance of the observed results, stan-
dard deviations should be taken into account and appropriate statistical tests
should be used. "Multiple linear regression" analysis would allow a more thorough
analysis of the test results in this paper. Multiple linear regression is a
statistical technique to determine the contribution or significance of different
parameters (local search moves or parameter settings) in obtaining certain re-
sults (high quality results or low computational times).
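As a sketch of how such a regression could be set up, the following fits ordinary least squares coefficients to runs encoded as indicator variables (pure-Python normal equations; the data layout is an illustrative assumption, not from the paper):

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gaussian elimination. Each row of X is one run: a leading
    1 for the intercept, then 0/1 indicators for the moves or parameter
    settings used; y holds the observed result quality."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, p):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, p):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(xtx[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (xty[r] - s) / xtx[r][r]
    return beta
```

Each fitted coefficient then estimates the contribution of one move or setting to the result quality.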
References
1. Tang, H., Miller-Hooks, E.: A tabu search heuristic for the team orienteering prob-
lem. Computers & Operations Research 32, 1379–1407 (2005)
2. Vansteenwegen, P., Van Oudheusden, D.: The mobile tourist guide: an OR oppor-
tunity. OR Insight 20(3), 21–27 (2007)
3. Archetti, C., Hertz, A., Speranza, M.: Metaheuristics for the team orienteering prob-
lem. Journal of Heuristics 13, 49–76 (2007)
4. Ke, L., Archetti, C., Feng, Z.: Ants can solve the team orienteering problem. Com-
puters & Industrial Engineering 54, 648–665 (2008)
5. Vansteenwegen, P., Souffriau, W., Vanden Berghe, G., Van Oudheusden, D.: Meta-
heuristics for tourist trip planning. In: Geiger, M., Habenicht, W., Sevaux, M.,
Sörensen, K. (eds.) Metaheuristics in the Service Industry. Lecture Notes in Eco-
nomics and Mathematical Systems, vol. 624, pp. 15–31. Springer, Berlin (2009)
6. Vansteenwegen, P., Souffriau, W., Vanden Berghe, G., Van Oudheusden, D.: A
guided local search metaheuristic for the team orienteering problem. European Jour-
nal of Operational Research 196(1), 118–127 (2009)
7. Souffriau, W., Vansteenwegen, P., Vanden Berghe, G., Van Oudheusden, D.: A path
relinking approach for the team orienteering problem. Computers & Operations
Research (2009), doi:10.1016/j.cor.2009.05.002
On the Explorative Behavior of
MAX–MIN Ant System
1 Introduction
2 Exploration: A Definition
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 115–119, 2009.
116 D. Favaretto, E. Moretti, and P. Pellegrini
3 Experimental Analysis
In this section, the impact of the values of the parameters on the exploration
performed by MAX–MIN Ant System is analyzed. The stopping criterion
considered in this analysis is the completion of 20000 objective function
evaluations. At this early stage, no local search procedure is applied. One hundred
TSP instances with 50 nodes are used. The set of experiments described
below has also been performed on one instance with 100 nodes and one with 200
nodes; the results can be downloaded from the web page http://www.paola.pellegrini.it
and they appear qualitatively equivalent. On the same web site, the code used for
computing the exploration and the instances used are available. The distance
measure considered is the Euclidean distance. A very short experimental analysis
has also been performed with two other distance measures; the results appear
substantially analogous to the ones reported in this section. Following the
computation described above and setting x = 10, the value of the corresponding
parameter is 3.16.
MAX–MIN Ant System is run with varying values of the parameters. According
to the literature [9], these values have an impact on the exploration. The following
analysis aims at observing this difference of exploration during the evolution
of a run. The solutions are considered iteration by iteration, i.e., in groups of 50
elements. In the literature, the problem of measuring the exploration performed
by ACO algorithms has only briefly been tackled [3]. The measure that is generally
accepted as a measure of stagnation is the average λ-branching factor. The
conclusions drawn with the cluster analysis are considered in the light of the results
obtained by studying the trend of the average λ-branching factor computed over
the same horizon.
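For reference, the average λ-branching factor can be computed from a pheromone matrix as follows; this sketch follows the standard definition from the ACO literature [3], with λ as a parameter:

```python
def avg_lambda_branching(pheromone, lam=0.05):
    """Average lambda-branching factor of a pheromone matrix: for each
    node i, count the incident edges whose pheromone value reaches
    tau_min_i + lam * (tau_max_i - tau_min_i), then average over nodes.
    Low values indicate stagnation of the search."""
    n = len(pheromone)
    total = 0
    for i in range(n):
        row = [pheromone[i][j] for j in range(n) if j != i]
        lo, hi = min(row), max(row)
        threshold = lo + lam * (hi - lo)
        total += sum(1 for t in row if t >= threshold)
    return total / n
```

A uniform matrix yields n − 1, while a fully stagnated trail with one dominant edge per node yields a factor close to 1.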
[Figure: exploration measured iteration by iteration for alpha = 3 vs. 4, beta = 3 vs. 4, and rho in {0.02, 0.05, 0.2, 0.7}, together with a plot of the number of nodes.]
solutions
are considered here in a TSP-wise sense: only the permutations matter, while
probabilities are completely neglected. The number of different edges between
one solution and each other is reported. For each pair (x̂, ŷ), the size of the bullet
used is proportional to the number of solutions differing from solution x̂ by ŷ
edges. It is evident that, despite the very concentrated pheromone trails, the
solutions visited are far from being always the same. In this sense, although
the average λ-branching factor is a fundamental measure for the effectiveness
of MAX–MIN Ant System, the assessment of the number of clusters represents
much more accurately the actual behavior of the procedure.
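The edge-difference distance used for the bullet sizes can be sketched as follows (undirected tour edges, as is usual for the symmetric TSP):

```python
def tour_edges(tour):
    """Set of undirected edges of a closed TSP tour given as a permutation."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def edge_difference(tour_a, tour_b):
    """Number of edges of tour_a that do not appear in tour_b, i.e. the
    distance between two solutions in the TSP-wise sense used above."""
    return len(tour_edges(tour_a) - tour_edges(tour_b))
```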
4 Conclusions
This paper deals with the observation of the behavior of stochastic procedures,
in terms of the explorative attitude of the algorithms. A definition of exploration
independent of the algorithm is presented, and a consequent measurement
method is provided. It is applied to MAX–MIN Ant System: the impact of the
values of the parameters on the exploration is assessed. The conclusions drawn
in this sense are related to the indications provided by the average
λ-branching factor. In these first experiments, only one parameter at a time
is varied, and no local search is applied. Both these limitations will be overcome
in future research. In particular, it is expected that the interaction between
parameters has an impact on the exploration performed.
References
1. Bhattacharya, M.: A synergistic approach for evolutionary optimization. In:
GECCO 2008: Proceedings of the 2008 GECCO conference companion on Genetic
and evolutionary computation, pp. 2105–2110. ACM, New York (2008)
2. Devarenne, I., Mabed, H., Caminada, A.: Intelligent neighborhood exploration in
local search heuristics. In: ICTAI 2006: Proceedings of the 18th IEEE International
Conference on Tools with Artificial Intelligence, pp. 144–150. IEEE Computer So-
ciety, Los Alamitos (2006)
3. Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)
4. Everitt, B., Landau, S., Leese, M.: Cluster Analysis. Arnold, London (2001)
5. Kaufman, L., Rousseeuw, P.J.: Finding groups in data. An introduction to cluster
analysis. Wiley Series in Probability and Mathematical Statistics. Applied Proba-
bility and Statistics. Wiley, New York (1990)
6. Jardine, N., Sibson, R.: The construction of hierarchic and non-hierarchic classifi-
cations. The Computer Journal 11(2), 177–184 (1968)
7. Orosz, J.E., Jacobson, S.H.: Finite-time performance analysis of static simulated
annealing algorithms. Computational Optimization and Applications 21(1), 21–53
(2002)
8. Pellegrini, P., Ellero, A.: The small world of pheromone trails. In: Dorigo, M.,
Birattari, M., Blum, C., Clerc, M., Stützle, T., Winfield, A.F.T. (eds.) ANTS 2008.
LNCS, vol. 5217, pp. 387–394. Springer, Heidelberg (2008)
9. Pellegrini, P., Favaretto, D., Moretti, E.: On MAX–MIN ant system’s parameters.
In: Dorigo, M., Gambardella, L.M., Birattari, M., Martinoli, A., Poli, R., Stützle, T.
(eds.) ANTS 2006. LNCS, vol. 4150, pp. 203–214. Springer, Heidelberg (2006)
A Study on Dominance-Based Local Search
Approaches for Multiobjective Combinatorial
Optimization
1 Introduction
The aim of this study is to provide a unified view of dominance-based local
search for multiobjective optimization. Contrary to the single-objective case, a
Multiobjective Combinatorial Optimization Problem (MCOP) does not yield
a unique optimal solution. Instead, a set of compromise solutions, known as
efficient solutions, must generally be identified. Since they are naturally well-
suited to find multiple efficient solutions in a single simulation run, a tremendous
number of multiobjective evolutionary algorithms have been proposed over the
last two decades [1]. However, local search methods are known to be efficient
metaheuristics for single-objective optimization. Local search, also referred to
as hill-climbing, descent, iterative improvement, etc., is likely the oldest and
simplest metaheuristic [2]. But multiobjective local search principles based on
a dominance relation appeared quite recently [1,3]. Hence, some dominance-
based multiobjective local search methods have been proposed in the literature,
including the Pareto Archived Evolution Strategy (PAES) [4], the Pareto Local
Search (PLS) [5] or the Bicriteria Local Search (BLS) [6]. Such methods generally
combine the definition of a neighborhood structure with the use of a population
of solutions. They maintain a set of potentially efficient solutions, and iteratively
improve this set by exploring part of its neighborhood. Our first purpose is to
give a unified view of dominance-based multiobjective local search. We describe
the basic components shared by all these algorithms and we introduce a general-
purpose model for their design. Afterwards, we concentrate on a subpart of
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 120–124, 2009.
components involved in the unified model in order to study their respective
behavior on a multiobjective permutation flowshop scheduling problem.
Until now, each DMLS algorithm was designed independently of the others, and
was implemented as a self-contained method with its own specific components.
In the following, we identify the common components shared by all DMLS al-
gorithms and propose a unifying model that takes them into account. Hence,
whatever the MCOP to be solved, the common concepts for the design of a
DMLS algorithm can be stated as follows: (1) design a representation, (2) design
an initialization strategy, (3) design a way of evaluating a solution, (4) design
a suitable neighborhood structure, (5) design a way of evaluating a neighboring
solution incrementally (if possible), (6) decide a current set selection strategy,
(7) decide a neighborhood exploration strategy, (8) decide an archive manage-
ment strategy, (9) decide a stopping condition. When dealing with any kind of
metaheuristics, one may distinguish problem-related and problem-independent
components. Hence, the first five issues presented above strongly depend on the
MCOP under consideration, whereas the last four can be seen as generic
components. In addition, three data structures are used to store (i) the archive
contents, (ii) the current set of solutions whose neighborhood is to be explored,
and (iii) the candidate set of neighbor solutions that will potentially enter the
archive. Problem-related components are assumed to be designed for the MCOP
at hand, so that they are not discussed in the paper due to space limitation.
Current set selection. The first phase of a local search step deals with the
selection of a set of solutions from which the neighborhood will be explored.
Generally speaking, in the frame of the DMLS model presented in the paper,
two strategies can be applied. First, an exhaustive selection, where all solutions
from the archive are selected. Second, a partial selection, where only a subset of
solutions is selected. Such a set may be selected at random, or with respect
to a diversity measure. Of course, if some archive members are marked as visited,
they must be discarded from the current set selection for obvious efficiency reasons.
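The two selection strategies, together with the exploration strategies discussed next, can be combined in a single generic step. The sketch below uses hypothetical `neighbors` and `dominates` callables and represents solutions as dictionaries; it illustrates the model, not the paper's implementation:

```python
import random

def dmls_step(archive, neighbors, dominates, select_all=True, explore_all=True):
    """One step of a generic dominance-based multiobjective local search.
    select_all toggles exhaustive vs. random-single current set selection;
    explore_all toggles full vs. single-neighbor exploration, matching the
    four DMLS variants discussed in the text."""
    unvisited = [s for s in archive if not s.get("visited")]
    if not unvisited:
        return archive, True              # natural stopping condition
    current = unvisited if select_all else [random.choice(unvisited)]
    candidates = []
    for sol in current:
        nbs = list(neighbors(sol))
        candidates.extend(nbs if explore_all else [random.choice(nbs)])
        if explore_all:
            sol["visited"] = True         # fully explored, mark as visited
    for cand in candidates:
        if not any(dominates(a, cand) for a in archive):
            archive = [a for a in archive if not dominates(cand, a)]
            archive.append(cand)
    return archive, False
```

Only the variants with full neighborhood exploration can exhaust the unvisited set and therefore stop naturally, in line with the experiments reported below.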
3 Computational Experiments
The goal of this section is to experimentally evaluate the efficiency of some state-of-the-art
strategies for both current set selection and neighborhood exploration. For each
component, two different schemes are investigated. This gives rise to a
combination of 4 DMLS algorithms. Hence, with regard to the current set
selection, either (i) every unvisited solution or (ii) a single random unvisited solution
is selected from the archive. Next, with regard to the neighborhood exploration,
either (i) all neighbors or (ii) a single random neighbor per solution is proposed
as a candidate for integrating the archive. The corresponding algorithms are
denoted by DMLS(1·1), DMLS(1·⋆), DMLS(⋆·1) and DMLS(⋆·⋆). Note that the
algorithm denoted by DMLS(1·1) is closely related to PAES [4], DMLS(1·⋆) to
PLS [5], and DMLS(⋆·⋆) to BLS [6].
to BLS [6]. For each problem instance to be solved, different maximum run-
time values, from 2 to 20 minutes, have been investigated in order to study the
evolution of the search efficiency over time. However, as some algorithms stop
in a natural way, a simple random restart has been performed to continue the
search process until the maximum runtime is reached. For all the experiments,
the initial population size is set to 1, and an unbounded archive is maintained.
the additional objective to be minimized. The reader is referred to [7] for more
information on multiobjective scheduling.
The problem-related components used for the specific case of the FSP pre-
sented above are the following ones. Firstly, the representation is based on a
permutation of size N . Next, the initialization strategy consists of generating
solutions randomly. At last, the neighborhood is based on the insertion operator,
i.e., a job at position i is inserted at position j ≠ i, and the jobs located
between positions i and j are shifted.
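A minimal sketch of this insertion neighborhood, enumerating all (i, j) pairs with j ≠ i:

```python
def insertion_neighbors(perm):
    """All insertion moves on a permutation: remove the job at position i
    and reinsert it at position j != i, shifting the jobs in between."""
    n = len(perm)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            neighbor = perm[:i] + perm[i + 1:]   # remove the job at i
            neighbor.insert(j, perm[i])          # reinsert it at j
            yield neighbor
```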
1 These instances are available at http://www.lifl.fr/~liefooga/benchmarks/
124 A. Liefooghe et al.
4 Conclusion
In this paper, a unification of dominance-based local search approaches for mul-
tiobjective combinatorial optimization has been attempted. Such methods can
be seen as a generalization of the classical single-objective hill climbing, com-
bined with the use of a population of solutions. They are based on the iterative
improvement of the set of nondominated solutions by means of a neighborhood
operator. A unified model has been proposed and its main issues have been iden-
tified. The problem-independent components of current set selection, neighbor-
hood exploration as well as archiving strategies have been especially discussed.
This model has been used as a starting point for the design and the implementa-
tion of an open-source software framework for dominance-based multiobjective
local search. This contribution has been conceived as a plug-in to be integrated
into the ParadisEO-MOEO software framework2. At last, the issues of current
set selection and neighborhood exploration have been experimentally compared
on a multiobjective flowshop scheduling problem. We showed the benefit of
performing a full neighborhood exploration in order to avoid the re-evaluation of some
neighboring solutions and to reach a natural stopping criterion. Furthermore, we
concluded that selecting a single solution from the current population to explore
its neighborhood in an exhaustive manner was especially efficient for the problem
under consideration. As a next step, we will investigate larger instances for the
flowshop scheduling problem as well as other kinds of multiobjective problems.
References
1. Ehrgott, M., Gandibleux, X.: Approximative solution methods for multiobjective
combinatorial optimization. TOP 12(1), 1–89 (2004)
2. Talbi, E.G.: Metaheuristics: from design to implementation. Wiley, Chichester
(2009)
3. Paquete, L., Stützle, T.: Stochastic local search algorithms for multiobjective com-
binatorial optimization: A review. In: Handbook of Approximation Algorithms and
Metaheuristics. Chapman & Hall / CRC (2007)
4. Knowles, J.D., Corne, D.: Approximating the nondominated front using the Pareto
archived evolution strategy. Evolutionary Computation 8(2), 149–172 (2000)
5. Paquete, L., Chiarandini, M., Stützle, T.: Pareto local optimum sets in the biobjec-
tive traveling salesman problem: An experimental study. In: [9], pp. 177–199
6. Angel, E., Bampis, E., Gourvès, L.: A dynasearch neighborhood for the bicriteria
traveling salesman problem. In: [9], pp. 153–176
7. T’Kindt, V., Billaut, J.C.: Multicriteria Scheduling: Theory, Models and Algorithms.
Springer, Berlin (2002)
8. Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Grunert da Fonseca, V.:
Performance assessment of multiobjective optimizers: An analysis and review. IEEE
Transactions on Evolutionary Computation 7(2), 117–132 (2003)
9. Gandibleux, X., Sevaux, M., Sörensen, K., T’Kindt, V. (eds.): Metaheuristics for
Multiobjective Optimisation. Lecture Notes in Economics and Mathematical Sys-
tems, vol. 535. Springer, Berlin (2004)
2 The plug-in is available at http://paradiseo.gforge.inria.fr/DMLS/
A Memetic Algorithm for the Multidimensional
Assignment Problem
1 Introduction
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 125–129, 2009.
126 G. Gutin and D. Karapetyan
2 The Algorithm
A memetic algorithm is a combination of a genetic algorithm with local search. A
typical scheme of a memetic algorithm is as follows.
1. Produce the first generation, i.e., a set of solutions.
2. Apply the local search procedure to every solution in the first generation.
3. Repeat the following while a termination criterion is not met:
(a) Produce a set of new solutions by applying so-called genetic operators
to solutions from the previous generation.
(b) Improve every solution in this set with the local search procedure.
(c) Select several best solutions from this set to the next generation.
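The scheme above can be sketched as a short loop; all operator arguments are hypothetical callables standing in for the problem-specific components:

```python
def memetic(first_generation, local_search, genetic_ops, select, generations):
    """Skeleton of the memetic scheme: improve the first generation with
    local search (steps 1-2), then repeatedly breed new solutions (a),
    improve them (b), and select the best into the next generation (c)."""
    population = [local_search(s) for s in first_generation]
    for _ in range(generations):
        offspring = genetic_ops(population)                # step (a)
        offspring = [local_search(s) for s in offspring]   # step (b)
        population = select(population, offspring)         # step (c)
    return population
```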
While the general scheme of the algorithm is quite common for all memetic
algorithms, the set of genetic operators and the way they are applied can vary
significantly. In our algorithm, we use the following procedure to obtain the next
generation:
where g^k is the kth generation and g^k_j is the jth assignment of the kth generation;
g^k_1 is the best assignment in the kth generation. The constants p_m = 0.5 and μ_m =
0.1 define the probability and the strength of the mutation operator, respectively.
The function selection simply returns the m_{i+1} best distinct assignments among
the given ones, where m_k is the size of the kth generation (if the number of
distinct assignments in the given set is less than m_{i+1}, selection returns all the
distinct assignments and updates the value of m_{i+1} accordingly). To obtain the
set of assignments C (crossover part) we repeat the LocalSearch(crossover(g^i_u, g^i_v))
operation (p · m_{i+1} − m_i)/2 times, where u, v ∈ {1, 2, . . . , m_i} are chosen randomly
for every crossover run and p = 3 defines how many times more assignments
should be produced for the selection operator. The mutation function for a set
of solutions is defined as follows:
    mutation(G, p, μ) = { LocalSearch(perturb(g, μ))   if r < p
                        { g                            otherwise,    for every g ∈ G,
where r ∈ [0, 1] is chosen randomly every time. The functions crossover(x, y),
perturb(x, μ) and LocalSearch(x) are discussed later.
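The mutation formula can be sketched directly; `perturb` and `local_search` are stand-ins for the operators described later:

```python
import random

def mutation(G, p, mu, perturb, local_search):
    """Mutation over a set of assignments: each g in G is, with
    probability p, perturbed with strength mu and re-optimized by local
    search; otherwise it is kept unchanged. r is redrawn for every g."""
    return [local_search(perturb(g, mu)) if random.random() < p else g
            for g in G]
```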
Generation Size. The most natural way to fit the running time of a memetic
algorithm into the given bound is to produce generations of some fixed size until
the time has elapsed. However, it is clear that one memetic algorithm cannot
work efficiently both when there are just a few generations and when there are
hundreds of generations. Thus, instead, we fix the number of generations and
vary the generation size.
Our computational experiments show that, with a fixed running time, the
most appropriate number I of generations for our algorithm is always around
50; this number does not depend on the local search procedure or the given time.
Since the running time of the local search procedure can vary significantly (e.g.,
last generations usually contain better solutions than the first ones and, thus,
are processed faster) and also to make our algorithm easily portable, we decided
to adjust the generation size dynamically according to the remaining time such
that the total number of generations would always be close to I.
In particular, the size of the next generation is calculated as follows:
    m'_{i+1} = { m'_i · max( min( (T − t) / (Δ · (I − i)), k ), 1/k )   if i < I,
              { m'_i · k                                               otherwise,
where T is the given time, t is the elapsed time, Δ is the time spent to produce
the previous generation, I is the prescribed number of generations and k = 1.25
is a constant that limits the generation size change. Note that the values m'_i are
real numbers; the actual size m_i of the ith generation is defined as follows:

    m_i = max( 4, { m'_i       if (p · m'_i − m_{i−1}) is even
                  { m'_i + 1   otherwise ),
which guarantees that the value p · m_{i+1} − m_i, i.e., the number of assignments
produced by crossover, is even and that the generation size is never too small.
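The size update can be sketched as follows; the conversion from the real-valued size to an integer is an assumption, since the exact rounding rule is not shown:

```python
def next_generation_size(m_real, m_prev_int, T, t, delta, I, i, p=3, k=1.25):
    """Generation-size update: scale the real-valued size by the
    remaining-time ratio clamped to [1/k, k], then round it so that
    p*m - m_prev is even and the size never drops below 4."""
    if i < I:
        ratio = (T - t) / (delta * (I - i))
        m_real = m_real * max(min(ratio, k), 1.0 / k)
    else:
        m_real = m_real * k
    m = int(m_real)                       # assumed truncation to integer
    if (p * m - m_prev_int) % 2 != 0:
        m += 1                            # make the crossover count even
    return m_real, max(4, m)
```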
The size of the first generation is obtained in a different way (see below).
    g^1_j = LocalSearch(perturb(greedy, μ_f)),
one, and then adding them either to the first and to the second child respec-
tively (with probability 80%) or to the second and to the first child respectively
(with probability 20%). Since the obtained child assignments can be infeasible,
the crossover corrects each of them; for every dimension of every child it re-
places all duplicate coordinates with randomly chosen correct ones, i.e., with
the coordinates which are not currently used for that dimension.
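The duplicate-repair step for one dimension of a child could look as follows; coordinate values are assumed to range over {0, ..., n−1}:

```python
import random

def repair_dimension(coords):
    """Replace duplicate coordinates in one dimension of a child
    assignment with randomly chosen coordinates that are not yet used,
    so the child becomes a feasible permutation again."""
    n = len(coords)
    unused = [c for c in range(n) if c not in coords]
    random.shuffle(unused)
    seen, fixed = set(), []
    for c in coords:
        if c in seen:
            fixed.append(unused.pop())   # duplicate: take a fresh coordinate
        else:
            seen.add(c)
            fixed.append(c)
    return fixed
```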
References
1. Gutin, G., Karapetyan, D.: A memetic algorithm for the multidimensional assign-
ment problem. Preprint in arXiv (2009), http://arxiv.org/abs/0906.0862
2. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research
Logistics Quarterly 2, 83–97 (1955)
3. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman, New York (1979)
4. Huang, G., Lim, A.: A hybrid genetic algorithm for the three-index assignment
problem. European Journal of Operational Research 172(1), 249–257 (2006)
5. Gutin, G., Karapetyan, D.: Local search heuristics for the multidimensional as-
signment problem. In: Proc. Golumbic Festschrift. LNCS, vol. 5420, pp. 100–115.
Springer, Heidelberg (2009)
6. Karapetyan, D., Gutin, G., Goldengorin, B.: Empirical evaluation of construction
heuristics for the multidimensional assignment problem. In: Chan, J., Daykin, J.W.,
Rahman, M.S. (eds.) London Algorithmics 2008: Theory and Practice. Texts in
Algorithmics, pp. 107–122. College Publications (2009)
Autonomous Control Approach for Local Search
1 Introduction
Local search algorithms are metaheuristics which have been widely used for
solving complex combinatorial problems. Their efficiency relies on their ability
to suitably explore various areas of the search space, but also on their propensity
to converge to a local optimum (the locality is defined here with respect to the
notion of neighborhood). The concept of balance between intensification and
diversification, especially well-known in evolutionary computation, is a crucial
point when designing and using a local search algorithm. Indeed, one of the
classic pitfalls encountered by these algorithms is the excessive attraction of local
optima, which may trap the search process when all the potential neighbors
are not as good as the current configuration and when the move strategy is
mainly based on improvement. To cope with this excessive exploitation of the
search space (i.e., intensification), alternative mechanisms must be used to ensure
enough diversification.
Inspired by the recent book of R. Battiti et al. [1] and our previous work on
the autonomous management of multiple operators in genetic algorithms [2], we
propose an original approach in order to design a local search algorithm that
will include several move operators, corresponding to different neighborhoods
and different strategies for choosing the neighbors. The control of these operators
will then be achieved automatically. We have tested our algorithm on the famous
quadratic assignment problem (QAP), which has been widely studied and for
which an extensive library of instances and results is available [3].
T. Stützle, M. Birattari, and H.H. Hoos (Eds.): SLS 2009, LNCS 5752, pp. 130–134, 2009.
[Fig. 1: Interaction between the autonomous operator selection (AOS) and the local search (LS): operator selection, operator application, evaluation of the current search state, and reward computation.]
[2]. This selection mainly consists in evaluating the effect of the operators on the
current state of the search in order to reward them and to be able to choose the
most suitable one for the next computation step. Therefore, our objective is to
evaluate the impact of the operators and to adjust their use according to the
current state of the search. This approach is summarized in Figure 1.
This figure allows us to highlight the main issues that we have to address in
an autonomous local search algorithm: How to evaluate the current search state?
How to reward operators with regard to this evaluation? How to use these rewards
to select the operator for the next move?
If the notion of quality helps to guide and evaluate the ongoing search process,
the concept of diversity also deserves more attention. Indeed, it has appeared
in previous work [2] that this notion can be used jointly with the quality
in order to efficiently manage the balance between diversification and intensification.
In the next section, we therefore propose a definition of a diversity
measure according to local search specificities.
The aim of our algorithm, called ALS (Autonomous Local Search), is to manage
a set of local search operators. The challenge is thus to make three main modules
work together: current solving state evaluation, internal components rewarding,
and selection of the next operator using these rewards.
    ΔQ = (eval(op(c)) − eval(c)) / (eval(c) + 1)    and    ΔD = div(P_{i,j}) − div(P_{i−1,j−1})
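For concreteness, the quality-variation term can be computed as below. This is a minimal sketch: the eval values are passed in directly, and the assumption that a smaller eval is better follows the usual minimisation convention rather than anything stated in this excerpt.

```cpp
#include <cmath>

// Sketch of the quality-variation reward above: eval_before is eval(c),
// eval_after is eval(op(c)); the +1 in the denominator guards against
// division by zero for a zero-cost configuration.
double delta_q(double eval_after, double eval_before) {
    return (eval_after - eval_before) / (eval_before + 1.0);
}
```

An improving move (under minimisation) then yields a negative ΔQ, e.g. delta_q(90, 99) = −0.09.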
Applying the same memorization principles to the variations of the search state,
we can collect important information about how the search process evolves
through the search space. Indeed, a diversity loss reflects a focus on a particular
area of the search space, whereas a diversity gain appears when moving away from
the current area. During the solving process, the choice of the next operator to
apply is made according to probabilities defined in Section 4.3.
These probabilities are strongly influenced by a parameter α, which models the
desired balance between intensification and diversification. We thus introduce
three values (see Figure 2):
The application strategy can thus be seen as the way to compute the value of γ
according to the collected information. For example, if the search gets trapped
in a local optimum, it is beneficial to increase γ in order to promote
diversification. Furthermore, the closer the search trajectory is to α, the more
efficient the solving process is. We therefore designed a formula to compute γ
from α and β, reducing the gap between them as much as possible:

    γ = α − gap(α, β)/2        if gap(α, β) ≤ π/2
    γ = α − gap(α, β + π)/2    otherwise
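A literal transcription of this rule might look as follows. Note this is only a sketch: gap is not defined in this excerpt, so it is taken here as the absolute angular difference wrapped into [0, π), which is purely an assumption.

```cpp
#include <cmath>

// Hypothetical gap(): the paper does not define it in this excerpt; here
// it is taken as the absolute angular difference wrapped into [0, pi).
double gap(double a, double b) {
    return std::fmod(std::fabs(a - b), M_PI);
}

// gamma balances intensification/diversification between alpha and beta,
// halving the gap between them as described above.
double gamma_value(double alpha, double beta) {
    double g = gap(alpha, beta);
    if (g <= M_PI / 2.0)
        return alpha - g / 2.0;
    return alpha - gap(alpha, beta + M_PI) / 2.0;
}
```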
Table 1. Mean deviation of ALS, uniform choice (UC), and robust taboo search (RTS) from the best known value (BKV)

Instance   BKV       UC       ALS α=0.25π  ALS α=0.15π  ALS α=0.1π  ALS α=0   RTS
bur26a     5426670   0.1177   0.0196       0.0020       0.0000      0.0020    0.0000
bur26c     5426795   0.0359   0.0029       0.0000       0.0000      0.0217    0.0000
bur26f     3782044   0.0153   0.0019       0.0000       0.0000      0.0679    0.0000
chr25a     3796      42.4341  10.2160      6.6228       8.3298      8.6828    7.6765
els19      1.7E+07   4.8532   0.0003       0.0000       0.0000      0.0000    0.0000
kra30a     88900     3.8774   0.8931       0.7627       0.4027      1.1755    0.0000
kra30b     91420     2.4251   0.3227       0.0131       0.0459      0.1181    0.0230
tai30b     6.4E+08   1.5794   0.1882       0.2319       0.0372      0.8525    0.0326
tai50b     4.6E+08   1.4307   0.2330       0.3693       0.2566      0.5376    0.1078
nug20      2570      1.9767   0.0156       0.0000       0.0000      0.0000    0.0000
nug30      6124      2.0901   0.2449       0.0000       0.0000      0.0131    0.0065
sko42      15812     2.0529   0.5237       0.0443       0.0202      0.0620    0.0342
sko49      23386     2.0174   0.6431       0.2279       0.2407      0.2382    0.1403
sko56      34458     2.1139   0.5305       0.1843       0.1660      0.2783    0.1051
tai30a     1818146   3.6971   1.7781       0.4163       0.6008      0.3973    0.3933
tai35a     2422002   3.9348   2.1099       0.8157       0.6868      0.9082    0.7705
tai50a     4941410   4.4437   2.4926       1.1522       1.1269      1.3648    1.3733
wil50      48816     1.0005   0.2176       0.0520       0.0385      0.0713    0.0361
Average              4.4498   1.1352       0.6053       0.6640      0.8217    0.5944
References
1. Battiti, R., Brunato, M., Mascia, F.: Reactive Search and Intelligent Optimiza-
tion. Operations Research/Computer Science Interfaces, vol. 45. Springer, Heidel-
berg (2008)
2. Maturana, J., Fialho, A., Saubion, F., Schoenauer, M., Sebag, M.: Compass and
dynamic multi-armed bandits for adaptive operator selection. In: Proceedings of
IEEE Congress on Evolutionary Computation (2009)
3. Burkard, R.E., Karisch, S., Rendl, F.: QAPLIB-a quadratic assignment problem
library. European Journal of Operational Research 55(1), 115–119 (1991)
4. Taillard, É.: Robust taboo search for the quadratic assignment problem. Parallel
Computing 17(4-5), 443–455 (1991)
EasyGenetic: A Template Metaprogramming
Framework for Genetic Master-Slave Algorithms
1 Introduction
Genetic algorithms (GAs) have been applied to problem solving for several decades,
and a plethora of successful cases demonstrates the effectiveness of this paradigm
as a tool for building “automatic problem solvers”.
Among other domains, GAs are particularly suitable for design problems, in
which the construction of an artifact depends on a set of design choices and
parameter values. From an abstract point of view, these problems can be seen as
a composite master-slave task: a low-level (slave) task consists of building the
artifact on the basis of a parametrized constructive procedure that is instructed
by the parameter setting determined by a high-level (master) task.
In this work we adopt this very perspective and present EasyGenetic, a tool
that enables the algorithm designer to implement a genetic solver by combining
basic components and to tackle combinatorial optimization problems for which
a parametric constructive procedure is available. EasyGenetic is a framework
for implementing genetic solvers based on the master-slave decomposition of the
problem; it is developed using template metaprogramming, a technique
that combines flexibility with efficiency.
The remainder of this paper is structured as follows. In Section 2 we provide
a bird’s-eye view of the genetic master-slave solver, whose architecture
and implementation are detailed in Section 3. A more detailed description of
EasyGenetic is provided in [1].
136 S. Benedettini, A. Roli, and L. Di Gaspero
the first phase, the parameters of a constructive procedure are set by a master
solver, whereas in the second phase the solution is actually built by a slave solver
(see Algorithm 1). For instance, the constructive procedure can be based on a
sequence of decisions whose order is defined by the master solver. Many problems
allow such a decomposition, for example planning or assignment problems.
In EasyGenetic, the master is a GA, while the slave algorithm can be, in
general, any deterministic constructive procedure that accepts an initial set of
parameters that completely define solution construction. For example, for com-
binatorial problems there exist constructive procedures based on the following
parameters: the sequence of objects to be included in the solution and/or the de-
cisions to be taken, the set of preassigned variables or the set of hard constraints
to be satisfied. In a sense, the master explores the search space of “parameter
settings”, employing the solution returned by the slave as the evaluation of those
search space points.
[UML concept diagram: Solver (+solve(): Individual) over a Population (a Sequence of Chromosome, with ValueSemantics), parameterized by a ProblemModel and by the concepts MutationOperator (+mutate(c:Chromosome)), CrossoverOperator (+crossover(p:SequenceOfParents): Offspring — the arity of this operator is unspecified), SelectionPolicy (+select(population:Sequence): Individual), UpdatePolicy (+update(population:Sequence, offspring:Sequence), +best(population:Sequence, currentBest:Individual)), and SlaveProcedure (+evaluate(Individual): fitness_value_type).]
class Solver {
  Individual solve() {
    std::vector<Individual> pop(pop_size);
    for (uint i = 0; i < pop_size; ++i) {
      Chromosome c = ChromosomeGenerator::generate(model);
      fitness_value_type v = SlaveProcedure::evaluate(model, c);
      pop[i] = Individual(c, v); // Individual is-a Chromosome
    }
    std::sort(pop.begin(), pop.end());
    Individual best = pop.back();
    while (/* termination conditions not met */) {
      std::vector<Individual> offspring(offspring_size);
      while (/* offspring is not full */) {
        // select parents with SelectionPolicy::select(pop)
        // apply crossover and mutation operators
      }
      for (uint i = 0; i < offspring_size; ++i)
        offspring[i].value = SlaveProcedure::evaluate(model, offspring[i]);
      UpdatePolicy::update(pop, offspring);
      UpdatePolicy::best(pop, offspring, best);
    }
    return best;
  }
};
– The UpdatePolicy concept specifies the interface for updating the population
for the next generation of the genetic algorithm.
– The SelectionPolicy concept provides the interface for components that im-
plement a selection procedure.
– CrossoverOperator and MutationOperator specify the interfaces required by
those components that implement genetic operators. EasyGenetic seamlessly
supports crossover operators with an arbitrary number of parents and
offspring (up to a reasonable amount).
T::fitness_value_type
T::solution_type
T::chromosome
T::chromosome_generator
T::slave_procedure
The purpose of ProblemModel is to provide the Solver with the actual problem-
specific type information about various key entities of the system. Each member
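The excerpt is truncated here, but the typedef list above suggests what a model type supplies. A hypothetical ProblemModel for a toy packing problem might look as follows; only the member names come from the list above, while the problem itself, its capacity, and the data are invented for illustration.

```cpp
#include <vector>
#include <numeric>
#include <algorithm>
#include <random>

// Hypothetical ProblemModel: member names mirror the T::... list above;
// the toy problem (greedy packing under capacity 10) is invented.
struct ToyModel {
    std::vector<int> weights;                  // problem data

    using fitness_value_type = int;
    using solution_type = std::vector<int>;
    using chromosome = std::vector<int>;       // permutation of item indices

    struct chromosome_generator {
        static chromosome generate(const ToyModel& m) {
            chromosome c(m.weights.size());
            std::iota(c.begin(), c.end(), 0);
            std::mt19937 rng(std::random_device{}());
            std::shuffle(c.begin(), c.end(), rng);
            return c;
        }
    };

    struct slave_procedure {
        // The slave decodes the chromosome: items are packed greedily in
        // chromosome order while the total weight stays within capacity 10.
        static fitness_value_type evaluate(const ToyModel& m, const chromosome& c) {
            int total = 0;
            for (int i : c)
                if (total + m.weights[i] <= 10) total += m.weights[i];
            return total;
        }
    };
};
```

The master GA then searches the space of permutations, and two chromosomes that order the items differently decode to solutions of different fitness.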
4 Applications
References
1. Benedettini, S., Roli, A., Di Gaspero, L.: Easygenetic: A template metaprogramming
framework for genetic master-slave algorithms. Technical Report DEIS-LIA-09-005,
University of Bologna (Italy), LIA Series no. 95 (May 2009)
2. Standard Template Library, http://www.sgi.com/tech/stl/ (viewed: April 2009)
3. Boost C++ libraries, http://www.boost.org/ (viewed: April 2009)
Adaptive Operator Selection
for Iterated Local Search
Dirk Thierens
1 Introduction
Metaheuristics are search methods that aim to enhance the performance of multi-
start local search by applying a problem independent strategy. For many combi-
natorial optimization problems, metaheuristic search algorithms are among the
best performing techniques. Iterated local search (ILS) is a simple yet powerful
metaheuristic. The search strategy of ILS consists of applying small perturba-
tions on local optima and restarting local search from the perturbed solution.
Ideally, the ILS perturbation step should move the search just outside the basin
of attraction of the current local optimum [1]. If the new local optimum is better
than the old one, ILS will continue searching from the new solution, otherwise
it will return to the previous local optimum. ILS actually performs a stochastic
greedy search in the space of local optima. It will be most successful in search
space structures where the neighboring local optima have highly correlated fit-
ness values. The only drawback of ILS is that it is rather sensitive to the size of
the perturbation step.
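The search strategy just described can be sketched in a few lines. This is a toy illustration, not the paper's solver: the landscape f, the ±1 neighbourhood, and all parameter values are invented to show perturbation plus greedy acceptance.

```cpp
#include <cstdlib>
#include <random>

// Invented 1-D landscape with a local optimum at x = 24 separated from
// the global optimum at x = 50 by a penalty band on [25, 49].
int f(int x) { return std::abs(x - 50) + 10 * ((std::abs(x) / 25) % 2); }

// Hill-climb with +/-1 moves until no neighbour improves.
int local_search(int x) {
    for (;;) {
        if (f(x - 1) < f(x)) { --x; continue; }
        if (f(x + 1) < f(x)) { ++x; continue; }
        return x;                       // local optimum reached
    }
}

// ILS: perturb the incumbent local optimum, restart local search, and
// keep the new local optimum only if it is better (greedy acceptance).
int ils(int start, int perturbation, int iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> step(-perturbation, perturbation);
    int best = local_search(start);
    for (int i = 0; i < iterations; ++i) {
        int candidate = local_search(best + step(rng));
        if (f(candidate) < f(best)) best = candidate;
    }
    return best;
}
```

With a perturbation of size 3 the search escapes the basin around x = 24 and reaches the global optimum; with size 0 it would stay trapped, which is exactly the sensitivity discussed above.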
Adaptive operator selection (AOS) methods are on-line adaptive algorithms
that adjust the probability of applying the search operators to the current solu-
tions. In the literature, AOS algorithms have been tested on artificial or trivial
problems [2] [3]. Here, we show that the adaptive pursuit algorithm [3] can be
successfully applied to design an adaptive ILS algorithm for a knapsack problem.
Most algorithms for solving the multi-constraint knapsack problem first reduce
the multiple constraints to a single surrogate constraint. Here, we are only
interested in seeing the effect of adaptive operator selection on iterated local
search for a non-trivial problem. Therefore, we only consider the blind version
of the single-constraint knapsack problem, meaning that the search algorithm has
no knowledge of the specific profit and weight of the individual items. The
benchmark problem used here has n = 500 items, with profits and weights that are
uniformly distributed random integers from the interval [10 . . . 50]. The
capacity of the knapsack is half the sum of all the weights.
Table 1. Profit values for the knapsack problem found with the greedy heuristic and
the greedy heuristic + local search, and the median profit of 30 runs for the blind
version of the same knapsack problem with multi-start local search
found by this greedy heuristic on our test problem. We also show the results of
applying the local search operator to the greedy solution. In this way we obtain
a solution of very high quality, which serves as a benchmark for the ILS and
adaptive ILS that solve the blind version of the problem.
Table 1 also shows the median profit values for multi-start local search with
1000 independent restarts of the local search algorithm. Clearly, the solutions
obtained by multi-start local search are substantially inferior to the solutions
obtained by the greedy heuristic (+ local search). Of course the former solves
the blind knapsack problem while the latter uses the knowledge of the profits
and weights of each individual item. The results indicate however that there is
a lot of room for improvement left for the ILS and adaptive ILS metaheuristics.
Fig. 1. The box-and-whisker plots show the median, 25th and 75th percentiles, and minimum and maximum values; small circles represent outliers. (The profit axis spans roughly 10500–10540.)
The results of ILS with Pmut = 0.01 exceed or match the results of the greedy
heuristic followed by local search on the non-blind knapsack version. For larger
values of Pmut the performance of ILS deteriorates, but it is still much better
than the performance of MLS.
4 Conclusion
Iterated local search is a simple yet powerful metaheuristic; unfortunately, it is
very sensitive to the size of the perturbation step. Adaptive operator selection
methods are on-line adaptive algorithms that adjust the probability of applying
the search operators to the current solutions. We have demonstrated that
the adaptive pursuit algorithm can be applied to automatically select the
perturbation step size for ILS when optimizing a blind, single-constraint knapsack
problem. The resulting adaptive ILS achieves almost the same performance as
the ILS with the best perturbation step size, but without the need to determine
the optimal parameter setting; it is thus computationally more efficient.
References
1. Lourenço, H., Martin, O., Stützle, T.: A beginner’s introduction to iterated local
search. In: Proceedings of the 4th Metaheuristics International Conference (2001)
2. DaCosta, L., Fialho, A., Schoenauer, M., Sebag, M.: Adaptive operator selection
with dynamic multi-armed bandit. In: Proceedings of the 10th Genetic and Evolu-
tionary Computation Conference, pp. 913–920 (2008)
3. Thierens, D.: An adaptive pursuit strategy for allocating operator probabilities.
In: Proceedings of the 7th Genetic and Evolutionary Computation Conference, pp.
1539–1546 (2005)
Improved Robustness through Population
Variance in Ant Colony Optimization
1 Introduction
Stochastic Local Search (SLS) [1] algorithms are an effective means to solve com-
binatorial optimization problems [2]. The Traveling Salesman Problem (TSP) is
a well known combinatorial optimization problem where the goal is to construct
the shortest possible tour visiting each city only once. As the number of cities in-
creases, the combinatorial size prevents a complete search of the entire solution
space. SLS algorithms employ diversification methods to find promising areas
in the solution space and intensification methods to focus the search in these
promising areas.
Ant Colony Optimization (ACO) [3] algorithms are population-based SLS
algorithms where a colony of ants communicates indirectly through pheromone
trails over a series of iterations. Each ant in the colony randomly constructs a
solution to the problem using the pheromone trails and problem heuristics as
aids. After each iteration, pheromone trail updates based on the best solutions
found help narrow the search.
The performance of SLS algorithms depends on proper parameter selection.
While parameter recommendations exist for these algorithms, the optimal pa-
rameters are often problem specific. This paper focuses on a new method to
improve ACO algorithm robustness, the ability to perform well for suboptimal
parameter selections [1].
Existing ACO algorithms employ a homogeneous colony of ants. The ants
in these colonies use identical parameters throughout a run. The Population
146 D.C. Matthews et al.
2 Population Variance
Population Variance introduces the functions αk(t) and βk(t) into the computation
of the proportional probabilities (1) used in solution construction. These functions
allow us to change the values of α and β per ant k or iteration t, varying the
relative contribution of the pheromones τ and the heuristics η.

    p^k_ij(t) = [τ^α_ij(t)]^{α_k(t)} · [η_ij]^{β_k(t)} / Σ_{l∈N^k_i} [τ^α_il(t)]^{α_k(t)} · [η_il]^{β_k(t)} .    (1)
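Rule (1) amounts to a weighted roulette wheel over the feasible neighbours. A minimal sketch follows; representing the neighbourhood and the pheromone/heuristic tables as plain vectors is an assumption of this illustration, not part of the paper.

```cpp
#include <vector>
#include <cmath>
#include <random>

// Roulette-wheel selection implementing the proportional rule (1):
// each candidate l is weighted by tau[l]^alpha_k * eta[l]^beta_k.
int choose_next(const std::vector<double>& tau, const std::vector<double>& eta,
                double alpha_k, double beta_k, std::mt19937& rng) {
    std::vector<double> w(tau.size());
    double sum = 0.0;
    for (std::size_t l = 0; l < tau.size(); ++l)
        sum += w[l] = std::pow(tau[l], alpha_k) * std::pow(eta[l], beta_k);
    std::uniform_real_distribution<double> u(0.0, sum);
    double r = u(rng);
    for (std::size_t l = 0; l < w.size(); ++l) {
        if (w[l] <= 0.0) continue;       // skip zero-weight candidates
        r -= w[l];
        if (r <= 0.0) return static_cast<int>(l);
    }
    return static_cast<int>(w.size()) - 1;  // numerical guard
}
```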
In this paper, we incorporate Population Variance into the Max-Min Ant System
(MMAS) [4]. We previously proposed an improved lower limit for pheromone
trails (2) in MMAS that avoids stagnation when computing the proportional
probabilities and significantly improved results when α = 1 [5]. We used this
improved lower limit with both the MMAS and Population Variance algorithms
studied in this paper.
    τ_min = τ_max · ( (1 − p_best^{1/n}) / (avg · p_best^{1/n}) )^{1/α} .    (2)
The improved lower limit for pheromone trails in (2) implies that pheromone
tables are now α-specific, hence the use of τ^α_ij in (1). The pheromone scaling
defined in (3) allows us to maintain a single pheromone table for α = 1 and to
scale the pheromones proportionally for other values of α. This slightly
increases the cost of computing each proportional probability.
    τ^α_ij(t) = τ_min,α + (τ^1_ij(t) − τ_min,1) · (τ_max − τ_min,α) / (τ_max − τ_min,1) .    (3)
There are many possible methods to select αk(t) and βk(t). In this paper we
use a simple diversification method to increase exploration, varying these values
independently by iteration with a uniform distribution of d discrete values over
a defined range for each. All ants share the same values for a given iteration,
allowing construction of a single proportional probability table p^k_ij(t). The αk(t)
function in (4) selects d discrete values between αmin and αmax . The βk (t)
function in (5) selects d discrete values between βlim (t) and βmax .
    α_k(t) = α_min + (α_max − α_min)/(d − 1) · ⌊random(0, 1) · d⌋ .    (4)

    β_k(t) = β_max − (β_max − β_lim(t))/(d − 1) · ⌊random(0, 1) · d⌋ .    (5)
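Reading (4) and (5) as a uniform choice among d evenly spaced values (the floor over random(0,1)·d follows from the text's "d discrete values"), the per-iteration sampling might look like:

```cpp
#include <random>
#include <algorithm>

// Uniform choice among d evenly spaced values in [v_min, v_max], as in
// (4)/(5): floor(random(0,1) * d) selects one of the d buckets.
double sample_discrete(double v_min, double v_max, int d, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int bucket = std::min(d - 1, static_cast<int>(u(rng) * d));  // 0 .. d-1
    return v_min + (v_max - v_min) / (d - 1) * bucket;
}
```

With v_min = 1, v_max = 7, d = 7, this draws uniformly from {1, 2, ..., 7}, matching the parameter grid used in the experiments below.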
Early prototypes showed an inverse relationship between the value of β and
the quality of the initial solution. To intensify the search, we use βlim(t) to
restrict the initial range of β values so as to produce better starting tours, so
that early pheromone updates guide us to productive areas of the solution space.
The lower limit βlim(t) in (6) uses an exponential moving average with decay
rate σ and βlim(0) = βmax.
3 Robustness
We modified ACOTSP [6] to incorporate the Population Variance equations
and accept αmin, βmin, αmax, βmax, d, and σ parameters. A series of tests
using TSPLIB [7] problems compared the performance of MMAS and PV across
a range of parameters intended to provide optimal and suboptimal parameter
combinations.
For MMAS, tests were run for all combinations of α = 1, 2, 3, 4, 5, 6, 7 and
β = 1, 2, 3, 4, 5, 6, 7. For PV, tests were run for αmin = 1, βmin = 1, αmax =
2, 3, 4, 5, 6, 7, βmax = 7, σ = 0.01, and d = 7. For both algorithms, tests were run
for all combinations of evaporation rates ρ = 0.025, 0.05, 0.1, 0.3, 0.5 and maxi-
mum pheromone selection probability pbest = 0.00005, 0.0005, 0.005, 0.05, 0.5 in
addition to the α and β settings. All runs were limited to 10 tries of 2500 it-
erations. Local search was not employed so we could evaluate the effectiveness
of the pheromone trail mechanism. All other parameters used their ACOTSP
defaults.
PV exhibits more robust performance compared to MMAS across a range of
parameter values for αmax, ρ, and pbest, as shown in Fig. 1 for problem pcb442
from TSPLIB. Similar results were obtained for other problems in TSPLIB. The
MMAS tests for αmax include all tests with α ≤ αmax, while the PV tests include
a similar number of repeated tests for the given αmax value. The figure shows
the same results for two ranges of percent deviation from optimal, 0−50 and
0−10. The range of PV results is much narrower, and the PV medians are
lower than the corresponding MMAS medians, with a significance level below
0.01 using the Mann-Whitney rank sum test. Tests varying other parameters
[Fig. 1: twelve box-plot panels for problem pcb442 comparing MMAS and PV, each shown at two percent-deviation ranges (0−50 and 0−10), for varying αmax, ρ, and pbest.]
Fig. 1. Comparison of percent deviation from optimum of the Max-Min Ant System
(MMAS) and Population Variance (PV) for ranges of α, ρ, and pbest
(pseudo random proportional selection, candidate list size, and number of ants)
yielded similar improvements in robustness.
Some types of problems have no heuristics available to guide the search, so
we compared the performance of MMAS and PV using only pheromones in
the random proportional selection (βmin = βmax = 0). The results in Fig. 2
show four TSPLIB problems solved without the use of heuristics or local search,
relying solely on the pheromone trails. These results suggest that the PV methods
for diversification and intensification are much more robust than MMAS.
[Fig. 2: box plots of percent deviation from optimum for MMAS and PV on four TSPLIB problems, solved using pheromone trails only.]
4 Summary
This paper introduced a new method called Population Variance to increase
robustness in ACO algorithms with respect to suboptimal parameter settings.
This method varies the α and β parameters used during solution construction to
improve exploration and escape local optima. The results of tests with problems
from TSPLIB show significant improvements in robustness, particularly when
heuristics are not available to aid the search.
Future work includes more sophisticated Population Variance functions αk (t)
and βk (t), interaction with local search, generalization to other ACO algorithms,
and use with other types of combinatorial optimization problems.
References
1. Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations & Applications. Mor-
gan Kaufmann, San Francisco (2005)
2. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman, New York (1979)
3. Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)
4. Stützle, T., Hoos, H.H.: MAX–MIN Ant System. Future Generation Computer Sys-
tems 16(8), 889–914 (2000)
5. Matthews, D.C.: Improved Lower Limits for Pheromone Trails in Ant Colony Opti-
mization. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN
2008. LNCS, vol. 5199, pp. 508–517. Springer, Heidelberg (2008)
6. Stützle, T.: ACOTSP, http://www.aco-metaheuristic.org/aco-code
7. Reinelt, G.: TSPLIB, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/index.html
Mixed-Effects Modeling of Optimisation
Algorithm Performance
1 Introduction
Models of algorithm performance can be useful for analysis purposes, but also
for automating algorithm selection, or parameter tuning. The correct analysis
technique depends on the kind of problem to be solved. For algorithms solving
optimisation problems, in which each solution is characterized by a measure of
its quality, the most general performance model is a bivariate distribution, relat-
ing runtime to solution quality. In order to gather performance data for a given
algorithm, one can solve a benchmark of instances, storing a (time, quality) pair
each time the best solution is improved. The resulting sample will be a set of
observations of solution quality vs. time, grouped based on the individual runs
of the algorithm. In statistical terminology, this is an example of longitudinal
data [1], i.e. measurements of the same quantity repeated over time on each of
a set of subjects. In the following, we describe parametric mixed-effects models,
a standard technique in longitudinal data analysis (Sec. 2), and present prelim-
inary experiments on performance data from a TSP solver (Sec. 3). Section 4
gives references for further reading, while Section 5 concludes the paper with a
perspective on ongoing research.
In our case, y is the objective being optimized, and each yij records the value
of the best solution found by a randomized algorithm within a time tij ; the M
subjects correspond to M distinct runs of the algorithm, which may be solving
different problem instances, or differ only in the random seed used.
The main issue posed by longitudinal data is within-subject correlation:
measurements taken on the same subject cannot be considered independent. To address
this, in parametric mixed-effects models [1] the evolution of y for each subject
is modeled by a separate curve, whose parameters are a random perturbation of
those of a baseline curve, describing the overall behavior of the set of subjects.
In a nonlinear mixed-effects model (NLME) [2],
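The excerpt breaks off before the model equation; for reference, the standard NLME formulation of [2], which the later reference to the "Gaussian noise model (1)" presupposes, reads as follows (conventional notation, not necessarily the paper's):

```latex
y_{ij} = f(\phi_i, t_{ij}) + \epsilon_{ij}, \qquad
\phi_i = \beta + b_i, \quad
b_i \sim \mathcal{N}(0, \Psi), \quad
\epsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
```

where y_ij is the j-th observation on subject i, β collects the fixed effects of the baseline curve f, and the random effects b_i perturb them per subject.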
3 Experiments
One important issue in optimisation is that, while bounds may be easy to com-
pute, the actual value of the global optimum is usually not available beforehand.
Consider an algorithm solving a problem instance: in general, there is no way of
telling whether the current best solution will be further improved, or the algo-
rithm has already found the global optimum. Our interest in nonlinear models is
motivated by the hope that they may allow to extrapolate the performance on a
new instance, based on previous runs on similar instances. In order to test this
idea, we collected performance data on a benchmark of small instances of the
traveling salesman problem (TSP) [4], for which the global optima were avail-
able. The benchmark was composed of four groups of 100 symmetric Euclidean
instances each, randomly generated, with 200, 300, 400, and 500 cities respec-
tively. The solver used was ILS-FDD [5], an iterated version of the local search
algorithm 3-opt, with fitness-distance diversification. Each instance was solved
25 times, with different random seeds. In the TSP, the objective is represented
by the cost l of a path visiting all cities once, which has to be minimized. A
lower bound lb on l, and the global optimum lo , were evaluated using Concorde.
The value of the objective was collected, along with the runtime, each time an
152 M. Gagliolo, C. Legrand, and M. Birattari
improvement was found, resulting in a sequence of (l, t) values for each instance
and each random seed. In order to perform the modeling across different in-
stances, the objective was scaled relative to the lower bound as y = (l − lb )/lb ,
and the corresponding sample of (y, t) values was used as longitudinal data.
Modeling was performed using nlme [3], a software package for linear and non-
linear mixed effects, in its version for the R language [6]. As the data presented
an exponential decay towards the optimum, which is typical of optimisation
algorithms in general, we fit a model of the form y = a + b e^{−ct}, implemented in
nlme by the function SSasymp. A fundamental issue with the use of such a model is
that the NLME algorithm is based on a zero-mean Gaussian noise model (1).
In our case, this assumption is violated as the algorithm converges towards the
optimum, since the objective cannot decrease further. As a solution, we used a
heteroscedastic variance structure (varPower), expressing the variance as a
function of time, Var(ε_ij) = σ² t_ij^{2ρ}, with one parameter ρ being estimated,
in order to be able to model a decreasing variance. For the correlation structure,
we used the function corCAR1 (continuous-time AR(1)), Cor(ε_ij, ε_ik) = φ^{|t_ij − t_ik|},
0 < φ < 1, as we expected the correlation among y values to decrease with distance in time.
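Putting this paragraph together, the fitted model for subject i can be written as follows (a reconstruction from the text, using the a + a_i random-effect notation of the following paragraphs):

```latex
y_{ij} = (a + a_i) + (b + b_i)\, e^{-(c + c_i)\, t_{ij}} + \epsilon_{ij},
\qquad \operatorname{Var}(\epsilon_{ij}) = \sigma^2 t_{ij}^{2\rho},
\qquad \operatorname{Cor}(\epsilon_{ij}, \epsilon_{ik}) = \phi^{|t_{ij}-t_{ik}|},
```

where (a, b, c) are the fixed effects of the SSasymp curve and (a_i, b_i, c_i) the per-subject random effects.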
The aim of the following experiments was to find out if the estimated value
of a could be used to approximate the global optimum lo . Figure 1 reports
results in terms of the relative deviation from the optimum, d = (la − lo )/lo ,
where la = lb (1 + a) is the asymptote of SSasymp scaled back. We first fit a
separate model for each of the 400 instances, using data from all the 25 runs
available. In this case the factor which identifies the subjects is the random
seed. Random effects were negligible, and the model seemed to account for the
variations among runs only with the variance-covariance structure. Here and in
all other experiments, the value of ρ in varPower was estimated to be negative,
giving a decreasing variance structure, and the correlation parameter φ was also
significant. On some of the instances, nlme failed for numerical issues, namely
the singularity of a matrix: this was observed mostly on the group of smaller
instances, on which the algorithm was often too fast in converging, producing
only a few (y, t) points for each run. The parameters estimated were found to
be significant (p-value < 0.05), with the exception of c on some of the instances.
Figure 1(a) plots statistics of the deviation from the optimum d, evaluated using
the fixed effect of the asymptote a.
The next experiment was performed grouping the data based on the instance:
25 models were fit, one for each random seed, on the four separate groups of
100 instances each. This time the random effects were more remarkable, as each
individual corresponded to a different instance, with a different optimum. In this
case d was evaluated based on the mixed effect (a + ai ), ai being the random
effect on the asymptote for instance i. Aggregated statistics of d are reported in
Figure 1(b): the performance decreased visibly, but the estimates are still close
to the real global optima.
The third experiment was a feasibility study for a model based stopping crite-
rion, aimed at investigating the predictive power of our model. In a first phase,
the model was trained based on data for a single random seed, grouped based
[Fig. 1: four box-plot panels of d versus problem size (200, 300, 400, 500): (a) per-instance models, (b) per-seed models, and, for the stopping-criterion study, (c) "All data" and (d) "Half time".]
Fig. 1. All plots display statistics of d = (l_a − l_o)/l_o, the deviation of the estimate l_a from the actual optimum l_o. See text for details.
on the instance. In a second phase, a fresh model was trained on a subset of the
same sample, obtained dropping the data from the second half of the time axis,
for a half of the instances, randomly picked. The idea was to simulate a situation
in which 50 “training” instances have been already solved to convergence, and
the relative data recorded; while another 50 “test” instances are being solved,
the algorithm is paused some time before convergence, and the model is used to
predict the value of the optima, based on the previously solved instances, and on
the learning curves observed so far. This scheme was repeated for each random
seed, each time with a different random pick of the 50 test instances. Also in
this case (a + ai ) was used to evaluate la . Figures 1(c,d) report statistics of d for
the two models, measured only on the test instances. The performance of the
model from the first phase, which serves as an “oracle” for comparison, is clearly
superior, but the one for the second phase is still reasonable.
4 Related Work
An up-to-date review of longitudinal data analysis, including nonparametric
methods, can be found in [1]. We mainly followed [3], which is also rich in
usage examples of nlme, written by the same authors. The literature on
algorithm performance modeling focuses mostly on decision problems and on the
related concept
of runtime distribution. Extreme value statistics was proposed in [7] to estimate
the value of the global optimum for large problem instances, based on a sequence
of suboptimal solutions; this method is integrated into local search in [8]. The
learning curve of local search solvers is fit in [9] using a separate model for each
run. Solution quality distributions are used in [10] to rank the performance of
heuristic solvers. We refer the reader to [4] for further references.
5 Conclusions
We reviewed a class of models for longitudinal data, and showed how they can be
used to model the performance of optimisation algorithms, investigating their
predictive power with a preliminary experiment on data from a TSP solver.
The results were quite promising, and we are currently analyzing other algo-
rithm/problem combinations, as well as the impact of covariates. The models
are quite general in this sense, as they allow time-varying covariates to be eas-
ily included: besides runtime, the distribution of solution quality could also be
related to memory usage, bandwidth, or other resources, as well as dynamic vari-
ables of the algorithms. Adding algorithm parameters as stationary covariates
could make it possible to tune them based on derivatives of the objective. Longitudinal
data analysis could be useful to implement stopping criteria, or dynamic restart
strategies, which take into account past experience in detecting the convergence
of an algorithm; and to perform algorithm selection, or some more general form
of resource allocation.
References
1. Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G.: Longitudinal Data
Analysis. Chapman & Hall/CRC Press (2008)
2. Lindstrom, M.J., Bates, D.M.: Nonlinear mixed effects models for repeated mea-
sures data. Biometrics 46(3), 673–687 (1990)
3. Pinheiro, J.C., Bates, D.M.: Mixed Effects Models in S and S-Plus. Springer, Hei-
delberg (2002)
4. Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations & Applications.
Morgan Kaufmann, San Francisco (2004)
5. Stützle, T., Hoos, H.H.: Analysing the run-time behaviour of iterated local search
for the travelling salesman problem. In: Hansen, P., et al. (eds.) Essays and Surveys
on Metaheuristics, pp. 589–611. Kluwer Academic Publishers, Dordrecht (2001)
6. R Development Core Team: R: A Language and Environment for Statistical Com-
puting. R Foundation for Statistical Computing, Vienna, Austria (2009)
7. Dannenbring, D.G.: Procedures for estimating optimal solution values for large
combinatorial problems. Management Science 23(12), 1273–1283 (1977)
8. Ovacik, I.M., Rajagopalan, S., Uzsoy, R.: Integrating interval estimates of global
optima and local search methods for combinatorial optimization problems. Journal
of Heuristics 6(4), 481–500 (2000)
9. Oppen, J., Woodruff, D.: Parametric models of local search progression. Technical
Report 06-08, UC Davis Graduate School of Management Research (2008)
10. Schreiber, G.R., Martin, O.C.: Cut size statistics of graph bisection heuristics.
SIAM J. on Optimization 10(1), 231–251 (1999)