Multi-Swarm PSO Algorithm for the Quadratic Assignment Problem: A Massive Parallel Implementation on the OpenCL Platform
1 Introduction
problem is NP-hard [30], it can be solved optimally for small problem instances.
For larger problems (n > 30), several heuristic algorithms were proposed [34, 24, 6].
One of the discussed methods [26, 21] is Particle Swarm Optimization
(PSO). It attempts to find an optimal problem solution by moving a population
of particles in the search space. Each particle is characterized by two features: its
position and velocity. Depending on the method variation, particles may exchange
information on their positions and the reached values of the goal function [8].
In our recent work [33] we developed a PSO algorithm for the Quadratic
Assignment Problem on the OpenCL platform. The algorithm was capable of processing one swarm, in which particles shared information about the global best
solution to update their search directions. Following typical patterns for GPU
based calculations, the implementation was a combination of parallel tasks (kernels) executed on the GPU, orchestrated by sequential operations run on the host
(CPU). Such an organization of computations involves inevitable overhead related
to data transfer between the host and the GPU device. The time efficiency
tests reported in [33] showed clearly that the benefits of a parallel execution
platform can be fully exploited only if the processed populations are large, e.g. if they
comprise several hundred or thousands of particles. For smaller populations the sequential algorithm implementation was superior, both as regards the total swarm
processing time and the time required to process one particle. This suggested
a natural improvement of the previously developed algorithm: scaling it up
to high numbers of particles organized into several swarms.
In this paper we discuss a multi-swarm implementation of the PSO algorithm for
the QAP on the OpenCL platform. The algorithm can be executed in two
modes: with independent swarms, each maintaining its best solution, or with
migration between swarms. We describe the algorithm construction and
report tests performed on several problem instances from the QAPLIB
library [28]. The results show the advantages of massive parallel computing: the
obtained solutions are very close to the optimal or best known solutions for particular problem
instances.
The developed algorithm is not designed to exploit the problem specificity
(see for example [10]), nor is it intended to compete with supercomputer or grid based implementations providing exact solutions for the QAP [2].
On the contrary, we target low-end GPU devices, which are present in
most laptops and workstations in everyday use, and accept near-optimal solutions.
During the tests the algorithm was configured to process large numbers of
particles (in the order of 10000). This allowed us to collect data related to
goal function values reached by individual particles and to present such statistical measures as percentile ranks and probability mass functions for whole
populations or selected swarms.
The paper is organized as follows: Section 2 discusses the QAP problem,
as well as the PSO method. It is followed by Section 3, which describes the
adaptation of PSO to the QAP and the parallel implementation on the OpenCL
platform. The experiments performed and their results are presented in Section 4.
Section 5 provides concluding remarks.
2 Related works

2.1 The Quadratic Assignment Problem

The Quadratic Assignment Problem consists in assigning n facilities to n locations, given a flow matrix F = [f_ij] and a distance matrix D = [d_kl]. It can be stated as:
$$\min_{X} \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} f_{ij}\, d_{kl}\, x_{ki}\, x_{lj} \tag{1}$$

subject to:

$$\sum_{i=1}^{n} x_{ij} = 1, \quad \text{for } 1 \le j \le n$$
$$\sum_{j=1}^{n} x_{ij} = 1, \quad \text{for } 1 \le i \le n \tag{2}$$
$$x_{ij} \in \{0, 1\}$$
The n × n matrix X = [x_ki] satisfying (2) is called a permutation matrix.
In most cases the matrices D and F are symmetric. Moreover, their diagonal
elements are often equal to 0. Otherwise, the component $f_{ii} d_{kk} x_{ki} x_{ki}$ can be extracted as a linear part of the goal function, interpreted as an installation cost
of the i-th facility at the k-th location.
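For illustration, the goal function (1) can be evaluated for a permutation directly, without forming the matrix X explicitly. Below is a minimal Python sketch; the function name and the tiny 3×3 instance are invented for the example:

```python
def qap_cost(perm, F, D):
    # perm[i] is the location assigned to facility i; the cost
    # is the sum of flow * distance over all facility pairs
    n = len(perm)
    return sum(F[i][j] * D[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

# toy 3x3 instance (hypothetical data, for illustration only)
F = [[0, 2, 0],
     [2, 0, 1],
     [0, 1, 0]]
D = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]

print(qap_cost([0, 1, 2], F, D))  # → 8
print(qap_cost([2, 1, 0], F, D))  # → 10
```

Even for this trivial instance the cost depends on the assignment, which is what the optimization exploits.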
QAP models found application in various areas including transportation
[3], scheduling, electronics (the wiring problem), distributed computing, statistical
data analysis (reconstruction of destroyed soundtracks), balancing of turbine
running [23], chemistry, genetics [29], creating control panels and manufacturing [14].
In 1976 Sahni and Gonzalez proved that the QAP is strongly NP-hard [30],
by showing that a hypothetical polynomial time algorithm for
solving the QAP would imply the existence of a polynomial time algorithm for
an NP-complete decision problem, the Hamiltonian cycle problem.
In many research works the QAP is considered one of the most challenging optimization problems. This in particular regards the problem instances gathered in the
publicly available and continuously updated QAPLIB library [28, 4]. A practical size limit for problems that can be solved with exact algorithms is about
n = 30 [16]. In many cases optimal solutions were found with branch and bound
algorithms requiring the high computational power offered by computational grids
[2] or supercomputing clusters equipped with dozens of processor cores and
hundreds of gigabytes of memory [15]. On the other hand, in [10] a very successful
approach exploiting the problem structure was reported. It allowed
several hard problems from QAPLIB to be solved using very little resources.
A number of heuristic algorithms for finding near-optimal solutions
to the QAP were proposed. They include genetic algorithms [1], various versions
of Tabu search [34], ant colonies [32, 12] and the bees algorithm [11]. Another
method, discussed further below, is Particle Swarm Optimization [26, 21].
2.2 Particle Swarm Optimization
The classical PSO algorithm [8] is an optimization method defined for a continuous
domain. During the optimization process a number of particles move through
a search space and update their state at discrete time steps t = 1, 2, 3, . . . Each
particle is characterized by position x(t) and velocity v(t). A particle remembers
its best position reached so far, p_L(t), and it can use information about
the best solution found by the swarm, p_G(t).
The state equation for a particle is given by formula (3). Coefficients
c_1, c_2, c_3 ∈ [0, 1] are called respectively the inertia, cognition (or self recognition)
and social factors, whereas r_2, r_3 are random numbers uniformly distributed in
[0, 1]:
$$v(t+1) = c_1\, v(t) + c_2\, r_2(t)\,(p_L(t) - x(t)) + c_3\, r_3(t)\,(p_G(t) - x(t)) \tag{3}$$
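The continuous-domain update (3) can be sketched in a few lines of Python. This is a minimal illustration of one update step for a single particle; variable names are ours:

```python
import random

def pso_step(x, v, p_local, p_global, c1, c2, c3):
    # One application of state equation (3) for a particle in R^d:
    # inertia + cognitive pull towards p_local + social pull towards p_global.
    r2, r3 = random.random(), random.random()  # fresh random factors each step
    v_new = [c1 * vi + c2 * r2 * (pl - xi) + c3 * r3 * (pg - xi)
             for xi, vi, pl, pg in zip(x, v, p_local, p_global)]
    x_new = [xi + vni for xi, vni in zip(x, v_new)]
    return x_new, v_new

# with c2 = c3 = 0 only the deterministic inertia term remains
x, v = pso_step([1.0], [2.0], [0.0], [0.0], c1=0.5, c2=0.0, c3=0.0)
print(x, v)  # → [2.0] [1.0]
```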
2.3 GPU implementations of optimization algorithms

the genetic [22] and memetic algorithms [20]. The described implementations
benefit from the capabilities offered by GPUs by processing whole populations with
fast GPU cores running in parallel.
3.1 Adaptation of PSO to the QAP

$$V(t+1) = S_v\big(c_1\, V(t) + c_2\, r_2\,(P_L(t) - X(t)) + c_3\, r_3\,(P_G(t) - X(t))\big) \tag{4}$$
$$X(t+1) = S_x\big(X(t) + V(t+1)\big) \tag{5}$$
Coefficients r_2 and r_3 are random numbers from [0, 1] generated in each iteration for every particle separately. They are introduced to model a random
choice between movement in the previous direction (according to the inertia c_1), towards the best local solution (self recognition) or towards the global best solution (social
behavior).
All operators appearing in (4) and (5) are standard operators from linear
algebra. Instead of redefining them for a particular problem, see e.g. [7], we
propose to use aggregation functions S_v and S_x that allow the algorithm to be adapted
to the particular needs of a discrete problem.
$$X = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
V = \begin{pmatrix} 7 & 1 & 3 \\ 0 & 4 & 5 \\ 2 & 3 & 2 \end{pmatrix}, \quad
X + V = \begin{pmatrix} 8 & 1 & 3 \\ 0 & 4 & 6 \\ 2 & 4 & 2 \end{pmatrix}, \quad
S_x(X + V) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \tag{6}$$

Applying the second target aggregation with d = 2 to X + V may yield a different permutation matrix.
It can be observed that for d = 1 the value is exactly the same as would
result from GlobalMax; however, setting d = 2 allows a different
solution to be reached. The pseudocode of the SecondTarget procedure is listed in Algorithm 1.
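Since the full SecondTarget procedure is given in Algorithm 1, a minimal Python sketch of the baseline GlobalMax aggregation it builds on (the function name is ours) may clarify how S_x turns a real-valued matrix such as X + V into a permutation matrix:

```python
def global_max_sx(M):
    # Greedy GlobalMax aggregation: repeatedly take the largest remaining
    # entry whose row and column are still unassigned and set it to 1.
    n = len(M)
    entries = sorted(((M[i][j], i, j) for i in range(n) for j in range(n)),
                     reverse=True)
    X = [[0] * n for _ in range(n)]
    used_rows, used_cols = set(), set()
    for _, i, j in entries:
        if i not in used_rows and j not in used_cols:
            X[i][j] = 1
            used_rows.add(i)
            used_cols.add(j)
    return X

# the X + V matrix from example (6)
print(global_max_sx([[8, 1, 3],
                     [0, 4, 6],
                     [2, 4, 2]]))
# → [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

The entries 8, 6 and 4 are selected in turn, reproducing the permutation matrix shown in (6).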
3.2 Migration
3.3 OpenCL implementation
OpenCL [18] is a standard providing a common language, programming interfaces and hardware abstraction for heterogeneous platforms including GPUs,
multicore CPUs, DSPs and FPGAs [31]. It allows computations to be accelerated by
decomposing them into a set of parallel tasks (work items) operating on separate
data.
A program on the OpenCL platform is decomposed into two parts: a sequential part
executed by the CPU host and a parallel part executed by multicore devices. Functions
executed on devices are called kernels. They are written in a language that is a
variant of C with some restrictions related to keywords and datatypes. When
loaded for the first time, the kernels are automatically translated into the instruction
set of the target device. The whole process takes about 500 ms.
OpenCL supports 1D, 2D or 3D organization of data (arrays, matrices and
volumes). Each data element is identified by 1 to 3 indices, e.g. d[i][j] for two-dimensional arrays. A work item is a scheduled kernel instance, which obtains
a combination of data indexes within the data range. To give an example, a 2D
array of data of size n × m should be processed by n · m kernel instances, which
are assigned pairs of indexes (i, j), 0 ≤ i < n and 0 ≤ j < m. Those
indexes are used to identify the data items assigned to kernels.
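This index-based decomposition can be emulated in plain Python. The sketch below is purely didactic (real OpenCL kernels are written in the C dialect mentioned above); the sequential loops stand in for the runtime's scheduling of the 2D range of work items:

```python
# a 2D range of n*m work items, each receiving its (i, j) global ids;
# here the "kernel" adds corresponding elements of two matrices
n, m = 2, 3
A = [[1, 2, 3], [4, 5, 6]]
B = [[10, 20, 30], [40, 50, 60]]
C = [[0] * m for _ in range(n)]

def add_kernel(i, j):
    # body of one work item: it touches only its own data element
    C[i][j] = A[i][j] + B[i][j]

for i in range(n):          # the OpenCL runtime enumerates the index range;
    for j in range(m):      # the sequential loops here merely emulate it
        add_kernel(i, j)

print(C)  # → [[11, 22, 33], [44, 55, 66]]
```

On an actual device the n · m kernel instances would run concurrently, since no work item writes to another item's element.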
Additionally, kernels can be organized into workgroups, e.g. corresponding
to parts of a matrix, and can synchronize their operations within a group using
the so-called local barrier mechanism. However, workgroups suffer from several
platform restrictions related to the number of work items and the amount of accessible
memory.
OpenCL uses three types of memory: global (that is exchanged between the
host and the device), local for a work group and private for a work item.
In our implementation we used the aparapi platform [17], which allows kernels to be written in Java and automatically translated to OpenCL.

[Figure 1: Flow chart of the algorithm: particles and velocities are generated, the Sx aggregation and migration are applied, and the loop repeats until the stop condition is met. The data structures processed by the kernels include the particle positions Xnew, the best solutions PL and PG, the goal function values and the random numbers, sized by the numbers of particles (#p) and swarms (#sw).]
4.1 Optimization results
The algorithm was tested on several problem instances from the QAPLIB [28],
whose sizes ranged between 12 and 150. The results are gathered in Table 1
and Table 2. The selection of algorithm configuration parameters (the c1, c2 and
c3 factors, as well as the kernels used) was based on previous results published
in [33]. In all cases the second target Sx aggregation kernel was applied (see
Algorithm 1), which in previous experiments proved the most successful.
During all tests reported in Table 1, apart from the last, the total numbers of
particles were large: 10000-12500. For the last case only 2500 particles were
used, due to the 1 GB memory limit of the GPU device (an AMD Radeon HD 6750M
card). In this case the consumed GPU memory was about 950 MB.
The results show that the algorithm is capable of finding solutions with goal
function values close to the reference numbers listed in QAPLIB. The gap is
between 0% and 6.4% for the biggest case, tai150b. We have repeated the tests
for the tai60b problem to compare the implemented multi-swarm algorithm with
the previous single-swarm version published in [33]. Gap values for the best
results obtained with the single swarm algorithm were around 7%-8%. For the
multi-swarm implementation discussed here the gaps were between 0.64% and
2.03%.
The goal of the second group of experiments was to test the algorithm configured to employ large numbers of particles (50000-100000) for the well known esc32*
problem instances from the QAPLIB. Although they were considered hard, all of
them have recently been solved optimally with exact algorithms [25, 10].
The results are summarized in Table 2. We used the following parameters:
c1 = 0.8, c2 = 0.5, c3 = 0.5, velocity kernel: normalized, Sx kernel: second target. During nearly all experiments optimal values of goal functions were reached
in one algorithm run. Only the problem esc32a proved difficult, therefore for
this case the number of particles, as well as the upper iteration limits, were
increased to reach the optimal solution. Somewhat surprisingly, in all
cases solutions differing from those listed in QAPLIB were obtained. Unfortunately, our algorithm was not prepared to collect sets of optimal solutions, so
we are not able to provide detailed results on their numbers.
It can be seen that optimal solutions for the problem instances esc32c-h were
found in relatively small numbers of iterations. In particular, for esc32e and
esc32g, which are characterized by small values of goal functions, optimal solutions were found during the initialization or in the first iteration.
The disadvantage of the presented algorithm is that it internally uses a matrix representation for solutions and velocities. In consequence the memory
consumption is proportional to n^2, where n is the problem size. The same regards the time complexity, which for the goal function and the Sx procedures can be
estimated as O(n^3). This makes optimization of large problems time consuming
(e.g. even 400 sec for one iteration of tai150b). However, for medium size
problem instances the iteration times are much smaller, in spite of the large populations used. For the two runs of the algorithm on bur26a reported in Table 1, where
during each iteration 12500 particles were processed, the average iteration time
was 1.73 sec. For 50000-100000 particles and problems of size n = 32 the
average iteration time reported in Table 2 was less than 3.7 seconds.
4.2 Statistical results
An obvious benefit of massive parallel computations is the capability of processing large populations (see Table 2). Such an approach to optimization may
somewhat resemble a brute force attack: the solution space is randomly sampled millions of times in order to hit the best solution. No doubt such an approach
can be more successful if combined with a correctly designed exploration mechanism
Table 1: Results of tests for problem instances from QAPLIB.

| No | Instance | Size | Swarms | Swarm size | Total particles | Inertia c1 | Self recogn. c2 | Social factor c3 | Velocity kernel | Migration factor | Iteration | Reached goal | Reference value | Gap |
|----|----------|------|--------|------------|-----------------|------------|-----------------|------------------|-----------------|------------------|-----------|--------------|-----------------|-----|
| 1  | chr12a   | 12   | 200 | 50 | 10000 | 0.5 | 0.5 | 0.5 | Norm | 0    | 21   | 9 552         | 9 552         | 0.00% |
| 2  | bur26a   | 26   | 250 | 50 | 12500 | 0.8 | 0.5 | 0.5 | Raw  | 33%  | 156  | 5 426 670     | 5 426 670     | 0.00% |
| 3  | bur26a   | 26   | 250 | 50 | 12500 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 189  | 5 429 693     | 5 426 670     | 0.06% |
| 4  | lipa50a  | 50   | 200 | 50 | 10000 | 0.5 | 0.5 | 0.5 | Norm | 0    | 1640 | 62 794        | 62 093        | 1.13% |
| 5  | tai60a   | 60   | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Norm | 0    | 817  | 7 539 614     | 7 205 962     | 4.63% |
| 6  | tai60a   | 60   | 300 | 50 | 15000 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 917  | 7 426 672     | 7 205 962     | 3.06% |
| 7  | tai60b   | 60   | 200 | 50 | 10000 | 0.8 | 0.3 | 0.3 | Norm | 0    | 909  | 620 557 952   | 608 215 054   | 2.03% |
| 8  | tai60b   | 60   | 200 | 50 | 10000 | 0.5 | 0.5 | 0.5 | Norm | 0    | 1982 | 617 825 984   | 608 215 054   | 1.58% |
| 9  | tai60b   | 60   | 200 | 50 | 10000 | 0.8 | 0.3 | 0.3 | Norm | 33%  | 2220 | 612 078 720   | 608 215 054   | 0.64% |
| 10 | tai60b   | 60   | 200 | 50 | 10000 | 0.8 | 0.3 | 0.3 | Norm | 33%  | 1619 | 614 088 768   | 608 215 054   | 0.97% |
| 11 | tai64c   | 64   | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Norm | 0    | 228  | 1 856 396     | 1 855 928     | 0.03% |
| 12 | esc64a   | 64   | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 71   | 116           | 116           | 0.00% |
| 13 | tai80a   | 80   | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 1718 | 14 038 392    | 13 499 184    | 3.99% |
| 14 | tai80b   | 80   | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 1509 | 835 426 944   | 818 415 043   | 2.08% |
| 15 | sko100a  | 100  | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 1877 | 154 874       | 152 002       | 1.89% |
| 16 | tai100b  | 100  | 200 | 50 | 10000 | 0.8 | 0.5 | 0.5 | Raw  | 0    | 1980 | 1 196 819 712 | 1 185 996 137 | 0.91% |
| 17 | esc128   | 128  | 100 | 50 | 5000  | 0.8 | 0.5 | 0.5 | Raw  | 0    | 1875 | 64            | 64            | 0.00% |
| 18 | tai150b  | 150  | 50  | 50 | 2500  | 0.8 | 0.5 | 0.5 | Raw  | 0    | 1894 | 530 816 224   | 498 896 643   | 6.40% |
Table 2: Results of tests for esc32* instances from QAPLIB (problem size n = 32). Reached optimal values are marked with asterisks.

| Instance | Swarms | Particles | Total part. | Goal | Iter | Time/iter [ms] |
|----------|--------|-----------|-------------|------|------|----------------|
| esc32a   | 50     | 1000      | 100000      | 138  | 412  | 3590.08 |
| esc32a   | 10     | 5000      | 100000      | 134  | 909  | 3636.76 |
| esc32a   | 50     | 2000      | 100000      | 130* | 2407 | 3653.88 |
| esc32b   | 50     | 1000      | 50000       | 168* | 684  | 3637.84 |
| esc32c   | 50     | 1000      | 50000       | 642* | 22   | 3695.19 |
| esc32d   | 50     | 1000      | 50000       | 400* | 75   | 3675.32 |
| esc32e   | 50     | 1000      | 50000       | 2*   | 0    | 3670.38 |
| esc32g   | 50     | 1000      | 50000       | 6*   | 1    | 3625.17 |
| esc32h   | 50     | 1000      | 50000       | 438* | 77   | 3625.17 |
that directs the random search process towards good
or near-optimal solutions. In this section we analyze the collected statistical data
related to the algorithm execution, to show that the optimization performance of
the algorithm can be attributed not only to the large sizes of the processed populations,
but also to the implemented exploration mechanism.
The PSO algorithm can be considered a stochastic process controlled by the random
variables r_2(t) and r_3(t) appearing in its state equation (3). Such analyses for
continuous problems were conducted in [9]. On the other hand, the observable
algorithm outcomes, i.e. the values of goal functions f(x_i(t)) for solutions x_i,
i = 1, . . . , n reached in consecutive time moments t ∈ {1, 2, 3, . . . }, can also be
treated as random variables whose distributions change over time t. Our intuition is that a correctly designed algorithm should result in a nonstationary
stochastic process {f(x_i(t)) : t ∈ T}, characterized by a growing probability that
the next values of goal functions in the analyzed population are closer to the optimal
solution.
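Such behavior can be checked by tracking percentile ranks of the population's goal values per iteration. A minimal nearest-rank sketch (the sample goal values below are invented for the example):

```python
def percentile(values, q):
    # nearest-rank percentile, e.g. q = 0.05 for the 5th percentile
    s = sorted(values)
    k = max(0, min(len(s) - 1, int(round(q * (len(s) - 1)))))
    return s[k]

# goal values of a small population at one iteration (hypothetical numbers)
goals = [9552, 9800, 10140, 9710, 9650, 10420, 9605, 9900]
for q in (0.05, 0.25, 0.50, 0.75):
    print(q, percentile(goals, q))
```

A shrinking 5th percentile over iterations would indicate that the best fraction of the population keeps approaching the optimum.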
To demonstrate such behavior of the implemented algorithm, we have collected detailed information on goal function values during two optimization tasks
for the problem instance bur26a reported in Table 1 (cases 2 and 3). For both of
them the algorithm was configured to use 250 swarms comprising 50 particles each.
In case 2 the migration mechanism was applied and the optimal solution
was found in iteration 156; in case 3 (without migration) a solution very
close to optimal (gap 0.06%) was reached in iteration 189.
Fig. 3 shows the values of the goal function for two selected particles during run
3. The plots show a typical QAP specificity. PSO and many other algorithms
perform a local neighborhood search, and for the QAP the neighborhood is characterized by great variations of goal function values. Although the mean values of the
goal function decrease in the first twenty or thirty iterations, the particles behave
randomly and nothing indicates that during subsequent iterations smaller values
of the goal function would be reached more often.
[Figure 3: Variations of goal function values for two particles exploring the solution space during the optimization process (bur26a problem instance)]

In Fig. 4 percentile ranks (75%, 50%, 25% and 5%) are presented for the two swarms
which reached the best values in cases 2 and 3. Although case 3 is
characterized by less frequent changes of scores than case 2, this
effect probably cannot be attributed to the migration applied. It should be mentioned
that for a swarm comprising 50 particles, the 5th percentile corresponds to just
two of them.
[Figure 4: Percentile ranks (0.75, 0.50, 0.25, 0.05) of goal function values for the two most successful swarms during two runs of bur26a: without migration (above) and with migration (below).]
Percentile ranks for all 12500 particles are presented in Fig. 5. For both cases the plots are clearly separated. It can also be observed that solutions very close to optimal are practically reached between iteration 20 (37.3 sec) and iteration 40 (72.4 sec). For the
whole population the 0.05 percentile represents 625 particles. Starting with
iteration 42 their scores vary between 5.449048 · 10^6 and 5.432361 · 10^6, i.e. by
about 0.3%.
Figure 5: Two runs of bur26a optimization. Percentile ranks for all 12500
particles: without migration (above) and with migration (below).
Fig. 6 shows how the probability distribution (probability mass function,
PMF) changed during the optimization process. In both cases the optimization process starts with a normal distribution with a mean value of about
5 945 000. In the subsequent iterations the maximum of the PMF grows and moves
towards smaller values of the goal function. There is no fundamental difference
between the two cases; however, for case 2 (with migration) the maximal values
of the PMF are higher. It can also be observed that in iteration 30 (completed
in 56 seconds) the probability of hitting a good solution is quite high, more than
10%.
Interpretation of the PMF for the two most successful swarms that reached the best
values in the discussed cases is not as obvious. For the case without migration
(Fig. 7, above) there is a clear separation between the initial distribution and
the distribution reached in the iteration which yielded the best result. In the
second case (with migration) a number of particles were concentrated around
local minima.
[Figure 6: Probability mass functions for 12500 particles organized into 250 × 50 swarms during two runs: without migration (above) and with migration (below).]

The presented data shows the advantages of optimization performed on massive parallel processing platforms. Due to the high number of solutions analyzed
simultaneously, an algorithm that does not exploit the problem structure can
yield acceptable results in a relatively small number of iterations (and time). For
the low-end GPU device used during the tests, good enough results
were obtained after 56 seconds. It should be mentioned that for both presented
cases the maximum number of iterations was set to 200. With 12500 particles,
the ratio of potentially explored solutions to the whole solution space was equal to
200 · 12500/26! ≈ 6.2 · 10^-21.
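The exploration ratio quoted above can be verified directly:

```python
import math

# ratio of potentially explored solutions to the whole solution space
explored = 200 * 12500          # iterations * particles
space = math.factorial(26)      # number of permutations for n = 26
ratio = explored / space
print(f"{ratio:.2e}")  # → 6.20e-21
```

The exponent makes clear that the algorithm's performance cannot be explained by sampling density alone.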
5 Conclusions
In this paper we describe a multi-swarm PSO algorithm for solving the QAP
problem, designed for the OpenCL platform. The algorithm is capable of processing in parallel large numbers of particles organized into several swarms that
either run independently or communicate using the migration mechanism.
Several solutions related to particle state representation and particle movement
were inspired by the work of Liu et al. [21]; however, they were refined here to
provide better performance.
We tested the algorithm on several problem instances from the QAPLIB library, obtaining good results (small gaps between the reached solutions and the reference
values). However, it seems that for problem instances of large sizes the selected
representation of solutions in the form of permutation matrices hinders the potential
benefits of parallel processing.

[Figure 7: Probability mass functions for 50 particles belonging to the most successful swarms during two runs: without migration (above) and with migration (below). One point represents an upper bound for 5 particles.]
During the experiments the algorithm was configured to process large populations. This allowed us to collect statistical data related to the goal function
values reached by individual particles. We used them to demonstrate on two
cases that although single particles seem to behave chaotically during the optimization process, when the whole population is analyzed, the probability that
a particle will select a near-optimal solution grows. This growth is significant
for a number of initial iterations, then its speed diminishes and finally reaches
zero.
Statistical analysis of experimental data collected during the optimization process may help to tune the algorithm parameters, as well as to establish realistic
limits on the expected improvement of goal functions. This in particular
regards practical applications of optimization techniques in which recurring
optimization problems appear, i.e. problems with similar size, complexity
and structure. Such problems can be near-optimally solved in bounded time on
massive parallel computation platforms even if low-end devices are used.
References

[1] Ahuja, R.K., Orlin, J.B., Tiwari, A.: A greedy genetic algorithm for the quadratic assignment problem. Computers & Operations Research 27(10), 917-934 (2000)

[2] Anstreicher, K., Brixius, N., Goux, J.P., Linderoth, J.: Solving large quadratic assignment problems on computational grids. Mathematical Programming 91(3), 563-588 (2002)

[3] Bermudez, R., Cole, M.H.: A genetic algorithm approach to door assignments in breakbulk terminals. Tech. Rep. MBTC-1102, Mack-Blackwell Transportation Center, University of Arkansas, Fayetteville, Arkansas (2001)

[4] Burkard, R.E., Karisch, S.E., Rendl, F.: QAPLIB - a Quadratic Assignment Problem library. Journal of Global Optimization 10(4), 391-403 (1997)

[5] Çela, E.: The quadratic assignment problem: theory and algorithms. Combinatorial Optimization, Springer, Boston (1998)

[6] Chmiel, W., Kadłuczka, P., Packanik, G.: Performance of swarm algorithms for permutation problems. Automatyka 15(2), 117-126 (2009)

[7] Clerc, M.: Discrete particle swarm optimization, illustrated by the traveling salesman problem. In: New optimization techniques in engineering, pp. 219-239. Springer (2004)

[8] Eberhart, R., Kennedy, J.: A new optimizer using particle swarm theory. In: Micro Machine and Human Science, 1995. MHS '95., Proceedings of the Sixth International Symposium on, pp. 39-43 (Oct 1995)

[9] Fernández Martínez, J., García Gonzalo, E.: The PSO family: deduction, stochastic analysis and comparison. Swarm Intelligence 3(4), 245-273 (2009)

[10] Fischetti, M., Monaci, M., Salvagnin, D.: Three ideas for the quadratic assignment problem. Operations Research 60(4), 954-964 (2012)

[11] Fon, C.W., Wong, K.Y.: Investigating the performance of bees algorithm in solving quadratic assignment problems. International Journal of Operational Research 9(3), 241-257 (2010)

[12] Gambardella, L.M., Taillard, E., Dorigo, M.: Ant colonies for the quadratic assignment problem. Journal of the Operational Research Society pp. 167-176 (1999)

[13] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Pearson Education (1994)
[27] Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. In: Computer Graphics Forum, vol. 26, pp. 80-113. Wiley Online Library (2007)

[28] Hahn, P., Anjos, M.: QAPLIB home page. http://anjos.mgi.polymtl.ca/qaplib/, online, last accessed: Jan 2015

[29] Phillips, A.T., Rosen, J.B.: A quadratic assignment formulation of the molecular conformation problem. Journal of Global Optimization 4, 229-241 (1994)

[30] Sahni, S., Gonzalez, T.: P-complete approximation problems. J. ACM 23(3), 555-565 (1976)

[31] Stone, J.E., Gohara, D., Shi, G.: OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering 12(3), 66 (2010)

[32] Stützle, T., Dorigo, M.: ACO algorithms for the quadratic assignment problem. New Ideas in Optimization pp. 33-50 (1999)

[33] Szwed, P., Chmiel, W., Kadłuczka, P.: OpenCL implementation of PSO algorithm for the Quadratic Assignment Problem. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, accepted for the ICAISC 2015 Conference. Springer International Publishing (2015), http://home.agh.edu.pl/~pszwed/en/lib/exe/fetch.php?media=papers:draft-icaics-2015-pso-qap-opencl.pdf

[34] Taillard, E.D.: Comparison of iterative searches for the quadratic assignment problem. Location Science 3(2), 87-105 (1995)

[35] Tsutsui, S., Fujimoto, N.: ACO with tabu search on GPUs for fast solution of the QAP. In: Tsutsui, S., Collet, P. (eds.) Massively Parallel Evolutionary Computation on GPGPUs, pp. 179-202. Natural Computing Series, Springer Berlin Heidelberg (2013)

[36] Zhou, Y., Tan, Y.: GPU-based parallel particle swarm optimization. In: Evolutionary Computation, 2009. CEC 2009. IEEE Congress on, pp. 1493-1500. IEEE (2009)