The authors arbitrarily chose five neurons for the only hidden layer in the network. To determine the fitness of an individual, the initial weights dictated by its genes are applied to a network, which is then trained using back-propagation learning for a fixed number of epochs. Individuals with lower error were assigned a higher fitness value. In [10–11] this technique was used to train a sonar array azimuth control system and to monitor the wear of a cutting tool, respectively. In both cases, the approach was shown to produce better results than when using back-propagation exclusively. In [12] the performance of two back-propagation neural networks was compared: one with GA-optimized initial weights and one without. The numbers of input, hidden, and output neurons were fixed at 6, 25, and 4, respectively. Other parameters such as the learning rate and activation functions were also fixed, so that the only difference between the two was the initial weights.
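To make this evaluation scheme concrete, the following Clojure sketch outlines such a fitness function. It is an illustration only, not code from the cited papers; init-network-with-weights, train-bp, and network-error are hypothetical helpers standing in for a concrete neural network library.

(defn fitness
  "Decode an individual's genes into initial weights, train briefly with
  back-propagation, and score the result: lower error, higher fitness."
  [genes training-set epochs]
  (let [net     (init-network-with-weights genes)   ; weights dictated by the genes
        trained (train-bp net training-set epochs)] ; fixed number of epochs
    (/ 1.0 (+ 1e-9 (network-error trained training-set)))))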
In [2, 11–13] each of the synaptic weights was encoded into the genome as a floating-point number (at least 16 bits), making the genome very large. The algorithm developed in this research encodes only a random number seed, which decreases the search space by many orders of magnitude. Determining the initial values using a GA has improved the performance of non-back-propagation networks as well. In [14] a GA was used to initialize the weights of a Wavelet Neural Network (WNN) to diagnose faulty piston compressors. WNNs have an input layer, a hidden layer with the wavelet activation function, and an output layer. Instead of back-propagation learning, these networks use the gradient descent learning algorithm. The structure of the network was fixed, with one gene for each weight and wavelet parameter. Using the GA was shown to produce lower error and to escape local minima in the error space. Neural networks with feedback loops have also been improved with GA-generated initial weights.
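The seed-based encoding mentioned above can be illustrated with a short, self-contained Clojure sketch: instead of storing every weight in the genome, an individual stores a single seed from which a deterministic pseudo-random generator reproduces the full weight vector. This is an illustration of the idea only, not the implementation described in this paper.

(import 'java.util.Random)

(defn weights-from-seed
  "Reproduce n initial weights in [-1, 1) from a single integer seed.
  The same seed always yields the same weights, so one gene is enough."
  [seed n]
  (let [rng (Random. (long seed))]
    (vec (repeatedly n #(- (* 2.0 (.nextDouble rng)) 1.0)))))

;; (weights-from-seed 42 3) always returns the same three weights,
;; so the GA can search over seeds instead of over full weight vectors.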
Genetic algorithms have also been used in the training process of neural networks, as an alternative to the back-propagation algorithm. In [15] and [16], genes represented encoded weight values, with one gene for each synapse in the neural network. It is shown in [17] that training a network using only the back-propagation algorithm takes more CPU cycles than training using only a GA, but in the long run back-propagation reaches a more precise solution. In [18], the Improved Genetic Algorithm (IGA) was used to train a NN and was shown to be superior to using a simple genetic algorithm to find the initial values of a back-propagation neural network. Each weight was encoded as a real number instead of a binary number, which avoided the loss of accuracy inherent in binary encoding. Crossover was performed only on a random subset of the genes instead of all of them, and mutation was performed on a random digit within a weight's real number. Since the genes were not binary, the mutation performed a "reverse significance of 9" operation, replacing a digit d with 9 − d (for example, 3 mutates to 6, 4 mutates to 5, and so on). The XOR problem was studied, and the IGA was shown to be both faster and to produce lower error. Similar to [3], this algorithm requires a large genome, since all the weights are encoded.
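The digit-level mutation can be sketched in Clojure as follows; this is one reading of the operation described in [18], not code from that paper. A random decimal digit d of the weight's textual representation is replaced by 9 − d.

(defn mutate-digit
  "Replace one random decimal digit d of a weight with 9 - d,
  e.g. 0.354 may mutate to 0.654 or 0.355."
  [weight]
  (let [s   (str weight)
        ;; positions of decimal digits (skipping '-' and '.')
        idx (filterv #(Character/isDigit (.charAt s %)) (range (count s)))
        i   (rand-nth idx)
        d   (Character/digit (.charAt s i) 10)
        c   (char (+ (int \0) (- 9 d)))]
    (Double/parseDouble (str (subs s 0 i) c (subs s (inc i))))))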
Previously, genetic algorithms were used to optimize a one-layered network [19], which has too few layers to solve even moderately complex problems. Many other genetic algorithms were used to optimize neural networks with a set number of layers [2–3, 12, 14, 20–21]. The problem with this approach is that the GA would need to be run once for each different number of hidden layers. In [20], the Variable String Genetic Algorithm was used to determine both the initial weights of a feed-forward NN and the number of neurons in the hidden layer, in order to classify infrared aerial images. Even though the number of layers was fixed (input, hidden, and output), adjusting the number of neurons allowed the GA to search through differently sized networks.

A wide range of algorithms is used to build the optimal neural network structure. The first of these is the tiled constructing algorithm [22]. The idea of the algorithm is to add new layers of neurons in such a way that input training vectors with different initial values obtain different internal representations in the network. Another prominent representative is the fast superstructure algorithm [23]. According to this algorithm, new neurons are added between the output layers; their role is to correct the error of the output neurons. In general, a neural network based on this algorithm has the form of a binary tree.

In summary, the papers mentioned above studied genetic algorithms that were lacking in several ways:
• They do not allow flexibility in the number of hidden layers and neurons.
• They do not optimize for size.
• They have very large genomes and therefore very large search spaces.

The algorithm described in this article addresses all of these issues. The main goal of this work is to analyze the structure optimization algorithm of a neural network during its learning for pattern recognition tasks [24] and to implement the algorithm using program instruments [1].

2. The Algorithm of Structural Optimization During Learning

The structural learning algorithm is used in multilayer feed-forward networks and has an iterative nature: on each iteration it searches for a network structure that is better than the previous one. The search is performed by enumerating all possible mutations of the network and by selecting and combining the best ones (selection and crossover).
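Schematically, one iteration of this search can be written in Clojure as below. This is a schematic reading of the description above, not the authors' implementation; all-mutations, train, and score are hypothetical helpers, and the stopping criterion (stop when no mutation improves the current network) is an assumption.

(defn evolve-step
  "Generate all admissible mutations of the current network, train each
  candidate briefly, and keep the best-scoring structure."
  [net epochs]
  (->> (all-mutations net)
       (map #(train % epochs))
       (apply max-key score)))

(defn optimize-structure
  "Iterate until no mutation improves on the current network."
  [net epochs]
  (let [next-net (evolve-step net epochs)]
    (if (> (score next-net) (score net))
      (recur next-net epochs)
      net)))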
Consider the basic parameters of the algorithm; a configuration sketch in Clojure follows the lists below.

Learning parameters:
• learning rate;
• inertia coefficient;
• coefficient of weight damping;
• the probability of activation of a hidden layer neuron: p_h;
• the probability of activation of an input layer neuron: p_i.

Structured learning parameters:
• initial number of neurons in the hidden layer;
• activation function for the hidden layer;
• activation function in the output layer;
• maximum number of mutations in the crossing;
• the number of training epochs of the original network;
• the number of training epochs in each iteration;
• acceptable mutation types;
• the part of the training sample used for training.
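As an illustration, these settings might be grouped into a single Clojure configuration map as below. The keys mirror the lists above, and the mutation keywords echo those used in Listing 1; all values shown are arbitrary placeholders, since the paper does not fix a configuration format.

(def default-params
  {;; learning parameters
   :learning-rate           0.1
   :inertia-coefficient     0.9
   :weight-damping          1e-4
   :p-hidden-activation     0.9     ; p_h
   :p-input-activation      0.9     ; p_i
   ;; structured learning parameters
   :initial-hidden-neurons  5
   :hidden-activation       :sigmoid
   :output-activation       :sigmoid
   :max-mutations-in-cross  3
   :initial-epochs          100
   :epochs-per-iteration    10
   :acceptable-mutations    #{::add-node ::del-node ::add-edge
                              ::del-edge ::add-layer}
   :training-fraction       0.7})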
(defmethod mutate ::combined
  [net {:keys [mutations]}]
  (let [grouped-ms (group-by :operation mutations)
        {add-node-ms ::add-node
         del-node-ms ::del-node
         layer-ms    ::add-layer} grouped-ms
        ;; order-insensitive mutations can be applied as they come
        safe-ms (mapcat grouped-ms [::identity ::add-edge ::del-edge])
        ;; delete nodes from the highest index down, so earlier deletions
        ;; do not shift the positions of nodes still to be deleted
        safe-del-node-ms (reverse
                           (sort-by #(second (:deleted-node %)) del-node-ms))
        ;; likewise, add layers from the last position backwards
        safe-layer-ms (reverse (sort-by :layer-pos layer-ms))
        ms (concat safe-ms add-node-ms safe-del-node-ms safe-layer-ms)]
    (reduce mutate net ms)))

Listing 1 – Code fragment implemented in Clojure that executes a combined mutation
One of the benefits of Clojure [9] over other programming languages is its use of immutable data structures: collections and containers whose content cannot be changed. Instead, when a new element is added to a collection, a new instance of the collection containing that element is created. The operation is optimized through structural sharing: both objects reuse the common part of the collection. Fig. 6 shows the result of adding object 5 to the end of an array; V denotes the old collection object and V2 the newly created one.
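A short REPL illustration of this behavior (not from the paper):

(def v  [1 2 3 4])
(def v2 (conj v 5))

v   ;; => [1 2 3 4]   the original vector is unchanged
v2  ;; => [1 2 3 4 5] a new vector that shares structure with v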
Fig. 7. System deployment diagram

Clojure has been used to implement the server application. The Java platform [28] has been used as the runtime environment. For the GUI implementation, ClojureScript [27], a dialect of Clojure executed in JavaScript, has been used.

5. Experimental Research

Example 1. MONK's Problem. MONK's Problem [29] was among the first problems used to compare classification algorithms. Each training example of the sample contains 7 attributes, of which the last is the class number that should be assigned to the example:
1. a1 ∈ {1, 2, 3}
2. a2 ∈ {1, 2, 3}
3. a3 ∈ {1, 2, 3}
4. a4 ∈ {1, 2, 3}
5. a5 ∈ {1, 2, 3, 4}
6. a6 ∈ {1, 2}
7. a7 ∈ {0, 1}
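Categorical attributes such as these are commonly one-hot encoded before being fed to a neural network. The following Clojure sketch shows such an encoding; it is an assumption for illustration, as the paper does not spell out its preprocessing.

(def attribute-domains
  [[1 2 3] [1 2 3] [1 2 3] [1 2 3] [1 2 3 4] [1 2]]) ; a1..a6; a7 is the class

(defn one-hot
  "Encode value v over the given domain as a 0/1 vector."
  [domain v]
  (mapv #(if (= % v) 1 0) domain))

(defn encode-example
  "Turn [a1 ... a6 class] into {:input bit-vector, :target class}."
  [example]
  {:input  (vec (mapcat one-hot attribute-domains (butlast example)))
   :target (last example)})

;; (encode-example [1 3 2 1 4 2 0])
;; => {:input [1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1], :target 0}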
Fig. 13. Data set formation example

Architecture of source network. The network architecture shown in Fig. 14 was used to evaluate the work of the algorithm.

Table 2. The resulting accuracy of image classification for Ti = 5

Type NN      Training, %   Testing, %
Common       97.59         93.41
Table 3. The resulting accuracy of image classification for Ti = 5

Type NN      Training, %   Testing, %
Common       98.79         92.21
Optimized    99.09         94.01

Fig. 17. Image classification accuracy for Ti = 3

Fig. 18. Price value for image classification for Ti = 3
Example 4. Evaluation of critical IT-infrastructure functioning. In this example, we show how the quality of operation of a service is evaluated using the estimation algorithm from [35].

Figure 19 shows an example of a dependency tree which schematically represents the impact of critical IT-infrastructure elements (hereinafter CITIE). We denote the quality parameters of a CITIE O_i^j as P_i^j, and Q_i^j as the qualitative assessments of the functioning of the CITIEs that affect O_i^j.

As an example of a CITIE for which it is necessary to calculate the qualitative evaluation of functioning, we selected an average application server. We reviewed five parameters affecting the quality of its functioning, from which the sets P' and Q' are constructed:
• p1 – hard drive usage. This parameter is normalized to values between 0 and 1;
• p2 – CPU usage. This parameter is normalized to values between 0 and 1;
• p3 – load of the network that the server is connected to. This is the ratio of the available network bandwidth to the nominal network bandwidth;
• p4 – used RAM of the server. This is the ratio of the used RAM volume to the maximum available memory;
• q1 – quality of functioning of another CITIE (the DB server used by the selected application server).
To calculate the qualitative assessment of the functioning of this CITIE, we construct a classifier based on a neural network. For O_i^j, the input parameters of such a network form the vector {P', Q'}, and the output parameter is the qualitative assessment of the functioning of O_i^j.
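A minimal Clojure sketch of assembling this input vector from raw server metrics is shown below. The field names and normalizations are illustrative assumptions, not taken from the paper.

(defn server-input-vector
  "Build the classifier input {P', Q'}: p1..p4 scaled to [0, 1], plus q1."
  [{:keys [disk-used disk-total cpu-load
           net-bandwidth net-nominal
           ram-used ram-max db-quality]}]
  [(/ disk-used disk-total)      ; p1 - hard drive usage
   cpu-load                      ; p2 - CPU usage, already in [0, 1]
   (/ net-bandwidth net-nominal) ; p3 - network load
   (/ ram-used ram-max)          ; p4 - RAM usage
   db-quality])                  ; q1 - quality of the DB-server CITIE

;; (server-input-vector {:disk-used 120.0 :disk-total 500.0 :cpu-load 0.42
;;                       :net-bandwidth 300.0 :net-nominal 1000.0
;;                       :ram-used 6.0 :ram-max 16.0 :db-quality 0.8})
;; => [0.24 0.42 0.3 0.375 0.8]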
Without assumptions about the nature of the relationships between elements and the qualitative evaluations of the elements' parameters, it is advisable to apply approximate expert estimates based on the personal experience of administrators, IT managers, etc. Since we determine the structure of the network automatically, it is enough for a person to specify the quality of functioning of the element for different values of {P', Q'}.
During the experiment, the values of the selected parameters were artificially set on computers. Experts were then asked to specify the performance of this server on a scale from zero to one. We then automatically define the type of neural network and start training it using the method described in the previous example. The resulting structure of the neural network can be used to determine the quality of functioning of another, similar CITIE: there is no need to determine the optimal network structure again, so the training time is reduced. This allows the service provider to retrain its existing models "on the fly" in a shorter period.
During 50 iterations of the algorithm, 157 removals and 107 additions of synapses were carried out. The resulting values are presented in Table 4. Due to the optimization of the structure of connections, the false classification percentage was lowered to 5.7% on the testing set.