The authors arbitrarily chose five neurons for the only hidden layer in the network. To determine the fitness of an individual, the initial weights dictated by its genes are applied to a network, which is then trained using back-propagation learning for a fixed number of epochs. Individuals with lower error were assigned a higher fitness value. In [10–11] this technique was used to train a sonar array azimuth control system and to monitor the wear of a cutting tool, respectively. In both cases, the approach was shown to produce better results than when using back-propagation exclusively. In [12] the performance of two back-propagation neural networks was compared: one with GA-optimized initial weights and one without. The numbers of input, hidden, and output neurons were fixed at 6, 25, and 4, respectively. Other parameters such as the learning rate and activation functions were also fixed, so that the only difference between the two was the initial weights.
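To make this evaluation scheme concrete, the following Clojure sketch outlines such a fitness function. It is an illustration only, not code from the cited papers; init-network-with-weights, train-bp, and network-error are hypothetical helpers standing in for a concrete neural network library.

(defn fitness
  "Decode an individual's genes into initial weights, train briefly with
  back-propagation, and score the result: lower error, higher fitness."
  [genes training-set epochs]
  (let [net     (init-network-with-weights genes)   ; weights dictated by the genes
        trained (train-bp net training-set epochs)] ; fixed number of epochs
    (/ 1.0 (+ 1e-9 (network-error trained training-set)))))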
In [2, 11–13] each of the synaptic weights was encoded into the genome as a floating-point number (at least 16 bits), making the genome very large. The algorithm developed in this research encodes only a random number seed, which decreases the search space by many orders of magnitude. Determining the initial values using a GA has improved the performance of non-back-propagation networks as well. In [14] a GA was used to initialize the weights of a Wavelet Neural Network (WNN) to diagnose faulty piston compressors. WNNs have an input layer, a hidden layer with the wavelet activation function, and an output layer. Instead of back-propagation learning, these networks use the gradient descent learning algorithm. The structure of the network was fixed, with one gene for each weight and wavelet parameter. Using the GA was shown to produce lower error and to escape local minima in the error space. Neural networks with feedback loops have also been improved with GA-generated initial weights.
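The seed-based encoding mentioned above can be illustrated with a short, self-contained Clojure sketch: instead of storing every weight in the genome, an individual stores a single seed from which a deterministic pseudo-random generator reproduces the full weight vector. This is an illustration of the idea only, not the implementation described in this paper.

(import 'java.util.Random)

(defn weights-from-seed
  "Reproduce n initial weights in [-1, 1) from a single integer seed.
  The same seed always yields the same weights, so one gene is enough."
  [seed n]
  (let [rng (Random. (long seed))]
    (vec (repeatedly n #(- (* 2.0 (.nextDouble rng)) 1.0)))))

;; (weights-from-seed 42 3) always returns the same three weights,
;; so the GA can search over seeds instead of over full weight vectors.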
Genetic algorithms have also been used in the training process of neural networks, as an alternative to the back-propagation algorithm. In [15] and [16], genes represented encoded weight values, with one gene for each synapse in the neural network. It is shown in [17] that training a network using only the back-propagation algorithm takes more CPU cycles than training using only a GA, but in the long run back-propagation reaches a more precise solution. In [18], the Improved Genetic Algorithm (IGA) was used to train a NN and was shown to be superior to using a simple genetic algorithm to find the initial values of a back-propagation neural network. Each weight was encoded as a real number instead of a binary number, which avoided the loss of accuracy inherent in binary encoding. Crossover was performed only on a random subset of the genes instead of all of them, and mutation was performed on a random digit within a weight's real number. Since the genes were not binary, the mutation performed a "reverse significance of 9" operation, replacing a digit d with 9 − d (for example, 3 mutates to 6, 4 mutates to 5, and so on). The XOR problem was studied, and the IGA was shown to be both faster and to produce lower error. Similar to [3], this algorithm requires a large genome, since all the weights are encoded.
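The digit-level mutation can be sketched in Clojure as follows; this is one reading of the operation described in [18], not code from that paper. A random decimal digit d of the weight's textual representation is replaced by 9 − d.

(defn mutate-digit
  "Replace one random decimal digit d of a weight with 9 - d,
  e.g. 0.354 may mutate to 0.654 or 0.355."
  [weight]
  (let [s   (str weight)
        ;; positions of decimal digits (skipping '-' and '.')
        idx (filterv #(Character/isDigit (.charAt s %)) (range (count s)))
        i   (rand-nth idx)
        d   (Character/digit (.charAt s i) 10)
        c   (char (+ (int \0) (- 9 d)))]
    (Double/parseDouble (str (subs s 0 i) c (subs s (inc i))))))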
Previously, genetic algorithms were used to optimize a one-layered network [19], which has too few layers to solve even moderately complex problems. Many other genetic algorithms were used to optimize neural networks with a set number of layers [2–3, 12, 14, 20–21]. The problem with this approach is that the GA would need to be run once for each different number of hidden layers. In [20], the Variable String Genetic Algorithm was used to determine both the initial weights of a feed-forward NN and the number of neurons in the hidden layer, in order to classify infrared aerial images. Even though the number of layers was fixed (input, hidden, and output), adjusting the number of neurons allowed the GA to search through differently sized networks.

A wide range of algorithms is used to build the optimal neural network structure. The first of these is the tiled constructing algorithm [22]. The idea of the algorithm is to add new layers of neurons in such a way that input training vectors with different initial values obtain different internal representations in the network. Another prominent representative is the fast superstructure algorithm [23]. According to this algorithm, new neurons are added between the output layers; their role is to correct the error of the output neurons. In general, a neural network based on this algorithm has the form of a binary tree.

In summary, the papers mentioned above studied genetic algorithms that were lacking in several ways:
• They do not allow flexibility in the number of hidden layers and neurons.
• They do not optimize for size.
• They have very large genomes and therefore very large search spaces.

The algorithm described in this article addresses all of these issues. The main goal of this work is to analyze the structure optimization algorithm of a neural network during its learning for pattern recognition tasks [24] and to implement the algorithm using program instruments [1].

2. The Algorithm of Structural Optimization During Learning

The structural learning algorithm is used in multilayer feed-forward networks and has an iterative nature: on each iteration it searches for a network structure that is better than the previous one. The search is performed by enumerating all possible mutations of the network and by selecting and combining the best ones (selection and crossover).
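Schematically, one iteration of this search can be written in Clojure as below. This is a schematic reading of the description above, not the authors' implementation; all-mutations, train, and score are hypothetical helpers, and the stopping criterion (stop when no mutation improves the current network) is an assumption.

(defn evolve-step
  "Generate all admissible mutations of the current network, train each
  candidate briefly, and keep the best-scoring structure."
  [net epochs]
  (->> (all-mutations net)
       (map #(train % epochs))
       (apply max-key score)))

(defn optimize-structure
  "Iterate until no mutation improves on the current network."
  [net epochs]
  (let [next-net (evolve-step net epochs)]
    (if (> (score next-net) (score net))
      (recur next-net epochs)
      net)))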
Consider the basic parameters of the algorithm; a configuration sketch in Clojure follows the lists below.

Learning parameters:
• learning rate;
• inertia coefficient;
• coefficient of weight damping;
• the probability of activation of a hidden layer neuron: p_h;
• the probability of activation of an input layer neuron: p_i.

Structured learning parameters:
• initial number of neurons in the hidden layer;
• activation function for the hidden layer;
• activation function in the output layer;
• maximum number of mutations in the crossing;
• the number of training epochs of the original network;
• the number of training epochs in each iteration;
• acceptable mutation types;
• the part of the training sample used for training.
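As an illustration, these settings might be grouped into a single Clojure configuration map as below. The keys mirror the lists above, and the mutation keywords echo those used in Listing 1; all values shown are arbitrary placeholders, since the paper does not fix a configuration format.

(def default-params
  {;; learning parameters
   :learning-rate           0.1
   :inertia-coefficient     0.9
   :weight-damping          1e-4
   :p-hidden-activation     0.9     ; p_h
   :p-input-activation      0.9     ; p_i
   ;; structured learning parameters
   :initial-hidden-neurons  5
   :hidden-activation       :sigmoid
   :output-activation       :sigmoid
   :max-mutations-in-cross  3
   :initial-epochs          100
   :epochs-per-iteration    10
   :acceptable-mutations    #{::add-node ::del-node ::add-edge
                              ::del-edge ::add-layer}
   :training-fraction       0.7})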
(defmethod mutate ::combined
  [net {:keys [mutations]}]
  (let [grouped-ms (group-by :operation mutations)
        {add-node-ms ::add-node
         del-node-ms ::del-node
         layer-ms    ::add-layer} grouped-ms
        ;; order-insensitive mutations can be applied as they come
        safe-ms (mapcat grouped-ms [::identity ::add-edge ::del-edge])
        ;; delete nodes from the highest index down, so earlier deletions
        ;; do not shift the positions of nodes still to be deleted
        safe-del-node-ms (reverse
                           (sort-by #(second (:deleted-node %)) del-node-ms))
        ;; likewise, add layers from the last position backwards
        safe-layer-ms (reverse (sort-by :layer-pos layer-ms))
        ms (concat safe-ms add-node-ms safe-del-node-ms safe-layer-ms)]
    (reduce mutate net ms)))

Listing 1 – Code fragment implemented in Clojure that executes a combined mutation
One of the benefits of Clojure [9] over other programming languages is its use of immutable data structures: collections and containers whose content cannot be changed. Instead, when a new element is added to a collection, a new instance of the collection containing that element is created. The operation is optimized through structural sharing: both objects reuse the common part of the collection. Fig. 6 shows the result of adding object 5 to the end of an array; V denotes the old collection object and V2 the newly created one.
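A short REPL illustration of this behavior (not from the paper):

(def v  [1 2 3 4])
(def v2 (conj v 5))

v   ;; => [1 2 3 4]   the original vector is unchanged
v2  ;; => [1 2 3 4 5] a new vector that shares structure with v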
Fig. 7. System deployment diagram

Clojure has been used to implement the server application. The Java platform [28] has been used as the runtime environment. For the GUI implementation, ClojureScript [27], a dialect of Clojure executed in JavaScript, has been used.

5. Experimental Research

Example 1. MONK's Problem. MONK's Problem [29] was among the first problems used to compare classification algorithms. Each training example of the sample contains 7 attributes, of which the last is the class number that should be assigned to the example:
1. a1 ∈ {1, 2, 3}
2. a2 ∈ {1, 2, 3}
3. a3 ∈ {1, 2, 3}
4. a4 ∈ {1, 2, 3}
5. a5 ∈ {1, 2, 3, 4}
6. a6 ∈ {1, 2}
7. a7 ∈ {0, 1}
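Categorical attributes such as these are commonly one-hot encoded before being fed to a neural network. The following Clojure sketch shows such an encoding; it is an assumption for illustration, as the paper does not spell out its preprocessing.

(def attribute-domains
  [[1 2 3] [1 2 3] [1 2 3] [1 2 3] [1 2 3 4] [1 2]]) ; a1..a6; a7 is the class

(defn one-hot
  "Encode value v over the given domain as a 0/1 vector."
  [domain v]
  (mapv #(if (= % v) 1 0) domain))

(defn encode-example
  "Turn [a1 ... a6 class] into {:input bit-vector, :target class}."
  [example]
  {:input  (vec (mapcat one-hot attribute-domains (butlast example)))
   :target (last example)})

;; (encode-example [1 3 2 1 4 2 0])
;; => {:input [1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1], :target 0}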
Fig. 13. Data set formation example

Architecture of source network. The network architecture shown in Fig. 14 was used to evaluate the work of the algorithm.

Table 2. The resulting accuracy of image classification for Ti = 5

Type NN      Training, %   Testing, %
Common       97.59         93.41
Table 3. The resulting accuracy of image classification for Ti = 5

Type NN      Training, %   Testing, %
Common       98.79         92.21
Optimized    99.09         94.01

Fig. 17. Image classification accuracy for Ti = 3

Fig. 18. Price value for image classification for Ti = 3
Example 4. Evaluation of critical IT-infrastructure functioning. In this example, we show how the quality of operation of a service is evaluated using the estimation algorithm from [35].

Figure 19 shows an example of a dependency tree which schematically represents the impact of critical IT-infrastructure elements (hereinafter CITIE). We denote the quality parameters of a CITIE O_i^j as P_i^j, and Q_i^j as the qualitative assessments of the functioning of the CITIEs that affect O_i^j.

As an example of a CITIE for which it is necessary to calculate the qualitative evaluation of functioning, we selected an average application server. We reviewed five parameters affecting the quality of its functioning, from which the sets P' and Q' are constructed:
• p1 – hard drive usage. This parameter is normalized to values between 0 and 1;
• p2 – CPU usage. This parameter is normalized to values between 0 and 1;
• p3 – load of the network that the server is connected to. This is the ratio of the available network bandwidth to the nominal network bandwidth;
• p4 – used RAM of the server. This is the ratio of the used RAM volume to the maximum available memory;
• q1 – quality of functioning of another CITIE (the DB server used by the selected application server).
To calculate the qualitative assessment of the functioning of this CITIE, we construct a classifier based on a neural network. For O_i^j, the input parameters of such a network form the vector {P', Q'}, and the output parameter is the qualitative assessment of the functioning of O_i^j.
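A minimal Clojure sketch of assembling this input vector from raw server metrics is shown below. The field names and normalizations are illustrative assumptions, not taken from the paper.

(defn server-input-vector
  "Build the classifier input {P', Q'}: p1..p4 scaled to [0, 1], plus q1."
  [{:keys [disk-used disk-total cpu-load
           net-bandwidth net-nominal
           ram-used ram-max db-quality]}]
  [(/ disk-used disk-total)      ; p1 - hard drive usage
   cpu-load                      ; p2 - CPU usage, already in [0, 1]
   (/ net-bandwidth net-nominal) ; p3 - network load
   (/ ram-used ram-max)          ; p4 - RAM usage
   db-quality])                  ; q1 - quality of the DB-server CITIE

;; (server-input-vector {:disk-used 120.0 :disk-total 500.0 :cpu-load 0.42
;;                       :net-bandwidth 300.0 :net-nominal 1000.0
;;                       :ram-used 6.0 :ram-max 16.0 :db-quality 0.8})
;; => [0.24 0.42 0.3 0.375 0.8]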
Without assumptions about the nature of the relationships between elements and the qualitative evaluations of the elements' parameters, it is advisable to apply approximate expert estimates based on the personal experience of administrators, IT managers, etc. Since we determine the structure of the network automatically, it is enough for a person to specify the quality of functioning of the element for different values of {P', Q'}.
During the experiment, the values of the selected parameters were artificially set on computers. Experts were then asked to specify the performance of this server on a scale from zero to one. We then automatically define the type of neural network and start training it using the method described in the previous example. The resulting structure of the neural network can be used to determine the quality of functioning of another, similar CITIE: there is no need to determine the optimal network structure again, so the training time is reduced. This allows the service provider to retrain its existing models "on the fly" in a shorter period.
During 50 iterations of the algorithm, 157 removals and 107 additions of synapses were carried out. The resulting values are presented in Table 4. Due to the optimization of the structure of connections, the false classification percentage was lowered to 5.7% on the testing set.